
I’ve had to do a lot of work on the import of posts to ensure it goes smoothly, taking into account stuff relating to the work I did for the vB5 migration back then. I was pretty sure it was good to go, so I cleaned down the database and started a new import. All the previous steps went smoothly and it started importing posts… and it seemed incredibly slow

I left it running for an hour to give it a chance, but by the end of that I did a quick calculation and concluded that it would take it around forty days to import them all! Clearly, something was wrong

So I tried single-stepping through the top level function, and it turned out one function call was taking a few seconds to run. All that function does is run a SQL query and return the results, and this was one of the bits I’d had to modify; but I didn’t think I’d done anything too drastic

But it was using a table I’d added to the vB5 database back in the day and, on further investigation, I realised that one of the columns it was using in there didn’t have an index set! This column was only retrieved as one of the columns in a SELECT by the stuff I added to vB5, whereas in the importer, it’s used in the WHERE part of the query. So this meant the DB was having to scan this table, which contains millions of rows, repeatedly. That was the decision I made back then - I added indexes for the columns the vB5 code made use of, but didn’t bother for the one that’s now suddenly important

In addition, the way the importers work in the new stuff is that they’ll set a time limit for processing a chunk of data, apparently so they can output a progress message even if running on very slow shared hosting and so on, and pick up where they left off if they didn’t complete that chunk - but they fetch a new chunk with the current starting position from the DB. So it was going and getting 500 rows of data, which took several seconds because of the missing index, then processing the first row - after which it found it was over its time limit! This meant it was processing one row at a time, and taking loads more time for each one than it should have taken it to process 500 rows

I added an index for that column, and now it’s running along at a cracking pace








Leave a comment: