The plan now is to move over to a Java spider that uses John Cowan's excellent TagSoup parser to turn pages into a series of SAX events. Those events can be fed into an XSLT transform which will output whatever's necessary to put stuff into MySQL, moving gradually over to Amazon SimpleDB. Once that's working, it can run on an Amazon EC2 server under a cron job, so it doesn't depend on me having my laptop open.
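For anyone wondering what the SAX-into-XSLT step might look like, here is a rough sketch, not the actual spider. The class name, the stylesheet and the "one URL per line" output are all made up for illustration. It runs on the JDK's own XMLReader so it works without extra jars; in the real thing you would pass in TagSoup's org.ccil.cowan.tagsoup.Parser instead, since it also implements XMLReader. The stylesheet matches on local-name() because, as far as I know, TagSoup reports elements in the XHTML namespace by default.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical sketch: page -> SAX events -> XSLT -> text rows for a database load.
class PageToRows {
    // Illustrative stylesheet (an assumption): writes each link's href on its own line.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select=\"//*[local-name()='a'][@href]\">"
      + "<xsl:value-of select='@href'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stand-in reader from the JDK; it only copes with well-formed markup.
    // Swap in TagSoup's Parser here to handle real-world tag soup.
    static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // XSLT needs namespace-aware SAX events
        return factory.newSAXParser().getXMLReader();
    }

    // Parses the page with the given reader and streams the SAX events
    // straight into the transform, returning the transform's text output.
    static String extractLinks(XMLReader reader, String page) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(page)));
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href='http://example.com/a'>A</a>"
                    + "<p><a href='/b'>B</a></p></body></html>";
        System.out.print(extractLinks(defaultReader(), page));
    }
}
```

The rows coming out the other end are plain text on purpose: they can go into MySQL through a parameterised insert rather than having the stylesheet build SQL strings itself.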
Those pages will provide excellent test cases for the new parser... some of the errors in there are really nasty things, over and above peculiar control characters.
And so off to the pub - see you later denizens, and thank you for getting the band to come out in such cold weather BI