Reply to: Web scraping

Previously on "Web scraping"

  • Cirrus
    replied
    Try Surface Automation

    Most people use Selenium but why not have a go with RaiMan's SikuliX


  • Platypus
    replied
    A friend of mine uses this to great effect for scraping data
    Browser Automation, Data Extraction and Web Testing | iMacros Software


  • NickFitz
    replied
    Originally posted by BrilloPad View Post
    Didn't you do tpd too?
    Yes - that was in PHP rather than Python, though


  • Hobosapien
    replied
    Originally posted by woohoo View Post
    I worked with a company that used a 3rd party company to scrape prices (won't say more as it would identify the business).

    Apparently, it's a constant battle to keep the scraping software working because the sites being scraped are constantly changing things around to stop it. I was told the 3rd party employed people to continuously keep the software up to date. Doesn't sound fun at all.
    Seems many companies are wasting resource scraping or preventing scraping when the better solution would be for the target to provide an API to sell the info. If it's getting 'stolen' anyway, they may as well make money from it. Also makes it more solid in court if the data is licensed via an appropriate channel and others are stealing the data to avoid paying the licence.


  • woohoo
    replied
    Originally posted by NickFitz View Post
    I've done a fair bit of ad hoc web scraping, usually involving a bit of Python knocked together in an hour or so. For example, I've got a script I run occasionally to archive Monday Links to AWS S3 then parse and extract to a database, so I can easily search to make sure I'm not posting the same thing twice. (I recently caught one I'd already posted about seven years ago.)

    If I'd realised people were willing to pay for that kind of thing, I would have made it a plan B ages ago
    I worked with a company that used a 3rd party company to scrape prices (won't say more as it would identify the business).

    Apparently, it's a constant battle to keep the scraping software working because the sites being scraped are constantly changing things around to stop it. I was told the 3rd party employed people to continuously keep the software up to date. Doesn't sound fun at all.


  • Hobosapien
    replied
    AtW should have been able to sell you the data, already scraped via his backlink scanning service, but last time I raised this as a potential additional income stream he said they didn't retain all the scraped data. Scrapes the whole internet and throws out the majority of the data.

    Plenty of web scraping tools out there, and sure you can scrape a competitor's site, but if they catch you and have said not to do such a thing (in their robots.txt and/or T&Cs page or copyright notices) then you risk being 'done'. I suppose the risk increases if you make the data public rather than capture it for internal use only, but if they are clued up they will be checking their web logs for obvious competitor activity.
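
    For what it's worth, the robots.txt part of that check is easy to automate. A minimal sketch using Python's standard urllib.robotparser, assuming a made-up robots.txt and a hypothetical "price-bot" user agent (neither comes from any real site). It only covers robots.txt, not the T&Cs or copyright notices, which still need reading by a human:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only. A real one would
# be fetched from the target site's /robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /prices/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/prices/widget", "https://example.com/about"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "price-bot", url))
```

    Against a live site you would call set_url() and read() on the parser instead of parse(), which downloads the file for you.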


  • BrilloPad
    replied
    Originally posted by NickFitz View Post
    I've done a fair bit of ad hoc web scraping, usually involving a bit of Python knocked together in an hour or so. For example, I've got a script I run occasionally to archive Monday Links to AWS S3 then parse and extract to a database, so I can easily search to make sure I'm not posting the same thing twice. (I recently caught one I'd already posted about seven years ago.)

    If I'd realised people were willing to pay for that kind of thing, I would have made it a plan B ages ago
    Didn't you do tpd too?


  • NickFitz
    replied
    I've done a fair bit of ad hoc web scraping, usually involving a bit of Python knocked together in an hour or so. For example, I've got a script I run occasionally to archive Monday Links to AWS S3 then parse and extract to a database, so I can easily search to make sure I'm not posting the same thing twice. (I recently caught one I'd already posted about seven years ago.)

    If I'd realised people were willing to pay for that kind of thing, I would have made it a plan B ages ago
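
    A rough sketch of the parse-and-check-for-duplicates part of a script like that, not the actual one. It assumes the page HTML is already in hand (the S3 archiving step is left out), uses SQLite as the database, and every name in it (LinkExtractor, open_db, record_links) is made up for illustration:

```python
import sqlite3
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> list:
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # The primary key on url is what lets the database spot repeats.
    conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)")
    return conn


def record_links(conn: sqlite3.Connection, urls: list) -> list:
    """Store new URLs and return the ones that were already recorded."""
    already_posted = []
    for url in urls:
        cur = conn.execute("INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
        if cur.rowcount == 0:  # row was ignored, so the URL was already there
            already_posted.append(url)
    conn.commit()
    return already_posted
```

    Running record_links over each archived post in turn returns any links that have appeared before, which is the "not posting the same thing twice" check.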


  • northernladuk
    replied
    Originally posted by original PM View Post
    Thanks I'll give them a try - and I cannot tell you that because it would probably be too obvious where I work!
    Hee hee.


  • original PM
    replied
    Originally posted by mudskipper View Post
    Services out there that do this.

    e.g. Datafiniti | Intelligent Web Data for Data-Driven Businesses (might be mainly US, but they'll add in stuff to your requirements)

    What product line?
    Thanks I'll give them a try - and I cannot tell you that because it would probably be too obvious where I work!


  • mudskipper
    replied
    Originally posted by original PM View Post
    We are trying to use a service or some software to trawl the web to obtain competitor prices

    Has anyone ever done anything like this?

    Any suggestions/recommendations?

    TIA!!
    Services out there that do this.

    e.g. Datafiniti | Intelligent Web Data for Data-Driven Businesses (might be mainly US, but they'll add in stuff to your requirements)

    What product line?


  • original PM
    started a topic Web scraping

    Web scraping

    We are trying to use a service or some software to trawl the web to obtain competitor prices

    Has anyone ever done anything like this?

    Any suggestions/recommendations?

    TIA!!
