
Web scraping


    We are trying to use a service or some software to trawl the web to obtain competitor prices.

    Has anyone ever done anything like this?

    Any suggestions/recommendations?

    TIA!!

    #2
    Originally posted by original PM:
    > We are trying to use a service or some software to trawl the web to obtain competitor prices.
    >
    > Has anyone ever done anything like this?
    >
    > Any suggestions/recommendations?
    >
    > TIA!!
    There are services out there that do this.

    For example, Datafiniti (it might be mainly US-focused, but they'll add data to suit your requirements).

    What product line?


      #3
      Originally posted by mudskipper:
      > There are services out there that do this.
      >
      > For example, Datafiniti (it might be mainly US-focused, but they'll add data to suit your requirements).
      >
      > What product line?
      Thanks, I'll give them a try. As for the product line, I can't tell you that, because it would probably make it too obvious where I work!


        #4
        Originally posted by original PM:
        > Thanks, I'll give them a try. As for the product line, I can't tell you that, because it would probably make it too obvious where I work!
        Hee hee.
        'CUK forum personality of 2011 - Winner - Yes really!!!!


          #5
          I've done a fair bit of ad hoc web scraping, usually with a bit of Python knocked together in an hour or so. For example, I have a script I run occasionally that archives Monday Links to AWS S3, then parses them and extracts the links to a database, so I can easily search to make sure I'm not posting the same thing twice. (Recently I caught one I'd already posted about seven years ago.)

          If I'd realised people were willing to pay for that kind of thing, I would have made it my plan B ages ago.
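A minimal sketch of the parse-and-dedupe half of a script like that, using only Python's standard library. It is not NickFitz's actual code: the S3 archiving step is left out, it assumes the links appear as ordinary `<a href>` tags, and the table and function names (`links`, `create_schema`, `store_links`, `previously_posted`) are hypothetical.

```python
import sqlite3
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from the <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) tuples; <a> without href is ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def create_schema(conn):
    # Hypothetical table: one row per link per archived post.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (url TEXT, title TEXT, post_date TEXT)"
    )


def store_links(conn, post_date, html):
    """Parse one archived post, record every link in it, return the count."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    conn.executemany(
        "INSERT INTO links (url, title, post_date) VALUES (?, ?, ?)",
        [(url, title, post_date) for url, title in parser.links],
    )
    conn.commit()
    return len(parser.links)


def previously_posted(conn, url):
    """Return every date a URL was posted (ISO dates, so text order is date order)."""
    rows = conn.execute(
        "SELECT post_date FROM links WHERE url = ? ORDER BY post_date", (url,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    store_links(conn, "2012-03-05", '<p><a href="https://example.com/a">Story A</a></p>')
    store_links(conn, "2019-03-04", '<p><a href="https://example.com/a">Story A again</a></p>')
    print(previously_posted(conn, "https://example.com/a"))
```

Querying a URL before posting it returns every earlier date it appeared on; with the made-up data above, the same link shows up in both 2012 and 2019.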


            #6
            Originally posted by NickFitz:
            > I've done a fair bit of ad hoc web scraping, usually with a bit of Python knocked together in an hour or so. For example, I have a script I run occasionally that archives Monday Links to AWS S3, then parses them and extracts the links to a database, so I can easily search to make sure I'm not posting the same thing twice. (Recently I caught one I'd already posted about seven years ago.)
            >
            > If I'd realised people were willing to pay for that kind of thing, I would have made it my plan B ages ago.
            Didn't you do tpd too?


              #7
              AtW should have been able to sell you the data, already scraped via his backlink scanning service, but the last time I raised this as a potential additional income stream he said they don't retain all the scraped data. It scrapes the whole internet and throws away most of what it collects.

              There are plenty of web scraping tools out there, and sure, you can scrape a competitor's site, but if they catch you and have said not to do such a thing (in their robots.txt and/or T&Cs page or copyright notices), you risk being 'done'. I suppose the risk increases if you make the data public rather than capture it for internal use only, but if they are clued up they will be checking their web logs for obvious competitor activity.
              Maybe tomorrow, I'll want to settle down. Until tomorrow, I'll just keep moving on.
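The robots.txt part of that, at least, can be checked automatically. A minimal sketch with Python's standard `urllib.robotparser`; the rules, the user-agent string and the URLs are made-up examples, and T&Cs pages or copyright notices still have to be read by a person.

```python
from urllib.robotparser import RobotFileParser


def check_urls(robots_txt, user_agent, urls):
    """Map each URL to True (allowed) or False (disallowed) under robots_txt."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so nothing is fetched here;
    # against a live site you would call set_url() and read() instead.
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


if __name__ == "__main__":
    # Hypothetical rules for a hypothetical shop.
    rules = """\
User-agent: *
Disallow: /prices/
"""
    results = check_urls(
        rules,
        "ExamplePriceBot",
        ["https://shop.example/prices/widget", "https://shop.example/about"],
    )
    for url, allowed in results.items():
        print(url, "allowed" if allowed else "disallowed")
```

Passing the rules in as text keeps the check testable offline; a scraper would typically run it once per site before requesting any page.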


                #8
                Originally posted by NickFitz:
                > I've done a fair bit of ad hoc web scraping, usually with a bit of Python knocked together in an hour or so. For example, I have a script I run occasionally that archives Monday Links to AWS S3, then parses them and extracts the links to a database, so I can easily search to make sure I'm not posting the same thing twice. (Recently I caught one I'd already posted about seven years ago.)
                >
                > If I'd realised people were willing to pay for that kind of thing, I would have made it my plan B ages ago.
                I worked with a company that used a third-party company to scrape prices (I won't say more, as that would identify the business).

                Apparently, it's a constant battle to keep the scraping software working, because the sites being scraped keep changing things around to stop it. I was told the third party employed people to continuously keep the software up to date, which doesn't sound like fun at all.
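A minimal sketch of why that maintenance never ends, assuming a scraper that pulls prices out of raw HTML with regular expressions. The markup patterns and class names (`price-now`, `price`) are hypothetical; the point is that every layout change on the target site means adding another pattern, and the scraper should fail loudly rather than quietly return nothing.

```python
import re
from decimal import Decimal

# Hypothetical patterns for one target site, newest layout first. Each time the
# site reshuffles its markup, someone has to add a pattern here.
PRICE_PATTERNS = [
    re.compile(r'<span class="price-now">\s*£([\d,]+\.\d{2})\s*</span>'),
    re.compile(r'<div class="price">\s*£([\d,]+\.\d{2})\s*</div>'),
]


class LayoutChanged(Exception):
    """No known pattern matched: the scraper needs updating."""


def extract_price(html):
    """Return the first price any known pattern finds, as an exact Decimal."""
    for pattern in PRICE_PATTERNS:
        match = pattern.search(html)
        if match:
            # Drop thousands separators before converting; Decimal avoids
            # the rounding surprises of float for money.
            return Decimal(match.group(1).replace(",", ""))
    raise LayoutChanged("no known price pattern matched; update PRICE_PATTERNS")


if __name__ == "__main__":
    print(extract_price('<span class="price-now">£1,299.00</span>'))
    print(extract_price('<div class="price"> £9.99 </div>'))
```

Raising an exception instead of returning `None` means a redesign on the target site shows up as an alert rather than as a gap in the price data.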


                  #9
                  Originally posted by woohoo:
                  > I worked with a company that used a third-party company to scrape prices (I won't say more, as that would identify the business).
                  >
                  > Apparently, it's a constant battle to keep the scraping software working, because the sites being scraped keep changing things around to stop it. I was told the third party employed people to continuously keep the software up to date, which doesn't sound like fun at all.
                  It seems many companies are wasting resources on scraping, or on preventing scraping, when the better solution would be for the target to sell the information through an API. If it's getting 'stolen' anyway, they may as well make money from it. It would also put them on firmer ground in court if the data were licensed through an appropriate channel and others were stealing it to avoid paying for the licence.


                    #10
                    Originally posted by BrilloPad:
                    > Didn't you do tpd too?
                    Yes, though that was in PHP rather than Python.
