
Data matching, merging and cleansing


    Data matching, merging and cleansing

    please

    Does anyone know of commercial tools (price is not really an issue) that perform automated, real-time matching, merging and cleansing of person and address data?

    e.g. Robert J Smith, Rob Smith, R J Smythe, Bob John Smith, etc.

    The tool should cope with building matching rules based on a complex set of fields, give weightings to each field etc.
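
A minimal sketch of what weighted field matching could look like, for illustration only. The field names, the weights and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions, not how any particular commercial tool works.

```python
from difflib import SequenceMatcher

# Hypothetical weights: tune these so the fields you trust most count most.
WEIGHTS = {"surname": 0.4, "forename": 0.2, "postcode": 0.4}


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and surrounding spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a: dict, rec_b: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average of per-field similarities, in the range 0..1."""
    total = sum(weights.values())
    score = sum(
        weight * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in weights.items()
    )
    return score / total
```

Two records would then be treated as a probable match when `match_score` exceeds a threshold chosen by testing against known duplicates.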

    For addresses, again, intelligent matching is needed: for example, transposed or erroneous digits in postcodes, misspelled street names, house number "4" versus "four", and "Salop", "Shrops" and "Shropshire". The rules would again need to be tuneable so that, for instance, more weighting is given to postcode than to street name.
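
As a rough illustration of the address cases above, the sketch below normalises number words and county aliases before comparison, and detects a single swapped pair of adjacent characters in a postcode. The lookup tables are tiny hypothetical samples, and the function names are made up for this example.

```python
import re

# Hypothetical sample lookups; a real system would need far larger tables.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
COUNTY_ALIASES = {"salop": "shropshire", "shrops": "shropshire"}


def normalise_address(text: str) -> str:
    """Lower-case, drop punctuation, and map known variants to one form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    tokens = [NUMBER_WORDS.get(t, COUNTY_ALIASES.get(t, t)) for t in tokens]
    return " ".join(tokens)


def is_adjacent_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one adjacent pair of characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )
```

With these tables, "Four High St, Salop" and "4 High St, Shrops" normalise to the same string, and "SY12AB" against "SY21AB" is flagged as a transposition rather than a different postcode.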

    The databases involved are pretty huge (SQL Server 2000, 100 million rows +) and the matching and merging of data needs to be very fast as new data is added.

    Anyone know of tools that provide this level of intelligent data cleansing and consolidation?

    Again, forget price (e.g. £100K per license no issue at all).
    Last edited by DimPrawn; 23 November 2005, 17:56.

    #2
    clean names & addresses

    Tried AFD?


    http://www.afd.co.uk/products.asp

    Refiner looked good last time I looked.

    They are one of the market leaders, support was very helpful, and the internet product and client are fairly fast. My last contact with them was 3 years ago, so they may have been taken over by alien lizards since.

    Others like Hopewiser and PostcodeAnywhere are in this trade.
    Always forgive your enemies; nothing annoys them so much.



      #3
      Thanks Vetran, looks interesting.

      The main issue I have is the volume of data and the fact that the client spec needs to support continuous updates and inserts at rates of several per second. Nutters.



        #4
        Originally posted by DimPrawn
        Thanks Vetran, looks interesting.

        The main issue I have is the volume of data and the fact that the client spec needs to support continuous updates and inserts at rates of several per second. Nutters.
        Quick Address PAF is what we always used, but we did a lot of turd polishing before offering the file to PAF. How are you getting your addresses? If they're keyed, then you'll also need to do stuff like swear checks etc.

        P.S. Be careful how you do your swear checks or people in Scunthorpe will never get any
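
The Scunthorpe point can be shown with a small sketch comparing a naive substring check against a whole-word check. The blocked list here holds a single mild placeholder word, and both function names are hypothetical.

```python
import re

# Placeholder list for illustration; a real list would be maintained separately.
BLOCKED = {"heck"}


def naive_check(text: str) -> bool:
    """Flags any text containing a blocked word as a substring (over-matches)."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED)


def whole_word_check(text: str) -> bool:
    """Flags text only when a blocked word appears as a complete word."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED for word in words)
```

The naive version rejects a legitimate address in Heckmondwike, while the whole-word version lets it through and still catches the word on its own.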



          #5
          The data is highly sensitive and needs security clearance to view, and the matching and merging must be done in a secure server environment in real time. The data has been entered by officials, so swearing is unlikely.

          Problem with all the systems out there is:

          1. They are too feeble. The system must match and merge new data against millions of records in a few milliseconds, all in real time, not as batch processing.
          2. Too cheap.
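
One common way to avoid comparing each new record against every existing row is blocking: only records sharing a cheap key are compared in detail. The sketch below is an in-memory illustration under assumed field names (`postcode`, `surname`) and an assumed key of outward postcode plus surname initial; it makes no claim about meeting the millisecond target on 100 million rows.

```python
from collections import defaultdict


def blocking_key(record: dict) -> tuple:
    """Cheap key: outward part of the postcode plus the surname initial."""
    postcode = record.get("postcode", "").replace(" ", "").upper()
    surname = record.get("surname", "").strip().upper()
    # UK inward codes are three characters, so the rest is the outward code.
    return (postcode[:-3], surname[:1])


class BlockIndex:
    """Groups records by blocking key so lookups touch one small bucket."""

    def __init__(self) -> None:
        self._blocks = defaultdict(list)

    def add(self, record: dict) -> None:
        self._blocks[blocking_key(record)].append(record)

    def candidates(self, record: dict) -> list:
        """Return the existing records worth comparing in detail."""
        return list(self._blocks.get(blocking_key(record), []))
```

A single key like this would miss records whose postcode itself is mistyped, so in practice several different keys would probably be indexed and their candidate lists combined before detailed scoring.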


          Anyway, keep pointing at systems out there, as I'm learning a lot about the types of matching, the data feeds involved, etc.



            #6
            errr

            easy, outsource it to india.
            will solve all your ID card probs.

            do you only need English or also foreign names/addresses ?



              #7
              Originally posted by nobody here but us chicke
              easy, outsource it to india.
              will solve all your ID card probs.

              do you only need English or also foreign names/addresses ?
              UK data. Which means a large number of foreign names and addresses of course.



                #8
                Another one

                Check this one... http://www.helpit.com/, seems they have the features but they don't cite performance metrics.



                  #9
                  If you're validating addresses etc. at point of entry, then I still think Quick Address is your best answer, as you can check against the electoral roll in real time as well as having a batch option for cleansing large datasets.

                  We used it at News Int for cleansing about 20 million addresses and it handled that no probs, plus we also used it for our data entry systems.

                  All the other systems we looked at (we looked at HelpIt for example) just didn't come up to scratch.

                  That being said, this was 3 years ago, so the other available systems could have got a lot better.



                    #10
                    Trillium data cleansing tools could be what you need.
                    I worked on an implementation at a large US utility company.
                    It's efficient, and wonderfully expensive.

                    http://www.trilliumsoftware.com/site...olutionset.asp
