• Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
  • Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

Can SQLserver do things like Oracle's regular expression

Collapse
X
  •  
  • Filter
  • Time
  • Show
Clear All
new posts

    Can SQLserver do things like Oracle's regular expression

    Hi guys,

    In Oracle, I am loading some staging data but cleansing them using regular expressions (as part of the retrieval) to make sure our destination tables does not contain duff data. e.g. making sure names are valid

    Such constraints seem to be lacking from our data provider and they can reverse my oracle pl/sql code into sql server code since regex is not ANSI.

    so whats the best way in SQL Server to determine that the data/column below is incorrect as a name ? :

    css_jay99
    css_jay
    cs?jay
    css'jay
    cs-ja-y


    cheers

    #2
    are you loading the data using TSQL or are you using SSIS as an ETL tool?

    If using SSIS you could use a transformation component such as SQLIS | Regular Expression Transformation

    or you could use standard functionality and code your own using a scipt task in the Data Flow task.

    Or in TSQL this link gives you a pretty comprehensive breakdown of your options

    Pattern Matching (Regex) in T-SQL | SQL Lion

    Comment


      #3
      Originally posted by css_jay99 View Post
      Hi guys,

      In Oracle, I am loading some staging data but cleansing them using regular expressions (as part of the retrieval) to make sure our destination tables does not contain duff data. e.g. making sure names are valid

      Such constraints seem to be lacking from our data provider and they can reverse my oracle pl/sql code into sql server code since regex is not ANSI.

      so whats the best way in SQL Server to determine that the data/column below is incorrect as a name ? :

      css_jay99
      css_jay
      cs?jay
      css'jay
      cs-ja-y


      cheers
      Which version of SQL Server are you using?
      Knock first as I might be balancing my chakras.

      Comment


        #4
        Originally posted by suityou01 View Post
        Which version of SQL Server are you using?
        You need at least 2 years technical experience before posting on this...

        Comment


          #5
          Originally posted by stek View Post
          You need at least 2 years technical experience before posting on this...
          LOL!!!

          OP - which of those do you consider a 'name'?

          On the basis of "Better to leave a thousand dupes/duds festering in your database than to delete one good record", Data cleansing is never guaranteed to completely clean your data. This could be a HUGE undertaking.

          Because, once you've id'ed the bad data, how are you going to handle it? Delete the records or alter them? If the latter, read on.

          Lets say we delete all entries with a '-'. One of your records contained such a hyphen. But so too do the surnames of almost 10% of married cows who don't want to relinquish their maiden names. Also, what about the toffs? Shall we delete all the Smythe-Farquarsons from your database too??

          I return to my original question; how do you define a 'name'?

          Comment


            #6
            Originally posted by Fandango View Post
            are you loading the data using TSQL or are you using SSIS as an ETL tool?

            If using SSIS you could use a transformation component such as SQLIS | Regular Expression Transformation

            or you could use standard functionality and code your own using a scipt task in the Data Flow task.

            Or in TSQL this link gives you a pretty comprehensive breakdown of your options

            Pattern Matching (Regex) in T-SQL | SQL Lion
            Originally posted by suityou01 View Post
            Which version of SQL Server are you using?
            Originally posted by SQLSwerver View Post
            LOL!!!

            OP - which of those do you consider a 'name'?

            On the basis of "Better to leave a thousand dupes/duds festering in your database than to delete one good record", Data cleansing is never guaranteed to completely clean your data. This could be a HUGE undertaking.

            Because, once you've id'ed the bad data, how are you going to handle it? Delete the records or alter them? If the latter, read on.

            Lets say we delete all entries with a '-'. One of your records contained such a hyphen. But so too do the surnames of almost 10% of married cows who don't want to relinquish their maiden names. Also, what about the toffs? Shall we delete all the Smythe-Farquarsons from your database too??

            I return to my original question; how do you define a 'name'?


            Maybe I started of not explaining myself properly.

            I work purely on Oracle so issue is not at my end.
            I don't do Sql server and dont have a clue what version of sql server the data provider is using or what process they are trying to use to load the data into our staging table.

            They are to load data into a staging table on our oracle database and I noticed that lots of the data contain crappy surnames. Its no issue for my transformation process because I ignore bad names using a routine written with regexp ... but I would just rather not have such crap in the staging environment

            So I told third party to clean their data before sending it to us and told them what I considered good data. To speed things along, I sent a snipped of my code and the reply was they cant do such things in SQL server ....... I think they assumed my cleansing process was written purely in ANSI SQL which regexp is not!

            My query here is to prove them wrong ...unless sql server is simply not up to such an easy task by most programing languages !

            SQLSwerver - The point is that none of the names I listed above are valid for a start ..... since we don't care about loading/cleansing such info

            Comment


              #7
              Originally posted by Fandango View Post
              Or in TSQL this link gives you a pretty comprehensive breakdown of your options

              Pattern Matching (Regex) in T-SQL | SQL Lion
              That links looks like the solution, I'll forward it on to their developer

              Comment


                #8
                Originally posted by SQLSwerver View Post
                One of your records contained such a hyphen. But so too do the surnames of almost 10% of married cows who don't want to relinquish their maiden names.
                Misogyny as well as racism.

                Nice.

                Comment


                  #9
                  Originally posted by stek View Post
                  You need at least 2 years technical experience before posting on this...
                  Can we also insist on 2 references confirming that experience?
                  merely at clientco for the entertainment

                  Comment


                    #10
                    Originally posted by stek View Post
                    You need at least 2 years technical experience before posting on this...
                    All good then

                    Originally posted by eek View Post
                    Can we also insist on 2 references confirming that experience?
                    If you must. Inbox me for references.
                    Knock first as I might be balancing my chakras.

                    Comment

                    Working...
                    X