
Regular Expressions and Special Characters



    Does anyone know of an easy way to use regular expressions to replace special characters?

    I'm trying to compare data from two tables to see where there are discrepancies, but a lot of them contain special characters. The business have said that if the character is the same without any accent, then that's OK by them - so if one system says "n" and the other says "ñ" then they want that to match.

    So, I was hoping to do a regular expression replace on the text in both tables to replace "ñ" with "n" in both systems, and the same for a,e,i,o,u etc. etc.

    Apart from going through a character map, does anyone know of a regular expression syntax that I can use to do the replacement?
    Best Forum Advisor 2014
    Work in the public sector? You can read my FAQ here
    Click here to get 15% off your first year's IPSE membership

    #2
    Would it not be better to find all the accented letters, and then either replace those or add them to your regex?

    A simple regex will do that.



      #3
      What language are you using? The term you need to google is diacritics.



        #4
        Is this Oracle?

        Linguistic Sorting and String Searching



          #5
          If you're using (or could use) an up-to-date build of Perl, you can use Unicode::Normalize to do it for you.

          Unicode::Normalize - perldoc.perl.org

          In fact any language that explicitly supports Unicode should have a Normalization module available
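As a rough illustration of the normalization approach in a language other than Perl, here is a minimal Python sketch using the standard `unicodedata` module. The function name `strip_accents` is made up for this example, and it assumes that dropping combining marks after decomposition is what the business means by "the same without any accent".

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks (accents) removed."""
    # NFD decomposition splits "ñ" into "n" followed by U+0303 (combining tilde).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep base characters and drop every combining mark.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left into the usual precomposed form.
    return unicodedata.normalize("NFC", stripped)


print(strip_accents("señor"))  # senor
```

Letters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged, so they would still need an explicit mapping if they are expected in the data.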
          Last edited by DaveB; 24 July 2014, 08:49.
          "Being nice costs nothing and sometimes gets you extra bacon" - Pondlife.



            #6
            Originally posted by rashm2k
            Would it not be better to find all the accented letters, and then either replace those or add them to your regex?

            A simple regex will do that.
            That's one way to do it, but I was wondering if there was a standard way to do it - for example, Oracle has a [:punct:] character class that matches all punctuation, so it can all be removed in one replace.

            At the moment, I've gone looking for special characters, but since we are multi-multi-lingual, there are lots of characters that I could expect, and if I've missed one then I'll need to make more changes to add in extra characters.
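One way to avoid maintaining a hand-written list of accented characters is to decompose the text first and then use a single regex character class over the combining marks. The sketch below is in Python and only illustrates the idea; the names `match_key` and `discrepancies`, and the use of dictionaries to stand in for the two tables, are assumptions made for this example.

```python
import re
import unicodedata

# Combining Diacritical Marks block (U+0300 to U+036F).
# This covers the usual Latin accents, not every combining mark in Unicode.
COMBINING_MARKS = re.compile(r"[\u0300-\u036f]")


def match_key(value: str) -> str:
    """Decompose the value, then strip the accents with one regex replace."""
    return COMBINING_MARKS.sub("", unicodedata.normalize("NFD", value))


def discrepancies(system_a: dict, system_b: dict) -> list:
    """Return the keys in both systems whose values differ once accents are ignored."""
    shared_keys = sorted(system_a.keys() & system_b.keys())
    return [
        key
        for key in shared_keys
        if match_key(system_a[key]) != match_key(system_b[key])
    ]


# Hypothetical rows keyed by record id.
table_a = {1: "Muñoz", 2: "Zoë", 3: "Smith"}
table_b = {1: "Munoz", 2: "Zoe", 3: "Smyth"}
print(discrepancies(table_a, table_b))  # [3]
```

Because the regex targets the accent marks rather than the accented letters, a new "ñ" or "é" variant turning up in the data needs no code change, as long as it decomposes into a base letter plus a mark in that block.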


              #7
              Originally posted by mudskipper
              Thanks - there's some things in there that I might need to plug into my code.


                #8
                Originally posted by TheFaQQer
                Thanks
                We should have a button for that....



                  #9
                  Originally posted by mudskipper
                  We should have a button for that....
                  ... although a pint would be equally acceptable.



                    #10
                    Originally posted by mudskipper
                    ... although a pint would be equally acceptable.
                    It wasn't THAT helpful

                    Have a instead.