RegExp conundrum


    Driving me nuts this.

    Consider the following string

    0ABC123456 ABC123456

    The following regexp is run through the .NET Regex class:

    ^.*\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))
    In theory this should return two matches, as both patterns match and they are separated by "|".

    It only returns the last one.

    If I make the match lazy, it returns only the first one as expected.

    e.g.

    ^.*?\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))
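A quick way to see what is going on is to run both patterns and collect every match. The sketch below uses Java's java.util.regex as a stand-in for the .NET engine (named groups, \s, \d and lazy quantifiers should behave the same way in both for these patterns); the class and method names are made up for illustration. Because both patterns start with ^, a match can only begin at the start of the string, so at most one match can ever come back, and the | inside the group cannot add a second. On the sample string exactly as posted there is only one whitespace character, so the lazy .*? stops at the same space as the greedy .* and gives the same single result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredPatternDemo {
    // The two patterns from the post, written as Java string literals.
    static final String GREEDY = "^.*\\s(?<Reference>(0[a-zA-Z]{3}\\d{6}|[a-zA-Z]{3}\\d{3,7}))";
    static final String LAZY = "^.*?\\s(?<Reference>(0[a-zA-Z]{3}\\d{6}|[a-zA-Z]{3}\\d{3,7}))";

    // Collect the "Reference" group of every match, scanning left to right.
    static List<String> references(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> refs = new ArrayList<>();
        while (m.find()) {
            refs.add(m.group("Reference"));
        }
        return refs;
    }

    public static void main(String[] args) {
        String input = "0ABC123456 ABC123456";
        // ^ only matches at the start of the input (no MULTILINE flag),
        // so each pattern can produce at most one match.
        System.out.println(references(GREEDY, input)); // [ABC123456]
        System.out.println(references(LAZY, input));   // [ABC123456]
    }
}
```

If the real input has more than one space before the references, the two variants differ only in which space the prefix stops at; neither can return two matches.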
    Any regexp buffs out there?
    Knock first as I might be balancing my chakras.

    #2
    What's the description of what it should be doing? I think I get it from the regex, but most of the trouble I've had with them has been in translating what I want to do into what I tell the regex to do.
    "Being nice costs nothing and sometimes gets you extra bacon" - Pondlife.

      #3
      Originally posted by DaveB
      What's the description of what it should be doing? I think I get it from the regex, but most of the trouble I've had with them has been in translating what I want to do into what I tell the regex to do.
      It should return both matches.

      ABC123456 ABC123456

      So in .NET you would execute

      MatchCollection m = regexp.Matches(theString);

      And then iterate through the matches. There should be two matches, each containing a group called "Reference" that contains one of the above strings.
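One way to actually get both references, sketched under assumptions: drop the ^.*\s prefix (it forces every match to start at the beginning of the string and to swallow a space) and let the engine scan for each token on its own. The \b word boundaries are an added assumption so that a reference must be a whole token; the class and method names are hypothetical, and Java's find() loop stands in for iterating a .NET MatchCollection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnanchoredMatchesDemo {
    // Same two alternatives as the original, without the "^.*\s" prefix.
    // The \b boundaries (an assumption) keep each reference a whole token.
    static final Pattern REFERENCE = Pattern.compile(
            "\\b(?<Reference>0[a-zA-Z]{3}\\d{6}|[a-zA-Z]{3}\\d{3,7})\\b");

    // Each find() call resumes scanning just after the previous match,
    // so every token in the string gets its own match.
    static List<String> references(String input) {
        Matcher m = REFERENCE.matcher(input);
        List<String> refs = new ArrayList<>();
        while (m.find()) {
            refs.add(m.group("Reference"));
        }
        return refs;
    }

    public static void main(String[] args) {
        System.out.println(references("0ABC123456 ABC123456")); // [0ABC123456, ABC123456]
    }
}
```

In .NET the equivalent loop would be a foreach over regexp.Matches(theString), reading match.Groups["Reference"].Value for each Match.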

        #4
        Originally posted by suityou01
        It should return both matches.

        ABC123456 ABC123456

        So in .NET you would execute

        MatchCollection m = regexp.Matches(theString);

        And then iterate through the matches. There should be two matches, each containing a group called "Reference" that contains one of the above strings.
        I meant the meaning of the actual regex, written out in longhand.

        Edit: having a brain fart and misreading/misremembering regex structure. It has been a while.
        Last edited by DaveB; 27 August 2009, 14:17.

          #5
          Originally posted by suityou01
          It should return both matches.

          ABC123456 ABC123456

          So in .NET you would execute

          MatchCollection m = regexp.Matches(theString);

          And then iterate through the matches. There should be two matches, each containing a group called "Reference" that contains one of the above strings.
          I think you are wrong (but I might be wrong too).

          I would expect that to return one match, whichever of the two regexes between the | matches the string first.

          This is because your named capture applies to both the expressions between the |
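This is easy to check with a smaller pattern. At any one starting position the engine tries the alternatives left to right and keeps the first one that succeeds, so a single match never contains both; further results only appear because the engine keeps scanning the rest of the string for new matches. The sketch uses Java's engine as a stand-in for .NET (both are backtracking engines with this ordered alternation); the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {
    // Return the full text of every match, scanning left to right.
    static List<String> allMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // First alternative wins at index 0, then scanning finds "456" at index 3.
        System.out.println(allMatches("\\d{3}|\\d{6}", "123456")); // [123, 456]
        // Swap the order and the longer alternative wins, leaving nothing to scan.
        System.out.println(allMatches("\\d{6}|\\d{3}", "123456")); // [123456]
    }
}
```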

          HTH

            #6
            Originally posted by DimPrawn
            I think you are wrong (but I might be wrong too).

            I would expect that to return one match, whichever of the two regexes between the | matches the string first.

            This is because your named capture applies to both the expressions between the |

            HTH
            (?<Reference>(C|G))

            returns both C and G as separate groups.

            So it is possible to return more than one using |.
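Whether this pattern gives one result or two depends on the input, not on the |. The sketch below runs it over a hypothetical input "CG" (the post doesn't say what input was used), again with Java's engine standing in for .NET and invented names. It reports two results, but they are two separate matches with one Reference capture each, found because the engine scans on after the first match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparateMatchesDemo {
    static final Pattern PATTERN = Pattern.compile("(?<Reference>(C|G))");

    // Describe each match: its number, where it starts, and what Reference captured.
    static List<String> describe(String input) {
        Matcher m = PATTERN.matcher(input);
        List<String> lines = new ArrayList<>();
        int n = 0;
        while (m.find()) {
            n++;
            lines.add("match " + n + " at index " + m.start() + ": Reference=" + m.group("Reference"));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints "match 1 at index 0: Reference=C" then "match 2 at index 1: Reference=G".
        describe("CG").forEach(System.out::println);
    }
}
```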

              #7
              Originally posted by DaveB
              What's the description of what it should be doing? I think I get it from the regex, but most of the trouble I've had with them has been in translating what I want to do into what I tell the regex to do.
              [a-z][A-Z]{3}\d{7}

              Means match 3 characters and 7 digits

              e.g.

              ABC123456 is a match

              [a-z][A-Z]{3}\d{3,7}

              Means match 3 characters followed by between 3 and 7 digits

              so

              ABC123456 is a match.

              The ?<Reference> stuff means return the matches into a group called Reference.

              So what I should get spat out is

              Reference ABC123456
              Reference ABC123456

              I only get one, though.
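The patterns as typed in this post are not quite the alternatives in the original regex: [a-z][A-Z]{3} asks for one lowercase letter followed by three uppercase letters, and \d{7} asks for exactly seven digits. A small check (Java's engine standing in for .NET; the class, method and constant names are illustrative) shows that the typed version rejects ABC123456, while each alternative from the original regex, and the combined alternation, accepts its token when tested on its own. Pattern.matches requires the whole input string to match the regex.

```java
import java.util.regex.Pattern;

public class AlternativeCheckDemo {
    // The alternation from the original regex, without the ^.*\s prefix.
    static final String COMBINED = "0[a-zA-Z]{3}\\d{6}|[a-zA-Z]{3}\\d{3,7}";

    // True only if the entire input matches the regex.
    static boolean whole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // As typed above: one lowercase letter, three uppercase, exactly seven digits.
        System.out.println(whole("[a-z][A-Z]{3}\\d{7}", "ABC123456")); // false
        // Each alternative from the original regex, tested on its own token.
        System.out.println(whole("0[a-zA-Z]{3}\\d{6}", "0ABC123456")); // true
        System.out.println(whole("[a-zA-Z]{3}\\d{3,7}", "ABC123456")); // true
        // The OR'ed form also accepts each token individually.
        System.out.println(whole(COMBINED, "0ABC123456")); // true
        System.out.println(whole(COMBINED, "ABC123456"));  // true
    }
}
```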

                #8
                You are matching for exactly 7 digits in the first check, but there are only 6 in the strings you are testing? If that's correct, then only the second check (between 3 and 7 digits) will match and you only get one result.

                  #9
                  Originally posted by DaveB
                  You are matching for exactly 7 digits in the first check, but there are only 6 in the strings you are testing? If that's correct, then only the second check (between 3 and 7 digits) will match and you only get one result.
                  Sorry, I didn't mean that. I was typing at 100 miles an hour. The regex and source data match independently, but not when OR'ed.

                    #10
                    Originally posted by suityou01
                    (?<Reference>(C|G))

                    returns both C and G as separate groups.

                    So it is possible to return more than one using |
                    Clearly it doesn't, as you are only getting one group back.

                    The | is an or, which means one or the other. In this case it doesn't mean both.

                    Can't you move on (or work around the problem) rather than always banging your head against the wall day after day?
