Reply to: RegExp conundrum

Previously on "RegExp conundrum"

  • suityou01
    replied
    Originally posted by DimPrawn
    Clearly it doesn't as you are only getting one group back.

    The | is an or, which means one or the other. In this case it doesn't mean both.

    Can't you move on (or around the problem) rather than always banging your head against the wall for day after day?
    Why do you always have to post something contrary? Is it living in Swindon? Not happy at your work? Try and ease up a bit mate, you'll grow old and bitter.

    I might be getting my terminology wrong as I know bugger all about regex. I am reading up about it, and also have a solution for the client which I have implemented. The thing is, the regex SHOULD return two matches and doesn't.

    This is the code I am using

    private void btnParse_Click(object sender, RoutedEventArgs e)
    {
        Regex regEx = new Regex(txtRegExp.Text);
        string input = txtInputString.Text;
        const string separator = "***********************************************************************\n";

        txtCaptures.Text += separator;

        // Test the Text property; ToString() on the control is not the same as its text.
        if (regEx.IsMatch(input))
        {
            string[] groupNames = regEx.GetGroupNames();

            // One Match per non-overlapping hit; each Match carries its own groups.
            foreach (Match m in regEx.Matches(input))
            {
                foreach (string s in groupNames)
                {
                    Group g = m.Groups[s];
                    if (g.Success)
                    {
                        txtCaptures.Text += "[" + s + "](" + g.Value + ")\n";
                    }
                }
            }
        }
        txtCaptures.Text += separator;
    }
    If I use the following string

    0ABC123456 ABC123456

    and the following Regex

    ^.*\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))

    I get the following output

    ***********************************************************************
    [0](0ABC123456 ABC123456)
    [1](ABC123456)
    [Reference](ABC123456)
    ***********************************************************************
    So you see, one match in the [Reference] group, capture, match or whatever the correct flipping name is. One match. That is all.

    If however I use the following regex

    ^.*\s(?<Reference>(0[a-zA-Z]{3}\d{6}))

    I get the following output

    ***********************************************************************
    [0]( 0ABC123456)
    [1](0ABC123456)
    [Reference](0ABC123456)
    ***********************************************************************
    So both sides of the or actually match. And if we take DP's explanation, which is that a logical or would only return one result, then all well and good.

    However, in the first scenario, why did it return the second string and not the first? What gave the second pattern priority? It parses from left to right by default.

    So what happens if we try the one character pattern I mentioned before

    ^.*(?<Reference>(A|C))

    The output is

    ***********************************************************************
    [0]( 0ABC123456 ABC)
    [1](C)
    [Reference](C)
    ***********************************************************************
    So why is it ignoring the first A?

    If I make the search lazy

    ^.*?(?<Reference>(A|C))

    It returns

    ***********************************************************************
    [0]( 0A)
    [1](A)
    [Reference](A)
    ***********************************************************************

  • Jaws
    replied
    In fact the multiple groups are not even required. It appears your problems lie in the fact you start your expression with the ^ indicating the start of the line / string. The rest of it just says match the following one time only.
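
A minimal sketch of the point about `^`, assuming the default options (no `RegexOptions.Multiline`): an anchored pattern can only match at the start of the string, so `Matches` finds at most one hit, while the same group without the `^.*\s` prefix is found twice. `AnchorDemo` and `CountMatches` are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Counts how many separate matches the pattern finds in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string input = "0ABC123456 ABC123456";
        string anchored = @"^.*\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))";
        string unanchored = @"(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))";

        Console.WriteLine(CountMatches(anchored, input));   // 1
        Console.WriteLine(CountMatches(unanchored, input)); // 2
    }
}
```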

  • Jaws
    replied
    Have you ever actually read about .NET regex? http://www.regular-expressions.info/named.html

    The following worked for me:

    Code:
        Regex r = new Regex(@"^.*\s(?<Reference>0[a-zA-Z]{3}\d{6})|(?<Reference>[a-zA-Z]{3}\d{3,7})", RegexOptions.Multiline);
        MatchCollection m = r.Matches("sdssd 0ABC123456" + Environment.NewLine + "asds ABC123456");
    VS.NET immediate window output:

    m[0].Groups["Reference"].Captures[0]
    {0ABC123456}
    [System.Text.RegularExpressions.Group]: {0ABC123456}
    Index: 6
    Length: 10
    Value: "0ABC123456"
    m[1].Groups["Reference"].Captures[0]
    {ABC123456}
    [System.Text.RegularExpressions.Group]: {ABC123456}
    Index: 23
    Length: 9
    Value: "ABC123456"
    Last edited by Jaws; 27 August 2009, 19:53. Reason: Code correction...
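
The snippet above can be wrapped into a runnable console program roughly as follows. This is a sketch, not Jaws's exact code: the class and method names are invented, and the line break is hard-coded as `"\r\n"` (an assumption) so the result does not depend on the platform's `Environment.NewLine`. Note that `|` has the lowest precedence, so `^.*\s` belongs only to the first alternative; the second alternative is unanchored and can match anywhere.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class TwoLineDemo
{
    // Returns the "Reference" value of every match found in the input.
    public static string[] References(string input)
    {
        Regex r = new Regex(
            @"^.*\s(?<Reference>0[a-zA-Z]{3}\d{6})|(?<Reference>[a-zA-Z]{3}\d{3,7})",
            RegexOptions.Multiline);

        return r.Matches(input)
                .Cast<Match>()
                .Select(m => m.Groups["Reference"].Value)
                .ToArray();
    }

    public static void Main()
    {
        string input = "sdssd 0ABC123456" + "\r\n" + "asds ABC123456";

        foreach (string reference in References(input))
        {
            Console.WriteLine(reference); // 0ABC123456, then ABC123456
        }
    }
}
```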

  • DimPrawn
    replied
    Originally posted by suityou01
    (?<Reference>(C|G))

    returns both C and G as separate groups.

    So it is possible to return more than one using |
    Clearly it doesn't as you are only getting one group back.

    The | is an or, which means one or the other. In this case it doesn't mean both.

    Can't you move on (or around the problem) rather than always banging your head against the wall for day after day?
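
The two views in this exchange can perhaps be reconciled with a small sketch. Within a single match, `|` picks exactly one branch, so the group holds one capture; getting both `C` and `G` requires either two separate matches or a repeated group, whose `Captures` collection in .NET keeps every iteration. The input `"CG"` and the names `AlternationDemo` and `CapturesPerMatch` are assumptions made for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // For each match, reports how many captures the "Reference" group holds.
    public static int[] CapturesPerMatch(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Groups["Reference"].Captures.Count)
                    .ToArray();
    }

    public static void Main()
    {
        // Two separate matches ("C", then "G"), one capture each.
        Console.WriteLine(string.Join(",", CapturesPerMatch(@"(?<Reference>(C|G))", "CG"))); // 1,1

        // One match ("CG"); the repeated group records both captures.
        Console.WriteLine(string.Join(",", CapturesPerMatch(@"(?<Reference>C|G)+", "CG")));  // 2
    }
}
```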

  • suityou01
    replied
    Originally posted by DaveB
    You are matching for exactly 7 digits in the first check, but there are only 6 in the strings you are testing? If that's correct then only the second check (between three and 7) will match and you only get one result.
    Sorry, I didn't mean that. I was typing at 100 miles an hour. The regex and source data match independently but not when or'ed.

  • DaveB
    replied
    Originally posted by suityou01
    [a-z][A-Z]{3}\d{7}

    Means match 3 characters and 7 digits

    eg

    ABC123456 is a match

    [a-z][A-Z]{3}\d{3,7}

    Means match 3 characters followed by between 3 and 7 digits

    so

    ABC123456 is a match.

    The ?<Reference> stuff means return the matches into a group called Reference.

    So what I should get spat out is

    Reference ABC123456
    Reference ABC123456

    I only get one though

    You are matching for exactly 7 digits in the first check, but there are only 6 in the strings you are testing? If that's correct then only the second check (between three and 7) will match and you only get one result.
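
The digit-count point can be checked with a short sketch. It uses `[a-zA-Z]{3}` as in the regex from the opening post (the quoted `[a-z][A-Z]{3}` would instead require one lowercase letter followed by three uppercase ones), and it anchors the pattern with `^` and `$`, which is an assumption added here so that only whole-string matches count. `QuantifierDemo` and `MatchesWhole` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // True when the whole input is three letters followed by the given digit pattern.
    public static bool MatchesWhole(string digitPart, string input)
    {
        return Regex.IsMatch(input, "^[a-zA-Z]{3}" + digitPart + "$");
    }

    public static void Main()
    {
        string input = "ABC123456"; // three letters, six digits

        Console.WriteLine(MatchesWhole(@"\d{7}", input));   // False: exactly 7 digits required
        Console.WriteLine(MatchesWhole(@"\d{3,7}", input)); // True: 6 falls within 3 to 7
        Console.WriteLine(MatchesWhole(@"\d{6}", input));   // True: exactly 6 digits
    }
}
```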

  • suityou01
    replied
    Originally posted by DaveB
    What's the description of what it should be doing? I think I get it from the regex but most of the trouble I've had from them has been in the translation from what I want to do to what I tell the regex to do.
    [a-z][A-Z]{3}\d{7}

    Means match 3 characters and 7 digits

    eg

    ABC123456 is a match

    [a-z][A-Z]{3}\d{3,7}

    Means match 3 characters followed by between 3 and 7 digits

    so

    ABC123456 is a match.

    The ?<Reference> stuff means return the matches into a group called Reference.

    So what I should get spat out is

    Reference ABC123456
    Reference ABC123456

    I only get one though

  • suityou01
    replied
    Originally posted by DimPrawn
    I think you are wrong (but I might be wrong too).

    I would expect that to return one match, whichever of the two regexes between the | matches the string first.

    This is because your named capture applies to both the expressions between the |

    HTH
    (?<Reference>(C|G))

    returns both C and G as separate groups.

    So it is possible to return more than one using |

  • DimPrawn
    replied
    Originally posted by suityou01
    It should return both matches.

    ABC123456 ABC123456

    So in .Net you would execute

    MatchCollection m = regexp.Matches(theString);

    And then iterate through the matches. There should be two matches, each containing a group called "Reference" that contains one of the above strings.
    I think you are wrong (but I might be wrong too).

    I would expect that to return one match, whichever of the two regexes between the | matches the string first.

    This is because your named capture applies to both the expressions between the |

    HTH

  • DaveB
    replied
    Originally posted by suityou01
    It should return both matches.

    ABC123456 ABC123456

    So in .Net you would execute

    MatchCollection m = regexp.Matches(theString);

    And then iterate through the matches. There should be two matches, each containing a group called "Reference" that contains one of the above strings.
    I meant the meaning of the actual regex, written out in longhand.

    Edit - having a brain fart and misreading / misremembering regex structure. It has been a while.
    Last edited by DaveB; 27 August 2009, 14:17.

  • suityou01
    replied
    Originally posted by DaveB
    What's the description of what it should be doing? I think I get it from the regex but most of the trouble I've had from them has been in the translation from what I want to do to what I tell the regex to do.
    It should return both matches.

    ABC123456 ABC123456

    So in .Net you would execute

    MatchCollection m = regexp.Matches(theString);

    And then iterate through the matches. There should be two matches, each containing a group called "Reference" that contains one of the above strings.
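
The iteration described here might look like the sketch below. It assumes the pattern is used without the `^.*\s` prefix from the opening post, since the anchored form can match only once per string; `IterateDemo` and `Describe` are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class IterateDemo
{
    // Lists "index: value" for the Reference group of every match in the input.
    public static List<string> Describe(string input)
    {
        var regex = new Regex(@"(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))");
        var lines = new List<string>();

        foreach (Match m in regex.Matches(input))
        {
            Group g = m.Groups["Reference"];
            lines.Add(g.Index + ": " + g.Value);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in Describe("ABC123456 ABC123456"))
        {
            Console.WriteLine(line); // "0: ABC123456", then "10: ABC123456"
        }
    }
}
```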

  • DaveB
    replied
    Originally posted by suityou01
    Driving me nuts this.

    Consider the following string

    0ABC123456 ABC123456

    The following regexp is used through .Net regexp

    ^.*\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))

    Which in theory should return two matches as both patterns return a match, and they are "|" separated.

    It only returns the last one.

    If I make the match lazy, it returns only the first one as expected.

    eg

    ^.*?\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))

    Any regexp buffs out there?
    What's the description of what it should be doing? I think I get it from the regex but most of the trouble I've had from them has been in the translation from what I want to do to what I tell the regex to do.

  • suityou01
    started a topic RegExp conundrum

    RegExp conundrum

    Driving me nuts this.

    Consider the following string

    0ABC123456 ABC123456

    The following regexp is used through .Net regexp

    ^.*\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))
    Which in theory should return two matches as both patterns return a match, and they are "|" separated.

    It only returns the last one.

    If I make the match lazy, it returns only the first one as expected.

    eg

    ^.*?\s(?<Reference>(0[a-zA-Z]{3}\d{6}|[a-zA-Z]{3}\d{3,7}))
    Any regexp buffs out there?