
Regular Expressions

Collapse
X
  •  
  • Filter
  • Time
  • Show
Clear All
new posts


    I'm looking for a way to match the city Lens in a variety of texts. It has to match " Lens ", " Lens,", " Lens." and "Lens'"

    Does anyone know the best way of doing this with a Java/Perl regular expression?

    Is this any good?

    String j="Lens";
    String pattern = " "+j+" | "+j+",| "+j+"\\.|"+j+"'";
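    A minimal Java sketch of the pattern above, compiled once and tried against a few made-up sample strings (the class and method names are hypothetical). It keeps the question's concatenation as-is, so the fourth alternative, "Lens'", still has no leading space.

```java
import java.util.regex.Pattern;

// Sketch: the pattern from the question, compiled once and reused.
// Class/method names and the sample strings are hypothetical.
final class LensPatternCheck {
    static final String J = "Lens";

    // Builds " Lens | Lens,| Lens\.|Lens'" : four alternatives. The Java literal "\\."
    // reaches the regex engine as \. which matches a literal full stop.
    static final Pattern LENS =
            Pattern.compile(" " + J + " | " + J + ",| " + J + "\\.|" + J + "'");

    // True if any of the four alternatives occurs anywhere in the text.
    static boolean containsLens(String text) {
        return LENS.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {"in Lens today", "near Lens, France", "to Lens.", "Lens' mayor", "Lenses", " Lens"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + containsLens(s));
        }
    }
}
```

    If the search term could ever contain regex metacharacters, wrapping it with Pattern.quote(j) before concatenating would be safer; for the literal "Lens" it makes no difference.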

    And to match a word followed by a full stop, does the full stop have to be escaped like this: \\. ?

    #2
    regex

    Off the top of my head, cos I can't be bothered to check it:
    you need java.util.regex, and I think the pattern is

    "[ ]*Lens[ ,\.\']+"


    I'd expect \. to match a full stop; \\. I would expect to match a backslash followed by any character.
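    For reference, a sketch of this reply's pattern as it would have to appear in Java source. Typed literally as "[ ]*Lens[ ,\.\']+", the \. is an illegal escape in a Java string and will not compile; inside a character class a dot is already literal, so the backslash can simply be dropped. The class name and samples are illustrative.

```java
import java.util.regex.Pattern;

// Sketch: the reply's pattern as a valid Java literal.
// Inside [...] a dot is already literal, and ' needs no escaping in a Java string.
final class CharClassLens {
    static final Pattern LENS = Pattern.compile("[ ]*Lens[ ,.']+");

    static boolean containsLens(String text) {
        return LENS.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLens("a Lens, b"));   // true
        System.out.println(containsLens("Lens' mayor")); // true
        // [ ]* also accepts zero spaces, so a letter before "Lens" does not prevent a match
        System.out.println(containsLens("XLens, x"));    // true
        System.out.println(containsLens("Lenses"));      // false
    }
}
```

    Note that because [ ]* can match nothing, this version does not actually require a leading space, which is what the next post asks about.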



      #3
      Re: regex

      Are the leading spaces in the first three cases mandatory (and not in the fourth)?



        #4
        Re: regex

        Can't be arsed to optimise, but the following Perl regex should work:

        # qr// keeps \. as an escaped dot; inside "..." Perl would turn \. into a bare . (any character)
        my $RegExp = qr/ Lens( |,|\.|')/;
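        A Java rendering of the same alternation, assuming (as the thread later confirms) that the leading space is required in all four cases. The capture group reports which character followed "Lens"; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: Java version of the Perl alternation, with the leading space
// required in all four cases. Names are hypothetical.
final class AlternationLens {
    // Group 1 captures the character after "Lens"; "\\." is a literal dot.
    static final Pattern LENS = Pattern.compile(" Lens( |,|\\.|')");

    // Returns the character that followed " Lens", or null if nothing matched.
    static String followingChar(String text) {
        Matcher m = LENS.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] samples = {" Lens ", " Lens,", " Lens.", " Lens'", " Lensx", "Lens,"};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + followingChar(s));
        }
    }
}
```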

        If it's performance-critical (it didn't sound like it), I'd do it in two stages:

        my $RegExp=" Lens.";

        The '.' will match any character, and on a match I'd then use a simple switch to check which character it is.
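        One way to sketch the two-stage idea in Java without the regex engine at all: String.indexOf finds " Lens", then a switch on the next character decides whether it counts. This is an illustration under the thread's stated requirements (leading space, then space, comma, full stop or apostrophe), not a benchmarked implementation; whether it beats a precompiled Pattern would need measuring. The names are hypothetical.

```java
// Sketch of the two-stage approach: stage 1 is a plain substring search,
// stage 2 is a switch on the following character. Names are hypothetical.
final class TwoStageLens {
    static final String NEEDLE = " Lens";

    // Returns the index of the first qualifying " Lens", or -1 if none.
    static int find(String text) {
        int from = 0;
        while (true) {
            int i = text.indexOf(NEEDLE, from);   // stage 1
            if (i < 0) {
                return -1;
            }
            int next = i + NEEDLE.length();
            if (next < text.length()) {
                switch (text.charAt(next)) {      // stage 2
                    case ' ':
                    case ',':
                    case '.':
                    case '\'':
                        return i;
                    default:
                        break;
                }
            }
            from = i + 1; // keep scanning after a rejected candidate such as " Lenses"
        }
    }

    public static void main(String[] args) {
        System.out.println(find("Trip to Lenses and Lens, France")); // 18
    }
}
```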

        Oh yeah, Perl's question about the leading space in the 4th case stands; I assumed the space is required in all cases.



          #5
          re

          Cheers, whats. Perl: the space should be at the start of the 4th as well.



            #6
            reex

            Reynolds

            Don't forget about start-of-line conditions, i.e.

            blah blah balh.
            Lens is a wonderful place.
            blah blah
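            A sketch of handling that case in Java, assuming "Lens" should count either after a space or at the very start of a line. Pattern.MULTILINE makes ^ match after every line terminator rather than only at the start of the whole input. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: accept "Lens" after a space OR at the start of any line.
// Without MULTILINE, ^ would only match at the start of the whole text.
final class LineStartLens {
    static final Pattern LENS =
            Pattern.compile("(?:^| )Lens[ ,.']", Pattern.MULTILINE);

    // Counts non-overlapping occurrences in the text.
    static int count(String text) {
        Matcher m = LENS.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "blah blah balh.\nLens is a wonderful place.\nblah blah";
        System.out.println(count(text)); // 1
    }
}
```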


            Of course, I forgot that Java will itself strip \\ down to \ in a string literal, so you were right about needing \\. to match a dot. Sorry, I haven't done any Java for a long time; I tend to use regex in awk.
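            A small sketch of how the two layers of escaping stack up in Java: the compiler processes the string literal first, then the regex engine sees whatever is left. The constant names are hypothetical; Pattern.matches and Pattern.quote are standard java.util.regex methods.

```java
import java.util.regex.Pattern;

// Sketch: Java string escaping happens first, regex escaping second.
// Constant names are hypothetical.
final class EscapeDemo {
    // Source "\\." -> string \. -> regex: a literal full stop
    static final String DOT_REGEX = "\\.";
    // Source "\\\\." -> string \\. -> regex: a backslash followed by any character
    static final String BACKSLASH_ANY_REGEX = "\\\\.";

    public static void main(String[] args) {
        System.out.println(DOT_REGEX.length());                          // 2
        System.out.println(Pattern.matches(DOT_REGEX, "."));             // true
        System.out.println(Pattern.matches(DOT_REGEX, "x"));             // false
        System.out.println(Pattern.matches(BACKSLASH_ANY_REGEX, "\\x")); // true
        // Pattern.quote avoids hand-escaping by wrapping the text in \Q...\E
        System.out.println(Pattern.matches(Pattern.quote("."), "."));    // true
    }
}
```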

            There is a good reference: look at the tutorials at
            www.javaregex.com, specifically tutorial 3.



              #7
              Re: reex

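              Code:
               use strict;
               use warnings;

               foreach my $s (" Lens ", " Lens,", " Lens.", " Lens'", " Lens\\", "Lens", "Lens ", " Lens" )
                 {
                 # print only the samples where " Lens" is followed by space, comma, full stop or apostrophe
                 if ($s =~ / Lens[ ,\.']/)
                    {
                    print "--$s--\n";
                    }
                 }

              Output:
              -- Lens --
              -- Lens,--
              -- Lens.--
              -- Lens'--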
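              A rough Java port of the Perl test above, using the same sample strings, for comparison. As in Perl's double quotes, a single backslash needs \\ in the Java literal. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Sketch: Java port of the Perl test loop, same samples.
final class LensLoop {
    static final Pattern LENS = Pattern.compile(" Lens[ ,.']");

    // Keeps the samples where " Lens" is followed by space, comma, full stop or apostrophe.
    static List<String> matching(String... samples) {
        List<String> out = new ArrayList<>();
        for (String s : samples) {
            if (LENS.matcher(s).find()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "\\" in the literal is a single backslash, as in Perl double quotes
        for (String s : matching(" Lens ", " Lens,", " Lens.", " Lens'", " Lens\\", "Lens", "Lens ", " Lens")) {
            System.out.println("--" + s + "--");
        }
    }
}
```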



                #8
                regex

                Ah, but if I do that in Java, it takes a lot longer. Basically, the scenario is this: I've got a web crawler looking for news, written in Java and running across multiple JVMs, and it needs to recognise certain terms to dump into the DB. Because it's multithreaded, the algorithm needs to be as tight as possible or it'll become a bottleneck.
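                A sketch of one common way to keep regex cost down in a multithreaded Java crawler: compile the Pattern once and share it, since the Pattern Javadoc states that instances are immutable and safe for concurrent use, while each call creates its own Matcher (Matcher instances are not thread-safe). The thread pool, sample pages and names are assumptions for illustration, not details of the poster's crawler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch for the multithreaded case. The pages and pool size are made up;
// a real crawler would feed fetched text into mentionsLens.
final class SharedPatternScan {
    // Compiled once and shared: Pattern is immutable and thread-safe.
    private static final Pattern TERM = Pattern.compile(" Lens[ ,.']");

    static boolean mentionsLens(String page) {
        Matcher m = TERM.matcher(page); // a fresh Matcher per call, never shared
        return m.find();
    }

    static List<Boolean> scanAll(List<String> pages, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String page : pages) {
                futures.add(pool.submit(() -> mentionsLens(page)));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                results.add(f.get()); // results stay in input order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> pages = Arrays.asList("News from Lens, France", "Camera lenses", "Match in Lens.");
        System.out.println(scanAll(pages, 2)); // [true, false, true]
    }
}
```

                The main saving here is avoiding a Pattern.compile per page or per thread; how much that matters against network and DB time would need profiling.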
