
Anyone any good at regexp?


    #31
    Originally posted by suityou01
    And I don't feel like I've drip fed anything. My original question was, can someone give me some regexp to parse this specific string. No more, no less.
    Is there anything you're good at except getting canned?
    ǝןqqıʍ



      #32
      Originally posted by original PM
      Yeah I was a PM so have no technical knowledge at all...

      but when will it be completed? I have a box to tick.

      If you have any say in the format of the incoming string, I'd try and steer them into using a CSV format from which each field can be split by using the built-in C# CSV Parser or a 3rd party CSV parser.

      You may even be able to specify the delimiter character with those. (The default will be commas, but you may prefer colons.)

      Use of a proper CSV parser avoids issues with quoted delimiters, e.g. "a,b","c","d" which actually comprise only the three fields "a,b" and "c" and "d". It also even allows multi-line fields.

      The string parsing, however you do it, and the database handling are two different issues. So I would try and keep those conceptually separate.

      edit: I think CSV even supports two levels of delimiter. So you could parse something like "a=1,b=3,c=4" into a handy structure by specifying "," as the top-level delimiter and "=" as the next level delimiter. But that said, CSV means different things to different people and probably not all parser implementations support multi-level parsing.
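A minimal sketch of both ideas in JavaScript (the language used for the code later in this thread). This is not any particular library's API: `splitCsvLine` and `parsePairs` are hypothetical helper names, and the quoting rules are an assumption (double quotes around a field, with `""` as an escaped quote).

```javascript
// Hypothetical helper: split one line into fields, honouring double-quoted
// fields so that a delimiter inside quotes does not start a new field.
function splitCsvLine(line, delimiter = ',') {
    const fields = [];
    let current = '';
    let inQuotes = false;
    for (let i = 0; i < line.length; i++) {
        const ch = line[i];
        if (inQuotes) {
            if (ch === '"' && line[i + 1] === '"') {
                current += '"'; // doubled quote inside a quoted field
                i++;
            } else if (ch === '"') {
                inQuotes = false; // closing quote
            } else {
                current += ch;
            }
        } else if (ch === '"') {
            inQuotes = true; // opening quote
        } else if (ch === delimiter) {
            fields.push(current);
            current = '';
        } else {
            current += ch;
        }
    }
    fields.push(current);
    return fields;
}

// Hypothetical helper: two-level split, e.g. "a=1,b=3,c=4" -> { a: '1', ... }.
function parsePairs(text, outer = ',', inner = '=') {
    const result = {};
    for (const part of splitCsvLine(text, outer)) {
        const idx = part.indexOf(inner);
        if (idx === -1) continue; // skip parts with no inner delimiter
        result[part.slice(0, idx)] = part.slice(idx + 1);
    }
    return result;
}
```

With these, `splitCsvLine('"a,b","c","d"')` yields the three fields `a,b`, `c` and `d`, and `parsePairs('a=1,b=3,c=4')` yields an object with `a`, `b` and `c` as keys. A real CSV library would be the safer choice in production.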
      Last edited by OwlHoot; 30 July 2014, 15:11.
      Work in the public sector? Read the IR35 FAQ here



        #33
        Originally posted by darmstadt
        No you didn't
        All in post #4

        HTH
        Knock first as I might be balancing my chakras.



          #34
          Originally posted by OwlHoot
          If you have any say in the format of the incoming string, I'd try and steer them into using a CSV format from which each field can be split by using the built-in C# CSV Parser or a 3rd party CSV parser.

          You may even be able to specify the delimiter character with those. (The default will be commas, but you may prefer colons.)

          Use of a proper CSV parser avoids issues with quoted delimiters, e.g. "a,b","c","d" which actually comprise only the three fields "a,b" and "c" and "d". It also even allows multi-line fields.

          The string parsing, however you do it, and the database handling are two different issues. So I would try and keep those conceptually separate.
          Tricky one to sell as ClientCo want regex.

          I do take your point though.
          Knock first as I might be balancing my chakras.



            #35
            Can't you just split on ':' ?
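A rough sketch of what that looks like in JavaScript, using the sample input quoted later in this thread. Splitting on ':' alone leaves each label glued to the previous value, so the hypothetical `splitFields` helper below splits on spaces first and then on the first colon of each token. It assumes values never contain a space.

```javascript
const input = 'FolderGUID:67bfabff-ad78-4d30-918e-811dd2636f83 Outcome:Accept DataItem:SomeData';

// Naive split: values and the following label end up in the same element.
const parts = input.split(':');
// parts is:
// ['FolderGUID', '67bfabff-ad78-4d30-918e-811dd2636f83 Outcome',
//  'Accept DataItem', 'SomeData']

// Hypothetical helper: split on spaces, then on the first ':' of each token.
// Assumption: no value contains a space.
function splitFields(line) {
    const fields = {};
    for (const token of line.split(' ')) {
        const idx = token.indexOf(':');
        if (idx > 0) {
            fields[token.slice(0, idx)] = token.slice(idx + 1);
        }
    }
    return fields;
}
```

Here `splitFields(input)` gives an object with `FolderGUID`, `Outcome` and `DataItem` as keys, which is fine as a first step as long as the no-spaces assumption holds.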



              #36
              I just use StackOverflow for Regex questions. Agree about them being a pig though.

              Recently I've gotten into using regex-search in Visual C++... or even regex-find-and-replace. You can do pretty fancy stuff except MS use their own regex syntax (of course they do).
              Originally posted by MaryPoppins
              I'd still not breastfeed a nazi
              Originally posted by vetran
              Urine is quite nourishing



                #37
                I'm not even sure it's possible with a regex.

                Thisisalargestring: And here is some data up until the next space character Thisisanotherlargestring: And here is some more data

                "And here is some data up until the next space character" is full of spaces so how are you supposed to know that the last space is the one to stop at? Maybe because "Thisisanotherlargestring" is followed by a colon? But you don't want to include "Thisisanotherlargestring" in the match so you need to look-ahead. Now it's getting beyond my knowledge of regexes.
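For what it's worth, the look-ahead idea can be sketched in JavaScript roughly as follows. The function name `parseLabelledFields` is made up for illustration, and it assumes a label is a run of word characters followed by a colon and that no value itself contains a word followed by a colon.

```javascript
// Hypothetical helper: pull "label: value" pairs out of a single line.
// The value is matched lazily and stops just before the next "label:"
// (checked with a look-ahead, so the label is not consumed) or at the end.
function parseLabelledFields(text) {
    const pattern = /(\w+):\s*(.*?)(?=\s+\w+:|$)/g;
    const fields = {};
    for (const match of text.matchAll(pattern)) {
        fields[match[1]] = match[2];
    }
    return fields;
}
```

Given the example line above, this returns "And here is some data up until the next space character" for `Thisisalargestring` and "And here is some more data" for `Thisisanotherlargestring`. It falls over as soon as a value contains something like `foo:`, which is the underlying ambiguity in the format.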



                  #38
                  Originally posted by FiveTimes
                  Can't you just split on ':' ?
                  Exactly, even if that is just a first step to then using regexps for individual fields.

                  That's why I suggested using CSVs (which despite standing for "comma-separated values" can just as well use colons as field delimiters)
                  Work in the public sector? Read the IR35 FAQ here



                    #39
                    Originally posted by OwlHoot
                    Exactly, even if that is just a first step to then using regexps for individual fields.

                    That's why I suggested using CSVs (which despite standing for "comma-separated values" can just as well use colons as field delimiters)
                    And then ClientCo say, "Oh, suity, we want to retrieve this piece of data and it looks like:"

                    *12345*

                    I'm not sure why everyone is so hung up on my wanting to use regexp to parse unstructured data.

                    Thanks for all your input, it is appreciated.
                    Knock first as I might be balancing my chakras.



                      #40
                      You still haven't said how you want to parse it. Do you want to extract all uppercase letters? All hexadecimal digits that are odd? All non-space characters?

                      In default of any clear specification, I shall make the assumption that FolderGUID, Outcome, and DataItem are field labels, the field value follows the label after a colon, and fields are delimited by a single space or the end of the line.

                      If that is the case then, in JavaScript (because I can test that in the browser, and I'm not firing up a C# compiler for a no-brainer like this) the following regular expression:

                      Code:
                      /FolderGUID:([^ ]*) Outcome:([^ ]*) DataItem:(.*$)/
                      will give the following array when presented with the input you specified (which is repeated as the complete match at element 0):

                      Code:
                      [
                          "FolderGUID:67bfabff-ad78-4d30-918e-811dd2636f83 Outcome:Accept DataItem:SomeData",
                          "67bfabff-ad78-4d30-918e-811dd2636f83",
                          "Accept",
                          "SomeData"
                      ]
                      To generalise it into a function that takes the input as a (single line) string and returns an object with the extracted values as named properties:

                      Code:
                      function parseLineInWhatOneAssumesToBeTheRequiredWay(line) {
                          var pieces = line.match(/FolderGUID:([^ ]*) Outcome:([^ ]*) DataItem:(.*$)/);
                          if (pieces) {
                              return {
                                  folderGUID: pieces[1],
                                  outcome: pieces[2],
                                  dataItem: pieces[3]
                              };
                          }
                          return null;
                      }
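A possible variant, sketched under the same assumptions about the input format, uses named capture groups so the pattern documents itself. This needs an engine with ES2018 regex support, and `parseLineNamed` is just an illustrative name.

```javascript
// Hypothetical variant of the function above using named capture groups.
// Returns a plain object with the three values, or null if the line
// does not match the assumed "FolderGUID:... Outcome:... DataItem:..." shape.
function parseLineNamed(line) {
    const match = line.match(
        /^FolderGUID:(?<folderGUID>[^ ]*) Outcome:(?<outcome>[^ ]*) DataItem:(?<dataItem>.*)$/
    );
    // Copy the groups into an ordinary object (match.groups has no prototype).
    return match ? { ...match.groups } : null;
}
```

Unlike the version above, this one anchors the pattern at the start of the line with `^`, so a line with leading junk returns null rather than a partial match.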
                      Last edited by NickFitz; 30 July 2014, 16:42. Reason: Oops, wrong field names :-)
