
Removing an array from a JSON file


    I have a huge JSON file (10 million lines and still counting, as it hasn't finished opening yet), but 99.9% of it is useless to me. There are about 2,200 records in the following format:

    Code:
    {
    "type" : "FeatureCollection",
    "name" : "%filename%",
    "features" : [
    {
    "type" : "Feature",
    "geometry" : {=}
    "properties" : {=} 
    },
    {
    "type" : "Feature",
    "geometry" : {=}
    "properties" : {=}
    },
    .
    .
    .
    ]
    }
    The problem is that the geometry array (if that is the right term) is a list of co-ordinates running to anywhere from 7,000 to 21,000 lines, all useless to me, which makes the overall file 5 GB+.

    At the moment I am going through each record and manually deleting the array. Is there a way to automate it? I don't want to spend a day going through all 2,200 records, only to be given a new version at some point and have to do it all again.
    Originally posted by Stevie Wonder Boy
    I can't see any way to do it can you please advise?

    I want my account deleted and all of my information removed, I want to invoke my right to be forgotten.

    #2
    Something like delete NameOfJsonObject.features[i].geometry for each feature would work, but only if the whole file is in memory.
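
For anyone who would rather do the in-memory version in Python, here is a minimal sketch. The function name and the sample data are made up for illustration, and it assumes the whole file fits in RAM once parsed, which is doubtful for a 5 GB file.

```python
import json

def strip_geometry_in_memory(collection):
    """Remove the "geometry" member from every feature of a parsed FeatureCollection."""
    for feature in collection.get("features", []):
        feature.pop("geometry", None)  # no error if a feature has no geometry
    return collection

# Tiny made-up sample standing in for the real file (hypothetical data)
sample = """
{
  "type": "FeatureCollection",
  "name": "demo",
  "features": [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [0.0, 1.0]},
     "properties": {"id": 1}},
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [2.0, 3.0]},
     "properties": {"id": 2}}
  ]
}
"""

cleaned = strip_geometry_in_memory(json.loads(sample))
print(json.dumps(cleaned, indent=1))
```

With a real file you would use json.load on an open file handle instead of json.loads on a string, then json.dump the result to a new file.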
    merely at clientco for the entertainment



      #3
      Might be your edit but that's not valid JSON.

      As you can't have line breaks in a property, you should be able to treat the 'geometry' line as a single line.

      Because it's a large file, you don't want to read it all into memory in one go so this will do 5000 lines at a time:


      Code:
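      # Number of lines read from the file per batch
      $batchSize = 5000

      Get-Content E:\bigFile.json -ReadCount $batchSize |
          ForEach-Object {
              # Blank out each line that starts with "geometry" and ends with a comma,
              # drop the resulting empty lines, then append the batch to the new file
              $_ -replace "^\W*`"geometry`".+,$" |
                  Where-Object { -not [String]::IsNullOrWhiteSpace($_) } |
                  Add-Content E:\Temp\newFile.json
          }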
      Works on a little test file I made. YMMV.
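
If PowerShell isn't an option, the same line filter can be sketched in Python. This is only a rough equivalent under the same assumption as above, namely that each geometry value sits on a single line ending in a comma. The function name and sample text are hypothetical.

```python
import io
import json
import re

# Same pattern as the PowerShell version: a line whose first word characters
# are "geometry" (with quotes) and which ends with a comma
GEOMETRY_LINE = re.compile(r'^\W*"geometry".+,$')

def drop_geometry_lines(src, dst):
    """Copy src to dst one line at a time, skipping geometry and blank lines."""
    for line in src:
        text = line.rstrip("\r\n")
        if not text.strip() or GEOMETRY_LINE.match(text):
            continue
        dst.write(line)

# Made-up sample with the geometry on one line (hypothetical data)
sample = (
    '{\n'
    '"type" : "Feature",\n'
    '"geometry" : { "type" : "Point", "coordinates" : [ 0.0, 1.0 ] },\n'
    '"properties" : { "id" : 1 }\n'
    '}\n'
)

out = io.StringIO()
drop_geometry_lines(io.StringIO(sample), out)
filtered = out.getvalue()
print(filtered)
```

For the real file, pass two open file handles instead of the StringIO objects. Iterating over a file handle reads one line at a time, so memory use stays small.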



        #4
        As Eek will attest I am not a code monkey, what type of script is that? shell?


          #5
          Based on fulcon's regular expression.

          Open the file up in Notepad++ and use the following search and replace

          [Image: Screenshot_117.png]

          If there isn't a , at the end of the geometry line (there should be), replace the *, in the expression with *}.


            #6
            Originally posted by SimonMac View Post

            As Eek will attest I am not a code monkey, what type of script is that? shell?
            It's Windows PowerShell. May work on Mac or Linux if you have PowerShell installed but I can't test that.



              #7
              It looks like the original file is GeoJSON, so the geometry property is an object and a simple regex removing single lines isn't going to work.
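
One way round that without loading the whole file is to skip lines while counting brackets until the geometry object closes. A minimal Python sketch follows, with several assumptions that may not hold for the real file: the file is pretty-printed, each "geometry" key starts its own line with the opening bracket on that same line, geometry is never the last member of a feature (otherwise a dangling comma is left behind), and no string inside the geometry contains brackets. The function name and sample are made up.

```python
import io
import json

def strip_geometry(src, dst, key='"geometry"'):
    """Copy src to dst line by line, skipping each (multi-line) geometry value."""
    depth = 0         # bracket nesting inside the value being skipped
    skipping = False
    for line in src:
        if not skipping and line.lstrip().startswith(key):
            skipping = True
            depth = 0
            line = line.split(":", 1)[1]  # count brackets in the value part only
        if skipping:
            depth += line.count("{") + line.count("[")
            depth -= line.count("}") + line.count("]")
            if depth <= 0:
                skipping = False  # value closed on this line; drop it as well
            continue
        dst.write(line)

# Made-up sample in the same layout as the original post (hypothetical data)
sample = """{
"type" : "FeatureCollection",
"features" : [
{
"type" : "Feature",
"geometry" : {
"type" : "Polygon",
"coordinates" : [
[
[ 0.0, 1.0 ],
[ 2.0, 3.0 ]
]
]
},
"properties" : { "id" : 1 }
}
]
}
"""

out = io.StringIO()
strip_geometry(io.StringIO(sample), out)
result = json.loads(out.getvalue())
print(result)
```

For the real file, open the input and output with open(...) and pass the two handles in. Only one line is held in memory at a time. A proper streaming JSON parser would be more robust if the layout assumptions above turn out to be wrong.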



                #8
                [Image: regular_expressions.png]
