
Previously on "Removing an array from a JSON file"


  • TheDude
    replied
    [Image: regular_expressions.png]



  • NickFitz
    replied
    Originally posted by fulcon
    Might be your edit, but that's not valid JSON.

    As you can't have line breaks in a property, you should be able to treat the 'geometry' line as a single line.

    It's a large file, so you don't want to read it all into memory in one go; this does 5000 lines at a time:


    Code:
    $batchSize = 5000

    # Stream the file in batches rather than loading the whole thing into memory.
    Get-Content E:\bigFile.json -ReadCount $batchSize |
        ForEach-Object {
            # Blank out each "geometry" line, then drop the now-empty lines.
            $_ -replace "^\W*`"geometry`".+,$" |
                Where-Object { -not [String]::IsNullOrWhiteSpace($_) } |
                Add-Content E:\Temp\newFile.json
        }
    Works on a little test file I made. YMMV.
    It looks like the original file is GeoJSON, so the geometry property is an object and a simple regex removing single lines isn't going to work.
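    One way around that, purely as an untested sketch (the paths and the 5000-line batch size are carried over from fulcon's script, and the output would still need its trailing commas tidying up afterwards), would be to track brace depth so that every line from the "geometry" opener down to its matching closing brace is skipped:

    Code:
    # Untested sketch: skip whole multi-line "geometry" objects by counting braces.
    $inGeometry = $false
    $depth = 0

    Get-Content E:\bigFile.json -ReadCount 5000 |
        ForEach-Object {
            foreach ($line in $_) {
                if (-not $inGeometry -and $line -match '^\s*"geometry"\s*:') {
                    $inGeometry = $true
                    $depth = 0
                }
                if ($inGeometry) {
                    # Count { and } until the geometry object closes.
                    $depth += [regex]::Matches($line, '\{').Count
                    $depth -= [regex]::Matches($line, '\}').Count
                    if ($depth -le 0) { $inGeometry = $false }
                    continue   # drop every line belonging to the geometry object
                }
                $line | Add-Content E:\Temp\newFile.json
            }
        }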



  • fulcon
    replied
    Originally posted by SimonMac

    As eek will attest, I am not a code monkey. What type of script is that? Shell?
    It's Windows PowerShell. It may work on Mac or Linux if you have PowerShell installed, but I can't test that.
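    For anyone trying it cross-platform, the modern PowerShell (pwsh, version 6+) should in principle run the same snippet; a hypothetical invocation, assuming it has been saved as strip-geometry.ps1:

    Code:
    # Assumes PowerShell 7+ (pwsh) is installed; the script name is made up for illustration.
    pwsh -File ./strip-geometry.ps1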



  • eek
    replied
    Based on fulcon's regular expression.

    Open the file in Notepad++ and use the following search and replace:

    [Screenshot: Screenshot_117.png, the Notepad++ search and replace settings]

    Replace the *, with *} if there isn't a , at the end of the geometry line (but there should be).



  • SimonMac
    replied
    Originally posted by fulcon
    Might be your edit, but that's not valid JSON.

    As you can't have line breaks in a property, you should be able to treat the 'geometry' line as a single line.

    It's a large file, so you don't want to read it all into memory in one go; this does 5000 lines at a time:


    Code:
    $batchSize = 5000

    # Stream the file in batches rather than loading the whole thing into memory.
    Get-Content E:\bigFile.json -ReadCount $batchSize |
        ForEach-Object {
            # Blank out each "geometry" line, then drop the now-empty lines.
            $_ -replace "^\W*`"geometry`".+,$" |
                Where-Object { -not [String]::IsNullOrWhiteSpace($_) } |
                Add-Content E:\Temp\newFile.json
        }
    Works on a little test file I made. YMMV.
    As eek will attest, I am not a code monkey. What type of script is that? Shell?



  • fulcon
    replied
    Might be your edit, but that's not valid JSON.

    As you can't have line breaks in a property, you should be able to treat the 'geometry' line as a single line.

    It's a large file, so you don't want to read it all into memory in one go; this does 5000 lines at a time:


    Code:
    $batchSize = 5000

    # Stream the file in batches rather than loading the whole thing into memory.
    Get-Content E:\bigFile.json -ReadCount $batchSize |
        ForEach-Object {
            # Blank out each "geometry" line, then drop the now-empty lines.
            $_ -replace "^\W*`"geometry`".+,$" |
                Where-Object { -not [String]::IsNullOrWhiteSpace($_) } |
                Add-Content E:\Temp\newFile.json
        }
    Works on a little test file I made. YMMV.



  • eek
    replied
    Originally posted by SimonMac
    I have a huge JSON file (10m rows and still counting, as it hasn't fully opened yet); however, 99.9% of it is useless to me. There are about 2200 records in the following format:

    Code:
    {
        "type" : "FeatureCollection",
        "name" : "%filename%",
        "features" : [
            {
                "type" : "Feature",
                "geometry" : {=}
                "properties" : {=}
            },
            {
                "type" : "Feature",
                "geometry" : {=}
                "properties" : {=}
            },
            .
            .
            .
        ]
    }
    The problem is that the geometry array (if that is the right term) is a list of co-ordinates running anywhere from 7,000 to 21,000 lines that are useless to me, which means the overall file is 5 GB+.

    At the moment I am going through each record and manually deleting the array. Is there a way to automate it? I don't want to spend a day going through all 2200 records just to be given a new version at some point, meaning I would have to do it all again.
    delete NameOfJsonObject.type.features.geometry would work, but only if the whole thing is in memory.
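    In PowerShell terms, that in-memory version might look something like the sketch below; it is only realistic if the machine can hold and parse the whole 5 GB+ file, and the paths are just assumed from the earlier posts.

    Code:
    # Rough sketch: parse everything, drop each feature's geometry, write it back out.
    # Only workable if the entire file fits in memory.
    $doc = Get-Content E:\bigFile.json -Raw | ConvertFrom-Json
    foreach ($feature in $doc.features) {
        $feature.PSObject.Properties.Remove('geometry')
    }
    $doc | ConvertTo-Json -Depth 10 | Set-Content E:\Temp\newFile.json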



  • SimonMac
    started a topic Removing an array from a JSON file


    I have a huge JSON file (10m rows and still counting, as it hasn't fully opened yet); however, 99.9% of it is useless to me. There are about 2200 records in the following format:

    Code:
    {
        "type" : "FeatureCollection",
        "name" : "%filename%",
        "features" : [
            {
                "type" : "Feature",
                "geometry" : {=}
                "properties" : {=}
            },
            {
                "type" : "Feature",
                "geometry" : {=}
                "properties" : {=}
            },
            .
            .
            .
        ]
    }
    The problem is that the geometry array (if that is the right term) is a list of co-ordinates running anywhere from 7,000 to 21,000 lines that are useless to me, which means the overall file is 5 GB+.

    At the moment I am going through each record and manually deleting the array. Is there a way to automate it? I don't want to spend a day going through all 2200 records just to be given a new version at some point, meaning I would have to do it all again.
