Previously on "Storing 4M+ json snippets"

  • unixman
    replied
    It depends on what you want to do with them, but Linux (if that is your platform) will have no trouble storing 4 million JSON files in a single directory. It is better to split them across multiple directories though: that is easier to handle and gives a little flexibility with regard to backups etc.

    As for the 12 million images, the same applies at the platform level, except that the storage requirement will be large, e.g. 12 TB if each image is 1 MB in size.
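
    A minimal sketch of the multiple-directories idea in Python, assuming each snippet has a unique string ID. The names shard_path, save_snippet and load_snippet are made up for illustration, and hashing the ID is only one way of spreading files evenly. Two levels of two hex characters give 65,536 directories, so 4 million files average about 61 per directory.

```python
import hashlib
import json
from pathlib import Path


def shard_path(root: Path, snippet_id: str, levels: int = 2) -> Path:
    """Return root/ab/cd/<snippet_id>.json, where ab and cd come from a hash of the ID."""
    digest = hashlib.sha256(snippet_id.encode("utf-8")).hexdigest()
    # Take two hex characters per directory level: 256 choices per level.
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return root.joinpath(*parts, f"{snippet_id}.json")


def save_snippet(root: Path, snippet_id: str, snippet: dict) -> Path:
    """Write one snippet to its shard directory, creating the directory if needed."""
    path = shard_path(root, snippet_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(snippet), encoding="utf-8")
    return path


def load_snippet(root: Path, snippet_id: str) -> dict:
    """Read a snippet back; the path is recomputed from the ID, so no index is needed."""
    return json.loads(shard_path(root, snippet_id).read_text(encoding="utf-8"))
```

    Because the path is derived from the ID, retrieval needs no directory listing, and each top-level directory can be backed up on its own.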


  • darrylmg
    replied
    We don't know how your app works.
    If you access the image first and then go looking for the related json, then you could embed the json in the image and save doing the subsequent lookup.
    Google for "steganography".


  • OwlHoot
    replied
    If you want to pack them away in the database pronto, but aren't too bothered about retrieval speed, then Cassandra would be a good choice.

    Edit: It's free (open source) and by now fairly mature


  • BigDataPro
    replied
    Move on from the old-school options and consider using:

    - Azure Data Lake Storage (Gen2)
    - AWS S3
    - GCP Cloud Storage / Filestore.

    If you want a copy of your data kept without your permission, and without being charged for it, then go for Alibaba Cloud.
    Last edited by BigDataPro; 23 September 2020, 15:59.


  • Hobosapien
    replied
    Azure Table Storage for the JSON snippets.
    Azure Blob container for the images.

    Link the JSON to the images using columns in the Azure table (one column for each image blob key).

    Use Python via Azure Functions to manipulate the data if you want 'serverless', i.e. M$ handle the infrastructure, availability and backup. Regular syncing to a local or alternative cloud backup is still a good idea though.

    Not sure how much it may cost, so put your estimates into the Azure pricing calculator to get an idea.

    Sorted.


  • TheGreenBastard
    replied
    Postgres + JSONB


  • minestrone
    replied
    I've done this exact same thing with cloud storage, both on Azure and S3.

    About 600k folders with about 10 images and 15 json files in each.


  • minestrone
    replied
    Hi SAS


  • _V_
    replied
    I would vote for MongoDB in this use case. MongoDB stores documents natively as BSON, so you can parse each JSON snippet and store it as a queryable document.

    Of course, if you just want to store them as strings you can choose pretty much any SQL database you like. One option, if you don't want a server-based DB, is to insert them into a local SQLite database.

    SQLite Home Page
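
    A minimal sketch of the SQLite option using Python's built-in sqlite3 module, storing each snippet as a JSON string keyed by an ID. The table name, column names and helper functions are assumptions for illustration, not anything from the original script.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure the snippets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_snippets(conn: sqlite3.Connection, items) -> None:
    """Insert or replace many (id, dict) pairs in a single transaction."""
    rows = ((snippet_id, json.dumps(snippet)) for snippet_id, snippet in items)
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO snippets (id, body) VALUES (?, ?)", rows
        )


def load_snippet(conn: sqlite3.Connection, snippet_id: str):
    """Return the snippet as a dict, or None if the ID is not stored."""
    row = conn.execute(
        "SELECT body FROM snippets WHERE id = ?", (snippet_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

    Batching the inserts into one transaction matters at 4M+ rows, since committing after every row is far slower. Pass a file path instead of ":memory:" to keep the data on disk.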


  • eek
    replied
    Your bonus question points to the only sane answer to your main question, but you haven't got there yet.


  • anim
    started a topic Storing 4M+ json snippets

    Hi all,

    I have a script that generates 4 million+ snippets of JSON.
    What would you agree is the best way to store them?

    I have considered:
    - JSON files - way too many, even if split into subfolders
    - MongoDB
    - MySQL with a JSON type field

    The aim is to be able to retrieve them easily for processing later.
    The language is Python.

    Bonus question: each of these 4M+ JSON snippets will have 10-15 images related to it. Where and how would you store the images?
