It depends what you want to do with them, but Linux (if that is your platform) will have no trouble storing 4 million JSON files in a single directory. It's better to break them into multiple directories though: they're easier to handle and it gives a little flexibility with regard to backups etc.
As for the 12 million images, the same applies at the platform level, except the storage requirement will be large, e.g. 12 TB if each image is 1 MB in size.
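One way to break the files into multiple directories is to derive the subfolder from a hash of each snippet's key. The following is only a rough sketch: the function names, the `.json` suffix and the choice of two levels of two hex characters are assumptions, not anything from the original script. With that layout there are 65,536 leaf directories, so 4 million files work out at roughly 61 per directory.

```python
import hashlib
import json
from pathlib import Path


def shard_path(root: Path, key: str, levels: int = 2, width: int = 2) -> Path:
    """Map a snippet key to root/ab/cd/<key>.json using a hash prefix."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    # Take `levels` slices of `width` hex characters as nested directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, f"{key}.json")


def write_snippet(root: Path, key: str, doc: dict) -> Path:
    """Write one JSON snippet, creating its shard directory if needed."""
    path = shard_path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(doc), encoding="utf-8")
    return path


def read_snippet(root: Path, key: str) -> dict:
    """Read a snippet back; the path is recomputed from the key alone."""
    return json.loads(shard_path(root, key).read_text(encoding="utf-8"))
```

Because the path is computed from the key, no index is needed to find a file later, and the same scheme could be reused for the image files.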
Reply to: Storing 4M+ json snippets
Previously on "Storing 4M+ json snippets"
-
Originally posted by anim:
Hi all,
I have a script that generates 4 million + snippets of json code.
What would you agree on as being the best way to store them?
I have considered:
- json files - way too many even if split in subfolders
- mongo DB
- mySQL with json type field
The aim is to be able to easily retrieve them for processing later.
Language is python.
Bonus question: each of these 4m+ json will have 10-15 images related to it. Where and how do you store the images?

We don't know how your app works.
If you access the image first and then go looking for the related json, then you could embed the json in the image and save doing the subsequent lookup.
Google for "steganography".
-
If you want to pack them away in the database pronto, but aren't too bothered about retrieval speed, then Cassandra would be a good choice.
Edit: It's free (open source) and by now fairly mature
-
Come out of old school and consider using
- Azure Data Lake Storage (Gen2)
- AWS S3
- GCP Cloud Storage / Filestore.
If you want someone to keep a copy without your permission and without charging you, then go for Alibaba Cloud.
Last edited by BigDataPro; 23 September 2020, 15:59.
-
Azure Table Storage for the JSON snippets.
Azure Blob container for the images.
Link the JSON to the images using columns in the Azure table (one column for each image blob key).
Use Python via Azure Functions to manipulate the data if you want 'serverless', i.e. M$ handles the infrastructure, availability and backup. Regular syncing to a local or alternative cloud backup is still a good idea, though.
Not sure how much it may cost, so use the Azure pricing calculator with your estimates to get an idea.
Sorted.
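To make the "one column for each image blob key" idea concrete, here is a rough sketch of what a single table row might look like as a plain Python dict. `PartitionKey` and `RowKey` are the two key properties Azure Table Storage requires on every entity. Everything else (the `Body` and `ImageNN` column names, the modulo partitioning, the function name) is an assumption for illustration, and no Azure SDK calls are shown.

```python
import json


def build_entity(snippet_id: int, doc: dict, image_keys: list, partitions: int = 1000) -> dict:
    """Build one table row linking a JSON snippet to its image blob keys."""
    entity = {
        # Spread rows over a fixed number of partitions (hypothetical scheme).
        "PartitionKey": f"{snippet_id % partitions:04d}",
        "RowKey": str(snippet_id),
        # The snippet itself, stored as a JSON string.
        "Body": json.dumps(doc),
    }
    # One column per related image, holding that image's blob key.
    for i, key in enumerate(image_keys):
        entity[f"Image{i:02d}"] = key
    return entity
```

A row built this way would then be handed to whichever table client you use. Check the per-property and per-entity size limits before storing large snippets this way.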
-
I've done this exact same thing with cloud storage, both on Azure and S3.
About 600k folders with about 10 images and 15 json files in each.
-
I would vote for MongoDB in this use case. Because BSON (binary JSON) is the native format of documents stored in MongoDB, you can parse each snippet and store it as a queryable document.
Of course if you just want to store them as strings you can choose pretty much any SQL database you like. One option if you don't want a server based DB is to insert them into a local SQLite database.
SQLite Home Page
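For the SQLite route, a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table and column names, the file name `snippets.db` and the idea of keeping image paths in a second table are all assumptions, not something from the original script.

```python
import json
import sqlite3


def open_store(path: str = "snippets.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "snippet_id INTEGER NOT NULL REFERENCES snippets(id), path TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS images_by_snippet ON images(snippet_id)")
    return conn


def save_snippets(conn: sqlite3.Connection, records) -> None:
    """Store (snippet_id, dict, [image paths]) tuples in one transaction."""
    with conn:  # commits on success, rolls back on error
        for snippet_id, doc, image_paths in records:
            conn.execute(
                "INSERT INTO snippets (id, body) VALUES (?, ?)",
                (snippet_id, json.dumps(doc)),
            )
            conn.executemany(
                "INSERT INTO images (snippet_id, path) VALUES (?, ?)",
                [(snippet_id, p) for p in image_paths],
            )


def load_snippet(conn: sqlite3.Connection, snippet_id: int):
    """Return (dict, [image paths]) for one snippet."""
    row = conn.execute(
        "SELECT body FROM snippets WHERE id = ?", (snippet_id,)
    ).fetchone()
    if row is None:
        raise KeyError(snippet_id)
    paths = [p for (p,) in conn.execute(
        "SELECT path FROM images WHERE snippet_id = ? ORDER BY rowid", (snippet_id,)
    )]
    return json.loads(row[0]), paths
```

Passing records in batches to `save_snippets` keeps each batch in a single transaction, which matters a lot for insert speed when there are millions of rows.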
-
Your bonus question gives you the only 1 sane answer to your question but you haven't got there yet.
-
Storing 4M+ json snippets
Hi all,
I have a script that generates 4 million + snippets of json code.
What would you agree on as being the best way to store them?
I have considered:
- json files - way too many even if split in subfolders
- mongo DB
- mySQL with json type field
The aim is to be able to easily retrieve them for processing later.
Language is python.
Bonus question: each of these 4m+ json will have 10-15 images related to it. Where and how do you store the images?