It depends on what you want to do with them, but Linux (if that is your platform) will have no trouble storing 4 million json files in a single directory. Better to break them into multiple directories though: they are easier to handle, and it gives a little flexibility with regard to backups etc.
As for the 12 million images, the same applies at the platform level, except the storage requirement will be large, e.g. 12 TB if each image is 1 MB in size.
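As a rough sketch of the "multiple directories" idea, one common approach is to derive the subdirectory from a hash of the snippet id. The function names, the two-level layout and the id format below are my own assumptions for illustration, not something from the original post. With two levels of two hex characters each you get 65,536 directories, so 4 million files works out to roughly 60 files per directory.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def shard_path(root, snippet_id):
    """Map an id to root/ab/cd/<id>.json using the first 4 hex chars of its SHA-256."""
    digest = hashlib.sha256(snippet_id.encode("utf-8")).hexdigest()
    return Path(root) / digest[:2] / digest[2:4] / f"{snippet_id}.json"


def save_snippet(root, snippet_id, data):
    """Write one snippet to its sharded location, creating directories as needed."""
    path = shard_path(root, snippet_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data), encoding="utf-8")
    return path


def load_snippet(root, snippet_id):
    """Read a snippet back; the path is recomputed from the id, no index needed."""
    return json.loads(shard_path(root, snippet_id).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Hypothetical id and payload, written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        save_snippet(tmp, "item-0000001", {"title": "example"})
        print(load_snippet(tmp, "item-0000001"))
```

Because the path is a pure function of the id, retrieval needs no lookup table, and a backup job can process one top-level directory at a time.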
Storing 4M+ json snippets
-
Originally posted by anim:
Hi all,
I have a script that generates 4 million+ snippets of json code.
What would you agree is the best way to store them?
I have considered:
- json files - way too many, even if split into subfolders
- MongoDB
- MySQL with a json type field
The aim is to be able to easily retrieve them for processing later.
Language is Python.
Bonus question: each of these 4M+ json snippets will have 10-15 images related to it. Where and how do you store the images?
If you access the image first and then go looking for the related json, then you could embed the json in the image and save doing the subsequent lookup.
Google for "steganography".
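For anyone curious what embedding the json in the image might look like, here is a minimal least-significant-bit sketch. It works on a plain bytearray standing in for raw pixel data; getting real pixel bytes in and out of an image file would need an imaging library, which is not shown. The function names and the 4-byte length prefix are my own assumptions. Note that this only survives lossless formats such as PNG or BMP; JPEG recompression would destroy the hidden bits.

```python
import json


def embed(pixels, payload):
    """Hide payload in the least significant bit of successive pixel bytes."""
    message = len(payload).to_bytes(4, "big") + payload  # 4-byte length prefix
    if len(message) * 8 > len(pixels):
        raise ValueError("payload too large for this image")
    out = bytearray(pixels)
    for i in range(len(message) * 8):
        bit = (message[i // 8] >> (7 - i % 8)) & 1  # most significant bit first
        out[i] = (out[i] & 0xFE) | bit
    return out


def _read_bytes(pixels, start_bit, count):
    """Rebuild count bytes from the low bits starting at start_bit."""
    data = bytearray()
    for b in range(count):
        value = 0
        for k in range(8):
            value = (value << 1) | (pixels[start_bit + b * 8 + k] & 1)
        data.append(value)
    return bytes(data)


def extract(pixels):
    """Read the length prefix, then return that many payload bytes."""
    length = int.from_bytes(_read_bytes(pixels, 0, 4), "big")
    if 32 + length * 8 > len(pixels):
        raise ValueError("no valid payload found")
    return _read_bytes(pixels, 32, length)


if __name__ == "__main__":
    fake_pixels = bytearray(range(256)) * 4          # stand-in for image data
    snippet = json.dumps({"id": 42}).encode("utf-8")  # hypothetical snippet
    stego = embed(fake_pixels, snippet)
    print(json.loads(extract(stego)))
```

Each pixel byte changes by at most 1, so the image looks the same, but capacity is only one bit per byte, which limits how much json fits in a small image.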
-
Edit: It's free (open source) and by now fairly mature.
-
Come out of old school and consider using
- Azure Data Lake Storage (Gen2)
- AWS S3
- GCP Cloud Storage / Filestore.
If you want a copy kept without your permission and without being charged, then go for Alibaba Cloud.
Last edited by BigDataPro; 23 September 2020, 15:59.
-
Azure Table Storage for the Json snippets.
Azure Blob container for the images.
Link the json to the images using columns in the Azure table (one column for each image blob key).
Use Python via Azure Functions to manipulate the data if you want 'serverless', i.e. M$ handles the infrastructure, availability and backup. Regular syncing to a local or alternative cloud backup is still a good idea, though.
Not sure how much it may cost, so use the Azure pricing calculator with your estimates to get an idea.
Sorted.
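To make the "one column for each image blob key" idea concrete, here is a rough sketch of what a single table row could look like as a plain Python dict. It makes no Azure SDK calls. PartitionKey and RowKey are the key properties Azure Table Storage expects on every entity; the Json and ImageNN column names, the partitioning by id prefix, and the blob key format are assumptions of mine for illustration.

```python
import json


def make_entity(snippet_id, snippet, image_keys):
    """Build a flat row: the json as a string plus one column per image blob key."""
    entity = {
        "PartitionKey": snippet_id[:2],  # hypothetical partitioning choice
        "RowKey": snippet_id,
        "Json": json.dumps(snippet),
    }
    for n, key in enumerate(image_keys):
        entity[f"Image{n:02d}"] = key    # Image00, Image01, ...
    return entity


def image_keys_of(entity):
    """Return the image blob keys of a row in column order."""
    return [entity[k] for k in sorted(entity) if k.startswith("Image")]


if __name__ == "__main__":
    row = make_entity(
        "ab000123",
        {"title": "example"},
        ["images/ab000123/0.jpg", "images/ab000123/1.jpg"],  # hypothetical keys
    )
    print(image_keys_of(row))
```

With 10-15 images per snippet the row stays small, and looking up a snippet gives you every related blob key in one read.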
-
I've done this exact same thing with cloud storage, on both Azure and S3.
About 600k folders with about 10 images and 15 json files in each.
-
I would vote for MongoDB in this use case. Because BSON (a binary form of JSON) is the native format of documents stored in MongoDB, you can parse each snippet and store it in MongoDB as a queryable object.
Of course, if you just want to store them as strings, you can choose pretty much any SQL database you like. One option, if you don't want a server-based DB, is to insert them into a local SQLite database.
SQLite Home Page
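A minimal sketch of the SQLite option, using only Python's standard library. The table name, column names and helper functions are made up for illustration; the snippets are stored as plain JSON strings keyed by an id, as described above.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def put_many(conn, items):
    """Insert (id, dict) pairs in a single transaction, replacing existing ids."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO snippets (id, body) VALUES (?, ?)",
            ((sid, json.dumps(data)) for sid, data in items),
        )


def get(conn, sid):
    """Return the parsed snippet for an id, or None if it is not stored."""
    row = conn.execute("SELECT body FROM snippets WHERE id = ?", (sid,)).fetchone()
    return json.loads(row[0]) if row else None


if __name__ == "__main__":
    conn = open_store()  # pass a file path instead to keep the data on disk
    put_many(conn, [("a", {"n": 1}), ("b", {"n": 2})])
    print(get(conn, "b"))
```

Batching the inserts into one transaction matters at this scale: committing 4 million rows one at a time would be far slower than committing them in large batches.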
-
Your bonus question gives you the only sane answer to your question, but you haven't got there yet.