It depends on what you want to do with them, but Linux (if that is your platform) will have no trouble storing 4 million JSON files in a single directory. It is better to break them into multiple directories, though: they are easier to handle, and it gives you a little flexibility with regard to backups etc.
As for the 12 million images, the same applies at the platform level, except that the storage requirement will be large, e.g. 12 TB if each image is 1 MB in size.
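For what it's worth, one way to do the "multiple directories" split is to derive the subfolders from a hash of the snippet's ID. This is only a rough sketch: the function names, the `.json` naming and the two-level layout are my own assumptions, not anything from the original post. Two levels of hex pairs give 65,536 folders, so 4 million files works out at roughly 60 per folder.

```python
import hashlib
import json
from pathlib import Path


def shard_path(root: Path, snippet_id: str, levels: int = 2) -> Path:
    """Return root/ab/cd/<snippet_id>.json, using hex pairs of a hash as subfolders."""
    # Hashing the ID spreads files evenly even if the IDs themselves are sequential.
    digest = hashlib.sha1(snippet_id.encode("utf-8")).hexdigest()
    parts = [digest[2 * i: 2 * i + 2] for i in range(levels)]
    return root.joinpath(*parts, f"{snippet_id}.json")


def save_snippet(root: Path, snippet_id: str, snippet: dict) -> Path:
    """Write one snippet to its sharded location, creating folders as needed."""
    path = shard_path(root, snippet_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(snippet), encoding="utf-8")
    return path


def load_snippet(root: Path, snippet_id: str) -> dict:
    """Read a snippet back; the path is recomputed from the ID, so no index is needed."""
    return json.loads(shard_path(root, snippet_id).read_text(encoding="utf-8"))
```

The same path scheme could be reused for the images, e.g. a folder per snippet ID under the hashed prefix.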
-
Originally posted by anim:
Hi all,
I have a script that generates 4 million + snippets of json code.
What would you agree on being the best way to store them?
I have considered:
- json files - way too many even if split in subfolders
- mongo DB
- mySQL with json type field
The aim is to be able to easily retrieve them for processing later.
Language is python.
Bonus question: each of these 4m+ json will have 10-15 images related to it. Where and how do you store the images?
If you access the image first and then go looking for the related json, then you could embed the json in the image and save doing the subsequent lookup.
Google for "steganography".
-
Edit: It's free (open source) and by now fairly mature
-
Come out of old school and consider using
- Azure Data Lake Storage (Gen2)
- AWS S3
- GCP Cloud Storage / Filestore.
If you want a copy kept without your permission and without being charged, then go for Alibaba Cloud.
Last edited by BigDataPro; 23 September 2020, 15:59.
-
Azure Table Storage for the Json snippets.
Azure Blob container for the images.
Link the json to the images using columns in the Azure table (one column for each image blob key).
Use Python via Azure Functions to manipulate the data if you want 'serverless', i.e. M$ handles the infrastructure, availability and backup. Regular syncing to a local or alternative cloud backup is still a good idea, though.
Not sure how much it may cost, so use the Azure pricing calculator based on your estimates for an idea.
Sorted.
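To make the "one column for each image blob key" idea a bit more concrete, here is a rough sketch of what one table row could look like as a plain Python dict. It deliberately does not call the Azure SDK. `PartitionKey` and `RowKey` are the key properties Table Storage uses; everything else (the `Json` column, the `Image01`... column names, using the first two characters of the ID as the partition, the blob key format) is an assumption for illustration only.

```python
import json


def build_entity(snippet_id: str, snippet: dict, image_keys: list[str]) -> dict:
    """Build a flat row: the JSON as a string plus one column per image blob key."""
    entity = {
        "PartitionKey": snippet_id[:2],  # hypothetical partitioning choice
        "RowKey": snippet_id,
        "Json": json.dumps(snippet),  # hypothetical column name
    }
    # One column per image, numbered so the order is preserved: Image01, Image02, ...
    for n, key in enumerate(image_keys, start=1):
        entity[f"Image{n:02d}"] = key
    return entity


# Example with made-up blob keys:
row = build_entity(
    "ab123",
    {"title": "example"},
    ["images/ab123/01.jpg", "images/ab123/02.jpg"],
)
```

With 10-15 images per snippet that is a modest number of columns per row, and the images themselves stay in the blob container.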
-
I've done this exact same thing with cloud storage, both on Azure and S3.
About 600k folders with about 10 images and 15 json files in each.
-
I would vote for MongoDB in this use case. Because BSON is the native format of documents stored in MongoDB, you can parse each snippet and store it in MongoDB as a queryable object.
Of course, if you just want to store them as strings, you can choose pretty much any SQL database you like. One option, if you don't want a server-based DB, is to insert them into a local SQLite database.
SQLite Home Page
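As a rough sketch of the SQLite route, using only Python's built-in `sqlite3` module: the table name, column names and helper functions below are my own assumptions. Each snippet is stored as a JSON string keyed by an ID, and the inserts are batched in a single transaction, which matters a lot when you are loading millions of rows.

```python
import json
import sqlite3


def open_store(db_path: str = "snippets.db") -> sqlite3.Connection:
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snippets (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def put_many(conn: sqlite3.Connection, items) -> None:
    """Insert (id, dict) pairs in one transaction; existing IDs are overwritten."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO snippets (id, body) VALUES (?, ?)",
            ((sid, json.dumps(doc)) for sid, doc in items),
        )


def get(conn: sqlite3.Connection, snippet_id: str):
    """Return the snippet as a dict, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT body FROM snippets WHERE id = ?", (snippet_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Usage would be something like `conn = open_store()`, then `put_many(conn, generated_items)` in batches from the generating script, and `get(conn, some_id)` when processing later.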
-
Your bonus question gives you the only 1 sane answer to your question but you haven't got there yet.
-
Storing 4M+ json snippets
Hi all,
I have a script that generates 4 million + snippets of json code.
What would you agree on being the best way to store them?
I have considered:
- json files - way too many even if split in subfolders
- mongo DB
- mySQL with json type field
The aim is to be able to easily retrieve them for processing later.
Language is python.
Bonus question: each of these 4m+ json will have 10-15 images related to it. Where and how do you store the images?