It suddenly occurred to me that it would be very useful to be able to configure a database connection in which the server could deliberately make statement execution slower, depending on the key structure and complexity of the query. Apps could then be tested on a modest-sized test database as if they were run against a production database comprising millions of records, and defective or sub-optimal table keys or SQL statements could thereby be identified. I've never come across anything like that, though.
Snail mode for simulating vast databases when using a small test database
Originally posted by OwlHoot:
It suddenly occurred to me that it would be very useful to be able to configure a database connection in which the server could deliberately make statement execution slower, depending on the key structure and complexity of the query, so that apps could be tested on a modest-sized test database as if they were run against a production database comprising millions of records, and defective or sub-optimal table keys or SQL statements could thereby be identified. I've never come across anything like that, though.
You can buy my old ZX81 if you like. 500 quid.

bloggoth:
"If everything isn't black and white, I say, 'Why the hell not?'" - John Wayne (my guru, not to be confused with my beloved prophet Jeremy Clarkson)
-
Originally posted by SpontaneousOrder:
If such a thing were possible, wouldn't it be easier to just flag up the sub-optimal SQL in the first place, without the fake slow execution times?
A simple SELECT is fine and will run like lightning on a table with a hundred records and no keys, but try running the same SQL on a table with a hundred million records. That is a simplistic example, but one could envisage a complicated statement involving joins across several tables, which again could be perfectly sound SQL, and optimal for the database structure as it stands, yet hard if not impossible to anticipate how it would scale to huge tables.

So the proposed snail mode would just be, as it were, a useful way of prototyping the database structure and/or the SQL, to check that the statements did scale well.

But it was just a passing idea, which I'm not interested in defending to the last ditch if anyone sees no potential use for it.

Last edited by OwlHoot; 18 March 2015, 18:54.
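As far as I know no server offers this, but the core of the idea can be sketched in a few lines: inspect the query plan before execution and add an artificial delay for each full-table scan it contains, so unindexed access patterns stand out even on a tiny database. The penalty constant and the scan-counting heuristic below are made up purely for illustration, using SQLite's EXPLAIN QUERY PLAN.

```python
import sqlite3
import time

SCAN_PENALTY_SECONDS = 0.2  # hypothetical delay per full-table scan

def snail_execute(conn, sql, params=()):
    """Run a statement, sleeping once per unindexed full-table scan in its plan."""
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # SQLite reports unindexed access as "SCAN <table>" (no "USING INDEX")
    scans = sum(1 for row in plan
                if row[3].startswith("SCAN") and "USING INDEX" not in row[3])
    time.sleep(scans * SCAN_PENALTY_SECONDS)
    return conn.execute(sql, params)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, str(i)) for i in range(100)])

start = time.time()
rows = snail_execute(conn, "SELECT * FROM t WHERE val = ?", ("42",)).fetchall()
elapsed = time.time() - start
print(len(rows), elapsed >= SCAN_PENALTY_SECONDS)  # 1 True
```

With an index on `val` the planner reports an index search instead of a scan, so the same query would run without the penalty, which is exactly the signal the snail mode is meant to give.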
-
Originally posted by OwlHoot:
In many cases, but the SQL isn't always at fault, or even sub-optimal relative to a given table + key structure, if it runs without a syntax error and does what its creator intended.

A simple SELECT is fine and will run like lightning on a table with a hundred records and no keys, but try running the same SQL on a table with a hundred million records. That is a simplistic example, but one could envisage a complicated statement involving joins across several tables, which again could be perfectly sound SQL, and optimal for the database structure as it stands, yet hard if not impossible to anticipate how it would scale to huge tables.

So the proposed snail mode would just be, as it were, a useful way of prototyping the database structure and/or the SQL, to check that the statements did scale well.

But it was just a passing idea, which I'm not interested in defending to the last ditch if anyone sees no potential use for it.
I.e. if the increase in execution time as the table sizes scale isn't a linear progression, then, as you say, whether the SQL is good or not depends on the size and distribution of the data. And if you could predict how it would respond under different table sizes and data distributions (in order to reflect that), then why would you bother actually running the queries for real?

If it's a case of just slowing everything down so that the impact of complicated queries is exaggerated enough to be noticeable with small data sets, then you don't get the non-linear performance degradation.
-
Surely it'd only cost a few bob to run up an Amazon RDS server (or equivalent if you use another provider) of a suitable size, fill it with a few hundred million randomly-generated records (pre-generated, of course - no point running the DB until you have the data), and run your queries against that. The biggest DB server they offer (db.r3.8xlarge) costs $3.78/hour. IIRC intra-datacentre data transfer is free. So generate your however-many-GB of data using a small-and-cheap EC2 instance (1.3 cents per hour, much cheaper if you use spot instances) which writes it to a file in S3 (3 cents per Gig per month at top price), fire up the DB server and import the data from S3, run your queries, then shut down the database server and either delete the data or (maybe better) move it to Glacier, where it's dirt-cheap to store (1 cent per GB per month) but needs a few hours notice to access again.
You could probably achieve the entire objective for about £20, and a couple of quid a month if you wanted to keep the random data around. It might be cheaper to regenerate a random dataset each time; you'd know after the first time.
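The "pre-generate the data first" step above is the only part that needs code. A minimal sketch: stream random rows to CSV so the finished file can be uploaded (e.g. to S3) and bulk-imported into the database in one go. The column names, row shape, and sizes here are invented for illustration; a real run would write to a file on disk rather than an in-memory buffer.

```python
import csv
import io
import random
import string

def random_rows(n, seed=0):
    """Yield n deterministic pseudo-random rows: (id, name, balance)."""
    rng = random.Random(seed)
    for i in range(n):
        name = "".join(rng.choices(string.ascii_lowercase, k=8))
        yield (i, name, rng.randint(1, 10**6))

buf = io.StringIO()  # stand-in for a real file destined for S3
writer = csv.writer(buf)
writer.writerow(["id", "name", "balance"])  # header row
for row in random_rows(1000):
    writer.writerow(row)

line_count = buf.getvalue().count("\n")
print(line_count)  # 1001: header plus 1000 data rows
```

Because the generator is seeded, the dataset is reproducible, which supports the "regenerate it each time" option at the end of the post.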
-
Originally posted by NickFitz:
Surely it'd only cost a few bob to run up an Amazon RDS server (or equivalent if you use another provider) of a suitable size, fill it with a few hundred million randomly-generated records (pre-generated, of course - no point running the DB until you have the data), and run your queries against that. The biggest DB server they offer (db.r3.8xlarge) costs $3.78/hour. IIRC intra-datacentre data transfer is free. So generate your however-many-GB of data using a small-and-cheap EC2 instance (1.3 cents per hour, much cheaper if you use spot instances) which writes it to a file in S3 (3 cents per GB per month at top price), fire up the DB server and import the data from S3, run your queries, then shut down the database server and either delete the data or (maybe better) move it to Glacier, where it's dirt-cheap to store (1 cent per GB per month) but needs a few hours' notice to access again.

You could probably achieve the entire objective for about £20, and a couple of quid a month if you wanted to keep the random data around. It might be cheaper to regenerate a random dataset each time; you'd know after the first time.

Combined with generatedata.com or this SQL Data Generator - Data generator for MS SQL Server databases.

Sorted.
-
- Reduce the DB buffer cache.
- Reduce DB server O/S memory.
- Reduce DB I/O performance by running some other crappy program at the same time on the DB server.
- Use NFS for DB files.
- Copy the table stats from a real big version of the same DB, making the statement optimiser of your small DB choose sub-optimal access paths.

Just some options I could think of that I've seen in the past. Especially a poorly configured NFS setup. Classic. Oh yes, and antivirus scanning the DB files.

Don't believe it, until you see it!
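The first option in the list above, shrinking the DB buffer cache, is the cheapest to try. A sketch with SQLite's `PRAGMA cache_size`: a starved page cache forces more page I/O, so queries that touch many pages degrade the way they would on a dataset much larger than RAM. The value 10 (pages) is an illustrative extreme; real defaults are in the thousands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Record the default cache size, then shrink it drastically.
default_cache = conn.execute("PRAGMA cache_size").fetchone()[0]
conn.execute("PRAGMA cache_size = 10")  # starve the page cache: 10 pages
small_cache = conn.execute("PRAGMA cache_size").fetchone()[0]

print(default_cache, small_cache)
```

Other engines have equivalents (e.g. server-level buffer pool settings), but the principle is the same: make the cache small relative to the data and cold-cache behaviour shows up immediately.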
-
Originally posted by DimPrawn:
^ this

Combined with generatedata.com or this SQL Data Generator - Data generator for MS SQL Server databases.

Sorted.
Slight side-track: recently I've been using SQL Monitor from Red Gate to monitor the production DB. You can compare performance against a baseline, so anything introduced that causes a spike can be identified. It might be a bit after the fact, but if you're working in a team, getting everyone to performance-test their code/queries against generated data while simulating production usage is very difficult.
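The baseline-comparison idea above reduces to a very small core: keep a recorded timing per query and flag anything that regresses past a threshold. SQL Monitor does this against live production metrics; the sketch below just shows the shape of the check, with made-up query names, timings, and threshold.

```python
# Baseline timings per named query, in seconds (invented figures).
BASELINE = {"orders_by_customer": 0.12, "stock_report": 0.90}
THRESHOLD = 1.5  # flag anything 50% slower than its baseline

def regressions(current_timings):
    """Return the names of queries that have regressed past the threshold."""
    return [name for name, t in current_timings.items()
            if name in BASELINE and t > BASELINE[name] * THRESHOLD]

flagged = regressions({"orders_by_customer": 0.40, "stock_report": 0.95})
print(flagged)  # ['orders_by_customer']
```

Run as a CI step against generated data, this catches the spike before release rather than "a bit after the fact".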
-
Sticking data on the cloud won't fly for most organisations that have large databases. Also, many companies have test and dev databases that are considerably smaller than production.

I can see the relevance of what OwlHoot is suggesting. "But this page worked so much faster when we tested it before the release" is not an uncommon complaint.

Run the query on prod, stick that time 'estimate' into the DEV and UAT phases, and give the users a real feel for how it will perform on prod.
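That last suggestion can be sketched directly: record real execution times on prod (the figure below is invented), then pad the same queries in DEV/UAT so pages feel production-speed. Keying the lookup on the raw SQL text is an illustrative assumption; a real version might normalise or hash the statement.

```python
import time

# Measured production timings per statement, in seconds (made-up number).
PROD_TIMINGS = {"SELECT * FROM orders WHERE customer_id = ?": 0.3}

def run_like_prod(execute, sql, *params):
    """Execute a statement, then sleep until the recorded prod timing has elapsed."""
    started = time.time()
    result = execute(sql, *params)
    shortfall = PROD_TIMINGS.get(sql, 0.0) - (time.time() - started)
    if shortfall > 0:
        time.sleep(shortfall)  # pad the fast dev box up to the prod timing
    return result

start = time.time()
result = run_like_prod(lambda sql, *p: ["row"],  # stand-in executor for the demo
                       "SELECT * FROM orders WHERE customer_id = ?", 7)
wall = time.time() - start
print(result, wall >= 0.3)  # ['row'] True
```

Unlike a blanket slowdown, this replays the actual prod cost per statement, so users in UAT feel the real relative differences between pages.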