• Visitors can check out the Forum FAQ by clicking this link. You have to register before you can post: click the REGISTER link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below. View our Forum Privacy Policy.
  • Want to receive the latest contracting news and advice straight to your inbox? Sign up to the ContractorUK newsletter here. Every sign up will also be entered into a draw to WIN £100 Amazon vouchers!

You are not logged in or you do not have permission to access this page. This could be due to one of several reasons:

  • You are not logged in. If you are already registered, fill in the form below to log in, or follow the "Sign Up" link to register a new account.
  • You may not have sufficient privileges to access this page. Are you trying to edit someone else's post, access administrative features or some other privileged system?
  • If you are trying to post, the administrator may have disabled your account, or it may be awaiting activation.

Previously on "Snail mode for simulating vast databases when using a small test database"

Collapse

  • Sysman
    replied
    Originally posted by minestrone View Post
    Sticking data on the cloud won't fly for most organisations that have large databases.

    Also many companies have test and dev databases that are considerably smaller than production.

    I see the relevance of what owlhoot is suggesting.
    Yep. I've seen it time and again where projects have been developed on the latest flashy kit but expected to run well on lesser machines. Graphics designers using gigantic screens and bags of RAM not thinking about the poor users with their commodity gear is one example. With SQL, there's a similar danger that developers might have fancy disk arrays or SSDs to develop on but the end users have to live with existing kit which already has a workload.

    Originally posted by minestrone View Post
    "But this page worked so much faster when we tested it before the release" is not an uncommon complaint.

    Run the query on prod, stick that time 'estimate' into the DEV and UAT phases and give the users a real feel on how it will perform on prod.
    Sounds better.

    The idea of restricting server RAM and loading up a test database on some USB2 disks has merit

    Leave a comment:


  • SpontaneousOrder
    replied
    Originally posted by woohoo View Post
    If your data is too generic then your queries may not be selective enough to use indexes and you may have performance issues you would not get in production. Or if your test data is too random then indexes may be used that would not in production.
    Contention due to data distribution is an issue to be aware of too. I've build a system before that ran like treacle through a sieve in test. it turned out that they were putting through hundreds of TPS using only 50 credit card numbers. When they added more credit card numbers it flew.

    Leave a comment:


  • minestrone
    replied
    Sticking data on the cloud won't fly for most organisations that have large databases.

    Also many companies have test and dev databases that are considerably smaller than production.

    I see the relevance of what owlhoot is suggesting.

    "But this page worked so much faster when we tested it before the release" is not an uncommon complaint.

    Run the query on prod, stick that time 'estimate' into the DEV and UAT phases and give the users a real feel on how it will perform on prod.

    Leave a comment:


  • woohoo
    replied
    Originally posted by DimPrawn View Post
    The red-gate product looks interesting, thanks for the link. Goes without saying, so im going to say it, you need to understand your data before generating test records. If your data is too generic then your queries may not be selective enough to use indexes and you may have performance issues you would not get in production. Or if your test data is too random then indexes may be used that would not in production.

    Slight side-track, recently I've been using sql monitor from red-gate to monitor the production db. You can compare performance against a baseline and anything that's introduced that causes a spike in performance you can identify. Might be a bit after the fact but if your working in a team getting them all to performance test their code/queries against generated data and simulating production usage is very difficult.

    Leave a comment:


  • darrylmg
    replied
    - Reduce the DB buffer cache.
    - Reduce DB server O/S memory.
    - Reduce DB I/O performance by running some other crappy program at the same time on the DB server.
    - Use NFS for DB files.
    - Copy the table stats from a eeal big version of the same DB making the statement optimiser of your small DB choose sub-optimal access paths.

    Just some options I could think of that I've seen in the past.
    Especially a poorly configured NFS setup. Classic.
    Oh yes, and Antivirus scanning the db files.

    Leave a comment:


  • DimPrawn
    replied
    Originally posted by NickFitz View Post
    Surely it'd only cost a few bob to run up an Amazon RDS server (or equivalent if you use another provider) of a suitable size, fill it with a few hundred million randomly-generated records (pre-generated, of course - no point running the DB until you have the data), and run your queries against that. The biggest DB server they offer (db.r3.8xlarge) costs $3.78/hour. IIRC intra-datacentre data transfer is free. So generate your however-many-GB of data using a small-and-cheap EC2 instance (1.3 cents per hour, much cheaper if you use spot instances) which writes it to a file in S3 (3 cents per Gig per month at top price), fire up the DB server and import the data from S3, run your queries, then shut down the database server and either delete the data or (maybe better) move it to Glacier, where it's dirt-cheap to store (1 cent per GB per month) but needs a few hours notice to access again.

    You could probably achieve the entire objective for about £20, and a couple of quid a month if you wanted to keep the random data around. It might be cheaper to regenerate a random dataset each time; you'd know after the first time.
    ^ this

    Combined with generatedata.com or this SQL Data Generator - Data generator for MS SQL Server databases

    Sorted.

    Leave a comment:


  • NickFitz
    replied
    Surely it'd only cost a few bob to run up an Amazon RDS server (or equivalent if you use another provider) of a suitable size, fill it with a few hundred million randomly-generated records (pre-generated, of course - no point running the DB until you have the data), and run your queries against that. The biggest DB server they offer (db.r3.8xlarge) costs $3.78/hour. IIRC intra-datacentre data transfer is free. So generate your however-many-GB of data using a small-and-cheap EC2 instance (1.3 cents per hour, much cheaper if you use spot instances) which writes it to a file in S3 (3 cents per Gig per month at top price), fire up the DB server and import the data from S3, run your queries, then shut down the database server and either delete the data or (maybe better) move it to Glacier, where it's dirt-cheap to store (1 cent per GB per month) but needs a few hours notice to access again.

    You could probably achieve the entire objective for about £20, and a couple of quid a month if you wanted to keep the random data around. It might be cheaper to regenerate a random dataset each time; you'd know after the first time.

    Leave a comment:


  • SpontaneousOrder
    replied
    Originally posted by OwlHoot View Post
    In many cases, but the SQL isn't always at fault, or even sub-optimal relative to a given table + key structure, if it runs without a syntax error and does what its creator intended.

    A simple SELECT is fine and will run like lightning on a table with a hundred records and no keys - But try running the same SQL on a table with a hundred million records. That is a simplistic example, but one could envisage a complicated statement involving joins across several tables etc, which again could be a perfectly sound SQL, and optimal for the database structure as it stands, but hard if not impossible to anticipate how it would scale to huge tables.

    So the proposed snail mode would just be, as it were, a useful way of protoyping the database structure and/or the SQL to check that SQL statements did scale well.

    But it was just a passing idea, which I'm not interested in defending to the last ditch if anyone sees no potential use for it.
    I might be missing the point, but I was wondering how it would know whether to slow down the query or not?
    I.e. if the increase in execution time as the table sizes scale isn't a linear progression, then as you say whether the SQL is good or not depends on the size and distribution of data. And if you could predict how it would respond under different table sizes & data distributions (in order to reflect that), then why would you bother actually running the queries for real?

    If it's a case of just slowing everything down so that the impact of complicated queries is exaggerated so that's it's noticeable with small data sets, then you don't get the non-linear performance degradation.

    Leave a comment:


  • OwlHoot
    replied
    Originally posted by SpontaneousOrder View Post
    if such a thing were possible, wouldn't it be easier to just flag up the sub-optimal SQL in the first place without the fake slow execution times?
    In many cases, but the SQL isn't always at fault, or even sub-optimal relative to a given table + key structure, if it runs without a syntax error and does what its creator intended.

    A simple SELECT is fine and will run like lightning on a table with a hundred records and no keys - But try running the same SQL on a table with a hundred million records. That is a simplistic example, but one could envisage a complicated statement involving joins across several tables etc, which again could be a perfectly sound SQL, and optimal for the database structure as it stands, but hard if not impossible to anticipate how it would scale to huge tables.

    So the proposed snail mode would just be, as it were, a useful way of protoyping the database structure and/or the SQL to check that SQL statements did scale well.

    But it was just a passing idea, which I'm not interested in defending to the last ditch if anyone sees no potential use for it.
    Last edited by OwlHoot; 18 March 2015, 18:54.

    Leave a comment:


  • xoggoth
    replied
    You can buy my old ZX81 if you like. 500 quid.

    Leave a comment:


  • SpontaneousOrder
    replied
    Originally posted by OwlHoot View Post
    Suddenly occurred to me that it would be very useful to be able to configure a database connection in which the server could deliberately make statement execution slower, depending on the key structure and complexity of the query, so that apps could be tested on a modest-sized test database as if they were run against a production database comprising millions of records, and defective or sub-optimal table keys or SQL statements could thereby be identified. Never come across anything like that though.
    if such a thing were possible, wouldn't it be easier to just flag up the sub-optimal SQL in the first place without the fake slow execution times?

    Leave a comment:


  • Snail mode for simulating vast databases when using a small test database

    Suddenly occurred to me that it would be very useful to be able to configure a database connection in which the server could deliberately make statement execution slower, depending on the key structure and complexity of the query, so that apps could be tested on a modest-sized test database as if they were run against a production database comprising millions of records, and defective or sub-optimal table keys or SQL statements could thereby be identified. Never come across anything like that though.

Working...
X