Previously on "Big Data Solution - best performance for lowest price"

  • NorthWestPerm2Contr
    replied
    Originally posted by GB9
    SQL Server would easily manage that volume so Azure would be a spot on choice.

    More significant is what you want to analyse it for and with. Are you ok to write a query, go for a cuppa and then look at the resulting dataset? Or do you want real-time manipulation, e.g. drag / drop? The only thing with a cloud based solution is that at some point you MAY need to download the results into the real world. In this case, pre-aggregation in the fluffy stuff may be of use.

    If you are doing any in-memory analytics then pre-agg will be a must, unless you have a bank of servers!

    Of course, I have thought about this - I will be looking at a combination of cache tables and regular refreshes on the BI analytics server (Tableau in this case). That should enable the real-time data interaction we are looking for. The option to double or triple the performance at any time should give us everything we need. Looking at loading in the region of 100 million rows initially (covering the last 2 years) for a limited dataset, so I can't see performance being an issue for a while yet.

    We don't really need any drag and drop just yet as I've come in as the combined ETL, Data warehousing and Analytics consultant. Having a great time playing with this new technology.


  • GB9
    replied
    SQL Server would easily manage that volume so Azure would be a spot on choice.

    More significant is what you want to analyse it for and with. Are you ok to write a query, go for a cuppa and then look at the resulting dataset? Or do you want real-time manipulation, e.g. drag / drop? The only thing with a cloud based solution is that at some point you MAY need to download the results into the real world. In this case, pre-aggregation in the fluffy stuff may be of use.

    If you are doing any in-memory analytics then pre-agg will be a must, unless you have a bank of servers!
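
For anyone unsure what pre-aggregation means in practice, here is a minimal sketch in plain Python. The row layout (day, region, amount) and the function name `pre_aggregate` are made up for illustration and do not come from the thread; in a real setup the same roll-up would more likely be a GROUP BY into a summary table on the database side.

```python
from collections import defaultdict

# Hypothetical detail rows: (day, region, amount). Illustrative data only.
rows = [
    ("2016-01-01", "North", 10.0),
    ("2016-01-01", "North", 5.0),
    ("2016-01-01", "South", 7.5),
    ("2016-01-02", "North", 2.5),
]


def pre_aggregate(detail_rows):
    """Roll detail rows up to one summary row per (day, region)."""
    totals = defaultdict(lambda: {"row_count": 0, "amount_sum": 0.0})
    for day, region, amount in detail_rows:
        bucket = totals[(day, region)]
        bucket["row_count"] += 1
        bucket["amount_sum"] += amount
    return dict(totals)


summary = pre_aggregate(rows)
for key in sorted(summary):
    print(key, summary[key])
```

The analytics tool then only has to pull the summary rows (three here instead of four, but the ratio is what matters at hundreds of millions of rows) rather than the full detail.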


  • NorthWestPerm2Contr
    replied
    Originally posted by Scruff
    What are you going to do with the data?
    What is the retention period?
    What do you want to store it on (Local Disk / SAN / NAS / VSAN)?
    Do you need to back it up / replicate it?

    Ongoing storage is going to be expensive.

    Splunk is one option, but it all depends on what you want to do with the data - If you are just required to store it and not analyse / mine it, then Open Source is an option...

    Cheers for the response. Looking for something cloud based ultimately and, given my extensive Microsoft background, it made sense to stick to something Microsoft based rather than to learn something new. Don't get me wrong, I'd love to pick up some big data skills but I'm ultimately here to deliver in super quick time.

    Found a cloud based solution which is actually exactly what we need - Azure SQL Data Warehouse. It stores the data in a non-conventional format but is ultimately accessed via relational querying. Super quick and tidy and the scalability is immense.


  • Scruff
    replied
    What are you going to do with the data?
    What is the retention period?
    What do you want to store it on (Local Disk / SAN / NAS / VSAN)?
    Do you need to back it up / replicate it?

    Ongoing storage is going to be expensive.

    Splunk is one option, but it all depends on what you want to do with the data - If you are just required to store it and not analyse / mine it, then Open Source is an option...


  • Big Data Solution - best performance for lowest price

    I'm looking for a solution which can allow me to load in the region of 400 million wide rows per year. There will be about 30 columns which will be used for analytical purposes.

    Any ideas on a possible implementation of this? Needs to be an affordable (most likely open source) solution.
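
To put rough numbers on that volume, here is a back-of-envelope sketch. The row and column counts are from the post above; the 8 bytes per value is purely an assumption (uncompressed, no indexes, and counting only the 30 analytical columns), so treat the result as an order-of-magnitude guide and nothing more.

```python
ROWS_PER_YEAR = 400_000_000  # from the post
ANALYTICAL_COLUMNS = 30      # from the post
BYTES_PER_VALUE = 8          # ASSUMPTION: average uncompressed width per value


def raw_size_gb(rows, columns, bytes_per_value):
    """Uncompressed size in GB (10^9 bytes), ignoring indexes and overhead."""
    return rows * columns * bytes_per_value / 1e9


def avg_rows_per_second(rows_per_year):
    """Average load rate if the rows arrived evenly across a 365-day year."""
    return rows_per_year / (365 * 24 * 3600)


yearly_gb = raw_size_gb(ROWS_PER_YEAR, ANALYTICAL_COLUMNS, BYTES_PER_VALUE)
load_rate = avg_rows_per_second(ROWS_PER_YEAR)

print(f"Raw size per year: {yearly_gb:.0f} GB")
print(f"Average load rate: {load_rate:.1f} rows/second")
```

Under that assumption it works out to roughly 96 GB of raw analytical data per year, arriving at about 13 rows per second on average. Real loads are usually batched rather than even, so peak rates will be higher.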
