
Big Data Solution - best performance for lowest price



    I'm looking for a solution which can allow me to load in the region of 400 million wide rows per year. There will be about 30 columns which will be used for analytical purposes.

    Any ideas on a possible implementation of this? Needs to be an affordable (most likely open source) solution.
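For a rough feel of the storage involved, here is a back-of-envelope sketch. The 400 million rows and 30 columns come from the post above; the 8 bytes per value is purely an assumption (roughly a bigint, float or datetime), and it ignores indexes, compression and text columns, so treat the result as an order of magnitude only.

```python
def estimate_raw_bytes(rows_per_year: int, columns: int,
                       avg_bytes_per_value: int, years: int = 1) -> int:
    """Uncompressed size estimate: rows x columns x average value width."""
    return rows_per_year * columns * avg_bytes_per_value * years


ROWS_PER_YEAR = 400_000_000   # from the original post
COLUMNS = 30                  # from the original post
AVG_BYTES_PER_VALUE = 8       # assumption, not a measured figure

raw = estimate_raw_bytes(ROWS_PER_YEAR, COLUMNS, AVG_BYTES_PER_VALUE)
print(f"{raw / 1e9:.0f} GB per year uncompressed")  # prints "96 GB per year uncompressed"
```

On those assumptions the raw data is on the order of 100 GB a year, which is why retention period and storage choice matter more than the row count itself.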

    #2
    What are you going to do with the data?
    What is the retention period?
    What do you want to store it on (Local Disk / SAN / NAS / VSAN)?
    Do you need to back it up / replicate it?

    Ongoing storage is going to be expensive.

    Splunk is one option, but it all depends on what you want to do with the data - If you are just required to store it and not analyse / mine it, then Open Source is an option...



      #3
      Originally posted by Scruff
      What are you going to do with the data?
      What is the retention period?
      What do you want to store it on (Local Disk / SAN / NAS / VSAN)?
      Do you need to back it up / replicate it?

      Ongoing storage is going to be expensive.

      Splunk is one option, but it all depends on what you want to do with the data - If you are just required to store it and not analyse / mine it, then Open Source is an option...

      Cheers for the response. Looking for something cloud based ultimately and, given my extensive Microsoft background, it made sense to stick to something Microsoft based rather than learn something new. Don't get me wrong, I'd love to pick up some big data skills, but I'm ultimately here to deliver in super quick time.

      Found a cloud based solution which is actually exactly what we need - Azure SQL Data Warehouse. It stores the data in a non-conventional format but is ultimately accessed via relational querying. Super quick and tidy, and the scalability is immense.



        #4
        SQL Server would easily manage that volume so Azure would be a spot on choice.

        More significant is what you want to analyse it for and with. Are you ok to write a query, go for a cuppa and then look at the resulting dataset? Or do you want realtime manipulation, i.e. drag / drop etc? The only thing with a cloud based solution is that at some point you MAY need to download the results into the real world. In this case, pre-aggregation in the fluffy stuff may be of use.

        If you are doing any in-memory analytics then pre-agg will be a must, unless you have a bank of servers!
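To make the pre-aggregation point concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for the warehouse. The `sales` table, its columns and the sample rows are all made up for illustration; the idea is simply that the `GROUP BY` runs where the data lives and only the small summary table gets pulled down.

```python
import sqlite3

# In-memory database standing in for the cloud warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2016-01-01", "North", 10.0),
        ("2016-01-01", "North", 5.0),
        ("2016-01-01", "South", 7.5),
        ("2016-01-02", "North", 2.5),
    ],
)

# Pre-aggregate server-side: one row per (date, region) instead of one per sale.
conn.execute(
    """
    CREATE TABLE sales_daily AS
    SELECT sale_date, region, COUNT(*) AS row_count, SUM(amount) AS total_amount
    FROM sales
    GROUP BY sale_date, region
    """
)

# Only the summary leaves the database.
summary = conn.execute(
    "SELECT sale_date, region, row_count, total_amount "
    "FROM sales_daily ORDER BY sale_date, region"
).fetchall()
print(summary)
```

Four detail rows collapse to three summary rows here; on hundreds of millions of rows the reduction is what keeps an in-memory tool usable.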



          #5
          Originally posted by GB9
          SQL Server would easily manage that volume so Azure would be a spot on choice.

          More significant is what you want to analyse it for and with. Are you ok to write a query, go for a cuppa and then look at the resulting dataset? Or do you want realtime manipulation, i.e. drag / drop etc? The only thing with a cloud based solution is that at some point you MAY need to download the results into the real world. In this case, pre-aggregation in the fluffy stuff may be of use.

          If you are doing any in-memory analytics then pre-agg will be a must, unless you have a bank of servers!

          Of course, I have thought about this. We will be looking at a combination of cache tables and regular refreshes on the BI analytics server (Tableau in this case), which should enable the real-time data interaction we are looking for. The option to double or triple the performance at any time should give us everything we need. We are looking at loading in the region of 100 million rows initially (covering the last 2 years) for a limited dataset, so I can't see performance being an issue for a while yet.

          We don't really need any drag and drop just yet as I've come in as the combined ETL, Data warehousing and Analytics consultant. Having a great time playing with this new technology.
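For anyone wondering what a cache table with regular refreshes might look like, here is one possible sketch, again with sqlite3 standing in for the real warehouse. The `events` and `events_cache` tables and the `refresh_cache` helper are hypothetical names; scheduling the refresh (and pointing Tableau at the cache table) is left out.

```python
import sqlite3


def refresh_cache(conn: sqlite3.Connection) -> None:
    """Rebuild the cache table from the detail table in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM events_cache")
        conn.execute(
            "INSERT INTO events_cache "
            "SELECT category, COUNT(*), SUM(value) FROM events GROUP BY category"
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (category TEXT, value REAL)")
conn.execute(
    "CREATE TABLE events_cache "
    "(category TEXT PRIMARY KEY, row_count INTEGER, total_value REAL)"
)
conn.executemany("INSERT INTO events VALUES (?, ?)", [("a", 1.0), ("b", 2.0)])

refresh_cache(conn)
first = conn.execute("SELECT * FROM events_cache ORDER BY category").fetchall()

# New detail rows arrive; the cache is stale until the next refresh.
conn.execute("INSERT INTO events VALUES ('a', 4.0)")
refresh_cache(conn)
second = conn.execute("SELECT * FROM events_cache ORDER BY category").fetchall()
print(first, second)
```

A full rebuild like this is the simplest option; whether it is fast enough depends on the refresh interval and how much detail data sits behind the cache.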

