Reply to: Big Data
Previously on "Big Data"
-
Sounds like something my brother invented. Did a website for him recently because he couldn't. According to him, all programming was unnecessarily complicated, there was a much simpler, far more intuitive way to do everything. Naturally when I asked him to share his wisdom I got no answers. When I asked how to do a paragraph in an easier way than surrounding it by <p> </p> I got no answers. The "wise" people of the world are usually idiots.
-
Originally posted by greenlake
-
Originally posted by SunnyInHades
BIG DATA FOR DUMMIES CHEAT SHEET
From Big Data For Dummies
Snip
Companies are swimming in big data. The problem is that they often don’t know how to pragmatically use that data to be able to predict the future, execute important business processes, or simply gain new insights. The goal of your big data strategy and plan should be to find a pragmatic way to leverage data for more predictable business outcomes.
Snip
My concern is that large amounts of money are being spent on this, but in reality the results are mediocre at best.
Let's face it, if you are an ice cream manufacturer you will sell more in hot periods and probably over Xmas. I'm not sure how much money you need to spend on big data to work that out.
Surely most of the people currently using big data are economists, who never seem to get it right either.
So is it all really just a sham, a way of trying to cover your ass over decisions being made?
Anyone got any experience of this in a good way?
-
I have just built a new analytics platform for a very exciting and current industry. Using a traditional DBMS was completely out of the question, as the company deals with 1 billion+ records per year for its core business, and then there is some GPS data which is in the trillions. I've not got as far as the GPS data, but I'm working with a really cool unstructured semi-big data cloud platform called Azure SQL Data Warehouse. Its limit is 10 TB, but we could potentially spin up multiple ones and then cache and combine. We would never need more than 30 TB, so it doesn't seem to make sense to move on to the parallel processing paradigm of Hadoop. It depends how far you want to take the data; the sky is the limit.
-
BIG DATA FOR DUMMIES CHEAT SHEET
From Big Data For Dummies
By Judith Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman
Companies must find a practical way to deal with big data to stay competitive — to learn new ways to capture and analyze growing amounts of information about customers, products, and services. Data is becoming increasingly complex in structured and unstructured ways. New sources of data come from machines, such as sensors; social business sites; and website interaction, such as click-stream data. Meeting these changing business requirements demands that the right information be available at the right time.
DEFINING BIG DATA: VOLUME, VELOCITY, AND VARIETY
Big data enables organizations to store, manage, and manipulate vast amounts of disparate data at the right speed and at the right time. To gain the right insights, big data is typically broken down by three characteristics:
Volume: How much data
Velocity: How fast data is processed
Variety: The various types of data
While it is convenient to simplify big data into the three Vs, it can be misleading and overly simplistic. For example, you may be managing a relatively small amount of very disparate, complex data or you may be processing a huge volume of very simple data. That simple data may be all structured or all unstructured.
Even more important is the fourth V, veracity. How accurate is that data in predicting business value? Do the results of a big data analysis actually make sense? Data must be able to be verified based on both accuracy and context. An innovative business may want to be able to analyze massive amounts of data in real time to quickly assess the value of that customer and the potential to provide additional offers to that customer. It is necessary to identify the right amount and types of data that can be analyzed in real time to impact business outcomes.
Big data incorporates all the varieties of data, including structured data and unstructured data from e-mails, social media, text streams, and so on. This kind of data management requires companies to leverage both their structured and unstructured data.
UNDERSTANDING UNSTRUCTURED DATA
Unstructured data is different than structured data in that its structure is unpredictable. Examples of unstructured data include documents, e-mails, blogs, digital images, videos, and satellite imagery. It also includes some data generated by machines or sensors. In fact, unstructured data accounts for the majority of data that’s on your company’s premises as well as external to your company in online private and public sources such as Twitter and Facebook.
In the past, most companies weren’t able to either capture or store this vast amount of data. It was simply too expensive or too overwhelming. Even if companies were able to capture the data, they didn’t have the tools to easily analyze the data and use the results to make decisions. Very few tools could make sense of these vast amounts of data. The tools that did exist were complex to use and did not produce results in a reasonable time frame.
In the end, those who really wanted to go to the enormous effort of analyzing this data were forced to work with snapshots of data. This has the undesirable effect of missing important events because they were not in a particular snapshot.
One approach that is becoming increasingly valued as a way to gain business value from unstructured data is text analytics, the process of analyzing unstructured text, extracting relevant information, and transforming it into structured information that can then be leveraged in various ways. The analysis and extraction processes take advantage of techniques that originated in computational linguistics, statistics, and other computer science disciplines.
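To make the idea concrete, here is a minimal sketch of text analytics in Python: it takes free-text messages and pulls out a few structured fields. The sample messages, the field names (`order_id`, `email`, `negative_terms`) and the tiny word list are all invented for illustration; real text analytics tools use far richer linguistic and statistical techniques than these regular expressions.

```python
import re

# Hypothetical unstructured messages, invented for this illustration.
MESSAGES = [
    "Hi, order #10423 arrived damaged. Contact me at jo@example.com.",
    "Love the new app! No complaints.",
    "Order #20981 is late again, very disappointed. sam@example.org",
]

ORDER_RE = re.compile(r"#(\d+)")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Assumed mini-lexicon; a real system would use a proper sentiment model.
NEGATIVE_WORDS = {"damaged", "late", "disappointed"}


def extract_record(text):
    """Turn one free-text message into a structured record."""
    order = ORDER_RE.search(text)
    email = EMAIL_RE.search(text)
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {
        "order_id": int(order.group(1)) if order else None,
        "email": email.group(0) if email else None,
        "negative_terms": sorted(words & NEGATIVE_WORDS),
    }


records = [extract_record(message) for message in MESSAGES]
for record in records:
    print(record)
```

Once the text is in this row-like shape, it can be loaded into a table and queried alongside other structured data.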
THE ROLE OF TRADITIONAL OPERATIONAL DATA IN THE BIG DATA ENVIRONMENT
Knowing what data is stored and where it is stored are critical building blocks in your big data implementation. It’s unlikely that you’ll use RDBMSs for the core of the implementation, but it’s very likely that you’ll need to rely on the data stored in RDBMSs to create the highest level of value to the business with big data.
Most large and small companies probably store most of their important operational information in relational database management systems (RDBMSs), which are built on one or more relations and represented by tables. These tables are defined by the way the data is stored. The data is stored in database objects called tables, organized in rows and columns. RDBMSs follow a consistent approach in the way that data is stored and retrieved.
To get the most business value from your real-time analysis of unstructured data, you need to understand that data in context with your historical data on customers, products, transactions, and operations. In other words, you will need to integrate your unstructured data with your traditional operational data.
BASICS OF BIG DATA INFRASTRUCTURE
Big data is all about high velocity, large volumes, and wide data variety, so the physical infrastructure will literally “make or break” the implementation. Most big data implementations need to be highly available, so the networks, servers, and physical storage must be resilient and redundant.
Resiliency and redundancy are interrelated. An infrastructure, or a system, is resilient to failure or changes when sufficient redundant resources are in place ready to jump into action. Resiliency helps to eliminate single points of failure in your infrastructure. For example, if only one network connection exists between your business and the Internet, you have no network redundancy, and the infrastructure is not resilient with respect to a network outage.
In large data centers with business continuity requirements, most of the redundancy is in place and can be leveraged to create a big data environment. In new implementations, the designers have the responsibility to map the deployment to the needs of the business based on costs and performance.
MANAGING BIG DATA WITH HADOOP: HDFS AND MAPREDUCE
Hadoop, an open-source software framework, uses HDFS (the Hadoop Distributed File System) and MapReduce to analyze big data on clusters of commodity hardware—that is, in a distributed computing environment.
The Hadoop Distributed File System (HDFS) was developed to allow companies to more easily manage huge volumes of data in a simple and pragmatic way. Hadoop allows big problems to be decomposed into smaller elements so that analysis can be done quickly and cost effectively. HDFS is a versatile, resilient, clustered approach to managing files in a big data environment.
HDFS is not the final destination for files. Rather it is a data “service” that offers a unique set of capabilities needed when data volumes and velocity are high.
MapReduce is a software framework that enables developers to write programs that can process massive amounts of unstructured data in parallel across a distributed group of processors. MapReduce was designed by Google as a way of efficiently executing a set of functions against a large amount of data in batch mode.
The “map” component distributes the programming problem or tasks across a large number of systems and handles the placement of the tasks in a way that balances the load and manages recovery from failures. After the distributed computation is completed, another function called “reduce” aggregates all the elements back together to provide a result. An example of MapReduce usage would be to determine how many pages of a book are written in each of 50 different languages.
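The book example can be sketched in plain Python. This is a single-process simulation of the map, shuffle and reduce steps, not Hadoop code: the page list, the function names and the `n_splits` parameter are assumptions made for illustration, and a real job would run the map tasks on separate machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: each page is already tagged with its language.
PAGES = [
    {"page": 1, "language": "English"},
    {"page": 2, "language": "French"},
    {"page": 3, "language": "English"},
    {"page": 4, "language": "German"},
    {"page": 5, "language": "French"},
    {"page": 6, "language": "English"},
]


def map_page(page):
    """Map step: emit a (language, 1) pair for one page."""
    yield (page["language"], 1)


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: add up the counts for one language."""
    return key, sum(values)


def run_job(pages, n_splits=2):
    # Split the input as if each chunk went to a different worker.
    splits = [pages[i::n_splits] for i in range(n_splits)]
    mapped = [
        list(chain.from_iterable(map_page(page) for page in split))
        for split in splits
    ]
    grouped = shuffle(chain.from_iterable(mapped))
    return dict(reduce_counts(key, values) for key, values in grouped.items())


page_counts = run_job(PAGES)
print(page_counts)
```

The result does not depend on how the pages are split, which is what lets the map work be spread across many machines.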
LAYING THE GROUNDWORK FOR YOUR BIG DATA STRATEGY
Companies are swimming in big data. The problem is that they often don’t know how to pragmatically use that data to be able to predict the future, execute important business processes, or simply gain new insights. The goal of your big data strategy and plan should be to find a pragmatic way to leverage data for more predictable business outcomes.
Begin your big data strategy by embarking on a discovery process. You need to get a handle on what data you already have, where it is, who owns and controls it, and how it is currently used. For example, what are the third-party data sources that your company relies on? This process can give you a lot of insights:
You can determine how many data sources you have and how much overlap exists.
You can identify gaps that exist in knowledge about those data sources.
You might discover that you have lots of duplicate data in one area of the business and almost no data in another area.
You might ascertain that you are dependent on third-party data that isn’t as accurate as it should be.
Spend the time you need to do this discovery process because it will be the foundation for your planning and execution of your big data strategy.
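As a rough sketch of what the discovery process could produce, the Python below keeps a small inventory of data sources and reports where fields overlap and where ownership is unknown. The source names, owners and fields are hypothetical, and a real inventory would usually live in a data catalogue rather than a dictionary.

```python
from itertools import combinations

# Hypothetical inventory of data sources, invented for illustration.
SOURCES = {
    "crm": {"owner": "sales", "fields": {"customer_id", "email", "region"}},
    "billing": {"owner": "finance", "fields": {"customer_id", "email", "invoice_total"}},
    "web_logs": {"owner": None, "fields": {"session_id", "url"}},
}


def field_overlap(sources):
    """Return the fields shared by each pair of sources that overlap."""
    overlaps = {}
    for (name_a, meta_a), (name_b, meta_b) in combinations(sorted(sources.items()), 2):
        shared = meta_a["fields"] & meta_b["fields"]
        if shared:
            overlaps[(name_a, name_b)] = sorted(shared)
    return overlaps


def knowledge_gaps(sources):
    """Return the sources for which no owner is recorded."""
    return sorted(name for name, meta in sources.items() if meta["owner"] is None)


print("Number of sources:", len(SOURCES))
print("Overlapping fields:", field_overlap(SOURCES))
print("Sources with no known owner:", knowledge_gaps(SOURCES))
```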
-
Been on a Big Data project for the last year; just think of it as a huge bucket. It's great at having lots of data in one place that you can run analysis against, and because it's distributed compute, parallel jobs are the way to go.
A good use case is machine logs: chuck them all in there and analyse later.
-
Originally posted by MarillionFan
Contract I have now taken is a Big Data project. There are going to be some quite useful use cases at clientco, but initially they're buying the kit thinking it will do every form of operational, management & analytical reporting they want (including financial) just by chucking the data in.
I am still skeptical, but it's a greenfield site so will have a certain free hand. I shall report back on any real use cases over classic BI.
-
Contract I have now taken is a Big Data project. There are going to be some quite useful use cases at clientco, but initially they're buying the kit thinking it will do every form of operational, management & analytical reporting they want (including financial) just by chucking the data in.
I am still skeptical, but it's a greenfield site so will have a certain free hand. I shall report back on any real use cases over classic BI.
-
We use SAP HANA for data collection. We're tracking usage against 5000 cloud instances: who's inserted or deleted records etc., and growth of application usage.
This is all then tied back using a licensing decoder I designed to make sense of it for the business. We're now using it for product recommendations.
But it's all business SMEs plus data scientists. We're hampered by the fact that IT decided to chuck the data into SAP HANA in a more unstructured manner, as they reckoned it was so quick that it would handle unstructured data for analysis.
Guess what, they were wrong, and we're now having to build out a proper set of star schemas with additional business input.
Most companies don't need it but call it out as Big Data.
-
Originally posted by DimPrawn
Yes, that's Big Data. It's got nothing whatsoever to do with databases, it's map reduce, distributed computing, predictive analytics, data science. It's not a SQL Server DB.
-
Originally posted by DimPrawn
That's not big data. That's analysing structured data in a DW.
Big data is analysing something like every tweet in realtime and obtaining understanding of how the World sees your products.
BTW there's 9100 tweets per second to analyse, 24/7/365. Try that in your SQL server DB.
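For a sense of scale, the quoted rate can be turned into daily and yearly volumes with a few lines of Python. The 9100 per second figure is taken from the post as-is and not verified; the sketch also assumes the rate is constant and uses a 365-day year.

```python
TWEETS_PER_SECOND = 9100  # figure quoted in the post, not verified
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
tweets_per_year = tweets_per_day * 365  # assumes a constant rate all year

print(f"{tweets_per_day:,} tweets per day")
print(f"{tweets_per_year:,} tweets per year")
```

At that rate it works out to roughly 786 million tweets a day, or about 287 billion a year.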