Big Data - Big Problems

Definition of Big Data

So how is Big Data defined? No globally recognised, uniform definition has been established yet. One reason is that all major vendors of software, hardware and appliance solutions in the "conventional" business intelligence sector are trying to ride the big data wave and push their own definitions. In the German-speaking world, the definition from the BARC institute in Würzburg is the most popular: "Big Data refers to methods and technologies for the highly scalable collection, storage and analysis of polystructured data."

Big Data: make or buy?

It will also be interesting to see how the market is divided up between pure software and hardware vendors, appliance vendors and specialised service providers. In blog analysis, for example, some service providers bundle their entire offering as an external service and supply customers with ready-made analyses. Given the complexity of the technology and the high degree of specialisation, this model is an interesting alternative in various areas, even if it means forgoing the competitive advantage that a particularly clever in-house implementation could bring. In other areas, big data approaches will in most cases evolve the classic BI architectures step by step or supplement them in individual functional areas. Initial experience shows that the complexity by no means lies only in the technology: the demands on the analysis tools are rising just as much as those on the users sitting in front of them.

As companies move into new functional areas of data analysis with Big Data, often as pioneers, a classic problem that has accompanied the business intelligence industry since its inception remains: demonstrating ROI is no trivial task and usually has to rest on speculative assumptions. The discounted cash flow of a decision that was made earlier, or was only made possible in the first place, thanks to Big Data is difficult to calculate reliably. This uncertainty therefore accompanies Big Data initiatives from the very beginning.
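To make the point about speculative assumptions concrete, here is a minimal sketch of a discounted cash flow calculation. The function name, the annual benefit and the discount rates are purely hypothetical values chosen for illustration; none of them come from a real project.

```python
def discounted_cash_flow(cash_flows: list[float], rate: float) -> float:
    """Present value of cash flows received at the end of years 1, 2, 3, ..."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))


if __name__ == "__main__":
    # Hypothetical assumption: a faster decision is worth 100,000 per year
    # for three years. Both the benefit and the rates are guesses.
    assumed_benefit = [100_000.0] * 3
    for assumed_rate in (0.05, 0.10, 0.15):
        value = discounted_cash_flow(assumed_benefit, assumed_rate)
        print(f"discount rate {assumed_rate:.0%}: present value {value:,.0f}")
```

The formula itself is simple. The difficulty lies in the inputs: the result shifts noticeably with the assumed discount rate, and the assumed benefit of an earlier decision can rarely be measured at all.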

Technologies for Big Data

So is Big Data a new type of software that completely replaces previous investments in business intelligence and data warehousing? Certainly not in such absolute terms. The established business intelligence vendors are currently extending their platforms to make them better suited to big data scenarios. Nevertheless, alternative architectures are emerging, some of them from the open source world, where an innovative scene has formed around Big Data. In the area of so-called NoSQL databases ("not only SQL"), CouchDB and MongoDB are often cited as examples.

Hadoop, named by its inventor Doug Cutting after his son's yellow toy elephant, is currently the subject of considerable hype. Hadoop is based on the MapReduce programming model, which supports the massively parallel processing of large amounts of data and was popularised by Google. The idea behind it is simple: break the task down into its smallest parts, distribute them to as many computers as possible for parallel processing (map) and then combine the partial results into a final result (reduce). The hope is that this makes it possible to analyse very large, unstructured volumes of data with a manageable investment in hardware. Processing runs in batch mode, which sets it apart from the in-memory databases that are becoming increasingly popular in the classic business intelligence sector. Hadoop is an open source framework written in Java, and major vendors such as Microsoft, IBM or SAS are increasingly integrating it into, or supporting it in, their own solutions. In addition, Hadoop is now offered by various commercial distributors with support and related services, which is accelerating its spread into the commercial sector. Hadoop is by no means an "out of the box" solution: the quality of the analyses stands or falls with the complex algorithms that have to be developed for each subject area.
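The map and reduce steps described above can be sketched in a few lines of plain Python. This is only an illustration of the principle using a word count. It does not use Hadoop or its API, it runs in a single process rather than on a cluster, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(chunk: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key so each key can be reduced on its own."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks: list[str]) -> dict[str, int]:
    # On a real cluster each chunk would be mapped on a different machine;
    # here the chunks are simply processed one after another.
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big problems", "big data make or buy"]))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines. The subject-specific logic sits in the map and reduce functions, which is where the development effort mentioned above goes.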

Stefan Sexl
Executive Advisor
