Data volumes are growing exponentially, not least since the rise of the Internet. But even beyond the Internet's vast data troves, a great deal of data is being collected: according to a McKinsey study, the potentially exploitable data generated by a fully occupied airplane on a single one-way flight is estimated at over 200 terabytes. Contrary to what the term might suggest, it is by no means only data volume that stands at the centre of the Big Data movement. The reason is that the current arsenal of data analysis tools, usually summarized under the term Business Intelligence (BI), requires strongly pre-structured, carefully designed data models, and thus a time-consuming process. This is one reason why such projects are very often overtaken by reality: in surveys, end users cite slow response to structural changes and new requirements, alongside the evergreen issue of query speed, as the main problems in their BI initiatives.
The Big Data approach therefore differs from conventional approaches in its ability to handle poly-structured data flexibly. In addition to classic structured data, such as that generated by an internal ERP system, it also draws on semi-structured documents, such as those based on the HTML or XML standards, and on completely unstructured content. Blogs are a good example of the latter: many brand manufacturers try to find out and analyze how often and in what context their products are mentioned in blogs and forums. This example neatly summarizes the problems that Big Data is trying to solve:
First: There is a lot of data; it is unstructured, or its structure lies outside the company's sphere of influence and can change dynamically.
Second: Methodical evaluation requires completely new analysis technologies: in this specific case, algorithms that recognize the context of a mention, whether it is a complaint, a commendation, or a comparison with a competing product, in all major world languages.
Third: This requires not only additional technology, but also the combination of an understanding of the business problem with a deep understanding of what is technologically feasible.
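The context-recognition problem described above can be sketched, in a deliberately naive form, as a keyword-based classifier. The keyword lists and product names below are invented for illustration; real systems would rely on trained, multilingual language models rather than hand-written rules:

```python
import re

# Toy rule-based classifier for product mentions in blogs and forums.
# The keyword sets are hypothetical examples, not a real lexicon.
COMPLAINT_WORDS = {"broken", "defective", "disappointed", "refund"}
PRAISE_WORDS = {"great", "excellent", "love", "recommend"}
COMPARISON_PHRASES = ("versus", "better than", "compared to")

def classify_mention(text: str) -> str:
    """Classify an unstructured product mention as complaint,
    praise, comparison, or unknown."""
    lowered = text.lower()
    tokens = set(re.findall(r"[a-z]+", lowered))
    if any(phrase in lowered for phrase in COMPARISON_PHRASES):
        return "comparison"
    if tokens & COMPLAINT_WORDS:
        return "complaint"
    if tokens & PRAISE_WORDS:
        return "praise"
    return "unknown"

print(classify_mention("The X200 arrived defective, very disappointed."))
# → complaint
```

Even this toy version makes the gap visible: covering irony, negation, and all major world languages is exactly the part that pushes such analyses beyond classic BI tooling.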
Some industry analysts see in this the new profession of the "data scientist". Call-center operators are already experimenting with speech recognition algorithms that infer context from the tonality of the spoken word and can thus distinguish an emotionally charged complaint from a routine inquiry. If such technologies succeed in detecting changes in customer behavior much earlier, they undoubtedly generate a competitive advantage. Scenarios of this kind also carry an uncertainty inherent in the system: in contrast to BI systems based on internal data, both the aggregation and the analysis of the data must work with partially incomplete data and probabilities, a scenario for which conventional BI and DWH solutions are poorly prepared today. The well-known IT analyst Wolfgang Martin sums it up pointedly: "The single point of truth goes swimming."
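The contrast to a classic "single point of truth" can be made concrete with a small sketch. Instead of rejecting incomplete records at load time, as a conventional DWH pipeline would, the aggregation below keeps them and reports an explicit coverage figure alongside the result; the record layout and field names are assumptions for illustration:

```python
from statistics import mean

def aggregate_sentiment(records):
    """Average the sentiment of mentions, tolerating missing scores.

    records: list of dicts; the optional 'sentiment' field is a
    score in [-1, 1]. Returns (mean sentiment, coverage), where
    coverage is the fraction of records that actually had a score.
    """
    scored = [r["sentiment"] for r in records
              if r.get("sentiment") is not None]
    if not scored:
        return None, 0.0
    return mean(scored), len(scored) / len(records)

mentions = [
    {"source": "blog",  "sentiment": -0.8},
    {"source": "forum", "sentiment": 0.4},
    {"source": "blog"},  # the classifier could not score this one
]
avg, coverage = aggregate_sentiment(mentions)
```

The point of the design is that the consumer of the result sees both the estimate and how much of the data it rests on, rather than a single number presented as the truth.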