Sunday, September 6, 2015

Big Data: "Shit in, shit out"

When I look at the results of my personal event triathlon of EuroCIS, Bitkom Big Data, and CeBIT, I see many colleagues pulling with great momentum on a slack rope.

Many points from my recent articles on the smart home find an ominous projection in the world of Industry 4.0 and Big Data. Here I see a huge gap between strategy and operational implementation, which may be due, among other things, to strategy consultants treating the infinite flow of data as a given constant.

So what is behind the buzzword bingo, and where do the problems lurk? The topics Big Data, IoT, and Industry 4.0 were represented at all of the fairs and drew large crowds. Like me, many people are interested in the state of affairs.

I will start with Big Data, which also supplies the title of this article. It is undeniable that IoT and Industry 4.0 will once again make the volume of data we work with rise exponentially. An important contribution that Big Data must make here is to distinguish important from unimportant data and to filter it. Otherwise, even with HANA and co., the sensor and actuator data would flood our systems.

Even once I have the data, I am not saved. Very little was said at the fairs about data handling. Besides quantity, another important point is precisely the quality of the data (see title). And exactly here lies the great weakness of Big Data to this day.

No quality, no quality answer

If it is already difficult to establish a consistent data schema for internal data, it is a nearly impossible task for unstructured external information. Alternatively, I can buy in external data of high quality. But then the question of what that data is worth must be allowed, because the mere availability of data does not produce an intelligent evaluation. Although pseudo-intelligent systems such as IBM Watson would like to suggest otherwise, I have not yet found a system that discovers connections automatically. Yet exactly that would be the added value of Big Data, for example for predictive analysis.

Today's iterative systems show me their failure every day. Why, for example, am I always shown advertising for products I have just bought? Nobody has yet shown me individual cross-selling that goes beyond "customers who bought x also bought y and z". And that from the very company that ought to know everything about me from its master data.
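The complaint about being offered what I have just bought can be made concrete with a small sketch. It is not how any particular shop works; it only shows a plain "bought x, also bought y" count in Python, with one extra step that drops everything the customer already owns. The function names and the sample orders are made up for illustration.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(orders):
    """Count how often each ordered pair of products shares an order."""
    counts = Counter()
    for basket in orders:
        # set() removes duplicates so a product is counted once per order
        for a, b in permutations(set(basket), 2):
            counts[(a, b)] += 1
    return counts


def recommend(customer_history, orders, top_n=3):
    """Suggest products bought together with the customer's purchases,
    excluding anything the customer has already bought."""
    owned = set(customer_history)
    scores = Counter()
    for (a, b), n in co_purchase_counts(orders).items():
        if a in owned and b not in owned:
            scores[b] += n
    return [product for product, _ in scores.most_common(top_n)]


# Hypothetical sample data
orders = [
    ["printer", "ink"],
    ["printer", "ink", "paper"],
    ["laptop", "mouse"],
]
suggestions = recommend(["printer"], orders)
```

Even this filter only removes the most obvious failure. It still knows nothing about the individual customer beyond the purchase list, which is exactly the limitation described above.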

This also applies to decision support systems. Here, Big Data has so far been just a search engine with decision probabilities, and it fails completely on closed questions. Merely packing more data into main memory and combining it faster into a dull cube achieves nothing except producing more mistakes per day. This can have serious consequences even in management systems, with the cost of Big Data ultimately being socialized again. Decision trees built on inadequate data are not only complex; because their decision parameters are unmanageable, they also lead to wrong decisions. In fully automated processes, garbage at the start of the process chain (shit in) shows up particularly clearly. The erroneous results at the bitter end of the process chain (shit out), such as providers, form letters, and denial of errors, have driven all of us to madness.

If we transfer the problems of data collection to IoT or Industry 4.0, it becomes clear to everyone that standards for recording and reporting data are essential. In this respect, I find it irresponsible when opinion leaders, under the guise of innovation, want to push topics like data quality and data security into the background.
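What an agreed recording standard buys us can be sketched in a few lines of Python: every reading is checked against a fixed schema at the start of the process chain, and garbage is set aside instead of being passed on. The field names, the allowed units, and the sample readings are assumptions for illustration and are not taken from any real standard.

```python
from datetime import datetime

# Hypothetical schema: field name -> expected Python type(s)
REQUIRED_FIELDS = {
    "sensor_id": str,
    "timestamp": str,
    "value": (int, float),
    "unit": str,
}
ALLOWED_UNITS = {"celsius", "bar", "rpm"}  # assumed unit vocabulary


def validate_reading(reading):
    """Return a list of problems; an empty list means the reading is accepted."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in reading:
            problems.append(f"missing field: {field}")
        elif not isinstance(reading[field], expected_type):
            problems.append(f"wrong type for {field}")
    timestamp = reading.get("timestamp")
    if isinstance(timestamp, str):
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    unit = reading.get("unit")
    if isinstance(unit, str) and unit not in ALLOWED_UNITS:
        problems.append(f"unknown unit: {unit}")
    return problems


def split_readings(readings):
    """Separate accepted readings from rejected ones, keeping the reasons."""
    accepted, rejected = [], []
    for reading in readings:
        problems = validate_reading(reading)
        if problems:
            rejected.append((reading, problems))
        else:
            accepted.append(reading)
    return accepted, rejected


# Hypothetical sample readings
good = {"sensor_id": "s1", "timestamp": "2015-09-06T10:00:00",
        "value": 21.5, "unit": "celsius"}
bad = {"sensor_id": "s2", "timestamp": "yesterday",
       "value": "hot", "unit": "fahrenheit"}
accepted, rejected = split_readings([good, bad])
```

The check itself is trivial. The hard part is the one this article is about: getting everyone who produces data to agree on the schema in the first place.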

Nothing but Prototype Status

Standards and regulations get on my nerves too, but without a certain basic level of them, anarchy and chaos reign. Rather, we should make sure that the processes for developing standards are greatly accelerated. I also frequently hear warnings of the danger that we will fall behind on these issues. But over the last 25 years I have also learned not to chase every trend. The key question in all of this is which hype will turn into a cash cow and thus justify the high upfront investment of capital and resources. What is currently on the market as real, successful projects usually does not go beyond prototype status or PowerPoint slides. As with the smart home, breaking the idea down into operational, measurable results is again the sticking point.

Good approaches visible

But even among the strategists, the realization is now maturing that fine words must be followed by deeds. Good approaches were on display at all of the fairs. For standards to be set in good time, however, the global players have to sit down at one table.

For example, there are the OSGi Alliance and the Arvida project. These are good approaches in their respective areas, and the member lists feature impressive names. However, these initiatives lack the big picture. Global standards for data exchange as a basis for IoT, M2M, and Big Data need to be anchored deeper in the link layer (closer to the hardware) than OSGi, for example, does inside a Java VM.

So what are the current strengths of Big Data? Big Data is currently very good wherever it can show the viewer new perspectives on complex scenarios, because the brain is known to struggle when a problem is mapped across multiple vectors. So for the time being I see the new technologies as support for human decisions. We are still far away from well-founded decisions made by machines. And when I look at the problems with "intelligent" machine decisions described above, that is perhaps a good thing.
