Data Science

Big Data

The term "Big Data" describes the extreme amounts of unsorted, heterogeneous and distributed data scattered across an IT infrastructure. Although the most prominent example of big data is the information stored by social media, the manufacturing industry has to cope with this problem as well. During the production process, many different kinds of information are stored. Measurement values from the sensor infrastructure used to control the process and key performance indicators of the product used for quality analysis are only two examples. Extracting knowledge from this data is difficult and requires smart approaches. Mathematically, the data has to be transformed and its real information content has to be identified. Additionally, handling such large data sets leads naturally to distributed problem-solving techniques.
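As a small illustration of the distributed handling mentioned above, the following sketch computes the mean and variance of a measurement series that is split across several partitions without ever collecting the raw values in one place. Each partition is reduced to a compact summary, and summaries are merged pairwise using the well-known parallel update for sums of squared deviations. The names `Summary`, `summarize` and `merge` as well as the sample values are assumptions made for this example only and are not taken from any project described here.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Summary:
    """Compact description of one data partition (hypothetical helper type)."""
    n: int = 0          # number of values
    mean: float = 0.0   # arithmetic mean
    m2: float = 0.0     # sum of squared deviations from the mean

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 for fewer than two values.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(values):
    """Reduce the raw values of one partition to a Summary."""
    n = len(values)
    if n == 0:
        return Summary()
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return Summary(n, mean, m2)


def merge(a, b):
    """Combine two partition summaries into the summary of their union."""
    n = a.n + b.n
    if n == 0:
        return Summary()
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Summary(n, mean, m2)


# Made-up sensor readings, stored on three different nodes.
partitions = [[20.1, 20.4, 19.8], [21.0, 20.7], [19.9, 20.2, 20.3, 20.0]]

# Each node summarizes locally; only the small summaries are merged.
total = reduce(merge, (summarize(p) for p in partitions), Summary())
print(total.n, total.mean, total.variance)
```

Because `merge` is associative, the summaries can be combined in any order or in a tree, which is what makes this kind of statistic easy to distribute.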

Data Mining

Data mining has long been an influential technique for many manufacturing companies. Observing the quality of a production facility makes it possible to trace errors and failures quickly and efficiently. In the age of "big data", mining heterogeneous data from multiple, distributed databases is still an urgent topic. Furthermore, industrial IT landscapes have grown over many years, and most industrial software solutions have to cope with legacy systems used by customers. These so-called "brown-field" scenarios require detailed interfaces to existing, outdated database technologies as well as robust communication through the customer's network.
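A brown-field interface of this kind can be sketched as a set of small adapters, one per legacy database, that translate each native schema into a common record type. The example below uses two in-memory SQLite databases as stand-ins for legacy systems. All table names, column names, plant names and values are invented for illustration, and a real installation would of course involve other database technologies and a network layer.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """Common record type shared by all adapters (hypothetical schema)."""
    source: str
    part_id: str
    temperature_c: float


def make_legacy_db_a():
    # Stand-in for a legacy system that stores degrees Celsius.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE MESSWERTE (TEILNR TEXT, TEMP_C REAL)")
    db.executemany("INSERT INTO MESSWERTE VALUES (?, ?)",
                   [("P-001", 71.5), ("P-002", 69.0)])
    return db


def make_legacy_db_b():
    # Stand-in for a second system with another layout, in degrees Fahrenheit.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE readings (part TEXT, kind TEXT, val_f REAL)")
    db.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                   [("P-003", "temperature", 158.0), ("P-003", "humidity", 40.0)])
    return db


def read_a(db):
    """Adapter for system A: rename columns, keep the unit."""
    query = "SELECT TEILNR, TEMP_C FROM MESSWERTE ORDER BY TEILNR"
    for part, temp_c in db.execute(query):
        yield Measurement("plant_a", part, temp_c)


def read_b(db):
    """Adapter for system B: filter by kind and convert Fahrenheit to Celsius."""
    query = ("SELECT part, val_f FROM readings "
             "WHERE kind = 'temperature' ORDER BY part")
    for part, val_f in db.execute(query):
        yield Measurement("plant_b", part, (val_f - 32.0) * 5.0 / 9.0)


# Downstream mining code only ever sees the unified records.
unified = list(read_a(make_legacy_db_a())) + list(read_b(make_legacy_db_b()))
for record in unified:
    print(record)
```

Keeping the schema knowledge inside the adapters means that adding another legacy source does not affect the analysis code.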

Ongoing Projects

One of my primary research interests is finding suitable representations for information content and ways to exploit them properly. On the software side, I pursue a strictly object-oriented philosophy that helps to structure content into logical pieces. Once data is represented appropriately, data mining becomes a straightforward task, for which I typically apply Bayesian statistics, decision trees and random forests. Both topics are also closely related to my other interests in multi-agent technology and learning ensembles. Most of this work takes place within ongoing industrial projects and has so far been documented in in-house publications only.
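To make the link between an object-oriented representation and tree-based mining concrete, here is a minimal sketch of a decision stump, the one-level building block of a decision tree. It searches for the single threshold that best separates defective from non-defective parts by weighted Gini impurity. The `Part` class, its attributes and the sample data are assumptions made for this example and do not describe the methods used in the projects mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """One produced part with two process values (hypothetical attributes)."""
    temperature: float
    pressure: float
    defective: bool


def gini(parts):
    """Gini impurity of a group for a binary label; 0.0 means a pure group."""
    if not parts:
        return 0.0
    p = sum(part.defective for part in parts) / len(parts)
    return 2.0 * p * (1.0 - p)


def best_split(parts, features=("temperature", "pressure")):
    """Return (weighted impurity, feature, threshold) of the best single split."""
    best = None
    for feature in features:
        values = sorted({getattr(part, feature) for part in parts})
        # Candidate thresholds lie halfway between neighbouring distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [p for p in parts if getattr(p, feature) <= threshold]
            right = [p for p in parts if getattr(p, feature) > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(parts)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Made-up data: defects occur only at high temperature.
parts = [
    Part(70.0, 1.0, False),
    Part(71.0, 1.4, False),
    Part(72.0, 1.1, False),
    Part(80.0, 1.3, True),
    Part(82.0, 1.0, True),
    Part(85.0, 1.2, True),
]

score, feature, threshold = best_split(parts)
print(feature, threshold, score)
```

A full decision tree applies the same search recursively to each side of the split, and a random forest trains many such trees on resampled data and combines their votes.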