Bachelor's thesis Richard Roth


Analysis and evaluation of string distance algorithms for the meta data analysis in complex energy systems

Illustration of the selection algorithm in Aikido. Copyright: EBC

Buildings have an enormous impact on our climate, as they are responsible for nearly 40% of total energy consumption in Germany. To meet the challenge of limiting the increase in average global temperature, it is essential to reduce the energy demand of buildings.

Wider and more efficient use of energy management systems in buildings can help reduce this energy demand significantly. For these systems to work reliably, they need structured and uniformly presented data, which is provided by the underlying models of building management systems. However, inconsistent data labeling limits the deployment of these energy management systems.

Structuring data into common schemas manually is time-consuming and expensive. Many approaches have been introduced to improve this process with the help of machine learning. One of them is Aikido.

Aikido allows users to label a large number of data points in an active learning process using the BUDO format, which is a hierarchically structured naming schema. New labels are continuously suggested for classification in order to improve further predictions. To achieve a steep learning curve, it is crucial to select these labels precisely. Aikido therefore includes a selection algorithm that suggests labels which differ greatly from already classified labels and are similar to unclassified labels. The Levenshtein string distance is used as the measure of this difference.
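The selection idea described above can be illustrated with a minimal Python sketch. It is not Aikido's actual implementation: the function names `levenshtein` and `select_next_label`, the scoring rule (mean distance to classified labels minus mean distance to the other unclassified labels), and the example strings are all assumptions made for illustration only.

```python
from statistics import mean


def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def select_next_label(classified: list[str], unclassified: list[str]) -> str:
    """Hypothetical scoring rule (an assumption, not taken from Aikido):
    prefer the candidate that is far from classified labels and
    close to the remaining unclassified labels."""
    def score(index: int) -> float:
        candidate = unclassified[index]
        others = [u for k, u in enumerate(unclassified) if k != index]
        far = mean(levenshtein(candidate, c) for c in classified) if classified else 0.0
        near = mean(levenshtein(candidate, u) for u in others) if others else 0.0
        return far - near

    best = max(range(len(unclassified)), key=score)
    return unclassified[best]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    # Placeholder strings, not real BUDO labels.
    print(select_next_label(["aaaa"], ["aaab", "zzzz", "zzzy"]))
```

In this sketch, a candidate close to an already classified label receives a low score, while a candidate that resembles many still unclassified labels receives a high one. Other string distances could be swapped in by replacing `levenshtein` with a function of the same signature.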

However, this selection yields no improvement over randomly chosen labels. This thesis therefore aims to increase the accuracy of Aikido by applying further string distances and different modifications to the selection algorithm of the tool.

The results show that the prediction accuracy depends almost exclusively on the data set rather than on the chosen string distance or algorithm modification. This observation is consistent with the results of related applications in the scientific literature.