"Analyzing and Predicting Large Vector-, Graph- and Spatio-temporal Data"
Large social graph datasets covering millions of social network users and the billions of relationships between them, complex high-dimensional vector data from biomedical database systems, and petabytes of environmental sensor data are generated every day. Putting this flood of data to work for the benefit of all is one of the main challenges of the 21st century. This talk advances the field of data science for a variety of data types. For vector data, two novel subspace clustering techniques are introduced that focus on redundancy reduction and automation to increase efficiency. Both algorithms are automated through a coding scheme based on the minimum description length (MDL) principle. For biomedical graph data, a new community detection method (a form of graph clustering) is applied to the microbiome of a newborn cohort to investigate the influence of the microbiome on type 2 diabetes. A second graph method is a novel outlier detection algorithm for tree-like representations of cancerous cells. Further, an efficient classification method is proposed that predicts natural hazards, specifically storms, from extremely large historical time series of environmental sensor data. Its efficiency comes from applying tensor factorization techniques to these large time series. Finally, the talk shows how to seamlessly integrate and automatically optimize these methods in modern relational main-memory databases, using classical clustering and classification approaches as examples. All presented methods have been extensively evaluated in experiments and have already contributed to their respective research communities. Integrating these methods into relational main-memory databases bridges the gap between theoretical method development and practical database use.
Dr. Nina Hubig is a Data Scientist at BMW, responsible for discoveries and part of the process mining team that applies machine learning methodology across all divisions of the company. She received her PhD in 2017 from the Technical University of Munich (TUM), focusing on new data science algorithms for applications ranging from biomedical to social network analysis. With a strong emphasis on database research, she integrated her algorithms into the main-memory database system HyPer (since acquired by Tableau).
Friday, March 15, 2:30pm to 3:30pm
McAdams Hall, 114
821 McMillan Rd., Clemson, SC 29634, USA