Speaker: Konstantinos Zoumpatianos
University of Trento
Title: Indexing and Mining Big Scientific Data: Enabling exploratory analysis
In recent years there has been a pressing need for techniques that can index very large collections of data series. Applications arise in astronomy, biology, the web, and other domains, and it is not unusual for them to involve hundreds of millions to billions of data series.
In this talk, we present a novel data structure designed for indexing and mining truly massive collections of time series. We show that the main bottleneck in mining such datasets is the time needed to build the index, and we therefore describe a novel bulk-loading mechanism, the first of its kind specifically tailored to data series indexes.
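The abstract does not describe the bulk-loading mechanism itself. As a rough illustration of the general idea only (not the speaker's method), the Python sketch below assumes a hypothetical summary key (one bit per PAA segment, i.e. the sign of each segment mean) and a hypothetical `bulk_load` function. Series are grouped by key in an in-memory buffer, and each group is appended to its leaf in one batch whenever the buffer fills, rather than inserting series one at a time.

```python
from collections import defaultdict


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    step = len(series) // segments
    return [sum(series[i * step:(i + 1) * step]) / step for i in range(segments)]


def sign_key(series, segments):
    """Hypothetical summary: one bit per PAA segment (1 if the segment mean is >= 0)."""
    return tuple(1 if v >= 0 else 0 for v in paa(series, segments))


def bulk_load(dataset, segments=4, buffer_limit=1000):
    """Hypothetical bulk loader: stage series per key in memory and flush whole
    groups to their leaves in batches, instead of inserting one series at a time."""
    leaves = defaultdict(list)   # stands in for on-disk leaf files
    buffers = defaultdict(list)  # in-memory staging area, grouped by key
    buffered = 0
    for idx, series in enumerate(dataset):
        buffers[sign_key(series, segments)].append(idx)
        buffered += 1
        if buffered >= buffer_limit:   # buffer full: flush every group in one batch
            for k, ids in buffers.items():
                leaves[k].extend(ids)  # one append per leaf instead of one per series
            buffers.clear()
            buffered = 0
    for k, ids in buffers.items():     # flush whatever remains at the end
        leaves[k].extend(ids)
    return dict(leaves)
```

The intuition is that batching replaces many small, scattered leaf writes with a few larger ones. A real index would also split leaves that overflow, which this sketch omits.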
Furthermore, we observe that in several cases scientists, and data analysts in general, need to issue a set of queries as soon as possible, as a first step in exploring a dataset. To address this need, we extend our previous technique and present methods that build data series indexes adaptively while still answering user queries correctly. We show that a large number of queries can be processed while the index is being built, and that our technique offers significant benefits, especially for skewed query workloads.
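As an illustrative sketch only (not the method presented in the talk), the Python code below shows one way adaptive index construction can work. The initial pass computes only coarse summaries (a hypothetical one-bit-per-segment key). A bucket is split on a finer key only when a query lands on it, so frequently queried regions are refined first. Each `nearest` call still returns the exact nearest neighbor under Euclidean distance, because a bucket is skipped only when a lower bound on its distance (`mindist`) already reaches the best match found. `AdaptiveIndex`, `mindist`, and all parameters are hypothetical.

```python
import math


def paa(series, segments):
    """Piecewise Aggregate Approximation: mean of each equal-length segment."""
    step = len(series) // segments
    return [sum(series[i * step:(i + 1) * step]) / step for i in range(segments)]


def sign_key(series, segments):
    """Hypothetical 1-bit symbol per PAA segment: 1 if the segment mean is >= 0."""
    return tuple(1 if v >= 0 else 0 for v in paa(series, segments))


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mindist(query, leaf_key):
    """Lower bound on the Euclidean distance from query to any series with key leaf_key."""
    segments = len(leaf_key)
    step = len(query) // segments
    # A segment contributes only when the query mean and the leaf symbol lie on
    # opposite sides of the breakpoint 0; then |q_i - s_i| >= |q_i|.
    gap = sum(q * q for q, sym in zip(paa(query, segments), leaf_key)
              if (1 if q >= 0 else 0) != sym)
    return math.sqrt(step * gap)


class AdaptiveIndex:
    """Toy adaptive index: built with coarse keys, buckets split only when queried."""

    def __init__(self, dataset, coarse=2, fine=4, leaf_capacity=2):
        self.data, self.coarse, self.fine, self.cap = dataset, coarse, fine, leaf_capacity
        self.buckets = {}  # coarse key -> list of ids, or dict of fine key -> list of ids
        for i, s in enumerate(dataset):  # cheap initial pass: coarse summaries only
            self.buckets.setdefault(sign_key(s, coarse), []).append(i)
        self.refined = 0  # number of buckets split so far

    def _target_leaf(self, query):
        ck = sign_key(query, self.coarse)
        bucket = self.buckets.get(ck)
        if isinstance(bucket, list) and len(bucket) > self.cap:
            # Refine lazily: only the bucket this query touches is split.
            children = {}
            for i in bucket:
                children.setdefault(sign_key(self.data[i], self.fine), []).append(i)
            self.buckets[ck] = bucket = children
            self.refined += 1
        if isinstance(bucket, dict):
            return bucket.get(sign_key(query, self.fine), [])
        return bucket or []

    def nearest(self, query):
        """Exact 1-NN: seed a best-so-far from the target leaf, then prune other buckets."""
        best_id, best = None, math.inf
        for i in self._target_leaf(query):
            d = euclidean(query, self.data[i])
            if d < best:
                best_id, best = i, d
        for ck, bucket in self.buckets.items():
            if mindist(query, ck) >= best:
                continue  # no series in this bucket can beat the current answer
            ids = bucket if isinstance(bucket, list) else [i for v in bucket.values() for i in v]
            for i in ids:
                d = euclidean(query, self.data[i])
                if d < best:
                    best_id, best = i, d
        return best_id, best
```

Under a skewed workload, repeated queries hit the same few buckets, so only those are refined, while buckets that are never queried stay at their cheap coarse resolution.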
Konstantinos (Kostas) Zoumpatianos is a PhD student in the dbTrento group at the University of Trento, Italy. His research covers data warehouses, business intelligence, and data series management. He holds an MSc degree in Information Management and a BSc degree in Information and Communication Systems Engineering, both from the University of the Aegean, Greece. Before joining the University of Trento, he worked as a software engineer for various technology startups.