Host: Carsten Binnig, Cooperative State University of Baden-Wuerttemberg
1:00pm – 1:45pm: Speaker: Olaf Hartig, University of Waterloo
1:45pm – 2:30pm: Speaker: Fei Chiang, McMaster University
2:30pm – 2:45pm: BREAK
2:45pm – 3:30pm: Speaker: Eyal de Lara, University of Toronto
Speaker: Olaf Hartig, University of Waterloo
Title: Querying Linked Data on the Web
The World Wide Web (WWW) is currently evolving into a Web of Linked Data where content providers publish and link their data as they have done for Web documents for 20 years. While the execution of SQL-like queries over this emerging dataspace opens possibilities not conceivable before, querying the Web of Linked Data poses novel challenges. Due to the openness of the WWW, it is impossible to know all data sources that might contribute to the answer of a query. To tap the full potential of the Web, traditional query execution paradigms are insufficient because they assume that the set of potentially relevant data sources is fixed beforehand. In the Web context, these data sources might not be known before the query is executed.
In this talk we discuss how the Web of Linked Data (conceived as a database system) differs from traditional database scenarios. In this context we present results on theoretical properties of queries over the Web of Linked Data. Furthermore, we introduce a novel query execution paradigm that allows an execution engine to discover potentially relevant data during the execution of queries.
Olaf is a postdoctoral fellow in the Database Research Group at the University of Waterloo. His research focuses on querying the Web of Linked Data and on information quality of Linked Data. His aim is to develop and study concepts that allow users to query the Web of Linked Data as if it were a huge global database system. As project maintainer and lead developer he is implementing these concepts in the free software project SQUIN. Since query execution in an open environment such as the Web poses questions of information quality and trustworthiness, Olaf also works on concepts for integrating the assessment of quality criteria into the query execution process. Olaf has presented several Linked Data related tutorials at major international conferences such as ISWC 2008, ISWC 2009, WWW 2010, ICWE 2012, ESWC 2013 and WWW 2013, and he was a lecturer at the Indian-Summer School on Linked Data 2011. Furthermore, he has served on various program committees and was an invited expert in the W3C Provenance Incubator Group and in the W3C Provenance Working Group.
Speaker: Fei Chiang, McMaster University
Title: Big Data Quality
As increasing amounts of data are being generated and stored, poor data quality is an increasingly pervasive problem for organizations as they try to derive value from raw data. As data is often machine generated, real data contains erroneous, duplicate, incomplete, and missing values. Ensuring that the data conforms to a correct set of business and integrity constraints is vital to realizing maximum value from data processing tasks. To improve data quality and to better understand and query data, we need to be able to discover and maintain the data and the integrity constraints that capture the application semantics. In this talk, I will present our data cleaning techniques that help to improve data quality, and outline future directions in light of the Big Data era.
Fei Chiang is an Assistant Professor in the Department of Computing and Software at McMaster University. Her research interests are broadly in the area of data management, with a focus on data quality, business analytics, and information extraction. She received her M.Math from the University of Waterloo, and B.Sc. and PhD degrees from the University of Toronto, all in Computer Science. She is the recipient of an NSERC Canada Graduate Scholarship. She has worked at IBM Global Services, in the Autonomic Computing Group at the IBM Toronto Lab, and in the Data Management, Exploration and Mining Group at Microsoft Research.
Speaker: Eyal de Lara, University of Toronto
Title: Agile Data Processing on the Cloud
A major advantage of cloud computing is the ability to use a variable number of virtual machine (VM) instances depending on the needs of the problem. Unfortunately, instantiating new VMs on existing clouds, such as Amazon’s EC2, is a slow operation that typically takes minutes. This lack of agility denies users the full potential of the cloud and forces application providers to overprovision, thus wasting valuable resources.
In this talk, I will describe VM fork, a new abstraction that can replicate a VM into hundreds of cloud hosts in less than a second, and will show how it can be leveraged to run web servers
Eyal de Lara is an Associate Professor in the Department of Computer Science at the University of Toronto. Eyal received his Ph.D. and M.Sc. from Rice University in 2002 and 1999, respectively, and a B.Sc. from the Instituto Tecnologico de Monterrey in 1995. His research interests include distributed systems and mobile computing. His research has been recognized with an IBM Faculty Award, an NSERC Discovery Accelerator Award, and the CACS/AIC Outstanding Young Computer Science Researcher Prize.