Speaker: Ihab F. Ilyas
Cheriton School of Computer Science
University of Waterloo
Title: Holistic and Extensible Data Cleaning
In this talk we describe some of our latest work in the area of data cleaning. In particular, we advocate two main design choices when it comes to building data cleaning solutions: holistic cleaning and extensible constraint specification. We present our latest algorithm for holistic data cleaning, where multiple heterogeneous rules are compiled on the data instance and a detection and repairing algorithm discovers and addresses the anomalies with respect to these constraints. We also discuss how we built our open-source extensible data cleaning solution, NADEEF, as a testbed for evaluating repairing algorithms. NADEEF offers an extensible rule specification interface that goes beyond simple closed-form constraints such as functional and matching dependencies.
Ihab Ilyas is an Associate Professor of Computer Science at the University of Waterloo. He received his PhD in computer science from Purdue University, West Lafayette, in 2004. He holds BS and MS degrees in computer science from Alexandria University. His main research is in the area of database systems, with special interest in data quality, managing uncertain data, rank-aware query processing, and information extraction. From 2011 to 2013 he was on leave, leading the Data Analytics Group at the Qatar Computing Research Institute. He spent two summers with IBM Almaden Research Center and has been an IBM CAS faculty fellow since January 2006. Ihab received the Ontario Early Researcher Award in 2008 and the David R. Cheriton Faculty Fellowship in 2013. He is a co-founder of Data Tamer Inc., a startup focusing on large-scale data integration and cleaning. For more information and a list of publications, please visit Ihab's web page: https://cs.uwaterloo.ca/~ilyas/