Speaker: Fei Chiang, Dept. of Computing and Software, McMaster University
Title: Towards Privacy-Preserving Data Cleaning
Data quality has become a pervasive challenge for organizations as they wrangle with large, heterogeneous datasets to extract value. Given the proliferation of sensitive and confidential information, it is crucial to consider data privacy concerns during the data cleaning process. For example, in medical database applications, varying levels of privacy are enforced across the attribute values. Attributes such as a patient's country or city of residence are less sensitive than the patient's prescribed medication. Traditional data cleaning techniques assume the data is openly accessible, without considering the differing levels of information sensitivity. Although recent work has proposed user-defined data cleaning operations over privatized relations, these techniques decouple the two tasks, treating privacy requirements and data cleaning as independent steps.
In this project, we take the first steps towards a data cleaning framework that integrates privacy as part of the data cleaning process. I will present a constraint-based data cleaning framework built on k-anonymity that allows two parties, a target data source T and a master data source M, to exchange information without violating k-anonymity. The goal is to maximize the data utility (and consistency) in T while minimizing the information disclosure from M. This is ongoing work, and feedback is welcome.
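For readers unfamiliar with k-anonymity, the property can be illustrated with a minimal Python sketch: a relation is k-anonymous when every combination of quasi-identifier values is shared by at least k records. This is not the framework presented in the talk. The function name, the sample records, and the choice of country and city as quasi-identifiers are assumptions made purely for illustration.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k rows (hypothetical helper, for illustration)."""
    groups = Counter(tuple(row[attr] for attr in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical patient records; the values are invented for this example.
patients = [
    {"country": "Canada", "city": "Hamilton", "medication": "A"},
    {"country": "Canada", "city": "Hamilton", "medication": "B"},
    {"country": "Canada", "city": "Toronto", "medication": "C"},
    {"country": "Canada", "city": "Toronto", "medication": "A"},
]

# Each (country, city) pair is shared by exactly two records.
print(is_k_anonymous(patients, ["country", "city"], 2))
print(is_k_anonymous(patients, ["country", "city"], 3))
```

In this sketch each (country, city) group contains two records, so the relation satisfies the check for k = 2 but not for k = 3.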
Fei Chiang is an Assistant Professor in the Department of Computing and Software at McMaster University, and served as the Associate Director of the MacData Institute until 2016. She received her M.Math from the University of Waterloo, and her B.Sc. and Ph.D. degrees from the University of Toronto, all in Computer Science. She leads the Data Science Research Group, which focuses on developing tools to facilitate data cleaning, improve data quality, and foster knowledge discovery. She has worked at IBM Global Services, in the Autonomic Computing Group at the IBM Toronto Lab, and in the Data Management, Exploration and Mining Group at Microsoft Research. She holds two filed and two published patents for her work in self-managing database systems. She is a Faculty Fellow with the IBM Centre for Advanced Studies.