|
Speaker: Avrim Blum Carnegie Mellon University
Title: Active (and Passive) Property Testing
Abstract:
Property testing concerns the problem of determining whether an unknown function has a certain desirable property, or is far from having that property, by making only a small number of queries to it. One of the motivations for testing is that it can be a useful preprocessing step before actually aiming to learn the function. However, machine learning applications often do not allow one to query for the label of arbitrary points, a capability needed by most property testing algorithms. Instead, the dominant query paradigm in machine learning--called active learning--is one where the algorithm may ask for examples to be labeled, but only from among those appearing in a given large unlabeled sample (e.g., images or documents appearing on the web, patients in a medical study, etc).
In this work, we consider property testing under these constraints and show testing can still yield substantial benefits in this setting, for a number of important properties for learning. We also give new results for passive testing, where the algorithm cannot query at all and is just given a small random sample. For example, for testing whether the function is a linear separator in R^n, we show that both active and passive testing can be done with O(sqrt{n}) queries, substantially less than the Omega(n) needed to learn such a function well. We also give a method for combining testable properties out of others, as well as develop notions of the "testing dimension" of a given property that characterize the intrinsic number of label requests needed to test that property.
This is joint work with Nina Balcan, Eric Blais, and Liu Yang.
|