Title: Query-Based Data Pricing
Speaker: Dan Suciu, University of Washington
Data is increasingly bought and sold online, and Web-based marketplace services have emerged to facilitate these transactions. But current pricing mechanisms are very simple. In this talk I will discuss a framework for pricing data that allows the seller to set explicit prices for a set of views of her choice, yet allows the buyer to buy any query; the price of the query is derived automatically from the explicit prices set by the seller. We call this framework "query-based pricing". A pricing function must satisfy two important properties: it has to be arbitrage-free and discount-free. I will show that these properties are intimately related to a fundamental database concept called "query-view determinacy", which has been studied by Nash, Segoufin, and Vianu. A view V "determines" a query Q if, for any two databases D1 and D2, V(D1) = V(D2) implies Q(D1) = Q(D2). Equivalently, V determines Q if the query can be answered from the view alone, without accessing the database at all. If V determines Q, then the arbitrage-free property requires the seller to set prices such that Price(Q) <= Price(V); otherwise, a middleman can exploit arbitrage by purchasing V, computing Q from V, and reselling Q at a profit while undercutting the original seller. I will discuss several aspects of arbitrage-free pricing functions: the case when the seller sells a service in addition to the data; the distinction between arbitrage-freeness and query containment (fewer answers are sometimes more expensive than more answers!); and the connection to data privacy (data perturbed by random noise should be cheaper than the accurate data). Time permitting, I will present several theoretical properties of arbitrage-free prices and the complexity of computing them.
Joint work with Paraschos Koutris, Prasang Upadhyaya, Magdalena Balazinska, and Bill Howe.
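
To make the determinacy definition and the arbitrage-free condition from the abstract concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the method presented in the talk: it checks determinacy over every database that can be built from a tiny fixed universe of tuples, whereas the real definition quantifies over all databases. The names all_databases, determines, arbitrage_violations, UNIVERSE, QUERIES, and the prices used below are all hypothetical.

```python
from itertools import combinations, product

def all_databases(universe):
    """Enumerate every subset of `universe` as a candidate database."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def determines(view, query, databases):
    """Check V(D1) == V(D2) implies Q(D1) == Q(D2) over the given databases.

    Assumption: `view` and `query` return hashable answers (e.g. frozensets).
    """
    answer_for_view = {}
    for db in databases:
        v, q = view(db), query(db)
        if v in answer_for_view and answer_for_view[v] != q:
            return False  # same view output, different query answers
        answer_for_view.setdefault(v, q)
    return True

def arbitrage_violations(prices, queries, databases):
    """Return pairs (v, q) where v determines q but Price(q) > Price(v)."""
    violations = []
    for v, q in product(queries, repeat=2):
        if v == q:
            continue
        if determines(queries[v], queries[q], databases) and prices[q] > prices[v]:
            violations.append((v, q))
    return violations

# Hypothetical toy data: tuples are (name, value) pairs.
UNIVERSE = [("a", 1), ("a", 2), ("b", 1)]
DATABASES = list(all_databases(UNIVERSE))

QUERIES = {
    "full": lambda db: frozenset(db),                        # the whole relation
    "names": lambda db: frozenset(name for name, _ in db),   # projection on name
}
```

With these definitions, "full" determines "names" but not the other way around, so a price list such as {"full": 10, "names": 15} would be flagged: a buyer could purchase "full" for 10 and derive "names" locally instead of paying 15. A price list with "names" at or below 10 passes this particular check.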