User 4XXXXX9: Anonymizing Query Logs
Eytan Adar
The recent release of the America Online (AOL) Query Logs
highlighted the remarkable amount of private and identifying
information that users are willing to reveal to a search engine.
The release of these types of log files therefore represents a
significant liability and compromise of user privacy. However,
without such data the academic community suffers greatly in its
ability to conduct research on real search engines. This paper
proposes two specific solutions (rather than an overly general
framework) that attempt to balance the needs of certain types of
research while preserving individual privacy. The first solution, based on a
threshold cryptography system, eliminates highly identifying
queries, in real time, without preserving history or statistics about
previous behavior. The second solution attempts to deal with sets
of queries that, when taken in aggregate, are overly identifying.
Both are novel and represent additional options for data
anonymization.
To appear at the Query Log Workshop, WWW'07. PDF (105K)