Exploring Rated Datasets with Rating Maps. Proceedings of the 26th International Conference on World Wide Web

Sihem Amer-Yahia, Sofia Kleisarchaki, Naresh Kumar Kolloju, Laks V.S. Lakshmanan, Ruben H. Zamar. pp. 1411-19. doi: 10.1145/3038912.3052623

Online rated datasets have become a source for large-scale population studies for analysts and a means for end-users to accomplish routine tasks such as finding a book club. Existing systems, however, provide only limited insights into the opinions of different segments of the rater population. In this paper, we develop a framework for finding and exploring population segments and their opinions. We propose rating maps, a collection of (population segment, rating distribution) pairs, where a segment, e.g., "18-29 year old males in CA", has a rating distribution in the form of a histogram that aggregates its ratings for a set of items (e.g., movies starring Russell Crowe). We formalize the problem of building rating maps dynamically given desired input distributions. Our problem raises two challenges: (i) the choice of an appropriate measure for comparing rating distributions, and (ii) the design of efficient algorithms to find segments. We show that the Earth Mover’s Distance (EMD) is well adapted to comparing rating distributions and prove that finding segments whose rating distribution is close to input ones is NP-complete. We propose an efficient algorithm for building Partition Decision Trees and heuristics for combining the resulting partitions to further improve their quality. Our experiments on real and synthetic datasets validate the utility of rating maps for both analysts and end-users.
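To make the role of EMD concrete, the following is a minimal sketch of how two rating histograms might be compared. It is not the paper's implementation: it assumes an ordinal rating scale with a unit ground distance between adjacent rating values and normalizes each histogram to sum to 1, in which case EMD reduces to the sum of absolute differences between the two cumulative distributions. The function names `normalize` and `emd_1d` are hypothetical.

```python
from itertools import accumulate


def normalize(counts):
    """Turn raw rating counts into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one rating")
    return [c / total for c in counts]


def emd_1d(hist_a, hist_b):
    """EMD between two rating histograms over the same ordinal scale.

    Assumption: moving one unit of mass between adjacent rating values
    costs 1. Under that assumption, the 1-D EMD equals the L1 distance
    between the cumulative distributions.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must use the same rating scale")
    cdf_a = accumulate(normalize(hist_a))
    cdf_b = accumulate(normalize(hist_b))
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))


if __name__ == "__main__":
    # Counts of 1-star .. 5-star ratings for three illustrative segments.
    all_one_star = [10, 0, 0, 0, 0]
    all_two_star = [0, 10, 0, 0, 0]
    all_five_star = [0, 0, 0, 0, 10]

    print(emd_1d(all_one_star, all_two_star))   # 1.0
    print(emd_1d(all_one_star, all_five_star))  # 4.0
```

Unlike a bin-by-bin measure, this distance grows with how far apart the ratings are on the scale, so a segment that rates everything 1 star is closer to one that rates everything 2 stars than to one that rates everything 5 stars.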

Published on August 23, 2018