Personalization From Incomplete Data
(Research Seminar, November 14th, 2001)
Balaji Padmanabhan
The Wharton School, University of Pennsylvania
Abstract
Clickstream data collected at any web site (site-centric data) is inherently incomplete, since it does not capture users' browsing behavior across sites (user-centric data). Hence, models learned from such data may be subject to limitations, the nature of which has not been well studied. Understanding these limitations is particularly important since most current personalization techniques are based on site-centric data only. In this paper, we examine the implications of learning from incomplete data in the context of two specific problems: (a) predicting whether the remainder of a given session will result in a purchase and (b) predicting whether a given user will make a purchase in any future session. Using user-level clickstream data covering the browsing behavior of 20,000 users, we demonstrate that models built on user-centric data outperform models built on site-centric data for both prediction tasks. We discuss implications and present initial approaches to an unconventional type of missing data problem: one in which entire attribute sets are unknown and their values can be obtained only at a cost.