Importance Sampled Learning Ensembles
(Research Seminar, September 21st, 2004)

Jerome Friedman
Stanford University

Abstract
Learning a function of many arguments is viewed from the perspective of high-dimensional numerical quadrature. It is shown that many of the popular ensemble learning procedures can be cast in this framework. In particular, randomized methods, including bagging and random forests, are seen to correspond to Monte Carlo integration methods, each based on a particular importance sampling strategy. Non-random boosting methods are seen to correspond to deterministic quasi-Monte Carlo integration techniques. This view helps explain some of their properties and suggests modifications to them that can substantially improve their accuracy while dramatically improving computational performance.
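
To make the quadrature analogy concrete, the following is a minimal illustrative sketch (not the procedure presented in the talk) of the two ingredients the abstract describes: base learners generated by randomized sampling of the function space, and their combination weighted like quadrature weights via a regularized linear fit. The use of scikit-learn, the subsample fraction, tree size, and lasso penalty are all assumptions for illustration.

import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

# Step 1: "importance sampling" of base learners -- each small tree is fit to
# a random half of the data, spreading the sampled functions over the space.
# (Illustrative choices: 200 trees, 6 leaf nodes, 50% subsamples.)
ensemble = []
for _ in range(200):
    idx = rng.choice(len(X), size=len(X) // 2, replace=False)
    tree = DecisionTreeRegressor(max_leaf_nodes=6, random_state=0)
    tree.fit(X[idx], y[idx])
    ensemble.append(tree)

# Step 2: "quadrature weights" -- instead of simple averaging, combine the
# sampled functions with a regularized (lasso) linear fit.
F = np.column_stack([t.predict(X) for t in ensemble])
combiner = Lasso(alpha=0.01, max_iter=10000).fit(F, y)

y_hat = combiner.predict(F)  # ensemble prediction on the training data
print("nonzero base learners:", np.count_nonzero(combiner.coef_))

In this sketch, bagging-style simple averaging would correspond to fixing all weights equal; the regularized fit instead lets the data select and weight the sampled base learners.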