Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests

Abstract

In this paper we present the practical benefits of a new random forest algorithm for handling missing values in the sample. It follows a previous paper that established the consistency of this algorithm. Many different missing-value mechanisms are considered and simulated, such as missing completely at random (MCAR), missing at random (MAR) and not missing at random (NMAR). We study the quadratic error of our algorithm and compare it with that of the most popular missing-value methods in the literature. A brief study of the algorithmic complexity is also given.
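The abstract does not specify how each mechanism is simulated. As an illustration only, the following Python sketch shows one simple way to generate a missingness mask on a covariate x under MCAR, MAR and NMAR, together with the quadratic (mean squared) error used to compare predictions. The names `simulate_mask` and `mse`, the deterministic "mask the k largest scores" rule and the 20% missing rate are hypothetical choices for this sketch, not the simulation design used in the paper; probabilistic (e.g. logistic) selection rules are also common.

```python
import random


def simulate_mask(x, y, mechanism, rate=0.2, seed=0):
    """Return a boolean list marking which entries of covariate x are missing.

    x is the covariate that receives missing values; y is a fully observed
    covariate. Exactly round(rate * len(x)) entries are masked.
    """
    n = len(x)
    k = round(rate * n)
    if mechanism == "MCAR":
        # MCAR: the missing entries are drawn uniformly at random,
        # independently of both x and y.
        chosen = set(random.Random(seed).sample(range(n), k))
    elif mechanism in ("MAR", "NMAR"):
        # MAR: missingness of x depends only on the observed covariate y.
        # NMAR: missingness of x depends on the unobserved value of x itself.
        scores = y if mechanism == "MAR" else x
        # Hypothetical rule: mask the k entries with the largest scores.
        chosen = set(sorted(range(n), key=lambda i: scores[i], reverse=True)[:k])
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return [i in chosen for i in range(n)]


def mse(y_true, y_pred):
    """Quadratic (mean squared) error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    rng = random.Random(1)
    y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    x = [yi + rng.gauss(0.0, 1.0) for yi in y]  # x is correlated with y
    for mech in ("MCAR", "MAR", "NMAR"):
        mask = simulate_mask(x, y, mech)
        observed = [xi for xi, m in zip(x, mask) if not m]
        # Mean of the observed x values; MAR and NMAR tend to lower it here.
        print(mech, sum(mask), sum(observed) / len(observed))
```

Because x is correlated with y in this toy example, removing the largest values of y (MAR) or of x itself (NMAR) biases the mean of the remaining observed x values, whereas under MCAR the observed values stay representative in expectation. This is why the mechanism matters when comparing methods by their quadratic error.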