On the Consistency of a Random Forest Algorithm in the Presence of Missing Entries

Abstract

This paper tackles the problem of constructing a non-parametric predictor when the latent variables are incompletely observed. A convenient predictor for this task is the random forest algorithm used in conjunction with the so-called CART criterion. The proposed technique performs a partial imputation of the missing values in the data set in a way that yields both a consistent estimator of the regression function and a partial recovery of the missing values. A proof of the consistency of the random forest estimator is given in the case where each latent variable is missing completely at random (MCAR).