“Data Monkeys”: A Procedural Model of Extrapolation from Partial Statistics

I present a behavioral model of a “data analyst” who extrapolates a fully specified probability distribution over observable variables from a collection of statistical datasets that cover partially overlapping sets of variables. The analyst employs an iterative extrapolation procedure whose individual rounds are akin to the stochastic-regression method of imputing missing data. Users of the procedure’s output fail to distinguish between raw and imputed data, so the extrapolated distribution functions as their practical belief. I characterize the ways in which this belief distorts the correlation structure of the underlying data-generating process, focusing on cases in which the distortion can be described as the imposition of a causal model (represented by a directed acyclic graph over observable variables) on the true distribution.
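To make the idea of a single stochastic-regression round concrete, here is a minimal sketch under loudly stated assumptions. It is not the paper's procedure. It is one round with two hypothetical datasets and a linear-Gaussian data-generating process. In that process, X is a common cause of Y and Z. Dataset 1 records only (X, Y), and dataset 2 records only (Y, Z). The analyst fits a regression of Z on Y using dataset 2. For each row of dataset 1, the analyst imputes Z as the fitted value plus a random residual. Because the imputed Z depends on X only through Y, the completed data behave as if the chain X → Y → Z held. The function names `sample_true`, `fit_ols` and `stochastic_regression_impute` are illustrative inventions.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def fit_ols(x, y):
    """OLS of y on x; returns (intercept, slope, residual standard deviation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [v - (intercept + slope * u) for u, v in zip(x, y)]
    sigma = math.sqrt(sum(r * r for r in resid) / (n - 2))
    return intercept, slope, sigma


def sample_true(n, rng):
    """Hypothetical true DGP: X is a common cause of Y and Z."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)
        z = x + rng.gauss(0, 1)
        rows.append((x, y, z))
    return rows


def stochastic_regression_impute(d_xy, d_yz, rng):
    """One round: impute the missing Z in the (X, Y) data.

    Each imputed Z is a regression prediction from Y plus a random
    residual, with the regression fitted on the (Y, Z) data.
    """
    ys, zs = zip(*d_yz)
    a, b, s = fit_ols(ys, zs)
    return [(x, y, a + b * y + rng.gauss(0, s)) for x, y in d_xy]


rng = random.Random(0)
n = 20000
full = sample_true(n, rng)
d_xy = [(x, y) for x, y, _ in full]                     # dataset 1: X, Y only
d_yz = [(y, z) for _, y, z in sample_true(n, rng)]      # dataset 2: Y, Z only
completed = stochastic_regression_impute(d_xy, d_yz, rng)

true_xz = corr([r[0] for r in full], [r[2] for r in full])
imputed_xz = corr([r[0] for r in completed], [r[2] for r in completed])
print(round(true_xz, 2), round(imputed_xz, 2))
```

In this assumed model, the population correlation between X and Z is 1/√2 ≈ 0.71. In the completed data, it instead equals corr(X, Y) · corr(Y, Z) = (1/√2)(1/2) ≈ 0.35, which is the value implied by the chain. The printed sample values should land close to these two numbers. The drop illustrates how treating imputed entries as raw data can substitute a causal structure for the true correlation pattern.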

Advance Access
