Data-intensive Innovation and the State: Evidence from AI Firms in China

Martin Beraja (Massachusetts Institute of Technology), David Y. Yang (Harvard University), Noam Yuchtman (London School of Economics)

Developing AI technology requires data. In many domains, the data held by governments far exceeds, in both magnitude and scope, the data collected by the private sector, and AI firms often gain access to such data when providing services to the state. We argue that such access can stimulate commercial AI innovation, in part because data and trained algorithms are shareable across government and commercial uses. We gather comprehensive information on firms and public security procurement contracts in China’s facial recognition AI industry. We quantify the data accessible through contracts by measuring public security agencies’ capacity to collect surveillance video. Using a triple-differences strategy, we find that data-rich contracts, compared to data-scarce ones, lead recipient firms to develop significantly and substantially more commercial AI software. Our analysis suggests that government data contributed to the rise of China’s facial recognition AI firms and that states’ data collection and provision policies could shape AI innovation.
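To make the triple-differences logic concrete, the following is a minimal sketch of the estimator in its simplest 2×2×2 form, computed from cell means. It assumes, for illustration only, that the three dimensions are contract type (data-rich vs. data-scarce), period (after vs. before the contract), and software type (commercial vs. government). That choice of the third dimension, the names `CELL_MEANS`, `diff_in_diff`, and `triple_diff`, and all numbers are hypothetical; they are not the paper's specification or results. In practice such an estimate would typically come from a regression with fixed effects and controls, which this sketch omits.

```python
# Hypothetical toy cell means of software output, keyed by
# (contract_type, period, software_type). Invented for illustration only;
# these are NOT estimates from the paper.
CELL_MEANS = {
    ("data_rich", "pre", "commercial"): 3.0,
    ("data_rich", "post", "commercial"): 9.0,
    ("data_rich", "pre", "government"): 2.0,
    ("data_rich", "post", "government"): 4.0,
    ("data_scarce", "pre", "commercial"): 3.0,
    ("data_scarce", "post", "commercial"): 5.0,
    ("data_scarce", "pre", "government"): 2.0,
    ("data_scarce", "post", "government"): 3.0,
}


def diff_in_diff(cells: dict, contract_type: str) -> float:
    """Within one contract type: the pre-to-post change in commercial software
    minus the pre-to-post change in government software (assumed comparison)."""
    d_commercial = (cells[(contract_type, "post", "commercial")]
                    - cells[(contract_type, "pre", "commercial")])
    d_government = (cells[(contract_type, "post", "government")]
                    - cells[(contract_type, "pre", "government")])
    return d_commercial - d_government


def triple_diff(cells: dict) -> float:
    """Difference-in-differences for data-rich contracts minus that for
    data-scarce contracts: the triple-differences estimate."""
    return diff_in_diff(cells, "data_rich") - diff_in_diff(cells, "data_scarce")


if __name__ == "__main__":
    print(diff_in_diff(CELL_MEANS, "data_rich"))    # 4.0
    print(diff_in_diff(CELL_MEANS, "data_scarce"))  # 1.0
    print(triple_diff(CELL_MEANS))                  # 3.0
```

The third difference nets out any change common to all software produced by firms receiving a given contract type, so that, under this toy setup, only the extra growth in commercial software associated with data-rich contracts remains.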