ML approaches are primarily intended for prediction tasks, where the aim is to obtain accurate predictions, while in econometrics we are usually interested in obtaining reliable estimates of marginal effects. For example, Burlig et al. Many of the newest advances in machine learning are in the area of DL (LeCun, Bengio and Hinton, 2015). In the case of randomised treatment, one can use a variety of ML algorithms to identify the most relevant groups over which to estimate heterogeneous treatment effects (Chernozhukov et al., 2018b). Our current methods limit the full use of novel unstructured data sources, such as remote sensing images, cellular phone records or text from news and social media. When treatment is determined on observables, several authors also use ML approaches for matching to control non-linearly for selection into treatment (Nichols and McBride, 2019). Athey, Tibshirani and Wager (2019) extend their method of generalised random forests to estimate heterogeneous treatment effects with instrumental variables. Different versions of matching (nearest neighbour versus propensity score, for example) are simply different ways of collapsing a multi-dimensional object, made up of several matching variables, into a one-dimensional measure of proximity. Economists are trained to think about these selection problems, and theoretical knowledge is useful to assess their importance and to handle them. Typical applications are image classification or object recognition. Then the model’s predictions at these prototypes and criticisms are compared to their actual outcomes.
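To illustrate how matching collapses several matching variables into a one-dimensional measure of proximity, the following minimal sketch implements nearest-neighbour matching with Euclidean distance. The unit labels and covariate values are hypothetical and chosen only for illustration; a propensity score approach would instead use the absolute difference in estimated treatment probabilities as the distance.

```python
import math

def nearest_neighbour_match(treated, controls):
    """For each treated unit, return the control unit that is closest
    in Euclidean distance over all matching variables."""
    matches = {}
    for tid, tx in treated.items():
        # math.dist collapses the multi-dimensional covariates into one number
        matches[tid] = min(controls, key=lambda cid: math.dist(tx, controls[cid]))
    return matches

# Hypothetical units, each described by two matching variables
treated = {"t1": (1.0, 2.0), "t2": (5.0, 5.0)}
controls = {"c1": (1.1, 2.1), "c2": (4.0, 6.0), "c3": (9.0, 9.0)}

matches = nearest_neighbour_match(treated, controls)
```

In this sketch matching is done with replacement, so the same control unit may serve as the match for several treated units.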
Interpretability is also crucial to assess whether ML algorithms are discriminatory, for example when used by banks to determine lending or to give guidance for sentencing to courts (Molnar, 2018). Regularisation, in ML terms, controls the complexity (or capacity) of a model. We place particular emphasis on NNs because, despite holding significant potential for capturing complex spatial and temporal relationships, they are still not widely used in economic analysis. Second, since the early 2000s, the use of multi-processor graphic cards (graphics processing units, or GPUs) has greatly sped up computer learning (Schmidhuber, 2015), and many ML methods can be parallelised and exploit the potential of GPUs. The aim is to have a precise and unbiased prediction. This approach can be more efficient than a stacked autoencoder. For example, imagine one wants to estimate the effect of a fertiliser subsidy on the yield of crops. Further, empirical calibration of equilibrium models or ABMs is challenging. Even though they are well equipped to capture non-linearities, they are limited in capturing truly linear or smooth functions since, by construction, the resulting model is a step function. The depth of a tree describes the number of splits, or nodes. One of the primary approaches for interpretation is to plot the implicit marginal effects of one or more specific characteristics, as is often done when interpreting output from tobit or logit models. In Section 2.1, we present the ML approach to predictive accuracy and to control for overfitting. Like random forests, causal forests choose covariates for the weighting depending on their predictive ability, and thus are robust to the addition of uninformative covariates. Goodfellow et al. Second, topic models are an unsupervised learning approach, where topics are unobserved and modelled as a weighted cluster of words or phrases that commonly appear together (see Blei (2012) for an overview).
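The step-function property of trees noted above can be made concrete with a minimal sketch of a regression tree on a single covariate. The greedy squared-error splitting rule, the function names and the toy data are assumptions made for illustration, and `depth` here counts levels of splits.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(xs, ys, depth):
    """Grow a regression tree by greedy splits that minimise squared error."""
    if depth == 0 or len(set(xs)) < 2:
        return mean(ys)  # leaf: a constant prediction
    best = None
    for c in sorted(set(xs))[1:]:  # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x < c]
        right = [y for x, y in zip(xs, ys) if x >= c]
        loss = sse(left) + sse(right)
        if best is None or loss < best[0]:
            best = (loss, c)
    c = best[1]
    lo = [(x, y) for x, y in zip(xs, ys) if x < c]
    hi = [(x, y) for x, y in zip(xs, ys) if x >= c]
    return (c,
            fit_tree([x for x, _ in lo], [y for _, y in lo], depth - 1),
            fit_tree([x for x, _ in hi], [y for _, y in hi], depth - 1))

def predict(tree, x):
    """Follow the sequential splits until a leaf (a float) is reached."""
    while isinstance(tree, tuple):
        c, left, right = tree
        tree = left if x < c else right
    return tree

# Toy data: an exactly linear relationship y = 2x
xs = [i / 10 for i in range(20)]
ys = [2 * x for x in xs]

tree = fit_tree(xs, ys, depth=2)
preds = [predict(tree, x) for x in xs]
n_levels = len(set(preds))  # number of distinct predicted values
max_error = max(abs(p - y) for p, y in zip(preds, ys))
```

Even though the data are exactly linear, the fitted tree can only return one constant per leaf, so its predictions take a handful of distinct values and the approximation error is non-zero. Deeper trees or more data shrink the steps but never remove them.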
Even with lots of data, the information contained in the data might be insufficient for prediction or identification, for example when dealing with rare events, when the variation in the outcome variable is small, or if outcomes are very noisy. The model (or set of tuning parameters) with the lowest expected out-of-sample prediction error is then chosen as the final model. This limitation considerably restricts their ability to represent heterogeneous responses to changes in the economic environment. Economic theory often provides information on the curvature of behavioural functions (production frontiers, profit functions) or the sign of marginal effects. Questions such as estimating land use choices driven by climate change, or estimating nutrient emissions over space could significantly benefit from allowing for more complexity in the biophysical components of our models. Uncertainty estimates are usually not obtained for ML methods, which is a substantial limitation of the approach and is an area of active research (see Section 5). Why is this relevant? (2016) provide a recent textbook on NNs, particularly deep neural networks (DNN), which is the basis for this section. The choice of model complexity should depend on the phenomenon under study and the specific research question. While interpretability is fundamental for causal analysis, it can also be helpful for pure prediction tasks. The model with the lowest out-of-sample prediction error in the validation set is then selected. Liu et al. While novel data sources hold exciting potential, they often come with issues of selection bias.
Apart from econometric applications, our profession also intensively uses computational simulation models, particularly for policy analysis. ML can also improve the analysis of text data. The interplay between generator and discriminator algorithms would allow the approach to learn what features matter in distinguishing model outcomes from observations and to exploit complex data structures for this purpose. If treatment selection is based on time-invariant unobservables and one observes the treated units before treatment, one can simply apply a difference-in-differences approach with unit fixed effects. Recent examples of such approaches in our field are random coefficient models (Michler et al., 2019), quantile regression models (D’souza and Jolliffe, 2013; Lehn and Bahrs, 2018) or mixture models (Saint-Cyr et al., 2019). Typically, economic theory and domain knowledge provide only weak guidance for selecting the specific variables that should be included in the model. There are many approaches for causal inference, and many excellent discussions of them exist (Angrist and Pischke, 2008). However, with sufficient data they can approximate any linear or smooth function arbitrarily well and, importantly, without the need to assume an underlying structure ex-ante. Bevis and Villa (2017) use this approach to estimate long-run effects of maternal health on child outcomes, where they have a large number of potential instruments from weather outcomes during the mother’s early life. CNNs are well placed to process grid-like data such as 1D time-series data or 2D image data. We first introduce the key ML methods, drawing connections to econometric practice. Rußwurm and Körner (2017) use remote sensing data (Sentinel-2A images) as an input and a dataset of over 137,000 labelled fields in Bavaria, Germany to identify 19 field classes.
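The difference-in-differences logic for selection on time-invariant unobservables can be sketched with a noise-free toy panel. All unit labels, the unit effects `alpha`, the common trend and the treatment effect are hypothetical values chosen for illustration. In a balanced two-period panel such as this one, the difference in group mean changes coincides with the unit fixed effects estimate.

```python
def mean(values):
    return sum(values) / len(values)

def did(panel):
    """Difference-in-differences from rows of (unit, treated, post, outcome)."""
    def cell(d, t):
        return mean([y for _, di, ti, y in panel if di == d and ti == t])
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Hypothetical units: (id, treated, alpha), where alpha is an unobserved,
# time-invariant unit effect that is higher for treated units (selection)
units = [("a", 1, 5.0), ("b", 1, 6.0), ("c", 0, 1.0), ("d", 0, 2.0)]
TREND, EFFECT = 1.0, 2.0  # assumed common time trend and true treatment effect

panel = []
for uid, d, alpha in units:
    for t in (0, 1):
        y = alpha + TREND * t + EFFECT * d * t
        panel.append((uid, d, t, y))

did_estimate = did(panel)

# Naive comparison of treated and control units after treatment
naive = (mean([y for _, d, t, y in panel if d == 1 and t == 1])
         - mean([y for _, d, t, y in panel if d == 0 and t == 1]))
```

Because the unit effects cancel when each group is differenced over time, the sketch recovers the assumed effect of 2.0, whereas the naive post-treatment comparison of 6.0 also absorbs the difference in unit effects between the two groups.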
Mullally and Chakravarty (2018) apply this approach to estimate the effect of a groundwater irrigation programme in Nicaragua. We identify three different approaches that are of particular relevance to applied economists: (i) ensembles of trees, particularly gradient boosting approaches, (ii) NNs and (iii) variational inference methods. The current econometric toolbox already provides flexible models but in many cases computational demands limit their applicability for large datasets (large ‘N’) or high dimensional data (large ‘K’). For coefficients to deviate from zero, variables have to substantially contribute to predictive power. Their algorithm grows ‘honest’ trees, estimating the splits based on one subsample and the treatment effects based on another. CNNs get their name from the use of a convolutional operator in at least one of their layers, which is then called a convolutional layer. The research for this publication by Hugo Storm is funded by the Deutsche Forschungsgemeinschaft under grant no. We explore the potential of ML by first highlighting specific limitations of current econometric and simulation methods, and identify areas where ML approaches may help fill those gaps.
In this respect, econometrics has a natural role to play, as an approach that uses statistical methods and combines them with theoretical knowledge to answer economic questions. Intuitively, a convolutional layer in a time series model can be thought of as a collection of filters that are shifted across the time sequence; for example, one filter that detects cyclical behaviour and another that calculates a moving average. Other disciplines have actively debated the advantages and disadvantages of more flexible models such as neural networks. The model could thus learn that a weather event in the winter has a different effect from one in the spring. A good overview is presented by Molnar (2018), from which much of the following discussion is drawn. Thus, it is important that the test set is neither used for training nor for model selection. In a convolutional layer, by contrast, each unit looks only at a small fraction of units from the previous layer (thus, a sparse interconnection) and uses the same parameters at different locations (parameter sharing), thereby significantly reducing the number of parameters it needs to estimate. The terms ML, artificial intelligence (AI) and deep learning (DL) are often used interchangeably. Established approaches include approximations using polynomial models, radial basis function models, kriging, multivariate adaptive regression splines and support vector machines (Forrester, Sobester and Keane, 2008; Kleijnen, 2009). On one hand, de Marchi et al. Choosing a model that cannot capture non-linearities, interactions or heterogeneity and distributional effects might result in misspecification bias. Decision trees can be used for both classification and regression. These cases are considered in Section 3.3. For the case where treatment is assigned based on a complex or non-linear combination of observables, ML may help flexibly model selection.
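The intuition of a convolutional layer as filters shifted across a time sequence can be sketched in a few lines. The filter weights below are fixed by hand purely for illustration, whereas in a CNN they would be learned from data; the series and the names are likewise hypothetical.

```python
def conv1d(series, kernel):
    """Slide the kernel across the series and take the weighted sum at each
    position (the 'valid' cross-correlation used in convolutional layers)."""
    k = len(kernel)
    return [
        sum(w * series[t + j] for j, w in enumerate(kernel))
        for t in range(len(series) - k + 1)
    ]

series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# One filter computes a three-period moving average ...
moving_average = conv1d(series, [1 / 3, 1 / 3, 1 / 3])
# ... another computes period-to-period changes
first_difference = conv1d(series, [-1.0, 1.0])
```

Each output value depends on only a few neighbouring inputs (sparse interconnection), and the same two or three weights are reused at every position (parameter sharing), so the number of parameters does not grow with the length of the series.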
For example, the effects of weather variables on yield (Schlenker and Roberts, 2009), of groundwater extraction on pumping costs (Burness and Brill, 2001) and of pollution on health (Graff Zivin and Neidell, 2013) are all likely to contain non-linearities. While our traditional methods have allowed us to approach these questions, ML increases the flexibility with respect to both data and functional form, as well as processing efficiency, opening up other avenues for analysis. Ensemble approaches such as random forests or gradient boosted trees combine the results of multiple trees in order to improve prediction accuracy and to reduce variance, at the cost of easy interpretability. Section 3 then takes a closer look at limitations of our current set of econometric tools and simulation methods, and explores to what extent ML approaches can overcome them. Once the tree is ‘grown’, one can use it to predict an outcome based on which side of each sequential split that observation’s covariates fall. Often, we are interested in estimating specific aspects of heterogeneity. The created features are then used in a shrinkage regression to select the most promising features. Hinton, Osindero and Teh (2006) use unsupervised pre-training (or greedy layer-wise training) to successfully train the first DNN. Random forests average the results of many deep trees grown on random subsamples of observations and subsets of variables. Prior work uses nightlight intensity (lumens per pixel) directly to predict poverty and economic activity (Blumenstock, 2016; Bruederle and Hodler, 2018). Thus, the autoencoder is forced to learn an internal representation of x in a lower dimensional space.
Reinforcement approaches are relevant in situations where a function can be specified that provides a reward for a chosen action in a given situation. roofing material) that have a certain relationship to income or expenditure and that these relationships also extend to poorer regions. into more complex structures (e.g. The final model is then estimated using the entire data set. Importantly, a model that is unbiased in terms of the prediction might not necessarily be unbiased in terms of the coefficients. Training can stop here or it is possible to refine model parameters of all layers in a final supervised training step using the labelled data. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License. Causal forests are able to consistently estimate heterogeneous treatment effects under unconfoundedness. Carter, Tjernström and Toledo (2019) use generalised random forests to estimate heterogeneous effects of a randomised small business programme in Nicaragua on farmer outcomes and find the largest effects for disadvantaged households. While defining the rules requires more ‘hand crafting’ in comparison to end-to-end learning, transfer learning or unsupervised pre-training, it seems to hold potential in situations with particularly complex input data such as network data, trajectories, phone records or household level transactional scanner data. Estimating causal effects is easiest when treatment is exogenous. Additionally, in an unsupervised pre-training approach we usually do not stop with the autoencoder trained in an unsupervised way but rather use the trained weights as starting values for a supervised training in which all weights, including weights from the earlier layers, can be adjusted in a next step; hence the name ‘pre-training’. GANs train a generator, such as for images, together with a discriminator model.
The test set is then finally used to assess the out-of-sample prediction error of the selected model. As noted above, many phenomena in agricultural and environmental economics are inherently non-linear, resulting from underlying biophysical, social or economic processes. As with any other supervised approach, including a classical regression, NNs are simply a mapping y=f(x;θ) from an input vector x to an output vector y, governed by unknown parameters θ. Until recently, text analysis largely used hand-crafted features, but the unstructured nature of the data and the predictive nature of many of the research questions lend themselves to ML. RNNs and CNNs are well-placed to handle large K, and are particularly applicable in cases where observations are misaligned in space or time. When treatment is determined by observables, one can either explicitly model the selection process or match treatment and control on observables that determine treatment. Spline models, kernel and locally weighted regression models and GAMs add even more flexibility but their application is usually restricted to a limited number of explanatory variables (see Hastie, Tibshirani and Friedman (2009) for a detailed treatment of these methods, and Cooper, Nam Tran and Wallander (2017), Lence (2009) and Chang and Lin (2016) for applications from our field, Halleck Vega and Elhorst (2015) for flexible parametric specification and McMillen (2012) for semi-parametric approaches in spatial econometrics). One central challenge facing ML is to unite data-driven ML methods with the amassed theoretical disciplinary knowledge (Karpatne et al., 2017). Karpatne et al. Why introduce ML to agricultural and applied economics now?
ML approaches can play an important role in making information from unstructured data sources available for economic analysis with an algorithmic approach. As an extreme case of regularisation, think about predicting the outcome to be a constant, irrespective of explanatory variables. Given machine learning’s powerful abilities to derive information from data, merely removing personal identifiers has been shown to be insufficient to preserve participants’ identities. First, data availability has dramatically increased in many different areas, including agriculture, environment and development (Shekhar et al., 2017; Coble et al., 2018). As soon as we aim to reflect non-linearities, interactions or heterogeneity, model interpretation becomes more difficult. The latter aims to find representations of the input data that can recover the information in the input data as accurately as possible (i.e. The predictive ability of machine learning in complex and high-dimensional settings can also be used to improve causal estimates. Such additional structural information may especially help in situations with limited data availability and complex interactive relationships between features. The general approach is to split the sample into k parts, each with an equal number of observations. Using these splits, we then estimate our chosen model k times; each time we use all the data except one of the k parts that we leave out. Head et al. For tree-based models the relative importance of predictor variables can be assessed by ranking the importance of different predictors (Hastie, Tibshirani and Friedman, 2009: 367).
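The k-fold procedure described above can be sketched as follows, comparing a simple linear regression with the extreme case of regularisation in which the prediction is a constant. The simulated data, the function names and the choice of k = 5 are assumptions made for illustration only.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k parts of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_ols(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_constant(xs, ys):
    """Extreme regularisation: predict the sample mean, ignoring x."""
    return sum(ys) / len(ys), 0.0

def cv_mse(xs, ys, fit, k=5, seed=0):
    """Estimate the model k times, each time leaving one part out, and
    average the squared prediction errors on the left-out parts."""
    total, count = 0.0, 0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            total += (ys[i] - (a + b * xs[i])) ** 2
            count += 1
    return total / count

# Simulated data (hypothetical): y = 2 + 3x + noise
rng = random.Random(42)
xs = [rng.uniform(0, 1) for _ in range(200)]
ys = [2 + 3 * x + rng.gauss(0, 0.1) for x in xs]

folds = k_fold_indices(len(xs), 5)
mse_linear = cv_mse(xs, ys, fit_ols)
mse_constant = cv_mse(xs, ys, fit_constant)
best = "linear" if mse_linear < mse_constant else "constant"
```

The model with the lower cross-validated error would be selected and then re-estimated on the full sample. A separate test set that played no part in this selection is still needed for an honest estimate of out-of-sample error.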
No new methodology and no volume of data will change the fact that this approach only consistently identifies the treatment effect if treatment has been exogenously assigned to the units of observation. DML combines the predictive power of ML with an approach to address regularisation bias.