Sansan Tech Blog

Sharing the technology, design, and product management insights of the members who support Sansan's product development.

Economics Meets Data Science: The Structural Estimation Series, Part I

Hey there! I'm Juan (ファン), a researcher at DSOC's Social Science Team. Since this is probably the first time you're reading one of my posts, I thought I'd briefly introduce myself.
I was born in El Salvador, Central America, which explains why you'll find some Español here and there in my posts. I majored in Economics as an undergraduate back home, came to Japan in 2011, completed a PhD in Economics and joined Sansan in early 2019. Most of my research concentrates on Labor and Education Economics, both in Japan and in developing countries. As it turns out, I also worked as a software engineer in the Fintech industry for almost three years, where I spent most of my time figuring out why my tests were failing, why my Scala code wouldn't compile and why CSS hated me... you know, programmer stuff. Currently I work mostly on app development and on research about business network formation.


But enough about me, let's talk about you. There's a good chance that you clicked on this post because you read the words Structural Estimation. If you're familiar with Econometrics, I'm pretty sure you've heard the term somewhere; but maybe your background is in a different field, in which case you might be wondering what it is and why you should be interested.

For Social Scientists working in Data Science, reduced form estimation via linear regression is the most popular approach. In general, most of the attention is placed on obtaining "significant results". Often, adjustments are made to fix standard error estimators, to make the best use of panel data to control for certain types of unobservable factors, and even to ensure that the independent variables really are exogenous. Linear regression is well known, simple to understand and available in most popular statistical packages. Reduced form estimation does not require you to specify the data generating process, and is therefore less demanding on the economic theory front.

However, especially if you deal with human behavior, there's a limit to what you can achieve with this approach. Causality does not come without effort, and even if you turn to Instrumental Variables estimation and experiments, nothing assures you that your conclusions will hold up in the real world. But more importantly, there's that feeling when you see the regression coefficient of some variable of interest and can't really understand its meaning. All of this has implications for your business. If you use your estimates to assess the impact of marketing campaigns, failing to obtain causal effects can lead you to make expensive mistakes. In the case of web apps, it is important to take into account that sample selection bias is the rule rather than the exception (nobody chooses their customers by random sampling). If your customer base comes from a very specific cluster of the market, reduced form estimates and even Machine Learning predictions won't be very helpful for forecasting your revenue if you were to capture a different segment.

Structural Estimation is a well-studied technique that makes it possible to obtain robust conclusions and predictions from your data. Some even call it the Holy Grail of causal inference. And although it offers several advantages, many people who are interested in using it for research and work don't really know where to start. I can clearly see why: the literature is vast, without a clear starting point, and it requires a wide range of skills, from modeling to numerical methods and calculus, so the cost of entry is relatively high. In this series of posts I attempt to lower that cost. I will concentrate on explaining the concept, introducing its use cases and benefits, reviewing some of the most famous methodologies and recent advances, discussing implementation details and pointing to some useful resources. I hope that after going through these posts you'll be in a better position to incorporate these techniques into your research.

What is Structural Estimation?

Economists believe that human behavior follows some sort of mechanics. Even Behavioral Economists believe that humans are predictably irrational. Mainstream Economics sees choice behavior as a decision process that balances two facts: that stuff is scarce, and that human desire isn't. Because of this, humans must behave in a way that benefits them the most given the limitations of the environment. Basic Microeconomics approaches choice behavior by assuming that people compare baskets of stuff: gym time, food, risk, etc. People with different preferences might prefer different baskets of stuff over others. As you know very well, we humans don't just make choices at random, and are consistent to some degree in what we prefer. I, personally, like tonkatsu more than natto, and that won't change all of a sudden (believe me). Economists make some assumptions about this process in order to make it friendlier to mathematical analysis. Consider these two assumptions about preferences:

  1. Individuals can compare any pair of bundles of stuff A and B, deciding whether they are indifferent between both or whether they prefer one over the other.
  2. An individual who prefers A over B, and B over C also prefers A over C.

When you hear the term rational preferences, it usually refers to a world where both of these assumptions hold. What is important about rationality is that it allows us to express preferences as a mapping from bundles of "goods" to the real numbers. This mapping is what Economists call the Utility Function. The following is an example of a utility function over the chosen levels of two goods, X and Y:


U = \alpha X^{\beta}Y^{\gamma}

Utility functions can be constructed both for people and for more abstract entities, such as companies (which in the end are managed by people). Usually there's a restriction on the amounts of X and Y that can be put into the utility function. That restriction is called the budget constraint, which is a function of the prices of the goods and the resources available to the agent. The decision process of the agent can then be thought of as maximizing the utility function subject to the budget constraint. Although the concept is very simple, understanding the shape of the utility function behind some observed behavior can yield very useful insights. For example, knowing whether two goods are complements or substitutes can help you better allocate merchandise if you're running a supermarket. Also, understanding the price elasticity of some good (how sensitive the quantity consumed is to changes in its price) can help you make pricing decisions: raise the price of an inelastic good and your revenue will go up, but raise it too much and it may even go down!
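
To make this concrete, here is a minimal sketch in Python of the maximization problem just described: a Cobb-Douglas utility maximized subject to a budget constraint. All parameter values, prices and income below are hypothetical, chosen purely for illustration, and the closed-form Cobb-Douglas solution is used only as a sanity check.

```python
from scipy.optimize import minimize

# Hypothetical structural parameters and market environment (illustration only)
alpha, beta, gamma = 1.0, 0.6, 0.4
p_x, p_y, income = 2.0, 5.0, 100.0

def negative_utility(q):
    """Negative of U = alpha * X^beta * Y^gamma (minimizing the negative maximizes U)."""
    x, y = q
    return -alpha * x**beta * y**gamma

# Budget constraint: p_x * X + p_y * Y <= income
budget = {"type": "ineq", "fun": lambda q: income - p_x * q[0] - p_y * q[1]}
bounds = [(1e-6, None), (1e-6, None)]  # quantities must stay strictly positive

result = minimize(negative_utility, x0=[1.0, 1.0], bounds=bounds, constraints=[budget])
x_star, y_star = result.x

# Cobb-Douglas has a closed-form optimum, handy as a sanity check:
# X* = beta / (beta + gamma) * income / p_x, and analogously for Y*
print(x_star, beta / (beta + gamma) * income / p_x)   # ~30.0 vs 30.0
print(y_star, gamma / (beta + gamma) * income / p_y)  # ~8.0 vs 8.0
```

Solving a consumer problem like this one, over and over for different candidate parameter values, is essentially what many structural estimators do under the hood.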

Now imagine that you have data on bundles of stuff and on the choices of some group of agents across time, and you want to recover the utility function that produces those data. What you're attempting is Structural Estimation. Assuming that the utility function presented above accurately describes the preferences of the target individuals, all you'd need to do is estimate \alpha, \beta and \gamma, the structural parameters.
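
In this frictionless toy world, the estimation would be almost trivial: with Cobb-Douglas preferences the optimal expenditure share on X is \beta / (\beta + \gamma), so a few lines of Python with simulated data (hypothetical values again) recover that ratio directly. Note that \alpha cannot be recovered at all, because utility is ordinal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate hypothetical expenditure data from Cobb-Douglas agents with
# (beta, gamma) = (0.6, 0.4): the optimal expenditure share on X is beta / (beta + gamma) = 0.6
n = 1_000
income = rng.uniform(50, 150, n)
p_x = rng.uniform(1, 5, n)
share_x = 0.6 + rng.normal(0, 0.02, n)   # optimal share plus some measurement noise
x_demand = share_x * income / p_x        # implied quantity of X purchased

# "Structural estimation" here is a one-liner: average the observed expenditure shares
share_hat = np.mean(p_x * x_demand / income)
print(share_hat)  # ~0.6, i.e. beta / (beta + gamma)

# alpha is not identified from choices alone: rescaling U by any positive constant
# produces exactly the same behavior, so the data cannot pin it down.
```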

As it turns out, that is way harder than it sounds:

  • Observed choices can be the result of a bargaining process between competing agents (buyers and sellers, employers and workers, rival companies, etc.).
  • Agents make decisions taking into account private knowledge for which data is not available.
  • Individuals make decisions dynamically, taking into account the effect of their decisions today on their utility in the future, which is uncertain.

Because of this, even if the utility function is a very simple linear one, its parameters won't be identified by the coefficients of a linear regression in most cases. In fact, it might be the case that the parameters can't even be identified with the available data!

Due to these complications, many researchers avoid trying to estimate structural parameters altogether. Instead, they are happy to obtain Treatment Effects by employing Randomized Controlled Trials (RCTs), natural experiments and other econometric tricks. This approach is so popular that most of the standard techniques (Instrumental Variables, Propensity Scores, Difference in Differences, etc.) have implementations in statistical software packages such as Stata and R, so estimates can be obtained without much trouble.
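
For contrast, here is what that reduced-form workflow can look like when an instrument is at hand: a bare-bones two-stage least squares sketch on simulated data. It deliberately avoids the packaged implementations mentioned above, and every number is made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: x is endogenous (correlated with the error u), z is a valid instrument
n = 10_000
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # endogenous regressor
y = 2.0 * x + u                              # true coefficient on x is 2.0

# Two-stage least squares by hand (no intercepts; everything is mean zero by construction):
# the first stage projects x on z, the second stage regresses y on the fitted values
x_hat = z * (z @ x) / (z @ z)
beta_ols = (x @ y) / (x @ x)          # biased upward by the correlation with u
beta_iv = (x_hat @ y) / (x_hat @ x)   # recovers roughly 2.0
print(beta_ols, beta_iv)
```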

So Why Even Attempt Structural Estimation?

The problem with the techniques mentioned above is that their conclusions are not necessarily valid outside the context of the experiment that yields them. For example, imagine that you want to understand how crime affects the choice of school for children up to high school. You could exploit some exogenous variation in crime rates, for example a truce between gangs, to answer the question: "how would people behave if there were less crime?". If the result was, say, that students were more likely to enroll in more expensive schools for the duration of the truce, there's no way for you to prove that the same conclusion would hold if crime rates fell for any other reason! It could be that the truce raised enrollment in more expensive schools through some specific channel that would be absent in any other scenario. In econometric jargon, you'd say that your conclusions lack external validity.

However, structural estimates are externally valid, provided that the structural model closely resembles the true decision process. In fact, because you have estimated the actual process that generates the data, you're capable of creating counterfactual scenarios such as: what would happen if I ran marketing campaign B instead of campaign A, even when that scenario is not captured by the data. Furthermore, if you can closely estimate the true model, you can also make predictions outside the sample.

A great example is the paper by Todd and Wolpin (2006), which employs Structural Estimation to evaluate the effect of the PROGRESA school subsidy program in Mexico on educational outcomes and fertility. The program was rolled out by randomly dividing the target population into a control and a treatment group. Todd and Wolpin estimate the structural parameters of a complex utility function using only the control group, and then use those estimates to predict the outcomes of the treatment group with a good degree of precision. That's external validity at work! Furthermore, they use their estimates to evaluate what the effects would have been if the government had chosen completely different policy options, and to determine whether a different strategy could have yielded the same outcomes at a lower cost.

Why Isn't It More Popular?

Honestly, Structural Estimation is famous for being hard. Depending on the approach, the researcher may be required to:

  • Use Economic Theory to come up with a theoretical model that properly explains the behavior of interest. Recycling models does not work well here.
  • Employ Linear Algebra and Calculus to perform maximization and obtain moments for identification.
  • Use Statistics to evaluate the properties of the estimators.
  • Create efficient algorithms for identifying the parameters of interest using numerical methods (root-finding, fixed-point iteration, etc.); a minimal fixed-point sketch follows this list.
  • Write custom code to apply the algorithms to a dataset. Standard statistical packages usually don't offer a framework general enough to save you from coding to some extent.
  • Obtain the necessary data. In particular, dynamic decision processes make panel datasets a necessity, and multi-dimensional state-spaces require larger datasets due to the curse of dimensionality.
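
As a taste of the numerical-methods side, here is a minimal fixed-point iteration on the Bellman equation of a made-up dynamic discrete choice problem (keep a machine or replace it). The state space, flow utilities and discount factor are all invented for illustration.

```python
import numpy as np

# Toy dynamic problem (all numbers invented): state s = accumulated wear in {0, ..., S-1};
# action 0 = keep (flow utility -0.2 * s, wear increases by one step),
# action 1 = replace (flow utility -3.0, wear resets to 0). The future is discounted by beta.
S, beta = 20, 0.95
s = np.arange(S)
s_next = np.minimum(s + 1, S - 1)  # next period's wear if the agent keeps

def bellman(V):
    """One application of the Bellman operator: V'(s) = max_a { u(s, a) + beta * V(s'(s, a)) }."""
    keep = -0.2 * s + beta * V[s_next]
    replace = -3.0 + beta * V[0]
    return np.maximum(keep, replace)

# Fixed-point iteration: the Bellman operator is a contraction, so iterating converges
V = np.zeros(S)
for _ in range(10_000):
    V_new = bellman(V)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

# Optimal policy implied by the fixed point: replace once wear is high enough
policy = (-3.0 + beta * V[0] > -0.2 * s + beta * V[s_next]).astype(int)
print(policy)  # 0 = keep, 1 = replace
```

Nesting a solver like this inside an outer search over candidate parameters is the backbone of classic estimators such as Rust's nested fixed point algorithm.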

However, things have changed in recent years. Algorithms have become more efficient and less dependent on expensive operations, which reduces the need to implement complicated procedures. Less complexity leads to more intuitive estimation strategies. Parallelization and improvements in hardware mean that results can be obtained quickly in a common high-level language such as Python, considerably reducing the frustration factor. Finally, programming skills are more widespread than ever, freeing researchers from the linear regression bubble.

Why Might Machine Learning Engineers Be Interested?

I've been hearing about Neural Networks disrupting Econometrics since I was in grad school, and although the field is moving in that direction, it's not there yet, which means there's a good opportunity for new researchers to leave their mark. Machine Learning and Structural Estimation are a match made in heaven. Famous Artificial Intelligence projects such as Bonanza and AlphaGo can be interpreted as applying Structural Estimation methods to Discrete Choice problems, achieving super-human performance in complex games such as Shogi and Go.

Machine Learning is gradually becoming one more tool for Structural Estimation, and recent years have seen quite a bit of progress in this direction. In fact, big names like Chernozhukov, Rust, Keane and Ichimura, among others, have recently been involved in one way or another in blurring the border between Machine Learning and Structural Estimation, so I anticipate a lot of action on this front in the coming years.

How to Start?

I believe that the best way to reduce the cost of entry into Structural Estimation is to look at the big picture first. The book by Adda and Cooper (2003) offers a great introduction and discusses, with examples, three main families of methods used in the literature: Maximum Likelihood methods, the Generalized Method of Moments and Simulation-Based methods.
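
To give a flavor of the first family, here is a toy maximum likelihood example (not taken from the book): a single utility parameter of a binary logit choice model, estimated from simulated decisions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# Simulated discrete choices: the agent picks option 1 when beta * x plus a logistic
# taste shock beats option 0, so P(choose 1 | x) = 1 / (1 + exp(-beta * x)).
beta_true = 1.5
x = rng.normal(size=5_000)
choice = (rng.uniform(size=x.size) < 1 / (1 + np.exp(-beta_true * x))).astype(int)

def neg_log_likelihood(b):
    """Negative log-likelihood of the observed choices for a candidate beta."""
    p = 1 / (1 + np.exp(-b * x))
    return -np.sum(choice * np.log(p) + (1 - choice) * np.log(1 - p))

beta_hat = minimize_scalar(neg_log_likelihood, bounds=(-5, 5), method="bounded").x
print(beta_hat)  # close to the true value of 1.5
```

Structural versions of this differ mainly in where the choice probabilities come from: they are derived from an explicit economic model, often a dynamic one, rather than from a reduced-form index.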

But I think that learning the theory is only half the job. The devil is in the details, and much of the trick to performing Structural Estimation lies in the software implementation. In fact, many of the recent improvements were motivated by the scarcity of computing power at the time the original methods were developed. Fortunately, there are many datasets out there with which you can experiment while following the papers. In particular, check Pedro Mira's website for some nice datasets accompanying the survey by Aguirregabiria and Mira (2010).

A good way to get your hands dirty is by estimating John Rust's Optimal Replacement of GMC Bus Engines problem. This model has become the most popular toy model for benchmarking different methodologies for estimating dynamic discrete choice models. That will be the topic of the next post.

¡Hasta luego!

References
  • Adda, Jerome and Russell Cooper (2003) Dynamic Economics: Quantitative Methods and Applications, MIT Press.
  • Aguirregabiria, Victor and Pedro Mira (2010) "Dynamic Discrete Choice Structural Models: A Survey," Journal of Econometrics, 156(1): 38-67.
  • Belloni, Alexandre, Victor Chernozhukov and Christian Hansen (2014) "High-Dimensional Methods and Inference on Structural and Treatment Effects," Journal of Economic Perspectives, 28(2): 29-50.
  • Chernozhukov, Victor, Juan C. Escanciano, Hidehiko Ichimura, Whitney K. Newey and James M. Robins (2018) "Locally Robust Semiparametric Estimation," IFS Working Paper CWP30/18.
  • Chernozhukov, Victor, Denis Chetverikov, Mert Demirer, Esther Duflo, Christian Hansen, Whitney K. Newey and James M. Robins (2018) "Double/Debiased Machine Learning for Treatment and Structural Parameters," The Econometrics Journal, 21(1): C1-C68.
  • Igami, Mitsuru (2018) "Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo," arXiv:1710.10967.
  • Martínez D., Juan N. (2018) "The Short-Term Impact of Crime on School Enrollment and School Choice: Evidence from El Salvador," Economía, 18(2): 121-145.

© Sansan, Inc.