Economics Meets Data Science: The Structural Estimation Series, Part II

Hola again! DSOC's Juan here! It's autumn now, time to start craving pumpkin tarts and hot drinks. Recently I'm having hot matcha latte very often, maybe because it reminds me when I was a graduate student cleaning my data in some cafe around this time of the year. What's your favorite hot drink for this season?

This is part II of the Structural Estimation (SE) series. In the previous post I discussed some of the advantages and difficulties of SE, and mentioned some of its different flavors. A good way of learning SE is understanding Dynamic Discrete Choice (DDC) Models. In this post I present the basic elements that most DDC models share by introducing the universal toy model: John Rust’s GMC bus engines replacement model, first presented in Rust (1987).

The engine replacement model has been used by several authors. Hotz et. al. (1994) used it to benchmark their simulator estimator, Aguirregabiria and Mira (2002) referenced it to show their approach to solving the decision problem by swapping Rust's NFXP algorithm, and more recently Chernozhukov et. al. (2018) used it to test their semiparametric approach to solving DDC models. Even the Bonanza AI approaches the game of Shogi in a very similar way to the decision problem described by Rust (Igami, 2018). The Conditional Independence assumption made in Rust (1987) has now become a standard approach to simplifying the likelihood function. Everybody knows it, everybody likes it. You could say it is the matcha latte of Structural Estimation models.

The protagonist of the model is Harold Zurcher, who once was the superintendent of maintenance of the Madison Metropolitan Bus Company in Wisconsin. His job included scheduling the repair and maintenance of the vehicles in the fleet. In his day-to-day job Mr. Zurcher would probably check the odometer readings of each of the buses in the fleet, then go like:

https://media.giphy.com/media/3owzW5c1tPq63MPmWk/giphy.gif

and after a few seconds decide for each bus whether to just provide normal maintenance or to completely replace the bus engine for a new one. So what does that decision process look like?

The Utility Function

Zurcher's utility function can be expressed in the following way:

$\begin{align} u(x_t, i_t, \theta_1) = \begin{cases} & -c(x_t, \theta_1) \text{ if } i_t= 0\\ & -R - c(0, \theta_1) \text{ if } i_t=1 \end{cases} \\ \tag{1} \end{align}$

Where:
$x_t$ : the accumulated miles at $t$ since the last replacement
$c()$ : the cost function that defines how the operational cost changes as a function of the accumulated mileage since the last replacement. One could imagine for example that older buses are costlier to maintain, in which case we would expect $c$ to be an increasing function of $x_t$ .
$R$ : the cost of buying a new engine net of the scrap value of the older one.
$i_t$ : the decision taken in period $t$ . It takes the value of one if the engine was replaced and zero otherwise.
$θ_1$ : some unknown parameters of the cost function. This is usually represented as a vector which size depends on the functional form of $c()$ .

Note that time is discrete.

Let's also define the choice set $D(x)$ , representing all the possible choices that can be made. In this case $D(x) = \{0, 1\}$ , meaning that the superintendent only has two options: to replace or to not replace.

Mr. Zurcher’s purpose is to minimize the cost of operating the buses, or equally to maximize the utility function above, not just for today, but also for all the days to come. We could assume that this decision process is performed until Mr. Zurcher’s retirement, and call this a finite horizon problem. It can also be assumed that the process continues forever and name it an infinite horizon problem.

Both types of problems are solved in different ways. Here I’ll follow John Rust’s story and assume an infinite horizon. Also notice that there's only one agent making decisions in this setting: the superintendent. For this reason, it can be labeled a single-agent model. The decision problem then consists of choosing every day an $i_t$ from the choice set $\{0, 1\}$ in order to:

\begin{align}
\max_{i_t}E\{u(x_t,i_t, \theta_1) + \sum_{j=t+1}^{\infty}{\beta^{j-t}u(x_j, i_j, \theta_1)} \}
\tag{2}
\end{align}

$E$ represents the expectation operator. $\beta$ here represents a discount factor, capturing the fact that utility is worth more now than in the future. Basically, it represents how impatient the agent is. $\beta$ is in the range $[0, 1)$ . When it is very close to zero (a very impatient agent), the problem approximates a static one: the agent basically only cares about the present utility.

Rust calls this maximization problem a regenerative optimal stopping model of engine replacement... yeah, it’s quite a mouthful, but let me explain. It’s called regenerative because when the engine of a bus gets exchanged, the bus becomes as good as new. You can see this in the utility function by noting that the cost $c()$ of operating a bus is normally $c(x_t, \theta_1 )$ , except when the engine has been recently replaced, in which case the cost is $c(0,\theta_1 )$ . The system has regenerated to its starting point.

But why is it called an optimal stopping model? Well, simply because Mr. Zurcher must decide every day when to replace (stop using) the older engine in a way that optimizes his utility function by solving the problem stated in Equation 2). And by the way, looking at Equation 2) you’ll notice that there’s an expectation operator... as if there was some random variable with respect to which we must calculate an expected value... That leads me to the following element:

State and the Transition Matrix

Remember that Mr. Zurcher makes decisions about the maintenance of the buses, but he does not drive them! His decisions may influence how much the buses are used, but do not determine the mileage. Besides that, bus mileage may depend on the demand for transportation and other factors that are exogenous to him. From his point of view, the mileage of the buses in the fleet is like the state of the nature. In DDC models, variables such as the mileage are the so-called state.

Whoever decides how much to operate a given bus may take into consideration the current accumulated mileage. In other words, the mileage at $t$ depends on (and only on) the mileage at $t-1$ with some randomness. You could say that the accumulated mileage $x_t$ forms a Markov Chain. The superintendent can’t predict accurately the mileage of a given bus tomorrow, but he can tell more or less about how much will it be by looking at the mileage today. How do we include this intuition into the decision problem above? Enter the concept of transition probabilities, represented as the mapping $p(x_{t+1} | x_t, i_t,\theta_2)$ to the probability space. This represents the probability of the value of $x$ in $t+1$ given its value in $t$ , the replacement decision $i_t$ and a set of parameters $\theta_2$ .

When the state variable is discrete, it is possible to represent its evolution by constructing a transition matrix. For example, if there were 3 possible states, then a transition matrix would be of size 3x3, and each cell $(q, r)$ would represent the probability of the state variable moving from state $q$ to state $r$ , with rows adding up to 1. Furthermore, the transition matrix could be different depending on the choice made the current period. The case of the engine replacement problem is a bit more complicated, because the state variable (accumulated mileage) is a continuous variable, so a transition matrix cannot be formed.

There are many approaches to representing the evolution of the state variable. John Rust decided to discretize the accumulated mileage variable into buckets, so that he could create a transition matrix. Well, actually two, one for the case when the engine is not replaced and one for the case when it is.

Value Function

Now this looks like a very complex problem. After all, Equation 2) includes all the future decisions all the way to infinite periods in the future! But the decision problem can be simplified considerably. First, consider that we’re dealing with an infinite horizon problem, there are infinite days until it ends. And guess what! Tomorrow that will be the case, as well as the day after, and so on. So it turns out that the decision problem is exactly the same every day (it is stationary).

Now, we also know that Mr. Zurcher is rational and prefers lower costs over higher ones. We can be confident in saying that, whatever the situation is tomorrow, he’ll take the optimal choice. So if he knew what his expected discounted utility at the optimal choice will be the next period, he could reduce his decision problem to a simpler one represented by the following Value function:

$\begin{align} V_{\theta}(x_t )=\max_{i_t \in D(x_t)}{\left [u(x_t,i_t,\theta_1 )+ \beta EV_{\theta}(x_t,i_t) \right]} \tag{3} \end{align}$

This equation is usually called the Bellman Equation, and $V_{\theta}$ is the Value Function, representing the maximum utility attained when making the optimal choice given the current state, and that the optimal decision is made in every future period. When $x_t$ is continuous, the expected value of $V$ on the right hand side is given by:

$\begin{align} EV_{\theta}(x_t,i_t )=\int_0^\infty{ V_{\theta}(y)p(dy| x_t, i_t, \theta) } \tag{4} \end{align}$

The most common case is when $x_t$ is discrete. In that case, $V$ is a column vector with the same number of elements as the number of states. It indicates what is the value of each state at every point in time. Think of it as a chess AI evaluating the value of each possible board state given that it chooses to move a horse to a particular square. Define the transition matrix as the SxS matrix $\Pi$ , with $S$ representing a positive integer number of states. In that case, $EV_\theta$ becomes the matrix multiplication between $\Pi$ and the vector $V$ . Also, the transition matrix may be different depending on the value of $i_t$ , so if there are two possible alternatives, there are also two transition matrices.

Structural Parameters

Covering the topic of what are structural parameters would require a full post of its own. For now let's just focus on what parameters we want to estimate: $R$ , $\theta_1$ and $\theta_2$ . We'll also need to estimate $V$ in order to find those parameters. Although estimating $\beta$ is ideal, most of the time its identification is problematic, so most authors tend to simply estimate the rest of the parameters for several fixed values of $\beta$ and then compare the resulting likelihood.

Finally

In this post I introduced the basic elements of a DDC model through the example in Rust (1987). In the next post I'll go over Rust's Nested Fixed Point algorithm for identifying the structural parameters. ¡Hasta pronto!

References

Aguirregabiria, Victor and Pedro Mira (2002) "Swapping the Nested Fixed Point Algorithm: A Class of Estimators for Discrete Markov Decision Models." Econometrica 70(4):1519-543.

Chernozhukov, Victor, Juan C. Escanciano, Hidehiko Ichimura, Whitney K. Newey and James M. Robins (2018) "Locally robust Semiparametric Estimation." Working paper CWP30/18. IFS.

Hotz, J., R. Miller, S. Sanders, and J. Smith (1994) "A Simulation Estimator for Dynamic Models of Discrete Choice" Review of Economic Studies 61:265–289.

Mitsuru Igami (2018) "Artificial Intelligence as Structural Estimation: Economic Interpretations of Deep Blue, Bonanza, and AlphaGo," Cornell University arXiv:1710.10967