I have been playing around with endogenous switching models because I have recently become more interested in selection bias corrections, and these models are a more general formulation of that problem (i.e. switching between two possible regimes rather than into one possible sample). Since the material on these models is scattered across many sources, I have collected it here, as much for my own reference as anything.
So, first of all, what is an endogenous switching model? It is a model in which there are two or more regimes and agents may choose which regime to enter based on both their observable and unobservable characteristics. To make this concrete, suppose we have data on patients who are being treated for alcoholism and we want to estimate the effect of participation in group therapy on the number of drinks consumed per day. If we just run OLS, we will be comparing individuals who chose to participate in group therapy with those who chose not to. If, for example, very self-motivated and independent people choose not to participate, and they are also more likely than the participants to limit their drinking in the absence of group therapy, then our OLS estimates are going to be biased and inconsistent (in this example, we will understate the reduction in drinking that therapy brings about), which I show below.
The earliest precursors to switching models can actually be found in nineteenth-century biology, in the work of the famous statistician Karl Pearson on the distribution of organ sizes in the population. He hypothesised that oddly shaped distributions may arise from there being many sub-populations, each following a normal distribution with its own mean and variance. Exogenous switching models were introduced from the 1950s onward through the work of Quandt, Fair and Jaffee. It was not until 1975 that Maddala and Nelson introduced the first endogenous switching model, allowing agents to individually choose their switching point. Let’s describe the set-up of this classical model and then demonstrate why OLS is inconsistent,
$$
\begin{aligned}
y_{1i} &= x_i \beta_1 + \epsilon_{1i}, \\
y_{0i} &= x_i \beta_0 + \epsilon_{0i}, \\
I_i &= \mathbf{1}\{ z_i \gamma + u_i > 0 \},
\end{aligned}
$$
where $x_i$ is a $1 \times k$ row vector of covariates that determine the outcomes and $z_i$ is a row vector of covariates that determine selection.
Note that $y_{1i}$ and $y_{0i}$ are the potential outcomes in this model, meaning that each individual $i$ has a potential outcome under regime 1 and a potential outcome under regime 0. The outcome that is actually realised in the data depends on which regime the individual chooses, which is determined by $I_i$. Putting this together, we have that the observed outcome for a given individual is,
$$ y_i = I_i y_{1i} + (1 - I_i) y_{0i}. $$
We write $y_i$ in vector form as $y$, the $n \times 1$ vector stacking the $y_i$. We also write our independent variables as an $n \times 2k$ matrix $X$, where the $i$-th row is $X_i = (I_i x_i, \; (1 - I_i) x_i)$, and our coefficients as $\beta = (\beta_1', \beta_0')'$. Lastly we can write our error term for an individual as $\epsilon_i = I_i \epsilon_{1i} + (1 - I_i) \epsilon_{0i}$ and in vector form as $\epsilon$, the $n \times 1$ vector stacking the $\epsilon_i$. Putting all of this together we can write out our switching regression in the more compact form,
$$ y = X\beta + \epsilon. $$
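The OLS estimator of $\beta$ is $\hat{\beta} = (X'X)^{-1}X'y = \beta + (X'X)^{-1}X'\epsilon$, the second equality by substitution of $y = X\beta + \epsilon$. Let’s assume that $\frac{1}{n}X'X \xrightarrow{p} Q$ by the LLN, where $Q$ is symmetric positive definite. Then by the LLN and CMT,
$$ \hat{\beta} - \beta = \left(\tfrac{1}{n}X'X\right)^{-1} \tfrac{1}{n}X'\epsilon \xrightarrow{p} Q^{-1} E[X_i' \epsilon_i]. $$
Naturally for $\hat{\beta}$ to be consistent we need the above expression to be 0 which, since $Q^{-1}$ is nonsingular, requires $E[X_i'\epsilon_i] = 0$. Let’s check this out (using the fact that expectation is a linear operator, together with $I_i^2 = I_i$ and $I_i(1 - I_i) = 0$),
$$ E[X_i'\epsilon_i] = E\left[ \begin{pmatrix} I_i x_i' \\ (1 - I_i) x_i' \end{pmatrix} \bigl( I_i \epsilon_{1i} + (1 - I_i)\epsilon_{0i} \bigr) \right] = \begin{pmatrix} E[I_i x_i' \epsilon_{1i}] \\ E[(1 - I_i) x_i' \epsilon_{0i}] \end{pmatrix}. $$
Taking the first block and applying the law of iterated expectations,
$$ E[I_i x_i' \epsilon_{1i}] = E\bigl[ x_i' \, P(I_i = 1 \mid x_i, z_i) \, E[\epsilon_{1i} \mid x_i, z_i, I_i = 1] \bigr] = E\bigl[ x_i' \, P(u_i > -z_i\gamma \mid x_i, z_i) \, E[\epsilon_{1i} \mid x_i, z_i, u_i > -z_i\gamma] \bigr]. $$
We can similarly show $E[(1 - I_i) x_i' \epsilon_{0i}] = E\bigl[ x_i' \, P(u_i \le -z_i\gamma \mid x_i, z_i) \, E[\epsilon_{0i} \mid x_i, z_i, u_i \le -z_i\gamma] \bigr]$.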
This is the root of the inconsistency of OLS under an endogenous switching model. For each regime we are viewing only those individuals whose observable characteristics and unobservable characteristics induced them to select into that regime. If there is correlation between the unobservable characteristics which determine selection and the unobservable characteristics which determine either of the potential outcomes, then there is nothing guaranteeing that the expressions above are zero. Note that, as a corollary, OLS is consistent if $\epsilon_{1i}$ and $\epsilon_{0i}$ are independent of $u_i$, which would be the case if there is only selection on observables.
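The inconsistency can be checked numerically. The following is a minimal simulation sketch, not part of the original derivation: the coefficient values, the sample size, the single excluded selection covariate and the function names `simulate` and `ols_compact` are all assumptions chosen for illustration. It runs OLS on the stacked design twice, once with outcome errors independent of the selection error and once with them correlated.

```python
import numpy as np

def simulate(n, rho1, rho0, seed=0):
    """Draw (y, x, I) from a two-regime switching model with normal errors."""
    rng = np.random.default_rng(seed)
    x = np.column_stack([np.ones(n), rng.normal(size=n)])   # outcome covariates
    z = np.column_stack([x, rng.normal(size=n)])            # selection covariates
    beta1, beta0 = np.array([1.0, 2.0]), np.array([0.0, 1.0])  # illustrative values
    gamma = np.array([0.0, 1.0, 1.0])                          # illustrative values
    u = rng.normal(size=n)                                   # selection error, variance 1
    # Each outcome error loads on u with correlation rho; the rest is independent noise.
    eps1 = rho1 * u + np.sqrt(1 - rho1**2) * rng.normal(size=n)
    eps0 = rho0 * u + np.sqrt(1 - rho0**2) * rng.normal(size=n)
    I = (z @ gamma + u > 0).astype(float)                    # regime indicator
    y = I * (x @ beta1 + eps1) + (1 - I) * (x @ beta0 + eps0)
    return y, x, I

def ols_compact(y, x, I):
    """OLS on the stacked design X = [I * x, (1 - I) * x]; returns (beta1_hat, beta0_hat)."""
    X = np.hstack([I[:, None] * x, (1 - I)[:, None] * x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    k = x.shape[1]
    return coef[:k], coef[k:]

for rho1, rho0 in [(0.0, 0.0), (0.7, -0.5)]:
    y, x, I = simulate(200_000, rho1, rho0)
    b1, b0 = ols_compact(y, x, I)
    print(f"rho1={rho1}, rho0={rho0}: beta1_hat={b1.round(3)}, beta0_hat={b0.round(3)}")
```

With both correlations set to zero the estimates should sit close to the coefficients used to generate the data; with non-zero correlations they should drift away from them even at this sample size, which is what inconsistency looks like in practice.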
So how does the endogenous switching regression work? The concept behind it is that, if we could estimate the bias terms $E[\epsilon_{1i} \mid x_i, z_i, u_i > -z_i\gamma]$ and $E[\epsilon_{0i} \mid x_i, z_i, u_i \le -z_i\gamma]$, then we would once again be in a selection on observables model and so OLS would be consistent. While the variables $\epsilon_{1i}$, $\epsilon_{0i}$ and $u_i$ are unobservable for each individual, we need only make assumptions about their distribution in order to estimate the bias terms. This is the key insight. In the classical endogenous switching model, we assume that $(\epsilon_{1i}, \epsilon_{0i}, u_i)' \sim N(0, \Sigma)$, independently of $x_i$ and $z_i$, where
$$ \Sigma = \begin{pmatrix} \sigma_1^2 & \cdot & \sigma_{1u} \\ \cdot & \sigma_0^2 & \sigma_{0u} \\ \sigma_{1u} & \sigma_{0u} & \sigma_u^2 \end{pmatrix}. $$
$\sigma_{1u}$ is the covariance between $\epsilon_{1i}$ and $u_i$ and $\sigma_{0u}$ is the covariance between $\epsilon_{0i}$ and $u_i$. Note that $\sigma_{10}$, the covariance between $\epsilon_{1i}$ and $\epsilon_{0i}$ (the entries marked $\cdot$), is left unspecified in this model. The two potential outcomes are never observed for the same individual, and it is not estimable given that the bias terms do not depend on it. Note also that it is common to assume that $\sigma_u^2 = 1$ since $\gamma$ is estimable only up to a scale factor. In the absence of an exclusion restriction, identification of the parameters in this model comes solely through non-linearity of the normal distribution (as you will see below). It is this extremely strong assumption which has led to the relative unpopularity of this approach in recent years. If you are lucky enough to be in possession of an exclusion restriction (i.e. something which determines selection but does not determine potential outcomes) then it is possible to identify off of that rather than the normality assumption.
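Under joint normality with $\sigma_u^2 = 1$, the bias terms take the standard truncated-normal (inverse Mills ratio) form, $E[\epsilon_{1i} \mid u_i > -z_i\gamma] = \sigma_{1u}\,\phi(z_i\gamma)/\Phi(z_i\gamma)$ and $E[\epsilon_{0i} \mid u_i \le -z_i\gamma] = -\sigma_{0u}\,\phi(z_i\gamma)/(1 - \Phi(z_i\gamma))$, which are non-linear in $z_i\gamma$. The sketch below checks these two expressions by Monte Carlo at a single value of the selection index. The covariance values, the index value and the helper names `bias_regime1` and `bias_regime0` are assumptions made for illustration only.

```python
import numpy as np
from scipy.stats import norm

def bias_regime1(index, sigma_1u):
    """E[eps1 | u > -index] when u has unit variance."""
    return sigma_1u * norm.pdf(index) / norm.cdf(index)

def bias_regime0(index, sigma_0u):
    """E[eps0 | u <= -index] when u has unit variance (sf = 1 - cdf)."""
    return -sigma_0u * norm.pdf(index) / norm.sf(index)

rng = np.random.default_rng(1)
sigma_1u, sigma_0u = 0.6, -0.4   # illustrative covariances with the selection error
index = 0.5                      # illustrative value of the selection index z * gamma

# Joint normal draws of (eps1, eps0, u); the eps1-eps0 covariance is set to 0
# here, which is harmless because neither bias term depends on it.
cov = np.array([[1.0, 0.0, sigma_1u],
                [0.0, 1.0, sigma_0u],
                [sigma_1u, sigma_0u, 1.0]])
eps1, eps0, u = rng.multivariate_normal(np.zeros(3), cov, size=1_000_000).T

chose_1 = u > -index             # individuals who select into regime 1
mc_bias1 = eps1[chose_1].mean()
mc_bias0 = eps0[~chose_1].mean()

print("regime 1: simulated", round(mc_bias1, 4), "formula", round(bias_regime1(index, sigma_1u), 4))
print("regime 0: simulated", round(mc_bias0, 4), "formula", round(bias_regime0(index, sigma_0u), 4))
```

The simulated conditional means and the closed-form values should agree up to simulation noise. Setting either covariance to zero makes the corresponding bias term vanish, which is the selection on observables case.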
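Note that the likelihood function of this model can be written as,
$$ L = \prod_{i=1}^{n} \left[ \int_{-z_i\gamma}^{\infty} f_{1u}(y_i - x_i\beta_1, u) \, du \right]^{I_i} \left[ \int_{-\infty}^{-z_i\gamma} f_{0u}(y_i - x_i\beta_0, u) \, du \right]^{1 - I_i}, $$
where $f_{ju}$ denotes the joint density of $(\epsilon_{ji}, u_i)$ for $j = 0, 1$.
Here is where our normality assumption comes into play,
$$ \int_{-z_i\gamma}^{\infty} f_{1u}(\epsilon_{1i}, u) \, du = f_1(\epsilon_{1i}) \int_{-z_i\gamma}^{\infty} f_{u \mid 1}(u \mid \epsilon_{1i}) \, du = \frac{1}{\sigma_1} \phi\!\left( \frac{\epsilon_{1i}}{\sigma_1} \right) \Phi\!\left( \frac{z_i\gamma + \rho_1 \epsilon_{1i}/\sigma_1}{\sqrt{1 - \rho_1^2}} \right). $$
Note that $\epsilon_{1i} \sim N(0, \sigma_1^2)$ and $u_i \mid \epsilon_{1i} \sim N(\rho_1 \epsilon_{1i}/\sigma_1, \; 1 - \rho_1^2)$ (taking $\sigma_u^2 = 1$), where $\rho_1 = \sigma_{1u}/\sigma_1$ is the correlation coefficient between $\epsilon_{1i}$ and $u_i$, $\phi$ is the standard normal density and $\Phi$ is the standard normal distribution function. Performing a similar operation for regime 0 (which I omit) we can rewrite the likelihood function,
$$ L = \prod_{i=1}^{n} \left[ \frac{1}{\sigma_1} \phi\!\left( \frac{\epsilon_{1i}}{\sigma_1} \right) \Phi(\eta_{1i}) \right]^{I_i} \left[ \frac{1}{\sigma_0} \phi\!\left( \frac{\epsilon_{0i}}{\sigma_0} \right) \bigl( 1 - \Phi(\eta_{0i}) \bigr) \right]^{1 - I_i}, \qquad \eta_{ji} = \frac{z_i\gamma + \rho_j \epsilon_{ji}/\sigma_j}{\sqrt{1 - \rho_j^2}}, $$
with $\epsilon_{ji} = y_i - x_i\beta_j$.
The log likelihood function is therefore,
$$ \ln L = \sum_{i=1}^{n} \left\{ I_i \left[ \ln \Phi(\eta_{1i}) + \ln \phi\!\left( \frac{\epsilon_{1i}}{\sigma_1} \right) - \ln \sigma_1 \right] + (1 - I_i) \left[ \ln\bigl( 1 - \Phi(\eta_{0i}) \bigr) + \ln \phi\!\left( \frac{\epsilon_{0i}}{\sigma_0} \right) - \ln \sigma_0 \right] \right\}. $$
We therefore end up estimating $\beta_1$, $\beta_0$, $\gamma$, $\sigma_1$, $\sigma_0$, $\rho_1$ and $\rho_0$. From here it is straightforward to estimate conditional and unconditional outcomes under both regimes.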
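As a rough illustration of how such a log likelihood can be maximised, here is a sketch that fits the model to simulated data with `scipy.optimize.minimize`. It is one possible approach under stated assumptions rather than a reference implementation: the selection error variance is fixed at 1, each regime contributes a normal density term times a normal CDF term as in the standard switching regression likelihood, the standard deviations and correlations are reparameterised through `exp` and `tanh` to keep them in range, and the data-generating values, the excluded covariate and the names `unpack`, `neg_mean_loglik` and `fit` are all made up for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import log_ndtr  # log of the standard normal CDF

def unpack(theta, k, m):
    """Split theta; sigmas are stored as logs and rhos as arctanh values."""
    beta1, beta0, gamma = theta[:k], theta[k:2 * k], theta[2 * k:2 * k + m]
    sigma1, sigma0 = np.exp(theta[2 * k + m:2 * k + m + 2])
    rho1, rho0 = np.tanh(theta[2 * k + m + 2:])
    return beta1, beta0, gamma, sigma1, sigma0, rho1, rho0

def log_phi(t):
    """Log of the standard normal density."""
    return -0.5 * np.log(2 * np.pi) - 0.5 * t**2

def neg_mean_loglik(theta, y, x, z, I):
    """Negative average log likelihood, with the selection error variance fixed at 1."""
    b1, b0, g, s1, s0, r1, r0 = unpack(theta, x.shape[1], z.shape[1])
    index = z @ g
    e1, e0 = (y - x @ b1) / s1, (y - x @ b0) / s0      # standardised residuals
    eta1 = (index + r1 * e1) / np.sqrt(1 - r1**2)
    eta0 = (index + r0 * e0) / np.sqrt(1 - r0**2)
    ll1 = log_ndtr(eta1) + log_phi(e1) - np.log(s1)    # regime 1 contribution
    ll0 = log_ndtr(-eta0) + log_phi(e0) - np.log(s0)   # regime 0: log(1 - Phi(eta0))
    return -np.mean(np.where(I == 1, ll1, ll0))

def fit(y, x, z, I):
    """Maximise the likelihood by BFGS, starting from regime-by-regime OLS."""
    start = []
    for regime in (1, 0):
        xs, ys = x[I == regime], y[I == regime]
        b, *_ = np.linalg.lstsq(xs, ys, rcond=None)
        start.append((b, np.log(np.std(ys - xs @ b))))
    theta0 = np.concatenate([start[0][0], start[1][0], np.zeros(z.shape[1]),
                             [start[0][1], start[1][1]], np.zeros(2)])
    res = minimize(neg_mean_loglik, theta0, args=(y, x, z, I), method="BFGS")
    return unpack(res.x, x.shape[1], z.shape[1])

# Simulated data with illustrative parameter values (all assumptions).
rng = np.random.default_rng(0)
n = 20_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
z = np.column_stack([x, rng.normal(size=n)])           # third column is excluded from x
u = rng.normal(size=n)
eps1 = 1.5 * (0.7 * u + np.sqrt(1 - 0.7**2) * rng.normal(size=n))   # sigma1=1.5, rho1=0.7
eps0 = 1.0 * (-0.5 * u + np.sqrt(1 - 0.5**2) * rng.normal(size=n))  # sigma0=1.0, rho0=-0.5
I = (z @ np.array([0.0, 1.0, 1.0]) + u > 0).astype(int)
y = np.where(I == 1, x @ np.array([1.0, 2.0]) + eps1, x @ np.array([0.0, 1.0]) + eps0)

beta1_hat, beta0_hat, gamma_hat, sigma1_hat, sigma0_hat, rho1_hat, rho0_hat = fit(y, x, z, I)
print("beta1:", beta1_hat.round(3), "beta0:", beta0_hat.round(3), "gamma:", gamma_hat.round(3))
print("sigma1, sigma0:", round(sigma1_hat, 3), round(sigma0_hat, 3))
print("rho1, rho0:", round(rho1_hat, 3), round(rho0_hat, 3))
```

Averaging rather than summing the log likelihood leaves the maximiser unchanged and tends to make the optimiser's default tolerances behave more sensibly. The example includes an exclusion restriction; dropping the third column of `z` gives a version identified through functional form alone, which is the fragile case discussed above.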
Something which I have been interested in recently is, in the absence of an exclusion restriction, how sensitive is the endogenous switching regression to violations of the joint normality assumption? I have been running some simulations in which the errors $\epsilon_{1i}$, $\epsilon_{0i}$ and $u_i$ are drawn from a skew normal distribution. In another blog post I hope to share some of my results.
Many thanks to Dutoit (2007) for a brief history of switching regressions and an overview of how to derive the likelihood function.