import pandas as pd
import numpy as np
import wooldridge
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt
from scipy.stats import chi2def lm_test(data, dependent, control_vars, test_vars):
# Step 1: Fit restricted model
restricted_formula = f"{dependent} ~ {' + '.join(control_vars)}"
restricted_model = smf.ols(restricted_formula, data=data).fit()
# Step 2: Save residuals
data = data.copy()
data["residuals"] = restricted_model.resid
# Step 3: Fit auxiliary regression (residuals on all vars)
full_vars = control_vars + test_vars
auxiliary_formula = f"residuals ~ {' + '.join(full_vars)}"
auxiliary_model = smf.ols(auxiliary_formula, data=data).fit()
# Step 4: Compute LM statistic and p-value
r_squared = auxiliary_model.rsquared
n = len(data)
q = len(test_vars)
lm_stat = r_squared * n
pval = 1 - chi2.cdf(lm_stat, df=q)
return lm_stat, pvalExamples¶
bwght = wooldridge.data("bwght")
crime1 = wooldridge.data("crime1")5.1 Housing Prices and Distances From an Incinerator¶
: house price, : distance from incinerator, : quality of house
we expect and so if is omitted and positively correlated with , OLS estimators are inconsistent
5.2 Standard Errors in a Birth Weight Equation¶
wooldridge.data("bwght", description=True)name of dataset: bwght
no of variables: 14
no of observations: 1388
+----------+--------------------------------+
| variable | label |
+----------+--------------------------------+
| faminc | 1988 family income, $1000s |
| cigtax | cig. tax in home state, 1988 |
| cigprice | cig. price in home state, 1988 |
| bwght | birth weight, ounces |
| fatheduc | father's yrs of educ |
| motheduc | mother's yrs of educ |
| parity | birth order of child |
| male | =1 if male child |
| white | =1 if white |
| cigs | cigs smked per day while preg |
| lbwght | log of bwght |
| bwghtlbs | birth weight, pounds |
| packs | packs smked per day while preg |
| lfaminc | log(faminc) |
+----------+--------------------------------+
J. Mullahy (1997), “Instrumental-Variable Estimation of Count Data
Models: Applications to Models of Cigarette Smoking Behavior,” Review
of Economics and Statistics 79, 596-593. Professor Mullahy kindly
provided the data. He obtained them from the 1988 National Health
Interview Survey.
model02 = smf.ols("lbwght ~ cigs + lfaminc", data=bwght).fit()
model02.summary2().tables[1].iloc[:, :2]Loading...
df = bwght.iloc[: len(bwght) // 2]
model021 = smf.ols("lbwght ~ cigs + lfaminc", data=df).fit()
model021.summary2().tables[1].iloc[:, :2]Loading...
len(bwght), len(df)(1388, 694)Using half the dataset, the standard error for cigs is 0.0013 but becomes 0.00086 if we use the whole dataset. Ratio of standard errors is which is equal to the approcimation
5.3 Economic Model of Crime¶
wooldridge.data("crime1", description=True)name of dataset: crime1
no of variables: 16
no of observations: 2725
+----------+---------------------------------+
| variable | label |
+----------+---------------------------------+
| narr86 | # times arrested, 1986 |
| nfarr86 | # felony arrests, 1986 |
| nparr86 | # property crme arr., 1986 |
| pcnv | proportion of prior convictions |
| avgsen | avg sentence length, mos. |
| tottime | time in prison since 18 (mos.) |
| ptime86 | mos. in prison during 1986 |
| qemp86 | # quarters employed, 1986 |
| inc86 | legal income, 1986, $100s |
| durat | recent unemp duration |
| black | =1 if black |
| hispan | =1 if Hispanic |
| born60 | =1 if born in 1960 |
| pcnvsq | pcnv^2 |
| pt86sq | ptime86^2 |
| inc86sq | inc86^2 |
+----------+---------------------------------+
J. Grogger (1991), “Certainty vs. Severity of Punishment,” Economic
Inquiry 29, 297-309. Professor Grogger kindly provided a subset of the
data he used in his article.
model03 = smf.ols(
"narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86", data=crime1
).fit()
model03.summary2().tables[1].iloc[:, :2]Loading...
To test that have no effect using LM statistic
model032 = smf.ols("narr86 ~ pcnv + ptime86 + qemp86", data=crime1).fit()
residuals = model032.residdf = crime1.copy()
df["residuals"] = residuals
model033 = smf.ols(
"residuals ~ pcnv + ptime86 + qemp86 + " "avgsen + tottime", data=df
).fit()
rsquared = model033.rsquaredLM = rsquared * len(df)
LMnp.float64(4.070729461071595)The 10% critical value in a chi square distribution with 2 degrees of freedom is 4.61, so we fail to reject the null hypothesis
pval = 1 - chi2.cdf(LM, df=2)
pvalnp.float64(0.13063282803267184)Using my function
lm_test(
crime1,
dependent="narr86",
control_vars=["pcnv", "ptime86", " qemp86"],
test_vars=["avgsen", "tottime"],
)(np.float64(4.070729461071595), np.float64(0.13063282803267184))