2. The Simple Regression Model#

from wooldridge import *
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np

dataWoo()
  J.M. Wooldridge (2016) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93     meapsingle  minwage       mlb1        mroz
  murder     nbasal      nyse          okun        openness
  pension    phillips    pntsprd       prison      prminwge
  rdchem     rdtelec     recid         rental      return
  saving     sleep75     slp75_81      smoke       traffic1
  traffic2   twoyear     volat         vote1       vote2
  voucher    wage1       wage2         wagepan     wageprc
  wine

Example 2.3 CEO Salary & Return on Equity#

df = dataWoo('ceosal1')
dataWoo('ceosal1', description=True)
name of dataset: ceosal1
no of variables: 12
no of observations: 209

+----------+-------------------------------+
| variable | label                         |
+----------+-------------------------------+
| salary   | 1990 salary, thousands $      |
| pcsalary | % change salary, 89-90        |
| sales    | 1990 firm sales, millions $   |
| roe      | return on equity, 88-90 avg   |
| pcroe    | % change roe, 88-90           |
| ros      | return on firm's stock, 88-90 |
| indus    | =1 if industrial firm         |
| finance  | =1 if financial firm          |
| consprod | =1 if consumer product firm   |
| utility  | =1 if transport. or utilties  |
| lsalary  | natural log of salary         |
| lsales   | natural log of sales          |
+----------+-------------------------------+

I took a random sample of data reported in the May 6, 1991 issue of
Businessweek.
df.head()
salary pcsalary sales roe pcroe ros indus finance consprod utility lsalary lsales
0 1095 20 27595.000000 14.1 106.400002 191 1 0 0 0 6.998509 10.225389
1 1001 32 9958.000000 10.9 -30.600000 13 1 0 0 0 6.908755 9.206132
2 1122 9 6125.899902 23.5 -16.299999 14 1 0 0 0 7.022868 8.720281
3 578 -9 16246.000000 5.9 -25.700001 -21 1 0 0 0 6.359574 9.695602
4 1368 7 21783.199219 13.8 -3.000000 56 1 0 0 0 7.221105 9.988894
model = smf.ols(formula='salary ~ 1 + roe', data=df).fit()
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 salary   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     2.767
Date:                Tue, 09 Jul 2024   Prob (F-statistic):             0.0978
Time:                        21:56:56   Log-Likelihood:                -1804.5
No. Observations:                 209   AIC:                             3613.
Df Residuals:                     207   BIC:                             3620.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    963.1913    213.240      4.517      0.000     542.790    1383.592
roe           18.5012     11.123      1.663      0.098      -3.428      40.431
==============================================================================
Omnibus:                      311.096   Durbin-Watson:                   2.105
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31120.902
Skew:                           6.915   Prob(JB):                         0.00
Kurtosis:                      61.158   Cond. No.                         43.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model.params
Intercept    963.191336
roe           18.501186
dtype: float64
model.nobs
209.0

If the return on equity increases by one percentage point, salary is predicted to increase by about 18.5, that is, by $18,501, since salary is measured in thousands of dollars.

predicted_salary = model.predict(pd.DataFrame({'roe': [30]}))
predicted_salary
0    1518.226927
dtype: float64
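A quick way to see the slope interpretation in the fitted values is to compare predictions one percentage point apart; their difference is exactly the estimated slope (an illustrative check, not part of the original example).

preds = model.predict(pd.DataFrame({'roe': [30, 31]}))
# the difference between the two fitted values equals the slope coefficient
print(preds.iloc[1] - preds.iloc[0])
# the same number read directly from the fit (about 18.5, i.e. $18,501)
print(model.params['roe'])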

Example 2.4 Wage Equation#

df2 = dataWoo('wage1')
dataWoo('wage1', description=True)
name of dataset: wage1
no of variables: 24
no of observations: 526

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| wage     | average hourly earnings         |
| educ     | years of education              |
| exper    | years potential experience      |
| tenure   | years with current employer     |
| nonwhite | =1 if nonwhite                  |
| female   | =1 if female                    |
| married  | =1 if married                   |
| numdep   | number of dependents            |
| smsa     | =1 if live in SMSA              |
| northcen | =1 if live in north central U.S |
| south    | =1 if live in southern region   |
| west     | =1 if live in western region    |
| construc | =1 if work in construc. indus.  |
| ndurman  | =1 if in nondur. manuf. indus.  |
| trcommpu | =1 if in trans, commun, pub ut  |
| trade    | =1 if in wholesale or retail    |
| services | =1 if in services indus.        |
| profserv | =1 if in prof. serv. indus.     |
| profocc  | =1 if in profess. occupation    |
| clerocc  | =1 if in clerical occupation    |
| servocc  | =1 if in service occupation     |
| lwage    | log(wage)                       |
| expersq  | exper^2                         |
| tenursq  | tenure^2                        |
+----------+---------------------------------+

These are data from the 1976 Current Population Survey, collected by
Henry Farber when he and I were colleagues at MIT in 1988.
df2.head()
wage educ exper tenure nonwhite female married numdep smsa northcen ... trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq
0 3.10 11 2 0 0 1 0 2 1 0 ... 0 0 0 0 0 0 0 1.131402 4 0
1 3.24 12 22 2 0 1 1 3 1 0 ... 0 0 1 0 0 0 1 1.175573 484 4
2 3.00 11 2 0 0 0 0 2 0 0 ... 0 1 0 0 0 0 0 1.098612 4 0
3 6.00 8 44 28 0 0 1 0 1 0 ... 0 0 0 0 0 1 0 1.791759 1936 784
4 5.30 12 7 2 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 1.667707 49 4

5 rows × 24 columns

model2 = smf.ols(formula='wage ~ educ', data=df2).fit()
print(model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   wage   R-squared:                       0.165
Model:                            OLS   Adj. R-squared:                  0.163
Method:                 Least Squares   F-statistic:                     103.4
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           2.78e-22
Time:                        21:56:56   Log-Likelihood:                -1385.7
No. Observations:                 526   AIC:                             2775.
Df Residuals:                     524   BIC:                             2784.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.9049      0.685     -1.321      0.187      -2.250       0.441
educ           0.5414      0.053     10.167      0.000       0.437       0.646
==============================================================================
Omnibus:                      212.554   Durbin-Watson:                   1.824
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              807.843
Skew:                           1.861   Prob(JB):                    3.79e-176
Kurtosis:                       7.797   Cond. No.                         60.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The intercept of −0.90 literally means that a person with no education has a predicted hourly wage of −90¢ an hour, which is obviously not meaningful on its own.

The slope implies that one more year of education is predicted to increase the hourly wage by about 54¢ an hour.
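To see both statements at once, the snippet below (illustrative, not part of the original example) predicts the hourly wage at a few education levels; consecutive predictions differ by the slope times the difference in years.

# predicted hourly wage at 8, 12, and 16 years of education;
# adjacent values differ by 4 * 0.5414, roughly $2.17
print(model2.predict(pd.DataFrame({'educ': [8, 12, 16]})))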

Example 2.5 Vote share#

df3 = dataWoo('vote1')
dataWoo('vote1', description=True)
name of dataset: vote1
no of variables: 10
no of observations: 173

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| state    | state postal code               |
| district | congressional district          |
| democA   | =1 if A is democrat             |
| voteA    | percent vote for A              |
| expendA  | camp. expends. by A, $1000s     |
| expendB  | camp. expends. by B, $1000s     |
| prtystrA | % vote for president            |
| lexpendA | log(expendA)                    |
| lexpendB | log(expendB)                    |
| shareA   | 100*(expendA/(expendA+expendB)) |
+----------+---------------------------------+

From M. Barone and G. Ujifusa, The Almanac of American Politics, 1992.
Washington, DC: National Journal.
df3.head()
state district democA voteA expendA expendB prtystrA lexpendA lexpendB shareA
0 AL 7 1 68 328.295990 8.737000 41 5.793916 2.167567 97.407669
1 AK 1 0 62 626.377014 402.476990 60 6.439952 5.997638 60.881039
2 AZ 2 1 73 99.607002 3.065000 55 4.601233 1.120048 97.014763
3 AZ 3 0 69 319.690002 26.281000 64 5.767352 3.268846 92.403702
4 AR 3 0 75 159.220993 60.054001 66 5.070293 4.095244 72.612473
model3 = smf.ols(formula='voteA ~ shareA', data=df3).fit()
print(model3.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  voteA   R-squared:                       0.856
Model:                            OLS   Adj. R-squared:                  0.855
Method:                 Least Squares   F-statistic:                     1018.
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           6.63e-74
Time:                        21:56:56   Log-Likelihood:                -565.20
No. Observations:                 173   AIC:                             1134.
Df Residuals:                     171   BIC:                             1141.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     26.8122      0.887     30.221      0.000      25.061      28.564
shareA         0.4638      0.015     31.901      0.000       0.435       0.493
==============================================================================
Omnibus:                       20.747   Durbin-Watson:                   1.826
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               44.613
Skew:                           0.525   Prob(JB):                     2.05e-10
Kurtosis:                       5.255   Cond. No.                         112.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

If Candidate A’s share of total campaign spending increases by one percentage point, Candidate A is predicted to receive almost half a percentage point (0.464) more of the total vote.

Example 2.6 Table 2.2#

df['salary_hat'] = model.fittedvalues
df['uhat'] = model.resid
df[['roe','salary','salary_hat','uhat']].head(16)
roe salary salary_hat uhat
0 14.100000 1095 1224.058071 -129.058071
1 10.900000 1001 1164.854261 -163.854261
2 23.500000 1122 1397.969216 -275.969216
3 5.900000 578 1072.348338 -494.348338
4 13.800000 1368 1218.507712 149.492288
5 20.000000 1145 1333.215063 -188.215063
6 16.400000 1078 1266.610785 -188.610785
7 16.299999 1094 1264.760660 -170.760660
8 10.500000 1237 1157.453793 79.546207
9 26.299999 833 1449.772523 -616.772523
10 25.900000 567 1442.372056 -875.372056
11 26.799999 933 1459.023116 -526.023116
12 14.800000 1339 1237.008898 101.991102
13 22.299999 937 1375.767778 -438.767778
14 56.299999 2011 2004.808114 6.191886
15 12.600000 1585 1196.306291 388.693709

The first four CEOs have lower salaries than what we predicted from the OLS regression line (2.26); in other words, given only the firm’s roe, these CEOs make less than what we predicted. As can be seen from the positive uhat, the fifth CEO makes more than predicted from the OLS regression line.
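Two algebraic properties of OLS can be verified directly from Table 2.2: the residuals sum to zero and they are uncorrelated with the regressor (an illustrative check, not part of the original example).

# OLS first-order conditions: the residuals sum to (essentially) zero...
print(df['uhat'].sum())
# ...and the sample covariance between roe and the residuals is (essentially) zero
print(np.cov(df['roe'], df['uhat'])[0, 1])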

Example 2.7 Wage & education#

df2['wage'].mean()
np.float64(5.896102674787035)
df2['educ'].mean()
np.float64(12.562737642585551)
print(model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   wage   R-squared:                       0.165
Model:                            OLS   Adj. R-squared:                  0.163
Method:                 Least Squares   F-statistic:                     103.4
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           2.78e-22
Time:                        21:56:56   Log-Likelihood:                -1385.7
No. Observations:                 526   AIC:                             2775.
Df Residuals:                     524   BIC:                             2784.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.9049      0.685     -1.321      0.187      -2.250       0.441
educ           0.5414      0.053     10.167      0.000       0.437       0.646
==============================================================================
Omnibus:                      212.554   Durbin-Watson:                   1.824
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              807.843
Skew:                           1.861   Prob(JB):                    3.79e-176
Kurtosis:                       7.797   Cond. No.                         60.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model2.predict(pd.DataFrame({'educ': [12.56]}))
0    5.894621
dtype: float64

\(\bar{x}\) and \(\bar{y}\) fall on the regression line
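This property can be checked exactly by predicting at the sample mean of educ, which reproduces the sample mean of wage (an illustrative check).

# the fitted line passes through the point of sample means
print(model2.predict(pd.DataFrame({'educ': [df2['educ'].mean()]})))
print(df2['wage'].mean())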

Example 2.8 CEO Salary - R-squared#

print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 salary   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     2.767
Date:                Tue, 09 Jul 2024   Prob (F-statistic):             0.0978
Time:                        21:56:56   Log-Likelihood:                -1804.5
No. Observations:                 209   AIC:                             3613.
Df Residuals:                     207   BIC:                             3620.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    963.1913    213.240      4.517      0.000     542.790    1383.592
roe           18.5012     11.123      1.663      0.098      -3.428      40.431
==============================================================================
Omnibus:                      311.096   Durbin-Watson:                   2.105
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31120.902
Skew:                           6.915   Prob(JB):                         0.00
Kurtosis:                      61.158   Cond. No.                         43.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model.rsquared
np.float64(0.01318862408103405)

The firm’s return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs.
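The same number can be computed from the definition of R-squared, 1 − SSR/SST, using the residuals and the total variation in salary (an illustrative check that reproduces model.rsquared).

# R-squared = 1 - SSR/SST
ssr = (model.resid ** 2).sum()
sst = ((df['salary'] - df['salary'].mean()) ** 2).sum()
print(1 - ssr / sst)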

Example 2.9 Voting outcome - R-squared#

print(model3.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  voteA   R-squared:                       0.856
Model:                            OLS   Adj. R-squared:                  0.855
Method:                 Least Squares   F-statistic:                     1018.
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           6.63e-74
Time:                        21:56:56   Log-Likelihood:                -565.20
No. Observations:                 173   AIC:                             1134.
Df Residuals:                     171   BIC:                             1141.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     26.8122      0.887     30.221      0.000      25.061      28.564
shareA         0.4638      0.015     31.901      0.000       0.435       0.493
==============================================================================
Omnibus:                       20.747   Durbin-Watson:                   1.826
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               44.613
Skew:                           0.525   Prob(JB):                     2.05e-10
Kurtosis:                       5.255   Cond. No.                         112.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model3.rsquared
np.float64(0.8561408655827665)

The share of campaign expenditures explains over 85% of the variation in the election outcomes for this sample

Exercises#

C1#

The data in 401K are a subset of data analyzed by Papke (1995) to study the relationship between participation in a 401(k) pension plan and the generosity of the plan. The variable prate is the percentage of eligible workers with an active account; this is the variable we would like to explain. The measure of generosity is the plan match rate, mrate. This variable gives the average amount the firm contributes to each worker’s plan for each $1 contribution by the worker. For example, if mrate = 0.50, then a $1 contribution by the worker is matched by a 50¢ contribution by the firm.

df = dataWoo('401K')
df.head()
prate mrate totpart totelg age totemp sole ltotemp
0 26.100000 0.21 1653.0 6322.0 8 8709.0 0 9.072112
1 100.000000 1.42 262.0 262.0 6 315.0 1 5.752573
2 97.599998 0.91 166.0 170.0 10 275.0 1 5.616771
3 100.000000 0.42 257.0 257.0 7 500.0 0 6.214608
4 82.500000 0.53 591.0 716.0 28 933.0 1 6.838405
dataWoo('401K', description=True)
name of dataset: 401k
no of variables: 8
no of observations: 1534

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| prate    | participation rate, percent     |
| mrate    | 401k plan match rate            |
| totpart  | total 401k participants         |
| totelg   | total eligible for 401k plan    |
| age      | age of 401k plan                |
| totemp   | total number of firm employees  |
| sole     | = 1 if 401k is firm's sole plan |
| ltotemp  | log of totemp                   |
+----------+---------------------------------+

L.E. Papke (1995), “Participation in and Contributions to 401(k)
Pension Plans:Evidence from Plan Data,” Journal of Human Resources 30,
311-325. Professor Papke kindly provided these data. She gathered them
from the Internal Revenue Service’s Form 5500 tapes.

(i) Find the average participation rate and the average match rate in the sample of plans.

print("Average participation rate:", round(df['prate'].mean(), 2))
print("Average match rate:", round(df['mrate'].mean(), 2))
Average participation rate: 87.36
Average match rate: 0.73

(ii) Now, estimate the simple regression equation

\( \widehat{prate} = \hat{b}_0 + \hat{b}_1 \, mrate \)

and report the results along with the sample size and R-squared.

prate_hat = smf.ols("prate ~ 1 + mrate", data=df).fit()

print("results:", prate_hat.params)

print("R squared:", prate_hat.rsquared.__round__(3))

print("Sample size:", prate_hat.nobs)
results: Intercept    83.075455
mrate         5.861079
dtype: float64
R squared: 0.075
Sample size: 1534.0

(iii) Interpret the intercept in your equation. Interpret the coefficient on mrate.

print('intercept:', round(prate_hat.params.iloc[0], 2))
intercept: 83.08
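For completeness, the coefficient on mrate is the slope already reported in part (ii): a one-dollar increase in the match rate per dollar contributed by the worker is associated with roughly a 5.86 percentage-point higher participation rate.

print('coefficient on mrate:', round(prate_hat.params.iloc[1], 2))
coefficient on mrate: 5.86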

(iv) Find the predicted prate when mrate = 3.5. Is this a reasonable prediction? Explain what is happening here.

round(prate_hat.predict({'mrate': 3.5}), 2)
0    103.59
dtype: float64
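The prediction of about 103.6 is not reasonable: prate is a participation percentage, so it cannot exceed 100, but the fitted line imposes no such ceiling, and mrate = 3.5 is far above the sample average match rate of 0.73, so this is an extrapolation well outside most of the data. A quick look at the sample (an illustrative check):

# how unusual is a match rate of 3.5, and what is the largest observed participation rate?
print((df['mrate'] >= 3.5).mean())
print(df['prate'].max())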

(v) How much of the variation in prate is explained by mrate? Is this a lot in your opinion?

print("Percentage explained:", round(prate_hat.rsquared * 100, 1))
Percentage explained: 7.5

C2#

The data set in CEOSAL2 contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten is prior number of years as company CEO.

df2 = dataWoo("CEOSAL2")
df2.head()
salary age college grad comten ceoten sales profits mktval lsalary lsales lmktval comtensq ceotensq profmarg
0 1161 49 1 1 9 2 6200.0 966 23200.0 7.057037 8.732305 10.051908 81 4 15.580646
1 600 43 1 1 10 10 283.0 48 1100.0 6.396930 5.645447 7.003066 100 100 16.961130
2 379 51 1 1 9 3 169.0 40 1100.0 5.937536 5.129899 7.003066 81 9 23.668638
3 651 55 1 0 22 22 1100.0 -54 1000.0 6.478509 7.003066 6.907755 484 484 -4.909091
4 497 44 1 1 8 6 351.0 28 387.0 6.208590 5.860786 5.958425 64 36 7.977208
dataWoo("CEOSAL2", description=True)
name of dataset: ceosal2
no of variables: 15
no of observations: 177

+----------+--------------------------------+
| variable | label                          |
+----------+--------------------------------+
| salary   | 1990 compensation, $1000s      |
| age      | in years                       |
| college  | =1 if attended college         |
| grad     | =1 if attended graduate school |
| comten   | years with company             |
| ceoten   | years as ceo with company      |
| sales    | 1990 firm sales, millions      |
| profits  | 1990 profits, millions         |
| mktval   | market value, end 1990, mills. |
| lsalary  | log(salary)                    |
| lsales   | log(sales)                     |
| lmktval  | log(mktval)                    |
| comtensq | comten^2                       |
| ceotensq | ceoten^2                       |
| profmarg | profits as % of sales          |
+----------+--------------------------------+

See CEOSAL1.RAW

(i) Find the average salary and the average tenure in the sample.

print("Average Salary:", round(df2['salary'].mean(), 3))
print("Average ceoten", round(df2["ceoten"].mean(), 2))
Average Salary: 865.864
Average ceoten 7.95

(ii) How many CEOs are in their first year as CEO (that is, ceoten = 0)? What is the longest tenure as a CEO?

print("Number of first year CEO:", (df2['ceoten'] == 0).sum())
print("Longest Tenure:", df2["ceoten"].max())
Number of first year CEO: 5
Longest Tenure: 37

(iii) Estimate the simple regression model \( \log(salary) = B_0 + B_1 \, ceoten + u \), and report your results in the usual form. What is the (approximate) predicted percentage increase in salary given one more year as a CEO?

log_salary_hat = smf.ols("np.log(salary) ~ 1 + ceoten", data=df2).fit()

print("Paramters:\n", log_salary_hat.params, sep='')

print("Percentage increase:", round(log_salary_hat.params.iloc[1] * 100, 2))
Parameters:
Intercept    6.505498
ceoten       0.009724
dtype: float64
Percentage increase: 0.97
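The 0.97% figure uses the approximation that 100 times a change in log(salary) is a percentage change; the exact change implied by the coefficient is 100·(exp(b1) − 1), which is essentially the same here (an illustrative check).

# exact percentage change implied by the log-level coefficient
b1 = log_salary_hat.params.iloc[1]
print(round(100 * (np.exp(b1) - 1), 2))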

C3#

Use the data in SLEEP75 from Biddle and Hamermesh (1990) to study whether there is a tradeoff between the time spent sleeping per week and the time spent in paid work. We could use either variable as the dependent variable. For concreteness, estimate the model \( sleep = B_0 + B_1 \, totwrk + u \), where sleep is minutes spent sleeping at night per week and totwrk is total minutes worked during the week.

df3 = dataWoo("sleep75")
df3.head()
age black case clerical construc educ earns74 gdhlth inlf leis1 ... spwrk75 totwrk union worknrm workscnd exper yngkid yrsmarr hrwage agesq
0 32 0 1 0.0 0.0 12 0.0 0 1 3529 ... 0 3438 0 3438 0 14 0 13 7.070004 1024
1 31 0 2 0.0 0.0 14 9500.0 1 1 2140 ... 0 5020 0 5020 0 11 0 0 1.429999 961
2 44 0 3 0.0 0.0 17 42500.0 1 1 4595 ... 1 2815 0 2815 0 21 0 0 20.529997 1936
3 30 0 4 0.0 0.0 12 42500.0 1 1 3211 ... 1 3786 0 3786 0 12 0 12 9.619998 900
4 64 0 5 0.0 0.0 14 2500.0 1 1 4052 ... 1 2580 0 2580 0 44 0 33 2.750000 4096

5 rows × 34 columns

dataWoo("sleep75", description=True)
name of dataset: sleep75
no of variables: 34
no of observations: 706

+----------+--------------------------------+
| variable | label                          |
+----------+--------------------------------+
| age      | in years                       |
| black    | =1 if black                    |
| case     | identifier                     |
| clerical | =1 if clerical worker          |
| construc | =1 if construction worker      |
| educ     | years of schooling             |
| earns74  | total earnings, 1974           |
| gdhlth   | =1 if in good or excel. health |
| inlf     | =1 if in labor force           |
| leis1    | sleep - totwrk                 |
| leis2    | slpnaps - totwrk               |
| leis3    | rlxall - totwrk                |
| smsa     | =1 if live in smsa             |
| lhrwage  | log hourly wage                |
| lothinc  | log othinc, unless othinc < 0  |
| male     | =1 if male                     |
| marr     | =1 if married                  |
| prot     | =1 if Protestant               |
| rlxall   | slpnaps + personal activs      |
| selfe    | =1 if self employed            |
| sleep    | mins sleep at night, per wk    |
| slpnaps  | minutes sleep, inc. naps       |
| south    | =1 if live in south            |
| spsepay  | spousal wage income            |
| spwrk75  | =1 if spouse works             |
| totwrk   | mins worked per week           |
| union    | =1 if belong to union          |
| worknrm  | mins work main job             |
| workscnd | mins work second job           |
| exper    | age - educ - 6                 |
| yngkid   | =1 if children < 3 present     |
| yrsmarr  | years married                  |
| hrwage   | hourly wage                    |
| agesq    | age^2                          |
+----------+--------------------------------+

J.E. Biddle and D.S. Hamermesh (1990), “Sleep and the Allocation of
Time,” Journal of Political Economy 98, 922-943. Professor Biddle
kindly provided the data.
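A minimal sketch of the estimation the exercise asks for, following the same pattern as the earlier examples (results not reproduced here):

# estimate sleep = B0 + B1*totwrk + u and report the results in the usual form
sleep_model = smf.ols('sleep ~ 1 + totwrk', data=df3).fit()
print(sleep_model.summary())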