2. The Simple Regression Model#

from wooldridge import *
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np

dataWoo()
  J.M. Wooldridge (2016) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93     meapsingle  minwage       mlb1        mroz
  murder     nbasal      nyse          okun        openness
  pension    phillips    pntsprd       prison      prminwge
  rdchem     rdtelec     recid         rental      return
  saving     sleep75     slp75_81      smoke       traffic1
  traffic2   twoyear     volat         vote1       vote2
  voucher    wage1       wage2         wagepan     wageprc
  wine

Example 2.3 CEO Salary & Return on Equity#

df = dataWoo('ceosal1')
dataWoo('ceosal1', description=True)
name of dataset: ceosal1
no of variables: 12
no of observations: 209

+----------+-------------------------------+
| variable | label                         |
+----------+-------------------------------+
| salary   | 1990 salary, thousands $      |
| pcsalary | % change salary, 89-90        |
| sales    | 1990 firm sales, millions $   |
| roe      | return on equity, 88-90 avg   |
| pcroe    | % change roe, 88-90           |
| ros      | return on firm's stock, 88-90 |
| indus    | =1 if industrial firm         |
| finance  | =1 if financial firm          |
| consprod | =1 if consumer product firm   |
| utility  | =1 if transport. or utilties  |
| lsalary  | natural log of salary         |
| lsales   | natural log of sales          |
+----------+-------------------------------+

I took a random sample of data reported in the May 6, 1991 issue of
Businessweek.
df.head()
salary pcsalary sales roe pcroe ros indus finance consprod utility lsalary lsales
0 1095 20 27595.000000 14.1 106.400002 191 1 0 0 0 6.998509 10.225389
1 1001 32 9958.000000 10.9 -30.600000 13 1 0 0 0 6.908755 9.206132
2 1122 9 6125.899902 23.5 -16.299999 14 1 0 0 0 7.022868 8.720281
3 578 -9 16246.000000 5.9 -25.700001 -21 1 0 0 0 6.359574 9.695602
4 1368 7 21783.199219 13.8 -3.000000 56 1 0 0 0 7.221105 9.988894
model = smf.ols(formula='salary ~ 1 + roe', data=df).fit()
print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 salary   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     2.767
Date:                Tue, 09 Jul 2024   Prob (F-statistic):             0.0978
Time:                        21:56:56   Log-Likelihood:                -1804.5
No. Observations:                 209   AIC:                             3613.
Df Residuals:                     207   BIC:                             3620.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    963.1913    213.240      4.517      0.000     542.790    1383.592
roe           18.5012     11.123      1.663      0.098      -3.428      40.431
==============================================================================
Omnibus:                      311.096   Durbin-Watson:                   2.105
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31120.902
Skew:                           6.915   Prob(JB):                         0.00
Kurtosis:                      61.158   Cond. No.                         43.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model.params
Intercept    963.191336
roe           18.501186
dtype: float64
model.nobs
209.0

If the return on equity increases by one percentage point, salary is predicted to increase by about 18.5, that is, by $18,501, since salary is measured in thousands of dollars.

predicted_salary = model.predict(pd.DataFrame({'roe': [30]}))
predicted_salary
0    1518.226927
dtype: float64
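A quick way to see the slope interpretation in the fitted values is to compare predictions one percentage point apart; their difference is exactly the estimated slope (an illustrative check, not part of the original example).

preds = model.predict(pd.DataFrame({'roe': [30, 31]}))
# the difference between the two fitted values equals the slope coefficient
print(preds.iloc[1] - preds.iloc[0])
# the same number read directly from the fit (about 18.5, i.e. $18,501)
print(model.params['roe'])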

Example 2.4 Wage Equation#

df2 = dataWoo('wage1')
dataWoo('wage1', description=True)
name of dataset: wage1
no of variables: 24
no of observations: 526

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| wage     | average hourly earnings         |
| educ     | years of education              |
| exper    | years potential experience      |
| tenure   | years with current employer     |
| nonwhite | =1 if nonwhite                  |
| female   | =1 if female                    |
| married  | =1 if married                   |
| numdep   | number of dependents            |
| smsa     | =1 if live in SMSA              |
| northcen | =1 if live in north central U.S |
| south    | =1 if live in southern region   |
| west     | =1 if live in western region    |
| construc | =1 if work in construc. indus.  |
| ndurman  | =1 if in nondur. manuf. indus.  |
| trcommpu | =1 if in trans, commun, pub ut  |
| trade    | =1 if in wholesale or retail    |
| services | =1 if in services indus.        |
| profserv | =1 if in prof. serv. indus.     |
| profocc  | =1 if in profess. occupation    |
| clerocc  | =1 if in clerical occupation    |
| servocc  | =1 if in service occupation     |
| lwage    | log(wage)                       |
| expersq  | exper^2                         |
| tenursq  | tenure^2                        |
+----------+---------------------------------+

These are data from the 1976 Current Population Survey, collected by
Henry Farber when he and I were colleagues at MIT in 1988.
df2.head()
wage educ exper tenure nonwhite female married numdep smsa northcen ... trcommpu trade services profserv profocc clerocc servocc lwage expersq tenursq
0 3.10 11 2 0 0 1 0 2 1 0 ... 0 0 0 0 0 0 0 1.131402 4 0
1 3.24 12 22 2 0 1 1 3 1 0 ... 0 0 1 0 0 0 1 1.175573 484 4
2 3.00 11 2 0 0 0 0 2 0 0 ... 0 1 0 0 0 0 0 1.098612 4 0
3 6.00 8 44 28 0 0 1 0 1 0 ... 0 0 0 0 0 1 0 1.791759 1936 784
4 5.30 12 7 2 0 0 1 1 0 0 ... 0 0 0 0 0 0 0 1.667707 49 4

5 rows × 24 columns

model2 = smf.ols(formula='wage ~ educ', data=df2).fit()
print(model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   wage   R-squared:                       0.165
Model:                            OLS   Adj. R-squared:                  0.163
Method:                 Least Squares   F-statistic:                     103.4
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           2.78e-22
Time:                        21:56:56   Log-Likelihood:                -1385.7
No. Observations:                 526   AIC:                             2775.
Df Residuals:                     524   BIC:                             2784.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.9049      0.685     -1.321      0.187      -2.250       0.441
educ           0.5414      0.053     10.167      0.000       0.437       0.646
==============================================================================
Omnibus:                      212.554   Durbin-Watson:                   1.824
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              807.843
Skew:                           1.861   Prob(JB):                    3.79e-176
Kurtosis:                       7.797   Cond. No.                         60.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The intercept of −0.90 literally means that a person with no education has a predicted hourly wage of −90¢ an hour, which is obviously not meaningful on its own.

The slope implies that one more year of education is predicted to increase the hourly wage by about 54¢ an hour.
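To see both statements at once, the snippet below (illustrative, not part of the original example) predicts the hourly wage at a few education levels; consecutive predictions differ by the slope times the difference in years.

# predicted hourly wage at 8, 12, and 16 years of education;
# adjacent values differ by 4 * 0.5414, roughly $2.17
print(model2.predict(pd.DataFrame({'educ': [8, 12, 16]})))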

Example 2.5 Vote share#

df3 = dataWoo('vote1')
dataWoo('vote1', description=True)
name of dataset: vote1
no of variables: 10
no of observations: 173

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| state    | state postal code               |
| district | congressional district          |
| democA   | =1 if A is democrat             |
| voteA    | percent vote for A              |
| expendA  | camp. expends. by A, $1000s     |
| expendB  | camp. expends. by B, $1000s     |
| prtystrA | % vote for president            |
| lexpendA | log(expendA)                    |
| lexpendB | log(expendB)                    |
| shareA   | 100*(expendA/(expendA+expendB)) |
+----------+---------------------------------+

From M. Barone and G. Ujifusa, The Almanac of American Politics, 1992.
Washington, DC: National Journal.
df3.head()
state district democA voteA expendA expendB prtystrA lexpendA lexpendB shareA
0 AL 7 1 68 328.295990 8.737000 41 5.793916 2.167567 97.407669
1 AK 1 0 62 626.377014 402.476990 60 6.439952 5.997638 60.881039
2 AZ 2 1 73 99.607002 3.065000 55 4.601233 1.120048 97.014763
3 AZ 3 0 69 319.690002 26.281000 64 5.767352 3.268846 92.403702
4 AR 3 0 75 159.220993 60.054001 66 5.070293 4.095244 72.612473
model3 = smf.ols(formula='voteA ~ shareA', data=df3).fit()
print(model3.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  voteA   R-squared:                       0.856
Model:                            OLS   Adj. R-squared:                  0.855
Method:                 Least Squares   F-statistic:                     1018.
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           6.63e-74
Time:                        21:56:56   Log-Likelihood:                -565.20
No. Observations:                 173   AIC:                             1134.
Df Residuals:                     171   BIC:                             1141.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     26.8122      0.887     30.221      0.000      25.061      28.564
shareA         0.4638      0.015     31.901      0.000       0.435       0.493
==============================================================================
Omnibus:                       20.747   Durbin-Watson:                   1.826
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               44.613
Skew:                           0.525   Prob(JB):                     2.05e-10
Kurtosis:                       5.255   Cond. No.                         112.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

If Candidate A’s share of total campaign spending increases by one percentage point, Candidate A is predicted to receive almost half a percentage point (0.464) more of the total vote.

Example 2.6 Table 2.2#

df['salary_hat'] = model.fittedvalues
df['uhat'] = model.resid
df[['roe','salary','salary_hat','uhat']].head(16)
roe salary salary_hat uhat
0 14.100000 1095 1224.058071 -129.058071
1 10.900000 1001 1164.854261 -163.854261
2 23.500000 1122 1397.969216 -275.969216
3 5.900000 578 1072.348338 -494.348338
4 13.800000 1368 1218.507712 149.492288
5 20.000000 1145 1333.215063 -188.215063
6 16.400000 1078 1266.610785 -188.610785
7 16.299999 1094 1264.760660 -170.760660
8 10.500000 1237 1157.453793 79.546207
9 26.299999 833 1449.772523 -616.772523
10 25.900000 567 1442.372056 -875.372056
11 26.799999 933 1459.023116 -526.023116
12 14.800000 1339 1237.008898 101.991102
13 22.299999 937 1375.767778 -438.767778
14 56.299999 2011 2004.808114 6.191886
15 12.600000 1585 1196.306291 388.693709

The first four CEOs have lower salaries than what we predicted from the OLS regression line (2.26); in other words, given only the firm’s roe, these CEOs make less than what we predicted. As can be seen from the positive uhat, the fifth CEO makes more than predicted from the OLS regression line.
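Two algebraic properties of OLS can be verified directly from Table 2.2: the residuals sum to zero and they are uncorrelated with the regressor (an illustrative check, not part of the original example).

# OLS first-order conditions: the residuals sum to (essentially) zero...
print(df['uhat'].sum())
# ...and the sample covariance between roe and the residuals is (essentially) zero
print(np.cov(df['roe'], df['uhat'])[0, 1])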

Example 2.7 Wage & education#

df2['wage'].mean()
np.float64(5.896102674787035)
df2['educ'].mean()
np.float64(12.562737642585551)
print(model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   wage   R-squared:                       0.165
Model:                            OLS   Adj. R-squared:                  0.163
Method:                 Least Squares   F-statistic:                     103.4
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           2.78e-22
Time:                        21:56:56   Log-Likelihood:                -1385.7
No. Observations:                 526   AIC:                             2775.
Df Residuals:                     524   BIC:                             2784.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.9049      0.685     -1.321      0.187      -2.250       0.441
educ           0.5414      0.053     10.167      0.000       0.437       0.646
==============================================================================
Omnibus:                      212.554   Durbin-Watson:                   1.824
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              807.843
Skew:                           1.861   Prob(JB):                    3.79e-176
Kurtosis:                       7.797   Cond. No.                         60.2
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model2.predict(pd.DataFrame({'educ': [12.56]}))
0    5.894621
dtype: float64

\(\bar{x}\) and \(\bar{y}\) fall on the regression line
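This property can be checked exactly by predicting at the sample mean of educ, which reproduces the sample mean of wage (an illustrative check).

# the fitted line passes through the point of sample means
print(model2.predict(pd.DataFrame({'educ': [df2['educ'].mean()]})))
print(df2['wage'].mean())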

Example 2.8 CEO Salary - R-squared#

print(model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 salary   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     2.767
Date:                Tue, 09 Jul 2024   Prob (F-statistic):             0.0978
Time:                        21:56:56   Log-Likelihood:                -1804.5
No. Observations:                 209   AIC:                             3613.
Df Residuals:                     207   BIC:                             3620.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    963.1913    213.240      4.517      0.000     542.790    1383.592
roe           18.5012     11.123      1.663      0.098      -3.428      40.431
==============================================================================
Omnibus:                      311.096   Durbin-Watson:                   2.105
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31120.902
Skew:                           6.915   Prob(JB):                         0.00
Kurtosis:                      61.158   Cond. No.                         43.3
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model.rsquared
np.float64(0.01318862408103405)

The firm’s return on equity explains only about 1.3% of the variation in salaries for this sample of 209 CEOs.
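The same number can be computed from the definition of R-squared, 1 − SSR/SST, using the residuals and the total variation in salary (an illustrative check that reproduces model.rsquared).

# R-squared = 1 - SSR/SST
ssr = (model.resid ** 2).sum()
sst = ((df['salary'] - df['salary'].mean()) ** 2).sum()
print(1 - ssr / sst)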

Example 2.9 Voting outcome - R-squared#

print(model3.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                  voteA   R-squared:                       0.856
Model:                            OLS   Adj. R-squared:                  0.855
Method:                 Least Squares   F-statistic:                     1018.
Date:                Tue, 09 Jul 2024   Prob (F-statistic):           6.63e-74
Time:                        21:56:56   Log-Likelihood:                -565.20
No. Observations:                 173   AIC:                             1134.
Df Residuals:                     171   BIC:                             1141.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     26.8122      0.887     30.221      0.000      25.061      28.564
shareA         0.4638      0.015     31.901      0.000       0.435       0.493
==============================================================================
Omnibus:                       20.747   Durbin-Watson:                   1.826
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               44.613
Skew:                           0.525   Prob(JB):                     2.05e-10
Kurtosis:                       5.255   Cond. No.                         112.
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
model3.rsquared
np.float64(0.8561408655827665)

The share of campaign expenditures explains over 85% of the variation in the election outcomes for this sample

Exercises#

C1#

The data in 401K are a subset of data analyzed by Papke (1995) to study the relationship between participation in a 401(k) pension plan and the generosity of the plan. The variable prate is the percentage of eligible workers with an active account; this is the variable we would like to explain. The measure of generosity is the plan match rate, mrate. This variable gives the average amount the firm contributes to each worker’s plan for each $1 contribution by the worker. For example, if mrate = 0.50, then a $1 contribution by the worker is matched by a 50¢ contribution by the firm.

df = dataWoo('401K')
df.head()
prate mrate totpart totelg age totemp sole ltotemp
0 26.100000 0.21 1653.0 6322.0 8 8709.0 0 9.072112
1 100.000000 1.42 262.0 262.0 6 315.0 1 5.752573
2 97.599998 0.91 166.0 170.0 10 275.0 1 5.616771
3 100.000000 0.42 257.0 257.0 7 500.0 0 6.214608
4 82.500000 0.53 591.0 716.0 28 933.0 1 6.838405
dataWoo('401K', description=True)
name of dataset: 401k
no of variables: 8
no of observations: 1534

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| prate    | participation rate, percent     |
| mrate    | 401k plan match rate            |
| totpart  | total 401k participants         |
| totelg   | total eligible for 401k plan    |
| age      | age of 401k plan                |
| totemp   | total number of firm employees  |
| sole     | = 1 if 401k is firm's sole plan |
| ltotemp  | log of totemp                   |
+----------+---------------------------------+

L.E. Papke (1995), “Participation in and Contributions to 401(k)
Pension Plans:Evidence from Plan Data,” Journal of Human Resources 30,
311-325. Professor Papke kindly provided these data. She gathered them
from the Internal Revenue Service’s Form 5500 tapes.

(i) Find the average participation rate and the average match rate in the sample of plans.

print("Average participation rate:", round(df['prate'].mean(), 2))
print("Average match rate:", round(df['mrate'].mean(), 2))
Average participation rate: 87.36
Average match rate: 0.73

(ii) Now, estimate the simple regression equation

\( \widehat{prate} = \hat{b}_0 + \hat{b}_1 \, mrate \)

and report the results along with the sample size and R-squared.

prate_hat = smf.ols("prate ~ 1 + mrate", data=df).fit()

print("results:", prate_hat.params)

print("R squared:", prate_hat.rsquared.__round__(3))

print("Sample size:", prate_hat.nobs)
results: Intercept    83.075455
mrate         5.861079
dtype: float64
R squared: 0.075
Sample size: 1534.0

(iii) Interpret the intercept in your equation. Interpret the coefficient on mrate.

print('intercept:', round(prate_hat.params.iloc[0], 2))
intercept: 83.08
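For completeness, the coefficient on mrate is the slope already reported in part (ii): a one-dollar increase in the match rate per dollar contributed by the worker is associated with roughly a 5.86 percentage-point higher participation rate.

print('coefficient on mrate:', round(prate_hat.params.iloc[1], 2))
coefficient on mrate: 5.86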

(iv) Find the predicted prate when mrate = 3.5. Is this a reasonable prediction? Explain what is happening here.

round(prate_hat.predict({'mrate': 3.5}), 2)
0    103.59
dtype: float64
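The prediction of about 103.6 is not reasonable: prate is a participation percentage, so it cannot exceed 100, but the fitted line imposes no such ceiling, and mrate = 3.5 is far above the sample average match rate of 0.73, so this is an extrapolation well outside most of the data. A quick look at the sample (an illustrative check):

# how unusual is a match rate of 3.5, and what is the largest observed participation rate?
print((df['mrate'] >= 3.5).mean())
print(df['prate'].max())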

(v) How much of the variation in prate is explained by mrate? Is this a lot in your opinion?

print("Percentage explained:", round(prate_hat.rsquared * 100, 1))
Percentage explained: 7.5

C2#

The data set in CEOSAL2 contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten is prior number of years as company CEO.

df2 = dataWoo("CEOSAL2")
df2.head()
salary age college grad comten ceoten sales profits mktval lsalary lsales lmktval comtensq ceotensq profmarg
0 1161 49 1 1 9 2 6200.0 966 23200.0 7.057037 8.732305 10.051908 81 4 15.580646
1 600 43 1 1 10 10 283.0 48 1100.0 6.396930 5.645447 7.003066 100 100 16.961130
2 379 51 1 1 9 3 169.0 40 1100.0 5.937536 5.129899 7.003066 81 9 23.668638
3 651 55 1 0 22 22 1100.0 -54 1000.0 6.478509 7.003066 6.907755 484 484 -4.909091
4 497 44 1 1 8 6 351.0 28 387.0 6.208590 5.860786 5.958425 64 36 7.977208
dataWoo("CEOSAL2", description=True)
name of dataset: ceosal2
no of variables: 15
no of observations: 177

+----------+--------------------------------+
| variable | label                          |
+----------+--------------------------------+
| salary   | 1990 compensation, $1000s      |
| age      | in years                       |
| college  | =1 if attended college         |
| grad     | =1 if attended graduate school |
| comten   | years with company             |
| ceoten   | years as ceo with company      |
| sales    | 1990 firm sales, millions      |
| profits  | 1990 profits, millions         |
| mktval   | market value, end 1990, mills. |
| lsalary  | log(salary)                    |
| lsales   | log(sales)                     |
| lmktval  | log(mktval)                    |
| comtensq | comten^2                       |
| ceotensq | ceoten^2                       |
| profmarg | profits as % of sales          |
+----------+--------------------------------+

See CEOSAL1.RAW

(i) Find the average salary and the average tenure in the sample.

print("Average Salary:", round(df2['salary'].mean(), 3))
print("Average ceoten", round(df2["ceoten"].mean(), 2))
Average Salary: 865.864
Average ceoten 7.95

(ii) How many CEOs are in their first year as CEO (that is, ceoten = 0)? What is the longest tenure as a CEO?

print("Number of first year CEO:", (df2['ceoten'] == 0).sum())
print("Longest Tenure:", df2["ceoten"].max())
Number of first year CEO: 5
Longest Tenure: 37

(iii) Estimate the simple regression model \( \log(salary) = B_0 + B_1 \, ceoten + u \), and report your results in the usual form. What is the (approximate) predicted percentage increase in salary given one more year as a CEO?

log_salary_hat = smf.ols("np.log(salary) ~ 1 + ceoten", data=df2).fit()

print("Paramters:\n", log_salary_hat.params, sep='')

print("Percentage increase:", round(log_salary_hat.params.iloc[1] * 100, 2))
Parameters:
Intercept    6.505498
ceoten       0.009724
dtype: float64
Percentage increase: 0.97
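The 0.97% figure uses the approximation that 100 times a change in log(salary) is a percentage change; the exact change implied by the coefficient is 100·(exp(b1) − 1), which is essentially the same here (an illustrative check).

# exact percentage change implied by the log-level coefficient
b1 = log_salary_hat.params.iloc[1]
print(round(100 * (np.exp(b1) - 1), 2))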

C3#

Use the data in SLEEP75 from Biddle and Hamermesh (1990) to study whether there is a tradeoff between the time spent sleeping per week and the time spent in paid work. We could use either variable as the dependent variable. For concreteness, estimate the model \( sleep = B_0 + B_1 \, totwrk + u \), where sleep is minutes spent sleeping at night per week and totwrk is total minutes worked during the week.

df3 = dataWoo("sleep75")
df3.head()
age black case clerical construc educ earns74 gdhlth inlf leis1 ... spwrk75 totwrk union worknrm workscnd exper yngkid yrsmarr hrwage agesq
0 32 0 1 0.0 0.0 12 0.0 0 1 3529 ... 0 3438 0 3438 0 14 0 13 7.070004 1024
1 31 0 2 0.0 0.0 14 9500.0 1 1 2140 ... 0 5020 0 5020 0 11 0 0 1.429999 961
2 44 0 3 0.0 0.0 17 42500.0 1 1 4595 ... 1 2815 0 2815 0 21 0 0 20.529997 1936
3 30 0 4 0.0 0.0 12 42500.0 1 1 3211 ... 1 3786 0 3786 0 12 0 12 9.619998 900
4 64 0 5 0.0 0.0 14 2500.0 1 1 4052 ... 1 2580 0 2580 0 44 0 33 2.750000 4096

5 rows × 34 columns

dataWoo("sleep75", description=True)
name of dataset: sleep75
no of variables: 34
no of observations: 706

+----------+--------------------------------+
| variable | label                          |
+----------+--------------------------------+
| age      | in years                       |
| black    | =1 if black                    |
| case     | identifier                     |
| clerical | =1 if clerical worker          |
| construc | =1 if construction worker      |
| educ     | years of schooling             |
| earns74  | total earnings, 1974           |
| gdhlth   | =1 if in good or excel. health |
| inlf     | =1 if in labor force           |
| leis1    | sleep - totwrk                 |
| leis2    | slpnaps - totwrk               |
| leis3    | rlxall - totwrk                |
| smsa     | =1 if live in smsa             |
| lhrwage  | log hourly wage                |
| lothinc  | log othinc, unless othinc < 0  |
| male     | =1 if male                     |
| marr     | =1 if married                  |
| prot     | =1 if Protestant               |
| rlxall   | slpnaps + personal activs      |
| selfe    | =1 if self employed            |
| sleep    | mins sleep at night, per wk    |
| slpnaps  | minutes sleep, inc. naps       |
| south    | =1 if live in south            |
| spsepay  | spousal wage income            |
| spwrk75  | =1 if spouse works             |
| totwrk   | mins worked per week           |
| union    | =1 if belong to union          |
| worknrm  | mins work main job             |
| workscnd | mins work second job           |
| exper    | age - educ - 6                 |
| yngkid   | =1 if children < 3 present     |
| yrsmarr  | years married                  |
| hrwage   | hourly wage                    |
| agesq    | age^2                          |
+----------+--------------------------------+

J.E. Biddle and D.S. Hamermesh (1990), “Sleep and the Allocation of
Time,” Journal of Political Economy 98, 922-943. Professor Biddle
kindly provided the data.
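A minimal sketch of the estimation the exercise asks for, following the same pattern as the earlier examples (results not reproduced here):

# estimate sleep = B0 + B1*totwrk + u and report the results in the usual form
sleep_model = smf.ols('sleep ~ 1 + totwrk', data=df3).fit()
print(sleep_model.summary())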