Econometric Data Science

Prof. F.X. Diebold

Fall 2019

Welcome!

This course provides an undergraduate introduction to modern econometrics, in
both cross-section and time-series environments.

Prerequisites: Econ 103 must be taken *prior*
to Econ 104. (Certain other introductory Penn courses or course sequences in
probability and statistics including regression may be acceptable, again taken *prior* to Econ 104, such as Penn's Stat
430+431 or Penn's Engineering equivalent. Any other
background must be explicitly approved.)

Heavily-used site: *Econometric Data Science*
(open text, slides, data, code, etc.). The site is constantly evolving, so
check frequently for updates. The course outline is effectively the text's
table of contents, but the slides and lectures, *not the text*, are the centerpiece of everything. The text is
offered as one tool to help you understand them. Although the slides are
self-contained, there is no way to understand them without attending lectures,
where I will interpret/embellish/generalize/specialize them, guiding you
selectively. Hence regular class attendance is absolutely essential.

Relevant texts, recommended but not required, include: Gujarati, *Econometrics by Example*, latest
edition (pragmatic, easy to read); Wooldridge, *Introductory Econometrics: A Modern Approach*, latest
edition (balanced and comprehensive); Stock and Watson, *Introduction to Econometrics*,
latest edition (deep and insightful, worth the time investment).

Software: Your choice. The *Econometric Data Science* site
has some R, EViews, Stata, and Python code samples. R
is the official course software.

TA office hours as
announced in class. Professor Diebold’s office
hours (held in PCPSE 607) here.

Grading will be based on equally-weighted problem sets (50% of final grade)
and equally-weighted exams (50% of final grade). Problem sets are due at the
start of class on the assigned day. *Under
no circumstances will late problem sets be accepted*, so be sure to start
(and finish) them early, to insure against illness and emergencies.

**Important Administrative Policies:** Here. (*Read them carefully.*)

**Important Dates and Assignments (All are tentative until
confirmed/discussed in class)**:

*** Tues Sept 10 ***

Add period ends.

*** Thurs Oct 3 ***

In-Class Exam 1 (No books, notes, electronic devices, etc.)

*** Tues Oct 8 ***

Problem Set 1 (Must be done alone. Show all code in an appendix.)

Obtain the test score dataset. (1) Display a scatterplot of
math score (MATH_SCR) vs. student/teacher ratio (STR). (2) Regress MATH_SCR on
an intercept alone. Interpret this regression and discuss your results. In this
intercept regression framework, how would you test the hypothesis that the
(population) mean score is 82? Do it, and discuss your results. Now conduct a
one-sided hypothesis test that the population mean score is not less than 82
and discuss your results. (3) Regress MATH_SCR on an intercept and STR. Discuss
your results. Do you need an intercept? Again graph
MATH_SCR vs. STR, this time with your preferred fitted regression line
superimposed.
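
As a rough illustration of the mechanics behind part (2), here is a minimal Python sketch. It uses synthetic numbers, not the test score dataset, and it approximates the t distribution by the standard normal, which is an assumption that is reasonable only in large samples. The variable names (`math_scr`, `mu0`, and so on) are hypothetical. The key point is that OLS on an intercept alone returns the sample mean, so testing the intercept is the same as testing the population mean.

```python
import math
import random
import statistics

# Synthetic stand-in for MATH_SCR (NOT the course dataset).
random.seed(0)
math_scr = [random.gauss(80.0, 5.0) for _ in range(200)]
n = len(math_scr)

# OLS on an intercept alone: the coefficient estimate is the sample mean,
# and its standard error is s / sqrt(n), with s the sample std. deviation.
b0 = statistics.fmean(math_scr)
se_b0 = statistics.stdev(math_scr) / math.sqrt(n)

# t statistic for the hypothesis that the population mean equals mu0.
mu0 = 82.0
t_stat = (b0 - mu0) / se_b0

# Large-sample normal approximation to the t distribution (an assumption).
z = statistics.NormalDist()
p_two_sided = 2 * (1 - z.cdf(abs(t_stat)))  # H0: mean = 82
p_one_sided = z.cdf(t_stat)                 # H0: mean >= 82 vs. H1: mean < 82

print(f"b0 = {b0:.3f}, se = {se_b0:.3f}, t = {t_stat:.3f}")
print(f"two-sided p = {p_two_sided:.4g}, one-sided p = {p_one_sided:.4g}")
```

In practice a regression routine reports the same intercept, standard error, and t statistic directly; the hand calculation is shown only to make the logic visible.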

*** Mon Oct 7 ***

Drop period ends.

*** Fri Oct 25 ***

Grade type change deadline.

*** Thurs Oct 31 ***

In-Class Exam 2 (No books, notes, electronic devices, etc.)

*** Mon Nov 4 ***

Last day to withdraw.

*** Tues Nov 5 ***

Problem Set 2 (May be done in groups of at most three. I expect a creative
analysis, well-defended yet qualified as appropriate, thorough yet concise,
maximum 15 pages. Show all code in an appendix.)

(1) Regress READING score on student/teacher ratio. (2) Select a "best"
predictive regression model for reading score. Among other things, you may want
to consider non-normality, outliers, group effects, nonlinearities, and heteroskedasticity. Do the results differ from those of
Regression 1? Interpret your results. (3) Repeat 1 and 2 with a predictive
regression model for MATH score. Are your selected models the same for
reading and math? (4) Suppose California creates a new school district,
and that legislators mandate a 15/1 student/teacher ratio. Based on
that information alone, predict the new district's average reading score
(point, interval, density). (5) Now suppose that,
in addition, you learn that the new district has average income $7,000, 50%
English learners, 60% qualifying for a reduced-price lunch, and all other
variables are at their dataset sample mean. Predict the district's average
reading score (point, interval, density).
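
To fix ideas about what "point, interval, density" means in part (4), here is a minimal Python sketch for a simple regression on synthetic data (not the course dataset). It assumes Gaussian disturbances with constant variance, and it uses the textbook forecast standard error that includes parameter estimation uncertainty; whether to include that term, and whether normality is tenable, are exactly the kinds of choices you should defend. All names and numbers in the sketch are hypothetical.

```python
import math
import random
import statistics

# Synthetic (student/teacher ratio, reading score) pairs; NOT the course data.
random.seed(1)
str_ratio = [random.uniform(14.0, 26.0) for _ in range(300)]
read_scr = [700.0 - 2.0 * x + random.gauss(0.0, 15.0) for x in str_ratio]
n = len(str_ratio)

# OLS slope and intercept for a one-regressor model.
x_bar = statistics.fmean(str_ratio)
y_bar = statistics.fmean(read_scr)
sxx = sum((x - x_bar) ** 2 for x in str_ratio)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(str_ratio, read_scr))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Standard error of the regression (n - 2 degrees of freedom).
resid = [y - (b0 + b1 * x) for x, y in zip(str_ratio, read_scr)]
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))

# Point forecast at a 15/1 student/teacher ratio.
x_new = 15.0
point = b0 + b1 * x_new

# Forecast standard error: disturbance variance plus estimation uncertainty.
se_pred = s * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)

# Density forecast under the Gaussian assumption, and a 95% interval from it.
density = statistics.NormalDist(mu=point, sigma=se_pred)
lo, hi = density.inv_cdf(0.025), density.inv_cdf(0.975)

print(f"point = {point:.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
print(f"density: N({point:.2f}, {se_pred:.2f}^2)")
```

The same three objects carry over to part (5): the point forecast is the fitted value at the new regressor values, and the interval and density are built around it.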

*** Thurs Dec 5 ***

In-Class Exam 3 (No books, notes, electronic devices, etc.)

*** Mon Dec 16 ***

Problem Set 3 (May be done in groups of at most three. I expect a creative
analysis, well-defended yet qualified as appropriate, thorough yet concise,
maximum 15 pages. Show all code in an appendix.)

(1) Specify and estimate a
model of U.S. monthly domestic auto sales, NSA (series DAUTONSA from FRED),
using ONLY data for January 1967 - February 2019. Among other things, you
may want to consider trend, seasonality and other calendar effects,
nonlinearity, and autoregressive dynamics (with at most six lags). Do NOT worry
about possible structural change, non-normality, or heteroskedasticity.
(2) Use your preferred model from part 1 to make out-of-sample point, interval,
and density forecasts of DAUTONSA for March 2019. Evaluate the performance of your
forecasts. Again, do not worry about possible structural change,
non-normality, or heteroskedasticity. (In particular,
construct your forecasts assuming structural stability and Gaussian
disturbances with constant variance.) (3) Now worry about possible
structural change and/or non-normality and/or heteroskedasticity,
and re-do the analyses of parts 1 and 2. How, if at all, do your preferred
model and forecasts change? (4) Bonus (+5 points max, +4 pages max): How would
you extend your point, interval, and density forecasts to September 2019?
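
As one possible starting point for parts (1) and (2), here is a minimal Python sketch that regresses a monthly series on a linear trend, twelve monthly dummies (with no separate intercept, to avoid perfect collinearity), and one autoregressive lag, and then forms a one-step-ahead forecast. It runs on a synthetic series, not DAUTONSA. The lag order, the linear trend, and the interval that ignores parameter estimation uncertainty are all simplifying assumptions made for illustration, and the helper `design` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly series with trend, seasonality, and AR(1) dynamics.
# A stand-in only; it is NOT the FRED series DAUTONSA.
T = 240
y = np.empty(T)
y[0] = 75.0
for t in range(1, T):
    seasonal = 5.0 * np.sin(2 * np.pi * (t % 12) / 12)
    y[t] = 30.0 + 0.02 * t + seasonal + 0.6 * y[t - 1] + rng.normal(0.0, 2.0)


def design(t_idx, y_lag):
    """Regressors: linear trend, 12 monthly dummies, and one lag of y."""
    dummies = np.eye(12)[t_idx % 12]
    return np.column_stack([t_idx, dummies, y_lag])


# Estimate by OLS on observations 1..T-1 (the first is lost to the lag).
t_idx = np.arange(1, T)
X = design(t_idx, y[:-1])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
resid = y[1:] - X @ beta
sigma = np.sqrt(resid @ resid / (X.shape[0] - X.shape[1]))

# One-step-ahead forecast for period T, assuming structural stability and
# Gaussian disturbances with constant variance.
x_next = design(np.array([T]), np.array([y[-1]]))
point = (x_next @ beta).item()
lo, hi = point - 1.96 * sigma, point + 1.96 * sigma

print(f"AR(1) coefficient estimate: {beta[-1]:.3f}")
print(f"point = {point:.2f}, 95% interval = [{lo:.2f}, {hi:.2f}]")
print(f"density: N({point:.2f}, {sigma:.2f}^2)")
```

Forecasts more than one step ahead, as in the bonus question, require feeding earlier forecasts back in as lagged values, and the forecast error variance then grows with the horizon.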

**Note Well:** *Changes may be implemented at any time. Check this site frequently, and
attend class, for updates and explanations.*