<Preface>
Statistics, or data analysis, has long been a challenging subject for many
students because it is built on mathematics and involves difficult concepts
related to probability. I experienced similar frustration when I was in college
and graduate school. Ironically, despite such difficulties, my interest in data
analysis grew and led me to my current research and teaching. I have taught
data analysis courses for more than 15 years, beginning when I was a graduate
student, and throughout I have been trying to find a better way to teach this
subject. From my teaching, I have learned a new path to teaching data analysis.
This textbook is the outcome of my experience as both a student and a teacher
in data analysis courses.
This book is an introductory data analysis textbook for college and graduate
students who have not studied this subject before. Unlike other data analysis
textbooks, it focuses on the methods that are commonly used in quantitative
reports and research papers in social science. In particular, it covers how the
sample mean and the regression model can be applied to cross-sectional data for
various purposes. On the other hand, this book does not cover some of the
conventional empirical methods that can be replaced by OLS regression analysis,
such as the Chi-square test, ANOVA, and ANCOVA. To aid understanding, this book
also includes practical examples and exercises.
This textbook is designed for a one-semester course. Most instructors can cover
all chapters over 15 weeks by teaching each chapter in one to two weeks. After
completing a course with this book, students should be able to empirically
analyze various topics in social science using the sample mean and the
regression model.
February 2024
Haeil Jung
<Contents>
Chapter 1
How do we examine our interests with data?: Distribution and mean
• Understanding our world with data
• Mapping what we want to study into numbers
• Less likely or more likely? Think about the probabilities of events
• Which group of subjects do we want to study?: The population of interest and the random sample
• Random sample assumption and sampling methods
• What useful information can we have from a sample?: Sample mean and sample variance
• Normal distribution and its application: One of the most popular and useful distributions
• Alternative measures to mean: Median and mode
• Chapter Summary
• Exercises
Chapter 2
Do more with the sample mean: Inference
• Sampling distribution of the sample mean and the Central Limit Theorem
• The confidence interval (CI) for the population mean μ
• Hypothesis test for the population mean μ
• How to choose an appropriate sample size in the survey for inference
• Chapter Summary
• Exercises
Chapter 3
Examining the relationship between the two quantitative variables I:
Correlation coefficient and introduction to the OLS regression analysis
• Covariance and correlation coefficient
• Introduction to the OLS regression analysis
• Chapter Summary
• Exercises
Chapter 4
Examining the relationship between the two continuous variables II: Inference
in the OLS regression analysis
• The normality of the error term and the sampling distribution of the OLS estimator
• The linear regression model when the sample size becomes larger
• The confidence interval (CI) for the regression parameter β1
• Hypothesis test for the regression parameter β1
• Chapter Summary
• Exercises
Chapter 5
Handling two or more explanatory variables in OLS regression analysis I:
Multivariate Regression Analysis
• Partialling out and multicollinearity in multivariate regression analysis
• Omitted variable bias in the linear regression model
• Adding an explanatory variable and the efficiency of OLS estimators
• Chapter Summary
• Exercises
Chapter 6
Handling two or more explanatory variables in OLS regression analysis II:
Hypothesis tests and more in Multivariate Regression Analysis
• Hypothesis tests in multivariable regression analysis
• Adjusted R-squared
• Chapter Summary
• Exercises
Chapter 7
The OLS regression analysis when comparing the outcomes of the two or more
groups: Use of binary explanatory variables
• Estimating group differences in an outcome variable
• Estimating group differences in an outcome variable without the constant
• Estimating group differences using an interval variable
• Estimating group differences in a slope coefficient
• Estimating group differences in all explanatory variables
• Estimating the nonlinear relationship between an explanatory variable and an outcome variable
• Subsample analysis based on exogenous explanatory variables
• Chapter Summary
• Exercises
Chapter 8
Developing and completing the OLS regression analysis by using rescaling and
functional specifications
• Rescaling of the outcome and explanatory variables
• Linearity in the OLS analysis
• Linear and nonlinear specifications in the OLS analysis
• Choosing specifications by considering three different types of causal paths
• General rules for including additional variables and making specifications in multivariate regression analysis
• Chapter Summary
• Exercises
Chapter 9
The OLS regression analysis when the variance of the error term depends on the
explanatory variables: Heteroscedasticity
• Chapter Summary
• Exercises
Chapter 10
The regression analysis when the outcome variable is binary: LPM, Logit, and
Probit
• Linear Probability Model (LPM): Using OLS when the outcome variable is binary
• The estimation of logit and probit models
• Statistical inference and goodness of fit for probit and logit models
• Chapter Summary
• Exercises
Appendix
A. Software programs for data analysis: SPSS, SAS, Stata, R
B. How to do a reliable empirical study
C. z distribution table: standard normal curve tail probabilities
D. t distribution table: critical values of the t distribution
E. Chi-square distribution table: critical values of the Chi-square distribution
F. F distribution table: critical values of the F distribution
<Author>
Haeil Jung is a professor in the Department of Public Administration at Korea
University in Seoul, South Korea. He earned his PhD in Public Policy from the
University of Chicago, Chicago, USA. Before assuming his current role, he was
an assistant professor in the Paul H. O'Neill School of Public and
Environmental Affairs at Indiana University, Bloomington, IN, USA, from 2009 to
2015. Additionally, from 2012 to 2020, he served as a consultant for the World
Bank, where he played a key role in the evaluation of the early childhood
education program in Indonesia. His research expertise lies in policy analysis
and program evaluation, particularly focusing on poverty, inequality, and
related social policy interventions. He has authored numerous peer-reviewed
research articles on diverse topics such as early childhood education, college
education, labor market participation, immigration, fertility, obesity,
incarceration, COVID-19, and empirical methods, making significant
contributions to these fields. Along with his research, he has a comprehensive
teaching background. He has taught introductory, intermediate, and advanced
data analysis courses, as well as social policy courses, at the University of
Chicago, Indiana University, and Korea University.