Becoming Fluent in Data
I Intro
Preface
About the book
Aims of this book
For students
For instructors
Structure
Color
Components
Why R?
Why Tidyverse?
About the author
Dust and Dark
Teach – Learn – Repeat
Intro to R
R is a calculator
R is more than a calculator
Define objects
Numeric Variables
Logical Variables
String Variables
Factor Variables
Combine objects
Vectors
Data frame
Plots
Intro to Tidyverse
Data with readr
Verbs of dplyr
Glimpse
Select
Arrange
Rename
The pipe operator
Filter
Mutate
Summarize
Graphs with ggplot2
II Data
1 Data is everywhere
1.1 Why we measure
1.1.1 Women are having far fewer children.
1.1.2 Global surface temperature is rising.
1.2 Means of measuring
1.3 Types of data
1.3.1 Origin of data
1.3.2 Analysis of data
1.3.3 Structure of data
1.3.4 The level of access
1.4 Can we measure everything?
1.5 The reality behind the data
2 Stories and Visuals
2.1 Facts
2.2 Visualization
2.3 Telling a story
2.4 Man's best friend
2.5 Less is more
2.6 Grammar of Graphics
3 Tabular Data
3.1 Types of Tabular Data
3.1.1 Cross-section
3.1.2 Repeated cross-section
3.1.3 Time series
3.1.4 Panel data
4 Panel Data
4.1 Unemployment
4.1.1 On decline in Germany
4.1.2 Measurement
4.2 Application
4.2.1 Data Inspection
4.2.2 Data Preparation
4.2.3 Data Visualization
4.3 Panel Studies
5 Time data
5.1 Measuring Time
5.2 Measuring Dates
5.3 Your First Time (in R)
5.4 Time Zones
5.5 Time Management in R
5.5.1 Decimal Time
5.5.2 Time Formats
5.6 Coffee Spending
5.6.1 Spending Time of Day
5.6.2 Run Chart
5.6.3 Run Chart Grouped
5.7 Run Chart Grouped Cleaned
6 Web Data
6.1 Most expensive paintings
6.2 Student numbers at Viadrina
6.2.1 PDF scraping
6.2.2 Share of female students
6.2.3 Share of foreign students
6.2.4 Web scraping
6.2.5 Most recent student numbers
6.2.6 The long run trend
7 Geo Data
7.1 Geo coordinates
7.1.1 Where Are You?
7.1.2 Latitude and longitude
7.1.3 Angles and Degrees
7.1.4 Coordinate Reference System
7.1.5 Distance measurement
7.1.6 Points And Polygons
7.1.7 Shapefiles
7.2 Google Takeout
7.3 Blood Donation
8 Missing Data
8.1 Types of Missing Data
8.1.1 Missing Completely at Random (MCAR)
8.1.2 Missing at Random (MAR)
8.1.3 Missing Not at Random (MNAR)
8.2 Causes of Missing Data
8.3 Causes of Missing Data
8.4 Causes of Missing Data
8.5 Causes of Missing Data
8.6 Missing Data in R
III Models
9 Relationships
9.1 Storks Deliver Babies
9.2 Statistics
9.2.1 Variance
9.2.2 Standard Deviation
9.2.3 Covariance
9.2.4 Correlation
9.2.5 An Early Glimpse into Regression
9.3 Visualizations
9.3.1 Storks and Population in a Scatterplot
9.3.2 Storks and Population in a Barplot
9.3.3 Storks and Area
9.4 Spurious Relationships
Readings
10 Regression
10.1 Old but Gold
10.2 Data is everywhere
10.2.1 Data in a table
10.2.2 Data in a graph
10.2.3 The trend
10.2.4 The blackbox
10.2.5 Nobody's perfect
10.2.6 Vocab Wrap-Up
10.3 For the truly dedicated
10.3.1 Algebra
10.3.2 Analysis
10.3.3 Take the Long Way Home
10.4 Survival of the Fittest Line
10.5 On the Shoulders of Giants
11 Linear Regression
11.1 What You Deserve Is What You Get
11.2 Data & Sample
11.3 Data Visualization
11.4 Simplest Regression
11.5 Simple Regression
11.5.1 X is continuous
11.5.2 X is a dummy
11.5.3 X is categorical
11.5.4 X is categorical, is it?
11.6 Parallel Slopes
11.6.1 X is continuous + dummy
11.7 Model Comparison
11.8 Transform to Perform
12 Interaction Models
12.1 Motivation
12.2 Data & Sample
12.3 Throwback Parallel Slopes
12.4 Regression with Moderators
12.4.1 Dummy * Dummy
12.4.2 Categorical * Dummy
12.4.3 Continuous * Dummy
12.4.4 Continuous * Continuous
12.5 Model Comparison
References
12.6 Resources
This book is a work in progress. I appreciate your feedback to make the book better.
Becoming Fluent in Data
A Personal Journey – Every Time.
Marco Kühne
2024-02-12
Preface