AP Statistics Study Guide: Data, Inference, and the Writing That Earns Full Credit

10 min read · By warpread.app

AP Statistics differs from other AP math courses in a fundamental way: it is primarily a reasoning course that happens to involve mathematical procedures, not a computation course. The exam rewards students who can think statistically — understanding what data reveals, what inference allows us to conclude, and how to communicate statistical reasoning in precise language.

Students who focus only on formula memorisation and procedure execution consistently underperform. Students who also practice the written communication requirements — describing distributions, interpreting outputs, writing conclusions — consistently outperform their expectations.

Exploring data: describing distributions fully

Every distribution description must include four elements: Shape, Centre, Spread, and Outliers (SOCS). Skipping any element costs points.

Shape: Symmetric vs skewed. Right-skewed (tail extends to the right; mean typically > median). Left-skewed (tail extends to the left; mean typically < median). Unimodal vs bimodal. Note whether the distribution looks approximately normal.

Centre: Mean (use for symmetric distributions) or median (use for skewed distributions or when outliers are present).

Spread: Standard deviation (paired with mean) or IQR (paired with median). Range is rarely sufficient alone.

Outliers: Identify using the 1.5 × IQR rule: below Q1 − 1.5(IQR) or above Q3 + 1.5(IQR). State both whether outliers exist and their approximate values.
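As a rough illustration of the 1.5 × IQR rule, here is a minimal Python sketch. The `scores` data and the function name `iqr_outliers` are invented for this example. Quartile conventions differ between textbooks, calculators, and software, so the fences may not match a hand calculation exactly.

```python
import statistics

def iqr_outliers(data):
    """Return (lower fence, upper fence, outliers) using the 1.5 x IQR rule."""
    # "inclusive" is one of several quartile conventions; others give
    # slightly different Q1 and Q3 values.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical quiz scores, made up for illustration.
scores = [12, 15, 16, 18, 19, 20, 21, 22, 24, 60]
low, high, flagged = iqr_outliers(scores)
print(f"Fences: {low} to {high}; outliers: {flagged}")
```

In a written answer, report both the fact that an outlier exists and its approximate value, as the rule above requires.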

Comparing two distributions: Use all four elements, explicitly comparing them: "Distribution A has a higher median (45 vs 32) and is more spread out (IQR = 20 vs IQR = 12). Both are approximately symmetric with no apparent outliers."

Regression: Know how to interpret each piece of output. Slope: "for each additional [unit of x], the predicted y increases/decreases by [slope value]." Y-intercept: the predicted y when x = 0 (interpret it only when x = 0 is meaningful in context). Correlation coefficient r: the direction and strength of the linear relationship. Coefficient of determination r²: "approximately [r² × 100]% of the variation in y is explained by the linear relationship with x." Residual = observed y − predicted y. Always check a residual plot: a clear pattern, such as a curve, suggests the linear model is not appropriate.
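The quantities above can be computed by hand from the usual least-squares formulas. The sketch below does so in plain Python on a small made-up data set; the `x` and `y` values are hypothetical and chosen only to keep the arithmetic simple.

```python
import math

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                      # predicted change in y per unit of x
intercept = mean_y - slope * mean_x    # predicted y when x = 0
r = sxy / math.sqrt(sxx * syy)         # correlation coefficient
r_squared = r ** 2                     # proportion of variation explained

# Residual = observed y minus predicted y.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"y-hat = {intercept:.2f} + {slope:.2f}x, r = {r:.3f}, r^2 = {r_squared:.2f}")
print("Residuals:", [round(e, 2) for e in residuals])
```

Least-squares residuals always sum to zero (up to rounding), so a residual plot is judged by its pattern, not by its total.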

Use the Cornell Notes Tool to build an interpretation template for each output type — keep the template language alongside the statistical formula.

Sampling and experimental design

Sampling methods: Simple random sample (SRS) — every possible sample of size n has an equal probability of selection. Stratified random sample — divide into strata, take SRS from each. Cluster sample — divide into clusters, randomly select clusters, include all individuals in selected clusters. Systematic sample — every kth individual. Know the advantages and disadvantages of each.
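To make the four methods concrete, here is a small Python sketch using the standard `random` module. The population of 100 student IDs, the strata, the cluster names, and all function names are hypothetical, chosen for illustration only.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible group of n individuals is equally likely.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # Take a separate SRS from each stratum.
    return [x for group in strata.values() for x in rng.sample(group, n_per_stratum)]

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters and keep everyone in them.
    chosen = rng.sample(list(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

def systematic_sample(population, k, rng):
    # Random starting point, then every kth individual.
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(1)  # fixed seed so the example is repeatable
population = list(range(1, 101))  # hypothetical student IDs 1..100
strata = {"juniors": population[:50], "seniors": population[50:]}
clusters = {f"homeroom_{i}": population[i * 10:(i + 1) * 10] for i in range(10)}

srs = simple_random_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
cluster = cluster_sample(clusters, 2, rng)
systematic = systematic_sample(population, 10, rng)
print(len(srs), len(stratified), len(cluster), len(systematic))
```

Note the trade-off visible even in this toy version: the stratified sample guarantees five juniors and five seniors, while the cluster sample is cheaper to collect but depends heavily on which homerooms happen to be chosen.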

Sources of bias: Voluntary response bias (self-selection produces non-representative samples — online polls), undercoverage bias (some groups excluded from the sampling frame), nonresponse bias (selected individuals who don't respond differ systematically from those who do), question wording bias (leading questions).

Experimental vs observational study: In an experiment, the researcher assigns treatments; in an observational study, the researcher observes without intervening. Only well-designed experiments with random assignment can establish causation. Know the three principles of experimental design: randomisation (random assignment balances potential confounders across treatment groups), replication (enough subjects per group to separate treatment effects from chance variation), and control (holding extraneous variables constant or comparing against a control group). Know the difference between a completely randomised design and a randomised block design.
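The difference between the two designs is easiest to see in how subjects are assigned. The following is a minimal sketch, assuming 12 hypothetical subjects, two treatments, and age as the blocking variable; none of these details come from a real study.

```python
import random

def completely_randomised(subjects, treatments, rng):
    # Shuffle everyone, then deal treatments out in turn so groups are equal.
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    return {s: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

def randomised_block(blocks, treatments, rng):
    # Randomly assign treatments separately within each block.
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomised(members, treatments, rng))
    return assignment

rng = random.Random(3)  # fixed seed so the example is repeatable
subjects = [f"S{i}" for i in range(1, 13)]  # hypothetical subject labels
treatments = ["drug", "placebo"]
blocks = {"under_40": subjects[:6], "40_plus": subjects[6:]}  # hypothetical blocks

crd = completely_randomised(subjects, treatments, rng)
rbd = randomised_block(blocks, treatments, rng)
print("Completely randomised:", crd)
print("Randomised block:", rbd)
```

In the block design, each age group is guaranteed three subjects per treatment, so age cannot be unevenly split between the drug and placebo groups by chance.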

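Confounding variables: A confounding variable is related to both the explanatory and response variables, so its effect on the response cannot be separated from the effect of the explanatory variable. A lurking variable is a related idea: a variable not measured in the study that may influence the relationship being examined. The term comes up most often when discussing observational studies.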

Probability and sampling distributions

Basic probability: P(A or B) = P(A) + P(B) − P(A and B). P(A and B) = P(A) × P(B|A). Independent events: P(A and B) = P(A) × P(B). Conditional probability: P(A|B) = P(A and B)/P(B).
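A quick worked check of these rules, using a standard 52-card deck with A = "the card is a heart" and B = "the card is a face card". The choice of events is ours, for illustration; exact fractions avoid rounding.

```python
from fractions import Fraction

# Standard 52-card deck: 13 hearts, 12 face cards, 3 cards that are both.
p_a = Fraction(13, 52)         # P(heart)
p_b = Fraction(12, 52)         # P(face card)
p_a_and_b = Fraction(3, 52)    # P(heart and face card)

# Addition rule: subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

# Conditional probability: restrict attention to the face cards.
p_a_given_b = p_a_and_b / p_b

# A and B are independent exactly when P(A|B) equals P(A).
independent = p_a_given_b == p_a

print(f"P(A or B) = {p_a_or_b}, P(A|B) = {p_a_given_b}, independent: {independent}")
```

Here P(A|B) = 1/4 = P(A), so knowing the card is a face card does not change the chance that it is a heart, and the multiplication rule P(A and B) = P(A) × P(B) holds.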

Random variables: Mean of X: μ_X = Σ[x · P(X=x)]. Variance: σ²_X = Σ[(x − μ_X)² · P(X=x)]. Linear transformations: μ_(aX+b) = aμ_X + b; σ_(aX+b) = |a|σ_X. Sums: μ_(X+Y) = μ_X + μ_Y for any X and Y; if X and Y are independent, σ²_(X+Y) = σ²_X + σ²_Y (variances add, standard deviations do not).
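These formulas can be verified directly on a small discrete distribution. The sketch below uses a made-up random variable X (the number of heads in two fair coin flips) and the hypothetical constants a = 3 and b = 2.

```python
from fractions import Fraction

def mean(dist):
    # mu = sum of x * P(X = x)
    return sum(x * p for x, p in dist.items())

def variance(dist):
    # sigma^2 = sum of (x - mu)^2 * P(X = x)
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# X = number of heads in two fair coin flips.
dist_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Linear transformation aX + b with hypothetical a = 3, b = 2.
a, b = 3, 2
dist_ax_b = {a * x + b: p for x, p in dist_x.items()}

# Distribution of X + Y, where Y is an independent copy of X.
dist_sum = {}
for x, px in dist_x.items():
    for y, py in dist_x.items():
        dist_sum[x + y] = dist_sum.get(x + y, 0) + px * py

print("mean, var of X:", mean(dist_x), variance(dist_x))
print("mean, var of 3X + 2:", mean(dist_ax_b), variance(dist_ax_b))
print("mean, var of X + Y:", mean(dist_sum), variance(dist_sum))
```

The variance of 3X + 2 is 9 times the variance of X, which is the same statement as σ_(aX+b) = |a|σ_X, and adding b shifts the mean without changing the spread.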

Central Limit Theorem (CLT): For a random sample of size n from any population with mean μ and standard deviation σ, the sampling distribution of x̄ has mean μ and standard deviation σ/√n, and it is approximately normal provided n is large (n ≥ 30 is a common rule of thumb; a smaller n is enough when the population itself is approximately normal). This is the theoretical foundation of the inference procedures for means, so know it thoroughly.
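A simulation is one way to see the CLT at work. The sketch below repeatedly samples from a strongly right-skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen by us for illustration) and checks that the sample means centre on μ with spread close to σ/√n. The sample size, number of repetitions, and seed are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the example is repeatable
n = 40        # size of each sample
reps = 3000   # number of samples drawn

# Exponential population with rate 1: mean 1, standard deviation 1, right-skewed.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_sd = 1.0 / math.sqrt(n)  # sigma / sqrt(n)

print(f"Mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {theoretical_sd:.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is far from normal.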

Inference: the heart of the course

The structure of every hypothesis test: (1) State hypotheses — H₀ (null, equality statement) and Hₐ (alternative, inequality); (2) Check conditions — Random, Normal/Large sample, Independence (10% condition when sampling without replacement); (3) Calculate test statistic and p-value; (4) State conclusion in context — include the p-value, compare to α, and interpret in the problem's context.
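The four steps can be traced in a short one-proportion z-test. The scenario (60 successes in a random sample of 100, testing H₀: p = 0.5 against Hₐ: p ≠ 0.5 at α = 0.05) is hypothetical, and the Random and 10% conditions are assumed to hold because they depend on how the data were collected, not on arithmetic.

```python
import math
from statistics import NormalDist

# Step 1: hypotheses. H0: p = 0.5, Ha: p != 0.5 (hypothetical scenario).
p0 = 0.5
alpha = 0.05

# Hypothetical data: 60 successes in a random sample of 100.
n = 100
successes = 60
p_hat = successes / n

# Step 2: Large Counts condition, checked with the null value p0.
large_counts_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Step 3: test statistic and two-sided p-value.
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: conclusion, stated by comparing the p-value to alpha.
reject_h0 = p_value < alpha
decision = "reject H0" if reject_h0 else "fail to reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, so we {decision} at alpha = {alpha}")
```

On the exam, the printed decision is not enough on its own: the conclusion must also say, in the problem's context, what the data do or do not give convincing evidence for.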

Interpreting p-values correctly: A p-value is the probability of observing a test statistic at least as extreme as the one calculated, assuming H₀ is true. It is NOT the probability that H₀ is true. Never say "the probability that the null hypothesis is true is p."

Interpreting confidence intervals correctly: "We are 95% confident that the true [parameter] is between [lower bound] and [upper bound]." The 95% confidence level means that if we repeated this procedure many times, about 95% of the resulting intervals would contain the true parameter. Do NOT say "there is a 95% probability the true value is in this specific interval" — that specific interval either does or does not contain the true value.
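The "repeat the procedure many times" interpretation can be simulated. This sketch draws many samples from a normal population whose mean we set ourselves (μ = 50, σ = 10, both hypothetical), builds a 95% z-interval from each, and counts how often the interval captures μ. A z-interval with known σ is used only to keep the example free of t tables.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(7)  # fixed seed so the example is repeatable
mu, sigma = 50.0, 10.0  # hypothetical population parameters
n = 25                  # size of each sample
reps = 4000             # number of intervals constructed

z_star = NormalDist().inv_cdf(0.975)      # critical value for 95% confidence
margin = z_star * sigma / math.sqrt(n)    # margin of error, same for every sample

hits = 0
for _ in range(reps):
    x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
    # Each individual interval either contains mu or it does not.
    if x_bar - margin <= mu <= x_bar + margin:
        hits += 1

coverage = hits / reps
print(f"z* = {z_star:.3f}; {coverage:.1%} of {reps} intervals contained mu")
```

The 95% describes the long-run success rate of the method, which is why the probability language attaches to the procedure and not to any single computed interval.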

t-procedures vs z-procedures: Use t (t-distribution, df = n − 1 for one sample) for inference about means when the population standard deviation σ is unknown and you estimate it with the sample standard deviation s. Use z when σ is known (rare in practice) or when working with proportions.
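As a small illustration of the t case, the sketch below computes a one-sample t statistic from made-up data, testing H₀: μ = 10. Python's standard library has no t-distribution, so the p-value would come from a calculator or table; the critical value 2.776 (two-sided, α = 0.05, df = 4) is taken from a standard t table.

```python
import math
import statistics

# Hypothetical sample, invented for illustration.
sample = [10, 12, 14, 16, 18]
mu_0 = 10  # hypothesised mean under H0

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
df = n - 1

# sigma is unknown, so s replaces it and the statistic follows a t-distribution.
t = (x_bar - mu_0) / (s / math.sqrt(n))

T_CRITICAL = 2.776  # two-sided, alpha = 0.05, df = 4, from a standard t table
exceeds_critical = abs(t) > T_CRITICAL

print(f"x-bar = {x_bar}, s = {s:.3f}, t = {t:.3f}, df = {df}")
print("Beyond the critical value:", exceeds_critical)
```

With only five observations, the Normal/Large sample condition would need separate justification, for example a plot of the data showing no strong skew or outliers.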

Chi-square tests: Goodness-of-fit tests whether an observed distribution matches an expected distribution. Test for independence/association tests whether two categorical variables are related. Expected count formula: E = (row total × column total) / table total. Conditions: all expected counts ≥ 5. Chi-square statistic: Σ[(O − E)²/E].
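The expected-count formula and the chi-square statistic can be computed in a few lines. The 2 × 2 table below is hypothetical, and the degrees of freedom use the standard (rows − 1) × (columns − 1) rule for a test of independence.

```python
# Hypothetical observed counts: rows are groups, columns are responses.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
table_total = sum(row_totals)

# E = (row total x column total) / table total, for every cell.
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

# Condition: all expected counts are at least 5.
expected_counts_ok = all(e >= 5 for row in expected for e in row)

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print("Expected counts:", expected)
print(f"chi-square = {chi_square:.2f}, df = {df}, condition met: {expected_counts_ok}")
```

The condition is checked on the expected counts, not the observed ones, which is a common place to lose credit.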

The Spaced Repetition Flashcard Tool is highly effective for statistics — one card per condition check ("What are the conditions for a two-sample t-test?"), per interpretation template, and per formula. Use the Pomodoro Timer for timed free-response practice — write a complete four-step hypothesis test within 15 minutes. Released College Board free-response questions from the past five years are the most authentic practice material available.

