Core Curriculum
Module 4: Distributions & Statistical Concepts
Data types, central tendency, variability, normal distribution, skewness, regression to the mean, and categorical data analysis — the essential statistical toolkit for medical research.
📊 4.1 Types of Data (Levels of Measurement)
Categorical Data
Nominal: unordered categories (blood type, sex, race).
Ordinal: ordered categories whose spacing is not necessarily equal (pain scale, cancer stage).
Continuous (Numerical) Data
Interval: equal intervals, no true zero (°C, °F).
Ratio: equal intervals, true zero (weight, height, BP).
📌 Choosing the correct statistical test depends on the type of data and distribution.
📈 4.2 Measures of Central Tendency
Mean = sum/n; sensitive to outliers.
Median = middle value of the sorted data; robust to outliers, preferred for skewed data.
Mode = most frequent value; the only measure of central tendency that applies to nominal data.
Normal distribution: mean = median = mode
Right‑skewed: typically mean > median > mode
Left‑skewed: typically mode > median > mean
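The effect of skew on the three measures can be checked on a small data set. The sketch below uses Python's standard `statistics` module; the length-of-stay values are made up for illustration and are not from any study.

```python
from statistics import mean, median, mode

# Hypothetical hospital length-of-stay values in days (right-skewed:
# most stays are short, one is very long).
los_days = [2, 2, 3, 3, 3, 4, 5, 6, 9, 30]

los_mean = mean(los_days)      # pulled upward by the single 30-day stay
los_median = median(los_days)  # middle of the sorted values
los_mode = mode(los_days)      # most frequent value

print(los_mean, los_median, los_mode)  # 6.7 3.5 3
```

Here mean > median > mode, the ordering expected for a right-skewed sample. Dropping the 30-day stay changes the mean substantially but barely moves the median.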
📏 4.3 Measures of Variability & Dispersion
- Range: max – min (sensitive to outliers).
- Variance: average squared deviation from the mean (the sample variance divides by n – 1 rather than n).
- Standard Deviation (SD): √variance; the typical distance of values from the mean, in the original units.
- Interquartile Range (IQR): Q3 – Q1 (middle 50%, robust).
- Standard Error (SE): SD/√n — precision of sample mean.
SE decreases as sample size increases → more precise estimate of population mean.
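The measures listed above can be computed with the standard library, as in the following minimal sketch. The blood pressure values are hypothetical. Quartile definitions differ between software packages, so the IQR here reflects the default method of `statistics.quantiles` and may not match other tools exactly.

```python
from math import sqrt
from statistics import quantiles, stdev, variance

# Hypothetical systolic blood pressure readings (mmHg).
sbp = [110, 118, 120, 122, 125, 128, 130, 135, 140, 162]

value_range = max(sbp) - min(sbp)      # driven entirely by the two extremes
sample_var = variance(sbp)             # sample variance (divides by n - 1)
sample_sd = stdev(sbp)                 # square root of the sample variance
q1, _q2, q3 = quantiles(sbp, n=4)      # quartile cut points
iqr = q3 - q1                          # spread of the middle 50%
se = sample_sd / sqrt(len(sbp))        # standard error of the mean

print(f"range={value_range}, SD={sample_sd:.1f}, IQR={iqr:.1f}, SE={se:.1f}")
```

Because SE = SD/√n, quadrupling the sample size halves the standard error if the SD stays the same.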
🔔 4.4 The Normal Distribution
Symmetric, bell‑shaped, defined by mean (μ) and SD (σ). Empirical rule (68‑95‑99.7):
μ ± 1σ → ≈68% of data
μ ± 2σ → ≈95% of data (exactly 95% lies within μ ± 1.96σ)
μ ± 3σ → ≈99.7% of data
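Z‑score = (X – μ)/σ expresses a value as the number of SDs it lies from the mean; z‑scores of a normal variable follow the standard normal distribution (mean 0, SD 1). Many biological variables (height, BP) approximate a normal distribution.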
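The empirical rule and the z‑score formula can be verified numerically with `statistics.NormalDist`. This is an illustrative sketch: the mean of 120 and SD of 15 are assumed values, not population estimates.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage(k):.4f}")  # 0.6827, 0.9545, 0.9973

# z-score for a single value, using assumed (hypothetical) parameters.
mu, sigma = 120.0, 15.0
x = 150.0
z = (x - mu) / sigma                   # 2.0 SDs above the mean
upper_tail = 1 - std_normal.cdf(z)     # proportion expected above x
print(z, round(upper_tail, 4))
```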
📉 4.5 Skewed & Other Distributions
Right‑skewed (positive)
Tail on right, mean > median. Examples: income, hospital LOS, survival times.
Left‑skewed (negative)
Tail on left, mean < median. Example: age at death in developed countries.
Bimodal
Two peaks → suggests two subpopulations (e.g., age distribution with children and elderly).
Binomial & Poisson
Binomial: number of successes in n independent trials with a fixed success probability. Poisson: count of rare, independent events in a fixed interval of time or space.
📌 For skewed data, use median and IQR, and consider non‑parametric tests.
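For the binomial and Poisson distributions, the probability of observing exactly k events follows directly from their standard formulas. The sketch below implements both with the standard library; the parameter values (10 patients, 10% event risk, a rate of 2 events per interval) are hypothetical.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(exactly k events in an interval with mean event count lam)."""
    return exp(-lam) * lam**k / factorial(k)

# Hypothetical: 10 patients, each with a 10% risk of an adverse event.
p_two_events = binomial_pmf(2, 10, 0.1)

# Hypothetical: an average of 2 rare events per month.
p_no_events = poisson_pmf(0, 2.0)

print(round(p_two_events, 4), round(p_no_events, 4))  # 0.1937 0.1353
```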
🔄 4.6 Regression to the Mean
Extreme measurements tend to move toward the population mean on repeat testing due to natural variation and measurement error. This can mimic a treatment effect in before‑after studies without a control group. Solution: include a concurrent control group.
📈 Example: Patients selected for highest BP often show lower BP on repeat, even without intervention.
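A small simulation makes the effect concrete. The sketch below assumes each patient has a stable true BP plus independent visit-to-visit noise; all numbers (mean 130, SDs of 10, top 10% cutoff) are arbitrary choices for illustration. No treatment is simulated at all.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is reproducible

n_patients = 10_000
# Assumed model: stable true systolic BP plus independent noise at each visit.
true_sbp = [random.gauss(130, 10) for _ in range(n_patients)]
visit1 = [t + random.gauss(0, 10) for t in true_sbp]
visit2 = [t + random.gauss(0, 10) for t in true_sbp]

# Select the patients with the highest 10% of readings at visit 1.
cutoff = sorted(visit1)[int(0.9 * n_patients)]
selected = [i for i, v in enumerate(visit1) if v >= cutoff]

mean_v1 = mean(visit1[i] for i in selected)
mean_v2 = mean(visit2[i] for i in selected)

print(f"selected group, visit 1: {mean_v1:.1f}")
print(f"selected group, visit 2: {mean_v2:.1f}")  # lower, with no intervention
```

The selected group's second reading is lower on average because part of what made the first reading extreme was noise, which does not repeat. A control group selected the same way would show the same drop, which is why it isolates any real treatment effect.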
📊 4.7 Analyzing Categorical Data
Chi‑Square Test (χ²)
Tests association between two categorical variables. Assumption: expected frequency ≥5 per cell.
Fisher’s Exact Test
Used when any expected frequency is <5 (small samples); gives an exact p‑value.
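For a 2×2 table with fixed margins, the exact p‑value comes from the hypergeometric distribution. The following is a minimal sketch, assuming the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one. The function name and the example counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the observed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)   # smallest possible top-left count
    highest = min(row1, col1)      # largest possible top-left count
    p_observed = table_prob(a)
    tolerance = 1 + 1e-9           # guards against floating-point ties
    return sum(
        p
        for p in (table_prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )

# Hypothetical small study: 3 of 4 responders on treatment, 1 of 4 on control.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))  # 0.4857
```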
McNemar’s Test
For paired categorical data (before‑after, matched case‑control).
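McNemar's test uses only the discordant pairs, that is, pairs whose two members (or two time points) disagree. The sketch below assumes the uncorrected form of the statistic, (b – c)²/(b + c) with 1 degree of freedom; some texts apply a continuity correction instead. The function name and counts are hypothetical.

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """McNemar's chi-square (no continuity correction) and its p-value.

    b and c are the two discordant-pair counts; b + c must be > 0.
    """
    stat = (b - c) ** 2 / (b + c)
    # For 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical before-after data: 15 patients changed from positive to
# negative, 5 changed from negative to positive.
stat, p_value = mcnemar(15, 5)
print(stat, round(p_value, 4))  # 5.0 0.0253
```

Concordant pairs (both positive or both negative) do not enter the calculation, since they carry no information about a change.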
χ² = Σ (O – E)² / E, summed over all cells (O = observed count, E = expected count); df = (rows‑1)×(cols‑1), so df = 1 for a 2×2 table.
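The χ² calculation for a 2×2 table can be written out step by step. The sketch below computes each expected count as (row total × column total) / N, sums (O – E)²/E over the four cells, and obtains the p‑value for 1 degree of freedom. No continuity correction is applied, and the function name and counts are hypothetical.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square for a 2x2 table given as [[a, b], [c, d]]."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)

    stat = 0.0
    expected = []
    for i in range(2):
        expected_row = []
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n   # expected count under no association
            expected_row.append(e)
            stat += (table[i][j] - e) ** 2 / e
        expected.append(expected_row)

    # For 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value, expected

# Hypothetical exposure-by-outcome counts.
stat, p_value, expected = chi_square_2x2([[20, 30], [30, 20]])
print(stat, round(p_value, 4))  # 4.0 0.0455
print(expected)                 # [[25.0, 25.0], [25.0, 25.0]]
```

Inspecting `expected` before interpreting the result is a quick way to check the ≥5 assumption; if it fails, an exact test is the usual alternative.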