Module 4: Distributions & Statistical Concepts — Core Topics

📊 4.1 Types of Data (Levels of Measurement)

Categorical Data
Nominal: unordered categories (blood type, sex, race).
Ordinal: ordered categories; intervals between levels are not necessarily equal (pain scale, cancer stage).

Numerical (Quantitative) Data
Interval: equal intervals, no true zero (°C, °F).
Ratio: equal intervals, true zero (weight, height, BP).

📌 Choosing the correct statistical test depends on the type of data and distribution.

📈 4.2 Measures of Central Tendency

Mean: sum/n; sensitive to outliers.
Median: middle value of the ordered data; robust to outliers, preferred for skewed data.
Mode: most frequent value; the only measure of central tendency usable for nominal data.

Normal distribution: mean = median = mode
Right‑skewed: typically mean > median > mode
Left‑skewed: typically mode > median > mean
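As a rough illustration, Python's standard-library `statistics` module computes all three measures. The sample below is hypothetical, chosen so that one large value forms a right tail; it is not clinical data.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one large value forms the right tail
data = [2, 3, 3, 4, 5, 6, 20]

mean = statistics.mean(data)      # pulled toward the tail by the value 20
median = statistics.median(data)  # middle value of the ordered data
mode = statistics.mode(data)      # most frequent value

print(f"mean={mean:.2f}, median={median}, mode={mode}")
# For this sample mean > median > mode, the usual right-skewed pattern
```

Replacing 20 with 7 leaves the median at 4 but drops the mean from about 6.1 to about 4.3, which is why the median is preferred for skewed data.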

📏 4.3 Measures of Variability & Dispersion

Range: max – min (sensitive to outliers).
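Variance: average squared deviation from the mean (sample variance divides by n – 1).
Standard Deviation (SD): √variance; the typical distance of observations from the mean, in the original units.
Interquartile Range (IQR): Q3 – Q1 (middle 50%, robust).
Standard Error of the Mean (SE): SD/√n; the precision of the sample mean as an estimate of the population mean.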

SE decreases as sample size increases → more precise estimate of population mean.
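The measures above can be sketched with Python's standard-library `statistics` and `math` modules. The data and the fixed SD of 10 are hypothetical. Note that `statistics.quantiles` uses its "exclusive" method by default; other software may define quartiles slightly differently, so IQR values can differ between packages.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustrative values only)
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

data_range = max(data) - min(data)           # max - min
variance = statistics.variance(data)         # sample variance: divides by n - 1
sd = statistics.stdev(data)                  # square root of the sample variance
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points ("exclusive" method by default)
iqr = q3 - q1                                # spread of the middle 50%
se = sd / math.sqrt(n)                       # standard error of the mean

print(f"range={data_range}, variance={variance:.2f}, SD={sd:.2f}, IQR={iqr}, SE={se:.2f}")

# With a fixed (hypothetical) SD of 10, quadrupling n halves the SE
sd_fixed = 10
se_by_n = {size: sd_fixed / math.sqrt(size) for size in (25, 100, 400)}
print(se_by_n)  # {25: 2.0, 100: 1.0, 400: 0.5}
```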

🔔 4.4 The Normal Distribution

Symmetric, bell‑shaped, defined by mean (μ) and SD (σ). Empirical rule (68‑95‑99.7):

μ ± 1σ → ≈68% of data
μ ± 2σ → ≈95% of data (exactly 95% at μ ± 1.96σ)
μ ± 3σ → ≈99.7% of data
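The empirical rule can be checked against the exact normal curve using `statistics.NormalDist` from Python's standard library (Python 3.8+). This is a numerical check, not a derivation.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution lying within k SDs of the mean
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within ±{k} SD: {share:.4f}")

# Multiplier that captures exactly 95% (2.5% in each tail)
z_95 = std_normal.inv_cdf(0.975)
print(f"95% multiplier: {z_95:.2f}")
```

The exact share within ±2 SD is about 95.4%, which is why 1.96 rather than 2 appears in 95% confidence intervals.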

Z‑score = (X – μ)/σ converts a value to the standard normal scale (mean 0, SD 1). Many biological variables (e.g., height, BP) are approximately normally distributed.
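A minimal worked example of the Z‑score, assuming a hypothetical population of systolic BP with μ = 120 mmHg and σ = 15 mmHg (illustrative values, not reference ranges). The percentile step is only meaningful if the variable is approximately normal.

```python
from statistics import NormalDist

# Hypothetical population: systolic BP with mean 120 mmHg and SD 15 mmHg (illustrative only)
mu, sigma = 120, 15
x = 150

z = (x - mu) / sigma              # distance from the mean in SD units
percentile = NormalDist().cdf(z)  # proportion below x, assuming normality

print(f"z = {z:.1f}; about {percentile:.1%} of values fall below {x} mmHg")
```

Standardizing first and then using the standard normal gives the same answer as evaluating `NormalDist(mu, sigma).cdf(x)` directly.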

📉 4.5 Skewed & Other Distributions

Right‑skewed (positive)
Tail on right, mean > median. Examples: income, hospital length of stay (LOS), survival times.

Left‑skewed (negative)
Tail on left, mean < median. Example: age at death in developed countries.

Bimodal
Two peaks → may indicate two subpopulations (e.g., an age distribution combining children and the elderly).

Binomial & Poisson
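Binomial: number of successes in n independent trials, each with the same success probability p. Poisson: number of events in a fixed interval of time or space (often used for rare events).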
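Both probability mass functions can be written directly from their standard formulas: P(X = k) = C(n, k)·p^k·(1 – p)^(n – k) for the binomial and P(X = k) = e^(–λ)·λ^k / k! for the Poisson. The function names and example parameters below are hypothetical.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): k successes in n independent trials, each with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k): k events in a fixed interval when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical: 10 patients, each with a 10% chance of an adverse event; P(exactly 2 events)
print(round(binomial_pmf(2, n=10, p=0.1), 4))
# Hypothetical: on average 1 rare event per month; P(exactly 2 events in a month)
print(round(poisson_pmf(2, lam=1.0), 4))
```

When n is large and p is small, `binomial_pmf(k, n, p)` is close to `poisson_pmf(k, n * p)`, which is one reason the Poisson is used for rare events.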

📌 For skewed data, use median and IQR, and consider non‑parametric tests.

🔄 4.6 Regression to the Mean

Extreme measurements tend to move toward the population mean on repeat testing due to natural variation and measurement error. This can mimic a treatment effect in before‑after studies without a control group. Solution: include a concurrent control group.

📈 Example: Patients selected for highest BP often show lower BP on repeat, even without intervention.
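A small simulation makes the effect visible. It assumes a hypothetical model in which each person's true systolic BP is normally distributed (mean 130, SD 10) and every visit adds independent noise (SD 8); no treatment is given. People selected for high readings at visit 1 still read lower, on average, at visit 2.

```python
import random
import statistics

random.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical model: true BP ~ N(130, 10); each visit adds independent noise ~ N(0, 8).
# No treatment is applied between visits.
N = 10_000
true_bp = [random.gauss(130, 10) for _ in range(N)]
visit1 = [t + random.gauss(0, 8) for t in true_bp]
visit2 = [t + random.gauss(0, 8) for t in true_bp]

# Select the 10% of people with the highest readings at visit 1
cutoff = sorted(visit1)[int(0.9 * N)]
selected = [i for i in range(N) if visit1[i] >= cutoff]

mean1 = statistics.mean(visit1[i] for i in selected)
mean2 = statistics.mean(visit2[i] for i in selected)
print(f"selected group: visit 1 mean {mean1:.1f}, visit 2 mean {mean2:.1f}")
# Visit 2 is lower on average even though nothing changed between visits
```

A concurrent control group selected the same way would show the same drop, so comparing treated and control groups cancels out this artifact.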

📊 4.7 Analyzing Categorical Data

Chi‑Square Test (χ²)
Tests association between two categorical variables (independent observations). Assumption: expected frequency ≥5 in each cell (a common rule of thumb).

Fisher’s Exact Test
Used when expected frequencies are <5 (typically small 2×2 tables); gives an exact p‑value.
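Fisher's test treats the row and column totals as fixed, so the top‑left cell follows a hypergeometric distribution. The sketch below (the function name is hypothetical) sums the probabilities of every table that is no more likely than the observed one, which is one common definition of the two‑sided p‑value. In practice a tested library routine such as SciPy's `scipy.stats.fisher_exact` is preferable.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of a table whose top-left cell is x, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)  # feasible values of the top-left cell
    # Sum tables no more likely than the observed one (small tolerance for float ties)
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical small table
p_two_sided = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_two_sided, 4))
```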

McNemar’s Test
For paired categorical data (before‑after, matched case‑control).
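McNemar's test uses only the discordant pairs: b pairs that changed in one direction and c pairs that changed in the other. The sketch below computes the uncorrected statistic (b – c)²/(b + c) with df = 1; a continuity‑corrected version, (|b – c| – 1)²/(b + c), is also widely used. The function name and counts are hypothetical.

```python
import math

def mcnemar(b: int, c: int) -> tuple[float, float]:
    """McNemar's chi-square (no continuity correction) from the two discordant-pair counts.

    Concordant pairs do not enter the statistic. Requires b + c > 0.
    """
    stat = (b - c) ** 2 / (b + c)
    # A chi-square with df = 1 is a squared standard normal, so P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical before-after study: 15 pairs changed one way, 5 the other
stat, p = mcnemar(15, 5)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```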

χ² = Σ (O – E)² / E, where E = (row total × column total) / grand total; df = (rows – 1) × (columns – 1), so a 2×2 table has df = 1.
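The formula can be applied directly. This sketch works for any r × c table; the example table is hypothetical, and its expected counts (15 and 35) all satisfy the ≥5 rule. It applies no continuity correction, whereas SciPy's `scipy.stats.chi2_contingency` applies Yates' correction by default when df = 1, so its result for a 2×2 table will differ.

```python
import math

def chi_square(observed: list[list[int]]) -> tuple[float, int]:
    """Pearson chi-square statistic and df for an r x c table (no continuity correction)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected count under independence
            stat += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = disease / no disease
table = [[20, 30],
         [10, 40]]
stat, df = chi_square(table)
# For df = 1 only: the chi-square upper tail equals erfc(sqrt(stat / 2))
p = math.erfc(math.sqrt(stat / 2)) if df == 1 else float("nan")
print(f"chi2 = {stat:.3f}, df = {df}, p = {p:.3f}")
```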
