
Module 5: Correlation, Regression & Probability
Core Topics

Correlation coefficients, linear and logistic regression, probability theory, decision trees, likelihood ratios, and Bayes’ theorem — the statistical tools for modeling relationships and updating diagnostic certainty.

6 sessions · 15+ key concepts · USMLE high yield

📈 5.1 Correlation Coefficients

Pearson correlation coefficient (r): measures strength and direction of a linear relationship between two continuous variables. Range –1 to +1. |r| > 0.7 = strong; 0.3–0.7 = moderate; <0.3 = weak. r² (coefficient of determination) = proportion of variance in Y explained by X.

Spearman rank correlation (ρ): non‑parametric alternative for ordinal data or non‑normal distributions. Correlation ≠ causation – associations may be due to confounding, reverse causation, or chance.

📌 Clinical pearl: Always examine scatterplots; outliers and restriction of range can distort correlation.
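Both coefficients above can be computed by hand. A minimal stdlib-Python sketch (the data values are illustrative, not from a real dataset); Spearman's ρ is simply Pearson's r applied to the ranks:

```python
import math

def pearson_r(x, y):
    # Pearson r = covariance / (SD_x * SD_y), computed from deviations
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(v):
    # 1-based ranks, averaging ranks for ties
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    # Spearman rho = Pearson r on the ranks (robust to non-normal data)
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
print(round(pearson_r(x, y), 3))    # ~0.852 -> strong linear association
print(round(spearman_rho(x, y), 3)) # ~0.821 -> strong monotonic association
```

In practice `scipy.stats.pearsonr` and `spearmanr` do this (and return p-values), but the hand version makes the rank-based nature of ρ explicit.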

📐 5.2 Simple Linear Regression

Y = a + bX + ε

Slope (b): change in Y per one‑unit increase in X. Intercept (a): value of Y when X = 0. R²: proportion of variance explained. Residuals (observed – predicted) should be randomly scattered (homoscedasticity) and approximately normal.

Example: BP = 80 + 0.5(Age) → each additional year of age is associated with a 0.5 mmHg higher predicted BP.
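The least-squares slope and intercept follow directly from the deviations from the means. A short sketch using made-up age/BP values that lie exactly on the example line BP = 80 + 0.5(Age):

```python
def fit_line(x, y):
    # Ordinary least squares for Y = a + bX
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the fitted line always passes through (mean X, mean Y)
    return a, b

age = [30, 40, 50, 60, 70]
bp = [95, 100, 105, 110, 115]  # constructed so BP = 80 + 0.5*Age exactly
a, b = fit_line(age, bp)
print(a, b)  # 80.0 0.5
```

With real data the points scatter around the line, and checking that the residuals (observed − predicted) show no pattern is part of validating the fit.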

📊 5.3 Multiple Regression

Y = a + b₁X₁ + b₂X₂ + … + bₙXₙ + ε

Allows adjustment for confounders. Each coefficient bᵢ represents the change in Y per unit change in Xᵢ, holding all other variables constant. Adjusted R² penalizes the addition of irrelevant predictors. Multicollinearity (high correlation between predictors) inflates standard errors; it is detected via the variance inflation factor (VIF).

📌 Clinical example: SBP = 90 + 0.4(Age) + 2.5(BMI) – 3.0(Exercise) → exercise independently lowers BP by 3 mmHg after adjusting for age and BMI.
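"Holding all other variables constant" can be made concrete by plugging values into the section's illustrative model (the inputs below are hypothetical):

```python
# The section's illustrative model: SBP = 90 + 0.4*Age + 2.5*BMI - 3.0*Exercise
def predicted_sbp(age, bmi, exercise):
    return 90 + 0.4 * age + 2.5 * bmi - 3.0 * exercise

# Fix age and BMI, change only exercise by one unit:
base = predicted_sbp(50, 25, 0)   # 90 + 20 + 62.5 = 172.5
more = predicted_sbp(50, 25, 1)   # one more unit of exercise
print(more - base)                # -3.0, the exercise coefficient
```

The difference equals the exercise coefficient exactly because the model is linear: each coefficient is the adjusted effect of its own variable with the others held fixed.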

⚖️ 5.4 Logistic Regression

Used for binary outcomes (disease/no disease). Models log‑odds: log(p/(1‑p)) = a + b₁X₁ + …

Odds Ratio (OR) = e^b → exponentiating a coefficient gives the adjusted OR for that predictor.

Model discrimination: C‑statistic (AUC) measures ability to distinguish cases from controls. AUC 0.7‑0.8 = acceptable, 0.8‑0.9 = excellent.

Example: log(odds MI) = –5 + 0.05(Age) + 0.7(Smoking) → smoking OR ≈ 2.0 (twice the odds of MI, adjusted for age).
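The example model can be evaluated directly: exponentiate the smoking coefficient to get the OR, and pass the log-odds through the logistic function to recover a probability (ages below are illustrative):

```python
import math

# The section's illustrative model: log(odds MI) = -5 + 0.05*Age + 0.7*Smoking
def log_odds_mi(age, smoking):
    return -5 + 0.05 * age + 0.7 * smoking

def prob(log_odds):
    # Logistic (inverse-logit) function: log-odds -> probability
    return 1 / (1 + math.exp(-log_odds))

print(round(math.exp(0.7), 2))                 # 2.01 -> smoking OR ~2, adjusted for age
print(round(prob(log_odds_mi(60, 1)), 3))      # 0.214 -> 60-year-old smoker
print(round(prob(log_odds_mi(60, 0)), 3))      # 0.119 -> 60-year-old non-smoker
```

Note that the odds roughly double with smoking, but the probabilities do not: ORs and risk ratios diverge as probabilities rise.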

🎲 5.5 Probability Theory & Decision Trees

Probability Rules
• Addition (OR): P(A or B) = P(A) + P(B) – P(A and B)
• Multiplication (AND): P(A and B) = P(A) × P(B|A)
• Conditional: P(A|B) = P(A and B)/P(B)
Decision Trees
• Square = decision node; circle = chance node
• Expected Value (EV) = Σ (probability × outcome)
• Choose option with highest EV
📊 Sensitivity analysis varies probabilities to test robustness of decision.
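The EV rule above reduces to a one-line sum over the chance-node branches. A toy sketch with hypothetical probabilities and utilities (not from any real decision analysis):

```python
def expected_value(branches):
    # branches: list of (probability, utility) pairs at a chance node;
    # probabilities across the branches should sum to 1
    return sum(p * u for p, u in branches)

surgery = [(0.90, 0.95), (0.10, 0.00)]  # 90% good outcome, 10% operative death
medical = [(1.00, 0.80)]                # certain, but lower-utility outcome

ev_s = expected_value(surgery)
ev_m = expected_value(medical)
print(round(ev_s, 3), round(ev_m, 3))   # 0.855 0.8 -> choose surgery (higher EV)
```

A sensitivity analysis would re-run this while varying, say, the surgical success probability to find the threshold at which the preferred option flips.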

🔬 5.6 Likelihood Ratios & Bayes’ Theorem

LR+ = Sensitivity / (1 – Specificity)
LR‑ = (1 – Sensitivity) / Specificity

LR+ > 10 → large increase in posttest probability; LR‑ < 0.1 → large decrease. LRs are independent of prevalence and can be multiplied for sequential independent tests.

Bayes' theorem (odds form): Posttest odds = Pretest odds × LR
Odds = P/(1‑P); Probability = Odds/(1+Odds)

The Fagan nomogram provides a graphical shortcut. Use posttest probability from one test as pretest for the next.

📌 Example: Pretest probability 20% (odds 0.25), LR+ = 8 → posttest odds = 2 → probability ≈ 67%.
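The worked example translates directly into code: convert probability to odds, multiply by the LR, and convert back. Chaining the calls also shows how sequential independent tests combine:

```python
def posttest_probability(pretest_p, lr):
    pre_odds = pretest_p / (1 - pretest_p)  # probability -> odds
    post_odds = pre_odds * lr               # Bayes' theorem, odds form
    return post_odds / (1 + post_odds)      # odds -> probability

# The section's example: pretest 20%, LR+ = 8
print(round(posttest_probability(0.20, 8), 3))  # 0.667

# Sequential independent tests: use each posttest probability
# as the pretest for the next test (LR values here are hypothetical)
p = 0.20
for lr in (8, 0.5):
    p = posttest_probability(p, lr)
print(round(p, 3))  # 0.5 -> a second, mildly negative test pulls the probability back down
```

This is the arithmetic the Fagan nomogram performs graphically.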
