Core Curriculum

Module 7: Study Interpretation & Critical Appraisal
Core Topics

Causation criteria, hypothesis testing, confidence intervals, bias, confounding, internal/external validity, and clinical vs. statistical significance — the essential skills for evaluating medical literature.

7 sessions · 20+ key concepts · USMLE high yield

🔗 7.1 Causation and Causal Criteria

Establishing that an exposure causes an outcome requires more than association. The Bradford Hill criteria provide a framework for judging whether an observed association is likely to be causal.

Key Criteria
  • Temporality (essential): exposure must precede outcome.
  • Strength: large effect size (e.g., RR > 5).
  • Dose‑response: increasing exposure → increasing risk.
  • Consistency: replicated across studies.
  • Plausibility: biologically credible.
Other Criteria
  • Specificity, coherence, experiment, analogy.
  • Reverse causality is not a criterion but an alternative explanation that temporality rules out: the outcome causes the exposure (e.g., disease → dietary change).
📌 Essential takeaway: Temporality is the only absolute requirement; prospective designs (cohort, RCT) best establish it.

📊 7.2 Hypothesis Testing & Statistical Significance

Null hypothesis (H₀): no difference / no association
Alternative (H₁): difference exists
p‑value: probability of obtaining data at least as extreme as those observed, assuming H₀ is true. It is not the probability that H₀ is true.
Errors
• Type I (α): false positive (reject true H₀).
• Type II (β): false negative (fail to reject false H₀).
• Power = 1 – β (probability of detecting a true effect).
Factors affecting power
Power increases with ↑ sample size, ↑ effect size, ↓ variability, ↑ α (at the cost of more Type I errors).
Conventional values: α = 0.05, power = 80% (β = 0.20).
📊 p < 0.05 → statistically significant; does not measure clinical importance.
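
A minimal Python sketch of how the factors above move power. It assumes a two‑sided, two‑sample z‑test with equal group sizes and a known common SD, which is a simplification; the function name and all numbers are illustrative and not taken from this module.

```python
from statistics import NormalDist

def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    Assumes equal group sizes and a known common SD (illustrative only).
    """
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5   # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)       # critical value, 1.96 when alpha = 0.05
    shift = abs(delta) / se                 # true effect in SE units
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical baseline: true difference 5, SD 10, 64 per group
baseline = two_sample_power(delta=5, sigma=10, n_per_group=64)
print(f"baseline power:   {baseline:.2f}")
print(f"larger n:         {two_sample_power(5, 10, 128):.2f}")
print(f"larger effect:    {two_sample_power(8, 10, 64):.2f}")
print(f"more variability: {two_sample_power(5, 15, 64):.2f}")
print(f"alpha = 0.10:     {two_sample_power(5, 10, 64, alpha=0.10):.2f}")
```

With a true effect of zero the function returns α, which matches the definition of a Type I error.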

📏 7.3 Confidence Intervals

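95% CI = point estimate ± (1.96 × SE); for ratio measures (RR, OR), the interval is computed on the log scale and then exponentiated.
95% CI excludes the null value ↔ p < 0.05 (two‑sided).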

Interpretation: Narrow CI → precise estimate (typically from a larger sample). For RR/OR, if the CI includes 1.0 → not statistically significant; for differences, the null value is 0. A CI conveys both effect magnitude and precision, so it is more informative than a p‑value alone.

Example: RR = 2.5 (95% CI 1.2–5.0) → significant; RR = 2.5 (95% CI 0.9–6.8) → not significant, wide interval.
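
The interval formula can be sketched for a risk ratio as follows. This assumes the usual large‑sample (Wald) interval computed on the log scale; the function name and the counts are hypothetical, chosen so that both studies have RR = 2.5 but different sample sizes.

```python
import math

def risk_ratio_ci(a, n1, c, n0, z=1.96):
    """Risk ratio with an approximate 95% CI (Wald interval on the log scale).

    a: events among n1 exposed; c: events among n0 unexposed.
    """
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # SE of ln(RR)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Hypothetical counts: same point estimate, different precision
for label, counts in [("larger study", (25, 100, 10, 100)),
                      ("smaller study", (5, 20, 2, 20))]:
    rr, lower, upper = risk_ratio_ci(*counts)
    verdict = "significant" if lower > 1 or upper < 1 else "not significant"
    print(f"{label}: RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f}) -> {verdict}")
```

The smaller study gives the same RR but a much wider interval that crosses 1.0, which is the pattern in the example above.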

📈 7.4 Statistical Test Selection

Continuous outcome
• 2 independent groups: t‑test (parametric) / Mann‑Whitney U (non‑parametric)
• Paired: paired t‑test / Wilcoxon signed‑rank
• ≥3 groups: ANOVA / Kruskal‑Wallis
Categorical outcome
• Chi‑square (expected cell counts ≥5)
• Fisher’s exact (small expected cell counts)
• McNemar (paired)
Time‑to‑event
• Log‑rank, Cox regression

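Parametric tests assume approximately normally distributed data; non‑parametric tests do not, so they are robust to violations such as skew and outliers. A priori analyses are pre‑specified; post hoc (subgroup) analyses are exploratory and inflate the Type I error rate because many comparisons are tested.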
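
To see why unplanned subgroup analyses inflate Type I error, here is a short sketch of the chance of at least one false positive across k tests. It assumes the tests are independent and that every null hypothesis is true; the function name is illustrative.

```python
def familywise_error_rate(k, alpha=0.05):
    """P(at least one false positive) across k independent tests when all H0 are true."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(f"{k:>2} tests -> {familywise_error_rate(k):.0%} chance of >=1 false positive")
```

Under these assumptions, ten subgroup tests at α = 0.05 already carry about a 40% chance of at least one spurious "significant" finding.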

⚠️ 7.5 Bias and Confounding

Selection bias
• Sampling bias, non‑response, loss to follow‑up
• Healthy worker effect, Berkson’s bias
Information bias
• Recall bias, interviewer bias, measurement error
• Lack of blinding → performance/detection bias

Confounding: a third variable that is associated with both exposure and outcome, is not on the causal pathway between them, and distorts the observed relationship. Control in design: randomization, restriction, matching. Control in analysis: stratification, multivariable regression.
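
Stratification can be illustrated with made‑up counts in which the crude association disappears once the confounder is held fixed. The sketch pools the strata with the Mantel‑Haenszel risk ratio, a standard estimator that this module does not name; all counts and function names are hypothetical.

```python
def risk_ratio(a, n1, c, n0):
    """Risk in exposed (a/n1) divided by risk in unexposed (c/n0)."""
    return (a / n1) / (c / n0)

def mantel_haenszel_rr(strata):
    """Pooled risk ratio across strata (Mantel-Haenszel estimator)."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den

# Each stratum: (events_exposed, n_exposed, events_unexposed, n_unexposed)
strata = [
    (40, 80, 10, 20),  # confounder present: risk 0.5 in both groups
    (2, 20, 8, 80),    # confounder absent:  risk 0.1 in both groups
]

# Crude analysis ignores the confounder and collapses the strata
totals = [sum(col) for col in zip(*strata)]
crude_rr = risk_ratio(*totals)
stratum_rrs = [risk_ratio(*s) for s in strata]
adjusted_rr = mantel_haenszel_rr(strata)

print(f"crude RR:    {crude_rr:.2f}")
print(f"stratum RRs: {[round(r, 2) for r in stratum_rrs]}")
print(f"adjusted RR: {adjusted_rr:.2f}")
```

Here the confounder is more common among the exposed and independently raises risk, so the crude RR suggests an effect that is absent within each stratum.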

📌 Hawthorne effect: behavior change due to observation. Placebo effect: improvement from expectation.

🎯 7.6 Internal and External Validity

Internal validity
Are results true for the study population?
Threats: bias, confounding, chance, poor design.
External validity (generalizability)
Can results apply to other populations/settings?
Affected by inclusion criteria, setting, time.

Efficacy vs. Effectiveness: Efficacy measures effect under ideal conditions (high internal validity). Effectiveness measures real‑world impact (high external validity).

📌 Trade‑off: strict eligibility criteria increase internal validity but limit generalizability.

💊 7.7 Clinical vs. Statistical Significance

Statistical significance
p < 0.05
Depends on sample size; large n can detect trivial effects.
Clinical significance
Effect large enough to matter to patients.
MCID = minimal clinically important difference.

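Endpoints: Clinical endpoints measure direct patient outcomes (mortality, stroke). Surrogate endpoints are biomarkers (LDL, HbA1c) assumed to predict clinical benefit, but improvement in a surrogate does not guarantee better outcomes (e.g., torcetrapib raised HDL yet increased mortality; rosiglitazone lowered HbA1c yet raised cardiovascular safety concerns).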

📊 Example: LDL reduction 5 mg/dL, p<0.001 → statistically significant but likely not clinically meaningful. Always evaluate effect magnitude alongside p‑value.
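
The LDL example can be sketched numerically. The SD (30 mg/dL), the group sizes, and the MCID (20 mg/dL) are assumptions made up for illustration, as is the rule of calling an effect clinically meaningful only when the whole 95% CI lies at or above the MCID.

```python
from statistics import NormalDist

def mean_difference_summary(diff, sd, n_per_group, mcid, z_crit=1.96):
    """Two-sample z-test summary for a difference in means (known common SD assumed)."""
    se = sd * (2 / n_per_group) ** 0.5
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    ci = (diff - z_crit * se, diff + z_crit * se)
    return {
        "p_value": p_value,
        "ci_95": ci,
        "statistically_significant": p_value < 0.05,
        # Conservative illustrative rule: entire CI must reach the MCID
        "clinically_meaningful": ci[0] >= mcid,
    }

# Same 5 mg/dL LDL reduction, hypothetical SD and MCID, two trial sizes
large_trial = mean_difference_summary(diff=5, sd=30, n_per_group=2000, mcid=20)
small_trial = mean_difference_summary(diff=5, sd=30, n_per_group=20, mcid=20)

for name, result in [("large trial", large_trial), ("small trial", small_trial)]:
    low, high = result["ci_95"]
    print(f"{name}: p = {result['p_value']:.2g}, 95% CI {low:.1f} to {high:.1f}, "
          f"significant = {result['statistically_significant']}, "
          f"clinically meaningful = {result['clinically_meaningful']}")
```

Under these assumptions the large trial reaches p < 0.001 while its entire interval sits far below the MCID, and the small trial cannot distinguish the same 5 mg/dL difference from chance.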
