CM6.{3,5-6} | Statistical Analysis and Software Use: SDL Guide (Part 3)

Statistical Software in Community Medicine Practice

Modern community medicine practice does not compute chi-square by hand in the field — it uses statistical software. NMC competency CM6.5 requires understanding the use of statistical software for data analysis. Four tools are most relevant to Indian community medicine training and practice:

Epi Info 7 (CDC, free): a standard tool internationally for field epidemiology and community health surveys. Features: data entry with built-in validation, Freq (frequencies and cross-tabulations with chi-square), Means (mean, SD, SE, t-tests), Tables (2×2 contingency tables, OR, RR, chi-square, Fisher's exact), Kaplan-Meier survival curves, outbreak analysis tools. Widely used in IDSP field investigations and NFHS fieldwork. Download from cdc.gov/epiinfo.

OpenEpi (free, browser-based, no installation): designed specifically for public health and epidemiology. Features: sample size calculation (for means and proportions), proportions with 95% CI, chi-square and Fisher's exact from 2×2 table entry, OR and RR with CI, sensitivity/specificity from a 2×2 diagnostic accuracy table. Ideal for rapid field calculations. Access at www.openepi.com.

SPSS (Statistical Package for the Social Sciences): the package most commonly required for postgraduate research and thesis work. It has a graphical (point-and-click) interface and runs the full range of tests (Descriptives, Frequencies, t-tests, ANOVA, chi-square, correlation, regression). It requires a license (institutional access is common at medical colleges). SPSS output reports the test statistic, df, p-value, and 95% CI in a standard format.

Microsoft Excel: available on all institutional computers and suitable for frequency tables, basic descriptive statistics (AVERAGE, STDEV, MEDIAN functions), histograms (Data Analysis ToolPak), and a simple chi-square test (the CHISQ.TEST function, which returns the p-value rather than the χ² statistic). NOT suitable as a primary statistical package for research, but useful for data entry, graphing, and checking hand calculations.

Practical workflow using Epi Info for a chi-square:
1. Load data: Analysis → Read → select dataset.
2. Cross-tabulate: Statistics → Tables → specify exposure variable and outcome variable.
3. Output: Epi Info automatically computes chi-square, Fisher's exact, OR, RR, and p-values.
4. Interpret: first verify that all expected frequencies are ≥5, then read the chi-square p-value; if any expected frequency is <5, report Fisher's exact instead.

For NMC theory and PG entrance purposes, know WHAT each package is used for. For practice, hands-on use of Epi Info and OpenEpi during community medicine postings is the most valuable preparation.

CLINICAL PEARL

The most common exam trap on chi-square: the choice of test depends on the expected frequencies, not the observed ones. If all expected frequencies in a 2×2 table are ≥5, chi-square is appropriate. If even one cell has an expected frequency <5, Fisher's exact test is required. The expected frequency is calculated as (row total × column total) / grand total; it is NOT the observed frequency. Students often check the observed frequencies (which may all be large) and miss that the expected frequencies for small-margin cells can still fall below 5. Always compute expected frequencies before choosing chi-square. A second common trap: the degrees of freedom, df = (rows − 1) × (columns − 1). For a 2×2 table, df = 1; for a 3×2 table, df = 2; for a 3×4 table, df = 6. Look up the critical chi-square value at the correct df: using df = 1 for a 3×4 table gives a wrong critical value and a wrong conclusion.
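
The expected-frequency check and the degrees-of-freedom rule can be sketched in a few lines of Python. This is a minimal illustration, not a feature of Epi Info or any other package: the function names `expected_counts` and `degrees_of_freedom` are hypothetical, and the counts in `trap_table` are invented so that every observed cell is at least 5 while one expected cell is not.

```python
def expected_counts(table):
    """Expected count per cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (rows - 1) x (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


# Hypothetical 2x2 table (made-up counts): every observed cell is >= 5.
trap_table = [[5, 5],
              [40, 950]]

expected = expected_counts(trap_table)
smallest_expected = min(min(row) for row in expected)

# Top-left expected count is (10 x 45) / 1000 = 0.45, well below 5,
# so the rule in the text points to Fisher's exact test here.
use_fisher = smallest_expected < 5

print("Expected counts:", expected)
print("Use Fisher's exact:", use_fisher)
print("df for 2x2, 3x2, 3x4 tables:",
      degrees_of_freedom(2, 2),
      degrees_of_freedom(3, 2),
      degrees_of_freedom(3, 4))
```

The point of the example is that the decision is driven by `smallest_expected`, which depends on the margins, and not by the smallest observed count.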

Applied: Perform and Interpret Descriptive Statistics from a Dataset

NMC competency CM6.6 requires you to 'perform descriptive statistics of a given data-set and interpret' — a practical skill that integrates everything from both SDLs in this cluster. The following worked example mirrors the type of dataset analysis that appears in community medicine theory papers and internship postings. The ability to move fluently from raw numbers to a coherent descriptive summary, then to an appropriate significance test and a contextualised interpretation, is what distinguishes a competent community medicine practitioner from one who merely knows statistical names without being able to apply them. Working through this example end-to-end — pausing to justify each methodological choice — is the most effective preparation for both examinations and field posting assignments. Notice how each step builds on the previous: descriptive statistics reveal the shape of the data; the correct graph makes that shape visible; the significance test answers the specific research question; and the interpretation goes beyond the number to communicate what the finding means for the community.

Dataset: A community health survey in a block enrolled 100 adults from 5 villages. Variables collected: age (years), sex (M/F), smoking status (Y/N), systolic blood pressure (SBP in mmHg), and BMI (kg/m²).

Step 1 — Descriptive statistics for continuous variables (CM6.4/6.6):
Using the 100 SBP values:
- Mean SBP: 128.4 mmHg
- Median SBP: 126.0 mmHg (the mean exceeds the median, suggesting a slight positive skew: a few hypertensives pull the mean up)
- SD: 18.6 mmHg (spread of individual values)
- SE: 18.6 / √100 = 1.86 mmHg (precision of the sample mean)
- 95% CI for population mean SBP: 128.4 ± 1.96 × 1.86 = 128.4 ± 3.65 → 124.8 to 132.0 mmHg
- Range: 96–178 mmHg
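
The standard error and confidence interval above follow directly from the summary figures. A minimal Python sketch, assuming only the reported n, mean, and SD (the variable names are illustrative, and 1.96 is the normal-approximation multiplier used in the text):

```python
import math

n = 100           # sample size from the survey
mean_sbp = 128.4  # sample mean SBP, mmHg
sd_sbp = 18.6     # sample standard deviation, mmHg

se = sd_sbp / math.sqrt(n)  # standard error of the mean
margin = 1.96 * se          # half-width of the 95% CI
ci_low, ci_high = mean_sbp - margin, mean_sbp + margin

print(f"SE = {se:.2f} mmHg")
print(f"95% CI = {ci_low:.1f} to {ci_high:.1f} mmHg")
```

Changing `n` shows why SE describes precision rather than spread: the SD stays at 18.6 mmHg, while the SE shrinks as the sample grows.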

Step 2 — Frequency table and graph for a categorical variable (CM6.2):
Smoking status: 32 smokers (32%), 68 non-smokers (68%). Present as a bar chart (categorical data). Cross-tabulate smoking by hypertension status, with hypertension defined as SBP ≥140 mmHg (18 subjects).

Step 3 — Test for association (CM6.3):
Hypothesis: Is there an association between smoking and hypertension in this sample?

2×2 table:
- Smokers: 12 hypertensive, 20 not
- Non-smokers: 6 hypertensive, 62 not

Expected frequencies:
- Smokers/Hypertensive: (32 × 18)/100 = 5.76 ✓ (≥5)
- Smokers/Not: (32 × 82)/100 = 26.24 ✓
- Non-smokers/Hypertensive: (68 × 18)/100 = 12.24 ✓
- Non-smokers/Not: (68 × 82)/100 = 55.76 ✓

All expected frequencies ≥5 → chi-square test is appropriate.
χ² = (12−5.76)²/5.76 + (20−26.24)²/26.24 + (6−12.24)²/12.24 + (62−55.76)²/55.76
= 38.94/5.76 + 38.94/26.24 + 38.94/12.24 + 38.94/55.76
= 6.76 + 1.48 + 3.18 + 0.70 = 12.12

df = 1. Critical value at p = 0.05 is 3.84. Since 12.12 > 3.84, p < 0.05 (in fact p < 0.001, because 12.12 also exceeds 10.83, the df = 1 critical value at p = 0.001).

Step 4 — Interpretation:
There is a statistically significant association between smoking and hypertension (χ² = 12.12, df = 1, p < 0.001). Hypertension prevalence was 37.5% (12/32) in smokers versus 8.8% (6/68) in non-smokers. However, this is a cross-sectional survey: it demonstrates association, NOT causation. Confounders (age, BMI, diet, alcohol use) have not been controlled. The 95% CI for the prevalence difference (28.7 percentage points) would be needed to assess precision before making programme recommendations.
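
Steps 3 and 4 can be cross-checked with a short script. The sketch below uses only the Python standard library, and its variable names are illustrative. It rests on two stated assumptions: for df = 1 the chi-square p-value equals erfc(√(χ²/2)), and the interval for the prevalence difference is a simple Wald (normal-approximation) interval, which is one common choice rather than the only one. Computed without intermediate rounding, χ² comes to about 12.12, so small differences from hand-rounded figures are expected.

```python
import math

# Observed 2x2 table from the worked example.
# Rows: smokers, non-smokers. Columns: hypertensive, not hypertensive.
observed = [[12, 20],
            [6, 62]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = (row total x column total) / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square without continuity correction; df = 1 for a 2x2 table.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
p_value = math.erfc(math.sqrt(chi2 / 2))  # this shortcut holds for df = 1 only

# Prevalence in each group and a Wald 95% CI for the difference (assumption).
p_smokers = observed[0][0] / row_totals[0]
p_nonsmokers = observed[1][0] / row_totals[1]
diff = p_smokers - p_nonsmokers
se_diff = math.sqrt(p_smokers * (1 - p_smokers) / row_totals[0]
                    + p_nonsmokers * (1 - p_nonsmokers) / row_totals[1])
ci_low, ci_high = diff - 1.96 * se_diff, diff + 1.96 * se_diff

print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}")
print(f"prevalence: {p_smokers:.1%} vs {p_nonsmokers:.1%}")
print(f"difference = {diff:.1%} (95% CI {ci_low:.1%} to {ci_high:.1%})")
```

A wide interval around the difference would signal that, with only 32 smokers, the size of the association is estimated imprecisely even though the test is significant.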

This four-step workflow — descriptive summary, graphical display, appropriate test, careful interpretation — is the standard approach for any community medicine dataset and the expected answer format in theory examinations.

SELF-CHECK

A researcher studies 15 hypertensive patients, measuring their systolic BP before and after 8 weeks of a dietary salt-restriction intervention. The BP differences between paired measurements are normally distributed. Which test should be used to determine if the intervention significantly reduced SBP?

A. Independent two-sample t-test

B. Paired t-test

C. One-way ANOVA

D. Chi-square test

Answer: B. Paired t-test

The study design involves the SAME 15 patients measured TWICE (before and after intervention) — these are paired/repeated measurements, not two independent groups. The paired t-test is correct because it tests whether the mean of the within-subject differences (post − pre) differs significantly from zero, exploiting the correlation between paired measurements to reduce variability. The independent two-sample t-test incorrectly treats before and after as separate independent groups, losing the pairing information and reducing statistical power. The data are continuous and normally distributed, so the parametric paired t-test is preferred over its non-parametric equivalent (Wilcoxon signed-rank). ANOVA is for comparing ≥3 groups; chi-square is for categorical data.
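
The mechanics of the paired t-test can be sketched as follows. The SBP readings in `before` and `after` are hypothetical values invented purely for illustration (they are not data from any study), and the variable names are likewise illustrative. The critical value 2.145 is the two-sided 5% point of the t distribution with 14 degrees of freedom.

```python
import math
import statistics

# Hypothetical SBP readings (mmHg) for the same 15 patients, invented for illustration.
before = [152, 148, 160, 145, 158, 150, 162, 147, 165, 144, 155, 159, 149, 153, 146]
after = [144, 143, 150, 142, 151, 144, 153, 143, 153, 142, 149, 151, 144, 146, 143]

# The paired test works on within-subject differences (post - pre).
differences = [post - pre for pre, post in zip(before, after)]
n = len(differences)
df = n - 1

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD, n - 1 denominator
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRITICAL = 2.145  # two-sided 5% critical value of t for df = 14
significant = abs(t_stat) > T_CRITICAL

print(f"mean difference = {mean_diff:.2f} mmHg, t = {t_stat:.2f}, df = {df}")
print("Significant at the 5% level:", significant)
```

Because only the differences enter the calculation, between-patient variation in baseline SBP drops out, which is the gain in power that the independent two-sample test gives up.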
