4. Chi-Square Test
4.1. Summary
The chi-square test of independence is a standard contingency-table test for discrete variables. In Tetrad, it is used as a conditional independence (CI) test for categorical variables: it compares the observed cell counts with the counts expected under independence.
4.2. When to use
Data are discrete (categorical).
You want a classical Pearson chi-square test instead of the likelihood-ratio (G-square) test.
Counts per cell are moderately large.
4.3. Assumptions
Observations are independent, and the counts follow a multinomial (or product-multinomial) sampling model, at least approximately.
Expected cell counts are not too small (a common rule of thumb is at least 5 in most cells).
Variables and conditioning sets are discrete with moderate arity.
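The expected-count rule of thumb can be checked directly from a table's margins. The Python sketch below assumes one common formalization, often attributed to Cochran: at least 80% of the expected counts are 5 or more, and none is below 1. The function names and thresholds are illustrative only; they are not Tetrad parameters.

```python
def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def meets_rule_of_thumb(table, min_expected=5.0, min_fraction=0.8):
    """Hypothetical check (illustrative thresholds): at least `min_fraction`
    of cells have expected count >= min_expected, and none is below 1."""
    exp = [e for row in expected_counts(table) for e in row]
    fraction_ok = sum(e >= min_expected for e in exp) / len(exp)
    return fraction_ok >= min_fraction and min(exp) >= 1.0


print(meets_rule_of_thumb([[30, 10], [20, 40]]))  # expected counts 20, 20, 30, 30
print(meets_rule_of_thumb([[3, 1], [0, 2]]))      # expected counts 2, 2, 1, 1
```

The first table passes and the second fails, even though the second has no observed zero in three of its four cells. The rule is about expected counts, not observed ones.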
4.4. Test details (conceptual)
For each candidate independence X ⟂ Y | S:
Form a contingency table of counts of X and Y for each configuration of S.
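Compute expected counts under the assumption that X and Y are independent given S: within each configuration of S, the expected count of a cell is its row total times its column total, divided by the number of samples in that configuration.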
Compute Pearson’s chi-square statistic as the sum over cells of (observed − expected)² / expected.
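Compare the statistic with a chi-square distribution with the appropriate degrees of freedom to obtain a p-value. In the standard stratified form, each configuration of S contributes (r − 1)(c − 1) degrees of freedom, where r and c are the numbers of categories of X and Y, and these contributions are summed.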
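The four steps above can be sketched in Python as follows. This is a minimal illustration, not Tetrad's implementation. Several details are assumptions made for the sketch: the input format (a list of (x, y, s) tuples, where s identifies the configuration of S), building each table only from categories actually observed in that configuration (so no expected count is zero), and summing (r − 1)(c − 1) degrees of freedom over configurations. The p-value uses closed forms of the chi-square upper tail for integer degrees of freedom, so only the standard library is needed.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(Chi2_df >= x) for a positive integer df.

    Uses closed forms of the regularized upper incomplete gamma Q(df/2, x/2).
    By convention here, zero degrees of freedom or a zero statistic give p = 1.
    """
    if df <= 0 or x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: start from Q(1/2, y) = erfc(sqrt(y)) and apply
    # Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1).
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def conditional_chi_square(samples):
    """Pearson chi-square test of X independent of Y given S.

    samples: iterable of (x, y, s) tuples, where s is a hashable value
    (e.g. a tuple) identifying the configuration of the conditioning set S.
    Returns (statistic, degrees_of_freedom, p_value).
    """
    # Step 1: one contingency table of (x, y) counts per configuration of S.
    strata = defaultdict(Counter)
    for x, y, s in samples:
        strata[s][(x, y)] += 1

    stat, df = 0.0, 0
    for counts in strata.values():
        n = sum(counts.values())
        row_totals, col_totals = Counter(), Counter()
        for (x, y), c in counts.items():
            row_totals[x] += c
            col_totals[y] += c
        # Steps 2-3: expected = row total * column total / n, then add
        # (observed - expected)^2 / expected for every cell. Only categories
        # observed in this configuration are used, so expected is never 0.
        for x, r in row_totals.items():
            for y, c in col_totals.items():
                expected = r * c / n
                stat += (counts[(x, y)] - expected) ** 2 / expected
        # Step 4: (r - 1)(c - 1) degrees of freedom per configuration.
        df += (len(row_totals) - 1) * (len(col_totals) - 1)

    return stat, df, chi2_sf(stat, df)


# Single configuration (S empty) with observed counts [[30, 10], [20, 40]].
samples = ([(0, 0, ())] * 30 + [(0, 1, ())] * 10
           + [(1, 0, ())] * 20 + [(1, 1, ())] * 40)
stat, df, p = conditional_chi_square(samples)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.2g}")
```

For this example the expected counts are 20, 20, 30 and 30, so the statistic is 50/3 ≈ 16.67 on 1 degree of freedom, and the p-value is far below a 0.05 cutoff. Where SciPy is available, `scipy.stats.chi2.sf(stat, df)` computes the same tail probability as the hand-written `chi2_sf`.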
4.5. Parameters
| Parameter (camelCase) | Description |
|---|---|
| | Significance level (p-value cutoff) for the chi-square test of (conditional) independence. The null hypothesis is that the variables are independent given the conditioning set. P-values below this level lead to rejecting the null hypothesis, so the variables are judged dependent. |
| | Minimum allowed count in each cell of the contingency table. If some cells fall below this threshold, the chi-square approximation becomes less reliable. Increasing this value can improve accuracy but may reduce power when sample size is small. Default is 1; minimum is 1; maximum is 1,000,000. |
| | Optimization choice for how to build contingency tables: |
| | The effective sample size to use in computing p-values. If set to |
4.6. Strengths
Widely known and understood.
Easy to implement and interpret.
Works well when cell counts are sufficiently large.
4.7. Limitations
Performs poorly with sparse tables (many small expected counts).
Not appropriate for continuous data without discretization.
As conditioning sets grow, tables can become very large and sparse.
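To see how quickly sparsity sets in, the sketch below counts the cells across all tables for X ⟂ Y | S, which is |X| · |Y| · ∏|S_i|, and divides a sample size by that number. The arities (three categories per variable) and the sample size of 1,000 are illustrative assumptions, not values taken from Tetrad.

```python
from math import prod


def table_cells(arity_x, arity_y, arities_s):
    """Total cells over all configurations of S: |X| * |Y| * prod(|S_i|)."""
    return arity_x * arity_y * prod(arities_s)


n = 1000  # assumed sample size, for illustration only
for k in range(5):
    # k ternary conditioning variables, with ternary X and Y
    cells = table_cells(3, 3, [3] * k)
    print(f"|S| = {k}: {cells} cells, {n / cells:.2f} samples per cell on average")
```

With four ternary conditioning variables there are 729 cells, so 1,000 samples leave fewer than 1.4 observations per cell on average, well below the rule of thumb of an expected count of 5.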
4.8. References
Agresti, A. (2002). Categorical Data Analysis (2nd ed.). Wiley.