12. G-Square Test

12.1. Summary

The G-square test (likelihood ratio chi-square) is a test of (conditional) independence for discrete variables. It compares the likelihood of a model in which two variables X and Y are independent given a conditioning set S with that of a model in which they are allowed to depend on each other given S.

12.2. When to use

Variables are discrete (categorical).
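Counts per contingency-table cell are reasonably large (a common rule of thumb for chi-square-type tests is an expected count of at least 5 in most cells).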
You want a standard, likelihood-based CI test for PC, CPC, FCI, RFCI, or other constraint-based algorithms on discrete data.

12.3. Assumptions

Variables are discrete with a manageable number of categories.
Expected cell counts in the contingency tables are not too small (as for chi-square-type tests generally).
The multinomial model for counts is a reasonable approximation.
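
One way to check the cell-count assumption is to compute the expected counts under independence and flag the small ones. The sketch below does this for a single two-way table. The function names and the threshold of 5 are illustrative assumptions (5 is a widely used rule of thumb, not a setting of this test).

```python
def expected_counts(table):
    """Expected counts under independence for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # E[i][j] = (row i total) * (column j total) / n
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_cells(table, threshold=5.0):
    """Return (row, column, expected) for every cell whose expected count is below threshold."""
    expected = expected_counts(table)
    return [
        (i, j, e)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < threshold
    ]


# Hypothetical observed counts for a 2 x 2 table of X by Y.
observed = [[30, 10], [20, 2]]
for i, j, e in small_cells(observed):
    print(f"cell ({i}, {j}) has expected count {e:.2f}")
```

With conditioning variables, the same check would be applied to the X-by-Y table within each combination of values of S.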

12.4. Test details (conceptual)

For each candidate independence X ⟂ Y | S, the G-square test:

Constructs a contingency table of X by Y for each combination of values of the variables in S.
Compares the log-likelihood of the full model (X and Y possibly dependent given S) to the restricted model (X and Y independent given S).
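Forms the test statistic G² = 2 * (logL_full − logL_restricted), which for count data equals 2 * Σ O * ln(O / E), summed over all non-empty cells, where O is the observed count and E is the expected count under conditional independence.
Computes a p-value from an approximate chi-square distribution whose degrees of freedom equal the difference in the number of free parameters between the two models; with no empty rows or columns this is (|X| − 1) * (|Y| − 1) * (number of value combinations of S).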
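
The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used by any particular software: `g_square`, `chi2_sf`, and `expand` are hypothetical names, the data are made up, and the degrees of freedom use the textbook formula (|X| − 1) * (|Y| − 1) * (product of the numbers of levels of the variables in S) with no adjustment for empty rows or columns. The chi-square tail probability is computed from its closed-form series to keep the example dependency-free; a statistics library would normally be used instead.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series.
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * normal_pdf * total


def g_square(rows, x, y, s=()):
    """G-square test of x independent of y given s; rows are dicts of variable -> category."""
    # One X-by-Y table of counts per combination of values of the variables in s.
    strata = defaultdict(Counter)
    for r in rows:
        strata[tuple(r[v] for v in s)][(r[x], r[y])] += 1

    g2 = 0.0
    for counts in strata.values():
        n = sum(counts.values())
        x_tot, y_tot = Counter(), Counter()
        for (xv, yv), c in counts.items():
            x_tot[xv] += c
            y_tot[yv] += c
        # Only non-empty cells contribute: 2 * O * ln(O / E).
        for (xv, yv), c in counts.items():
            expected = x_tot[xv] * y_tot[yv] / n
            g2 += 2.0 * c * math.log(c / expected)

    def levels(v):
        return len({r[v] for r in rows})

    df = (levels(x) - 1) * (levels(y) - 1) * math.prod(levels(v) for v in s)
    p_value = chi2_sf(g2, df) if df > 0 else 1.0
    return g2, df, p_value


def expand(table, s_value):
    """Turn a 2-D table of counts into one row per observation."""
    return [
        {"X": xv, "Y": yv, "S": s_value}
        for xv, row in enumerate(table)
        for yv, count in enumerate(row)
        for _ in range(count)
    ]


# Made-up data: X and Y are independent within each level of S but not marginally.
rows = expand([[16, 4], [4, 1]], 0) + expand([[1, 4], [4, 16]], 1)
g2_m, df_m, p_m = g_square(rows, "X", "Y")
g2_c, df_c, p_c = g_square(rows, "X", "Y", s=("S",))
print(f"marginal:    G2={g2_m:.3f} df={df_m} p={p_m:.4f}")
print(f"given S:     G2={g2_c:.3f} df={df_c} p={p_c:.4f}")
```

In this example the marginal test rejects independence at the 0.05 level, while the test conditional on S does not, which is the pattern a constraint-based search uses to remove the X–Y edge.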

12.5. Parameters

Parameter (camelCase)	Description
`alpha`	Significance level (p-value cutoff) for the G² likelihood-ratio test of (conditional) independence. The null hypothesis is that the variables are independent given the conditioning set. P-values below `alpha` lead to rejection of the null, so the variables are judged dependent. Smaller values reject independence less often and yield sparser graphs (fewer edges); larger values yield denser graphs. Valid range: 0.0–1.0.
`minCountPerCell`	Minimum allowed count in each cell of the contingency table. If some cells fall below this threshold, the asymptotic chi-square approximation for the G² statistic becomes less reliable. Increasing this value can improve accuracy but may reduce power when sample size is small. Default is 1; minimum is 1; maximum is 1,000,000.
`cellTableType`	Optimization choice for how to build contingency tables: `1 = AD Tree`, `2 = Count Sample`. This affects how counts are computed internally (data structure and performance), but should not change the numerical results. Default is 1 (AD Tree).
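
To illustrate how `alpha` affects graph density, the sketch below applies several cutoffs to a fixed set of p-values. It is a simplification under stated assumptions: the edge names and p-values are made up, `retained_edges` is a hypothetical helper, and each p-value stands for the largest p-value found for that pair over all conditioning sets tried, so an edge survives only if independence is rejected every time.

```python
def retained_edges(max_p_values, alpha):
    """Edges for which independence is rejected (p < alpha) in every test tried."""
    return sorted(edge for edge, p in max_p_values.items() if p < alpha)


# Hypothetical largest p-value per variable pair.
max_p_values = {
    ("A", "B"): 0.0004,
    ("A", "C"): 0.03,
    ("B", "C"): 0.08,
    ("C", "D"): 0.40,
}

for alpha in (0.01, 0.05, 0.10):
    edges = retained_edges(max_p_values, alpha)
    print(f"alpha={alpha:.2f}: {len(edges)} edges {edges}")
```

Raising `alpha` from 0.01 to 0.10 keeps more edges here, matching the description of `alpha` in the table.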

12.6. Strengths

Standard likelihood-based test for discrete contingency tables.
Works naturally with multinomial models used in discrete Bayes nets.
Symmetric in X and Y and straightforward to interpret.

12.7. Limitations

Can be unreliable when sample sizes per cell are small.
Complexity can grow quickly with the number of categories and conditioning variables.
Not suitable for continuous variables without discretization.
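
The growth in table size with the conditioning set can be made concrete with a short calculation. The sketch below counts cells as |X| * |Y| * (product of the numbers of categories of the conditioning variables) and reports the average number of samples per cell. The sample size of 1000 and the choice of three categories per variable are arbitrary assumptions, and `table_cells` is a hypothetical helper.

```python
import math


def table_cells(x_levels, y_levels, s_levels=()):
    """Number of cells in the X-by-Y-by-S contingency table."""
    return x_levels * y_levels * math.prod(s_levels)


n_samples = 1000  # assumed sample size
for k in range(5):
    cells = table_cells(3, 3, [3] * k)
    print(f"|S|={k}: {cells} cells, {n_samples / cells:.1f} samples per cell on average")
```

With four three-category conditioning variables the table already has 729 cells, so 1000 samples leave fewer than two observations per cell on average, which is where the small-cell limitation starts to matter.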

12.8. References

Agresti, A. (2002). Categorical Data Analysis (2nd ed.). Wiley.