12. G-Square Test

12.1. Summary

The G-square test (likelihood ratio chi-square) is a test of (conditional) independence for discrete variables. It compares the likelihood of a model where two variables are independent given a conditioning set S to a model where they are allowed to depend on each other given S.

12.2. When to use

  • Variables are discrete (categorical).

  • Sample sizes per cell are reasonably large.

  • You want a standard, likelihood-based CI test for PC, CPC, FCI, RFCI, or other constraint-based algorithms on discrete data.

12.3. Assumptions

  • Variables are discrete with a manageable number of categories.

  • Expected cell counts in the contingency tables are not too small (as for chi-square-type tests generally).

  • The multinomial model for counts is a reasonable approximation.

12.4. Test details (conceptual)

For each candidate independence X ⟂ Y | S, the G-square test:

  1. Constructs contingency tables for X, Y, and S.

  2. Compares the log-likelihood of the full model (X and Y possibly dependent given S) to the restricted model (X and Y independent given S).

  3. Forms the test statistic G² = 2 * (logL_full − logL_restricted).

  4. Uses an approximate chi-square distribution with degrees of freedom equal to the difference in the number of parameters to compute a p-value.

12.5. Parameters

Parameter (camelCase)

Description

alpha

Significance level (p-value cutoff) for the G² likelihood-ratio test of (conditional) independence. The null hypothesis is that the variables are independent given the conditioning set. P-values below alpha lead to rejection. Smaller values make the test more conservative (fewer edges); larger values make the graph denser. Typical range: 0.0–1.0.

minCountPerCell

Minimum allowed count in each cell of the contingency table. If some cells fall below this threshold, the asymptotic chi-square approximation for the G² statistic becomes less reliable. Increasing this value can improve accuracy but may reduce power when sample size is small. Default is 1; minimum is 1; maximum is 1,000,000.

cellTableType

Optimization choice for how to build contingency tables: 1 = AD Tree, 2 = Count Sample. This affects how counts are computed internally (data structure and performance), but should not change the numerical results. Default is 1 (AD Tree).

12.6. Strengths

  • Standard likelihood-based test for discrete contingency tables.

  • Works naturally with multinomial models used in discrete Bayes nets.

  • Symmetric in X and Y and straightforward to interpret.

12.7. Limitations

  • Can be unreliable when sample sizes per cell are small.

  • Complexity can grow quickly with the number of categories and conditioning variables.

  • Not suitable for continuous variables without discretization.

12.8. References

  • Agresti, A. (2002). Categorical Data Analysis (2nd ed.). Wiley.