# G-Square Test

## Summary

The G-square test (likelihood ratio chi-square) is a test of (conditional)
independence for **discrete** variables. It compares the likelihood of a model
where two variables are independent given a conditioning set S to a model where
they are allowed to depend on each other given S.

## When to use

- Variables are **discrete** (categorical).
- Sample sizes per cell are reasonably large.
- You want a standard, likelihood-based CI test for PC, CPC, FCI, RFCI, or
  other constraint-based algorithms on discrete data.

## Assumptions

- Variables are discrete with a manageable number of categories.
- Expected cell counts in the contingency tables are not too small (as for
  chi-square-type tests generally).
- The multinomial model for counts is a reasonable approximation.

## Test details (conceptual)

For each candidate independence X ⟂ Y | S, the G-square test:

1. Constructs contingency tables for X, Y, and S.
2. Compares the **log-likelihood** of the full model (X and Y possibly
   dependent given S) to the **restricted model** (X and Y independent given
   S).
3. Forms the test statistic G² = 2 * (logL_full − logL_restricted).
4. Uses an approximate chi-square distribution with degrees of freedom equal
   to the difference in the number of parameters to compute a p-value.

## Parameters

| Parameter (camelCase)   | Description |
|-------------------------|-------------|
| `alpha`                 | Significance level (p-value cutoff) for the G² likelihood-ratio test of (conditional) independence. The null hypothesis is that the variables are independent given the conditioning set. P-values below `alpha` lead to rejection. Smaller values make the test more conservative (fewer edges); larger values make the graph denser. Typical range: 0.0–1.0. |
| `minCountPerCell`       | Minimum allowed count in each cell of the contingency table. If some cells fall below this threshold, the asymptotic chi-square approximation for the G² statistic becomes less reliable. Increasing this value can improve accuracy but may reduce power when sample size is small. Default is 1; minimum is 1; maximum is 1,000,000. |
| `cellTableType`         | Optimization choice for how to build contingency tables: `1 = AD Tree`, `2 = Count Sample`. This affects how counts are computed internally (data structure and performance), but should not change the numerical results. Default is 1 (AD Tree). |

## Strengths

- Standard likelihood-based test for **discrete** contingency tables.
- Works naturally with multinomial models used in discrete Bayes nets.
- Symmetric in X and Y and straightforward to interpret.

## Limitations

- Can be unreliable when **sample sizes per cell are small**.
- Complexity can grow quickly with the number of categories and conditioning
  variables.
- Not suitable for continuous variables without discretization.

## References

- Agresti, A. (2002). *Categorical Data Analysis* (2nd ed.). Wiley.