6. Conditional Gaussian Likelihood Ratio Test
6.1. Summary
The Conditional Gaussian Likelihood Ratio Test is designed for conditional Gaussian (CG) models, where some variables are continuous and others are discrete. It tests independence X ⟂ Y | S under a CG assumption by comparing nested CG models with and without cross-terms between X and Y.
6.2. When to use
Data are mixed: some continuous, some discrete.
You want a parametric test that respects the structure of CG models (linear Gaussian conditional on discrete configurations).
You are using CG-capable algorithms or scores in Tetrad.
6.3. Assumptions
The data generation process is compatible with a conditional Gaussian distribution: given the discrete variables, the continuous variables follow a multivariate normal distribution whose mean and covariance may depend on the discrete configuration.
Each configuration of the discrete variables contains enough samples to estimate its mean and covariance parameters reliably.
Relationships are linear in the continuous variables within each discrete cell.
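To make these assumptions concrete, the following is a minimal sketch of a data-generating process that satisfies them. It is a toy model for illustration only: the variable names, the cell probabilities, and the per-cell parameters in `params` are all hypothetical and are not taken from Tetrad.

```python
import random


def simulate_cg(n, seed=0):
    """Draw n samples (d, x, y) from a toy conditional Gaussian model.

    d is a binary discrete variable. Given d, x is Gaussian with a
    cell-specific mean and standard deviation, and y is linear in x with a
    cell-specific slope plus Gaussian noise.
    """
    rng = random.Random(seed)
    # Hypothetical per-cell parameters:
    # (mean of x, sd of x, slope of y on x, sd of the noise in y)
    params = {0: (0.0, 1.0, 0.5, 1.0), 1: (2.0, 0.5, -1.0, 0.7)}
    rows = []
    for _ in range(n):
        d = 0 if rng.random() < 0.6 else 1
        mu, sd, slope, noise_sd = params[d]
        x = rng.gauss(mu, sd)
        y = slope * x + rng.gauss(0.0, noise_sd)
        rows.append((d, x, y))
    return rows


rows = simulate_cg(1000)
# Per-cell sample means of x differ because the mean depends on d.
means = {}
for cell in (0, 1):
    xs = [x for d, x, _ in rows if d == cell]
    means[cell] = sum(xs) / len(xs)
    print(cell, len(xs), round(means[cell], 3))
```

Within each value of `d` the pair (x, y) is jointly Gaussian and the relationship is linear, while the mean, variance, and slope all change across cells, which is the structure the CG assumption describes.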
6.4. Test details (conceptual)
For each candidate independence X ⟂ Y | S, the CG LRT:
Partitions the data according to the discrete variables in X, Y, and S.
Fits CG models that either permit or forbid dependence between X and Y given S within each partition.
Forms a likelihood ratio statistic by comparing the full and restricted models.
Uses the asymptotic chi-square distribution of twice the difference in log-likelihoods, with degrees of freedom equal to the number of parameters the restricted model removes, to obtain a p-value.
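The steps above can be sketched for the simplest mixed case: X and Y continuous and S a single discrete variable. This is an illustrative sketch under stated assumptions, not Tetrad's implementation. Within each cell of S it compares a regression of X on Y (full model) with an intercept-only model for X (restricted model), sums the per-cell log-likelihood ratios, and uses one degree of freedom per cell (one slope removed per cell). The function names `cg_lrt` and `chi2_sf` are hypothetical.

```python
import math
import random
from collections import defaultdict


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: start from the df = 1 tail and add half-integer terms.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total


def cg_lrt(rows):
    """rows: iterable of (s, x, y). Returns (statistic, df, p_value)."""
    # Step 1: partition the data by the discrete conditioning variable.
    cells = defaultdict(list)
    for s, x, y in rows:
        cells[s].append((x, y))
    stat, df = 0.0, 0
    for pts in cells.values():
        n = len(pts)
        if n < 3:
            continue  # too few points to fit a slope; skipped in this sketch
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        syy = sum((y - my) ** 2 for _, y in pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        if sxx <= 0 or syy <= 0:
            continue  # a constant variable in this cell carries no information
        # Step 2: residual sums of squares without and with the X-Y term.
        rss_restricted = sxx
        rss_full = sxx - sxy ** 2 / syy
        if rss_full <= 0:
            continue  # degenerate perfect fit; skipped in this sketch
        # Step 3: 2 * (loglik_full - loglik_restricted) for Gaussian errors.
        stat += n * math.log(rss_restricted / rss_full)
        df += 1
    # Step 4: asymptotic chi-square p-value.
    p_value = chi2_sf(stat, df) if df > 0 else 1.0
    return stat, df, p_value


rng = random.Random(1)
dependent, independent = [], []
for _ in range(600):
    s = rng.randrange(3)
    y = rng.gauss(s, 1.0)
    dependent.append((s, 0.8 * y + rng.gauss(0.0, 1.0), y))   # X depends on Y
    independent.append((s, s + rng.gauss(0.0, 1.0), y))       # X depends on S only

result_dep = cg_lrt(dependent)
result_ind = cg_lrt(independent)
print("dependent:  ", result_dep)
print("independent:", result_ind)
```

A small p-value rejects the null hypothesis of conditional independence. Skipping sparse or degenerate cells is a simplification chosen for this sketch; a production test needs an explicit policy for such cells.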
6.5. Parameters
| Parameter (camelCase) | Description |
|---|---|
| | Significance level (p-value cutoff) for the likelihood-ratio test of conditional independence. The null hypothesis is that the variables are conditionally independent given the conditioning set. P-values below |
| | Boolean. If |
| | Integer ≥ 2. Number of categories used when discretizing continuous variables in the backup discretization step. Default is 3. Larger values give a finer discretization but increase the number of cells and reduce counts per cell. |
| | Integer ≥ 2. Minimum required sample size per configuration (cell) in the conditional Gaussian model. If some cells fall below this threshold, the test may fall back to discretization (if |
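As a rough illustration of the per-cell sample-size check described above, the sketch below counts observations in each observed configuration of the discrete variables and reports the cells that fall below a threshold. The function `sparse_cells`, its argument `min_cell_size`, and the example records are hypothetical; they are not Tetrad parameter names or APIs.

```python
from collections import Counter


def sparse_cells(records, discrete_keys, min_cell_size):
    """Return {cell: count} for observed cells with fewer than min_cell_size rows.

    A cell is one joint configuration of the discrete variables. Cells that
    never occur in the data are not listed, because only observed
    configurations are counted.
    """
    counts = Counter(tuple(r[k] for k in discrete_keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_cell_size}


# Hypothetical mixed data: two discrete variables and one continuous variable.
records = [
    {"sex": "f", "site": "a", "dose": 1.2},
    {"sex": "f", "site": "a", "dose": 0.9},
    {"sex": "m", "site": "a", "dose": 1.7},
    {"sex": "m", "site": "a", "dose": 1.4},
    {"sex": "f", "site": "b", "dose": 2.1},
    {"sex": "f", "site": "b", "dose": 1.8},
    {"sex": "m", "site": "b", "dose": 0.6},
]

too_small = sparse_cells(records, ["sex", "site"], min_cell_size=2)
print(too_small)
```

A check of this kind shows why the number of cells matters: each added discrete variable or category multiplies the number of configurations, so counts per cell shrink quickly.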
6.6. Strengths
Designed specifically for mixed continuous/discrete data.
Avoids ad hoc discretization of continuous variables when cells are adequately populated; discretization serves only as a fallback for sparse cells.
Compatible with CG BIC scores and CG-aware search procedures.
6.7. Limitations
Requires enough samples per discrete configuration; sparse cells can be a problem.
Assumes linear-Gaussian structure for continuous variables within each discrete cell.
More complex and computationally intensive than purely continuous or purely discrete tests.
6.8. References
Lauritzen, S. L. (1996). Graphical Models. Oxford University Press.
Andrews, B., Ramsey, J., & Cooper, G. F. (2018). Scoring Bayesian networks of mixed variables. International Journal of Data Science and Analytics, 6(1), 3–18.