21. Probabilistic Independence Test
21.1. Summary
The Probabilistic Independence Test is a wrapper test that uses explicit probability or density models (for example, from an instantiated Bayesian network or parametric model) to answer independence queries X ⟂ Y | S. It is used when the probability model, not the data, is considered the source of truth.
21.2. When to use
You have a fully specified probabilistic model (e.g., a Bayes net or parametric SEM) and want to query its implied independences.
You are performing oracle-style experiments where the model encodes the ground truth.
You wish to test search algorithms in controlled synthetic settings.
21.3. Assumptions
The provided probabilistic model is correct (for the purpose of the experiment).
Independence is decided by checking whether the model assigns X the same conditional distribution given S alone as it does given both Y and S (or by checking an equivalent factorization).
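
The "equivalent factorizations" mentioned above can be written out explicitly. The following is standard probability theory for discrete variables, not a statement about how the test is implemented; lowercase x, y, s denote values of X, Y, S:

```latex
\begin{align*}
X \perp Y \mid S
&\iff P(x \mid y, s) = P(x \mid s)
  && \text{for all } x, y, s \text{ with } P(y, s) > 0 \\
&\iff P(x, y \mid s) = P(x \mid s)\, P(y \mid s)
  && \text{for all } x, y, s \text{ with } P(s) > 0 \\
&\iff P(x, y, s)\, P(s) = P(x, s)\, P(y, s)
  && \text{for all } x, y, s .
\end{align*}
```

The last form involves no division, so it remains well defined when some configurations of S have probability zero.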
21.4. Test details (conceptual)
For each X ⟂ Y | S query, the test:
Uses the underlying probability model to compute or compare P(X | S) and P(X | Y, S), or an equivalent characterization.
Declares independence if these distributions are equal (within numerical tolerance), and dependence otherwise.
Does not rely on raw data; the model itself is the oracle.
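
The steps above can be sketched for a small discrete model. This is a minimal illustration, assuming the model is available as a full joint probability table stored in a NumPy array with one axis per variable; the function name `is_independent`, its arguments, and the tolerance value are hypothetical and are not part of the test's actual interface. Rather than dividing to obtain P(X | S) and P(X | Y, S), the sketch checks the equivalent condition P(X, Y, S) · P(S) = P(X, S) · P(Y, S), which avoids division by zero.

```python
import numpy as np


def is_independent(joint, x, y, s=(), tol=1e-9):
    """Return True if X ⟂ Y | S holds in the joint table (hypothetical helper).

    joint : ndarray with one axis per variable, entries summing to 1.
    x, y  : axis indices of the two query variables.
    s     : tuple of axis indices forming the conditioning set.
    tol   : absolute numerical tolerance (an assumed value).
    """
    keep = (x, y, *s)
    drop = tuple(a for a in range(joint.ndim) if a not in keep)
    # Marginalize out every variable that is not part of the query.
    p_xys = joint.sum(axis=drop, keepdims=True) if drop else joint  # P(X, Y, S)
    p_xs = p_xys.sum(axis=y, keepdims=True)                         # P(X, S)
    p_ys = p_xys.sum(axis=x, keepdims=True)                         # P(Y, S)
    p_s = p_xs.sum(axis=x, keepdims=True)                           # P(S)
    # Independence holds iff P(X,Y,S) * P(S) == P(X,S) * P(Y,S) in every cell.
    return bool(np.allclose(p_xys * p_s, p_xs * p_ys, atol=tol, rtol=0.0))


# Example model: a binary chain X -> Z -> Y with made-up parameters.
px = np.array([0.3, 0.7])                    # P(X)
pz_x = np.array([[0.9, 0.1], [0.2, 0.8]])    # P(Z | X), rows indexed by X
py_z = np.array([[0.6, 0.4], [0.1, 0.9]])    # P(Y | Z), rows indexed by Z
joint = np.einsum("x,xz,zy->xzy", px, pz_x, py_z)  # axes: 0 = X, 1 = Z, 2 = Y

marginal = is_independent(joint, 0, 2)             # X and Y, empty conditioning set
given_z = is_independent(joint, 0, 2, s=(1,))      # X and Y given Z
print(marginal, given_z)
```

In this chain, X and Y are dependent marginally but independent given Z, so the two queries return different answers. A full joint table grows exponentially with the number of variables, so this approach is only practical for small illustrative models.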
21.5. Parameters
| Parameter (camelCase) | Description |
|---|---|
| | Boolean. If |
| | Double in [0.0, 1.0]. Independence cutoff threshold. When |
| | Double ≥ 1.0. Prior equivalent sample size for the underlying Bayesian model used to estimate independence probabilities. This acts like a pseudo-count total that is distributed across cells in the relevant contingency or parameter tables. Larger values make the prior stronger relative to the data (smoother probabilities); smaller values let the data dominate more. Default is |
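
The effect of the prior equivalent sample size can be illustrated with a small sketch. It assumes the pseudo-count total is split uniformly across the cells of a single contingency table; the function name `smoothed_cell_probabilities`, the argument `prior_ess`, and the example counts are hypothetical, and the underlying model may distribute the prior differently.

```python
import numpy as np


def smoothed_cell_probabilities(counts, prior_ess):
    """Estimate cell probabilities with a uniform pseudo-count prior (a sketch).

    counts    : array of observed cell counts for one contingency table.
    prior_ess : prior equivalent sample size (>= 1.0), the total pseudo-count.
    """
    counts = np.asarray(counts, dtype=float)
    if prior_ess < 1.0:
        raise ValueError("prior_ess must be >= 1.0")
    # Assumption: each cell receives an equal share of the pseudo-count total.
    pseudo = prior_ess / counts.size
    return (counts + pseudo) / (counts.sum() + prior_ess)


counts = np.array([[9, 1], [0, 0]])                   # 10 made-up observations
weak = smoothed_cell_probabilities(counts, 2.0)       # data dominates
strong = smoothed_cell_probabilities(counts, 40.0)    # prior dominates
print(weak)
print(strong)
```

With a total of 2 pseudo-counts the estimates stay close to the observed frequencies, while 40 pseudo-counts pull every cell toward the uniform value of 0.25, which is the smoothing behavior described in the table.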
21.6. Strengths
Provides exact or near-exact independence decisions given the model.
Ideal for algorithm evaluation and theoretical experiments.
21.7. Limitations
Not applicable when only raw data are available and no model is given.
Results are only as good as the underlying model specification.