# Choosing Tests & Scores Selecting the right conditional-independence test or score is one of the most important modeling decisions in Tetrad. Many search algorithms (PC, FGES, BOSS, GFCI, etc.) allow you to choose a test/score tailored to your data type and modeling assumptions. This page gives practical guidance on which tests and scores to use for: - Continuous (Gaussian / nearly Gaussian) - Discrete - Mixed continuous/discrete - Non-Gaussian linear models - Nonlinear models (KCI, basis expansions) - Latent-variable block-based searches The goal is not to list every option, but to highlight the ones that **work well in practice** in real causal-discovery workflows. --- ## 1. Continuous, Approximately Gaussian Data When variables are continuous, roughly symmetric, and well-modelled by linear relationships: ### Recommended Tests - **[Fisher Z Test](tests-and-scores/fisher-z.md)** The default in Tetrad and the most reliable choice for continuous data. Fast, robust, and statistically well-understood. - **Partial Covariance / Partial Correlation Tests** Equivalent variants used by some algorithms (conceptually the same family as the Fisher Z / partial-correlation approach). ### Recommended Scores - **[Sem BIC Score](tests-and-scores/sem-bic-score.md)** The standard linear-Gaussian score. ### Best-Fit Algorithms - PC, PC-Max, CPC - FGES (high-dimensional) - BOSS (small–medium dimensional; often gives better precision than FGES) - GRaSP - GFCI / RFCI **Rule of thumb:** If nothing fancy is required, use **Fisher Z** and **Sem BIC**. --- ## 2. Discrete Data (Binary / Ordinal / Categorical) Discrete CI testing behaves very differently from continuous modeling. Tetrad offers fast standard tests plus Bayesian alternatives. ### Recommended Tests - **[G²](tests-and-scores/g-square.md) / [Chi-Square Test](tests-and-scores/chi-square.md)** Best for moderate–large samples. - **“BDeu-style” Bayesian tests** (Logically aligned with the [BDeu Score](tests-and-scores/bdeu-score.md), especially in small-sample / sparse-table settings.) ### Recommended Scores - **[Discrete BDeu Score](tests-and-scores/bdeu-score.md)** Works well for small–medium models. **Note:** For large numbers of discrete variables, CPTs explode. ### Best-Fit Algorithms - **BOSS** (preferred for discrete; handles small–medium models extremely well) - **FGES** (works, but CPT blowup can be prohibitive; use only when p is modest) - PC / GFCI / RFCI with G² or BDeu-type tests **Rule of thumb:** Finite-state models are rarely high-dimensional; for score-based search on discrete data: **BOSS > FGES**. --- ## 3. Mixed Continuous/Discrete Data Mixed data requires special handling. In Tetrad you have **three practically useful options**: ### A. Conditional Gaussian (CG) - **CG Independence Test** → **[ConditionalGaussianLrt](tests-and-scores/conditional-gaussian-lrt.md)** - **CG BIC Score** → **[ConditionalGaussianBicScore](tests-and-scores/conditional-gaussian-bic-score.md)** Works when continuous variables are approximately Gaussian within levels of discrete parents. Statistically principled but slower. ### B. Degenerate Gaussian (DGC) - **Degenerate Gaussian Test** → **[DegenerateGaussianLrt](tests-and-scores/degenerate-gaussian-lrt.md)** - **Degenerate Gaussian Score** → **[DegenerateGaussianBicScore](tests-and-scores/degenerate-gaussian-bic-score.md)** Treats discrete variables as “degenerate Gaussians.” Much faster than full CG for large data. ### C. Basis Function (BF) Tests/Scores - **Basis Function CI Test** → **[BasisFunctionLrt](tests-and-scores/basis-function-lrt.md)** - **Basis Function BIC Score** → **[BasisFunctionBicScore](tests-and-scores/basis-function-bic-score.md)** This approach expands continuous variables using orthogonal polynomials (Legendre/Chebyshev): - works for mixed data - handles **nonlinear** relationships - extremely fast compared to kernel methods - sample-size scaling ~ constant --- ## 4. Non-Gaussian Linear Models If relationships are linear but noise terms are non-Gaussian (LiNGAM-type settings or visibly skewed residuals): ### Recommended Tests For **LiNGAM-style algorithms** (DirectLiNGAM, ICA-LiNGAM, etc.), independence is handled internally via ICA or related objective functions; there is **no separate CI test** to choose in the GUI. If you still run “ordinary” DAG search (PC, FGES, BOSS, GRaSP) on linear non-Gaussian data: - It is usually fine to keep using **[Fisher Z](tests-and-scores/fisher-z.md)** as the CI test. The structure is still identifiable from second-order moments under many conditions, even if the noise is non-Gaussian. ### Recommended Scores - **[Sem BIC Score](tests-and-scores/sem-bic-score.md)** (heuristic but empirically strong) Although derived under a Gaussian assumption, Sem BIC often works very well in linear non-Gaussian settings. In particular, the BOSS paper > Andrews, B., Ramsey, J., Sanchez-Romero, R., Camchong, J., & Kummerfeld, E. (2023). > *Fast scalable and accurate discovery of DAGs using the best order score search and grow shrink trees.* > NeurIPS 36, 63945–63956. shows that **BOSS + Sem BIC** performs comparably to DirectLiNGAM in linear non-Gaussian simulations (see Figure 4a). - **[PoissonPriorScore](tests-and-scores/poisson-prior-score.md)** (optional structural prior) This is a **structural sparsity prior** (Poisson on edges/parents), not a noise model. You can combine it with Sem BIC if you want an explicit probabilistic prior over graph complexity, but it is not specific to linear non-Gaussian noise. ### Best-Fit Algorithms - DirectLiNGAM, ICA-LiNGAM, ICA-LiNG-D (internal ICA-type objective) - FASK / FASK-Vote - BOSS or FGES with **Sem BIC** (heuristic but well supported by experiments) - Pairwise-skewness-based orientation methods **Rule of thumb:** If residuals are visibly skewed or heavy-tailed, it is still quite reasonable to use **Sem BIC** with BOSS/FGES for structure learning, and to compare against dedicated LiNGAM-style methods when possible. --- ## 5. Nonlinear Models Tetrad provides three practically useful nonlinear CI tests. ### A. Kernel Conditional Independence Test (**KCI**) - **[KCI](tests-and-scores/kci-test.md)** Captures **arbitrary nonlinear** dependencies. Very powerful but computationally expensive for large N. Best for small–medium datasets. ### B. Random Conditional Independence Test (**RCIT**) - **[RCIT](tests-and-scores/rcit-test.md)** Captures **arbitrary nonlinear** dependencies. Faster than KCI. Useful for larger datasets. ### B. Basis Function Test / Score (**Recommended for scalability**) - **[Basis Function CI Test](tests-and-scores/basis-function-lrt.md)** - **[Basis Function BIC Score](tests-and-scores/basis-function-bic-score.md)** - Nonlinear via polynomial/orthogonal expansions. - Post-nonlinear models. - Often matches or outperforms KCI on large N due to superior speed. **Rule of thumb:** | Goal | Recommended | |------|-------------| | Best accuracy for nonlinear CI | **[KCI](tests-and-scores/kci-test.md)** | | Best speed + strong accuracy | **[Basis Function Test](tests-and-scores/basis-function-lrt.md)** / **[Basis Function BIC](tests-and-scores/basis-function-bic-score.md)** | --- ## 6. Latent Variable Workflows (Block-Based Search) When you run latent clustering (TSC, FOFC, FTFC, GFFC, BPC), you obtain **clusters → latent nodes**. You can then run structure discovery over those latent nodes. ### Block-Based Tests/Scores - **Blocks-Test-TS** Trek-separation test on clusters. - **Blocks-BIC Score** A block-aware score for FGES, BOSS, GRaSP, SP, etc. (These are not yet documented as separate test/score pages in this manual.) ### Compatible Algorithms Any algorithm that accepts a test and/or score: - PC (default choice) - GFCI / RFCI - FGES (when clusters are few) - BOSS (very strong for moderate-sized latent structures) - GRaSP or SP (if small) ### Typical Workflow 1. Run clustering (e.g., TSC) 2. Convert clusters → latent variables 3. Choose Blocks-Test-TS or Blocks-BIC 4. Run PC, GFCI, BOSS, FGES, or others This is the recommended approach for **latent causal structure without specifying measurement models**. --- ## Summary Table (Practical Defaults) | Setting | Test | Score | Algorithms | |--------|-------------------------------------------------------------------------------------|------------------------------------------------------------------------------|--------------------------------------------| | Continuous linear | [Fisher Z](tests-and-scores/fisher-z.md) | [Sem BIC](tests-and-scores/sem-bic-score.md) | PC, FGES, BOSS\*, GFCI | | Discrete | [G²](tests-and-scores/g-square.md) or BDeu-style tests | [BDeu](tests-and-scores/bdeu-score.md) | BOSS\*, FGES, PC | | Mixed | [Degenerate Gaussian Test](tests-and-scores/degenerate-gaussian-lrt.md) | [Degenerate Gaussian BIC](tests-and-scores/degenerate-gaussian-bic-score.md) | PC, FGES, BOSS, GFCI, FCIT | | Linear non-Gaussian | Internal ICA criteria or [Fisher Z](tests-and-scores/fisher-z.md) when using PC/BOSS | [Sem BIC](tests-and-scores/sem-bic-score.md) | DirectLiNGAM, FASK, BOSS (heuristic), FGES | | Nonlinear | [KCI](tests-and-scores/kci-test.md), [RCIT](tests-and-scores/rcit-test.md) | (none / kernel-based) | PC+KCI, CAM | | Nonlinear scalable | [Basis Function Test](tests-and-scores/basis-function-lrt.md) | [Basis Function BIC](tests-and-scores/basis-function-bic-score.md) | PC+BF, GFCI+BF | | Latent blocks | Blocks-Test-TS | Blocks-BIC | PC, GFCI, FGES, BOSS | \* **BOSS is recommended over FGES** unless the number of variables is very large. GRaSP is a similar algorithm that should be considered as well. --- ## Next Steps - **[Tests & Scores Catalog](tests-and-scores-catalog.md)** - **[Search Algorithms — Short List](search-algorithms-short-list.md)** - **[Latent Clustering](algorithms/latent-cluster.md)** - Per-algorithm documentation for parameter definitions