23. SEM BIC Score

23.1. Summary

The SEM BIC Score is a BIC-type score for linear structural equation models (SEMs) with continuous variables and Gaussian errors. It evaluates the fit of a DAG or SEM structure by combining the log-likelihood of the implied covariance matrix with a penalty on model complexity.

23.2. When to use

  • Data are continuous and reasonably Gaussian.

  • You are learning a DAG or SEM using algorithms like FGES, BOSS, or GRaSP.

  • You want a consistent, likelihood-based score that trades off fit and complexity.

23.3. Model class

  • Linear structural equation models with Gaussian noise.

  • Equivalent to evaluating a DAG with linear regressions at each node.

23.4. Score form (conceptual)

The SEM BIC Score is of the form:

BIC = 2 * logL − k * ln(N)

where:

  • logL is the maximized log-likelihood for the model,

  • k is the number of free parameters (edges and variances),

  • N is the sample size.

In Tetrad’s convention, larger BIC values are better.

23.5. Parameters

Parameter (camelCase)

Description

penaltyDiscount

Double ≥ 0.0. The penalty multiplier “c” in the modified BIC-type criterion (for example, a score of the form 2·log-likelihood minus c·k·log(N), where k is the number of free parameters and N is the sample size). Larger values impose a stronger complexity penalty and yield sparser graphs; smaller values allow denser graphs. Default is 2.0.

semBicStructurePrior

Double ≥ 0.0. Structure prior coefficient specific to the SEM BIC score. When 0.0 (default), the score uses essentially a flat structure prior. Positive values encode a preference for certain in-degree patterns (for example, sparser graphs), acting as an additional prior on the number of edges or parents per node.

semBicRule

Integer. Choice of SEM BIC rule for how likelihood differences are translated into edge decisions: 1 = Chickering, 2 = Nandy. The Chickering rule uses likelihood differences directly; the Nandy rule uses a transformation based on the absolute value of partial correlations in place of the raw likelihood difference. Default is 1 (Chickering).

precomputeCovariances

Boolean. If true, precomputes and caches covariance (and possibly cross-covariance) matrices used by the score. This speeds up repeated scoring at the cost of additional memory. If false, covariances are computed on the fly, which saves memory but may be slower for large graphs or many score evaluations. Recommended: true for up to a few thousand variables; false when p is very large.

singularityLambda

Double. Handles singular or nearly singular covariance matrices. If singularityLambda > 0, that value is added to the diagonal (a ridge term) to stabilize matrix inverses. If singularityLambda < 0, a pseudoinverse is used instead. Default is 0.0. Use a small positive value if you encounter numerical-singularity warnings.

effectiveSampleSize

Integer > 0, or -1. If -1 (default), the actual sample size N is used in the log(N) penalty term. If set to a positive value, the score behaves as if that were the sample size (for example, when treating weighted or subsampled data as having a different effective N).

23.6. Strengths

  • Well-studied, consistent under standard regularity conditions.

  • Efficient to compute using regression or covariance matrix factorizations.

  • Natural choice for continuous linear DAG/SEM learning.

23.7. Limitations

  • Assumes linear-Gaussian structure; may mis-score strong nonlinear or non-Gaussian relationships.

  • Sensitive to outliers and heteroskedasticity.