23. SEM BIC Score

23.1. Summary

The SEM BIC Score is a BIC-type score for linear structural equation models (SEMs) with continuous variables and Gaussian errors. It evaluates the fit of a DAG or SEM structure by combining the log-likelihood of the implied covariance matrix with a penalty on model complexity.

23.2. When to use

Data are continuous and reasonably Gaussian.
You are learning a DAG or SEM using algorithms like FGES, BOSS, or GRaSP.
You want a consistent, likelihood-based score that trades off fit and complexity.

23.3. Model class

Linear structural equation models with Gaussian noise.
Equivalent to scoring a DAG by regressing each node linearly on its parents.
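To make the model class concrete, here is a minimal pure-Python sketch (not Tetrad code) that simulates a three-variable linear Gaussian SEM X → Y → Z and then recovers each edge coefficient by regressing the node on its parent. The coefficients 0.8 and -0.5, the unit noise variances, and the helper names `simulate_chain` and `ols_slope` are illustrative assumptions.

```python
import random


def simulate_chain(n, b_xy=0.8, b_yz=-0.5, seed=0):
    """Simulate X -> Y -> Z: each node is a linear function of its parent plus N(0, 1) noise."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = b_xy * x + rng.gauss(0.0, 1.0)
        z = b_yz * y + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def ols_slope(parent, child):
    """Least-squares slope of child on a single parent (data centered first)."""
    n = len(parent)
    mp = sum(parent) / n
    mc = sum(child) / n
    sxy = sum((p - mp) * (c - mc) for p, c in zip(parent, child))
    sxx = sum((p - mp) ** 2 for p in parent)
    return sxy / sxx


xs, ys, zs = simulate_chain(5000)
# Each structural equation is estimated by one regression of a node on its parents.
print(round(ols_slope(xs, ys), 2), round(ols_slope(ys, zs), 2))
```

Scoring such a DAG amounts to fitting one regression per node, which is why the score decomposes into per-node terms.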

23.4. Score form (conceptual)

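The SEM BIC Score is of the form:

BIC = 2 * logL − c * k * ln(N)

where:

logL is the maximized log-likelihood of the model,
k is the number of free parameters (one coefficient per edge plus one error variance per variable),
N is the sample size,
c is the penalty discount (`penaltyDiscount`); c = 1 gives the classical BIC.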

In Tetrad’s convention, larger BIC values are better; this is the negative of the common −2 * logL + k * ln(N) form, in which smaller values are better.
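The formula can be made concrete with a small pure-Python sketch under stated assumptions: data are centered so intercepts are not counted, each node's error variance is the maximum-likelihood estimate RSS / N, and `c` plays the role of `penaltyDiscount`. The helper names (`covariance`, `solve`, `residual_variance`, `sem_bic`) are hypothetical and are not Tetrad's API; Tetrad's own implementation may differ in details such as constant terms or the structure prior.

```python
import math
import random


def covariance(cols):
    """Maximum-likelihood covariance matrix (divisor N) of a list of data columns."""
    n, p = len(cols[0]), len(cols)
    means = [sum(c) / n for c in cols]
    return [[sum((cols[i][t] - means[i]) * (cols[j][t] - means[j]) for t in range(n)) / n
             for j in range(p)] for i in range(p)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def residual_variance(s, node, parents):
    """S_yy - S_yZ S_ZZ^{-1} S_Zy, i.e. RSS / N of the regression of node on parents."""
    if not parents:
        return s[node][node]
    s_zz = [[s[i][j] for j in parents] for i in parents]
    s_zy = [s[i][node] for i in parents]
    beta = solve(s_zz, s_zy)
    return s[node][node] - sum(b * v for b, v in zip(beta, s_zy))


def sem_bic(cols, parents_of, c=1.0):
    """BIC = 2 logL - c k ln N for a DAG given as {node: [parents]}; larger is better."""
    n = len(cols[0])
    s = covariance(cols)
    log_lik, k = 0.0, 0
    for node, parents in parents_of.items():
        s2 = residual_variance(s, node, parents)
        log_lik += -0.5 * n * (math.log(2 * math.pi * s2) + 1.0)
        k += len(parents) + 1  # edge coefficients plus one error variance
    return 2.0 * log_lik - c * k * math.log(n)


rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(2000)]
y = [0.8 * a + rng.gauss(0, 1) for a in x]
z = [-0.5 * b + rng.gauss(0, 1) for b in y]
data = [x, y, z]

true_dag = {0: [], 1: [0], 2: [1]}   # X -> Y -> Z
empty_dag = {0: [], 1: [], 2: []}
print(sem_bic(data, true_dag) > sem_bic(data, empty_dag))  # True
```

Raising `c` lowers every score by c * k * ln(N), and the drop is larger for denser graphs, which is how the penalty pushes the search toward sparser structures.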

23.5. Parameters

Parameter (camelCase)	Description
`penaltyDiscount`	Double ≥ 0.0. The penalty multiplier “c” in the modified BIC-type criterion (for example, a score of the form 2·log-likelihood minus c·k·log(N), where k is the number of free parameters and N is the sample size). Larger values impose a stronger complexity penalty and yield sparser graphs; smaller values allow denser graphs. Default is 2.0.
`semBicStructurePrior`	Double ≥ 0.0. Structure prior coefficient specific to the SEM BIC score. When 0.0 (default), the score uses essentially a flat structure prior. Positive values encode a preference for certain in-degree patterns (for example, sparser graphs), acting as an additional prior on the number of edges or parents per node.
`semBicRule`	Integer. Choice of SEM BIC rule for how likelihood differences are translated into edge decisions: `1 = Chickering`, `2 = Nandy`. The Chickering rule uses likelihood differences directly; the Nandy rule uses a transformation based on the absolute value of partial correlations in place of the raw likelihood difference. Default is 1 (Chickering).
`precomputeCovariances`	Boolean. If `true`, precomputes and caches covariance (and possibly cross-covariance) matrices used by the score. This speeds up repeated scoring at the cost of additional memory. If `false`, covariances are computed on the fly, which saves memory but may be slower for large graphs or many score evaluations. Recommended: `true` for up to a few thousand variables; `false` when the number of variables p is very large.
`singularityLambda`	Double. Handles singular or nearly singular covariance matrices. If `singularityLambda > 0`, that value is added to the diagonal (a ridge term) to stabilize matrix inverses. If `singularityLambda < 0`, a pseudoinverse is used instead. Default is 0.0. Use a small positive value if you encounter numerical-singularity warnings.
`effectiveSampleSize`	Integer > 0, or `-1`. If `-1` (default), the actual sample size N is used in the log(N) penalty term. If set to a positive value, the score behaves as if that were the sample size (for example, when treating weighted or subsampled data as having a different effective N).
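How `penaltyDiscount` and `effectiveSampleSize` interact can be seen in the score change when one edge X → Y is added to a node whose other parents are Z. For Gaussian regressions, the residual variance shrinks by the factor 1 − r², where r is the sample partial correlation of X and Y given Z, so 2 * ΔlogL = −N * ln(1 − r²), while the penalty grows by c * ln(N). This sketch shows that standard Gaussian identity; it is not claimed to be Tetrad's exact Nandy-rule code. The function name `edge_gain` is hypothetical, and letting a positive effective sample size replace N in both terms is an assumption.

```python
import math


def edge_gain(r, n, penalty_discount=1.0, effective_sample_size=-1):
    """SEM BIC change from adding one edge whose partial correlation with the child,
    given the child's current parents, is r. Positive means the edge is added."""
    # Assumption: a positive effective sample size stands in for N everywhere.
    n_eff = effective_sample_size if effective_sample_size > 0 else n
    likelihood_gain = -n_eff * math.log(1.0 - r * r)  # equals 2 * delta logL
    penalty = penalty_discount * math.log(n_eff)      # one extra free parameter
    return likelihood_gain - penalty


# A weak partial correlation of 0.1 with N = 1000.
for c in (1.0, 2.0):
    print(c, edge_gain(0.1, 1000, penalty_discount=c) > 0)
# prints "1.0 True" then "2.0 False"

# Treating the data as if N were 300 makes the same edge harder to justify.
print(edge_gain(0.1, 1000, effective_sample_size=300) > 0)  # False
```

Raising c, or lowering the effective sample size, raises the bar a partial correlation must clear before an edge is accepted, which is why larger `penaltyDiscount` values produce sparser graphs.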

23.6. Strengths

Well-studied, consistent under standard regularity conditions.
Efficient to compute using regression or covariance matrix factorizations.
Natural choice for continuous linear DAG/SEM learning.
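The efficiency claim rests on the fact that each node's term depends only on the covariance entries of that node and its parents, so the total score is a sum of local terms computable from a cached covariance matrix. The pure-Python sketch below assumes a maximum-likelihood covariance matrix (divisor N) as input; `local_score`, `solve`, and the `ridge` argument are hypothetical names, with `ridge` mirroring the idea of a positive `singularityLambda` (the pseudoinverse branch for negative values is not sketched).

```python
import math


def solve(a, b, tol=1e-12):
    """Gaussian elimination with partial pivoting; raises ValueError if a is singular."""
    m = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < tol:
            raise ValueError("singular covariance submatrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def local_score(s, n, node, parents, c=1.0, ridge=0.0):
    """Per-node SEM BIC term computed from the covariance matrix s alone.

    ridge > 0 is added to the diagonal of the parents' covariance block."""
    s_zz = [[s[i][j] + (ridge if i == j else 0.0) for j in parents] for i in parents]
    s_zy = [s[i][node] for i in parents]
    beta = solve(s_zz, s_zy) if parents else []
    s2 = s[node][node] - sum(b * v for b, v in zip(beta, s_zy))
    log_lik = -0.5 * n * (math.log(2 * math.pi * s2) + 1.0)
    return 2.0 * log_lik - c * (len(parents) + 1) * math.log(n)


s = [[1.0, 0.5], [0.5, 1.0]]
print(local_score(s, 100, 1, [0]))  # node 1 with parent 0

dup = [[1.0, 1.0, 0.5], [1.0, 1.0, 0.5], [0.5, 0.5, 1.0]]  # variables 0 and 1 identical
try:
    local_score(dup, 100, 2, [0, 1])
except ValueError as err:
    print("no ridge:", err)
print("ridge:", local_score(dup, 100, 2, [0, 1], ridge=0.1))
```

Because a search such as FGES changes one parent set at a time, only the affected node's term needs rescoring, which is why caching covariances (`precomputeCovariances`) pays off.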

23.7. Limitations

Assumes linear-Gaussian structure; may mis-score strongly nonlinear or non-Gaussian relationships.
Sensitive to outliers and heteroskedasticity.
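The nonlinearity limitation can be shown with a deliberately extreme case, as a sketch: Y = X² with X spread symmetrically around zero. Y is a noise-free function of X, yet the linear correlation is essentially zero, so the linear-Gaussian score change for the edge X → Y is negative and the edge would not be added. The helper `linear_edge_gain` is hypothetical and uses the single-parent case of the Gaussian identity 2 * ΔlogL = −N * ln(1 − r²).

```python
import math


def linear_edge_gain(parent, child, c=1.0):
    """SEM BIC change from adding parent -> child when child has no other parents.

    Uses 2 * delta logL = -N ln(1 - r^2), minus c * ln(N) for the extra coefficient."""
    n = len(parent)
    mp, mc = sum(parent) / n, sum(child) / n
    sxy = sum((p - mp) * (q - mc) for p, q in zip(parent, child))
    sxx = sum((p - mp) ** 2 for p in parent)
    syy = sum((q - mc) ** 2 for q in child)
    r2 = sxy * sxy / (sxx * syy)
    return -n * math.log(1.0 - r2) - c * math.log(n)


# X on a grid symmetric around 0; Y = X^2 is a noise-free function of X.
half = [k / 50 for k in range(1, 101)]
x = [-v for v in reversed(half)] + [0.0] + half
y = [v * v for v in x]

gain = linear_edge_gain(x, y)
print(gain < 0)  # True: the linear score would leave out X -> Y
```

The symmetric design makes the covariance of X and X² vanish, so only the penalty term remains; a nonparametric or nonlinear score would be needed to detect this dependence.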