# SEM BIC Score

## Summary

The SEM BIC Score is a **BIC-type score** for **linear structural equation
models (SEMs)** with continuous variables and Gaussian errors. It scores a
candidate DAG or SEM structure by combining the maximized Gaussian
log-likelihood of the data under that structure with a penalty on model
complexity.

## When to use

- Data are continuous and approximately **Gaussian**.
- You are learning a DAG or SEM using algorithms like FGES, BOSS, or GRaSP.
- You want a statistically consistent, likelihood-based score that trades off
  fit against complexity.

## Model class

- Linear structural equation models with Gaussian noise.
- Equivalent to scoring a DAG by fitting a linear regression of each node on
  its parents.
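
As a minimal sketch of the per-node regression view (not Tetrad's implementation), the residual variance of one node given its parents can be computed from a covariance matrix by solving the normal equations. The helper names `solve` and `residual_variance` and the example covariance matrix are illustrative assumptions, not part of Tetrad.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] on a copy so the inputs are untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def residual_variance(cov, child, parents):
    """Variance of `child` that remains after linear regression on `parents`."""
    if not parents:
        return cov[child][child]
    s_pp = [[cov[p][q] for q in parents] for p in parents]  # parent-parent block
    s_pc = [cov[p][child] for p in parents]                 # parent-child column
    beta = solve(s_pp, s_pc)                                # regression coefficients
    return cov[child][child] - sum(b * s for b, s in zip(beta, s_pc))


# Hypothetical covariance for X -> Y with Y = 0.5 * X + e, Var(X) = Var(e) = 1.
cov = [[1.0, 0.5],
       [0.5, 1.25]]
print(residual_variance(cov, 1, []))    # marginal variance of Y
print(residual_variance(cov, 1, [0]))   # variance of Y after regressing on X
```

The drop in residual variance when a parent is added is what the likelihood term of the score rewards.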

## Score form (conceptual)

The SEM BIC Score has the form:

    BIC = 2 * logL − k * ln(N)

where:

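- `logL` is the maximized log-likelihood for the model,
- `k` is the number of free parameters (edge coefficients and error
  variances),
- `N` is the sample size.

The `penaltyDiscount` parameter c (see Parameters) multiplies the penalty
term, giving `2 * logL − c * k * ln(N)`.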

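In Tetrad’s convention, **larger BIC values are better**. This is the negative
of the common textbook definition, `−2 * logL + k * ln(N)`, for which smaller
values are better.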
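
The formula above can be illustrated for a single node with a short sketch. It assumes the standard maximized Gaussian log-likelihood of a linear regression, `logL = -N/2 * (ln(2π·σ²) + 1)`, where `σ²` is the maximum-likelihood residual variance, and it counts `k` as one coefficient per parent plus one variance. The function names and the numbers are illustrative, not Tetrad's API.

```python
import math


def gaussian_log_likelihood(resid_var, n):
    """Maximized Gaussian log-likelihood of n residuals with MLE variance resid_var."""
    return -0.5 * n * (math.log(2.0 * math.pi * resid_var) + 1.0)


def bic(log_l, k, n):
    """BIC in the larger-is-better convention: 2 * logL - k * ln(N)."""
    return 2.0 * log_l - k * math.log(n)


n = 100  # hypothetical sample size

# Node Y with no parents: one free parameter (its variance).
score_empty = bic(gaussian_log_likelihood(1.25, n), k=1, n=n)

# Node Y with parent X: one coefficient plus one residual variance.
score_edge = bic(gaussian_log_likelihood(1.0, n), k=2, n=n)

# The edge is preferred when the gain in fit, N * ln(1.25 / 1.0),
# exceeds the cost of the extra parameter, ln(N).
print(score_edge > score_empty)
```

With these numbers the gain in fit outweighs the penalty for one extra parameter, so the structure with the edge receives the larger score.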

## Parameters

| Parameter (camelCase)     | Description |
|---------------------------|-------------|
| `penaltyDiscount`         | Double ≥ 0.0. The penalty multiplier c in the modified BIC-type criterion `2 * logL − c * k * ln(N)`, where `k` is the number of free parameters and `N` is the sample size. Larger values impose a stronger complexity penalty and yield sparser graphs; smaller values allow denser graphs. Default is 2.0. |
| `semBicStructurePrior`    | Double ≥ 0.0. Structure prior coefficient specific to the SEM BIC score. When 0.0 (default), the score uses essentially a flat structure prior. Positive values encode a preference for certain in-degree patterns (for example, sparser graphs), acting as an additional prior on the number of edges or parents per node. |
| `semBicRule`              | Integer. Choice of SEM BIC rule for how likelihood differences are translated into edge decisions: `1 = Chickering`, `2 = Nandy`. The Chickering rule uses likelihood differences directly; the Nandy rule uses a transformation based on the absolute value of partial correlations in place of the raw likelihood difference. Default is 1 (Chickering). |
| `precomputeCovariances`   | Boolean. If `true`, precomputes and caches covariance (and possibly cross-covariance) matrices used by the score. This speeds up repeated scoring at the cost of additional memory. If `false`, covariances are computed on the fly, which saves memory but may be slower for large graphs or many score evaluations. Recommended: `true` for up to a few thousand variables; `false` when the number of variables is very large. |
| `singularityLambda`       | Double. Handles singular or nearly singular covariance matrices. If `singularityLambda > 0`, that value is added to the diagonal (a ridge term) to stabilize matrix inverses. If `singularityLambda < 0`, a pseudoinverse is used instead. Default is 0.0. Use a small positive value if you encounter numerical-singularity warnings. |
| `effectiveSampleSize`     | Integer > 0, or `-1`. If `-1` (default), the actual sample size N is used in the ln(N) penalty term. If set to a positive value, the score behaves as if that were the sample size (for example, when treating weighted or subsampled data as having a different effective N). |
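
To make the effect of some of these parameters concrete, here is a rough sketch of how `penaltyDiscount`, `effectiveSampleSize`, and a positive `singularityLambda` might enter a per-node score. It is not Tetrad's code: the function names are hypothetical, graph-independent constants are dropped from `2 * logL`, the effective sample size is assumed to replace N in both terms, and the pseudoinverse case (`singularityLambda < 0`) is not shown.

```python
import math


def local_score(resid_var, num_parents, n, penalty_discount=2.0,
                effective_sample_size=-1):
    """Sketch of a penalized per-node score; larger is better."""
    # Assumption: -1 means "use the actual sample size".
    n_eff = n if effective_sample_size == -1 else effective_sample_size
    # Assumed parameter count: one coefficient per parent plus one variance.
    k = num_parents + 1
    return -n_eff * math.log(resid_var) - penalty_discount * k * math.log(n_eff)


def add_ridge(cov, lam):
    """Return a copy of cov with lam added to the diagonal (lam > 0 case only)."""
    return [[v + (lam if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(cov)]


n = 100  # hypothetical sample size
# A weak edge that lowers the residual variance from 1.25 to 1.2.
for c in (0.5, 2.0):
    gain = local_score(1.2, 1, n, c) - local_score(1.25, 0, n, c)
    print(c, gain > 0)  # whether the edge improves the score at this penalty

# A perfectly collinear (singular) covariance becomes invertible after a ridge.
ridged = add_ridge([[1.0, 1.0], [1.0, 1.0]], 0.1)
print(ridged[0][0] * ridged[1][1] - ridged[0][1] * ridged[1][0] > 0)
```

In this toy setting the weak edge improves the score at a penalty discount of 0.5 but not at 2.0, which matches the description of larger values yielding sparser graphs.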

## Strengths

- Well studied and statistically consistent under standard regularity
  conditions.
- Efficient to compute using regression or covariance matrix factorizations.
- Natural choice for continuous linear DAG/SEM learning.

## Limitations

- Assumes linear-Gaussian structure; may mis-score strong nonlinear or
  non-Gaussian relationships.
- Sensitive to outliers and heteroskedasticity.