10. Extended BIC (EBIC) Score
10.1. Summary
The Extended BIC (EBIC) Score is a generalization of BIC intended for high-dimensional settings. It adds an extra penalty term that depends on the number of possible edges, favoring sparser graphs more strongly than standard BIC.
10.2. When to use
The number of variables is large relative to the sample size.
You want stronger sparsity encouragement than standard BIC provides.
You are using score-based methods (FGES, BOSS, GRaSP) in high-dimensional regimes.
10.3. Model class
Typically applied to linear Gaussian or discrete DAGs, but the EBIC form is generic.
10.4. Score form (conceptual)
A common EBIC form is:
EBIC = 2 * logL − k * ln(N) − 2 * γ * ln(choose(p, k_edges))
where logL is the maximized log-likelihood, k is the number of free parameters, N is the sample size, γ is a parameter in [0, 1], p is the number of variables, and k_edges is the number of edges.
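The conceptual form above can be sketched in Python as follows. This is a minimal illustration of the formula only, not the implementation used by any particular library: the function name `ebic_score` and its argument names are hypothetical, and the log-likelihood is assumed to be computed elsewhere.

```python
import math


def ebic_score(log_lik, k, n, p, k_edges, gamma=0.8):
    """Conceptual EBIC score (higher is better), following the form above.

    All names are illustrative assumptions:
    log_lik  -- maximized log-likelihood of the model
    k        -- number of free parameters
    n        -- sample size (N in the formula)
    p        -- number of variables
    k_edges  -- number of edges in the graph
    gamma    -- EBIC gamma in [0, 1]; gamma = 0 gives ordinary BIC
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not 0 <= k_edges <= p:
        # choose(p, k_edges) is zero outside this range, so its log is undefined
        raise ValueError("choose(p, k_edges) requires 0 <= k_edges <= p")

    # Ordinary BIC part: 2 * logL - k * ln(N)
    bic = 2.0 * log_lik - k * math.log(n)
    # Extra EBIC term: 2 * gamma * ln(choose(p, k_edges))
    extra = 2.0 * gamma * math.log(math.comb(p, k_edges))
    return bic - extra
```

Calling `ebic_score(..., gamma=0.0)` returns the plain BIC value, while larger `gamma` values lower the score of any model with at least one edge.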
10.5. Parameters
| Parameter (camelCase) | Description |
|---|---|
| | Double in [0, 1]. The gamma parameter for Extended BIC (EBIC). A value of 0 reduces EBIC to ordinary BIC; values closer to 1 add a strong extra penalty for models with many predictors (useful in high-dimensional settings). Default is 0.8. |
| | Boolean. If |
| | Double. Handles singular or nearly singular covariance matrices. If |
| | Double > 0, or |
10.6. Strengths
More conservative than BIC, tending to select sparser graphs in high-dimensional settings.
Supported by theory in some sparse regression and graphical model contexts.
10.7. Limitations
Choice of γ is somewhat problem-dependent.
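May penalize edges too strongly when the sample size N is not small relative to the number of variables p, that is, outside the high-dimensional regime the score is designed for.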
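To make the sensitivity to γ concrete, the following sketch compares the extra EBIC term with the ordinary BIC penalty for a few γ values. It assumes the conceptual form from section 10.4, takes one free parameter per edge (k = k_edges, an assumption made only for illustration), and uses made-up values for N, p, and the edge count. The helper name `penalty_ratio` is hypothetical.

```python
import math


def penalty_ratio(n, p, k_edges, gamma):
    """Ratio of the extra EBIC term to the BIC penalty k * ln(N).

    Assumes k = k_edges (one free parameter per edge) and k_edges >= 1,
    purely for illustration.
    """
    bic_penalty = k_edges * math.log(n)
    extra_penalty = 2.0 * gamma * math.log(math.comb(p, k_edges))
    return extra_penalty / bic_penalty


# Hypothetical settings: 50 variables, 10 edges, two sample sizes.
for n in (100, 10_000):
    for gamma in (0.0, 0.5, 1.0):
        ratio = penalty_ratio(n, p=50, k_edges=10, gamma=gamma)
        print(f"N={n:>6}  gamma={gamma:.1f}  extra/BIC penalty = {ratio:.2f}")
```

In this form the extra term grows linearly in γ and does not depend on N, so its size relative to the BIC penalty shrinks only in proportion to 1 / ln(N) as the sample grows.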
10.8. References
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771.