9. DAGMA — Learning DAGs via M-Matrices and Log-Determinant Acyclicity
Type: Score-based, continuous optimization
Output: DAG or CPDAG (depending on settings)
DAGMA (Bello, Aragam & Ravikumar 2022) is a continuous optimization method for learning directed acyclic graphs using a smooth, differentiable characterization of acyclicity based on M-matrices and log-determinant penalties. DAGMA directly optimizes a penalized likelihood objective under an exact acyclicity constraint, producing a weighted adjacency matrix whose thresholded structure represents a DAG. Tetrad’s implementation follows the original optimization loop closely and provides an option to convert the learned DAG into a CPDAG.
9.1. Key Idea
DAGMA replaces the combinatorial acyclicity constraint with a log-determinant characterization:
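A d × d weighted adjacency matrix W corresponds to a DAG if and only if h_s(W) = −log det(sI − W∘W) + d log s equals zero, where ∘ denotes the elementwise product. The function is defined on the region where sI − W∘W is an M-matrix (nonpositive off-diagonal entries, with s larger than the spectral radius of W∘W), and it is strictly positive there whenever W contains a cycle.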
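The characterization can be checked numerically. The following is a minimal NumPy sketch, not Tetrad's code: the function name `logdet_acyclicity` and the default `s = 1.0` are assumptions made for illustration. It evaluates h_s(W) = −log det(sI − W∘W) + d log s, the log-determinant function from the DAGMA paper, on one acyclic and one cyclic weight matrix.

```python
import numpy as np

def logdet_acyclicity(W, s=1.0):
    """Return h_s(W) = -log det(s*I - W∘W) + d*log(s). Illustrative helper."""
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    squared = W * W  # elementwise (Hadamard) square, so all entries are >= 0
    # The value is only meaningful while s*I - W∘W is an M-matrix,
    # i.e. while s exceeds the spectral radius of W∘W.
    if np.max(np.abs(np.linalg.eigvals(squared))) >= s:
        raise ValueError("s must exceed the spectral radius of W∘W")
    _, logabsdet = np.linalg.slogdet(s * np.eye(d) - squared)
    return -logabsdet + d * np.log(s)

# Edges 0 -> 1 -> 2: acyclic, so h should be zero.
dag = np.array([[0.0, 0.8, 0.0],
                [0.0, 0.0, -1.5],
                [0.0, 0.0, 0.0]])

# Edges 0 -> 1 and 1 -> 0: a 2-cycle, so h should be positive.
cyclic = np.array([[0.0, 0.5],
                   [0.5, 0.0]])

h_dag = logdet_acyclicity(dag)
h_cyclic = logdet_acyclicity(cyclic)
```

Here `h_dag` is zero up to floating-point error, while `h_cyclic` is strictly positive, which is what lets an optimizer treat h as a smooth measure of how far W is from being acyclic.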
DAGMA optimizes:
a least-squares likelihood term,
an L1 sparsity penalty,
plus a smooth acyclicity penalty.
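As a rough sketch of how the three terms combine: the DAGMA paper minimizes μ · (score(W) + λ₁‖W‖₁) + h_s(W), where μ is the central-path parameter and h_s is the log-determinant acyclicity function. The NumPy code below is illustrative rather than Tetrad's implementation; the names `dagma_objective`, `lambda1`, `mu`, and `s`, along with their default values, are assumptions.

```python
import numpy as np

def dagma_objective(W, X, lambda1=0.05, mu=1.0, s=1.0):
    """Illustrative objective: mu * (least squares + L1) + log-det acyclicity."""
    n, d = X.shape
    # W[i, j] is the weight of edge i -> j, so column j of X @ W is the
    # linear prediction of variable j from its parents.
    residual = X - X @ W
    score = 0.5 / n * np.sum(residual ** 2)   # least-squares term
    l1 = lambda1 * np.abs(W).sum()            # L1 sparsity penalty
    _, logabsdet = np.linalg.slogdet(s * np.eye(d) - W * W)
    h = -logabsdet + d * np.log(s)            # acyclicity penalty, 0 for a DAG
    return mu * (score + l1) + h

# Simulated data from the linear model x0 -> x1 with coefficient 2.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(size=500)
X = np.column_stack([x0, x1])

W_empty = np.zeros((2, 2))
W_true = np.array([[0.0, 2.0],
                   [0.0, 0.0]])

obj_empty = dagma_objective(W_empty, X)
obj_true = dagma_objective(W_true, X)
```

On this simulated data the generating graph scores lower than the empty graph: the L1 term charges for the edge, but the drop in the least-squares term more than pays for it, and the acyclicity term is zero for both candidates.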
Optimization uses ADAM with continuation over:
decreasing central-path parameter μ,
a sequence of increasing s-values defining different M-matrices.
The result is a continuous weight matrix W that is then thresholded and optionally closed under Meek rules.
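The thresholding step can be sketched as follows. This is a hedged illustration, not Tetrad's code: the helper names `threshold_to_adjacency` and `is_acyclic` and the cutoff value `0.3` are assumptions. Entries of W whose absolute value falls below the cutoff are dropped, and the resulting 0/1 adjacency matrix is checked for cycles with Kahn's algorithm.

```python
import numpy as np

def threshold_to_adjacency(W, w_threshold=0.3):
    """Keep edge i -> j only when |W[i, j]| is at least w_threshold."""
    return (np.abs(np.asarray(W, dtype=float)) >= w_threshold).astype(int)

def is_acyclic(A):
    """Kahn's algorithm: repeatedly remove nodes that have no incoming edges."""
    A = np.array(A, dtype=int)
    remaining = list(range(A.shape[0]))
    while remaining:
        # A node is a source if no remaining node points to it.
        sources = [j for j in remaining
                   if not any(A[i, j] for i in remaining)]
        if not sources:
            return False  # every remaining node has a parent, so a cycle exists
        remaining = [j for j in remaining if j not in sources]
    return True

# A learned weight matrix with two strong edges and some small residual weights.
W = np.array([[0.00, 1.20, 0.05],
              [0.02, 0.00, -0.90],
              [0.00, 0.01, 0.00]])

A = threshold_to_adjacency(W)
acyclic = is_acyclic(A)
```

In this example only the edges 0 -> 1 and 1 -> 2 survive the cutoff, and the thresholded graph is acyclic. The small back-edge weights (such as `W[1, 0] = 0.02`) are the kind of numerical residue that thresholding is meant to remove.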
9.2. When to Use
Use DAGMA when:
You want a purely score-based, continuous optimization method for DAG learning.
Data are continuous, the sample size N is reasonably large, and the relationships are roughly linear (with Gaussian or non-Gaussian noise).
You prefer a DAG rather than CPDAG output.
You want an alternative to NOTEARS, GraNDAG, GOLEM, or BOSS for learning DAGs from continuous data.
Avoid DAGMA when:
You need latent-variable handling.
You need strict knowledge constraints (forbidden/required edges) — DAGMA does not support these.
Data are strongly nonlinear or heavy-tailed (FASK, DirectLiNGAM, or nonlinear algorithms may be preferable).
9.3. Prior Knowledge Support
Does DAGMA accept background knowledge?
No.
The current implementation in Tetrad does not honor:
forbidden edges,
required edges,
tier/temporal constraints, or
structural priors.
9.4. Strengths
Continuous optimization → fast for moderate dimensionality.
Exact acyclicity.
Often produces clean, sparse DAGs.
No CI tests are needed, which helps when conditional independence tests are unreliable.
9.5. Limitations
No support for background knowledge.
Requires tuning of several optimization parameters.
Sensitive to covariance estimation; works best with large N.
Optimization can fail or slow down for high-dimensional, noisy datasets.
9.6. Key Parameters in Tetrad
| Parameter (camelCase) | Description |
|---|---|
| | L1 sparsity penalty on edge weights. |
| | Initial threshold for pruning small weights. |
| | Output CPDAG if true; otherwise return DAG. |
9.7. Reference
Bello, K., Aragam, B., & Ravikumar, P. (2022).
DAGMA: Learning DAGs via M-Matrices and a Log-Determinant Acyclicity Characterization.
NeurIPS 2022, 35, 8226–8239.
9.8. Summary
DAGMA is a smooth, score-based DAG learning algorithm enforcing exact acyclicity using M-matrix log-determinant constraints. It produces clean DAGs without CI tests but does not support knowledge constraints in Tetrad.