9. DAGMA — Learning DAGs via M-Matrices and Log-Determinant Acyclicity

Type: Score-based, continuous optimization
Output: DAG or CPDAG (depending on settings)

DAGMA (Bello, Aragam & Ravikumar 2022) is a continuous optimization method for learning directed acyclic graphs. It uses a smooth, differentiable characterization of acyclicity based on M-matrices and the log-determinant. DAGMA minimizes a penalized least-squares objective subject to an exact acyclicity constraint, solved by following a central path, and produces a weighted adjacency matrix whose thresholded structure is a DAG. Tetrad’s implementation follows the original optimization loop closely and provides an option to convert the learned DAG into a CPDAG.

9.1. Key Idea

DAGMA replaces the combinatorial acyclicity constraint with a log-determinant characterization:

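For a weighted adjacency matrix W over d variables and any s larger than the spectral radius of W∘W (the elementwise square of W), the matrix sI - W∘W is an M-matrix, and the function h^s(W) = -log det(sI - W∘W) + d log s is nonnegative, with h^s(W) = 0 if and only if W is a DAG.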
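A minimal NumPy sketch of the log-determinant acyclicity function h^s(W) = -log det(sI - W∘W) + d log s may make the characterization concrete. The function name `h_logdet` is hypothetical and this is an illustration, not Tetrad’s code. It refuses inputs outside the domain (s must exceed the spectral radius of W∘W) and evaluates h^s on a two-variable DAG and on a two-cycle.

```python
import numpy as np

def h_logdet(W, s=1.0):
    """Hypothetical helper: h^s(W) = -log det(sI - W∘W) + d log s.

    Defined only where s exceeds the spectral radius of W∘W, which makes
    sI - W∘W a nonsingular M-matrix.
    """
    d = W.shape[0]
    A = W * W                                   # elementwise square, entries >= 0
    if np.max(np.abs(np.linalg.eigvals(A))) >= s:
        raise ValueError("s must exceed the spectral radius of W∘W")
    _, logabsdet = np.linalg.slogdet(s * np.eye(d) - A)
    return -logabsdet + d * np.log(s)

W_dag = np.array([[0.0, 1.5], [0.0, 0.0]])      # X0 -> X1
W_cyc = np.array([[0.0, 0.5], [0.5, 0.0]])      # X0 <-> X1, a 2-cycle
h_dag = h_logdet(W_dag)   # zero: W∘W is nilpotent, so det(I - W∘W) = 1
h_cyc = h_logdet(W_cyc)   # positive: log(16/15), about 0.0645
```

The DAG gives h^s = 0 because W∘W of a DAG is nilpotent, so det(sI - W∘W) = s^d; any cycle makes the determinant smaller than s^d and h^s strictly positive.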
DAGMA optimizes:
- a least-squares loss, (1/2n)||X - XW||_F^2, for the n × d data matrix X,
- an L1 sparsity penalty, λ1||W||_1,
- plus the smooth acyclicity function h^s(W).
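The sketch below assembles these three terms into the objective minimized for one fixed central-path parameter μ and one s-value, μ · (loss + λ1||W||_1) + h^s(W). It assumes X is an n × d NumPy array and that W lies in the domain where sI - W∘W is an M-matrix; all function names are hypothetical and chosen for illustration only.

```python
import numpy as np

def least_squares_loss(W, X):
    """(1 / 2n) * ||X - X W||_F^2, the least-squares fit term."""
    n = X.shape[0]
    R = X - X @ W
    return 0.5 / n * float(np.sum(R ** 2))

def score(W, X, lambda1):
    """Least-squares loss plus the L1 sparsity penalty lambda1 * ||W||_1."""
    return least_squares_loss(W, X) + lambda1 * float(np.abs(W).sum())

def h_logdet(W, s=1.0):
    """h^s(W); assumes W is inside the M-matrix domain (no check here)."""
    d = W.shape[0]
    _, logabsdet = np.linalg.slogdet(s * np.eye(d) - W * W)
    return -logabsdet + d * np.log(s)

def central_path_objective(W, X, lambda1, mu, s=1.0):
    """mu * score + h^s: the objective for one fixed mu and s."""
    return mu * score(W, X, lambda1) + h_logdet(W, s)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[0.0, 1.0], [0.0, 0.0]])          # single edge X0 -> X1
loss_empty = least_squares_loss(np.zeros((2, 2)), X)
value = central_path_objective(W, X, lambda1=0.1, mu=0.5)
```

Here W encodes a DAG, so h^s(W) = 0 and the objective reduces to μ times the penalized score: 0.5 · (3.0 + 0.1) = 1.55.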
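Optimization uses the Adam optimizer, with continuation over:
- a decreasing central-path parameter μ, which scales the loss and sparsity penalty so that the acyclicity term dominates as μ shrinks,
- a sequence of s-values, each defining a different M-matrix sI - W∘W.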
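The continuation loop can be sketched for the linear least-squares case as follows. This is a simplified, hypothetical illustration, not Tetrad’s implementation: the function name, default values, the Adam constants, the rule of rejecting a step that leaves the domain and halving the step size, and the rescaling of W at the start of a stage are all sketch choices. It uses the gradients ∇loss = -(1/n) Xᵀ(X - XW) and ∇h^s(W) = 2 W∘(sI - W∘W)^{-T}, and a sign subgradient for the L1 term.

```python
import numpy as np

def spectral_radius(A):
    """Largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def dagma_linear_sketch(X, lambda1=0.02, s_values=(1.0, 0.9, 0.8), mu_init=1.0,
                        mu_factor=0.1, iters=600, lr=0.01, w_threshold=0.3):
    """Hypothetical, simplified DAGMA-style loop (linear least-squares loss)."""
    n, d = X.shape
    I = np.eye(d)
    off_diag = 1.0 - I            # sketch choice: self-loops are never fitted
    W = np.zeros((d, d))
    mu = mu_init
    for s in s_values:            # one central-path stage per s-value
        rho = spectral_radius(W * W)
        if rho >= s:              # sketch choice: rescale W back into the domain
            W = W * np.sqrt(0.9 * s / rho)
        m, v = np.zeros_like(W), np.zeros_like(W)   # fresh Adam moments per stage
        step = lr
        for t in range(1, iters + 1):
            M_inv = np.linalg.inv(s * I - W * W)
            grad_loss = -X.T @ (X - X @ W) / n
            grad_h = 2.0 * W * M_inv.T             # gradient of h^s(W)
            g = (mu * (grad_loss + lambda1 * np.sign(W)) + grad_h) * off_diag
            m = 0.99 * m + 0.01 * g                # Adam first moment
            v = 0.999 * v + 0.001 * g * g          # Adam second moment
            m_hat = m / (1 - 0.99 ** t)
            v_hat = v / (1 - 0.999 ** t)
            W_new = W - step * m_hat / (np.sqrt(v_hat) + 1e-8)
            if spectral_radius(W_new * W_new) < s:
                W = W_new
            else:
                step *= 0.5       # step left the domain: reject it, shrink the step
        mu *= mu_factor           # move along the central path
    W_est = W.copy()
    W_est[np.abs(W_est) < w_threshold] = 0.0       # prune small weights
    return W_est

# Usage on simulated linear-Gaussian data from the chain X0 -> X1 -> X2.
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = 1.5 * x0 + rng.normal(size=n)
x2 = -1.0 * x1 + rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
X = X - X.mean(axis=0)
W_hat = dagma_linear_sketch(X)
```

Keeping every iterate inside the region where s exceeds the spectral radius of W∘W guarantees that sI - W∘W stays a nonsingular M-matrix, so the log-determinant and its gradient are always defined.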

The result is a continuous weight matrix W, which is thresholded (entries with magnitude below `wThreshold` are set to zero) and optionally converted to a CPDAG using Meek’s orientation rules.
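The thresholding step and an acyclicity check on the pruned graph can be sketched as follows. Assumptions: W[i, j] ≠ 0 encodes the edge i -> j (the convention implied by the X ≈ XW loss), and the helper names are hypothetical. The check uses Kahn’s topological-sort algorithm.

```python
import numpy as np

def threshold(W, w_threshold):
    """Set entries with |W_ij| < w_threshold to zero (hypothetical helper)."""
    W_pruned = W.copy()
    W_pruned[np.abs(W_pruned) < w_threshold] = 0.0
    return W_pruned

def is_dag(W):
    """Kahn's algorithm on the support of W; W[i, j] != 0 means i -> j."""
    A = W != 0
    d = A.shape[0]
    indeg = A.sum(axis=0).tolist()      # in-degree of each node j
    queue = [j for j in range(d) if indeg[j] == 0]
    visited = 0
    while queue:
        i = queue.pop()
        visited += 1
        for j in range(d):
            if A[i, j]:
                indeg[j] -= 1
                if indeg[j] == 0:
                    queue.append(j)
    return visited == d                 # every node sorted means no cycle

# A small weight 0.02 on 1 -> 0 creates a 2-cycle with 0 -> 1.
W = np.array([[0.0, 0.9, 0.05],
              [0.02, 0.0, 1.2],
              [0.0, 0.0, 0.0]])
W_pruned = threshold(W, 0.1)            # keeps only 0 -> 1 and 1 -> 2
```

Before thresholding, the tiny weights form a cycle; after pruning with a threshold of 0.1 only the chain 0 -> 1 -> 2 remains, which is acyclic.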

9.2. When to Use

Use DAGMA when:

You want a purely score-based, continuous optimization method for DAG learning.
Data are continuous, the sample size N is reasonably large, and relationships are roughly linear (with Gaussian or non-Gaussian noise).
You prefer a DAG rather than a CPDAG as output.
You want an alternative to NOTEARS, GraN-DAG, GOLEM, or BOSS for learning DAGs from continuous data.

Avoid DAGMA when:

You need latent-variable handling.
You need strict knowledge constraints (forbidden/required edges) — DAGMA does not support these.
Data are strongly nonlinear or heavy-tailed (FASK, DirectLiNGAM, or nonlinear algorithms may be preferable).

9.3. Prior Knowledge Support

Does DAGMA accept background knowledge?
No.
The current implementation in Tetrad does not honor:

forbidden edges,
required edges,
tier/temporal constraints, or
structural priors.

9.4. Strengths

Continuous optimization → fast for moderate dimensionality.
Exact acyclicity characterization: h^s(W) = 0 if and only if W is a DAG.
Often produces clean, sparse DAGs.
No need for CI tests, so it can be a useful option when CI tests are unreliable.

9.5. Limitations

No support for background knowledge.
Requires tuning of several optimization parameters.
Sensitive to covariance estimation; works best with large N.
Optimization can fail or slow down for high-dimensional, noisy datasets.
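One way to see the sensitivity to covariance estimation: for the least-squares loss, the data enter only through the second-moment matrix Σ̂ = XᵀX/n (the sample covariance when X is centered), because (1/2n)||X - XW||_F^2 = (1/2) tr((I - W)ᵀ Σ̂ (I - W)). Any estimation error in Σ̂, which is larger at small N, therefore feeds directly into the objective. The numerical check below uses randomly generated data as an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
X = rng.normal(size=(n, d))
X -= X.mean(axis=0)                      # center so Sigma_hat is the sample covariance
W = rng.normal(scale=0.3, size=(d, d))
np.fill_diagonal(W, 0.0)

# Loss computed from the raw data matrix.
loss_data = 0.5 / n * np.linalg.norm(X - X @ W, "fro") ** 2

# Same loss computed from the sample covariance alone.
Sigma_hat = X.T @ X / n
I = np.eye(d)
loss_cov = 0.5 * np.trace((I - W).T @ Sigma_hat @ (I - W))
```

The two values agree up to floating-point rounding, since ||X(I - W)||_F^2 = tr((I - W)ᵀ XᵀX (I - W)).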

9.6. Key Parameters in Tetrad

Parameter (camelCase)	Description
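`lambda1`	Weight λ1 of the L1 sparsity penalty on edge weights; larger values tend to give sparser graphs.
`wThreshold`	Threshold for pruning: learned edge weights with magnitude below this value are set to zero.
`cpdag`	If true, output the CPDAG of the learned DAG; otherwise return the DAG.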
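The `cpdag` option replaces the learned DAG by its CPDAG. A standard way to compute it is sketched below: keep the skeleton, orient the v-structures a -> c <- b with a and b nonadjacent, then apply Meek’s rules R1 to R3 until nothing changes. This is an illustration of that textbook construction, not Tetrad’s code; the function name and the edge-set representation (a pair (a, b) means a -> b, for example the nonzero entries of a thresholded W) are assumptions.

```python
from itertools import combinations

def dag_to_cpdag(nodes, edges):
    """Hypothetical helper: returns (directed pairs, undirected frozensets)."""
    nodes, edges = list(nodes), set(edges)
    adjacent = lambda a, b: (a, b) in edges or (b, a) in edges
    directed = set()
    for c in nodes:                       # orient v-structures a -> c <- b
        parents = [a for a in nodes if (a, c) in edges]
        for a, b in combinations(parents, 2):
            if not adjacent(a, b):
                directed |= {(a, c), (b, c)}
    undirected = {frozenset(e) for e in edges if e not in directed}
    is_und = lambda a, b: frozenset((a, b)) in undirected
    changed = True
    while changed:                        # apply Meek rules R1-R3 to closure
        changed = False
        for e in list(undirected):
            a, b = tuple(e)
            for x, y in ((a, b), (b, a)):
                if not is_und(x, y):
                    continue              # already oriented in this pass
                # R1: z -> x - y with z, y nonadjacent  =>  x -> y
                r1 = any((z, x) in directed and not adjacent(z, y) for z in nodes)
                # R2: x -> z -> y with x - y  =>  x -> y
                r2 = any((x, z) in directed and (z, y) in directed for z in nodes)
                # R3: x - z1 -> y, x - z2 -> y, z1, z2 nonadjacent  =>  x -> y
                r3 = any(is_und(x, z1) and is_und(x, z2)
                         and (z1, y) in directed and (z2, y) in directed
                         and not adjacent(z1, z2)
                         for z1, z2 in combinations(nodes, 2))
                if r1 or r2 or r3:
                    undirected.discard(frozenset((x, y)))
                    directed.add((x, y))
                    changed = True
    return directed, undirected

chain = dag_to_cpdag(range(3), {(0, 1), (1, 2)})            # 0 -> 1 -> 2
collider = dag_to_cpdag(range(3), {(0, 2), (1, 2)})         # 0 -> 2 <- 1
extended = dag_to_cpdag(range(4), {(0, 2), (1, 2), (2, 3)}) # plus 2 -> 3
```

A chain has no v-structure, so all of its edges stay undirected; a collider stays fully directed; and adding 2 -> 3 below the collider is compelled by R1.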

9.7. Reference

Bello, K., Aragam, B., & Ravikumar, P. (2022).
DAGMA: Learning DAGs via M-Matrices and a Log-Determinant Acyclicity Characterization.
NeurIPS 2022, 35, 8226–8239.

9.8. Summary

DAGMA is a smooth, score-based DAG learning algorithm that enforces exact acyclicity through an M-matrix-based log-determinant characterization. It produces clean DAGs without CI tests, but Tetrad’s implementation does not support knowledge constraints.