9. DAGMA — Learning DAGs via M-Matrices and Log-Determinant Acyclicity

Type: Score-based, continuous optimization
Output: DAG or CPDAG (depending on settings)

DAGMA (Bello, Aragam & Ravikumar 2022) is a continuous optimization method for learning directed acyclic graphs using a smooth, differentiable characterization of acyclicity based on M-matrices and log-determinant penalties. DAGMA directly optimizes a penalized likelihood objective under an exact acyclicity constraint, producing a weighted adjacency matrix whose thresholded structure represents a DAG. Tetrad’s implementation follows the original optimization loop closely and provides an option to convert the learned DAG into a CPDAG.


9.1. Key Idea

DAGMA replaces the combinatorial acyclicity constraint with a log-determinant characterization:

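  • Let W∘W be the elementwise square of the weighted adjacency matrix W (d variables) and let s > 0 exceed the spectral radius of W∘W, so that sI − W∘W is an M-matrix (positive diagonal, nonpositive off-diagonal entries). On this domain the function h^s(W) = −log det(sI − W∘W) + d log s is nonnegative, and it equals zero iff W is acyclic.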

  • DAGMA optimizes:

    • a least-squares loss for the linear structural equations,

    • an L1 sparsity penalty,

    • plus the smooth log-determinant acyclicity function.

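  • Optimization uses the Adam optimizer with continuation over:

    • a decreasing central-path parameter μ, which weights the score relative to the acyclicity term (as μ shrinks, acyclicity dominates),

    • a sequence of s-values, one per outer iteration, each defining a different M-matrix.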
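For reference, here are the quantities behind these bullets as defined in the DAGMA paper for the linear least-squares case. Here n is the sample size, d the number of variables, ∘ the elementwise product, ρ the spectral radius, and α the factor by which μ shrinks. Reading λ₁ as Tetrad's lambda1 follows the parameter table below, and the value Tetrad uses for α is not stated here. Each subproblem is solved with Adam, starting from the previous solution W^(t).

```latex
\begin{aligned}
h^{s}(W) &= -\log\det\left(sI - W \circ W\right) + d \log s,
  \qquad W \in \mathbb{W}^{s} = \{\, W : \rho(W \circ W) < s \,\},\\
\nabla h^{s}(W) &= 2\left(sI - W \circ W\right)^{-\top} \circ W,\\
Q(W; X) &= \frac{1}{2n}\,\lVert X - XW \rVert_F^{2} + \lambda_1 \lVert W \rVert_1,\\
W^{(t+1)} &= \operatorname*{arg\,min}_{W \in \mathbb{W}^{s^{(t)}}}\;
  \mu^{(t)}\, Q(W; X) + h^{s^{(t)}}(W),
  \qquad \mu^{(t+1)} = \alpha\, \mu^{(t)},\quad 0 < \alpha < 1.
\end{aligned}
```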

The result is a continuous weight matrix W. Entries with small absolute value are set to zero, and the resulting DAG can optionally be converted to a CPDAG using Meek's orientation rules.
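The following minimal NumPy sketch illustrates the acyclicity function and the thresholding step. It is not Tetrad's code. The helper names h_logdet, threshold, and is_dag are hypothetical, the 0.3 cutoff is only an illustrative value, and the "optimizer output" W_est is made up. It shows that h^s is zero for a DAG and positive for a cycle, and that pruning small weights can remove spurious cycles.

```python
import numpy as np


def h_logdet(W, s=1.0):
    """Log-determinant acyclicity function h^s(W) = -log det(sI - W*W) + d log s.

    Defined only where sI - W*W is an M-matrix, i.e. s > spectral radius of W*W;
    there h^s(W) >= 0, with equality iff W is the weighted adjacency matrix of a DAG.
    """
    d = W.shape[0]
    B = W * W  # elementwise (Hadamard) square, always nonnegative
    if np.max(np.abs(np.linalg.eigvals(B))) >= s:
        raise ValueError("outside the domain: s must exceed the spectral radius of W*W")
    _, logdet = np.linalg.slogdet(s * np.eye(d) - B)
    return -logdet + d * np.log(s)


def threshold(W, w_threshold=0.3):
    """Return a copy of W with entries of absolute value below w_threshold set to zero."""
    W = W.copy()
    W[np.abs(W) < w_threshold] = 0.0
    return W


def is_dag(W):
    """A d-node graph is acyclic iff its 0/1 adjacency matrix A satisfies A^d = 0."""
    A = (W != 0).astype(float)
    return not np.any(np.linalg.matrix_power(A, A.shape[0]))


# A chain 0 -> 1 -> 2 versus a 2-cycle 0 <-> 1.
W_dag = np.array([[0.0, 0.8, 0.0], [0.0, 0.0, -0.5], [0.0, 0.0, 0.0]])
W_cyc = np.array([[0.0, 0.5], [0.5, 0.0]])
print(h_logdet(W_dag))  # zero up to rounding
print(h_logdet(W_cyc))  # strictly positive

# A made-up "optimizer output" whose small spurious weights create cycles.
W_est = np.array([[0.0, 0.9, 0.05], [0.02, 0.0, -0.7], [0.1, 0.0, 0.0]])
W_pruned = threshold(W_est, w_threshold=0.3)
print(is_dag(W_est), is_dag(W_pruned))  # False True
```

The domain check uses the spectral radius directly because a positive determinant alone does not guarantee that sI − W∘W is an M-matrix.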


9.2. When to Use

Use DAGMA when:

  • You want a purely score-based, continuous optimization method for DAG learning.

  • Data are continuous, the sample size N is reasonably large, and relationships are roughly linear (with Gaussian or non-Gaussian noise).

  • You want a single DAG as output rather than a CPDAG.

  • You want an alternative to NOTEARS, GraNDAG, GOLEM, or BOSS for DAG learning from continuous data.

Avoid DAGMA when:

  • You need latent-variable handling.

  • You need strict knowledge constraints (forbidden/required edges); DAGMA does not support these.

  • Data are strongly nonlinear or heavy-tailed (FASK, DirectLiNGAM, or nonlinear algorithms may be preferable).


9.3. Prior Knowledge Support

Does DAGMA accept background knowledge?
No.
The current implementation in Tetrad does not honor:

  • forbidden edges,

  • required edges,

  • tier/temporal constraints, or

  • structural priors.


9.4. Strengths

  • Continuous optimization → fast for moderate dimensionality.

  • Exact acyclicity: the log-determinant function is zero exactly when W is acyclic, rather than approximately.

  • Often produces clean, sparse DAGs.

  • Uses no CI tests, so it is a useful option when CI tests are unreliable.


9.5. Limitations

  • No support for background knowledge.

  • Results depend on tuning several parameters, including the sparsity penalty (lambda1), the weight threshold (wThreshold), and optimizer settings.

  • Sensitive to covariance estimation; works best with large N.

  • Optimization can fail or slow down for high-dimensional, noisy datasets.
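One reason for the sensitivity to covariance estimation is that the least-squares score depends on the data only through the sample covariance. (1/2n)‖X − XW‖²_F equals ½·tr((I − W)ᵀ Σ̂ (I − W)), where Σ̂ = XᵀX/n. The sketch below checks this identity numerically. It assumes column-centered data and a 1/n covariance normalization, and the function names and simulated data are hypothetical illustrations, not Tetrad code.

```python
import numpy as np


def ls_score_from_data(W, X, lambda1):
    """Least-squares score (1/2n) * ||Xc - Xc W||_F^2 + lambda1 * ||W||_1, Xc = centered X."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    R = Xc - Xc @ W
    return 0.5 / n * np.sum(R ** 2) + lambda1 * np.abs(W).sum()


def ls_score_from_cov(W, cov, lambda1):
    """The same score via the 1/n sample covariance: 0.5 * tr((I - W)^T cov (I - W))."""
    I_minus_W = np.eye(W.shape[0]) - W
    return 0.5 * np.trace(I_minus_W.T @ cov @ I_minus_W) + lambda1 * np.abs(W).sum()


# Simulated linear SEM x0 -> x1 (illustrative data only).
rng = np.random.default_rng(0)
n = 500
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
X = np.column_stack([x0, x1])

cov = np.cov(X, rowvar=False, bias=True)  # bias=True divides by n, matching 1/(2n)
W = np.array([[0.0, 0.8], [0.0, 0.0]])
print(ls_score_from_data(W, X, lambda1=0.1))
print(ls_score_from_cov(W, cov, lambda1=0.1))  # same value up to rounding
```

Any noise in Σ̂ from a small N therefore feeds directly into every evaluation of the objective.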


9.6. Key Parameters in Tetrad

Parameter (camelCase)   Description
lambda1                 L1 sparsity penalty on edge weights.
wThreshold              Initial threshold for pruning small weights.
cpdag                   If true, output a CPDAG; otherwise output a DAG.
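When cpdag is true, the thresholded DAG is replaced by a representation of its Markov equivalence class. Tetrad's conversion code is not shown here, so the following is a generic textbook sketch. It keeps the skeleton, orients v-structures, and then applies Meek rules R1-R3 until no edge changes (R4 is only needed with background knowledge). The function name dag_to_cpdag and the (directed, undirected) return format are hypothetical.

```python
def dag_to_cpdag(adj):
    """Convert a DAG (adj[i][j] == 1 means i -> j) to its CPDAG.

    Returns (directed, undirected): a set of ordered pairs (i, j) for i -> j and a
    set of frozensets {i, j} for i -- j.
    """
    d = len(adj)

    def adjacent(a, b):
        return bool(adj[a][b] or adj[b][a])

    # 1. Keep edges in v-structures a -> z <- b with a and b nonadjacent.
    directed = set()
    for z in range(d):
        parents = [i for i in range(d) if adj[i][z]]
        for a in parents:
            for b in parents:
                if a < b and not adjacent(a, b):
                    directed.update({(a, z), (b, z)})
    # 2. Every other skeleton edge starts out undirected.
    undirected = {frozenset((i, j)) for i in range(d) for j in range(d)
                  if adj[i][j] and (i, j) not in directed}

    def meek_orients(x, y):
        """True if Meek rule R1, R2, or R3 forces the undirected edge x -- y to x -> y."""
        others = [w for w in range(d) if w not in (x, y)]
        # R1: w -> x with w not adjacent to y (otherwise a new v-structure appears).
        if any((w, x) in directed and not adjacent(w, y) for w in others):
            return True
        # R2: x -> w -> y, so y -> x would create a cycle.
        if any((x, w) in directed and (w, y) in directed for w in others):
            return True
        # R3: x -- w1 -> y and x -- w2 -> y with w1, w2 nonadjacent.
        ws = [w for w in others if frozenset((x, w)) in undirected and (w, y) in directed]
        return any(not adjacent(w1, w2) for w1 in ws for w2 in ws if w1 < w2)

    # 3. Apply the rules until a full pass changes nothing.
    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            a, b = tuple(edge)
            for x, y in ((a, b), (b, a)):
                if edge in undirected and meek_orients(x, y):
                    undirected.discard(edge)
                    directed.add((x, y))
                    changed = True
    return directed, undirected


# Chain 0 -> 1 -> 2: no v-structure, so both edges stay undirected.
print(dag_to_cpdag([[0, 1, 0], [0, 0, 1], [0, 0, 0]]))
# Collider 0 -> 2 <- 1 plus 2 -> 3: all three edges are compelled (2 -> 3 by R1).
print(dag_to_cpdag([[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]))
```

Because the rules are sound, every edge they orient agrees with the input DAG. Edges left undirected are those that can be reversed in some other DAG of the same equivalence class.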


9.7. Reference

Bello, K., Aragam, B., & Ravikumar, P. (2022).
DAGMA: Learning DAGs via M-Matrices and a Log-Determinant Acyclicity Characterization.
NeurIPS 2022, 35, 8226–8239.


9.8. Summary

DAGMA is a smooth, score-based DAG learning algorithm enforcing exact acyclicity using M-matrix log-determinant constraints. It produces clean DAGs without CI tests but does not support knowledge constraints in Tetrad.