# DAGMA — Learning DAGs via M-Matrices and Log-Determinant Acyclicity

**Type:** Score-based, continuous optimization  
**Output:** DAG or CPDAG (depending on settings)

**DAGMA** (Bello, Aragam & Ravikumar 2022) is a continuous optimization method for learning **directed acyclic graphs** using a smooth, differentiable characterization of acyclicity based on **M-matrices** and **log-determinant penalties**. DAGMA directly optimizes a penalized likelihood objective under an exact acyclicity constraint, producing a weighted adjacency matrix whose thresholded structure represents a DAG. Tetrad’s implementation follows the original optimization loop closely and provides an option to convert the learned DAG into a CPDAG.

---

## Key Idea

DAGMA replaces the combinatorial acyclicity constraint with a **log-determinant characterization**:

- A matrix corresponds to a DAG iff a certain **M-matrix** constructed from it has **positive diagonal and nonpositive off-diagonals** and satisfies log-determinant conditions.
- DAGMA optimizes:
  - a least-squares likelihood term,
  - an L1 sparsity penalty,
  - plus a smooth acyclicity penalty.
- Optimization uses **ADAM** with continuation over:
  - decreasing central-path parameter μ,
  - a sequence of increasing s-values defining different M-matrices.

The result is a continuous weight matrix W that is then thresholded and optionally closed under Meek rules.

---

## When to Use

Use DAGMA when:

- You want a **purely score-based**, continuous optimization method for DAG learning.
- Data are **continuous**, reasonably large N, and roughly linear-Gaussian or linear-non-Gaussian.
- You prefer a **DAG** rather than CPDAG output.
- You want an alternative to NOTEARS, GraNDAG, GOLEM, or BOSS for continuous DAG learning.

Avoid DAGMA when:

- You need **latent-variable** handling.
- You need strict **knowledge constraints** (forbidden/required edges) — DAGMA does *not* support these.
- Data are strongly nonlinear or heavy-tailed (FASK, DirectLiNGAM, or nonlinear algorithms may be preferable).

---

## Prior Knowledge Support

**Does DAGMA accept background knowledge?**  
**No.**  
The current implementation in Tetrad **does not** honor:

- forbidden edges,
- required edges,
- tier/temporal constraints, or
- structural priors.

---

## Strengths

- Continuous optimization → **fast** for moderate dimensionality.
- Exact acyclicity.
- Often produces **clean, sparse DAGs**.
- No need for CI tests — works well when CI tests are unreliable.

---

## Limitations

- **No support for background knowledge**.
- Requires tuning of several optimization parameters.
- Sensitive to covariance estimation; works best with large N.
- Optimization can fail or slow down for high-dimensional, noisy datasets.

---

## Key Parameters in Tetrad

| Parameter (camelCase) | Description |
|------------------------|-------------|
| `lambda1` | L1 sparsity penalty on edge weights. |
| `wThreshold` | Initial threshold for pruning small weights. |
| `cpdag` | Output CPDAG if true; otherwise return DAG. |

---

## Reference

Bello, K., Aragam, B., & Ravikumar, P. (2022).  
**DAGMA: Learning DAGs via M-Matrices and a Log-Determinant Acyclicity Characterization.**  
*NeurIPS 2022*, 35, 8226–8239.

---

## Summary

DAGMA is a smooth, score-based DAG learning algorithm enforcing exact acyclicity using M-matrix log-determinant constraints. It produces clean DAGs without CI tests but does **not** support knowledge constraints in Tetrad.