6. CD-NOD — Causal Discovery from Nonstationary / Distribution-Shifted Data

Type: Constraint-based, distribution-shift aware
Output: CPDAG over observed variables plus a continuous change-index C

CD-NOD is a PC-style causal discovery algorithm for data with nonstationarity or distribution shifts.
It assumes you have a continuous change-index variable C (for example time, a domain index, or an ordering of environments) and uses it, together with conditional independence tests, to learn a CPDAG over the measured variables and C. This Tetrad implementation translates the CD-NOD idea (Zhang et al., 2017) and the corresponding implementation in the causal-learn project into the PC/FAS + Meek rules framework.

6.1. Key Idea

The core idea is:

Treat a continuous change index C as an exogenous driver of distributional changes and look for stable conditional independences in the joint distribution over (X, C).

Internally, CD-NOD:

Builds a skeleton with FAS (PC-style)
- Runs FAS on all variables including C, using a user-supplied IndependenceTest.
- Stores separating sets (sepsets) for later collider decisions.
- Optionally uses the stable variant of FAS.
Orients C–X adjacencies as C → X
- For any adjacency between C and a variable X, CD-NOD orients it as C → X, unless:
  - that direction is forbidden by knowledge, or
  - the opposite direction (X → C) is required by knowledge.
Orients unshielded colliders with a chosen style
- For each unshielded triple X–Z–Y (with X and Y non-adjacent), CD-NOD can use:
  - SEPSETS: standard PC rule based on stored separating sets;
  - CONSERVATIVE: CPC-style logic (requires consistent evidence);
  - MAX_P: compares best p-values for sepsets including vs excluding the middle node Z and orients according to the stronger side.
Applies Meek rules
- Runs Meek rules (with Knowledge) to propagate orientations and close under standard CPDAG implications.
- The final output is a CPDAG over the X variables plus C.
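
The orientation steps above can be sketched on a toy skeleton. This is a minimal illustration in plain Python, not Tetrad's code: the function names, the dictionary-based graph, and the example sepsets are all assumptions made for the sketch, and the skeleton is taken as already given rather than learned by FAS.

```python
from itertools import combinations

def orient_change_index_edges(adjacent_to_c, forbidden=frozenset(),
                              required=frozenset(), c="C"):
    """Orient each C - X adjacency as C -> X unless knowledge blocks it.

    forbidden / required are sets of (tail, head) pairs (hypothetical format).
    """
    directed = set()
    for x in sorted(adjacent_to_c):
        if (c, x) in forbidden or (x, c) in required:
            continue  # leave this adjacency unoriented in this step
        directed.add((c, x))
    return directed

def orient_colliders_sepsets(adjacency, sepsets):
    """SEPSETS rule: X -> Z <- Y if Z is not in the stored sepset of (X, Y).

    adjacency: dict node -> set of neighbours in the undirected skeleton.
    sepsets:   dict frozenset({x, y}) -> set that separated x and y.
    """
    colliders = []
    for z in sorted(adjacency):
        for x, y in combinations(sorted(adjacency[z]), 2):
            if y in adjacency[x]:
                continue  # shielded triple, not a collider candidate
            if z not in sepsets.get(frozenset((x, y)), set()):
                colliders.append((x, z, y))
    return colliders

def max_p_is_collider(p_with_z, p_without_z):
    """MAX_P-style decision: collider if the best p-value comes from a
    separating set that excludes the middle node Z."""
    return max(p_without_z, default=0.0) > max(p_with_z, default=0.0)

# Toy skeleton: C - X1 - X3 - X2
adjacency = {"C": {"X1"}, "X1": {"C", "X3"}, "X3": {"X1", "X2"}, "X2": {"X3"}}
sepsets = {frozenset(("C", "X3")): {"X1"}, frozenset(("X1", "X2")): set()}

c_edges = orient_change_index_edges(adjacency["C"])
colliders = orient_colliders_sepsets(adjacency, sepsets)
print(c_edges, colliders)
```

In this toy case C–X1 becomes C → X1, the triple C–X1–X3 is left alone because X1 is in the sepset of (C, X3), and X1–X3–X2 is oriented as a collider at X3. Meek rules would then propagate further orientations; they are omitted here.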

6.2. When to Use

Use CD-NOD when:

You have nonstationary or heterogeneous data and a known change index C, such as:
- time (trend or slow drift),
- an environment index (domains, sites, regimes),
- a known ordering of batches or conditions.
You want a PC-style CPDAG that accounts for distribution shifts, rather than assuming i.i.d. data.
You have (or can define) a continuous C and a suitable conditional independence test over (X, C).
You want more robust collider decisions around nonstationary structure using CPC-style or MAX_P rules.

Related algorithms:

PC / CPC / PC-Max: the same constraint-based approach, but they assume stationary data and have no special change-index variable C.

6.3. Prior Knowledge Support

Does it accept background knowledge?
Yes.

CD-NOD uses Tetrad’s Knowledge in several places:

Skeleton phase: passes Knowledge into FAS to constrain allowed adjacencies.
C → X orientation: respects forbidden/required edges and tiering when deciding whether to orient C → X.
Collider orientation: checks Knowledge before orienting X → Z ← Y.
Meek rules: runs with Knowledge, so implied orientations also respect required/forbidden edges and tiers.

You can therefore enforce:

Required edges (the edge X → Y must be present),
Forbidden edges (the edge X → Y must not be present),
Tier/temporal constraints (edges must go forward in time / tier).
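
To make the three kinds of constraint concrete, here is a small sketch of how an orientation check against background knowledge might look. `ToyKnowledge` is a hypothetical stand-in written for this example; it is not Tetrad's Knowledge class and does not mirror its API.

```python
from dataclasses import dataclass, field

@dataclass
class ToyKnowledge:
    """Hypothetical stand-in for background knowledge (not Tetrad's API)."""
    forbidden: set = field(default_factory=set)  # (tail, head) pairs
    required: set = field(default_factory=set)   # (tail, head) pairs
    tier: dict = field(default_factory=dict)     # node -> tier number

    def allows(self, tail, head):
        """Return True if orienting tail -> head violates no constraint."""
        if (tail, head) in self.forbidden:
            return False
        if (head, tail) in self.required:
            return False  # the opposite direction is required
        t_tail, t_head = self.tier.get(tail), self.tier.get(head)
        if t_tail is not None and t_head is not None and t_tail > t_head:
            return False  # edges may not point from a later tier to an earlier one
        return True

knowledge = ToyKnowledge(
    forbidden={("X1", "X2")},
    required={("X3", "X2")},
    tier={"C": 0, "X1": 1, "X2": 1, "X3": 2},
)
print(knowledge.allows("C", "X1"))   # C is in the earliest tier
print(knowledge.allows("X3", "X1"))  # would point backward across tiers
```

Placing C in the earliest tier, as in this example, is one way to state that no measured variable may cause the change index.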

6.4. Strengths

Explicitly models distribution shift via C
- Makes CD-NOD ideas available in a PC-style CPDAG framework.
Flexible collider orientation
- Choice of SEPSETS, CONSERVATIVE (CPC), or MAX_P collider logic.
Integrates with standard Tetrad components
- Uses FAS, Meek rules, Knowledge, and IndependenceTest in a familiar way.
Supports timeouts and depth caps
- More controllable in large or high-dimensional problems.

6.5. Limitations

Requires a valid change index
- You must provide a meaningful continuous C; if C is arbitrary or noisy, the C → X edges and the orientations derived from them may be unreliable.
CPDAG only; no PAG semantics
- This implementation does not represent latent confounding or selection bias explicitly.
Same sensitivity to CI-test errors as PC
- Mis-specified tests or small samples can lead to missing or spurious edges, and to ambiguous collider decisions.
Assumes C is the last column
- When you provide dataWithC directly, the last column must be the change index C.
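
As an illustration of the last-column convention, the sketch below appends a change index to a row-oriented dataset. The helper name `append_change_index` and the default of using row order as a time index are assumptions for this example; use whatever index actually tracks the distribution shift in your data.

```python
def append_change_index(rows, change_index=None):
    """Return a copy of rows with the change index C as the last column.

    rows:         list of numeric rows (one list per sample).
    change_index: one value per row; defaults to the row position,
                  which assumes the rows are already in time order.
    """
    n = len(rows)
    if change_index is None:
        change_index = [float(i) for i in range(n)]
    if len(change_index) != n:
        raise ValueError("change_index must have one value per row")
    return [list(row) + [float(c)] for row, c in zip(rows, change_index)]

data = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
data_with_c = append_change_index(data)
print(data_with_c)
```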

6.6. Key Parameters in Tetrad / Scripting

CD-NOD is typically constructed via its Builder in code, or wrapped by a higher-level Tetrad algorithm.
The main knobs are:

Parameter (camelCase)	Description
`stableFas`	Boolean. If `true`, uses the stable version of FAS for skeleton discovery (order-independent).
`colliderOrientationStyle`	Strategy for orienting colliders. Options include `SEPSETS`, `CONSERVATIVE`, or `MAX_P` depending on how aggressively to orient ambiguous triples.
`depth`	Maximum conditioning-set size for both FAS and collider detection. Use `-1` for unlimited depth.
`fdrQ`	False discovery rate threshold `q` for FDR-controlled independence testing. Used only when FDR is enabled.
`verbose`	If `true`, prints detailed diagnostic output for FAS, collider discovery, and orientation propagation.
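
Two of these parameters are easier to read with a small example. The sketch below shows what a `depth` cap means for the conditioning sets a PC-style search enumerates, and what an FDR threshold `q` does to a batch of p-values. It assumes a Benjamini-Hochberg step-up procedure for the FDR part; the text above does not say which FDR procedure the implementation uses, and both function names are hypothetical.

```python
from itertools import combinations

def conditioning_sets(candidates, depth):
    """Yield conditioning sets of size 0..depth; depth -1 means no cap."""
    pool = sorted(candidates)
    max_size = len(pool) if depth < 0 else min(depth, len(pool))
    for size in range(max_size + 1):
        for subset in combinations(pool, size):
            yield set(subset)

def benjamini_hochberg(p_values, q):
    """Return a list of booleans: True where the null is rejected at FDR q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[i] = True
    return rejected

sets_depth_1 = list(conditioning_sets({"A", "B", "C"}, depth=1))
sets_unlimited = list(conditioning_sets({"A", "B", "C"}, depth=-1))
rejections = benjamini_hochberg([0.01, 0.5, 0.02, 0.03], q=0.05)
print(len(sets_depth_1), len(sets_unlimited), rejections)
```

With three candidate variables, a depth of 1 gives four conditioning sets (the empty set and three singletons), while unlimited depth gives all eight subsets, which is why a depth cap matters in larger problems.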

6.7. Reference

The algorithmic idea is based on:

Zhang, K., Huang, B., Zhang, J., Glymour, C., & Schölkopf, B. (2017).
Causal discovery from nonstationary/heterogeneous data: Skeleton estimation and orientation determination.
In Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI).

This Tetrad implementation is an adaptation of the CD-NOD procedure, and closely follows the implementation available in the causal-learn project, re-expressed in a PC/FAS + Meek rules framework to produce a CPDAG.

6.8. Summary

CD-NOD is a PC-style constraint-based algorithm for nonstationary or distribution-shifted data. It treats a continuous change index C as an exogenous driver of changes and returns a CPDAG over X and C, with a choice of collider-orientation rules (SEPSETS, CONSERVATIVE, MAX_P) and full support for Tetrad’s background knowledge.