6. CD-NOD β€” Causal Discovery from Nonstationary / Distribution-Shifted Data

Type: Constraint-based, distribution-shift aware
Output: CPDAG over observed variables plus a continuous change-index C

CD-NOD is a PC-style causal discovery algorithm for data with nonstationarity or distribution shifts.
It assumes you have a continuous change-index variable C (for example time, domain index, or an ordering of environments) and uses this, together with conditional independence tests, to learn a CPDAG over the measured variables and C. This Tetrad implementation is a translation of the CD-NOD idea (Zhang et al.) and of the corresponding implementation in the causal-learn project, adapted to the PC/FAS + Meek rules framework.


6.1. Key Idea

The core idea is:

Treat a continuous change index C as an exogenous driver of distributional changes and look for stable conditional independences in the joint distribution over (X, C).

Internally, CD-NOD:

  1. Builds a skeleton with FAS (PC-style)

    • Runs FAS on all variables including C, using a user-supplied IndependenceTest.

    • Stores separating sets (sepsets) for later collider decisions.

    • Optionally uses the stable variant of FAS.

  2. Forces C β†’ X adjacencies

    • For any adjacency between C and a variable X, CD-NOD orients it as C β†’ X, unless:

      • that direction is forbidden by knowledge, or

      • the opposite direction is required.

  3. Orients unshielded colliders with a chosen style

    • For each unshielded triple X–Z–Y (with X and Y non-adjacent), CD-NOD can use:

      • SEPSETS: standard PC rule based on stored separating sets;

      • CONSERVATIVE: CPC-style logic (requires consistent evidence);

      • MAX_P: compares best p-values for sepsets including vs excluding the middle node Z and orients according to the stronger side.

  4. Applies Meek rules

    • Runs Meek rules (with Knowledge) to propagate orientations and close under standard CPDAG implications.

    • The final output is a CPDAG over the X variables plus C.


6.2. When to Use

Use Cdnod when:

  • You have nonstationary or heterogeneous data and a known change index C, such as:

    • time (trend or slow drift),

    • an environment index (domains, sites, regimes),

    • a known ordering of batches or conditions.

  • You want a PC-style CPDAG that accounts for distribution shifts, rather than assuming i.i.d. data.

  • You have (or can define) a continuous C and a suitable conditional independence test over (X, C).

  • You want more robust collider decisions around nonstationary structure using CPC-style or MAX_P rules.

Related algorithms:

  • PC / CPC / PC-Max: same overall flavor, but assume stationarity, no special C.


6.3. Prior Knowledge Support

Does it accept background knowledge?
Yes.

CD-NOD uses Tetrad’s Knowledge in several places:

  • Skeleton phase: passes Knowledge into FAS to constrain allowed adjacencies.

  • C β†’ X orientation: respects forbidden/required edges and tiering when deciding whether to orient C β†’ X.

  • Collider orientation: checks Knowledge before orienting X β†’ Z ← Y.

  • Meek rules: runs with Knowledge, so implied orientations also respect required/forbidden edges and tiers.

You can therefore enforce:

  • Required edges (X must cause Y),

  • Forbidden edges (X must not cause Y),

  • Tier/temporal constraints (edges must go forward in time / tier).


6.4. Strengths

  • Explicitly models distribution shift via C

    • Makes CD-NOD ideas available in a PC-style CPDAG framework.

  • Flexible collider orientation

    • Choice of SEPSETS, CONSERVATIVE (CPC), or MAX_P collider logic.

  • Integrates with standard Tetrad components

    • Uses FAS, Meek rules, Knowledge, and IndependenceTest in a familiar way.

  • Supports timeouts and depth caps

    • More controllable in large or high-dimensional problems.


6.5. Limitations

  • Requires a valid change index

    • You must provide a meaningful continuous C; if C is arbitrary or noisy, results may degrade.

  • CPDAG only; no PAG semantics

    • This implementation does not represent latent confounding or selection bias explicitly.

  • Same sensitivity to CI-test errors as PC

    • Mis-specified tests or small samples can lead to missing or spurious edges, and to ambiguous collider decisions.

  • Assumes C is the last column

    • When you provide dataWithC directly, the last column must be the change index C.


6.6. Key Parameters in Tetrad / Scripting

CD-NOD is typically constructed via its Builder in code, or wrapped by a higher-level Tetrad algorithm.
The main knobs are:

Parameter (camelCase)

Description

stableFas

Boolean. If true, uses the stable version of FAS for skeleton discovery (order-independent).

colliderOrientationStyle

Strategy for orienting colliders. Options include SEPSETS, CONSERVATIVE, or MAX_P depending on how aggressively to orient ambiguous triples.

depth

Maximum conditioning-set size for both FAS and collider detection. Use -1 for unlimited depth.

fdrQ

False discovery rate threshold q for FDR-controlled independence testing. Used only when FDR is enabled.

verbose

If true, prints detailed diagnostic output for FAS, collider discovery, and orientation propagation.


6.7. Reference

The algorithmic idea is based on:

Zhang, K., Huang, B., Zhang, J., Glymour, C., & SchΓΆlkopf, B. (2017).
Causal discovery from nonstationary/heterogeneous data: Causal invariance and CD-NOD.
In Proceedings of the 31st Conference on Neural Information Processing Systems (NeurIPS).

This Tetrad implementation is an adaptation of the CD-NOD procedure, and closely follows the implementation available in the causal-learn project, re-expressed in a PC/FAS + Meek rules framework to produce a CPDAG.


6.8. Summary

CD-NOD is a PC-style constraint-based algorithm for nonstationary or distribution-shifted data, treating a continuous change index C as an exogenous driver of changes and returning a CPDAG over X and C using CD-NOD-inspired collider decisions, with full support for Tetrad’s background knowledge and tools.