37. PC — Peter–Clark Algorithm

Type: Constraint-based
Output: CPDAG

PC is the classic constraint-based causal discovery algorithm given in Causation, Prediction, and Search. It learns a CPDAG (the Markov equivalence class of DAGs) under the assumptions of acyclicity, causal sufficiency, and the Causal Markov + Faithfulness conditions. PC proceeds by systematically removing edges using conditional independence (CI) tests, then orienting v-structures and propagating orientations while avoiding new unshielded colliders.


37.1. Key Idea

PC operates in two stages:

  1. Adjacency search (FAS phase)
    Begin with a fully connected undirected graph and iteratively remove edges whenever variables are found conditionally independent given some subset of their neighbors.

  2. Orientation phase
    Using the separating sets from the adjacency search:

    • Orient unshielded colliders (X Y Z where X–Z are nonadjacent and Y not in the separating set),

    • Apply orientation propagation rules (Meek’s rules) to complete the CPDAG without introducing new v-structures or directed cycles.

This yields a sound and complete CPDAG for oracle CI tests.


37.2. When to Use

  • Data are plausibly generated by a DAG without latent confounders.

  • You want explicit statistical control via α.

  • The CI test used (Gaussian, G², RCIT, etc.) matches the data type.

  • You want a lightweight and widely understood baseline for comparison.

PC is also used as the adjacency phase of many other algorithms (e.g., FCI, PC-Max, CCD).


37.3. Prior Knowledge Support

PC fully supports background knowledge.

You may supply:

  • Required edges

  • Forbidden edges

  • Temporal/tier constraints

  • Other standard Knowledge constraints

These constraints are respected during both adjacency removal and orientation propagation.


37.4. Strengths

  • Highly interpretable — every edge removal justified by a CI test

  • Produces a clean CPDAG with explicit orientation logic

  • Works with many types of CI tests (parametric, nonparametric, hybrid)

  • Often very fast, especially with sparse graphs

  • Serves as the foundation for many advanced algorithms (FCI, GFCI, BOSS-FCI, FCIT)


37.5. Limitations

  • The theoretical guarantee of correctness requires the data to be i.i.d. and faithful to a causally sufficient DAG.

  • Sensitive to CI test errors when conditioning sets become large.

  • May leave many edges unoriented, especially in dense graphs or with weak dependencies.

  • Cannot by itself distinguish latent confounding from direct causation (FCI and its variants address this).


37.6. Key Parameters in Tetrad

The PC wrapper exposes the following parameters (camelCase names shown):

Parameter (camelCase)

Description

stableFas

If true, uses the stable adjacency phase (order-independent edge removals).

colliderOrientationStyle

Chooses the style of collider identification/orientation (PC-standard, conservative, etc.).

allowBidirected

Allows temporary bidirected edges during orientation.

depth

Maximum size of conditioning sets during adjacency search. (-1 = unlimited)

fdrQ

Use FDR correction at level Q instead of fixed α (optional).

timeLag

Lag τ when applying PC to time-series or lagged datasets.

timeLagReplicatingGraph

Whether lagged graphs replicate structure across time slices.

verbose

Print detailed CI-test and orientation decisions.

These allow fine control over complexity, reproducibility, and collider logic.

37.7. Historical Notes

The PC algorithm was one of the original “Tetrad” algorithms and has been implemented and studied widely. Its development unfolded across three major milestones:

  1. Spirtes & Glymour (1991)
    “An Algorithm for Fast Recovery of Sparse Causal Graphs”
    Introduced the first version of the algorithm and established the key insight that conditional independence information can efficiently prune a fully connected graph under sparsity assumptions.

  2. Spirtes, Glymour & Scheines (1993)
    Causation, Prediction, and Search (1st edition)
    Refined and extended the algorithm, with clearer statements of assumptions, correctness proofs, and the introduction of what later became known as the FAS (Fast Adjacency Search) phase.

  3. Spirtes, Glymour & Scheines (2000)
    Causation, Prediction, and Search (2nd edition)
    Formalized PC in its modern form. Meek’s 1995 orientation rules were incorporated, establishing the now-standard procedure of
    (i) adjacency pruning via conditional independence, followed by
    (ii) propagation of orientations via sound and complete rules.

37.7.1. Why PC is historically important

  • First scalable causal graph algorithm: PC demonstrated that conditional independence tests, together with sparsity, allowed structure discovery in time that is polynomial for sparse graphs.

  • Introduced the idea of equivalence classes: PC made CPDAGs the standard representation for causal structures under Markov and faithfulness assumptions.

  • Inspired nearly every major causal algorithm since: FCI, RFCI, GFCI, PC-Max, CCD, and many others are direct conceptual descendants of PC.

Today PC remains:

  • a baseline that all new algorithms are compared against,

  • a teaching tool for causal inference fundamentals, and

  • a practical choice in many real-world settings.

The Tetrad implementation continues this lineage, with additional improvements (parallelization, stable FAS, PC-Max, skewness extensions, etc.) that preserve the spirit of the original algorithm while adapting it to modern data sizes.


37.8. Additional Reference

Meek, C. (1995). Causal inference and construction of explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI-95), pp. 403-411. Morgan Kaufmann Publishers.


37.9. Summary

PC is the foundational constraint-based algorithm for learning CPDAGs, combining systematic CI testing with principled orientation rules. It remains a gold-standard baseline for settings without latent confounders.