# PC — Peter–Clark Algorithm

**Type:** Constraint-based  
**Output:** CPDAG

PC is the classic constraint-based causal discovery algorithm given in *Causation, Prediction, and Search*. It learns a **CPDAG** (the Markov equivalence class of DAGs) under the assumptions of acyclicity, causal sufficiency, and the Causal Markov + Faithfulness conditions. PC proceeds by systematically removing edges using conditional independence (CI) tests, then orienting v-structures and propagating orientations while avoiding new unshielded colliders.

---

## Key Idea

PC operates in two stages:

1. **Adjacency search (FAS phase)**  
   Begin with a fully connected undirected graph and iteratively remove edges whenever variables are found conditionally independent given some subset of their neighbors.

2. **Orientation phase**  
   Using the separating sets from the adjacency search:
    - Orient **unshielded colliders** (`X → Y ← Z` where X–Z are nonadjacent and Y not in the separating set),
    - Apply orientation propagation rules (Meek’s rules) to complete the CPDAG without introducing new v-structures or directed cycles.

This yields a **sound and complete CPDAG** for oracle CI tests.

---

## When to Use

- Data are plausibly generated by a **DAG without latent confounders**.
- You want explicit **statistical control via α**.
- The CI test used (Gaussian, G², RCIT, etc.) matches the data type.
- You want a **lightweight** and widely understood baseline for comparison.

PC is also used as the adjacency phase of many other algorithms (e.g., FCI, PC-Max, CCD).

---

## Prior Knowledge Support

**PC fully supports background knowledge.**

You may supply:

- **Required edges**
- **Forbidden edges**
- **Temporal/tier constraints**
- Other standard `Knowledge` constraints

These constraints are respected during both adjacency removal and orientation propagation.

---

## Strengths

- Highly interpretable — every edge removal justified by a CI test
- Produces a **clean CPDAG** with explicit orientation logic
- Works with many types of CI tests (parametric, nonparametric, hybrid)
- Often very fast, especially with sparse graphs
- Serves as the foundation for many advanced algorithms (FCI, GFCI, BOSS-FCI, FCIT)

---

## Limitations

- The **theoretical guarantee** of correctness requires the data to be i.i.d. and faithful to a causally sufficient DAG.
- Sensitive to CI test errors when conditioning sets become large.
- May leave many edges unoriented, especially in dense graphs or with weak dependencies.
- Cannot by itself distinguish latent confounding from direct causation (FCI and its variants address this).

---

## Key Parameters in Tetrad

The PC wrapper exposes the following parameters (camelCase names shown):

| Parameter (camelCase) | Description |
|------------------------|-------------|
| `stableFas` | If true, uses the *stable* adjacency phase (order-independent edge removals). |
| `colliderOrientationStyle` | Chooses the style of collider identification/orientation (PC-standard, conservative, etc.). |
| `allowBidirected` | Allows temporary bidirected edges during orientation. |
| `depth` | Maximum size of conditioning sets during adjacency search. (`-1` = unlimited) |
| `fdrQ` | Use FDR correction at level Q instead of fixed α (optional). |
| `timeLag` | Lag τ when applying PC to time-series or lagged datasets. |
| `timeLagReplicatingGraph` | Whether lagged graphs replicate structure across time slices. |
| `verbose` | Print detailed CI-test and orientation decisions. |

These allow fine control over complexity, reproducibility, and collider logic.

## Historical Notes

The **PC algorithm** was one of the original “Tetrad” algorithms and has been implemented and studied widely. Its development unfolded across three major milestones:

1. **Spirtes & Glymour (1991)**  
   *“An Algorithm for Fast Recovery of Sparse Causal Graphs”*  
   Introduced the first version of the algorithm and established the key insight that conditional independence information can efficiently prune a fully connected graph under sparsity assumptions.

2. **Spirtes, Glymour & Scheines (1993)**  
   *Causation, Prediction, and Search (1st edition)*  
   Refined and extended the algorithm, with clearer statements of assumptions, correctness proofs, and the introduction of what later became known as the **FAS** (Fast Adjacency Search) phase.

3. **Spirtes, Glymour & Scheines (2000)**  
   *Causation, Prediction, and Search (2nd edition)*  
   Formalized PC in its modern form. Meek’s 1995 orientation rules were incorporated, establishing the now-standard procedure of  
   (i) adjacency pruning via conditional independence, followed by  
   (ii) propagation of orientations via sound and complete rules.

### Why PC is historically important

- **First scalable causal graph algorithm**: PC demonstrated that conditional independence tests, together with sparsity, allowed structure discovery in time that is *polynomial* for sparse graphs.
- **Introduced the idea of equivalence classes**: PC made CPDAGs the standard representation for causal structures under Markov and faithfulness assumptions.
- **Inspired nearly every major causal algorithm since**: FCI, RFCI, GFCI, PC-Max, CCD, and many others are direct conceptual descendants of PC.

Today PC remains:
- a baseline that all new algorithms are compared against,
- a teaching tool for causal inference fundamentals, and
- a practical choice in many real-world settings.

The Tetrad implementation continues this lineage, with additional improvements (parallelization, stable FAS, PC-Max, skewness extensions, etc.) that preserve the spirit of the original algorithm while adapting it to modern data sizes.

---

## Additional Reference

Meek, C. (1995). 
Causal inference and construction of explanation with background knowledge. 
In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI-95), pp. 403-411. Morgan Kaufmann Publishers.

---

## Summary

PC is the foundational constraint-based algorithm for learning CPDAGs, combining systematic CI testing with principled orientation rules. It remains a gold-standard baseline for settings without latent confounders.