48. StabilitySelection

Type: Wrapper / resampling ensemble (bootstrap-based)
Output: Same graph class as the wrapped algorithm (e.g., CPDAG, PAG, or DAG)

StabilitySelection is a generic wrapper that repeatedly runs a base Tetrad algorithm on bootstrap resamples of the data and then keeps only edges that appear often enough across runs.
It does not define its own score or conditional-independence (CI) test; it delegates all modeling assumptions and graph semantics to the wrapped Algorithm and then performs stability selection on that algorithm’s output.

48.1. Key Idea

The idea is to take any causal discovery algorithm that returns a graph (DAG, CPDAG, PAG, …) and make its output more robust by:

1. Resampling the data
   - Draw numSubsamples bootstrap samples (with replacement), each of size percentSubsampleSize * N, where N is the number of rows in the original dataset.
2. Running the base algorithm on each resample
   - For each sample, run the wrapped Algorithm with the same Parameters.
   - Collect the resulting graphs in a list.
3. Counting edge frequencies
   - For each edge (including its orientation), count how many graphs contain it.
4. Thresholding by stability
   - Include an edge in the final graph if its selection frequency exceeds percentStability (e.g., with 0.7, an edge is kept only if it appears in more than 70% of runs).

The final graph is built over the original variable set and contains only those edges (and orientations) that are sufficiently stable under resampling.
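
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Tetrad's implementation: `stability_selection`, `make_flaky_algorithm`, and the representation of an edge as a `(node1, node2, endpoint_type)` tuple are all hypothetical, and the stand-in base algorithm ignores its input so that the outcome is deterministic.

```python
import random
from collections import Counter


def stability_selection(rows, base_algorithm, num_subsamples,
                        percent_subsample_size, percent_stability, seed=0):
    """Hypothetical sketch: keep edges whose frequency exceeds the threshold."""
    rng = random.Random(seed)
    sample_size = max(1, round(percent_subsample_size * len(rows)))
    counts = Counter()
    for _ in range(num_subsamples):
        # Step 1: bootstrap resample (rows drawn with replacement).
        sample = [rng.choice(rows) for _ in range(sample_size)]
        # Step 2: run the base algorithm on the resample.
        graph = base_algorithm(sample)
        # Step 3: count each oriented edge at most once per run.
        counts.update(set(graph))
    # Step 4: keep edges whose selection frequency exceeds the threshold.
    return {edge for edge, n in counts.items()
            if n / num_subsamples > percent_stability}


def make_flaky_algorithm():
    """Stand-in base algorithm (an assumption for illustration only).

    It ignores the data, reports X --> Y on every run, and reports
    Y --> Z only on every second run.
    """
    calls = {"n": 0}

    def flaky_algorithm(sample):
        calls["n"] += 1
        edges = {("X", "Y", "-->")}
        if calls["n"] % 2 == 0:
            edges.add(("Y", "Z", "-->"))
        return edges

    return flaky_algorithm


rows = list(range(100))  # placeholder data; the stand-in never inspects it
stable = stability_selection(rows, make_flaky_algorithm(), num_subsamples=10,
                             percent_subsample_size=0.8, percent_stability=0.7)
print(stable)  # {('X', 'Y', '-->')}
```

Here X --> Y appears in 10 of 10 runs and survives, while Y --> Z appears in only 5 of 10 runs (frequency 0.5) and is dropped by the 0.7 threshold.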

48.2. When to Use

Use StabilitySelection when:

- You have a base algorithm (e.g., FGES, BOSS, PC, FCI, RFCI, or DAGMA) and want a more conservative, robust edge set.
- You are concerned that a single run on one dataset may yield unstable edges because of sampling variability or tuning choices.
- You want a simple stability heuristic without designing a new score or CI test.

Typical settings:

- High-dimensional data where many edges are near the detection threshold.
- Situations where you are willing to trade recall for precision (i.e., fewer but more reliable edges).
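
The precision/recall trade-off comes from the threshold: raising `percentStability` can only shrink the retained edge set. A small sketch, using made-up selection frequencies (the edge names, the numbers, and the `keep_stable` helper are illustrative assumptions, not results from any dataset or part of Tetrad):

```python
# Hypothetical selection frequencies: fraction of runs in which each edge appeared.
frequencies = {
    ("A", "B"): 0.95,
    ("B", "C"): 0.72,
    ("C", "D"): 0.55,
    ("A", "D"): 0.30,
}


def keep_stable(freqs, percent_stability):
    """Return the edges whose frequency strictly exceeds the threshold."""
    return {edge for edge, f in freqs.items() if f > percent_stability}


for threshold in (0.5, 0.7, 0.9):
    kept = keep_stable(frequencies, threshold)
    print(threshold, sorted(kept))
# 0.5 [('A', 'B'), ('B', 'C'), ('C', 'D')]
# 0.7 [('A', 'B'), ('B', 'C')]
# 0.9 [('A', 'B')]
```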

Related algorithms:

- StARS (separate wrapper): modeled more specifically on the StARS criterion for regularization paths.
- Bootstrapping in general: StabilitySelection is a specific bootstrap-with-threshold pattern for edge selection.

48.3. Prior Knowledge Support

Does it accept background knowledge?
Indirectly, yes — through the wrapped Algorithm.

StabilitySelection itself does not manage Knowledge objects.
Whatever knowledge / constraints you pass to the base algorithm (e.g., forbidden/required edges, tiers) will be honored on each bootstrap run.
The final stable graph only contains edges that:
1. Are allowed by the base algorithm’s knowledge configuration, and
2. Survive the stability threshold.

In short, knowledge support is exactly that of the wrapped algorithm; the wrapper neither adds nor overrides constraints.
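
Because constraints are enforced inside each base-algorithm run, a forbidden edge can never accumulate any selection frequency. The sketch below illustrates this with a hypothetical stand-in; `base_algorithm_with_knowledge`, `stable_edges`, and the `forbidden` set are illustrative names and do not correspond to Tetrad's Knowledge API.

```python
from collections import Counter


def base_algorithm_with_knowledge(candidate_edges, forbidden):
    """Stand-in for a base algorithm that honors forbidden edges."""
    return {edge for edge in candidate_edges if edge not in forbidden}


def stable_edges(graphs, percent_stability):
    """Wrapper step: threshold edge frequencies; no knowledge handling here."""
    counts = Counter(edge for graph in graphs for edge in graph)
    return {e for e, n in counts.items() if n / len(graphs) > percent_stability}


# Hypothetical constraint: Z may not be a cause of X.
forbidden = {("Z", "X", "-->")}
# Edges each hypothetical run would propose before constraints are applied.
proposals = [
    {("X", "Y", "-->"), ("Z", "X", "-->")},
    {("X", "Y", "-->"), ("Z", "X", "-->")},
    {("X", "Y", "-->"), ("Z", "X", "-->")},
    {("Z", "X", "-->")},
]
graphs = [base_algorithm_with_knowledge(p, forbidden) for p in proposals]
result = stable_edges(graphs, percent_stability=0.7)
print(result)  # {('X', 'Y', '-->')}
```

Z --> X would have had a frequency of 1.0 without the constraint, but it is removed inside every run, so the thresholding step never sees it.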

48.4. Strengths

Algorithm-agnostic
- Works with any Tetrad Algorithm that takes a DataSet and Parameters and returns a Graph.
More robust edge set
- Edges must appear consistently across many resampled datasets to survive, which often improves precision.
Parallelized implementation
- Uses a ForkJoinPool to run subsample searches in parallel across available CPU cores.
Easy to drop in
- You can wrap an existing algorithm in code or scripting with a one-line change and reuse its parameter set.
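
The parallel execution described above works because the resample runs are independent of one another. As a rough analogue only (Tetrad is Java and uses a ForkJoinPool; this Python sketch uses `concurrent.futures` and hypothetical helper names), each resample can be submitted as its own task with its own random seed:

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_one(job):
    """Draw one bootstrap sample and run the base algorithm on it."""
    rows, sample_size, seed, base_algorithm = job
    rng = random.Random(seed)  # per-run seed keeps results reproducible
    sample = [rng.choice(rows) for _ in range(sample_size)]
    return base_algorithm(sample)


def run_all(rows, base_algorithm, num_subsamples, percent_subsample_size,
            max_workers=4):
    """Run every resample as an independent task and collect the results."""
    sample_size = max(1, round(percent_subsample_size * len(rows)))
    jobs = [(rows, sample_size, seed, base_algorithm)
            for seed in range(num_subsamples)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in job order, regardless of finish order.
        return list(pool.map(run_one, jobs))


def count_distinct_rows(sample):
    """Toy stand-in for a base algorithm (returns a number, not a graph)."""
    return len(set(sample))


rows = list(range(200))
results = run_all(rows, count_distinct_rows, num_subsamples=8,
                  percent_subsample_size=0.5)
print(len(results))  # 8
```

In CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock; `ProcessPoolExecutor` offers the same `map` interface when real CPU parallelism is needed.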

48.5. Limitations

Computationally expensive
- If the base algorithm is already heavy, repeating it numSubsamples times multiplies runtime.
Heuristic thresholding
- percentStability is a user-chosen cutoff; there is no built-in theoretical guarantee that a particular value is “optimal.”
Bootstrap-with-replacement, not pure subsampling
- The implementation uses a BootstrapSampler with replacement (sample size percentSubsampleSize * N). The original Meinshausen–Bühlmann procedure instead draws subsamples of size ⌊N/2⌋ without replacement, so the two resampling schemes are similar in spirit but not identical.
Edge orientation stability mirrors the base algorithm
- If the base algorithm’s orientations are themselves unstable, stability selection may discard many of them; orientation robustness is no better than what repeated runs can support.
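
The orientation limitation is easiest to see with a toy example. Since edges are counted together with their orientation, an adjacency that appears in every run can still be dropped if its direction flips between runs. The five runs below are invented for illustration, and the edge-tuple representation and function names are assumptions rather than Tetrad's data structures.

```python
from collections import Counter

# Five hypothetical runs. X and Y are adjacent in every run, but the
# orientation flips; Y --> Z is oriented the same way every time.
runs = [
    {("X", "Y", "-->"), ("Y", "Z", "-->")},
    {("X", "Y", "-->"), ("Y", "Z", "-->")},
    {("X", "Y", "-->"), ("Y", "Z", "-->")},
    {("Y", "X", "-->"), ("Y", "Z", "-->")},
    {("Y", "X", "-->"), ("Y", "Z", "-->")},
]


def oriented_frequencies(graphs):
    """Frequency of each edge, treating different orientations as different."""
    counts = Counter(edge for graph in graphs for edge in graph)
    return {edge: n / len(graphs) for edge, n in counts.items()}


def adjacency_frequencies(graphs):
    """Frequency of each adjacency, ignoring orientation."""
    counts = Counter()
    for graph in graphs:
        counts.update({frozenset((a, b)) for a, b, _ in graph})
    return {pair: n / len(graphs) for pair, n in counts.items()}


oriented = oriented_frequencies(runs)
adjacent = adjacency_frequencies(runs)
stable = {edge for edge, f in oriented.items() if f > 0.7}

print(oriented[("X", "Y", "-->")])       # 0.6
print(oriented[("Y", "X", "-->")])       # 0.4
print(adjacent[frozenset(("X", "Y"))])   # 1.0
print(stable)                            # {('Y', 'Z', '-->')}
```

With a threshold of 0.7, neither orientation of the X, Y edge survives even though the two variables are adjacent in all five runs.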

48.6. Key Parameters in Tetrad

StabilitySelection’s parameters appear alongside those of the wrapped algorithm.
In addition to the base algorithm’s own parameters, the wrapper exposes:

| Parameter (camelCase) | Description |
| --- | --- |
| `numSubsamples` | Number of bootstrap replications. For each replication, a new bootstrap sample is drawn and the base algorithm is run. |
| `percentSubsampleSize` | Fraction of the original sample size used for each bootstrap (e.g., `0.8` → each bootstrap sample has `0.8 * N` rows, with replacement). |
| `percentStability` | Stability threshold in `[0, 1]`. An edge is kept if its selection frequency across all runs exceeds `percentStability`. |
| `depth` | Passed through to the underlying algorithm’s parameter set. Not used directly by `StabilitySelection` itself, but often relevant for constraint-based base learners. |
| `verbose` | Passed through to the underlying algorithm’s parameter set if used there; `StabilitySelection` itself does not log per-run details. |

Remember: the wrapped algorithm’s parameters (e.g., alpha, penaltyDiscount, useBes) are still fully respected and typically dominate the causal semantics of the result.
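
To make the role of `percentSubsampleSize` concrete, the sketch below contrasts a bootstrap draw (with replacement) with a subsample of the same nominal size drawn without replacement. The helper names are hypothetical and the code is not taken from Tetrad. With replacement, a draw of 0.8 * N rows contains only about N * (1 - e^(-0.8)) ≈ 0.55 * N distinct rows in expectation, because some rows repeat.

```python
import random


def bootstrap_sample(rows, fraction, rng):
    """Draw round(fraction * N) rows WITH replacement (rows may repeat)."""
    k = round(fraction * len(rows))
    return [rng.choice(rows) for _ in range(k)]


def subsample(rows, fraction, rng):
    """Draw round(fraction * N) rows WITHOUT replacement (all distinct)."""
    k = round(fraction * len(rows))
    return rng.sample(rows, k)


rows = list(range(1000))  # row indices stand in for data rows
rng = random.Random(0)

boot = bootstrap_sample(rows, 0.8, rng)
sub = subsample(rows, 0.8, rng)

# Both samples have 800 rows, but only the subsample has 800 distinct rows.
print(len(boot), len(set(boot)))
print(len(sub), len(set(sub)))
```

The effective amount of distinct information per run is therefore noticeably smaller than `percentSubsampleSize` alone suggests.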

48.7. Reference

This wrapper is inspired by the general idea of stability selection:

Meinshausen, N., & Bühlmann, P. (2010).
Stability selection.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473.

Tetrad’s StabilitySelection adapts this idea to graph-structure selection by counting edge frequencies rather than variable inclusion in a regression model.

48.8. Summary

StabilitySelection is a generic bootstrap-based ensemble that wraps any Tetrad algorithm, runs it repeatedly on resampled data, and keeps only those edges (and orientations) that appear with high frequency. The result is a more conservative, stable graph that inherits whatever assumptions the base algorithm makes.