48. StabilitySelection

Type: Wrapper / resampling ensemble (bootstrap-based)
Output: Same graph class as the wrapped algorithm (typically CPDAG / PAG / DAG / etc.)

StabilitySelection is a generic wrapper that repeatedly runs a base Tetrad algorithm on bootstrap resamples of the data and then keeps only edges that appear often enough across runs.
It does not define its own score or CI test; instead, it delegates all modeling assumptions and graph semantics to the wrapped Algorithm, and then performs stability selection on that algorithm’s output.


48.1. Key Idea

The idea is to take any causal discovery algorithm that returns a graph (DAG, CPDAG, PAG, …) and make its output more robust by:

  1. Resampling the data

    • Draw numSubsamples bootstrap samples (with replacement) of size percentSubsampleSize * N.

  2. Running the base algorithm on each resample

    • For each sample, run the wrapped Algorithm with the same Parameters.

    • Collect the resulting graphs in a list.

  3. Counting edge frequencies

    • For each edge (including its orientation), count how many graphs contain it.

  4. Thresholding by stability

    • Include an edge in the final graph if its selection frequency exceeds percentStability (e.g., 0.7 → kept if present in > 70% of runs).

The final graph is built over the original variable set and contains only those edges (and orientations) that are sufficiently stable under resampling.


48.2. When to Use

Use StabilitySelection when:

  • You have a base algorithm (e.g., FGES, BOSS, PC, FCI, RFCI, DAGMA, etc.) and want a more conservative, robust edge set.

  • You are concerned that a single run over one dataset may give unstable edges due to sampling variability or tuning choices.

  • You want a simple stability heuristic without designing a new score or CI test.

Typical settings:

  • High-dimensional data where many edges are near the detection threshold.

  • Situations where you are willing to trade recall for precision (i.e., fewer but more reliable edges).

Related algorithms:

  • StARS (separate wrapper) — more specifically modeled after the StARS criterion for regularization paths.

  • Bootstrapping in general — StabilitySelection is a specific bootstrap-with-threshold pattern for edge selection.


48.3. Prior Knowledge Support

Does it accept background knowledge?
Indirectly, yes — through the wrapped Algorithm.

  • StabilitySelection itself does not manage Knowledge objects.

  • Whatever knowledge / constraints you pass to the base algorithm (e.g., forbidden/required edges, tiers) will be honored on each bootstrap run.

  • The final stable graph only contains edges that:

    1. Are allowed by the base algorithm’s knowledge configuration, and

    2. Survive the stability threshold.

So: knowledge support = that of the wrapped algorithm. The wrapper does not add or override any constraints.


48.4. Strengths

  • Algorithm-agnostic

    • Works with any Tetrad Algorithm that takes a DataSet and Parameters and returns a Graph.

  • More robust edge set

    • Edges must appear consistently across many resampled datasets to survive, which often improves precision.

  • Parallelized implementation

    • Uses a ForkJoinPool to run subsample searches in parallel across available CPU cores.

  • Easy to drop in

    • You can wrap an existing algorithm in code or scripting with a one-line change and reuse its parameter set.


48.5. Limitations

  • Computationally expensive

    • If the base algorithm is already heavy, repeating it numSubsamples times multiplies runtime.

  • Heuristic thresholding

    • percentStability is a user-chosen cutoff; there is no built-in theoretical guarantee that a particular value is “optimal.”

  • Bootstrap-with-replacement, not pure subsampling

    • Implementation uses a BootstrapSampler with replacement (sample size percentSubsampleSize * N). This is close to subsampling but not identical to the original Meinshausen–Bühlmann stability selection setup.

  • Edge orientation stability mirrors the base algorithm

    • If the base algorithm’s orientations are themselves unstable, stability selection may discard many of them; orientation robustness is no better than what repeated runs can support.


48.6. Key Parameters in Tetrad

All parameters appear alongside the wrapped algorithm’s parameters.
In addition to the base algorithm’s parameters, StabilitySelection adds:

Parameter (camelCase)

Description

numSubsamples

Number of bootstrap replications. For each replication, a new bootstrap sample is drawn and the base algorithm is run.

percentSubsampleSize

Fraction of the original sample size used for each bootstrap (e.g., 0.8 → each bootstrap sample has 0.8 * N rows, with replacement).

percentStability

Stability threshold in [0, 1]. An edge is kept if its selection frequency across all runs exceeds percentStability.

depth

Passed through to the underlying algorithm’s parameter set. Not used directly by StabilitySelection itself, but often relevant for constraint-based base learners.

verbose

Passed through to the underlying algorithm’s parameter set if used there; StabilitySelection itself does not log per-run details.

Remember: the wrapped algorithm’s parameters (e.g., alpha, penaltyDiscount, useBes, etc.) are still fully respected and typically dominate the causal semantics of the result.


48.7. Reference

This wrapper is inspired by the general idea of stability selection:

  • Meinshausen, N., & Bühlmann, P. (2010).
    Stability selection.
    Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417–473.

Tetrad’s StabilitySelection adapts this idea to graph-structure selection by counting edge frequencies rather than variable inclusion in a regression model.


48.8. Summary

StabilitySelection is a generic bootstrap-based ensemble that wraps any Tetrad algorithm, repeatedly runs it on resampled data, and keeps only those edges (and orientations) that appear with high frequency—giving you a more conservative, stable graph built on top of whatever assumptions the base algorithm makes.