44. RFCI-BSC

Type: Hybrid (constraint-based + Bayesian scoring over constraints)
Output: PAG (Partial Ancestral Graph) with edge-type probabilities

RfciBsc wraps RFCI in a Bayesian structural-constraints framework. It repeatedly runs RFCI with a probabilistic independence test, collects both the resulting PAGs and the probabilities of queried independence facts, then learns a Bayesian network over those facts. Each candidate PAG is scored under two Bayesian structural-constraints criteria (BSC-D and BSC-I), and the best-scoring PAG is returned, with edge-type probabilities attached from the ensemble.


44.1. Key Idea

RfciBsc proceeds in several stages:

  1. Initial RFCI runs and constraint collection

    • Uses the IndTestProbabilistic test inside an RFCI run to collect a map of independence facts and their estimated probabilities.

    • Repeats RFCI multiple times with a probabilistic test, saving the resulting PAGs.

    • From all queried facts, it selects only those whose independence probability lies in an “informative” band (between lowerBound and upperBound).

  2. Bootstrap and constraint data construction

    • Creates many bootstrap resamples of the original data.

    • For each bootstrap sample, re-tests every selected independence fact and encodes the outcome (independent or dependent) as 0/1 in a “constraint dataset,” where each column is an independence fact and each row is a bootstrap replicate.

  3. Learning a dependency structure over constraints

    • Learns a Bayesian network over the constraint dataset using FGES with a BDeu score.

    • Converts the resulting CPDAG to a DAG and estimates conditional probability tables via Dirichlet-Bayes parameter learning.

    • This model captures dependence structure among the independence facts themselves.

  4. Bayesian structural-constraints scoring

    • For each candidate PAG from the ensemble, RfciBsc computes two log-probability scores:

      • BSC-I: based directly on the independence probabilities from the original probabilistic test.

      • BSC-D: “dependence-filtered,” using the learned BN over constraints to refine probabilities.

    • It then normalizes these scores across the ensemble to obtain BSC-D and BSC-I “posterior-like” scores for each PAG.

  5. Select and annotate output PAG

    • Identifies graphRBD (best under BSC-D) and graphRBI (best under BSC-I), and returns either graphRBD or graphRBI depending on outputRBD.

    • Adds edge-type probabilities to each edge, summarizing how often each edge type (tail–arrow, arrow–tail, circle–circle, etc.) occurs across the ensemble of PAGs.
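
Stages 1 and 2 above can be sketched in a few lines of Python. This is an illustration only, not Tetrad's code: the function names, the representation of a fact as a pair of column indices, the encoding 1 = independent, and the toy independence test are all assumptions made for the example. The inclusive band follows the parameter descriptions, which say that facts below lowerBound or above upperBound are excluded.

```python
import random

def select_informative(fact_probs, lower_bound, upper_bound):
    """Keep facts whose independence probability lies inside the band."""
    return {f: p for f, p in fact_probs.items()
            if lower_bound <= p <= upper_bound}

def bootstrap_resample(rows, rng):
    """Draw len(rows) rows with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def build_constraint_dataset(rows, facts, is_independent, num_samples, seed=0):
    """One row per bootstrap replicate, one 0/1 column per fact.

    Assumption for this sketch: 1 means "judged independent".
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_samples):
        sample = bootstrap_resample(rows, rng)
        dataset.append([1 if is_independent(sample, f) else 0 for f in facts])
    return dataset

def toy_test(sample, fact):
    # Hypothetical stand-in for a real CI test: call two binary columns
    # "independent" when they agree in fewer than 75% of the rows.
    i, j = fact
    agree = sum(1 for row in sample if row[i] == row[j])
    return agree / len(sample) < 0.75

# Facts are (column, column) pairs here; the probabilities are made up.
fact_probs = {(0, 1): 0.97, (0, 2): 0.55, (1, 2): 0.30}
facts = sorted(select_informative(fact_probs, 0.3, 0.7))

data_rng = random.Random(1)
data = [tuple(data_rng.randint(0, 1) for _ in range(3)) for _ in range(200)]
constraint_data = build_constraint_dataset(data, facts, toy_test, num_samples=5)
```

The fact with probability 0.97 is dropped because it is nearly certain and carries little information for comparing candidate PAGs; the remaining two facts become the columns of the constraint dataset.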

The result is a single PAG that was itself produced by an RFCI run but was chosen using the joint behavior of all tested independence constraints.
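
The scoring and normalization in stages 4 and 5 can be illustrated as follows. This is a hedged sketch, not the Tetrad implementation: it assumes that a BSC-I style log score adds log p for each fact the candidate PAG implies is independent and log(1 - p) otherwise, and that normalization divides each exponentiated score by the sum over the ensemble. The function names and the dictionary inputs are hypothetical. The BSC-D variant, which additionally uses the learned BN over constraints, is not reproduced here.

```python
import math

def log_score(implied_independent, fact_probs, eps=1e-12):
    """Sum of log-probabilities of the constraint values a PAG implies."""
    total = 0.0
    for fact, p in fact_probs.items():
        q = p if implied_independent[fact] else 1.0 - p
        total += math.log(max(q, eps))  # guard against log(0)
    return total

def normalize_log_scores(log_scores):
    """Turn log scores into weights that sum to 1 (log-sum-exp trick)."""
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    return [w / z for w in weights]

def select_output(bsc_d, bsc_i, output_rbd):
    """Index of the best PAG under BSC-D if output_rbd, else under BSC-I."""
    scores = bsc_d if output_rbd else bsc_i
    return max(range(len(scores)), key=scores.__getitem__)

# Two facts with made-up probabilities and three candidate PAGs.
fact_probs = {"A": 0.6, "B": 0.4}
candidates = [
    {"A": True, "B": False},   # 0.6 * 0.6 = 0.36
    {"A": True, "B": True},    # 0.6 * 0.4 = 0.24
    {"A": False, "B": False},  # 0.4 * 0.6 = 0.24
]
bsc_i = normalize_log_scores([log_score(c, fact_probs) for c in candidates])
best = select_output(bsc_d=bsc_i, bsc_i=bsc_i, output_rbd=False)
```

Subtracting the maximum before exponentiating keeps the computation stable when log scores are large and negative, which is typical once many constraints are involved.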


44.2. When to Use

  • You want RFCI-style PAGs but would like a Bayesian meta-criterion to select among multiple candidate PAGs.

  • You have discrete data and can reasonably model independence-test outcomes as random variables.

  • You are willing to pay extra computation (multiple RFCIs, bootstrap resampling, and FGES) for a more globally coherent PAG.

  • You’d like edge-type probabilities summarizing uncertainty in the PAG structure.

Related algorithms:

  • RFCI (base constraint-based PAG learner)

  • PagSamplingRfci (RFCI ensemble with simple frequency aggregation)

  • FGES + BDeu (used internally to model dependencies among independence facts)


44.3. Prior Knowledge Support

Does it accept background knowledge?
Partially.

  • The Rfci object passed into the RfciBsc constructor can be configured with knowledge (forbidden/required edges, tiers) for the initial RFCI runs.

  • The randomized RFCI runs inside RfciBsc currently construct new RFCI instances with probabilistic tests and do not explicitly reuse the original knowledge. In practice this means:

    • Background knowledge may influence the initial constraint collection (through the original RFCI),

    • But randomized ensemble runs may not fully reflect that knowledge.

If your analysis depends on background knowledge, keep this behavior in mind and treat RfciBsc as an experimental or advanced method.


44.4. Strengths

  • Combines constraint-based learning (RFCI) with Bayesian scoring over constraints, offering a more global model selection criterion than local CI tests alone.

  • Produces two principled candidate PAGs: one maximizing BSC-D (dependence-filtered) and one maximizing BSC-I (independence-based).

  • Attaches edge-type probabilities to each edge, giving a richer summary of structural uncertainty.

  • Uses multi-threaded computation (ForkJoinPool) for RFCI runs, bootstrap constraint evaluation, and scoring.
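
As a rough illustration of what edge-type probabilities summarize, the sketch below tallies how often each edge type appears for each node pair across an ensemble. The representation (a dict from an ordered node pair to an edge-type string) and the explicit "no edge" category are assumptions made for the example, not Tetrad's data structures.

```python
from collections import Counter, defaultdict

def edge_type_probabilities(pags):
    """Relative frequency of each edge type per node pair.

    pags: list of dicts mapping (node_a, node_b) to an edge-type string
    such as "-->", "o->", or "o-o". A pair missing from a PAG counts
    toward "no edge" (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for pag in pags:
        for pair, edge_type in pag.items():
            counts[pair][edge_type] += 1
    n = len(pags)
    probs = {}
    for pair, counter in counts.items():
        per_type = {t: k / n for t, k in counter.items()}
        per_type["no edge"] = (n - sum(counter.values())) / n
        probs[pair] = per_type
    return probs

# Made-up ensemble of four PAGs over nodes X, Y, Z.
ensemble = [
    {("X", "Y"): "-->", ("Y", "Z"): "o-o"},
    {("X", "Y"): "-->", ("Y", "Z"): "o-o"},
    {("X", "Y"): "-->"},
    {("X", "Y"): "o->"},
]
probs = edge_type_probabilities(ensemble)
```

In this toy ensemble the X, Y pair is oriented as "-->" in three of four PAGs, while the Y, Z adjacency itself is present only half the time.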


44.5. Limitations

  • Discrete data only: relies on SimpleDataLoader and discrete BDeu scoring.

  • Computationally heavy: multiple RFCI runs, many bootstrap samples, and a separate FGES structure-learning step.

  • Parameter-rich: performance and behavior can be sensitive to bounds (lowerBound, upperBound), bootstrap size, and threshold/cutoff settings for probabilistic tests.

  • Background knowledge is not consistently propagated to all randomized RFCI runs in the current implementation.


44.6. Key Parameters in Tetrad

The main RfciBsc-specific knobs are:

  • numRandomizedSearchModels: Number of RFCI runs for generating candidate PAGs and collecting independence facts.

  • numBscBootstrapSamples: Number of bootstrap samples used to build the constraint dataset.

  • lowerBound: Lower probability threshold for selecting “informative” independence facts from IndTestProbabilistic (facts with probability below this are excluded).

  • upperBound: Upper probability threshold for selecting “informative” independence facts (facts with probability above this are excluded).

  • outputRBD: If true, output the best PAG under BSC-D (graphRBD); if false, output the best under BSC-I (graphRBI).

  • verbose: Controls detailed logging of intermediate steps and scores.

  • thresholdNoRandomDataSearch: Whether to apply a threshold rule for probabilistic independence in the initial RFCI runs over the original data.

  • cutoffDataSearch: Probability cutoff for independence in the initial RFCI runs (used when thresholdNoRandomDataSearch is true).

  • thresholdNoRandomConstrainSearch: Whether to apply a threshold rule when re-estimating independence in each bootstrap sample.

  • cutoffConstrainSearch: Probability cutoff for independence in the bootstrap-based constraint estimation.
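
The interaction between a thresholdNoRandom… flag and its cutoff can be pictured with the sketch below. It rests on an assumption suggested by the parameter names rather than stated in this section: when the threshold rule is off, the independence decision is a random draw with the estimated probability, and when it is on, the probability is compared against the cutoff. The function name and the use of `>=` are hypothetical.

```python
import random

def decide_independent(p_independent, threshold, cutoff, rng):
    """Turn an independence probability into a yes/no decision.

    threshold=True  -> deterministic: independent iff p >= cutoff.
    threshold=False -> stochastic: independent with probability p
                       (assumed behavior, not confirmed by the source).
    """
    if threshold:
        return p_independent >= cutoff
    return rng.random() < p_independent

rng = random.Random(42)

# Deterministic mode: the same probability always gives the same answer.
always_same = {decide_independent(0.6, True, 0.5, rng) for _ in range(100)}

# Stochastic mode: repeated calls disagree, roughly in proportion to p.
draws = [decide_independent(0.3, False, 0.5, rng) for _ in range(10000)]
fraction_independent = sum(draws) / len(draws)
```

Under this reading, the stochastic mode is what lets repeated RFCI runs on the same data yield different candidate PAGs, while the threshold mode makes each decision reproducible.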

Additional behavior is controlled by the underlying Rfci object you pass into the constructor, including:

  • depth (maximum conditioning set size)

  • maxDiscriminatingPathLength (limit on discriminating paths)

  • knowledge (forbidden/required edges, tiers)

  • IndTestProbabilistic settings used in the initial RFCI call


44.7. Reference

There is no dedicated public paper for RfciBsc. It was implemented by:

  • Chirayu Kong Wongchokprasitti, PhD

and builds on:

  • RFCI (Really Fast Causal Inference, which allows for latent variables and selection bias)

  • IndTestProbabilistic (probabilistic independence testing)

  • BCInference (Bayesian constraints inference framework)

  • FGES with BDeu (for learning a BN over independence facts)


44.8. Summary

RfciBsc is an advanced hybrid method that uses RFCI, bootstrap resampling, and a Bayesian model over independence constraints to select a best-supported PAG and attach edge-type probabilities, trading extra computation for a more globally coherent and uncertainty-aware causal graph.