17. FCI-IOD — FCI with Independent Overlapping Datasets
Type: Constraint-based (multi-dataset) Output: PAG Reference: Tillman & Spirtes (2011). Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables. AISTATS.
FCI-IOD extends the standard Fast Causal Inference (FCI) algorithm to work with multiple datasets that contain different (but overlapping) subsets of variables. It uses the IOD (Independent Overlapping Datasets) independence test, which pools CI information across datasets to infer a single PAG over the union of variables.
17.1. Key Idea
When multiple datasets each contain only some of the variables (e.g., due to study design, missing experimental modules, data collection constraints), standard CI testing cannot be applied directly, because not all conditioning sets are available in every dataset.
FCI-IOD solves this by:
Pooled CI testing (IOD test)
Performs CI tests across all datasets where the relevant variables are jointly observed.
Combines their evidence to decide conditional independence or dependence.
Modified FCI search
Uses these pooled CI judgments for adjacency search.
Applies the standard FCI orientation rules to produce a PAG representing all models compatible with the (pooled) CI information.
Full union graph
Produces a PAG over all variables observed in any dataset, even those never jointly observed in one dataset.
The result is a latent/selection-robust PAG that integrates information across overlapping datasets.
17.2. When to Use
Use FCI-IOD when:
You have multiple observational datasets with partially overlapping variable sets. (e.g., Dataset A has {X, Y, Z}, Dataset B has {Y, Z, W}, etc.)
Variables may share latent confounders.
Variables may be subject to selection bias in some datasets.
You want a single, integrated causal model over all variables.
Typical applications:
Consortium or multi-site studies where different labs measure different subsets of variables.
Combining datasets across years where instrument batteries change.
Merging clinical datasets with heterogeneous measurement panels.
17.3. Prior Knowledge Support
FCI-IOD accepts the same background knowledge as FCI:
Required edges
Forbidden edges
Temporal / tier constraints
Knowledge is enforced during adjacency pruning and orientation.
17.4. Strengths
Integrates multiple datasets with differing variable sets.
Latent-variable and selection-bias robust, through FCI logic.
Produces a single PAG covering all measured variables.
Leverages all available CI information, even when no dataset contains all variables jointly.
17.5. Limitations
Computational cost grows with:
number of datasets
size of variable union
size of conditioning sets
Relies on the correctness of the pooled IOD independence test.
Finite-sample instability can propagate across datasets.
Produces a PAG, which may be highly partially oriented when coverage is sparse.
17.6. Key Parameters in Tetrad
Parameter (camelCase) |
Description |
|---|---|
|
Should be set to IndTestIod, which performs pooled CI tests across datasets. |
|
The list of input datasets (each a DataSet object) with overlapping variable sets. |
|
Significance level for CI testing. |
|
Max conditioning-set size. |
|
Whether to print detailed processing logs. |
|
Background knowledge (required/forbidden edges, tiers). |
17.7. Reference
Tillman, R., & Spirtes, P. (2011). Learning equivalence classes of acyclic models with latent and selection variables from multiple datasets with overlapping variables. In AISTATS, pp. 3–15.
17.8. Summary
FCI-IOD extends FCI to the setting of multiple datasets with overlapping variable sets, pooling CI information through the IOD test to produce a unified, latent-variable–robust PAG over all variables.