21. FOFC — Find One-Factor Clusters

Type: Latent cluster discovery
Output: A set of clusters, each corresponding to a single latent factor

FOFC (Find One-Factor Clusters) detects groups of observed variables that share a single latent parent. It does this by testing tetrad constraints: algebraic equalities among covariances that are implied whenever a set of variables is generated linearly by the same latent factor. FOFC is usually the first step in pipelines that incorporate latent variables into causal discovery.
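To make the constraint concrete, here is a short derivation under an assumed linear one-factor model. The notation (loadings \lambda_i, latent factor L, error terms \varepsilon_i) is introduced here for illustration and does not come from the Tetrad documentation.

```latex
% Assumed model: four indicators of one latent factor L,
% with errors uncorrelated with L and with each other.
X_i = \lambda_i L + \varepsilon_i, \qquad i = 1, \dots, 4

% Covariance between two distinct indicators:
\sigma_{ij} = \operatorname{Cov}(X_i, X_j) = \lambda_i \lambda_j \sigma_L^2, \qquad i \neq j

% Every product of two covariances that uses each index once is the same:
\sigma_{12}\sigma_{34} = \sigma_{13}\sigma_{24} = \sigma_{14}\sigma_{23}
  = \lambda_1 \lambda_2 \lambda_3 \lambda_4 \, \sigma_L^4

% Hence the three tetrad differences vanish:
\sigma_{12}\sigma_{34} - \sigma_{13}\sigma_{24} = 0, \quad
\sigma_{12}\sigma_{34} - \sigma_{14}\sigma_{23} = 0, \quad
\sigma_{13}\sigma_{24} - \sigma_{14}\sigma_{23} = 0
```

Only two of the three equalities are independent; the third follows from the other two.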


21.1. Key Idea

Suppose several observed variables are indicators of the same latent factor. Then, for any four of them, the difference between two products of their covariances, for example cov(X1,X2)·cov(X3,X4) - cov(X1,X3)·cov(X2,X4), must equal zero in the population. These differences are called tetrads. FOFC identifies clusters of variables by:

  1. Computing the correlation matrix.

  2. Testing whether all tetrads for a candidate group vanish at significance level alpha.

  3. Applying a “substitution test”: for every variable inside the group, replacing it with any outside variable should break the tetrad relationships. This ensures the cluster is pure rather than an accident of incidental correlations or mixed latent structure.

  4. Growing the cluster by adding additional variables that also satisfy all tetrad tests relative to the existing cluster.

  5. Returning all such clusters as one-factor groups.

Everything is done using statistical tetrad tests rather than fitting full factor models.
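The following is a minimal sketch of the quantity being tested, not Tetrad's implementation. It simulates two correlated latent factors with four indicators each, then compares the tetrad differences for a quartet drawn from one cluster with those for a quartet that mixes clusters. The loadings, factor correlation, sample size, and the helper name `tetrad_differences` are all assumptions chosen for illustration. It prints raw differences; FOFC itself applies a statistical test at level alpha.

```python
import numpy as np


def tetrad_differences(corr, quartet):
    """Return the three tetrad differences for four variable indices."""
    i, j, k, l = quartet
    return (
        corr[i, j] * corr[k, l] - corr[i, k] * corr[j, l],
        corr[i, j] * corr[k, l] - corr[i, l] * corr[j, k],
        corr[i, k] * corr[j, l] - corr[i, l] * corr[j, k],
    )


rng = np.random.default_rng(0)
n = 20_000          # sample size (illustrative)
rho = 0.3           # correlation between the two latent factors (illustrative)
loading = 0.8       # common loading for every indicator (illustrative)
noise_sd = 0.6      # gives each indicator unit variance: 0.8**2 + 0.6**2 = 1

# Two standard-normal latent factors with correlation rho.
f1 = rng.standard_normal(n)
f2 = rho * f1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)

# Columns 0-3 are indicators of f1, columns 4-7 are indicators of f2.
columns = [loading * f1 + noise_sd * rng.standard_normal(n) for _ in range(4)]
columns += [loading * f2 + noise_sd * rng.standard_normal(n) for _ in range(4)]
corr = np.corrcoef(np.column_stack(columns), rowvar=False)

# Largest absolute tetrad difference for a pure and a mixed quartet.
pure = max(abs(t) for t in tetrad_differences(corr, (0, 1, 2, 3)))
mixed = max(abs(t) for t in tetrad_differences(corr, (0, 1, 4, 5)))

print(f"pure quartet  (one factor):  max |tetrad| = {pure:.3f}")
print(f"mixed quartet (two factors): max |tetrad| = {mixed:.3f}")
```

In this setup the pure quartet's differences should sit close to zero, while the mixed quartet should show at least one difference far from zero. That contrast is what the tetrad tests and the substitution test exploit.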


21.2. When to Use

  • When latent factors are known or suspected.

  • When performing MimBuild, GIN, or Blocks-Test-TS analyses.

  • Before running causal discovery to factor out measurement structure.

  • When indicators are continuous or approximately continuous.

FOFC works best when latent variables each have at least three indicators.


21.3. Prior Knowledge Support

FOFC does not incorporate background knowledge (forbidden/required edges, tiers, etc.).
It is purely a clustering procedure on the covariance matrix.


21.4. Strengths

  • Identifies latent measurement structure directly from covariances.

  • Robust to some violations of linearity because tetrads capture broad patterns, not detailed SEM fits.

  • Fast enough to be used routinely before causal discovery.

  • Helps remove confounding due to measurement structure.


21.5. Limitations

  • Requires continuous data or at least near-continuous behavior.

  • Needs adequate sample size for reliable tetrad testing.

  • Only finds one-factor clusters; multi-factor models need FTFC, GFFC, or BPC.

  • Sensitive to highly correlated factors or weak loadings.
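The last limitation can be illustrated with a small population-level calculation. This is a sketch under stated assumptions, not part of Tetrad: indicators are standardized, every indicator has the same loading, and the quartet takes two indicators from each of two factors. Under those assumptions, the tetrad that should expose the mixed quartet equals loading^4 * (1 - factor_corr^2). The function name `cross_cluster_tetrad` is hypothetical.

```python
def cross_cluster_tetrad(loading, factor_corr):
    """Population tetrad for two indicators from each of two factors.

    Assumes standardized indicators with a common loading, so that
    within-cluster correlations are loading**2 and cross-cluster
    correlations are loading**2 * factor_corr.
    """
    within = loading**2
    across = loading**2 * factor_corr
    # corr(1,2) * corr(3,4) - corr(1,3) * corr(2,4)
    return within * within - across * across


# Compare a well-separated case with highly correlated factors and weak loadings.
for loading, factor_corr in [(0.8, 0.3), (0.8, 0.95), (0.3, 0.3)]:
    value = cross_cluster_tetrad(loading, factor_corr)
    print(f"loading={loading}, factor_corr={factor_corr}: tetrad={value:.4f}")
```

When the factors are nearly collinear or the loadings are weak, the tetrad that should separate the two clusters is itself close to zero. A finite-sample test then struggles to reject it, and distinct clusters may be merged.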


21.6. Key Parameters in Tetrad

Parameter (camelCase)   Description
alpha                   Significance level for tetrad tests.
ess                     Equivalent sample size used in rank and tetrad tests.
verbose                 Whether to print detailed cluster-finding steps.


21.7. Reference

Kummerfeld, E., & Ramsey, J. (2016). Causal clustering for 1-factor measurement models. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).


21.8. Summary

FOFC discovers clusters of indicators generated by a single latent factor by testing tetrad constraints. It is a simple, effective preprocessing step for any causal discovery workflow involving latent variables.