24. GFFC — Generalized Find Factor Clusters
Type: Latent cluster discovery
Output: Clusters of variables consistent with latent factors of increasing rank
Algorithms included: FOFC (tetrads), FTFC (sextads), and higher-order n-tad tests
GFFC (“Generalized FOFC/FTFC”) is the most flexible of the factor-clustering algorithms. It searches systematically for latent factor clusters of increasing rank: first rank-1 clusters (four indicators, tetrad constraints), then rank-2 clusters (six indicators, sextad constraints), then rank-3 clusters (eight indicators), and so on.
At each stage it:
Runs the appropriate n-tad purity test (tetrads, sextads, or larger minors).
Identifies pure clusters of size 2*(rank+1) (e.g., 4, 6, 8, …).
Removes all variables in those clusters from further search.
Moves on to the next rank.
Stops when no more clusters can be formed from the remaining variables.
The result is a set of disjoint clusters, each representing a latent factor with the number of indicators implied by its rank.
GFFC is a generalization of the approach in Kummerfeld & Ramsey (2016), “Causal clustering for 1-factor measurement models,” KDD.
24.1. Key Idea
Where FOFC and FTFC search for only one cluster type, GFFC performs a hierarchical search over cluster sizes.
For rank r:
A pure cluster must contain 2*(r+1) observed variables.
Rank 1 → size 4 (tetrads)
Rank 2 → size 6 (sextads)
Rank 3 → size 8
etc.
These variables must satisfy the appropriate vanishing constraints (tetrads for r=1, sextads for r=2, or higher-order minors).
Replacing any one variable in the set with any unclustered variable should break the constraint. This substitution (purity) test checks that the cluster is genuinely generated by a single latent factor.
Together, these conditions ensure that a cluster is not only algebraically compatible with a latent factor but also specific to it: the constraints hold for the cluster's own variables and fail once an outside variable is swapped in.
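As a rough illustration of the rank-1 case, the sketch below builds the population covariance implied by a hypothetical two-factor pure measurement model and evaluates the three tetrad differences for a four-variable set. The loadings, the factor correlation, and the helper names (`cluster_size`, `implied_covariance`, `tetrad_differences`) are assumptions made for this example and are not part of Tetrad. A real run would apply a statistical test to sample covariances rather than inspect exact population values.

```python
def cluster_size(rank):
    """Number of observed variables in a candidate cluster: 2*(r+1)."""
    return 2 * (rank + 1)


def implied_covariance(loadings, factor_of, factor_corr):
    """Population covariance of a pure measurement model (illustrative).

    Assumes unit-variance factors and unit-variance indicators; any two
    distinct factors share the same correlation `factor_corr`.
    """
    n = len(loadings)
    cov = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                cov[a][b] = 1.0
            else:
                rho = 1.0 if factor_of[a] == factor_of[b] else factor_corr
                cov[a][b] = loadings[a] * rho * loadings[b]
    return cov


def tetrad_differences(cov, quad):
    """The three tetrad differences for four variable indices."""
    i, j, k, l = quad
    return (
        cov[i][j] * cov[k][l] - cov[i][k] * cov[j][l],
        cov[i][j] * cov[k][l] - cov[i][l] * cov[j][k],
        cov[i][k] * cov[j][l] - cov[i][l] * cov[j][k],
    )


# Hypothetical model: variables 0-3 load on factor 0, variables 4-5 on factor 1.
loadings = [0.9, 0.8, 0.7, 0.6, 0.8, 0.7]
factor_of = [0, 0, 0, 0, 1, 1]
cov = implied_covariance(loadings, factor_of, factor_corr=0.5)

pure = tetrad_differences(cov, (0, 1, 2, 3))   # all four from one factor
mixed = tetrad_differences(cov, (0, 1, 4, 5))  # two from each factor

print("rank-1 cluster size:", cluster_size(1))
print("pure set: ", pure)
print("mixed set:", mixed)
```

In this toy model the four indicators of the first factor give tetrad differences of zero (up to floating-point error), while the set that mixes two indicators from each factor does not.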
24.2. Algorithm Overview
GFFC proceeds as follows:
Start with all observed variables unclustered.
Rank 1 (tetrad clusters):
Generate candidate 4-variable sets.
Test whether all tetrad relations vanish.
Apply the substitution test to guarantee purity.
Accept pure clusters and remove their variables.
Rank 2 (sextad clusters):
On the remaining variables, generate candidate 6-variable sets.
Test sextad constraints.
Apply the substitution test.
Accept pure clusters and remove them.
Rank 3, 4, … (higher-order):
Continue with 8-variable sets, 10-variable sets, etc.
Stop when:
no cluster of the required size can be formed, or
cluster purity fails for all combinations.
Return all discovered clusters, each labeled with its rank.
The algorithm is greedy across ranks but exact within each rank: no variable participates in more than one cluster, and later stages never reconsider already clustered variables.
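The staged procedure above can be sketched as a short loop. This is a minimal illustration under stated assumptions, not Tetrad's implementation: the function `gffc_sketch`, the callback `is_pure_cluster` (which stands in for the vanishing-constraint and substitution tests), and the `max_rank` argument are hypothetical names chosen for this example.

```python
from itertools import combinations


def gffc_sketch(variables, is_pure_cluster, max_rank=2):
    """Greedy search over ranks (illustrative only).

    `is_pure_cluster(candidate, others, rank)` is a hypothetical callback
    that returns True when the candidate passes both the vanishing-constraint
    test and the substitution test against the other unclustered variables.
    """
    remaining = list(variables)
    clusters = []
    for rank in range(1, max_rank + 1):
        size = 2 * (rank + 1)  # 4 for tetrads, 6 for sextads, ...
        found = True
        # Keep scanning at this rank until no further pure cluster is found.
        while found and len(remaining) >= size:
            found = False
            for candidate in combinations(remaining, size):
                others = [v for v in remaining if v not in candidate]
                if is_pure_cluster(candidate, others, rank):
                    clusters.append((rank, candidate))
                    remaining = others  # clustered variables are never revisited
                    found = True
                    break
    return clusters, remaining


# Toy oracle in place of real statistical tests: the true grouping is known.
truth = {
    1: [{"x1", "x2", "x3", "x4"}],
    2: [{"y1", "y2", "y3", "y4", "y5", "y6"}],
}


def toy_oracle(candidate, others, rank):
    return any(set(candidate) == group for group in truth.get(rank, []))


variables = ["x1", "x2", "x3", "x4",
             "y1", "y2", "y3", "y4", "y5", "y6", "z1"]
clusters, leftover = gffc_sketch(variables, toy_oracle)
print(clusters)
print(leftover)
```

With the toy oracle, the rank-1 pass claims the four `x` variables, the rank-2 pass claims the six `y` variables from what is left, and `z1` stays unclustered.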
24.3. Why Use GFFC?
Unifies FOFC and FTFC into a single framework.
Detects latent factors with different numbers of indicators in the same dataset.
Ensures clusters are non-overlapping and purified via substitution tests.
Provides a measurement-model structure suitable for further causal analysis (e.g., running PC over the latent variables).
24.4. Strengths
Automatically adapts to the data: identifies 4-indicator, 6-indicator, or larger clusters.
Substitution test provides protection against spurious clusters.
Works directly from the covariance/correlation matrix — no iterative SEM fitting required.
24.5. Limitations
Computational cost grows quickly with rank (combinatorial explosion in the number of candidate sets).
Sextad and higher-order tests require larger sample sizes.
Returns only simple clusters (single-factor measurement models), not complex cross-loadings.
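To give a sense of the computational cost noted above, the following sketch counts how many candidate sets of size 2*(r+1) exist among a given number of variables. The function name `candidate_count` and the choice of 30 variables are assumptions for illustration. An actual search may prune candidates, so these figures are upper bounds on the sets examined at each rank.

```python
from math import comb


def candidate_count(n_variables, rank):
    """Number of distinct variable sets of size 2*(rank+1)."""
    return comb(n_variables, 2 * (rank + 1))


counts = {rank: candidate_count(30, rank) for rank in (1, 2, 3)}
for rank, count in counts.items():
    print(f"rank {rank}: {count} candidate sets of size {2 * (rank + 1)}")
```

For 30 variables this gives 27,405 candidate sets at rank 1, 593,775 at rank 2, and 5,852,925 at rank 3, before counting the constraint tests run on each set.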
24.6. Parameters in Tetrad
| Parameter | Description |
|---|---|
| | Significance level for vanishing-minor tests (tetrads, sextads, etc.). |
| | Equivalent sample size used in rank/minor calculations. |
| | If true, prints intermediate clusters and purity checks. |
| | Maximum rank to consider (default is 2 in many use cases). |
24.7. Reference
Kummerfeld, E., & Ramsey, J. (2016). Causal clustering for 1-factor measurement models. Proceedings of KDD.
24.8. Summary
GFFC generalizes FOFC and FTFC by iteratively searching for pure latent clusters of increasing rank. It begins with tetrad-based clustering, removes those variables, then performs sextad-based clustering on the rest, and continues until no more clusters can be formed. This produces a set of mutually exclusive latent clusters, each corresponding to a latent factor with a known number of indicators.