24. GFFC — Generalized Find Factor Clusters

Type: Latent cluster discovery
Output: Clusters of variables consistent with latent factors of increasing rank
Algorithms included: FOFC (tetrads), FTFC (sextads), and higher-order n-tad tests

GFFC (“Generalized FOFC/FTFC”) is the most flexible of the factor-clustering algorithms. It systematically searches for latent factor clusters of increasing rank: first rank-1 clusters (four indicators, tetrad constraints, i.e. vanishing 2×2 minors), then rank-2 clusters (six indicators, sextad constraints, i.e. vanishing 3×3 minors), then rank-3 clusters (eight indicators), and so on.

At each stage it:

1. Runs the appropriate n-tad purity test (tetrads, sextads, or larger minors).
2. Identifies pure clusters of size 2*(rank+1) (e.g., 4, 6, 8, …).
3. Removes all variables in those clusters from further search.
4. Moves on to the next rank.
5. Stops when no more clusters can be formed from the remaining variables.

The result is a set of disjoint clusters, each representing a latent factor with the number of indicators implied by its rank.

GFFC generalizes the FOFC algorithm introduced in:
Kummerfeld & Ramsey (2016), “Causal clustering for 1-factor measurement models,” KDD.

24.1. Key Idea

Where FOFC and FTFC search for only one cluster type, GFFC performs a hierarchical search over cluster sizes.

For rank r:

- A pure cluster must contain 2*(r+1) observed variables:
  - Rank 1 → size 4 (tetrads)
  - Rank 2 → size 6 (sextads)
  - Rank 3 → size 8
  - etc.
- These variables must satisfy the appropriate vanishing constraints (tetrads for r=1, sextads for r=2, or higher-order minors).
- Substituting any one variable in the set with any unclustered variable should break the constraint, so that the cluster is genuinely generated by a single latent factor (purity test).

This ensures that clusters are not only algebraically compatible with a latent factor, but also uniquely determined by those equalities.
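
The vanishing constraints can be illustrated on a population covariance matrix. The sketch below is not Tetrad's implementation: `vanishing_minors` and `factor_cov` are hypothetical helper names, the covariance matrices are built by hand from made-up loadings, and exact determinants stand in for the statistical tests that a real run performs at level `alpha`. It assumes that the constraint for a cluster of size 2*(r+1) is that every split into two halves of r+1 variables has a cross-covariance block with zero determinant, which gives 3 tetrads for 4 variables and 10 sextads for 6 variables.

```python
from itertools import combinations

import numpy as np


def vanishing_minors(cov, cluster):
    """Determinants of every equal-split cross-covariance block of `cluster`."""
    cluster = list(cluster)
    half = len(cluster) // 2
    first, rest = cluster[0], cluster[1:]
    dets = []
    # Fix the first variable in half A so each unordered split is counted once.
    for others in combinations(rest, half - 1):
        a = [first, *others]
        b = [v for v in cluster if v not in a]
        dets.append(np.linalg.det(cov[np.ix_(a, b)]))
    return np.array(dets)


def factor_cov(loadings, noise):
    """Population covariance of a factor model with uncorrelated unit-variance factors."""
    loadings = np.asarray(loadings, dtype=float)
    return loadings @ loadings.T + np.diag(noise)


# Assumed toy model 1: four indicators of one latent factor (rank 1).
cov1 = factor_cov([[0.9], [0.8], [0.7], [0.6]], noise=[1.0] * 4)
tetrads = vanishing_minors(cov1, range(4))

# Assumed toy model 2: six indicators that all load on two latent factors (rank 2).
cov2 = factor_cov([[1, 0], [0, 1], [1, 1], [1, 2], [2, 1], [1, 3]], noise=[1.0] * 6)
sextads = vanishing_minors(cov2, range(6))
tetrads_on_rank2 = vanishing_minors(cov2, [0, 1, 2, 3])

print("tetrads, rank-1 model:", np.round(tetrads, 10))
print("sextads, rank-2 model:", np.round(sextads, 10))
print("tetrads, rank-2 model:", np.round(tetrads_on_rank2, 10))
```

In the rank-2 toy model the sextads vanish while the tetrads on four of its variables do not, which is why a tetrad-only search would pass over such a cluster and a later sextad stage can pick it up.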

24.2. Algorithm Overview

GFFC proceeds as follows:

1. Start with all observed variables unclustered.
2. Rank 1 (tetrad clusters):
   - Generate candidate 4-variable sets.
   - Test whether all tetrad relations vanish.
   - Apply the substitution test to guarantee purity.
   - Accept pure clusters and remove their variables.
3. Rank 2 (sextad clusters):
   - On the remaining variables, generate candidate 6-variable sets.
   - Test sextad constraints.
   - Apply the substitution test.
   - Accept pure clusters and remove them.
4. Rank 3, 4, … (higher-order):
   - Continue with 8-variable sets, 10-variable sets, etc.
   - Stop when:
     - no cluster of the required size can be formed, or
     - cluster purity fails for all combinations.
5. Return all discovered clusters, each labeled with its rank.

The algorithm is greedy across ranks but exact within each rank: no variable participates in more than one cluster, and later stages never reconsider already clustered variables.
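
The rank-by-rank loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tetrad code: `minors_vanish` and `greedy_rank_search` are hypothetical names, a numerical tolerance on a population covariance matrix replaces the statistical test, and the substitution (purity) test is left as a caller-supplied hook `is_pure` that accepts everything by default. The toy covariance matrix is arranged so that the first candidate to pass at each rank is the intended cluster.

```python
from itertools import combinations

import numpy as np


def minors_vanish(cov, cluster, tol=1e-8):
    """True if every equal-split cross-covariance determinant is numerically zero."""
    cluster = list(cluster)
    half = len(cluster) // 2
    first, rest = cluster[0], cluster[1:]
    for others in combinations(rest, half - 1):
        a = [first, *others]
        b = [v for v in cluster if v not in a]
        if abs(np.linalg.det(cov[np.ix_(a, b)])) > tol:
            return False
    return True


def greedy_rank_search(cov, r_max, is_pure=lambda cand, remaining: True):
    """Greedy search over ranks 1..r_max; returns a list of (rank, cluster) pairs."""
    remaining = list(range(cov.shape[0]))
    clusters = []
    for rank in range(1, r_max + 1):
        size = 2 * (rank + 1)
        found = True
        while found and len(remaining) >= size:
            found = False
            for cand in combinations(remaining, size):
                # `is_pure` is a placeholder for the substitution test.
                if minors_vanish(cov, cand) and is_pure(cand, remaining):
                    clusters.append((rank, cand))
                    # Clustered variables are never reconsidered.
                    remaining = [v for v in remaining if v not in cand]
                    found = True
                    break
    return clusters


# Assumed toy model: variables 0-3 load on factor 0, variables 4-9 on factors 1 and 2.
loadings = np.zeros((10, 3))
loadings[:4, 0] = [0.9, 0.8, 0.7, 0.6]
loadings[4:, 1:] = [[1, 0], [0, 1], [1, 1], [1, 2], [2, 1], [1, 3]]
latent_cov = np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]])
cov = loadings @ latent_cov @ loadings.T + np.eye(10)

clusters = greedy_rank_search(cov, r_max=2)
print(clusters)
```

On this toy matrix the sketch recovers a rank-1 cluster on variables 0 to 3 and a rank-2 cluster on variables 4 to 9. With the default hook the result depends on enumeration order, since several candidate sets can satisfy the vanishing constraints; that dependence is what a purity check is meant to remove.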

24.3. Why Use GFFC?

- Unifies FOFC and FTFC into a single framework.
- Detects latent factors with different numbers of indicators in the same dataset.
- Ensures clusters are non-overlapping and purified via substitution tests.
- Provides a measurement-model structure suitable for further causal analysis (e.g., applying PC on latent parents).

24.4. Strengths

- Automatically adapts to the data: identifies clusters of 4, 6, or more indicators, according to the rank at which their constraints vanish.
- Substitution test provides protection against spurious clusters.
- Works directly from the covariance/correlation matrix; no iterative SEM fitting is required.

24.5. Limitations

- Computational cost grows with rank (combination explosion).
- Sextad and higher-order tests require larger sample sizes.
- Returns only simple clusters (single-factor measurement models), not complex cross-loadings.
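
To give a feel for the combinatorial growth, the sketch below counts candidate sets and minors per candidate for a few ranks. It assumes exhaustive enumeration of all candidate sets of size 2*(r+1) and one minor per split of a candidate into two equal halves; `search_cost` is a hypothetical helper, and an actual implementation may prune the search well below these worst-case counts.

```python
from math import comb


def search_cost(n_vars, rank):
    """Worst-case counts for one rank: (cluster size, candidate sets, minors per set)."""
    size = 2 * (rank + 1)
    candidates = comb(n_vars, size)
    # Each unordered split into two halves of size r+1 gives one minor.
    minors_each = comb(size, size // 2) // 2
    return size, candidates, minors_each


for rank in (1, 2, 3):
    size, candidates, minors_each = search_cost(20, rank)
    print(f"rank {rank}: size {size}, {candidates} candidate sets, "
          f"{minors_each} minors each")
```

With 20 variables this gives 4845 candidate sets at rank 1, 38760 at rank 2, and 125970 at rank 3, with 3, 10, and 35 minors per candidate respectively.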

24.6. Parameters in Tetrad

Parameter	Description
`alpha`	Significance level for vanishing-minor tests (tetrads, sextads, etc.).
`ess`	Equivalent sample size used in rank/minor calculations.
`verbose`	If true, prints intermediate clusters and purity checks.
`rMax`	Maximum rank to consider (default is 2 in many use cases).

24.7. Reference

Kummerfeld, E., & Ramsey, J. (2016).
Causal clustering for 1-factor measurement models. Proceedings of KDD.

24.8. Summary

GFFC generalizes FOFC and FTFC by iteratively searching for pure latent clusters of increasing rank. It begins with tetrad-based clustering, removes those variables, then performs sextad-based clustering on the rest, and continues until no more clusters can be formed. This produces a set of mutually exclusive latent clusters, each corresponding to a latent factor with a known number of indicators.