Grid Search (Simulation)

This page describes how to use Grid Search in Tetrad when working from a simulation rather than a fixed dataset.

In simulation-based Grid Search, Tetrad repeatedly generates data from a specified simulation model and evaluates causal discovery algorithms across multiple parameter settings. This workflow is especially useful for method comparison, sensitivity analysis, and benchmarking under controlled conditions.




Step 1: Select a Simulation

  1. Add a Grid Search box to the workspace.

  2. Connect it to a Simulation box.

  3. In the Simulation editor:

    • Choose a graph type (e.g., random DAG, scale-free)

    • Choose a simulation model (e.g., linear Gaussian, nonlinear)

    • Set simulation parameters (number of variables, sample size, noise level, etc.)

Only one simulation may be active at a time.
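
To make the simulation choices above concrete, here is a minimal sketch of what a "random DAG plus linear Gaussian" simulation amounts to. It is not Tetrad's implementation: the function names, the edge probability, and the coefficient range are assumptions chosen for illustration.

```python
import random

def random_dag(num_vars, edge_prob, rng):
    """Random DAG: include each forward edge i -> j (i < j) with probability edge_prob."""
    return [(i, j)
            for i in range(num_vars)
            for j in range(i + 1, num_vars)
            if rng.random() < edge_prob]

def simulate_linear_gaussian(num_vars, edges, sample_size, noise_sd, rng):
    """Sample rows from a linear model with Gaussian noise over the given DAG."""
    parents = {j: [] for j in range(num_vars)}
    for i, j in edges:
        # Hypothetical coefficient range; a real simulation exposes this as a parameter.
        parents[j].append((i, rng.uniform(0.5, 1.5)))
    data = []
    for _ in range(sample_size):
        row = [0.0] * num_vars
        # Index order is a topological order because every edge goes from i to j > i.
        for j in range(num_vars):
            row[j] = sum(w * row[i] for i, w in parents[j]) + rng.gauss(0, noise_sd)
        data.append(row)
    return data

rng = random.Random(0)  # fixed seed so the sketch is repeatable
edges = random_dag(num_vars=5, edge_prob=0.4, rng=rng)
data = simulate_linear_gaussian(5, edges, sample_size=100, noise_sd=1.0, rng=rng)
print(len(edges), "edges;", len(data), "rows of", len(data[0]), "variables")
```

The number of variables, sample size, and noise level in this sketch correspond to the simulation parameters set in the Simulation editor.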


Step 2: Algorithms Tab

In the Algorithms tab:

  1. Click Add Algorithm

  2. Select one or more causal discovery algorithms

  3. Choose compatible tests or scores

  4. Optionally edit algorithm, test, or score parameters

As with data-based Grid Search, parameters may be specified as comma-separated lists, and all combinations will be explored.
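
The "all combinations" behavior is a Cartesian product over the listed values. The sketch below illustrates the idea only. The helper functions are hypothetical and the parameter names are used purely as examples, not as a statement of what any particular algorithm accepts.

```python
from itertools import product

def parse_values(text):
    """Split a comma-separated entry such as "0.01, 0.05" into floats."""
    return [float(part) for part in text.split(",") if part.strip()]

def expand_grid(entries):
    """Return one dict per combination of the listed parameter values."""
    names = list(entries)
    value_lists = [parse_values(entries[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]

# Illustrative parameter names and values only.
grid = expand_grid({"alpha": "0.01, 0.05", "penaltyDiscount": "1, 2, 4"})
print(len(grid))  # 2 values * 3 values = 6 combinations
for setting in grid:
    print(setting)
```

Because the count is the product of the list lengths, adding a third parameter with four values would raise the total from 6 to 24 configurations.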


Step 3: Table Columns Tab

In the Table Columns tab, select statistics and parameters to report.

Because the true graph is known in simulation mode, you may include:

  • Adjacency precision / recall

  • Arrowhead precision / recall

  • Structural Hamming Distance (SHD)

  • Other truth-based performance measures
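
For intuition, the truth-based measures listed above can be sketched for the simplest case, where both the true and the estimated graph are DAGs given as sets of directed edges. This is an illustrative simplification: Tetrad's own definitions also handle CPDAGs and PAGs, and SHD in particular has several variants. The one below counts missing adjacencies, extra adjacencies, and misoriented shared edges.

```python
def adjacencies(edges):
    """Undirected skeleton: each edge (a, b) becomes the unordered pair {a, b}."""
    return {frozenset(edge) for edge in edges}

def precision_recall(true_items, est_items):
    """Precision = TP / |estimated|, recall = TP / |true|; NaN if a denominator is 0."""
    tp = len(true_items & est_items)
    precision = tp / len(est_items) if est_items else float("nan")
    recall = tp / len(true_items) if true_items else float("nan")
    return precision, recall

def adjacency_precision_recall(true_edges, est_edges):
    return precision_recall(adjacencies(true_edges), adjacencies(est_edges))

def arrowhead_precision_recall(true_edges, est_edges):
    # In a DAG, an edge (a, b) means a -> b, so matching edges means matching arrowheads.
    return precision_recall(set(true_edges), set(est_edges))

def shd(true_edges, est_edges):
    """Simplified SHD: missing + extra adjacencies + misoriented shared adjacencies."""
    true_adj, est_adj = adjacencies(true_edges), adjacencies(est_edges)
    shared = true_adj & est_adj
    misoriented = sum(1 for edge in est_edges
                      if frozenset(edge) in shared and edge not in true_edges)
    return len(true_adj ^ est_adj) + misoriented

true_edges = {("X", "Y"), ("Y", "Z"), ("X", "W")}
est_edges = {("X", "Y"), ("Z", "Y"), ("W", "Z")}  # one reversed, one extra, one missing

print(adjacency_precision_recall(true_edges, est_edges))  # 2 of 3 adjacencies match
print(arrowhead_precision_recall(true_edges, est_edges))  # only X -> Y matches exactly
print(shd(true_edges, est_edges))  # 3: one missing, one extra, one reversed
```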

You may also include:

  • Markov checking statistics

  • Estimated graph properties (e.g., number of edges)

  • Parameter values

Choose a small, interpretable set of columns to keep comparisons readable.


Step 4: Comparison Tab

In the Comparison tab:

  • Choose a comparison graph type (e.g., DAG, CPDAG, PAG)

  • Select a truth graph or derived graph for evaluation

  • If you sort results by utility, configure the utilities for the truth-based statistics

  • Choose Markov checking options if desired

Truth-based utilities are meaningful here because the ground truth is known.


Step 5: Run Counts and Randomness

Simulation-based Grid Search allows you to specify how many times each configuration is run.

Key options include:

  • Number of runs per configuration

  • Random seed (if reproducibility is desired)

  • Aggregation method (e.g., mean statistics across runs)

Increasing the number of runs makes the aggregated statistics more stable (less run-to-run variance) but increases computation time roughly in proportion.
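
The trade-off between run count and stability can be illustrated with a small sketch. The per-run statistic here is synthetic (a clipped Gaussian standing in for something like adjacency precision), so the numbers are not Tetrad output. The point is only that a fixed seed makes the aggregate reproducible and that the standard error of the mean shrinks as runs are added.

```python
import random
import statistics

def simulate_statistic(rng):
    """Synthetic stand-in for one run's statistic, clipped to [0, 1]."""
    return min(1.0, max(0.0, rng.gauss(0.8, 0.1)))

def aggregate(num_runs, seed):
    """Return (mean, standard error of the mean) across num_runs seeded runs."""
    rng = random.Random(seed)
    values = [simulate_statistic(rng) for _ in range(num_runs)]
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / num_runs ** 0.5
    return mean, std_err

for runs in (10, 100, 1000):
    mean, std_err = aggregate(runs, seed=42)
    print(f"runs={runs:5d}  mean={mean:.3f}  std_err={std_err:.4f}")
```

Since the standard error falls with the square root of the number of runs, halving it takes about four times as many runs.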


Running the Comparison

Click Run Comparison to begin.

For each algorithm and parameter combination, Grid Search will:

  1. Generate data from the simulation

  2. Run the algorithm

  3. Compute selected statistics

  4. Aggregate results across runs

Progress and detailed logs appear in the Verbose Output tab.
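
The loop described above can be sketched end to end with toy stand-ins. Everything here is an assumption made for illustration: the simulation is a fixed three-variable chain, the "algorithm" is a simple correlation threshold rather than a real causal search, and the sketch draws fresh data for every run of every configuration.

```python
import random
import statistics
from itertools import combinations

VARIABLES = ("X", "Y", "Z")
TRUE_ADJ = {frozenset(("X", "Y")), frozenset(("Y", "Z"))}  # toy truth: X -> Y -> Z

def generate_data(rng, n):
    """Step 1: sample n rows from a toy linear Gaussian chain X -> Y -> Z."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 0.8 * x + rng.gauss(0, 1)
        z = 0.8 * y + rng.gauss(0, 1)
        rows.append({"X": x, "Y": y, "Z": z})
    return rows

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def run_algorithm(rows, threshold):
    """Step 2: toy stand-in for a search: keep pairs with |correlation| > threshold."""
    est = set()
    for a, b in combinations(VARIABLES, 2):
        r = pearson([row[a] for row in rows], [row[b] for row in rows])
        if abs(r) > threshold:
            est.add(frozenset((a, b)))
    return est

def adjacency_precision(est_adj):
    """Step 3: fraction of estimated adjacencies that are in the true graph."""
    return len(est_adj & TRUE_ADJ) / len(est_adj) if est_adj else float("nan")

def grid_search(thresholds, num_runs, sample_size, seed):
    rng = random.Random(seed)
    results = {}
    for threshold in thresholds:              # one configuration per parameter value
        per_run = []
        for _ in range(num_runs):
            rows = generate_data(rng, sample_size)
            est = run_algorithm(rows, threshold)
            per_run.append(adjacency_precision(est))
        results[threshold] = statistics.fmean(per_run)  # Step 4: aggregate across runs
    return results

results = grid_search(thresholds=[0.2, 0.55], num_runs=10, sample_size=500, seed=7)
for threshold, mean_precision in results.items():
    print(f"threshold={threshold}: mean adjacency precision={mean_precision:.3f}")
```

In this toy setup the looser threshold tends to admit the indirect X-Z association as an adjacency, which lowers precision. Effects of that kind are what sweeping a parameter is meant to expose.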


Interpreting Simulation Results

Results from simulation-based Grid Search are best interpreted comparatively:

  • Compare algorithms under identical conditions

  • Examine trade-offs between accuracy, complexity, robustness, and consistency

  • Identify regimes where methods perform well or fail

Avoid focusing on single rows; patterns across conditions are more informative.


Common Pitfalls

  • Sweeping too many parameters at once

  • Using too few simulation runs

  • Over-interpreting small differences

  • Ignoring failure cases

Simulation studies are most valuable when they reveal limitations, not just successes.


Summary

Simulation-based Grid Search allows you to:

  • Evaluate causal discovery methods under controlled conditions

  • Use truth-based performance metrics responsibly

  • Understand sensitivity to modeling choices

  • Compare algorithms systematically

It complements data-based Grid Search by answering methodological questions rather than applied ones.


Next Steps

  • Compare results across multiple simulations

  • Vary assumptions systematically

  • Use insights to guide applied analyses on real data