Detail: Simulation types
Tetrad includes several built-in simulators for generating synthetic data from a known causal model. These are mainly used for:
- testing algorithms on data where the "true" graph is known,
- sanity-checking modeling assumptions (linearity, additivity, discreteness, Gaussianity),
- benchmarking and debugging search and estimation code.
Most simulators follow the same high-level pattern:
1. Generate (or accept) a graph, usually a DAG.
2. Assign a structural equation or conditional distribution to each node.
3. Sample exogenous noise terms (or latent randomness).
4. Generate samples in a valid causal (topological) order (or, for time series, in temporal order).
5. Return a dataset (continuous, discrete, mixed, or time series).
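The pattern above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not Tetrad's implementation: the graph, the names `PARENTS`, `topological_order`, `simulate`, and the placeholder mechanism are all assumptions made for the example.

```python
import random

# Hypothetical three-node DAG: X -> Y, X -> Z, Y -> Z (names are illustrative).
PARENTS = {"X": [], "Y": ["X"], "Z": ["X", "Y"]}


def topological_order(parents):
    """Return the nodes ordered so that every parent precedes its children."""
    order, placed = [], set()
    while len(order) < len(parents):
        ready = [n for n in parents
                 if n not in placed and all(p in placed for p in parents[n])]
        if not ready:
            raise ValueError("graph contains a cycle")
        for n in ready:
            order.append(n)
            placed.add(n)
    return order


def simulate(parents, mechanism, n_samples, seed=0):
    """Draw rows by evaluating each node after its parents."""
    rng = random.Random(seed)
    order = topological_order(parents)
    rows = []
    for _ in range(n_samples):
        row = {}
        for node in order:
            noise = rng.gauss(0.0, 1.0)  # exogenous noise term
            row[node] = mechanism(node, [row[p] for p in parents[node]], noise)
        rows.append(row)
    return rows


def sum_of_parents_plus_noise(node, parent_values, noise):
    # Placeholder structural equation: unit coefficient on each parent.
    return sum(parent_values) + noise


data = simulate(PARENTS, sum_of_parents_plus_noise, n_samples=5)
```

Swapping the `mechanism` function is what distinguishes one simulator family from another; the ordering and sampling loop stay the same.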
Below are the main simulation types available in Tetrad, what they assume, and when to use them.
Bayes net
Use when: you want fully discrete data generated from a DAG using conditional probability tables (CPTs).
What it generates
- All variables are discrete.
- Each node is sampled from a multinomial distribution conditional on its parents' discrete states.
- The local conditional distribution is represented as a CPT (or an equivalent discrete parameterization).
Conceptual form
For each node X_i with parents Pa(i), the local distribution is P(X_i | X_Pa(i)).
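As a rough illustration of CPT-based sampling, the sketch below draws rows from a two-node network. The network, the state coding, the probabilities, and the `CPTS` layout are hypothetical choices for the example and do not reflect how Tetrad stores its parameters.

```python
import random

# Hypothetical two-node network Rain -> WetGrass, each with states 0 and 1.
# Each CPT maps a tuple of parent states to a probability row over the child's states.
CPTS = {
    "Rain": {"parents": [], "table": {(): [0.8, 0.2]}},
    "WetGrass": {"parents": ["Rain"], "table": {(0,): [0.9, 0.1], (1,): [0.2, 0.8]}},
}
ORDER = ["Rain", "WetGrass"]  # parents listed before children


def sample_row(rng):
    """Sample one joint observation, node by node, in causal order."""
    row = {}
    for node in ORDER:
        spec = CPTS[node]
        key = tuple(row[p] for p in spec["parents"])  # parent configuration
        probs = spec["table"][key]                    # P(node | parents = key)
        row[node] = rng.choices(range(len(probs)), weights=probs)[0]
    return row


rng = random.Random(0)
data = [sample_row(rng) for _ in range(2000)]
```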
Linear structural equation model
Use when: you want a classic linear SEM-style simulator.
What it generates
- Continuous variables.
- Linear relationships between variables.
- Either Gaussian or non-Gaussian noise.
Model form
X_i = sum_{j in Pa(i)} b_{ij} X_j + E_i.
Noise structure
- Gaussian case: the error terms E_i may be specified with a full covariance matrix, allowing errors to be statistically dependent.
- Non-Gaussian case: the error terms E_i are mutually independent.
Notes
- Allowing correlated Gaussian errors makes this simulator suitable for modeling latent confounding at the noise level.
- With independent non-Gaussian noise, the model aligns more closely with assumptions used in some identifiability results.
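A minimal sketch of the model form, covering only the independent-error case (correlated Gaussian errors are not shown). The coefficients in `B`, the variable names, and the choice of a uniform distribution for the non-Gaussian case are assumptions for illustration.

```python
import random

# Hypothetical coefficients B[child][parent] for the DAG X1 -> X2, X1 -> X3, X2 -> X3.
B = {"X1": {}, "X2": {"X1": 0.8}, "X3": {"X1": -0.5, "X2": 1.2}}
ORDER = ["X1", "X2", "X3"]  # a topological order of the DAG


def draw_noise(rng, gaussian):
    if gaussian:
        return rng.gauss(0.0, 1.0)
    # Uniform on [-sqrt(3), sqrt(3)] has mean 0 and variance 1 but is non-Gaussian.
    return rng.uniform(-3 ** 0.5, 3 ** 0.5)


def sample_linear_sem(n, gaussian=True, seed=0):
    """X_i = sum_j b_ij X_j + E_i, with independent errors E_i."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = {}
        for node in ORDER:
            x[node] = sum(b * x[p] for p, b in B[node].items()) + draw_noise(rng, gaussian)
        rows.append(x)
    return rows


data = sample_linear_sem(5000, gaussian=False)
```

Regressing `X2` on `X1` in the resulting data should recover a slope close to the coefficient 0.8 used above.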
Linear Fisher model
Use when: you want large linear datasets generated using a stimulate-then-settle (equilibrium) mechanism.
What it generates
- Continuous data.
- Linear dependencies.
Conceptual behavior
- The system is repeatedly stimulated with noise.
- Variables are updated according to linear relations.
- Iteration continues until values settle to equilibrium.
- The settled values are recorded as observations.
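One way to picture the stimulate-then-settle idea is a fixed-point iteration: draw a shock, apply the linear update repeatedly until the values stop changing, and record the result. The sketch below makes that assumption explicit; the coefficients `COEF`, the tolerance, and the function names are hypothetical, and Tetrad's actual update schedule may differ.

```python
import random

# Hypothetical coefficients COEF[child][parent] for the chain X0 -> X1 -> X2.
COEF = [{}, {0: 0.7}, {1: -0.4}]


def settle(shock, tol=1e-10, max_iter=1000):
    """Iterate x <- Bx + shock until successive values differ by less than tol."""
    x = [0.0] * len(shock)
    for _ in range(max_iter):
        new_x = [sum(b * x[j] for j, b in COEF[i].items()) + shock[i]
                 for i in range(len(shock))]
        if max(abs(a - b) for a, b in zip(new_x, x)) < tol:
            return new_x
        x = new_x
    raise RuntimeError("did not settle; coefficients may be unstable")


def sample_fisher(n, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        shock = [rng.gauss(0.0, 1.0) for _ in COEF]  # stimulate with noise
        rows.append(settle(shock))                    # settle, then record
    return rows


data = sample_fisher(100)
```

For an acyclic coefficient structure like this one, the iteration reaches its fixed point after a number of passes no larger than the number of variables plus one.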
Nonlinear additive SEM (CAM)
Use when: you want nonlinear causal mechanisms with additive contributions from parents, following the Causal Additive Model (CAM) framework of Bühlmann, Peters, and Ernest.
What it generates
- Continuous data.
- Each parent contributes additively, but possibly nonlinearly.
- Noise is additive and independent.
Model form
X_i = sum_{j in Pa(i)} f_{ij}(X_j) + E_i,
where each f_{ij} is a univariate nonlinear function and E_i is an independent noise term.
Notes
- This is more structured than a general additive-noise model because the nonlinearity is decomposed parent-by-parent.
- Many theoretical results in nonlinear causal discovery are stated for this model class.
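A small sketch of the parent-by-parent decomposition, assuming a three-node graph and two arbitrary univariate functions. The table `F`, the variable names, and the choice of `sin` and squaring are illustrative assumptions, not the functions Tetrad draws.

```python
import math
import random

# Hypothetical per-edge functions F[child][parent] for the graph X1 -> X3 <- X2.
F = {
    "X1": {},
    "X2": {},
    "X3": {"X1": math.sin, "X2": lambda v: v ** 2},
}
ORDER = ["X1", "X2", "X3"]  # parents before children


def sample_cam(n, noise_sd=1.0, seed=0):
    """X_i = sum_j f_ij(X_j) + E_i with independent Gaussian E_i."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = {}
        for node in ORDER:
            # One univariate function per parent, summed, plus additive noise.
            x[node] = sum(f(x[p]) for p, f in F[node].items()) + rng.gauss(0.0, noise_sd)
        rows.append(x)
    return rows


data = sample_cam(1000)
```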
General noise SEM
Use when: you want a flexible nonlinear simulator that does not enforce additive noise.
What it generates
- Continuous data.
- Nonlinear mechanisms where noise can enter the function in a general way.
Model form
X_i = f_i(X_Pa(i), E_i),
where E_i is an exogenous noise term that is independent across nodes but not required to appear additively.
Notes
- Noise may interact with parent variables inside nonlinearities.
- This simulator is useful for stress-testing robustness beyond additive-noise assumptions.
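To make non-additive noise concrete, the sketch below uses a multiplicative mechanism in which the noise scales the parent rather than being added to it. The specific function `x1 * exp(0.5 * e2)` and the names used are assumptions chosen for illustration.

```python
import math
import random


def sample_general_noise(n, seed=0):
    """Two-node example X1 -> X2 where the noise enters X2 multiplicatively."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        x1 = e1
        # X2 = f(X1, E2): the noise rescales the parent, so the spread of X2
        # grows with |X1| and cannot be written as g(X1) + E2.
        x2 = x1 * math.exp(0.5 * e2)
        rows.append({"X1": x1, "X2": x2})
    return rows


data = sample_general_noise(1000)
```

Because the exponential factor is always positive, `X2` keeps the sign of `X1` in every row, while its conditional variance depends on `X1`. That heteroscedasticity is the kind of behavior additive-noise methods do not model.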
Additive noise SEM
Use when: you want a general additive-noise model without the CAM restriction of additive parent contributions.
What it generates
- Continuous data.
- A (possibly multivariate) nonlinear function of all parents, plus additive noise.
Model form
X_i = f_i(X_Pa(i)) + E_i,
where E_i is independent noise.
Contrast with nonlinear additive SEM (CAM)
- CAM: sum of univariate functions, one per parent.
- Additive noise SEM: a single (possibly multivariate) nonlinear function of all parents.
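The contrast can be checked numerically. In the sketch below, `f_additive` is a CAM-style sum of univariate terms and `f_joint` lets the parents interact; both functions, and the helper `interaction`, are hypothetical examples. A mechanism that is a sum of univariate terms always has a zero second-order difference, whereas a joint function generally does not.

```python
import math
import random


def f_additive(x1, x2):
    # CAM-style mechanism: one univariate function per parent, summed.
    return math.sin(x1) + x2 ** 2


def f_joint(x1, x2):
    # General additive-noise mechanism: the parents interact inside one function.
    return math.sin(x1 * x2)


def interaction(f, a, b):
    """Second-order difference; zero for every (a, b) when f is a sum of univariate terms."""
    return f(a, b) - f(a, 0.0) - f(0.0, b) + f(0.0, 0.0)


def sample_anm(n, seed=0):
    """X3 = f(X1, X2) + E3 with a joint nonlinear f and additive noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x3 = f_joint(x1, x2) + rng.gauss(0.0, 1.0)
        rows.append({"X1": x1, "X2": x2, "X3": x3})
    return rows


data = sample_anm(1000)
```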
Lee and Hastie
Use when: you want simulated mixed continuous and discrete data following the Lee and Hastie framework.
What it generates
- A mix of discrete and continuous variables.
- Structured conditional distributions ensuring coherent mixed-type behavior.
Conceptual behavior
- Discrete parents of continuous children primarily affect distributional parameters (e.g., the mean).
- Continuous parents influence continuous children in a regression-like way.
- Discrete children are generated from appropriate discrete conditional models.
Conditional Gaussian
Use when: you want mixed discrete/continuous data from a conditional Gaussian model.
What it generates
- Variables designated as discrete or continuous.
- Continuous variables are Gaussian conditional on discrete parent configurations.
Conceptual form
X_i | (D=d, C=c) ~ N(mu(d,c), Sigma(d)),
where D and C denote the discrete and continuous parents of X_i, with mu often linear in c for each discrete configuration d.
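A minimal sketch of the conditional form for one continuous child with one discrete parent D and one continuous parent C. The parameter table `PARAMS`, the marginal distributions of D and C, and all numeric values are assumptions for illustration.

```python
import random

# Hypothetical parameters per discrete configuration d:
# (intercept, slope on c, standard deviation).
PARAMS = {0: (0.0, 1.0, 0.5), 1: (3.0, -1.0, 2.0)}


def sample_cg(n, seed=0):
    """Draw (D, C, X) with X | (D=d, C=c) ~ N(intercept_d + slope_d * c, sd_d^2)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        d = rng.choices([0, 1], weights=[0.6, 0.4])[0]  # discrete parent
        c = rng.gauss(0.0, 1.0)                          # continuous parent
        intercept, slope, sd = PARAMS[d]
        x = rng.gauss(intercept + slope * c, sd)         # mean is linear in c given d
        rows.append({"D": d, "C": c, "X": x})
    return rows


data = sample_cg(4000)
```

Each discrete configuration selects its own regression line and its own noise scale, which is what makes the model Gaussian only conditionally on D.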
Time series
Use when: you want temporally ordered data with lagged dependencies.
What it generates
- Time-indexed variables.
- Dependencies across time lags.
Conceptual form
X_i(t) = f_i({X_j(t-l)}) + E_i(t),
where l ranges over specified lags and E_i(t) are innovation terms.
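The simplest instance of this form is a linear model with a single lag (l = 1). The sketch below assumes two variables, a hypothetical coefficient matrix `A`, Gaussian innovations, and a burn-in period to discard the influence of the arbitrary starting state.

```python
import random

# Hypothetical lag-1 coefficients A[i][j]: effect of X_j(t-1) on X_i(t).
A = [[0.5, 0.0],
     [0.3, 0.4]]


def sample_var1(n_steps, burn_in=50, seed=0):
    """Generate X(t) = A X(t-1) + E(t) in temporal order and return the post-burn-in rows."""
    rng = random.Random(seed)
    x = [0.0, 0.0]  # arbitrary initial state
    series = []
    for t in range(burn_in + n_steps):
        # Every component is computed from the previous time step's values.
        x = [sum(A[i][j] * x[j] for j in range(2)) + rng.gauss(0.0, 1.0)
             for i in range(2)]
        if t >= burn_in:
            series.append(x)
    return series


series = sample_var1(200)
```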
Choosing a simulator
Discrete only: Bayes net
Linear continuous: Linear structural equation model or Linear Fisher model
Nonlinear additive (parent-wise): Nonlinear additive SEM (CAM)
Nonlinear additive (general): Additive noise SEM
Nonlinear with general noise injection: General noise SEM
Mixed discrete/continuous: Lee and Hastie or Conditional Gaussian
Temporal structure: Time series