Data Types and Formats

Tetrad accepts several file formats for loading datasets, covariance matrices,
and correlation matrices. This page documents all currently supported formats.

The goal is to eliminate common parsing errors and clarify exactly what Tetrad expects.


1. Overview of Supported Formats

Tetrad supports:

  • Tabular data files (continuous, discrete, or mixed)

  • Covariance matrices (full square or legacy lower-triangular)

  • Correlation matrices (full square or legacy lower-triangular)

This page covers these data formats.


2. Dataset Format (Tabular Data)

Tabular datasets must follow this structure:

  1. Header row: variable names

  2. One data row per case

  3. Values separated by tabs, commas, or spaces

Example:

X Y Z
1.2 5.0 3
0.9 4.8 3
1.4 5.1 2

Notes

  • Missing values are written by default as: *, though other options are also supported.

  • Mixed data (continuous + discrete) is allowed.

  • Variable order in the header determines column order throughout the file.

  • It is important not to end a line with a delimiter, as this can cause parsing errors.


3. Discrete Data

Discrete data follows the same structure but the values must be integers or strings
representing categories.

Example:

A B C
0 1 yes
1 0 no
2 1 yes

4. Continuous Data

Continuous data uses the same tabular format, but all values should be numeric:

HEIGHT WEIGHT AGE
66.1 150.2 33
70.3 182.4 44

5. Covariance and Correlation Matrices

Tetrad now supports two equivalent file formats for covariance and correlation matrices:

  1. A full square matrix, preceded by a sample-size line and a variable-name line (current default)

  2. A lower-triangular matrix preceded by a sample-size line and a variable-name line (still fully supported for backward compatibility)

Both formats use the same header:

  1. Sample size (integer, on its own line)

  2. Variable names (space-separated or tab-separated)

  3. Matrix values (either full square or lower-triangular)

Any deviation from this structure will cause Tetrad to reject the file.


5.1 Required Structure

A valid covariance/correlation file must contain:

  1. Sample size (integer, on its own line)

  2. Variable names (space-separated or tab-separated)

  3. Matrix body, in one of the two supported shapes:

Here p is the number of variables (the number of names on the header line).

  • Lower-triangular format

    • Row 1: 1 value

    • Row 2: 2 values

    • …

    • Row p: p values

  • Full square format (current default)

    • Row 1: p values

    • Row 2: p values

    • …

    • Row p: p values


5.2 Lower Triangle Covariance Matrix Example

Below is a complete, valid lower-triangular covariance file.
This is a classic example that has been used by Tetrad historically.

164
ABILITY GPQ PREPROD QFJ SEX CITES PUBS
1.0
 .62 1.0
 .25 .09 1.0
 .16 .28 .07 1.0
 -.10 .00 .03 .10 1.0
 .29 .25 .34 .37 .13 1.0
 .18 .15 .19 .41 .43 .55 1.0

Important Characteristics of the Lower Triangular Format

  • Line 1: sample size (164)

  • Line 2: variable names (7 variables)

  • Lines 3–9: lower triangle of the covariance matrix

  • Whitespace before numbers is acceptable

  • Diagonal entries must be included (1.0 above)

  • No commas (1,0 is invalid)

  • No extra blank lines at the end

  • The number of rows in the lower triangle must equal the number of variables

  • It is important not to end a line with a delimiter, as this can cause parsing errors.

This format is still accepted for both covariance and correlation matrices.


5.3 Full Square Covariance Matrix Example (Current Default)

When a covariance matrix is exported from Tetrad, it is now written in full square format.
The same example as above, written as a full 7 Γ— 7 matrix:

164
ABILITY GPQ PREPROD QFJ SEX CITES PUBS
1.00   0.62  0.25  0.16 -0.10  0.29  0.18
0.62   1.00  0.09  0.28  0.00  0.25  0.15
0.25   0.09  1.00  0.07  0.03  0.34  0.19
0.16   0.28  0.07  1.00  0.10  0.37  0.41
-0.10  0.00  0.03  0.10  1.00  0.13  0.43
0.29   0.25  0.34  0.37  0.13  1.00  0.55
0.18   0.15  0.19  0.41  0.43  0.55  1.00

You may use tabs, spaces, or the chosen delimiter; the key requirement is that:

  • The matrix is square, with exactly p values on each of p lines,

  • The order of variables matches the header line.

  • It is important not to end a line with a delimiter, as this can cause parsing errors.


5.4 Correlation Matrices

Correlation matrices use the same two formats as covariance matrices:

  • Full square matrix (current export default)

  • Lower triangle (still accepted)

Example (full square):

500
X Y Z
1.0  0.20 -0.10
0.20 1.0   0.35
-0.10 0.35 1.0

Lower-triangular example:

500
X Y Z
1.0
 .20 1.0
 -.10 .35 1.0

5.5 Common Parsing Errors for Covariance/Correlation Files

Users often encounter:

  • Missing sample size (first line must be an integer)

  • Variable names not matching matrix dimensions

    • For square format: not exactly p values per row, or not exactly p rows

    • For lower-triangular format: wrong number of values in a given row

  • Mixing square and triangular conventions in a single file

  • Trailing blank lines

  • Use of commas instead of decimal points

  • Extra text or comments in the matrix region

Each of these produces a β€œCould not parse covariance matrix” error.


6. Lower-Triangular Format

The lower-triangular matrix format is kept as a legacy option:

  • All existing covariance and correlation files using the old lower-triangular convention
    will continue to load without modification, provided the user selects the Lower Triangular option.

  • New users are encouraged to use the square format, which aligns with conventions in R and Python.

  • Internally, Tetrad treats both formats as representing the same symmetric matrix.

There is no need to convert existing lower-triangular files unless you prefer the square style.


6.1 Note on GUI Display

In the Tetrad GUI, covariance and correlation matrices are currently displayed using only the lower triangle, even if they were loaded from a full square matrix file. This is purely a display choice:

  • The matrix is still treated as a full symmetric covariance/correlation matrix internally.

  • Only the lower triangle is shown to reduce redundancy and to emphasize that the object is a covariance or correlation matrix.

This behavior is independent of whether the file itself was in square or lower-triangular format.


7. Exporting Data from Tetrad

Tetrad can export:

  • Tabular datasets

  • Covariance matrices

  • Correlation matrices

Export behavior:

  • Covariance and correlation matrices are now exported in full square format by default.

  • The lower-triangular format is no longer used for export, but is still supported for import.

This means:

  • You can load either full square or lower-triangular files.

  • When you save from Tetrad, you will get the square form.


8. Summary

Format Type

Supported?

Notes

Tabular data

βœ”

Standard space/tab/comma-separated file

Lower Triangle covariance/correlation matrix

βœ” (legacy)

Fully supported for backward compatibility

Full covariance/correlation square matrix

βœ”

Current default import/export format

Graph formats

βœ”

See graph file section