# Example: Auto MPG Analysis with Grid Search

This page walks through a complete causal analysis workflow in Tetrad using the **Auto MPG** dataset.  
It illustrates how to move from data exploration to model selection using **Grid Search** and **Markov checking**, following the default workflow recommended in this manual.

The goal is not to identify a single “true” causal graph, but to show how to arrive at a **minimal, Markov-consistent model** under clearly stated assumptions.

---

## 1. The Auto MPG Dataset

We use the Auto MPG dataset from the CMU causal datasets repository:

- Repository:  
  https://github.com/cmu-phil/example-causal-datasets
- Data file used:  

  `real/auto-mpg/data/auto-mpg.data.mixed.max.3.categories.txt`

### Data Preparation

Before loading the data into Tetrad, we made two simple preprocessing decisions:

1. **Removed the car name field**, which serves as an identifier and is not meaningful for causal modeling.
2. **Removed rows with missing values**, to keep the example focused on the core workflow rather than missing-data handling.
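If you prefer to script these two steps instead of editing the file by hand, a minimal sketch follows. It is not the procedure used to produce the file above: the column name `car_name`, the missing-value markers, the tab delimiter, and the helper `clean_rows` are all assumptions for illustration, and the sample rows are made up.

```python
import csv
import io

def clean_rows(text, id_column="car_name",
               missing_markers=("", "?", "NA", "*"), delimiter="\t"):
    """Drop the identifier column and any row that has a missing value.

    All defaults are assumptions; adjust them to match your file.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    kept_columns = [c for c in reader.fieldnames if c != id_column]
    cleaned = []
    for row in reader:
        values = [(row[c] or "").strip() for c in kept_columns]
        if any(v in missing_markers for v in values):
            continue  # skip rows with at least one missing value
        cleaned.append(dict(zip(kept_columns, values)))
    return kept_columns, cleaned

# Made-up sample rows, only to exercise the function.
sample = (
    "mpg\thorsepower\torigin\tcar_name\n"
    "20.0\t100\t1\tcar a\n"
    "25.0\t?\t2\tcar b\n"
    "30.0\t70\t3\tcar c\n"
)
columns, rows = clean_rows(sample)
print(columns)
print(len(rows))
```

To use it on a real file, read the file contents into a string and pass that string as `text`.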

The resulting dataset contains:

- Several continuous variables (e.g., `mpg`, `weight`, `horsepower`)
- One discrete variable (`origin`) with **three categories**

Because of this mixture, the data should be loaded as **mixed data** with a maximum of **3 categories**, as indicated in the file name.
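The "maximum categories" setting can be pictured as a simple rule: a column with at most that many distinct values is treated as discrete, and any other column as continuous. The sketch below illustrates that rule only. The function `classify_columns` is hypothetical, the rows are made up, and Tetrad's loader may apply additional checks.

```python
def classify_columns(rows, max_categories=3):
    """Label each column 'discrete' or 'continuous' by its number of distinct values."""
    kinds = {}
    for name in rows[0]:
        distinct = {row[name] for row in rows}
        kinds[name] = "discrete" if len(distinct) <= max_categories else "continuous"
    return kinds

# Made-up rows: 'origin' takes three values, 'mpg' takes more than three.
toy_rows = [
    {"mpg": "18.0", "origin": "1"},
    {"mpg": "24.5", "origin": "2"},
    {"mpg": "31.0", "origin": "3"},
    {"mpg": "27.2", "origin": "1"},
    {"mpg": "15.5", "origin": "2"},
]
kinds = classify_columns(toy_rows, max_categories=3)
print(kinds)
```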

---

## 2. Loading and Exploring the Data in Tetrad

1. Load the dataset into a **Data box**.
2. Specify that the data are **mixed**, with a maximum of 3 categories.

### Visual Exploration

![Plot matrix for the Auto MPG data.](../../_static/images/tetrad-interface/box-by-box/example-data-plotmatrix.png)

Using the **Plot Matrix** tool in the Data box, we observe:

- Strong, approximately **linear relationships** among many pairs of variables
- No obvious nonlinear clusters or sharp discontinuities
- Patterns consistent with additive, roughly Gaussian noise

These observations suggest that **linear-Gaussian modeling assumptions** are reasonable for this dataset, even though one variable is discrete.

---

## 3. Algorithm Choice and Assumptions

### Causal Sufficiency

For this example, we **assume causal sufficiency**:

- All major common causes of the measured variables are assumed to be observed.
- We therefore search for a **CPDAG** (a Markov equivalence class of DAGs), rather than a PAG.

This is a modeling assumption made for illustration, not a conclusion drawn from the data; it simplifies the workflow, and we treat it as reasonable for this dataset.

---

### Algorithm: BOSS

We choose **BOSS**, a score-based search algorithm, because:

- It performs well in linear settings
- It is fast enough to rerun across many parameter settings
- It integrates naturally with score-based model comparison

---

### Score: Degenerate Gaussian BIC

Based on data exploration, we select the **Degenerate Gaussian BIC** score:

- It is appropriate for **mixed data**
- It aligns with the approximately linear structure seen in the plot matrix
- It supports Markov checking via the **DG-LRT** test

---

## 4. Setting Up the Grid Search

### Step 1: Connect the Data

- Draw an edge from the **Auto MPG Data box** to a **Grid Search** box.

This configures Grid Search to operate directly on the dataset.

---

### Step 2: Algorithms Tab

1. Go to the **Algorithms** tab.
2. Click **Add Algorithm**.
3. Select **BOSS**.
4. Choose **Degenerate Gaussian BIC** as the score.

For now, leave the parameter ranges at their defaults; they are set in Step 5.

---

### Step 3: Table Columns Tab

1. Go to the **Table Columns** tab.
2. Click **Add Table Column(s)**.
3. In the dialog, click **Markov Check Columns**.

> **Note:** Because of a current UI issue, you must scroll to the bottom of the dialog to make sure all relevant columns are selected.  
> (This will be addressed in a future release.)

4. Click **Add**.

The selected Markov-check statistics should now appear in the table-columns list.

---

### Step 4: Comparison Tab (Initial Setup)

In the **Comparison** tab:

- Set **Comparison Graph Type** to **CPDAG**
- Set **Sort by Utility** to **Yes**
- Set **Markov Checker Test** to  
  **DG-LRT (Degenerate Gaussian Likelihood Ratio Test)**

If you open **Edit Utilities**, you will see that default utilities for the Markov-check statistics are already configured.

---

### Step 5: Set Parameter Ranges

Return to the **Algorithms** tab:

1. Click **Edit Parameters**
2. Open the **Scores** section
3. For **Penalty Discount**, enter:

```
1, 1.2, 1.4, 1.6, 1.8, 2, 2.2, 2.4, 2.6, 2.8, 3, 3.2, 3.4, 3.6
```

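Lower penalty discounts penalize edges less and tend to produce denser graphs; higher values tend to produce sparser ones. This range therefore spans models from relatively dense to relatively sparse.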
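The list above is an evenly spaced grid from 1 to 3.6 in steps of 0.2. If you want to try a different range, a short script can generate the comma-separated text to paste into the field. The helper `penalty_grid` is hypothetical and not part of Tetrad.

```python
def penalty_grid(start=1.0, stop=3.6, step=0.2):
    """Return evenly spaced values from start to stop, inclusive."""
    count = round((stop - start) / step) + 1
    # Round each value to suppress floating-point noise such as 1.6000000000000001.
    return [round(start + i * step, 10) for i in range(count)]

values = penalty_grid()
grid_text = ", ".join(f"{v:g}" for v in values)
print(grid_text)
```

Computing the number of steps first, rather than adding `step` repeatedly, avoids accumulating rounding error and an accidental off-by-one at the upper end.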

---

## 5. Running the Comparison

1. Go back to the **Comparison** tab.
2. Click **Run Comparison**.

Grid Search will:

- Run BOSS for each penalty-discount value
- Compute Markov-check statistics for each resulting CPDAG
- Summarize the results in a comparison table

![Grid Search comparison results for the Auto MPG data.](../../_static/images/tetrad-interface/box-by-box/example-data-comparison.png)

---

## 6. Interpreting the Comparison Results

In the comparison table, two columns are especially informative:

- **MC-KSPass**  
  Indicates whether the model passes the Markov check (1 = pass, 0 = fail)
- **#EdgesEst**  
  The number of edges in the estimated graph, used here as a measure of model complexity
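As background, the sketch below shows one way a pass/fail check of this kind can work. It assumes that the "KS" in the column name refers to a Kolmogorov-Smirnov test of whether the p-values from the independence tests implied by the graph look uniform on [0, 1]. The function names, the 0.05 level, and the large-sample critical value are illustrative choices, not Tetrad's implementation, and the p-values are synthetic.

```python
import math

def ks_uniform_statistic(pvals):
    """Kolmogorov-Smirnov distance between the sample and Uniform(0, 1)."""
    xs = sorted(pvals)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

def passes_ks(pvals, crit_coeff=1.358):
    """Pass if the statistic is below the large-sample critical value for alpha = 0.05."""
    return ks_uniform_statistic(pvals) <= crit_coeff / math.sqrt(len(pvals))

# Synthetic p-values: one evenly spread set, one piled up near zero.
spread_out = [(i + 0.5) / 40 for i in range(40)]
near_zero = [0.001 * (i + 1) for i in range(40)]
print(passes_ks(spread_out), passes_ks(near_zero))
```

P-values that pile up near zero mean that many of the independencies implied by the graph are rejected by the data, which is what a failed check signals.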

A common pattern is visible:

- Very sparse models fail Markov checking
- Very dense models pass but are difficult to interpret
- Several intermediate models pass Markov checks

---

### Choosing a Model

Among the rows where **MC-KSPass = 1**, select the model with the **fewest edges**.

In this example, that is the row with **Algorithm = 8**.

This choice represents a **minimal Markov-consistent CPDAG** under the stated assumptions.
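The selection rule can be written as a short filter-then-minimize step. The sketch below uses made-up rows, not the values from the comparison table shown earlier, and the helper `select_minimal_passing` is hypothetical. The dictionary keys mirror the column names `MC-KSPass` and `#EdgesEst`.

```python
def select_minimal_passing(rows):
    """Among rows that pass the Markov check, return the one with the fewest edges."""
    passing = [r for r in rows if r["MC-KSPass"] == 1]
    if not passing:
        return None  # no model passed; revisit the assumptions or the parameter range
    # min() returns the first row among ties, so earlier rows win a tie.
    return min(passing, key=lambda r: r["#EdgesEst"])

# Illustrative rows only; these numbers are invented for the sketch.
toy_table = [
    {"Algorithm": 1, "MC-KSPass": 1, "#EdgesEst": 15},
    {"Algorithm": 2, "MC-KSPass": 1, "#EdgesEst": 12},
    {"Algorithm": 3, "MC-KSPass": 1, "#EdgesEst": 10},
    {"Algorithm": 4, "MC-KSPass": 0, "#EdgesEst": 8},
    {"Algorithm": 5, "MC-KSPass": 0, "#EdgesEst": 6},
]
choice = select_minimal_passing(toy_table)
print(choice["Algorithm"])
```

If several passing rows share the minimum edge count, compare the corresponding graphs directly rather than relying on row order.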

![Selected CPDAG for the Auto MPG example.](../../_static/images/tetrad-interface/box-by-box/example-data-graph8.png)

---

## 7. Viewing the Selected Graph

1. Open the **View Graphs** tab.
2. Select **Algorithm = 8**.

The displayed graph is the final candidate model for this analysis.

---

## 8. What This Example Illustrates

This worked example demonstrates a **complete default workflow** in Tetrad:

1. Explore the data visually
2. Make assumptions explicit
3. Use Grid Search to explore parameter sensitivity
4. Evaluate models using Markov checking
5. Select a minimal model that passes diagnostics

---

## 9. Next Steps

From here, you might:

- Explore alternative assumptions (e.g., allowing latent variables)
- Inspect Markov-check violations in more detail
- Incorporate background knowledge and rerun the analysis
- Use the selected structure for causal effect estimation