< Exercise 01 >

< Data Detective >

Crack open a real CDC dataset · Find something a policymaker needs to see

Time: ~15 minutes

Tools: Your Claude account + the dataset below

Coding: Zero

What This Is

You have a real dataset from the CDC. It tracks fine particulate matter (PM2.5), airborne particles 2.5 micrometers or smaller in diameter that are tiny enough to get into your lungs and bloodstream, across five localities in and around Charlottesville from 2003 to 2011.

The localities: Albemarle County, Charlottesville city, Fluvanna County, Greene County, and Nelson County.

Nine years of monthly air quality readings. About 500 rows. Messy government formatting.

Your job: use Claude to go from “I have a CSV” to “I have something I could show a policymaker” — in three steps.

Download the Dataset

↓ Fine_Particulate_Matter_PM2.5_2003-2011.csv

This is real data pulled directly from the CDC WONDER database — the same system you’ll use throughout your career to look up mortality rates, environmental exposures, birth outcomes, and more. The file has some quirks (a text footer, “Total” rows mixed in, government-style formatting). That’s on purpose. Real data is messy. Claude handles it.
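You won't write any code in this exercise, but if you're curious what "handling it" might look like behind the scenes, here is a minimal Python sketch. It is an illustration only, not what Claude actually runs: the column names, the sample values, and the "---" footer marker are all assumptions made up for this example, and the real file may be laid out differently.

```python
import csv
import io

# Hypothetical miniature of a CDC WONDER-style export. Column names, values,
# and footer text are invented for illustration; the real file may differ.
SAMPLE = '''"Notes","County","Year","Month","Avg PM2.5"
,"Albemarle County, VA","2003","Jan","11.2"
,"Albemarle County, VA","2003","Feb","10.4"
"Total","Albemarle County, VA","2003",,"10.8"
,"Charlottesville city, VA","2003","Jan","12.1"
"Total",,,,"11.2"
"---"
"Dataset: Fine Particulate Matter (PM2.5)"
'''

def clean_rows(text):
    """Keep only monthly data rows; drop "Total" rows and the text footer."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for fields in reader:
        # Assumed footer marker: everything from "---" onward is notes, not data.
        if fields and fields[0] == "---":
            break
        # Skip blank or malformed lines that don't match the header width.
        if len(fields) != len(header):
            continue
        record = dict(zip(header, fields))
        # "Total" rows are summaries mixed in with the monthly readings.
        if record["Notes"] == "Total":
            continue
        record["Avg PM2.5"] = float(record["Avg PM2.5"])
        rows.append(record)
    return rows

cleaned = clean_rows(SAMPLE)
for row in cleaned:
    print(row["County"], row["Month"], row["Avg PM2.5"])
```

The idea is simply to stop reading at the footer and skip the summary rows, so that only the monthly readings remain for analysis. In the exercise itself, you just ask Claude in plain language.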

Step 1: Understand

Upload the CSV to Claude and send this:

I just uploaded a CSV from the CDC WONDER database. It contains air quality data (PM2.5 levels) for localities around Charlottesville, Virginia from 2003-2011. Can you review the file, explain what we’re looking at, and flag anything weird about the formatting? Don’t clean anything yet — just tell me what you see.

Read what Claude tells you. Notice how it spots the footer rows, the “Total” rows, the encoding quirks. This is what it means to have an AI collaborator handle the data wrangling so you can focus on the questions that matter.

Then ask Claude whatever you’re curious about. Some ideas — but follow your own instinct:

Which locality has the worst air quality overall? Is it getting better or worse over time?

Are there months where the air quality is dramatically worse? Is it the same months every year?

How does Charlottesville city compare to the surrounding rural counties?

Is there anything in this data that would concern you if you were a primary care physician seeing asthma patients in this area?

Spend a few minutes exploring. Let Claude make charts if it offers to. Ask follow-up questions. Find a thread that interests you — a pattern, a comparison, a surprise in the data.

Step 2: Prototype

Once you’ve found something interesting, ask Claude to help you visualize it:

Based on what we’ve been exploring, I want to create a simple visual dashboard that a local health clinic could use. Pick the most important finding from our analysis and create a clear chart that communicates it. Then describe what a fuller dashboard would include — what would a clinic want to see at a glance during high-risk months?

Claude will generate a visualization and describe the concept. If you don’t love it, push back:

“Make it simpler — a clinic receptionist needs to understand this in 5 seconds.”
“Focus on the seasonal pattern instead.”
“What if we compared just Charlottesville vs. Albemarle?”

This is design thinking. You’re iterating on a prototype using natural language.

Step 3: Communicate

Now turn your finding into something actionable:

Take the most compelling insight from our analysis and draft a short public health brief — something I could hand to a local health policymaker or clinic director. Include: a clear headline, 3 key findings in plain language, and one specific recommendation for how clinics could use this data to better protect patients during high-risk periods.

Read the output. If it feels too generic, tell Claude:

Make it more specific to our local data. I want someone from Albemarle County to read this and think “this is about us.”

You’re Done

Post your output to #hds-general on Slack. Share whichever piece you’re most proud of — the chart, the brief, or even just a screenshot of an interesting exchange with Claude. Add one sentence about what surprised you.

Don’t worry about getting it “right.” The point is speed, curiosity, and seeing what’s possible when domain knowledge meets AI. We’ll discuss everyone’s findings on Day 1.

Want More?

Go to CDC WONDER and pull a dataset on something you care about — opioid mortality, infant birth weight, lead exposure, whatever connects to your future specialty. Run through the same three steps. Bring it Monday.