STOP BUILDING CHATBOTS: PRACTICAL ML WITH CITY DATA

19/12/2025

If I see one more “Talk to your PDF” app on Product Hunt, I might scream.

We are living through the greatest democratization of AI in history, and it feels like 90% of engineers are using it to generate marketing copy. Meanwhile, our cities are generating terabytes of high-value, structured data that is begging for analysis, and it’s completely free.

I recently downloaded the Austin Crash Report Data (a publicly available CSV). It contains every reported traffic incident in Austin, Texas, for the last ten years. That is not just data; that is a map of human risk.

Instead of asking an LLM to “write a poem about traffic,” I used boring, old-school Machine Learning to find out where cyclists are actually getting hit.

The “Boring” Stack (No Installs Required)

You don’t need a GPU cluster, an OpenAI API key, or even a local Python environment for this. You just need a web browser and Google Colab.

Within Colab, you’ll leverage:

DuckDB: To crunch the 10-year CSV file efficiently (referenced in my modern data stack guide). Install it in Colab with !pip install duckdb.
Scikit-Learn: To train a Random Forest classifier. This comes pre-installed in Colab.
Python: To glue it all together. Also pre-installed.

Ready to try it yourself? You can start by downloading the Austin Crash Report Data and following along with the analysis using DuckDB and Scikit-Learn in Google Colab.

The Insight

I filtered the data for “Vulnerable Road Users” (cyclists and pedestrians). Then, I trained a model to predict accident severity based on features like:

Time of day
Weather conditions
Road geometry (intersection vs. straightaway)
Speed limit
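A rough sketch of what training on those four features could look like in Scikit-Learn. Everything here is an assumption for illustration: the data is synthetic, the column names are invented, and the labeling rule is made up so the model has something to learn. It is not the real dataset or the exact model behind this post.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in for the filtered crash table (hypothetical columns).
crashes = pd.DataFrame({
    "hour": rng.integers(0, 24, n),
    "weather": rng.choice(["clear", "rain", "fog"], n),
    "geometry": rng.choice(["intersection", "straightaway"], n),
    "speed_limit": rng.choice([25, 30, 35, 45, 55], n),
})

# Made-up labeling rule plus noise, purely so there is a signal to fit.
risk = (
    0.02 * crashes["speed_limit"]
    + 0.5 * (crashes["weather"] != "clear")
    + rng.normal(0, 0.3, n)
)
crashes["severe"] = (risk > 1.0).astype(int)

X = crashes[["hour", "weather", "geometry", "speed_limit"]]
y = crashes["severe"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# One-hot encode the categorical columns; numeric columns pass through as-is.
preprocess = ColumnTransformer(
    [("categorical", OneHotEncoder(handle_unknown="ignore"), ["weather", "geometry"])],
    remainder="passthrough",
)
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

Keeping the encoder inside the pipeline means the same preprocessing is applied at training and prediction time, which matters once you start scoring new crash records.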

The model didn’t just tell me where accidents happen (we have heatmaps for that). It told me why.

It revealed that certain “safe” neighborhood streets had a higher probability of severe injury than major arterials during specific twilight hours. It highlighted that accidents at 4-way stops were less frequent but far more likely to result in hospitalization than those at traffic lights.
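One common way to ask a fitted model which features it leans on is permutation importance: shuffle one column at a time and measure how much held-out accuracy drops. The sketch below uses synthetic data with a made-up rule (severity depends only on speed limit), so it is an illustration of the technique and not necessarily how the findings above were produced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500
feature_names = ["speed_limit", "hour"]

# Synthetic features (hypothetical names, not the real dataset).
speed_limit = rng.choice([25, 30, 35, 45, 55], n)
hour = rng.integers(0, 24, n)
X = np.column_stack([speed_limit, hour])

# Made-up rule: severity depends only on speed limit, never on hour.
y = (speed_limit >= 45).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
model = RandomForestClassifier(n_estimators=50, random_state=1)
model.fit(X_train, y_train)

# Shuffle each column of the held-out set 10 times and average the accuracy drop.
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=1
)
importances = dict(zip(feature_names, result.importances_mean))

for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Importance scores describe what the model relies on, not what causes crashes, so treat them as leads to investigate on the ground.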

Why This Matters

This is the difference between Generative AI and Predictive AI.

Generative AI (LLMs) creates new content. It is fun, creative, and sometimes useful. Predictive AI (Classic ML) discovers patterns in existing reality. It is boring, math-heavy, and it saves lives.

I used similar techniques in my guide to predicting purchases with BigQuery ML. The math is the same whether you are predicting a shopping cart checkout or a car crash.

A Challenge to Engineers

The next time you open your IDE to build a side project, pause. Do not npm install openai.

Go to data.gov or your local city’s open data portal. Download a CSV about water quality, restaurant inspections, or emergency response times. Build a model that highlights a problem your city council missed.

A chatbot can explain what a pothole is. A logistic regression model can tell you which street is about to crumble.

Build the thing that fixes the street.
