Curious Soul
Project · Personal · Shipped

USGS Earthquake Dashboard

A clean, fast Streamlit dashboard for exploring global seismic activity, backed by a pandas pipeline that turns messy USGS GeoJSON into a reusable analytical dataset.

Python · Streamlit · Docker · pandas · USGS GeoJSON API
01 · The problem & why I built it

Raw USGS data is rich but unreadable.

The USGS publishes a comprehensive real-time earthquake feed. The data is excellent: magnitude, depth, location, tsunami flags, felt reports, alert levels, all of it. The problem is that consuming it directly is painful. The GeoJSON structure is nested, fields are inconsistent, location strings come in formats like "23km SSW of Volcano, Hawaii", and there's no built-in way to filter or chart it.

I wanted a tool that would let me actually look at the data. Filter by country, see where events cluster, check how depth correlates with magnitude, look at monthly trends. Less of an "earthquake monitoring" tool, more of a "give me a fast lens on whatever USGS just published" tool.

02 · Architecture & flow

A pipeline that cleans once, a dashboard that reads fast.

The deliberate split: heavy cleaning runs in a separate pipeline that writes a clean CSV. The dashboard just reads that CSV. That means the Streamlit app starts in under a second and never has to parse raw GeoJSON at request time.

USGS GeoJSON API (real-time feed)
        │
        ▼
pipeline.py
┌────────────────────────────────┐
│ fetch + normalize JSON         │
│ flatten to DataFrame           │
│ timezone + timestamp cleanup   │
│ region / state / country parse │
│ magnitude + depth bucketing    │
│ derived metrics:               │
│   dmin_km, is_felt,            │
│   time_gap_secs, is_above_norm │
└────────────────────────────────┘
        │
        ▼
earthquake_clean.csv
        │
        ▼
app.py
┌────────────────────────────────┐
│ Streamlit dashboard            │
│ ├ KPI: total events            │
│ ├ Geo map of event locations   │
│ ├ Monthly count bar chart      │
│ ├ Magnitude type breakdown     │
│ ├ Magnitude × significance     │
│ └ Depth × magnitude scatter    │
└────────────────────────────────┘
        │
        ▼
Dockerfile (containerized)
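The "normalize" and "flatten" steps at the top of the pipeline might look roughly like the sketch below. This is not the project's actual pipeline.py: the function name flatten_features, the output column names, and the inline sample payload are assumptions made for illustration. The input field names (mag, place, time, felt, dmin, and the longitude/latitude/depth coordinate order) follow the public USGS GeoJSON feed, where time is given in milliseconds since the epoch.

```python
import pandas as pd

# Tiny stand-in for a USGS FeatureCollection so the sketch runs offline.
# The values are made up; only the shape mirrors the real feed.
SAMPLE_FEED = {
    "type": "FeatureCollection",
    "features": [
        {
            "id": "demo0001",
            "properties": {"mag": 4.6, "place": "23km SSW of Volcano, Hawaii",
                           "time": 1700000000000, "felt": 12, "dmin": 0.5},
            "geometry": {"type": "Point", "coordinates": [-155.3, 19.2, 35.0]},
        },
        {
            "id": "demo0002",
            "properties": {"mag": 5.1, "place": "central Mid-Atlantic Ridge",
                           "time": 1700003600000, "felt": None, "dmin": None},
            "geometry": {"type": "Point", "coordinates": [-30.1, 2.4, 10.0]},
        },
    ],
}

# Hypothetical flat schema; the real clean CSV may use different names.
COLUMNS = ["id", "time_ms", "mag", "place", "felt", "dmin",
           "longitude", "latitude", "depth_km"]


def flatten_features(feed: dict) -> pd.DataFrame:
    """Turn nested GeoJSON features into one flat row per event."""
    rows = []
    for feature in feed.get("features", []):
        props = feature.get("properties") or {}
        coords = (feature.get("geometry") or {}).get("coordinates") or []
        # Pad so a missing or short coordinate list cannot raise.
        lon, lat, depth = (list(coords) + [None, None, None])[:3]
        rows.append({
            "id": feature.get("id"),
            "time_ms": props.get("time"),
            "mag": props.get("mag"),
            "place": props.get("place"),
            "felt": props.get("felt"),
            "dmin": props.get("dmin"),
            "longitude": lon,
            "latitude": lat,
            "depth_km": depth,
        })
    df = pd.DataFrame(rows, columns=COLUMNS)
    # Epoch milliseconds -> timezone-aware UTC timestamps.
    df["time"] = pd.to_datetime(df["time_ms"], unit="ms", utc=True)
    return df.drop(columns="time_ms")


events = flatten_features(SAMPLE_FEED)
```

Building plain dict rows first keeps the handling of missing properties and geometry explicit, which matters when fields are inconsistent from one event to the next.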
03 · Key technical decisions & bottlenecks

The constraints came from the data, not the tools.

Decision: Streamlit over a custom React frontend

I needed an interactive dashboard with maps and charts, with the lowest possible setup cost so I'd actually finish.

Why: Streamlit gives me Python-native widgets, built-in caching, and a one-line deployment story. The trade-off is less UI polish, but for a personal analytics tool that's the right call. I'd reach for React only if I needed custom interactions Streamlit can't express.

Bottleneck: USGS location strings are not cleanly parseable

The place field looks like "23km SSW of Volcano, Hawaii" or "central Mid-Atlantic Ridge". There's no formal schema. Sometimes it ends with a country, sometimes a US state, sometimes a geographic region with no political boundary.

Resolution: a multi-step parser in pipeline.py:

  • Split on the rightmost comma to isolate the location tail.
  • Map known US state names → country = "US".
  • Match known country names directly.
  • Tag everything else as "Unknown / Region" rather than guessing.

Why this conservative approach: Wrong country labels would silently break the filter. Better to mark unknowns honestly and surface them as their own bucket than to over-confidently mis-classify.
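The four parser steps can be sketched in plain Python as follows. This is a minimal illustration, not the code in pipeline.py: the function name parse_country and both lookup sets are hypothetical, and the sets are deliberately truncated (a real version would need the full lists of states and countries).

```python
from typing import Optional

# Truncated, illustrative lookups; a real parser would need complete lists.
US_STATES = {"Hawaii", "Alaska", "California", "Nevada"}
KNOWN_COUNTRIES = {"Japan", "Chile", "Indonesia", "Mexico"}
UNKNOWN = "Unknown / Region"


def parse_country(place: Optional[str]) -> str:
    """Classify a USGS place string without guessing."""
    if not place:
        return UNKNOWN
    # Step 1: everything after the rightmost comma is the location tail.
    # With no comma, rsplit returns the whole string as the tail.
    tail = place.rsplit(",", 1)[-1].strip()
    # Step 2: known US state names map to the country "US".
    if tail in US_STATES:
        return "US"
    # Step 3: known country names are used as-is.
    if tail in KNOWN_COUNTRIES:
        return tail
    # Step 4: anything else is labeled honestly rather than guessed.
    return UNKNOWN


labels = [parse_country(p) for p in (
    "23km SSW of Volcano, Hawaii",
    "central Mid-Atlantic Ridge",
)]
```

Exact set membership (rather than substring matching) is what keeps the parser conservative: a tail that is not on a list falls through to the unknown bucket instead of being matched loosely.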

Bottleneck: Doing cleanup on every dashboard load

The first version of the app called the USGS API and ran the full cleaning pipeline on every page render. Load time was 8+ seconds and the dashboard felt unusable.

Resolution: Pre-compute. pipeline.py runs separately (or on a schedule) and writes earthquake_clean.csv to disk. app.py only reads that file. Startup drops to under a second.

Why this matters: It's a small change in architecture but a big change in UX. It also separates concerns properly. The pipeline can fail or get updated without breaking the dashboard.
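One way to make that hand-off robust is sketched below, under stated assumptions. The function names write_clean_csv and load_events are hypothetical, and the write-to-temp-then-rename step is my own addition rather than something the project is known to do; it is one common way to ensure a reader never sees a half-written file.

```python
import os
import tempfile

import pandas as pd


def write_clean_csv(df: pd.DataFrame, path: str) -> None:
    """Pipeline side: write to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=directory)
    os.close(fd)
    try:
        df.to_csv(tmp_path, index=False)
        # os.replace is atomic on the same filesystem, so a reader sees
        # either the old file or the new one, never a partial write.
        os.replace(tmp_path, path)
    finally:
        # Only true if the write failed before the swap happened.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)


def load_events(path: str) -> pd.DataFrame:
    """Dashboard side: read the precomputed CSV and restore timestamps."""
    df = pd.read_csv(path)
    df["time"] = pd.to_datetime(df["time"], utc=True)
    return df
```

In a Streamlit app, a loader like load_events is the kind of function that would typically be wrapped in st.cache_data so repeated reruns reuse the parsed DataFrame.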

Decision: Feature engineering at the pipeline level

Derived metrics like is_felt, time_gap_secs, magnitude buckets, and "higher-than-normal magnitude" flags could live in the dashboard layer.

Approach: Compute them once in the pipeline and store them as columns in the clean CSV. The dashboard reads them directly.

Why: It keeps the dashboard code purely about display. Every derived field is documented in one place. If I ever want to swap Streamlit for something else, the cleaned dataset is portable.
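As a rough illustration of what pipeline-level feature engineering could look like, here is a sketch that adds the derived columns in one function. The definitions are assumptions, not the project's: the bucket edges and labels, the use of the median as the "normal" magnitude, and the function name add_derived_metrics are all hypothetical. The degrees-to-kilometres factor reflects that USGS reports dmin in degrees, with one degree being roughly 111.2 km.

```python
import pandas as pd

KM_PER_DEGREE = 111.2  # approximate length of one degree, per USGS docs

# Hypothetical bucket edges and labels, chosen only for the example.
MAG_BINS = [-float("inf"), 4.0, 5.0, 6.0, float("inf")]
MAG_LABELS = ["minor", "light", "moderate", "strong+"]


def add_derived_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Return a time-sorted copy of df with derived columns added."""
    out = df.sort_values("time").reset_index(drop=True)
    out["dmin_km"] = out["dmin"] * KM_PER_DEGREE
    # Assumed definition: at least one felt report was filed.
    out["is_felt"] = out["felt"].fillna(0) > 0
    # Seconds since the previous event; NaN for the first row.
    out["time_gap_secs"] = out["time"].diff().dt.total_seconds()
    # right=False makes each bucket include its lower edge: [4.0, 5.0).
    out["mag_bucket"] = pd.cut(out["mag"], bins=MAG_BINS,
                               labels=MAG_LABELS, right=False)
    # Assumed definition of "above normal": above the dataset median.
    out["is_above_norm"] = out["mag"] > out["mag"].median()
    return out


raw = pd.DataFrame({
    "time": pd.to_datetime(
        ["2024-01-01 00:10:00", "2024-01-01 00:00:00", "2024-01-01 00:11:30"],
        utc=True,
    ),
    "mag": [5.2, 3.1, 6.4],
    "felt": [12, None, 0],
    "dmin": [1.0, 0.5, None],
})
enriched = add_derived_metrics(raw)
```

Because every derived column is produced in one function, the definitions live in a single place and the resulting frame can be written straight to the clean CSV.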

Decision: Containerizing with Docker

The app has a handful of dependencies (Streamlit, pandas, plotly, requests). Running it on a fresh machine is annoying.

Approach: A small Dockerfile that pins the Python version, installs requirements, and exposes the Streamlit port. One docker run and the dashboard is live.

04 · Results & what they mean

An exploration tool, not a monitoring tool.

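The shipped dashboard gives me a total-events KPI, a geo map of event locations, a monthly count bar chart, a magnitude type breakdown, a magnitude × significance view, and a depth × magnitude scatter.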

The pandas pipeline is the part I'm most proud of. The clean CSV is the reusable artifact. Anyone can pick it up and build their own analysis on top (Tableau, a notebook, a different framework). The dashboard is one consumer. It's not the product.

Engineering lesson that generalizes: separate the data-prep layer from the consumption layer, even on small projects. It costs almost nothing to do early and saves real effort later.

05 · What I'd build next

Where this could go.

See the code

All projects live on GitHub. Issues and PRs welcome.

← vaibhavimutya.github.io Built by me, obviously.