Curious Soul

How I built a tool to solve my own visa problem

The tech industry moves fast and sometimes it moves you out the door before you're ready. I faced two layoffs in three years, both due to restructuring and funding issues that had nothing to do with my work. The first one knocked me down hard. I spent weeks convinced my career was over, that the timing was wrong, that the market was impossible. The second one hit differently. By then I'd learned something: waiting for the right circumstances is a losing strategy. You have to build your way out.

The problem with the standard H1B playbook
For most international candidates in the US, the H1B process feels like a lottery you just have to accept. You apply to large companies, hope your number gets drawn in the annual cap, and if it doesn't, you start the cycle over. What most people don't realize, or don't act on, is that a significant category of employers exists entirely outside that lottery.

Cap-exempt employers (universities, nonprofits, certain research institutions, and teaching hospitals) can sponsor H1B visas year-round with no cap and no lottery. The problem is there's no clean, searchable directory of them. The data exists, but it is buried in Department of Labor LCA disclosure files: multi-year Excel spreadsheets with inconsistent formatting, thousands of rows, and no obvious way to filter for what actually matters. Which organizations have a real track record of sponsoring, and are they currently hiring for roles like mine?

Job boards like HigherEdJobs and Idealist exist, but they don't solve the core problem. In today's restrictive market, many nonprofits aren't sponsoring at all. What I needed was a ranked list of employers with proven sponsorship history, linked directly to where their jobs actually live.

Building the pipeline
The system downloads three years of DOL LCA data, parses each file to identify cap-exempt employers using NAICS codes and keyword filters, aggregates H1B petition counts by employer across years, and ranks the top sponsors. It then uses the Serper Google Search API to locate each employer's careers page and writes everything to a clean CSV.
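The aggregation step is easy to picture in code. This is a minimal sketch, not the actual pipeline: it assumes each petition has already been reduced to a (fiscal year, employer name) pair, and the function names and the name-normalization rule are my own illustrative choices.

```python
from collections import defaultdict


def normalize_employer(name: str) -> str:
    """Collapse case, punctuation, and extra whitespace so name variants merge.

    Assumption: this simple rule is illustrative; real LCA data may need more
    careful handling (suffixes like INC or LLC, campus names, and so on).
    """
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.upper())
    return " ".join(cleaned.split())


def rank_sponsors(rows, top_n=10):
    """Count petitions per employer per year and rank by total.

    rows: iterable of (fiscal_year, employer_name) pairs, one per petition.
    Returns a list of (employer, total, per_year_counts), highest total first;
    ties are broken alphabetically so the order is deterministic.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for year, name in rows:
        counts[normalize_employer(name)][year] += 1

    ranked = sorted(
        ((emp, sum(by_year.values()), dict(by_year)) for emp, by_year in counts.items()),
        key=lambda item: (-item[1], item[0]),
    )
    return ranked[:top_n]
```

Keeping the per-year breakdown next to the total matters here: an employer with petitions in every year is a stronger signal than one with a single large spike.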

The filtering logic is a hybrid: NAICS prefixes for higher education, hospitals, and membership organizations, combined with employer name keywords like "university", "medical center", "institute", "national laboratory". Neither approach alone is sufficient; the combination catches what either misses.
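A rough sketch of that hybrid check is below. The keyword list comes from the description above; the specific NAICS prefixes (6113 for colleges and universities, 622 for hospitals, 813 for membership-type organizations) are my assumption about how those categories would be encoded, and the function name is hypothetical.

```python
# Assumed NAICS prefixes: 6113 = colleges and universities, 622 = hospitals,
# 813 = religious, grantmaking, civic, professional, and similar organizations.
NAICS_PREFIXES = ("6113", "622", "813")

# Keywords taken from the description above.
NAME_KEYWORDS = ("university", "medical center", "institute", "national laboratory")


def is_cap_exempt_candidate(employer_name, naics_code) -> bool:
    """Return True if either the NAICS prefix or a name keyword matches.

    This flags candidates only; a match does not prove an employer is legally
    cap-exempt (for example, "institute" also appears in for-profit names).
    """
    code = str(naics_code or "").strip()
    name = (employer_name or "").lower()
    return code.startswith(NAICS_PREFIXES) or any(kw in name for kw in NAME_KEYWORDS)
```

Using `or` rather than `and` is what makes the combination catch more than either filter alone: a university filed under an unexpected NAICS code still matches by name, and a hospital with an unusual name still matches by code.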

Finding clean careers page URLs turned out to be the most fragile step. Direct Google scraping triggers bot detection. Serper's API solves that, but the results still require filtering: aggregator sites like LinkedIn, Indeed, and Glassdoor get blocked, and URL paths are checked for careers or jobs keywords to improve link quality.
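The result-filtering step might look something like this. It is a sketch under assumptions: it takes a list of result URLs that have already been fetched (the search API call itself is left out), the function name is hypothetical, and the fallback to the first non-aggregator result is my own addition rather than something the pipeline is known to do.

```python
from urllib.parse import urlparse

# Aggregators named above; results on these domains are skipped.
BLOCKED_DOMAINS = ("linkedin.com", "indeed.com", "glassdoor.com")
PATH_KEYWORDS = ("careers", "jobs")


def pick_careers_url(result_urls):
    """Return the first non-aggregator URL whose path mentions careers or jobs.

    Assumption: if no path matches, fall back to the first non-aggregator
    result; returns None when every result is blocked.
    """
    fallback = None
    for url in result_urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        # Match the domain itself and any subdomain (www.linkedin.com, etc.).
        if any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS):
            continue
        if any(kw in parsed.path.lower() for kw in PATH_KEYWORDS):
            return url
        if fallback is None:
            fallback = url
    return fallback
```

Matching on the parsed hostname instead of a substring of the whole URL avoids accidentally blocking an employer page that merely links to or mentions an aggregator in its path or query string.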

The insight I didn't expect
I went into this project thinking the hard problem was finding job listings. It wasn't. The hard problem is knowing which employers are worth applying to in the first place.

Most candidates waste enormous energy applying broadly across hundreds of companies with no sponsorship history, inconsistent hiring patterns, and no real fit signal. Narrowing the universe first, to organizations that have actually sponsored H1Bs consistently over three years, changes the entire strategy. You apply to fewer places with much higher precision.

The output isn't exciting to look at. It's two CSV files. But what those files represent is a fundamentally different way of approaching an international job search: evidence-based, not luck-based.

What's next
A searchable frontend UI so anyone can filter by state, industry, and H1B sponsorship volume without touching a spreadsheet. And an extension to the job scraping layer: more reliable extraction from the university and nonprofit portals that the current Playwright scraper struggles with.