How I built RFP Search, an AI-powered RFP aggregator

2026-03-25 · 6 min read

If you've ever tried to find government RFPs for digital services, you know the pain. Opportunities are scattered across SAM.gov, state procurement portals, Grants.gov, the UN, the EU, and a dozen other places. Each site has its own search interface, its own format, and its own quirks. Checking them all manually is tedious, so I built RFP Search to do it for me.

What it does

RFP Search scrapes 11 government and nonprofit procurement sources every night, runs each opportunity through an AI model to extract structured data, and serves everything through a single searchable interface with filters and an interactive map.

The sources include SAM.gov, Grants.gov, the Texas ESBD, California's Cal eProcure, Florida MFMP, the EU's TED portal, the UN Global Marketplace, NYC Open Data, the Federal Register, USAspending, and Brave Search for broader coverage. Every morning, fresh results are waiting.

The architecture

The whole thing runs on Cloudflare. Three Workers, one D1 database, and zero traditional servers. I structured it as a monorepo with Turborepo and npm workspaces.

The three Workers each have a clear job.

  • rfp-web is the frontend, built with React Router v7 (the Remix successor). It handles the search UI, filters, and an interactive Leaflet map with marker clustering.
  • rfp-api is a Hono REST API that handles search queries, pagination, stats, and geolocation data for the map.
  • rfp-scraper is a scheduled Worker that runs nightly cron jobs to fetch and process RFPs from all sources.

The web Worker talks to the API Worker through Cloudflare Service Bindings. No HTTP, no CORS, no public API endpoint. The binding calls the API directly within Cloudflare's network, which is both faster and more secure.
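In code, a service-binding call looks almost identical to a normal `fetch`, except the request goes to the binding instead of the network. Here's a hedged sketch; the binding name (`API`), env shape, and `/search` route are illustrative, not the project's actual code.

```typescript
// A Fetcher-style service binding: the bound Worker receives the request
// directly inside Cloudflare's network, with no public endpoint involved.
interface Env {
  API: { fetch(request: Request): Promise<Response> };
}

export async function searchRfps(env: Env, query: string): Promise<unknown> {
  // The hostname is irrelevant for routing; the binding decides the destination.
  const req = new Request(`https://internal/search?q=${encodeURIComponent(query)}`);
  const res = await env.API.fetch(req);
  return res.json();
}
```

Because the binding is just an object on `env`, it's also trivial to stub in tests.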

The scraper

This is the most interesting part. Each source is a plugin that implements a simple interface. Want to add a new source? Create a file, implement the interface, register it, and add a database row. That's it.
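A minimal sketch of what such a plugin interface might look like; the type names and fields here are assumptions, not the project's real code.

```typescript
// What a scraper plugin returns before AI normalization.
interface RawRfp {
  externalId: string;
  title: string;
  url: string;
  description?: string;
}

// The contract every source plugin implements.
interface RfpSource {
  name: string;                       // matches the source's database row
  fetchListings(): Promise<RawRfp[]>; // hits an API, scrapes HTML, or reads RSS
}

// Example plugin: a stubbed source returning canned data.
export const demoSource: RfpSource = {
  name: "demo",
  async fetchListings() {
    return [
      { externalId: "demo-1", title: "Website redesign RFP", url: "https://example.gov/rfp/1" },
    ];
  },
};
```

The scraper core only ever sees `RfpSource`, so each source's parsing quirks stay inside its own file.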

The scraper runs in three staggered batches (9:00, 9:15, and 9:30 AM UTC) to stay within Worker CPU time limits. Each batch processes a subset of sources.

  • Batch 1 (9:00 UTC): SAM.gov, TED EU, USAspending
  • Batch 2 (9:15 UTC): Texas ESBD, Cal eProcure, Brave Search, Grants.gov, Federal Register
  • Batch 3 (9:30 UTC): Florida MFMP, UNGM, NYC Open Data
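One way to wire this up: a single scheduled Worker gets all three cron triggers (in wrangler config, something like `crons = ["0 9 * * *", "15 9 * * *", "30 9 * * *"]`) and branches on which cron fired. The sketch below mirrors the batches above, but the source identifiers and structure are mine, not the project's.

```typescript
// Map each cron expression to its batch of sources.
const BATCHES: Record<string, string[]> = {
  "0 9 * * *":  ["sam.gov", "ted-eu", "usaspending"],
  "15 9 * * *": ["texas-esbd", "cal-eprocure", "brave-search", "grants.gov", "federal-register"],
  "30 9 * * *": ["florida-mfmp", "ungm", "nyc-open-data"],
};

export function sourcesForCron(cron: string): string[] {
  return BATCHES[cron] ?? [];
}

// In the Worker, the scheduled handler receives the matching cron string:
// async scheduled(event, env, ctx) {
//   for (const source of sourcesForCron(event.cron)) { /* scrape it */ }
// }
```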

Each source plugin knows how to fetch and parse its own data format. Some hit REST APIs, some scrape HTML, some use RSS feeds. The plugin architecture keeps this complexity contained. If a source changes its format, I only need to update one file.

The scraper also tracks failures. If a source fails three times in a row, it gets automatically deactivated so it doesn't waste cycles or pollute the logs.
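The three-strikes logic is simple enough to express as a pure function. Field names here are assumptions about the schema, not the real columns.

```typescript
interface SourceHealth {
  consecutiveFailures: number;
  active: boolean;
}

const MAX_FAILURES = 3;

// Called after each scrape run: success resets the counter,
// a third consecutive failure deactivates the source.
export function recordRun(health: SourceHealth, succeeded: boolean): SourceHealth {
  if (succeeded) return { consecutiveFailures: 0, active: health.active };
  const failures = health.consecutiveFailures + 1;
  return { consecutiveFailures: failures, active: failures < MAX_FAILURES };
}
```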

AI extraction

Raw RFP listings are messy. Some have detailed descriptions, some have just a title and a link. I use Cloudflare Workers AI with Llama 3.1 70B Instruct to normalize everything into a consistent structure.

Each RFP gets a single AI call that extracts structured fields (deadline, budget, location, contact info), decision-making metadata (required certifications, contract type, remote friendliness, estimated team size, tech stack), categories (web development, CMS, cloud, AI, migration, and more), and a 2-3 sentence summary.

One call per RFP, one prompt that does everything. This keeps the Workers AI costs low while still getting good extraction quality. The model is surprisingly capable at pulling structured data from unstructured government procurement text.
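In outline, the pipeline is: build one prompt, make one `env.AI.run` call, then defensively parse the model's JSON. The prompt shape, field names, and parsing below are my assumptions about how such a step could look, not the actual implementation.

```typescript
interface ExtractedRfp {
  deadline: string | null;
  budget: string | null;
  location: string | null;
  categories: string[];
  summary: string;
}

// One prompt that asks for every field at once.
export function buildPrompt(title: string, body: string): string {
  return [
    "Extract from this RFP listing, as strict JSON with keys",
    "deadline, budget, location, categories, summary:",
    `Title: ${title}`,
    `Body: ${body}`,
  ].join("\n");
}

// Models sometimes wrap JSON in prose, so pull out the first {...} block.
export function parseExtraction(raw: string): ExtractedRfp | null {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]) as ExtractedRfp;
  } catch {
    return null;
  }
}

// In the Worker, roughly:
// const out = await env.AI.run(MODEL_ID, { prompt: buildPrompt(title, body) });
// const fields = parseExtraction(out.response);
```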

The database

Cloudflare D1 (managed SQLite at the edge) stores everything. The main rfps table has 30+ fields covering the core data, AI-extracted fields, and decision-making metadata.

For search, I'm using SQLite's FTS5 (Full-Text Search 5) with triggers that automatically sync the search index whenever a row is inserted, updated, or deleted. No manual reindexing, no separate search service. It just works.

CREATE VIRTUAL TABLE rfps_fts USING fts5(
  title, description, ai_summary, categories,
  content=rfps, content_rowid=id
);
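The sync triggers follow SQLite's standard external-content pattern: inserts and updates feed new rows into the index, and deletes are signaled with the special `'delete'` command. A sketch (trigger names are illustrative):

```sql
-- Keep the FTS index in sync with the content table.
CREATE TRIGGER rfps_ai AFTER INSERT ON rfps BEGIN
  INSERT INTO rfps_fts(rowid, title, description, ai_summary, categories)
  VALUES (new.id, new.title, new.description, new.ai_summary, new.categories);
END;

CREATE TRIGGER rfps_ad AFTER DELETE ON rfps BEGIN
  INSERT INTO rfps_fts(rfps_fts, rowid, title, description, ai_summary, categories)
  VALUES ('delete', old.id, old.title, old.description, old.ai_summary, old.categories);
END;

CREATE TRIGGER rfps_au AFTER UPDATE ON rfps BEGIN
  INSERT INTO rfps_fts(rfps_fts, rowid, title, description, ai_summary, categories)
  VALUES ('delete', old.id, old.title, old.description, old.ai_summary, old.categories);
  INSERT INTO rfps_fts(rowid, title, description, ai_summary, categories)
  VALUES (new.id, new.title, new.description, new.ai_summary, new.categories);
END;
```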

Deduplication uses a unique constraint on (source_name, external_id), so if the same RFP appears in multiple scrape runs, it gets updated rather than duplicated.
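With that constraint in place, the scraper can write each row as an upsert. Column names beyond the unique key are illustrative here:

```sql
INSERT INTO rfps (source_name, external_id, title, description, updated_at)
VALUES (?1, ?2, ?3, ?4, datetime('now'))
ON CONFLICT (source_name, external_id) DO UPDATE SET
  title       = excluded.title,
  description = excluded.description,
  updated_at  = excluded.updated_at;
```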

RFP status (open, expiring soon, closed) is computed at query time from the deadline field rather than stored. This means status is always accurate without needing a background job to update stale records.
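Computed status is just a comparison against the clock. A sketch as a pure function; the 7-day "expiring soon" window is an assumption, not necessarily the site's actual threshold.

```typescript
export type RfpStatus = "open" | "expiring_soon" | "closed";

// Derive status from the stored deadline at query time.
export function rfpStatus(deadlineIso: string, now: Date = new Date()): RfpStatus {
  const deadline = new Date(deadlineIso);
  if (deadline.getTime() < now.getTime()) return "closed";
  const daysLeft = (deadline.getTime() - now.getTime()) / 86_400_000;
  return daysLeft <= 7 ? "expiring_soon" : "open";
}
```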

The frontend

The UI is built with React Router v7 on Cloudflare Workers, styled with Tailwind CSS v4. It has a search bar with full-text search across titles, descriptions, summaries, and categories. Dropdown filters let you narrow by organization type, category, deadline status, and estimated value.

The interactive map uses Leaflet with marker clustering. Each RFP with geocoded location data shows up as a pin. The geocoding itself happens in the scraper using Nominatim (OpenStreetMap's geocoding service) to convert location strings into latitude/longitude pairs.
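One practical detail when consuming Nominatim's `/search` endpoint: it returns `lat` and `lon` as strings, and its usage policy requires a `User-Agent` header. A small parser like this (names assumed, not the scraper's actual code) converts the first hit into numbers:

```typescript
interface GeoPoint {
  lat: number;
  lon: number;
}

// Nominatim returns an array of hits with string coordinates.
export function parseNominatim(results: Array<{ lat: string; lon: string }>): GeoPoint | null {
  if (results.length === 0) return null;
  const { lat, lon } = results[0];
  return { lat: Number(lat), lon: Number(lon) };
}

// In the scraper, roughly:
// const res = await fetch(
//   `https://nominatim.openstreetmap.org/search?format=json&q=${encodeURIComponent(loc)}`,
//   { headers: { "User-Agent": "rfp-search" } }  // required by Nominatim's usage policy
// );
// const point = parseNominatim(await res.json());
```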

A stats bar at the top shows the count of currently open RFPs and active sources, so you can tell at a glance how fresh the data is.

What I'd do differently

If I were starting over, I'd probably add email alerts from day one. Right now you have to visit the site to check for new opportunities. A simple daily digest for saved searches would make it much more useful.

I'd also invest more in source reliability testing. Government websites change without warning, and scraping them is inherently fragile. Better monitoring and automatic alerts when a source starts returning unexpected data would save debugging time.

Try it out

The platform is live at rfp.davidloor.com. It's focused on digital services, web development, CMS, cloud, and AI opportunities, but the architecture could support any category of RFPs.
