163 Leads from 23 Accelerators: How I Built an AI Prospecting Pipeline
Every LinkedIn guide makes AI prospecting sound easy. Scrape a list, enrich it, send emails. 1,000 leads in 10 minutes.
I needed recently funded European startups as prospects. Instead of buying a stale list, I built the system that generates the list. One afternoon, six Python scripts, real numbers.
The Problem
I wanted funded startups in Italy and Spain — companies that just went through an accelerator, have some traction, and might need help with their go-to-market.
The obvious move: buy a list from Apollo, ZoomInfo, or Crunchbase. The problem: accelerator portfolios update faster than those databases do. You pay per contact for data that’s six months old, with no way to filter by “completed an accelerator cohort in the last 12 months.”
So I built the pipeline instead.
The Architecture
Six Python scripts. No frameworks, no fancy tooling — just urllib, BeautifulSoup, and API calls. About 2,900 lines of Python total, built in one afternoon with Claude Code. Not a SaaS product. A set of scripts that each do one job.
The jobs: scrape portfolio pages, find founders, enrich emails, import to CRM.
Step 1: Scrape 23 Accelerator Portfolios
I compiled a list of 23 accelerators across Italy and Spain. The big names — LVenture, H-FARM, PoliHub, Techstars Italy, Startup Wise Guys, Plug and Play — plus smaller ones like Nana Bianca, I3P, Sellalab, and Spanish players like Lanzadera, Barcelona Activa, SeedRocket.
Every accelerator organizes its site differently. Some have a clean /portfolio/ page. Others bury startups under /en/companies/. Spanish accelerators use /empresas/. One had everything under /ecosystem/. So the scraper tries 10 different portfolio page paths per site and works with whatever it finds.
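A minimal sketch of that path probing, not the actual script. It lists only the four paths named above (the other six are not guessed here), stops at the first path that returns content, and takes a hypothetical `fetch` callable that returns HTML or `None`, for example a thin wrapper around `urllib.request.urlopen`.

```python
from urllib.parse import urljoin

# Only the four paths mentioned in the text; the real list has ten.
PORTFOLIO_PATHS = ["/portfolio/", "/en/companies/", "/empresas/", "/ecosystem/"]


def candidate_urls(base_url, paths=PORTFOLIO_PATHS):
    """Build one candidate portfolio URL per known path."""
    root = base_url.rstrip("/") + "/"
    return [urljoin(root, path.lstrip("/")) for path in paths]


def first_working_page(base_url, fetch):
    """Return (url, html) for the first candidate that yields content, else None.

    `fetch` is a hypothetical callable: url -> html string, or None on failure.
    """
    for url in candidate_urls(base_url):
        html = fetch(url)
        if html:
            return url, html
    return None
```

Passing `fetch` in as a parameter keeps the probing logic testable without touching the network.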
The founding-year filter mattered more than I expected. One accelerator had 62 startups listed — every single one founded before 2023. Filtering to 2024-2026 cut 40% of results across the full dataset, and every cut was correct.
Result: 329 raw entries. After deduplication and year filtering: 163 unique startups (111 Italian, 52 Spanish).
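Deduplication plus the year filter could look roughly like this. The record fields (`name`, `website`, `founded`), the choice of the normalized domain as the dedup key, and dropping entries with an unknown founding year are all assumptions for illustration.

```python
from urllib.parse import urlparse


def normalize_domain(url):
    """Lowercase the host and strip a leading 'www.' so variants compare equal."""
    host = urlparse(url if "//" in url else "//" + url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def dedupe_and_filter(entries, min_year=2024, max_year=2026):
    """Keep the first entry per domain whose founding year is in range."""
    seen = set()
    kept = []
    for entry in entries:
        year = entry.get("founded")
        if year is None or not (min_year <= year <= max_year):
            continue  # assumption: unknown or out-of-range years are dropped
        key = normalize_domain(entry["website"]) or entry["name"].strip().lower()
        if key in seen:
            continue  # same startup listed by more than one accelerator
        seen.add(key)
        kept.append(entry)
    return kept
```

Keying on the domain rather than the name means the same startup listed under slightly different names by two accelerators still collapses to one record.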
Step 2: Find the Founders
This is where most of the complexity lives. A 5-step cascade that tries progressively harder methods to find a CEO or founder for each startup.
Team page scraping runs first. The script tries 12 different URL paths (/team, /about, /chi-siamo, /equipo…) and extracts names from structured data, CSS class patterns, and heading elements near role keywords. Founder title matching covers three languages:
    FOUNDER_TITLES = [
        'ceo', 'founder', 'co-founder', 'cofounder',
        'fondatore', 'cofondatore', 'fundador', 'cofundador',
        'managing director', 'direttore generale', 'director general',
    ]
Hit rate: 12-15%. Most startup websites don’t have a team page, or they list first names with no titles. This step catches the easy wins and not much else.
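One way to apply that title list is a case-insensitive match that refuses to fire inside longer words. This is a sketch, and the helper name `is_founder_title` is made up for illustration. A plain substring check would match "ceo" inside a surname like "Ceolin".

```python
import re

FOUNDER_TITLES = [
    'ceo', 'founder', 'co-founder', 'cofounder',
    'fondatore', 'cofondatore', 'fundador', 'cofundador',
    'managing director', 'direttore generale', 'director general',
]

# Match any title only when it is not glued to other word characters.
_TITLE_RE = re.compile(
    r"(?<!\w)(?:" + "|".join(re.escape(t) for t in FOUNDER_TITLES) + r")(?!\w)",
    re.IGNORECASE,
)


def is_founder_title(text):
    """True if the text contains one of the founder titles as a whole word."""
    return bool(_TITLE_RE.search(text))
```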
Brave Search does the real work. For each startup without a founder, the script runs:
"{company name}" site:linkedin.com CEO OR founder
The critical shift: searching by domain rather than by company name, that is, putting the domain in the quoted slot of the query above. A company name like “Olivia” returns noise, while olivia.io is unambiguous. Switching to domain-first search raised the hit rate from 30% to 49%.
Brave hit rate for Spain: 63% (29 out of 46 remaining after the team page step).
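A sketch of domain-first query building. The name-based template is the one shown above. Reusing it with the domain in the quoted slot is an assumption about how the switch was made, and `founder_query` is a hypothetical helper name.

```python
def founder_query(company_name, domain=None):
    """Build the search query, preferring the domain over the company name."""
    subject = domain if domain else company_name
    return f'"{subject}" site:linkedin.com CEO OR founder'
```

For example, `founder_query("Olivia", "olivia.io")` searches for the quoted domain, while `founder_query("Olivia")` falls back to the noisier name-based query.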
Apollo runs as a fallback: query by domain, filter for CEO/founder titles. It worked well in testing but returned nothing on the full Spain run because the free-tier quota had been quietly depleted. The limits are invisible and reset unpredictably.
Manual review queue catches the rest. Only the highest-ICP prospects get flagged for manual lookup. Everything else is accepted as “no founder found.” You can’t chase every lead.
Cumulative result: 117 founders identified out of 163 startups (72%).
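The cascade itself can be sketched as an ordered list of finder functions where the first non-empty answer wins. The finder callables and the result shape here are hypothetical, not the author's code.

```python
def find_founder(startup, finders):
    """Try each (source, finder) pair in order and return the first hit.

    `finders` is a list of (label, callable) pairs; each callable takes the
    startup record and returns a founder name or None.
    """
    for source, finder in finders:
        try:
            name = finder(startup)
        except Exception:
            # A depleted quota or network error should not stop the cascade.
            name = None
        if name:
            return {"name": name, "source": source}
    return None  # candidate for the manual review queue
```

Recording the source with every hit makes per-step counts cheap to compute, so a step that suddenly returns zero results shows up right away.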
Step 3: Find Their Email
Two steps: cleanup, then enrichment.
The cleanup step exists because name parsing is dirtier than it looks. Italian and Spanish names have prefixes (di, del, de la), compound surnames, and inconsistent formatting. “Chi Siamo” — Italian for “About Us” — appeared as a person name three times. “Person Profile” twice. “King Kong” once (that was a company name that had slipped through). The cleanup script has a growing list of junk patterns built from real false positives.
After cleanup: 78 usable founders (from 117 raw).
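A minimal sketch of a junk filter in that spirit. The three junk strings are the false positives named above. The remaining rules (at least two tokens, letters only apart from apostrophes and hyphens) are illustrative assumptions, not the actual eight-rule validator.

```python
# False positives reported in the text; the real list is longer.
JUNK_NAMES = {"chi siamo", "person profile", "king kong"}


def is_plausible_person(name):
    """Reject known junk strings and anything that does not look like a full name."""
    cleaned = " ".join(name.split())  # collapse stray whitespace
    if cleaned.lower() in JUNK_NAMES:
        return False
    tokens = cleaned.split(" ")
    if len(tokens) < 2:
        return False  # a single word is rarely a usable full name
    # Allow apostrophes and hyphens, as in D'Angelo or Ruiz-Gallardón.
    return all(
        t.replace("'", "").replace("’", "").replace("-", "").isalpha()
        for t in tokens
    )
```

Capitalization is deliberately not checked, so lowercase prefixes like "de la" pass through.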
Then email enrichment. Datagma takes a name plus company domain and returns a verified email. Where Datagma misses, Dropcontact runs as fallback — different algorithm, different database.
Datagma hit rate: ~45%. Dropcontact catches roughly 15% of what Datagma misses.
Result: 52 verified email addresses (35 Italian, 17 Spanish).
Step 4: CRM Import
Four clean imports into Crono: prospects with email and company accounts, imported separately for each country.
| Import | Records |
|---|---|
| IT prospects (with email) | 35 |
| IT accounts (companies) | 40 |
| ES prospects (with email) | 17 |
| ES accounts (companies) | 27 |
Zero duplicates. The script checks existing records before importing.
One API quirk: Crono requires importType inside the data object, not at the root level. The company field is company, not companyName. Spent 30 minutes on this before actually reading the error response.
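A sketch of the payload shape that quirk implies, combined with the pre-import duplicate check. Only the `data` object, `importType`, and `company` come from the notes above. The `records` key, the `name` and `email` fields, and the import type value passed in are hypothetical and should be checked against Crono's API documentation.

```python
def build_import_payload(prospects, import_type, existing_emails=()):
    """Build an import body with importType nested under data, skipping known emails."""
    known = {e.lower() for e in existing_emails}
    records = []
    for p in prospects:
        email = p["email"].lower()
        if email in known:
            continue  # already in the CRM
        known.add(email)  # also guards against duplicates within this batch
        records.append({
            "name": p["name"],        # hypothetical field name
            "email": p["email"],      # hypothetical field name
            "company": p["company"],  # 'company', not 'companyName'
        })
    # importType lives inside the data object, not at the root.
    return {"data": {"importType": import_type, "records": records}}
```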
The Full Funnel
23 accelerators scraped
→ 329 raw startup entries
→ 163 unique startups (dedup + year filter)
→ 117 founders identified (72%)
→ 78 after name cleanup (67% of identified founders)
→ 52 with verified email (32% of total startups)
163 startups to 52 usable contacts. One afternoon of work. The alternative was two weeks of manual research or paying per-contact prices for data six months out of date.
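The percentages in the funnel can be rechecked with a few lines. The stage counts are copied from the funnel above; the helper names are made up for illustration.

```python
FUNNEL = [
    ("raw entries", 329),
    ("unique startups", 163),
    ("founders identified", 117),
    ("after name cleanup", 78),
    ("verified email", 52),
]


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to an integer."""
    return round(100 * part / whole)


def step_rates(stages):
    """Conversion of each stage relative to the stage before it."""
    return [
        (label, pct(count, prev))
        for (_, prev), (label, count) in zip(stages, stages[1:])
    ]
```

Here `pct(117, 163)` gives 72, `pct(78, 117)` gives 67, and `pct(52, 163)` gives 32, matching the figures in the funnel.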
What Went Wrong
Ten dead websites. Expected — startups die. What wasn’t expected: four of those ten were actually alive, just rebranded with new domains. The accelerator portfolio page still linked the old URL. Brave search for the company name found the new domain in all four cases. Without that validation step, I’d have lost four real prospects.
Name parsing. Italian and Spanish names are hard. Prefixes, compound surnames, inconsistent capitalization across different websites. The validation function has eight rules and still let junk through. “Chi Siamo” survived three times.
Apollo free tier. It worked well in early testing, then returned nothing for the entire Spain run. The free tier has invisible quota limits that reset on a schedule I couldn’t find documented.
No standard for portfolio pages. Every accelerator structures theirs differently. Some render the startup list with JavaScript, which makes it invisible to a simple scraper. Some link to a PDF. One embedded a Google Sheet. The multiple fallback scraping methods exist because no single approach works for more than 60% of sites.
Seven junk records reached the CRM. Despite the cleanup step, seven entries were clearly not people. Crono’s API doesn’t support deleting imported records, so I had to rename them “Junk Delete” in the UI and delete manually. Adding a human QA pass before import would have caught these.
The Stack
| Tool | Role | Cost |
|---|---|---|
| Claude Code | Built all 6 scripts in one session | Max plan (existing) |
| Brave Search API | Website + founder discovery | Free tier |
| Apollo | Organization + people lookup | Free tier |
| Datagma | Email enrichment (primary) | Existing plan |
| Dropcontact | Email enrichment (fallback) | Existing plan |
| BeautifulSoup | HTML parsing | Free |
| Crono CRM | Import + outreach | Existing plan |
Total incremental cost: $0. Everything ran on free tiers or existing subscriptions.
Would I Do It Again?
Yes. But I’d cut the first stage of the founder cascade (team page scraping) entirely. A 12-15% hit rate doesn’t justify the complexity; I’d go straight to Brave search and Apollo. And I’d invest more time in junk detection before hitting the CRM. The false positives that survive cleanup are the most annoying part of the whole pipeline.
The reusable piece is real. The scripts accept --country IT or --country ES flags. Adding UK, DE, or FR is just adding accelerator URLs to the config. I can run this again against any market in an afternoon.
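A sketch of how a `--country` flag backed by a per-country config could be wired up with `argparse`. The config layout and the placeholder URLs are assumptions. Only the flag name and the IT/ES values come from the text.

```python
import argparse

# Hypothetical config: country code -> accelerator portfolio sites (placeholders).
ACCELERATORS = {
    "IT": ["https://accelerator-one.example", "https://accelerator-two.example"],
    "ES": ["https://aceleradora.example"],
}


def parse_args(argv=None):
    """Parse --country, restricted to the countries present in the config."""
    parser = argparse.ArgumentParser(description="Scrape accelerator portfolios")
    parser.add_argument("--country", required=True, choices=sorted(ACCELERATORS))
    return parser.parse_args(argv)


def urls_for(country):
    """Accelerator URLs configured for one country."""
    return ACCELERATORS[country]
```

Because `choices` is derived from the config keys, adding a market means adding one dictionary entry and nothing else.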
The honest takeaway: AI prospecting is fast, not magic. The scripts handle the mechanical work — scraping, searching, matching, importing. The judgment calls — which accelerators to target, how to score ICP fit, when to stop enriching — are still mine.
163 startups. 52 usable leads. One afternoon. That’s the real conversion rate.
Built with Claude Code on a Mac Mini. Questions? Find me on LinkedIn.