How I Built Aethera
From raw web scraping to a high-performance, geolocation-aware directory. A look inside the 2026 HealthTech stack.
The Scraper Pipeline
The core challenge was acquiring clean, verified data from highly protected directories like Psychology Today. I built a multi-stage Python pipeline that uses BeautifulSoup to parse static HTML and Playwright to navigate JavaScript-heavy pages.
Key Metrics
- 1,063 Unique Providers
- Multiple States Scraped
- Anti-bot bypass logic
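# Step 1: Python scraper mockup (fetch_with_stealth, decode_bing_url, and
# extract_valid_phones are project helpers defined elsewhere)
from urllib.parse import quote_plus
from bs4 import BeautifulSoup

def scrape_region(region_url):
    html = fetch_with_stealth(region_url)
    soup = BeautifulSoup(html, 'lxml')
    # ... DOM parsing logic ...

async def enrich_provider(page, name, city, state):
    # Search Bing for the provider (query format assumed from the arguments)
    query = quote_plus(f"{name} {city} {state}")
    search_url = f"https://www.bing.com/search?q={query}"
    await page.goto(search_url)
    soup = BeautifulSoup(await page.content(), 'lxml')
    # Decode Base64 tracking links on each result
    for a in soup.find_all('a', href=True):
        link = decode_bing_url(a.get('href'))
        # ... visit link and collect the page text as site_text ...
    # Extract phone numbers from the organic page text via NLP
    phones = extract_valid_phones(site_text)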
Fig 1.5: OSINT Playwright Enrichment
OSINT Data Enrichment
Proxy directories intentionally mask provider phone numbers and hide clinic websites behind redirects. I work around this with a Playwright and NLP pipeline that searches Bing for each provider, decodes the Base64 tracking links in the results, and extracts the organic text from each private practice site's DOM.
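As a rough illustration of the link-decoding and phone-extraction steps, here is a minimal sketch that uses only the standard library. It assumes the redirect link carries its destination in a `u` query parameter, encoded as a two-character `a1` prefix followed by URL-safe Base64, and it assumes US-style 10-digit phone numbers matched with a regular expression rather than NLP. The names `decode_tracking_url` and `extract_phones` are illustrative and are not the pipeline's actual helpers.

```python
import base64
import re
from urllib.parse import parse_qs, urlparse


def decode_tracking_url(href: str) -> str:
    """Return the destination hidden in a redirect link, or href unchanged."""
    params = parse_qs(urlparse(href).query)
    token = params.get("u", [""])[0]
    # Assumption: the destination is stored as "a1" + URL-safe Base64.
    if not token.startswith("a1"):
        return href
    payload = token[2:]
    payload += "=" * (-len(payload) % 4)  # restore stripped padding
    try:
        return base64.urlsafe_b64decode(payload).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return href


# Area code and exchange must start with 2-9 (North American numbering).
PHONE_RE = re.compile(
    r"\(?\b([2-9]\d{2})\)?[\s.-]?([2-9]\d{2})[\s.-]?(\d{4})\b"
)


def extract_phones(text: str) -> list[str]:
    """Return unique phone numbers in first-seen order, normalized."""
    found = []
    for area, exchange, line in PHONE_RE.findall(text):
        number = f"({area}) {exchange}-{line}"
        if number not in found:
            found.append(number)
    return found
```

Normalizing every match to one format before deduplicating means the same number written as `(503) 555-0123` and `503.555.0123` is stored once.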
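// Haversine Radius Calculation
// $pdo is assumed to be an open PDO connection; haversine() returns miles.
$sql = "SELECT id, name, lat, lng FROM providers";
$providers = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);
$radius = 10; // Miles
$filtered = [];
foreach ($providers as $p) {
    // Skip providers without coordinates
    if ($p['lat'] === null || $p['lng'] === null) continue;
    $dist = haversine($userLat, $userLng, $p['lat'], $p['lng']);
    if ($dist <= $radius) {
        $filtered[] = $p;
    }
}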
Fig 2.0: SQLite-Compatible Radial Search Logic
Spatial Intelligence
Most directories handle location poorly because they rely on zip codes. I mapped every city to high-precision latitude/longitude coordinates (city_coords.py), which enables the 10-mile radius search via the Haversine formula and gives users a much better customer experience (CX).
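For reference, the Haversine distance and the radius filter can be sketched in a few lines of Python. This is an illustration rather than the site's code: the function names, the dictionary shape of a provider record, and the sort by distance are assumptions, and the Earth radius used is the common mean value of 3958.8 miles.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two lat/lng points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lng2 - lng1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def providers_within(providers, user_lat, user_lng, radius_miles=10):
    """Return providers inside the radius, nearest first."""
    nearby = []
    for p in providers:
        # Skip records that were never geocoded.
        if p.get("lat") is None or p.get("lng") is None:
            continue
        dist = haversine_miles(user_lat, user_lng, p["lat"], p["lng"])
        if dist <= radius_miles:
            nearby.append((dist, p))
    nearby.sort(key=lambda pair: pair[0])
    return [p for _, p in nearby]
```

Because coordinates are stored per city, every provider in the same city shares one point, so the filter effectively selects whole cities whose center falls inside the radius.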
Accuracy
~0.1 Miles
Load Time
< 40ms
Premium Design Language
Clinical directories are often cold and complex. Aethera uses a custom HealthTech Glassmorphism design system that pairs high-contrast typography (Playfair Display) with a functional 8px grid.
OHP Green Engine
Specific color-coding (Sage & Emerald) for Oregon Health Plan providers, making low-cost care easily visible.
Adaptive Funnels
Therapists take center stage; experimental "Treatments" are hidden deeper to ensure high-intent medical browsing.
Shadow Profiles
A dual-database approach that maps scraped public data into claimable "Shadow Profiles" for provider conversion.
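One way to picture the mapping is the sketch below, which copies rows from a scraped-data database into an application database as unclaimed profiles. Everything here is hypothetical: the table names `scraped_providers` and `profiles`, the `claimed_by` column, and both function names are assumptions made for illustration, and SQLite stands in for whatever stores the real system uses.

```python
import sqlite3


def build_shadow_profiles(scraped_db, app_db):
    """Copy scraped rows into unclaimed profiles, skipping existing ones."""
    app_db.execute(
        """CREATE TABLE IF NOT EXISTS profiles (
               source_id INTEGER PRIMARY KEY,
               name TEXT NOT NULL,
               city TEXT,
               claimed_by TEXT  -- NULL while the profile is unclaimed
           )"""
    )
    rows = scraped_db.execute(
        "SELECT id, name, city FROM scraped_providers"
    ).fetchall()
    # INSERT OR IGNORE keeps re-runs from duplicating or resetting profiles.
    app_db.executemany(
        "INSERT OR IGNORE INTO profiles (source_id, name, city) VALUES (?, ?, ?)",
        rows,
    )
    app_db.commit()


def claim_profile(app_db, source_id, provider_email):
    """Attach a provider to an unclaimed profile; return True on success."""
    cur = app_db.execute(
        "UPDATE profiles SET claimed_by = ? "
        "WHERE source_id = ? AND claimed_by IS NULL",
        (provider_email, source_id),
    )
    app_db.commit()
    return cur.rowcount == 1
```

Keeping the scraped rows in their own database means a fresh scrape can be re-imported at any time without touching profiles that providers have already claimed.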
Recent Shipping & Engineering Roadmap
Stripe Monetization & Onboarding
Implemented multi-tier subscriptions, a secure provider dashboard, and clean dynamic URL routing.
Dynamic SEO Sitemap & NLP Tools
Building out automated AI tools to rewrite bios and generate local search pages for "Therapist in [City]".
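The sitemap half of this item could look roughly like the sketch below, which turns a list of city names into "Therapist in [City]" page URLs and serializes them in the standard sitemap XML format. The URL pattern `therapist-in-<city>`, the function names, and the `example.com` base URL are assumptions for illustration; the bio-rewriting tools are not covered here.

```python
import re
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def city_slug(city):
    """Build a URL slug such as 'therapist-in-lake-oswego' (assumed pattern)."""
    slug = re.sub(r"[^a-z0-9]+", "-", city.lower()).strip("-")
    return f"therapist-in-{slug}"


def build_sitemap(base_url, cities):
    """Return sitemap XML with one <url> entry per unique city."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    root = base_url.rstrip("/")
    for city in sorted(set(cities)):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = f"{root}/{city_slug(city)}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical base URL and city list.
    print(build_sitemap("https://example.com", ["Portland", "Lake Oswego"]))
```

Deduplicating and sorting the city list keeps the output stable between runs, which makes changes to the sitemap easy to diff.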
Review Engine
A HIPAA-compliant patient feedback loop to verify quality of care across the network.