The most valuable data on the web is not publicly accessible. Salary data, pricing information, proprietary research, user reviews, job listings with contact details — almost all of it sits behind a login, and a scraper limited to public pages misses most of it. Building scrapers that can create verified accounts and maintain authenticated sessions unlocks the rest.
The Three-Phase Pattern
Authenticated scraping at scale follows a three-phase pattern:
- Phase 1 — Account creation: programmatically create a verified account using a disposable inbox. Complete email verification. Save the session.
- Phase 2 — Session management: store browser storage state (cookies, localStorage) to disk. Reuse it across scrape runs without re-creating the account.
- Phase 3 — Authenticated scraping: load saved session state, navigate authenticated pages, extract the gated content.
Full Implementation with Playwright
```typescript
import { chromium } from "playwright";
import AgentMailr from "agentmailr";
import { randomBytes } from "node:crypto";

const client = new AgentMailr({ apiKey: process.env.AGENTMAILR_API_KEY });

// Random password that satisfies common complexity rules
function generateSecurePassword(): string {
  return `Aa1!${randomBytes(12).toString("base64url")}`;
}

async function createVerifiedAccount(signupUrl: string, sessionPath: string) {
  const inbox = await client.inboxes.create();
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext();
  const page = await context.newPage();

  await page.goto(signupUrl);
  await page.fill('[type="email"]', inbox.address);
  await page.fill('[type="password"]', generateSecurePassword());
  await page.click('[type="submit"]');

  // Wait for the verification email and extract the OTP
  const { otp } = await client.messages.waitForOTP({
    inboxId: inbox.id,
    timeout: 30_000,
  });

  await page.fill('[name="otp"], [placeholder*="code"]', otp);
  await page.click('[type="submit"]');

  // Save session state (cookies + localStorage) for reuse
  await context.storageState({ path: sessionPath });
  await browser.close();
  await client.inboxes.delete(inbox.id);

  return { email: inbox.address, sessionPath };
}

// Create 5 accounts for a rotation pool
const accounts = await Promise.all(
  Array.from({ length: 5 }, (_, i) =>
    createVerifiedAccount(
      "https://target-site.com/signup",
      `sessions/account-${i}.json`
    )
  )
);
```
Session Reuse
Once an account is created and the session is saved, you do not need to re-authenticate on every scrape run. Playwright's storageState captures all cookies and localStorage. On subsequent runs, load it directly:
```typescript
// Subsequent runs: no re-authentication needed
async function scrapeWithSession(url: string, sessionPath: string) {
  const browser = await chromium.launch({ headless: true });
  const context = await browser.newContext({
    storageState: sessionPath, // Load saved session
  });
  const page = await context.newPage();
  await page.goto(url);

  // Page loads as the authenticated user. extractPageData is a
  // site-specific placeholder — replace with your own extraction logic.
  const data = await page.evaluate(() => extractPageData());

  await browser.close();
  return data;
}
```
Rotating Identities at Scale
For high-volume scraping, maintain a pool of verified accounts and distribute requests across them. When a session expires or gets rate-limited, create a new account programmatically to replenish the pool:
```typescript
class AccountPool {
  private sessions: string[] = [];
  private cursor = 0;

  async initialize(count: number) {
    this.sessions = await Promise.all(
      Array.from({ length: count }, (_, i) =>
        createVerifiedAccount(SIGNUP_URL, `sessions/pool-${i}.json`)
          .then(({ sessionPath }) => sessionPath)
      )
    );
  }

  getSession(): { sessionPath: string; index: number } {
    // Round-robin via a cursor, so each slot's index stays stable
    // and callers can pass it back to replenish() on failure
    const index = this.cursor;
    this.cursor = (this.cursor + 1) % this.sessions.length;
    return { sessionPath: this.sessions[index], index };
  }

  async replenish(index: number) {
    // Replace an expired/blocked session with a fresh account
    const { sessionPath } = await createVerifiedAccount(
      SIGNUP_URL,
      `sessions/pool-${index}-renewed.json`
    );
    this.sessions[index] = sessionPath;
  }
}
```
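When a scrape fails, the pool needs to decide whether the session is actually dead or the failure is transient. A small classifier keeps that decision in one place. This is a sketch: the status codes below are common conventions, not something every site follows, and `fetchWithSession` is a hypothetical helper standing in for your scrape call:

```typescript
// Sketch: decide whether a failed request means the session should be
// replaced. 401/403 usually mean the session was rejected; 429 means
// the account is rate-limited. These codes are assumptions — some
// sites return 200 with a login page instead, so pair this with a
// content check on the response.
function shouldReplenish(status: number): boolean {
  return status === 401 || status === 403 || status === 429;
}

// Hypothetical scrape loop using the pool:
//   const session = pool.getSession();
//   const res = await fetchWithSession(url, session);
//   if (shouldReplenish(res.status)) {
//     await pool.replenish(/* that session's slot */);
//   }
```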
Start Free
AgentMailr powers the account creation phase of authenticated scraping. Free to start, no credit card required.