Firecrawl offers two approaches for extracting structured data from web pages. Each serves different use cases with varying levels of automation and control.
Quick Comparison
| Feature | /agent | /scrape (JSON mode) |
|---|---|---|
| URL Required | No (optional) | Yes (single URL) |
| Scope | Web-wide discovery, multi-page, or single page | Single page |
| URL Discovery | Autonomous web search | None |
| Processing | Asynchronous | Synchronous |
| Schema Required | No (prompt or schema) | No (prompt or schema) |
| Pricing | Dynamic (5 free runs/day); 10 credits/cell on Parallel Agents fast path | 5 credits/page (1 base + 4 for JSON mode) |
| Best For | Research, discovery, multi-page or batch gathering | Known single-page extraction |
1. /agent Endpoint
The /agent endpoint is Firecrawl’s most advanced offering. It uses AI agents to autonomously search, navigate, and gather data from across the web.
Key Characteristics
- URLs Optional: Just describe what you need in a prompt; supplying URLs is optional
- Autonomous Navigation: The agent searches and navigates deep into sites to find your data
- Deep Web Search: Autonomously discovers information across multiple domains and pages
- Parallel Processing: Processes multiple sources simultaneously for faster results
- Models Available: spark-1-fast (cheapest; used by Parallel Agents at 10 credits/cell), spark-1-mini (default, balanced cost/quality), and spark-1-pro (highest accuracy)
Example
```python
from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")
    background: Optional[str] = Field(None, description="Professional background")

class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")

result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema,
    model="spark-1-mini",
    max_credits=100,
)

print(result.data)
```
Best Use Case: Autonomous Research & Discovery
Scenario: You need to find information about AI startups that raised Series A funding, including their founders and funding amounts.
Why /agent: You don’t know which websites contain this information. The agent will autonomously search the web, navigate to relevant sources (Crunchbase, news sites, company pages), and compile the structured data for you.
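Following the /agent pattern above, the Series A research task might be phrased like the sketch below. The prompt wording and every schema field name here are illustrative assumptions, and the schema is written as a plain JSON Schema dict rather than the Pydantic model shown earlier:

```python
# Illustrative schema for the Series A research scenario
# (all field names are assumptions, not a fixed Firecrawl contract).
funding_schema = {
    "type": "object",
    "properties": {
        "startups": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "founders": {"type": "array", "items": {"type": "string"}},
                    "series_a_amount": {"type": "string"},
                    "source_url": {"type": "string"},
                },
            },
        }
    },
}

# The call would mirror the earlier example (requires a Firecrawl API key):
# result = app.agent(
#     prompt="Find AI startups that raised Series A funding, "
#            "with their founders and funding amounts",
#     schema=funding_schema,
#     model="spark-1-mini",
# )
```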
Parallel Agents
For high-volume batch extraction — e.g., enriching a list of 1,000 companies with funding data — use Parallel Agents. They run an intelligent waterfall: spark-1-fast handles simple cells at a flat 10 credits/cell, escalating to spark-1-mini only when needed. This is the right tool for grid/batch workflows where you’d otherwise be looping over many similar prompts.
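As a rough sketch, a grid workflow boils down to one templated prompt per cell. The helper below only builds those prompts and estimates the flat fast-path cost; it is not part of the Firecrawl SDK, and the actual Parallel Agents submission interface is not shown here (see the Agent documentation for that):

```python
def build_enrichment_prompts(companies, template):
    """Build one prompt per cell of a batch enrichment grid.

    `template` uses {company} as a placeholder. Illustrative helper only,
    not a Firecrawl SDK function.
    """
    return [template.format(company=c) for c in companies]

prompts = build_enrichment_prompts(
    ["Firecrawl", "Acme AI"],
    "Find the most recent funding round and amount for {company}",
)

# Flat fast-path pricing: 10 credits per cell.
estimated_credits = 10 * len(prompts)
```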
For more details, see the Agent documentation.
2. /scrape Endpoint with JSON Mode
The /scrape endpoint with JSON mode is the most controlled approach—it extracts structured data from a single known URL using an LLM to parse the page content into your specified schema.
Key Characteristics
- Single URL Only: Designed for extracting data from one specific page at a time
- Exact URL Required: You must know the precise URL containing the data
- Schema Optional: Can use JSON schema OR just a prompt (LLM chooses structure)
- Synchronous: Returns data immediately (no job polling needed)
- Additional Formats: Can combine JSON extraction with markdown, HTML, screenshots in one request
Example
```python
from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR-API-KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{
        "type": "json",
        "schema": CompanyInfo.model_json_schema()
    }],
    only_main_content=False,
    timeout=120000
)

print(result)
```
Best Use Case: Known Single-Page Extraction
Scenario: You’re building a price monitoring tool and need to extract the price, stock status, and product details from a specific product page you already have the URL for.
Why /scrape with JSON mode: You know exactly which page contains the data, need precise single-page extraction, and want synchronous results without job management overhead.
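For the price-monitoring scenario, a minimal schema sketch might look like the following. The field names are illustrative assumptions, and the schema is written as a plain JSON Schema dict instead of the Pydantic model used in the example above:

```python
# Illustrative JSON Schema for a product page (field names are assumptions).
product_schema = {
    "type": "object",
    "properties": {
        "product_name": {"type": "string"},
        "price": {"type": "number"},
        "currency": {"type": "string"},
        "in_stock": {"type": "boolean"},
    },
    "required": ["product_name", "price"],
}

# Passed to /scrape in the same way as the example above
# (the URL is hypothetical and an API key is required):
# result = app.scrape(
#     "https://example.com/product/123",
#     formats=[{"type": "json", "schema": product_schema}],
# )
```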
For more details, see the JSON mode documentation.
Decision Guide
Do you know the exact URL(s) containing your data?
- NO → Use /agent (autonomous web discovery)
- YES:
  - Single page? → Use /scrape with JSON mode
  - Multiple pages? → Use /agent with URLs (or batch /scrape)
  - Many similar prompts across a list? → Use /agent Parallel Agents
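The branches above can be condensed into a small helper; it is purely illustrative and simply encodes the decision guide:

```python
def choose_endpoint(knows_urls: bool, page_count: int = 1,
                    batch_grid: bool = False) -> str:
    """Encode the decision guide (illustrative, not part of the SDK)."""
    if batch_grid:
        # Many similar prompts across a list
        return "/agent Parallel Agents"
    if not knows_urls:
        # Autonomous web discovery
        return "/agent"
    if page_count == 1:
        return "/scrape (JSON mode)"
    return "/agent with URLs (or batch /scrape)"
```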
Recommendations by Scenario
| Scenario | Recommended Endpoint |
|---|---|
| “Find all AI startups and their funding” | /agent |
| “Extract data from this specific product page” | /scrape (JSON mode) |
| “Get all blog posts from competitor.com” | /agent with URL |
| “Monitor prices across multiple known URLs” | /scrape with batch processing |
| “Research companies in a specific industry” | /agent |
| “Enrich 1,000 companies with funding data” | /agent Parallel Agents |
| “Extract contact info from 50 known company pages” | /scrape with batch processing |
Pricing
| Endpoint | Cost | Notes |
|---|---|---|
| /scrape (JSON mode) | 5 credits/page (1 base + 4 for JSON mode) | Fixed, predictable |
| /agent | Dynamic | 5 free runs/day; typical run ~100–500 credits |
| /agent (Parallel Agents) | 10 credits/cell on the fast path | Batch/grid workflows |
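Using the numbers above, back-of-the-envelope costs are easy to estimate. Note that the /agent figure is only a typical range, since its pricing is dynamic:

```python
SCRAPE_JSON_CREDITS = 5      # 1 base + 4 for JSON mode, per page
PARALLEL_FAST_CREDITS = 10   # flat, per cell on the fast path

def scrape_json_cost(pages: int) -> int:
    """Fixed cost of /scrape in JSON mode for a known set of pages."""
    return SCRAPE_JSON_CREDITS * pages

def parallel_agents_cost(cells: int) -> int:
    """Flat fast-path cost of a Parallel Agents grid."""
    return PARALLEL_FAST_CREDITS * cells

# 50 known pages vs. a 1,000-cell enrichment grid:
print(scrape_json_cost(50))        # 250 credits
print(parallel_agents_cost(1000))  # 10000 credits
```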
Example: “Find the founders of Firecrawl”
| Endpoint | How It Works | Credits Used |
|---|---|---|
| /scrape | You find the URL manually, then scrape 1 page | ~5 credits |
| /agent | Just send the prompt; the agent finds and extracts | ~100–500 credits |
Tradeoff: /scrape is cheapest but requires you to know the URL. /agent costs more but handles discovery automatically.
For detailed pricing, see Firecrawl Pricing.
Key Takeaways
-
Know the exact URL? Use
/scrape with JSON mode—it’s the cheapest (5 credits/page), fastest (synchronous), and most predictable option.
-
Need autonomous research? Use
/agent—it handles discovery automatically with 5 free runs/day, then dynamic pricing based on complexity.
-
Running batch workflows over a list? Use
/agent Parallel Agents with spark-1-fast for a flat 10 credits/cell.
-
Cost vs. convenience tradeoff:
/scrape is most cost-effective when you know your URLs; /agent costs more but eliminates manual URL discovery.
Further Reading