

Firecrawl offers two approaches for extracting structured data from web pages. Each serves different use cases with varying levels of automation and control.

Quick Comparison

| Feature | /agent | /scrape (JSON mode) |
|---|---|---|
| URL Required | No (optional) | Yes (single URL) |
| Scope | Web-wide discovery, multi-page, or single page | Single page |
| URL Discovery | Autonomous web search | None |
| Processing | Asynchronous | Synchronous |
| Schema Required | No (prompt or schema) | No (prompt or schema) |
| Pricing | Dynamic (5 free runs/day); 10 credits/cell on Parallel Agents fast path | 5 credits/page (1 base + 4 for JSON mode) |
| Best For | Research, discovery, multi-page or batch gathering | Known single-page extraction |

1. /agent Endpoint

The /agent endpoint is Firecrawl’s most advanced offering. It uses AI agents to autonomously search, navigate, and gather data from across the web.

Key Characteristics

  • URLs Optional: Just describe what you need via prompt; URLs are completely optional
  • Autonomous Navigation: The agent searches and navigates deep into sites to find your data
  • Deep Web Search: Autonomously discovers information across multiple domains and pages
  • Parallel Processing: Processes multiple sources simultaneously for faster results
  • Models Available: spark-1-fast (cheapest, used by Parallel Agents — 10 credits/cell), spark-1-mini (default, balanced cost/quality), and spark-1-pro (highest accuracy)

Example

from firecrawl import Firecrawl
from pydantic import BaseModel, Field
from typing import List, Optional

app = Firecrawl(api_key="fc-YOUR_API_KEY")

class Founder(BaseModel):
    name: str = Field(description="Full name of the founder")
    role: Optional[str] = Field(None, description="Role or position")
    background: Optional[str] = Field(None, description="Professional background")

class FoundersSchema(BaseModel):
    founders: List[Founder] = Field(description="List of founders")

result = app.agent(
    prompt="Find the founders of Firecrawl",
    schema=FoundersSchema,
    model="spark-1-mini",
    max_credits=100
)

print(result.data)

Best Use Case: Autonomous Research & Discovery

Scenario: You need to find information about AI startups that raised Series A funding, including their founders and funding amounts.

Why /agent: You don’t know which websites contain this information. The agent will autonomously search the web, navigate to relevant sources (Crunchbase, news sites, company pages), and compile the structured data for you.

Parallel Agents

For high-volume batch extraction — e.g., enriching a list of 1,000 companies with funding data — use Parallel Agents. They run an intelligent waterfall: spark-1-fast handles simple cells at a flat 10 credits/cell, escalating to spark-1-mini only when needed. This is the right tool for grid/batch workflows where you’d otherwise be looping over many similar prompts. For more details, see the Agent documentation.
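The flat fast-path rate makes batch budgets easy to estimate up front. A minimal sketch (an illustrative helper, not part of the Firecrawl SDK; escalations to spark-1-mini are priced dynamically, so this only bounds the fast-path portion):

```python
def fast_path_credits(cells: int, credits_per_cell: int = 10) -> int:
    """Flat fast-path cost: every cell handled by spark-1-fast."""
    return cells * credits_per_cell

# Enriching 1,000 companies, one cell each:
print(fast_path_credits(1_000))  # 10000 credits if no cell escalates
```

Any cell the waterfall escalates to spark-1-mini adds a dynamic charge on top of this floor.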

2. /scrape Endpoint with JSON Mode

The /scrape endpoint with JSON mode is the most controlled approach—it extracts structured data from a single known URL using an LLM to parse the page content into your specified schema.

Key Characteristics

  • Single URL Only: Designed for extracting data from one specific page at a time
  • Exact URL Required: You must know the precise URL containing the data
  • Schema Optional: Can use JSON schema OR just a prompt (LLM chooses structure)
  • Synchronous: Returns data immediately (no job polling needed)
  • Additional Formats: Can combine JSON extraction with markdown, HTML, screenshots in one request

Example

from firecrawl import Firecrawl
from pydantic import BaseModel

app = Firecrawl(api_key="fc-YOUR-API-KEY")

class CompanyInfo(BaseModel):
    company_mission: str
    supports_sso: bool
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    'https://firecrawl.dev',
    formats=[{
      "type": "json",
      "schema": CompanyInfo.model_json_schema()
    }],
    only_main_content=False,
    timeout=120000
)

print(result)
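Because the schema is optional, a prompt alone also works and the LLM chooses the output structure. A hedged sketch — the `prompt` key in the format object is an assumption here, not confirmed by this page, so check the JSON mode docs for the exact field name:

```python
# Prompt-only JSON mode: no schema, the LLM decides the output shape.
# (The "prompt" key is assumed, not confirmed by this page.)
json_format = {
    "type": "json",
    "prompt": "Extract the company mission and whether the project is open source",
}

# The scrape call would then mirror the example above:
# result = app.scrape('https://firecrawl.dev', formats=[json_format])
print(sorted(json_format))  # ['prompt', 'type']
```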

Best Use Case: Single-Page Precision Extraction

Scenario: You’re building a price monitoring tool and need to extract the price, stock status, and product details from a specific product page you already have the URL for.

Why /scrape with JSON mode: You know exactly which page contains the data, need precise single-page extraction, and want synchronous results without job management overhead. For more details, see the JSON mode documentation.

Decision Guide

Do you know the exact URL(s) containing your data?
  • NO → Use /agent (autonomous web discovery)
  • YES
    • Single page? → Use /scrape with JSON mode
    • Multiple pages? → Use /agent with URLs (or batch /scrape)
    • Many similar prompts across a list? → Use /agent Parallel Agents
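The decision guide above can be encoded as a small helper — purely illustrative, not part of any SDK:

```python
def choose_endpoint(knows_urls: bool, pages: int = 1, grid_workflow: bool = False) -> str:
    """Map the decision guide onto inputs: URL knowledge, page count, batch shape."""
    if not knows_urls:
        return "/agent"                           # autonomous web discovery
    if grid_workflow:
        return "/agent (Parallel Agents)"         # many similar prompts over a list
    if pages == 1:
        return "/scrape (JSON mode)"              # known single-page extraction
    return "/agent with URLs (or batch /scrape)"  # multiple known pages

print(choose_endpoint(knows_urls=False))  # /agent
print(choose_endpoint(knows_urls=True))   # /scrape (JSON mode)
```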

Recommendations by Scenario

| Scenario | Recommended Endpoint |
|---|---|
| “Find all AI startups and their funding” | /agent |
| “Extract data from this specific product page” | /scrape (JSON mode) |
| “Get all blog posts from competitor.com” | /agent with URL |
| “Monitor prices across multiple known URLs” | /scrape with batch processing |
| “Research companies in a specific industry” | /agent |
| “Enrich 1,000 companies with funding data” | /agent Parallel Agents |
| “Extract contact info from 50 known company pages” | /scrape with batch processing |

Pricing

| Endpoint | Cost | Notes |
|---|---|---|
| /scrape (JSON mode) | 5 credits/page (1 base + 4 for JSON mode) | Fixed, predictable |
| /agent | Dynamic | 5 free runs/day; typical run ~100–500 credits |
| /agent (Parallel Agents) | 10 credits/cell on the fast path | Batch/grid workflows |
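For fixed-price /scrape jobs, the per-page breakdown (1 base + 4 for JSON mode) makes totals exact — a small illustrative calculator, not part of the SDK:

```python
BASE_CREDITS = 1        # base scrape cost per page
JSON_MODE_CREDITS = 4   # JSON mode surcharge per page

def scrape_json_cost(pages: int) -> int:
    """Fixed, predictable cost of /scrape with JSON mode."""
    return pages * (BASE_CREDITS + JSON_MODE_CREDITS)

# e.g. extracting contact info from 50 known company pages:
print(scrape_json_cost(50))  # 250 credits
```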

Example: “Find the founders of Firecrawl”

| Endpoint | How It Works | Credits Used |
|---|---|---|
| /scrape | You find the URL manually, then scrape 1 page | ~5 credits |
| /agent | Just send the prompt; the agent finds and extracts | ~100–500 credits |

Tradeoff: /scrape is cheapest but requires you to know the URL. /agent costs more but handles discovery automatically. For detailed pricing, see Firecrawl Pricing.

Key Takeaways

  1. Know the exact URL? Use /scrape with JSON mode—it’s the cheapest (5 credits/page), fastest (synchronous), and most predictable option.
  2. Need autonomous research? Use /agent—it handles discovery automatically with 5 free runs/day, then dynamic pricing based on complexity.
  3. Running batch workflows over a list? Use /agent Parallel Agents with spark-1-fast for a flat 10 credits/cell.
  4. Cost vs. convenience tradeoff: /scrape is most cost-effective when you know your URLs; /agent costs more but eliminates manual URL discovery.

Further Reading