
Scrape URL

Scrape a webpage and extract content using AI.

Endpoint

POST /v1/content/scrape

Request Body

Parameter    Type      Required   Description
url          string    Yes        The URL to scrape
extract      object    No         Fields to extract with AI descriptions
wait_for     string    No         CSS selector to wait for before scraping
timeout      integer   No         Timeout in milliseconds (default: 30000)
javascript   boolean   No         Execute JavaScript (default: true)
proxy        boolean   No         Use rotating proxy (default: false)
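
The parameters above can be assembled into a request body programmatically. The following is a minimal sketch, not an official client: `build_payload` is a hypothetical helper, and omitting fields that equal their documented defaults is an assumption about what the API accepts.

```python
from typing import Optional


def build_payload(
    url: str,
    extract: Optional[dict] = None,
    wait_for: Optional[str] = None,
    timeout: int = 30000,
    javascript: bool = True,
    proxy: bool = False,
) -> dict:
    """Assemble a request body, leaving out options that match the documented defaults."""
    if not url:
        raise ValueError("url is required")
    payload = {"url": url}
    if extract:
        payload["extract"] = extract
    if wait_for:
        payload["wait_for"] = wait_for
    if timeout != 30000:  # documented default
        payload["timeout"] = timeout
    if not javascript:  # documented default is true
        payload["javascript"] = False
    if proxy:  # documented default is false
        payload["proxy"] = True
    return payload


minimal = build_payload("https://example.com")
tuned = build_payload("https://example.com", timeout=15000, proxy=True)
```

Only `url` is required, so `minimal` contains that single key, while `tuned` also carries the two overridden options.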

Extract Object

Define each field to extract as a key whose value is a natural-language description of the content you want:

{
  "extract": {
    "title": "The main title of the page",
    "price": "The product price including currency",
    "description": "A brief product description",
    "images": "All product image URLs as an array"
  }
}

Example Request

curl -X POST https://api.scrapebit.com/v1/content/scrape \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example-shop.com/product/123",
    "extract": {
      "title": "The product name",
      "price": "The price with currency symbol",
      "rating": "The average rating out of 5",
      "reviews_count": "Number of reviews",
      "in_stock": "Whether the item is in stock (true/false)"
    },
    "wait_for": ".product-details",
    "timeout": 15000
  }'
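
The same request can be sent from Python with only the standard library. This is a sketch under a few assumptions: `build_scrape_request` and `scrape` are hypothetical helper names, and `http_timeout` is a client-side socket timeout in seconds that is separate from the API's `timeout` parameter. The page does not state which HTTP status codes accompany error responses; note that `urlopen` raises `urllib.error.HTTPError` for non-2xx statuses.

```python
import json
import urllib.request

API_URL = "https://api.scrapebit.com/v1/content/scrape"


def build_scrape_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for the scrape endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def scrape(api_key: str, payload: dict, http_timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_scrape_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=http_timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "url": "https://example-shop.com/product/123",
    "extract": {
        "title": "The product name",
        "price": "The price with currency symbol",
    },
    "wait_for": ".product-details",
    "timeout": 15000,
}
# Building the request does not touch the network; call scrape() to send it.
request = build_scrape_request("YOUR_API_KEY", payload)
```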

Response

{
  "success": true,
  "data": {
    "id": "scrape_abc123",
    "url": "https://example-shop.com/product/123",
    "extracted": {
      "title": "Premium Wireless Headphones",
      "price": "$299.99",
      "rating": "4.5",
      "reviews_count": "1,234",
      "in_stock": true
    },
    "raw_html": "<!DOCTYPE html>...",
    "scraped_at": "2025-01-31T10:30:00Z"
  },
  "credits_used": 1,
  "credits_remaining": 99
}
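
In the sample response, `rating` and `reviews_count` come back as strings ("4.5" and "1,234"), so a client may want to convert them before use. The sketch below shows one way to do that; `to_number` and `summarize` are hypothetical helpers, and the assumption that numeric fields always arrive as strings with comma thousands separators is drawn only from this one example.

```python
sample_response = {
    "success": True,
    "data": {
        "id": "scrape_abc123",
        "url": "https://example-shop.com/product/123",
        "extracted": {
            "title": "Premium Wireless Headphones",
            "price": "$299.99",
            "rating": "4.5",
            "reviews_count": "1,234",
            "in_stock": True,
        },
        "scraped_at": "2025-01-31T10:30:00Z",
    },
    "credits_used": 1,
    "credits_remaining": 99,
}


def to_number(text: str):
    """Convert a string such as '1,234' or '4.5' to an int or float."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if "." in cleaned else int(cleaned)


def summarize(response: dict) -> dict:
    """Pull the extracted fields out of a successful response and convert numeric strings."""
    if not response.get("success"):
        raise ValueError("scrape failed: %s" % response.get("error"))
    extracted = response["data"]["extracted"]
    return {
        "title": extracted["title"],
        "price": extracted["price"],  # kept as text because it includes the currency symbol
        "rating": to_number(extracted["rating"]),
        "reviews_count": to_number(extracted["reviews_count"]),
        "in_stock": extracted["in_stock"],
        "credits_remaining": response["credits_remaining"],
    }


summary = summarize(sample_response)
```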

Advanced Options

Pagination Support

For multi-page content, provide a CSS selector for the next-page button and a maximum number of pages:

{
  "url": "https://example.com/products",
  "extract": {
    "products": "List of all product names"
  },
  "pagination": {
    "next_button_selector": ".pagination .next",
    "max_pages": 5
  }
}

Custom Headers

{
  "url": "https://example.com",
  "headers": {
    "Accept-Language": "en-US",
    "Cookie": "session=abc123"
  }
}

Wait Conditions

{
  "url": "https://example.com",
  "wait_for": "#content-loaded",
  "wait_timeout": 10000
}

Error Responses

Invalid URL

{
  "success": false,
  "error": {
    "code": "invalid_url",
    "message": "The provided URL is not valid"
  }
}

Timeout

{
  "success": false,
  "error": {
    "code": "timeout",
    "message": "The page took too long to load"
  }
}

Blocked

{
  "success": false,
  "error": {
    "code": "blocked",
    "message": "Access to this page was blocked. Try enabling proxy."
  }
}
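
The three error codes above suggest different client reactions. The sketch below is one possible policy, not documented behavior: `next_action` is a hypothetical helper, retrying a `blocked` request with `proxy` enabled follows the hint in the error message, and doubling `timeout` after a `timeout` error is purely an assumption (the page does not state a maximum allowed value).

```python
DEFAULT_TIMEOUT_MS = 30000  # documented default for the timeout parameter


def next_action(response: dict, payload: dict) -> tuple:
    """Return (action, payload), where action is 'done', 'retry', or 'fail'."""
    if response.get("success"):
        return "done", payload
    code = response.get("error", {}).get("code")
    if code == "blocked" and not payload.get("proxy"):
        # The error message suggests enabling the rotating proxy.
        return "retry", {**payload, "proxy": True}
    if code == "timeout":
        # Assumption: give the page twice as long on the next attempt.
        current = payload.get("timeout", DEFAULT_TIMEOUT_MS)
        return "retry", {**payload, "timeout": current * 2}
    # invalid_url and unknown codes are not worth retrying unchanged.
    return "fail", payload


blocked = {"success": False, "error": {"code": "blocked", "message": "..."}}
action, retry_payload = next_action(blocked, {"url": "https://example.com"})
```

A real client would also cap the number of retries, since each attempt may consume credits.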

Credits

This endpoint uses 1 credit per page scraped. With pagination, each additional page also uses 1 credit.
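
The credit rule can be turned into a quick upper-bound estimate before sending a request. This is a sketch with a loud assumption: it treats `max_pages` as counting the first page, which the page does not confirm, and `max_credits` is a hypothetical helper. Because no default for `max_pages` is documented, the helper refuses to guess when it is missing.

```python
def max_credits(payload: dict) -> int:
    """Estimate the most credits a single request could use (1 per page scraped)."""
    pagination = payload.get("pagination")
    if not pagination:
        return 1  # a single page costs 1 credit
    if "max_pages" not in pagination:
        raise ValueError("max_pages is not set; its default is not documented")
    # Assumption: max_pages includes the first page.
    return max(1, int(pagination["max_pages"]))


single = max_credits({"url": "https://example.com"})
paged = max_credits({
    "url": "https://example.com/products",
    "pagination": {"next_button_selector": ".pagination .next", "max_pages": 5},
})
```

The actual cost is reported in the `credits_used` field of the response and may be lower if fewer pages exist.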
