Scrape Baidu search result pages into structured JSON — organic listings, paid promotion blocks, Baidu-property cards, and the rich answer modules unique to China's leading engine.
Baidu is the primary gateway to search inside mainland China, and its result pages mix organic links with heavily promoted paid listings and self-referential cards pointing to Baidu Baike, Zhidao, and Tieba. The Baidu Scraper API separates organic positions from paid promotions and parses the rich answer cards and image clusters into validated records.
Cross-border brands and localization agencies rely on this data to understand visibility inside a walled search ecosystem that Western tooling barely reaches. Set the simplified-Chinese locale, choose a device type, schedule recurring pulls, and diff snapshots to track how organic and paid placements shift.
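As a rough illustration of the snapshot diffing mentioned above, the sketch below compares two pulls of the same query and reports every URL whose position changed. It assumes each row has the `position` and `url` fields shown in the sample response; the helper names `positions_by_url` and `diff_snapshots` are hypothetical and not part of the API.

```python
from typing import Dict, List, Optional, Tuple

Change = Tuple[Optional[int], Optional[int]]


def positions_by_url(organic: List[dict]) -> Dict[str, int]:
    """Map each result URL to its position within one snapshot."""
    return {row["url"]: row["position"] for row in organic}


def diff_snapshots(old: List[dict], new: List[dict]) -> Dict[str, Change]:
    """Return {url: (old_position, new_position)} for every URL that moved.

    None means the URL was absent from that snapshot.
    """
    before = positions_by_url(old)
    after = positions_by_url(new)
    changes: Dict[str, Change] = {}
    for url in sorted(set(before) | set(after)):
        pair = (before.get(url), after.get(url))
        if pair[0] != pair[1]:
            changes[url] = pair
    return changes


# Two made-up snapshots of the same query, taken on consecutive days.
yesterday = [
    {"position": 1, "url": "https://a.example/"},
    {"position": 2, "url": "https://b.example/"},
]
today = [
    {"position": 1, "url": "https://b.example/"},
    {"position": 2, "url": "https://c.example/"},
]
changes = diff_snapshots(yesterday, today)
```

Keying on the URL keeps the comparison simple; if result URLs vary between pulls (tracking parameters, redirect links), they would need normalizing first.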
# POST a target — get validated JSON back
curl https://api.crawlzo.com/v4/scrape \
  -H "Authorization: Bearer $CRAWLZO_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.baidu.com/s?wd=structured+web+data",
    "geo": "us",
    "device": "desktop"
  }'
// ← response
{
  "status": "ok",
  "data": {
    "query": "structured web data",
    "organic": [
      { "position": 1, "title": "…", "url": "https://…", "snippet": "…" }
    ],
    "features": { "ads": 3, "answer_box": true }
  }
}
Baidu data parsed into clean, validated JSON. Pull any group below on its own, or combine them in a single request.
Simplified-Chinese keyword rank tracking
Baidu paid-promotion competitive analysis
Baidu-property card visibility checks
Cross-border China market entry research
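For the rank-tracking use case, a minimal sketch of pulling one domain's organic position out of a response might look like this. It assumes the response shape from the sample above (`data.organic[]` with `position` and `url`) and that `url` holds the destination address rather than a Baidu redirect link; `rank_of` is a hypothetical helper, not an API function.

```python
from typing import Optional
from urllib.parse import urlparse


def rank_of(domain: str, response: dict) -> Optional[int]:
    """Return the first organic position held by `domain` or one of its subdomains."""
    for row in response.get("data", {}).get("organic", []):
        host = urlparse(row["url"]).hostname or ""
        if host == domain or host.endswith("." + domain):
            return row["position"]
    return None  # the domain does not appear in this result page


# Made-up response in the same shape as the sample above.
sample = {
    "status": "ok",
    "data": {
        "query": "structured web data",
        "organic": [
            {"position": 1, "title": "A", "url": "https://baike.baidu.com/item/x", "snippet": ""},
            {"position": 2, "title": "B", "url": "https://www.brand.example/cn", "snippet": ""},
        ],
    },
}
brand_rank = rank_of("brand.example", sample)
```

Storing the returned position per keyword and date gives a simple rank history to chart.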
Yes. Baidu mixes paid listings tightly into the result flow, so each record carries a paid-versus-organic flag and position to keep your visibility analysis honest.
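To illustrate how the paid-versus-organic flag could be used, the sketch below splits a result page into the two groups and measures how many of the top slots are paid. The field name `is_paid` is an assumption made for this example; the actual flag name is not shown in the sample response.

```python
from typing import List, Tuple


def split_by_paid(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Separate records into (paid, organic) using the hypothetical `is_paid` flag."""
    paid = [r for r in records if r.get("is_paid")]
    organic = [r for r in records if not r.get("is_paid")]
    return paid, organic


def paid_share(records: List[dict], top_n: int) -> float:
    """Fraction of the first `top_n` positions occupied by paid listings."""
    top = [r for r in records if r["position"] <= top_n]
    if not top:
        return 0.0
    return sum(1 for r in top if r.get("is_paid")) / len(top)


# Made-up page: two promoted listings ahead of two organic ones.
page = [
    {"position": 1, "url": "https://ad1.example/", "is_paid": True},
    {"position": 2, "url": "https://ad2.example/", "is_paid": True},
    {"position": 3, "url": "https://site1.example/", "is_paid": False},
    {"position": 4, "url": "https://site2.example/", "is_paid": False},
]
paid, organic = split_by_paid(page)
share_top2 = paid_share(page, top_n=2)
share_top4 = paid_share(page, top_n=4)
```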
Structured JSON straight from the API, or pushed to your stack natively — S3, BigQuery, Snowflake, Postgres, Kafka, or any HTTPS webhook. Call it from Python, Node, Go, Rust, or any HTTP client. The data lands where your pipeline already lives.
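A minimal sketch of calling the API from Python with only the standard library, mirroring the curl example: same endpoint, bearer token, and `url`/`geo`/`device` fields. The `Content-Type` header, the helper names, and the `CRAWLZO_KEY` environment variable lookup are assumptions for illustration. The target URL uses Baidu's `/s?wd=` result-page form, and `urlencode` percent-encodes Chinese keywords.

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

API_URL = "https://api.crawlzo.com/v4/scrape"  # endpoint from the curl example


def baidu_search_url(keyword: str) -> str:
    """Build a Baidu result-page URL; non-ASCII keywords are percent-encoded as UTF-8."""
    return "https://www.baidu.com/s?" + urlencode({"wd": keyword})


def build_request(target_url: str, geo: str = "us", device: str = "desktop") -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    payload = {"url": target_url, "geo": geo, "device": device}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ.get("CRAWLZO_KEY", ""),
            "Content-Type": "application/json",  # assumed; the body is JSON
        },
        method="POST",
    )


def scrape(keyword: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (performs a network call)."""
    req = build_request(baidu_search_url(keyword))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


# Building the request is side-effect free; call scrape("...") to actually send it.
req = build_request(baidu_search_url("structured web data"))
```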
No. You pay for valid, schema-passing rows only. Retries, blocks, CAPTCHAs, and 5xxs are on us. If a run doesn't return data that conforms to the schema, it isn't billed.
Every request routes through the same engine behind our Web Unblocker API: compliant residential IPs, real browser fingerprints, TLS-level evasion, behaviour modelling, and built-in CAPTCHA solving. Hard targets become routine.
Yes. We respect robots policies, rate budgets, and ToS-aware allow/deny lists. We deliver and move on — no row-level retention beyond your replay window. GDPR DPA, PII redaction, and custom data residency available on request.