Data Extraction

POST /v1/scrape/extract

$0.01/call

Usage

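// Send the target URL and the extraction rules as a JSON body.
const res = await fetch('https://api.yepapi.com/v1/scrape/extract', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://news.ycombinator.com',
    extractRules: {
      // Collect every element matching the story title link selector.
      titles: { selector: '.titleline > a', type: 'list' },
    },
  }),
});
// Fail early on a non-2xx HTTP status before reading the body.
if (!res.ok) throw new Error(`Request failed with HTTP ${res.status}`);
const { data } = await res.json();
console.log(data.extracted); // { titles: [...] }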

curl -X POST https://api.yepapi.com/v1/scrape/extract \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com", "extractRules": {"titles": {"selector": ".titleline > a", "type": "list"}}}'

Request Body

Parameter	Type	Required	Description	Default
`url`	`string`	Yes	URL to extract data from	—
`extractRules`	`object`	Yes	CSS/XPath extraction rules (see below)	—

Extract Rules Format

Simple: {"title": "h1"} — extracts text content of the first h1.

List: {"items": {"selector": ".card", "type": "list"}} — extracts all matching elements.

Nested: Extract multiple fields from repeating elements:

{
  "articles": {
    "selector": ".post",
    "type": "list",
    "output": {
      "title": ".post-title",
      "link": { "selector": "a", "output": "@href" }
    }
  }
}

Attributes: Use @attr to extract element attributes: {"image": "img@src"}.

XPath: Selectors starting with / are treated as XPath: {"title": "//h1"}.
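
The rule forms above can be combined into a single request body. The following is a minimal sketch, assuming the forms may be mixed in one `extractRules` object (the section implies this but does not state it outright). The helper name `buildExtractBody`, the target URL, and every selector are hypothetical placeholders for illustration.

```javascript
// Hypothetical helper: assembles a request body for POST /v1/scrape/extract.
// The selectors below are placeholders and do not describe a real page.
function buildExtractBody(url) {
  return {
    url,
    extractRules: {
      // Simple: text content of the first matching element.
      heading: 'h1',
      // XPath: a selector starting with "/" is treated as XPath.
      pageTitle: '//title',
      // Attribute: "@attr" extracts an attribute instead of text.
      heroImage: 'img@src',
      // List: every matching element.
      tags: { selector: '.tag', type: 'list' },
      // Nested: several fields from each repeating element.
      articles: {
        selector: '.post',
        type: 'list',
        output: {
          title: '.post-title',
          link: { selector: 'a', output: '@href' },
        },
      },
    },
  };
}

// Serialize the body exactly as it would be passed to fetch().
const body = buildExtractBody('https://example.com/blog');
console.log(JSON.stringify(body, null, 2));
```

Each top-level key in `extractRules` (`heading`, `tags`, `articles`, and so on) should then appear as a key in `data.extracted`.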

Response

{
  "ok": true,
  "data": {
    "url": "https://news.ycombinator.com",
    "extracted": {
      "titles": [
        "Show HN: Open-source AI code editor",
        "The State of WebAssembly 2026",
        "PostgreSQL 18 Released"
      ]
    }
  }
}

Response Fields

Field	Type	Description
`ok`	`boolean`	Whether the request succeeded
`data`	`object`	Response payload
`data.url`	`string`	The URL that was scraped
`data.extracted`	`object`	Extracted data, keyed by the names in your `extractRules`. Each key holds the result of its corresponding selector.
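
As a sketch of how these fields might be consumed, the helper below checks `ok` and confirms that the expected rule names are present before returning `data.extracted`. The function name `unwrapExtracted` is hypothetical, and the shape of a failed response is not documented here, so the sketch assumes only that `ok` is not `true` on failure.

```javascript
// Hypothetical helper: returns data.extracted or throws with a short reason.
// Assumption: a failed request yields a payload whose `ok` is not true.
function unwrapExtracted(payload, expectedKeys = []) {
  if (!payload || payload.ok !== true) {
    throw new Error('Extraction request did not succeed');
  }
  const extracted = (payload.data && payload.data.extracted) || {};
  // Verify that every rule name we sent came back as a key.
  const missing = expectedKeys.filter((key) => !(key in extracted));
  if (missing.length > 0) {
    throw new Error(`Missing extracted keys: ${missing.join(', ')}`);
  }
  return extracted;
}

// Usage with the sample response shown above.
const sample = {
  ok: true,
  data: {
    url: 'https://news.ycombinator.com',
    extracted: {
      titles: [
        'Show HN: Open-source AI code editor',
        'The State of WebAssembly 2026',
        'PostgreSQL 18 Released',
      ],
    },
  },
};
const { titles } = unwrapExtracted(sample, ['titles']);
console.log(titles.length); // 3
```

Checking for expected keys up front makes a mistyped rule name fail loudly instead of surfacing later as `undefined`.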

Under the Hood

Pages are rendered with JavaScript enabled before extraction. CSS selectors and XPath expressions both work — use whichever you prefer.