
Overview

Use the Extract API to pull specific data fields from parsed documents. You define a schema that specifies which fields to extract, and Extract returns their values in a structured, predictable format. Extract is designed for high-volume, repeatable workflows: use it when you need to retrieve the same set of fields from many documents, such as invoice totals, contract dates, or form field values. Because extraction is schema-driven, results are consistent across documents with varying layouts. Extract runs after Parse, which is required as the first step in all ADE workflows. If you're working with multi-document files, Extract can also run after Split.

What’s New in Extract

The Extract API was significantly updated on April 2, 2026. To get the full benefits of these improvements, use the model extract-20260314 or later.
  • Unlimited schema size: No limits on the number of fields, nested levels, or characters in a schema.
  • Multi-document schemas: Use the API or the Playground to generate a single schema that covers multiple document types. This is useful when similar documents share most fields but differ in others.
  • Improved handling of large content: Better extraction from large tables, large arrays, and long documents.
  • Schema drift correction: If you add a new document type to an existing workflow, you can prompt for an update to the existing schema so that it accommodates the new type, either in the Playground or through the API.
  • format keyword: Specify how extracted values should be formatted. See format.
  • x-alternativeNames keyword: Define alternative labels for fields that may be named differently across documents. See Alternative Names.
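If your code selects a model by name, a small check like the following can confirm that the name is extract-20260314 or later. This is a minimal sketch: it assumes model names follow the dated `extract-YYYYMMDD` pattern shown above, and the helper names are hypothetical, not part of the Extract API or our libraries.

```python
import re
from datetime import date

# Earliest model that includes the April 2026 Extract improvements.
MIN_MODEL = "extract-20260314"

# Assumption: model names follow the dated "extract-YYYYMMDD" pattern.
_MODEL_PATTERN = re.compile(r"^extract-(\d{4})(\d{2})(\d{2})$")


def model_date(model: str) -> date:
    """Parse a dated model name such as 'extract-20260314' into a date."""
    match = _MODEL_PATTERN.match(model)
    if match is None:
        raise ValueError(f"not a dated extract model name: {model!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def has_2026_features(model: str) -> bool:
    """Return True if the model is extract-20260314 or later (hypothetical helper)."""
    return model_date(model) >= model_date(MIN_MODEL)


if __name__ == "__main__":
    print(has_2026_features("extract-20260314"))  # True
    print(has_2026_features("extract-20250101"))  # False (hypothetical older name)
```

Comparing parsed dates rather than raw strings makes the "or later" rule explicit and rejects names that don't follow the assumed pattern.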

Get Started: Extraction Workflow

You can use the schema extraction wizard directly in our Playground to build and validate an extraction schema. The Playground generates scripts that you can then copy and use in your own code:
  1. Use the schema extraction wizard in our Playground to build a schema tailored to your documents. See Build a Schema with the Wizard.
  2. Copy the script for the method you plan to use: the library or the API. See Export the Relevant Format.
  3. Paste the script into your code.
You can also extract data in the Playground. We recommend doing this only for testing purposes, since the Playground isn’t designed to handle bulk document processing.

Use the ADE Extract API to Extract Fields from Markdown

Use the ADE Extract API to extract data from the Markdown output created by the Parse API. For details, see the full API reference.

Specify Documents to Run Extraction On

The API offers two parameters for specifying the document you want to extract from:
  • markdown: The Markdown file you want to run extraction on.
  • markdown_url: The URL of the Markdown file you want to run extraction on.
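The sketch below assembles the document-source part of a request. It assumes that you send exactly one of `markdown` or `markdown_url`, and that `markdown` carries the Markdown text itself; the helper `document_source` is hypothetical and not part of our libraries.

```python
from typing import Dict, Optional


def document_source(
    markdown: Optional[str] = None,
    markdown_url: Optional[str] = None,
) -> Dict[str, str]:
    """Return the request field that identifies the document to extract from.

    Hypothetical helper. Assumes exactly one of markdown / markdown_url
    should be sent, and that markdown holds the Markdown text itself.
    """
    # Reject both-missing and both-present in a single check.
    if (markdown is None) == (markdown_url is None):
        raise ValueError("Provide exactly one of markdown or markdown_url.")
    if markdown is not None:
        return {"markdown": markdown}
    return {"markdown_url": markdown_url}


# Usage: point Extract at Markdown hosted at a placeholder URL.
fields = document_source(markdown_url="https://example.com/invoice.md")
```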

Set the Extraction Schema

Set the extraction schema in the schema parameter. The schema must meet specific format and property requirements. For detailed guidance, see JSON Schema for Extraction.
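As an illustration, the following Python dictionary describes a schema for a few invoice fields using standard JSON Schema keywords (`type`, `properties`, `items`, `description`). The field names are hypothetical, and it is an assumption that the `schema` parameter receives the schema serialized as a JSON string; check JSON Schema for Extraction for the exact requirements.

```python
import json

# Illustrative extraction schema for invoices. Field names are hypothetical.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The invoice identifier printed on the document.",
        },
        "invoice_date": {
            "type": "string",
            "description": "The date the invoice was issued.",
        },
        "total_amount": {
            "type": "number",
            "description": "The total amount due, including tax.",
        },
        "line_items": {
            "type": "array",
            "description": "One entry per billed item.",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# Assumption: the schema parameter takes the schema as JSON text.
schema_json = json.dumps(invoice_schema)
```

Descriptions give the extractor context about each field, and the nested `line_items` array shows how repeated rows (such as table lines) can be modeled.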

Set the strict Parameter

Use the optional strict parameter to control how the API handles schemas that include keywords that cause errors.
  • If strict is false: the API continues processing and returns a 206 (Partial Content).
  • If strict is true: the API stops processing and returns a 422 (Unprocessable Entity).
Regardless of the strict setting, the API returns 422 if the schema fails validation, and 206 if the extracted output does not conform to the schema after extraction completes.
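The rules above can be summarized as a small decision function. This is a hypothetical helper that models the documented behavior for reasoning and testing; it does not call the API. It assumes that a fully successful request returns 200 OK, which this section does not state.

```python
def expected_status(
    *,
    strict: bool,
    schema_valid: bool = True,
    has_error_keywords: bool = False,
    output_conforms: bool = True,
) -> int:
    """Predict the HTTP status code from the documented strict-mode rules.

    Hypothetical helper: it models the behavior described above and does
    not call the API. The 200 for full success is an assumption.
    """
    if not schema_valid:
        return 422  # Schema fails validation: 422 whatever strict is.
    if has_error_keywords:
        # strict=True stops processing; strict=False continues with a partial result.
        return 422 if strict else 206
    if not output_conforms:
        return 206  # Output does not conform to the schema after extraction.
    return 200  # Assumption: full success returns 200 OK.


if __name__ == "__main__":
    print(expected_status(strict=True, has_error_keywords=True))   # 422
    print(expected_status(strict=False, has_error_keywords=True))  # 206
```

Checking schema validity first reflects that a validation failure yields 422 no matter how strict is set.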

Extracted Output

For details about the extraction response structure and fields, see JSON Response for Extraction.

Run Extract with Our Libraries

To learn how to run Extract with our libraries, see the following guides.

Python Library

Run Extract with our Python library.

TypeScript Library

Run Extract with our TypeScript library.

Use Parse Markdown for Best Results

The Extract API is optimized for Markdown generated by the Parse API. The parsed output includes element IDs, anchor tags, chunk tags, and other metadata that Extract uses during extraction. Extract can also process generic Markdown files or edited Parse Markdown, but results may be less accurate. For best results:
  • Use only Markdown output from the Parse API, not generic Markdown files.
  • Do not edit the Markdown from Parse before passing it to Extract.
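One way to enforce the second recommendation in your own pipeline is to record a fingerprint of the Parse Markdown when you receive it and verify it just before calling Extract. The helpers below are a hypothetical sketch that uses SHA-256 from Python's standard library; they are not part of the ADE libraries.

```python
import hashlib


def fingerprint(markdown: str) -> str:
    """Return the SHA-256 hex digest of the Markdown text."""
    return hashlib.sha256(markdown.encode("utf-8")).hexdigest()


def check_unedited(markdown: str, expected_digest: str) -> None:
    """Raise ValueError if the Markdown no longer matches the recorded digest."""
    if fingerprint(markdown) != expected_digest:
        raise ValueError(
            "Markdown differs from the Parse output; re-run Parse instead of editing it."
        )


# Hypothetical stand-in for Markdown returned by Parse.
parse_markdown = "# Invoice\n\nTotal: $120.00\n"
digest = fingerprint(parse_markdown)  # Record this when Parse returns.

check_unedited(parse_markdown, digest)  # Passes: the content is unchanged.
```

Any change to the text, including whitespace, alters the digest, so accidental edits are caught before they can reduce extraction accuracy.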