The library is a lightweight TypeScript library you can use for parsing documents, classifying pages, extracting data, generating tables of contents, and splitting documents into sub-documents. The library is automatically generated from our API specification, ensuring you have access to the latest endpoints and parameters.
To use the library, first generate an API key. Save the key to a .zshrc file or another secure location on your computer. Then export the key as an environment variable.
export VISION_AGENT_API_KEY=<your-api-key>
When initializing the client, the library automatically reads from the VISION_AGENT_API_KEY environment variable:
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE();
Alternatively, you can explicitly pass the API key when initializing the client:
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE({
  apikey: process.env.VISION_AGENT_API_KEY
});
For more information about API keys and alternate methods for setting the API key, go to API Key.
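Because the client reads the key from the environment, a missing variable may only show up later as a failed request. The following is a minimal sketch of a fail-fast check, not part of the library: `requireApiKey` is a hypothetical helper name, and only the `VISION_AGENT_API_KEY` variable name comes from this page.

```typescript
// Hypothetical helper (not part of landingai-ade): fail fast when the key is missing.
function requireApiKey(
  env: Record<string, string | undefined> = process.env
): string {
  // Treat an unset, empty, or whitespace-only value as missing.
  const key = env["VISION_AGENT_API_KEY"]?.trim();
  if (!key) {
    throw new Error(
      "VISION_AGENT_API_KEY is not set. Export it before creating the client."
    );
  }
  return key;
}
```

The result could then be passed explicitly when creating the client, for example `new LandingAIADE({ apikey: requireApiKey() })`.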
By default, the library uses the US endpoints. If your API key is from the EU endpoint, set the environment parameter to eu when initializing the client.
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE({
  environment: "eu",
});

// ... rest of your code
The parse method converts documents into structured Markdown with chunk and grounding metadata. Use these examples as guides to get started with parsing with the library.
The parse method accepts optional parameters to customize parsing behavior. To see all available parameters, go to ADE Parse API. Pass these parameters directly to the parse() method.
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

const response = await client.parse({
  document: fs.createReadStream("/path/to/document.pdf"),
  model: "dpt-2-latest",
  split: "page"
});
The parseJobs resource enables you to asynchronously parse documents that are up to 1,000 pages or 1 GB.
For more information about parse jobs, go to Parse Large Files (Parse Jobs).
Here is the basic workflow for working with parse jobs:
Start a parse job.
Copy the job_id in the response.
Get the results from the parsing job with the job_id.
This script contains the full workflow:
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Step 1: Create a parse job
const job = await client.parseJobs.create({
  document: fs.createReadStream("/path/to/file/document"),
  model: "dpt-2-latest"
});

const jobId = job.job_id;
console.log(`Job ${jobId} created.`);

// Step 2: Get the parsing results
while (true) {
  const response = await client.parseJobs.get(jobId);
  if (response.status === "completed") {
    console.log(`Job ${jobId} completed.`);
    break;
  }
  console.log(`Job ${jobId}: ${response.status} (${(response.progress * 100).toFixed(0)}% complete)`);
  await new Promise(resolve => setTimeout(resolve, 5000));
}

// Step 3: Access the parsed data
const response = await client.parseJobs.get(jobId);
console.log("Global Markdown:", response.data.markdown.substring(0, 200) + "...");
console.log(`Number of chunks: ${response.data.chunks.length}`);

// Save Markdown output (useful if you plan to run extract on the Markdown)
fs.writeFileSync("output.md", response.data.markdown, "utf-8");
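The polling loop above exits only when the status is "completed", so it would keep running if a job ended in any other state. The following is a minimal sketch of a generic polling helper with a timeout. `pollUntil` and its options are hypothetical names, not library API, and the only status value confirmed on this page is "completed".

```typescript
// Hypothetical helper: poll an async getter until a predicate passes or time runs out.
async function pollUntil<T>(
  fetchState: () => Promise<T>,
  isDone: (state: T) => boolean,
  options: { intervalMs?: number; timeoutMs?: number } = {}
): Promise<T> {
  const { intervalMs = 5000, timeoutMs = 10 * 60 * 1000 } = options;
  const deadline = Date.now() + timeoutMs;

  while (true) {
    const state = await fetchState();
    if (isDone(state)) return state;

    // Stop if the next wait would run past the deadline.
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Timed out after ${timeoutMs} ms`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

With the client and jobId from the script above, this could be called as `await pollUntil(() => client.parseJobs.get(jobId), (r) => r.status === "completed")`. Keeping the predicate separate lets you add other terminal states once you know which values your jobs report.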
To list all async parse jobs associated with your API key, run this code:
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE();

// List all jobs
const response = await client.parseJobs.list();
for (const job of response.jobs) {
  console.log(`Job ${job.job_id}: ${job.status}`);
}
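When the list is long, a per-status summary can be easier to read than one line per job. The following is a minimal sketch: `JobSummary` and `countJobsByStatus` are hypothetical names, and the interface covers only the job_id and status fields used in the listing above.

```typescript
// Shape assumed from the fields used in the listing example (job_id, status).
interface JobSummary {
  job_id: string;
  status: string;
}

// Hypothetical helper: count how many jobs are in each status.
function countJobsByStatus(jobs: JobSummary[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const job of jobs) {
    counts.set(job.status, (counts.get(job.status) ?? 0) + 1);
  }
  return counts;
}
```

For example, `countJobsByStatus(response.jobs)` could be logged after the call to `client.parseJobs.list()`.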
The extract method extracts structured data from Markdown content using extraction schemas. Use these examples as guides to get started with extracting with the library.
Pass Markdown Content
The library supports a few methods for passing the Markdown content for extraction:
If you already have a Markdown file (from a previous parsing operation), you can extract data directly from it. Use the markdown parameter with fs.createReadStream() for local Markdown files or with fetch() for remote Markdown files.
import LandingAIADE from "landingai-ade";
import fs from "fs";

// Define your extraction schema
const schemaDict = {
  type: "object",
  properties: {
    employee_name: {
      type: "string",
      description: "The employee's full name"
    },
    employee_ssn: {
      type: "string",
      description: "The employee's Social Security Number"
    },
    gross_pay: {
      type: "number",
      description: "The gross pay amount"
    }
  }
};

const client = new LandingAIADE();
const schemaJson = JSON.stringify(schemaDict);

// Extract from a local markdown file
const extractResponse = await client.extract({
  schema: schemaJson,
  markdown: fs.createReadStream("/path/to/output.md"),
  model: "extract-latest"
});

// Or extract from a remote markdown file
// (a separate variable name avoids redeclaring extractResponse)
const remoteExtractResponse = await client.extract({
  schema: schemaJson,
  markdown: await fetch("https://example.com/document.md"),
  model: "extract-latest"
});

// Access the extracted data
console.log(extractResponse.extraction);
Use Zod schemas to define your extraction schema in a type-safe way. Zod provides TypeScript type inference and runtime validation for your extracted data.
To use Zod with the library, install zod:
npm install zod
After installing zod, run extraction with the library:
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";
import { z } from "zod";

// Define your extraction schema as a Zod schema
const PayStubSchema = z.object({
  employee_name: z.string().describe("The employee's full name"),
  employee_ssn: z.string().describe("The employee's Social Security Number"),
  gross_pay: z.number().describe("The gross pay amount")
});

// Extract TypeScript type from schema
type PayStubData = z.infer<typeof PayStubSchema>;

// Initialize the client
const client = new LandingAIADE();

// First, parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/pay-stub.pdf"),
  model: "dpt-2-latest"
});

// Convert Zod schema to JSON schema
const schema = JSON.stringify(z.toJSONSchema(PayStubSchema));

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schema,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Access the extracted data with type safety
const data = extractResponse.extraction as PayStubData;
console.log(data);

// Access extraction metadata to see which chunks were referenced
console.log(extractResponse.extraction_metadata);
Define your extraction schema as an object directly in your script, then convert it to a JSON string.
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";

// Define your extraction schema as an object
const schemaDict = {
  type: "object",
  properties: {
    employee_name: {
      type: "string",
      description: "The employee's full name"
    },
    employee_ssn: {
      type: "string",
      description: "The employee's Social Security Number"
    },
    gross_pay: {
      type: "number",
      description: "The gross pay amount"
    }
  }
};

// Initialize the client
const client = new LandingAIADE();

// First, parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/pay-stub.pdf"),
  model: "dpt-2-latest"
});

// Convert schema object to JSON string
const schemaJson = JSON.stringify(schemaDict);

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schemaJson,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Access the extracted data
console.log(extractResponse.extraction);

// Access extraction metadata to see which chunks were referenced
console.log(extractResponse.extraction_metadata);
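A plain JSON schema does not give TypeScript any type information about the extraction result. The following is a minimal sketch of a hand-written type guard for the pay stub fields defined above. `PayStubFields` and `isPayStubFields` are hypothetical names, and the guard assumes all three fields are present in the result.

```typescript
// Mirrors the three properties of the pay stub schema shown above.
interface PayStubFields {
  employee_name: string;
  employee_ssn: string;
  gross_pay: number;
}

// Hypothetical type guard: narrows an unknown value to PayStubFields at runtime.
function isPayStubFields(value: unknown): value is PayStubFields {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record["employee_name"] === "string" &&
    typeof record["employee_ssn"] === "string" &&
    typeof record["gross_pay"] === "number"
  );
}
```

Checking `isPayStubFields(extractResponse.extraction)` before reading fields avoids relying on an unchecked cast.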
Load your extraction schema from a separate JSON file for better organization and reusability.
For example, here is the pay_stub_schema.json file:
{
  "type": "object",
  "properties": {
    "employee_name": {
      "type": "string",
      "description": "The employee's full name"
    },
    "employee_ssn": {
      "type": "string",
      "description": "The employee's Social Security Number"
    },
    "gross_pay": {
      "type": "number",
      "description": "The gross pay amount"
    }
  }
}
You can pass the JSON file defined above in the following script:
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";

// Initialize the client
const client = new LandingAIADE();

// First, parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/pay-stub.pdf"),
  model: "dpt-2-latest"
});

// Load schema from JSON file
const schemaJson = fs.readFileSync("pay_stub_schema.json", "utf-8");

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schemaJson,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Access the extracted data
console.log(extractResponse.extraction);

// Access extraction metadata to see which chunks were referenced
console.log(extractResponse.extraction_metadata);
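A schema loaded from disk is sent as-is, so a malformed file would only be caught by the API. The following is a minimal sketch of a local sanity check. `parseObjectSchema` is a hypothetical helper, and the rule it enforces (a top-level "object" type with a "properties" map) simply matches the schemas on this page; the API may accept other shapes.

```typescript
// Hypothetical helper: confirm the schema text is valid JSON with the
// top-level shape used by the examples on this page.
function parseObjectSchema(
  schemaJson: string
): { type: "object"; properties: Record<string, unknown> } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(schemaJson);
  } catch (error) {
    throw new Error(`Schema is not valid JSON: ${(error as Error).message}`);
  }

  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Schema must be a JSON object.");
  }
  const schema = parsed as Record<string, unknown>;
  if (schema["type"] !== "object") {
    throw new Error('Schema must declare "type": "object" at the top level.');
  }
  const properties = schema["properties"];
  if (typeof properties !== "object" || properties === null || Array.isArray(properties)) {
    throw new Error('Schema must contain a "properties" object.');
  }
  return { type: "object", properties: properties as Record<string, unknown> };
}
```

Calling `parseObjectSchema(schemaJson)` right after `fs.readFileSync` would surface a broken file before any request is made.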
Define nested Zod schemas to extract hierarchical data from documents. This approach organizes related information under meaningful section names.
Define nested schemas before the main extraction schema. Otherwise, the nested schemas will not be defined when referenced.
For example, to extract data from the Patient Details and Emergency Contact Information sections in this Medical Form, define separate schemas for each section, then combine them in a main schema.
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";
import { z } from "zod";

// Define a nested schema for patient-specific information
const PatientDetailsSchema = z.object({
  patient_name: z.string().describe("Full name of the patient."),
  date: z.string().describe("Date the patient information form was filled out.")
});

// Define a nested schema for emergency contact details
const EmergencyContactInformationSchema = z.object({
  emergency_contact_name: z.string().describe("Full name of the emergency contact person."),
  relationship_to_patient: z.string().describe("Relationship of the emergency contact to the patient."),
  primary_phone_number: z.string().describe("Primary phone number of the emergency contact."),
  secondary_phone_number: z.string().describe("Secondary phone number of the emergency contact."),
  address: z.string().describe("Full address of the emergency contact.")
});

// Define the main extraction schema that combines all the nested schemas
const PatientAndEmergencyContactInformationSchema = z.object({
  patient_details: PatientDetailsSchema.describe("Information about the patient as provided in the form."),
  emergency_contact_information: EmergencyContactInformationSchema.describe("Details of the emergency contact person for the patient.")
});

// Extract TypeScript type from schema
type PatientAndEmergencyContactInformation = z.infer<typeof PatientAndEmergencyContactInformationSchema>;

// Initialize the client
const client = new LandingAIADE();

// Parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/medical-form.pdf"),
  model: "dpt-2-latest"
});

// Convert Zod schema to JSON schema
const schema = JSON.stringify(z.toJSONSchema(PatientAndEmergencyContactInformationSchema));

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schema,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Display the extracted structured data
console.log(extractResponse.extraction);
Use Zod’s z.array() to extract repeatable data structures when you don’t know how many items will appear. Common examples include line items in invoices, transaction records, or contact information for multiple people.
For example, to extract variable-length wire instructions and line items from this Wire Transfer Form, use z.array(DescriptionItemSchema) for line items and z.array(WireInstructionSchema) for wire transfer details.
import LandingAIADE, { toFile } from "landingai-ade";
import fs from "fs";
import { z } from "zod";

// Nested schemas for array fields
const DescriptionItemSchema = z.object({
  description: z.string().describe("Invoice or Bill Description"),
  amount: z.number().describe("Invoice or Bill Amount")
});

const WireInstructionSchema = z.object({
  bank_name: z.string().describe("Bank name"),
  bank_address: z.string().describe("Bank address"),
  bank_account_no: z.string().describe("Bank account number"),
  swift_code: z.string().describe("SWIFT code"),
  aba_routing: z.string().describe("ABA routing number"),
  ach_routing: z.string().describe("ACH routing number")
});

// Invoice schema containing array fields
const InvoiceSchema = z.object({
  description_or_particular: z.array(DescriptionItemSchema).describe("List of invoice line items (description and amount)"),
  wire_instructions: z.array(WireInstructionSchema).describe("Wire transfer instructions")
});

// Main extraction schema
const ExtractedInvoiceFieldsSchema = z.object({
  invoice: InvoiceSchema.describe("Invoice list-type fields")
});

// Extract TypeScript type from schema
type ExtractedInvoiceFields = z.infer<typeof ExtractedInvoiceFieldsSchema>;

// Initialize the client
const client = new LandingAIADE();

// Parse the document to get markdown
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/wire-transfer.pdf"),
  model: "dpt-2-latest"
});

// Convert Zod schema to JSON schema
const schema = JSON.stringify(z.toJSONSchema(ExtractedInvoiceFieldsSchema));

// Extract structured data using the schema
const extractResponse = await client.extract({
  schema: schema,
  markdown: await toFile(Buffer.from(parseResponse.markdown), "document.md"),
  model: "extract-latest"
});

// Display the extracted data
console.log(extractResponse.extraction);
The classify method classifies each page in a document by type. Provide your document and a list of classes, and the API assigns a class to each page. Use these examples as guides to get started with classifying with the library.
Use the document_url parameter to classify files from remote URLs (http, https).
import LandingAIADE from "landingai-ade";

const client = new LandingAIADE();

const classes = [
  { class: "invoice", description: "A commercial bill with line items, totals, and payment terms" },
  { class: "bank_statement", description: "A monthly summary of account transactions" }
];
const classesJson = JSON.stringify(classes);

const response = await client.classify({
  classes: classesJson as any,
  document_url: "https://example.com/document.pdf",
  model: "classify-latest"
});

for (const result of response.classification) {
  console.log(`Page ${result.page}: ${result.class}`);
}
// List the pages classified as "unknown" and the class suggested for each
const unknown = response.classification.filter(r => r.class === "unknown");
for (const r of unknown) {
  console.log(`Page ${r.page}: suggested class is ${r.suggested_class}`);
}
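The per-page results above can also be grouped so that each class maps to its list of pages. The following is a minimal sketch: `PageClassification` and `groupPagesByClass` are hypothetical names, and the interface covers only the page, class, and suggested_class fields used in the examples above.

```typescript
// Shape assumed from the fields used in the classification examples.
interface PageClassification {
  page: number;
  class: string;
  suggested_class?: string | null;
}

// Hypothetical helper: map each class to the pages assigned to it, in input order.
function groupPagesByClass(
  results: PageClassification[]
): Map<string, number[]> {
  const pagesByClass = new Map<string, number[]>();
  for (const result of results) {
    const pages = pagesByClass.get(result.class) ?? [];
    pages.push(result.page);
    pagesByClass.set(result.class, pages);
  }
  return pagesByClass;
}
```

For example, `groupPagesByClass(response.classification).get("invoice")` would return the pages labeled as invoices, or undefined if there are none.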
The section method analyzes a parsed document and generates a hierarchical table of contents. Use these examples as guides to get started with sectioning with the library.
Pass Markdown Content
The library supports a few methods for passing the Markdown content for sectioning:
If you already have a Markdown file (from a previous parsing operation), you can section it directly. Use the markdown parameter with fs.createReadStream() for local Markdown files or with fetch() for remote Markdown files.
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Section from a local Markdown file
const sectionResponse = await client.section({
  markdown: fs.createReadStream("/path/to/parsed_output.md"),
  model: "section-latest"
});

// Or section from a remote Markdown file
// (a separate variable name avoids redeclaring sectionResponse)
const remoteSectionResponse = await client.section({
  markdown: await fetch("https://example.com/document.md"),
  model: "section-latest"
});

// Access the table of contents
for (const entry of sectionResponse.table_of_contents) {
  const indent = " ".repeat(entry.level - 1);
  console.log(`${indent}${entry.section_number}. ${entry.title}`);
}
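The loop above prints one line per entry. To reuse the outline elsewhere, for example to write it to a file, the same formatting can return a string instead. The following is a minimal sketch: `TocEntry` and `renderToc` are hypothetical names, the interface covers only the three fields used above, and section_number is assumed to be a string such as "1.1".

```typescript
// Shape assumed from the fields used in the table-of-contents loop.
interface TocEntry {
  level: number;
  section_number: string;
  title: string;
}

// Hypothetical helper: render entries as an indented outline, one entry per line.
function renderToc(entries: TocEntry[], indentUnit = "  "): string {
  return entries
    .map((entry) => {
      // Guard against levels below 1 so repeat() never receives a negative count.
      const depth = Math.max(0, entry.level - 1);
      return `${indentUnit.repeat(depth)}${entry.section_number}. ${entry.title}`;
    })
    .join("\n");
}
```

For example, `renderToc(sectionResponse.table_of_contents)` would give the whole outline as a single string.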
The section method accepts optional parameters to customize sectioning behavior. To see all available parameters, go to API.
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

const sectionResponse = await client.section({
  markdown: fs.createReadStream("/path/to/parsed_output.md"),
  guidelines: "Treat each numbered article as a top-level section",
  model: "section-latest"
});
The split method classifies and separates a parsed document into multiple sub-documents based on Split Rules you define. Use these examples as guides to get started with splitting with the library.
Pass Markdown Content
The library supports a few methods for passing the Markdown content for splitting:
After parsing a document, you can pass the Markdown string directly from the ParseResponse to the split method without saving it to a file.
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Parse the document
const parseResponse = await client.parse({
  document: fs.createReadStream("/path/to/document.pdf"),
  model: "dpt-2-latest"
});

// Define Split Rules
const splitClass = [
  {
    name: "Bank Statement",
    description: "Document from a bank that summarizes all account activity over a period of time."
  },
  {
    name: "Pay Stub",
    description: "Document that details an employee's earnings, deductions, and net pay for a specific pay period.",
    identifier: "Pay Stub Date"
  }
];
const splitClassJson = JSON.stringify(splitClass);

// Split using the Markdown string from parse response
const splitResponse = await client.split({
  split_class: splitClassJson,
  markdown: parseResponse.markdown, // Pass Markdown string directly
  model: "split-latest"
});

// Access the splits
for (const split of splitResponse.splits) {
  console.log(`Classification: ${split.classification}`);
  console.log(`Identifier: ${split.identifier}`);
  console.log(`Pages: ${split.pages}`);
}
If you already have a Markdown file (from a previous parsing operation), you can split it directly. Use the markdown parameter with fs.createReadStream() for local Markdown files or with fetch() for remote Markdown files.
import LandingAIADE from "landingai-ade";
import fs from "fs";

const client = new LandingAIADE();

// Define Split Rules
const splitClass = [
  {
    name: "Invoice",
    description: "A document requesting payment for goods or services.",
    identifier: "Invoice Number"
  },
  {
    name: "Receipt",
    description: "A document acknowledging that payment has been received."
  }
];
const splitClassJson = JSON.stringify(splitClass);

// Split from a local Markdown file
const splitResponse = await client.split({
  split_class: splitClassJson,
  markdown: fs.createReadStream("/path/to/parsed_output.md"),
  model: "split-latest"
});

// Or split from a remote Markdown file
// (a separate variable name avoids redeclaring splitResponse)
const remoteSplitResponse = await client.split({
  split_class: splitClassJson,
  markdown: await fetch("https://example.com/document.md"),
  model: "split-latest"
});

// Access the splits
for (const split of splitResponse.splits) {
  console.log(`Classification: ${split.classification}`);
  if (split.identifier) {
    console.log(`Identifier: ${split.identifier}`);
  }
  console.log(`Number of pages: ${split.pages.length}`);
  console.log(`Markdown content: ${split.markdowns[0].substring(0, 100)}...`);
}
// Print a preview of the Markdown for each page in each split
for (const split of splitResponse.splits) {
  console.log(`Classification: ${split.classification}`);
  for (let i = 0; i < split.markdowns.length; i++) {
    console.log(`  Page ${split.pages[i]} Markdown: ${split.markdowns[i].substring(0, 100)}...`);
  }
}
Group splits by identifier:
const splitsByIdentifier = new Map<string, Array<typeof splitResponse.splits[0]>>();
for (const split of splitResponse.splits) {
  if (split.identifier) {
    const existing = splitsByIdentifier.get(split.identifier) || [];
    existing.push(split);
    splitsByIdentifier.set(split.identifier, existing);
  }
}

for (const [identifier, splits] of splitsByIdentifier.entries()) {
  console.log(`Identifier '${identifier}': ${splits.length} split(s)`);
}
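Once the splits are grouped or inspected, a common next step is to save each sub-document as its own Markdown file. The following is a minimal sketch that only builds the file names and contents. `SplitResult`, `slugify`, and `buildSplitFiles` are hypothetical names, the interface covers only the fields used in the examples above, and the naming pattern is an illustrative choice rather than a library convention.

```typescript
// Shape assumed from the fields used in the split examples.
interface SplitResult {
  classification: string;
  identifier?: string | null;
  pages: number[];
  markdowns: string[];
}

// Lowercase the text and replace runs of other characters with single hyphens.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Hypothetical helper: map a file name to the joined Markdown of each split.
function buildSplitFiles(splits: SplitResult[]): Map<string, string> {
  const files = new Map<string, string>();
  splits.forEach((split, index) => {
    const label = split.identifier
      ? `${split.classification}-${split.identifier}`
      : split.classification;
    // A numeric prefix keeps names unique and preserves the split order.
    const name = `${String(index + 1).padStart(2, "0")}-${slugify(label)}.md`;
    files.set(name, split.markdowns.join("\n\n"));
  });
  return files;
}
```

Each entry of `buildSplitFiles(splitResponse.splits)` could then be written to disk with `fs.writeFileSync(name, content, "utf-8")`.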
Use the optional saveTo parameter to save the full API response as a JSON file. The parameter is available on parse, extract, and split. The directory is created automatically if it doesn’t exist.
Pass a directory path. The library names the file using the input document’s filename and the function called (for example, document_parse_output.json).
When passing Markdown content as a string (markdown: parseResponse.markdown), the library cannot derive a filename from the content. In this situation, use Set the File Name instead.
The parse response includes a markdown field that you can pass directly to other functions in the same script. To save the Markdown for downstream tasks, write it to a file: