POST /v1/documents/extract
curl --request POST \
  --url 'https://api.pardocs.com/v1/documents/extract?include_coordinates=true' \
  --header 'x-api-key: <api-key>' \
  --form 'file=@/path/to/document.pdf' \
  --form 'template_ids=["gAWADq4aB"]'
{
  "ocr": "<array>",
  "markdown": "<string>",
  "result": [
    {
      "document_type": "<string>",
      "pages": [
        123
      ],
      "properties": {}
    }
  ]
}
The Extract from File or URL endpoint runs extraction on a document you send in the same request, either as an uploaded file or via a URL. No document is stored in ParDocs; you get OCR, markdown, and extracted results back immediately. Use this when you want to process a document once without creating a document record.
Query Parameters:
  • include_coordinates (boolean, optional): When true, each extracted field in result includes the source location: value, page, and bounding_box_coordinates (normalized [x1, y1, x2, y2]). Default is false.
  • force_exclusive_template (boolean, optional): When true, the first template’s document type is used for the entire document instead of auto-splitting by type. Default is false.
  • document_layout_analysis (boolean, optional): When true, layout analysis is run before reading (useful for complex or table-heavy documents). Default is false.
Request Body (multipart/form-data):
  • file (binary, optional): The document file to extract from. Provide either file or url, not both.
  • url (string, optional): The URL of the document to extract from. Provide either file or url, not both.
  • template_ids (array of strings, required): List of template_ids used for splitting and extraction.
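The parameter rules above can be sketched as a small helper that assembles the request without sending it. This is a minimal sketch, not an official client: the function name `build_extract_request`, its return shape, and the choice to send only the flags that are set to true are assumptions made for illustration. The `template_ids` field is serialized as a JSON array string to mirror the curl example.

```python
import json
from typing import Optional

EXTRACT_URL = "https://api.pardocs.com/v1/documents/extract"


def build_extract_request(
    template_ids: list,
    file_path: Optional[str] = None,
    url: Optional[str] = None,
    include_coordinates: bool = False,
    force_exclusive_template: bool = False,
    document_layout_analysis: bool = False,
) -> dict:
    """Assemble query parameters and form fields (hypothetical helper)."""
    # The docs say: provide either file or url, not both.
    if (file_path is None) == (url is None):
        raise ValueError("Provide exactly one of file_path or url.")
    # template_ids is required; rejecting an empty list is an assumption.
    if not template_ids:
        raise ValueError("template_ids is required.")

    # Only send flags that are true, so the documented defaults (false)
    # apply to everything else.
    flags = {
        "include_coordinates": include_coordinates,
        "force_exclusive_template": force_exclusive_template,
        "document_layout_analysis": document_layout_analysis,
    }
    params = {name: "true" for name, enabled in flags.items() if enabled}

    # Mirror the curl example: template_ids=["gAWADq4aB"]
    form = {"template_ids": json.dumps(list(template_ids))}
    if url is not None:
        form["url"] = url

    return {"url": EXTRACT_URL, "params": params, "form": form, "file_path": file_path}


request = build_extract_request(
    ["gAWADq4aB"],
    file_path="/path/to/document.pdf",
    include_coordinates=True,
)
```

The resulting dictionary could then be handed to an HTTP client of your choice, passing `params` as the query string, `form` as multipart fields, the opened file as the `file` part, and the API key in the `x-api-key` header.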
Response:
The response is a single object with:
  • ocr: Array of OCR output per page.
  • markdown: Full document text as markdown.
  • result: Array of split sections. Each item has document_type, pages, and properties (the extracted key–value pairs for that section).
When include_coordinates=true, each value in properties is an object with value, page, and bounding_box_coordinates (normalized [x1, y1, x2, y2]) instead of a plain string or number, so you can map each field back to a region on the document.
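Because a property can be either a plain value or an object with value, page, and bounding_box_coordinates, client code may want to normalize the two shapes. The following is a minimal sketch under that reading of the response; the helper names `plain_value` and `flatten_sections` and the sample field names (`invoice_number`, `total`) are hypothetical and not part of the API.

```python
from typing import Any


def plain_value(prop: Any) -> Any:
    """Return the extracted value whether or not coordinates were requested."""
    # With include_coordinates=true, each property is an object carrying
    # value, page, and bounding_box_coordinates; otherwise it is the value itself.
    if isinstance(prop, dict) and "value" in prop and "bounding_box_coordinates" in prop:
        return prop["value"]
    return prop


def flatten_sections(response: dict) -> list:
    """Reduce each split section to document_type, pages, and plain property values."""
    return [
        {
            "document_type": section["document_type"],
            "pages": section["pages"],
            "properties": {
                key: plain_value(val) for key, val in section["properties"].items()
            },
        }
        for section in response["result"]
    ]


# Hypothetical response body, shaped as described above (made-up values).
sample = {
    "ocr": [],
    "markdown": "",
    "result": [
        {
            "document_type": "invoice",
            "pages": [1],
            "properties": {
                "invoice_number": {
                    "value": "INV-001",
                    "page": 1,
                    "bounding_box_coordinates": [0.1, 0.1, 0.3, 0.15],
                },
                "total": {
                    "value": 42.5,
                    "page": 1,
                    "bounding_box_coordinates": [0.7, 0.8, 0.9, 0.85],
                },
            },
        }
    ],
}

sections = flatten_sections(sample)
```

Checking for both `value` and `bounding_box_coordinates` is a conservative choice: it avoids unwrapping an extracted property that merely happens to be an object with a `value` key.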

Authorizations

x-api-key
string
header
required

Query Parameters

include_coordinates
boolean
default:false

When true, each extracted field in result includes value, page, and normalized bounding_box_coordinates [x1, y1, x2, y2] (0–1) for the source region in the document.
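To draw or crop the source region, the normalized coordinates have to be scaled to the rendered page size. A minimal sketch, assuming x values scale with the page width, y values scale with the page height, and the origin matches that of your rendered image (the documentation above does not state the origin, so verify this against a known document). The function name `bbox_to_pixels` is hypothetical.

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Scale a normalized [x1, y1, x2, y2] box (0-1) to integer pixel coordinates."""
    x1, y1, x2, y2 = bbox
    # Guard against values outside the documented 0-1 range.
    if not all(0.0 <= v <= 1.0 for v in (x1, y1, x2, y2)):
        raise ValueError("bounding box values must be within 0-1")
    # Assumption: x scales with width and y with height, same origin as the image.
    return (
        round(x1 * page_width),
        round(y1 * page_height),
        round(x2 * page_width),
        round(y2 * page_height),
    )


# Example: a page rendered at 800 x 1000 pixels.
pixels = bbox_to_pixels([0.25, 0.5, 0.75, 0.625], 800, 1000)
# pixels == (200, 500, 600, 625)
```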

force_exclusive_template
boolean
default:false

When true, force the first template's document_type for the whole document instead of auto-splitting by type.

document_layout_analysis
boolean
default:false

When true, run layout analysis before reading (e.g. for complex or table-heavy documents).

Body

multipart/form-data
template_ids
string[]
required

List of template_id values to use for splitting and extraction.

file
file

Document file to extract from. Provide either file or url, not both.

url
string<uri>

URL of the document to extract from. Provide either file or url, not both.

Response

Successful Response. Returns ocr, markdown, and result (array of { document_type, pages, properties }). When include_coordinates=true, each property value is an object with value, page, and bounding_box_coordinates.

ocr
array

OCR output per page from the document.

markdown
string

Converted markdown text of the full document.

result
object[]