Classification, Extraction, Summarisation, and Analysis — Powered by Amazon Bedrock and Amazon Textract, Running Entirely Within Your AWS Account
Every organisation has a document processing bottleneck. Documents arrive (invoices, contracts, applications, correspondence, compliance filings, medical records, onboarding packets), and someone has to read them, classify them, extract the relevant data, enter it into a system, route them to the right person, and file them in the right place. This manual processing is slow, expensive, and error-prone, and it is often the main barrier to scaling document-intensive operations.
AI document processing automates this work. Not by replacing human judgement, but by handling the mechanical steps — identifying what type of document it is, extracting the structured data from it, classifying its sensitivity, summarising its content, and routing it to the right workflow — so that human attention is reserved for decisions that require it.
FormKiQ provides AI document processing as an integrated capability within a governed document management platform — not as a standalone AI service that processes documents in isolation. Documents are classified, extracted, enriched, and analysed within FormKiQ, and the results feed directly into metadata, search indexes, workflows, and governance controls. All AI processing runs within your AWS account through Amazon Bedrock and Amazon Textract. Your documents never leave your cloud environment, and inference region controls ensure processing stays within your data residency boundaries.
The AI Document Processing Stack
FormKiQ's AI document processing operates as a layered stack — each layer builds on the previous, and organisations can enable the layers they need:
| Layer | Technology | What It Does | Availability |
|---|---|---|---|
| OCR | Tesseract | Extracts raw text from scanned documents, images, and image-based PDFs | All editions |
| Structured extraction | Amazon Textract | Extracts tables, form fields, key-value pairs, and spatial relationships from documents | Essentials+ |
| AI classification | Amazon Bedrock | Identifies document type and applies classification metadata automatically | Advanced / Enterprise |
| AI extraction | Amazon Bedrock | Extracts entities (names, dates, amounts, identifiers) from unstructured text and applies them as structured metadata | Advanced / Enterprise |
| AI sensitivity | Amazon Bedrock | Identifies documents containing PII, PHI, financial data, privileged content, or other sensitive information | Advanced / Enterprise |
| AI summarisation | Amazon Bedrock | Generates concise summaries of document content for triage, review, and discovery | Advanced / Enterprise |
| AI analysis | Amazon Bedrock | Analyses document content against criteria, checklists, or requirements — producing structured analytical output | Advanced / Enterprise |
The distinction between layers matters. OCR tells you what the text says. Textract tells you what the form fields contain. Bedrock tells you what the document means. Each layer adds understanding, and the output of each layer feeds into FormKiQ's metadata, search, and workflow systems.
OCR and Structured Extraction
The foundation of AI document processing is getting text out of documents that aren't born digital — scanned PDFs, faxes, images, and multi-page paper submissions.
Tesseract OCR (All Editions)
FormKiQ Core includes OCR processing using Tesseract, an open-source OCR engine that provides reliable text extraction for standard digitisation workflows. Tesseract runs within your AWS account using Lambda functions, with extracted text stored as part of the document's metadata and available for full-text search.
Tesseract works well for clearly printed text in standard layouts. It has limitations with handwriting, complex multi-column layouts, low-quality scans, and documents where spatial relationships matter (tables, forms).
Amazon Textract (Essentials and Above)
From Essentials onward, FormKiQ provides structured extraction using Amazon Textract — AWS's managed machine learning service for document text and data extraction. Textract goes beyond raw text to understand document structure:
| Textract Capability | What It Extracts | Why It Matters |
|---|---|---|
| Text extraction | All text from the document — including handwriting | Handles the document types that Tesseract struggles with |
| Table extraction | Tables as structured data with rows, columns, and cell values | Financial statements, lab results, comparison tables — extracted as data, not as a wall of text |
| Form extraction | Labelled fields as key-value pairs | Application forms, intake forms, registration documents — each field extracted with its label and value |
| Layout understanding | Spatial relationships between text elements — columns, headers, sections | Multi-column documents, letters with headers and footers — text extracted in reading order, not pixel order |
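As a rough illustration of form extraction, the sketch below pairs KEY and VALUE blocks from a Textract `AnalyzeDocument` response (requested with the `FORMS` feature) into a plain dictionary. The block fields used (`BlockType`, `EntityTypes`, `Relationships`, `Text`) follow the Textract response format; the function name `form_fields` and the sample response are illustrative assumptions and not part of FormKiQ.

```python
from typing import Any


def _text_of(block: dict[str, Any], by_id: dict[str, dict[str, Any]]) -> str:
    """Join the WORD children of a block into a single string."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def form_fields(response: dict[str, Any]) -> dict[str, str]:
    """Return {label: value} for every KEY block in a Textract response."""
    by_id = {b["Id"]: b for b in response["Blocks"]}
    fields = {}
    for block in response["Blocks"]:
        if block["BlockType"] != "KEY_VALUE_SET":
            continue
        if "KEY" not in block.get("EntityTypes", []):
            continue
        label = _text_of(block, by_id)
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                for value_id in rel["Ids"]:
                    value_parts.append(_text_of(by_id[value_id], by_id))
        fields[label] = " ".join(p for p in value_parts if p)
    return fields


# Hand-built sample in the shape Textract returns (not real service output).
sample = {
    "Blocks": [
        {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
         "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                           {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
        {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
         "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
        {"Id": "w1", "BlockType": "WORD", "Text": "Policy"},
        {"Id": "w2", "BlockType": "WORD", "Text": "number:"},
        {"Id": "w3", "BlockType": "WORD", "Text": "PN-1042"},
    ]
}

print(form_fields(sample))
```

The sketch ignores checkboxes (`SELECTION_ELEMENT` blocks) and tables, which a production integration would also need to handle.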
Custom Extraction Mappings
Textract's form and table extraction can be combined with custom mappings that route specific extracted fields directly to FormKiQ metadata attributes:
| Document Type | Extracted Fields | Metadata Result |
|---|---|---|
| Invoice | Vendor name, invoice number, date, line items, total | AP metadata for matching, approval, and archival |
| Contract | Counterparty, effective date, expiry, value | Contract metadata for lifecycle tracking and renewal |
| Application | Applicant name, date, type, key responses | Application metadata for eligibility routing and case creation |
| Insurance claim | Claimant, policy number, date of loss, amount | Claims metadata for intake routing and adjudication |
| Employee onboarding form | Name, role, start date, department, emergency contact | HR metadata for employee record creation |
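A custom mapping of this kind can be pictured as a lookup from extracted field labels to metadata attribute keys. The sketch below is a minimal illustration under assumed names: `INVOICE_MAPPING`, the attribute keys, and `apply_mapping` are hypothetical and do not reflect FormKiQ's actual mapping configuration.

```python
# Hypothetical mapping from normalised form labels to metadata attribute keys.
INVOICE_MAPPING = {
    "vendor name": "vendorName",
    "invoice number": "invoiceNumber",
    "date": "invoiceDate",
    "total": "invoiceTotal",
}


def normalise_label(label: str) -> str:
    """Lower-case a label and drop surrounding whitespace and a trailing colon."""
    return label.strip().rstrip(":").strip().lower()


def apply_mapping(fields: dict[str, str], mapping: dict[str, str]):
    """Split extracted fields into mapped attributes and unmapped labels."""
    attributes = {}
    unmapped = []
    for label, value in fields.items():
        key = mapping.get(normalise_label(label))
        if key is None:
            unmapped.append(label)  # keep for review rather than dropping silently
        else:
            attributes[key] = value.strip()
    return attributes, unmapped


extracted = {"Vendor Name:": "Acme Ltd ", "Invoice Number": "INV-001", "PO": "77"}
attributes, unmapped = apply_mapping(extracted, INVOICE_MAPPING)
print(attributes, unmapped)
```

Returning the unmapped labels separately makes it easy to spot fields that a mapping does not yet cover.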
AI-Powered Classification
Document classification is the gateway to governance. Until a document is classified — identified by type, tagged with the right metadata, assigned to the right category — governance rules can't apply to it. Manual classification is the bottleneck that slows intake, introduces inconsistency, and creates gaps where documents sit unclassified and ungoverned. FormKiQ's AI classification — powered by Amazon Bedrock — automates this step.
How AI Classification Works
- Document arrives — uploaded via API, web console, email, SFTP, or Document Gateway
- AI analysis triggered — a document action triggers Bedrock analysis based on the document's source, workflow, or document type definition
- Type identification — Bedrock analyses the document content and determines its type (invoice, contract, correspondence, application, medical record, policy document, etc.)
- Confidence scoring — the classification includes a confidence score indicating how certain the model is about its determination
- High-confidence routing — documents classified with high confidence are automatically tagged and routed to the appropriate workflow
- Low-confidence review — documents classified with low confidence are routed to a human review queue for verification before metadata is finalised
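The last two steps amount to a threshold check on the confidence score. The sketch below shows the idea only; the `Classification` type, the destination strings, and the default threshold of 0.85 are assumptions chosen for illustration, not FormKiQ defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    document_type: str
    confidence: float  # assumed to be on a 0.0 to 1.0 scale


def route(result: Classification, threshold: float = 0.85) -> str:
    """Send confident results to a workflow and the rest to human review."""
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if result.confidence >= threshold:
        return f"workflow:{result.document_type}"
    return "queue:human-review"


print(route(Classification("invoice", 0.97)))
print(route(Classification("contract", 0.40)))
```

In practice the threshold would be tuned per document type, with a stricter value where a misclassification is more costly.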
What Makes This Different from Rule-Based Classification
Rule-based classification (matching on filename, sender, folder, or keyword) works for predictable, consistent document sources. It breaks down when documents arrive from varied sources in varied formats — which is the reality for most intake-heavy organisations. AI classification works on document content, not on superficial attributes, which means it handles the messy, unpredictable documents that rules can't classify.
| Classification Approach | Strengths | Weaknesses |
|---|---|---|
| Filename / folder rules | Fast, simple, predictable | Breaks when filenames are inconsistent or documents are misnamed |
| Keyword matching | Catches specific terms | High false positive rate; can't handle synonyms, abbreviations, or context |
| AI classification (Bedrock) | Understands content and context; handles varied formats and sources | Requires confidence thresholds; low-confidence results need human review |
The practical approach is to combine all three: rules handle the predictable cases, AI handles the varied and unpredictable ones, and human review handles the edge cases that neither can resolve confidently.
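One way to picture this combination is as a cascade: cheap rules run first, the AI classifier handles whatever the rules cannot, and anything below the confidence threshold goes to a person. The sketch below assumes hypothetical filename rules and uses a stub in place of the Bedrock call; none of these names come from FormKiQ.

```python
from typing import Callable, Optional, Tuple


def classify_by_rules(filename: str) -> Optional[str]:
    """Hypothetical filename rules; return None when no rule matches."""
    name = filename.lower()
    if name.startswith("inv-") or "invoice" in name:
        return "invoice"
    if name.endswith(".eml"):
        return "correspondence"
    return None


def classify(
    filename: str,
    text: str,
    ai_classifier: Callable[[str], Tuple[str, float]],
    threshold: float = 0.85,  # assumed value
) -> Tuple[str, str]:
    """Return (document_type, decided_by) using rules, then AI, then review."""
    rule_type = classify_by_rules(filename)
    if rule_type is not None:
        return rule_type, "rule"
    ai_type, confidence = ai_classifier(text)
    if confidence >= threshold:
        return ai_type, "ai"
    return ai_type, "human-review"  # the AI suggestion is kept for the reviewer


def stub_classifier(text: str) -> Tuple[str, float]:
    """Stand-in for a model call; a real classifier would analyse the text."""
    return ("contract", 0.93) if "agreement" in text.lower() else ("unknown", 0.30)


print(classify("INV-2291.pdf", "", stub_classifier))
print(classify("scan0001.pdf", "This Agreement is made between...", stub_classifier))
print(classify("scan0002.pdf", "illegible", stub_classifier))
```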
AI-Powered Extraction and Enrichment
Beyond classification, Bedrock extracts meaningful entities from document content and applies them as structured, searchable metadata. This goes beyond Textract's form-field extraction — Bedrock understands context and can extract entities from unstructured narrative text, not just labelled form fields.
| Entity Type | What Gets Extracted | How It's Used in FormKiQ |
|---|---|---|
| People | Names, titles, roles, signatories | Metadata for party-based search, access control, and correspondence tracking |
| Organisations | Company names, agency names, departments | Metadata for counterparty-level views and vendor/client portfolio search |
| Dates | Effective dates, expiry dates, filing dates, deadlines | Metadata for milestone tracking, renewal alerting, and retention triggers |
| Financial | Amounts, payment terms, account numbers | Metadata for financial reporting, approval routing, and three-way matching |
| Identifiers | Case numbers, policy numbers, contract IDs, permit numbers | Metadata for cross-referencing with enterprise systems |
| Locations | Addresses, jurisdictions, regions | Metadata for jurisdiction-aware governance and data residency |
The key insight is that extraction without governance is just data entry automation. In FormKiQ, extracted entities become governed metadata: searchable, access-controlled, and tied to the document's lifecycle. An extracted expiry date becomes a renewal alert. An extracted counterparty becomes a portfolio-level search dimension. An extracted sensitivity indicator becomes an attribute-based access control (ABAC) restriction.
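Turning model output into metadata usually involves a validation step, since a language model's reply is free text. The sketch below assumes the model was prompted to answer with a JSON object containing `counterparty`, `effectiveDate`, and `expiryDate`; those key names and the `parse_entities` function are hypothetical, and the model call itself is out of scope here.

```python
import json
from datetime import date

# Hypothetical attribute keys the prompt asks the model to return.
EXPECTED_KEYS = ("counterparty", "effectiveDate", "expiryDate")
DATE_KEYS = {"effectiveDate", "expiryDate"}


def parse_entities(reply: str) -> dict[str, str]:
    """Extract validated metadata from a model reply that should contain JSON."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        return {}
    try:
        data = json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    metadata = {}
    for key in EXPECTED_KEYS:  # unexpected keys are ignored
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            continue
        value = value.strip()
        if key in DATE_KEYS:
            try:
                date.fromisoformat(value)  # accept only ISO dates such as 2024-03-01
            except ValueError:
                continue
        metadata[key] = value
    return metadata


reply = (
    'Here is the result: {"counterparty": "Northwind Ltd", '
    '"effectiveDate": "2024-03-01", "expiryDate": "next spring", "mood": "happy"}'
)
print(parse_entities(reply))
```

Dropping values that fail validation, rather than storing them, keeps malformed dates out of downstream features such as renewal alerts.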
AI-Powered Sensitivity Classification
One of the highest-value AI capabilities for regulated organisations is automatic identification of documents containing sensitive information. Without sensitivity classification, organisations rely on users to correctly classify documents at the point of upload — which means sensitive documents regularly end up in general-access locations without appropriate controls. FormKiQ's sensitivity classification detects:
- Personally identifiable information (PII) — names combined with identifiers, addresses, dates of birth, financial account numbers
- Protected health information (PHI) — patient names, medical record numbers, diagnoses, treatment information
- Financial data — account numbers, transaction details, credit card numbers, financial statements
- Privileged communications — attorney-client communications, legal advice, litigation strategy
- Confidential business information — trade secrets, proprietary processes, M&A-related content, board materials
- Minor-related information — content relating to children or young people requiring heightened protection
When sensitivity is detected, configurable handling rules apply automatically — restricting access, applying encryption enforcement, routing to compliance review, or flagging for manual classification. The sensitivity classification itself becomes metadata on the document, enabling ABAC policies that restrict access based on the sensitivity level.
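To make the two halves of this concrete, the sketch below maps detected sensitivity labels to handling actions and then applies a simple attribute-based access check. The label names, action names, and the rule that a user needs a clearance for every label on the document are all illustrative assumptions, not FormKiQ's policy model.

```python
# Hypothetical handling rules keyed by sensitivity label.
HANDLING_RULES = {
    "PII": ["restrict-access"],
    "PHI": ["restrict-access", "compliance-review"],
    "PRIVILEGED": ["restrict-access", "manual-classification"],
}


def handling_actions(labels: list[str]) -> list[str]:
    """Collect the actions for all detected labels, without duplicates."""
    actions = []
    for label in labels:
        for action in HANDLING_RULES.get(label, []):
            if action not in actions:
                actions.append(action)
    return actions


def may_read(user_clearances: set[str], document_labels: set[str]) -> bool:
    """Allow access only if the user holds a clearance for every label."""
    return document_labels <= user_clearances


print(handling_actions(["PII", "PHI"]))
print(may_read({"PII"}, {"PII", "PHI"}))
```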
AI-Powered Summarisation
Document summarisation reduces the time required to triage, review, and understand lengthy documents. It's particularly valuable in high-volume intake workflows where reviewers process hundreds of documents daily and need to quickly determine relevance, priority, and required action.
| Use Case | What Gets Summarised | How It Helps |
|---|---|---|
| Intake triage | Incoming applications, submissions, or filings | Reviewers read the summary to determine priority and routing before reading the full document |
| Case review | Case file documents — investigation reports, medical records, legal filings | Case workers get a consolidated view of case history without reading every document in full |
| Correspondence digest | Email threads and correspondence chains | Account managers and case workers understand the key exchanges without full-thread review |
| Contract review | Executed contracts and agreements | Legal reviewers identify key terms, obligations, and risk indicators before detailed review |
| Archive discovery | Archived documents retrieved from cold storage | Researchers and investigators determine document relevance from the summary before retrieving the full document |
| Management reporting | Lengthy reports, audit findings, or compliance assessments | Executives receive plain-language summaries of documents that would otherwise require specialist interpretation |
Summaries are stored as metadata on the document — searchable and accessible without opening the full document. This is particularly valuable for archived documents in Glacier storage: the summary is available instantly even when the full document requires a retrieval request.
AI-Powered Document Analysis
Document analysis goes beyond extraction and summarisation to apply structured analytical judgement to document content. Analysis evaluates documents against criteria, identifies issues, and produces actionable outputs:
| Analysis Type | What It Does | Example Application |
|---|---|---|
| Compliance analysis | Assesses whether a document meets defined compliance requirements | Evaluate regulatory submissions for completeness and conformity before acceptance |
| Contract analysis | Identifies obligations, rights, termination conditions, non-standard terms | Flag contracts deviating from standard templates before legal review |
| Application analysis | Evaluates applications against eligibility criteria | Assess grant applications for required elements and eligibility before scoring |
| Consistency analysis | Identifies conflicts or gaps between related documents | Flag policy provisions that conflict with regulatory requirements or other policies |
| Risk analysis | Evaluates content for risk indicators | Surface high-risk clauses in vendor contracts before approval |
| Completeness analysis | Checks whether a document set meets defined requirements | Verify that an onboarding document package contains all required items before the employee's start date |
Analytical outputs are stored as structured metadata — making them searchable and available to workflow routing. A document that fails a compliance analysis can be automatically routed to a remediation queue. A contract flagged for non-standard terms can be automatically escalated to senior legal review.
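A completeness analysis is the simplest of these to sketch: compare the document types present in a package against a required set and route on the result. The required items, queue names, and `completeness` function below are hypothetical examples rather than FormKiQ's implementation.

```python
# Hypothetical checklist for an onboarding package.
REQUIRED_ONBOARDING = {
    "signed-offer",
    "tax-form",
    "id-verification",
    "emergency-contact",
}


def completeness(package: list[str], required: set[str]) -> dict:
    """Report missing items and pick a route based on the outcome."""
    missing = sorted(required - set(package))
    return {
        "complete": not missing,
        "missing": missing,
        "route": "workflow:onboarding" if not missing else "queue:remediation",
    }


result = completeness(["signed-offer", "tax-form"], REQUIRED_ONBOARDING)
print(result)
```

Because the result is a plain structure, it can be stored as metadata and used directly by a routing rule.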
Data Sovereignty for AI Processing
A critical differentiator for FormKiQ's AI processing: every step runs within your AWS account. This eliminates the data residency and data sovereignty concerns that prevent many regulated organisations from adopting AI document processing.
| Concern | Third-Party AI Services | FormKiQ on Your AWS Account |
|---|---|---|
| Where is data processed? | Vendor's infrastructure — often unclear which region or country | Your AWS account, in the region you select, with inference region controls |
| Who has access to data during processing? | The vendor's systems and potentially their staff | Your AWS services — no vendor access to document content during processing |
| Is data used for model training? | Varies by vendor — some use customer data for model improvement | Amazon Bedrock does not use customer data for model training |
| Can you audit processing? | Limited: the vendor provides logs if available | CloudTrail records API activity in your account; Bedrock model invocation logging, once enabled, delivers logs to destinations in your account |
| Data residency compliance | Difficult to verify — depends on vendor's architecture | Verifiable — your AWS region configuration determines where processing occurs |
For organisations subject to GDPR, HIPAA, PIPEDA, or data sovereignty requirements, this architecture eliminates the need to choose between AI capability and regulatory compliance.
Selective Processing and Human Oversight
AI document processing in FormKiQ is designed for selective, incremental deployment with human oversight — not as an all-or-nothing automation layer:
- Per-document-type — enable specific AI capabilities for specific document types (OCR for all scans, classification for intake submissions, full analysis for contracts only)
- Per-workflow — apply AI processing at specific workflow stages (classification at intake, summarisation before review, analysis before approval)
- Per-site — in multi-tenant deployments, enable different AI capabilities for different organisational sites or business units
- Confidence-based routing — low-confidence AI outputs routed to human review queues for verification before metadata is finalised
- Incremental enablement — start with OCR, add Textract extraction, then layer AI capabilities as confidence in the outputs grows
This approach ensures that AI augments human judgement rather than replacing it unsupervised. The organisation controls which document types are processed by AI, at which workflow stages, and with what confidence thresholds — and every AI decision is auditable and reversible.
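Selective enablement can be thought of as a small policy table consulted before each processing step. The sketch below assumes a per-document-type table with an OCR-only default; the document type names, capability names, and `is_enabled` helper are hypothetical and only mirror the examples given above.

```python
# Hypothetical per-document-type policy; unknown types fall back to OCR only.
DEFAULT_CAPABILITIES = frozenset({"ocr"})

CAPABILITIES_BY_TYPE = {
    "intake-submission": frozenset({"ocr", "classification"}),
    "contract": frozenset(
        {"ocr", "classification", "extraction", "summarisation", "analysis"}
    ),
}


def is_enabled(document_type: str, capability: str) -> bool:
    """Check whether a capability should run for a given document type."""
    enabled = CAPABILITIES_BY_TYPE.get(document_type, DEFAULT_CAPABILITIES)
    return capability in enabled


print(is_enabled("contract", "analysis"))
print(is_enabled("intake-submission", "analysis"))
print(is_enabled("fax", "ocr"))
```

Starting from a conservative default and adding capabilities per type matches the incremental enablement described above.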
FormKiQ Editions for AI Document Processing
| Capability | Core | Essentials | Advanced | Enterprise |
|---|---|---|---|---|
| OCR (Tesseract) | ✓ | ✓ | ✓ | ✓ |
| OCR & Structured Extraction (Textract) | | ✓ | ✓ | ✓ |
| Custom Extraction Mappings | | ✓ | ✓ | ✓ |
| AI Classification (Bedrock) | | | ✓ | ✓ |
| AI Entity Extraction (Bedrock) | | | ✓ | ✓ |
| AI Sensitivity Classification (Bedrock) | | | ✓ | ✓ |
| AI Summarisation (Bedrock) | | | ✓ | ✓ |
| AI Document Analysis (Bedrock) | | | ✓ | ✓ |
| Inference Region Controls | | | ✓ | ✓ |
| KnowledgeBase | | | ✓ | ✓ |
| Document Gateway Modules | | | ✓ | ✓ |
| Integration Framework Modules | | | ✓ | ✓ |
| Multi-Instance & Multi-Region Licensing | | | ✓ | ✓ |
| Vendor-Managed & Hybrid Deployment | | | | ✓ |
| Compliance Consulting | | | | ✓ |
| Support | Community | 2-business-day SLA | Private Slack + 40 hrs onboarding | 8-business-hour SLA + strategic support |
Getting Started
FormKiQ Core — including Tesseract OCR — can be deployed to your AWS account in fifteen to twenty minutes. Amazon Textract integration is available from Essentials onward. AI-powered classification, extraction, sensitivity detection, summarisation, and analysis are available on Advanced and Enterprise.
Frequently Asked Questions
What is AI document processing?
AI document processing is the use of machine learning and large language models to automate the classification, data extraction, sensitivity detection, summarisation, and analysis of documents. It extends traditional OCR (which extracts text) by adding understanding — identifying what type of document it is, what the important data elements are, whether it contains sensitive information, and what action should be taken.
How is FormKiQ's AI processing different from standalone AI document processing services?
Standalone AI services process documents in isolation — you send a document, get results, and then manually integrate those results into your document management workflow. FormKiQ integrates AI processing directly into the governed document lifecycle. AI outputs become metadata on the document, feed into search indexes, trigger workflow routing, and apply governance controls. The document is classified, enriched, and governed in a single platform rather than processed in one system and stored in another.
Does document content leave my AWS account during AI processing?
No. Every layer of FormKiQ's AI processing stack — Tesseract OCR, Amazon Textract, Amazon Bedrock — runs within your AWS account. Documents are never sent to external services. Inference region controls for Bedrock allow you to specify which AWS regions are used for model processing, ensuring content stays within your data residency boundaries.
What happens when AI classification is uncertain?
Every AI classification includes a confidence score. Documents classified with high confidence are automatically tagged and routed. Documents classified with low confidence are routed to a human review queue for manual verification before metadata is finalised. Confidence thresholds are configurable per document type, allowing organisations to set the boundary between automated and human-reviewed classification based on their risk tolerance.
Can I start with basic OCR and add AI later?
Yes. FormKiQ's AI capabilities are layered and can be enabled independently. You can deploy Core with Tesseract OCR, upgrade to Essentials for Textract structured extraction, and add AI classification, extraction, and analysis on Advanced, all within the same deployment. Each layer builds on the previous one without requiring migration or reconfiguration.
What large language models does FormKiQ use?
FormKiQ's AI processing runs through Amazon Bedrock, which supports multiple large language models including Anthropic Claude, Amazon Nova, and other available models. The specific model used can be configured based on the processing task, cost considerations, and regional availability. All models are accessed within your AWS account — no external API calls are made.