Long-Term Document Storage, Preservation, and Retrieval — on AWS Infrastructure with Cost-Optimized Tiered Storage
Every organization accumulates documents that are no longer in active daily use but must be preserved — for regulatory retention, legal defensibility, institutional memory, or operational reference. These documents don't belong in active file shares, email inboxes, or legacy on-premises storage, but they also can't simply be deleted. They need to be archived: stored durably, indexed for retrieval, protected by access controls, and managed under retention policies that govern how long they're kept and what happens when they expire.
FormKiQ provides a governed document archive on AWS — combining Amazon S3's tiered storage classes for cost-optimized long-term retention with structured metadata, full-text search, access controls, and retention enforcement. Deployed directly into your AWS account, FormKiQ gives you an archive that is searchable, governed, and auditable — not a dark storage vault where documents go to disappear.
What Is a Document Archive?
A document archive is distinct from both active document management and records management, though it intersects with both:
| | Active Document Management | Records Management | Document Archive |
|---|---|---|---|
| Purpose | Operational use — create, edit, collaborate, process | Governance — retention enforcement, legal hold, disposition | Preservation — long-term storage and retrieval of inactive content |
| Content | Documents in active use | Records with formal retention obligations | Documents and records that have moved past active use but require preservation |
| Access pattern | Frequent read/write | Controlled read, restricted write | Infrequent read, no write (immutable or near-immutable) |
| Cost priority | Performance and availability | Compliance and auditability | Storage cost optimization with acceptable retrieval latency |
| Lifecycle action | Edit, version, route, approve | Retain, hold, dispose | Preserve, migrate, retrieve on demand |
FormKiQ supports all three models within a single platform — documents can move from active management through records governance into archival storage without leaving the governed environment or losing their metadata, search indexes, or audit history.
Why AWS for Document Archives
AWS provides the foundational storage architecture that makes cost-effective, durable, governed archival storage practical at any scale.
S3 Storage Classes for Archival Tiering
Amazon S3 offers multiple storage classes designed for different access patterns and cost profiles. FormKiQ leverages S3 lifecycle policies to move documents through storage tiers automatically as they age:
| S3 Storage Class | Access Pattern | Retrieval Time | Cost Profile | Archive Use Case |
|---|---|---|---|---|
| S3 Standard | Frequent access | Milliseconds | Highest storage cost | Active documents not yet ready for archival |
| S3 Standard-Infrequent Access (Standard-IA) | Monthly or less | Milliseconds | ~45% lower than Standard | Recently archived documents that may still be referenced |
| S3 Intelligent-Tiering | Variable / unknown | Milliseconds to hours | Automatic optimization | Archives with unpredictable access patterns |
| S3 Glacier Instant Retrieval | Quarterly or less | Milliseconds | ~68% lower than Standard | Archived documents needing immediate access when requested |
| S3 Glacier Flexible Retrieval | 1–2 times per year | Minutes to hours | ~78% lower than Standard | Long-term archive with occasional retrieval needs |
| S3 Glacier Deep Archive | Rarely if ever | Within 12 hours (standard) or 48 hours (bulk) | Lowest cost (~95% lower than Standard) | Regulatory retention archives, permanent preservation, legal defensibility |
FormKiQ manages these transitions transparently — documents move through storage tiers based on configurable lifecycle policies without changing their metadata, search index entries, access controls, or audit trail. A document archived to Glacier Deep Archive is still searchable by metadata and full-text content, and still subject to its retention policy and access controls.
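As a rough illustration of the tiering described above, the sketch below builds a lifecycle configuration in the dictionary shape that S3 accepts for bucket lifecycle rules. The age thresholds, the `documents/` prefix, and the function names are assumptions made for this example; the source does not show how FormKiQ itself configures or exposes these policies.

```python
# Hypothetical age thresholds (in days) for each tier; tune to your own access patterns.
TIER_SCHEDULE = [
    (30, "STANDARD_IA"),    # S3 Standard-IA
    (90, "GLACIER_IR"),     # S3 Glacier Instant Retrieval
    (180, "GLACIER"),       # S3 Glacier Flexible Retrieval
    (365, "DEEP_ARCHIVE"),  # S3 Glacier Deep Archive
]


def lifecycle_configuration(prefix: str, schedule=TIER_SCHEDULE) -> dict:
    """Build a lifecycle configuration dict with one tiering rule for a key prefix."""
    days = [d for d, _ in schedule]
    if days != sorted(days) or len(set(days)) != len(days):
        raise ValueError("transition days must be strictly increasing")
    return {
        "Rules": [
            {
                "ID": f"archive-tiering-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": d, "StorageClass": sc} for d, sc in schedule
                ],
            }
        ]
    }


def storage_class_for_age(age_days: int, schedule=TIER_SCHEDULE) -> str:
    """Return the storage class an object of this age would occupy under the schedule."""
    current = "STANDARD"
    for threshold, storage_class in schedule:
        if age_days >= threshold:
            current = storage_class
    return current


config = lifecycle_configuration("documents/")
```

If you manage the bucket directly, a dictionary like `config` can be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`. Whether a FormKiQ deployment expects you to set this yourself or manages it for you is something to confirm against the FormKiQ documentation.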
Durability and Integrity
Amazon S3 is designed for 99.999999999% (eleven nines) durability: if you store ten million objects, you can statistically expect to lose a single object once every ten thousand years. For archival storage, this level of durability largely removes the data loss risk that drives organizations to maintain expensive redundant on-premises storage infrastructure.
S3 also supports object versioning, integrity checksums, and object lock (WORM) configurations that protect archived documents from accidental or unauthorized modification or deletion.
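The eleven-nines figure can be checked with a few lines of arithmetic. This sketch treats the durability percentage as an average annual per-object loss rate, which is a simplifying assumption, and uses exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

# 99.999999999% durability leaves an average annual loss rate of 1 in 10^11 per object.
ANNUAL_LOSS_RATE = Fraction(1, 10**11)


def expected_losses_per_year(object_count: int) -> Fraction:
    """Average number of objects lost per year for a given archive size."""
    return object_count * ANNUAL_LOSS_RATE


def years_per_expected_loss(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(object_count)


print(years_per_expected_loss(10_000_000))  # 10000
```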
Document Archive Capabilities in FormKiQ
Archival Ingestion
Documents enter the archive through multiple pathways:
- Lifecycle transition — documents in active FormKiQ repositories automatically transition to archival status based on metadata, age, or workflow events (case closure, contract expiry, project completion)
- Bulk migration — large-scale ingestion of legacy archives from file shares, on-premises storage, legacy ECM platforms, or other repositories using the FormKiQ CLI or API
- Document Gateway Modules — structured ingestion from SharePoint, Google Drive, email, SFTP, and scanner capture for documents entering the archive directly from external systems
- Records disposition — records reaching the end of their active retention period can be transferred to the archive rather than destroyed, when the retention policy calls for preservation rather than disposition
Metadata and Search
Archived documents retain their full metadata and search index — ensuring they remain discoverable even after moving to low-cost storage tiers:
| Capability | Description |
|---|---|
| Full-text search | Powered by Amazon OpenSearch — archived documents remain full-text searchable regardless of storage tier |
| Metadata search | Query by any combination of classification attributes, dates, parties, document types, and custom metadata |
| Tag schemas and composite keys | Consistent metadata applied across archived collections |
| Cross-collection search | Search across multiple archive collections and active repositories within a single deployment |
| OCR and IDP | Documents can be OCR-processed and metadata-enriched at the point of archival ingestion — ensuring scanned historical documents are as searchable as born-digital content |
Access Controls
- Attribute-based access control (ABAC) — archived document visibility tied to metadata values, enabling access policies based on classification, department, sensitivity, or custom attributes
- Temporal access controls — access rules that change based on archival age or declassification schedules
- Read-only enforcement — archived documents protected from modification, with write access limited to metadata updates by authorized administrators
- Audit trail — every access to an archived document is recorded with timestamps and actor identification
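To make the attribute-based and temporal rules above concrete, here is a minimal sketch of an access decision. The attribute names (`department`, `sensitivity`, `clearance`, `declassify_on`) and the rule itself are invented for illustration; they are not FormKiQ's policy syntax or schema.

```python
from datetime import date

# Hypothetical sensitivity ladder; higher rank means more restricted.
SENSITIVITY_RANK = {"public": 0, "internal": 1, "confidential": 2}


def effective_sensitivity(doc: dict, today: date) -> str:
    """Temporal rule: a document becomes public once its declassification date passes."""
    declassify_on = doc.get("declassify_on")
    if declassify_on is not None and today >= declassify_on:
        return "public"
    return doc["sensitivity"]


def can_read(user: dict, doc: dict, today: date) -> bool:
    """Attribute rule: same department and sufficient clearance, unless public."""
    level = effective_sensitivity(doc, today)
    if level == "public":
        return True
    same_department = user["department"] == doc["department"]
    cleared = SENSITIVITY_RANK[user["clearance"]] >= SENSITIVITY_RANK[level]
    return same_department and cleared


doc = {
    "department": "legal",
    "sensitivity": "confidential",
    "declassify_on": date(2030, 1, 1),
}
analyst = {"department": "finance", "clearance": "confidential"}
print(can_read(analyst, doc, date(2025, 6, 1)))  # False: different department
print(can_read(analyst, doc, date(2031, 6, 1)))  # True: declassified by then
```

The point of the sketch is that the decision depends only on metadata values and the current date, so access can change over time without anyone editing per-document permissions.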
Retention and Disposition
- Configurable retention policies — applied at the document, collection, or document-type level with automatic enforcement
- Legal hold — archived documents under legal hold are protected from disposition regardless of retention schedule
- Defensible disposition — audit-logged disposition workflows for documents reaching the end of their archival retention period
- Permanent preservation — documents designated for permanent retention are protected from disposition indefinitely, with periodic integrity verification
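The precedence among these rules can be sketched as a small decision function: legal hold overrides everything, permanent preservation never expires, and only then does the retention date matter. The names and the enum values here are assumptions for illustration, not FormKiQ's API.

```python
from datetime import date
from enum import Enum
from typing import Optional


class Action(Enum):
    HOLD = "legal_hold"                # disposition blocked regardless of schedule
    PRESERVE = "permanent"             # no retention end date
    RETAIN = "retain"                  # retention period still running
    ELIGIBLE = "eligible_for_disposition"


def disposition_action(
    retain_until: Optional[date], legal_hold: bool, today: date
) -> Action:
    """Decide what may happen to an archived document today.

    retain_until=None models a document designated for permanent preservation.
    """
    if legal_hold:
        return Action.HOLD
    if retain_until is None:
        return Action.PRESERVE
    if today < retain_until:
        return Action.RETAIN
    return Action.ELIGIBLE


print(disposition_action(date(2020, 1, 1), True, date(2025, 1, 1)))  # Action.HOLD
```

Returning "eligible" rather than deleting anything reflects the audit-logged disposition workflow described above: expiry makes a document a candidate for disposition, and a separate reviewed step carries it out.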
Immutable Storage (WORM)
For organizations with regulatory requirements for non-rewritable, non-erasable storage, FormKiQ supports S3 Object Lock configurations:
| Object Lock Mode | Behavior | Regulatory Use Case |
|---|---|---|
| Governance mode | Prevents deletion by most users; authorized administrators can override | Internal compliance, audit protection |
| Compliance mode | Prevents deletion by all users, including the root account; cannot be overridden until the retention period expires | SEC 17a-4, FINRA, CFTC (US): broker-dealer record retention; UK FCA SYSC rules and MiFID II Article 76 (EU): equivalent immutable retention requirements for financial records |
Object Lock retention is set on individual object versions in S3 (the bucket must have Object Lock and versioning enabled), so each archived document can be protected according to its specific regulatory requirements.
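As an illustration of the two modes, the sketch below builds the retention settings for a single object: a mode plus a retain-until timestamp. The helper names and the seven-year period are assumptions for the example, and the actual retention period must come from your own regulatory analysis.

```python
from datetime import datetime, timezone

VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def add_years(moment: datetime, years: int) -> datetime:
    """Shift a timestamp forward by whole calendar years."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return moment.replace(year=moment.year + years, day=28)


def object_lock_retention(mode: str, years: int, stored_at: datetime) -> dict:
    """Build retention settings: lock mode plus the date the lock expires."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    if years <= 0:
        raise ValueError("years must be positive")
    if stored_at.tzinfo is None:
        raise ValueError("stored_at must be timezone-aware")
    return {"Mode": mode, "RetainUntilDate": add_years(stored_at, years)}


retention = object_lock_retention(
    "COMPLIANCE", 7, datetime(2024, 2, 29, tzinfo=timezone.utc)
)
```

A dictionary of this shape matches the `Retention` argument of boto3's `put_object_retention`. Because a Compliance mode lock cannot be shortened or removed once set, it is worth validating the computed date in Governance mode first.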
Migration from Legacy Archive Systems
Organizations frequently need to migrate document archives from legacy systems — on-premises file servers, tape storage, legacy ECM platforms, or cloud storage accounts that have become ungoverned repositories.
Common Migration Sources
| Source | Migration Approach |
|---|---|
| On-premises file shares | FormKiQ CLI bulk upload with metadata extraction from folder structures and file properties |
| Legacy ECM platforms (OpenText, Hyland, IBM FileNet, Laserfiche) | Export and re-ingest with metadata schema mapping from legacy field structures to FormKiQ metadata |
| Email archives | Email Ingestion Gateway or bulk import with metadata extraction from message headers, body, and attachment properties |
| Tape and offline storage | Stage to S3 or local storage, then bulk ingest via CLI or API |
| Existing S3 buckets | Cloud Storage Gateway for in-place governance overlay, or bulk re-ingest with metadata enrichment |
| Google Drive / SharePoint | Document Gateway Modules for structured migration with metadata mapping |
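For file-share sources, "metadata extraction from folder structures" usually means mapping path segments to metadata fields. The following is a minimal sketch under an assumed folder convention of `<department>/<year>/<document type>/<file>`; the field names are hypothetical and would be replaced by your own schema mapping.

```python
from pathlib import PurePosixPath

# Hypothetical folder convention: <department>/<year>/<document type>/<file>
FOLDER_FIELDS = ("department", "year", "documentType")


def metadata_from_path(relative_path: str) -> dict:
    """Derive metadata from a file's position in the source folder tree."""
    path = PurePosixPath(relative_path)
    folders = path.parts[:-1]  # every segment except the file name
    metadata = dict(zip(FOLDER_FIELDS, folders))
    # Record which expected fields the path was too shallow to supply.
    metadata["unmapped"] = [f for f in FOLDER_FIELDS if f not in metadata]
    metadata["fileName"] = path.name
    metadata["extension"] = path.suffix.lstrip(".").lower()
    return metadata


print(metadata_from_path("finance/2019/invoices/INV-0042.PDF"))
```

Files whose `unmapped` list is non-empty are good candidates for manual review or for enrichment during ingestion, rather than being archived with incomplete metadata.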
Migration Process
- Inventory and assessment — catalog the source archive to understand volume, file types, metadata availability, and retention requirements
- Metadata schema mapping — define how source metadata (folder names, file properties, legacy ECM fields) maps to FormKiQ's metadata architecture
- OCR and enrichment — scanned and image-based documents processed with OCR and IDP at ingestion to create searchable full-text indexes
- Bulk ingestion — FormKiQ CLI or API handles high-volume ingestion with metadata application, classification, and storage tier assignment
- Validation — document counts, metadata integrity, and search index completeness verified against source inventory
- Retention assignment — archival retention policies applied to migrated documents based on document type, age, and regulatory requirements
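The validation step above can be sketched as a comparison between two manifests, each mapping a document path to a SHA-256 checksum. How you obtain the ingested-side manifest depends on your deployment; the manifest format and function names here are assumptions for illustration.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Checksum used to compare a source file with its ingested copy."""
    return hashlib.sha256(data).hexdigest()


def validate_migration(source: dict, ingested: dict) -> dict:
    """Compare source and ingested manifests (path -> SHA-256 hex digest)."""
    missing = sorted(set(source) - set(ingested))
    unexpected = sorted(set(ingested) - set(source))
    mismatched = sorted(
        path for path in set(source) & set(ingested)
        if source[path] != ingested[path]
    )
    return {
        "source_count": len(source),
        "ingested_count": len(ingested),
        "missing": missing,        # in the source inventory but not ingested
        "unexpected": unexpected,  # ingested but absent from the inventory
        "mismatched": mismatched,  # present on both sides, content differs
        "ok": not (missing or unexpected or mismatched),
    }


source = {"a.pdf": sha256_hex(b"alpha"), "b.pdf": sha256_hex(b"beta")}
ingested = {"a.pdf": sha256_hex(b"alpha"), "b.pdf": sha256_hex(b"BETA")}
report = validate_migration(source, ingested)
print(report["ok"], report["mismatched"])  # False ['b.pdf']
```

Matching counts alone would pass this example, which is why the sketch compares checksums as well: a corrupted or truncated copy still counts as one document.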
Migration services are available as add-on professional services on Advanced and Enterprise editions.
AI-Powered Archive Analysis
FormKiQ's AI Processing and Analysis module — powered by Amazon Bedrock — can be applied to archived collections for classification, enrichment, and discovery:
| AI Capability | Archive Application |
|---|---|
| Document type classification | Classify unstructured legacy archives — identify document types across collections that were previously organized only by folder name or date |
| Metadata extraction | Extract key entities (names, dates, amounts, identifiers) from archived documents and apply them as searchable structured metadata |
| Content summarization | Generate summaries of lengthy archived documents to support discovery without requiring full document retrieval from cold storage |
| Sensitivity classification | Identify documents containing PII, PHI, or other sensitive content within legacy archives — enabling appropriate reclassification and access control |
| Content analysis | Analyze archived documents against regulatory requirements, retention rules, or organizational policies to identify compliance gaps |
All AI processing runs within your AWS account through Amazon Bedrock, using supported large language models including Anthropic Claude, Amazon Nova, and other available models. Archived document content never leaves your cloud environment.
Integration with Enterprise Systems
FormKiQ's Integration Framework Modules connect document archives to enterprise systems that reference archived content:
| Framework | Archive Use Cases |
|---|---|
| ERP | Archived purchase orders, invoices, contracts, and financial records linked to ERP business objects — retrievable from the ERP interface via document deeplinks |
| CRM | Archived customer correspondence, account documentation, and engagement records linked to CRM records |
| HRIS | Archived employee records, onboarding documents, and separation records linked to HR system records — with retention tied to employment lifecycle and jurisdictional requirements |
| Case Management | Closed case files archived with full metadata and access controls — retrievable from case management systems for reference or legal production |
Document Archive vs. Unmanaged Storage
Many organizations use unmanaged storage — file shares, S3 buckets without governance, or cloud drives — as de facto archives. This creates significant risk:
| | Unmanaged Storage | FormKiQ Document Archive |
|---|---|---|
| Search | Folder browsing, filename search only | Full-text search and structured metadata search across entire archive |
| Access control | Folder-level permissions, often overly broad | Attribute-based access control at the document level |
| Retention | Manual or absent — no enforcement, no audit trail | Configurable retention with automatic enforcement and audit-logged disposition |
| Legal hold | Manual identification and protection — error-prone | Systematic hold application and tracking with audit evidence |
| Cost optimization | Single storage tier — paying active-access prices for inactive content | Automatic tiering through S3 storage classes based on access patterns |
| Compliance evidence | None — no access logs, no retention records, no disposition evidence | Complete audit trail for every access, retention, and disposition event |
| Migration risk | Content locked in platform-specific formats or structures | Format-agnostic storage on S3 — documents accessible through standard AWS tools regardless of FormKiQ status |
FormKiQ Editions for Document Archive
| Capability | Core | Essentials | Advanced | Enterprise |
|---|---|---|---|---|
| Document Storage (S3) & API | ✓ | ✓ | ✓ | ✓ |
| Tagging, Search & Classification | ✓ | ✓ | ✓ | ✓ |
| OCR (Tesseract) | ✓ | ✓ | ✓ | ✓ |
| Multi-Tenant Support | ✓ | ✓ | ✓ | ✓ |
| SSO (SAML: Entra, Google, Auth0) | | ✓ | ✓ | ✓ |
| Workflows, Queues & Rulesets | | ✓ | ✓ | ✓ |
| Encryption (in-transit & at-rest) | | ✓ | ✓ | ✓ |
| Document Control & Versioning | | ✓ | ✓ | ✓ |
| OCR & IDP (AWS Textract) | | ✓ | ✓ | ✓ |
| Antivirus & Anti-Malware | | ✓ | ✓ | ✓ |
| AI Processing & Analysis (Bedrock) | | | ✓ | ✓ |
| Enhanced Full-Text Search (OpenSearch) | | | ✓ | ✓ |
| Document Gateway Modules | | | ✓ | ✓ |
| Integration Framework Modules | | | ✓ | ✓ |
| Multi-Instance & Multi-Region Licensing | | | ✓ | ✓ |
| Vendor-Managed & Hybrid Deployment | | | | ✓ |
| Custom SLAs & Compliance Consulting | | | | ✓ |
| OEM & Partner Licensing | | | | ✓ |
| Support | Community (Slack & GitHub) | Support Portal (2-business-day SLA) | Private Slack + videoconference + 40 hrs onboarding | Rapid response (8-business-hour SLA) + strategic architecture support |
Compliance and Regulatory Alignment for Archives
| Framework | Archive-Specific Requirements | FormKiQ Capabilities |
|---|---|---|
| SEC 17a-4 / FINRA / CFTC (US) / FCA (UK) / MiFID II (EU) | Non-rewritable, non-erasable storage for broker-dealer records; UK FCA SYSC rules and MiFID II Article 76 require equivalent WORM-style retention for financial communications and records | S3 Object Lock (Compliance mode), retention enforcement, audit-logged access |
| SOX (Sarbanes-Oxley) | Financial record retention for public companies | Retention scheduling, access controls, audit trails, immutable storage |
| HIPAA | Long-term retention of protected health information | Encryption (KMS), ABAC, audit trails, data residency enforcement |
| GDPR / UK GDPR | Right-to-erasure balanced against retention obligations | Retention controls, deletion workflows, data residency enforcement |
| NARA (US) / UK National Archives / Library and Archives Canada / National Archives of Australia | Federal records transfer and permanent preservation requirements (NARA in US); equivalent national archives authorities in UK (TNA), Canada (LAC), and Australia (NAA) impose similar obligations for government records | Retention scheduling, transfer-to-archive workflows, preservation metadata |
| FDA 21 CFR Part 11 (US) / EU MDR / IVDR | Long-term retention of electronic records with audit trail integrity; EU Medical Device Regulation and IVDR impose equivalent electronic records and audit trail requirements | Document versioning, audit trails, access controls, integrity verification |
| State / Provincial retention statutes | Jurisdiction-specific retention for employment, financial, and operational records | Configurable retention by jurisdiction, document type, and business unit |
| ISO 14641 | Standard for electronic archiving — integrity, durability, and traceability | Checksums, versioning, audit trails, S3 durability guarantees |
Who Uses Document Archive on AWS
| Industry | What Gets Archived | Key Drivers |
|---|---|---|
| Financial Services & Insurance | Trading records, client correspondence, regulatory filings, claims documentation, audit evidence | SEC 17a-4, FINRA, SOX (US), FCA (UK), MiFID II (EU), APRA (Australia), OSFI (Canada), WORM requirements |
| Government & Public Sector | Constituent records, historical correspondence, policy archives, FOIA-responsive records | NARA (US), TNA (UK), LAC (Canada), NAA (Australia), state/provincial retention statutes, FOIA / Access to Information / FOI readiness |
| Healthcare & Life Sciences | Patient records, clinical trial data, regulatory submissions, quality records | HIPAA, FDA 21 CFR Part 11, long-term retention |
| Higher Education | Student records, research data, institutional archives, grant documentation | FERPA, research data retention, institutional preservation |
| Legal & Professional Services | Closed matter files, client records, correspondence, billing archives | Professional regulatory retention, conflict reference |
| Manufacturing & Energy | Engineering drawings, safety records, environmental compliance documentation, quality records | Sector-specific retention, ISO quality requirements |
| Media & Cultural Institutions | Digital collections, historical records, photographic archives, institutional correspondence | Preservation mandates, public access requirements |
Deployment Models
| Model | Description | Availability |
|---|---|---|
| Customer-Managed AWS | Deploys directly into your AWS account via CloudFormation. Full control of infrastructure, networking, encryption keys, and operations. | All editions |
| Vendor-Managed | FormKiQ manages the AWS infrastructure on your behalf — deployment, updates, and operational support. | Enterprise |
| Hybrid | You retain control of specific components (encryption keys, network config) while delegating operational management to FormKiQ. | Enterprise |
Every deployment is a dedicated, isolated instance in an AWS account owned by or designated by the customer. FormKiQ does not operate a shared multi-tenant environment.
Getting Started
FormKiQ Core can be deployed to your AWS account in fifteen to twenty minutes using a one-click install via AWS CloudFormation. For organizations migrating legacy archives or deploying governed archival storage at scale, FormKiQ Advanced and Enterprise provide Document Gateway Modules, AI Processing, Enhanced Full-Text Search, and Integration Framework Modules.
For organizations evaluating document archive on AWS, FormKiQ offers a Proof-of-Value program — a three-month deployment in a FormKiQ-managed AWS environment that provides full platform access in a non-production setting.
Frequently Asked Questions
What is a document archive on AWS?
A document archive on AWS is a governed, long-term document storage environment deployed on Amazon Web Services — combining S3's tiered storage classes for cost optimization with structured metadata, full-text search, access controls, and retention enforcement. Unlike unmanaged storage, a document archive maintains the searchability, governance, and auditability of archived content throughout its preservation lifecycle.
How does S3 storage tiering reduce archive costs?
Amazon S3 offers storage classes optimized for different access frequencies — from Standard (frequent access) through Glacier Deep Archive (rarely accessed). FormKiQ manages automatic transitions between these tiers based on configurable lifecycle policies. Documents that haven't been accessed move to progressively cheaper storage — potentially reducing storage costs by up to 95% compared to active-tier pricing — while remaining searchable by metadata and full-text content.
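To see how tiering translates into an overall saving, the sketch below computes a blended storage-cost reduction for an archive spread across tiers. The relative prices are derived from the approximate discounts quoted in this document (about 45% for Standard-IA and about 95% for Deep Archive); actual prices vary by region and over time, and the calculation deliberately ignores retrieval, request, and minimum-storage-duration charges.

```python
# Storage price relative to S3 Standard (1.00), from the approximate discounts above.
# Add other tiers with figures from current AWS pricing for your region.
RELATIVE_PRICE = {
    "STANDARD": 1.00,
    "STANDARD_IA": 0.55,   # ~45% lower than Standard
    "DEEP_ARCHIVE": 0.05,  # ~95% lower than Standard
}


def blended_savings(gb_by_tier: dict, relative_price=RELATIVE_PRICE) -> float:
    """Fraction of storage cost saved versus keeping everything in S3 Standard."""
    total_gb = sum(gb_by_tier.values())
    if total_gb <= 0:
        raise ValueError("archive must contain some data")
    relative_cost = sum(
        gb * relative_price[tier] for tier, gb in gb_by_tier.items()
    ) / total_gb
    return 1 - relative_cost


# Hypothetical archive: 1 TB still active, 9 TB in Deep Archive.
savings = blended_savings({"STANDARD": 1_000, "DEEP_ARCHIVE": 9_000})
print(f"{savings:.1%}")  # 85.5%
```

The "up to 95%" figure applies only to data that actually reaches the coldest tier; a realistic archive with some content still in warmer tiers lands somewhat below that.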
Can I search archived documents stored in Glacier?
Yes. FormKiQ maintains metadata and full-text search indexes in Amazon DynamoDB and OpenSearch regardless of the S3 storage tier, so you can search for and identify archived documents immediately. Retrieving the document content itself depends on the tier: milliseconds for S3 Glacier Instant Retrieval, minutes to hours for S3 Glacier Flexible Retrieval, and up to 48 hours for S3 Glacier Deep Archive.
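At the S3 level, content in Glacier Flexible Retrieval or Deep Archive has to be restored before it can be read, and the retrieval tier you choose sets the wait. This sketch builds the restore parameters and rejects combinations S3 does not offer; the function name and defaults are assumptions, and the source does not say whether FormKiQ issues this request for you.

```python
# Retrieval tiers available for each archived storage class.
# Expedited retrieval is not offered for S3 Glacier Deep Archive.
ALLOWED_TIERS = {
    "GLACIER": {"Expedited", "Standard", "Bulk"},
    "DEEP_ARCHIVE": {"Standard", "Bulk"},
}


def restore_request(storage_class: str, tier: str = "Standard", days: int = 7) -> dict:
    """Build restore parameters: retrieval tier and days to keep the restored copy."""
    if storage_class not in ALLOWED_TIERS:
        raise ValueError(f"no restore tiers defined here for {storage_class}")
    if tier not in ALLOWED_TIERS[storage_class]:
        raise ValueError(f"{tier} retrieval is not available for {storage_class}")
    if days < 1:
        raise ValueError("days must be at least 1")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


print(restore_request("DEEP_ARCHIVE", tier="Bulk", days=3))
```

The resulting dictionary has the shape of the `RestoreRequest` argument to boto3's `restore_object`. Objects in S3 Glacier Instant Retrieval or Standard-IA are readable directly and need no restore step.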
What is S3 Object Lock and when do I need it?
S3 Object Lock provides WORM (Write Once Read Many) storage, preventing documents from being deleted or overwritten for a defined retention period. Compliance mode is the setting typically used to satisfy rules such as SEC 17a-4, FINRA requirements, UK FCA SYSC rules, and MiFID II Article 76 (EU), which mandate non-rewritable, non-erasable storage for certain financial records. FormKiQ supports Object Lock configuration for documents that require this level of immutability.
How do I migrate an existing archive to FormKiQ?
FormKiQ supports bulk migration from file shares, legacy ECM platforms, email archives, tape storage, existing S3 buckets, and cloud storage services. The process involves inventory assessment, metadata schema mapping, OCR processing for scanned content, bulk ingestion via the FormKiQ CLI or API, and validation. Migration services are available as add-on professional services on Advanced and Enterprise editions.
How is a document archive different from records management?
Records management is a governance discipline focused on retention enforcement, legal hold, and defensible disposition — controlling how long records are kept and ensuring they are disposed of lawfully. A document archive is focused on long-term preservation and cost-optimized storage — keeping documents accessible and governed over extended periods. FormKiQ supports both within a single platform — records management policies can govern archived documents, and documents can transition from active records management into archival storage as part of their retention lifecycle.
What happens to my archived data if I stop using FormKiQ?
Your archived documents remain in Amazon S3 within your own AWS account. You own the S3 buckets, the metadata in DynamoDB, and the search indexes in OpenSearch. You can access these resources directly through AWS or other tools at any time, regardless of your FormKiQ subscription status. FormKiQ's architecture ensures you are never locked into the platform for access to your own archived content.