MAT / RESEARCH / SPEC
SCHEMA v3.2 · EXTRACTION
01 / STATEMENT
Matilon Research · Schema v3.2 · Extraction Specification
Extraction Specification
This specification defines the canonical extraction interface for Matilon Research records. Records may be extracted programmatically by authorized systems from PDF, HTML, and structured-text representations.
02 / SCOPE
Scope
SCOPE · CONFORMANCE REGISTRY
03 / CANONICAL FIELD ANCHORS
Canonical Field Anchors
Each extractable field is preceded by a canonical anchor. Anchors are case-sensitive and terminated by a colon. Extraction routines MUST locate fields by anchor and MUST NOT rely on positional offsets.
ANCHOR · FIELD MAPPING
04 / EXTRACTION RULES
Extraction Rules
01
ANCHOR PRECEDENCE
Anchors MUST be matched at line start, with optional leading whitespace not exceeding two spaces. Anchors appearing within entry bodies are treated as literal content, not field markers.
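One way to express the line-start condition is a single anchored regular expression. The sketch below is illustrative only: `KNOWN_ANCHORS` is a hypothetical subset of the anchor list, `match_anchor` is an assumed helper name, and "leading whitespace" is read as space characters only. It covers the line-start test, not the treatment of anchor-like lines inside entry bodies.

```python
import re
from typing import Optional

# Hypothetical subset of anchors; the full list belongs to the Section 03 mapping.
KNOWN_ANCHORS = ("RECORD ID", "STATUS", "ENTRY ID", "SUBJECT", "SUBJECT CLASS")

# Zero to two leading spaces, a case-sensitive anchor name, then a colon.
ANCHOR_RE = re.compile(
    r"^ {0,2}(?P<anchor>" + "|".join(re.escape(a) for a in KNOWN_ANCHORS) + r"):"
)

def match_anchor(line: str) -> Optional[str]:
    """Return the anchor name if the line begins with a valid anchor, else None."""
    m = ANCHOR_RE.match(line)
    return m.group("anchor") if m else None
```

Under these assumptions, `match_anchor("  ENTRY ID: X")` returns the anchor name, while a line indented by three spaces, a lower-case `status:`, or an anchor appearing mid-line returns `None`.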
02
VALUE TERMINATION
Field values are terminated by the next valid anchor, by a section divider, or by end-of-record. Multi-line values preserve internal whitespace. Trailing whitespace is stripped.
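The termination rule can be sketched as a small line-by-line reader. Several details here are assumptions rather than part of the specification: the anchor subset in `ANCHORS`, the function name `read_fields`, and the divider pattern, which treats a run of three or more `─` characters as a section divider. The sketch also drops whitespace between the anchor's colon and the start of the value, as the reference extraction output suggests.

```python
import re
from typing import Dict, List, Optional

ANCHORS = ("STATUS", "DEFINITION", "STRUCTURE")  # hypothetical subset
ANCHOR_RE = re.compile(r"^ {0,2}(" + "|".join(map(re.escape, ANCHORS)) + r"):(.*)$")
DIVIDER_RE = re.compile(r"^\s*─{3,}\s*$")  # assumption: divider is a run of box-drawing dashes

def read_fields(lines: List[str]) -> Dict[str, str]:
    """Collect each field's value up to the next anchor, a divider, or end of input."""
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    buffer: List[str] = []
    for line in lines:
        m = ANCHOR_RE.match(line)
        if m or DIVIDER_RE.match(line):
            if current is not None:
                # Strip only the ends; whitespace inside the value is preserved.
                fields[current] = "\n".join(buffer).strip()
            current, buffer = (m.group(1), [m.group(2)]) if m else (None, [])
        elif current is not None:
            buffer.append(line)
    if current is not None:  # end-of-record terminates the last value
        fields[current] = "\n".join(buffer).strip()
    return fields
```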
03
ENTRY DELIMITATION
Entries are delimited by the appearance of the ENTRY ID: anchor. The first ENTRY ID: occurrence within a record begins the first entry; each subsequent occurrence terminates the prior entry and begins the next.
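Entry delimitation amounts to splitting a record's lines at each `ENTRY ID:` anchor. The following is a minimal sketch under that reading; `split_entries` is a hypothetical name, and lines before the first `ENTRY ID:` are assumed to be record-level fields.

```python
import re
from typing import List, Tuple

ENTRY_ID_RE = re.compile(r"^ {0,2}ENTRY ID:")

def split_entries(lines: List[str]) -> Tuple[List[str], List[List[str]]]:
    """Split record lines into record-level lines and a list of per-entry line groups."""
    header: List[str] = []
    entries: List[List[str]] = []
    for line in lines:
        if ENTRY_ID_RE.match(line):
            entries.append([line])      # a new ENTRY ID: closes the prior entry
        elif entries:
            entries[-1].append(line)    # belongs to the entry currently open
        else:
            header.append(line)         # record-level field before any entry
    return header, entries
```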
04
UUID VALIDATION
RECORD SIGNATURE and INGESTION ID values MUST conform to RFC 4122 UUID format. Records failing UUID validation MUST be rejected and flagged for review; partial extraction is not permitted.
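A possible all-or-nothing validation step is sketched below. The pattern checks the hyphenated 8-4-4-4-12 hexadecimal form and, as an assumption about how strictly "RFC 4122 format" should be read, also requires a version digit of 1 to 5 and the RFC 4122 variant bits. `RecordRejected`, `validate_uuids`, and the dict-of-anchors record shape are hypothetical.

```python
import re

# Hyphenated form with version 1-5 and RFC 4122 variant (8, 9, a, or b).
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}"
    r"-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}"
)

class RecordRejected(Exception):
    """Raised when a record must be rejected and flagged for review."""

def validate_uuids(record: dict) -> None:
    """Reject the whole record if either UUID field is missing or malformed."""
    for field in ("RECORD SIGNATURE", "INGESTION ID"):
        value = record.get(field, "")
        if not UUID_RE.fullmatch(value):
            raise RecordRejected(f"{field} is not a valid RFC 4122 UUID: {value!r}")
```

Running this check before any field is emitted keeps a failing record from being partially extracted.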
05
VERSION CHAINING
PRECEDING RECORD values establish the issuance chain. Extraction routines SHOULD validate chain continuity by reconciling each PRECEDING RECORD against a known prior RECORD ID. Broken chains MUST be reported but do not invalidate the current record.
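Chain validation can be sketched as a lookup against the set of RECORD ID values already seen. In this illustration, `find_broken_links` is a hypothetical helper, records are assumed to be dicts keyed by anchor name, and a record without a PRECEDING RECORD value is assumed to start a chain. Broken links are returned for reporting; nothing is rejected.

```python
from typing import Iterable, List, Set, Tuple

def find_broken_links(records: Iterable[dict], known_ids: Set[str]) -> List[Tuple[str, str]]:
    """Return (record_id, preceding_id) pairs whose predecessor is not a known RECORD ID."""
    seen = set(known_ids)  # copy so the caller's set is not mutated
    broken: List[Tuple[str, str]] = []
    for record in records:
        record_id = record["RECORD ID"]
        preceding = record.get("PRECEDING RECORD")
        if preceding and preceding not in seen:
            broken.append((record_id, preceding))  # report only; the record stays valid
        seen.add(record_id)
    return broken
```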
06
STATUS HANDLING
Only records with STATUS: FINALIZED are eligible for downstream indexing. DRAFT records MUST NOT be extracted. SUPERSEDED records MUST be retained but excluded from active reference sets.
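The three status outcomes map naturally onto a small dispatch function. The sketch below uses hypothetical names (`Disposition`, `disposition_for`); raising on an unrecognized status is an assumption, since the rule does not say how other values are handled.

```python
from enum import Enum

class Disposition(Enum):
    INDEX = "index"              # eligible for downstream indexing
    RETAIN_ONLY = "retain_only"  # kept, but excluded from active reference sets
    SKIP = "skip"                # not extracted at all

def disposition_for(status: str) -> Disposition:
    """Map a STATUS value to what the pipeline does with the record."""
    if status == "FINALIZED":
        return Disposition.INDEX
    if status == "SUPERSEDED":
        return Disposition.RETAIN_ONLY
    if status == "DRAFT":
        return Disposition.SKIP
    # Assumption: any other status is treated as an error.
    raise ValueError(f"unrecognized STATUS: {status!r}")
```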
07
SCHEMA CONFORMANCE
Records declaring a SCHEMA VERSION other than the active schema version MUST be routed to the appropriate schema handler. Cross-schema extraction is not permitted within a single pipeline pass.
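One way to keep each pipeline pass single-schema is to partition records by their SCHEMA VERSION value before extraction and hand each batch to its own handler. This is a sketch only; `partition_by_schema` and the dict-based record shape are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

def partition_by_schema(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records by SCHEMA VERSION so each batch can run in its own pass."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        batches[record["SCHEMA VERSION"]].append(record)
    return dict(batches)
```

Each resulting batch contains a single schema version, so a handler never sees records from a different schema in the same pass.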
08
ACCESS CLASSIFICATION ENFORCEMENT
Extraction routines MUST honor the ACCESS CLASSIFICATION field. Records classified RESTRICTED or INTERNAL MUST NOT be redistributed beyond the originating recipient system. Downstream indexing MUST preserve classification metadata.
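A minimal sketch of classification handling follows. It encodes only what the rule states: RESTRICTED and INTERNAL block redistribution, and the classification travels with the indexed document. The function names and the shape of the index document are hypothetical, and the sketch takes no position on how other classifications are handled.

```python
BLOCKED_CLASSES = frozenset({"RESTRICTED", "INTERNAL"})

def redistribution_blocked(access_class: str) -> bool:
    """True when the classification forbids redistribution beyond the recipient system."""
    return access_class in BLOCKED_CLASSES

def to_index_document(record: dict) -> dict:
    """Build an index document that carries the classification metadata forward."""
    access_class = record["ACCESS CLASSIFICATION"]
    return {
        "id": record["RECORD ID"],
        "access_class": access_class,  # preserved for downstream consumers
        "redistribution_blocked": redistribution_blocked(access_class),
    }
```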
05 / REFERENCE EXTRACTION
Reference Extraction
The following block illustrates a canonical extraction pass over a record fragment. The example is non-normative and provided for reference only.
REFERENCE · EXTRACTION PASS
INPUT
─────
RECORD ID: MT-2026-02-18-V1
PRECEDING RECORD: MT-2026-02-11-V1
STATUS: FINALIZED
SCHEMA VERSION: 3.2
ACCESS CLASSIFICATION: EXECUTIVE
ENTRY ID: MT-2026-02-18-V1-E3
ENTITY: OXFORD BIOMEDICA; BRISTOL MYERS SQUIBB
SUBJECT: LENTIVIRAL VECTOR COMMERCIAL SUPPLY AGREEMENT
SUBJECT CLASS: SUPPLY
DOMAIN: VIRAL VECTOR MANUFACTURING
GEOGRAPHY: UNITED KINGDOM; UNITED STATES
FUNCTIONAL DOMAIN: PROCESS DEVELOPMENT AND MSAT; TECH TRANSFER AND CMC PROGRAM LEADERSHIP; EXECUTIVE MANUFACTURING AND TECHOPS
DEFINITION:
Oxford Biomedica signed a five-year commercial supply agreement, with an extension option, to manufacture lentiviral vectors for Bristol Myers Squibb CAR-T programs.
STRUCTURE:
Commercial manufacturing is expected to begin in 2026, subject to regulatory approvals, across Oxford, United Kingdom and Durham, North Carolina sites.
EFFECT:
Requires aligned control strategies, harmonized release specifications, and quality data exchange across both sites to hold consistent commercial output through the supply term.
EXTRACTION
──────────
record.id = "MT-2026-02-18-V1"
record.preceding_id = "MT-2026-02-11-V1"
record.status = "FINALIZED"
record.schema_version = "3.2"
record.access_class = "EXECUTIVE"
entry[0].id = "MT-2026-02-18-V1-E3"
entry[0].entity = ["OXFORD BIOMEDICA", "BRISTOL MYERS SQUIBB"]
entry[0].subject = "LENTIVIRAL VECTOR COMMERCIAL SUPPLY AGREEMENT"
entry[0].subject_class = "SUPPLY"
entry[0].domain = "VIRAL VECTOR MANUFACTURING"
entry[0].geography = ["UNITED KINGDOM", "UNITED STATES"]
entry[0].functional_domain = ["PROCESS DEVELOPMENT AND MSAT", "TECH TRANSFER AND CMC PROGRAM LEADERSHIP", "EXECUTIVE MANUFACTURING AND TECHOPS"]
entry[0].body.definition = "Oxford Biomedica signed a five-year commercial supply agreement, with an extension option, to manufacture lentiviral vectors for Bristol Myers Squibb CAR-T programs."
entry[0].body.structure = "Commercial manufacturing is expected to begin in 2026, subject to regulatory approvals, across Oxford, United Kingdom and Durham, North Carolina sites."
entry[0].body.effect = "Requires aligned control strategies, harmonized release specifications, and quality data exchange across both sites to hold consistent commercial output through the supply term."
06 / DELIMITER NORMALIZATION
Delimiter Normalization
Several anchors emit values as semicolon-delimited lists. Extraction routines MUST normalize these into array structures.
DELIMITER · NORMALIZATION RULES
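Semicolon normalization can be sketched as a split-and-trim step applied only to list-valued anchors. The set `LIST_ANCHORS` is inferred from the reference extraction in Section 05, where ENTITY, GEOGRAPHY, and FUNCTIONAL DOMAIN are emitted as arrays; it is an assumption, not the normative list. Dropping empty items produced by a trailing semicolon is likewise an assumption.

```python
from typing import List, Union

# Inferred from the non-normative reference extraction; not a normative list.
LIST_ANCHORS = frozenset({"ENTITY", "GEOGRAPHY", "FUNCTIONAL DOMAIN"})

def normalize_value(anchor: str, value: str) -> Union[str, List[str]]:
    """Split list-valued fields on semicolons; return other values unchanged."""
    if anchor in LIST_ANCHORS:
        return [part.strip() for part in value.split(";") if part.strip()]
    return value
```

For example, `normalize_value("ENTITY", "OXFORD BIOMEDICA; BRISTOL MYERS SQUIBB")` yields a two-item list, matching `entry[0].entity` in the reference extraction, while `normalize_value("SUBJECT CLASS", "SUPPLY")` stays a plain string.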
07 / ERROR HANDLING
Error Handling
ERROR · CONDITION REGISTRY
08 / CONFORMANCE
Conformance
Implementations claiming conformance to this specification MUST extract all fields defined in Section 03, MUST observe the rules in Section 04, MUST normalize delimited values per Section 06, and MUST handle error conditions per Section 07. Formal conformance review may be requested through Matilon inquiry channels.
09 / CROSS-REFERENCES
Cross-References