Schema Reference

This page documents the data shapes that make the archive reusable. It is descriptive and is not yet a formal validator. The rule is simple: external tools may consume these files, but review-state fields still decide whether a record can be cited as verified fact.

| ID | Purpose | Path | Exists | Representative Fields |
| --- | --- | --- | --- | --- |
| source_catalog | Seeded source records and custody pointers. | sources/source_catalog.json | yes | internet_archive_id, local_raw_file, processed_status, site_path, source_id, source_status, title, year |
| research_index | Corpus-wide processing inventory. | processed/research_index.json | yes | quality_note, source_count, sources |
| evidence_ledger | Traceability records for source/candidate/promoted items. | processed/evidence_ledger.json | yes | collection, confidence, evidence, id, label, person, record_id, record_type |
| chapter_atlas | Theme routing records for processed sections. | processed/chapter_atlas.json | yes | id, kind, line_end, line_start, sequence, source_id, source_title, status |
| book_coverage_atlas | Book-level coverage records for processed sources and sections. | processed/book_coverage_atlas.json | yes | generated_at, quality_note, source_count, sources, total_sections |
| chapter_workbench | Section-level research workbench records. | processed/chapter_workbench.json | yes | concept_hits, equation_count, equations, figure_count, figures, glossary_hits, id, kind |
| concept_concordance | Source-text concept hit records. | processed/concept_concordance.json | yes | collection, concepts, generated_at, person, quality_note, section_count, source_count, total_concepts |
| canonical_equations | First equation canon and review state. | processed/canonical_equations.json | yes | id, modern_form, original_form, site_path, source_id, source_ref, source_title, status |
| completion_audit | Source-by-source readiness gates. | processed/completion_audit.json | yes | curated_public_pages, gates, has_ocr_seed, has_source_manifest, links, next_actions, original_crop_manifests, processed_status |
| citation_index | Project and source citation records. | processed/citation_index.json | yes | author, id, issued, recommended_citation, site_url, source_url, title, type |
| notation_ledger | Equation notation and translation ledger. | processed/notation_ledger.json | yes | equation_id, modern_form, original_form, review_actions, site_path, source_id, source_ref, source_title |
| diagram_provenance_ledger | Original crop and redraw provenance ledger. | processed/diagram_provenance_ledger.json | yes | asset_type, crop_box_pixels, height, id, manifest_path, output_path, public_url, quality_note |
| schema_reference | Machine-readable schema/reference guide. | processed/schema_reference.json | yes | files, generated_at, quality_note |
| expert_review_packets | Review bundles for experts and contributors. | processed/expert_review_packets.json | yes | artifact_links, id, ready_count, reviewer_profile, scope, tasks, title |
| release_readiness | Named publication release levels and readiness states. | processed/release_readiness.json | yes | generated_at, levels, quality_note |
| accessibility_audit | Automated accessibility-readiness scan and manual review gates. | processed/accessibility_audit.json | yes | gates, generated_at, html_table_count, iframe_count, image_tag_count, image_tag_missing_alt, issue_pages, long_table_pages |
| edition_comparison_index | Edition collation queue for seeded sources. | processed/edition_comparison_index.json | yes | edition_review_status, internet_archive_id, local_raw_file, priority, processed_status, review_actions, source_id, title |
| patent_theory_bridge | Seeded bridge from patents to concepts and theory-review targets. | processed/patent_theory_bridge.json | yes | bridge_status, concept_links, diagram_targets, domain_tags, patent_number, patent_url, pdf_url, publication_date |
| canonical_verification_workbench | Top-level queue index for canonical verification work. | processed/canonical_verification_workbench.json | yes | generated_at, quality_note, queues, summary |
| equation_verification_queue | Equation scan-check queue with OCR line snippets. | processed/equation_verification_queue.json | yes | candidate_status, chapter_id, chapter_refs, chapter_title, id, line_anchors, line_ranges, links |
| figure_verification_queue | Original figure crop verification queue. | processed/figure_verification_queue.json | yes | crop_box_pixels, id, links, manifest_path, output_path, public_url, review_actions, sha256 |
| patent_verification_queue | Patent authority PDF, claim, drawing, and theory bridge queue. | processed/patent_verification_queue.json | yes | concept_links, diagram_targets, domain_tags, links, patent_number, patent_url, pdf_url, publication_date |
| claim_attribution_ledger | Source-isolation ledger for fact, candidate, translation, patent, diagram, and interpretation layers. | processed/claim_attribution_ledger.json | yes | allowed_use, claim_type, collection, confidence, id, interpretation_layer, label, person |

When building external tools, preserve fields such as status, verification, confidence, quality_note, source_ref, and review_state. Removing them makes candidate OCR look more certain than it is.
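
As an illustration of that advice, here is a minimal Python sketch of an export helper that narrows a record to the fields a tool needs while always carrying along any review-state fields that are present. The helper name `project_record` and the example record are hypothetical; only the field names come from this page, and the record values are invented for illustration.

```python
# Review-state fields named on this page; carried through unchanged when present.
REVIEW_FIELDS = (
    "status",
    "verification",
    "confidence",
    "quality_note",
    "source_ref",
    "review_state",
)


def project_record(record, wanted_fields):
    """Return a copy of `record` limited to `wanted_fields`,
    plus any review-state fields the record carries."""
    keep = list(wanted_fields) + [f for f in REVIEW_FIELDS if f not in wanted_fields]
    return {key: record[key] for key in keep if key in record}


# Hypothetical record shaped like a canonical_equations entry (values invented).
example = {
    "id": "eq-001",
    "modern_form": "V = I R",
    "original_form": "E = C R",
    "source_id": "src-001",
    "source_ref": "lines 10-12",
    "status": "candidate",
}

slim = project_record(example, ["id", "modern_form"])
```

In this sketch `slim` drops `original_form` and `source_id` but still includes `status` and `source_ref`, so a downstream consumer can tell the equation is a candidate rather than a verified fact.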

The next step is to add formal JSON Schema files in pipeline/schemas/, versioned export contracts, and validation in CI.
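
Until formal schemas exist, a CI job could run a much simpler presence check. The sketch below uses only the Python standard library and takes the representative fields listed for canonical_equations as its required set. Treating those fields as required is an assumption, not a published contract, and the function name `missing_fields` is hypothetical. Because the top-level layout of the JSON file is not specified on this page, the function takes an already-extracted list of records.

```python
# Representative fields listed for canonical_equations on this page.
# Assumption: treating them as required; the archive does not yet publish a contract.
CANONICAL_EQUATION_FIELDS = frozenset({
    "id",
    "modern_form",
    "original_form",
    "site_path",
    "source_id",
    "source_ref",
    "source_title",
    "status",
})


def missing_fields(records, required=CANONICAL_EQUATION_FIELDS):
    """Map record index to the sorted list of required fields it lacks.

    Records that carry every required field are omitted from the result.
    """
    problems = {}
    for index, record in enumerate(records):
        missing = sorted(required - set(record))
        if missing:
            problems[index] = missing
    return problems
```

A CI step could fail the build whenever `missing_fields(records)` returns a non-empty dictionary. A real JSON Schema would also constrain types and allowed values, which this check does not attempt.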