zenodo.org
OCR Full-Text Corpus of PURSUE Release 1 Declassified UAP Records (FBI Case File 62-HQ-83894 and USAF Project Blue Book Box 7)
Machine-readable plain-text corpus extracted by Optical Character Recognition (Mistral AI mistral-ocr-latest) from the 18 PDF source documents of PURSUE (Presidential Unsealing and Reporting System for UAP Encounters) Tranche 1, released by the U.S. Department of War on May 8, 2026 (war.gov/UFO).

The corpus covers two major archival collections:

- FBI Case File 62-HQ-83894 (Flying Saucers): 10 sections, 7 individual serials (130, 153, 164, 220, 403, 438, 449), and Sub-file A. Approximately 2,300 pages spanning July 1947 to 1967.
- USAF Project Blue Book Box 7, Incident Summaries 1–233: three files covering the first 233 USAF-documented UAP cases (June 1947 to January 1949), approximately 531 pages.

Each source document is provided as a single plain-text file with pages delimited by ---PAGE--- separators. Total corpus: approximately 3,000 pages of machine-readable text.

Intended for computational analysis, database ingestion, and cross-referencing with other UAP research corpora.

Source documents are in the public domain (U.S. government works, 17 U.S.C. § 105). OCR output released under CC0.
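Because every file separates pages with ---PAGE---, a file can be split into per-page records before analysis or database ingestion. The following is a minimal Python sketch, not part of the release. The function names and the example file name are hypothetical. It also assumes the files are UTF-8 encoded and that an empty chunk at the very start or end of a file comes from a leading or trailing separator rather than a blank page; neither is stated in the record, so check both against the actual files.

```python
from pathlib import Path

# Page separator as described in the corpus record.
PAGE_DELIMITER = "---PAGE---"


def split_pages(text: str) -> list[str]:
    """Split the text of one corpus file into a list of page strings."""
    chunks = [chunk.strip() for chunk in text.split(PAGE_DELIMITER)]
    # Assumption: an empty first or last chunk is produced by a separator
    # at the start or end of the file, not by a blank page.
    if chunks and not chunks[0]:
        chunks = chunks[1:]
    if chunks and not chunks[-1]:
        chunks = chunks[:-1]
    # Empty chunks in the middle are kept so page positions stay aligned
    # with the source PDF.
    return chunks


def load_pages(path: str) -> list[str]:
    """Read one corpus file (assumed UTF-8) and return its pages."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return split_pages(text)


if __name__ == "__main__":
    # Hypothetical file name; substitute a real file from the corpus.
    example = Path("example_corpus_file.txt")
    if example.exists():
        for number, page in enumerate(load_pages(str(example)), start=1):
            print(number, len(page))
```

Interior empty pages are kept on purpose: dropping them would shift every later page index away from the page order of the source PDF, which matters when cross-referencing.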