
Aeneas

AI that helps historians connect the past

2025-07-28

Product Introduction

  1. Aeneas is an open-source AI model from Google DeepMind, designed to help historians restore, date, and contextualize fragmentary ancient inscriptions. It analyzes both the text and the visual characteristics of an inscription to provide historical context and identify parallels across datasets. The model currently supports Latin texts but can be adapted to other ancient languages and to media such as papyri or coins.

  2. The core value of Aeneas lies in accelerating epigraphic research by automating time-intensive tasks like identifying textual parallels and restoring damaged texts. It enables historians to draw data-driven connections across fragmented artifacts at scale, enhancing the accuracy and scope of historical interpretations.

Main Features

  1. Aeneas searches for parallels using transformer-based embeddings that act as "historical fingerprints", identifying similarities in wording, syntax, and provenance across its database of 176,000+ Latin inscriptions. This allows rapid retrieval of contextual matches from the Epigraphic Database Roma (EDR) and other linked datasets.
  2. The model processes multimodal inputs, combining textual analysis with visual data from inscription images to predict geographical provenance with 72% accuracy across 62 Roman provinces. This dual approach improves attribution where text alone is insufficient due to damage or wear.
  3. Aeneas introduces unknown-length gap restoration, a novel capability for reconstructing missing text segments without prior knowledge of the gap size, with a reported top-20 accuracy of 58% when the gap length is unknown. It also provides saliency maps that show which input features influenced its predictions.
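
The parallels search described above can be illustrated with a minimal sketch. It assumes each inscription has already been turned into an embedding vector and ranks candidates by cosine similarity. The three-dimensional vectors, the inscription ids, and the function names are hypothetical and purely illustrative; this is not Aeneas's actual code or API, and the real model's embeddings and ranking may work differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_parallels(query, corpus, k=3):
    """Return the k (id, score) pairs most similar to the query embedding."""
    scored = [(cosine_similarity(query, vec), ident) for ident, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(ident, score) for score, ident in scored[:k]]


# Hypothetical toy embeddings; real ones would come from a trained model.
corpus = {
    "inscription_a": [0.9, 0.1, 0.0],
    "inscription_b": [0.0, 1.0, 0.0],
    "inscription_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.0, 0.0]

for ident, score in top_parallels(query, corpus, k=2):
    print(f"{ident}: {score:.3f}")
```

With these toy vectors, inscription_a ranks ahead of inscription_c, and inscription_b (orthogonal to the query) is left out of the top two.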

Problems Solved

  1. Aeneas addresses the challenge of interpreting fragmentary or degraded inscriptions, which traditionally require manual cross-referencing of physical and linguistic clues across scattered archives. It reduces restoration time from weeks to seconds for heavily damaged texts.
  2. The product targets historians, archaeologists, educators, and museum professionals working with Roman-era materials. It is particularly valuable for researchers analyzing large corpora of inscriptions with limited contextual metadata.
  3. Typical use cases include dating disputed artifacts like the Res Gestae Divi Augusti, reconstructing military diplomas with missing sections, and identifying regional patterns in administrative or religious inscriptions across the Roman Empire.

Unique Advantages

  1. Unlike general-purpose language models or earlier tools like Ithaca (focused on Greek texts), Aeneas specializes in Latin epigraphy with built-in support for multimodal analysis and historical metadata. It integrates directly with academic databases rather than relying solely on unstructured text.
  2. The model innovates through its contextualization engine, which retrieves parallels based on both linguistic patterns and spatial-temporal metadata. This enables hierarchical analysis from individual phrases to empire-wide trends.
  3. Competitive advantages include state-of-the-art performance in date prediction (within 13 years of the date ranges assigned by historians) and open-source availability of its codebase and the Latin Epigraphic Dataset (LED). The tool is designed for collaborative use, allowing historians to refine its outputs with their own domain expertise.
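
To make the "within 13 years" figure concrete, here is a minimal sketch of one way such a dating error could be computed. It assumes the error is the distance in years from the predicted date to the nearest edge of the historian's date range, and zero when the prediction falls inside the range. The function names and example values are hypothetical, and the exact metric used to evaluate Aeneas may differ.

```python
def distance_to_range(predicted_year, earliest, latest):
    """Years between a predicted date and the nearest edge of an expert range.

    Returns 0 when the prediction lies inside [earliest, latest].
    Negative years stand for BCE; this simple scheme ignores the missing year 0.
    """
    if earliest > latest:
        raise ValueError("earliest must not exceed latest")
    if predicted_year < earliest:
        return earliest - predicted_year
    if predicted_year > latest:
        return predicted_year - latest
    return 0


def mean_distance(examples):
    """Average distance over (predicted_year, earliest, latest) triples."""
    distances = [distance_to_range(p, lo, hi) for p, lo, hi in examples]
    return sum(distances) / len(distances)


# Hypothetical predictions paired with hypothetical expert date ranges.
examples = [
    (120, 100, 150),  # inside the range: distance 0
    (90, 100, 150),   # 10 years too early
    (230, 200, 210),  # 20 years too late
]
print(mean_distance(examples))  # 10.0
```

Scoring against a range rather than a single year reflects that historians usually date inscriptions to an interval, so a prediction anywhere inside that interval counts as correct.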

Frequently Asked Questions (FAQ)

  1. How does Aeneas differ from DeepMind’s earlier Ithaca model? Aeneas expands beyond Ithaca’s Greek text focus by adding Latin support, multimodal image analysis, and unknown-length gap restoration. It also introduces a parallels retrieval system for contextual analysis rather than solely focusing on text completion.
  2. Can Aeneas restore inscriptions with completely missing sections? Yes, the model handles gaps of unknown length using a transformer-based decoder trained on 176,000+ inscriptions, with a reported top-20 accuracy of 58% when no gap length is specified. Historians can further refine results using the provided saliency maps.
  3. What datasets power Aeneas? It uses the Latin Epigraphic Dataset (LED), a harmonized collection integrating the Epigraphic Database Roma, Epigraphic Database Heidelberg, and Epigraphic Database Clauss-Slaby. The dataset includes metadata on provenance, dating, and inscription types.
  4. Is Aeneas accessible to non-technical users? Yes, an interactive web interface at predictingthepast.com allows free access to core functions. The open-source codebase also enables integration into academic workflows via Python APIs and Jupyter notebooks.
  5. Does Aeneas support languages beyond Latin? While optimized for Latin, the architecture is adaptable to other ancient languages. The team has demonstrated prototype extensions for Greek papyri and plans community-driven expansions to Coptic and Oscan texts.
