Computational philology

Teaching Logion how scribes actually slip

A weighted edit distance for restoring damaged Byzantine Greek — and a statistically significant gain on the variants real manuscripts preserve.

Reza Ramji · extends Princeton’s Logion · source

[Figure: Two Greek words one letter-pair apart. The dative articles ταις (tais, “to the”, fem.) and τοις (tois, “to the”, masc./neut.) differ only by the swap αι → οι. On an edit-cost scale from 0.00 to 1.00, an unrelated substitution costs 1.00 and the itacism swap costs 0.35.]

Two Greek words that differ by one letter-pair. A Byzantine scribe taking dictation could not hear the difference between ταις and τοις, because αι and οι were both heard as /i/. Plain edit distance scores that slip exactly like any unrelated one-letter change; the weighted scheme makes it cost 0.35 instead of 1.00.

Ancient Greek reaches us through centuries of hand-copying. Letters smudge, words drop out, and scribes mishear dictation or misread a neighbour’s hand. Princeton’s Logion, a BERT language model trained on premodern Greek, reads a passage, flags where the transmitted word looks wrong, and proposes what belongs there. One thing it handles bluntly: every letter swap costs the same. Weight swaps by how easily letters are actually confused, and the suggestions match how scribes really err, recovering more of the variants real manuscripts preserve.


The mechanism

[Figure: How weighting reshapes the candidate ranking for one suspect word. Baseline filter (all neighbours equal): τοις, ταις, τωις, τυις and βοις each score 1.00. Weighted filter (cost by confusion class): τοις 0.35 and ταις 0.45 lead, with τωις, τυις and βοις ranked below them. Longer bar = more plausible; the reading the scribe meant rises to the top of the list.]

Logion scores spellings near a suspect word. Watch how the shortlist changes when swaps are weighted.

01 — How it decides something looks off

A model that second-guesses the page

For each word, Logion asks: how likely is what’s written, against the most likely near-spelling it can imagine in that spot? When some other spelling is far more probable in context, the word is surfaced for an editor.
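A minimal sketch of that comparison, assuming the model's in-context probabilities for the written word and for each near-spelling are already available. The function name `flag_ratio`, the threshold, and every probability below are illustrative assumptions, not Logion's interface or output.

```python
def flag_ratio(p_written: float, p_candidates: dict[str, float]) -> tuple[str, float]:
    """Return the most probable near-spelling and how many times likelier it is."""
    best = max(p_candidates, key=p_candidates.get)
    return best, p_candidates[best] / p_written


# Hypothetical probabilities for one position; not produced by Logion.
best, ratio = flag_ratio(0.002, {"τοις": 0.40, "τωις": 0.001})

THRESHOLD = 50.0  # assumed cutoff for surfacing a word to an editor
print(best, ratio > THRESHOLD)
```

A large ratio means some other spelling fits the context far better than the transmitted one, which is the signal that sends the word to an editor.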

Building that set of near-spellings needs a notion of close. Logion uses edit distance — the number of single-character changes that turn one word into another.
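For reference, plain (unweighted) Levenshtein distance can be computed with the standard dynamic-programming recurrence. This is a generic textbook sketch, not code taken from Logion.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("ταις", "τοις"))  # 1: substitute α with ο
print(levenshtein("ταις", "τις"))   # 1: delete α
```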

02 — The problem

Every swap costs the same

Plain edit distance is blunt. Changing one letter to any other costs exactly one — whether the two are constantly confused or unrelated. Logion’s original filter is blunter still: a yes/no gate that admits every word within distance 1 and treats them all alike.
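The yes/no gate described above can be sketched as a simple filter over a word list. The vocabulary here is a toy list chosen for illustration, and `hard_gate` is a hypothetical name rather than a function from Logion.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: every insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def hard_gate(suspect: str, vocabulary: list[str], max_dist: int = 1) -> list[str]:
    """Admit every word within max_dist of the suspect; no ranking among them."""
    return [w for w in vocabulary
            if w != suspect and levenshtein(suspect, w) <= max_dist]


# Toy vocabulary (assumed for illustration only).
admitted = hard_gate("ταις", ["τοις", "τωις", "τυις", "λογος"])
print(admitted)
```

Every admitted word passes on equal terms: the gate records that a candidate is close, but not how plausible the slip that produced it would be.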

So the five candidates in the baseline panel arrive as equals. The model has discarded the one thing a philologist knows cold: which mistakes scribes actually make.

03 — The change

Weight the swaps by how scribes err

Give each substitution a cost from documented scribal confusions — itacistic vowels heard alike, look-alikes in the minuscule hand. Easily-confused letters cost little; unrelated ones cost the full amount. The hard gate becomes a smooth weight.

The shortlist re-sorts. The itacistic neighbour τοις — one /i/-swap away — climbs to the top, and the implausible candidates sink.

An itacism swap like ταις / τοις costs 0.35, about 2.9× cheaper (1.00 / 0.35) than an unrelated same-distance edit like τις / οις, which costs the full 1.00.
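A minimal sketch of a weighted Levenshtein distance along these lines. Only the 0.35 and 1.00 costs come from the text; the cost table structure, the choice to encode the αι/οι swap as a single α/ο substitution, and the unit cost for insertions and deletions are assumptions made for illustration.

```python
# Hypothetical confusion table: unordered letter pair -> substitution cost.
# Only the 0.35 itacism cost is taken from the article.
SUB_COST = {frozenset("αο"): 0.35}


def sub_cost(x: str, y: str) -> float:
    """0 for identical letters, a reduced cost for listed confusions, else 1."""
    if x == y:
        return 0.0
    return SUB_COST.get(frozenset((x, y)), 1.0)


def weighted_levenshtein(a: str, b: str) -> float:
    """Edit distance with per-pair substitution costs; indels cost 1 (assumed)."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1.0,                   # delete ca
                curr[j - 1] + 1.0,               # insert cb
                prev[j - 1] + sub_cost(ca, cb),  # weighted substitution
            ))
        prev = curr
    return prev[-1]


print(weighted_levenshtein("ταις", "τοις"))  # 0.35: documented confusion
print(weighted_levenshtein("τις", "οις"))    # 1.0: unrelated substitution

# Candidates can then be ranked by cost instead of merely admitted.
ranked = sorted(["τυις", "τοις", "τωις"],
                key=lambda w: weighted_levenshtein("ταις", w))
print(ranked[0])
```

With a table like this, candidates that were tied at distance 1 separate by cost, so sorting by cost puts the documented confusion first.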


The result

Tested against real attested variants — 333 spots in the SBLGNT apparatus where the manuscripts genuinely disagree — the weighted filter recovers more of them. This is the fairest test: real scribal errors, not artificial ones.

Top-5 recovery climbs to 42.3% from 35.7%

Paired McNemar test, p < 10⁻³: the weighted filter recovered 28 loci the baseline missed and lost only 6. Top-1 trends positive but isn’t significant (16.5% vs. 14.4%, p = 0.30).
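The top-5 figure can be checked from the two discordant counts alone. The sketch below uses the exact (binomial) form of McNemar's test; the text does not say which variant was used, so treat that choice as an assumption.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    k = min(b, c)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 28 loci gained and 6 lost, as reported for top-5.
p = mcnemar_exact(28, 6)
print(f"{p:.2e}", p < 1e-3)
```

The net gain of 28 − 6 = 22 loci out of 333 is about 6.6 percentage points, which matches the reported rise from 35.7% to 42.3%.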

[Figure: Bar chart of variant-locus recovery (%) on 333 SBLGNT loci, baseline versus weighted. Top-1: 14.4 vs. 16.5. Top-5: 35.7 vs. 42.3 (p < 10⁻³). Top-10: 50.2 vs. 53.5. Error bars are Wilson 95% confidence intervals.]
Real attested-variant recovery on SBLGNT (n = 333). Weighted = weighted Levenshtein only. Higher at every cutoff; the top-5 gap is statistically significant. Error bars in the PDF figure are Wilson 95% CIs.
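A sketch of how a Wilson 95% interval of the kind shown on the error bars is computed. The count 141 of 333 is inferred here from the reported 42.3% and is not stated in the text.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Inferred count: 141 / 333 rounds to the reported 42.3% top-5 recovery.
low, high = wilson_interval(141, 333)
print(f"{low:.3f} to {high:.3f}")
```

The interval on each bar alone is fairly wide at n = 333. The paired McNemar test is more sensitive because it compares the two filters locus by locus rather than comparing two separate proportions.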

One caveat worth naming: a variant isn’t always an error — many SBLGNT disagreements are between defensible readings — so this measures whether Logion finds a locus unusual, not whether it’s wrong. The weights are hand-tuned from textbooks; the full ablation, the controlled protocols, and a deployment on Photius are in the PDF.