Teaching Logion How Scribes Actually Slip

Ancient Greek reaches us through centuries of hand-copying. Letters smudge, words drop out, scribes mishear from dictation or misread a neighbour’s hand. Princeton’s Logion — a BERT language model trained on premodern Greek — reads a passage, flags where the transmitted word looks wrong, and proposes what belongs there. One thing it gets blunt: every letter swap costs the same. Weight swaps by how easily letters are actually confused, and the suggestions match how scribes really err — recovering more of the variants real manuscripts preserve.

Logion scores spellings near a suspect word. Watch how the shortlist changes when swaps are weighted.

01 — How it decides something looks off

A model that second-guesses the page

For each word, Logion asks: how likely is what’s written, against the most likely near-spelling it can imagine in that spot? When some other spelling is far more probable in context, the word is surfaced for an editor.

Building that set of near-spellings needs a notion of close. Logion uses edit distance — the number of single-character changes that turn one word into another.

02 — The problem

Every swap costs the same

Plain edit distance is blunt. Changing one letter to any other costs exactly one — whether the two are constantly confused or unrelated. Logion’s original filter is blunter still: a yes/no gate that admits every word within distance 1 and treats them all alike.

So the five candidates on the left arrive as equals. The model has discarded the one thing a philologist knows cold: which mistakes scribes actually make.

03 — The change

Weight the swaps by how scribes err

Give each substitution a cost from documented scribal confusions — itacistic vowels heard alike, look-alikes in the minuscule hand. Easily-confused letters cost little; unrelated ones cost the full amount. The hard gate becomes a smooth weight.

The shortlist re-sorts. The itacistic neighbour τοις — one /i/-swap away — climbs to the top, and the implausible candidates sink.

An itacism swap like ταις / τοις costs 0.35 — about 3.7× cheaper than an unrelated same-distance edit like τις / οις, which costs the full 1.00.