༄༅།

Tibetan Woodblock Digitization

ཤིང་པར་གྱི་ཡི་གེ་ཨང་ཅན་ཡིག་ཆར་བསྒྱུར་བ།

We are digitizing Tibetan block-printed texts (pecha): we scan the original woodblock folios and convert them into searchable Unicode Tibetan text, together with a Wylie transliteration, a simple pronunciation guide, and a draft English translation.

Everything runs locally: OCR uses the BDRC woodblock models, pronunciation and Wylie are derived with open tools (bophono, pyewts), and the draft translation comes from a small translation model trained on Buddhist texts (MLotsawa). The digital text is then proofread against the original print, folio by folio.

How a text moves through the pipeline

  1. Scan — high-resolution images of the woodblock folios
  2. OCR — BDRC Woodblock model reads the print into Unicode Tibetan
  3. Clean — normalization and mechanical fixes
  4. Layers — Wylie transliteration, pronunciation, draft English
  5. Proofread — checked line-by-line against the scan by a qualified reader
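As an illustration of step 3, here is a minimal sketch of what a mechanical cleaning pass over one OCR line could look like. The function name clean_ocr_line and the specific rules (NFC normalization, dropping zero-width characters, collapsing whitespace and doubled tsheg) are assumptions chosen for the example, not the project's actual rule set. Anything less mechanical than this is left to the proofreader in step 5.

```python
import re
import unicodedata

# Zero-width characters that sometimes leak into OCR output (assumed list).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\ufeff"))

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def clean_ocr_line(line: str) -> str:
    """Apply a few mechanical fixes to one line of OCR output.

    Hypothetical helper: the rules below are illustrative assumptions.
    """
    # Unicode canonical composition; whether NFC suits every Tibetan
    # stack is worth checking against the project's own conventions.
    line = unicodedata.normalize("NFC", line)
    # Drop invisible zero-width characters.
    line = line.translate(ZERO_WIDTH)
    # Collapse runs of whitespace and trim both ends.
    line = re.sub(r"\s+", " ", line).strip()
    # Collapse a run of repeated tsheg into a single tsheg.
    line = re.sub(TSHEG + "{2,}", TSHEG, line)
    return line


if __name__ == "__main__":
    raw = "  \u0f64\u0f72\u0f44\u0f0b\u0f0b\u0f54\u0f62\u200b \t \u0f42\u0fb1\u0f72  "
    print(clean_ocr_line(raw))
```

Keeping each rule to a single line makes it easy to switch one off when a particular print turns out to need different handling.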

Texts

07-01-2026-Test
W11577 vol. 4, folios 121–129 · sand-mandala ritual text · 9 folios · scan, Tibetan, pronunciation & draft translation