
Digital Humanities Research Group seminar: “To have the ‘million’ readers yet”: applying OCR & NER to bilingual Irish-English texts in An Gaodhal (1881–1898)
February 6, 2024 @ 12:00 pm - 2:00 pm

Deirdre Ní Chonghaile, Glucksman Ireland House, New York University
Oksana Dereza, Data Science Institute, University of Galway
Abstract:
Computerized text extraction for the Irish language (Gaeilge) faces a number of challenges, the most significant of which is the machine-readability of cló Gaelach, the typeface most commonly used in handwritten and printed Irish-language material until the 1960s. To date, only a handful of OCR training models attuned to cló Gaelach and to pre-standardized spelling have emerged, and none has been trained on bilingual (Irish-English) texts. Using the text-recognition software Transkribus, a team at New York University and the University of Galway has developed two new OCR models: a Gaeilge-only model and a bilingual Gaeilge-English model. The core dataset for this OCR training exercise is the Brooklyn-based bilingual monthly newspaper An Gaodhal (1881–1898), the first serial dedicated to providing content to an Irish-language readership, which was established, edited, and printed by Galwayman Micheál Ó Lócháin (1836–1899). The contents of the newspaper reflect the cultural interests of Irish speakers in New York, Ireland, and the wider diaspora; Irish American life; New York history; and the development of the Irish language during the Celtic Revival period. Using the texts extracted from An Gaodhal, which are being corrected at word level, the team is developing NER (named entity recognition) tools to aid future NLP work in the Irish language. This work-in-progress presentation will share lessons learned from this ongoing project.
Registration
To attend via Zoom, please register here: https://forms.office.com/e/CvkPkh39sJ