ABOUT ME

Matthew R. Lee

Email:

Matthew_Lee@sil.org

I am a Language Technology Consultant and Linguist with SIL International. My wife and I serve with SIL Cameroon in Yaoundé, Cameroon, and I am Adjunct Faculty at Dallas International University. My focus is promoting minority language groups through the application of Language Technology. This role takes the form of trainer, researcher, training material developer, advocate, technical supporter, bug reporter and, when the need arises, programmer.

My Research

My pre-SIL Bachelor's focused on cross-domain scientific problem-solving, and the Master's degree in descriptive linguistics served to equip me to be more relevant. My research interests are wide, but focus on tools to help linguists analyze, develop, use, and promote their language. Much of my work relates to a Cameroonian context, but always with an eye to wider application.

Languages

  • English: Native
  • French: Near-Native
  • Feʔfeʔ & Ewondo (Bantu): Novice

Interests (In order of my preference.)

SELECTED PROJECTS

My Current Projects

The Cameroon Keyboard is a virtual keyboard maintained by SIL Cameroon. It allows the user to type in any language of Cameroon that follows the GACL (General Alphabet of Cameroonian Languages).

Between the 7th and 9th of March 1979, the National Committee for the Unification and Harmonisation of the Alphabets of Cameroon Languages decided on a Unified Alphabet for the orthographies of Cameroonian languages. The choices follow the Roman alphabet used in English and French, but with the addition of extra letters from the International Phonetic Alphabet.

Since that time, SIL Cameroon has been entrusted to maintain virtual keyboards to allow users to type in the local languages of Cameroon. The AZERTY and QWERTY source code, for Microsoft Keyboard Layout Creator, Keyman, and Linux's XKB, is a product of this continuing research.

The Cameroon Keyboard is also available as a mobile app: you can now send text messages and use social media in your language from your phone or tablet! It has been available on Windows® for nearly a decade, but is now available on Android®! This app is a product of the same continuing research.

Open Source Projects Supported Directly

You can use XLingPaper to produce linguistic documents with at least five outputs, all from the same source document: Web pages, PDF, Microsoft Word, Open Office Writer, and ePUB. It automatically numbers sections and examples and keeps track of section references, citation references, and glosses/abbreviations. See Simons and Black (2009) and Black (2009) for the key notions used.

While this project is managed by others, specifically Andy Black, I do help with support, bug reports, and sometimes code.

Keyman makes it possible for you to type in any language on Windows, macOS, Linux, iPhone, iPad, Android tablets and phones, and even instantly in your web browser. Create keyboard layouts with Keyman Developer and share them with the community in the keyboards repository. The Keyman Community have already contributed keyboard layouts for over 1,500 languages! Keyman is an open source project distributed under the MIT license.

In addition to maintaining the Cameroon Keyboard for Keyman, I am regularly involved in beta testing, documentation, and sometimes in coding.

PUBLICATIONS AND TALKS

Master's Thesis

The transcription bottleneck (Seifart et al. 2018, 1) delays and sometimes prohibits valuable language documentation resources from being shared with the target community and other researchers. An oral-first annotation methodology (Boerger et al. 2019) partially alleviates that problem through a process of oral annotation that can be done much more efficiently than written transcription. Specialized tools such as SIL International's SayMore aid a researcher and community to facilitate collection, annotation, documentation, and archiving of orally and textually annotated data in a standardized format, but the resulting web of aligned archival data is disjointed and far from an interactive format ready to be experienced by the community. This thesis proposes a solution, a tool named Prestige, that complements oral annotation's contribution by providing an automatically generated multilayered media experience designed to mobilize the resulting rich sources and annotations for immediate use in the community.

Lee, Matthew R. (2021). Prestige: Mobilizing an Orally Annotated Language Documentation Corpus. M.A. Thesis. Dallas, TX: Dallas International University.

Full papers

Despite recent advances in natural language processing and other language technology, the application of such technology to language documentation and conservation has been limited. In August 2019, a workshop was held at Carnegie Mellon University in Pittsburgh to attempt to bring together language community members, documentary linguists, and technologists to discuss how to bridge this gap and create prototypes of novel and practical language revitalization technologies. This paper reports the results of this workshop, including issues discussed, and various conceived and implemented technologies for nine languages: Arapaho, Cayuga, Inuktitut, Irish Gaelic, Kidaw'ida, Kwak'wala, Ojibwe, San Juan Quiahije Chatino, and Seneca.

Neubig, G., Rijhwani, S., Palmer, A., MacKenzie, J., Cruz, H., Li, X., … Littell, P. (2020). A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization.

Multilingual models can improve language processing, particularly for low resource situations, by sharing parameters across languages. Multilingual acoustic models, however, generally ignore the difference between phonemes (sounds that can support lexical contrasts in a particular language) and their corresponding phones (the sounds that are actually spoken, which are language independent). This can lead to performance degradation when combining a variety of training languages, as identically annotated phonemes can actually correspond to several different underlying phonetic realizations. In this work, we propose a joint model of both language-independent phone and language-dependent phoneme distributions. In multilingual ASR experiments over 11 languages, we find that this model improves testing performance by 2% phoneme error rate absolute in low-resource conditions. Additionally, because we are explicitly modeling language-independent phones, we can build a (nearly-)universal phone recognizer that, when combined with the PHOIBLE [1] large, manually curated database of phone inventories, can be customized into 2,000 language dependent recognizers. Experiments on two low-resourced indigenous languages, Inuktitut and Tusom, show that our recognizer achieves phone accuracy improvements of more than 17%, moving a step closer to speech recognition for all languages in the world.

Li, X., Dalmia, S., Li, J., Lee, M., Littell, P., Yao, J., … Metze, F. (2020). Universal Phone Recognition with a Multilingual Allophone System.

A great Scripture engagement opportunity lies before us. People are publicly reading the Scriptures every Sunday using lectionaries in the national language. Especially for minority languages without a full Bible, how can we make these important readings available in the mother tongue?

Weber, M., Weber, J., Lee, M. (October 2017). "What does God say this week?" Encouraging Churches through Mother Tongue Lectionaries. Bible Translation Conference, Dallas, TX, USA

Presentations

Video Presentation (raw footage) AI for Everyone!?! (English Presentation) IA pour Tous !?! (French Presentation)

This presentation is an overview of Artificial Intelligence. It was given to language development staff at SIL Cameroon on March 31, 2024.


Lee, M. (March 27, 2023). AI for Everyone, LangTech Half-Day. SIL Cameroon, Yaoundé, CM
Video Presentation (raw footage) Language Technology Overview Language Documentation Overview Artificial Intelligence & Minority Languages

This three-part presentation offers 1) an overview of Language Technology, 2) an overview of Language Documentation, and 3) an opening discussion on AI and minority languages. It was given to language development staff at SIL Cameroon on April 12, 2023.


Lee, M. (April 2023). Lang Tech Half-Day. SIL Cameroon, Yaoundé, CM
Kain, G., Ngono, L.P., Lee, M. (Feb 2022). The Use of Technology in Multilingual Learning: Challenges and Opportunities. International Mother Tongue Day. SIL Cameroon, Yaoundé, CM
Lee, M. (April 2021). Lang Tech Half-Day. SIL Cameroon, Yaoundé, CM

This presentation discusses "Traditional" Language Vitality and Digital Language Vitality, with an emphasis on Digital Language Support and tools in Cameroon.

Lee, M. (Jan 2021). Basic Tools for Promoting Cameroon’s Language Vitality in Today’s Digital World. SIL Cameroon Monthly Webinars, Yaoundé, CM

This is a presentation on Language Technology given on August 19, 2021 at SIL Cameroon, with a focus on Christian partners in Cameroon. To reach everyone, the audio is in French, but the text of the presentation is in English.

There are two versions of this presentation, one with an emphasis on academic and government partners, and the other with an emphasis on Christian partners.

Lee, M.; Ngono, L. P. (Aug 2020). Les bases de la Technologie linguistique au Cameroun. SIL Cameroon Annual Report, Yaoundé, CM

This presentation seeks to increase awareness about some of the Language Technology Tools that support work in the Languages in Cameroon.

Lee, M.; Ngono, L. P.; Roettele, J. (Jan 2018). Language Technology in Cameroon. National Symposium on Cameroonian Languages (NASCAL), Yaoundé, CM

This presentation was given at a round-table discussion to share some knowledge about archiving and copyright with university students.

Lee, M. (2017) La production et l’archivage des ressources numériques au Cameroun. Open Data Conference, University of Yaounde I

This presentation presents a technical history of the Cameroon Keyboard, and makes some proposals for standardizing the technical specifics of the General Alphabet of Cameroonian Languages.

Lee, M. (2017) Considérations informatiques sur l'alphabet generale des langues camerounaises. Conference sur l'Alphabet générale des langues camerounaises, University of Yaounde I

This overview presentation was given to SIL Cameroon's Advisory Board for their review of the tools and activities of CMB's Language Technology Department.

Now that dictionaries are no longer stored as formatted documents, but as lexical databases (in FLEx, WeSay, Toolbox, etc.), what can we do with all this structured data beyond printing dictionaries, and can we share this wealth? This project (under the guidance of Gary Simons) takes the smart data of FLEx lexical databases a step further. By linking together our lexicons stored in appropriate forms, we can start to leverage the structured knowledge within to ask interesting cross-linguistic questions of our data, and we can share that resource with the outside world.
• Do any of the languages in this cluster have a non-borrowed word for faith?
• Is the past tense in each language marked with a prefix, suffix, or clitic?
• In these related languages, are person and number typically expressed in separate morphemes or as a portmanteau?
Recorded on May 8, 2014 in Mahler 5-7 at GIAL.
Edited by Matthew. Property of Matthew Lee, Graduate Institute of Applied Linguistics, and SIL International

Lee, M. (2014) Dictionaries as Datapoints. Academic Forum, Graduate Institute of Applied Linguistics, Dallas, TX, USA

This brief presentation closed a series of workshops on promoting minority languages in modern technology.

Going Kompyuta Website

This brief presentation gave a history of the Cameroon Keyboard before asking for help to revise the Cameroon QWERTY and Cameroon AZERTY. Presented January 21st, 2011.

Workshops, Posters, etc.

This brief presentation gives some advice on how, and how not, to promote minority language keyboards.

Lee, M. (2020) Promoting your Keyboard. Online Keyman Developer Training Workshop, SIL International Language Technology

This is an Intermediate Training on Logos Bible Software, arranged as a panel discussion.

O'Rear, P., Maust, D., Lee, M. (2020) Translator's Workplace Logos Intermediate Training. [Panel Discussion], SIL International Language Technology

Bachelor's Thesis

This is my capstone thesis for my Bachelor's in Integrated Science and Technology at James Madison University.

Lee, M. R. (2006) TRIADS: A wireless location-based information service. [Bachelor's Thesis]. James Madison University

Miscellaneous

This is the blog of the Language Technology Department for SIL Cameroon, providing news and resources for the minority languages of Cameroon. Special attention is given to keyboards, fonts, and encoding.

Lee, M. (2021) Language Technology in Cameroon. [Blog] Language Technology in Cameroon, https://langtechcameroon.info/

This document is the result of an assignment from Dr. Steve Parker to explain Sonority in the style of an elevator speech to a specific audience, and I chose Star Wars geeks. Please enjoy the citations. No offense is intended to any of those cited, living, force ghost, or imaginary.

Lee, M. (2019) Sonority: A Space Elevator Story. Speculative Grammarian, Vol CLXXXVII, No. 4

EDUCATION

The Master of Arts degree with a major in Applied Linguistics is designed to produce graduates qualified to serve in specialist cross-cultural roles in Descriptive Linguistics.

After the groundwork in the Certificate in Applied Linguistics, this program continued with courses on Language Documentation, Advanced Phonology, Sonority, Syntax, Discourse, Linked Data in Linguistics, Semantics and Pragmatics, Cross-cultural Training, and related fields.

My chosen courses have provided a solid grounding in many domains of linguistics, with focus on available and emerging technologies that can aid such work.

  • Courses: Phonetics, Phonology, Grammar, Anthropology, Lang. & Culture Acq., Sociolinguistics, and Field Methods
  • Teacher’s Assistant for Field Data Management: Took over as instructor for 7 of 8 weeks of the grad-level technology course during professor’s leave. Taught FieldWorks Language Explorer, Phonology Assistant, etc.
  • Honors Thesis: “A Wireless Location-Based Information Service”
  • 2005-06: Concentration in Information and Knowledge Management: emphasis on Intelligent Systems, Software Engineering, Multimedia, and Programming.
  • 2004-05: Study of specific scientific sectors: Information and Knowledge Management, Health Systems, and Telecommunications.
  • 2002-04: Grounding Coursework in Applied Physics, Trigonometry, Calculus, Biology, Chemistry, Databases and Programming, Instrumentation and Measurement, as well as each field’s social dimensions.
  • Member of Phi Sigma Tau (ΦΣΤ) Philosophy Honor Society
  • Concentration in Philosophy: Coverage of Ancient Greek and Modern Philosophy, Epistemology, Moral Theory, Self and Community, Logic, New Testament, Buddhist Thought, Political Philosophy, and Philosophy of Religion

EMPLOYMENT

  • Develop French and English materials and train users in effective use of linguistic and translation software. (Paratext, FieldWorks, Adapt It, WeSay, Phonology Assistant, OurWord, etc.)
  • Customize and troubleshoot linguistic applications for various languages and situations.
  • Convert and clean large documents and data for publishing or import. (Regular Expressions, EditPad Pro, PowerGrep, MS Office, Libre/OpenOffice, Acrobat)
  • Develop and maintain customized keyboards for multiple operating systems and languages.
  • Advocate new and traditional technologies and workflows through networking and research.
  • Instructor for the Language and Culture Documentation course
  • Started course as the Teaching Assistant.
  • Took over as instructor for 7 of 8 weeks of the grad-level technology course during the professor’s emergency leave.
  • Taught FieldWorks Language Explorer, Phonology Assistant, Linguistic Desktop Publishing, LingTree, etc.
  • Led six-week course on effective use of linguistic and translation software. (Paratext, Translator’s Workplace, WeSay, etc.)
  • Managed remote monitoring system to provide maintenance and checks of customers’ critical systems.
  • Telephone system installation, configuration, and troubleshooting. Focus on NEC Elite IPK II.
  • Installed, analyzed, maintained, and supported both our office network and customer networks. (server/router/wireless configuration, switch/bridge/PC installs, replace devices, wiring)
  • Created sales orders, ordered supplies, negotiated with customers, and completed contractor paperwork.
  • Taught 2D & 3D Video Game Creation using different environments and a variety of multimedia apps.
  • Fulfilled full 24-hour camp counselor duties (leading activities, games, supervising dormitory)
  • Troubleshot computer and user issues; provided software training and computer setup
  • Server maintenance, database design and management, data entry

COMMUNITIES & PARTICIPATION

Outilingua is a community of staff serving Language Technology in SIL and its partner organizations across Francophone Africa.

Paratext is the world’s leading software application for the development and checking of new Bible translations and revisions to existing texts. Developed jointly by UBS and SIL International, Paratext is free for registered users from all walks of life, from church-based translation efforts in minority languages to Bible publishers in major languages. With more than 1,500 resource texts and tools available to vetted Bible translators, it enables consistent and accurate translation based on the original texts and modeled on versions in major languages. Endowed with cutting edge collaboration features, Paratext helps Bible translation teams work together to produce higher-quality translations in much less time than previous tools and methods have allowed.

The Paratext Prioritization Committee is charged with defining each year's focus for the Paratext Developers.

The major goal of this workshop was to take the recent and rapid advances in language technology (such as speech recognition, machine translation, automatic analysis of syntax, question answering, etc.), and put them in the hands of those on the front lines of language documentation and revitalization (such as language community members or documentary linguists).

Reso Collect is a non-profit organization, registered in Switzerland and Cameroon. It is committed to the fight against extreme poverty and the social reintegration of women and girls from disadvantaged neighborhoods. Its main activity is to offer Cameroon’s women and families equal work opportunities so that they can improve their social and economic situation.

Matthew R. Lee, http://MattGyverLee.github.io