I am a Language Technology
Consultant and Linguist with SIL
International. My wife and I serve with SIL
Cameroon in Yaoundé, Cameroon, and I'm Adjunct Faculty at Dallas International University. My focus is promoting minority language
groups through the application of Language Technology. This role takes the
form of trainer, researcher, training material developer, advocate, technical
supporter, bug reporter and, when the need arises, programmer.
My Research
My pre-SIL Bachelor's focused on cross-domain scientific
problem-solving, and the Master's degree in descriptive linguistics
served to equip me to be more relevant. My research interests are wide, but focus on
tools to help linguists analyze, develop, use, and promote their
language. Much of my work relates to a Cameroonian context,
but always with an eye to wider application.
The Cameroon Keyboard is a virtual keyboard maintained by SIL Cameroon.
It allows the user to type in any language of Cameroon that follows the
GACL (General Alphabet of Cameroonian Languages).
Between the 7th and 9th of March 1979, the National Committee for the
Unification and Harmonisation of the Alphabets of Cameroon Languages
decided on a Unified Alphabet for the orthographies of Cameroonian
languages. The choices follow the Roman alphabet used in English and
French, but with the addition of extra letters from the International
Phonetic Alphabet.
Since that time, SIL Cameroon has been entrusted to maintain virtual
keyboards to allow users to type in the local languages of Cameroon. The
AZERTY and QWERTY source code, for Microsoft Keyboard Layout Creator,
Keyman, and Linux's XKB, is a product of this continuing research.
The Cameroon Keyboard is a
virtual keyboard maintained by SIL Cameroon. It allows the user to type in
any language of Cameroon that follows the GACL (General Alphabet of
Cameroonian Languages). You can now send text messages and use social
media in your language from your phone or tablet! It has been available on
Windows® for nearly a decade, but is now available on Android®!
This app is a product of the continuing research described above: since the
1979 unification of the alphabet, SIL Cameroon has been entrusted to maintain
virtual keyboards for the local languages of Cameroon.
You can use XLingPaper to produce
linguistic documents with at least five outputs, all from the same source
document: Web pages, PDF, Microsoft Word, Open Office Writer, and ePUB. It
automatically numbers sections and examples and keeps track of section
references, citation references, and glosses/abbreviations. See Simons and
Black (2009) and Black (2009) for the key notions used.
While this project is managed by others, specifically Andy Black, I do
help with support, bug reports, and sometimes code.
Keyman makes it possible for you to
type in any language on Windows, macOS, Linux, iPhone, iPad, Android
tablets and phones, and even instantly in your web browser. Create
keyboard layouts with Keyman Developer and share them with the community
in the keyboards repository. The Keyman Community have already contributed
keyboard layouts for over 1,500 languages! Keyman is an open source
project distributed under the MIT license.
In addition to maintaining the Cameroon Keyboard for Keyman, I am
regularly involved in beta testing, documentation, and sometimes in
coding.
The transcription bottleneck (Seifart et al. 2018, 1) delays and sometimes prohibits valuable language documentation resources from being shared with the target community and other researchers. An oral-first annotation methodology (Boerger et al. 2019) partially alleviates that problem through a process of oral annotation that can be done much more efficiently than written transcription. Specialized tools such as SIL International's SayMore aid a researcher and community to facilitate collection, annotation, documentation, and archiving of orally and textually annotated data in a standardized format, but the resulting web of aligned archival data is disjointed and far from an interactive format ready to be experienced by the community. This thesis proposes a solution, a tool named Prestige, that complements oral annotation's contribution by providing an automatically generated multilayered media experience designed to mobilize the resulting rich sources and annotations
for immediate use in the community.
Lee, Matthew R. (2021). Prestige: Mobilizing an Orally Annotated Language Documentation Corpus. M.A. Thesis. Dallas, TX: Dallas International University.
Despite recent advances in natural language processing and other
language technology, the application of such technology to language
documentation and conservation has been limited. In August 2019, a
workshop was held at Carnegie Mellon University in Pittsburgh to attempt
to bring together language community members, documentary linguists, and
technologists to discuss how to bridge this gap and create prototypes of
novel and practical language revitalization technologies. This paper
reports the results of this workshop, including issues discussed, and
various conceived and implemented technologies for nine languages:
Arapaho, Cayuga, Inuktitut, Irish Gaelic, Kidaw'ida, Kwak'wala, Ojibwe,
San Juan Quiahije Chatino, and Seneca.
Neubig, G., Rijhwani, S., Palmer, A., MacKenzie, J.,
Cruz, H., Li, X., … Littell, P. (2020). A Summary of the First Workshop
on Language Technology for Language Documentation and
Revitalization.
Multilingual models can improve language processing, particularly for low
resource situations, by sharing parameters across languages. Multilingual
acoustic models, however, generally ignore the difference between phonemes
(sounds that can support lexical contrasts in a particular language) and
their corresponding phones (the sounds that are actually spoken, which are
language independent). This can lead to performance degradation when
combining a variety of training languages, as identically annotated
phonemes can actually correspond to several different underlying phonetic
realizations. In this work, we propose a joint model of both
language-independent phone and language-dependent phoneme distributions.
In multilingual ASR experiments over 11 languages, we find that this model
improves testing performance by 2% phoneme error rate absolute in
low-resource conditions. Additionally, because we are explicitly modeling
language-independent phones, we can build a (nearly-)universal phone
recognizer that, when combined with the PHOIBLE [1] large, manually
curated database of phone inventories, can be customized into 2,000
language dependent recognizers. Experiments on two low-resourced
indigenous languages, Inuktitut and Tusom, show that our recognizer
achieves phone accuracy improvements of more than 17%, moving a step
closer to speech recognition for all languages in the world.
Li, X., Dalmia, S., Li, J., Lee, M., Littell, P., Yao,
J., … Metze, F. (2020). Universal Phone Recognition with a Multilingual
Allophone System.
A great Scripture engagement opportunity lies before us. People are
publicly reading the Scriptures every Sunday using lectionaries in the
national language. Especially for minority languages without a full Bible,
how can we make these important readings available in the mother
tongue?
Weber, M., Weber, J., Lee, M. (October 2017). "What
does God say this week?" Encouraging Churches through Mother Tongue
Lectionaries. Bible Translation Conference, Dallas, TX, USA
This presentation in three parts presents 1) an overview of Language Technology, 2) an introduction to Language Technology and 3) a starting discussion on AI and minority languages. It was given to language development staff at SIL Cameroon on April 12, 2023.
Lee, M. (April 2023). Lang Tech Half-Day. SIL
Cameroon, Yaoundé, CM
Kain, G., Ngono, L.P., Lee, M. (Feb 2022). The Use of Technology in Multilingual Learning: Challenges and Opportunities. International Mother Tongue Day. SIL
Cameroon, Yaoundé, CM
This presentation discusses "Traditional" Language Vitality and Digital
Language Vitality, with an emphasis on Digital Language Support and tools
in Cameroon.
Lee, M. (Jan 2021). Basic Tools for Promoting
Cameroon’s Language Vitality in Today’s Digital World. SIL Cameroon
Monthly Webinars, Yaoundé, CM
This is a presentation on Language Technology given on August 19, 2021 at
SIL Cameroon, with a focus on Christian partners in Cameroon. To reach
everyone, the audio is in French, but the text of the presentation is in
English.
There are two versions of this presentation, one with an emphasis on
academic and government partners, and the other with an emphasis on
Christian partners.
Lee, M.; Ngono, L. P. (Aug 2020). Les bases de la
Technologie linguistique au Cameroun. SIL Cameroon Annual Report,
Yaoundé, CM
This presentation was given to share some knowledge about archiving and
copyright with university students at a round-table discussion.
Lee, M. (2017) La production et l’archivage des
ressources numériques au Cameroun. Open Data Conference, University
of Yaounde I
This presentation presents a technical history of the Cameroon Keyboard,
and makes some proposals for standardizing the technical specifics of the
General Alphabet of Cameroonian Languages.
Lee, M. (2017) Considérations informatiques sur
l'alphabet generale des langues camerounaises. Conference sur
l'Alphabet générale des langues camerounaises, University of Yaounde
I
This overview presentation was given to SIL Cameroon's Advisory Board for
their review of tools and activities of CMB's Language Technology
Department.
Now that dictionaries are no longer stored as formatted documents, but as
lexical databases (in FLEx, WeSay, Toolbox, etc.), what can we do with
all this structured data beyond printing dictionaries, and can we
share this wealth? This project (under the guidance of Gary Simons) takes
the smart data of FLEx lexical databases a step further. By linking
together our lexicons stored in appropriate forms, we can start to
leverage the structured knowledge within to ask interesting
cross-linguistic questions of our data, and we can share that resource
with the outside world.
• Do any of the languages in this cluster have a non-borrowed word for faith?
• Is the past tense in each language marked with a prefix, suffix, or clitic?
• In these related languages, are person and number typically expressed in separate morphemes or as a portmanteau?
Recorded on May 8, 2014 in Mahler 5-7 at GIAL. Edited by Matthew. Property of Matthew Lee, Graduate
Institute of Applied Linguistics, and SIL International
Lee, M. (2014) Dictionaries as Datapoints.
Academic Forum, Graduate Institute of Applied Linguistics, Dallas, TX,
USA
This brief presentation gave a history of the Cameroon Keyboard before asking for help to revise the Cameroon QWERTY and Cameroon AZERTY layouts. Presented January 21st, 2011.
This is the blog of the Language Technology Department for SIL Cameroon,
providing news and resources for the minority languages of Cameroon.
Special attention is given to keyboards, fonts, and encoding.
This document is the result of an assignment from Dr. Steve Parker to
explain Sonority in the style of an elevator speech to a specific
audience, and I chose Star Wars geeks. Please enjoy the citations. No
offense is intended to any of those cited, living, force ghost, or
imaginary.
Lee, M. (2019) Sonority: A Space Elevator Story.
Speculative Grammarian, Vol CLXXXVII, No. 4
The Master of Arts degree with a major in Applied Linguistics is designed
to produce graduates qualified to serve in specialist cross-cultural roles
in Descriptive Linguistics.
After the groundwork in the Certificate
in Applied Linguistics, this program continued with courses on
Language Documentation, Advanced Phonology, Sonority, Syntax, Discourse,
Linked Data in Linguistics, Semantics and Pragmatics, Cross-cultural
Training, and related fields.
My chosen courses have provided a solid grounding in many domains of
linguistics, with focus on available and emerging technologies that can
aid such work.
Courses: Phonetics, Phonology, Grammar, Anthropology, Language &
Culture Acquisition, Sociolinguistics, and Field Methods
Teacher’s Assistant for Field Data Management: Took over as
instructor for 7 of 8 weeks of the grad-level technology course during
professor’s leave. Taught FieldWorks Language Explorer, Phonology
Assistant, etc.
2005-06: Concentration in Information and Knowledge Management:
emphasis on Intelligent Systems, Software Engineering, Multimedia,
and Programming.
2004-05: Study of specific scientific sectors: Information and
Knowledge Management, Health Systems, and Telecommunications.
2002-04: Grounding Coursework in Applied Physics, Trigonometry,
Calculus, Biology, Chemistry, Databases and Programming,
Instrumentation and Measurement, as well as each field’s social
dimensions.
Member of Phi Sigma Tau (ΦΣΤ) Philosophy Honor Society
Concentration in Philosophy: Coverage of Ancient Greek and
Modern Philosophy, Epistemology, Moral Theory, Self and Community,
Logic, New Testament, Buddhist Thought, Political Philosophy, and
Philosophy of Religion
Develop French and English materials and train users in effective use
of linguistic and translation software. (Paratext, FieldWorks, Adapt It,
WeSay, Phonology Assistant, OurWord, etc.)
Customize and troubleshoot linguistic applications for various
languages and situations.
Convert and clean large documents and data for publishing or import.
(Regular Expressions, EditPad Pro, PowerGrep, MS Office,
Libre/OpenOffice, Acrobat)
Develop and maintain customized keyboards for multiple operating
systems and languages.
Advocate new and traditional technologies and workflows through
networking and research.
Paratext is the world’s leading software application for the development
and checking of new Bible translations and revisions to existing texts.
Developed jointly by UBS and SIL International, Paratext is free for
registered users from all walks of life, from church-based translation
efforts in minority languages to Bible publishers in major languages. With
more than 1,500 resource texts and tools available to vetted Bible
translators, it enables consistent and accurate translation based on the
original texts and modeled on versions in major languages. Endowed with
cutting edge collaboration features, Paratext helps Bible translation
teams work together to produce higher-quality translations in much less
time than previous tools and methods have allowed.
The Paratext Prioritization Committee is charged with defining each
year's focus for the Paratext Developers.
The major goal of this workshop was to take the recent and rapid
advances in language technology (such as speech recognition, machine
translation, automatic analysis of syntax, question answering, etc.), and
put them in the hands of those on the front lines of language
documentation and revitalization (such as language community members or
documentary linguists).
Reso Collect is a non-profit organization, registered in Switzerland and
Cameroon. It is committed to the fight against extreme poverty and the
social reintegration of women and girls from disadvantaged neighborhoods.
Its main activity is to offer Cameroon’s women and families equal work
opportunities so that they can improve their social and economic situation.