About Taiwanese (Tâi-gí 台語)

Taiwanese, also called Taiwanese Hokkien, Tâi-gí (台語), or Taigi, is a variety of Southern Min (閩南語). Roughly 19 million people in Taiwan speak it to some degree; related Hokkien varieties add tens of millions of speakers abroad. It is not a dialect of Mandarin; a Mandarin speaker listening to spoken Taiwanese understands almost none of it. It has its own sound system, with seven tones in the traditional count (two of them only in checked syllables). The everyday vocabulary is its own too, with loanwords from Japanese and from Taiwan's Austronesian languages.

How Taiwanese Is Written

Taiwanese lives mostly in speech, but it has three written forms, and this tool shows all of them. Taiwanese 漢字 writes the language in Chinese characters. Pe̍h-ōe-jī (白話字), the older romanization, was created in the 1800s by Western missionaries and spread in Taiwan by the Presbyterian Church; for many Taiwanese converts it was the first writing system they ever learned to read, and Taiwan's first printed newspaper (1885) was published in it. That century of religious literature is why many Taiwanese Christians still prefer Pe̍h-ōe-jī today. Tâi-lô (台羅), standardized by Taiwan's Ministry of Education in 2006, was derived from Pe̍h-ōe-jī and now dominates schools and official materials; the two differ in only a handful of spellings (ch→ts, oa→ua, ⁿ→nn, among others).
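
The spelling differences named above can be sketched as a small substitution table. This is a minimal Python sketch and not the tool's own converter. The name poj_to_tailo is hypothetical, and it covers only the three rules listed (ch→ts also turns chh into tsh). It assumes tone-number input such as chhiaⁿ2, because the two systems also place tone diacritics on different vowels (tōa versus tuā), which a plain substitution cannot handle.

```python
# Hypothetical helper: covers only the three substitutions named in the text.
# Assumes syllables written with tone numbers (e.g. "toa7"), not diacritics.
_RULES = [
    ("Ch", "Ts"),
    ("ch", "ts"),      # also turns "chh" into "tsh"
    ("Oa", "Ua"),
    ("oa", "ua"),
    ("\u207f", "nn"),  # superscript n marks a nasalized vowel in Pe̍h-ōe-jī
]


def poj_to_tailo(text: str) -> str:
    """Apply each Pe̍h-ōe-jī to Tâi-lô substitution in order."""
    for old, new in _RULES:
        text = text.replace(old, new)
    return text


print(poj_to_tailo("chhia\u207f2"))  # tshiann2
print(poj_to_tailo("toa7"))          # tua7
```

A full converter would need the remaining spelling rules and a step that moves the tone mark, so treat this only as an illustration of how close the two systems are.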

Suppression and Revival

For most of the island's modern history, Taiwanese was the language of the street. Japanese rule (1895 to 1945) pushed it out of schools and public life. The Kuomintang went further: from 1946, and for the whole of martial law (1949 to 1987), Mandarin was the language of classrooms, government, and broadcasting; children were punished and mocked for speaking Taiwanese at school, and Bibles printed in romanized Taiwanese were confiscated. So a generation of parents quietly stopped passing the language down, to spare their children the trouble. That choice, repeated across millions of homes, may have done more damage than any single ban.

The language came back with democracy. Taiwanese television, pop music, and political campaigning returned after martial law ended in 1987; since 2001 every elementary student has taken a weekly class in a local language; the 2019 National Languages Development Act (國家語言發展法) made Taiwanese a national language. But an hour or two a week, often taught in Mandarin, has not been enough to raise speakers. Researchers applying UNESCO's vitality criteria place its transmission between “vulnerable” and “definitely endangered”, closer to the latter, and many younger Taiwanese understand the language without being able to speak it. Keeping it heard is the point of this project.

About This Tool

Taiwanese English Translator is a free voice translator between English, Taiwanese, and Mandarin. Type or speak, and hear the translation spoken back, with Taiwanese shown in 漢字, Tâi-lô, and Pe̍h-ōe-jī. Everything works in the browser, with nothing to install and no account. It exists because Google Translate, Microsoft Translator, and DeepL all leave Taiwanese out entirely, and because a language that lives in speech needs a tool that listens and talks.

Getting Good Results

  • Speak or type complete sentences. The models were trained on natural speech and translate full phrases far better than isolated words.
  • Record somewhere quiet, with the phone close to whoever is speaking.
  • The first translation after a quiet period can take a few minutes while the service wakes up. After that each one takes seconds, so keep the page open between requests.
  • For a two-way conversation, use the Conversation tab: lay the phone flat between you with the top half facing the Taiwanese speaker.
  • Tap Slow to replay unfamiliar Taiwanese at 70% speed with the tones intact.

How It Works

Every translation flows through text, so you always see what was heard and what is being said:

  • Translation: Taigi-Llama, an open language model fine-tuned on hundreds of thousands of parallel sentences, translates between English, Mandarin, and Taiwanese, in both characters and romanization.
  • Speech recognition: MediaTek's open Breeze ASR model, trained on some ten thousand hours of synthesized Taiwanese, transcribes the Taiwanese and Mandarin sides. OpenAI's Whisper transcribes the English side.
  • Taiwanese voice and romanization: Taiwanese audio and Tâi-lô romanization come from the SuíSiann (媠聲) service by 意傳科技 (Ìthuân), with the open-source taibun library as a fallback.
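
The flow through text described above can be sketched roughly as follows. This is an illustrative Python outline and not the project's actual code. Every name in it (Turn, run_turn, and the four callables) is hypothetical, the real models and services are replaced by whatever functions the caller passes in, and audio playback is left out. It shows only the order of the steps and the fallback for romanization.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    heard: str       # text produced by speech recognition
    translated: str  # translation shown to the user
    romanized: str   # Tâi-lô for the Taiwanese text


def run_turn(
    audio: bytes,
    recognize: Callable[[bytes], str],        # stand-in for the ASR model
    translate: Callable[[str], str],          # stand-in for the translation model
    romanize_primary: Callable[[str], str],   # stand-in for the hosted service
    romanize_fallback: Callable[[str], str],  # stand-in for a local library
) -> Turn:
    """Speech -> text -> translation -> romanization, keeping every text stage."""
    heard = recognize(audio)
    translated = translate(heard)
    try:
        romanized = romanize_primary(translated)
    except Exception:
        # Assumption: any failure of the primary service triggers the fallback.
        romanized = romanize_fallback(translated)
    return Turn(heard=heard, translated=translated, romanized=romanized)
```

Keeping the intermediate text in the returned record is what lets the interface show both what was heard and what is being said.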

Credits

The Taiwanese voice you hear is synthesized by SuíSiann (媠聲), an open service built and hosted by 意傳科技 (Ìthuân), a team in Changhua creating open Taiwanese language technology. Their code is released under the MIT license, and this tool would not speak without them. Try their site directly: suisiann.ithuan.tw

This is an independent open project, not affiliated with any company. If you spot a translation mistake or want to help, reach out through the feedback form. For sourced data on the language's situation, see the page “Taiwanese as a spoken language”.

Translate & speak Taiwanese now →