Seamless M4T
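SeamlessM4T (Massively Multilingual and Multimodal Machine Translation model), which Meta announced three months ago, has now been updated to v2, and the latest source code is available for download on GitHub. SeamlessM4T v2 is an updated version that uses the UnitY2 architecture. Compared with SeamlessM4T v1, the new model improves translation quality and reduces inference latency in speech generation tasks.
M4T is an all-in-one, massively multilingual and multimodal machine translation model that provides high-quality translation of speech and text for nearly 100 languages.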
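SeamlessM4T models support the following tasks:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)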
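
To make the abbreviations in the task list above concrete, here is a minimal Python sketch that maps each task to its input and output modality. The names `Task`, `TASKS`, and `pick_task` are hypothetical helpers written for this post, not part of the SeamlessM4T code base; the sketch only encodes what the list states, plus the assumption that ASR keeps the output in the source language.

```python
from typing import NamedTuple


class Task(NamedTuple):
    source: str       # input modality: "speech" or "text"
    target: str       # output modality: "speech" or "text"
    translates: bool  # False means the output stays in the source language


# Hypothetical lookup table built from the task list (not a SeamlessM4T API).
TASKS = {
    "S2ST": Task("speech", "speech", True),
    "S2TT": Task("speech", "text", True),
    "T2ST": Task("text", "speech", True),
    "T2TT": Task("text", "text", True),
    "ASR": Task("speech", "text", False),
}


def pick_task(source: str, target: str, translates: bool = True) -> str:
    """Return the task abbreviation that matches the requested modalities."""
    wanted = Task(source, target, translates)
    for name, task in TASKS.items():
        if task == wanted:
            return name
    raise ValueError(f"no task for {source} -> {target} (translates={translates})")


print(pick_task("speech", "text"))         # S2TT
print(pick_task("speech", "text", False))  # ASR
```

Note that S2TT and ASR share the same modalities (speech in, text out); in this sketch the only difference is whether the text comes out in another language.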
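The table below lists the languages supported by SeamlessM4T-large (v1/v2). The Source column indicates whether a language is supported as source speech (Sp) and/or source text (Tx). The Target column indicates whether a language is supported as target speech (Sp) and/or target text (Tx). Unfortunately, Cantonese (yue) is listed only as a text target, so there is no Cantonese speech output (TTS) for now!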
Code | Language | Script | Source | Target |
---|---|---|---|---|
afr | Afrikaans | Latn | Sp, Tx | Tx |
amh | Amharic | Ethi | Sp, Tx | Tx |
arb | Modern Standard Arabic | Arab | Sp, Tx | Sp, Tx |
ary | Moroccan Arabic | Arab | Sp, Tx | Tx |
arz | Egyptian Arabic | Arab | Sp, Tx | Tx |
asm | Assamese | Beng | Sp, Tx | Tx |
ast | Asturian | Latn | Sp | — |
azj | North Azerbaijani | Latn | Sp, Tx | Tx |
bel | Belarusian | Cyrl | Sp, Tx | Tx |
ben | Bengali | Beng | Sp, Tx | Sp, Tx |
bos | Bosnian | Latn | Sp, Tx | Tx |
bul | Bulgarian | Cyrl | Sp, Tx | Tx |
cat | Catalan | Latn | Sp, Tx | Sp, Tx |
ceb | Cebuano | Latn | Sp, Tx | Tx |
ces | Czech | Latn | Sp, Tx | Sp, Tx |
ckb | Central Kurdish | Arab | Sp, Tx | Tx |
cmn | Mandarin Chinese | Hans | Sp, Tx | Sp, Tx |
cmn_Hant | Mandarin Chinese | Hant | Sp, Tx | Sp, Tx |
cym | Welsh | Latn | Sp, Tx | Sp, Tx |
dan | Danish | Latn | Sp, Tx | Sp, Tx |
deu | German | Latn | Sp, Tx | Sp, Tx |
ell | Greek | Grek | Sp, Tx | Tx |
eng | English | Latn | Sp, Tx | Sp, Tx |
est | Estonian | Latn | Sp, Tx | Sp, Tx |
eus | Basque | Latn | Sp, Tx | Tx |
fin | Finnish | Latn | Sp, Tx | Sp, Tx |
fra | French | Latn | Sp, Tx | Sp, Tx |
fuv | Nigerian Fulfulde | Latn | Sp, Tx | Tx |
gaz | West Central Oromo | Latn | Sp, Tx | Tx |
gle | Irish | Latn | Sp, Tx | Tx |
glg | Galician | Latn | Sp, Tx | Tx |
guj | Gujarati | Gujr | Sp, Tx | Tx |
heb | Hebrew | Hebr | Sp, Tx | Tx |
hin | Hindi | Deva | Sp, Tx | Sp, Tx |
hrv | Croatian | Latn | Sp, Tx | Tx |
hun | Hungarian | Latn | Sp, Tx | Tx |
hye | Armenian | Armn | Sp, Tx | Tx |
ibo | Igbo | Latn | Sp, Tx | Tx |
ind | Indonesian | Latn | Sp, Tx | Sp, Tx |
isl | Icelandic | Latn | Sp, Tx | Tx |
ita | Italian | Latn | Sp, Tx | Sp, Tx |
jav | Javanese | Latn | Sp, Tx | Tx |
jpn | Japanese | Jpan | Sp, Tx | Sp, Tx |
kam | Kamba | Latn | Sp | — |
kan | Kannada | Knda | Sp, Tx | Tx |
kat | Georgian | Geor | Sp, Tx | Tx |
kaz | Kazakh | Cyrl | Sp, Tx | Tx |
kea | Kabuverdianu | Latn | Sp | — |
khk | Halh Mongolian | Cyrl | Sp, Tx | Tx |
khm | Khmer | Khmr | Sp, Tx | Tx |
kir | Kyrgyz | Cyrl | Sp, Tx | Tx |
kor | Korean | Kore | Sp, Tx | Sp, Tx |
lao | Lao | Laoo | Sp, Tx | Tx |
lit | Lithuanian | Latn | Sp, Tx | Tx |
ltz | Luxembourgish | Latn | Sp | — |
lug | Ganda | Latn | Sp, Tx | Tx |
luo | Luo | Latn | Sp, Tx | Tx |
lvs | Standard Latvian | Latn | Sp, Tx | Tx |
mai | Maithili | Deva | Sp, Tx | Tx |
mal | Malayalam | Mlym | Sp, Tx | Tx |
mar | Marathi | Deva | Sp, Tx | Tx |
mkd | Macedonian | Cyrl | Sp, Tx | Tx |
mlt | Maltese | Latn | Sp, Tx | Sp, Tx |
mni | Meitei | Beng | Sp, Tx | Tx |
mya | Burmese | Mymr | Sp, Tx | Tx |
nld | Dutch | Latn | Sp, Tx | Sp, Tx |
nno | Norwegian Nynorsk | Latn | Sp, Tx | Tx |
nob | Norwegian Bokmål | Latn | Sp, Tx | Tx |
npi | Nepali | Deva | Sp, Tx | Tx |
nya | Nyanja | Latn | Sp, Tx | Tx |
oci | Occitan | Latn | Sp | — |
ory | Odia | Orya | Sp, Tx | Tx |
pan | Punjabi | Guru | Sp, Tx | Tx |
pbt | Southern Pashto | Arab | Sp, Tx | Tx |
pes | Western Persian | Arab | Sp, Tx | Sp, Tx |
pol | Polish | Latn | Sp, Tx | Sp, Tx |
por | Portuguese | Latn | Sp, Tx | Sp, Tx |
ron | Romanian | Latn | Sp, Tx | Sp, Tx |
rus | Russian | Cyrl | Sp, Tx | Sp, Tx |
slk | Slovak | Latn | Sp, Tx | Sp, Tx |
slv | Slovenian | Latn | Sp, Tx | Tx |
sna | Shona | Latn | Sp, Tx | Tx |
snd | Sindhi | Arab | Sp, Tx | Tx |
som | Somali | Latn | Sp, Tx | Tx |
spa | Spanish | Latn | Sp, Tx | Sp, Tx |
srp | Serbian | Cyrl | Sp, Tx | Tx |
swe | Swedish | Latn | Sp, Tx | Sp, Tx |
swh | Swahili | Latn | Sp, Tx | Sp, Tx |
tam | Tamil | Taml | Sp, Tx | Tx |
tel | Telugu | Telu | Sp, Tx | Sp, Tx |
tgk | Tajik | Cyrl | Sp, Tx | Tx |
tgl | Tagalog | Latn | Sp, Tx | Sp, Tx |
tha | Thai | Thai | Sp, Tx | Sp, Tx |
tur | Turkish | Latn | Sp, Tx | Sp, Tx |
ukr | Ukrainian | Cyrl | Sp, Tx | Sp, Tx |
urd | Urdu | Arab | Sp, Tx | Sp, Tx |
uzn | Northern Uzbek | Latn | Sp, Tx | Sp, Tx |
vie | Vietnamese | Latn | Sp, Tx | Sp, Tx |
xho | Xhosa | Latn | Sp | — |
yor | Yoruba | Latn | Sp, Tx | Tx |
yue | Cantonese | Hant | Sp, Tx | Tx |
zlm | Colloquial Malay | Latn | Sp | — |
zsm | Standard Malay | Latn | Tx | Tx |
zul | Zulu | Latn | Sp, Tx | Tx |
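
As a rough illustration of how to read the Sp/Tx columns, the Python sketch below parses a few rows copied from the table above and lists which translation tasks a language pair could support. The helper names (`parse_rows`, `supported_tasks`) are hypothetical, and the sketch assumes that any supported source modality can be paired with any supported target modality, which the table itself does not state.

```python
# A few rows copied verbatim from the table above (not the full list).
ROWS = """
eng | English | Latn | Sp, Tx | Sp, Tx |
kam | Kamba | Latn | Sp | — |
yue | Cantonese | Hant | Sp, Tx | Tx |
zsm | Standard Malay | Latn | Tx | Tx |
"""

MODALITIES = {"Sp", "Tx"}

# Task abbreviation -> (required source modality, required target modality).
TRANSLATION_TASKS = {
    "S2ST": ("Sp", "Sp"),
    "S2TT": ("Sp", "Tx"),
    "T2ST": ("Tx", "Sp"),
    "T2TT": ("Tx", "Tx"),
}


def parse_cell(cell):
    """Turn a cell such as 'Sp, Tx' into a set; unknown markers are dropped."""
    return {part.strip() for part in cell.split(",")} & MODALITIES


def parse_rows(text):
    """Map each language code to its supported source and target modalities."""
    table = {}
    for line in text.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        code, _language, _script, source, target = cells
        table[code] = {"source": parse_cell(source), "target": parse_cell(target)}
    return table


def supported_tasks(table, src, tgt):
    """List translation tasks whose modalities both languages cover."""
    return [
        name
        for name, (src_mod, tgt_mod) in TRANSLATION_TASKS.items()
        if src_mod in table[src]["source"] and tgt_mod in table[tgt]["target"]
    ]


table = parse_rows(ROWS)
print(supported_tasks(table, "eng", "yue"))  # ['S2TT', 'T2TT']
print(supported_tasks(table, "yue", "eng"))  # ['S2ST', 'S2TT', 'T2ST', 'T2TT']
```

Under these assumptions, English to Cantonese yields only text output, while Cantonese to English allows all four translation tasks, which matches the remark about missing Cantonese speech output.
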
Note that SeamlessM4T-medium supports 200 languages in the text modality and is based on NLLB-200 (see the asset card for the full list).
Hugging Face Demo (A100 GPU)