%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os
# Order GPUs by PCI bus ID so device indices are stable, then expose only GPU 0.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
As of v0.17.x, ktrain supports language translation. Two classes currently support it: 1) EnglishTranslator for translation to English and 2) Translator for translation between many different language pairs. Both are wrappers around MarianMT models in the transformers library.
EnglishTranslator Class for Translation to English

The EnglishTranslator class can be used to easily convert text in various languages to English. It currently supports the following languages:
zh: Chinese (both Simplified and Traditional)
ar: Arabic
ru: Russian
de: German
af: Afrikaans
fr: French
es: Spanish
it: Italian
pt: Portuguese

To demonstrate such translations, let us begin by importing ktrain.
import ktrain
from ktrain.text.translation import EnglishTranslator, Translator
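Because src_lang must be one of the codes listed above, it can help to check a code before any models are loaded. The following is a minimal sketch: SUPPORTED_SRC_LANGS and check_src_lang are hypothetical names introduced here for illustration, not part of ktrain, and the table simply mirrors the list above.

```python
# Hypothetical helper (not part of ktrain): mirrors the language list above.
SUPPORTED_SRC_LANGS = {
    "zh": "Chinese (both Simplified and Traditional)",
    "ar": "Arabic",
    "ru": "Russian",
    "de": "German",
    "af": "Afrikaans",
    "fr": "French",
    "es": "Spanish",
    "it": "Italian",
    "pt": "Portuguese",
}


def check_src_lang(code):
    """Return the language name for a supported code, or raise ValueError."""
    try:
        return SUPPORTED_SRC_LANGS[code]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_SRC_LANGS))
        raise ValueError(
            f"Unsupported src_lang {code!r}; expected one of: {supported}"
        ) from None


print(check_src_lang("ar"))  # Arabic
```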
Next, we will translate the following sentence to English from a number of different source languages.
The pandemic has wreaked havoc on world economies. However, as of June 2020, the U.S. stock market continues to rise.
To generate translations to English from a particular language, we simply supply that language's code as the src_lang argument to EnglishTranslator.
translator = EnglishTranslator(src_lang='zh')
src_text = '''大流行对世界经济造成了严重破坏。但是,截至2020年6月,美国股票市场持续上涨。'''
print(translator.translate(src_text))
The pandemic has caused serious damage to the world economy. However, the United States stock market continued to rise as of June 2020.
Notice in the example above that we supplied a document of two sentences as input. The translate method can accept single sentences, paragraphs, or entire documents. However, if the document is large (e.g., a book), we recommend that you break it up into smaller chunks (e.g., pages or paragraphs). This is because ktrain tokenizes your document into individual sentences, which are joined together and fed to the model as a single batch when generating the translation. If the batch is too large for memory, errors will occur.
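One way to follow the chunking advice is to group paragraphs into size-limited chunks and translate each chunk separately. The sketch below assumes paragraphs are separated by blank lines; split_into_chunks, translate_in_chunks, and the max_chars parameter are hypothetical names introduced here, not part of ktrain. The translation function is passed in as an argument, so the helper works with any callable that maps a string to a string.

```python
import re


def split_into_chunks(text, max_chars=2000):
    """Group blank-line-separated paragraphs into chunks of at most max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def translate_in_chunks(text, translate_fn, max_chars=2000):
    """Translate each chunk on its own and rejoin the results with blank lines."""
    chunks = split_into_chunks(text, max_chars)
    return "\n\n".join(translate_fn(chunk) for chunk in chunks)
```

With a loaded translator, this could be called as translate_in_chunks(book_text, translator.translate). The 2000-character default is an arbitrary starting point rather than a ktrain recommendation; a suitable value depends on available memory.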
When an EnglishTranslator is instantiated, pretrained models are automatically loaded, which may take a few seconds. Once instantiated, the translate method can be invoked repeatedly on different documents or sentences. Next, let us instantiate a new EnglishTranslator object to translate Arabic.
translator = EnglishTranslator(src_lang='ar')
src_text = '''لقد أحدث الوباء دمارا في اقتصادات العالم.
ومع ذلك ، اعتبارًا من يونيو 2020 ، استمرت سوق الأسهم الأمريكية في الارتفاع.
'''
print(translator.translate(src_text))
The epidemic has devastated the world's economies. However, as of June 2020, the American stock market continued to rise.
translator = EnglishTranslator(src_lang='ru')
src_text = '''Пандемия нанесла ущерб мировой экономике.
Однако по состоянию на июнь 2020 года фондовый рынок США продолжает расти.
'''
print(translator.translate(src_text))
The pandemic has damaged the world economy. However, as of June 2020, the US stock market continues to grow.
translator = EnglishTranslator(src_lang='de')
src_text = '''Die Pandemie hat die Weltwirtschaft verwüstet.
Ab Juni 2020 steigt der US-Aktienmarkt jedoch weiter an.'''
print(translator.translate(src_text))
The pandemic has devastated the global economy. However, as of June 2020, the US stock market will continue to rise.
translator = EnglishTranslator(src_lang='fr')
src_text = '''La pandémie a fait des ravages dans les économies mondiales.
Cependant, en juin 2020, le marché boursier américain continue de grimper.'''
print(translator.translate(src_text))
The pandemic has wreaked havoc in world economies. However, in June 2020, the U.S. stock market continues to climb.
translator = EnglishTranslator(src_lang='es')
src_text = '''La pandemia ha causado estragos en las economías mundiales.
Sin embargo, a partir de junio de 2020, el mercado de valores de EE. UU. Continúa aumentando.'''
print(translator.translate(src_text))
The pandemic has wreaked havoc on world economies. However, from June 2020 onwards, the US securities market. USA. Continues to grow.
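The per-language cells above all follow the same pattern: instantiate a translator for one source language, then call translate. A minimal sketch of that pattern as a loop is shown here. translate_all and its make_translator parameter are hypothetical names introduced for illustration; the factory is passed in as an argument so that the sketch itself does not depend on any models being downloaded.

```python
def translate_all(texts_by_lang, make_translator):
    """Translate each text with a translator built for its source language.

    texts_by_lang maps a language code (e.g. 'de') to a source text.
    make_translator is any callable that accepts src_lang and returns an
    object with a translate(text) method, such as ktrain's EnglishTranslator.
    """
    results = {}
    for lang, text in texts_by_lang.items():
        # One translator per language; models are loaded at instantiation.
        translator = make_translator(src_lang=lang)
        results[lang] = translator.translate(text)
    return results
```

For example, translate_all({'de': german_text, 'fr': french_text}, EnglishTranslator) would return a dictionary of English translations keyed by source language, where german_text and french_text stand for strings like those used above.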
Translator Class for Translating to and from Many Languages

For translations to and from other languages, text.Translator instances can be used. A Translator instance accepts as input the name of a pretrained model from Helsinki-NLP. For instance, to translate Chinese to German, one can use the Helsinki-NLP/opus-mt-zh-de model:
translator = Translator(model_name='Helsinki-NLP/opus-mt-zh-de')
src_text = '''冠状病毒大流行对世界经济造成了严重破坏。但是,截至2020年6月,美国股市继续上涨。'''
print(translator.translate(src_text))
Die Pandemie hat eine ernste Zerstörung der Weltwirtschaft verursacht. Aber bis Juni 2020 stieg der US-Markt weiter an.
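The model name used above appears to follow a source-target pattern, which can be sketched as a small helper. opus_mt_model_name is a hypothetical function introduced here, not part of ktrain, and the pattern is inferred from the single example above. Not every language pair has a published model, and some models may use different codes, so the resulting name should be checked against the Helsinki-NLP model listing before use.

```python
def opus_mt_model_name(src_lang, tgt_lang):
    """Build a candidate Helsinki-NLP model name for a language pair.

    Assumption: names follow 'Helsinki-NLP/opus-mt-<src>-<tgt>', as in the
    zh-de example. This does not verify that such a model exists.
    """
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"


print(opus_mt_model_name("zh", "de"))  # Helsinki-NLP/opus-mt-zh-de
```

The returned string would then be passed as the model_name argument to Translator, as in the example above.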