In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os

Language Translation in ktrain

As of v0.17.x, ktrain supports language translation. Two classes currently provide it: 1) EnglishTranslator, for translation to English, and 2) Translator, for translation between many different languages. Both are wrappers around MarianMT models in the transformers library.

The EnglishTranslator Class for Translation to English

The EnglishTranslator class can be used to easily convert text in various languages to English. It currently supports the following languages:

  • zh : Chinese (both Simplified and Traditional)
  • ar : Arabic
  • ru : Russian
  • de : German
  • af : Afrikaans
  • fr : French
  • es : Spanish
  • it : Italian
  • pt : Portuguese
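
Since the src_lang value must be one of the codes above, a small lookup can catch a mistyped code before any model is loaded. The sketch below is only an illustration: the SUPPORTED_SRC_LANGS table is a hand-copied snapshot of the list above and check_src_lang is a hypothetical helper, neither of which is part of ktrain.

```python
# Hand-copied snapshot of the list above (an assumption; ktrain may support more codes later).
SUPPORTED_SRC_LANGS = {
    "zh": "Chinese (both Simplified and Traditional)",
    "ar": "Arabic",
    "ru": "Russian",
    "de": "German",
    "af": "Afrikaans",
    "fr": "French",
    "es": "Spanish",
    "it": "Italian",
    "pt": "Portuguese",
}


def check_src_lang(code: str) -> str:
    """Return the normalized language code, or raise ValueError if it is not listed.

    Hypothetical helper for illustration only; not a ktrain function.
    """
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_SRC_LANGS:
        known = ", ".join(sorted(SUPPORTED_SRC_LANGS))
        raise ValueError(f"Unsupported source language {code!r}; expected one of: {known}")
    return normalized
```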

To demonstrate such translations, let us first begin by importing ktrain.

In [2]:
import ktrain
from ktrain.text.translation import EnglishTranslator, Translator

Next, we will translate the following two-sentence text to English from a number of different source languages.

The pandemic has wreaked havoc on world economies. However, as of June 2020, the U.S. stock market continues to rise.

To generate translations to English from a particular language, we simply supply that language's code as the src_lang argument to EnglishTranslator.

Chinese to English

In [3]:
translator = EnglishTranslator(src_lang='zh')
src_text = '''大流行对世界经济造成了严重破坏。但是,截至2020年6月,美国股票市场持续上涨。'''
print(translator.translate(src_text))
The pandemic has caused serious damage to the world economy.
However, the United States stock market continued to rise as of June 2020.

Some comments about translations:

Notice in the example above that we supplied a document of two sentences as input. The translate method can accept single sentences, paragraphs, or entire documents. However, if the document is large (e.g., a book), we recommend that you break it up into smaller chunks (e.g., pages or paragraphs). This is because ktrain tokenizes your document into individual sentences, which are joined together and fed to the model as a single batch when generating the translation. If the batch is too large for memory, errors will occur.
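
One way to follow the chunking advice is to split a long document on blank lines and translate one paragraph at a time. The sketch below is an illustration under stated assumptions, not part of ktrain: translate_in_chunks is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and translate_fn stands in for any callable that maps a string to its translation (for example, translator.translate).

```python
import re
from typing import Callable, List


def translate_in_chunks(text: str, translate_fn: Callable[[str], str]) -> str:
    """Translate `text` paragraph by paragraph (hypothetical helper, not a ktrain API)."""
    # Split on one or more blank lines and discard empty pieces.
    paragraphs: List[str] = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Translate each paragraph on its own so that no single call receives the whole document.
    translated = [translate_fn(p) for p in paragraphs]
    # Rejoin the translated paragraphs with a blank line between them.
    return "\n\n".join(translated)
```

With a translator instantiated as in the cells here, the call would look like translate_in_chunks(long_text, translator.translate). Smaller chunks lower peak memory use at the cost of more calls.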

When instantiating the EnglishTranslator, pretrained models are automatically loaded, which may take a few seconds. Once instantiated, the translate method can be repeatedly invoked on different documents or sentences. Next, let us reinstantiate an EnglishTranslator object to translate Arabic.
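
Because loading the pretrained models is the slow step, a notebook that switches between source languages may prefer to keep each translator around rather than rebuild it. The following is a minimal sketch, assuming there is enough memory to hold several models at once; get_translator and its factory argument are hypothetical names and not part of ktrain.

```python
from typing import Any, Callable, Dict

# Cache of already-built translators, keyed by source language code.
_translators: Dict[str, Any] = {}


def get_translator(src_lang: str, factory: Callable[[str], Any]) -> Any:
    """Return a cached translator for `src_lang`, building it with `factory` on first use.

    Hypothetical helper for illustration; `factory` is any callable that takes a
    language code and returns a translator object.
    """
    if src_lang not in _translators:
        # The factory runs only once per language, so the model loads only once.
        _translators[src_lang] = factory(src_lang)
    return _translators[src_lang]
```

With the imports from In [2], the factory could be lambda code: EnglishTranslator(src_lang=code), so that get_translator('ar', ...) loads the Arabic model on the first call and reuses it afterwards.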

Arabic to English

In [4]:
translator = EnglishTranslator(src_lang='ar')
src_text = '''لقد أحدث الوباء دمارا في اقتصادات العالم.
ومع ذلك ، اعتبارًا من يونيو 2020 ، استمرت سوق الأسهم الأمريكية في الارتفاع.'''
print(translator.translate(src_text))
The epidemic has devastated the world's economies.
However, as of June 2020, the American stock market continued to rise.

Russian to English

In [5]:
translator = EnglishTranslator(src_lang='ru')
src_text = '''Пандемия нанесла ущерб мировой экономике.
Однако по состоянию на июнь 2020 года фондовый рынок США продолжает расти.'''
print(translator.translate(src_text))
The pandemic has damaged the world economy.
However, as of June 2020, the US stock market continues to grow.

German to English

In [6]:
translator = EnglishTranslator(src_lang='de')
src_text = '''Die Pandemie hat die Weltwirtschaft verwüstet. 
Ab Juni 2020 steigt der US-Aktienmarkt jedoch weiter an.'''
print(translator.translate(src_text))
The pandemic has devastated the global economy.
However, as of June 2020, the US stock market will continue to rise.

French to English

In [7]:
translator = EnglishTranslator(src_lang='fr')
src_text = '''La pandémie a fait des ravages dans les économies mondiales. 
Cependant, en juin 2020, le marché boursier américain continue de grimper.'''
print(translator.translate(src_text))
The pandemic has wreaked havoc in world economies.
However, in June 2020, the U.S. stock market continues to climb.

Spanish to English

In [8]:
translator = EnglishTranslator(src_lang='es')
src_text = '''La pandemia ha causado estragos en las economías mundiales. 
Sin embargo, a partir de junio de 2020, el mercado de valores de EE. UU. Continúa aumentando.'''
print(translator.translate(src_text))
The pandemic has wreaked havoc on world economies.
However, from June 2020 onwards, the US securities market.
Continues to grow.

The Translator Class for Translating to and from Many Languages

For translations to and from other languages, text.Translator instances can be used. A Translator instance takes the name of a pretrained Helsinki-NLP model as its model_name argument. For instance, to translate Chinese to German, one can use the Helsinki-NLP/opus-mt-zh-de model:

In [9]:
translator = Translator(model_name='Helsinki-NLP/opus-mt-zh-de')
src_text = '''冠状病毒大流行对世界经济造成了严重破坏。但是,截至2020年6月,美国股市继续上涨。'''
print(translator.translate(src_text))
Die Pandemie hat eine ernste Zerstörung der Weltwirtschaft verursacht.
Aber bis Juni 2020 stieg der US-Markt weiter an.
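
The model name used above appears to follow the pattern Helsinki-NLP/opus-mt-<source>-<target>. The sketch below builds such a name from two language codes. It is an assumption drawn from that one example rather than a documented guarantee: opus_mt_model_name is a hypothetical helper, and a model is not necessarily published for every language pair, so the resulting name should be checked against the Helsinki-NLP model listing before use.

```python
def opus_mt_model_name(src_lang: str, tgt_lang: str) -> str:
    """Build a Helsinki-NLP model name from source and target language codes.

    Hypothetical helper; it only formats a string and does not check that the
    model actually exists.
    """
    return f"Helsinki-NLP/opus-mt-{src_lang}-{tgt_lang}"


# Reproduces the model name used in the Chinese-to-German example.
print(opus_mt_model_name("zh", "de"))
```

The returned string can then be passed as the model_name argument to Translator.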