In [1]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import os

Building an Arabic Sentiment Analyzer With BERT

In this notebook, we will build a simple, fast, and accurate Arabic-language text classification model with minimal effort. More specifically, we will build a model that classifies Arabic hotel reviews as either positive or negative.

The dataset can be downloaded from Ashraf Elnagar's GitHub repository.

Each entry in the dataset includes a review in Arabic and a rating between 1 and 5. We will convert this to a binary classification dataset by labeling reviews rated above 3 as positive and reviews rated below 3 as negative.

(Disclaimer: I don't speak Arabic. Please forgive mistakes.)

In [3]:
# convert ratings to a binary format:  pos=positive, neg=negative
import pandas as pd
df = pd.read_csv('data/arabic_hotel_reviews/balanced-reviews.txt', delimiter='\t', encoding='utf-16')
df = df[['rating', 'review']]
df['rating'] = df['rating'].apply(lambda x: 'neg' if x < 3 else 'pos')
df.head()
rating review
0 neg “ممتاز”. النظافة والطاقم متعاون.
1 pos استثنائي. سهولة إنهاء المعاملة في الاستقبال. ل...
2 pos استثنائي. انصح بأختيار الاسويت و بالاخص غرفه ر...
3 neg “استغرب تقييم الفندق كخمس نجوم”. لا شي. يستحق ...
4 pos جيد. المكان جميل وهاديء. كل شي جيد ونظيف بس كا...
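
The thresholding rule described earlier can be sketched on a toy frame. This is a minimal illustration rather than part of the original notebook: the `to_label` helper and the sample ratings are hypothetical, and it assumes neutral reviews (rating 3) should be dropped instead of being folded into the positive class, which is what a bare `else 'pos'` would do if any were present.

```python
import pandas as pd

def to_label(rating: int) -> str:
    """Map a 1-5 star rating to 'neg' (below 3) or 'pos' (above 3)."""
    if rating == 3:
        raise ValueError("neutral ratings should be filtered out first")
    return "neg" if rating < 3 else "pos"

# Hypothetical toy data standing in for the real review file.
toy = pd.DataFrame({"rating": [1, 2, 3, 4, 5],
                    "review": ["a", "b", "c", "d", "e"]})

# Drop neutral reviews, then convert the remaining ratings to labels.
toy = toy[toy["rating"] != 3].copy()
toy["label"] = toy["rating"].apply(to_label)

# Tally how many reviews ended up in each class.
label_counts = toy["label"].value_counts().to_dict()
```

Checking the class counts after conversion is a cheap way to confirm that the dataset is as balanced as its file name suggests.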

Let's split the data into a training set and a validation set.

In [4]:
df_train = df.sample(frac=0.85, random_state=42)
df_test = df.drop(df_train.index)
len(df_train), len(df_test)
(89843, 15855)
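
The `sample`/`drop` pattern used in this cell yields two disjoint subsets that together cover the original frame. A small sketch on hypothetical toy data (the frame and its size are made up for illustration) makes that property explicit:

```python
import pandas as pd

# Hypothetical toy frame; the real notebook uses the hotel-review DataFrame.
toy = pd.DataFrame({"rating": ["pos", "neg"] * 10,
                    "review": [f"r{i}" for i in range(20)]})

# Same pattern as the notebook: sample 85% for training,
# keep the remaining rows for validation.
toy_train = toy.sample(frac=0.85, random_state=42)
toy_test = toy.drop(toy_train.index)

# The two index sets never overlap and together cover every row.
overlap = set(toy_train.index) & set(toy_test.index)
covered = set(toy_train.index) | set(toy_test.index)
```

This relies on the index being unique, which holds for a frame read from a file with the default integer index.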

With the Transformer API in ktrain, we can select any Hugging Face transformers model appropriate for our data. Since we are dealing with Arabic, we will use AraBERT by the AUB MIND Lab instead of multilingual BERT (which ktrain's alternative text_classifier API normally uses for non-English datasets). As you can see below, with only 1 epoch, we obtain 96.37% accuracy on the validation set.

In [7]:
import ktrain
from ktrain import text
MODEL_NAME = 'aubmindlab/bert-base-arabertv01'
t = text.Transformer(MODEL_NAME, maxlen=128)
trn = t.preprocess_train(df_train.review.values, df_train.rating.values)
val = t.preprocess_test(df_test.review.values, df_test.rating.values)
model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=32)
learner.fit_onecycle(5e-5, 1)
preprocessing train...
language: ar
train sequence lengths:
	mean : 24
	95percentile : 67
	99percentile : 120
Is Multi-Label? False
preprocessing test...
language: ar
test sequence lengths:
	mean : 24
	95percentile : 67
	99percentile : 121

begin training using onecycle policy with max lr of 5e-05...
Train for 2808 steps, validate for 496 steps
2808/2808 [==============================] - 1104s 393ms/step - loss: 0.1447 - accuracy: 0.9466 - val_loss: 0.1054 - val_accuracy: 0.9637
<tensorflow.python.keras.callbacks.History at 0x7f06344b84a8>
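
The step counts in the training log follow from the dataset sizes and the batch size: one step per batch, with a final partial batch rounded up. A quick arithmetic check, assuming that rounding-up convention:

```python
import math

BATCH_SIZE = 32
n_train, n_val = 89843, 15855  # sizes reported after the split

# One step per batch; a trailing partial batch still counts as a step.
train_steps = math.ceil(n_train / BATCH_SIZE)
val_steps = math.ceil(n_val / BATCH_SIZE)

# The log reports 1104 s for the epoch, so the per-step time is:
ms_per_step = round(1104 / train_steps * 1000)

print(train_steps, val_steps, ms_per_step)  # 2808 496 393
```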

Making Predictions on New Data

In [8]:
p = ktrain.get_predictor(learner.model, t)

Predicting the label for the text:

"The room was clean, the food excellent, and I loved the view from my room."

In [9]:
p.predict("الغرفة كانت نظيفة ، الطعام ممتاز ، وأنا أحب المنظر من غرفتي.")

Predicting the label for the text:

"This hotel was too expensive and the staff is rude."

In [10]:
p.predict('كان هذا الفندق باهظ الثمن والموظفين غير مهذبين.')
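
Beyond one-off calls, predictions on a batch of labeled texts can be tallied to spot-check the model. The sketch below is an illustration under stated assumptions: `score_predictions` and `fake_predict` are hypothetical names, and the helper assumes the prediction function accepts a list of texts and returns one label per text. A stand-in predictor is used so the example runs without a trained model.

```python
from collections import Counter

def score_predictions(predict_fn, texts, true_labels):
    """Return accuracy and a Counter of (true, predicted) label pairs."""
    predicted = predict_fn(list(texts))
    outcomes = Counter(zip(true_labels, predicted))
    correct = sum(n for (true, pred), n in outcomes.items() if true == pred)
    return correct / len(true_labels), outcomes

# Stand-in predictor for illustration only; in the notebook one would
# pass the trained predictor's predict method instead (assumption).
def fake_predict(texts):
    return ["pos" if "good" in text else "neg" for text in texts]

accuracy, outcomes = score_predictions(
    fake_predict,
    ["good room", "bad staff", "good food"],
    ["pos", "neg", "neg"],
)
```

The `outcomes` counter doubles as a small confusion table: the key `("neg", "pos")` counts negative reviews that were predicted as positive.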

Save our Predictor for Later Deployment

In [11]:
# save model for later use
p.save('/tmp/arabic_predictor')
In [12]:
# reload from disk
p = ktrain.load_predictor('/tmp/arabic_predictor')
In [13]:
# still works as expected after reloading from disk
p.predict("الغرفة كانت نظيفة ، الطعام ممتاز ، وأنا أحب المنظر من غرفتي.")