Call Center - Sentiment Analysis Pipeline

This notebook demonstrates how to build a sentiment analysis pipeline for call center conversations. The pipeline's output is intended for use in an external dashboard.

This tutorial walks through using NVIDIA RIVA for automatic speech recognition (ASR) and text classification. It uses NetApp cloud storage for the data and a pre-trained RIVA model.

In [1]:
%load_ext autoreload
%autoreload 2

Library Installations

In [2]:
!pip install pydub
!pip install jiwer
!pip install rouge
!pip install gdown
!pip install tqdm
!pip install matplotlib
WARNING: Value for scheme.platlib does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /usr/local/lib/python3.8/dist-packages
sysconfig: /usr/lib/python3.8/site-packages
WARNING: Value for scheme.purelib does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /usr/local/lib/python3.8/dist-packages
sysconfig: /usr/lib/python3.8/site-packages
WARNING: Value for scheme.headers does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /usr/local/include/python3.8/UNKNOWN
sysconfig: /usr/include/python3.8/UNKNOWN
WARNING: Value for scheme.scripts does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /usr/local/bin
sysconfig: /usr/bin
WARNING: Value for scheme.data does not match. Please report this to <https://github.com/pypa/pip/issues/10151>
distutils: /usr/local
sysconfig: /usr
WARNING: Additional context:
user = False
home = None
root = None
prefix = None
Requirement already satisfied: pydub in /usr/local/lib/python3.8/dist-packages (0.25.1)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
WARNING: You are using pip version 21.2.1; however, version 21.2.4 is available.
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Requirement already satisfied: jiwer in /usr/local/lib/python3.8/dist-packages (2.2.0)
Requirement already satisfied: python-Levenshtein in /usr/local/lib/python3.8/dist-packages (from jiwer) (0.12.2)
Requirement already satisfied: numpy in /usr/lib/python3/dist-packages (from jiwer) (1.17.4)
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/dist-packages (from python-Levenshtein->jiwer) (57.4.0)
Requirement already satisfied: rouge in /usr/local/lib/python3.8/dist-packages (1.0.1)
Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from rouge) (1.16.0)
Requirement already satisfied: gdown in /usr/local/lib/python3.8/dist-packages (3.13.0)
Requirement already satisfied: requests[socks]>=2.12.0 in /usr/local/lib/python3.8/dist-packages (from gdown) (2.26.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (from gdown) (4.62.2)
Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from gdown) (1.16.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.8/dist-packages (from gdown) (3.0.12)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.8/dist-packages (from requests[socks]>=2.12.0->gdown) (1.26.6)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.8/dist-packages (from requests[socks]>=2.12.0->gdown) (2.0.3)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.8/dist-packages (from requests[socks]>=2.12.0->gdown) (3.2)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.8/dist-packages (from requests[socks]>=2.12.0->gdown) (2021.5.30)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.8/dist-packages (from requests[socks]>=2.12.0->gdown) (1.7.1)
Requirement already satisfied: tqdm in /usr/local/lib/python3.8/dist-packages (4.62.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.8/dist-packages (3.4.3)
Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.8/dist-packages (from matplotlib) (2.8.2)
Requirement already satisfied: numpy>=1.16 in /usr/lib/python3/dist-packages (from matplotlib) (1.17.4)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.8/dist-packages (from matplotlib) (0.10.0)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.8/dist-packages (from matplotlib) (8.3.2)
Requirement already satisfied: pyparsing>=2.2.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib) (2.4.7)
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.8/dist-packages (from matplotlib) (1.3.2)
Requirement already satisfied: six in /usr/local/lib/python3.8/dist-packages (from cycler>=0.10->matplotlib) (1.16.0)

Importing Libraries

In [3]:
## General ##
import io
import os 
import grpc
import librosa
import librosa.display
import IPython.display as ipd
from pathlib import Path
from tqdm.notebook import tqdm


## Data Science ##
import random
import numpy as np
import matplotlib.pyplot as plt


## RIVA ##
# Automatic Speech Recognition (ASR) #
import riva_api.audio_pb2 as ra
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv

# Sentiment Analysis #
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv


# Utilities #
from utils_RIVA import (
    # Speech-to-Text #
    download_data,
    unzip_data,
    read_audio,
    audio2text,
    word_error_rate,
    rouge_error,
    read_files,
    natural_keys,
    mp3_2_wav,
    batch_metrics,
    # Sentiment Analysis #
    run_text_classify,
    validation_sentiment_analysis,
)

Channels

These are the gRPC endpoints (host and port) on which RIVA serves its models.

  • speech: 51051
  • voice: 61051

The port numbers must match riva_speech_api_port and riva_vision_api_port in config.sh.

In [4]:
speech_channel = "localhost:51051"
voice_channel = "localhost:61051"
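
Before creating any RIVA client, it can help to confirm that something is listening on each channel. The sketch below uses only the Python standard library. `split_channel` and `channel_is_reachable` are hypothetical helpers, not part of RIVA or `utils_RIVA`, and a successful TCP connection only shows that the port is open, not that a model is loaded behind it.

```python
import socket
from typing import Tuple


def split_channel(channel: str) -> Tuple[str, int]:
    """Split a "host:port" string into (host, port)."""
    host, _, port = channel.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {channel!r}")
    return host, int(port)


def channel_is_reachable(channel: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the channel succeeds."""
    host, port = split_channel(channel)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors
        return False
```

For example, `channel_is_reachable(speech_channel)` returning False usually means the RIVA server is not running or the port does not match config.sh.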

Speech-To-Text

Automatic Speech Recognition (ASR) takes as input an audio stream or audio buffer and returns one or more text transcripts, along with additional optional metadata. ASR represents a full speech recognition pipeline that is GPU accelerated with optimized performance and accuracy. ASR supports synchronous and streaming recognition modes.

For more information on NVIDIA RIVA's Automatic Speech Recognition, see the NVIDIA RIVA documentation.

Constants

Use these constants to affect different aspects of this pipeline:

  • DATA_DIR: base folder where data is stored
  • DATASET_NAME: name of the call center dataset
  • COMPANY_DATE: folder name identifying the particular call center conversation

NOTE: MAKE SURE THESE CONSTANTS ALIGN WITH Call Center - Model Training and Fine-Tuning.ipynb

In [5]:
DATA_DIR = "data"
DATASET_NAME = "ReleasedDataset_mp3"
COMPANY_DATE = "Aetna Inc_20171031"

Data Acquisition

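In the cell below, download the call center data from your database into the local storage system. The local location is specified by DATA_DIR, and DATASET_NAME determines the call center dataset. Set FIRST_TIME_DOWNLOAD to True on the first run; leave it False once the data is already in local storage.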

In [6]:
FIRST_TIME_DOWNLOAD = False

if FIRST_TIME_DOWNLOAD:
    download_data(DATA_DIR)
    unzip_data(DATA_DIR)
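
Instead of toggling the flag by hand, the download could be skipped whenever the dataset folder already contains files. This is only a sketch: `dataset_is_present` is a hypothetical helper, and it assumes the unzipped data ends up in `DATA_DIR/DATASET_NAME`, which matches how paths are built in this notebook but is not confirmed for `unzip_data`.

```python
from pathlib import Path


def dataset_is_present(data_dir: str, dataset_name: str) -> bool:
    """Return True if data_dir/dataset_name exists and is not empty."""
    dataset_path = Path(data_dir) / dataset_name
    return dataset_path.is_dir() and any(dataset_path.iterdir())
```

With this helper, the flag could be derived as `FIRST_TIME_DOWNLOAD = not dataset_is_present(DATA_DIR, DATASET_NAME)`.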

Loading Files

Once the call center audio files are loaded into the local storage, they can be processed for use within NVIDIA RIVA.

In [7]:
paths = read_files(
    path = os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio"),
    extension = "mp3",
)

Creating Wav Files

In general, one should avoid working with MP3 files because the use of lossy codecs can reduce quality and performance. Lossless audio formats such as WAV are preferred.

These call center audio files were originally exported to MP3s, so they must be reformatted.

In [8]:
%%time 
mp3_2_wav(
    wav_path = os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio"),
    mp3_paths = paths,
)
CPU times: user 789 ms, sys: 1.24 s, total: 2.03 s
Wall time: 18.9 s
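
`mp3_2_wav` is also a `utils_RIVA` helper whose source is not shown. A minimal sketch of a comparable conversion for a single file, using `pydub` (installed above, and dependent on ffmpeg for MP3 decoding), might look like the following. The function names, the 16 kHz sample rate, and the mono output are assumptions chosen for illustration, not settings confirmed by this notebook.

```python
from pathlib import Path


def wav_target(mp3_path: str, wav_dir: str) -> Path:
    """Map .../name.mp3 to wav_dir/name.wav."""
    return Path(wav_dir) / (Path(mp3_path).stem + ".wav")


def convert_mp3_to_wav(mp3_path: str, wav_dir: str,
                       sample_rate: int = 16000, channels: int = 1) -> Path:
    """Decode one MP3 with pydub and write it out as a WAV file."""
    # Imported lazily so wav_target can be used without pydub or ffmpeg
    from pydub import AudioSegment

    target = wav_target(mp3_path, wav_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    audio = AudioSegment.from_mp3(mp3_path)
    # Assumed target format: 16 kHz mono
    audio = audio.set_frame_rate(sample_rate).set_channels(channels)
    audio.export(str(target), format="wav")
    return target
```

Note that converting to WAV does not restore information already discarded by the MP3 encoder; it only gives later steps an uncompressed format to read.
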
In [9]:
paths = read_files(
    path = os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio" , "wav"),
    extension = "wav",
)
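
After conversion, a quick sanity check of the WAV files can catch problems before they reach the ASR step. The sketch below reads only the file header with the standard-library `wave` module, which supports uncompressed PCM WAV files. `wav_summary` is a hypothetical helper and is not part of `utils_RIVA`.

```python
import wave
from typing import Dict, Union


def wav_summary(path: str) -> Dict[str, Union[int, float]]:
    """Read the WAV header and return basic properties of the file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_s": frames / rate,
        }
```

Calling `wav_summary(paths[0])` on one of the converted files shows its channel count, sample rate, and duration.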

The individual conversation sentences can be visualized and heard within the sentiment analysis dashboard. The cell below plots the waveform of one sentence and plays it back in the notebook.

In [10]:
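idx = 1  # index of the audio file to inspect

# librosa.load resamples to 22,050 Hz and converts to mono by default
x, sr = librosa.load(paths[idx])
plt.figure(figsize=(14, 5))
# waveplot was removed in librosa 0.10; use librosa.display.waveshow there
librosa.display.waveplot(x, sr=sr)
plt.show()

# Play the same file inline
ipd.Audio(paths[idx])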