This notebook demonstrates how to build a pipeline for sentiment analysis of call center conversations. The goal of this pipeline is to develop sentiment analysis for use within an external dashboard.
This tutorial walks through using NVIDIA's RIVA for automatic speech recognition (ASR) and text classification. It uses NetApp cloud storage for the data and a pre-trained RIVA model.
%load_ext autoreload
%autoreload 2
!pip install pydub
!pip install jiwer
!pip install rouge
!pip install gdown
!pip install tqdm
!pip install matplotlib
## General ##
import io
import os
import grpc
import librosa
import librosa.display
import IPython.display as ipd
from pathlib import Path
from tqdm.notebook import tqdm
## Data Science ##
import random
import numpy as np
import matplotlib.pyplot as plt
## RIVA ##
# Automatic Speech Recognition (ASR) #
import riva_api.audio_pb2 as ra
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv
# Sentiment Analysis #
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv
# Utilities #
from utils_RIVA import (
    # Speech-to-Text #
    download_data,
    unzip_data,
    read_audio,
    audio2text,
    word_error_rate,
    rouge_error,
    read_files,
    natural_keys,
    mp3_2_wav,
    batch_metrics,
    # Sentiment Analysis #
    run_text_classify,
    validation_sentiment_analysis,
)
These are the channels on which RIVA is hosting models: ports 51051 and 61051. They must be aligned with riva_speech_api_port and riva_vision_api_port within config.sh.
speech_channel = "localhost:51051"
voice_channel = "localhost:61051"
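As a rough sanity check before running the cells below, the following sketch splits a channel string into host and port and probes whether anything is listening there. The helper names `parse_channel` and `is_listening` are hypothetical and not part of RIVA or `utils_RIVA`. An open port only suggests that the server is up, not that the models have finished loading.

```python
import socket


def parse_channel(channel: str) -> tuple:
    """Split a "host:port" string into (host, port)."""
    host, _, port = channel.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {channel!r}")
    return host, int(port)


def is_listening(channel: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the channel can be opened."""
    host, port = parse_channel(channel)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_listening(speech_channel)` could be called before creating any gRPC stubs.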
Automatic Speech Recognition (ASR) takes as input an audio stream or audio buffer and returns one or more text transcripts, along with additional optional metadata. ASR represents a full speech recognition pipeline that is GPU accelerated with optimized performance and accuracy. ASR supports synchronous and streaming recognition modes.
For more information, see the NVIDIA RIVA documentation on Automatic Speech Recognition.
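The two recognition modes differ mainly in how audio is handed to the service: synchronous recognition sends one complete buffer, while streaming sends a sequence of smaller chunks. The chunking step can be sketched independently of RIVA's API. The function name `iter_chunks` and the chunk size are illustrative assumptions, not values taken from RIVA.

```python
from typing import Iterator


def iter_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive slices of an audio buffer; the last may be shorter."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


buffer = bytes(10_000)           # stand-in for raw audio bytes
chunks = list(iter_chunks(buffer))
print([len(c) for c in chunks])  # [4096, 4096, 1808]
```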
Use these constants to affect different aspects of this pipeline:
DATA_DIR: base folder where data is stored
DATASET_NAME: name of the call center dataset
COMPANY_DATE: folder name identifying the particular call center conversation
DATA_DIR = "data"
DATASET_NAME = "ReleasedDataset_mp3"
COMPANY_DATE = "Aetna Inc_20171031"
In the cell below, download the call center data from your database into the local storage system. The local location is specified by DATA_DIR, and DATASET_NAME determines the call center dataset.
# Set to True on the first run to fetch and unpack the dataset.
FIRST_TIME_DOWNLOAD = False
if FIRST_TIME_DOWNLOAD:
    download_data(DATA_DIR)
    unzip_data(DATA_DIR)
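A manual flag such as `FIRST_TIME_DOWNLOAD` is easy to forget to reset. One alternative, sketched here under the assumption that a successful download leaves a non-empty dataset folder behind, is to check the disk instead. `dataset_present` is a hypothetical helper, not part of `utils_RIVA`.

```python
from pathlib import Path


def dataset_present(data_dir: str, dataset_name: str) -> bool:
    """Return True if the dataset folder exists and contains at least one entry."""
    root = Path(data_dir) / dataset_name
    return root.is_dir() and any(root.iterdir())


# Possible use in this notebook (download_data and unzip_data come from utils_RIVA):
# if not dataset_present(DATA_DIR, DATASET_NAME):
#     download_data(DATA_DIR)
#     unzip_data(DATA_DIR)
```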
Once the call center audio files are loaded into the local storage, they can be processed for use within NVIDIA RIVA.
paths = read_files(
    path=os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio"),
    extension="mp3",
)
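`read_files` and `natural_keys` come from `utils_RIVA`, which is not shown here. A plausible equivalent, offered only as an assumption about their behavior, lists files with a given extension and sorts them so that `2.mp3` comes before `10.mp3`. The names `natural_key` and `list_audio` are hypothetical.

```python
import re
from pathlib import Path
from typing import List


def natural_key(text: str) -> list:
    """Split text into digit and non-digit runs so numbers compare numerically."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", text)]


def list_audio(path: str, extension: str) -> List[str]:
    """Return paths of files in `path` ending in `.extension`, naturally sorted."""
    files = Path(path).glob(f"*.{extension}")
    return [str(f) for f in sorted(files, key=lambda f: natural_key(f.name))]


print(sorted(["10.mp3", "2.mp3", "1.mp3"], key=natural_key))
# ['1.mp3', '2.mp3', '10.mp3']
```

Plain lexicographic sorting would place `10.mp3` before `2.mp3`, which would scramble the order of sentences within a conversation.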
In general, avoid working with MP3 files: lossy codecs discard audio detail, which can reduce recognition quality. Lossless audio formats such as WAV are preferred.
These call center audio files were originally exported as MP3s, so they must be converted to WAV.
%%time
mp3_2_wav(
    wav_path=os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio"),
    mp3_paths=paths,
)
CPU times: user 789 ms, sys: 1.24 s, total: 2.03 s Wall time: 18.9 s
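`mp3_2_wav` is defined in `utils_RIVA` and its body is not shown. Judging by the paths used in this notebook, it writes WAV files into a `wav` subfolder; the sketch below reproduces that layout with `pydub`, which needs `ffmpeg` available on the system. The function names `wav_target` and `convert_all` are hypothetical, and the real helper may differ.

```python
import os
from typing import List


def wav_target(wav_path: str, mp3_file: str) -> str:
    """Map an MP3 file to <wav_path>/wav/<same name>.wav."""
    stem = os.path.splitext(os.path.basename(mp3_file))[0]
    return os.path.join(wav_path, "wav", stem + ".wav")


def convert_all(wav_path: str, mp3_paths: List[str]) -> List[str]:
    """Decode each MP3 and export it as WAV; returns the written paths."""
    from pydub import AudioSegment  # imported lazily; requires ffmpeg

    os.makedirs(os.path.join(wav_path, "wav"), exist_ok=True)
    written = []
    for mp3_file in mp3_paths:
        target = wav_target(wav_path, mp3_file)
        AudioSegment.from_mp3(mp3_file).export(target, format="wav")
        written.append(target)
    return written
```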
paths = read_files(
    path=os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio", "wav"),
    extension="wav",
)
The individual conversation sentences can be visualized and heard within the sentiment analysis dashboard.
idx = 1
x, sr = librosa.load(paths[idx])
plt.figure(figsize=(14, 5))
# Note: newer librosa releases replace waveplot with librosa.display.waveshow.
librosa.display.waveplot(x, sr=sr)
plt.show()
ipd.Audio(paths[idx])
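Before plotting, it can be useful to confirm that a converted file has the expected sample rate and length. This sketch uses only the standard-library `wave` module, so it applies to uncompressed PCM WAV files; `wav_info` is a hypothetical helper and not part of `utils_RIVA`.

```python
import wave


def wav_info(path: str) -> dict:
    """Read channel count, sample rate, and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }
```

For example, `wav_info(paths[idx])` reports the properties of the file plotted above.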