This notebook demonstrates how to build a pipeline for sentiment analysis of call center conversations. The goal of this pipeline is to develop sentiment analysis for use within an external dashboard.
This tutorial guides you through using NVIDIA RIVA for automatic speech recognition and text classification. It uses NetApp cloud storage for the data and a pre-trained RIVA model.
%load_ext autoreload
%autoreload 2
!pip install pydub
!pip install jiwer
!pip install rouge
!pip install gdown
!pip install tqdm
!pip install matplotlib
## General ##
import io
import os
import grpc
import librosa
import librosa.display
import IPython.display as ipd
from pathlib import Path
from tqdm.notebook import tqdm
## Data Science ##
import random
import numpy as np
import matplotlib.pyplot as plt
## RIVA ##
# Automatic Speech Recognition (ASR) #
import riva_api.audio_pb2 as ra
import riva_api.riva_asr_pb2 as rasr
import riva_api.riva_asr_pb2_grpc as rasr_srv
# Sentiment Analysis #
import riva_api.riva_nlp_pb2 as rnlp
import riva_api.riva_nlp_pb2_grpc as rnlp_srv
# Utilities #
from utils_RIVA import (
    # Speech-to-Text #
    download_data,
    unzip_data,
    read_audio,
    audio2text,
    word_error_rate,
    rouge_error,
    read_files,
    natural_keys,
    mp3_2_wav,
    batch_metrics,
    # Sentiment Analysis #
    run_text_classify,
    validation_sentiment_analysis,
)
These are the channels on which RIVA is hosting models: 51051 and 61051. These channels must be aligned with riva_speech_api_port and riva_vision_api_port within config.sh.
speech_channel = "localhost:51051"
voice_channel = "localhost:61051"
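Before creating any clients, it can help to confirm that something is actually listening on each address. The following is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, not part of RIVA, and a successful connection only shows that a listener exists, not that it is a RIVA server.

```python
import socket


def port_is_open(address: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds.

    Hypothetical helper for illustration; it does not speak gRPC and
    cannot tell which service is listening.
    """
    host, _, port = address.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError: refused, unreachable, or timed out.
        # ValueError: the address had no numeric port.
        return False


if __name__ == "__main__":
    for name, address in {"speech": "localhost:51051", "voice": "localhost:61051"}.items():
        state = "reachable" if port_is_open(address) else "not reachable"
        print(f"{name} channel {address}: {state}")
```

If either address is reported as not reachable, check the port settings in config.sh before continuing.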
Automatic Speech Recognition (ASR) takes as input an audio stream or audio buffer and returns one or more text transcripts, along with additional optional metadata. ASR represents a full speech recognition pipeline that is GPU accelerated with optimized performance and accuracy. ASR supports synchronous and streaming recognition modes.
For more information on NVIDIA RIVA's Automatic Speech Recognition, see the official NVIDIA RIVA documentation.
Use these constants to affect different aspects of this pipeline:
DATA_DIR: base folder where data is stored
DATASET_NAME: name of the call center dataset
COMPANY_DATE: folder name identifying the particular call center conversation
DATA_DIR = "data"
DATASET_NAME = "ReleasedDataset_mp3"
COMPANY_DATE = "Aetna Inc_20171031"
In the cell below, download the call center data from your database into the local storage system. The local location is specified by DATA_DIR, and DATASET_NAME determines the call center dataset.
FIRST_TIME_DOWNLOAD = False
if FIRST_TIME_DOWNLOAD:
    download_data(DATA_DIR)
    unzip_data(DATA_DIR)
Once the call center audio files are loaded into the local storage, they can be processed for use within NVIDIA RIVA.
paths = read_files(
    path = os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio"),
    extension = "mp3",
)
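The source of `read_files` and `natural_keys` (both imported from `utils_RIVA`) is not shown in this notebook. The sketch below illustrates one plausible behavior, assuming that `read_files` lists the files with a given extension and that `natural_keys` orders them so that `2.mp3` comes before `10.mp3`. The `_sketch` names are hypothetical stand-ins, not the actual implementations.

```python
import re
from pathlib import Path


def natural_keys_sketch(text: str) -> list:
    """Split text into digit and non-digit chunks so numbers sort numerically."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", text)]


def read_files_sketch(path: str, extension: str) -> list:
    """Return the files directly under `path` with `extension`, in natural order."""
    files = Path(path).glob(f"*.{extension}")
    return sorted((str(f) for f in files), key=natural_keys_sketch)
```

Natural ordering matters here because each file holds one segment of the conversation, and a plain string sort would place `10.mp3` ahead of `2.mp3`.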
In general, one should avoid working with MP3 files because the use of lossy codecs can reduce quality and performance. Lossless audio formats such as WAV are preferred.
These call center audio files were originally exported to MP3s, so they must be reformatted.
%%time
mp3_2_wav(
    wav_path = os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio"),
    mp3_paths = paths,
)
CPU times: user 789 ms, sys: 1.24 s, total: 2.03 s Wall time: 18.9 s
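The implementation of `mp3_2_wav` lives in `utils_RIVA` and is not shown here. As a rough sketch of how such a conversion could be written with pydub (installed above), assuming that the WAV files are written to a `wav` subfolder of `wav_path` (consistent with the path read in the next cell) and that ffmpeg is available on the system. The function names below are hypothetical.

```python
from pathlib import Path


def wav_target(mp3_path: str, wav_root: str) -> Path:
    """Map '<dir>/<name>.mp3' to '<wav_root>/wav/<name>.wav'."""
    return Path(wav_root) / "wav" / (Path(mp3_path).stem + ".wav")


def mp3_to_wav_sketch(wav_root: str, mp3_paths: list) -> list:
    """Convert each MP3 to WAV with pydub and return the written paths."""
    # Imported lazily: pydub is a third-party package that needs ffmpeg.
    from pydub import AudioSegment

    written = []
    for mp3_path in mp3_paths:
        target = wav_target(mp3_path, wav_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        AudioSegment.from_mp3(mp3_path).export(str(target), format="wav")
        written.append(str(target))
    return written
```

Note that converting an MP3 to WAV does not restore information lost to the lossy codec; it only puts the audio into a format the rest of the pipeline expects.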
paths = read_files(
    path = os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE, "Audio", "wav"),
    extension = "wav",
)
The individual conversation sentences can be visualized and heard within the sentiment analysis dashboard.
idx = 1
x, sr = librosa.load(paths[idx])
plt.figure(figsize=(14, 5))
librosa.display.waveplot(x, sr=sr)
plt.show()
ipd.Audio(paths[idx])
The massive volume of calls that a call center must process on a daily basis means that a database can be quickly overwhelmed by audio files. Efficiently managing the processing and transfer of these audio files is an integral part of the sentiment analysis pipeline.
The data processing steps can be facilitated through the use of the NetApp DataOps Toolkit. This toolkit is a Python library that makes it simple for developers, data scientists, DevOps engineers, and data engineers to perform various data management tasks, such as provisioning a new data volume, near-instantaneously cloning a data volume, and near-instantaneously snapshotting a data volume for traceability/baselining.
Installation and usage of the NetApp DataOps Toolkit for Traditional Environments requires that Python 3.6 or above be installed on the local host. Additionally, the toolkit requires that pip for Python3 be installed.
For more information on the NetApp DataOps Toolkit, see the official NetApp DataOps Toolkit documentation.
To install the NetApp DataOps Toolkit for Traditional Environments, run the following command.
python3 -m pip install netapp-dataops-traditional
A config file must be created before the NetApp DataOps Toolkit for Traditional Environments can be used to perform data management operations. To create a config file, run the following command. This command will create a config file named 'config.json' in '~/.netapp_dataops/'.
netapp_dataops_cli.py config
CREATE_WAV_ALL_MP3S = True
if CREATE_WAV_ALL_MP3S:
    company_dates = sorted(os.listdir(os.path.join(DATA_DIR, DATASET_NAME)))
    for company_date in tqdm(company_dates):
        if not os.path.exists(os.path.join(DATA_DIR, DATASET_NAME, company_date, "Audio", "wav")):
            mp3_paths = read_files(
                path = os.path.join(DATA_DIR, DATASET_NAME, company_date, "Audio"),
                extension = "mp3",
            )
            mp3_2_wav(
                wav_path = os.path.join(DATA_DIR, DATASET_NAME, company_date, "Audio"),
                mp3_paths = mp3_paths,
            )
To connect to the RIVA speech API server, open a gRPC channel to the speech port specified above and create an ASR client stub on that channel.
channel = grpc.insecure_channel(speech_channel)
asr_client = rasr_srv.RivaSpeechRecognitionStub(channel)
The WAV files are passed through the RIVA ASR API to obtain the speech-to-text transcription.
%%time
contents, sample_rate = read_audio(paths)
CPU times: user 145 ms, sys: 226 ms, total: 371 ms Wall time: 672 ms
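`read_audio` is a helper from `utils_RIVA` whose source is not shown. To illustrate the two values it returns, the sketch below reads a single WAV file with the standard-library `wave` module and returns the file bytes together with the sample rate. The name `read_wav_sketch` is hypothetical, and the real helper accepts a list of paths rather than one file.

```python
import wave


def read_wav_sketch(path: str) -> tuple:
    """Return (file_bytes, sample_rate_hz) for one WAV file."""
    # The WAV header carries the sample rate the recognizer needs to know.
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
    # The full file content, header included, is read as raw bytes.
    with open(path, "rb") as raw:
        content = raw.read()
    return content, sample_rate
```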
%%time
transcript_predicted = audio2text(contents, sample_rate, asr_client)
CPU times: user 128 ms, sys: 77.4 ms, total: 206 ms Wall time: 22 s
transcript_predicted
"Hank you ark and good morning everyone. \nArlier today we reported 3rd quarter. Adjusted earnings of million. And adjusted earnings per share of. Earoveryear growth of and. Respectively. \nUr 3rd quarter results reflect strong core business fundamentals driven by disciplined pricing moderate medical cost trends and continued capital deployment. \nGive with some comments on overall performance. \nUr medical membership with. Million is in line with our expectations for the period. \nDjusted revenue was billion. A yearoveryear decrease driven by a planned reduction in commercial and shared membership. Despansion of the health ansure feet and no edicaid contract loffers. \nHese dynamics were partially offset by higher premium yields in our commercial engovernment businesses and membership growth in our edicare products. \nRom an operating perspective our businesses are performing well. \nUr adjusted pretax margin was. A very strong result and at the high end of our target range. \nUr 3rd quarter total helt edical enefit atio was. An improvement of basis points compared to the same period last year. \nUr 3rd quarter results reflects strong underwriting performance that more than offset the approximately basis point impact of the oneyear's sexspansion of the health inshure feet. \nUr adjusted expense ratio was. Basis point improvement over the 3rd quarter of. Driven primarily by the sexspansion of the health insur fee in. And the continued execution of our expense management initiatives partially offset by the targeted investment spending on growth initiatives discussed on prior earnings calls this year. \nRom a balance sheet perspective we remain confident in the adequacy of our reserves. \nE experienced favorable prior period reserve development in the quarter across all ofour co products primarily attributable the 2nd quarter dates of service. Our days claims payable were stable sequentially at days at the end of the quarter. \nUrning to cash flow and capital. 
Ur yeartodate healthcare and roup insurance cash flows were approximately times net earnings. \nUring the quarter we distributed million. To our quarterly shareholder dividend. Nd we repurchased. Million of our shares. \nEartodate we have returned a total of approximately. Billion to shareholders through share repurchases in our quarterly shareholder dividend. \nN short we are pleased with our 3rd quarter results into continued successfal execution of our strategy to becoming more consumer focused company. \nWill now discuss the key drivers of our 3rd quarter results in greater detail. \nEginning with our government business. We delivered another solid quarter continuing our positive momentum from the 1st half of. From a membership perspective. Edicare members in the quarter. Luding growth of. In the individual edicare advantage. \nUr edicaide membership declined by approximately. Members in the quarter primarily driven by our une exit of the marrolind contract. \n \nE grew our 3rd quarter. Government premiums which represent over half of our total healthcare premiums to billion. \nEdicare premiums grew nearly. Compared to the prioryear period driven by strong premium growth of. In individual edicare advantage. \nEdicait premiums declined yearoveryear due to previously disclow state contract exes. \nUr overnment edical enefit atio was. A continuation of the strong results we achieved in this business during the 1st half of the year. \nHifting to our ommercial business which also had a very strong 3rd quarter from an operating margin perspective. We grew insured commercial membership in the quarter by. Members reflecting international growth related to our acquisition of roup's hainland business partially offset by our continued repositioning in small roup related products which drove reduced small roup insured membership. 
\nRom a yearoveryear perspective total commercial premiums were lower largely the result of our reduced compliant individual and small roup exposure and the sespansion of the health ansur fee. Partially offset by higher premium yields. \nOwever our large roup commercial premiums were modestly higher as compared to the 3rd quarter of. Despite the impact of the sespension of the health and sharete. \nUr ommercial edical enefit atio was. For the quarter a very good result and a. Basis point improvement over the 3rd quarter of. Despite the negative impointe of the suspansion of the health and sharefee. \nO better understand the yearoveryear comparison let me discuss the drivers of our underwriting results by product go. \nOntinuing the momentum from the 1st half of the year we delivered another strong quarter of underwriting results in our large roup commercial ensured products. \nUr 3rd quarter results reflect continued pricing disciplined and moderate medical cost trends. \nAsed on our yeartodate results. Te now expect our non. Ore ommercial medical cost trends will be approximately. \nE also had a strong quarter of underwriting results in our small group commercial product as we continue to have success in repositioning these products to improve their profitability. \nOving to our compliant individual commercial productur 3rd quarter results were largely in line with our expectation. With lower losses compared to the same period last year due ing large part to our reduced membership. \nAsed on our 3rd quarter results and excluding prioryear items we continue to project an underlying loss on. Compliant individual commercial products. \nHifting to our feebasedr commercial business we grew commercial membership by. Members in the 3rd quarter. \nE now serve nearly. More commercial members compared to the same period last year. \nRom a profitability perspective our group commercial feebased business had another solid quarter exceeding our previous expectations. 
\nOving on to the balance sheet our financial position capital structure and liquidity all continued to be very strong. \nT eptember we had a debt to total capitalization ratio of approximately. Reflecting our decision in the quarter to prefund. Billion of debt maturities that come do in the 4th quarter of. We expect our yearend. Debt to total capitalization ratio to come back down to be more in line with our targeted range. \nOoking at cash and investments at the parent. We started the quarter with approximately million. \nEt subsidiary dividends to the parent were. Million. \nIssued billion in debt in the quarter. \nE paid a shareholder dividend of million. \nE use million to repurchase shares. \nWe ended the quarter with approximately. Billion of cash at the parent reflecting the previously discussed prefunding of 4th quarter. Debt matorities. \nUr basic share count was approximately. Million at eptember. His morning we increased our. Adjusted earnings per share guidance to approximately. Based by continued strong performance in the 3rd quarter. \nUr updated projections are influante by the following drivers. \nE now project our fullyear. Adjusted revenue to be approximately billion. Reflecting the divestiture of our. \nAnibility businesse. \nE now project that our fullyear total ealthare edical enefit ratio will be approximately. \nHis improved outlook is driven by continued moderate medical costs. \nE now project our adjusted pretax margin to be approximately. Consistent with our high single digit target. \nE now project adjusted earnings will be nearly billion. \nE now project fullyear. Dividends from subsidiaries to be up to billion. Excluding any dividends related to the cash consideration received for the sale of our. \nI an visability businesses. 
\nXcluding dividends related to the stale we expect to end the year with approximately million in parent cash reflecting the repayment of billion of debt maturities during the 4th quarter an additional capital deploymen. \nE alance of our. Guidance metrics remain unchanged and can be found on our guidance summary. \nS is our convention we will not be providing any. Projections on today's call. \nOwever will provide some directional commentary to help investors better understands some of the larger luving items. Xpect will impact. Adjusted revenue and adjusted earnings. \nS you know as a matter of course we do not include prioryear's reserve development in our forward earnings projections. \nDditionally. Results benefited from the final assessment of. Risk adjustment in reinsurance for compliantd products. \nAsed on our updated. Adjusted projection and a combination of these two items we belev at tis. Baseline adjusted earnings be approximately per share. \nHile we are still working to our planning processe ave incorporated a number of challenges in our forward outlook. \nRom an adjusted revenue perspective thes include. He previously disclosed sale of our. \nAbility businesses. The previously discloses medicaide contract acces. Ur exit from individual ommercial products. Nd our continued repositioning of our compliant mall group ommercial products. \nE project these topline challenges will also pressure adjusted earnings in. In addition to these items we project adjusted earnings growth to be pressured by the reintroduction of the industrywide nondeductible health insure the creating a headwind of approximately. Due to the timing of revenue and expense recognition. \nE also expect this re introduction to produce incremental experience rating pressure in our large group insured products. \nWe do see a number of opportunities including. Aure projected abouve the industry growth in individual edicare advantage and strong growth in roup edicare dvantage products. 
He fullyear impact of. Capital deployment actions and the potential to deploy additional capital to improve adjusted growth. He reduction of lossces from exciting individual commercial products in. And our ability to achieve expense efficiencies as we continue to simplify our processes and drive probest inclass business performance. \nN closing as we begin the final quarter of. And continue to plan for. We are encouraged by what has been a strong adjusted growth year in despite some topline challenges. \nE also remain confident in our ability to drive longterm adjusted growth over time. \nWill now turn the call back over to oh. \n \nThink your overall character a characterization of and is correct n. \nEn ou think about a lot of the spending that wehave been doing especially in the 2nd half of the year a lot of that actually has been more pointed at. Growth. Particularly in edicare e.g. with ongoing geographic expansion. \nLot of theayer expenditures were doing around analytics and technology and digital platform are also things that have a a multiyear share o growth dynamic to them. \nO think when you think about. We woul largy think aboutediare continuing on that above the industry growth trajectory that expend on. \nWould continue to think about our large oup. O businesse ae being thamisinedigits which they've largely been on. \nNdwhat will be driver think around actually will be medicate an. \nS we can get back on ta low doubledigit track of edicaid earnings growth and obviously we're making some investment there around sort of our petent an ebut. \nThat think will ultimately turn out to be the linch. Deefor. \nObviously thereis a dynamic here is going on of t it fundaing right which sometimes we're saving some of those members but they're ing a reduced revenue profile. \nHe other thing would say that think is happened this year. Is when we look at that business we've largely got that business back into a target margin range. Nd so think the repositioning. 
Is more successful than just sort of the revenue line. Ight indicate on the service. \nA fair amount. \nOt largeevance a ut there's there's a fair amount of strength in in the underlying small group move in to that market. \nUt there's there's a fair amount of strength in. In the underlying small group move into that mark. \nO let me be chere about nd think. In response the oses questionnd was concerring that it would be a growth year. \nOn't think said that it would be a down year. \nDo think. It's important to on these these revenue issures right we have a good analon what the rev. \nThe is little mort of the billion. \nHe groanc divestiture as billn. \nCaid axits are proaably approachin bio. \nO there's a lot industrypositioning thei. Fat. \nThink as we mentioned some of what we did this year. As to try to accelerate spending. Spend in lewer. Maybe spending in in the future we'ved. \nOme of the spending that we've ramped up this year won't recer. \nUt to your point think in this business when yourwhen youre challnge on the top line. You just really need to take the action on the lineto your productivity measures and elleswhere and think that will be part of. Sort of the roll forward for having a successful gain we'll be managing this this dealleveraging. \nUt to your point think in this business when yourwhen youre challnge on the top line. You just really need to take the action on the linet your productivity measures and elleswhere and think that will be part of. Sort of the roll forward for having a successful gain we'll be managing this this dealleveraging. \nWhat we've tried to do is actually characterize that as other people have characterizeit in thi cite. \nSort of the accounting dynamic that exist this year and then the effectthis. The effact of this neyear. \nS mentioned before this is this is. Hi is all about you can put it in the pricing but it about whether you get your overall aggregate rite at the end of the day. 
\nThis quantification wal sort of intended to pick up both dynamics. \nNd contine t. Sthis thas played out obviously the group commercial a largerout places a pace e a. Think the absolute level of rate increased matters. \nWe're having obviously a good year on commercial ans. S going to create. Some challenges in terms of ortof how much of that will slow back through the experience rating ofit. \nStaing the absolute rate increased matters and as we tryed to stort of manage that dynamic tof the experience rating processs. Think it creates. Pressure ound the overall or renolerating process. \nCriste when you start to recover that. Youre always makeain provision for. Ha isthe stimated revenue base youre going to have to recover this. \nIn doing that we normally by ourselves sretingroo. And anticipate that we might have some of us. \nHat dynamic though is captured in the. When we lo a. \nNk we've done a pretty good job of positioning ourselves. Aspaedout. \nProbly can't be overly specific at thion that. \nThink the number you have is a little bit high. \nE have been driving a little bit on the top line this year and that impacts that ratio as well. Going forward. \nOme of it as mentioned in a previous ques. Sort of managing that. Spend as well as other actions that will be part of how we bridge from to. Meaning fu parto the eqatbut that's we're working trough our planning proow so really can't be more specific at this strunture. \nE e e e e a eaon o the eanest places to servee see this as ewhen we talk about large group. \nHeno tejus. Sort of that revenue we're right there in the midsingledigits for revenue growth. \nTink that's a very good indicator of sort of the state of the market. Around this issue. \n \nWe've had that target out there in the pasth. \nSo let me bechere becase ree pocin o te. A'm not providing guidance. \nComent specifically butatwold g ba. Is ti adjusted base line of a. An thik aot the er for waofiat. 
\nN agan contine to think t aita teo ecause we will do a lot we've done a lot of that this year. \nThat will have anan annualized benefit next year plus were uf next year capital deployment to tha. \nHat should be a meaningful number for next year in the overall equation. \nGs aoun experience rating in the health ansure fee and things like that but again we do have some some good opportunities on. Ne occare in the exiting of the individual business. \nO from that adjusted baseline. Ink we're seeing pretty good right now. \nE tak yo about then the. \nIr its eboute geographic expansion. \nWe havea really good samestore growth. \nNeed more tor. \nNd so as our intentchon to get into the. Ene few years. \nIet pat of a one of thet pats of our investment is ramping the ability to build out networks. \nE're actually using very interesting davta analytics be fiyoure that out as using the edicar ata bate. \nExpandingmorma is going to generate future growth. \nThe other is that we've got. Of the people that retire everyday. \nE an e a ta is related to network coverage. D portability. \nSo. We will have launch amportable edicare dvantage product across our e current network which is approximately. Where we willtaotheape incetain market of having tcoverage an were m that en weget ramp up to that. Welavanaty to offer a natually portable medicare dvantage product. \n \nAtin gettin. Relationships as a big. \nBack on the side we continue to develop relations across our network. \nE have the ability to build er ou network bat we're getting the best cost structure we thi to provide an affordable product. In the. \nE have the ability to build er ou network bat we're getting the best cost structure we thi to provide an affordable product. In the. \n"
The actual transcripts are read into memory to compare to the RIVA predicted transcripts.
paths = read_files(
    os.path.join(DATA_DIR, DATASET_NAME, COMPANY_DATE),
    extension = "txt"
)
transcript_actual = Path(paths[0]).read_text()
transcript_actual
"Thank you, Mark, and good morning, everyone\nEarlier today, we reported third quarter 2017 adjusted earnings of $814 million and adjusted earnings per share of $2.45, year-over-year growth of 11% and 18% respectively\nOur third quarter results reflect strong core business fundamentals driven by disciplined pricing, moderate medical cost trend and continued capital deployment\nI'll begin with some comments on overall performance\nOur medical membership of 22.2 million is in line with our expectations for the period\nAdjusted revenue was $14.9 billion, a year-over-year decrease driven by a planned reduction in Commercial ACA insured membership, the 2017 suspension of the health insurer fee and known Medicaid contract losses\nThese dynamics were partially offset by higher premium yields in our Commercial and Government businesses and membership growth in our Medicare products\nFrom an operating perspective, our businesses are performing well\nOur adjusted pre-tax margin was 9.2%, a very strong result and at the high end of our target range\nOur third quarter total health medical benefit ratio was 81.9%, an improvement of 10 basis points compared to the same period last year\nOur third quarter results reflect strong underwriting performance that more than offset the approximately 200 basis point impact of the one-year suspension of the health insurer fee\nOur adjusted expense ratio was 17.5%, a 10 basis point improvement over the third quarter of 2016, driven primarily by the suspension of the health insurer fee in 2017 and the continued execution of our expense management initiatives, partially offset by the targeted investment spending on growth initiatives discussed on prior earnings calls this year\nFrom a balance sheet perspective, we remain confident in the adequacy of our reserves\nWe experienced favorable prior period reserve development in the quarter across all of our core products, primarily attributable to second quarter 2017 dates of service and our days 
claims payable were stable sequentially at 54 days at the end of the quarter\nTurning to cash flow and capital, our year-to-date Health Care and Group Insurance cash flows were approximately 1.5 times net earnings\nDuring the quarter, we distributed $166 million through our quarterly shareholder dividend and we repurchased $545 million of our shares\nYear to date we have returned a total of approximately $4.3 billion to shareholders through share repurchases in our quarterly shareholder dividend\nIn short, we are pleased with our third quarter results and the continued successful execution of our strategy to become a more consumer-focused company\nI will now discuss the key drivers of our third quarter results in greater detail\nBeginning with our Government business, we delivered another solid quarter, continuing our positive momentum from the first half of 2017. From a membership perspective, we grew by 23,000 Medicare members in the quarter, including growth of 11,000 in individual Medicare Advantage\nOur Medicaid membership declined by approximately 218,000 members in the quarter, primarily driven by our June 30 exit of the Maryland ASC contract\nThis membership decline was in line with our previous projection for the period\nWe grew our third quarter 2017 Government premiums, which represent over half of our total health care premiums, to $6.7 billion\nMedicare premiums grew nearly 5% compared to the prior-year period, driven by strong premium growth of 12% in individual Medicare Advantage\nMedicaid premiums declined year over year due to previously disclosed state contract exits\nOur Government medical benefit ratio was 82.4%, a continuation of the strong results we achieved in this business during the first half of the year\nShifting to our Commercial business, which also had a very strong third quarter from an operating margin perspective, we grew insured Commercial membership in the quarter by 177,000 members, reflecting international growth related to our 
acquisition of Bupa Group's Thailand business, partially offset by our continued repositioning in small group ACA-related products, which drove reduced small group insured membership\nFrom a year-over-year perspective, total Commercial premiums were lower, largely the result of our reduced ACA-compliant individual and small group exposure and the suspension of the health insurer fee, partially offset by higher premium yields\nHowever, our large group Commercial premiums were modestly higher as compared to the third quarter of 2016 despite the impact of the suspension of the health insurer fee\nOur Commercial medical benefit ratio was 81.4% for the quarter, a very good result and a 240 basis point improvement over the third quarter of 2016 despite the negative influence of the suspension of the health insurer fee\nTo better understand the year-over-year comparison let me discuss the drivers of our underwriting results by product group\nContinuing the momentum from the first half of the year, we delivered another strong quarter of underwriting results in our large group Commercial insured products\nOur third quarter results reflect continued pricing discipline and moderate medical cost trends\nBased on our year-to-date 2017 results, we now expect our 2017 non-ACA core Commercial medical cost trends will be approximately 5.5%\nWe also had a strong quarter of underwriting results in our small group Commercial product as we continue to have success in repositioning these products to improve their profitability\nMoving to our ACA-compliant individual Commercial product, our third quarter results were largely in line with our expectations, with lower losses compared to the same period last year due in large part to our reduced membership\nBased on our third quarter results and excluding prior-year items, we continue to project an underlying loss on 2017 ACA-compliant individual Commercial products\nShifting to our fee-based group Commercial business, we grew Commercial 
ASC membership by 95,000 members in the third quarter\nWe now serve nearly 410,000 more Commercial ASC members compared to the same period last year\nFrom a profitability perspective, our group Commercial fee-based business had another solid quarter, exceeding our previous expectations\nMoving on to the balance sheet, our financial position, capital structure and liquidity all continued to be very strong\nAt September 30, we had a debt-to-total-capitalization ratio of approximately 39.5%, reflecting our decision in the quarter to pre-fund $1 billion of debt maturities that come due in the fourth quarter of 2017. We expect our year-end 2017 debt-to-total-capitalization ratio to come back down to be more in line with our targeted range\nLooking at cash and investments at the parent, we started the quarter with approximately $203 million\nNet subsidiary dividends to the parent were $705 million\nWe issued $1 billion in debt in the quarter\nWe paid a shareholder dividend of $166 million\nWe used $545 million to repurchase shares\nAnd we ended the quarter with approximately $1.2 billion of cash at the parent, reflecting the previously discussed pre-funding of fourth quarter 2017 debt maturities\nOur basic share count was approximately 326.1 million at September 30. 
This morning, we increased our 2017 adjusted earnings per share guidance to approximately $9.75 based on continued strong performance in the third quarter\nOur updated 2017 projections are influenced by the following drivers\nWe now project our full year 2017 adjusted revenue to be approximately $60.5 billion, reflecting the divestiture of our U.S\ngroup life and disability businesses\nWe now project that our full year total Health Care medical benefit ratio will be approximately 82.3%\nThis improved outlook is driven by continued moderate medical costs\nWe now project our adjusted pre-tax margin to be approximately 8.9%, consistent with our high single-digit target\nWe now project adjusted earnings will be nearly $3.3 billion\nAnd we now project full year 2017 dividends from subsidiaries to be up to $3.2 billion excluding any dividends related to the cash consideration received for the sale of our U.S\ngroup life and disability businesses\nExcluding dividends related to the sale, we expect to end the year with approximately $400 million in parent cash, reflecting the repayment of $1 billion of debt maturities during the fourth quarter and additional capital deployment\nThe balance of our 2017 guidance metrics remain unchanged and can be found on our guidance summary\nAs is our convention, we will not be providing any 2018 projections on today's call\nHowever, I will provide some directional commentary to help investors better understand some of the larger moving items that we expect will impact 2018 adjusted revenue and adjusted earnings\nAs you know, as a matter of course we do not include prior-year's reserve development in our forward earnings projections\nAdditionally, 2017 results benefited from the final assessment of 2016 risk adjustment and reinsurance for ACA-compliant products\nBased on our updated 2017 adjusted EPS projection, and the combination of these two items, we believe Aetna's 2017 baseline adjusted earnings to be approximately $8.75 per 
share\nWhile we are still working through our planning process, we have incorporated a number of challenges in our forward outlook\nFrom an adjusted revenue perspective, these include the previously-disclosed sale of our U.S\ngroup life and disability businesses, the previously disclosed Medicaid contract exits, our exit from individual Commercial products and our continued repositioning of our ACA-compliant small group Commercial products\nWe project these top line challenges will also pressure adjusted earnings in 2018. In addition to these items we project adjusted earnings growth to be pressured by the reintroduction of the industry-wide non-deductible health insurer fee, creating a headwind of approximately $0.25 in 2018 due to the timing of revenue and expense recognition\nWe also expect this reintroduction to produce incremental experience rating pressure in our large group insured products\nFor 2018, we do see a number of opportunities, including our projected above-industry growth in individual Medicare Advantage and strong growth in group Medicare Advantage products, the full year impact of 2017 capital deployment action, and the potential to deploy additional capital to improve adjusted EPS growth, the reduction of losses from exiting individual Commercial products in 2018 and our ability to achieve expense efficiencies as we continue to simplify our processes and drive for best-in-class business performance\nIn closing, as we begin the final quarter of 2017 and continue to plan for 2018, we are encouraged by what has been a strong adjusted EPS growth year in 2017 despite some top line challenges\nWe also remain confident in our ability to drive long-term adjusted EPS growth over time\nI will now turn the call back over to Joe\nJoe?\nJosh, I think your overall characterization of 2018 and 2019 is correct\nAnd when you think about a lot of the spending that we've been doing, especially in the second half of the year, a lot of that actually has been more 
pointed at 2019 growth, particularly in Medicare for example, with ongoing geographic expansion\nA lot of the expenditures we're doing around analytics and technology and digital platform are also things that have a multi-year sort of growth dynamic to them\nSo I think when you think about 2019, we would largely think about Medicare continuing on that above-industry growth trajectory that it's been on\nI would continue to think about our large group core business as sort of being in that mid single-digits, which they've largely been on\nAnd what will be a driver I think around 2019 actually will be Medicaid\nAnd if we can get back on the low double-digit track of Medicaid earnings growth, then obviously we're making some investment there around sort of our procurement and re-procurement processes\nBut that I think will ultimately turn out to be the linchpin around sort of the degree of growth for 2019.\nJust recall, Kevin, that obviously there's a dynamic here of going on of the move to alternate funding, right, which sometimes we're saving some of those members but they're in a reduced revenue profile\nThe other thing I would say that I think has happened this year is when we look at that business, we've largely got that business back into a target margin range, and so I think the repositioning is more successful than just sort of the revenue line might indicate on the surface\nA fair amount is coming\nIt's both large group and small group but there's a fair amount of strength in the underlying small group move into that market\nSo let me be clear about 2018. 
I think, in response to Josh's question, I was concurring that it wouldn't be a growth year\nI don't think I said that it would be a down year\nI do think it's important to – on these revenue issues, right, we have a good handle on what the revenue is\nThe individual exit is a little north of $1 billion\nThe Group Insurance divestiture is $2 billion\nThe Medicaid exits are probably approaching $2 billion\nSo there's a lot in this repositioning there to sort of hold flat\nI think, as we mentioned, some of what we did this year was to try to accelerate SG&A spending and spend it in lieu of maybe spending it in the future\nWe've done that\nSome of the spending that we've ramped up this year won't recur\nBut to your point, I think in this business, when you're challenged on the top line, you just really need to take the action on the SG&A line through your productivity measures and elsewhere, and I think that will be part of sort of the roll forward for having a successful 2018 will be managing this deleveraging\nWhat we've tried to do is actually characterize it as other people have characterized it in this cycle\nSo what I would say is it's a combination of sort of the accounting dynamic that exists this year and then the effect of this next year\nAs I've mentioned before, this is – money is fungible and so this is all about – you can put it 100% into pricing, but it's about whether you get your overall aggregate price at the end of the day\nSo this quantification was sort of intended to pick up both dynamics\nAnd I continue to, as we've talked to people as this has played out obviously, the group Commercial, the large group place is the space to sort of watch this, because I think the absolute level of rate increase matters\nAnd we're having obviously a good year on Commercial, and so I think that is going to create some challenges in terms of sort of how much of that will flow back through the experience rating process\nI'm just saying the absolute rate increase 
matters, and as we try to sort of manage that dynamic through the experience rating process, I think it creates pressure on the overall sort of renewal rating process\nSo, Christine, when you start to recover this, you're always making provision for what is the estimated revenue base you're going to have to recover this from\nAnd so in doing that we normally buy ourselves some breathing room and anticipate that we might have some of this\nSo that dynamic though is captured in the $0.25 when we look at it\nI think we've done a pretty good job of positioning ourselves for sort of how this has played out\nI probably can't be overly specific at this point on that\nI think the number you have is a little bit high\nWe have been dragging a little bit on the top line this year, and that impacts that ratio as well going forward\nSo some of it, as I mentioned in a previous question, sort of managing that spend as well as other actions, that will be part of how we bridge from 2017 to 2018. So it will be a meaningful part of the equation, but that's – we're working through our planning process right now, so I really can't be more specific at this juncture\nYeah, and I think, Chris, the fact one of the cleanest places to sort of see this is when we talk about large group\nAnd when you HIF-adjust sort of that revenue, we're right there in the mid single-digits for revenue growth\nAnd I think that's a very good indicator of sort of the state of the market around this issue\nYeah\nSo we've had that target out there in the past\nSo let me be clear, because we're close enough to 2018. 
I'm not providing 2018 guidance\nWe're just too close to that for me to comment specifically, but what I would go back to is this adjusted baseline of $8.75 and think about the drivers forward off of that\nAnd, again, I continue to think about capital deployment, because we will do a lot – we've done a lot of that this year\nThat will have an annualized benefit next year, plus we'll have next year's capital deployment\nSo that should be a meaningful number for next year in the overall equation\nAnd we've talked about some of the challenges around experience rating and the health insurer fee and things like that, but, again, we do have some good opportunities on Medicare and the exiting of the individual business\nSo from that adjusted baseline going forward, and I think we're feeling pretty good right now\nWell, let me talk to you about the general investments in MA, and then we'll address the PBM one\nFirst, it's about geographic expansion\nSo we have really good same-store growth\nWe need more stores\nAnd so it is our intention to get into the 80%, mid 80%s coverage over the next few years\nPart of the – one of the biggest parts of our investment is ramping the ability to build out networks\nAnd we're actually using very interesting data analytics to figure that out as well, using the Medicare database\nAnd so this whole idea of expanding more markets is going to generate future growth\nNow the other is that we've got 30% of the people that retire every day, 11,000-ish that only take Medicare Advantage, 70% don't\nIn large part that 70% is related to network coverage and portability\nSo for 1/1/2018 we will have launched a portable Medicare Advantage product across our current network, which is approximately 62% coverage, where we will test out some of the aspects in certain markets of having this kind of coverage and learn more so that when we get ramped up to that mid 80%, we will have an ability to offer a nationally portable Medicare Advantage product\nAnd 
that, we'll go after that 70%\nStar ratings getting to 4.5-Stars, 5-Stars with some of our big ACL relationships is a big idea\nAnd then back on the PBM side, we continue to develop a relationship across our network\nWe have the ability to build our own network where we're getting the best cost structure we can get to provide an affordable product for seniors in the PDP and MAPD product\n"
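Word Error Rate (WER) measures the word-level edit distance between the predicted and actual transcripts, normalized by the length of the actual transcript. Lower WER scores are preferred.
WER = (Insertions + Deletions + Substitutions) / N_Words, where N_Words is the number of words in the actual (reference) transcript.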
wer = word_error_rate(transcript_predicted, transcript_actual)
print("Word Error Rate:", round(wer, 2))
Word Error Rate: 0.48
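The WER formula above can be illustrated with a small standalone function. This is only a sketch: `word_error_rate_sketch` is a hypothetical name, it assumes whitespace tokenization, and it is not necessarily how the notebook's `word_error_rate` helper is implemented.

```python
def word_error_rate_sketch(predicted: str, actual: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; assumes whitespace tokenization.
    """
    hyp = predicted.split()
    ref = actual.split()
    if not ref:
        raise ValueError("The actual (reference) transcript is empty.")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j predicted words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from prediction
                curr[j - 1] + 1,     # insertion: extra word in prediction
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because the denominator is the reference length, a prediction with many inserted words can produce a WER above 1.0.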
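Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measures the n-gram overlap between the predicted and actual transcripts. All ROUGE scores reported here are for bigram overlap (ROUGE-2), given as recall (r), precision (p), and F1 score (f). Higher ROUGE scores are preferred.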
rouge = rouge_error(transcript_predicted, transcript_actual)
rouge = {key : round(rouge[0]["rouge-2"][key], 2) for key in rouge[0]["rouge-2"]}
rouge
{'r': 0.44, 'p': 0.48, 'f': 0.46}
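To make the r, p, and f values concrete, here is a minimal sketch of a bigram-overlap score. `rouge_2_sketch` is a hypothetical name, it assumes whitespace tokenization and clipped bigram counts, and its numbers may differ slightly from those of the `rouge` package used by the notebook's `rouge_error` helper.

```python
from collections import Counter


def bigram_counts(text: str) -> Counter:
    """Count adjacent word pairs in a whitespace-tokenized string."""
    words = text.split()
    return Counter(zip(words, words[1:]))


def rouge_2_sketch(predicted: str, actual: str) -> dict:
    """Bigram recall, precision, and F1 (hypothetical helper for illustration)."""
    hyp = bigram_counts(predicted)
    ref = bigram_counts(actual)

    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((hyp & ref).values())
    hyp_total = sum(hyp.values())
    ref_total = sum(ref.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / hyp_total if hyp_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"r": recall, "p": precision, "f": f1}
```

Recall is measured against the actual transcript and precision against the predicted one, which is why the two values differ when the transcripts have different lengths.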
print("[PREDICTED]:",transcript_predicted.split("\n")[2])
print("[ACTUAL]: ",transcript_actual.split("\n")[2])
[PREDICTED]: Ur 3rd quarter results reflect strong core business fundamentals driven by disciplined pricing moderate medical cost trends and continued capital deployment.
[ACTUAL]: Our third quarter results reflect strong core business fundamentals driven by disciplined pricing, moderate medical cost trend and continued capital deployment
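In the pair above, part of the mismatch comes from casing and punctuation rather than misrecognized words. One possible mitigation, not part of the original notebook, is to normalize both transcripts before scoring. The sketch below uses a hypothetical `normalize_sketch` helper. It replaces punctuation with spaces, so a figure such as "$2.45" becomes "2 45", and it does not reconcile forms like "3rd" and "third".

```python
import re


def normalize_sketch(text: str) -> str:
    """Lowercase, drop punctuation (keeping apostrophes), collapse whitespace.

    Hypothetical helper for illustration only.
    """
    text = text.lower()
    # Replace every character that is not a word character, whitespace,
    # or an apostrophe with a space.
    text = re.sub(r"[^\w\s']", " ", text)
    return " ".join(text.split())
```

Applying the same normalization to both transcripts keeps the comparison symmetric.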
We measure the average WER and ROUGE for a specified number (batch_size) of companies.
_, _ = batch_metrics(
    data_dir = DATA_DIR,
    dataset_name = DATASET_NAME,
    asr_client = asr_client,
    batch_size = 20,
)
[COMPANY_DATE] Hewlett Packard Enterprise_20170905 data/ReleasedDataset_mp3/Hewlett Packard Enterprise_20170905/Audio/wav [WER] 0.32 [ROUGE] {'r': 0.58, 'p': 0.6, 'f': 0.59}
[COMPANY_DATE] The Mosaic Company_20170801 data/ReleasedDataset_mp3/The Mosaic Company_20170801/Audio/wav [WER] 0.46 [ROUGE] {'r': 0.43, 'p': 0.46, 'f': 0.44}
[COMPANY_DATE] Hologic_20171108 data/ReleasedDataset_mp3/Hologic_20171108/Audio/wav [WER] 0.52 [ROUGE] {'r': 0.38, 'p': 0.4, 'f': 0.39}
[COMPANY_DATE] F5 Networks_20170426 data/ReleasedDataset_mp3/F5 Networks_20170426/Audio/wav [WER] 0.51 [ROUGE] {'r': 0.37, 'p': 0.43, 'f': 0.4}
[COMPANY_DATE] American Tower Corp A_20171031 data/ReleasedDataset_mp3/American Tower Corp A_20171031/Audio/wav [WER] 0.46 [ROUGE] {'r': 0.43, 'p': 0.52, 'f': 0.47}
[COMPANY_DATE] The Clorox Company_20171101 data/ReleasedDataset_mp3/The Clorox Company_20171101/Audio/wav [WER] 0.53 [ROUGE] {'r': 0.37, 'p': 0.38, 'f': 0.37}
[COMPANY_DATE] Kohl's Corp._20170223 data/ReleasedDataset_mp3/Kohl's Corp._20170223/Audio/wav [WER] 0.54 [WARNING] Unable to process Kohl's Corp._20170223, please check the file sizes do not exceeed MAX_DURATION.
[COMPANY_DATE] Kraft Heinz Co_20170503 data/ReleasedDataset_mp3/Kraft Heinz Co_20170503/Audio/wav [WARNING] Unable to process Kraft Heinz Co_20170503, please check the file sizes do not exceeed MAX_DURATION.
[COMPANY_DATE] Hanesbrands Inc_20171101 data/ReleasedDataset_mp3/Hanesbrands Inc_20171101/Audio/wav [WARNING] Unable to process Hanesbrands Inc_20171101, please check the file sizes do not exceeed MAX_DURATION.
[COMPANY_DATE] Republic Services Inc_20170727 data/ReleasedDataset_mp3/Republic Services Inc_20170727/Audio/wav [WER] 0.6 [ROUGE] {'r': 0.27, 'p': 0.33, 'f': 0.3}
[COMPANY_DATE] Nektar Therapeutics_20171107 data/ReleasedDataset_mp3/Nektar Therapeutics_20171107/Audio/wav [WER] 0.74 [ROUGE] {'r': 0.37, 'p': 0.32, 'f': 0.34}
[COMPANY_DATE] WestRock_20170124 data/ReleasedDataset_mp3/WestRock_20170124/Audio/wav [WER] 0.73 [ROUGE] {'r': 0.19, 'p': 0.23, 'f': 0.21}
[COMPANY_DATE] Waste Management Inc._20171026 data/ReleasedDataset_mp3/Waste Management Inc._20171026/Audio/wav [WER] 0.63 [ROUGE] {'r': 0.3, 'p': 0.43, 'f': 0.36}
[COMPANY_DATE] Align Technology_20170727 data/ReleasedDataset_mp3/Align Technology_20170727/Audio/wav [WER] 0.38 [ROUGE] {'r': 0.51, 'p': 0.54, 'f': 0.53}
[COMPANY_DATE] AmerisourceBergen Corp_20171102 data/ReleasedDataset_mp3/AmerisourceBergen Corp_20171102/Audio/wav [WER] 0.6 [ROUGE] {'r': 0.29, 'p': 0.31, 'f': 0.3}
[COMPANY_DATE] Illinois Tool Works_20171023 data/ReleasedDataset_mp3/Illinois Tool Works_20171023/Audio/wav [WER] 0.62 [ROUGE] {'r': 0.29, 'p': 0.35, 'f': 0.32}
[COMPANY_DATE] CBS Corp._20170807 data/ReleasedDataset_mp3/CBS Corp._20170807/Audio/wav [WER] 0.53 [ROUGE] {'r': 0.35, 'p': 0.37, 'f': 0.36}
[COMPANY_DATE] Dominion Energy_20171030 data/ReleasedDataset_mp3/Dominion Energy_20171030/Audio/wav [WER] 0.61 [ROUGE] {'r': 0.29, 'p': 0.35, 'f': 0.31}
[COMPANY_DATE] Invesco Ltd._20170727 data/ReleasedDataset_mp3/Invesco Ltd._20170727/Audio/wav [WER] 0.76 [ROUGE] {'r': 0.17, 'p': 0.24, 'f': 0.2}
[COMPANY_DATE] Franklin Resources_20170127 data/ReleasedDataset_mp3/Franklin Resources_20170127/Audio/wav [WER] 0.62 [ROUGE] {'r': 0.29, 'p': 0.39, 'f': 0.33}
[AGGREGATED WER] 0.48 [AGGREGATED ROUGE] {'r': 0.29, 'p': 0.33, 'f': 0.31}
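How `batch_metrics` combines the per-company scores is not shown in this notebook. One plausible approach is an unweighted mean over the companies that were processed successfully. In the sketch below, `aggregate_metrics_sketch` and its input layout (a list of `(wer, rouge_dict)` tuples) are assumptions made for illustration, and its results need not match the aggregated values printed above.

```python
from statistics import mean


def aggregate_metrics_sketch(results):
    """Average WER and ROUGE-2 scores over a batch.

    `results` is assumed to be a list of (wer, rouge_dict) tuples, one per
    successfully processed company; this layout is hypothetical.
    """
    if not results:
        raise ValueError("No results to aggregate.")

    avg_wer = round(mean(wer for wer, _ in results), 2)
    avg_rouge = {
        key: round(mean(rouge[key] for _, rouge in results), 2)
        for key in ("r", "p", "f")
    }
    return avg_wer, avg_rouge
```

An unweighted mean treats every call equally regardless of its length. Weighting by the number of words in each transcript is an alternative when call lengths vary widely.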
Text classification is the task of assigning a given input text (or sequence of tokens) to one of a predefined set of classes. This generic task covers various domain-specific use cases such as sentiment classification, topic classification, intent classification, and domain classification.
For more information on NVIDIA RIVA's Text Classification, see the NVIDIA RIVA documentation.
sentiment_results = run_text_classify(
    server = speech_channel,
    model = "riva_text_classification_default",
    query = transcript_predicted.split("\n"),
)
sentiments = [x[1] for x in sentiment_results]
Client app to test text classification on Riva
Using model: riva_text_classification_default
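Before handing the per-line sentiments to a dashboard, it can help to summarize them. The sketch below assumes that `sentiments` is a list of label strings, one per transcript line. The helper name `sentiment_distribution_sketch` and the example labels are hypothetical, since the actual label set depends on the deployed Riva model.

```python
from collections import Counter


def sentiment_distribution_sketch(sentiments):
    """Return the share of each sentiment label in a list of labels.

    Hypothetical helper; assumes each element is a hashable label.
    """
    counts = Counter(sentiments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Example with made-up labels:
example = sentiment_distribution_sketch(
    ["positive", "neutral", "positive", "negative"]
)
print(example)
```

The same function can be applied to a sliding window of lines to follow how sentiment shifts over the course of a call.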
Finally, the sentiment analysis can be displayed via a dashboard UI. Users will be able to track the sentiments in a conversation in real time and investigate historical sentiments at various levels of granularity.
validation_sentiment_analysis(sentiments)