This notebook shows how the open source library Text Extensions for Pandas lets you use Pandas DataFrames and the Watson Natural Language Understanding service to analyze natural language text.
We start out with an excerpt from the plot synopsis from the Wikipedia page for Monty Python and the Holy Grail. We pass this example document to the Watson Natural Language Understanding (NLU) service. Then we use Text Extensions for Pandas to convert the output of the Watson NLU service to Pandas DataFrames. Next, we perform an example analysis task both with and without Pandas to show how Pandas makes analyzing NLP information easier. Finally, we walk through all the different DataFrames that Text Extensions for Pandas can extract from the output of Watson Natural Language Understanding.
This notebook requires a Python 3.7 or later environment with the following packages:

* ibm-watson
* text_extensions_for_pandas

You can satisfy the dependency on text_extensions_for_pandas in either of two ways:

* Run pip install text_extensions_for_pandas before running this notebook. This command adds the library to your Python environment.
* Run this notebook from within the project's source tree. In that case, the code below falls back on the version of the library in your local copy of the source tree if the package is not installed in your Python environment.

# Core Python libraries
import json
import os
import sys
import pandas as pd
from typing import *
# IBM Watson libraries
import ibm_watson
import ibm_watson.natural_language_understanding_v1 as nlu
import ibm_cloud_sdk_core
# And of course we need the text_extensions_for_pandas library itself.
try:
    import text_extensions_for_pandas as tp
except ModuleNotFoundError as e:
    # If we're running from within the project source tree and the parent Python
    # environment doesn't have the text_extensions_for_pandas package, use the
    # version in the local source tree.
    if not os.getcwd().endswith("notebooks"):
        raise e
    if ".." not in sys.path:
        sys.path.insert(0, "..")
    import text_extensions_for_pandas as tp
In this part of the notebook, we will use the Watson Natural Language Understanding (NLU) service to extract key features from our example document.
You can create an instance of Watson NLU on the IBM Cloud for free by navigating to this page and clicking on the button marked "Get started free". You can also install your own instance of Watson NLU on OpenShift by using IBM Watson Natural Language Understanding for IBM Cloud Pak for Data.
You'll need two pieces of information to access your instance of Watson NLU: An API key and a service URL. If you're using Watson NLU on the IBM Cloud, you can find your API key and service URL in the IBM Cloud web UI. Navigate to the resource list and click on your instance of Natural Language Understanding to open the management UI for your service. Then click on the "Manage" tab to show a page with your API key and service URL.
The cell that follows assumes that you are using the environment variables IBM_API_KEY and IBM_SERVICE_URL to store your credentials. If you're running this notebook in Jupyter on your laptop, you can set these environment variables while starting up jupyter notebook or jupyter lab. For example:
IBM_API_KEY='<my API key>' \
IBM_SERVICE_URL='<my service URL>' \
jupyter lab
Alternatively, you can uncomment the first two lines of code below to set the IBM_API_KEY and IBM_SERVICE_URL environment variables directly.
Be careful not to store your API key in any publicly accessible location!
# If you need to embed your credentials inline, uncomment the following two lines and
# paste your credentials in the indicated locations.
# os.environ["IBM_API_KEY"] = "<API key goes here>"
# os.environ["IBM_SERVICE_URL"] = "<Service URL goes here>"
# Retrieve the API key for your Watson NLU service instance
if "IBM_API_KEY" not in os.environ:
    raise ValueError("Expected Watson NLU api key in the environment variable 'IBM_API_KEY'")
api_key = os.environ.get("IBM_API_KEY")

# Retrieve the service URL for your Watson NLU service instance
if "IBM_SERVICE_URL" not in os.environ:
    raise ValueError("Expected Watson NLU service URL in the environment variable 'IBM_SERVICE_URL'")
service_url = os.environ.get("IBM_SERVICE_URL")
This notebook uses the IBM Watson Python SDK to perform authentication on the IBM Cloud via the IAMAuthenticator class. See the IBM Watson Python SDK documentation for more information.
We start by using the API key and service URL from the previous cell to create an instance of the Python API for Watson NLU.
natural_language_understanding = ibm_watson.NaturalLanguageUnderstandingV1(
    version="2019-07-12",
    authenticator=ibm_cloud_sdk_core.authenticators.IAMAuthenticator(api_key)
)
natural_language_understanding.set_service_url(service_url)
natural_language_understanding
<ibm_watson.natural_language_understanding_v1.NaturalLanguageUnderstandingV1 at 0x7fc05134dc70>
Once you've opened a connection to the Watson NLU service, you can pass documents through the service by invoking the analyze() method.
The example document that we use here is an excerpt from the plot summary for Monty Python and the Holy Grail, drawn from the Wikipedia entry for that movie.
Let's show what the raw text looks like:
from IPython.display import display, HTML
doc_file = "../resources/holy_grail_short.txt"
with open(doc_file, "r") as f:
    doc_text = f.read()
display(HTML(f"<b>Document Text:</b><blockquote>{doc_text}</blockquote>"))
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
In the code below, we instruct Watson Natural Language Understanding to perform five different kinds of analysis on the example document: entities, keywords, relations, semantic roles, and syntax.
See the Watson NLU documentation for a full description of the types of analysis that NLU can perform.
# Make the request
response = natural_language_understanding.analyze(
    text=doc_text,
    # TODO: Use this URL once we've pushed the shortened document to GitHub
    #url="https://raw.githubusercontent.com/CODAIT/text-extensions-for-pandas/master/resources/holy_grail_short.txt",
    return_analyzed_text=True,
    features=nlu.Features(
        entities=nlu.EntitiesOptions(sentiment=True, mentions=True),
        keywords=nlu.KeywordsOptions(sentiment=True, emotion=True),
        relations=nlu.RelationsOptions(),
        semantic_roles=nlu.SemanticRolesOptions(),
        syntax=nlu.SyntaxOptions(sentences=True,
                                 tokens=nlu.SyntaxOptionsTokens(lemma=True, part_of_speech=True))
    )).get_result()
The response from the analyze() method is a Python dictionary. The dictionary contains an entry for each pass of analysis requested, plus some additional entries with metadata about the API request itself. Here's a list of the keys in response:
response.keys()
dict_keys(['usage', 'syntax', 'semantic_roles', 'relations', 'language', 'keywords', 'entities', 'analyzed_text'])
Let's use the information that Watson Natural Language Understanding has extracted from our example document to perform an example task: Find all the pronouns in the document, broken down by sentence.
This task could serve as first step to a number of more complex tasks, such as resolving anaphora (for example, associating "King Arthur" with "his" in the phrase "King Arthur and his squire, Patsy") or analyzing the relationship between sentiment and the gender of pronouns.
We'll start by doing this task using straight Python code that operates directly over the output of Watson NLU's analyze() method. Then we'll redo the task using Pandas DataFrames and Text Extensions for Pandas. This exercise will show how Pandas DataFrames can represent the intermediate data structures of an NLP application in a way that is both easier to understand and easier to manipulate with less code.
Let's begin.
All the information that we need to perform our task is in the "syntax" section of the response we captured above from Watson NLU's analyze() method. Syntax analysis captures a large amount of information, so the "syntax" section of the response is very verbose.
For reference, here's the text of our example document again:
display(HTML(f"<b>Document Text:</b><blockquote>{doc_text}</blockquote>"))
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
And here's the output of Watson NLU's syntax analysis, converted to a string:
response["syntax"]
{'tokens': [{'text': 'In', 'part_of_speech': 'ADP', 'location': [0, 2], 'lemma': 'in'}, {'text': 'AD', 'part_of_speech': 'PROPN', 'location': [3, 5], 'lemma': 'Ad'}, {'text': '932', 'part_of_speech': 'NUM', 'location': [6, 9]}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [9, 10]}, {'text': 'King', 'part_of_speech': 'PROPN', 'location': [11, 15], 'lemma': 'King'}, {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [16, 22]}, {'text': 'and', 'part_of_speech': 'CCONJ', 'location': [23, 26], 'lemma': 'and'}, {'text': 'his', 'part_of_speech': 'PRON', 'location': [27, 30], 'lemma': 'his'}, {'text': 'squire', 'part_of_speech': 'NOUN', 'location': [31, 37], 'lemma': 'squire'}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [37, 38]}, {'text': 'Patsy', 'part_of_speech': 'PROPN', 'location': [39, 44], 'lemma': 'Patsy'}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [44, 45]}, {'text': 'travel', 'part_of_speech': 'NOUN', 'location': [46, 52], 'lemma': 'travel'}, {'text': 'throughout', 'part_of_speech': 'ADP', 'location': [53, 63], 'lemma': 'throughout'}, {'text': 'Britain', 'part_of_speech': 'PROPN', 'location': [64, 71]}, {'text': 'searching', 'part_of_speech': 'NOUN', 'location': [72, 81], 'lemma': 'searching'}, {'text': 'for', 'part_of_speech': 'ADP', 'location': [82, 85], 'lemma': 'for'}, {'text': 'men', 'part_of_speech': 'NOUN', 'location': [86, 89], 'lemma': 'man'}, {'text': 'to', 'part_of_speech': 'PART', 'location': [90, 92], 'lemma': 'to'}, {'text': 'join', 'part_of_speech': 'VERB', 'location': [93, 97], 'lemma': 'join'}, {'text': 'the', 'part_of_speech': 'DET', 'location': [98, 101], 'lemma': 'the'}, {'text': 'Knights', 'part_of_speech': 'PROPN', 'location': [102, 109], 'lemma': 'Knight'}, {'text': 'of', 'part_of_speech': 'ADP', 'location': [110, 112], 'lemma': 'of'}, {'text': 'the', 'part_of_speech': 'DET', 'location': [113, 116], 'lemma': 'the'}, {'text': 'Round', 'part_of_speech': 'ADJ', 'location': [117, 122], 'lemma': 'round'}, 
{'text': 'Table', 'part_of_speech': 'NOUN', 'location': [123, 128], 'lemma': 'table'}, {'text': '.', 'part_of_speech': 'PUNCT', 'location': [128, 129]}, {'text': 'Along', 'part_of_speech': 'ADP', 'location': [130, 135], 'lemma': 'along'}, {'text': 'the', 'part_of_speech': 'DET', 'location': [136, 139], 'lemma': 'the'}, {'text': 'way', 'part_of_speech': 'NOUN', 'location': [140, 143], 'lemma': 'way'}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [143, 144]}, {'text': 'he', 'part_of_speech': 'PRON', 'location': [145, 147], 'lemma': 'he'}, {'text': 'recruits', 'part_of_speech': 'VERB', 'location': [148, 156], 'lemma': 'recruit'}, {'text': 'Sir', 'part_of_speech': 'PROPN', 'location': [157, 160], 'lemma': 'Sir'}, {'text': 'Bedevere', 'part_of_speech': 'PROPN', 'location': [161, 169]}, {'text': 'the', 'part_of_speech': 'DET', 'location': [170, 173], 'lemma': 'the'}, {'text': 'Wise', 'part_of_speech': 'PROPN', 'location': [174, 178], 'lemma': 'Wise'}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [178, 179]}, {'text': 'Sir', 'part_of_speech': 'PROPN', 'location': [180, 183], 'lemma': 'Sir'}, {'text': 'Lancelot', 'part_of_speech': 'PROPN', 'location': [184, 192]}, {'text': 'the', 'part_of_speech': 'DET', 'location': [193, 196], 'lemma': 'the'}, {'text': 'Brave', 'part_of_speech': 'PROPN', 'location': [197, 202], 'lemma': 'Brave'}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [202, 203]}, {'text': 'Sir', 'part_of_speech': 'PROPN', 'location': [204, 207], 'lemma': 'Sir'}, {'text': 'Galahad', 'part_of_speech': 'PROPN', 'location': [208, 215]}, {'text': 'the', 'part_of_speech': 'DET', 'location': [216, 219], 'lemma': 'the'}, {'text': 'Pure', 'part_of_speech': 'PROPN', 'location': [220, 224]}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [224, 225]}, {'text': 'Sir', 'part_of_speech': 'PROPN', 'location': [226, 229], 'lemma': 'Sir'}, {'text': 'Robin', 'part_of_speech': 'PROPN', 'location': [230, 235], 'lemma': 'Robin'}, {'text': 'the', 
'part_of_speech': 'DET', 'location': [236, 239], 'lemma': 'the'}, {'text': 'Not', 'part_of_speech': 'PROPN', 'location': [240, 243]}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [243, 244]}, {'text': 'Quite', 'part_of_speech': 'PROPN', 'location': [244, 249]}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [249, 250]}, {'text': 'So', 'part_of_speech': 'ADV', 'location': [250, 252], 'lemma': 'so'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [252, 253]}, {'text': 'Brave', 'part_of_speech': 'PROPN', 'location': [253, 258], 'lemma': 'Brave'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [258, 259]}, {'text': 'as', 'part_of_speech': 'ADP', 'location': [259, 261], 'lemma': 'as'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [261, 262]}, {'text': 'Sir', 'part_of_speech': 'PROPN', 'location': [262, 265], 'lemma': 'Sir'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [265, 266]}, {'text': 'Lancelot', 'part_of_speech': 'PROPN', 'location': [266, 274]}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [274, 275]}, {'text': 'and', 'part_of_speech': 'CCONJ', 'location': [276, 279], 'lemma': 'and'}, {'text': 'Sir', 'part_of_speech': 'PROPN', 'location': [280, 283], 'lemma': 'Sir'}, {'text': 'Not', 'part_of_speech': 'ADV', 'location': [284, 287], 'lemma': 'not'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [287, 288]}, {'text': 'Appearing', 'part_of_speech': 'PROPN', 'location': [288, 297]}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [297, 298]}, {'text': 'in', 'part_of_speech': 'ADP', 'location': [298, 300], 'lemma': 'in'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [300, 301]}, {'text': 'this', 'part_of_speech': 'PRON', 'location': [301, 305], 'lemma': 'this'}, {'text': '-', 'part_of_speech': 'PUNCT', 'location': [305, 306]}, {'text': 'Film', 'part_of_speech': 'PROPN', 'location': [306, 310], 'lemma': 'Film'}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [310, 311]}, {'text': 'along', 
'part_of_speech': 'ADP', 'location': [312, 317], 'lemma': 'along'}, {'text': 'with', 'part_of_speech': 'ADP', 'location': [318, 322], 'lemma': 'with'}, {'text': 'their', 'part_of_speech': 'PRON', 'location': [323, 328], 'lemma': 'their'}, {'text': 'squires', 'part_of_speech': 'NOUN', 'location': [329, 336], 'lemma': 'squire'}, {'text': 'and', 'part_of_speech': 'CCONJ', 'location': [337, 340], 'lemma': 'and'}, {'text': 'Robin', 'part_of_speech': 'PROPN', 'location': [341, 346], 'lemma': 'Robin'}, {'text': "'s", 'part_of_speech': 'PART', 'location': [346, 348], 'lemma': "'s"}, {'text': 'troubadours', 'part_of_speech': 'NOUN', 'location': [349, 360], 'lemma': 'troubadour'}, {'text': '.', 'part_of_speech': 'PUNCT', 'location': [360, 361]}, {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [362, 368]}, {'text': 'leads', 'part_of_speech': 'VERB', 'location': [369, 374], 'lemma': 'lead'}, {'text': 'the', 'part_of_speech': 'DET', 'location': [375, 378], 'lemma': 'the'}, {'text': 'men', 'part_of_speech': 'NOUN', 'location': [379, 382], 'lemma': 'man'}, {'text': 'to', 'part_of_speech': 'ADP', 'location': [383, 385], 'lemma': 'to'}, {'text': 'Camelot', 'part_of_speech': 'PROPN', 'location': [386, 393]}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [393, 394]}, {'text': 'but', 'part_of_speech': 'CCONJ', 'location': [395, 398], 'lemma': 'but'}, {'text': 'upon', 'part_of_speech': 'ADP', 'location': [399, 403], 'lemma': 'upon'}, {'text': 'further', 'part_of_speech': 'ADJ', 'location': [404, 411], 'lemma': 'far'}, {'text': 'consideration', 'part_of_speech': 'NOUN', 'location': [412, 425], 'lemma': 'consideration'}, {'text': '(', 'part_of_speech': 'PUNCT', 'location': [426, 427]}, {'text': 'thanks', 'part_of_speech': 'NOUN', 'location': [427, 433], 'lemma': 'thanks'}, {'text': 'to', 'part_of_speech': 'ADP', 'location': [434, 436], 'lemma': 'to'}, {'text': 'a', 'part_of_speech': 'DET', 'location': [437, 438], 'lemma': 'a'}, {'text': 'musical', 'part_of_speech': 
'ADJ', 'location': [439, 446], 'lemma': 'musical'}, {'text': 'number', 'part_of_speech': 'NOUN', 'location': [447, 453], 'lemma': 'number'}, {'text': ')', 'part_of_speech': 'PUNCT', 'location': [453, 454]}, {'text': 'he', 'part_of_speech': 'PRON', 'location': [455, 457], 'lemma': 'he'}, {'text': 'decides', 'part_of_speech': 'VERB', 'location': [458, 465], 'lemma': 'decide'}, {'text': 'not', 'part_of_speech': 'PART', 'location': [466, 469], 'lemma': 'not'}, {'text': 'to', 'part_of_speech': 'PART', 'location': [470, 472], 'lemma': 'to'}, {'text': 'go', 'part_of_speech': 'VERB', 'location': [473, 475], 'lemma': 'go'}, {'text': 'there', 'part_of_speech': 'ADV', 'location': [476, 481], 'lemma': 'there'}, {'text': 'because', 'part_of_speech': 'SCONJ', 'location': [482, 489], 'lemma': 'because'}, {'text': 'it', 'part_of_speech': 'PRON', 'location': [490, 492], 'lemma': 'it'}, {'text': 'is', 'part_of_speech': 'AUX', 'location': [493, 495], 'lemma': 'be'}, {'text': '"', 'part_of_speech': 'PUNCT', 'location': [496, 497]}, {'text': 'a', 'part_of_speech': 'DET', 'location': [497, 498], 'lemma': 'a'}, {'text': 'silly', 'part_of_speech': 'ADJ', 'location': [499, 504], 'lemma': 'silly'}, {'text': 'place', 'part_of_speech': 'NOUN', 'location': [505, 510], 'lemma': 'place'}, {'text': '"', 'part_of_speech': 'PUNCT', 'location': [510, 511]}, {'text': '.', 'part_of_speech': 'PUNCT', 'location': [511, 512]}, {'text': 'As', 'part_of_speech': 'SCONJ', 'location': [513, 515], 'lemma': 'as'}, {'text': 'they', 'part_of_speech': 'PRON', 'location': [516, 520], 'lemma': 'they'}, {'text': 'turn', 'part_of_speech': 'VERB', 'location': [521, 525], 'lemma': 'turn'}, {'text': 'away', 'part_of_speech': 'ADP', 'location': [526, 530]}, {'text': ',', 'part_of_speech': 'PUNCT', 'location': [530, 531]}, {'text': 'God', 'part_of_speech': 'PROPN', 'location': [532, 535], 'lemma': 'God'}, {'text': '(', 'part_of_speech': 'PUNCT', 'location': [536, 537]}, {'text': 'an', 'part_of_speech': 'DET', 'location': 
[537, 539], 'lemma': 'a'}, {'text': 'image', 'part_of_speech': 'NOUN', 'location': [540, 545], 'lemma': 'image'}, {'text': 'of', 'part_of_speech': 'ADP', 'location': [546, 548], 'lemma': 'of'}, {'text': 'W.', 'part_of_speech': 'PROPN', 'location': [549, 551]}, {'text': 'G.', 'part_of_speech': 'PROPN', 'location': [552, 554]}, {'text': 'Grace', 'part_of_speech': 'PROPN', 'location': [555, 560], 'lemma': 'Grace'}, {'text': ')', 'part_of_speech': 'PUNCT', 'location': [560, 561]}, {'text': 'speaks', 'part_of_speech': 'VERB', 'location': [562, 568], 'lemma': 'speak'}, {'text': 'to', 'part_of_speech': 'ADP', 'location': [569, 571], 'lemma': 'to'}, {'text': 'them', 'part_of_speech': 'PRON', 'location': [572, 576], 'lemma': 'they'}, {'text': 'and', 'part_of_speech': 'CCONJ', 'location': [577, 580], 'lemma': 'and'}, {'text': 'gives', 'part_of_speech': 'VERB', 'location': [581, 586], 'lemma': 'give'}, {'text': 'Arthur', 'part_of_speech': 'PROPN', 'location': [587, 593]}, {'text': 'the', 'part_of_speech': 'DET', 'location': [594, 597], 'lemma': 'the'}, {'text': 'task', 'part_of_speech': 'NOUN', 'location': [598, 602], 'lemma': 'task'}, {'text': 'of', 'part_of_speech': 'SCONJ', 'location': [603, 605], 'lemma': 'of'}, {'text': 'finding', 'part_of_speech': 'VERB', 'location': [606, 613], 'lemma': 'find'}, {'text': 'the', 'part_of_speech': 'DET', 'location': [614, 617], 'lemma': 'the'}, {'text': 'Holy', 'part_of_speech': 'PROPN', 'location': [618, 622]}, {'text': 'Grail', 'part_of_speech': 'PROPN', 'location': [623, 628]}, {'text': '.', 'part_of_speech': 'PUNCT', 'location': [628, 629]}], 'sentences': [{'text': 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.', 'location': [0, 129]}, {'text': "Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires 
and Robin's troubadours.", 'location': [130, 361]}, {'text': 'Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place".', 'location': [362, 512]}, {'text': 'As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.', 'location': [513, 629]}]}
Buried in the above data structure is all the information we need to perform our example task: the location and part of speech of each token, plus the location of each sentence.
The Python code in the next cell uses this information to construct a list of pronouns in each sentence in the document.
import collections

# Create a data structure to hold a mapping from sentence identifier
# to a list of pronouns. This step requires defining sentence ids.
def sentence_id(sentence_record: Dict[str, Any]):
    return tuple(sentence_record["location"])

pronouns_by_sentence_id = collections.defaultdict(list)

# Pass 1: Use nested for loops to identify pronouns and match them with
# their containing sentences.
# Running time: O(num_tokens * num_sentences), i.e. O(document_size^2)
for t in response["syntax"]["tokens"]:
    pos_str = t["part_of_speech"]  # Part-of-speech tag, already a string
    if pos_str == "PRON":
        found_sentence = False
        for s in response["syntax"]["sentences"]:
            if (t["location"][0] >= s["location"][0]
                    and t["location"][1] <= s["location"][1]):
                found_sentence = True
                pronouns_by_sentence_id[sentence_id(s)].append(t)
        if not found_sentence:
            raise ValueError(f"Token {t} is not in any sentence")
pass  # Make JupyterLab syntax highlighting happy

# Pass 2: Translate sentence identifiers to full sentence metadata.
sentence_id_to_sentence = {sentence_id(s): s
                           for s in response["syntax"]["sentences"]}
result = [
    {
        "sentence": sentence_id_to_sentence[key],
        "pronouns": pronouns
    }
    for key, pronouns in pronouns_by_sentence_id.items()
]
result
[{'sentence': {'text': 'In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table.', 'location': [0, 129]}, 'pronouns': [{'text': 'his', 'part_of_speech': 'PRON', 'location': [27, 30], 'lemma': 'his'}]}, {'sentence': {'text': "Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours.", 'location': [130, 361]}, 'pronouns': [{'text': 'he', 'part_of_speech': 'PRON', 'location': [145, 147], 'lemma': 'he'}, {'text': 'this', 'part_of_speech': 'PRON', 'location': [301, 305], 'lemma': 'this'}, {'text': 'their', 'part_of_speech': 'PRON', 'location': [323, 328], 'lemma': 'their'}]}, {'sentence': {'text': 'Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place".', 'location': [362, 512]}, 'pronouns': [{'text': 'he', 'part_of_speech': 'PRON', 'location': [455, 457], 'lemma': 'he'}, {'text': 'it', 'part_of_speech': 'PRON', 'location': [490, 492], 'lemma': 'it'}]}, {'sentence': {'text': 'As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.', 'location': [513, 629]}, 'pronouns': [{'text': 'they', 'part_of_speech': 'PRON', 'location': [516, 520], 'lemma': 'they'}, {'text': 'them', 'part_of_speech': 'PRON', 'location': [572, 576], 'lemma': 'they'}]}]
The code above is quite complex given the simplicity of the task. You would need to stare at the previous cell for a few minutes to convince yourself that the algorithm is correct. This implementation also has scalability issues: The worst-case running time of the nested for loops section is proportional to the square of the document length.
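As an aside, the quadratic inner loop isn't even necessary at this level of abstraction: because the sentences are sorted and non-overlapping, a binary search over their begin offsets finds each token's sentence in O(log n) time. Here's a small self-contained sketch of that idea, using the sentence offsets from this document rather than the live Watson NLU response:

```python
import bisect

# Sentence locations from the "syntax" output above, sorted by begin offset.
sentence_locations = [(0, 129), (130, 361), (362, 512), (513, 629)]
sentence_begins = [loc[0] for loc in sentence_locations]

def containing_sentence(token_location):
    """Find the sentence containing a token via binary search."""
    begin, end = token_location
    # Rightmost sentence whose begin offset is <= the token's begin offset.
    i = bisect.bisect_right(sentence_begins, begin) - 1
    if i < 0 or end > sentence_locations[i][1]:
        raise ValueError(f"Token at {token_location} is not in any sentence")
    return sentence_locations[i]

print(containing_sentence((145, 147)))  # the token "he" -> (130, 361)
```

Even so, this kind of hand-rolled index is more bookkeeping code to write and verify.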
We can do better.
Let's revisit the example task we just performed in the previous cell. Again, the task is: Find all the pronouns in the document, broken down by sentence. This time around, let's perform this task using Pandas.
Text Extensions for Pandas includes a function parse_response() that turns the output of Watson NLU's analyze() method into a dictionary of Pandas DataFrames. Let's run our response object through that conversion.
dfs = tp.io.watson.nlu.parse_response(response)
dfs.keys()
dict_keys(['syntax', 'entities', 'entity_mentions', 'keywords', 'relations', 'semantic_roles'])
The output of each analysis pass that Watson NLU performed is now a DataFrame. Let's look at the DataFrame for the "syntax" pass:
syntax_df = dfs["syntax"]
syntax_df
span | part_of_speech | lemma | sentence | |
---|---|---|---|---|
0 | [0, 2): 'In' | ADP | in | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [3, 5): 'AD' | PROPN | Ad | [0, 129): 'In AD 932, King Arthur and his squi... |
2 | [6, 9): '932' | NUM | None | [0, 129): 'In AD 932, King Arthur and his squi... |
3 | [9, 10): ',' | PUNCT | None | [0, 129): 'In AD 932, King Arthur and his squi... |
4 | [11, 15): 'King' | PROPN | King | [0, 129): 'In AD 932, King Arthur and his squi... |
... | ... | ... | ... | ... |
142 | [606, 613): 'finding' | VERB | find | [513, 629): 'As they turn away, God (an image ... |
143 | [614, 617): 'the' | DET | the | [513, 629): 'As they turn away, God (an image ... |
144 | [618, 622): 'Holy' | PROPN | None | [513, 629): 'As they turn away, God (an image ... |
145 | [623, 628): 'Grail' | PROPN | None | [513, 629): 'As they turn away, God (an image ... |
146 | [628, 629): '.' | PUNCT | None | [513, 629): 'As they turn away, God (an image ... |
147 rows × 4 columns
The DataFrame has one row for every token in the document. Each row has information on the span of the token, its part of speech, its lemmatized form, and the span of the containing sentence.
Let's use this DataFrame to perform our example task a second time.
pronouns_by_sentence = syntax_df[syntax_df["part_of_speech"] == "PRON"][["sentence", "span"]]
pronouns_by_sentence
sentence | span | |
---|---|---|
7 | [0, 129): 'In AD 932, King Arthur and his squi... | [27, 30): 'his' |
31 | [130, 361): 'Along the way, he recruits Sir Be... | [145, 147): 'he' |
73 | [130, 361): 'Along the way, he recruits Sir Be... | [301, 305): 'this' |
79 | [130, 361): 'Along the way, he recruits Sir Be... | [323, 328): 'their' |
104 | [362, 512): 'Arthur leads the men to Camelot, ... | [455, 457): 'he' |
111 | [362, 512): 'Arthur leads the men to Camelot, ... | [490, 492): 'it' |
120 | [513, 629): 'As they turn away, God (an image ... | [516, 520): 'they' |
135 | [513, 629): 'As they turn away, God (an image ... | [572, 576): 'them' |
That's it. With the DataFrame version of this data, we can perform our example task with one line of code.
Specifically, we use a Pandas selection condition to filter out the tokens that aren't pronouns, and then we project down to the columns containing sentence and token spans. The result is another DataFrame that we can display directly in our Jupyter notebook.
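This filter-and-project pattern is ordinary Pandas, so everything Pandas offers composes with it. A minimal sketch with hypothetical string columns standing in for the span extension types, showing the same selection plus one natural follow-up, counting pronouns per sentence with groupby:

```python
import pandas as pd

# Hypothetical miniature token table (plain strings instead of spans).
tokens = pd.DataFrame({
    "text": ["King", "his", "he", "it"],
    "part_of_speech": ["PROPN", "PRON", "PRON", "PRON"],
    "sentence": ["s1", "s1", "s2", "s2"],
})

# Selection condition + projection, exactly as in the cell above:
pronouns = tokens[tokens["part_of_speech"] == "PRON"][["sentence", "text"]]

# Follow-up: count pronouns per sentence.
counts = pronouns.groupby("sentence").size()
print(counts.to_dict())  # {'s1': 1, 's2': 2}
```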
Let's take a moment to drill into the internals of the DataFrames we just used. For reference, here are the first three rows of the syntax analysis DataFrame:
syntax_df.head(3)
span | part_of_speech | lemma | sentence | |
---|---|---|---|---|
0 | [0, 2): 'In' | ADP | in | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [3, 5): 'AD' | PROPN | Ad | [0, 129): 'In AD 932, King Arthur and his squi... |
2 | [6, 9): '932' | NUM | None | [0, 129): 'In AD 932, King Arthur and his squi... |
And here is that DataFrame's data type information:
syntax_df.dtypes
span                   SpanDtype
part_of_speech            object
lemma                     object
sentence          TokenSpanDtype
dtype: object
Two of the columns in this DataFrame — "span" and "sentence" — contain extension types from the Text Extensions for Pandas library. Let's look first at the "span" column.
The "span" column is stored internally using the class SpanArray from Text Extensions for Pandas. SpanArray is a subclass of ExtensionArray, the base class for custom 1-D array types in Pandas. You can use the property pandas.Series.array to access the ExtensionArray behind any Pandas extension type:
print(syntax_df["span"].array)
<SpanArray> [ [0, 2): 'In', [3, 5): 'AD', [6, 9): '932', [9, 10): ',', [11, 15): 'King', [16, 22): 'Arthur', [23, 26): 'and', [27, 30): 'his', [31, 37): 'squire', [37, 38): ',', ... [581, 586): 'gives', [587, 593): 'Arthur', [594, 597): 'the', [598, 602): 'task', [603, 605): 'of', [606, 613): 'finding', [614, 617): 'the', [618, 622): 'Holy', [623, 628): 'Grail', [628, 629): '.'] Length: 147, dtype: SpanDtype
Internally, a SpanArray is stored as NumPy arrays of begin and end offsets, plus a Python string containing the target text. You can access this internal data as properties if your application needs that information:
syntax_df["span"].array.begin[:10], syntax_df["span"].array.end[:10]
(array([ 0, 3, 6, 9, 11, 16, 23, 27, 31, 37]), array([ 2, 5, 9, 10, 15, 22, 26, 30, 37, 38]))
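Because these offsets are plain NumPy arrays, span-level computations vectorize naturally. A small sketch with made-up offsets (not the arrays above) computing every span's length in one vectorized operation:

```python
import numpy as np

# Hypothetical begin/end offset arrays like those a SpanArray stores.
begin = np.array([0, 3, 6, 9, 11])
end = np.array([2, 5, 9, 10, 15])

lengths = end - begin      # vectorized span lengths, no Python loop
print(lengths.tolist())    # [2, 2, 3, 1, 4]
```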
You can also convert an individual element of the array into a Python object of type Span:
span_obj = syntax_df["span"].array[0]
print(f"\"{span_obj}\" is an object of type {type(span_obj)}")
"[0, 2): 'In'" is an object of type <class 'text_extensions_for_pandas.array.span.Span'>
Or you can convert the entire array (or a slice of it) into Python objects, one object per span:
syntax_df["span"].iloc[:10].to_numpy()
array([[0, 2): 'In', [3, 5): 'AD', [6, 9): '932', [9, 10): ',', [11, 15): 'King', [16, 22): 'Arthur', [23, 26): 'and', [27, 30): 'his', [31, 37): 'squire', [37, 38): ','], dtype=object)
A SpanArray can also render itself using Jupyter Notebook callbacks. To see the HTML representation of the SpanArray, pass the array object to Jupyter's display() function, or make that object the last line of the cell, as in the following example:
# Show the first 10 tokens in context
syntax_df["span"].iloc[:10].array
In AD 932 , King Arthur and his squire , Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
Let's take another look at our DataFrame of syntax information:
syntax_df.head(3)
span | part_of_speech | lemma | sentence | |
---|---|---|---|---|
0 | [0, 2): 'In' | ADP | in | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [3, 5): 'AD' | PROPN | Ad | [0, 129): 'In AD 932, King Arthur and his squi... |
2 | [6, 9): '932' | NUM | None | [0, 129): 'In AD 932, King Arthur and his squi... |
The "sentence" column is backed by an object of type TokenSpanArray. TokenSpanArray, another extension type from Text Extensions for Pandas, is a version of SpanArray for representing a set of spans that are constrained to begin and end on token boundaries. In addition to all the functionality of a SpanArray, a TokenSpanArray encodes additional information about the relationships between its spans and a tokenization of the document.
Here are the distinct elements of the "sentence" column rendered as HTML:
syntax_df["sentence"].unique()
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
As the table in the previous cell's output shows, each span in the TokenSpanArray has begin and end offsets in terms of both characters and tokens. Internally, the TokenSpanArray stores two pieces of information:
- the begin and end offsets of its spans, in terms of tokens
- a SpanArray of spans representing the tokens themselves

The TokenSpanArray object computes the character offsets and covered text of its spans on demand. Applications can access the internals of a TokenSpanArray via the properties begin_token, end_token, and document_tokens:
token_span_array = syntax_df["sentence"].unique()
print(f"""
Offset information (stored in the TokenSpanArray):
`begin_token` property: {token_span_array.begin_token}
`end_token` property: {token_span_array.end_token}
Token information (`document_tokens` property, shared among multiple TokenSpanArrays):
{token_span_array.document_tokens}
""")
Offset information (stored in the TokenSpanArray): `begin_token` property: [ 0 27 86 119] `end_token` property: [ 27 86 119 147] Token information (`document_tokens` property, shared among multiple TokenSpanArrays): <SpanArray> [ [0, 2): 'In', [3, 5): 'AD', [6, 9): '932', [9, 10): ',', [11, 15): 'King', [16, 22): 'Arthur', [23, 26): 'and', [27, 30): 'his', [31, 37): 'squire', [37, 38): ',', ... [581, 586): 'gives', [587, 593): 'Arthur', [594, 597): 'the', [598, 602): 'task', [603, 605): 'of', [606, 613): 'finding', [614, 617): 'the', [618, 622): 'Holy', [623, 628): 'Grail', [628, 629): '.'] Length: 147, dtype: SpanDtype
The extension types in Text Extensions for Pandas support the full set of Pandas array operations. For example, we can build up a DataFrame of the spans of all sentences in the document by applying pandas.DataFrame.drop_duplicates()
to the sentence
column:
syntax_df[["sentence"]].drop_duplicates()
sentence | |
---|---|
0 | [0, 129): 'In AD 932, King Arthur and his squi... |
27 | [130, 361): 'Along the way, he recruits Sir Be... |
86 | [362, 512): 'Arthur leads the men to Camelot, ... |
119 | [513, 629): 'As they turn away, God (an image ... |
Now that we've had an introduction to the Text Extensions for Pandas span types, let's take another look at the DataFrame that our "find pronouns by sentence" code produced:
pronouns_by_sentence
sentence | span | |
---|---|---|
7 | [0, 129): 'In AD 932, King Arthur and his squi... | [27, 30): 'his' |
31 | [130, 361): 'Along the way, he recruits Sir Be... | [145, 147): 'he' |
73 | [130, 361): 'Along the way, he recruits Sir Be... | [301, 305): 'this' |
79 | [130, 361): 'Along the way, he recruits Sir Be... | [323, 328): 'their' |
104 | [362, 512): 'Arthur leads the men to Camelot, ... | [455, 457): 'he' |
111 | [362, 512): 'Arthur leads the men to Camelot, ... | [490, 492): 'it' |
120 | [513, 629): 'As they turn away, God (an image ... | [516, 520): 'they' |
135 | [513, 629): 'As they turn away, God (an image ... | [572, 576): 'them' |
This DataFrame contains two columns backed by Text Extensions for Pandas span types:
pronouns_by_sentence.dtypes
sentence TokenSpanDtype span SpanDtype dtype: object
That means that we can use the full power of Pandas' high-level operations on this DataFrame. Let's build on the output of our earlier task to perform a more complex one: highlight all pronouns in sentences containing the word "Arthur".
mask = pronouns_by_sentence["sentence"].map(lambda s: s.covered_text).str.contains("Arthur")
pronouns_by_sentence["span"][mask].values
In AD 932, King Arthur and his squire, Patsy, travel throughout Britain searching for men to join the Knights of the Round Table. Along the way, he recruits Sir Bedevere the Wise, Sir Lancelot the Brave, Sir Galahad the Pure, Sir Robin the Not-Quite-So-Brave-as-Sir-Lancelot, and Sir Not-Appearing-in-this-Film, along with their squires and Robin's troubadours. Arthur leads the men to Camelot, but upon further consideration (thanks to a musical number) he decides not to go there because it is "a silly place". As they turn away, God (an image of W. G. Grace) speaks to them and gives Arthur the task of finding the Holy Grail.
Here's another variation: Pair each instance of the word "Arthur" with the pronouns that occur in the same sentence.
(
syntax_df[syntax_df["span"].array.covered_text == "Arthur"] # Find instances of "Arthur"
.merge(pronouns_by_sentence, on="sentence") # Match with pronouns in the same sentence
.rename(columns={"span_x": "arthur_span", "span_y": "pronoun_span"})
[["arthur_span", "pronoun_span", "sentence"]] # Reorder columns
)
arthur_span | pronoun_span | sentence | |
---|---|---|---|
0 | [16, 22): 'Arthur' | [27, 30): 'his' | [0, 129): 'In AD 932, King Arthur and his squi... |
1 | [362, 368): 'Arthur' | [455, 457): 'he' | [362, 512): 'Arthur leads the men to Camelot, ... |
2 | [362, 368): 'Arthur' | [490, 492): 'it' | [362, 512): 'Arthur leads the men to Camelot, ... |
3 | [587, 593): 'Arthur' | [516, 520): 'they' | [513, 629): 'As they turn away, God (an image ... |
4 | [587, 593): 'Arthur' | [572, 576): 'them' | [513, 629): 'As they turn away, God (an image ... |
The examples so far have used the DataFrame representation of Watson Natural Language Understanding's syntax analysis. In addition to syntax analysis, Watson NLU can perform several other types of analysis. Let's take a look at the DataFrames that Text Extensions for Pandas can produce from the output of Watson NLU.
We'll start by revisiting the results of our earlier code, which ran
dfs = tp.io.watson.nlu.parse_response(response)
over the response object that Watson NLU's Python API returned. dfs is a dictionary of DataFrames.
dfs.keys()
dict_keys(['syntax', 'entities', 'entity_mentions', 'keywords', 'relations', 'semantic_roles'])
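Because dfs is an ordinary Python dictionary of DataFrames, you can inspect it with standard dictionary and Pandas operations before diving into any one model's output. Here's a minimal sketch of that idea; the dictionary below is a toy stand-in with hypothetical data, not the real parse_response() output:

```python
import pandas as pd

# Toy stand-in for the dictionary of DataFrames that parse_response() returns
dfs_demo = {
    "syntax": pd.DataFrame({"lemma": ["in", "ad", "932"]}),
    "entities": pd.DataFrame({"text": ["King Arthur", "Patsy"]}),
}

# Summarize each DataFrame's shape for a quick overview of the response
shapes = {name: df.shape for name, df in dfs_demo.items()}
print(shapes)  # {'syntax': (3, 1), 'entities': (2, 1)}
```

The same loop works unchanged on the real dfs dictionary, since each value is a plain Pandas DataFrame.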
The "syntax" element of dfs
contains the syntax analysis DataFrame that we showed earlier.
Let's take a look at the other elements.
The "entities" element of dfs
contains the named entities that Watson Natural Language
Understanding found in the document.
dfs["entities"].head()
type | text | sentiment.label | sentiment.score | relevance | count | confidence | disambiguation.subtype | disambiguation.name | disambiguation.dbpedia_resource | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Person | Sir Bedevere | positive | 0.835873 | 0.950560 | 1 | 0.982315 | None | None | None |
1 | Person | King Arthur | neutral | 0.000000 | 0.720381 | 1 | 0.924937 | None | None | None |
2 | Person | Patsy | neutral | 0.000000 | 0.679300 | 1 | 0.830596 | None | None | None |
3 | Person | Sir Lancelot | positive | 0.835873 | 0.662902 | 1 | 0.956371 | [MusicalArtist, TVActor] | Sir_Lancelot_%28singer%29 | http://dbpedia.org/resource/Sir_Lancelot_%28si... |
4 | Person | Sir Galahad | positive | 0.835873 | 0.654170 | 1 | 0.948409 | None | None | None |
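Since the "entities" DataFrame has numeric columns such as relevance, ordinary Pandas operations like sort_values() apply directly. A minimal sketch, using a toy DataFrame with a subset of the columns above in place of dfs["entities"]:

```python
import pandas as pd

# Toy stand-in for dfs["entities"] with a subset of its columns
entities = pd.DataFrame({
    "text": ["Sir Bedevere", "King Arthur", "Patsy"],
    "relevance": [0.950560, 0.720381, 0.679300],
    "sentiment.score": [0.835873, 0.0, 0.0],
})

# Rank entities by how central the model considers them to the document
top = entities.sort_values("relevance", ascending=False).head(2)
print(top["text"].tolist())  # ['Sir Bedevere', 'King Arthur']
```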
The "entity_mentions" element of dfs
contains the locations of individual mentions of
entities from the "entities" DataFrame.
dfs["entity_mentions"].head()
type | text | span | confidence | |
---|---|---|---|---|
0 | Person | Sir Bedevere | [157, 169): 'Sir Bedevere' | 0.982315 |
1 | Person | King Arthur | [11, 22): 'King Arthur' | 0.924937 |
2 | Person | Patsy | [39, 44): 'Patsy' | 0.830596 |
3 | Person | Sir Lancelot | [180, 192): 'Sir Lancelot' | 0.956371 |
4 | Person | Sir Galahad | [204, 215): 'Sir Galahad' | 0.948409 |
Note that the DataFrame under "entity_mentions" may contain multiple mentions of the same name:
arthur_mentions = dfs["entity_mentions"][dfs["entity_mentions"]["text"] == "Arthur"]
arthur_mentions
type | text | span | confidence | |
---|---|---|---|---|
10 | Person | Arthur | [362, 368): 'Arthur' | 0.996876 |
11 | Person | Arthur | [587, 593): 'Arthur' | 0.973795 |
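Because each mention occupies its own row, standard Pandas aggregation can count how often each entity string is mentioned. A sketch of that pattern with a toy DataFrame mirroring the "entity_mentions" schema (span column omitted for simplicity):

```python
import pandas as pd

# Toy stand-in for dfs["entity_mentions"]
mentions = pd.DataFrame({
    "type": ["Person", "Person", "Person"],
    "text": ["Arthur", "Arthur", "Patsy"],
    "confidence": [0.996876, 0.973795, 0.830596],
})

# Number of mentions of each distinct entity string
mention_counts = mentions["text"].value_counts()
print(mention_counts["Arthur"])  # 2
```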
The "type" and "text" columns of the "entity_mentions" DataFrame refer back to the
columns with the same names in the "entities" DataFrame.
You can combine the global and local information about entities into a single DataFrame
using Pandas' DataFrame.merge()
method:
arthur_mentions.merge(dfs["entities"], on=["type", "text"], suffixes=["_mention", "_entity"])
type | text | span | confidence_mention | sentiment.label | sentiment.score | relevance | count | confidence_entity | disambiguation.subtype | disambiguation.name | disambiguation.dbpedia_resource | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Person | Arthur | [362, 368): 'Arthur' | 0.996876 | positive | 0.721919 | 0.311653 | 2 | 0.999918 | None | None | None |
1 | Person | Arthur | [587, 593): 'Arthur' | 0.973795 | positive | 0.721919 | 0.311653 | 2 | 0.999918 | None | None | None |
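The suffixes argument is what keeps the two overlapping confidence columns distinguishable after the merge. Here's the same pattern on a pair of toy one-row tables, so you can see exactly which columns come out:

```python
import pandas as pd

# Toy mention-level and entity-level tables sharing the "type" and "text" keys
mentions = pd.DataFrame({"type": ["Person"], "text": ["Arthur"], "confidence": [0.99]})
entities = pd.DataFrame({"type": ["Person"], "text": ["Arthur"], "confidence": [0.97]})

# suffixes disambiguates the overlapping "confidence" columns
merged = mentions.merge(entities, on=["type", "text"], suffixes=["_mention", "_entity"])
print(merged.columns.tolist())
# ['type', 'text', 'confidence_mention', 'confidence_entity']
```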
Watson Natural Language Understanding has several other models besides the entities
and syntax
models. Text Extensions for Pandas can also convert these other outputs. Here's the output of the keywords
model on our example document:
dfs["keywords"].head()
text | sentiment.label | sentiment.score | relevance | emotion.sadness | emotion.joy | emotion.fear | emotion.disgust | emotion.anger | count | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Sir Bedevere | positive | 0.835873 | 0.884359 | 0.031301 | 0.496318 | 0.135650 | 0.015545 | 0.022961 | 1 |
1 | King Arthur | neutral | 0.000000 | 0.850874 | 0.441230 | 0.330559 | 0.043714 | 0.020016 | 0.025905 | 1 |
2 | Sir Lancelot | positive | 0.835873 | 0.823645 | 0.031301 | 0.496318 | 0.135650 | 0.015545 | 0.022961 | 1 |
3 | image of W. G. Grace | positive | 0.721919 | 0.722026 | 0.044130 | 0.901205 | 0.039773 | 0.012838 | 0.027599 | 1 |
4 | musical number | neutral | 0.000000 | 0.621432 | 0.312246 | 0.174343 | 0.032726 | 0.077707 | 0.045592 | 1 |
Take a look at the notebook Sentiment_Analysis.ipynb for more information on the keywords
model and its sentiment-related outputs.
Watson Natural Language Understanding also has a relations
model that finds relationships between pairs of nouns:
dfs["relations"].head()
type | sentence_span | score | arguments.0.span | arguments.1.span | arguments.0.entities.type | arguments.1.entities.type | arguments.0.entities.text | arguments.1.entities.text | |
---|---|---|---|---|---|---|---|---|---|
0 | partOfMany | [130, 361): 'Along the way, he recruits Sir Be... | 0.610221 | [208, 215): 'Galahad' | [323, 328): 'their' | Person | Person | Galahad | their |
1 | partOfMany | [130, 361): 'Along the way, he recruits Sir Be... | 0.710112 | [266, 274): 'Lancelot' | [323, 328): 'their' | Person | Person | Lancelot | their |
2 | parentOf | [130, 361): 'Along the way, he recruits Sir Be... | 0.382100 | [323, 328): 'their' | [329, 336): 'squires' | Person | Person | their | squires |
3 | residesIn | [362, 512): 'Arthur leads the men to Camelot, ... | 0.492869 | [362, 368): 'Arthur' | [386, 393): 'Camelot' | Person | GeopoliticalEntity | King Arthur | Camelot |
4 | locatedAt | [362, 512): 'Arthur leads the men to Camelot, ... | 0.339446 | [379, 382): 'men' | [386, 393): 'Camelot' | Person | GeopoliticalEntity | men | Camelot |
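Since the "relations" DataFrame carries a numeric score column, low-confidence relations can be filtered out with ordinary boolean indexing. A minimal sketch, using a toy DataFrame with just two of the columns above in place of dfs["relations"]:

```python
import pandas as pd

# Toy stand-in for dfs["relations"] with just the columns we need
relations = pd.DataFrame({
    "type": ["partOfMany", "partOfMany", "parentOf", "residesIn"],
    "score": [0.610221, 0.710112, 0.382100, 0.492869],
})

# Keep only relations the model is reasonably confident about
confident = relations[relations["score"] >= 0.5]
print(confident["type"].tolist())  # ['partOfMany', 'partOfMany']
```

The 0.5 cutoff here is an arbitrary illustrative threshold; a real application would tune it to its own precision/recall needs.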
The semantic_roles
model identifies places where the document describes events and extracts a subject-verb-object triple for each such event:
dfs["semantic_roles"].head()
subject.text | sentence | object.text | action.verb.text | action.verb.tense | action.text | action.normalized | |
---|---|---|---|---|---|---|---|
0 | for men | In AD 932, King Arthur and his squire, Patsy, ... | the Knights of the Round Table | join | infinitive | join | join |
1 | he | Along the way, he recruits Sir Bedevere the Wi... | Sir Bedevere the Wise, Sir Lancelot the Brave,... | recruit | present | recruits | recruit |
2 | Arthur | Arthur leads the men to Camelot, but upon furt... | the men to Camelot | lead | present | leads | lead |
3 | he | Arthur leads the men to Camelot, but upon furt... | not to go there because it is "a silly place" | decide | present | decides | decide |
4 | he | Arthur leads the men to Camelot, but upon furt... | None | go | infinitive | go | go |
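Because the action.normalized column holds the base form of each verb, you can select every event for a particular action regardless of tense. A sketch with a toy DataFrame standing in for dfs["semantic_roles"]:

```python
import pandas as pd

# Toy stand-in for dfs["semantic_roles"] with a subset of its columns
roles = pd.DataFrame({
    "subject.text": ["he", "Arthur", "he"],
    "action.normalized": ["recruit", "lead", "decide"],
    "action.verb.tense": ["present", "present", "present"],
})

# Select every event whose normalized action is "lead"
lead_events = roles[roles["action.normalized"] == "lead"]
print(lead_events["subject.text"].tolist())  # ['Arthur']
```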
Take a look at our market intelligence tutorial to learn more about the semantic_roles
model.