%%html
<script>
function code_toggle() {
    if (code_shown) {
        $('div.input').hide('500');
        $('#toggleButton').val('Show Code');
    } else {
        $('div.input').show('500');
        $('#toggleButton').val('Hide Code');
    }
    code_shown = !code_shown;
}
$(document).ready(function() {
    code_shown = false;
    $('div.input').hide();
});
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
# preamble
import sys
sys.path.append("..")
import statnlpbook.transition as transition
This chapter is influenced by the EACL 2014 tutorial by Ryan McDonald and Joakim Nivre.
In the parsing chapter we saw how to develop a syntactic parser based on context-free grammars (CFG). In this chapter we will see how to develop a syntactic parser based on a different paradigm, dependency parsing.
The key idea in dependency parsing is that syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies (Nivre, 2008). More simply, syntax is represented as directed edges between words, commonly referred to as arcs in the literature. Thus, unlike the trees in CFG parsing, dependency trees have only terminal nodes (the words of the sentence), which can appear both as leaves and as non-leaf nodes. Here is the dependency graph for the sentence:
"Economic news had little effect on financial markets"
tokens = ["Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
arcs = {(1,0,"amod"), (2,1,"nsubj"), (2, 4, "dobj"), (4,3,"amod"), (4,5, "prep"), (5,7,"pmod"), (7,6,"amod")}
#transition.render_tree(tokens, arcs)
transition.render_displacy(*transition.to_displacy_graph(arcs, tokens))
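In code, each arc is represented as a (head index, dependent index, label) triple. As a quick sanity check (our own sketch, with the variables from the cell above repeated for completeness), we can print the arcs in readable form:

```python
tokens = ["Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
arcs = {(1, 0, "amod"), (2, 1, "nsubj"), (2, 4, "dobj"), (4, 3, "amod"),
        (4, 5, "prep"), (5, 7, "pmod"), (7, 6, "amod")}

# each arc (head, dependent, label) reads as: head --label--> dependent
for head, dependent, label in sorted(arcs):
    print(f"{tokens[head]} --{label}--> {tokens[dependent]}")
```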
Following Nivre (2008), the dependency parse of a sentence is a graph $\y = (\x, \a) $ where:

- the nodes $\x$ are the tokens of the sentence, and
- the arcs $\a$ are triples $(i, j, l)$, each denoting a dependency with head $\x_i$, dependent $\x_j$ and label $l$.
For a graph $\y$ to be a valid dependency tree, the following constraints must be obeyed:

- there is a single designated root node that no arc points to,
- every other node is the dependent of exactly one head (single-head constraint),
- the graph is connected and contains no cycles (acyclicity).
Note that the dependency parse shown above is not connected, since the period is left without an arc (strictly speaking it is a dependency forest). To ensure that dependency trees are well-formed, we introduce a special ROOT node, which serves as the head of the main verb of the sentence, as well as of the punctuation.
tokens = ["ROOT", "Alice", "saw", "Bob"]
arcs = {(0,2, "root"), (2,1,"nsubj"), (2,3,"dobj")}
transition.render_displacy(*transition.to_displacy_graph(arcs, tokens))
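To make the well-formedness constraints concrete, here is a small checker (our own sketch, not part of statnlpbook) that tests the single-head, connectedness and acyclicity conditions on a set of arcs:

```python
def is_well_formed(arcs, num_tokens):
    # a sketch of the well-formedness conditions; token 0 is ROOT
    heads = {}
    for head, dependent, _ in arcs:
        if dependent in heads:        # single-head: at most one head per token
            return False
        heads[dependent] = head
    # connectedness: every token except ROOT must have a head
    if set(heads) != set(range(1, num_tokens)):
        return False
    # acyclicity: following head links from any token must reach ROOT (0)
    for token in range(1, num_tokens):
        seen = set()
        while token != 0:
            if token in seen:         # revisiting a token means a cycle
                return False
            seen.add(token)
            token = heads[token]
    return True

# the ROOT-ed tree above is well-formed
print(is_well_formed({(0, 2, "root"), (2, 1, "nsubj"), (2, 3, "dobj")}, 4))
```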
A straightforward approach to dependency parsing would be, given a sentence $\x$, to enumerate all valid graphs for the sentence $\y\in\Ys_x$ and score them using an appropriate function $s_\params(\x,\y)$, a case of a structured prediction problem. While such an approach is possible, and there is a substantial body of work on it, commonly referred to as graph-based parsing (e.g. McDonald et al., 2006), in this note we will focus on transition-based approaches, which decompose the task into a sequence of action predictions that can be learned with a classifier.
To perform transition-based parsing we first need to define a transition system consisting of the following elements:
Configuration:

- a **stack** of partially processed tokens,
- a **buffer** of input tokens not yet processed,
- a set of dependency **arcs** created so far.
We further define two special configurations:

- the **initial** configuration, in which the stack contains only ROOT, the buffer contains all tokens of the sentence, and the arc set is empty;
- the **terminal** configuration, in which the buffer is empty and the stack contains at most the ROOT node.
Actions:

- **shift**: move the first token of the buffer to the top of the stack,
- **reduce**: pop the top of the stack (which must already have a head),
- **leftArc-label**: add an arc with the given label from the first token of the buffer to the top of the stack, and pop the stack,
- **rightArc-label**: add an arc with the given label from the top of the stack to the first token of the buffer, and move that token onto the stack.
Below we show a simple implementation of this transition system:
from collections import deque

class Configuration():
    def __init__(self, tokenized_sentence):
        # This implements the initial configuration for a sentence
        self.arcs = set()
        self.buffer = deque()
        self.sentence = tokenized_sentence
        for idx, token in enumerate(tokenized_sentence[1:], start=1):
            self.buffer.append({"index": idx, "form": token})
        self.stack = [{"index": 0, "form": "ROOT"}]
import copy

def parse(tokenized_sentence, actions):
    # This stores the (configuration, action) tuples generated
    transitions = []
    # Initialize the configuration
    configuration = Configuration(tokenized_sentence)
    transitions.append((copy.deepcopy(configuration), ""))
    for action in actions:
        if action == "shift":
            # move the first token of the buffer to the top of the stack
            token = configuration.buffer.popleft()
            configuration.stack.append(token)
        elif action == "reduce":
            # pop the top of the stack, which must already have a head
            configuration.stack.pop()
        elif action.startswith("leftArc"):
            # arc from the first token of the buffer to the top of the stack
            head = configuration.buffer[0]
            dependent = configuration.stack.pop()
            label = action.split("-")[1]
            configuration.arcs.add((int(head["index"]), int(dependent["index"]), label))
        elif action.startswith("rightArc"):
            # arc from the top of the stack to the first token of the buffer,
            # which is then moved onto the stack
            head = configuration.stack[-1]
            dependent = configuration.buffer.popleft()
            configuration.stack.append(dependent)
            label = action.split("-")[1]
            configuration.arcs.add((int(head["index"]), int(dependent["index"]), label))
        transitions.append((copy.deepcopy(configuration), action))
        if len(configuration.buffer) == 0 and len(configuration.stack) <= 1:
            transitions.append((copy.deepcopy(configuration), ""))
    return transitions
Let's see how we can parse the example sentence using the transition system defined above, assuming we are given the correct sequence of actions:
tokenized_sentence = ["ROOT", "Alice", "saw", "Bob"]
actions = ["shift", "leftArc-nsubj", "rightArc-root", "rightArc-dobj", "reduce", "reduce"]
transitions = parse(tokenized_sentence, actions)
transition.render_transitions_displacy(transitions, tokenized_sentence)
For the longer example sentence from the beginning of the chapter, the arc-eager derivation looks as follows:

| buffer | stack | action |
|---|---|---|
| ROOT Economic news had little effect on financial markets . | | INIT |
| Economic news had little effect on financial markets . | ROOT | shift |
| news had little effect on financial markets . | ROOT Economic | shift |
| news had little effect on financial markets . | ROOT | leftArc-amod |
| had little effect on financial markets . | ROOT news | shift |
| had little effect on financial markets . | ROOT | leftArc-nsubj |
| little effect on financial markets . | ROOT had | rightArc-root |
| effect on financial markets . | ROOT had little | shift |
| effect on financial markets . | ROOT had | leftArc-amod |
| on financial markets . | ROOT had effect | rightArc-dobj |
| financial markets . | ROOT had effect on | rightArc-prep |
| markets . | ROOT had effect on financial | shift |
| markets . | ROOT had effect on | leftArc-amod |
| . | ROOT had effect on markets | rightArc-pmod |
| . | ROOT had effect on | reduce |
| . | ROOT had effect | reduce |
| . | ROOT had | reduce |
| . | ROOT | reduce |
| | ROOT . | rightArc-p |
| | ROOT . | TERMINAL |
The key idea in transition-based dependency parsing is to convert graph prediction, a structured prediction problem, into a sequence of classification decisions that are guaranteed to produce a valid dependency tree. Thus a transition-based dependency parser is essentially a classifier that predicts the correct action for the current configuration.
The choice of classifier is free; we can use any classifier we like, for example a log-linear classification model. The features are defined so as to describe the configuration and the previous actions taken in a way that helps the classifier predict the correct action. For example, encoding that the word at the front of the buffer is "on" and the word at the top of the stack is "effect" is highly indicative of the rightArc-prep action that creates an arc between them. Such lexicalized features can be quite sparse, though, which is why recent work has looked into continuous representations for them (Chen and Manning, 2014).
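As a concrete illustration, a configuration can be turned into a sparse feature map along these lines (our own sketch; the feature templates are illustrative, far fewer than a real parser would use, and the names are made up):

```python
def extract_features(stack, buffer, previous_action):
    # feature templates describing the current configuration;
    # real parsers add many more templates (and e.g. POS tags)
    features = {}
    s0 = stack[-1]["form"] if stack else "<empty>"
    b0 = buffer[0]["form"] if buffer else "<empty>"
    features["stack_top=" + s0] = 1.0
    features["buffer_front=" + b0] = 1.0
    features["pair=" + s0 + "+" + b0] = 1.0    # lexicalized, hence sparse
    features["prev_action=" + previous_action] = 1.0
    return features

# the situation described in the text: "effect" on the stack, "on" in the buffer
print(extract_features([{"index": 4, "form": "effect"}],
                       [{"index": 5, "form": "on"}],
                       "rightArc-dobj"))
```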
The next question is where we get the training data to train the classifier, i.e. configurations labeled with the correct action. However, the training data we are typically provided with consists of sentences labeled with their final dependency trees. Thus, we need to develop a function that, given a sentence and its dependency tree, can "reverse-engineer" the parsing process to recover the sequence of actions used to construct it. This function is often referred to as the oracle (a name inspired by the oracles of antiquity), and it is usually a set of heuristics that returns the correct sequence of actions by looking at the dependency tree. A different way to think about it is as a human annotator demonstrating how to construct the parse tree using the transition system defined.
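For the arc-eager system, such an oracle can be sketched as follows (a simplified static oracle, assuming a projective gold tree given as `head_of`/`label_of` dictionaries of our own devising; the final reduce actions that empty the stack once the buffer is exhausted are omitted):

```python
def oracle(num_tokens, head_of, label_of):
    # derive the arc-eager action sequence from a gold (projective) tree;
    # head_of[d] is the gold head of token d, label_of[d] its arc label
    stack, buffer, actions = [0], list(range(1, num_tokens)), []
    attached = set()
    while buffer:
        s, b = stack[-1], buffer[0]
        if head_of.get(s) == b:                  # buffer front heads stack top
            actions.append("leftArc-" + label_of[s])
            attached.add(stack.pop())
        elif head_of.get(b) == s:                # stack top heads buffer front
            actions.append("rightArc-" + label_of[b])
            attached.add(b)
            stack.append(buffer.pop(0))
        elif s in attached and any(head_of.get(k) == b or head_of.get(b) == k
                                   for k in stack[:-1]):
            actions.append("reduce")             # stack top is done; expose a
            stack.pop()                          # token below that b relates to
        else:
            actions.append("shift")
            stack.append(buffer.pop(0))
    return actions

# gold tree for "ROOT Alice saw Bob"
print(oracle(4, {1: 2, 2: 0, 3: 2}, {1: "nsubj", 2: "root", 3: "dobj"}))
```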
The transition system we defined above is known as the arc-eager system, due to Nivre (2003). Different transition systems have been proposed, another popular choice being the arc-standard transition system, which has three actions, left-arc, right-arc and shift, defined differently from the arc-eager ones. As expected, different transition systems need different oracles to extract configurations labeled with the correct transition action from sentences annotated with dependency trees.
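For comparison, in arc-standard the two arc actions operate on the top two items of the stack rather than on the stack and the buffer; a minimal sketch (our own, using token indices only):

```python
def arc_standard_step(stack, buffer, arcs, action):
    # arc-standard: both arc actions act on the top two stack items
    if action == "shift":
        stack.append(buffer.pop(0))
    elif action.startswith("leftArc"):
        # top of the stack becomes head of the second item, which is popped
        head, dependent = stack[-1], stack.pop(-2)
        arcs.add((head, dependent, action.split("-")[1]))
    elif action.startswith("rightArc"):
        # second item becomes head of the top, which is popped
        dependent = stack.pop()
        arcs.add((stack[-1], dependent, action.split("-")[1]))

# "ROOT Alice saw Bob" again, now with arc-standard actions
stack, buffer, arcs = [0], [1, 2, 3], set()
for action in ["shift", "shift", "leftArc-nsubj", "shift",
               "rightArc-dobj", "rightArc-root"]:
    arc_standard_step(stack, buffer, arcs, action)
print(arcs)  # the same tree as before, via a different action sequence
```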
An important restriction that both the arc-eager and arc-standard transition systems share is that they can only produce projective dependency trees, i.e. trees whose arcs do not cross when the words are drawn in a fixed left-to-right order. This restriction is violated when long-distance dependencies and free word order need to be taken into account, as in the sentence below, in which the arc from "on" to "What" crosses other arcs of the tree:
tokens = ["ROOT", "What", "did", "economic", "news", "have", "little", "effect", "on", "?"]
arcs = {(0,5, "root"), (0,9,"p"), (8,1,"pobj"), (5,2,"aux"), (4,3,"amod"), (5,4,"nsubj"), (5, 7, "dobj"), (7,6,"amod"), (5,8, "prep")}
transition.render_displacy(*transition.to_displacy_graph(arcs, tokens))
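Projectivity can be checked directly: two arcs cross iff one of them starts strictly inside the other and ends strictly outside it. A small checker (`is_projective` is our own helper, not statnlpbook code):

```python
def is_projective(arcs):
    # treat each arc as an interval between its head and dependent positions
    intervals = [tuple(sorted((head, dependent))) for head, dependent, _ in arcs]
    for lo1, hi1 in intervals:
        for lo2, hi2 in intervals:
            # crossing: one arc starts inside the other but ends outside it
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True

# the earlier "Economic news ..." tree is projective ...
print(is_projective({(1, 0, "amod"), (2, 1, "nsubj"), (2, 4, "dobj"), (4, 3, "amod"),
                     (4, 5, "prep"), (5, 7, "pmod"), (7, 6, "amod")}))
# ... but an arc from "on" (8) to "What" (1) crosses a root arc (0, 5)
print(is_projective({(0, 5, "root"), (8, 1, "pobj")}))
```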
To produce non-projective dependency trees such as the one in the example above, more complex transition systems employing multiple stacks have been developed (Gómez-Rodríguez and Nivre, 2010).