In the lecture we took a look at a simple tokenizer and sentence segmenter. In this exercise we will expand our understanding of the problem by asking a few important questions and looking at the problem from different perspectives.
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
# %cd ..
import sys
sys.path.append("..")
import math
import statnlpbook.util as util
import statnlpbook.parsing as parsing
Be sure you understand grammatical categories and structures, and brush up on your grammar skills.
Then re-visit the Enju online parser, and parse the following sentences...
What is wrong with the parses of the following sentences? Are they correct?
What about these? Is the problem in the parser or in the sentence?
These were examples of garden path sentences. Find out what that means.
What about these sentences? Are their parses correct?
Recall the lecture notes on parsing and the parent annotation mentioned there. (Grand)*parents matter: knowing who the parent is in a tree gives a bit of context information, which can later help us with smoothing probabilities and approaching context-dependent parsing.
In that case, each non-terminal node should know its parent. We'll do this exercise on a single tree, just to play around a bit with trees and their labeling.
Given the following tree:
x = ('S', [('Subj', ['He']), ('VP', [('Verb', ['shot']), ('Obj', ['the', 'elephant']), ('PP', ['in', 'his', 'pyjamas'])])])
parsing.render_tree(x)
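Before annotating anything, it may help to see how this tree is laid out. Each node is a tuple of a label and a list of children, and each child is either another such tuple (a non-terminal) or a plain string (a word). The sketch below walks that structure and collects the non-terminal labels in preorder. It does not use statnlpbook, and `nonterminal_labels` is a hypothetical helper name introduced only for illustration.

```python
def nonterminal_labels(tree):
    """Collect the labels of all non-terminal nodes, parents before children."""
    label, children = tree
    labels = [label]
    for child in children:
        # Tuples are non-terminal subtrees; plain strings are words (leaves).
        if isinstance(child, tuple):
            labels.extend(nonterminal_labels(child))
    return labels


x = ('S', [('Subj', ['He']),
           ('VP', [('Verb', ['shot']),
                   ('Obj', ['the', 'elephant']),
                   ('PP', ['in', 'his', 'pyjamas'])])])

print(nonterminal_labels(x))
# ['S', 'Subj', 'VP', 'Verb', 'Obj', 'PP']
```

The same recursive pattern (unpack the label and children, recurse only into tuple children) is what any function that relabels the tree will need.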
Construct an annotate_parents function which takes that tree and annotates each non-terminal node with the label of its parent. The final annotation result should look like this:
y = ('S^?', [('Subj^S', ['He']), ('VP^S', [('Verb^VP', ['shot']), ('Obj^VP', ['the', 'elephant']), ('PP^VP', ['in', 'his', 'pyjamas'])])])
parsing.render_tree(y)
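If you get stuck, here is one possible sketch of such a function, not the only way to write it. It assumes the tuple-and-list tree format used above, marks the root's missing parent with `?` as in the expected result, and builds a new tree instead of modifying the input. The default argument `parent="?"` is a choice made for this sketch.

```python
def annotate_parents(tree, parent="?"):
    """Return a copy of the tree with every non-terminal relabelled 'label^parent'."""
    label, children = tree
    new_children = [
        # Pass the ORIGINAL label down, so children see 'VP', not 'VP^S'.
        annotate_parents(child, label) if isinstance(child, tuple) else child
        for child in children
    ]
    return (label + "^" + parent, new_children)


x = ('S', [('Subj', ['He']),
           ('VP', [('Verb', ['shot']),
                   ('Obj', ['the', 'elephant']),
                   ('PP', ['in', 'his', 'pyjamas'])])])
y = ('S^?', [('Subj^S', ['He']),
             ('VP^S', [('Verb^VP', ['shot']),
                       ('Obj^VP', ['the', 'elephant']),
                       ('PP^VP', ['in', 'his', 'pyjamas'])])])

print(annotate_parents(x) == y)
# True
```

Passing the unannotated label to the recursive call keeps the annotation to one level of context. Passing the already annotated label instead would accumulate grandparents as well, which is one way to explore the (grand)*parent idea mentioned above.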