%%html
<script>
function code_toggle() {
if (code_shown){
$('div.input').hide('500');
$('#toggleButton').val('Show Code')
} else {
$('div.input').show('500');
$('#toggleButton').val('Hide Code')
}
code_shown = !code_shown
}
$( document ).ready(function(){
code_shown=false;
$('div.input').hide()
});
</script>
<form action="javascript:code_toggle()"><input type="submit" id="toggleButton" value="Show Code"></form>
<style>
.rendered_html td {
font-size: xx-large;
text-align: left !important;
}
.rendered_html th {
font-size: xx-large;
text-align: left !important;
}
</style>
%%capture
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("../statnlpbook/")
#import util
import ie
import tfutil
import random
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
np.random.seed(1337)
tf.set_random_seed(1337)
#util.execute_notebook('relation_extraction.ipynb')
%load_ext tikzmagic
It would be useful to automatically build a database of this form:
Brand | Parent |
---|---|
KitKat | Nestle |
Lipton | Unilever |
... | ... |
or this graph:
These are all instances of the "owned by" relation, which can also be expressed as:
owned_by(KitKat, Nestle)
owned_by(Lipton, Unilever)
The web contains a lot of textual evidence for this relation:
Dechra Pharmaceuticals, which has just made its second acquisition, had previously purchased Genitrix.
Trinity Mirror plc, the largest British newspaper, purchased Local World, its rival.
Kraft, owner of Milka, purchased Cadbury Dairy Milk and is now gearing up for a roll-out of its new brand.
... and for many other relations.
born_in(Barack Obama, Hawaii)
educated_at(Albert Einstein, University of Zürich)
occupation(Steve Jobs, businessman)
spouse(Angela Merkel, Joachim Sauer)
...
ReVerb (Fader et al., 2011) demo:
[Barack Obama]PER was born in [Hawaii]LOC
[Isabelle Augenstein]PER is an associate professor at the [University of Copenhagen]ORG
Label tokens as beginning (B), inside (I), or outside (O) a named entity:
Barack | Obama | was | born | in | Hawaii
---|---|---|---|---|---
B-PER | I-PER | O | O | O | B-LOC

Isabelle | Augenstein | is | an | associate | professor | at | the | University | of | Copenhagen
---|---|---|---|---|---|---|---|---|---|---
B-PER | I-PER | O | O | O | O | O | O | B-ORG | I-ORG | I-ORG
Relation extraction is the task of extracting semantic relations between arguments.
Step 1: IOB sequence labelling for NER
Isabelle | Augenstein | is | an | associate | professor | at | the | University | of | Copenhagen
---|---|---|---|---|---|---|---|---|---|---
B-PER | I-PER | O | O | O | O | O | O | B-ORG | I-ORG | I-ORG
Step 2: NE decoding (see the sketch after the table below)
Step 3: Relation extraction
Relation | Entity 1 | Entity 2 |
---|---|---|
associate professor at | Isabelle Augenstein | University of Copenhagen |
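A minimal sketch of Step 2, NE decoding, which groups consecutive B-/I- labels into typed entity spans (the helper `iob_decode` is illustrative and not part of the `ie` module):

def iob_decode(tokens, labels):
    """Group IOB labels into (entity text, entity type) spans."""
    spans, current, current_type = [], [], None
    for token, label in zip(tokens, labels):
        if label.startswith("B-"):  # a new entity starts
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], label[2:]
        elif label.startswith("I-") and current:  # the current entity continues
            current.append(token)
        else:  # "O", or a stray I- tag without a preceding entity
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans

toks = "Isabelle Augenstein is an associate professor at the University of Copenhagen".split()
labs = ["B-PER", "I-PER", "O", "O", "O", "O", "O", "O", "B-ORG", "I-ORG", "I-ORG"]
iob_decode(toks, labs)  # [('Isabelle Augenstein', 'PER'), ('University of Copenhagen', 'ORG')]

The resulting spans are the candidate arguments for Step 3.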
Running example: extracting `method used for task` relations from sentences in computer science publications. Instances that do not express the relation are labelled NONE.
training_patterns, training_entpairs = ie.readLabelledPatternData()
# Training patterns and entity pairs for relation `method used for task`
list(zip(training_patterns[:3], training_entpairs[:3]))
[('demonstrates XXXXX and clustering techniques for XXXXX', ['text mining', 'building domain ontology']), ('demonstrates text mining and XXXXX for building XXXXX', ['clustering techniques', 'domain ontology']), ('the XXXXX is able to enhance the XXXXX', ['ensemble classifier', 'detection of construction materials'])]
For testing, we are given unlabelled instances, and the goal is to decide for each between `method used for task` and NONE:
testing_patterns, testing_entpairs = ie.readPatternData()
# Testing patterns and entity pairs
list(zip(testing_patterns[0:3], testing_entpairs[:3]))
[('a method for estimation of XXXXX of XXXXX is presented', ['effective properties', 'porous materials']), ('accounting for XXXXX is essential for estimation of XXXXX', ['nonlinear effects', 'effective properties']), ('develops the heterogeneous XXXXX for fiber-reinforced XXXXX', ['feature model', 'object modeling'])]
A simple pattern is the word sequence between the two arguments, which are masked with XXXXX:
def sentence_to_short_pattern(sent):
"""
Returns the sequence between two arguments in a sentence, where the arguments have been masked
Args:
sent: the sentence
Returns:
        the sequence between the two arguments
"""
sent_toks = sent.split(" ")
    indices = [i for i, tok in enumerate(sent_toks) if tok == "XXXXX"]
    pattern = " ".join(sent_toks[indices[0]+1:indices[1]])
return pattern
print(training_patterns[0])
sentence_to_short_pattern(training_patterns[0])
demonstrates XXXXX and clustering techniques for XXXXX
'and clustering techniques for'
Given the patterns for `method used for task`, we can now search for further instances of the relation (deciding between it and NONE). Example: return the instances which contain a `method used for task` pattern:
def pattern_extraction(training_sentences, testing_sentences):
"""
Given a set of patterns for a relation, searches for those patterns in other sentences
Args:
        training_sentences: training sentences with arguments masked
        testing_sentences: testing sentences with arguments masked
Returns:
the testing sentences which the training patterns appeared in
"""
# convert training and testing sentences to short paths to obtain patterns
training_patterns = set([sentence_to_short_pattern(train_sent) for train_sent in training_sentences])
testing_patterns = [sentence_to_short_pattern(test_sent) for test_sent in testing_sentences]
# look for match of training and testing patterns
testing_extractions = []
for i, testing_pattern in enumerate(testing_patterns):
if testing_pattern in training_patterns: # look for exact matches of patterns
testing_extractions.append(testing_sentences[i])
return testing_extractions
pattern_extraction(training_patterns[:300], testing_patterns[:300])
['paper reviews applications of XXXXX in XXXXX', 'a novel approach was developed to determine the XXXXX in XXXXX', 'four different types of insoles were examined in terms of their effects on XXXXX in XXXXX', 'the findings can aid in better understanding the insole design features that could improve XXXXX in XXXXX', 'this new approach provides more degrees of freedom and XXXXX in XXXXX']
Problems with this approach: the set of patterns never grows beyond the initial one, and the entity pairs of the extractions are not used at all.
Next: bootstrapping, an approach which addresses those two shortcomings by iteratively extending the sets of both patterns and entity pairs.
# use patterns to find more entity pairs
def search_for_entpairs_by_patterns(training_patterns, testing_patterns, testing_entpairs, testing_sentences):
testing_extractions = []
appearing_testing_patterns = []
appearing_testing_entpairs = []
for i, testing_pattern in enumerate(testing_patterns): # iterate over patterns
if testing_pattern in training_patterns: # if there is an exact match of a pattern
testing_extractions.append(testing_sentences[i]) # add the corresponding sentence
appearing_testing_patterns.append(testing_pattern) # add the pattern
appearing_testing_entpairs.append(testing_entpairs[i]) # add the entity pairs
return testing_extractions, appearing_testing_patterns, appearing_testing_entpairs
# use entity pairs to find more patterns
def search_for_patterns_by_entpairs(training_entpairs, testing_patterns, testing_entpairs, testing_sentences):
testing_extractions = []
appearing_testing_patterns = []
appearing_testing_entpairs = []
for i, testing_entpair in enumerate(testing_entpairs): # iterate over entity pairs
if testing_entpair in training_entpairs: # if there is an exact match of an entity pair
testing_extractions.append(testing_sentences[i]) # add the corresponding sentence
appearing_testing_entpairs.append(testing_entpair) # add the entity pair
appearing_testing_patterns.append(testing_patterns[i]) # add the pattern
return testing_extractions, appearing_testing_patterns, appearing_testing_entpairs
The two helper functions are then applied iteratively:
def bootstrapping_extraction(train_sents, train_entpairs, test_sents, test_entpairs, num_iter=10):
"""
Given a set of patterns and entity pairs for a relation, extracts more patterns and entity pairs iteratively
    Args:
        train_sents: training sentences with arguments masked
        train_entpairs: training entity pairs
        test_sents: testing sentences with arguments masked
        test_entpairs: testing entity pairs
        num_iter: number of bootstrapping iterations
Returns:
the testing sentences which the training patterns or any of the inferred patterns appeared in
"""
# convert training and testing sentences to short paths to obtain patterns
train_patterns = set([sentence_to_short_pattern(s) for s in train_sents])
train_patterns.discard("in") # too general, remove this
test_patterns = [sentence_to_short_pattern(s) for s in test_sents]
test_extracts = []
# iteratively get more patterns and entity pairs
    for i in range(num_iter):
print("Number extractions at iteration", str(i), ":", str(len(test_extracts)))
print("Number patterns at iteration", str(i), ":", str(len(train_patterns)))
print("Number entpairs at iteration", str(i), ":", str(len(train_entpairs)))
# get more patterns and entity pairs
test_extracts_e, ext_test_patterns_e, ext_test_entpairs_e = search_for_patterns_by_entpairs(train_entpairs, test_patterns, test_entpairs, test_sents)
test_extracts_p, ext_test_patterns_p, ext_test_entpairs_p = search_for_entpairs_by_patterns(train_patterns, test_patterns, test_entpairs, test_sents)
# add them to the existing patterns and entity pairs for the next iteration
train_patterns.update(ext_test_patterns_p)
train_patterns.update(ext_test_patterns_e)
train_entpairs.extend(ext_test_entpairs_p)
train_entpairs.extend(ext_test_entpairs_e)
test_extracts.extend(test_extracts_p)
test_extracts.extend(test_extracts_e)
return test_extracts, test_entpairs
test_extracts, test_entpairs = ie.bootstrappingExtraction(training_patterns[:20], training_entpairs[:20], testing_patterns, testing_entpairs, 10)
Number extractions at iteration 0 : 0
Number patterns at iteration 0 : 19
Number entpairs at iteration 0 : 20
Number extractions at iteration 1 : 78
Number patterns at iteration 1 : 19
Number entpairs at iteration 1 : 98
Number extractions at iteration 2 : 239
Number patterns at iteration 2 : 24
Number entpairs at iteration 2 : 259
Number extractions at iteration 3 : 405
Number patterns at iteration 3 : 24
Number entpairs at iteration 3 : 425
Number extractions at iteration 4 : 571
Number patterns at iteration 4 : 24
Number entpairs at iteration 4 : 591
Number extractions at iteration 5 : 737
Number patterns at iteration 5 : 24
Number entpairs at iteration 5 : 757
Number extractions at iteration 6 : 903
Number patterns at iteration 6 : 24
Number entpairs at iteration 6 : 923
Number extractions at iteration 7 : 1069
Number patterns at iteration 7 : 24
Number entpairs at iteration 7 : 1089
Number extractions at iteration 8 : 1235
Number patterns at iteration 8 : 24
Number entpairs at iteration 8 : 1255
Number extractions at iteration 9 : 1401
Number patterns at iteration 9 : 24
Number entpairs at iteration 9 : 1421
Problem: semantic drift. With each iteration, the added patterns and extractions stray further from the original relation. For example:
train_patterns = set(sentence_to_short_pattern(s) for s in training_patterns[:20])
test_patterns = set(sentence_to_short_pattern(s) for s in test_extracts)
# patterns that do not co-occur with first set of entity pairs
for p in test_patterns:
if p not in train_patterns:
print(p)
is firstly introduced in
is higher in
and finally to illustrate the applicability of the proposed method , a
is proposed to plan and execute task in
is introduced in
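A common mitigation, sketched here as a hypothetical helper (not part of the `ie` module): score each candidate pattern by how often it co-occurs with entity pairs already known for the relation, and only add high-confidence patterns in each iteration.

def pattern_confidence(pattern, cand_patterns, cand_entpairs, known_entpairs):
    """Fraction of a candidate pattern's occurrences whose entity pair is already known."""
    occurrences = [ep for p, ep in zip(cand_patterns, cand_entpairs) if p == pattern]
    if not occurrences:
        return 0.0
    return sum(ep in known_entpairs for ep in occurrences) / len(occurrences)

# e.g. inside bootstrapping_extraction: keep a pattern only if pattern_confidence(...) >= 0.5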
Supervised relation extraction: learn a classifier from labelled sentences. In general there can be many relations, e.g. `method used for task`, `employee-at`, `student-at`; here, we distinguish between positive (`method used for task`) and negative (NONE) training examples:
training_sents, training_entpairs, training_labels = ie.readLabelledData()
print("Manually labelled data set consists of", training_labels.count("NONE"),
"negative training examples and", training_labels.count("method used for task"), "positive training examples\n")
list(zip(training_sents[21:25], training_entpairs[21:25], training_labels[21:25]))
Manually labelled data set consists of 22 negative training examples and 22 positive training examples
[('a new XXXXX is proposed to solve the XXXXX associated with damage assessment', ['dynamic quantum pso algorithm', 'inverse problem'], 'method used for task'), ('XXXXX ( sa ) helps ivs in XXXXX with time constraint', ['sensitivity analysis', 'heterogeneous input variables'], 'NONE'), ('this study focused on the XXXXX of XXXXX in images', ['automatic detection', 'construction materials'], 'NONE'), ('design of a XXXXX used as XXXXX', [" ['case study", 'wind turbine blade'], 'NONE')]
Represent training and testing data as feature vectors. Typical features are the patterns between the arguments and bags of words (here, obtained with sklearn's built-in feature extractor):
from sklearn.feature_extraction.text import CountVectorizer
def feat_transform(sents_train, sents_test):
    cv = CountVectorizer()  # bag-of-words feature extractor
    cv.fit(sents_train)  # build the vocabulary on the training patterns only
    features_train = cv.transform(sents_train)
    features_test = cv.transform(sents_test)
    return features_train, features_test, cv
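For illustration, `CountVectorizer` maps each pattern to a bag-of-words count vector over the vocabulary built from the training patterns (toy patterns, made up for this example):

feats_train, feats_test, cv = feat_transform(["is proposed for", "is applied to"],
                                             ["is proposed for"])
print(sorted(cv.vocabulary_))  # ['applied', 'for', 'is', 'proposed', 'to']
print(feats_test.toarray())    # [[0 1 1 1 0]]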
from sklearn.linear_model import LogisticRegression
def model_train(feats_train, labels):
model = LogisticRegression(penalty='l2', solver='liblinear') # logistic regression model with l2 regularisation
model.fit(feats_train, labels) # fit the model to the transformed training data
return model
def predict(model, features_test):
"""Find the most compatible output class"""
preds = model.predict(features_test) # this returns the predicted labels
return preds
def supervised_extraction(train_sents, train_entpairs, train_labels, test_sents, test_entpairs):
"""
Given pos/neg training instances, train a logistic regression model with simple BOW features and predict labels on unseen test instances
Args:
train_sents: training sentences with arguments masked
train_entpairs: training entity pairs
train_labels: labels of training instances
test_sents: testing sentences with arguments masked
test_entpairs: testing entity pairs
Returns:
predictions for the testing sentences
"""
# extract short patterns from training and testing sentences
    train_patterns = [sentence_to_short_pattern(train_sent) for train_sent in train_sents]
    test_patterns = [sentence_to_short_pattern(test_sent) for test_sent in test_sents]
features_train, features_test, cv = feat_transform(train_patterns, test_patterns) # extract features
model = model_train(features_train, train_labels) # train model
predictions = predict(model, features_test) # get predictions
return predictions
testing_preds = supervised_extraction(training_sents, training_entpairs, training_labels, testing_patterns, testing_entpairs)
list(zip(testing_preds, testing_patterns, testing_entpairs))[:10]
[('NONE', 'a method for estimation of XXXXX of XXXXX is presented', ['effective properties', 'porous materials']), ('method used for task', 'accounting for XXXXX is essential for estimation of XXXXX', ['nonlinear effects', 'effective properties']), ('method used for task', 'develops the heterogeneous XXXXX for fiber-reinforced XXXXX', ['feature model', 'object modeling']), ('NONE', 'two formulations for the problem of optimum XXXXX of onshore XXXXX', ['layout design', 'wind farms']), ('method used for task', 'boundary-value and initial-value XXXXX are solved using XXXXX and graph products', ['differential equations', 'finite difference method']), ('method used for task', 'boundary-value and initial-value XXXXX are solved using finite difference method and XXXXX', ['differential equations', 'graph products']), ('method used for task', 'boundary-value and initial-value differential equations are solved using XXXXX and XXXXX', ['finite difference method', 'graph products']), ('method used for task', 'an XXXXX to couple cfd and XXXXX is presented', ['open-source software', 'multiobjective optimization']), ('method used for task', 'parallel XXXXX for solving cfd XXXXX is implemented', ['evolutionary algorithm', 'optimization problems']), ('method used for task', 'a novel XXXXX platform is developed for the examination of an rfid-enabled mascs in a flexible XXXXX , and several system performance measures are considered in the XXXXX platform', ['assembly line', 'simulation test'])]
def distantly_supervised_labelling(kb_entpairs, unlab_sents, unlab_entpairs):
"""
Label instances using distant supervision assumption
Args:
kb_entpairs: entity pairs for a specific relation
unlab_sents: unlabelled sentences with entity pairs anonymised
unlab_entpairs: entity pairs which were anonymised in unlab_sents
    Returns:
        train_sents, train_entpairs, train_labels: instances labelled by distant supervision
"""
train_sents, train_entpairs, train_labels = [], [], []
for i, unlab_entpair in enumerate(unlab_entpairs):
# if the entity pair is a KB tuple, it is a positive example for that relation
if unlab_entpair in kb_entpairs:
train_entpairs.append(unlab_entpair)
train_sents.append(unlab_sents[i])
train_labels.append("method used for task")
else: # else, it is a negative example for that relation
train_entpairs.append(unlab_entpair)
train_sents.append(unlab_sents[i])
train_labels.append("NONE")
return train_sents, train_entpairs, train_labels
def distantly_supervised_extraction(kb_entpairs, unlab_sents, unlab_entpairs, test_sents, test_entpairs):
# training_data <- Find training sentences with entity pairs
train_sents, train_entpairs, train_labels = distantly_supervised_labelling(kb_entpairs, unlab_sents, unlab_entpairs)
print("Distantly supervised labelling results in", train_labels.count("NONE"),
"negative training examples and", train_labels.count("method used for task"), "positive training examples")
# training works the same as for supervised RE
return supervised_extraction(train_sents, train_entpairs, train_labels, test_sents, test_entpairs)
kb_entpairs, unlab_sents, unlab_entpairs = ie.readDataForDistantSupervision()
#print(len(kb_entpairs), "'KB' entity pairs for relation `method used for task` :", kb_entpairs[0:5])
#print(len(unlab_entpairs), 'all entity pairs')
testing_preds = distantly_supervised_extraction(kb_entpairs, unlab_sents, unlab_entpairs, testing_patterns, testing_entpairs)
list(zip(testing_preds, testing_patterns, testing_entpairs))[:10]
Distantly supervised labelling results in 22 negative training examples and 22 positive training examples
[('NONE', 'a method for estimation of XXXXX of XXXXX is presented', ['effective properties', 'porous materials']), ('method used for task', 'accounting for XXXXX is essential for estimation of XXXXX', ['nonlinear effects', 'effective properties']), ('method used for task', 'develops the heterogeneous XXXXX for fiber-reinforced XXXXX', ['feature model', 'object modeling']), ('NONE', 'two formulations for the problem of optimum XXXXX of onshore XXXXX', ['layout design', 'wind farms']), ('method used for task', 'boundary-value and initial-value XXXXX are solved using XXXXX and graph products', ['differential equations', 'finite difference method']), ('method used for task', 'boundary-value and initial-value XXXXX are solved using finite difference method and XXXXX', ['differential equations', 'graph products']), ('method used for task', 'boundary-value and initial-value differential equations are solved using XXXXX and XXXXX', ['finite difference method', 'graph products']), ('method used for task', 'an XXXXX to couple cfd and XXXXX is presented', ['open-source software', 'multiobjective optimization']), ('method used for task', 'parallel XXXXX for solving cfd XXXXX is implemented', ['evolutionary algorithm', 'optimization problems']), ('method used for task', 'a novel XXXXX platform is developed for the examination of an rfid-enabled mascs in a flexible XXXXX , and several system performance measures are considered in the XXXXX platform', ['assembly line', 'simulation test'])]
The distant supervision assumption is not always correct, however. For example, this relation holds:
lives-in(Margrethe II of Denmark, Amalienborg)
but it would be wrong to attribute it to the sentence
Margrethe was born 16 April 1940 at Amalienborg
The space of entity pairs and relations is defined by a matrix:
 | demonstrates XXXXX for XXXXX | XXXXX is capable of XXXXX | an XXXXX model is employed for XXXXX | XXXXX decreases the XXXXX | method used for task
---|---|---|---|---|---
'text mining', 'building domain ontology' | 1 | | | | 1
'ensemble classifier', 'detection of construction materials' | | 1 | | | 1
'data mining', 'characterization of wireless systems performance' | | | 1 | | ?
'frequency domain', 'computational cost' | | | | 1 | ?
`method used for task` is a pre-defined relation; the other columns are surface patterns. Training data:
training_sents, training_entpairs, training_labels = ie.readLabelledData()  # data reading
# the labels themselves ("method used for task") are treated as an additional relation,
# so sentences and labels are concatenated and the positive/negative split applies to both
pos_train_ids, neg_train_ids = ie.split_labels_pos_neg(training_labels + training_labels)  # split positive and negative training data
training_toks_pos = [t.split(" ") for i, t in enumerate(training_sents + training_labels) if i in pos_train_ids]
training_toks_neg = [t.split(" ") for i, t in enumerate(training_sents + training_labels) if i in neg_train_ids]
training_ent_toks_pos = [" || ".join(t).split(" ") for i, t in enumerate(training_entpairs + training_entpairs) if i in pos_train_ids]
training_ent_toks_neg = [" || ".join(t).split(" ") for i, t in enumerate(training_entpairs + training_entpairs) if i in neg_train_ids]
testing_ent_toks = [" || ".join(t).split(" ") for t in testing_entpairs]
# vectorise data (assign IDs to words)
count_rels, dictionary_rels, reverse_dictionary_rels = ie.build_dataset(
[token for senttoks in training_toks_pos + training_toks_neg for token in senttoks])
count_ents, dictionary_ents, reverse_dictionary_ents = ie.build_dataset(
[token for senttoks in training_ent_toks_pos + training_ent_toks_neg for token in senttoks])
# transform sentences to IDs, pad vectors for each sentence so they have same length
lens_rel = [len(s) for s in training_toks_pos + training_toks_neg]
lens_ents = [len(s) for s in training_ent_toks_pos + training_ent_toks_neg + testing_ent_toks]
rels_train_pos = [ie.transform_dict(dictionary_rels, senttoks, max(lens_rel)) for senttoks in training_toks_pos]
rels_train_neg = [ie.transform_dict(dictionary_rels, senttoks, max(lens_rel)) for senttoks in training_toks_neg]
ents_train_pos = [ie.transform_dict(dictionary_ents, senttoks, max(lens_ents)) for senttoks in training_ent_toks_pos]
ents_train_neg = [ie.transform_dict(dictionary_ents, senttoks, max(lens_ents)) for senttoks in training_ent_toks_neg]
# Negatively sample some entity pairs for training. Here we have some manually labelled neg ones, so we can sample from them.
ents_train_neg_samp = [random.choice(ents_train_neg) for _ in rels_train_neg]
ents_test_pos = [ie.transform_dict(dictionary_ents, senttoks, max(lens_ents)) for senttoks in testing_ent_toks]
# Sample those test entity pairs from the training ones as for those we have neg annotations
ents_test_neg_samp = [random.choice(ents_train_neg) for _ in ents_test_pos]
vocab_size_rels = len(dictionary_rels)
vocab_size_ents = len(dictionary_ents)
# for testing, we want to check if each unlabelled instance expresses the given relation "method used for task"
rels_test_pos = [ie.transform_dict(dictionary_rels, training_toks_pos[-1], max(lens_rel)) for _ in testing_patterns]
rels_test_neg_samp = [random.choice(rels_train_neg) for _ in rels_test_pos]
data = ie.vectorise_data(training_sents, training_entpairs, training_labels, testing_patterns, testing_entpairs)
rels_train_pos, rels_train_neg, ents_train_pos, ents_train_neg_samp, rels_test_pos, rels_test_neg_samp, \
ents_test_pos, ents_test_neg_samp, vocab_size_rels, vocab_size_ents, max_lens_rel, max_lens_ents, \
dictionary_rels_rev, dictionary_ents_rev = data
# setting hyper-parameters
batch_size = 4
repr_dim = 30 # dimensionality of relation and entity pair vectors
learning_rate = 0.001
max_epochs = 31
# Placeholders (empty Tensorflow variables) for positive and negative relations and entity pairs
# In each training epoch, for each batch, those will be set through mini batching
relations_pos = tf.placeholder(tf.int32, [None, max_lens_rel], name='relations_pos') # [batch_size, max_rel_seq_len]
relations_neg = tf.placeholder(tf.int32, [None, max_lens_rel], name='relations_neg') # [batch_size, max_rel_seq_len]
ents_pos = tf.placeholder(tf.int32, [None, max_lens_ents], name="ents_pos") # [batch_size, max_ent_seq_len]
ents_neg = tf.placeholder(tf.int32, [None, max_lens_ents], name="ents_neg") # [batch_size, max_ent_seq_len]
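# How a mini-batch is fed into these placeholders (sketch with hypothetical
# batch arrays; the BatchBucketSampler and Trainer from tfutil below do this internally):
# sess.run(loss, feed_dict={relations_pos: batch_rels_pos,  # int32, [batch_size, max_lens_rel]
#                           relations_neg: batch_rels_neg,
#                           ents_pos: batch_ents_pos,       # int32, [batch_size, max_lens_ents]
#                           ents_neg: batch_ents_neg})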
# Creating latent representations of relations and entity pairs
# latent feature representation of all relations, which are initialised randomly
relation_embeddings = tf.Variable(tf.random_uniform([vocab_size_rels, repr_dim], -0.1, 0.1, dtype=tf.float32),
name='rel_emb', trainable=True)
# latent feature representation of all entity pairs, which are initialised randomly
ent_embeddings = tf.Variable(tf.random_uniform([vocab_size_ents, repr_dim], -0.1, 0.1, dtype=tf.float32),
name='cand_emb', trainable=True)
# look up latent feature representation for relations and entities in current batch
rel_encodings_pos = tf.nn.embedding_lookup(relation_embeddings, relations_pos)
rel_encodings_neg = tf.nn.embedding_lookup(relation_embeddings, relations_neg)
ent_encodings_pos = tf.nn.embedding_lookup(ent_embeddings, ents_pos)
ent_encodings_neg = tf.nn.embedding_lookup(ent_embeddings, ents_neg)
# our feature representation here is a vector for each word in a relation or entity pair;
# because our training data is so small, we take the sum of those word vectors
# to get a single representation for each relation or entity pair
rel_encodings_pos = tf.reduce_sum(rel_encodings_pos, 1)  # [batch_size, repr_dim]
rel_encodings_neg = tf.reduce_sum(rel_encodings_neg, 1)  # [batch_size, repr_dim]
ent_encodings_pos = tf.reduce_sum(ent_encodings_pos, 1)  # [batch_size, repr_dim]
ent_encodings_neg = tf.reduce_sum(ent_encodings_neg, 1)  # [batch_size, repr_dim]
# measuring compatibility between positive entity pairs and relations
# used for ranking test data
dotprod_pos = tf.reduce_sum(tf.multiply(ent_encodings_pos, rel_encodings_pos), 1)
# measuring compatibility between negative entity pairs and relations
dotprod_neg = tf.reduce_sum(tf.multiply(ent_encodings_neg, rel_encodings_neg), 1)
# difference in dot product of positive and negative instances
# used for BPR loss (ranking loss)
diff_dotprod = tf.reduce_sum(tf.multiply(ent_encodings_pos, rel_encodings_pos) - tf.multiply(ent_encodings_neg, rel_encodings_neg), 1)
Final vocab size: 163
Final vocab size: 138
Max relation length: 16
Max entity pair length: 9
Final vocab size: 163
Final vocab size: 138
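The graph above also defines `diff_dotprod` for the BPR ranking loss; a sketch of how that loss would be formulated (the training below uses the logistic loss instead):

bpr_loss = tf.reduce_sum(tf.nn.softplus(-diff_dotprod))  # -log sigmoid(score_pos - score_neg), summed over the batch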
Loss: maximise the scores of positive instances and minimise the scores of negative instances (a logistic loss, matching the code below):
$$\mathcal{L} = \sum -\log\sigma(v_{e_{pos}} \cdot a_{r_{pos}}) + \sum -\log\sigma(-v_{e_{neg}} \cdot a_{r_{neg}})$$
Now that we have read in the data, vectorised it and created the universal schema relation extraction model, let's start training.
# create the model / Tensorflow computation graph
dotprod_pos, dotprod_neg, diff_dotprod, placeholders = ie.create_model_f_reader(max_lens_rel, max_lens_ents, repr_dim, vocab_size_rels,
vocab_size_ents)
# logistic loss
loss = tf.reduce_sum(tf.nn.softplus(-dotprod_pos)+tf.nn.softplus(dotprod_neg))
data = [np.asarray(rels_train_pos), np.asarray(rels_train_neg), np.asarray(ents_train_pos), np.asarray(ents_train_neg_samp)]
data_test = [np.asarray(rels_test_pos), np.asarray(rels_test_neg_samp), np.asarray(ents_test_pos), np.asarray(ents_test_neg_samp)]
# define an optimiser. Here, we use the Adam optimiser
optimizer = tf.train.AdamOptimizer(learning_rate)
# training with mini-batches
batcher = tfutil.BatchBucketSampler(data, batch_size)
batcher_test = tfutil.BatchBucketSampler(data_test, 1, test=True)
with tf.Session() as sess:
trainer = tfutil.Trainer(optimizer, max_epochs)
trainer(batcher=batcher, placeholders=placeholders, loss=loss, session=sess)
# we obtain test scores
test_scores = trainer.test(batcher=batcher_test, placeholders=placeholders, model=tf.nn.sigmoid(dotprod_pos), session=sess)
Epoch 1 Loss 53.11272859573364
Epoch 2 Loss 40.50460696220398
Epoch 3 Loss 31.24713945388794
Epoch 4 Loss 24.058605074882507
Epoch 5 Loss 17.75257933139801
Epoch 6 Loss 13.676397264003754
Epoch 7 Loss 10.506169497966766
Epoch 8 Loss 7.549997687339783
Epoch 9 Loss 6.212332338094711
Epoch 10 Loss 5.04358983039856
Epoch 11 Loss 4.006999537348747
Epoch 12 Loss 3.2603053152561188
Epoch 13 Loss 2.6340380758047104
Epoch 14 Loss 2.1575410664081573
Epoch 15 Loss 1.7965537309646606
Epoch 16 Loss 1.615551583468914
Epoch 17 Loss 1.4916696399450302
Epoch 18 Loss 1.240600325167179
Epoch 19 Loss 1.1110371351242065
Epoch 20 Loss 0.9509495869278908
Epoch 21 Loss 0.894762460142374
Epoch 22 Loss 0.7767390683293343
Epoch 23 Loss 0.7147475983947515
Epoch 24 Loss 0.6345311086624861
Epoch 25 Loss 0.5917699486017227
Epoch 26 Loss 0.5399498473852873
Epoch 27 Loss 0.5118373408913612
Epoch 28 Loss 0.49578461796045303
Epoch 29 Loss 0.4344583507627249
Epoch 30 Loss 0.39716633781790733
Test prediction probabilities are obtained by scoring each test instance with:
$$\sigma(v_{e} \cdot a_{r})$$
# show predictions
ents_test = [ie.reverse_dict_lookup(dictionary_ents_rev, e) for e in ents_test_pos]
rels_test = [ie.reverse_dict_lookup(dictionary_rels_rev, r) for r in rels_test_pos]
testresults = sorted(zip(test_scores, ents_test, rels_test), key=lambda t: t[0], reverse=True) # sort for decreasing score
print("\nTest predictions by decreasing probability:")
for score, tup, rel in testresults[:10]:
print('%f\t%s\t%s' % (score, " ".join(tup), " ".join(rel)))
Test predictions by decreasing probability:
0.999735	UNK optimization problem || optimal UNK UNK problem	method used for task
0.999711	UNK optimization problem || UNK optimization problem	method used for task
0.999117	optimal UNK || hybrid optimization method	method used for task
0.999073	UNK swarm optimization || local search	method used for task
0.998582	UNK swarm optimization || search algorithm	method used for task
0.998582	UNK search algorithm || UNK swarm optimization	method used for task
0.998456	UNK detection || UNK swarm optimization method	method used for task
0.998432	optimization problem || UNK problem	method used for task
0.998391	hybrid algorithm || UNK swarm optimization	method used for task
0.998229	UNK model || supply chain problem	method used for task
Summary: we have seen various relation extraction techniques: pattern-based extraction, bootstrapping, supervised and distantly supervised learning, and universal schema relation extraction.
Features are often a mix of sparse, hand-crafted representations (patterns, bags of words) and dense, learned representations (embeddings).
Jurafsky, Dan and Martin, James H. (2016). Speech and Language Processing, Chapter 18 (Information Extraction): https://web.stanford.edu/~jurafsky/slp3/18.pdf
Riedel, Sebastian and Yao, Limin and McCallum, Andrew and Marlin, Benjamin M. (2013). Relation Extraction with Matrix Factorization and Universal Schemas. Proceedings of NAACL. http://www.aclweb.org/anthology/N13-1008