! pip install -q johnsnowlabs
Using my.johnsnowlabs.com SSO
from johnsnowlabs import nlp, legal
# nlp.install(force_browser=True)
If you are not registered on my.johnsnowlabs.com, received a license via email, or are using Safari, you may need to upload the license manually.
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()
nlp.install()
spark = nlp.start()
Legal relation extraction is a task in natural language processing (NLP) that involves extracting relationships between entities in legal documents. These relationships can be between people, organizations, or legal concepts.
Legal relation extraction is useful for a variety of purposes, including legal research, contract analysis, and legal case management. For example, legal relation extraction can be used to identify relationships between parties in a contract, such as the buyer and seller, or to extract clauses in a contract that outline certain obligations or rights.
Here is the list of pretrained Relation Extraction models and pipelines:
Relation Extraction Models
An obligation sentence in a legal agreement is a provision that specifies the duties and responsibilities of one or more parties to the agreement. It outlines the specific actions that a party must take, or refrain from taking, to fulfill its obligations under the agreement. Obligation sentences are an important part of any legal agreement, as they help ensure that the parties understand and agree to their respective roles and responsibilities.
📚 We understand an obligation as a sentence or sentences in which a Party (OBLIGATION_SUBJECT) must do (OBLIGATION_ACTION) something (OBLIGATION_OBJECT) to another Party (OBLIGATION_INDIRECT_OBJECT).
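To make the four roles concrete, here is a minimal sketch of one obligation held as a record. The Obligation dataclass and the from_chunks helper are hypothetical names for illustration, not part of the johnsnowlabs API. The chunk texts come from the Licensee/Licensor sample sentence used below, and the labels are the ones that appear in the relation table, where the model uses OBLIGATION for the object role.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

@dataclass
class Obligation:
    """Hypothetical container for the roles of one obligation sentence."""
    subject: Optional[str] = None          # OBLIGATION_SUBJECT: who is obliged
    action: Optional[str] = None           # OBLIGATION_ACTION: what they must do
    obj: Optional[str] = None              # OBLIGATION: the object of the duty
    indirect_object: Optional[str] = None  # OBLIGATION_INDIRECT_OBJECT: towards whom

# Map NER labels to the dataclass attributes above.
ROLE_BY_LABEL = {
    "OBLIGATION_SUBJECT": "subject",
    "OBLIGATION_ACTION": "action",
    "OBLIGATION": "obj",
    "OBLIGATION_INDIRECT_OBJECT": "indirect_object",
}

def from_chunks(chunks: Iterable[Tuple[str, str]]) -> Obligation:
    """Build an Obligation from (label, text) pairs; the first chunk per role wins."""
    record = Obligation()
    for label, text in chunks:
        attr = ROLE_BY_LABEL.get(label)
        if attr is not None and getattr(record, attr) is None:
            setattr(record, attr, text)
    return record

chunks = [
    ("OBLIGATION_SUBJECT", "Licensee"),
    ("OBLIGATION_ACTION", "agrees to reasonably cooperate"),
    ("OBLIGATION_INDIRECT_OBJECT", "Licensor"),
    ("OBLIGATION", "in achieving registration of the Licensed Mark."),
]
print(from_chunks(chunks))
```

Labels that do not map to a role are ignored, so chunks from other NER models pass through harmlessly.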
# Create Generic Function to Show Relations in Dataframe
import pandas as pd
def get_relations_df(results, col='relations'):
    """Flatten the relation annotations returned by LightPipeline.fullAnnotate() into a DataFrame."""
    rel_pairs = []
    for result in results:        # one dict per annotated text
        for rel in result[col]:   # every relation annotation found in that text
            rel_pairs.append((
                rel.result,
                rel.metadata['entity1'],
                rel.metadata['entity1_begin'],
                rel.metadata['entity1_end'],
                rel.metadata['chunk1'],
                rel.metadata['entity2'],
                rel.metadata['entity2_begin'],
                rel.metadata['entity2_end'],
                rel.metadata['chunk2'],
                rel.metadata['confidence']
            ))
    rel_df = pd.DataFrame(rel_pairs, columns=['relation', 'entity1', 'entity1_begin', 'entity1_end', 'chunk1',
                                              'entity2', 'entity2_begin', 'entity2_end', 'chunk2', 'confidence'])
    return rel_df
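If you want to check the flattening logic of get_relations_df without starting Spark, the sketch below mimics the shape that fullAnnotate returns: a list with one dict per input text, where the relations key holds annotation objects exposing .result and a string-valued .metadata dict. FakeAnnotation is a hypothetical stand-in, and this variant returns plain tuples instead of a pandas DataFrame so that it only needs the standard library.

```python
from collections import namedtuple

# Hypothetical stand-in for a Spark NLP annotation: only the fields used below.
FakeAnnotation = namedtuple("FakeAnnotation", ["result", "metadata"])

FIELDS = ["entity1", "entity1_begin", "entity1_end", "chunk1",
          "entity2", "entity2_begin", "entity2_end", "chunk2", "confidence"]

def relation_rows(results, col="relations"):
    """Flatten relation annotations into (relation, *metadata fields) tuples."""
    rows = []
    for doc in results:          # one dict per annotated text
        for rel in doc[col]:     # every relation found in that text
            rows.append((rel.result, *(rel.metadata[f] for f in FIELDS)))
    return rows

results = [{"relations": [FakeAnnotation(
    result="is_obliged_with",
    metadata={"entity1": "OBLIGATION_SUBJECT", "entity1_begin": "0", "entity1_end": "7",
              "chunk1": "Licensee", "entity2": "OBLIGATION_INDIRECT_OBJECT",
              "entity2_begin": "45", "entity2_end": "52", "chunk2": "Licensor",
              "confidence": "0.8136201"})]}]
for row in relation_rows(results):
    print(row)
```

Note that every metadata value, including confidence and the offsets, is a string, so numeric comparisons need an explicit conversion.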
sample_text = ["""In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as "OTHER TAXES").""",
"""Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark."""]
sample_text
['In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as "OTHER TAXES").', 'Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark.']
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
tokenizer = nlp.Tokenizer()\
.setInputCols("document")\
.setOutputCol("token")
ner_model = legal.BertForTokenClassification.pretrained("legner_obligations", "en", "legal/models")\
.setInputCols("token", "document")\
.setOutputCol("ner")\
.setMaxSentenceLength(512)\
.setCaseSensitive(True)
ner_converter = nlp.NerConverter()\
.setInputCols(["document","token","ner"])\
.setOutputCol("ner_chunk")
re_model = legal.RelationExtractionDLModel.pretrained("legre_obligations_md", "en", "legal/models")\
.setPredictionThreshold(0.4)\
.setInputCols(["ner_chunk", "document"])\
.setOutputCol("relations")
pipeline = nlp.Pipeline(stages=[
document_assembler,
tokenizer,
ner_model,
ner_converter,
re_model
])
empty_df = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_df)
legner_obligations download started this may take some time. [OK!] legre_obligations_md download started this may take some time. [OK!]
light_model = nlp.LightPipeline(model)
result = light_model.fullAnnotate(sample_text)
rel_df = get_relations_df(result)
rel_df[rel_df["relation"] != "other"]
index | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
---|---|---|---|---|---|---|---|---|---|---|
0 | is_obliged_to | OBLIGATION_ACTION | 27 | 38 | agree to pay | OBLIGATION_SUBJECT | 13 | 25 | the Borrowers | 0.9983413 |
1 | is_obliged_to | OBLIGATION_SUBJECT | 13 | 25 | the Borrowers | OBLIGATION | 40 | 143 | any present or future stamp or documentary tax... | 0.46110857 |
2 | is_obliged_object | OBLIGATION_ACTION | 27 | 38 | agree to pay | OBLIGATION | 40 | 143 | any present or future stamp or documentary tax... | 0.9991379 |
3 | is_obliged_to | OBLIGATION_ACTION | 9 | 38 | agrees to reasonably cooperate | OBLIGATION_SUBJECT | 0 | 7 | Licensee | 0.9090177 |
4 | is_obliged_with | OBLIGATION_SUBJECT | 0 | 7 | Licensee | OBLIGATION_INDIRECT_OBJECT | 45 | 52 | Licensor | 0.8136201 |
5 | is_obliged_to | OBLIGATION | 54 | 100 | in achieving registration of the Licensed Mark. | OBLIGATION_SUBJECT | 0 | 7 | Licensee | 0.86316615 |
6 | is_obliged_object | OBLIGATION_ACTION | 9 | 38 | agrees to reasonably cooperate | OBLIGATION_INDIRECT_OBJECT | 45 | 52 | Licensor | 0.96135247 |
7 | is_obliged_object | OBLIGATION_ACTION | 9 | 38 | agrees to reasonably cooperate | OBLIGATION | 54 | 100 | in achieving registration of the Licensed Mark. | 0.82649904 |
8 | is_obliged_to | OBLIGATION_INDIRECT_OBJECT | 45 | 52 | Licensor | OBLIGATION | 54 | 100 | in achieving registration of the Licensed Mark. | 0.9142798 |
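Row 1 (confidence 0.46) only just clears the setPredictionThreshold(0.4) set on the RE model. A stricter cut can be applied afterwards; because relation metadata values are strings, the confidence has to be converted to a number first. With the rel_df DataFrame above that would be rel_df[rel_df["confidence"].astype(float) >= 0.9]. The standard-library sketch below applies the same idea to a few rows copied from the table; the 0.9 cut-off is only an example value.

```python
# A few rows copied from the table above: (relation, chunk1, chunk2, confidence).
rows = [
    ("is_obliged_to", "agree to pay", "the Borrowers", "0.9983413"),
    ("is_obliged_to", "the Borrowers", "any present or future stamp or documentary tax...", "0.46110857"),
    ("is_obliged_with", "Licensee", "Licensor", "0.8136201"),
    ("is_obliged_object", "agrees to reasonably cooperate", "Licensor", "0.96135247"),
]

def confident(rows, min_conf=0.9):
    """Keep rows whose confidence (stored as a string) is at least min_conf."""
    return [r for r in rows if float(r[3]) >= min_conf]

for relation, chunk1, chunk2, conf in confident(rows):
    print(f"{relation}: {chunk1} -> {chunk2} ({float(conf):.2f})")
```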
re_vis = nlp.viz.RelationExtractionVisualizer()
for i in range(len(sample_text)):
    re_vis.display(result=result[i],
                   relation_col="relations",
                   document_col="document",
                   exclude_relations=["other"],
                   show_relations=True)
Now, let's suppose we want to extract GRANTS and GRANTS_TO relations between the OBLIGATION_SUBJECT, OBLIGATION_ACTION and OBLIGATION_INDIRECT_OBJECT entities. We don't have a pretrained model for these relations, and this is where Zero-shot RE comes in: it lets you create an RE model for new relation types without any training data and without a task-specific pretrained RE model.
Similarly to Zero-shot NER, Zero-shot RE works with H (hypotheses) and P (premises): a relation is extracted as a positive result only if the H is entailed given the P.
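Conceptually, each candidate entity pair is turned into one hypothesis per relation template and scored for entailment against the premise. The sketch below shows only the final decision step under an assumed rule: take the best-scoring relation and fall back to no_rel below a threshold. pick_relation, the 0.5 threshold and the scores are hypothetical; the library's actual scoring and thresholding may differ.

```python
from typing import Dict, Tuple

def pick_relation(scores: Dict[str, float], threshold: float = 0.5) -> Tuple[str, float]:
    """Return the best relation label for one entity pair, or 'no_rel'.

    scores maps each relation label (e.g. 'GRANTS') to the entailment
    probability of its hypothesis given the premise. The threshold value
    is an assumption for illustration.
    """
    if not scores:
        return "no_rel", 0.0
    label, best = max(scores.items(), key=lambda item: item[1])
    return (label, best) if best >= threshold else ("no_rel", best)

# Hypothetical entailment scores for one (subject, object) pair.
print(pick_relation({"GRANTS_TO": 0.95, "GRANTS": 0.03}))
print(pick_relation({"GRANTS_TO": 0.20, "GRANTS": 0.10}))
```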
📜 In this case, each hypothesis is built from two entity types and the text that links them. For example, ENT_1 is PARTY, ENT_2 is DOC, and [some_text] is "was signed".
Given the premise "Meta, Inc. signed a Purchase Agreement with Whatsapp, Inc.", this hypothesis will be entailed both for the pair Meta, Inc. and Purchase Agreement and for the pair Whatsapp, Inc. and Purchase Agreement.
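The pairing step can be pictured as filling a template with every combination of chunks whose labels match its placeholders. This is a conceptual sketch, not the library's implementation: placeholders and build_hypotheses are hypothetical helpers, and the template "{PARTY} signed {DOC}" is a hypothetical one chosen to read naturally against the Meta premise.

```python
import string
from itertools import product

def placeholders(template):
    """Entity labels referenced as {LABEL} in a template, in order of appearance."""
    return [field for _, field, _, _ in string.Formatter().parse(template) if field]

def build_hypotheses(entities, template):
    """Fill a template with every combination of chunks whose labels match.

    entities: list of (label, chunk_text) pairs, e.g. from an NER step.
    Returns (chunk_combination, hypothesis_text) pairs.
    """
    labels = placeholders(template)
    candidates = [[text for label, text in entities if label == wanted] for wanted in labels]
    hypotheses = []
    for combo in product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # never pair a chunk with itself
        hypotheses.append((combo, template.format(**dict(zip(labels, combo)))))
    return hypotheses

entities = [("PARTY", "Meta, Inc."), ("DOC", "Purchase Agreement"), ("PARTY", "Whatsapp, Inc.")]
for combo, hypothesis in build_hypotheses(entities, "{PARTY} signed {DOC}"):
    print(combo, "->", hypothesis)
```

Each generated hypothesis would then be checked for entailment against the premise, which is why both parties end up related to the Purchase Agreement.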
You only need a few examples of the relation types you are looking for to get a proper result.
⚡ Make sure you keep the proper syntax of the relations you want to extract!
First, we download a sample agreement and run all the following steps on it.
! wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/legal-nlp/data/intellectual_property_agreement.txt
with open('intellectual_property_agreement.txt', 'r') as f:
agreement = f.read()
print(agreement[:1500])
Exhibit 10.2 Execution Version INTELLECTUAL PROPERTY AGREEMENT This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties"). WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 2018 (the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller's right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation ("AWP") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation ("HHFC," and together with the Company, the "Company Subsidiaries" and together with AWP, the "Company Entities" and each a "Company Entity") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein; WHEREAS, Arizona owns certain Co
Next, we take a sample text from the agreement. We will use the GRANT OF COPYRIGHT LICENSE clauses, so we split the agreement into sections to get those clauses.
document_assembler = nlp.DocumentAssembler() \
.setInputCol("text") \
.setOutputCol("document")
text_splitter = legal.TextSplitter() \
.setInputCols(["document"]) \
.setOutputCol("sections")\
.setCustomBounds(["\n\n", r"\d\.?\d? "])\
.setUseCustomBoundsOnly(True)\
.setExplodeSentences(True)
nlp_pipeline = nlp.Pipeline(stages=[
document_assembler,
text_splitter])
empty_df = spark.createDataFrame([[""]]).toDF("text")
model = nlp_pipeline.fit(empty_df)
light_model = nlp.LightPipeline(model)
result = light_model.annotate(agreement)
sections = result['sections']
sections[:20]
['Exhibit 10.2', 'Execution Version', 'INTELLECTUAL PROPERTY AGREEMENT', 'This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 20', '(the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").', 'WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 20', '(the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller\'s right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation ("AWP") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation ("HHFC," and together with the Company, the "Company Subsidiaries" and together with AWP, the "Company Entities" and each a "Company Entity") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;', "WHEREAS, Arizona owns certain Copyrights, Know-How, Patents and Trademarks which may be used in the Company Field, and in connection with the transactions contemplated by the Stock Purchase Agreement the Company desires to acquire all of Arizona's right, title and interest in and to such Intellectual Property used exclusively in the Company Field, and obtain a license from Arizona to use other such Intellectual Property on the terms and subject to the conditions set forth herein;", 'WHEREAS, Seller is signatory 
to the Trademark License Agreement pursuant to which Seller obtains a license to the Arizona Licensed Trademarks;', 'WHEREAS, the Company desires to obtain a sublicense to use the Arizona Licensed Trademarks in the Company Field;', 'WHEREAS, Arizona has obtained consent from all counterparties to the Trademark License Agreement to grant to the Company the sublicenses to the Arizona Licensed Trademarks included in this Agreement; and', 'WHEREAS, the Company Entities own certain Copyrights and Know-How which may be used in the Arizona Field, and in connection with the transactions contemplated by the Stock Purchase Agreement, Arizona desires to obtain a license from the Company Entities to use such Intellectual Property on the terms and subject to the conditions set forth herein.', 'NOW, THEREFORE, in consideration of the foregoing and the mutual agreements, provisions and covenants contained in this Agreement, and for other good and valuable consideration, the receipt and sufficiency of which are hereby acknowledged, the Parties hereby agree as follows:', 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019', 'DEFINITIONS AND INTERPRETATION', 'Certain Definitions. As used herein, capitalized terms have the meaning ascribed to them herein, including the following terms have the meanings set forth below. Capitalized terms that are not defined in this Agreement shall have the meaning set forth in the Stock Purchase Agreement. (a) "Arizona Assigned Copyrights" means all Copyrights, whether registered or unregistered, owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of November 14, 20', '(the "SPA Signing Date") and/or as of the Effective Date. 
(b) "Arizona Assigned Internet Domain Names" means the Internet domain names set forth on Schedule 1.1(b) and all other Internet domain names owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date (other than any Internet domain names that include any Arizona Licensed Trademarks). (c) "Arizona Assigned IP" means the Arizona Assigned Copyrights, Arizona Assigned Internet Domain Names, Arizona Assigned Know- How, Arizona Assigned Patents and Arizona Assigned Trademarks. (d) "Arizona Assigned Know-How" means all Know-How owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date. (e) "Arizona Assigned Patents" means the Patents set forth on Schedule 1.1(e) and all other Patents owned by Licensing or Seller and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date. (f) "Arizona Assigned Trademarks" means the Trademarks set forth on Schedule 1.1(f) and all other Trademarks owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the in the Company Field as of the SPA Signing Date and/or as of the Effective Date (other than, for clarity any Arizona Licensed Trademarks). (g) "Arizona Domain Names" means the Internet domain names set forth on Schedule 1.1(g). (h) "Arizona Field" means all activities conducted by Arizona or its Affiliates, other than the Company Field. (i) "Arizona Licensed Copyrights" means all Copyrights owned by Licensing or Seller or their respective Affiliates, as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Copyrights). 
2', 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019', '(j) "Arizona Licensed IP" means the Arizona Licensed Copyrights, the Arizona Licensed Know-How, the Arizona Licensed Patents, the Arizona Licensed Trademarks, the Diamond Licensed Trademarks and the Phase-Out Marks. (k) "Arizona Licensed Know-How" means all Know-How owned by Licensing or Seller or their respective Affiliates, as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Know- How). (l) "Arizona Licensed Patents" means the Patents set forth on Schedule 1.1(l) and all other Patents owned by Licensing or Seller or their respective Affiliates as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Patents). (m) "Arizona Licensed Trademarks" means the Trademarks set forth on Schedule 1.1(m). (n) "Arizona Trademark License Term" means the period commencing on the Effective Date and ending twenty-four (24) months thereafter. (o) "Company Field" means the design, development, manufacture, marketing, promotion, advertising, sourcing, distribution and sale of solid hardwood and engineered wood flooring products by or for any Company Entity. (p) "Company Licensed Copyrights" means all Copyrights and registrations and applications for any of the foregoing owned by any Company Entity as of the Effective Date and used or held for use in the Arizona Field as of the Effective Date. (q) "Company Licensed IP" means the Company Licensed Copyrights, the Company Licensed Know-How and the Company Licensed Patents. (r) "Company Licensed Know-How" means all Know-How owned by any Company Entity as of the Effective Date and used or held for use in the Arizona Field as of the Effective Date. (s) "Company Licensed Patents" means the Patents set forth on Schedule 1.1(s). 
(t) "Copyrights" means copyrights (whether registered or unregistered) including applications for copyright (excluding, for clarity, Trademarks). (u) "Diamond Licensed Trademarks" means the Trademarks set forth on Schedule 1.1(u). (v) "Diamond Product" means the design, development, manufacture, marketing, promotion, advertising, sourcing, distribution and sale of the solid hardwood flooring product by any Company Entity as conducted under the Diamond Licensed Trademarks by any Company Entity prior to the Effective Date 3', 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019']
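The custom bounds are regular expressions, which explains some of the splits above: \d\.?\d? followed by a space matches a digit, an optional dot, an optional second digit and a space. It therefore removes section numbers such as "1.1 ", but it also consumes the "18 " at the end of "2018 ", leaving "December 31, 20". The sketch below approximates this with Python's re.split; it is not the TextSplitter implementation, which may treat bounds differently.

```python
import re

# Same bounds as the TextSplitter above, joined into one alternation.
BOUNDS = ["\n\n", r"\d\.?\d? "]
PATTERN = re.compile("|".join(BOUNDS))

def split_sections(text):
    """Split on any bound, dropping the matched text and empty pieces."""
    return [part.strip() for part in PATTERN.split(text) if part.strip()]

print(split_sections('dated as of December 31, 2018 (the "Effective Date") is entered'))
print(split_sections("1.1 Certain Definitions. As used herein"))
```

A tighter bound, for example one anchored to the start of a line, would avoid cutting years, at the cost of re-checking which headings it still catches.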
sections.index('GRANT OF COPYRIGHT LICENSE')
30
We will get the first clause after the title as the sample text.
text = sections[31]
text
'Arizona Copyright Grant. Subject to the terms and conditions of this Agreement, Arizona hereby grants to the Company a perpetual, non- exclusive, royalty-free license in, to and under the Arizona Licensed Copyrights for use in the Company Field throughout the world.'
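Hard-coding index 31 works for this file. A slightly more general, hypothetical helper could collect every section between a heading and the next all-caps heading. The all-caps test is an assumption that fits the headings in this agreement (for example GRANT OF COPYRIGHT LICENSE) and will not hold for every document; the list below is a toy example.

```python
def sections_under(sections, heading):
    """Return the sections that follow `heading` up to the next all-caps heading."""
    start = sections.index(heading) + 1   # raises ValueError if the heading is missing
    clauses = []
    for section in sections[start:]:
        if section.isupper():             # assumed heading convention
            break
        clauses.append(section)
    return clauses

# Toy list shaped like the real `sections` output.
sample = [
    "DEFINITIONS AND INTERPRETATION",
    "Certain Definitions.",
    "GRANT OF COPYRIGHT LICENSE",
    "Arizona Copyright Grant.",
    "OTHER HEADING",
]
print(sections_under(sample, "GRANT OF COPYRIGHT LICENSE"))
```

With the real sections list, the first element of sections_under(sections, 'GRANT OF COPYRIGHT LICENSE') would be the same clause as sections[31].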
As mentioned above, we want to extract GRANTS and GRANTS_TO relations between the OBLIGATION_SUBJECT, OBLIGATION_ACTION and OBLIGATION_INDIRECT_OBJECT entities. To do this, we use the legner_obligations NER model to find the entities and then the legre_zero_shot model to extract the relations between them.
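Again, make sure you keep the proper syntax of the relations: each template is a short sentence in which the entity labels produced by the NER model appear inside curly brackets, for example {OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}.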
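A quick way to catch syntax mistakes before loading the model is to parse each template and check its placeholders. The sketch below uses Python's string.Formatter to list the {LABEL} fields. The set of allowed labels is an assumption taken from the entity labels that appear in this notebook's outputs, and the "exactly two placeholders" rule is likewise an assumption based on the templates used here. It also assumes the single-brace syntax for Spark NLP 4.0 and later mentioned in the code comment below; string.Formatter reads {{LABEL}} as an escaped literal brace, so a double-braced template shows up as having no placeholders.

```python
import string

# Entity labels seen in this notebook's outputs (assumed to be the complete set).
KNOWN_LABELS = {"OBLIGATION_SUBJECT", "OBLIGATION_ACTION",
                "OBLIGATION", "OBLIGATION_INDIRECT_OBJECT"}

def template_problems(template, known=KNOWN_LABELS):
    """Return a list of human-readable problems found in one relation template."""
    fields = [f for _, f, _, _ in string.Formatter().parse(template) if f]
    problems = []
    if len(fields) != 2:  # assumption: one placeholder per entity of the pair
        problems.append(f"expected 2 placeholders, found {len(fields)}")
    problems += [f"unknown label {f!r}" for f in fields if f not in known]
    return problems

categories = {
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"],
}
for relation, templates in categories.items():
    for t in templates:
        print(relation, template_problems(t) or "OK")
```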
document_assembler = nlp.DocumentAssembler()\
.setInputCol("text")\
.setOutputCol("document")
tokenizer = nlp.Tokenizer()\
.setInputCols("document")\
.setOutputCol("token")
tokenClassifier = legal.BertForTokenClassification.pretrained('legner_obligations','en', 'legal/models')\
.setInputCols("token", "document")\
.setOutputCol("ner")\
.setMaxSentenceLength(512)\
.setCaseSensitive(True)
ner_converter = nlp.NerConverter()\
.setInputCols(["document", "token", "ner"])\
.setOutputCol("ner_chunk")
re_model = legal.ZeroShotRelationExtractionModel.pretrained("legre_zero_shot", "en", "legal/models")\
.setInputCols(["ner_chunk", "document"]) \
.setOutputCol("relations")
# Remember it's 2 curly brackets instead of one if you are using Spark NLP < 4.0
re_model.setRelationalCategories({
"GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
"GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"]
})
pipeline = nlp.Pipeline(stages = [
document_assembler,
tokenizer,
tokenClassifier,
ner_converter,
re_model
])
empty_df = spark.createDataFrame([[""]]).toDF("text")
model = pipeline.fit(empty_df)
light_model = nlp.LightPipeline(model)
legner_obligations download started this may take some time. [OK!] legre_zero_shot download started this may take some time. [OK!]
result = light_model.fullAnnotate(text)
rel_df = get_relations_df(result)
rel_df[rel_df["relation"] != "no_rel"]
index | relation | entity1 | entity1_begin | entity1_end | chunk1 | entity2 | entity2_begin | entity2_end | chunk2 | confidence |
---|---|---|---|---|---|---|---|---|---|---|
0 | GRANTS_TO | OBLIGATION_SUBJECT | 80 | 86 | Arizona | OBLIGATION_INDIRECT_OBJECT | 109 | 115 | Company | 0.9535338 |
1 | GRANTS | OBLIGATION_SUBJECT | 80 | 86 | Arizona | OBLIGATION_ACTION | 88 | 100 | hereby grants | 0.9873099 |
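For downstream use, such as loading the results into a graph, the remaining rows can be reduced to (subject, relation, object) triples. Below is a minimal standard-library sketch using the two rows above; to_triples and group_by_subject are hypothetical helpers, and with the DataFrame you would read the chunk1, relation and chunk2 columns instead of a literal list.

```python
from collections import defaultdict

# (relation, chunk1, chunk2) copied from the zero-shot output above.
rows = [
    ("GRANTS_TO", "Arizona", "Company"),
    ("GRANTS", "Arizona", "hereby grants"),
]

def to_triples(rows):
    """Reorder rows into (subject, relation, object) triples."""
    return [(chunk1, relation, chunk2) for relation, chunk1, chunk2 in rows]

def group_by_subject(triples):
    """Collect all (relation, object) pairs for each subject."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)

print(group_by_subject(to_triples(rows)))
```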
# from sparknlp_display import RelationExtractionVisualizer
re_vis = nlp.viz.RelationExtractionVisualizer()
re_vis.display(result = result[0],
relation_col = "relations",
document_col = "document",
exclude_relations = ["no_rel"],
show_relations=True,
)
You can combine the Zero-shot RE model with other NER models to extract relations between other types of entities.