🎬 Installation¶

In [ ]:

! pip install -q johnsnowlabs

🔗 Automatic Installation¶

Using my.johnsnowlabs.com SSO

In [ ]:

from johnsnowlabs import nlp, legal

# nlp.install(force_browser=True)

🔗 Manual downloading¶

If you are not registered at my.johnsnowlabs.com, you received your license via e-mail, or you are using Safari, you may need to upload the license manually.

Go to my.johnsnowlabs.com
Download your license
Upload it using the following command

In [ ]:

from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()

Install it

In [ ]:

nlp.install()

📌 Starting¶

In [ ]:

spark = nlp.start()

🔎 Legal Relation Extraction (RE) and Zero-shot Relation Extraction¶

Legal relation extraction is a task in natural language processing (NLP) that involves extracting relationships between entities in legal documents. These relationships can be between people, organizations, or legal concepts.

Legal relation extraction is useful for a variety of purposes, including legal research, contract analysis, and legal case management. For example, legal relation extraction can be used to identify relationships between parties in a contract, such as the buyer and seller, or to extract clauses in a contract that outline certain obligations or rights.

✔️ Pretrained Relation Extraction Models and Pipelines for Legal¶

Here is the list of pretrained Relation Extraction models and pipelines:

Relation Extraction Models

index	model
1	Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Small, Bidirectional)
2	Legal Relation Extraction (Parties, Alias, Dates, Document Type) (Medium, Unidirectional)
3	Legal Relation Extraction (Alias)
4	Legal Relation Extraction (Whereas) (Small, Bidirectional)
5	Legal Relation Extraction (Whereas) (Medium, Unidirectional)
6	Legal Relation Extraction (Indemnification) (Small, Bidirectional)
7	Legal Relation Extraction (Indemnification) (Medium, Unidirectional)
8	Legal Relation Extraction (Confidentiality) (Small, Bidirectional)
9	Legal Relation Extraction (Confidentiality) (Medium, Unidirectional)
10	Legal Relation Extraction (Warranty)
11	Legal Relation Extraction (Grants) (Medium, Unidirectional)
12	Legal Relation Extraction (Obligations) (Medium, Unidirectional)
13	Legal Relation Extraction (Notice Clause)
14	Legal Zero-shot Relation Extraction
15	Pretrained Pipeline (Whereas)

✔️ Relation Extraction Model to Infer Relations Between Elements in OBLIGATIONS-like sentences¶

An obligation sentence in a legal agreement is a provision that specifies the duties, responsibilities, and obligations of one or more parties to the agreement. These sentences outline the specific actions that a party must take or refrain from taking in order to fulfill their obligations under the agreement. They are an important part of any legal agreement, as they help to ensure that the parties understand and agree to their respective roles and responsibilities.

📚We understand an obligation as a sentence or sentences in which a Party (OBLIGATION_SUBJECT) must do (OBLIGATION_ACTION) something (OBLIGATION_OBJECT) to another Party (OBLIGATION_INDIRECT_OBJECT).

In [ ]:

# Create Generic Function to Show Relations in Dataframe

import pandas as pd

def get_relations_df(results, col='relations'):
    rel_pairs = []
    # `results` holds one entry per input text; each entry maps output
    # column names to lists of annotations.
    for result in results:
        for rel in result[col]:
            rel_pairs.append((
                rel.result,
                rel.metadata['entity1'],
                rel.metadata['entity1_begin'],
                rel.metadata['entity1_end'],
                rel.metadata['chunk1'],
                rel.metadata['entity2'],
                rel.metadata['entity2_begin'],
                rel.metadata['entity2_end'],
                rel.metadata['chunk2'],
                rel.metadata['confidence']
            ))
    columns = ['relation', 'entity1', 'entity1_begin', 'entity1_end', 'chunk1',
               'entity2', 'entity2_begin', 'entity2_end', 'chunk2', 'confidence']
    rel_df = pd.DataFrame(rel_pairs, columns=columns)
    return rel_df
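To make the expected input shape easier to see, here is a minimal sketch that needs neither Spark nor pandas. The `make_relation` and `relation_rows` helpers are hypothetical stand-ins written for this illustration, and the sketch assumes that metadata values arrive as strings. The sample values are copied from one row of the obligations output shown further down.

```python
from types import SimpleNamespace

def make_relation(label, entity1, span1, chunk1, entity2, span2, chunk2, confidence):
    """Hypothetical stand-in for one relation annotation (not a library class)."""
    return SimpleNamespace(
        result=label,
        metadata={
            "entity1": entity1, "entity1_begin": str(span1[0]), "entity1_end": str(span1[1]),
            "chunk1": chunk1,
            "entity2": entity2, "entity2_begin": str(span2[0]), "entity2_end": str(span2[1]),
            "chunk2": chunk2,
            "confidence": confidence,
        },
    )

def relation_rows(results, col="relations"):
    """Flatten relation annotations into plain dicts, one per relation."""
    rows = []
    for doc in results:
        for rel in doc[col]:
            rows.append({"relation": rel.result, **rel.metadata})
    return rows

# One fake document containing one relation.
fake_results = [{
    "relations": [
        make_relation("is_obliged_with",
                      "OBLIGATION_SUBJECT", (0, 7), "Licensee",
                      "OBLIGATION_INDIRECT_OBJECT", (45, 52), "Licensor",
                      "0.8136201"),
    ]
}]

rows = relation_rows(fake_results)
print(rows[0]["relation"], rows[0]["chunk1"], rows[0]["chunk2"])
```

The real helper above does the same traversal and only differs in returning a pandas DataFrame.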

In [ ]:

sample_text = ["""In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as "OTHER TAXES").""",
              
               """Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark."""]

sample_text               

Out[ ]:

['In addition, the Borrowers agree to pay any present or future stamp or documentary taxes or any other excise or property taxes or similar levies which arise from any payment made hereunder or from the execution, delivery, or registration of, or otherwise with respect to, this Agreement or any Note (hereinafter referred to as "OTHER TAXES").',
 'Licensee agrees to reasonably cooperate with Licensor in achieving registration of the Licensed Mark.']

In [ ]:

document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

ner_model = legal.BertForTokenClassification.pretrained("legner_obligations", "en", "legal/models")\
    .setInputCols("token", "document")\
    .setOutputCol("ner")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["document","token","ner"])\
    .setOutputCol("ner_chunk")

re_model = legal.RelationExtractionDLModel.pretrained("legre_obligations_md", "en", "legal/models")\
    .setPredictionThreshold(0.4)\
    .setInputCols(["ner_chunk", "document"])\
    .setOutputCol("relations")

pipeline = nlp.Pipeline(stages=[
        document_assembler, 
        tokenizer,
        ner_model,
        ner_converter,
        re_model
])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_df)

legner_obligations download started this may take some time.
[OK!]
legre_obligations_md download started this may take some time.
[OK!]

In [ ]:

light_model = nlp.LightPipeline(model)

result = light_model.fullAnnotate(sample_text)

In [ ]:

rel_df = get_relations_df(result)

rel_df[rel_df["relation"] != "other"]

Out[ ]:

	relation	entity1	entity1_begin	entity1_end	chunk1	entity2	entity2_begin	entity2_end	chunk2	confidence
0	is_obliged_to	OBLIGATION_ACTION	27	38	agree to pay	OBLIGATION_SUBJECT	13	25	the Borrowers	0.9983413
1	is_obliged_to	OBLIGATION_SUBJECT	13	25	the Borrowers	OBLIGATION	40	143	any present or future stamp or documentary tax...	0.46110857
2	is_obliged_object	OBLIGATION_ACTION	27	38	agree to pay	OBLIGATION	40	143	any present or future stamp or documentary tax...	0.9991379
3	is_obliged_to	OBLIGATION_ACTION	9	38	agrees to reasonably cooperate	OBLIGATION_SUBJECT	0	7	Licensee	0.9090177
4	is_obliged_with	OBLIGATION_SUBJECT	0	7	Licensee	OBLIGATION_INDIRECT_OBJECT	45	52	Licensor	0.8136201
5	is_obliged_to	OBLIGATION	54	100	in achieving registration of the Licensed Mark.	OBLIGATION_SUBJECT	0	7	Licensee	0.86316615
6	is_obliged_object	OBLIGATION_ACTION	9	38	agrees to reasonably cooperate	OBLIGATION_INDIRECT_OBJECT	45	52	Licensor	0.96135247
7	is_obliged_object	OBLIGATION_ACTION	9	38	agrees to reasonably cooperate	OBLIGATION	54	100	in achieving registration of the Licensed Mark.	0.82649904
8	is_obliged_to	OBLIGATION_INDIRECT_OBJECT	45	52	Licensor	OBLIGATION	54	100	in achieving registration of the Licensed Mark.	0.9142798
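Row 1 above has a confidence of about 0.46, just over the 0.4 prediction threshold set on the model, while most other rows are above 0.8. If you only want the stronger relations, you can filter after the fact. The sketch below works on plain tuples copied from the first sentence's rows. The `MIN_CONFIDENCE` value is an arbitrary choice for illustration, and confidences are kept as strings on the assumption that this is how they come out of the metadata.

```python
# (relation, chunk1, chunk2, confidence) copied from rows 0-2 of the table above.
rows = [
    ("is_obliged_to", "agree to pay", "the Borrowers", "0.9983413"),
    ("is_obliged_to", "the Borrowers",
     "any present or future stamp or documentary tax...", "0.46110857"),
    ("is_obliged_object", "agree to pay",
     "any present or future stamp or documentary tax...", "0.9991379"),
]

MIN_CONFIDENCE = 0.8  # hypothetical cut-off, tune it for your own data

# Cast to float before comparing, since the values are stored as strings here.
confident = [row for row in rows if float(row[3]) >= MIN_CONFIDENCE]

for relation, chunk1, chunk2, confidence in confident:
    print(f"{relation}: {chunk1!r} -> {chunk2!r} ({confidence})")
```

With the DataFrame from `get_relations_df`, the same idea would be a cast of the `confidence` column to float followed by a boolean mask.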

In [ ]:

re_vis = nlp.viz.RelationExtractionVisualizer()

for i in range(len(sample_text)):
  re_vis.display(result = result[i],
            relation_col = "relations",
            document_col = "document",
            exclude_relations = ["other"],
            show_relations=True
            )

✔️ Zero Shot Relation Extraction to Extract Relations Between Legal Entities¶

Now, let's suppose we want to extract GRANTS and GRANTS_TO relations between the OBLIGATION_SUBJECT, OBLIGATION_ACTION and OBLIGATION_INDIRECT_OBJECT entities. We don't have a pretrained model for those relations, but there is another way.

That's where Zero-shot RE comes in. With a Zero-shot RE model you can define your own relations without any training data and without a model trained specifically for them.

✔️ A variation of NLI for Zero-shot Relation Extraction¶

Similarly to Zero-shot NER, Zero-shot RE works with hypotheses (H) and premises (P): a relation is returned as a positive result only if H is entailed by P.

📜In this case, what we do is:

We take a prompt in the form {ENT_1} [some_text] {ENT_2}
ENT_1 is filled with entities from a previous NER
ENT_2 too.
We ask the ZeroShotRE model whether, given the whole text as the premise, the hypothesis {ENT_1} [some_text] {ENT_2} is entailed.

For example, ENT_1 is PARTY, ENT_2 is DOC, and [some_text] is "was signed".

Given the premise "Meta, Inc. signed a Purchase Agreement with Whatsapp, Inc.", the prompt above will be entailed for both pairs: Meta, Inc. with Purchase Agreement, and Whatsapp, Inc. with Purchase Agreement.
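To make the prompt-filling step concrete, here is a small sketch in plain Python. It only builds the candidate hypotheses and does not run any entailment model. `build_hypotheses` is a hypothetical helper, the template wording "signed" is our own choice for the example, and the actual model may combine entities differently.

```python
from itertools import product
from string import Formatter

def build_hypotheses(template, entities):
    """Fill each {LABEL} slot with every chunk the NER found for that label."""
    # Collect the slot names in the order they appear in the template.
    labels = [name for _, name, _, _ in Formatter().parse(template) if name]
    chunk_lists = [entities.get(label, []) for label in labels]
    return [
        template.format(**dict(zip(labels, combo)))
        for combo in product(*chunk_lists)
    ]

# Chunks a previous NER step might have produced for the Meta example.
entities = {
    "PARTY": ["Meta, Inc.", "Whatsapp, Inc."],
    "DOC": ["Purchase Agreement"],
}

hypotheses = build_hypotheses("{PARTY} signed {DOC}", entities)
for hypothesis in hypotheses:
    print(hypothesis)
```

Each printed sentence is one hypothesis that would then be checked for entailment against the full text.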

🔎 Some examples¶

Just a few examples of the relation types you are looking for are enough to output a proper result.

⚡!!!Make sure you keep the proper syntax of the relations you want to extract!!!

First, we will download a sample dataset and use it for all the following steps.

In [ ]:

! wget -q https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/legal-nlp/data/intellectual_property_agreement.txt

In [ ]:

with open('intellectual_property_agreement.txt', 'r') as f:
  agreement = f.read()
print(agreement[:1500])

Exhibit 10.2

Execution Version

INTELLECTUAL PROPERTY AGREEMENT

This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 2018 (the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").

WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 2018 (the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller's right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation ("AWP") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation ("HHFC," and together with the Company, the "Company Subsidiaries" and together with AWP, the "Company Entities" and each a "Company Entity") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;

WHEREAS, Arizona owns certain Co

📚 Get sample clause from agreement¶

First, we will get a sample text from the agreement. We will use the GRANT OF COPYRIGHT LICENSE clauses, so we will split the agreement to get those clauses.

In [ ]:

document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

text_splitter = legal.TextSplitter() \
    .setInputCols(["document"]) \
    .setOutputCol("sections")\
    .setCustomBounds(["\n\n", r"\d\.?\d? "])\
    .setUseCustomBoundsOnly(True)\
    .setExplodeSentences(True)

nlp_pipeline = nlp.Pipeline(stages=[
    document_assembler,
    text_splitter])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = nlp_pipeline.fit(empty_df)

light_model = nlp.LightPipeline(model)

In [ ]:

result = light_model.annotate(agreement)

sections = result['sections']

In [ ]:

sections[:20]

Out[ ]:

['Exhibit 10.2',
 'Execution Version',
 'INTELLECTUAL PROPERTY AGREEMENT',
 'This INTELLECTUAL PROPERTY AGREEMENT (this "Agreement"), dated as of December 31, 20',
 '(the "Effective Date") is entered into by and between Armstrong Flooring, Inc., a Delaware corporation ("Seller") and AFI Licensing LLC, a Delaware limited liability company ("Licensing" and together with Seller, "Arizona") and AHF Holding, Inc. (formerly known as Tarzan HoldCo, Inc.), a Delaware corporation ("Buyer") and Armstrong Hardwood Flooring Company, a Tennessee corporation (the "Company" and together with Buyer the "Buyer Entities") (each of Arizona on the one hand and the Buyer Entities on the other hand, a "Party" and collectively, the "Parties").',
 'WHEREAS, Seller and Buyer have entered into that certain Stock Purchase Agreement, dated November 14, 20',
 '(the "Stock Purchase Agreement"); WHEREAS, pursuant to the Stock Purchase Agreement, Seller has agreed to sell and transfer, and Buyer has agreed to purchase and acquire, all of Seller\'s right, title and interest in and to Armstrong Wood Products, Inc., a Delaware corporation ("AWP") and its Subsidiaries, the Company and HomerWood Hardwood Flooring Company, a Delaware corporation ("HHFC," and together with the Company, the "Company Subsidiaries" and together with AWP, the "Company Entities" and each a "Company Entity") by way of a purchase by Buyer and sale by Seller of the Shares, all upon the terms and condition set forth therein;',
 "WHEREAS, Arizona owns certain Copyrights, Know-How, Patents and Trademarks which may be used in the Company Field, and in connection with the transactions contemplated by the Stock Purchase Agreement the Company desires to acquire all of Arizona's right, title and interest in and to such Intellectual Property used exclusively in the Company Field, and obtain a license from Arizona to use other such Intellectual Property on the terms and subject to the conditions set forth herein;",
 'WHEREAS, Seller is signatory to the Trademark License Agreement pursuant to which Seller obtains a license to the Arizona Licensed Trademarks;',
 'WHEREAS, the Company desires to obtain a sublicense to use the Arizona Licensed Trademarks in the Company Field;',
 'WHEREAS, Arizona has obtained consent from all counterparties to the Trademark License Agreement to grant to the Company the sublicenses to the Arizona Licensed Trademarks included in this Agreement; and',
 'WHEREAS, the Company Entities own certain Copyrights and Know-How which may be used in the Arizona Field, and in connection with the transactions contemplated by the Stock Purchase Agreement, Arizona desires to obtain a license from the Company Entities to use such Intellectual Property on the terms and subject to the conditions set forth herein.',
 'NOW, THEREFORE, in consideration of the foregoing and the mutual agreements, provisions and covenants contained in this Agreement, and for other good and valuable consideration, the receipt and sufficiency of which are hereby acknowledged, the Parties hereby agree as follows:',
 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019',
 'DEFINITIONS AND INTERPRETATION',
 'Certain Definitions. As used herein, capitalized terms have the meaning ascribed to them herein, including the following terms have the meanings set forth below. Capitalized terms that are not defined in this Agreement shall have the meaning set forth in the Stock Purchase Agreement. (a) "Arizona Assigned Copyrights" means all Copyrights, whether registered or unregistered, owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of November 14, 20',
 '(the "SPA Signing Date") and/or as of the Effective Date. (b) "Arizona Assigned Internet Domain Names" means the Internet domain names set forth on Schedule 1.1(b) and all other Internet domain names owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date (other than any Internet domain names that include any Arizona Licensed Trademarks). (c) "Arizona Assigned IP" means the Arizona Assigned Copyrights, Arizona Assigned Internet Domain Names, Arizona Assigned Know- How, Arizona Assigned Patents and Arizona Assigned Trademarks. (d) "Arizona Assigned Know-How" means all Know-How owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date. (e) "Arizona Assigned Patents" means the Patents set forth on Schedule 1.1(e) and all other Patents owned by Licensing or Seller and used or held for use exclusively in the Company Field as of the SPA Signing Date and/or as of the Effective Date. (f) "Arizona Assigned Trademarks" means the Trademarks set forth on Schedule 1.1(f) and all other Trademarks owned by Licensing or Seller as of the Effective Date and used or held for use exclusively in the in the Company Field as of the SPA Signing Date and/or as of the Effective Date (other than, for clarity any Arizona Licensed Trademarks). (g) "Arizona Domain Names" means the Internet domain names set forth on Schedule 1.1(g). (h) "Arizona Field" means all activities conducted by Arizona or its Affiliates, other than the Company Field. (i) "Arizona Licensed Copyrights" means all Copyrights owned by Licensing or Seller or their respective Affiliates, as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Copyrights). 2',
 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019',
 '(j) "Arizona Licensed IP" means the Arizona Licensed Copyrights, the Arizona Licensed Know-How, the Arizona Licensed Patents, the Arizona Licensed Trademarks, the Diamond Licensed Trademarks and the Phase-Out Marks. (k) "Arizona Licensed Know-How" means all Know-How owned by Licensing or Seller or their respective Affiliates, as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Know- How). (l) "Arizona Licensed Patents" means the Patents set forth on Schedule 1.1(l) and all other Patents owned by Licensing or Seller or their respective Affiliates as of the Effective Date and used or held for use in the Company Field during the five (5) years prior to the Effective Date (other than the Arizona Assigned Patents). (m) "Arizona Licensed Trademarks" means the Trademarks set forth on Schedule 1.1(m). (n) "Arizona Trademark License Term" means the period commencing on the Effective Date and ending twenty-four (24) months thereafter. (o) "Company Field" means the design, development, manufacture, marketing, promotion, advertising, sourcing, distribution and sale of solid hardwood and engineered wood flooring products by or for any Company Entity. (p) "Company Licensed Copyrights" means all Copyrights and registrations and applications for any of the foregoing owned by any Company Entity as of the Effective Date and used or held for use in the Arizona Field as of the Effective Date. (q) "Company Licensed IP" means the Company Licensed Copyrights, the Company Licensed Know-How and the Company Licensed Patents. (r) "Company Licensed Know-How" means all Know-How owned by any Company Entity as of the Effective Date and used or held for use in the Arizona Field as of the Effective Date. (s) "Company Licensed Patents" means the Patents set forth on Schedule 1.1(s). 
(t) "Copyrights" means copyrights (whether registered or unregistered) including applications for copyright (excluding, for clarity, Trademarks). (u) "Diamond Licensed Trademarks" means the Trademarks set forth on Schedule 1.1(u). (v) "Diamond Product" means the design, development, manufacture, marketing, promotion, advertising, sourcing, distribution and sale of the solid hardwood flooring product by any Company Entity as conducted under the Diamond Licensed Trademarks by any Company Entity prior to the Effective Date 3',
 'Source: ARMSTRONG FLOORING, INC., 8-K, 1/7/2019']
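You may have noticed that some sections above end in the middle of a date, for example 'December 31, 20'. That follows from the second custom bound, which also matches the last two digits of a year when a space comes after them. The sketch below reproduces this with Python's `re` module. It is only an approximation, since the splitter may apply its bounds differently from `re.split`.

```python
import re

# Second custom bound from the splitter above: a digit, an optional dot,
# an optional second digit, then a space.
BOUND = r"\d\.?\d? "

# It matches clause numbers such as "1.1 " ...
heading_match = re.search(BOUND, "1.1 Certain Definitions.")
print(repr(heading_match.group()))

# ... but also the tail of a year followed by a space.
sample = 'dated as of December 31, 2018 (the "Effective Date")'
date_match = re.search(BOUND, sample)
print(repr(date_match.group()))

# Splitting on the bound therefore cuts the date in two.
pieces = re.split(BOUND, sample)
print(pieces)
```

For this notebook the effect is harmless, because the clause we need does not contain such a pattern.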

In [ ]:

sections.index('GRANT OF COPYRIGHT LICENSE')

Out[ ]:

We will get the first clause after the title as the sample text.

In [ ]:

text = sections[31]

text

Out[ ]:

'Arizona Copyright Grant. Subject to the terms and conditions of this Agreement, Arizona hereby grants to the Company a perpetual, non- exclusive, royalty-free license in, to and under the Arizona Licensed Copyrights for use in the Company Field throughout the world.'
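Instead of hard-coding the position, the clause that follows a title can also be looked up directly. Below is a minimal sketch on a shortened toy list. `clause_after` is a hypothetical helper, and the toy entries are abbreviated stand-ins for the real sections.

```python
def clause_after(sections, title):
    """Return the section that immediately follows `title`."""
    idx = sections.index(title)  # raises ValueError if the title is missing
    if idx + 1 >= len(sections):
        raise IndexError(f"No section follows {title!r}")
    return sections[idx + 1]

# Abbreviated toy data, not the full agreement.
toy_sections = [
    "DEFINITIONS AND INTERPRETATION",
    "Certain Definitions.",
    "GRANT OF COPYRIGHT LICENSE",
    "Arizona Copyright Grant.",
]

sample_clause = clause_after(toy_sections, "GRANT OF COPYRIGHT LICENSE")
print(sample_clause)
```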

📚 Extract Relations with Zero-shot RE Model¶

As mentioned above, we want to extract GRANTS and GRANTS_TO relations between the OBLIGATION_SUBJECT, OBLIGATION_ACTION and OBLIGATION_INDIRECT_OBJECT entities. To do this, we first use the legner_obligations NER model and then the legre_zero_shot model to extract relations.

But !!!make sure you keep the proper syntax of the relations you want to extract!!!

In [ ]:

documentAssembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

tokenizer = nlp.Tokenizer()\
    .setInputCols("document")\
    .setOutputCol("token")

tokenClassifier = legal.BertForTokenClassification.pretrained('legner_obligations','en', 'legal/models')\
    .setInputCols("token", "document")\
    .setOutputCol("ner")\
    .setMaxSentenceLength(512)\
    .setCaseSensitive(True)

ner_converter = nlp.NerConverter()\
    .setInputCols(["document", "token", "ner"])\
    .setOutputCol("ner_chunk")

re_model = legal.ZeroShotRelationExtractionModel.pretrained("legre_zero_shot", "en", "legal/models")\
    .setInputCols(["ner_chunk", "document"]) \
    .setOutputCol("relations")

# Remember it's 2 curly brackets instead of one if you are using Spark NLP < 4.0
re_model.setRelationalCategories({
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"]
})

pipeline = nlp.Pipeline(stages = [
                documentAssembler,  
                tokenizer,
                tokenClassifier, 
                ner_converter,
                re_model
               ])

empty_df = spark.createDataFrame([[""]]).toDF("text")

model = pipeline.fit(empty_df)

light_model = nlp.LightPipeline(model)

legner_obligations download started this may take some time.
[OK!]
legre_zero_shot download started this may take some time.
[OK!]
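Since the prompt syntax matters so much, it can help to check the templates before passing them to the model. The sketch below is a plain-Python sanity check and not part of the library. `check_templates` is a hypothetical helper, the label set is taken from the obligations output earlier in this notebook, and the "exactly two slots" rule is our reading of the {ENT_1} [some_text] {ENT_2} form.

```python
from string import Formatter

# Labels seen in the legner_obligations output earlier in this notebook.
KNOWN_LABELS = {
    "OBLIGATION_SUBJECT", "OBLIGATION_ACTION",
    "OBLIGATION_INDIRECT_OBJECT", "OBLIGATION",
}

def check_templates(categories, known_labels=KNOWN_LABELS):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for relation, templates in categories.items():
        for template in templates:
            try:
                slots = [name for _, name, _, _ in Formatter().parse(template) if name]
            except ValueError as exc:  # e.g. an unbalanced curly bracket
                problems.append(f"{relation}: {template!r} is malformed ({exc})")
                continue
            if len(slots) != 2:
                problems.append(f"{relation}: {template!r} has {len(slots)} slots, expected 2")
            for slot in slots:
                if slot not in known_labels:
                    problems.append(f"{relation}: unknown entity label {slot!r}")
    return problems

categories = {
    "GRANTS_TO": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_INDIRECT_OBJECT}"],
    "GRANTS": ["{OBLIGATION_SUBJECT} grants {OBLIGATION_ACTION}"],
}
print(check_templates(categories))

# A deliberately broken example: misspelled label and a missing second slot.
broken = {"GRANTS": ["{OBLIGATION_SUBJCT} grants"]}
broken_problems = check_templates(broken)
print(broken_problems)
```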

In [ ]:

result = light_model.fullAnnotate(text)

rel_df = get_relations_df(result)

rel_df[rel_df["relation"] != "no_rel"]

Out[ ]:

	relation	entity1	entity1_begin	entity1_end	chunk1	entity2	entity2_begin	entity2_end	chunk2	confidence
0	GRANTS_TO	OBLIGATION_SUBJECT	80	86	Arizona	OBLIGATION_INDIRECT_OBJECT	109	115	Company	0.9535338
1	GRANTS	OBLIGATION_SUBJECT	80	86	Arizona	OBLIGATION_ACTION	88	100	hereby grants	0.9873099
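The begin and end columns above are character offsets into the input text, with the end position inclusive. A quick way to convince yourself is to slice the clause with those numbers, as in this small sketch. `chunk_at` is a hypothetical helper, and the text and spans are copied from the cells above.

```python
text = ('Arizona Copyright Grant. Subject to the terms and conditions of this '
        'Agreement, Arizona hereby grants to the Company a perpetual, non- exclusive, '
        'royalty-free license in, to and under the Arizona Licensed Copyrights for use '
        'in the Company Field throughout the world.')

# (chunk, begin, end) copied from the result table above.
spans = [
    ("Arizona", 80, 86),
    ("Company", 109, 115),
    ("hereby grants", 88, 100),
]

def chunk_at(text, begin, end):
    """Slice out a chunk; `end` is inclusive, so add 1 for Python slicing."""
    return text[begin:end + 1]

for chunk, begin, end in spans:
    print(repr(chunk_at(text, begin, end)), chunk_at(text, begin, end) == chunk)
```

Note that offset 80 points at the second "Arizona" in the clause (the grantor), not the one in the clause title.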

📚 Visualization of Extracted Relations¶

In [ ]:

# from sparknlp_display import RelationExtractionVisualizer

re_vis = nlp.viz.RelationExtractionVisualizer()

re_vis.display(result = result[0],
           relation_col = "relations",
           document_col = "document",
           exclude_relations = ["no_rel"],
           show_relations=True,
           )

You can use the Zero-shot RE model with other NER models to extract different relations between different entities.