📜One of the biggest recent advances in NLP is Language Models and their ability to answer questions expressed in natural language.
*While our gross profit margin increased to 81.4% in 2020 from 63.1% in 2019, our revenues declined approximately 27% in 2020...
... We reported an operating loss of approximately $8,048,581 million in 2020 as compared to an operating loss of $7,738,193 in 2019 ...*
- What is the profit increase?
- What was the decline in revenue?
- What was the operating loss in 2020?
- What was the operating loss in 2019?
📜Question Answering (QA) uses specific Language Models trained to carry out Natural Language Inference (NLI).
NLI works as follows: given a text P (the premise) and a hypothesis H, the model decides whether H is entailed, contradicted, or not related in P. Although we are not getting into the maths of it, it's basically done by using a Language Model to encode P and H and then carrying out sentence similarity operations.
The most straightforward application is retrieving answers to natural language questions.
At John Snow Labs, we have developed our own annotators based on NLI, not only to carry out Question Answering, but also to use QA for other tasks, such as extracting entities (NER), as described next.
Given a question Q, for example, "What was the profit increase in 2017?", and given the text P, "In 2017, the Company reported a profit decline of $4 million dollars compared to 2016", we:
- Generate hypotheses H with the tokens of the text.
- Check every H against P to see whether it is entailed. In the original example, the hypotheses were labelled contradiction, contradiction, contradiction, entailment, entailment.
- If an H is entailed, return the corresponding token as an NER entity. If several tokens in a row are entailed, check whether they can be part of the same chunk.
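The last step above, grouping consecutive entailed tokens into a single chunk, can be sketched in a few lines of plain Python. This is only an illustration: the function name `merge_entailed`, the tokens, and the labels are hypothetical and hard-coded here, whereas in the real annotators the labels come from the NLI model.

```python
from itertools import groupby

def merge_entailed(tokens, labels):
    """Join runs of consecutive tokens labelled 'entailment' into chunks."""
    chunks = []
    # groupby splits the sequence into runs that share the same label
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        if label == "entailment":
            chunks.append(" ".join(token for token, _ in run))
    return chunks

# Hypothetical tokens and labels, hard-coded for illustration only
tokens = ["profit", "decline", "of", "$4", "million"]
labels = ["contradiction", "contradiction", "contradiction", "entailment", "entailment"]
chunks = merge_entailed(tokens, labels)
print(chunks)
```

Whether two neighbouring tokens really belong to the same chunk may depend on more than adjacency, so treat this as the simplest possible merging rule.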
Let's take a look at some examples of applications of QA to Financial Texts.
! pip install -q johnsnowlabs
Using my.johnsnowlabs.com SSO
from johnsnowlabs import nlp, finance
# nlp.install(force_browser=True)
If you are not registered at my.johnsnowlabs.com, you received your license via e-mail, or you are using Safari, you may need to upload the license manually.
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()
nlp.install()
spark = nlp.start()
👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_7162 (6).json
👌 Launched cpu optimized session with with: 🚀Spark-NLP==4.4.1, 💊Spark-Healthcare==4.4.2, running on ⚡ PySpark==3.1.2
! wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp-workshop/master/finance-nlp/data/cdns-20220101.html.txt
with open('cdns-20220101.html.txt', 'r') as f:
    cadence_sec10k = f.read()
Let's take a random piece of text from our 10-K filing...
random_piece = cadence_sec10k[135000:144000]
print(random_piece)
necessary, on commercially reasonable terms or at all and, even if successful, those alternative actions may not allow us to meet our scheduled debt service obligations. The agreement governing our revolving credit facility restricts our ability to dispose of assets and use the proceeds from those dispositions and may also restrict our ability to raise debt or equity capital to be used to repay other indebtedness when it becomes due. We may not be able to consummate those dispositions or to obtain proceeds in an amount sufficient to meet any debt service obligations then due. In addition, we conduct a substantial portion of our operations through our subsidiaries, none of which are currently guarantors of our indebtedness. Accordingly, repayment of our indebtedness is dependent on the generation of cash flow by our subsidiaries and their ability to make such cash available to us, by dividend, debt repayment or otherwise. Our subsidiaries do not have any obligation to pay amounts due on our indebtedness or to make funds available for that purpose. Our subsidiaries may not be able to, or may not be permitted to, make distributions to enable us to make payments in respect of our indebtedness. Each subsidiary is a distinct legal entity, and, under certain circumstances, legal and contractual restrictions may limit our ability to obtain cash from our subsidiaries. In the event that we do not receive distributions from our subsidiaries, we may be unable to make required principal and interest payments on our indebtedness. 24 Table of Contents If we cannot make scheduled payments on our debt, we will be in default and holders of our debt could declare all outstanding principal and interest to be due and payable, the lenders under our revolving credit facility could terminate their commitments to loan money and we could be forced into bankruptcy or liquidation. 
In addition, a material default on our indebtedness could suspend our eligibility to register securities using certain registration statement forms under SEC guidelines that permit incorporation by reference of substantial information regarding us, potentially hindering our ability to raise capital through the issuance of our securities and increasing our costs of registration. Despite our current level of indebtedness, we and our subsidiaries may incur substantially more debt. This could further exacerbate the risks to our financial condition described above. We and our subsidiaries may incur significant additional indebtedness in the future. Although the agreement governing our revolving credit facility contains restrictions on the incurrence of additional indebtedness, these restrictions are subject to a number of qualifications and exceptions, and the additional indebtedness incurred in compliance with these restrictions could be substantial. If we incur any additional indebtedness that ranks equally with the 2024 Notes, then subject to any collateral arrangements we may enter into, the holders of that debt will be entitled to share ratably in any proceeds distributed in connection with any insolvency, liquidation, reorganization, dissolution or other winding up of our company. Our variable rate indebtedness subjects us to interest rate risk, which could cause our debt service obligations to increase significantly. Borrowings under our revolving credit facility are at variable rates of interest and expose us to interest rate risk. If interest rates were to increase, our debt service obligations on our variable rate indebtedness would increase even though the amount borrowed remained the same, and our net income and cash flows, including cash available for servicing our indebtedness, would correspondingly decrease. 
In the future, we may enter into interest rate swaps that involve the exchange of floating for fixed rate interest payments in order to reduce interest rate volatility. However, we may not maintain interest rate swaps with respect to all of our variable rate indebtedness, and any swaps we enter into may not fully mitigate our interest rate risk. Our revolving credit facility utilizes, at our option, either (1) LIBOR, plus a margin of between 0.750% and 1.250%, determined by reference to the credit rating of our unsecured debt, or (2) base rate plus a margin of 0.000% to 0.250%, determined by reference to the credit rating of our unsecured debt, to calculate the amount of accrued interest on any borrowings. Regulators in certain jurisdictions including the United Kingdom and the United States have begun to phase out the use of LIBOR, ceasing publication for certain tenors of the U.S. dollar (and other) LIBOR at the end of 2021, with plans to cease publication for the remaining tenors of U.S. dollar LIBOR beginning June 30, 2023. Our revolving credit facility contains provisions that contemplate the transition from LIBOR under specified events; however, the transition from LIBOR to a new replacement benchmark remains uncertain at this time and the consequences of such developments cannot be entirely predicted, but could result in an increase in the cost of our borrowings under our existing credit facility and any future borrowings. In addition, our revolving credit facility uses a pricing grid based on our credit ratings. If our credit ratings are downgraded or other negative action is taken, the interest rate payable by us under our revolving credit facility would increase. Credit rating downgrades could also restrict our ability to obtain additional financing in the future and affect the terms of any such financing. 
Various factors could increase our future borrowing costs or reduce our access to capital, including a lowering or withdrawal of the ratings assigned to us and our 2024 Notes by credit rating agencies. We may in the future seek additional financing for a variety of reasons, and our future borrowing costs, terms and access to capital could be affected by factors including the condition of the debt and equity markets, the condition of the economy generally, prevailing interest rates, our level of indebtedness, our credit rating and our business and financial condition. In addition, the 2024 Notes currently have an investment grade credit rating, which could be lowered or withdrawn entirely by a credit rating agency based on adverse changes to circumstances relating to the basis of the credit rating. Consequently, real or anticipated changes in our credit ratings will generally affect the market value of the 2024 Notes. Any future lowering of the credit ratings of the 2024 Notes likely would make it more difficult or more expensive for us to obtain additional debt financing. Item 1B. Unresolved Staff Comments None. Item 2. Properties We own land and buildings at our headquarters located in San Jose, California. We also own buildings in India. As of January 1, 2022, the total square footage of our owned buildings was approximately 1,010,000. We lease additional facilities in the United States and various other countries. We may sublease certain of these facilities where space is not fully utilized. We believe that these facilities are adequate for our current needs and that suitable additional or substitute space will be available as needed to accommodate any expansion of our operations. 25 Table of Contents Item 3. Legal Proceedings From time to time, we are involved in various disputes and legal proceedings that arise in the ordinary course of business. 
These include disputes and legal proceedings related to intellectual property, indemnification obligations, mergers and acquisitions, licensing, contracts, customers, products, distribution and other commercial arrangements and employee relations matters. At least quarterly, we review the status of each significant matter and assess its potential financial exposure. If the potential loss from any claim or legal proceeding is considered probable and the amount or the range of loss can be estimated, we accrue a liability for the estimated loss. Legal proceedings are subject to uncertainties, and the outcomes are difficult to predict. Because of such uncertainties, accruals are based on our judgments using the best information available at the time. As additional information becomes available, we reassess the potential liability related to pending claims and legal proceedings and may revise estimates. Item 4. Mine Safety Disclosures Not applicable. 26 Table of Contents PART II. Item 5. Market for Registrant’s Common Equity, Related Stockholder Matters and Issuer Purchases of Equity Securities Our common stock is traded on the Nasdaq Global Select Market under the symbol CDNS. As of February 5, 2022, we had 384 registered stockholders and approximately 340,000 beneficial owners of our common stock. Stockholder Return Performance Graph The following graph compares the cumulative 5-year total stockholder return on our common stock relative to the cumulative total return of the Nasdaq Composite Index,
Items 2, 3, and 5 look like good candidates to ask questions about!
item2 = """We own land and buildings at our headquarters located in San Jose, California. We also own buildings in India. As of January 1, 2022, the total square footage of our owned buildings was approximately 1,010,000.
We lease additional facilities in the United States and various other countries. We may sublease certain of these facilities where space is not fully utilized."""
item3 = """From time to time, we are involved in various disputes and legal proceedings that arise in the ordinary course of business. These include disputes and legal proceedings related to intellectual property, indemnification obligations, mergers and acquisitions, licensing, contracts, customers, products, distribution and other commercial arrangements and employee relations matters. At least quarterly, we review the status of each significant matter and assess its potential financial exposure. If the potential loss from any claim or legal proceeding is considered probable and the amount or the range of loss can be estimated, we accrue a liability for the estimated loss. Legal proceedings are subject to uncertainties, and the outcomes are difficult to predict. Because of such uncertainties, accruals are based on our judgments using the best information available at the time. As additional information becomes available, we reassess the potential liability related to pending claims and legal proceedings and may revise estimates."""
item5 = """Our common stock is traded on the Nasdaq Global Select Market under the symbol CDNS. As of February 5, 2022, we had 384 registered stockholders and approximately 340,000 beneficial owners of our common stock."""
We will use a BERT-based QA model named finqa_bert
📜To do that, we use in our pipelines:
- MultiDocumentAssembler, which puts together questions (Q, used to create H) and context (P).
- BertForQuestionAnswering, which extracts the answer span from the context.

🚀IMPORTANT: We highly recommend using setCaseSensitive(False) to prevent uppercase words from being treated as proper nouns, which may trigger out-of-vocabulary (OOV) issues.
documentAssembler = nlp.MultiDocumentAssembler()\
    .setInputCols(["question", "context"])\
    .setOutputCols(["document_question", "document_context"])
spanClassifier = nlp.BertForQuestionAnswering.pretrained("finqa_bert","en", "finance/models") \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False)
qa_pipeline = nlp.Pipeline().setStages([
    documentAssembler,
    spanClassifier
])
finqa_bert download started this may take some time. Approximate size to download 389 MB [OK!]
P = item2
Q = [
"Where are the headquarters?",
"What is the total square footage?",
"In which countries do they lease facilities?"
]
Q_P = [ [q, P] for q in Q]
example = spark.createDataFrame(Q_P).toDF("question", "context")
example.show()
+--------------------+--------------------+
|            question|             context|
+--------------------+--------------------+
|Where are the hea...|We own land and b...|
|What is the total...|We own land and b...|
|In which countrie...|We own land and b...|
+--------------------+--------------------+
result = qa_pipeline.fit(example).transform(example)
result.select('question', 'answer.result', 'answer').show(truncate=False)
+--------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |question |result |answer | +--------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |Where are the headquarters? |[San Jose , California]|[{chunk, 0, 20, San Jose , California, {chunk -> 0, start_score -> 0.8019189, score -> 0.83842176, end -> 20, start -> 17, end_score -> 0.8749246, sentence -> 0}, []}]| |What is the total square footage? |[1 , 010 , 000] |[{chunk, 0, 12, 1 , 010 , 000, {chunk -> 0, start_score -> 0.66597635, score -> 0.7811918, end -> 55, start -> 50, end_score -> 0.89640725, sentence -> 0}, []}] | |In which countries do they lease facilities?|[United States] |[{chunk, 0, 12, United States, {chunk -> 0, start_score -> 0.5888994, score -> 0.52136713, end -> 64, start -> 63, end_score -> 0.4538349, sentence -> 0}, []}] | +--------------------------------------------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+
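The answers above come back with tokenizer spacing ("San Jose , California", "1 , 010 , 000") and with a confidence score in the metadata. A minimal post-processing sketch, in plain Python and outside Spark, might look like the following. The helper names `clean_answer` and `confident_answers` and the 0.6 threshold are our own assumptions, not part of the library; the rows are copied from the output above.

```python
import re

def clean_answer(text):
    """Undo tokenizer spacing in an extracted answer span (heuristic)."""
    # Remove the space that the tokenizer left before punctuation marks
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Re-join thousands groups such as "1, 010, 000" -> "1,010,000"
    text = re.sub(r"(?<=\d),\s+(?=\d{3}\b)", ",", text)
    return text

def confident_answers(rows, min_score=0.6):
    """Keep (question, answer, score) rows whose score reaches min_score."""
    return [(q, clean_answer(a), s) for q, a, s in rows if s >= min_score]

# Question, answer and score values taken from the result shown above
rows = [
    ("Where are the headquarters?", "San Jose , California", 0.83842176),
    ("What is the total square footage?", "1 , 010 , 000", 0.7811918),
    ("In which countries do they lease facilities?", "United States", 0.52136713),
]
kept = confident_answers(rows)
for question, answer, score in kept:
    print(f"{question} -> {answer} ({score:.2f})")
```

With this threshold the third answer is dropped, which seems reasonable here: the context says "the United States and various other countries", so "United States" is only a partial answer. The thousands-group rule is a heuristic and could wrongly join a year followed by a three-digit number, so it should be checked against your own data.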
P = item5
Q = [
"Where is their common stock traded?",
"Which is the trading symbol?"
]
Q_P = [ [q, P] for q in Q]
example = spark.createDataFrame(Q_P).toDF("question", "context")
result = qa_pipeline.fit(example).transform(example)
result.select('question', 'answer.result', 'answer').show(truncate=False)
+-----------------------------------+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |question |result |answer | +-----------------------------------+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |Where is their common stock traded?|[Nasdaq Global Select Market]|[{chunk, 0, 26, Nasdaq Global Select Market, {chunk -> 0, start_score -> 0.30269945, score -> 0.41721502, end -> 21, start -> 16, end_score -> 0.5317306, sentence -> 0}, []}]| |Which is the trading symbol? |[CDNS] |[{chunk, 0, 3, CDNS, {chunk -> 0, start_score -> 0.8779542, score -> 0.8447887, end -> 25, start -> 24, end_score -> 0.8116232, sentence -> 0}, []}] | +-----------------------------------+-----------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
P = item3
Q = [
"What kind of disputes or legal proceedings related to?"
]
Q_P = [ [q, P] for q in Q]
example = spark.createDataFrame(Q_P).toDF("question", "context")
result = qa_pipeline.fit(example).transform(example)
result.select('question', 'answer.result', 'answer').show(truncate=False)
+------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |question |result |answer | +------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |What kind of disputes or legal proceedings related to?|[intellectual property , indemnification obligations , mergers and acquisitions , licensing , contracts , customers , products , distribution and other commercial arrangements and employee relations matters]|[{chunk, 0, 204, intellectual property , indemnification obligations , mergers and acquisitions , licensing , contracts , customers , products , distribution and other commercial arrangements and employee relations matters, {chunk -> 0, start_score -> 0.63349277, score -> 0.56178546, end -> 71, start -> 43, end_score -> 0.4900781, sentence -> 0}, []}]| 
+------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Now the question is ... is there a way to generate the questions automatically?
The answer is simple: YES, there is!
We have several ways to generate a series of questions, given, for example:
- the SUBJECT of a sentence;
- the ACTION (verb).

More specifically, there are three ways:
Check the notebook "Automatic Question Generation" for examples of how to do it.
For table question answering we have a specific notebook that you will find in this workshop. Feel free to check it out too!
But in the meantime, a small spoiler...
Let's create a csv file with information about clients and agreements.
import pandas as pd
df_data = {
    "header" : ['client name', 'last operation year', 'last operation amount', 'document'],
    "rows" : [
        ['John Smith', '2007', '$200000', 'NDA'],
        ['Jack Gordon', '2017', '$10000', 'Credit Agreement'],
        ['Mary Lean', '2001', '$120000', 'License Agreement'],
        ['Jessica James', '2022', '$1200000', 'Purchase Agreement'],
    ]
}
df = pd.DataFrame(df_data['rows'], columns=df_data['header'])
df.to_csv('table.csv', index=False)
df_data
{'header': ['client name', 'last operation year', 'last operation amount', 'document'], 'rows': [['John Smith', '2007', '$200000', 'NDA'], ['Jack Gordon', '2017', '$10000', 'Credit Agreement'], ['Mary Lean', '2001', '$120000', 'License Agreement'], ['Jessica James', '2022', '$1200000', 'Purchase Agreement']]}
df
| | client name | last operation year | last operation amount | document |
|---|---|---|---|---|
| 0 | John Smith | 2007 | $200000 | NDA |
| 1 | Jack Gordon | 2017 | $10000 | Credit Agreement |
| 2 | Mary Lean | 2001 | $120000 | License Agreement |
| 3 | Jessica James | 2022 | $1200000 | Purchase Agreement |
import json
json.dumps(df_data)
'{"header": ["client name", "last operation year", "last operation amount", "document"], "rows": [["John Smith", "2007", "$200000", "NDA"], ["Jack Gordon", "2017", "$10000", "Credit Agreement"], ["Mary Lean", "2001", "$120000", "License Agreement"], ["Jessica James", "2022", "$1200000", "Purchase Agreement"]]}'
Now, some questions...
queries = [
"Who signed an NDA?",
"Who operated last time in 2022?",
"What is the total amount of operations?",
"Which year a Credit Agreement was signed?",
]
Now, we will use the following specific components:
- MultiDocumentAssembler, to put together the questions and the table in json format
- TableAssembler, to assemble the table from a json
data = spark.createDataFrame([
    [json.dumps(df_data), " ".join(queries)]
]).toDF("table_json", "questions")
data.show()
+--------------------+--------------------+
|          table_json|           questions|
+--------------------+--------------------+
|{"header": ["clie...|Who signed an NDA...|
+--------------------+--------------------+
document_assembler = nlp.MultiDocumentAssembler() \
    .setInputCols("table_json", "questions") \
    .setOutputCols("document_table", "document_questions")
text_splitter = finance.TextSplitter() \
    .setInputCols(["document_questions"]) \
    .setOutputCol("questions")
table_assembler = nlp.TableAssembler()\
    .setInputCols(["document_table"])\
    .setOutputCol("table")
The last component is TapasForQuestionAnswering, which will carry out the inference process.
tapas = nlp.TapasForQuestionAnswering.pretrained("table_qa_tapas_base_finetuned_wtq", "en")\
    .setInputCols(["questions", "table"])\
    .setOutputCol("answers")
table_qa_tapas_base_finetuned_wtq download started this may take some time. Approximate size to download 394.7 MB [OK!]
Now the pipeline looks as follows:
pipeline = nlp.Pipeline(stages=[
    document_assembler,
    text_splitter,
    table_assembler,
    tapas
])
And this is the result on fit/transform:
model = pipeline.fit(data)
res = model\
    .transform(data)\
    .selectExpr("explode(answers) AS answer")\
    .select("answer")
res.show(truncate=False)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |answer | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |{chunk, 0, 10, John Smith, {question -> Who signed an NDA?, aggregation -> NONE, cell_positions -> [0, 0], cell_scores -> 1.0}, []} | |{chunk, 0, 13, Jessica James, {question -> Who operated last time in 2022?, aggregation -> NONE, cell_positions -> [0, 3], cell_scores -> 1.0}, []} | |{chunk, 0, 41, COUNT($200000, $10000, $120000, $1200000), {question -> What is the total amount of operations?, aggregation -> COUNT, cell_positions -> [2, 0], [2, 1], [2, 2], [2, 3], cell_scores -> 1.0, 1.0, 1.0, 1.0}, []}| |{chunk, 0, 4, 2017, {question -> Which year a Credit Agreement was signed?, aggregation -> NONE, cell_positions -> [1, 1], cell_scores -> 1.0}, []} | +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
from pyspark.sql import functions as F
res.select("answer.metadata.question", F.expr('answer.result as answer'), F.expr('answer.metadata["aggregation"] as metadata')).show(truncate=False)
+-----------------------------------------+-----------------------------------------+--------+
|question                                 |answer                                   |metadata|
+-----------------------------------------+-----------------------------------------+--------+
|Who signed an NDA?                       |John Smith                               |NONE    |
|Who operated last time in 2022?          |Jessica James                            |NONE    |
|What is the total amount of operations?  |COUNT($200000, $10000, $120000, $1200000)|COUNT   |
|Which year a Credit Agreement was signed?|2017                                     |NONE    |
+-----------------------------------------+-----------------------------------------+--------+
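Note that in the output above the aggregation is returned as an operator plus the selected cells, for example COUNT($200000, ...), and is not evaluated into a number. A minimal sketch of evaluating it yourself is shown below. The helper names `parse_amount` and `evaluate` are hypothetical, and the SUM and AVERAGE operator names are assumed from the usual TAPAS WTQ label set, since only NONE and COUNT appear in this output.

```python
import re

def parse_amount(cell):
    """Turn a cell such as '$200000' into an integer."""
    return int(cell.replace("$", "").replace(",", ""))

def evaluate(answer, operator=None):
    """Apply an aggregation operator to an answer like 'COUNT(a, b, c)'.

    If operator is given, it overrides the one predicted by the model.
    """
    match = re.fullmatch(r"(\w+)\((.*)\)", answer)
    if match is None:
        return answer  # plain cell value, i.e. aggregation NONE
    predicted, cells = match.group(1), match.group(2).split(", ")
    op = operator or predicted
    if op == "COUNT":
        return len(cells)
    amounts = [parse_amount(cell) for cell in cells]
    if op == "SUM":
        return sum(amounts)
    if op == "AVERAGE":
        return sum(amounts) / len(amounts)
    raise ValueError(f"unsupported operator: {op}")

answer = "COUNT($200000, $10000, $120000, $1200000)"
count_result = evaluate(answer)                  # operator predicted by the model
sum_result = evaluate(answer, operator="SUM")    # operator the question seems to ask for
print(count_result, sum_result)
```

Here the model predicted COUNT for "What is the total amount of operations?", while a reader would probably expect the sum of the amounts, so it can be worth reviewing the predicted operator before trusting an aggregated answer.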
You will need Visual NLP, another licensed product of JSL, to extract tables from documents.
The result will be just a csv, so you can apply the same code shown above after you extract the table from your documents.
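As a rough sketch of that last step, the snippet below turns csv text into the same header/rows json string that was passed to the table pipeline earlier. The function name `csv_to_table_json` is hypothetical, and the csv text is a two-row excerpt of the table used above; all cells are kept as strings, as in df_data.

```python
import csv
import io
import json

def csv_to_table_json(csv_text):
    """Convert CSV text into a {"header": [...], "rows": [[...]]} JSON string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                  # first line holds the column names
    rows = [row for row in reader if row]  # skip empty lines
    return json.dumps({"header": header, "rows": rows})

csv_text = """client name,last operation year,last operation amount,document
John Smith,2007,$200000,NDA
Jack Gordon,2017,$10000,Credit Agreement
"""
table_json = csv_to_table_json(csv_text)
print(table_json)
```

The resulting string can then take the place of json.dumps(df_data) when building the input DataFrame.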
Check the notebook Financial_Visual_Document_Understanding for more details. In the meantime, a small spoiler...
The FLAN-T5 model is a state-of-the-art language model developed by Google AI that uses the T5 architecture for text generation tasks. It is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks, each of which is converted into a text-to-text format.
During the training phase, FLAN-T5 was fed a large corpus of text data and was trained to predict missing words in an input text via a fill-in-the-blank style objective. This process is repeated multiple times until the model has learned to generate text that is similar to the input data.
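To make the fill-in-the-blank objective more concrete, here is a toy sketch of how a masked training pair can be built. It assumes the sentinel naming used by the Hugging Face T5 tokenizer (<extra_id_0>, <extra_id_1>, ...) and uses fixed, hand-picked spans; in actual pre-training the spans are sampled at random, and the function name `corrupt_spans` is ours.

```python
def corrupt_spans(tokens, spans):
    """Build a (model input, target) pair by masking the given token spans.

    spans is a sorted list of non-overlapping (start, end) index pairs,
    with end exclusive.
    """
    inputs, targets, cursor = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the text before the span
        inputs.append(sentinel)              # the span is replaced by a sentinel
        targets.append(sentinel)             # the target repeats the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the hidden tokens
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)

tokens = "We own land and buildings at our headquarters located in San Jose".split()
model_input, target = corrupt_spans(tokens, [(2, 5), (9, 10)])
print(model_input)
print(target)
```

The model sees the first string and is trained to produce the second one, that is, to recover the hidden words.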
Once trained, FLAN-T5 can be used to perform a variety of NLP tasks, such as text generation, language translation, sentiment analysis, and text classification.
What are a few Use-cases?
FLAN-T5 has a few potential use-cases:
The finqa_flant5_finetuned Question Answering model is a FLAN-T5 model fine-tuned on finance data. It provides a powerful and efficient solution for accurately answering finance questions and delivering insightful information in the finance domain.
document_assembler = nlp.MultiDocumentAssembler()\
    .setInputCols("question", "context")\
    .setOutputCols("document_question", "document_context")
fin_qa = finance.QuestionAnswering.pretrained("finqa_flant5_finetuned","en","finance/models")\
    .setInputCols(["document_question", "document_context"])\
    .setCustomPrompt("question: {QUESTION} context: {CONTEXT}")\
    .setMaxNewTokens(100)\
    .setOutputCol("answer")
pipeline = nlp.Pipeline(stages=[document_assembler, fin_qa])
empty_data = spark.createDataFrame([["",""]]).toDF("question", "context")
model = pipeline.fit(empty_data)
finqa_flant5_finetuned download started this may take some time. [OK!]
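The custom prompt passed to setCustomPrompt contains two placeholders, {QUESTION} and {CONTEXT}. The sketch below shows what the expanded prompt could look like for one question, assuming plain string substitution; how the annotator fills the template internally is an assumption on our side, and `fill_prompt` is a hypothetical helper, not a library function.

```python
def fill_prompt(template, question, context):
    """Substitute the {QUESTION} and {CONTEXT} placeholders in a prompt template."""
    return template.replace("{QUESTION}", question).replace("{CONTEXT}", context)

template = "question: {QUESTION} context: {CONTEXT}"
prompt = fill_prompt(
    template,
    question="Which is the trading symbol?",
    context="Our common stock is traded on the Nasdaq Global Select Market under the symbol CDNS.",
)
print(prompt)
```

Printing a few expanded prompts like this can help when experimenting with alternative templates.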
context = """Our business strategy has been to develop data processing and product technologies that can displace intermediaries within the online advertising ecosystem, while cultivating relationships that can provide access to media spend (advertisers) and media inventory (websites). In this regard, we have proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by launching a SaaS version of the IntentKey in 2021. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners where the cash generated from the business can be used to accelerate growth of the IntentKey. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. Our business strategy is focused on providing differentiation through the AI analytics and data products we own and protect through patents. For the marketing and advertising industries we serve, this strategy aligns with the components of the value chain that are the principal drivers of value to our clients. 
As part of our growth strategy, we evaluate acquisition candidates from time to time as opportunities arise with a focus on companies that have either advertisers or advertising relationships we do not possess or publishers or publishing partners who have content we do not possess."""
questions = ["""What are the key components of the business strategy described?""",
"""What is the immediate strategy for scaling the IntentKey platform?""",
"""How does the company aim to provide differentiation in the market?"""]
Q_P = [ [q, context] for q in questions]
data = spark.createDataFrame(Q_P).toDF("question", "context")
data.show(truncate = 60)
+------------------------------------------------------------+------------------------------------------------------------+
|                                                    question|                                                     context|
+------------------------------------------------------------+------------------------------------------------------------+
|What are the key components of the business strategy desc...|Our business strategy has been to develop data processing...|
|What is the immediate strategy for scaling the IntentKey ...|Our business strategy has been to develop data processing...|
|How does the company aim to provide differentiation in th...|Our business strategy has been to develop data processing...|
+------------------------------------------------------------+------------------------------------------------------------+
result = model.transform(data)
result.select('question', 'answer.result').show(truncate=False)
+------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |question |result | +------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ |What are the key components of the business strategy described? |[The key components of the business strategy described are proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. . . ]| |What is the immediate strategy for scaling the IntentKey platform?|[The immediate strategy for scaling the IntentKey platform is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. ] | |How does the company aim to provide differentiation in the market?|[The company aims to provide differentiation through the AI analytics and data products they own and protect through patents. 
] | +------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
context = """Our business strategy has been to develop data processing and product technologies that can displace intermediaries within the online advertising ecosystem, while cultivating relationships that can provide access to media spend (advertisers) and media inventory (websites). In this regard, we have proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by launching a SaaS version of the IntentKey in 2021. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners where the cash generated from the business can be used to accelerate growth of the IntentKey. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. Our business strategy is focused on providing differentiation through the AI analytics and data products we own and protect through patents. For the marketing and advertising industries we serve, this strategy aligns with the components of the value chain that are the principal drivers of value to our clients. 
As part of our growth strategy, we evaluate acquisition candidates from time to time as opportunities arise with a focus on companies that have either advertisers or advertising relationships we do not possess or publishers or publishing partners who have content we do not possess."""
questions = [
    "What are the key components of the business strategy described?",
    "What is the immediate strategy for scaling the IntentKey platform?",
    "How does the company aim to provide differentiation in the market?",
]

# LightPipeline runs the fitted pipeline directly on plain Python strings
light_model = nlp.LightPipeline(model)

# Ask each question against the same context and collect the annotations
all_result = []
for question in questions:
    light_result = light_model.annotate([question], [context])
    all_result.append(light_result)

all_result
[[{'document_question': ['What are the key components of the business strategy described?'], 'document_context': ['Our business strategy has been to develop data processing and product technologies that can displace intermediaries within the online advertising ecosystem, while cultivating relationships that can provide access to media spend (advertisers) and media inventory (websites). In this regard, we have proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by launching a SaaS version of the IntentKey in 2021. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners where the cash generated from the business can be used to accelerate growth of the IntentKey. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. Our business strategy is focused on providing differentiation through the AI analytics and data products we own and protect through patents. 
For the marketing and advertising industries we serve, this strategy aligns with the components of the value chain that are the principal drivers of value to our clients. As part of our growth strategy, we evaluate acquisition candidates from time to time as opportunities arise with a focus on companies that have either advertisers or advertising relationships we do not possess or publishers or publishing partners who have content we do not possess.'], 'answer': ['The key components of the business strategy described are proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. . . ']}], [{'document_question': ['What is the immediate strategy for scaling the IntentKey platform?'], 'document_context': ['Our business strategy has been to develop data processing and product technologies that can displace intermediaries within the online advertising ecosystem, while cultivating relationships that can provide access to media spend (advertisers) and media inventory (websites). In this regard, we have proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by launching a SaaS version of the IntentKey in 2021. 
We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners where the cash generated from the business can be used to accelerate growth of the IntentKey. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. Our business strategy is focused on providing differentiation through the AI analytics and data products we own and protect through patents. For the marketing and advertising industries we serve, this strategy aligns with the components of the value chain that are the principal drivers of value to our clients. As part of our growth strategy, we evaluate acquisition candidates from time to time as opportunities arise with a focus on companies that have either advertisers or advertising relationships we do not possess or publishers or publishing partners who have content we do not possess.'], 'answer': ['The immediate strategy for scaling the IntentKey platform is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. ']}], [{'document_question': ['How does the company aim to provide differentiation in the market?'], 'document_context': ['Our business strategy has been to develop data processing and product technologies that can displace intermediaries within the online advertising ecosystem, while cultivating relationships that can provide access to media spend (advertisers) and media inventory (websites). 
In this regard, we have proprietary demand (media spend) and supply side (media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data management technologies, and advertising fraud detection technologies. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by launching a SaaS version of the IntentKey in 2021. We have both direct and indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the immediate strategy is to maintain the business at current levels by working with existing partners where the cash generated from the business can be used to accelerate growth of the IntentKey. For the IntentKey platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. Our business strategy is focused on providing differentiation through the AI analytics and data products we own and protect through patents. For the marketing and advertising industries we serve, this strategy aligns with the components of the value chain that are the principal drivers of value to our clients. 
As part of our growth strategy, we evaluate acquisition candidates from time to time as opportunities arise with a focus on companies that have either advertisers or advertising relationships we do not possess or publishers or publishing partners who have content we do not possess.'], 'answer': ['The company aims to provide differentiation through the AI analytics and data products they own and protect through patents. ']}]]
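The nested list printed above is hard to scan because every entry repeats the full context. A minimal sketch of pulling out only the question/answer pairs, assuming each element of `all_result` is a one-item list holding a dict with `document_question` and `answer` keys, as in the output above. The helper name `extract_qa_pairs` and the shortened `sample_result` are illustrative and not part of the library:

```python
def extract_qa_pairs(all_result):
    """Return (question, answer) tuples from results shaped like the output above."""
    pairs = []
    for result in all_result:          # one entry per question asked
        for annotation in result:      # each entry is a list of annotation dicts
            question = annotation["document_question"][0]
            answer = annotation["answer"][0].strip()  # drop trailing whitespace
            pairs.append((question, answer))
    return pairs


# Shortened stand-in for `all_result`: same keys, context cut down to one sentence
sample_result = [
    [
        {
            "document_question": [
                "How does the company aim to provide differentiation in the market?"
            ],
            "document_context": [
                "Our business strategy is focused on providing differentiation through "
                "the AI analytics and data products we own and protect through patents."
            ],
            "answer": [
                "The company aims to provide differentiation through the AI analytics "
                "and data products they own and protect through patents. "
            ],
        }
    ],
]

for question, answer in extract_qa_pairs(sample_result):
    print("Q:", question)
    print("A:", answer)
```

With the real `all_result`, the same call would return one pair per question in `questions`, provided the output keeps the shape shown above.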
import textwrap

# The context is identical for every question, so print it only once
context = textwrap.fill(all_result[0][0]['document_context'][0], width=120)
print("➤ Context: \n{}".format(context))
print("\n")

# Print each question with its answer, wrapped at 120 characters
for q in range(len(questions)):
    question = textwrap.fill(all_result[q][0]['document_question'][0], width=120)
    answer = textwrap.fill(all_result[q][0]['answer'][0], width=120)
    print("➤ Question: \n{}".format(question))
    print("\n")
    print("➤ Answer: \n{}".format(answer))
    print("\n")
➤ Context: 
Our business strategy has been to develop data processing and product technologies that can displace intermediaries
within the online advertising ecosystem, while cultivating relationships that can provide access to media spend
(advertisers) and media inventory (websites). In this regard, we have proprietary demand (media spend) and supply side
(media inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and
data management technologies, and advertising fraud detection technologies. We have both direct and indirect
relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick platform, the
immediate strategy is to maintain the business at current levels by working with existing partners. For the IntentKey
platform, the immediate strategy is to scale through the hiring of additional sales professionals, growing existing
accounts and expanding the market size by launching a SaaS version of the IntentKey in 2021. We have both direct and
indirect relationships at some of the largest media buyers and/or consolidators in the industry. For the ValidClick
platform, the immediate strategy is to maintain the business at current levels by working with existing partners where
the cash generated from the business can be used to accelerate growth of the IntentKey. For the IntentKey platform, the
immediate strategy is to scale through the hiring of additional sales professionals, growing existing accounts and
expanding the market size by concurrently selling the SaaS version of the IntentKey beginning in 2021. Our business
strategy is focused on providing differentiation through the AI analytics and data products we own and protect through
patents. For the marketing and advertising industries we serve, this strategy aligns with the components of the value
chain that are the principal drivers of value to our clients. As part of our growth strategy, we evaluate acquisition
candidates from time to time as opportunities arise with a focus on companies that have either advertisers or
advertising relationships we do not possess or publishers or publishing partners who have content we do not possess.


➤ Question: 
What are the key components of the business strategy described?


➤ Answer: 
The key components of the business strategy described are proprietary demand (media spend) and supply side (media
inventory) technologies, targeting technologies, on-page or in-app ad-unit technologies, proprietary data and data
management technologies, and advertising fraud detection technologies. . .


➤ Question: 
What is the immediate strategy for scaling the IntentKey platform?


➤ Answer: 
The immediate strategy for scaling the IntentKey platform is to scale through the hiring of additional sales
professionals, growing existing accounts and expanding the market size by concurrently selling the SaaS version of the
IntentKey beginning in 2021.


➤ Question: 
How does the company aim to provide differentiation in the market?


➤ Answer: 
The company aims to provide differentiation through the AI analytics and data products they own and protect through
patents.
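The first answer above ends with stray periods (`. . .`), which you may want to remove before showing answers to users. A small post-processing sketch, assuming such artifacts are always a trailing run of periods and whitespace. `clean_answer` is a hypothetical helper written for this example, not something the library provides:

```python
import re


def clean_answer(answer: str) -> str:
    """Collapse a trailing run of periods and spaces into a single period."""
    # Matches only at the very end of the string and requires at least one period,
    # so decimals such as "81.4%" and answers without a final period are left alone.
    return re.sub(r"\s*\.[\s.]*$", ".", answer).strip()


print(clean_answer("and advertising fraud detection technologies. . . "))
# → and advertising fraud detection technologies.
print(clean_answer("protect through patents. "))
# → protect through patents.
```

The pattern is deliberately narrow. If the model produces other kinds of trailing noise, the rule would need to be extended after inspecting real outputs.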