This notebook is the continuation of 10.0.Data_Augmentation_with_ChunkMappers.ipynb
! pip install -q johnsnowlabs
Using my.johnsnowlabs.com SSO
from johnsnowlabs import nlp, finance
# nlp.install(force_browser=True)
If you are not registered at my.johnsnowlabs.com, received your license via email, or are using Safari, you may need to upload the license manually.
from google.colab import files
print('Please Upload your John Snow Labs License using the button below')
license_keys = files.upload()
Please Upload your John Snow Labs License using the button below
Saving spark_nlp_for_healthcare_spark_ocr_6538.json to spark_nlp_for_healthcare_spark_ocr_6538 (1).json
nlp.install()
👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_6538.json 👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_6538.json 👌 JSL-Home is up to date! 👌 Everything is already installed, no changes made
spark = nlp.start()
Spark Session already created, some configs may not take. 👌 Detected license file /content/spark_nlp_for_healthcare_spark_ocr_6538.json
Let's suppose we want to manually get information about CADENCE DESIGN SYSTEMS, INC.
Since it's a public US company, we can go to SEC Edgar's database and look for it.
Unfortunately, CADENCE DESIGN SYSTEMS, INC is not the official name of the company, which means no entry for CADENCE DESIGN SYSTEMS, INC is available.
This happens very often. Data providers may have different versions of the name with different punctuation. For example, for Meta:
ChunkMappers work by default with EXACT MATCHES. So if the name you extract is not written exactly the way the company appears in your ChunkMapper, you won't get any result. However, we include 2 ways to cope with inexact matches.
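To make the difference between exact and fuzzy lookups concrete, here is a minimal sketch in plain Python. It is not how ChunkMapperModel is implemented: the table `EDGAR_TOY`, the helpers `exact_lookup` and `fuzzy_lookup`, and the `EXAMPLE HOLDINGS CORP` entry are hypothetical, and `difflib.SequenceMatcher` is used as a stand-in for the library's own distance metrics.

```python
from difflib import SequenceMatcher

# Hypothetical miniature lookup table. The Cadence values are taken from the
# mapper output shown in this notebook; the second entry is a made-up placeholder.
EDGAR_TOY = {
    "CADENCE DESIGN SYSTEMS INC": {"state_location": "CA", "sic_code": "7372"},
    "EXAMPLE HOLDINGS CORP": {"state_location": "??", "sic_code": "????"},
}


def exact_lookup(name, table):
    """Return the record only if the key matches character for character."""
    return table.get(name)


def fuzzy_lookup(name, table, max_distance=0.1):
    """Return (matched_key, record) for the closest key within max_distance.

    Distance here is 1 - SequenceMatcher.ratio(), an assumption made for
    illustration; it is not the cosine or Levenshtein metric of the library.
    """
    best_key, best_dist = None, 1.0
    for key in table:
        dist = 1.0 - SequenceMatcher(None, name, key).ratio()
        if dist < best_dist:
            best_key, best_dist = key, dist
    if best_key is not None and best_dist <= max_distance:
        return best_key, table[best_key]
    return None


query = "CADENCE DESIGN SYSTEMS, INC"
print(exact_lookup(query, EDGAR_TOY))   # the extra comma breaks the exact match
print(fuzzy_lookup(query, EDGAR_TOY))   # the fuzzy lookup tolerates it
```

The threshold plays the same role as the distance threshold set on the mapper below: a lower value rejects more near-misses, a higher value accepts more but risks wrong matches.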
In this case, we will use the Edgar database (finmapper_edgar_companyname).
The component which carries out Data Augmentation is called ChunkMapper.
Its name comes from the way it works: it takes an NER chunk and maps it to an external data source.
As a result, you will get a JSON with a dictionary of additional fields and their values.
Let's take a look at how it works.
Let's use our NER chunk to map it to Edgar ChunkMapper using Fuzzy Matching.
ORG = ['CADENCE DESIGN SYSTEMS, INC', 'CADENCE DESIGN SYSTEM INCORPORATED']
ORG
['CADENCE DESIGN SYSTEMS, INC', 'CADENCE DESIGN SYSTEM INCORPORATED']
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Possible distance metrics: ['levenshtein', 'longest-common-subsequence', 'cosine']
CM = finance.ChunkMapperModel().pretrained("finmapper_edgar_companyname", "en", "finance/models")\
    .setInputCols(["document"])\
    .setOutputCol("mappings")\
    .setEnableFuzzyMatching(True)\
    .setEnableCharFingerprintMatching(False)\
    .setFuzzyMatchingDistances(['cosine'])\
    .setFuzzyMatchingDistanceThresholds([0.1])
cm_pipeline = nlp.Pipeline(stages=[document_assembler, CM])
# Fit on an empty DataFrame: every stage is pretrained, so no training data is needed
empty_data = spark.createDataFrame([[""]]).toDF("text")
fit_cm_pipeline = cm_pipeline.fit(empty_data)
lp = nlp.LightPipeline(fit_cm_pipeline)
res = lp.fullAnnotate(ORG)
finmapper_edgar_companyname download started this may take some time. [OK!]
for r in res:
    for mapping in r['mappings']:
        print(mapping)
Annotation(labeled_dependency, 0, 26, CADENCE DESIGN SYSTEMS INC, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'name', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'name'}, []) Annotation(labeled_dependency, 0, 26, SERVICES-PREPACKAGED SOFTWARE [7372], {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'sic', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'sic'}, []) Annotation(labeled_dependency, 0, 26, 7372, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'sic_code', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '0', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'sic_code'}, []) Annotation(labeled_dependency, 0, 26, 770148231, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'irs_number', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '0', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'irs_number'}, []) Annotation(labeled_dependency, 0, 26, 1228, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'fiscal_year_end', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '101:::1229:::0:::102', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'fiscal_year_end'}, []) Annotation(labeled_dependency, 0, 26, CA, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'state_location', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'state_location'}, []) Annotation(labeled_dependency, 0, 26, DE, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 
'state_incorporation', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'state_incorporation'}, []) Annotation(labeled_dependency, 0, 26, 2655 SEELY AVENUE BLDG 5, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_street', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '2655 SEELY ROAD BLDG 5', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'business_street'}, []) Annotation(labeled_dependency, 0, 26, SAN JOSE, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_city', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'business_city'}, []) Annotation(labeled_dependency, 0, 26, CA, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_state', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'business_state'}, []) Annotation(labeled_dependency, 0, 26, 95134, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_zip', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'business_zip'}, []) Annotation(labeled_dependency, 0, 26, 4089431234, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_phone', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'business_phone'}, []) Annotation(labeled_dependency, 0, 26, ECAD INC /DE/, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'former_name', '__distance_function__': 'cosine', 'ops': 
'0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'former_name'}, []) Annotation(labeled_dependency, 0, 26, 19880609, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'former_name_date', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'former_name_date'}, []) Annotation(labeled_dependency, 0, 26, 2017-02-10, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'date', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '2017-07-24:::2016-04-25:::2016-07-25:::2016-10-24:::2022-04-25:::2018-10-22:::2015-02-10:::2015-07-13:::2015-09-22:::2015-11-23:::2015-10-27:::2015-12-03:::2014-01-08:::2014-02-06:::2014-02-07:::2014-02-11:::2014-02-13:::2014-02-18:::2014-01-29:::2014-02-19:::2014-02-10:::2014-02-24:::2014-02-14:::2014-02-20:::2014-03-19:::2014-03-07:::2014-03-05:::2014-02-27:::2014-04-01:::2013-10-24:::2012-07-26:::2011-07-29:::2021-10-25:::2020-04-20:::2020-07-20:::2020-10-19:::2008-12-11:::2006-08-07:::2006-10-27:::2002-03-12', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'date'}, []) Annotation(labeled_dependency, 0, 26, 813672, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'company_id', '__distance_function__': 'cosine', 'ops': '0.037037037037037035', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS, INC', 'distance': '0.0', '__relation_name__': 'company_id'}, []) Annotation(labeled_dependency, 0, 33, CADENCE DESIGN SYSTEMS INC, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'name', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'name'}, []) Annotation(labeled_dependency, 0, 33, SERVICES-PREPACKAGED SOFTWARE [7372], {'__trained__': 
'CADENCE DESIGN SYSTEMS INC', 'relation': 'sic', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'sic'}, []) Annotation(labeled_dependency, 0, 33, 7372, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'sic_code', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '0', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'sic_code'}, []) Annotation(labeled_dependency, 0, 33, 770148231, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'irs_number', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '0', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'irs_number'}, []) Annotation(labeled_dependency, 0, 33, 1228, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'fiscal_year_end', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '101:::1229:::0:::102', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'fiscal_year_end'}, []) Annotation(labeled_dependency, 0, 33, CA, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'state_location', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'state_location'}, []) Annotation(labeled_dependency, 0, 33, DE, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'state_incorporation', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'state_incorporation'}, []) Annotation(labeled_dependency, 0, 33, 2655 SEELY AVENUE BLDG 5, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_street', 
'__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '2655 SEELY ROAD BLDG 5', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'business_street'}, []) Annotation(labeled_dependency, 0, 33, SAN JOSE, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_city', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'business_city'}, []) Annotation(labeled_dependency, 0, 33, CA, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_state', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'business_state'}, []) Annotation(labeled_dependency, 0, 33, 95134, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_zip', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'business_zip'}, []) Annotation(labeled_dependency, 0, 33, 4089431234, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_phone', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'business_phone'}, []) Annotation(labeled_dependency, 0, 33, ECAD INC /DE/, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'former_name', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'former_name'}, []) Annotation(labeled_dependency, 0, 33, 19880609, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'former_name_date', '__distance_function__': 'jaccard', 'ops': 
'0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'former_name_date'}, []) Annotation(labeled_dependency, 0, 33, 2017-02-10, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'date', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '2017-07-24:::2016-04-25:::2016-07-25:::2016-10-24:::2022-04-25:::2018-10-22:::2015-02-10:::2015-07-13:::2015-09-22:::2015-11-23:::2015-10-27:::2015-12-03:::2014-01-08:::2014-02-06:::2014-02-07:::2014-02-11:::2014-02-13:::2014-02-18:::2014-01-29:::2014-02-19:::2014-02-10:::2014-02-24:::2014-02-14:::2014-02-20:::2014-03-19:::2014-03-07:::2014-03-05:::2014-02-27:::2014-04-01:::2013-10-24:::2012-07-26:::2011-07-29:::2021-10-25:::2020-04-20:::2020-07-20:::2020-10-19:::2008-12-11:::2006-08-07:::2006-10-27:::2002-03-12', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'date'}, []) Annotation(labeled_dependency, 0, 33, 813672, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'company_id', '__distance_function__': 'jaccard', 'ops': '0.29411764705882354', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEM INCORPORATED', 'distance': '0.2', '__relation_name__': 'company_id'}, [])
We have successfully retrieved the information in Edgar using different variations of the company name CADENCE DESIGN SYSTEMS!
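The printed annotations are hard to scan, so it can help to collapse them into one dictionary per company. The sketch below assumes each annotation exposes `.result` and `.metadata`, and that `all_relations` holds further values separated by `:::`, as the output above suggests. The helper `mappings_to_record` and the `_alternatives` key suffix are hypothetical names, and `SimpleNamespace` objects stand in for real annotations so the example runs without a Spark session.

```python
from types import SimpleNamespace


def mappings_to_record(annotations):
    """Collapse mapper annotations into {entity: {relation: value}}."""
    records = {}
    for ann in annotations:
        entity = ann.metadata["entity"]
        relation = ann.metadata["relation"]
        fields = records.setdefault(entity, {})
        fields[relation] = ann.result
        # Assumption: 'all_relations' lists additional values joined by ':::'
        extra = ann.metadata.get("all_relations", "")
        if extra:
            fields[relation + "_alternatives"] = extra.split(":::")
    return records


# Stand-in annotations, with values copied from the output above
entity = "CADENCE DESIGN SYSTEMS, INC"
sample = [
    SimpleNamespace(
        result="CA",
        metadata={"entity": entity, "relation": "state_location", "all_relations": ""},
    ),
    SimpleNamespace(
        result="1228",
        metadata={
            "entity": entity,
            "relation": "fiscal_year_end",
            "all_relations": "101:::1229:::0:::102",
        },
    ),
]

record = mappings_to_record(sample)
print(record)
```

With a real pipeline result, the same helper could be called as `mappings_to_record(r['mappings'])` for each `r` in `res`, provided the assumptions above hold for your library version.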
In the previous step, with Fuzzy Matching, we don't really care what the official name of the company in Edgar is.
If we want to retrieve that information as well, we can first do an additional step: Company Names Normalization.
Company Name Normalization is the process of obtaining the name of the company used by data providers, usually the "official" name of the company. Note that this name is specific to each provider's database and may differ from one data provider to another.
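Conceptually, normalization can be seen as a nearest-neighbour search: embed the extracted name, embed every official name, and return the closest one. The toy sketch below illustrates that idea only. It uses character trigram counts instead of sentence embeddings, and the names `trigram_vector`, `euclidean`, `normalize_name`, and the `OFFICIAL` list (including the made-up `EXAMPLE HOLDINGS CORP`) are hypothetical.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Represent a name as counts of its character trigrams (a toy embedding)."""
    padded = f"  {text.upper()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))


def normalize_name(name, official_names):
    """Return the official name whose vector is closest to the query."""
    query = trigram_vector(name)
    return min(official_names, key=lambda cand: euclidean(query, trigram_vector(cand)))


# Hypothetical list of official names; the second one is a placeholder
OFFICIAL = ["CADENCE DESIGN SYSTEMS INC", "EXAMPLE HOLDINGS CORP"]

for variant in ["CADENCE DESIGN SYSTEMS, INC", "CADENCE DESIGN SYSTEM INCORPORATED"]:
    print(variant, "->", normalize_name(variant, OFFICIAL))
```

Unlike a thresholded fuzzy match, this search always returns some candidate, so in practice the distance to the winner is worth checking before trusting the result.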
Let's normalize CADENCE DESIGN SYSTEMS, INC to the official name in SEC Edgar.
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
use_embeddings = nlp.UniversalSentenceEncoder.pretrained()\
    .setInputCols("document") \
    .setOutputCol("sentence_embeddings")
resolver = finance.SentenceEntityResolverModel.pretrained("finel_edgar_company_name", "en", "finance/models")\
    .setInputCols(["sentence_embeddings"]) \
    .setOutputCol("resolution")\
    .setDistanceFunction("EUCLIDEAN")
pipelineModel = nlp.PipelineModel(
    stages = [
        document_assembler,
        use_embeddings,
        resolver])
lp_res = nlp.LightPipeline(pipelineModel)
tfhub_use download started this may take some time. Approximate size to download 923.7 MB [OK!] finel_edgar_company_name download started this may take some time. [OK!]
el_res = lp_res.annotate(ORG)
el_res
[{'document': ['CADENCE DESIGN SYSTEMS, INC'], 'sentence_embeddings': ['CADENCE DESIGN SYSTEMS, INC'], 'resolution': ['CADENCE DESIGN SYSTEMS INC']}, {'document': ['CADENCE DESIGN SYSTEM INCORPORATED'], 'sentence_embeddings': ['CADENCE DESIGN SYSTEM INCORPORATED'], 'resolution': ['CADENCE DESIGN SYSTEMS INC']}]
NORM_ORG = el_res[0]['resolution'][0]
NORM_ORG
'CADENCE DESIGN SYSTEMS INC'
Here is our normalized name for the company: CADENCE DESIGN SYSTEMS INC.
Now, let's see which information is available in the Edgar database for CADENCE DESIGN SYSTEMS INC.
Once we have the normalized name of the company, we can use John Snow Labs Chunk Mappers. These are pretrained data sources, which are updated frequently and can be queried inside Spark NLP without sending any API call to any server.
In this case, we will use the Edgar database (finmapper_edgar_companyname).
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")
# Possible distance metrics: ['levenshtein', 'longest-common-subsequence', 'cosine']
CM = finance.ChunkMapperModel().pretrained("finmapper_edgar_companyname", "en", "finance/models")\
    .setInputCols(["document"])\
    .setOutputCol("mappings")\
    .setEnableFuzzyMatching(True)\
    .setEnableCharFingerprintMatching(False)\
    .setFuzzyMatchingDistances(['cosine'])\
    .setFuzzyMatchingDistanceThresholds([0.1])
cm_pipeline = nlp.Pipeline(stages=[document_assembler, CM])
# Fit on an empty DataFrame: every stage is pretrained, so no training data is needed
empty_data = spark.createDataFrame([[""]]).toDF("text")
fit_cm_pipeline = cm_pipeline.fit(empty_data)
lp = nlp.LightPipeline(fit_cm_pipeline)
res = lp.fullAnnotate(NORM_ORG)
finmapper_edgar_companyname download started this may take some time. [OK!]
for r in res:
    for mapping in r['mappings']:
        print(mapping)
Annotation(labeled_dependency, 0, 25, CADENCE DESIGN SYSTEMS INC, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'name', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'name'}, []) Annotation(labeled_dependency, 0, 25, SERVICES-PREPACKAGED SOFTWARE [7372], {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'sic', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'sic'}, []) Annotation(labeled_dependency, 0, 25, 7372, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'sic_code', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '0', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'sic_code'}, []) Annotation(labeled_dependency, 0, 25, 770148231, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'irs_number', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '0', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'irs_number'}, []) Annotation(labeled_dependency, 0, 25, 1228, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'fiscal_year_end', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '101:::1229:::0:::102', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'fiscal_year_end'}, []) Annotation(labeled_dependency, 0, 25, CA, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'state_location', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'state_location'}, []) Annotation(labeled_dependency, 0, 25, DE, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'state_incorporation', '__distance_function__': 'levenshtein', 'ops': '0.0', 
'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'state_incorporation'}, []) Annotation(labeled_dependency, 0, 25, 2655 SEELY AVENUE BLDG 5, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_street', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '2655 SEELY ROAD BLDG 5', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'business_street'}, []) Annotation(labeled_dependency, 0, 25, SAN JOSE, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_city', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'business_city'}, []) Annotation(labeled_dependency, 0, 25, CA, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_state', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'business_state'}, []) Annotation(labeled_dependency, 0, 25, 95134, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_zip', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'business_zip'}, []) Annotation(labeled_dependency, 0, 25, 4089431234, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'business_phone', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'business_phone'}, []) Annotation(labeled_dependency, 0, 25, ECAD INC /DE/, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'former_name', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'former_name'}, []) Annotation(labeled_dependency, 
0, 25, 19880609, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'former_name_date', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'former_name_date'}, []) Annotation(labeled_dependency, 0, 25, 2017-02-10, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'date', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '2017-07-24:::2016-04-25:::2016-07-25:::2016-10-24:::2022-04-25:::2018-10-22:::2015-02-10:::2015-07-13:::2015-09-22:::2015-11-23:::2015-10-27:::2015-12-03:::2014-01-08:::2014-02-06:::2014-02-07:::2014-02-11:::2014-02-13:::2014-02-18:::2014-01-29:::2014-02-19:::2014-02-10:::2014-02-24:::2014-02-14:::2014-02-20:::2014-03-19:::2014-03-07:::2014-03-05:::2014-02-27:::2014-04-01:::2013-10-24:::2012-07-26:::2011-07-29:::2021-10-25:::2020-04-20:::2020-07-20:::2020-10-19:::2008-12-11:::2006-08-07:::2006-10-27:::2002-03-12', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'date'}, []) Annotation(labeled_dependency, 0, 25, 813672, {'__trained__': 'CADENCE DESIGN SYSTEMS INC', 'relation': 'company_id', '__distance_function__': 'levenshtein', 'ops': '0.0', 'all_relations': '', 'entity': 'CADENCE DESIGN SYSTEMS INC', 'distance': '0.0', '__relation_name__': 'company_id'}, [])
Yes, here it is. We get additional information about CADENCE DESIGN SYSTEMS INC using only the company name.