Notebook

Here you can find some examples on how to utilize PGT model, hosted in GT4SD, for part-of-patent generation, part-of-patent editing and patent coherence check.

PGTGenerator¶

An algorithm for part-of-patent generation. The user should select the task (posible tasks: title-to-abstract, abstract-to-title, abstract-to-claim and claim-to-abstract) and define the input text to generate the respective part of the patent. For example below, we are interested in generating patent abstracts that correspond to the given title.

In [33]:

from gt4sd.algorithms.generation.pgt.core import PGT,PGTGenerator

configuration = PGTGenerator(task="title-to-abstract",
                            input_text="Artificial intelligence and machine learning infrastructure"
)
algorithm = PGT(configuration=configuration)

Via the algorithm you can easily inspect the suggested generated parts interactively:

In [34]:

list(algorithm.sample(2))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Out[34]:

['An artificial intelligence application may collect and use machine inputs and/or machine outputs. The application enables users to interact with the application and provide feedback for the system. Feedback may be provided by applying a variety of techniques to the inputs such as user response analysis and user behavior analysis, machine input training, or both. User feedback may also be used to modify the operation of the artificial intelligent application.',
 'Artifact processing to detect anomalies and/or changes in data is contemplated herein. By way of example, data in a manner that facilitates analysis that accounts for such data (e.g., in order to identify anomalies/changes based on changes, such as whether one or more of the fields of change occur during an event). Moreover, by way still other aspects, these data may be processed to determine context-dependent, context associated with corresponding events (for example by determining whether at least a portion of a field of data associated therewith includes atypical content), and automatically or semi-automatically detected anomalies (based, at a least in part, on the context and the corresponding context) that may cause one of, but is not limited to, alerts, and other actions to be taken.']

We can tune the generation by adjusting the respective parameters in the configuration. In the next example, we are interested again in abstract generation but now we extend the max_length to 756, set top_k=20, top_p=0.95 and num_return_sequences=5. Note that the parameter num_return_sequences defines how many alternatives the model is going to produce in parallel and it is the upper limit of how many alternatives we can inspect using the sample method. For more information regarding the top_k and top_p parameters, we refer the user to https://huggingface.co/blog/how-to-generate.

In [35]:

from gt4sd.algorithms.generation.pgt.core import PGT,PGTGenerator

configuration = PGTGenerator(task="title-to-abstract",
                            input_text="Artificial intelligence and machine learning infrastructure",
                            max_length=756,
                            top_k=20,
                            top_p=0.95,
                            num_return_sequences=5
)
algorithm = PGT(configuration=configuration)

list(algorithm.sample(4))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Out[35]:

['The invention is directed to providing artificial intelligence (AI) based machine or machine learned systems with a capability to generate data models, to receive and store data, and to process the data according to at least one application programming interface (API). The AI system and method may be implemented on a computer system including one or more hardware processors configured to execute instructions to provide an API-based AI based system with artificial data processing capabilities. The system receives a request for data from a user, wherein the request includes information indicative of one of a plurality of user characteristics including a subject, an entity, a relationship, or a context. In response to the receipt of the information, the system automatically generates a data model comprising a first data structure comprising data derived from one and only one type of data source, without requiring data input from human users or human data creators.',
 'A system and method for providing an artificial intelligence (AI) and a machine-learning (ML) infrastructure. The system may include a first processing module. When executed by the first processor module, the system is configured to receive data related to a user. Further, a second processing modules, when executed, is operatively connected to the processing components of the user based on the received data. Finally, an engine is provided which executes on a processor to determine the appropriate AI and ML infrastructure based upon the data received by said first and second processors, and to present the AI on one or more user interfaces.',
 'The present invention relates to an artificial intelligence framework and a machine-learning infrastructure. The artificial Intelligence framework includes a first artificial intelligent device configured for a user to interact with a second artificial electronic device, which is a remote machine. In addition, the artificial AI framework also includes at least one artificial-intelligence device. Each artificial artificial device includes: at a memory, artificial neural-network, and an external output. A controller in communication with the first and second electronic devices and the external outputs of the at lease one device is also provided.',
 'A system and method for providing artificial intelligence services is provided. The system includes an artificial Intelligence (AI) infrastructure that includes AI software components; one or more artificial neural networks; a cloud storage system; and a computing system coupled to the cloud service and configured to execute the AI infrastructure.']

PGTEditor¶

An algorithm for part-of-patent editing. The user should define the input type (posible types: abstract and claim) and the input text to be edited. The input text should contain at least one [MASK] token which indicates the places which the editor should fill with text. For example below, we are interested in editing the given abstract by filling the two missing parts. Each generated sample contains a tuple with the suggested changes for the mask tokens in the same order as they appeared in the text. The parameters that mentioned in the PGTGenerator case could also be leveraged for patent editing.

In [36]:

from gt4sd.algorithms.generation.pgt.core import PGT, PGTEditor

configuration = PGTEditor(input_type="abstract",
                          input_text="In one step of a method for infusing an [MASK], the infusion fluid is pumped through a fluid delivery "
                                    "line of an infusion system. In another step, measurements are taken with at least one sensor connected to the infusion "
                                    "system. In an additional step, an air determination is determined with at least one processor. The air determination is "
                                    "related to air in the fluid delivery line. The air determination is based on the measurements taken by the at least one "
                                    "sensor. The air determination is further based on: (1) [MASK] "
                                    "information regarding the infusion of the infusion fluid; or (2) multi-channel filtering of the measurements from the at "
                                    "least one sensor or non-linear mapping of the measurements from the at least one sensor; and statistical process control "
                                    "charts applied to the multi-channel filtered measurements or applied to the non-linear mapped measurements."
)
algorithm = PGT(configuration=configuration)

list(algorithm.sample(1))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Out[36]:

[('infusion fluid into a user', 'the measured measurements;')]

PGTCoherenceChecker¶

An algorithm for patent coherence check. It assesses if two given patent parts could belong to same patent. The assesment is based both on the information and structure of these two parts. The user should define the coherence type (posible types: title-abstract, title-claim and abstract-claim) and the two input parts of paragraph. The coherence type defines which two parts of a patent are assesed and it indicates what should be the two given inputs. For example below, we are interested in examing the coherence of a title and an abstract. The output of the check could be yes and no. In case of the input is not well structured or the model cannot make a decision the output could be also NA. Even if the configuration can be changed similar to the other models above, due to the nature of the task, we suggest the user to stick to the default configuration and follow the exactly the steps presented below.

In [37]:

from gt4sd.algorithms.generation.pgt.core import PGT, PGTCoherenceChecker

my_interesting_title = "Artificial intelligence and machine learning infrastructure"
my_related_abstract = "An artificial intelligence and machine learning infrastructure system, including: one or more storage systems comprising, respectively, one or more storage devices; and one or more graphical processing units, wherein the graphical processing units are configured to communicate with the one or more storage systems over a communication fabric; where the one or more storage systems, the one or more graphical processing units, and the communication fabric are implemented within a single chassis."


configuration = PGTCoherenceChecker(coherence_type="title-abstract",
                          input_a=my_interesting_title,
                          input_b=my_related_abstract
)
algorithm = PGT(configuration=configuration)

list(algorithm.sample(1))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Out[37]:

['yes']

The next example contains a title and an abstract that are not directly related.

In [38]:

from gt4sd.algorithms.generation.pgt.core import PGT, PGTCoherenceChecker

my_interesting_title = "Analog image processing"
my_unrelated_abstract = "An artificial intelligence and machine learning infrastructure system for image classification, including: one or more storage systems comprising, respectively, one or more storage devices; and one or more graphical processing units, wherein the graphical processing units are configured to communicate with the one or more storage systems over a communication fabric; where the one or more storage systems, the one or more graphical processing units, and the communication fabric are implemented within a single chassis."
configuration = PGTCoherenceChecker(coherence_type="title-abstract",
                          input_a=my_interesting_title,
                          input_b=my_unrelated_abstract
)
algorithm = PGT(configuration=configuration)

list(algorithm.sample(1))

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

Out[38]:

['no']

In [ ]:

Explore PGT model for patent related tasks¶

PGTGenerator¶

PGTEditor¶

PGTCoherenceChecker¶