Diagnostic Generator

This model is trained over around 2 millions diagnostics to extract the probability distributions of word bigrams, which are groups of 2 words that appear one after the other. There are more frequent bigrams than others, for example the bigram caries dentinaria is more frequent than caries gingival and using these conditional probabilities we can synthesize new diagnostics that appears to be written by humans. The method used to model this process is a First Order Hidden Markov Model.

In [4]:
import sys # Module to interact with the operating system.
sys.path.append("..") # Adds higher directory to python modules path.
import src.utils # Loading utilities tu use the model.
In [5]:
synthesizer = src.utils.DiagnosticGenerator( # Class to construct the Generator from a vocabulaty and a Markov matrix
    markov_matrix_file = r"../models/model.npz", # Location of a binary scipy sparse matrix containing the markov matrix.
    vocabulary_file = r"../models/vocabulary.txt" # Location of a text file containing each word of the vocabulary separated by a line break.
)
In [9]:
synthesizer.generate() # Method to generate a list of words based on the Markov model.
Out[9]:
['<START>',
 'tumor',
 'de',
 'la',
 'agudeza',
 'visual',
 'no',
 'insulinodependiente',
 '<END>']