We will be using the
c_v coherence for two different LDA models: a "good" and a "bad" LDA model. The good LDA model will be trained over 50 iterations and the bad one for 1 iteration. Hence in theory, the good LDA model will be able come up with better or more human-understandable topics. Therefore the coherence measure output for the good LDA model should be more (better) than that for the bad LDA model. This is because, simply, the good LDA model usually comes up with better topics that are more human interpretable.
import numpy as np import logging import pyLDAvis.gensim import json import warnings warnings.filterwarnings('ignore') # To ignore all warnings that arise here to enhance clarity from gensim.models.coherencemodel import CoherenceModel from gensim.models.ldamodel import LdaModel from gensim.corpora.dictionary import Dictionary from numpy import array
logger = logging.getLogger() logger.setLevel(logging.DEBUG) logging.debug("test")
As stated in table 2 from this paper, this corpus essentially has two classes of documents. First five are about human-computer interaction and the other four are about graphs. We will be setting up two LDA models. One with 50 iterations of training and the other with just 1. Hence the one with 50 iterations ("better" model) should be able to capture this underlying pattern of the corpus better than the "bad" LDA model. Therefore, in theory, our topic coherence for the good LDA model should be greater than the one for the bad LDA model.
texts = [['human', 'interface', 'computer'], ['survey', 'user', 'computer', 'system', 'response', 'time'], ['eps', 'user', 'interface', 'system'], ['system', 'human', 'system', 'eps'], ['user', 'response', 'time'], ['trees'], ['graph', 'trees'], ['graph', 'minors', 'trees'], ['graph', 'minors', 'survey']]
dictionary = Dictionary(texts) corpus = [dictionary.doc2bow(text) for text in texts]
We'll be setting up two different LDA Topic models. A good one and bad one. To build a "good" topic model, we'll simply train it using more iterations than the bad one. Therefore the
u_mass coherence should in theory be better for the good model than the bad one since it would be producing more "human-interpretable" topics.
goodLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=50, num_topics=2) badLdaModel = LdaModel(corpus=corpus, id2word=dictionary, iterations=1, num_topics=2)
goodcm = CoherenceModel(model=goodLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')
badcm = CoherenceModel(model=badLdaModel, corpus=corpus, dictionary=dictionary, coherence='u_mass')
Following are the pipeline parameters for
u_mass coherence. By pipeline parameters, we mean the functions being used to calculate segmentation, probability estimation, confirmation measure and aggregation as shown in figure 1 in this paper.
CoherenceModel(segmentation=<function s_one_pre at 0x7f663ae82f50>, probability estimation=<function p_boolean_document at 0x7f663ae8a2a8>, confirmation measure=<function log_conditional_probability at 0x7f663ae8a668>, aggregation=<function arithmetic_mean at 0x7f663ae8aa28>)
As we will see below using LDA visualization, the better model comes up with two topics composed of the following words:
Therefore, the topic coherence for the goodLdaModel should be greater for this than the badLdaModel since the topics it comes up with are more human-interpretable. We will see this using
c_v topic coherence measures.
pyLDAvis.gensim.prepare(goodLdaModel, corpus, dictionary)
pyLDAvis.gensim.prepare(badLdaModel, corpus, dictionary)
goodcm = CoherenceModel(model=goodLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')
badcm = CoherenceModel(model=badLdaModel, texts=texts, dictionary=dictionary, coherence='c_v')
CoherenceModel(segmentation=<function s_one_set at 0x7f663ae8a050>, probability estimation=<function p_boolean_sliding_window at 0x7f663ae8a320>, confirmation measure=<function cosine_similarity at 0x7f663ae8a938>, aggregation=<function arithmetic_mean at 0x7f663ae8aa28>)
Hence as we can see, the
c_v coherence for the good LDA model is much more (better) than that for the bad LDA model. This is because, simply, the good LDA model usually comes up with better topics that are more human interpretable. The badLdaModel however fails to decipher between these two topics and comes up with topics which are not clear to a human. The
c_v topic coherences capture this wonderfully by giving the interpretability of these topics a number as we can see above. Hence this coherence measure can be used to compare difference topic models based on their human-interpretability.