PyCM Document¶

Version : 1.8¶

Table of contents¶

Overview
Installation

Source Code
PyPI
Easy Install

Usage

From Vector
Direct CM
Activation Threshold
Load From File
Sample Weights
Transpose
Relabel
Online Help
Acceptable Data Types

Basic Parameters

True Positive
True Negative
False Positive
False Negative
Condition Positive
Condition Negative
Test Outcome Positive
Test Outcome Negative
Population

Class Statistics

True Positive Rate
True Negative Rate
Positive Predictive Value
Negative Predictive Value
False Negative Rate
False Positive Rate
False Discovery Rate
False Omission Rate
Accuracy
Error Rate
FBeta Score
Matthews Correlation Coefficient
Informedness
Markedness
Positive Likelihood Ratio
Negative Likelihood Ratio
Diagnostic Odds Ratio
Prevalence
G-Measure
Random Accuracy
Random Accuracy Unbiased
Jaccard Index
Information Score
Confusion Entropy
Modified Confusion Entropy
Area Under The ROC Curve
Distance Index
Similarity Index
Discriminant Power
Youden Index
Positive Likelihood Ratio Interpretation
Discriminant Power Interpretation
AUC Value Interpretation
Gini Index
Lift Score

Overall Statistics

Kappa
Kappa Unbiased
Kappa No Prevalence
Kappa 95% CI
Kappa Standard Error
Chi Squared
Chi Squared DF
Phi Squared
Cramer's V
95% CI
Standard Error
Bennett's S
Scott's PI
Gwet's AC1
Reference Entropy
Response Entropy
Cross Entropy
Joint Entropy
Conditional Entropy
Kullback-Leibler Divergence
Mutual Information
Goodman-Kruskal's Lambda A
Goodman-Kruskal's Lambda B
Landis-Koch Benchmark
Fleiss’ Benchmark
Altman’s Benchmark
Cicchetti’s Benchmark
Overall Accuracy
Overall Random Accuracy
Overall Random Accuracy Unbiased
Positive Predictive Value Micro
True Positive Rate Micro
Positive Predictive Value Macro
True Positive Rate Macro
Overall Jaccard Index
Hamming Loss
Zero-one Loss
No Information Rate
P Value
Overall Confusion Entropy
Overall Modified Confusion Entropy
Overall Matthews Correlation Coefficient
Global Performance Index
Class Balance Accuracy
AUNU
AUNP
Relative Classifier Information

Print

Full
Matrix
Normalized Matrix
Stat

Save

pycm
HTML
CSV
Object

Input Errors
Examples
References

Overview¶

PyCM is a multi-class confusion matrix library written in Python that supports both input data vectors and direct matrix input. It is a tool for post-classification model evaluation that supports most class and overall statistics parameters. PyCM is the Swiss Army knife of confusion matrices, targeted mainly at data scientists who need a broad array of metrics for predictive models and an accurate evaluation of a large variety of classifiers.

Fig1. PyCM Block Diagram

Installation¶

Source code¶

Download Version 1.8 or Latest Source
Run pip install -r requirements.txt or pip3 install -r requirements.txt (Need root access)
Run python3 setup.py install or python setup.py install (Need root access)

PyPI¶

Check Python Packaging User Guide
Run pip install pycm==1.8 or pip3 install pycm==1.8 (Need root access)

Conda¶

Check Conda Managing Package
conda install -c sepandhaghighi pycm (Need root access)

Easy install¶

Run easy_install --upgrade pycm (Need root access)

Usage¶

From vector¶

In [1]:

from pycm import *

In [2]:

y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

In [3]:

cm = ConfusionMatrix(y_actu, y_pred,digit=5)

Notice : `digit` (the number of digits to the right of the decimal point in a number) is new in version 0.6 (default value : 5)
It only affects print and save output

In [4]:

cm

Out[4]:

pycm.ConfusionMatrix(classes: [0, 1, 2])

In [5]:

cm.actual_vector

Out[5]:

[2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]

In [6]:

cm.predict_vector

Out[6]:

[0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]

In [7]:

cm.classes

Out[7]:

[0, 1, 2]

In [8]:

cm.class_stat

Out[8]:

{'ACC': {0: 0.8333333333333334, 1: 0.75, 2: 0.5833333333333334},
 'AUC': {0: 0.8888888888888888, 1: 0.611111111111111, 2: 0.5833333333333333},
 'AUCI': {0: 'Very Good', 1: 'Fair', 2: 'Poor'},
 'BM': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'CEN': {0: 0.25, 1: 0.49657842846620864, 2: 0.6044162769630221},
 'DOR': {0: 'None', 1: 3.999999999999998, 2: 1.9999999999999998},
 'DP': {0: 'None', 1: 0.331933069996499, 2: 0.16596653499824957},
 'DPI': {0: 'None', 1: 'Poor', 2: 'Poor'},
 'ERR': {0: 0.16666666666666663, 1: 0.25, 2: 0.41666666666666663},
 'F0.5': {0: 0.6521739130434783,
  1: 0.45454545454545453,
  2: 0.5769230769230769},
 'F1': {0: 0.75, 1: 0.4, 2: 0.5454545454545454},
 'F2': {0: 0.8823529411764706, 1: 0.35714285714285715, 2: 0.5172413793103449},
 'FDR': {0: 0.4, 1: 0.5, 2: 0.4},
 'FN': {0: 0, 1: 2, 2: 3},
 'FNR': {0: 0.0, 1: 0.6666666666666667, 2: 0.5},
 'FOR': {0: 0.0, 1: 0.19999999999999996, 2: 0.4285714285714286},
 'FP': {0: 2, 1: 1, 2: 2},
 'FPR': {0: 0.2222222222222222,
  1: 0.11111111111111116,
  2: 0.33333333333333337},
 'G': {0: 0.7745966692414834, 1: 0.408248290463863, 2: 0.5477225575051661},
 'GI': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'IS': {0: 1.263034405833794, 1: 1.0, 2: 0.2630344058337938},
 'J': {0: 0.6, 1: 0.25, 2: 0.375},
 'LS': {0: 2.4, 1: 2.0, 2: 1.2},
 'MCC': {0: 0.6831300510639732, 1: 0.25819888974716115, 2: 0.1690308509457033},
 'MCEN': {0: 0.2643856189774724, 1: 0.5, 2: 0.6875},
 'MK': {0: 0.6000000000000001, 1: 0.30000000000000004, 2: 0.17142857142857126},
 'N': {0: 9, 1: 9, 2: 6},
 'NLR': {0: 0.0, 1: 0.7500000000000001, 2: 0.75},
 'NPV': {0: 1.0, 1: 0.8, 2: 0.5714285714285714},
 'P': {0: 3, 1: 3, 2: 6},
 'PLR': {0: 4.5, 1: 2.9999999999999987, 2: 1.4999999999999998},
 'PLRI': {0: 'Poor', 1: 'Poor', 2: 'Poor'},
 'POP': {0: 12, 1: 12, 2: 12},
 'PPV': {0: 0.6, 1: 0.5, 2: 0.6},
 'PRE': {0: 0.25, 1: 0.25, 2: 0.5},
 'RACC': {0: 0.10416666666666667,
  1: 0.041666666666666664,
  2: 0.20833333333333334},
 'RACCU': {0: 0.1111111111111111,
  1: 0.04340277777777778,
  2: 0.21006944444444442},
 'TN': {0: 7, 1: 8, 2: 4},
 'TNR': {0: 0.7777777777777778, 1: 0.8888888888888888, 2: 0.6666666666666666},
 'TON': {0: 7, 1: 10, 2: 7},
 'TOP': {0: 5, 1: 2, 2: 5},
 'TP': {0: 3, 1: 1, 2: 3},
 'TPR': {0: 1.0, 1: 0.3333333333333333, 2: 0.5},
 'Y': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'dInd': {0: 0.2222222222222222, 1: 0.6758625033664689, 2: 0.6009252125773316},
 'sInd': {0: 0.8428651597363228, 1: 0.5220930407198541, 2: 0.5750817072006014}}

Notice : this attribute was named `cm.statistic_result` in previous versions (0.2 >)

In [9]:

cm.overall_stat

Out[9]:

{'95% CI': (0.30438856248221097, 0.8622781041844558),
 'AUNP': 0.6666666666666666,
 'AUNU': 0.6944444444444443,
 'Bennett S': 0.37500000000000006,
 'CBA': 0.4777777777777778,
 'Chi-Squared': 6.6,
 'Chi-Squared DF': 4,
 'Conditional Entropy': 0.9591479170272448,
 'Cramer V': 0.5244044240850757,
 'Cross Entropy': 1.5935164295556343,
 'Gwet AC1': 0.3893129770992367,
 'Hamming Loss': 0.41666666666666663,
 'Joint Entropy': 2.4591479170272446,
 'KL Divergence': 0.09351642955563438,
 'Kappa': 0.35483870967741943,
 'Kappa 95% CI': (-0.07707577422109269, 0.7867531935759315),
 'Kappa No Prevalence': 0.16666666666666674,
 'Kappa Standard Error': 0.2203645326012817,
 'Kappa Unbiased': 0.34426229508196726,
 'Lambda A': 0.16666666666666666,
 'Lambda B': 0.42857142857142855,
 'Mutual Information': 0.5242078379544426,
 'NIR': 0.5,
 'Overall ACC': 0.5833333333333334,
 'Overall CEN': 0.4638112995385119,
 'Overall J': (1.225, 0.4083333333333334),
 'Overall MCC': 0.36666666666666664,
 'Overall MCEN': 0.5189369467580801,
 'Overall RACC': 0.3541666666666667,
 'Overall RACCU': 0.3645833333333333,
 'P-Value': 0.38720703125,
 'PPV Macro': 0.5666666666666668,
 'PPV Micro': 0.5833333333333334,
 'Phi-Squared': 0.5499999999999999,
 'RCI': 0.3494718919696284,
 'RR': 4.0,
 'Reference Entropy': 1.5,
 'Response Entropy': 1.4833557549816874,
 'SOA1(Landis & Koch)': 'Fair',
 'SOA2(Fleiss)': 'Poor',
 'SOA3(Altman)': 'Fair',
 'SOA4(Cicchetti)': 'Poor',
 'Scott PI': 0.34426229508196726,
 'Standard Error': 0.14231876063832777,
 'TPR Macro': 0.611111111111111,
 'TPR Micro': 0.5833333333333334,
 'Zero-one Loss': 5}

Notice : new in version 0.3

Notice : `_` removed from overall statistics names in version 1.6

In [10]:

cm.table

Out[10]:

{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}

In [11]:

cm.matrix

Out[11]:

{0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}

In [12]:

cm.normalized_matrix

Out[12]:

{0: {0: 1.0, 1: 0.0, 2: 0.0},
 1: {0: 0.0, 1: 0.33333, 2: 0.66667},
 2: {0: 0.33333, 1: 0.16667, 2: 0.5}}

In [13]:

cm.normalized_table

Out[13]:

{0: {0: 1.0, 1: 0.0, 2: 0.0},
 1: {0: 0.0, 1: 0.33333, 2: 0.66667},
 2: {0: 0.33333, 1: 0.16667, 2: 0.5}}

Notice : `matrix`, `normalized_matrix` & `normalized_table` added in version 1.5 (changed from print style)

In [14]:

import numpy

In [15]:

y_actu = numpy.array([2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2])
y_pred = numpy.array([0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2])

In [16]:

cm = ConfusionMatrix(y_actu, y_pred,digit=5)

In [17]:

cm

Out[17]:

pycm.ConfusionMatrix(classes: [0, 1, 2])

Notice : `numpy.array` support in versions > 0.7

Direct CM¶

In [18]:

cm2 = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}},digit=5)

In [19]:

cm2

Out[19]:

pycm.ConfusionMatrix(classes: [0, 1, 2])

In [20]:

cm2.actual_vector

In [21]:

cm2.predict_vector

In [22]:

cm2.classes

Out[22]:

[0, 1, 2]

In [23]:

cm2.class_stat

Out[23]:

{'ACC': {0: 0.8333333333333334, 1: 0.75, 2: 0.5833333333333334},
 'AUC': {0: 0.8888888888888888, 1: 0.611111111111111, 2: 0.5833333333333333},
 'AUCI': {0: 'Very Good', 1: 'Fair', 2: 'Poor'},
 'BM': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'CEN': {0: 0.25, 1: 0.49657842846620864, 2: 0.6044162769630221},
 'DOR': {0: 'None', 1: 3.999999999999998, 2: 1.9999999999999998},
 'DP': {0: 'None', 1: 0.331933069996499, 2: 0.16596653499824957},
 'DPI': {0: 'None', 1: 'Poor', 2: 'Poor'},
 'ERR': {0: 0.16666666666666663, 1: 0.25, 2: 0.41666666666666663},
 'F0.5': {0: 0.6521739130434783,
  1: 0.45454545454545453,
  2: 0.5769230769230769},
 'F1': {0: 0.75, 1: 0.4, 2: 0.5454545454545454},
 'F2': {0: 0.8823529411764706, 1: 0.35714285714285715, 2: 0.5172413793103449},
 'FDR': {0: 0.4, 1: 0.5, 2: 0.4},
 'FN': {0: 0, 1: 2, 2: 3},
 'FNR': {0: 0.0, 1: 0.6666666666666667, 2: 0.5},
 'FOR': {0: 0.0, 1: 0.19999999999999996, 2: 0.4285714285714286},
 'FP': {0: 2, 1: 1, 2: 2},
 'FPR': {0: 0.2222222222222222,
  1: 0.11111111111111116,
  2: 0.33333333333333337},
 'G': {0: 0.7745966692414834, 1: 0.408248290463863, 2: 0.5477225575051661},
 'GI': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'IS': {0: 1.263034405833794, 1: 1.0, 2: 0.2630344058337938},
 'J': {0: 0.6, 1: 0.25, 2: 0.375},
 'LS': {0: 2.4, 1: 2.0, 2: 1.2},
 'MCC': {0: 0.6831300510639732, 1: 0.25819888974716115, 2: 0.1690308509457033},
 'MCEN': {0: 0.2643856189774724, 1: 0.5, 2: 0.6875},
 'MK': {0: 0.6000000000000001, 1: 0.30000000000000004, 2: 0.17142857142857126},
 'N': {0: 9, 1: 9, 2: 6},
 'NLR': {0: 0.0, 1: 0.7500000000000001, 2: 0.75},
 'NPV': {0: 1.0, 1: 0.8, 2: 0.5714285714285714},
 'P': {0: 3, 1: 3, 2: 6},
 'PLR': {0: 4.5, 1: 2.9999999999999987, 2: 1.4999999999999998},
 'PLRI': {0: 'Poor', 1: 'Poor', 2: 'Poor'},
 'POP': {0: 12, 1: 12, 2: 12},
 'PPV': {0: 0.6, 1: 0.5, 2: 0.6},
 'PRE': {0: 0.25, 1: 0.25, 2: 0.5},
 'RACC': {0: 0.10416666666666667,
  1: 0.041666666666666664,
  2: 0.20833333333333334},
 'RACCU': {0: 0.1111111111111111,
  1: 0.04340277777777778,
  2: 0.21006944444444442},
 'TN': {0: 7, 1: 8, 2: 4},
 'TNR': {0: 0.7777777777777778, 1: 0.8888888888888888, 2: 0.6666666666666666},
 'TON': {0: 7, 1: 10, 2: 7},
 'TOP': {0: 5, 1: 2, 2: 5},
 'TP': {0: 3, 1: 1, 2: 3},
 'TPR': {0: 1.0, 1: 0.3333333333333333, 2: 0.5},
 'Y': {0: 0.7777777777777777, 1: 0.2222222222222221, 2: 0.16666666666666652},
 'dInd': {0: 0.2222222222222222, 1: 0.6758625033664689, 2: 0.6009252125773316},
 'sInd': {0: 0.8428651597363228, 1: 0.5220930407198541, 2: 0.5750817072006014}}

In [24]:

cm2.overall_stat

Out[24]:

{'95% CI': (0.30438856248221097, 0.8622781041844558),
 'AUNP': 0.6666666666666666,
 'AUNU': 0.6944444444444443,
 'Bennett S': 0.37500000000000006,
 'CBA': 0.4777777777777778,
 'Chi-Squared': 6.6,
 'Chi-Squared DF': 4,
 'Conditional Entropy': 0.9591479170272448,
 'Cramer V': 0.5244044240850757,
 'Cross Entropy': 1.5935164295556343,
 'Gwet AC1': 0.3893129770992367,
 'Hamming Loss': 0.41666666666666663,
 'Joint Entropy': 2.4591479170272446,
 'KL Divergence': 0.09351642955563438,
 'Kappa': 0.35483870967741943,
 'Kappa 95% CI': (-0.07707577422109269, 0.7867531935759315),
 'Kappa No Prevalence': 0.16666666666666674,
 'Kappa Standard Error': 0.2203645326012817,
 'Kappa Unbiased': 0.34426229508196726,
 'Lambda A': 0.16666666666666666,
 'Lambda B': 0.42857142857142855,
 'Mutual Information': 0.5242078379544426,
 'NIR': 0.5,
 'Overall ACC': 0.5833333333333334,
 'Overall CEN': 0.4638112995385119,
 'Overall J': (1.225, 0.4083333333333334),
 'Overall MCC': 0.36666666666666664,
 'Overall MCEN': 0.5189369467580801,
 'Overall RACC': 0.3541666666666667,
 'Overall RACCU': 0.3645833333333333,
 'P-Value': 0.38720703125,
 'PPV Macro': 0.5666666666666668,
 'PPV Micro': 0.5833333333333334,
 'Phi-Squared': 0.5499999999999999,
 'RCI': 0.3494718919696284,
 'RR': 4.0,
 'Reference Entropy': 1.5,
 'Response Entropy': 1.4833557549816874,
 'SOA1(Landis & Koch)': 'Fair',
 'SOA2(Fleiss)': 'Poor',
 'SOA3(Altman)': 'Fair',
 'SOA4(Cicchetti)': 'Poor',
 'Scott PI': 0.34426229508196726,
 'Standard Error': 0.14231876063832777,
 'TPR Macro': 0.611111111111111,
 'TPR Micro': 0.5833333333333334,
 'Zero-one Loss': 5}

Notice : new in version 0.8.1
In direct matrix mode `actual_vector` and `predict_vector` are empty
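
The vector mode and the direct matrix mode describe the same table. The following is a plain-Python sketch (it does not import pycm and is not PyCM's implementation) of how the two example vectors reduce to the nested dict passed as `matrix`. It assumes the outer keys are actual classes and the inner keys are predicted classes, which is consistent with the `cm.table` output above; `build_matrix` is a hypothetical helper.

```python
# Count (actual, predicted) pairs into the layout {actual: {predicted: count}}.
y_actu = [2, 0, 2, 2, 0, 1, 1, 2, 2, 0, 1, 2]
y_pred = [0, 0, 2, 1, 0, 2, 1, 0, 2, 0, 2, 2]


def build_matrix(actual, predicted):
    """Hypothetical helper, not part of pycm."""
    classes = sorted(set(actual) | set(predicted))
    matrix = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


matrix = build_matrix(y_actu, y_pred)
print(matrix)
```

The resulting dict is the one used in the Direct CM cell above.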

Activation threshold¶

threshold was added in version 0.9 to handle real-valued predictions. For more information, see Example 3

Notice : new in version 0.9
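
As a rough illustration of what an activation threshold does, the sketch below converts real-valued scores into class labels before any counting takes place. It is plain Python and does not call pycm. The function name `activation`, the 0.5 cut-off and the sample data are all assumptions made for this example, and it assumes the function is applied to each prediction individually; Example 3 remains the reference for the actual behaviour.

```python
def activation(score):
    """Hypothetical threshold function: label 1 for scores >= 0.5, else 0."""
    return 1 if score >= 0.5 else 0


y_actu = [0, 0, 1, 0]
y_score = [0.87, 0.34, 0.9, 0.12]  # made-up real-valued predictions

# Assumed behaviour: the function is applied element by element.
y_pred = [activation(s) for s in y_score]
print(y_pred)  # [1, 0, 1, 0]
```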

Load from file¶

file was added in version 0.9.5 to load a saved confusion matrix in the .obj format generated by the save_obj method.

For more information visit Example 4

Notice : new in version 0.9.5

Sample weights¶

sample_weight is added in version 1.2

For more information visit Example 5

Notice : new in version 1.2

Transpose¶

transpose is added in version 1.2 in order to transpose input matrix (only in Direct CM mode)

In [25]:

cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}},digit=5,transpose=True)

In [26]:

cm.print_matrix()

Predict          0    1    2    
Actual
0                3    0    2    

1                0    1    1    

2                0    2    3

Notice : new in version 1.2
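
For reference, transposing the nested dict swaps the roles of the actual and predicted classes. A minimal plain-Python sketch (the `transpose` helper is hypothetical and independent of pycm) reproduces the matrix printed above:

```python
matrix = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}


def transpose(m):
    """Hypothetical helper: entry [i][j] of the result is entry [j][i] of m."""
    return {i: {j: m[j][i] for j in m} for i in m}


transposed = transpose(matrix)
for label, row in transposed.items():
    print(label, list(row.values()))
```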

Relabel¶

relabel method is added in version 1.5 in order to change ConfusionMatrix classnames.

In [27]:

cm.relabel(mapping={0:"L1",1:"L2",2:"L3"})

In [28]:

cm

Out[28]:

pycm.ConfusionMatrix(classes: ['L1', 'L2', 'L3'])

Notice : new in version 1.5

Online help¶

online_help function was added in version 1.1 to open the definition of each statistic in a web browser

>>> from pycm import online_help
>>> online_help("J")
>>> online_help("SOA1(Landis & Koch)")
>>> online_help(2)

The list of items is available by calling online_help() (without an argument)

In [29]:

online_help()

Please choose one parameter : 

Example : online_help("J") or online_help(2)

1-95% CI
2-ACC
3-AUC
4-AUCI
5-AUNP
6-AUNU
7-BM
8-Bennett S
9-CBA
10-CEN
11-Chi-Squared
12-Chi-Squared DF
13-Conditional Entropy
14-Cramer V
15-Cross Entropy
16-DOR
17-DP
18-DPI
19-ERR
20-F0.5
21-F1
22-F2
23-FDR
24-FN
25-FNR
26-FOR
27-FP
28-FPR
29-G
30-GI
31-Gwet AC1
32-Hamming Loss
33-IS
34-J
35-Joint Entropy
36-KL Divergence
37-Kappa
38-Kappa 95% CI
39-Kappa No Prevalence
40-Kappa Standard Error
41-Kappa Unbiased
42-LS
43-Lambda A
44-Lambda B
45-MCC
46-MCEN
47-MK
48-Mutual Information
49-N
50-NIR
51-NLR
52-NPV
53-Overall ACC
54-Overall CEN
55-Overall J
56-Overall MCC
57-Overall MCEN
58-Overall RACC
59-Overall RACCU
60-P
61-P-Value
62-PLR
63-PLRI
64-POP
65-PPV
66-PPV Macro
67-PPV Micro
68-PRE
69-Phi-Squared
70-RACC
71-RACCU
72-RCI
73-RR
74-Reference Entropy
75-Response Entropy
76-SOA1(Landis & Koch)
77-SOA2(Fleiss)
78-SOA3(Altman)
79-SOA4(Cicchetti)
80-Scott PI
81-Standard Error
82-TN
83-TNR
84-TON
85-TOP
86-TP
87-TPR
88-TPR Macro
89-TPR Micro
90-Y
91-Zero-one Loss
92-dInd
93-sInd

Acceptable data types¶

actual_vector : python list or numpy array of any stringable objects
predict_vector : python list or numpy array of any stringable objects
matrix : dict
digit: int
threshold : FunctionType (function or lambda)
file : File object
sample_weight : python list or numpy array of any stringable objects
transpose : bool

run help(ConfusionMatrix) for more information

Basic parameters¶

TP (True positive)¶

A true positive test result is one that detects the condition when the condition is present (correctly identified) [3].

In [30]:

cm.TP

Out[30]:

{'L1': 3, 'L2': 1, 'L3': 3}

TN (True negative)¶

A true negative test result is one that does not detect the condition when the condition is absent (correctly rejected) [3].

In [31]:

cm.TN

Out[31]:

{'L1': 7, 'L2': 8, 'L3': 4}

FP (False positive)¶

A false positive test result is one that detects the condition when the condition is absent (incorrectly identified) [3].

In [32]:

cm.FP

Out[32]:

{'L1': 0, 'L2': 2, 'L3': 3}

FN (False negative)¶

A false negative test result is one that does not detect the condition when the condition is present (incorrectly rejected) [3].

In [33]:

cm.FN

Out[33]:

{'L1': 2, 'L2': 1, 'L3': 2}

P (Condition positive)¶

Number of positive samples. Also known as support (the number of occurrences of each class in y_true) [3].

In [34]:

cm.P

Out[34]:

{'L1': 5, 'L2': 2, 'L3': 5}

N (Condition negative)¶

Number of negative samples [3].

In [35]:

cm.N

Out[35]:

{'L1': 7, 'L2': 10, 'L3': 7}

TOP (Test outcome positive)¶

Number of positive outcomes [3].

In [36]:

cm.TOP

Out[36]:

{'L1': 3, 'L2': 3, 'L3': 6}

TON (Test outcome negative)¶

Number of negative outcomes [3].

In [37]:

cm.TON

Out[37]:

{'L1': 9, 'L2': 9, 'L3': 6}

POP (Population)¶

For more information visit [3].

In [38]:

cm.POP

Out[38]:

{'L1': 12, 'L2': 12, 'L3': 12}

Wikipedia page
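
The basic parameters above can all be derived from row sums, column sums and the diagonal of the matrix. The sketch below is plain Python (no pycm import) and uses the transposed, relabelled matrix from the Transpose and Relabel cells; the variable names simply mirror the attribute names used in this document.

```python
matrix = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}
classes = list(matrix)
POP = sum(sum(row.values()) for row in matrix.values())

P = {c: sum(matrix[c].values()) for c in classes}               # row sums
TOP = {c: sum(matrix[a][c] for a in classes) for c in classes}  # column sums
TP = {c: matrix[c][c] for c in classes}                         # diagonal
FN = {c: P[c] - TP[c] for c in classes}
FP = {c: TOP[c] - TP[c] for c in classes}
TN = {c: POP - TP[c] - FP[c] - FN[c] for c in classes}
N = {c: POP - P[c] for c in classes}
TON = {c: POP - TOP[c] for c in classes}

print(TP, TN, FP, FN)
```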

Class statistics¶

TPR (True positive rate)¶

Sensitivity (also called the true positive rate, the recall, or probability of detection in some fields) measures the proportion of positives that are correctly identified as such (e.g. the percentage of sick people who are correctly identified as having the condition) [3].

Wikipedia page

$$TPR=\frac{TP}{P}=\frac{TP}{TP+FN}$$

In [39]:

cm.TPR

Out[39]:

{'L1': 0.6, 'L2': 0.5, 'L3': 0.6}

TNR (True negative rate)¶

Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g. the percentage of healthy people who are correctly identified as not having the condition) [3].

Wikipedia page

$$TNR=\frac{TN}{N}=\frac{TN}{TN+FP}$$

In [40]:

cm.TNR

Out[40]:

{'L1': 1.0, 'L2': 0.8, 'L3': 0.5714285714285714}

PPV (Positive predictive value)¶

Positive predictive value (PPV) is the proportion of positive test outcomes that correspond to the presence of the condition [3].

Wikipedia page

$$PPV=\frac{TP}{TP+FP}$$

In [41]:

cm.PPV

Out[41]:

{'L1': 1.0, 'L2': 0.3333333333333333, 'L3': 0.5}

NPV (Negative predictive value)¶

Negative predictive value (NPV) is the proportion of negative test outcomes that correspond to the absence of the condition [3].

Wikipedia page

$$NPV=\frac{TN}{TN+FN}$$

In [42]:

cm.NPV

Out[42]:

{'L1': 0.7777777777777778, 'L2': 0.8888888888888888, 'L3': 0.6666666666666666}

FNR (False negative rate)¶

The false negative rate is the proportion of positives which yield negative test outcomes with the test, i.e., the conditional probability of a negative test result given that the condition being looked for is present [3].

Wikipedia page

$$FNR=\frac{FN}{P}=\frac{FN}{FN+TP}=1-TPR$$

In [43]:

cm.FNR

Out[43]:

{'L1': 0.4, 'L2': 0.5, 'L3': 0.4}

FPR (False positive rate)¶

The false positive rate is the proportion of all negatives that still yield positive test outcomes, i.e., the conditional probability of a positive test result given an event that was not present [3].

The false positive rate is equal to the significance level. The specificity of the test is equal to 1 minus the false positive rate.

Wikipedia page

$$FPR=\frac{FP}{N}=\frac{FP}{FP+TN}=1-TNR$$

In [44]:

cm.FPR

Out[44]:

{'L1': 0.0, 'L2': 0.19999999999999996, 'L3': 0.4285714285714286}

FDR (False discovery rate)¶

The false discovery rate (FDR) is a method of conceptualizing the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of "discoveries" (rejected null hypotheses) that are false (incorrect rejections) [3].

Wikipedia page

$$FDR=\frac{FP}{FP+TP}=1-PPV$$

In [45]:

cm.FDR

Out[45]:

{'L1': 0.0, 'L2': 0.6666666666666667, 'L3': 0.5}

FOR (False omission rate)¶

False omission rate (FOR) is a statistical method used in multiple hypothesis testing to correct for multiple comparisons and it is the complement of the negative predictive value. It measures the proportion of false negatives which are incorrectly rejected [3].

Wikipedia page

$$FOR=\frac{FN}{FN+TN}=1-NPV$$

In [46]:

cm.FOR

Out[46]:

{'L1': 0.2222222222222222,
 'L2': 0.11111111111111116,
 'L3': 0.33333333333333337}
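
The eight rates defined so far are simple ratios of the basic counts. The following plain-Python sketch recomputes them from the TP, TN, FP and FN values shown earlier. The `ratio` helper is hypothetical; returning the string 'None' for a zero denominator is an assumption modelled on the 'None' entries that appear in the outputs of this document.

```python
# Counts taken from the Basic parameters outputs above.
TP = {"L1": 3, "L2": 1, "L3": 3}
TN = {"L1": 7, "L2": 8, "L3": 4}
FP = {"L1": 0, "L2": 2, "L3": 3}
FN = {"L1": 2, "L2": 1, "L3": 2}


def ratio(num, den):
    """Hypothetical helper: num/den, or the string 'None' if den is zero."""
    return num / den if den else "None"


rates = {}
for c in TP:
    rates[c] = {
        "TPR": ratio(TP[c], TP[c] + FN[c]),
        "TNR": ratio(TN[c], TN[c] + FP[c]),
        "PPV": ratio(TP[c], TP[c] + FP[c]),
        "NPV": ratio(TN[c], TN[c] + FN[c]),
        "FNR": ratio(FN[c], FN[c] + TP[c]),
        "FPR": ratio(FP[c], FP[c] + TN[c]),
        "FDR": ratio(FP[c], FP[c] + TP[c]),
        "FOR": ratio(FN[c], FN[c] + TN[c]),
    }

print(rates["L2"])
```

Small differences in the last decimal places relative to the outputs above are expected, since a complement such as FPR can also be computed as 1 - TNR.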

ACC (Accuracy)¶

The accuracy is the proportion of correct predictions among all predictions made [3].

Wikipedia page

$$ACC=\frac{TP+TN}{P+N}=\frac{TP+TN}{TP+TN+FP+FN}$$

In [47]:

cm.ACC

Out[47]:

{'L1': 0.8333333333333334, 'L2': 0.75, 'L3': 0.5833333333333334}

ERR (Error rate)¶

The error rate is the proportion of incorrect predictions among all predictions made [3].

$$ERR=\frac{FP+FN}{P+N}=\frac{FP+FN}{TP+TN+FP+FN}=1-ACC$$

In [48]:

cm.ERR

Out[48]:

{'L1': 0.16666666666666663, 'L2': 0.25, 'L3': 0.41666666666666663}

Notice : new in version 0.4

FBeta-Score¶

In statistical analysis of classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score. The F1 score is the harmonic mean of precision and recall; it reaches its best value at 1 (perfect precision and recall) and its worst at 0. The more general F-beta score weights recall β times as much as precision [3].

Wikipedia page

$$F_{\beta}=(1+\beta^2)\times \frac{PPV\times TPR}{(\beta^2 \times PPV)+TPR}=\frac{(1+\beta^2) \times TP}{(1+\beta^2)\times TP+FP+\beta^2 \times FN}$$

In [49]:

cm.F1

Out[49]:

{'L1': 0.75, 'L2': 0.4, 'L3': 0.5454545454545454}

In [50]:

cm.F05

Out[50]:

{'L1': 0.8823529411764706, 'L2': 0.35714285714285715, 'L3': 0.5172413793103449}

In [51]:

cm.F2

Out[51]:

{'L1': 0.6521739130434783, 'L2': 0.45454545454545453, 'L3': 0.5769230769230769}

In [52]:

cm.F_beta(Beta=4)

Out[52]:

{'L1': 0.6144578313253012, 'L2': 0.4857142857142857, 'L3': 0.5930232558139535}

Notice : new in version 0.4
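
To make the role of β concrete, the sketch below evaluates the count-based form of the formula above for class L1 (TP=3, FP=0, FN=2). It is plain Python rather than a call into pycm; the function name `f_beta` and the 'None' return for a zero denominator are assumptions of this example.

```python
def f_beta(tp, fp, fn, beta):
    """Count-based F-beta; returns 'None' if the denominator is zero (assumed)."""
    b2 = beta ** 2
    den = (1 + b2) * tp + fp + b2 * fn
    return (1 + b2) * tp / den if den else "None"


# Class L1 of the example: TP=3, FP=0, FN=2.
for beta in (0.5, 1, 2, 4):
    print(beta, f_beta(3, 0, 2, beta))
```

Because L1 has perfect precision and imperfect recall, the score falls as β grows and recall is weighted more heavily.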

MCC (Matthews correlation coefficient)¶

The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications, introduced by biochemist Brian W. Matthews in 1975. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1. A coefficient of +1 represents a perfect prediction, 0 is no better than random prediction, and −1 indicates total disagreement between prediction and observation [27].

Wikipedia page

$$MCC=\frac{TP \times TN-FP \times FN}{\sqrt{(TP+FP)\times (TP+FN)\times (TN+FP)\times (TN+FN)}}$$

In [53]:

cm.MCC

Out[53]:

{'L1': 0.6831300510639732, 'L2': 0.25819888974716115, 'L3': 0.1690308509457033}
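
As a check on the formula, this plain-Python sketch computes the per-class MCC from the basic counts of the example (one-vs-rest for each class). The function name `mcc` and the 'None' return for a zero denominator are assumptions of this sketch, not pycm API.

```python
import math


def mcc(tp, tn, fp, fn):
    """Per-class Matthews correlation coefficient from the four basic counts."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else "None"


counts = {  # (TP, TN, FP, FN) from the Basic parameters outputs
    "L1": (3, 7, 0, 2),
    "L2": (1, 8, 2, 1),
    "L3": (3, 4, 3, 2),
}
MCC = {c: mcc(*v) for c, v in counts.items()}
print(MCC)
```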

BM (Bookmaker informedness)¶

The informedness of a prediction method as captured by a contingency matrix is defined as the probability that the prediction method will make a correct decision as opposed to guessing and is calculated using the bookmaker algorithm [2].

$$BM=TPR+TNR-1$$

In [54]:

cm.BM

Out[54]:

{'L1': 0.6000000000000001,
 'L2': 0.30000000000000004,
 'L3': 0.17142857142857126}

MK (Markedness)¶

In statistics and psychology, the social science concept of markedness is quantified as a measure of how much one variable is marked as a predictor or possible cause of another, and is also known as Δp (deltaP) in simple two-choice cases [2].

$$MK=PPV+NPV-1$$

In [55]:

cm.MK

Out[55]:

{'L1': 0.7777777777777777, 'L2': 0.2222222222222221, 'L3': 0.16666666666666652}

PLR (Positive likelihood ratio)¶

Likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954 [28].

Wikipedia page

$$LR_+=PLR=\frac{TPR}{FPR}$$

In [56]:

cm.PLR

Out[56]:

{'L1': 'None', 'L2': 2.5000000000000004, 'L3': 1.4}

Notice : `LR+` renamed to `PLR` in version 1.5

NLR (Negative likelihood ratio)¶

Likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954 [28].

Wikipedia page

$$LR_-=NLR=\frac{FNR}{TNR}$$

In [57]:

cm.NLR

Out[57]:

{'L1': 0.4, 'L2': 0.625, 'L3': 0.7000000000000001}

Notice : `LR-` renamed to `NLR` in version 1.5

DOR (Diagnostic odds ratio)¶

The diagnostic odds ratio is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of the odds of the test being positive if the subject has a disease relative to the odds of the test being positive if the subject does not have the disease [28].

Wikipedia page

$$DOR=\frac{LR_+}{LR_-}$$

In [58]:

cm.DOR

Out[58]:

{'L1': 'None', 'L2': 4.000000000000001, 'L3': 1.9999999999999998}
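
The three ratios chain together: PLR and NLR come from the rates, and DOR is their quotient. A plain-Python sketch follows. The `safe_div` helper is hypothetical, and propagating the string 'None' whenever a denominator is zero is an assumption chosen to match the 'None' entries for class L1 above.

```python
def safe_div(num, den):
    """Hypothetical helper: propagate 'None' and guard against division by zero."""
    if num == "None" or den == "None" or den == 0:
        return "None"
    return num / den


def likelihood_ratios(tp, tn, fp, fn):
    tpr = safe_div(tp, tp + fn)
    tnr = safe_div(tn, tn + fp)
    fpr = safe_div(fp, fp + tn)
    fnr = safe_div(fn, fn + tp)
    plr = safe_div(tpr, fpr)  # LR+ = TPR / FPR
    nlr = safe_div(fnr, tnr)  # LR- = FNR / TNR
    dor = safe_div(plr, nlr)  # DOR = LR+ / LR-
    return plr, nlr, dor


print(likelihood_ratios(3, 7, 0, 2))  # class L1: FPR is 0, so PLR and DOR are 'None'
print(likelihood_ratios(1, 8, 2, 1))  # class L2
```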

PRE (Prevalence)¶

Prevalence is a statistical concept referring to the number of cases of a disease that are present in a particular population at a given time (Reference Likelihood) [14].

Wikipedia page

$$Prevalence=\frac{P}{POP}$$

In [59]:

cm.PRE

Out[59]:

{'L1': 0.4166666666666667, 'L2': 0.16666666666666666, 'L3': 0.4166666666666667}

G (G-measure)¶

Geometric mean of precision and sensitivity [3].

Wikipedia page

$$G=\sqrt{PPV\times TPR}$$

In [60]:

cm.G

Out[60]:

{'L1': 0.7745966692414834, 'L2': 0.408248290463863, 'L3': 0.5477225575051661}

RACC (Random accuracy)¶

The expected accuracy from a strategy of randomly guessing categories according to reference and response distributions [24].

$$RACC=\frac{TOP \times P}{POP^2}$$

In [61]:

cm.RACC

Out[61]:

{'L1': 0.10416666666666667,
 'L2': 0.041666666666666664,
 'L3': 0.20833333333333334}

Notice : new in version 0.3

RACCU (Random accuracy unbiased)¶

The expected accuracy from a strategy of randomly guessing categories according to the average of the reference and response distributions [25].

$$RACCU=(\frac{TOP+P}{2 \times POP})^2$$

In [62]:

cm.RACCU

Out[62]:

{'L1': 0.1111111111111111,
 'L2': 0.04340277777777778,
 'L3': 0.21006944444444442}

Notice : new in version 0.8.1
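
Both random-accuracy variants depend only on the marginals TOP, P and POP. A short plain-Python sketch using the example's values (no pycm import; the variable names mirror the attribute names in this document):

```python
P = {"L1": 5, "L2": 2, "L3": 5}
TOP = {"L1": 3, "L2": 3, "L3": 6}
POP = 12

# RACC multiplies the two marginal proportions; RACCU squares their average.
RACC = {c: (TOP[c] * P[c]) / POP ** 2 for c in P}
RACCU = {c: ((TOP[c] + P[c]) / (2 * POP)) ** 2 for c in P}

print(RACC)
print(RACCU)
```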

J (Jaccard index)¶

The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets [29].

Wikipedia page

$$A=Vector_{Actual}$$

$$B=Vector_{Predict}$$

$$J(A,B)=\frac{|A\cap B|}{|A\cup B|}=\frac{|A\cap B|}{|A|+|B|-|A\cap B|}$$

In [63]:

cm.J

Out[63]:

{'L1': 0.6, 'L2': 0.25, 'L3': 0.375}

Notice : new in version 0.9
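
For a single class, one way to read the set formula is to take A as the samples whose actual label is that class and B as the samples predicted as that class, so that |A ∩ B| = TP, |A| = P and |B| = TOP. This reading is an interpretation made for the sketch below, but it reproduces the values shown above:

```python
TP = {"L1": 3, "L2": 1, "L3": 3}
P = {"L1": 5, "L2": 2, "L3": 5}
TOP = {"L1": 3, "L2": 3, "L3": 6}

# |A ∩ B| / (|A| + |B| - |A ∩ B|) with the per-class reading described above.
J = {c: TP[c] / (P[c] + TOP[c] - TP[c]) for c in TP}
print(J)
```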

IS (Information score)¶

The amount of information needed to correctly classify an example into class C, whose prior probability is p(C), is defined as -log2(p(C)) [18].

$$IS=-log_2(\frac{TP+FN}{POP})+log_2(\frac{TP}{TP+FP})$$

In [64]:

cm.IS

Out[64]:

{'L1': 1.2630344058337937, 'L2': 0.9999999999999998, 'L3': 0.26303440583379367}

Notice : new in version 1.3
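
The information score compares the prior probability of a class, (TP+FN)/POP, with the precision achieved for it. The plain-Python sketch below evaluates the formula for the three example classes. The function name `information_score` is hypothetical, and returning 'None' when the logarithm is undefined is an assumption of this sketch.

```python
import math


def information_score(tp, fp, fn, pop):
    """IS = -log2(prior) + log2(precision); 'None' if a log is undefined (assumed)."""
    if tp == 0 or pop == 0:
        return "None"
    return -math.log2((tp + fn) / pop) + math.log2(tp / (tp + fp))


IS = {
    "L1": information_score(3, 0, 2, 12),
    "L2": information_score(1, 2, 1, 12),
    "L3": information_score(3, 3, 2, 12),
}
print(IS)
```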

CEN (Confusion entropy)¶

CEN is based on the concept of entropy and is used for evaluating classifier performance. By exploiting the misclassification information of confusion matrices, the measure evaluates the confusion level of the class distribution of misclassified samples. Both theoretical analysis and statistical results show that the measure is more discriminating than accuracy and RCI while remaining relatively consistent with them. Moreover, it is better at measuring how well the samples of different classes have been separated from each other. Hence the measure is more precise than the two measures and can substitute for them when evaluating classifiers in classification applications [17].

$$P_{i,j}^{j}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)}$$

$$P_{i,j}^{i}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(i,k)+Matrix(k,i)\Big)}$$

$$CEN_j=-\sum_{k=1,k\neq j}^{|C|}\Bigg(P_{j,k}^jlog_{2(|C|-1)}\Big(P_{j,k}^j\Big)+P_{k,j}^jlog_{2(|C|-1)}\Big(P_{k,j}^j\Big)\Bigg)$$

In [65]:

cm.CEN

Out[65]:

{'L1': 0.25, 'L2': 0.49657842846620864, 'L3': 0.6044162769630221}

Notice : new in version 1.3
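
The CEN formulas are easier to follow when evaluated on the example matrix. The sketch below is a direct plain-Python transcription of the three equations above for one class j, using logarithm base 2(|C|-1). Treating a zero count as contributing nothing (the usual 0·log 0 = 0 convention) is an assumption of this sketch, and `cen` is a hypothetical helper, not pycm's implementation.

```python
import math

matrix = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def cen(matrix, j):
    """Confusion entropy of class j, following the formulas above."""
    classes = list(matrix)
    base = 2 * (len(classes) - 1)
    # Shared denominator of P^j: row j plus column j.
    denom = sum(matrix[j][k] + matrix[k][j] for k in classes)
    total = 0.0
    for k in classes:
        if k == j:
            continue
        for count in (matrix[j][k], matrix[k][j]):
            if count:  # zero counts contribute nothing (assumed convention)
                p = count / denom
                total += p * math.log(p, base)
    return -total


CEN = {c: cen(matrix, c) for c in matrix}
print(CEN)
```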

MCEN (Modified confusion entropy)¶

Modified version of CEN [19].

$$P_{i,j}^{j}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)-Matrix(j,j)}$$

$$P_{i,j}^{i}=\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}\Big(Matrix(i,k)+Matrix(k,i)\Big)-Matrix(i,i)}$$

$$MCEN_j=-\sum_{k=1,k\neq j}^{|C|}\Bigg(P_{j,k}^jlog_{2(|C|-1)}\Big(P_{j,k}^j\Big)+P_{k,j}^jlog_{2(|C|-1)}\Big(P_{k,j}^j\Big)\Bigg)$$

In [66]:

cm.MCEN

Out[66]:

{'L1': 0.2643856189774724, 'L2': 0.5, 'L3': 0.6875}

Notice : new in version 1.3

AUC (Area under the ROC curve)¶

Here, the AUC of each class is approximated by the arithmetic mean of its sensitivity and specificity [23].

$$AUC=\frac{TNR+TPR}{2}$$

In [67]:

cm.AUC

Out[67]:

{'L1': 0.8, 'L2': 0.65, 'L3': 0.5857142857142856}

Notice : new in version 1.4
Notice : this is an approximate calculation of AUC

dInd (Distance index)¶

Euclidean distance of a ROC point from the top left corner of the ROC space, which can take values between 0 (perfect classification) and sqrt(2) [23].

$$dInd=\sqrt{(1-TNR)^2+(1-TPR)^2}$$

In [68]:

cm.dInd

Out[68]:

{'L1': 0.4, 'L2': 0.5385164807134504, 'L3': 0.5862367008195198}

Notice : new in version 1.4

sInd (Similarity index)¶

sInd ranges between 0 (no correct classifications) and 1 (perfect classification) [23].

$$sInd = 1 - \sqrt{\frac{(1-TNR)^2+(1-TPR)^2}{2}}$$

In [69]:

cm.sInd

Out[69]:

{'L1': 0.717157287525381, 'L2': 0.6192113447068046, 'L3': 0.5854680534700882}

Notice : new in version 1.4
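
AUC, dInd and sInd are all functions of the same ROC point (TNR, TPR). A plain-Python sketch using the example's rates (no pycm import; the dict names mirror the attribute names in this document):

```python
import math

TPR = {"L1": 0.6, "L2": 0.5, "L3": 0.6}
TNR = {"L1": 1.0, "L2": 0.8, "L3": 4 / 7}

AUC = {c: (TNR[c] + TPR[c]) / 2 for c in TPR}
# Squared distance of the ROC point from the ideal corner (TNR = TPR = 1).
sq_dist = {c: (1 - TNR[c]) ** 2 + (1 - TPR[c]) ** 2 for c in TPR}
dInd = {c: math.sqrt(sq_dist[c]) for c in TPR}
sInd = {c: 1 - math.sqrt(sq_dist[c] / 2) for c in TPR}

print(AUC)
print(dInd)
print(sInd)
```

Dividing by 2 inside the square root rescales the maximum distance sqrt(2) to 1, which is why sInd stays within [0, 1].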

DP (Discriminant power)¶

Discriminant power (DP) is a measure that summarizes sensitivity and specificity. The DP has been used mainly in feature selection over imbalanced data [33].

$$X=\frac{TPR}{1-TPR}$$

$$Y=\frac{TNR}{1-TNR}$$

$$DP=\frac{\sqrt{3}}{\pi}(log_{10}X+log_{10}Y)$$

In [70]:

cm.DP

Out[70]:

{'L1': 'None', 'L2': 0.33193306999649924, 'L3': 0.1659665349982495}

Notice : new in version 1.5
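
The 'None' entry for L1 above arises because TNR is 1, which makes Y undefined. The plain-Python sketch below evaluates the DP formula and returns the string 'None' when a division by zero or a logarithm of zero occurs. This error handling and the function name `discriminant_power` are assumptions of the example.

```python
import math


def discriminant_power(tpr, tnr):
    """DP = sqrt(3)/pi * (log10 X + log10 Y); 'None' if undefined (assumed)."""
    try:
        x = tpr / (1 - tpr)
        y = tnr / (1 - tnr)
        return (math.sqrt(3) / math.pi) * (math.log10(x) + math.log10(y))
    except (ZeroDivisionError, ValueError):
        return "None"


DP = {
    "L1": discriminant_power(0.6, 1.0),
    "L2": discriminant_power(0.5, 0.8),
    "L3": discriminant_power(0.6, 4 / 7),
}
print(DP)
```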

Y (Youden index)¶

Youden’s index evaluates the algorithm’s ability to avoid failure. It is derived from sensitivity and specificity and has a linear correspondence with balanced accuracy. Because Youden’s index is a linear transformation of the mean of sensitivity and specificity, its values are difficult to interpret; a higher value of Y indicates a better ability to avoid failure. Youden’s index has conventionally been used to evaluate diagnostic tests and to improve the efficiency of telemedical prevention [33] [34].

Wikipedia page

$$\gamma=BM=TPR+TNR-1$$

In [71]:

cm.Y

Out[71]:

{'L1': 0.6000000000000001,
 'L2': 0.30000000000000004,
 'L3': 0.17142857142857126}

Notice : new in version 1.5

PLRI (Positive likelihood ratio interpretation)¶

For more information visit [33].

PLR	Model contribution
< 1	Negligible
1 - 5	Poor
5 - 10	Fair
> 10	Good

In [72]:

cm.PLRI

Out[72]:

{'L1': 'None', 'L2': 'Poor', 'L3': 'Poor'}

Notice : new in version 1.5

DPI (Discriminant power interpretation)¶

For more information visit [33].

DP	Model contribution
< 1	Poor
1 - 2	Limited
2 - 3	Fair
> 3	Good

In [73]:

cm.DPI

Out[73]:

{'L1': 'None', 'L2': 'Poor', 'L3': 'Poor'}

Notice : new in version 1.5

AUCI (AUC value interpretation)¶

For more information visit [33].

AUC	Model performance
0.5 - 0.6	Poor
0.6 - 0.7	Fair
0.7 - 0.8	Good
0.8 - 0.9	Very Good
0.9 - 1.0	Excellent

In [74]:

cm.AUCI

Out[74]:

{'L1': 'Very Good', 'L2': 'Fair', 'L3': 'Poor'}

Notice : new in version 1.6

GI (Gini index)¶

A chance-standardized variant of the AUC is given by the Gini coefficient, which takes values between 0 (no difference between the score distributions of the two classes) and 1 (complete separation between the two distributions). The Gini coefficient is a widely used metric in imbalanced data learning [33].

Wikipedia page

$$GI=2\times AUC-1$$

In [75]:

cm.GI

Out[75]:

{'L1': 0.6000000000000001,
 'L2': 0.30000000000000004,
 'L3': 0.17142857142857126}

Notice : new in version 1.7
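
The values above are identical to those of the Youden index. Substituting the AUC approximation used in this document ($AUC=\frac{TNR+TPR}{2}$) into the Gini formula shows why:

$$GI=2\times\frac{TNR+TPR}{2}-1=TPR+TNR-1=Y$$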

LS (Lift score)¶

In the context of classification, lift compares model predictions to randomly generated predictions. Lift is often used in marketing research combined with gain and lift charts as a visual aid [35] [36].

$$LS=\frac{PPV}{PRE}$$

In [76]:

cm.LS

Out[76]:

{'L1': 2.4, 'L2': 2.0, 'L3': 1.2}

Notice : new in version 1.8
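
As a quick check of the formula, lift can be recomputed from the per-class counts of the example matrix. This is an illustrative sketch; the dictionary names are assumptions and the counts are copied from the class statistics of the example.

```python
tp = {"L1": 3, "L2": 1, "L3": 3}    # true positives per class
top = {"L1": 3, "L2": 3, "L3": 6}   # test outcome positive (column sums)
p = {"L1": 5, "L2": 2, "L3": 5}     # condition positive (row sums)
pop = 12                            # population

# LS = PPV / PRE, with PPV = TP / TOP and PRE = P / POP
lift = {c: (tp[c] / top[c]) / (p[c] / pop) for c in tp}
print(lift)
```

A lift above 1 means the class is predicted with higher precision than its prevalence alone would give.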

Overall statistics¶

Kappa¶

Kappa is a statistic which measures inter-rater agreement for qualitative (categorical) items. It is generally thought to be a more robust measure than simple percent agreement calculation, as kappa takes into account the possibility of the agreement occurring by chance [24].

Wikipedia page

$$Kappa=\frac{ACC_{Overall}-RACC_{Overall}}{1-RACC_{Overall}}$$

In [77]:

cm.Kappa

Out[77]:

0.35483870967741943

Notice : new in version 0.3

Kappa unbiased¶

The unbiased kappa value is defined in terms of total accuracy and a slightly different computation of expected likelihood that averages the reference and response probabilities [25].

$$Kappa_{Unbiased}=\frac{ACC_{Overall}-RACCU_{Overall}}{1-RACCU_{Overall}}$$

In [78]:

cm.KappaUnbiased

Out[78]:

0.34426229508196726

Notice : new in version 0.8.1

Kappa no prevalence¶

The kappa statistic adjusted for prevalence [14].

$$Kappa_{NoPrevalence}=2 \times ACC_{Overall}-1$$

In [79]:

cm.KappaNoPrevalence

Out[79]:

0.16666666666666674

Notice : new in version 0.8.1

Kappa 95% CI¶

Kappa 95% Confidence Interval [24].

$$SE_{Kappa}=\sqrt{\frac{ACC_{Overall}\times (1-ACC_{Overall})}{POP\times(1-RACC_{Overall})^2}}$$

$$Kappa \pm 1.96\times SE_{Kappa}$$

In [80]:

cm.Kappa_SE

Out[80]:

0.2203645326012817

In [81]:

cm.Kappa_CI

Out[81]:

(-0.07707577422109269, 0.7867531935759315)

Notice : new in version 0.7
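
The kappa value, its standard error and the interval can be reproduced from the example matrix. The sketch below is not PyCM's code. It assumes `RACC_Overall` is the sum of `TOP_i * P_i / POP**2` over the classes, and it uses a standard error with `ACC * (1 - ACC)` in the numerator and `POP * (1 - RACC)**2` in the denominator, which is the form that matches the outputs shown above.

```python
import math

matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
n = len(matrix)
pop = sum(map(sum, matrix))
p = [sum(row) for row in matrix]            # P_i (row sums)
top = [sum(col) for col in zip(*matrix)]    # TOP_i (column sums)

acc = sum(matrix[i][i] for i in range(n)) / pop
racc = sum(top[i] * p[i] for i in range(n)) / pop ** 2

kappa = (acc - racc) / (1 - racc)
kappa_se = math.sqrt(acc * (1 - acc) / (pop * (1 - racc) ** 2))
kappa_ci = (kappa - 1.96 * kappa_se, kappa + 1.96 * kappa_se)
print(kappa, kappa_se, kappa_ci)
```

The interval contains 0, so with only 12 samples the agreement cannot be distinguished from chance at the 95% level.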

Chi-squared¶

Pearson's chi-squared test is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It is suitable for unpaired data from large samples [10].

Wikipedia page

$$\chi^2=\sum_{i=1}^n\sum_{j=1}^n\frac{\Big(Matrix(i,j)-E(i,j)\Big)^2}{E(i,j)}$$

$$E(i,j)=\frac{TOP_j\times P_i}{POP}$$

In [82]:

cm.Chi_Squared

Out[82]:

6.6000000000000005

Notice : new in version 0.7

Chi-squared DF¶

Number of degrees of freedom of this confusion matrix for the chi-squared statistic [10].

$$DF=(|C|-1)^2$$

In [83]:

cm.DF

Out[83]:

4

Notice : new in version 0.7

Phi-squared¶

In statistics, the phi coefficient (or mean square contingency coefficient) is a measure of association for two binary variables. Introduced by Karl Pearson, this measure is similar to the Pearson correlation coefficient in its interpretation. In fact, a Pearson correlation coefficient estimated for two binary variables will return the phi coefficient [10].

Wikipedia page

$$\phi^2=\frac{\chi^2}{POP}$$

In [84]:

cm.Phi_Squared

Out[84]:

0.55

Notice : new in version 0.7

Cramer's V¶

In statistics, Cramér's V (sometimes referred to as Cramér's phi) is a measure of association between two nominal variables, giving a value between 0 and +1 (inclusive). It is based on Pearson's chi-squared statistic and was published by Harald Cramér in 1946 [26].

Wikipedia page

$$V=\sqrt{\frac{\phi^2}{|C|-1}}$$

In [85]:

cm.V

Out[85]:

0.5244044240850758

Notice : new in version 0.7
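
Chi-squared, its degrees of freedom, phi-squared and Cramér's V build on each other, so they can be checked together. A minimal sketch on the example matrix, with variable names chosen for illustration:

```python
import math

matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
n = len(matrix)
pop = sum(map(sum, matrix))
p = [sum(row) for row in matrix]            # P_i
top = [sum(col) for col in zip(*matrix)]    # TOP_j

chi_squared = 0.0
for i in range(n):
    for j in range(n):
        expected = top[j] * p[i] / pop      # E(i,j) under independence
        chi_squared += (matrix[i][j] - expected) ** 2 / expected

df = (n - 1) ** 2
phi_squared = chi_squared / pop
cramer_v = math.sqrt(phi_squared / (n - 1))
print(chi_squared, df, phi_squared, cramer_v)
```

The sketch assumes every expected count is non-zero; a class with no actual or no predicted samples would need separate handling.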

95% CI¶

In statistics, a confidence interval (CI) is a type of interval estimate (of a population parameter) that is computed from the observed data. The confidence level is the frequency (i.e., the proportion) of possible confidence intervals that contain the true value of their corresponding parameter. In other words, if confidence intervals are constructed using a given confidence level in an infinite number of independent experiments, the proportion of those intervals that contain the true value of the parameter will match the confidence level [31].

Wikipedia page

$$SE_{ACC}=\sqrt{\frac{ACC\times (1-ACC)}{POP}}$$

$$ACC \pm 1.96\times SE_{ACC}$$

In [86]:

cm.CI

Out[86]:

(0.30438856248221097, 0.8622781041844558)

In [87]:

cm.SE

Out[87]:

0.14231876063832777

Notice : new in version 0.7
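
The interval is a normal approximation around the overall accuracy. A short sketch for the example, where 7 of the 12 samples are classified correctly (variable names are illustrative):

```python
import math

correct, pop = 7, 12                 # sum of TP and population of the example
acc = correct / pop
se = math.sqrt(acc * (1 - acc) / pop)
ci = (acc - 1.96 * se, acc + 1.96 * se)
print(se, ci)
```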

Bennett's S¶

Bennett, Alpert & Goldstein’s S is a statistical measure of inter-rater agreement, created by Bennett et al. in 1954. Bennett et al. suggested that adjusting inter-rater reliability for the percentage of rater agreement that might be expected by chance gives a better measure than simple agreement between raters [8].

Wikipedia Page

$$p_c=\frac{1}{|C|}$$

$$S=\frac{ACC_{Overall}-p_c}{1-p_c}$$

In [88]:

cm.S

Out[88]:

0.37500000000000006

Notice : new in version 0.5

Scott's Pi¶

Scott's pi (named after William A. Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures are used to assess the extent of agreement between the annotators, one of which is Scott's pi. Since automatically annotating text is a popular problem in natural language processing, and the goal is to get the computer program that is being developed to agree with the humans in the annotations it creates, assessing the extent to which humans agree with each other is important for establishing a reasonable upper limit on computer performance [7].

Wikipedia page

$$p_c=\sum_{i=1}^{|C|}(\frac{TOP_i + P_i}{2\times POP})^2$$

$$\pi=\frac{ACC_{Overall}-p_c}{1-p_c}$$

In [89]:

cm.PI

Out[89]:

0.34426229508196726

Notice : new in version 0.5

Gwet's AC1¶

AC1 was originally introduced by Gwet in 2001 (Gwet, 2001). The interpretation of AC1 is similar to generalized kappa (Fleiss, 1971), which is used to assess interrater reliability when there are multiple raters. Gwet (2002) demonstrated that AC1 can overcome the limitation that kappa is sensitive to trait prevalence and raters' classification probabilities (i.e., marginal probabilities), and that AC1 provides a more robust measure of interrater reliability [6].

$$\pi_i=\frac{TOP_i + P_i}{2\times POP}$$

$$p_c=\frac{1}{|C|-1}\sum_{i=1}^{|C|}\Big(\pi_i\times (1-\pi_i)\Big)$$

$$AC_1=\frac{ACC_{Overall}-p_c}{1-p_c}$$

In [90]:

cm.AC1

Out[90]:

0.3893129770992367

Notice : new in version 0.5
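
Bennett's S, Scott's Pi and Gwet's AC1 share the same shape, `(ACC - p_c) / (1 - p_c)`, and differ only in the chance term `p_c`. The sketch below makes that explicit for the example matrix; `chance_corrected` is a hypothetical helper, not a PyCM function.

```python
matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
n = len(matrix)
pop = sum(map(sum, matrix))
p = [sum(row) for row in matrix]            # P_i
top = [sum(col) for col in zip(*matrix)]    # TOP_i
acc = sum(matrix[i][i] for i in range(n)) / pop


def chance_corrected(agreement, chance):
    """Agreement rescaled by the agreement expected by chance."""
    return (agreement - chance) / (1 - chance)


pi_i = [(top[i] + p[i]) / (2 * pop) for i in range(n)]

bennett_s = chance_corrected(acc, 1 / n)
scott_pi = chance_corrected(acc, sum(v ** 2 for v in pi_i))
gwet_ac1 = chance_corrected(acc, sum(v * (1 - v) for v in pi_i) / (n - 1))
print(bennett_s, scott_pi, gwet_ac1)
```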

Reference entropy¶

The entropy of the decision problem itself as defined by the counts for the reference. The entropy of a distribution is the average negative log probability of outcomes [30].

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Entropy_{Reference}=-\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\times\log_{2}{Likelihood_{Reference}(i)}$$

$$0\times\log_{2}{0}\equiv0$$

In [91]:

cm.ReferenceEntropy

Out[91]:

1.4833557549816874

Notice : new in version 0.8.1

Response entropy¶

The entropy of the response distribution. The entropy of a distribution is the average negative log probability of outcomes [30].

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$Entropy_{Response}=-\sum_{i=1}^{|C|}Likelihood_{Response}(i)\times\log_{2}{Likelihood_{Response}(i)}$$

$$0\times\log_{2}{0}\equiv0$$

In [92]:

cm.ResponseEntropy

Out[92]:

1.5

Notice : new in version 0.8.1

Cross entropy¶

The cross-entropy of the response distribution against the reference distribution. The cross-entropy is defined by the negative log probabilities of the response distribution weighted by the reference distribution [30].

Wikipedia page

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$Entropy_{Cross}=-\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\times\log_{2}{Likelihood_{Response}(i)}$$

$$0\times\log_{2}{0}\equiv0$$

In [93]:

cm.CrossEntropy

Out[93]:

1.5833333333333335

Notice : new in version 0.8.1

Joint entropy¶

The entropy of the joint reference and response distribution as defined by the underlying matrix [30].

$$P^{'}(i,j)=\frac{Matrix(i,j)}{POP}$$

$$Entropy_{Joint}=-\sum_{i=1}^{|C|}\sum_{j=1}^{|C|}P^{'}(i,j)\times\log_{2}{P^{'}(i,j)}$$

$$0\times\log_{2}{0}\equiv0$$

In [94]:

cm.JointEntropy

Out[94]:

2.4591479170272446

Notice : new in version 0.8.1

Conditional entropy¶

The entropy of the distribution of categories in the response given that the reference category was as specified [30].

Wikipedia page

$$P^{'}(j|i)=\frac{Matrix(i,j)}{P_i}$$

$$Entropy_{Conditional}=\sum_{i=1}^{|C|}\Bigg(Likelihood_{Reference}(i)\times\Big(-\sum_{j=1}^{|C|}P^{'}(j|i)\times\log_{2}{P^{'}(j|i)}\Big)\Bigg)$$

$$0\times\log_{2}{0}\equiv0$$

In [95]:

cm.ConditionalEntropy

Out[95]:

0.9757921620455572

Notice : new in version 0.8.1

Kullback-Leibler divergence¶

In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution [11] [30].

Wikipedia Page

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Divergence=\sum_{i=1}^{|C|}Likelihood_{Reference}(i)\times\log_{2}{\frac{Likelihood_{Reference}(i)}{Likelihood_{Response}(i)}}$$

In [96]:

cm.KL

Out[96]:

0.09997757835164581

Notice : new in version 0.8.1

Mutual information¶

Mutual information is defined as the Kullback-Leibler divergence between the product of the individual distributions and the joint distribution. Mutual information is symmetric. We could also subtract the conditional entropy of the reference given the response from the reference entropy to get the same result [11] [30].

Wikipedia Page

$$P^{'}(i,j)=\frac{Matrix(i,j)}{POP}$$

$$Likelihood_{Reference}=\frac{P_i}{POP}$$

$$Likelihood_{Response}=\frac{TOP_i}{POP}$$

$$MI=\sum_{i=1}^{|C|}\sum_{j=1}^{|C|}P^{'}(i,j)\times\log_{2}\Big(\frac{P^{'}(i,j)}{Likelihood_{Reference}(i)\times Likelihood_{Response}(j)}\Big)$$

$$MI=Entropy_{Response}-Entropy_{Conditional}$$

In [97]:

cm.MutualInformation

Out[97]:

0.5242078379544428

Notice : new in version 0.8.1
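
The entropy measures of the last sections can all be derived from the reference, response and joint distributions. The sketch below recomputes them for the example matrix. It is an illustration rather than PyCM's implementation: `plog` is a hypothetical helper, and the KL divergence and mutual information are written as positive sums (reference over response, and joint over the product of the marginals), which is the form that reproduces the outputs above.

```python
import math

matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
n = len(matrix)
pop = sum(map(sum, matrix))
ref = [sum(row) / pop for row in matrix]            # Likelihood_Reference
resp = [sum(col) / pop for col in zip(*matrix)]     # Likelihood_Response
joint = [[cell / pop for cell in row] for row in matrix]


def plog(weight, value):
    """weight * log2(value), with 0 * log2(0) taken as 0."""
    return weight * math.log2(value) if weight > 0 else 0.0


# Note: a class that is never predicted (resp[i] == 0) is not handled here.
reference_entropy = -sum(plog(r, r) for r in ref)
response_entropy = -sum(plog(r, r) for r in resp)
cross_entropy = -sum(plog(ref[i], resp[i]) for i in range(n))
joint_entropy = -sum(plog(x, x) for row in joint for x in row)
kl_divergence = sum(plog(ref[i], ref[i] / resp[i]) for i in range(n))
mutual_information = sum(
    plog(joint[i][j], joint[i][j] / (ref[i] * resp[j]))
    for i in range(n)
    for j in range(n)
)
# MI = Entropy_Response - Entropy_Conditional, solved for the conditional entropy
conditional_entropy = response_entropy - mutual_information
print(kl_divergence, mutual_information, conditional_entropy)
```

Two identities are easy to confirm with these values: the KL divergence equals the cross entropy minus the reference entropy, and the mutual information equals the reference entropy plus the response entropy minus the joint entropy.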

Goodman & Kruskal's lambda A¶

In probability theory and statistics, Goodman & Kruskal's lambda is a measure of proportional reduction in error in cross tabulation analysis [12].

Wikipedia page

$$\lambda_A=\frac{\sum_{j=1}^{|C|}Max\Big(Matrix(-,j)\Big)-Max(P)}{POP-Max(P)}$$

In [98]:

cm.LambdaA

Out[98]:

0.42857142857142855

Notice : new in version 0.8.1

Goodman & Kruskal's lambda B¶

In probability theory and statistics, Goodman & Kruskal's lambda is a measure of proportional reduction in error in cross tabulation analysis [13].

Wikipedia Page

$$\lambda_B=\frac{\sum_{i=1}^{|C|}Max\Big(Matrix(i,-)\Big)-Max(TOP)}{POP-Max(TOP)}$$

In [99]:

cm.LambdaB

Out[99]:

0.16666666666666666

Notice : new in version 0.8.1
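
In the two lambda formulas, `Matrix(-,j)` denotes column j (all samples predicted as class j) and `Matrix(i,-)` denotes row i (all samples that actually belong to class i). A minimal sketch on the example matrix, with illustrative variable names:

```python
matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
pop = sum(map(sum, matrix))
p = [sum(row) for row in matrix]            # P (row sums)
top = [sum(col) for col in zip(*matrix)]    # TOP (column sums)

# Lambda A: largest entry of each column, compared with the largest row sum
lambda_a = (sum(max(col) for col in zip(*matrix)) - max(p)) / (pop - max(p))
# Lambda B: largest entry of each row, compared with the largest column sum
lambda_b = (sum(max(row) for row in matrix) - max(top)) / (pop - max(top))
print(lambda_a, lambda_b)
```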

SOA1 (Landis & Koch’s benchmark)¶

For more information visit [1].

Kappa	Strength of Agreement
< 0	Poor
0 - 0.20	Slight
0.21 – 0.40	Fair
0.41 – 0.60	Moderate
0.61 – 0.80	Substantial
0.81 – 1.00	Almost perfect

In [100]:

cm.SOA1

Out[100]:

'Fair'

Notice : new in version 0.3

SOA2 (Fleiss’ benchmark)¶

For more information visit [4].

Kappa	Strength of Agreement
< 0.40	Poor
0.4 - 0.75	Intermediate to Good
More than 0.75	Excellent

In [101]:

cm.SOA2

Out[101]:

'Poor'

Notice : new in version 0.4

SOA3 (Altman’s benchmark)¶

For more information visit [5].

Kappa	Strength of Agreement
< 0.2	Poor
0.21 – 0.40	Fair
0.41 – 0.60	Moderate
0.61 – 0.80	Good
0.81 – 1.00	Very Good

In [102]:

cm.SOA3

Out[102]:

'Fair'

Notice : new in version 0.4

SOA4 (Cicchetti’s benchmark)¶

For more information visit [9].

Kappa	Strength of Agreement
< 0.4	Poor
0.4 – 0.59	Fair
0.6 – 0.74	Good
0.75 – 1.00	Excellent

In [103]:

cm.SOA4

Out[103]:

'Poor'

Notice : new in version 0.7
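
The four benchmarks are lookup tables over the same kappa value. The sketch below encodes them as data and applies them to the example. The function `strength_of_agreement`, the constant names and the treatment of boundary values (upper bounds inclusive, values between two printed bands assigned to the next band) are assumptions made for illustration and may differ from PyCM's behaviour.

```python
def strength_of_agreement(kappa, poor_below, bands):
    """Return 'Poor' below the first threshold, else the first band covering kappa."""
    if kappa < poor_below:
        return "Poor"
    for upper, label in bands:
        if kappa <= upper:
            return label
    return None  # kappa above 1 is outside every table


LANDIS_KOCH = (0.0, [(0.20, "Slight"), (0.40, "Fair"), (0.60, "Moderate"),
                     (0.80, "Substantial"), (1.00, "Almost perfect")])
FLEISS = (0.40, [(0.75, "Intermediate to Good"), (1.00, "Excellent")])
ALTMAN = (0.20, [(0.40, "Fair"), (0.60, "Moderate"), (0.80, "Good"),
                 (1.00, "Very Good")])
CICCHETTI = (0.40, [(0.59, "Fair"), (0.74, "Good"), (1.00, "Excellent")])

kappa = 0.35483870967741943  # cm.Kappa of the example
labels = [strength_of_agreement(kappa, *table)
          for table in (LANDIS_KOCH, FLEISS, ALTMAN, CICCHETTI)]
print(labels)
```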

Overall_ACC¶

For more information visit [3].

$$ACC_{Overall}=\frac{\sum_{i=1}^{|C|}TP_i}{POP}$$

In [104]:

cm.Overall_ACC

Out[104]:

0.5833333333333334

Notice : new in version 0.4

Overall_RACC¶

For more information visit [24].

$$RACC_{Overall}=\sum_{i=1}^{|C|}RACC_i$$

In [105]:

cm.Overall_RACC

Out[105]:

0.3541666666666667

Notice : new in version 0.4

Overall_RACCU¶

For more information visit [25].

$$RACCU_{Overall}=\sum_{i=1}^{|C|}RACCU_i$$

In [106]:

cm.Overall_RACCU

Out[106]:

0.3645833333333333

Notice : new in version 0.8.1

PPV_Micro¶

For more information visit [3].

$$PPV_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}TP_i+FP_i}$$

In [107]:

cm.PPV_Micro

Out[107]:

0.5833333333333334

Notice : new in version 0.4

TPR_Micro¶

For more information visit [3].

$$TPR_{Micro}=\frac{\sum_{i=1}^{|C|}TP_i}{\sum_{i=1}^{|C|}TP_i+FN_i}$$

In [108]:

cm.TPR_Micro

Out[108]:

0.5833333333333334

Notice : new in version 0.4

PPV_Macro¶

For more information visit [3].

$$PPV_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FP_i}$$

In [109]:

cm.PPV_Macro

Out[109]:

0.611111111111111

Notice : new in version 0.4

TPR_Macro¶

For more information visit [3].

$$TPR_{Macro}=\frac{1}{|C|}\sum_{i=1}^{|C|}\frac{TP_i}{TP_i+FN_i}$$

In [110]:

cm.TPR_Macro

Out[110]:

0.5666666666666668

Notice : new in version 0.4
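
Micro averaging pools the counts of all classes before dividing, while macro averaging computes the ratio per class and then takes the mean. A short sketch using the per-class counts of the example (list names are illustrative, ordered L1, L2, L3):

```python
tp = [3, 1, 3]   # true positives
fp = [0, 2, 3]   # false positives
fn = [2, 1, 2]   # false negatives
n = len(tp)

ppv_micro = sum(tp) / (sum(tp) + sum(fp))
tpr_micro = sum(tp) / (sum(tp) + sum(fn))
ppv_macro = sum(t / (t + f) for t, f in zip(tp, fp)) / n
tpr_macro = sum(t / (t + f) for t, f in zip(tp, fn)) / n
print(ppv_micro, tpr_micro, ppv_macro, tpr_macro)
```

In a single-label problem every false positive of one class is a false negative of another, so the total FP and FN counts coincide and both micro averages equal the overall accuracy.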

Overall_J¶

For more information visit [29].

$$J_{Mean}=\frac{1}{|C|}\sum_{i=1}^{|C|}J_i$$

$$J_{Sum}=\sum_{i=1}^{|C|}J_i$$

$$J_{Overall}=(J_{Sum},J_{Mean})$$

In [111]:

cm.Overall_J

Out[111]:

(1.225, 0.4083333333333334)

Notice : new in version 0.9

Hamming loss¶

The Hamming loss is the fraction of samples whose predicted label differs from the true label, i.e. the average Hamming distance between the two sets of labels [31].

$$L_{Hamming}=\frac{1}{POP}\sum_{i=1}^{POP}1(y_i \neq \widehat{y}_i)$$

In [112]:

cm.HammingLoss

Out[112]:

0.41666666666666663

Notice : new in version 1.0

Zero-one loss¶

For more information visit [31].

$$L_{0-1}=\sum_{i=1}^{POP}1(y_i \neq \widehat{y}_i)$$

In [113]:

cm.ZeroOneLoss

Out[113]:

5

Notice : new in version 1.1

NIR (No information rate)¶

The no information rate is the accuracy obtained without any information from the input, i.e. by always predicting the most frequent class.

$$NIR=\frac{1}{POP}Max(P)$$

In [114]:

cm.NIR

Out[114]:

0.4166666666666667

Notice : new in version 1.2

P-Value¶

For more information visit [31].

$$x=\sum_{i=1}^{|C|}TP_{i}$$

$$p=NIR$$

$$n=POP$$

$$P-Value_{(ACC > NIR)}=1-\sum_{i=0}^{x-1}\left(\begin{array}{c}n\\ i\end{array}\right)p^{i}(1-p)^{n-i}$$

In [115]:

cm.PValue

Out[115]:

0.18926430237560654

Notice : new in version 1.2
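
The P-Value is a one-sided binomial test of whether the overall accuracy exceeds the NIR. The sketch below is an illustration, not PyCM's code. It assumes the sum runs over 0 to x-1 successes, i.e. it computes the probability of at least x correct predictions, which is the reading that reproduces the output above; `math.comb` requires Python 3.8 or later.

```python
from math import comb

p_counts = [5, 2, 5]   # condition positive per class (P)
x = 7                  # correctly classified samples, the sum of TP
n = sum(p_counts)      # POP

nir = max(p_counts) / n
# P(X >= x) for X ~ Binomial(n, nir)
p_value = 1 - sum(comb(n, i) * nir ** i * (1 - nir) ** (n - i) for i in range(x))
print(nir, p_value)
```

With a P-Value of about 0.19, the accuracy of 0.58 is not significantly better than always predicting the majority class.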

Overall_CEN¶

For more information visit [17].

$$P_j=\frac{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)}{2\sum_{k,l=1}^{|C|}Matrix(k,l)}$$

$$CEN_{Overall}=\sum_{j=1}^{|C|}P_jCEN_j$$

In [116]:

cm.Overall_CEN

Out[116]:

0.4638112995385119

Notice : new in version 1.3

Overall_MCEN¶

For more information visit [19].

$$\alpha=\begin{cases}1 & |C| > 2\\0 & |C| = 2\end{cases}$$

$$P_j=\frac{\sum_{k=1}^{|C|}\Big(Matrix(j,k)+Matrix(k,j)\Big)-Matrix(j,j)}{2\sum_{k,l=1}^{|C|}Matrix(k,l)-\alpha \sum_{k=1}^{|C|}Matrix(k,k)}$$

$$MCEN_{Overall}=\sum_{j=1}^{|C|}P_jMCEN_j$$

In [117]:

cm.Overall_MCEN

Out[117]:

0.5189369467580801

Notice : new in version 1.3

Overall_MCC¶

For more information visit [20] [27].

$$MCC_{Overall}=\frac{cov(X,Y)}{\sqrt{cov(X,X)\times cov(Y,Y)}}$$

$$cov(X,Y)=\sum_{i,j,k=1}^{|C|}\Big(Matrix(i,i)Matrix(k,j)-Matrix(j,i)Matrix(i,k)\Big)$$

$$cov(X,X) = \sum_{i=1}^{|C|}\Bigg[\Big(\sum_{j=1}^{|C|}Matrix(j,i)\Big)\Big(\sum_{k,l=1,k\neq i}^{|C|}Matrix(l,k)\Big)\Bigg]$$

$$cov(Y,Y) = \sum_{i=1}^{|C|}\Bigg[\Big(\sum_{j=1}^{|C|}Matrix(i,j)\Big)\Big(\sum_{k,l=1,k\neq i}^{|C|}Matrix(k,l)\Big)\Bigg]$$

In [118]:

cm.Overall_MCC

Out[118]:

0.36666666666666664

Notice : new in version 1.4
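
For a confusion matrix the triple and double sums above collapse to expressions in the row sums, the column sums and the trace. The sketch below uses those simplified forms, worked out here from the formulas above rather than taken from PyCM's source.

```python
import math

matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
pop = sum(map(sum, matrix))
p = [sum(row) for row in matrix]            # row sums (actual)
top = [sum(col) for col in zip(*matrix)]    # column sums (predicted)
correct = sum(matrix[i][i] for i in range(len(matrix)))

# cov(X,Y): trace * POP minus the sum of TOP_i * P_i
cov_xy = correct * pop - sum(t * a for t, a in zip(top, p))
# cov(X,X): each column sum times the count of all other predictions
cov_xx = sum(t * (pop - t) for t in top)
# cov(Y,Y): each row sum times the count of all other actual samples
cov_yy = sum(a * (pop - a) for a in p)

overall_mcc = cov_xy / math.sqrt(cov_xx * cov_yy)
print(overall_mcc)
```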

RR (Global performance index)¶

For more information visit [21].

$$RR=\frac{1}{|C|}\sum_{i,j=1}^{|C|}Matrix(i,j)$$

In [119]:

cm.RR

Out[119]:

4.0

Notice : new in version 1.4

CBA (Class balance accuracy)¶

For more information visit [22].

$$CBA=\frac{\sum_{i=1}^{|C|}\frac{Matrix(i,i)}{Max(TOP_i,P_i)}}{|C|}$$

In [120]:

cm.CBA

Out[120]:

0.4777777777777778

Notice : new in version 1.4

AUNU¶

When dealing with multiclass problems, a global measure of classification performances based on the ROC approach (AUNU) has been proposed as the average of single-class measures [23].

$$AUNU=\frac{\sum_{i=1}^{|C|}AUC_i}{|C|}$$

In [121]:

cm.AUNU

Out[121]:

0.6785714285714285

Notice : new in version 1.4

AUNP¶

Another option (AUNP) is that of averaging the AUCi values with weights proportional to the number of samples experimentally belonging to each class, that is, the a priori class distribution [23].

$$AUNP=\sum_{i=1}^{|C|}\frac{P_i}{POP}AUC_i$$

In [122]:

cm.AUNP

Out[122]:

0.6857142857142857

Notice : new in version 1.4

RCI (Relative classifier information)¶

Performance of different classifiers on the same domain can be measured by comparing relative classifier information while classifier information (mutual information) can be used for comparison across different decision problems [32] [22].

$$H_d=-\sum_{i=1}^{|C|}\Big(\frac{\sum_{l=1}^{|C|}Matrix(i,l)}{\sum_{h,k=1}^{|C|}Matrix(h,k)}log_2\frac{\sum_{l=1}^{|C|}Matrix(i,l)}{\sum_{h,k=1}^{|C|}Matrix(h,k)}\Big)=Entropy_{Reference}$$

$$H_o=\sum_{j=1}^{|C|}\Big(\frac{\sum_{k=1}^{|C|}Matrix(k,j)}{\sum_{h,l=1}^{|C|}Matrix(h,l)}H_{oj}\Big)=Entropy_{Conditional}$$

$$H_{oj}=-\sum_{i=1}^{|C|}\Big(\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}Matrix(k,j)}log_2\frac{Matrix(i,j)}{\sum_{k=1}^{|C|}Matrix(k,j)}\Big)$$

$$RCI=\frac{H_d-H_o}{H_d}=\frac{MI}{Entropy_{Reference}}$$

In [123]:

cm.RCI

Out[123]:

0.3533932006492363

Notice : new in version 1.5
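
RCI can be read as the share of the reference entropy that the classifier output explains. The sketch below follows the formulas above: `H_d` is the entropy of the row sums, `H_oj` the entropy of column j, and `H_o` the column-weighted average of the `H_oj`. The helper `entropy` is a hypothetical name.

```python
import math

matrix = [[3, 0, 2], [0, 1, 1], [0, 2, 3]]  # rows: actual, columns: predicted
pop = sum(map(sum, matrix))


def entropy(counts):
    """Shannon entropy (base 2) of a list of counts; zero counts contribute 0."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


# Note: a column with no samples at all (total == 0) is not handled here.
h_d = entropy([sum(row) for row in matrix])
h_o = sum(sum(col) / pop * entropy(col) for col in zip(*matrix))
rci = (h_d - h_o) / h_d
print(h_d, h_o, rci)
```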

Print¶

Full¶

In [124]:

print(cm)

Predict          L1    L2    L3    
Actual
L1               3     0     2     

L2               0     1     1     

L3               0     2     3     





Overall Statistics : 

95% CI                                                           (0.30439,0.86228)
AUNP                                                             0.68571
AUNU                                                             0.67857
Bennett S                                                        0.375
CBA                                                              0.47778
Chi-Squared                                                      6.6
Chi-Squared DF                                                   4
Conditional Entropy                                              0.97579
Cramer V                                                         0.5244
Cross Entropy                                                    1.58333
Gwet AC1                                                         0.38931
Hamming Loss                                                     0.41667
Joint Entropy                                                    2.45915
KL Divergence                                                    0.09998
Kappa                                                            0.35484
Kappa 95% CI                                                     (-0.07708,0.78675)
Kappa No Prevalence                                              0.16667
Kappa Standard Error                                             0.22036
Kappa Unbiased                                                   0.34426
Lambda A                                                         0.42857
Lambda B                                                         0.16667
Mutual Information                                               0.52421
NIR                                                              0.41667
Overall ACC                                                      0.58333
Overall CEN                                                      0.46381
Overall J                                                        (1.225,0.40833)
Overall MCC                                                      0.36667
Overall MCEN                                                     0.51894
Overall RACC                                                     0.35417
Overall RACCU                                                    0.36458
P-Value                                                          0.18926
PPV Macro                                                        0.61111
PPV Micro                                                        0.58333
Phi-Squared                                                      0.55
RCI                                                              0.35339
RR                                                               4.0
Reference Entropy                                                1.48336
Response Entropy                                                 1.5
SOA1(Landis & Koch)                                              Fair
SOA2(Fleiss)                                                     Poor
SOA3(Altman)                                                     Fair
SOA4(Cicchetti)                                                  Poor
Scott PI                                                         0.34426
Standard Error                                                   0.14232
TPR Macro                                                        0.56667
TPR Micro                                                        0.58333
Zero-one Loss                                                    5

Class Statistics :

Classes                                                          L1                      L2                      L3                      
ACC(Accuracy)                                                    0.83333                 0.75                    0.58333                 
AUC(Area under the roc curve)                                    0.8                     0.65                    0.58571                 
AUCI(Auc value interpretation)                                   Very Good               Fair                    Poor                    
BM(Informedness or bookmaker informedness)                       0.6                     0.3                     0.17143                 
CEN(Confusion entropy)                                           0.25                    0.49658                 0.60442                 
DOR(Diagnostic odds ratio)                                       None                    4.0                     2.0                     
DP(Discriminant power)                                           None                    0.33193                 0.16597                 
DPI(Discriminant power interpretation)                           None                    Poor                    Poor                    
ERR(Error rate)                                                  0.16667                 0.25                    0.41667                 
F0.5(F0.5 score)                                                 0.88235                 0.35714                 0.51724                 
F1(F1 score - harmonic mean of precision and sensitivity)        0.75                    0.4                     0.54545                 
F2(F2 score)                                                     0.65217                 0.45455                 0.57692                 
FDR(False discovery rate)                                        0.0                     0.66667                 0.5                     
FN(False negative/miss/type 2 error)                             2                       1                       2                       
FNR(Miss rate or false negative rate)                            0.4                     0.5                     0.4                     
FOR(False omission rate)                                         0.22222                 0.11111                 0.33333                 
FP(False positive/type 1 error/false alarm)                      0                       2                       3                       
FPR(Fall-out or false positive rate)                             0.0                     0.2                     0.42857                 
G(G-measure geometric mean of precision and sensitivity)         0.7746                  0.40825                 0.54772                 
GI(Gini index)                                                   0.6                     0.3                     0.17143                 
IS(Information score)                                            1.26303                 1.0                     0.26303                 
J(Jaccard index)                                                 0.6                     0.25                    0.375                   
LS(Lift score)                                                   2.4                     2.0                     1.2                     
MCC(Matthews correlation coefficient)                            0.68313                 0.2582                  0.16903                 
MCEN(Modified confusion entropy)                                 0.26439                 0.5                     0.6875                  
MK(Markedness)                                                   0.77778                 0.22222                 0.16667                 
N(Condition negative)                                            7                       10                      7                       
NLR(Negative likelihood ratio)                                   0.4                     0.625                   0.7                     
NPV(Negative predictive value)                                   0.77778                 0.88889                 0.66667                 
P(Condition positive or support)                                 5                       2                       5                       
PLR(Positive likelihood ratio)                                   None                    2.5                     1.4                     
PLRI(Positive likelihood ratio interpretation)                   None                    Poor                    Poor                    
POP(Population)                                                  12                      12                      12                      
PPV(Precision or positive predictive value)                      1.0                     0.33333                 0.5                     
PRE(Prevalence)                                                  0.41667                 0.16667                 0.41667                 
RACC(Random accuracy)                                            0.10417                 0.04167                 0.20833                 
RACCU(Random accuracy unbiased)                                  0.11111                 0.0434                  0.21007                 
TN(True negative/correct rejection)                              7                       8                       4                       
TNR(Specificity or true negative rate)                           1.0                     0.8                     0.57143                 
TON(Test outcome negative)                                       9                       9                       6                       
TOP(Test outcome positive)                                       3                       3                       6                       
TP(True positive/hit)                                            3                       1                       3                       
TPR(Sensitivity, recall, hit rate, or true positive rate)        0.6                     0.5                     0.6                     
Y(Youden index)                                                  0.6                     0.3                     0.17143                 
dInd(Distance index)                                             0.4                     0.53852                 0.58624                 
sInd(Similarity index)                                           0.71716                 0.61921                 0.58547

Matrix¶

In [125]:

cm.print_matrix()

Predict          L1    L2    L3    
Actual
L1               3     0     2     

L2               0     1     1     

L3               0     2     3

In [126]:

cm.matrix

Out[126]:

{0: {0: 3, 1: 0, 2: 2},
 1: {0: 0, 1: 1, 2: 1},
 2: {0: 0, 1: 2, 2: 3},
 'L1': {'L1': 3, 'L2': 0, 'L3': 2},
 'L2': {'L1': 0, 'L2': 1, 'L3': 1},
 'L3': {'L1': 0, 'L2': 2, 'L3': 3}}

In [127]:

cm.print_matrix(one_vs_all=True,class_name = "L1")

Predict          L1    ~     
Actual
L1               3     2     

~                0     7

Parameters¶

one_vs_all : One-Vs-All mode flag (type : bool)
class_name : target class name for One-Vs-All mode (type : any valid type)

Notice : `one_vs_all` option, new in version 1.4

Notice : `matrix()` was renamed to `print_matrix()`, and `matrix` returns the confusion matrix as a `dict`, in version 1.5

Normalized matrix¶

In [128]:

cm.print_normalized_matrix()

Predict          L1     L2     L3     
Actual
L1               0.6    0.0    0.4    

L2               0.0    0.5    0.5    

L3               0.0    0.4    0.6

In [129]:

cm.normalized_matrix

Out[129]:

{0: {0: 0.6, 1: 0.0, 2: 0.4},
 1: {0: 0.0, 1: 0.5, 2: 0.5},
 2: {0: 0.0, 1: 0.4, 2: 0.6},
 'L1': {'L1': 0.6, 'L2': 0.0, 'L3': 0.4},
 'L2': {'L1': 0.0, 'L2': 0.5, 'L3': 0.5},
 'L3': {'L1': 0.0, 'L2': 0.4, 'L3': 0.6}}

In [130]:

cm.print_normalized_matrix(one_vs_all=True,class_name = "L1")

Predict          L1     ~      
Actual
L1               0.6    0.4    

~                0.0    1.0

Parameters¶

one_vs_all : One-Vs-All mode flag (type : bool)
class_name : target class name for One-Vs-All mode (type : any valid type)

Notice : `one_vs_all` option, new in version 1.4

Notice : `normalized_matrix()` was renamed to `print_normalized_matrix()`, and `normalized_matrix` returns the normalized confusion matrix as a `dict`, in version 1.5
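
The normalized and One-Vs-All views shown above can be derived from the plain matrix dictionary. The following sketch illustrates the idea with two hypothetical helpers, `normalize` and `one_vs_all`; it is not PyCM's implementation and omits PyCM's rounding and integer-key aliases.

```python
matrix = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def normalize(table):
    """Divide every row by its sum so that each actual class sums to 1."""
    # Note: a row that sums to 0 would raise ZeroDivisionError here.
    return {
        actual: {pred: count / sum(row.values()) for pred, count in row.items()}
        for actual, row in table.items()
    }


def one_vs_all(table, class_name):
    """Collapse every class except class_name into a single '~' class."""
    out = {class_name: {class_name: 0, "~": 0}, "~": {class_name: 0, "~": 0}}
    for actual, row in table.items():
        for pred, count in row.items():
            a = actual if actual == class_name else "~"
            b = pred if pred == class_name else "~"
            out[a][b] += count
    return out


l1_vs_all = one_vs_all(matrix, "L1")
print(l1_vs_all)
print(normalize(l1_vs_all))
```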

Stat¶

In [131]:

cm.stat()

Overall Statistics : 

95% CI                                                           (0.30439,0.86228)
AUNP                                                             0.68571
AUNU                                                             0.67857
Bennett S                                                        0.375
CBA                                                              0.47778
Chi-Squared                                                      6.6
Chi-Squared DF                                                   4
Conditional Entropy                                              0.97579
Cramer V                                                         0.5244
Cross Entropy                                                    1.58333
Gwet AC1                                                         0.38931
Hamming Loss                                                     0.41667
Joint Entropy                                                    2.45915
KL Divergence                                                    0.09998
Kappa                                                            0.35484
Kappa 95% CI                                                     (-0.07708,0.78675)
Kappa No Prevalence                                              0.16667
Kappa Standard Error                                             0.22036
Kappa Unbiased                                                   0.34426
Lambda A                                                         0.42857
Lambda B                                                         0.16667
Mutual Information                                               0.52421
NIR                                                              0.41667
Overall ACC                                                      0.58333
Overall CEN                                                      0.46381
Overall J                                                        (1.225,0.40833)
Overall MCC                                                      0.36667
Overall MCEN                                                     0.51894
Overall RACC                                                     0.35417
Overall RACCU                                                    0.36458
P-Value                                                          0.18926
PPV Macro                                                        0.61111
PPV Micro                                                        0.58333
Phi-Squared                                                      0.55
RCI                                                              0.35339
RR                                                               4.0
Reference Entropy                                                1.48336
Response Entropy                                                 1.5
SOA1(Landis & Koch)                                              Fair
SOA2(Fleiss)                                                     Poor
SOA3(Altman)                                                     Fair
SOA4(Cicchetti)                                                  Poor
Scott PI                                                         0.34426
Standard Error                                                   0.14232
TPR Macro                                                        0.56667
TPR Micro                                                        0.58333
Zero-one Loss                                                    5

Class Statistics :

Classes                                                          L1                      L2                      L3                      
ACC(Accuracy)                                                    0.83333                 0.75                    0.58333                 
AUC(Area under the roc curve)                                    0.8                     0.65                    0.58571                 
AUCI(Auc value interpretation)                                   Very Good               Fair                    Poor                    
BM(Informedness or bookmaker informedness)                       0.6                     0.3                     0.17143                 
CEN(Confusion entropy)                                           0.25                    0.49658                 0.60442                 
DOR(Diagnostic odds ratio)                                       None                    4.0                     2.0                     
DP(Discriminant power)                                           None                    0.33193                 0.16597                 
DPI(Discriminant power interpretation)                           None                    Poor                    Poor                    
ERR(Error rate)                                                  0.16667                 0.25                    0.41667                 
F0.5(F0.5 score)                                                 0.88235                 0.35714                 0.51724                 
F1(F1 score - harmonic mean of precision and sensitivity)        0.75                    0.4                     0.54545                 
F2(F2 score)                                                     0.65217                 0.45455                 0.57692                 
FDR(False discovery rate)                                        0.0                     0.66667                 0.5                     
FN(False negative/miss/type 2 error)                             2                       1                       2                       
FNR(Miss rate or false negative rate)                            0.4                     0.5                     0.4                     
FOR(False omission rate)                                         0.22222                 0.11111                 0.33333                 
FP(False positive/type 1 error/false alarm)                      0                       2                       3                       
FPR(Fall-out or false positive rate)                             0.0                     0.2                     0.42857                 
G(G-measure geometric mean of precision and sensitivity)         0.7746                  0.40825                 0.54772                 
GI(Gini index)                                                   0.6                     0.3                     0.17143                 
IS(Information score)                                            1.26303                 1.0                     0.26303                 
J(Jaccard index)                                                 0.6                     0.25                    0.375                   
LS(Lift score)                                                   2.4                     2.0                     1.2                     
MCC(Matthews correlation coefficient)                            0.68313                 0.2582                  0.16903                 
MCEN(Modified confusion entropy)                                 0.26439                 0.5                     0.6875                  
MK(Markedness)                                                   0.77778                 0.22222                 0.16667                 
N(Condition negative)                                            7                       10                      7                       
NLR(Negative likelihood ratio)                                   0.4                     0.625                   0.7                     
NPV(Negative predictive value)                                   0.77778                 0.88889                 0.66667                 
P(Condition positive or support)                                 5                       2                       5                       
PLR(Positive likelihood ratio)                                   None                    2.5                     1.4                     
PLRI(Positive likelihood ratio interpretation)                   None                    Poor                    Poor                    
POP(Population)                                                  12                      12                      12                      
PPV(Precision or positive predictive value)                      1.0                     0.33333                 0.5                     
PRE(Prevalence)                                                  0.41667                 0.16667                 0.41667                 
RACC(Random accuracy)                                            0.10417                 0.04167                 0.20833                 
RACCU(Random accuracy unbiased)                                  0.11111                 0.0434                  0.21007                 
TN(True negative/correct rejection)                              7                       8                       4                       
TNR(Specificity or true negative rate)                           1.0                     0.8                     0.57143                 
TON(Test outcome negative)                                       9                       9                       6                       
TOP(Test outcome positive)                                       3                       3                       6                       
TP(True positive/hit)                                            3                       1                       3                       
TPR(Sensitivity, recall, hit rate, or true positive rate)        0.6                     0.5                     0.6                     
Y(Youden index)                                                  0.6                     0.3                     0.17143                 
dInd(Distance index)                                             0.4                     0.53852                 0.58624                 
sInd(Similarity index)                                           0.71716                 0.61921                 0.58547
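
Several of the overall values above can be checked by hand from the raw counts. The sketch below is plain Python rather than PyCM code. `matrix` is an assumption inferred from the TP, FN and FP values listed per class, and the formulas are the standard definitions of overall accuracy, random accuracy and Cohen's kappa.

```python
# Raw counts (actual -> predicted); assumed, inferred from the class statistics above.
matrix = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}
classes = list(matrix)

# POP: total number of samples.
population = sum(sum(row.values()) for row in matrix.values())

# Overall ACC: correctly classified samples over the population.
correct = sum(matrix[c][c] for c in classes)
overall_acc = correct / population

# Zero-one loss: number of misclassified samples.
zero_one_loss = population - correct

# Overall RACC: sum over classes of (TOP * P) / POP^2.
overall_racc = sum(
    sum(matrix[a][c] for a in classes)  # TOP: samples predicted as c
    * sum(matrix[c].values())           # P: samples that actually are c
    for c in classes
) / population ** 2

# Cohen's kappa: agreement beyond what chance alone would produce.
kappa = (overall_acc - overall_racc) / (1 - overall_racc)

print(round(overall_acc, 5), round(overall_racc, 5), round(kappa, 5), zero_one_loss)
```

The rounded results agree with the `Overall ACC`, `Overall RACC`, `Kappa` and `Zero-one Loss` rows printed by `cm.stat()`.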

In [132]:

cm.stat(overall_param=["Kappa"],class_param=["ACC","AUC","TPR"])

Overall Statistics : 

Kappa                                                            0.35484

Class Statistics :

Classes                                                          L1                      L2                      L3                      
ACC(Accuracy)                                                    0.83333                 0.75                    0.58333                 
AUC(Area under the roc curve)                                    0.8                     0.65                    0.58571                 
TPR(Sensitivity, recall, hit rate, or true positive rate)        0.6                     0.5                     0.6

In [133]:

cm.stat(overall_param=["Kappa"],class_param=["ACC","AUC","TPR"],class_name=["L1","L3"])

Overall Statistics : 

Kappa                                                            0.35484

Class Statistics :

Classes                                                          L1                      L3                      
ACC(Accuracy)                                                    0.83333                 0.58333                 
AUC(Area under the roc curve)                                    0.8                     0.58571                 
TPR(Sensitivity, recall, hit rate, or true positive rate)        0.6                     0.6

Parameters¶

overall_param : overall statistics names for print (type : list)
class_param : class statistics names for print (type : list)
class_name : class names for print (subset of classes) (type : list)

Notice : this method was named `cm.params()` in previous versions (before 0.2)

Notice : `overall_param` & `class_param` , new in version 1.6

Notice : `class_name` , new in version 1.7
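
To make the effect of `class_param` and `class_name` concrete, the sketch below computes a few class statistics in plain Python and filters them the way the calls above do. It is an illustration under assumptions, not PyCM's implementation: `matrix` is inferred from the statistics in this section, and `class_stats` and `filter_stats` are hypothetical helpers.

```python
# Raw counts (actual -> predicted); assumed, inferred from the class statistics above.
matrix = {
    "L1": {"L1": 3, "L2": 0, "L3": 2},
    "L2": {"L1": 0, "L2": 1, "L3": 1},
    "L3": {"L1": 0, "L2": 2, "L3": 3},
}


def class_stats(matrix):
    """Compute ACC, TPR, TNR and AUC per class (hypothetical helper).

    Assumes every class has at least one positive and one negative sample.
    """
    classes = list(matrix)
    pop = sum(sum(row.values()) for row in matrix.values())
    stats = {"ACC": {}, "AUC": {}, "TNR": {}, "TPR": {}}
    for c in classes:
        tp = matrix[c][c]
        fn = sum(matrix[c].values()) - tp
        fp = sum(matrix[a][c] for a in classes) - tp
        tn = pop - tp - fn - fp
        tpr = tp / (tp + fn)
        tnr = tn / (tn + fp)
        stats["ACC"][c] = round((tp + tn) / pop, 5)
        stats["TPR"][c] = round(tpr, 5)
        stats["TNR"][c] = round(tnr, 5)
        stats["AUC"][c] = round((tpr + tnr) / 2, 5)
    return stats


def filter_stats(stats, class_param=None, class_name=None):
    """Keep only the requested statistics and classes (hypothetical helper)."""
    params = class_param if class_param is not None else sorted(stats)
    return {
        p: {c: v for c, v in stats[p].items() if class_name is None or c in class_name}
        for p in params
    }


selected = filter_stats(class_stats(matrix), ["ACC", "AUC", "TPR"], ["L1", "L3"])
for name, values in selected.items():
    print(name, values)
```

The filtered values correspond to the `L1` and `L3` columns printed by the last `cm.stat(...)` call.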

Save¶

.pycm file¶

In [134]:

cm.save_stat("cm1")

Out[134]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1.pycm',
 'Status': True}

In [135]:

cm.save_stat("cm1_filtered",overall_param=["Kappa"],class_param=["ACC","AUC","TPR"])

Out[135]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_filtered.pycm',
 'Status': True}

In [136]:

cm.save_stat("cm1_filtered2",overall_param=["Kappa"],class_param=["ACC","AUC","TPR"],class_name=["L1"])

Out[136]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_filtered2.pycm',
 'Status': True}

In [137]:

cm.save_stat("cm1asdasd/")

Out[137]:

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.pycm'",
 'Status': False}

Parameters¶

name : output file name (type : str)
address : flag for address return (type : bool)
overall_param : overall statistics names for save (type : list)
class_param : class statistics names for save (type : list)
class_name : class names for save (subset of classes) (type : list)

Notice : new in version 0.4

Notice : `overall_param` & `class_param` , new in version 1.6

Notice : `class_name` , new in version 1.7
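
As the outputs above show, a failed save does not raise an exception; the failure is reported through the returned `dict`. A caller that prefers exceptions can wrap the result. The sketch below relies only on the `Status` and `Message` keys visible in those outputs. `require_saved` is a hypothetical helper, not part of PyCM, and the example dicts are hand-written stand-ins for real return values.

```python
def require_saved(result):
    """Return the saved file path, or raise OSError if the save failed.

    `result` is a dict with the `Status` and `Message` keys shown above.
    """
    if not result["Status"]:
        raise OSError(result["Message"])
    return result["Message"]


# Stand-ins shaped like the outputs above (path shortened for the example).
ok = {"Message": "cm1.pycm", "Status": True}
failed = {
    "Message": "[Errno 2] No such file or directory: 'cm1asdasd/.pycm'",
    "Status": False,
}

print(require_saved(ok))
try:
    require_saved(failed)
except OSError as error:
    print("save failed:", error)
```

In real use the argument would be the return value of a save call such as `cm.save_stat("cm1")`.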

HTML¶

In [138]:

cm.save_html("cm1")

Out[138]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1.html',
 'Status': True}

In [139]:

cm.save_html("cm1_filtered",overall_param=["Kappa"],class_param=["ACC","AUC","TPR"])

Out[139]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_filtered.html',
 'Status': True}

In [140]:

cm.save_html("cm1_filtered2",overall_param=["Kappa"],class_param=["ACC","AUC","TPR"],class_name=["L1"])

Out[140]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_filtered2.html',
 'Status': True}

In [141]:

cm.save_html("cm1_colored",color=(255, 204, 255))

Out[141]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_colored.html',
 'Status': True}

In [142]:

cm.save_html("cm1asdasd/")

Out[142]:

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.html'",
 'Status': False}

Parameters¶

name : output file name (type : str)
address : flag for address return (type : bool)
overall_param : overall statistics names for save (type : list)
class_param : class statistics names for save (type : list)
class_name : class names for save (subset of classes) (type : list)
color : matrix color (R,G,B) (type : tuple)

Notice : new in version 0.5

Notice : `overall_param` & `class_param` , new in version 1.6

Notice : `class_name` , new in version 1.7

Notice : `color`, new in version 1.8

CSV¶

In [143]:

cm.save_csv("cm1")

Out[143]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1.csv',
 'Status': True}

In [144]:

cm.save_csv("cm1_filtered",class_param=["ACC","AUC","TPR"])

Out[144]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_filtered.csv',
 'Status': True}

In [145]:

cm.save_csv("cm1_filtered2",class_param=["ACC","AUC","TPR"],class_name=["L1"])

Out[145]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1_filtered2.csv',
 'Status': True}

In [146]:

cm.save_csv("cm1asdasd/")

Out[146]:

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.csv'",
 'Status': False}

Parameters¶

name : output file name (type : str)
address : flag for address return (type : bool)
class_param : class statistics names for save (type : list)
class_name : class names for save (subset of classes) (type : list)

Notice : new in version 0.6

Notice : `class_param` , new in version 1.6

Notice : `class_name` , new in version 1.7

OBJ¶

In [147]:

cm.save_obj("cm1")

Out[147]:

{'Message': 'D:\\For Asus Laptop\\projects\\pycm\\Document\\cm1.obj',
 'Status': True}

In [148]:

cm.save_obj("cm1asdasd/")

Out[148]:

{'Message': "[Errno 2] No such file or directory: 'cm1asdasd/.obj'",
 'Status': False}

Parameters¶

name : output file name (type : str)
address : flag for address return (type : bool)

Notice : new in version 0.9.5

Input errors¶

In [149]:

try:
    cm2=ConfusionMatrix(y_actu, 2)
except pycmVectorError as e:
    print(str(e))

The type of input vectors is assumed to be a list or a NumPy array

In [150]:

try:
    cm3=ConfusionMatrix(y_actu, [1,2,3])
except pycmVectorError as e:
    print(str(e))

Input vectors must have same length

In [151]:

try:
    cm_4 = ConfusionMatrix([], [])
except pycmVectorError as e:
    print(str(e))

Input vectors are empty

In [152]:

try:
    cm_5 = ConfusionMatrix([1,1,1,], [1,1,1,1])
except pycmVectorError as e:
    print(str(e))

Input vectors must have same length

In [153]:

try:
    cm3=ConfusionMatrix(matrix={})
except pycmMatrixError as e:
    print(str(e))

Input confusion matrix format error

In [154]:

try:
    cm_4=ConfusionMatrix(matrix={1:{1:2,"1":2},"1":{1:2,"1":3}})
except pycmMatrixError as e:
    print(str(e))

Type of the input matrix classes is assumed  be the same

In [155]:

try:
    cm_5=ConfusionMatrix(matrix={1:{1:2}})
except pycmVectorError as e:
    print(str(e))

Number of the classes is lower than 2

Notice : updated in version 0.8
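
The vector checks demonstrated above can be mirrored in a small pre-validation step, for example to reject bad input before any PyCM object is built. This is a sketch in plain Python under assumptions, not PyCM's code: `validate_vectors` and `first_error` are hypothetical helpers, `ValueError` stands in for `pycmVectorError`, only `list` inputs are accepted to keep the example dependency-free, and applying the class-count rule to vectors is assumed from the matrix example.

```python
def validate_vectors(actual, predict):
    """Raise ValueError with a message mirroring the outputs above."""
    for vector in (actual, predict):
        # PyCM also accepts NumPy arrays; they are left out of this sketch.
        if not isinstance(vector, list):
            raise ValueError(
                "The type of input vectors is assumed to be a list or a NumPy array"
            )
    if len(actual) != len(predict):
        raise ValueError("Input vectors must have same length")
    if len(actual) == 0:
        raise ValueError("Input vectors are empty")
    # Assumption: fewer than two distinct labels is rejected for vectors too.
    if len(set(actual) | set(predict)) < 2:
        raise ValueError("Number of the classes is lower than 2")


def first_error(actual, predict):
    """Return the validation message, or None when the input is acceptable."""
    try:
        validate_vectors(actual, predict)
    except ValueError as error:
        return str(error)
    return None


print(first_error([2, 0, 2], 2))
print(first_error([2, 0, 2, 2], [1, 2, 3]))
print(first_error([], []))
print(first_error([1, 1, 1], [1, 1, 1]))
print(first_error([0, 1, 1], [0, 1, 0]))
```

The last call prints `None`, since both vectors are non-empty lists of equal length covering two classes.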

Examples¶

Example-1 (Comparison of three different classifiers)¶

Example-2 (How to plot via matplotlib)¶

Example-3 (Activation threshold)¶

Example-4 (File)¶

Example-5 (Sample weights)¶

Example-6 (Unbalanced data)¶

Example-7 (How to plot via seaborn+pandas)¶

References¶

1- J. R. Landis, G. G. Koch, “The measurement of observer agreement for categorical data,” in Biometrics (International Biometric Society), pp. 159–174, 1977.

2- D. M. W. Powers, “Evaluation: from precision, recall and f-measure to roc, informedness, markedness & correlation,” in Journal of Machine Learning Technologies, pp. 37-63, 2011.

3- C. Sammut, G. Webb, “Encyclopedia of Machine Learning” in Springer, 2011.

4- J. L. Fleiss, “Measuring nominal scale agreement among many raters,” in Psychological Bulletin, pp. 378-382.

5- D.G. Altman, “Practical Statistics for Medical Research,” in Chapman and Hall, 1990.

6- K. L. Gwet, “Computing inter-rater reliability and its variance in the presence of high agreement,” in The British Journal of Mathematical and Statistical Psychology, pp. 29–48, 2008.

7- W. A. Scott, “Reliability of content analysis: The case of nominal scaling,” in Public Opinion Quarterly, pp. 321–325, 1955.

8- E. M. Bennett, R. Alpert, and A. C. Goldstein, “Communication through limited response questioning,” in The Public Opinion Quarterly, pp. 303–308, 1954.

9- D. V. Cicchetti, "Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology," in Psychological Assessment, pp. 284–290, 1994.

10- R.B. Davies, "Algorithm AS155: The Distributions of a Linear Combination of χ2 Random Variables," in Journal of the Royal Statistical Society, pp. 323–333, 1980.

11- S. Kullback, R. A. Leibler, "On information and sufficiency," in Annals of Mathematical Statistics, pp. 79–86, 1951.

12- L. A. Goodman, W. H. Kruskal, "Measures of Association for Cross Classifications, IV: Simplification of Asymptotic Variances," in Journal of the American Statistical Association, pp. 415–421, 1972.

13- L. A. Goodman, W. H. Kruskal, "Measures of Association for Cross Classifications III: Approximate Sampling Theory," in Journal of the American Statistical Association, pp. 310–364, 1963.

14- T. Byrt, J. Bishop and J. B. Carlin, “Bias, prevalence, and kappa,” in Journal of Clinical Epidemiology pp. 423-429, 1993.

15- M. Shepperd, D. Bowes, and T. Hall, “Researcher Bias: The Use of Machine Learning in Software Defect Prediction,” in IEEE Transactions on Software Engineering, pp. 603-616, 2014.

16- X. Deng, Q. Liu, Y. Deng, and S. Mahadevan, “An improved method to construct basic probability assignment based on the confusion matrix for classification problem,” in Information Sciences, pp. 250-261, 2016.

17- Wei, J.-M., Yuan, X.-Y., Hu, Q.-H., Wang, S.-Q.: A novel measure for evaluating classifiers. Expert Systems with Applications, Vol 37, 3799–3809 (2010).

18- Kononenko I. and Bratko I. Information-based evaluation criterion for classifier’s performance. Machine Learning, 6:67–80, 1991.

19- Delgado R., Núñez-González J.D. (2019) Enhancing Confusion Entropy as Measure for Evaluating Classifiers. In: Graña M. et al. (eds) International Joint Conference SOCO’18-CISIS’18-ICEUTE’18. SOCO’18-CISIS’18-ICEUTE’18 2018. Advances in Intelligent Systems and Computing, vol 771. Springer, Cham

20- Gorodkin J (2004) Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry 28: 367–374

21- Freitas C.O.A., de Carvalho J.M., Oliveira J., Aires S.B.K., Sabourin R. (2007) Confusion Matrix Disagreement for Multiple Classifiers. In: Rueda L., Mery D., Kittler J. (eds) Progress in Pattern Recognition, Image Analysis and Applications. CIARP 2007. Lecture Notes in Computer Science, vol 4756. Springer, Berlin, Heidelberg

22- Branco P., Torgo L., Ribeiro R.P. (2017) Relevance-Based Evaluation Metrics for Multi-class Imbalanced Domains. In: Kim J., Shim K., Cao L., Lee JG., Lin X., Moon YS. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2017. Lecture Notes in Computer Science, vol 10234. Springer, Cham

23- Ballabio, D., Grisoni, F. and Todeschini, R. (2018). Multivariate comparison of classification performance measures. Chemometrics and Intelligent Laboratory Systems, 174, pp.33-44.

24- Cohen, Jacob. 1960. A coefficient of agreement for nominal scales. Educational And Psychological Measurement 20:37-46

25- Siegel, Sidney and N. John Castellan, Jr. 1988. Nonparametric Statistics for the Behavioral Sciences. McGraw Hill.

26- Cramér, Harald. 1946. Mathematical Methods of Statistics. Princeton: Princeton University Press, page 282 (Chapter 21. The two-dimensional case)

27- Matthews, B. W. (1975). "Comparison of the predicted and observed secondary structure of T4 phage lysozyme". Biochimica et Biophysica Acta (BBA) - Protein Structure. 405 (2): 442–451.

28- Swets JA. (1973). "The relative operating characteristic in Psychology". Science. 182 (4116): 990–1000.

29- Jaccard, Paul (1901), "Étude comparative de la distribution florale dans une portion des Alpes et des Jura", Bulletin de la Société Vaudoise des Sciences Naturelles, 37: 547–579.

30- Thomas M. Cover and Joy A. Thomas. 2006. Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing). Wiley-Interscience, New York, NY, USA.

31- Keeping, E.S. (1962) Introduction to Statistical Inference. D. Van Nostrand, Princeton, NJ.

32- Sindhwani V, Bhattacharge P, Rakshit S (2001) Information theoretic feature crediting in multiclass Support Vector Machines. In: Grossman R, Kumar V, editors, Proceedings First SIAM International Conference on Data Mining, ICDM01. SIAM, pp. 1–18.

33- Bekkar, Mohamed & Djema, Hassiba & Alitouche, T.A.. (2013). Evaluation measures for models assessment over imbalanced data sets. Journal of Information Engineering and Applications. 3. 27-38.

34- Youden W. (1950), "Index for rating diagnostic tests", Cancer, 3: 32–35.

35- S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), pages 265-276, 1997.

36- Raschka, Sebastian (2018) MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack. J Open Source Softw 3(24).