PyCM's distance method provides users with a wide range of string distance/similarity metrics to evaluate a confusion matrix by measuring its distance to a perfect confusion matrix. Distance/similarity metrics measure the distance between two vectors of numbers; a small distance between two objects indicates that they are similar. In PyCM's distance method, a measure is selected by passing a member of DistanceType as the metric argument, and the result is a dictionary that maps each class to its value for that measure. The measures' names follow the naming style suggested in [1].
from pycm import ConfusionMatrix, DistanceType
cm = ConfusionMatrix(matrix={0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}})
AMPLE similarity [2, 3].
cm.distance(metric=DistanceType.AMPLE)
{0: 0.6, 1: 0.3, 2: 0.17142857142857143}
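The per-class values can be reproduced by hand, which helps show what the method is measuring. The sketch below assumes the matrix is keyed as {actual: {predicted: count}} and that each class is scored one-vs-rest from its TP, FP, FN, and TN counts. The helper names one_vs_rest and ample are hypothetical and not part of PyCM, and the AMPLE formula used here is an assumption chosen because it reproduces the output shown above.

```python
# Confusion matrix from the example, keyed as {actual: {predicted: count}}.
matrix = {0: {0: 3, 1: 0, 2: 0}, 1: {0: 0, 1: 1, 2: 2}, 2: {0: 2, 1: 1, 2: 3}}


def one_vs_rest(matrix, cls):
    """Return (TP, FP, FN, TN) for one class (hypothetical helper, not PyCM API)."""
    total = sum(sum(row.values()) for row in matrix.values())
    tp = matrix[cls][cls]
    fn = sum(matrix[cls].values()) - tp               # rest of the actual-class row
    fp = sum(row[cls] for row in matrix.values()) - tp  # rest of the predicted-class column
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def ample(tp, fp, fn, tn):
    """|TP/(TP+FP) - FN/(FN+TN)|; assumes both denominators are non-zero."""
    return abs(tp / (tp + fp) - fn / (fn + tn))


ample_by_class = {cls: ample(*one_vs_rest(matrix, cls)) for cls in matrix}
print(ample_by_class)
```

For class 1, for example, the counts are TP=1, FP=1, FN=2, TN=8, giving |1/2 - 2/10| = 0.3, which matches the value returned for class 1.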
Anderberg's D [4].
cm.distance(metric=DistanceType.Anderberg)
{0: 0.16666666666666666, 1: 0.0, 2: 0.041666666666666664}
Andres & Marzo's Delta correlation [5].
cm.distance(metric=DistanceType.AndresMarzoDelta)
{0: 0.8333333333333334, 1: 0.5142977396044842, 2: 0.17508504286947035}
Baroni-Urbani & Buser I similarity [6].
cm.distance(metric=DistanceType.BaroniUrbaniBuserI)
{0: 0.79128784747792, 1: 0.5606601717798213, 2: 0.5638559245324765}
Baroni-Urbani & Buser II correlation [6].
cm.distance(metric=DistanceType.BaroniUrbaniBuserII)
{0: 0.58257569495584, 1: 0.12132034355964261, 2: 0.1277118490649528}
Batagelj & Bren distance [7].
cm.distance(metric=DistanceType.BatageljBren)
{0: 0.0, 1: 0.25, 2: 0.5}
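As a rough check on these numbers, the sketch below recomputes them from one-vs-rest counts worked out by hand from the example matrix. The formula (FP * FN) / (TP * TN) is an assumption that happens to reproduce the output above, and batagelj_bren is a hypothetical helper rather than a PyCM function.

```python
# One-vs-rest counts (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def batagelj_bren(tp, fp, fn, tn):
    """(FP * FN) / (TP * TN); undefined when TP * TN is zero."""
    return (fp * fn) / (tp * tn)


result = {cls: batagelj_bren(*c) for cls, c in counts.items()}
print(result)  # {0: 0.0, 1: 0.25, 2: 0.5}
```

Class 0 has no false negatives, so the numerator vanishes and the distance is 0.0 even though two samples were wrongly predicted as class 0.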
Baulieu I distance [8].
cm.distance(metric=DistanceType.BaulieuI)
{0: 0.4, 1: 0.8333333333333334, 2: 0.7}
Baulieu II similarity [8].
cm.distance(metric=DistanceType.BaulieuII)
{0: 0.4666666666666667, 1: 0.11851851851851852, 2: 0.11428571428571428}
Baulieu III distance [8].
cm.distance(metric=DistanceType.BaulieuIII)
{0: 0.20833333333333334, 1: 0.4166666666666667, 2: 0.4166666666666667}
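A minimal sketch of how these values could arise, assuming the formula (n^2 - 4(TP*TN - FP*FN)) / (2n^2) with n the total number of samples. This formula is an assumption that reproduces the output above, the counts are worked out by hand from the example matrix, and baulieu_iii is a hypothetical helper.

```python
# One-vs-rest counts (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def baulieu_iii(tp, fp, fn, tn):
    """(n^2 - 4 * (TP*TN - FP*FN)) / (2 * n^2), where n is the sample count."""
    n = tp + fp + fn + tn
    return (n * n - 4 * (tp * tn - fp * fn)) / (2 * n * n)


result = {cls: baulieu_iii(*c) for cls, c in counts.items()}
print(result)
```

Under this formula, a class whose TP*TN equals FP*FN scores exactly 0.5, so classes 1 and 2 (both with TP*TN - FP*FN = 6) end up with the same value despite having different counts.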
Baulieu IV distance [9].
cm.distance(metric=DistanceType.BaulieuIV)
{0: -41.45702383161246, 1: -22.855395541901885, 2: -13.85431293274332}
Baulieu V distance [9].
cm.distance(metric=DistanceType.BaulieuV)
{0: 0.5, 1: 0.8, 2: 0.6666666666666666}
Baulieu VI distance [9].
cm.distance(metric=DistanceType.BaulieuVI)
{0: 0.3333333333333333, 1: 0.6, 2: 0.5555555555555556}
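The Baulieu V and VI outputs above are consistent with two closely related formulas that ignore true negatives: (FP + FN + 1) / (TP + FP + FN + 1) for V and (FP + FN) / (TP + FP + FN + 1) for VI. Both formulas are assumptions that reproduce the values shown, and the helper names are hypothetical. The counts are worked out by hand from the example matrix.

```python
# One-vs-rest counts (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def baulieu_v(tp, fp, fn, tn):
    """(FP + FN + 1) / (TP + FP + FN + 1); TN is accepted but unused."""
    return (fp + fn + 1) / (tp + fp + fn + 1)


def baulieu_vi(tp, fp, fn, tn):
    """(FP + FN) / (TP + FP + FN + 1); TN is accepted but unused."""
    return (fp + fn) / (tp + fp + fn + 1)


v = {cls: baulieu_v(*c) for cls, c in counts.items()}
vi = {cls: baulieu_vi(*c) for cls, c in counts.items()}
print(v)
print(vi)
```

One consequence of the extra 1 in the numerator is that, under this formula, Baulieu V stays above zero even for a class with no errors, whereas Baulieu VI reaches zero in that case.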
Baulieu VII distance [9].
cm.distance(metric=DistanceType.BaulieuVII)
{0: 0.13333333333333333, 1: 0.14285714285714285, 2: 0.3333333333333333}
Baulieu VIII distance [9].
cm.distance(metric=DistanceType.BaulieuVIII)
{0: 0.027777777777777776, 1: 0.006944444444444444, 2: 0.006944444444444444}
Baulieu IX distance [9].
cm.distance(metric=DistanceType.BaulieuIX)
{0: 0.16666666666666666, 1: 0.35714285714285715, 2: 0.5333333333333333}
Baulieu X distance [9].
cm.distance(metric=DistanceType.BaulieuX)
{0: 0.2857142857142857, 1: 0.35714285714285715, 2: 0.5333333333333333}
Baulieu XI distance [9].
cm.distance(metric=DistanceType.BaulieuXI)
{0: 0.2222222222222222, 1: 0.2727272727272727, 2: 0.5555555555555556}
Baulieu XII distance [9].
cm.distance(metric=DistanceType.BaulieuXII)
{0: 0.5, 1: 1.0, 2: 0.7142857142857143}
Baulieu XIII distance [9].
cm.distance(metric=DistanceType.BaulieuXIII)
{0: 0.25, 1: 0.23076923076923078, 2: 0.45454545454545453}
Baulieu XIV distance [9].
cm.distance(metric=DistanceType.BaulieuXIV)
{0: 0.4, 1: 0.8333333333333334, 2: 0.7272727272727273}
Baulieu XV distance [9].
cm.distance(metric=DistanceType.BaulieuXV)
{0: 0.5714285714285714, 1: 0.8333333333333334, 2: 0.7272727272727273}
Benini I correlation [10].
cm.distance(metric=DistanceType.BeniniI)
{0: 1.0, 1: 0.2, 2: 0.14285714285714285}
Benini II correlation [10].
cm.distance(metric=DistanceType.BeniniII)
{0: 1.0, 1: 0.3333333333333333, 2: 0.2}
Canberra distance [11, 12].
cm.distance(metric=DistanceType.Canberra)
{0: 0.25, 1: 0.6, 2: 0.45454545454545453}
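For binary one-vs-rest counts, the Canberra values above can be reproduced with (FP + FN) / ((TP + FP) + (TP + FN)), that is, the number of disagreements divided by the combined size of the predicted and actual sets for the class. This formula is an assumption that matches the output shown, canberra is a hypothetical helper, and the counts are worked out by hand from the example matrix.

```python
# One-vs-rest counts (TP, FP, FN, TN) per class, derived by hand from the example matrix.
counts = {0: (3, 2, 0, 7), 1: (1, 1, 2, 8), 2: (3, 2, 3, 4)}


def canberra(tp, fp, fn, tn):
    """(FP + FN) / ((TP + FP) + (TP + FN)); TN is accepted but unused."""
    return (fp + fn) / ((tp + fp) + (tp + fn))


result = {cls: canberra(*c) for cls, c in counts.items()}
print(result)
```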
Clement similarity [13].
cm.distance(metric=DistanceType.Clement)
{0: 0.7666666666666666, 1: 0.55, 2: 0.588095238095238}
Consonni & Todeschini I similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniI)
{0: 0.9348704159880586, 1: 0.8977117175026231, 2: 0.8107144632819592}
Consonni & Todeschini II similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniII)
{0: 0.5716826589686053, 1: 0.4595236911453605, 2: 0.3014445045412856}
Consonni & Todeschini III similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniIII)
{0: 0.5404763088546395, 1: 0.27023815442731974, 2: 0.5404763088546395}
Consonni & Todeschini IV similarity [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniIV)
{0: 0.7737056144690831, 1: 0.43067655807339306, 2: 0.6309297535714574}
Consonni & Todeschini V correlation [14].
cm.distance(metric=DistanceType.ConsonniTodeschiniV)
{0: 0.8560267854703983, 1: 0.30424737289682985, 2: 0.17143541431350617}
1- C. C. Little, "Abydos Documentation," 2018.
2- V. Dallmeier, C. Lindig, and A. Zeller, "Lightweight defect localization for Java," in European conference on object-oriented programming, 2005: Springer, pp. 528-550.
3- R. Abreu, P. Zoeteweij, and A. J. Van Gemund, "An evaluation of similarity coefficients for software fault localization," in 2006 12th Pacific Rim International Symposium on Dependable Computing (PRDC'06), 2006: IEEE, pp. 39-46.
4- M. R. Anderberg, Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks. Academic press, 2014.
5- A. M. Andrés and P. F. Marzo, "Delta: A new measure of agreement between two raters," British journal of mathematical and statistical psychology, vol. 57, no. 1, pp. 1-19, 2004.
6- C. Baroni-Urbani and M. W. Buser, "Similarity of binary data," Systematic Zoology, vol. 25, no. 3, pp. 251-259, 1976.
7- V. Batagelj and M. Bren, "Comparing resemblance measures," Journal of classification, vol. 12, no. 1, pp. 73-90, 1995.
8- F. B. Baulieu, "A classification of presence/absence based dissimilarity coefficients," Journal of Classification, vol. 6, no. 1, pp. 233-246, 1989.
9- F. B. Baulieu, "Two variant axiom systems for presence/absence based dissimilarity coefficients," Journal of Classification, vol. 14, no. 1, pp. 159-170, 1997.
10- R. Benini, Principii di demografia. Barbera, 1901.
11- G. N. Lance and W. T. Williams, "Computer programs for hierarchical polythetic classification (“similarity analyses”)," The Computer Journal, vol. 9, no. 1, pp. 60-64, 1966.
12- G. N. Lance and W. T. Williams, "Mixed-Data Classificatory Programs I - Agglomerative Systems," Australian Computer Journal, vol. 1, no. 1, pp. 15-20, 1967.
13- P. W. Clement, "A formula for computing inter-observer agreement," Psychological Reports, vol. 39, no. 1, pp. 257-258, 1976.
14- V. Consonni and R. Todeschini, "New similarity coefficients for binary data," Match-Communications in Mathematical and Computer Chemistry, vol. 68, no. 2, p. 581, 2012.