Using PCA to visualize the MtG universe

In this notebook, we're going to scrape Magic the Gathering's Gatherer card database and then perform principal components analysis to visualize hidden relationships between cards. Our goal will be to see how much card-to-card variation can be simplified and then plotted in two-dimensions and again what those card groupings look like.

This data set is very high-dimensional -- there are over a 100 unique mechanics in the game and the game state has many different elements (hand, battlefield, mana pool, etc.). Being able to translate the 13,000 unique card texts into structured data is also a challenge NLP-related task.

Warning: This notebook is long...so, for the impatient:

Here is what we will be working towards, a programmatic mapping of every Magic card ever made across two psuedo-axes:

We will show that while Magic cards can differ in thousands of ways, they can be roughly categorized based on two simple measures: how "creature-y" are they? and how much do they related to the board or non-board state?

Implementation details

Pretty baller, right? We will interpret and grok this graph later, but for now, let's do this...

LEERRRROOYYY JENNNKIINNNNNSSS.

Outline

Here's a breakdown of the four steps that we'll go through to accomplish this task:

  1. Scrape + clean the data using requests, web from pattern, and pandas
  • Extract features from the data using fuzzywuzzy and domain knowledge
  • Perform and analyze PCA using sklearn
  • Visualize + interpret results using the plotly graphing library

First some boring imports and settings (feel free to skip over)

In [172]:
# boring imports

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pylab as plt

import requests
from pattern import web
requests.packages.urllib3.disable_warnings()

import re, string
from sets import Set
from collections import Counter
from fuzzywuzzy import fuzz

database = {}
pd.set_option('display.max_rows', 10)

# Silly helper functions

def isInt(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

def anyIntOrColor(l):
    for val in l:
        if isInt(val) | (val in ['Black', 'Red', 'Green', 'Blue', 'White']) : return True
    return False

(1) -- Scrape baby, scrape

Our first order of business is scraping the data from the Gatherer database using requests and web from pattern. In it's simplest form, every Magic card has a name, text, type, mana cost, and power/toughness (if it's a creature). An example is Hypnotic Specter, a powerful creature in the early days of Magic:

To scrape the relevant card features, we will construct card URLs using the card's multiverse_id and on the page we load will look for unique HTML elements that correspond to each of the features we will to obtain.

In [173]:
# grabCard scrapes:

# name, types, text (lowered, alphanumeritized), mana cost,
# cmc, power and toughness, and rarity.

# and adds it to the global card database

def grabCard(multiverse_id):
    xml = "http://gatherer.wizards.com/Pages/Card/Details.aspx?multiverseid=" + str(multiverse_id)
    dom = web.Element(requests.get(xml).text)
    
    # card name, card type
    cardName = dom('div.cardImage img')[0].attributes['alt'] if dom('div .cardImage img') else ''
    cardType = [element.strip() for element in \
                dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_typeRow div.value')[0].content.split(u'\u2014')]
    
    # extract, parse, clean text into a list
    cardText = []
    pattern = re.compile('[\W_]+')
    for line in dom('div.cardtextbox'):
        for element in line:
            cardText.append(element)
    
    for i in xrange(len(cardText)):
        if cardText[i].type == 'element' and cardText[i].tag == 'img':
            cardText[i] = cardText[i].attributes['alt']
        else:
            cardText[i] = str(cardText[i]).strip().lower()
        pattern.sub('', cardText[i]) 
    
    # mana symbols
    manaCost = [element.attributes['alt'] for element in dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_manaRow div.value img')]
    cmc = int(dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_cmcRow div.value')[0].content.strip()) \
            if dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_cmcRow div.value') else np.nan
    
    # rarity
    rarity = dom('div #ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_rarityRow div.value span')[0].content.lower()
    
    # p/t
    power = np.nan
    power = [_.strip() for _ in dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value')[0].content.split(' / ')][0] \
                if dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value') else np.nan
    power = float(power) if power != '*' and power != np.nan else np.nan
    toughness = [_.strip() for _ in dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value')[0].content.split(' / ')][1] \
                    if dom('div#ctl00_ctl00_ctl00_MainContent_SubContent_SubContent_ptRow div.value') else np.nan
    toughness = float(toughness) if (toughness != '*' and toughness != '7-*' and toughness != np.nan) else np.nan
      
    # add data
    database[cardName] = {
                            'cardType' : cardType,
                            'cardText' : cardText,
                            'manaCost' : manaCost,
                            'cmc' : cmc,
                            'rarity': rarity,
                            'power' : power,
                            'toughness' : toughness
                         }

Perform the scraping

We'll iterate through a range of multiverse_ids to scrape a desired amount of cards. Note that it takes around 1 minute/500 multiverse_ids. Given that there are 13k+ cards (and multiple versions of each -- see below), we'll limit our scraping to ~500 cards from the very first Magic set: Alpha.

In [174]:
cardsToScrape = 600

for i in xrange(1, cardsToScrape):
    if (i % 100 == 0): print "Grabbed " + str(i)
    grabCard(i)

print "Done!"
Grabbed 100
Grabbed 200
Grabbed 300
Grabbed 400
Grabbed 500
Done!

At this point, we now have roughly cardsToScrape cards and associated values in a local dict using the cardName as the key. (Note that we have less than cardsToScrape as we're iterating over multiverse_ids and some ids don't actually match to a card page.)

Note for potential future work

There are other aspects represented on the Gatherer database such as set and community ratings but we leave this to future work. Annoyingly, for cards in multiple sets, the card will have a different page (and subsequently different set of ratings) for each set; though this would require more work, it'd be super interesting if you could predict a card's community interest (# ratings) and favorability (average rating).

Making the data usable

We'll now put this into a pandas dataframe for cleaning, variable creation and initial analysis/spot checking/understanding.

In [175]:
data = pd.DataFrame.from_dict(database, orient='index')
data['cardName'] = data.index
data
Out[175]:
toughness power cmc rarity cardType cardText manaCost cardName
Air Elemental 4 4 5 uncommon [Creature, Elemental] [flying] [3, Blue, Blue] Air Elemental
Ancestral Recall NaN NaN 1 rare [Instant] [target player draws three cards.] [Blue] Ancestral Recall
Animate Artifact NaN NaN 4 uncommon [Enchantment, Aura] [enchant artifact, as long as enchanted artifa... [3, Blue] Animate Artifact
Animate Dead NaN NaN 2 uncommon [Enchantment, Aura] [enchant creature card in a graveyard, when an... [1, Black] Animate Dead
Animate Wall NaN NaN 1 rare [Enchantment, Aura] [enchant wall, enchanted wall can attack as th... [White] Animate Wall
... ... ... ... ... ... ... ... ...
Winter Orb NaN NaN 2 rare [Artifact] [players can't untap more than one land during... [2] Winter Orb
Wooden Sphere NaN NaN 1 uncommon [Artifact] [whenever a player casts a green spell, you ma... [1] Wooden Sphere
Word of Command NaN NaN 2 rare [Instant] [look at target opponent's hand and choose a c... [Black, Black] Word of Command
Wrath of God NaN NaN 4 rare [Sorcery] [destroy all creatures. they can't be regenera... [2, White, White] Wrath of God
Zombie Master 3 2 3 rare [Creature, Zombie] [other zombie creatures have swampwalk., other... [1, Black, Black] Zombie Master

296 rows × 8 columns

(2) -- Feature extraction

Based on our domain knowledge, we're going to extract four main types of features for each card:

  1. Mana cost and mana amounts of a card
  2. Categorical features -- type (i.e. Artifact, Creature, etc.) and rarity (i.e. Common, Uncommon, etc.)
  3. Text features based on the card's text (i.e. "When this creature enters the battlefield...")
  4. Functional features -- having a Tap ability, being a mana generator, etc.
In [176]:
# Which features do we want to use?
# All enabled by default, mana and categorical features required

textFeatures = True
functionalFeatures = True

(2.1) -- Mana features

In [177]:
# Create mana features

colorlessMana = []
colorless = []

for row in data['manaCost']:
    found = 0
    for val in row:
        if isInt(val):
            colorlessMana.append(float(val))
            found = 1
    if found == 0:
        colorlessMana.append(0)

data['colorlessMana'] = colorlessMana 
data['Variable Colorless'] = [1 if 'Variable Colorless' in text else 0 for text in data['manaCost']]
In [178]:
# Count mana symbols

manaSymbols = []

manaSymbols = ['Blue', 'Black', 'Red', 'Green', 'White']
manaVars = ['mana_' + _ for _ in manaSymbols]

for i in xrange(len(manaSymbols)):
    data[manaVars[i]] = [text.count(manaSymbols[i]) for text in data['manaCost']]
    data[manaSymbols[i]] = [1 if text.count(manaSymbols[i]) > 0 else 0 for text in data['manaCost']]
In [179]:
# Find color (ignores multicolor)

def isColorless(l):
    for val in l:
        if val in manaSymbols: return False
    return True

data['Artifact'] = [1 if isColorless(x) else 0 for x in data['manaCost']]

def findColor(l):
    for val in l:
        if not isInt(val) and val != 'Variable Colorless': return val
    return 'Artifact'

data['color'] = [findColor(l) for l in data['manaCost']]

data.groupby(data['color']).describe().to_csv('colorSummary.csv')
data.groupby(data['color']).describe()
Out[179]:
Artifact Black Blue Green Red Variable Colorless White cmc colorlessMana mana_Black mana_Blue mana_Green mana_Red mana_White power toughness
color
Artifact count 62 62 62 62 62 62 62 47.000000 62.000000 62 62 62 62 62.00 5.000000 5.000000
mean 1 0 0 0 0 0 0 2.361702 1.790323 0 0 0 0 0.00 2.400000 5.000000
std 0 0 0 0 0 0 0 1.673652 1.775397 0 0 0 0 0.00 2.302173 1.414214
min 1 0 0 0 0 0 0 0.000000 0.000000 0 0 0 0 0.00 0.000000 3.000000
25% 1 0 0 0 0 0 0 1.000000 0.000000 0 0 0 0 0.00 0.000000 4.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
White min 0 0 0 0 0 0 1 1.000000 0.000000 0 0 0 0 1.00 1.000000 1.000000
25% 0 0 0 0 0 0 1 1.000000 0.000000 0 0 0 0 1.00 1.500000 1.000000
50% 0 0 0 0 0 0 1 2.000000 1.000000 0 0 0 0 1.00 2.000000 2.000000
75% 0 0 0 0 0 0 1 3.000000 1.000000 0 0 0 0 1.75 3.000000 4.500000
max 0 0 0 0 0 1 1 6.000000 3.000000 0 0 0 0 3.00 6.000000 6.000000

48 rows × 16 columns

(2b) -- Categorical features

In [180]:
# Create categorical features
    
primaryTypes = [cardType[0] for cardType in data['cardType']]

for i in xrange(len(primaryTypes)):
    if primaryTypes[i] == u'Basic Land':
        primaryTypes[i] = u'Land'
    if primaryTypes[i] == u'Artifact Creature':
        primaryTypes[i] = u'Creature'

data['Primary Type'] = primaryTypes
            
data = pd.concat([data, pd.get_dummies(data['Primary Type'])], axis=1)
data = pd.concat([data, pd.get_dummies(data['rarity'])], axis=1)

data.groupby(data['rarity']).describe().to_csv('byRarity.csv')
data.groupby(data['rarity']).describe()
    
data.groupby(data['Primary Type']).describe().to_csv('byType.csv')
data.groupby(data['Primary Type']).describe()
Out[180]:
toughness power cmc colorlessMana Variable Colorless mana_Blue Blue mana_Black Black mana_Red ... Artifact Creature Enchantment Instant Land Sorcery basic land common rare uncommon
Primary Type
Artifact count 1 1 43.000000 43.000000 43 43 43 43.00 43.00 43 ... 43 43 43 43 43 43 43 43.00 43.000000 43.000000
mean 6 3 2.116279 2.116279 0 0 0 0.00 0.00 0 ... 1 0 0 0 0 0 0 0.00 0.604651 0.395349
std NaN NaN 1.499354 1.499354 0 0 0 0.00 0.00 0 ... 0 0 0 0 0 0 0 0.00 0.494712 0.494712
min 6 3 0.000000 0.000000 0 0 0 0.00 0.00 0 ... 1 0 0 0 0 0 0 0.00 0.000000 0.000000
25% 6 3 1.000000 1.000000 0 0 0 0.00 0.00 0 ... 1 0 0 0 0 0 0 0.00 0.000000 0.000000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
Sorcery min NaN NaN 1.000000 0.000000 0 0 0 0.00 0.00 0 ... 0 0 0 0 0 1 0 0.00 0.000000 0.000000
25% NaN NaN 1.250000 0.000000 0 0 0 0.00 0.00 0 ... 0 0 0 0 0 1 0 0.00 0.000000 0.000000
50% NaN NaN 2.000000 1.000000 0 0 0 0.00 0.00 0 ... 0 0 0 0 0 1 0 0.00 0.000000 0.000000
75% NaN NaN 3.000000 2.000000 1 0 0 0.75 0.75 0 ... 0 0 0 0 0 1 0 0.75 1.000000 0.750000
max NaN NaN 4.000000 3.000000 1 3 1 3.00 1.00 1 ... 0 0 0 0 0 1 0 1.00 1.000000 1.000000

48 rows × 26 columns

(2c) -- Text features

A helper function from fuzzywuzzy to find partial word matches in card text boxes:

In [181]:
def partialMatch(s, l, threshold=95):
    fuzzVals = [fuzz.partial_ratio(s, x) for x in l]
    if not fuzzVals: fuzzVals = [0]
    return max(fuzzVals) >= threshold

Based on domain knowledge, we'll fuzzy match if certain important words are in a card's text box that will give us a hint of what the card does.

In [182]:
# Create text-based features

if textFeatures:

    data['Damage'] = [1 if partialMatch('damage', l) else 0 for l in data['cardText']]
    data['Hand'] = [1 if partialMatch('hand', l) else 0 for l in data['cardText']]
    data['Draw'] = [1 if partialMatch('draw', l, 80) else 0 for l in data['cardText']]
    data['Upkeep'] = [1 if partialMatch('draw', l, 80) else 0 for l in data['cardText']]
    data['Library'] = [1 if partialMatch('library', l) else 0 for l in data['cardText']]
    data['Sacrifice'] = [1 if partialMatch('sacrifice', l) else 0 for l in data['cardText']]
    data['Destroy'] = [1 if partialMatch('destroy', l) else 0 for l in data['cardText']]
    data['Discard'] = [1 if partialMatch('discard', l) else 0 for l in data['cardText']]
    data['Prevent'] = [1 if partialMatch('prevent', l) else 0 for l in data['cardText']]
    data['Life'] = [1 if partialMatch('life', l) else 0 for l in data['cardText']]
    data['Attack'] = [1 if partialMatch('attack', l) else 0 for l in data['cardText']]
    data['Block'] = [1 if partialMatch('block', l) else 0 for l in data['cardText']]
    data['Search'] = [1 if partialMatch('search', l) else 0 for l in data['cardText']]
    data['Choose'] = [1 if partialMatch('choose', l) else 0 for l in data['cardText']]
    data['Copy'] = [1 if partialMatch('copy', l) else 0 for l in data['cardText']]
    data['Change'] = [1 if partialMatch('change', l) else 0 for l in data['cardText']]
    data['Turn'] = [1 if partialMatch('turn', l) else 0 for l in data['cardText']]
    data['End of turn'] = [1 if partialMatch('end of turn', l, 80) else 0 for l in data['cardText']]
    data['Beginning of turn'] = [1 if partialMatch('beginning of turn', l, 80) else 0 for l in data['cardText']]
    data['Spell ref'] = [1 if partialMatch('spell', l) else 0 for l in data['cardText']]
    data['Creature ref'] = [1 if partialMatch('creature', l) else 0 for l in data['cardText']]
    data['Land'] = [1 if partialMatch('land', l) else 0 for l in data['cardText']]
    data['Mana'] = [1 if partialMatch('mana', l) else 0 for l in data['cardText']]
    data['Battlefield'] = [1 if partialMatch('battlefield', l) else 0 for l in data['cardText']]
    data['Blue ref'] = [1 if partialMatch('blue', l) else 0 for l in data['cardText']]
    data['Black ref'] = [1 if partialMatch('black', l) else 0 for l in data['cardText']]
    data['Green ref'] = [1 if partialMatch('green', l) else 0 for l in data['cardText']]
    data['Red ref'] = [1 if partialMatch('red', l) else 0 for l in data['cardText']]
    data['White ref'] = [1 if partialMatch('white', l) else 0 for l in data['cardText']]
    data['Colorless ref'] = [1 if partialMatch('colorless', l) else 0 for l in data['cardText']]

(2d) -- Functional features

In [183]:
# 4. Special functional features

def isBuff(str, l):
    found = 0
    for val in l:
        if str in val:
            found += 1
    if found > 0: return True
    else: return False

if functionalFeatures:

    data['Untap'] = [1 if partialMatch('untap', l) else 0 for l in data['cardText']]
    data['All'] = [1 if partialMatch('all', l) | partialMatch('any', l) else 0 for l in data['cardText']]

    data['Tap ability'] = [1 if 'Tap' in x else 0 for x in data['cardText']]
    data['Mana symbol'] = [1 if anyIntOrColor(x) else 0 for x in data['cardText']]
    data['Mana related'] = [1 if partialMatch('add mana', l) | partialMatch('your mana pool', l) \
                                  else 0 for l in data['cardText']]

    data['Buff'] = [1 if isBuff('+', l) else 0 for l in data['cardText']]
    data['Debuff'] = [1 if isBuff('-', l) else 0 for l in data['cardText']]

Note

Some of this might have been able to be done automatically, especially the text features, which could have been done by finding the most common words referred to in text boxes. Again, I leave this to future work and am really curious about what the literature on automatic feature creation says about this.

(3) -- Perform PCA

Surprisingly, the PCA itself is the easiest part of this entire thing. We'll use sklearn to perform a 10-component PCA to see how much of the entire data's dimensional variation can be reduced to 10 dimensions.

In [184]:
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale
from sklearn.preprocessing import StandardScaler

numericData = data.copy()
# scale to mean 0, variance 1
numericData_std = scale(numericData.fillna(0).select_dtypes(include=['float64', 'int64']))

pca = PCA(n_components=10)
Y_pca = pca.fit_transform(numericData_std)

So, how well did we do?

Well, based on the explained variance vector below it doesn't look like we did very well. The first two principal components only combined for 14% of the total variance in the data; though, of note, is that the first 10 factors do account for 46% of the total variance. Considering we're working with 62 features though, this is pretty decent.

In [185]:
# Analysis of PCA effectiveness

print
print "Variance explained by each factor:"
print [round(x, 3) for x in pca.explained_variance_ratio_]
print
print "Variance explained by all 10 factors:"
print round(sum(pca.explained_variance_ratio_), 3)
print
print "Num features:"
print len(numericData_std[0])
Variance explained by each factor:
[0.08, 0.062, 0.053, 0.052, 0.042, 0.039, 0.038, 0.036, 0.033, 0.03]

Variance explained by all 10 factors:
0.464

Num features:
62

(4) -- Results

Now time to see if it was all worth it -- and apply the PCA projection onto our data set. We want to be able to make a pretty scatterplot grouping the data by different types (color, card type, rarity, etc.) so we will make a helper graphing function using the plotly library.

In [186]:
import plotly.plotly as py
py.sign_in('nhuber', 'bmopo8hk40')
from plotly.graph_objs import *
import plotly.tools as tls

def chooseColor(group):
    
    if group == u'White': return '#B2B2B2'
    if group == u'Artifact': return '#996633'
    if group == u'Red' : return '#E50000'
    if group == u'Blue': return '#0000FF'
    if group == u'Green' : return '#006400'
    if group == u'Black' : return '#000000'

    if group == 'Instant': return '#E81A8C'
    if group == 'Sorcery': return '#F2AB11'
    if group == 'Creature' : return '#102DE8'
    if group == 'Enchantment': return '#1BBF28'
    if group == 'Land' : return '#000000'
    if group == 'Artifact' : return '#82580E'
    
    if group == 'common' : return '#000000'
    if group == 'uncommon': return '#9a9999'
    if group == 'rare': return '#eae002'
    if group == 'basic land': return '#ba7127'
In [187]:
# graphs data on pca axes grouped by type thetype
    
def graphByType(thetype, thetitle, centers=False, fix=-1, typefilter='',
               height=625, width=725, markerfontsize=9, titlefontsize=26):

    # fix reflects data through y = 0 to be backwards
    # compatible with previous annotated visualizations

    # create graph data from pca results
    
    traces = []

    if not typefilter:
        typefilter = set(data[thetype])
    
    for group in typefilter:
        
        matches = []
        for i in xrange(len(data[thetype])):
            if data[thetype].irow(i) == group:
                matches.append(i)

        graphColor = chooseColor(group)

        trace = Scatter(
            x=Y_pca[matches,0],
            y=fix * Y_pca[matches,1],
            mode='text',
            name=group,
            marker=Marker(
                size=8,
                color=graphColor,
                opacity=0.5),
            text = data['cardName'].irow(matches),
            textfont = Font(
                family='Georgia',
                size=markerfontsize,
                color=graphColor
                )
            )
        
        traces.append(trace)

        if centers:

            traceCentroid = Scatter(
                x = np.mean(Y_pca[matches,0]),
                y = np.mean(fix * Y_pca[matches,1]),
                mode = 'marker',
                name = str(thetype) + " center",
                marker = Marker(
                    size = 26,
                    color=graphColor),
                opacity = 0.75
            )

            traces.append(traceCentroid)
        
    # Set up the scatter plot layout

    dataToGraph = Data(traces)

    # auto-focus on where most of the data is clustered
    xRange = max(abs(np.percentile(np.array([x[0] for x in Y_pca]), 2.5)),
                abs(np.percentile(np.array([x[0] for x in Y_pca]), 97.5)))
    yRange = max(abs(np.percentile(np.array([x[1] for x in Y_pca]), 2.5)),
                abs(np.percentile(np.array([x[1] for x in Y_pca]), 97.5)))

    layout = Layout(title=thetitle,
                titlefont=Font(family='Georgia', size=titlefontsize),
                showlegend = True,
                autosize = False,
                height = height,
                width = width,
                xaxis=XAxis(
                    range=[-xRange, +xRange],
                    title='PC1', showline=False),
                yaxis=YAxis(
                    range=[-yRange, +yRange],
                    title='PC2', showline=False))
    
    fig = Figure(data=dataToGraph, layout=layout)
    return fig

(4a) -- Grouping by color

We first visualize all of our cards on the two PCA axes, grouped by color.

In [188]:
fig = graphByType('color', "PCA on MtG by card color")
py.iplot(fig)
Out[188]:

A few notes on this graph:

  • It's interactive: you can zoom into an area on the graph by dragging to create a rectangle
  • Also note that you can click the labels on the top right to turn on and off showing cards of different colors
  • It will probably have more meaning if you know about Magic and what each of these cards do; so, I'll offer my analysis below but if you do play and have alternate interpretations about how these cards are grouped, please do lmk

Result 1: A tale of two psuedo-axes

The primary result of this analysis is that a magic card can be mainly broken down into two components: How much does it behave like a spell vs. a creature? and How much does it affect the board or non-board resources? Visually, we are left with two "psuedo-axes":

  • On the left downward diagonal -- a creature axis which represents how "creature-y" a card is: big creatures are very creature-y, mid-sized utility creatures are somewhat creature-y, and enchantments/spells are not creature-y at all.
  • On the right upward diagonal, we have the mana/hand axis -- which is a spectrum on how much a card relates primarily to the board (i.e. permanents in play) or whether it affects non-board resources such as the player's cards and mana pools.

Going through salient examples

We'll now go through the highlighted examples in the above graph, from left to right, to understand and evaluate how the model performs; my evaluation out of 5 for each card plotting is in parenthesis in the title:

Juzam Djinn and Juggernaut (5/5)

  • Classic examples of "fatties," large creatures that dominate the board through sheer size
  • Very creature-y and very board-related

Goblin King, Old Man of the Sea and White Knight (5/5)

  • Classic examples of "utility creatures": medium size (2/2, 2/3, and 2/2 respectively) but have impact on the board through their abilities
  • Mostly creature-y and very board-related

Berserk, Raise dead (5/5)

  • Combat trick that offers a one-time pump for a creature in combat and enchantment that revives a creature from the graveyard
  • Appropriately somewhat creature-y and very board-related

Birds of Paradise (5/5), Manabarbs (4/5)

  • Birds of paradise -- a tiny, one-drop creature that provides mana ramping ability: the model correctly realizes it's a utility creature (i.e. medium creature-y) and is heavily related to a non-board resource, namely: mana.
  • Manabarbs is similarly a permanent, mana-based effect but it also impacts player's life totals so is appropriately in the middle of this axis
  • Medium creature-y, mostly mana/hand

Balance (4/5)

  • A high-impact spell that equalizes both players creature counts, cards in hand, and lands in play.
  • Very spell-y and related to mostly non-board resources (though it does equalize creatures as well)

Red elemental blast (4/5)

  • This is a tough one for the model: this card has two modes: destroy a blue permanent or counter a blue spell. Clearly, these are very different cards. But the model correctly predicts that it's very spell-y and mostly board-related

Wheel of Fortune, Ancestral Recall (5/5)

  • Perfect categorization for Wheel: this is a completely unique spell in the game where each player discards their hand and draws 7 new cards. Not a creature at all, not related to the board at all; therefore, is correctly labelled spell-y and mana/hand-y
  • Great categorization again for Ancestral: an instant-speed draw spell (note that it's right next to Braingeyser as well): very spell-y very mana/hand-y

Howling Mine (3/5)

  • Primarily card-related (each player draws 2 instead of 1) and a permanent effect so somewhat creature-y
  • Strange card because it has a permanent effect on card resources which is rare (i.e. vs. draw spells, discard spells)

Demonic Tutor, Black Lotus (5/5)

  • DT: The canonical tutor effect; very spell-y and hand-y
  • Black Lotus: the canonical magic card; pure power in a one-time rush of mana (very spell-y and very mana-y)

Summary

The model does a very good job categorizing cards across these two psuedo-axes. Unsurprisingly, the hardest cards to categorize are those that cross many axes -- Balance in it's all-encompassing scope or Red Elemental Blast in its multiple modes -- or have non-traditional effects like Howling Mine being a card-drawing artifact.

Result 2: Exploring the color identities

The PCA also tells a cool story of how different the colors are defined. Here each card is a small point, with the large point representing the "average" of all of the color's cards:

  • Artifacts are somewhat creature-y, somewhat spell-y (as they can have different effects depending on the card) but generally are more related to mana/hand resources than board effects (except for the Artifact creatures like Juggernaut that are correctly by other fatties like Juzam)
  • Blue and White seem to be, like all of the non-artifact colors, a mix of creatures and spells, but they skew towards the spell side; unlike Green, Red, and Black (the first two are very creature-y, the last a mix)
  • This is consistent with the intuitive color identity you have when you play the game: Blue and White are tricky, controlling decks; Red and Green are fatty/monster decks and Black is somewhat in between, arguably the most flexible color in the game in terms of large creatures, mana effects and also removal

Result 3: Exploring the type identities

In [189]:
fig = graphByType('Primary Type', "PCA on MtG by card type")
py.iplot(fig)
Out[189]:

The model correctly associates creatures on the creature axis, instants and sorceries on the spell axis (with ones related to mana like Dark Ritual or Contract from Below higher on the mana/hand axis), enchantments in the "spell-like" but permanent effects region and lands in the mana-related zone. Artifacts because of their unique nature are incredibly hard to categorize and honestly the model probably doesn't do a great job for most of them, and sort of arbitrarily groups them in their own group as a whole, saying they're also mostly mana/hand-y, but there's likely a lot more that could be teased out here to get the artifacts to be distributed more uniformly/appropriately across these axes.

Result 4: Exploring rarity

In [190]:
fig = graphByType('rarity', "PCA on MtG by card rarity")
py.iplot(fig)
Out[190]:

Rares are allowed to have all kinds of effects (i.e. spread across both axes), uncommons as well but less so (i.e. less smattered), commons limited to medium-sized creatures and non-mana/hand-related spells. This is consistent with the game designer's views on what "feels like a common" or what power levels different types of cards are able to have.

Conclusions and future work

There's tons more work to do here that I'd love to if I have time:

  • Play with what features to use (which are the most impactful -- almost certainly type and color, but what else? could we automatically create them from card text boxes without domain knowledge? of those which are the most meaningful?)
  • What separates good cards from bad cards? how could we automatically detect/predict card quality?
  • What determines popular/interesting cards? (note that Gatherer also has community ratings for every card ever made
  • Does this analysis hold for the entire corpus of magic cards? how about for different sets? does it get better or worse over time?

Until then,
@nhuber | [email protected]

custom css

In [ ]: