After watching way too much Game of Thrones and Ink Master, I started wondering what it is that makes a celebrity stand out from the rest of us. Are there features, especially facial features, that can distinguish celebrities from common folk?
I had put this question out of my mind until recently when, while looking for interesting applications of data mining Twitter, I stumbled on AlchemyAPI, an IBM company. AlchemyAPI provides a high-level interface to deep-learning tools. They offer a suite of natural language processing tools, for example the ability to have a computer read a news article, extract the relevant people and ideas, and decide whether the sentiment is favorable. Try out the web demo; it's really cool. They also offer computer vision tools, and of particular interest to me, an API for facial recognition and extraction. They provide nine SDKs, including one for Python. Python also happens to be a great language for data and image analysis, with libraries like scikit-image, IPython, and matplotlib, and there's a lot of cool stuff to be done with faces.
This demo will compare age and gender predictions of faces, automatically extracted from web images, between me, my friends, and some of our favorite celebrities. When it comes to facial recognition and analysis, AlchemyAPI does all of the hard work. I can then simply use some scikit-image and matplotlib Py-Fu to visualize the images and rank them by age and gender scores.
If you want to try this for yourself, you'll need the following:
I used Facebook to track down the URLs of some pictures of my friends with unobstructed views of their faces and seemingly good image quality. Then I grabbed a bunch of individual and group pics of celebrities.
%pylab inline
from __future__ import division
import skimage.io as skio
# Change matplotlib label fontsize
from matplotlib import rcParams
rcParams['font.size'] = 15
BUDDIES = dict(ME = 'https://fbcdn-sphotos-g-a.akamaihd.net/hphotos-ak-xap1/v/t1.0-9/970276_643330279472_1364386421_n.jpg?oh=ef0a6a5758cee5ddccd1eb951937916f&oe=557FFA87&__gda__=1434357290_1745bc5628ade9870d78c55c661b4046',
LOVELY_FIANCE ='<URL HIDDEN>',
CLAIRE ='<URL HIDDEN>',
...
)
CELEBS = dict(GOT_GALS = 'http://media4.popsugar-assets.com/files/2013/07/04/088/n/4981324/cdf19997323733ea_Main.xxxlarge/i/Pictures-Women-Game-Thrones-Emilia-Clarke.jpg',
GOT_GUYS = 'http://media1.popsugar-assets.com/files/2014/06/13/959/n/1922283/462854365d4d1e3f_GoT-Cover.xxxlarge/i/Hottest-Guys-Game-Thrones.jpg',
INKMASTER = 'http://www.slangstrong.com/wp-content/uploads/2012/11/Ink-Masters.jpg',
FASTFIVEGUYS = 'http://thatsenuff.com/wp-content/uploads/2011/04/0429-rather-obvi-credit.jpg',
MROD = 'http://d1oi7t5trwfj5d.cloudfront.net/81/3e/1e704c3c4232b7d10eaf67f8d260/michelle-rodriguez.jpg',
EVA = 'http://media1.popsugar-assets.com/files/2013/01/02/3/192/1922398/d8a83663d4d95316_evamendes.xxxlarge_2.jpg',
RYGOSLING = 'http://blogs.psychcentral.com/life-goals/files/2015/01/ryan-gosling.jpg',
THE_CLOONE = 'http://img2-2.timeinc.net/people/i/2014/sandbox/news/140210/george-clooney-600x450.jpg',
REAL_HOUSEWIVES = 'http://media.silive.com/entertainment_impact_tvfilm/photo/real-housewives-of-new-jerseyjpg-87e95765ec41dcc8.jpg'
)
US_AND_THEM = dict(BUDDIES.items() + CELEBS.items())
def showimage(url_or_array, *args, **kwargs):
    """ Displays image; removes x/y labels"""
    ax = kwargs.pop('ax', pylab.gca())
    if isinstance(url_or_array, basestring):
        out = ax.imshow(skio.imread(url_or_array), *args, **kwargs)
    else:
        out = ax.imshow(url_or_array, *args, **kwargs)
    # Hide x and y axis ticks/labels
    ax.get_xaxis().set_ticks([])
    ax.get_yaxis().set_ticks([])
    return out
Populating the interactive namespace from numpy and matplotlib
So, for example, my not-so-flattering picture looks like:
showimage(US_AND_THEM['ME'])
plt.title('6th St. Austin');
And for Ink Master:
showimage(CELEBS['INKMASTER'])
plt.title("OMG DAVE NAVARRO!");
And so on:
showimage(CELEBS['GOT_GALS']);
Next, I load Alchemy's Python SDK so that I can pass these images to the face-recognition features.
import os
os.chdir(os.path.expanduser('~/Desktop/alchemyapi_python/'))
from alchemyapi import AlchemyAPI
api = AlchemyAPI() #<-- Must Instantiate
We can use AlchemyAPI.faceTagging
to find one or more faces in an image. For example, the three ink masters:
api.faceTagging('url', CELEBS['INKMASTER'])
{u'imageFaces': [{u'age': {u'ageRange': u'35-44', u'score': u'0.389988'},
                  u'gender': {u'gender': u'MALE', u'score': u'0.970688'},
                  u'height': u'61',
                  u'identity': {u'disambiguated': {u'dbpedia': u'http://dbpedia.org/resource/Dave_Navarro',
                                                   u'freebase': u'http://rdf.freebase.com/ns/m.01lz4tf',
                                                   u'name': u'Dave Navarro',
                                                   u'subType': [u'Person', u'Composer', u'MusicalArtist',
                                                                u'Celebrity', u'FilmMusicContributor', u'Guitarist',
                                                                u'Lyricist', u'MusicalGroupMember', u'TVProducer',
                                                                u'TVActor'],
                                                   u'website': u'http://www.6767.com/',
                                                   u'yago': u'http://yago-knowledge.org/resource/Dave_Navarro'},
                                u'name': u'Dave Navarro',
                                u'score': u'0.970688'},
                  u'positionX': u'244',
                  u'positionY': u'77',
                  u'width': u'61'},
                 {u'age': {u'ageRange': u'35-44', u'score': u'0.478642'},
                  u'gender': {u'gender': u'MALE', u'score': u'0.995033'},
                  u'height': u'72',
                  u'positionX': u'445',
                  u'positionY': u'80',
                  u'width': u'72'},
                 {u'age': {u'ageRange': u'45-54', u'score': u'0.388825'},
                  u'gender': {u'gender': u'MALE', u'score': u'0.993307'},
                  u'height': u'58',
                  u'positionX': u'73',
                  u'positionY': u'102',
                  u'width': u'58'}],
 u'status': u'OK',
 u'totalTransactions': u'4',
 u'url': u'http://www.slangstrong.com/wp-content/uploads/2012/11/Ink-Masters.jpg',
 u'usage': u'By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html'}
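If you'd rather poke at the raw response by hand first, here's a minimal sketch of navigating it. The `resp` dict below is abbreviated from the output above (and `resp` is just my name for it); the `identity` key is only present on faces Alchemy actually recognized, so I hedge with `dict.get`:

```python
# Abbreviated faceTagging response, copied from the output above
resp = {
    'status': 'OK',
    'imageFaces': [
        {'age': {'ageRange': '35-44', 'score': '0.389988'},
         'gender': {'gender': 'MALE', 'score': '0.970688'},
         'identity': {'name': 'Dave Navarro'}},
        {'age': {'ageRange': '35-44', 'score': '0.478642'},
         'gender': {'gender': 'MALE', 'score': '0.995033'}},
    ],
}

for face in resp['imageFaces']:
    # 'identity' only appears when the face is a recognized celebrity
    who = face.get('identity', {}).get('name', 'unknown')
    print('%s: %s, age %s (confidence %.2f)' % (
        who, face['gender']['gender'],
        face['age']['ageRange'], float(face['age']['score'])))
```

Note that all scores come back as strings, which is why everything downstream wraps them in `float()`.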
Rather than work with these JSON streams, I'll make a class (a namedtuple, specifically) to store a face. Some of the metadata stored on each face will include:
- Pixels in original image where face is
- Reference to original image
- Predicted gender and confidence score
- Predicted ageRange and confidence score
The code below defines such a class. If you're not a programmer, don't worry about this next cell too much.
from collections import namedtuple
# Custom named tuple class, custom printout
Face = namedtuple('Face', ['face', 'index', 'gender', 'genderscore', 'agerange', 'agescore'], verbose=False)
def newrep(obj):
return 'FACE_{o.index} ({o.gender} {o.genderscore}, {o.agerange} {o.agescore})'.format(o=obj)
Face.__repr__ = newrep
def cutface(image, facetag):
    """ Given original image and JSON face tag, crop out the face. """
    def _parseFace(attr):
        return int(facetag[attr])
    X, Y = _parseFace('positionX'), _parseFace('positionY')
    WIDTH, HEIGHT = _parseFace('width'), _parseFace('height')
    return image[Y:Y+HEIGHT, X:X+WIDTH]
def mapfaces(facedict):
    """ Takes a dictionary of name:url (see BUDDIES above) and returns name:[Face],
    where Face is the python class for storing an image face and its metadata.
    """
    out = {}
    for name, imageurl in facedict.items():
        try:
            faces = api.faceTagging('url', imageurl)['imageFaces']
        except Exception as exc:
            print "FAILED ON IMAGE: %s with exception:\n%s" % (name, exc.message)
            continue
        image = skio.imread(imageurl)
        out[name] = []
        # Iterate over faces, store
        for (idx, facetag) in enumerate(faces):
            faceregion = cutface(image, facetag)
            gender = facetag['gender']['gender']
            genderscore = float(facetag['gender']['score'])
            agerange = facetag['age']['ageRange']
            agescore = float(facetag['age']['score'])
            if agerange == '<18':
                agerange = '0-18'  #<-- For sorting later on
            out[name].append(Face(faceregion, idx, gender, genderscore, agerange, agescore))
    return out
# THIS ACTUALLY DOES THE MAPPING
US_AND_THEM = mapfaces(US_AND_THEM)
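A quick aside on the `'<18'` to `'0-18'` remapping in `mapfaces`: the age ranges are plain strings, so they only sort correctly if lexicographic order matches numeric order. A minimal sketch of why the remapping matters:

```python
ranges = ['35-44', '18-24', '<18', '45-54', '25-34']

# '<' sorts after the digit characters in ASCII, so '<18' lands last -- wrong
print(sorted(ranges))

# Remapping '<18' to '0-18' puts the youngest group first, as intended
fixed = ['0-18' if r == '<18' else r for r in ranges]
print(sorted(fixed))
```

This is what lets `sorted(BY_AGE.items())` later in the post walk the age groups from youngest to oldest.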
Now the face regions of the image, as well as the important metadata, are stored in Face instances. The dictionary, still called US_AND_THEM, maps each name to a list of Faces. We can access the face image from the face attribute, and some images have multiple faces.
f, axes = plt.subplots(1, 3, figsize=(4,2))
for idx, celeb in enumerate(US_AND_THEM['INKMASTER']):
    showimage(celeb.face, ax=axes[idx])
f.suptitle('The Faces of Ink Mastery');
Alchemy's facial recognition predicts age range and gender, and it also provides a confidence score for these predictions. For example, for a face predicted to be male, how confident is the algorithm in that prediction? Let's arrange the faces by increasing confidence in gender prediction...
To do so, first subdivide the list by gender. We'll also need the multi_axes function from my pyparty library, which makes it easier to plot lots of faces and have Python automatically figure out the axes sizes and counts (you will need pyparty to reproduce this).
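If you don't want to install pyparty, the only piece really needed here is a way to pick a subplot grid for N faces. A rough stand-in might look like the sketch below (`grid_dims` is my own hypothetical helper, not pyparty's API; its output would be handed to `plt.subplots`):

```python
import math

def grid_dims(n):
    """Pick a near-square (rows, cols) grid with at least n cells."""
    cols = int(math.ceil(math.sqrt(n)))
    rows = int(math.ceil(n / float(cols)))
    return rows, cols

# e.g. f, axes = plt.subplots(*grid_dims(len(faces)))
print(grid_dims(3))
print(grid_dims(10))
```

The `float()` division keeps this working on Python 2, which this notebook runs on.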
from pyparty.utils import multi_axes
ALL = []
MEN = []
WOMEN = []
for faces in US_AND_THEM.values():
    for face in faces:
        ALL.append(face)
        if face.gender == 'MALE':
            MEN.append(face)
        else:
            WOMEN.append(face)
def multiface_plot(faces, title=''):
    """ From a list of faces, plots each face, maintaining sort order,
    and figures out sizing/number of subaxes to create automatically.
    """
    def rint(x):
        return int(round(x))

    num = len(faces)
    # Ad-hoc stuff to ensure images sized right
    size = (5 + rint(num/2), 2.5 + rint(num/2))
    axes = multi_axes(num, figsize=size)[0]

    # Sort faces in each age group by confidence of age group
    for (idx, face) in enumerate(faces):
        try:
            ax = axes[idx]
        except TypeError:
            ax = axes  #<--- IF length 1
        showimage(face.face, ax=ax)

    plt.tight_layout()
    f = plt.gcf()
    f.suptitle(title, fontsize=16, y=1.02)
    return f, axes
MALECERTAINTY = sorted(MEN, key=lambda x: x.genderscore)
multiface_plot(MALECERTAINTY, title='Increasing Confidence that this is a man -->');
GALCERTAINTY = sorted(WOMEN, key=lambda x: x.genderscore)
multiface_plot(GALCERTAINTY, title='Increasing Confidence that this is a woman -->');
It's interesting that Maisie Williams's character Arya Stark hid in plain sight as a boy in Game of Thrones season 2, considering she was predicted female with the highest certainty. I guess AlchemyAPI can see things that Tywin Lannister can't...
from IPython.display import Image
Image(url='https://uproxx.files.wordpress.com/2013/03/tywin-arya.gif')
# Get all age groups
agegroups = set([p.agerange for p in ALL])

# Group in dict
BY_AGE = dict((group, []) for group in agegroups)
for face in ALL:
    BY_AGE[face.agerange].append(face)

for group, faces in sorted(BY_AGE.items()):
    faces = sorted(faces, key=lambda x: x.agescore)
    multiface_plot(faces, title='AGES: %s (increasing confidence -->)' % group)
gscores = [f.genderscore for f in ALL]
ascores = [f.agescore for f in ALL]
ages = [f.agerange for f in ALL]
agecolors = ['r','g','b','orange','purple','magenta']
for (idx, (age, agegroup)) in enumerate(sorted(BY_AGE.items())):
    gscores = [f.genderscore for f in agegroup]
    ascores = [f.agescore for f in agegroup]
    plt.scatter(gscores, ascores, color=agecolors[idx], s=75, alpha=.6)
plt.legend(sorted(BY_AGE), loc='upper left', ncol=2, fontsize=14);
plt.xlim(0.7,1.05)
plt.xlabel('Gender Score Confidence')
plt.ylabel('Age Score Confidence')
plt.title('Confidence of Faces by Age Group');
A couple of interesting points:
#Get face with lowest gender score
youngins = BY_AGE['18-24']
mingender = min(youngins, key=lambda x: x.genderscore).face
maxage = max(youngins, key=lambda x: x.agescore).face
f, (ax1, ax2) = plt.subplots(1,2)
showimage(mingender, ax=ax1)
showimage(maxage, ax=ax2)
ax1.set_title('Least Confident of Gender', fontsize=12)
ax2.set_title('Most Confident of Age Range', fontsize=12);
Gethin Anthony is 31 by the way, but I'd have to agree that he doesn't look it!
Before reading too much into age-range or gender confidence, remember that the point of this exercise was really just to show some new tools available to researchers, and just how accessible facial recognition has become. The goal was not to provide a robust quantification of age or gender. We're using just a single picture of each person, and we haven't taken into account the vast amount of variation in the same person's face from image to image due to facial hair, expression, image quality, lighting, and many other factors.
When we talk of gender confidence, we're using a convenient mental notion to refer to what, in actuality, is an abstruse computed quantity. The confidence of a machine-learning predictor reflects both the image set on which it was trained and the intrinsic preferences and biases of the method. For example, there could have been an over-representation of guys with fat cheeks in the 18-24 age range; in that case, my less-than-sober selfie would be more likely to seem like that of a spry 24-year-old, since alcohol kind of makes my cheeks puffy. I'm not suggesting this is actually the case, or that there's any fault with the underlying methods or training data, just that without fully understanding the intricate prediction process and its limitations, it's not worth getting too invested in what the results of this exercise really "mean".
In other words, your gender is that with which you identify, and you're really only as old as you feel.
# Change the notebook style/layout
# Much thanks to those who created it!
from IPython.core.display import HTML
import urllib2
HTML(urllib2.urlopen('http://bit.ly/1Bf5Hft').read())