#!/usr/bin/env python
# coding: utf-8

# # Simple Charts: Core Concepts
#
# The goal of this section is to teach you the core concepts required to create a basic Altair chart; namely:
#
# - **Data**, **Marks**, and **Encodings**: the three core pieces of an Altair chart
#
# - **Encoding Types**: ``Q`` (quantitative), ``N`` (nominal), ``O`` (ordinal), ``T`` (temporal), which drive the visual representation of the encodings
#
# - **Binning and Aggregation**: which let you control aspects of the data representation within Altair.
#
# With a good understanding of these core pieces, you will be well on your way to making a variety of charts in Altair.

# We'll start by importing Altair, and (if necessary) enabling the appropriate renderer:

# In[1]:

import altair as alt

# ## A Basic Altair Chart
#
# The essential elements of an Altair chart are the **data**, the **mark**, and the **encoding**.
#
# The format by which these are specified will look something like this:
#
# ```python
# alt.Chart(data).mark_point().encode(
#     encoding_1='column_1',
#     encoding_2='column_2',
#     # etc.
# )
# ```
#
# Let's take a look at these pieces, one at a time.

# ### The Data
#
# Data in Altair is built around the [Pandas Dataframe](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).
# For this section, we'll use the cars dataset that we saw before, which we can load using the [vega_datasets](https://github.com/altair-viz/vega_datasets) package:

# In[2]:

from vega_datasets import data
cars = data.cars()
cars.head()

# Data in Altair is expected to be in a [tidy format](http://vita.had.co.nz/papers/tidy-data.html); in other words:
#
# - each **row** is an observation
# - each **column** is a variable
#
# See [Altair's Data Documentation](https://altair-viz.github.io/user_guide/data.html) for more information.

# ### The *Chart* object
#
# With the data defined, you can instantiate Altair's fundamental object, the ``Chart``.
# Fundamentally, a ``Chart`` is an object which knows how to emit a JSON dictionary representing the data and visualization encodings, which can be sent to the notebook and rendered by the Vega-Lite JavaScript library.
# Let's take a look at what this JSON representation looks like, using only the first row of the data:

# In[3]:

cars1 = cars.iloc[:1]
alt.Chart(cars1).mark_point().to_dict()

# At this point the chart includes a JSON-formatted representation of the dataframe, the type of mark to use, and some metadata that is included in every chart output.

# ### The Mark
#
# We can decide what sort of *mark* we would like to use to represent our data.
# As in the previous example, we can choose the ``point`` mark to represent each row of the data as a point on the plot:

# In[4]:

alt.Chart(cars).mark_point()

# The result is a visualization with one point per row in the data, though it is not a particularly interesting one: all the points are stacked right on top of each other!
#
# It is useful to again examine the JSON output here:

# In[5]:

alt.Chart(cars1).mark_point().to_dict()

# Notice that now in addition to the data, the specification includes information about the mark type.

# There are a number of available marks that you can use; some of the more common are the following:
#
# * ``mark_point()``
# * ``mark_circle()``
# * ``mark_square()``
# * ``mark_line()``
# * ``mark_area()``
# * ``mark_bar()``
# * ``mark_tick()``
#
# You can get a complete list of ``mark_*`` methods using Jupyter's tab-completion feature: in any cell just type:
#
#     alt.Chart.mark_
#
# followed by the tab key to see the available options.

# ### Encodings
#
# The next step is to add *visual encoding channels* (or *encodings* for short) to the chart. An encoding channel specifies how a given data column should be mapped onto the visual properties of the visualization.
# Some of the more frequently used visual encodings are listed here:
#
# * ``x``: x-axis value
# * ``y``: y-axis value
# * ``color``: color of the mark
# * ``opacity``: transparency/opacity of the mark
# * ``shape``: shape of the mark
# * ``size``: size of the mark
# * ``row``: row within a grid of facet plots
# * ``column``: column within a grid of facet plots
#
# For a complete list of these encodings, see the [Encodings](https://altair-viz.github.io/user_guide/encoding.html) section of the documentation.
#
# Visual encodings can be created with the `encode()` method of the `Chart` object. For example, we can start by mapping the `y` axis of the chart to the `Origin` column:

# In[6]:

alt.Chart(cars).mark_point().encode(
    y='Origin'
)

# The result is a one-dimensional visualization representing the values taken on by `Origin`, with the points in each category on top of each other.
# As above, we can view the JSON data generated for this visualization:

# In[7]:

# Use the same `y` channel as the chart above, on the one-row dataframe
alt.Chart(cars1).mark_point().encode(
    y='Origin'
).to_dict()

# The result is the same as above with the addition of the `'encoding'` key, which specifies the visualization channel (`y`), the name of the field (`Origin`), and the type of the variable (`nominal`).
# We'll discuss these data types in a moment.

# The visualization can be made more interesting by adding another channel to the encoding: let's encode the `Miles_per_Gallon` as the `x` position:

# In[8]:

alt.Chart(cars).mark_point().encode(
    y='Origin',
    x='Miles_per_Gallon'
)

# You can add as many encodings as you wish, with each encoding mapped to a column in the data.
# For example, here we will color the points by *Origin*, and plot *Miles_per_Gallon* vs *Year*:

# In[9]:

alt.Chart(cars).mark_point().encode(
    color='Origin',
    y='Miles_per_Gallon',
    x='Year'
)

# ### Exercise: Exploring Data
#
# Now that you know the basics (data, encodings, marks), take some time and try making a few plots!
#
# In particular, I'd suggest trying various combinations of the following:
#
# - Marks: ``mark_point()``, ``mark_line()``, ``mark_bar()``, ``mark_text()``, ``mark_rect()``...
# - Data Columns: ``'Acceleration'``, ``'Cylinders'``, ``'Displacement'``, ``'Horsepower'``, ``'Miles_per_Gallon'``, ``'Name'``, ``'Origin'``, ``'Weight_in_lbs'``, ``'Year'``
# - Encodings: ``x``, ``y``, ``color``, ``shape``, ``row``, ``column``, ``opacity``, ``text``, ``tooltip``...
#
# Work with a partner to use various combinations of these options, and see what you can learn from the data! In particular, think about the following:
#
# - Which encodings go well with continuous, quantitative values?
# - Which encodings go well with discrete, categorical (i.e. nominal) values?
#
# After about 10 minutes, we'll ask for a couple of volunteers to share their combination of marks, columns, and encodings.

# ---

# ## Encoding Types

# One of the central ideas of Altair is that the library will **choose good defaults for your data type**.
#
# The basic data types supported by Altair are as follows:
#
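# | Data Type    | Code | Description                      |
# |--------------|------|----------------------------------|
# | quantitative | Q    | Numerical quantity (real-valued) |
# | nominal      | N    | Name / Unordered categorical     |
# | ordinal      | O    | Ordered categorical              |
# | temporal     | T    | Date/time                        |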
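#
# To make the descriptions in the table a little more concrete, here is a rough pure-Python sketch of how one might pick a type code from a column's values. It is an illustration only: the helper name ``guess_type_code`` is hypothetical, it works on plain Python lists rather than dataframes, and it is not how Altair itself is implemented.

```python
from datetime import date, datetime
from numbers import Number

# Full type names keyed by the one-letter codes from the table above.
TYPE_NAMES = {"Q": "quantitative", "N": "nominal", "O": "ordinal", "T": "temporal"}


def guess_type_code(values):
    """Guess a one-letter type code for a list of plain Python values.

    Hypothetical helper for illustration; not part of Altair.
    """
    values = [v for v in values if v is not None]
    # Dates and datetimes map to the temporal type.
    if values and all(isinstance(v, (date, datetime)) for v in values):
        return "T"
    # bool is a subclass of int, so exclude it from the numeric check.
    if values and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    ):
        return "Q"
    # Ordinal cannot be detected from raw values alone: the ordering is
    # something you know about the data, so everything else is nominal.
    return "N"


print(TYPE_NAMES[guess_type_code([18.0, 15.0, 16.5])])  # quantitative
print(TYPE_NAMES[guess_type_code(["USA", "Europe"])])    # nominal
print(TYPE_NAMES[guess_type_code([date(1970, 1, 1)])])   # temporal
```

# Note that this sketch never returns ``O``: whether a set of categories is ordered is a judgment about the data, not something visible in the values themselves.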
#
# When you specify data as a pandas dataframe, these types are **automatically determined** by Altair.
#
# When you specify data as a URL, you must **manually specify** data types for each of your columns.

# Let's look at a simple plot containing three of the columns from the cars data:

# In[10]:

alt.Chart(cars).mark_tick().encode(
    x='Miles_per_Gallon',
    y='Origin',
    color='Cylinders'
)

# Questions:
#
# - Which data type best fits ``Miles_per_Gallon``?
# - Which data type best fits ``Origin``?
# - Which data type best fits ``Cylinders``?

# Let's add the shorthands for each of these data types to our specification, using the one-letter codes above
# (for example, change ``"Miles_per_Gallon"`` to ``"Miles_per_Gallon:Q"`` to explicitly specify that it is a quantitative type):

# In[11]:

alt.Chart(cars).mark_tick().encode(
    x='Miles_per_Gallon:Q',
    y='Origin:N',
    color='Cylinders:O'
)

# Notice how the plot changes when we change the data type for ``'Cylinders'`` to ordinal.
#
# As you use Altair, it is useful to get into the habit of always specifying these types explicitly, because this is *mandatory* when working with data loaded from a file or a URL.

# ### Exercise: Adding Explicit Types
#
# Following are a few simple charts made with the cars dataset. For each one, try to add explicit types to the encodings (e.g. change ``"Horsepower"`` to ``"Horsepower:Q"``) so that the plot doesn't change.
#
# Are there any plots that can be made better by changing the type?

# In[12]:

alt.Chart(cars).mark_bar().encode(
    y='Origin',
    x='mean(Horsepower)'
)

# In[13]:

alt.Chart(cars).mark_line().encode(
    x='Year',
    y='mean(Miles_per_Gallon)',
    color='Origin'
)

# In[14]:

alt.Chart(cars).mark_bar().encode(
    y='Cylinders',
    x='count()',
    color='Origin'
)

# In[15]:

alt.Chart(cars).mark_rect().encode(
    x='Cylinders',
    y='Origin',
    color='count()'
)
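
# The shorthand strings used in the charts above pack up to three pieces of information into one string: an optional aggregate such as ``mean``, a column name, and an optional one-letter type code. As a rough sketch of that structure, the hypothetical helper ``expand_shorthand`` below splits a string into those parts using only the standard library. It is not part of Altair and is not meant to mirror how Altair parses shorthand internally; it only reports the parts that are written out explicitly.

```python
import re

# Full type names keyed by the one-letter codes used in the shorthand.
TYPE_NAMES = {"Q": "quantitative", "N": "nominal", "O": "ordinal", "T": "temporal"}

# Matches strings such as "Origin", "Origin:N", "mean(Horsepower)",
# "mean(Horsepower):Q" and "count()".
_SHORTHAND = re.compile(
    r"^(?:(?P<aggregate>\w+)\((?P<agg_field>[^)]*)\)|(?P<field>[^:]+))"
    r"(?::(?P<code>[QNOT]))?$"
)


def expand_shorthand(shorthand):
    """Split a shorthand string into its parts (hypothetical helper)."""
    match = _SHORTHAND.match(shorthand)
    if match is None:
        raise ValueError(f"unrecognized shorthand: {shorthand!r}")
    parts = {}
    if match["aggregate"]:
        parts["aggregate"] = match["aggregate"]
    # "count()" has an aggregate but no field, so the field may be absent.
    field = match["agg_field"] or match["field"]
    if field:
        parts["field"] = field
    if match["code"]:
        parts["type"] = TYPE_NAMES[match["code"]]
    return parts


print(expand_shorthand("Miles_per_Gallon:Q"))
print(expand_shorthand("mean(Horsepower)"))
print(expand_shorthand("count()"))
```

# When working through the exercise, a string that expands without a ``type`` entry is one where the type has been left implicit, which is exactly what the exercise asks you to fill in.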