#!/usr/bin/env python
# coding: utf-8

# ```{admonition} Information
# __Section__: Plot graphics
# __Goal__: Get some tools to create plots with Python and Pandas.
# __Time needed__: 15 min
# __Prerequisites__: Curiosity
# ```

# # Plot graphics

# As we work with data in this course, it is important to be able to graphically represent the data and the results we obtain.
#
# This notebook introduces a few basic methods for plotting the main characteristics of a dataset.
#
# To learn more about the different types of graphs, visit the page __TODO: add link to graphs page__.

# ## Plot the distribution of the attributes of a dataset (with Pandas)

# The library ``Pandas`` comes with a method to easily plot basic features of a DataFrame. The method [plot()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html), called on a Series or a DataFrame, lets you create several types of plots of the attributes.
#
# As an example, let's import a simple dataset and use the method ``plot()`` on its attributes.

# In[1]:


# import the data
import pandas as pd
df = pd.read_csv('./trip9.csv')


# The method ``hist()`` plots the histogram of a Series. The histogram shows how the values of the attribute are distributed in the dataset.

# In[4]:


# plot the histogram of the attribute 'LAT'
df['LAT'].plot.hist()


# The method ``box()`` creates the boxplot of the attribute. The boxplot is another way to represent the distribution of the attribute's values.

# In[5]:


# plot the boxplot of the attribute 'SOG'
df['SOG'].plot.box()


# The method ``scatter()`` plots the values of two attributes against each other. For example, here we plot the attributes ``LON`` and ``LAT`` together to visualize the path of the ship.

# In[8]:


# plot 'LAT' against 'LON'
df.plot.scatter(x='LON', y='LAT')


# Other types of plots are possible with this method. Visit the documentation for more information: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html.
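# The three plot types above can also be drawn side by side. The following is a minimal sketch, not the course's own code: the DataFrame ``demo`` and its values are made up so that the cell runs without ``trip9.csv``, and it borrows ``subplots()`` from Matplotlib to create three axes that are then passed to Pandas through the ``ax`` parameter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display

import matplotlib.pyplot as plt
import pandas as pd

# Made-up values standing in for the columns of trip9.csv (assumption)
demo = pd.DataFrame({
    "LON": [-122.80, -122.78, -122.76, -122.74, -122.72],
    "LAT": [45.70, 45.71, 45.73, 45.74, 45.75],
    "SOG": [8.1, 8.4, 9.0, 8.7, 8.2],
})

# One row of three axes; each Pandas plot is drawn on its own axes via `ax`
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Histogram of one attribute, with an explicit number of bins
demo["LAT"].plot.hist(bins=5, ax=axes[0])
axes[0].set_title("Histogram of LAT")

# Boxplot of another attribute
demo["SOG"].plot.box(ax=axes[1])
axes[1].set_title("Boxplot of SOG")

# Scatter plot of two attributes against each other
demo.plot.scatter(x="LON", y="LAT", ax=axes[2])
axes[2].set_title("LAT against LON")
```

# Passing ``ax`` explicitly is one way to control where each plot lands. Without it, successive Series plots may be drawn on top of each other when they are run in the same cell or script.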
# ## Plot some lists of values (with Matplotlib)

# [Matplotlib](https://matplotlib.org/index.html) is a library that provides many visualization tools for Python. In this course, we want to keep it simple and will only use a few methods of the [pyplot](https://matplotlib.org/api/pyplot_api.html) API.
#
# First, the library has to be imported:

# In[18]:


import matplotlib.pyplot as plt


# Then, we create the figure with the method ``figure()``. The parameter ``figsize`` sets the width and height of the figure, in inches:

# In[19]:


plt.figure(figsize=(12, 8))


# We can now plot any values we want. As an example, we plot the attributes ``LON`` and ``LAT`` of the dataset used above once more. We use the method [plot()](https://matplotlib.org/3.2.1/api/_as_gen/matplotlib.pyplot.plot.html), which accepts many parameters. Here, we pass ``'x'`` as the marker type and ``'orange'`` as the color. The parameter ``label`` sets the text that will appear later in the legend.

# In[24]:


plt.plot(df['LON'], df['LAT'], marker='x', color='orange', label='Path taken by ship 09')


# It is possible to add other plots to the same graph. For example, we add a single point with longitude -122.72 and latitude 45.75.

# In[27]:


plt.plot(-122.72, 45.75, marker='o', color='purple', label='Single point')


# We can combine the last 3 cells to create our final plot. In a notebook, each cell is rendered as its own figure, so the calls have to be in the same cell to end up on the same graph. We display the legend with the method ``legend()``, set a title with the method ``title()`` and name the two axes with the methods ``xlabel()`` and ``ylabel()``.

# In[28]:


plt.figure(figsize=(12, 8))
plt.plot(df['LON'], df['LAT'], marker='x', color='orange', label='Path taken by ship 09')
plt.plot(-122.72, 45.75, marker='o', color='purple', label='Single point')
plt.legend()
plt.title('Example of figure')
plt.xlabel('Longitude')
plt.ylabel('Latitude')
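# The same steps also work with plain Python lists instead of DataFrame columns. The following is a minimal sketch under that assumption: the lists ``lon`` and ``lat`` and their values are made up for illustration and do not come from ``trip9.csv``.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display

import matplotlib.pyplot as plt

# Made-up coordinates (assumption), given as plain lists of equal length
lon = [-122.80, -122.78, -122.76, -122.74]
lat = [45.70, 45.71, 45.73, 45.74]

# Create the figure, then draw both plots on it
fig = plt.figure(figsize=(12, 8))
plt.plot(lon, lat, marker="x", color="orange", label="Path (made-up values)")
plt.plot(-122.72, 45.75, marker="o", color="purple", label="Single point")

# Legend, title and axis names
plt.legend()
plt.title("Example of figure")
plt.xlabel("Longitude")
plt.ylabel("Latitude")
```

# In a notebook the figure is displayed at the end of the cell. When the same code runs as a script with an interactive backend, a call to ``plt.show()`` is usually needed to open the window.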