(whole-game)=
Our goal in this part of the book is to give you a rapid overview of the main tools of data science: importing, cleaning, transforming, and visualising data, as shown in the figure below. We want to show you the “whole game” of data science, giving you just enough of each major piece that you can tackle real, if simple, datasets. The later parts of the book cover each of these topics in more depth, increasing the range of data science challenges that you can tackle.
import matplotlib_inline.backend_inline
matplotlib_inline.backend_inline.set_matplotlib_formats("svg")
# remove-input
import graphviz
dot = graphviz.Digraph(comment="Data science workflow")
dot.attr(compound="true")
dot.edge("Import", "Clean")
with dot.subgraph(name="cluster_0") as c:
    c.attr(style="filled", color="lightgrey")
    c.node_attr.update(style="filled", color="white")
    c.edges(
        [("Visualise", "Analyse"), ("Analyse", "Transform"), ("Transform", "Visualise")]
    )
    c.attr(label="Understand")
dot.edge("Clean", "Analyse", lhead="cluster_0")
dot.edge("Analyse", "Communicate", ltail="cluster_0")
dot
After this chapter, we have four main chapters that focus on the tools of data science:

- In {ref}`data-visualise`, you’ll dive into visualisation, learning the basic structure of a plot, and powerful techniques for turning data into plots.
- In {ref}`data-transform`, you’ll learn the key verbs that allow you to select important variables, filter out key observations, create new variables, and compute summaries.
- In {ref}`data-tidy`, you’ll learn about cleaning data and specifically "tidy" data, a consistent way of storing tabular data that makes transformation, visualisation, and modelling easier. You’ll learn the underlying principles, and how to get your data into a "tidy" format.
- In {ref}`data-import`, you’ll learn the basics of getting .csv files into your Python session.

These are interspersed with four other chapters that focus on your Python workflow:

- In {ref}`workflow-basics`, {ref}`workflow-style`, and {ref}`workflow-writing-code`, you'll learn good workflow practices for writing and organising your code.
- In {ref}`workflow-packages-and-environments`, you'll learn more about packages and isolating your projects in separate code environments.

Finally, {ref}`workflow-help` contains some short advice on how to get help and keep learning.
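
To give a flavour of how these pieces fit together, here is a minimal sketch of the import, tidy, and transform steps using pandas. The dataset, its column names, and its values are hypothetical and invented purely for illustration, and the text is read from an in-memory string rather than a real .csv file so that the example is self-contained. The later chapters may well use different tools or approaches; treat this as one possible illustration rather than the book's method.

```python
import io

import pandas as pd

# Hypothetical data standing in for a .csv file on disk (an assumption
# made for illustration; the values mean nothing).
csv_text = """country,year_2020,year_2021
A,10,12
B,20,18
C,30,33
"""

# Import: read the CSV text into a data frame.
df = pd.read_csv(io.StringIO(csv_text))

# Tidy: reshape so that each row is one country-year observation.
tidy = df.melt(id_vars="country", var_name="year", value_name="value")
tidy["year"] = tidy["year"].str.replace("year_", "", regex=False).astype(int)

# Transform: filter observations, create a new variable, compute a summary.
recent = tidy[tidy["year"] == 2021].copy()
recent["value_doubled"] = recent["value"] * 2
summary = tidy.groupby("year")["value"].mean()

print(tidy)
print(summary)
```

Reshaping to one row per observation before transforming is a design choice: once the year is stored in its own column, filtering and grouping by year each take a single line.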