#!/usr/bin/env python
# coding: utf-8
# # Lets-Plot Usage Guide
#
# - [System requirements](#sys)
# - [Installation](#install)
# - [Understanding architecture](#implementation)
# - [Learning API](#api)
# - [Getting started](#gsg)
#
#
# **Lets-Plot** is an open-source plotting library for statistical data. It is implemented in the
# [Kotlin programming language](https://kotlinlang.org/), which is multiplatform by design.
# As a result, Lets-Plot provides its plotting functionality
# packaged as a JavaScript library, a JVM library, and a native Python extension.
#
# The design of the Lets-Plot library is heavily influenced by the
# [ggplot2](https://ggplot2.tidyverse.org) library.
#
#
# ## System requirements
# When installing the Lets-Plot library, consider the following requirements.
#
# Supported operating systems:
# - macOS
# - Linux
# - Windows
#
# Supported Python versions:
# - 3.7
# - 3.8
# - 3.9
# - 3.10
# - 3.11
# - 3.12
#
#
# ## Installation
#
# The `lets-plot` package is available in the [pypi.org](https://pypi.org/project/lets-plot/) repository.
# Run the following command to install the `lets-plot` package into your Python environment:
#
# `pip install lets-plot`
#
#
# ## Understanding Lets-Plot architecture
# In `lets-plot`, a **plot** consists of at least one
# **layer**. A plot can be built from a default dataset with aesthetic mappings, a set of scales, or additional
# features applied.
#
# A **layer** is responsible for creating the objects painted on the ‘canvas’. It contains the following elements:
# - **Data** - the set of data specified either once for all layers or on a per layer basis.
# One plot can combine multiple different datasets (one per layer).
# - **Aesthetic mapping** - describes how variables in the dataset are mapped to the visual properties of the layer, such as color, shape, size, or position.
# - **Geometric object** - a geometric object that represents a particular type of plot.
# - **Statistical transformation** - computes some kind of statistical summary on the raw input data.
# For example, the `bin` stat is used for histograms and the `smooth` stat is used for regression lines.
# Most stats take additional parameters to specify details of the statistical transformation of data.
# - **Position adjustment** - a method used to compute the final coordinates of geometry.
# Used to build variants of the same `geom` object or to avoid overplotting.
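#
# As a rough illustration of how these elements fit together, the sketch below spells out
# all five of them for a single layer. The `values` column and its numbers are made up for
# the example, and the `stat` and `position` arguments only restate what `geom_histogram()`
# is assumed to use by default.
#
# ```python
# from lets_plot import *
#
# # Data: a small, made-up dataset passed as a plain dictionary.
# data = {'values': [1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 4.5]}
#
# p = ggplot() + geom_histogram(   # Geometric object: histogram bars
#     aes(x='values'),             # Aesthetic mapping: `values` -> x axis
#     data=data,                   # Data: specified for this layer only
#     stat='bin',                  # Statistical transformation: count points per bin
#     position='stack',            # Position adjustment: stack overlapping bars
# )
# ```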
#
#
#
#
# ## Learning API
# The typical code fragment that renders a plot looks as follows:
#
# ```
# from lets_plot import *
# p = ggplot(data)
# p + geom_xxx(mapping=aes('x', 'y', other='value'), stat='stat_name', position='position_name')
# ```
#
# ### Geometric objects `geom`
#
# You can add a new geometric object (or plot layer) by creating it using the `geom_xxx()` function and then adding this object to `ggplot`:
#
# ```
# p = ggplot(data=df)
# p + geom_point()
# ```
#
# The following plots are supported:
#
# - Area plot: [`geom_area()`](https://lets-plot.org/python/pages/api/lets_plot.geom_area.html)
# - Discrete plot: [`geom_bar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_bar.html), [`geom_pie()`](https://lets-plot.org/python/pages/api/lets_plot.geom_pie.html), [`geom_lollipop()`](https://lets-plot.org/python/pages/api/lets_plot.geom_lollipop.html), [`geom_count()`](https://lets-plot.org/python/pages/api/lets_plot.geom_count.html), [`stat_sum()`](https://lets-plot.org/python/pages/api/lets_plot.stat_sum.html)
# - Boxplot: [`geom_boxplot()`](https://lets-plot.org/python/pages/api/lets_plot.geom_boxplot.html)
# - Contours: [`geom_contour()`](https://lets-plot.org/python/pages/api/lets_plot.geom_contour.html), [`geom_contourf()`](https://lets-plot.org/python/pages/api/lets_plot.geom_contourf.html)
# - Connectors: [`geom_path()`](https://lets-plot.org/python/pages/api/lets_plot.geom_path.html), [`geom_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_line.html), [`geom_segment()`](https://lets-plot.org/python/pages/api/lets_plot.geom_segment.html), [`geom_curve()`](https://lets-plot.org/python/pages/api/lets_plot.geom_curve.html), [`geom_spoke()`](https://lets-plot.org/python/pages/api/lets_plot.geom_spoke.html), [`geom_step()`](https://lets-plot.org/python/pages/api/lets_plot.geom_step.html)
# - Density plot: [`geom_density()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density.html), [`geom_area_ridges()`](https://lets-plot.org/python/pages/api/lets_plot.geom_area_ridges.html), [`geom_violin()`](https://lets-plot.org/python/pages/api/lets_plot.geom_violin.html)
# and [`geom_density2d()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density2d.html), [`geom_density2df()`](https://lets-plot.org/python/pages/api/lets_plot.geom_density2df.html)
# - Error-bar plot: [`geom_errorbar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_errorbar.html), [`geom_crossbar()`](https://lets-plot.org/python/pages/api/lets_plot.geom_crossbar.html), [`geom_linerange()`](https://lets-plot.org/python/pages/api/lets_plot.geom_linerange.html), [`geom_pointrange()`](https://lets-plot.org/python/pages/api/lets_plot.geom_pointrange.html)
# - Histogram: [`geom_freqpoly()`](https://lets-plot.org/python/pages/api/lets_plot.geom_freqpoly.html), [`geom_histogram()`](https://lets-plot.org/python/pages/api/lets_plot.geom_histogram.html) and [`geom_bin2d()`](https://lets-plot.org/python/pages/api/lets_plot.geom_bin2d.html)
# - Jitter plot: [`geom_jitter()`](https://lets-plot.org/python/pages/api/lets_plot.geom_jitter.html)
# - Line plot: [`geom_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_line.html)
# - Reference lines: [`geom_abline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_abline.html), [`geom_hline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_hline.html), [`geom_vline()`](https://lets-plot.org/python/pages/api/lets_plot.geom_vline.html)
# - Polygons: [`geom_polygon()`](https://lets-plot.org/python/pages/api/lets_plot.geom_polygon.html)
# - Rectangles, Tiles, Raster: [`geom_rect()`](https://lets-plot.org/python/pages/api/lets_plot.geom_rect.html), [`geom_tile()`](https://lets-plot.org/python/pages/api/lets_plot.geom_tile.html), [`geom_raster()`](https://lets-plot.org/python/pages/api/lets_plot.geom_raster.html)
# - Ribbons: [`geom_ribbon()`](https://lets-plot.org/python/pages/api/lets_plot.geom_ribbon.html)
# - Scatter plot: [`geom_point()`](https://lets-plot.org/python/pages/api/lets_plot.geom_point.html)
# - Dot plot: [`geom_dotplot()`](https://lets-plot.org/python/pages/api/lets_plot.geom_dotplot.html), [`geom_ydotplot()`](https://lets-plot.org/python/pages/api/lets_plot.geom_ydotplot.html)
# - Regression lines: [`geom_smooth()`](https://lets-plot.org/python/pages/api/lets_plot.geom_smooth.html)
# - Q-Q plot: [`geom_qq()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq.html), [`geom_qq_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq_line.html), [`geom_qq2()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq2.html), [`geom_qq2_line()`](https://lets-plot.org/python/pages/api/lets_plot.geom_qq2_line.html)
# - ECDF plot: [`stat_ecdf()`](https://lets-plot.org/python/pages/api/lets_plot.stat_ecdf.html)
# - Summary: [`stat_summary()`](https://lets-plot.org/python/pages/api/lets_plot.stat_summary.html), [`stat_summary_bin()`](https://lets-plot.org/python/pages/api/lets_plot.stat_summary_bin.html)
# - Function plot: [`geom_function()`](https://lets-plot.org/python/pages/api/lets_plot.geom_function.html)
# - Text: [`geom_text()`](https://lets-plot.org/python/pages/api/lets_plot.geom_text.html), [`geom_label()`](https://lets-plot.org/python/pages/api/lets_plot.geom_label.html)
# - Map: [`geom_map()`](https://lets-plot.org/python/pages/api/lets_plot.geom_map.html)
# - Image: [`geom_imshow()`](https://lets-plot.org/python/pages/api/lets_plot.geom_imshow.html)
#
# See the [geom reference](https://lets-plot.org/python/pages/charts.html) for more information about the supported
# geometric methods, their arguments, and default values.
#
# ### Collections of plots
# With a [`GGBunch()`](https://lets-plot.org/python/pages/api/lets_plot.GGBunch.html) object, you can
# render a collection of plots.
# Use the `add_plot()` method to add a plot to the bunch and to set an arbitrary location and size for each plot inside the grid:
#
# ```
# bunch = GGBunch()
# bunch.add_plot(plot1, 0, 0)
# bunch.add_plot(plot2, 0, 200)
# ```
#
# See the [GGBunch](https://nbviewer.jupyter.org/github/JetBrains/lets-plot-docs/blob/master/source/examples/cookbook/ggbunch.ipynb) example for more information.
#
# ### Stat `stat`
#
# Pass `stat` as an argument to a `geom_xxx()` function to define the statistical data transformation:
#
# `geom_point(stat='count')`
#
# Supported transformations:
#
# - `identity`: leave the data unchanged
# - `count`: calculate the number of points with the same x-axis coordinate
# - `bin`: calculate the number of points falling in each of the adjacent, equally sized ranges along the x-axis
# - `bin2d`: calculate the number of points falling in each of the adjacent, equally sized rectangles on the plot plane
# - `smooth`: perform smoothing
# - `contour`, `contourf`: calculate contours of 3D data
# - `boxplot`: calculate components of a box plot
# - `density`, `density2d`, `density2df`: perform a kernel density estimation for 1D and 2D data
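#
# To make the first few transformations concrete, here is a plain-Python sketch of what
# `count` and `bin` compute. It is not Lets-Plot's implementation: the helper names
# `count_stat` and `bin_stat`, the sample values, and the choice of three bins are
# assumptions made for the example.
#
# ```python
# from collections import Counter
#
# def count_stat(xs):
#     """Number of points sharing each x value (what `count` computes)."""
#     return dict(sorted(Counter(xs).items()))
#
# def bin_stat(xs, bins):
#     """Number of points in each of `bins` equally sized ranges along x."""
#     lo, hi = min(xs), max(xs)
#     width = (hi - lo) / bins
#     counts = [0] * bins
#     for x in xs:
#         # The maximum value belongs to the last bin, not to a bin of its own.
#         index = min(int((x - lo) / width), bins - 1)
#         counts[index] += 1
#     return counts
#
# xs = [1, 1, 2, 4, 4, 4, 7]
# print(count_stat(xs))        # {1: 2, 2: 1, 4: 3, 7: 1}
# print(bin_stat(xs, bins=3))  # [3, 3, 1]
# ```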
#
# ### Aesthetic mappings `mapping`
# With mappings, you can define how variables in the dataset are mapped to the visual elements of the plot.
# Pass the result of the `aes(x, y, other)` function to `geom`, where:
# - `x`: the dataframe column to map to the x axis.
# - `y`: the dataframe column to map to the y axis.
# - `other`: other visual properties of the plot, such as color, shape, size, or position.
#
# For example, instead of the full form
# `geom_bar(aes(x='cty', y='hwy', color='cyl'))`
# you can use a simplified form:
# `geom_bar(aes('cty', 'hwy', color='cyl'))`
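#
# A mapping can be set in two places, which the following sketch illustrates: `aes()` passed
# to `ggplot()` is shared by all layers, while `aes()` passed to a `geom` applies to that
# layer only. The dataset is made up for the example; only its column names
# (`cty`, `hwy`, `cyl`) mirror the mpg dataset.
#
# ```python
# from lets_plot import *
#
# # Made-up values; only the column names follow the mpg dataset.
# data = {'cty': [18, 21, 16, 14], 'hwy': [29, 29, 26, 20], 'cyl': [4, 4, 6, 8]}
#
# # Plot-level mapping: every layer inherits x and y.
# p = ggplot(data, aes(x='cty', y='hwy'))
#
# # Layer-level mapping: only the point layer maps `cyl` to color.
# p = p + geom_point(aes(color='cyl')) + geom_line()
# ```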
#
# ### Position adjustment `position`
#
# All layers have a position adjustment that computes the final coordinates of geometry.
# Position adjustment is used to build variants of the same plot and to resolve overlapping.
# Override the default settings by using the `position` argument in the `geom` functions:
#
# `geom_bar(position='dodge')`
#
# Available adjustments:
# - `dodge`
# - `jitter`
# - `jitterdodge`
# - `nudge`
# - `identity`
# - `fill`
# - `stack`
#
# See the [position reference](https://lets-plot.org/python/pages/api.html#positions) for more information about position adjustments.
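#
# The arithmetic behind two of these adjustments can be sketched in plain Python. This is
# an illustration rather than Lets-Plot's code: the helper names and sample values are
# assumptions. For bars that share one x position, `stack` places each bar on top of the
# previous one, and `fill` does the same after rescaling the bars so that they sum to 1.
#
# ```python
# def stack(heights):
#     """Return (ymin, ymax) for each bar when bars are stacked on top of each other."""
#     spans, base = [], 0.0
#     for h in heights:
#         spans.append((base, base + h))
#         base += h
#     return spans
#
# def fill(heights):
#     """Like `stack`, but heights are first normalized to sum to 1."""
#     total = sum(heights)
#     return stack([h / total for h in heights])
#
# heights = [2.0, 1.0, 1.0]     # three groups at the same x position
# print(stack(heights))         # [(0.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
# print(fill(heights))          # [(0.0, 0.5), (0.5, 0.75), (0.75, 1.0)]
# ```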
#
# ### Features affecting the entire plot
#
# #### Scales
#
# Lets-Plot chooses a reasonable default scale for each mapped variable depending on the variable attributes. Override the default scales to tweak
# details like the axis labels or legend keys, or to use a completely different translation from data to aesthetic.
# For example, to override the fill color on the histogram:
#
# `p + geom_histogram() + scale_fill_brewer(name="Trend", palette="RdPu")`
#
# See the list of the available `scale` methods in the [scale reference](https://lets-plot.org/python/pages/api.html#scales).
#
# #### Coordinate system
#
# The coordinate system determines how the x and y aesthetics combine to position elements in the plot.
# For example, to override the default X and Y ratio:
#
# `p + coord_fixed(ratio=2)`
#
# See the list of the available methods in the [coordinates reference](https://lets-plot.org/python/pages/api.html#coordinates).
#
# #### Legend
# The axes and legends help users interpret plots.
# Use the `guide` methods or the `guide` argument of the `scale` method to customize the legend.
# For example, to define the number of columns in the legend:
#
# `p + scale_color_discrete(guide=guide_legend(ncol=2))`
#
# See more information in the [guide reference](https://lets-plot.org/python/pages/api.html#scale-guides).
#
# Adjust the legend location on the plot using the `legend_position`, `legend_justification`, and `legend_direction` parameters of the `theme()` function, see:
# [TBD]
#
#
# #### Sampling
#
# Sampling is a special data transformation technique built into Lets-Plot. It is applied after the stat transformation.
# Sampling helps prevent UI freezes and out-of-memory crashes when attempting to plot an excessively large number of geometries.
# By default, sampling is applied automatically when the data volume exceeds a certain threshold.
# Setting `sampling='none'` disables sampling for the given layer. Sampling methods can be chained together using the `+` operator.
#
# Available methods:
# - `sampling_random_stratified`: randomly selects points from each group proportionally to the group size but also ensures
# that each group is represented by at least a specified minimum number of points.
# - `sampling_random`: selects data points at randomly chosen indices without replacement.
# - `sampling_pick`: analyses X-values and selects all points whose X-values fall within the set of the first `n` X-values found in the population.
# - `sampling_systematic`: selects data points at evenly distributed indices.
# - `sampling_vertex_dp`, `sampling_vertex_vw`: simplifies plotting of polygons.
# There is a choice of two implementation algorithms: Douglas-Peucker (`_dp`) and
# Visvalingam-Whyatt (`_vw`).
#
# For more details, see the [sampling reference](https://lets-plot.org/python/pages/sampling.html).
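#
# The index selection behind two of these methods can be sketched in plain Python. The
# helper names and the fixed seed are assumptions made for the example. They are not part
# of the Lets-Plot API and do not reproduce its exact selection rules.
#
# ```python
# import random
#
# def systematic_indices(size, n):
#     """Pick at most `n` evenly spaced indices out of `size` data points."""
#     step = max(size // n, 1)
#     return list(range(0, size, step))[:n]
#
# def random_indices(size, n, seed=None):
#     """Pick `n` distinct indices at random (sampling without replacement)."""
#     rng = random.Random(seed)
#     return sorted(rng.sample(range(size), min(n, size)))
#
# print(systematic_indices(10, 5))      # [0, 2, 4, 6, 8]
# picked = random_indices(10, 5, seed=42)
# print(len(picked), len(set(picked)))  # 5 5
# ```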
#
# ## Getting started
#
# Let's build a point plot using the mpg dataset.
#
# Create the `DataFrame` object and retrieve the data.
# In[1]:
# Data set
import pandas as pd
mpg = pd.read_csv("https://raw.githubusercontent.com/JetBrains/lets-plot-docs/master/data/mpg.csv")
mpg.head()
# Plot the basic point plot.
# In[2]:
# Basic plotting
from lets_plot import *
# Load Lets-Plot JS library
LetsPlot.setup_html()
# Perform the following aesthetic mappings:
# - `x` = displ (the **displ** column of the dataframe)
# - `y` = hwy (the **hwy** column of the dataframe)
# - `color` = cyl (the **cyl** column of the dataframe)
# In[3]:
p = ggplot(mpg)
p + geom_point(aes('displ', 'hwy', color='cyl'))
# Apply a statistical data transformation to count the number of cases at each x position.
# In[4]:
p + geom_point(aes('displ', size='..count..', color='..count..'), stat='count')
# Change the palette and the legend, and add a title.
# In[5]:
p += scale_color_continuous(low="blue", high="pink", guide=guide_legend(ncol=2)) \
+ ggtitle('Highway MPG by displacement')
p + geom_point(aes('displ', 'hwy', color='cyl'), position='jitter')
# Apply random stratified sampling to select points from each group proportionally
# to the group size.
# In[6]:
p + geom_point(
    aes('displ', 'hwy', color='cyl'),
    position='jitter',
    sampling=sampling_random_stratified(40))