!pip install funpymodeling
#Import the libraries
import pandas as pd
import seaborn as sns
from pandas_profiling import ProfileReport
from funpymodeling.exploratory import cat_vars, num_vars
import numpy as np
# Load the data
data=pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv")
# Remove duplicate songs:
data=data.drop_duplicates(subset="track_id")
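A minimal sketch of what the deduplication step does, using a toy frame with made-up values rather than the Spotify data. It assumes the pandas default `keep="first"`, so the first row seen for each `track_id` survives and the original index labels are preserved.

```python
import pandas as pd

# Toy data (hypothetical values): the same track_id appears in several playlists.
toy = pd.DataFrame({
    "track_id": ["a1", "a1", "b2", "c3", "c3"],
    "playlist_name": ["Pop Hits", "Chill", "Rock", "Latin", "Party"],
    "energy": [0.8, 0.8, 0.6, 0.9, 0.9],
})

# subset="track_id" compares only that column; keep="first" is the default.
deduped = toy.drop_duplicates(subset="track_id")

print(deduped["track_id"].tolist())       # ['a1', 'b2', 'c3']
print(deduped["playlist_name"].tolist())  # ['Pop Hits', 'Rock', 'Latin']
print(deduped.index.tolist())             # [0, 2, 3]
```

Note that the index is no longer contiguous after this step, which matters later only if rows are re-aligned by index label instead of by position.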
#Keep only the numeric variables
x_data=data.drop(cat_vars(data), axis=1)
# Drop some additional variables that add no value
x_data=x_data.drop(['key','speechiness', 'mode', 'tempo', 'duration_ms'], axis=1)
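As a rough alternative to the `cat_vars` step, the same idea can be sketched with plain pandas. This is an assumption about intent (keep numeric columns, then drop a few by name), not a claim that `select_dtypes` returns exactly the same columns as `cat_vars`; the toy column values are invented.

```python
import pandas as pd

# Toy frame (hypothetical values) mixing text and numeric columns.
toy = pd.DataFrame({
    "track_name": ["Song A", "Song B", "Song C"],
    "playlist_genre": ["pop", "rock", "latin"],
    "danceability": [0.71, 0.55, 0.80],
    "key": [1, 5, 9],
    "loudness": [-5.2, -7.9, -4.1],
})

# Keep every column with a numeric dtype (ints and floats).
numeric_only = toy.select_dtypes(include="number")

# Then drop numeric columns we do not want in the model, by name.
x_toy = numeric_only.drop(columns=["key"])

print(x_toy.columns.tolist())  # ['danceability', 'loudness']
```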
#Import the library used to scale the data
from sklearn.preprocessing import StandardScaler
#Create the scaler object
scaler = StandardScaler()
#Apply the transformation
x_scaled = scaler.fit_transform(x_data)
#Important: the data must contain no nulls and must be entirely numeric
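To make the scaling step concrete, here is a small NumPy-only sketch of the arithmetic behind standardization: subtract each column's mean and divide by its population standard deviation (`ddof=0`), which is the convention `StandardScaler` uses. The helper name `standardize` and the toy matrix are hypothetical, and the explicit null check is one way to enforce the requirement stated above.

```python
import numpy as np

def standardize(x):
    """Column-wise z-score with population std (ddof=0). Hypothetical helper."""
    x = np.asarray(x, dtype=float)  # fails early if the data is not numeric
    if np.isnan(x).any():
        raise ValueError("the data contains nulls; impute or drop them first")
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Toy matrix (made-up values): 3 rows, 2 features on very different scales.
x_toy = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 60.0]])

x_toy_scaled = standardize(x_toy)

# After scaling, every column has mean ~0 and standard deviation ~1.
print(np.allclose(x_toy_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(x_toy_scaled.std(axis=0), 1.0))   # True
```

Scaling matters here because both PCA and UMAP work with distances or variances, so a feature measured on a larger scale would otherwise dominate.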
We create the model and fit it:
#Import the library
from sklearn.decomposition import PCA
#Create the PCA object
model_pca = PCA()
#Apply PCA
x_pca=model_pca.fit_transform(x_scaled)
#Explained variance of the components
var_explicada_pca = model_pca.explained_variance_ratio_
var_explicada_pca
array([0.26718277, 0.17844569, 0.14004096, 0.12199857, 0.10753268, 0.08223481, 0.0748838 , 0.02768073])
Interpretation:
The first component accounts for about 26.7% of the total variance, the second for about 17.8%, and so on.
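A common follow-up to this reading is the cumulative explained variance, which shows how many components are needed to reach a given share of the total. The sketch below reuses the ratios printed above; the 80% threshold is an arbitrary assumption chosen for illustration.

```python
import numpy as np

# Explained variance ratios copied from the PCA output above.
var_explicada_pca = np.array([0.26718277, 0.17844569, 0.14004096, 0.12199857,
                              0.10753268, 0.08223481, 0.0748838, 0.02768073])

# Running total: entry k is the variance explained by the first k+1 components.
var_acumulada = np.cumsum(var_explicada_pca)

# Smallest number of components whose cumulative share reaches the threshold.
umbral = 0.80  # assumed threshold, not from the original analysis
n_componentes = int(np.argmax(var_acumulada >= umbral)) + 1

print(n_componentes)  # 5
```

With these values the first two components together explain roughly 44.6% of the variance, so a two-dimensional PCA plot would leave more than half of the variation out of view.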
UMAP is a nonlinear dimensionality reduction method that is very effective for visualizing clusters or groups of data points and their relative proximities.
!pip3 install umap-learn
import umap #pip3 install umap-learn
#Create the scaler object for standardization
scaler = StandardScaler()
#Apply the standardization
x_scaled = scaler.fit_transform(x_data)
#Create the UMAP object
model_umap = umap.UMAP()
#Run UMAP
model_umap_fit_transform = model_umap.fit_transform(x_scaled)
model_umap_fit_transform.shape
(28356, 2)
Interactive visualization with Plotly!
data2 = data.copy() #Work on a copy
data2['cancion'] = data2['track_artist'] + ' | ' + data2['track_name'] #Create a new variable called cancion
data2[['dim1', 'dim2']] = model_umap_fit_transform #Add the two dimensions generated by UMAP
import plotly.express as px
fig = px.scatter(data2, x="dim1", y="dim2", color="track_popularity", hover_data=['cancion'])
fig.show()
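One detail worth checking when attaching the two embedding columns: after `drop_duplicates` the frame's index is no longer 0..n-1, while the embedding is a plain array ordered by row position. The sketch below shows one index-safe way to attach such an array, using a toy frame and made-up numbers as a stand-in for the UMAP output; it is an illustration under those assumptions, not the notebook's own approach.

```python
import numpy as np
import pandas as pd

# Toy frame with a non-contiguous index, as left behind by drop_duplicates.
toy = pd.DataFrame({"cancion": ["A | x", "B | y", "C | z"]}, index=[0, 2, 5])

# Stand-in for the embedding (hypothetical numbers), one row per song.
embedding = np.array([[0.1, 1.0],
                      [0.2, 2.0],
                      [0.3, 3.0]])

# Reusing toy.index keeps each embedding row aligned with its song.
dims = pd.DataFrame(embedding, columns=["dim1", "dim2"], index=toy.index)
toy2 = toy.join(dims)

print(toy2.loc[5, "dim2"])           # 3.0
print(int(toy2.isna().sum().sum()))  # 0
```

If the embedding were wrapped in a DataFrame without `index=toy.index`, it would get a fresh 0..n-1 index and the join would leave missing values for labels such as 5.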