Last Update: 20/02/2022
PyCaret Version: 2.3
Author: Haithem Hermessi
Email: haithem.hermessi@fst.utm.tn
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
About the dataset: it contains more than 65,000 traffic-violation records.
Attribute Information:
stop_date - Date of violation
stop_time - Time of violation
driver_gender - Gender of violators (Male-M, Female-F)
driver_age - Age of violators
driver_race - Race of violators
violation - Category of violation:
  - Speeding
  - Moving Violation (Reckless Driving, Hit and run, Assaulting another driver, pedestrian, improper turns and lane changes, etc.)
  - Equipment (Window tint violations, Headlight/taillights out, Loud exhaust, Cracked windshield, etc.)
  - Registration/Plates
  - Seat Belt
  - Other (Call for Service, Violation of City/Town Ordinance, Suspicious Person, Motorist Assist/Courtesy, etc.)
search_conducted - Whether search is conducted in True and False form
stop_outcome - Result of violation
is_arrested - Whether a person was arrested in True and False form
stop_duration - Detained time for violators approx (in minutes)
drugs_related_stop - Whether the stop was drug-related (True, False)
!pip install pycaret==2.3
Requirement already satisfied: pycaret==2.3 in /usr/local/lib/python3.7/dist-packages (2.3.0)
# Pycaret
from pycaret.regression import *
from pycaret.utils import enable_colab
enable_colab()
--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) RuntimeError: module compiled against API version 0xe but this version of numpy is 0xd
Colab mode enabled.
/usr/local/lib/python3.7/dist-packages/distributed/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details. defaults = yaml.load(f)
# importing the required libraries
import pandas as pd
import numpy as np
# Visualisation libraries
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.graph_objects as go
import missingno as ms
# Manipulating the default plot size
plt.rcParams['figure.figsize'] = 10, 12
# Disable warnings
import warnings
warnings.filterwarnings('ignore')
# Read the dataset from a .csv file into a pandas DataFrame, skipping malformed lines
df = pd.read_csv('Traffic Violaions.csv', encoding="utf-8", on_bad_lines='skip')
dt = df.copy()
print(df.shape)
(52966, 15)
df.head()
| | stop_date | stop_time | country_name | driver_gender | driver_age_raw | driver_age | driver_race | violation_raw | violation | search_conducted | search_type | stop_outcome | is_arrested | stop_duration | drugs_related_stop |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1/2/2005 | 1:55 | NaN | M | 1985.0 | 20.0 | White | Speeding | Speeding | False | NaN | Citation | False | 0-15 Min | False |
1 | 1/18/2005 | 8:15 | NaN | M | 1965.0 | 40.0 | White | Speeding | Speeding | False | NaN | Citation | False | 0-15 Min | False |
2 | 1/23/2005 | 23:15 | NaN | M | 1972.0 | 33.0 | White | Speeding | Speeding | False | NaN | Citation | False | 0-15 Min | False |
3 | 2/20/2005 | 17:15 | NaN | M | 1986.0 | 19.0 | White | Call for Service | Other | False | NaN | Arrest Driver | True | 16-30 Min | False |
4 | 3/14/2005 | 10:00 | NaN | F | 1984.0 | 21.0 | White | Speeding | Speeding | False | NaN | Citation | False | 0-15 Min | False |
ms.matrix(df)
<matplotlib.axes._subplots.AxesSubplot at 0x7f5aeb3df090>
df1 = df.copy(deep=True)
df1.drop(columns = ['country_name','search_type','driver_age_raw'], inplace=True)
df1.dropna(subset = ['driver_gender'], inplace = True)
df1.isna().sum()
stop_date               0
stop_time               0
driver_gender           0
driver_age            240
driver_race             0
violation_raw           0
violation               0
search_conducted        0
stop_outcome            0
is_arrested             0
stop_duration           0
drugs_related_stop      0
dtype: int64
1. Some missing values remain in the driver_age column. We fill them with the median age of the driver's gender.
2. After cleaning, we check again for any remaining missing values.
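The two steps above can be sketched on a small made-up frame. This is only an illustration under assumptions: the `toy` DataFrame and its values are invented, and only the column names follow the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the traffic data (values are made up).
toy = pd.DataFrame({
    "driver_gender": ["M", "M", "M", "F", "F", "F"],
    "driver_age": [20.0, 40.0, np.nan, 30.0, np.nan, 50.0],
})

# Median age within each gender, broadcast back to every row.
group_median = toy.groupby("driver_gender")["driver_age"].transform("median")

# Fill only the missing ages; existing values are left untouched.
toy["driver_age"] = toy["driver_age"].fillna(group_median)

# Step 2: re-check that nothing is missing afterwards.
remaining = int(toy["driver_age"].isna().sum())
print(toy)
print("missing after fill:", remaining)
```

Because `transform` returns a Series aligned to the original index, `fillna` can use it directly without a merge.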
df2 = df1.copy(deep=True)
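# Fill missing ages with the median age of the driver's gender
df2['driver_age'] = df2['driver_age'].fillna(df2.groupby('driver_gender')['driver_age'].transform('median'))
# Re-check missing values on the cleaned frame
ms.matrix(df2)
del df, df1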
data = df2.copy(deep=True)
pd.to_datetime(data['stop_time'])
data['stop_hour'] = pd.to_datetime(data['stop_time'], format = '%H:%M').dt.hour
data['stop_duration'].value_counts()
data['stop_duration'] = data['stop_duration'].map({'0-15 Min':7.5,'16-30 Min':23,'30+ Min':45})
stop_duration_based_on_race = data.groupby('driver_race')[['stop_duration']].mean()
data.search_conducted = data.search_conducted.replace(to_replace=[True, False], value=[1, 0])
data.drugs_related_stop = data.drugs_related_stop.replace(to_replace=[True, False], value=[1, 0])
data['stop_date'] = pd.to_datetime(data['stop_date'])
data['stop_Year'] = pd.DatetimeIndex(data['stop_date']).year
yearly_data = data.groupby('stop_Year').sum(numeric_only=True)
yearly_data.reset_index(inplace = True)
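# search_conducted was recoded to 1/0 above; `is True` tests object identity and never matches a Series, so compare elementwise
search_conducted = len(data[data.search_conducted == 1])
arrested_after_search = len(data[(data.search_conducted == 1) & (data.is_arrested == True)])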
arrested = ((arrested_after_search/search_conducted)*100)
not_arrested = (100-(arrested))
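Counting rows that satisfy a condition needs an elementwise comparison (`==`) that yields a boolean mask; the identity test `is True` on a whole Series evaluates to a single `False`. A minimal sketch of the arrest-rate calculation follows, assuming a made-up `toy` frame whose column names mirror the dataset.

```python
import pandas as pd

# Made-up rows; search_conducted is already recoded to 1/0.
toy = pd.DataFrame({
    "search_conducted": [1, 0, 1, 1, 0],
    "is_arrested": [True, False, False, True, False],
})

# Elementwise masks: one boolean per row.
searched = toy["search_conducted"] == 1
arrested_mask = toy["is_arrested"] == True

# Summing a boolean mask counts the True entries.
n_searched = int(searched.sum())
n_arrested_after_search = int((searched & arrested_mask).sum())

arrested_pct = 100 * n_arrested_after_search / n_searched
not_arrested_pct = 100 - arrested_pct
print(f"arrested after search: {arrested_pct:.1f}%")
print(f"not arrested after search: {not_arrested_pct:.1f}%")
```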
Age Distribution
sns.displot(x = 'driver_age', hue = 'driver_gender', kde = True, data = data,
multiple = 'stack', alpha = 0.8, palette = "bright", height=5, aspect=2)
plt.title('Age Distribution Based on Gender')
plt.xlabel("Driver's Age")
plt.ylabel("No of People");
Distribution in Violation Type
fig, ax = plt.subplots(figsize=(12, 5))
ax = sns.countplot(x=data.violation, data=data, order=data.violation.value_counts().index, palette="bright")
# Annotate each bar with its share of all stops
for i in ax.patches:
    percentage = '{:.1f}%'.format(100 * i.get_height() / len(data.violation))
    x = i.get_x() + i.get_width() - 0.6
    y = i.get_height()
    ax.annotate(percentage, (x, y))
plt.title("Distribution in Violation Type")
plt.xlabel("Violation Type")
plt.ylabel("No. of People Involved");
Hours in Which Vehicles Were Stopped
plt.figure(figsize = (12,5))
sns.countplot(x = data.stop_hour,data = data,hue = 'driver_gender', palette = "bright")
plt.title('Hours vs. No. of Vehicles Stopped')
plt.legend(['Male','Female'])
plt.xlabel("Stop Hour")
plt.ylabel("No. of Vehicles");
Traffic Violation Distribution Based on Race
fig , ax = plt.subplots(figsize = (12,5))
ax = sns.countplot(x=data.driver_race, data=data, order=data.driver_race.value_counts().index,
                   linewidth=0, palette="bright")
# Annotate each bar with its share of all stops
for i in ax.patches:
    percentage = '{:.1f}%'.format(100 * i.get_height() / len(data.driver_race))
    x = i.get_x() + i.get_width() - 0.6
    y = i.get_height()
    ax.annotate(percentage, (x, y))
plt.title('Traffic Violation Based on Race')
plt.xlabel("Driver's Race")
plt.ylabel("No. of People");
# Dataset Sampling
RANDOM_SEED = 42
K_FOLDS = 5
def data_sampling(dataset, frac: float, random_seed: int):
    # Draw a random fraction of the rows; the rows left over form the second sample
    data_sampled_a = dataset.sample(frac=frac, random_state=random_seed)
    data_sampled_b = dataset.drop(data_sampled_a.index).reset_index(drop=True)
    data_sampled_a.reset_index(drop=True, inplace=True)
    return data_sampled_a, data_sampled_b
# Keep 90% of the dataset for modelling and hold out a random 10% as unseen data for predictions.
data, data_unseen = data_sampling(df2, 0.9, RANDOM_SEED)
print(f"There are {data_unseen.shape[0]} samples for Unseen Data.")
There are 4958 samples for Unseen Data.
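A quick way to sanity-check such a hold-out split is to confirm that the two parts have the expected sizes and share no rows. The sketch below is an illustration only: `split_holdout` and the `row_id` column are hypothetical names, and the 100-row frame is made up.

```python
import pandas as pd

def split_holdout(dataset, frac, random_seed):
    """Return (kept, held_out): a random `frac` of rows and the remainder."""
    kept = dataset.sample(frac=frac, random_state=random_seed)
    held_out = dataset.drop(kept.index)
    return kept.reset_index(drop=True), held_out.reset_index(drop=True)

# Hypothetical frame with a unique id per row so overlap can be checked.
toy = pd.DataFrame({"row_id": range(100)})
kept, held_out = split_holdout(toy, 0.9, 42)

# The two parts should be disjoint and together cover every row.
overlap = set(kept["row_id"]) & set(held_out["row_id"])
print(len(kept), len(held_out), len(overlap))
```

Dropping by index before resetting it is what keeps the two parts disjoint; resetting first would discard the labels needed for `drop`.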
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 44622 entries, 0 to 44621
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   stop_date           44622 non-null  object
 1   stop_time           44622 non-null  object
 2   driver_gender       44622 non-null  object
 3   driver_age          44622 non-null  float64
 4   driver_race         44622 non-null  object
 5   violation_raw       44622 non-null  object
 6   violation           44622 non-null  object
 7   search_conducted    44622 non-null  object
 8   stop_outcome        44622 non-null  object
 9   is_arrested         44622 non-null  object
 10  stop_duration       44622 non-null  object
 11  drugs_related_stop  44622 non-null  object
dtypes: float64(1), object(11)
memory usage: 4.1+ MB
pipeline = setup(data, target='drugs_related_stop', session_id=42,
                 log_experiment=False, experiment_name='Drug',
                 remove_outliers=True, fold_shuffle=True, data_split_shuffle=True)
| | Description | Value |
---|---|---|
0 | session_id | 42 |
1 | Target | drugs_related_stop |
2 | Original Data | (44622, 12) |
3 | Missing Values | False |
4 | Numeric Features | 1 |
5 | Categorical Features | 8 |
6 | Ordinal Features | False |
7 | High Cardinality Features | False |
8 | High Cardinality Method | None |
9 | Transformed Train Set | (29673, 68) |
10 | Transformed Test Set | (13387, 68) |
11 | Shuffle Train-Test | True |
12 | Stratify Train-Test | False |
13 | Fold Generator | KFold |
14 | Fold Number | 10 |
15 | CPU Jobs | -1 |
16 | Use GPU | False |
17 | Log Experiment | False |
18 | Experiment Name | Drug |
19 | USI | 13d6 |
20 | Imputation Type | simple |
21 | Iterative Imputation Iteration | None |
22 | Numeric Imputer | mean |
23 | Iterative Imputation Numeric Model | None |
24 | Categorical Imputer | constant |
25 | Iterative Imputation Categorical Model | None |
26 | Unknown Categoricals Handling | least_frequent |
27 | Normalize | False |
28 | Normalize Method | None |
29 | Transformation | False |
30 | Transformation Method | None |
31 | PCA | False |
32 | PCA Method | None |
33 | PCA Components | None |
34 | Ignore Low Variance | False |
35 | Combine Rare Levels | False |
36 | Rare Level Threshold | None |
37 | Numeric Binning | False |
38 | Remove Outliers | True |
39 | Outliers Threshold | 0.05 |
40 | Remove Multicollinearity | False |
41 | Multicollinearity Threshold | None |
42 | Clustering | False |
43 | Clustering Iteration | None |
44 | Polynomial Features | False |
45 | Polynomial Degree | None |
46 | Trignometry Features | False |
47 | Polynomial Threshold | None |
48 | Group Features | False |
49 | Feature Selection | False |
50 | Feature Selection Method | classic |
51 | Features Selection Threshold | None |
52 | Feature Interaction | False |
53 | Feature Ratio | False |
54 | Interaction Threshold | None |
55 | Transform Target | False |
56 | Transform Target Method | box-cox |
best_model = compare_models()
| | Model | MAE | MSE | RMSE | R2 | RMSLE | MAPE | TT (Sec) |
---|---|---|---|---|---|---|---|---|
gbr | Gradient Boosting Regressor | 0.0085 | 4.300000e-03 | 0.0649 | 1.994000e-01 | 0.0458 | 0.7489 | 2.622 |
omp | Orthogonal Matching Pursuit | 0.0105 | 4.300000e-03 | 0.0651 | 1.902000e-01 | 0.0458 | 0.7878 | 0.065 |
br | Bayesian Ridge | 0.0118 | 4.300000e-03 | 0.0651 | 1.902000e-01 | 0.0458 | 0.7886 | 0.226 |
lr | Linear Regression | 0.0118 | 4.300000e-03 | 0.0651 | 1.900000e-01 | 0.0459 | 0.7870 | 0.718 |
ridge | Ridge Regression | 0.0118 | 4.300000e-03 | 0.0651 | 1.900000e-01 | 0.0459 | 0.7873 | 0.066 |
ada | AdaBoost Regressor | 0.0083 | 4.400000e-03 | 0.0657 | 1.734000e-01 | 0.0463 | 0.7691 | 0.282 |
rf | Random Forest Regressor | 0.0083 | 4.600000e-03 | 0.0671 | 1.400000e-01 | 0.0477 | 0.7440 | 1.344 |
huber | Huber Regressor | 0.0056 | 5.300000e-03 | 0.0722 | 1.500000e-02 | 0.0498 | 0.9895 | 3.446 |
knn | K Neighbors Regressor | 0.0069 | 5.300000e-03 | 0.0723 | 8.200000e-03 | 0.0508 | 0.9454 | 1.863 |
lightgbm | Light Gradient Boosting Machine | 0.0000 | 0.000000e+00 | 0.0000 | 0.000000e+00 | 0.0000 | 0.0000 | 0.073 |
catboost | CatBoost Regressor | 0.0000 | 0.000000e+00 | 0.0000 | 0.000000e+00 | 0.0000 | 0.0000 | 0.037 |
en | Elastic Net | 0.0107 | 5.400000e-03 | 0.0727 | -4.000000e-04 | 0.0504 | 0.9946 | 0.062 |
llar | Lasso Least Angle Regression | 0.0107 | 5.400000e-03 | 0.0727 | -4.000000e-04 | 0.0504 | 0.9946 | 0.069 |
lasso | Lasso Regression | 0.0107 | 5.400000e-03 | 0.0727 | -4.000000e-04 | 0.0504 | 0.9946 | 0.063 |
et | Extra Trees Regressor | 0.0084 | 5.500000e-03 | 0.0736 | -4.090000e-02 | 0.0524 | 0.7303 | 1.349 |
xgboost | Extreme Gradient Boosting | 0.0092 | 5.700000e-03 | 0.0751 | -8.170000e-02 | 0.0524 | 0.7453 | 13.414 |
par | Passive Aggressive Regressor | 0.0443 | 6.300000e-03 | 0.0792 | -2.134000e-01 | 0.0641 | 0.6956 | 0.126 |
dt | Decision Tree Regressor | 0.0085 | 8.500000e-03 | 0.0920 | -6.787000e-01 | 0.0637 | 0.6999 | 0.087 |
lar | Least Angle Regression | 4204.0510 | 2.732328e+08 | 5227.4610 | -4.797804e+10 | 1.1769 | 7657.9611 | 0.078 |
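To make the columns of the comparison table easier to read, here is a small sketch that computes MAE, MSE, RMSE and R2 by hand with NumPy. The `y_true` and `y_pred` arrays are made-up values, not taken from the experiment above.

```python
import numpy as np

# Made-up targets (0/1, like a binary flag) and made-up predictions.
y_true = np.array([0.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.0, 0.6, 0.1])

err = y_true - y_pred

mae = np.mean(np.abs(err))   # mean absolute error
mse = np.mean(err ** 2)      # mean squared error
rmse = np.sqrt(mse)          # root mean squared error

# R2 compares the model's squared error with that of always predicting the mean.
ss_res = np.sum(err ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.4f} MSE={mse:.4f} RMSE={rmse:.4f} R2={r2:.4f}")
```

An R2 below zero, as several rows of the table show, means the model's squared error is larger than that of a constant prediction equal to the mean of the target.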
model_metadata = models()
model_metadata['Name']
ID
lr                       Linear Regression
lasso                     Lasso Regression
ridge                     Ridge Regression
en                             Elastic Net
lar                 Least Angle Regression
llar          Lasso Least Angle Regression
omp            Orthogonal Matching Pursuit
br                          Bayesian Ridge
ard      Automatic Relevance Determination
par           Passive Aggressive Regressor
ransac             Random Sample Consensus
tr                      TheilSen Regressor
huber                      Huber Regressor
kr                            Kernel Ridge
svm              Support Vector Regression
knn                  K Neighbors Regressor
dt                 Decision Tree Regressor
rf                 Random Forest Regressor
et                   Extra Trees Regressor
ada                     AdaBoost Regressor
gbr            Gradient Boosting Regressor
mlp                          MLP Regressor
xgboost          Extreme Gradient Boosting
lightgbm   Light Gradient Boosting Machine
catboost                CatBoost Regressor
Name: Name, dtype: object
xgboost = create_model('xgboost')
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
---|---|---|---|---|---|---|
0 | 0.0081 | 0.0056 | 0.0746 | 0.0220 | 0.0517 | 0.7655 |
1 | 0.0110 | 0.0073 | 0.0854 | -0.2821 | 0.0610 | 0.8233 |
2 | 0.0103 | 0.0068 | 0.0826 | 0.0285 | 0.0553 | 0.7927 |
3 | 0.0070 | 0.0038 | 0.0613 | -0.1186 | 0.0446 | 0.7824 |
4 | 0.0092 | 0.0059 | 0.0767 | 0.0232 | 0.0527 | 0.7690 |
5 | 0.0095 | 0.0056 | 0.0752 | -0.4021 | 0.0537 | 0.7121 |
6 | 0.0080 | 0.0046 | 0.0681 | 0.1347 | 0.0481 | 0.6701 |
7 | 0.0102 | 0.0064 | 0.0801 | -0.1252 | 0.0564 | 0.7719 |
8 | 0.0103 | 0.0070 | 0.0836 | 0.0063 | 0.0547 | 0.8039 |
9 | 0.0078 | 0.0041 | 0.0639 | -0.1040 | 0.0461 | 0.5619 |
Mean | 0.0092 | 0.0057 | 0.0751 | -0.0817 | 0.0524 | 0.7453 |
SD | 0.0013 | 0.0012 | 0.0079 | 0.1533 | 0.0048 | 0.0743 |
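The six columns in these fold tables are standard regression metrics. As a rough guide to reading them, the sketch below computes each one in plain Python on a small made-up target (the data and the helper name `regression_metrics` are hypothetical, not taken from this notebook). It also assumes MAPE is averaged only over rows whose true value is nonzero, which is an assumption about how the library handles zeros. The point it illustrates is that R2 is 0 for a model that always predicts the mean and negative for anything worse than that, which is how the negative R2 values above should be read.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RMSLE and MAPE for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)

    # R2 compares the model's squared error with that of a constant
    # predictor that always outputs the mean of y_true.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # RMSLE works on log(1 + value), so values must be greater than -1.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2
            for t, p in zip(y_true, y_pred)) / n
    )

    # Assumption: rows with a true value of zero are skipped for MAPE.
    ratios = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    mape = sum(ratios) / len(ratios) if ratios else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse,
            "R2": r2, "RMSLE": rmsle, "MAPE": mape}

# Hypothetical toy target: mostly zeros with one positive case.
y_true = [0.0, 0.0, 1.0, 0.0]

# Always predicting the mean of y_true gives R2 = 0.
baseline = regression_metrics(y_true, [0.25, 0.25, 0.25, 0.25])

# Predictions that are worse than the mean give a negative R2.
worse = regression_metrics(y_true, [0.5, 0.5, 0.0, 0.5])

print(baseline)
print(worse)
```

On a target that is mostly zero, MAE and MSE can look small even when R2 is near or below zero, so R2 is the more informative column for comparing the models here.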
gbr = create_model('gbr')
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
---|---|---|---|---|---|---|
0 | 0.0077 | 0.0043 | 0.0653 | 0.2523 | 0.0452 | 0.7609 |
1 | 0.0096 | 0.0051 | 0.0712 | 0.1097 | 0.0508 | 0.7878 |
2 | 0.0094 | 0.0051 | 0.0715 | 0.2722 | 0.0492 | 0.7616 |
3 | 0.0066 | 0.0028 | 0.0527 | 0.1733 | 0.0380 | 0.7261 |
4 | 0.0088 | 0.0049 | 0.0699 | 0.1890 | 0.0487 | 0.7902 |
5 | 0.0082 | 0.0036 | 0.0599 | 0.1085 | 0.0436 | 0.7432 |
6 | 0.0078 | 0.0037 | 0.0610 | 0.3053 | 0.0423 | 0.7333 |
7 | 0.0091 | 0.0048 | 0.0695 | 0.1519 | 0.0489 | 0.7787 |
8 | 0.0099 | 0.0053 | 0.0725 | 0.2527 | 0.0507 | 0.7392 |
9 | 0.0075 | 0.0030 | 0.0551 | 0.1793 | 0.0406 | 0.6677 |
Mean | 0.0085 | 0.0043 | 0.0649 | 0.1994 | 0.0458 | 0.7489 |
SD | 0.0010 | 0.0009 | 0.0069 | 0.0647 | 0.0043 | 0.0345 |
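The Mean and SD rows summarize the ten folds. As a quick check, the sketch below recomputes both for the R2 column of the `gbr` table, using the fold values copied from it. The numbers match when SD is taken as the population standard deviation (dividing by the number of folds), so that appears to be the convention used in these tables.

```python
import statistics

# R2 per fold, copied from the gbr table above (folds 0 to 9).
gbr_r2 = [0.2523, 0.1097, 0.2722, 0.1733, 0.1890,
          0.1085, 0.3053, 0.1519, 0.2527, 0.1793]

mean_r2 = statistics.fmean(gbr_r2)
sd_population = statistics.pstdev(gbr_r2)  # divides by n
sd_sample = statistics.stdev(gbr_r2)       # divides by n - 1

print(f"mean          = {mean_r2:.4f}")
print(f"population SD = {sd_population:.4f}")
print(f"sample SD     = {sd_sample:.4f}")
```

The SD is large relative to the mean for R2, so differences of a few hundredths between models in these tables should be read with some caution.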
ada = create_model('ada')
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
---|---|---|---|---|---|---|
0 | 0.0075 | 0.0043 | 0.0656 | 0.2438 | 0.0455 | 0.7518 |
1 | 0.0093 | 0.0048 | 0.0689 | 0.1655 | 0.0490 | 0.7743 |
2 | 0.0090 | 0.0049 | 0.0700 | 0.3019 | 0.0478 | 0.7574 |
3 | 0.0066 | 0.0030 | 0.0546 | 0.1111 | 0.0393 | 0.7953 |
4 | 0.0086 | 0.0049 | 0.0698 | 0.1915 | 0.0489 | 0.7612 |
5 | 0.0080 | 0.0036 | 0.0600 | 0.1076 | 0.0435 | 0.7643 |
6 | 0.0083 | 0.0041 | 0.0638 | 0.2421 | 0.0449 | 0.7525 |
7 | 0.0086 | 0.0046 | 0.0679 | 0.1902 | 0.0477 | 0.7624 |
8 | 0.0098 | 0.0057 | 0.0756 | 0.1866 | 0.0529 | 0.7792 |
9 | 0.0074 | 0.0037 | 0.0610 | -0.0068 | 0.0436 | 0.7924 |
Mean | 0.0083 | 0.0044 | 0.0657 | 0.1734 | 0.0463 | 0.7691 |
SD | 0.0009 | 0.0007 | 0.0058 | 0.0823 | 0.0036 | 0.0148 |
svm = create_model('svm')
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
---|---|---|---|---|---|---|
0 | 0.1032 | 0.0143 | 0.1196 | -1.5111 | 0.1041 | 0.9001 |
1 | 0.1025 | 0.0142 | 0.1190 | -1.4875 | 0.1035 | 0.9001 |
2 | 0.1043 | 0.0154 | 0.1241 | -1.1907 | 0.1064 | 0.9000 |
3 | 0.1011 | 0.0124 | 0.1113 | -2.6854 | 0.0999 | 0.9001 |
4 | 0.1032 | 0.0145 | 0.1205 | -1.4077 | 0.1044 | 0.9000 |
5 | 0.1011 | 0.0128 | 0.1132 | -2.1832 | 0.1007 | 0.9001 |
6 | 0.1026 | 0.0140 | 0.1183 | -1.6072 | 0.1033 | 0.8999 |
7 | 0.1029 | 0.0143 | 0.1194 | -1.5028 | 0.1039 | 0.9000 |
8 | 0.1039 | 0.0153 | 0.1237 | -1.1782 | 0.1060 | 0.9000 |
9 | 0.1015 | 0.0127 | 0.1126 | -2.4307 | 0.1006 | 0.8998 |
Mean | 0.1026 | 0.0140 | 0.1182 | -1.7184 | 0.1033 | 0.9000 |
SD | 0.0010 | 0.0010 | 0.0042 | 0.4981 | 0.0021 | 0.0001 |
Fine-tune the AdaBoost regressor (ada)
tuned_ada = tune_model(ada)
Fold | MAE | MSE | RMSE | R2 | RMSLE | MAPE |
---|---|---|---|---|---|---|
0 | 0.0074 | 0.0043 | 0.0658 | 0.2393 | 0.0451 | 0.8013 |
1 | 0.0088 | 0.0045 | 0.0671 | 0.2096 | 0.0470 | 0.7894 |
2 | 0.0091 | 0.0051 | 0.0713 | 0.2765 | 0.0485 | 0.7853 |
3 | 0.0065 | 0.0027 | 0.0520 | 0.1955 | 0.0376 | 0.7485 |
4 | 0.0082 | 0.0046 | 0.0675 | 0.2451 | 0.0463 | 0.7930 |
5 | 0.0079 | 0.0034 | 0.0587 | 0.1460 | 0.0425 | 0.7635 |
6 | 0.0078 | 0.0040 | 0.0629 | 0.2631 | 0.0435 | 0.7690 |
7 | 0.0082 | 0.0045 | 0.0668 | 0.2161 | 0.0463 | 0.8021 |
8 | 0.0095 | 0.0053 | 0.0730 | 0.2420 | 0.0502 | 0.7932 |
9 | 0.0071 | 0.0031 | 0.0555 | 0.1651 | 0.0401 | 0.7677 |
Mean | 0.0080 | 0.0041 | 0.0641 | 0.2198 | 0.0447 | 0.7813 |
SD | 0.0009 | 0.0008 | 0.0064 | 0.0397 | 0.0037 | 0.0171 |
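Tuning raised the mean R2 of `ada` from 0.1734 to 0.2198. As far as the PyCaret 2.x documentation describes it, `tune_model` by default runs a random search over a predefined hyperparameter grid and keeps the candidate with the best cross-validated score. The sketch below illustrates that idea only. It is not PyCaret's implementation: the model is a toy one-dimensional ridge regression rather than AdaBoost, and the data, candidate grid, and function names are all hypothetical.

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form slope for y ~ w * x with an L2 penalty alpha (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def r2_score(y_true, y_pred):
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cv_r2(xs, ys, alpha, n_folds=5):
    """Mean R2 over n_folds interleaved folds for a given alpha."""
    scores = []
    for k in range(n_folds):
        test_idx = set(range(k, len(xs), n_folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        w = fit_ridge_1d(train_x, train_y, alpha)
        held_out = sorted(test_idx)
        scores.append(r2_score([ys[i] for i in held_out],
                               [w * xs[i] for i in held_out]))
    return sum(scores) / n_folds

def random_search(xs, ys, candidates, n_iter=10, seed=0):
    """Sample n_iter candidates at random and keep the best CV score."""
    rng = random.Random(seed)
    best_alpha, best_score = None, float("-inf")
    for _ in range(n_iter):
        alpha = rng.choice(candidates)
        score = cv_r2(xs, ys, alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score

# Hypothetical toy data: y is roughly 2 * x plus noise.
noise = random.Random(42)
xs = [i / 10 for i in range(1, 51)]
ys = [2.0 * x + noise.gauss(0, 0.3) for x in xs]

candidates = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
best_alpha, best_score = random_search(xs, ys, candidates)

# A deliberately poor setting, for comparison with the tuned one.
default_score = cv_r2(xs, ys, alpha=1000.0)

print("best alpha:", best_alpha, "CV R2:", round(best_score, 4))
print("alpha=1000 CV R2:", round(default_score, 4))
```

Because only a limited number of candidates is sampled, a random search can miss the best setting, and a tuned model is not guaranteed to beat the untuned one. Comparing the Mean rows before and after tuning, as done for `ada` here, is the practical check.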
tuned_gbr = tune_model(gbr)
tuned_xgboost = tune_model(xgboost)
plot_model(tuned_ada, plot='error')
plot_model(tuned_ada, plot='feature')
evaluate_model(tuned_ada)
evaluate_model(tuned_xgboost)
evaluate_model(tuned_gbr)
blend = blend_models(estimator_list=[tuned_xgboost, tuned_gbr, tuned_ada])
pred = predict_model(blend)
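For regression, `blend_models` is documented as training a voting regressor over the listed estimators, which amounts to averaging their predictions. The sketch below shows that averaging step in plain Python. The three "models" are hypothetical stand-in functions rather than the tuned estimators from this notebook, and the `weights` argument is included only to illustrate a weighted average.

```python
def blend_predict(models, rows, weights=None):
    """Average the predictions of several models for each input row.

    models  -- callables that map one row to one numeric prediction
    weights -- optional per-model weights; equal weights if omitted
    """
    if weights is None:
        weights = [1.0] * len(models)
    total = sum(weights)
    return [
        sum(w * model(row) for w, model in zip(weights, models)) / total
        for row in rows
    ]

# Hypothetical stand-ins for three fitted regressors.
def model_a(row):
    return 1.0 * row["x"]

def model_b(row):
    return 2.0 * row["x"]

def model_c(row):
    return 3.0 * row["x"]

rows = [{"x": 1.0}, {"x": 2.0}]

equal_blend = blend_predict([model_a, model_b, model_c], rows)
weighted_blend = blend_predict([model_a, model_b, model_c], rows,
                               weights=[1.0, 1.0, 2.0])

print(equal_blend)
print(weighted_blend)
```

Averaging tends to help most when the individual models make different errors. The three tuned models here are all tree ensembles, so the gain from blending them may be modest.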