This tutorial is not about math methods and models. I'll focus on the environment for learning ML.
Learning is hard, and many factors influence the process. Your mood often matters a lot.
And in self-study programs, motivation and attitude are your main friends.
Now there will be a bit of philosophy, and I ask you not to scroll straight to the Technical part. There will be some big ideas.
Folks who came across the Tutorials in this session are definitely strong already and know a lot.
I think that this notebook (maybe an article) will be more useful for the participants of the next session. A lot of tutorials from the previous course played an important role for me.
Why do I think that this course is the best of the best? Two things. The first is the excellent presentation of the material. The second is the motivation mechanism. It is awesome. COMPETITIONS. The competitive process gives rise to the best solutions and motivates the participants. So it was with me.
But what if no new feature for the regression comes to mind?
Forget it
Both of them take time, which depends directly (almost always) on the hardware you run your model on.
If you have also discovered the world of gradient boosting
I have an elderly laptop with an i5 and 4 GB RAM. Sometimes my brain is faster than it is.
I was annoyed by its long freezes while training models and by memory errors while trying to beat the baselines in this course.
Option 1:
Buy new hardware. What do we have here, oh yes:
i7-8700K: $400
GTX 1080 Ti: $1000
64 GB RAM: about $500
motherboard
SSD and so on...
Stop-stop, I'm just learning ...
Option 2:
Code or computing optimization. A good choice, the right groundwork for the future. But deadlines, work, study, family... where do I find time for all of this? Plus, I spend a lot of time on public transport (metro, train, transfers) and I want to use that time effectively. In addition, good ideas often come in the most unexpected places, and I want to test them immediately.
In fact, there are many companies that offer cloud computing resources. What matters to me is that the service is free of charge, flexible, and feature-complete. Therefore, I chose Google.
For light computations you can work with Colab or Kaggle Kernels. But there is a memory limit.
There may be several reasons. The most frequent one is a lack of computing resources (memory). And the most annoying thing is that all cells now need to be rerun. If you save the resulting objects to txt or pickle files, you are the winner. But only practice makes perfect. At the beginning of the learning path, such kernel restarts are, first of all, infuriating. Secondly, they take a lot of time.
For now we are going to use a nice gift from Google: $300 in credit.
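To make the "save to pickle" habit concrete, here is a minimal sketch. The helper names `save_checkpoint` and `load_or_compute`, the `checkpoints/features.pkl` path, and the toy `build_features` function are all my own placeholders, not something from the course. The idea is simply that an expensive step runs once and is read back from disk after a kernel restart.

```python
import pickle
from pathlib import Path


def save_checkpoint(obj, path):
    """Serialize any picklable object to disk, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_or_compute(path, compute):
    """Return the cached object if the file exists, otherwise compute and cache it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    save_checkpoint(result, path)
    return result


# Hypothetical expensive step standing in for feature engineering or training.
def build_features():
    return {"n_rows": 3, "features": [[1, 2], [3, 4], [5, 6]]}


# The first run computes and saves; later runs (after a restart) just load the file.
features = load_or_compute("checkpoints/features.pkl", build_features)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.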
What you need to start:
There is nothing complicated, everything is standard: email and phone confirmation. Specify a real phone number; it will be used for billing confirmation.
You'll see that they give us $300 and 12 months. I note that my balance decreased by only $10.53 during this course from Professor Yorko and his colleagues.
You can safely confirm the card (the system will block $1 on it for about an hour on average).
Open the hardware shop (or rental).
Take something more powerful: 8 CPUs and 52 GB RAM.
I think there is no point in adding a GPU to my instance, because neural networks are not covered in this course. And if you want one, there is a free GPU in Google Colab. But some of the boosting libraries now support GPUs. Therefore, keep a close eye on the changes and choose the instance parameters based on your needs.
I felt an urgent need for additional resources when I started using XGBoost. Its documentation says that GPUs are supported, but when I ran it on the Colab GPUs, it began to kill the kernels.
But wait a minute. You'll need to set a static IP address:
wget http://repo.continuum.io/archive/Anaconda3-5.0.1-Linux-x86_64.sh
bash Anaconda3-5.0.1-Linux-x86_64.sh
Read the rules, review and run the installation.
jupyter notebook --generate-config
nano ~/.jupyter/jupyter_notebook_config.py
c.NotebookApp.ip = '*'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8000
jupyter notebook --ip=0.0.0.0
"StaticIP" - External IP of instance
All the necessary libraries can be installed by running the following command.
!pip install lightgbm
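Before starting a long run, it can help to check that everything imports. The sketch below uses only the standard library. The `required` list is my assumption about what a boosting workflow might need, so edit it to match your own notebook. Note that import names can differ from pip names (for example, `sklearn` is installed as `scikit-learn`).

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# ASSUMPTION: an illustrative list of import names, not the course's official one.
required = ["lightgbm", "xgboost", "sklearn"]
print("Missing:", missing_packages(required) or "none")
```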
Let's see what's under the hood:
!cat /proc/cpuinfo
There are 8 of these.
!cat /proc/meminfo
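If you prefer numbers over scrolling through the raw output, the same information can be read from Python. This is a minimal sketch that assumes a Linux instance. `parse_meminfo` and `total_ram_gib` are helper names I made up for illustration.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def total_ram_gib(meminfo_path="/proc/meminfo"):
    """Return total RAM in GiB, or None if the file is unavailable (e.g. not Linux)."""
    try:
        with open(meminfo_path) as f:
            info = parse_meminfo(f.read())
    except OSError:
        return None
    kb = info.get("MemTotal")
    return None if kb is None else kb / 1024 ** 2


print("Logical CPUs:", os.cpu_count())
print("Total RAM (GiB):", total_ram_gib())
```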
Not bad, it's time to ride on.
!mkdir -p ~/.kaggle
https://www.kaggle.com/%22YourAccountName%22/account
!cp kaggle.json ~/.kaggle/
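The same copy step can be done from Python, which also lets you tighten the file permissions in one go. As far as I know, the Kaggle command-line client warns when the token is readable by other users. The function name `install_kaggle_token` and its `home` parameter are my own and exist only for this sketch.

```python
import os
import shutil
import stat
from pathlib import Path


def install_kaggle_token(src="kaggle.json", home=None):
    """Copy the API token into <home>/.kaggle and restrict it to the owner."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)
    # Equivalent of `chmod 600`: the owner may read and write, nobody else.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target


# Only runs if the downloaded token sits next to the notebook.
if Path("kaggle.json").exists():
    print("Token installed at", install_kaggle_token())
```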
Now you can download files from Kaggle and submit files to it. And it is very fast.
Make a new directory for Medium:
!mkdir Medium
import os
os.chdir('Medium')
!kaggle competitions download -c how-good-is-your-medium-article
The same dataset, the same code. Look at the time:
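To compare the laptop and the cloud instance fairly, measure the same piece of code the same way on both. Here is a small sketch of a timing helper. The `timer` context manager and the toy workload are placeholders of mine; put your real training call inside the `with` block.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label, results=None):
    """Measure wall-clock time of the enclosed block and optionally record it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed
        print(f"{label}: {elapsed:.2f} s")


timings = {}
with timer("toy workload", timings):
    # Hypothetical stand-in for model.fit(...); replace with the real training call.
    total = sum(i * i for i in range(200_000))
```

Inside a notebook, the `%%time` cell magic gives a similar measurement without any extra code.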
Fine.
!git clone https://github.com/mikhailsergeevi4/crm.git
Commit your changes.
And push them back:
!git push origin master
There are many other useful tricks for Google Cloud and Colab, but I think this is enough for now.
As a result, you get a powerful machine for studying ML.
Learn, work, or present your notebooks from any device.
The $300 credit is enough to complete the course. I used instances mostly for resource-heavy operations. For simple things, there is Google Colab (with a free GPU, by the way).
IMPORTANT: don't forget to stop the instance when you have finished. This machine will quickly devour the allocated budget!
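A quick back-of-the-envelope calculation shows why stopping matters. The hourly price below is a made-up placeholder, not a real Google Cloud quote; check the pricing page for your instance type and plug in the actual number.

```python
def hours_of_credit(credit_usd, hourly_price_usd):
    """How many hours an instance can run before the credit is spent."""
    if hourly_price_usd <= 0:
        raise ValueError("hourly price must be positive")
    return credit_usd / hourly_price_usd


CREDIT_USD = 300.0        # trial credit mentioned in the text
HOURLY_PRICE_USD = 0.50   # ASSUMPTION: placeholder price, not a real quote

hours = hours_of_credit(CREDIT_USD, HOURLY_PRICE_USD)
# prints: 600 hours, about 25 days if never stopped
print(f"{hours:.0f} hours, about {hours / 24:.0f} days if never stopped")
```

With that placeholder price, an instance left running around the clock would use up the credit in under a month, far short of the 12 months the trial allows.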