This notebook was created by Jean de Dieu Nyandwi for the love of the machine learning community. For any feedback, errors, or suggestions, he can be reached by email (johnjw7084 at gmail dot com), Twitter, or LinkedIn.
You have probably heard that Machine Learning Engineers and Data Scientists spend more than 80% of their time cleaning data. That is not an exaggeration. It is true, and there is a reason: "A good model comes from good data". In the real world, many datasets are messy, and it takes an enormous amount of time to get the data into good shape.
Preparing data is a process. You can manipulate, remove, or create new features. To elaborate, here are more things that you are most likely to do:
(image_name.bmp) for example. Images are usually in .jpg or .png formats.
The above points are a shallow list, not a step-by-step procedure. Other things are involved as well, such as shuffling or randomizing the dataset and splitting the data into training, validation, and test sets.
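The shuffling and splitting mentioned above can be sketched as follows. This is a minimal illustration using only Python's standard library. The function name `shuffle_and_split`, the 70/15/15 proportions, and the fixed seed are assumptions made for this example, not choices prescribed by this notebook. In practice you would likely reach for a library helper instead.

```python
import random


def shuffle_and_split(samples, val_fraction=0.15, test_fraction=0.15, seed=42):
    """Shuffle samples and split them into train, validation, and test lists.

    Hypothetical helper for illustration; the default fractions are assumptions.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over is training data
    return train, val, test


# Toy dataset: 100 integer "samples".
data = list(range(100))
train, val, test = shuffle_and_split(data)
print(len(train), len(val), len(test))
```

Shuffling before splitting keeps any ordering in the raw data (for example, rows sorted by label or date) from leaking into one split only. Fixing the seed makes the split repeatable across runs.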
Since this was only a recipe, we will see a proper data preparation workflow in the next chapters, when doing end-to-end machine learning projects.