Machine Learning Tools for Beginners: A Personal Journey

Image by Danni Liu adapted from vnosokin/ Canva

I still remember my very first foray into the world of data science and machine learning. I was overwhelmed by the sheer number of tools, platforms, and programming languages used in this field (Check out the image below if you don't believe me). It felt like I was caught in a tsunami of information, and honestly, it left me feeling paralyzed for a while. However, I eventually picked myself up and waded through the chaos, and after spending a year or so exploring and experimenting with various tools, I gained much more clarity.

Data & AI Landscape 2019 from Matt Turck et al.

I think many of the tools you'll encounter are just distractions. As a beginner, focusing on getting the foundations right is essential. Once we have a solid foundation, we can quickly build upon it. In this post, I will share my recommendations for tools and languages to focus on if you're interested in learning machine learning or becoming a data scientist. Please bear in mind that I'm neither a data scientist nor proficient in machine learning. I'm merely sharing my personal experience and the knowledge I've gained during my journey so far.

I've broken it down into three categories:

  • Language
  • Development tools
  • Libraries

Language

Let's start with languages. Many programming languages exist, like C, C++, C#, Python, Ruby, Java, Scala, and R. The two most widely used languages in data science are R and Python, with Python being the most prevalent in machine learning. Other go-to languages for machine learning include C and C++. If you already have programming experience and are proficient in C and C++, it makes sense to stick with what you know. However, if you're a beginner like me with no prior programming experience, I suggest you learn Python.

Here are my reasons for choosing to focus on Python over other languages:

  • Easy to learn: Compared to other languages, learning Python is a breeze. While I don't have much programming experience, I have a bit of exposure to languages like C through online courses. Python has a simple and easy-to-understand syntax, almost like English. Additionally, it has built-in features that hide complex aspects of computer programming that we don't need to worry about when starting, such as managing memory.
  • Rich Libraries: Python has numerous libraries for machine learning, such as NumPy, Pandas, Matplotlib, Scikit-learn, and TensorFlow. Libraries are collections of pre-written code that programmers can use to perform common tasks without having to write the code themselves. It's like having a library of books you can borrow instead of writing your own book from scratch.
  • Large and active community: Python has a vast and active community of developers and data scientists who contribute to open-source (free) projects, making it easy to find resources, tutorials, and support online.
  • Flexibility: Python is a versatile language that can be used for various tasks, ranging from data cleaning and exploratory analysis to machine learning and deep learning. Python can also easily integrate with other programming languages and tools.
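To make the first two points a bit more concrete, here is a minimal sketch that uses only Python's built-in standard library. The exam scores are made-up numbers for illustration. The idea is simply that a hand-written loop and a single library call arrive at the same average.

```python
# Made-up exam scores, purely for illustration.
from statistics import mean

scores = [72, 85, 90, 66, 78]

# Doing it by hand: add up the scores, then divide by how many there are.
total = 0
for score in scores:
    total += score
average_by_hand = total / len(scores)

# Letting a library do it: one call to statistics.mean.
average_from_library = mean(scores)

print(average_by_hand)
print(average_from_library)
```

Notice that there is no memory management or type declaration anywhere in the snippet; Python takes care of that behind the scenes.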

Development Tools

Development tools are software programs developers use to create, test, and maintain software applications. It's like using Microsoft Word to write, edit, and format a document. There are several development tools on the market. I've tried a few, though I didn't spend a great deal of time on all of them. These are the ones I've tried:

  • Atom
  • PyCharm
  • Visual Studio Code
  • Jupyter Notebook
  • Google Colab

Of the ones I've tried, Jupyter Notebook and Google Colab are the most popular among beginners. Both are web-based, geared towards machine learning and data science, and incredibly easy to use.

Jupyter Notebook is like a digital notebook where you can write and run code in a browser. It allows you to write code in small sections called "cells," which can be run individually, making it easy to test and experiment with code. Additionally, Jupyter Notebook enables you to include text, images, and other media in your notebook, making it an excellent tool for documenting and sharing your work.

Jupyter Notebook with content on NumPy sourced from Coursera: Supervised Machine Learning: Regression and Classification

Google Colab is similar to Jupyter Notebook but runs on Google's cloud infrastructure, providing free access to GPUs and TPUs for faster computation. It allows you to run machine learning algorithms and experiment with data without worrying about setting up your own computer environment.

Just a side note for those who don't know what GPUs and TPUs are. GPUs are Graphics Processing Units, and TPUs are Tensor Processing Units. Both are types of computer hardware designed to run huge numbers of calculations in parallel, which makes them much faster than traditional CPUs (Central Processing Units) for workloads like training neural networks.

Atom, PyCharm, and Visual Studio Code are general-purpose development tools that can be used for a wide range of programming tasks, including machine learning and data science. They provide powerful features like code completion, debugging, and version control integration but have steeper learning curves.

Atom is a text editor, so it doesn't have as many built-in features as PyCharm and Visual Studio Code. Atom was my first development tool; honestly, it wasn't very memorable. I wouldn't invest time in learning this tool.

PyCharm is a tool that I just started exploring. A data scientist I know really likes it, so I decided to have a look. It is designed explicitly for Python development and has a clean, clutter-free interface. I think people like this tool because it's tailored for Python development.

Visual Studio Code is more general-purpose. It caters to many programming languages, and Python is just one of many. The user interface is somewhat similar to PyCharm. Visual Studio Code has many features and a large, active community. I personally like Visual Studio Code because it supports multiple languages.

So, for complete novices, I recommend Jupyter Notebook or Google Colab. Both are beginner-friendly.

Libraries

Earlier, I explained what libraries are, so I won't repeat myself here. NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow, and Keras are essential libraries to learn for machine learning. I'm learning these myself.

Before you learn these libraries, I strongly recommend you have a solid foundation in Python programming, which includes knowing the basics of Python, such as variables, data types, control structures, functions, modules, and loops. Then, you can move on to the libraries. If you don't yet have this knowledge under your belt, you could consider these resources:

  • Python Basics by the RealPython.com Tutorial Team
  • Captain Code by Ben Forta and Shmuel Forta: apart from teaching you the basics of Python, it's also great for learning programming principles and best practices.
  • Automate the Boring Stuff with Python by Al Sweigart: the first six chapters cover the basics of Python, and then it goes into teaching you how to automate some of the repetitive stuff.
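To give a flavour of the basics mentioned above (variables, data types, control structures, functions, modules, and loops), here is a tiny sketch of my own. All the names and values are invented for illustration and don't come from any of the books listed.

```python
# A tiny tour of the basics. All names and values are invented.
import math  # a module from Python's standard library

# Variables holding different data types.
name = "Ada"                        # str (text)
age = 36                            # int (whole number)
height_m = 1.65                     # float (decimal number)
languages = ["Python", "R", "C++"]  # list


def describe(language):
    """Return a short label for a language name."""
    # Control structure: an if statement picks one of two results.
    if language == "Python":
        return language + " (start here)"
    return language


# Loop: call the function once for every item in the list.
labels = []
for language in languages:
    labels.append(describe(language))

# Using the imported module: area of a circle with radius 2.
circle_area = math.pi * 2 ** 2

print(name, age, height_m)
print(labels)
print(round(circle_area, 2))
```

If every line here makes sense to you, you're probably ready to move on to the libraries.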

Now to the libraries. Below, I've laid out the order in which I'm learning these libraries. I've ordered them this way because it follows a logical progression from basic Python concepts to more advanced machine learning concepts:

  • NumPy: NumPy stands for Numerical Python. This library provides support for working with large arrays of numbers. You'll need to learn how to create and manipulate arrays and perform basic mathematical operations.

  • Pandas: This library is for data manipulation and analysis. The name "Pandas" has nothing to do with China's national treasure; it is derived from "panel data," which refers to multi-dimensional structured data sets commonly used in statistical analysis. This library is used extensively in machine learning for data preprocessing and cleaning. Here, you'll need to learn how to load data into Pandas DataFrames, manipulate and clean data, and perform basic exploratory data analysis.

  • Matplotlib: This popular library creates static, animated, and interactive visualizations in Python. You'll need to learn a visualization library for several reasons in machine learning: exploring the data, feature engineering, model evaluation, and communicating the results of machine learning models to others.

Another popular visualization library is Seaborn. Seaborn is built on top of Matplotlib and offers a higher-level interface for statistical plots, but I found Matplotlib easier to pick up as a beginner. So, I suggest learning Matplotlib first and then progressing to Seaborn later.

  • Scikit-learn: This library provides a wide range of machine learning algorithms and tools for data preprocessing, feature extraction, and model selection. With this library, you'll need to learn the different machine learning algorithms provided by Scikit-learn and how to train and evaluate models. Did you know that it's been suggested that there are over 1000 different machine learning algorithms? It's mind-boggling!

  • TensorFlow: This library is used for building intelligent machines that can learn on their own. It's used to create neural networks, which are like virtual brains that can learn from data and make predictions.

  • Keras: This library makes building and training neural networks easier using TensorFlow. It's like a shortcut that simplifies the process of creating smart machines.
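To show how the first few libraries in this order build on each other, here is a minimal sketch that goes from a NumPy array to a Pandas DataFrame to a Scikit-learn model. The data is made up and perfectly linear (score = 10 × hours + 50), and the column names are my own choices, so treat this as an illustration rather than a realistic workflow. Plotting and neural networks are left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: create an array and do maths on every element at once.
hours_studied = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
exam_score = 10.0 * hours_studied + 50.0  # made-up, perfectly linear data

# Pandas: put the columns in a DataFrame and take a first look.
df = pd.DataFrame({"hours": hours_studied, "score": exam_score})
print(df.describe())

# Scikit-learn: fit a linear regression on the DataFrame.
model = LinearRegression()
model.fit(df[["hours"]], df["score"])

# Predict the score for a hypothetical student who studied 6 hours.
new_student = pd.DataFrame({"hours": [6.0]})
predicted = model.predict(new_student)[0]
print(round(predicted, 1))
```

Because the made-up data follows a straight line exactly, the model should predict a score very close to 110 for the new student. Real data is never this tidy, which is where the cleaning and exploration skills from Pandas and Matplotlib come in.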

Alright, that's a wrap. Learning machine learning isn't easy. It's been quite a wild ride already for me. Hopefully, your ride will be a bit smoother with what I've just shared. This journey will take time, dedication and patience, but I know it'll be worth it. Remember to stay curious, keep learning and have fun during the process should you embark on this journey because there is no end destination in this field hahahaha … This field is constantly evolving and changing, and that's what makes it so exciting!