Schemescape

Development log of a life-long coder

Getting started with machine learning

One of the reasons I decided to give Python one last try is that it's popular for machine learning, a topic I'm interested in.

Why?

Why am I interested in machine learning?

First steps

As an introduction, I'm following MIT's Introduction to Machine Learning (2020) class. The class uses Python and builds on NumPy. The first four weeks focus on linear classifiers for binary classification.
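For reference, here is a minimal sketch of what a linear classifier does in NumPy. This is not the course's code: the names `predict`, `theta`, and `theta_0` are my own choices for illustration, and I'm assuming labels of +1 and -1.

```python
import numpy as np

def predict(X, theta, theta_0):
    """Label each row of X as +1 or -1, depending on its side of the hyperplane."""
    # X has shape (n, d), theta has shape (d,), theta_0 is a scalar offset.
    scores = X @ theta + theta_0
    return np.where(scores > 0, 1, -1)

# Three made-up 2D points, purely for illustration.
X = np.array([[2.0, 1.0], [-1.0, -3.0], [0.5, -0.5]])
theta = np.array([1.0, 1.0])
theta_0 = -0.5

labels = predict(X, theta, theta_0)
print(labels)
```

The whole classifier is one matrix-vector product plus an offset. The sign of each score decides the label.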

Although my math is rusty, my biggest struggle is actually with the NumPy API. Here is my list of grievances:

Experimentation

Using a sample data set

To get a better handle on NumPy, I'd like to build a linear classifier from scratch. A quick search led me to a page with links to data sets for binary classification problems. I'm using the UCI Machine Learning Repository's "banknote authentication" data set because its format is simple: 4 predictor variables plus a 0 or 1 for the classification.
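As a rough sketch, loading data in this shape might look like the following. I'm assuming a comma-separated file with the label in the last column. The two rows below are placeholders I made up, not real values from the data set, and they're read from an in-memory string so the snippet runs on its own.

```python
import io
import numpy as np

# Placeholder rows (NOT real data): 4 predictor values, then a 0/1 label.
sample = io.StringIO(
    "1.0,2.0,3.0,4.0,0\n"
    "-1.0,-2.0,-3.0,-4.0,1\n"
)

data = np.loadtxt(sample, delimiter=",")

# Split into features (first 4 columns) and labels (last column).
X = data[:, :4]
# Map the 0/1 labels to -1/+1, which is convenient for sign-based classifiers.
y = np.where(data[:, 4] == 1, 1, -1)

print(X.shape, y)
```

For a real file, the same `np.loadtxt` call should work with a file path in place of the `StringIO` object, assuming the file really is comma-separated.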

To my surprise, simple algorithms (e.g. selecting random parameters) correctly classified over 95% of examples. For what it's worth, my code is posted here.
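To give a sense of how little is involved, here is a sketch of one way to do "random parameters": draw many random hyperplanes and keep the one that scores best on the data. This is an illustration, not necessarily what my posted code does. The helper names (`accuracy`, `random_search`) are hypothetical, and the data is a synthetic stand-in (two Gaussian blobs), not the banknote data set.

```python
import numpy as np

def accuracy(X, y, theta, theta_0):
    """Fraction of rows of X whose predicted label (+1/-1) matches y."""
    predictions = np.where(X @ theta + theta_0 > 0, 1, -1)
    return np.mean(predictions == y)

def random_search(X, y, tries, rng):
    """Try `tries` random hyperplanes and return the best (theta, theta_0, accuracy)."""
    best = (None, None, -1.0)
    for _ in range(tries):
        theta = rng.normal(size=X.shape[1])
        theta_0 = rng.normal()
        score = accuracy(X, y, theta, theta_0)
        if score > best[2]:
            best = (theta, theta_0, score)
    return best

rng = np.random.default_rng(0)

# Synthetic stand-in data: two blobs in 4 dimensions, labeled +1 and -1.
n = 200
X = np.vstack([
    rng.normal(loc=2.0, size=(n, 4)),
    rng.normal(loc=-2.0, size=(n, 4)),
])
y = np.concatenate([np.ones(n, dtype=int), -np.ones(n, dtype=int)])

theta, theta_0, best_accuracy = random_search(X, y, tries=1000, rng=rng)
print(f"best accuracy over 1000 random tries: {best_accuracy:.3f}")
```

Note that this measures accuracy on the same data used to pick the parameters, so it says nothing about how well the classifier would do on unseen examples.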

Solving a problem from scratch

I'd like to try solving a real-world problem from scratch, but I don't really have a problem in mind that lends itself to binary classification. For what it's worth, here are some Kaggle data sets that might eventually inspire me: