by Rahel Lüthy
I recently bought Hilary Mason’s An Introduction to Machine Learning with Web Data to refresh my rusty ML memory. The video class offers a solid yet entertaining mix of coding and theory, and it is definitely worth watching.
Hilary is coding in Python, but I took her excellent examples as an inspiration to further practice my Scala. Based on Breeze, a set of libraries for machine learning and numerical computing, I wrote my first little classifier.
My classifier uses a support vector machine (SVM) to distinguish images of two distinct categories: one containing circles, the other containing crosses. I used 10 different fonts to create the two sets of test images.
I used the red component of each pixel’s RGB value as a feature, so each image yields one feature vector with one entry per pixel. Given that all images are greyscale, the red, green, and blue components of a pixel are identical, so this seemed like a good first choice. And indeed, after training the SVM with 9 images of each category, the 10th image can be classified with a probability of roughly 0.75.
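To make the feature extraction a bit more concrete, here is a minimal sketch of how the red components could be pulled out of an image. It is not the code from the repository: the names `RedFeatures` and `redFeatures` are made up for illustration, it uses only `java.awt.image.BufferedImage` from the JDK rather than Breeze, and the scaling to the range 0 to 1 is my own assumption.

```scala
import java.awt.image.BufferedImage

// Hypothetical helper, not taken from the original project.
object RedFeatures {

  // Returns one feature per pixel, read row by row.
  // getRGB packs a pixel as 0xAARRGGBB, so the red component sits in bits 16-23.
  // Dividing by 255.0 scales each value to [0, 1] (an assumption, not a requirement).
  def redFeatures(image: BufferedImage): Array[Double] = {
    val features = for {
      y <- 0 until image.getHeight
      x <- 0 until image.getWidth
    } yield ((image.getRGB(x, y) >> 16) & 0xFF) / 255.0
    features.toArray
  }
}
```

Every image must have the same dimensions for this to work, because the SVM expects feature vectors of equal length. An image loaded with `javax.imageio.ImageIO.read` could be passed straight to `RedFeatures.redFeatures`.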
All code can be found on GitHub.