ML

Machine Learning

Learning materials

Lectures: Oct 8 (partially saved), Oct 15, Oct 22, Oct 29 (lost, forgot to save), Nov 5, Nov 12, Nov 19, Nov 26, Dec 3, Dec 10, Dec 17, Jan 7, Jan 14, Jan 21, Jan 28
You may want to watch at least the first two videos of the Neural networks series by 3Blue1Brown.
The book An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani, is freely available as a PDF and will be used for a large part of the lectures.

Lecture 2, October 15

Lecture 3, October 22

Demo code

Lecture 4, October 29

Notes were not saved. We talked about the bias--variance trade-off, regularization, and Ridge and Lasso regression, as well as handling categorical variables with dummy encoding.
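Since the notes for this lecture are missing, here is a minimal sketch of two of the topics: dummy encoding of a categorical variable and Ridge regression in closed form. It is not the lecture's demo code. The helper names `dummy_encode` and `ridge_fit` and the toy data are made up for illustration, and the intercept is assumed to be the first column of the design matrix and is left unpenalized. Lasso has no closed-form solution, so it is not shown.

```python
import numpy as np

def dummy_encode(values):
    """Dummy-encode category labels, dropping the first level.

    Dropping one level avoids perfect collinearity with the intercept column.
    """
    levels = sorted(set(values))
    columns = [[1.0 if v == level else 0.0 for v in values] for level in levels[1:]]
    return np.array(columns).T, levels[1:]

def ridge_fit(X, y, lam):
    """Closed-form Ridge estimate: beta = (X^T X + lam * I)^{-1} X^T y.

    The first column of X is assumed to be the intercept and is not penalized.
    """
    penalty = lam * np.eye(X.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

# Toy data (invented): one numeric feature and one categorical feature.
rng = np.random.default_rng(0)
size = rng.uniform(0, 1, 60)
color = rng.choice(["red", "green", "blue"], 60)

D, kept_levels = dummy_encode(list(color))          # columns for "green" and "red"
X = np.column_stack([np.ones(60), size, D])         # intercept, size, dummies
y = 2.0 * size + (color == "red") + rng.normal(0, 0.1, 60)

beta_ols = ridge_fit(X, y, lam=0.0)                 # lam = 0 is ordinary least squares
beta_ridge = ridge_fit(X, y, lam=10.0)              # larger lam shrinks coefficients
print("OLS:  ", np.round(beta_ols, 3))
print("Ridge:", np.round(beta_ridge, 3))
```

Increasing `lam` trades variance for bias: the penalized coefficients move toward zero, which is the regularization effect discussed in the lecture.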

Demo code

Lectures 5--6, November 5, 12

Demo code and figures

Lecture 9, December 3

Demo code taken from the scikit-learn documentation.

Lecture 10, December 10

Assignment 1

Write a neural network from scratch to solve a classification problem of your choice.
Questions and answers:

Q1. Are we allowed to base our solution on the code that you sent us?
A1. Yes, but it is not the best choice. It is more natural to use a layer as the basic building block.
Q2. What kind of solution is expected? Should it simply be a working neural net that can learn from a dataset, plus a more complex net that was trained beforehand, plus some functions that show results, statistics, etc. about the net's performance?
A2. Yes. Training beforehand is allowed but not obligatory (saving and loading weights is too complex). Some functions will be needed, but no fancy demonstrations are expected.
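To illustrate what "a layer as the basic building block" in A1 could look like, here is one possible sketch, not the expected solution. The class names `Dense` and `Network`, the sigmoid activation, the squared-error loss and the XOR toy data are all assumptions chosen to keep the example short.

```python
import numpy as np

class Dense:
    """Fully connected layer with a sigmoid activation (hypothetical design)."""

    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                                    # cache the input for backward()
        self.a = 1.0 / (1.0 + np.exp(-(x @ self.W + self.b)))
        return self.a

    def backward(self, grad_a, lr):
        grad_z = grad_a * self.a * (1.0 - self.a)     # chain rule through the sigmoid
        grad_x = grad_z @ self.W.T                    # gradient for the previous layer
        self.W -= lr * self.x.T @ grad_z / len(self.x)  # batch-averaged update
        self.b -= lr * grad_z.mean(axis=0)
        return grad_x

class Network:
    """A network is just a list of layers applied in order."""

    def __init__(self, layers):
        self.layers = layers

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def train_step(self, x, y, lr):
        out = self.forward(x)
        grad = out - y                                # d(0.5 * squared error) / d(out)
        for layer in reversed(self.layers):
            grad = layer.backward(grad, lr)
        return 0.5 * np.mean(np.sum((out - y) ** 2, axis=1))

# Toy classification problem (XOR), used only to check that the loss goes down.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

net = Network([Dense(2, 8, rng), Dense(8, 1, rng)])
losses = [net.train_step(X, Y, lr=2.0) for _ in range(5000)]
print("first loss:", losses[0], "last loss:", losses[-1])
```

With this design, adding a new layer type only requires another class with `forward` and `backward` methods. The network code itself does not change.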

Database with handwritten digits:
You may take the files mnist_loader.py and mnist.pkl.gz from this repository. Please read the documentation in mnist_loader.py to learn how to use it (it's simple!).

Assignment 2

Data: look for multivariate, categorical datasets. Iris, breast cancer and Titanic (not on that webpage) are among the most commonly used. Minimal program: model the data with decision trees; try pruning and find alpha by cross-validation (or by checking on test data); then try bagging and random forests.
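The minimal program could be sketched with scikit-learn roughly as follows. This is one possible outline, not a reference solution. The breast cancer dataset, the 70/30 split, 5-fold cross-validation and 100 estimators are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate alphas come from the cost-complexity pruning path of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train
)
# Drop the last alpha (it prunes the tree to a single node) and guard against
# tiny negative values caused by floating-point error.
alphas = np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))

# Choose alpha by 5-fold cross-validation on the training set only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), {"ccp_alpha": alphas}, cv=5
)
search.fit(X_train, y_train)
pruned_tree = search.best_estimator_

models = {
    "pruned tree": pruned_tree,
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=100, random_state=0
    ),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)   # accuracy on held-out data
    print(f"{name}: {scores[name]:.3f}")
```

The test set is touched only once, at the end. Selecting alpha directly on the test data, the alternative mentioned above, is simpler but makes the reported test accuracy optimistic.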

Assignment 3

Find data that needs some processing, e.g. here. Then use an SVM or k-means clustering (the choice is up to you, but you may need to choose data that suits the method).
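As a rough illustration of why the processing step matters for an SVM, the sketch below compares the same classifier with and without feature scaling. Everything here is an assumption made for the example: the wine dataset, the artificially removed entries standing in for messy data, median imputation, and the default RBF kernel.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Simulate messy data by blanking out about 5% of the entries (illustration only).
rng = np.random.default_rng(0)
X = X.copy()
X[rng.random(X.shape) < 0.05] = np.nan

# Same classifier, with and without standardizing the features.
raw = make_pipeline(SimpleImputer(strategy="median"), SVC())
scaled = make_pipeline(SimpleImputer(strategy="median"), StandardScaler(), SVC())

raw_acc = cross_val_score(raw, X, y, cv=5).mean()
scaled_acc = cross_val_score(scaled, X, y, cv=5).mean()
print(f"without scaling: {raw_acc:.3f}")
print(f"with scaling:    {scaled_acc:.3f}")
```

Putting the imputer and scaler inside the pipeline means they are refit on each training fold, so no information from the validation fold leaks into the preprocessing. For the clustering variant, the final `SVC()` step could be replaced by `KMeans`, with a clustering score in place of accuracy.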