Machine Learning
Learning materials
- Lectures: Oct 8 (partially saved), Oct 15, Oct 22, Oct 29 (lost, forgot to save), Nov 5, Nov 12, Nov 19, Nov 26, Dec 3, Dec 10, Dec 17, Jan 7, Jan 14, Jan 21, Jan 28
- You may want to watch at least the first two videos of the Neural networks series by 3blue1brown.
- The book An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (freely available as a PDF), will be used for a large part of the lectures.
Lecture 2, October 15
Lecture 3, October 22
Lecture 4, October 29
Notes were not saved. We talked about the bias-variance trade-off, regularization, and Ridge and Lasso regression. We also covered handling categorical variables with dummy encoding.
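Since the notes for this lecture are lost, here is a minimal sketch of two of the topics: dummy encoding of a categorical variable and Ridge regression in closed form. It is not the code shown in the lecture. The helper names `dummy_encode` and `ridge_fit` and the toy data are made up for illustration, and NumPy is assumed to be installed. Lasso has no closed-form solution, so it is left out here.

```python
import numpy as np

def dummy_encode(values):
    """One 0/1 column per category except the first (the baseline level)."""
    levels = sorted(set(values))
    columns = [[1.0 if v == level else 0.0 for v in values] for level in levels[1:]]
    return np.array(columns).T, levels

def ridge_fit(X, y, lam):
    """Closed-form ridge solution on centered data; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

# Toy data (made up for illustration): one numeric and one categorical predictor.
rng = np.random.default_rng(0)
size = rng.normal(size=60)
city = rng.choice(["A", "B", "C"], size=60)
dummies, levels = dummy_encode(city)
X = np.column_stack([size, dummies])
y = 2.0 * size + 1.5 * (city == "B") - 1.0 * (city == "C") + rng.normal(scale=0.3, size=60)

# A larger penalty shrinks the coefficient vector towards zero.
for lam in (0.0, 10.0, 1000.0):
    intercept, beta = ridge_fit(X, y, lam)
    print(lam, np.round(beta, 3), round(float(np.linalg.norm(beta)), 3))
```

With `lam = 0` this is ordinary least squares. Increasing `lam` trades a little bias for lower variance, which is the trade-off discussed in the lecture.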
Demo code
Lectures 5-6, November 5 and 12
Lecture 9, December 3
Demo code taken from the scikit-learn documentation.
Lecture 10, December 10
Lecture 11, December 17
Lecture 13, January 14
Lecture 14, January 21
Assignment 1
Write a neural network from scratch to solve a classification problem of your choice.
Questions and answers:
- Q1. Are we allowed to base our solution on the code that you sent us?
A1. Yes, but it's not the best choice. It is more natural to use a layer as the basic building block.
- Q2. What kind of solution is expected? Should it simply be a capable neural net that is able to learn from a dataset, plus a more complex net that was trained beforehand, plus some functions that show the results, statistics, etc. about the performance of the net?
A2. Yes. Training beforehand is allowed but not obligatory (saving and loading weights is too complex). Some functions will be needed, but no fancy demonstrations are expected.
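To illustrate what "layer as a building block" could mean, here is a minimal sketch, not a reference solution. The class names `Dense` and `Network`, the sigmoid activation, the squared-error loss and the toy data are all assumptions made for this example, and NumPy is assumed to be installed.

```python
import numpy as np

class Dense:
    """Fully connected layer with a sigmoid activation (hypothetical design)."""

    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                                    # cache the input for backward()
        self.a = 1.0 / (1.0 + np.exp(-(x @ self.W + self.b)))
        return self.a

    def backward(self, grad_a, lr):
        """Take dLoss/d(output), update the parameters, return dLoss/d(input)."""
        grad_z = grad_a * self.a * (1.0 - self.a)     # sigmoid derivative
        grad_x = grad_z @ self.W.T                    # computed before W is updated
        self.W -= lr * self.x.T @ grad_z
        self.b -= lr * grad_z.sum(axis=0)
        return grad_x

class Network:
    """A stack of layers; each layer only knows its own forward and backward pass."""

    def __init__(self, sizes, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [Dense(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

    def train_step(self, x, y, lr):
        out = self.forward(x)
        grad = (out - y) / len(x)                     # gradient of the loss returned below
        for layer in reversed(self.layers):
            grad = layer.backward(grad, lr)
        return float(0.5 * np.sum((out - y) ** 2) / len(x))

# Toy two-class problem (made up for illustration): two Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 0.5, size=(100, 2)), rng.normal(2.0, 0.5, size=(100, 2))])
y = np.vstack([np.zeros((100, 1)), np.ones((100, 1))])

net = Network([2, 4, 1])
losses = [net.train_step(X, y, lr=1.0) for _ in range(1000)]
accuracy = float(np.mean((net.forward(X) > 0.5) == (y > 0.5)))
print(f"loss {losses[0]:.4f} -> {losses[-1]:.4f}, training accuracy {accuracy:.2f}")
```

The point of the design is that `Network` never touches weights directly, so adding another layer type only requires a new class with the same `forward` and `backward` methods.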
Database with handwritten digits:
you may take the files mnist_loader.py and mnist.pkl.gz from this repository. Please read the documentation in mnist_loader.py to learn how to use it (it's simple!).
Assignment 2
Data. Look for multivariate, categorical data sets. Iris, breast cancer and titanic (not on that webpage) are among the most commonly used. Minimal program: model the data with decision trees; check pruning, choosing alpha by cross-validation (or by checking on test data); check bagging and random forests.
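The minimal program could be sketched as follows. This is one possible outline, not the expected solution. It assumes scikit-learn is installed, uses its bundled breast cancer data, and takes "alpha" to mean the cost-complexity pruning parameter `ccp_alpha`. The split sizes, the number of trees and the 5-fold setting are arbitrary choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Candidate alphas come from the cost-complexity pruning path of the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
# Drop the last alpha (it prunes the tree to its root) and guard against tiny negatives.
alphas = [max(float(a), 0.0) for a in path.ccp_alphas[:-1]]

# Pick the alpha with the best 5-fold cross-validated accuracy on the training part.
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X_train, y_train, cv=5
    ).mean()
    for a in alphas
]
best_alpha = alphas[cv_scores.index(max(cv_scores))]

models = {
    "full tree": DecisionTreeClassifier(random_state=0),
    "pruned tree": DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
test_accuracy = {}
for name, model in models.items():
    test_accuracy[name] = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name:14s} {test_accuracy[name]:.3f}")
```

The test set is touched only once, at the end, so the choice of alpha does not leak information from it.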
Assignment 3
Find data that needs some processing, e.g. here. Then use SVM or k-means clustering (the choice is up to you, but you may need to choose appropriate data for that step).
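As a rough illustration of "processing, then SVM", here is a sketch assuming scikit-learn and NumPy are installed. The data are synthetic and made up for this example: two features on very different scales with some missing entries. Median imputation, standardization and the RBF kernel are arbitrary choices, not requirements.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a messy data set (made up for illustration).
rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)
X = np.column_stack([
    rng.normal(loc=labels * 2.0, scale=1.0),          # small-scale feature
    rng.normal(loc=labels * 2000.0, scale=1000.0),    # large-scale feature
])
X[rng.random(X.shape) < 0.1] = np.nan                 # knock out roughly 10% of entries

# Impute, scale, then classify; the pipeline refits every step inside each CV fold.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    SVC(kernel="rbf", C=1.0),
)
scores = cross_val_score(model, X, labels, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```

Keeping the preprocessing inside the pipeline matters because SVMs are sensitive to feature scale, and fitting the scaler on all data before cross-validation would leak information between folds.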