Machine Learning
Learning materials
- It is worth watching at least the first two videos of the Neural networks series by 3Blue1Brown.
- The book An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani (freely available as a PDF) will be used for a large part of the lecture.
- Lectures: Oct 6, Oct 13, Oct 20 (by Dr. Bogus), Oct 27, Nov 3 (and screenshots), Nov 10, Nov 17, Nov 24, Dec 1, Dec 8 (screenshot), Dec 22, Jan 12, Jan 19, Jan 26
Time slots for the 'exam'
Friday, Feb 4, 1-2 pm. We will use the same Zoom meeting as for the lectures.
(Other days will be announced later.)
Office hours
tba
Assignment 1
Linear regression (0.5 points) and ridge regression (iterative method: 0.5 points; closed-form formula: 0.25 points). Deadline: the labs during the week of 18-22 October.
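A minimal NumPy sketch of both approaches, for illustration only (the function names and the toy data are hypothetical, not part of the assignment). It minimizes ||y - Xw||^2 + lam * ||w||^2 once with the closed-form formula w = (X^T X + lam * I)^(-1) X^T y and once iteratively with plain gradient descent; with lam = 0 the formula gives ordinary linear regression. For simplicity the sketch assumes the intercept is a column of ones in X, so the intercept is penalized too (centering the data first is a common alternative).

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Closed-form ridge estimate w = (X^T X + lam * I)^(-1) X^T y.

    With lam = 0 this is ordinary least-squares linear regression.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)


def ridge_gradient_descent(X, y, lam, n_iter=5000, lr=None):
    """Iterative ridge: gradient descent on ||y - X w||^2 + lam * ||w||^2."""
    if lr is None:
        # Step 1/L, where L = 2 * (largest singular value of X)^2 + 2 * lam is the
        # Lipschitz constant of the gradient; this step size guarantees convergence.
        lr = 1.0 / (2.0 * (np.linalg.norm(X, 2) ** 2 + lam))
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * X.T @ (y - X @ w) + 2.0 * lam * w
        w -= lr * grad
    return w


# Toy data: an intercept column of ones plus two random features.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=100)

w_formula = ridge_closed_form(X, y, lam=1.0)
w_iterative = ridge_gradient_descent(X, y, lam=1.0)
print(np.round(w_formula, 3), np.round(w_iterative, 3))
```

Both estimates should agree up to numerical precision, which makes the formula a convenient check for the iterative implementation.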
Assignment 2
Write a neural network from scratch to solve a classification problem of your choice.
It is enough to implement a 2-layer dense neural network and experiment with some of its hyperparameters. With a good implementation you can easily add more layers and experiment with the number of layers as well, but this is not obligatory.
Suggestions: use the MNIST database of handwritten digits (see below), and use Layer as the basic building block so that you can use efficient matrix operations.
You may look at the forward pass/backpropagation example, calculated more or less by hand for a very small 2x2 network.
For debugging, you may adapt the unit tests from a (bad) example implementation. (That implementation uses Neuron as the basic building block, which is a bad idea.)
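A minimal sketch of a Layer-based network, assuming NumPy, sigmoid activations and a squared-error loss; the class, method and function names are hypothetical choices, not a required interface. Each layer processes a whole batch with matrix operations, caches what it needs in forward(), and in backward() turns the gradient with respect to its output into gradients for its own parameters and for its input.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class Layer:
    """Dense layer with a sigmoid activation, processing a whole batch at once.

    Input shape: (batch_size, n_in); output shape: (batch_size, n_out).
    """

    def __init__(self, n_in, n_out, rng):
        # Small random weights (scaled by 1/sqrt(n_in)) and zero biases.
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        self.a = sigmoid(x @ self.W + self.b)
        return self.a

    def backward(self, grad_a):
        """Take dLoss/d(output), store parameter gradients, return dLoss/d(input)."""
        grad_z = grad_a * self.a * (1.0 - self.a)  # sigmoid'(z) = a * (1 - a)
        self.grad_W = self.x.T @ grad_z
        self.grad_b = grad_z.sum(axis=0)
        return grad_z @ self.W.T

    def step(self, lr):
        # Plain gradient-descent update.
        self.W -= lr * self.grad_W
        self.b -= lr * self.grad_b


def forward(net, x):
    for layer in net:
        x = layer.forward(x)
    return x


def loss(pred, target):
    return 0.5 * np.sum((pred - target) ** 2)


def train_step(net, x, target, lr):
    pred = forward(net, x)
    grad = pred - target  # dLoss/d(pred) for the loss 0.5 * ||pred - target||^2
    for layer in reversed(net):  # backpropagation: last layer first
        grad = layer.backward(grad)
    for layer in net:
        layer.step(lr)
    return loss(pred, target)


# Toy usage: a 2-layer network (4 inputs -> 5 hidden -> 3 outputs) on random data.
rng = np.random.default_rng(0)
net = [Layer(4, 5, rng), Layer(5, 3, rng)]
x = rng.normal(size=(8, 4))
target = rng.uniform(size=(8, 3))
for epoch in range(200):
    current = train_step(net, x, target, lr=0.05)
print(f"loss after training: {current:.4f}")
```

A numerical gradient check (perturb one weight by a small eps, recompute the loss, compare the finite difference with grad_W) is a quick way to debug backward(). For MNIST one would typically replace the squared error with softmax and cross-entropy on the output layer.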
Database of handwritten digits: you may take the files mnist_loader.py and mnist.pkl.gz from this repository. Please read the documentation in mnist_loader.py to learn how to use it (it's simple!).
Assignment 3
Data: look for multivariate, categorical data sets. Iris, breast cancer and Titanic (not on that webpage) are among the most commonly used ones (but Iris is not the best choice, since it has too few observations). Minimal program: use decision trees to model the data; check pruning and find alpha by cross-validation (or by checking on test data); try at least 2 of these 3 methods: bagging, random forests, boosting.
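One possible minimal program, sketched with scikit-learn (assumed to be installed) on its bundled copy of the breast cancer data; the split size, the number of folds and the estimator settings are arbitrary illustrative choices. It computes the cost-complexity pruning path of a decision tree, picks alpha by 5-fold cross-validation on the training set, and compares the pruned tree with bagging, a random forest and gradient boosting on held-out test data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Candidate alphas for cost-complexity pruning, computed from the unpruned tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = path.ccp_alphas[:-1]  # the largest alpha prunes the tree down to its root

# Choose alpha by 5-fold cross-validation on the training set only.
cv_scores = [
    cross_val_score(DecisionTreeClassifier(ccp_alpha=a, random_state=0), X_train, y_train, cv=5).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(cv_scores))]

models = {
    "pruned tree": DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0),
    "bagging": BaggingClassifier(n_estimators=100, random_state=0),  # bags decision trees by default
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}
test_acc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_acc[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy {test_acc[name]:.3f}")
print(f"chosen alpha: {best_alpha:.5f}")
```

Plotting cv_scores against alphas is a simple way to see how pruning trades tree size against accuracy.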
Assignment 4
Use k-means and hierarchical clustering on a data set of your choice. Run k-means several times and use at least 2 different linkage types for hierarchical clustering. You may use e.g. scikit-learn. Deadline: January 21.
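A short scikit-learn sketch, using the bundled wine data set as an arbitrary example (features standardized first). Each k-means run uses a single random initialization with a different seed, so the runs can be compared by their inertia (within-cluster sum of squares). Hierarchical clustering is run with ward and single linkage, and the adjusted Rand index against the known classes is printed only as a rough sanity check.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_wine
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)  # put all features on a comparable scale

# k-means depends on its random initialization: run it with several seeds.
kmeans_runs = []
for seed in range(5):
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    kmeans_runs.append(km)
    print(f"seed {seed}: inertia {km.inertia_:.2f}")
best = min(kmeans_runs, key=lambda km: km.inertia_)

# Hierarchical (agglomerative) clustering with two different linkage types.
hier_labels = {}
for linkage in ("ward", "single"):
    labels = AgglomerativeClustering(n_clusters=3, linkage=linkage).fit_predict(X)
    hier_labels[linkage] = labels
    print(linkage, np.bincount(labels), f"ARI vs. classes: {adjusted_rand_score(y, labels):.2f}")
```

Comparing the cluster sizes from np.bincount for the two linkages usually shows the difference quickly: single linkage tends to produce one large cluster plus a few tiny ones.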
Assignment 5
Use CNNs or RNNs for a problem of your choice. For example, you may use a CNN for image classification or an RNN for time series prediction. You may use Keras or another framework. Deadline: January 25.
Rules
- There will be at least 5 assignments for the labs, each worth 1 point.
If a solution is delivered after the deadline, its score (between 0 and 1) is multiplied by the factor (1 - 0.1 * delay in weeks).
You are expected both to discuss your solution with the instructor and to send your code (e.g. by email); whichever of the two you do first determines the delay.
You need to come to the labs to discuss your solutions; apart from that, attendance is voluntary. You are always welcome to come, ask questions, etc. However, if you do not intend to ask questions or discuss your solutions, it might be better (for you) not to come. You may also leave the labs at any time without asking for permission (in fact, it is better if you leave as quietly as possible).
- At the end there will be a kind of oral exam. Before it, you will be given a list of methods from the lecture that you should know for the exam. During the exam, each student marks the methods they know and is asked questions only about those. The answers receive a score p between 0 and 1. The final exam score is calculated as p*(0.2 + (number of methods marked) / (number of all methods)). Cheating results in 0 points and a final mark of 2.0.
- To obtain a positive mark it is necessary and sufficient to obtain at least 2.5 points in total, including at least 0.5 from the 'exam'. For the marks 3.0, 3.5, ..., 5.0 one needs at least 0.5; 0.575; 0.65; 0.725; 0.8 points from the exam AND at least 2.5; 3.25; 3.75; 4.25; 4.75 points in total, respectively. For example, for 4.5 (aka 4+) it suffices to obtain 4.25 points in total, including at least 0.725 from the 'exam'.
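The scoring rules above can be written down in a few lines of Python. This is an unofficial sketch for checking your own arithmetic; the function names are hypothetical, and two details are assumptions not stated in the rules: the delay is simply taken as a number of weeks (how partial weeks are counted is not specified), and the late factor is not allowed to drop below 0.

```python
def late_factor(delay_weeks):
    """Multiplier for a late lab solution: 1 - 0.1 * delay (in weeks).

    Assumption (not in the rules): the factor never drops below 0.
    """
    return max(0.0, 1.0 - 0.1 * delay_weeks)


def exam_score(p, n_marked, n_all):
    """Final exam score p * (0.2 + n_marked / n_all), with p between 0 and 1."""
    return p * (0.2 + n_marked / n_all)


# (mark, minimal exam score, minimal total score), from the highest mark down.
THRESHOLDS = [
    (5.0, 0.8, 4.75),
    (4.5, 0.725, 4.25),
    (4.0, 0.65, 3.75),
    (3.5, 0.575, 3.25),
    (3.0, 0.5, 2.5),
]


def final_mark(total, exam):
    """Highest mark whose exam AND total thresholds are both met, else 2.0.

    `total` already includes the exam score.
    """
    for mark, min_exam, min_total in THRESHOLDS:
        if exam >= min_exam and total >= min_total:
            return mark
    return 2.0


print(0.8 * late_factor(2))       # a 0.8-point solution handed in two weeks late
print(exam_score(0.9, 10, 20))    # p = 0.9, 10 of 20 methods marked
print(final_mark(4.25, 0.725))    # the example from the rules above
```

For instance, a solution worth 0.8 points handed in two weeks late gives 0.8 * 0.8 = 0.64 points, and final_mark(4.25, 0.725) returns 4.5, matching the example in the rules.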