From: 2weiEmu Date: Tue, 4 Nov 2025 16:23:43 +0000 (+0100) Subject: even more ML notes X-Git-Url: https://git.saalbach.dev/?a=commitdiff_plain;h=6d30bc12053e486c5d0d0dbfd80bf4c92c9e1912;p=research-obsidian.git even more ML notes --- diff --git a/.obsidian/workspace.json b/.obsidian/workspace.json index f10bf8a..47b87d2 100644 --- a/.obsidian/workspace.json +++ b/.obsidian/workspace.json @@ -4,71 +4,34 @@ "type": "split", "children": [ { - "id": "69ea515b2aa83af1", - "type": "split", + "id": "0d762e903c6b0576", + "type": "tabs", "children": [ { - "id": "0101728309e6c9e1", - "type": "tabs", - "children": [ - { - "id": "e2e550886a75d1d2", - "type": "leaf", - "state": { - "type": "markdown", - "state": { - "file": "University/Machine Learning/Full Notes.md", - "mode": "source", - "source": false, - "backlinks": true, - "backlinkOpts": { - "collapseAll": false, - "extraContext": false, - "sortOrder": "alphabetical", - "showSearch": false, - "searchQuery": "", - "backlinkCollapsed": false, - "unlinkedCollapsed": true - } - }, - "icon": "lucide-file", - "title": "Full Notes" - } - } - ] - }, - { - "id": "0d762e903c6b0576", - "type": "tabs", - "children": [ - { - "id": "214929be76b06d19", - "type": "leaf", - "state": { - "type": "markdown", - "state": { - "file": "University/Machine Learning/Full Notes.md", - "mode": "source", - "source": false, - "backlinks": true, - "backlinkOpts": { - "collapseAll": false, - "extraContext": false, - "sortOrder": "alphabetical", - "showSearch": false, - "searchQuery": "", - "backlinkCollapsed": false, - "unlinkedCollapsed": true - } - }, - "icon": "lucide-file", - "title": "Full Notes" + "id": "214929be76b06d19", + "type": "leaf", + "state": { + "type": "markdown", + "state": { + "file": "University/Machine Learning/Full Notes.md", + "mode": "source", + "source": false, + "backlinks": true, + "backlinkOpts": { + "collapseAll": false, + "extraContext": false, + "sortOrder": "alphabetical", + "showSearch": false, + "searchQuery": "", + 
"backlinkCollapsed": false, + "unlinkedCollapsed": true } - } - ] + }, + "icon": "lucide-file", + "title": "Full Notes" + } } - ], - "direction": "horizontal" + ] } ], "direction": "vertical" @@ -212,21 +175,21 @@ "pdf-plus:PDF++: Toggle auto-paste": false } }, - "active": "e2e550886a75d1d2", + "active": "214929be76b06d19", "lastOpenFiles": [ - "Pasted image 20251103163149.png", - "Pasted image 20251103162442.png", - "Pasted image 20251103161635.png", - "Pasted image 20251103161604.png", - "Pasted image 20251103161333.png", - "Pasted image 20251103161144.png", - "Pasted image 20251103161028.png", - "Pasted image 20251103160756.png", - "Pasted image 20251102180335.png", - "Pasted image 20251102180326.png", - "Pasted image 20251102175852.png", - "Untitled 1.md", + "Pasted image 20251104172129.png", + "Pasted image 20251104172116.png", + "Pasted image 20251104172012.png", + "Pasted image 20251104172000.png", + "Pasted image 20251104171952.png", + "Pasted image 20251104171659.png", + "Pasted image 20251104171550.png", + "Pasted image 20251104171524.png", + "Pasted image 20251104171324.png", + "Pasted image 20251104171316.png", "University/Machine Learning/Full Notes.md", + "Pasted image 20251104170706.png", + "Untitled 1.md", "some_ideas.md", "University/Machine Learning", "Physics/Just some questions.md", diff --git a/Pasted image 20251104161342.png b/Pasted image 20251104161342.png new file mode 100644 index 0000000..fc77ed4 Binary files /dev/null and b/Pasted image 20251104161342.png differ diff --git a/Pasted image 20251104161351.png b/Pasted image 20251104161351.png new file mode 100644 index 0000000..2760969 Binary files /dev/null and b/Pasted image 20251104161351.png differ diff --git a/Pasted image 20251104161450.png b/Pasted image 20251104161450.png new file mode 100644 index 0000000..521759f Binary files /dev/null and b/Pasted image 20251104161450.png differ diff --git a/Pasted image 20251104161501.png b/Pasted image 20251104161501.png new file mode 100644 
index 0000000..0b889a2 Binary files /dev/null and b/Pasted image 20251104161501.png differ diff --git a/Pasted image 20251104161540.png b/Pasted image 20251104161540.png new file mode 100644 index 0000000..2a03428 Binary files /dev/null and b/Pasted image 20251104161540.png differ diff --git a/Pasted image 20251104161712.png b/Pasted image 20251104161712.png new file mode 100644 index 0000000..6bdc0b7 Binary files /dev/null and b/Pasted image 20251104161712.png differ diff --git a/Pasted image 20251104161913.png b/Pasted image 20251104161913.png new file mode 100644 index 0000000..59e3a1e Binary files /dev/null and b/Pasted image 20251104161913.png differ diff --git a/Pasted image 20251104162104.png b/Pasted image 20251104162104.png new file mode 100644 index 0000000..6994c35 Binary files /dev/null and b/Pasted image 20251104162104.png differ diff --git a/Pasted image 20251104162129.png b/Pasted image 20251104162129.png new file mode 100644 index 0000000..3b979f4 Binary files /dev/null and b/Pasted image 20251104162129.png differ diff --git a/Pasted image 20251104162353.png b/Pasted image 20251104162353.png new file mode 100644 index 0000000..44e03e7 Binary files /dev/null and b/Pasted image 20251104162353.png differ diff --git a/Pasted image 20251104162733.png b/Pasted image 20251104162733.png new file mode 100644 index 0000000..0e70f25 Binary files /dev/null and b/Pasted image 20251104162733.png differ diff --git a/Pasted image 20251104162845.png b/Pasted image 20251104162845.png new file mode 100644 index 0000000..afe49ee Binary files /dev/null and b/Pasted image 20251104162845.png differ diff --git a/Pasted image 20251104162934.png b/Pasted image 20251104162934.png new file mode 100644 index 0000000..4765b02 Binary files /dev/null and b/Pasted image 20251104162934.png differ diff --git a/Pasted image 20251104163022.png b/Pasted image 20251104163022.png new file mode 100644 index 0000000..34ccc82 Binary files /dev/null and b/Pasted image 20251104163022.png 
differ diff --git a/Pasted image 20251104163117.png b/Pasted image 20251104163117.png new file mode 100644 index 0000000..c093f48 Binary files /dev/null and b/Pasted image 20251104163117.png differ diff --git a/Pasted image 20251104163347.png b/Pasted image 20251104163347.png new file mode 100644 index 0000000..3f6465f Binary files /dev/null and b/Pasted image 20251104163347.png differ diff --git a/Pasted image 20251104163416.png b/Pasted image 20251104163416.png new file mode 100644 index 0000000..9992904 Binary files /dev/null and b/Pasted image 20251104163416.png differ diff --git a/Pasted image 20251104163518.png b/Pasted image 20251104163518.png new file mode 100644 index 0000000..b25eeb9 Binary files /dev/null and b/Pasted image 20251104163518.png differ diff --git a/Pasted image 20251104163530.png b/Pasted image 20251104163530.png new file mode 100644 index 0000000..9f98516 Binary files /dev/null and b/Pasted image 20251104163530.png differ diff --git a/Pasted image 20251104165636.png b/Pasted image 20251104165636.png new file mode 100644 index 0000000..90a4a83 Binary files /dev/null and b/Pasted image 20251104165636.png differ diff --git a/Pasted image 20251104165837.png b/Pasted image 20251104165837.png new file mode 100644 index 0000000..46290c4 Binary files /dev/null and b/Pasted image 20251104165837.png differ diff --git a/Pasted image 20251104165911.png b/Pasted image 20251104165911.png new file mode 100644 index 0000000..88b67f8 Binary files /dev/null and b/Pasted image 20251104165911.png differ diff --git a/Pasted image 20251104165918.png b/Pasted image 20251104165918.png new file mode 100644 index 0000000..9b83621 Binary files /dev/null and b/Pasted image 20251104165918.png differ diff --git a/Pasted image 20251104170328.png b/Pasted image 20251104170328.png new file mode 100644 index 0000000..96106b2 Binary files /dev/null and b/Pasted image 20251104170328.png differ diff --git a/Pasted image 20251104170353.png b/Pasted image 20251104170353.png new 
file mode 100644 index 0000000..25ab191 Binary files /dev/null and b/Pasted image 20251104170353.png differ diff --git a/Pasted image 20251104170414.png b/Pasted image 20251104170414.png new file mode 100644 index 0000000..fbe0758 Binary files /dev/null and b/Pasted image 20251104170414.png differ diff --git a/Pasted image 20251104170504.png b/Pasted image 20251104170504.png new file mode 100644 index 0000000..a8cf281 Binary files /dev/null and b/Pasted image 20251104170504.png differ diff --git a/Pasted image 20251104170627.png b/Pasted image 20251104170627.png new file mode 100644 index 0000000..63b9b0e Binary files /dev/null and b/Pasted image 20251104170627.png differ diff --git a/Pasted image 20251104170656.png b/Pasted image 20251104170656.png new file mode 100644 index 0000000..00cd776 Binary files /dev/null and b/Pasted image 20251104170656.png differ diff --git a/Pasted image 20251104170706.png b/Pasted image 20251104170706.png new file mode 100644 index 0000000..2c3f716 Binary files /dev/null and b/Pasted image 20251104170706.png differ diff --git a/Pasted image 20251104171316.png b/Pasted image 20251104171316.png new file mode 100644 index 0000000..1388ef9 Binary files /dev/null and b/Pasted image 20251104171316.png differ diff --git a/Pasted image 20251104171324.png b/Pasted image 20251104171324.png new file mode 100644 index 0000000..67cee85 Binary files /dev/null and b/Pasted image 20251104171324.png differ diff --git a/Pasted image 20251104171524.png b/Pasted image 20251104171524.png new file mode 100644 index 0000000..fe99d78 Binary files /dev/null and b/Pasted image 20251104171524.png differ diff --git a/Pasted image 20251104171550.png b/Pasted image 20251104171550.png new file mode 100644 index 0000000..b8f4136 Binary files /dev/null and b/Pasted image 20251104171550.png differ diff --git a/Pasted image 20251104171659.png b/Pasted image 20251104171659.png new file mode 100644 index 0000000..7c55886 Binary files /dev/null and b/Pasted image 
20251104171659.png differ diff --git a/Pasted image 20251104171952.png b/Pasted image 20251104171952.png new file mode 100644 index 0000000..4604b7f Binary files /dev/null and b/Pasted image 20251104171952.png differ diff --git a/Pasted image 20251104172000.png b/Pasted image 20251104172000.png new file mode 100644 index 0000000..a2120ba Binary files /dev/null and b/Pasted image 20251104172000.png differ diff --git a/Pasted image 20251104172012.png b/Pasted image 20251104172012.png new file mode 100644 index 0000000..ce0f1c6 Binary files /dev/null and b/Pasted image 20251104172012.png differ diff --git a/Pasted image 20251104172116.png b/Pasted image 20251104172116.png new file mode 100644 index 0000000..78c7c75 Binary files /dev/null and b/Pasted image 20251104172116.png differ diff --git a/Pasted image 20251104172129.png b/Pasted image 20251104172129.png new file mode 100644 index 0000000..31a1c1d Binary files /dev/null and b/Pasted image 20251104172129.png differ diff --git a/University/Machine Learning/Full Notes.md b/University/Machine Learning/Full Notes.md index fe0ae5e..01de4a6 100644 --- a/University/Machine Learning/Full Notes.md +++ b/University/Machine Learning/Full Notes.md @@ -628,4 +628,216 @@ We assume conditional independence given y (nvm it doesn't seem to be reading the next slide, maybe figure this out TODO) we just estimate $p(x_i|y)$ per feature and multiply them -$p(x|y)=p(x_1,x_2,x_3,x_4,...,x_d|y)=\prod$ +$$\begin{align} +p(x|y)=p(x_1,x_2,x_3,x_4,...,x_d|y)=\prod_{i=1}^d p(x_i|y)=\\p(x_1|y)p(x_2|y)...p(x_d|y)\end{align}$$ +(there is no curse of dimensionality) + +### Parametric vs Non. Parametric +But that means you still have to choose a model for $p(x_i|y)$. +![[Pasted image 20251104161342.png]] +![[Pasted image 20251104161351.png]] + +EXAMPLE: +![[Pasted image 20251104161450.png]] +![[Pasted image 20251104161501.png]] +TODO: what is the $\exp$ function? 
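On the `\exp` TODO: $\exp(z)$ is just $e^z$, the exponential function. A minimal sketch of where it shows up, assuming a Gaussian model for each $p(x_i|y)$ (one common parametric choice; the slides may well use a different one). The names `gaussian_pdf`, `log_gaussian_pdf`, `fit_gaussian_nb` and `predict` and the toy data are made up for illustration. The product over features is computed as a sum of logs, purely for numerical stability.

```python
import math

def gaussian_pdf(x, mean, var):
    """1-D Gaussian density; math.exp(z) computes e**z."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def log_gaussian_pdf(x, mean, var):
    """Logarithm of gaussian_pdf, computed directly to avoid underflow."""
    return -0.5 * math.log(2.0 * math.pi * var) - ((x - mean) ** 2) / (2.0 * var)

def fit_gaussian_nb(X, y, min_var=1e-9):
    """Per class: prior p(y), plus mean and variance of every feature."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        n, d = len(rows), len(rows[0])
        means = [sum(r[i] for r in rows) / n for i in range(d)]
        variances = [max(sum((r[i] - means[i]) ** 2 for r in rows) / n, min_var)
                     for i in range(d)]
        model[label] = (n / len(X), means, variances)
    return model

def predict(model, x):
    """Pick the class maximising log p(y) + sum_i log p(x_i | y)."""
    def score(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            log_gaussian_pdf(xi, m, v) for xi, m, v in zip(x, means, variances))
    return max(model, key=score)

# Toy data (made up): two features, two classes
X = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.6]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, [1.1, 2.0]), predict(model, [3.1, 0.5]))
```

Each feature gets its own 1-D density, so the number of parameters grows linearly with the number of features $d$.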
+
+![[Pasted image 20251104161540.png]]
+
+### Zero Frequency Problem
+![[Pasted image 20251104161712.png]]
+(there is also an email spam example of this, and it does not seem to work well with Naive Bayes, ngl)
+
+Pros of Naive Bayes:
+- can handle high dimensional feature spaces
+- fast training time
+- can handle continuous and discrete data
+
+Cons:
+- can't deal with correlated features
+
+EXAMPLE
+![[Pasted image 20251104161913.png]]
+
+things you should be able to do:
+- explain the difference between parametric and non-parametric density estimation
+- explain Parzen, k-nearest neighbour and naive Bayes density estimation and classification in detail
+- explain the advantages and disadvantages of those methods
+- implement a kNN classifier in Python
+
+# Evaluation
+![[Pasted image 20251104162104.png]]
+![[Pasted image 20251104162129.png]]
+
+### answering the question of what classifier to use
+- hard if we can't visualise the data
+- we need some kind of criterion
+    - Typical answer: classification / performance error
+- test it on independent data
+- for simplicity we assume for now that classification error is good enough (though other factors may be in play)
+
+![[Pasted image 20251104162353.png]]
+
+The error estimate is the average of N Bernoulli random variables:
+$$\hat{\epsilon}=\frac{1}{N}\sum_{i=1}^N Z_i$$ where: $Z_i$ is 0 if $x_i$ was classified correctly
+and 1 if $x_i$ was incorrectly classified
+
+Variance:
+$$\sigma^2_{\hat{\epsilon}}=Var(\hat{\epsilon}|\text{test set size } N)=\frac{\epsilon(1-\epsilon)}{N}$$
+you can also compute the standard deviation for different sample sizes and errors:
+![[Pasted image 20251104162733.png]]
+
+## training vs.
test set size
+- large training set -> good classifiers
+- large test set -> reliable, unbiased error estimate
+- in practice often just a single design set is given
+
+![[Pasted image 20251104162845.png]]
+
+## this is what is called bootstrapping
+![[Pasted image 20251104162934.png]]
+TODO: okay honestly I don't entirely get this ngl
+## k-fold cross validation
+![[Pasted image 20251104163022.png]]
+TODO: I don't understand this for the same reason
+do you also retrain the classifier?
+I guess so, same with the one above: you retrain for every split, do this many many times, and I guess you average the errors over the repetitions in either case
+so I guess that checks out.
+
+## leave-one-out procedure
+![[Pasted image 20251104163117.png]]
+I assume the same goes here as for the other ones
+
+## hyper-parameters
+- ML methods often have 'hyperparameters'
+- Parzen density estimator: width "h"
+- kNN: number of neighbours "k"
+- decision trees: pruning method, stopping criterion
+- neural networks: architecture, learning rate
+
+- Don't optimise these numbers by looking at the test set!
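A minimal sketch of how a hyperparameter can be chosen without touching the test set, assuming a plain kNN classifier and k-fold cross validation on the design set only. The helper names (`knn_predict`, `k_fold_error`, `select_k`) and the toy data are hypothetical. The classifier is rebuilt for every fold, and the held-out errors are pooled into one average rather than picking the best fold. Setting `n_folds=len(X)` gives the leave-one-out procedure.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    order = sorted(range(len(train_X)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def k_fold_error(X, y, k_neighbours, n_folds=5, seed=0):
    """Fraction of held-out objects misclassified, pooled over all folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    errors = 0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]  # "retraining" = new training set
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in held_out:
            if knn_predict(train_X, train_y, X[i], k_neighbours) != y[i]:
                errors += 1
    return errors / len(X)

def select_k(X, y, candidates, n_folds=5):
    """Return the candidate k with the lowest cross-validation error, plus all scores."""
    scores = {k: k_fold_error(X, y, k, n_folds) for k in candidates}
    return min(scores, key=scores.get), scores

# Toy design set (made up): two well separated clusters
X = [[i * 0.1, i * 0.1] for i in range(10)] + [[10 + i * 0.1, 10 + i * 0.1] for i in range(10)]
y = [0] * 10 + [1] * 10
best_k, cv_scores = select_k(X, y, candidates=[1, 3, 5])
print(best_k, cv_scores)
```

The error of the selected `k` is optimistically biased because it was used for the selection; an honest estimate still needs data that played no role in choosing `k`.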
+
+## double cross validation
+
+![[Pasted image 20251104163416.png]]
+we're going crazy now, and you can apparently use this to optimise the hyperparameters
+
+![[Pasted image 20251104163518.png]]
+
+
+## apparent classification error
+![[Pasted image 20251104165636.png]]
+
+## learning curves
+- curves that plot (estimated) classification errors against the number of samples in the training set
+- usually plot the error both on the training and on the test set
+- gives insight into:
+    - amount of overtraining
+    - usefulness of additional data
+    - allows comparison between classifiers
+    - stability of training
+
+There is no single best classifier
+![[Pasted image 20251104165837.png]]
+![[Pasted image 20251104165911.png]]
+![[Pasted image 20251104165918.png]]
+
+- larger training sets yield better classifiers (wow really)
+- independent test sets needed for unbiased error estimates
+- larger test sets yield more accurate error estimates
+- LOO cross validation "optimal" but may be infeasible
+- 10-fold cross validation is often used
+- more complex classifiers need larger training sets
+    - as well as larger feature sets
+- small training sets need simpler classifiers or smaller feature sets
+
+## squared error:
+imagine you have the following error:
+$$E[||g(x)-y||^2]$$
+you can derive something more general
+
+## bias-variance dilemma
+- when we are given some data we may get lucky, or unlucky:
+    - sometimes we get very atypical data
+- to say something general we need to average over different (training) sets
+
+the classifier is now also a function of the training set:
+$$D = \{(y_i,x_i); i=1,...,N\}$$
+$$g(x;D)$$
+![[Pasted image 20251104170328.png]]
+![[Pasted image 20251104170353.png]]
+![[Pasted image 20251104170414.png]]
+
+variance: how much does classifier g vary over different training sets
+bias: how much does the average classifier g differ from the true output
+
+![[Pasted image 20251104170504.png]]
+
+this was originally derived for neural networks and squared error
+general phenomenon though: we encounter it often in pattern recognition
+
+a simpler classifier is more stable (and needs less data)
+a more complex classifier only works when you have sufficient training data
+
+## feature curve
+![[Pasted image 20251104170656.png]]
+![[Pasted image 20251104170706.png]]
+
+there is a fundamental tradeoff between the errors / performances of the two classes
+
+Standard Classification Error: $$\epsilon=\epsilon_1p(y_1)+\epsilon_2p(y_2)$$
+Weighted Classification Error: $$\epsilon=\lambda_{12}\epsilon_1p(y_1)+\lambda_{21}\epsilon_2p(y_2)$$
+F1-Score (harmonic mean of precision and recall): $$F_1=2\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$
+## types of error and performance measures
+
+Error: probability of erroneous classifications
+Performance / Accuracy: 1 - error
+Sensitivity of a target class [e.g. diseased patients]: performance for objects from that target class
+Specificity: performance for all objects outside the target class
+Precision of a target class: fraction of correct objects among all objects assigned to that class
+Recall: fraction of correctly classified objects; identical to sensitivity when related to a particular class
+True Positive Rate: identical to sensitivity
+False Positive Rate: error for all objects outside the target class
+
+## confusion matrices
+Provides counts of class-dependent errors: how many objects have been classified as A that should have been B?
+- gives a more detailed view than the overall error
+- can be used to estimate the overall cost for a classifier
+
+
+![[Pasted image 20251104171316.png]]
+![[Pasted image 20251104171324.png]]
+
+## ROC Analysis (receiver operating characteristic)
+![[Pasted image 20251104171524.png]]
+![[Pasted image 20251104171550.png]]
+TODO: what?
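A rough sketch of how an ROC curve can be produced, assuming the classifier outputs a score per object (for example an estimated posterior for the target class) and an object is assigned to the target class when its score is at or above a threshold. Sweeping that threshold from high to low gives one (false positive rate, true positive rate) point per setting. The function names `roc_points` and `auc` and the example scores are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per threshold, sweeping from strict to lenient.

    labels: 1 = target class, 0 = everything else (both must be present).
    An object is called 'target' when score >= threshold.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called target
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up scores: higher means "more likely target class"
scores = [0.9, 0.7, 0.6, 0.2]
labels = [1, 0, 1, 0]
curve = roc_points(scores, labels)
print(curve, auc(curve))
```

Lowering the threshold can only add objects to the target class, so both rates move up together; that is the tradeoff between the two class errors.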
+
+### area under ROC curve: AUC
+![[Pasted image 20251104171659.png]]
+
+### how to interpret ROC and AUC:
+- each point on the ROC curve represents a specific classification threshold (ok that is cool but what is that TODO)
+- a classifier that randomly guesses produces a curve along the diagonal line (from bottom-left to top-right) - ok that checks out
+- a classifier that perfectly separates will reach the top-left corner (true positive rate = 1 and false positive rate = 0): AUC = 1.0
+- so the closer the ROC curve is to the top-left corner, the better the classifier is at distinguishing between the two classes
+
+is the threshold like how many things we give it access to or something? TODO (not quite: it is the cut-off on the classifier's output score above which an object is assigned to the target class)
+
+![[Pasted image 20251104171952.png]]
+![[Pasted image 20251104172012.png]]
+![[Pasted image 20251104172116.png]]
+
+ok this checks out more and more
+
+![[Pasted image 20251104172129.png]]
+
+conclusions:
+- there is no best classifier
+- there are alternative principles to find a good classifier
+    - maximising the likelihood
+    - minimising the classification error
+    - minimising the mean squared error
+
+- there is a fundamental tradeoff between the bias and the variance of a classifier (depending on how flexible / complex the classifier is)
+- finding the correct regulariser is a 'black art' of ML
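The bias-variance tradeoff can be checked numerically with a small simulation. This is only a sketch under made-up assumptions: a 1-D regression problem with squared error, a sine as the true output, Gaussian noise, and two deliberately extreme models (a constant that predicts the training mean, and a 1-nearest-neighbour predictor). All names here (`true_f`, `sample_training_set`, `fit_mean`, `fit_1nn`, `bias_variance`) are hypothetical.

```python
import math
import random

def true_f(x):
    """The (assumed) true output we are trying to learn."""
    return math.sin(2 * math.pi * x)

def sample_training_set(rng, n=20, noise=0.3):
    """Draw one training set D = {(x_i, y_i)} with noisy outputs."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0.0, noise)) for x in xs]

def fit_mean(D):
    """Very simple model: always predict the mean training output."""
    m = sum(y for _, y in D) / len(D)
    return lambda x: m

def fit_1nn(D):
    """Very flexible model: predict the output of the nearest training point."""
    return lambda x: min(D, key=lambda p: abs(p[0] - x))[1]

def bias_variance(fit, n_sets=200, seed=0):
    """Estimate squared bias and variance of g(x; D), averaged over test points."""
    rng = random.Random(seed)
    test_x = [i / 50 for i in range(51)]
    preds = [[] for _ in test_x]
    for _ in range(n_sets):
        g = fit(sample_training_set(rng))  # one classifier per training set
        for j, x in enumerate(test_x):
            preds[j].append(g(x))
    bias2 = var = 0.0
    for x, p in zip(test_x, preds):
        mean_p = sum(p) / len(p)           # the "average classifier" at x
        bias2 += (mean_p - true_f(x)) ** 2
        var += sum((v - mean_p) ** 2 for v in p) / len(p)
    return bias2 / len(test_x), var / len(test_x)

b_mean, v_mean = bias_variance(fit_mean)
b_1nn, v_1nn = bias_variance(fit_1nn)
print("mean model: bias^2 = %.3f, variance = %.3f" % (b_mean, v_mean))
print("1-NN model: bias^2 = %.3f, variance = %.3f" % (b_1nn, v_1nn))
```

In this setup the constant model should show the larger squared bias and the 1-NN model the larger variance, which is the stable-but-rigid versus flexible-but-data-hungry contrast from the notes.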