From: 2weiEmu <saalbach.robert@outlook.de>
Date: Tue, 4 Nov 2025 16:23:43 +0000 (+0100)
Subject: even more ML notes
X-Git-Url: https://git.saalbach.dev/?a=commitdiff_plain;h=6d30bc12053e486c5d0d0dbfd80bf4c92c9e1912;p=research-obsidian.git

even more ML notes
---

diff --git a/.obsidian/workspace.json b/.obsidian/workspace.json
index f10bf8a..47b87d2 100644
--- a/.obsidian/workspace.json
+++ b/.obsidian/workspace.json
@@ -4,71 +4,34 @@
     "type": "split",
     "children": [
       {
-        "id": "69ea515b2aa83af1",
-        "type": "split",
+        "id": "0d762e903c6b0576",
+        "type": "tabs",
         "children": [
           {
-            "id": "0101728309e6c9e1",
-            "type": "tabs",
-            "children": [
-              {
-                "id": "e2e550886a75d1d2",
-                "type": "leaf",
-                "state": {
-                  "type": "markdown",
-                  "state": {
-                    "file": "University/Machine Learning/Full Notes.md",
-                    "mode": "source",
-                    "source": false,
-                    "backlinks": true,
-                    "backlinkOpts": {
-                      "collapseAll": false,
-                      "extraContext": false,
-                      "sortOrder": "alphabetical",
-                      "showSearch": false,
-                      "searchQuery": "",
-                      "backlinkCollapsed": false,
-                      "unlinkedCollapsed": true
-                    }
-                  },
-                  "icon": "lucide-file",
-                  "title": "Full Notes"
-                }
-              }
-            ]
-          },
-          {
-            "id": "0d762e903c6b0576",
-            "type": "tabs",
-            "children": [
-              {
-                "id": "214929be76b06d19",
-                "type": "leaf",
-                "state": {
-                  "type": "markdown",
-                  "state": {
-                    "file": "University/Machine Learning/Full Notes.md",
-                    "mode": "source",
-                    "source": false,
-                    "backlinks": true,
-                    "backlinkOpts": {
-                      "collapseAll": false,
-                      "extraContext": false,
-                      "sortOrder": "alphabetical",
-                      "showSearch": false,
-                      "searchQuery": "",
-                      "backlinkCollapsed": false,
-                      "unlinkedCollapsed": true
-                    }
-                  },
-                  "icon": "lucide-file",
-                  "title": "Full Notes"
+            "id": "214929be76b06d19",
+            "type": "leaf",
+            "state": {
+              "type": "markdown",
+              "state": {
+                "file": "University/Machine Learning/Full Notes.md",
+                "mode": "source",
+                "source": false,
+                "backlinks": true,
+                "backlinkOpts": {
+                  "collapseAll": false,
+                  "extraContext": false,
+                  "sortOrder": "alphabetical",
+                  "showSearch": false,
+                  "searchQuery": "",
+                  "backlinkCollapsed": false,
+                  "unlinkedCollapsed": true
                 }
-              }
-            ]
+              },
+              "icon": "lucide-file",
+              "title": "Full Notes"
+            }
           }
-        ],
-        "direction": "horizontal"
+        ]
       }
     ],
     "direction": "vertical"
@@ -212,21 +175,21 @@
       "pdf-plus:PDF++: Toggle auto-paste": false
     }
   },
-  "active": "e2e550886a75d1d2",
+  "active": "214929be76b06d19",
   "lastOpenFiles": [
-    "Pasted image 20251103163149.png",
-    "Pasted image 20251103162442.png",
-    "Pasted image 20251103161635.png",
-    "Pasted image 20251103161604.png",
-    "Pasted image 20251103161333.png",
-    "Pasted image 20251103161144.png",
-    "Pasted image 20251103161028.png",
-    "Pasted image 20251103160756.png",
-    "Pasted image 20251102180335.png",
-    "Pasted image 20251102180326.png",
-    "Pasted image 20251102175852.png",
-    "Untitled 1.md",
+    "Pasted image 20251104172129.png",
+    "Pasted image 20251104172116.png",
+    "Pasted image 20251104172012.png",
+    "Pasted image 20251104172000.png",
+    "Pasted image 20251104171952.png",
+    "Pasted image 20251104171659.png",
+    "Pasted image 20251104171550.png",
+    "Pasted image 20251104171524.png",
+    "Pasted image 20251104171324.png",
+    "Pasted image 20251104171316.png",
     "University/Machine Learning/Full Notes.md",
+    "Pasted image 20251104170706.png",
+    "Untitled 1.md",
     "some_ideas.md",
     "University/Machine Learning",
     "Physics/Just some questions.md",
diff --git a/Pasted image 20251104161342.png b/Pasted image 20251104161342.png
new file mode 100644
index 0000000..fc77ed4
Binary files /dev/null and b/Pasted image 20251104161342.png differ
diff --git a/Pasted image 20251104161351.png b/Pasted image 20251104161351.png
new file mode 100644
index 0000000..2760969
Binary files /dev/null and b/Pasted image 20251104161351.png differ
diff --git a/Pasted image 20251104161450.png b/Pasted image 20251104161450.png
new file mode 100644
index 0000000..521759f
Binary files /dev/null and b/Pasted image 20251104161450.png differ
diff --git a/Pasted image 20251104161501.png b/Pasted image 20251104161501.png
new file mode 100644
index 0000000..0b889a2
Binary files /dev/null and b/Pasted image 20251104161501.png differ
diff --git a/Pasted image 20251104161540.png b/Pasted image 20251104161540.png
new file mode 100644
index 0000000..2a03428
Binary files /dev/null and b/Pasted image 20251104161540.png differ
diff --git a/Pasted image 20251104161712.png b/Pasted image 20251104161712.png
new file mode 100644
index 0000000..6bdc0b7
Binary files /dev/null and b/Pasted image 20251104161712.png differ
diff --git a/Pasted image 20251104161913.png b/Pasted image 20251104161913.png
new file mode 100644
index 0000000..59e3a1e
Binary files /dev/null and b/Pasted image 20251104161913.png differ
diff --git a/Pasted image 20251104162104.png b/Pasted image 20251104162104.png
new file mode 100644
index 0000000..6994c35
Binary files /dev/null and b/Pasted image 20251104162104.png differ
diff --git a/Pasted image 20251104162129.png b/Pasted image 20251104162129.png
new file mode 100644
index 0000000..3b979f4
Binary files /dev/null and b/Pasted image 20251104162129.png differ
diff --git a/Pasted image 20251104162353.png b/Pasted image 20251104162353.png
new file mode 100644
index 0000000..44e03e7
Binary files /dev/null and b/Pasted image 20251104162353.png differ
diff --git a/Pasted image 20251104162733.png b/Pasted image 20251104162733.png
new file mode 100644
index 0000000..0e70f25
Binary files /dev/null and b/Pasted image 20251104162733.png differ
diff --git a/Pasted image 20251104162845.png b/Pasted image 20251104162845.png
new file mode 100644
index 0000000..afe49ee
Binary files /dev/null and b/Pasted image 20251104162845.png differ
diff --git a/Pasted image 20251104162934.png b/Pasted image 20251104162934.png
new file mode 100644
index 0000000..4765b02
Binary files /dev/null and b/Pasted image 20251104162934.png differ
diff --git a/Pasted image 20251104163022.png b/Pasted image 20251104163022.png
new file mode 100644
index 0000000..34ccc82
Binary files /dev/null and b/Pasted image 20251104163022.png differ
diff --git a/Pasted image 20251104163117.png b/Pasted image 20251104163117.png
new file mode 100644
index 0000000..c093f48
Binary files /dev/null and b/Pasted image 20251104163117.png differ
diff --git a/Pasted image 20251104163347.png b/Pasted image 20251104163347.png
new file mode 100644
index 0000000..3f6465f
Binary files /dev/null and b/Pasted image 20251104163347.png differ
diff --git a/Pasted image 20251104163416.png b/Pasted image 20251104163416.png
new file mode 100644
index 0000000..9992904
Binary files /dev/null and b/Pasted image 20251104163416.png differ
diff --git a/Pasted image 20251104163518.png b/Pasted image 20251104163518.png
new file mode 100644
index 0000000..b25eeb9
Binary files /dev/null and b/Pasted image 20251104163518.png differ
diff --git a/Pasted image 20251104163530.png b/Pasted image 20251104163530.png
new file mode 100644
index 0000000..9f98516
Binary files /dev/null and b/Pasted image 20251104163530.png differ
diff --git a/Pasted image 20251104165636.png b/Pasted image 20251104165636.png
new file mode 100644
index 0000000..90a4a83
Binary files /dev/null and b/Pasted image 20251104165636.png differ
diff --git a/Pasted image 20251104165837.png b/Pasted image 20251104165837.png
new file mode 100644
index 0000000..46290c4
Binary files /dev/null and b/Pasted image 20251104165837.png differ
diff --git a/Pasted image 20251104165911.png b/Pasted image 20251104165911.png
new file mode 100644
index 0000000..88b67f8
Binary files /dev/null and b/Pasted image 20251104165911.png differ
diff --git a/Pasted image 20251104165918.png b/Pasted image 20251104165918.png
new file mode 100644
index 0000000..9b83621
Binary files /dev/null and b/Pasted image 20251104165918.png differ
diff --git a/Pasted image 20251104170328.png b/Pasted image 20251104170328.png
new file mode 100644
index 0000000..96106b2
Binary files /dev/null and b/Pasted image 20251104170328.png differ
diff --git a/Pasted image 20251104170353.png b/Pasted image 20251104170353.png
new file mode 100644
index 0000000..25ab191
Binary files /dev/null and b/Pasted image 20251104170353.png differ
diff --git a/Pasted image 20251104170414.png b/Pasted image 20251104170414.png
new file mode 100644
index 0000000..fbe0758
Binary files /dev/null and b/Pasted image 20251104170414.png differ
diff --git a/Pasted image 20251104170504.png b/Pasted image 20251104170504.png
new file mode 100644
index 0000000..a8cf281
Binary files /dev/null and b/Pasted image 20251104170504.png differ
diff --git a/Pasted image 20251104170627.png b/Pasted image 20251104170627.png
new file mode 100644
index 0000000..63b9b0e
Binary files /dev/null and b/Pasted image 20251104170627.png differ
diff --git a/Pasted image 20251104170656.png b/Pasted image 20251104170656.png
new file mode 100644
index 0000000..00cd776
Binary files /dev/null and b/Pasted image 20251104170656.png differ
diff --git a/Pasted image 20251104170706.png b/Pasted image 20251104170706.png
new file mode 100644
index 0000000..2c3f716
Binary files /dev/null and b/Pasted image 20251104170706.png differ
diff --git a/Pasted image 20251104171316.png b/Pasted image 20251104171316.png
new file mode 100644
index 0000000..1388ef9
Binary files /dev/null and b/Pasted image 20251104171316.png differ
diff --git a/Pasted image 20251104171324.png b/Pasted image 20251104171324.png
new file mode 100644
index 0000000..67cee85
Binary files /dev/null and b/Pasted image 20251104171324.png differ
diff --git a/Pasted image 20251104171524.png b/Pasted image 20251104171524.png
new file mode 100644
index 0000000..fe99d78
Binary files /dev/null and b/Pasted image 20251104171524.png differ
diff --git a/Pasted image 20251104171550.png b/Pasted image 20251104171550.png
new file mode 100644
index 0000000..b8f4136
Binary files /dev/null and b/Pasted image 20251104171550.png differ
diff --git a/Pasted image 20251104171659.png b/Pasted image 20251104171659.png
new file mode 100644
index 0000000..7c55886
Binary files /dev/null and b/Pasted image 20251104171659.png differ
diff --git a/Pasted image 20251104171952.png b/Pasted image 20251104171952.png
new file mode 100644
index 0000000..4604b7f
Binary files /dev/null and b/Pasted image 20251104171952.png differ
diff --git a/Pasted image 20251104172000.png b/Pasted image 20251104172000.png
new file mode 100644
index 0000000..a2120ba
Binary files /dev/null and b/Pasted image 20251104172000.png differ
diff --git a/Pasted image 20251104172012.png b/Pasted image 20251104172012.png
new file mode 100644
index 0000000..ce0f1c6
Binary files /dev/null and b/Pasted image 20251104172012.png differ
diff --git a/Pasted image 20251104172116.png b/Pasted image 20251104172116.png
new file mode 100644
index 0000000..78c7c75
Binary files /dev/null and b/Pasted image 20251104172116.png differ
diff --git a/Pasted image 20251104172129.png b/Pasted image 20251104172129.png
new file mode 100644
index 0000000..31a1c1d
Binary files /dev/null and b/Pasted image 20251104172129.png differ
diff --git a/University/Machine Learning/Full Notes.md b/University/Machine Learning/Full Notes.md
index fe0ae5e..01de4a6 100644
--- a/University/Machine Learning/Full Notes.md	
+++ b/University/Machine Learning/Full Notes.md	
@@ -628,4 +628,216 @@ We assume conditional independence given y
 (nvm it doesn't seem to be reading the next slide, maybe figure this out TODO)
 
 we just estimate $p(x_i|y)$ per feature and multiply them
-$p(x|y)=p(x_1,x_2,x_3,x_4,...,x_d|y)=\prod$
+$$\begin{align}
+p(x|y)=p(x_1,x_2,x_3,x_4,...,x_d|y)&=\prod_{i=1}^d p(x_i|y)\\&=p(x_1|y)p(x_2|y)...p(x_d|y)\end{align}$$
+(we only estimate $d$ one-dimensional densities, so there is no curse of dimensionality)
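+
+A minimal sketch of the product above, assuming (my choice, not from the slides) a 1-D Gaussian for every $p(x_i|y)$. The names `fit_gaussian_nb` and `log_likelihood` are made up for illustration, and the product is computed as a sum of logs to avoid underflow:
+
+```python
+import numpy as np
+
+def fit_gaussian_nb(X, y):
+    """Estimate one mean and one variance per feature and per class."""
+    params = {}
+    for c in np.unique(y):
+        Xc = X[y == c]
+        # small constant keeps the variance away from zero (assumption)
+        params[c] = (Xc.mean(axis=0), Xc.var(axis=0) + 1e-9)
+    return params
+
+def log_likelihood(x, mean, var):
+    """log p(x|y) = sum_i log p(x_i|y) under conditional independence."""
+    per_feature = -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)
+    return per_feature.sum()
+
+# toy data: two features, two classes
+X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.1]])
+y = np.array([0, 0, 1, 1])
+params = fit_gaussian_nb(X, y)
+
+x_new = np.array([1.1, 2.1])
+scores = {c: log_likelihood(x_new, m, v) for c, (m, v) in params.items()}
+best = max(scores, key=scores.get)  # class with the largest p(x|y)
+```
+
+this only compares the class-conditional likelihoods; a full classifier would also add $\log p(y)$ to each score.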
+
+### Parametric vs. Non-Parametric
+But that means you still have to choose a model for $p(x_i|y)$.
+![[Pasted image 20251104161342.png]]
+![[Pasted image 20251104161351.png]]
+
+EXAMPLE:
+![[Pasted image 20251104161450.png]]
+![[Pasted image 20251104161501.png]]
+(note: $\exp(z)$ is just $e^z$, the exponential function)
+
+![[Pasted image 20251104161540.png]]
+
+### Zero Frequency Problem
+![[Pasted image 20251104161712.png]]
+(there is also an email spam example of this problem, where plain Naive Bayes does not seem to work well)
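+
+The remedy I know for zero counts is add-one (Laplace) smoothing. Whether the slide uses exactly this is an assumption on my part, and `smoothed_probs`, the word list and the vocabulary are made up for illustration:
+
+```python
+from collections import Counter
+
+def smoothed_probs(values, vocabulary, alpha=1.0):
+    """Estimate p(value|y) with add-alpha smoothing so unseen values keep non-zero mass."""
+    counts = Counter(values)
+    total = len(values) + alpha * len(vocabulary)
+    return {v: (counts[v] + alpha) / total for v in vocabulary}
+
+# "meeting" never occurs in the spam class, so its raw frequency would be 0
+words_in_spam = ["free", "win", "free", "money"]
+vocab = ["free", "win", "money", "meeting"]
+p = smoothed_probs(words_in_spam, vocab)
+```
+
+without smoothing a single unseen value would make the whole product $\prod_i p(x_i|y)$ zero.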
+
+Pros of Naive Bayes:
+- can handle high dimensional feature spaces
+- fast training time
+- can handle continuous and discrete data
+
+Cons:
+- can't deal with correlated features
+
+EXAMPLE
+![[Pasted image 20251104161913.png]]
+
+things you should be able to do:
+- explain the difference between parametric and non-parametric density estimation
+- explain Parzen, k-nearest neighbour and naive Bayes density estimation and classification in detail
+- explain the advantages and disadvantages of those methods
+- implement a kNN classifier in Python
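+
+For the last point, a minimal kNN sketch, assuming Euclidean distance and majority voting. `knn_predict` and the toy data are my own, not from the lecture:
+
+```python
+import numpy as np
+from collections import Counter
+
+def knn_predict(X_train, y_train, x, k=3):
+    """Return the majority label among the k nearest training points."""
+    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to every training point
+    nearest = np.argsort(dists)[:k]               # indices of the k closest points
+    votes = Counter(y_train[nearest].tolist())
+    return votes.most_common(1)[0][0]
+
+X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]])
+y_train = np.array([0, 0, 1, 1, 1])
+
+pred_a = knn_predict(X_train, y_train, np.array([0.05, 0.1]), k=1)
+pred_b = knn_predict(X_train, y_train, np.array([1.0, 0.9]), k=3)
+```
+
+ties are broken by whichever label was seen first, so an odd k is the safer choice for two classes.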
+
+# Evaluation
+![[Pasted image 20251104162104.png]]
+![[Pasted image 20251104162129.png]]
+
+### answering the question of what classifier to use
+- hard if we can't visualise the data
+- we need some kind of criteria
+	- Typical answer: classification / performance error
+- test it on independent data
+- for simplicity we assume now that classification error is good enough (though other factors may be in play)
+
+![[Pasted image 20251104162353.png]]
+
+The error estimate is the average of Bernoulli random variables:
+$$\hat{\epsilon}=\frac{1}{N}\sum_{i=1}^N Z_i$$ where $Z_i$ is 0 if $x_i$ was correctly classified
+and 1 if $x_i$ was incorrectly classified
+
+Variance:
+$$\sigma^2_{\hat{\epsilon}}=Var(\hat{\epsilon}|\text{test set size } N)=\frac{\epsilon(1-\epsilon)}{N}$$
+you can also compute the standard deviation for different sample sizes and error:
+![[Pasted image 20251104162733.png]]
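+
+A small sketch of that standard deviation, just taking the square root of the variance formula above (`error_std` is a made-up name):
+
+```python
+import math
+
+def error_std(eps, n):
+    """Standard deviation of the estimated error for true error eps and n test objects."""
+    return math.sqrt(eps * (1 - eps) / n)
+
+# same true error, growing test set
+stds = {n: error_std(0.1, n) for n in (10, 100, 1000)}
+```
+
+with $\epsilon=0.1$, going from 100 to 1000 test objects only shrinks the standard deviation from 0.03 to about 0.0095, i.e. by a factor $\sqrt{10}$.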
+
+## training vs. test set size
+- Large training set -> good classifiers
+- large test set -> reliable, unbiased error estimate
+- In practice often just a single design set is given
+
+![[Pasted image 20251104162845.png]]
+
+## this is what is called bootstrapping
+![[Pasted image 20251104162934.png]]
+TODO: okay honestly I don't entirely get this ngl
+## k-fold cross validation
+![[Pasted image 20251104163022.png]]
+TODO: i don't understand this for the same reason
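+do you also retrain the classifier?
+yes: for each fold you retrain on the other k-1 folds and test on the held-out fold (same idea as the one above, you repeat the train/test procedure many times)
+you then average the error estimates over the folds (you don't keep the best one), so that checks out.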
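+
+A sketch of how I understand k-fold cross validation, assuming a nearest-mean classifier as a stand-in (any classifier would do) and toy Gaussian data. All function names here are hypothetical:
+
+```python
+import numpy as np
+
+def nearest_mean_fit(X, y):
+    """Stand-in classifier: remember the mean of every class."""
+    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}
+
+def nearest_mean_predict(model, X):
+    classes = list(model)
+    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
+    return np.array(classes)[dists.argmin(axis=1)]
+
+def kfold_error(X, y, k=5, seed=0):
+    """Average test error over k folds; the classifier is retrained for every fold."""
+    rng = np.random.default_rng(seed)
+    folds = np.array_split(rng.permutation(len(X)), k)
+    errors = []
+    for i in range(k):
+        test_idx = folds[i]
+        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
+        model = nearest_mean_fit(X[train_idx], y[train_idx])
+        pred = nearest_mean_predict(model, X[test_idx])
+        errors.append(np.mean(pred != y[test_idx]))
+    return float(np.mean(errors))
+
+rng = np.random.default_rng(1)
+X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
+y = np.array([0] * 50 + [1] * 50)
+cv_error = kfold_error(X, y, k=5)
+```
+
+calling `kfold_error(X, y, k=len(X))` turns this into the leave-one-out procedure.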
+
+## leave-one-out procedure
+![[Pasted image 20251104163117.png]]
+i assume the same goes here as for the other ones
+
+## hyper-parameters
+- ML methods often have 'hyperparameters'
+- Parzen density estimator: width "h"
+- knn: number of neighbours "k"
+- decision trees: pruning method, stopping criterion
+- neural networks: architecture, learning rate
+
+- Don't optimise these numbers by looking at the test set!
+
+## double cross validation
+
+![[Pasted image 20251104163416.png]]
+we going crazy now, and you can apparently use this to optimise the hyperparameters
+
+![[Pasted image 20251104163518.png]]
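+
+A sketch of double cross validation, assuming it means the usual nested scheme: an inner cross validation picks the hyperparameter (here k of kNN) using only the outer training part, and the outer loop estimates the error of that whole procedure. The function names, candidate values and toy data are my own:
+
+```python
+import numpy as np
+
+def knn_error(X_tr, y_tr, X_te, y_te, k):
+    """Error of a k-NN classifier (labels 0/1, odd k) on a held-out set."""
+    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
+    nearest = np.argsort(d, axis=1)[:, :k]
+    pred = (y_tr[nearest].mean(axis=1) > 0.5).astype(int)   # majority vote
+    return np.mean(pred != y_te)
+
+def cv_error(X, y, k, n_folds, rng):
+    """Plain cross-validation error for one value of k."""
+    folds = np.array_split(rng.permutation(len(X)), n_folds)
+    errs = []
+    for i in range(n_folds):
+        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
+        errs.append(knn_error(X[tr], y[tr], X[folds[i]], y[folds[i]], k))
+    return np.mean(errs)
+
+def double_cv(X, y, candidates=(1, 3, 5), outer=5, inner=4, seed=0):
+    rng = np.random.default_rng(seed)
+    folds = np.array_split(rng.permutation(len(X)), outer)
+    outer_errs = []
+    for i in range(outer):
+        tr = np.concatenate([f for j, f in enumerate(folds) if j != i])
+        # inner loop: choose k without touching the outer test fold
+        best_k = min(candidates, key=lambda k: cv_error(X[tr], y[tr], k, inner, rng))
+        # outer loop: test the chosen k on data the selection never saw
+        outer_errs.append(knn_error(X[tr], y[tr], X[folds[i]], y[folds[i]], best_k))
+    return float(np.mean(outer_errs))
+
+rng = np.random.default_rng(1)
+X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
+y = np.array([0] * 40 + [1] * 40)
+estimate = double_cv(X, y)
+```
+
+the point is that the outer test fold is never used to pick k, which is exactly the "don't optimise on the test set" rule from above.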
+
+
+## apparent classification error
+![[Pasted image 20251104165636.png]]
+
+## learning curves
+- curves that plot (estimated) classification errors against the number of samples in the training set
+- usually plot error both on training and on test set
+- gives insight into:
+	- amount of overtraining
+	- usefulness of additional data
+	- allows comparison between classifiers
+	- stability of training
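+
+A sketch of the numbers behind a learning curve, assuming a 1-NN classifier and toy Gaussian data (both my choice). For each training set size it stores the error on the training set itself and on a large independent test set:
+
+```python
+import numpy as np
+
+def one_nn_error(X_tr, y_tr, X_te, y_te):
+    """Error of a 1-nearest-neighbour classifier on (X_te, y_te)."""
+    d = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
+    return float(np.mean(y_tr[d.argmin(axis=1)] != y_te))
+
+rng = np.random.default_rng(0)
+
+def sample(n):
+    """n objects per class from two overlapping 2-D Gaussians (toy data)."""
+    X = np.vstack([rng.normal(0, 1, (n, 2)), rng.normal(2, 1, (n, 2))])
+    return X, np.array([0] * n + [1] * n)
+
+X_test, y_test = sample(500)
+
+curve = []  # (training set size, error on training set, error on test set)
+for n in (5, 20, 80, 320):
+    X_tr, y_tr = sample(n)
+    curve.append((2 * n,
+                  one_nn_error(X_tr, y_tr, X_tr, y_tr),
+                  one_nn_error(X_tr, y_tr, X_test, y_test)))
+```
+
+for 1-NN the error on the training set is always 0 (every point is its own nearest neighbour), which shows how optimistic the apparent error can be.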
+
+There is no single best classifier 
+![[Pasted image 20251104165837.png]]
+![[Pasted image 20251104165911.png]]
+![[Pasted image 20251104165918.png]]
+
+- larger training sets yield better classifiers (wow really)
+- independent test sets needed for unbiased error estimates
+- larger test sets yield more accurate error estimates
+- LOO cross validation "optimal" but may be infeasible
+- 10-fold cross validation is often used
+- more complex classifiers need larger training sets
+	- the same goes for larger feature sets
+- small training sets need simpler classifiers or smaller feature sets
+
+## squared error:
+imagine you have the following error:
+$$E[||g(x)-y||^2]$$
+you can derive something more general
+
+## bias-variance dilemma
+- when we are given some data we may get lucky, or unlucky:
+	- sometimes we get very atypical data
+- to say something general we need to average over different (training) sets
+
+the classifier is now also a function of the training set:
+$$D = \{(y_i,x_i); i=1,...,N\}$$
+$$g(x;D)$$
+![[Pasted image 20251104170328.png]]
+![[Pasted image 20251104170353.png]]
+![[Pasted image 20251104170414.png]]
+
+variance: how much does classifier g vary over different training sets
+bias: how much does the average classifier g differ from the true output
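+
+Written out for the squared error, assuming a fixed $x$ with a noise-free target $y$ and writing $\bar{g}(x)=E_D[g(x;D)]$ (my notation, the slides may use different symbols), these two definitions give:
+$$\begin{align}
+E_D\left[(g(x;D)-y)^2\right] &= E_D\left[(g(x;D)-\bar{g}(x)+\bar{g}(x)-y)^2\right]\\
+&= \underbrace{(\bar{g}(x)-y)^2}_{\text{bias}^2}+\underbrace{E_D\left[(g(x;D)-\bar{g}(x))^2\right]}_{\text{variance}}
+\end{align}$$
+the cross term drops out because $E_D[g(x;D)-\bar{g}(x)]=0$.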
+
+![[Pasted image 20251104170504.png]]
+
+this was originally derived for neural networks and squared error
+general phenomenon though: we encounter it often in pattern recognition
+
+a simpler classifier is more stable (and needs less data)
+a more complex classifier only works when you have sufficient training data
+
+## feature curve
+![[Pasted image 20251104170656.png]]
+![[Pasted image 20251104170706.png]]
+
+there is a fundamental tradeoff between the errors / performances of the two classes
+
+Standard Classification Error: $$\epsilon=\epsilon_1p(y_1)+\epsilon_2p(y_2)$$
+Weighted Classification Error: $$\epsilon=\lambda_{12}\epsilon_1p(y_1)+\lambda_{21}\epsilon_2p(y_2)$$
+F1-Score (harmonic Mean): $$F_1=2\frac{\text{precision}\cdot\text{recall}}{\text{precision}+\text{recall}}$$
+## types of error and performance measures
+
+Error: Probability of Erroneous Classifications
+Performance / Accuracy: 1 - error
+Sensitivity of a target class [e.g. diseased patients]: performance for objects from that target class
+Specificity: performance for all objects outside target class
+Precision of a target class: fraction of correct objects among all objects assigned to that class
+Recall: fraction of correctly classified objects; identical to sensitivity when related to a particular class
+True positive rate: identical to sensitivity
+False Positive Rate: error for all objects outside the target class
+
+## confusion matrices
+Provides counts of class-dependent errors: how many objects have been classified as A that should have been B?
+- give a more detailed view than overall error
+- can be used to estimate overall cost for classifier
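+
+A sketch of how the measures above follow from the four counts of a two-class confusion matrix. The function name and the counts are made up for illustration:
+
+```python
+def binary_measures(tp, fn, fp, tn):
+    """Performance measures for the target (positive) class from confusion-matrix counts."""
+    sensitivity = tp / (tp + fn)      # = recall = true positive rate
+    specificity = tn / (tn + fp)      # performance outside the target class
+    precision = tp / (tp + fp)        # correct among everything assigned to the target class
+    f1 = 2 * precision * sensitivity / (precision + sensitivity)
+    return {
+        "error": (fn + fp) / (tp + fn + fp + tn),
+        "sensitivity": sensitivity,
+        "specificity": specificity,
+        "precision": precision,
+        "false_positive_rate": 1 - specificity,
+        "f1": f1,
+    }
+
+# 50 target objects (40 found), 150 other objects (20 wrongly flagged)
+m = binary_measures(tp=40, fn=10, fp=20, tn=130)
+```
+
+here the overall error is 0.15 while the precision is only 2/3, which is the kind of detail the single error number hides.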
+
+
+![[Pasted image 20251104171316.png]]
+![[Pasted image 20251104171324.png]]
+
+## ROC Analysis (receiver operating characteristic)
+![[Pasted image 20251104171524.png]]
+![[Pasted image 20251104171550.png]]
+TODO: what?
+
+### area under ROC curve: AUC
+![[Pasted image 20251104171659.png]]
+
+### how to interpret ROC and AUC:
+- each point on the ROC curve represents a specific classification threshold (ok that is cool but what is that TODO)
+- A classifier that randomly guesses produces a curve along the diagonal line (from bottom-left to top-right) - ok that checks out
+- A classifier that perfectly separates will reach the top left corner (true positive rate =1 and false positive rate = 0): AUC = 1.0
+- so the closer the ROC curve is to the top-left corner the better the classifier is at distinguishing between the two classes
+
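+the threshold is not how many things we give the classifier access to: it is the cutoff on the classifier's output score above which an object is assigned to the target class, and moving it trades true positives against false positives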
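+
+A sketch of how an ROC curve and its AUC can be computed by sweeping that threshold over the classifier scores. The scores and labels are made up, and `roc_points` / `auc` are hypothetical helper names:
+
+```python
+import numpy as np
+
+def roc_points(scores, labels):
+    """(FPR, TPR) pairs obtained by sweeping the threshold over all observed scores."""
+    # start above the highest score so the curve begins at (0, 0)
+    thresholds = np.concatenate([[np.inf], np.sort(np.unique(scores))[::-1]])
+    pos, neg = labels == 1, labels == 0
+    fpr = np.array([np.mean(scores[neg] >= t) for t in thresholds])
+    tpr = np.array([np.mean(scores[pos] >= t) for t in thresholds])
+    return fpr, tpr
+
+def auc(fpr, tpr):
+    """Area under the ROC curve by the trapezoidal rule."""
+    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
+
+# higher score = more likely target class (label 1)
+scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1])
+labels = np.array([1, 1, 0, 1, 1, 0, 0, 0])
+
+fpr, tpr = roc_points(scores, labels)
+area = auc(fpr, tpr)
+```
+
+every entry of `fpr` / `tpr` is one threshold, so one point on the curve; scores that separate the classes perfectly give an area of 1.0.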
+
+![[Pasted image 20251104171952.png]]
+![[Pasted image 20251104172012.png]]
+![[Pasted image 20251104172116.png]]
+
+ok this checks out more and more
+
+![[Pasted image 20251104172129.png]]
+
+conclusions:
+- there is no best classifier
+- there are alternative principles to find a good classifier
+	- maximising the likelihood
+	- minimising the classification error
+	- minimising the mean squared error
+
+- there is a fundamental tradeoff between the bias and the variance of a classifier (depending on how flexible / complex the classifier is)
+- finding the correct regulariser is a 'black art' of ML