From: 2weiEmu
Date: Mon, 3 Nov 2025 23:08:13 +0000 (+0100)
Subject: more ML notes
X-Git-Url: https://git.saalbach.dev/?a=commitdiff_plain;h=ec9c82db6d18fc3b53b68ef8cddf8603e39288cb;p=research-obsidian.git

more ML notes
---
diff --git a/.obsidian/workspace.json b/.obsidian/workspace.json
index 1c07440..f10bf8a 100644
--- a/.obsidian/workspace.json
+++ b/.obsidian/workspace.json
@@ -125,7 +125,8 @@
 }
 ],
 "direction": "horizontal",
-"width": 300
+"width": 300,
+"collapsed": true
 },
 "right": {
 "id": "52c8cd2985704b8e",
@@ -213,19 +214,19 @@
 },
 "active": "e2e550886a75d1d2",
 "lastOpenFiles": [
+"Pasted image 20251103163149.png",
+"Pasted image 20251103162442.png",
+"Pasted image 20251103161635.png",
+"Pasted image 20251103161604.png",
+"Pasted image 20251103161333.png",
+"Pasted image 20251103161144.png",
+"Pasted image 20251103161028.png",
+"Pasted image 20251103160756.png",
 "Pasted image 20251102180335.png",
 "Pasted image 20251102180326.png",
 "Pasted image 20251102175852.png",
-"Pasted image 20251102175814.png",
-"Pasted image 20251102175755.png",
 "Untitled 1.md",
 "University/Machine Learning/Full Notes.md",
-"Pasted image 20251102174704.png",
-"Pasted image 20251102174424.png",
-"Pasted image 20251102174407.png",
-"Pasted image 20251102174358.png",
-"Pasted image 20251102174343.png",
-"Pasted image 20251102174335.png",
 "some_ideas.md",
 "University/Machine Learning",
 "Physics/Just some questions.md",
diff --git a/Pasted image 20251103160756.png b/Pasted image 20251103160756.png
new file mode 100644
index 0000000..a157353
Binary files /dev/null and b/Pasted image 20251103160756.png differ
diff --git a/Pasted image 20251103161028.png b/Pasted image 20251103161028.png
new file mode 100644
index 0000000..d180152
Binary files /dev/null and b/Pasted image 20251103161028.png differ
diff --git a/Pasted image 20251103161144.png b/Pasted image 20251103161144.png
new file mode 100644
index 0000000..f31662b
Binary files /dev/null and b/Pasted image 20251103161144.png differ
diff --git a/Pasted image 20251103161333.png b/Pasted image 20251103161333.png
new file mode 100644
index 0000000..4992ee2
Binary files /dev/null and b/Pasted image 20251103161333.png differ
diff --git a/Pasted image 20251103161604.png b/Pasted image 20251103161604.png
new file mode 100644
index 0000000..19dace3
Binary files /dev/null and b/Pasted image 20251103161604.png differ
diff --git a/Pasted image 20251103161635.png b/Pasted image 20251103161635.png
new file mode 100644
index 0000000..f9f0b17
Binary files /dev/null and b/Pasted image 20251103161635.png differ
diff --git a/Pasted image 20251103162442.png b/Pasted image 20251103162442.png
new file mode 100644
index 0000000..2e47eba
Binary files /dev/null and b/Pasted image 20251103162442.png differ
diff --git a/Pasted image 20251103163149.png b/Pasted image 20251103163149.png
new file mode 100644
index 0000000..f95393f
Binary files /dev/null and b/Pasted image 20251103163149.png differ
diff --git a/University/Machine Learning/Full Notes.md b/University/Machine Learning/Full Notes.md
index 5935b31..fe0ae5e 100644
--- a/University/Machine Learning/Full Notes.md
+++ b/University/Machine Learning/Full Notes.md
@@ -528,3 +528,104 @@ but basically, K is the kernel function here, so you have the kernel function, s
 ![[Pasted image 20251102180335.png]]
 
 ## classification with Parzen Density Estimation
+![[Pasted image 20251103160756.png]]
+
+this checks out, and it plugs straight into Bayes' rule: the Parzen estimate gives the class-conditional density $\hat{p}(x|y_i)$, and the class prior $p(y_i)$ is usually easy to estimate (e.g. the fraction of training points in class i)
+
+![[Pasted image 20251103161028.png]]
+
+What does Parzen (also called Kernel, don't forget that) Density Estimation do?
+
+- Does not assume a distribution
+- estimates the density by placing the kernel function on every training point and averaging
+- the kernel width matters a lot (too narrow: spiky estimate; too wide: oversmoothed)
+- kernel shape and width are fixed, i.e. the same everywhere in feature space
+
+## K-Nearest Neighbours
+![[Pasted image 20251103161144.png]]
+(we just kind of assume the unlabelled point is red, because the points around it are red)
+
+Ok, now let us, in a way, formalise that intuition.
+Centre a sphere (a circle in 2D) on the point we want to classify, but do not fix its volume: grow it until it contains the k nearest objects, then count how many of them belong to each class. The formula below turns these counts into a class-conditional density estimate; with the prior $\hat{p}(y_i)=n_i/n$ ($n$ = total number of training points), Bayes' rule reduces to picking the class with the largest $k_i$, i.e. a majority vote among the k neighbours
+![[Pasted image 20251103161333.png]]
+
+$$\hat{p}(x|y_i)=\frac{k_i}{n_iV_k(x)}$$
+$k_i$ number of neighbours of class i within $V_k(x)$
+$V_k(x)$ volume of the sphere centred at x with radius r
+$r$ distance from x to the k-th nearest neighbour
+$n_i$ total number of points in class i
+
+![[Pasted image 20251103161604.png]]
+
+
+![[Pasted image 20251103161635.png]]
+(trivial, if you know Python I guess)
+
+## the influence of k
+(and how it's related to the classification error)
+
+What is the largest / smallest value of k that you can choose?
+What will be the classification error?
+
+The value of k has a strong effect on the performance of the algorithm.
+Large k: everything gets assigned to the most likely class (with k = n you always predict the majority class)
+Small k (e.g. k = 1): highly variable predictions, unstable decision boundaries
+
+So what is a good way of choosing k?
+1. set aside a portion of the training data (validation set)
+2. vary k
+3. pick the k that gives the best generalisation performance (lowest error on the validation set)
+
+What about an equal number of positive and negative neighbours (a tie)?
+- use an odd k (wow great, thanks; and it only avoids ties when there are two classes)
+- breaking ties:
+  - random: pick one of the tied classes at random
+  - prior: pick the class with the greater prior
+  - nearest: use the 1-NN classifier (the single closest neighbour) to decide
+
+## distance measures (as part of kNN)
+the distance measure defines which examples are similar and which aren't,
+and it can have a strong effect on performance
+Euclidean distance (numeric features):
+$$D(x,x')=\sqrt{\sum_d|x_d-{x'}_d|^2}$$
+Manhattan distance:
+$$D(x,x')=\sum_d|x_d-{x'}_d|$$
+Hamming distance (categorical features):
+- number of features where x and x' differ
+$$D(x,x')=\sum_d\mathbb{1}[x_d\neq{x'}_d]$$
+(crazy formula)
+there are also others on the slide, e.g. the Kullback-Leibler (KL) divergence for histograms or BM25 for text
+![[Pasted image 20251103162442.png]]
+
+Pros and Cons:
+
+Pros:
+- simple and flexible classifier
+- often very good classification performance
+- easy to adapt the complexity of the classifier (via k)
+
+Cons:
+- needs large training sets
+- the complete training set has to be stored
+- distances to all training objects have to be computed
+- the features have to be scaled sensibly (quite sensitive to scaling)
+- k has to be optimized
+
+## naive bayes classifier
+
+Bayes classifier
+For classification we need $p(y|x)$.
+We can get it from Bayes' theorem if we can estimate $p(y)$ and $p(x|y)$:
+$$p(y|x)=\frac{p(x|y)p(y)}{p(x)}$$
+Assigning an object to the class with the maximum posterior probability gives the Bayes classifier, easy as pie (with two classes it is just a greater-than comparison, and $p(x)$ is the same for both classes, so it cancels)
+also don't forget: $p(x)=p(x|y_1)p(y_1)+p(x|y_0)p(y_0)$
+
+Naive Bayes: conditional independence assumption
+We make a strong assumption: all features are independent,
+more precisely, conditionally independent given the class y
+![[Pasted image 20251103163149.png]]
+(should the symbol in the middle, for the chance, perhaps be a +? Maybe I am wrong and misunderstanding the notation)
+(nvm, reading the next slide, it doesn't seem to be; TODO: figure this out)
+
+We just estimate $p(x_i|y)$ per feature and multiply them:
+$$p(x|y)=p(x_1,x_2,\dots,x_d|y)=\prod_{i=1}^{d}p(x_i|y)$$
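A minimal sketch of the naive Bayes rule above, assuming categorical features so that $p(y)$ and each $p(x_i|y)$ can be estimated by simple counting. The function names (`fit_naive_bayes`, `log_joint`, `predict`, `posterior`), the Laplace smoothing constant `alpha`, and the toy weather data are all hypothetical and for illustration only; they do not come from the slides. Logs turn the product over features into a sum, which avoids numerical underflow when there are many features. The normaliser $p(x)$ is only needed for the normalised posterior, not for the decision itself.

```python
import math
from collections import Counter


def fit_naive_bayes(X, y, alpha=1.0):
    """Estimate p(y) and every p(x_i | y) by counting (categorical features).

    X: list of samples, each a tuple of d categorical feature values
    y: list of class labels, same length as X
    alpha: Laplace smoothing constant (an assumption, not from the slides)
    """
    n = len(y)
    class_counts = Counter(y)
    priors = {c: cnt / n for c, cnt in class_counts.items()}
    d = len(X[0])
    # counts[c][i][v] = how many class-c samples have feature i equal to v
    counts = {c: [Counter() for _ in range(d)] for c in class_counts}
    values = [set() for _ in range(d)]  # distinct values seen per feature
    for xs, c in zip(X, y):
        for i, v in enumerate(xs):
            counts[c][i][v] += 1
            values[i].add(v)
    return {"priors": priors, "counts": counts, "class_counts": class_counts,
            "values": values, "alpha": alpha}


def log_joint(model, x):
    """Return {class: log p(y) + sum_i log p(x_i | y)} for a single sample x."""
    scores = {}
    for c, prior in model["priors"].items():
        s = math.log(prior)
        for i, v in enumerate(x):
            # smoothed estimate of p(x_i = v | y = c); a value never seen with c gets alpha
            num = model["counts"][c][i][v] + model["alpha"]
            den = model["class_counts"][c] + model["alpha"] * len(model["values"][i])
            s += math.log(num / den)  # product of p(x_i|y) becomes a sum of logs
        scores[c] = s
    return scores


def predict(model, x):
    """Bayes decision: the class with the largest p(x|y) p(y); p(x) is not needed."""
    scores = log_joint(model, x)
    return max(scores, key=scores.get)


def posterior(model, x):
    """Normalised p(y|x): dividing by the sum over classes is dividing by p(x)."""
    scores = log_joint(model, x)
    m = max(scores.values())  # subtract the max before exp for numerical stability
    unnorm = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: u / z for c, u in unnorm.items()}


if __name__ == "__main__":
    # made-up toy data: here the outlook alone decides the label
    X = [("sunny", "hot"), ("sunny", "mild"), ("sunny", "cool"),
         ("rainy", "hot"), ("rainy", "mild"), ("rainy", "cool")]
    y = ["no", "no", "no", "yes", "yes", "yes"]
    model = fit_naive_bayes(X, y)
    print(predict(model, ("sunny", "hot")))    # prints no
    print(posterior(model, ("sunny", "hot")))  # roughly {'no': 0.8, 'yes': 0.2}
```

With `alpha=1.0` this is add-one (Laplace) smoothing, chosen here so that a feature value never seen with a class does not force the whole product to zero; setting `alpha=0` gives the plain frequency estimates.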