![[Pasted image 20251102180335.png]]
## classification with Parzen Density Estimation
![[Pasted image 20251103160756.png]]

This checks out, and so does applying Bayes' rule: once the class-conditional densities are estimated, we have everything we need, because the class prior is usually easy to estimate (e.g. as the fraction of training points in each class).
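
One common way to write this down, as a sketch: assume a kernel $K$ of width $h$ in $d$ dimensions, written $K_h(u)=\frac{1}{h^d}K(u/h)$ (notation assumed here, not taken from the slide), and estimate the prior as the class fraction, with $n_i$ the number of training points in class i and $n$ the total:

$$\hat{p}(x|y_i)=\frac{1}{n_i}\sum_{j:\,y_j=y_i}K_h(x-x_j),\qquad \hat{p}(y_i)=\frac{n_i}{n}$$
$$\hat{p}(x|y_i)\,\hat{p}(y_i)=\frac{1}{n}\sum_{j:\,y_j=y_i}K_h(x-x_j)$$

Since $\frac{1}{n}$ and $p(x)$ are the same for every class, the predicted class is the one whose training points contribute the largest total kernel mass at $x$.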
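
![[Pasted image 20251103161028.png]]

What does Parzen density estimation (also called kernel density estimation, don't forget that) do?

- does not assume a (parametric) form for the distribution
- estimates the density using a kernel function centred on each training point
- the kernel width matters (it controls how smooth the estimate is)
- the kernel's shape and width are fixed (the same at every location)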
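
A minimal sketch of a Parzen-window classifier in 1-D, assuming a Gaussian kernel (the kernel choice, the width `h`, the toy data and the function names `parzen_density` / `parzen_classify` are illustrative, not from the slides). Each class density is estimated from that class's points and multiplied by the prior $n_i/n$:

```python
import math

def parzen_density(x, samples, h):
    """Parzen estimate of p(x) from 1-D samples, using a Gaussian kernel of width h."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))
    # Every sample contributes the same fixed kernel shape, centred on itself
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def parzen_classify(x, data_by_class, h):
    """Return the class maximising p_hat(x|y) * p(y), with the prior p(y) = n_i / n."""
    n_total = sum(len(samples) for samples in data_by_class.values())
    scores = {
        label: parzen_density(x, samples, h) * len(samples) / n_total
        for label, samples in data_by_class.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy data: two well-separated 1-D classes
data = {"red": [0.0, 0.5, 1.0], "blue": [4.0, 4.5, 5.0]}
print(parzen_classify(0.7, data, h=0.5))  # red
print(parzen_classify(4.2, data, h=0.5))  # blue
```

A very small `h` turns the estimate into sharp spikes around the training points, while a very large `h` smooths everything out, which is why the width matters.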

## K-Nearest Neighbours
![[Pasted image 20251103161144.png]]
(intuitively, we label the unknown point red, because the points around it are red)

Now let us formalise that intuition.
Center a sphere (a circle in 2D) on the point x we want to classify, but do not fix its volume: grow it until it contains the k nearest training objects, then count how many of them belong to each class. These counts give a density estimate for each class (that is what the formula below computes), and the predicted class is the one with the highest resulting posterior.
![[Pasted image 20251103161333.png]]

$$\hat{p}(x|y_i)=\frac{k_i}{n_iV_k(x)}$$
- $k_i$: number of neighbours of class i within $V_k(x)$
- $V_k(x)$: volume of the sphere centered at x with radius r
- $r$: distance to the k-th nearest neighbour
- $n_i$: total number of points in class i
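
Combining this estimate with Bayes' rule shows why kNN ends up as a majority vote. Assuming the prior is estimated as $\hat{p}(y_i)=n_i/n$ ($n$ = total number of training points) and writing $k=\sum_i k_i$, the evidence is $\hat{p}(x)=\sum_i\frac{k_i}{n_iV_k(x)}\cdot\frac{n_i}{n}=\frac{k}{nV_k(x)}$, so

$$\hat{p}(y_i|x)=\frac{\hat{p}(x|y_i)\,\hat{p}(y_i)}{\hat{p}(x)}=\frac{\frac{k_i}{n_iV_k(x)}\cdot\frac{n_i}{n}}{\frac{k}{nV_k(x)}}=\frac{k_i}{k}$$

The posterior of class i is the fraction of the k neighbours that belong to class i, so picking the maximum posterior means picking the most frequent class among the neighbours.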

![[Pasted image 20251103161604.png]]

![[Pasted image 20251103161635.png]]
(straightforward to implement if you know Python)
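
A minimal kNN sketch in plain Python, assuming 2-D points so that $V_k(x)=\pi r^2$ is an area (function names and toy data are hypothetical). `knn_density_2d` follows the density formula literally, and `knn_predict` is the majority vote it reduces to when the prior is $n_i/n$:

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest(x, X_train, k):
    """Indices of the k training points closest to x (the points inside the grown circle)."""
    return sorted(range(len(X_train)), key=lambda j: euclidean(x, X_train[j]))[:k]

def knn_density_2d(x, X_train, y_train, k, label):
    """p_hat(x | label) = k_i / (n_i * V_k(x)), with V_k(x) = pi * r**2 in 2-D.

    Assumes x does not coincide with its k-th neighbour (r > 0).
    """
    idx = k_nearest(x, X_train, k)
    r = euclidean(x, X_train[idx[-1]])  # distance to the k-th nearest neighbour
    k_i = sum(1 for j in idx if y_train[j] == label)  # neighbours of this class
    n_i = sum(1 for lab in y_train if lab == label)  # class size in the training set
    return k_i / (n_i * math.pi * r ** 2)

def knn_predict(x, X_train, y_train, k):
    """Majority vote among the k nearest neighbours."""
    votes = Counter(y_train[j] for j in k_nearest(x, X_train, k))
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a red cluster near the origin, a blue one near (5, 5)
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict((0.5, 0.5), X, y, k=3))  # red
```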
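
## the influence of k
(and how it relates to the classification error)

What is the largest / smallest value of k that you can choose?
What will the classification error be?

The value of k has a strong effect on the performance of the algorithm:
- large k (at most k = n, the whole training set): every object is assigned to the most frequent class regardless of x, so the error is roughly 1 minus the largest class prior
- small k (at least k = 1): predictions are highly variable and the decision boundaries unstable; 1-NN even classifies every training point correctly when evaluated on the training set itself (assuming no duplicate points with different labels)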

So how should we choose k?
1. set aside a portion of the training data (validation set)
2. vary k
3. pick the k that gives the best generalisation performance (lowest validation error)
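
The three steps above can be sketched as follows (helper names and the 1-D toy data are hypothetical; the data contains one mislabelled "A" point inside the "B" region). The candidate list includes k = 1 and k = n to show both extremes:

```python
import math
from collections import Counter

def knn_predict(x, X_train, y_train, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))[:k]
    return Counter(y_train[j] for j in nearest).most_common(1)[0][0]

def choose_k(X_train, y_train, X_val, y_val, candidate_ks):
    """Return the k with the lowest error on the held-out validation set, plus all errors."""
    errors = {}
    for k in candidate_ks:
        wrong = sum(knn_predict(x, X_train, y_train, k) != t for x, t in zip(X_val, y_val))
        errors[k] = wrong / len(y_val)
    best_k = min(errors, key=errors.get)  # ties go to the first k tried
    return best_k, errors

# Hypothetical data: class A around 1, class B around 11, plus a noisy A at 10.5
X_train = [(0,), (1,), (2,), (10,), (11,), (12,), (10.5,)]
y_train = ["A", "A", "A", "B", "B", "B", "A"]
X_val, y_val = [(10.4,), (1.5,)], ["B", "A"]

best_k, errors = choose_k(X_train, y_train, X_val, y_val, candidate_ks=[1, 3, 7])
print(best_k, errors)
```

With k = 1 the noisy point decides the label near 10.4, and with k = 7 = n every object gets the overall majority class "A"; the intermediate k = 3 is the one picked here.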

What about ties, e.g. an equal number of positive and negative neighbours?
- use an odd k (this only rules out ties when there are two classes)
- break ties:
    - random: pick one of the tied classes at random
    - prior: pick the class with the greater prior
    - nearest: use the 1-NN classifier to decide
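
A sketch of the three tie-breaking rules (the function name, the `strategy` parameter and the data are hypothetical). The prior is estimated from class frequencies in the training set, and "nearest" takes the closest neighbour among the tied classes, which is the same as plain 1-NN when there are only two classes:

```python
import math
import random
from collections import Counter

def knn_with_tiebreak(x, X_train, y_train, k, strategy="nearest", rng=None):
    """kNN majority vote; ties broken by 'random', 'prior' or 'nearest'."""
    order = sorted(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))
    nearest = order[:k]
    votes = Counter(y_train[j] for j in nearest)
    top = max(votes.values())
    tied = sorted(c for c, v in votes.items() if v == top)
    if len(tied) == 1:
        return tied[0]  # no tie
    if strategy == "random":
        return (rng or random.Random()).choice(tied)
    if strategy == "prior":
        # Prior estimated as the class frequency in the whole training set
        class_counts = Counter(y_train)
        return max(tied, key=lambda c: class_counts[c])
    if strategy == "nearest":
        # Closest neighbour whose class is among the tied ones
        return next(y_train[j] for j in nearest if y_train[j] in tied)
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical 1-D data: with k = 2 the query (1,) has one "pos" and one "neg" neighbour
X_train = [(0,), (3,), (10,), (11,), (12,)]
y_train = ["pos", "neg", "neg", "neg", "pos"]
print(knn_with_tiebreak((1,), X_train, y_train, k=2, strategy="nearest"))  # pos
print(knn_with_tiebreak((1,), X_train, y_train, k=2, strategy="prior"))  # neg
```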

## distance measures (as part of kNN)
The distance measure defines which examples count as similar and which do not, so it can have a strong effect on performance.

Euclidean distance (numeric features):
$$D(x,x')=\sqrt{\sum_d|x_d-{x'}_d|^2}$$
Manhattan distance (numeric features):
$$D(x,x')=\sum_d|x_d-{x'}_d|$$
Hamming distance (categorical features), the number of features where x and x' differ:
$$D(x,x')=\sum_d\mathbb{1}_{x_d\neq{x'}_d}$$
The slide lists further measures, e.g. the Kullback-Leibler (KL) divergence (for histograms) and BM25 (for text).
![[Pasted image 20251103162442.png]]
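
The three distance formulas translate directly into code (a sketch; the function names are illustrative, and both inputs are assumed to have the same number of features, since `zip` silently truncates otherwise):

```python
import math

def euclidean(x, x2):
    """sqrt(sum_d |x_d - x'_d|^2) for numeric features."""
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x, x2)))

def manhattan(x, x2):
    """sum_d |x_d - x'_d| for numeric features."""
    return sum(abs(a - b) for a, b in zip(x, x2))

def hamming(x, x2):
    """Number of (categorical) features where x and x' differ."""
    return sum(1 for a, b in zip(x, x2) if a != b)

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(hamming(("red", "small", "round"), ("red", "large", "square")))  # 2
```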

Pros and Cons:

Pros:
- simple and flexible classifier
- often very good classification performance
- easy to adapt the complexity of the classifier (via k)

Cons:
- needs large training sets
- the complete training set has to be stored
- distances to all training objects have to be computed for every new object
- the features have to be scaled sensibly (kNN is quite sensitive to feature scaling)
- k has to be optimized
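
To see why scaling matters, here is a small sketch with made-up data, assuming z-score standardisation (subtract each feature's training mean, divide by its standard deviation) as the scaling method; the slides may use a different one, and the helper names are hypothetical:

```python
import math
from statistics import mean, pstdev

def fit_standardiser(X_train):
    """Return a function that z-scores a point with the training set's per-feature mean and std."""
    columns = list(zip(*X_train))
    mus = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # assumes no feature is constant (sd > 0)
    return lambda x: tuple((v - m) / s for v, m, s in zip(x, mus, sds))

def nearest_index(x, X_train):
    """Index of the training point closest to x (Euclidean distance)."""
    return min(range(len(X_train)), key=lambda j: math.dist(x, X_train[j]))

# Hypothetical data: feature 2 has a far larger spread than feature 1
X_train = [(0, 110), (4, 100), (2, 500), (2, -300)]
query = (0, 100)

# Raw: a difference of 10 in feature 2 outweighs a difference of 4 in feature 1
print(nearest_index(query, X_train))  # 1
scale = fit_standardiser(X_train)
X_scaled = [scale(p) for p in X_train]
# Scaled: 10 is tiny relative to feature 2's spread, 4 is large relative to feature 1's
print(nearest_index(scale(query), X_scaled))  # 0
```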

## Naive Bayes classifier

Bayes classifier:
For classification we need $p(y|x)$.
If we can estimate $p(y)$ and $p(x|y)$, we can use Bayes' theorem:
$$p(y|x)=\frac{p(x|y)p(y)}{p(x)}$$
Assigning an object to the class with the maximum posterior probability gives the Bayes classifier; with two classes this is simply a greater-than comparison between $p(y_1|x)$ and $p(y_0|x)$.
Also remember the evidence (two classes): $p(x)=p(x|y_1)p(y_1)+p(x|y_0)p(y_0)$. It is the same for every class, so it does not change which class has the largest posterior.
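
A small worked example of the two-class Bayes classifier, using made-up numbers for a single object $x$ (the function name `posterior` is hypothetical):

```python
def posterior(likelihoods, priors):
    """Bayes' theorem over a finite set of classes: p(y|x) = p(x|y) p(y) / p(x)."""
    joint = {c: likelihoods[c] * priors[c] for c in priors}  # p(x|y) p(y)
    evidence = sum(joint.values())  # p(x) = sum over y of p(x|y) p(y)
    return {c: j / evidence for c, j in joint.items()}

# Hypothetical values: p(x|y_1) = 0.6, p(x|y_0) = 0.2, p(y_1) = 0.3, p(y_0) = 0.7
post = posterior(likelihoods={1: 0.6, 0: 0.2}, priors={1: 0.3, 0: 0.7})
prediction = max(post, key=post.get)  # Bayes classifier: class with maximum posterior
print(post, prediction)
```

Here $p(x) = 0.18 + 0.14 = 0.32$, so $p(y_1|x) = 0.18/0.32 = 0.5625$ and the object is assigned to $y_1$, even though $y_0$ has the larger prior.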

Naive Bayes: the conditional independence assumption
We make a strong assumption: all features are independent of each other given the class y (conditional independence given y).
![[Pasted image 20251103163149.png]]
(Should the symbol in the middle of the probability perhaps be a +? Maybe I am wrong and misunderstanding the notation.)
(Never mind, it doesn't seem to be reading the next slide. TODO: figure this out.)

We then only need to estimate $p(x_i|y)$ for each feature separately and multiply the estimates:
$$p(x|y)=p(x_1,x_2,\dots,x_d|y)=\prod_{i=1}^{d}p(x_i|y)$$
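
A minimal sketch of naive Bayes for categorical features, assuming $p(y)$ and each $p(x_i|y)$ are estimated by relative frequencies in the training set with no smoothing (the function names and the toy weather data are made up for illustration):

```python
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate p(y) and p(x_i | y) by relative frequencies (categorical features)."""
    n = len(y)
    class_counts = Counter(y)
    priors = {c: class_counts[c] / n for c in class_counts}
    # cond[c][i][v] = number of class-c objects whose feature i has value v
    cond = defaultdict(lambda: defaultdict(Counter))
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            cond[c][i][v] += 1
    likelihoods = {
        c: {i: {v: cnt / class_counts[c] for v, cnt in cond[c][i].items()} for i in cond[c]}
        for c in class_counts
    }
    return priors, likelihoods

def predict_naive_bayes(x, priors, likelihoods):
    """Return argmax_y p(y) * prod_i p(x_i | y); unseen feature values get probability 0."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, v in enumerate(x):
            score *= likelihoods[c][i].get(v, 0.0)  # conditional independence: multiply
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical data: (outlook, temperature) -> play?
X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
y = ["no", "no", "yes", "yes"]
priors, likelihoods = fit_naive_bayes(X, y)
label, scores = predict_naive_bayes(("rainy", "mild"), priors, likelihoods)
print(label, scores)
```

Note the zero score for "no" here: a single feature value never seen with that class makes the whole product 0.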