From: Robert Saalbach Date: Tue, 28 Oct 2025 22:52:11 +0000 (+0100) Subject: more notes for ml X-Git-Url: https://git.saalbach.dev/?a=commitdiff_plain;h=c05e99854780bfb596407380b955bc042b12c66a;p=research-obsidian.git more notes for ml --- diff --git a/.obsidian/workspace.json b/.obsidian/workspace.json index 4db4229..b831289 100644 --- a/.obsidian/workspace.json +++ b/.obsidian/workspace.json @@ -4,16 +4,16 @@ "type": "split", "children": [ { - "id": "073fe34e944da9c4", + "id": "c6dea15c43e47f34", "type": "tabs", "children": [ { - "id": "1799484043bce33c", + "id": "73f492393c1ef2cc", "type": "leaf", "state": { "type": "markdown", "state": { - "file": "Watchlist & Do List.md", + "file": "University/Machine Learning/Full Notes.md", "mode": "source", "source": false, "backlinks": true, @@ -28,7 +28,7 @@ } }, "icon": "lucide-file", - "title": "Watchlist & Do List" + "title": "Full Notes" } } ] @@ -88,7 +88,8 @@ } ], "direction": "horizontal", - "width": 300 + "width": 300, + "collapsed": true }, "right": { "id": "52c8cd2985704b8e", @@ -174,19 +175,20 @@ "pdf-plus:PDF++: Toggle auto-paste": false } }, - "active": "1799484043bce33c", + "active": "73f492393c1ef2cc", "lastOpenFiles": [ - "Untitled.md", + "Pasted image 20251027175634.png", + "Pasted image 20251027173751.png", + "Pasted image 20251027160630.png", + "Pasted image 20251027160619.png", + "Pasted image 20251027160603.png", + "Pasted image 20251027160553.png", + "Pasted image 20251027160532.png", + "Pasted image 20251027160014.png", + "Pasted image 20251027155945.png", + "Pasted image 20251027155909.png", + "Pasted image 20251027154610.png", "University/Machine Learning/Full Notes.md", - "Pasted image 20251025164452.png", - "Pasted image 20251025163915.png", - "Pasted image 20251025162205.png", - "Pasted image 20251025161239.png", - "Pasted image 20251025131332.png", - "Pasted image 20251025122355.png", - "Pasted image 20251025122036.png", - "Pasted image 20251025121602.png", - "Pasted image 20251025120122.png", 
"Introduction, Software Project.md", "University/Machine Learning", "European Union/Child Sexual Abuse Act/cellar_13e33abf-d209-11ec-a95f-01aa75ed71a1.0001.02_DOC_1.pdf", @@ -204,6 +206,7 @@ "Blog/sis50.nl Experiences Writing that Software.md", "Blog/Engine-Light and Experiences writing that Software.md", "Blog/Saalbach.dev and experiences writing that software.md", + "Untitled.md", "Blog", "University/Software Project/Grading our Own Report Assignment.md", "Nebulous Command/Notes on Nebulous Command.md", @@ -216,7 +219,6 @@ "Thoughts on Politics and Researching, and finding out things that you think are right.md", "Quotes.md", "Poet List.md", - "Pasted image 20250207160807.png", "University/Software Project/General Assembly 10-1-2025.md", "University/Software Project", "University/Recontextualising Creativity/RC Research Project/References on RC Research Project.md", diff --git a/Pasted image 20251027151936.png b/Pasted image 20251027151936.png new file mode 100644 index 0000000..26c0e1b Binary files /dev/null and b/Pasted image 20251027151936.png differ diff --git a/Pasted image 20251027152814.png b/Pasted image 20251027152814.png new file mode 100644 index 0000000..40cdac5 Binary files /dev/null and b/Pasted image 20251027152814.png differ diff --git a/Pasted image 20251027154610.png b/Pasted image 20251027154610.png new file mode 100644 index 0000000..c2d2ada Binary files /dev/null and b/Pasted image 20251027154610.png differ diff --git a/Pasted image 20251027155909.png b/Pasted image 20251027155909.png new file mode 100644 index 0000000..443e8ba Binary files /dev/null and b/Pasted image 20251027155909.png differ diff --git a/Pasted image 20251027155945.png b/Pasted image 20251027155945.png new file mode 100644 index 0000000..80ee46b Binary files /dev/null and b/Pasted image 20251027155945.png differ diff --git a/Pasted image 20251027160014.png b/Pasted image 20251027160014.png new file mode 100644 index 0000000..01f3d91 Binary files /dev/null and b/Pasted 
image 20251027160014.png differ diff --git a/Pasted image 20251027160532.png b/Pasted image 20251027160532.png new file mode 100644 index 0000000..c1f7d67 Binary files /dev/null and b/Pasted image 20251027160532.png differ diff --git a/Pasted image 20251027160553.png b/Pasted image 20251027160553.png new file mode 100644 index 0000000..33787a1 Binary files /dev/null and b/Pasted image 20251027160553.png differ diff --git a/Pasted image 20251027160603.png b/Pasted image 20251027160603.png new file mode 100644 index 0000000..697b042 Binary files /dev/null and b/Pasted image 20251027160603.png differ diff --git a/Pasted image 20251027160619.png b/Pasted image 20251027160619.png new file mode 100644 index 0000000..aeb8e24 Binary files /dev/null and b/Pasted image 20251027160619.png differ diff --git a/Pasted image 20251027160630.png b/Pasted image 20251027160630.png new file mode 100644 index 0000000..46e2c71 Binary files /dev/null and b/Pasted image 20251027160630.png differ diff --git a/Pasted image 20251027173751.png b/Pasted image 20251027173751.png new file mode 100644 index 0000000..25193b3 Binary files /dev/null and b/Pasted image 20251027173751.png differ diff --git a/Pasted image 20251027175634.png b/Pasted image 20251027175634.png new file mode 100644 index 0000000..a25926e Binary files /dev/null and b/Pasted image 20251027175634.png differ diff --git a/University/Machine Learning/Full Notes.md b/University/Machine Learning/Full Notes.md index 2651bf0..9967887 100644 --- a/University/Machine Learning/Full Notes.md +++ b/University/Machine Learning/Full Notes.md @@ -240,3 +240,145 @@ i.e. $\lambda_{21}\space p(y_2|x)$ and $\lambda_{12}\space p(y_1|x)$ (missclassification error loss times the prob its in that class (posterior prob.)) + +# Parametric Densities + +For the output of a model we would find, for each object in the feature space: $p(y|x)$ +In practice we approx: $\hat{p}(y|x)$ +or we fit a function. 
+ +Difference between $p(x)$ and $P(x)$: the first is a probability density (continuous $x$), the second a probability mass (discrete $x$) + +In Bayes' rule how do you get $p(x)$? +You can compute it (assuming two classes here): +$p(x)=p(x|y_1)p(y_1)+p(x|y_2)p(y_2)$ +![[Pasted image 20251027151936.png]] + +Up to now we assumed we know $p(y|x)$ or $p(x|y), p(y)$ +but realistically we only get a sample, so we have to approximate. + +For this, we need models, which fall into multiple categories: +- Discriminative and Generative +- Parametric and Nonparametric + +## Generative Models +$p(y|x)\propto p(y)p(x|y)$ + +When we know the prior and class-conditional densities we know everything about the data for classification +The densities have to be estimated; given examples from the different classes, 'standard' density estimation is sufficient +It is possible to 'generate' (sample) from the classes + +## Discriminative Models +$\hat{p}(y|x)$ +When we don't know the class-conditional probs and priors, directly estimate the posterior? +- hard problem: given measurements e.g. height, how to estimate $p(\text{woman}|\text{height})$ +- Strong assumptions or sloppy approx + +## Parametric Modeling & Estimation +Density Estimation and related topics: +- Simple Nonparametric approach +- curse of dimensionality +- parametric models +- sphering +- properties of gaussian +- mixture modeling + +## Histogram based Density Estimation +![[Pasted image 20251027152814.png]] + +The problem here, though, is accuracy: you may need a lot of repetitions, as in this case. + +for 1 dimensional data apparently +- 1000 objects needed +For each bin we estimate one value: 50 bins, 50 parameters + +for $M$-dimensional data roughly $1000^M$ objects needed.
This becomes basically unworkable if $M>2$ + +This is part of the ***curse of dimensionality*** +Intuitively, using more features should give us more information and make prediction easier +But we have to estimate the densities, and the number of parameters increases with the number of features +To estimate these well you need more objects +Consequence: There is an optimal number of features to use + +![[Pasted image 20251027154610.png]] + +> Parametric: need only a few parameters and assume a simple global model e.g. Gaussian +> Non-Parametric: depends on training data, simple local model such as uniform or Gaussian + +Normal Distribution === Gaussian Distribution +Standard normal distro: $\mu=0, \sigma^2=1$ +About 95% of data between $[\mu-2\sigma, \mu+2\sigma]$ (1 dimensional) + +1D formula: +$$p(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$ +$p(x)$ is the density, don't forget that + +Why Gauss? +Special: by the central limit theorem, the sum of a large number of i.i.d. random vars will be approximately Gaussian +Approximately Gaussian data occurs in real life +e.g. sum of the pips on 10,000 dice throws +Also has few params +Easy to estimate params when using max. likelihood + +### Multivariate Gaussians +$M$-dimensional density: +$$p(x)=\frac{1}{\sqrt{(2\pi)^M\det(\Sigma)}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$ +Which is also written as (crazy) +$$N(x|\mu,\Sigma)$$ +![[Pasted image 20251027155909.png]] + +![[Pasted image 20251027155945.png]] + +![[Pasted image 20251027160014.png]] +(from top-down view) + +## Max. likelihood estimates + +What are the max. likelihood estimators for the mean and the covariance matrix? (the parameters we want to estimate) +$$ +\begin{align} +\hat{\mu}=\frac{1}{n}\sum^n_{i=1}x_i \\ +\hat{\Sigma}=\frac{1}{n}\sum^n_{i=1}(x_i-\hat{\mu})(x_i-\hat{\mu})^T +\end{align} +$$ +Since the second formula uses the estimated mean $\hat{\mu}$ from the first instead of the true mean, the second estimator is biased.
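
A quick numerical check of the two estimators, as a sketch and not something from the slides. The sample `X`, the seed and the helper name `ml_gaussian_estimates` are assumptions made up for illustration; only NumPy is used.

```python
import numpy as np

def ml_gaussian_estimates(X):
    """Return the ML mean and the ML (1/n) covariance of the rows of X."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)                # (1/n) * sum of x_i
    centered = X - mu_hat                  # x_i - mu_hat, one row per object
    sigma_hat = centered.T @ centered / n  # (1/n) * sum of outer products
    return mu_hat, sigma_hat

# Hypothetical sample: 200 objects with 2 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))

mu_hat, sigma_hat = ml_gaussian_estimates(X)
print(mu_hat)
print(sigma_hat)
```

`np.cov(X, rowvar=False, bias=True)` computes the same $\frac{1}{n}$ covariance, so it can be used to cross-check the hand-written version.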
+ +To make it unbiased, we can make one simple change +$$ +\hat{\Sigma}=\frac{1}{n-1}\sum^n_{i=1}(x_i-\hat{\mu})(x_i-\hat{\mu})^T +$$ +#### example as on slides: +![[Pasted image 20251027160553.png]] +![[Pasted image 20251027160603.png]] +![[Pasted image 20251027160619.png]]![[Pasted image 20251027160630.png]] +(x is typically in bold on these slides, because after all it is a vector if there are multiple features, and I guess technically even if there is only one feature) + +REMIND YOURSELF: "T" operation (and matrix stuff in general such as finding inverse) +![[Pasted image 20251027173751.png]] +To solve that you would need to get some 'c' for which $0\times c = 1$, which doesn't work for obvious reasons + +The number of objects is insufficient to find the inverse: two objects in a 2-dimensional feature space give a _degenerate_ Gaussian distribution + +### Parametric Estimation +Now for $M$-dimensional data: +$\mu$: is a vector with $M$ elements +$\Sigma$: is a symmetric $M\times M$ matrix with $0.5M\space (M+1)$ free elements + +The number of params increases quadratically with $M$, and you might still need a lot of data + +I am not sure what this means but apparently "Any projection of a high-dimensional Gaussian is itself again Gaussian", so I guess reducing it to one feature? but that makes sense (or a lower feature count rather) + +### Estimating class priors +Given a training set, how can you estimate $\hat{p}(y)$? +The classes are discrete, so $\hat{p}(y)$ is a true probability, and the priors are often known or assumed. If not, we need to learn them + +Max.
likelihood estimator for priors turns out to be counting: +$\hat{p}(y_1)=\frac{N_1}{N}$ and $\hat{p}(y_2)=\frac{N_2}{N}$ +(remember: you don't need the unconditional probability $p(x)$ to find which posterior is larger) ![[Pasted image 20251027151936.png]] + +### How to define the classifier based on the estimates: +#### two-class case +Discriminant: $f(x)=\log p(y_1|x)-\log p(y_2|x)$ +and then this wonderful jumble, which is simply plugging everything in: +![[Pasted image 20251027175634.png]] + +
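
The two-class discriminant can be sketched numerically by plugging in the counted priors and ML Gaussian estimates. This is a rough illustration under assumptions, not the formula from the slide image: the toy data, the seed and the helper names `fit_class`, `log_gaussian` and `discriminant` are all invented here.

```python
import numpy as np

def fit_class(X, n_total):
    """ML estimates for one class: prior by counting, Gaussian mean and covariance."""
    prior = X.shape[0] / n_total
    mu = X.mean(axis=0)
    centered = X - mu
    sigma = centered.T @ centered / X.shape[0]
    return prior, mu, sigma

def log_gaussian(x, mu, sigma):
    """Log of the M-dimensional Gaussian density N(x | mu, sigma)."""
    m = mu.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)        # log det(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)  # (x-mu)^T sigma^{-1} (x-mu)
    return -0.5 * (m * np.log(2 * np.pi) + logdet + maha)

def discriminant(x, class1, class2):
    """f(x) = log p(y1|x) - log p(y2|x); the shared p(x) term cancels."""
    p1, mu1, s1 = class1
    p2, mu2, s2 = class2
    return (np.log(p1) + log_gaussian(x, mu1, s1)) - (np.log(p2) + log_gaussian(x, mu2, s2))

# Hypothetical training set: 60 objects of class 1, 40 of class 2, 2 features.
rng = np.random.default_rng(1)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(60, 2))
X2 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(40, 2))
n = len(X1) + len(X2)
c1, c2 = fit_class(X1, n), fit_class(X2, n)

f_near_1 = discriminant(np.array([0.0, 0.0]), c1, c2)
f_near_2 = discriminant(np.array([4.0, 4.0]), c1, c2)
print(f_near_1, f_near_2)
```

Assign $x$ to $y_1$ when $f(x)>0$ and to $y_2$ otherwise. Working with logs avoids multiplying very small densities, and `np.linalg.solve` avoids forming $\Sigma^{-1}$ explicitly.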