"type": "split",
"children": [
{
- "id": "073fe34e944da9c4",
+ "id": "c6dea15c43e47f34",
"type": "tabs",
"children": [
{
- "id": "1799484043bce33c",
+ "id": "73f492393c1ef2cc",
"type": "leaf",
"state": {
"type": "markdown",
"state": {
- "file": "Watchlist & Do List.md",
+ "file": "University/Machine Learning/Full Notes.md",
"mode": "source",
"source": false,
"backlinks": true,
}
},
"icon": "lucide-file",
- "title": "Watchlist & Do List"
+ "title": "Full Notes"
}
}
]
}
],
"direction": "horizontal",
- "width": 300
+ "width": 300,
+ "collapsed": true
},
"right": {
"id": "52c8cd2985704b8e",
"pdf-plus:PDF++: Toggle auto-paste": false
}
},
- "active": "1799484043bce33c",
+ "active": "73f492393c1ef2cc",
"lastOpenFiles": [
- "Untitled.md",
+ "Pasted image 20251027175634.png",
+ "Pasted image 20251027173751.png",
+ "Pasted image 20251027160630.png",
+ "Pasted image 20251027160619.png",
+ "Pasted image 20251027160603.png",
+ "Pasted image 20251027160553.png",
+ "Pasted image 20251027160532.png",
+ "Pasted image 20251027160014.png",
+ "Pasted image 20251027155945.png",
+ "Pasted image 20251027155909.png",
+ "Pasted image 20251027154610.png",
"University/Machine Learning/Full Notes.md",
- "Pasted image 20251025164452.png",
- "Pasted image 20251025163915.png",
- "Pasted image 20251025162205.png",
- "Pasted image 20251025161239.png",
- "Pasted image 20251025131332.png",
- "Pasted image 20251025122355.png",
- "Pasted image 20251025122036.png",
- "Pasted image 20251025121602.png",
- "Pasted image 20251025120122.png",
"Introduction, Software Project.md",
"University/Machine Learning",
"European Union/Child Sexual Abuse Act/cellar_13e33abf-d209-11ec-a95f-01aa75ed71a1.0001.02_DOC_1.pdf",
"Blog/sis50.nl Experiences Writing that Software.md",
"Blog/Engine-Light and Experiences writing that Software.md",
"Blog/Saalbach.dev and experiences writing that software.md",
+ "Untitled.md",
"Blog",
"University/Software Project/Grading our Own Report Assignment.md",
"Nebulous Command/Notes on Nebulous Command.md",
"Thoughts on Politics and Researching, and finding out things that you think are right.md",
"Quotes.md",
"Poet List.md",
- "Pasted image 20250207160807.png",
"University/Software Project/General Assembly 10-1-2025.md",
"University/Software Project",
"University/Recontextualising Creativity/RC Research Project/References on RC Research Project.md",
$\lambda_{21}\space p(y_2|x)$ and $\lambda_{12}\space p(y_1|x)$
(misclassification loss times the probability that the object is in that class, i.e. the posterior probability)
+
+# Parametric Densities
+
+For the output of a model we would find, for each object in the feature space: $p(y|x)$
+In practice we approx: $\hat{p}(y|x)$
+or we fit a function.
+
+Difference between $p(x)$ and $P(x)$: the first is a probability density (continuous variable), the second a probability mass (discrete variable)
+
+In Bayes' rule how do you get $p(x)$?
+You can compute it: (assuming two classes here):
+$p(x)=p(x|y_1)p(y_1)+p(x|y_2)p(y_2)$
+![[Pasted image 20251027151936.png]]
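+
+A quick sanity check with made-up numbers (these values are assumptions for illustration, not from the lecture): take $p(x|y_1)=0.3$, $p(y_1)=0.4$, $p(x|y_2)=0.1$, $p(y_2)=0.6$.
+$$
+\begin{align}
+p(x)&=0.3\cdot 0.4+0.1\cdot 0.6=0.18 \\
+p(y_1|x)&=\frac{p(x|y_1)p(y_1)}{p(x)}=\frac{0.12}{0.18}=\frac{2}{3}
+\end{align}
+$$
+So $p(y_2|x)=\frac{1}{3}$, and the two posteriors sum to 1 as they should.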
+
+Up to now we assumed we know $p(y|x)$, or $p(x|y)$ and $p(y)$,
+but realistically we only get a sample, so we have to approximate them.
+
+The models used for this fall along two axes:
+- Discriminative and Generative
+- Parametric and Nonparametric
+
+## Generative Models
+$p(y|x)\propto p(y)p(x|y)$
+
+When we know the priors and the class-conditional densities, we know everything about the data that matters for classification.
+The densities have to be estimated; given examples from the different classes, 'standard' density estimation per class is sufficient.
+It is also possible to 'generate' (sample) new objects from the classes.
+
+## Discriminative Models
+$\hat{p}(y|x)$
+When we don't know the class-conditional densities and priors, can we estimate the posterior directly?
+- hard problem: given measurements, e.g. height, how to estimate $p(\text{woman}|\text{height})$
+- requires strong assumptions or sloppy approximations
+
+## Parametric Modeling & Estimation
+Density Estimation and related topics:
+- Simple Nonparametric approach
+- curse of dimensionality
+- parametric models
+- sphering
+- properties of gaussian
+- mixture modeling
+
+## Histogram based Density Estimation
+![[Pasted image 20251027152814.png]]
+
+The problem here is accuracy: you may need a lot of repetitions (objects) to get a good estimate, as in this case.
+
+For 1-dimensional data, apparently roughly $1000$ objects are needed.
+For each bin we estimate one value: 50 bins, 50 parameters.
+
+For $M$-dimensional data roughly $1000^M$ objects are needed, which becomes basically unworkable if $M>2$.
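+
+To get a feeling for the growth, here is the arithmetic, assuming the same 50 bins on every axis (the equal bin count per feature is my assumption):
+$$
+\begin{align}
+\text{bins (parameters)}&: \quad 50^1=50, \quad 50^2=2500, \quad 50^3=125000 \\
+\text{objects}&: \quad 1000^1=10^3, \quad 1000^2=10^6, \quad 1000^3=10^9
+\end{align}
+$$
+So already at $M=3$ a histogram would need on the order of a billion objects.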
+
+This is part of the ***curse of dimensionality***.
+Intuitively, using more features should give us more information and make prediction easier.
+But we have to estimate the densities, and the number of parameters increases with the number of features;
+to estimate these well you need more objects.
+Consequence: for a fixed number of objects, there is an optimal number of features to use.
+
+![[Pasted image 20251027154610.png]]
+
+> Parametric: need only a few parameters and assume a simple global model e.g. Gaussian
+> Non-Parametric: depends on training data, simple local model such as uniform or Gaussian
+
+Normal Distribution === Gaussian Distribution
+Standard normal distribution: $\mu=0, \sigma^2=1$
+About 95% of the data lies in $[\mu-2\sigma, \mu+2\sigma]$ (1-dimensional)
+
+1D formula:
+$$p(x)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
+$p(x)$ is a density, not a probability, don't forget that
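+
+A small check of why the density/probability distinction matters, evaluating the formula at the peak $x=\mu$ (the choice $\sigma=0.1$ is just an illustrative assumption):
+$$
+\begin{align}
+\sigma=1&: \quad p(\mu)=\frac{1}{\sqrt{2\pi}}\approx 0.399 \\
+\sigma=0.1&: \quad p(\mu)=\frac{1}{\sqrt{2\pi\cdot 0.01}}\approx 3.99
+\end{align}
+$$
+A density value can be larger than 1; only its integral over $x$ has to equal 1.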
+
+Why Gauss?
+special: by the central limit theorem, the sum of a large number of i.i.d. random variables will be approximately Gaussian
+it approximately occurs in real life
+e.g. the sum of the eyes on 10,000 dice throws
+it also has few parameters
+the parameters are easy to estimate using max. likelihood
+
+### Multivariate Gaussians
+$M$-dimensional density:
+$$p(x)=\frac{1}{\sqrt{(2\pi)^M\det(\Sigma)}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$$
+Which is also written as (crazy)
+$$N(x|\mu,\Sigma)$$
+![[Pasted image 20251027155909.png]]
+
+![[Pasted image 20251027155945.png]]
+
+![[Pasted image 20251027160014.png]]
+(from top-down view)
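+
+To connect this back to the 1D formula, a sketch of the special case $M=2$ with a diagonal covariance matrix (the diagonal assumption is mine, chosen to keep the algebra short):
+$$
+\begin{align}
+\Sigma&=\begin{pmatrix}\sigma_1^2 & 0 \\ 0 & \sigma_2^2\end{pmatrix}, \quad \det(\Sigma)=\sigma_1^2\sigma_2^2, \quad \Sigma^{-1}=\begin{pmatrix}1/\sigma_1^2 & 0 \\ 0 & 1/\sigma_2^2\end{pmatrix} \\
+p(x)&=\frac{1}{2\pi\sigma_1\sigma_2}\exp\left(-\frac{(x_1-\mu_1)^2}{2\sigma_1^2}-\frac{(x_2-\mu_2)^2}{2\sigma_2^2}\right)=N(x_1|\mu_1,\sigma_1^2)\,N(x_2|\mu_2,\sigma_2^2)
+\end{align}
+$$
+So with zero off-diagonal elements the density is just the product of two 1D Gaussians; the off-diagonal elements of $\Sigma$ are what tilt the ellipses.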
+
+## Max. likelihood estimates
+
+What are the max. likelihood estimators for the mean and the covariance matrix? (the parameters we want to estimate)
+$$
+\begin{align}
+\hat{\mu}=\frac{1}{n}\sum^n_{i=1}x_i \\
+\hat{\Sigma}=\frac{1}{n}\sum^n_{i=1}(x_i-\hat{\mu})(x_i-\hat{\mu})^T
+\end{align}
+$$
+Since the second formula uses $\hat{\mu}$, which is itself estimated from the same data with the first formula, the second estimator is biased.
+
+To make it unbiased, we can make one simple change:
+$$
+\hat{\Sigma}=\frac{1}{n-1}\sum^n_{i=1}(x_i-\hat{\mu})(x_i-\hat{\mu})^T
+$$
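+
+A small worked example of both estimators, on three made-up 2D objects (hypothetical data, not the example from the slides): $x_1=(1,2)^T$, $x_2=(3,4)^T$, $x_3=(5,9)^T$.
+$$
+\begin{align}
+\hat{\mu}&=\frac{1}{3}\begin{pmatrix}1+3+5 \\ 2+4+9\end{pmatrix}=\begin{pmatrix}3 \\ 5\end{pmatrix} \\
+\sum^3_{i=1}(x_i-\hat{\mu})(x_i-\hat{\mu})^T&=\begin{pmatrix}4 & 6 \\ 6 & 9\end{pmatrix}+\begin{pmatrix}0 & 0 \\ 0 & 1\end{pmatrix}+\begin{pmatrix}4 & 8 \\ 8 & 16\end{pmatrix}=\begin{pmatrix}8 & 14 \\ 14 & 26\end{pmatrix} \\
+\hat{\Sigma}_{\text{biased}}&=\frac{1}{3}\begin{pmatrix}8 & 14 \\ 14 & 26\end{pmatrix}, \quad \hat{\Sigma}_{\text{unbiased}}=\frac{1}{2}\begin{pmatrix}8 & 14 \\ 14 & 26\end{pmatrix}=\begin{pmatrix}4 & 7 \\ 7 & 13\end{pmatrix}
+\end{align}
+$$
+The deviations used are $(-2,-3)^T$, $(0,-1)^T$ and $(2,4)^T$; each term in the sum is an outer product, so it is a $2\times 2$ matrix and not a scalar.
+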
+#### example as on slides:
+![[Pasted image 20251027160553.png]]
+![[Pasted image 20251027160603.png]]
+![[Pasted image 20251027160619.png]]![[Pasted image 20251027160630.png]]
+(x is typically in bold on these slides, because after all it is a vector if there are multiple features, and I guess technically even if there is only one feature)
+
+REMIND YOURSELF: "T" operation (and matrix stuff in general such as finding inverse)
+![[Pasted image 20251027173751.png]]
+to solve that you would need to get some 'c' for which $0\times c = 1$, which doesn't work for obvious reasons
+
+The number of objects is insufficient to find the inverse: two objects in a 2-dimensional feature space give a _degenerate_ Gaussian distribution
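+
+A sketch of how this happens, with two made-up objects $x_1=(0,0)^T$ and $x_2=(2,2)^T$ (hypothetical numbers, probably not the ones on the slide):
+$$
+\begin{align}
+\hat{\mu}&=\begin{pmatrix}1 \\ 1\end{pmatrix}, \quad x_1-\hat{\mu}=\begin{pmatrix}-1 \\ -1\end{pmatrix}, \quad x_2-\hat{\mu}=\begin{pmatrix}1 \\ 1\end{pmatrix} \\
+\hat{\Sigma}&=\frac{1}{2}\left[\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix}+\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix}\right]=\begin{pmatrix}1 & 1 \\ 1 & 1\end{pmatrix} \\
+\det(\hat{\Sigma})&=1\cdot 1-1\cdot 1=0
+\end{align}
+$$
+With a zero determinant there is no $\hat{\Sigma}^{-1}$: all the probability mass sits on the line through the two objects, so the density is flat (zero width) in the perpendicular direction.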
+
+### Parametric Estimation
+Now for $M$-dimensional data:
+$\mu$: a vector with $M$ elements
+$\Sigma$: a symmetric $M\times M$ matrix with $\frac{1}{2}M(M+1)$ free elements
+
+The number of parameters increases quadratically with $M$, and you might still need a lot of data
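+
+Counting the mean and covariance parameters together, for a few feature counts (the values of $M$ are just picked for illustration):
+$$
+\begin{align}
+\#\text{params}&=M+\frac{1}{2}M(M+1) \\
+M=2&: \quad 2+3=5 \\
+M=10&: \quad 10+55=65 \\
+M=100&: \quad 100+5050=5150
+\end{align}
+$$
+That is a lot less than the $50^M$ bins of a histogram, but it still grows quickly.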
+
+Apparently "Any projection of a high-dimensional Gaussian is itself again Gaussian". I am not sure what this means exactly, but I guess it is about reducing the data to one feature (or rather to a lower feature count), which makes sense
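+
+As far as I can tell, the general statement behind this is the linear-transformation property of Gaussians; a sketch, where the $K\times M$ matrix $A$ (with $K\le M$) is notation I am introducing here and not from the slides:
+$$
+\begin{align}
+x\sim N(\mu,\Sigma), \quad z=Ax \quad &\Rightarrow \quad z\sim N(A\mu,\; A\Sigma A^T) \\
+A=\begin{pmatrix}1 & 0 & \cdots & 0\end{pmatrix} \quad &\Rightarrow \quad x_1\sim N(\mu_1,\; \Sigma_{11})
+\end{align}
+$$
+The second line is the "keep only the first feature" case: the result is a 1D Gaussian whose mean and variance are read straight off $\mu$ and $\Sigma$.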
+
+### Estimating class priors
+Given a training set, how can you estimate $\hat{p}(y)$?
+The classes are discrete, so $\hat{p}(y)$ is a true probability. Priors are often known or assumed; if not, we need to learn them
+
+Max. likelihood estimator for priors turns out to be counting:
+$\hat{p}(y_1)=\frac{N_1}{N}$ and $\hat{p}(y_2)=\frac{N_2}{N}$
+(remember: you don't need the unconditional probability $p(x)$ to find which posterior is larger) ![[Pasted image 20251027151936.png]]
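+
+For example, with a made-up training set of $N=100$ objects, of which $N_1=30$ are in class $y_1$ and $N_2=70$ in class $y_2$ (hypothetical counts):
+$$
+\hat{p}(y_1)=\frac{30}{100}=0.3, \quad \hat{p}(y_2)=\frac{70}{100}=0.7
+$$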
+
+### How to define the classifier based on the estimates:
+#### two-class case
+Discriminant: $f(x)=\log p(y_1|x)-\log p(y_2|x)$
+and then this wonderful jumble, which is just simply plugging everything in:
+![[Pasted image 20251027175634.png]]
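+
+For my own reference, a sketch of what the plugging in should give, assuming Gaussian class-conditional densities with estimated parameters $\hat{\mu}_k$, $\hat{\Sigma}_k$ per class (this is my own reconstruction and may be arranged differently from the slide):
+$$
+\begin{align}
+f(x)&=\log\frac{p(x|y_1)p(y_1)}{p(x)}-\log\frac{p(x|y_2)p(y_2)}{p(x)} \\
+&=\log p(x|y_1)+\log p(y_1)-\log p(x|y_2)-\log p(y_2) \\
+&=-\frac{1}{2}(x-\hat{\mu}_1)^T\hat{\Sigma}_1^{-1}(x-\hat{\mu}_1)+\frac{1}{2}(x-\hat{\mu}_2)^T\hat{\Sigma}_2^{-1}(x-\hat{\mu}_2) \\
+&\quad-\frac{1}{2}\log\det(\hat{\Sigma}_1)+\frac{1}{2}\log\det(\hat{\Sigma}_2)+\log\hat{p}(y_1)-\log\hat{p}(y_2)
+\end{align}
+$$
+The $p(x)$ terms and the $(2\pi)^M$ factors cancel because they are the same for both classes. Assign $x$ to $y_1$ if $f(x)>0$, otherwise to $y_2$.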
+
+