From: Robert Saalbach
Date: Sat, 25 Oct 2025 23:10:21 +0000 (+0200)
Subject: more ml notes
X-Git-Url: https://git.saalbach.dev/?a=commitdiff_plain;h=a0415313212cc0f9a0d189f6b60cc2c433c6704e;p=research-obsidian.git

more ml notes
---

diff --git a/.obsidian/workspace.json b/.obsidian/workspace.json
index 127ff11..7282378 100644
--- a/.obsidian/workspace.json
+++ b/.obsidian/workspace.json
@@ -4,11 +4,11 @@
 "type": "split",
 "children": [
 {
- "id": "dc2933b8cfa1398d",
+ "id": "073fe34e944da9c4",
 "type": "tabs",
 "children": [
 {
- "id": "7e4d864f85e26e39",
+ "id": "1799484043bce33c",
 "type": "leaf",
 "state": {
 "type": "markdown",
@@ -88,7 +88,8 @@
 }
 ],
 "direction": "horizontal",
- "width": 300
+ "width": 300,
+ "collapsed": true
 },
 "right": {
 "id": "52c8cd2985704b8e",
@@ -174,28 +175,20 @@
 "pdf-plus:PDF++: Toggle auto-paste": false
 }
 },
- "active": "7e4d864f85e26e39",
+ "active": "1799484043bce33c",
 "lastOpenFiles": [
- "Watchlist & Do List.md",
 "University/Machine Learning/Full Notes.md",
- "University/Machine Learning",
- "WT Skins.md",
- "Untitled.md",
- "Thoughts on Politics and Researching, and finding out things that you think are right.md",
- "System Overview and Links.md",
- "Some cool music perhaps?.md",
- "Quotes.md",
- "Poet List.md",
- "Pasted image 20250207160807.png",
- "Pasted image 20250103161642.png",
+ "Pasted image 20251025164452.png",
+ "Pasted image 20251025163915.png",
+ "Pasted image 20251025162205.png",
+ "Pasted image 20251025161239.png",
+ "Pasted image 20251025131332.png",
+ "Pasted image 20251025122355.png",
+ "Pasted image 20251025122036.png",
+ "Pasted image 20251025121602.png",
+ "Pasted image 20251025120122.png",
 "Introduction, Software Project.md",
- "get over it, everyone's tipsy. Dance..md",
- "Food with the boys - money note.md",
- "AI/Thoughts on the Ethics of AI.md",
- "AI/References on AI.md",
- "4th and 5th Gen Fighters/Grippen and F-35 Deep Dive.md",
- "European Military Capability/On the Military Readiness of Europe.md",
- "European Military Capability/References on European Military Capability.md",
+ "University/Machine Learning",
 "European Union/Child Sexual Abuse Act/cellar_13e33abf-d209-11ec-a95f-01aa75ed71a1.0001.02_DOC_1.pdf",
 "European Union/Child Sexual Abuse Act/Notes on the Original Document.md",
 "Physics/Just some questions.md",
@@ -203,23 +196,31 @@
 "European Union/Untitled.md",
 "European Union",
 "Physics/Experience of the world from a Light Particle's Point of view?.md",
+ "Watchlist & Do List.md",
 "Physics",
+ "4th and 5th Gen Fighters/Grippen and F-35 Deep Dive.md",
 "4th and 5th Gen Fighters",
 "Blog/My Design Philosophy when creating (web) apps.md",
 "Blog/sis50.nl Experiences Writing that Software.md",
 "Blog/Engine-Light and Experiences writing that Software.md",
 "Blog/Saalbach.dev and experiences writing that software.md",
+ "Untitled.md",
 "Blog",
 "University/Software Project/Grading our Own Report Assignment.md",
+ "Nebulous Command/Notes on Nebulous Command.md",
+ "University/Software Project/Notes Masters Thesis (AIDM).md",
 "Nebulous Command",
+ "Food with the boys - money note.md",
+ "get over it, everyone's tipsy. Dance..md",
+ "WT Skins.md",
+ "System Overview and Links.md",
+ "Thoughts on Politics and Researching, and finding out things that you think are right.md",
+ "Quotes.md",
+ "Poet List.md",
+ "Pasted image 20250207160807.png",
+ "University/Software Project/General Assembly 10-1-2025.md",
 "University/Software Project",
- "University/Recontextualising Creativity/RC Research Project",
- "University/Computer Security/Pasted image 20250416221211.png",
- "University/Computer Security/Pasted image 20250416221155.png",
- "NoteOn/note-on-logo.png",
- "NoteOn/note-on-500-oops.svg",
- "NoteOn/note-on-404-not-found.svg",
- "NoteOn/note-on-401-unauthorized.svg",
- "NoteOn/Frame 1.png"
+ "University/Recontextualising Creativity/RC Research Project/References on RC Research Project.md",
+ "University/Recontextualising Creativity/RC Research Project"
 ]
 }
\ No newline at end of file
diff --git a/Pasted image 20251025120122.png b/Pasted image 20251025120122.png
new file mode 100644
index 0000000..57ecd89
Binary files /dev/null and b/Pasted image 20251025120122.png differ
diff --git a/Pasted image 20251025121602.png b/Pasted image 20251025121602.png
new file mode 100644
index 0000000..ae58611
Binary files /dev/null and b/Pasted image 20251025121602.png differ
diff --git a/Pasted image 20251025122036.png b/Pasted image 20251025122036.png
new file mode 100644
index 0000000..be55fce
Binary files /dev/null and b/Pasted image 20251025122036.png differ
diff --git a/Pasted image 20251025122355.png b/Pasted image 20251025122355.png
new file mode 100644
index 0000000..f6a387e
Binary files /dev/null and b/Pasted image 20251025122355.png differ
diff --git a/Pasted image 20251025131332.png b/Pasted image 20251025131332.png
new file mode 100644
index 0000000..2df24e2
Binary files /dev/null and b/Pasted image 20251025131332.png differ
diff --git a/Pasted image 20251025161239.png b/Pasted image 20251025161239.png
new file mode 100644
index 0000000..62bc96c
Binary files /dev/null and b/Pasted image 20251025161239.png differ
diff --git a/Pasted image 20251025162205.png b/Pasted image 20251025162205.png
new file mode 100644
index 0000000..12cbc6d
Binary files /dev/null and b/Pasted image 20251025162205.png differ
diff --git a/Pasted image 20251025163915.png b/Pasted image 20251025163915.png
new file mode 100644
index 0000000..0e80d0f
Binary files /dev/null and b/Pasted image 20251025163915.png differ
diff --git a/Pasted image 20251025164452.png b/Pasted image 20251025164452.png
new file mode 100644
index 0000000..4729ccd
Binary files /dev/null and b/Pasted image 20251025164452.png differ
diff --git a/University/Machine Learning/Full Notes.md b/University/Machine Learning/Full Notes.md
index e69de29..2651bf0 100644
--- a/University/Machine Learning/Full Notes.md
+++ b/University/Machine Learning/Full Notes.md
@@ -0,0 +1,242 @@
+## What is Machine Learning
+> Study of the algorithms that use example objects (data) to learn mappings from objects to desired outcomes.
+
+Input -> "Model" -> Prediction
+
+![[Pasted image 20251025120122.png]]
+
+### Why Machine Learning?
+1. Many tasks are too complicated to explicitly encode.
+
+**Goal of Machine Learning**: minimise $E_O[L(g, O)]$
+$E_O$ -> Expectation over objects; $E_O[L(g, O)]$ is the expected loss (risk): on average, how bad is my model?
+$L$ -> Loss function, how much does a mistake "cost"?
+$g$ -> Model (mapping from inputs to outputs)
+$O$ -> Object, the thing that I want to make a prediction for
+
+The following are the steps of machine learning:
+```mermaid
+graph LR;
+
+Object--Representation-->a("Feature Values")
+a("Feature Values")--Model-->Prediction
+Prediction--Evaluation-->b("Performance Measure")
+```
+
+How to represent an object mathematically?
+What measurements do you make?
+
+The measurements = the features
+and they are listed using a *vector*.
+This vector space is called the **feature space**
+
+What measurements can I make that will help me distinguish an apple from a pear?
+Perhaps roundness and weight?
+
+Now we've hit the classification problem. What does a mapping / classifier / model look like?
+
+A function that assigns an output to every location in the input space, in our case $g: \mathbb{R}^2 \to \{\text{apple},\text{pear}\}$
+
+The line at which the decision changes is called the *decision boundary*
+
+
+Good features make discrimination between classes easier
+Measuring the wrong things will make this task impossible
+
+In order to have a performance measure we need a human to "label" data, so that the evaluation itself can be automatic
+
+![[Pasted image 20251025121602.png]]
+
+Choosing a performance metric for evaluation can be hard.
+
+Now what about the learning part?
+What about remembering?
+
+We want to learn an input-output mapping that works for previously unseen objects as well.
+
+### Learning: how to find a mapping
+
+Most ML approaches: find parameters w for a function that maximises the performance
+1. What functions?
+2. We don't know the real answer
+
+This is basically the entirety of what this course deals with.
+
+There are linear classifiers (a line through the feature space)
+There are non-linear classifiers (they are a bit funky)
+
+The performance measure having some impact on the model is the "learning" part that we care about.
+
+![[Pasted image 20251025122036.png]]
+
+How to do evaluation using the data?
+Idea: split into training set and test set
+Training set to fit the model
+Evaluation using the test set
+(should be a good general indicator because the data are assumed to be i.i.d.)
+
+### Types of Machine Learning
+
+- Supervised Learning
+    - Examples: Classification, Regression
+- Unsupervised Learning
+    - Examples: Dimensionality Reduction, Clustering
+- Reinforcement Learning (not course material)
+    - Example: Select optimal actions
+
+![[Pasted image 20251025122355.png]]
+
+Unsupervised Learning
+(there is no output label given)
+
+Density estimation
+-> for each x, give an estimate of the probability density p(x)
+
+Clustering
+-> Identify a small number of "groups" in the data
+
+Dimensionality Reduction
+-> find a lower dimensional description of high dimensional objects
+-> e.g. construct a smaller number of new features that contain most of the information
+
+We are not looking to find the learning algorithm that is always best, but we want to find out which approach works well for which problem and understand why.
+(the reason we study so many classifiers)
+
+In total:
+- ML: use examples to learn a mapping from input to desired output that generalises well to new examples
+- Supervised & Unsupervised
+
+## Learning, Probability and Decision Theory
+
+Training Set:
+All examples are labeled
+This set is used to train / develop our system
+
+Test Set:
+These examples cannot be used to train our system
+The examples do not have to be labeled
+When there are labels we can use them to evaluate the system's performance
+
+Objects are encoded by defining features
+
+![[Pasted image 20251025131332.png]]
+
+Variations in measurements exist because objects in a class vary and measurements will never be perfect (noise).
+
+We need a tool to describe and reason about variations:
+
+Given a feature and a training set, where is the blue class?
+
+Class posterior probability
+(in this case the chance that something is blue given its feature value)
+-> For each object we want to estimate $p(\text{blue}|\text{feature 1})$
+i.e. estimate $p(y|x)$
+(y given x)
+
+![[Pasted image 20251025161239.png]]
+The probability that something is blue given the feature (estimation)
+
+So in this case, if we wanted to classify new objects, we would set the decision boundary where the chances are equal, and then have one class be on the side where it is more likely, and the opposite for the other.
+
+The decision boundary therefore is at $p(y_1|x) = p(y_2|x)$
+(The label of the class goes with the largest posterior probability)
+
+### Description of a Classifier
+1. If $p(y_1|x) > p(y_2|x)$ then assign to $y_1$, otherwise $y_2$
+2. If $p(y_1|x) - p(y_2|x) > 0$ then assign to $y_1$, otherwise $y_2$
+3. If $\frac{p(y_1|x)}{p(y_2|x)} > 1$ then assign to $y_1$, otherwise $y_2$
+4. If $\log(p(y_1|x)) - \log(p(y_2|x)) > 0$ then assign to $y_1$, otherwise $y_2$ (NOTE: the log is monotonic, so the decision does not change; it turns products into sums, which is convenient for e.g. Gaussian densities)
+
+### Bayes' theorem
+
+How do we find the class posterior?
+Sometimes the functional form of the class distributions can be assumed (i.e. it's a function of some kind)
+$$p(y|x)=\frac{p(x|y)\space p(y)}{p(x)}$$
+class (conditional) distribution $p(x|y)$
+class prior $p(y)$
+(unconditional) data distribution $p(x)$
+posterior probability as the answer
+
+![[Pasted image 20251025162205.png]]
+
+(how these values would relate in an example)
+
+okay that is all good and nice but how do we obtain $p(x|y_c)$?
+- typically you assume a model
+- estimate the parameters such that the model fits the example objects well
+    - max. likelihood estimators
+
+(there are also other approaches but that comes later)
+
+(don't forget that x here is a feature (or a vector of multiple features) and y is the class we want to estimate)
+
+### Error of type I and type II
+
+For a two-class classification problem, the following two errors are defined:
+
+($R_i$ is the region of the feature space that the classifier assigns to class $y_i$)
+- Type I error: $\epsilon_1=\int_{R_2}p(x|y_1)dx$
+- Type II error: $\epsilon_2=\int_{R_1}p(x|y_2)dx$
+
+When we call $y_1$ the negative class and $y_2$ the positive class then $\epsilon_1$ is the false positive fraction and $\epsilon_2$ is the false negative fraction.
+
+![[Pasted image 20251025163915.png]]
+
+### Classification Error
+
+The error: $p(\text{error})=\sum^{C}_{i=1}p(\text{error}|y_i)p(y_i)$
+
+### Bayes' Error
+is the minimum attainable error and typically more than 0.
+
+![[Pasted image 20251025164452.png]]
+
+This error does not depend on the classifier you apply but on the distribution of the data
+
+In general you cannot compute the Bayes' error:
+- You don't know the true class conditional probabilities $p(x|y)$
+- the high-dimensional integrals are very complicated
+
+
+Misclassification Costs: Obviously sometimes making mistakes one way is more dangerous than another way.
+We can introduce a loss that measures the cost of assigning an object that came from one class to another class: $\lambda_{ji}$ (came from j, assigned to i)
+
+Assume a labeled dataset $D = \{(x_i, y_i)\}^N_{i=1}$
+Also assume that these objects are classified by a classifier and the estimated class labels are $\hat{y}_i$
+Then the total empirical risk:
+$$R=\frac{1}{N}\sum^{N}_{i=1}\lambda_{y_i, \hat{y}_i}$$
+This can be understood as the average misclassification cost over the dataset.
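
A minimal sketch of the empirical risk formula above, in plain Python. The function name `empirical_risk`, the integer class indices (0 for $y_1$, 1 for $y_2$) and the cost values are assumptions made for illustration only; they are not from the course material. `cost[j][i]` plays the role of $\lambda_{ji}$ (came from j, assigned to i).

```python
def empirical_risk(y_true, y_pred, cost):
    """Average misclassification cost R = (1/N) * sum_i cost[y_i][y_hat_i]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one labeled object")
    # Look up lambda_{y_i, y_hat_i} for every object and average.
    total = sum(cost[y][y_hat] for y, y_hat in zip(y_true, y_pred))
    return total / len(y_true)


# Hypothetical costs: missing class 1 (index 1) is five times as bad
# as a false alarm; correct decisions cost nothing.
cost = [[0.0, 1.0],
        [5.0, 0.0]]
zero_one = [[0.0, 1.0],
            [1.0, 0.0]]

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 0]

risk = empirical_risk(y_true, y_pred, cost)            # (0 + 1 + 0 + 5) / 4
error_rate = empirical_risk(y_true, y_pred, zero_one)  # plain error rate
```

With a zero-one cost matrix the empirical risk reduces to the ordinary classification error, which is why the error rate can be seen as a special case of the risk.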
+
+### Conditional and Total Risk
+The conditional risk of assigning object x (note: x is the object, i.e. its feature vector, in all of these formulas) to class $y_i$:
+$$l^i(x)=\sum^C_{j=1}\lambda_{ji}\space p(y_j|x)$$
+Sum over all true classes $y_j$ of the misclassification cost times the corresponding posterior probability (chance that the object is of that class given its features)
+
+The average risk over a region:
+$$
+\begin{align}
+r^i &= \int_{R_i}l^i(x)\space p(x)\space dx \\
+&= \int_{R_i}\sum^C_{j=1}\lambda_{ji}\space p(y_j|x)\space p(x)\space dx
+\end{align}$$
+
+Overall risk is therefore:
+$$r=\sum^C_{i=1}r^i$$
+(sum of the average risks over all regions)
+
+### Min Total Risk
+We minimise the risk when the regions ($R_i$) are chosen such that each of the integrals is as small as possible
+
+So make x part of $R_i$ if:
+$$\sum^C_{j=1}\lambda_{ji}\space p(y_j|x) \leq \sum^C_{j=1}\lambda_{jk}\space p(y_j|x)$$
+for all k = 1, ..., C
+
+i.e. x is assigned to the class $y_i$ that has the lowest conditional risk
+
+### Minimum Total risk with two classes
+When you predict class $y_i$ for an object that is in that class, the misclassification cost is 0 ($\lambda_{ii} = 0$)
+
+For two classes, therefore, all you have to do is compare the misclassification cost of classifying it as the other class
+i.e.
+$\lambda_{21}\space p(y_2|x)$ and $\lambda_{12}\space p(y_1|x)$
+(misclassification cost times the posterior prob. of the true class; assign to $y_1$ if the first is smaller, otherwise to $y_2$)
+
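
The two-class rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `posteriors` and `min_risk_class`, the likelihood values, the equal priors and the cost matrix are all made up for the example. `cost[j][i]` again stands for $\lambda_{ji}$, and index 0 / 1 stands for $y_1$ / $y_2$.

```python
def posteriors(likelihoods, priors):
    """Bayes' theorem: p(y_j|x) = p(x|y_j) p(y_j) / p(x)."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x) = sum_j p(x|y_j) p(y_j)
    return [j / evidence for j in joint]


def min_risk_class(post, cost):
    """Pick the class i with the lowest conditional risk
    l^i(x) = sum_j cost[j][i] * p(y_j|x); also return all risks."""
    n = len(post)
    risks = [sum(cost[j][i] * post[j] for j in range(n)) for i in range(n)]
    best = min(range(n), key=risks.__getitem__)
    return best, risks


# Hypothetical numbers: p(x|y_1) = 0.6, p(x|y_2) = 0.2, equal priors.
post = posteriors([0.6, 0.2], [0.5, 0.5])  # posteriors 0.75 and 0.25

# lambda_12 = 1 (y_1 object assigned to y_2), lambda_21 = 5 (the reverse).
cost = [[0.0, 1.0],
        [5.0, 0.0]]
choice, risks = min_risk_class(post, cost)

# With zero-one costs the rule falls back to "largest posterior wins".
zero_one = [[0.0, 1.0],
            [1.0, 0.0]]
choice_01, _ = min_risk_class(post, zero_one)
```

In this example the conditional risk of predicting $y_1$ is $\lambda_{21}\,p(y_2|x) = 5 \cdot 0.25 = 1.25$ and that of predicting $y_2$ is $\lambda_{12}\,p(y_1|x) = 1 \cdot 0.75 = 0.75$, so the minimum-risk decision is $y_2$ even though $y_1$ has the larger posterior. With zero-one costs the same object goes to $y_1$.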