IntML
Info

  • exam
    • multiple choice, fully on paper
  • projects
    • 3 of the 4 projects have to "pass" to be admitted to the exam
    • for each project, a 1-minute video explaining the solution has to be uploaded to polybox
  • practice classes

Allowed aids (IntroductionToMachineLearning)

  • 2 A4 pages (1 sheet), handwritten or typed in a font size > 11 pt
  • calculator

Exercises

-> FS2026_tasks

Projects

Lecture

#timestamp 2026-02-27


We have:

$$\hat{w} = w_s + w_p, \qquad w_s \in \operatorname{span}(X^\top), \qquad \operatorname{span}(X^\top) = \operatorname{span}(X^\top X)$$

normal equation:

$$X^\top X w = X^\top y \quad\Longrightarrow\quad \hat{w} = (X^\top X)^{-1} X^\top y = X^\dagger y$$
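A minimal NumPy sanity check (variable names are mine), comparing the normal-equation solution with `np.linalg.lstsq`:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                      # design matrix, full column rank
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# normal equation: solve X^T X w = X^T y (solving is more stable than inverting)
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# least-squares solver (pseudoinverse route) gives the same result
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_normal, w_lstsq))             # True
```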

#timestamp 2026-03-03

Variance might arise not only from noise, but also from there being too few samples.

overfitting ≙ too sensitive to noise

#timestamp 2026-03-04

-> increasing model complexity beyond the point where the training error reaches 0 can lead to a second decrease in generalization error ("double descent", more than just a U-curve)

splitting data (train / validation / test)

control model complexity
A complex model (e.g. a high-degree polynomial) can produce large or oscillating weights. To counteract this:

  1. Use a smaller degree m
  2. Keep a smaller number of monomials "active" by limiting the ℓ1-norm
  3. Limit the ℓ2-norm

Lasso Regression (ℓ1 Regularization)
Adds a penalty proportional to the absolute value of the coefficients.

$$\hat{w}^{\text{lasso}} = \arg\min_w \|y - \Phi w\|_2^2 + \lambda \|w\|_1$$

Ridge Regression (ℓ2 Regularization)
Adds a penalty proportional to the square of the magnitude of the coefficients.

$$\hat{w}^{\text{ridge}} = \arg\min_w \|y - \Phi w\|_2^2 + \lambda \|w\|_2^2$$

#todo why does lasso induce sparsity? (Intuition: the ℓ1 ball has corners on the coordinate axes, so the constrained optimum often sits where some coefficients are exactly 0; the round ℓ2 ball only shrinks them.)

| Feature  | Ridge (ℓ2)                          | Lasso (ℓ1)                          |
| -------- | ----------------------------------- | ----------------------------------- |
| Penalty  | $\lambda \sum_j w_j^2$              | $\lambda \sum_j \lvert w_j \rvert$  |
| Solution | Closed-form                         | Numerical optimization              |
| Sparsity | No (coefficients shrink toward 0 but stay ≠ 0) | Yes (coefficients can become exactly 0) |
| Use case | Preventing overfitting              | Feature selection                   |
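A minimal sketch contrasting the two (assumes scikit-learn; note that its `Lasso` scales the data-fit term by 1/(2n), so `alpha` is not exactly the λ above):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.0, 0.5]                      # only 3 features are relevant
y = Phi @ w_true + 0.1 * rng.normal(size=100)

# ridge: closed form  w = (Phi^T Phi + lam I)^{-1} Phi^T y
lam = 1.0
w_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(10), Phi.T @ y)

# lasso: no closed form, solved iteratively (coordinate descent)
w_lasso = Lasso(alpha=0.1).fit(Phi, y).coef_

print("ridge exact zeros:", np.sum(w_ridge == 0))  # typically 0: shrunk, not sparse
print("lasso exact zeros:", np.sum(w_lasso == 0))  # several: sparse solution
```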

#timestamp 2026-03-20


check whether a kernel is valid (it must yield a symmetric, positive semi-definite Gram matrix):

  1. check symmetry
  2. decompose it into known valid kernels (using linearity)
  3. use the closure rules (sum / product / scaling / composition)
  4. if unsure, build a small kernel-matrix counterexample (see the sketch below)
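A small sketch of step 4 (helper names are mine): build the Gram matrix for a handful of points and check that it is symmetric with no negative eigenvalues. Passing is only a necessary condition on that sample, but a single failure already disproves validity.

```python
import numpy as np

def gram_matrix(kernel, xs):
    """K[i, j] = kernel(x_i, x_j)"""
    return np.array([[kernel(a, b) for b in xs] for a in xs])

def looks_valid(kernel, xs, tol=1e-9):
    """Necessary check on this sample: symmetry + positive semi-definiteness."""
    K = gram_matrix(kernel, xs)
    return np.allclose(K, K.T) and np.all(np.linalg.eigvalsh(K) >= -tol)

xs = [-1.0, 0.5, 2.0]
print(looks_valid(lambda a, b: a * b, xs))   # True: the linear kernel is valid
print(looks_valid(lambda a, b: a - b, xs))   # False: not even symmetric
```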

10. Neural networks

#timestamp 2026-03-24

computation at each layer consists of three parts: multiply by the weight matrix, add the bias, apply the activation function:

forward propagation

$$h^{(0)} = x$$
$$z^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)}, \quad l \in \{1, \dots, L-1\}$$
$$h^{(l)} = \varphi\!\left(z^{(l)}\right)$$
$$f = W^{(L)} h^{(L-1)} + b^{(L)}$$

=> for a multilayer perceptron with a single hidden layer:

$$y = f(x, w, \theta) = \sum_{j=1}^{p} w_j^{(2)} \, \varphi\!\left(w_j^{(1)\top} x + w_{j,0}^{(1)}\right)$$
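A minimal NumPy forward pass implementing the general equations above (layer sizes are arbitrary placeholders):

```python
import numpy as np

def forward(x, Ws, bs, phi=np.tanh):
    """h(0) = x; z(l) = W(l) h(l-1) + b(l); h(l) = phi(z(l)); last layer linear."""
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        h = phi(W @ h + b)                       # hidden layers with activation
    return Ws[-1] @ h + bs[-1]                   # f = W(L) h(L-1) + b(L)

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 1]                             # input, two hidden layers, output
Ws = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
bs = [rng.normal(size=m) for m in sizes[1:]]
print(forward(rng.normal(size=4), Ws, bs))
```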

backward propagation:
https://xnought.github.io/backprop-explainer/
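A minimal backprop sketch for the single-hidden-layer MLP above, under a squared loss (all names are mine), verified against a numerical gradient:

```python
import numpy as np

def mlp(x, W1, b1, w2):
    """y = sum_j w2_j * tanh(w1_j . x + b1_j)"""
    return w2 @ np.tanh(W1 @ x + b1)

def backprop(x, t, W1, b1, w2):
    """Gradients of L = 0.5 * (y - t)^2 w.r.t. w2, W1, b1."""
    z = W1 @ x + b1                      # pre-activations
    h = np.tanh(z)                       # hidden activations
    y = w2 @ h                           # scalar output
    dy = y - t                           # dL/dy
    dz = dy * w2 * (1 - h**2)            # chain rule through tanh
    return dy * h, np.outer(dz, x), dz   # dL/dw2, dL/dW1, dL/db1

rng = np.random.default_rng(0)
x, t = rng.normal(size=3), 0.7
W1, b1, w2 = rng.normal(size=(5, 3)), rng.normal(size=5), rng.normal(size=5)
dw2, dW1, db1 = backprop(x, t, W1, b1, w2)

# check one entry of dW1 with a finite difference
eps = 1e-6
Wp = W1.copy(); Wp[2, 1] += eps
loss = lambda W: 0.5 * (mlp(x, W, b1, w2) - t) ** 2
print(np.isclose((loss(Wp) - loss(W1)) / eps, dW1[2, 1], atol=1e-4))  # True
```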