Loss Spikes in Gradient Descent
Loss spikes aren’t noise. They’re gradient descent briefly exceeding the edge of stability and snapping back. Here’s why.
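The instability itself is easy to see in one dimension. On a quadratic loss f(x) = (λ/2)x², the gradient step maps x to (1 − ηλ)x, which contracts only when η < 2/λ; the moment η crosses that threshold, each step overshoots the minimum by more than it corrects. A minimal sketch of that threshold (the curvature λ = 10 and the two step sizes are illustrative choices, not taken from the post):

```python
# Gradient descent on f(x) = (lam / 2) * x^2. The update is
# x <- (1 - eta * lam) * x, which contracts iff |1 - eta * lam| < 1,
# i.e. iff eta < 2 / lam. Crossing that threshold makes the loss grow.
lam = 10.0
for eta in (0.19, 0.21):  # just below / just above 2 / lam = 0.2
    x = 1.0
    for _ in range(50):
        x -= eta * lam * x
    print(f"eta = {eta}: loss after 50 steps = {0.5 * lam * x * x:.3e}")
```

On a fixed quadratic the second run diverges forever. The snap-back in real training comes from the loss being non-quadratic: the overshoot carries the parameters into a lower-curvature region, which restores η < 2/λ and lets the loss fall again.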