Technical Digital Garden

Bits

A collection of atomic notes, code snippets, and technical 'cheats' I’ve gathered over the years. These are unpolished references intended for quick utility rather than narrative reading.

Quick-fire references

Scan the grid, filter by utility tags, and grab the snippet you need without diving into long-form posts.

Algorithms

Calculating Average Precision (AP) without Sklearn

import numpy as np

def calculate_ap(recalls, precisions):
    # Pad the PR curve at recall 0 and 1
    m_rec = np.concatenate(([0.0], recalls, [1.0]))
    m_pre = np.concatenate(([0.0], precisions, [0.0]))

    # Make the precision envelope monotonically non-increasing (all-point interpolation)
    for i in range(len(m_pre) - 1, 0, -1):
        m_pre[i - 1] = np.maximum(m_pre[i - 1], m_pre[i])

    # Area under the stepwise PR curve: precision weighted by each recall increment
    indices = np.where(m_rec[1:] != m_rec[:-1])[0]
    ap = np.sum((m_rec[indices + 1] - m_rec[indices]) * m_pre[indices + 1])
    return ap

Why it matters: Object detection and retrieval metrics break when you only eyeball curves. Manual AP keeps leaderboard numbers reproducible.

Data

Bessel's Correction in Variance Calculation

import numpy as np

data = [10, 12, 23, 23, 16, 23, 21, 16]

# Population Variance (N)
pop_var = np.var(data)

# Sample Variance (N-1) - The "Unbiased" Estimator
sample_var = np.var(data, ddof=1)

Why it matters: For small samples, dividing by N underestimates the population variance. ddof=1 keeps statistical reporting honest.

Algorithms

Representative Centroid Selection for Long-Context RAG

from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin
import numpy as np

def get_representative_embeddings(embeddings, k=5):
    # Instead of taking the top-K most similar chunks, pick K diverse cluster representatives
    embeddings = np.asarray(embeddings)
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10)
    kmeans.fit(embeddings)
    # Return the actual embeddings closest to each centroid (the centroids themselves are synthetic)
    closest_idx = pairwise_distances_argmin(kmeans.cluster_centers_, embeddings)
    return embeddings[closest_idx]

Why it matters: Mitigates “lost in the middle” issues in RAG by feeding the model diverse context instead of redundant snippets.

Deep Learning

The Log-Sum-Exp Trick for Softmax

import numpy as np

def log_sum_exp(x):
    # Subtracting the max prevents overflow when exponentiating large numbers
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    return np.exp(x - log_sum_exp(x))

Why it matters: The log-sum-exp pattern prevents NaN or Inf when logits are large, keeping gradients finite during backprop.

Code

Vectorized Covariance Matrix Calculation

import numpy as np

def fast_covariance(X):
    # X is an (n_samples, n_features) matrix
    n = X.shape[0]
    X_centered = X - X.mean(axis=0)
    # Using the dot product is significantly faster than np.cov for large matrices
    return (X_centered.T @ X_centered) / (n - 1)

Why it matters: Center once, multiply once. Large feature banks compute faster when you skip Python loops and lean on vectorized math.

Deep Learning

Attention Mechanism

Attention computes a weighted sum of values based on query-key similarity:

\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V

The scaling factor \sqrt{d_k} prevents the dot products from growing too large, which would push softmax into regions with extremely small gradients.
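
A minimal NumPy sketch of single-head scaled dot-product attention; the shapes are illustrative assumptions, and masking and multi-head projections are omitted:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k), V: (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarity, scaled by sqrt(d_k)
    # Row-wise softmax with max subtraction for numerical stability
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of values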

Math

Softmax Function

Softmax converts a vector of real numbers into a probability distribution:

\text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}

Properties:

  • Output sums to 1
  • All values are positive
  • Preserves relative ordering
  • Temperature parameter \tau controls sharpness: \frac{e^{x_i/\tau}}{\sum_j e^{x_j/\tau}}
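
A small sketch of temperature scaling on hypothetical logits, reusing the max-subtraction trick from the log-sum-exp note:

import numpy as np

def softmax(x, tau=1.0):
    # Divide the logits by the temperature before the (stable) softmax
    z = np.asarray(x, dtype=float) / tau
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
print(softmax(logits, tau=1.0))   # moderately peaked
print(softmax(logits, tau=0.1))   # nearly one-hot (sharp)
print(softmax(logits, tau=10.0))  # nearly uniform (flat)
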
Code

Python @dataclass

The @dataclass decorator auto-generates __init__, __repr__, and __eq__:

from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    label: str = "origin"

Useful options:

  • frozen=True - immutable instances
  • order=True - enables comparison operators
  • slots=True - use __slots__ for memory efficiency
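
A quick illustration of those options on a hypothetical FrozenPoint class (note that slots=True requires Python 3.10+):

from dataclasses import dataclass

@dataclass(frozen=True, order=True, slots=True)
class FrozenPoint:
    x: float
    y: float

a = FrozenPoint(1.0, 2.0)
b = FrozenPoint(1.0, 3.0)
print(a < b)   # True: order=True compares fields in declaration order
# a.x = 5.0    # raises dataclasses.FrozenInstanceError because frozen=True
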
Deep Learning

Transformer Architecture

The Transformer consists of stacked encoder/decoder blocks:

Encoder block:

  1. Multi-head self-attention
  2. Add & Norm (residual connection)
  3. Feed-forward network
  4. Add & Norm

Key innovations:

  • Positional encoding (no recurrence)
  • Multi-head attention (parallel attention)
  • Layer normalization
  • Residual connections throughout
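
A minimal sketch of the encoder block ordering above, with self_attention and feed_forward passed in as callables; their internals and LayerNorm's learnable scale/shift are omitted, so this only shows the residual + norm wiring:

import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token's features to zero mean, unit variance
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)

def encoder_block(x, self_attention, feed_forward):
    # 1-2. Multi-head self-attention, then Add & Norm (residual connection)
    x = layer_norm(x + self_attention(x))
    # 3-4. Feed-forward network, then Add & Norm
    x = layer_norm(x + feed_forward(x))
    return x
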
ML Theory

Gradient Descent

Gradient descent updates parameters to minimize a loss function:

\theta_{t+1} = \theta_t - \eta \nabla_\theta L(\theta_t)

Variants:

  • Batch GD: Uses all data (stable but slow)
  • SGD: Uses one sample (noisy but fast)
  • Mini-batch: Uses subset (balanced)

The learning rate \eta controls the step size: too high causes divergence, too low causes slow convergence.
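
A minimal sketch of the update rule on a toy quadratic loss (the loss, starting point, and learning rate are illustrative choices):

import numpy as np

def gradient_descent(grad_fn, theta0, eta=0.1, steps=100):
    # theta_{t+1} = theta_t - eta * grad L(theta_t)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        theta -= eta * grad_fn(theta)
    return theta

# Toy loss L(theta) = ||theta||^2 has gradient 2 * theta and its minimum at the origin
print(gradient_descent(lambda th: 2 * th, theta0=[3.0, -2.0], eta=0.1))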

Tools

Docker Basics

Essential Docker commands:

# Build image
docker build -t myapp .

# Run container
docker run -d -p 8080:80 myapp

# List running containers
docker ps

# Stop container
docker stop <container_id>

Dockerfile basics:

  • FROM - base image
  • COPY - add files
  • RUN - execute commands
  • CMD - default command
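
Putting those directives together, a minimal sketch of a Dockerfile for the hypothetical myapp image above (base image, paths, and entrypoint are illustrative assumptions):

# Base image
FROM python:3.12-slim
WORKDIR /app
# Add application files
COPY . .
# Install dependencies (assumes a requirements.txt exists)
RUN pip install -r requirements.txt
# Default command
CMD ["python", "app.py"]
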
Algorithms

Binary Search

Binary search finds an element in a sorted array in O(log n):

def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

Tip: In languages with fixed-width integers (e.g. C or Java), use left + (right - left) // 2 to avoid overflow; Python's integers are arbitrary-precision, so (left + right) // 2 is safe.

Data

Data Normalization

Common normalization techniques:

Min-Max Scaling (range [0,1]): x' = \frac{x - x_{min}}{x_{max} - x_{min}}

Z-Score Standardization (mean=0, std=1): x' = \frac{x - \mu}{\sigma}

When to use:

  • Min-Max: bounded features, neural networks
  • Z-Score: Gaussian-like data, SVMs, linear regression
  • Robust scaling: data with outliers
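
A quick NumPy sketch of both rescalings, reusing the sample data from the Bessel's correction note above:

import numpy as np

x = np.array([10.0, 12.0, 23.0, 23.0, 16.0, 23.0, 21.0, 16.0])

# Min-Max scaling to [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Z-score standardization (mean 0, std 1); population std here, see the Bessel's correction note
x_zscore = (x - x.mean()) / x.std()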