Technical Digital Garden
Bits
A collection of atomic notes, code snippets, and technical 'cheats' I've gathered over the years. These are unpolished references intended for quick utility rather than narrative reading.
Quick-fire references
Scan the grid, filter by utility tags, and grab the snippet you need without diving into long-form posts.
Calculating Average Precision (AP) without Sklearn
import numpy as np

def calculate_ap(recalls, precisions):
    # Pad the curve, then enforce a monotonically decreasing precision
    # envelope (all-point interpolation)
    m_rec = np.concatenate(([0.0], recalls, [1.0]))
    m_pre = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(m_pre) - 1, 0, -1):
        m_pre[i - 1] = np.maximum(m_pre[i - 1], m_pre[i])
    # Area under the curve as a sum of rectangles at each recall step
    indices = np.where(m_rec[1:] != m_rec[:-1])[0]
    ap = np.sum((m_rec[indices + 1] - m_rec[indices]) * m_pre[indices + 1])
    return ap
Why it matters: Object detection and retrieval metrics break when you only eyeball curves. Manual AP keeps leaderboard numbers reproducible.
Bessel's Correction in Variance Calculation
import numpy as np
data = [10, 12, 23, 23, 16, 23, 21, 16]
# Population Variance (N)
pop_var = np.var(data)
# Sample Variance (N-1) - The "Unbiased" Estimator
sample_var = np.var(data, ddof=1)
Why it matters: For small samples, dividing by N underestimates the population variance. ddof=1 keeps statistical reporting honest.
Representative Centroid Selection for Long-Context RAG
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min
import numpy as np

def get_representative_embeddings(embeddings, k=5):
    embeddings = np.asarray(embeddings)
    # Instead of taking the top-K most similar, take K diverse cluster representatives
    kmeans = KMeans(n_clusters=k, init='k-means++', n_init=10)
    kmeans.fit(embeddings)
    # Return the actual vectors closest to the centroids, not the centroids themselves
    closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, embeddings)
    return embeddings[closest]
Why it matters: Mitigates "lost in the middle" issues in RAG by feeding the model diverse context instead of redundant snippets.
The Log-Sum-Exp Trick for Softmax
import numpy as np

def log_sum_exp(x):
    # Subtracting the max prevents overflow when exponentiating large numbers
    c = np.max(x)
    return c + np.log(np.sum(np.exp(x - c)))

def stable_softmax(x):
    return np.exp(x - log_sum_exp(x))
Why it matters: The log-sum-exp pattern prevents NaN or Inf when logits are large, keeping gradients finite during backprop.
Vectorized Covariance Matrix Calculation
import numpy as np

def fast_covariance(X):
    # X is an (n_samples, n_features) matrix
    n = X.shape[0]
    X_centered = X - X.mean(axis=0)
    # A single matrix product on pre-centered data skips np.cov's extra
    # bookkeeping and can be faster for large matrices
    return (X_centered.T @ X_centered) / (n - 1)
Why it matters: Center once, multiply once. Large feature banks compute faster when you skip Python loops and lean on vectorized math.
Attention Mechanism
Attention computes a weighted sum of values based on query-key similarity:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

The scaling factor 1/√d_k prevents the dot products from growing too large, which would push softmax into regions with extremely small gradients.
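A minimal NumPy sketch of scaled dot-product attention (single head, no masking); the function name and toy shapes are illustrative, not from any particular library:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (seq_len, d_k), V: (seq_len, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # scaled query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted sum of values

# Toy usage: 4 tokens, 8-dim keys/values
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)           # shape (4, 8)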
Softmax Function
Softmax converts a vector of real numbers into a probability distribution:

softmax(x_i) = exp(x_i) / Σ_j exp(x_j)

Properties:
- Output sums to 1
- All values are positive
- Preserves relative ordering
- Temperature parameter controls sharpness: dividing the logits by T (softmax(x / T)) sharpens the distribution for T < 1 and flattens it for T > 1, as shown below
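A quick NumPy illustration of the temperature effect, reusing the max-subtraction trick from the log-sum-exp note above; the logits are arbitrary toy values:

import numpy as np

def softmax(x, temperature=1.0):
    # Divide logits by T before the (numerically stable) softmax
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
print(softmax(logits, temperature=1.0))   # ~[0.659, 0.242, 0.099]
print(softmax(logits, temperature=0.5))   # sharper: mass concentrates on the max
print(softmax(logits, temperature=5.0))   # flatter: closer to uniform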
Python @dataclass
The @dataclass decorator auto-generates __init__, __repr__, and __eq__:
from dataclasses import dataclass
@dataclass
class Point:
    x: float
    y: float
    label: str = "origin"
Useful options:
- frozen=True - immutable instances
- order=True - enables comparison operators
- slots=True - use __slots__ for memory efficiency
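A short usage sketch combining those options (the Version class and its fields are just illustrative; note that slots=True requires Python 3.10+):

from dataclasses import dataclass

@dataclass(frozen=True, order=True, slots=True)
class Version:
    major: int
    minor: int

v1, v2 = Version(1, 2), Version(1, 10)
print(v1 < v2)               # True - order=True generates comparison operators
print(v1 == Version(1, 2))   # True - __eq__ is auto-generated
# v1.major = 2 would raise FrozenInstanceError because frozen=True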
Transformer Architecture
The Transformer consists of stacked encoder/decoder blocks:
Encoder block (sketched in code at the end of this section):
- Multi-head self-attention
- Add & Norm (residual connection)
- Feed-forward network
- Add & Norm
Key innovations:
- Positional encoding (no recurrence)
- Multi-head attention (parallel attention)
- Layer normalization
- Residual connections throughout
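A minimal sketch of one encoder block in PyTorch; the layer names and hyperparameters (d_model=512, 8 heads, d_ff=2048) follow the original paper's defaults but are assumptions here, not a definitive implementation:

import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        # Multi-head self-attention, then Add & Norm (residual connection)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + self.drop(attn_out))
        # Feed-forward network, then Add & Norm
        x = self.norm2(x + self.drop(self.ff(x)))
        return x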
Gradient Descent
Gradient descent updates parameters to minimize a loss function:

θ ← θ - η ∇_θ L(θ)

where θ are the parameters, η is the learning rate, and ∇_θ L(θ) is the gradient of the loss.
Variants:
- Batch GD: Uses all data (stable but slow)
- SGD: Uses one sample (noisy but fast)
- Mini-batch: Uses subset (balanced)
Learning rate controls step size - too high causes divergence, too low causes slow convergence.
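A minimal NumPy sketch of mini-batch gradient descent on least-squares linear regression; the synthetic data, batch size, and learning rate are arbitrary choices for illustration:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(3)
lr, batch_size = 0.1, 32
for epoch in range(20):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient of MSE on the mini-batch
        w -= lr * grad                                # parameter update step
print(w)  # should land close to [2.0, -1.0, 0.5]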
Docker Basics
Essential Docker commands:
# Build image
docker build -t myapp .
# Run container
docker run -d -p 8080:80 myapp
# List running containers
docker ps
# Stop container
docker stop <container_id>
Dockerfile basics:
- FROM - base image
- COPY - add files
- RUN - execute commands
- CMD - default command
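A minimal example Dockerfile tying those instructions together; the Python base image, app.py, and requirements.txt are hypothetical placeholders, and WORKDIR (not listed above) just sets the working directory:

FROM python:3.11-slim
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
CMD ["python", "app.py"]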
Binary Search
Binary search finds an element in a sorted array in O(log n):
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
Tip: In languages with fixed-width integers, compute the midpoint as left + (right - left) // 2 to avoid overflow; Python's arbitrary-precision integers make this unnecessary, but it is a good habit for C, C++, or Java.
Data Normalization
Common normalization techniques:
Min-Max Scaling (range [0,1]): x' = (x - min(x)) / (max(x) - min(x))
Z-Score Standardization (mean=0, std=1): z = (x - μ) / σ
When to use:
- Min-Max: bounded features, neural networks
- Z-Score: Gaussian-like data, SVMs, linear regression
- Robust scaling: data with outliers
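A quick NumPy sketch of the first two techniques on a toy array (robust scaling would swap in the median and IQR); the values are arbitrary and include an outlier to show the difference:

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])       # note the outlier

min_max = (x - x.min()) / (x.max() - x.min())   # squashes everything into [0, 1]
z_score = (x - x.mean()) / x.std(ddof=1)        # mean 0, unit (sample) std

print(min_max)  # the outlier dominates the range
print(z_score)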