Quick Knowledge
Bits & Flashcards
Small, digestible pieces of knowledge - concepts, formulas, code patterns, and tools that I find useful and worth remembering.
Click any card to flip it and reveal the details. Think of it as my digital flashcard collection for ML, math, and engineering.
Knowledge Cards
Filter by category or search to find specific concepts, patterns, or tools.
Attention Mechanism
Attention computes a weighted sum of values based on query-key similarity:

Attention(Q, K, V) = softmax(QKᵀ / √d_k) V

The scaling factor 1/√d_k prevents the dot products from growing too large, which would push softmax into regions with extremely small gradients.
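The formula can be sketched in NumPy (a minimal single-head version, with no masking or batching):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted sum of values

# Toy example: 2 queries, 3 keys/values, d_k = 4 (illustrative sizes)
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(2, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

Each output row is a convex combination of the rows of V, weighted by how similar the query is to each key.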
Softmax Function
Softmax converts a vector of real numbers into a probability distribution:

softmax(xᵢ) = exp(xᵢ) / Σⱼ exp(xⱼ)
Properties:
- Output sums to 1
- All values are positive
- Preserves relative ordering
- Temperature parameter controls sharpness: softmax(xᵢ / T), where lower T sharpens the distribution and higher T flattens it
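A minimal, numerically stable implementation of the properties above (subtracting the max before exponentiating avoids overflow without changing the result):

```python
import numpy as np

def softmax(x, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    z = np.asarray(x, dtype=float) / temperature
    z -= z.max()              # shift for numerical stability; output unchanged
    e = np.exp(z)
    return e / e.sum()

x = [2.0, 1.0, 0.1]
print(softmax(x))                    # sums to 1, relative ordering preserved
print(softmax(x, temperature=0.5))   # sharper: more mass on the largest logit
```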
Python @dataclass
The @dataclass decorator auto-generates __init__, __repr__, and __eq__:
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float
    label: str = "origin"
Useful options:
- frozen=True - immutable instances
- order=True - enables comparison operators
- slots=True - use __slots__ for memory efficiency
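The frozen and order options in action (Version is just an illustrative class):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True, order=True)
class Version:
    major: int
    minor: int

a, b = Version(1, 4), Version(2, 0)
print(a < b)               # True: order=True compares field by field
print(a == Version(1, 4))  # True: __eq__ is auto-generated

try:
    a.major = 3            # frozen=True makes instances immutable
except FrozenInstanceError:
    print("immutable")
```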
Transformer Architecture
The Transformer consists of stacked encoder/decoder blocks:
Encoder block:
- Multi-head self-attention
- Add & Norm (residual connection)
- Feed-forward network
- Add & Norm
Key innovations:
- Positional encoding (no recurrence)
- Multi-head attention (parallel attention)
- Layer normalization
- Residual connections throughout
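One of the innovations above, sinusoidal positional encoding, is easy to sketch in NumPy (following the sin/cos formulation of the original Transformer paper; seq_len and d_model are illustrative choices):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]         # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]      # (1, d_model // 2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)              # even dimensions
    pe[:, 1::2] = np.cos(angles)              # odd dimensions
    return pe

pe = positional_encoding(seq_len=50, d_model=64)
print(pe.shape)  # (50, 64)
```

The encoding is added to the token embeddings so the model can use position information despite having no recurrence.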
Gradient Descent
Gradient descent updates parameters to minimize a loss function:

θ ← θ − η ∇L(θ)

where η is the learning rate.
Variants:
- Batch GD: Uses all data (stable but slow)
- SGD: Uses one sample (noisy but fast)
- Mini-batch: Uses subset (balanced)
Learning rate controls step size - too high causes divergence, too low causes slow convergence.
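The mini-batch variant can be sketched on a toy linear-regression problem (data generated from y = 3x + 2 plus noise; the true parameters should be recovered):

```python
import numpy as np

# Toy data: y = 3x + 2 with Gaussian noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(0, 0.1, size=200)

w, b = 0.0, 0.0
lr, batch_size = 0.1, 32                 # learning rate controls step size
for epoch in range(200):
    idx = rng.permutation(len(X))        # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        xb, yb = X[batch, 0], y[batch]
        err = (w * xb + b) - yb          # residuals on this mini-batch
        w -= lr * 2 * np.mean(err * xb)  # gradient of MSE w.r.t. w
        b -= lr * 2 * np.mean(err)       # gradient of MSE w.r.t. b

print(round(w, 2), round(b, 2))  # close to 3.0 and 2.0
```

Setting batch_size = len(X) recovers batch GD; batch_size = 1 recovers SGD.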
Docker Basics
Essential Docker commands:
# Build image
docker build -t myapp .
# Run container
docker run -d -p 8080:80 myapp
# List running containers
docker ps
# Stop container
docker stop <container_id>
Dockerfile basics:
- FROM - base image
- COPY - add files
- RUN - execute commands
- CMD - default command
Binary Search
Binary search finds an element in a sorted array in O(log n):
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
Tip: In languages with fixed-width integers, use left + (right - left) // 2 instead of (left + right) // 2 to avoid overflow. Python's integers are arbitrary-precision, so this isn't an issue there.
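In practice, Python's standard library already provides binary search via the bisect module:

```python
import bisect

arr = [1, 3, 5, 7, 9, 11]

i = bisect.bisect_left(arr, 7)       # leftmost position where 7 belongs
found = i < len(arr) and arr[i] == 7
print(i, found)                      # 3 True

# For a missing value, bisect_left gives its insertion point
print(bisect.bisect_left(arr, 6))    # 3 (6 would slot in before 7)
```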
Data Normalization
Common normalization techniques:
Min-Max Scaling (range [0,1]):

x' = (x − min(x)) / (max(x) − min(x))

Z-Score Standardization (mean=0, std=1):

z = (x − μ) / σ
When to use:
- Min-Max: bounded features, neural networks
- Z-Score: Gaussian-like data, SVMs, linear regression
- Robust scaling: data with outliers
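Both techniques in a few lines of NumPy, using a small illustrative array:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Min-max scaling to [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())
print(min_max)  # [0.   0.25 0.5  0.75 1.  ]

# Z-score standardization to mean 0, std 1
z = (x - x.mean()) / x.std()
print(round(z.mean(), 10), round(z.std(), 10))  # 0.0 1.0
```

In real pipelines, fit the statistics (min/max or mean/std) on the training set only and reuse them for validation and test data to avoid leakage.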