Building an automated missile interception algorithm (ongoing)

This post is intentionally dense: lists, blockquotes, math, tables, images with captions, and footnotes that link out (and even sneak in a line of code). If the layout survives this, it survives anything.[1]

Overview

The project is a 3D-ish intercept simulation: incoming threats move along ballistic-ish arcs; a defender learns when to launch and how to steer mid-course corrections so the interceptor meets the threat. Think less “production missile defense” and more “gym environment where I can iterate on reward shaping without filing export paperwork.”[2]

Unordered laundry list of what actually exists in the repo today:

  • Simulation core: discrete time steps, configurable gravity, drag hacks[3], noisy sensors.
  • Policy: started with a hand-tuned PD-ish baseline, moving toward PPO/SAC-style updates (see log).
  • Visualization: WebGL preview + offline GIF exports for debugging angles and bone rig stupidity.
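Footnote 3 describes drag as a scalar fudge, drag_coeff * velocity ** 2 with clamps. Below is a minimal sketch of that idea, plus seeded sensor noise. Several pieces are assumptions for illustration and not code from the repo: clamping the drag magnitude to max_accel, the Gaussian noise model, and every function name.

```python
import math
import random

Vec = tuple[float, float, float]


def drag_accel(v: Vec, drag_coeff: float, max_accel: float) -> Vec:
    """Quadratic drag opposing velocity: |a| = drag_coeff * |v|^2, clamped to max_accel.

    The clamp is an assumed guard that keeps a badly tuned drag_coeff from
    producing huge decelerations.
    """
    speed = math.sqrt(sum(c * c for c in v))
    if speed == 0.0:
        return (0.0, 0.0, 0.0)
    mag = min(drag_coeff * speed ** 2, max_accel)
    # Unit vector opposite to v, scaled by the (clamped) drag magnitude.
    return tuple(-mag * c / speed for c in v)


def noisy_reading(p: Vec, sigma: float, rng: random.Random) -> Vec:
    """Position sensor with additive Gaussian noise (an assumed noise model)."""
    return tuple(c + rng.gauss(0.0, sigma) for c in p)
```

Passing an explicit random.Random(seed) into the sensor keeps noisy rollouts reproducible, which ties in with the fixed-seed goal below.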

What “done” means (for now)

  1. Reproducible training runs with a fixed seed.
  2. Intercept rate above random on a held-out threat distribution.
  3. A single command to record a rollout GIF for the README.
  4. Stop lying to myself in the README about how finished any of this is.

Note: The hardest part hasn’t been neural networks. It’s time. When your simulation step isn’t aligned with how you log events, you get “ghost” intercepts that never happened — only your instrumentation thinks they did.
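Here is one way to keep the simulator and the logger on the same clock, sketched under assumptions. The integer tick is the only timestamp, and seconds are derived from it on demand. This avoids accumulating t += dt in two places that can drift apart. EventLog and first_intercept are hypothetical names, positions are 1-D for brevity, and DT reuses the dt value from the JSON config fragment.

```python
from dataclasses import dataclass, field

DT = 0.02  # seconds per tick (the "dt" value from the config fragment)


@dataclass
class EventLog:
    """Hypothetical event log keyed by integer tick, never by accumulated float time."""
    events: list[tuple[int, str]] = field(default_factory=list)

    def record(self, tick: int, kind: str) -> None:
        self.events.append((tick, kind))

    def seconds(self) -> list[float]:
        # Float time is derived from the tick for display only.
        return [tick * DT for tick, _ in self.events]


def first_intercept(threat_x: list[float], interceptor_x: list[float],
                    radius: float) -> EventLog:
    """Test for a hit on exactly the per-tick states the simulator produced."""
    log = EventLog()
    for tick, (xt, xi) in enumerate(zip(threat_x, interceptor_x)):
        if abs(xt - xi) <= radius:
            log.record(tick, "intercept")
            break  # first hit ends the episode
    return log
```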

Nested blockquote, because why not:

Early prototype mantra:

Make it run. Make it right. Make it fast.
(I’m still somewhere between one and two.)

Log and Notes

A longer, messier stream of thoughts + experiments lives in the dedicated notes page: missile interception — log + notes. Below is the “executive summary” version with figures.

[Image: Still from the intercept visualization, showing threats, defender, and debug vectors]
Prototype view: intercept geometry + debug vectors (colors are meaningless; joy is real).

[Image: RL training GIF of the policy fiddling with bone / angle targets]
Episode clip: angle / “bone” targets moving as the policy figures out it should not yeet the interceptor into the floor.

[Image: Earlier episode capture]
Older rollout export, kept around as a reminder that “bad but moving” beats “perfect but never run.”

Vocabulary (definition list)

Kramdown-style definitions — handy for glossaries:

RL loop
Agent observes state \(s_t\), emits action \(a_t\), environment returns \(s_{t+1}\) and reward \(r_t\). Rinse[4].
Interceptor
The controllable object trying to meet the threat; not assumed to have unlimited lateral acceleration.
Threat
Anything you want to keep out of the protected volume, modeled as a point mass first, fancier later.
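The RL loop entry can be made concrete with a toy rollout. Everything below is an illustrative assumption, not the repo's environment. It is a 1-D chase where the state is (gap, closing_speed), actions in {-1, 0, +1} nudge the closing speed, and the reward follows the distance-plus-hit-bonus form from footnote 5 with made-up constants. This state is Markov because the velocity is part of it.

```python
from typing import Callable

State = tuple[float, float]  # (gap to threat, closing speed), hypothetical units

DT = 0.02          # matches "dt" in the config fragment
ACCEL = 1.0        # assumed speed change per second for a unit action
HIT_RADIUS = 0.05  # assumed success radius
HIT_BONUS = 100.0  # same value as "intercept_bonus" in the config fragment


def env_step(s: State, a: int) -> tuple[State, float, bool]:
    """Environment half of the loop: (s_t, a_t) -> (s_{t+1}, r_t, done)."""
    gap, speed = s
    speed += a * ACCEL * DT   # semi-implicit Euler: update speed first...
    gap -= speed * DT         # ...then move using the new speed
    hit = abs(gap) <= HIT_RADIUS
    reward = -abs(gap) + (HIT_BONUS if hit else 0.0)
    return (gap, speed), reward, hit


Transition = tuple[State, int, float, State]


def rollout(policy: Callable[[State], int], s0: State, max_steps: int) -> list[Transition]:
    """Agent half: observe s_t, emit a_t, store (s_t, a_t, r_t, s_{t+1})."""
    transitions: list[Transition] = []
    s = s0
    for _ in range(max_steps):
        a = policy(s)
        s_next, r, done = env_step(s, a)
        transitions.append((s, a, r, s_next))
        s = s_next
        if done:
            break
    return transitions
```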

Simulation sketch (code)

Python — environment step (toy)

from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

    def __add__(self, o: "Vec3") -> "Vec3":
        return Vec3(self.x + o.x, self.y + o.y, self.z + o.z)

    def scale(self, s: float) -> "Vec3":
        return Vec3(self.x * s, self.y * s, self.z * s)

def step(pos: Vec3, vel: Vec3, acc: Vec3, dt: float) -> tuple[Vec3, Vec3]:
    """Semi-implicit Euler — good enough until energy blows up."""
    vel_next = vel + acc.scale(dt)
    pos_next = pos + vel_next.scale(dt)
    return pos_next, vel_next
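The docstring's energy caveat is easy to check on a unit harmonic oscillator (a = -x), which stands in here as a test problem rather than the intercept dynamics. Explicit Euler updates position with the old velocity. Semi-implicit Euler, as in step above, updates velocity first. The quantity x² + v² is proportional to the oscillator's energy.

```python
def explicit_euler(x: float, v: float, dt: float) -> tuple[float, float]:
    # Both updates read the OLD state: x uses old v, v uses old x (a = -x).
    return x + v * dt, v - x * dt


def semi_implicit_euler(x: float, v: float, dt: float) -> tuple[float, float]:
    # Velocity first, then position with the NEW velocity (same order as step()).
    v_next = v - x * dt
    return x + v_next * dt, v_next


def energy_ratio(stepper, dt: float = 0.02, steps: int = 10_000) -> float:
    """Final (x^2 + v^2) divided by initial, starting from x = 1, v = 0."""
    x, v = 1.0, 0.0
    e0 = x * x + v * v
    for _ in range(steps):
        x, v = stepper(x, v, dt)
    return (x * x + v * v) / e0
```

For this linear oscillator, explicit Euler multiplies x² + v² by exactly (1 + dt²) every step. Over 10,000 steps at dt = 0.02 that grows it by roughly e⁴, about 55×. The semi-implicit variant keeps it within about one percent of the start.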

Bash — one-liner to grep my own chaos

rg -n "TODO|FIXME|WTF" src/ notes/ --glob '!_site/**'

JavaScript — time slider mental model (what broke my brain)

// Discrete ticks; "continuous" feel is just small dt + good interpolation.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

export function advanceTime(t, dt, tMax) {
  return clamp(t + dt, 0, tMax);
}
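Here is the “small dt + good interpolation” idea, sketched in Python for consistency with the other examples. It stores one sample per tick, clamps the slider time exactly like advanceTime, then interpolates linearly between the two neighboring ticks. sample_at is a hypothetical helper; the repo's WebGL preview may interpolate differently.

```python
def sample_at(positions: list[float], dt: float, t: float) -> float:
    """Linearly interpolate a per-tick trajectory at slider time t (in seconds)."""
    if not positions:
        raise ValueError("empty trajectory")
    if len(positions) == 1:
        return positions[0]
    t_max = (len(positions) - 1) * dt
    t = min(t_max, max(0.0, t))          # same clamp as advanceTime(t, dt, tMax)
    u = t / dt                           # fractional tick index
    i = min(int(u), len(positions) - 2)  # tick at or before t (kept in range at t_max)
    frac = u - i
    return positions[i] + frac * (positions[i + 1] - positions[i])
```

Interpolated samples only make the slider feel continuous. Hit detection probably belongs on the per-tick states; otherwise the display and the logs can disagree about when an intercept happened.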

GLSL — fake “heat” in the debug view (fragment idea)

precision mediump float;
varying vec2 vUv;
uniform float uTime;
uniform vec2 uThreatPos;
uniform vec2 uInterceptPos;

void main() {
  float dThreat = distance(vUv, uThreatPos);
  float dIntercept = distance(vUv, uInterceptPos);
  float heat = exp(-12.0 * dThreat) + 0.6 * exp(-10.0 * dIntercept);
  vec3 col = mix(vec3(0.05, 0.07, 0.12), vec3(1.0, 0.35, 0.1), heat);
  gl_FragColor = vec4(col, 1.0);
}

JSON — config fragment

{
  "simulation": {
    "dt": 0.02,
    "max_episode_seconds": 45,
    "integrator": "semi_implicit_euler"
  },
  "rewards": {
    "intercept_bonus": 100.0,
    "distance_shaping": true
  }
}
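A small loader for the config fragment can catch two easy mistakes early: an unknown integrator name, and an episode length that is not a whole number of ticks. The load_sim_config helper and the integrator names other than semi_implicit_euler are assumptions.

```python
import json

CONFIG_TEXT = """
{
  "simulation": {"dt": 0.02, "max_episode_seconds": 45, "integrator": "semi_implicit_euler"},
  "rewards": {"intercept_bonus": 100.0, "distance_shaping": true}
}
"""

# Only "semi_implicit_euler" appears in the original config; the other names are hypothetical.
KNOWN_INTEGRATORS = {"euler", "semi_implicit_euler", "rk4"}


def load_sim_config(text: str) -> tuple[dict, int]:
    """Parse the config and return it together with the episode length in ticks."""
    cfg = json.loads(text)
    sim = cfg["simulation"]
    if sim["integrator"] not in KNOWN_INTEGRATORS:
        raise ValueError(f"unknown integrator: {sim['integrator']!r}")
    dt = float(sim["dt"])
    if dt <= 0.0:
        raise ValueError("dt must be positive")
    ticks = sim["max_episode_seconds"] / dt
    # 0.02 has no exact binary representation, so round instead of truncating with int().
    max_steps = round(ticks)
    if abs(ticks - max_steps) > 1e-6:
        raise ValueError("max_episode_seconds is not a whole number of ticks")
    return cfg, max_steps
```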

Reward hacking (ordered list of failure modes)

  1. Agent learns to stall just inside the success radius without intercepting — looks good in logs, is cowardice.
  2. Huge terminal reward causes value explosions; advantage estimates go brrr.
  3. Shaped distance rewards fight the sparse terminal reward unless you schedule a curriculum.
  4. You “fix” (3) by adding more terms until the reward is a Christmas tree and nobody knows what’s being optimized.
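One standard way to add distance shaping without changing what is being optimized (failure modes 3 and 4) is potential-based shaping (Ng, Harada, and Russell, 1999). It adds F = γΦ(s′) − Φ(s) to the environment reward. Over an episode these terms telescope to γᵀΦ(s_T) − Φ(s₀), which leaves the optimal policy of the original reward intact. This is a swapped-in textbook technique, not necessarily what the repo does. The potential Φ = −distance and the helper names are assumptions.

```python
def potential(dist: float) -> float:
    """Assumed potential: higher (less negative) when the interceptor is closer."""
    return -dist


def shaping_term(dist: float, dist_next: float, gamma: float, done: bool) -> float:
    """F = gamma * Phi(s') - Phi(s), with Phi fixed to 0 at terminal states."""
    phi_next = 0.0 if done else potential(dist_next)
    return gamma * phi_next - potential(dist)


def discounted_shaping(dists: list[float], gamma: float) -> float:
    """Sum of gamma^k * F_k along one episode whose last state is terminal."""
    total = 0.0
    for k in range(len(dists) - 1):
        done = k == len(dists) - 2
        total += gamma ** k * shaping_term(dists[k], dists[k + 1], gamma, done)
    return total
```

Because the sum telescopes, discounted_shaping([5.0, 4.0, 2.0, 1.0], 0.9) equals −Φ(s₀) = 5 up to rounding, whatever distances the episode passes through in between.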

Table: what I thought vs what the metrics said

| Phase | What I believed | What eval showed |
|---|---|---|
| Week 1 | “Distance shaping will help exploration” | Learned to orbit |
| Week 2 | “Bigger intercept bonus fixes orbit” | Learned to slam into terrain |
| Week 3 | “Penalize crash” | Learned to do nothing |
| Week 4 | “Tune penalties” | Finally something like intercept |

Reward shaping sketch (don’t copy-paste blindly):[5]
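This is a direct Python reading of the reward in footnote 5, reward = -alpha * dist + beta * float(hit). The footnote does not specify how hit is decided, so the hit_radius threshold below is an assumption, as are the function and type names.

```python
import math

Point = tuple[float, float, float]


def intercept_reward(p_int: Point, p_threat: Point, alpha: float, beta: float,
                     hit_radius: float) -> tuple[float, bool]:
    """r_t = -alpha * ||p_int - p_threat||_2 + beta * 1[hit]."""
    dist = math.dist(p_int, p_threat)  # Euclidean distance (Python 3.8+)
    hit = dist <= hit_radius           # assumed hit test
    return -alpha * dist + beta * float(hit), hit
```

Returning hit alongside the reward lets the logger record intercepts from the same computation that paid for them.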

Table stress tests (long cells, code, TeX, dollars)

| Symptom | Likely cause | Quick check |
|---|---|---|
| Loss goes nan after ~2k steps | Bad learning rate or log(0) in policy | torch.isfinite(loss).all(); clamp logits |
| Interceptor orbits forever | Distance-only shaping with no terminal intercept term | Inspect \(\mathbb{E}[r_T]\) vs shaped \(\sum \gamma^k r_k\) |
| Policy ignores threat | Observation normalization off or wrong frame | Print obs.min(), obs.max() each eval |

Footer: three-column layout with inline code and inline math \(\mathbb{E}[\cdot]\).
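For the “observation normalization off” row, a running per-feature normalizer is a common fix in PPO-style setups. Below is a minimal sketch using Welford's online mean/variance update. The RunningNorm name and the eps default are assumptions, and a real setup would probably freeze the statistics during evaluation.

```python
import math


class RunningNorm:
    """Per-feature running mean/variance (Welford's algorithm) for observation scaling."""

    def __init__(self, n_features: int, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = [0.0] * n_features
        self.m2 = [0.0] * n_features  # running sum of squared deviations
        self.eps = eps

    def update(self, obs: list[float]) -> None:
        if len(obs) != len(self.mean):
            raise ValueError("observation has the wrong number of features")
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def variance(self) -> list[float]:
        # Population variance; 1.0 before any data so normalize() is roughly the identity.
        if self.count == 0:
            return [1.0] * len(self.mean)
        return [m / self.count for m in self.m2]

    def normalize(self, obs: list[float]) -> list[float]:
        return [(x - mu) / math.sqrt(var + self.eps)
                for x, mu, var in zip(obs, self.mean, self.variance())]
```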

| Budget line item | Amount | Notes |
|---|---|---|
| GPU hours | $12/hr spot × 40h | Escaped dollars for spreadsheet brain |
| Coffee | $4.50 × 2/day × 30 | Same — \$ not math |
| Emotional damage | Priceless | Long text cell: this row exists to see whether a joking label plus a medium-length explanation still wraps cleanly when the table is full-width and the type is Crimson Pro at ~1.1rem. If anything clips or the baseline looks wrong, that’s a signal to tweak td padding or vertical-align. |

Footer: currency + a deliberately verbose third column.
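The first two budget rows work out as plain arithmetic. Decimal keeps the cents exact. Reading “× 30” as 30 days is an assumption.

```python
from decimal import Decimal

gpu_cost = Decimal("12") * 40           # $12/hr spot × 40 h
coffee_cost = Decimal("4.50") * 2 * 30  # $4.50 × 2/day × 30 (assumed: days)
total = gpu_cost + coffee_cost          # "Emotional damage" stays unpriced
```

That comes to $480 of GPU time and $270 of coffee, $750 before the priceless row.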

| Integrator | Update | Stability |
|---|---|---|
| Euler | x += v*dt; v += a*dt | Meh |
| Semi-implicit | v += a*dt; x += v*dt (order swap) | Better energy |
| RK4 | four k stages | Overkill for this prototype |

Counted block: first table should get “Table 1 —” if your CSS counter is active.

| Constraint | Expression | Code |
|---|---|---|
| Speed cap | \(\|v\| \leq v_{\max}\) | v = v * min(1.0, v_max / (torch.norm(v)+1e-8)) |
| No-fly | altitude \(h \geq h_{\min}\) | penalty = torch.relu(h_min - h) ** 2 |

Second table in same block — should read “Table 2 —”.
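The two constraint rows can be rewritten with the standard library, so they can be checked without PyTorch. The behavior should match the torch expressions in the table: min-scaling for the speed cap, squared ReLU for the altitude floor. The function names and the tuple representation are assumptions.

```python
import math

Vec = tuple[float, float, float]


def cap_speed(v: Vec, v_max: float, eps: float = 1e-8) -> Vec:
    """Scale v so that ||v|| <= v_max (up to eps); slower vectors are returned unchanged."""
    norm = math.sqrt(sum(c * c for c in v))
    scale = min(1.0, v_max / (norm + eps))  # eps avoids division by zero at rest
    return tuple(c * scale for c in v)


def altitude_penalty(h: float, h_min: float) -> float:
    """Squared hinge: 0 at or above h_min, (h_min - h)^2 below it."""
    return max(0.0, h_min - h) ** 2
```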

Extreme table torture

Pure layout stress: horizontal scroll, vertical scroll, dense grids, mixed modalities. If anything looks off, tweak .page-content table in fugg2.css — not your prose.

Absurdly wide grid (12 columns × 17 data rows)

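| C0 | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| row_00 | \(x_{0,1}\) | np.float32 | v0_3 | v0_4 | pow(2,0) | v0_6 | v0_7 | v0_8 | v0_9 | v0_10 | v0_11 |
| row_01 | \(x_{1,1}\) | torch.Tensor | v1_3 | v1_4 | v1_5 | v1_6 | v1_7 | v1_8 | v1_9 | v1_10 | v1_11 |
| row_02 | \(x_{2,1}\) | torch.Tensor | v2_3 | v2_4 | v2_5 | v2_6 | v2_7 | v2_8 | v2_9 | v2_10 | v2_11 |
| row_03 | \(x_{3,1}\) | np.float32 | v3_3 | v3_4 | v3_5 | v3_6 | v3_7 | v3_8 | v3_9 | v3_10 | v3_11 |
| row_04 | \(x_{4,1}\) | torch.Tensor | v4_3 | v4_4 | pow(2,4) | v4_6 | v4_7 | v4_8 | v4_9 | v4_10 | v4_11 |
| row_05 | \(x_{5,1}\) | torch.Tensor | v5_3 | v5_4 | v5_5 | v5_6 | v5_7 | v5_8 | v5_9 | v5_10 | v5_11 |
| row_06 | \(x_{6,1}\) | np.float32 | v6_3 | v6_4 | v6_5 | v6_6 | v6_7 | v6_8 | v6_9 | v6_10 | v6_11 |
| row_07 | \(x_{7,1}\) | torch.Tensor | v7_3 | v7_4 | v7_5 | v7_6 | v7_7 | v7_8 | v7_9 | v7_10 | Mega-cell: word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word word … |
| row_08 | \(x_{8,1}\) | torch.Tensor | v8_3 | v8_4 | pow(2,0) | v8_6 | v8_7 | v8_8 | v8_9 | v8_10 | v8_11 |
| row_09 | \(x_{9,1}\) | np.float32 | v9_3 | v9_4 | v9_5 | v9_6 | v9_7 | v9_8 | v9_9 | v9_10 | v9_11 |
| row_10 | \(x_{10,1}\) | torch.Tensor | v10_3 | v10_4 | v10_5 | v10_6 | v10_7 | v10_8 | v10_9 | v10_10 | v10_11 |
| row_11 | \(x_{11,1}\) | torch.Tensor | v11_3 | v11_4 | v11_5 | v11_6 | v11_7 | v11_8 | v11_9 | v11_10 | v11_11 |
| row_12 | \(x_{12,1}\) | np.float32 | v12_3 | v12_4 | pow(2,4) | v12_6 | v12_7 | v12_8 | v12_9 | v12_10 | v12_11 |
| row_13 | \(x_{13,1}\) | torch.Tensor | v13_3 | v13_4 | v13_5 | v13_6 | v13_7 | v13_8 | v13_9 | v13_10 | v13_11 |
| row_14 | \(x_{14,1}\) | torch.Tensor | v14_3 | v14_4 | v14_5 | v14_6 | v14_7 | v14_8 | v14_9 | v14_10 | v14_11 |
| row_15 | \(x_{15,1}\) | np.float32 | v15_3 | v15_4 | v15_5 | v15_6 | v15_7 | v15_8 | v15_9 | v15_10 | v15_11 |
| row_16 | \(x_{16,1}\) | torch.Tensor | v16_3 | v16_4 | pow(2,0) | v16_6 | v16_7 | v16_8 | v16_9 | v16_10 | v16_11 |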

Wide grid footer — check overflow-x, cell padding, and last-column long text.

Absurdly tall runbook (4 columns × 54 data rows)

| # | Status | Command / token | Note |
|---|---|---|---|
| 0 | OK | torchrun --nproc_per_node=2 train.py --seed 0 | \(\sum_{k=1}^{n} k = n(n+1)/2\) for sanity check 0 |
| 1 | WARN | pytest -q tests/test_001.py | tick 1 |
| 2 | FAIL | pytest -q tests/test_002.py | tick 2 |
| 3 | SKIP | pytest -q tests/test_003.py | tick 3 |
| 4 | OK | pytest -q tests/test_004.py | tick 4 |
| 5 | WARN | torchrun --nproc_per_node=2 train.py --seed 5 | tick 5 |
| 6 | FAIL | pytest -q tests/test_006.py | tick 6 |
| 7 | SKIP | pytest -q tests/test_007.py | tick 7 |
| 8 | OK | pytest -q tests/test_008.py | tick 8 |
| 9 | WARN | pytest -q tests/test_009.py | tick 9 |
| 10 | FAIL | torchrun --nproc_per_node=2 train.py --seed 10 | tick 10 |
| 11 | SKIP | pytest -q tests/test_011.py | \(\sum_{k=1}^{n} k = n(n+1)/2\) for sanity check 11 |
| 12 | OK | pytest -q tests/test_012.py | tick 12 |
| 13 | WARN | pytest -q tests/test_013.py | tick 13 |
| 14 | FAIL | pytest -q tests/test_014.py | tick 14 |
| 15 | SKIP | torchrun --nproc_per_node=2 train.py --seed 15 | tick 15 |
| 16 | OK | pytest -q tests/test_016.py | tick 16 |
| 17 | WARN | pytest -q tests/test_017.py | tick 17 |
| 18 | FAIL | pytest -q tests/test_018.py | tick 18 |
| 19 | SKIP | pytest -q tests/test_019.py | tick 19 |
| 20 | OK | torchrun --nproc_per_node=2 train.py --seed 20 | tick 20 |
| 21 | WARN | pytest -q tests/test_021.py | tick 21 |
| 22 | FAIL | pytest -q tests/test_022.py | \(\sum_{k=1}^{n} k = n(n+1)/2\) for sanity check 22 |
| 23 | SKIP | pytest -q tests/test_023.py | Long note: lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, lorem ipsum dolor sit amet, end. |
| 24 | OK | pytest -q tests/test_024.py | tick 24 |
| 25 | WARN | torchrun --nproc_per_node=2 train.py --seed 25 | tick 25 |
| 26 | FAIL | pytest -q tests/test_026.py | tick 26 |
| 27 | SKIP | pytest -q tests/test_027.py | tick 27 |
| 28 | OK | pytest -q tests/test_028.py | tick 28 |
| 29 | WARN | pytest -q tests/test_029.py | tick 29 |
| 30 | FAIL | torchrun --nproc_per_node=2 train.py --seed 30 | tick 30 |
| 31 | SKIP | pytest -q tests/test_031.py | tick 31 |
| 32 | OK | pytest -q tests/test_032.py | tick 32 |
| 33 | WARN | pytest -q tests/test_033.py | \(\sum_{k=1}^{n} k = n(n+1)/2\) for sanity check 33 |
| 34 | FAIL | pytest -q tests/test_034.py | tick 34 |
| 35 | SKIP | torchrun --nproc_per_node=2 train.py --seed 35 | tick 35 |
| 36 | OK | pytest -q tests/test_036.py | tick 36 |
| 37 | WARN | pytest -q tests/test_037.py | tick 37 |
| 38 | FAIL | pytest -q tests/test_038.py | tick 38 |
| 39 | SKIP | pytest -q tests/test_039.py | tick 39 |
| 40 | OK | torchrun --nproc_per_node=2 train.py --seed 40 | tick 40 |
| 41 | WARN | pytest -q tests/test_041.py | tick 41 |
| 42 | FAIL | pytest -q tests/test_042.py | tick 42 |
| 43 | SKIP | pytest -q tests/test_043.py | tick 43 |
| 44 | OK | pytest -q tests/test_044.py | \(\sum_{k=1}^{n} k = n(n+1)/2\) for sanity check 44 |
| 45 | WARN | torchrun --nproc_per_node=2 train.py --seed 45 | tick 45 |
| 46 | FAIL | pytest -q tests/test_046.py | tick 46 |
| 47 | SKIP | pytest -q tests/test_047.py | tick 47 |
| 48 | OK | pytest -q tests/test_048.py | tick 48 |
| 49 | WARN | pytest -q tests/test_049.py | tick 49 |
| 50 | FAIL | torchrun --nproc_per_node=2 train.py --seed 50 | tick 50 |
| 51 | SKIP | pytest -q tests/test_051.py | tick 51 |
| 52 | OK | pytest -q tests/test_052.py | tick 52 |
| 53 | WARN | pytest -q tests/test_053.py | tick 53 |

Tall runbook footer — row hover, alternating statuses, one obese note row.

Dense numeric matrix (16 × 16, every cell filled)

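|  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 17 | 34 | 51 | 68 | 85 | 102 | 119 | 136 | 153 | 170 | 187 | 204 | 221 | 238 | 255 |
| 1 | 31 | 48 | 65 | 82 | 99 | 116 | 133 | 150 | 167 | 184 | 201 | 218 | 235 | 252 | 269 | 286 |
| 2 | 62 | 79 | 96 | 113 | 130 | 147 | 164 | 181 | 198 | 215 | 232 | 249 | 266 | 283 | 300 | 317 |
| 3 | 93 | 110 | 127 | 144 | 161 | 178 | 195 | 212 | 229 | 246 | 263 | 280 | 297 | 314 | 331 | 348 |
| 4 | 124 | 141 | 158 | 175 | 192 | 209 | 226 | 243 | 260 | 277 | 294 | 311 | 328 | 345 | 362 | 379 |
| 5 | 155 | 172 | 189 | 206 | 223 | 240 | 257 | 274 | 291 | 308 | 325 | 342 | 359 | 376 | 393 | 410 |
| 6 | 186 | 203 | 220 | 237 | 254 | 271 | 288 | 305 | 322 | 339 | 356 | 373 | 390 | 407 | 424 | 441 |
| 7 | 217 | 234 | 251 | 268 | 285 | 302 | 319 | 336 | 353 | 370 | 387 | 404 | 421 | 438 | 455 | 472 |
| 8 | 248 | 265 | 282 | 299 | 316 | 333 | 350 | 367 | 384 | 401 | 418 | 435 | 452 | 469 | 486 | 503 |
| 9 | 279 | 296 | 313 | 330 | 347 | 364 | 381 | 398 | 415 | 432 | 449 | 466 | 483 | 500 | 517 | 534 |
| 10 | 310 | 327 | 344 | 361 | 378 | 395 | 412 | 429 | 446 | 463 | 480 | 497 | 514 | 531 | 548 | 565 |
| 11 | 341 | 358 | 375 | 392 | 409 | 426 | 443 | 460 | 477 | 494 | 511 | 528 | 545 | 562 | 579 | 596 |
| 12 | 372 | 389 | 406 | 423 | 440 | 457 | 474 | 491 | 508 | 525 | 542 | 559 | 576 | 593 | 610 | 627 |
| 13 | 403 | 420 | 437 | 454 | 471 | 488 | 505 | 522 | 539 | 556 | 573 | 590 | 607 | 624 | 641 | 658 |
| 14 | 434 | 451 | 468 | 485 | 502 | 519 | 536 | 553 | 570 | 587 | 604 | 621 | 638 | 655 | 672 | 689 |
| 15 | 465 | 482 | 499 | 516 | 533 | 550 | 567 | 584 | 601 | 618 | 635 | 652 | 669 | 686 | 703 | 720 |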

Dense matrix footer — typography at small sizes, column alignment, thead vs tbody.

Mixed chaos (empty cells, long strings — pipes only as delimiters)

| A | B | C |
|---|---|---|
|  | single-space | (empty first column) |
| pipe-in-code-ok | grep a\\|b file | or-op in regex |
| U+1F680 | Not an emoji font test | just characters |
| xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx | short | mid |
| short | yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy | tail |

Chaos footer — empty cells, code with pipe characters inside backticks, ultra-long strings.

Screenshot archaeology

Sometimes the most “real” artifact is a random screengrab from a late night:

[Image: Debug screenshot from the UI spaghetti era]
Exported screenshot from an older debug UI: buttons everywhere, soul nowhere.


Status: ongoing. If you’re reading this in the future and the project is still “ongoing,” then: same.

  1. Meta-footnote: this paragraph exists to stress-test footnotes-to-gutter.js, sidenotes, and long footnote bodies. Inline check: python -m pip install -e . 

  2. Not a weapons system. Toy research code. If you need real intercept math, start with proportional navigation and a proper aerospace textbook, e.g. Zarchan, Tactical and Strategic Missile Guidance.

  3. Drag is not “real” yet — it is a scalar fudge you tune until trajectories look plausible: drag_coeff * velocity ** 2 with clamps. 

  4. If you want the formalism: MDP. Implementation detail: my state is not Markov yet — there’s hidden history in the integrator unless I augment the observation. 

  5. \(r_t = -\alpha \|p_{\text{int}} - p_{\text{threat}}\|_2 + \beta\, \mathbb{1}_{\text{hit}}\). In code: reward = -alpha * dist + beta * float(hit). Compare with Hugging Face RL notes for the bigger picture.
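Footnote 4's fix, augmenting the observation, is often done by stacking the last k observations so the policy can infer velocity from consecutive positions. Here is a minimal sketch. FrameStack is a hypothetical name, and stacking is only one option; others include feeding velocity directly or using a recurrent policy.

```python
from collections import deque


class FrameStack:
    """Concatenate the last k observations into one flat vector (hypothetical helper)."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.frames: deque[list[float]] = deque(maxlen=k)

    def reset(self, obs: list[float]) -> list[float]:
        # Fill the whole window with the first observation so the output size is constant.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(list(obs))
        return self._stacked()

    def push(self, obs: list[float]) -> list[float]:
        self.frames.append(list(obs))  # deque(maxlen=k) drops the oldest frame
        return self._stacked()

    def _stacked(self) -> list[float]:
        return [x for frame in self.frames for x in frame]
```

With k = 2 and positions in the observation, the difference between the two stacked frames divided by dt gives a velocity estimate. That is likely the kind of hidden history footnote 4 refers to.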