Day 2 of 100

Started 2026-04-27 · 2% elapsed

Phase 2: First Fine-Tune

Weeks 3–5

See what fine-tuning actually does on our data. Close the most important data gaps.

  • [todo] First fine-tune attempt: base Chronos-2 vs Chronos-2 fine-tuned on telemetry
  • [todo] Compare predictions side by side across quiet and cycling regimes
  • [todo] Configure Cerbo GX to enable per-cell V&T logging (closes the biggest data gap)
  • [todo] Reconfigure Lynx Shunt to log continuously at 1 Hz instead of event-driven

Deliverable: Documented comparison of what fine-tuning did and didn't improve, on our data specifically.

Recent Log


Phase 1 Complete — Foundation Locked

2026-04-29 · Day 2

Wrapped Phase 1 of the local AI experiment. The 100-day deliverable on model selection landed earlier than expected, with a meaningful course correction from initial assumptions.

What we did

Spent four sessions running time-series foundation models against real BESS telemetry from the shop system. Pulled 14 days of battery power data, plus a separate slice from a period when the battery actually cycled. Ran the same evaluation methodology across five models — Chronos-T5-Tiny, T5-Small, Bolt-Small, TimesFM-200M, and Chronos-2 — on quiet data and on cycling data, with persistence as the baseline.
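The core of that methodology is scoring every model's forecast error against a persistence baseline on the same windows. A minimal sketch of the comparison, with toy stand-in arrays (the real harness, window sizes, and telemetry values are not shown in this post):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over a forecast horizon."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def persistence_forecast(history, horizon):
    """Naive baseline: repeat the last observed value across the horizon."""
    return np.full(horizon, history[-1])

# Toy telemetry window: context the model sees, plus the ground-truth horizon
context = np.array([0.0, 0.1, -0.2, 0.0, 1.5])  # battery power, kW (stand-in)
actual  = np.array([1.4, 1.3, 1.2, 1.1])        # what actually happened

baseline = persistence_forecast(context, horizon=len(actual))
model    = np.array([1.35, 1.25, 1.15, 1.05])   # stand-in for a model forecast

print("persistence MAE:", mae(actual, baseline))  # 0.25
print("model MAE:      ", mae(actual, model))     # 0.05
```

A model only "wins" a regime if its MAE beats persistence on that regime's windows, which is exactly the bar the univariate models failed to clear on cycling data.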

What we found

The headline: Chronos-Bolt-Small initially looked like the winner — fastest inference, best accuracy in early tests. Picked it as the foundation.

Then we validated the assumption that made it the foundation. Bolt is univariate-only. Its API doesn't accept covariates at all. The thing we needed it to do for production didn't exist.

Pivoted to Chronos-2, which has a structured covariate API and built-in LoRA fine-tuning. Validated on cycling data with voltage and time-of-day as covariates: 26% MAE reduction vs univariate, beating the persistence baseline that neither univariate model could touch.
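The post doesn't show how time-of-day was fed in, so here is one common encoding, sketched with hypothetical column names: represent time-of-day cyclically (sine/cosine) so midnight and 23:59 sit next to each other, and pass it alongside voltage as covariate columns. This is an illustration of covariate construction, not the Chronos-2 API itself:

```python
import numpy as np
import pandas as pd

def add_time_of_day_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode time-of-day cyclically so 23:59 and 00:01 are near neighbors."""
    seconds = df.index.hour * 3600 + df.index.minute * 60 + df.index.second
    angle = 2 * np.pi * seconds / 86400.0
    out = df.copy()
    out["tod_sin"] = np.sin(angle)
    out["tod_cos"] = np.cos(angle)
    return out

# Toy telemetry frame: power target plus a voltage covariate (made-up values)
idx = pd.date_range("2026-04-15", periods=4, freq="6h")
telemetry = pd.DataFrame(
    {"power_kw": [0.0, 1.2, -0.8, 0.1], "voltage_v": [52.1, 51.8, 52.4, 52.0]},
    index=idx,
)
features = add_time_of_day_features(telemetry)
print(features[["voltage_v", "tod_sin", "tod_cos"]])
```

Whatever the exact encoding, the point stands from the results above: conditioning the forecast on voltage and time-of-day is what produced the 26% MAE reduction over the univariate run.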

Why it matters

The deeper finding: pure univariate forecasting can't predict event-driven BESS behavior. The actual drivers are external — solar, customer load, grid signals. The model captures statistical patterns of the past but can't predict the events that will define the future. Multivariate with exogenous inputs is the only path that actually works.
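The argument can be made concrete with an entirely synthetic toy (not shop telemetry): if battery power is driven by an external load step, a forecast built only from the target's own past cannot anticipate the event, while a forecast allowed to see the exogenous signal trivially can.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exogenous driver: a scheduled load step at t = 60 that the battery must serve
load = np.zeros(100)
load[60:] = 5.0                                  # kW step, set externally

# Battery power just follows the load, plus small measurement noise
power = load + rng.normal(0.0, 0.01, size=100)

context, actual = power[:60], power[60:]

# Univariate persistence: repeats the pre-event level, misses the step entirely
univariate = np.full(len(actual), context[-1])

# Exogenous-aware forecast: allowed to see the load schedule
covariate_aware = load[60:]

print("univariate MAE:", np.mean(np.abs(actual - univariate)))   # ~5.0
print("covariate MAE: ", np.mean(np.abs(actual - covariate_aware)))  # ~0.01
```

Nothing in the quiet history hints at the step, so no amount of univariate pattern-matching helps; the information lives entirely in the exogenous channel.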

This isn't a Phase 1 failure. It's the architectural answer for what production needs to look like, backed by real experimental evidence on our data.

Where we go next

Phase 2 — fine-tuning Chronos-2 on Alchemy's data. Two things to evaluate across: quiet regimes (battery parked) and cycling regimes (active charge/discharge). Fine-tuning isn't optional based on what we saw — out-of-the-box covariate conditioning was net-negative when the foundation's priors didn't match the deployment regime.

The system in the shop ships within a couple weeks. Real cycling data starts flowing then.

Open Questions

  • When does the shop BESS system ship and start producing real cycling data?
  • Are customer-side sensors going into the spec for first deployments? They're required for load forecasting in the production architecture.
  • For Phase 2 fine-tuning, do we have enough cycling-regime data from the shop system, or do we need to wait for deployment?

Recent Decisions

  • 2026-04-29 · Univariate forecasting insufficient for production
  • 2026-04-29 · Foundation model: Chronos-2
  • 2026-04-28 · Chronos comparison methodology locked

Ongoing in Background

  • [in-progress] BESS sizing/ROI tool completion
  • [todo] First Attio stage trigger end-to-end
  • [todo] Vaultwarden credential consolidation
  • [todo] Thor refactor as a true delegator
  • [todo] arc.alchemyindustrial.com dashboard