Day 2 of 100

Started 2026-04-27 · 2% elapsed

Phase 2: First Fine-Tune

Weeks 3–5

See what fine-tuning actually does on our data. Close the most important data gaps.

  • [todo] First fine-tune attempt: base Chronos-2 vs Chronos-2 fine-tuned on telemetry
  • [todo] Compare predictions side by side across quiet and cycling regimes
  • [todo] Configure Cerbo GX to enable per-cell V&T logging (closes the biggest data gap)
  • [todo] Reconfigure Lynx Shunt to log continuously at 1 Hz instead of event-driven

Deliverable: Documented comparison of what fine-tuning did and didn't improve, on our data specifically.

Recent Log


Phase 1 Complete — Foundation Locked

2026-04-29 · Day 2

Wrapped Phase 1 of the local AI experiment. The 100-day deliverable on model selection landed earlier than expected, with a meaningful course correction from initial assumptions.

What we did

Spent four sessions running time-series foundation models against real BESS telemetry from the shop system. Pulled 14 days of battery power data, plus a separate slice from a period when the battery actually cycled. Ran the same evaluation methodology across five models — Chronos-T5-Tiny, T5-Small, Bolt-Small, TimesFM-200M, and Chronos-2 — on quiet data and on cycling data, with persistence as the baseline.
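The core of that methodology is scoring every model's forecast error against a persistence baseline on the same windows. A minimal sketch of the comparison, with toy stand-in arrays (the real harness, window sizes, and telemetry values are not shown in this post):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over a forecast horizon."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def persistence_forecast(history, horizon):
    """Naive baseline: repeat the last observed value across the horizon."""
    return np.full(horizon, history[-1])

# Toy telemetry window: context the model sees, plus the ground-truth horizon
context = np.array([0.0, 0.1, -0.2, 0.0, 1.5])  # battery power, kW (stand-in)
actual  = np.array([1.4, 1.3, 1.2, 1.1])        # what actually happened

baseline = persistence_forecast(context, horizon=len(actual))
model    = np.array([1.35, 1.25, 1.15, 1.05])   # stand-in for a model forecast

print("persistence MAE:", mae(actual, baseline))  # 0.25
print("model MAE:      ", mae(actual, model))     # 0.05
```

A model only "wins" a regime if its MAE beats persistence on that regime's windows, which is exactly the bar the univariate models failed to clear on cycling data.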

What we found

The headline: Chronos-Bolt-Small initially looked like the winner — fastest inference, best accuracy in early tests. Picked it as the foundation.

Then we validated the assumption that made it the foundation. Bolt is univariate-only. Its API doesn't accept covariates at all. The thing we needed it to do for production didn't exist.

Pivoted to Chronos-2, which has a structured covariate API and built-in LoRA fine-tuning. Validated on cycling data with voltage and time-of-day as covariates: 26% MAE reduction vs univariate, beating the persistence baseline that neither univariate model could touch.
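The post doesn't show how time-of-day was fed in, so here is one common encoding, sketched with hypothetical column names: represent time-of-day cyclically (sine/cosine) so midnight and 23:59 sit next to each other, and pass it alongside voltage as covariate columns. This is an illustration of covariate construction, not the Chronos-2 API itself:

```python
import numpy as np
import pandas as pd

def add_time_of_day_features(df: pd.DataFrame) -> pd.DataFrame:
    """Encode time-of-day cyclically so 23:59 and 00:01 are near neighbors."""
    seconds = df.index.hour * 3600 + df.index.minute * 60 + df.index.second
    angle = 2 * np.pi * seconds / 86400.0
    out = df.copy()
    out["tod_sin"] = np.sin(angle)
    out["tod_cos"] = np.cos(angle)
    return out

# Toy telemetry frame: power target plus a voltage covariate (made-up values)
idx = pd.date_range("2026-04-15", periods=4, freq="6h")
telemetry = pd.DataFrame(
    {"power_kw": [0.0, 1.2, -0.8, 0.1], "voltage_v": [52.1, 51.8, 52.4, 52.0]},
    index=idx,
)
features = add_time_of_day_features(telemetry)
print(features[["voltage_v", "tod_sin", "tod_cos"]])
```

Whatever the exact encoding, the point stands from the results above: conditioning the forecast on voltage and time-of-day is what produced the 26% MAE reduction over the univariate run.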

Why it matters

The deeper finding: pure univariate forecasting can't predict event-driven BESS behavior. The actual drivers are external — solar, customer load, grid signals. The model captures statistical patterns of the past but can't predict the events that will define the future. Multivariate with exogenous inputs is the only path that actually works.
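The argument can be made concrete with an entirely synthetic toy (not shop telemetry): if battery power is driven by an external load step, a forecast built only from the target's own past cannot anticipate the event, while a forecast allowed to see the exogenous signal trivially can.

```python
import numpy as np

rng = np.random.default_rng(0)

# Exogenous driver: a scheduled load step at t = 60 that the battery must serve
load = np.zeros(100)
load[60:] = 5.0                                  # kW step, set externally

# Battery power just follows the load, plus small measurement noise
power = load + rng.normal(0.0, 0.01, size=100)

context, actual = power[:60], power[60:]

# Univariate persistence: repeats the pre-event level, misses the step entirely
univariate = np.full(len(actual), context[-1])

# Exogenous-aware forecast: allowed to see the load schedule
covariate_aware = load[60:]

print("univariate MAE:", np.mean(np.abs(actual - univariate)))   # ~5.0
print("covariate MAE: ", np.mean(np.abs(actual - covariate_aware)))  # ~0.01
```

Nothing in the quiet history hints at the step, so no amount of univariate pattern-matching helps; the information lives entirely in the exogenous channel.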

This isn't a Phase 1 failure. It's the architectural answer for what production needs to look like, backed by real experimental evidence on our data.

Where we go next

Phase 2 — fine-tuning Chronos-2 on Alchemy's data. Two things to evaluate across: quiet regimes (battery parked) and cycling regimes (active charge/discharge). Fine-tuning isn't optional based on what we saw — out-of-the-box covariate conditioning was net-negative when the foundation's priors didn't match the deployment regime.

The system in the shop ships within a couple weeks. Real cycling data starts flowing then.

Open Questions

  • When does the shop BESS system ship and start producing real cycling data?
  • Are customer-side sensors going into the spec for first deployments? They're required for load forecasting in the production architecture.
  • For Phase 2 fine-tuning, do we have enough cycling-regime data from the shop system, or do we need to wait for deployment?

Recent Decisions

  • 2026-04-29 · Univariate forecasting insufficient for production
  • 2026-04-29 · Foundation model: Chronos-2
  • 2026-04-28 · Chronos comparison methodology locked

Ongoing in Background

  • [in-progress] BESS sizing/ROI tool completion
  • [todo] First Attio stage trigger end-to-end
  • [todo] Vaultwarden credential consolidation
  • [todo] Thor refactor as a true delegator
  • [todo] arc.alchemyindustrial.com dashboard