Future Work

No work here is blocking - we're in a good state. Items below are ordered roughly by priority within each section. Priority labels (HIGH / MEDIUM / LOW) mark items that deserve explicit ranking; unlabeled items are "someday, if the mood strikes."

Projections: future enhancements

  • Goal-seek over distribution horizon for W1 - priority LOW. Today the W1 ("set spending, find date") workflow reports the earliest retirement at each user-configured (horizon, confidence) cell. The philosophically correct version asks "when have I accumulated enough wealth that the projection shows a 95% probability of success withdrawing X per year from retirement until age-of-death?" - i.e. goal-seek across both accumulation_years AND distribution_years simultaneously, anchored to a configured age-of-death. That is a two-dimensional search and likely an expensive one; not worth optimizing until someone wants it.
  • Per-person retirement_age - priority LOW. V1 of the accumulation-phase spec chose Option A: a single household retirement boundary derived from the oldest configured birthdate. Households where one earner retires significantly earlier than the other would benefit from per-person retirement_age fields on each type::birthdate record, with contributions stopped per-person.
  • Historical projection overlay follow-ups. The base --overlay-actuals overlay shipped (CLI tip + TUI primary surface). Open enhancements:
    • Historical metadata.srf / projections.srf for back-dated runs. Today the overlay re-runs against current classifications and assumptions; for historically faithful what-the-model-said-then output we'd check out the git-tracked versions of those files at the as-of commit and load those instead. Edge case until classifications materially drift.
    • Contribution-attribution overlay. Today's actuals line includes contributions implicitly; the bands assume modeled contributions that may or may not match reality. A "decompose actuals into market return vs contributions" annotation would clarify how much of the trajectory was the model being right vs new money arriving on schedule.
    • Mosaic mode: overlay multiple as-of starting points on one chart ("show me 1Y, 3Y, 5Y, 10Y projections all at once") so the user can see how the projection envelope tightened as data came in.
    • Better composition basis for imported-only as-of. Today the imported-only path uses today's allocations scaled by imported_liquid / today_total_liquid. That's the simplest thing that could work, but it's "today's mix back-dated" - it ignores everything we know about the historical context. Specifically: imported_values.srf already carries an expected_return field per row that the user captured at that date in their source spreadsheet. We could:
      • Use the imported expected_return as a sanity check against the simulation's per-position weighted return (warn or clamp if they diverge wildly - the spreadsheet's number reflects what the user actually saw at the time).
      • Use the imported expected_return to bias the stock/bond split inference: a higher expected return implies a higher historical equity weighting than today's mix probably reflects.
      • Reach further: derive a synthetic stock/bond split from the imported expected_return directly, treating it as a weighted average of SPY and AGG returns at that date and solving for the weights. That gives a per-imported-row composition that's locally faithful instead of one-mix-fits-all.

      None of these are urgent - the current "today's mix scaled" approximation is documented as such and the bands still render meaningfully - but each would tighten the historical faithfulness one notch. Pick whichever has the highest payoff vs. complexity when this gets revisited.
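
The "solve for the weights" idea in the last bullet reduces to one linear equation: expected = w * spy + (1 - w) * agg. A minimal sketch, assuming the SPY and AGG expected returns at the as-of date are available as plain numbers (where they would come from is not decided here). `infer_equity_weight` and the sample figures are hypothetical, not existing zfin code or real market data:

```python
def infer_equity_weight(expected_return: float, spy_return: float, agg_return: float) -> float:
    """Solve expected = w * spy + (1 - w) * agg for the equity weight w.

    The result is clamped to [0, 1]: an imported expected_return outside the
    SPY/AGG range cannot be explained by a long-only two-asset mix.
    """
    spread = spy_return - agg_return
    if spread == 0:
        raise ValueError("SPY and AGG returns are equal; the split is undetermined")
    w = (expected_return - agg_return) / spread
    return min(1.0, max(0.0, w))


# Made-up illustrative numbers, not real market data.
weight = infer_equity_weight(expected_return=0.07, spy_return=0.09, agg_return=0.04)
print(f"equity {weight:.0%}, bonds {1 - weight:.0%}")
```

Whether clamping or warning is the right response to an out-of-range expected_return is the same open question as the "warn or clamp" choice in the sanity-check bullet.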

--export-chart follow-ups - priority LOW

V1 of --export-chart <PATH> shipped for quote, projections (default bands mode only), and history. Several adjacent surfaces still don't have PNG export and were deferred:

  • projections --convergence / --return-backtest. Both render forecast-evaluation charts via tui/forecast_chart.zig. Not refactored to expose a renderToSurface seam yet - parser rejects --export-chart in those modes today. Low effort to add (mirror the tui/chart.zig pattern).
  • projections --vs <DATE>. No chart at all in this mode (text-only delta table); --export-chart rejected at parse time. Could grow a side-by-side bands comparison chart, but that's a feature of its own - not just an export plumbing job.
  • Theme overrides at export time. Today the export always uses theme.default_theme. A --theme <PATH> flag at export time would let users render with their configured theme or a presentation-friendly one. Out of scope for V1; gate when someone asks for it.
  • File format alternatives. SVG / PDF / WebP - z2d only exports PNG natively today; would need an external dependency or a pixel-buffer-to-format conversion.

Refactor: trim src/format.zig once Money / Date have absorbed their helpers - priority LOW

src/format.zig is still a ~1600-line grab-bag, but the money- and date-shaped helpers that used to live there have been moved out: money formatting now lives in src/Money.zig (with {f} / whole() / trim() / signed() / padRight(N) / padLeft(N)), date formatting lives in src/Date.zig (with {f} / padRight(N) / padLeft(N)), and the braille sparkline chart now lives in src/charts/braille.zig. What's left in format.zig is the genuinely-format-domain stuff: return formatters, allocation notes, signed-percent rendering.

If the file ever grows enough to be annoying again, consider renaming to src/render.zig to better describe what's left. Not blocking - file it as cleanup if and when it bites.

Investigate: detailed 401(k) contributions data source

Found a more detailed contributions screen on at least one employer-sponsored 401(k) provider portal - distinct from the standard positions/holdings view we already pull from. Worth investigating whether this unlocks better attribution than what we get from the positions CSV alone, and whether other 401(k) providers expose similar screens.

Open questions to answer when picking this up:

  • Which screen specifically (path / URL within the portal)? Is there an export option, or is it view-only / scrape territory?
  • What fields does it expose (employee pre-tax, employer match, after-tax / mega-backdoor, by-pay-period dates, per-fund allocations)?
  • Refresh cadence - per-paycheck, daily, on-demand?
  • Can it be auto-discovered like the existing audit CSVs, or is it manual-entry territory?

If the export is structured and recurring, this could feed a 401(k)-specific contributions classifier that bypasses the lot-diff heuristic for that account, similar to how cash_is_contribution opts ESPP/HSA accounts into cash-based attribution.

Related: ESPP-style accrual blind spot in the "Audit: manual-check accounts mechanism" section above.

Torn SRF files from server sync (root cause unknown)

Status: Root cause still unidentified. We have mitigations and diagnostics in place that keep torn responses from corrupting the cache, but we don't yet know why responses arrive torn. Until we have a root cause, this is not resolved - it's mitigated.

Mitigations landed so far:

  • syncFromServer (src/service.zig) validates responses via cache.Store.looksCompleteSrf before writeRaw. Torn HTTP bodies (empty, missing #!srfv1 header, or no trailing newline) are rejected with a warn-level log and NOT written to cache.
  • HTTP responses are checked for an ETag sha256 header; on mismatch we retry the request once before giving up and falling back to the provider.
  • Read-path self-heal: on SRF parse failure during read, the cache entry is invalidated so a subsequent refresh can repair without user intervention.
  • Diagnostics: richer error capture around the sync path. So far, HTTP transit is the dominant source of torn responses - but that's an observation, not a root cause.
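
For reference, the three completeness checks named in the first bullet can be sketched as follows. This is an illustration in Python, not the actual cache.Store.looksCompleteSrf code; it assumes the #!srfv1 header sits at the very start of the body, and the payload below is a placeholder rather than real SRF syntax:

```python
SRF_HEADER = b"#!srfv1"


def looks_complete_srf(body: bytes) -> bool:
    """Cheap torn-response check: non-empty, header present, trailing newline."""
    if not body:
        return False
    if not body.startswith(SRF_HEADER):  # assumption: header is the first bytes
        return False
    return body.endswith(b"\n")


def write_if_complete(cache: dict, key: str, body: bytes) -> bool:
    """Write to the (hypothetical, dict-backed) cache only when the body passes."""
    if not looks_complete_srf(body):
        print(f"warn: rejecting torn response for {key} ({len(body)} bytes)")
        return False
    cache[key] = body
    return True


cache = {}
write_if_complete(cache, "SYM/candles", b"#!srfv1\nplaceholder,row\n")  # stored
write_if_complete(cache, "SYM/dividends", b"#!srfv1\nplacehol")         # rejected
```

These checks only catch bodies torn at the edges; a body truncated exactly on a line boundary would still pass, which is what the ETag comparison is there to cover.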

Remaining work:

  • Identify root cause. Candidates to investigate: proxy/load-balancer behavior, HTTP keepalive reuse, partial reads on the server side, client-side buffer handling. The ETag retry tells us whether the problem is per-request or persistent; dig into the diagnostics output when the next occurrence is captured.
  • Once root cause is known, decide whether the current mitigations are sufficient or whether a targeted fix is needed. The mitigations may end up being the whole answer, but we can't conclude that without understanding the underlying cause.

(Content-Length validation was considered and rejected: once the server starts compressing response bodies, Content-Length reflects the compressed byte count, not the decoded payload, so it's not a reliable integrity check.)
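
A small demonstration of why that check fails under compression, using gzip from the Python standard library as a stand-in for whatever encoding the server might adopt. The payload is a placeholder, not real SRF data:

```python
import gzip

# Placeholder payload; only its length matters for this demonstration.
payload = b"#!srfv1\n" + b"placeholder,row\n" * 200

wire_body = gzip.compress(payload)   # what would travel over HTTP
content_length = len(wire_body)      # what Content-Length would report
decoded = gzip.decompress(wire_body) # what the client ends up parsing

print(f"Content-Length: {content_length}, decoded bytes: {len(decoded)}")
```

A client that compares the decoded length against Content-Length would flag every compressed response as torn, even when it arrived intact.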

On-demand server-side fetch for new symbols

Currently the server's SRF endpoints (/candles, /dividends, etc.) are pure cache reads - they 404 if the data isn't already on disk. New symbols only get populated when added to the portfolio and picked up by the next cron refresh.

Consider: on a cache miss, instead of blocking the HTTP response with a multi-second provider fetch, kick off an async background fetch (or just auto-add the symbol to the portfolio) and return 404 as usual. The next request - or the next cron run - would then have the data. This gives "instant-ish gratification" for new symbols without the downsides of synchronous fetch-on-miss (latency, rate limit contention, unbounded cache growth from arbitrary tickers).

Note that this approach does nothing to reduce the number of API keys a fully functioning system needs. A more aggressive option is to treat ZFIN_SERVER as the sole source of record, but that makes the process more opaque while the client waits for candles (for example) to populate. The server could address that by spawning a thread to fetch the data and returning 202 Accepted, which the client could then poll. Maybe this is the better long-term approach?
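
A minimal sketch of the non-blocking miss path with the 202 Accepted variant, written in Python for illustration only. `SymbolCache`, its methods, and the `fetch` callable are hypothetical stand-ins for the server's cache and provider call; the status codes are returned as plain integers rather than through a real HTTP framework:

```python
import threading


class SymbolCache:
    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical provider call: symbol -> data
        self._data = {}
        self._in_flight = {}     # symbol -> worker thread, to avoid duplicate fetches
        self._lock = threading.Lock()

    def get(self, symbol):
        """Return (status, body) without ever blocking on the provider."""
        with self._lock:
            if symbol in self._data:
                return 200, self._data[symbol]
            if symbol not in self._in_flight:
                worker = threading.Thread(target=self._populate, args=(symbol,), daemon=True)
                self._in_flight[symbol] = worker
                worker.start()
            return 202, None     # client polls again later

    def _populate(self, symbol):
        try:
            data = self._fetch(symbol)
        except Exception:
            data = None          # leave the symbol uncached so a later miss retries
        with self._lock:
            if data is not None:
                self._data[symbol] = data
            self._in_flight.pop(symbol, None)

    def wait(self, symbol):
        """Demo helper: block until any in-flight fetch for symbol has finished."""
        with self._lock:
            worker = self._in_flight.get(symbol)
        if worker is not None:
            worker.join()


cache = SymbolCache(fetch=lambda symbol: f"candles for {symbol}")
first_status, _ = cache.get("NEWSYM")     # miss: fetch starts in the background
cache.wait("NEWSYM")
second_status, body = cache.get("NEWSYM")  # now served from cache
```

Returning 404 instead of 202 on the miss is a one-line change. The sketch does nothing about rate-limit contention or unbounded cache growth from arbitrary tickers, so those concerns still apply.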

Support Tiingo paid plan - priority LOW

zfin hardwires Tiingo to free-tier assumptions: the provider is constructed with RateLimiter.perHour(io, 50) in Tiingo.init (providers/tiingo.zig), and the only Tiingo surface is end-of-day candles plus the corporate actions that ride along in the same response. A user who pays for a Tiingo plan ($30/mo Power tier and up) gets nothing for it today - the same 50/hour throttle, the same EOD-only data. "Support the paid plan" is the umbrella for unlocking what that subscription actually buys: higher rate limits and real-time IEX quotes. The two are coupled (real-time polling only makes sense once the limit is raised), which is why they belong in one entry rather than two.

Tier-aware rate limiting

The 50/hour cap is hardcoded in Tiingo.init (RateLimiter.perHour(io, 50)), and the module docstring bakes in "Free tier: 50 requests/hour and 1,000 requests/day." Paid tiers raise both ceilings substantially, so a paying subscriber is being throttled far below their entitlement. Today only the hourly bucket is wired; the daily ceiling isn't enforced at all (the docstring notes it's "far from binding" for bursty EOD usage - real-time polling changes that calculus).

Work:

  • Make the Tiingo limits configurable instead of hardcoded. Options: explicit ZFIN_TIINGO_RATE_PER_HOUR (and per-day) numeric env knobs, or a coarser ZFIN_TIINGO_PLAN = free (default) | power | ... that maps to known limits. Lean toward explicit numeric overrides so we aren't chasing Tiingo's published per-tier numbers as they drift.
  • RateLimiter already supports arbitrary init(io, max, window_ns) plus perDay/perHour convenience ctors, so the limiter side is cheap. Decide whether a paid plan needs both an hourly and a daily bucket enforced, or whether hourly alone stays sufficient.
  • Caveat from RateLimiter's own docs: the bucket is in-memory and per-process - it caps a single run's burst, not usage across separate launches in the same window. Sustained real-time polling (below) makes cross-process usage likelier, so revisit whether per-process accounting is still good enough.
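
A sketch of the explicit-numeric-override option, in Python for illustration. ZFIN_TIINGO_RATE_PER_HOUR is the knob name proposed above; `limit_from_env` and `WindowLimiter` are hypothetical, and the limiter is a simple sliding window standing in for RateLimiter, whose actual algorithm is not reproduced here. The 5000 figure is arbitrary, not a published Tiingo limit:

```python
import os
import time
from collections import deque

FREE_TIER_PER_HOUR = 50  # free-tier figure quoted from the module docstring


def limit_from_env(name, default, env=os.environ):
    """Read a positive integer override from the environment, else use default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    value = int(raw)
    if value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {raw!r}")
    return value


class WindowLimiter:
    """In-memory, per-process limiter: at most max_requests per window_s seconds."""

    def __init__(self, max_requests, window_s, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_s = window_s
        self._clock = clock
        self._stamps = deque()

    def try_acquire(self):
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_requests:
            return False
        self._stamps.append(now)
        return True


# Pretend a paid subscriber exported an override (arbitrary number).
paid_env = {"ZFIN_TIINGO_RATE_PER_HOUR": "5000"}
hourly = WindowLimiter(
    limit_from_env("ZFIN_TIINGO_RATE_PER_HOUR", FREE_TIER_PER_HOUR, env=paid_env),
    window_s=3600,
)
print(hourly.max_requests)
```

With no override set, the limiter keeps today's free-tier behavior. A daily bucket would be a second limiter with an 86400-second window checked alongside the hourly one; like RateLimiter, both stay per-process.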

Real-time IEX quotes (was: configurable live-quote provider)

The TUI refresh key (r) values the portfolio with live intraday quotes via DataService.loadLiveQuotes (service.zig), which is Yahoo-only: Yahoo is keyless, consolidated, and stays off every rate-limit budget, so bursty refresh traffic costs nothing. The tradeoffs are that Yahoo's unofficial feed is ~15-minute delayed and "can break without notice."

Tiingo's IEX endpoint (/iex/?tickers=A,B,C) is a strong opt-in alternative for a paid subscriber: it's genuinely real-time (IEX last-sale, no 15-min delay), official/keyed, and bills per HTTP request - one call returns the whole portfolio (confirmed empirically: a 2-ticker batch decrements the daily quota by 1, not 2). Fields map cleanly: tngoLast to price, prevClose to day-change. Caveats: IEX is a single venue (~2-3% of volume), so tngoLast can sit stale between prints on illiquid names, and IEX doesn't trade mutual funds, so those fall back to the candle close.

Proposal: a config knob (env var, e.g. ZFIN_LIVE_QUOTE_PROVIDER = yahoo (default) | tiingo) that switches loadLiveQuotes to a new Tiingo.fetchQuotes(tickers) batched call. A paid subscriber who wants real-time and mashes r a lot (or once we add streaming) reuses their existing TIINGO_API_KEY and gets real-time coverage; everyone else keeps the keyless Yahoo default.

Implementation notes:

  • Tiingo.fetchQuotes returns an array whose order is NOT guaranteed to match the request order, so key results by the returned ticker field, not by position.
  • Live quotes share Tiingo's token bucket, so this is the concrete reason the tier-aware rate-limiting work above has to land first (or alongside): a batched quote call is only 1 request, but heavy r use plus candle refreshes draining the free 50/hour bucket is exactly the contention that raising the paid-tier limit relieves.
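
The keying rule in the first note can be sketched as below, in Python for illustration. tngoLast and prevClose are the field names given above; the `ticker` key name, the `index_quotes` helper, and every value in the sample response are assumptions made up for the example:

```python
def index_quotes(rows):
    """Key quotes by the ticker each row reports, never by list position."""
    quotes = {}
    for row in rows:
        last = row["tngoLast"]
        quotes[row["ticker"].upper()] = {
            "price": last,
            "day_change": last - row["prevClose"],
        }
    return quotes


requested = ["AAA", "BBB", "FUNDX"]
# Invented response: order differs from the request and one symbol is absent.
response = [
    {"ticker": "BBB", "tngoLast": 20.5, "prevClose": 20.0},
    {"ticker": "AAA", "tngoLast": 9.75, "prevClose": 10.0},
]

quotes = index_quotes(response)
# Symbols with no quote (e.g. mutual funds) fall back to the candle close.
missing = [t for t in requested if t not in quotes]
print(quotes["AAA"], missing)
```

Pairing `requested[i]` with `response[i]` here would silently assign BBB's price to AAA, which is the failure mode the note warns about.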

Websocket streaming (follow-on)

Tiingo's IEX websocket would be the natural follow-on for true push-based real-time, replacing poll-on-r entirely. Materially bigger than the REST quote path (persistent connection, reconnect handling, a background task feeding the TUI) and squarely a paid-plan feature. Sequence it after the REST quote path proves out.

Analysis: dividend equity / income-shaped equity - think about it

Dividend-equity ETFs (SCHD, VYM, DGRO, NOBL, SDY, VIG, etc.) bucket as Equity in analysis.bucketSector. That's correct for risk-exposure analysis - they drop with the market in a 2008-style crash, regardless of the dividend stream - but it loses the income-vs-growth distinction that retirement-planning tools care about.

Open question: is there a useful second dimension to add? Possibilities:

  • Yield-weighted breakdown. Aggregate current_yield per position, weight by market value, report a portfolio-level yield. Doesn't change the asset-class taxonomy; adds a new metric.
  • Income coverage of expenses. "My dividends + bond coupons cover X% of projected retirement spending." Closer to what the income-side framing actually wants - answers the question rather than redefining the buckets.
  • Income-equity sub-bucket within Equity. A sub-row in the Asset Category breakdown, not a 5th top-level bucket. Would need a way to mark funds as "income-shaped" - probably a per-symbol opt-in in metadata.srf.
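
The first two possibilities are plain arithmetic over existing position data. A minimal sketch in Python, assuming each position exposes a market value and a current_yield as a fraction; the helper names, the dict layout, and all figures are made up for illustration:

```python
def portfolio_yield(positions):
    """Market-value-weighted average of per-position current_yield."""
    total = sum(p["market_value"] for p in positions)
    if total == 0:
        return 0.0
    return sum(p["market_value"] * p["current_yield"] for p in positions) / total


def income_coverage(positions, annual_spending):
    """Fraction of projected annual spending covered by dividends and coupons."""
    income = sum(p["market_value"] * p["current_yield"] for p in positions)
    return income / annual_spending


# Invented holdings and yields, not real quotes.
positions = [
    {"symbol": "SCHD", "market_value": 60_000.0, "current_yield": 0.035},
    {"symbol": "AGG", "market_value": 40_000.0, "current_yield": 0.040},
]

print(f"portfolio yield: {portfolio_yield(positions):.2%}")
print(f"income covers {income_coverage(positions, annual_spending=74_000.0):.1%} of spending")
```

Neither metric touches the asset-class buckets, which is the appeal: SCHD still counts as Equity for risk purposes while its income shows up in a separate number.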

Not a bug. Not blocking anything. Could end up being a feature. This is a note to revisit after using the 4-bucket view for a while and seeing whether the missing dimension actually matters in practice.

Resist the temptation to:

  • Add a 5th top-level bucket ("Income Equity" / "Dividend Equity"). The 4-bucket view is already the right answer for "how much equity exposure do I have?". A 5th bucket fragments the headline number.
  • Override SCHD to Fixed Income. Wrong on risk grounds. SCHD could plausibly lose 35-45% in an equity crash; treating it as FI makes the user think they have downside protection they don't.
  • Add per-symbol "intent" metadata (held_for_income::true). This smells like putting framing into data. Intent is a property of the holder's strategy, not the security.

If a fix lands, it's probably a separate analysis section (yield breakdown, income coverage) - not a change to the asset-class taxonomy.

The following items are acknowledged but not prioritized. Listed here so they don't get lost; pick up opportunistically.

UX

  • CLI options command UX. The options command auto-expands only the nearest monthly expiration and lists others collapsed. Reconsider the interaction model - e.g. allow specifying an expiration date, showing all monthlies expanded by default, or filtering by strategy (covered calls, spreads).

Audit

  • Audit large-lot threshold tuning. src/commands/audit.zig uses audit_large_lot_threshold: f64 = 10_000.0 as the cutoff for "surface this new lot for confirmation." Revisit if $10k proves too aggressive (ESPP accruals spam the report) or too permissive (large DRIP confirmations slip past). If runtime tuning becomes necessary, a --large-lot <amount> flag or a global audit_large_lot_threshold field on accounts.srf would be reasonable extensions.

Infra / performance

  • HTTP connection pooling. Parallel server sync in loadAllPrices spawns up to 8 threads, each with its own HTTP connection. Could reuse connections to reduce TCP handshake overhead. Only matters with very large portfolios (100+ symbols) hitting ZFIN_SERVER.
  • Streaming cache deserialization. Cache store reads entire files into memory (readFileAlloc with 50MB limit). For portfolios with 10+ years of daily candles, this could use significant memory. Keep current approach unless memory becomes a real problem.