zfin/docs/explanation/caching.md
2026-06-25 14:46:08 -07:00

Caching and data freshness

zfin makes a lot of API calls on your behalf -- prices, dividends, earnings, ETF holdings -- against providers with strict free-tier limits. Aggressive caching is what keeps it fast and keeps you well under those limits. This page explains how it works so the --refresh-data flag makes sense.

The fetch path

Every data request walks the same tiers, stopping at the first one that can satisfy it:

  1. Local cache. Look for ~/.cache/zfin/<SYMBOL>/<type>.srf. If the file exists and is within its TTL, deserialize and return -- no network at all.
  2. Shared server (optional). On a miss or stale entry, if ZFIN_SERVER is set, zfin asks that server before any provider; a hit is written into your local cache and served from there, so no provider call happens. See Server sync.
  3. Provider. Otherwise zfin fetches from the upstream provider, writes the result to the cache, and returns it.

Freshness is decided by the cache file's modification time versus the TTL for that data type. The cache directory defaults to ~/.cache/zfin and is set with ZFIN_CACHE_DIR.

The --refresh-data policy decides which tiers run:

  • auto (default) walks all three.
  • force skips the local cache and the server, going straight to the provider, then re-caches the result.
  • never stops at the local cache: it returns cached data even if stale, and never touches the server or a provider.
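The tiers and the three policies above can be sketched as one function. This is a minimal illustration rather than zfin's actual code: the `fetch` function and its callback parameters are hypothetical names, and each tier is passed in as a plain callable so the control flow stays visible.

```python
from typing import Callable, Optional

Fetcher = Callable[[], Optional[bytes]]


def fetch(policy: str,
          read_cache: Fetcher,
          is_fresh: Callable[[], bool],
          write_cache: Callable[[bytes], None],
          ask_server: Optional[Fetcher],
          ask_provider: Fetcher) -> Optional[bytes]:
    """Walk local cache -> shared server -> provider, honoring the policy.

    All parameter names are assumptions made for this sketch.
    `ask_server` is None when no shared server is configured.
    """
    if policy not in ("auto", "force", "never"):
        raise ValueError(f"unknown refresh policy: {policy}")

    if policy == "never":
        # Offline: return whatever is cached, stale or not, and stop.
        return read_cache()

    if policy == "auto":
        # Tier 1: a fresh local entry means no network at all.
        cached = read_cache()
        if cached is not None and is_fresh():
            return cached
        # Tier 2: the optional shared server; a hit is cached locally.
        if ask_server is not None:
            data = ask_server()
            if data is not None:
                write_cache(data)
                return data

    # Tier 3: reached by "force" directly, or by "auto" after two misses.
    data = ask_provider()
    if data is not None:
        write_cache(data)
    return data
```

Passing `ask_server=None` models an unset ZFIN_SERVER: the second tier simply drops out and the path becomes local cache, then provider.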

Time-to-live by data type

Different data ages at different rates, so each type has its own TTL:

| Data type | TTL | Why |
|---|---|---|
| Daily candles | market-aware | Keyed to the next time a fresh bar is expected, not a rolling window (see below) |
| Dividends | 14 days | Declared well in advance |
| Splits | 14 days | Rare corporate events |
| Options | 1 hour | Prices move continuously when markets are open |
| Earnings | 30 days* | Quarterly; smart-refreshed around announcements |
| ETF profiles | ~30 days | Holdings and weights change slowly |
| Quotes | never cached | Meant to be a live price check |

* Earnings smart refresh: even inside the 30-day window, cached earnings re-fetch automatically once an earnings date has passed but the cache still lacks the actual result -- so numbers appear promptly after an announcement without daily polling.
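As a rough sketch of how the fixed TTLs and the earnings exception could be checked: the dictionary keys, function names, and the `(report_date, actual)` event shape below are all assumptions for illustration, not zfin's internal names. Candles are left out because their freshness is market-aware, and the ETF profile value treats "~30 days" as exactly 30.

```python
import os
import time
from datetime import date, timedelta
from typing import Iterable, Optional, Tuple

# TTLs from the table above; the keys are hypothetical identifiers.
TTL = {
    "dividends": timedelta(days=14),
    "splits": timedelta(days=14),
    "options": timedelta(hours=1),
    "earnings": timedelta(days=30),
    "etf_profile": timedelta(days=30),  # documented as approximately 30 days
}


def is_fresh_mtime(mtime: float, data_type: str, now: float) -> bool:
    """Fresh while the entry's age is strictly below the TTL for its type."""
    return (now - mtime) < TTL[data_type].total_seconds()


def is_fresh(path: str, data_type: str) -> bool:
    """Compare a cache file's modification time against its TTL."""
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return False  # a missing or unreadable file counts as a miss
    return is_fresh_mtime(mtime, data_type, time.time())


def earnings_needs_refresh(mtime: float, now: float,
                           events: Iterable[Tuple[date, Optional[float]]],
                           today: date) -> bool:
    """Refresh when the TTL has lapsed, or when an earnings date has passed
    but the cached entry still has no actual result for it."""
    if not is_fresh_mtime(mtime, "earnings", now):
        return True
    return any(day < today and actual is None for day, actual in events)
```

The second function is the interesting one: it lets a 30-day TTL coexist with prompt updates, because the only thing that forces an early re-fetch is a past event with a missing result.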

Market-aware candle freshness

A daily bar only becomes meaningful once the market settles, so candle freshness is keyed to the market clock rather than a rolling 24-hour window. Each cached candle's expiry is set to the next moment fresh data should be available:

  • Equities and ETFs settle shortly after the 16:00 ET close. Their bars expire at 16:55 ET on the next trading day (weekends and NYSE holidays are skipped). The 55-minute margin gives providers time to become consistent with the market, while leaving five minutes before a refresh task scheduled at the top of the hour.
  • Mutual funds strike a single daily NAV that isn't reliably published until the next morning, so their bars expire at 03:25 ET the morning after a trading session. Despite Tiingo's claims, NAVs only seem to become reliably available at about 3am Eastern. As with equities, the boundary sits five minutes ahead of a scheduled job, here one at the bottom of the hour.

This keeps the expiry boundary out of trading hours, so a refresh fired just after it always sees a finalized bar instead of a half-formed one, and an interactive command run mid-session won't trigger a needless refetch. If a refresh runs but the provider hasn't posted the just-closed bar yet, the entry is retried in ~30 minutes rather than waiting a full day.
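The expiry computation might look like the sketch below. It makes several assumptions: datetimes are naive values understood to be in Eastern Time, the holiday set is supplied by the caller, all function names are hypothetical, and "the morning after a trading session" is read as the calendar morning following the next session after the cached bar.

```python
from datetime import date, datetime, time, timedelta
from typing import FrozenSet

EQUITY_BOUNDARY = time(16, 55)  # ET, after the 16:00 close
FUND_BOUNDARY = time(3, 25)     # ET, morning after a session


def is_trading_day(d: date, holidays: FrozenSet[date] = frozenset()) -> bool:
    """Weekdays that are not on the (caller-supplied) holiday calendar."""
    return d.weekday() < 5 and d not in holidays


def next_trading_day(d: date, holidays: FrozenSet[date] = frozenset()) -> date:
    """First trading day strictly after `d`."""
    d += timedelta(days=1)
    while not is_trading_day(d, holidays):
        d += timedelta(days=1)
    return d


def equity_expiry(last_bar: date,
                  holidays: FrozenSet[date] = frozenset()) -> datetime:
    """A bar for `last_bar` expires at 16:55 ET on the next trading day."""
    return datetime.combine(next_trading_day(last_bar, holidays),
                            EQUITY_BOUNDARY)


def fund_expiry(last_bar: date,
                holidays: FrozenSet[date] = frozenset()) -> datetime:
    """A NAV for `last_bar` expires at 03:25 ET on the calendar morning
    after the next trading session (an interpretation, see lead-in)."""
    session = next_trading_day(last_bar, holidays)
    return datetime.combine(session + timedelta(days=1), FUND_BOUNDARY)
```

For a bar dated Friday 2024-06-07, this puts the equity expiry at Monday 16:55 ET and the mutual-fund expiry at Tuesday 03:25 ET, both outside trading hours.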

Un-modeled closures self-correct. Some market closures aren't on the modeled holiday calendar (Good Friday, which needs the Easter computus, plus ad-hoc closures for national mourning or weather). On such a day the calendar thinks a bar is due, the fetch keeps coming back empty, and the ~30-minute retry would otherwise repeat all day -- and, for a Friday closure, all weekend. To avoid that thrash, once an expected bar is ~90 minutes overdue the cache concludes the market was closed and falls back to the normal next-session boundary -- at most three 30-minute retries. That window comfortably covers ordinary provider posting lag, so genuinely-late data is still picked up by the short retry; only a true closure trips the fallback.
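The retry-then-give-up rule can be expressed as a single decision made after each empty fetch. The function and constant names are hypothetical, and the approximate figures from the text are treated as exact 30 and 90 minutes.

```python
from datetime import datetime, timedelta

RETRY_INTERVAL = timedelta(minutes=30)   # "~30 minutes" in the text
CLOSURE_CUTOFF = timedelta(minutes=90)   # "~90 minutes overdue"


def next_attempt_after_empty_fetch(expected_at: datetime,
                                   now: datetime,
                                   next_session_boundary: datetime) -> datetime:
    """Decide when to try again after a fetch returned no new bar.

    `expected_at` is the boundary at which the bar was due. Once the bar
    is overdue by the cutoff, assume the market was closed and wait for
    the normal next-session boundary instead of retrying.
    """
    if now - expected_at >= CLOSURE_CUTOFF:
        return next_session_boundary
    return now + RETRY_INTERVAL
```

Starting from an empty fetch exactly at the boundary, this yields retries at +30, +60, and +90 minutes; the third empty result trips the fallback, which matches the "at most three 30-minute retries" bound.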

Warming a shared cache on a schedule. If you run a cron to warm a server cache (or your own local cache), the boundaries above are also the natural cron times: a run shortly after 17:00 ET picks up the day's equity/ETF closes, and a run shortly after 03:30 ET picks up the prior session's mutual-fund NAVs. The boundaries sit five minutes before those times so the cron reliably finds the cache already expired.

Quotes are never cached

Because quotes exist to give you a live price, they're never served from cache. The practical consequence: in offline mode (--refresh-data=never) the quote command has nothing to serve, while candle-based commands like perf work fine from cached history.

Incremental candle updates

Price history isn't re-downloaded wholesale. When cached candles have gone stale, zfin fetches only candles newer than the last cached date and appends them, using a small candles_meta.srf companion file to track the last date and source provider. A ten-year history costs one big fetch the first time and tiny top-ups thereafter.
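A minimal sketch of the top-up idea follows. The `Candle` and `CandleMeta` types and the `fetch_since` callback are hypothetical stand-ins; the real on-disk layout of candles_meta.srf is not described here, so only the two fields the text mentions are modeled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class Candle:
    day: date
    close: float


@dataclass
class CandleMeta:
    """Stand-in for the companion metadata: last cached date and source."""
    last_date: Optional[date]
    provider: str


def top_up(cached: List[Candle], meta: CandleMeta,
           fetch_since: Callable[[Optional[date]], List[Candle]]) -> int:
    """Append only candles newer than the last cached date.

    `fetch_since(None)` is assumed to return the full history, which is
    the one big fetch on first use. Returns the number of candles added.
    """
    fetched = fetch_since(meta.last_date)
    # Guard against providers that include the boundary day or older bars.
    fresh = [c for c in fetched
             if meta.last_date is None or c.day > meta.last_date]
    fresh.sort(key=lambda c: c.day)
    cached.extend(fresh)
    if fresh:
        meta.last_date = fresh[-1].day
    return len(fresh)
```

Filtering on the stored last date keeps the append idempotent: running the top-up twice in a row adds nothing the second time.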

Negative caching

When a provider permanently fails for a symbol -- a nonexistent ticker, say -- zfin records a negative cache entry so it doesn't retry the same dead lookup on every run. (Transient failures like rate limits are not cached this way; they're retried.)
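The distinction between permanent and transient failures can be sketched like this. The exception classes, the in-memory `negative` set, and the `lookup` function are assumptions for illustration; how zfin actually persists negative entries is not covered here.

```python
from typing import Callable, Optional, Set


class PermanentError(Exception):
    """The provider says the symbol cannot be served, e.g. unknown ticker."""


class TransientError(Exception):
    """A temporary condition such as a rate limit or a timeout."""


def lookup(symbol: str, negative: Set[str],
           provider: Callable[[str], bytes]) -> Optional[bytes]:
    """Fetch `symbol`, remembering permanent failures in `negative`."""
    if symbol in negative:
        return None  # known-dead lookup: skip the provider entirely
    try:
        return provider(symbol)
    except PermanentError:
        negative.add(symbol)  # record it so later runs don't retry
        return None
    # TransientError is deliberately not caught: it propagates so the
    # caller can retry, and nothing is written to the negative cache.
```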

Rate limiting

Each provider has a client-side token-bucket limiter sized to its free-tier ceiling (e.g. Polygon 5/min, FMP 250/day). When you'd exceed the rate, zfin blocks until a token is available rather than firing a request that would 429. This is why a --refresh-data=force run across many symbols can pace itself instead of failing. Limits are listed in Data providers and API keys.
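For readers unfamiliar with token buckets, here is a small blocking limiter in that style. It is a generic sketch, not zfin's implementation: the class name and its parameters are hypothetical, and the clock and sleep functions are injectable only so the behavior can be exercised without real waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Blocking token bucket: `capacity` requests per `per_seconds`."""

    _EPS = 1e-9  # tolerance so float rounding cannot stall the wait loop

    def __init__(self, capacity: int, per_seconds: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.capacity = capacity
        self.rate = capacity / per_seconds  # tokens regained per second
        self.tokens = float(capacity)       # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        gained = (now - self.last) * self.rate
        self.tokens = min(float(self.capacity), self.tokens + gained)
        self.last = now

    def acquire(self) -> None:
        """Take one token, sleeping until one is available if needed."""
        self._refill()
        while self.tokens < 1 - self._EPS:
            self.sleep((1 - self.tokens) / self.rate)
            self._refill()
        self.tokens -= 1


# Sized to the example ceiling from the text: 5 requests per minute.
polygon_limiter = TokenBucket(capacity=5, per_seconds=60.0)
```

Calling `polygon_limiter.acquire()` before each request lets the first five go out immediately and spaces later ones about twelve seconds apart, so the run slows down instead of drawing a 429.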

Server sync (ZFIN_SERVER)

ZFIN_SERVER points zfin at an optional zfin-server instance -- a shared cache that sits between your local cache and the upstream providers, and is the second tier of the fetch path. On a local miss, zfin requests GET {ZFIN_SERVER}/<SYMBOL>/<type> (candles, dividends, splits, options, earnings, classification, ETF metrics, and EDGAR entity facts), and a hit is written straight into your local cache.

Why bother: the server is warmed once -- say by a cron job on one machine -- and then every client draws from it instead of each spending its own provider quota, so a household or a fleet of machines shares one set of API-key budgets and gets faster cold starts. For the portfolio price load, the server is queried in parallel across symbols, with per-symbol provider fallback only for what it can't supply.

It is entirely optional: when ZFIN_SERVER is unset, every server-sync path silently no-ops and zfin runs local-cache-then-provider. Live quotes are never served by the server (they aren't cached anywhere), and --refresh-data=force bypasses the server to re-fetch from the provider.

Controlling it

You rarely need to intervene -- auto does the right thing. When you do:

  • --refresh-data=force re-fetches everything (after a close, or to clear suspected bad data).
  • --refresh-data=never goes fully offline.
  • zfin cache stats shows what's cached; zfin cache clear wipes it (everything re-fetches next run).

See Offline use and refreshing data.