zfin/docs/explanation/caching.md
2026-06-25 14:46:08 -07:00


# Caching and data freshness
zfin makes a lot of API calls on your behalf -- prices, dividends,
earnings, ETF holdings -- against providers with strict free-tier
limits. Aggressive caching is what keeps it fast and keeps you well
under those limits. This page explains how it works so the
[`--refresh-data`](../guides/offline-and-refresh.md) flag makes sense.
## The fetch path
Every data request walks the same tiers, stopping at the first one that
can satisfy it:
1. **Local cache.** Look for `~/.cache/zfin/<SYMBOL>/<type>.srf`. If the
file exists and is within its TTL, deserialize and return -- no
network at all.
2. **Shared server** *(optional)*. On a miss or stale entry, if
`ZFIN_SERVER` is set, zfin asks that server before any provider; a
hit is written into your local cache and served from there, so no
provider call happens. See [Server sync](#server-sync-zfin_server).
3. **Provider.** Otherwise zfin fetches from the upstream provider,
writes the result to the cache, and returns it.

Freshness is decided by the cache file's modification time versus the
TTL for that data type. The cache directory defaults to `~/.cache/zfin`
and is set with `ZFIN_CACHE_DIR`.

The `--refresh-data` policy decides which tiers run:
- `auto` (default) walks all three.
- `force` skips the local cache and the server, going straight to the
provider, then re-caches the result.
- `never` stops at the local cache: it returns cached data even if
stale, and never touches the server or a provider.
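The tiers and policies above can be sketched in a few lines. This is an illustration only, not zfin's implementation: the `Refresh` enum, the `fetch` function, and the callback parameters are hypothetical names chosen for this example.

```python
from enum import Enum
from typing import Callable, Optional


class Refresh(Enum):
    AUTO = "auto"
    FORCE = "force"
    NEVER = "never"


def fetch(
    policy: Refresh,
    # Returns (data, is_fresh), or None when nothing is cached.
    read_cache: Callable[[], Optional[tuple[bytes, bool]]],
    write_cache: Callable[[bytes], None],
    # None models an unset ZFIN_SERVER.
    ask_server: Optional[Callable[[], Optional[bytes]]],
    ask_provider: Callable[[], bytes],
) -> Optional[bytes]:
    # force never looks at the local cache.
    cached = None if policy is Refresh.FORCE else read_cache()

    # never returns whatever is cached, stale or not, and stops there.
    if policy is Refresh.NEVER:
        return cached[0] if cached else None

    # Tier 1: a fresh local entry satisfies the request.
    if cached and cached[1]:
        return cached[0]

    # Tier 2: the shared server, only under auto and only when configured.
    if policy is Refresh.AUTO and ask_server is not None:
        data = ask_server()
        if data is not None:
            write_cache(data)
            return data

    # Tier 3: the upstream provider; the result is always re-cached.
    data = ask_provider()
    write_cache(data)
    return data
```

Passing the tiers in as callbacks keeps the policy logic separate from file and network I/O, which also makes each branch easy to exercise in isolation.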
## Time-to-live by data type
Different data ages at different rates, so each type has its own TTL:
| Data type | TTL | Why |
|---------------|---------------|----------------------------------------------------------------------------------|
| Daily candles | market-aware | Keyed to the next time a fresh bar is expected, not a rolling window (see below) |
| Dividends | 14 days | Declared well in advance |
| Splits | 14 days | Rare corporate events |
| Options | 1 hour | Prices move continuously when markets are open |
| Earnings | 30 days\* | Quarterly; smart-refreshed around announcements |
| ETF profiles | ~30 days | Holdings and weights change slowly |
| Quotes | never cached | Meant to be a live price check |

\* **Earnings smart refresh:** even inside the 30-day window, cached
earnings re-fetch automatically once an earnings date has passed but
the cache still lacks the actual result -- so numbers appear promptly
after an announcement without daily polling.
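As a rough sketch of how the fixed TTLs and the earnings smart refresh could combine: the type keys, the `EarningsEvent` shape, and `needs_refresh` are all hypothetical names for this example, and zfin's own cache format may look quite different. Candles and quotes are left out because they do not use a fixed TTL.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

# Fixed TTLs from the table, in days (keys are illustrative).
TTL_DAYS = {
    "dividends": 14,
    "splits": 14,
    "options": 1 / 24,  # 1 hour
    "earnings": 30,
    "etf_profile": 30,
}


@dataclass
class EarningsEvent:
    report_date: date
    actual_eps: Optional[float]  # None until the actual result is known


def needs_refresh(
    data_type: str,
    age_days: float,
    today: date,
    earnings: Sequence[EarningsEvent] = (),
) -> bool:
    # Past the TTL: always refresh.
    if age_days >= TTL_DAYS[data_type]:
        return True
    # Smart refresh: an earnings date has passed but the cached
    # entry still has no actual result for it.
    if data_type == "earnings":
        return any(
            e.report_date < today and e.actual_eps is None for e in earnings
        )
    return False
```

For example, a five-day-old earnings entry with a report dated yesterday and no `actual_eps` would be refreshed, while the same entry with the result filled in would be served from cache.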
## Market-aware candle freshness
A daily bar only becomes meaningful once the market settles, so candle
freshness is keyed to the market clock rather than a rolling 24-hour
window. Each cached candle's expiry is set to the next moment fresh data
should be available:
- **Equities and ETFs** settle shortly after the 16:00 ET close. Their
bars expire at **16:55 ET** on the next trading day (weekends and NYSE
holidays are skipped). The 55-minute margin gives providers time to become
consistent with the market, and still leaves a few minutes before a refresh
task scheduled at the top of the hour.
- **Mutual funds** strike a single daily NAV that isn't reliably
published until the next morning, so their bars expire at **03:25 ET**
the morning after a trading session. Despite Tiingo's claims, NAVs don't
seem to be reliably available until about 3am Eastern. The boundary is again
timed so that a job scheduled for the bottom of the hour runs just after it.

This keeps the expiry boundary out of trading hours, so a refresh fired
just after it always sees a finalized bar instead of a half-formed one,
and an interactive command run mid-session won't trigger a needless
refetch. If a refresh runs but the provider hasn't posted the just-closed
bar yet, the entry is retried in ~30 minutes rather than waiting a full
day.
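One way to read the two boundaries is "the next 16:55 ET that falls on a trading day" for equities and ETFs, and "the next 03:25 ET that follows a trading day" for mutual funds. The sketch below implements that reading under stated assumptions: datetimes are naive and taken to already be in Eastern Time, the holiday set is supplied by the caller, and all function names are hypothetical.

```python
from datetime import date, datetime, time, timedelta

EQUITY_BOUNDARY = time(16, 55)  # after the 16:00 ET close
FUND_BOUNDARY = time(3, 25)     # the morning after a session


def is_trading_day(d: date, holidays: frozenset = frozenset()) -> bool:
    # Weekdays that are not on the (caller-supplied) holiday calendar.
    return d.weekday() < 5 and d not in holidays


def next_equity_expiry(now_et: datetime, holidays: frozenset = frozenset()) -> datetime:
    # Earliest 16:55 strictly after now that lands on a trading day.
    d = now_et.date()
    while True:
        candidate = datetime.combine(d, EQUITY_BOUNDARY)
        if is_trading_day(d, holidays) and candidate > now_et:
            return candidate
        d += timedelta(days=1)


def next_fund_expiry(now_et: datetime, holidays: frozenset = frozenset()) -> datetime:
    # Earliest 03:25 strictly after now whose previous calendar day
    # was a trading session.
    d = now_et.date()
    while True:
        candidate = datetime.combine(d, FUND_BOUNDARY)
        if is_trading_day(d - timedelta(days=1), holidays) and candidate > now_et:
            return candidate
        d += timedelta(days=1)
```

Under this reading, a fetch on a Friday evening yields an equity expiry on Monday afternoon, and a fetch on Saturday yields a mutual-fund expiry early on Tuesday, after Monday's session.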
**Un-modeled closures self-correct.** Some market closures aren't on the
modeled holiday calendar (Good Friday, which needs the Easter computus,
plus ad-hoc closures for national mourning or weather). On such a day the
calendar thinks a bar is due, the fetch keeps coming back empty, and the
~30-minute retry would otherwise repeat all day -- and, for a Friday
closure, all weekend. To avoid that thrash, once an expected bar is ~90
minutes overdue the cache concludes the market was closed and falls back
to the normal next-session boundary -- at most three 30-minute retries.
That window comfortably covers ordinary provider posting lag, so
genuinely-late data is still picked up by the short retry; only a true
closure trips the fallback.
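The retry-then-fallback decision reduces to one comparison. The sketch below assumes exact 30- and 90-minute values (the text gives both as approximate) and uses hypothetical names throughout.

```python
from datetime import datetime, timedelta

RETRY_INTERVAL = timedelta(minutes=30)     # approximate in the text
CLOSURE_THRESHOLD = timedelta(minutes=90)  # approximate in the text


def next_check_after_empty_fetch(
    now: datetime,
    bar_due_at: datetime,
    next_session_boundary: datetime,
) -> datetime:
    """Decide when to look again after a fetch returned no new bar."""
    overdue = now - bar_due_at
    if overdue >= CLOSURE_THRESHOLD:
        # Treat the day as an un-modeled closure and wait for the
        # normal boundary of the next session.
        return next_session_boundary
    # Ordinary provider lag: try again shortly.
    return now + RETRY_INTERVAL
```

With a bar due at 16:55, empty fetches at 16:55, 17:25, and 17:55 each schedule another retry; the empty fetch at 18:25 is 90 minutes overdue and falls back to the next session, so three retries are the most that can happen.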
**Warming a shared cache on a schedule.** If you run a cron to warm a
[server cache](#server-sync-zfin_server) (or your own local cache), the
boundaries above are also the natural cron times: a run shortly after
**17:00 ET** picks up the day's equity/ETF closes, and a run shortly
after **03:30 ET** picks up the prior session's mutual-fund NAVs. The
boundaries sit five minutes before those times so the cron
reliably sees the cache already expired.
## Quotes are never cached
Because quotes exist to give you a live price, they're never served
from cache. The practical consequence: in offline mode
(`--refresh-data=never`) the [`quote`](../reference/cli/quote.md)
command has nothing to serve, while candle-based commands like
[`perf`](../reference/cli/perf.md) work fine from cached history.
## Incremental candle updates
Price history isn't re-downloaded wholesale. When cached candles go stale,
zfin fetches only candles newer than the last cached date and appends them,
using a small `candles_meta.srf` companion file to track the last date
and source provider. A ten-year history costs one big fetch the first
time and tiny top-ups thereafter.
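The append step can be sketched as follows. Candles are modeled here as `(date, close)` tuples for brevity, and `append_new_candles` is a hypothetical name; the real `.srf` layout and the contents of `candles_meta.srf` are not shown in this page.

```python
from datetime import date
from typing import List, Optional, Tuple

Candle = Tuple[date, float]  # (trading date, close); simplified for the example


def append_new_candles(
    cached: List[Candle], fetched: List[Candle]
) -> Tuple[List[Candle], Optional[date]]:
    """Append only candles newer than the last cached date.

    Both lists are assumed to be sorted by date, oldest first.
    Returns the merged history and the new last date, which is the
    kind of value a metadata companion file would track.
    """
    last = cached[-1][0] if cached else None
    new = [c for c in fetched if last is None or c[0] > last]
    merged = cached + new
    return merged, (new[-1][0] if new else last)
```

Filtering on the last cached date also makes the merge safe when a provider returns an overlapping range: already-cached dates are dropped rather than duplicated.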
## Negative caching
When a provider permanently fails for a symbol -- a nonexistent
ticker, say -- zfin records a negative cache entry so it doesn't retry
the same dead lookup on every run. (Transient failures like rate limits
are not cached this way; they're retried.)
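A minimal sketch of the distinction, assuming failures can be sorted into two exception classes. `PermanentError`, `TransientError`, and `lookup` are hypothetical names, and an in-memory set stands in for whatever zfin actually writes to disk.

```python
from typing import Callable, Optional, Set


class PermanentError(Exception):
    """The lookup can never succeed, e.g. a nonexistent ticker."""


class TransientError(Exception):
    """The lookup may succeed later, e.g. a rate limit."""


def lookup(
    symbol: str,
    fetch: Callable[[str], dict],
    negative: Set[str],
) -> Optional[dict]:
    # A recorded dead lookup short-circuits without any provider call.
    if symbol in negative:
        return None
    try:
        return fetch(symbol)
    except PermanentError:
        # Remember the failure so later runs skip this symbol.
        negative.add(symbol)
        return None
    # TransientError is deliberately not caught here: it propagates
    # so the caller can retry, and nothing is recorded.
```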
## Rate limiting
Each provider has a client-side token-bucket limiter sized to its
free-tier ceiling (e.g. Polygon 5/min, FMP 250/day). When you'd exceed
the rate, zfin blocks until a token is available rather than firing a
request that would 429. This is why a `--refresh-data=force` run across
many symbols can pace itself instead of failing. Limits are listed in
[Data providers and API keys](../reference/providers.md).
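For readers unfamiliar with the technique, here is a generic token-bucket limiter that blocks instead of failing. It is a textbook sketch, not zfin's limiter: the class name and constructor parameters are assumptions, and the clock and sleep functions are injectable only to make the behavior easy to check.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(
        self,
        capacity: int,
        period_s: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # capacity tokens are replenished evenly over period_s seconds.
        self.capacity = capacity
        self.interval = period_s / capacity  # seconds per token
        self.tokens = float(capacity)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        earned = (now - self.last) / self.interval
        self.tokens = min(float(self.capacity), self.tokens + earned)
        self.last = now

    def acquire(self) -> None:
        """Block until one token is available, then consume it."""
        self._refill()
        while self.tokens < 1:
            # Sleep just long enough for the missing fraction to accrue.
            self.sleep((1 - self.tokens) * self.interval)
            self._refill()
        self.tokens -= 1
```

A limiter for a 5-per-minute ceiling would be `TokenBucket(5, 60)`: the first five calls to `acquire()` return immediately, and the sixth waits about 12 seconds for the next token.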
## Server sync (`ZFIN_SERVER`)
`ZFIN_SERVER` points zfin at an optional
[zfin-server](https://git.lerch.org/lobo/zfin-server) instance -- a
shared cache that sits between your local cache and the upstream
providers, and is the second tier of [the fetch path](#the-fetch-path).
On a local miss, zfin requests `GET {ZFIN_SERVER}/<SYMBOL>/<type>`
(candles, dividends, splits, options, earnings, classification, ETF
metrics, and EDGAR entity facts), and a hit is written straight into
your local cache.
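The request shape above can be illustrated with a small URL builder. Only the `{ZFIN_SERVER}/<SYMBOL>/<type>` pattern comes from this page; the `server_url` helper, the percent-encoding of path segments, and the `candles` type string used below are assumptions for the example.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote


def server_url(
    symbol: str,
    data_type: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Build the GET URL for a symbol and data type.

    Returns None when ZFIN_SERVER is unset or empty, mirroring the
    behavior that the server tier is skipped in that case.
    """
    base = env.get("ZFIN_SERVER")
    if not base:
        return None
    # Percent-encode each path segment (an assumption, not confirmed
    # by the source) and avoid a doubled slash after the base URL.
    return f"{base.rstrip('/')}/{quote(symbol, safe='')}/{quote(data_type, safe='')}"
```

With `ZFIN_SERVER=https://cache.example/`, `server_url("AAPL", "candles")` yields `https://cache.example/AAPL/candles`.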
Why bother: the server is warmed once -- say by a cron job on one
machine -- and then every client draws from it instead of each spending
its own provider quota, so a household or a fleet of machines shares one
set of API-key budgets and gets faster cold starts. For the portfolio
price load, the server is queried in parallel across symbols, with
per-symbol provider fallback only for what it can't supply.
It is entirely optional: when `ZFIN_SERVER` is unset, every server-sync
path silently no-ops and zfin runs local-cache-then-provider. Live
quotes are never served by the server (they aren't cached anywhere), and
`--refresh-data=force` bypasses the server to re-fetch from the provider.
## Controlling it
You rarely need to intervene -- `auto` does the right thing. When you
do:
- `--refresh-data=force` re-fetches everything (after a close, or to
clear suspected bad data).
- `--refresh-data=never` goes fully offline.
- [`zfin cache stats`](../reference/cli/cache.md) shows what's cached;
`zfin cache clear` wipes it (everything re-fetches next run).

See [Offline use and refreshing data](../guides/offline-and-refresh.md).