zfin/docs/explanation/caching.md
Emil Lerch 74fc219afd
add docs/guides
2026-06-22 14:53:53 -07:00


# Caching and data freshness
zfin makes a lot of API calls on your behalf -- prices, dividends,
earnings, ETF holdings -- against providers with strict free-tier
limits. Aggressive caching is what keeps it fast and keeps you well
under those limits. This page explains how it works so the
[`--refresh-data`](../guides/offline-and-refresh.md) flag makes sense.
## The fetch path
Every data request walks the same tiers, stopping at the first one that
can satisfy it:
1. **Local cache.** Look for `~/.cache/zfin/<SYMBOL>/<type>.srf`. If the
file exists and is within its TTL, deserialize and return -- no
network at all.
2. **Shared server** *(optional)*. On a miss or stale entry, if
`ZFIN_SERVER` is set, zfin asks that server before any provider; a
hit is written into your local cache and served from there, so no
provider call happens. See [Server sync](#server-sync-zfin_server).
3. **Provider.** Otherwise zfin fetches from the upstream provider,
writes the result to the cache, and returns it.

Freshness is decided by the cache file's modification time versus the
TTL for that data type. The cache directory defaults to `~/.cache/zfin`
and is set with `ZFIN_CACHE_DIR`.
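
The freshness rule can be sketched as a comparison between a file's modification time and a TTL. This is a minimal illustration, not zfin's actual code: the helper names `cache_dir` and `is_fresh` are hypothetical, and only the `ZFIN_CACHE_DIR` variable and the `~/.cache/zfin` default come from this page.

```python
import os
import time
from pathlib import Path
from typing import Optional


def cache_dir() -> Path:
    """Resolve the cache root: ZFIN_CACHE_DIR if set, else ~/.cache/zfin."""
    return Path(os.environ.get("ZFIN_CACHE_DIR", "~/.cache/zfin")).expanduser()


def is_fresh(path: Path, ttl_seconds: float, now: Optional[float] = None) -> bool:
    """Return True when the file exists and its mtime is within the TTL.

    Hypothetical helper: a missing file counts as not fresh (a cache miss).
    """
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return False
    current = time.time() if now is None else now
    return (current - mtime) < ttl_seconds
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it falls back to the wall clock.
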
The `--refresh-data` policy decides which tiers run:
- `auto` (default) walks all three.
- `force` skips the local cache and the server, going straight to the
provider, then re-caches the result.
- `never` stops at the local cache: it returns cached data even if
stale, and never touches the server or a provider.
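
The tier walk and the three policies can be sketched together. This is an illustrative model only: the `fetch` function and its callback parameters are hypothetical stand-ins for the cache, server, and provider, and `server_get=None` models an unset `ZFIN_SERVER`.

```python
from typing import Any, Callable, Optional, Tuple


def fetch(
    policy: str,
    read_cache: Callable[[], Optional[Tuple[Any, bool]]],
    server_get: Optional[Callable[[], Any]],
    provider_get: Callable[[], Any],
    write_cache: Callable[[Any], None],
) -> Any:
    """Walk cache -> server -> provider according to the refresh policy.

    read_cache() returns (data, is_fresh), or None when nothing is cached.
    """
    if policy not in ("auto", "force", "never"):
        raise ValueError(f"unknown policy: {policy}")

    if policy != "force":
        cached = read_cache()
        if policy == "never":
            # Offline: serve whatever is cached, stale or not, and stop.
            return cached[0] if cached is not None else None
        if cached is not None and cached[1]:
            return cached[0]  # tier 1: fresh local hit
        if server_get is not None:
            data = server_get()  # tier 2: shared server
            if data is not None:
                write_cache(data)
                return data

    data = provider_get()  # tier 3: upstream provider
    write_cache(data)
    return data
```

Note that `force` never reads the cache or the server but still writes the provider's result back, which matches the "then re-caches the result" behavior described above.
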
## Time-to-live by data type
Different data ages at different rates, so each type has its own TTL:
| Data type | TTL | Why |
|---------------|---------------|-------------------------------------------------------------|
| Daily candles | ~24h (23h45m) | One bar per trading day; slightly under 24h for cron jitter |
| Dividends | 14 days | Declared well in advance |
| Splits | 14 days | Rare corporate events |
| Options | 1 hour | Prices move continuously when markets are open |
| Earnings | 30 days\* | Quarterly; smart-refreshed around announcements |
| ETF profiles | ~30 days | Holdings and weights change slowly |
| Quotes | never cached | Meant to be a live price check |

\* **Earnings smart refresh:** even inside the 30-day window, cached
earnings re-fetch automatically once an earnings date has passed but
the cache still lacks the actual result -- so numbers appear promptly
after an announcement without daily polling.
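
The smart-refresh rule can be expressed as a small predicate. This is a sketch under assumptions: the function name, the `(report_date, actual)` event shape, and treating "passed" as strictly before today are all hypothetical, since this page does not specify them.

```python
from datetime import date, timedelta
from typing import Optional, Sequence, Tuple

EARNINGS_TTL = timedelta(days=30)  # from the TTL table on this page

# Hypothetical record shape: (report_date, actual_result_or_None).
EarningsEvent = Tuple[date, Optional[float]]


def earnings_needs_refresh(
    cache_age: timedelta, events: Sequence[EarningsEvent], today: date
) -> bool:
    """Refresh when the TTL has expired, or when a past event lacks its actual."""
    if cache_age >= EARNINGS_TTL:
        return True
    return any(
        report_date < today and actual is None for report_date, actual in events
    )
```

A cache holding only future dates, or past dates that already have results, stays valid for the full 30 days.
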
## Quotes are never cached
Because quotes exist to give you a live price, they're never served
from cache. The practical consequence: in offline mode
(`--refresh-data=never`) the [`quote`](../reference/cli/quote.md)
command has nothing to serve, while candle-based commands like
[`perf`](../reference/cli/perf.md) work fine from cached history.
## Incremental candle updates
Price history isn't re-downloaded wholesale. When cached candles go stale,
zfin fetches only candles newer than the last cached date and appends them,
using a small `candles_meta.srf` companion file to track the last date
and source provider. A ten-year history costs one big fetch the first
time and tiny top-ups thereafter.
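
The append-only update can be sketched as follows. The candle shape `(date, close)` and both function names are hypothetical, and starting the next request on the day after the last cached date is an assumption; the real `candles_meta.srf` format is not described here.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple

Candle = Tuple[date, float]  # hypothetical shape: (trading date, close)


def next_fetch_start(cached: List[Candle]) -> Optional[date]:
    """Day after the last cached candle, or None when a full fetch is needed."""
    if not cached:
        return None
    return cached[-1][0] + timedelta(days=1)


def merge_candles(cached: List[Candle], fetched: List[Candle]) -> List[Candle]:
    """Append only candles strictly newer than the last cached date."""
    last = cached[-1][0] if cached else None
    newer = [c for c in fetched if last is None or c[0] > last]
    return cached + newer
```

Filtering on the last cached date makes the merge safe even if the provider returns an overlapping bar.
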
## Negative caching
When a provider permanently fails for a symbol -- a nonexistent
ticker, say -- zfin records a negative cache entry so it doesn't retry
the same dead lookup on every run. (Transient failures like rate limits
are not cached this way; they're retried.)
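
The distinction between permanent and transient failures can be sketched like this. The exception classes, the `lookup` function, and the in-memory set are hypothetical; how zfin stores negative entries on disk is not covered on this page.

```python
from typing import Callable, Optional, Set


class PermanentError(Exception):
    """Hypothetical: the provider says the symbol does not exist."""


class TransientError(Exception):
    """Hypothetical: rate limit, timeout, or similar retryable failure."""


def lookup(
    symbol: str, fetch: Callable[[str], float], negative: Set[str]
) -> Optional[float]:
    """Fetch a symbol, remembering permanent failures so they are not retried."""
    if symbol in negative:
        return None  # known-dead lookup: skip the provider entirely
    try:
        return fetch(symbol)
    except PermanentError:
        negative.add(symbol)
        return None
    # TransientError is deliberately not caught: the caller may retry later.
```
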
## Rate limiting
Each provider has a client-side token-bucket limiter sized to its
free-tier ceiling (e.g. Polygon 5/min, FMP 250/day). When a request would
exceed the rate, zfin blocks until a token is available rather than
sending one that would be rejected with HTTP 429. This is why a `--refresh-data=force` run across
many symbols can pace itself instead of failing. Limits are listed in
[Data providers and API keys](../reference/providers.md).
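
A blocking token bucket can be sketched in a few lines. This is a generic illustration of the technique, not zfin's limiter: the class name, constructor parameters, and the injectable `clock` and `sleep` are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `capacity` requests per `per_seconds`, blocking when empty."""

    def __init__(
        self,
        capacity: int,
        per_seconds: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.capacity = capacity
        self.per_seconds = per_seconds
        self.tokens = float(capacity)  # start full
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        earned = (now - self.last) * self.capacity / self.per_seconds
        self.tokens = min(float(self.capacity), self.tokens + earned)
        self.last = now

    def acquire(self) -> None:
        """Take one token, sleeping until one is available if necessary."""
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep just long enough to earn the missing fraction of a token.
            self.sleep((1 - self.tokens) * self.per_seconds / self.capacity)
```

With a 5-per-minute bucket such as `TokenBucket(5, 60)`, the first five `acquire()` calls return immediately and the sixth waits for a refill instead of sending a request the provider would reject.
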
## Server sync (`ZFIN_SERVER`)
`ZFIN_SERVER` points zfin at an optional
[zfin-server](https://git.lerch.org/lobo/zfin-server) instance -- a
shared cache that sits between your local cache and the upstream
providers, and is the second tier of [the fetch path](#the-fetch-path).
On a local miss, zfin requests `GET {ZFIN_SERVER}/<SYMBOL>/<type>`
(candles, dividends, splits, options, earnings, classification, ETF
metrics, and EDGAR entity facts), and a hit is written straight into
your local cache.
Why bother: the server is warmed once -- say by a cron job on one
machine -- and then every client draws from it instead of each spending
its own provider quota, so a household or a fleet of machines shares one
set of API-key budgets and gets faster cold starts. For the portfolio
price load, the server is queried in parallel across symbols, with
per-symbol provider fallback only for what it can't supply.
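
The parallel load with per-symbol fallback might look like the sketch below. The `load_prices` function and its callbacks are hypothetical; `server_get` returning `None` stands in for "the server can't supply this symbol", and the worker count is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def load_prices(
    symbols: List[str],
    server_get: Callable[[str], Optional[float]],
    provider_get: Callable[[str], float],
    max_workers: int = 8,
) -> Dict[str, float]:
    """Query the shared server for all symbols in parallel, then fall back
    to the provider only for the symbols the server could not supply."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with symbols.
        server_results = list(pool.map(server_get, symbols))

    prices: Dict[str, float] = {}
    for symbol, result in zip(symbols, server_results):
        prices[symbol] = result if result is not None else provider_get(symbol)
    return prices
```

Keeping the provider fallback sequential means provider quota is spent only on the misses, and any rate limiting still applies to those calls one at a time.
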
It is entirely optional: when `ZFIN_SERVER` is unset, every server-sync
path silently no-ops and zfin runs local-cache-then-provider. Live
quotes are never served by the server (they aren't cached anywhere), and
`--refresh-data=force` bypasses the server to re-fetch from the provider.
## Controlling it
You rarely need to intervene -- `auto` does the right thing. When you
do:
- `--refresh-data=force` re-fetches everything (after a close, or to
clear suspected bad data).
- `--refresh-data=never` goes fully offline.
- [`zfin cache stats`](../reference/cli/cache.md) shows what's cached;
`zfin cache clear` wipes it (everything re-fetches next run).

See [Offline use and refreshing data](../guides/offline-and-refresh.md).