# wttr.in Architecture Documentation
## System Overview
wttr.in is a console-oriented weather forecast service with a hybrid Python/Go architecture:
- **Go proxy layer** (cmd/): LRU caching proxy with prefetching
- **Python backend** (lib/, bin/): Weather data fetching, formatting, rendering
- **Static assets** (share/): Translations, templates, emoji, help files
## Request Flow
```
Client Request
↓
Go Proxy (port 8082) - LRU cache + prefetch
↓ (cache miss)
Python Backend (port 8002) - Flask/gevent
↓
Location Resolution (GeoIP/IP2Location/IPInfo)
↓
Weather API (met.no or WorldWeatherOnline)
↓
Format & Render (ANSI/HTML/PNG/JSON/Prometheus)
↓
Response (cached with TTL 1000-2000s)
```
## Component Breakdown
### 1. Go Proxy Layer (cmd/)
**Files:**
- `cmd/srv.go` - Main HTTP server (port 8082)
- `cmd/processRequest.go` - Request processing & caching logic
- `cmd/peakHandling.go` - Peak time prefetching (cron-based)
**Responsibilities:**
- LRU cache (12,800 entries, 1000-1500s TTL)
- Cache key: `UserAgent:Host+URI:ClientIP:AcceptLanguage`
- Prefetch popular requests at :24 and :54 past the hour
- Forward cache misses to Python backend (127.0.0.1:9002)
- Handle concurrent requests (InProgress flag prevents thundering herd)
**Key Logic:**
- `dontCache()`: Skip caching for cyclic requests (location contains `:`)
- `getCacheDigest()`: Generate cache key from request metadata
- `processRequest()`: Main request handler with cache-aside pattern
- `savePeakRequest()`: Record requests at :30 and :00 for prefetching
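
The cache-aside flow with an in-progress marker can be sketched roughly as follows. This is a Python illustration, not the Go code in `cmd/processRequest.go`: the class name `CacheAsideProxy`, the `fetch` callback, and the use of `threading.Event` are assumptions made for the example, and LRU eviction is left out.

```python
import random
import threading
import time


class CacheAsideProxy:
    """Cache-aside lookup where only one caller fetches a missing key."""

    def __init__(self, fetch, ttl_range=(1000, 1500), clock=time.monotonic):
        self._fetch = fetch          # hypothetical backend call: key -> body
        self._ttl_range = ttl_range  # seconds, randomized per entry
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            usable = entry is not None and (
                entry["in_progress"] or entry["expires"] > self._clock()
            )
            if not usable:
                # Miss or expired: this caller becomes the owner of the fetch.
                entry = {
                    "in_progress": True,
                    "done": threading.Event(),
                    "body": None,
                    "expires": 0.0,
                }
                self._entries[key] = entry
            owner = not usable

        if owner:
            try:
                entry["body"] = self._fetch(key)
                entry["expires"] = self._clock() + random.uniform(*self._ttl_range)
            finally:
                # Wake waiters even if the fetch raised; the entry then stays
                # expired, so the next caller retries.
                entry["in_progress"] = False
                entry["done"].set()
        else:
            # Another caller is fetching (or has already fetched) this key.
            entry["done"].wait()
        return entry["body"]
```

Concurrent callers for the same key block on the event instead of each hitting the backend, which is the effect the InProgress flag is described as having.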
### 2. Python Backend (bin/, lib/)
#### Entry Points (bin/)
**bin/srv.py** - Main Flask application
- Listens on port 8002 (configurable via WTTRIN_SRV_PORT)
- Routes: `/`, `/<location>`, `/files/<path>`, `/favicon.ico`
- Uses gevent WSGI server for async I/O
- Delegates to `wttr_srv.wttr()` for all weather requests
**bin/proxy.py** - Weather API proxy (separate service)
- Caches weather API responses
- Transforms met.no/WWO data to standard JSON
- Test mode support (WTTRIN_TEST env var)
- Handles translations for weather conditions
**bin/geo-proxy.py** - Geolocation service proxy
- Not examined in detail (separate microservice)
#### Core Logic (lib/)
**lib/wttr_srv.py** - Main request handler
- `wttr(location, request)` - Entry point for all weather queries
- `parse_request()` - Parse location, language, format from request
- `_response()` - Generate response (checks cache, calls renderers)
- Rate limiting (300/min, 3600/hour, 24*3600/day per IP)
- ThreadPool (25 workers) for PNG rendering
- Two-phase processing: fast path (cache/static) then full path
**lib/parse_query.py** - Query string parsing
- Parse single-letter options: `n`=narrow, `m`=metric, `u`=imperial, `T`=no-terminal, etc.
- Parse PNG filenames: `City_200x_lang=ru.png` → structured dict
- Serialize/deserialize query state (base64+zlib for short URLs)
- Metric vs imperial logic (US IPs default to imperial)
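
A minimal sketch of the base64+zlib round trip for query state. The text only names the two encodings; the JSON step, the URL-safe alphabet, and the function names `serialize` and `deserialize` are assumptions for illustration.

```python
import base64
import json
import zlib


def serialize(parsed_query):
    """Pack a query dict into a short URL-safe token (JSON is an assumption)."""
    raw = json.dumps(parsed_query, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")


def deserialize(token):
    """Inverse of serialize(); returns None for malformed input."""
    try:
        raw = zlib.decompress(base64.urlsafe_b64decode(token))
        return json.loads(raw.decode("utf-8"))
    except (ValueError, zlib.error):
        # binascii.Error, JSONDecodeError and UnicodeDecodeError are all
        # ValueError subclasses.
        return None
```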
**lib/location.py** - Location resolution
- `location_processing()` - Main entry point
- IP → Location: GeoIP2 (MaxMind), IP2Location API, IPInfo API
- Location normalization (lowercase, strip special chars)
- Geolocator service (localhost:8004) for GPS coordinates
- IATA airport code support
- Alias resolution (share/aliases file)
- Blacklist checking (share/blacklist file)
- Hemisphere detection for moon phases
- Special prefixes:
  - `~` = search term (use geolocator)
  - `@` = domain name (resolve to IP first)
  - No prefix = exact location name
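
The prefix rules can be illustrated with a small classifier. The function name and the `"search"`, `"domain"`, and `"exact"` labels are hypothetical; only the prefixes themselves come from the list above.

```python
def classify_location(raw):
    """Return (kind, value) for a raw location string."""
    if raw.startswith("~"):
        return "search", raw[1:]   # hand the term to the geolocator
    if raw.startswith("@"):
        return "domain", raw[1:]   # resolve the domain to an IP first
    return "exact", raw            # use the name as given
```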
**lib/globals.py** - Configuration
- Environment variables: WTTR_MYDIR, WTTR_GEOLITE, WTTR_WEGO, etc.
- File paths: cache dirs, static files, translations
- API keys: IP2Location, IPInfo, WorldWeatherOnline
- Constants: NOT_FOUND_LOCATION, PLAIN_TEXT_AGENTS, QUERY_LIMITS
- IP location order: geoip → ip2location → ipinfo
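
The IP location order amounts to a fallback chain. A rough sketch, assuming each method is a callable that returns a location or `None`; the `resolvers` mapping and the function name `resolve_ip` are hypothetical.

```python
def resolve_ip(ip, resolvers, order=("geoip", "ip2location", "ipinfo")):
    """Try each configured method in order; return (method, location)."""
    for name in order:
        resolver = resolvers.get(name)
        if resolver is None:
            continue  # method not configured (for example, no API key)
        try:
            location = resolver(ip)
        except Exception:
            location = None  # treat a failing method as "no answer"
        if location:
            return name, location
    return None, None
```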
**lib/cache.py** - LRU cache (Python side)
- In-memory LRU (10,000 entries, pylru)
- File cache for large responses (>80 bytes)
- TTL: 1000-2000s (randomized)
- Cache key: `UserAgent:QueryString:ClientIP:Lang`
- Dynamic timestamp replacement: `%{{NOW(timezone)}}`
**lib/limits.py** - Rate limiting
- Per-IP query limits (minute/hour/day buckets)
- Whitelist support
- Returns 429 on limit exceeded
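
A minimal fixed-window sketch of per-IP minute/hour/day buckets. The class shape, the `check` method, and the decision never to purge old buckets are simplifications for illustration, not a description of `lib/limits.py`.

```python
import time

# Window length in seconds -> maximum queries per window (values from the text).
LIMITS = {60: 300, 3600: 3600, 24 * 3600: 24 * 3600}


class Limits:
    def __init__(self, whitelist=()):
        self._whitelist = set(whitelist)
        self._counters = {}  # (ip, window, bucket index) -> count

    def check(self, ip, now=None):
        """Return 200 if the query is allowed, 429 if any bucket is full."""
        if ip in self._whitelist:
            return 200
        now = time.time() if now is None else now
        keys = [(ip, window, int(now // window)) for window in LIMITS]
        if any(self._counters.get(k, 0) >= LIMITS[k[1]] for k in keys):
            return 429
        for k in keys:
            self._counters[k] = self._counters.get(k, 0) + 1
        return 200
```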
#### View Renderers (lib/view/)
**lib/view/wttr.py** - Main weather view
- Calls `wego` (Go binary) for weather rendering
- Passes flags: -inverse, -wind_in_ms, -narrow, -lang, -imperial
- Post-processes output (location name, formatting)
- Converts to HTML if needed
**lib/view/line.py** - One-line format
- Formats: 1, 2, 3, 4, or custom with % notation
- Custom format codes: %c=condition, %t=temp, %h=humidity, %w=wind, etc.
- Supports multiple locations (`:` separated)
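
Expanding a custom `%` format can be sketched with a single regular-expression pass. Only the four codes listed above are handled, and the field names in `FORMAT_CODES` and the shape of `data` are hypothetical.

```python
import re

# Hypothetical field names; only the % codes come from the text.
FORMAT_CODES = {"c": "condition", "t": "temperature", "h": "humidity", "w": "wind"}


def render_line(fmt, data):
    """Replace known %-codes with values from data; leave unknown codes as-is."""
    def substitute(match):
        field = FORMAT_CODES.get(match.group(1))
        return str(data[field]) if field else match.group(0)

    return re.sub(r"%(.)", substitute, fmt)
```

For example, `render_line("%c %t", data)` yields the condition and temperature separated by a space.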
**lib/view/v2.py** - Data-rich v2 format
- Experimental format with more detail
- Moon phase, astronomical times, temperature graphs
- Terminal-only, English-only
**lib/view/moon.py** - Moon phase view
- Uses `pyphoon-lolcat` for rendering
- Supports date selection: `Moon@2016-12-25`
**lib/view/prometheus.py** - Prometheus metrics
- Exports weather data as Prometheus metrics
- Format: `p1`
#### Formatters (lib/fmt/)
**lib/fmt/png.py** - PNG rendering
- Converts ANSI terminal output to PNG images
- Uses pyte (terminal emulator) + PIL
- Transparency support
- Font rendering
**lib/fmt/unicodedata2.py** - Unicode handling
- Character width calculations for terminal rendering
#### Other Modules (lib/)
**lib/translations.py** - i18n support
- 54 languages supported
- Weather condition translations
- Help file translations (share/translations/)
- Language detection from Accept-Language header
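
Language detection from `Accept-Language` could work along these lines. This is a sketch that follows the header's standard quality-value syntax; the function name, the fallback to the primary subtag, and the `"en"` default are assumptions.

```python
def pick_language(accept_language, supported, default="en"):
    """Pick the highest-quality supported language from the header value."""
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        tag, _, params = item.strip().partition(";")
        quality = 1.0
        if params.strip().startswith("q="):
            try:
                quality = float(params.strip()[2:])
            except ValueError:
                quality = 0.0
        if tag:
            # Sort by descending quality, then by original position.
            candidates.append((-quality, position, tag.strip().lower()))
    for _, _, tag in sorted(candidates):
        primary = tag.split("-")[0]
        if tag in supported:
            return tag
        if primary in supported:
            return primary
    return default
```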
**lib/constants.py** - Weather constants
- Weather codes (WWO API)
- Condition mappings
- Emoji mappings
**lib/buttons.py** - HTML UI elements
- Add interactive buttons to HTML output
**lib/fields.py** - Data field extraction
- Parse weather API responses
**lib/weather_data.py** - Weather data structures
**lib/airports.py** - IATA code handling
**lib/metno.py** - met.no API client
- Norwegian Meteorological Institute API
- Transforms to standard JSON format
### 3. Static Assets (share/)
**share/translations/** - 54 language files
- Format: `{lang}.txt` (weather conditions)
- Format: `{lang}-help.txt` (help pages)
**share/emoji/** - Weather emoji PNGs
- Used for PNG rendering
**share/static/** - Web assets
- favicon.ico
- style.css
- example images
**share/templates/** - Jinja2 templates
- index.html (HTML output wrapper)
**share/** - Data files
- `aliases` - Location aliases (from:to format)
- `blacklist` - Blocked locations
- `list-of-iata-codes.txt` - Airport codes
- `help.txt` - English help
- `bash-function.txt` - Shell integration
- `translation.txt` - Translation info page
## API Endpoints
### Weather Queries
- `GET /` - Weather for IP-based location
- `GET /{location}` - Weather for specific location
- `GET /{location}.png` - PNG image output
- `GET /{location}?{options}` - Weather with options
### Special Pages
- `GET /:help` - Help page
- `GET /:bash.function` - Shell function
- `GET /:translation` - Translation info
- `GET /:iterm2` - iTerm2 integration
### Static Files
- `GET /files/{path}` - Static assets
- `GET /favicon.ico` - Favicon
## Query Parameters
### Single-letter Options (combined in query string)
- `A` - Force ANSI output
- `n` - Narrow output
- `m` - Metric units
- `M` - m/s for wind speed
- `u` - Imperial units
- `I` - Inverted colors
- `t` - Transparency (PNG)
- `T` - No terminal sequences
- `p` - Padding
- `0-3` - Number of days
- `q` - No caption
- `Q` - No city name
- `F` - No follow line
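
Since the single-letter options are combined into one string, parsing them is a per-character lookup. In the sketch below the letters come from the list above, while the dictionary key names are hypothetical.

```python
# Letter -> option name; the names are illustrative only.
FLAGS = {
    "A": "force_ansi",
    "n": "narrow",
    "m": "use_metric",
    "M": "use_ms_for_wind",
    "u": "use_imperial",
    "I": "inverted_colors",
    "t": "transparency",
    "T": "no_terminal",
    "p": "padding",
    "q": "no_caption",
    "Q": "no_city",
    "F": "no_follow_line",
}


def parse_options(letters):
    """Turn a combined option string such as 'nT2' into a dict."""
    result = {}
    for letter in letters:
        if letter in "0123":
            result["days"] = int(letter)
        elif letter in FLAGS:
            result[FLAGS[letter]] = True
        # Unknown letters are ignored in this sketch.
    return result
```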
### Named Parameters
- `lang={code}` - Language override
- `format={fmt}` - Output format (1-4, v2, j1, p1, custom)
- `view={view}` - View type (alias for format)
- `period={sec}` - Update interval for cyclic locations
### PNG Filename Format
`{location}_{width}x{height}_{options}_lang={lang}.png`
Example: `London_200x_t_lang=ru.png`
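
A rough parser for this filename layout. The function name and result keys are hypothetical, and a location that itself contains `_` would be split incorrectly by this sketch.

```python
import re


def parse_png_name(filename):
    """Split '{location}_{width}x{height}_{options}_lang={lang}.png' into parts."""
    if not filename.endswith(".png"):
        return None
    location, *parts = filename[: -len(".png")].split("_")
    result = {"location": location, "width": None, "height": None,
              "options": "", "lang": None}
    for part in parts:
        size = re.fullmatch(r"(\d*)x(\d*)", part)
        if size:
            result["width"] = int(size.group(1)) if size.group(1) else None
            result["height"] = int(size.group(2)) if size.group(2) else None
        elif part.startswith("lang="):
            result["lang"] = part[len("lang="):]
        else:
            result["options"] += part  # single-letter options
    return result
```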
## Output Formats
1. **ANSI** - Terminal with colors/formatting
2. **Plain text** - No ANSI codes (T option)
3. **HTML** - Web browser output
4. **PNG** - Image file
5. **JSON** (j1) - Machine-readable data
6. **Prometheus** (p1) - Metrics format
7. **One-line** (1-4) - Compact formats
8. **v2** - Data-rich experimental format
## External Dependencies
### Weather APIs
- **met.no** (Norwegian Meteorological Institute) - Primary, free
- **WorldWeatherOnline** - Fallback, requires API key
### Geolocation
- **GeoLite2** (MaxMind) - Free GeoIP database (required)
- **IP2Location** - Commercial API (optional, needs key)
- **IPInfo** - Commercial API (optional, needs key)
- **Geolocator service** - localhost:8004 (GPS coordinates)
### External Binaries
- **wego** (we-lang) - Go weather rendering binary
- **pyphoon-lolcat** - Moon phase rendering
### Python Libraries
- Flask - Web framework
- gevent - Async I/O
- geoip2 - GeoIP lookups
- geopy - Geocoding
- requests - HTTP client
- PIL - Image processing
- pyte - Terminal emulator
- pytz - Timezone handling
- pylru - LRU cache
### Go Libraries
- github.com/hashicorp/golang-lru - LRU cache
- github.com/robfig/cron - Cron scheduler
## Configuration
### Environment Variables
- `WTTR_MYDIR` - Installation directory
- `WTTR_GEOLITE` - Path to GeoLite2-City.mmdb
- `WTTR_WEGO` - Path to wego binary
- `WTTR_LISTEN_HOST` - Bind address (default: "")
- `WTTR_LISTEN_PORT` - Port (default: 8002)
- `WTTR_USER_AGENT` - Custom user agent
- `WTTR_IPLOCATION_ORDER` - IP location method order
- `WTTRIN_SRV_PORT` - Override listen port
- `WTTRIN_TEST` - Enable test mode
### API Key Files
- `~/.wwo.key` - WorldWeatherOnline API key
- `~/.ip2location.key` - IP2Location API key
- `~/.ipinfo.key` - IPInfo token
- `~/.wegorc` - Wego configuration (JSON)
### Data Directories
- `/wttr.in/cache/ip2l/` - IP location cache
- `/wttr.in/cache/png/` - PNG cache
- `/wttr.in/cache/lru/` - LRU file cache
- `/wttr.in/cache/proxy-wwo/` - Weather API cache
- `/wttr.in/log/` - Log files
## Caching Strategy
### Three-tier Cache
1. **Go LRU** (12,800 entries, 1000-1500s TTL)
   - In-memory, fastest
   - Full HTTP responses
   - Shared across all requests
2. **Python LRU** (10,000 entries, 1000-2000s TTL)
   - In-memory for small responses (<80 bytes)
   - File-backed for large responses
   - Per-process cache
3. **File Cache**
   - IP location cache (persistent)
   - Weather API cache (persistent)
   - PNG cache (persistent)
### Cache Keys
- Go: `UserAgent:Host+URI:ClientIP:AcceptLanguage`
- Python: `UserAgent:QueryString:ClientIP:Lang`
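
To make the two key layouts concrete, here is a sketch that builds both. It assumes `Host+URI` means plain concatenation and that fields are joined with `:` exactly as written; the function names are hypothetical.

```python
def go_cache_key(user_agent, host, uri, client_ip, accept_language):
    """Key layout described for the Go proxy."""
    return f"{user_agent}:{host}{uri}:{client_ip}:{accept_language}"


def python_cache_key(user_agent, query_string, client_ip, lang):
    """Key layout described for the Python LRU."""
    return f"{user_agent}:{query_string}:{client_ip}:{lang}"
```

Because the user agent and client IP are part of both keys, the same location requested by two different clients produces separate cache entries.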
### Cache Invalidation
- TTL-based expiration (no manual invalidation)
- Randomized TTL prevents thundering herd
- Non-cacheable: cyclic requests (location contains `:`)
## Prefetching
- Cron jobs at :24 and :54 past the hour
- Records popular requests at :30 and :00
- Spreads prefetch over 5 minutes (300s)
- Prevents cache expiry during peak times
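
One way to spread prefetches over the 300-second window is to space them evenly. The text does not say how the spreading is done, so the even spacing and the function name `prefetch_delays` are assumptions.

```python
PREFETCH_WINDOW = 300  # seconds, from the text


def prefetch_delays(requests, window=PREFETCH_WINDOW):
    """Pair each recorded request with a start delay inside the window."""
    if not requests:
        return []
    step = window / len(requests)
    return [(index * step, request) for index, request in enumerate(requests)]
```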
## Rate Limiting
- Per-IP limits: 300/min, 3600/hour, 24*3600/day
- Whitelist support (MY_EXTERNAL_IP)
- Returns HTTP 429 on limit exceeded
- Implemented in Python layer only
## Error Handling
- Location not found → "not found" location (fallback weather)
- API errors → 503 Service Unavailable
- Malformed requests → 500 Internal Server Error (HTML) or error message (text)
- Blocked locations → 403 Forbidden
- Rate limit → 429 Too Many Requests
## Logging
- Main log: `/wttr.in/log/main.log`
- Debug log: `/tmp/wttr.in-debug.log`
- Go proxy logs to stdout
## Testing
- No unit tests
- Integration test: `test/query.sh`
  - Makes HTTP requests to running server
  - Compares SHA1 hashes of responses
  - Test data in `test/test-data/signatures`
- CI: flake8 linting only (no actual tests run)
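
The hash comparison done by the integration test can be illustrated in Python, although the actual test is a shell script. The layout of the signatures file is not described here, so the `expected` mapping from query to SHA1 hex digest is hypothetical.

```python
import hashlib


def signature(body):
    """SHA1 hex digest of a response body (bytes)."""
    return hashlib.sha1(body).hexdigest()


def check_responses(responses, expected):
    """Return the queries whose response hash differs from the recorded one."""
    return sorted(
        query for query, body in responses.items()
        if signature(body) != expected.get(query)
    )
```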
## Known Issues & Limitations
1. No unit test coverage
2. v2 format is experimental (terminal-only, English-only)
3. Moon phase Unicode ambiguity (hemisphere-dependent)
4. Hardcoded IP whitelist (MY_EXTERNAL_IP)
5. Multiple cache layers with different keys
6. Mixed Python/Go codebase
7. External binary dependencies (wego, pyphoon)
8. Requires external geolocator service (port 8004)
9. File cache grows unbounded
10. No cache warming on startup
## Performance Characteristics
- Go proxy keeps up to 12,800 cached responses in memory
- Python backend uses a pool of 25 worker threads for PNG rendering
- Gevent provides async I/O for Python
- Prefetching reduces latency during peak times
- File cache avoids memory pressure for large responses
- Rate limiting prevents abuse
## Security Considerations
- IP-based rate limiting
- Location blacklist
- No authentication required
- User-provided location names passed to external APIs
- File cache keys are MD5 hashes (MD5 is not cryptographically secure)
- No input sanitization for location names
- Trusts X-Forwarded-For header