# wttr.in Data Flow Documentation
## Request Processing Flow
### 1. Initial Request
```
Client → Go Proxy (port 8082)
```
**Input:**
- HTTP request with location in URL
- Headers: User-Agent, Accept-Language, X-Forwarded-For
- Query parameters
**Go Proxy Actions:**
1. Build cache key: `UserAgent:Host+URI:ClientIP:AcceptLanguage`
2. Check if request is cacheable (no `:` in location)
3. Look up in LRU cache (12,800 entries)
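
The key format and the cacheability rule above can be illustrated with a short Python sketch. The proxy itself is written in Go; the function names `cache_key` and `is_cacheable` are hypothetical and only mirror the steps listed here.

```python
def cache_key(user_agent: str, host: str, uri: str,
              client_ip: str, accept_language: str) -> str:
    """Join the request attributes in the order UserAgent:Host+URI:ClientIP:AcceptLanguage."""
    return f"{user_agent}:{host}{uri}:{client_ip}:{accept_language}"


def is_cacheable(location: str) -> bool:
    """Locations containing ':' (for example ':help') are not cached by the proxy."""
    return ":" not in location
```

Because the key includes the User-Agent, client IP, and language, two clients asking for the same city do not share a proxy cache entry under this scheme.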
**Cache Hit Path:**
```
Go Proxy → Check expiry → Return cached response
```
**Cache Miss Path:**
```
Go Proxy → Set InProgress flag → Forward to Python Backend
```
### 2. Python Backend Processing
```
Go Proxy → Python Backend (port 8002) → Flask Router
```
**Flask Routes:**
- `/` → `wttr_srv.wttr(None, request)`
- `/{location}` → `wttr_srv.wttr(location, request)`
- `/:help`, `/:bash.function`, etc. → Static file handlers
### 3. Request Parsing (wttr_srv.py)
**Phase 1: Fast Path (Cache + Static)**
```python
parse_request(location, request, query, fast_mode=True)
_response(parsed_query, query, fast_mode=True)
# Check Python LRU cache
# Check if static page (:help, :bash.function, etc.)
# Return if found, else continue to slow path
```
**Phase 2: Full Processing**
```python
parse_request(location, request, query, fast_mode=False)
# Location processing (see section 4)
_response(parsed_query, query, fast_mode=False)
# Render weather
# Cache and return
```
### 4. Location Processing (location.py)
**Input:** Location string, Client IP
**Processing Steps:**
```
1. Detect location type
├─ Empty/MyLocation → Use client IP
├─ IP address → Resolve to location
├─ @domain → Resolve domain to IP, then location
├─ ~search → Use geolocator service
├─ Moon → Special moon handler
└─ Name → Use as-is
2. Normalize location
├─ Lowercase
├─ Replace _ and + with space
└─ Remove special chars (!@#$*;:\)
3. Check aliases (share/aliases)
└─ from:to mapping
4. Check blacklist (share/blacklist)
└─ Return 403 if blocked
5. Resolve location
├─ IP → Location (GeoIP/IP2Location/IPInfo)
├─ Name → GPS coords (Geolocator service)
└─ IATA code → Airport location
6. Get hemisphere (for moon queries)
└─ GPS latitude > 0 = North
```
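
Steps 2 and 6 are simple enough to sketch directly. This is a minimal illustration, assuming the normalization runs in the order listed; `normalize_location` and `is_northern` are hypothetical names, not functions confirmed to exist in `location.py`.

```python
import re

# Characters removed in step 2: ! @ # $ * ; : and backslash
_SPECIAL_CHARS = re.compile(r"[!@#$*;:\\]")


def normalize_location(raw: str) -> str:
    """Lowercase, turn '_' and '+' into spaces, then drop special characters."""
    text = raw.lower()
    text = text.replace("_", " ").replace("+", " ")
    return _SPECIAL_CHARS.sub("", text)


def is_northern(latitude: float) -> bool:
    """Step 6: a positive GPS latitude means the northern hemisphere."""
    return latitude > 0
```

For example, `normalize_location("New_York")` yields `new york`. Prefixes such as `@` and `~` are consumed by the type detection in step 1, so stripping them here does not lose information in this sketch.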
**Output:**
- `location` - Normalized location or GPS coords
- `override_location_name` - Display name
- `full_address` - Full address from geolocator
- `country` - Country name
- `query_source_location` - Client's location (city, country)
- `hemisphere` - True=North, False=South
### 5. IP to Location Resolution
**Method Priority (configurable via WTTR_IPLOCATION_ORDER):**
```
1. GeoIP (MaxMind GeoLite2)
├─ Read from GeoLite2-City.mmdb
├─ Extract city and country
└─ Fast, local, free
2. IP2Location API (optional)
├─ HTTP GET to api.ip2location.com
├─ Requires API key (~/.ip2location.key)
├─ Cache result in /wttr.in/cache/ip2l/{ip}
└─ Format: city;country
3. IPInfo API (optional)
├─ HTTP GET to ipinfo.io
├─ Requires token (~/.ipinfo.key)
├─ Cache result in /wttr.in/cache/ip2l/{ip}
└─ JSON response
Fallback: NOT_FOUND_LOCATION ("not found")
```
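
The priority order with its fallback amounts to trying each method in turn until one succeeds. The sketch below assumes each method is a callable returning `(city, country)` or `None`; the `Resolver` type and `resolve_ip` name are hypothetical, and the order of the list would come from `WTTR_IPLOCATION_ORDER`.

```python
from typing import Callable, Optional, Sequence, Tuple

NOT_FOUND_LOCATION = "not found"

# A resolver maps an IP address to (city, country), or None if it has no answer.
Resolver = Callable[[str], Optional[Tuple[str, str]]]


def resolve_ip(ip: str, resolvers: Sequence[Resolver]) -> Tuple[str, Optional[str]]:
    """Try each resolver in priority order; fall back to NOT_FOUND_LOCATION."""
    for resolver in resolvers:
        result = resolver(ip)
        if result is not None:
            return result
    return NOT_FOUND_LOCATION, None
```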
**Caching:**
- File cache: `/wttr.in/cache/ip2l/{ip_address}`
- Format: `city;country` or `location;country;extra;city`
- Persistent across restarts
### 6. Geolocator Service
**For search terms (~location) and non-ASCII names:**
```
Python Backend → HTTP GET localhost:8004/{location}
Geolocator Service (separate microservice)
Returns JSON:
{
"latitude": 48.8582602,
"longitude": 2.29449905432,
"address": "Tour Eiffel, 5, Avenue Anatole France..."
}
```
**Used for:**
- `~Eiffel Tower` → GPS coordinates
- `~Kilimanjaro` → GPS coordinates
- Non-ASCII location names
- IATA airport codes
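
A minimal sketch of the client side of this exchange, assuming the location is percent-encoded into the URL path (the encoding is an assumption; only the host, port, and JSON fields come from the description above). The function names are hypothetical.

```python
import json
from urllib.parse import quote

GEOLOCATOR_BASE = "http://localhost:8004"  # host and port as described above


def geolocator_url(location: str) -> str:
    """Build the request URL; percent-encoding the name is an assumption."""
    return f"{GEOLOCATOR_BASE}/{quote(location)}"


def parse_geolocator_response(body: str) -> tuple:
    """Return (latitude, longitude, address) from the JSON response body."""
    data = json.loads(body)
    return float(data["latitude"]), float(data["longitude"]), data.get("address", "")
```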
### 7. Weather Data Fetching
**Two data sources (configured via WWO_KEY):**
#### Option A: met.no (Norwegian Meteorological Institute)
```
Python Backend → metno.py
HTTP GET to api.met.no
Parse XML/JSON response
Transform to standard JSON format
Return weather data
```
**Advantages:**
- Free, no API key required
- High quality data
- No per-day query quota (the met.no terms of service still apply)
#### Option B: WorldWeatherOnline (WWO)
```
Python Backend → bin/proxy.py (separate service)
Check proxy cache (/wttr.in/cache/proxy-wwo/)
If miss: HTTP GET to api.worldweatheronline.com
Cache response
Return weather data
```
**Advantages:**
- More locations supported
- Historical data available
**Disadvantages:**
- Requires API key (~/.wwo.key)
- Rate limited (500 queries/day free tier)
**Weather Data Structure:**
```json
{
"current_condition": [{
"temp_C": "22",
"temp_F": "72",
"weatherCode": "122",
"weatherDesc": [{"value": "Overcast"}],
"windspeedKmph": "7",
"humidity": "76",
...
}],
"weather": [
{
"date": "2025-12-17",
"maxtempC": "25",
"mintempC": "18",
"hourly": [...]
}
]
}
```
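
Note that the numeric fields in this structure are strings and that `current_condition` and `weatherDesc` are single-element lists. A short sketch of reading it, using a hypothetical helper name `summarize_current`:

```python
def summarize_current(data: dict) -> str:
    """Build a one-line summary from the first current_condition entry."""
    current = data["current_condition"][0]
    description = current["weatherDesc"][0]["value"]
    temperature = int(current["temp_C"])  # values arrive as strings
    return f"{description}, {temperature}°C, humidity {current['humidity']}%"
```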
### 8. Weather Rendering
**Route to appropriate renderer based on query:**
```
parsed_query → Determine view type
├─ format=1,2,3,4 → view/line.py (one-line format)
├─ format=j1 → Return raw JSON
├─ format=p1 → view/prometheus.py
├─ format=v2 → view/v2.py (data-rich)
├─ location=Moon → view/moon.py
└─ default → view/wttr.py (main view)
```
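
The routing tree above can be read as a chain of checks. The sketch below assumes the checks are evaluated top to bottom and that any location starting with `Moon` (including `Moon@2016-12-25`) selects the moon view; `select_view` is a hypothetical name and the returned strings are just labels.

```python
from typing import Optional


def select_view(location: str, fmt: Optional[str]) -> str:
    """Return a label for the renderer that would handle the query."""
    if fmt in {"1", "2", "3", "4"}:
        return "view/line.py"
    if fmt == "j1":
        return "raw JSON"
    if fmt == "p1":
        return "view/prometheus.py"
    if fmt == "v2":
        return "view/v2.py"
    if location.lower().startswith("moon"):
        return "view/moon.py"
    return "view/wttr.py"
```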
#### Main View (view/wttr.py)
```
get_wetter(parsed_query)
Call wego binary (Go program)
├─ Pass flags: -city, -lang, -imperial, -narrow, etc.
├─ wego fetches weather data
├─ wego renders ANSI output
└─ Return ANSI text
Post-process output
├─ Add location name override
├─ Add "not found" message if needed
└─ Format for display
If HTML output:
└─ Convert ANSI to HTML (ansi2html.sh)
```
**wego Command Example:**
```bash
/path/to/we-lang \
-city=London,GB \
-lang=en \
-imperial \
-narrow \
-location_name="London"
```
#### One-Line View (view/line.py)
```
wttr_line(query, parsed_query)
Get weather data (JSON)
Parse format string
├─ Predefined: 1, 2, 3, 4
└─ Custom: %c, %t, %h, %w, etc.
Replace format codes with data
├─ %c → Weather emoji
├─ %t → Temperature
├─ %h → Humidity
└─ etc.
Return formatted string
```
**Format Examples:**
- `format=3` → `London: ⛅️ +7°C`
- `format=%l:+%c+%t` → `London: ⛅️ +7°C`
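
Format-code substitution can be sketched as a single left-to-right scan. This is an illustration only: `render_line` is a hypothetical name, the `values` mapping stands in for data taken from the weather JSON, and the format string is assumed to be URL-decoded already (so `+` in the query has become a space).

```python
def render_line(fmt: str, values: dict) -> str:
    """Replace each %<code> whose code is a key in `values`; copy everything else."""
    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in values:
            out.append(values[fmt[i + 1]])
            i += 2  # skip the '%' and the code letter
        else:
            out.append(fmt[i])
            i += 1
    return "".join(out)
```

Scanning once, instead of calling `str.replace` per code, avoids re-expanding text that was itself produced by an earlier substitution.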
#### Moon View (view/moon.py)
```
get_moon(parsed_query)
Parse date from location (Moon@2016-12-25)
Call pyphoon-lolcat binary
├─ Pass date parameter
└─ Return ASCII moon phase art
Return moon phase output
```
#### v2 View (view/v2.py)
```
Experimental data-rich format
Get weather data
Render:
├─ Temperature graph (ASCII)
├─ Precipitation graph (ASCII)
├─ Moon phases (4 days)
├─ Current conditions (detailed)
├─ Astronomical times (dawn, sunrise, etc.)
└─ GPS coordinates
Return formatted output
```
#### Prometheus View (view/prometheus.py)
```
Get weather data (JSON)
Convert to Prometheus metrics format
├─ temperature_feels_like_celsius{forecast="current"} 7
├─ humidity_percent{forecast="current"} 65
└─ etc.
Return metrics text
```
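
A sketch of the conversion for the two metrics shown above. The metric names come from this section; the input key `FeelsLikeC` is an assumption about the weather JSON (the data structure in section 7 elides it), and `to_prometheus` is a hypothetical name.

```python
def to_prometheus(current: dict) -> str:
    """Render one current_condition entry as Prometheus text-format lines."""
    lines = [
        f'temperature_feels_like_celsius{{forecast="current"}} {current["FeelsLikeC"]}',
        f'humidity_percent{{forecast="current"}} {current["humidity"]}',
    ]
    return "\n".join(lines) + "\n"
```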
### 9. PNG Rendering (fmt/png.py)
**For .png requests:**
```
ANSI text output
Spawn thread (ThreadPool, 25 workers)
render_ansi(output, options)
Create virtual terminal (pyte)
├─ Feed ANSI sequences
└─ Capture terminal state
Render to image (PIL)
├─ Draw characters with font
├─ Apply colors from ANSI codes
└─ Apply transparency if requested
Return PNG bytes
Cache in /wttr.in/cache/png/
```
**Options:**
- `t` - Transparency (150)
- `transparency={0-255}` - Custom transparency
- `{width}x{height}` - Image dimensions
### 10. Translation (translations.py)
**Language Detection:**
```
1. Check subdomain (de.wttr.in → lang=de)
2. Check lang parameter (?lang=de)
3. Check Accept-Language header
4. Default to English
```
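
The detection order can be sketched as follows. This is a simplified illustration: `detect_language` is a hypothetical name, `supported` stands in for the set of available translations, and the `Accept-Language` header is read in listed order while ignoring `q` weights.

```python
from typing import Optional, Set


def detect_language(host: str, lang_param: Optional[str],
                    accept_language: Optional[str], supported: Set[str]) -> str:
    """Apply the four detection steps in order and return a language code."""
    labels = host.lower().split(".")
    if len(labels) > 2 and labels[0] in supported:  # 1. subdomain, e.g. de.wttr.in
        return labels[0]
    if lang_param and lang_param.lower() in supported:  # 2. ?lang=de
        return lang_param.lower()
    for item in (accept_language or "").split(","):  # 3. Accept-Language header
        code = item.split(";")[0].strip().lower().split("-")[0]
        if code in supported:
            return code
    return "en"  # 4. default
```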
**Translation Files:**
- `share/translations/{lang}.txt` - Weather conditions
- `share/translations/{lang}-help.txt` - Help pages
**Translation Process:**
```
Weather condition text (English)
Look up in translations.py:TRANSLATIONS dict
Find translation for target language
Return translated text
```
**Example:**
```python
TRANSLATIONS = {
"en": {"Partly cloudy": "Partly cloudy"},
"de": {"Partly cloudy": "Teilweise bewölkt"},
"fr": {"Partly cloudy": "Partiellement nuageux"}
}
```
### 11. Caching (cache.py)
**Python LRU Cache:**
```
Request → Generate cache signature
signature = f"{user_agent}:{query_string}:{client_ip}:{lang}"
Check in-memory LRU (10,000 entries)
If found and not expired:
├─ If value starts with "file:" or "bfile:"
│ └─ Read from /wttr.in/cache/lru/{md5_hash}
└─ Return value
If not found:
├─ Generate response
├─ If response > 80 bytes:
│ ├─ Write to /wttr.in/cache/lru/{md5_hash}
│ └─ Store "file:{md5_hash}" in LRU
└─ Else: Store value directly in LRU
Set expiry: current_time + random(1000, 2000) seconds
Return response
```
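
The flow above can be condensed into a small class. This is a sketch under stated assumptions, not the contents of `cache.py`: the class name `ResponseCache` is hypothetical, the MD5 hash is assumed to be taken over the signature, a temporary directory stands in for `/wttr.in/cache/lru/`, and only text values (`file:`, not `bfile:`) are handled.

```python
import hashlib
import os
import random
import tempfile
import time
from collections import OrderedDict
from typing import Optional


class ResponseCache:
    """In-memory LRU that spills values larger than 80 bytes to files."""

    def __init__(self, cache_dir: Optional[str] = None,
                 max_entries: int = 10_000, min_file_size: int = 80):
        self.cache_dir = cache_dir or tempfile.mkdtemp(prefix="lru-")
        self.max_entries = max_entries
        self.min_file_size = min_file_size
        self._lru: "OrderedDict[str, dict]" = OrderedDict()

    def _path(self, digest: str) -> str:
        return os.path.join(self.cache_dir, digest)

    def store(self, signature: str, value: str) -> str:
        if len(value.encode("utf-8")) > self.min_file_size:
            digest = hashlib.md5(signature.encode("utf-8")).hexdigest()
            with open(self._path(digest), "w", encoding="utf-8") as handle:
                handle.write(value)
            stored = f"file:{digest}"  # keep only a pointer in memory
        else:
            stored = value
        expiry = time.time() + random.randint(1000, 2000)  # randomized TTL
        self._lru[signature] = {"val": stored, "expiry": expiry}
        self._lru.move_to_end(signature)
        while len(self._lru) > self.max_entries:
            self._lru.popitem(last=False)  # evict the least recently used entry
        return value

    def get(self, signature: str) -> Optional[str]:
        entry = self._lru.get(signature)
        if entry is None or entry["expiry"] < time.time():
            return None
        self._lru.move_to_end(signature)
        value = entry["val"]
        if value.startswith("file:"):
            with open(self._path(value[len("file:"):]), encoding="utf-8") as handle:
                return handle.read()
        return value
```

Keeping only a `file:` pointer in memory bounds the RAM used per entry, at the cost of one file read on each hit for a large response.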
**Dynamic Timestamps:**
```
Cached response with %{{NOW(timezone)}}
On retrieval: Replace with current time in timezone
Example: %{{NOW(Europe/London)}} → 14:32:15+0000
```
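
Placeholder substitution of this kind can be sketched with a regular expression and the standard `zoneinfo` module. The placeholder syntax and output format follow the example above; `fill_timestamps` is a hypothetical name, and the optional `now` argument exists only to make the sketch testable.

```python
import re
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

_NOW_PLACEHOLDER = re.compile(r"%\{\{NOW\(([^)]+)\)\}\}")


def fill_timestamps(text: str, now: Optional[datetime] = None) -> str:
    """Replace every %{{NOW(zone)}} with the time in that zone as HH:MM:SS+ZZZZ."""
    moment = now or datetime.now(timezone.utc)  # must be timezone-aware

    def _replace(match: "re.Match[str]") -> str:
        zone = ZoneInfo(match.group(1))
        return moment.astimezone(zone).strftime("%H:%M:%S%z")

    return _NOW_PLACEHOLDER.sub(_replace, text)
```

Doing the substitution on retrieval lets a cached body stay valid for its whole TTL while still showing the current time.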
### 12. Response Wrapping
**Final response preparation:**
```
Response text/bytes
Determine content type
├─ PNG → image/png
├─ HTML → text/html
└─ ANSI/text → text/plain
Add buttons (if HTML and not format query)
├─ Add interactive UI elements
└─ Wrap in HTML template
Set HTTP headers
├─ Content-Type
├─ Cache-Control (PNG only)
└─ Access-Control-Allow-Origin: *
Return Flask response
```
### 13. Go Proxy Caching
**After Python backend returns:**
```
Python Backend → Response
Go Proxy receives response
If status code 200 or 304:
├─ Store in LRU cache
├─ Set expiry: current_time + random(1000, 1500) seconds
└─ Remove InProgress flag
Else (error):
└─ Remove from cache
Return response to client
```
### 14. Peak Request Prefetching
**Cron-based prefetching:**
```
Every hour at :30 and :00
Record incoming requests in sync.Map
At :24 and :54 (6 minutes before each peak)
Iterate through recorded requests
For each request:
├─ Spawn goroutine
├─ Call processRequest() (refreshes cache)
├─ Sleep (spread over 300 seconds)
└─ Delete from sync.Map
Cache is warm for peak time
```
**Peak Times:**
- :30 past the hour (recorded at :30, prefetched at :24)
- :00 on the hour (recorded at :00, prefetched at :54)
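
Spreading the recorded requests over the 300-second window amounts to giving each one an evenly spaced start delay. The proxy does this in Go with goroutines; the Python sketch below only shows the arithmetic, and `prefetch_delays` is a hypothetical name.

```python
from typing import List


def prefetch_delays(n_requests: int, window_seconds: float = 300.0) -> List[float]:
    """Return the start delay, in seconds, for each of n evenly spaced requests."""
    if n_requests <= 0:
        return []
    step = window_seconds / n_requests
    return [i * step for i in range(n_requests)]
```

With three recorded requests the delays are 0, 100, and 200 seconds, so a prefetch run started at :24 finishes issuing requests before :29.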
## Data Structures
### Parsed Query
```python
{
"location": "London,GB", # Normalized location
"orig_location": "London", # Original input
"override_location_name": None, # Display name override
"full_address": "London, UK", # Full address
"country": "GB", # Country code
"query_source_location": ("Paris", "France"), # Client location
"hemisphere": True, # North=True, South=False
"lang": "en", # Language code
"view": None, # View type (v2, etc.)
"html_output": False, # HTML vs ANSI
"png_filename": None, # PNG filename if .png request
"ip_addr": "1.2.3.4", # Client IP
"user_agent": "curl/7.68.0", # User agent
"request_url": "http://...", # Full request URL
# Query options
"use_metric": True, # Metric units
"use_imperial": False, # Imperial units
"use_ms_for_wind": False, # m/s for wind
"narrow": False, # Narrow output
"inverted_colors": False, # Inverted colors
"no-terminal": False, # Plain text
"no-caption": False, # No caption
"no-city": False, # No city name
"no-follow-line": False, # No follow line
"days": "3", # Number of days
"transparency": None, # PNG transparency
"padding": False, # Add padding
"force-ansi": False, # Force ANSI
}
```
### Cache Entry (Go)
```go
type responseWithHeader struct {
InProgress bool // Request being processed
Expires time.Time // Expiration time
Body []byte // Response body
Header http.Header // HTTP headers
StatusCode int // HTTP status code
}
```
### Cache Entry (Python)
```python
{
"val": "response text",  # or "file:md5hash" / "bfile:md5hash" when stored on disk
"expiry": 1702834567.123 # Unix timestamp
}
```
## Error Handling Flow
### Location Not Found
```
Location resolution fails
Set location = NOT_FOUND_LOCATION ("not found")
Fetch weather for default location (Oymyakon)
Append "not found" message in user's language
Return response
```
### API Error
```
Weather API returns error
Log error
If HTML output:
└─ Return malformed-response.html (500)
Else:
└─ Return "capacity limit reached" message (503)
```
### Rate Limit Exceeded
```
Check IP against limits (300/min, 3600/hour, 86400/day)
If exceeded:
└─ Return 429 with error message
```
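
One way to enforce the three limits is a fixed-window counter per IP and window. This is a sketch of the idea, assuming fixed windows (the actual windowing strategy is not stated here); `RateLimiter` is a hypothetical name, and old counters are never pruned in this version.

```python
import time
from collections import defaultdict
from typing import Optional

# (window length in seconds, maximum requests per window)
LIMITS = ((60, 300), (3600, 3600), (86400, 86400))


class RateLimiter:
    """Fixed-window request counter keyed by (ip, window, window index)."""

    def __init__(self) -> None:
        self._counts = defaultdict(int)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return False when any limit is exhausted; the caller would answer 429."""
        now = time.time() if now is None else now
        keys = [(ip, window, int(now // window)) for window, _ in LIMITS]
        for key, (_, limit) in zip(keys, LIMITS):
            if self._counts[key] >= limit:
                return False
        for key in keys:
            self._counts[key] += 1
        return True
```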
### Blocked Location
```
Check location against blacklist
If blocked:
└─ Return 403 Forbidden
```
## Performance Optimizations
1. **Two-tier caching** (Go + Python)
2. **Fast path** (cache + static files checked first)
3. **File cache** for large responses (>80 bytes)
4. **Prefetching** at peak times
5. **ThreadPool** for PNG rendering (25 workers)
6. **Gevent** for async I/O in Python
7. **LRU eviction** prevents memory bloat
8. **Randomized TTL** prevents thundering herd
9. **InProgress flag** prevents duplicate work
10. **IP location caching** (persistent file cache)