wttr.in Data Flow Documentation

Request Processing Flow

1. Initial Request

Client → Go Proxy (port 8082)

Input:

  • HTTP request with location in URL
  • Headers: User-Agent, Accept-Language, X-Forwarded-For
  • Query parameters

Go Proxy Actions:

  1. Extract cache key: UserAgent:Host+URI:ClientIP:AcceptLanguage
  2. Check if request is cacheable (no : in location)
  3. Look up in LRU cache (12,800 entries)
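
The key construction and the cacheability check can be illustrated with a short Python sketch (the proxy itself is written in Go). The function names `cache_key` and `is_cacheable` are hypothetical, and treating everything before `?` in the URI as the location is an assumption.

```python
def cache_key(user_agent: str, host: str, uri: str,
              client_ip: str, accept_language: str) -> str:
    """Join the request attributes as UserAgent:Host+URI:ClientIP:AcceptLanguage."""
    return f"{user_agent}:{host}{uri}:{client_ip}:{accept_language}"


def is_cacheable(uri: str) -> bool:
    """Treat a request as cacheable when its location part contains no ':'."""
    # Assumption: the location is the URI path without the query string.
    location = uri.split("?", 1)[0].lstrip("/")
    return ":" not in location
```

With this rule, special pages such as `/:help` bypass the proxy cache, while ordinary locations such as `/London?format=3` are cached.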

Cache Hit Path:

Go Proxy → Check expiry → Return cached response

Cache Miss Path:

Go Proxy → Set InProgress flag → Forward to Python Backend

2. Python Backend Processing

Go Proxy → Python Backend (port 8002) → Flask Router

Flask Routes:

  • / → wttr_srv.wttr(None, request)
  • /{location} → wttr_srv.wttr(location, request)
  • /:help, /:bash.function, etc. → Static file handlers

3. Request Parsing (wttr_srv.py)

Phase 1: Fast Path (Cache + Static)

parse_request(location, request, query, fast_mode=True)
  ↓
_response(parsed_query, query, fast_mode=True)
  ↓
Check Python LRU cache
  ↓
Check if static page (:help, :bash.function, etc.)
  ↓
Return if found, else continue to slow path

Phase 2: Full Processing

parse_request(location, request, query, fast_mode=False)
  ↓
Location Processing
  ↓
_response(parsed_query, query, fast_mode=False)
  ↓
Render weather
  ↓
Cache and return

4. Location Processing (location.py)

Input: Location string, Client IP

Processing Steps:

1. Detect location type
   ├─ Empty/MyLocation → Use client IP
   ├─ IP address → Resolve to location
   ├─ @domain → Resolve domain to IP, then location
   ├─ ~search → Use geolocator service
   ├─ Moon → Special moon handler
   └─ Name → Use as-is

2. Normalize location
   ├─ Lowercase
   ├─ Replace _ and + with space
   └─ Remove special chars (!@#$*;:\)

3. Check aliases (share/aliases)
   └─ from:to mapping

4. Check blacklist (share/blacklist)
   └─ Return 403 if blocked

5. Resolve location
   ├─ IP → Location (GeoIP/IP2Location/IPInfo)
   ├─ Name → GPS coords (Geolocator service)
   └─ IATA code → Airport location

6. Get hemisphere (for moon queries)
   └─ GPS latitude > 0 = North
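
Steps 2 and 6 are simple enough to sketch directly. The following is a minimal illustration, not the code in location.py. The function names are hypothetical, and it assumes prefix detection (step 1) has already run, since normalization strips `@` and `:`.

```python
import re

# Characters listed in the normalization step: ! @ # $ * ; : \
_SPECIAL_CHARS = re.compile(r"[!@#$*;:\\]")


def normalize_location(location: str) -> str:
    """Lowercase, turn '_' and '+' into spaces, and drop the special characters."""
    location = location.lower().replace("_", " ").replace("+", " ")
    return _SPECIAL_CHARS.sub("", location)


def is_northern_hemisphere(latitude: float) -> bool:
    """Return True for the northern hemisphere (latitude > 0)."""
    return latitude > 0
```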

Output:

  • location - Normalized location or GPS coords
  • override_location_name - Display name
  • full_address - Full address from geolocator
  • country - Country name
  • query_source_location - Client's location (city, country)
  • hemisphere - True=North, False=South

5. IP to Location Resolution

Method Priority (configurable via WTTR_IPLOCATION_ORDER):

1. GeoIP (MaxMind GeoLite2)
   ├─ Read from GeoLite2-City.mmdb
   ├─ Extract city and country
   └─ Fast, local, free

2. IP2Location API (optional)
   ├─ HTTP GET to api.ip2location.com
   ├─ Requires API key (~/.ip2location.key)
   ├─ Cache result in /wttr.in/cache/ip2l/{ip}
   └─ Format: city;country

3. IPInfo API (optional)
   ├─ HTTP GET to ipinfo.io
   ├─ Requires token (~/.ipinfo.key)
   ├─ Cache result in /wttr.in/cache/ip2l/{ip}
   └─ JSON response

Fallback: NOT_FOUND_LOCATION ("not found")

Caching:

  • File cache: /wttr.in/cache/ip2l/{ip_address}
  • Format: city;country or location;country;extra;city
  • Persistent across restarts
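
A minimal sketch of the fallback chain and the cache file format follows. The resolver callables stand in for the GeoIP, IP2Location, and IPInfo lookups. Their signature, the function names, and the field positions in the four-field format (taken literally from `location;country;extra;city`) are assumptions.

```python
from typing import Callable, Optional, Sequence, Tuple

NOT_FOUND_LOCATION = "not found"

# Hypothetical resolver signature: takes an IP, returns (city, country) or None.
Resolver = Callable[[str], Optional[Tuple[str, str]]]


def parse_cache_entry(text: str) -> Optional[Tuple[str, str]]:
    """Parse 'city;country' or 'location;country;extra;city' into (city, country)."""
    fields = text.strip().split(";")
    if len(fields) == 2:
        return fields[0], fields[1]
    if len(fields) == 4:
        return fields[3], fields[1]
    return None  # unknown layout: treat as a cache miss


def resolve_ip(ip: str, resolvers: Sequence[Resolver]) -> Tuple[str, Optional[str]]:
    """Try each resolver in priority order and fall back to NOT_FOUND_LOCATION."""
    for resolver in resolvers:
        result = resolver(ip)
        if result is not None:
            return result
    return NOT_FOUND_LOCATION, None
```

Passing the resolvers as an ordered sequence mirrors the idea of a configurable priority such as `WTTR_IPLOCATION_ORDER`.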

6. Geolocator Service

For search terms (~location) and non-ASCII names:

Python Backend → HTTP GET localhost:8004/{location}
  ↓
Geolocator Service (separate microservice)
  ↓
Returns JSON:
{
  "latitude": 48.8582602,
  "longitude": 2.29449905432,
  "address": "Tour Eiffel, 5, Avenue Anatole France..."
}

Used for:

  • ~Eiffel Tower → GPS coordinates
  • ~Kilimanjaro → GPS coordinates
  • Non-ASCII location names
  • IATA airport codes
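
The request and response shapes above can be handled roughly as follows. This sketch only builds the URL and parses a response body, without performing the HTTP call. The helper names are hypothetical, and percent-encoding the location is an assumption about how the backend forms the URL.

```python
import json
from typing import Tuple
from urllib.parse import quote

GEOLOCATOR_URL = "http://localhost:8004"  # host and port from the flow above


def geolocator_url(location: str) -> str:
    """Build the lookup URL (percent-encoding is an assumption)."""
    return f"{GEOLOCATOR_URL}/{quote(location)}"


def parse_geolocator_response(body: str) -> Tuple[float, float, str]:
    """Extract (latitude, longitude, address) from the JSON response."""
    data = json.loads(body)
    return float(data["latitude"]), float(data["longitude"]), data["address"]
```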

7. Weather Data Fetching

Two data sources (configured via WWO_KEY):

Option A: met.no (Norwegian Meteorological Institute)

Python Backend → metno.py
  ↓
HTTP GET to api.met.no
  ↓
Parse XML/JSON response
  ↓
Transform to standard JSON format
  ↓
Return weather data

Advantages:

  • Free, no API key required
  • High quality data
  • No rate limits

Option B: WorldWeatherOnline (WWO)

Python Backend → bin/proxy.py (separate service)
  ↓
Check proxy cache (/wttr.in/cache/proxy-wwo/)
  ↓
If miss: HTTP GET to api.worldweatheronline.com
  ↓
Cache response
  ↓
Return weather data

Advantages:

  • More locations supported
  • Historical data available

Disadvantages:

  • Requires API key (~/.wwo.key)
  • Rate limited (500 queries/day free tier)

Weather Data Structure:

{
  "current_condition": [{
    "temp_C": "22",
    "temp_F": "72",
    "weatherCode": "122",
    "weatherDesc": [{"value": "Overcast"}],
    "windspeedKmph": "7",
    "humidity": "76",
    ...
  }],
  "weather": [
    {
      "date": "2025-12-17",
      "maxtempC": "25",
      "mintempC": "18",
      "hourly": [...]
    }
  ]
}
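
As an illustration of reading this structure, the sketch below pulls a few fields out of it. The helper names are hypothetical. Numeric values arrive as strings in the structure above, so they are converted before use.

```python
from typing import List, Tuple


def summarize_current(data: dict) -> str:
    """Build a short summary from the first entry in current_condition."""
    current = data["current_condition"][0]
    description = current["weatherDesc"][0]["value"]
    temp_c = int(current["temp_C"])      # values are strings in the JSON
    humidity = int(current["humidity"])
    return f"{description}, {temp_c:+d}°C, humidity {humidity}%"


def daily_range(data: dict) -> List[Tuple[str, int, int]]:
    """Return (date, min °C, max °C) for each forecast day."""
    return [(d["date"], int(d["mintempC"]), int(d["maxtempC"]))
            for d in data["weather"]]
```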

8. Weather Rendering

Route to appropriate renderer based on query:

parsed_query → Determine view type
  ↓
  ├─ format=1,2,3,4 → view/line.py (one-line format)
  ├─ format=j1 → Return raw JSON
  ├─ format=p1 → view/prometheus.py
  ├─ format=v2 → view/v2.py (data-rich)
  ├─ location=Moon → view/moon.py
  └─ default → view/wttr.py (main view)
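
The routing tree can be read as a chain of checks. The sketch below is illustrative only: `select_view` is a hypothetical name, it returns the target as a label instead of calling the renderer, and the check order simply follows the tree. Custom format strings containing `%` are routed to the one-line view as well, on the assumption that they are handled like the predefined formats.

```python
from typing import Optional

ONE_LINE_FORMATS = {"1", "2", "3", "4"}


def select_view(location: str, fmt: Optional[str]) -> str:
    """Pick a renderer label from the location and the format parameter."""
    if fmt in ONE_LINE_FORMATS or (fmt is not None and "%" in fmt):
        return "view/line.py"
    if fmt == "j1":
        return "raw JSON"
    if fmt == "p1":
        return "view/prometheus.py"
    if fmt == "v2":
        return "view/v2.py"
    # "Moon" and "Moon@<date>" both select the moon view.
    if location.lower().split("@", 1)[0] == "moon":
        return "view/moon.py"
    return "view/wttr.py"
```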

Main View (view/wttr.py)

get_wetter(parsed_query)
  ↓
Call wego binary (Go program)
  ├─ Pass flags: -city, -lang, -imperial, -narrow, etc.
  ├─ wego fetches weather data
  ├─ wego renders ANSI output
  └─ Return ANSI text
  ↓
Post-process output
  ├─ Add location name override
  ├─ Add "not found" message if needed
  └─ Format for display
  ↓
If HTML output:
  └─ Convert ANSI to HTML (ansi2html.sh)

wego Command Example:

/path/to/we-lang \
  --city=London,GB \
  -lang=en \
  -imperial \
  -narrow \
  -location_name="London"

One-Line View (view/line.py)

wttr_line(query, parsed_query)
  ↓
Get weather data (JSON)
  ↓
Parse format string
  ├─ Predefined: 1, 2, 3, 4
  └─ Custom: %c, %t, %h, %w, etc.
  ↓
Replace format codes with data
  ├─ %c → Weather emoji
  ├─ %t → Temperature
  ├─ %h → Humidity
  └─ etc.
  ↓
Return formatted string

Format Examples:

  • format=3 → London: ⛅️ +7°C
  • format=%l:+%c+%t → London: ⛅️ +7°C
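
Format-code substitution can be sketched with a single regular expression. This is a simplified illustration, not the code in view/line.py. `render_line` is a hypothetical name, the expansion of preset `3` is inferred from the two examples producing the same output, and the format string is assumed to be URL-decoded already (`+` turned into a space).

```python
import re
from typing import Dict

# Assumption: preset 3 expands to this string; other presets are omitted.
PRESETS = {"3": "%l: %c %t"}


def render_line(fmt: str, values: Dict[str, str]) -> str:
    """Replace %x codes with values; unknown codes are left untouched."""
    fmt = PRESETS.get(fmt, fmt)
    return re.sub(r"%([a-zA-Z])",
                  lambda m: values.get(m.group(1), m.group(0)),
                  fmt)


values = {"l": "London", "c": "⛅️", "t": "+7°C"}
line = render_line("%l:+%c+%t".replace("+", " "), values)
```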

Moon View (view/moon.py)

get_moon(parsed_query)
  ↓
Parse date from location (Moon@2016-12-25)
  ↓
Call pyphoon-lolcat binary
  ├─ Pass date parameter
  └─ Return ASCII moon phase art
  ↓
Return moon phase output
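
Parsing the date out of a location such as `Moon@2016-12-25` might look like the sketch below. `parse_moon_date` is a hypothetical name, and the ISO `YYYY-MM-DD` date format is assumed from the example.

```python
from datetime import date
from typing import Optional


def parse_moon_date(location: str) -> Optional[date]:
    """Return the date after '@', or None when no date is given."""
    name, _, date_part = location.partition("@")
    if name.lower() != "moon":
        raise ValueError(f"not a moon query: {location!r}")
    return date.fromisoformat(date_part) if date_part else None
```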

v2 View (view/v2.py)

Experimental data-rich format
  ↓
Get weather data
  ↓
Render:
  ├─ Temperature graph (ASCII)
  ├─ Precipitation graph (ASCII)
  ├─ Moon phases (4 days)
  ├─ Current conditions (detailed)
  ├─ Astronomical times (dawn, sunrise, etc.)
  └─ GPS coordinates
  ↓
Return formatted output

Prometheus View (view/prometheus.py)

Get weather data (JSON)
  ↓
Convert to Prometheus metrics format
  ├─ temperature_feels_like_celsius{forecast="current"} 7
  ├─ humidity_percent{forecast="current"} 65
  └─ etc.
  ↓
Return metrics text
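
The metric lines shown above follow the Prometheus text format of `name{label="value"} number`. A minimal sketch of the conversion, assuming the metric names and values have already been extracted into a dictionary (`to_prometheus` is a hypothetical name):

```python
from typing import Mapping, Union

Number = Union[int, float]


def to_prometheus(metrics: Mapping[str, Number], forecast: str = "current") -> str:
    """Render one 'name{forecast="..."} value' line per metric."""
    lines = [f'{name}{{forecast="{forecast}"}} {value}'
             for name, value in metrics.items()]
    return "\n".join(lines) + "\n"
```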

9. PNG Rendering (fmt/png.py)

For .png requests:

ANSI text output
  ↓
Spawn thread (ThreadPool, 25 workers)
  ↓
render_ansi(output, options)
  ↓
Create virtual terminal (pyte)
  ├─ Feed ANSI sequences
  └─ Capture terminal state
  ↓
Render to image (PIL)
  ├─ Draw characters with font
  ├─ Apply colors from ANSI codes
  └─ Apply transparency if requested
  ↓
Return PNG bytes
  ↓
Cache in /wttr.in/cache/png/

Options:

  • t - Transparency (150)
  • transparency={0-255} - Custom transparency
  • {width}x{height} - Image dimensions
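
The options can be interpreted along these lines. The sketch takes option tokens that have already been split out of the request. How the tokens are delivered, the name `parse_png_options`, and the clamping of out-of-range transparency values are all assumptions.

```python
import re
from typing import Dict, Iterable, Optional

DEFAULT_TRANSPARENCY = 150  # value applied by the bare "t" option


def parse_png_options(options: Iterable[str]) -> Dict[str, Optional[int]]:
    """Turn option tokens into transparency and image dimensions."""
    parsed: Dict[str, Optional[int]] = {
        "transparency": None, "width": None, "height": None,
    }
    for option in options:
        size = re.fullmatch(r"(\d+)x(\d+)", option)
        if option == "t":
            parsed["transparency"] = DEFAULT_TRANSPARENCY
        elif option.startswith("transparency="):
            value = int(option.split("=", 1)[1])
            parsed["transparency"] = max(0, min(255, value))  # clamp: assumption
        elif size:
            parsed["width"] = int(size.group(1))
            parsed["height"] = int(size.group(2))
    return parsed
```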

10. Translation (translations.py)

Language Detection:

1. Check subdomain (de.wttr.in → lang=de)
2. Check lang parameter (?lang=de)
3. Check Accept-Language header
4. Default to English
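
The priority order can be sketched as below. This is an illustration under assumptions: `detect_language` and the `SUPPORTED` subset are hypothetical, and the Accept-Language header is scanned in listed order without sorting by q-values.

```python
from typing import Optional, Sequence

SUPPORTED = ("en", "de", "fr")  # hypothetical subset of available languages


def detect_language(host: str,
                    query_lang: Optional[str] = None,
                    accept_language: Optional[str] = None,
                    supported: Sequence[str] = SUPPORTED) -> str:
    """Apply the checks in order: subdomain, lang parameter, header, default."""
    labels = host.lower().split(".")
    if len(labels) > 2 and labels[0] in supported:
        return labels[0]
    if query_lang and query_lang.lower() in supported:
        return query_lang.lower()
    for item in (accept_language or "").split(","):
        # "fr-CH;q=0.9" -> "fr"
        primary = item.split(";")[0].strip().lower().split("-")[0]
        if primary in supported:
            return primary
    return "en"
```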

Translation Files:

  • share/translations/{lang}.txt - Weather conditions
  • share/translations/{lang}-help.txt - Help pages

Translation Process:

Weather condition text (English)
  ↓
Look up in translations.py:TRANSLATIONS dict
  ↓
Find translation for target language
  ↓
Return translated text

Example:

TRANSLATIONS = {
  "en": {"Partly cloudy": "Partly cloudy"},
  "de": {"Partly cloudy": "Teilweise bewölkt"},
  "fr": {"Partly cloudy": "Partiellement nuageux"}
}
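
A lookup against a dictionary of this shape could be written as follows. Falling back to the English text when a language or phrase is missing is an assumption, and `translate` is a hypothetical name.

```python
TRANSLATIONS = {
    "en": {"Partly cloudy": "Partly cloudy"},
    "de": {"Partly cloudy": "Teilweise bewölkt"},
    "fr": {"Partly cloudy": "Partiellement nuageux"},
}


def translate(text: str, lang: str) -> str:
    """Return the translation, or the original English text if none exists."""
    return TRANSLATIONS.get(lang, {}).get(text, text)
```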

11. Caching (cache.py)

Python LRU Cache:

Request → Generate cache signature
  ↓
signature = f"{user_agent}:{query_string}:{client_ip}:{lang}"
  ↓
Check in-memory LRU (10,000 entries)
  ↓
If found and not expired:
  ├─ If value starts with "file:" or "bfile:"
  │   └─ Read from /wttr.in/cache/lru/{md5_hash}
  └─ Return value
  ↓
If not found:
  ├─ Generate response
  ├─ If response > 80 bytes:
  │   ├─ Write to /wttr.in/cache/lru/{md5_hash}
  │   └─ Store "file:{md5_hash}" in LRU
  └─ Else: Store value directly in LRU
  ↓
Set expiry: current_time + random(1000, 2000) seconds
  ↓
Return response
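
The flow above combines an in-memory LRU with a file spill for larger values. The sketch below shows one way to express it. It is not the code in cache.py: the class name is hypothetical, only text values with the `file:` prefix are handled (`bfile:` is omitted), and hashing the signature to obtain the file name is an assumption.

```python
import hashlib
import os
import random
import time
from collections import OrderedDict
from typing import Optional

MIN_SIZE_FOR_FILE = 80  # bytes; larger values are written to disk


class ResponseCache:
    def __init__(self, cache_dir: str, max_entries: int = 10_000):
        self.cache_dir = cache_dir
        self.max_entries = max_entries
        self.entries: "OrderedDict[str, dict]" = OrderedDict()

    def _path(self, digest: str) -> str:
        return os.path.join(self.cache_dir, digest)

    def get(self, signature: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        entry = self.entries.get(signature)
        if entry is None or entry["expiry"] < now:
            return None
        self.entries.move_to_end(signature)  # mark as recently used
        value = entry["val"]
        if value.startswith("file:"):
            with open(self._path(value[5:]), encoding="utf-8") as handle:
                return handle.read()
        return value

    def store(self, signature: str, value: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        stored = value
        if len(value.encode("utf-8")) > MIN_SIZE_FOR_FILE:
            digest = hashlib.md5(signature.encode("utf-8")).hexdigest()
            with open(self._path(digest), "w", encoding="utf-8") as handle:
                handle.write(value)
            stored = f"file:{digest}"
        # Randomized TTL so entries do not all expire at once.
        expiry = now + random.randint(1000, 2000)
        self.entries[signature] = {"val": stored, "expiry": expiry}
        self.entries.move_to_end(signature)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used
        return value
```

Keeping only a short `file:` reference in memory bounds the size of each LRU entry, at the cost of one file read on a hit.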

Dynamic Timestamps:

Cached response with %{{NOW(timezone)}}
  ↓
On retrieval: Replace with current time in timezone
  ↓
Example: %{{NOW(Europe/London)}} → 14:32:15+0000
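
Placeholder substitution on retrieval can be sketched with a regular expression and the standard `zoneinfo` module (Python 3.9+, which needs time zone data available on the system). The function name and the `HH:MM:SS+zzzz` output format are taken from the example above and are otherwise assumptions.

```python
import re
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo

# Matches %{{NOW(Europe/London)}} and captures the timezone name.
NOW_PLACEHOLDER = re.compile(r"%\{\{NOW\(([^)]+)\)\}\}")


def fill_timestamps(text: str, now: Optional[datetime] = None) -> str:
    """Replace each placeholder with the current time in its timezone."""
    now = now or datetime.now(timezone.utc)

    def replace(match: "re.Match[str]") -> str:
        local = now.astimezone(ZoneInfo(match.group(1)))
        return local.strftime("%H:%M:%S%z")

    return NOW_PLACEHOLDER.sub(replace, text)
```

Because the placeholder is stored in the cached body and only resolved on retrieval, a cached response can still show a fresh clock time.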

12. Response Wrapping

Final response preparation:

Response text/bytes
  ↓
Determine content type
  ├─ PNG → image/png
  ├─ HTML → text/html
  └─ ANSI/text → text/plain
  ↓
Add buttons (if HTML and not format query)
  ├─ Add interactive UI elements
  └─ Wrap in HTML template
  ↓
Set HTTP headers
  ├─ Content-Type
  ├─ Cache-Control (PNG only)
  └─ Access-Control-Allow-Origin: *
  ↓
Return Flask response

13. Go Proxy Caching

After Python backend returns:

Python Backend → Response
  ↓
Go Proxy receives response
  ↓
If status code 200 or 304:
  ├─ Store in LRU cache
  ├─ Set expiry: current_time + random(1000, 1500) seconds
  └─ Remove InProgress flag
  ↓
Else (error):
  └─ Remove from cache
  ↓
Return response to client
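
For illustration, the store-or-evict decision looks like this when written in Python (the proxy itself is Go). The dataclass mirrors the Go cache entry described under Data Structures. The function name and the plain dictionary standing in for the LRU are assumptions.

```python
import random
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CachedResponse:
    in_progress: bool = False
    expires: float = 0.0
    body: bytes = b""
    headers: Dict[str, str] = field(default_factory=dict)
    status_code: int = 0


def store_response(cache: Dict[str, CachedResponse], key: str, status: int,
                   body: bytes, headers: Dict[str, str],
                   now: Optional[float] = None) -> None:
    """Cache 200/304 responses with a randomized TTL; drop anything else."""
    now = time.time() if now is None else now
    if status in (200, 304):
        cache[key] = CachedResponse(
            in_progress=False,                          # request finished
            expires=now + random.randint(1000, 1500),   # spread expiries
            body=body, headers=headers, status_code=status,
        )
    else:
        cache.pop(key, None)  # errors are not cached
```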

14. Peak Request Prefetching

Cron-based prefetching:

Every hour at :30 and :00
  ↓
Record incoming requests in sync.Map
  ↓
At :24 and :54 (6 minutes before each peak)
  ↓
Iterate through recorded requests
  ↓
For each request:
  ├─ Spawn goroutine
  ├─ Call processRequest() (refreshes cache)
  ├─ Sleep (spread over 300 seconds)
  └─ Delete from sync.Map
  ↓
Cache is warm for peak time

Peak Times:

  • :30 past the hour (recorded at :30, prefetched at :24)
  • :00 on the hour (recorded at :00, prefetched at :54)
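
Spreading the prefetch requests over the 300-second window avoids sending them all at once. One simple way to do this, shown in Python for illustration, is to space the start times evenly. Even spacing and the name `prefetch_delays` are assumptions. The source only states that the work is spread over 300 seconds.

```python
from typing import List


def prefetch_delays(n_requests: int, window_seconds: float = 300.0) -> List[float]:
    """Return a start delay (in seconds) for each recorded request."""
    if n_requests <= 0:
        return []
    step = window_seconds / n_requests
    return [i * step for i in range(n_requests)]
```

For example, three recorded requests would start at 0, 100, and 200 seconds into the window.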

Data Structures

Parsed Query

{
  "location": "London,GB",              # Normalized location
  "orig_location": "London",            # Original input
  "override_location_name": None,       # Display name override
  "full_address": "London, UK",         # Full address
  "country": "GB",                      # Country code
  "query_source_location": ("Paris", "France"),  # Client location
  "hemisphere": True,                   # North=True, South=False
  "lang": "en",                         # Language code
  "view": None,                         # View type (v2, etc.)
  "html_output": False,                 # HTML vs ANSI
  "png_filename": None,                 # PNG filename if .png request
  "ip_addr": "1.2.3.4",                # Client IP
  "user_agent": "curl/7.68.0",         # User agent
  "request_url": "http://...",         # Full request URL
  
  # Query options
  "use_metric": True,                   # Metric units
  "use_imperial": False,                # Imperial units
  "use_ms_for_wind": False,            # m/s for wind
  "narrow": False,                      # Narrow output
  "inverted_colors": False,            # Inverted colors
  "no-terminal": False,                 # Plain text
  "no-caption": False,                  # No caption
  "no-city": False,                     # No city name
  "no-follow-line": False,             # No follow line
  "days": "3",                          # Number of days
  "transparency": None,                 # PNG transparency
  "padding": False,                     # Add padding
  "force-ansi": False,                  # Force ANSI
}

Cache Entry (Go)

type responseWithHeader struct {
    InProgress bool          // Request being processed
    Expires    time.Time     // Expiration time
    Body       []byte        // Response body
    Header     http.Header   // HTTP headers
    StatusCode int           // HTTP status code
}

Cache Entry (Python)

{
  "val": "response text" or "file:md5hash",
  "expiry": 1702834567.123  # Unix timestamp
}

Error Handling Flow

Location Not Found

Location resolution fails
  ↓
Set location = NOT_FOUND_LOCATION ("not found")
  ↓
Fetch weather for default location (Oymyakon)
  ↓
Append "not found" message in user's language
  ↓
Return response

API Error

Weather API returns error
  ↓
Log error
  ↓
If HTML output:
  └─ Return malformed-response.html (500)
Else:
  └─ Return "capacity limit reached" message (503)

Rate Limit Exceeded

Check IP against limits (300/min, 3600/hour, 86400/day)
  ↓
If exceeded:
  └─ Return 429 with error message
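
A per-IP check against the three limits can be sketched with fixed-window counters. This is one possible approach and not necessarily how the service implements it. The class name is hypothetical, and old counters are never pruned in this sketch.

```python
import time
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Window length in seconds -> maximum requests per window.
LIMITS = {60: 300, 3600: 3600, 86400: 86400}


class RateLimiter:
    def __init__(self) -> None:
        self.counters: Dict[Tuple[str, int, int], int] = defaultdict(int)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return False when any window is exhausted (caller answers 429)."""
        now = time.time() if now is None else now
        buckets = [(ip, window, int(now // window)) for window in LIMITS]
        if any(self.counters[b] >= LIMITS[b[1]] for b in buckets):
            return False
        for bucket in buckets:
            self.counters[bucket] += 1
        return True
```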

Blocked Location

Check location against blacklist
  ↓
If blocked:
  └─ Return 403 Forbidden

Performance Optimizations

  1. Two-tier caching (Go + Python)
  2. Fast path (cache + static files checked first)
  3. File cache for large responses (>80 bytes)
  4. Prefetching at peak times
  5. ThreadPool for PNG rendering (25 workers)
  6. Gevent for async I/O in Python
  7. LRU eviction prevents memory bloat
  8. Randomized TTL prevents thundering herd
  9. InProgress flag prevents duplicate work
  10. IP location caching (persistent file cache)