# wttr.in Zig Rewrite Strategy
## Goals
1. **Single binary** - Replace Python + Go with one Zig executable
2. **No external binaries** - Eliminate wego, pyphoon dependencies
3. **Maintain compatibility** - All existing API endpoints work identically
4. **Improve performance** - Faster response times, lower memory usage
5. **Add tests** - Comprehensive test coverage from day one
6. **Simplify deployment** - Single binary + data files
## Non-Goals
- Rewriting weather APIs (still use met.no/WWO)
- Changing API surface (maintain backward compatibility)
- Rewriting geolocator service (keep as separate service for now)
## Phase 1: Analysis & Design ✓
**Status:** Complete (this document)
**Deliverables:**
- [x] ARCHITECTURE.md - System overview
- [x] API_ENDPOINTS.md - API reference
- [x] DATA_FLOW.md - Request processing flow
- [x] REWRITE_STRATEGY.md - This document
## Phase 2: Foundation (Week 1-2)
### 2.1 Project Setup
**Tasks:**
- [ ] Create `build.zig` with proper structure
- [ ] Set up module organization
- [ ] Configure dependencies (if any)
- [ ] Set up test framework
- [ ] Create CI/CD pipeline (GitHub Actions)
**Modules:**
```
src/
├── main.zig              # Entry point
├── server.zig            # HTTP server
├── router.zig            # Request routing
├── config.zig            # Configuration
├── cache/
│   ├── lru.zig           # LRU cache implementation
│   └── file.zig          # File-backed cache
├── location/
│   ├── resolver.zig      # Location resolution
│   ├── geoip.zig         # GeoIP lookups
│   └── normalize.zig     # Location normalization
├── weather/
│   ├── client.zig        # Weather API client
│   ├── metno.zig         # met.no API
│   └── wwo.zig           # WorldWeatherOnline API
├── render/
│   ├── ansi.zig          # ANSI rendering
│   ├── html.zig          # HTML rendering
│   ├── json.zig          # JSON rendering
│   ├── png.zig           # PNG rendering
│   ├── line.zig          # One-line format
│   └── v2.zig            # v2 format
├── i18n/
│   ├── translations.zig  # Translation system
│   └── loader.zig        # Load translation files
└── utils/
    ├── http.zig          # HTTP utilities
    ├── ip.zig            # IP address handling
    └── time.zig          # Time utilities
```
### 2.2 Core HTTP Server
**Tasks:**
- [ ] Implement HTTP server using `std.http.Server`
- [ ] Request parsing (headers, query params, path)
- [ ] Response building (status, headers, body)
- [ ] Basic routing (exact match, wildcard)
- [ ] Static file serving
**Tests:**
- [ ] Parse GET requests
- [ ] Parse query parameters
- [ ] Route to handlers
- [ ] Serve static files
- [ ] Handle 404s
**Acceptance Criteria:**
- Server listens on configurable port
- Handles concurrent requests
- Routes to placeholder handlers
- Serves files from share/ directory
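
The request-parsing tasks above can be prototyped independently of the server itself. The following is a minimal sketch, assuming the request target arrives as a single string such as `/London?format=3`; the names `Target`, `splitTarget`, and `queryParam` are hypothetical, and percent-decoding is deliberately left out.

```zig
const std = @import("std");

/// Path and query halves of a request target (hypothetical helper type).
pub const Target = struct { path: []const u8, query: []const u8 };

/// Splits a request target like "/London?format=3" at the first '?'.
pub fn splitTarget(target: []const u8) Target {
    const q = std.mem.indexOfScalar(u8, target, '?') orelse
        return .{ .path = target, .query = "" };
    return .{ .path = target[0..q], .query = target[q + 1 ..] };
}

/// Returns the raw (not percent-decoded) value of `key` in a query string
/// such as "format=3&lang=de", or null when the key is absent.
pub fn queryParam(query: []const u8, key: []const u8) ?[]const u8 {
    var pairs = std.mem.splitScalar(u8, query, '&');
    while (pairs.next()) |pair| {
        // A bare flag such as "T" has no '=' and maps to an empty value.
        const eq = std.mem.indexOfScalar(u8, pair, '=') orelse pair.len;
        if (std.mem.eql(u8, pair[0..eq], key)) {
            if (eq == pair.len) return "";
            return pair[eq + 1 ..];
        }
    }
    return null;
}
```

Both functions return slices into the original buffer, so they allocate nothing and can be unit-tested before any socket code exists.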
### 2.3 Configuration System
**Tasks:**
- [ ] Load environment variables
- [ ] Load config files (if needed)
- [ ] Validate configuration
- [ ] Provide defaults
**Configuration:**
```zig
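const Config = struct {
    listen_host: []const u8,
    listen_port: u16,
    geolite_path: []const u8,
    cache_dir: []const u8,
    // LogLevel is not defined in this sketch; std.log.Level is one candidate.
    log_level: LogLevel,

    // API keys (null when the corresponding service is not configured)
    ip2location_key: ?[]const u8,
    ipinfo_token: ?[]const u8,
    wwo_key: ?[]const u8,
};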
```
**Tests:**
- [ ] Load from environment
- [ ] Apply defaults
- [ ] Validate paths exist
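
Defaulting and validation can be kept separate from the environment lookup so that they are testable without touching the process environment. This is a sketch under that assumption: the caller passes in the optional raw value, and the helper names `portFromEnv` and `stringFromEnv` are hypothetical.

```zig
const std = @import("std");

/// Parses a port from an optional environment value, falling back to
/// `default` when the variable is unset. Invalid values are returned as
/// errors so that misconfiguration fails at startup instead of silently.
pub fn portFromEnv(value: ?[]const u8, default: u16) !u16 {
    const text = value orelse return default;
    const port = try std.fmt.parseInt(u16, text, 10);
    if (port == 0) return error.InvalidPort;
    return port;
}

/// Returns `value` unless it is unset or empty, in which case `default`.
pub fn stringFromEnv(value: ?[]const u8, default: []const u8) []const u8 {
    const text = value orelse return default;
    if (text.len == 0) return default;
    return text;
}
```

The actual environment read is left to the caller because the standard library call for it has moved between Zig releases.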
## Phase 3: Caching Layer (Week 2-3)
### 3.1 LRU Cache
**Tasks:**
- [ ] Implement generic LRU cache
- [ ] Thread-safe operations (Mutex)
- [ ] TTL support
- [ ] Eviction policy
- [ ] Cache statistics
**Tests:**
- [ ] Insert and retrieve
- [ ] LRU eviction
- [ ] TTL expiration
- [ ] Concurrent access
- [ ] Memory limits
**Acceptance Criteria:**
- Configurable size (default 12,800 entries)
- O(1) get/put operations
- Thread-safe
- TTL-based expiration
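
One way to meet the O(1) requirement is a hash map for lookup plus a linked list for recency order. The sketch below is an illustration, not a settled design: it assumes `u64` keys (for example a hash of the cache key), a compile-time capacity, and an index-linked list over a fixed slot array. The name `LruCache` is hypothetical, and TTL handling and the mutex are omitted.

```zig
const std = @import("std");

/// Fixed-capacity LRU cache sketch: the map finds a slot in O(1) and the
/// index-linked list over the slots updates recency in O(1).
pub fn LruCache(comptime V: type, comptime capacity: usize) type {
    if (capacity == 0) @compileError("capacity must be at least 1");
    return struct {
        const Self = @This();
        const none: usize = capacity; // sentinel index meaning "no slot"
        const Slot = struct { key: u64, value: V, prev: usize, next: usize };

        map: std.AutoHashMap(u64, usize),
        slots: [capacity]Slot = undefined,
        len: usize = 0,
        head: usize = none, // most recently used
        tail: usize = none, // least recently used

        pub fn init(allocator: std.mem.Allocator) Self {
            return .{ .map = std.AutoHashMap(u64, usize).init(allocator) };
        }

        pub fn deinit(self: *Self) void {
            self.map.deinit();
        }

        fn unlink(self: *Self, i: usize) void {
            const s = self.slots[i];
            if (s.prev != none) self.slots[s.prev].next = s.next else self.head = s.next;
            if (s.next != none) self.slots[s.next].prev = s.prev else self.tail = s.prev;
        }

        fn pushFront(self: *Self, i: usize) void {
            self.slots[i].prev = none;
            self.slots[i].next = self.head;
            if (self.head != none) self.slots[self.head].prev = i else self.tail = i;
            self.head = i;
        }

        /// Looks up `key` and marks it as most recently used.
        pub fn get(self: *Self, key: u64) ?V {
            const i = self.map.get(key) orelse return null;
            self.unlink(i);
            self.pushFront(i);
            return self.slots[i].value;
        }

        /// Inserts or updates `key`, evicting the least recently used
        /// entry when the cache is full.
        pub fn put(self: *Self, key: u64, value: V) !void {
            if (self.map.get(key)) |i| {
                self.slots[i].value = value;
                self.unlink(i);
                self.pushFront(i);
                return;
            }
            // Reserve map space first so a failed allocation changes nothing.
            try self.map.ensureUnusedCapacity(1);
            var i = self.len;
            if (self.len < capacity) {
                self.len += 1;
            } else {
                i = self.tail; // reuse the least recently used slot
                self.unlink(i);
                _ = self.map.remove(self.slots[i].key);
            }
            self.slots[i] = .{ .key = key, .value = value, .prev = none, .next = none };
            self.pushFront(i);
            self.map.putAssumeCapacity(key, i);
        }
    };
}
```

Because the slot array is stored inline, a cache with thousands of entries is better placed in a long-lived heap or global object than on the stack. A thread-safe version would take a lock around `get` and `put`, since even `get` mutates the recency list.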
### 3.2 File Cache
**Tasks:**
- [ ] Store large responses to disk
- [ ] MD5 hash for filenames
- [ ] Read/write with proper locking
- [ ] Cleanup old files
**Tests:**
- [ ] Write and read files
- [ ] Handle binary data
- [ ] Concurrent access
- [ ] Disk space limits
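
Deriving a filename from an MD5 digest could look like the sketch below. It assumes the cache key is already available as a byte string; the function name `cacheFileName` is hypothetical, and the hex encoding is written out by hand to avoid depending on formatting helpers that differ between Zig releases.

```zig
const std = @import("std");

/// Returns the lowercase hex MD5 digest of `key`, usable as a flat filename.
pub fn cacheFileName(key: []const u8) [32]u8 {
    var digest: [std.crypto.hash.Md5.digest_length]u8 = undefined;
    std.crypto.hash.Md5.hash(key, &digest, .{});

    const hex = "0123456789abcdef";
    var name: [32]u8 = undefined;
    for (digest, 0..) |byte, i| {
        name[2 * i] = hex[byte >> 4]; // high nibble
        name[2 * i + 1] = hex[byte & 0x0f]; // low nibble
    }
    return name;
}
```

MD5 is used here only to get evenly distributed, filesystem-safe names; it provides no security guarantee for the cached content.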
### 3.3 Cache Integration
**Tasks:**
- [ ] Generate cache keys
- [ ] Check cache before processing
- [ ] Store responses after processing
- [ ] Handle InProgress flag (prevent thundering herd)
**Tests:**
- [ ] Cache hit returns cached response
- [ ] Cache miss processes request
- [ ] Concurrent requests for same key
- [ ] Cache key generation
## Phase 4: Location Resolution (Week 3-4)
### 4.1 Location Parsing
**Tasks:**
- [ ] Parse location from URL
- [ ] Normalize location names
- [ ] Handle special prefixes (~, @)
- [ ] Parse cyclic locations (:)
- [ ] Load aliases file
- [ ] Load blacklist file
**Tests:**
- [ ] Parse city names
- [ ] Parse coordinates
- [ ] Parse IATA codes
- [ ] Parse special locations (Moon, etc.)
- [ ] Normalize names
- [ ] Apply aliases
- [ ] Check blacklist
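
The prefix handling above can be modelled as a small classification step that runs before normalization and alias lookup. The following sketch assumes that `~` introduces a search term and `@` a domain name; the `Location` union and `classify` function are hypothetical names, and IATA codes, cyclic locations, and special names such as Moon are not handled.

```zig
const std = @import("std");

/// What kind of location the user asked for (illustrative subset).
pub const Location = union(enum) {
    auto, // empty input: resolve from the client IP
    search: []const u8, // "~Eiffel+tower"
    domain: []const u8, // "@github.com"
    coords: struct { lat: f64, lon: f64 }, // "48.8566,2.3522"
    name: []const u8, // anything else, e.g. "London"
};

pub fn classify(raw: []const u8) Location {
    if (raw.len == 0) return .auto;
    if (raw[0] == '~') return .{ .search = raw[1..] };
    if (raw[0] == '@') return .{ .domain = raw[1..] };
    if (std.mem.indexOfScalar(u8, raw, ',')) |comma| {
        // Treat "a,b" as coordinates only when both halves parse as numbers;
        // otherwise it is a plain name such as "Paris, France".
        const lat = std.fmt.parseFloat(f64, raw[0..comma]) catch return .{ .name = raw };
        const lon = std.fmt.parseFloat(f64, raw[comma + 1 ..]) catch return .{ .name = raw };
        return .{ .coords = .{ .lat = lat, .lon = lon } };
    }
    return .{ .name = raw };
}
```

A tagged union keeps the later resolution code exhaustive: a `switch` over `Location` will not compile if a new variant is added without being handled.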
### 4.2 GeoIP Lookup
**Tasks:**
- [ ] Read MaxMind GeoLite2 database
- [ ] IP to location lookup
- [ ] Cache results
**Options:**
- Use C library (libmaxminddb) via `@cImport`
- Or: Parse MMDB format in pure Zig
**Tests:**
- [ ] Lookup IPv4 address
- [ ] Lookup IPv6 address
- [ ] Handle not found
- [ ] Cache lookups
### 4.3 External Geolocation
**Tasks:**
- [ ] HTTP client for geolocator service
- [ ] Parse JSON responses
- [ ] Error handling
**Tests:**
- [ ] Query geolocator
- [ ] Parse response
- [ ] Handle errors
- [ ] Timeout handling
### 4.4 IP Location APIs
**Tasks:**
- [ ] IP2Location API client
- [ ] IPInfo API client
- [ ] File-based caching
- [ ] Fallback chain
**Tests:**
- [ ] Query each API
- [ ] Parse responses
- [ ] Cache results
- [ ] Fallback on error
## Phase 5: Weather Data (Week 4-5)
### 5.1 HTTP Client
**Tasks:**
- [ ] Generic HTTP client
- [ ] Connection pooling
- [ ] Timeout handling
- [ ] Retry logic
- [ ] User-Agent setting
**Tests:**
- [ ] GET requests
- [ ] POST requests
- [ ] Headers
- [ ] Timeouts
- [ ] Retries
### 5.2 met.no Client
**Tasks:**
- [ ] API endpoint construction
- [ ] Parse XML/JSON responses
- [ ] Transform to internal format
- [ ] Error handling
**Tests:**
- [ ] Fetch weather data
- [ ] Parse response
- [ ] Handle API errors
- [ ] Handle rate limits
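
Endpoint construction for met.no can be a pure formatting function. The sketch below assumes the Locationforecast 2.0 `compact` endpoint and rounds coordinates to four decimals, which reflects met.no's guidance on coordinate precision as far as this document's author can rely on it; both choices should be checked against the current API terms. The function name `metnoUrl` is hypothetical.

```zig
const std = @import("std");

/// Writes the forecast URL for the given coordinates into `buf` and
/// returns the written slice. Fails with error.NoSpaceLeft if `buf`
/// is too small.
pub fn metnoUrl(buf: []u8, lat: f64, lon: f64) ![]const u8 {
    return std.fmt.bufPrint(
        buf,
        "https://api.met.no/weatherapi/locationforecast/2.0/compact?lat={d:.4}&lon={d:.4}",
        .{ lat, lon },
    );
}
```

Formatting into a caller-provided buffer keeps the hot path free of allocations; 128 bytes is enough for any valid latitude and longitude.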
### 5.3 WorldWeatherOnline Client
**Tasks:**
- [ ] API endpoint construction
- [ ] Parse JSON responses
- [ ] Transform to internal format
- [ ] Caching (proxy cache)
**Tests:**
- [ ] Fetch weather data
- [ ] Parse response
- [ ] Handle API errors
- [ ] Cache responses
### 5.4 Weather Data Model
**Tasks:**
- [ ] Define internal weather data structure
- [ ] Conversion from met.no format
- [ ] Conversion from WWO format
- [ ] JSON serialization
**Tests:**
- [ ] Convert met.no data
- [ ] Convert WWO data
- [ ] Serialize to JSON
- [ ] Validate data
## Phase 6: Rendering (Week 5-7)
### 6.1 ANSI Renderer
**Tasks:**
- [ ] Generate ANSI weather report
- [ ] ASCII art for weather conditions
- [ ] Color codes
- [ ] Box drawing characters
- [ ] Temperature graphs
- [ ] Wind direction arrows
**Tests:**
- [ ] Render current weather
- [ ] Render forecast
- [ ] Apply colors
- [ ] Handle narrow mode
- [ ] Handle inverted colors
**Acceptance Criteria:**
- Output matches wego format
- All weather conditions supported
- Configurable width
### 6.2 One-Line Renderer
**Tasks:**
- [ ] Parse format string
- [ ] Replace format codes (%c, %t, etc.)
- [ ] Handle predefined formats (1-4)
- [ ] Emoji support
**Tests:**
- [ ] Format 1-4
- [ ] Custom format strings
- [ ] All format codes
- [ ] Multiple locations
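
Format-code replacement can be sketched as a single pass over the format string. This example handles only `%c` and `%t` and copies everything else verbatim, including unknown codes; the `Current` struct and `renderLine` function are hypothetical, and the values are assumed to be preformatted strings.

```zig
const std = @import("std");

/// Values available to the one-line format (illustrative subset).
pub const Current = struct {
    icon: []const u8, // weather condition symbol
    temperature: []const u8, // preformatted, e.g. "+12C"
};

/// Expands %c and %t in `format` into `buf` and returns the written slice.
pub fn renderLine(buf: []u8, format: []const u8, w: Current) ![]const u8 {
    var out: usize = 0;
    var i: usize = 0;
    while (i < format.len) {
        // Default: copy one byte of the format string unchanged.
        var piece: []const u8 = format[i .. i + 1];
        var consumed: usize = 1;
        if (format[i] == '%' and i + 1 < format.len) {
            switch (format[i + 1]) {
                'c' => {
                    piece = w.icon;
                    consumed = 2;
                },
                't' => {
                    piece = w.temperature;
                    consumed = 2;
                },
                else => {}, // unknown code: leave the '%' as literal text
            }
        }
        if (out + piece.len > buf.len) return error.NoSpaceLeft;
        @memcpy(buf[out .. out + piece.len], piece);
        out += piece.len;
        i += consumed;
    }
    return buf[0..out];
}
```

The predefined formats 1-4 could then be expressed as constant format strings passed to the same function.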
### 6.3 JSON Renderer
**Tasks:**
- [ ] Serialize weather data to JSON
- [ ] Match WWO API format
- [ ] Pretty printing
**Tests:**
- [ ] Serialize current conditions
- [ ] Serialize forecast
- [ ] Match expected format
### 6.4 HTML Renderer
**Tasks:**
- [ ] Convert ANSI to HTML
- [ ] Apply CSS styling
- [ ] Add interactive buttons
- [ ] Template system
**Tests:**
- [ ] Convert ANSI codes
- [ ] Apply colors
- [ ] Render buttons
- [ ] Template rendering
### 6.5 PNG Renderer
**Tasks:**
- [ ] Render ANSI to image
- [ ] Font rendering
- [ ] Color support
- [ ] Transparency
**Options:**
- Use C library (libpng, freetype) via `@cImport`
- Or: Pure Zig implementation (more work)
**Tests:**
- [ ] Render text
- [ ] Apply colors
- [ ] Apply transparency
- [ ] Handle fonts
### 6.6 v2 Renderer
**Tasks:**
- [ ] Data-rich format
- [ ] Temperature graphs
- [ ] Moon phases
- [ ] Astronomical times
**Tests:**
- [ ] Render all sections
- [ ] Handle different locations
- [ ] Match expected format
### 6.7 Prometheus Renderer
**Tasks:**
- [ ] Convert weather data to metrics
- [ ] Prometheus text format
- [ ] Metric naming
**Tests:**
- [ ] Render metrics
- [ ] Match Prometheus format
- [ ] All weather fields
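
In the Prometheus text format, each metric is a `# HELP` line, a `# TYPE` line, and one sample line per label set. A minimal sketch of rendering a single gauge follows; the metric name and the `location` label are placeholders rather than the names the existing service emits, and label values are assumed to need no escaping.

```zig
const std = @import("std");

/// Renders one gauge with a single `location` label into `buf`.
/// Literal braces in the format string are escaped as "{{" and "}}".
pub fn renderGauge(
    buf: []u8,
    name: []const u8,
    help: []const u8,
    location: []const u8,
    value: f64,
) ![]const u8 {
    return std.fmt.bufPrint(
        buf,
        "# HELP {s} {s}\n# TYPE {s} gauge\n{s}{{location=\"{s}\"}} {d}\n",
        .{ name, help, name, name, location, value },
    );
}
```

A full renderer would also escape backslashes, quotes, and newlines in label values before writing them.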
## Phase 7: Translation System (Week 7-8)
### 7.1 Translation Loader
**Tasks:**
- [ ] Load translation files
- [ ] Parse translation format
- [ ] Build translation tables
- [ ] Language detection
**Tests:**
- [ ] Load all language files
- [ ] Parse translations
- [ ] Detect language from header
- [ ] Detect language from subdomain
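
Language detection from the two sources listed above can be sketched as two small string functions. This version is a simplification: it takes the first tag of `Accept-Language` and ignores q-values, and it treats only a two-letter alphabetic first host label as a language code. The names `langFromHeader` and `langFromHost` are hypothetical.

```zig
const std = @import("std");

/// Primary subtag of the first Accept-Language entry,
/// e.g. "de-DE,de;q=0.9,en;q=0.8" yields "de". q-values are ignored.
pub fn langFromHeader(header: []const u8) ?[]const u8 {
    const comma = std.mem.indexOfScalar(u8, header, ',') orelse header.len;
    const first = std.mem.trim(u8, header[0..comma], " ");
    const end = std.mem.indexOfAny(u8, first, "-;") orelse first.len;
    if (end == 0 or first[0] == '*') return null;
    return first[0..end];
}

/// Language from a subdomain: "de.wttr.in" yields "de". Labels that are
/// not two letters (for example "v2" or "wttr") yield null.
pub fn langFromHost(host: []const u8) ?[]const u8 {
    const dot = std.mem.indexOfScalar(u8, host, '.') orelse return null;
    if (dot != 2) return null;
    if (!std.ascii.isAlphabetic(host[0]) or !std.ascii.isAlphabetic(host[1])) return null;
    return host[0..2];
}
```

The returned code still has to be checked against the set of loaded translations, with English as the fallback.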
### 7.2 Message Translation
**Tasks:**
- [ ] Translate weather conditions
- [ ] Translate UI messages
- [ ] Fallback to English
**Tests:**
- [ ] Translate conditions
- [ ] Translate messages
- [ ] Handle missing translations
- [ ] Fallback logic
## Phase 8: Request Processing (Week 8-9)
### 8.1 Query Parser
**Tasks:**
- [ ] Parse query parameters
- [ ] Parse single-letter options
- [ ] Parse PNG filenames
- [ ] Serialize/deserialize state
**Tests:**
- [ ] Parse all options
- [ ] Parse PNG filenames
- [ ] Serialize state
- [ ] Deserialize state
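
Single-letter options arrive as a run of characters such as `0qT`. The sketch below maps a subset of them onto a struct of flags; the `Options` field names are hypothetical, and the letters shown are assumed to carry the meanings given in the comments, which should be confirmed against API_ENDPOINTS.md.

```zig
const std = @import("std");

/// Parsed view options (illustrative subset).
pub const Options = struct {
    narrow: bool = false, // 'n'
    quiet: bool = false, // 'q'
    no_terminal: bool = false, // 'T': no terminal escape sequences
    metric: bool = false, // 'm'
    days: ?u8 = null, // '0'..'2': number of forecast days
};

/// Applies each recognized letter in `flags`; unknown letters are ignored.
pub fn parseFlags(flags: []const u8) Options {
    var o = Options{};
    for (flags) |ch| {
        switch (ch) {
            'n' => o.narrow = true,
            'q' => o.quiet = true,
            'T' => o.no_terminal = true,
            'm' => o.metric = true,
            '0'...'2' => o.days = ch - '0',
            else => {},
        }
    }
    return o;
}
```

Keeping the result in a plain struct makes the serialize and deserialize tasks straightforward, because the whole request state is one value.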
### 8.2 Request Handler
**Tasks:**
- [ ] Main request handler
- [ ] Fast path (cache + static)
- [ ] Full path (location + weather + render)
- [ ] Error handling
- [ ] Rate limiting
**Tests:**
- [ ] Handle cached requests
- [ ] Handle static pages
- [ ] Handle weather queries
- [ ] Handle errors
- [ ] Enforce rate limits
### 8.3 Rate Limiting
**Tasks:**
- [ ] Per-IP counters
- [ ] Time buckets (minute, hour, day)
- [ ] Whitelist support
- [ ] Return 429 on limit
**Tests:**
- [ ] Count requests
- [ ] Enforce limits
- [ ] Reset counters
- [ ] Whitelist bypass
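
A fixed-window counter is one simple way to implement the per-IP time buckets. The sketch below models a single client and a single bucket; the `Window` type is hypothetical, the limits are placeholders, and the current time is passed in explicitly so the logic can be tested without a clock.

```zig
const std = @import("std");

/// Fixed-window request counter for one client and one bucket size.
pub const Window = struct {
    limit: u32, // maximum requests per window
    period_s: i64, // window length in seconds (60, 3600, 86400, ...)
    start: i64 = 0, // start of the current window, Unix seconds
    count: u32 = 0,

    /// Records a request at `now` (Unix seconds). Returns false when the
    /// limit is exhausted, in which case the caller should respond 429.
    pub fn allow(self: *Window, now: i64) bool {
        if (now - self.start >= self.period_s) {
            // Align the new window to a multiple of the period.
            self.start = now - @mod(now, self.period_s);
            self.count = 0;
        }
        if (self.count >= self.limit) return false;
        self.count += 1;
        return true;
    }
};
```

A full limiter would keep one `Window` per bucket size in a map keyed by client IP, skip the check for whitelisted addresses, and periodically drop idle entries.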
## Phase 9: Integration & Testing (Week 9-10)
### 9.1 End-to-End Tests
**Tasks:**
- [ ] Test all API endpoints
- [ ] Test all output formats
- [ ] Test all query options
- [ ] Test error cases
- [ ] Compare with Python output
**Test Cases:**
- [ ] Basic weather query
- [ ] Location resolution
- [ ] All output formats
- [ ] All languages
- [ ] PNG rendering
- [ ] Rate limiting
- [ ] Error handling
### 9.2 Performance Testing
**Tasks:**
- [ ] Benchmark request latency
- [ ] Benchmark throughput
- [ ] Memory profiling
- [ ] Cache hit rates
- [ ] Compare with Python/Go
**Metrics:**
- Requests per second
- Average latency
- P95/P99 latency
- Memory usage
- Cache hit rate
### 9.3 Compatibility Testing
**Tasks:**
- [ ] Run integration test (`test/query.sh`)
- [ ] Compare SHA1 hashes
- [ ] Fix any differences
- [ ] Document intentional changes
## Phase 10: Deployment (Week 10-11)
### 10.1 Packaging
**Tasks:**
- [ ] Build release binary
- [ ] Package data files
- [ ] Create Docker image
- [ ] Write deployment docs
### 10.2 Migration Plan
**Tasks:**
- [ ] Deploy Zig version alongside Python/Go
- [ ] Route small percentage of traffic
- [ ] Monitor errors and performance
- [ ] Gradually increase traffic
- [ ] Full cutover
**Rollback Plan:**
- Keep Python/Go running
- Route traffic back if issues
- Fix issues in Zig version
- Retry migration
### 10.3 Documentation
**Tasks:**
- [ ] Update README
- [ ] Installation instructions
- [ ] Configuration guide
- [ ] API documentation
- [ ] Development guide
## Risks & Mitigations
### Risk: PNG Rendering Complexity
**Mitigation:**
- Start with external library (libpng, freetype)
- Consider pure Zig later if needed
- Or: Keep PNG rendering in separate service
### Risk: GeoIP Database Parsing
**Mitigation:**
- Use C library (libmaxminddb) initially
- Consider pure Zig parser later
- Well-documented format
### Risk: ANSI Rendering Differences
**Mitigation:**
- Extensive testing against wego output
- Pixel-perfect comparison for PNG
- Accept minor differences if functionally equivalent
### Risk: Performance Regression
**Mitigation:**
- Benchmark early and often
- Profile hot paths
- Optimize critical sections
- Compare with Python/Go baseline
### Risk: Translation File Parsing
**Mitigation:**
- Simple format (key: value)
- Robust parser with error handling
- Validate all files on startup
### Risk: Weather API Changes
**Mitigation:**
- Abstract API clients
- Version API responses
- Monitor for changes
- Fallback to other API
## Success Criteria
### Functional
- [ ] All API endpoints work
- [ ] All output formats match
- [ ] All languages supported
- [ ] All query options work
- [ ] Integration tests pass
### Performance
- [ ] Latency < Python/Go
- [ ] Throughput > Python/Go
- [ ] Memory < Python/Go
- [ ] Binary size < 10MB
### Quality
- [ ] >80% test coverage
- [ ] No memory leaks
- [ ] No crashes under load
- [ ] Clean error handling
### Operational
- [ ] Single binary deployment
- [ ] No external dependencies (except data files)
- [ ] Easy configuration
- [ ] Good logging
- [ ] Metrics/monitoring
## Timeline
**Total: 10-11 weeks**
- Week 1-2: Foundation
- Week 2-3: Caching
- Week 3-4: Location
- Week 4-5: Weather
- Week 5-7: Rendering
- Week 7-8: Translation
- Week 8-9: Request processing
- Week 9-10: Integration & testing
- Week 10-11: Deployment
**Milestones:**
1. **Week 2:** HTTP server running, basic routing
2. **Week 4:** Location resolution working
3. **Week 6:** Weather data fetching working
4. **Week 8:** ANSI rendering working
5. **Week 10:** All features complete, testing done
6. **Week 11:** Deployed to production
## Alternative: Incremental Approach
Instead of full rewrite, replace components incrementally:
### Option A: Replace Go Proxy First
1. Rewrite Go proxy in Zig (Week 1-2)
2. Keep Python backend
3. Test and deploy
4. Then rewrite Python backend (Week 3-11)
**Advantages:**
- Smaller initial scope
- Faster time to value
- De-risks Zig HTTP handling
- Can abandon if issues
### Option B: Replace Python Backend First
1. Keep Go proxy
2. Rewrite Python backend in Zig (Week 1-10)
3. Test and deploy
4. Then replace Go proxy (Week 11)
**Advantages:**
- Most complexity in backend
- Proves out rendering logic
- Can keep Go proxy if it works well
### Recommendation: Option A
Start with Go proxy replacement:
- Smaller scope (about 400 lines of Go vs 5000+ lines of Python)
- Clear interface boundary
- Tests caching/HTTP in Zig
- Quick win (2 weeks)
- De-risks larger rewrite
## Next Steps
1. **Review this document** - Get feedback on approach
2. **Choose strategy** - Full rewrite vs incremental
3. **Set up project** - Create build.zig, directory structure
4. **Start coding** - Begin with HTTP server or Go proxy replacement
5. **Iterate** - Build, test, refine
## Questions to Answer
- [ ] Use C libraries (libpng, freetype, libmaxminddb) or pure Zig?
- [ ] Keep geolocator as separate service or integrate?
- [ ] Keep wego/pyphoon or rewrite rendering?
- [ ] Full rewrite or incremental replacement?
- [ ] Target Zig version (0.11, 0.12, 0.13)?
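- [ ] Concurrency strategy (thread per connection, thread pool, or event loop)? Note that `std.event` relies on async/await, which the self-hosted compiler in Zig 0.11-0.13 does not support.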