Commit Graph

2 Commits

Author SHA1 Message Date
damir 63ca005b6e DEBUG OBSERVABILITY: live error feed + auto-triage bot + dashboard
PHASE 1 — DEBUG mode:
- /etc/systemd/system/pgz-sport.service.d/debug.conf: DEBUG=1, LOG_LEVEL=DEBUG, PYTHONUNBUFFERED=1, UVICORN_LOG_LEVEL=debug

PHASE 2 — Error stream:
- /opt/pgz-sport/scripts/debug_tail.sh: tail journalctl + nginx → /var/log/pgz-sport-debug/{stream,errors}.jsonl
- pgz-debug-tail.service (always restart, multiplexes 4 sources)

PHASE 3 — Auto-triage bot:
- /opt/pgz-sport/scripts/auto_triage.py: classifies errors, dispatches CC agents
- Patterns: 5xx spike → CC4, 401/403 spike → CC2, 4xx API → CC3, ImportError/DB → CC4
- Rate limit: 6 Telegram messages per 5 min
- Records decisions in triage_decisions.jsonl
- pgz-auto-triage.service

PHASE 4 — Live dashboard:
- routers/debug_router.py mounted in pgz_sport_api
- GET /api/debug/health — services + DB + error count
- GET /api/debug/errors?limit=N — last N errors (JSON)
- GET /api/debug/decisions — auto-fix decisions
- GET /api/debug/stream — full log tail
- GET /api/debug/dashboard — live HTML refresh 5s

Damir admin tier dashboard: https://sport.rinet.one/sport/api/debug/dashboard
2026-05-05 08:46:09 +02:00
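
The Phase 3 routing table and rate limit in the commit above can be sketched roughly as follows. This is a minimal illustration, not the contents of auto_triage.py: the record fields (`status`, `path`, `message`), the regex standing in for "ImportError/DB", and the `SlidingWindowLimiter` class are all assumptions. Spike detection is omitted, so the sketch classifies one record at a time.

```python
import re
import time
from collections import deque

# Hypothetical pattern for the "ImportError/DB" rule; the real patterns are not shown in the commit.
FATAL_PATTERN = re.compile(r"ImportError|OperationalError|database", re.IGNORECASE)


def classify(record):
    """Map one error record to an agent name (CC2/CC3/CC4) or None.

    The field names "status", "path" and "message" are assumed for illustration.
    """
    message = record.get("message") or ""
    path = record.get("path") or ""
    status = record.get("status")
    if FATAL_PATTERN.search(message):
        return "CC4"  # ImportError/DB -> CC4
    if isinstance(status, int):
        if 500 <= status <= 599:
            return "CC4"  # 5xx -> CC4
        if status in (401, 403):
            return "CC2"  # 401/403 -> CC2
        if 400 <= status <= 499 and path.startswith("/api/"):
            return "CC3"  # 4xx on API routes -> CC3
    return None


class SlidingWindowLimiter:
    """Allow at most max_events sends within any window_seconds span."""

    def __init__(self, max_events=6, window_seconds=300.0, clock=time.monotonic):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._clock = clock
        self._sent = deque()  # timestamps of sends that were allowed

    def allow(self):
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) >= self.max_events:
            return False
        self._sent.append(now)
        return True
```

A caller would check `limiter.allow()` before each Telegram notification and skip the send (while still recording the decision) when it returns False.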
damir 4fc8327789 R7+ orchestrator + CC3 logo home: combined patches
Orchestrator-side:
- routers/img_proxy_router.py: 4xx/5xx → 1x1 transparent PNG (eliminates cascading <img onerror> failures)
- static/sport2.html: removed standalone three.min.js (3d-force-graph bundles three.js), bumped to 1.73.4

CC3 (before limit hit):
- Logo home link applied to ALL HTML pages (admin.html, admin_users.html, audit.html, crm.html, erp.html, kpi.html, login.html)
- Backups in _backups/*.cc3_pre_logo.$ts

CC4 R3 (before plan mode):
- _backups/r3_cc4/ocr.py.pre_S2.$ts

Audit screenshots (80 pages) committed to _audit/audit_20260505_023639/shots/
2026-05-05 08:20:07 +02:00