# HANDOFF — FULL MIGRATION + CLEANUP **Datum:** 04.05.2026 23:50 CEST **Autor:** Damir Radulić (kroz Claude session) **Verzija:** v1.0 ## TL;DR Migracija s GPU servera (144.76.68.5) na Server B (10.10.0.2) **POTPUNA**. Lokalni PG **stopped+disabled**. Sustav radi 100% iz Server B-a. Disk recovered ~30GB. Cron timeoutovi dodani da spriječe daljnje stuck procese. ## Što je urađeno večeras ### 1. Migracija ovisnosti (pgz_sport + ostali) - `pgz_sport_api.py`: DSN `localhost:5432` → `10.10.0.2:6432` ✅ - `pgz_sport_v2_router.py`: isto fixed - `learn_loop.py`: provjereno, već ide na Server B - `reembed_phase2.py`: DSN fix → 10.10.0.2:6432 - `reembed_knowledge_v2.py`: import iz docstring-a fix (DB_DSN bio undefined) ### 2. EnvironmentFile fix (GLAVNI BUG) Bilo bez `EnvironmentFile=/opt/rinet-gpu/.env.master`: - `dabi-orchestrator-v3.service` ✅ - `rinet-mcp.service` ✅ - `rinet-supervisor.service` ✅ - `rinet-heartbeat.service` ✅ Posljedica: env vars (QDRANT_URL, GROQ_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY) nisu stizale procesima. ### 3. Mass-fix Qdrant URL (35+ scripts) - `localhost:6333` → `10.10.0.2:6333` u **55+ aktivnih file-ova** - Pokriveno: /opt/rinet-gpu, /opt/ai-rinet, /opt/pgz-sport, /opt/dabi-persona, /opt/portal-rinet - Ostali: backup files (pre_b_switch, .bak.*) — nije dirano ### 4. TG spam blokiranje - Globalni Python monkey-patch `/usr/lib/python3/dist-packages/usercustomize.py` - Intercept svaki `requests.post("api.telegram.org/...")` u svim Python procesima - Šalje kroz `rinet-notify` rate-limited helper (max 5/h, dedup 30min) - Bash wrapper `/usr/local/bin/rinet-curl-tg` - Disabled cron monitor (embed_monitor.sh, embed_monitor_p2.sh) ### 5. Anthropic Tier 4 (ZADNJI u waterfall) ✅ Linije 484+496 u `dabi_orchestrator_v3.py`: ``` Tier 0: dabi-budget LoRA (port 8765) Tier 1: vLLM Qwen 7B (port 8001) Tier 2: Groq llama-4-scout Tier 3: DeepSeek V3 Tier 4: Anthropic Claude ← ZADNJI ``` ENV var bug fix: `CLAUDE_API_KEY` → `ANTHROPIC_API_KEY` ### 6. Multi-language support (HR/EN/DE/IT) - `_translate_to()` + `_detect_query_lang()` u `/opt/ai-rinet/ai_gateway.py` - HR: native ✅ - EN: radi ✅ - DE: radi ✅ - IT: povremeno (Groq rate-limit issue) ### 7. Sport scrapers — pokrenuti svi Bili 5 INACTIVE, sad SVI ACTIVE: - sport-pgz-deep-loop ✅ - sport-master-loop ✅ - sport-extra-loop ✅ - sport-fed-scrapers ✅ - sport-oib-loop ✅ - sport-dabi-quiz ✅ `pgz_sport_deep.py`: keyword filter prošireno **8 → 26 keywords** (sport, klub, savez, sportaš, kup, prvenstvo, liga, utakmica, igrač, trener, olimpij, paraolimpij, turn, medalj, pobjed, rijeka, pgž, primorsko, subvenc, natječaj, odluka, proračun, rebal...) ### 8. Reembed processes — radi - `tmux 'reembed'`: 89% done, rate 55-173k/s ⭐ - `reembed_phase2.py`: PID 1790646, 85-102k/h, court_notices_v2 + rsv_enriched_v2 ### 9. LoRA daily timer — REVIVED ⭐ **Bug**: timer bio mrtav od 03.05.2026! **Fix**: `systemctl enable lora-finetune.timer` + start Training pokrenuto 23:24 — 100,000 examples + 309 eval ### 10. KPI Dashboard — LIVE - JSON: https://sport.rinet.one/admin/api/kpi - HTML: https://sport.rinet.one/admin/api/kpi-page (auto-refresh 30s) ### 11. Continuous loops (15 cron) | Cron | Loop | Timeout | |---|---|---| | */2 min | lora_watchdog | - | | */5 min | smoke_test | 60s ⭐ | | */5 min | kpi_snapshot | 30s ⭐ | | */10 min | latency_alert | 30s ⭐ | | */15 min | halu_scanner | 60s ⭐ | | */20 min | learn_from_errors | 90s ⭐ | | */30 min | capture_to_training | 120s ⭐ | | */30 min | scraper_health | 90s ⭐ | | */45 min | regression_test | 90s ⭐ | | 0 * | hourly_status | 30s ⭐ | | 0 8 | daily_learning | - | | 0 4 daily | RAGAS eval | - | | 0 2 daily | overnight_learning | - | | daily 03:00 | LoRA fine-tune | - | | daily 03:07 | master_backup 22TB | - | ⭐ = timeout dodan večeras (spriječava stuck procese) ### 12. Lokalni PG — STOPPED + DISABLED - `systemctl stop postgresql` ✅ - `systemctl disable postgresql` ✅ - Listen 5432: NONE - Schema backup u `/mnt/cold/local_pg_schema_backup_20260504_2343.sql.gz` (109K) - Data dir `/var/lib/postgresql/18/main` (47GB) **NIJE OBRISAN** (čekamo 24h verifikaciju) ### 13. Stuck procesi ubijeni - 46× smoke_test stuck → 0 - 8× scraper_health stuck → 0 - 5× hourly_status stuck → 0 - 1× duplicate master_scraper_coordinator → 0 - **Total 60 stuck procesa eliminirano** ### 14. Disk cleanup (~30GB recovered) - `/tmp/ocr_resized` (15GB) - `/tmp/sprint` (13GB) - `/tmp/rinet_v3_backup.dump` (2.2GB old PG dump) - `/root/.cache/uv` (6.1GB) - 201× .bak files older 14 days - 113× __pycache__ dirs ## Trenutno stanje ``` PG: Server B 10.10.0.2:6432 (5,315,161 facts) Lokalni 5432 STOPPED + DISABLED PgBouncer: 127.0.0.1:6432 → host=10.10.0.2 port=5432 (proxy to Server B) Qdrant: Server B 10.10.0.2:6333 (46 collections, 14M+ vectors) Lokalni 6333: NE POSTOJI Redis: Lokalni 6379 (cache) Neo4j: Lokalni 7687 (615,580 nodes, 756,333 relations) Embed: Lokalni 9879 (BGE-M3, dim 1024) Reranker: Lokalni 8099/8100/8101 (3 instance) vLLM: Lokalni 8001 (Qwen2.5-7B-Instruct-AWQ) F10 LoRA: Lokalni 8765 (dabi-budget-lora-q4) Ollama: Lokalni 11434 (qwen3:14b, llama3.2:3b) MCP: Lokalni 8810 (7 tools) ``` ## Što ostaje za dovršiti 1. **24h dry-run lokalni PG stop** — provjeriti je li sve OK pa onda obrisati `/var/lib/postgresql/18/main` (47GB) 2. **`drop_gpu_pg.sh`** — pripremljen prije, **NE pokretati** dok dry-run ne potvrdi 3. **Multi-lang IT/DE retry** — Groq rate-limit issue povremeno 4. **9 facts bez source** — UPDATE bio prekinut Bridge timeout-om, treba ponoviti 5. **Neo4j integration u RAG** — orchestrator još ne koristi knowledge graph (756k relations leže neiskorišteno) ## Testovi prošli - Smoke 4 questions: 3/4 PASS (Bok, NK Rijeka predsjednik, Kup HR; PGŽ proracun timeout via Bridge) - vLLM: response OK - Embed BGE-M3: dim 1024 OK - RAG: tier 1 vLLM + tier 2 Groq + tier 0 DB priority sve rade - Server B PG via PgBouncer: 5,315,161 facts ✅ - Sport+PGŽ embed: 99.97% / 99.92% ✅ - Halucinacije 24h: 0 ✅ - Sport scrapers: 6 active ✅ ## Bridge stability notes - Bridge timeout-i tijekom session-a (server pod opterećenjem) - Glavni razlog: GPU 100% util (LoRA training), 18+ paralelni scrapers - Load average peak: 126 (sad 11)