HANDOFF — FULL MIGRATION + CLEANUP
Date: 04.05.2026 23:50 CEST
Author: Damir Radulić (via Claude session)
Version: v1.0
TL;DR
The migration from the GPU server (144.76.68.5) to Server B (10.10.0.2) is COMPLETE. The local PG is stopped and disabled. The system runs 100% from Server B. About 30GB of disk was recovered. Cron timeouts were added to prevent further stuck processes.
What was done tonight
1. Dependency migration (pgz_sport + others)
- pgz_sport_api.py: DSN localhost:5432 → 10.10.0.2:6432 ✅
- pgz_sport_v2_router.py: same fix
- learn_loop.py: checked, already points to Server B
- reembed_phase2.py: DSN fix → 10.10.0.2:6432
- reembed_knowledge_v2.py: fixed an import that had ended up inside a docstring (DB_DSN was undefined)
2. EnvironmentFile fix (MAIN BUG)
These units were missing EnvironmentFile=/opt/rinet-gpu/.env.master:
- dabi-orchestrator-v3.service ✅
- rinet-mcp.service ✅
- rinet-supervisor.service ✅
- rinet-heartbeat.service ✅
Consequence: the env vars (QDRANT_URL, GROQ_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY) were not reaching the processes.
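
To catch this class of bug earlier, a small check can list units that still lack the directive. This is a minimal sketch, not the script used tonight: the function names and the idea of passing a unit directory are assumptions, and it only looks for the exact line (systemd also accepts a `-` prefix on the path, which this sketch does not handle).

```python
from pathlib import Path

# The directive the four services were missing.
ENV_LINE = "EnvironmentFile=/opt/rinet-gpu/.env.master"


def has_env_file(unit_text: str, env_line: str = ENV_LINE) -> bool:
    """Return True if the unit text contains the exact directive on an uncommented line."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", ";")):  # systemd comment markers
            continue
        if line == env_line:
            return True
    return False


def units_missing_env_file(unit_dir: Path, names: list[str]) -> list[str]:
    """List unit names whose file is absent or lacks the directive (hypothetical helper)."""
    missing = []
    for name in names:
        path = unit_dir / name
        if not path.is_file() or not has_env_file(path.read_text()):
            missing.append(name)
    return missing
```

Calling `units_missing_env_file(Path("/etc/systemd/system"), [...])` with the four unit names above should return an empty list after the fix, assuming the units live in that directory.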
3. Mass-fix Qdrant URL (35+ scripts)
- localhost:6333 → 10.10.0.2:6333 in 55+ active files
- Covered: /opt/rinet-gpu, /opt/ai-rinet, /opt/pgz-sport, /opt/dabi-persona, /opt/portal-rinet
- Remaining: backup files (pre_b_switch, .bak.*), not touched
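
The same kind of rewrite can be sketched in Python for future host moves. This is an illustration under assumptions, not the command that was actually run: it only touches `*.py` files, the helper names are hypothetical, and backup files are recognised purely by `pre_b_switch` or `.bak` in the file name.

```python
from pathlib import Path

OLD = "localhost:6333"
NEW = "10.10.0.2:6333"
BACKUP_MARKERS = ("pre_b_switch", ".bak")  # name fragments that mark backup files


def is_backup(path: Path) -> bool:
    """Return True for files that should be left untouched."""
    return any(marker in path.name for marker in BACKUP_MARKERS)


def rewrite_qdrant_url(root: Path, dry_run: bool = True) -> list[Path]:
    """Return the files containing OLD; rewrite them in place unless dry_run is set."""
    changed = []
    for path in sorted(root.rglob("*.py")):
        if not path.is_file() or is_backup(path):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip files that are not valid UTF-8 rather than risk corrupting them
        if OLD not in text:
            continue
        changed.append(path)
        if not dry_run:
            path.write_text(text.replace(OLD, NEW), encoding="utf-8")
    return changed
```

Running it first with `dry_run=True` gives the list of affected files for review before anything is written.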
4. Telegram (TG) spam blocking
- Global Python monkey-patch in /usr/lib/python3/dist-packages/usercustomize.py
- Intercepts every requests.post("api.telegram.org/...") in all Python processes
- Sends through the rate-limited rinet-notify helper (max 5/h, dedup 30 min)
- Bash wrapper /usr/local/bin/rinet-curl-tg
- Disabled cron monitors (embed_monitor.sh, embed_monitor_p2.sh)
5. Anthropic Tier 4 (LAST in the waterfall) ✅
Lines 484 and 496 in dabi_orchestrator_v3.py:
Tier 0: dabi-budget LoRA (port 8765)
Tier 1: vLLM Qwen 7B (port 8001)
Tier 2: Groq llama-4-scout
Tier 3: DeepSeek V3
Tier 4: Anthropic Claude ← LAST
ENV var bug fix: CLAUDE_API_KEY → ANTHROPIC_API_KEY
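
The tier order above amounts to a first-success fallback chain. The following is a minimal sketch of that pattern, assuming each tier is wrapped as a callable that takes a prompt and returns text; `ask_waterfall` is a hypothetical name and not the code in dabi_orchestrator_v3.py.

```python
from typing import Callable, Sequence, Tuple

# A provider is a (name, callable) pair; the callable takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def ask_waterfall(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order and return (name, answer) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            answer = call(prompt)
        except Exception as exc:  # any failure moves on to the next tier
            errors.append(f"{name}: {exc}")
            continue
        if answer:
            return name, answer
        errors.append(f"{name}: empty answer")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

Because the list is walked in order, putting the Anthropic callable last means it is only invoked when every cheaper tier has failed or returned nothing.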
6. Multi-language support (HR/EN/DE/IT)
- _translate_to() + _detect_query_lang() in /opt/ai-rinet/ai_gateway.py
- HR: native ✅
- EN: works ✅
- DE: works ✅
- IT: intermittent (Groq rate-limit issue)
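
One common way to soften intermittent rate-limit failures is retrying with exponential backoff. The sketch below is an assumption about how a retry could look, not existing code in ai_gateway.py; `RateLimited` is a hypothetical marker exception that the caller would raise when the provider reports a rate limit.

```python
import time


class RateLimited(Exception):
    """Hypothetical marker raised by the caller when the provider signals a rate limit."""


def retry_on_rate_limit(call, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call call() up to `attempts` times, doubling the wait after each rate-limit error."""
    for attempt in range(attempts):
        try:
            return call()
        except RateLimited:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller fall through to the next tier
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the backoff schedule can be checked without real waiting.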
7. Sport scrapers: all started
5 were INACTIVE; now ALL 6 are ACTIVE:
- sport-pgz-deep-loop ✅
- sport-master-loop ✅
- sport-extra-loop ✅
- sport-fed-scrapers ✅
- sport-oib-loop ✅
- sport-dabi-quiz ✅
pgz_sport_deep.py: keyword filter expanded from 8 to 26 keywords (sport, klub, savez, sportaš, kup, prvenstvo, liga, utakmica, igrač, trener, olimpij, paraolimpij, turn, medalj, pobjed, rijeka, pgž, primorsko, subvenc, natječaj, odluka, proračun, rebal...)
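
Since most of the keywords are word stems (olimpij, medalj, subvenc), the filter presumably matches substrings. The sketch below assumes case-insensitive substring matching, which may not be how pgz_sport_deep.py does it; the function name is hypothetical and the keyword tuple holds only the entries that are spelled out in full above.

```python
# Keywords copied from the note; the truncated tail of the list is intentionally left out.
KEYWORDS = (
    "sport", "klub", "savez", "sportaš", "kup", "prvenstvo", "liga", "utakmica",
    "igrač", "trener", "olimpij", "paraolimpij", "turn", "medalj", "pobjed",
    "rijeka", "pgž", "primorsko", "subvenc", "natječaj", "odluka", "proračun",
)


def matched_keywords(text: str, keywords=KEYWORDS) -> list:
    """Return the keywords that occur in text as case-insensitive substrings."""
    lowered = text.casefold()
    return [kw for kw in keywords if kw in lowered]
```

Substring matching keeps inflected Croatian forms in scope but also produces false positives (for example "kup" inside "kupovina"), so a caller may want to require more than one hit.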
8. Reembed processes: running
- tmux 'reembed': 89% done, rate 55-173k/s ⭐
- reembed_phase2.py: PID 1790646, 85-102k/h, court_notices_v2 + rsv_enriched_v2
9. LoRA daily timer: REVIVED ⭐
Bug: the timer had been dead since 03.05.2026!
Fix: systemctl enable lora-finetune.timer + start
Training started at 23:24 with 100,000 examples + 309 eval
10. KPI Dashboard — LIVE
- JSON: https://sport.rinet.one/admin/api/kpi
- HTML: https://sport.rinet.one/admin/api/kpi-page (auto-refresh 30s)
11. Continuous loops (15 cron)
| Cron | Loop | Timeout |
|---|---|---|
| */2 min | lora_watchdog | - |
| */5 min | smoke_test | 60s ⭐ |
| */5 min | kpi_snapshot | 30s ⭐ |
| */10 min | latency_alert | 30s ⭐ |
| */15 min | halu_scanner | 60s ⭐ |
| */20 min | learn_from_errors | 90s ⭐ |
| */30 min | capture_to_training | 120s ⭐ |
| */30 min | scraper_health | 90s ⭐ |
| */45 min | regression_test | 90s ⭐ |
| 0 * | hourly_status | 30s ⭐ |
| 0 8 | daily_learning | - |
| 0 4 daily | RAGAS eval | - |
| 0 2 daily | overnight_learning | - |
| daily 03:00 | LoRA fine-tune | - |
| daily 03:07 | master_backup 22TB | - |
⭐ = timeout added tonight (prevents stuck processes)
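
The note does not say how the timeouts were applied to the cron entries. For loops that are launched from Python, the same guard can be sketched with the standard library; `run_with_timeout` is a hypothetical helper name.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd and return its exit code, or None if it was killed after `seconds`."""
    try:
        # subprocess.run kills the child and waits for it before raising TimeoutExpired.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None
```

With the table above, a smoke_test launcher would pass `seconds=60` and treat a `None` result as a failed run instead of leaving the process hanging until the next cron tick.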
12. Local PG: STOPPED + DISABLED
- systemctl stop postgresql ✅
- systemctl disable postgresql ✅
- Listen 5432: NONE
- Schema backup in /mnt/cold/local_pg_schema_backup_20260504_2343.sql.gz (109K)
- Data dir /var/lib/postgresql/18/main (47GB) NOT DELETED (waiting for the 24h verification)
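
During the 24h verification window, the "Listen 5432: NONE" state can be re-checked with a plain TCP probe. This is a minimal sketch with a hypothetical helper name; it only tests whether something accepts connections on the port, not whether it is PostgreSQL.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value on failure.
        return sock.connect_ex((host, port)) == 0
```

On the GPU server, `is_listening("127.0.0.1", 5432)` is expected to be False, while `is_listening("127.0.0.1", 6432)` should stay True as long as PgBouncer is up.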
13. Stuck processes killed
- 46× smoke_test stuck → 0
- 8× scraper_health stuck → 0
- 5× hourly_status stuck → 0
- 1× duplicate master_scraper_coordinator → 0
- Total: 60 stuck processes eliminated
14. Disk cleanup (~30GB recovered)
- /tmp/ocr_resized (15GB)
- /tmp/sprint (13GB)
- /tmp/rinet_v3_backup.dump (2.2GB, old PG dump)
- /root/.cache/uv (6.1GB)
- 201× .bak files older than 14 days
- 113× __pycache__ dirs
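
The ".bak files older than 14 days" rule can be sketched as a listing step that is reviewed before anything is deleted. The helper name and the `*.bak*` glob are assumptions; the actual cleanup tonight may have used different criteria.

```python
import time
from pathlib import Path


def old_backup_files(root: Path, max_age_days: int = 14, now=None) -> list:
    """Return files under root whose name contains '.bak' and whose mtime is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        p for p in root.rglob("*.bak*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

The function only lists candidates. Deleting is left to the caller, for example by calling `unlink()` on each path after checking the list.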
Current state
PG: Server B 10.10.0.2:6432 (5,315,161 facts)
Local 5432: STOPPED + DISABLED
PgBouncer: 127.0.0.1:6432 → host=10.10.0.2 port=5432 (proxy to Server B)
Qdrant: Server B 10.10.0.2:6333 (46 collections, 14M+ vectors)
Local 6333: DOES NOT EXIST
Redis: local 6379 (cache)
Neo4j: local 7687 (615,580 nodes, 756,333 relations)
Embed: local 9879 (BGE-M3, dim 1024)
Reranker: local 8099/8100/8101 (3 instances)
vLLM: local 8001 (Qwen2.5-7B-Instruct-AWQ)
F10 LoRA: local 8765 (dabi-budget-lora-q4)
Ollama: local 11434 (qwen3:14b, llama3.2:3b)
MCP: local 8810 (7 tools)
What remains to be done
- 24h dry run of the local PG stop: verify that everything is OK and only then delete /var/lib/postgresql/18/main (47GB)
- drop_gpu_pg.sh: prepared earlier, do NOT run it until the dry run is confirmed
- Multi-lang IT/DE retry: intermittent Groq rate-limit issue
- 9 facts without a source: the UPDATE was interrupted by a Bridge timeout and needs to be repeated
- Neo4j integration in RAG: the orchestrator does not yet use the knowledge graph (756k relations sit unused)
Tests passed
- Smoke test, 4 questions: 3/4 PASS (Bok, NK Rijeka predsjednik and Kup HR passed; PGŽ proračun timed out via Bridge)
- vLLM: response OK
- Embed BGE-M3: dim 1024 OK
- RAG: tier 1 vLLM + tier 2 Groq + tier 0 DB priority all work
- Server B PG via PgBouncer: 5,315,161 facts ✅
- Sport+PGŽ embed: 99.97% / 99.92% ✅
- Hallucinations in 24h: 0 ✅
- Sport scrapers: 6 active ✅
Bridge stability notes
- Bridge timeouts occurred during the session (server under load)
- Main cause: GPU at 100% utilization (LoRA training), 18+ parallel scrapers
- Load average peak: 126 (now 11)