pgz-sport/_handoff/HANDOFF_20260504_2350_FULL_MIGRATION_CLEANUP.md


HANDOFF: FULL MIGRATION + CLEANUP

Date: 04.05.2026 23:50 CEST
Author: Damir Radulić (via a Claude session)
Version: v1.0

TL;DR

Migration from the GPU server (144.76.68.5) to Server B (10.10.0.2) is COMPLETE. Local PG is stopped and disabled. The system runs 100% from Server B. About 30GB of disk recovered. Cron timeouts were added to prevent further stuck processes.

What was done tonight

1. Dependency migration (pgz_sport + others)

  • pgz_sport_api.py: DSN localhost:5432 → 10.10.0.2:6432
  • pgz_sport_v2_router.py: same fix applied
  • learn_loop.py: verified, already points to Server B
  • reembed_phase2.py: DSN fix → 10.10.0.2:6432
  • reembed_knowledge_v2.py: import-from-docstring fix (DB_DSN was undefined)

2. EnvironmentFile fix (MAIN BUG)

These units were missing EnvironmentFile=/opt/rinet-gpu/.env.master:

  • dabi-orchestrator-v3.service
  • rinet-mcp.service
  • rinet-supervisor.service
  • rinet-heartbeat.service

Consequence: the env vars (QDRANT_URL, GROQ_API_KEY, ANTHROPIC_API_KEY, DEEPSEEK_API_KEY) never reached the processes.
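
A minimal sketch of how this class of bug could be caught early, assuming the unit files live in one directory (the `unit_dir` argument is hypothetical, and drop-in overrides are not inspected). The helper names are illustrative and not part of the existing tooling.

```python
from pathlib import Path

# Path taken from this handoff; everything else below is an assumption.
ENV_FILE = "/opt/rinet-gpu/.env.master"


def has_environment_file(unit_text: str, env_file: str = ENV_FILE) -> bool:
    """Return True if the unit text has an uncommented EnvironmentFile= line for env_file."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if line.startswith(("#", ";")):
            continue  # systemd treats these as comment lines
        key, sep, value = line.partition("=")
        if sep and key.strip() == "EnvironmentFile":
            # A leading "-" tells systemd to ignore a missing file.
            if value.strip().lstrip("-") == env_file:
                return True
    return False


def units_missing_env(unit_dir: str, names: list[str]) -> list[str]:
    """List the unit names that are absent or lack the EnvironmentFile= line."""
    missing = []
    for name in names:
        path = Path(unit_dir) / name
        if not path.is_file() or not has_environment_file(path.read_text()):
            missing.append(name)
    return missing
```

Running `units_missing_env` over the four services listed above after any unit edit would flag a unit that lost the directive again.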

3. Mass-fix Qdrant URL (35+ scripts)

  • localhost:6333 → 10.10.0.2:6333 in 55+ active files
  • Covered: /opt/rinet-gpu, /opt/ai-rinet, /opt/pgz-sport, /opt/dabi-persona, /opt/portal-rinet
  • Remaining: backup files (pre_b_switch, .bak.*), left untouched
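
The mass-fix can be sketched roughly as below. This is not the script that was actually run; it assumes a plain substring replacement and treats any file whose name contains `pre_b_switch` or `.bak` as a backup to skip. The function names are hypothetical.

```python
from pathlib import Path

OLD = "localhost:6333"
NEW = "10.10.0.2:6333"


def is_backup(path: Path) -> bool:
    """Backup naming is assumed from the patterns mentioned in this handoff."""
    return "pre_b_switch" in path.name or ".bak" in path.name


def rewrite_qdrant_url(root: str, dry_run: bool = True) -> list[Path]:
    """Return files containing OLD; rewrite them in place unless dry_run is set."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or is_backup(path):
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binaries and unreadable files
        if OLD in text:
            changed.append(path)
            if not dry_run:
                path.write_text(text.replace(OLD, NEW), encoding="utf-8")
    return changed
```

Calling it first with `dry_run=True` on each covered directory gives the list of files that would change before anything is written.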

4. TG spam blocking

  • Global Python monkey-patch in /usr/lib/python3/dist-packages/usercustomize.py
  • Intercepts every requests.post("api.telegram.org/...") in all Python processes
  • Routes messages through the rate-limited rinet-notify helper (max 5/h, 30 min dedup)
  • Bash wrapper /usr/local/bin/rinet-curl-tg
  • Disabled the cron monitors (embed_monitor.sh, embed_monitor_p2.sh)
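
How rinet-notify enforces its limits is not shown here. As a rough illustration of the "max 5/h, 30 min dedup" policy, a gate like the following could sit in front of the sender; the `NotifyGate` class and its parameters are hypothetical.

```python
import time
from collections import deque


class NotifyGate:
    """Allow at most max_per_hour messages and suppress repeats within dedup_seconds."""

    def __init__(self, max_per_hour=5, dedup_seconds=30 * 60, clock=time.monotonic):
        self.max_per_hour = max_per_hour
        self.dedup_seconds = dedup_seconds
        self.clock = clock
        self._sent = deque()   # timestamps of messages sent in the last hour
        self._last_seen = {}   # message text -> time it was last sent

    def allow(self, text: str) -> bool:
        now = self.clock()
        # Forget sends that are older than one hour.
        while self._sent and now - self._sent[0] >= 3600:
            self._sent.popleft()
        last = self._last_seen.get(text)
        if last is not None and now - last < self.dedup_seconds:
            return False  # same text was sent too recently
        if len(self._sent) >= self.max_per_hour:
            return False  # hourly budget exhausted
        self._sent.append(now)
        self._last_seen[text] = now
        return True
```

A patched `requests.post` would call `gate.allow(message)` and drop the request when it returns False. The clock is injectable so the policy can be tested without waiting.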

5. Anthropic Tier 4 (LAST in the waterfall)

Lines 484 and 496 in dabi_orchestrator_v3.py:

Tier 0: dabi-budget LoRA (port 8765)
Tier 1: vLLM Qwen 7B (port 8001)
Tier 2: Groq llama-4-scout
Tier 3: DeepSeek V3
Tier 4: Anthropic Claude  ← LAST

ENV var bug fix: CLAUDE_API_KEY → ANTHROPIC_API_KEY
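
The tier order above amounts to a fallback chain. The sketch below illustrates the pattern only; it is not the code in dabi_orchestrator_v3.py, and the `ask_waterfall` name and the provider callables are assumptions.

```python
from typing import Callable, Sequence, Tuple

# (tier name, callable that takes a prompt and returns an answer)
Provider = Tuple[str, Callable[[str], str]]


def ask_waterfall(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try providers in order; return (tier name, answer) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            answer = call(prompt)
        except Exception as exc:  # a failing tier falls through to the next one
            errors.append(f"{name}: {exc}")
            continue
        if answer:
            return name, answer
        errors.append(f"{name}: empty answer")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))
```

Placing the Anthropic callable last in `providers` means it is reached only when every cheaper tier has failed or returned nothing.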

6. Multi-language support (HR/EN/DE/IT)

  • _translate_to() + _detect_query_lang() in /opt/ai-rinet/ai_gateway.py
  • HR: native
  • EN: works
  • DE: works
  • IT: intermittent (Groq rate-limit issue)

7. Sport scrapers: all started

5 were INACTIVE, now ALL are ACTIVE:

  • sport-pgz-deep-loop
  • sport-master-loop
  • sport-extra-loop
  • sport-fed-scrapers
  • sport-oib-loop
  • sport-dabi-quiz

pgz_sport_deep.py: keyword filter expanded from 8 to 26 keywords (sport, klub, savez, sportaš, kup, prvenstvo, liga, utakmica, igrač, trener, olimpij, paraolimpij, turn, medalj, pobjed, rijeka, pgž, primorsko, subvenc, natječaj, odluka, proračun, rebal...)
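
Several of the keywords are word stems (olimpij, medalj, pobjed), which suggests case-insensitive substring matching. The sketch below assumes that behaviour and uses only a subset of the list, since the full list is truncated above; the `matches_keywords` name is hypothetical.

```python
# Subset of the keywords listed above; the full list is truncated in this handoff.
KEYWORDS = ("sport", "klub", "savez", "olimpij", "medalj", "natječaj", "proračun")


def matches_keywords(text: str, keywords=KEYWORDS) -> bool:
    """Return True if any keyword stem occurs in the text, ignoring case."""
    lowered = text.casefold()
    return any(keyword in lowered for keyword in keywords)
```

Under this assumption, short stems such as kup or turn would also match unrelated words that contain them, so a wider list trades precision for recall.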

8. Reembed processes: running

  • tmux 'reembed': 89% done, rate 55-173k/s
  • reembed_phase2.py: PID 1790646, 85-102k/h, court_notices_v2 + rsv_enriched_v2

9. LoRA daily timer: REVIVED

Bug: the timer had been dead since 03.05.2026!
Fix: systemctl enable lora-finetune.timer + start
Training started at 23:24 with 100,000 examples + 309 eval

10. KPI Dashboard: LIVE

11. Continuous loops (15 cron)

Cron          Loop                       Timeout
*/2 min       lora_watchdog              -
*/5 min       smoke_test                 60s
*/5 min       kpi_snapshot               30s
*/10 min      latency_alert              30s
*/15 min      halu_scanner               60s
*/20 min      learn_from_errors          90s
*/30 min      capture_to_training        120s
*/30 min      scraper_health             90s
*/45 min      regression_test            90s
0 *           hourly_status              30s
0 8           daily_learning             -
0 4           daily RAGAS eval           -
0 2           daily overnight_learning   -
daily 03:00   LoRA fine-tune             -
daily 03:07   master_backup 22TB         -

The timeout values listed above were added tonight (to prevent stuck processes).
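
How the timeouts are wired into the crontab is not shown here. For loops launched from Python, the same guard could look roughly like this; `run_with_timeout` is a hypothetical helper.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(cmd: list[str], seconds: float) -> Optional[int]:
    """Run cmd; return its exit code, or None if it was killed for exceeding seconds."""
    try:
        # subprocess.run kills the child and reaps it before raising TimeoutExpired.
        return subprocess.run(cmd, timeout=seconds).returncode
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    # Example: a job that sleeps longer than its budget gets cut off.
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(5)"], 1))
```

A None result can be logged or counted, so a job that keeps hitting its limit shows up instead of piling up as stuck processes.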

12. Local PG: STOPPED + DISABLED

  • systemctl stop postgresql
  • systemctl disable postgresql
  • Listen 5432: NONE
  • Schema backup in /mnt/cold/local_pg_schema_backup_20260504_2343.sql.gz (109K)
  • Data dir /var/lib/postgresql/18/main (47GB) NOT DELETED (waiting for 24h verification)
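
The "Listen 5432: NONE" check can be repeated during the 24h verification with a small probe. This is a sketch that only tests TCP over IPv4; it says nothing about Unix sockets. The `is_listening` name is hypothetical.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Expected to print False while the local PostgreSQL stays stopped.
    print(is_listening("127.0.0.1", 5432))
```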

13. Stuck processes killed

  • 46× smoke_test stuck → 0
  • 8× scraper_health stuck → 0
  • 5× hourly_status stuck → 0
  • 1× duplicate master_scraper_coordinator → 0
  • Total: 60 stuck processes eliminated

14. Disk cleanup (~30GB recovered)

  • /tmp/ocr_resized (15GB)
  • /tmp/sprint (13GB)
  • /tmp/rinet_v3_backup.dump (2.2GB old PG dump)
  • /root/.cache/uv (6.1GB)
  • 201× .bak files older than 14 days
  • 113× __pycache__ dirs
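
For future cleanups, the ".bak files older than 14 days" selection could be reproduced as follows. This is a sketch that only lists candidates and deletes nothing; matching on ".bak" in the file name is an assumption, and `old_bak_files` is a hypothetical name.

```python
import time
from pathlib import Path
from typing import Optional


def old_bak_files(root: str, max_age_days: float = 14,
                  now: Optional[float] = None) -> list[Path]:
    """Return .bak files under root whose mtime is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and ".bak" in path.name and path.stat().st_mtime < cutoff
    )
```

Reviewing the returned list before removing anything keeps the cleanup reversible up to the last step.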

Current state

PG:       Server B 10.10.0.2:6432 (5,315,161 facts)
          Local 5432 STOPPED + DISABLED
PgBouncer: 127.0.0.1:6432 → host=10.10.0.2 port=5432 (proxy to Server B)
Qdrant:   Server B 10.10.0.2:6333 (46 collections, 14M+ vectors)
          Local 6333: DOES NOT EXIST
Redis:    Local 6379 (cache)
Neo4j:    Local 7687 (615,580 nodes, 756,333 relations)
Embed:    Local 9879 (BGE-M3, dim 1024)
Reranker: Local 8099/8100/8101 (3 instances)
vLLM:     Local 8001 (Qwen2.5-7B-Instruct-AWQ)
F10 LoRA: Local 8765 (dabi-budget-lora-q4)
Ollama:   Local 11434 (qwen3:14b, llama3.2:3b)
MCP:      Local 8810 (7 tools)

What remains to be done

  1. 24h dry-run of the local PG stop: verify that everything is OK, then delete /var/lib/postgresql/18/main (47GB)
  2. drop_gpu_pg.sh: prepared earlier, do NOT run until the dry-run confirms
  3. Multi-lang IT/DE retry: intermittent Groq rate-limit issue
  4. 9 facts without a source: the UPDATE was interrupted by a Bridge timeout and needs to be re-run
  5. Neo4j integration into RAG: the orchestrator does not use the knowledge graph yet (756k relations sit unused)

Tests passed

  • Smoke, 4 questions: 3/4 PASS (Bok, NK Rijeka predsjednik, Kup HR passed; PGŽ proračun timed out via Bridge)
  • vLLM: response OK
  • Embed BGE-M3: dim 1024 OK
  • RAG: tier 1 vLLM + tier 2 Groq + tier 0 DB priority all work
  • Server B PG via PgBouncer: 5,315,161 facts
  • Sport+PGŽ embed: 99.97% / 99.92%
  • Hallucinations 24h: 0
  • Sport scrapers: 6 active

Bridge stability notes

  • Bridge timeouts during the session (server under load)
  • Main cause: GPU at 100% util (LoRA training), 18+ parallel scrapers
  • Load average peak: 126 (now 11)