PGŽ Sport Platform — Round 1+2 baseline (sport2.html + API)

Damir Radulić
2026-05-04 23:39:08 +02:00
commit a7ec0a86be
1820 changed files with 694455 additions and 0 deletions
@@ -0,0 +1,205 @@
# HANDOFF, 30.04.2026 01:15: COMPLETE FORENSIC AUDIT
## 🔴 BRUTAL VERDICT (TL;DR)
Ri.NET is not a monster. Ri.NET is a **serious civic-intelligence platform** with 48.6M
rows, 35 Qdrant collections, and 50+ services. BUT: your "self-learning, auto-healing,
self-evolving code" story is **70% marketing, 30% truth**. Only 3 of 15
self-learning services actually run. The rest are inactive or failed.
## 📊 STATE: KEY NUMBERS
| Metric | Value |
|--------|-------|
| GPU | RTX 4000 SFF Ada, **100% utilization, 78% mem, 70°C** |
| RAM | 62 GB total, 41 GB available, **swap in use: 13/31 GB** |
| Disk | 1.7 TB, 68% used, 539 GB free |
| Load avg | **6.04 / 4.67 / 3.50** (on a 20-thread CPU) |
| PostgreSQL | 18.3, **39 GB**, 28 schemas, ~600 tables |
| Total DB rows | **48,592,560** (2.3× more than the docs claim) |
| Qdrant | 35 collections, **~8M vectors** total |
| Redis | only 63 keys / 2.15 MB used (cache is **UNDERUSED**) |
| Systemd services | **80+ rinet services**, 3 failed |
| Active cron jobs | 27+ |
| Backup .bak files in /opt | **536** (cleanup needed) |
## 🟢 WHAT WORKS WELL
1. **PostgreSQL tuning**: shared_buffers 8GB, effective_cache_size 48GB, work_mem 128MB, random_page_cost 1.1 (SSD-tuned)
2. **UFW + fail2ban + iptables**: DROP policy, 5 jails, recurring scanners blocked
3. **PG ANALYZE cron**: runs every 6h ✓ (Law 3)
4. **Bridge API + UFW DENY for internal ports**: 16 deny rules
5. **vLLM + BGE-M3 embedder**: active and responsive
6. **PGŽ Sport data integrity trigger**: working (clanovi_validate_source)
7. **OS-First architecture**: 16/18 projects use the central DB
8. **DABI eval framework**: 954 eval results, RAGAS daily cron, hallucination detection works
9. **Handoff discipline**: 7 handoff documents written yesterday (29.04)
## 🔴 WHAT WORKS POORLY
### Failed services
- `budget-active-learning.service` (RAGAS eval + auto-regen): **FAILED**
- `lora-finetune.service` (Qwen2.5-3B + DABI Croatian fine-tune): **FAILED**
- `eoglasna-collector.service` (court notices scraper): **FAILED 6× in the last 4h**
### Self-learning farce
| Service | Status |
|---------|--------|
| rinet-self-learning | inactive disabled |
| rinet-self-learn (DUPLICATE!) | inactive disabled |
| rinet-meta-agent | inactive disabled |
| rinet-perpetual | inactive (enabled) |
| rinet-qa-gen | inactive disabled |
| rinet-eval | inactive (enabled) |
| rinet-eval-daily | inactive |
| rinet-backfill-knowledge | inactive |
| rinet-gpu-learn | inactive disabled |
| dabi-eval | inactive disabled |
**Only 3 of 15 self-learning services actually run: budget-continuous, dabi-orchestrator-v3, gpu-learning.**
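
A status table like the one above can be regenerated rather than maintained by hand. The following is a minimal sketch, assuming the unit names are passed in by the caller; the helper names `active_state`, `summarize` and `not_running` are illustrative and not part of the Ri.NET codebase. It relies only on `systemctl is-active`, which prints the unit state and exits non-zero when the unit is not active.

```python
import subprocess
from collections import Counter


def active_state(unit: str) -> str:
    """Return what `systemctl is-active` prints for a unit, e.g. 'active', 'inactive', 'failed'."""
    # check=False because systemctl exits non-zero for any state other than active.
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip() or "unknown"


def summarize(states: dict[str, str]) -> Counter:
    """Count how many units are in each state."""
    return Counter(states.values())


def not_running(states: dict[str, str]) -> list[str]:
    """Return the names of all units whose state is anything other than 'active'."""
    return sorted(unit for unit, state in states.items() if state != "active")
```

On the server, `states = {u: active_state(u) for u in units}` would feed both helpers; the list of units to check is left to the operator.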
### Resource stress
- **GPU 100% utilization** (vLLM at 40%, plus ollama and the embedder, all compete for the same GPU)
- **Swap 13 GB used** (out of 32 GB swap, which means RAM pressure exists)
- **Load avg 6** (sustainable on a 20-thread CPU, but not ideal)
- **Qdrant 17 GB RAM + 43% CPU continuously**
### Security defects (defense-by-accident)
- 27 Python services bind to 0.0.0.0 (instead of 127.0.0.1)
- UFW DENY covers only 8040, 8050, 8031, 8055. **Ports 8000, 8001, 8042, 8051, 8060, 8070, 8080, 8090, 8095, 8098, 8099, 8100, 8101, 8765, 9090, 9091, 9099, 9876, 9878, 9879 are NOT in UFW DENY**
- The iptables INPUT policy DROP saved us, but that is an accident, not by design
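
The gap between listening ports and UFW DENY rules is a plain set difference. The sketch below uses the two port lists from this audit as input; the function names are hypothetical, and it only prints candidate `ufw deny <port>/tcp` commands rather than running them. It assumes all listed services use TCP.

```python
# Ports taken from the audit findings above.
UFW_DENIED = {8031, 8040, 8050, 8055}
NOT_DENIED = {
    8000, 8001, 8042, 8051, 8060, 8070, 8080, 8090, 8095, 8098,
    8099, 8100, 8101, 8765, 9090, 9091, 9099, 9876, 9878, 9879,
}


def uncovered_ports(listening: set[int], denied: set[int]) -> list[int]:
    """Return listening ports that have no matching deny rule, sorted ascending."""
    return sorted(set(listening) - set(denied))


def ufw_deny_commands(ports: list[int]) -> list[str]:
    """Build (but do not execute) one `ufw deny` command per port. TCP is an assumption."""
    return [f"ufw deny {port}/tcp" for port in ports]


if __name__ == "__main__":
    gaps = uncovered_ports(UFW_DENIED | NOT_DENIED, UFW_DENIED)
    print("\n".join(ufw_deny_commands(gaps)))
```

Rebinding the services to 127.0.0.1 removes the need for these rules entirely; the deny list is the fallback for services that cannot be rebound quickly.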
### Code hygiene
- **536 .bak/deprecated/backup_ files in /opt**
- 9 .bak.* unit files in /etc/systemd/system/
- nginx sites-enabled contains `rinet.bak.1777502696` ⚠️
- 309 dirty files in the portal-rinet repo
- 98 dirty in novitalia, 42 in dabi-persona
- MASTER_CREDENTIALS_v3.md and v5.md are duplicates
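
A dry-run inventory is a safer first step than deleting anything. This sketch lists files whose names contain `.bak`, `deprecated` or `backup_`; the exact matching rule used to arrive at the count of 536 is not stated in the audit, so the `MARKERS` tuple is an assumption, and `find_stale_files` is an illustrative name.

```python
from pathlib import Path

# Assumed name markers; adjust to match whatever rule produced the audit count.
MARKERS = (".bak", "deprecated", "backup_")


def find_stale_files(root: Path) -> list[Path]:
    """Recursively list regular files under root whose name contains any marker."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and any(marker in path.name for marker in MARKERS)
    )


if __name__ == "__main__":
    # Dry run: print candidates only, delete nothing.
    for path in find_stale_files(Path("/opt")):
        print(path)
```

Reviewing the printed list before any removal keeps files such as real database backups from being caught by an overly broad marker.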
### Data quality (pgz_sport)
- 922 athletes with source 'manual', only **0.4% with a source_url** (suspicious)
- 1986 clubs without a source_url
### Audit incompleteness
- 27 active cron jobs
- sys_audit for the last 30 days = **47 entries**
- The audit chain trigger does NOT capture cron operations, only some API calls
- The claim "audit log after every bigger operation" is a half-truth
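
The size of the audit gap can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that each of the 27 cron jobs runs at least once per day; the real schedules are not given in this audit, so the resulting percentage is a rough upper bound on coverage, not a measurement.

```python
def entries_per_day(entries: int, days: int) -> float:
    """Average number of audit rows written per day."""
    return entries / days


def audit_coverage(entries: int, cron_jobs: int, days: int,
                   runs_per_job_per_day: float = 1.0) -> float:
    """Fraction of expected cron runs that could have an audit row.

    runs_per_job_per_day is an assumption; jobs that run hourly would
    make the real coverage far lower.
    """
    expected_runs = cron_jobs * days * runs_per_job_per_day
    return entries / expected_runs


if __name__ == "__main__":
    print(f"{entries_per_day(47, 30):.2f} audit rows per day")
    print(f"{audit_coverage(47, 27, 30):.1%} of expected cron runs at most")
```

Even under the most generous assumption (one run per job per day, and every audit row belonging to a cron job), 47 rows cover well under a tenth of the 810 expected runs.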
### Documentation lies
- Doc says the schema is `eu_fondovi.*`; it is actually `eu.*`
- Doc says "21.4M rows / 245 tables"; the real figures are **48.6M / ~600 tables**
- Doc does not mention the **civic schema (235 tables, 27 GB)**, the largest part of the system
- Doc does not mention the **legal schema, openalex schema, dabi schema (35 tables)**
## 📋 OS-FIRST CONFIRMATION: IS Ri.NET THE FOUNDATION?
**YES, empirically confirmed:**
| Resource | Users |
|----------|-------|
| `rinet_v3` central DB | **16 projects** |
| BGE-M3 embedder :9879 | **12 projects** |
| Qdrant :6333 | **12 projects** |
**EXCEPTIONS (VIOLATIONS of Law 1):**
1. **novitalia**: has its own PG database `novitalia` + DB_USER=novitalia → **VIOLATION**
2. **rinet-gpu/cortex/cortex.db**: its own SQLite → **VIOLATION** (minor)
3. mail-server SQLite (4 dbs): OK, mail server logic
4. Qdrant **35 collections, split by domain**: this is a good pattern, not a violation
**Schema-per-project works:** 28 schemas, clearly separated.
## 🎯 ARCHITECTURE REVIEW: IS THIS THE BEST WE CAN DO?
### House MD verdict: **NO, but it is not a disaster either**
#### What is good:
- Single GPU monolith for a solo developer = **smart** (no cluster overhead)
- Schema-per-project = **smart** (clear isolation, easy to back up)
- Bridge API as the only external entry = **smart** (smaller attack surface)
- DB triggers for data integrity = **smart** (Emil Baltić incident lesson learned)
#### What is excessive:
- **80+ systemd services**: too bulky for a solo developer
- **Duplicates: rinet-self-learn vs rinet-self-learning, gpu-learning vs rinet-gpu-learn**: confusing
- 3 reranker instances (8099, 8100, 8101) for a solo developer = overengineered
- 4 sudreg-api + 3 worker instances = too much parallelism
- 35 Qdrant collections, some of which have **0 or <100 points** (pgz_zip_v1, pgz_kultura_v1, pgz_obrazovanje_v1)
#### What is missing:
- **Serious auto-restart on failure** (eoglasna-collector failed 6× in 4h and did not recover on its own)
- **Canary deployment**: none
- **Rollback mechanism**: none (only .bak file copies)
- **Central monitoring dashboard** (Grafana runs, but has no exposed dashboards)
- **Prometheus alerting**: node_exporter runs, but there is no alertmanager
- **A backup that ACTUALLY backs up the 39GB DB** (current backup = 65KB → metadata only)
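
A backup that is six orders of magnitude smaller than the database is detectable with a one-line size check. The sketch below is an assumption-laden illustration: `min_ratio` is a hypothetical threshold (compressed dumps are smaller than the live database, but how much smaller depends on the data), and `backup_looks_complete` is not an existing Ri.NET function.

```python
def backup_ratio(backup_bytes: int, db_bytes: int) -> float:
    """Backup size as a fraction of the live database size."""
    if db_bytes <= 0:
        raise ValueError("db_bytes must be positive")
    return backup_bytes / db_bytes


def backup_looks_complete(backup_bytes: int, db_bytes: int,
                          min_ratio: float = 0.05) -> bool:
    """Flag backups that are implausibly small.

    min_ratio=0.05 is an assumed floor, not a measured compression ratio;
    tune it after the first real dump has been taken.
    """
    return backup_ratio(backup_bytes, db_bytes) >= min_ratio


if __name__ == "__main__":
    current = 65 * 1000           # 65 KB backup reported in the audit
    database = 39 * 1000 ** 3     # 39 GB database reported in the audit
    print(backup_looks_complete(current, database))
```

A check like this could run right after the backup job and fail loudly, which also addresses the missing alerting noted above.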
## 🤖 SELF-LEARNING ASPECT: WHAT ACTUALLY WORKS
### Marketing vs reality
**You claim:** "Ri.NET has auto-healing, self-evolving code; it analyzes, changes, tests and deploys by itself"
**In reality:**
| Component | Status |
|-----------|--------|
| Auto-healing logic | **Partial**: health-guardian.service active, master-watchdog active, but no self-fix |
| Code generation pipeline | **NONE**: cc-swarm scripts exist but are not cron-driven |
| Automated testing before deploy | **NONE** |
| Canary/rollback | **NONE** |
| Monitoring that TRIGGERS changes | **NONE**: it only logs |
| Learning loop from audit logs | **PARTIAL**: chat_learner.py and intensive_learner.py run every 4h, BUT sys_audit has only 47 entries/30d |
**TRUTH:** Ri.NET has an **eval framework** (RAGAS daily, eval_runner every hour,
954 eval results in dabi.eval_results_v2), and that is real progress.
**It has a TRAINING corpus** (365K Q&A pairs in dabi.training_qa).
**BUT:** There is no feedback loop that THEN uses training_qa for fine-tuning
(lora-finetune.service is FAILED).
## 🎯 TOP PRIORITIES FOR THE NEXT 4 WEEKS
### Week 1: Stabilization (must-do)
1. **Fix eoglasna-collector.service**: failed 6× in 4h, scrapes are being missed
2. **Fix budget-active-learning.service**: this is the RAGAS eval + auto-regen
3. **Bind all Python services to 127.0.0.1**, or add UFW DENY for all 8xxx and 9xxx ports
4. **Clean up the 536 .bak files + 9 .bak unit files + nginx rinet.bak**
5. **A real DB backup**: pg_dump of the 39GB DB → /opt/rinet-backups (not just 65KB of metadata)
### Week 2: Self-learning activation
6. **Fix lora-finetune.service**: you already have 365K training_qa rows; only the fine-tune step is missing
7. **Decide: rinet-self-learning vs rinet-self-learn**: kill the duplicate, keep one, enable it
8. **Finish rinet-meta-agent**: this is what the "self-learning trigger" promises
9. **Cron for retraining** when a new batch of training_qa reaches the threshold
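
The retraining trigger in item 9 reduces to comparing the number of rows added since the last fine-tune against a threshold. This is a minimal sketch of that decision only; the threshold value, the way row counts are obtained from dabi.training_qa, and the names `RetrainDecision` and `decide_retrain` are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class RetrainDecision:
    new_rows: int          # rows added since the last fine-tune
    threshold: int         # minimum batch size that justifies a run
    should_retrain: bool


def decide_retrain(total_rows: int, rows_at_last_train: int,
                   threshold: int) -> RetrainDecision:
    """Decide whether enough new training rows have accumulated."""
    # Clamp at zero in case rows were deleted since the last run.
    new_rows = max(0, total_rows - rows_at_last_train)
    return RetrainDecision(new_rows, threshold, new_rows >= threshold)


if __name__ == "__main__":
    # Hypothetical numbers: 365K rows now, 350K at the last run, 10K threshold.
    print(decide_retrain(365_000, 350_000, 10_000))
```

A cron job would call this with fresh counts and start the fine-tune unit only when `should_retrain` is true, which keeps the GPU free the rest of the time.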
### Week 3: Monitoring + alerting
10. **Grafana dashboards**: DB row growth, query latency, eval scores per category
11. **Alertmanager + Prometheus rules**: GPU >95% for >30 min, swap >50%, service failed
12. **DABI eval score trending**: alert if the weekly aggregate score drops >10%
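
The eval-score alert in item 12 can be sketched as a relative-drop check between two weekly aggregates. The audit does not say whether ">10%" means relative change or absolute points; this sketch assumes relative change, and the function names are illustrative.

```python
def relative_drop(previous: float, current: float) -> float:
    """Fractional decrease from previous to current (negative if the score rose)."""
    if previous <= 0:
        raise ValueError("previous score must be positive")
    return (previous - current) / previous


def should_alert(previous: float, current: float, max_drop: float = 0.10) -> bool:
    """True when the weekly aggregate fell by more than max_drop (relative, assumed)."""
    return relative_drop(previous, current) > max_drop


if __name__ == "__main__":
    # Hypothetical weekly aggregates.
    print(should_alert(previous=0.80, current=0.70))
```

Comparing weekly aggregates rather than individual runs smooths out the noise of single eval batches, at the cost of reacting a few days later.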
### Week 4: Hardening + documentation
13. **Refresh the documentation**: the civic, legal and openalex schemas NEED to be in the docs
14. **novitalia migration** to the central DB, or a formal exception
15. **Audit chain trigger**: extend it to capture cron operations, not just API calls
## 📌 OPERATIONAL QUICK-REF (confirmed working)
```bash
# Bridge API (the only externally reachable entry point)
# Secrets are read from the environment; the variable names are placeholders.
curl -X POST https://api.rinet.one/bridge/exec \
  -H "X-API-KEY: ${RINET_API_KEY}" -d '{"cmd":"..."}'
# DB
PGPASSWORD="${RINET_DB_PASSWORD}" psql -h localhost -p 5432 -U rinet -d rinet_v3
# vLLM (confirmed active)
curl http://localhost:8001/v1/models
# Embedder (confirmed active)
curl -X POST http://localhost:9879/api/embeddings -d '{"input":["test"]}'
# Qdrant (35 collections)
curl http://10.10.0.2:6333/collections
```