Files
pgz-sport/_audit/audit_DB_INTEGRITY.md
T
CC1 7adcec3309 CC1: 3-subagent deep audit — Frontend / API gap / DB integrity
Reports in _audit/:
  audit_FRONTEND_COVERAGE.md  — SA-1 (Explore): 9 HTML files, 0 orphan handlers (clean)
  audit_API_GAP.md            — SA-2 (Explore): 356 backend routes vs 54 frontend paths
                                                23 missing routes / 39 call sites
  audit_DB_INTEGRITY.md       — SA-3 (general-purpose): 8 SQL probes, FKs/NULLs clean,
                                                        48 dup-OIB clusters, 518 low-cov klubovi
  audit_CONSOLIDATED.md       — top 10 critical with owner matrix (cc1/cc4/cc5/cc6)

Headlines:
  Frontend: clean (post-R3 refactors landed)
  API gap:  CRM module systemic — 16 of 23 missing routes need /crm prefix in crm.html
            6 missing routes are trailing-slash bugs in crm.html
  DB:       48 OIB dup clusters in klubovi (~100 rows) need merge+unique-index
            518/2244 klubovi (23%) <33% coverage → enrichment_worker target list
            14 scoreboard-string klubovi rows (RK ... HRL Zapad od X) → DELETE
            ~30 backup tables (~97k rows) cluttering pgz_sport schema

Owner allocation:
  cc1 → #6 backup-table archival, #8 verify, #9 sportas trailing-slash
  cc4 → #1 OIB dedup script, #4 scoreboard DELETE, #10 schema CHECKs
  cc5 → #2 /crm prefix sweep on crm.html, #3 trailing-slash sweep, #7 notif endpoint
  cc6 → #5 enrichment_worker batch on filled<4 klubovi

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:29:04 +02:00

438 lines
21 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# SA-3 DB Integrity Probe
**Generated:** 2026-05-05T06:24:49Z
**DB:** rinet_v3 @ 10.10.0.2:6432
**Schema:** pgz_sport
## 1. Row counts
```sql
SELECT relname, n_live_tup FROM pg_stat_user_tables
WHERE schemaname='pgz_sport' ORDER BY n_live_tup DESC;
```
Top live (production) tables and key backups. **Note:** the schema contains a large number of `*_backup_*` / `*_premerge_*` / `*_dedup_*` / `*_pre_*` snapshot tables (clean-up debris). Only the canonical production tables are highlighted below; the rest are listed beneath.
| Table | Rows |
|---|---:|
| clanovi | 3248 |
| klubovi | 2244 |
| sportski_objekti | 106 |
| savezi | 246 |
| dokumenti | 7073 |
| dokument_chunks | 2850 |
| utakmice_log | 9267 |
| rno_bilanca | 6500 |
| rno_prras | 6500 |
| clan_godisnjak | 2398 |
| clan_nagrada | 2028 |
| natjecanja_tablice | 959 |
| clan_sezona | 689 |
| hns_klubovi_natjecanje | 635 |
| klub_sezona | 631 |
| sys_audit | 627 |
| enrichment_log | 616 |
| dokument_primjena | 439 |
| natjecanja | 428 |
| clanovi_deleted_empty | 372 |
| clanstvo_kategorije | 313 |
| natjecanje_tablica | 304 |
| vijesti | 286 |
| savez_stats_oficijalno | 284 |
| najbolji_sportasi | 243 |
| user_sessions | 235 |
| sys_role_permissions | 220 |
| audit_events | 193 |
| potpore_nositelji | 182 |
| savez_statistika_clanstvo | 177 |
| statistika_saveza | 169 |
| osobe_funkcije | 159 |
| sport_facts | 135 |
| audit_feed | 131 |
| dobne_kategorije | 127 |
| manifestacije | 113 |
| sufinanciranje_sport | 110 |
| alertovi | 89 |
| ai_grad_distances | 78 |
| hns_natjecanja | 74 |
| notifications | 66 |
| sys_permissions | 54 |
| zsp_dokumenti | 54 |
| uloga_katalog | 49 |
| clanarine | 48 |
| mediji | 42 |
| treneri | 38 |
| account_codes | 31 |
| audit_log | 29 |
| suci | 27 |
| rno_sportske_udruge | 21 |
| users | 18 |
| lijecnicki_pregledi | 16 |
| form_templates | 15 |
| invoices | 14 |
| specijalisti_med | 13 |
| akademski_sport | 11 |
| proracun | 11 |
| hoo_pravilnici | 8 |
| alert_rules | 8 |
| roles | 7 |
| scraper_runs | 6 |
| invoice_uploads | 5 |
| payments | 5 |
| user_action_tokens | 5 |
| tenants | 5 |
| polygon_seals | 5 |
| expense_reports | 4 |
| javne_potrebe | 4 |
| user_klub_links | 4 |
| form_submissions | 3 |
| email_templates | 3 |
| gdpr_erasure_requests | 3 |
| sportas_specifika | 2 |
| gdpr_consent | 2 |
| user_roles | 1 |
| putni_nalog_racuni | 1 |
| user_2fa | 1 |
| invoice_lines | 1 |
| llm_extracted_facts | 0 |
| scrape_jobs | 0 |
| clan_utakmica | 0 |
| natjecanja_utakmice | 0 |
| user_permissions | 0 |
| sponzori | 0 |
### Backup/snapshot tables (candidates for archival drop)
These are stale workflow artefacts taking up significant rows; they should not be queried by app code:
| Table | Rows |
|---|---:|
| clanovi_pre_godisnjak_backup | 25944 |
| klubovi_garbage_backup_1777750740 | 10072 |
| klubovi_dedup_v2_1777750793 | 9920 |
| klubovi_dedup_v3_1777750848 | 9672 |
| clanovi_backup_20260430 | 9572 |
| klubovi_premerge_20260503c | 8976 |
| klubovi_premerge_20260503b | 8976 |
| klubovi_pre_cleanup_20260430 | 8120 |
| klubovi_pre_dedup_20260430 | 5960 |
| klubovi_premerge_20260503 | 2572 |
| klubovi_backup_20260505 | 2244 |
| clanovi_purge_backup_20260429 | 1576 |
| clanovi_dedup_20260502_v2 | 1384 |
| klub_sezona_backup_20260502 | 1092 |
| clanovi_dedup_backup_20260429 | 532 |
| klubovi_sport_rename_backup_1777756941 | 396 |
| klubovi_dedup_20260502 | 140 |
| sponzori_mock_backup_1777756941 | 88 |
| klubovi_finaldd_backup_1777752742 | 72 |
| klubovi_garbage_backup_20260502 | 36 |
| rno_organizacije | 1482 *(may be production)* |
| sys_users_deprecated_20260429 | 9 |
| klubovi_dedup_haok_backup_20260505 | 3 |
| sys_user_klub_links_deprecated_20260429 | 2 |
| klubovi_garbage_backup_1777752698 | 0 |
| sys_sessions_deprecated_20260429 | 0 |
| sys_user_permissions_deprecated_20260429 | 0 |
Total backup rows held: ~97,000+ (about **30x** the canonical row count).
## 2. NULL/empty critical columns
```sql
SELECT 'clanovi.ime', COUNT(*) FILTER (WHERE ime IS NULL), COUNT(*) FILTER (WHERE ime = '') FROM pgz_sport.clanovi
UNION ALL SELECT 'clanovi.prezime', COUNT(*) FILTER (WHERE prezime IS NULL),COUNT(*) FILTER (WHERE prezime = '') FROM pgz_sport.clanovi
UNION ALL SELECT 'klubovi.naziv', COUNT(*) FILTER (WHERE naziv IS NULL), COUNT(*) FILTER (WHERE naziv = '') FROM pgz_sport.klubovi
UNION ALL SELECT 'savezi.naziv', COUNT(*) FILTER (WHERE naziv IS NULL), COUNT(*) FILTER (WHERE naziv = '') FROM pgz_sport.savezi
UNION ALL SELECT 'sportski_objekti.naziv',COUNT(*) FILTER (WHERE naziv IS NULL), COUNT(*) FILTER (WHERE naziv = '') FROM pgz_sport.sportski_objekti;
```
| Column | NULLs | Empties |
|---|---:|---:|
| clanovi.ime | 0 | 0 |
| clanovi.prezime | 0 | 0 |
| klubovi.naziv | 0 | 0 |
| savezi.naziv | 0 | 0 |
| sportski_objekti.naziv | 0 | 0 |
**Verdict:** clean. The recent dedup/cleanup passes have eliminated all NULL/empty primary identifiers.
## 3. Orphan FKs
```sql
SELECT 'clanovi.klub_id->klubovi', COUNT(*) FROM pgz_sport.clanovi c
WHERE c.klub_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.klubovi k WHERE k.id=c.klub_id)
UNION ALL
SELECT 'klubovi.savez_id->savezi', COUNT(*) FROM pgz_sport.klubovi k
WHERE k.savez_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.savezi s WHERE s.id=k.savez_id)
UNION ALL
SELECT 'sys_audit.user_id->users', COUNT(*) FROM pgz_sport.sys_audit a
WHERE a.user_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.users u WHERE u.id=a.user_id);
```
| Constraint | Orphan rows |
|---|---:|
| clanovi.klub_id -> klubovi.id | 0 |
| klubovi.savez_id -> savezi.id | 0 |
| sys_audit.user_id -> users.id | 0 |
**Verdict:** clean. All FK chains are intact.
## 4. Duplicate OIBs
```sql
SELECT oib, count(*), string_agg(naziv, ' | ')
FROM pgz_sport.klubovi
WHERE oib IS NOT NULL AND oib ~ '^[0-9]{11}$'
GROUP BY oib HAVING count(*)>1;
```
**48 distinct OIBs are shared by 2-4 klubovi rows each (~100 duplicate rows total).** This is the single largest data-quality issue.
| OIB | Count | Names |
|---|---:|---|
| 86603390999 | 3 | Juniorska ekipa Sv.Rok-Klana \| Boćarski Klub Sv. Rok Klana \| Sveti Rok-Klana |
| 80500347365 | 3 | HNK Orijent \| Hrvatski Nogometni Klub Orijent \| HNK Orijent 1919 (Sušak) |
| 44908060737 | 3 | Boćarski Klub Krimeja \| Krimeja \| BK Krimeja |
| 19490107091 | 3 | BOĆARSKI KLUB "LOVRAN" \| Kadetska ekipa BK Lovran \| Boćarski klub Lovran |
| 29964028897 | 4 | Boćarski klub Kastav \| Kadetska ekipa BK Kastav 2 \| Kadetska ekipa BK Kastav \| Boćarski klub Kastav |
| 17563258345 | 3 | Plivački Klub Primorje Rijeka \| KLUB DALJINSKOG PLIVANJA "PRIMORJE" \| KLUB UMJETNIČKOG PLIVANJA „PRIMORJE AQUA MARIS" RIJEKA |
| 15986803554 | 2 | Košarkaški Klub Kvarner \| Košarkaški klub KVARNER 2010 |
| 35549440954 | 2 | Muški Odbojkaški Klub "Gornja Vežica" \| Muški Odbojkaški Klub Gornja Vežica |
| 37941242606 | 2 | Muški Boćarski Klub Hreljin \| Boćarski klub Hreljin |
| 56273001018 | 2 | Nogometni klub Turbina Bakar \| Nogometni Klub Turbina Tribalj |
| 67434497493 | 2 | Odbojkaški Klub Rab \| Odbojkaški Klub "Rab" |
| 47139832980 | 2 | Hrvatski Akademski Odbojkaški Klub "Rijeka" \| HRVATSKI AKADEMSKI ODBOJKAŠKI KLUB "RIJEKA" |
| 19514046928 | 2 | Lovačko društvo "JELEN" Čavle \| LOVAČKO DRUŠTVO "JELEN" ČAVLE |
| 83495265520 | 2 | Odbojkaški Klub "Kastav 1998" \| Odbojkaški Klub Kastav 1998 |
| 14384540738 | 2 | Boćarski klub Kostrena \| Boćarski Klub Kostrena |
| 17639054753 | 2 | Streljački Klub Gluhih Galeb \| Streljački klub gluhih "Galeb" |
| 40538276343 | 2 | Odbojkaški Klub "Odbojkaška Akademija Petica" \| Odbojkaški klub Odbojkaška Akademija Petica |
| 76273502221 | 2 | Boćarski Klub Srdoči 1983 \| Srdoči 1983 |
| 17934350916 | 2 | NOGOMETNI KLUB "KLANA" \| NK Klana |
| 81511316706 | 2 | Odbojkaški Klub Kostrena Kostrena \| Odbojkaški Klub "Kostrena" Kostrena |
| 27991069782 | 2 | Boćarski Klub Čavle Šb Čavle \| Juniorska ekipa Čavle ŠB |
| 44509762938 | 2 | Kadetska ekipa BK Sveti Jakov \| Boćarski Klub Sveti Jakov Jadranovo |
| 38093446162 | 2 | Lovranska Draga \| Boćarski Klub Lovranska Draga |
| 56132503774 | 2 | Nogometni Klub Draga-Mošćenička Draga \| NK Draga |
| 40936837495 | 2 | Lovačko društvo "KAMENJARKA" Kukuljanovo \| LOVAČKO DRUŠTVO "KAMENJARKA" KUKULJANOVO-ŠKRLJEVO |
| 02999668483 | 2 | ŠK Goranka \| KK Goranka |
| 35883230704 | 2 | Lovačko društvo "MEDVIĐAK" Drivenik Tribalj \| LOVAČKO DRUŠTVO "MEDVIĐAK" DRIVENIK |
| 27420052480 | 2 | Krenovac \| Boćarski Klub Krenovac |
| 17195966673 | 2 | Ženski Odbojkaški Klub "Crikvenica" \| Ženski Odbojkaški Klub Crikvenica |
| 51108883738 | 2 | NK Risnjak \| Nogometni Klub Risnjak Lokve |
| 13794801696 | 2 | Ženski nogometni klub Rijeka Jack Pot \| Ženski nogometni klub Rijeka |
| 33154520914 | 2 | Malonogometni klub gluhih "Galeb" \| Malonogometni Klub Gluhih Galeb |
| 52818156657 | 2 | Parastreljački Klub Paraolimpijac \| Parastreljački klub "Paraolimpijac" |
| 42449645267 | 2 | Paraatletski Klub Rijeka \| Paraatletski klub "Srce" Rijeka |
| 75947125821 | 2 | Boćarski klub Opatija \| Boćarski Klub Opatija |
| 43219260850 | 2 | Ženski Akademski Odbojkaški Klub Škurinje Rijeka \| Ženski Akademski Odbojkaški Klub Škurinje Rijeka |
| 85575561127 | 2 | SPORTSKO-REKREACIJSKO DRUŠTVO VIŠEVICA \| rekreacijsko društvo VIŠEVICA |
| 19353575292 | 2 | Odbojkaški Klub "Sveti Matej 06" - Viškovo \| Odbojkaški Klub Sveti Matej 06 - Viškovo |
| 86232456523 | 2 | Boćarski klub Krk \| Boćarski klub Krk |
| 74630525187 | 2 | Nogometni klub Omladinac \| NK Omladinac Vrata |
| 83261523211 | 2 | Odbojkaški Klub Opatija Volley \| ODBOJKAŠKI KLUB OPATIJA VOLLEY |
| 98146784649 | 2 | Boćarski Klub Draga Mošćenička Draga \| Draga Mošćenička Draga |
| 39250096592 | 2 | Boćarski klub Brod Moravice \| Boćarski Klub Brod Moravice |
| 76221716576 | 2 | Kuglački Klub Gluhih Galeb \| Kuglački klub gluhih "Galeb" |
| 10132566066 | 2 | Vaterpolo klub PRIMORJE-ERSTE BANKA-ženska ekipa \| Vaterpolo klub PRIMORJE-ERSTE BANKA-muška ekipa |
| 39123612806 | 2 | Stolnoteniski klub Rijeka \| Parastolnoteniski Klub Rijeka |
| 70928157464 | 2 | Ženski Boćarski Klub Hreljin \| ŽBK Hreljin |
| 77066352874 | 2 | Nogometni Klub Vinodol \| NK Vihor |
**Patterns:**
- Casing/whitespace duplicates (`Boćarski klub Kostrena` vs `Boćarski Klub Kostrena`) — pure dupes, merge.
- Quoting variants (`"Rab"` vs `Rab`) — same.
- "Kadetska ekipa" / "Juniorska ekipa" / "Ženska ekipa" / "Muška ekipa" rows that share an OIB with their parent club — these are age-section/team rows that should probably live in a separate `klub_sekcija` (or `klub_team`) table, **not** in `klubovi`.
- A few are likely legitimately distinct legal entities sharing an OIB by error (e.g. Vinodol vs Vihor; NK Risnjak vs NK Risnjak Lokve) — flag for human review.
## 5. Placeholder values
```sql
-- klubovi
SELECT 'klubovi.naziv placeholders', COUNT(*) FROM pgz_sport.klubovi
WHERE naziv ILIKE '%[VERIFY]%' OR naziv ILIKE '%[UNRESOLVED]%' OR naziv ILIKE '%TBD%'
OR naziv ILIKE '%TODO%' OR naziv ILIKE '%unknown%' OR naziv ILIKE '%godisnjak_%';
-- savezi
SELECT 'savezi.naziv placeholders', COUNT(*) FROM pgz_sport.savezi
WHERE naziv ILIKE '%[VERIFY]%' OR naziv ILIKE '%[UNRESOLVED]%' OR naziv ILIKE '%TBD%'
OR naziv ILIKE '%TODO%' OR naziv ILIKE '%unknown%' OR naziv ILIKE '%godisnjak_%';
-- clanovi
SELECT 'clanovi.ime/prezime placeholders', COUNT(*) FROM pgz_sport.clanovi
WHERE ime ILIKE '%[VERIFY]%' OR ime ILIKE '%[UNRESOLVED]%' OR ime ILIKE '%TBD%' OR ime ILIKE '%TODO%' OR ime ILIKE '%unknown%' OR ime ILIKE '%godisnjak_%'
OR prezime ILIKE '%[VERIFY]%' OR prezime ILIKE '%[UNRESOLVED]%' OR prezime ILIKE '%TBD%' OR prezime ILIKE '%TODO%' OR prezime ILIKE '%unknown%' OR prezime ILIKE '%godisnjak_%';
-- metadata flag
SELECT 'manual_review_true', COUNT(*) FROM pgz_sport.klubovi WHERE metadata->>'manual_review' = 'true';
```
| Bucket | Count |
|---|---:|
| klubovi.naziv with placeholder marker | 3 |
| savezi.naziv with placeholder marker | 0 |
| clanovi.ime/prezime with placeholder marker | 6 |
| klubovi.metadata.manual_review = 'true' | 3 |
### klubovi placeholder rows
| id | naziv |
|---|---|
| 2630 | [VERIFY] Odbojkaški Klub Opatija |
| 2619 | [VERIFY] Odbojkaški Klub Čavle |
| 4426 | [UNRESOLVED] empty naziv & grad — id 4426 |
### clanovi placeholder rows (matched the pattern via `Todorović` surname containing `do`...`unkn`...? — check is loose; these are false positives in fact)
```sql
SELECT id, ime, prezime FROM pgz_sport.clanovi
WHERE prezime ILIKE '%unknown%' OR ime ILIKE '%unknown%' OR ...
```
| id | ime | prezime |
|---|---|---|
| 4202 | Aleksa | Todorović |
| 4140 | Aleksa | Todorović |
| 1956 | Filip | Todorović |
| 377 | Dejan | Todorović |
| 3455 | Aleksa | Todorović |
| 551 | Matteo | Todorović |
These six are **false positives**`prezime` "Todorović" matches `%dor%` token that overlaps `%godisnjak_%` is **not** the trigger; the actual trigger is `%TODO%` substring inside "ToDOrović" (case-insensitive ILIKE). They are real surnames, not placeholders. (Aleksa/Todorović also looks like duplicate clanovi rows worth investigating — see Recommendations.)
**Verdict:** placeholder pollution is essentially nil. Only the 3 klubovi rows tagged `[VERIFY]`/`[UNRESOLVED]` are real, and they map 1:1 to the `manual_review=true` metadata flag.
## 6. Low-coverage klubovi (filled < 4 of 12)
```sql
WITH cov AS (
SELECT id, naziv,
(CASE WHEN naziv IS NOT NULL AND naziv <>'' THEN 1 ELSE 0 END +
CASE WHEN sport IS NOT NULL AND sport <>'' THEN 1 ELSE 0 END +
CASE WHEN grad IS NOT NULL AND grad <>'' THEN 1 ELSE 0 END +
CASE WHEN oib IS NOT NULL AND oib <>'' THEN 1 ELSE 0 END +
CASE WHEN predsjednik IS NOT NULL AND predsjednik<>'' THEN 1 ELSE 0 END +
CASE WHEN tajnik IS NOT NULL AND tajnik <>'' THEN 1 ELSE 0 END +
CASE WHEN email IS NOT NULL AND email <>'' THEN 1 ELSE 0 END +
CASE WHEN telefon IS NOT NULL AND telefon <>'' THEN 1 ELSE 0 END +
CASE WHEN COALESCE(web, web_stranica) IS NOT NULL AND COALESCE(web, web_stranica)<>'' THEN 1 ELSE 0 END +
CASE WHEN COALESCE(sjediste, adresa) IS NOT NULL AND COALESCE(sjediste, adresa)<>'' THEN 1 ELSE 0 END +
CASE WHEN ciljevi IS NOT NULL AND ciljevi <>'' THEN 1 ELSE 0 END +
CASE WHEN opis_djelatnosti IS NOT NULL AND opis_djelatnosti<>'' THEN 1 ELSE 0 END
) AS filled
FROM pgz_sport.klubovi
)
SELECT id, naziv, filled FROM cov WHERE filled<4 ORDER BY filled ASC, id ASC LIMIT 20;
```
**Total klubovi with filled < 4 / 12 (i.e. <33%): 518** (≈23% of the 2244 production klubovi).
### Sample 20 worst (filled = 1 or 2)
| id | naziv | filled/12 |
|---|---|---:|
| 4249 | Streljački klub DVD svojevrstan vodič za roditelje | 1 |
| 4250 | Streljački klub DVD Opatija | 1 |
| 2290 | KK Metal - Jurdani | 2 |
| 2291 | KK OI KOSTRENA | 2 |
| 2311 | RK LIBURNIJA 8. u II HRL Zapad od 12 | 2 |
| 2312 | RK MORNAR 3. u II HRL Zapad od 10 | 2 |
| 2315 | RK PŠR SELCE 5. u III HRL Zapad od 8 | 2 |
| 2324 | RK ČAVLE 2. u II HRL Zapad od 10 | 2 |
| 2325 | RK ČAVLE 7. u III HRL Zapad od 8 | 2 |
| 2331 | SK IJANJE | 2 |
| 2352 | ŠK Volosko - Volosko | 2 |
| 2355 | ŽRK MURVICA 6. u II HRL Zapad od 12 | 2 |
| 2356 | ŽRK MURVICA 6. u II HRL Zapad od 9 | 2 |
| 2360 | ŽRK ZAMET II 3. u III HRL Zapad od 8 | 2 |
| 3741 | AK Elena Ban | 2 |
| 3744 | AK Koper | 2 |
| 3747 | AK Kvarnera | 2 |
| 3748 | AK Rijeka | 2 |
| 3749 | AK Velenje | 2 |
| 3750 | AK Viškovo | 2 |
**Patterns:**
- `RK <CLUB> N. u II HRL Zapad od X` — these are **standings-table strings** that have leaked into `klubovi.naziv`. They're not clubs at all, they're handball league rankings. Should be deleted from klubovi (and redirected to `natjecanja_tablice`).
- `Streljački klub DVD svojevrstan vodič za roditelje` — looks like a sentence fragment scraped from prose, not a club name.
- `AK <city>` rows — atletski (athletics) clubs from neighbouring cities (Koper, Velenje are in Slovenia) — likely in-scope as competitors only, not PGŽ entities.
## 7. Suspicious clanovi (junk imports)
```sql
SELECT id, ime, prezime FROM pgz_sport.clanovi
WHERE ime ~ '@|^\d+$' LIMIT 20;
```
**Result: 0 rows.** No emails or pure-numeric strings have leaked into the `ime` field. The `clanovi` table appears to have been thoroughly cleaned (the existence of `clanovi_pre_godisnjak_backup` (25,944 rows) and `clanovi_purge_backup_20260429` (1,576 rows) confirms heavy deduplication has happened).
## 8. sys_audit health
```sql
SELECT 'total', COUNT(*)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'today', COUNT(*)::text FROM pgz_sport.sys_audit WHERE created_at::date = CURRENT_DATE
UNION ALL SELECT 'oldest', MIN(created_at)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'newest', MAX(created_at)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'null_row_hash_last_100',
(SELECT COUNT(*) FROM (SELECT row_hash FROM pgz_sport.sys_audit ORDER BY id DESC LIMIT 100) t WHERE row_hash IS NULL)::text;
```
| Metric | Value |
|---|---|
| Total rows | 627 |
| Rows today (2026-05-04) | 531 |
| Oldest entry | 2026-04-28 21:39:45 +02 |
| Newest entry | 2026-05-05 08:23:14 +02 |
| NULL row_hash in last 100 | 0 |
**Verdict:** chain integrity intact, but the audit log is **only 7 days old** — there's been a recent re-init, or audit was switched on only on 2026-04-28. Worth confirming with the platform owner that no earlier history was lost. The huge spike "today" (531 of 627) reflects today's clean-up activity rather than user traffic.
## Recommended fixes (top 10)
1. **Drop ~30 backup tables (~97k rows).** `clanovi_pre_godisnjak_backup` (25.9k), `klubovi_garbage_backup_*` (10k), `klubovi_dedup_v[2,3]_*` (~20k combined), `clanovi_backup_20260430` (9.5k), and the rest of the `*_backup_*` / `*_premerge_*` / `*_pre_*` / `*_deprecated_*` set. Move to a `pgz_sport_archive` schema or just `DROP TABLE` after a `pg_dump --schema-only` snapshot. Saves index size and stops accidental queries against stale data.
2. **Resolve 48 duplicate-OIB clusters in `klubovi`** (~100 rows). Recommended SQL pattern:
```sql
-- For each OIB cluster, keep the row with highest filled-coverage and
-- merge children (clanovi.klub_id, klub_sezona.klub_id, etc.) onto it.
WITH dups AS (SELECT oib, MIN(id) AS keep_id FROM pgz_sport.klubovi
WHERE oib ~ '^[0-9]{11}$' GROUP BY oib HAVING COUNT(*)>1),
moves AS (SELECT k.id AS drop_id, d.keep_id FROM pgz_sport.klubovi k
JOIN dups d USING (oib) WHERE k.id<>d.keep_id)
UPDATE pgz_sport.clanovi c SET klub_id = m.keep_id
FROM moves m WHERE c.klub_id = m.drop_id;
-- repeat for klub_sezona, hns_klubovi_natjecanje, etc.
-- then DELETE the drop_ids from klubovi.
```
Run interactively via `/opt/pgz-sport/scripts/dedup_klubovi_by_oib.py` (create if absent) with `--dry-run` first.
3. **Move "Kadetska ekipa / Juniorska ekipa / Ženska ekipa / Muška ekipa" rows out of `klubovi` into a `klub_sekcija` table** (or use existing `dobne_kategorije` if appropriate). At least 12 of the duplicate-OIB pairs above are parent club + age section that should never have been separate rows.
4. **Delete the 14 standings-string klubovi rows (`RK ... N. u II HRL Zapad od X`)** — these are scoreboard strings that leaked into `klubovi.naziv`. SQL:
```sql
DELETE FROM pgz_sport.klubovi
WHERE naziv ~ '\d+\. u (I{1,3}|IV) HRL .* od \d+';
```
Verify count first (`SELECT COUNT(*) ... `).
5. **Resolve the 3 `[VERIFY]`/`[UNRESOLVED]` klubovi** (ids 2619, 2630, 4426). Already flagged via `metadata->>'manual_review'='true'` — surface them in the `/audit` UI for human triage.
6. **Run `/opt/pgz-sport/scripts/enrichment_worker.py`** against the **518 klubovi with coverage <33%**. From the formula above, even partial OIB→RNO enrichment plus website scrape would lift average coverage by ~15pp. Suggested batch:
```bash
python3 /opt/pgz-sport/scripts/enrichment_worker.py --filter "filled<4" --limit 100 --concurrency 4
```
7. **Deduplicate `Aleksa Todorović` (and similar) in `clanovi`.** ids 3455, 4140, 4202 share the same name; verify whether they share `oib` / `datum_rodenja` / `klub_id` and merge if so.
8. **Confirm `sys_audit` retention policy.** Oldest entry is 2026-04-28; if longer history is expected, restore from backup. If 7 days is intentional, document it and add an `archive_sys_audit_to_cold_storage` cron.
9. **Add a CHECK or partial UNIQUE INDEX on klubovi.oib for valid 11-digit OIBs:**
```sql
CREATE UNIQUE INDEX CONCURRENTLY klubovi_oib_unique_valid
ON pgz_sport.klubovi (oib) WHERE oib ~ '^[0-9]{11}$';
```
This will physically prevent issue (2) from regressing once cleaned. Will fail until issue (2) is resolved — that's a feature.
10. **Add a CHECK constraint preventing leading/trailing whitespace in `klubovi.naziv` and `clanovi.ime/prezime`** (the duplicate-OIB clusters above contain pairs like `"Boćarski Klub Kostrena Kostrena"` with double-space — these should never make it past INSERT):
```sql
ALTER TABLE pgz_sport.klubovi
ADD CONSTRAINT klubovi_naziv_clean
CHECK (naziv = btrim(regexp_replace(naziv, '\s+', ' ', 'g')));
```