Files
pgz-sport/_audit/audit_DB_INTEGRITY.md
CC1 7adcec3309 CC1: 3-subagent deep audit — Frontend / API gap / DB integrity
Reports in _audit/:
  audit_FRONTEND_COVERAGE.md  — SA-1 (Explore): 9 HTML files, 0 orphan handlers (clean)
  audit_API_GAP.md            — SA-2 (Explore): 356 backend routes vs 54 frontend paths
                                                23 missing routes / 39 call sites
  audit_DB_INTEGRITY.md       — SA-3 (general-purpose): 8 SQL probes, FKs/NULLs clean,
                                                        48 dup-OIB clusters, 518 low-cov klubovi
  audit_CONSOLIDATED.md       — top 10 critical with owner matrix (cc1/cc4/cc5/cc6)

Headlines:
  Frontend: clean (post-R3 refactors landed)
  API gap:  CRM module systemic — 16 of 23 missing routes need /crm prefix in crm.html
            6 missing routes are trailing-slash bugs in crm.html
  DB:       48 OIB dup clusters in klubovi (~100 rows) need merge+unique-index
            518/2244 klubovi (23%) <33% coverage → enrichment_worker target list
            14 scoreboard-string klubovi rows (RK ... HRL Zapad od X) → DELETE
            ~30 backup tables (~97k rows) cluttering pgz_sport schema

Owner allocation:
  cc1 → #6 backup-table archival, #8 verify, #9 sportas trailing-slash
  cc4 → #1 OIB dedup script, #4 scoreboard DELETE, #10 schema CHECKs
  cc5 → #2 /crm prefix sweep on crm.html, #3 trailing-slash sweep, #7 notif endpoint
  cc6 → #5 enrichment_worker batch on filled<4 klubovi

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 08:29:04 +02:00

21 KiB
Raw Permalink Blame History

SA-3 DB Integrity Probe

Generated: 2026-05-05T06:24:49Z DB: rinet_v3 @ 10.10.0.2:6432 Schema: pgz_sport

1. Row counts

SELECT relname, n_live_tup FROM pg_stat_user_tables
WHERE schemaname='pgz_sport' ORDER BY n_live_tup DESC;

Top live (production) tables and key backups. Note: the schema contains a large number of *_backup_* / *_premerge_* / *_dedup_* / *_pre_* snapshot tables (clean-up debris). Only the canonical production tables are highlighted below; the rest are listed beneath.

Table Rows
clanovi 3248
klubovi 2244
sportski_objekti 106
savezi 246
dokumenti 7073
dokument_chunks 2850
utakmice_log 9267
rno_bilanca 6500
rno_prras 6500
clan_godisnjak 2398
clan_nagrada 2028
natjecanja_tablice 959
clan_sezona 689
hns_klubovi_natjecanje 635
klub_sezona 631
sys_audit 627
enrichment_log 616
dokument_primjena 439
natjecanja 428
clanovi_deleted_empty 372
clanstvo_kategorije 313
natjecanje_tablica 304
vijesti 286
savez_stats_oficijalno 284
najbolji_sportasi 243
user_sessions 235
sys_role_permissions 220
audit_events 193
potpore_nositelji 182
savez_statistika_clanstvo 177
statistika_saveza 169
osobe_funkcije 159
sport_facts 135
audit_feed 131
dobne_kategorije 127
manifestacije 113
sufinanciranje_sport 110
alertovi 89
ai_grad_distances 78
hns_natjecanja 74
notifications 66
sys_permissions 54
zsp_dokumenti 54
uloga_katalog 49
clanarine 48
mediji 42
treneri 38
account_codes 31
audit_log 29
suci 27
rno_sportske_udruge 21
users 18
lijecnicki_pregledi 16
form_templates 15
invoices 14
specijalisti_med 13
akademski_sport 11
proracun 11
hoo_pravilnici 8
alert_rules 8
roles 7
scraper_runs 6
invoice_uploads 5
payments 5
user_action_tokens 5
tenants 5
polygon_seals 5
expense_reports 4
javne_potrebe 4
user_klub_links 4
form_submissions 3
email_templates 3
gdpr_erasure_requests 3
sportas_specifika 2
gdpr_consent 2
user_roles 1
putni_nalog_racuni 1
user_2fa 1
invoice_lines 1
llm_extracted_facts 0
scrape_jobs 0
clan_utakmica 0
natjecanja_utakmice 0
user_permissions 0
sponzori 0

Backup/snapshot tables (candidates for archival drop)

These are stale workflow artefacts taking up significant rows; they should not be queried by app code:

Table Rows
clanovi_pre_godisnjak_backup 25944
klubovi_garbage_backup_1777750740 10072
klubovi_dedup_v2_1777750793 9920
klubovi_dedup_v3_1777750848 9672
clanovi_backup_20260430 9572
klubovi_premerge_20260503c 8976
klubovi_premerge_20260503b 8976
klubovi_pre_cleanup_20260430 8120
klubovi_pre_dedup_20260430 5960
klubovi_premerge_20260503 2572
klubovi_backup_20260505 2244
clanovi_purge_backup_20260429 1576
clanovi_dedup_20260502_v2 1384
klub_sezona_backup_20260502 1092
clanovi_dedup_backup_20260429 532
klubovi_sport_rename_backup_1777756941 396
klubovi_dedup_20260502 140
sponzori_mock_backup_1777756941 88
klubovi_finaldd_backup_1777752742 72
klubovi_garbage_backup_20260502 36
rno_organizacije 1482 (may be production)
sys_users_deprecated_20260429 9
klubovi_dedup_haok_backup_20260505 3
sys_user_klub_links_deprecated_20260429 2
klubovi_garbage_backup_1777752698 0
sys_sessions_deprecated_20260429 0
sys_user_permissions_deprecated_20260429 0

Total backup rows held: ~97,000+ (about 30x the canonical row count).

2. NULL/empty critical columns

SELECT 'clanovi.ime',           COUNT(*) FILTER (WHERE ime IS NULL),    COUNT(*) FILTER (WHERE ime = '')      FROM pgz_sport.clanovi
UNION ALL SELECT 'clanovi.prezime',       COUNT(*) FILTER (WHERE prezime IS NULL),COUNT(*) FILTER (WHERE prezime = '')  FROM pgz_sport.clanovi
UNION ALL SELECT 'klubovi.naziv',         COUNT(*) FILTER (WHERE naziv IS NULL),  COUNT(*) FILTER (WHERE naziv = '')    FROM pgz_sport.klubovi
UNION ALL SELECT 'savezi.naziv',          COUNT(*) FILTER (WHERE naziv IS NULL),  COUNT(*) FILTER (WHERE naziv = '')    FROM pgz_sport.savezi
UNION ALL SELECT 'sportski_objekti.naziv',COUNT(*) FILTER (WHERE naziv IS NULL),  COUNT(*) FILTER (WHERE naziv = '')    FROM pgz_sport.sportski_objekti;
Column NULLs Empties
clanovi.ime 0 0
clanovi.prezime 0 0
klubovi.naziv 0 0
savezi.naziv 0 0
sportski_objekti.naziv 0 0

Verdict: clean. The recent dedup/cleanup passes have eliminated all NULL/empty primary identifiers.

3. Orphan FKs

SELECT 'clanovi.klub_id->klubovi', COUNT(*) FROM pgz_sport.clanovi c
  WHERE c.klub_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.klubovi k WHERE k.id=c.klub_id)
UNION ALL
SELECT 'klubovi.savez_id->savezi', COUNT(*) FROM pgz_sport.klubovi k
  WHERE k.savez_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.savezi s WHERE s.id=k.savez_id)
UNION ALL
SELECT 'sys_audit.user_id->users', COUNT(*) FROM pgz_sport.sys_audit a
  WHERE a.user_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.users u WHERE u.id=a.user_id);
Constraint Orphan rows
clanovi.klub_id -> klubovi.id 0
klubovi.savez_id -> savezi.id 0
sys_audit.user_id -> users.id 0

Verdict: clean. All FK chains are intact.

4. Duplicate OIBs

SELECT oib, count(*), string_agg(naziv, ' | ')
FROM pgz_sport.klubovi
WHERE oib IS NOT NULL AND oib ~ '^[0-9]{11}$'
GROUP BY oib HAVING count(*)>1;

48 distinct OIBs are shared by 2-4 klubovi rows each (~100 duplicate rows total). This is the single largest data-quality issue.

OIB Count Names
86603390999 3 Juniorska ekipa Sv.Rok-Klana | Boćarski Klub Sv. Rok Klana | Sveti Rok-Klana
80500347365 3 HNK Orijent | Hrvatski Nogometni Klub Orijent | HNK Orijent 1919 (Sušak)
44908060737 3 Boćarski Klub Krimeja | Krimeja | BK Krimeja
19490107091 3 BOĆARSKI KLUB "LOVRAN" | Kadetska ekipa BK Lovran | Boćarski klub Lovran
29964028897 4 Boćarski klub Kastav | Kadetska ekipa BK Kastav 2 | Kadetska ekipa BK Kastav | Boćarski klub Kastav
17563258345 3 Plivački Klub Primorje Rijeka | KLUB DALJINSKOG PLIVANJA "PRIMORJE" | KLUB UMJETNIČKOG PLIVANJA „PRIMORJE AQUA MARIS" RIJEKA
15986803554 2 Košarkaški Klub Kvarner | Košarkaški klub KVARNER 2010
35549440954 2 Muški Odbojkaški Klub "Gornja Vežica" | Muški Odbojkaški Klub Gornja Vežica
37941242606 2 Muški Boćarski Klub Hreljin | Boćarski klub Hreljin
56273001018 2 Nogometni klub Turbina Bakar | Nogometni Klub Turbina Tribalj
67434497493 2 Odbojkaški Klub Rab | Odbojkaški Klub "Rab"
47139832980 2 Hrvatski Akademski Odbojkaški Klub "Rijeka" | HRVATSKI AKADEMSKI ODBOJKAŠKI KLUB "RIJEKA"
19514046928 2 Lovačko društvo "JELEN" Čavle | LOVAČKO DRUŠTVO "JELEN" ČAVLE
83495265520 2 Odbojkaški Klub "Kastav 1998" | Odbojkaški Klub Kastav 1998
14384540738 2 Boćarski klub Kostrena | Boćarski Klub Kostrena
17639054753 2 Streljački Klub Gluhih Galeb | Streljački klub gluhih "Galeb"
40538276343 2 Odbojkaški Klub "Odbojkaška Akademija Petica" | Odbojkaški klub Odbojkaška Akademija Petica
76273502221 2 Boćarski Klub Srdoči 1983 | Srdoči 1983
17934350916 2 NOGOMETNI KLUB "KLANA" | NK Klana
81511316706 2 Odbojkaški Klub Kostrena Kostrena | Odbojkaški Klub "Kostrena" Kostrena
27991069782 2 Boćarski Klub Čavle Šb Čavle | Juniorska ekipa Čavle ŠB
44509762938 2 Kadetska ekipa BK Sveti Jakov | Boćarski Klub Sveti Jakov Jadranovo
38093446162 2 Lovranska Draga | Boćarski Klub Lovranska Draga
56132503774 2 Nogometni Klub Draga-Mošćenička Draga | NK Draga
40936837495 2 Lovačko društvo "KAMENJARKA" Kukuljanovo | LOVAČKO DRUŠTVO "KAMENJARKA" KUKULJANOVO-ŠKRLJEVO
02999668483 2 ŠK Goranka | KK Goranka
35883230704 2 Lovačko društvo "MEDVIĐAK" Drivenik Tribalj | LOVAČKO DRUŠTVO "MEDVIĐAK" DRIVENIK
27420052480 2 Krenovac | Boćarski Klub Krenovac
17195966673 2 Ženski Odbojkaški Klub "Crikvenica" | Ženski Odbojkaški Klub Crikvenica
51108883738 2 NK Risnjak | Nogometni Klub Risnjak Lokve
13794801696 2 Ženski nogometni klub Rijeka Jack Pot | Ženski nogometni klub Rijeka
33154520914 2 Malonogometni klub gluhih "Galeb" | Malonogometni Klub Gluhih Galeb
52818156657 2 Parastreljački Klub Paraolimpijac | Parastreljački klub "Paraolimpijac"
42449645267 2 Paraatletski Klub Rijeka | Paraatletski klub "Srce" Rijeka
75947125821 2 Boćarski klub Opatija | Boćarski Klub Opatija
43219260850 2 Ženski Akademski Odbojkaški Klub Škurinje Rijeka | Ženski Akademski Odbojkaški Klub Škurinje Rijeka
85575561127 2 SPORTSKO-REKREACIJSKO DRUŠTVO VIŠEVICA | rekreacijsko društvo VIŠEVICA
19353575292 2 Odbojkaški Klub "Sveti Matej 06" - Viškovo | Odbojkaški Klub Sveti Matej 06 - Viškovo
86232456523 2 Boćarski klub Krk | Boćarski klub Krk
74630525187 2 Nogometni klub Omladinac | NK Omladinac Vrata
83261523211 2 Odbojkaški Klub Opatija Volley | ODBOJKAŠKI KLUB OPATIJA VOLLEY
98146784649 2 Boćarski Klub Draga Mošćenička Draga | Draga Mošćenička Draga
39250096592 2 Boćarski klub Brod Moravice | Boćarski Klub Brod Moravice
76221716576 2 Kuglački Klub Gluhih Galeb | Kuglački klub gluhih "Galeb"
10132566066 2 Vaterpolo klub PRIMORJE-ERSTE BANKA-ženska ekipa | Vaterpolo klub PRIMORJE-ERSTE BANKA-muška ekipa
39123612806 2 Stolnoteniski klub Rijeka | Parastolnoteniski Klub Rijeka
70928157464 2 Ženski Boćarski Klub Hreljin | ŽBK Hreljin
77066352874 2 Nogometni Klub Vinodol | NK Vihor

Patterns:

  • Casing/whitespace duplicates (Boćarski klub Kostrena vs Boćarski Klub Kostrena) — pure dupes, merge.
  • Quoting variants ("Rab" vs Rab) — same.
  • "Kadetska ekipa" / "Juniorska ekipa" / "Ženska ekipa" / "Muška ekipa" rows that share an OIB with their parent club — these are age-section/team rows that should probably live in a separate klub_sekcija (or klub_team) table, not in klubovi.
  • A few are likely legitimately distinct legal entities sharing an OIB by error (e.g. Vinodol vs Vihor; NK Risnjak vs NK Risnjak Lokve) — flag for human review.

5. Placeholder values

-- klubovi
SELECT 'klubovi.naziv placeholders', COUNT(*) FROM pgz_sport.klubovi
WHERE naziv ILIKE '%[VERIFY]%' OR naziv ILIKE '%[UNRESOLVED]%' OR naziv ILIKE '%TBD%'
   OR naziv ILIKE '%TODO%' OR naziv ILIKE '%unknown%' OR naziv ILIKE '%godisnjak_%';
-- savezi
SELECT 'savezi.naziv placeholders', COUNT(*) FROM pgz_sport.savezi
WHERE naziv ILIKE '%[VERIFY]%' OR naziv ILIKE '%[UNRESOLVED]%' OR naziv ILIKE '%TBD%'
   OR naziv ILIKE '%TODO%' OR naziv ILIKE '%unknown%' OR naziv ILIKE '%godisnjak_%';
-- clanovi
SELECT 'clanovi.ime/prezime placeholders', COUNT(*) FROM pgz_sport.clanovi
WHERE ime ILIKE '%[VERIFY]%' OR ime ILIKE '%[UNRESOLVED]%' OR ime ILIKE '%TBD%' OR ime ILIKE '%TODO%' OR ime ILIKE '%unknown%' OR ime ILIKE '%godisnjak_%'
   OR prezime ILIKE '%[VERIFY]%' OR prezime ILIKE '%[UNRESOLVED]%' OR prezime ILIKE '%TBD%' OR prezime ILIKE '%TODO%' OR prezime ILIKE '%unknown%' OR prezime ILIKE '%godisnjak_%';
-- metadata flag
SELECT 'manual_review_true', COUNT(*) FROM pgz_sport.klubovi WHERE metadata->>'manual_review' = 'true';
Bucket Count
klubovi.naziv with placeholder marker 3
savezi.naziv with placeholder marker 0
clanovi.ime/prezime with placeholder marker 6
klubovi.metadata.manual_review = 'true' 3

klubovi placeholder rows

id naziv
2630 [VERIFY] Odbojkaški Klub Opatija
2619 [VERIFY] Odbojkaški Klub Čavle
4426 [UNRESOLVED] empty naziv & grad — id 4426

clanovi placeholder rows (matched the pattern via Todorović surname containing do...unkn...? — check is loose; these are false positives in fact)

SELECT id, ime, prezime FROM pgz_sport.clanovi
WHERE prezime ILIKE '%unknown%' OR ime ILIKE '%unknown%' OR ...
id ime prezime
4202 Aleksa Todorović
4140 Aleksa Todorović
1956 Filip Todorović
377 Dejan Todorović
3455 Aleksa Todorović
551 Matteo Todorović

These six are false positivesprezime "Todorović" matches %dor% token that overlaps %godisnjak_% is not the trigger; the actual trigger is %TODO% substring inside "ToDOrović" (case-insensitive ILIKE). They are real surnames, not placeholders. (Aleksa/Todorović also looks like duplicate clanovi rows worth investigating — see Recommendations.)

Verdict: placeholder pollution is essentially nil. Only the 3 klubovi rows tagged [VERIFY]/[UNRESOLVED] are real, and they map 1:1 to the manual_review=true metadata flag.

6. Low-coverage klubovi (filled < 4 of 12)

WITH cov AS (
  SELECT id, naziv,
    (CASE WHEN naziv      IS NOT NULL AND naziv      <>'' THEN 1 ELSE 0 END +
     CASE WHEN sport      IS NOT NULL AND sport      <>'' THEN 1 ELSE 0 END +
     CASE WHEN grad       IS NOT NULL AND grad       <>'' THEN 1 ELSE 0 END +
     CASE WHEN oib        IS NOT NULL AND oib        <>'' THEN 1 ELSE 0 END +
     CASE WHEN predsjednik IS NOT NULL AND predsjednik<>'' THEN 1 ELSE 0 END +
     CASE WHEN tajnik     IS NOT NULL AND tajnik     <>'' THEN 1 ELSE 0 END +
     CASE WHEN email      IS NOT NULL AND email      <>'' THEN 1 ELSE 0 END +
     CASE WHEN telefon    IS NOT NULL AND telefon    <>'' THEN 1 ELSE 0 END +
     CASE WHEN COALESCE(web, web_stranica)    IS NOT NULL AND COALESCE(web, web_stranica)<>''     THEN 1 ELSE 0 END +
     CASE WHEN COALESCE(sjediste, adresa)     IS NOT NULL AND COALESCE(sjediste, adresa)<>''      THEN 1 ELSE 0 END +
     CASE WHEN ciljevi    IS NOT NULL AND ciljevi    <>'' THEN 1 ELSE 0 END +
     CASE WHEN opis_djelatnosti IS NOT NULL AND opis_djelatnosti<>'' THEN 1 ELSE 0 END
    ) AS filled
  FROM pgz_sport.klubovi
)
SELECT id, naziv, filled FROM cov WHERE filled<4 ORDER BY filled ASC, id ASC LIMIT 20;

Total klubovi with filled < 4 / 12 (i.e. <33%): 518 (≈23% of the 2244 production klubovi).

Sample 20 worst (filled = 1 or 2)

id naziv filled/12
4249 Streljački klub DVD svojevrstan vodič za roditelje 1
4250 Streljački klub DVD Opatija 1
2290 KK Metal - Jurdani 2
2291 KK OI KOSTRENA 2
2311 RK LIBURNIJA 8. u II HRL Zapad od 12 2
2312 RK MORNAR 3. u II HRL Zapad od 10 2
2315 RK PŠR SELCE 5. u III HRL Zapad od 8 2
2324 RK ČAVLE 2. u II HRL Zapad od 10 2
2325 RK ČAVLE 7. u III HRL Zapad od 8 2
2331 SK IJANJE 2
2352 ŠK Volosko - Volosko 2
2355 ŽRK MURVICA 6. u II HRL Zapad od 12 2
2356 ŽRK MURVICA 6. u II HRL Zapad od 9 2
2360 ŽRK ZAMET II 3. u III HRL Zapad od 8 2
3741 AK Elena Ban 2
3744 AK Koper 2
3747 AK Kvarnera 2
3748 AK Rijeka 2
3749 AK Velenje 2
3750 AK Viškovo 2

Patterns:

  • RK <CLUB> N. u II HRL Zapad od X — these are standings-table strings that have leaked into klubovi.naziv. They're not clubs at all, they're handball league rankings. Should be deleted from klubovi (and redirected to natjecanja_tablice).
  • Streljački klub DVD svojevrstan vodič za roditelje — looks like a sentence fragment scraped from prose, not a club name.
  • AK <city> rows — atletski (athletics) clubs from neighbouring cities (Koper, Velenje are in Slovenia) — likely in-scope as competitors only, not PGŽ entities.

7. Suspicious clanovi (junk imports)

SELECT id, ime, prezime FROM pgz_sport.clanovi
WHERE ime ~ '@|^\d+$' LIMIT 20;

Result: 0 rows. No emails or pure-numeric strings have leaked into the ime field. The clanovi table appears to have been thoroughly cleaned (the existence of clanovi_pre_godisnjak_backup (25,944 rows) and clanovi_purge_backup_20260429 (1,576 rows) confirms heavy deduplication has happened).

8. sys_audit health

SELECT 'total',  COUNT(*)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'today',  COUNT(*)::text FROM pgz_sport.sys_audit WHERE created_at::date = CURRENT_DATE
UNION ALL SELECT 'oldest', MIN(created_at)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'newest', MAX(created_at)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'null_row_hash_last_100',
  (SELECT COUNT(*) FROM (SELECT row_hash FROM pgz_sport.sys_audit ORDER BY id DESC LIMIT 100) t WHERE row_hash IS NULL)::text;
Metric Value
Total rows 627
Rows today (2026-05-04) 531
Oldest entry 2026-04-28 21:39:45 +02
Newest entry 2026-05-05 08:23:14 +02
NULL row_hash in last 100 0

Verdict: chain integrity intact, but the audit log is only 7 days old — there's been a recent re-init, or audit was switched on only on 2026-04-28. Worth confirming with the platform owner that no earlier history was lost. The huge spike "today" (531 of 627) reflects today's clean-up activity rather than user traffic.

  1. Drop ~30 backup tables (~97k rows). clanovi_pre_godisnjak_backup (25.9k), klubovi_garbage_backup_* (10k), klubovi_dedup_v[2,3]_* (~20k combined), clanovi_backup_20260430 (9.5k), and the rest of the *_backup_* / *_premerge_* / *_pre_* / *_deprecated_* set. Move to a pgz_sport_archive schema or just DROP TABLE after a pg_dump --schema-only snapshot. Saves index size and stops accidental queries against stale data.

  2. Resolve 48 duplicate-OIB clusters in klubovi (~100 rows). Recommended SQL pattern:

    -- For each OIB cluster, keep the row with highest filled-coverage and
    -- merge children (clanovi.klub_id, klub_sezona.klub_id, etc.) onto it.
    WITH dups AS (SELECT oib, MIN(id) AS keep_id FROM pgz_sport.klubovi
                  WHERE oib ~ '^[0-9]{11}$' GROUP BY oib HAVING COUNT(*)>1),
         moves AS (SELECT k.id AS drop_id, d.keep_id FROM pgz_sport.klubovi k
                   JOIN dups d USING (oib) WHERE k.id<>d.keep_id)
    UPDATE pgz_sport.clanovi c SET klub_id = m.keep_id
      FROM moves m WHERE c.klub_id = m.drop_id;
    -- repeat for klub_sezona, hns_klubovi_natjecanje, etc.
    -- then DELETE the drop_ids from klubovi.
    

    Run interactively via /opt/pgz-sport/scripts/dedup_klubovi_by_oib.py (create if absent) with --dry-run first.

  3. Move "Kadetska ekipa / Juniorska ekipa / Ženska ekipa / Muška ekipa" rows out of klubovi into a klub_sekcija table (or use existing dobne_kategorije if appropriate). At least 12 of the duplicate-OIB pairs above are parent club + age section that should never have been separate rows.

  4. Delete the 14 standings-string klubovi rows (RK ... N. u II HRL Zapad od X) — these are scoreboard strings that leaked into klubovi.naziv. SQL:

    DELETE FROM pgz_sport.klubovi
    WHERE naziv ~ '\d+\. u (I{1,3}|IV) HRL .* od \d+';
    

    Verify count first (SELECT COUNT(*) ... ).

  5. Resolve the 3 [VERIFY]/[UNRESOLVED] klubovi (ids 2619, 2630, 4426). Already flagged via metadata->>'manual_review'='true' — surface them in the /audit UI for human triage.

  6. Run /opt/pgz-sport/scripts/enrichment_worker.py against the 518 klubovi with coverage <33%. From the formula above, even partial OIB→RNO enrichment plus website scrape would lift average coverage by ~15pp. Suggested batch:

    python3 /opt/pgz-sport/scripts/enrichment_worker.py --filter "filled<4" --limit 100 --concurrency 4
    
  7. Deduplicate Aleksa Todorović (and similar) in clanovi. ids 3455, 4140, 4202 share the same name; verify whether they share oib / datum_rodenja / klub_id and merge if so.

  8. Confirm sys_audit retention policy. Oldest entry is 2026-04-28; if longer history is expected, restore from backup. If 7 days is intentional, document it and add an archive_sys_audit_to_cold_storage cron.

  9. Add a CHECK or partial UNIQUE INDEX on klubovi.oib for valid 11-digit OIBs:

    CREATE UNIQUE INDEX CONCURRENTLY klubovi_oib_unique_valid
    ON pgz_sport.klubovi (oib) WHERE oib ~ '^[0-9]{11}$';
    

    This will physically prevent issue (2) from regressing once cleaned. Will fail until issue (2) is resolved — that's a feature.

  10. Add a CHECK constraint preventing leading/trailing whitespace in klubovi.naziv and clanovi.ime/prezime (the duplicate-OIB clusters above contain pairs like "Boćarski Klub Kostrena Kostrena" with double-space — these should never make it past INSERT):

    ALTER TABLE pgz_sport.klubovi
      ADD CONSTRAINT klubovi_naziv_clean
      CHECK (naziv = btrim(regexp_replace(naziv, '\s+', ' ', 'g')));