7adcec3309
Reports in _audit/:
audit_FRONTEND_COVERAGE.md — SA-1 (Explore): 9 HTML files, 0 orphan handlers (clean)
audit_API_GAP.md — SA-2 (Explore): 356 backend routes vs 54 frontend paths
23 missing routes / 39 call sites
audit_DB_INTEGRITY.md — SA-3 (general-purpose): 8 SQL probes, FKs/NULLs clean,
48 dup-OIB clusters, 518 low-cov klubovi
audit_CONSOLIDATED.md — top 10 critical with owner matrix (cc1/cc4/cc5/cc6)
Headlines:
Frontend: clean (post-R3 refactors landed)
API gap: CRM module systemic — 16 of 23 missing routes need /crm prefix in crm.html
6 missing routes are trailing-slash bugs in crm.html
DB: 48 OIB dup clusters in klubovi (~100 rows) need merge+unique-index
518/2244 klubovi (23%) <33% coverage → enrichment_worker target list
14 scoreboard-string klubovi rows (RK ... HRL Zapad od X) → DELETE
~30 backup tables (~97k rows) cluttering pgz_sport schema
Owner allocation:
cc1 → #6 backup-table archival, #8 verify, #9 sportas trailing-slash
cc4 → #1 OIB dedup script, #4 scoreboard DELETE, #10 schema CHECKs
cc5 → #2 /crm prefix sweep on crm.html, #3 trailing-slash sweep, #7 notif endpoint
cc6 → #5 enrichment_worker batch on filled<4 klubovi
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
438 lines
21 KiB
Markdown
# SA-3 DB Integrity Probe
**Generated:** 2026-05-05T06:24:49Z
**DB:** rinet_v3 @ 10.10.0.2:6432
**Schema:** pgz_sport

## 1. Row counts

```sql
SELECT relname, n_live_tup FROM pg_stat_user_tables
WHERE schemaname='pgz_sport' ORDER BY n_live_tup DESC;
```

Top live (production) tables and key backups. **Note:** the schema contains a large number of `*_backup_*` / `*_premerge_*` / `*_dedup_*` / `*_pre_*` snapshot tables (clean-up debris). Only the canonical production tables are highlighted below; the rest are listed beneath.

| Table | Rows |
|---|---:|
| clanovi | 3248 |
| klubovi | 2244 |
| sportski_objekti | 106 |
| savezi | 246 |
| dokumenti | 7073 |
| dokument_chunks | 2850 |
| utakmice_log | 9267 |
| rno_bilanca | 6500 |
| rno_prras | 6500 |
| clan_godisnjak | 2398 |
| clan_nagrada | 2028 |
| natjecanja_tablice | 959 |
| clan_sezona | 689 |
| hns_klubovi_natjecanje | 635 |
| klub_sezona | 631 |
| sys_audit | 627 |
| enrichment_log | 616 |
| dokument_primjena | 439 |
| natjecanja | 428 |
| clanovi_deleted_empty | 372 |
| clanstvo_kategorije | 313 |
| natjecanje_tablica | 304 |
| vijesti | 286 |
| savez_stats_oficijalno | 284 |
| najbolji_sportasi | 243 |
| user_sessions | 235 |
| sys_role_permissions | 220 |
| audit_events | 193 |
| potpore_nositelji | 182 |
| savez_statistika_clanstvo | 177 |
| statistika_saveza | 169 |
| osobe_funkcije | 159 |
| sport_facts | 135 |
| audit_feed | 131 |
| dobne_kategorije | 127 |
| manifestacije | 113 |
| sufinanciranje_sport | 110 |
| alertovi | 89 |
| ai_grad_distances | 78 |
| hns_natjecanja | 74 |
| notifications | 66 |
| sys_permissions | 54 |
| zsp_dokumenti | 54 |
| uloga_katalog | 49 |
| clanarine | 48 |
| mediji | 42 |
| treneri | 38 |
| account_codes | 31 |
| audit_log | 29 |
| suci | 27 |
| rno_sportske_udruge | 21 |
| users | 18 |
| lijecnicki_pregledi | 16 |
| form_templates | 15 |
| invoices | 14 |
| specijalisti_med | 13 |
| akademski_sport | 11 |
| proracun | 11 |
| hoo_pravilnici | 8 |
| alert_rules | 8 |
| roles | 7 |
| scraper_runs | 6 |
| invoice_uploads | 5 |
| payments | 5 |
| user_action_tokens | 5 |
| tenants | 5 |
| polygon_seals | 5 |
| expense_reports | 4 |
| javne_potrebe | 4 |
| user_klub_links | 4 |
| form_submissions | 3 |
| email_templates | 3 |
| gdpr_erasure_requests | 3 |
| sportas_specifika | 2 |
| gdpr_consent | 2 |
| user_roles | 1 |
| putni_nalog_racuni | 1 |
| user_2fa | 1 |
| invoice_lines | 1 |
| llm_extracted_facts | 0 |
| scrape_jobs | 0 |
| clan_utakmica | 0 |
| natjecanja_utakmice | 0 |
| user_permissions | 0 |
| sponzori | 0 |

### Backup/snapshot tables (candidates for archival drop)

These are stale workflow artefacts that hold a significant number of rows; app code should not query them:

| Table | Rows |
|---|---:|
| clanovi_pre_godisnjak_backup | 25944 |
| klubovi_garbage_backup_1777750740 | 10072 |
| klubovi_dedup_v2_1777750793 | 9920 |
| klubovi_dedup_v3_1777750848 | 9672 |
| clanovi_backup_20260430 | 9572 |
| klubovi_premerge_20260503c | 8976 |
| klubovi_premerge_20260503b | 8976 |
| klubovi_pre_cleanup_20260430 | 8120 |
| klubovi_pre_dedup_20260430 | 5960 |
| klubovi_premerge_20260503 | 2572 |
| klubovi_backup_20260505 | 2244 |
| clanovi_purge_backup_20260429 | 1576 |
| clanovi_dedup_20260502_v2 | 1384 |
| klub_sezona_backup_20260502 | 1092 |
| clanovi_dedup_backup_20260429 | 532 |
| klubovi_sport_rename_backup_1777756941 | 396 |
| klubovi_dedup_20260502 | 140 |
| sponzori_mock_backup_1777756941 | 88 |
| klubovi_finaldd_backup_1777752742 | 72 |
| klubovi_garbage_backup_20260502 | 36 |
| rno_organizacije | 1482 *(may be production)* |
| sys_users_deprecated_20260429 | 9 |
| klubovi_dedup_haok_backup_20260505 | 3 |
| sys_user_klub_links_deprecated_20260429 | 2 |
| klubovi_garbage_backup_1777752698 | 0 |
| sys_sessions_deprecated_20260429 | 0 |
| sys_user_permissions_deprecated_20260429 | 0 |

Total backup rows held: ~97,000+ (the rows listed above sum to roughly 107,000, excluding `rno_organizacije`). That is about **30x** the row count of the live `clanovi` table (3,248).

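The snapshot tables above are recognisable by name alone, so the archival list can be derived rather than maintained by hand. The following is a minimal Python sketch, assuming the table names have already been fetched from `pg_stat_user_tables`; `is_snapshot_table` and `split_tables` are hypothetical helpers, and the marker list is inferred from the names in this report rather than from any existing script.

```python
import re

# Markers inferred from the snapshot table names listed in this report.
# A marker must be followed by "_" or end the name, so that production
# tables such as "dokument_primjena" are not caught by the "pre" marker.
SNAPSHOT_MARKER = re.compile(r"_(backup|premerge|dedup|pre|deprecated)(_|$)")


def is_snapshot_table(name: str) -> bool:
    """Return True when a table name carries one of the snapshot markers."""
    return SNAPSHOT_MARKER.search(name) is not None


def split_tables(names):
    """Partition table names into (archive candidates, everything else)."""
    snapshots = [n for n in names if is_snapshot_table(n)]
    others = [n for n in names if not is_snapshot_table(n)]
    return snapshots, others


if __name__ == "__main__":
    sample = [
        "klubovi",
        "klubovi_premerge_20260503c",
        "clanovi_pre_godisnjak_backup",
        "sys_users_deprecated_20260429",
        "rno_organizacije",
    ]
    snapshots, others = split_tables(sample)
    print("archive candidates:", snapshots)
    print("left alone:", others)
```

Names without a marker, such as `rno_organizacije` and `clanovi_deleted_empty`, fall into the second list on purpose, so they still need a manual decision.
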
## 2. NULL/empty critical columns

```sql
SELECT 'clanovi.ime', COUNT(*) FILTER (WHERE ime IS NULL), COUNT(*) FILTER (WHERE ime = '') FROM pgz_sport.clanovi
UNION ALL SELECT 'clanovi.prezime', COUNT(*) FILTER (WHERE prezime IS NULL), COUNT(*) FILTER (WHERE prezime = '') FROM pgz_sport.clanovi
UNION ALL SELECT 'klubovi.naziv', COUNT(*) FILTER (WHERE naziv IS NULL), COUNT(*) FILTER (WHERE naziv = '') FROM pgz_sport.klubovi
UNION ALL SELECT 'savezi.naziv', COUNT(*) FILTER (WHERE naziv IS NULL), COUNT(*) FILTER (WHERE naziv = '') FROM pgz_sport.savezi
UNION ALL SELECT 'sportski_objekti.naziv', COUNT(*) FILTER (WHERE naziv IS NULL), COUNT(*) FILTER (WHERE naziv = '') FROM pgz_sport.sportski_objekti;
```

| Column | NULLs | Empties |
|---|---:|---:|
| clanovi.ime | 0 | 0 |
| clanovi.prezime | 0 | 0 |
| klubovi.naziv | 0 | 0 |
| savezi.naziv | 0 | 0 |
| sportski_objekti.naziv | 0 | 0 |

**Verdict:** clean. The recent dedup/cleanup passes have eliminated all NULL/empty primary identifiers.

## 3. Orphan FKs

```sql
SELECT 'clanovi.klub_id->klubovi', COUNT(*) FROM pgz_sport.clanovi c
  WHERE c.klub_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.klubovi k WHERE k.id=c.klub_id)
UNION ALL
SELECT 'klubovi.savez_id->savezi', COUNT(*) FROM pgz_sport.klubovi k
  WHERE k.savez_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.savezi s WHERE s.id=k.savez_id)
UNION ALL
SELECT 'sys_audit.user_id->users', COUNT(*) FROM pgz_sport.sys_audit a
  WHERE a.user_id IS NOT NULL AND NOT EXISTS (SELECT 1 FROM pgz_sport.users u WHERE u.id=a.user_id);
```

| Constraint | Orphan rows |
|---|---:|
| clanovi.klub_id -> klubovi.id | 0 |
| klubovi.savez_id -> savezi.id | 0 |
| sys_audit.user_id -> users.id | 0 |

**Verdict:** clean. All three probed FK relationships are intact.

## 4. Duplicate OIBs

```sql
SELECT oib, count(*), string_agg(naziv, ' | ')
FROM pgz_sport.klubovi
WHERE oib IS NOT NULL AND oib ~ '^[0-9]{11}$'
GROUP BY oib HAVING count(*)>1;
```

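The filter `oib ~ '^[0-9]{11}$'` checks only the shape of the value. An OIB also carries a check digit (ISO 7064, MOD 11,10), so a stricter probe could separate mistyped numbers from genuine shared ones. The following is a minimal Python sketch, assuming the OIB values are available as strings; `is_valid_oib` is a hypothetical helper and not part of the existing scripts.

```python
def is_valid_oib(oib: str) -> bool:
    """Check an 11-digit OIB against its ISO 7064 MOD 11,10 check digit."""
    if len(oib) != 11 or not (oib.isascii() and oib.isdigit()):
        return False
    remainder = 10
    for ch in oib[:10]:
        remainder = (remainder + int(ch)) % 10
        if remainder == 0:
            remainder = 10
        remainder = (remainder * 2) % 11
    # A computed value of 10 corresponds to check digit 0.
    check = (11 - remainder) % 10
    return check == int(oib[10])


if __name__ == "__main__":
    # 80500347365 is one of the shared OIBs reported in this section.
    print(is_valid_oib("80500347365"))
```

An OIB that passes this check and still appears on several rows points to a real duplicate rather than a typing error.
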
**48 distinct OIBs are shared by 2-4 klubovi rows each (103 rows in total across the clusters listed below).** This is the single largest data-quality issue.

| OIB | Count | Names |
|---|---:|---|
| 86603390999 | 3 | Juniorska ekipa Sv.Rok-Klana \| Boćarski Klub Sv. Rok Klana \| Sveti Rok-Klana |
| 80500347365 | 3 | HNK Orijent \| Hrvatski Nogometni Klub Orijent \| HNK Orijent 1919 (Sušak) |
| 44908060737 | 3 | Boćarski Klub Krimeja \| Krimeja \| BK Krimeja |
| 19490107091 | 3 | BOĆARSKI KLUB "LOVRAN" \| Kadetska ekipa BK Lovran \| Boćarski klub Lovran |
| 29964028897 | 4 | Boćarski klub Kastav \| Kadetska ekipa BK Kastav 2 \| Kadetska ekipa BK Kastav \| Boćarski klub Kastav |
| 17563258345 | 3 | Plivački Klub Primorje Rijeka \| KLUB DALJINSKOG PLIVANJA "PRIMORJE" \| KLUB UMJETNIČKOG PLIVANJA „PRIMORJE AQUA MARIS" RIJEKA |
| 15986803554 | 2 | Košarkaški Klub Kvarner \| Košarkaški klub KVARNER 2010 |
| 35549440954 | 2 | Muški Odbojkaški Klub "Gornja Vežica" \| Muški Odbojkaški Klub Gornja Vežica |
| 37941242606 | 2 | Muški Boćarski Klub Hreljin \| Boćarski klub Hreljin |
| 56273001018 | 2 | Nogometni klub Turbina Bakar \| Nogometni Klub Turbina Tribalj |
| 67434497493 | 2 | Odbojkaški Klub Rab \| Odbojkaški Klub "Rab" |
| 47139832980 | 2 | Hrvatski Akademski Odbojkaški Klub "Rijeka" \| HRVATSKI AKADEMSKI ODBOJKAŠKI KLUB "RIJEKA" |
| 19514046928 | 2 | Lovačko društvo "JELEN" Čavle \| LOVAČKO DRUŠTVO "JELEN" ČAVLE |
| 83495265520 | 2 | Odbojkaški Klub "Kastav 1998" \| Odbojkaški Klub Kastav 1998 |
| 14384540738 | 2 | Boćarski klub Kostrena \| Boćarski Klub Kostrena |
| 17639054753 | 2 | Streljački Klub Gluhih Galeb \| Streljački klub gluhih "Galeb" |
| 40538276343 | 2 | Odbojkaški Klub "Odbojkaška Akademija Petica" \| Odbojkaški klub Odbojkaška Akademija Petica |
| 76273502221 | 2 | Boćarski Klub Srdoči 1983 \| Srdoči 1983 |
| 17934350916 | 2 | NOGOMETNI KLUB "KLANA" \| NK Klana |
| 81511316706 | 2 | Odbojkaški Klub Kostrena Kostrena \| Odbojkaški Klub "Kostrena" Kostrena |
| 27991069782 | 2 | Boćarski Klub Čavle Šb Čavle \| Juniorska ekipa Čavle ŠB |
| 44509762938 | 2 | Kadetska ekipa BK Sveti Jakov \| Boćarski Klub Sveti Jakov Jadranovo |
| 38093446162 | 2 | Lovranska Draga \| Boćarski Klub Lovranska Draga |
| 56132503774 | 2 | Nogometni Klub Draga-Mošćenička Draga \| NK Draga |
| 40936837495 | 2 | Lovačko društvo "KAMENJARKA" Kukuljanovo \| LOVAČKO DRUŠTVO "KAMENJARKA" KUKULJANOVO-ŠKRLJEVO |
| 02999668483 | 2 | ŠK Goranka \| KK Goranka |
| 35883230704 | 2 | Lovačko društvo "MEDVIĐAK" Drivenik Tribalj \| LOVAČKO DRUŠTVO "MEDVIĐAK" DRIVENIK |
| 27420052480 | 2 | Krenovac \| Boćarski Klub Krenovac |
| 17195966673 | 2 | Ženski Odbojkaški Klub "Crikvenica" \| Ženski Odbojkaški Klub Crikvenica |
| 51108883738 | 2 | NK Risnjak \| Nogometni Klub Risnjak Lokve |
| 13794801696 | 2 | Ženski nogometni klub Rijeka Jack Pot \| Ženski nogometni klub Rijeka |
| 33154520914 | 2 | Malonogometni klub gluhih "Galeb" \| Malonogometni Klub Gluhih Galeb |
| 52818156657 | 2 | Parastreljački Klub Paraolimpijac \| Parastreljački klub "Paraolimpijac" |
| 42449645267 | 2 | Paraatletski Klub Rijeka \| Paraatletski klub "Srce" Rijeka |
| 75947125821 | 2 | Boćarski klub Opatija \| Boćarski Klub Opatija |
| 43219260850 | 2 | Ženski Akademski Odbojkaški Klub Škurinje Rijeka \| Ženski Akademski Odbojkaški Klub Škurinje Rijeka |
| 85575561127 | 2 | SPORTSKO-REKREACIJSKO DRUŠTVO VIŠEVICA \| rekreacijsko društvo VIŠEVICA |
| 19353575292 | 2 | Odbojkaški Klub "Sveti Matej 06" - Viškovo \| Odbojkaški Klub Sveti Matej 06 - Viškovo |
| 86232456523 | 2 | Boćarski klub Krk \| Boćarski klub Krk |
| 74630525187 | 2 | Nogometni klub Omladinac \| NK Omladinac Vrata |
| 83261523211 | 2 | Odbojkaški Klub Opatija Volley \| ODBOJKAŠKI KLUB OPATIJA VOLLEY |
| 98146784649 | 2 | Boćarski Klub Draga Mošćenička Draga \| Draga – Mošćenička Draga |
| 39250096592 | 2 | Boćarski klub Brod Moravice \| Boćarski Klub Brod Moravice |
| 76221716576 | 2 | Kuglački Klub Gluhih Galeb \| Kuglački klub gluhih "Galeb" |
| 10132566066 | 2 | Vaterpolo klub PRIMORJE-ERSTE BANKA-ženska ekipa \| Vaterpolo klub PRIMORJE-ERSTE BANKA-muška ekipa |
| 39123612806 | 2 | Stolnoteniski klub Rijeka \| Parastolnoteniski Klub Rijeka |
| 70928157464 | 2 | Ženski Boćarski Klub Hreljin \| ŽBK Hreljin |
| 77066352874 | 2 | Nogometni Klub Vinodol \| NK Vihor |

**Patterns:**
- Casing/whitespace duplicates (`Boćarski klub Kostrena` vs `Boćarski Klub Kostrena`): pure dupes, merge.
- Quoting variants (`"Rab"` vs `Rab`): same treatment.
- "Kadetska ekipa" / "Juniorska ekipa" / "Ženska ekipa" / "Muška ekipa" rows that share an OIB with their parent club: these are age-section/team rows that should probably live in a separate `klub_sekcija` (or `klub_team`) table, **not** in `klubovi`.
- A few look like distinct legal entities that share an OIB by mistake (e.g. Vinodol vs Vihor; NK Risnjak vs Nogometni Klub Risnjak Lokve): flag for human review.

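The first two patterns above (casing, whitespace, and quoting variants) can be separated mechanically from the clusters that need a human decision. The following is a minimal Python sketch, assuming each cluster is available as a list of `naziv` strings; `normalize_name` and `classify_cluster` are hypothetical helpers, and the set of quote characters is an assumption based on the names in the table.

```python
import re
import unicodedata

# Quote characters seen in the report's club names, plus common variants.
QUOTES = "\"'„“”«»"


def normalize_name(name: str) -> str:
    """Casefold, drop quote characters, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", name)
    text = "".join(ch for ch in text if ch not in QUOTES)
    return re.sub(r"\s+", " ", text).strip().casefold()


def classify_cluster(names):
    """Return 'pure-duplicate' when every name normalizes to the same string."""
    distinct = {normalize_name(n) for n in names}
    return "pure-duplicate" if len(distinct) == 1 else "needs-review"


if __name__ == "__main__":
    print(classify_cluster(["Boćarski klub Kostrena", "Boćarski Klub Kostrena"]))
    print(classify_cluster(["Nogometni Klub Vinodol", "NK Vihor"]))
```

Clusters labelled `needs-review` include both the team/section rows and the possibly distinct legal entities, so the label narrows the manual work without deciding it.
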
## 5. Placeholder values

```sql
-- klubovi
SELECT 'klubovi.naziv placeholders', COUNT(*) FROM pgz_sport.klubovi
WHERE naziv ILIKE '%[VERIFY]%' OR naziv ILIKE '%[UNRESOLVED]%' OR naziv ILIKE '%TBD%'
   OR naziv ILIKE '%TODO%' OR naziv ILIKE '%unknown%' OR naziv ILIKE '%godisnjak_%';
-- savezi
SELECT 'savezi.naziv placeholders', COUNT(*) FROM pgz_sport.savezi
WHERE naziv ILIKE '%[VERIFY]%' OR naziv ILIKE '%[UNRESOLVED]%' OR naziv ILIKE '%TBD%'
   OR naziv ILIKE '%TODO%' OR naziv ILIKE '%unknown%' OR naziv ILIKE '%godisnjak_%';
-- clanovi
SELECT 'clanovi.ime/prezime placeholders', COUNT(*) FROM pgz_sport.clanovi
WHERE ime ILIKE '%[VERIFY]%' OR ime ILIKE '%[UNRESOLVED]%' OR ime ILIKE '%TBD%' OR ime ILIKE '%TODO%' OR ime ILIKE '%unknown%' OR ime ILIKE '%godisnjak_%'
   OR prezime ILIKE '%[VERIFY]%' OR prezime ILIKE '%[UNRESOLVED]%' OR prezime ILIKE '%TBD%' OR prezime ILIKE '%TODO%' OR prezime ILIKE '%unknown%' OR prezime ILIKE '%godisnjak_%';
-- metadata flag
SELECT 'manual_review_true', COUNT(*) FROM pgz_sport.klubovi WHERE metadata->>'manual_review' = 'true';
```

| Bucket | Count |
|---|---:|
| klubovi.naziv with placeholder marker | 3 |
| savezi.naziv with placeholder marker | 0 |
| clanovi.ime/prezime with placeholder marker | 6 |
| klubovi.metadata.manual_review = 'true' | 3 |

### klubovi placeholder rows

| id | naziv |
|---|---|
| 2630 | [VERIFY] Odbojkaški Klub Opatija |
| 2619 | [VERIFY] Odbojkaški Klub Čavle |
| 4426 | [UNRESOLVED] empty naziv & grad — id 4426 |

### clanovi placeholder rows (false positives from the loose `%TODO%` check)

```sql
SELECT id, ime, prezime FROM pgz_sport.clanovi
WHERE prezime ILIKE '%unknown%' OR ime ILIKE '%unknown%' OR ...
```

| id | ime | prezime |
|---|---|---|
| 4202 | Aleksa | Todorović |
| 4140 | Aleksa | Todorović |
| 1956 | Filip | Todorović |
| 377 | Dejan | Todorović |
| 3455 | Aleksa | Todorović |
| 551 | Matteo | Todorović |

These six are **false positives**. The trigger is not `%godisnjak_%` or `%unknown%` but `%TODO%`: `ILIKE` is case-insensitive, so the pattern matches the `Todo` at the start of "Todorović". They are real surnames, not placeholders. (Aleksa Todorović also looks like a set of duplicate clanovi rows worth investigating; see Recommendations.)

**Verdict:** placeholder pollution is essentially nil. Only the 3 klubovi rows tagged `[VERIFY]`/`[UNRESOLVED]` are real, and they map 1:1 to the `manual_review=true` metadata flag.

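The false positives come from treating a marker as a case-insensitive substring. The following is a minimal Python sketch of the difference, assuming markers are always written in uppercase and either bracketed or standing as whole words; `loose_match`, `strict_match`, and that convention are assumptions for illustration, not the probe's actual logic.

```python
import re


def loose_match(value: str) -> bool:
    """Mimic ILIKE '%TODO%': a case-insensitive substring test."""
    return "todo" in value.lower()


# Stricter rule (assumed convention): bracketed markers, or TBD/TODO as
# uppercase whole words. The match is case-sensitive on purpose.
STRICT_MARKER = re.compile(r"\[(VERIFY|UNRESOLVED)\]|\b(TBD|TODO)\b")


def strict_match(value: str) -> bool:
    """Return True only for values that carry an explicit marker."""
    return STRICT_MARKER.search(value) is not None


if __name__ == "__main__":
    for value in ["Todorović", "[VERIFY] Odbojkaški Klub Opatija"]:
        print(value, loose_match(value), strict_match(value))
```

With the stricter rule the six `Todorović` rows drop out, while the three bracketed klubovi rows are still caught.
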
## 6. Low-coverage klubovi (filled < 4 of 12)

```sql
WITH cov AS (
  SELECT id, naziv,
    (CASE WHEN naziv IS NOT NULL AND naziv <>'' THEN 1 ELSE 0 END +
     CASE WHEN sport IS NOT NULL AND sport <>'' THEN 1 ELSE 0 END +
     CASE WHEN grad IS NOT NULL AND grad <>'' THEN 1 ELSE 0 END +
     CASE WHEN oib IS NOT NULL AND oib <>'' THEN 1 ELSE 0 END +
     CASE WHEN predsjednik IS NOT NULL AND predsjednik<>'' THEN 1 ELSE 0 END +
     CASE WHEN tajnik IS NOT NULL AND tajnik <>'' THEN 1 ELSE 0 END +
     CASE WHEN email IS NOT NULL AND email <>'' THEN 1 ELSE 0 END +
     CASE WHEN telefon IS NOT NULL AND telefon <>'' THEN 1 ELSE 0 END +
     CASE WHEN COALESCE(web, web_stranica) IS NOT NULL AND COALESCE(web, web_stranica)<>'' THEN 1 ELSE 0 END +
     CASE WHEN COALESCE(sjediste, adresa) IS NOT NULL AND COALESCE(sjediste, adresa)<>'' THEN 1 ELSE 0 END +
     CASE WHEN ciljevi IS NOT NULL AND ciljevi <>'' THEN 1 ELSE 0 END +
     CASE WHEN opis_djelatnosti IS NOT NULL AND opis_djelatnosti<>'' THEN 1 ELSE 0 END
    ) AS filled
  FROM pgz_sport.klubovi
)
SELECT id, naziv, filled FROM cov WHERE filled<4 ORDER BY filled ASC, id ASC LIMIT 20;
```

**Total klubovi with filled < 4 / 12 (i.e. <33%): 518** (≈23% of the 2244 production klubovi).

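The `filled` score is a count of non-empty fields out of 12, with two of the fields accepting either of two columns. The following is a minimal Python sketch of the same rule, assuming each klub row is available as a dict keyed by column name; `filled_count` and `is_low_coverage` are hypothetical helpers that mirror the SQL above rather than any existing script.

```python
# One entry per scored field; tuples with two names mirror the COALESCE
# pairs in the SQL (first column wins when it is not NULL).
COVERAGE_FIELDS = [
    ("naziv",), ("sport",), ("grad",), ("oib",),
    ("predsjednik",), ("tajnik",), ("email",), ("telefon",),
    ("web", "web_stranica"), ("sjediste", "adresa"),
    ("ciljevi",), ("opis_djelatnosti",),
]


def first_non_null(row, columns):
    """Mimic COALESCE: return the first value that is not None."""
    for col in columns:
        if row.get(col) is not None:
            return row[col]
    return None


def filled_count(row) -> int:
    """Count how many of the 12 fields hold a non-empty value."""
    return sum(
        1 for cols in COVERAGE_FIELDS
        if first_non_null(row, cols) not in (None, "")
    )


def is_low_coverage(row, threshold: int = 4) -> bool:
    """True when fewer than `threshold` fields are filled (filled < 4)."""
    return filled_count(row) < threshold


if __name__ == "__main__":
    # The sport value here is made up for the example.
    row = {"naziv": "KK Metal - Jurdani", "sport": "kuglanje"}
    print(filled_count(row), is_low_coverage(row))
```

The sketch also reproduces one property of the SQL that is easy to miss: `COALESCE(web, web_stranica)` returns an empty-string `web` before it ever looks at `web_stranica`, so a row with `web = ''` and a populated `web_stranica` is scored as unfilled.
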
### Sample 20 worst (filled = 1 or 2)

| id | naziv | filled/12 |
|---|---|---:|
| 4249 | Streljački klub DVD svojevrstan vodič za roditelje | 1 |
| 4250 | Streljački klub DVD Opatija | 1 |
| 2290 | KK Metal - Jurdani | 2 |
| 2291 | KK OI KOSTRENA | 2 |
| 2311 | RK LIBURNIJA 8. u II HRL Zapad od 12 | 2 |
| 2312 | RK MORNAR 3. u II HRL Zapad od 10 | 2 |
| 2315 | RK PŠR SELCE 5. u III HRL Zapad od 8 | 2 |
| 2324 | RK ČAVLE 2. u II HRL Zapad od 10 | 2 |
| 2325 | RK ČAVLE 7. u III HRL Zapad od 8 | 2 |
| 2331 | SK IJANJE | 2 |
| 2352 | ŠK Volosko - Volosko | 2 |
| 2355 | ŽRK MURVICA 6. u II HRL Zapad od 12 | 2 |
| 2356 | ŽRK MURVICA 6. u II HRL Zapad od 9 | 2 |
| 2360 | ŽRK ZAMET II 3. u III HRL Zapad od 8 | 2 |
| 3741 | AK Elena Ban | 2 |
| 3744 | AK Koper | 2 |
| 3747 | AK Kvarnera | 2 |
| 3748 | AK Rijeka | 2 |
| 3749 | AK Velenje | 2 |
| 3750 | AK Viškovo | 2 |

**Patterns:**
- `RK <CLUB> N. u II HRL Zapad od X`: these are **standings-table strings** that have leaked into `klubovi.naziv`. They are handball league rankings, not clubs, and should be deleted from `klubovi` (and redirected to `natjecanja_tablice`).
- `Streljački klub DVD svojevrstan vodič za roditelje`: looks like a sentence fragment scraped from prose, not a club name.
- `AK <city>` rows: atletski (athletics) clubs from neighbouring cities (Koper and Velenje are in Slovenia), likely in scope only as competitors, not as PGŽ entities.

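The standings strings follow a regular shape (club, rank, league tier, group, number of teams), so they can be split into fields before the rows are removed from `klubovi`. The following is a minimal Python sketch, assuming the shape seen in the sample above; `parse_standings` is a hypothetical helper, and unlike the unanchored SQL regex in the recommendations it anchors the pattern to the whole string.

```python
import re

# Shape assumed from the sample rows: "<club> <rank>. u <tier> HRL <group> od <teams>"
STANDINGS = re.compile(
    r"^(?P<club>.+?) (?P<rank>\d+)\. u (?P<tier>I{1,3}|IV) HRL "
    r"(?P<group>.+) od (?P<teams>\d+)$"
)


def parse_standings(naziv: str):
    """Split a standings string into its parts, or return None for real names."""
    m = STANDINGS.match(naziv)
    if m is None:
        return None
    return {
        "club": m["club"],
        "rank": int(m["rank"]),
        "tier": m["tier"],
        "group": m["group"],
        "teams": int(m["teams"]),
    }


if __name__ == "__main__":
    print(parse_standings("RK LIBURNIJA 8. u II HRL Zapad od 12"))
    print(parse_standings("KK Metal - Jurdani"))
```

Parsing first keeps the rank and league information available for `natjecanja_tablice` instead of losing it with the delete.
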
## 7. Suspicious clanovi (junk imports)

```sql
SELECT id, ime, prezime FROM pgz_sport.clanovi
WHERE ime ~ '@|^\d+$' LIMIT 20;
```

**Result: 0 rows.** No emails or pure-numeric strings have leaked into the `ime` field (the probe covers `ime` only; `prezime` was not checked). The `clanovi` table appears to have been thoroughly cleaned: the `clanovi_pre_godisnjak_backup` (25,944 rows) and `clanovi_purge_backup_20260429` (1,576 rows) snapshots suggest that heavy deduplication has already happened.

## 8. sys_audit health

```sql
SELECT 'total', COUNT(*)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'today', COUNT(*)::text FROM pgz_sport.sys_audit WHERE created_at::date = CURRENT_DATE
UNION ALL SELECT 'oldest', MIN(created_at)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'newest', MAX(created_at)::text FROM pgz_sport.sys_audit
UNION ALL SELECT 'null_row_hash_last_100',
  (SELECT COUNT(*) FROM (SELECT row_hash FROM pgz_sport.sys_audit ORDER BY id DESC LIMIT 100) t WHERE row_hash IS NULL)::text;
```

| Metric | Value |
|---|---|
| Total rows | 627 |
| Rows today (2026-05-05) | 531 |
| Oldest entry | 2026-04-28 21:39:45 +02 |
| Newest entry | 2026-05-05 08:23:14 +02 |
| NULL row_hash in last 100 | 0 |

**Verdict:** chain integrity looks intact (no NULL `row_hash` in the latest 100 rows; the probe does not recompute the hashes), but the audit log is **only 7 days old**. Either there has been a recent re-init, or auditing was switched on only on 2026-04-28. It is worth confirming with the platform owner that no earlier history was lost. The large spike "today" (531 of 627 rows) reflects today's clean-up activity rather than user traffic.

## Recommended fixes (top 10)

1. **Drop ~30 backup tables (~97k rows).** `clanovi_pre_godisnjak_backup` (25.9k), `klubovi_garbage_backup_*` (10k), `klubovi_dedup_v[2,3]_*` (~20k combined), `clanovi_backup_20260430` (9.5k), and the rest of the `*_backup_*` / `*_premerge_*` / `*_pre_*` / `*_deprecated_*` set. Move them to a `pgz_sport_archive` schema, or `DROP TABLE` them after taking a `pg_dump` snapshot (a `--schema-only` dump keeps only the table definitions, so take a data dump if the rows may be needed again). This frees table and index storage and stops accidental queries against stale data.

2. **Resolve 48 duplicate-OIB clusters in `klubovi`** (~100 rows). Recommended SQL pattern:
   ```sql
   -- For each OIB cluster, pick one row to keep and merge children
   -- (clanovi.klub_id, klub_sezona.klub_id, etc.) onto it.
   -- NOTE: this sketch keeps the lowest id per cluster. To keep the row with
   -- the highest filled-coverage instead, rank by the `filled` score from section 6.
   WITH dups AS (
     SELECT oib, MIN(id) AS keep_id
     FROM pgz_sport.klubovi
     WHERE oib ~ '^[0-9]{11}$'
     GROUP BY oib HAVING COUNT(*) > 1
   ),
   moves AS (
     SELECT k.id AS drop_id, d.keep_id
     FROM pgz_sport.klubovi k
     JOIN dups d USING (oib)
     WHERE k.id <> d.keep_id
   )
   UPDATE pgz_sport.clanovi c SET klub_id = m.keep_id
   FROM moves m WHERE c.klub_id = m.drop_id;
   -- repeat (with the same CTEs) for klub_sezona, hns_klubovi_natjecanje, etc.
   -- then DELETE the drop_ids from klubovi.
   ```
   Run interactively via `/opt/pgz-sport/scripts/dedup_klubovi_by_oib.py` (create if absent) with `--dry-run` first. Leave out the clusters flagged for human review in section 4 (e.g. Vinodol vs Vihor) until they are confirmed.

3. **Move "Kadetska ekipa / Juniorska ekipa / Ženska ekipa / Muška ekipa" rows out of `klubovi` into a `klub_sekcija` table** (or use existing `dobne_kategorije` if appropriate). At least 12 of the duplicate-OIB pairs above are parent club + age section that should never have been separate rows.

4. **Delete the 14 standings-string klubovi rows (`RK ... N. u II HRL Zapad od X`).** These are scoreboard strings that leaked into `klubovi.naziv`. SQL:
   ```sql
   DELETE FROM pgz_sport.klubovi
   WHERE naziv ~ '\d+\. u (I{1,3}|IV) HRL .* od \d+';
   ```
   Verify the count first (`SELECT COUNT(*) ... `).

5. **Resolve the 3 `[VERIFY]`/`[UNRESOLVED]` klubovi** (ids 2619, 2630, 4426). They are already flagged via `metadata->>'manual_review'='true'`; surface them in the `/audit` UI for human triage.

6. **Run `/opt/pgz-sport/scripts/enrichment_worker.py`** against the **518 klubovi with coverage <33%**. Based on the coverage formula in section 6, even partial OIB→RNO enrichment plus a website scrape is estimated to lift average coverage by ~15pp. Suggested batch:
   ```bash
   python3 /opt/pgz-sport/scripts/enrichment_worker.py --filter "filled<4" --limit 100 --concurrency 4
   ```

7. **Deduplicate `Aleksa Todorović` (and similar) in `clanovi`.** ids 3455, 4140, 4202 share the same name; verify whether they also share `oib` / `datum_rodenja` / `klub_id` and merge if so.

8. **Confirm `sys_audit` retention policy.** The oldest entry is 2026-04-28; if longer history is expected, restore it from backup. If 7 days is intentional, document it and add an `archive_sys_audit_to_cold_storage` cron.

9. **Add a partial UNIQUE INDEX on `klubovi.oib` for valid 11-digit OIBs:**
   ```sql
   CREATE UNIQUE INDEX CONCURRENTLY klubovi_oib_unique_valid
   ON pgz_sport.klubovi (oib) WHERE oib ~ '^[0-9]{11}$';
   ```
   This prevents issue (2) from regressing once the data is cleaned. The build fails until issue (2) is resolved, which is intentional. Note that a failed `CONCURRENTLY` build leaves an invalid index behind, which must be dropped before retrying.

10. **Add a CHECK constraint preventing leading/trailing and repeated internal whitespace in `klubovi.naziv` and `clanovi.ime/prezime`** (the duplicate-OIB clusters above contain names like `Odbojkaški Klub Kostrena Kostrena`, apparently with a double space; these should never make it past INSERT):
    ```sql
    -- Fails if existing rows violate the rule, so normalize the data first.
    ALTER TABLE pgz_sport.klubovi
    ADD CONSTRAINT klubovi_naziv_clean
    CHECK (naziv = btrim(regexp_replace(naziv, '\s+', ' ', 'g')));
    -- Analogous constraints are needed for clanovi.ime and clanovi.prezime.
    ```
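Recommendation 2 asks for the row with the highest filled-coverage to survive each merge. The following is a minimal Python sketch of that selection, as it might sit inside a dry-run script; `plan_merges` is a hypothetical helper, the input is assumed to be `(id, oib, filled)` tuples already read from `klubovi`, and the ids in the example are made up.

```python
from collections import defaultdict


def plan_merges(rows):
    """Group rows by OIB and return sorted (drop_id, keep_id) pairs.

    `rows` is an iterable of (id, oib, filled) tuples. Within each cluster
    the row with the highest `filled` score is kept; the lowest id breaks ties.
    """
    clusters = defaultdict(list)
    for row_id, oib, filled in rows:
        clusters[oib].append((row_id, filled))

    moves = []
    for members in clusters.values():
        if len(members) < 2:
            continue  # not a duplicate cluster
        keep_id = min(members, key=lambda m: (-m[1], m[0]))[0]
        moves.extend(
            (row_id, keep_id) for row_id, _ in members if row_id != keep_id
        )
    return sorted(moves)


if __name__ == "__main__":
    # Hypothetical ids and scores, for illustration only.
    rows = [
        (10, "14384540738", 3),
        (11, "14384540738", 7),
        (12, "14384540738", 7),
        (20, "75947125821", 5),
    ]
    for drop_id, keep_id in plan_merges(rows):
        print(f"would move children of {drop_id} onto {keep_id}")
```

Printing the plan before any `UPDATE` runs matches the `--dry-run` step suggested above and makes it easy to exclude clusters that are still under human review.
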