73163de39cc0c5a89990d27f9ee7da0df3802c98
Backup: pgz_sport.klubovi_backup_20260505 (2244 rows snapshot before changes).
Issues fixed (18 of 23 detected):
1. Address-in-naziv (14 volleyball clubs):
- 10 auto-fixed by joining civic.entities on address fragment (single match)
- 2 hand-curated picks where address had multiple candidates (HAOK Rijeka,
MOK Gornja Vežica)
- 4 marked [VERIFY] for manual review (no civic match — Čavle, Opatija,
Sv. Križ Rijeka, Crikvenica)
2. naziv = grad (8 bocce clubs): heuristic prepended "Boćarski klub "
(sport=boćanje + source url=hrvatski-bocarski-savez.hr confirms pattern).
3. Empty naziv (1 club, id 4426): marked [UNRESOLVED] with manual_review=true.
4. Athletes with email/phone in ime/prezime: 0 found (schema clean).
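A minimal sketch of the two rules from items 1 and 2, for reviewers who want
the logic without opening the script. The function names, the case-insensitive
comparison, and the list-of-candidates shape are assumptions made for
illustration; the actual code in scripts/cleanup_garbage_clubs.py may differ.

```python
from typing import List, Optional

# Prefix taken from the commit message; everything else here is hypothetical.
BOCCE_PREFIX = "Boćarski klub "


def fix_naziv_equals_grad(naziv: str, grad: str, sport: str) -> str:
    """Prepend the club-type prefix when a bocce club's naziv is just its town."""
    same_as_town = naziv.strip().casefold() == grad.strip().casefold()
    if sport == "boćanje" and same_as_town:
        return BOCCE_PREFIX + naziv.strip()
    # Anything else is left untouched, so a second run changes nothing.
    return naziv


def resolve_by_address(candidates: List[str]) -> Optional[str]:
    """Accept a civic entity name only when exactly one candidate matched.

    Zero or multiple matches return None, leaving the row for hand-curation
    or a [VERIFY] marker.
    """
    return candidates[0] if len(candidates) == 1 else None
```

Because the prefixed name no longer equals grad, re-applying
fix_naziv_equals_grad to its own output is a no-op, which is one simple way to
keep such a rule idempotent.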
All updates write metadata.cleanup_at / cleanup_reason / cleanup_source for audit
trail. Rollback path documented in data_cleanup_report.md.
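The audit stamp could be built along these lines. This is a sketch only: the
three key names come from this message, while the helper name, the ISO-8601
UTC timestamp format, and the skip-if-already-stamped behaviour are
assumptions rather than a description of the script.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def with_cleanup_audit(
    metadata: Optional[Dict[str, Any]],
    reason: str,
    source: str,
    now: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Return a copy of metadata carrying the three cleanup_* audit keys."""
    merged = dict(metadata or {})
    if "cleanup_at" in merged:
        # Assumed behaviour: an existing stamp is kept so re-runs do not
        # overwrite the original audit trail.
        return merged
    stamp = now or datetime.now(timezone.utc)
    merged.update(
        cleanup_at=stamp.isoformat(),
        cleanup_reason=reason,
        cleanup_source=source,
    )
    return merged
```

The input dict is never mutated, so the pre-change metadata stays available
for comparison against the backup table.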
Files added:
scripts/cleanup_garbage_clubs.py (idempotent, env-driven DSN)
data_cleanup_report.md (per-row table + manual review queue)
data_cleanup_run.json (raw script output)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
PGZ Sport Intelligence Platform