Commit Graph

Author SHA1 Message Date
claude-cc1 73163de39c CC1 R4-DC — data cleanup pass on pgz_sport.klubovi
Backup: pgz_sport.klubovi_backup_20260505 (2244-row snapshot taken before the changes).
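A minimal sketch of the snapshot-then-verify step, assuming the backup is a plain CREATE TABLE ... AS SELECT copy followed by a row-count check. It runs on in-memory SQLite so it is self-contained; on PostgreSQL the tables would live in the pgz_sport schema. The helper name snapshot_table and the toy rows are hypothetical.

```python
import sqlite3


def snapshot_table(conn: sqlite3.Connection, source: str, backup: str) -> int:
    """Copy every row of `source` into a new table `backup` and return the row count.

    Hypothetical helper. Table names are interpolated into SQL, so they must come
    from trusted code, never from user input. If `backup` already exists, CREATE
    TABLE fails, which protects an earlier snapshot from being overwritten.
    """
    conn.execute(f"CREATE TABLE {backup} AS SELECT * FROM {source}")
    (copied,) = conn.execute(f"SELECT COUNT(*) FROM {backup}").fetchone()
    (original,) = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()
    if copied != original:
        raise RuntimeError(f"backup has {copied} rows, source has {original}")
    return copied


# Toy data standing in for pgz_sport.klubovi.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE klubovi (id INTEGER PRIMARY KEY, naziv TEXT)")
conn.executemany("INSERT INTO klubovi (naziv) VALUES (?)", [("HAOK Rijeka",), ("Opatija",)])
rows = snapshot_table(conn, "klubovi", "klubovi_backup_20260505")
print(rows)  # prints 2
```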

Issues fixed (18 of 23 detected):

1. Address embedded in naziv (14 volleyball clubs):
   - 10 fixed by joining civic.entities on the address fragment: 8 auto-fixed
     (single match) plus 2 hand-curated picks where the address matched
     multiple candidates (HAOK Rijeka, MOK Gornja Vežica)
   - 4 marked [VERIFY] for manual review (no civic match: Čavle, Opatija,
     Sv. Križ Rijeka, Crikvenica)

2. naziv = grad, i.e. the club name is just the city (8 bocce clubs): a
   heuristic prepended "Boćarski klub " (sport=boćanje plus a source url on
   hrvatski-bocarski-savez.hr confirm the pattern).

3. Empty naziv (1 club, id 4426): marked [UNRESOLVED] with manual_review=true.

4. Athletes (sportaši) with an email address or phone number in ime/prezime:
   0 found (data already clean).
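The four checks above can be sketched as small functions over a club row. This is a minimal illustration under assumptions, not the code in scripts/cleanup_garbage_clubs.py: the Club row shape, every function name and both regexes are hypothetical, and the civic.entities join is replaced by a list of candidate matches supplied by the caller.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

BOCCE_PREFIX = "Boćarski klub "
BOCCE_DOMAIN = "hrvatski-bocarski-savez.hr"

# Issue 4: rough, non-exhaustive patterns for contact data in ime/prezime (assumed).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s/-]{6,}\d")


@dataclass
class Club:
    """Hypothetical row shape; the real pgz_sport.klubovi columns may differ."""
    id: int
    naziv: str
    grad: str = ""
    sport: str = ""
    source_url: str = ""
    manual_review: bool = False


def classify_address_match(candidates: list) -> str:
    """Issue 1: choose an action from the civic.entities matches for an address fragment."""
    if len(candidates) == 1:
        return "auto_fix"     # exactly one match: safe to apply automatically
    if len(candidates) > 1:
        return "hand_curate"  # several matches: a human picks the right entity
    return "verify"           # no match: tag [VERIFY] for manual review


def fix_city_as_name(club: Club) -> bool:
    """Issue 2: prepend the bocce prefix when naziv is only the city name."""
    host = urlparse(club.source_url).hostname or ""
    from_federation = host == BOCCE_DOMAIN or host.endswith("." + BOCCE_DOMAIN)
    # After one fix naziv no longer equals grad, so re-running is a no-op.
    if club.sport == "boćanje" and from_federation and club.naziv == club.grad:
        club.naziv = BOCCE_PREFIX + club.naziv
        return True
    return False


def flag_empty_name(club: Club) -> bool:
    """Issue 3: an empty naziv cannot be repaired, so queue it for a human."""
    if club.naziv.strip():
        return False
    club.naziv = "[UNRESOLVED]"  # assumption: the marker is written into naziv itself
    club.manual_review = True
    return True


def has_contact_data(ime: str, prezime: str) -> bool:
    """Issue 4: True if either name field contains an email address or phone number."""
    text = f"{ime} {prezime}"
    return bool(EMAIL_RE.search(text) or PHONE_RE.search(text))
```

For example, a bocce club whose naziv and grad are both "Kastav" becomes "Boćarski klub Kastav" on the first run and is left unchanged on the second, which is what makes this heuristic safe to re-apply.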

Every update writes metadata.cleanup_at, cleanup_reason and cleanup_source as an
audit trail. The rollback path is documented in data_cleanup_report.md.
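A minimal sketch of how the audit keys could be attached without clobbering other metadata, assuming metadata is a PostgreSQL jsonb column. In PostgreSQL, jsonb || jsonb merges two objects and the right-hand value wins on duplicate keys; merge_metadata mirrors that in Python. UPDATE_SQL, audit_patch, merge_metadata and the example reason/source strings are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Assumed: metadata is jsonb and the script uses psycopg-style %s placeholders.
# COALESCE handles rows whose metadata is still NULL.
UPDATE_SQL = """
UPDATE pgz_sport.klubovi
SET naziv = %s,
    metadata = COALESCE(metadata, '{}'::jsonb) || %s::jsonb
WHERE id = %s
"""


def audit_patch(reason: str, source: str, now: Optional[datetime] = None) -> dict:
    """Build the three audit keys written alongside every cleanup update."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return {"cleanup_at": ts, "cleanup_reason": reason, "cleanup_source": source}


def merge_metadata(metadata: Optional[dict], patch: dict) -> dict:
    """Python mirror of COALESCE(metadata, '{}') || patch; the input is not mutated."""
    return {**(metadata or {}), **patch}


# Hypothetical values for illustration only.
patch = audit_patch("naziv_is_city", BOCCE_SOURCE := "hrvatski-bocarski-savez.hr",
                    now=datetime(2026, 5, 5, tzinfo=timezone.utc))
merged = merge_metadata({"imported_from": "scraper"}, patch)
print(json.dumps(merged, ensure_ascii=False))
```

With a psycopg cursor, the patch would be passed as json.dumps(patch) so that the ::jsonb cast in UPDATE_SQL can parse it; keys that predate the cleanup are preserved.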

Files added:
  scripts/cleanup_garbage_clubs.py  (idempotent, env-driven DSN)
  data_cleanup_report.md            (per-row table + manual review queue)
  data_cleanup_run.json             (raw script output)
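"Env-driven DSN" means the connection string comes from the environment rather than from the repository. A minimal sketch of that pattern, assuming a variable named PGZ_SPORT_DSN (hypothetical; the commit does not name it) and failing fast when it is missing:

```python
import os
from typing import Mapping

# Hypothetical variable name; the commit only says the DSN is read from the environment.
DSN_ENV_VAR = "PGZ_SPORT_DSN"


def get_dsn(env: Mapping[str, str] = os.environ) -> str:
    """Return the PostgreSQL connection string, exiting early if it is unset or blank."""
    dsn = env.get(DSN_ENV_VAR, "").strip()
    if not dsn:
        raise SystemExit(f"{DSN_ENV_VAR} is not set; export a PostgreSQL DSN first")
    return dsn
```

The returned string can be handed to psycopg.connect() or psycopg2.connect(). Keeping credentials out of the source tree lets the same script run against a staging copy and production without edits.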

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:29:27 +02:00