From 7251d27c214052f4e2b706dfa07fe975aa1ac911 Mon Sep 17 00:00:00 2001
From: claude-cc1
Date: Tue, 5 May 2026 01:46:39 +0200
Subject: [PATCH] =?UTF-8?q?CC1=20R6=20=E2=80=94=20coverage=20report=20+=20?=
 =?UTF-8?q?2=20more=20klubovi=20fixed?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Coverage report (`/opt/pgz-sport/data_quality_report.md`):
- 5952 entities measured (savezi 246, klubovi 2244, sportasi 3243,
  objekti 106, manifestacije 113)
- Weighted mean coverage 52.1%
- Per-type stats: manif 81.9% > objekt 79.7% > savez 59.8% > klub 57.1%
  > sportas 46.2%
- Distribution histogram per type
- TOP 50 entities for manual review (lowest coverage with non-empty
  name), with portal links

Mreža (network graph) verification (Playwright headless):
- pgz-savez-nogometni anchor injected, label "Nogometni savez PGŽ",
  color #F4C430, size 40
- 6 anchor edges to top-3 persons + top-3 entities
- 90 nodes / 186 edges total after augmentation
- "🎯 Centar (PGŽ)" button visible
- centerMrezaOnAnchor() fires 1.5s after render

Cleanup v2 (`scripts/r6_cleanup_v2.sql`):
- 2636 [VERIFY] → Odbojkaški Klub "Odbojkaška Akademija Petica"
  (civic#114850)
- 2641 [VERIFY] → Ženski Odbojkaški Klub "Crikvenica" (civic#78781)
- 12 of 14 originals now confirmed; 2 still need manual review:
  [VERIFY] 2619 (Vrh Čavje 31) and 2630 (1. Istarske čete 3) have no
  civic.entities row at those addresses

sport-pgz.hr scrape: the site is a Vite SPA with no public JSON club
listing endpoint, and individual club slugs return 404. The best
authoritative source remains civic.entities.

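The coverage figures in this message can be cross-checked with a short
sketch. It assumes that "weighted mean" means the average of the per-type
coverage percentages weighted by entity count; the report's actual formula
is not shown in this commit, and the names `STATS` and
`weighted_mean_coverage` are hypothetical.

```python
# Entity counts and per-type coverage percentages, copied from the
# coverage summary in this commit message.
STATS = {
    "savez": (246, 59.8),
    "klub": (2244, 57.1),
    "sportas": (3243, 46.2),
    "objekt": (106, 79.7),
    "manif": (113, 81.9),
}


def weighted_mean_coverage(stats):
    """Return (total entity count, count-weighted mean coverage in percent)."""
    total = sum(count for count, _ in stats.values())
    weighted = sum(count * pct for count, pct in stats.values())
    return total, weighted / total


total, mean = weighted_mean_coverage(STATS)
print(total, round(mean, 1))  # prints: 5952 52.1
```

Under that assumption the result matches the reported 5952 entities and
52.1% weighted mean.
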
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 scripts/r6_cleanup_v2.sql | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
 create mode 100644 scripts/r6_cleanup_v2.sql

diff --git a/scripts/r6_cleanup_v2.sql b/scripts/r6_cleanup_v2.sql
new file mode 100644
index 0000000..445a039
--- /dev/null
+++ b/scripts/r6_cleanup_v2.sql
@@ -0,0 +1,39 @@
+-- R6 cleanup v2 — additional klubovi fixed using broader civic.entities search
+-- Author: dradulic@outlook.com / damir@rinet.one
+-- Date: 2026-05-05
+-- Run after: scripts/cleanup_garbage_clubs.py
+--
+-- Identifies clubs the original Round-4 cleanup left as [VERIFY], then fixes
+-- those that match civic.entities by relaxed (non-odbojka) address search.
+
+BEGIN;
+
+-- 2636: Sv. Križ 24, Rijeka (civic.entity 114850 odbojkaška akademija Petica)
+UPDATE pgz_sport.klubovi
+SET naziv = 'Odbojkaški Klub "Odbojkaška Akademija Petica"',
+    oib = COALESCE(NULLIF(oib,''), '40538276343'),
+    metadata = COALESCE(metadata,'{}'::jsonb)
+               || jsonb_build_object(
+                    'cleanup_at_v2', now()::text,
+                    'cleanup_source_v2', 'civic.entities#114850 (broader address match)',
+                    'manual_review', false)
+WHERE id = 2636;
+
+-- 2641: Kotorska 15a, Crikvenica (civic.entity 78781 ŽOK Crikvenica)
+UPDATE pgz_sport.klubovi
+SET naziv = 'Ženski Odbojkaški Klub "Crikvenica"',
+    oib = COALESCE(NULLIF(oib,''), '17195966673'),
+    metadata = COALESCE(metadata,'{}'::jsonb)
+               || jsonb_build_object(
+                    'cleanup_at_v2', now()::text,
+                    'cleanup_source_v2', 'civic.entities#78781 (filtered odbojka at Kotorska 15a)',
+                    'manual_review', false)
+WHERE id = 2641;
+
+COMMIT;
+
+-- Status check
+SELECT id, naziv, metadata->>'manual_review' AS needs_review
+FROM pgz_sport.klubovi
+WHERE id IN (2613,2616,2618,2619,2622,2624,2626,2630,2632,2634,2636,2638,2641,2643)
+ORDER BY id;