CC1 R6 — coverage report + 2 more klubovi fixed
Coverage report (`/opt/pgz-sport/data_quality_report.md`):
- 5952 entities measured (savezi 246, klubovi 2244, sportasi 3243, objekti 106, manifestacije 113)
- Weighted mean coverage 52.1%
- Per-type stats: manif 81.9% > objekt 79.7% > savez 59.8% > klub 57.1% > sportas 46.2%
- Distribution histogram per type
- TOP 50 entities for manual review (lowest coverage with non-empty name) with portal links

Mreža (network graph) verification (Playwright headless):
- pgz-savez-nogometni anchor injected, label "Nogometni savez PGŽ", color #F4C430, size 40
- 6 anchor edges to top-3 persons + top-3 entities
- 90 nodes / 186 edges total after augmentation
- "🎯 Centar (PGŽ)" button visible
- centerMrezaOnAnchor() fires 1.5s after render

Cleanup v2 (`scripts/r6_cleanup_v2.sql`):
- 2636 [VERIFY] → Odbojkaški Klub "Odbojkaška Akademija Petica" (civic#114850)
- 2641 [VERIFY] → Ženski Odbojkaški Klub "Crikvenica" (civic#78781)
- 12 of 14 originals now confirmed; 2 still need manual review ([VERIFY] 2619 Vrh Čavje 31, 2630 1. Istarske čete 3; no civic.entities row at those addresses)

sport-pgz.hr scrape: site is a Vite SPA with no public JSON club listing endpoint; individual club slugs return 404. Best authoritative source remains civic.entities.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
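
The 52.1% headline figure can be cross-checked from the per-type counts and percentages quoted in the message. This is a minimal sketch, assuming "weighted" means weighted by entity count; the report generator itself is not shown in this commit, and the function name `weighted_mean_coverage` is hypothetical.

```python
# Per-type entity counts and coverage percentages, copied from the commit message.
counts = {"savez": 246, "klub": 2244, "sportas": 3243, "objekt": 106, "manif": 113}
coverage = {"savez": 59.8, "klub": 57.1, "sportas": 46.2, "objekt": 79.7, "manif": 81.9}


def weighted_mean_coverage(counts, coverage):
    """Mean coverage in percent, weighting each type by its entity count (assumed)."""
    total = sum(counts.values())
    return sum(counts[t] * coverage[t] for t in counts) / total


total_entities = sum(counts.values())
mean_pct = weighted_mean_coverage(counts, coverage)
print(total_entities, round(mean_pct, 1))  # 5952 52.1
```

Both the entity total and the rounded mean agree with the reported numbers under that assumption.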
@@ -0,0 +1,39 @@
-- R6 cleanup v2 — additional klubovi fixed using broader civic.entities search
-- Author: dradulic@outlook.com / damir@rinet.one
-- Date: 2026-05-05
-- Run after: scripts/cleanup_garbage_clubs.py
--
-- Identifies clubs the original Round-4 cleanup left as [VERIFY], then fixes
-- those that match civic.entities by relaxed (non-odbojka) address search.

BEGIN;

-- 2636: Sv. Križ 24, Rijeka (civic.entity 114850 odbojkaška akademija Petica)
UPDATE pgz_sport.klubovi
   SET naziv = 'Odbojkaški Klub "Odbojkaška Akademija Petica"',
       oib = COALESCE(NULLIF(oib,''), '40538276343'),
       metadata = COALESCE(metadata,'{}'::jsonb)
                  || jsonb_build_object(
                       'cleanup_at_v2', now()::text,
                       'cleanup_source_v2', 'civic.entities#114850 (broader address match)',
                       'manual_review', false)
 WHERE id = 2636;

-- 2641: Kotorska 15a, Crikvenica (civic.entity 78781 ŽOK Crikvenica)
UPDATE pgz_sport.klubovi
   SET naziv = 'Ženski Odbojkaški Klub "Crikvenica"',
       oib = COALESCE(NULLIF(oib,''), '17195966673'),
       metadata = COALESCE(metadata,'{}'::jsonb)
                  || jsonb_build_object(
                       'cleanup_at_v2', now()::text,
                       'cleanup_source_v2', 'civic.entities#78781 (filtered odbojka at Kotorska 15a)',
                       'manual_review', false)
 WHERE id = 2641;

COMMIT;

-- Status check
SELECT id, naziv, metadata->>'manual_review' AS needs_review
  FROM pgz_sport.klubovi
 WHERE id IN (2613,2616,2618,2619,2622,2624,2626,2630,2632,2634,2636,2638,2641,2643)
 ORDER BY id;
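
The two OIB values written by this script can be sanity-checked before it is run. The sketch below assumes the standard OIB control digit scheme (ISO 7064, MOD 11,10); `oib_is_valid` is a hypothetical helper and not part of this repository.

```python
def oib_is_valid(oib: str) -> bool:
    """Return True if `oib` is 11 ASCII digits with a correct MOD 11,10 control digit."""
    if len(oib) != 11 or not (oib.isascii() and oib.isdigit()):
        return False
    remainder = 10
    for ch in oib[:10]:
        remainder = (remainder + int(ch)) % 10
        if remainder == 0:
            remainder = 10
        remainder = (remainder * 2) % 11
    # The control digit is 11 - remainder, with 10 mapped to 0.
    control = (11 - remainder) % 10
    return control == int(oib[10])


# OIBs taken from the two UPDATE statements in scripts/r6_cleanup_v2.sql.
for oib in ("40538276343", "17195966673"):
    print(oib, oib_is_valid(oib))
```

A passing checksum only shows that the number is well formed; it does not confirm that the OIB belongs to the club in question, which still depends on the civic.entities match.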