faf6beb536c977c0d674afb610d8bd9a5b6da9d2
- data/sport_federations.json: 24 Croatian sport federations + aliases +
PGŽ local media (Novi list, Glas Istre, Rijeka.danas).
- enrich_router._sport_fed/_normalize_sport/_load_sport_feds: cached
loader that re-reads the JSON file whenever its mtime changes.
- _research_links() is now sport-aware: when row.sport maps to a known
federation, the dynamic links list shows that federation (national + PGŽ
regional) plus the three PGŽ local-media search URLs, replacing the
static HNS Semafor + transfermarkt fallback.
- scrape_sport_federation(sport, ime, prezime): generic profile-page
scraper (slug pattern OR search-results crawl) → returns
{profile_url, slika_url, datum_rodenja, mjesto_rodenja, klub_naziv}.
- _propose_for_sportas() now routes through the federation scraper before
HNS Semafor; HNS path is gated to nogomet or rows already linked.
- _load_row(sportas) JOINs klubovi to fall back to klub.sport when
c.sport is empty.
- Tested on 1024 Marijan Alkić (boćanje): proposed profile_url +
datum_rodenja from hrvatski-bocarski-savez.hr; /apply persisted them.
- Tested on 3335 Toni Jelenković (košarka) and 3379 Niko Miknić
(plivanje): research_links surface HKS/KS PGŽ and HPS respectively.
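
A minimal sketch of the mtime-based cached loader idea, for illustration
only. The function name load_json_cached and the module-level _cache dict
are hypothetical; the actual _load_sport_feds may be structured
differently.

```python
import json
import os

# Hypothetical cache: path -> (mtime in ns, parsed data).
_cache = {}


def load_json_cached(path):
    """Return the parsed JSON at path, re-reading only when mtime changes."""
    mtime_ns = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime_ns:
        # File unchanged since the last read: reuse the parsed object.
        return cached[1]
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    _cache[path] = (mtime_ns, data)
    return data
```

The trade-off is one os.stat() call per lookup in exchange for picking up
edits to the data file without a process restart.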
Worker:
- _pick_sportas now selects on coverage<70 across ALL sports (sport
set OR known external linkage), not just hns_*.
- _SOURCE_WEIGHTS extended with 16 federation hosts at 0.88-0.92.
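
The widened selection rule can be sketched as a predicate, assuming each
row is a dict. The key names "coverage", "sport" and "profile_url", and
the helper name needs_enrichment, are assumptions for illustration; the
real _pick_sportas most likely expresses this in SQL.

```python
def needs_enrichment(row, threshold=70):
    """True when coverage is below threshold and the row is researchable."""
    coverage = row.get("coverage") or 0
    # A sport is "set" when the field is a non-blank string.
    has_sport = bool((row.get("sport") or "").strip())
    # Assumed notion of "known external linkage": any populated hns_*
    # field or a stored profile_url.
    has_link = any(
        value
        for key, value in row.items()
        if key.startswith("hns_") or key == "profile_url"
    )
    return coverage < threshold and (has_sport or has_link)
```

For example, needs_enrichment({"coverage": 40, "sport": "boćanje"}) is
True, while a row with coverage 85 is skipped regardless of sport.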
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Description
PGZ Sport Intelligence Platform