- data/sport_federations.json: 24 Croatian sport federations + aliases +
PGŽ local media (Novi list, Glas Istre, Rijeka.danas).
- enrich_router._sport_fed/_normalize_sport/_load_sport_feds: cached
loader that re-reads the file whenever its mtime changes.
- _research_links() is now sport-aware: when row.sport maps to a known
federation, the dynamic links list shows that federation (national + PGŽ
regional) plus the three PGŽ local-media search URLs instead of the static
HNS Semafor + Transfermarkt fallback.
- scrape_sport_federation(sport, ime, prezime): generic profile-page
scraper (slug pattern OR search-results crawl) → returns
{profile_url, slika_url, datum_rodenja, mjesto_rodenja, klub_naziv}.
- _propose_for_sportas() now routes through the federation scraper before
HNS Semafor; the HNS path runs only for nogomet (football) rows or rows
that already have an HNS link.
- _load_row(sportas) JOINs klubovi to fall back to klub.sport when
c.sport is empty.
- Tested on 1024 Marijan Alkić (boćanje): proposed profile_url +
datum_rodenja from hrvatski-bocarski-savez.hr; /apply persisted them.
- Tested on 3335 Toni Jelenković (košarka) and 3379 Niko Miknić
(plivanje): research_links surface HKS/KS PGŽ and HPS respectively.
Worker:
- _pick_sportas now selects rows with coverage < 70 across ALL sports (any
row with sport set OR a known external linkage), not just hns_*.
- _SOURCE_WEIGHTS extended with 16 federation hosts at 0.88-0.92.
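The _pick_sportas selection rule can be illustrated against an in-memory SQLite table. This sketch assumes a simplified, hypothetical clanovi schema with four coverage keys and uses the filled_keys * 100 / total_keys coverage formula; the real query runs against the production database, and its column list and ordering are not shown in these notes.

```python
import sqlite3

# Simplified, hypothetical schema and coverage keys for illustration only.
COVERAGE_KEYS = ("profile_url", "slika_url", "datum_rodenja", "biografija")

# Each filled key contributes 1; coverage = filled_keys * 100 / total_keys
# (integer division in SQLite, so 3 of 4 keys -> 75).
_FILLED = " + ".join(f"(COALESCE({key}, '') <> '')" for key in COVERAGE_KEYS)

# Ordering by lowest coverage first is an assumption.
PICK_SQL = f"""
SELECT id, coverage FROM (
    SELECT id, sport, vanjski_id,
           ({_FILLED}) * 100 / {len(COVERAGE_KEYS)} AS coverage
    FROM clanovi
)
WHERE coverage < ?
  AND (COALESCE(sport, '') <> '' OR vanjski_id IS NOT NULL)
ORDER BY coverage, id
"""


def pick_sportas(conn: sqlite3.Connection, max_coverage: int = 70) -> list:
    """Under-covered rows of any sport that have a sport set or an external id."""
    return conn.execute(PICK_SQL, (max_coverage,)).fetchall()
```

A row with no sport and no external linkage is skipped even at 0% coverage, because the worker would have nothing to search on.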
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Coverage is computed in SQL (filled_keys * 100 / total_keys); only rows
below the threshold (default 70%, overridable via ENRICHER_COVERAGE_MAX) are
queued.
- Per-row confidence is the max of the source weights (semafor.hns.family=0.95,
wikipedia.hr=0.80, sport-pgz.hr=0.55) plus a small evidence-count bonus.
Below the confidence threshold (default 0.70, overridable via
ENRICHER_CONFIDENCE), only 'hard' structured fields (profile_url, source_url,
slika_url, hns_igrac_id) are applied, never an LLM-synthesised biografija.
- Logs are now mirrored to /var/log/pgz-sport-enricher.log alongside the
project log, so 'tail /var/log/pgz-sport-enricher.log' works as the brief asks.
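A sketch of the confidence gate, using the three weights quoted above. The per-source evidence bonus (0.02, capped at 0.05) and the 0.0 weight for unknown hosts are hypothetical values chosen for illustration, not the worker's actual constants.

```python
import os
from typing import Iterable, Optional

# Weights quoted in the commit notes; unknown hosts get 0.0 (an assumption).
SOURCE_WEIGHTS = {
    "semafor.hns.family": 0.95,
    "wikipedia.hr": 0.80,
    "sport-pgz.hr": 0.55,
}
# Structured fields that may be written even when confidence is low.
HARD_FIELDS = frozenset({"profile_url", "source_url", "slika_url", "hns_igrac_id"})
EVIDENCE_BONUS = 0.02      # hypothetical bonus per extra corroborating source
EVIDENCE_BONUS_CAP = 0.05  # hypothetical cap on the total bonus


def row_confidence(hosts: Iterable[str]) -> float:
    """Max source weight plus a small, capped bonus for extra evidence."""
    weights = [SOURCE_WEIGHTS.get(host, 0.0) for host in hosts]
    if not weights:
        return 0.0
    bonus = min(EVIDENCE_BONUS * (len(weights) - 1), EVIDENCE_BONUS_CAP)
    return min(max(weights) + bonus, 1.0)


def fields_to_apply(proposed: dict, hosts: Iterable[str],
                    threshold: Optional[float] = None) -> dict:
    """Apply everything at or above the threshold; below it, keep only HARD_FIELDS."""
    if threshold is None:
        threshold = float(os.environ.get("ENRICHER_CONFIDENCE", "0.70"))
    if row_confidence(hosts) >= threshold:
        return dict(proposed)
    return {key: value for key, value in proposed.items() if key in HARD_FIELDS}
```

Using the max weight rather than an average means one strong source (HNS Semafor) is enough on its own, while weak sources can only nudge the score upward through the capped bonus.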
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bare @app.get/@app.post('/api/admin/users') decorators in pgz_sport_api.py
were registered before app.include_router(admin_users_router), so they
shadowed the JWT-protected M2 routes and leaked the user list to anyone.
Removed all three: GET /api/admin/users, POST /api/admin/users,
POST /api/admin/users/{uid}/toggle. The auth.admin_users router now owns
this prefix exclusively and gates every method with require_user.
Verified: no-auth → 401, invalid token → 401, valid Bearer → 200.
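Why registration order mattered can be shown with a toy first-match router, modeled on the way Starlette (and therefore FastAPI) tries routes in the order they were added. The token check below is a hypothetical stand-in for the real JWT validation in require_user; it only mimics the 401/200 outcomes listed above.

```python
from typing import Callable

# (status_code, body) returned by every handler in this toy model.
Response = tuple[int, str]
Handler = Callable[[dict], Response]


class FirstMatchRouter:
    """Toy router: routes are tried in registration order and the first
    method+path match handles the request, roughly as Starlette does."""

    def __init__(self) -> None:
        self.routes: list[tuple[str, str, Handler]] = []

    def add(self, method: str, path: str, handler: Handler) -> None:
        self.routes.append((method, path, handler))

    def dispatch(self, method: str, path: str, headers: dict) -> Response:
        for route_method, route_path, handler in self.routes:
            if route_method == method and route_path == path:
                return handler(headers)
        return 404, "not found"


VALID_TOKEN = "good-token"  # hypothetical stand-in for real JWT verification


def require_user(handler: Handler) -> Handler:
    """Wrap a handler so it answers 401 unless a valid Bearer token is sent."""
    def guarded(headers: dict) -> Response:
        auth = headers.get("Authorization", "")
        if not auth.startswith("Bearer ") or auth[len("Bearer "):] != VALID_TOKEN:
            return 401, "unauthorized"
        return handler(headers)
    return guarded


def list_users(headers: dict) -> Response:
    return 200, "user list"


def build_app(with_bare_route: bool) -> FirstMatchRouter:
    app = FirstMatchRouter()
    if with_bare_route:
        # Registered first, so it shadows the protected route below.
        app.add("GET", "/api/admin/users", list_users)
    # Stand-in for the routes added by include_router(admin_users_router).
    app.add("GET", "/api/admin/users", require_user(list_users))
    return app
```

With `with_bare_route=True` an unauthenticated GET returns the user list; with the bare route removed, the same request is answered by the guarded handler and gets 401.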
Critical bug fix: /v2/enrich/sportas/{id} returned proposed:{} for athletes
because the v3 pipeline still relied solely on Wikipedia evidence and never
actually fetched semafor.hns.family/igraci/.
- enrich_router._propose_for_sportas now:
• Resolves an HNS Semafor URL from profile_url, source_url, hns_igrac_id,
vanjski_id JSONB ('hns_comet'+'hns_slug'), or source='hns_semafor'+source_id.
• Fetches and parses the player page (BS4, regex fallback) and proposes
profile_url, source_url, slika_url, hns_igrac_id, datum_rodenja,
mjesto_rodenja, broj_dresa, biografija (DeepSeek synthesis from HNS+Wiki).
- _load_row(sportas) widened to read every relevant column + vanjski_id.
- _TABLE_MAP['sportas'] writeback whitelist expanded to 12 fields.
- workers/enrichment_worker.py: 24/7 daemon that picks under-enriched
clanovi/klubovi/savezi rows via SQL every 5 min and calls /apply for each.
- systemd unit pgz-sport-enricher.service installed + enabled.
- Tested end-to-end: id=2222 (Abdija) and id=449 (Zec) now have
profile_url, slika_url, hns_igrac_id, biografija persisted.
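The URL-resolution order described for _propose_for_sportas can be sketched as a plain function over a row dict. The URL shape built from ids (HNS_BASE plus id and optional slug) is a hypothetical assumption, as is vanjski_id arriving already decoded into a dict; only the order of the fallbacks comes from the notes above.

```python
from typing import Any, Optional

# Hypothetical URL shape; the real Semafor player path format may differ.
HNS_BASE = "https://semafor.hns.family/igraci/"


def _is_hns_url(url: Optional[str]) -> bool:
    return bool(url) and "semafor.hns.family/igraci/" in url


def resolve_hns_url(row: dict[str, Any]) -> Optional[str]:
    """Return the first HNS Semafor URL derivable from the row, else None."""
    # 1-2. An explicit Semafor link already stored on the row wins.
    for col in ("profile_url", "source_url"):
        if _is_hns_url(row.get(col)):
            return row[col]
    # 3. A bare HNS player id.
    if row.get("hns_igrac_id"):
        return f"{HNS_BASE}{row['hns_igrac_id']}/"
    # 4. vanjski_id JSONB carrying both a COMET id and a slug.
    ext = row.get("vanjski_id") or {}
    if ext.get("hns_comet") and ext.get("hns_slug"):
        return f"{HNS_BASE}{ext['hns_comet']}/{ext['hns_slug']}/"
    # 5. Rows imported from Semafor keep their id in source_id.
    if row.get("source") == "hns_semafor" and row.get("source_id"):
        return f"{HNS_BASE}{row['source_id']}/"
    return None
```

A row that matches none of the five sources (for example a swimmer with only a sport set) yields None, which is when the federation scraper path is the only option.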
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>