5 Commits

Author SHA1 Message Date
CC6 Worker faf6beb536 M12.6 SF: sport-aware enrichment + federation map (HBS, HKS, HRS, HOS, HVS, HPS, HBS bocanje…)
- data/sport_federations.json: 24 Croatian sport federations + aliases +
  PGŽ local media (Novi list, Glas Istre, Rijeka.danas).
- enrich_router._sport_fed/_normalize_sport/_load_sport_feds: cached
  loader that picks up file changes via mtime.
- _research_links() now sport-aware: when row.sport maps to a known fed,
  the dynamic links list shows that fed (national + PGŽ regional) plus the
  three PGŽ local-media search URLs in place of the static HNS Semafor +
  transfermarkt fallback.
- scrape_sport_federation(sport, ime, prezime): generic profile-page
  scraper (slug pattern OR search-results crawl) → returns
  {profile_url, slika_url, datum_rodenja, mjesto_rodenja, klub_naziv}.
- _propose_for_sportas() now routes through the federation scraper before
  HNS Semafor; HNS path is gated to nogomet or rows already linked.
- _load_row(sportas) JOINs klubovi to fall back to klub.sport when
  c.sport is empty.
- Tested on 1024 Marijan Alkić (boćanje): proposed profile_url +
  datum_rodenja from hrvatski-bocarski-savez.hr; /apply persisted them.
- Tested on 3335 Toni Jelenković (košarka) and 3379 Niko Miknić
  (plivanje): research_links surface HKS/KS PGŽ and HPS respectively.

Worker:
- _pick_sportas now selects on coverage<70 across ALL sports (sport
  set OR known external linkage), not just hns_*.
- _SOURCE_WEIGHTS extended with 16 federation hosts at 0.88-0.92.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:30:16 +02:00
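
The mtime-based cached loader mentioned in faf6beb536 could look roughly like the sketch below. The function name `load_json_cached` and the module-level `_CACHE` are illustrative assumptions, not the actual `_load_sport_feds` code; the only idea taken from the commit message is "re-read the file when its mtime changes".

```python
import json
import os
from typing import Any, Dict, Tuple

# Hypothetical cache: path -> (mtime in nanoseconds, parsed JSON).
_CACHE: Dict[str, Tuple[int, Any]] = {}


def load_json_cached(path: str) -> Any:
    """Return the parsed JSON file, re-reading only when its mtime changed."""
    mtime_ns = os.stat(path).st_mtime_ns
    cached = _CACHE.get(path)
    if cached is not None and cached[0] == mtime_ns:
        # File untouched since the last read: reuse the parsed object.
        return cached[1]
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    _CACHE[path] = (mtime_ns, data)
    return data
```

One `os.stat` call per lookup is cheap compared with re-parsing the file, and an edited data/sport_federations.json is picked up without restarting the service.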
CC6 Worker 9c5116eaa3 M12.5 R4: coverage<70 picker + confidence>=0.7 gate + /var/log target
- Coverage computed in SQL (filled_keys * 100 / total_keys); only rows below
  threshold (default 70%, override ENRICHER_COVERAGE_MAX) are queued.
- Per-row confidence is the max of source weights (semafor.hns.family=0.95,
  wikipedia.hr=0.80, sport-pgz.hr=0.55) plus a small evidence-count bonus.
  Below threshold (default 0.70, override ENRICHER_CONFIDENCE), only 'hard'
  structured fields (profile_url, source_url, slika_url, hns_igrac_id) are
  applied — never an LLM-synthesised biografija.
- Logs now mirrored to /var/log/pgz-sport-enricher.log alongside the project
  log, so 'tail /var/log/pgz-sport-enricher.log' works as the brief asks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 00:45:48 +02:00
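
A minimal sketch of the coverage and confidence gate described in 9c5116eaa3. The three source weights, the hard-field list and the 0.70 default come from the commit message; the bonus size (`bonus_per_extra`, `bonus_cap`) and all function names are assumptions made for illustration, since the message only says "a small evidence-count bonus".

```python
from typing import Any, Dict, Iterable, List

# Weights quoted in the commit message.
SOURCE_WEIGHTS = {
    "semafor.hns.family": 0.95,
    "wikipedia.hr": 0.80,
    "sport-pgz.hr": 0.55,
}
# 'Hard' structured fields that may be applied below the threshold.
HARD_FIELDS = {"profile_url", "source_url", "slika_url", "hns_igrac_id"}


def coverage_pct(row: Dict[str, Any], keys: List[str]) -> int:
    """filled_keys * 100 / total_keys, with integer division as in SQL."""
    filled = sum(1 for k in keys if row.get(k) not in (None, ""))
    return filled * 100 // len(keys)


def confidence(hosts: Iterable[str],
               bonus_per_extra: float = 0.02,   # assumed value
               bonus_cap: float = 0.05) -> float:  # assumed value
    """Max source weight plus a small bonus per additional piece of evidence."""
    weights = [SOURCE_WEIGHTS.get(h, 0.0) for h in hosts]
    if not weights:
        return 0.0
    bonus = min(bonus_cap, bonus_per_extra * (len(weights) - 1))
    return min(1.0, max(weights) + bonus)


def gate(proposed: Dict[str, Any], conf: float,
         threshold: float = 0.70) -> Dict[str, Any]:
    """Below the threshold keep only hard fields (never biografija)."""
    if conf >= threshold:
        return dict(proposed)
    return {k: v for k, v in proposed.items() if k in HARD_FIELDS}
```

With these assumptions, a proposal backed only by sport-pgz.hr scores 0.55, so `gate` keeps its `profile_url` but drops a synthesised `biografija`.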
Damir Radulić f5c6570d47 CC2 R4 #2+#5: remove legacy unauth /api/admin/users — close 401 gap
The bare @app.get/post('/api/admin/users') decorators in pgz_sport_api.py
were registered before app.include_router(admin_users_router), so they shadowed
the JWT-protected M2 routes and leaked the user list to unauthenticated callers.

Removed all three: GET /api/admin/users, POST /api/admin/users,
POST /api/admin/users/{uid}/toggle. The auth.admin_users router now owns
this prefix exclusively and gates every method with require_user.

Verified: no-auth → 401, invalid token → 401, valid Bearer → 200.
2026-05-05 00:44:50 +02:00
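
The shadowing fixed in f5c6570d47 follows from first-match routing: routes are tried in registration order, so an earlier unauthenticated handler on the same method and path wins. The toy router below illustrates that behaviour in plain Python; it is not FastAPI code, and `FirstMatchRouter`, `build_app` and the simplified Bearer check (which does not validate the token) are hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[Dict[str, str]], Tuple[int, str]]


class FirstMatchRouter:
    """Toy router: the first registered (method, path) match handles the request."""

    def __init__(self) -> None:
        self._routes: List[Tuple[str, str, Handler]] = []

    def add(self, method: str, path: str, handler: Handler) -> None:
        self._routes.append((method, path, handler))

    def dispatch(self, method: str, path: str,
                 headers: Dict[str, str]) -> Tuple[int, str]:
        for m, p, handler in self._routes:
            if m == method and p == path:
                return handler(headers)
        return 404, "not found"


def legacy_users(headers: Dict[str, str]) -> Tuple[int, str]:
    # Legacy handler: no auth check at all.
    return 200, "user list"


def protected_users(headers: Dict[str, str]) -> Tuple[int, str]:
    # Stand-in for require_user: only checks that a Bearer header is present.
    if headers.get("Authorization", "").startswith("Bearer "):
        return 200, "user list"
    return 401, "unauthorized"


def build_app(with_legacy: bool) -> FirstMatchRouter:
    app = FirstMatchRouter()
    if with_legacy:
        app.add("GET", "/api/admin/users", legacy_users)  # registered first
    app.add("GET", "/api/admin/users", protected_users)
    return app
```

`build_app(True).dispatch("GET", "/api/admin/users", {})` returns 200 without credentials, while `build_app(False)` returns 401 for the same request, matching the "no-auth → 401" check above.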
CC6 Worker ece556de11 M12.4: real HNS Semafor scraper for sportas + 24/7 enrichment worker
Critical bug fix: /v2/enrich/sportas/{id} returned proposed:{} for athletes
because the v3 pipeline was still relying on Wikipedia-only evidence and never
actually fetched semafor.hns.family/igraci/.

- enrich_router._propose_for_sportas now:
  • Resolves a HNS Semafor URL from profile_url, source_url, hns_igrac_id,
    vanjski_id JSONB ('hns_comet'+'hns_slug'), or source='hns_semafor'+source_id.
  • Fetches and parses the player page (BS4, regex fallback) and proposes
    profile_url, source_url, slika_url, hns_igrac_id, datum_rodenja,
    mjesto_rodenja, broj_dresa, biografija (DeepSeek synthesis from HNS+Wiki).
- _load_row(sportas) widened to read every relevant column + vanjski_id.
- _TABLE_MAP['sportas'] writeback whitelist expanded to 12 fields.
- workers/enrichment_worker.py: 24/7 daemon, picks under-enriched
  clanovi/klubovi/savezi every 5 min via SQL, calls /apply for each.
- systemd unit pgz-sport-enricher.service installed + enabled.
- Tested end-to-end: id=2222 (Abdija) and id=449 (Zec) now have
  profile_url, slika_url, hns_igrac_id, biografija persisted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 00:36:57 +02:00
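
The URL resolution order listed in ece556de11 can be sketched as a simple fallback chain. Only the field names and the semafor.hns.family/igraci/ prefix come from the commit message; the https scheme, the URL shapes built from ids and slugs, and the function name `resolve_semafor_url` are assumptions and may not match the real site layout.

```python
from typing import Any, Dict, Optional

# Prefix from the commit message; the https scheme is assumed.
BASE = "https://semafor.hns.family/igraci/"


def _is_semafor(url: Any) -> bool:
    return isinstance(url, str) and "semafor.hns.family/igraci/" in url


def resolve_semafor_url(row: Dict[str, Any]) -> Optional[str]:
    """Try each known linkage in priority order; None if nothing resolves."""
    # 1-2. An already stored Semafor link wins.
    for key in ("profile_url", "source_url"):
        if _is_semafor(row.get(key)):
            return row[key]
    # 3. Numeric player id (URL shape is hypothetical).
    if row.get("hns_igrac_id"):
        return f"{BASE}{row['hns_igrac_id']}/"
    # 4. vanjski_id JSONB with both hns_comet and hns_slug (shape hypothetical).
    ext = row.get("vanjski_id") or {}
    if ext.get("hns_comet") and ext.get("hns_slug"):
        return f"{BASE}{ext['hns_comet']}/{ext['hns_slug']}/"
    # 5. Generic source/source_id pair.
    if row.get("source") == "hns_semafor" and row.get("source_id"):
        return f"{BASE}{row['source_id']}/"
    return None
```

Returning `None` when no linkage exists lets the caller fall back to other evidence instead of fetching a guessed URL.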
Damir Radulić a7ec0a86be PGŽ Sport Platform — Round 1+2 baseline (sport2.html + API) 2026-05-04 23:39:08 +02:00