# Subagent C: Cross-Klub Duplicate / Stale-Transfer Detection

- Run timestamp: 2026-05-05 08:36 batch
- Scope: `pgz_sport.clanovi` cross-klub duplicates
- Pre-run row count: 3240 (after Subagents A and B)

## Strict-Criteria Results

| Detector | Cases found |
|---|---|
| Same `hns_igrac_id` across multiple `klub_id` values | 0 |
| Same `lower(ime)+lower(prezime)+datum_rodenja` across multiple `klub_id` values | 0 |
| **Total confirmed cross-klub duplicates requiring action** | **0** |

Notes:

- Only 3 rows in `clanovi` have a populated `hns_igrac_id` (Subagent A already merged the 3 same-ID, same-klub duplicates). None of the surviving rows share an HNS ID across klubs.
- The brief specified `datum_rodjenja`, but the canonical column with data is `datum_rodenja` (no "j"), with 684 rows populated. `datum_rodjenja` (with "j") has only 1 populated row. Both columns were checked, and neither yields a cross-klub match by name+DOB.

## Soft Match (Review-Only, NO Mutation)

A weaker name-only check (same `lower(ime)+lower(prezime)`, ignoring DOB) returned **56 candidate groups / 117 rows** spanning multiple `klub_id` values. Per the brief's instruction "halt if unsure → write to review-only", these were NOT modified.

Why review-only and not stale-purge:

- Different source pipelines (`godisnjak_2025_HOO`, `hbs_savez`, `hns_semafor`, `klub_web`, `klub_web_v2`, `manual`) index the SAME real person under DIFFERENT `klub_id` rows, because federations (saveze) and individual clubs are distinct legal entities. A water-polo player listed in the HBS savez (`klub_id` 2599, the savez "klub" container) AND in the HOO godisnjak (`klub_id` 544) is not a transfer: he is the same active player seen from two registries.
- Croatian names such as "Ivan Vuletić", "Marko Komadina", and "Tomislav Katalenić" are common; without DOB confirmation, soft matches are unreliable.
- All 117 rows have `aktivni_status='aktivan'` and were created within a five-day window (2026-04-29 to 2026-05-03), which fits the brief's edge case "both active AND created_at within 30 days → LEGITIMATE in-season".
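The two strict detectors and the name-only soft check are all "group, then count distinct `klub_id`" queries. The following is a minimal, hypothetical sketch: it runs them against a toy in-memory SQLite table whose columns mirror the names used in this report, not against `pgz_sport` itself, and it is not a copy of `C_sql_transcript.sql`. The toy rows, the `ulower` helper, and `run_detectors` are illustrative assumptions.

```python
import sqlite3

# Hypothetical toy rows: (id, klub_id, ime, prezime, datum_rodenja, hns_igrac_id).
ROWS = [
    (1, 10, "Ivan", "Babić", "2001-08-25", None),
    (2, 20, "IVAN", "BABIĆ", None, None),        # same name, other klub, no DOB
    (3, 10, "Ana", "Kovač", "2005-01-07", "H1"),
    (4, 10, "ANA", "KOVAČ", "2005-01-07", None),  # intra-klub, not cross-klub
    (5, 30, "Luka", "Horvat", None, None),
]

# Strict detector 1: one HNS ID seen under more than one klub.
STRICT_HNS = """
    SELECT hns_igrac_id FROM clanovi
    WHERE hns_igrac_id IS NOT NULL
    GROUP BY hns_igrac_id
    HAVING COUNT(DISTINCT klub_id) > 1
"""

# Strict detector 2: same case-folded name AND same DOB under more than one klub.
STRICT_NAME_DOB = """
    SELECT ulower(ime), ulower(prezime), datum_rodenja FROM clanovi
    WHERE datum_rodenja IS NOT NULL
    GROUP BY ulower(ime), ulower(prezime), datum_rodenja
    HAVING COUNT(DISTINCT klub_id) > 1
"""

# Soft check: name only, DOB ignored. Review-only, never acted on.
SOFT_NAME_ONLY = """
    SELECT ulower(ime), ulower(prezime), COUNT(*) AS n_rows FROM clanovi
    GROUP BY ulower(ime), ulower(prezime)
    HAVING COUNT(DISTINCT klub_id) > 1
"""


def run_detectors(rows):
    conn = sqlite3.connect(":memory:")
    # SQLite's built-in lower() folds ASCII only, so "BABIĆ" would not match
    # "Babić"; register a Unicode-aware variant instead.
    conn.create_function(
        "ulower", 1, lambda s: None if s is None else s.lower(), deterministic=True
    )
    conn.execute(
        "CREATE TABLE clanovi (id INTEGER PRIMARY KEY, klub_id INTEGER, ime TEXT,"
        " prezime TEXT, datum_rodenja TEXT, hns_igrac_id TEXT)"
    )
    conn.executemany("INSERT INTO clanovi VALUES (?, ?, ?, ?, ?, ?)", rows)
    strict_hns = conn.execute(STRICT_HNS).fetchall()
    strict_dob = conn.execute(STRICT_NAME_DOB).fetchall()
    soft = conn.execute(SOFT_NAME_ONLY).fetchall()
    conn.close()
    return {
        "strict_hns_groups": len(strict_hns),
        "strict_name_dob_groups": len(strict_dob),
        "soft_groups": len(soft),
        "soft_rows": sum(n for _, _, n in soft),
    }


if __name__ == "__main__":
    print(run_detectors(ROWS))
```

On these toy rows both strict detectors return zero groups, while the soft check flags one group of two rows ("Ivan Babić" in klubs 10 and 20) where only one side carries a DOB. This mirrors the shape of the real run: nothing actionable under strict criteria, with soft candidates that cannot be confirmed.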
## Decisions

| Decision | Count |
|---|---|
| LEGITIMATE transfer (tagged `secondary_klub`) | 0 |
| STALE transfer (purged + reparented) | 0 |
| REVIEW_ONLY (soft match, awaiting human review) | 56 groups (117 rows) |
| Hard delete | 0 |

No mutations were performed. Backup `pgz_sport.clanovi_backup_20260505_0836` was left untouched, and the row count stays at 3240.

## Sample Soft Matches (full list in `C_TRANSFERS.json`)

1. **Niko Janković**: 4 rows across 2 distinct klubs ({1, 2362}). One row (id 4132) has `datum_rodenja=2001-08-25` from `hns_semafor`; the others lack DOB. Likely the same person across the HOO godisnjak / `klub_web` / `hns_semafor` / `manual` ingestion of `klub_id` 2362. Soft match only; needs a DOB backfill before any merge.
2. **Cherno Saho**: 3 rows across 2 distinct klubs ({2362, 3840}). One row has `datum_rodenja=2005-01-07`. The two `klub_id` 2362 rows are likely intra-klub duplicates (Subagent A scope, not caught there because `hns_igrac_id` was missing). The 3840 row may be an actual transfer or a savez/klub indexing split. Needs human review.
3. **David Pekar**: id 464 in klub 2200 and id 1021 in klub 428. Row 464 has `datum_rodenja=2008-11-24` from `hns_semafor`; row 1021 from `hbs_savez` lacks DOB. Likely the same youth player ingested from two federations (saveze). This cannot be confirmed without DOB on both rows; soft match only.

## Reasoning Summary

This run follows the brief's hard rule: "NEVER act on a case without recording reasoning in the JSON. Halt if unsure." The strict criteria yielded zero actionable cases, and each of the 56 soft-match groups is blocked by at least one of the following:

- All rows in the group have `aktivni_status='aktivan'` and `created_at` within 30 days, which the brief treats as LEGITIMATE in-season (tag-only). Tagging without DOB confirmation, however, could mis-tag distinct people who share a name, so these groups are REVIEW_ONLY.
- No transfer table exists in `pgz_sport` beyond `clan_sezona` (season stats, keyed on `clan_id+sezona+natjecanje+klub_naziv`, with no source-clan pointer), so "transfer" cannot be distinguished from "duplicate" programmatically.
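The decision gate described above can be sketched as a small classifier over one name-matched group. This is a hedged illustration, not the subagent's code: the `Row` type, `identity_confirmed`, `classify_group`, and the `NOT_CROSS_KLUB` / `NEEDS_STALE_CHECK` labels are hypothetical. Because this report does not spell out the stale-transfer criteria, the sketch only flags such groups and never purges.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence

IN_SEASON_WINDOW_DAYS = 30  # the brief's "created_at within 30 days" edge case


@dataclass
class Row:
    # Hypothetical subset of clanovi columns needed for the decision.
    id: int
    klub_id: int
    aktivni_status: str
    created_at: date
    datum_rodenja: Optional[date] = None
    hns_igrac_id: Optional[str] = None


def _single_non_null(values) -> bool:
    """True if every value is non-null and all values are equal."""
    distinct = set(values)
    return None not in distinct and len(distinct) == 1


def identity_confirmed(rows: Sequence[Row]) -> bool:
    # Strict criteria (names are already matched): shared HNS ID or shared DOB.
    return _single_non_null(r.hns_igrac_id for r in rows) or _single_non_null(
        r.datum_rodenja for r in rows
    )


def classify_group(rows: Sequence[Row]) -> str:
    """Classify one name-matched group of clanovi rows."""
    if len({r.klub_id for r in rows}) < 2:
        return "NOT_CROSS_KLUB"  # intra-klub duplicates belong to Subagent A
    if not identity_confirmed(rows):
        return "REVIEW_ONLY"  # soft match: halt if unsure
    all_active = all(r.aktivni_status == "aktivan" for r in rows)
    created = [r.created_at for r in rows]
    span_days = (max(created) - min(created)).days
    if all_active and span_days <= IN_SEASON_WINDOW_DAYS:
        return "LEGITIMATE"  # in-season: tag secondary_klub only
    # Stale-transfer criteria are not defined here; flag for review, do not purge.
    return "NEEDS_STALE_CHECK"


if __name__ == "__main__":
    # Toy group shaped like the soft matches: DOB present on one row only.
    group = [
        Row(1, 10, "aktivan", date(2026, 4, 29), datum_rodenja=date(2001, 8, 25)),
        Row(2, 20, "aktivan", date(2026, 5, 3)),
    ]
    print(classify_group(group))  # REVIEW_ONLY
```

Identity confirmation is checked before the in-season rule on purpose: a group whose rows are all active and created days apart still falls through to REVIEW_ONLY when DOB is present on only one side, matching how the 56 soft-match groups were handled in this run.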
## Post-run Counts

| Metric | Value |
|---|---|
| `pgz_sport.clanovi` rows | 3240 (unchanged) |
| Rows mutated | 0 |
| Rows soft-deleted to `clanovi_purged` | 0 |
| `sys_audit` rows added | 1 (`C_DETECTION_RUN` summary) |

## Errors

None.

## Files

- `C_TRANSFERS.json`: structured decisions and the full list of review-only cases
- `C_sql_transcript.sql`: SQL run during this subagent