DI exec: applied CC-DI Subagent A+B SQL — 3245 clanovi, Manuel Boras merged
@@ -0,0 +1,61 @@
# Subagent D — Schema Quality Constraints (pgz_sport.clanovi)

Date: 2026-05-05

Live row count: 3240 (backup retained at `pgz_sport.clanovi_backup_20260505_0836` = 3243)

## Summary table

| # | Candidate | Type | Pre-flight violators | Status | Object name |
|---|-----------|------|----------------------|--------|-------------|
| C1 | No internal CamelCase boundary | CHECK | 0 | APPLIED | `clanovi_no_camelcase_chk` |
| C2 | ime/prezime trimmed | CHECK | 0 | APPLIED | `clanovi_trimmed_chk` |
| C3 | length(ime) >= 2 AND length(prezime) >= 2 | CHECK | 22 | SKIPPED (see D_violations.md) | — |
| C4 | spol IS NULL OR spol IN ('M','Ž') | CHECK | 0 | ALREADY PRESENT | `clanovi_spol_check` (pre-existing) |
| C5 | hns_igrac_id partial UNIQUE | UNIQUE INDEX | 0 dup-groups | APPLIED | `clanovi_hns_uniq` |
| C6 | (klub_id, lower(ime), lower(prezime), datum_rodenja) UNIQUE | UNIQUE INDEX | 68 dup-groups | SKIPPED (see D_violations.md) | — |
| C7 | BEFORE INSERT/UPDATE normalize trigger | TRIGGER | n/a | APPLIED | `clanovi_normalize_trigger` + `pgz_sport.clanovi_normalize_fn()` |
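
The pre-flight counts in the table can be re-derived with plain aggregate queries. The sketch below is a minimal reconstruction, not the exact queries Subagent D ran; it reuses the regex literal from the DDL, and the column aliases (`c1_camelcase` etc.) are illustrative. A non-zero count means the corresponding `ADD CONSTRAINT` or `CREATE UNIQUE INDEX` would fail on existing data.

```sql
-- Pre-flight sketch: rows that would violate C1, C2 and C3.
-- A NULL comparison is not counted by FILTER, in line with a CHECK,
-- which lets a NULL result pass.
WITH p AS (
    SELECT '[a-zćčšđžáàâäéèêëíìîïóòôöúùûüñçý][A-ZĆČŠĐŽÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÑÇÝ]'::text AS camel
)
SELECT
    count(*) FILTER (WHERE c.ime ~ p.camel OR c.prezime ~ p.camel)               AS c1_camelcase,
    count(*) FILTER (WHERE c.ime <> trim(c.ime) OR c.prezime <> trim(c.prezime)) AS c2_untrimmed,
    count(*) FILTER (WHERE length(c.ime) < 2 OR length(c.prezime) < 2)           AS c3_too_short
FROM pgz_sport.clanovi AS c
CROSS JOIN p;

-- C5 pre-flight: hns_igrac_id values shared by more than one row.
-- NULL and '' are excluded, exactly like the partial index predicate.
SELECT hns_igrac_id,
       count(*)                  AS dups,
       array_agg(id ORDER BY id) AS ids
FROM pgz_sport.clanovi
WHERE hns_igrac_id IS NOT NULL AND hns_igrac_id <> ''
GROUP BY hns_igrac_id
HAVING count(*) > 1;
```

Now that C1, C2 and C5 are in force, the first two counts and the duplicate query should return 0 / no rows by construction; `c3_too_short` should stay at 22 until the rows in D_violations.md are cleaned.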

## Trigger semantics

`clanovi_normalize_fn`:

1. Always `trim()` `NEW.ime` and `NEW.prezime`.
2. On `INSERT`, or on `UPDATE` only when the trimmed `ime` or `prezime` actually differs from the old value:
   - reject a CamelCase boundary (lenient: only lower→upper pairs drawn from ASCII letters, the Croatian diacritics ć č š đ ž, and a handful of other accented Latin letters such as á, é, ñ, ç);
   - reject `length(ime) < 2` or `length(prezime) < 2` (a NULL name counts as length 0, so it is rejected as well).

The "only-when-name-changes" rule lets the 22 existing short-name rows (20 historical surname-only entries with placeholder `ime='-'`, e.g. `id=1852..2141`, plus 2 likely parse errors; see D_violations.md) still receive `UPDATE`s on other fields.
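
To see which strings the lenient CamelCase rule catches, the trigger's pattern can be evaluated against literal values without touching the table. A minimal sketch, assuming a UTF-8 database; the sample names are illustrative and not taken from the data.

```sql
-- Evaluate the trigger's CamelCase pattern against sample literals.
-- `~` is case-sensitive, so only a lowercase letter immediately followed
-- by an uppercase letter counts as a boundary.
SELECT s AS sample,
       s ~ '[a-zćčšđžáàâäéèêëíìîïóòôöúùûüñçý][A-ZĆČŠĐŽÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÑÇÝ]' AS camelcase
FROM (VALUES ('IvoIvic'), ('Ivić'), ('Ana-Marija'), ('IVIĆ'), ('McDonald')) AS t(s);
```

`IvoIvic` and `McDonald` are flagged; `Ivić`, `Ana-Marija` (the hyphen breaks the pair) and all-caps `IVIĆ` are not. Names with a genuine internal capital, such as `McDonald`, are therefore rejected too, which is worth keeping in mind if such names appear in future ingests.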

## Smoke tests (all wrapped in BEGIN/ROLLBACK, so live data is unchanged)

| # | Scenario | Expected | Result |
|---|----------|----------|--------|
| 1 | INSERT `('IvoIvic','Test')` | reject (CamelCase) | REJECTED — `CamelCase rejected in ime: IvoIvic` |
| 2 | INSERT `('PetarPan','Test')` | reject | REJECTED |
| 3 | INSERT `(' Ivo ',' Ivić ')` | trim then succeed | INSERTED — stored as `('Ivo','Ivić')` |
| 4 | INSERT `('A','Test')` | reject (length) | REJECTED — `ime too short (<2 chars): A` |
| 5 | INSERT `('Ivan',' X ')` | trim → `'X'` len 1 → reject | REJECTED — `prezime too short (<2 chars): X` |
| 6 | INSERT `('Marko ',' Marković')` | trim then succeed | INSERTED — stored as `('Marko','Marković')` |
| 7 | INSERT duplicate `hns_igrac_id='209352'` | reject | REJECTED — `duplicate key value violates unique constraint "clanovi_hns_uniq"` |
| 8 | 2× NULL + 2× `''` `hns_igrac_id` rows | all 4 succeed (partial uniqueness ignores NULL/empty) | 4 INSERTS OK |
| 9 | UPDATE `id=1852` (`ime='-'`) `napomena=...` (no name change) | succeed | UPDATED — short-name row still mutable |
| 10 | UPDATE `id=1852` `ime='?'` (single char) | reject | REJECTED — `ime too short (<2 chars): ?` |

All 10 behaviours match expectations. No live row was modified: every test was rolled back.
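
A rejection test and a trim test can be reproduced without leaving a trace by catching the expected error inside a `DO` block and rolling back the outer transaction. A minimal sketch, assuming `ime` and `prezime` are the only columns an INSERT needs; if other NOT NULL columns or the pre-existing `clanovi_validate_source` trigger require more fields, they must be added to the column lists.

```sql
BEGIN;

DO $$
BEGIN
    -- Scenario 1: CamelCase must fail with SQLSTATE 23514 (check_violation).
    BEGIN
        INSERT INTO pgz_sport.clanovi (ime, prezime) VALUES ('IvoIvic', 'Test');
        -- Reached only if the insert was accepted; this error is not caught
        -- below, so it aborts the whole transaction loudly.
        RAISE EXCEPTION 'smoke test 1 failed: CamelCase insert was accepted';
    EXCEPTION WHEN check_violation THEN
        RAISE NOTICE 'smoke test 1 ok: %', SQLERRM;
    END;
END
$$;

-- Scenario 3: padded names are trimmed by the trigger before storage.
INSERT INTO pgz_sport.clanovi (ime, prezime)
VALUES (' Ivo ', ' Ivić ')
RETURNING id, ime, prezime;   -- expect ime = 'Ivo', prezime = 'Ivić'

ROLLBACK;  -- discard everything the tests wrote
```

The inner `BEGIN ... EXCEPTION` block runs as a subtransaction, so the rejected insert is undone on its own while the outer `ROLLBACK` discards the successful one.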

## Final lockdown state on `pgz_sport.clanovi`

CHECK constraints in force:
- `clanovi_no_camelcase_chk` (NEW)
- `clanovi_trimmed_chk` (NEW)
- `clanovi_spol_check` (pre-existing)

UNIQUE indexes in force:
- `clanovi_pkey` on (id)
- `uq_clanovi_klub_profile` on (klub_id, profile_url) (pre-existing)
- `clanovi_hns_uniq` on (hns_igrac_id) WHERE not null/empty (NEW)

User triggers in force (BEFORE INSERT OR UPDATE):
- `clanovi_normalize_trigger` (NEW)
- `clanovi_validate_source` (pre-existing)
- `pgz_sport_clanovi_fts_trg` (pre-existing)

Row count unchanged at 3240.
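
The lockdown state above can be re-checked from the system catalogs at any time. A sketch that uses only standard PostgreSQL catalogs and functions (`pg_constraint`, `pg_indexes`, `pg_trigger`, `pg_get_constraintdef`, `pg_get_triggerdef`); nothing in it is project-specific beyond the table name.

```sql
-- Constraints declared on the table (contype: c = CHECK, p = PK, u = UNIQUE, ...)
SELECT conname, contype, pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE conrelid = 'pgz_sport.clanovi'::regclass
ORDER BY conname;

-- All indexes, including partial ones such as clanovi_hns_uniq
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'pgz_sport' AND tablename = 'clanovi'
ORDER BY indexname;

-- User (non-internal) triggers; tgenabled = 'D' would mean disabled
SELECT tgname, tgenabled, pg_get_triggerdef(oid) AS definition
FROM pg_trigger
WHERE tgrelid = 'pgz_sport.clanovi'::regclass
  AND NOT tgisinternal
ORDER BY tgname;

SELECT count(*) AS live_rows FROM pgz_sport.clanovi;
```

PostgreSQL fires triggers of the same timing and event in alphabetical order of name. Assuming all three are BEFORE ROW triggers as listed, `clanovi_normalize_trigger` runs before `clanovi_validate_source` and `pgz_sport_clanovi_fts_trg`, so those later triggers already see trimmed names.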
@@ -0,0 +1,116 @@
-- pgz_sport.clanovi — schema lockdown DDL (Subagent D)
-- Author: dradulic@outlook.com / damir@rinet.one
-- Date: 2026-05-05
-- Description: Final, applied DDL. Pre-flight all-clean blocks below were
--              committed; SKIPPED candidates (length>=2 CHECK, klub+name+dob
--              UNIQUE) are documented in D_violations.md and intentionally
--              omitted here.
--
-- Row count at apply time: 3240 (live), 3243 (backup_20260505_0836).
-- Rollback hints: each block is independent and reversible via
--   ALTER TABLE pgz_sport.clanovi DROP CONSTRAINT ...;
--   DROP INDEX pgz_sport.clanovi_hns_uniq;
--   DROP TRIGGER clanovi_normalize_trigger ON pgz_sport.clanovi;
--   DROP FUNCTION pgz_sport.clanovi_normalize_fn();  -- drop the trigger first

-- ===========================================================================
-- C1: CHECK no internal CamelCase boundary (lower->upper letter pair)
-- Pre-flight violators: 0
-- ===========================================================================
BEGIN;
ALTER TABLE pgz_sport.clanovi
    ADD CONSTRAINT clanovi_no_camelcase_chk
    CHECK (
        ime !~ '[a-zćčšđžáàâäéèêëíìîïóòôöúùûüñçý][A-ZĆČŠĐŽÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÑÇÝ]'
        AND prezime !~ '[a-zćčšđžáàâäéèêëíìîïóòôöúùûüñçý][A-ZĆČŠĐŽÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÑÇÝ]'
    );
COMMIT;

-- ===========================================================================
-- C2: CHECK ime/prezime are trimmed
-- Pre-flight violators: 0
-- ===========================================================================
BEGIN;
ALTER TABLE pgz_sport.clanovi
    ADD CONSTRAINT clanovi_trimmed_chk
    CHECK (ime = trim(ime) AND prezime = trim(prezime));
COMMIT;

-- ===========================================================================
-- C4: spol values constraint
-- NOT applied as a new constraint: the existing clanovi_spol_check already
-- restricts spol to NULL, 'M' or 'Ž'. Documented for completeness; the
-- equivalent predicate is:
--   CHECK (spol IS NULL OR spol IN ('M','Ž'))
-- (Writing it as spol IN ('M','Ž',NULL) would not work: for any other value
-- the IN yields NULL, and a CHECK treats NULL as passing.)
-- ===========================================================================

-- ===========================================================================
-- C5: UNIQUE partial index on hns_igrac_id (non-null, non-empty)
-- Pre-flight duplicate groups: 0
-- ===========================================================================
BEGIN;
CREATE UNIQUE INDEX IF NOT EXISTS clanovi_hns_uniq
    ON pgz_sport.clanovi (hns_igrac_id)
    WHERE hns_igrac_id IS NOT NULL AND hns_igrac_id != '';
COMMIT;

-- ===========================================================================
-- C7: BEFORE INSERT/UPDATE normalize trigger
-- Trims ime/prezime, rejects CamelCase, enforces length>=2 only when names
-- change (so the existing 22 short-name historical rows can still be UPDATEd
-- on other fields without rejection).
-- ===========================================================================
BEGIN;

CREATE OR REPLACE FUNCTION pgz_sport.clanovi_normalize_fn()
RETURNS trigger
LANGUAGE plpgsql
AS $fn$
DECLARE
    v_changed_name boolean;
BEGIN
    IF NEW.ime IS NOT NULL THEN
        NEW.ime := trim(NEW.ime);
    END IF;
    IF NEW.prezime IS NOT NULL THEN
        NEW.prezime := trim(NEW.prezime);
    END IF;

    IF TG_OP = 'INSERT' THEN
        v_changed_name := true;
    ELSE
        v_changed_name := (NEW.ime IS DISTINCT FROM OLD.ime)
                       OR (NEW.prezime IS DISTINCT FROM OLD.prezime);
    END IF;

    IF v_changed_name THEN
        IF NEW.ime ~ '[a-zćčšđžáàâäéèêëíìîïóòôöúùûüñçý][A-ZĆČŠĐŽÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÑÇÝ]' THEN
            RAISE EXCEPTION 'CamelCase rejected in ime: %', NEW.ime
                USING ERRCODE = 'check_violation';
        END IF;
        IF NEW.prezime ~ '[a-zćčšđžáàâäéèêëíìîïóòôöúùûüñçý][A-ZĆČŠĐŽÁÀÂÄÉÈÊËÍÌÎÏÓÒÔÖÚÙÛÜÑÇÝ]' THEN
            RAISE EXCEPTION 'CamelCase rejected in prezime: %', NEW.prezime
                USING ERRCODE = 'check_violation';
        END IF;
        IF length(coalesce(NEW.ime, '')) < 2 THEN
            RAISE EXCEPTION 'ime too short (<2 chars): %', NEW.ime
                USING ERRCODE = 'check_violation';
        END IF;
        IF length(coalesce(NEW.prezime, '')) < 2 THEN
            RAISE EXCEPTION 'prezime too short (<2 chars): %', NEW.prezime
                USING ERRCODE = 'check_violation';
        END IF;
    END IF;

    RETURN NEW;
END;
$fn$;

DROP TRIGGER IF EXISTS clanovi_normalize_trigger ON pgz_sport.clanovi;

CREATE TRIGGER clanovi_normalize_trigger
    BEFORE INSERT OR UPDATE ON pgz_sport.clanovi
    FOR EACH ROW EXECUTE FUNCTION pgz_sport.clanovi_normalize_fn();

COMMIT;

-- END
@@ -0,0 +1,126 @@
# Subagent D — Skipped constraints, violator samples

Two candidate constraints were SKIPPED at apply time because pre-existing rows
would have been rejected. They are documented here so Damir can decide whether
to clean the data and re-attempt each constraint, or accept the current state.

The trigger `clanovi_normalize_trigger` already enforces the C3 length rule **for new
inserts and for name-changing updates**, so future ingest cannot reintroduce
short names; only retroactive enforcement on the existing rows is deferred.
The C6 uniqueness rule has no trigger equivalent: new duplicates are blocked only
when they collide on `uq_clanovi_klub_profile` or `clanovi_hns_uniq`.

---

## C3 — `CHECK (length(ime)>=2 AND length(prezime)>=2)` — SKIPPED

Violator count: **22** rows.

Two clusters:

1. **Single-letter `prezime`**: `id=1160` and `id=1165`, both `klub_id=848`:
   - `('Boris Mičetić','B')`: note the embedded space in `ime`, which appears to hold the full name, while `prezime` holds only a single initial.
   - `('Boris Mičetić','J')`: same pattern.
   - **Decision suggestion**: probably real-name parse errors. Resolve manually in `clanovi`.

2. **Placeholder `ime='-'` (single dash)**: 20 rows, `klub_id` NULL for all but one (`klub_id=3896`):

| id | klub_id | ime | prezime |
|----|---------|-----|---------|
| 1852 | NULL | - | Grabovac |
| 1853 | NULL | - | Pilepić |
| 1854 | NULL | - | Maslak |
| 1855 | NULL | - | Jugo |
| 1856 | NULL | - | Miličević |
| 1857 | NULL | - | Marjanović |
| 1858 | NULL | - | Poljak |
| 1859 | NULL | - | Kurelić |
| 2021 | 3896 | - | Mohorić |
| 2125 | NULL | - | Mittrovich (braća) |
| 2130 | NULL | - | Loich |
| 2131 | NULL | - | Paulinich |
| 2132 | NULL | - | Zidarich |
| 2133 | NULL | - | Bertok |
| 2134 | NULL | - | Marincich |
| 2135 | NULL | - | Tiblias |
| 2138 | NULL | - | Veselica |
| 2139 | NULL | - | Naumović |
| 2140 | NULL | - | Osojnak |
| 2141 | NULL | - | Medle |

These look like **historical, surname-only roster entries**: the `napomena` of
id=1852 reads "POVIJESNI: KK Kvarner najtrofejnija generacija …" ("HISTORICAL:
KK Kvarner's most decorated generation …"), so the cluster is intentional
historical data whose given names are unknown.

**Decision suggestion**: replacing `ime='-'` with `ime='?'` does not help, because
`'?'` is also a single character and is rejected too (smoke test 10). Either
backfill the given names from a source, mark the rows as inactive/historical
in another column, or accept the data and never enable C3.
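
The violators can be listed with the negation of the C3 predicate; a minimal sketch. Re-running it after any clean-up shows whether C3 could now be added: an empty result means `ADD CONSTRAINT ... CHECK (length(ime) >= 2 AND length(prezime) >= 2)` would validate.

```sql
-- Rows that would make CHECK (length(ime) >= 2 AND length(prezime) >= 2) fail.
-- NULL-vs-NULL edge cases line up with the CHECK: a row is listed exactly
-- when the CHECK would evaluate to false.
SELECT id, klub_id, ime, prezime,
       length(ime)     AS len_ime,
       length(prezime) AS len_prezime
FROM pgz_sport.clanovi
WHERE length(ime) < 2 OR length(prezime) < 2
ORDER BY id;
```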

---

## C6 — `UNIQUE (klub_id, lower(ime), lower(prezime), COALESCE(datum_rodenja,'0001-01-01'))` — SKIPPED

Conflict groups: **68** (each group has 2+ rows that would collide).

Most are concentrated on **klub_id=2362 (HNK Rijeka roster)**, where the same
player appears twice, both times with `datum_rodenja IS NULL`: one row under an
older `id` and one from a different scrape source. Sample:

| klub_id | l_ime | l_prez | dob | dups | ids |
|---------|-------|--------|-----|------|-----|
| 2362 | amer | gojak | NULL | 2 | {3402, 4214} |
| 2362 | leon | šerifi | NULL | 2 | {3334, 4238} |
| 2362 | ante | oreč | NULL | 2 | {1581, 4230} |
| 2362 | branko | pavić | NULL | 2 | {2715, 4231} |
| 2362 | lovro | kitin | NULL | 2 | {3481, 4220} |
| 2362 | ante | majstorović | NULL | 2 | {3456, 4224} |
| 2362 | dejan | petrovič | NULL | 2 | {3399, 4232} |
| 2362 | duje | čop | NULL | 2 | {1579, 4211} |
| 2362 | fran | škalamera | NULL | 2 | {3480, 4239} |
| 2362 | gabriel | rukavina | NULL | 2 | {3404, 4234} |
| 2362 | bruno | bogojević | NULL | 2 | {3437, 4208} |
| 2362 | aleksa | todorović | NULL | 2 | {3455, 4202} |
| 2362 | cherno | saho | NULL | 2 | {3403, 4235} |
| 2362 | luka | menalo | NULL | 2 | {3454, 4226} |
| 2362 | martin | zlomislić | NULL | 2 | {3440, 4203} |
| 2362 | mladen | devetak | NULL | 2 | {3400, 4212} |
| 2362 | niko | janković | NULL | 2 | {3607, 4218} |
| 2362 | noel | bodetić | NULL | 2 | {3705, 4207} |
| 2362 | silvio | ilinković | NULL | 2 | {3412, 4217} |
| 2362 | šimun | butić | NULL | 2 | {3401, 4209} |
| 2362 | stjepan | radeljić | NULL | 2 | {3448, 4233} |
| 2362 | toni | fruk | 2001-03-09 | 2 | {3438, 4135} |
| 2362 | vito | kovač | NULL | 2 | {3298, 4201} |
| 2362 | jovan | manev | NULL | 2 | {3439, 4225} |
| 2585 | ivo | butrica | NULL | 2 | {2282, 4163} |
| 2585 | luko | ledinić | NULL | 2 | {2283, 4164} |
| 2586 | siniša | saftić | NULL | 2 | {2298, 4165} |
| 2587 | damir | poslek | NULL | 2 | {2310, 4167} |
| 2589 | matej | viduka | NULL | 2 | {2340, 4174} |
| 2589 | čedo | vukelić | NULL | 2 | {2339, 4175} |

(38 more groups not shown; see the reproduction query below.)

**Cause**: the dedup key folds a missing DOB into `COALESCE(NULL, '0001-01-01')`,
so two records with the same name and klub but no DOB look identical even when
they are distinct profiles (different `profile_url`, `source_id`, `hns_igrac_id`).
The composite key actually enforced today is the existing
`uq_clanovi_klub_profile (klub_id, profile_url)`.

**Decision suggestion**: do NOT enable C6 as-is. Either (a) restrict the
uniqueness to `WHERE datum_rodenja IS NOT NULL` (the `toni fruk` pair, ids 3438
and 4135, shares a real DOB and would still need merging first), or (b) merge
true dupes via a follow-up subagent that promotes one row and back-fills
`hns_igrac_id` / `profile_url`. Until then, ingestion is still protected by
`uq_clanovi_klub_profile` and (for HNS-keyed players) `clanovi_hns_uniq`.
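
Option (a) could look like the sketch below: a pre-flight for the DOB-restricted key, then the partial index only once that pre-flight is empty. This is a sketch under stated assumptions, not a tested migration; the index name `clanovi_klub_name_dob_uniq` is hypothetical.

```sql
-- Pre-flight for option (a): duplicates among rows that do have a DOB.
-- The extra IS NOT NULL filters mirror the index: a unique index treats
-- NULLs as distinct, so rows with a NULL key column can never collide in it.
SELECT klub_id,
       lower(ime)                AS l_ime,
       lower(prezime)            AS l_prez,
       datum_rodenja,
       count(*)                  AS dups,
       array_agg(id ORDER BY id) AS ids
FROM pgz_sport.clanovi
WHERE datum_rodenja IS NOT NULL
  AND klub_id IS NOT NULL
  AND ime IS NOT NULL
  AND prezime IS NOT NULL
GROUP BY klub_id, lower(ime), lower(prezime), datum_rodenja
HAVING count(*) > 1
ORDER BY klub_id, l_ime, l_prez;

-- Apply only once the query above returns no rows (hypothetical index name).
BEGIN;
CREATE UNIQUE INDEX IF NOT EXISTS clanovi_klub_name_dob_uniq
    ON pgz_sport.clanovi (klub_id, lower(ime), lower(prezime), datum_rodenja)
    WHERE datum_rodenja IS NOT NULL;
COMMIT;
```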

### Reproduce full list

```sql
-- Every (klub, name, DOB-or-sentinel) key shared by more than one row.
-- Note: GROUP BY puts all NULL klub_id values into one group, whereas a
-- unique index would treat them as distinct and not report a conflict.
SELECT klub_id,
       lower(ime)                                  AS l_ime,
       lower(prezime)                              AS l_prez,
       COALESCE(datum_rodenja, '0001-01-01'::date) AS dob,
       count(*)                                    AS dups,
       array_agg(id ORDER BY id)                   AS ids
FROM pgz_sport.clanovi
GROUP BY klub_id, lower(ime), lower(prezime), COALESCE(datum_rodenja, '0001-01-01'::date)
HAVING count(*) > 1
ORDER BY dups DESC, klub_id;
```