Privacy Budget Notes¶
PrivSyn supports two synthesis engines—PrivSyn (rho-CDP) and AIM (approximate zCDP). This note summarises how epsilon, delta, and rho play together so you can rationalise the budget before deploying to production.
1. PrivSyn¶
- Interface: the API accepts an
epsilon/deltapair. Internally the PrivSyn library converts these to a rho-zCDP budget usingmethod/util/rho_cdp.py. - Iterations: the number of update iterations (
update_iterations) and consistency passes (consist_iterations) consume the same global budget. Reducing them lowers overall runtime and the amount of per-marginal noise but also reduces fidelity. - Preprocessing: discretisation (e.g. PrivTree binning) can optionally draw a portion of the budget via
dp_budget_fractionin the UI. When set, the fraction reduces the budget available to the main synthesiser.
Practical Tips¶
- Prefer providing tight numeric bounds to avoid the extra noise introduced by clipping.
- For categorical columns, trimming rare values (or mapping them to the special token) improves stability.
- When exploring, start with ε between 0.5 and 2.0 and δ ≤ 1e-5; adjust upward only if the quality is insufficient and privacy policy allows it.
2. AIM¶
- AIM consumes the provided
(epsilon, delta)pair directly. - Runtime is sensitive to the number of iterations and workload size; for quick evaluations consider reducing the workload prior to deployment or sampling a subset of columns.
3. Rho Conversion¶
The helper method/util/rho_cdp.py implements conversions between (ε, δ) and ρ (zCDP). Keep in mind that when composing multiple DP mechanisms (e.g., preprocessing + synthesis + evaluation), the rho values add linearly:
ρ_total = ρ_preprocess + ρ_synthesis + ρ_evaluation
Use the conversion utilities if you need to enforce a global privacy ledger across multiple pipelines.
4. Evaluation Considerations¶
The /evaluate endpoint uses histogram-aware TVD which is not differentially private—it's intended only for offline quality assessment. Avoid exposing it to untrusted users or run it only on subsampled/aggregated outputs.
5. Future Work¶
- Pluggable privacy budget managers (e.g., Google DP accounting) to track consumption across multiple runs.
- UI hints for typical epsilon/delta values per domain (health, finance, etc.).
For more background on CDP and the PrivSyn method, see the references in the pdf/ folder.