NSF Prototype Deployment Note
Purpose
This document summarizes how the PrivSyn research prototype is deployed for the NSF project deliverable, what has been validated so far, and how the deployment path extends to UVA Research Computing (RC) infrastructure.
The goal of this deployment is a lightweight, lab-maintainable research prototype for differentially private synthetic tabular data generation. It is intentionally narrower than a production SaaS deployment.
Prototype Architecture
The system is organized into three logical layers:
- Web frontend
  - React/Vite application in `frontend/`
  - preserves the existing PrivSyn user experience as much as possible
- API/backend service
  - FastAPI application in `web_app/main.py`
  - handles uploads, metadata inference, job submission, status polling, and downloads
- Execution layer
  - local execution for workstation testing
  - Slurm-backed execution for RC/HPC environments
This refactor keeps the existing UI and synthesis methods while separating deployment concerns cleanly.
Implemented User Flow
The deployed prototype supports the following research workflow:
- Upload a CSV dataset or load the bundled sample dataset.
- Infer metadata through the backend.
- Confirm or adjust metadata in the web interface.
- Submit a synthesis job.
- Track job status (`pending`, `running`, `completed`, `failed`).
- Download the generated synthetic CSV.
The backend exposes the following minimal prototype API:
- `POST /generate`
- `GET /status/{job_id}`
- `GET /download/{job_id}`
Compatibility endpoints from the earlier hosted deployment remain in place so the project can still support the older synchronous web flow when needed.
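The submit, poll, and download flow above can be sketched from the client side. This is a minimal illustration, not the project's client code: `wait_for_job` and `fetch_status` are hypothetical names, and the `"status"` response field is an assumption about the JSON returned by `GET /status/{job_id}`. The HTTP call is injected as a callable so the polling logic can be exercised without a running backend.

```python
import time
from typing import Callable, Dict

# Terminal states taken from the job status list in this note.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], Dict[str, str]],
    poll_seconds: float = 2.0,
    max_polls: int = 300,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal state and return that state.

    `fetch_status` is a hypothetical callable that performs
    GET /status/{job_id} and returns the decoded JSON body. The
    "status" key is an assumed response field.
    """
    for _ in range(max_polls):
        state = fetch_status(job_id)["status"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

A caller would download the CSV from `GET /download/{job_id}` only when the returned state is `completed`; bounding the number of polls keeps a stuck job from blocking the client indefinitely.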
Current Deployment Modes
1. Local prototype mode
Default mode for development and demonstrations on a single machine.
- `EXECUTION_MODE=local`
- jobs are stored under `jobs/<job_id>/`
- synthesis runs through `web_app.job_runner`
This mode is used for laptop testing, Docker-based demos, and rapid iteration.
2. Hosted compatibility mode
The repository still supports the older hosted web deployment pattern:
- frontend on Vercel
- backend-compatible container deployment
- synchronous synthesis fallback for compatibility
This mode exists to preserve the current hosted demo path while the RC deployment path matures.
3. RC / Slurm mode
The execution layer supports Slurm-backed synthesis jobs for UVA RC environments.
- `EXECUTION_MODE=slurm`
- `SlurmExecutor` submits local `sbatch` jobs when the API service itself is on a Slurm-capable host
- `SshSlurmExecutor` submits `sbatch` jobs over SSH when the API service is hosted elsewhere
This is the intended direction for RC integration.
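The difference between the two submission paths comes down to where `sbatch` runs. The sketch below illustrates that idea only; it does not reproduce the repository's executor classes. `build_submit_command` and the `ssh_target` parameter are hypothetical, and `user@submit-host` is a placeholder. The `--parsable` flag is a standard `sbatch` option that prints just the job ID, which makes the output easier to record.

```python
import shlex
from typing import List, Optional


def build_submit_command(script_path: str, ssh_target: Optional[str] = None) -> List[str]:
    """Return the argv used to submit a Slurm batch script.

    With no `ssh_target`, sbatch runs on the local host. With one
    (for example "user@submit-host", a placeholder), the same sbatch
    call is wrapped in an ssh invocation.
    """
    sbatch = ["sbatch", "--parsable", script_path]
    if ssh_target is None:
        return sbatch
    # ssh hands the remote command to a shell on the other side, so
    # each piece is quoted to survive that extra round of parsing.
    remote = " ".join(shlex.quote(part) for part in sbatch)
    return ["ssh", ssh_target, remote]
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)`. Building the command as a pure function keeps the local and SSH variants easy to compare and to test without a cluster.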
Validated RC Integration
The following RC-facing steps have already been validated:
- SSH access to Rivanna submit host
- Slurm submission with RC allocation `dplab`
- remote code synchronization to a Rivanna checkout
- Python environment bootstrap on Rivanna
- short-lived FastAPI smoke tests on Rivanna
- frontend static asset sync into the FastAPI-served location
Validated Rivanna settings for the current account include:
SLURM_ACCOUNT=dplab
SLURM_PARTITION=standard
SLURM_REMOTE_PROJECT_ROOT=/home/nkp2mr/privsyn-tabular-rc
SLURM_REMOTE_JOBS_ROOT=/scratch/nkp2mr/privsyn/jobs
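Settings like these are easiest to work with when they are read once and validated before any job is submitted. The following is a sketch under that assumption; `SlurmSettings` and `load_slurm_settings` are hypothetical names, not the project's configuration code. Only the four variable names are taken from the values listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variable names as listed in the validated Rivanna settings.
REQUIRED = (
    "SLURM_ACCOUNT",
    "SLURM_PARTITION",
    "SLURM_REMOTE_PROJECT_ROOT",
    "SLURM_REMOTE_JOBS_ROOT",
)


@dataclass(frozen=True)
class SlurmSettings:
    account: str
    partition: str
    remote_project_root: str
    remote_jobs_root: str


def load_slurm_settings(env: Mapping[str, str] = os.environ) -> SlurmSettings:
    """Read the Slurm settings, failing early if any are missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing Slurm settings: " + ", ".join(missing))
    return SlurmSettings(
        account=env["SLURM_ACCOUNT"],
        partition=env["SLURM_PARTITION"],
        remote_project_root=env["SLURM_REMOTE_PROJECT_ROOT"],
        remote_jobs_root=env["SLURM_REMOTE_JOBS_ROOT"],
    )
```

Failing at startup with the full list of missing names tends to be friendlier than discovering a blank account or partition only when the first `sbatch` call is rejected.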
Recommended RC Deployment Shape
For a stable RC deployment, the preferred architecture is:
- a long-running web/API service on an always-on service platform
- Slurm-backed execution on Rivanna
- shared job/output storage visible to both the API service and compute jobs
This avoids leaving the application running on a Rivanna login node, while still using RC compute allocations for the actual synthesis jobs.
Repository support for this path now includes:
- `deploy/rc/env.rivanna.example`
- `deploy/rc/sync_to_rivanna.sh`
- `deploy/rc/bootstrap_rivanna.sh`
- `deploy/rc/k8s/`
Persistence and Outputs
Each synthesis job receives a unique identifier and a dedicated directory containing:
- input parquet
- confirmed metadata
- job metadata
- runner logs
- synthetic output CSV
- Slurm script metadata when applicable
This design keeps the system easy for an academic lab to inspect, debug, and maintain.
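As an illustration of the per-job directory idea, the sketch below creates a uniquely named directory and writes two of the listed artifacts as JSON. It is an assumption-laden example: `create_job_dir` and the file names `metadata.json` and `job.json` are hypothetical, and the prototype's actual layout may differ.

```python
import json
import uuid
from pathlib import Path


def create_job_dir(jobs_root: Path, confirmed_metadata: dict) -> Path:
    """Create jobs_root/<job_id>/ and seed it with metadata files.

    File names here are illustrative placeholders, not the names the
    prototype necessarily uses.
    """
    job_id = uuid.uuid4().hex  # random, collision-resistant identifier
    job_dir = jobs_root / job_id
    # exist_ok=False makes an (unlikely) ID collision fail loudly
    # instead of silently reusing another job's directory.
    job_dir.mkdir(parents=True, exist_ok=False)
    (job_dir / "metadata.json").write_text(json.dumps(confirmed_metadata, indent=2))
    (job_dir / "job.json").write_text(
        json.dumps({"job_id": job_id, "status": "pending"}, indent=2)
    )
    return job_dir
```

Keeping every artifact for a job under one directory means a failed run can be inspected or cleaned up with ordinary file tools, with no database involved.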
Security and Operational Notes
- uploaded files are stored per job
- synthetic outputs are written to per-job directories
- SSH-based Slurm submission expects credentials to be mounted as secrets, not baked into images
- the RC Kubernetes templates are placeholders and are intended to be copied into an RC-managed deployment repository later
Current Limitations
- long-running public service hosting is not intended to remain on Rivanna login nodes
- the recommended RC deployment path still depends on a final decision about persistent shared storage
- Kubernetes manifests for RC are templates and will need namespace/storage customization with RC staff
Summary
The PrivSyn NSF prototype has been refactored into a deployable research system with:
- preserved web workflow
- clean API/job separation
- local execution for development
- Slurm execution for RC integration
- documented RC deployment artifacts
This provides a practical bridge from the existing hosted prototype to a more sustainable UVA Research Computing deployment.