NSF Prototype Deployment Note

Purpose

This document summarizes how the PrivSyn research prototype is deployed for the NSF project deliverable, what has been validated so far, and how the deployment path extends onto UVA Research Computing (RC) infrastructure.

The goal of this deployment is a lightweight, lab-maintainable research prototype for differentially private synthetic tabular data generation. It is intentionally narrower than a production SaaS deployment.

Prototype Architecture

The system is organized into three logical layers:

Web frontend
  React/Vite application in frontend/
  preserves the existing PrivSyn user experience as much as possible
API/backend service
  FastAPI application in web_app/main.py
  handles uploads, metadata inference, job submission, status polling, and downloads
Execution layer
  local execution for workstation testing
  Slurm-backed execution for RC/HPC environments

The refactor keeps the existing UI and synthesis methods unchanged while separating the three deployment concerns (frontend, API, and execution) from one another.

Implemented User Flow

The deployed prototype supports the following research workflow:

1. Upload a CSV dataset or load the bundled sample dataset.
2. Infer metadata through the backend.
3. Confirm or adjust metadata in the web interface.
4. Submit a synthesis job.
5. Track job status (pending, running, completed, failed).
6. Download the generated synthetic CSV.

The backend exposes the following minimal prototype API:

POST /generate
GET /status/{job_id}
GET /download/{job_id}
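The status-polling step of this flow can be sketched as a small client-side helper. This is a minimal sketch, not the project's client code: the function name wait_for_job and the get_status callable are hypothetical, and the response shape of GET /status/{job_id} is not specified in this note, so the caller is assumed to supply a function that returns one of the four status strings listed above.

```python
import time
from typing import Callable

# Status strings taken from the user flow above.
TERMINAL_STATUSES = {"completed", "failed"}
KNOWN_STATUSES = {"pending", "running"} | TERMINAL_STATUSES


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status and return that status.

    get_status is a caller-supplied (hypothetical) function that would wrap
    a request to GET /status/{job_id} and return the status string.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status not in KNOWN_STATUSES:
            raise ValueError(f"unexpected status {status!r} for job {job_id}")
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_s}s")
        sleep(interval_s)
```

Passing the status lookup in as a callable keeps the polling logic independent of any particular HTTP library. Once the helper returns completed, a client would fetch the result from GET /download/{job_id}.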

Compatibility endpoints from the earlier hosted deployment remain in place so the project can still support the older synchronous web flow when needed.

Current Deployment Modes

1. Local prototype mode

Default mode for development and demonstrations on a single machine.

EXECUTION_MODE=local
jobs are stored under jobs/<job_id>/
synthesis runs through web_app.job_runner

This mode is used for laptop testing, Docker-based demos, and rapid iteration.

2. Hosted compatibility mode

The repository still supports the older hosted web deployment pattern:

frontend on Vercel
backend-compatible container deployment
synchronous synthesis fallback for compatibility

This mode exists to preserve the current hosted demo path while the RC deployment path matures.

3. RC / Slurm mode

The execution layer supports Slurm-backed synthesis jobs for UVA RC environments.

EXECUTION_MODE=slurm
SlurmExecutor submits sbatch jobs locally when the API service itself runs on a Slurm-capable host
SshSlurmExecutor submits sbatch jobs over SSH when the API service is hosted elsewhere
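The choice between the execution paths can be illustrated with a small selection function. This is a sketch under stated assumptions, not the repository's actual wiring: EXECUTION_MODE, SlurmExecutor, and SshSlurmExecutor come from this note, while the function name choose_executor and the SLURM_SSH_HOST variable are hypothetical stand-ins for however the service decides that sbatch must be reached over SSH.

```python
from typing import Mapping


def choose_executor(env: Mapping[str, str]) -> str:
    """Return the name of the executor implied by the environment.

    Returns a name rather than an instance because the constructors of the
    real executor classes are not described in this note.
    """
    # Local mode is described above as the default mode.
    mode = env.get("EXECUTION_MODE", "local").strip().lower()
    if mode == "local":
        return "local"
    if mode == "slurm":
        # SLURM_SSH_HOST is a hypothetical variable used only in this sketch:
        # if it is set, sbatch is assumed to be reachable only over SSH.
        return "SshSlurmExecutor" if env.get("SLURM_SSH_HOST") else "SlurmExecutor"
    raise ValueError(f"unsupported EXECUTION_MODE: {mode!r}")
```

In a running service the mapping would typically be os.environ. Rejecting unknown modes at startup avoids silently falling back to local execution on an RC host.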

This is the intended direction for RC integration.

Validated RC Integration

The following RC-facing steps have already been validated:

SSH access to the Rivanna submit host
Slurm submission with RC allocation dplab
remote code synchronization to a Rivanna checkout
Python environment bootstrap on Rivanna
short-lived FastAPI smoke tests on Rivanna
frontend static asset sync into the FastAPI-served location

Validated Rivanna settings for the current account include:

SLURM_ACCOUNT=dplab
SLURM_PARTITION=standard
SLURM_REMOTE_PROJECT_ROOT=/home/nkp2mr/privsyn-tabular-rc
SLURM_REMOTE_JOBS_ROOT=/scratch/nkp2mr/privsyn/jobs
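To show how these settings might feed a submission, the following sketch renders a batch script from an account, a partition, and a per-job directory. It is illustrative only: render_sbatch_script, the default time limit, and the log file name are assumptions, and the command to run is left to the caller because the command-line interface of web_app.job_runner is not documented here. The #SBATCH options used (--account, --partition, --time, --output) are standard sbatch options, and %j is Slurm's filename pattern for the job ID.

```python
import shlex
from typing import Sequence


def render_sbatch_script(
    account: str,
    partition: str,
    job_dir: str,
    command: Sequence[str],
    time_limit: str = "01:00:00",  # assumed default, not from the project
) -> str:
    """Render a minimal sbatch script that runs command for one job."""
    lines = [
        "#!/bin/bash",
        # All #SBATCH directives must precede the first executable line.
        f"#SBATCH --account={account}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --output={job_dir}/slurm-%j.out",
        "set -euo pipefail",
        # shlex.join quotes each argument so spaces survive the shell.
        shlex.join(command),
    ]
    return "\n".join(lines) + "\n"
```

With the validated values above, a call would pass account="dplab", partition="standard", and a job directory under /scratch/nkp2mr/privsyn/jobs.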

Recommended RC Deployment Shape

For a stable RC deployment, the preferred architecture is:

a long-running web/API service on an always-on service platform
Slurm-backed execution on Rivanna
shared job/output storage visible to both the API service and compute jobs

This avoids leaving the application running on a Rivanna login node, while still using RC compute allocations for the actual synthesis jobs.

Repository support for this path now includes:

deploy/rc/env.rivanna.example
deploy/rc/sync_to_rivanna.sh
deploy/rc/bootstrap_rivanna.sh
deploy/rc/k8s/

Persistence and Outputs

Each synthesis job receives a unique identifier and a dedicated directory containing:

input parquet
confirmed metadata
job metadata
runner logs
synthetic output CSV
Slurm script metadata when applicable
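The per-job layout can be sketched as follows. This is a minimal illustration, assuming a UUID-based identifier and JSON files; the function name create_job_dir and the file names confirmed_metadata.json and job.json are hypothetical and may differ from what the prototype actually writes.

```python
import json
import uuid
from pathlib import Path


def create_job_dir(jobs_root: Path, confirmed_metadata: dict) -> tuple[str, Path]:
    """Create jobs_root/<job_id>/ and seed it with metadata files."""
    job_id = uuid.uuid4().hex  # assumed ID scheme: 32 hex characters
    job_dir = jobs_root / job_id
    # exist_ok=False guards against reusing a directory from another job.
    job_dir.mkdir(parents=True, exist_ok=False)
    # File names below are hypothetical placeholders for this sketch.
    (job_dir / "confirmed_metadata.json").write_text(
        json.dumps(confirmed_metadata, indent=2)
    )
    (job_dir / "job.json").write_text(
        json.dumps({"job_id": job_id, "status": "pending"}, indent=2)
    )
    return job_id, job_dir
```

Keeping every artifact for a job under one directory means a failed run can be inspected with ordinary file tools, without a database.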

This design keeps the system easy for an academic lab to inspect, debug, and maintain.

Security and Operational Notes

uploaded files are stored per job
synthetic outputs are written to per-job directories
SSH-based Slurm submission expects credentials to be mounted as secrets, not baked into images
the RC Kubernetes templates are placeholders and are intended to be copied into an RC-managed deployment repository later
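As an illustration of the mounted-secret expectation, a service could verify at startup that the SSH key file is present and not readable by other users. This is a sketch only: check_private_key is a hypothetical helper, and where the key is mounted is a deployment decision not fixed by this note. The permission check mirrors OpenSSH's own behavior of refusing private keys that are accessible by others.

```python
import stat
from pathlib import Path


def check_private_key(path: Path) -> Path:
    """Return path if it is a regular file accessible only by its owner."""
    st = path.stat()  # raises FileNotFoundError if the secret is not mounted
    if not stat.S_ISREG(st.st_mode):
        raise ValueError(f"{path} is not a regular file")
    # Reject any group or world permission bits on the key file.
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} is accessible by group or others")
    return path
```

Failing fast at startup surfaces a missing or misconfigured secret before the first job submission rather than during it.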

Current Limitations

the long-running public service is not meant to stay on Rivanna login nodes
the recommended RC deployment path still depends on a final decision about persistent shared storage
Kubernetes manifests for RC are templates and will need namespace/storage customization with RC staff

Summary

The PrivSyn NSF prototype has been refactored into a deployable research system with:

preserved web workflow
clean API/job separation
local execution for development
Slurm execution for RC integration
documented RC deployment artifacts

This provides a practical bridge from the existing hosted prototype to a more sustainable UVA Research Computing deployment.
