RC Deployment Handoff Checklist

This checklist is intended for handoff to UVA Research Computing staff or collaborators who will help host the PrivSyn research prototype on RC-managed infrastructure.

1. Application Scope

The deployed prototype is expected to support:

  • CSV upload
  • metadata inference and confirmation
  • synthetic tabular data generation
  • job status polling
  • synthetic CSV download

The system is not intended to be a production public SaaS deployment.

2. Recommended Deployment Model

The recommended deployment model is:

  • long-running frontend/API service on RC-managed microservice or Kubernetes infrastructure
  • synthesis execution on Rivanna through Slurm
  • shared job/output storage visible to both the API service and Rivanna compute jobs
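In this model, synthesis runs on Rivanna while the API service runs elsewhere, so the service has to translate Slurm job states into the statuses it reports from /status/{job_id}. The sketch below shows one possible mapping. It is not the repository's implementation. This checklist names only failed, so the queued, running, and completed status names are assumptions, as is the policy of treating unknown states as still running. The Slurm state names themselves are standard sacct/squeue states.

```python
# Hypothetical mapping from Slurm job states (as reported by sacct or squeue)
# to API job statuses. Only "failed" comes from this checklist; the other
# status names are assumptions.
SLURM_TO_API_STATUS = {
    "PENDING": "queued",
    "CONFIGURING": "queued",
    "RUNNING": "running",
    "COMPLETING": "running",
    "COMPLETED": "completed",
    "FAILED": "failed",
    "CANCELLED": "failed",
    "TIMEOUT": "failed",
    "OUT_OF_MEMORY": "failed",
    "NODE_FAIL": "failed",
    "PREEMPTED": "failed",
}


def api_status(slurm_state: str) -> str:
    """Translate one Slurm state string into an API job status."""
    # sacct may report "CANCELLED by <uid>" or truncate long states with "+",
    # so keep only the first token and strip any trailing "+".
    tokens = slurm_state.strip().split()
    if not tokens:
        # Assumed policy: an empty or unknown state keeps the job "running"
        # so polling continues instead of failing prematurely.
        return "running"
    return SLURM_TO_API_STATUS.get(tokens[0].rstrip("+"), "running")
```

For example, api_status("CANCELLED by 12345") maps a user-cancelled job to failed. Those jobs then follow the same failure path that the acceptance tests exercise.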

The application should not be left running persistently on a Rivanna login node.

3. What Is Already Prepared In This Repository

  • FastAPI backend with job manager
  • local executor for workstation testing
  • Slurm executor for direct Slurm submission
  • SSH-backed Slurm executor for remote API hosting
  • Dockerfile for the API + bundled frontend
  • RC helper scripts under deploy/rc/
  • Kubernetes starter manifests under deploy/rc/k8s/
  • NSF deployment summary under docs/nsf-prototype-deployment.md

4. Inputs Needed From RC

  • target namespace/project for RC microservice hosting
  • approved persistent storage class and PVC sizing
  • external hostname or ingress route
  • secret-management mechanism for SSH credentials
  • preferred image registry and image pull flow
  • confirmation of how the shared jobs directory should be mounted
  • any RC-required network policy or egress restrictions

5. Inputs Needed From The Project Team

  • container image to deploy
  • SSH keypair dedicated to Slurm submission
  • Rivanna submit host information
  • Slurm account and partition settings
  • initial job resource defaults
  • expected upload size and job duration envelope
  • retention policy for job inputs/outputs/logs
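Once the account, partition, and resource defaults are agreed, they end up in the #SBATCH preamble of each persisted job script. Below is a minimal sketch that assumes a hypothetical SlurmResources container. Its defaults (one hour, 4 CPUs, 16G) and the privsyn- job-name prefix are placeholders for the project team to replace. The directives themselves are standard sbatch options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlurmResources:
    """Hypothetical resource defaults; all values below are placeholders."""
    account: str
    partition: str
    time_limit: str = "01:00:00"  # wall-clock limit, HH:MM:SS
    cpus_per_task: int = 4
    mem: str = "16G"


def sbatch_header(res: SlurmResources, job_id: str, log_dir: str) -> str:
    """Render the #SBATCH preamble for one synthesis job script."""
    directives = [
        f"--job-name=privsyn-{job_id}",
        f"--account={res.account}",
        f"--partition={res.partition}",
        f"--time={res.time_limit}",
        f"--cpus-per-task={res.cpus_per_task}",
        f"--mem={res.mem}",
        # %j is expanded by Slurm to the numeric job ID.
        f"--output={log_dir}/slurm-%j.out",
        f"--error={log_dir}/slurm-%j.err",
    ]
    return "#!/bin/bash\n" + "".join(f"#SBATCH {d}\n" for d in directives)
```

Keeping the defaults in one frozen object makes the agreed "job duration envelope" explicit and easy to review alongside the persisted Slurm scripts.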

6. Runtime Configuration To Finalize

  • EXECUTION_MODE=slurm
  • SLURM_ACCOUNT=dplab
  • SLURM_PARTITION=standard
  • SLURM_SSH_TARGET=<submit-host>
  • SLURM_REMOTE_PROJECT_ROOT=<runner-checkout>
  • SLURM_REMOTE_JOBS_ROOT=<shared-jobs-root>
  • SLURM_REMOTE_RUNNER_COMMAND=<python-runner-command>
  • JOBS_ROOT=<shared-jobs-root-mounted-in-service>
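Before launch, it may help to fail fast if any of these settings is missing or still holds a <placeholder> value. The validator below is hypothetical: RuntimeConfig and load_config are illustrative names, not taken from the repository. It also assumes the RC deployment always runs with EXECUTION_MODE=slurm.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Settings listed in the runtime configuration checklist.
REQUIRED_KEYS = (
    "EXECUTION_MODE",
    "SLURM_ACCOUNT",
    "SLURM_PARTITION",
    "SLURM_SSH_TARGET",
    "SLURM_REMOTE_PROJECT_ROOT",
    "SLURM_REMOTE_JOBS_ROOT",
    "SLURM_REMOTE_RUNNER_COMMAND",
    "JOBS_ROOT",
)


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical typed view of the deployment settings (same order as REQUIRED_KEYS)."""
    execution_mode: str
    slurm_account: str
    slurm_partition: str
    ssh_target: str
    remote_project_root: str
    remote_jobs_root: str
    remote_runner_command: str
    jobs_root: str


def load_config(env: Optional[Mapping[str, str]] = None) -> RuntimeConfig:
    """Read and validate the settings, rejecting unset or <placeholder> values."""
    env = os.environ if env is None else env
    problems = [
        key for key in REQUIRED_KEYS
        if not env.get(key, "").strip() or env[key].strip().startswith("<")
    ]
    if problems:
        raise ValueError("unset or placeholder settings: " + ", ".join(problems))
    if env["EXECUTION_MODE"].strip() != "slurm":
        # Assumption: the RC deployment always submits through Slurm.
        raise ValueError("EXECUTION_MODE must be 'slurm' for the RC deployment")
    # Field order in RuntimeConfig matches REQUIRED_KEYS.
    return RuntimeConfig(*(env[key].strip() for key in REQUIRED_KEYS))
```

Calling load_config() once at service startup surfaces every unfinished setting in a single error message, rather than as a failure on the first submitted job.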

7. Storage Requirements

The final RC deployment should avoid using Rivanna /scratch as the durable system of record for jobs.

The target shared storage should support:

  • per-job input parquet
  • metadata JSON
  • synthetic CSV output
  • runner stdout/stderr logs
  • Slurm script persistence
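One way to keep these artifacts together is a single directory per job under JOBS_ROOT, visible at SLURM_REMOTE_JOBS_ROOT on the Rivanna side. The file names below are hypothetical and may not match what the job manager actually writes.

```python
from pathlib import Path
from typing import Dict

# Hypothetical file names for one job's artifacts, relative to its directory.
JOB_FILES = {
    "input": "input.parquet",
    "metadata": "metadata.json",
    "output": "synthetic.csv",
    "stdout": "logs/runner.stdout.log",
    "stderr": "logs/runner.stderr.log",
    "slurm_script": "job.slurm",
}


def job_paths(jobs_root: str, job_id: str) -> Dict[str, Path]:
    """Return where each artifact for job_id is expected under jobs_root."""
    # Reject IDs that could escape the jobs root, such as "../other".
    if not job_id or job_id in {".", ".."} or "/" in job_id or "\\" in job_id:
        raise ValueError(f"invalid job id: {job_id!r}")
    job_dir = Path(jobs_root) / job_id
    return {name: job_dir / rel for name, rel in JOB_FILES.items()}


def create_job_dir(jobs_root: str, job_id: str) -> Dict[str, Path]:
    """Create the job directory and its logs/ subdirectory, then return the paths."""
    paths = job_paths(jobs_root, job_id)
    for path in paths.values():
        path.parent.mkdir(parents=True, exist_ok=True)
    return paths
```

With the same relative layout on both sides, a Rivanna job could locate its inputs from the job ID and SLURM_REMOTE_JOBS_ROOT alone.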

8. Security Requirements

  • SSH private key must be mounted as a secret, not embedded in the image
  • the submit host's SSH host key should be pinned in known_hosts
  • uploaded datasets should remain scoped to the project’s authorized environment
  • public ingress should be limited to the intended audience if this is still an internal prototype
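For the SSH-backed Slurm executor, the key and known_hosts requirements translate into a handful of standard OpenSSH options. The sketch below assumes the secret is mounted under a hypothetical /var/run/secrets/privsyn-ssh/ path and that jobs are submitted with sbatch --parsable. It is not taken from the repository's executor.

```python
import shlex
from typing import List, Sequence

# Hypothetical mount points for the Kubernetes secret holding SSH material.
DEFAULT_KEY = "/var/run/secrets/privsyn-ssh/id_ed25519"
DEFAULT_KNOWN_HOSTS = "/var/run/secrets/privsyn-ssh/known_hosts"


def ssh_command(
    target: str,
    remote_argv: Sequence[str],
    key_path: str = DEFAULT_KEY,
    known_hosts_path: str = DEFAULT_KNOWN_HOSTS,
) -> List[str]:
    """Build an ssh argv that uses the mounted key and trusts only pinned host keys."""
    # ssh joins remote arguments into one string for the remote shell,
    # so quote each argument explicitly.
    remote = " ".join(shlex.quote(arg) for arg in remote_argv)
    return [
        "ssh",
        "-i", key_path,
        "-o", "IdentitiesOnly=yes",          # use only the mounted key
        "-o", "BatchMode=yes",               # never prompt for passwords or host keys
        "-o", "StrictHostKeyChecking=yes",   # refuse hosts missing from known_hosts
        "-o", f"UserKnownHostsFile={known_hosts_path}",
        target,
        remote,
    ]


def parse_sbatch_parsable(stdout: str) -> str:
    """Extract the job ID from `sbatch --parsable` output ("<jobid>[;<cluster>]")."""
    return stdout.strip().split(";", 1)[0]
```

BatchMode=yes makes ssh fail instead of prompting. StrictHostKeyChecking=yes refuses any host absent from the pinned file, so a rotated or spoofed host key surfaces as a submission error rather than a silent trust decision. The returned list can be passed to subprocess.run(..., capture_output=True, text=True, check=True). Note that ssh refuses private keys readable by other users, so the secret volume's defaultMode should restrict permissions (for example 0400).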

9. Acceptance Tests Before Launch

  • frontend loads successfully through the RC ingress
  • upload and metadata inference succeed
  • /generate submits a Slurm job
  • /status/{job_id} transitions correctly
  • output CSV is generated and downloadable
  • a failing job is reported as failed cleanly, with its logs available
  • job files appear in the shared jobs directory
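The status checks can be scripted so the acceptance run is repeatable. This sketch polls a status fetcher until the job reaches a terminal state. It assumes /status/{job_id} returns JSON with a status field, and that completed and failed are the terminal values. This checklist does not confirm either assumption.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumed terminal status values; only "failed" is named in the checklist.
TERMINAL_STATUSES = {"completed", "failed"}


def http_status_fetcher(base_url: str) -> Callable[[str], str]:
    """Return a fetcher for GET {base_url}/status/{job_id}.

    Assumes the endpoint returns a JSON object with a "status" field.
    """
    def fetch(job_id: str) -> str:
        url = f"{base_url.rstrip('/')}/status/{job_id}"
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)["status"]
    return fetch


def wait_for_terminal_status(
    fetch_status: Callable[[str], str],
    job_id: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A run against the RC ingress might look like wait_for_terminal_status(http_status_fetcher("https://<rc-hostname>"), job_id). Here job_id would come from the /generate response, whose shape is not specified here. The clock and sleep parameters exist so the loop can be tested without waiting.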

10. Operational Follow-Up

  • define job retention/cleanup policy
  • define storage quota monitoring
  • decide whether completed jobs should be archived or pruned automatically
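If the team settles on age-based pruning, a cleanup job that runs in dry-run mode first is a low-risk starting point. The minimal sketch below assumes one directory per job directly under JOBS_ROOT. It uses each directory's modification time as a rough proxy for job age. The function names are hypothetical.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def find_expired_jobs(
    jobs_root: str, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """List job directories under jobs_root older than max_age_days.

    Uses each directory's mtime, which changes when entries are created,
    removed, or renamed in that directory, not when an existing file's
    contents change.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path for path in Path(jobs_root).iterdir()
        # Skip symlinks so rmtree is never pointed at a link target.
        if path.is_dir() and not path.is_symlink() and path.stat().st_mtime < cutoff
    )


def prune_jobs(jobs_root: str, max_age_days: float, dry_run: bool = True) -> List[Path]:
    """Delete expired job directories; with dry_run=True, only report them."""
    expired = find_expired_jobs(jobs_root, max_age_days)
    if not dry_run:
        for path in expired:
            shutil.rmtree(path)
    return expired
```

Logging the dry-run list for a while lets the team review what a given policy would delete before enabling actual removal.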
  • decide whether authentication should be added before broader access

11. Repository Pointers

  • deploy/rc/README.md
  • deploy/rc/env.rivanna.example
  • deploy/rc/sync_to_rivanna.sh
  • deploy/rc/bootstrap_rivanna.sh
  • deploy/rc/k8s/
  • docs/nsf-prototype-deployment.md