How does the model stay inside our perimeter?

We deploy an open-weights LLM on infrastructure you control: your servers, your VPC, or an air-gapped environment. The deployment is configured with no outbound network access. Inference, fine-tuning, and retrieval all run against systems already inside your network. We hand you the deployment artifacts; you operate them.
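One way your team could spot-check the no-egress configuration from inside the deployment host is a simple TCP probe. The sketch below is illustrative only and is not part of the shipped artifacts: the function names `can_reach` and `egress_violations` are hypothetical, and the list of targets to probe is something you would choose yourself. It complements firewall and routing review; it does not replace it.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and
        # unreachable networks, all of which mean "no egress" here.
        return False


def egress_violations(targets, timeout: float = 2.0):
    """Return the (host, port) pairs that were reachable.

    An empty list means none of the probed targets could be reached.
    `targets` is a hypothetical list you supply, e.g. public endpoints
    that should be blocked from the inference host.
    """
    return [(host, port) for host, port in targets if can_reach(host, port, timeout)]
```

Run from the inference host with a list of external endpoints of your choosing, a non-empty result from `egress_violations` indicates a path out of the perimeter that should be investigated.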

What about updates and retraining?

Model updates ship as signed artifacts that your team applies on a cadence you control. There is no auto-update channel, no telemetry, no phone-home. Retraining runs on your infrastructure against your data - we do not request exports.
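As an illustration of what applying an update on your own cadence can look like, the sketch below checks downloaded files against a digest manifest before anything is installed. It rests on assumptions that are not confirmed here: that an update includes a JSON manifest with a `files` map of file names to SHA-256 hex digests, and that the signature on the manifest itself has already been verified with whatever signing tool accompanies the artifacts. The manifest layout and the function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(artifact_dir: Path, manifest_path: Path) -> List[str]:
    """Return the names of files that are missing or fail the digest check.

    Assumes a hypothetical manifest of the form
    {"files": {"<file name>": "<sha256 hex digest>", ...}}
    whose own signature was verified in a separate step.
    """
    manifest = json.loads(manifest_path.read_text())
    failures = []
    for name, expected in manifest["files"].items():
        target = artifact_dir / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            failures.append(name)
    return failures
```

An empty list from `verify_against_manifest` means every listed file is present and matches its digest; any other result is a reason to stop before applying the update.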

How is this different from running ChatGPT in your tenant?

Hosted models, even in dedicated tenants, sit behind a vendor's control plane. Logs, metadata, and request payloads typically transit vendor infrastructure under the vendor's terms of service. A Vectra deployment sends nothing through vendor infrastructure: the model files live on your disk, inference runs on your CPUs/GPUs, and the audit trail belongs to you.

Which deployment modes do you support?

On-prem. Inference on hardware your team owns and patches.
VPC. Inference inside your AWS, GCP, or Azure account, isolated by your existing network controls.
Air-gapped. No network connectivity at all. Updates ship on signed media, so every update is a physical transfer; teams that choose this mode accept that operational tradeoff in exchange for isolation.

What controls do you align with?

We design deployments to align with SOC 2, ISO 27001, and the relevant sector frameworks (HIPAA, PCI DSS, and BSA/AML for financial workflows). We do not certify your environment; we hand you the artifacts and the documentation your auditor needs.
