Documentation

Everything you need to launch GPU instances, train models, and deploy inference endpoints on FuturaNexus.

Quick Start

Account setup, SSH keys, first instance in 60 seconds

Instance Types

GPU specs, RAM, vCPUs, NVLink, multi-GPU configs, spot vs on-demand

Environments

Pre-configured PyTorch, JAX, TensorFlow, vLLM, ComfyUI, Ollama

SSH & Access

Zero-drop SSH, Jupyter Lab, VS Code Server, HTTP ports

Training

Closed-Form Newton, Reasoning Alignment, SFT/DPO/GRPO/LoRA/QLoRA, multi-stage pipelines, scaled training

Data Pipeline

Cloud connectors, transfer manager, model cache, dataset catalog

Models & Inference

Import, deploy, OpenAI-compatible API endpoints

Storage

Persistent volumes, BYO S3/R2/Wasabi, anti-orphan TTL

Providers

Managed vs BYO (Vast.ai, Lambda, RunPod, Hyperbolic, CoreWeave), API key setup

Billing

Per-second billing, budget caps, auto-shutdown, invoices

API Reference

REST API, authentication, WebSocket events, rate limits

CLI

Command-line interface for scripting and CI/CD automation

Quick Start

1. Create an Account

Sign up at futuranexus.io/login with GitHub or Google. Add a payment method and you're ready to launch. No minimum spend required.

2. Add SSH Key

Add your public key under Settings → SSH Keys → Add SSH Key. If you don't have a key yet, generate one first:

ssh-keygen -t ed25519 -C "your@email.com" -f ~/.ssh/futuranexus
cat ~/.ssh/futuranexus.pub  # Copy this to Settings → SSH Keys

Ed25519 recommended. RSA 4096-bit and ECDSA also supported. Keys are auto-injected into every new instance.

3. Launch an Instance

Dashboard → Launch Instance → Select instance type (GPU + RAM + vCPUs) → Choose environment (PyTorch, JAX, etc.) → Enable access methods (SSH, Jupyter, VS Code) → Set budget cap → Launch. Running in under 30 seconds.

Example launch form (dashboard.futuranexus.app/instances/new). Form fields: instance name, GPU, environment. Billed per second.

GPU	VRAM	Rate
H100 SXM (popular)	80 GB	$3.00/hr
A100 80GB	80 GB	$2.00/hr
RTX 4090	24 GB	$0.50/hr
A6000	48 GB	$0.80/hr

Environment options: PyTorch 2.4, vLLM, JAX, Custom

Example summary: H100 SXM with PyTorch 2.4 at $3.00/hr ($0.000833 per second); 3,200 credits available.

4. Connect

Use the in-browser terminal, Jupyter Lab link, VS Code Server link, or connect via native SSH:

ssh -i ~/.ssh/futuranexus root@<instance-ip>
# Or click "Terminal" in the dashboard — zero-drop, session persists across disconnects

5. Start Training (Optional)

Dashboard → Training → New Job → Select a base model (Qwen3, Llama, Gemma, etc.) → Upload dataset → Configure method (Closed-Form Newton for single-pass optimization, Reasoning Alignment, SFT, DPO, GRPO, LoRA, QLoRA) → Enable multi-stage pipeline or scaled training → Start. GPU is auto-selected based on model size.

Instance Types

Available GPUs

Each instance type is a complete machine: GPU(s) + system RAM + vCPUs + local SSD. Multi-GPU configs (×2, ×4, ×8) are provisioned as a single node with NVLink/NVSwitch high-speed interconnect — not separate machines.

GPU	VRAM	System RAM	vCPUs	Interconnect	Memory BW	FP16 TFLOPS
RTX 4090	24 GB	64 GB	16	—	1.0 TB/s	330
L40S	48 GB	128 GB	16	—	864 GB/s	362
A100 40GB	40 GB	128 GB	12	NVLink 3	1.5 TB/s	312
A100 80GB	80 GB	256 GB	16	NVLink 3	2.0 TB/s	312
H100 SXM	80 GB	256 GB	26	NVLink 4	3.35 TB/s	989
H200 SXM	141 GB	480 GB	32	NVLink 4	4.8 TB/s	989
B200 (coming)	192 GB	512 GB	48	NVLink 5	8.0 TB/s	2,250

A100, H100, and H200 instances are available in ×1, ×2, ×4, and ×8 configurations. Full 8-GPU nodes use NVSwitch for maximum interconnect bandwidth.

On-Demand vs Spot

On-demand: Pay per second, stop anytime. No interruptions. Best for inference servers, interactive development, and long-running jobs that can't tolerate interruption.

Spot: 40-60% off on-demand rates. Uses spare GPU capacity. Your instance can be reclaimed with 60 seconds notice if demand spikes. Best for training with checkpointing — if preempted, resume from the last checkpoint.

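Tip: On spot instances, save checkpoints to a persistent volume every N steps. If preempted, launch a new spot instance, re-attach your persistent volume, and resume from the last checkpoint. Note that the "Gradient checkpointing" training option only reduces VRAM usage; it does not save training state.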
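The resume step can be sketched in a few lines of Python. This is a minimal illustration, not the platform's mechanism: the `checkpoint-<step>` directory naming and both function names are assumptions. Steps are compared numerically so that `checkpoint-1000` ranks above `checkpoint-900`.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical naming scheme: checkpoint-<step> directories on the persistent volume.
CKPT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(root: Path) -> Optional[Path]:
    """Return the checkpoint directory with the highest step, or None if there is none."""
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = CKPT_PATTERN.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path


def should_save(step: int, every_n: int) -> bool:
    """True on every N-th training step (steps counted from 1)."""
    return step > 0 and step % every_n == 0
```

On a fresh spot instance, a training script would call `latest_checkpoint` on the re-attached volume's mount path and start from scratch only if it returns `None`.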

Instance Lifecycle

Requested → Provisioning → Booting → Running → Stopping → Terminated. Failed instances show error details in the dashboard. Reaching the budget cap triggers a 5-minute warning before auto-termination.

Auto-shutdown on idle: Configurable idle timeout (15 min–2 hours). Triggers when no SSH sessions are active and GPU utilization is below 5%. Prevents forgotten instances from burning credits.
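The idle rule can be read as a predicate over recent monitoring samples. The sketch below is one plausible interpretation: the `Sample` fields and the requirement that the idle condition hold continuously for the whole timeout are assumptions, while the 5% threshold comes from the text above.

```python
from dataclasses import dataclass

GPU_IDLE_THRESHOLD_PCT = 5.0  # from the docs: GPU utilization below 5%


@dataclass
class Sample:
    """One monitoring sample; the field names are hypothetical."""
    timestamp_s: float
    ssh_sessions: int
    gpu_util_pct: float


def is_idle(sample: Sample) -> bool:
    return sample.ssh_sessions == 0 and sample.gpu_util_pct < GPU_IDLE_THRESHOLD_PCT


def should_shut_down(samples: list[Sample], timeout_s: float) -> bool:
    """True if the trailing run of idle samples spans at least timeout_s seconds."""
    if not samples:
        return False
    newest = samples[-1].timestamp_s
    idle_since = None
    for sample in reversed(samples):  # walk back until the first non-idle sample
        if not is_idle(sample):
            break
        idle_since = sample.timestamp_s
    return idle_since is not None and newest - idle_since >= timeout_s
```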

Environments

Pre-Configured Environments

Every environment includes CUDA toolkit, cuDNN, and NCCL. Higher-level environments add framework-specific packages. All environments include nvidia-smi, htop, tmux, git, and wget.

🔥 PyTorch 2.5 + CUDA 12.4: torch, torchvision, transformers, accelerate, peft, bitsandbytes, datasets, wandb, tensorboard, jupyter

🧪 JAX + CUDA 12.4: jax, flax, optax, orbax, jupyter

📊 TensorFlow 2.17: tensorflow, keras 3, tensorboard, jupyter

⚡ vLLM Inference: vLLM server with OpenAI-compatible API

🎨 ComfyUI: Stable Diffusion with ComfyUI node-based workflow

🦙 Ollama: Run and serve open-weight LLMs via CLI

🔧 CUDA 12.4 Minimal: CUDA toolkit, cuDNN, NCCL only

🐧 Ubuntu 22.04 (Bare): Clean OS, full root access

Startup Scripts

Optional bash scripts that run automatically after the environment boots. Use them to clone repos, install extra packages, download datasets, or set environment variables.

#!/bin/bash
# Clone your repo
git clone https://github.com/your-org/your-repo.git /workspace/repo
cd /workspace/repo && pip install -r requirements.txt

# Download a HuggingFace dataset
python -c "from datasets import load_dataset; load_dataset('tatsu-lab/alpaca').save_to_disk('/data/alpaca')"

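# Set env vars (append to ~/.bashrc so later SSH sessions inherit them;
# a plain export only lasts for this script's own process)
echo 'export WANDB_API_KEY=your_key_here' >> ~/.bashrc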

Startup script logs are visible in the terminal after boot.

SSH & Access Methods

Every running instance exposes four access methods from the instance card: an in-browser zero-drop terminal, native SSH, Jupyter Lab, and VS Code Server. Close the tab and the session keeps running; reconnect with full scrollback.

Example instance card (dashboard.futuranexus.app/instances/llama-finetune): llama-finetune · Running · H100 SXM · 198.51.100.42 · up 1h 12m · PyTorch 2.5

Access method	Details
Web Terminal	Zero-drop, in-browser
Native SSH	Your client + key
Jupyter Lab	Port 8888, token
VS Code Server	Port 8080, IDE

Example web terminal session (root@llama-finetune); the session persists while the browser tab is closed:

$ nvidia-smi --query-gpu=name,memory.used --format=csv,noheader
H100 SXM, 41216 MiB
$ tail -f train.log
step 1420/4000  loss 0.612  lr 1.8e-4  12.4 tok/s/gpu
# ── browser tab closed, reopened 3 min later ──
$ # …still here. scrollback intact.

Zero-Drop SSH

Our control plane maintains a persistent SSH connection to your instance. When you connect via the web terminal or native SSH, you're proxied through this persistent link. If your browser disconnects, the SSH session continues. Reconnect and pick up exactly where you left off — full scrollback preserved.

Native SSH

# Basic connection
ssh -i ~/.ssh/futuranexus root@<instance-ip>

# With SSH config (~/.ssh/config)
Host fn-*
  User root
  IdentityFile ~/.ssh/futuranexus
  ServerAliveInterval 30
  ServerAliveCountMax 3

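# Then: ssh fn-<instance-name>
# (the alias must resolve to the instance; if it doesn't, add a per-instance
#  Host block with HostName <instance-ip>)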

Jupyter Lab

Auto-starts on port 8888 when enabled. Accessible directly via a dashboard link — no port forwarding needed. Token is auto-generated and shown in the instance detail page.

VS Code Server

Full VS Code IDE in your browser via code-server on port 8080. Extensions, themes, keybindings, settings sync — everything works. Accessible via dashboard link.

Port Forwarding & HTTP Ports

# Forward Jupyter manually (if not using dashboard link)
ssh -i ~/.ssh/futuranexus -L 8888:localhost:8888 root@<instance-ip>

# Forward TensorBoard + Gradio
ssh -i ~/.ssh/futuranexus -L 6006:localhost:6006 -L 7860:localhost:7860 root@<instance-ip>

# Or enable "Expose HTTP Ports" at launch to auto-generate
# public URLs for ports 7860-9000 (Gradio, Streamlit, FastAPI, etc.)

File Transfer

# Upload dataset
scp -i ~/.ssh/futuranexus data.jsonl root@<ip>:/workspace/data/

# Download trained model
scp -i ~/.ssh/futuranexus -r root@<ip>:/workspace/output/ ./local-output/

# Rsync for large transfers (-P keeps partial files so an interrupted run can resume)
rsync -avzP -e "ssh -i ~/.ssh/futuranexus" root@<ip>:/workspace/checkpoints/ ./checkpoints/

Training

Go from a base model to a fine-tuned checkpoint without touching a launch script. Pick a model, point at your data, and choose a method; the platform sizes the GPU, streams live loss curves, and registers the result.

Example training job form (dashboard.futuranexus.app/training/new). GPU is auto-selected by model size.

Base model	Parameters	Est. VRAM (method)
Llama 3.3 70B (popular)	70B	~40 GB (QLoRA)
Qwen3 32B	32B	~22 GB (QLoRA)
Gemma 3 12B	12B	~10 GB (LoRA)
Mistral Small 24B	24B	~16 GB (QLoRA)

Dataset: format detected as JSONL (instruct), 52,002 rows, validated.

Method	Description
LoRA	Adapter weights only
QLoRA	4-bit base + adapters
SFT	Full supervised tune

Example summary: Llama 3.3 70B, QLoRA (4-bit), A100 80GB, 3 epochs, estimated time ~2h 10m; fits on one GPU.

Supported Methods

SFT (Supervised Fine-Tuning)

Standard fine-tuning on instruction/response pairs. Best for teaching a model new tasks or domains.

DPO (Direct Preference Optimization)

Align model with human preferences using chosen/rejected pairs. No reward model needed.

GRPO (Group Relative Policy Optimization)

RL-based alignment with verifiable rewards. Best for math, code, and factual accuracy.

LoRA (Low-Rank Adaptation)

Parameter-efficient fine-tuning — trains small adapter weights, not the full model. 10-100x less VRAM.

QLoRA (Quantized LoRA)

LoRA on a 4-bit quantized base model. Train 70B models on a single A100 80GB.

Model Selection

Select any public model from the HuggingFace Hub or enter a custom model ID. Popular models (Llama 4, Llama 3.3, Gemma 3, Mistral Small, Phi-4, DeepSeek R1) are pre-listed with VRAM estimates. GPU is auto-selected based on model size — override manually if needed.
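As a rough illustration of how a VRAM estimate relates to model size, the sketch below computes the memory taken by the base weights alone and adds a flat margin. The 20% overhead and both function names are assumptions for illustration, not the platform's sizing rule; real usage also depends on sequence length, batch size, and adapter rank.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Memory for the base weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * 1e9 * bits_per_param / 8 / 1e9


def fits(params_billions: float, bits_per_param: int, vram_gb: float,
         overhead_fraction: float = 0.2) -> bool:
    """Rough check: weights plus a hypothetical flat overhead must fit in VRAM."""
    needed = weight_memory_gb(params_billions, bits_per_param) * (1 + overhead_fraction)
    return needed <= vram_gb


# A 70B model at 4 bits per weight needs 35 GB for weights alone.
print(weight_memory_gb(70, 4))
```

Under these assumptions a 4-bit 70B model fits on one 80 GB GPU, while the same model in 16-bit (140 GB of weights) does not.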

Dataset Formats

JSONL (Chat): {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}

JSONL (Instruct): {"instruction": "...", "output": "..."}

JSONL (ShareGPT): {"conversations": [{"from": "human", "value": "..."}, {"from": "gpt", "value": "..."}]}

CSV: Columns: instruction, input (optional), output

Parquet: HuggingFace Datasets format, columns auto-detected

Upload directly or provide a HuggingFace dataset ID. Format is auto-detected and validated before training starts. Max 10 GB per upload.
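To check a JSONL file locally before uploading, the three JSONL layouts can be told apart by their top-level keys. This is a sketch of that idea only; the platform's own detection logic is not documented here, and the function names are hypothetical.

```python
import json
from typing import Optional


def detect_format(record: object) -> Optional[str]:
    """Classify one parsed JSONL record by its top-level keys; None if unrecognized."""
    if not isinstance(record, dict):
        return None
    if isinstance(record.get("messages"), list):
        return "chat"
    if isinstance(record.get("conversations"), list):
        return "sharegpt"
    if "instruction" in record and "output" in record:
        return "instruct"
    return None


def detect_file_format(lines: list[str]) -> str:
    """Require every non-empty line to parse as JSON and agree on a single format."""
    formats = {detect_format(json.loads(line)) for line in lines if line.strip()}
    if len(formats) != 1 or None in formats:
        raise ValueError(f"mixed or unrecognized formats: {formats}")
    return formats.pop()
```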

Training Options

Gradient checkpointing: Reduces VRAM usage at the cost of ~20% slower training. Enabled by default for models >7B.

Completions-only: Mask prompt tokens during training — the model only learns from responses, not instructions. Reduces noise.

Early stopping: Automatically stops training when validation loss stops improving (patience: 3 eval steps).

Push to HuggingFace Hub: Automatically push the trained LoRA adapter to your HuggingFace account on completion.
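Two of the options above reduce to a few lines of logic. The sketch below shows completions-only masking (prompt positions get the label -100, which PyTorch's cross-entropy loss ignores by default) and a patience-based early-stopping check. Function names and the `min_delta` parameter are assumptions; the platform's implementation may differ.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_labels(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    """Return (input_ids, labels) with prompt positions masked out of the loss."""
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def stop_index(val_losses: list[float], patience: int = 3, min_delta: float = 0.0):
    """Return the eval index at which training would stop, or None if it never does."""
    best = float("inf")
    evals_without_improvement = 0
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, evals_without_improvement = loss, 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return i
    return None
```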

Multi-Stage Pipeline

Chain multiple training stages in sequence. Example: Closed-Form Newton (fast adapter initialization in minutes) → Reasoning Alignment (align reasoning with constitutional principles) → SFT refinement (polish on curated examples). Each stage uses the output of the previous stage as its starting point.

Scaled Training

Multi-GPU and multi-node distributed training with configurable parallelism strategy. Supports DDP, FSDP (Fully Sharded), DeepSpeed ZeRO-2/3, and Pipeline Parallelism. Configure nodes, GPUs per node, micro-batch size, and communication backend (NCCL, Gloo, MPI).
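For data-parallel strategies (DDP, FSDP, ZeRO), the settings above combine into a global batch size as sketched here. The gradient-accumulation parameter is an assumption (it is not listed among the options above), and pipeline parallelism changes the arithmetic, so treat this as a data-parallel-only illustration.

```python
def global_batch_size(nodes: int, gpus_per_node: int, micro_batch_size: int,
                      grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimizer step under pure data parallelism."""
    return nodes * gpus_per_node * micro_batch_size * grad_accum_steps


# 2 nodes x 8 GPUs, micro-batch 4, 4 accumulation steps
print(global_batch_size(2, 8, 4, grad_accum_steps=4))  # 256
```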

Native Control Plane

For maximum performance, dispatch training to the native binary. Zero garbage collection pauses, direct Metal GPU compute on Apple Silicon unified memory, zero-copy CPU↔GPU transfer. The Closed-Form Newton engine runs natively with no Python overhead: optimal adapter weights are computed analytically via SVD.

Data Pipeline

Cloud Connectors

Connect external storage backends to pull and push data to your instances and the platform cache. Supported connectors:

AWS S3: Standard S3 buckets. Provide access key + secret or use IAM role.

Google Cloud Storage: GCS buckets via service account JSON key.

Azure Blob Storage: Azure containers via connection string or SAS token.

Cloudflare R2: S3-compatible. Uses R2 API tokens. Zero egress fees.

HuggingFace Hub: Pull models and datasets directly from HF. Uses your access token for gated content.

HTTP / URL: Direct download from any public or authenticated HTTPS URL.

Credentials are encrypted at rest. Configure in Settings → Storage Providers or inline when creating a connector.

Transfer Manager

Transfers are created automatically when you cache a model or import a dataset, or you can initiate them manually from a connector. Features:

Parallel streams: Large files are split into concurrent streams for maximum throughput.

Progress tracking: Real-time progress percentage, transfer speed, and ETA displayed in the dashboard.

Pause/Resume: Pause active transfers and resume later without re-downloading.

Auto-retry: Transient failures are automatically retried with exponential backoff.
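Retry with exponential backoff, as named in the last item, generally looks like the sketch below. The attempt count, delays, jitter range, and the `TransientError` marker are all assumptions for illustration; the transfer manager's actual parameters are not documented here.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, 5xx responses)."""


def retry(operation, max_attempts=5, base_delay_s=1.0, max_delay_s=60.0, sleep=time.sleep):
    """Call operation(); on TransientError wait base * 2**attempt (capped, jittered) and retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay * random.uniform(0.5, 1.0))  # jitter avoids synchronized retries
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.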

Model Cache

The platform maintains a shared NVMe cache for frequently-used models. When you cache a model (e.g., from HuggingFace Hub), it's stored on high-speed platform storage and can be instant-mounted to any instance — no re-download required.

Use the quick import bar on the Data Pipeline → Model Cache tab: enter a HuggingFace repo ID (e.g., meta-llama/Llama-3.1-8B-Instruct) and click Cache Model.

Cached models can be evicted when no longer needed. Models currently mounted to running instances cannot be evicted.

Dataset Catalog

Import datasets from HuggingFace Datasets, S3 buckets, or direct URLs. Imported datasets are indexed and can be used directly in training jobs.

Source prefixes: huggingface:// for HF Datasets, s3:// for S3 objects, or a direct HTTPS URL.

Streamable datasets: Large datasets can be streamed directly to training without requiring a full local copy.
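The source prefixes above can be parsed with plain string handling. A minimal sketch, assuming only the three forms listed; the function name and the returned labels are hypothetical.

```python
def parse_source(source: str) -> tuple[str, str]:
    """Split a dataset source string into (kind, location)."""
    for prefix, kind in (("huggingface://", "huggingface"), ("s3://", "s3")):
        if source.startswith(prefix):
            return kind, source[len(prefix):]
    if source.startswith("https://"):
        return "url", source  # direct URLs are kept whole
    raise ValueError(f"unsupported source: {source!r}")
```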

Models & Inference

Any model in your registry deploys as an OpenAI-compatible endpoint in a single click: the platform provisions vLLM, loads the weights, and warms up the endpoint. Point your client's base_url at it and you're live.

Example deployment form (dashboard.futuranexus.app/models/deploy). Any model in your registry can be selected, whether trained, imported, or uploaded.

Model	Format	Size
alpaca-llama-70b	safetensors	39 GB
qwen3-coder-sft	safetensors	16 GB
gemma3-support	GGUF Q4	7 GB

Instance	Rate
A100 80GB	$2.00/hr
H100 SXM	$3.00/hr
L40S	$1.10/hr

Example endpoint (live, OpenAI-compatible):

base_url = "https://api.futuranexus.io/v1"
model    = "alpaca-llama-70b"
# POST /chat/completions  → streaming ready

Example deployment: vLLM 0.6 backend, A100 80GB, 32k-token context, ~45 s cold start, $2.00/hr.

Model Registry

The model registry stores all your models — imported from HuggingFace, uploaded directly, trained on the platform, or pulled via URL. Each model tracks its format (safetensors, GGUF, ONNX, PyTorch), size, and deployment status.

Importing Models

Four import methods are available:

HuggingFace Hub: Enter a repo ID (e.g., meta-llama/Llama-3.1-8B-Instruct). Supports gated models with your HF access token.

URL: Direct download from any HTTPS URL. Format auto-detected from file extension.

File Upload: Upload model files directly via the browser. Supports multi-part uploads up to 50 GB.

From Training: Models trained on the platform are automatically added to the registry.

Deploying as Inference Endpoint

Any model in the registry can be deployed as an OpenAI-compatible inference endpoint with one click. The platform provisions a vLLM inference instance, loads the model weights, and generates an API endpoint URL.

# Example: call your deployed model
curl https://api.futuranexus.io/v1/chat/completions \
  -H "Authorization: Bearer fn_prod_sk_..." \
  -H "Content-Type: application/json" \
  -d '{"model": "your-model-name", "messages": [{"role": "user", "content": "Hello!"}]}'

Endpoints support the full OpenAI Chat Completions API including streaming, temperature, top_p, max_tokens, and stop sequences.
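The same call can be made from Python with only the standard library. The sketch below builds the request without sending it; the helper name is hypothetical, and the response shape in the trailing comment assumes the endpoint mirrors the OpenAI Chat Completions format as stated above.

```python
import json
import urllib.request

BASE_URL = "https://api.futuranexus.io/v1"  # from the curl example above


def build_chat_request(api_key: str, model: str, prompt: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST /chat/completions request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}], **params}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


# Sending it (not executed here):
#   with urllib.request.urlopen(req, timeout=30) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

Extra keyword arguments such as `temperature=0.2` or `max_tokens=64` are merged into the JSON body.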

Storage

Persistent volumes outlive any instance — create one, attach it at a mount path, and re-attach it to the next box in the same region. Or back it with your own S3 / R2 / Wasabi bucket and keep full ownership of the bytes.

Example volume form (dashboard.futuranexus.app/storage/new):

Volume size presets: 100 GB, 500 GB, 1 TB, Custom

Attach to instance: llama-finetune, mounted at /workspace/data

Backing store	Note
Managed SSD	$0.05/GB·mo
AWS S3	Your bucket
Cloudflare R2	Zero egress

Example summary: volume checkpoints-q3, 500 GB, Managed SSD, region us-east, $25.00/mo, anti-orphan TTL 7 days.

Storage Types

Ephemeral (free): Local NVMe SSD on the instance. Fast but lost on termination. Good for temp files, cache, and scratch space.

Persistent ($0.05/GB/mo managed): Network-attached SSD that survives instance termination. Can be re-attached to any new instance in the same region. Use for datasets, checkpoints, and trained models.

Object storage: S3-compatible storage for large datasets and artifacts. Not mounted as a filesystem — accessed via API or tools like aws s3 cp.

Anti-Orphan Protection

Detached persistent volumes have a configurable TTL (time-to-live). When a volume is detached from a terminated instance, the TTL countdown begins. If not re-attached within the TTL period, the volume is automatically deleted. This prevents forgotten volumes from silently accumulating storage charges. You'll receive a notification before deletion.
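The TTL countdown is simple date arithmetic. A minimal sketch, assuming the countdown starts at the detach timestamp and using hypothetical function names:

```python
from datetime import datetime, timedelta, timezone


def deletion_time(detached_at: datetime, ttl: timedelta) -> datetime:
    """Moment at which a detached volume becomes eligible for deletion."""
    return detached_at + ttl


def time_left(detached_at: datetime, ttl: timedelta, now: datetime) -> timedelta:
    """Remaining time before deletion, never negative."""
    return max(deletion_time(detached_at, ttl) - now, timedelta(0))


detached = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(deletion_time(detached, timedelta(days=7)))  # 2025-01-08 00:00:00+00:00
```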

BYO Storage Providers

Connect your own cloud storage in Settings → Storage Providers. Supported backends:

AWS S3 · Cloudflare R2 · Wasabi · Google Cloud Storage · Backblaze B2

When you create a persistent volume with a BYO provider, data is stored in your bucket — you retain full ownership. Credentials are encrypted at rest.

Compute Providers

FuturaNexus Managed (Default)

We provision GPUs from our provider network. Best availability, fastest launch times (under 30 seconds), automatic failover, and no setup required. Prices are as listed on the pricing page.

BYO (Bring Your Own) Provider

Use your own API key from a third-party provider. We handle orchestration, monitoring, zero-drop SSH, and the full dashboard experience — you just bring the compute. Supported providers:

Vast.ai · Lambda · RunPod · Hyperbolic · CoreWeave

Add your API key in Settings → Providers. A flat 5% orchestration fee applies for dashboard, monitoring, and SSH proxy services. You pay the provider directly for GPU compute.

Billing

Spend is visible to the second and fenced by guardrails you set: a hard budget cap with a 5-minute grace warning, and auto-shutdown when an instance goes idle. Nothing burns credits while you sleep.

Example billing view (dashboard.futuranexus.app/billing):

Live meter: $4.71 for one H100 instance at $3.00/hr ($0.000833/s) after 1h 34m 12s.

Budget cap: $4.71 of $20.00 used (presets: $10, $20, $50, Custom). Notifications at 80%, 95%, and 100%, then a 5-minute warning before auto-terminate.

Auto-shutdown on idle: no active SSH and GPU utilization below 5% for 30 minutes (presets: 15m, 30m, 1h, 2h).

Per-Second Billing

All GPU charges are metered per second. If you use an H100 for 47 seconds, you pay for exactly 47 seconds: no hourly minimums and no rounding up.
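The per-second arithmetic is the hourly rate divided by 3600, times the seconds used. A small sketch using `Decimal` to avoid floating-point drift; the function name is hypothetical and the $3.00/hr H100 rate is the one quoted elsewhere in these docs.

```python
from decimal import Decimal


def charge(rate_per_hour: str, seconds: int) -> Decimal:
    """Cost of `seconds` of usage at an hourly rate, with no rounding to whole hours."""
    return Decimal(rate_per_hour) * seconds / 3600


# 47 seconds on an H100 at $3.00/hr comes to roughly four cents.
print(round(charge("3.00", 47), 4))  # 0.0392
```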

Budget Caps

Set a maximum spend per instance. When the cap is reached, the instance auto-terminates with a 5-minute warning — enough time to save checkpoints and data. You'll receive notifications at 80%, 95%, and 100% of the cap.
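The notification thresholds can be modeled as a check between two consecutive meter readings. This is an illustrative sketch; the function name and the idea of comparing successive readings are assumptions, while the 80%, 95%, and 100% thresholds come from the text above.

```python
from decimal import Decimal

THRESHOLDS = (Decimal("0.80"), Decimal("0.95"), Decimal("1.00"))  # from the docs


def newly_crossed(previous_spend: Decimal, current_spend: Decimal, cap: Decimal) -> list[Decimal]:
    """Thresholds whose dollar value was passed between two meter readings."""
    return [t for t in THRESHOLDS if previous_spend < t * cap <= current_spend]
```

With a $20.00 cap, moving from $15.90 to $16.10 crosses only the 80% threshold ($16.00).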

Auto-Shutdown on Idle

Configurable idle timeout (15 min to 2 hours). Triggers when no SSH sessions are active and GPU utilization is below 5%. Prevents forgotten instances from burning credits. Recommended for all interactive development sessions.

Credits & Invoices

Pre-paid credits are applied first to all charges. Monthly invoices are generated for any usage beyond your credit balance. Download PDF invoices from the Billing page. All prices are in USD.

API Reference

Authentication

Create API keys in Settings → API Keys. Include the key in the Authorization header:

curl -H "Authorization: Bearer fn_prod_sk_..." \
  https://api.futuranexus.io/v1/instances

API keys can be scoped to specific resources (instances, training, storage, billing). Rate limit: 100 req/min per key.
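Scripts that call the API in a loop can stay under the 100 req/min limit with a client-side guard. The sliding-window limiter below is one way to do that; the class is hypothetical and not part of any FuturaNexus SDK, and the server may count requests differently.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: at most `limit` calls in any `window_s`-second window."""

    def __init__(self, limit=100, window_s=60.0, clock=time.monotonic):
        self.limit, self.window_s, self.clock = limit, window_s, clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()  # drop calls that have left the window
        if len(self.calls) < self.limit:
            return 0.0
        return self.window_s - (now - self.calls[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.calls.append(self.clock())
```

Before each request, sleep for `wait_time()` seconds if it is non-zero, then call `record()`.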

Endpoints

Method	Path	Description
GET	/v1/instances	List all instances
POST	/v1/instances	Launch new instance
GET	/v1/instances/:id	Get instance details + metrics
POST	/v1/instances/:id/stop	Stop instance
DELETE	/v1/instances/:id	Terminate & delete
GET	/v1/training	List training jobs
POST	/v1/training	Start training job
GET	/v1/training/:id	Get job details + metrics
GET	/v1/models	List trained models
POST	/v1/models/:id/deploy	Deploy as endpoint
GET	/v1/storage/volumes	List volumes
POST	/v1/storage/volumes	Create volume
GET	/v1/billing/usage	Current period usage
GET	/v1/billing/credits	Credit balance
GET	/v1/billing/invoices	Invoice history

WebSocket Events

Real-time instance metrics and state changes via WebSocket at wss://api.futuranexus.io/v1/ws. Events: instance.state_change, instance.metrics, training.progress, training.complete, billing.budget_warning.
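A client consuming these events typically routes each message to a handler by event name. The sketch below shows that routing only, with no network code. The event names come from the list above, but the `{"event": ..., "data": ...}` message envelope is an assumption, since the payload schema is not documented here.

```python
import json
from typing import Callable

# Event names are from the docs; the {"event": ..., "data": ...} envelope is hypothetical.
Handler = Callable[[dict], None]


class EventRouter:
    def __init__(self) -> None:
        self.handlers: dict[str, list[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        """Register a handler for one event name."""
        self.handlers.setdefault(event, []).append(handler)

    def dispatch(self, raw_message: str) -> int:
        """Parse one message and call matching handlers; returns how many ran."""
        message = json.loads(raw_message)
        handlers = self.handlers.get(message.get("event"), [])
        for handler in handlers:
            handler(message.get("data", {}))
        return len(handlers)
```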

CLI

Installation

# Install via npm
npm install -g @futuranexus/cli

# Or via Homebrew (macOS/Linux)
brew install futuranexus/tap/fn

# Authenticate
fn auth login

Common Commands

# List instances
fn instances list

# Launch instance
fn instances launch --gpu h100_sxm --env pytorch-full --name my-run

# SSH into instance
fn ssh my-run

# Start training
fn training start --model meta-llama/Llama-3.1-8B-Instruct --dataset ./data.jsonl --method sft

# Check training status
fn training status tj_001

# List volumes
fn storage list
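For CI/CD automation, the CLI can be driven from a Python script. The sketch below only builds the argument list, using the flags shown in the launch example above; the helper name is hypothetical and no other flags are assumed to exist.

```python
import shlex


def launch_command(gpu: str, env: str, name: str) -> list[str]:
    """argv for `fn instances launch`, using only the flags shown above."""
    return ["fn", "instances", "launch", "--gpu", gpu, "--env", env, "--name", name]


# To execute in CI (not run here): subprocess.run(launch_command(...), check=True)
print(shlex.join(launch_command("h100_sxm", "pytorch-full", "my-run")))
```

Passing arguments as a list rather than a shell string avoids quoting problems when instance names contain spaces or special characters.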