Pitch DGX + CUDA pilots
The briefing recommends offering Baseten a short DGX + CUDA pilot that covers post‑training/distillation work and benchmarking of vLLM‑Omni multi‑modal inference. The packets it mentions explicitly include targeted benchmarks and pilot credits. The suggestion ties Baseten’s Post‑Training research posting and Modal’s vLLM‑Omni release into a concrete pilot opportunity.
Baseten posted a Post‑Training Research Scientist role (posted March 18, 2026) that explicitly calls for experiments at multi‑node scale and work on 1T+ parameter model evaluations. (goremotejob.com)
Baseten’s research blog "Distillation without the dark" details distilling Qwen3‑4B from GPT‑5.2 using 8× H200 GPUs in roughly 10 hours on 20,000 examples via Baseten Training. (baseten.co)
Baseten closed a $300M financing at a $5B valuation on January 23, 2026, with NVIDIA named as an anchor investor alongside IVP and CapitalG. (businesswire.com)
vLLM‑Omni was published on GitHub as a framework for efficient omni‑modality inference (text, image, video, audio), explicitly designed for disaggregated, multi‑stage serving pipelines. (github.com)
The vLLM‑Omni docs and repo show platform‑agnostic installation paths (CUDA, ROCm, XPU) and recently added CI/nightly benchmark work, which signals ready targets for end‑to‑end throughput and latency comparisons. (docs.vllm.ai)
Baseten lists customers including Cursor, Notion and Gamma and positions its platform for both training and mission‑critical inference, matching the scale and workflows used in its published distillation experiments. (jobs.ashbyhq.com)
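To give the pilot‑credit discussion a starting number, the published distillation run cited above (8× H200, roughly 10 hours, 20,000 examples) can be turned into GPU‑hours. The sketch below is a rough sizing aid, not a Baseten or NVIDIA method: the `TrainingRun` and `pilot_gpu_hours` names, the run count and the headroom multiplier are all hypothetical, and the 10‑hour figure is approximate in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Footprint of one training or distillation run (hypothetical helper)."""

    num_gpus: int       # GPUs used concurrently
    wall_hours: float   # approximate wall-clock duration, in hours
    num_examples: int   # training examples processed

    @property
    def gpu_hours(self) -> float:
        # Total GPU time consumed by the run.
        return self.num_gpus * self.wall_hours

    @property
    def gpu_hours_per_1k_examples(self) -> float:
        # Normalized cost, useful for comparing runs of different sizes.
        return self.gpu_hours / (self.num_examples / 1000)


def pilot_gpu_hours(reference: TrainingRun, planned_runs: int,
                    headroom: float = 1.5) -> float:
    """Estimate GPU-hours for a pilot made of runs similar to `reference`.

    `planned_runs` and `headroom` are assumptions chosen by whoever scopes
    the pilot; `headroom` pads for retries, sweeps and evaluation jobs.
    """
    if planned_runs < 0:
        raise ValueError("planned_runs must be non-negative")
    if headroom < 1.0:
        raise ValueError("headroom must be at least 1.0")
    return reference.gpu_hours * planned_runs * headroom


# Figures taken from the distillation write-up cited in this briefing;
# the ~10 hour duration is approximate.
published = TrainingRun(num_gpus=8, wall_hours=10.0, num_examples=20_000)
```

With these inputs, `published.gpu_hours` is 80.0 and `published.gpu_hours_per_1k_examples` is 4.0. A hypothetical pilot of five comparable runs with the default 1.5× headroom, `pilot_gpu_hours(published, planned_runs=5)`, comes to 600.0 GPU‑hours. No pricing is assumed here; converting GPU‑hours into credits depends on rates the briefing does not state.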