Use these templates to bootstrap language-model workflows, from baseline inference to retrieval-augmented generation (RAG).
Run a 7B instruction-tuned model with vLLM for high token throughput
Adapter-based fine-tuning workflow for parameter-efficient LLM training
Minimal retrieval-augmented generation pipeline for document question answering
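
To make the retrieval-augmented generation template more concrete, here is a minimal sketch of the retrieve-then-prompt step. It is an illustration under stated assumptions, not the template's actual code: it uses bag-of-words cosine similarity where a real pipeline would use an embedding model and a vector store, and the names tokenize, cosine, retrieve, and build_prompt are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two token-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, k=2):
    # Score every document against the question and keep the top k
    # that share at least one token with it.
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    # Assemble the retrieved passages and the question into one prompt
    # string that can be passed to any text-generation model.
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    docs = [
        "vLLM serves models with paged attention.",
        "LoRA adapters train a small number of extra weights.",
        "Bananas are yellow.",
    ]
    print(build_prompt("How do LoRA adapters train?", docs, k=1))
```

The prompt string returned by build_prompt would then be sent to the generation model, for example the one served by the inference template. Swapping the word-count scoring for dense embeddings changes only retrieve; the prompt assembly stays the same.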