Unsloth: write parametric memory in with a fine-tune config
Write a valid Unsloth fine-tune config that bakes stable, always-needed domain knowledge into a small model so the knowledge is native rather than carried in a prompt on every call.
CI-verified, 2/2 fixtures passing.
Intended Use
For anyone who wants domain knowledge (jargon, house style, factual context) to be native to the model rather than pasted into every prompt. CI validates the fine-tune config: a model name, max_seq_length, a LoRA block with r set, and training hyperparameters including a learning rate and an output directory. The check needs no install, no GPU, and no API key. The actual fine-tune is fenced off from CI.
Not for
- Knowledge that changes weekly: parametric memory costs a retrain to update, so use retrieval memory (LlamaIndex, Cognee) for anything dynamic
- Fine-tuning without a GPU: Unsloth needs CUDA, and Google Colab's free-tier T4 is the cheapest no-hardware path
- Shipping the Unsloth Studio UI in a commercial product without reading the AGPL-3.0 terms
- Trusting the first fine-tune: overfitting and catastrophic forgetting are real risks, so evaluate on a hold-out set before shipping
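The hold-out advice is easiest to follow if the split happens before any training data is written. A minimal sketch, assuming your dataset is an array of records (for example, parsed lines of a JSONL file); splitHoldout, mulberry32, the 10% fraction, and the seed are illustrative choices, not part of Unsloth or this workflow's CI.

```javascript
// Hypothetical helper, not part of Unsloth: deterministic train/hold-out split.
// mulberry32 is a small seeded PRNG, so the same seed always gives the same split.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function splitHoldout(records, holdoutFraction = 0.1, seed = 42) {
  if (records.length < 2) throw new Error("need at least 2 records to hold any out");
  const rand = mulberry32(seed);
  const shuffled = records.slice();
  // Fisher-Yates shuffle driven by the seeded PRNG.
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  // Keep at least one hold-out row even for tiny datasets.
  const nHoldout = Math.max(1, Math.round(records.length * holdoutFraction));
  return { holdout: shuffled.slice(0, nHoldout), train: shuffled.slice(nHoldout) };
}

// Example with 20 toy records.
const records = Array.from({ length: 20 }, (_, i) => ({ id: i, text: `example ${i}` }));
const { train, holdout } = splitHoldout(records);
console.log(`train=${train.length} holdout=${holdout.length}`); // train=18 holdout=2
```

Seeding the shuffle keeps the hold-out set stable across retrains, so scores from successive fine-tunes stay comparable.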
The Stack
Tested Against
- unsloth docs 2026-07
- node@20
Side effects & data flow
- Network: none, local only
- Writes: ./finetune-config.json
- Credentials: none required
Prerequisites
- A CUDA-capable GPU or Google Colab for the actual fine-tune
- pip install unsloth (installs torch + transformers + trl; heavy, ~4 GB) for the fenced training step
Steps
- 1
Write the fine-tune config and validate it
Define the target model, context length, LoRA rank, and training hyperparameters. CI checks that the config parses and that all required fields are present. The actual fine-tune (which needs CUDA and your dataset) is fenced off from CI.
```shell
# Write the config: target model, context length, LoRA block, training hyperparameters.
cat > finetune-config.json <<'JSON'
{
  "model_name": "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
  "max_seq_length": 2048,
  "load_in_4bit": true,
  "lora": {
    "r": 16,
    "lora_alpha": 16,
    "lora_dropout": 0,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"]
  },
  "training": {
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
    "warmup_steps": 5,
    "max_steps": 60,
    "learning_rate": 2e-4,
    "output_dir": "./outputs"
  }
}
JSON

# Validate: the file must parse and every required field must be present.
node -e '
const fs = require("fs");
const c = JSON.parse(fs.readFileSync("finetune-config.json", "utf8"));
function bad(m) { console.error("BAD: " + m); process.exit(1); }
if (!c.model_name || typeof c.model_name !== "string") bad("model_name must be a string");
if (!c.max_seq_length || typeof c.max_seq_length !== "number") bad("max_seq_length must be a number");
const lora = c.lora || {};
if (!lora.r || typeof lora.r !== "number") bad("lora.r (rank) must be a number");
const train = c.training || {};
if (!train.learning_rate) bad("training.learning_rate must be set");
if (!train.output_dir) bad("training.output_dir must be set");
console.log("config OK: fine-tune " + c.model_name + " at seq=" + c.max_seq_length + " LoRA r=" + lora.r + " lr=" + train.learning_rate);
'
```
- 2
Run the fine-tune (the GPU step, not checked by CI)
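Install Unsloth (pip install unsloth), load the base model with FastLanguageModel.from_pretrained() using model_name, max_seq_length, and load_in_4bit, attach the LoRA adapters with FastLanguageModel.get_peft_model() using the lora block, then run TRL's SFTTrainer on your dataset with the training block's hyperparameters. Evaluate on a hold-out set before shipping. CI checks neither the training run nor output quality.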
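Before launching the GPU step, it helps to know how much data the training block actually consumes. A back-of-envelope sketch, assuming each optimizer step sees per_device_train_batch_size × gradient_accumulation_steps examples per device, with one example per sequence (no packing); trainingBudget is a hypothetical helper, not an Unsloth or TRL function, and the 1,000-row dataset is made up for illustration.

```javascript
// Hypothetical helper: what the "training" block in finetune-config.json implies.
// Assumes one example per sequence (no packing).
function trainingBudget(training, datasetSize, numDevices = 1) {
  const effectiveBatch =
    training.per_device_train_batch_size *
    training.gradient_accumulation_steps *
    numDevices;
  const examplesSeen = effectiveBatch * training.max_steps;
  return {
    effectiveBatch,
    examplesSeen,
    // Fraction of one pass over the dataset; below 1 means some rows are never seen.
    epochs: examplesSeen / datasetSize,
    warmupFraction: training.warmup_steps / training.max_steps,
  };
}

// Values copied from the step 1 config.
const training = {
  per_device_train_batch_size: 2,
  gradient_accumulation_steps: 4,
  warmup_steps: 5,
  max_steps: 60,
  learning_rate: 2e-4,
  output_dir: "./outputs",
};

const budget = trainingBudget(training, 1000);
console.log(budget); // effectiveBatch 8, examplesSeen 480, epochs 0.48
```

With max_steps=60 the run sees 480 examples, so most of a 1,000-row dataset is never trained on. Raise max_steps, or drop it and set num_train_epochs, if you want full passes.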
Eval, 2 fixtures
Last passed: verified today
- config-ok (contains; timeout 30s · max $0). Expected: config OK: fine-tune unsloth/Meta-Llama-3.1-8B-bnb-4bit at seq=2048 LoRA r=16
- clean-exit (exit_code; timeout 30s · max $0). Expected: 0
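The two fixtures are simple checks on the validation step's result: config-ok passes if stdout contains the expected line, and clean-exit passes if the process exits with code 0. A minimal sketch of that logic, assuming the harness captures stdout and the exit code; the fixture objects and runFixtures are hypothetical, not the CI's actual implementation, and the timeout and cost caps are not modeled.

```javascript
// Hypothetical re-creation of the two eval fixtures; not the CI's real harness.
const fixtures = [
  {
    name: "config-ok",
    kind: "contains",
    expected: "config OK: fine-tune unsloth/Meta-Llama-3.1-8B-bnb-4bit at seq=2048 LoRA r=16",
  },
  { name: "clean-exit", kind: "exit_code", expected: 0 },
];

// Apply each fixture to a captured run result { stdout, exitCode }.
function runFixtures(result) {
  return fixtures.map((f) => {
    let passed;
    if (f.kind === "contains") passed = result.stdout.includes(f.expected);
    else if (f.kind === "exit_code") passed = result.exitCode === f.expected;
    else throw new Error(`unknown fixture kind: ${f.kind}`);
    return { name: f.name, passed };
  });
}

// The validator's line also carries the learning rate (2e-4 prints as 0.0002),
// which is why config-ok is a "contains" check rather than an exact match.
const sample = {
  stdout: "config OK: fine-tune unsloth/Meta-Llama-3.1-8B-bnb-4bit at seq=2048 LoRA r=16 lr=0.0002\n",
  exitCode: 0,
};
console.log(runFixtures(sample));
```

A failing config prints its BAD message to stderr and exits 1, so both checks fail together in that case.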
Results
Unsloth reports 2x faster fine-tuning and 70% less VRAM than standard QLoRA, which can bring a fine-tune that would otherwise need an A100 within reach of a 24 GB consumer GPU. The knowledge lives in the fine-tuned weights, so it persists for as long as you use that checkpoint; updating it means retraining.
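A rough sketch of why the step 1 config is small enough to consider on consumer hardware, assuming the published Llama 3.1 8B shape (32 layers, hidden size 4096, 8 key/value heads of dimension 128, about 8 billion parameters). It counts only the 4-bit base weights and the trainable LoRA parameters for r=16 on q/k/v/o; activations, optimizer state, quantization overhead, and the KV cache are ignored, so real VRAM use is higher.

```javascript
// Back-of-envelope only. Architecture numbers are assumed from the public
// Llama 3.1 8B config; they are not read from the checkpoint.
const arch = {
  params: 8e9, // approximate total parameter count
  layers: 32,
  hidden: 4096,
  kvHeads: 8,
  headDim: 128,
};
const r = 16; // lora.r from finetune-config.json

// LoRA adds A (r x d_in) and B (d_out x r) to each targeted linear layer.
const loraParams = (dIn, dOut) => r * (dIn + dOut);

const kvDim = arch.kvHeads * arch.headDim; // 1024: k_proj/v_proj output width under GQA
const perLayer =
  loraParams(arch.hidden, arch.hidden) + // q_proj
  loraParams(arch.hidden, kvDim) + // k_proj
  loraParams(arch.hidden, kvDim) + // v_proj
  loraParams(arch.hidden, arch.hidden); // o_proj
const trainable = perLayer * arch.layers;

// 4-bit weights take half a byte per parameter.
const baseWeightGB = (arch.params * 0.5) / 1e9;

console.log(`trainable LoRA params: ${trainable}`); // 13631488
console.log(`share of base params: ${((trainable / arch.params) * 100).toFixed(2)}%`); // 0.17%
console.log(`4-bit base weights: ~${baseWeightGB.toFixed(1)} GB`); // ~4.0 GB
```

The adapters are a small fraction of the model, which is why LoRA on a 4-bit base needs far less memory than full fine-tuning. Measure actual peak VRAM on your own GPU rather than relying on this estimate.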
Liked this workflow?
Get new verified workflows in WebAfterAI, three issues a week (Tue, Thu, Sat).