---
name: meta-llama3-3b-auto-finetune
description: Automated end-to-end pipeline to generate training data, fine-tune Meta Llama 3 (3B) with adapters, run adapter and merged-model tests, convert to GGUF, quantize, and produce final tested model. Triggers: "auto-finetune", "finetune pipeline", "llama3 3b pipeline"
---

# Skill: Meta Llama 3 (3B) Automatic Fine-tune Pipeline

Purpose

This Skill defines a fully automated pipeline that generates task-specific training data, fine-tunes Meta Llama 3 (3B) with adapter-style training (PEFT), tests the adapter-only checkpoints, merges the adapters into the base model and re-tests, converts the merged model to GGUF, quantizes it, and runs final tests. The pipeline is intended for a single developer working on a personal laptop or server, and it sends brief notifications at key milestones.

Step-by-step instructions Claude must follow

1. Pre-flight checks
   - Verify that the required tools and libraries are installed, and list any that are missing: Python >= 3.9, PyTorch, transformers, peft, accelerate, bitsandbytes (optional, for 8-bit), llama.cpp or ggml tools for GGUF, sentencepiece/tokenizers if needed, and evaluation utilities (BLEU/accuracy scripts). Use reasonable defaults when versions are unknown.
   - Confirm available GPU/CPU resources and disk space; if resources are low, warn the user and adapt the defaults (smaller batch size, fewer epochs).
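
The checks above could be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual implementation: the module lists mirror the libraries named in this step, and `min_free_gb` is an assumed threshold that should be tuned to the model and quantization choices.

```python
import importlib.util
import shutil
import sys

# Import names of the libraries listed in this step; the optional ones are
# reported but do not block the run.
REQUIRED_MODULES = ["torch", "transformers", "peft", "accelerate"]
OPTIONAL_MODULES = ["bitsandbytes", "sentencepiece"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def preflight(workdir=".", min_free_gb=30.0):
    """Build a small report; min_free_gb is an assumed, adjustable threshold."""
    free_gb = shutil.disk_usage(workdir).free / 1e9
    return {
        "python_ok": sys.version_info >= (3, 9),
        "missing_required": missing_modules(REQUIRED_MODULES),
        "missing_optional": missing_modules(OPTIONAL_MODULES),
        "free_disk_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    print(preflight())
```

If `missing_required` is non-empty, list those packages to the user before continuing; if `disk_ok` is false, warn and fall back to smaller defaults.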

2. Collect configuration (use sensible defaults if user provides none)
   - Base model path or Hugging Face identifier for Meta Llama 3 3B.
   - Task description (a short plain-English prompt describing the service-specific function). If none is provided, prompt the user once; if there is still none, use the default placeholder: "task: implement my_service_tool behavior".
   - Number of synthetic examples to generate (default 2000), validation split (default 10%), random seed, epochs for adapter training (default 3), adapter rank and alpha (default rank 8), learning rate, batch size (default 8).
   - Quantization target (e.g., 4-bit/8-bit) and GGUF output path.
   - Notification preference: summarize at milestones (data ready, training start, training end, adapter test complete, merge complete, GGUF ready, quantization complete, final test passed/failed).
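
One way to hold this configuration is a small dataclass that carries the documented defaults and rejects unknown keys. This is a sketch under assumptions: the class name, `load_config`, and the values marked "assumed" (seed, alpha, learning rate, GGUF path) are illustrative and not specified by this Skill.

```python
from dataclasses import dataclass, fields


@dataclass
class PipelineConfig:
    model_id: str  # base model path or Hugging Face identifier (required)
    task_description: str = "task: implement my_service_tool behavior"
    examples: int = 2000
    validation_split: float = 0.10
    epochs: int = 3
    adapter_rank: int = 8
    batch_size: int = 8
    quantization: str = "q4_0"
    seed: int = 42                           # assumed default
    adapter_alpha: int = 16                  # assumed default
    learning_rate: float = 2e-4              # assumed default
    gguf_path: str = "artifacts/model.gguf"  # assumed default

    def __post_init__(self):
        if not 0.0 < self.validation_split < 1.0:
            raise ValueError("validation_split must be between 0 and 1")
        if min(self.examples, self.epochs, self.batch_size) <= 0:
            raise ValueError("examples, epochs and batch_size must be positive")


def load_config(overrides):
    """Merge user-supplied values over the defaults, rejecting unknown keys."""
    known = {f.name for f in fields(PipelineConfig)}
    unknown = sorted(set(overrides) - known)
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    return PipelineConfig(**overrides)
```

For example, `load_config({"model_id": "path/to/base", "examples": 500, "epochs": 6})` keeps every other field at its default.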

3. Automated data generation
   - Use the provided task description to programmatically generate synthetic instruction-response pairs. Recommended method:
     a. Create diverse templates: instruction, few-shot input-output, multi-turn context, edge cases, negative cases.
     b. Use the base Llama 3 model in generation mode (or a small local model) with the prompt templates to produce candidate outputs.
     c. Apply simple quality filters (length limits, profanity filters, deduplication).
   - Split into train/validation/test sets per configuration.
   - Save data in a standard format: JSONL records with "instruction", "input", and "output" keys, or the HF dataset format.
   - Notify user: data generation complete, show counts and path.
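
The filtering, splitting, and saving steps can be sketched with the standard library alone. The function names, the length limits, and the 10% test fraction are assumptions for illustration; the profanity filter is left out because it depends on a word list this Skill does not specify.

```python
import json
import random


def clean_examples(records, min_chars=1, max_chars=2000):
    """Drop empty, over-long, and duplicate (case-insensitive) examples."""
    seen, kept = set(), []
    for rec in records:
        instruction = rec.get("instruction", "").strip()
        extra_input = rec.get("input", "").strip()
        output = rec.get("output", "").strip()
        if not instruction or not (min_chars <= len(output) <= max_chars):
            continue
        key = (instruction.lower(), extra_input.lower(), output.lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append({"instruction": instruction, "input": extra_input, "output": output})
    return kept


def split_examples(records, val_frac=0.10, test_frac=0.10, seed=42):
    """Shuffle deterministically, then carve off validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "validation": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_jsonl(path, records):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records:
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

The counts to report in the notification are simply the lengths of the three lists returned by `split_examples`.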

4. Adapter (PEFT) fine-tuning
   - Prepare dataset loader and tokenization using the base model tokenizer.
   - Configure PEFT/LoRA adapter parameters (rank, alpha, dropout). Default to a small rank for the 3B model (r = 8 to 16) so that training fits on personal hardware.
   - Use accelerate or a training loop to run adapter-only fine-tuning. If GPU memory is limited, use gradient accumulation, fp16, or a bitsandbytes 8-bit optimizer.
   - During training: checkpoint adapters periodically, log training/validation loss and simple evaluation metrics on validation set.
   - Notify user: training started and training finished with summary metrics and adapter checkpoint path.
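
A minimal sketch of the adapter setup, using `LoraConfig` and `get_peft_model` from peft. The choice of `q_proj`/`v_proj` as target modules, the alpha of twice the rank, and the dropout value are assumptions rather than settings fixed by this Skill; the heavy imports are deferred so the two helper functions run without the training stack installed.

```python
import math


def grad_accum_steps(effective_batch, micro_batch):
    """Micro-batches to accumulate so small GPUs reach the effective batch size."""
    return max(1, math.ceil(effective_batch / micro_batch))


def lora_kwargs(rank=8, alpha=None, dropout=0.05):
    """Keyword arguments for peft.LoraConfig; alpha defaults to 2 * rank (assumed)."""
    return {
        "r": rank,
        "lora_alpha": alpha if alpha is not None else 2 * rank,
        "lora_dropout": dropout,
        "target_modules": ["q_proj", "v_proj"],  # assumed attention projections
        "task_type": "CAUSAL_LM",
    }


def attach_adapter(model_id, **overrides):
    """Load the base model and wrap it with a trainable LoRA adapter."""
    # Imported lazily: these packages are only needed when training actually runs.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(model_id)
    model = get_peft_model(base, LoraConfig(**lora_kwargs(**overrides)))
    model.print_trainable_parameters()
    return model
```

With a target batch size of 8 and room for only 2 examples per step, `grad_accum_steps(8, 2)` gives 4 accumulation steps.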

5. Adapter-only testing
   - Load base model + adapter checkpoint in inference mode and run the test set.
   - Compute evaluation metrics relevant to task (accuracy, BLEU, F1, or custom unit tests) and produce a brief test summary.
   - Save example inputs+predictions for inspection.
   - Notify user: adapter test results summary.
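
For text outputs, two simple metrics that need no extra libraries are exact match and token-level F1. The sketch below is one possible scoring helper; the function names are illustrative, and tasks with structured outputs would likely need custom unit tests instead.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall (whitespace tokens)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average both metrics over a test set."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and equal length")
    n = len(references)
    pairs = list(zip(predictions, references))
    return {
        "n": n,
        "exact_match": sum(exact_match(p, r) for p, r in pairs) / n,
        "token_f1": sum(token_f1(p, r) for p, r in pairs) / n,
    }
```

Reusing the same `evaluate` function for the adapter-only, merged, and quantized models keeps the later comparisons like-for-like.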

6. Merge adapter into base model and re-test
   - Use PEFT merge utilities to fold the adapter weights into the base model, producing a standalone merged checkpoint that no longer needs the adapter files. Save the merged checkpoint separately.
   - Load merged checkpoint and re-run the same tests as step 5.
   - Compare adapter-only vs merged-model metrics and present differences.
   - Notify user: merge complete and comparison summary.
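
A sketch of the merge and the metric comparison, assuming the base model is loaded unquantized and the adapter was saved with peft. `PeftModel.from_pretrained` and `merge_and_unload` are the peft calls used; `compare_metrics` and its tolerance are illustrative.

```python
def merge_adapter(base_id, adapter_dir, merged_dir):
    """Fold LoRA weights into the base model and save a standalone checkpoint."""
    # Imported lazily so compare_metrics stays usable without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id)
    merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
    merged.save_pretrained(merged_dir)
    # Save the tokenizer alongside so later conversion finds everything in one place.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(merged_dir)
    return merged_dir


def compare_metrics(adapter, merged, tolerance=0.01):
    """Per-metric difference between adapter-only and merged results."""
    rows = {}
    for name in sorted(set(adapter) & set(merged)):
        # Round first so float noise does not flip the tolerance check.
        delta = round(merged[name] - adapter[name], 6)
        rows[name] = {
            "adapter": adapter[name],
            "merged": merged[name],
            "delta": delta,
            "within_tolerance": abs(delta) <= tolerance,
        }
    return rows
```

A correct merge should reproduce the adapter-only metrics almost exactly, so any row with `within_tolerance` set to false is worth reporting in the comparison summary.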

7. Convert merged model to GGUF
   - If the merged checkpoint is in Hugging Face safetensors or PyTorch format, run the conversion pipeline to produce a GGUF file compatible with llama.cpp/ggml tools.
   - Provide recommended conversion commands (use local conversion scripts or llama.cpp conversion utilities). Validate the generated GGUF file integrity and size.
   - Notify user: GGUF file ready with path and size.
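
The conversion call and a basic integrity check might look like the sketch below. It assumes a local llama.cpp checkout that ships `convert_hf_to_gguf.py` with `--outfile` and `--outtype` options (older checkouts use different script names), and it relies on GGUF files starting with the four-byte magic `GGUF`. The function names are illustrative.

```python
import os
import sys


def conversion_command(llama_cpp_dir, merged_dir, gguf_path, outtype="f16"):
    """Build the argument list for llama.cpp's Hugging Face to GGUF converter."""
    script = os.path.join(llama_cpp_dir, "convert_hf_to_gguf.py")
    return [sys.executable, script, merged_dir, "--outfile", gguf_path, "--outtype", outtype]


def check_gguf(path):
    """Report file size and whether the file starts with the GGUF magic bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        magic = fh.read(4)
    return {"path": path, "size_bytes": size, "magic_ok": magic == b"GGUF"}
```

The command list can be passed to `subprocess.run(cmd, check=True)`; the result of `check_gguf` supplies the path and size for the notification. The magic-byte check only catches truncated or wrong-format files, so a load test in the target runtime is still needed.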

8. Quantization
   - Offer quantization options (e.g., q4_0, q4_1, q8_0). Default to conservative 4-bit option for local hardware.
   - Run quantization using appropriate tool (e.g., llama.cpp quantize utility or gguf quantization tool). Keep original GGUF as backup.
   - Validate quantized GGUF can be loaded by the chosen runtime (quick smoke inference test).
   - Notify user: quantization complete and quantized model path.
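
A sketch of building the quantization call while protecting the original file. It assumes the `llama-quantize` binary from a recent llama.cpp build is on the PATH (older builds name it `quantize`) and that it takes input path, output path, and type as positional arguments; the helper names are illustrative.

```python
import os


def default_output_path(gguf_in, quant_type):
    """Derive e.g. model.q4_0.gguf from model.gguf so the original is untouched."""
    stem, ext = os.path.splitext(gguf_in)
    return f"{stem}.{quant_type.lower()}{ext}"


def quantize_command(gguf_in, gguf_out, quant_type="Q4_0", binary="llama-quantize"):
    """Build the argument list; refuse to overwrite the unquantized GGUF."""
    if os.path.abspath(gguf_in) == os.path.abspath(gguf_out):
        raise ValueError("output path would overwrite the original GGUF")
    return [binary, gguf_in, gguf_out, quant_type.upper()]
```

Run the returned list with `subprocess.run(cmd, check=True)`, then load the output in the chosen runtime and generate a few tokens as the smoke test.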

9. Final testing and smoke checks
   - Run final test suite on quantized model using a small sample of test cases. Measure latency and essential metrics.
   - If metrics fall below thresholds, report and offer recommended actions (increase adapter epochs, more data, adjust quantization settings).
   - Produce a short summary: final metric table, example inputs/outputs, final model paths (merged GGUF, quantized GGUF), and suggested next steps.
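
Latency measurement and threshold checks can be kept independent of the runtime by passing in a `generate` callable. In this sketch, `generate`, the metric names, and the thresholds are all supplied by the caller; none of them are defined by this Skill.

```python
import statistics
import time


def time_calls(generate, prompts):
    """Run generate(prompt) for each prompt and record wall-clock latency."""
    outputs, latencies = [], []
    for prompt in prompts:
        start = time.perf_counter()
        outputs.append(generate(prompt))
        latencies.append(time.perf_counter() - start)
    stats = {"mean_s": statistics.mean(latencies), "max_s": max(latencies)}
    return outputs, stats


def failed_thresholds(metrics, minimums):
    """Names of metrics that are missing or below their required minimum."""
    return [
        name
        for name, minimum in minimums.items()
        if metrics.get(name, float("-inf")) < minimum
    ]
```

An empty list from `failed_thresholds` means the final test passed; otherwise the listed metrics are the ones to mention when recommending more epochs, more data, or a less aggressive quantization type.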

10. Clean up and artifacts
   - Collect artifacts: generated dataset, adapter checkpoints, merged checkpoint, GGUF files, logs, evaluation reports. Offer compressed archive for download.
   - Present final concise summary notification with artifact locations.
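
Collecting artifacts into a single archive could be done with the standard library, as in this sketch. The function name is illustrative, and entries are stored under their base names, which assumes the artifact file names do not collide.

```python
import os
import tarfile


def archive_artifacts(paths, archive_path):
    """Pack existing artifacts into a gzip tarball; return (included, missing)."""
    included, missing = [], []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                # Directories (such as checkpoint folders) are added recursively.
                tar.add(path, arcname=os.path.basename(path))
                included.append(path)
            else:
                missing.append(path)
    return included, missing
```

Reporting the `missing` list in the final summary makes it obvious when a stage was skipped or failed to write its output.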

Usage examples

- Example 1: One-command autopipeline (defaults)
  - User runs: auto-finetune --task "summarize my app logs into actionable items" --model meta-llama/Llama-3.2-3B --auto
  - Behavior: Generates 2000 synthetic examples for log summarization, trains a LoRA adapter for 3 epochs, runs the adapter tests, merges, converts to GGUF, quantizes to q4_0, runs the final tests, then outputs artifacts and summaries.

- Example 2: Custom config with smaller data and more epochs
  - User supplies config JSON or flags: --examples 500 --epochs 6 --adapter-rank 16 --quant q8_0
  - Behavior: Pipeline uses provided parameters and notifies at milestones.

Best practices

- Start small: generate a few hundred examples and run quick adapter epochs to validate pipeline behavior before scaling.
- Use validation and test splits to avoid overfitting; track metrics at each stage (adapter-only vs merged vs quantized).
- Keep backups of base and merged models before quantization.
- For personal machines, prefer small adapter ranks and lower batch sizes; use gradient accumulation and mixed precision.
- If possible, test conversion and quantization steps on a small model first to verify tooling.

Templates and placeholders

- Provide default prompt templates for common task types (instruction-only, instruction+input, multi-turn). Replace placeholders with user-specified task description.
- Configuration file placeholder (suggested fields):
  - model_id: meta-llama/Llama-3.2-3B
  - task_description: "..."
  - examples: 2000
  - adapter_rank: 8
  - epochs: 3
  - batch_size: 8
  - quantization: q4_0
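
The default prompt templates mentioned in this section could be stored as format strings keyed by task type. The wording of each template below is purely illustrative, as are the names `TEMPLATES` and `render_prompt`; only the three task types come from this Skill.

```python
# Illustrative template text; {task} is replaced by the user's task description.
TEMPLATES = {
    "instruction_only": (
        "You are generating training data for this task: {task}\n"
        "Write one realistic user instruction and the ideal response."
    ),
    "instruction_input": (
        "You are generating training data for this task: {task}\n"
        "Input:\n{input}\n"
        "Write an instruction that applies to this input and the ideal response."
    ),
    "multi_turn": (
        "You are generating training data for this task: {task}\n"
        "Conversation so far:\n{history}\n"
        "Write the next user turn and the ideal assistant reply."
    ),
}


def render_prompt(kind, task, **fields):
    """Fill the chosen template; extra fields supply {input} or {history}."""
    if kind not in TEMPLATES:
        raise ValueError(f"unknown template kind: {kind}")
    return TEMPLATES[kind].format(task=task, **fields)
```

For example, `render_prompt("instruction_input", "summarize my app logs", input="ERROR disk full")` yields a generation prompt that embeds both the task and the sample input.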

Notifications and checkpoints

- Notify user at these milestones: data generation complete, training start, training end (with metrics), adapter test complete, merge complete, GGUF conversion complete, quantization complete, final tests complete.
- Keep checkpointing frequency configurable so users can resume if interrupted.
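
Milestone notifications and resumability can share one small state file. The sketch below is an assumption about how that might be organized: the milestone identifiers, the `Progress` class, and the JSON state layout are illustrative, and `notify` defaults to `print` but could be any callable.

```python
import json
import os

# Ordered milestones, mirroring the list in this section.
MILESTONES = [
    "data_generated", "training_started", "training_finished", "adapter_tested",
    "merged", "gguf_converted", "quantized", "final_tested",
]


class Progress:
    """Records completed milestones on disk and emits a notification for each."""

    def __init__(self, state_path, notify=print):
        self.state_path = state_path
        self.notify = notify
        self.done = []
        if os.path.exists(state_path):
            with open(state_path, encoding="utf-8") as fh:
                self.done = json.load(fh)["done"]

    def mark(self, milestone, summary=""):
        if milestone not in MILESTONES:
            raise ValueError(f"unknown milestone: {milestone}")
        if milestone not in self.done:
            self.done.append(milestone)
        with open(self.state_path, "w", encoding="utf-8") as fh:
            json.dump({"done": self.done}, fh)
        self.notify(f"[{milestone}] {summary}".rstrip())

    def next_pending(self):
        """First milestone not yet reached, or None when the run is complete."""
        for milestone in MILESTONES:
            if milestone not in self.done:
                return milestone
        return None
```

After an interruption, constructing `Progress` with the same state path and calling `next_pending()` tells the pipeline where to resume.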

Safety, permissions, and limitations

- Warn user about license/usage restrictions of base models. Ensure converted GGUF/quantized models remain compliant with model licensing.
- Note memory and compute constraints; automated defaults aim to be conservative for personal hardware.
- This pipeline generates synthetic data; user should review and curate before deploying to production.

Related tools and recommended commands

- Hugging Face Transformers + PEFT for adapter training
- accelerate for launching training on local or distributed hardware, with mixed precision and gradient accumulation
- bitsandbytes for 8-bit optimizer support
- llama.cpp / ggml tools for GGUF conversion and quantization
- Example conversion commands should be provided by the implementation script where applicable.

Implementation notes for developer

- Provide a CLI entrypoint that accepts either a single "--auto" flag to run the default end-to-end pipeline or a config JSON to customize stages.
- Implement modular stages so users can stop/re-run a single stage (e.g., only generate data or only convert/quantize).
- Log everything to structured JSON logs for easy inspection.
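
A minimal sketch of such an entrypoint with argparse, plus a one-line-per-event JSON logger. The `--only` option, the stage names, and the log field names are assumptions made for illustration; `--auto`, `--config`, `--task`, and `--model` follow the usage examples above.

```python
import argparse
import json
import sys
import time

# Hypothetical stage names for running a single stage in isolation.
STAGES = ["preflight", "data", "train", "test-adapter", "merge", "convert", "quantize", "final-test"]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="auto-finetune")
    parser.add_argument("--task", help="plain-English task description")
    parser.add_argument("--model", help="base model path or Hugging Face identifier")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--auto", action="store_true", help="run end-to-end with defaults")
    mode.add_argument("--config", help="path to a config JSON file")
    parser.add_argument("--only", choices=STAGES, help="run just this stage")
    return parser.parse_args(argv)


def log_event(stage, event, stream=None, **fields):
    """Write one JSON object per line so logs stay machine-readable."""
    record = {"ts": time.time(), "stage": stage, "event": event, **fields}
    (stream or sys.stdout).write(json.dumps(record) + "\n")
    return record


if __name__ == "__main__":
    args = parse_args()
    log_event("cli", "parsed", auto=args.auto, config=args.config, only=args.only)
```

Making `--auto` and `--config` mutually exclusive keeps the two modes unambiguous, while `--only` lets a user re-run one stage, such as conversion or quantization, without repeating the rest.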

