research: evaluate HRM-Text-1B as fine-tuning base for email classifier #68

Open

opened 2026-05-22 19:47:43 -07:00 by pyr0ball · 0 comments

Summary

Evaluate sapientinc/HRM-Text-1B as a candidate fine-tuning base model for Avocet's email classifier, in addition to or instead of standard decoder-only 1B models.

Why it is interesting

HRM (Hierarchical Reasoning Model) uses a dual-timescale recurrent architecture: two transformer stacks (H = slow/high-level, L = fast/low-level) iterate over the same embeddings in nested cycles. With H_cycles=2 and L_cycles=3, each forward pass makes 2 × 3 = 6 effective reasoning passes through the input, which gives more effective compute depth than a standard 1B decoder-only model at the same parameter count.
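
The pass count can be illustrated with a toy loop. This is only a sketch of the nested-cycle arithmetic, not HRM's actual forward pass: the function name is hypothetical, and the assumption that the H stack updates once after each inner L loop is mine, not taken from the model card.

```python
def count_stack_passes(h_cycles: int, l_cycles: int) -> dict:
    """Count how often each stack would run in one forward pass of a nested-cycle model."""
    l_passes = 0
    h_passes = 0
    for _ in range(h_cycles):        # slow, high-level loop
        for _ in range(l_cycles):    # fast, low-level loop
            l_passes += 1            # one application of the L stack
        h_passes += 1                # assumed: one H update after each inner loop
    return {"l_passes": l_passes, "h_passes": h_passes}


passes = count_stack_passes(h_cycles=2, l_cycles=3)
print(passes)  # {'l_passes': 6, 'h_passes': 2}
```

The weights are reused on every cycle, so the depth comes from iteration rather than from extra parameters.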

This extra depth may help HRM generalize better than a standard 1B model on small labeled datasets, which is exactly the regime Avocet operates in: limited human-labeled email samples, high label diversity, and a need for robust generalization across user inboxes.

Model facts

Property	Value
Parameters	1B
Architecture	Dual-timescale recurrent (HRM) — novel, not standard transformer
Format	Safetensors BF16 (primary); GGUF quants at `sinimiini/HRM-Text-1B-GGUF`
License	Apache 2.0 — clean for CF use
Alignment	Pre-alignment base model — requires SFT
Transformers support	Native in transformers >= 5.9.0 (no trust_remote_code needed)
VRAM (BF16)	~2 GB
Training data	40B tokens of structured public corpora; English only; no code
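
The ~2 GB VRAM figure is consistent with simple arithmetic on the weights alone: BF16 stores each parameter in 2 bytes. A minimal sketch, assuming exactly 1e9 parameters and decimal gigabytes (the helper name is hypothetical):

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


print(weight_memory_gb(1e9))  # 2.0
```

This covers inference weights only. Fine-tuning also needs memory for activations, gradients, and optimizer state, so the training footprint will be larger.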

GGUF / llama.cpp status

GGUF quants exist (sinimiini/HRM-Text-1B-GGUF) but require a 556-line patch to llama.cpp across 11 core files (llama-arch.cpp, llama-model.cpp, a new hrm-text.cpp, etc.). The patch has not yet been upstreamed. For fine-tuning, use the safetensors path via transformers; GGUF only becomes relevant after quantization, for inference deployment.

Proposed evaluation

1. Baseline: fine-tune a standard Qwen2.5-1.5B-Instruct on Avocet's labeled email corpus; record F1 per label, epochs to convergence, and GPU-hours.
2. HRM challenger: apply the identical SFT recipe to HRM-Text-1B and compare the same metrics.
3. Decision criterion: if HRM matches or exceeds baseline F1 with fewer labeled examples or fewer epochs, prefer it as the standard fine-tuning base.
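
The comparison above could be scored with something like the following sketch. Everything here is illustrative: the function names, the result-dictionary keys (`macro_f1`, `epochs`, `n_labeled`), and the example labels are hypothetical, and averaging per-label F1 into a macro score is my assumption about how "matches or exceeds baseline F1" would be judged.

```python
from collections import Counter


def per_label_f1(y_true, y_pred):
    """Compute F1 for each label from parallel lists of true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted label gets a false positive
            fn[t] += 1   # true label gets a false negative
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-label F1 scores."""
    return sum(scores.values()) / len(scores)


def prefer_hrm(baseline, hrm):
    """Decision criterion: F1 at least as good, reached with fewer epochs or fewer labels."""
    f1_ok = hrm["macro_f1"] >= baseline["macro_f1"]
    cheaper = hrm["epochs"] < baseline["epochs"] or hrm["n_labeled"] < baseline["n_labeled"]
    return f1_ok and cheaper


# Hypothetical labels, for illustration only.
y_true = ["spam", "work", "work", "personal"]
y_pred = ["spam", "work", "personal", "personal"]
scores = per_label_f1(y_true, y_pred)
print(sorted(scores.items()))
```

Keeping the per-label scores alongside the aggregate matters here, because a macro average can hide a regression on a rare label.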

Fine-tuning notes

- PrefixLM attention (bidirectional prefix, causal response): the fine-tuning data format should use the token_type_ids prefix-marking convention from the model card.
- <|quad_end|><|object_ref_end|> composite condition tokens enable chain-of-thought; they may be worth including in fine-tuning prompts for label reasoning.
- LLaMA Factory supports HRM (see the upstream training scripts in the model repo).
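
To make the PrefixLM note concrete, here is a minimal sketch of how a prefix marker translates into an attention pattern. The polarity (1 = prefix token, 0 = response token) and both function names are assumptions for illustration; the model card defines the actual convention and should be checked before building training data.

```python
def prefix_token_type_ids(prefix_len: int, response_len: int) -> list[int]:
    """Mark prefix tokens with 1 and response tokens with 0 (assumed polarity)."""
    return [1] * prefix_len + [0] * response_len


def prefix_lm_mask(token_type_ids: list[int]) -> list[list[bool]]:
    """Build a PrefixLM mask: mask[i][j] is True when position i may attend to j."""
    n = len(token_type_ids)
    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i                                        # standard causal rule
            both_prefix = token_type_ids[i] == 1 and token_type_ids[j] == 1
            row.append(causal or both_prefix)                      # prefix is bidirectional
        mask.append(row)
    return mask


ids = prefix_token_type_ids(prefix_len=2, response_len=2)
for row in prefix_lm_mask(ids):
    print(["x" if allowed else "." for allowed in row])
```

For a classifier, the email text would sit in the prefix, where every token can see the whole message, and the label would be generated causally in the response.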

References

Model: https://huggingface.co/sapientinc/HRM-Text-1B
GGUF quants: https://huggingface.co/sinimiini/HRM-Text-1B-GGUF
Architecture paper: arXiv 2605.20613
llama.cpp patch: runtime/llama.cpp-hrm_text.patch in GGUF repo
