QORA-LLM-2B – Pure Rust ternary inference, no multiplication needed
Posted by blockmandev 3 hours ago
Comment by blockmandev 3 hours ago
Pure Rust ternary inference engine based on BitNet b1.58-2B-4T. No Python, no CUDA, no external ML frameworks. Single executable + model weights = portable AI that runs on any machine.
Zero-multiplication inference — ternary weights {-1, 0, +1} mean the inner GEMV loop uses only addition and subtraction, no floating-point multiply.
Smart system awareness — detects RAM and CPU at startup and adjusts generation limits automatically.
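The add/subtract trick can be sketched in a few lines of Rust. This is a minimal illustration of the idea, not the project's actual kernel (a real engine would quantize activations and pack weights more densely); the function names here are hypothetical:

```rust
/// Dot product with ternary weights {-1, 0, +1}: each term is either
/// added, subtracted, or skipped -- no multiply instruction needed.
fn ternary_dot(weights: &[i8], activations: &[i32]) -> i32 {
    let mut acc = 0i32;
    for (&w, &x) in weights.iter().zip(activations) {
        match w {
            1 => acc += x,
            -1 => acc -= x,
            _ => {} // zero weight: contributes nothing, skip entirely
        }
    }
    acc
}

/// GEMV over a row-major ternary matrix: one ternary_dot per output row.
fn ternary_gemv(weights: &[i8], activations: &[i32], rows: usize, cols: usize) -> Vec<i32> {
    (0..rows)
        .map(|r| ternary_dot(&weights[r * cols..(r + 1) * cols], activations))
        .collect()
}

fn main() {
    let w = [1i8, -1, 0, 1, 0, 0, -1, 1]; // 2x4 ternary weight matrix
    let x = [3i32, 5, 7, 2];
    let y = ternary_gemv(&w, &x, 2, 4);
    println!("{:?}", y); // row 0: 3-5+2 = 0, row 1: -7+2 = -5
    assert_eq!(y, vec![0, -5]);
}
```

Note the zero weights also buy you free sparsity: roughly a third of the terms in a b1.58 matrix are skipped outright, not just multiplied by zero.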