QORA-LLM-2B – Pure Rust ternary inference, no multiplication needed

Posted by blockmandev 3 hours ago


Comments

Comment by blockmandev 3 hours ago

Pure Rust ternary inference engine based on BitNet b1.58-2B-4T. No Python, no CUDA, no external ML frameworks. Single executable + model weights = portable AI that runs on any machine.

Zero-multiplication inference: ternary weights {-1, 0, +1} mean the inner GEMV loop uses only addition and subtraction, no floating-point multiplies.

Smart system awareness: detects RAM and CPU at startup and adjusts generation limits automatically.
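The zero-multiplication trick can be sketched in a few lines of Rust. This is a hypothetical illustration, not the project's actual kernel: a GEMV where each ternary weight either adds the activation, subtracts it, or skips it, so no multiply instruction is ever emitted for the weight-activation product. The function name `ternary_gemv` and the row-major `i8` weight layout are my assumptions.

```rust
// Sketch (not the project's real code): zero-multiplication GEMV.
// Weights are i8 values in {-1, 0, +1}, stored row-major; each output
// element is a running sum built from additions and subtractions only.
fn ternary_gemv(weights: &[i8], x: &[f32], out: &mut [f32]) {
    let n = x.len();
    assert_eq!(weights.len(), n * out.len());
    for (row, y) in out.iter_mut().enumerate() {
        let w_row = &weights[row * n..(row + 1) * n];
        let mut acc = 0.0f32;
        for (&w, &xi) in w_row.iter().zip(x) {
            match w {
                1 => acc += xi,  // +1: add the activation
                -1 => acc -= xi, // -1: subtract it
                _ => {}          //  0: skip, contributes nothing
            }
        }
        *y = acc;
    }
}

fn main() {
    // 2x3 ternary weight matrix, row-major.
    let w: [i8; 6] = [1, -1, 0, 0, 1, 1];
    let x = [2.0f32, 3.0, 5.0];
    let mut y = [0.0f32; 2];
    ternary_gemv(&w, &x, &mut y);
    println!("{:?}", y); // [-1.0, 8.0]: row 0 is 2-3, row 1 is 3+5
}
```

The zero case doubles as free sparsity: ternary models typically have many zero weights, and those entries cost nothing beyond the branch.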