The “AI industrial complex” in its current form is not sustainable.
While transformers have delivered remarkable capabilities, their energy consumption and computational demands reveal a fundamental inefficiency: we're fighting against nature's design principles. The human brain, at least as we currently understand it, operates on roughly 20 watts, processing massive volumes of information through sparse, event-driven spikes. Current AI systems consume thousands of watts to support narrow inference capabilities, forcing dense matrix operations through every computation. This disparity isn't just inefficient; it suggests we're missing something fundamental about intelligence itself.
Spiking Neural Networks (SNNs) offer a radically different path, one that neuromorphic processors have begun to realize in silicon. Yet despite decades of research and impressive hardware developments, SNNs remain frustratingly difficult to train and deploy. As with many algorithmic methods, efficient and accurate gradient calculation has been a constant challenge. For those who have worked in the field for decades, the core question surrounding SNNs is how to compute gradients through discrete, non-differentiable spike events.
This document explores an unexpected convergence of ideas that may finally reveal the astounding potential of neuromorphic computing. It's a story that connects several distinct, yet proven ideas: ternary number systems, a breakthrough in gradient computation, and new designs in "spiking" neural processing. Coupled with SpeakEZ's unique Fidelity framework design, these solutions promise to bring heterogeneous architectures into a manageable and coherent platform. The implications reach beyond any single hardware innovation to suggest a fundamental shift in how we can build intelligent systems.
The Ternary Revelation: Beyond Binary Thinking
Modern spiking neural network algorithms, as described in the Multi-Plasticity Synergy Learning (MPSL) framework1, operate on a binary principle despite running on far more capable hardware. The Leaky Integrate-and-Fire equation from the paper defines spike generation as:
\[ S^{t,l} = \Theta(U^{t,l} - V_{th}) = \begin{cases} 1, & U^{t,l} \geq V_{th} \\ 0, & U^{t,l} < V_{th} \end{cases} \]

This binary representation, spike (1) or silent (0), has been the algorithmic convention in neuromorphic computing, not because of hardware limitations, but due to historical precedent and the mathematical challenges of training. Some neuromorphic processors, such as Intel's Loihi 2, actually support graded spikes with up to 32-bit payloads, programmable neuron models, and thousands of states per neuron. These capabilities far outstrip the field's current theoretical conventions. Even the sophisticated MPSL framework, which innovatively combines multiple learning mechanisms (Spatio-Temporal Backpropagation or STBP for gradient-based learning, Hebbian plasticity for correlation-based local learning, and Self-Backpropagation or SBP for local feedback without explicit gradients), constrains itself to binary representations despite the hardware's richer capabilities.
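For concreteness, here is a minimal sketch of that binary convention in plain F# floats. The constants and names are illustrative, not the paper's implementation; it simply combines the Heaviside spike rule above with the Leaky Integrate-and-Fire update discussed later.

// Binary LIF step as conventionally formulated:
//   U^t = rho * (U^{t-1} - S^{t-1} * Vth) + I^t,   S^t = Theta(U^t - Vth)
let binaryLifStep (rho: float) (vth: float) (uPrev: float, sPrev: float) (input: float) =
    let u = rho * (uPrev - sPrev * vth) + input   // leak, reset-by-subtraction, integrate
    let s = if u >= vth then 1.0 else 0.0         // Heaviside: spike or silent, nothing in between
    (u, s)

// Example: binaryLifStep 0.9 1.0 (0.0, 0.0) 0.4  evaluates to  (0.4, 0.0)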
Lessons from Biology
Neuroscience tells us that biological neurons exhibit far richer dynamics, and that richness is a subtle but significant part of how they compute. Between rest and firing, neurons spend significant time in distinct computational regimes, actively processing information without generating spikes. This isn't merely noise or inefficiency; it's a computational feature that binary SNNs completely miss.
Consider what happens in the binary model: a neuron accumulating toward threshold carries critical temporal information about recent inputs, yet this information vanishes the moment we sample its state. If the membrane potential is at 0.9 × threshold, the binary representation sees only "0", identical to a neuron at rest. This discretization throws away precisely the information that makes temporal processing powerful.
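A tiny numeric illustration of that loss, using plain floats and assumed threshold values: a neuron at 0.9 × threshold and a neuron at rest are indistinguishable to a binary readout, while a ternary readout of the kind developed below keeps them apart.

let vth = 1.0            // firing threshold
let thetaActive = 0.3    // assumed activation threshold for the ternary readout

let binaryReadout u  = if u >= vth then 1 else 0
let ternaryReadout u =
    if u >= vth then 1             // spiking
    elif u >= thetaActive then -1  // actively integrating
    else 0                         // silent

[ 0.0; 0.9 ] |> List.map binaryReadout    // [0; 0]  - rest and near-threshold collapse together
[ 0.0; 0.9 ] |> List.map ternaryReadout   // [0; -1] - the integrating neuron is still visible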
The Computational Regime Model
The key idea is to distinguish between the continuous membrane potential dynamics and the discrete computational regimes that neurons occupy. Biological neurons don't just have voltage levels; they have distinct operational modes based on their membrane potential:
- Silent/Resting: Near the resting potential (typically -70mV), the neuron is minimally responsive, with leak currents dominating
- Active/Integrating: Depolarized but below firing threshold (between -55mV and -40mV), actively accumulating and processing inputs
- Spiking/Firing: Above threshold, generating output spikes
This biological reality maps naturally to a ternary encoding that captures computational regime, not just voltage:
\[ \text{TernaryState} = \begin{cases} 0 & \text{Silent (near resting potential)} \\ -1 & \text{Active (integrating, between thresholds)} \\ +1 & \text{Spiking (above firing threshold)} \end{cases} \]

The continuous membrane potential \(U\) maps to discrete states via two thresholds:
- \(\theta_{active}\): Transition from silent to active integration
- \(\theta_{fire}\): Spike generation threshold
This preserves critical information about neurons actively integrating inputs (\(U \in [\theta_{active}, \theta_{fire})\)) that binary representations discard.
Leveraging New Hardware
This observation leads to our key innovation: expanding the algorithmic state space to match hardware capabilities. While the MPSL paper advances the field through multiple learning mechanisms, it follows the algorithmic convention of binary spike representation, leaving hardware capabilities untapped. Our ternary encoding leverages the multi-level states these processors already support.
This isn’t a hardware modification; it’s simply using the hardware as it was designed. Intel’s Loihi 2 can represent 4096 states per neuron, SambaNova’s RDU can reconfigure for arbitrary word-level operations, and we’re finally going to use these capabilities. The active state (-1) captures neurons that are actively integrating inputs but haven’t yet reached firing threshold, preserving temporal context that binary algorithms discard.
// Clear separation of continuous dynamics and discrete states
type TernarySpikingNeuron = {
    Potential: Posit<16, 1>        // Continuous membrane potential
    RestingPotential: float32      // Baseline (e.g., -70mV)
    ActiveThreshold: float32       // Activation begins (e.g., -55mV)
    FiringThreshold: float32       // Spike generation (e.g., -40mV)
    State: TernaryState            // Discrete computational regime
}

// State mapping based on potential regions
let computeState (potential: Posit<16, 1>) (neuron: TernarySpikingNeuron) =
    match Posit.toFloat32 potential with
    | p when p >= neuron.FiringThreshold -> Spiking   // +1
    | p when p >= neuron.ActiveThreshold -> Active    // -1
    | _ -> Silent                                     //  0

// Accumulation happens in continuous domain
let updateNeuron (leakRate: float32) (neuron: TernarySpikingNeuron) (input: float32) =
    // Continuous dynamics (Leaky Integrate-and-Fire), carried out in posit arithmetic
    let resting = Posit.fromFloat32 neuron.RestingPotential
    let leak = (neuron.Potential - resting) * Posit.fromFloat32 leakRate
    let newPotential = neuron.Potential - leak + Posit.fromFloat32 input
    // Discrete state for communication
    let newState = computeState newPotential neuron
    // Reset only after spike
    match newState with
    | Spiking ->
        { neuron with
            State = Spiking
            Potential = Posit.fromFloat32 neuron.RestingPotential }
    | otherState ->
        { neuron with
            State = otherState
            Potential = newPotential }
Breaking the Backpropagation Dependency
The Surrogate Gradient Problem
The MPSL paper, like virtually all modern SNN training approaches, relies on surrogate gradients to handle the non-differentiable spike function. As the paper states in Equation 6:
\[ \frac{\partial S^{t,l}}{\partial U^{t,l}} \approx u'(U^{t,l}, V_{th}) \]

This approximation, replacing the undefined gradient with a smooth surrogate function, is a mathematical fiction that introduces instability and limits learning efficiency. Every major SNN training method (STBP, BPTT, even the innovative MPSL approach) depends on this workaround. We pretend the spike function is smooth when it fundamentally isn't.
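To make the fiction concrete, here is a sketch of a rectangular surrogate of the kind used in SNN training (the MPSL paper also uses a rectangular form); the window half-width a here is an assumed constant, not the paper's exact parameterization.

// Rectangular surrogate: pretend d(spike)/dU is a box of width 2a centred on the threshold.
// The true derivative is zero almost everywhere and undefined exactly at U = Vth.
let rectangularSurrogate (a: float) (vth: float) (u: float) =
    if abs (u - vth) < a then 1.0 / (2.0 * a) else 0.0

// Backpropagation then multiplies upstream gradients by this stand-in value,
// which is precisely the approximation that forward gradients avoid.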
The Forward Gradient Revolution
Recent breakthrough work by Baydin, Pearlmutter, Syme, Wood and Torr2 demonstrates a revolutionary alternative that eliminates this fiction entirely. Their forward gradient method computes unbiased gradient estimates using only forward-mode automatic differentiation:
\[ g(\theta) = (\nabla f(\theta) \cdot v) v \]

Where \(v\) is a random perturbation vector. This formula has profound implications for SNNs:
- No surrogate needed: The directional derivative \(\nabla f(\theta) \cdot v\) can be computed exactly even for discrete spike functions
- Single forward pass: Eliminates the entire backward propagation phase
- Proven unbiased: Mathematically guaranteed to converge to the true gradient in expectation
- 2x speedup: The paper demonstrates training neural networks up to twice as fast as backpropagation
Why This Changes Everything for SNNs
The forward gradient approach solves the exact problem that has plagued SNN training. Where the MPSL framework must resort to rectangular surrogate functions (Equation 7 in their paper), forward gradients handle discrete transitions naturally:
// Forward gradient captures state transition sensitivities
let computeStateGradient (potential: Posit<16, 1>) (neuron: TernarySpikingNeuron) =
    // Directional derivative exists at transition boundaries
    let perturbation = samplePerturbation ()   // assumed helper: small random posit
    let perturbedPotential = potential + perturbation
    // State change detection (no surrogate needed!)
    let originalState = computeState potential neuron
    let perturbedState = computeState perturbedPotential neuron
    // Exact sensitivity through the discrete transition
    if originalState <> perturbedState then
        perturbation    // Sensitivity at boundary
    else
        Posit.zero      // No transition
// Training with exact gradients through discrete states
let trainTernarySNN (network: SpikingNetwork) =
    // Sample random perturbation with posit precision
    let v = samplePerturbation<Posit<16, 1>> ()
    // Single forward pass computes output AND directional derivative,
    // even though the spike function is discrete
    let output, directional =
        Furnace.ForwardMode.evaluateWithDerivative network v
    // Unbiased gradient estimate: (grad f . v) * v
    let forwardGradient = directional * v
    // Update using local plasticity rules
    updateSynapticWeights forwardGradient
The takeaway: discreteness doesn’t break directional derivatives.
When a perturbation causes a state transition (Silent → Active, Active → Spiking), the derivative captures that sensitivity exactly. When it doesn’t, the derivative is zero. The expectation over random perturbations recovers complete gradient information without any approximation.
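That expectation claim can be checked numerically with nothing but the standard library. The target function, evaluation point, and sample count below are arbitrary illustrative choices: averaging (∇f · v) v over random Gaussian directions v converges to the true gradient.

// Monte Carlo check that E[(grad f . v) v] = grad f for v ~ N(0, I).
// f(x, y) = x*x + 3*y, so the true gradient at (1, 2) is (2, 3).
// The analytic gradient is used here only to form the directional derivative;
// in practice forward-mode AD produces (grad f . v) directly, without materializing grad f.
let rng = System.Random 42
let gaussian () =
    let u1, u2 = 1.0 - rng.NextDouble(), rng.NextDouble()
    sqrt (-2.0 * log u1) * cos (2.0 * System.Math.PI * u2)   // Box-Muller transform

let forwardGradientEstimate (x: float, y: float) =
    let vx, vy = gaussian (), gaussian ()
    let directional = 2.0 * x * vx + 3.0 * vy    // (grad f . v) at (x, y)
    (directional * vx, directional * vy)         // forward gradient g = (grad f . v) v

let estimates = List.init 100000 (fun _ -> forwardGradientEstimate (1.0, 2.0))
let meanX = estimates |> List.averageBy fst      // approaches 2.0
let meanY = estimates |> List.averageBy snd      // approaches 3.0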
Biological Plausibility Through Global Signals
The forward gradient paper notes something remarkable: this approach can be interpreted as “feedback of a single global scalar quantity that is identical for all computation nodes”2. This maps naturally to biological neuromodulatory systems:
- Dopamine for reward signaling
- Serotonin for mood regulation
- Acetylcholine for attention modulation
Combined with the MPSL framework's multiple plasticity mechanisms1, this creates a biologically plausible learning system, one that doesn't force the algorithm into a "backpropagation corner": global backpropagation is, by its nature, implausible in biological neural networks.
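A sketch of what that global-scalar picture looks like in code, with illustrative types and plain floats: the directional derivative is the one value broadcast everywhere, and each synapse combines it only with quantities it already holds locally, much like a diffuse neuromodulator.

// Three-factor flavour of the forward-gradient update: one global scalar, everything else local.
type PerturbedSynapse = {
    Weight: float          // current synaptic weight
    Perturbation: float    // this synapse's component of the random direction v
    Eligibility: float     // local plasticity factor, e.g. from ternary state transitions
}

let broadcastUpdate (learningRate: float) (globalScalar: float) (synapses: PerturbedSynapse list) =
    synapses
    |> List.map (fun s ->
        // global factor (identical at every synapse) x local factors (per-synapse)
        { s with Weight = s.Weight + learningRate * globalScalar * s.Perturbation * s.Eligibility })

Nothing in this sketch requires a weight-transposed backward pathway; the only signal that travels beyond a synapse is the scalar itself.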
Hebbian Plasticity Through State Transitions
The forward gradient approach naturally combines with local Hebbian rules based on our ternary state transitions:
\[ \Delta w_{ij} = \eta \cdot (\nabla f \cdot v) \cdot P(\text{State}_j \mid \text{State}_i) \]

Where weight updates depend on state transition probabilities:
- Silent → Active: Potentiation (strengthen connection)
- Active → Spiking: Hebbian reinforcement
- Spiking → Silent: Refractory adjustment
This directly enhances the MPSL framework’s multi-plasticity approach, where they already combine STBP, Hebbian, and SBP mechanisms. Our ternary states provide richer transition information for these learning rules to exploit:
// Forward gradient weight update - mathematically principled
// (learningRate, potentiationFactor, hebbianFactor and depressionFactor are assumed module-level constants)
let updateSynapticWeights
        (network: SpikingNetwork)
        (preState: TernaryState)
        (postState: TernaryState)
        (weight: Posit<16, 1>) =
    // Sample perturbation vector
    let v = sampleGaussian<Posit<16, 1>> ()
    // Compute directional derivative (exact, not surrogate!)
    let directional = computeDirectionalDerivative network v
    // Forward gradient is an unbiased estimate of the true gradient
    let gradient = directional * v
    // Combine with state transition probabilities
    match (preState, postState) with
    | (Silent, Active) ->
        weight + learningRate * gradient * potentiationFactor
    | (Active, Spiking) ->
        weight + learningRate * gradient * hebbianFactor
    | (Spiking, Silent) ->
        weight - learningRate * gradient * depressionFactor
    | _ -> weight
Posits: The Natural Language of Membrane Dynamics
The Leaky Integrate-and-Fire equation from the MPSL paper reveals why posit arithmetic is ideal for SNNs:
\[ U^{t,l} = \rho_m(U^{t-1,l} - S^{t-1,l}V_{th}) + I^{t,l} \]

This equation involves:
- Exponential decay (\(\rho_m\))
- Threshold comparisons
- Accumulation of many small inputs
Posit arithmetic’s variable precision naturally matches these requirements:
- High precision near threshold: Where spike/no-spike decisions are critical
- Lower precision for strongly polarized states: Where exact values matter less
- Exponential representation: Natural for the \(\rho_m\) decay factor
- Exact accumulation via quire: No rounding errors during integration
// Membrane potential dynamics with posit arithmetic
// (decayRate and activeSynapses are passed in; Synapse is assumed to expose posit Weight and Input fields)
let computeMembranePotential
        (current: Posit<32, 2>)
        (decayRate: Posit<32, 2>)
        (activeSynapses: Synapse seq) =
    let quire = Quire<32, 512>.Zero        // Exact accumulation
    // Decay current potential (the rho_m term)
    quire.AddProduct(current, decayRate)
    // Accumulate weighted inputs (no rounding errors!)
    for synapse in activeSynapses do
        quire.AddProduct(synapse.Weight, synapse.Input)
    quire.ToPosit()                        // Single rounding at the end
Integration with Furnace Auto-Differentiation
The Furnace library, which originated as 'DiffSharp' and shares authors with the forward gradient work (Baydin, Pearlmutter, Syme), provides the perfect foundation for implementing forward-mode SNNs:
module Furnace.Neuromorphic

// Leverage existing forward-mode AD infrastructure
let trainSpikingNetwork (network: TernarySpikingNetwork) (data: SensorData) =
    // Forward gradient computation in a single pass
    let forwardGradient = furnace {
        // Random perturbation for unbiased gradient estimation
        let! v = sampleStandardNormal network.ParameterShape
        // Forward pass computes both output and directional derivative
        // No backward pass needed!
        let! output, directional =
            ForwardMode.evaluateWithDirectional network data v
        // Forward gradient theorem: E[g] = ∇f
        return directional * v
    }
    // Update weights using local plasticity rules
    network.UpdateWeights forwardGradient
This approach achieves what the forward gradient paper demonstrated: training neural networks "without backpropagation" while being "computationally competitive"2, with up to a 2x speedup over traditional methods.
The Untapped Silicon Potential
Modern neuromorphic processors and Coarse-Grained Reconfigurable Architectures (CGRAs) already possess the capabilities needed for our approach; they’re just waiting for the right software to unlock their potential.
Intel’s Loihi 2, far from being limited to binary spikes, actually supports:
- Graded spikes with up to 32-bit integer payloads
- Programmable neuron models via microcode that can implement arbitrary dynamics
- Up to 4096 states per neuron, not just spike/no-spike
- Ternary weight matrices already demonstrated in recent implementations
IBM’s TrueNorth, BrainChip’s Akida, and other neuromorphic processors similarly offer programmable models and multi-bit communications. The limitation has never been the silicon; it’s been our algorithms.
CGRAs: The Perfect Platform for Adaptive Intelligence
Coarse-Grained Reconfigurable Architectures from companies like NextSilicon and SambaNova offer even more flexibility:
| Platform | Architecture | Key Advantage for SNNs |
|---|---|---|
| NextSilicon Maverick | Runtime reconfigurable dataflow | Automatically tunes to code patterns, no manual optimization needed |
| SambaNova RDU | Reconfigurable at each clock cycle | Can morph between neural and conventional processing dynamically |
| General CGRAs | Word-level reconfigurable arrays | Natural fit for ternary representations and posit arithmetic |
As SambaNova describes it, their RDU is “an array of compute and memory on chip” that can be reconfigured to match the exact computational pattern needed. This makes CGRAs ideal for:
- Ternary state machines that can be efficiently mapped to word-level operations
- Posit arithmetic implementations using the flexible compute units
- Dynamic network topologies that adapt during runtime
- Mixed conventional/neuromorphic workloads in the same chip
Hardware Capability Summary
| Feature | Neuromorphic (Loihi 2) | CGRAs (SambaNova/NextSilicon) | Fidelity Framework |
|---|---|---|---|
| State representation | Up to 4096 states/neuron | Arbitrary via reconfiguration | Ternary mapping |
| Arithmetic precision | 8-32 bit configurable | Word-level operations | Posit arithmetic |
| Learning capability | Programmable plasticity | Runtime adaptable | Forward gradients |
| Computation model | Event-driven spikes | Dataflow reconfigurable | Both paradigms |
| Programming model | Microcode/assembly | High-level dataflow | F# unified abstraction |
The Reality of Hybrid Compute
In practice, CGRAs and neuromorphic processors rarely operate as the sole component of a solution. They're deployed in heterogeneous systems as accelerators:
- On-die integration: Accelerators alongside conventional CPU/GPU cores
- CXL coherent memory: Shared memory spaces between neuromorphic and traditional processors
- PCIe accelerators: Accelerator cards working within host systems
- Edge hybrids: Low-power neuromorphic/CGRA units paired with DSPs or microcontrollers
The Fidelity framework's design, particularly the Firefly Hypergraph as a "control flow to data flow" transformer, would make it uniquely suited for these heterogeneous deployments:
// Platform-agnostic neuromorphic compilation
[<CompileToNeuromorphic>]
let neuromorphicCore (neurons: TernarySpikingNeuron array) =
    neuromorphic {
        // Configure for available neuromorphic target
        let! target = detectNeuromorphicPlatform ()
        match target with
        | Intel_Loihi2 config -> configureLoihi config
        | IBM_TrueNorth config -> configureTrueNorth config
        | BrainChip_Akida config -> configureAkida config
        | Infineon_Neuromorphic config -> configureInfineon config
        | FPGA_Emulation config -> configureFPGAEmulation config
        | CPU_Simulation fallback ->
            // Graceful degradation to CPU simulation
            configureCPUSimulation fallback
        // Common neuromorphic operations
        return! compileToDataFlow neurons
    }
Platform-Specific Implementation Strategies
The beauty of our approach is how naturally this future design will map onto numerous hardware architectures:
On Neuromorphic Processors (Loihi 2):
// Direct mapping to Loihi 2's programmable neurons
[<CompileToLoihi>]
let ternaryNeuronLoihi (state: int32) (input: int32) =
    // Loihi 2 supports up to 4096 states - we use just 3
    // Maps to microcode on neuromorphic cores
    match state with
    | -1 -> processActive input   // State 0-1365
    |  0 -> processSilent input   // State 1366-2730
    |  1 -> processSpike input    // State 2731-4095
    | _  -> processSilent input   // Out-of-range values treated as silent
On CGRAs (SambaNova RDU, NextSilicon Maverick):
// CGRA implementation leverages word-level reconfiguration
[<CompileToCGRA>]
let ternaryNeuronCGRA (neurons: TernaryNeuron array) =
    cgra {
        // Configure processing elements for ternary operations
        let! pe_array = allocatePEs neurons.Length
        // Runtime reconfiguration based on state distribution
        for pe in pe_array do
            pe.ConfigureForTernary()      // Word-level ternary ops
            pe.SetPositPrecision(16, 1)   // Native posit support
        // Dataflow automatically optimized by platform
        return dataflowProcess neurons
    }
CGRAs are particularly powerful here because they can:
- Reconfigure arithmetic units for posit operations dynamically
- Adapt dataflow patterns based on spike density
- Seamlessly transition between neural and conventional processing
- Implement the forward gradient computation in parallel across PEs, as sketched below
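A sketch of that last point, with assumed helper signatures and Array.Parallel standing in for the PE array: each processing element evaluates one random direction with a forward pass, and the per-PE estimates are averaged to reduce the variance of the unbiased estimator.

// Forward gradients parallelize naturally: one perturbation direction per processing element.
// sampleDirection and directionalDerivative are assumed callbacks; the latter would be a
// forward-mode evaluation of the network along direction v.
let parallelForwardGradient
        (numPEs: int)
        (sampleDirection: unit -> float[])
        (directionalDerivative: float[] -> float) =
    let perPE =
        Array.Parallel.init numPEs (fun _ ->
            let v = sampleDirection ()           // this PE's random direction
            let d = directionalDerivative v      // one forward pass on this PE
            Array.map (fun vi -> d * vi) v)      // local estimate (grad f . v) * v
    // Average the per-PE estimates; each is unbiased, so the mean is too
    let dim = perPE.[0].Length
    Array.init dim (fun i -> (perPE |> Array.sumBy (fun g -> g.[i])) / float numPEs)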
Each of the MPSL framework's learning mechanisms can then be mapped to the hardware best suited to it:
- STBP (Spatio-Temporal Backpropagation): Gradient-based learning that propagates errors through both space (layers) and time (timesteps)
- Hebbian Plasticity: Local learning based on the principle “neurons that fire together, wire together”
- SBP (Self-Backpropagation): Local feedback mechanism that approximates gradients without explicit error propagation
The ternary states provide richer information than binary for all three mechanisms. The MPSL paper’s choice to use binary states was algorithmic convention, not hardware necessity.
Each learning mechanism operates independently on parallel cores, then combines via learnable coefficients as described in the MPSL paper:
\[ W^l = \sum_{i=1}^{3} \lambda_i W_i^l \]

Where \(\lambda_i\) are adaptively learned mixing coefficients, optimized through local feedback using forward gradients, not global backpropagation.
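As a minimal sketch of that combination in plain floats (illustrative names; the clamping and renormalization of the coefficients is an assumed convention for the sketch, not taken from the paper):

// W^l = sum_i lambda_i * W_i^l, combining the three plasticity-specific weight matrices
let combineWeights (lambdas: float[]) (weightSets: float[][][]) =
    let rows, cols = weightSets.[0].Length, weightSets.[0].[0].Length
    Array.init rows (fun r ->
        Array.init cols (fun c ->
            lambdas |> Array.mapi (fun i lam -> lam * weightSets.[i].[r].[c]) |> Array.sum))

// The mixing coefficients can be nudged with the same forward-gradient signal and kept
// as a convex combination by clamping and renormalizing
let updateLambdas (eta: float) (lambdaGradients: float[]) (lambdas: float[]) =
    let updated = Array.map2 (fun lam g -> max 0.0 (lam - eta * g)) lambdas lambdaGradients
    let total = Array.sum updated
    if total > 0.0 then updated |> Array.map (fun x -> x / total) else lambdas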
Revolutionary Performance Projections
The convergence of ternary representations, forward gradient training, and advanced acceleration hardware promises unprecedented efficiency gains over both conventional approaches and existing binary SNNs:
| Metric | GPU (A100) | Binary SNN (Multi-Plasticity)1 | Ternary + Forward Gradient | Improvement vs GPU |
|---|---|---|---|---|
| Power (Inference) | 400W | 50W | 1-5W | 80-400x |
| Power (Training) | 400W | 100W | 2-10W | 40-200x |
| Latency (per spike) | 10μs | 1μs | 10-100ns | 100-10000x |
| Training passes | 2 (fwd+bwd) | 2 (fwd+bwd) | 1 (fwd only) | 2x |
| Gradient accuracy | N/A | Surrogate | Exact | Mathematically honest |
| Information preserved | N/A | Binary states | Ternary states | 50% more |
| Biological correspondence | None | Medium | High | Paradigm shift |
Note: Performance varies by neuromorphic processor and deployment configuration.
The forward gradient approach demonstrated up to a 2x speedup over backpropagation in conventional networks2. For SNNs, the advantage stands to be even greater since we eliminate the surrogate gradient approximation entirely.
Roadmap: From Vision to Silicon
Phase 1: Foundation
- Implement ternary SNN models in Fidelity framework
- Integrate forward gradient training via Furnace
- Develop neuromorphic backend for Firefly compiler
- Demonstrate MNIST/CIFAR-10 benchmarks
Phase 2: Hardware Integration
- Intel Loihi 2 support with ternary neuron models
- BAREWire integration for event streaming
- Posit arithmetic emulation on fixed-point units
- Heterogeneous CPU-neuromorphic demonstrations
Phase 3: Platform Expansion
- Support for IBM TrueNorth, BrainChip Akida
- FPGA-based neuromorphic emulation
- Cloud deployment with neuromorphic simulation
- Edge deployment on heterogeneous SoCs
Phase 4: Applications
- Real-time sensor fusion for robotics
- Ultra-low-power edge AI
- High-throughput inference systems
- Continuous learning systems
The Strategic Opportunity
The convergence of these technologies reveals an extraordinary opportunity: the hardware is already here, waiting for the right algorithms to unlock its potential. Current neuromorphic software treats advanced processors as if they were simple binary spike generators, using only a fraction of their capabilities. Similarly, CGRAs from NextSilicon and SambaNova are often programmed with conventional approaches that don’t leverage their reconfigurable nature. Our framework would change this by:
- Utilizing existing hardware features: Ternary states map naturally to the multi-bit spikes and programmable neurons already in silicon
- Eliminating algorithmic bottlenecks: Forward gradients remove the surrogate gradient fiction that has limited SNN training
- Providing unified abstractions: F# code that compiles efficiently to both neuromorphic and CGRA targets
Platform-Specific Advantages
For Neuromorphic Processors (Intel, IBM, BrainChip):
- Finally use the full state space (4096 states, not just 2)
- Leverage graded spikes for richer information encoding
- Implement true online learning without backpropagation
For CGRAs (NextSilicon, SambaNova):
- Natural word-level operations for ternary representations
- Runtime reconfiguration for adaptive neural topologies
- Seamless integration of neural and conventional processing
For Heterogeneous Systems:
- Neuromorphic cores for spiking dynamics
- CGRA/GPU for dense operations when needed
- CPU for orchestration and control flow
- All unified through BAREWire’s zero-copy communication
Why This Convergence Matters Now
The hardware ecosystem has reached a critical point where multiple platforms (neuromorphic processors, CGRAs, and heterogeneous systems) all have the capabilities needed for advanced SNNs. What's been missing is the software layer that can:
- Train these networks without mathematical compromises
- Deploy across diverse hardware without rewriting
- Utilize the full capabilities of modern silicon
The Fidelity framework with forward gradient training provides exactly this missing piece.
Unlocking Today’s & Tomorrow’s Silicon
The intelligent chip revolution isn’t waiting for new hardware; it’s waiting for software that can unleash the capabilities already available. Neuromorphic chips have multiple states per neuron, but we’ve been using just two. CGRA solutions can reconfigure every clock cycle, but we’ve been treating them like fixed architectures. The hardware industry has delivered remarkable capabilities; now it’s time for algorithms to catch up.
Our approach, converging ternary spiking neural networks with forward gradient training and integration with existing hardware, represents more than incremental progress; it's about finally using what we've already built. By embracing nature's organizing principles and matching them to silicon's actual capabilities, we can achieve the efficiency gains that neuromorphic computing has long promised.
The mathematical foundations are now clear:
- Ternary modeling leverages the multi-state capabilities already in neuromorphic processors and CGRAs, capturing distinct computational regimes of biological neurons
- Forward gradients provide exact training without the surrogate approximations that have limited the field
- Posit arithmetic maps naturally to the word-level operations of CGRAs and programmable precision of neuromorphic chips
- Existing hardware from Intel, IBM, BrainChip, NextSilicon, and SambaNova is ready today
The Fidelity framework bridges the gap between hardware capability and algorithmic reality. Our control-flow to data-flow compilation, forward gradient training powered by Furnace, and platform-agnostic approach create the software foundation that can finally free the potential of neuromorphic and reconfigurable hardware.
The future of neuromorphic computing isn’t just about building new silicon; it’s also about finding ways to fully employ this remarkable silicon to its fullest, world-changing potential.
Let’s unlock true intelligence with The Fidelity Framework.
Liu, Y., Deng, X., & Yu, Q. (2025). Multi-Plasticity Synergy with Adaptive Mechanism Assignment for Training Spiking Neural Networks. arXiv preprint arXiv:2508.13673.
Baydin, A. G., Pearlmutter, B. A., Syme, D., Wood, F., & Torr, P. (2022). Gradients without Backpropagation. arXiv preprint arXiv:2202.08587.