The danger of deep networks
The danger of deep networks
Posted by Giacomo Indiveri on February 27, 2015
I start from the assumption that:
- our goal is to design a neuromorphic computing system that processes sensory signals (e.g. audio) for basic decision making (e.g., as the outcome of a recognition or classification task)
- we will attempt to minimize latency and power consumption
- we will not have access to additional external resources (e.g. "the cloud")
Re: The danger of deep networks
Posted by Herbert Jaeger on February 27, 2015
I totally agree with Giacomo (although this spoils his intention to spur a hot discussion!). The current standard benchmarks in machine learning are run under quite different premises than what our project is aiming at, and we can't hope to excel in them at face value. But I don't think we actually compete with the current Deep Learning community. Their application scope, while impressive, is actually quite limited: essentially it boils down to what machine learning people would call supervised regression tasks. But beyond those there are entire classes of tasks, and aspects of tasks, that are important but currently a bit out of fashion in machine learning:
- online signal processing
- multiple-timescale memory functionalities
- "lifelong learning" capabilities, that is, extending a learnt model during exploitation when good new training data come along
- reactive systems that are tied in a sense-act loop (all kinds of robot applications, and human-computer-interfacing)
Task aspects that are unconventional in current mainstream ML, but that we must address, are robustness with respect to noise, low parameter precision, parameter drift (I don't know whether that's an issue), local hardware failures, and variance across different copies of a chip. If we show that we can address such aspects, then we are really innovative, not riding the mainstream. The proposal should contain some text explaining that we don't do deep learning in the first place (this needs to be stated explicitly these days, when almost everybody equates ML with deep learning). And we have to think of appropriate benchmarks of our own making (but we should scan the literature for existing benchmarks targeting noise robustness etc.).
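As a concrete illustration of what such a benchmark could probe, here is a minimal sketch (with made-up numbers, not a proposed benchmark) that takes a toy linear readout and measures how its accuracy degrades under two of the hardware-motivated perturbations mentioned above: low parameter precision (weight quantization) and device mismatch or drift (multiplicative weight noise):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" linear classifier: y = sign(w . x).
# The test labels are defined by the clean weights, so clean accuracy is 1.0.
n_features = 100
w = rng.normal(size=n_features)
X = rng.normal(size=(500, n_features))
y = np.sign(X @ w)

def accuracy(weights):
    return np.mean(np.sign(X @ weights) == y)

# 1) Low parameter precision: quantize weights to 4-bit signed levels.
levels = 2**4
w_q = np.round(w / np.abs(w).max() * (levels / 2 - 1)) \
      / (levels / 2 - 1) * np.abs(w).max()

# 2) Parameter drift / chip-to-chip mismatch: 20% multiplicative noise.
w_n = w * (1 + 0.2 * rng.normal(size=n_features))

print("clean     :", accuracy(w))    # 1.0 by construction
print("4-bit     :", accuracy(w_q))
print("20% drift :", accuracy(w_n))
```

The point is not the specific numbers but the protocol: a benchmark of our own making could report accuracy as a function of precision and drift level, rather than a single clean-data score.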
Re: The danger of deep networks
Posted by Bernabe Linares-Barranco on March 1, 2015
There is one thing I would like to clarify/comment on spiking nets, in relation to a sentence from Herbert in his email of Feb 21st: "spiking dynamics need to be integrated in time to yield such analog values". Yes, there is a very obvious way to map from analog-valued continuous neurons to spiking neurons, which is to integrate the spikes to obtain such an analog value. But this is highly inefficient and yields an explosion of spikes: for example, if you use 8-bit precision for the analog values, each neuron needs between 0 and 255 spikes in a given time interval.
To exploit the advantages of spiking signal representation efficiently, it is useful to think of each neuron as receiving spikes from a large receptive field of neurons (about 10,000 in biology), with each neuron usually needing to contribute just one spike to signal the presence of a given "feature". Each neuron thus receives a collection of "features" from its receptive field in a given time window (usually a few milliseconds), and is tuned to detect a more compound "feature". One can think of this as a "coincidence detector". This way, spike encoding can be very efficient, and it can also result in what I like to call the "pseudo-simultaneity" property, which can be very interesting for recurrent topologies where feedback is used.
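The coincidence-detector idea can be sketched in a few lines. This is a functional caricature, not a biophysical neuron model, and the window and threshold values are assumptions: the neuron emits one output spike as soon as at least `threshold` input spikes from its receptive field fall within a `window`-long interval.

```python
import numpy as np

def coincidence_detector(spike_times, window=0.005, threshold=5):
    """Return the time of the first output spike, or None.

    spike_times: sorted input spike times in seconds, pooled over the
    neuron's receptive field.
    """
    spike_times = np.asarray(spike_times)
    for t in spike_times:
        # count input spikes inside the interval (t - window, t]
        n_recent = (np.searchsorted(spike_times, t, side="right")
                    - np.searchsorted(spike_times, t - window, side="left"))
        if n_recent >= threshold:
            return t
    return None

# Five nearly simultaneous spikes (e.g. an aligned edge) trigger one spike...
print(coincidence_detector([0.010, 0.011, 0.012, 0.013, 0.014]))  # 0.014
# ...while the same five spikes spread over 50 ms trigger nothing.
print(coincidence_detector([0.010, 0.022, 0.034, 0.046, 0.058]))  # None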
Let me use the following figure to explain this.
This represents a feedforward 5-stage vision processing system. The top part is a classical frame-based system, where each stage can be implemented using analog-graded neurons. Each frame is processed stage after stage, each stage requiring one frame-processing time (assumed here to be 1 ms). Therefore, if the sensor also requires 1 ms to acquire an image, one obtains "recognition" at time 6 ms.
In a spiking system with a spiking retina (bottom part), the sensor is already providing spikes while things are happening in reality (well, with a typical delay per spike in the range of microseconds). The first layer is a collection of spiking feature detectors, typically for oriented segments. So, as soon as a few retina pixels provide enough spatially aligned events (say 5 to 10 spikes representing an oriented edge), a neuron in the first layer that detects this oriented edge at that particular location will fire. The second layer groups oriented edges to form more complex shapes, and so on, layer after layer, up to full object recognition. No layer needs to wait for a full frame to be processed: each neuron, independently, signals the presence of its feature. This way, recognition is possible while the sensor is still providing spikes, and it is (theoretically) possible to adjust parameters so that each neuron fires just one spike. If you look at the bottom part of the figure (event-based processing), all layers operate almost simultaneously. This is very interesting for processing with feedback, as in recurrent systems.
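The latency argument can be summarized with simple arithmetic. The frame-based numbers are the ones stated above; the per-layer spike delay for the event-based path is an assumed value in the tens-of-microseconds range, only meant to show the order-of-magnitude gap:

```python
# Back-of-the-envelope latency for the 5-stage pipeline in the figure.
# frame_time comes from the post; spike_delay is an assumed value.

n_stages = 5
frame_time = 1e-3      # 1 ms to acquire or to process one frame
spike_delay = 10e-6    # assumed per-layer spike latency (10 us)

# Frame-based: acquire one frame, then process it stage after stage.
frame_latency = frame_time + n_stages * frame_time

# Event-based: each layer fires as soon as a few input spikes coincide,
# so a first recognition spike can appear after roughly one per-layer
# delay per stage, while the sensor is still producing events.
event_latency = n_stages * spike_delay

print(f"frame-based : {frame_latency * 1e3:.1f} ms")  # 6.0 ms
print(f"event-based : {event_latency * 1e6:.0f} us")  # 50 us
```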
Feedback processing in a frame-based vision system would require sending an image back to a previous stage, adding/combining it with the feedforward flowing one, and iterating until convergence.
However, in a spiking system, events flow naturally as they occur, and those flowing backwards are combined with those flowing forward naturally. One just needs to ensure stability.
I hope you find this interesting and constructive.