Project MONDEGO/ Research

Adapted processing architectures

Surveillance and monitoring in poorly structured environments require sensor networks with very low power consumption. Advanced features can be incorporated at the sensor nodes only within a low power budget [12]. Although projections for integrated-circuit performance point to a sustained increase in available computing power [13], conventional architectures are giving way to multi-core systems [14]. The reason is the need to avoid the power-consumption limits imposed by the thermal characteristics of the materials: there is a maximum amount of energy that the silicon die and the chip package can dissipate [15].

An additional difficulty is the implementation of vision algorithms on conventional architectures. The visual stimulus is multidimensional and highly correlated in space and time. The signals from the different sensors of the array are inherently parallel, whereas conventional architectures have, in principle, a low degree of parallelism. This can be increased by proper instruction pipelining [16]. It is also common to use hardware accelerators for specific tasks [17].

Finally, a major problem in conventional architectures is the limited memory bandwidth [18]. Because of the massive data flow and the inherent parallelism of the signals, memory access becomes a data bottleneck. Processing speed is limited unless extra power consumption is permitted. The current trend, especially in image and video processing, is to divide tasks among several cores, each with its own cache. These multi-core systems deliver more computing power per milliwatt than their single-core counterparts [19]. Basically, while power consumption scales linearly with the CPU clock frequency, computing power, measured in MIPS, does not depend solely on the processor. Some elements, such as memory access, do not scale with the clock and still influence the achievable computing power. As a result, two processors operating in parallel are more efficient than a single processor operating at twice the frequency. This technique is limited by the fact that not every task can be parallelized; Amdahl's law [20] puts a limit on the speedup that can be achieved by parallelization. This limit has been revised, however, by taking into account different mechanisms for internal reconfiguration and the significant reduction in memory latency obtained when using distributed resources [21]. The following table shows the effect of parallelization on the energy efficiency of the processor:

Table I. Digital processors of different architecture.
Ref.	Year	Processor	Tech. (nm)	Freq. (MHz)	MIPS/mW
[22]	2008	Intel® Atom™, single-core	45	1730	1.32
[23]	2010	Intel® Atom™, dual-core, Z520	45	1300	1.64
[24]	2010	NVIDIA Tegra T20-250, 10-core	40	1000	4.60
[25]	2011	Cellular Processor Array, 22×19 SIMD array	350	75	38.0
[26]	2007	Xetal-II, SIMD with 320 heterogeneous PEs	90	84	178.3
[27]	2011	Multiple level SIMD, 32PEs + 32×128 SIMD	180	100	217.3
[28]	2011	Digital Cellular Processor, 120PEs + 80×60	130	200	285.7
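
The argument leading up to Table I combines two effects: power that scales linearly with the clock while memory access does not, and Amdahl's limit on parallel speedup. The following is a minimal Python sketch of both, not a model of any processor in the table. The workload figures (`CYCLES`, `MEM_TIME`, `F`), the function names, and the assumption that each core with its own cache sees half of the memory stalls are all hypothetical and chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: best-case speedup when only part of a task is parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


def run_time(compute_cycles: float, clock_hz: float, memory_time_s: float) -> float:
    """Toy model: compute time shrinks with the clock, memory time does not."""
    return compute_cycles / clock_hz + memory_time_s


# Hypothetical workload (assumption): 1e9 compute cycles, 0.5 s of memory stalls.
CYCLES = 1e9
MEM_TIME = 0.5
F = 1e9  # base clock frequency in Hz (assumption)

# One core at 2F. With power linear in frequency, it draws about 2 units of power.
t_single = run_time(CYCLES, 2 * F, MEM_TIME)

# Two cores at F, each with its own cache and half of the work (assumption).
# Total power is also about 2 units, so energy scales with the run time alone.
t_dual = run_time(CYCLES / 2, F, MEM_TIME / 2)

efficiency_gain = t_single / t_dual  # same power, so this is the energy ratio

print(t_single, t_dual)                     # 1.0 0.75
print(round(efficiency_gain, 2))            # 1.33
print(round(amdahl_speedup(0.9, 2), 2))     # 1.82
print(round(amdahl_speedup(0.9, 1000), 2))  # 9.91
```

Under these assumptions both configurations draw the same power, but the dual-core one finishes sooner because the memory time is shared rather than merely clocked faster. The last two lines show Amdahl's limit: with 90% of the task parallelizable, the speedup cannot exceed 10 regardless of the number of cores.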

As can be seen, the most energy-efficient processors in Table I were built in less advanced technologies. This has been achieved by exploiting parallelism: the clock frequency is lower and memory access is much simpler, and both contribute to power savings. Still, further adaptation is possible. We propose a bio-inspired approach in which vision is a hierarchical process. Information is refined in the sense that the amount of data decreases as the data gain in complexity and formal content [29]. In the biological retina, image features are extracted by processing the visual stimulus in parallel. Anything within the visual field is represented by a combination of features, which constitutes a description that the brain can handle naturally [30]. At every stage of this process, the proper allocation of resources makes it possible to resolve the speed/power trade-off. Resource allocation is a twofold problem. On the one hand, the algorithm needs to be partitioned into tasks according to an efficiency criterion. On the other hand, the selection of the appropriate circuit blocks to realize these tasks also contributes to reducing the power budget. In particular, tasks requiring low to moderate accuracy can be realized very efficiently at the focal plane by means of analog circuits. Energy efficiency is boosted, with reported figures of 250 MOPS/mW [31] and 214 MOPS/mW [32]. These chips are more than processor arrays: they also contain photosensors that operate concurrently with the processing circuitry.
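
The observation about Table I made at the start of the paragraph above, that the most efficient entries use older technology nodes and lower clock frequencies, can be checked with a few lines of Python. The rows are copied from Table I; the tuple layout and variable names are illustrative choices, and the efficiency ratio relative to the single-core Atom is a derived quantity that the original table does not list.

```python
# Rows copied from Table I: (ref, processor, tech_nm, freq_mhz, mips_per_mw)
TABLE_I = [
    ("[22]", "Intel Atom, single-core", 45, 1730, 1.32),
    ("[23]", "Intel Atom, dual-core, Z520", 45, 1300, 1.64),
    ("[24]", "NVIDIA Tegra T20-250, 10-core", 40, 1000, 4.60),
    ("[25]", "Cellular Processor Array, 22x19 SIMD array", 350, 75, 38.0),
    ("[26]", "Xetal-II, SIMD with 320 heterogeneous PEs", 90, 84, 178.3),
    ("[27]", "Multiple level SIMD, 32 PEs + 32x128 SIMD", 180, 100, 217.3),
    ("[28]", "Digital Cellular Processor, 120 PEs + 80x60", 130, 200, 285.7),
]

# Use the single-core Atom [22] as the reference point for the ratios.
baseline = TABLE_I[0][4]

# Sort from most to least energy efficient.
ranked = sorted(TABLE_I, key=lambda row: row[4], reverse=True)

for ref, name, tech_nm, freq_mhz, eff in ranked:
    ratio = eff / baseline
    print(f"{ref} {name}: {eff} MIPS/mW ({ratio:.1f}x), "
          f"{tech_nm} nm, {freq_mhz} MHz")

# The three most efficient entries, for a quick check of the claim.
top_three = ranked[:3]
```

In this ranking the three most efficient entries all use nodes of 90 nm or coarser and clocks of 200 MHz or less, while the three general-purpose processors at 40 to 45 nm and 1 GHz or more sit at the bottom.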