
 

ARM7 32-bit MCUs are Well Known and Widely Used in Embedded Applications. So What Future is There for the 16-bit MCU?

By Gidi Mizrahi, B.Eng., Field Applications Engineer, Future Electronics (Israel)





Specifying a high-end microcontroller can look simple: the price of 32-bit microcontrollers has fallen very close to the price of a high-end 8-bit microcontroller. The 32-bit devices offer higher clock speeds, support larger memories and provide more I/O than the 8-bit devices. So surely it is obvious that the designer looking for more performance than their 8-bit device can offer should migrate straight to a 32-bit micro?


In fact, the tried and tested 16-bit microcontroller has certain advantages over a 32-bit device. A 16-bit MCU occupies a similar price point to many 32-bit MCUs, among which devices using the ARM7 core are very popular. But embedded designs typically require deterministic execution of code, a small footprint and ease of software design, and in some cases a 16-bit device is better able to offer these characteristics. So how to choose? The designer’s evaluation should start with an examination of the rival architectures.

The most useful comparison to make is between the Harvard architecture most often used by 16-bit MCUs, and the von Neumann architecture used by ARM7 devices.

 

How the Different Architectures Execute Instructions

 


Figure 1. The ARM7 core’s von Neumann architecture, as implemented by NXP

 

The von Neumann architecture used by the ARM7 core, named after the mathematician and early computer scientist John von Neumann, was originally developed for use in computers. Its distinctive feature is that it uses a single storage structure to hold both program memory and data memory (see Figure 1).

The Harvard architecture used by many 16-bit MCUs differs in one crucial respect: it has separate memory spaces for program memory and data memory (see Figure 2). A 16-bit RISC CPU core often has a wider program memory bus, and one or two 16-bit data buses.

 


Figure 2. The Harvard architecture as implemented in some 16-bit MCUs

 

Even a cursory comparison of Figures 1 and 2 reveals one obvious advantage of the Harvard architecture: the separate data and program buses allow simultaneous access to both program memory and data memory. Since an instruction fetch never has to wait for a data access to release the bus, or vice versa, faster and more deterministic execution is often possible. This is particularly valuable in applications that are rich in single-cycle, single-word instructions.
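The effect of the separate buses can be sketched with a toy cycle-count model. It assumes one bus cycle per instruction fetch and one per data access, with no pipelining, caches or wait states. The function names and the instruction mix are illustrative assumptions, not measurements of any real device.

```python
def von_neumann_cycles(instructions, data_accesses):
    """Shared bus: every instruction fetch and every data access
    needs its own bus cycle, so the two counts add up."""
    return instructions + data_accesses


def harvard_cycles(instructions, data_accesses):
    """Separate buses: a data access can overlap an instruction fetch,
    so the busier of the two buses sets the cycle count."""
    return max(instructions, data_accesses)


if __name__ == "__main__":
    # Hypothetical workload: 1000 single-cycle instructions,
    # 400 of which also read or write data memory.
    n_instr, n_data = 1000, 400
    print("von Neumann:", von_neumann_cycles(n_instr, n_data))
    print("Harvard:    ", harvard_cycles(n_instr, n_data))
```

In this simplified model the shared bus needs 1400 cycles and the split buses need 1000. Real cores narrow or widen that gap with pipelining and buffering, so the figures show the direction of the effect rather than its size.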

So, for instance, some 16-bit MCUs operate at full speed from on-board Flash (at up to 40MHz). This high operating frequency, efficiently used by the internal circuitry of the devices, provides the deterministic performance expected by control engineers.

By contrast, in the ARM7 core, the separation between the CPU and memory can lead to a situation known as the ‘von Neumann bottleneck’. Under some circumstances (when the CPU is required to perform minimal processing on large amounts of data), this gives rise to a serious limitation in effective processing speed. This is because the CPU is constantly waiting for vital data to be transferred to or from memory. Interestingly, the bottleneck has the potential to become tighter the higher the CPU operating frequency rises and the bigger the memory grows.

Suppliers of ARM7 devices have worked hard to mitigate this inherent weakness. NXP, for instance, provides a Memory Accelerator Module (MAM) in its LPC2000 family. The MAM works with a Flash memory that is 128 bits wide, so one fetch reads four 32-bit words at a time. The devices can also implement a complex fetching sequence that uses multiple buffers to speculatively pre-fetch and store one batch of data or instructions while the CPU is still executing a previous batch.
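The arithmetic behind the wide fetch can be sketched as follows. The 128-bit line and 32-bit words come from the description above. The Flash access time of three CPU cycles is a hypothetical figure chosen for illustration, and the model covers straight-line code only.

```python
import math

WORDS_PER_FETCH = 4  # 128-bit Flash line / 32-bit instruction words


def flash_reads(n_instructions, words_per_fetch=WORDS_PER_FETCH):
    """Flash reads needed for a straight-line run of instructions."""
    return math.ceil(n_instructions / words_per_fetch)


def average_fetch_cycles(flash_access_cycles, words_per_fetch=WORDS_PER_FETCH):
    """Flash access cost per instruction when one read delivers a full line."""
    return flash_access_cycles / words_per_fetch


if __name__ == "__main__":
    print(flash_reads(100))          # reads for 100 sequential instructions
    print(average_fetch_cycles(3))   # assumed 3-cycle Flash access
```

With the assumed three-cycle access, reading one word at a time would cost three cycles per instruction, while four words per read brings the average down to 0.75 cycles. This is how a relatively slow Flash can keep pace with a faster core as long as execution stays sequential.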

The main purpose of this complex scheme is to prevent a branch or data access from stalling the processor, especially during real-time operations. Nevertheless, NXP’s effort to work around the von Neumann architecture cannot remove the underlying problem: branches still break up the sequence of code execution, and each one requires the pre-fetch buffers to be flushed and re-filled. This consumes clock cycles and slows down code execution.
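A rough model of the refill cost makes the point concrete. It assumes one cycle per instruction while the buffers are full and a fixed refill penalty for every taken branch. The three-cycle penalty and the branch counts are hypothetical values, not figures for the LPC2000.

```python
def cycles_with_refills(n_instructions, taken_branches, refill_penalty_cycles):
    """Total cycles: one per instruction plus one buffer refill per taken branch."""
    return n_instructions + taken_branches * refill_penalty_cycles


def cycles_per_instruction(n_instructions, taken_branches, refill_penalty_cycles):
    """Average cycles per instruction for the same model."""
    total = cycles_with_refills(n_instructions, taken_branches, refill_penalty_cycles)
    return total / n_instructions


if __name__ == "__main__":
    # The same 1000-instruction routine on two hypothetical data-dependent paths.
    print(cycles_with_refills(1000, 50, 3))    # few taken branches
    print(cycles_with_refills(1000, 150, 3))   # many taken branches
```

Under these assumptions the same routine takes 1150 cycles on one path and 1450 on the other. The spread matters as much as the average: when the number of taken branches depends on the input data, so does the execution time, which is what makes timing harder to predict.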

At this stage, then, it could seem as though the 16-bit device is a clear winner. But it is not so simple. For a start, ARM7 devices can mitigate their congested architecture by driving traffic through at higher frequencies. While typical 16-bit MCUs operate at 40MIPS CPU speed, NXP’s LPC2100 ARM7 family is quoted as offering CPU speeds up to 72MHz. To use an analogy, the Harvard device is like a wide road that accommodates more traffic; the ARM device might have a narrower road, but each vehicle is moving a whole lot faster than in the Harvard device. Indeed, suppliers of ARM7 devices can always find performance tests in which their device executes code faster than a comparable Harvard-architecture 16-bit device, and vice versa.

Second, look again at the Harvard architecture in Figure 2: it uses two different buses, one for data and one for program memory. This architecture is far from easy to implement in silicon. And the difficulty of both designing and manufacturing Harvard-architecture devices is reflected in higher priced silicon or a less abundant feature set.

The lack of competitiveness of 16-bit devices in terms of price and features is not their only drawback. They are also constrained in the amount of memory that they can access. At best, a 16-bit device today can address 256kB of Flash. The roadmaps of some 16-bit manufacturers envisage devices offering 512kB of Flash, but that is yet to be delivered in working silicon.

There is one other important area in which to draw a comparison between 16-bit Harvard devices and ARM7-based 32-bit MCUs: code efficiency. How compact the compiled code for a given device turns out to be depends on many variables, not least of which is the quality of the compiler. In addition, ARM7 devices can be operated in ‘Thumb’ mode, in which instructions are compressed to 16 bits wide to save on memory footprint.

Nevertheless, in almost all cases, the compiled code for a 16-bit MCU will be slightly smaller than comparable code compiled for an ARM7 device.
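The footprint trade-off behind Thumb mode can be sketched with simple arithmetic. The instruction widths are those described above. The instruction counts are assumptions: the example supposes that the Thumb build needs 30% more instructions than the ARM build to do the same work, a placeholder that varies with the compiler and the code.

```python
ARM_BYTES = 4    # ARM-mode instruction: 32 bits
THUMB_BYTES = 2  # Thumb-mode instruction: 16 bits


def code_size_bytes(n_instructions, bytes_per_instruction):
    """Flash footprint of a routine with fixed-width instructions."""
    return n_instructions * bytes_per_instruction


if __name__ == "__main__":
    arm_instructions = 10_000     # hypothetical routine compiled for ARM mode
    thumb_instructions = 13_000   # assumed 30% more instructions in Thumb mode

    arm_size = code_size_bytes(arm_instructions, ARM_BYTES)
    thumb_size = code_size_bytes(thumb_instructions, THUMB_BYTES)
    print(arm_size, thumb_size, thumb_size / arm_size)
```

With these assumed counts the Thumb build occupies 26,000 bytes against 40,000 bytes in ARM mode. The saving is smaller than the halving of instruction width would suggest, because narrower instructions each do less work.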

It is worth mentioning that the competitive strength of ARM-based devices could be set to grow in the near future with the release of MCUs based on the company’s ‘Cortex’ core. ST Microelectronics in June 2007 was the first large silicon vendor to announce an ARM Cortex MCU.

Interesting claims are already being made for the Cortex core, which breaks from the ARM7 mold by adopting a Harvard architecture. It is said to be fast and to offer considerably lower power consumption than the ARM7. It also implements a new ‘Thumb 2’ instruction set, which mixes 16-bit and 32-bit instructions and is said to produce much smaller compiled code than the ARM7, even when the ARM7 is used in Thumb mode.

But with ST’s Cortex device only recently introduced to the market, it is early to be making definitive judgments about the comparative benefits of ARM Cortex versus either traditional 16-bit devices or the ARM7 core.

So at least in the coming months, the main battle will continue to be between traditional 16-bit devices and the ARM7 core. And as we can see in the two contrasting products described below, the designer’s choice will generally be determined by the needs of the application.

 

Home Alarm Control Panel Illustration

The first application I would use to illustrate my argument is a control panel for a home alarm, in the form of a touch-sensitive color LCD panel. This design requires a large memory (to save the graphics data for the LCD). It also needs a robust communication interface to the host controller.

An ARM7 MCU such as a device from NXP’s LPC24xx family would be ideal for this application. First, it has many important features integrated into the chip, including Ethernet, USB host, CAN bus, 4 UARTs and an LCD driver that can drive LCDs up to 1024 x 768 pixels. Such a rich feature set will not be found on any 16-bit device due to the difficulty of designing and manufacturing such a device in the Harvard architecture.

The LPC24xx family also offers 512kB of on-board Flash and 98kB of on-board SRAM – enough to support the large memory requirements of an LCD driver. (Interestingly, on introduction, ST’s new Cortex device integrates only 128kB of Flash – but more is expected soon.)
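How far the on-board SRAM goes depends heavily on the panel. The sketch below computes the size of a single frame buffer and compares it with the 98kB quoted above. The panel resolutions and color depths are assumptions chosen for illustration, and the calculation ignores any other use of the SRAM.

```python
SRAM_BYTES = 98 * 1024  # 98kB of on-board SRAM, as quoted in the text


def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


def fits_on_chip(width, height, bits_per_pixel, sram_bytes=SRAM_BYTES):
    """True if one frame fits in the given amount of SRAM."""
    return framebuffer_bytes(width, height, bits_per_pixel) <= sram_bytes


if __name__ == "__main__":
    # Hypothetical panels: a small QVGA display and the largest supported size.
    print(framebuffer_bytes(320, 240, 8), fits_on_chip(320, 240, 8))
    print(framebuffer_bytes(1024, 768, 16), fits_on_chip(1024, 768, 16))
```

A 320 x 240 panel at 8 bits per pixel needs 76,800 bytes and fits, while a 1024 x 768 panel at 16 bits per pixel needs 1,572,864 bytes, far more than the on-chip SRAM. A control panel of the kind described here would plausibly sit at the small end of that range.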

The device allows the designer to offer flexible communications interfaces to the host controller. In office buildings, an Ethernet infrastructure will already exist, so this will provide a route back to the controller. For residential installations, the device provides an RS-485 link using one of the UART or CAN bus interfaces.

Again, no 16-bit device can offer such a wide choice of communications interfaces on-chip.

 

Wireless Smart Detector

This design offers a contrast: here, a powerful MCU will be required to perform intensive calculations, including Fast Fourier Transforms (FFT), to process algorithms quickly and communicate back to the main system controller before going into Sleep mode. The smart detector’s battery should be capable of lasting five years.

ARM7 MCUs typically have higher power consumption than comparable 16-bit processors, which makes the 16-bit processors more appropriate for battery-powered operation. In particular, the peripherals of 16-bit devices can be driven at variable voltages and at variable clock speeds. This means the design engineer can use the same MCU to perform interval measurements, to wake up from Sleep mode within 10µs, and to switch into high-speed operation to execute algorithms quickly if it detects a change from a sensor. The whole routine of waking up, executing code and going back to sleep can be accomplished in as little as 150µs. An ARM7 device, by contrast, has a minimum wake-up time of more than 1ms, so its wake-up alone takes more than six times as long as the 16-bit device’s entire cycle.

Crucially, with a 16-bit Harvard MCU, the software execution time is deterministic and predictable. This means that the software engineer can lower the clock speed after code execution to save battery power, send the data to the system controller and be confident that the data will emerge from the MCU at the right speed and in the right format. The ST ARM Cortex 32-bit MCU, however, might soon prove a disruptive technology in this space: its move to a Harvard architecture, combined with low power consumption and fast start-up from Sleep, might rewrite the current rulebook.
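The impact of the awake time on battery life can be estimated with a duty-cycle calculation. Only the 150µs and 1ms durations come from the text. The active current, sleep current, wake-up interval and battery capacity are all hypothetical, the same currents are assumed for both devices, and battery self-discharge is ignored.

```python
HOURS_PER_YEAR = 24 * 365


def average_current_ma(active_ma, sleep_ma, active_s, period_s):
    """Time-weighted average current over one wake/sleep period."""
    duty = active_s / period_s
    return active_ma * duty + sleep_ma * (1 - duty)


def battery_life_years(capacity_mah, avg_ma):
    """Idealized battery life at a constant average current."""
    return capacity_mah / avg_ma / HOURS_PER_YEAR


if __name__ == "__main__":
    # Assumed values: 10 mA active, 2 uA asleep, one wake-up per second, 220 mAh cell.
    active_ma, sleep_ma, period_s, capacity_mah = 10.0, 0.002, 1.0, 220.0

    for label, awake_s in (("150 us awake", 150e-6), ("1 ms awake", 1e-3)):
        avg = average_current_ma(active_ma, sleep_ma, awake_s, period_s)
        print(label, round(battery_life_years(capacity_mah, avg), 1), "years")
```

Under these assumptions the 150µs cycle gives a life of roughly seven years and the 1ms cycle roughly two, which puts the five-year target on opposite sides of the two cases. The absolute numbers depend entirely on the assumed currents, but the sensitivity to awake time is the point.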

 

Conclusion

Both examples above require a powerful MCU. The alarm panel needs one to drive multiple peripherals such as the LCD and the communications interfaces; the smart detector needs one to perform fast and deterministic code execution. They illustrate the different strengths of each type of device. So for now, the embedded developer must choose between a 16-bit Harvard architecture MCU and a 32-bit ARM7 MCU by reference to the needs of their application.

But technology never stops changing, and if the potential of ARM’s Cortex core is fulfilled it could kill the 16-bit MCU. Only time will tell…

Future Electronics can supply 16-bit MCUs from Microchip, ST Microelectronics and Freescale Semiconductor; 32-bit ARM7 MCUs from NXP and ST Microelectronics; and the new ARM Cortex MCU from ST.

 

 

 

© 2012 Future Electronics. All rights reserved.
