ARM7 32-bit MCUs are Well Known and Widely Used in Embedded Applications.
So What Future is There for the 16-bit MCU?
By Gidi Mizrahi, B.Eng., Field Applications Engineer, Future Electronics (Israel)
Specifying a high-end microcontroller can look simple: the price of 32-bit microcontrollers has fallen very close to the price of a high-end 8-bit microcontroller. The 32-bit devices offer higher clock speeds, support larger memories and provide more I/O than the 8-bit devices. So surely it is obvious that the designer looking for more performance than their 8-bit device can offer should migrate straight to a 32-bit micro?
In fact, the tried and tested 16-bit microcontroller
has certain advantages over a 32-bit device. A 16-bit
MCU occupies a similar price point to many
32-bit MCUs, among which devices using the ARM7
core are very popular. But embedded designs typically
require deterministic execution of code, a small footprint
and ease of software design, and in some
cases a 16-bit device is better able to offer these
characteristics. So how to choose? The designer’s
evaluation should start with an examination of the
rival architectures.
The most useful comparison to make is between
the Harvard architecture most often used by 16-
bit MCUs, and the von Neumann architecture used
by ARM7 devices.
How the Different Architectures Execute Instructions

Figure 1. The ARM7 core’s von Neumann architecture, as implemented by NXP
The von Neumann architecture used by the ARM7
core, named after the mathematician and early
computer scientist John von Neumann, was originally
developed for use in computers. Its distinctive
feature is that it uses a single storage structure to
hold both program memory and data memory (see
Figure 1).
The 16-bit Harvard architecture has a crucial difference
from the von Neumann architecture. Here,
there are separate memory spaces for program
memory and data memory (see Figure 2). A 16-bit
RISC CPU core can often have a program memory bus
that is wider than its data buses, alongside one or
two 16-bit data buses.

Figure 2. The Harvard architecture as implemented in some 16-bit MCUs
Even a cursory comparison of Figures 1 and 2
reveals one obvious advantage of the Harvard architecture:
the separate data and program buses
allow simultaneous access to both program memory
and data memory. Since an instruction fetch never has to
wait for a data access to release a shared bus, faster and more
deterministic execution is often possible. This is
particularly valuable in applications that are rich in
single-cycle, single-word instructions.
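The effect of separate buses can be illustrated with a deliberately simplified cycle-count model. The sketch below assumes that every instruction fetch and every data access takes exactly one bus cycle, with no caches, wait states or pipeline effects. The function names are hypothetical, and the model does not describe any particular device.

```c
#include <stdint.h>

/* Simplified model: each instruction needs one fetch, and some
 * instructions also need one data access. Every access is assumed
 * to take one bus cycle. */

/* Shared bus (von Neumann): fetches and data accesses are serialized,
 * so the cycle count grows with the number of data accesses. */
uint32_t shared_bus_cycles(uint32_t instructions, uint32_t data_accesses)
{
    return instructions + data_accesses;
}

/* Separate buses (Harvard): a data access can overlap an instruction
 * fetch, so the longer of the two streams sets the cycle count. */
uint32_t separate_bus_cycles(uint32_t instructions, uint32_t data_accesses)
{
    return (instructions > data_accesses) ? instructions : data_accesses;
}
```

Under these assumptions, 1000 instructions that include 400 data accesses take 1400 cycles on the shared bus but 1000 cycles with separate buses. In the second case the count does not change with the mix of data accesses, as long as there is at most one per instruction, which is one way to picture the more deterministic timing.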
So, for instance, some 16-bit
MCUs operate at full speed from
on-board Flash (at up to 40MHz).
This high operating frequency,
efficiently used by the internal
circuitry of the devices, provides
the deterministic performance expected
by control engineers.
By contrast, in the ARM7 core, the
single path between the CPU and
memory, shared by instructions and
data, can lead to a situation
known as the ‘von Neumann bottleneck’. Under
some circumstances (when the CPU is required to
perform minimal processing on large amounts of
data), this gives rise to a serious limitation in effective
processing speed. This is because the CPU
is constantly waiting for vital data to be transferred
to or from memory. Interestingly, the bottleneck
has the potential to become tighter the higher the
CPU operating frequency rises and the bigger the
memory grows.
Suppliers of ARM7 devices have worked hard to
mitigate this inherent weakness. NXP, for instance,
in its LPC2000 family provides a Memory Accelerator
Module (MAM). This works with a CMOS Flash memory
that is 128 bits wide, so that one fetch reads four 32-bit
words at a time. The devices can also implement a
complex fetching sequence that uses multiple buffers
to speculatively pre-fetch and store one batch
of data or instructions while the CPU is still executing
a previous batch.
The main purpose of this complex scheme is to
prevent a branch or data access from stalling the
processor, especially during real-time operations.
Nevertheless, this effort by NXP to work around
the von Neumann architecture has a cost: branches
break up the sequence of code execution and
require the pre-fetch buffers to be flushed and
re-filled. This consumes clock cycles and
slows down code execution.
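The cost of branches in a wide-fetch scheme can be sketched with a small model. The code below assumes a 128-bit Flash line holding four 32-bit instruction words and a single one-line buffer. That is a simplification, since the devices described use multiple buffers, and the function name is hypothetical. It counts how often the buffer must be refilled for a given trace of instruction word addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_LINE 4u  /* 128-bit Flash line / 32-bit instruction word */

/* Count Flash line fetches for a trace of instruction word addresses,
 * assuming a single one-line buffer (a simplification of real devices). */
size_t count_line_fetches(const uint32_t *word_addr, size_t n)
{
    size_t fetches = 0;
    uint32_t buffered_line = UINT32_MAX;  /* sentinel: nothing buffered yet */

    for (size_t i = 0; i < n; i++) {
        uint32_t line = word_addr[i] / WORDS_PER_LINE;
        if (line != buffered_line) {      /* miss: buffer must be refilled */
            buffered_line = line;
            fetches++;
        }
    }
    return fetches;
}
```

In this model, eight sequential instructions at word addresses 0 to 7 need two line fetches. Eight instructions that branch between lines (for example the trace 0, 1, 8, 9, 2, 3, 12, 13) need four, so the same amount of work costs twice as many Flash accesses.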
At this stage, then, it could seem as though the
16-bit device is a clear winner. But it is not so simple.
For a start, ARM7 devices can mitigate their
congested architecture by driving traffic through
at higher frequencies. While typical 16-bit MCUs
operate at 40MIPS CPU speed, NXP’s LPC2100
ARM7 family is quoted as offering CPU speeds up
to 72MHz. To use an analogy, the Harvard device
is like a wide road that accommodates more traffic;
the ARM device might have a narrower road,
but each vehicle is moving a whole lot faster than
in the Harvard device. Indeed, suppliers of ARM7
devices can always find performance tests in
which their device executes code faster than a
comparable Harvard-architecture 16-bit device,
and vice versa.
Second, look again at the Harvard architecture
in Figure 2: it uses two different buses, one for
data and one for program memory. This architecture
is far from easy to implement in silicon. And
the difficulty of both designing and manufacturing
Harvard-architecture devices is reflected in higher
priced silicon or a less abundant feature set.
The lack of competitiveness of 16-bit devices in
terms of price and features is not their only drawback.
They are also constrained in the amount of
memory that they can access. At best, a 16-bit
device today can address 256kB of Flash. The
roadmaps of some 16-bit manufacturers envisage
devices offering 512kB of Flash, but that is yet to
be delivered in working silicon.
There is one other important area in which to draw
a comparison between 16-bit Harvard devices and
ARM7-based 32-bit MCUs: code efficiency. How
compact the compiled code for a given device turns
out to be depends on many variables, not least
of which is the quality of the compiler. In addition,
ARM7 devices can be operated in ‘Thumb’ mode,
in which instructions are compressed to 16 bits
wide to reduce the memory footprint.
Nevertheless, in almost all cases, the compiled
code for a 16-bit MCU will be slightly smaller
than the equivalent code compiled for an
ARM7 device.
It is worth mentioning that the competitive strength
of ARM-based devices could be set to grow in the
near future with the release of MCUs based on
the company’s ‘Cortex’ core. STMicroelectronics
in June 2007 was the first large silicon vendor to
announce an ARM Cortex MCU.
Interesting claims are already being made for the
Cortex core, which breaks from the ARM7 mold by
adopting Harvard architecture. It is said to be fast
and to offer considerably lower power consumption
than the ARM7. It also implements the new
‘Thumb-2’ instruction set, which mixes 16-bit and
32-bit instructions and is said to produce much
smaller compiled code than the ARM7, even when
the ARM7 is used in Thumb mode.
But with ST’s Cortex device only recently introduced
to the market, it is early to be making
definitive judgments about the comparative benefits
of ARM Cortex versus either traditional 16-bit
devices or the ARM7 core.
So at least in the coming months, the main battle
will continue to be between traditional 16-bit devices
and the ARM7 core. And as we can see in
the two contrasting products described below, the
designer’s choice will generally be determined by
the needs of the application.
Home Alarm Control Panel Illustration
The first application I would use to illustrate my
argument is a control panel for a home alarm, in
the form of a touch-sensitive color LCD panel.
This design requires a large memory (to store the
graphics data for the LCD). It also needs a robust
communication interface to the host controller.
An ARM7 MCU such as a device from NXP’s LPC24xx
family would be ideal for this application.
First, it has many important features integrated
into the chip, including Ethernet, USB host, CAN
bus, 4 UARTs and an LCD driver that can drive
LCDs up to 1024 x 768 pixels. Such a rich feature
set will not be found on any 16-bit device due to
the difficulty of designing and manufacturing such
a device in the Harvard architecture.
The LPC24xx family also offers 512kB of on-board
Flash and 98kB of on-board SRAM – enough to
support the large memory requirements of an LCD
driver. (Interestingly, on introduction, ST’s new
Cortex device integrates only 128kB of Flash – but
more is expected soon.)
The device allows the designer to offer flexible
communications interfaces to the host controller.
In office buildings, an Ethernet infrastructure
will already exist, so this will provide a route back
to the controller. For residential installations, the
device can provide an RS-485 link via one of its
UARTs, or a link over one of its CAN bus interfaces.
Again, no 16-bit device can offer such a wide
choice of communications interfaces on-chip.
Wireless Smart Detector
This design offers a contrast: here, a powerful
MCU will be required to perform intensive calculations,
including Fast Fourier Transforms (FFT), to
process algorithms quickly and communicate back
to the main system controller before going into
Sleep mode. The smart detector’s battery should
be capable of lasting five years.
ARM7 MCUs typically have higher power consumption
than comparable 16-bit processors,
a fact which makes the 16-bit processors
more appropriate for battery-powered operation.
In particular, the peripherals of 16-bit devices can
be driven at variable voltages and at variable clock speeds.
This means the design engineer can use the same
MCU to perform interval measurements, to wake
up from Sleep mode within 10µs, and go into high
speed operational mode to execute algorithms fast
if it detects a change from a sensor. This whole
routine of waking up, executing code and going
back to sleep can be accomplished in as little as
150µs. An ARM7 device, by contrast, has a minimum
wake-up time of more than 1ms, so its wake-up alone
takes more than six times as long as this entire cycle.
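Why a short wake-and-sleep cycle matters for battery life can be shown with a simple duty-cycle calculation. The sketch below is a generic average-current model, not a figure for any real device. The function name is hypothetical, and all current values passed to it are placeholder assumptions chosen only for illustration.

```c
/* Average current (in microamps) of a device that wakes periodically.
 * sleep_ua and active_ua are the currents drawn in each state;
 * active_us is the time awake per cycle and period_us the cycle length.
 * All values supplied by the caller are design assumptions. */
double average_current_ua(double sleep_ua, double active_ua,
                          double active_us, double period_us)
{
    double duty = active_us / period_us;  /* fraction of time awake */
    return active_ua * duty + sleep_ua * (1.0 - duty);
}
```

With placeholder values of 2µA asleep and 10mA awake, waking once per second for 150µs gives an average of about 3.5µA, while staying awake for 1ms per cycle gives about 12µA. The absolute numbers are illustrative only. The point is that the time spent awake per cycle feeds directly into the average current, and therefore into battery life.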
Crucially, with a 16-bit Harvard MCU, the software
execution time is deterministic and predictable.
This means that the software engineer can lower
the clock speed after code execution to save battery
power, send the data to the system controller
and be confident that the data will emerge from
the MCU at the right speed and in the right format.
The ST ARM Cortex 32-bit MCU, however, might
soon be a disruptive technology in this space: its
move to Harvard architecture, combined with low power
consumption and fast start-up from Sleep, could
rewrite the current rulebook.
Conclusion
Both examples above require a powerful MCU: one
– the alarm panel – to drive multiple peripherals
such as the LCD and communications interfaces;
the other – the smart detector – to perform
fast and deterministic code execution. They
illustrate the different strengths of each type of
device. So for now, the embedded developer must
make a choice between a 16-bit Harvard architecture
MCU and a 32-bit ARM7 MCU by reference to
the needs of their application.
But technology never stops changing, and if the
potential of ARM’s Cortex core is fulfilled it could
kill the 16-bit MCU. Only time will tell…
Future Electronics can supply 16-bit MCUs from
Microchip, STMicroelectronics and Freescale
Semiconductor; 32-bit ARM7 MCUs from NXP
and STMicroelectronics; and the new ARM Cortex
MCU from ST.