Jazelle Extension
Another way to speed Java execution on a standard RISC architecture is to extend the architecture itself so that it executes Java instructions directly. ARM's designers added a Java instruction set to the classic ARM architecture. The Java ISA runs in a separate Java mode, which is entered on a branch. In Java mode, the CPU executes Java bytecode instructions. Bytecodes are fetched and decoded in two stages, compared with Thumb's single stage.
Jazelle delivers eight times the performance of a software JVM. It achieves 6.0 Caffeine Marks per megahertz and takes roughly 12,000 gates. An ARM 926EJ delivers 1000 Caffeine Marks at 200 MHz. Jazelle is implemented as an additional path in the instruction-stream decode, which extends the five-stage ARM9 pipeline to six stages. www.arm.com. See associated figure.
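For context on the software JVM that hardware bytecode execution is compared against, the following is a minimal sketch of an interpreter loop for a handful of integer bytecodes. The opcode values are the standard JVM ones; the function names, the reduced opcode subset, and the absence of locals, frames, and exceptions are simplifying assumptions made for illustration, not a description of any vendor's JVM. Every bytecode costs a fetch, a chain of comparisons, and stack bookkeeping in software, which is the overhead a hardware decode path is meant to remove.

```python
# Illustrative software interpreter loop for a tiny subset of JVM bytecodes.
# Opcode values follow the JVM specification; everything else is a simplification.
ICONST_1 = 0x04
ICONST_2 = 0x05
BIPUSH = 0x10
IADD = 0x60
IMUL = 0x68
IRETURN = 0xAC


def to_int32(value):
    """Wrap a Python integer to Java's signed 32-bit int range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def interpret(code):
    """Fetch, decode, and execute bytecodes until ireturn; return the result."""
    stack = []
    pc = 0
    while True:
        opcode = code[pc]          # fetch
        pc += 1
        if opcode == ICONST_1:     # decode and dispatch, all in software
            stack.append(1)
        elif opcode == ICONST_2:
            stack.append(2)
        elif opcode == BIPUSH:
            operand = code[pc]     # one signed immediate byte follows the opcode
            pc += 1
            stack.append(operand - 256 if operand > 127 else operand)
        elif opcode == IADD:
            b, a = stack.pop(), stack.pop()
            stack.append(to_int32(a + b))
        elif opcode == IMUL:
            b, a = stack.pop(), stack.pop()
            stack.append(to_int32(a * b))
        elif opcode == IRETURN:
            return stack.pop()
        else:
            raise ValueError(f"unsupported opcode 0x{opcode:02x}")


# Bytecode for (1 + 2) * 10
program = bytes([ICONST_1, ICONST_2, IADD, BIPUSH, 10, IMUL, IRETURN])
result = interpret(program)
```

Here `result` holds the value left on the operand stack when `ireturn` executes.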
Espresso
Aurora VLSI's Espresso Java processor is a superscalar RISC engine. The CPU has two operational units, each with an integer and a floating-point processing unit. The core delivers 32,500 Caffeine Marks at 200 MHz and 60,000 at 400 MHz. Its peak execution rate is eight Java instructions per cycle. Espresso supports a 32-bit, 128-entry stack. It has 32 to 256 on-chip registers (configurable) and supports 16K to 32K instruction and data caches with 64-bit interfaces.
Aurora also fields a low-power core, DeCaf. Both cores are available in versions that also execute C and C++. DeCaf consumes around 2.0 mW/MHz in a 0.18 µm process. It delivers 20,000 Caffeine Marks at 200 MHz and 35,000 at 400 MHz, and it executes four instructions or seven bytecodes per cycle. http://vodka.auroravlsi.com. See associated figure.
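The quoted Aurora figures are easier to compare when normalized per megahertz. The short calculation below does only that arithmetic on the numbers given above; the variable and function names are illustrative, and the power estimate assumes that consumption scales linearly with clock, which is what a mW/MHz rating implies but which the text does not state for any specific operating point.

```python
# Figures as quoted in the text: (clock in MHz, Caffeine Marks).
ESPRESSO = [(200, 32_500), (400, 60_000)]
DECAF = [(200, 20_000), (400, 35_000)]
DECAF_MW_PER_MHZ = 2.0  # quoted for the 0.18 µm process


def marks_per_mhz(points):
    """Caffeine Marks per MHz for each (clock, score) pair."""
    return [marks / mhz for mhz, marks in points]


def decaf_power_mw(mhz):
    """Estimated DeCaf power in mW, ASSUMING linear scaling with clock."""
    return DECAF_MW_PER_MHZ * mhz


espresso_density = marks_per_mhz(ESPRESSO)
decaf_density = marks_per_mhz(DECAF)
decaf_power = [decaf_power_mw(mhz) for mhz, _ in DECAF]
```

On these numbers, both cores score fewer Caffeine Marks per megahertz at 400 MHz than at 200 MHz, so the quoted scores do not scale linearly with clock.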
Lightfoot Java CPU
Another way to speed Java execution is to execute bytecodes directly in hardware. This design tactic eliminates the interpreter and preserves Java's small program-memory footprint. Digital Communications Technologies' Lightfoot is a direct-execution Java CPU with a one-to-one mapping between bytecodes and Lightfoot instructions. The 32-bit Harvard RISC CPU provides stack execution for both Java and C. It implements an eight-register-deep stack, with extensions to data memory.
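An eight-register-deep stack that extends into data memory can be pictured as a small register window over the top of the stack, with older entries spilled to memory when the window fills. The model below is a hypothetical sketch of that idea only: the class name, the spill-oldest-first policy, and the eager refill on pop are assumptions, since the text does not describe how Lightfoot actually manages the extension.

```python
class RegisterStack:
    """Top-of-stack register window of fixed depth, spilling to data memory.

    Hypothetical model: the spill and refill policies are assumptions.
    """

    def __init__(self, depth=8):
        self.depth = depth
        self.registers = []   # top of stack is the last element
        self.memory = []      # spilled entries, oldest first
        self.spills = 0       # count of register-to-memory transfers
        self.fills = 0        # count of memory-to-register transfers

    def push(self, value):
        if len(self.registers) == self.depth:
            # Window full: move the oldest register entry out to data memory.
            self.memory.append(self.registers.pop(0))
            self.spills += 1
        self.registers.append(value)

    def pop(self):
        if not self.registers:
            raise IndexError("pop from empty stack")
        value = self.registers.pop()
        if self.memory:
            # Refill the bottom of the window with the most recent spill.
            self.registers.insert(0, self.memory.pop())
            self.fills += 1
        return value


stack = RegisterStack(depth=8)
for n in range(10):          # two more values than the window can hold
    stack.push(n)
spilled = list(stack.memory)
popped = [stack.pop() for _ in range(10)]
```

In this model, code whose stack depth stays at eight or below never touches data memory for stack traffic; only deeper expressions or call chains pay for spills and fills.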
The ALU incorporates a 32-bit barrel shifter and a 2-bit-step multiplier, which takes 16 cycles for a 32-bit multiply. The CPU implements an 8-bit "bytecode" instruction memory interface with a 24-bit address. Data memory is reached through a 24-bit address and a 32-bit data path. The core supports J2ME, JavaCard, KVM, JINI, and C. It is also extensible; users can add their own instructions. The soft core is available as VHDL IP for ASICs and for Xilinx FPGAs. On a Xilinx Virtex-II FPGA, the core requires 1710 CLBs. www.dctl.com. See associated figure.
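The 16-cycle figure follows from the step width: a 32-bit multiplier consumed two bits at a time needs 32 / 2 = 16 steps. The sketch below shows a plain unsigned shift-and-add multiply that retires two multiplier bits per iteration. It illustrates the arithmetic only; the function name is hypothetical, and the text does not say whether the actual hardware uses this scheme, Booth recoding, or something else, nor how it treats signed operands.

```python
MASK32 = 0xFFFFFFFF


def multiply_2bit_steps(a, b):
    """Unsigned 32-bit multiply, consuming 2 multiplier bits per step.

    Returns (low 32 bits of the product, number of steps taken).
    """
    a &= MASK32
    b &= MASK32
    product = 0
    steps = 0
    for _ in range(32 // 2):     # 16 steps for a 32-bit multiplier
        digit = b & 0b11         # next two multiplier bits
        if digit & 0b01:
            product += a         # add 1 x multiplicand
        if digit & 0b10:
            product += a << 1    # add 2 x multiplicand
        a <<= 2                  # weight the multiplicand for the next digit
        b >>= 2
        steps += 1
    return product & MASK32, steps


low_word, cycles = multiply_2bit_steps(12345, 6789)
```

Each loop iteration corresponds to one step of the multiplier, so the step count matches the quoted cycle count for any pair of 32-bit operands.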