WO2001061476A2 - System including cpu and code translator for translating code from a second instruction set to a first instruction set

Info

Publication number: WO2001061476A2
Application number: PCT/US2001/004742
Authority: WO
Inventors: John E. Derrick; Robert G. Mcdonald
Original assignee: Chicory Systems, Inc.
Priority date: 2000-02-14
Filing date: 2001-02-13
Publication date: 2001-08-23
Also published as: AU2001241486A1; WO2001061476A3

Abstract

A system includes a CPU and a code translator. The CPU may execute instructions from a first instruction set, and the code translator may translate Java code sequences from Java bytecodes to code sequences having instructions defined in the first instruction set. The translated code sequences may be executed by the CPU. While Java is used as an example of instructions which the code translator translates, the code translator may translate instructions from any instruction set to instructions executable by the CPU. In one embodiment, the translated code sequences may be stored in an area of memory reserved for the code translator. Additionally, a table may be maintained which maps addresses of instructions in the untranslated code sequences (e.g. the Java code sequences) to addresses of translated code sequences in the reserved memory area. Thus, prior to activating the code translator to translate a code sequence, the table may be checked to determine if the code sequence has already been translated. If so, the translated code sequence may be executed instead of activating the code translator.

Description

TITLE: SYSTEM INCLUDING CPU AND CODE TRANSLATOR FOR TRANSLATING CODE FROM A SECOND INSTRUCTION SET TO A FIRST INSTRUCTION SET EXECUTABLE BY THE CPU

BACKGROUND OF THE INVENTION

1 . Field of the Invention

This invention relates to the field of programmable computing systems and, more particularly, to translation between instruction sets in computing systems.

2. Description of the Related Art

Java programs have become quite popular in recent years, particularly in view of the popularity of the Internet. A Java program is a program written to the Java language specification, and executes on a Java virtual machine (JVM). A JVM is an abstract computing machine which may be supported on any hardware platform (employing any suitable operating system and a native instruction set defined via any of a variety of architectures, e.g. x86, PowerPC, ARM, Alpha, etc.), and thus a Java program may execute on a variety of different hardware platforms. Thus, a Java program may be written and made available for download on the Internet, and the Java program may be executed on any hardware platform which supports the JVM.

In many cases, the JVM is itself a program written in the native instruction set of a given hardware platform. The JVM is called when a Java program is to be executed, and the JVM reads the instructions (termed "bytecodes") in the Java program one at a time in program order and emulates the execution behavior of the instructions on the hardware platform. Executing a program by having an interpreter program read each instruction and emulate that instruction's execution behavior is referred to as "interpreting" the program, or operating in an "interpreter mode". Unfortunately, executing programs in an interpreter mode typically results in a slow execution speed. In an attempt to speed the execution, software just-in-time (JIT) compilers have been proposed. A JIT compiler compiles Java bytecodes into instructions specified by the native instruction set of the hardware platform upon which execution is desired. While executing the compiled code is faster than execution in interpreter mode, the software compilation process itself is relatively time consuming. Thus, a large amount of memory is typically dedicated to storing the compiled code, so that the amount of time required to perform the compilation may be absorbed by performing the compilation once and allowing for the compiled code to be executed many times.

While the JIT compiler provides for speedier execution, the large amount of memory required to store the compiled code makes the JIT compiler unsuitable for certain types of machines. For example, set top boxes, personal digital assistants, and other hand-held computing devices generally have a limited amount of memory. Thus, dedicating a large amount of memory to store compiled Java code is not possible in these types of computing devices.

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by a system as described herein. The system includes a CPU and a code translator. The CPU may execute instructions from a first instruction set, and the code translator may translate Java code sequences from Java bytecodes to code sequences having instructions defined in the first instruction set. The translated code sequences may be executed by the CPU. Since the translated code sequences are native to the CPU, the translated code sequences may be executed with high performance. Additionally, since the translation is performed in hardware, the translation may be performed in a relatively short period of time. Accordingly, fewer translated code sequences may be stored in memory at any given time and the performance achieved may still be substantial. While Java is used as an example of instructions which the code translator translates, the code translator may translate instructions from any instruction set to instructions executable by the CPU.

In one embodiment, the translated code sequences may be stored in an area of memory reserved for the code translator. Additionally, a table may be maintained which maps addresses of instructions in the untranslated code sequences (e.g. the Java code sequences) to addresses of translated code sequences in the reserved memory area. Thus, prior to activating the code translator to translate a code sequence, the table may be checked to determine if the code sequence has already been translated. If so, the translated code sequence may be executed instead of activating the code translator.

Broadly speaking, an apparatus is contemplated comprising a CPU and a code translator coupled to the CPU. The CPU is configured to execute instructions defined in a first instruction set. The CPU is configured to detect a first code sequence including instructions defined in a second instruction set. The code translator is configured to translate the first code sequence into a second code sequence including instructions defined in the first instruction set. The CPU is configured to activate the code translator responsive to detecting the first code sequence.

Additionally, a method is contemplated. In a CPU configured to execute instructions defined in a first instruction set, a first code sequence is detected which includes instructions defined in a second instruction set. A code translator coupled to the CPU is activated in response to the detecting. The first code sequence is translated in the code translator to a second code sequence including instructions defined in the first instruction set.

Still further, another method is contemplated. A first code sequence having one or more instructions defined in a first instruction set is translated to a second code sequence having one or more instructions defined in a second instruction set. The second code sequence is stored. An indication of the first code sequence and the second code sequence is recorded in a table.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings in which:

Fig. 1 is a block diagram of a computing system.

Fig. 2 is a block diagram of an exemplary memory map for one embodiment of the computing system shown in Fig. 1.

Fig. 3 is a block diagram illustrating a storage model for a source instruction set and a target instruction set, according to one embodiment of the computing system shown in Fig. 1.

Fig. 4 is a flowchart illustrating operation of one embodiment of the computing system shown in Fig. 1 during invocation of a method.

Fig. 5 is a flowchart illustrating operation of one embodiment of the computing system shown in Fig. 1 in response to an interrupt from the code translator.

Fig. 6 is a block diagram illustrating one embodiment of translated and non-translated code streams.

Fig. 7 is a block diagram of one embodiment of the code translator shown in Fig. 1.

Fig. 8 is a block diagram of one embodiment of a translate unit shown in Fig. 7.

Fig. 9 is a block diagram of one embodiment of a stack to register transform unit shown in Fig. 8.

Fig. 10 is a block diagram of one embodiment of a fetch unit shown in Fig. 7.

Fig. 11 is a block diagram of one embodiment of a translate unit shown in Fig. 7.

Fig. 12 is a table illustrating assignment of source operands according to one embodiment of a stack to register transform unit shown in Fig. 9.

Fig. 13 is a table illustrating resulting stack transform according to one embodiment of a stack to register transform unit shown in Fig. 9 for a decode group of instructions.

Fig. 14 is a table illustrating resulting free list according to one embodiment of a stack to register transform unit shown in Fig. 9 for a decode group of instructions.

Fig. 15 is a flowchart illustrating operation of one embodiment of a decode unit shown in Fig. 8.

Fig. 16 is an exemplary code sequence which may be produced by the decode unit shown in Fig. 8 according to the flowchart shown in Fig. 15.

Fig. 17 is a flowchart illustrating operation of one embodiment of a decode unit shown in Fig. 8.

Fig. 18 is an exemplary code sequence which may be produced by the decode unit shown in Fig. 8 according to the flowchart shown in Fig. 17.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Turning now to Fig. 1, a block diagram of one embodiment of a system 10 is shown. Other embodiments are possible and contemplated. The illustrated system 10 includes a central processing unit (CPU) 12, a memory controller 14, a memory 16, a Peripheral Component Interconnect (PCI) bridge 18, a PCI bus 20, a code translator 22, and an interrupt controller 24. CPU 12 is coupled to PCI bridge 18, memory controller 14, and interrupt controller 24. Memory controller 14 is further coupled to memory 16. PCI bridge 18 is further coupled to PCI bus 20. Code translator 22 is coupled to interrupt controller 24 and to PCI bus 20. In the illustrated embodiment, code translator 22 includes a source address register 26, a target address register 28, a control register 30, and a status register 32. In one embodiment, CPU 12, memory controller 14, and PCI bridge 18 may be integrated onto a single chip or into a package as illustrated by the dotted line surrounding these components in Fig. 1 (although other embodiments may provide these components separately).

Generally, CPU 12 is capable of executing instructions defined in a first instruction set (the native instruction set of system 10). The native instruction set may be any instruction set, e.g. the ARM instruction set, the PowerPC instruction set, the x86 instruction set, the Alpha instruction set, etc. Code translator 22 is provided for translating code sequences coded using a second instruction set, different from the native instruction set, to a code sequence coded using the native instruction set. Instruction sequences coded using the second instruction set are referred to as "non-native" code sequences, and code sequences coded using the first instruction set of CPU 12 are referred to as "native" code sequences. When CPU 12 detects that a non-native code sequence is to be executed, CPU 12: (i) stores the source address of the non-native code sequence in source address register 26; (ii) stores the target address at which code translator 22 is to write the translated code sequence in target address register 28; and (iii) stores a command in control register 30 to activate code translator 22. In response to the command, code translator 22 reads the non-native code sequence from the source address, translates the non-native code sequence to a native code sequence, and stores the native code sequence at the target address. Code translator 22 provides a status for the translation in status register 32, and signals CPU 12 that the translation is complete. In the present embodiment, for example, code translator 22 may assert an interrupt signal to interrupt controller 24, which may subsequently interrupt CPU 12. CPU 12 may access interrupt controller 24 to determine the source of the interrupt (e.g. interrupt controller 24 may be coupled to PCI bus 20). In response to determining that the source of the interrupt is code translator 22, CPU 12 may read status register 32 to ensure that no errors occurred during the translation, and may then execute the native code sequence stored at the target address.

It is noted that the source and target addresses are addresses identifying memory locations within the memory 16.
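
The register handshake described above can be illustrated with a minimal software model. This is only a sketch under stated assumptions: the command and status encodings (`CMD_GO`, `STATUS_OK`, `STATUS_ERROR`), the class and function names, the dictionary standing in for memory 16, and the placeholder "translation" that merely tags each bytecode are all hypothetical and are not taken from the described hardware.

```python
from dataclasses import dataclass

CMD_GO = 1                                       # assumed command encoding
STATUS_IDLE, STATUS_OK, STATUS_ERROR = 0, 1, 2   # assumed status encodings


@dataclass
class TranslatorModel:
    """Toy software stand-in for code translator 22 and its registers."""
    memory: dict                    # address -> instruction; stands in for memory 16
    source_address: int = 0         # source address register 26
    target_address: int = 0         # target address register 28
    control: int = 0                # control register 30
    status: int = STATUS_IDLE       # status register 32
    interrupt: bool = False         # interrupt signal toward interrupt controller 24

    def write_control(self, command):
        """Model a CPU write to control register 30."""
        self.control = command
        if command == CMD_GO:
            self._translate()

    def _translate(self):
        src, dst = self.source_address, self.target_address
        try:
            while True:
                bytecode = self.memory[src]              # read non-native code
                self.memory[dst] = ("native", bytecode)  # placeholder "translation"
                src, dst = src + 1, dst + 1
                if bytecode == "return":                 # assumed end of sequence
                    break
            self.status = STATUS_OK
        except KeyError:                                 # ran off the source code
            self.status = STATUS_ERROR
        self.interrupt = True                            # signal completion


def translate_and_check(translator, source, target):
    """CPU-side steps (i)-(iii), then the status check after the interrupt."""
    translator.source_address = source       # (i)
    translator.target_address = target       # (ii)
    translator.write_control(CMD_GO)         # (iii)
    # The model translates synchronously, so the interrupt is already pending.
    if not translator.interrupt:
        raise RuntimeError("translator did not signal completion")
    translator.interrupt = False             # acknowledge the interrupt
    return translator.status == STATUS_OK
```

In this model a caller would execute the code at the target address only when `translate_and_check` returns `True`, mirroring the read of status register 32 before the native code sequence is executed.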

In one embodiment, code translator 22 is configured to translate Java code sequences to the native instruction set. Thus, Java bytecodes will be used as an example of a non-native instruction set below. However, the techniques described below may be used with any non-native instruction set. Additionally, the Java instruction set uses a stack-based programming and storage model, while the native instruction set may use a register-based programming and storage model. The techniques described below for converting between the Java instruction set and the register-based native instruction set are applicable to converting any other stack-based instruction set. As used herein, the terms "stack-based programming and storage model" or "stack-based instruction set" refer to a model or instruction set in which operands for instructions are stored in a stack, generally in memory. Thus, execution of an instruction typically involves a memory reference for the operands (except for immediate operands). On the other hand, the terms "register-based programming and storage model" or "register-based instruction set" refer to a model or instruction set in which operands for instructions are stored in a set of registers defined by the architecture. Each register is identified via a register index, and the register indexes are coded into the instructions to specify the operands of the instructions. Operand fetches for instructions in a register-based instruction set are then generally reads of the registers, typically implemented within the CPU. Register-based instruction sets often use explicit load/store instructions to load operands from memory locations to registers for subsequent instructions to use as operands and to store results from registers to memory locations.

Furthermore, the term "instruction set" as used herein refers to a group of instructions defined by a particular architecture. Each instruction in the instruction set may be assigned an opcode which differentiates the instruction from other instructions in the instruction set, and the operands and behavior of the instruction are defined by the instruction set. Thus, Java bytecodes are instructions within the instruction set specified by the Java language specification, and the terms bytecode and instruction will be used interchangeably herein when discussing Java bytecodes. Similarly, ARM instructions are instructions specified in the ARM instruction set, PowerPC instructions are instructions specified in the PowerPC instruction set, etc.

Since code translator 22 may translate from a stack-based instruction set to a register-based instruction set, code translator 22 may include hardware for translating the stack references in the stack-based instruction set to register indexes in the register-based instruction set. More particularly, a subset (or "pool") of the registers may be reserved to store stack operands. Code translator 22 may assign register indexes as values are pushed onto the stack, and may use those register indexes as source operands for instructions which reference the stack. After the values are popped from the stack, the corresponding registers may be free for use for another value pushed onto the stack. Thus, the register pool may store the topmost operands on the stack, and memory may be used for lower items (as will be described in more detail below). The register-based instruction set may be most efficient at accessing operands in registers (since loads and stores may be needed to read the values from memory), and thus keeping items at the top of the stack in registers may enhance performance.
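
As an illustration of the register assignment described above, the following sketch maps a few stack-based operations onto a reserved register pool. It is a simplified software analogy, not the hardware described herein: the function name, the pool `r4`-`r7`, the `locals` base name, the ARM-like pseudo-assembly strings, and the choice to reuse the left operand's register for a result are all assumptions made for the example. Pool overflow is deliberately not handled here.

```python
def translate_stack_code(bytecodes, pool=("r4", "r5", "r6", "r7")):
    """Translate simplified stack bytecodes into register pseudo-assembly."""
    free = list(pool)   # pool registers not currently holding a stack item
    stack = []          # registers holding stack items; stack[-1] is top of stack
    native = []         # generated pseudo-instructions

    def allocate():
        if not free:
            raise RuntimeError("register pool overflow (not handled in this sketch)")
        return free.pop(0)

    for op, *args in bytecodes:
        if op == "iconst":                       # push a constant
            reg = allocate()
            native.append(f"mov {reg}, #{args[0]}")
            stack.append(reg)
        elif op == "iload":                      # push local variable args[0]
            reg = allocate()
            native.append(f"ldr {reg}, [locals, #{4 * args[0]}]")
            stack.append(reg)
        elif op in ("iadd", "isub"):             # pop two items, push the result
            rhs = stack.pop()
            lhs = stack.pop()
            mnemonic = "add" if op == "iadd" else "sub"
            native.append(f"{mnemonic} {lhs}, {lhs}, {rhs}")
            stack.append(lhs)                    # result reuses the lhs register
            free.append(rhs)                     # rhs register is free again
        elif op == "istore":                     # pop into local variable args[0]
            reg = stack.pop()
            native.append(f"str {reg}, [locals, #{4 * args[0]}]")
            free.append(reg)
        else:
            raise ValueError(f"unsupported bytecode: {op}")
    return native
```

For example, the sequence `iload 0; iload 1; iadd; istore 2` becomes two loads into `r4` and `r5`, one register-to-register add, and one store, with no memory traffic for the operand stack itself.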

As an alternative to reserving the pool of registers, code translator 22 may be configured to statically or dynamically allocate registers from the register set of CPU 12 into the register pool. Code translator 22 may generate native instructions to store the registers selected for the pool to a scratchpad memory area (preserving the values in the selected registers), and then these registers may be used to store stack items. In a static embodiment, the entire pool of registers may be allocated at the beginning of a translated code sequence. In a dynamic embodiment, registers may be allocated to the pool as additional registers are needed during the translation. At the end of the translated code sequence, code translator 22 may insert instructions to restore the values of these registers by reading the values from the scratchpad memory area (after storing the items to the operand stack).

In one embodiment, code translator 22 may translate instructions beginning at the source address and up to a basic block boundary in response to being activated by CPU 12. Generally, instructions within a basic block are not branch instructions (e.g. conditional or unconditional branches, call or return instructions, etc.). Once a basic block is entered, each instruction in the basic block is executed. The basic block boundary is formed by a branch instruction. Upon translating the branch instruction, code translator 22 may update status register 32 and assert the interrupt signal. Other embodiments may employ branch prediction and speculatively translate instructions past the basic block boundary based on the branch prediction. If the branch prediction is incorrect, the speculative translation may be discarded.

In another embodiment, code translator 22 may translate instructions through an unconditional branch, stopping translation when a conditional branch instruction or the end of the code sequence is encountered. The unconditional branch instruction may be deleted from the translated code sequence ("folded out") and the instructions at the target address of the unconditional branch instruction may be inserted in-line in the translated code (sequential to the instructions translated from the code preceding the unconditional branch instruction). Such an embodiment may further provide speculative translation beyond conditional branches, as mentioned above. Additionally, code translator 22 may limit the total number of instructions translated before stopping and signalling CPU 12. The total number may be the number of source instructions (e.g. non-native instructions) or the number of target instructions (e.g. native instructions). Alternatively, the number of bytes may be limited (and may be either the number of bytes of source instructions or the number of bytes of target instructions). The limit on the number of bytes/instructions may be programmable in a configuration register of code translator 22 (not shown). In one particular implementation, for example, a maximum size of 64 or 128 bytes of translated code may be programmably selected.
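
The stopping rules of this embodiment can be sketched in software as follows. The sketch assumes, for simplicity, that each source instruction occupies one address (real Java bytecodes vary in length), that the limit counts source instructions examined (including a folded branch), and that the opcode names in `CONDITIONAL_BRANCHES` stand for all conditional branches. The function name and the returned reason strings are hypothetical.

```python
CONDITIONAL_BRANCHES = {"ifeq", "ifne", "if_icmplt", "if_icmpge"}  # illustrative subset


def plan_translation(code, start, max_source_instructions=16):
    """Return (addresses translated in-line, reason translation stopped).

    `code` maps an address to a tuple such as ("goto", target) or ("iadd",).
    """
    emitted = []      # source addresses whose translations are emitted, in order
    examined = 0      # source instructions looked at so far
    pc = start
    while pc in code:
        if examined == max_source_instructions:
            return emitted, "limit"
        examined += 1
        op, *args = code[pc]
        if op == "goto":
            pc = args[0]              # fold the branch out: continue at its target
            continue
        emitted.append(pc)
        if op in CONDITIONAL_BRANCHES:
            return emitted, "conditional branch"
        pc += 1
    return emitted, "end of sequence"
```

Counting the folded `goto` against the limit also keeps the loop finite if an unconditional branch targets itself.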

Because code translator 22 translates code in hardware, code translator 22 may be capable of producing native code sequences corresponding to Java code sequences more rapidly than a software JIT compiler. Accordingly, system 10 need not dedicate a large amount of memory to store translated code sequences. Instead, a relatively small amount of memory may be used and additional sequences may be translated by code translator 22 as needed.

Generally, CPU 12 executes native code sequences and controls other portions of the system in response to the native code sequences. More particularly, CPU 12 may execute the JVM for system 10, including the interpreter mode to handle exception conditions detected by code translator 22. The JVM executed by CPU 12 may include all of the standard features of a JVM and may further include code to activate code translator 22 when a Java code sequence is to be executed, and to jump to the translated code after code translator 22 completes the translation. Code translator 22 may insert a return instruction to the JVM. CPU 12 may further execute the operating system code for system 10, as well as any native application code that may be included in system 10.

Memory controller 14 receives memory read and write operations from CPU 12 and PCI bridge 18 and performs these read and write operations to memory 16. It is noted that some of the read and write operations presented by PCI bridge 18 may be read and write operations generated by code translator 22 (e.g. read operations from the source address and subsequent addresses and write operations to the target address and subsequent addresses). Memory 16 may comprise any suitable type of memory, including SRAM, DRAM, SDRAM, RDRAM, or any other type of memory.

PCI bridge 18 facilitates communication between PCI bus 20 and memory controller 14 or CPU 12. More particularly, source address register 26, target address register 28, control register 30, and status register 32 may be memory-mapped registers. PCI bridge 18 may detect read or write operations to the addresses to which the registers are mapped, and transmit those operations on PCI bus 20 to code translator 22. As mentioned above, PCI bridge 18 may also detect read and write operations from code translator 22 to memory 16 on PCI bus 20 and may transmit those operations to memory controller 14.

Interrupt controller 24 generally receives interrupt signals from code translator 22 and other devices within system 10 (not shown), and prioritizes the interrupts received. If one or more interrupts have been signalled, interrupt controller 24 may assert the interrupt signal to CPU 12. CPU 12 may then access interrupt controller 24 to determine the source of the highest priority pending interrupt, and may service that interrupt. It is noted that, while the PCI bus is used as an exemplary peripheral bus in the embodiment of Fig. 1, any other bus may be used. For example, the Universal Serial Bus (USB), IEEE 1394 bus, the Industry Standard Architecture (ISA) or Enhanced ISA (EISA) bus, the Personal Computer Memory Card International Association (PCMCIA) bus, etc. may be used. Still further, the Advanced RISC Machines (ARM) Advanced Microcontroller Bus Architecture (AMBA) bus, including the Advanced High-Performance (AHB) and/or Advanced System Bus (ASB) may be used, as may the Handspring Interconnect specified by Handspring, Inc. (Mountain View, CA). Still further, code translator 22 may be connected to memory 16 using a Unified Memory Architecture connection. In other alternatives, code translator 22 may be directly connected to CPU 12 or memory 16, or may be integrated into CPU 12, memory controller 14, or PCI bridge 18.

In other embodiments, interrupt controller 24 may be deleted and code translator 22 may assert an interrupt signal directly to CPU 12. Still further, other embodiments may employ semaphores in memory for communication between CPU 12 and code translator 22. Any technique for communicating code sequences to be translated and completion of the translation may be used.

Turning now to Fig. 2, a block diagram illustrating exemplary contents of memory 16 is shown. More particularly, memory 16 in the example is storing a Java Virtual Machine (JVM) 40, a Java class 42 including a first Java method 44 and a second Java method 46, a translation cache table 48, and a scratch pad memory 50 including a translated method 52.

JVM 40 includes the native instructions to implement the Java Virtual Machine Specification, and further includes instructions to activate code translator 22 if Java bytecodes are to be executed. Flowcharts shown in Figs. 4-5 below may illustrate portions of JVM 40 used to interface with code translator 22.

Class 42 is an exemplary Java class including methods 44 and 46. Methods 44 and 46 are coded using Java bytecodes. On the other hand, translated method 52 in scratch pad memory 50 includes native instructions which, when executed, perform the same function in system 10 as method 44 would if executed in interpreter mode. Translated method 52 may comprise a portion of method 44, if code translator 22 has not yet translated all of method 44. Furthermore, if translated method 52 would exceed the size of scratch pad memory 50, the translation of a later portion of method 44 may overwrite the translation of an earlier portion of method 44 within scratch pad memory 50. Translated method 52 may comprise multiple code sequences, each terminated with a return instruction which returns to JVM 40. JVM 40 may then check the next address to be executed against translation cache table 48 to determine if the code sequence is already translated and residing in scratch pad memory 50. If the code sequence is already translated, JVM 40 calls the translated code sequence. Otherwise, JVM 40 activates code translator 22 to translate the code sequence.

Accordingly, if method 44 is to be translated by code translator 22, the source address stored into source address register 26 may be the entry point of method 44 within memory 16. The target address may be an address within scratch pad memory 50. Fig. 2 further illustrates that the translated method code is placed in different memory locations than the original method code. In this manner, the untranslated, non-native code sequence is available for interpreted execution in the event of an exception during the translation process.

Translation cache table 48 is used to store information related to translated methods. More particularly, translation cache table 48 may comprise a number of entries. Each entry may include a method reference identifying the method (or portion of the method if the method is translated in portions, e.g., because it includes conditional branches or is too long to translate as a whole). For example, the method reference may be the source address of the first instruction translated by the corresponding translated code sequence stored in scratch pad memory 50. The entry may further include a pointer to the translated code sequence (e.g. an address within scratch pad memory 50) and the size of the translated code sequence. Other information may be included as desired. It is noted that, in an embodiment in which a maximum size of the translated code is implemented, scratch pad memory 50 may be allocated in units of the maximum size and translation cache table 48 may include an entry for each unit within scratch pad memory 50.
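
A translation cache table of the kind described above might be modelled in software as shown below. This is a sketch only: the class and field names are hypothetical, the scratch pad is assumed to be divided into fixed-size units with one table entry per unit, and the round-robin choice of which unit to overwrite is an assumption, since no replacement policy is specified here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheEntry:
    source_address: int   # method reference: address of first translated bytecode
    target_address: int   # pointer to the translated code in the scratch pad
    size: int             # size of the translated code sequence in bytes


class TranslationCacheTable:
    """One entry per fixed-size unit of the scratch pad memory."""

    def __init__(self, scratch_base: int, unit_size: int, num_units: int):
        self.scratch_base = scratch_base
        self.unit_size = unit_size
        self.entries: List[Optional[CacheEntry]] = [None] * num_units
        self.next_unit = 0            # round-robin replacement (assumed policy)

    def lookup(self, source_address: int) -> Optional[int]:
        """Return the translated code address, or None if not translated."""
        for entry in self.entries:
            if entry is not None and entry.source_address == source_address:
                return entry.target_address
        return None

    def record(self, source_address: int, size: int) -> int:
        """Record a new translation, overwriting the oldest unit; return its address."""
        if size > self.unit_size:
            raise ValueError("translated code exceeds the maximum unit size")
        unit = self.next_unit
        self.next_unit = (unit + 1) % len(self.entries)
        target = self.scratch_base + unit * self.unit_size
        self.entries[unit] = CacheEntry(source_address, target, size)
        return target
```

When a unit is reused, the entry for the earlier translation disappears from the table, so a later lookup for that source address misses and the code is translated again.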

It is noted that one or more memory locations within memory 16 may correspond to memory-mapped registers (e.g. registers 26-32, as well as any additional configuration/control registers which may be implemented according to various embodiments of code translator 22).

Turning now to Fig. 3, a diagram illustrating a stack-based programming and storage model (e.g. the Java programming and storage model) and a register-based programming and storage model (e.g. the programming and storage model of CPU 12) when executing translated Java code sequences is shown. In the stack-based programming and storage model, a stack 60 is stored in memory 16. A top of stack (TOS) pointer is maintained by the JVM 40 which identifies the memory location storing the stack item which is at the top of the stack. The TOS pointer may be more succinctly referred to as the stack pointer, and is stored in a register 62. Register 62 may be one of the registers in the register set of CPU 12, for a JVM implemented as native code operating on CPU 12. As items are pushed onto the stack, the items are stored into memory locations contiguous to the memory location indicated by the stack pointer, and the stack pointer is updated to indicate the new top of stack. As items are popped from the stack, the stack pointer is updated to indicate the new top of stack (e.g. if item S(0) is popped in Fig. 3, the stack pointer is updated to indicate S(1)).

The stack 60 is represented in the register-based programming and storage model (after the corresponding code is translated by code translator 22) by a register pool 64, a stack transform 66, a stack 68, and a memory top of stack (MTOS) pointer 70. As Fig. 3 illustrates, a portion of the top of stack 60 is stored in registers within register pool 64 (which is a subset of the registers included in CPU 12). Accordingly, access to the operands at the top of the stack may be efficient in a register-based programming and storage model, since these operands are stored in registers. Operands further down the stack are stored in stack 68, with the MTOS pointer 70 indicating the top of the stack 68 (e.g. item S(4) in this example, where items S(0) through S(3) are stored in registers). It is noted that MTOS pointer 70 may be stored in another register within the register set, outside of the register pool 64. More particularly, MTOS pointer 70 may be stored in register 62, and may be the stack pointer prior to entry into the translated code, in one embodiment. In such an embodiment, updates to the stack pointer register stored in register 62 may be deferred until the translated code sequence is terminated. As items are pushed onto the stack, registers from register pool 64 are allocated to store the items. As items are popped from the stack, the registers storing those items become free for allocation during a subsequent push.

Code translator 22 manages the register pool 64 and assigns register indexes for the operands of instructions in the translated Java code sequences based on which registers are storing the top of stack operands. Code translator 22 maintains stack transform 66, which maps the stack locations of stack 60 to the register indexes of the registers in register pool 64 assigned to those stack locations. For example, register 72 in Fig. 3 is storing the top stack item S(0). Stack transform 66 thus provides the register index identifying register 72 for instructions needing the top of stack value as an operand.

Code translator 22 may also handle the overflow and underflow of register pool 64. If a push is detected in the code sequence and all the registers in register pool 64 are storing stack items (overflow), one or more registers in the register pool may be freed by pushing the values in those registers onto stack 68. More particularly, the registers storing the stack items farthest from the top of the stack may be freed in this fashion. Code translator 22 may automatically generate the store instructions (to be executed by CPU 12) to push the values onto stack 68 and free the registers (storing these instructions at the target address along with the instructions representing the translated Java code sequence), or may generate an interrupt to CPU 12 and have an interrupt service routine provide the instructions to push the values to memory. Alternatively, code translator 22 may monitor the number of free registers available and, when the number is less than a threshold value, free some of the registers by pushing their contents to stack 68. Similarly, if the registers in register pool 64 are storing no stack items (underflow), code translator 22 may generate instructions to load the values from the top of stack 68 into the registers (or may use an interrupt to CPU 12 similar to the above description).
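
The overflow and underflow handling described above can be sketched as follows, for the variant in which the spill and reload instructions are generated automatically. The class name and the pseudo-instructions `push_mem` and `pop_mem` (standing for a store to, or a load from, the top of stack 68 with the corresponding pointer update) are hypothetical, and the sketch spills or reloads one register at a time, which is only one of the options the text allows.

```python
class RegisterPoolModel:
    """Toy model of register pool 64 with spill (overflow) and reload (underflow)."""

    def __init__(self, registers):
        self.free = list(registers)   # registers holding no stack item
        self.in_regs = []             # registers holding stack items, bottom to top
        self.code = []                # generated spill/reload pseudo-instructions

    def push(self):
        """Allocate a register for a newly pushed stack item."""
        if not self.free:
            # Overflow: free the register holding the item farthest from the top.
            victim = self.in_regs.pop(0)
            self.code.append(f"push_mem {victim}")
            self.free.append(victim)
        reg = self.free.pop(0)
        self.in_regs.append(reg)
        return reg

    def pop(self):
        """Return the register holding the top of stack and free it."""
        if not self.in_regs:
            # Underflow: reload the top item of the memory stack into a register.
            reg = self.free.pop(0)
            self.code.append(f"pop_mem {reg}")
            self.in_regs.append(reg)
        reg = self.in_regs.pop()
        self.free.append(reg)
        return reg
```

With a two-register pool, a third push spills the oldest item to memory, and a third pop reloads it, so the translated code still sees the items in stack order.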

Turning next to Fig. 4, a flowchart illustrating certain operations of one embodiment of JVM 40 when invoking a method is shown. Other embodiments are possible and contemplated. The steps shown in Fig. 4 are illustrated in a particular order for ease of understanding. However, any suitable order may be used.

When invoking a method, JVM 40 determines if the method has previously been translated by code translator 22 and still remains within scratch pad memory 50 (decision block 100). More particularly, JVM 40 scans translation cache table 48 to determine if the method is recorded in the table. If the method has been translated, JVM 40 branches to the pointer from translation cache table 48 and executes the translated code (step 108). On the other hand, if the method has not been translated, JVM 40 activates code translator 22. More particularly, JVM 40 stores the source address of the method in source address register 26 (step 102), the target address at which the translated method is to be written within scratch pad memory 50 into target address register 28 (step 104), and the command into control register 30 which initiates code translation in code translator 22 (the "go" command) (step 106). JVM 40 then waits for a signal from code translator 22 that the translation is complete (e.g. an interrupt is signalled). JVM 40 may perform other activities while waiting for the signal, if desired.
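The invocation flow of Fig. 4 can be summarized in a short sketch. The register object, the numeric encoding of the "go" command, and the dictionary used for the translation cache table are hypothetical stand-ins; the text does not specify these encodings.

```python
GO = 1  # hypothetical encoding of the "go" command


class TranslatorRegs:
    """Software stand-in for source address, target address, and control registers."""

    def __init__(self):
        self.source_address = 0
        self.target_address = 0
        self.control = 0


def invoke_method(method_addr, cache, regs, next_free_target):
    """Return ("execute", pointer) on a table hit, ("translating", target) on a miss."""
    pointer = cache.get(method_addr)         # decision block 100
    if pointer is not None:
        return ("execute", pointer)          # step 108
    regs.source_address = method_addr        # step 102
    regs.target_address = next_free_target   # step 104
    regs.control = GO                        # step 106
    return ("translating", next_free_target)


regs = TranslatorRegs()
cache = {0x1000: 0x8000}  # source method address -> translated code pointer
hit = invoke_method(0x1000, cache, regs, next_free_target=0x8100)
miss = invoke_method(0x2000, cache, regs, next_free_target=0x8100)
```

On a miss the caller would then wait for the completion interrupt, or do other work in the meantime, as the text notes.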

Turning next to Fig. 5, a flowchart illustrating certain operations of one embodiment of JVM 40 when an interrupt from code translator 22 is received is shown. Other embodiments are possible and contemplated. The steps shown in Fig. 5 are illustrated in a particular order for ease of understanding. However, any suitable order may be used. Prior to performing the actions illustrated in Fig. 5, JVM 40 may determine that the interrupt is from code translator 22, if CPU 12 may receive interrupts from multiple sources within system 10.

JVM 40 reads status register 32 to determine if the translation completed successfully (step 120). There may be a variety of reasons why the translation could not be completed successfully. For example, certain embodiments of code translator 22 may signal an interrupt with unsuccessful translation if an underflow or overflow of register pool 64 occurs. Additionally, certain embodiments of code translator 22 may signal an interrupt if the translated code sequence exceeds the size of scratch pad memory 50 or the maximum size of a translated code sequence. Other embodiments may signal an interrupt to handle certain Java instruction encodings (bytecodes) which may be too difficult to translate in hardware. These bytecodes may be executed in the interpreter, and then code translator 22 may be reactivated to continue translating the subsequent bytecodes. Code translator 22 may be configured to generate instructions in the translated code sequence which store the stack items which are in registers of CPU 12 to the operand stack in memory ("spill the registers to the operand stack") and update the stack pointer maintained by the JVM to reflect the current stack state. Additionally, the code translator may update another register of CPU 12 which stores the program counter (PC) of the Java code sequence for the JVM, to reflect the instructions translated by code translator 22. In this manner, the stack pointer and PC may reflect the operation of the Java instructions translated by code translator 22.

If status register 32 indicates unsuccessful translation (decision block 122), JVM 40 may execute the method (or the portion of the method which is unsuccessfully translated, e.g. beginning at the source address stored in source address register 26 prior to the exception) in interpreter mode to handle the exception condition (step 124). Fig. 6 below illustrates the handling of exceptions detected by code translator 22. If status register 32 indicates no exception, JVM 40 may update the translation cache table 48 with information indicating the source method, a pointer to the translated code, the size, etc. (step 118). JVM 40 may manage translation cache table 48 in any suitable fashion. For example, entries in translation cache table 48 may be managed in a first-in, first-out (FIFO) fashion, reusing the oldest entry after each entry has been used. Alternatively, entries may be managed in a least recently used (LRU) fashion.
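As an illustration of FIFO management of the translation cache table, the following sketch evicts the oldest entry once a fixed capacity is reached. The class name, the capacity, and the entry layout (target pointer and size) are assumptions; the text leaves the table format open.

```python
from collections import OrderedDict


class TranslationCacheTable:
    """Maps source method addresses to translated code, evicting in FIFO order."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # source address -> (target pointer, size)

    def lookup(self, source_addr):
        """Return (target pointer, size) if the method is recorded, else None."""
        return self.entries.get(source_addr)

    def record(self, source_addr, target_ptr, size):
        """Record a completed translation, reusing the oldest entry when full."""
        if source_addr not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the oldest entry (FIFO)
        self.entries[source_addr] = (target_ptr, size)


table = TranslationCacheTable(capacity=2)
table.record(0x1000, 0x8000, 64)
table.record(0x2000, 0x8040, 32)
table.record(0x3000, 0x8060, 48)  # table is full, so the 0x1000 entry is reused
```

An LRU variant could call `self.entries.move_to_end(source_addr)` on each successful lookup, so that eviction removes the least recently used entry instead of the oldest one.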

JVM 40 may determine if there is additional code to translate (decision block 126). If so, the source address, target address, and command value may be stored in source address register 26, target address register 28, and control register 30, respectively (steps 102-106). Additionally, JVM 40 may branch to the translated code and execute the code (step 128). It is noted that the flowchart shown in Fig. 5 may represent speculation on the part of JVM 40 that the translated code sequence executes properly. Furthermore, JVM 40 may speculate on the direction of a conditional control-flow instruction to determine the next code sequence to translate. Other embodiments may execute the translated code sequence first, then activate code translator 22 to translate the next code sequence to be executed.

Fig. 6 is a block diagram illustrating the handling of exceptions detected by code translator 22. Illustrated in Fig. 6 is a Java code stream 130 which includes a call to Java method 132. A corresponding translated Java code stream 134 generated by code translator 22 in response to Java code stream 130 and a translated method 136 generated by code translator 22 in response to Java method 132 are also shown. The call, from Java code stream 130, to Java method 132 is illustrated by solid arrow 138. A corresponding call from translated code stream 134 to translated method 136 is illustrated by solid arrow 140. Dotted arrow 142 illustrates the detection, by code translator 22 during translation of method 132 to method 136, of an exception. The exception causes JVM 40 to execute Java method 132 in interpreter mode (dotted arrow 144). It is noted that, if a portion of method 132 has been translated and executed successfully, JVM 40 may execute, in interpreter mode, the portion of the method for which an exception was detected during translation (rather than executing the entire method in interpreter mode).

Turning next to Fig. 7, a block diagram of one embodiment of code translator 22 is shown. Other embodiments are possible and contemplated. In the embodiment of Fig. 7, code translator 22 includes a PCI interface unit 150, a fetch unit 152, a translate unit 154, a write unit 156, source address register 26 (within fetch unit 152), target address register 28 (within write unit 156), control register 30, and status register 32. PCI interface unit 150 is coupled to fetch unit 152, translate unit 154, write unit 156, control register 30, status register 32, source address register 26, target address register 28, PCI bus 20, and an interrupt line 158. Fetch unit 152 is further coupled to translate unit 154 and control register 30. Translate unit 154 is further coupled to status register 32 and write unit 156.

Generally speaking, in response to a "go" command written into control register 30, fetch unit 152 is configured to begin fetching source instructions. Fetch unit 152 may initiate fetching from the source address, and may receive fetch addresses from translate unit 154. Additionally, fetch unit 152 may be configured to prefetch addresses ahead of translate unit 154 requests, if desired. Translate unit 154 may receive the source instructions and may predecode the source instructions to determine stack change information corresponding to each instruction. Additionally, translate unit 154 may decode the source instructions into target instructions. Translate unit 154 translates the stack operand references of the source instructions into register indexes for the target instructions. The target instructions, with register operand assignments, are then provided to write unit 156. Write unit 156 provides the target instructions to PCI interface unit 150 along with a target address (initially the address in target address register 28 and subsequently incremented as instructions are stored out). PCI interface unit 150 writes the translated instructions to memory 16 via PCI bus 20. Once the translation of the code sequence is stopped (e.g. an exception condition or basic block boundary is detected, the maximum translation size is reached, etc.), translate unit 154 updates status register 32 with the status of the translation. Additionally, translate unit 154 may generate a return instruction to the JVM. Responsive to the status being updated (and subsequent to completing the write commands from write unit 156), PCI interface unit 150 may assert the interrupt signal on interrupt line 158 to interrupt CPU 12 for execution of the translated code sequence.

For the description of portions of one embodiment of code translator 22 provided with respect to Figs. 7-18, the terms "source instructions" and "target instructions" will be used to refer to instructions fetched by code translator 22 and generated by code translator 22, respectively. For a system embodiment similar to system 10 shown in Fig. 1, source instructions may be non-native instructions (e.g. Java bytecodes), and target instructions may be native instructions for CPU 12.

As used herein, the term "stack change information" refers to information indicative of a modification of an operand stack by a corresponding source instruction. The stack change information may take any suitable form. In one embodiment, the stack change information may include a number of pushes performed by the source instruction, a number of pops performed by the source instruction, and a stack pointer modification by the source instruction (e.g. the difference between the number of pushes and the number of pops, or vice versa). Other embodiments may include alternative encodings of the stack change information, including any subset of the above information.

As mentioned above with respect to Fig. 1, while PCI interface unit 150 is shown in the present embodiment (e.g. with respect to Figs. 7-18), other embodiments may use any suitable external interface. For example, the Universal Serial Bus (USB), IEEE 1394 bus, the Industry Standard Architecture (ISA) or Enhanced ISA (EISA) bus, the Personal Computer Memory Card International Association (PCMCIA) bus, etc. may be used. Still further, the Advanced RISC Machines (ARM) Advanced Microcontroller Bus Architecture (AMBA) bus, including the Advanced High-performance Bus (AHB) and/or Advanced System Bus (ASB), may be used, as may the Handspring Interconnect specified by Handspring, Inc. (Mountain View, CA).

Turning next to Fig. 8, a block diagram of one embodiment of translate unit 154 is shown. Other embodiments are possible and contemplated. In the embodiment of Fig. 8, translate unit 154 includes a predecode unit 160, a decode unit 162, and a stack to register transform unit 164. Stack to register transform unit 164 is coupled to decode unit 162, write unit 156, and fetch unit 152. Decode unit 162 is further coupled to predecode unit 160, which is further coupled to PCI interface unit 150.

Generally, fetch unit 152 is configured to begin fetching source instructions from the source address stored in source address register 26 responsive to detecting the "go" command in control register 30. As the code is translated by translate unit 154, stack to register transform unit 164 may generate fetch addresses for fetch unit 152. Fetch unit 152 may continue to generate addresses until directed to stop fetching by translate unit 154 (via a stop fetch signal illustrated in Fig. 8).

Predecode unit 160 is coupled to receive the source instructions from PCI interface 150, and predecodes each source instruction to determine stack change information. Predecode unit 160 supplies the source instructions to decode unit 162 along with the stack change information. Decode unit 162 generates target instructions for each source instruction, and provides the target instructions and the stack change information to stack to register transform unit 164. The stack change information may then be used to assign register operands to the target instructions in stack to register transform unit 164.

It is noted that more than one target instruction may be generated for various source instructions. The stack change information corresponding to each target instruction may be derived from the stack change information provided by predecode unit 160. For example, in one embodiment, each target instruction may perform at most one push or one pop. In such embodiments, sufficient target instructions to perform each push and pop as specified by the source instruction may be generated.

Predecode unit 160 may be implemented in any suitable fashion. For example, predecode unit 160 may comprise a programmable logic array (PLA) structure, combinatorial logic, or a lookup table (either a read-only memory (ROM) lookup table, or a random access memory (RAM) lookup table). In lookup table form, each bytecode could be assigned an entry in the table, with the number of pushes, the number of pops, and the stack pointer modification stored in the entry. Additionally, Java bytecodes may include a wide prefix which indicates that the operands each occupy two stack entries. Accordingly, if the wide prefix is included, the values from the lookup table may be left-shifted by one bit to double the numbers. The left-shift may be performed in the sense amps at the output of the table, or via muxes outside the table, as desired. Table 1 below illustrates exemplary values stored in the lookup table for one embodiment of predecode unit 160 which may produce a number of pushes, number of pops, and a stack pointer modification for Java bytecodes. As mentioned above, certain Java bytecodes may not be translated by code translator 22. Those instructions are indicated by "NT" in the number of pushes, number of pops, and stack pointer modification columns of Table 1. Which instructions are not translated may be varied from embodiment to embodiment, including embodiments which translate all instructions. In another alternative, predefined code sequences ("macros") may be stored in memory 16 for one or more Java bytecodes (e.g. bytecodes which are complex to translate in hardware but also frequently used). When code translator 22 encounters a bytecode for which a macro is provided, code translator 22 may generate a branch to the macro. The macro may return to the next instruction in the translated code sequence after completing execution.
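The lookup-table form of the predecode step can be sketched as follows. The entries are illustrative only: the push and pop counts follow the usual operand stack effects of these bytecodes and are not taken from Table 1, the choice of `invokevirtual` as an untranslated ("NT") bytecode is an assumption, and the doubling models the wide-prefix handling as the text describes it.

```python
# (pushes, pops) per bytecode; None marks a bytecode that is not translated ("NT").
PREDECODE = {
    "iconst_0": (1, 0),
    "iload": (1, 0),
    "iadd": (1, 2),
    "pop": (0, 1),
    "invokevirtual": None,  # assumed NT, for illustration only
}


def predecode(mnemonic, wide=False):
    """Return (pushes, pops, stack pointer modification), or None for NT."""
    entry = PREDECODE[mnemonic]
    if entry is None:
        return None
    pushes, pops = entry
    if wide:
        # Left-shift by one bit to double the counts, as described in the text.
        pushes <<= 1
        pops <<= 1
    return pushes, pops, pushes - pops


info = predecode("iadd")                  # (1, 2, -1)
wide_info = predecode("iload", wide=True)  # (2, 0, 2)
```

A `None` result is where an embodiment would either raise an exception for the interpreter to handle or branch to a stored macro.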

Table 1 : Exemplary Table of Predecode Information

Decode unit 162 may be implemented in any suitable fashion, similar to predecode unit 160. For example, decode unit 162 may comprise a programmable logic array (PLA) structure, combinatorial logic, or a lookup table (either a read-only memory (ROM) lookup table, or a random access memory (RAM) lookup table). In lookup table form, each bytecode could be assigned an entry in the table, with the corresponding set of target instructions stored in the entry.

In the illustrated embodiment, decode unit 162 is coupled to a virtual stack pointer (VSP) register 166, a virtual program counter (VPC) register 167, and a spill count register 168. Decode unit 162 may use these registers to assist in generating target instructions for the translated code sequence. More particularly, decode unit 162 may use the VSP register 166 and the VPC register 167 to defer updates to the registers which JVM 40 uses to store the stack pointer to the operand stack and the PC of the Java code sequence, respectively. Rather than generate target instructions which update the PC and stack pointer registers as part of the target instructions corresponding to each source instruction, code translator 22 may record the cumulative updates to these registers for the source instructions which have been processed by decode unit 162. Decode unit 162 is coupled to receive the stop fetch signal from stack to register transform unit 164, and in response to the stop fetch signal, decode unit 162 may generate target instructions which add the cumulative update to the stack pointer (from VSP register 166) to the stack pointer register and add the cumulative update to the PC (from VPC register 167) to the PC register. Thus, these instructions may update the stack pointer register and PC register to reflect the effects of the source instructions which have been translated to target instructions in the translated code sequence. Decode unit 162 may use the stack pointer modification provided by predecode unit 160 for each source instruction to generate the cumulative stack pointer modification for the source instructions processed in a particular clock cycle.
Decode unit 162 may add the stack pointer modifications provided by predecode unit 160 to the current value in VSP register 166 to generate the updated value for VSP register 166, and may store that value back into VSP register 166. On the other hand, decode unit 162 may determine the PC updates by decoding the source instructions. Decode unit 162 may determine the length of each instruction, and may add the lengths of one or more source instructions processed during the clock cycle to the current value stored in VPC register 167 to generate the updated value for VPC register 167. The updated value may be stored back into VPC register 167. Accordingly, VSP register 166 may indicate the cumulative effect on the stack pointer of source instructions processed by decode unit 162 since being activated by CPU 12. Similarly, VPC register 167 may reflect the cumulative effect on the PC of source instructions processed by decode unit 162 since being activated by CPU 12. Subsequent to generating the target instructions to update the PC and stack pointer registers, decode unit 162 may clear VSP register 166 and VPC register 167 to prepare for translating the next code sequence on the next activation. Alternatively, registers 166 and 167 may be cleared when code translator 22 is activated to translate another code sequence. Decode unit 162 may use spill count register 168 to support dynamic allocation of registers to the register pool for storing stack items. Upon activation to translate a code sequence, decode unit 162 may allocate one or more registers for use to store stack items.
Stack to register transform unit 164 may allocate the actual register indexes (e.g., counting backward from the highest register index), but decode unit 162 may control the number of registers allocated. Decode unit 162 may generate instructions to store the current contents of the registers to scratch pad memory 50, and may keep a count of registers thus freed in spill count register 168 (and may signal stack to register transform unit 164 with an indication that a register has been freed). Decode unit 162 may monitor the cumulative stack modification stored in VSP register 166. If the cumulative modification indicates that the number of items pushed onto the stack exceeds the number of items popped by a number close to or equal to the number of registers allocated (as indicated by spill count register 168), decode unit 162 may generate additional instructions to allocate registers into the register pool and may update spill count register 168 accordingly. When decode unit 162 receives an asserted stop fetch signal, decode unit 162 may generate instructions to restore the allocated registers, using the spill count to indicate how many instructions to generate. Additional details regarding dynamic allocation of registers into the register pool are provided further below. It is noted that, for embodiments which statically allocate registers into the register pool at the beginning of a translated code sequence or embodiments in which the register pool is reserved by the JVM for use by code translator 22, spill count register 168 may not be needed and may be eliminated.
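The deferred stack pointer and PC updates described above (VSP register 166 and VPC register 167) can be modeled with two accumulators that are flushed once, when translation stops. This is a sketch under stated assumptions: the register names `r_sp` and `r_pc`, the pseudo-assembly syntax, and the use of stack entries (rather than bytes) as the stack pointer unit are hypothetical.

```python
def translate_block(source, sp_reg="r_sp", pc_reg="r_pc"):
    """Translate a block while deferring stack pointer and PC updates.

    source: list of (mnemonic, length in bytes, stack pointer modification).
    Returns the generated target instructions as strings.
    """
    vsp = 0  # cumulative stack pointer modification (models VSP register 166)
    vpc = 0  # cumulative PC update in bytes (models VPC register 167)
    target = []
    for mnemonic, length, sp_delta in source:
        target.append(f"; translated {mnemonic}")
        vsp += sp_delta
        vpc += length
    # On stop fetch: emit a single update for each of the two registers.
    target.append(f"add {sp_reg}, {sp_reg}, {vsp}")
    target.append(f"add {pc_reg}, {pc_reg}, {vpc}")
    return target


# bipush is 2 bytes and pushes one item; iconst_1 and iadd are 1 byte each.
block = [("bipush", 2, +1), ("iconst_1", 1, +1), ("iadd", 1, -1)]
out = translate_block(block)
```

For this block the net stack change is one entry and the PC advances by four bytes, so two trailing instructions replace what would otherwise be six per-instruction updates.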

Turning next to Fig. 9, a block diagram of one embodiment of stack to register transform unit 164 is shown. Other embodiments are possible and contemplated. In the embodiment of Fig. 9, stack to register transform unit 164 includes a translate control circuit 170, a free list 172, a stack transform storage 174, a transform circuit 176, and a top of stack pointer 178. Transform circuit 176 may comprise a register assign circuit 180 and a final stack transform circuit 182. Translate control circuit 170 is coupled to free list 172, stack transform storage 174, and fetch unit 152. Free list 172 is further coupled to transform circuit 176. Stack transform storage 174 is further coupled to transform circuit 176 and to top of stack pointer 178. Transform circuit 176 is further coupled to write unit 156.

Generally, stack transform storage 174 stores the mapping of register indexes to stack items which represents the state of the stack corresponding to instructions which have previously been processed by stack to register transform unit 164. Free list 172 stores the register indexes from register pool 64 which are free for assignment to newly pushed stack items. Each clock cycle, stack transform storage 174 and free list 172 provide register indexes to transform circuit 176 for assignment as operands of instructions, and free list 172 and stack transform storage 174 are updated to reflect the effects on the stack of the instructions processed during that clock cycle.

In one embodiment, stack transform storage 174 may comprise a register file or RAM for storing register indexes. The register file may be operated as a wrap-around buffer, with the current top of stack indicated by top of stack pointer 178. In another embodiment, stack transform storage 174 may be configured to store multiple indexes in each entry. The entry including the index indicated as the top of stack and the subsequent entry may be read each clock cycle, and the indexes which comprise the top of stack may be provided from the indexes stored in the two entries. In one embodiment, free list 172 may be a FIFO storing the free register indexes and presenting the register indexes at the head of the list to transform circuit 176.

Register assign circuit 180 assigns register indexes for the source and destination operands of a particular target instruction based on the stack change information of the instructions preceding that particular target instruction within the same decode group (concurrently received from decode unit 162) and based on the free list and stack transform information provided by free list 172 and stack transform storage 174, respectively. For example, the second instruction (in program order) of a decode group is assigned register indexes based on the stack transform and free list information, modified by the effects on the stack of the first instruction (as indicated by the stack change information corresponding to the first instruction). Similarly, the third instruction of the decode group is assigned register indexes based on the stack transform and free list information, modified by the effects on the stack of the first and second instructions (as indicated by the stack change information corresponding to the first and second instructions). Register assign circuit 180 provides the register indexes and target instructions to write unit 156.

More particularly, register assign circuit 180 is configured, for each instruction, to assign source operand register indexes of the registers which are the top of the stack and the next to the top of the stack (as modified by the preceding instructions within the decode group). The destination operand register index for each instruction is the head of the free list (as updated to delete register indexes consumed for destination operands of preceding instructions within the decode group). It is noted that a particular instruction may have a destination operand if that instruction pushes a value onto the operand stack as defined by the source instruction set.

As an example, Fig. 12 may be a truth table illustrating operation of one embodiment of register assign circuit 180 for assigning source operands to the third target instruction (instruction 2) in a decode group. In the embodiment shown, each target instruction may cause one push, one pop, or no stack change. Each row of the table illustrates the number of pushes and pops caused by each of instructions 0 and 1, and the resulting source operand assignment for instruction 2 (Src0 and Src1). The source operand assignments are listed in terms of stack transform information prior to the effects of instructions within the decode group (S[number]) or free list information prior to the effects of instructions within the decode group (F[number]). More particularly, S[0] may be the register index corresponding to the stack item at the top of the stack (as indicated by the stack transform information), S[1] may be the register index corresponding to the stack item second to the top of the stack (as indicated by the stack transform information), S[2] may be the register index corresponding to the stack item third to the top of the stack (as indicated by the stack transform information), etc. Similarly, F[0] may be the register index at the head of the free list, F[1] may be the register index second to the head of the free list, etc. It is noted that the first three rows of Fig. 12 (which show instruction 1 as causing zero pushes and zero pops) may also illustrate source operand assignment for instruction 1, based on the pushes and pops of instruction 0 and the stack transform and free list information prior to the effects of instruction 1. While the table illustrated in Fig. 12 illustrates source operand assignment for the third instruction, other embodiments may include more than three instructions in a decode group. Thus, the table shown in Fig. 12 is merely exemplary.

Final stack transform circuit 182 computes the updated stack transform and free list information for update into stack transform storage 174 and free list 172, respectively. For example, final stack transform circuit 182 may indicate to free list 172 how many register indexes were consumed from the head of the free list, and may provide register indexes to be added to the end of the free list (registers which stored stack items which were popped). Final stack transform circuit 182 may further provide a new top of stack pointer for top of stack pointer 178 and may provide an updated list of register indexes to stack transform storage 174.

As an example, Figs. 13 and 14 may be truth tables which illustrate the resulting stack transform (Fig. 13) and the resulting free list (Fig. 14) from a decode group of three instructions (instruction 0, instruction 1, and instruction 2) based on the stack transform and free list prior to the effects of the three instructions according to one embodiment of final stack transform circuit 182. Similar to the table shown in Fig. 12, each target instruction may cause one push, one pop, or no stack change. Each row of the table illustrates the number of pushes and pops caused by each of instructions 0, 1, and 2, and the resulting stack transform and free list for that set of pushes and pops. The resulting stack transform and resulting free list are listed in terms of stack transform information prior to the effects of instructions within the decode group (S[number]) or free list information prior to the effects of instructions within the decode group (F[number]). More particularly, S[0] may be the register index corresponding to the stack item at the top of the stack (as indicated by the stack transform information), S[1] may be the register index corresponding to the stack item second to the top of the stack (as indicated by the stack transform information), S[2] may be the register index corresponding to the stack item third to the top of the stack (as indicated by the stack transform information), etc. Similarly, F[0] may be the register index at the head of the free list, F[1] may be the register index second to the head of the free list, etc.
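The S[number] and F[number] notation lends itself to a sequential reference model of a decode group. The sketch below walks the instructions in program order, whereas the hardware described would evaluate the equivalent truth tables in parallel; the function name and the symbolic labels are assumptions, and the output is not claimed to reproduce the rows of Figs. 12 through 14.

```python
def process_decode_group(group, stack, free):
    """Model operand assignment and the final transform for one decode group.

    group: stack effect of each target instruction in program order,
           one of "push", "pop", or "none".
    stack: register indexes with stack[0] at the top of the stack.
    free:  free register indexes with free[0] at the head of the free list.
    Returns (assignments, final_stack, final_free); assignments holds one
    ((source candidates), destination) pair per instruction.
    """
    stack, free = list(stack), list(free)  # leave the caller's lists untouched
    assignments = []
    for effect in group:
        srcs = tuple(stack[:2])  # top and next-to-top before this instruction
        dst = None
        if effect == "push":
            dst = free.pop(0)          # destination is the head of the free list
            stack.insert(0, dst)
        elif effect == "pop":
            free.append(stack.pop(0))  # popped register joins the tail
        assignments.append((srcs, dst))
    return assignments, stack, free


S = [f"S{i}" for i in range(6)]
F = [f"F{i}" for i in range(4)]
assign, final_stack, final_free = process_decode_group(["push", "pop", "push"], S, F)
# Instruction 1 sees F0 on top because instruction 0 pushed into it.
```

With a single pop and no other stack change, the same function yields a stack transform of S1 through S5 and a free list ending in S0, which is the shifting behavior the tables express.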

The resulting stack transform illustrated in the table of Fig. 13 shows the four top items of the stack transform, with the top item on the left of the list and other items increasingly away from the top as the list progresses to the right. Since each instruction may include at most one push or one pop, the remaining elements of the stack transform below those shown will be the elements in increasing order from the last element shown (e.g. in the first row, the fifth element in the stack transform is S[4] and the sixth element is S[5], etc., while in the second row, the fifth element in the stack transform is S[5] and the sixth element is S[6], etc.).

The resulting free list illustrated in the table of Fig. 14 shows the list (with the head of the list on the left and increasing in order to the tail of the list on the right). Items indicated by ellipses are F[4], F[5], and F[6], in that order. It is noted that the first 9 rows of the tables in Figs. 13 and 14 illustrate the resulting stack transform and resulting free list for a decode group having two instructions (respectively), and the first 3 rows of the tables illustrate the resulting stack transform and resulting free list for a decode group having one instruction (respectively). Additionally, the tables may be expanded to handle decode groups having four or more instructions.

Translate control circuit 170 is coupled to receive the target instructions from decode unit 162, and is further coupled to receive a free list empty signal from free list 172 and a stack empty signal from stack transform storage 174. If either the free list is empty or the stack is empty, translate control circuit 170 may terminate translation of the code and store an exception encoding in status register 32. CPU 12 may service the interrupt by pushing register values to memory (free list empty) or loading values from memory (stack empty) to allow translation to continue. Alternatively, translate control circuit 170 may generate these instructions automatically rather than causing an exception. Additionally, translate control circuit 170 may be configured to detect other exceptions (e.g. instructions which are not translated by translate control circuit 170) and to terminate translation and store an exception encoding in status register 32 for those exceptions. A different exception encoding may be provided for each type of exception.

Additionally, translate control circuit 170 may determine, from an examination of the target instructions, that the translation is complete. For example, in one embodiment, code translator 22 translates up to a conditional branch or a maximum number of bytes in the source or target sequence. If translate control circuit 170 determines that the translation is complete, translate control circuit 170 terminates translation and stores a non-exception status encoding into status register 32. If translate control circuit 170 terminates translation, it also asserts a stop fetch signal to fetch unit 152 to terminate additional fetching of source instructions. On the other hand, if translation is to continue, translate control circuit 170 may determine the next fetch address by examining the instructions currently being operated upon by translate unit 154. The next fetch address may generally be the address sequential to the last instruction in the current decode group, or may be the target address of a branch instruction if a branch instruction is encountered. The target address may be generated by translate control circuit 170 by adding the source address of the opcode of the branch instruction (which may be provided along with the decode group) to the displacement field of the branch instruction (one or more bytes following the branch instruction opcode).
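The next-fetch-address computation can be shown with a small worked example. The sketch assumes a branch whose displacement is a signed 16-bit big-endian value in the two bytes after the opcode, which is the format used by Java bytecodes such as `goto`; the function name and its parameters are hypothetical.

```python
import struct


def next_fetch_address(code, opcode_addr, length, is_branch):
    """Return the address to fetch after the instruction at opcode_addr.

    code:        bytes of the source sequence, indexed by address.
    opcode_addr: address of the instruction's opcode.
    length:      instruction length in bytes (used for the sequential case).
    is_branch:   True if the instruction is a taken or unconditional branch.
    """
    if is_branch:
        # Target = address of the branch opcode + signed displacement.
        (disp,) = struct.unpack_from(">h", code, opcode_addr + 1)
        return opcode_addr + disp
    return opcode_addr + length


# iconst_0 (0x03) at address 0, then goto (0xA7) at address 1 with displacement -1.
code = bytes([0x03, 0xA7, 0xFF, 0xFF])
sequential = next_fetch_address(code, 0, 1, is_branch=False)  # 1
target = next_fetch_address(code, 1, 3, is_branch=True)       # 0
```

The displacement is added to the address of the opcode itself rather than to the address of the following instruction, matching the description in the text.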

After assertion of the stop fetch signal, stack to register transform unit 164 may continue to perform register assignment for target instructions provided by decode unit 162 (e.g. stack and PC register adjust instructions and restore instructions).

Turning next to Figs. 10 and 11, a second embodiment of a portion of fetch unit 152 (Fig. 10) and translate unit 154 (Fig. 11) is shown. The embodiment illustrated in Figs. 11 and 12 may provide for speculative translation past a conditional branch instruction. More particularly, the embodiment of translate unit 154 shown in Fig. 11 includes a pair of transform circuits 176A and 176B, a branch predictor 190, and a multiplexor (mux) 192. Each of transform circuits 176A and 176B may be similar to transform circuit 176, and are coupled to receive the register indexes from stack transform storage 174 and free list 172, as described above for transform circuit 176. Additionally, translate control circuit 170 is configured to generate a sequential fetch address and a non-sequential fetch address in the present embodiment. More particularly, if translate control circuit 170 detects a branch instruction in the sequential instructions received during a clock cycle, translate control circuit 170 may generate the sequential address to the branch instruction and the target (non-sequential) address of the branch instruction. Both addresses may be transmitted to fetch unit 152 for fetching. Fetch unit 152 may provide the sequential instructions and the non-sequential instructions to predecode unit 160, which may predecode both sets of instructions concurrently. Predecode unit 160 may provide the predecoded instructions and stack change information to decode unit 162, which may decode each set of instructions concurrently. Decode unit 162 may provide the sequential instructions fetched in response to the sequential address to transform circuit 176A, and the non-sequential instructions fetched in response to the non-sequential address to transform circuit 176B. Each transform circuit 176A and 176B assigns register indexes as described above, and generates a final transform. The outputs of each transform circuit 176A and 176B are provided to mux 192, which is controlled by branch predictor 190.

In addition to generating two fetch addresses in response to a branch instruction, translate control circuit 170 may inform branch predictor 190 of the branch instruction. Branch predictor 190 may employ any suitable branch prediction algorithm. Branch predictor 190 predicts the branch instruction, and selects the sequential instructions (from transform circuit 176A) if the prediction is not-taken and the non-sequential instructions (from transform circuit 176B) if the prediction is taken. The selected instructions are provided to write unit 156, and stack transform storage 174 and free list 172 are updated according to the final transform corresponding to the selected instructions.
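By way of illustration only, the selection performed by mux 192 under control of branch predictor 190 may be modeled as follows. The TransformResult type and its field names are hypothetical, introduced solely for this sketch; the prediction algorithm itself is left unspecified, as in the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TransformResult:
    """Output of one transform circuit (field names are hypothetical)."""
    instructions: List[str]      # target instructions with registers assigned
    stack_transform: List[int]   # final transform: registers holding stack items
    free_list: List[int]         # final free list of register indexes


def select_predicted_path(predict_taken: bool,
                          sequential: TransformResult,
                          non_sequential: TransformResult) -> TransformResult:
    """Model of mux 192.

    A not-taken prediction selects the sequential path (transform
    circuit 176A); a taken prediction selects the non-sequential path
    (transform circuit 176B). The returned final transform is what
    would update stack transform storage 174 and free list 172.
    """
    return non_sequential if predict_taken else sequential
```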

Since the translation subsequent to a predicted branch instruction is speculative, translate unit 154 may be configured to store a shadow copy of the free list and stack transform for recovery if the prediction is incorrect. Translate control circuit 170 may update the status register in response to detecting the branch instruction (and thus CPU 12 may be interrupted to execute the translated code), as described above, while speculatively translating additional code. Additionally, the address of the first instruction in the predicted path would be the source address stored by JVM 40 into source address register 26 during the next activation of code translator 22. Thus, translate control circuit 170 may be configured to store the address of the first predicted instruction and to compare that address to the address stored in source address register 26 upon the next activation (via the "go" command being stored in control register 30). If the addresses match, the speculative translation was correct and may continue. If the addresses do not match, the speculative translation was incorrect and the shadow copies of the free list and stack transform may be copied back into free list 172 and stack transform storage 174. Translation down the correct path may then be performed.

As illustrated in Fig. 10, fetch unit 152 may include a cache 200, muxes 194, 196, and 198, and a fetch control circuit 160. Cache 200 may include two ports in this embodiment. The first port, labeled IA1 in Fig. 10, may be used to fetch the sequential addresses (first the source address from source address register 26, then the sequential addresses from translate control circuit 170, as selected through mux 194). The second port, labeled IA2 in Fig. 10, may be used to fetch the non-sequential addresses or a prefetch address from fetch control circuit 160 (which may employ any suitable prefetch algorithm), as selected through mux 196. Fetch control circuit 160 may provide selection controls to both muxes 194 and 196. Generally, if a "go" command is received from control register 30, fetch control circuit 160 may select source address register 26 through mux 194; otherwise, fetch control circuit 160 may select the sequential fetch address provided by translate unit 154. Similarly, if a non-sequential fetch address is provided from translate unit 154, fetch control circuit 160 may be configured to select the non-sequential fetch address through mux 196. Otherwise, the prefetch address may be selected.

Miss information corresponding to both input addresses is provided to fetch control circuit 160, and fetch control circuit 160 may control mux 198 for providing a fetch request and address to PCI interface unit 150. If one of the addresses on the input address ports to cache 200 is a miss, that address may be selected via mux 198. If both addresses are a miss, the address on IA1 is selected. If both addresses are a hit in cache 200, the prefetch address may be selected.
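By way of illustration only, the selections made through muxes 194, 196, and 198 may be summarized by the following sketch. The function and parameter names are hypothetical, and the absence of a non-sequential address is modeled as None, which is an assumption of this example.

```python
from typing import Optional, Tuple


def select_cache_ports(go: bool,
                       source_addr: int,
                       sequential_addr: int,
                       non_sequential_addr: Optional[int],
                       prefetch_addr: int) -> Tuple[int, int]:
    """Return the (IA1, IA2) addresses presented to the two cache ports."""
    # Mux 194: a "go" command selects the source address register;
    # otherwise the sequential fetch address from the translate unit.
    ia1 = source_addr if go else sequential_addr
    # Mux 196: a non-sequential address, when provided, is selected
    # in preference to the prefetch address.
    ia2 = non_sequential_addr if non_sequential_addr is not None else prefetch_addr
    return ia1, ia2


def select_fetch_request(ia1: int, ia2: int,
                         ia1_hit: bool, ia2_hit: bool,
                         prefetch_addr: int) -> int:
    """Model of mux 198: choose the address sent to the PCI interface unit."""
    if not ia1_hit:
        return ia1            # IA1 miss; IA1 also wins when both ports miss
    if not ia2_hit:
        return ia2            # only IA2 missed
    return prefetch_addr      # both ports hit: use the bandwidth to prefetch
```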

In one embodiment, the first port (IA1) on cache 200 is a read-only port while the second port (IA2) is read-write. Thus, the source instructions from PCI interface 150 may be provided to an input data port corresponding to IA2. Thus, if a miss is detected on the first port, the address is re-presented on the second port when the corresponding instructions are provided for storage in cache 200. It is noted that cache 200 is optional, and if not implemented, fetch control circuit 160 may provide one or more fetch addresses to PCI interface 150 for fetching.

Turning next to Fig. 15, a flowchart illustrating operation of one embodiment of decode unit 162 for decoding source instructions is shown. Other embodiments are possible and contemplated. While the steps shown are illustrated in a particular order for ease of understanding, any suitable order may be used. Particularly, various steps shown may be performed in parallel by combinatorial logic within decode unit 162. Decode unit 162 may perform the steps shown in Fig. 15 each cycle that source instructions are provided by predecode unit 160 for decoding.

Decode unit 162 determines if additional registers are needed in the register pool used by code translator 22 to store stack items (decision block 200). For example, decode unit 162 may compare the spill count in spill count register 168 to the cumulative stack pointer modification in VSP register 166. If the cumulative stack pointer modification indicates that the number of pushes minus the number of pops exceeds the number of registers in the register pool (or exceeds a threshold value near the number of registers in the register pool), decode unit 162 may generate target instructions to store the values in additional registers to scratchpad memory 50 so that the registers may be added to the register pool (step 202). Alternatively, decode unit 162 may cause an interrupt to be asserted if the number of allocated registers reaches a predetermined threshold, to allow software to spill the registers to the operand stack and thus free them for use for new stack operands.
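By way of illustration only, the comparison made at decision block 200 may be sketched as follows. The sketch assumes that the cumulative stack pointer modification has already been expressed as a count of pushes minus pops, and that the threshold is modeled as a margin subtracted from the pool size; both the names and this modeling are assumptions of the example rather than details of the disclosure.

```python
def needs_more_registers(net_pushes: int, pool_size: int, margin: int = 0) -> bool:
    """Return True if additional registers should be added to the pool.

    net_pushes: number of pushes minus number of pops so far
        (hypothetical representation of the cumulative modification).
    pool_size: number of registers currently in the register pool.
    margin: optional headroom, so the check may trigger at a threshold
        near, rather than at, the pool size.
    """
    return net_pushes > pool_size - margin
```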

Decode unit 162 generates target instructions for the source instructions, as described above (step 204). Additionally, decode unit 162 updates VPC register 167 and VSP register 166 with the cumulative effects of the source instructions (step 206).

Decode unit 162 further determines, from the stop fetch signal from stack to register transform unit 164, whether or not the translation is complete (decision block 208). If the translation is complete, decode unit 162 generates target instructions which adjust the PC register with the cumulative PC modification recorded in VPC register 167, adjust the stack pointer register with the cumulative stack pointer modification recorded in VSP register 166, and restore registers allocated to the register pool by decode unit 162 (step 210).

Turning now to Fig. 16, an exemplary code sequence 220 is shown. Code sequence 220 may be generated by an embodiment of decode unit 162 according to the flowchart shown in Fig. 15. Decode unit 162 generates target instructions corresponding to the source instructions (e.g. target instructions 222) until the translation is determined to be complete (e.g. due to exception, translating a number of instructions equal to the translation limit, detecting an instruction which is not translated by code translator 22, etc.). Upon detecting that the translation is complete, decode unit 162 may generate target instructions to adjust the PC and stack pointer registers and to restore registers allocated to the register pool by decode unit 162.

More particularly, decode unit 162 may generate a target instruction 224 to update the PC register and a target instruction 226 to update the stack pointer register. In the embodiment shown, instruction 224 may be an add instruction having the PC register as a destination register and as a source register, and having an immediate field carrying the cumulative PC modification (VPC) from VPC register 167. Similarly, instruction 226 may be an add instruction having the stack pointer register as a destination register and as a source register, and having an immediate field carrying the cumulative stack pointer modification (VSP) from VSP register 166. Thus, the cumulative effects of the translated instructions on each of the PC and stack pointer registers may be reflected in the registers of CPU 12 used by the JVM to store the PC and stack pointer values.

Target instructions 228 may be instructions which restore the registers allocated to the register pool during the translation. More particularly, each register which is currently storing a stack item may be restored using a store instruction (to store the stack item in the register to the operand stack) and a load instruction (to load the saved value of the register from the scratch area). Potentially, a register no longer contains useful data, in which case only a load is generated. Each store instruction may store a register value to various offsets from the stack pointer register (e.g. the first store instruction to offset 0, the second store instruction to offset 4, etc.). Thus, the register storing the top of stack item is stored to the top of the stack by the first store instruction, the register storing the second to the top of stack item is stored to the second to the top of stack entry by the second store instruction, etc. In the embodiment shown, stack items are 32 bits, although other embodiments may employ different sizes. Additionally, the stack change information provided by decode unit 162 with the store instructions may indicate no stack change (no pushes and no pops), while the stack change information corresponding to the load instructions may indicate a pop, so that the next store instruction may receive the next register index down from the top of stack from the stack transform as the source register. For registers which are on the free list (and thus are not currently storing stack items), a load instruction to load the saved value from the scratch area may be generated (with the corresponding stack change information indicating no pushes or pops).
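By way of illustration only, the generation of instructions 224, 226, and 228 may be sketched as follows. The pseudo-assembly mnemonics, the register names, and the scratch_slot mapping (the offset in the scratch area at which each register's saved value resides) are hypothetical and are not taken from the disclosure or from any particular instruction set.

```python
from typing import Dict, List


def build_epilogue(vpc: int, vsp: int,
                   stack_regs: List[str],
                   free_regs: List[str],
                   scratch_slot: Dict[str, int]) -> List[str]:
    """Emit the adjust and restore instructions as pseudo-assembly strings.

    stack_regs: registers currently holding stack items, top of stack first.
    free_regs: pool registers currently on the free list.
    scratch_slot: hypothetical map from register to scratch-area offset.
    """
    code = [
        f"add pc, pc, #{vpc}",   # instruction 224: cumulative PC modification
        f"add sp, sp, #{vsp}",   # instruction 226: cumulative SP modification
    ]
    for i, reg in enumerate(stack_regs):
        # Store the stack item to the operand stack (32-bit items, so the
        # offsets are 0, 4, 8, ...), then reload the register's saved value.
        code.append(f"store {reg}, [sp, #{4 * i}]")
        code.append(f"load {reg}, [scratch, #{scratch_slot[reg]}]")
    for reg in free_regs:
        # A free-list register holds no stack item: only the load is needed.
        code.append(f"load {reg}, [scratch, #{scratch_slot[reg]}]")
    return code
```

For example, with two registers holding stack items and one register on the free list, the sketch emits two adjust instructions, two store and load pairs, and one final load, for seven instructions in total.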

It is noted that if registers are reserved for the register pool, rather than allocated, the load instructions may be eliminated from target instructions 228. Additionally, the stack change information corresponding to the store instructions may indicate a pop, so that the next store instruction receives the next register index down from the stack transform, as described above.

Finally, code sequence 220 may conclude with a return instruction to the JVM (reference numeral 230).

Turning next to Fig. 17, a flowchart illustrating operation of one embodiment of decode unit 162 for decoding a source conditional branch instruction is shown. Other embodiments are possible and contemplated. While the steps shown are illustrated in a particular order for ease of understanding, any suitable order may be used. Particularly, various steps shown may be performed in parallel by combinatorial logic within decode unit 162.

Additionally, Fig. 18 illustrates an exemplary target code sequence 250 which may be generated by the embodiment of decode unit 162 shown in Fig. 17.

The embodiment illustrated by Figs. 17 and 18 may be used if translation beyond a conditional branch in the source code sequence is speculatively performed by code translator 22. The embodiment illustrated in Figs. 17 and 18 may be an alternative to the embodiment shown in Figs. 10 and 11 for handling conditional branches. By employing the embodiment shown in Figs. 17 and 18, the restoration of registers from the register pool and the adjustment of the PC register and stack pointer register may be delayed until a return to the JVM is actually performed. For example, in the case of a conditional branch, code translator 22 may predict a direction for the conditional branch (e.g. not taken). If not taken is predicted, decode unit 162 may operate as shown in Figs. 17 and 18. Alterations if taken is predicted are described below.

As shown in Fig. 17, decode unit 162 may generate a target conditional branch instruction which checks for the logical opposite of the condition checked for by the source conditional branch instruction (step 240). For example, if the source conditional branch instruction is checking for a condition of greater than, the logical opposite is less than or equal. If the source conditional branch instruction is checking for equal, the logical opposite is not equal, etc. In other words, if the source conditional branch instruction results in taken, the target conditional branch instruction checking for the logically opposite condition is not taken. Similarly, if the source conditional branch instruction results in not taken, the target conditional branch instruction is taken. The target address of the target conditional branch instruction generated in step 240 is explained in more detail below. The target conditional branch instruction generated in response to step 240 is illustrated in code sequence 250 as instruction 252.
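By way of illustration only, the selection of the logically opposite condition in step 240 may be modeled as a lookup table. The two-letter condition mnemonics used here are hypothetical and do not correspond to any particular source or target instruction set.

```python
# Each condition maps to its logical opposite (hypothetical mnemonics).
OPPOSITE = {
    "eq": "ne", "ne": "eq",   # equal            <-> not equal
    "gt": "le", "le": "gt",   # greater than     <-> less than or equal
    "lt": "ge", "ge": "lt",   # less than        <-> greater than or equal
}


def invert_condition(cond: str) -> str:
    """Return the condition that is true exactly when cond is false."""
    return OPPOSITE[cond]
```

Applying the inversion twice returns the original condition, which is a quick sanity check that the table pairs the conditions consistently.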

Decode unit 162 generates the target instructions to adjust the PC and stack pointer registers, based on the values in the VPC and VSP registers 167 and 166, respectively (step 242). Additionally, target instructions to restore the registers allocated to the register pool are generated. Step 242 may be similar to step 210 described above. Accordingly, instructions 224, 226, and 228 from Fig. 16 are illustrated in Fig. 18 as well. Finally, decode unit 162 generates a second target branch instruction (step 244). The second target branch instruction may be a return instruction to the JVM to determine if the source conditional branch instruction's target address corresponds to a translated code sequence or to determine if translation is to be initiated at the source conditional branch instruction's target address. Alternatively, the second target branch instruction may be a second target conditional branch instruction checking for the same condition as the source conditional branch instruction and having a target address of instructions translated from the source instructions at the target of the source conditional branch instruction. The second target branch instruction is illustrated in code sequence 250 as instruction 254.

In the case illustrated in Figs. 17 and 18, the source conditional branch instruction is predicted not taken. If the prediction is correct, the target conditional branch instruction generated at step 240 is taken. Accordingly, the target address of the target conditional branch instruction is set to bypass the instructions which adjust the PC and stack pointer and restore the registers in the register pool, and further to bypass the second target branch instruction. Thus, the target address of the target conditional branch instruction generated in step 240 is the address of the instruction succeeding the second target branch instruction generated in step 244 (see arrow 256 in Fig. 18). Thus, in the present embodiment, the target address may be relative to the target conditional branch instruction generated in step 240 and may be the restore count plus 3 instructions (the two adjust instructions and the second target branch instruction).
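By way of illustration only, the number of instructions bypassed by the target conditional branch instruction may be checked against a model of the layout of code sequence 250. The sketch assumes that the restore count is the number of restore instructions generated, and it expresses the result as a count of skipped instructions rather than as an encoded displacement, since the encoding depends on the target instruction set. The instruction labels are hypothetical.

```python
from typing import List


def skip_count(restore_count: int) -> int:
    """Instructions bypassed when the step 240 branch is taken:
    two adjust instructions + the restores + the second target branch."""
    return restore_count + 3


def layout(restore_count: int) -> List[str]:
    """Model the order of instructions in code sequence 250."""
    seq = ["branch_opposite"]             # instruction 252 (step 240)
    seq += ["add_pc", "add_sp"]           # instructions 224 and 226
    seq += ["restore"] * restore_count    # instructions 228
    seq += ["second_branch"]              # instruction 254 (step 244)
    seq += ["next_translated"]            # first of instructions 258
    return seq
```

With the branch at index 0, the instruction reached after skipping skip_count(n) instructions is at index 1 + skip_count(n), which is the first of instructions 258 for any restore count n.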

Thus, if the prediction is correct, adjustment of the PC and stack pointer registers and restoration of the registers allocated to the register pool may be delayed, and execution may continue with additional target instructions 258, generated from source instructions sequential to the source conditional branch instruction (translated after decode unit 162 performs the translation for the source conditional branch instruction as illustrated in Fig. 17). If the prediction is incorrect, the adjustment and the restoration may be performed.

On the other hand, if the conditional branch instruction is predicted taken, target conditional branch instruction 252 may be generated to check for the same condition as the source conditional branch instruction. The target address of target conditional branch instruction 252 may remain the same as shown in Fig. 18. Additionally, in the predicted taken case, target instructions 258 may comprise instructions translated from source instructions at the target address of the source conditional branch instruction.

It is noted that the above description refers to PC and stack pointer registers maintained by the JVM for a Java code sequence. These registers may be predetermined to be in certain registers of the register set employed by CPU 12 (and thus the register indexes for these registers may be predetermined for code translator 22). Alternatively, code translator 22 may include a configuration register (not shown) which may be programmed with the register indexes of each register.

Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.

Claims

WHAT IS CLAIMED IS:

1. An apparatus comprising: a central processing unit (CPU) configured to execute instructions defined in a first instruction set, wherein said CPU is configured to detect a first code sequence including instructions defined in a second instruction set; and a code translator coupled to said CPU, wherein said code translator is configured to translate said first code sequence into a second code sequence including instructions defined in said first instruction set, and wherein said CPU is configured to activate said code translator responsive to detecting said first code sequence.

2. The apparatus as recited in claim 1 wherein said code translator includes a source address register, and wherein said CPU is configured to store an address of a first instruction within said first code sequence.

3. The apparatus as recited in claim 2 wherein said code translator further includes a target address register, and wherein said CPU is configured to store an address at which said code translator is to store said second code sequence into said target address register.

4. The apparatus as recited in claim 3 wherein said code translator further includes a control register, and wherein said CPU is configured to activate said code translator by storing a value in said control register subsequent to storing said addresses in said source address register and said target address register.

5. The apparatus as recited in claim 4 further comprising a status register, wherein said code translator is configured to return a status of the translation from the first code sequence into the second code sequence in said status register.

6. The apparatus as recited in claim 5 wherein said source address register, said target address register, said control register, and said status register are memory mapped.

7. The apparatus as recited in claim 5 wherein said code translator is configured to interrupt said CPU upon completing translation of said first code sequence to said second code sequence, and wherein said CPU is configured to access said status register in response to the interrupt.

8. The apparatus as recited in claim 7 wherein said CPU is configured to execute said second code sequence responsive to said status indicating that the translation is successful.

9. The apparatus as recited in claim 7 wherein said CPU is configured to execute said first code sequence in interpreter mode responsive to said status indicating that the translation is unsuccessful.

10. The apparatus as recited in claim 1 wherein said first instruction set is register-based and said second instruction set is stack-based.

11. The apparatus as recited in claim 10 wherein said code translator is configured to use one or more registers from a register set defined in said first instruction set to store one or more stack items from a top of a stack defined by said second instruction set.

12. The apparatus as recited in claim 11 wherein said code translator is configured to assign a register from said one or more registers as a destination of an instruction which pushes a value onto said stack.

13. The apparatus as recited in claim 12 wherein said code translator is configured to free said register in response to an instruction which pops said value from said stack, whereby said register is assignable as a destination of another instruction.

14. A method comprising: detecting, in a central processing unit (CPU) configured to execute instructions defined in a first instruction set, a first code sequence including instructions defined in a second instruction set; activating a code translator coupled to said CPU in response to said detecting; and translating, in said code translator, said first code sequence to a second code sequence including instructions defined in said first instruction set.

15. The method as recited in claim 14 wherein said activating comprises: storing a source address of said first code sequence in a source address register of said code translator; storing a target address indicating memory into which said second code sequence is to be stored in a target address register of said code translator; and storing a value in a control register of said code translator, said value causing said code translator to begin said translating.

16. The method as recited in claim 14 further comprising: reading a status from a status register of said code translator by said CPU; and executing said second code sequence in said CPU if said status indicates that said translating is successful.

17. The method as recited in claim 14 further comprising: reading a status from a status register of said code translator by said CPU; and executing said first code sequence in said CPU in an interpreter mode if said status indicates that said translating is unsuccessful.

18. The method as recited in claim 14 wherein said first instruction set is a register-based instruction set and said second instruction set is a stack-based instruction set.

19. The method as recited in claim 18 further comprising using one or more registers defined in said first instruction set to store a top of a stack defined in said second instruction set.

20. A method comprising: translating a first code sequence having one or more instructions defined in a first instruction set to a second code sequence having one or more instructions defined in a second instruction set; storing said second code sequence; and recording an indication of said first code sequence and said second code sequence in a table.

21. The method as recited in claim 20 further comprising invoking said first code sequence, wherein said invoking comprises checking said table for said indication.

22. The method as recited in claim 21 wherein said invoking comprises executing said second code sequence without performing said translating if said indication is stored in said table.

23. The method as recited in claim 22 wherein said invoking comprises said translating if said indication is not stored in said table.

24. The method as recited in claim 21 wherein said indication comprises a pointer to said second code sequence.

25. The method as recited in claim 24 wherein said indication comprises a size of said second code sequence.