CN101133390A

CN101133390A - Single-cycle low-power CPU architecture

Info

Publication number: CN101133390A
Application number: CNA2006800071570A
Authority: CN
Inventors: 本杰明·F·弗勒明; 埃米尔·兰布朗克
Original assignee: Atmel Corp
Current assignee: Atmel Corp
Priority date: 2005-03-04
Filing date: 2006-01-25
Publication date: 2008-02-27
Also published as: WO2006096250A2; US20090319760A1; TW200703103A; DE112006000514T5; AU2006221114A1; US20060200650A1; WO2006096250A3

Abstract

An architecture for implementing an instruction pipeline within a CPU comprises an arithmetic logic unit (ALU) (210), an address arithmetic unit (AAU) (215), a program counter (PC) (220), and a read-only memory (ROM) (230) coupled to the program counter (220), to an instruction register (240), and to an instruction decoder (250) that is coupled to the arithmetic logic unit (210). A random access memory (RAM) (270) is coupled to the instruction decoder (250), to the arithmetic logic unit (210), and to a RAM address register (260).

Description

Single-cycle low-power central processing unit (CPU) architecture

Technical field

The present invention relates to integrated circuits. More particularly, the present invention is an apparatus and method for a microcontroller architecture that implements an instruction pipeline in order to speed up program execution and reduce power consumption.

Background art

Raising the system clock frequency is the conventional way to improve the computational performance of a central processing unit (CPU) in a microprocessor or microcontroller. Those skilled in the art know that the typical power (P) consumed by a CPU depends on the total CPU gate capacitance (C), the supply voltage (V), and the system clock frequency (f), according to the formula P ∝ CV²f.

Power consumption can be reduced by reducing C, V, or f. The on-chip capacitance (C) is determined by the number of gates needed to implement the design. A given design has usually already been optimized to minimize the number of gates needed to realize the required logic, and therefore generally offers little opportunity for improvement. The operating voltage (V) is limited by the process technology and by the associated operating characteristics of the transistors built on that technology. The system clock frequency (f) often offers the best opportunity for improvement.

By reducing the number of clock cycles needed to complete an instruction, the system clock frequency can be lowered to reduce power while maintaining computational throughput. Alternatively, the system clock frequency can be maintained and a higher rate of computation achieved for a given power consumption. In either case, the energy required per computation is reduced. Reducing the number of clock cycles needed to execute an instruction is therefore an important way to improve CPU performance. What is needed is a way to realize a high-performance CPU (that is, one with high speed and low power consumption) by reducing the number of clock cycles needed to execute instructions. A system and method for executing instructions in parallel can meet this need by increasing the number of instructions executed in a given number of system clock cycles.
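As a rough worked illustration of this trade-off, the sketch below applies P ∝ CV²f with C and V held fixed. The cycle counts, the normalized frequency, and the function names are assumptions chosen for illustration only; they are not figures taken from this disclosure.

```python
def relative_power(c: float, v: float, f: float) -> float:
    """Dynamic power up to a constant factor: P is proportional to C * V^2 * f."""
    return c * v ** 2 * f


def throughput(f: float, cycles_per_instruction: float) -> float:
    """Instructions completed per unit time."""
    return f / cycles_per_instruction


# Hypothetical baseline: 6 clocks per instruction at a normalized frequency of 6.
base_f, base_cpi = 6.0, 6.0
# Hypothetical improved design: 1 clock per instruction, same C and V.
new_cpi = 1.0
# Frequency the improved design needs in order to match the baseline throughput.
new_f = throughput(base_f, base_cpi) * new_cpi

power_ratio = relative_power(1.0, 1.0, new_f) / relative_power(1.0, 1.0, base_f)
print(new_f, power_ratio)
```

Under these assumptions the improved design matches the baseline throughput at one sixth of the clock frequency, and therefore at one sixth of the power.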

Summary of the invention

The present invention is an apparatus and method for an instruction pipeline of a CPU. In an exemplary embodiment, the present invention is incorporated in a microcontroller that operates on the MCS-51 instruction set with 16-bit addresses and 8-bit data. Those skilled in the art know a microcontroller that uses the MCS-51 instruction set as an 8051 microcontroller. With reference to Fig. 1, the block diagram of an 8051 microcontroller as known in the art has an internal bus that provides a common path for communication among the read-only memory (ROM), the random-access memory (RAM), and the arithmetic logic unit (ALU). The address register (AR), accumulator register (ACC), temporary register (TMP), data pointer register (DPTR), and stack pointer (SP) are each attached to this internal data bus.

A typical 8051 microcontroller known in the prior art needs three system clock cycles to fetch a one-byte instruction from the read-only memory (ROM) into the instruction register (IR). The present invention reduces the fetch of a one-byte instruction to a single system clock cycle. Instructions in the MCS-51 instruction set are one, two, or three bytes long. In a prior-art 8051 microcontroller, an instruction fetch operation can therefore need up to nine system clock cycles:

Instruction length (bytes)	Fetch (system clocks)
One	Three
Two	Six
Three	Nine

In a prior-art 8051 microcontroller, the time needed to complete instruction execution exceeds the fetch time, because the micro-operations an instruction requires can be performed only after the instruction fetch has completed, and those micro-operations must time-share the single internal bus. Typically, an instruction needs six or twelve system clock cycles to execute. A one-byte or two-byte instruction thus executes in six system clock cycles, effectively wasting three system clock cycles in the case of a one-byte instruction. A three-byte instruction needs twelve system clock cycles to execute, effectively wasting three system clock cycles.
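The prior-art cycle counts quoted above can be tabulated with a short sketch. The function name is hypothetical, and the "idle" figure is simply execution clocks minus fetch clocks, which is how the wasted cycles are counted in the paragraph above.

```python
def prior_art_cycles(length_bytes: int) -> tuple[int, int, int]:
    """Return (fetch, execute, idle) system clocks for an instruction length."""
    fetch = 3 * length_bytes                   # three clocks per fetched byte
    execute = 6 if length_bytes <= 2 else 12   # six or twelve clocks in total
    return fetch, execute, execute - fetch


for n in (1, 2, 3):
    print(n, prior_art_cycles(n))
```

Only the two-byte case leaves no idle clocks; the one-byte and three-byte cases each leave three.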

In an exemplary embodiment of the present invention, single-cycle fetching of each byte is enabled by a 16-bit address arithmetic unit (AAU) coupled to the program counter (PC) and by a dedicated increment/decrement unit coupled to the stack pointer (SP). The program counter (PC) can be incremented by one with each instruction byte fetched, so as to maintain the instruction pipeline, while the stack pointer (SP) can be pushed or popped independently in order to begin servicing an interrupt. The random-access memory (RAM) is used to save the program counter (PC) value during interrupt service and to restore it upon return from the interrupt subroutine. A dedicated buffer holds the correct return address during an interrupt or software call, for pushing into the RAM.

A further improvement over the prior art is implemented by using separate registers to provide random-access memory (RAM) read address storage and write address storage. A dedicated RAM write address register makes it possible to delay the write operation associated with an instruction. The delayed write effectively allows an instruction to complete its operation during a given system clock cycle, with the associated write taking place in the following system clock cycle. The delayed RAM write capability avoids stalling the instruction pipeline because of a pending write operation. The separate RAM read address and RAM write address registers also enable a data throughput capability in the RAM: when both registers hold the same RAM address, the data present in the RAM data storage register are immediately made available at the RAM output while simultaneously being written into the addressed storage location. This throughput feature makes computation results available for further processing with minimal delay, further enabling instruction pipelining.
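The delayed write and the same-address throughput behavior described above can be sketched as a small behavioral model. This is a minimal illustration under stated assumptions, not the disclosed circuit: the class name, method names, and the use of an explicit `clock()` call to stand for the following system clock cycle are all hypothetical.

```python
class DelayedWriteRam:
    """Toy model: separate read/write address registers with a one-cycle delayed write."""

    def __init__(self, size: int = 256) -> None:
        self.cells = [0] * size
        self.write_addr = None   # RAM write address register (None = nothing pending)
        self.write_data = 0      # RAM data storage register

    def post_write(self, addr: int, value: int) -> None:
        """Latch a write; the cell itself is updated on the next clock() call."""
        self.write_addr, self.write_data = addr, value & 0xFF

    def read(self, read_addr: int) -> int:
        """Read a cell, forwarding pending data when read and write addresses match."""
        if self.write_addr is not None and read_addr == self.write_addr:
            return self.write_data
        return self.cells[read_addr]

    def clock(self) -> None:
        """Commit the pending write, as would happen in the following clock cycle."""
        if self.write_addr is not None:
            self.cells[self.write_addr] = self.write_data
            self.write_addr = None


ram = DelayedWriteRam()
ram.post_write(0x30, 0x5A)
forwarded = ram.read(0x30)   # the value is readable before the cell is written
untouched = ram.cells[0x30]  # the cell still holds its old contents
ram.clock()
committed = ram.cells[0x30]  # the write has now landed in the cell
```

In this model a read that follows a write to the same address never waits for the write to complete, which is the property that keeps the pipeline from stalling.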

An instruction pre-decode path from the read-only memory (ROM) to the random-access memory (RAM) is provided; it bypasses the normal decode process and speeds up the execution of register operations. In addition, a register bank forwarding path prevents pipeline stalls when a register operation follows a change of the active register bank in the program status word (PSW).

A dedicated data path is provided from the RAM data output directly to the 8-bit arithmetic logic unit (ALU), with no intermediate temporary storage register. A dedicated data path from the arithmetic logic unit (ALU) to the RAM data input register is also provided. These dedicated data paths form a high-throughput path over which data can be read from the RAM, processed, and then written back to the RAM. This is an improvement over prior-art 8051 microcontrollers, which use a single internal bus.

The combined improvements achieved through the dedicated data paths, instruction pre-decode, bank forwarding, and separate RAM read and write address registers allow a register increment instruction to complete in a single system clock cycle, and a register indirect increment to complete in two system clock cycles.

Brief description of the drawings

Fig. 1 is a block diagram of an 8051 microcontroller known in the prior art.

Fig. 2 is an architecture block diagram of the pipeline portion of a CPU according to an exemplary embodiment of the present invention.

Fig. 3 is a timing diagram of instruction pipelining with one-byte instructions according to an exemplary embodiment of the present invention.

Fig. 4 is a timing diagram of instruction pipelining with one-byte and two-byte instructions according to an exemplary embodiment of the present invention.

Fig. 5 is a diagram of activity in the arithmetic logic unit (ALU) when executing a single-cycle instruction according to an exemplary embodiment of the present invention.

Fig. 6 is a diagram of activity in the arithmetic logic unit (ALU) when executing a two-cycle instruction according to an exemplary embodiment of the present invention.

Fig. 7 is an exemplary architecture block diagram of the address calculation portion of a CPU according to the present invention.

Fig. 8A illustrates use of the address buffer during execution of routine instructions according to an exemplary embodiment of the present invention.

Fig. 8B illustrates use of the address buffer during execution of a hardware interrupt according to an exemplary embodiment of the present invention.

Fig. 8C illustrates use of the address buffer during execution of a software interrupt according to an exemplary embodiment of the present invention.

Fig. 9 is an exemplary architecture block diagram of the instruction pre-decode and RAM access portion of a CPU according to the present invention.

Fig. 10 is a timing diagram for a register increment instruction according to an exemplary embodiment of the present invention.

Detailed description

With reference to Fig. 2, a central processing unit (CPU) pipeline architecture portion 200 according to an exemplary embodiment of the present invention includes an arithmetic logic unit (ALU) 210 having a first data input, a second data input, and a data output. In the exemplary embodiment, the ALU 210 is configured to operate on eight-bit bytes. The data output of the ALU 210 is coupled to an accumulator register (ACC) 290 and to a random-access memory (RAM) 270. The exemplary embodiment further contains an address arithmetic unit (AAU) 215 having a first data input, a second data input, and a data output. In the exemplary embodiment, the AAU 215 is configured to operate on sixteen-bit words. The data output of the AAU 215 is coupled to a program counter (PC) 220.

The random-access memory (RAM) 270 is organized as 256 × 8, for a total storage capacity of 256 bytes. The program counter (PC) 220 is further coupled to a read-only memory (ROM) 230 and to the first data input of the address arithmetic unit (AAU) 215. The ROM 230 stores the CPU program (that is, the sequence of instructions to be executed by the CPU). In a particular exemplary embodiment, a program based on the MCS-51 instruction set resides in the ROM 230. The address value stored in the PC 220 is used to select, within the ROM 230, the particular instruction to be passed to an instruction register (IR) 240. The IR 240 provides temporary storage for an instruction before that instruction is passed to an instruction decoder 250. The instruction decoder 250 is coupled to the second data input of the AAU 215 and to the RAM 270. The function of the instruction decoder 250 is to identify the arithmetic/logic operation an instruction requires and to have the necessary data transferred to the arithmetic logic unit (ALU) 210. An additional function of the instruction decoder 250 is to cause the AAU 215 to increment the PC 220 when required.

The random-access memory (RAM) 270 is further coupled to a RAM address register (AR) 260. The second data input of the arithmetic logic unit (ALU) 210 is coupled to the RAM 270 by a RAM/ALU link 280. The first data input of the ALU 210 is coupled to the accumulator register (ACC) 290. In a particular exemplary embodiment of the present invention, the RAM/ALU link 280 provides an eight-bit dedicated data path that carries data from the RAM 270 (that is, data from a read operation) to the ALU 210. Prior-art microcontrollers that use the MCS-51 instruction set typically employ a shared internal bus, which requires the RAM to drive data onto the bus, after which the data are stored in a temporary register. Implementing the RAM/ALU link 280 as a dedicated data path provides a significant improvement in the performance of the CPU pipeline architecture portion 200.

Those skilled in the art will recognize that the arrows in Fig. 2 indicate the directions of the data signal paths. It should also be understood that additional logic blocks (not shown in Fig. 2 or in the following figures) may be present and may be coupled to the illustrated blocks to provide the full capability of executing the MCS-51 instruction set. Those skilled in the art will appreciate that only the blocks necessary to practice the present invention are shown, to avoid obscuring the relevant elements.

Attention is now directed to Fig. 3, a first exemplary timing diagram 300 of instruction pipelining with one-byte instructions according to the present invention. The first exemplary timing diagram 300 comprises a first exemplary system clock waveform 310, an nth instruction activity diagram 320, an (n+1)th instruction activity diagram 330, and an (n+2)th instruction activity diagram 340. In Fig. 3 and in the timing diagrams of the following figures, vertical dashed lines separate the system clock time intervals. The vertical dashed lines coincide with the positive-going edge transitions of the system clock.

With continued reference to Fig. 3, during system clock time interval Tₙ the nth instruction undergoes a fetch operation. In the following system clock time interval Tₙ₊₁, the nth instruction is executed. Also during Tₙ₊₁, the (n+1)th instruction undergoes a fetch operation. During the following system clock time interval Tₙ₊₂, the nth instruction has completed execution, the (n+1)th instruction is executed, and the (n+2)th instruction undergoes a fetch operation. This concurrency between instruction fetch and instruction execution improves the overall computational performance of the CPU and is known to those skilled in the art as a two-stage pipeline.

The operating characteristics of the two-stage pipeline when executing a mix of one-byte and two-byte instructions are introduced with reference to Fig. 4, a second exemplary timing diagram 400 of instruction pipelining with one-byte and two-byte instructions according to the present invention. The second exemplary timing diagram 400 comprises a second exemplary system clock waveform 410, an nth instruction activity diagram 420, an (n+1)th two-byte instruction activity diagram 430, an (n+2)th two-byte instruction activity diagram 440, and an (n+3)th instruction activity diagram 450. The diagram shows that during system clock time interval Tₙ the nth instruction undergoes a fetch operation. In the following interval Tₙ₊₁, the nth instruction is executed. Also during Tₙ₊₁, the first instruction byte of the (n+1)th two-byte instruction undergoes a fetch operation. During the following interval Tₙ₊₂, the nth instruction has completed execution, and the second instruction byte of the (n+1)th two-byte instruction undergoes a fetch operation. During Tₙ₊₃, the (n+1)th two-byte instruction is executed, and the first instruction byte of the (n+2)th two-byte instruction undergoes a fetch operation. During Tₙ₊₄, the second instruction byte of the (n+2)th two-byte instruction undergoes a fetch operation. During Tₙ₊₅, the (n+2)th two-byte instruction is executed, and the (n+3)th instruction undergoes a fetch operation.
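The fetch and execute overlap walked through above can be reproduced with a small scheduling sketch. It assumes, as in the timing diagrams described, that one instruction byte is fetched per system clock and that each instruction executes in the single interval following the fetch of its last byte; the function name is hypothetical, and intervals are numbered relative to the first fetch interval.

```python
def two_stage_schedule(lengths):
    """Map each instruction to (fetch intervals, execute interval).

    Assumes one byte fetched per clock and execution in the interval
    immediately after the last byte of the instruction has been fetched.
    """
    rows, t = [], 0
    for length in lengths:
        fetch = list(range(t, t + length))
        t += length
        rows.append((fetch, t))
    return rows


# Instructions n, n+1 (two bytes), n+2 (two bytes), n+3; offsets are relative to T_n.
for i, (fetch, execute) in enumerate(two_stage_schedule([1, 2, 2, 1])):
    print(f"n+{i}: fetch {fetch}, execute {execute}")
```

For the mixed sequence, the (n+1)th instruction is fetched in intervals 1 and 2 and executed in interval 3, while the first byte of the (n+2)th instruction is fetched in that same interval 3, matching the description of Fig. 4.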

Attention is now directed to Fig. 5, a diagram of activity in the arithmetic logic unit (ALU) 210 (Fig. 2) when executing a single-cycle instruction according to an exemplary embodiment of the present invention. A single-cycle ALU operation diagram 500 comprises a single-cycle exemplary system clock waveform 510, a single-cycle total execution time activity diagram 520, a single-cycle fetch register operand activity diagram 530, a single-cycle execute ALU operation activity diagram 540, a single-cycle result write-back activity diagram 550, and a single-cycle fetch next instruction activity diagram 560. Several events occur within system clock time interval T₁, which corresponds to the total execution time of a single-cycle instruction. Specifically, the fetch next instruction operation spans the whole of T₁, whereas the fetch register operand and execute ALU operation activities are each active for only part of T₁. Further inspection of the diagram indicates that part of the execute ALU operation takes place at the same time as the fetch register operand operation. In addition, the result write-back operation occurs at the beginning of the next system clock time interval, T₂. The delay of the result write-back operation is explained below.

Attention is now directed to Fig. 6, a diagram of activity in the arithmetic logic unit (ALU) 210 when executing a two-cycle instruction according to an exemplary embodiment of the present invention. A two-cycle ALU operation diagram 600 comprises a two-cycle exemplary system clock waveform 610, a two-cycle total execution time activity diagram 620, a two-cycle fetch immediate operand activity diagram 630, a two-cycle execute ALU operation activity diagram 640, a two-cycle result write-back activity diagram 650, and a two-cycle fetch next instruction activity diagram 660. Several events occur across system clock time intervals T₁ and T₂, which together correspond to the total execution time of a two-cycle instruction. The fetch immediate operand operation is carried out during T₁ and completes at the rising clock edge of waveform 610 that separates T₁ from T₂. The execute ALU operation and fetch next instruction operations begin at the start of T₂. The execute ALU operation completes at the falling edge of waveform 610, near the middle of T₂. The result write-back operation begins at the rising edge of waveform 610 at the start of T₃. The fetch next instruction operation completes at the rising clock edge of waveform 610 that separates T₂ from T₃.

Attention is now directed to Fig. 7, a CPU address architecture block diagram 700 comprising the address arithmetic unit (AAU) 215, the program counter 220, an address buffer 730, a first multiplexer 735, a data pointer register 740, a second multiplexer 750, a third multiplexer 755, a stack pointer 770, a stack pointer increment/decrement unit 780, and an offset register 790. Lines indicate the data paths in the CPU address architecture block diagram 700, and arrows further indicate the direction of data flow.

The second multiplexer 750 is coupled to the program counter (PC) 220, to the data pointer register 740, and to the first data input of the address arithmetic unit (AAU) 215. The second multiplexer 750 selects either the address value held in the PC 220 or the address value held in the data pointer register 740 for operation by the AAU 215. The third multiplexer 755 is coupled to the accumulator register (ACC) 290, to a constant offset value 760, to the offset register 790, and to the second data input of the AAU 215. The third multiplexer 755 selects one of the address offset value held in the offset register 790, the address offset value held in the ACC 290, and the constant offset value 760 for operation by the AAU 215. In a particular exemplary embodiment, the constant offset value 760 holds the value one ("1"), so that the AAU 215 increments an instruction address value to point to the next address value.
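The selection performed by the two multiplexers ahead of the AAU can be sketched as a single function. This is a behavioral illustration only: the function name, the string-valued select arguments, and the example register contents are hypothetical, and offsets are treated as unsigned values (any sign extension a relative branch would need is omitted).

```python
MASK16 = 0xFFFF


def aau(pc: int, dptr: int, acc: int, offset_reg: int,
        base_sel: str, offset_sel: str) -> int:
    """One AAU operation: 16-bit add of a selected base and a selected offset."""
    # Second multiplexer 750: program counter or data pointer register.
    base = {"pc": pc, "dptr": dptr}[base_sel]
    # Third multiplexer 755: constant one, accumulator, or offset register.
    offset = {"const": 1, "acc": acc, "offset": offset_reg}[offset_sel]
    return (base + offset) & MASK16


next_pc = aau(pc=0x01FF, dptr=0x2000, acc=0x10, offset_reg=0x05,
              base_sel="pc", offset_sel="const")
indexed = aau(pc=0x01FF, dptr=0x2000, acc=0x10, offset_reg=0x05,
              base_sel="dptr", offset_sel="acc")
```

Selecting the PC and the constant offset gives the ordinary instruction address increment; selecting the data pointer and the accumulator gives an indexed address in the same single operation.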

The address arithmetic unit (AAU) 215 has the capability of a full adder operating on 16-bit words. The program counter (PC) 220, the address buffer 730, and the data pointer register 740 are each sixteen-bit registers. Prior-art microcontrollers using the MCS-51 instruction set typically use the 8-bit ALU to increment the data pointer register, which in the prior art is generally a 16-bit register. Multiple operations are therefore needed to perform the increment in the prior art. First, the low byte of the address held in the data pointer is loaded into the ALU, the increment of one is added, and the result is written back to the low byte of the data pointer. Next, the high byte of the address held in the data pointer is loaded into the ALU, the carry from the low-byte increment is added, and the result is written back to the high byte of the data pointer. The 16-bit arithmetic capability of the AAU 215 of the present invention allows the data pointer register 740 to be updated in a single operation. This single-operation update improves system operating speed and supports the instruction pipelining explained above.
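The difference between the two-step prior-art increment and the single 16-bit operation can be illustrated as follows. Both function names are hypothetical, and the operation counts returned are a simplification that counts adder passes only, not clock cycles.

```python
def increment_dptr_8bit(dptr: int) -> tuple[int, int]:
    """Increment a 16-bit pointer with 8-bit adds; return (result, ALU operations)."""
    low = (dptr & 0xFF) + 1                      # operation 1: low byte plus one
    carry = low >> 8
    high = ((dptr >> 8) & 0xFF) + carry          # operation 2: high byte plus carry
    return ((high & 0xFF) << 8) | (low & 0xFF), 2


def increment_dptr_16bit(dptr: int) -> tuple[int, int]:
    """Increment the same pointer with one 16-bit add; return (result, operations)."""
    return (dptr + 1) & 0xFFFF, 1


for value in (0x0000, 0x12FF, 0xFFFF):
    print(hex(value), increment_dptr_8bit(value), increment_dptr_16bit(value))
```

Both routes produce the same result for every input; the 16-bit adder simply reaches it in one pass instead of two.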

The program counter (PC) 220 is updated with each instruction execution. The instruction pointed to by the PC 220 is one instruction ahead of the instruction being executed. Keeping the address in the PC 220 ahead of the instruction being executed provides a means of maintaining the instruction pipeline. Those skilled in the art will appreciate that the PC 220 must be updated quickly enough to stay ahead of the current instruction. Because the present invention provides instruction execution as fast as a single system clock cycle, the PC 220 must also be capable of being updated within a single system clock cycle. Prior-art microcontrollers using the MCS-51 instruction set usually have a dedicated incrementer for the program counter, but use the 8-bit ALU to compute relative branch addresses by adding an offset to the program counter. For the reasons explained above in connection with the data pointer register 740, using an 8-bit ALU to compute the next program counter value for a program branch needs multiple clock cycles. The 16-bit arithmetic capability of the address arithmetic unit (AAU) 215, together with its connection to the offset register 790 and the accumulator register (ACC) 290 through the third multiplexer 755, constitutes an improvement over the prior art and allows updates of the PC 220 to keep pace with the instruction pipeline.

The address buffer 730 provides a means of handling interrupts and subroutine calls without interrupting the incrementing of the program counter (PC) 220. The address buffer 730 is coupled to the first multiplexer 735, which in turn is coupled to the PC 220 and to the data output of the address arithmetic unit (AAU) 215. The operation of the PC 220 and the address buffer 730, and the relationship between them, are explained in more detail below.

The stack pointer 770 references a portion of the random-access memory (RAM) 270 (Fig. 2) used as a memory stack, which provides access to variables that need frequent or fast access. The input of the stack pointer increment/decrement unit 780 is coupled to the output of the stack pointer 770. The output of the stack pointer increment/decrement unit 780 is coupled to the input of the stack pointer 770. In a particular exemplary embodiment, the stack pointer 770 is an eight-bit register. Prior-art microcontrollers operating with the MCS-51 instruction set use a single 8-bit ALU both to execute arithmetic and logic instructions and to increment/decrement the stack pointer. The pipeline architecture of the present invention does not leave the arithmetic logic unit (ALU) 210 enough time to increment/decrement the stack pointer. To provide increment and decrement operations for the stack pointer 770, the stack pointer increment/decrement unit 780 supplies a dedicated means of modifying the address pointed to by the stack pointer 770 without relying on the ALU 210, thereby providing another improvement over the prior art.

Purposes referring now to Fig. 8 A, Fig. 8 B and Fig. 8 C explaination programmable counter (PC) 220 and address buffer 730.With reference to figure 8A, according to the utilization of the address buffer of one exemplary embodiment of the present invention, it comprises buffer usage example system clock waveform 810A, present instruction tabulation 820A, programmable counter (PC) 220 contents list 830A and address buffer 730 contents list 840A term of execution of the instruction of routine in its explanation.In system clock interval T cycling time _nIn, to the reference shows of the present instruction tabulation 820A I1 that executing instruction.At system clock time interval T _nDuring this time, the address value A+1 that has the address of representing next instruction I2 in the programmable counter (PC) 220.Similarly, at system clock time interval T _nDuring this time, the address value A that has the address of representing present instruction I1 in the address buffer 730.

During system clock time interval Tₙ₊₁, the current instruction list 820A shows that instruction I2, which the program counter (PC) 220 pointed to during the preceding interval Tₙ, is now being executed. During Tₙ₊₁, the PC 220 holds address value A+2, the address of the next instruction I3, and the address buffer 730 holds the previous address value A+1. During execution of routine instructions, that is, instruction execution in the absence of software or hardware interrupts (the latter also known to those skilled in the art as hard calls), instruction execution and address incrementing continue in the manner described above. During execution of routine instructions, the PC 220 supplies the instruction address, and the address buffer 730 is not used to maintain the instruction pipeline.

Fig. 8B illustrates use of the address buffer during execution of an interrupt according to an exemplary embodiment of the present invention; it comprises a buffer usage exemplary system clock waveform 810B, a current instruction list 820B, a program counter (PC) 220 contents list 830B, an address buffer 730 contents list 840B, an interrupt detection event 850, and an activity summary 860B. The current instruction list 820B shows instruction I1 being executed during system clock time interval Tₙ. During Tₙ, the PC 220 holds address value A+1, the address of instruction I2. Likewise, during Tₙ, the address buffer 730 holds address value A, the address of the current instruction I1. Instruction I2 is the next instruction in the sequence of instructions awaiting execution in the absence of an interrupt event (that is, during normal program execution).

At the rising edge of the buffer usage exemplary system clock waveform 810B that corresponds to the end of system clock time interval Tₙ, the interrupt detection event 850 occurs, indicating the start of a hardware (hard call) interrupt. At the same rising edge, the last value of the program counter (PC) 220 is passed to the address buffer 730, so that during Tₙ₊₁ the address buffer 730 holds address value A+1, the address of instruction I2. During Tₙ₊₁, as shown in the current instruction list 820B, instruction H1, representing the first cycle of the hard call instruction, is executed. This first hard call instruction differs from the instruction I2 that would have executed in the absence of the interrupt detection event 850. The activity summary 860B gives additional detail on what happens in the CPU during Tₙ₊₁: the first address byte of the interrupt subroutine is loaded.

Attention is now directed to further aspects of system clock time interval Tₙ₊₁: the program counter (PC) 220 holds address A+2, the address of instruction I3 (which normally follows instruction I2). As shown in the address buffer 730 contents list 840B, the address buffer 730 holds address A+1. The address buffer 730 thus retains the address of instruction I2, which is needed to resume normal program execution when the interrupt event ends.
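The relationship between the program counter and the address buffer described so far can be traced with a short behavioral sketch. It assumes that the buffer takes the PC's last value at each rising clock edge during routine execution and holds its contents once an interrupt has been detected; the function name, the `detect_after` parameter, and the base address 0x100 standing in for A are hypothetical, and the later load of the interrupt routine address into the PC is not modeled.

```python
def trace(cycles: int, detect_after=None, a: int = 0x100):
    """Return (pc, buffer) per interval, starting at T_n.

    detect_after is the index of the interval at whose closing edge an
    interrupt is detected; None means routine execution throughout.
    """
    pc, buf, rows, holding = a + 1, a, [], False
    for t in range(cycles):
        rows.append((pc, buf))
        if not holding:
            buf = pc            # buffer takes the PC's last value at the clock edge
        if detect_after is not None and t == detect_after:
            holding = True      # from here on the buffer keeps the return address
        pc += 1                 # the PC keeps incrementing every cycle
    return rows


routine = trace(3)                       # buffer trails the PC by one address
interrupted = trace(3, detect_after=0)   # interrupt detected at the end of T_n
```

In the routine trace the buffer always trails the PC by one address; in the interrupted trace it stays at A+1 while the PC continues to advance.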

During system clock time interval Tₙ₊₂, which follows Tₙ₊₁, instruction H2, representing the second cycle of the hard call instruction, is executed, as shown in the current instruction list 820B. The program counter (PC) 220 continues to be incremented by the address arithmetic unit (AAU) 215 during each system clock cycle; it therefore holds address A+3 during Tₙ₊₂. The address buffer 730, however, retains address A+1, which is needed to resume normal program execution when the interrupt event ends. The activity summary 860B gives additional detail on what happens in the CPU during Tₙ₊₂: the second address byte of the interrupt subroutine is loaded, and the stack pointer 770 is incremented:

SP←SP+1

During system clock time interval T_{n+3}, which follows system clock time interval T_{n+2}, instruction H3, representing the third cycle of the hardware instruction, is executed, as shown in current instruction list 820B. Program counter (PC) 220 continues to be incremented by address arithmetic unit (AAU) 215 during each system clock cycle; it therefore contains address A+4 during system clock time interval T_{n+3}. Address buffer 730, however, retains address A+1, which is required to resume normal program execution when the interrupt event ends. Actions summary 860B provides additional detail on the events occurring in the CPU during system clock time interval T_{n+3}: in particular, stack pointer 770 is incremented:

SP←SP+1

and the low-byte portion of the address buffer is loaded into the RAM location currently referenced (pointed to) by the stack pointer, that is, the location referenced before the increment:

(SP)←BUFFER:7-0

where the notation (SP) denotes the RAM address referenced by stack pointer 770, and BUFFER:7-0 denotes the eight least significant bits (the low-byte portion) of address buffer 730, which contains address A+1. It should be noted that during system clock time interval T_{n+3}, the stack pointer increment and the push of the buffer into RAM occur in parallel; that is, the increment of SP does not affect the address used by the push.

During system clock time interval T_{n+4}, which follows system clock time interval T_{n+3}, instruction H4, representing the fourth cycle of the hardware instruction, is executed, as shown in current instruction list 820B. Program counter (PC) 220 now contains address B, representing the address of the first instruction of the interrupt service routine. Address buffer 730 retains address A+1, which is required to resume normal program execution when the interrupt event ends. Actions summary 860B provides additional detail on the events occurring in the CPU during system clock time interval T_{n+4}: a jump to the new program location (associated with address B) occurs, and the high-byte portion of the address buffer is loaded into the RAM location currently referenced (pointed to) by stack pointer 770:

(SP)←BUFFER:15-8

where the notation (SP) denotes the RAM address referenced by stack pointer 770, and BUFFER:15-8 denotes the eight most significant bits (the high-byte portion) of address buffer 730, which contains address A+1. After the high-byte load operation, both the low-byte and high-byte portions of address A+1 have been loaded into stack memory and are available to supply address A+1 to the CPU when it is needed on return from interrupt execution.
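The four hardware cycles H1 to H4 described above can be traced with a short behavioural sketch. It is illustrative only and is not taken from the specification: the function name `interrupt_entry`, the use of a Python dict as a stand-in for the stack RAM, and the example addresses are assumptions, and the 8-bit wrap-around of the stack pointer is ignored for brevity.

```python
def interrupt_entry(a, isr_addr, sp, ram):
    """Trace cycles H1..H4 of interrupt entry (behavioural sketch).

    a        -- address A of instruction I1, executing during T_n
    isr_addr -- address B of the interrupt service routine (assumed known)
    sp       -- stack pointer value before the interrupt
    ram      -- dict standing in for the stack RAM (hypothetical model)
    Returns a list of (cycle, pc, buffer, sp) tuples, where sp is the
    stack pointer value at the end of the cycle, and the final sp.
    """
    buffer = a + 1            # latched from the PC at the edge ending T_n
    pc = a + 2                # the PC keeps incrementing during T_{n+1}
    trace = [("H1", pc, buffer, sp)]

    # T_{n+2} (H2): second ISR address byte is loaded, SP <- SP + 1
    pc += 1
    sp += 1
    trace.append(("H2", pc, buffer, sp))

    # T_{n+3} (H3): low byte is pushed at the pre-increment SP while
    # SP increments in parallel
    pc += 1
    ram[sp] = buffer & 0xFF
    sp += 1
    trace.append(("H3", pc, buffer, sp))

    # T_{n+4} (H4): jump to B, high byte is pushed at the current SP
    pc = isr_addr
    ram[sp] = (buffer >> 8) & 0xFF
    trace.append(("H4", pc, buffer, sp))
    return trace, sp


stack_ram = {}
cycles, final_sp = interrupt_entry(0x1233, 0x0003, 0x07, stack_ram)
```

In this sketch the address buffer value never changes during the four cycles, while the program counter advances to A+2, A+3, and A+4 before being replaced by B, which mirrors contents lists 830B and 840B.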

Referring to Fig. 8C, which illustrates use of the address buffer during execution of a software subroutine call according to an exemplary embodiment of the present invention, the figure includes buffer usage example system clock waveform 810C, current instruction list 820C, program counter (PC) 220 contents list 830C, address buffer 730 contents list 840C, and actions summary 860C. In system clock time interval T_n, reference to current instruction list 820C shows that instruction I1 is being executed. During system clock time interval T_n, program counter (PC) 220 holds address value A+1, representing the address of call instruction C1. Similarly, during system clock time interval T_n, address buffer 730 holds address value A, representing the address of current instruction I1.

At the rising edge of buffer usage example system clock waveform 810C corresponding to the end of system clock time interval T_n, the last value of program counter (PC) 220 is passed to address buffer 730, so that during system clock time interval T_{n+1}, address buffer 730 contains the address value A+1, representing the address of instruction C1. During system clock time interval T_{n+1}, as shown in current instruction list 820C, instruction C1, representing the first cycle of the call instruction, is executed. Actions summary 860C provides additional detail on the events occurring in the CPU during system clock time interval T_{n+1}: the first address byte of the software subroutine is loaded.

Attention is now directed to additional aspects of system clock time interval T_{n+1}: program counter (PC) 220 contains address A+2, representing the address of the first address byte of the called subroutine (which normally follows instruction C1). As shown in address buffer 730 contents list 840C, address buffer 730 contains address A+1. Address buffer 730 therefore retains the address of current instruction C1.

During system clock time interval T_{n+2}, which follows system clock time interval T_{n+1}, instruction C2, representing the second cycle of the call instruction, is executed, as shown in current instruction list 820C. Program counter (PC) 220 continues to be incremented by address arithmetic unit (AAU) 215 during each system clock cycle; it therefore contains address A+3 during system clock time interval T_{n+2}. Address buffer 730, however, retains address A+1. Actions summary 860C provides additional detail on the events occurring in the CPU during system clock time interval T_{n+2}: the second address byte of the software subroutine is loaded, and stack pointer 770 is incremented:

SP←SP+1

At the rising edge of system clock waveform 810C corresponding to the end of system clock time interval T_{n+2}, the incremented value of program counter (PC) 220 from address arithmetic unit (AAU) 215 is passed to address buffer 730, so that during system clock time interval T_{n+3}, address buffer 730 contains the address value A+4, representing the address of instruction I2. I2 is the instruction following C1 that should be executed on return from the subroutine. During system clock time interval T_{n+3}, which follows system clock time interval T_{n+2}, instruction C3, representing the third cycle of the call instruction, is executed, as shown in current instruction list 820C. Program counter (PC) 220 continues to be incremented by address arithmetic unit (AAU) 215 during each system clock cycle; it therefore contains address A+4 during system clock time interval T_{n+3}. In addition, address buffer 730 contains address A+4, which is required to resume normal program execution when the subroutine ends. Actions summary 860C provides additional detail on the events occurring in the CPU during system clock time interval T_{n+3}: in particular, stack pointer 770 is incremented:

SP←SP+1

and the low-byte portion of the address buffer is loaded into the RAM location currently referenced (pointed to) by the stack pointer, that is, the location referenced before the increment:

(SP)←BUFFER:7-0

where the notation (SP) denotes the RAM address referenced by stack pointer 770, and BUFFER:7-0 denotes the eight least significant bits (the low-byte portion) of address buffer 730, which contains address A+4. It should be noted that during system clock time interval T_{n+3}, the stack pointer increment and the push of the buffer into RAM occur in parallel; that is, the increment of SP does not affect the address used by the push.

During system clock time interval T_{n+4}, which follows system clock time interval T_{n+3}, instruction C4, representing the fourth cycle of the call instruction, is executed, as shown in current instruction list 820C. Program counter (PC) 220 now contains address B, representing the address of the first instruction of the software subroutine. Address buffer 730 retains address A+4, which is required to resume normal program execution when the subroutine ends. Actions summary 860C provides additional detail on the events occurring in the CPU during system clock time interval T_{n+4}: a jump to the new program location (associated with address B) occurs, and the high-byte portion of the address buffer is loaded into the RAM location currently referenced (pointed to) by stack pointer 770:

(SP)←BUFFER:15-8

where the notation (SP) denotes the RAM address referenced by stack pointer 770, and BUFFER:15-8 denotes the eight most significant bits (the high-byte portion) of address buffer 730, which contains address A+4. After the high-byte load operation, both the low-byte and high-byte portions of address A+4 have been loaded into stack memory and are available to supply address A+4 to the CPU when it is needed on return from execution of the subroutine.

From the foregoing explanation of Fig. 8A, Fig. 8B, and Fig. 8C, the relationship between program counter (PC) 220 and address buffer 730 becomes apparent. Specifically, during normal program execution, program counter (PC) 220 points to the next instruction address, address buffer 730 points to the current address value, and program counter (PC) 220 is incremented during each system clock cycle. Although address buffer 730 receives the current value of program counter (PC) 220 through first multiplexer 735, it is updated only when execution of an instruction completes. Program counter (PC) 220 is updated continuously, and an update may occur in the middle of an instruction. Therefore, during part of an instruction's execution cycle, program counter (PC) 220 may point to an address different from the address pointed to by address buffer 730. In this way, the incrementing of program counter (PC) 220 can continue at a rate that matches the execution speed of the instruction pipeline. If an interrupt occurs, program counter (PC) 220 continues to be updated, while the return address from the interrupt is captured by address buffer 730. The decision to take the interrupt is therefore made in parallel with the incrementing of program counter (PC) 220. This represents an improvement over the prior art, which typically requires additional logic to stop the program counter from incrementing and to decrement it in order to recover the return address required by the interrupt sequence.
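The update rule summarized above (the program counter advances every cycle, the address buffer is latched only when an instruction completes) can be illustrated with the following sketch. It is a simplified model rather than the disclosed circuit: the function name `run` is hypothetical, and the sketch assumes that each instruction takes one clock cycle per byte fetched, which the specification does not state in general.

```python
def run(start, lengths):
    """Return (instruction index, cycle, pc, buffer) for each clock cycle.

    start   -- address of the first instruction
    lengths -- cycles (assumed equal to bytes) taken by each instruction;
               these values are illustrative assumptions
    """
    buffer = start        # address of the instruction now executing
    pc = start + 1        # the PC already points one location ahead
    rows = []
    for index, n in enumerate(lengths):
        for cycle in range(n):
            rows.append((index, cycle, pc, buffer))
            if cycle == n - 1:
                buffer = pc   # latch only when the instruction completes
            pc += 1           # the PC increments every cycle regardless
    return rows


# One single-byte instruction, one three-byte instruction, one single-byte
rows = run(0x100, [1, 3, 1])
```

During the three-byte instruction the program counter moves through 0x102, 0x103, and 0x104 while the buffer stays at 0x101, so a return address is always available from the buffer without stopping or decrementing the program counter.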

Attention is now directed to Fig. 9, an exemplary instruction pre-decode and RAM addressing block diagram 900, which includes accumulator register (ACC) 290 coupled to a first input of arithmetic logic unit (ALU) 210. Multiplexer 930 selects one of RAM output path 940A and alternate multiplexer input 940B for coupling to a second input of arithmetic logic unit (ALU) 210. The output of arithmetic logic unit (ALU) 210 is coupled to data register 950. Data register 950 is further coupled to random access memory (RAM) 270. The output of random access memory (RAM) 270 is coupled to RAM output path 940A, to RAM read address register (RAR) 960A, and to RAM write address register (WAR) 960B. RAM read address register (RAR) 960A is coupled to random access memory (RAM) 270 and to RAM write address register (WAR) 960B, which is in turn coupled to random access memory (RAM) 270. Program status word (PSW) register 970 and its input 990 are coupled to RAR multiplexer 935, which is in turn coupled to RAM read address register (RAR) 960A. The output of read-only memory (ROM) 230 is coupled to instruction register (IR) 240. Instruction register (IR) 240 is further coupled to instruction decoder 250. Address pre-decode path 980 couples the output of read-only memory (ROM) 230 to RAM read address register (RAR) 960A.

The combination of RAM output path 940A, multiplexer 930, and arithmetic logic unit (ALU) 210 represents an improvement over the prior art. Those skilled in the art will appreciate that a temporary storage register is usually built between multiplexer 930 and arithmetic logic unit (ALU) 210 to support an internal bus architecture. The prior-art process of passing data from random access memory to the ALU therefore requires the intermediate step of storing the data in the temporary storage register before transferring it to the ALU. This intermediate step adds at least one system clock cycle of overhead to the processing time. RAM output path 940A of the present invention provides a means of transferring data directly from random access memory (RAM) 270 to arithmetic logic unit (ALU) 210, so that the data can be processed within a single system clock cycle and the result captured by data register 950 within the same system clock cycle.

A further improvement over the prior art is provided by address pre-decode path 980, which is now explained. Certain instructions, specifically register operations, must execute quickly in a minimum number of clock cycles to achieve the speed and performance goals described above. For example, the present invention uses address pre-decode path 980 to achieve fast execution of the following MCS-51 instructions:

Instruction	Operation	Opcode
INC Rn	Rn←Rn+1	00001rrr
INC @Ri	(Ri)←(Ri)+1	0000011i
MOV @Ri, ACC	(Ri)←ACC	1111011i

where instruction INC Rn is a register increment and variable n may take the values 0-7. The portion of the opcode denoted rrr is the binary encoding of variable n. Instruction INC @Ri is an indirect register increment, in which variable i takes the value 0 or 1. The MOV @Ri, ACC instruction moves the accumulator contents to the address pointed to by register Ri, with variable i again taking the value 0 or 1.

All instructions read from read-only memory (ROM) 230 are passed to RAM read address register (RAR) 960A over address pre-decode path 980, which begins speculative decoding of the instruction based on its four least significant bits. RAM read address register (RAR) 960A contains a small amount of decode logic, created by methods well known to those skilled in the art, that examines bits 3:0 of the opcode. If bit 3 is "1", the decode logic assumes that an increment operation will be performed on register Rn, and bits 2:0 specify the register. If bits 3:1 of the opcode equal binary 011, a register-indirect increment is assumed, and bit 0 specifies the register.
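The speculative decode rule just described can be written as a small function. This is a software sketch of the bit tests only, not the disclosed decode logic; the function name `predecode` and the labels "direct" and "indirect" are assumptions introduced for illustration.

```python
def predecode(opcode):
    """Speculatively classify an opcode from its four least significant bits.

    Returns ("direct", n) when bit 3 is set (register Rn assumed),
    ("indirect", i) when bits 3:1 equal binary 011 (register @Ri assumed),
    or None when neither pattern matches.
    """
    low = opcode & 0x0F
    if low & 0x08:                 # bit 3 is 1: bits 2:0 select Rn
        return ("direct", low & 0x07)
    if (low >> 1) == 0b011:        # bits 3:1 are 011: bit 0 selects Ri
        return ("indirect", low & 0x01)
    return None


inc_r5 = predecode(0b00001101)       # INC R5
inc_at_r0 = predecode(0b00000110)    # INC @R0
mov_at_r1 = predecode(0b11110111)    # MOV @R1, ACC
```

Because only the low four bits are examined, the classification is available before the full opcode has passed through the instruction decoder, which is the point of the pre-decode path.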

Each opcode is evaluated speculatively according to the method described above, and RAM read address register (RAR) 960A is loaded accordingly. Some opcodes, however, do not require an immediate read from a register. To save power, a means is needed that allows only the necessary register operations to read the RAM using the pre-decoded address. An additional pre-decode operation provided in instruction register (IR) 240 determines whether the opcode actually involves a register read operation. Instruction register (IR) 240 contains additional logic that distinguishes RAM read operations from RAM write operations. This additional logic prevents RAM read address register (RAR) 960A from initiating a read of random access memory (RAM) 270 unless the opcode actually requires a read operation. Avoiding unnecessary read operations prevents the energy-wasting step of powering up the sense amplifiers and associated circuitry in random access memory (RAM) 270.

As a further consideration, the 8051 microcontroller architecture provides four register banks, each having eight registers. A means is needed to provide RAM address register (AR) 260 (Fig. 2) with knowledge of which of the four possible register banks contains the register targeted by the instruction. Register bank information is supplied to RAM read address register (RAR) 960A by program status word (PSW) register 970. Specifically, bits 4:3 of the program status word stored in program status word (PSW) register 970 are combined with bits 3:0 from the opcode to provide RAM read address register (RAR) 960A with the target address in random access memory (RAM) 270. To prevent a pipeline stall when a register read using address pre-decode path 980 follows a write to program status word (PSW) register 970, RAR multiplexer 935 is provided to forward the new PSW value from PSW input 990 to address pre-decode path 980, thereby bypassing the old value in program status word (PSW) register 970.

In an exemplary embodiment of the present invention, the registers shown in Fig. 9 are implemented with master-slave positive-edge-triggered flip-flops, specifically instruction register 240, instruction decoder 250, accumulator register (ACC) 290, data register 950, RAM read address register (RAR) 960A, RAM write address register (WAR) 960B, and program status word (PSW) register 970. Those skilled in the art will appreciate that this method of register implementation may also be used in other circuit blocks not shown in the figures.
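The bank selection and PSW forwarding described above can be sketched as follows. The sketch assumes the standard MCS-51 layout, in which register bank b occupies RAM addresses 8b to 8b+7; the function name `banked_read_address` and the `psw_write` parameter (modelling a value arriving on PSW input 990 in the same cycle) are hypothetical.

```python
def banked_read_address(reg, psw_reg, psw_write=None):
    """Form a RAM read address from a register number and the PSW bank bits.

    reg       -- register number 0-7 taken from the pre-decoded opcode bits
    psw_reg   -- value currently held in the PSW register
    psw_write -- new PSW value being written this cycle, or None
    """
    # Model of the RAR multiplexer: prefer the incoming PSW value so the
    # stale register contents are bypassed and no stall is needed.
    psw = psw_reg if psw_write is None else psw_write
    bank = (psw >> 3) & 0x03       # PSW bits 4:3 select one of four banks
    return (bank << 3) | (reg & 0x07)


addr_bank3 = banked_read_address(3, 0x18)                  # R3 in bank 3
addr_forwarded = banked_read_address(0, 0x00, psw_write=0x08)
```

In the second call the PSW register still holds bank 0, but the forwarded value selects bank 1, so the read address is formed from the new bank without waiting for the PSW register to update.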

Referring now to Fig. 10, a register increment timing diagram 1000 according to an exemplary embodiment of the present invention includes register increment example system clock waveform 1010, register increment example current instruction (INSTR) list 1020, register increment example program counter (PC) 220 contents list 1030, RAM read address register (RAR) 960A contents graph 1040, RAM write address register (WAR) 960B contents graph 1050, RAM data output (DOUT) contents graph 1060, RAM data input (DIN) contents graph 1070, arithmetic logic unit (ALU) 210 contents list 1080, and instruction example summary 1090. In system clock time interval T_n, the system executes a general instruction (indicated by an asterisk in instruction example summary 1090); the general instruction is associated with address A-1 and is denoted I-1 in register increment example current instruction (INSTR) list 1020. Reference to register increment example program counter (PC) 220 contents list 1030 shows that during system clock time interval T_n, program counter (PC) 220 holds address A0, which is associated with the first register direct increment instruction (INC R0), consistent with the operation of the instruction pipeline described above. For purposes of the example, the initial value of register R0 is assumed to be 2.

In system clock time interval T_{n+1}, the first register increment instruction I0 is executed. Program counter (PC) 220 contains address A1 of the next instruction (in this example, also INC R0). RAM read address register (RAR) 960A contains 0, as shown in RAM read address register (RAR) 960A contents graph 1040. The value 0 is the target register address; it is loaded into RAM read address register (RAR) 960A over address pre-decode path 980, thereby avoiding the delay of passing through instruction decoder 250. In the same system clock time interval T_{n+1}, the data at the target register address (the value 2) is available at the output of random access memory (RAM) 270, as indicated by RAM data output (DOUT) contents graph 1060. Before the end of system clock time interval T_{n+1}, the value is incremented by arithmetic logic unit (ALU) 210 to give the value 3, as indicated by arithmetic logic unit (ALU) 210 contents list 1080.

During system clock time interval T_{n+2}, the ALU output (the value 3) is transferred to data register 950, as indicated by RAM data input (DIN) contents graph 1070. RAM write address register (WAR) 960B contains address value 0, which was loaded so that the result of executing the first register direct increment instruction (INC R0) can be written back. The second register direct increment instruction I+1 is executed, as shown in register increment example current instruction (INSTR) list 1020. RAM read address register (RAR) 960A contains 0, as shown in RAM read address register (RAR) 960A contents graph 1040. Because RAM read address register (RAR) 960A and RAM write address register (WAR) 960B point to the same address (0), a data pass-through occurs in random access memory (RAM) 270, so that the value 3 is forwarded to the RAM output with minimal delay, as shown in RAM data output (DOUT) contents graph 1060. Arithmetic logic unit (ALU) 210 increments the value 3 to the value 4, as shown in arithmetic logic unit (ALU) 210 contents list 1080, and the result is available before the end of system clock time interval T_{n+2}. Two register direct increment operations have thus been completed within a span of two system clock cycles. As discussed above, the write-back of the value 4 is completed in the subsequent system clock time interval T_{n+3} (not shown).
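The back-to-back increment sequence of Fig. 10 can be modelled behaviourally as a RAM with a one-cycle delayed write and a pass-through when the read and write addresses match. The class `DelayedWriteRam`, the helper `inc_register`, and the choice to commit the pending write at the end of each cycle are assumptions made for this sketch; it is not a description of the actual RAM circuit.

```python
class DelayedWriteRam:
    """RAM model with a pending write (address and data held for one cycle)."""

    def __init__(self, size=256):
        self.cells = [0] * size
        self.pending = None          # (address, value) awaiting write-back

    def read(self, read_addr):
        # Pass-through: if the pending write targets the same address,
        # forward its data instead of the stale cell contents.
        if self.pending is not None and self.pending[0] == read_addr:
            return self.pending[1]
        return self.cells[read_addr]

    def clock(self, new_write):
        # Commit the previously latched write, then latch the new one.
        if self.pending is not None:
            address, value = self.pending
            self.cells[address] = value
        self.pending = new_write


def inc_register(ram, address):
    """Model one cycle of INC Rn: read, increment (8-bit), schedule write."""
    result = (ram.read(address) + 1) & 0xFF
    ram.clock((address, result))
    return result


ram = DelayedWriteRam()
ram.cells[0] = 2                      # initial value of R0, as in Fig. 10
first = inc_register(ram, 0)          # T_{n+1}: reads 2, produces 3
second = inc_register(ram, 0)         # T_{n+2}: forwarded 3, produces 4
ram.clock(None)                       # T_{n+3}: write-back of 4 completes
```

The second increment obtains the value 3 through the pass-through rather than from the RAM cell, which is why both increments finish in two cycles even though each write-back lags its read by one cycle.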

In the foregoing specification, the present invention has been described with reference to specific embodiments thereof. Those skilled in the art will appreciate, however, that various modifications and changes can be made to the invention without departing from its broader spirit and scope as set forth in the appended claims. For example, the improvements constituted by the pipeline implementation, the dedicated stack pointer increment/decrement unit, and the use of a single 16-bit ALU in combination to support the address buffer, program counter, and data pointer are applicable to a variety of microprocessors and microcontrollers, including those that use instruction sets other than the MCS-51 instruction set. The specification and drawings are therefore to be regarded in an illustrative rather than a restrictive sense.

Claims

1. An architecture implementing an instruction pipeline to execute instructions in a central processing unit (CPU), the architecture comprising:

an address arithmetic unit (AAU) having a first data input, a second data input, and a data output;

an arithmetic logic unit (ALU) having a first data input, a second data input, and a data output;

a program counter (PC) register coupled to the data output of the address arithmetic unit (AAU);

a read-only memory (ROM) coupled to the program counter, the read-only memory further coupled to an instruction register and to an instruction decoder, the instruction decoder further coupled to the first data input of the arithmetic logic unit; and

a random access memory (RAM) coupled to the instruction decoder, the random access memory further coupled to the output of the arithmetic logic unit (ALU) and to a RAM address register.

2. The architecture of claim 1, wherein the instruction pipeline is a two-stage pipeline.

3. The architecture of claim 2, wherein the address arithmetic unit (AAU) is capable of operating on sixteen-bit numbers.

4. The architecture of claim 3, wherein the CPU is configured to execute the MCS-51 microcontroller instruction set.

5. An architecture implementing an instruction pipeline to execute instructions in a central processing unit (CPU), the architecture comprising:

a data pointer register coupled to the data output of the address arithmetic unit (AAU);

an address buffer register coupled to the data output of the address arithmetic unit (AAU);

a multiplexer coupled to the first data input of the address arithmetic unit, the multiplexer configured to couple one of the output of the program counter (PC) register and the output of the data pointer register to the first data input of the address arithmetic unit (AAU);

a stack pointer having an input and an output; and

a stack pointer increment/decrement unit having an input coupled to the output of the stack pointer, the stack pointer increment/decrement unit further having an output coupled to the input of the stack pointer, the stack pointer increment/decrement unit further configured to increment and decrement the stack pointer in response to push and pop operations, respectively.

6. The architecture of claim 5, wherein the instruction pipeline is a two-stage pipeline.

7. The architecture of claim 5, wherein the address arithmetic unit (AAU) is capable of operating on sixteen-bit numbers.

8. The architecture of claim 7, wherein each of the program counter (PC) register, the data pointer register, and the address buffer register is a sixteen-bit register.

9. The architecture of claim 5, wherein the stack pointer is an eight-bit register.

10. The architecture of claim 5, wherein the CPU is configured to execute the MCS-51 microcontroller instruction set.

11. A method of implementing an instruction pipeline in a central processing unit (CPU), the method comprising:

changing the value of a stack pointer with a dedicated increment/decrement unit during execution of a program instruction;

incrementing a program counter register to point to a next instruction address during execution of a current instruction;

storing a current instruction address in an address buffer when execution of a non-interrupt instruction ends; and

allowing the program counter register to increment during execution of an interrupt while an interrupt return address is maintained in the address buffer during execution of the interrupt.

12. The method of claim 11, further comprising: during execution of a non-interrupt instruction, performing one of fetching a single-byte instruction and fetching a first byte of a multi-byte instruction.

13. The method of claim 11, further comprising: sharing a sixteen-bit address arithmetic unit (AAU) among the program counter, the address buffer, and a data pointer.

14. The method of claim 11, further comprising: when an opcode is fetched, providing an advance pre-decode of one of a register-direct and a register-indirect random access memory (RAM) address.

15. The method of claim 11, further comprising: performing a read operation and a write operation on a random access memory (RAM) simultaneously during an instruction cycle.

16. The method of claim 11, further comprising: performing a read operation on a random access memory (RAM) during an instruction cycle and delaying a write operation to the random access memory (RAM) until the next instruction cycle.

17. The method of claim 11, further comprising: passing data through a random access memory (RAM) when a read operation and a write operation on the random access memory (RAM) target the same address location in the random access memory (RAM).

18. The method of claim 11, further comprising: providing a path from an output of a random access memory (RAM) to a data arithmetic logic unit (ALU), the data path transferring data from the random access memory (RAM) to the arithmetic logic unit (ALU) within a single system clock time interval.

19. An architecture implementing an instruction pipeline to execute instructions in a central processing unit (CPU), the architecture comprising:

a data arithmetic logic unit (ALU) having a first data input, a second data input, and a data output;

a data register coupled to the data output of the arithmetic logic unit and to a random access memory (RAM);

an accumulator coupled to the first data input of the data arithmetic logic unit (ALU);

a RAM output path coupling an output of the random access memory to the second data input of the data arithmetic logic unit (ALU);

a RAM write address register coupled to the output of the random access memory (RAM) and to a write address input of the random access memory (RAM);

a RAM read address register coupled to a read address input of the random access memory (RAM), the RAM read address register further coupled to the output of the random access memory (RAM) and to the RAM write address register;

a read-only memory coupled to an instruction register, the instruction register further coupled to an instruction decoder;

an address pre-decode path coupling the read-only memory to the RAM read address register; and

a program status word (PSW) register coupled to the RAM read address register.

20. The architecture of claim 19, further comprising a PSW forwarding path coupling an input of the program status word (PSW) register to the RAM read address register.

21. The architecture of claim 20, wherein the data arithmetic logic unit is capable of operating on eight-bit data.

22. The architecture of claim 21, wherein the CPU is configured to execute the MCS-51 microcontroller instruction set.

23. The architecture of claim 22, wherein the instruction pipeline is a two-stage pipeline.

24. An architecture implementing an instruction pipeline to execute instructions in a central processing unit (CPU), the architecture comprising:

address arithmetic unit (AAU) means for performing an arithmetic operation on a first data input and a second data input;

program counter (PC) means for storing a program counter (PC) address;

data pointer means for storing a data address;

address buffer means for buffering an instruction address;

multiplexer means for coupling one of the program counter (PC) means and the data pointer register means to the ALU means;

stack pointer means for storing a stack address; and

stack pointer increment/decrement means for incrementing and decrementing the stack pointer in response to push and pop operations, respectively.

25. The architecture of claim 24, wherein the address arithmetic unit (AAU) means is capable of operating on sixteen-bit numbers.

26. The architecture of claim 24, wherein each of the program counter (PC) means, the data pointer means, and the address buffer means is for storing a sixteen-bit binary number.

27. The architecture of claim 24, wherein the stack pointer means stores an eight-bit byte.

28. The architecture of claim 24, wherein the CPU is configured to execute the MCS-51 microcontroller instruction set.

29. A method of implementing an instruction pipeline in a central processing unit (CPU), the method comprising:

replacing an internal bus with a plurality of dedicated data path couplings, the replacing of the internal bus further consisting of the following steps:

allowing the program counter register to increment during execution of an interrupt while an interrupt return address is maintained in the address buffer during execution of the interrupt;

sharing a sixteen-bit address arithmetic unit (AAU) among the program counter, the address buffer, and a data pointer;

passing data through a random access memory (RAM) when a read operation and a write operation on the random access memory (RAM) target the same address location in the random access memory (RAM); and

providing a path from an output of a random access memory (RAM) to a data arithmetic logic unit (ALU), the data path transferring data from the random access memory (RAM) to the arithmetic logic unit (ALU) within a single system clock time interval.