CN1851677A - Embedded processor system and its data operating method - Google Patents

Embedded processor system and its data operating method

Info

Publication number
CN1851677A
CN1851677A · CNA2005101018520A · CN200510101852A
Authority
CN
China
Prior art keywords
write
buffer
data
cache
replacement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2005101018520A
Other languages
Chinese (zh)
Other versions
CN100419715C (en)
Inventor
董杰明
夏晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CNB2005101018520A priority Critical patent/CN100419715C/en
Publication of CN1851677A publication Critical patent/CN1851677A/en
Application granted granted Critical
Publication of CN100419715C publication Critical patent/CN100419715C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embedded processor system includes a processor that executes instructions and read/write operations; a cache connected between the processor and main memory to provide high-speed data access; a general write buffer connected between the processor and main memory to store the processor's cacheable write data; and a replacement write buffer connected between the cache and main memory that stores dirty data replaced out of the cache and, on a hit, exchanges data blocks with the cache. The present invention uses a split write buffer to realize the function of a victim cache in an embedded processor and to raise the cache hit ratio, thereby improving the processor's read/write performance.

Description

Embedded processor system and data operating method thereof
Technical field
The present invention relates to digital processing systems and, more particularly, to an embedded processor system and a data operating method thereof.
Background technology
In existing embedded processor systems, when the CPU performs a write operation to main memory, it first writes the data into a write buffer. Because the write buffer's access speed is very high, this improves the CPU's write speed; the write buffer then writes the data into the corresponding location of the slower main memory at a suitable time.
In addition, a cache memory (Cache), for example a von Neumann-structure cache as shown in Figure 1, can be embedded between the processor and main memory of the embedded processor system. This improves the processor's processing power, further reduces CPU wait time, lowers the power consumed by processor peripherals, and allows the processor to complete most data and instruction accesses to main memory within a single cycle.
The cache is small relative to main memory and sits between the processor and the slower main memory; it holds copies of the main-memory contents the processor is currently using. Data is exchanged between the cache and main memory in units of blocks. When the CPU reads data or an instruction, the item read is also saved in the cache. Because of the spatial and temporal locality of programs, when the CPU needs the same or nearby data again, it can obtain it from the corresponding cache block. Since the cache is much faster than main memory, overall system performance is greatly improved.
Caches commonly used in processor systems come in two kinds: Harvard-structure caches and von Neumann-structure caches. A Harvard-structure cache stores instructions and data separately, in an instruction cache and a data cache, so instruction replacement cannot cause data about to be read or written to be evicted; that is, conflict misses between instructions and data do not occur. In a von Neumann-structure cache, instruction prefetch and data access share the same cache, and such caches are usually used by processors that have only one memory interface. Compared with a Harvard-structure cache, a von Neumann-structure cache is more prone to conflict misses between data and instructions.
General-purpose processors in the prior art (for example, processors used in PCs and workstations) use Harvard-structure caches together with a single write buffer and a dedicated victim cache. As shown in Figure 2, before a cache miss goes to the lower-level main memory, the victim cache is checked; if the needed data is found there, the victim-cache block and the cache block are exchanged. In such processors, cacheable write data and dirty blocks written out on replacement share the same write buffer, so the write buffer is usually at least as large as a cache block, and a block being replaced must wait for the write buffer to drain. In the extreme case where the write buffer is full and many non-contiguous operations arrive, the wait for the write buffer to drain can be very long, during which the CPU pipeline stalls and CPU performance drops. Although this design effectively reduces conflict misses, it gives little consideration to power and area and is not suited to embedded systems: in an embedded design, a write buffer longer than one block plus a separate 1-to-5-way victim cache both consume considerable area and power.
A processor system released by Cadence implements a unified write buffer structure. It uses a unified von Neumann instruction/data cache, 4-way set associative, with a block length of 4, a write buffer length of 8, a write-back policy and LRU (least recently used) replacement. Because the write buffer is longer than a block, waiting when data is replaced or written out is reduced, but chip area is wasted, and when a read miss occurs the wait for the write buffer to drain can be long, stalling the CPU pipeline for a long time on read misses. This processor has no victim cache, so when many data and instruction blocks map to the same set, some blocks that will be needed again may be evicted, causing conflict misses and lowering the cache hit rate.
Summary of the invention
In view of the above shortcomings of the prior art, the technical problem to be solved by the present invention is to provide an embedded processor system and a data operating method thereof in which a split write buffer realizes the function of a victim cache, thereby improving the processor's read/write performance and hit rate.
To solve the above technical problem, the present invention provides a data operating method of an embedded processor system, comprising:
while comparing the processor operation address with the tags in the cache, comparing said operation address with the addresses in the replacement write buffer;
if said replacement write buffer hits, exchanging the hit data block in the replacement write buffer with a data block in the cache.
In the method of the present invention, exchanging the hit data block in the replacement write buffer with a data block in the cache comprises:
if the transfer status bit of said replacement write buffer is "1", waiting for the bus write transaction of said replacement write buffer to complete and resetting said transfer status bit;
reading the hit data block in said replacement write buffer into said cache.
The method of the present invention further comprises: if said cache hits, the processor directly reads the hit data block in the cache.
The method of the present invention further comprises: if said cache and said replacement write buffer both miss, the processor directly reads the data corresponding to said operation address from main memory and writes said data into the cache.
The method of the present invention further comprises: if said processor read-operation address is not cacheable, the general write buffer is drained and the data corresponding to said read address is then read directly from main memory.
The method of the present invention further comprises: if said processor write-operation address is not cacheable, the data is written directly into said general write buffer, which writes it to main memory when the bus is idle.
In the method of the present invention, if the data block being replaced in the cache is dirty, said replaced data block is written into the replacement write buffer, which writes it to main memory when the bus is idle; if the replaced data block in the cache is a clean block, it is directly discarded.
The method of the present invention further comprises: arbitrating priority among cache read/write operations, general write buffer write operations and replacement write buffer write operations.
In the method of the present invention, said priority order is: the follow-on part of any continued operation is highest, then cache read/write operations, then general write buffer write operations, and finally replacement write buffer write operations.
In the method of the present invention, the data block to be replaced in the cache is determined by the LRU algorithm, a random algorithm, the FIFO algorithm, a round-robin algorithm or a pseudo-LRU algorithm.
The present invention also provides an embedded processor system, comprising:
a processor, which executes instructions and read/write operations;
a cache, connected between the processor and main memory, providing high-speed data access for the processor;
a general write buffer, connected between the processor and main memory, storing the processor's cacheable write data;
a replacement write buffer, connected between the cache and main memory, storing dirty data replaced out of the cache and, on a hit, exchanging data blocks with the cache.
The embedded processor system of the present invention further comprises cache control logic that handles the processor's operation requests and, while comparing the processor operation address with the tags in the cache, also compares said operation address with the addresses in the replacement write buffer.
The embedded processor system of the present invention further comprises a multiplexer that arbitrates priority among the bus transfer requests of the cache controller, the general write buffer and the replacement write buffer.
In the embedded processor system of the present invention, said processor further comprises a processing logic unit for judging whether said processor operation address is cacheable or bufferable.
In the embedded processor system of the present invention, the length of said general write buffer is 4 words.
In the embedded processor system of the present invention, the length of said replacement write buffer equals the length of a data block of said cache.
In the embedded processor system of the present invention, said replacement write buffer is provided with a transfer status bit; when said transfer status bit is "0", the bus write transaction of said replacement write buffer has not started or has completed, and when it is "1", the bus write transaction is in progress.
Implementing the embedded processor system and the data operating method of the present invention has the following beneficial effects:
1. The wait time of non-cacheable read operations is reduced.
2. Two operations are added: "cache miss, replacement write buffer hit, replaced block dirty" and "cache miss, replacement write buffer hit, replaced block not dirty", which improve the hit rate of the cache.
3. The present invention uses a dedicated replacement write buffer, reducing the wait cycles when both the cache and the replacement write buffer miss and the replaced block is dirty.
Description of drawings
Fig. 1 is a block diagram of an embedded processor system in the prior art;
Fig. 2 is a schematic diagram of a prior-art embedded processor system that uses a victim cache;
Fig. 3 is a block diagram of the embedded processor system of the present invention;
Fig. 4 is a block diagram of an embodiment of the embedded processor system of the present invention;
Fig. 5 is a schematic diagram of a single write buffer in the prior art;
Fig. 6 is a schematic diagram of the split write buffer in the embedded processor system of the present invention;
Fig. 7 is a flowchart of a read operation of the embedded processor system of the present invention;
Fig. 8 is a flowchart of a write operation of the embedded processor system of the present invention;
Fig. 9 is a typical timing diagram of the MUX in one embodiment of the present invention.
Embodiment
The present invention is further described below with reference to the drawings and embodiments.
In an embedded processor system with a cache (Cache), when the CPU issues a read, the cache control logic compares addresses to decide whether the requested data is present in the cache. If it is, the data is read directly from the cache; this event is called a read hit. Otherwise, the data is fetched from main memory into the cache and supplied to the CPU at the same time; this event is called a read miss. Likewise, when the CPU issues a write, the cache control logic compares addresses to decide whether the target address is present in the cache. If it is, the data is written into the cache; this event is called a write hit. Otherwise, the data is written to main memory through the write buffer; this event is called a write miss. In a cache using a write-back policy, a block that has received write data is marked as inconsistent with main memory, i.e. dirty; a block consistent with main memory is marked as a clean block.
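Purely as an illustration of the write-back behavior described above, and not as the patent's own implementation, the following C sketch shows how a lookup in such a cache might mark a block dirty on a write hit; the cache geometry and field names are assumptions chosen for clarity.

```c
/* Minimal sketch of a write-back cache lookup, assuming a direct-mapped
 * cache with 32-byte (8-word) blocks. Names and sizes are illustrative
 * assumptions, not the patent's actual design. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS    64
#define BLOCK_WORDS 8

typedef struct {
    bool     valid;
    bool     dirty;                 /* block differs from main memory      */
    uint32_t tag;
    uint32_t data[BLOCK_WORDS];
} cache_line_t;

static cache_line_t cache[NUM_SETS];

/* Returns true on a hit. A write hit marks the block dirty, matching the
 * write-back policy described in the text; a miss is left to the caller. */
static bool cache_access(uint32_t addr, bool is_write, uint32_t wdata,
                         uint32_t *rdata)
{
    uint32_t index = (addr >> 5) % NUM_SETS;      /* 32-byte blocks       */
    uint32_t tag   = addr >> 11;                  /* remaining high bits  */
    uint32_t word  = (addr >> 2) % BLOCK_WORDS;
    cache_line_t *line = &cache[index];

    if (!line->valid || line->tag != tag)
        return false;                             /* miss: go to memory   */

    if (is_write) {
        line->data[word] = wdata;
        line->dirty = true;                       /* now inconsistent     */
    } else {
        *rdata = line->data[word];
    }
    return true;                                  /* hit                  */
}
```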
In a cache that uses a write-back policy, a dirty block must be written out when an access miss causes it to be replaced. The usual practice is to copy the dirty data into the write buffer before reading main memory, so that the read is effectively performed ahead of the write and the read wait time is reduced. Writing out dirty data, however, still requires stalling the CPU while the write buffer drains. To prevent this, the embedded processor system of the present invention splits the write buffer into a general write buffer and a replacement write buffer. As shown in Figure 3, the embedded processor system of the present invention mainly comprises a processor 302, a cache 304, a general write buffer 306 and a replacement write buffer 308. The processor 302 accesses the main memory 312 over the system bus 310. The processor 302 may be a central processing unit (CPU), a general-purpose microcontroller, a digital signal processor or the like. The cache 304 is connected between the processor 302 and the main memory 312, and the replacement write buffer 308 sits between the cache 304 and the main memory 312 as the replacement path of the cache 304. The general write buffer 306 is connected between the processor 302 and the main memory 312 and stores the cacheable write data of the processor 302. The cache 304 contains a tag directory table that records the mapping between the data blocks in the cache 304 and the data blocks of main memory. On a read or write of the processor 302 to a cacheable address, the operation address is compared with the tags (Tag) in the cache 304 and, at the same time, with the addresses in the replacement write buffer 308. If an address in the replacement write buffer 308 matches (i.e. a hit), the hit data block in the replacement write buffer 308 is exchanged with a data block in the cache 304. If the block being replaced out of the cache 304 is dirty, it is written into the replacement write buffer 308, which writes it to the main memory 312 when the bus 310 is idle; if the replaced block is clean, it is directly discarded. After the exchange completes, the read or write of the processor 302 finishes in the cache 304 as a hit.
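The split-buffer arrangement can be pictured with the hedged C sketch below. The entry counts follow the embodiment described later (a 4-word general write buffer and a one-block replacement write buffer), and all type and field names are illustrative assumptions rather than the patent's terminology.

```c
/* Sketch of the split write buffers, under the assumed sizes from the
 * embodiment (4-word general write buffer, 8-word cache block). */
#include <stdbool.h>
#include <stdint.h>

#define GWB_ENTRIES  4      /* general write buffer: 4 independent words */
#define BLOCK_WORDS  8      /* replacement write buffer: one cache block */

/* Each general-write-buffer entry keeps its own address, because the
 * buffered writes need not be contiguous. */
typedef struct {
    uint32_t addr[GWB_ENTRIES];
    uint32_t data[GWB_ENTRIES];
    unsigned count;                  /* number of valid entries           */
} general_write_buffer_t;

/* The replacement write buffer holds one whole (contiguous) cache block,
 * so a single base address suffices, plus the transfer status bit B. */
typedef struct {
    uint32_t base_addr;              /* one 32-bit address register       */
    uint32_t data[BLOCK_WORDS];      /* eight 32-bit data registers       */
    bool     valid;
    bool     bus_write_in_progress;  /* the "B" transfer status bit       */
} replacement_write_buffer_t;
```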
Fig. 4 is a block diagram of an embodiment of the embedded processor system of the present invention. As shown in Figure 4, this embedded processor system includes a CPU 402, a cache 404, a general write buffer 406 and a replacement write buffer 408, and also includes a processing logic unit (PU) 401, cache control logic 403, a multiplexer (MUX) 405 and a wrapper 407. The PU 401 is combinational logic that, within one cycle of a valid CPU operation, returns whether the operation is cacheable, whether it is bufferable and whether the operation address is protected. The cache control logic 403 handles all CPU operation requests: it compares the CPU operation address with the tags in the cache 404 and, at the same time, with the addresses in the replacement write buffer 408, and then returns hit and/or miss information for the cache 404 and/or the replacement write buffer 408. As shown in Figure 4, the general write buffer 406 stores the CPU's cacheable write data, the replacement write buffer 408 stores dirty data replaced out of the cache 404, and both write their data to main memory when the bus 410 is idle. The cache controller 403, the general write buffer 406 and the replacement write buffer 408 can all generate transfer requests, while there is only one data path to the AHB bus 410; the embedded processor system of the present invention therefore also includes the MUX 405, which arbitrates priority among the bus transfer requests of the cache controller 403, the general write buffer 406 and the replacement write buffer 408 and, when requests conflict, temporarily stores the lower-priority operation. The wrapper 407 is a module external to the CPU that bridges the processor bus and the AHB bus 410; it is prior art and is not described in detail here.
In general, a general write buffer 406 length of 4 words is enough to meet the system performance requirement, and the length of the replacement write buffer 408 equals the block length of the cache 404. If the block length of the cache 404 is 8 words, the present invention splits the existing single 8-word write buffer (shown in Figure 5) into a 4-word general write buffer 406 and a dedicated 8-word replacement write buffer 408, as shown in Figure 6. The original 8-word single write buffer needs eight 32-bit address registers (A-registers) and eight 32-bit data registers (D-registers). In the split write buffer structure of the present invention, the 4-word general write buffer 406 needs four 32-bit address registers and four 32-bit data registers, while the replacement write buffer 408 needs only one 32-bit address register and eight 32-bit data registers, because the block evicted from the cache 404 is contiguous data. The net increase of the present invention is therefore one 32-bit register. If the block length of the cache 404 is greater than 8, the split write buffer structure of the present invention actually reduces the number of registers needed. For example, if the block length of the cache 404 is 16 words, a single write buffer needs sixteen 32-bit address registers and sixteen 32-bit data registers, whereas in the split structure of the present invention the 4-word general write buffer needs four 32-bit address registers and four 32-bit data registers and the 16-word replacement write buffer needs only one 32-bit address register and sixteen 32-bit data registers, saving seven registers.
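The register-count comparison above can be checked with the small program below; the buffer and block sizes are the ones assumed in this paragraph, and the function names are purely illustrative.

```c
#include <stdio.h>

/* Registers for a single unified write buffer: one 32-bit address register
 * and one 32-bit data register per word of the buffer. */
static int single_buffer_regs(int buffer_words)
{
    return buffer_words /* address */ + buffer_words /* data */;
}

/* Registers for the split structure: a general write buffer (address +
 * data per word) plus a replacement write buffer holding one contiguous
 * block (one address register, one data register per word). */
static int split_buffer_regs(int block_words, int gwb_words)
{
    return (gwb_words + gwb_words) + (1 + block_words);
}

int main(void)
{
    /* 8-word block:  16 vs 17 registers -> one extra 32-bit register    */
    printf("8-word block:  %d vs %d\n",
           single_buffer_regs(8),  split_buffer_regs(8, 4));
    /* 16-word block: 32 vs 25 registers -> seven registers saved        */
    printf("16-word block: %d vs %d\n",
           single_buffer_regs(16), split_buffer_regs(16, 4));
    return 0;
}
```

Running it prints 16 vs 17 for an 8-word block and 32 vs 25 for a 16-word block, matching the counts stated above.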
In addition, as shown in Figure 6, the present invention provides a transfer status bit (B) in the replacement write buffer 408, which can be represented by one bit: when the transfer status bit is "0", the bus write transaction of the replacement write buffer 408 has not started or has completed; when it is "1", the bus write transaction of the replacement write buffer 408 is in progress.
The operation of the embedded processor system of the present invention is described in detail below with reference to Figures 7 and 8.
Fig. 7 is a flowchart of a read operation of the embedded processor system of the present invention. As shown in Figure 7, after the CPU issues a read instruction (step 701), in step 702 the PU evaluates the operation address to determine whether the CPU read address is bufferable, whether it is cacheable and whether the address is protected. If the address is protected, the PU returns an error (step 703). For a CPU read, the bufferable result can be ignored, since whether the address is bufferable has no meaning for a read operation.
In step 705, if the CPU read is non-cacheable, the CPU will read the data at the read address directly from main memory. To avoid the read-after-write hazard between buffered writes and the read (the read executes ahead of the write while the data at that address has not yet reached main memory), the control logic first checks whether the general write buffer is empty (step 706). If it is not empty, the CPU is stalled while the general write buffer drains (step 708), after which step 707 is executed. If it is empty, step 707 is executed directly: the AHB bus interface performs the read, the data at the operation address is fetched from main memory and supplied to the CPU, and the CPU read completes (step 717).
If the PU judges the CPU read to be cacheable, then in step 709 the cache control logic compares the CPU read address with the tags in the cache. If the address matches a tag in the cache, i.e. a cache hit, the corresponding data is read out and supplied to the CPU, and the read completes.
At the same time, in step 714, the cache control logic compares the CPU read address with the addresses in the replacement write buffer. If the address matches an address in the replacement write buffer, i.e. the replacement write buffer hits, then in step 715 a data block exchange is performed between the replacement write buffer and the cache. If the transfer status bit of the replacement write buffer is "1", the logic waits for the buffer's bus write transaction to complete and for the bit to be reset. If the transfer status bit is "0", the hit block in the replacement write buffer is written back into the cache; if the cache block being displaced is clean, it is directly discarded, and if it is dirty, it is written into the replacement write buffer, which writes it to main memory when the bus is idle. After the exchange completes, in step 716 the CPU reads the data at the read address from the cache as a hit, and the CPU read completes (step 717).
For a cacheable read where both the cache and the replacement write buffer miss, the CPU reads main memory directly and a normal replacement takes place in the cache. First, in step 710, the cache control logic checks whether the block to be replaced in the cache is dirty. If it is dirty, then in step 712, once the transfer status bit of the replacement write buffer is "0", the dirty block is written into the replacement write buffer, which writes it to main memory when the bus is idle. If it is not dirty, then in step 711 the cache control logic drives the AHB bus interface to burst-read the block at the CPU operation address from main memory, provided the general write buffer is empty; if it is not empty, the logic first waits for it to drain. Then, in step 713, the data corresponding to the read address is written into the cache block chosen for replacement, the CPU reads the data from the cache, and the read completes (step 717).
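The read path of Figure 7 can be summarized in the self-contained C sketch below. It is a simplified functional model only: bus timing, the AHB interface, protection checking and the non-cacheable branch are omitted, and all sizes and names are assumptions rather than the patent's actual control logic.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS   1024
#define SETS        16
#define BLOCK_WORDS 4

static uint32_t mem[MEM_WORDS];               /* word-addressed main memory */

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data[BLOCK_WORDS];
} line_t;
static line_t cache[SETS];

/* Replacement write buffer: one evicted dirty block awaiting write-back. */
static struct { bool valid; uint32_t base; uint32_t data[BLOCK_WORDS]; } rwb;

static uint32_t idx_of(uint32_t a)  { return (a / BLOCK_WORDS) % SETS; }
static uint32_t tag_of(uint32_t a)  { return (a / BLOCK_WORDS) / SETS; }
static uint32_t base_of(uint32_t a) { return a - (a % BLOCK_WORDS); }

static void rwb_flush(void)                   /* bus idle / wait on B bit   */
{
    if (!rwb.valid) return;
    memcpy(&mem[rwb.base], rwb.data, sizeof rwb.data);
    rwb.valid = false;
}

uint32_t cpu_read(uint32_t addr)
{
    line_t  *l = &cache[idx_of(addr)];
    uint32_t victim_base = (l->tag * SETS + idx_of(addr)) * BLOCK_WORDS;

    if (l->valid && l->tag == tag_of(addr))   /* step 709: cache hit        */
        return l->data[addr % BLOCK_WORDS];

    if (rwb.valid && rwb.base == base_of(addr)) {   /* steps 714-716        */
        if (l->valid && l->dirty) {           /* exchange: dirty victim     */
            uint32_t tmp[BLOCK_WORDS];        /* is queued for write-back   */
            memcpy(tmp, l->data, sizeof tmp);
            memcpy(l->data, rwb.data, sizeof tmp);
            memcpy(rwb.data, tmp, sizeof tmp);
            rwb.base = victim_base;
        } else {                              /* clean victim is discarded  */
            memcpy(l->data, rwb.data, sizeof l->data);
            rwb.valid = false;
        }
        l->valid = true; l->dirty = true; l->tag = tag_of(addr);
        return l->data[addr % BLOCK_WORDS];   /* finish as a hit            */
    }

    /* Steps 710-713: both miss, so do a normal replacement and refill.    */
    if (l->valid && l->dirty) {               /* step 712                   */
        rwb_flush();                          /* wait while B == 1          */
        rwb.valid = true;
        rwb.base  = victim_base;
        memcpy(rwb.data, l->data, sizeof rwb.data);
    }
    memcpy(l->data, &mem[base_of(addr)], sizeof l->data);  /* burst read    */
    l->valid = true; l->dirty = false; l->tag = tag_of(addr);
    return l->data[addr % BLOCK_WORDS];       /* step 717                   */
}
```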
Fig. 8 is a flowchart of a write operation of the embedded processor system of the present invention. As shown in Figure 8, after the CPU issues a write instruction (step 801), in step 802 the PU evaluates the operation address to determine whether the CPU write address is bufferable, whether it is cacheable and whether the address is protected. If the address is protected, the PU returns an error (step 803).
In step 804, if the write address is not bufferable (and therefore necessarily not cacheable), the CPU writes the data directly to the corresponding location in main memory through the AHB bus interface (step 805), and the write completes (step 818).
If the PU judges the write address to be bufferable (but not cacheable), a buffered write is performed. In step 807, the logic first checks whether the general write buffer is full. If it is full, then in step 808 the CPU is stalled until data in the general write buffer has been written to main memory and space is freed, after which step 809 is executed. If the general write buffer has room, then in step 809 the CPU writes the data directly into the general write buffer, which writes it to main memory when the bus is idle. At this point the CPU write has completed correctly (step 818).
In step 806, if the PU judges the CPU write address to be cacheable, the cache control logic compares the CPU write address with the tags in the cache (step 810). If the address matches a tag in the cache, i.e. a cache hit, the CPU writes the data into the hit block in the cache. If the hit block in the cache is dirty at that moment, the dirty block is first written into the replacement write buffer, and then the CPU write data is written into the block, completing the write.
At the same time, in step 815, the cache control logic compares the CPU write address with the addresses in the replacement write buffer. If the address matches an address in the replacement write buffer, i.e. the replacement write buffer hits, then in step 816 a data block exchange is performed between the replacement write buffer and the cache. If the transfer status bit of the replacement write buffer is "1", the logic waits for the buffer's bus write transaction to complete and for the bit to be reset. If the transfer status bit is "0", the hit block in the replacement write buffer is written back into the cache; if the cache block being displaced is clean, it is directly discarded, and if it is dirty, it is written into the replacement write buffer, which writes it to main memory when the bus is idle. After the exchange completes, in step 817 the CPU writes the data into the cache as a hit, and the CPU write completes (step 818).
For a cacheable write where both the cache and the replacement write buffer miss, the CPU writes the data directly into the general write buffer; if the general write buffer is full, it first waits for space to be freed. At the same time, a normal replacement is performed in the cache:
In step 811, the cache control logic checks whether the block to be replaced in the cache is dirty. If it is dirty, then in step 812, once the transfer status bit of the replacement write buffer is "0", the dirty block is written into the replacement write buffer, which writes it to main memory when the bus is idle. If it is not dirty, then in step 813 the cache control logic drives the AHB bus interface to burst-read the block at the CPU write address from main memory; in step 814 the fetched block is written into the cache, and the CPU write completes correctly (step 818).
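The bufferable and non-bufferable branches of Figure 8 (steps 804-809) can likewise be sketched as follows; the cacheable branch is only indicated by a comment because it mirrors the read-path sketch above. Sizes and names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define GWB_ENTRIES 4
#define MEM_WORDS   1024

static uint32_t mem[MEM_WORDS];               /* word-addressed main memory */

/* General write buffer: independent (address, data) pairs, drained to
 * main memory when the bus is idle. */
static struct { uint32_t addr[GWB_ENTRIES], data[GWB_ENTRIES]; int n; } gwb;

static void gwb_drain(void)                   /* step 808: empty the buffer */
{
    for (int i = 0; i < gwb.n; i++)
        mem[gwb.addr[i]] = gwb.data[i];       /* single bus writes          */
    gwb.n = 0;
}

void cpu_write(uint32_t addr, uint32_t data, bool bufferable, bool cacheable)
{
    if (!bufferable) {                        /* step 804: not bufferable   */
        mem[addr] = data;                     /* step 805: straight to RAM  */
        return;
    }
    if (!cacheable) {                         /* bufferable, not cacheable  */
        if (gwb.n == GWB_ENTRIES)             /* step 807: buffer full?     */
            gwb_drain();                      /* step 808: stall for space  */
        gwb.addr[gwb.n] = addr;               /* step 809: post the write   */
        gwb.data[gwb.n] = data;
        gwb.n++;
        return;
    }
    /* Cacheable path (steps 810-817): tag compare, replacement write
     * buffer compare and block exchange, as in the read-path sketch. */
}
```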
In the CPU read/write flows above, the data block to be replaced in the cache may be chosen by the least recently used (LRU) algorithm, but the present invention is not limited to this; other existing replacement algorithms such as first-in first-out (FIFO), random, round-robin or pseudo-LRU algorithms may also be used.
In the CPU read/write flows above, to keep continued operations intact, the MUX arbitrates priority among the read/write operations generated by the cache, the general write buffer write operations and the replacement write buffer write operations. The priority order is: the follow-on part of any continued operation is highest, then the read/write operations generated by the cache, then general write buffer write operations, and finally replacement write buffer write operations. If the replacement write buffer hits, its data will be read back into the cache; whether that data has already been written to main memory does not cause an error, but if it has, the write to main memory amounts to a useless bus write transaction that wastes bus bandwidth. The replacement write buffer's write operation is therefore given the lowest priority, so that its writes on the AHB bus are deferred as long as possible and bandwidth waste is reduced.
When the three kinds of operations above do not conflict, whichever request arrives first is served: the MUX pulls down the READY signals of the general write buffer, the replacement write buffer and the cache until the operation completes.
If the three kinds of operations conflict, the MUX first pulls down the READY signals of the general write buffer, the replacement write buffer and the cache simultaneously, then handles one requester according to priority while the lower-priority operations are held in registers. When processing finishes, the served requester's READY signal is driven high for one clock cycle to signal completion, and the MUX checks whether that requester has a follow-on operation. If it has none, the stored lower-priority operation is handled; if it does, the follow-on operation is arbitrated again together with the stored operations, the highest-priority one is executed, and the lower-priority ones remain stored. A typical timing diagram of the MUX is shown in Figure 9.
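A minimal sketch of the priority rule applied by the MUX is given below; the request encoding is an assumption, and the READY-signal handshaking of Figure 9 is not modeled.

```c
#include <stdbool.h>

typedef enum {                       /* smaller value = higher priority  */
    REQ_CONTINUATION = 0,            /* follow-on beat of a continued op */
    REQ_CACHE_RW     = 1,            /* cache-generated read/write       */
    REQ_GWB_WRITE    = 2,            /* general write buffer write       */
    REQ_RWB_WRITE    = 3             /* replacement write buffer write   */
} req_kind_t;

typedef struct { bool pending; req_kind_t kind; } bus_req_t;

/* Pick the pending request with the highest priority; lower-priority
 * requests stay pending (i.e. are "deposited") until the bus frees up. */
static int mux_select(const bus_req_t req[], int n)
{
    int winner = -1;
    for (int i = 0; i < n; i++) {
        if (!req[i].pending)
            continue;
        if (winner < 0 || req[i].kind < req[winner].kind)
            winner = i;
    }
    return winner;                   /* -1 means no request is pending   */
}
```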
In the embedded processor system of the present invention, the different CPU operation requests and their corresponding wait cycles are shown in Table 1 below:
No.   CPU operation request                                                              CPU wait cycles
1     Read/write hit                                                                     0
2     Non-cacheable read                                                                 1+N+T
3     Non-cacheable and non-bufferable write                                             1+N
4     Non-cacheable but bufferable write                                                 1+W*(a%)
5     Read/write miss, victim (replacement write buffer) hit, replaced block dirty       1+L*(b%)+2
6     Read/write miss, victim (replacement write buffer) hit, replaced block not dirty   1+L*(b%)+1
7     Read/write miss, victim (replacement write buffer) miss, replaced block dirty      1+N+7*S+L*(b%)
8     Read/write miss, victim (replacement write buffer) miss, replaced block not dirty  1+N+7*S
Table 1: CPU wait cycles for the different operations. Here, the leading 1 is the cycle used to judge whether the cache hits; T is the time spent waiting for the general write buffer to drain; W is the time for the full general write buffer to free one data slot (the probability of this case is assumed to be a%, which is low); N is the number of cycles consumed by a single bus operation; S is the number of cycles consumed by each beat of a continued bus operation; and L is the average time spent waiting for the replacement write buffer to drain (the probability of this case is assumed to be b%, which is extremely low).
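For orientation only, the formulas of Table 1 can be evaluated with assumed parameter values; the values of N, S, T, W, L, a and b below are illustrative and are not taken from the patent.

```c
/* Worked example of the wait-cycle formulas in Table 1 under purely
 * illustrative parameter values. */
#include <stdio.h>

int main(void)
{
    double N = 2, S = 1, T = 4, W = 4, L = 8;   /* cycles (assumed)         */
    double a = 0.05, b = 0.01;                  /* occurrence probabilities */

    printf("non-cacheable read          : %.2f\n", 1 + N + T);
    printf("non-bufferable write        : %.2f\n", 1 + N);
    printf("bufferable write            : %.2f\n", 1 + W * a);
    printf("miss, RWB hit, victim dirty : %.2f\n", 1 + L * b + 2);
    printf("miss, RWB hit, victim clean : %.2f\n", 1 + L * b + 1);
    printf("both miss, victim dirty     : %.2f\n", 1 + N + 7 * S + L * b);
    printf("both miss, victim clean     : %.2f\n", 1 + N + 7 * S);
    return 0;
}
```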
As the table shows, the embedded processor system of the present invention reduces the wait time of non-cacheable reads. A non-cacheable read can proceed only after the write buffer has drained. With the prior-art single write buffer, the read must wait until both the bufferable (Buffer) data and the replacement data have been written out; with the split write buffer of the present invention, only the cacheable data in the general write buffer needs to drain, and there is no need to wait for the replacement data to be written out.
In addition, the embedded processor system of the present invention adds two operations: "cache miss, replacement write buffer hit, replaced block dirty" and "cache miss, replacement write buffer hit, replaced block not dirty". These two operations effectively reduce the conflict misses that occur when too many instruction and data blocks map to the same set, a block is evicted, and that block is then needed again, thereby improving the hit rate of the cache.
The embedded processor system of the present invention uses a dedicated replacement write buffer, reducing the wait cycles when both the cache and the replacement write buffer miss and the replaced block is dirty. The present invention writes the replaced data directly into the replacement write buffer, whereas in existing processors this operation also has to wait for the write buffer to drain.
In the specific embodiments described above with reference to the drawings, the cache is a von Neumann-structure cache in which instructions and data are stored together, but the present invention is not limited to this. From the disclosure above, those skilled in the art will appreciate that the present invention also applies to a Harvard-structure cache in which instructions and data are stored separately.

Claims (17)

1. A data operating method of an embedded processor system, characterized by comprising:
while comparing the processor operation address with the tags in the cache, comparing said operation address with the addresses in the replacement write buffer;
if said replacement write buffer hits, exchanging the hit data block in the replacement write buffer with a data block in the cache.
2. The data operating method of an embedded processor system according to claim 1, characterized in that exchanging the hit data block in the replacement write buffer with a data block in the cache comprises:
if the transfer status bit of said replacement write buffer is "1", waiting for the bus write transaction of said replacement write buffer to complete and resetting said transfer status bit;
reading the hit data block in said replacement write buffer into said cache.
3. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said cache hits, the processor directly reading the hit data block in the cache.
4. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said cache and said replacement write buffer both miss, the processor directly reading the data corresponding to said operation address from main memory and writing said data into the cache.
5. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said processor read-operation address is not cacheable, draining said general write buffer and then reading the data corresponding to said read address directly from main memory.
6. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said processor write-operation address is not cacheable, writing the data directly into said general write buffer, which writes it to main memory when the bus is idle.
7. The data operating method of an embedded processor system according to any one of claims 1 to 6, characterized in that if the data block being replaced in the cache is dirty, said replaced data block is written into the replacement write buffer, which writes it to main memory when the bus is idle; if the replaced data block in the cache is a clean block, it is directly discarded.
8. The data operating method of an embedded processor system according to claim 7, characterized in that the method further comprises:
arbitrating priority among cache read/write operations, general write buffer write operations and replacement write buffer write operations.
9. The data operating method of an embedded processor system according to claim 8, characterized in that said priority order is: the follow-on part of any continued operation is highest, then cache read/write operations, then general write buffer write operations, and finally replacement write buffer write operations.
10. The data operating method of an embedded processor system according to claim 1, characterized in that the data block to be replaced in the cache is determined by the LRU algorithm, a random algorithm, the FIFO algorithm, a round-robin algorithm or a pseudo-LRU algorithm.
11. An embedded processor system, characterized by comprising:
a processor, which executes instructions and read/write operations;
a cache, connected between the processor and main memory, providing high-speed data access for the processor;
a general write buffer, connected between the processor and main memory, storing the processor's cacheable write data;
a replacement write buffer, connected between the cache and main memory, storing dirty data replaced out of the cache and, on a hit, exchanging data blocks with the cache.
12. The embedded processor system according to claim 11, characterized by further comprising cache control logic that handles the processor's operation requests and, while comparing the processor operation address with the tags in the cache, also compares said operation address with the addresses in the replacement write buffer.
13. The embedded processor system according to claim 12, characterized by further comprising a multiplexer that arbitrates priority among the bus transfer requests of the cache controller, the general write buffer and the replacement write buffer.
14. The embedded processor system according to claim 11, characterized in that said processor further comprises a processing logic unit for judging whether said processor operation address is cacheable or bufferable.
15. The embedded processor system according to claim 11, characterized in that the length of said general write buffer is 4 words.
16. The embedded processor system according to claim 11, characterized in that the length of said replacement write buffer equals the length of a data block of said cache.
17. The embedded processor system according to claim 11 or 16, characterized in that said replacement write buffer is provided with a transfer status bit; when said transfer status bit is "0", the bus write transaction of said replacement write buffer has not started or has completed, and when said transfer status bit is "1", the bus write transaction of said replacement write buffer is in progress.
CNB2005101018520A 2005-11-25 2005-11-25 Embedded processor system and its data operating method Active CN100419715C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005101018520A CN100419715C (en) 2005-11-25 2005-11-25 Embedded processor system and its data operating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005101018520A CN100419715C (en) 2005-11-25 2005-11-25 Embedded processor system and its data operating method

Publications (2)

Publication Number Publication Date
CN1851677A true CN1851677A (en) 2006-10-25
CN100419715C CN100419715C (en) 2008-09-17

Family

ID=37133156

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101018520A Active CN100419715C (en) 2005-11-25 2005-11-25 Embedded processor system and its data operating method

Country Status (1)

Country Link
CN (1) CN100419715C (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103549A (en) * 2009-12-18 2011-06-22 上海华虹集成电路有限责任公司 Method for replacing cache
WO2012083754A1 (en) * 2011-10-20 2012-06-28 华为技术有限公司 Method and device for processing dirty data
CN102646071A (en) * 2012-02-17 2012-08-22 中国科学院微电子研究所 Device and method for executing write hit operation of high-speed buffer memory at single period
CN103548005A (en) * 2011-12-13 2014-01-29 华为技术有限公司 Method and device for replacing cache objects
CN104106061A (en) * 2012-02-08 2014-10-15 国际商业机器公司 Forward progress mechanism for stores in the presence of load contention in a system favoring loads
CN104169892A (en) * 2012-03-28 2014-11-26 华为技术有限公司 Concurrently accessed set associative overflow cache
CN108132758A (en) * 2018-01-10 2018-06-08 湖南国科微电子股份有限公司 A kind of Buffer management methods, system and its application
CN108874517A (en) * 2018-04-19 2018-11-23 华侨大学 The stand-by system availability of fixed priority divides energy consumption optimization method
CN109716308A (en) * 2016-09-29 2019-05-03 高通股份有限公司 For reducing the cache memory clock generation circuit of power consumption and reading error in cache memory
CN112068945A (en) * 2020-09-16 2020-12-11 厦门势拓御能科技有限公司 Priority reversal method in optimized embedded system
CN112612727A (en) * 2020-12-08 2021-04-06 海光信息技术股份有限公司 Cache line replacement method and device and electronic equipment
CN114528230A (en) * 2022-04-21 2022-05-24 飞腾信息技术有限公司 Cache data processing method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6425058B1 (en) * 1999-09-07 2002-07-23 International Business Machines Corporation Cache management mechanism to enable information-type dependent cache policies
US7024545B1 (en) * 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US7120748B2 (en) * 2003-09-04 2006-10-10 International Business Machines Corporation Software-controlled cache set management
US7136967B2 (en) * 2003-12-09 2006-11-14 International Business Machinces Corporation Multi-level cache having overlapping congruence groups of associativity sets in different cache levels

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103549A (en) * 2009-12-18 2011-06-22 上海华虹集成电路有限责任公司 Method for replacing cache
WO2012083754A1 (en) * 2011-10-20 2012-06-28 华为技术有限公司 Method and device for processing dirty data
CN102725752A (en) * 2011-10-20 2012-10-10 华为技术有限公司 Method and device for processing dirty data
CN102725752B (en) * 2011-10-20 2014-07-16 华为技术有限公司 Method and device for processing dirty data
CN103548005B (en) * 2011-12-13 2016-03-30 华为技术有限公司 Replace the method and apparatus of cache object
CN103548005A (en) * 2011-12-13 2014-01-29 华为技术有限公司 Method and device for replacing cache objects
CN104106061A (en) * 2012-02-08 2014-10-15 国际商业机器公司 Forward progress mechanism for stores in the presence of load contention in a system favoring loads
CN102646071A (en) * 2012-02-17 2012-08-22 中国科学院微电子研究所 Device and method for executing write hit operation of high-speed buffer memory at single period
CN102646071B (en) * 2012-02-17 2014-07-30 中国科学院微电子研究所 Device and method for executing write hit operation of high-speed buffer memory at single period
CN104169892A (en) * 2012-03-28 2014-11-26 华为技术有限公司 Concurrently accessed set associative overflow cache
CN109716308A (en) * 2016-09-29 2019-05-03 高通股份有限公司 For reducing the cache memory clock generation circuit of power consumption and reading error in cache memory
CN108132758A (en) * 2018-01-10 2018-06-08 湖南国科微电子股份有限公司 A kind of Buffer management methods, system and its application
CN108874517A (en) * 2018-04-19 2018-11-23 华侨大学 The stand-by system availability of fixed priority divides energy consumption optimization method
CN108874517B (en) * 2018-04-19 2021-11-02 华侨大学 Method for optimizing utilization rate division energy consumption of standby system with fixed priority
CN112068945A (en) * 2020-09-16 2020-12-11 厦门势拓御能科技有限公司 Priority reversal method in optimized embedded system
CN112068945B (en) * 2020-09-16 2024-05-31 厦门势拓御能科技有限公司 Priority reversing method in optimized embedded system
CN112612727A (en) * 2020-12-08 2021-04-06 海光信息技术股份有限公司 Cache line replacement method and device and electronic equipment
CN114528230A (en) * 2022-04-21 2022-05-24 飞腾信息技术有限公司 Cache data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN100419715C (en) 2008-09-17

Similar Documents

Publication Publication Date Title
CN1851677A (en) Embedded processor system and its data operating method
EP2430551B1 (en) Cache coherent support for flash in a memory hierarchy
US9208084B2 (en) Extended main memory hierarchy having flash memory for page fault handling
CN103714015B (en) Method device and system for reducing back invalidation transactions from a snoop filter
CN1851673A (en) Processor system and its data operating method
US20210406170A1 (en) Flash-Based Coprocessor
CN102576333B (en) Data cache in nonvolatile memory
US8370533B2 (en) Executing flash storage access requests
WO2020176795A1 (en) Use of outstanding command queues for separate read-only cache and write-read cache in a memory sub-system
TW201903612A (en) Memory module and method for operating memory module
CN1820257A (en) Microprocessor including a first level cache and a second level cache having different cache line sizes
JPH04233641A (en) Method and apparatus for data pre-fetch
US20080301371A1 (en) Memory Cache Control Arrangement and a Method of Performing a Coherency Operation Therefor
CN107589908B (en) Merging method based on non-aligned updated data in solid-state disk cache system
CN1425154A (en) Cache line flush micro-architectural implementation method ans system
US20130326145A1 (en) Methods and apparatus for efficient communication between caches in hierarchical caching design
US20220179798A1 (en) Separate read-only cache and write-read cache in a memory sub-system
US8019939B2 (en) Detecting data mining processes to increase caching efficiency
CN101038567A (en) Method, system, apparatus for performing cacheline polling operation
CN116134475A (en) Computer memory expansion device and method of operating the same
ITRM20060046A1 (en) METHOD AND SYSTEM FOR THE USE OF CACHE BY MEANS OF PREFETCH REQUEST LIMITATIONS
WO2020176828A1 (en) Priority scheduling in queues to access cache data in a memory sub-system
JPH10214226A (en) Method and system for strengthening memory performance of processor by removing old line of second level cache
JP3326189B2 (en) Computer memory system and data element cleaning method
JPH04250543A (en) Computer memory system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant