CN1851677A - Embedded processor system and its data operating method - Google Patents

Embedded processor system and its data operating method

Info

Publication number
CN1851677A
CN1851677A · CNA2005101018520A · CN200510101852A
Authority
CN
China
Prior art keywords
write
buffer
data
cache
replacement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2005101018520A
Other languages
Chinese (zh)
Other versions
CN100419715C (en)
Inventor
董杰明
夏晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CNB2005101018520A priority Critical patent/CN100419715C/en
Publication of CN1851677A publication Critical patent/CN1851677A/en
Application granted granted Critical
Publication of CN100419715C publication Critical patent/CN100419715C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The embedded processor system includes a processor that executes instructions and read/write operations; a cache connected between the processor and main memory to provide high-speed data access; a general write buffer connected between the processor and main memory to store the processor's cacheable write data; and a replacement write buffer connected between the cache and main memory that stores dirty data replaced out of the cache and, on a hit, exchanges data blocks with the cache. The present invention uses a split write buffer to realize the function of a victim cache in an embedded processor and to raise the cache hit ratio, thereby improving the processor's read/write performance.

Description

Embedded processor system and data operating method thereof
Technical field
The present invention relates to digital processing systems and, more particularly, to an embedded processor system and a data operating method thereof.
Background technology
In existing embedded processor systems, when the CPU performs a write operation to main memory, it first writes the data into a write buffer. Because the write buffer's access speed is very high, this improves the CPU's write speed; the write buffer then writes the data into the corresponding location of the slower main memory at a suitable time.
In addition, a cache memory (Cache), for example a von Neumann-structure cache as shown in Figure 1, can be embedded between the processor and main memory of the embedded processor system. This improves the processor's processing power, further reduces CPU wait time, lowers the power consumed by processor peripherals, and allows the processor to complete most data and instruction accesses to main memory within a single cycle.
The cache is small relative to main memory and sits between the processor and the slower main memory; it holds copies of the main-memory contents the processor is currently using. Data is exchanged between the cache and main memory in units of blocks. When the CPU reads data or an instruction, the item read is also saved in the cache. Because of the spatial and temporal locality of programs, when the CPU needs the same or nearby data again, it can obtain it from the corresponding cache block. Since the cache is much faster than main memory, overall system performance is greatly improved.
Caches commonly used in processor systems come in two kinds: Harvard-structure caches and von Neumann-structure caches. A Harvard-structure cache stores instructions and data separately, in an instruction cache and a data cache, so instruction replacement cannot cause data about to be read or written to be evicted; that is, conflict misses between instructions and data do not occur. In a von Neumann-structure cache, instruction prefetch and data access share the same cache, and such caches are usually used by processors that have only one memory interface. Compared with a Harvard-structure cache, a von Neumann-structure cache is more prone to conflict misses between data and instructions.
General-purpose processors in the prior art (for example, processors used in PCs and workstations) use Harvard-structure caches together with a single write buffer and a dedicated victim cache. As shown in Figure 2, before a cache miss goes to the lower-level main memory, the victim cache is checked; if the needed data is found there, the victim-cache block and the cache block are exchanged. In such processors, cacheable write data and dirty blocks written out on replacement share the same write buffer, so the write buffer is usually at least as large as a cache block, and a block being replaced must wait for the write buffer to drain. In the extreme case where the write buffer is full and many non-contiguous operations arrive, the wait for the write buffer to drain can be very long, during which the CPU pipeline stalls and CPU performance drops. Although this design effectively reduces conflict misses, it gives little consideration to power and area and is not suited to embedded systems: in an embedded design, a write buffer longer than one block plus a separate 1-to-5-way victim cache both consume considerable area and power.
A processor system released by Cadence implements a unified write buffer structure. It uses a unified von Neumann instruction/data cache, 4-way set associative, with a block length of 4, a write buffer length of 8, a write-back policy and LRU (least recently used) replacement. Because the write buffer is longer than a block, waiting when data is replaced or written out is reduced, but chip area is wasted, and when a read miss occurs the wait for the write buffer to drain can be long, stalling the CPU pipeline for a long time on read misses. This processor has no victim cache, so when many data and instruction blocks map to the same set, some blocks that will be needed again may be evicted, causing conflict misses and lowering the cache hit rate.
Summary of the invention
In view of the above shortcomings of the prior art, the technical problem to be solved by the present invention is to provide an embedded processor system and a data operating method thereof in which a split write buffer realizes the function of a victim cache, thereby improving the processor's read/write performance and hit rate.
To solve the above technical problem, the present invention provides a data operating method of an embedded processor system, comprising:
while comparing the processor operation address with the tags in the cache, comparing said operation address with the addresses in the replacement write buffer;
if said replacement write buffer hits, exchanging the hit data block in the replacement write buffer with a data block in the cache.
In the method of the present invention, exchanging the hit data block in the replacement write buffer with a data block in the cache comprises:
if the transfer status bit of said replacement write buffer is "1", waiting for the bus write transaction of said replacement write buffer to complete and resetting said transfer status bit;
reading the hit data block in said replacement write buffer into said cache.
The method of the present invention further comprises: if said cache hits, the processor directly reads the hit data block in the cache.
The method of the present invention further comprises: if said cache and said replacement write buffer both miss, the processor directly reads the data corresponding to said operation address from main memory and writes said data into the cache.
The method of the present invention further comprises: if said processor read-operation address is not cacheable, the general write buffer is drained and the data corresponding to said read address is then read directly from main memory.
The method of the present invention further comprises: if said processor write-operation address is not cacheable, the data is written directly into said general write buffer, which writes it to main memory when the bus is idle.
In the method of the present invention, if the data block being replaced in the cache is dirty, said replaced data block is written into the replacement write buffer, which writes it to main memory when the bus is idle; if the replaced data block in the cache is a clean block, it is directly discarded.
The method of the present invention further comprises: arbitrating priority among cache read/write operations, general write buffer write operations and replacement write buffer write operations.
In the method of the present invention, said priority order is: the follow-on part of any continued operation is highest, then cache read/write operations, then general write buffer write operations, and finally replacement write buffer write operations.
In the method of the present invention, the data block to be replaced in the cache is determined by the LRU algorithm, a random algorithm, the FIFO algorithm, a round-robin algorithm or a pseudo-LRU algorithm.
The present invention also provides an embedded processor system, comprising:
a processor, which executes instructions and read/write operations;
a cache, connected between the processor and main memory, providing high-speed data access for the processor;
a general write buffer, connected between the processor and main memory, storing the processor's cacheable write data;
a replacement write buffer, connected between the cache and main memory, storing dirty data replaced out of the cache and, on a hit, exchanging data blocks with the cache.
The embedded processor system of the present invention further comprises cache control logic that handles the processor's operation requests and, while comparing the processor operation address with the tags in the cache, also compares said operation address with the addresses in the replacement write buffer.
The embedded processor system of the present invention further comprises a multiplexer that arbitrates priority among the bus transfer requests of the cache controller, the general write buffer and the replacement write buffer.
In the embedded processor system of the present invention, said processor further comprises a processing logic unit for judging whether said processor operation address is cacheable or bufferable.
In the embedded processor system of the present invention, the length of said general write buffer is 4 words.
In the embedded processor system of the present invention, the length of said replacement write buffer equals the length of a data block of said cache.
In the embedded processor system of the present invention, said replacement write buffer is provided with a transfer status bit; when said transfer status bit is "0", the bus write transaction of said replacement write buffer has not started or has completed, and when it is "1", the bus write transaction is in progress.
Implementing the embedded processor system and the data operating method of the present invention has the following beneficial effects:
1. The wait time of non-cacheable read operations is reduced.
2. Two operations are added: "cache miss, replacement write buffer hit, replaced block dirty" and "cache miss, replacement write buffer hit, replaced block not dirty", which improve the hit rate of the cache.
3. The present invention uses a dedicated replacement write buffer, reducing the wait cycles when both the cache and the replacement write buffer miss and the replaced block is dirty.
Description of drawings
Fig. 1 is a block diagram of an embedded processor system in the prior art;
Fig. 2 is a schematic diagram of a prior-art embedded processor system that uses a victim cache;
Fig. 3 is a block diagram of the embedded processor system of the present invention;
Fig. 4 is a block diagram of an embodiment of the embedded processor system of the present invention;
Fig. 5 is a schematic diagram of a single write buffer in the prior art;
Fig. 6 is a schematic diagram of the split write buffer in the embedded processor system of the present invention;
Fig. 7 is a flowchart of a read operation of the embedded processor system of the present invention;
Fig. 8 is a flowchart of a write operation of the embedded processor system of the present invention;
Fig. 9 is a typical timing diagram of the MUX in one embodiment of the present invention.
Embodiment
The present invention is further described below with reference to the drawings and embodiments.
In an embedded processor system with a cache (Cache), when the CPU issues a read, the cache control logic compares addresses to decide whether the requested data is present in the cache. If it is, the data is read directly from the cache; this event is called a read hit. Otherwise, the data is fetched from main memory into the cache and supplied to the CPU at the same time; this event is called a read miss. Likewise, when the CPU issues a write, the cache control logic compares addresses to decide whether the target address is present in the cache. If it is, the data is written into the cache; this event is called a write hit. Otherwise, the data is written to main memory through the write buffer; this event is called a write miss. In a cache using a write-back policy, a block that has received write data is marked as inconsistent with main memory, i.e. dirty; a block consistent with main memory is marked as a clean block.
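Purely as an illustration of the write-back behavior described above, and not as the patent's own implementation, the following C sketch shows how a lookup in such a cache might mark a block dirty on a write hit; the cache geometry and field names are assumptions chosen for clarity.

```c
/* Minimal sketch of a write-back cache lookup, assuming a direct-mapped
 * cache with 32-byte (8-word) blocks. Names and sizes are illustrative
 * assumptions, not the patent's actual design. */
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS    64
#define BLOCK_WORDS 8

typedef struct {
    bool     valid;
    bool     dirty;                 /* block differs from main memory      */
    uint32_t tag;
    uint32_t data[BLOCK_WORDS];
} cache_line_t;

static cache_line_t cache[NUM_SETS];

/* Returns true on a hit. A write hit marks the block dirty, matching the
 * write-back policy described in the text; a miss is left to the caller. */
static bool cache_access(uint32_t addr, bool is_write, uint32_t wdata,
                         uint32_t *rdata)
{
    uint32_t index = (addr >> 5) % NUM_SETS;      /* 32-byte blocks       */
    uint32_t tag   = addr >> 11;                  /* remaining high bits  */
    uint32_t word  = (addr >> 2) % BLOCK_WORDS;
    cache_line_t *line = &cache[index];

    if (!line->valid || line->tag != tag)
        return false;                             /* miss: go to memory   */

    if (is_write) {
        line->data[word] = wdata;
        line->dirty = true;                       /* now inconsistent     */
    } else {
        *rdata = line->data[word];
    }
    return true;                                  /* hit                  */
}
```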
In a cache that uses a write-back policy, a dirty block must be written out when an access miss causes it to be replaced. The usual practice is to copy the dirty data into the write buffer before reading main memory, so that the read is effectively performed ahead of the write and the read wait time is reduced. Writing out dirty data, however, still requires stalling the CPU while the write buffer drains. To prevent this, the embedded processor system of the present invention splits the write buffer into a general write buffer and a replacement write buffer. As shown in Figure 3, the embedded processor system of the present invention mainly comprises a processor 302, a cache 304, a general write buffer 306 and a replacement write buffer 308. The processor 302 accesses the main memory 312 over the system bus 310. The processor 302 may be a central processing unit (CPU), a general-purpose microcontroller, a digital signal processor or the like. The cache 304 is connected between the processor 302 and the main memory 312, and the replacement write buffer 308 sits between the cache 304 and the main memory 312 as the replacement path of the cache 304. The general write buffer 306 is connected between the processor 302 and the main memory 312 and stores the cacheable write data of the processor 302. The cache 304 contains a tag directory table that records the mapping between the data blocks in the cache 304 and the data blocks of main memory. On a read or write of the processor 302 to a cacheable address, the operation address is compared with the tags (Tag) in the cache 304 and, at the same time, with the addresses in the replacement write buffer 308. If an address in the replacement write buffer 308 matches (i.e. a hit), the hit data block in the replacement write buffer 308 is exchanged with a data block in the cache 304. If the block being replaced out of the cache 304 is dirty, it is written into the replacement write buffer 308, which writes it to the main memory 312 when the bus 310 is idle; if the replaced block is clean, it is directly discarded. After the exchange completes, the read or write of the processor 302 finishes in the cache 304 as a hit.
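The split-buffer arrangement can be pictured with the hedged C sketch below. The entry counts follow the embodiment described later (a 4-word general write buffer and a one-block replacement write buffer), and all type and field names are illustrative assumptions rather than the patent's terminology.

```c
/* Sketch of the split write buffers, under the assumed sizes from the
 * embodiment (4-word general write buffer, 8-word cache block). */
#include <stdbool.h>
#include <stdint.h>

#define GWB_ENTRIES  4      /* general write buffer: 4 independent words */
#define BLOCK_WORDS  8      /* replacement write buffer: one cache block */

/* Each general-write-buffer entry keeps its own address, because the
 * buffered writes need not be contiguous. */
typedef struct {
    uint32_t addr[GWB_ENTRIES];
    uint32_t data[GWB_ENTRIES];
    unsigned count;                  /* number of valid entries           */
} general_write_buffer_t;

/* The replacement write buffer holds one whole (contiguous) cache block,
 * so a single base address suffices, plus the transfer status bit B. */
typedef struct {
    uint32_t base_addr;              /* one 32-bit address register       */
    uint32_t data[BLOCK_WORDS];      /* eight 32-bit data registers       */
    bool     valid;
    bool     bus_write_in_progress;  /* the "B" transfer status bit       */
} replacement_write_buffer_t;
```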
Fig. 4 is a block diagram of an embodiment of the embedded processor system of the present invention. As shown in Figure 4, this embedded processor system includes a CPU 402, a cache 404, a general write buffer 406 and a replacement write buffer 408, and also includes a processing logic unit (PU) 401, cache control logic 403, a multiplexer (MUX) 405 and a wrapper 407. The PU 401 is combinational logic that, within one cycle of a valid CPU operation, returns whether the operation is cacheable, whether it is bufferable and whether the operation address is protected. The cache control logic 403 handles all CPU operation requests: it compares the CPU operation address with the tags in the cache 404 and, at the same time, with the addresses in the replacement write buffer 408, and then returns hit and/or miss information for the cache 404 and/or the replacement write buffer 408. As shown in Figure 4, the general write buffer 406 stores the CPU's cacheable write data, the replacement write buffer 408 stores dirty data replaced out of the cache 404, and both write their data to main memory when the bus 410 is idle. The cache controller 403, the general write buffer 406 and the replacement write buffer 408 can all generate transfer requests, while there is only one data path to the AHB bus 410; the embedded processor system of the present invention therefore also includes the MUX 405, which arbitrates priority among the bus transfer requests of the cache controller 403, the general write buffer 406 and the replacement write buffer 408 and, when requests conflict, temporarily stores the lower-priority operation. The wrapper 407 is a module external to the CPU that bridges the processor bus and the AHB bus 410; it is prior art and is not described in detail here.
In general, a general write buffer 406 length of 4 words is enough to meet the system performance requirement, and the length of the replacement write buffer 408 equals the block length of the cache 404. If the block length of the cache 404 is 8 words, the present invention splits the existing single 8-word write buffer (shown in Figure 5) into a 4-word general write buffer 406 and a dedicated 8-word replacement write buffer 408, as shown in Figure 6. The original 8-word single write buffer needs eight 32-bit address registers (A-registers) and eight 32-bit data registers (D-registers). In the split write buffer structure of the present invention, the 4-word general write buffer 406 needs four 32-bit address registers and four 32-bit data registers, while the replacement write buffer 408 needs only one 32-bit address register and eight 32-bit data registers, because the block evicted from the cache 404 is contiguous data. The net increase of the present invention is therefore one 32-bit register. If the block length of the cache 404 is greater than 8, the split write buffer structure of the present invention actually reduces the number of registers needed. For example, if the block length of the cache 404 is 16 words, a single write buffer needs sixteen 32-bit address registers and sixteen 32-bit data registers, whereas in the split structure of the present invention the 4-word general write buffer needs four 32-bit address registers and four 32-bit data registers and the 16-word replacement write buffer needs only one 32-bit address register and sixteen 32-bit data registers, saving seven registers.
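The register-count comparison above can be checked with the small program below; the buffer and block sizes are the ones assumed in this paragraph, and the function names are purely illustrative.

```c
#include <stdio.h>

/* Registers for a single unified write buffer: one 32-bit address register
 * and one 32-bit data register per word of the buffer. */
static int single_buffer_regs(int buffer_words)
{
    return buffer_words /* address */ + buffer_words /* data */;
}

/* Registers for the split structure: a general write buffer (address +
 * data per word) plus a replacement write buffer holding one contiguous
 * block (one address register, one data register per word). */
static int split_buffer_regs(int block_words, int gwb_words)
{
    return (gwb_words + gwb_words) + (1 + block_words);
}

int main(void)
{
    /* 8-word block:  16 vs 17 registers -> one extra 32-bit register    */
    printf("8-word block:  %d vs %d\n",
           single_buffer_regs(8),  split_buffer_regs(8, 4));
    /* 16-word block: 32 vs 25 registers -> seven registers saved        */
    printf("16-word block: %d vs %d\n",
           single_buffer_regs(16), split_buffer_regs(16, 4));
    return 0;
}
```

Running it prints 16 vs 17 for an 8-word block and 32 vs 25 for a 16-word block, matching the counts stated above.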
In addition, as shown in Figure 6, the present invention provides a transfer status bit (B) in the replacement write buffer 408, which can be represented by one bit: when the transfer status bit is "0", the bus write transaction of the replacement write buffer 408 has not started or has completed; when it is "1", the bus write transaction of the replacement write buffer 408 is in progress.
The operation of the embedded processor system of the present invention is described in detail below with reference to Figures 7 and 8.
Fig. 7 is a flowchart of a read operation of the embedded processor system of the present invention. As shown in Figure 7, after the CPU issues a read instruction (step 701), in step 702 the PU evaluates the operation address to determine whether the CPU read address is bufferable, whether it is cacheable and whether the address is protected. If the address is protected, the PU returns an error (step 703). For a CPU read, the bufferable result can be ignored, since whether the address is bufferable has no meaning for a read operation.
In step 705, if the CPU read is non-cacheable, the CPU will read the data at the read address directly from main memory. To avoid the read-after-write hazard between buffered writes and the read (the read executes ahead of the write while the data at that address has not yet reached main memory), the control logic first checks whether the general write buffer is empty (step 706). If it is not empty, the CPU is stalled while the general write buffer drains (step 708), after which step 707 is executed. If it is empty, step 707 is executed directly: the AHB bus interface performs the read, the data at the operation address is fetched from main memory and supplied to the CPU, and the CPU read completes (step 717).
If the PU judges the CPU read to be cacheable, then in step 709 the cache control logic compares the CPU read address with the tags in the cache. If the address matches a tag in the cache, i.e. a cache hit, the corresponding data is read out and supplied to the CPU, and the read completes.
At the same time, in step 714, the cache control logic compares the CPU read address with the addresses in the replacement write buffer. If the address matches an address in the replacement write buffer, i.e. the replacement write buffer hits, then in step 715 a data block exchange is performed between the replacement write buffer and the cache. If the transfer status bit of the replacement write buffer is "1", the logic waits for the buffer's bus write transaction to complete and for the bit to be reset. If the transfer status bit is "0", the hit block in the replacement write buffer is written back into the cache; if the cache block being displaced is clean, it is directly discarded, and if it is dirty, it is written into the replacement write buffer, which writes it to main memory when the bus is idle. After the exchange completes, in step 716 the CPU reads the data at the read address from the cache as a hit, and the CPU read completes (step 717).
For a cacheable read where both the cache and the replacement write buffer miss, the CPU reads main memory directly and a normal replacement takes place in the cache. First, in step 710, the cache control logic checks whether the block to be replaced in the cache is dirty. If it is dirty, then in step 712, once the transfer status bit of the replacement write buffer is "0", the dirty block is written into the replacement write buffer, which writes it to main memory when the bus is idle. If it is not dirty, then in step 711 the cache control logic drives the AHB bus interface to burst-read the block at the CPU operation address from main memory, provided the general write buffer is empty; if it is not empty, the logic first waits for it to drain. Then, in step 713, the data corresponding to the read address is written into the cache block chosen for replacement, the CPU reads the data from the cache, and the read completes (step 717).
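The read path of Figure 7 can be summarized in the self-contained C sketch below. It is a simplified functional model only: bus timing, the AHB interface, protection checking and the non-cacheable branch are omitted, and all sizes and names are assumptions rather than the patent's actual control logic.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MEM_WORDS   1024
#define SETS        16
#define BLOCK_WORDS 4

static uint32_t mem[MEM_WORDS];               /* word-addressed main memory */

typedef struct {
    bool     valid, dirty;
    uint32_t tag;
    uint32_t data[BLOCK_WORDS];
} line_t;
static line_t cache[SETS];

/* Replacement write buffer: one evicted dirty block awaiting write-back. */
static struct { bool valid; uint32_t base; uint32_t data[BLOCK_WORDS]; } rwb;

static uint32_t idx_of(uint32_t a)  { return (a / BLOCK_WORDS) % SETS; }
static uint32_t tag_of(uint32_t a)  { return (a / BLOCK_WORDS) / SETS; }
static uint32_t base_of(uint32_t a) { return a - (a % BLOCK_WORDS); }

static void rwb_flush(void)                   /* bus idle / wait on B bit   */
{
    if (!rwb.valid) return;
    memcpy(&mem[rwb.base], rwb.data, sizeof rwb.data);
    rwb.valid = false;
}

uint32_t cpu_read(uint32_t addr)
{
    line_t  *l = &cache[idx_of(addr)];
    uint32_t victim_base = (l->tag * SETS + idx_of(addr)) * BLOCK_WORDS;

    if (l->valid && l->tag == tag_of(addr))   /* step 709: cache hit        */
        return l->data[addr % BLOCK_WORDS];

    if (rwb.valid && rwb.base == base_of(addr)) {   /* steps 714-716        */
        if (l->valid && l->dirty) {           /* exchange: dirty victim     */
            uint32_t tmp[BLOCK_WORDS];        /* is queued for write-back   */
            memcpy(tmp, l->data, sizeof tmp);
            memcpy(l->data, rwb.data, sizeof tmp);
            memcpy(rwb.data, tmp, sizeof tmp);
            rwb.base = victim_base;
        } else {                              /* clean victim is discarded  */
            memcpy(l->data, rwb.data, sizeof l->data);
            rwb.valid = false;
        }
        l->valid = true; l->dirty = true; l->tag = tag_of(addr);
        return l->data[addr % BLOCK_WORDS];   /* finish as a hit            */
    }

    /* Steps 710-713: both miss, so do a normal replacement and refill.    */
    if (l->valid && l->dirty) {               /* step 712                   */
        rwb_flush();                          /* wait while B == 1          */
        rwb.valid = true;
        rwb.base  = victim_base;
        memcpy(rwb.data, l->data, sizeof rwb.data);
    }
    memcpy(l->data, &mem[base_of(addr)], sizeof l->data);  /* burst read    */
    l->valid = true; l->dirty = false; l->tag = tag_of(addr);
    return l->data[addr % BLOCK_WORDS];       /* step 717                   */
}
```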
Fig. 8 is a flowchart of a write operation of the embedded processor system of the present invention. As shown in Figure 8, after the CPU issues a write instruction (step 801), in step 802 the PU evaluates the operation address to determine whether the CPU write address is bufferable, whether it is cacheable and whether the address is protected. If the address is protected, the PU returns an error (step 803).
In step 804, if the write address is not bufferable (and therefore necessarily not cacheable), the CPU writes the data directly to the corresponding location in main memory through the AHB bus interface (step 805), and the write completes (step 818).
If the PU judges the write address to be bufferable (but not cacheable), a buffered write is performed. In step 807, the logic first checks whether the general write buffer is full. If it is full, then in step 808 the CPU is stalled until data in the general write buffer has been written to main memory and space is freed, after which step 809 is executed. If the general write buffer has room, then in step 809 the CPU writes the data directly into the general write buffer, which writes it to main memory when the bus is idle. At this point the CPU write has completed correctly (step 818).
In step 806, if the PU judges the CPU write address to be cacheable, the cache control logic compares the CPU write address with the tags in the cache (step 810). If the address matches a tag in the cache, i.e. a cache hit, the CPU writes the data into the hit block in the cache. If the hit block in the cache is dirty at that moment, the dirty block is first written into the replacement write buffer, and then the CPU write data is written into the block, completing the write.
At the same time, in step 815, the cache control logic compares the CPU write address with the addresses in the replacement write buffer. If the address matches an address in the replacement write buffer, i.e. the replacement write buffer hits, then in step 816 a data block exchange is performed between the replacement write buffer and the cache. If the transfer status bit of the replacement write buffer is "1", the logic waits for the buffer's bus write transaction to complete and for the bit to be reset. If the transfer status bit is "0", the hit block in the replacement write buffer is written back into the cache; if the cache block being displaced is clean, it is directly discarded, and if it is dirty, it is written into the replacement write buffer, which writes it to main memory when the bus is idle. After the exchange completes, in step 817 the CPU writes the data into the cache as a hit, and the CPU write completes (step 818).
For a cacheable write where both the cache and the replacement write buffer miss, the CPU writes the data directly into the general write buffer; if the general write buffer is full, it first waits for space to be freed. At the same time, a normal replacement is performed in the cache:
In step 811, the cache control logic checks whether the block to be replaced in the cache is dirty. If it is dirty, then in step 812, once the transfer status bit of the replacement write buffer is "0", the dirty block is written into the replacement write buffer, which writes it to main memory when the bus is idle. If it is not dirty, then in step 813 the cache control logic drives the AHB bus interface to burst-read the block at the CPU write address from main memory; in step 814 the fetched block is written into the cache, and the CPU write completes correctly (step 818).
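The bufferable and non-bufferable branches of Figure 8 (steps 804-809) can likewise be sketched as follows; the cacheable branch is only indicated by a comment because it mirrors the read-path sketch above. Sizes and names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define GWB_ENTRIES 4
#define MEM_WORDS   1024

static uint32_t mem[MEM_WORDS];               /* word-addressed main memory */

/* General write buffer: independent (address, data) pairs, drained to
 * main memory when the bus is idle. */
static struct { uint32_t addr[GWB_ENTRIES], data[GWB_ENTRIES]; int n; } gwb;

static void gwb_drain(void)                   /* step 808: empty the buffer */
{
    for (int i = 0; i < gwb.n; i++)
        mem[gwb.addr[i]] = gwb.data[i];       /* single bus writes          */
    gwb.n = 0;
}

void cpu_write(uint32_t addr, uint32_t data, bool bufferable, bool cacheable)
{
    if (!bufferable) {                        /* step 804: not bufferable   */
        mem[addr] = data;                     /* step 805: straight to RAM  */
        return;
    }
    if (!cacheable) {                         /* bufferable, not cacheable  */
        if (gwb.n == GWB_ENTRIES)             /* step 807: buffer full?     */
            gwb_drain();                      /* step 808: stall for space  */
        gwb.addr[gwb.n] = addr;               /* step 809: post the write   */
        gwb.data[gwb.n] = data;
        gwb.n++;
        return;
    }
    /* Cacheable path (steps 810-817): tag compare, replacement write
     * buffer compare and block exchange, as in the read-path sketch. */
}
```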
In the CPU read/write flows above, the data block to be replaced in the cache may be chosen by the least recently used (LRU) algorithm, but the present invention is not limited to this; other existing replacement algorithms such as first-in first-out (FIFO), random, round-robin or pseudo-LRU algorithms may also be used.
In the CPU read/write flows above, to keep continued operations intact, the MUX arbitrates priority among the read/write operations generated by the cache, the general write buffer write operations and the replacement write buffer write operations. The priority order is: the follow-on part of any continued operation is highest, then the read/write operations generated by the cache, then general write buffer write operations, and finally replacement write buffer write operations. If the replacement write buffer hits, its data will be read back into the cache; whether that data has already been written to main memory does not cause an error, but if it has, the write to main memory amounts to a useless bus write transaction that wastes bus bandwidth. The replacement write buffer's write operation is therefore given the lowest priority, so that its writes on the AHB bus are deferred as long as possible and bandwidth waste is reduced.
When the three kinds of operations above do not conflict, whichever request arrives first is served: the MUX pulls down the READY signals of the general write buffer, the replacement write buffer and the cache until the operation completes.
If the three kinds of operations conflict, the MUX first pulls down the READY signals of the general write buffer, the replacement write buffer and the cache simultaneously, then handles one requester according to priority while the lower-priority operations are held in registers. When processing finishes, the served requester's READY signal is driven high for one clock cycle to signal completion, and the MUX checks whether that requester has a follow-on operation. If it has none, the stored lower-priority operation is handled; if it does, the follow-on operation is arbitrated again together with the stored operations, the highest-priority one is executed, and the lower-priority ones remain stored. A typical timing diagram of the MUX is shown in Figure 9.
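A minimal sketch of the priority rule applied by the MUX is given below; the request encoding is an assumption, and the READY-signal handshaking of Figure 9 is not modeled.

```c
#include <stdbool.h>

typedef enum {                       /* smaller value = higher priority  */
    REQ_CONTINUATION = 0,            /* follow-on beat of a continued op */
    REQ_CACHE_RW     = 1,            /* cache-generated read/write       */
    REQ_GWB_WRITE    = 2,            /* general write buffer write       */
    REQ_RWB_WRITE    = 3             /* replacement write buffer write   */
} req_kind_t;

typedef struct { bool pending; req_kind_t kind; } bus_req_t;

/* Pick the pending request with the highest priority; lower-priority
 * requests stay pending (i.e. are "deposited") until the bus frees up. */
static int mux_select(const bus_req_t req[], int n)
{
    int winner = -1;
    for (int i = 0; i < n; i++) {
        if (!req[i].pending)
            continue;
        if (winner < 0 || req[i].kind < req[winner].kind)
            winner = i;
    }
    return winner;                   /* -1 means no request is pending   */
}
```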
In the embedded processor system of the present invention, the different CPU operation requests and their corresponding wait cycles are shown in Table 1 below:
No.   CPU operation request                                                              CPU wait cycles
1     Read/write hit                                                                     0
2     Non-cacheable read                                                                 1+N+T
3     Non-cacheable and non-bufferable write                                             1+N
4     Non-cacheable but bufferable write                                                 1+W*(a%)
5     Read/write miss, victim (replacement write buffer) hit, replaced block dirty       1+L*(b%)+2
6     Read/write miss, victim (replacement write buffer) hit, replaced block not dirty   1+L*(b%)+1
7     Read/write miss, victim (replacement write buffer) miss, replaced block dirty      1+N+7*S+L*(b%)
8     Read/write miss, victim (replacement write buffer) miss, replaced block not dirty  1+N+7*S
Table 1: CPU wait cycles for the different operations. Here, the leading 1 is the cycle used to judge whether the cache hits; T is the time spent waiting for the general write buffer to drain; W is the time for the full general write buffer to free one data slot (the probability of this case is assumed to be a%, which is low); N is the number of cycles consumed by a single bus operation; S is the number of cycles consumed by each beat of a continued bus operation; and L is the average time spent waiting for the replacement write buffer to drain (the probability of this case is assumed to be b%, which is extremely low).
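For orientation only, the formulas of Table 1 can be evaluated with assumed parameter values; the values of N, S, T, W, L, a and b below are illustrative and are not taken from the patent.

```c
/* Worked example of the wait-cycle formulas in Table 1 under purely
 * illustrative parameter values. */
#include <stdio.h>

int main(void)
{
    double N = 2, S = 1, T = 4, W = 4, L = 8;   /* cycles (assumed)         */
    double a = 0.05, b = 0.01;                  /* occurrence probabilities */

    printf("non-cacheable read          : %.2f\n", 1 + N + T);
    printf("non-bufferable write        : %.2f\n", 1 + N);
    printf("bufferable write            : %.2f\n", 1 + W * a);
    printf("miss, RWB hit, victim dirty : %.2f\n", 1 + L * b + 2);
    printf("miss, RWB hit, victim clean : %.2f\n", 1 + L * b + 1);
    printf("both miss, victim dirty     : %.2f\n", 1 + N + 7 * S + L * b);
    printf("both miss, victim clean     : %.2f\n", 1 + N + 7 * S);
    return 0;
}
```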
As the table shows, the embedded processor system of the present invention reduces the wait time of non-cacheable reads. A non-cacheable read can proceed only after the write buffer has drained. With the prior-art single write buffer, the read must wait until both the bufferable (Buffer) data and the replacement data have been written out; with the split write buffer of the present invention, only the cacheable data in the general write buffer needs to drain, and there is no need to wait for the replacement data to be written out.
In addition, the embedded processor system of the present invention adds two operations: "cache miss, replacement write buffer hit, replaced block dirty" and "cache miss, replacement write buffer hit, replaced block not dirty". These two operations effectively reduce the conflict misses that occur when too many instruction and data blocks map to the same set, a block is evicted, and that block is then needed again, thereby improving the hit rate of the cache.
The embedded processor system of the present invention uses a dedicated replacement write buffer, reducing the wait cycles when both the cache and the replacement write buffer miss and the replaced block is dirty. The present invention writes the replaced data directly into the replacement write buffer, whereas in existing processors this operation also has to wait for the write buffer to drain.
In the specific embodiments described above with reference to the drawings, the cache is a von Neumann-structure cache in which instructions and data are stored together, but the present invention is not limited to this. From the disclosure above, those skilled in the art will appreciate that the present invention also applies to a Harvard-structure cache in which instructions and data are stored separately.

Claims (17)

1. A data operating method of an embedded processor system, characterized by comprising:
while comparing the processor operation address with the tags in the cache, comparing said operation address with the addresses in the replacement write buffer;
if said replacement write buffer hits, exchanging the hit data block in the replacement write buffer with a data block in the cache.
2. The data operating method of an embedded processor system according to claim 1, characterized in that exchanging the hit data block in the replacement write buffer with a data block in the cache comprises:
if the transfer status bit of said replacement write buffer is "1", waiting for the bus write transaction of said replacement write buffer to complete and resetting said transfer status bit;
reading the hit data block in said replacement write buffer into said cache.
3. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said cache hits, the processor directly reading the hit data block in the cache.
4. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said cache and said replacement write buffer both miss, the processor directly reading the data corresponding to said operation address from main memory and writing said data into the cache.
5. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said processor read-operation address is not cacheable, draining said general write buffer and then reading the data corresponding to said read address directly from main memory.
6. The data operating method of an embedded processor system according to claim 1, characterized in that the method further comprises:
if said processor write-operation address is not cacheable, writing the data directly into said general write buffer, which writes it to main memory when the bus is idle.
7. The data operating method of an embedded processor system according to any one of claims 1 to 6, characterized in that if the data block being replaced in the cache is dirty, said replaced data block is written into the replacement write buffer, which writes it to main memory when the bus is idle; if the replaced data block in the cache is a clean block, it is directly discarded.
8. The data operating method of an embedded processor system according to claim 7, characterized in that the method further comprises:
arbitrating priority among cache read/write operations, general write buffer write operations and replacement write buffer write operations.
9. The data operating method of an embedded processor system according to claim 8, characterized in that said priority order is: the follow-on part of any continued operation is highest, then cache read/write operations, then general write buffer write operations, and finally replacement write buffer write operations.
10. The data operating method of an embedded processor system according to claim 1, characterized in that the data block to be replaced in the cache is determined by the LRU algorithm, a random algorithm, the FIFO algorithm, a round-robin algorithm or a pseudo-LRU algorithm.
11. An embedded processor system, characterized by comprising:
a processor, which executes instructions and read/write operations;
a cache, connected between the processor and main memory, providing high-speed data access for the processor;
a general write buffer, connected between the processor and main memory, storing the processor's cacheable write data;
a replacement write buffer, connected between the cache and main memory, storing dirty data replaced out of the cache and, on a hit, exchanging data blocks with the cache.
12. The embedded processor system according to claim 11, characterized by further comprising cache control logic that handles the processor's operation requests and, while comparing the processor operation address with the tags in the cache, also compares said operation address with the addresses in the replacement write buffer.
13. The embedded processor system according to claim 12, characterized by further comprising a multiplexer that arbitrates priority among the bus transfer requests of the cache controller, the general write buffer and the replacement write buffer.
14. The embedded processor system according to claim 11, characterized in that said processor further comprises a processing logic unit for judging whether said processor operation address is cacheable or bufferable.
15. The embedded processor system according to claim 11, characterized in that the length of said general write buffer is 4 words.
16. The embedded processor system according to claim 11, characterized in that the length of said replacement write buffer equals the length of a data block of said cache.
17. The embedded processor system according to claim 11 or 16, characterized in that said replacement write buffer is provided with a transfer status bit; when said transfer status bit is "0", the bus write transaction of said replacement write buffer has not started or has completed, and when said transfer status bit is "1", the bus write transaction of said replacement write buffer is in progress.
CNB2005101018520A 2005-11-25 2005-11-25 Embedded processor system and its data operating method Active CN100419715C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2005101018520A CN100419715C (en) 2005-11-25 2005-11-25 Embedded processor system and its data operating method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2005101018520A CN100419715C (en) 2005-11-25 2005-11-25 Embedded processor system and its data operating method

Publications (2)

Publication Number Publication Date
CN1851677A true CN1851677A (en) 2006-10-25
CN100419715C CN100419715C (en) 2008-09-17

Family

ID=37133156

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2005101018520A Active CN100419715C (en) 2005-11-25 2005-11-25 Embedded processor system and its data operating method

Country Status (1)

Country Link
CN (1) CN100419715C (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103549A (en) * 2009-12-18 2011-06-22 上海华虹集成电路有限责任公司 Method for replacing cache
WO2012083754A1 (en) * 2011-10-20 2012-06-28 华为技术有限公司 Method and device for processing dirty data
CN102646071A (en) * 2012-02-17 2012-08-22 中国科学院微电子研究所 Device and method for executing write hit operation of high-speed buffer memory at single period
CN103548005A (en) * 2011-12-13 2014-01-29 华为技术有限公司 Method and device for replacing cache objects
CN104106061A (en) * 2012-02-08 2014-10-15 国际商业机器公司 Forward progress mechanism for stores in the presence of load contention in a system favoring loads
CN104169892A (en) * 2012-03-28 2014-11-26 华为技术有限公司 Concurrently accessed set associative overflow cache
CN108132758A (en) * 2018-01-10 2018-06-08 湖南国科微电子股份有限公司 A kind of Buffer management methods, system and its application
CN108874517A (en) * 2018-04-19 2018-11-23 华侨大学 The stand-by system availability of fixed priority divides energy consumption optimization method
CN109716308A (en) * 2016-09-29 2019-05-03 高通股份有限公司 For reducing the cache memory clock generation circuit of power consumption and reading error in cache memory
CN112068945A (en) * 2020-09-16 2020-12-11 厦门势拓御能科技有限公司 Priority reversal method in optimized embedded system
CN112612727A (en) * 2020-12-08 2021-04-06 海光信息技术股份有限公司 Cache line replacement method and device and electronic equipment
CN114528230A (en) * 2022-04-21 2022-05-24 飞腾信息技术有限公司 Cache data processing method and device and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6425058B1 (en) * 1999-09-07 2002-07-23 International Business Machines Corporation Cache management mechanism to enable information-type dependent cache policies
US7024545B1 (en) * 2001-07-24 2006-04-04 Advanced Micro Devices, Inc. Hybrid branch prediction device with two levels of branch prediction cache
US7120748B2 (en) * 2003-09-04 2006-10-10 International Business Machines Corporation Software-controlled cache set management
US7136967B2 (en) * 2003-12-09 2006-11-14 International Business Machinces Corporation Multi-level cache having overlapping congruence groups of associativity sets in different cache levels

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102103549A (en) * 2009-12-18 2011-06-22 上海华虹集成电路有限责任公司 Method for replacing cache
WO2012083754A1 (en) * 2011-10-20 2012-06-28 华为技术有限公司 Method and device for processing dirty data
CN102725752A (en) * 2011-10-20 2012-10-10 华为技术有限公司 Method and device for processing dirty data
CN102725752B (en) * 2011-10-20 2014-07-16 华为技术有限公司 Method and device for processing dirty data
CN103548005B (en) * 2011-12-13 2016-03-30 华为技术有限公司 Replace the method and apparatus of cache object
CN103548005A (en) * 2011-12-13 2014-01-29 华为技术有限公司 Method and device for replacing cache objects
CN104106061A (en) * 2012-02-08 2014-10-15 国际商业机器公司 Forward progress mechanism for stores in the presence of load contention in a system favoring loads
CN102646071A (en) * 2012-02-17 2012-08-22 中国科学院微电子研究所 Device and method for executing write hit operation of high-speed buffer memory at single period
CN102646071B (en) * 2012-02-17 2014-07-30 中国科学院微电子研究所 Device and method for executing write hit operation of high-speed buffer memory at single period
CN104169892A (en) * 2012-03-28 2014-11-26 华为技术有限公司 Concurrently accessed set associative overflow cache
CN109716308A (en) * 2016-09-29 2019-05-03 高通股份有限公司 For reducing the cache memory clock generation circuit of power consumption and reading error in cache memory
CN108132758A (en) * 2018-01-10 2018-06-08 湖南国科微电子股份有限公司 A kind of Buffer management methods, system and its application
CN108874517A (en) * 2018-04-19 2018-11-23 华侨大学 The stand-by system availability of fixed priority divides energy consumption optimization method
CN108874517B (en) * 2018-04-19 2021-11-02 华侨大学 Method for optimizing utilization rate division energy consumption of standby system with fixed priority
CN112068945A (en) * 2020-09-16 2020-12-11 厦门势拓御能科技有限公司 Priority reversal method in optimized embedded system
CN112068945B (en) * 2020-09-16 2024-05-31 厦门势拓御能科技有限公司 Priority reversing method in optimized embedded system
CN112612727A (en) * 2020-12-08 2021-04-06 海光信息技术股份有限公司 Cache line replacement method and device and electronic equipment
CN114528230A (en) * 2022-04-21 2022-05-24 飞腾信息技术有限公司 Cache data processing method and device and electronic equipment

Also Published As

Publication number Publication date
CN100419715C (en) 2008-09-17

Similar Documents

Publication Publication Date Title
CN1851677A (en) Embedded processor system and its data operating method
EP2430551B1 (en) Cache coherent support for flash in a memory hierarchy
US9208084B2 (en) Extended main memory hierarchy having flash memory for page fault handling
CN103714015B (en) Method device and system for reducing back invalidation transactions from a snoop filter
CN1851673A (en) Processor system and its data operating method
US20210406170A1 (en) Flash-Based Coprocessor
CN102576333B (en) Data cache in nonvolatile memory
US8370533B2 (en) Executing flash storage access requests
WO2020176795A1 (en) Use of outstanding command queues for separate read-only cache and write-read cache in a memory sub-system
TW201903612A (en) Memory module and method for operating memory module
CN1820257A (en) Microprocessor including a first level cache and a second level cache having different cache line sizes
JPH04233641A (en) Method and apparatus for data pre-fetch
US20080301371A1 (en) Memory Cache Control Arrangement and a Method of Performing a Coherency Operation Therefor
CN107589908B (en) Merging method based on non-aligned updated data in solid-state disk cache system
CN1425154A (en) Cache line flush micro-architectural implementation method ans system
US20130326145A1 (en) Methods and apparatus for efficient communication between caches in hierarchical caching design
US20220179798A1 (en) Separate read-only cache and write-read cache in a memory sub-system
US8019939B2 (en) Detecting data mining processes to increase caching efficiency
CN101038567A (en) Method, system, apparatus for performing cacheline polling operation
CN116134475A (en) Computer memory expansion device and method of operating the same
ITRM20060046A1 (en) METHOD AND SYSTEM FOR THE USE OF CACHE BY MEANS OF PREFETCH REQUEST LIMITATIONS
WO2020176828A1 (en) Priority scheduling in queues to access cache data in a memory sub-system
JPH10214226A (en) Method and system for strengthening memory performance of processor by removing old line of second level cache
JP3326189B2 (en) Computer memory system and data element cleaning method
JPH04250543A (en) Computer memory system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant