CN109697084B - Fast access memory architecture for time division multiplexed pipelined processor


Info

Publication number: CN109697084B
Application number: CN201710977323.XA
Authority: CN (China)
Prior art keywords: memory, thread, different, memories, access memory
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN109697084A
Inventor: 刘欣
Current Assignee: Individual
Original Assignee: Individual
Priority and filing date: 2017-10-22
Publication of CN109697084A: 2019-04-30
Publication of CN109697084B (grant): 2021-04-09

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3869Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Static Random-Access Memory (AREA)

Abstract

A fast access memory architecture for a time division multiplexed pipelined processor, characterized in that: an instruction of the processor completes in N pipeline clock cycles, and in any given pipeline clock cycle each pipeline stage executes an operation of a different thread; all threads multiplexing the pipeline are divided into thread groups according to the times at which they cut into the pipeline, so that threads within one thread group can never appear on the pipeline simultaneously; all threads of each thread group share one or more fast access memories, a feasible storage entity for which is a single-port static random-access memory (SP-SRAM).

Description

Fast access memory architecture for time division multiplexed pipelined processor
Technical Field
The present invention relates to the field of processor architectures, and in particular to the organization of registers and memory for fine-grained multithreaded processors.
Background
In current fine-grained multithreaded processors, registers are generally implemented as a register file of fixed capacity. By grouping threads according to their timing characteristics, the present invention lets the threads of one group share a fast access memory. This memory not only replaces the register file but also provides additional functions, and by dynamically adjusting the size of each fast access memory, its capacity and bandwidth can be used with maximal efficiency.
Disclosure of Invention
A fast access memory architecture for a time division multiplexed pipelined processor, characterized in that: an instruction of the processor completes in N pipeline clock cycles, and in any given pipeline clock cycle each pipeline stage executes an operation of a different thread; all threads multiplexing the pipeline are divided into thread groups according to the times at which they cut into the pipeline, so that threads within one thread group can never appear on the pipeline simultaneously, and all threads of each thread group share one or more fast access memories.
A typical grouping method is as follows. Because the processor pipeline has N stages, one instruction of a thread completes in N pipeline clock cycles, so the maximum instruction rate of any thread is 1/N of the pipeline clock. Threads whose pipeline clock count values at the moment they cut into the pipeline leave the same remainder when divided by N can never appear on the pipeline simultaneously, and therefore their accesses to the fast access memory never coincide. Threads that never occupy the pipeline at the same time are placed in one group and may share one or more memories.
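As an illustration of this grouping rule, the following sketch (parameter names and the value of N are assumptions for illustration, not from the patent) assigns each thread to a group by the remainder of its cut-in pipeline clock count modulo N:

# Sketch: group threads by cut-in time modulo the pipeline depth N.
# Threads whose cut-in clock counts share a remainder mod N can never
# occupy the pipeline simultaneously, so they may share one memory.
N = 8  # pipeline depth in stages; an assumption for illustration

def thread_group(cut_in_clock: int, n_stages: int = N) -> int:
    """Group index = cut-in pipeline clock count mod pipeline depth."""
    return cut_in_clock % n_stages

# Threads cutting in at clocks 3, 11 and 19 all map to group 3 and can
# therefore time-share one fast access memory without port conflicts.
assert {thread_group(c) for c in (3, 11, 19)} == {3}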
The most direct application of the fast access memory is to replace the register file of a conventional processor. The pipeline components read and write the fast access memory directly to perform all register functions, and a thread can access the entire storage space of the fast access memory attached to its thread group, so the number of registers can be allocated flexibly and threads within one thread group can exchange information directly and quickly. Furthermore, any part or all of the memory space of a fast access memory may belong to a single thread, be shared by several threads, or form part, or even all, of the main-memory address space.
Raising the read/write clock of the fast access memory, for example running it at twice the pipeline clock, provides threads with greater access bandwidth: one thread can read and write more data within one pipeline clock, or the memory can serve the threads of two thread groups within one pipeline clock, and it can additionally be accessed by other peripherals.
In general, a thread does not access the fast access memory in every pipeline component, so different thread groups can share one fast access memory through time-division multiplexing; in the best case the bandwidth of the memory's read/write port is utilized one hundred percent.
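A minimal sketch of such inter-group time-division multiplexing (the cycle assignments below are illustrative assumptions, not fixed by the patent): since a thread group touches the memory only in some pipeline stages, the idle cycles can be granted to a second group whose accesses fall in the complementary cycles:

# Sketch: two thread groups time-share one single-port memory.
# Group A accesses in pipeline cycles {3, 4, 7, 8} of each 8-cycle
# round; group B's cut-in offset places its accesses in {1, 2, 5, 6},
# so the single read/write port is busy every cycle (100% utilization).
GROUP_A_CYCLES = {3, 4, 7, 8}
GROUP_B_CYCLES = {1, 2, 5, 6}

def port_owner(cycle: int) -> str:
    phase = (cycle - 1) % 8 + 1
    return "group A" if phase in GROUP_A_CYCLES else "group B"

print([port_owner(c) for c in range(1, 9)])
# ['group B', 'group B', 'group A', 'group A',
#  'group B', 'group B', 'group A', 'group A']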
The instruction sets supported by threads in different thread groups may differ. For example, the instructions supported by threads of a simple-function thread group may need to access registers only 3 times, while threads of some complex thread groups need 8 register accesses, so each thread group places a different bandwidth demand on the registers. Different thread groups can therefore be given different bandwidths, that is, different time slices for accessing the fast access memory, and when necessary one or several threads can monopolize a fast access memory.
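One way to picture such unbalanced allocation (a sketch under assumed access counts and window size; the patent does not prescribe this formula) is to divide the access slots of a scheduling window in proportion to each group's register accesses per instruction:

# Sketch: apportion memory access slots in proportion to each thread
# group's register accesses per instruction (counts as in the example
# above; the window size is an assumption).
accesses = {"simple_group": 3, "complex_group": 8}
window = 11  # access slots available per scheduling window

total = sum(accesses.values())
slots = {g: window * n // total for g, n in accesses.items()}
print(slots)  # {'simple_group': 3, 'complex_group': 8}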
The register capacity each thread actually needs also differs, so the capacities of the individual fast access memories differ; composing each fast access memory from several small memories of equal or different capacities uses the storage more effectively.
When the register capacity a thread requires is smaller than the available sub-memory capacity, the surplus can be used as a thread scratch area, a thread information-exchange area, a cache, part of main memory, or storage for other purposes.
In general, a single-port memory is superior to a multi-port memory in both area and power consumption. In the present invention, using single-port memories, in particular single-port static random-access memory (SRAM), saves circuit area while still meeting the access-bandwidth requirements.
Drawings
FIG. 1 is a basic structural diagram of the fast access memory
FIG. 2 is a structural diagram of fast access memory inter-group multiplexing
FIG. 3 is a structural diagram of fast access memory unbalanced time division multiplexing
FIG. 4 is a diagram of fast access memory internal space partitioning
FIG. 5 is a diagram of fast access memory combination
FIG. 6 is a basic timing diagram of the fast access memory
FIG. 7 is a timing diagram of fast access memory inter-group multiplexing
FIG. 8 is a timing diagram of fast access memory unbalanced time division multiplexing
Detailed Description
In the following description, specific details are set forth in order to provide a thorough understanding of the disclosed embodiments. It will be understood by those skilled in the art, however, that the embodiments may be practiced without these specific details. In particular, for functional components such as pipelines, memories and multiplexers, the methods in the embodiments are specific techniques chosen for illustration, and other well-known methods may be substituted for them.
First embodiment
FIG. 1 illustrates a fast access memory architecture and operation of a processor, comprising:
an instruction fetch 1 unit 101 in the 1st stage pipeline, operating at the pipeline clock;
an instruction fetch 2 unit 102 in the 2nd stage pipeline, operating at the pipeline clock;
a decode 1 unit 103 in the 3 rd stage pipeline, operating at the pipeline clock;
a decode 2 unit 104 in the 4 th stage pipeline, operating at the pipeline clock;
an execution 1 unit 105 in the 5 th stage pipeline, operating at the pipeline clock;
an execution 2 component 106 in the 6 th stage pipeline, operating at the pipeline clock;
a write back 1 component 107 in the 7 th stage pipeline, operating at the pipeline clock;
a write back 2 component 108 in the 8 th stage pipeline, operating at the pipeline clock;
the fast access memory 0 component 109 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through the mux selector 117;
the fast access memory 1 component 110 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through the mux selector 117;
the fast access memory 2 component 111 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through the mux selector 117;
the fast access memory 3 component 112 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through a mux selector 117;
the fast access memory 4 component 113 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through a mux selector 117;
the fast access memory 5 component 114 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through a mux selector 117;
the fast access memory 6 component 115 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through the mux selector 117;
the fast access memory 7 component 116 is a single port Static Random Access Memory (SRAM) operating at the pipeline clock frequency and coupled to the pipeline through a mux selector 117.
The pipeline accesses the fast access memory in, and only in, the decode 1 unit 103 of the 3rd stage pipeline, the decode 2 unit 104 of the 4th stage pipeline, the write back 1 unit 107 of the 7th stage pipeline, and the write back 2 unit 108 of the 8th stage pipeline. A pipeline component locates the particular fast access memory through the thread group of the currently running thread and then reads or writes the selected fast access memory through the mux selector 117.
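As a sketch of how a pipeline component might locate its memory through the multiplexer (the names and sizes below are hypothetical; the patent specifies only that selection is by the running thread's group):

# Sketch: a stage selects the fast access memory of the running
# thread's group via the mux; one single-port memory per group,
# as in this embodiment (sizes are illustrative).
memories = {g: bytearray(512) for g in range(8)}  # one SRAM per group

def mux_select(group_id: int) -> bytearray:
    """Return the memory bank wired to this thread group."""
    return memories[group_id]

def write_back(group_id: int, addr: int, value: int) -> None:
    mux_select(group_id)[addr] = value  # the cycle's single-port write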
Fig. 6 is the timing chart corresponding to the first embodiment. For the fast access memory 0 component 109, the operating sequence on the pipeline clock is: cycle 1 idle, cycle 2 idle, cycle 3 accessed by thread group 0, cycle 4 accessed by thread group 0, cycle 5 idle, cycle 6 idle, cycle 7 accessed by thread group 0, and cycle 8 accessed by thread group 0. The remaining fast access memories behave similarly.
Second embodiment
FIG. 2 illustrates a fast access memory thread group multiplexing architecture and form of operation for a processor, comprising:
an instruction fetch 1 unit 201 in the 1st stage pipeline, operating at the pipeline clock;
an instruction fetch 2 unit 202 in the 2nd stage pipeline, operating at the pipeline clock;
a decode 1 unit 203 in the 3 rd stage pipeline, operating at the pipeline clock;
a decode 2 unit 204 in the 4 th stage pipeline, operating at the pipeline clock;
execution 1 unit 205 in stage 5 pipeline, operating at pipeline clock;
an execution 2 unit 206 in the 6 th stage pipeline, operating at the pipeline clock;
a write back 1 component 207 in the 7 th stage pipeline, operating at the pipeline clock;
a write back 2 component 208 in the 8 th stage pipeline, operating at the pipeline clock;
the fast access memory 0 unit 209 is a single port memory operating at the pipeline clock frequency and connected to the pipeline through the mux selector 213;
the fast access memory 1 unit 210 is a single port memory operating at the pipeline clock frequency and connected to the pipeline through the multiplexer selector 213;
the fast access memory 2 component 211 is a single port memory operating at the pipeline clock frequency and connected to the pipeline through the mux selector 213;
the fast access memory 3 component 212 is a single port memory operating at the pipeline clock frequency and is coupled to the pipeline through a multiplexer selector 213.
The pipeline accesses the fast access memory in, and only in, the decode 1 unit 203 of the 3rd stage pipeline, the decode 2 unit 204 of the 4th stage pipeline, the write back 1 unit 207 of the 7th stage pipeline, and the write back 2 unit 208 of the 8th stage pipeline. A pipeline component locates the particular fast access memory through the thread group of the currently running thread and then reads or writes the selected fast access memory through the mux selector 213.
In this embodiment, two mutually non-conflicting thread groups are selected, and the same fast access memory is time-division multiplexed between the two thread groups, so the bandwidth utilization of the fast access memory reaches 100%.
Fig. 7 is the timing chart corresponding to the second embodiment. For the fast access memory 0 component 209, the operating state on the pipeline clock is: cycle 1 accessed by thread group 2, cycle 2 accessed by thread group 2, cycle 3 accessed by thread group 0, cycle 4 accessed by thread group 0, cycle 5 accessed by thread group 2, cycle 6 accessed by thread group 2, cycle 7 accessed by thread group 0, and cycle 8 accessed by thread group 0. The remaining fast access memories behave similarly. In this embodiment the bandwidth of the fast access memory is fully occupied, but by doubling its read/write clock additional bandwidth can be opened up; this extra access capability can serve other peripherals, and the fast access memory can also be opened as common system memory space accessible to all thread groups.
Third embodiment
FIG. 3 illustrates an unbalanced time division multiplexing architecture between groups of fast access memory threads of a processor, comprising:
an instruction fetch 1 unit 301 in the 1st stage pipeline, operating at the pipeline clock;
an instruction fetch 2 unit 302 in the 2nd stage pipeline, operating at the pipeline clock;
decode 1 unit 303, located in the 3 rd stage pipeline, operating at the pipeline clock;
a decode 2 unit 304 in the 4 th stage pipeline, operating at the pipeline clock;
execution 1 unit 305 in the 5 th stage pipeline, operating at the pipeline clock;
an execution 2 component 306 in the 6 th stage pipeline, operating at the pipeline clock;
a write-back 1 section 307 in the 7 th stage pipeline, operating at the pipeline clock;
a write back 2 component 308 at stage 8 pipeline, operating at the pipeline clock;
the fast access memory 0 unit 309 is a single port memory operating at the pipeline clock frequency, connected to the pipeline through the mux selector 313;
the fast access memory 1 component 310 is a single port memory operating at the pipeline clock frequency and coupled to the pipeline through the mux selector 313;
the fast access memory 2 unit 311 is a single port memory operating at the pipeline clock frequency and connected to the pipeline through the multiplexer 313;
the fast access memory 3 component 312 is a single port memory operating at the pipeline clock frequency and coupled to the pipeline through the mux selector 313.
Because the thread groups support different instruction sets, for threads of thread group 2 the pipeline may access the fast access memory in the decode 1 unit 303 of the 3rd stage pipeline, the decode 2 unit 304 of the 4th stage pipeline, the execution 1 unit 305 of the 5th stage pipeline, the write back 1 unit 307 of the 7th stage pipeline, and the write back 2 unit 308 of the 8th stage pipeline. For threads of thread group 4, the pipeline accesses the fast access memory only in the decode 1 unit 303 of the 3rd stage pipeline, the decode 2 unit 304 of the 4th stage pipeline, and the write back 2 unit 308 of the 8th stage pipeline. For the remaining thread groups, the fast access memory is accessed only in the decode 1 unit 303 of the 3rd stage pipeline, the decode 2 unit 304 of the 4th stage pipeline, the write back 1 unit 307 of the 7th stage pipeline, and the write back 2 unit 308 of the 8th stage pipeline. A pipeline component locates the particular fast access memory through the thread group of the currently running thread and then reads or writes the selected fast access memory through the mux selector 313.
Fig. 8 is the timing chart corresponding to the third embodiment. For the fast access memory 2 component 311, the operating state on the pipeline clock is: cycle 1 accessed by thread group 2, cycle 2 accessed by thread group 2, cycle 3 accessed by thread group 2, cycle 4 accessed by thread group 4, cycle 5 accessed by thread group 2, cycle 6 accessed by thread group 2, cycle 7 accessed by thread group 4, and cycle 8 accessed by thread group 2.
For the fast access memory 0 component 309, the operating state on the pipeline clock is: cycle 1 accessed by thread group 6, cycle 2 accessed by thread group 6, cycle 3 accessed by thread group 0, cycle 4 accessed by thread group 0, cycle 5 accessed by thread group 6, cycle 6 accessed by thread group 6, cycle 7 accessed by thread group 0, and cycle 8 accessed by thread group 0. The remaining fast access memories behave similarly.
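The unbalanced schedule of fast access memory 2 described above can be captured as a simple cycle-to-group table (a sketch that mirrors the stated timing; the representation itself is not from the patent):

# Sketch: unbalanced schedule for fast access memory 2 (component 311),
# per the timing described above: thread group 2 owns 6 of every
# 8 pipeline cycles, thread group 4 owns 2.
SCHEDULE_MEM2 = [2, 2, 2, 4, 2, 2, 4, 2]  # owner group for cycles 1..8

def owner(cycle: int) -> int:
    return SCHEDULE_MEM2[(cycle - 1) % 8]

assert SCHEDULE_MEM2.count(2) == 6 and SCHEDULE_MEM2.count(4) == 2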
Fourth embodiment
FIG. 4 illustrates fast access memory internal space partitioning for a processor.
The fast access memory 0 unit 401 has a capacity of 512 bytes and is divided, in order and as needed, into 32 bytes dedicated to thread 0, 64 bytes dedicated to thread 1, 128 bytes shared by all threads, 128 bytes of main memory, and 64 bytes dedicated to thread 2. This fast access memory exposes external bandwidth for access by the remaining thread groups, and the 128-byte main-memory region inside it is accessible as memory space to all threads.
The fast access memory 1 unit 402 has a capacity of 32 bytes and is allocated, as needed, entirely as 32 bytes dedicated to thread 3.
The fast access memory 2 unit 403 has a capacity of 64 bytes and is divided, in order and as needed, into 32 bytes dedicated to thread 4 and 32 bytes dedicated to thread 5.
The fast access memory 3 unit 404 has a capacity of 128 bytes and is divided, in order and as needed, into 32 bytes dedicated to thread 6, 32 bytes dedicated to thread 7, 48 bytes shared by threads 6 and 7, and 16 bytes of main memory. This fast access memory exposes no external bandwidth to other thread groups; the 16-byte main-memory region inside it can be accessed only by the threads of the thread group to which the memory is allocated, and no other thread can reach this space, so the stored data is protected.
It should be noted that a thread within a thread group can access all of the space allocated to that group in the fast access memory, even space not allocated to the thread itself; the space a thread owns is therefore exclusive only in the logical domain and is not physically enforced, and the registers of a particular thread are simply a set of data that can be stored anywhere in the on-processor fast access memory.
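The partitioning of fast access memory 0 in this embodiment can be pictured as a region map (a sketch; the tuple layout and lookup function are illustrative additions, not part of the patent):

# Sketch: region map of fast access memory 0 (512 bytes) as divided
# above; entries are (start, size, owner). The 96 bytes left over
# after the listed regions remain unallocated in this sketch.
REGIONS_MEM0 = [
    (0,   32,  "thread 0 private"),
    (32,  64,  "thread 1 private"),
    (96,  128, "shared by all threads"),
    (224, 128, "main-memory window"),
    (352, 64,  "thread 2 private"),
]

def owner_of(addr: int) -> str:
    for start, size, owner in REGIONS_MEM0:
        if start <= addr < start + size:
            return owner
    return "unallocated"

print(owner_of(100))  # 'shared by all threads'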
Fifth embodiment
Fig. 5 is a diagram of fast access memory combination. Here the fast access memory is only a virtual, conceptual memory: it has no fixed physical entity of its own but is assembled from specific physical memories as needed.
The fast access memory 0 unit 501 is a 7K-byte logical memory composed of a 1K-byte single-port memory, a 2K-byte single-port memory, and a 4K-byte single-port memory.
The fast access memory 1 component 502 is a 3K byte logical memory that is a combination of a 1K byte single port memory and a 2K byte single port memory.
The fast access memory 2 component 503 is a 2K byte logical memory that is a combination of one 1K byte single port memory and another 1K byte single port memory.
The fast access memory 3 component 504 is a 2K byte logical memory directly generated from a 2K byte single port memory.
All physical memories are connected to the multiplexer 505 and, through it, to the pipeline, so that each fast access memory can be dynamically configured by adjusting the data paths of the multiplexer 505.
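A sketch of such composition (the class and its interface are assumptions for illustration; the patent states only that sub-memories combine into one logical address space through the multiplexer):

# Sketch: a logical fast access memory composed of single-port
# sub-memories; a logical address is routed to the sub-memory whose
# range contains it, mirroring memory 0 above (1K + 2K + 4K = 7K).
class LogicalMemory:
    def __init__(self, bank_sizes):
        self.banks = [bytearray(size) for size in bank_sizes]

    def _route(self, addr):
        # Walk the banks in order, subtracting each bank's size
        # until the address falls inside one of them.
        for bank in self.banks:
            if addr < len(bank):
                return bank, addr
            addr -= len(bank)
        raise IndexError("address beyond logical capacity")

    def read(self, addr):
        bank, offset = self._route(addr)
        return bank[offset]

    def write(self, addr, value):
        bank, offset = self._route(addr)
        bank[offset] = value

mem0 = LogicalMemory([1024, 2048, 4096])  # the 7K-byte logical memory
mem0.write(3000, 0x5A)                    # lands in the 2K-byte bank
assert mem0.read(3000) == 0x5A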
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel designs disclosed herein.

Claims (59)

1. A fast access memory usage method for a time division multiplexed pipelined processor, characterized in that: the processor uses an N-stage pipeline, and in any given pipeline clock cycle each pipeline stage executes an operation of a different thread; all threads multiplexing the pipeline are divided into thread groups according to the times at which they cut into the pipeline, threads within one thread group can never appear on the pipeline simultaneously, and all threads of each thread group share one or more fast access memories.
2. The method of claim 1, further comprising: the clock period of the fast access memory is the same as, or in an integer-multiple relationship with, the pipeline clock period.
3. The method of claim 1, further comprising: any part or all of the memory space of the fast access memory can belong to only one thread, be shared by several threads, or belong to a part of main memory.
4. The method of claim 2, further comprising: any part or all of the memory space of the fast access memory can belong to only one thread, be shared by several threads, or belong to a part of main memory.
5. The method of claim 1, further comprising: a thread can access part of, or all of, the memory space of the fast access memory allocated to the thread group to which the thread belongs.
6. The method of claim 2, further comprising: a thread can access part of, or all of, the memory space of the fast access memory allocated to the thread group to which the thread belongs.
7. The method of claim 3, further comprising: a thread can access part of, or all of, the memory space of the fast access memory allocated to the thread group to which the thread belongs.
8. The method of claim 4, further comprising: a thread can access part of, or all of, the memory space of the fast access memory allocated to the thread group to which the thread belongs.
9. The method of claim 1, further comprising: the timing rule for cutting threads into the pipeline is that the pipeline clock count values at the moments the threads of one group cut into the pipeline leave the same remainder when divided by N.
10. The method of claim 2, further comprising: the timing rule for cutting threads into the pipeline is that the pipeline clock count values at the moments the threads of one group cut into the pipeline leave the same remainder when divided by N.
11. The method of claim 3, further comprising: the timing rule for cutting threads into the pipeline is that the pipeline clock count values at the moments the threads of one group cut into the pipeline leave the same remainder when divided by N.
12. The method of claim 4, further comprising: the timing rule for cutting threads into the pipeline is that the pipeline clock count values at the moments the threads of one group cut into the pipeline leave the same remainder when divided by N.
13. The method of claim 1, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
14. The method of claim 2, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
15. The method of claim 3, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
16. The method of claim 4, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
17. The method of claim 5, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
18. The method of claim 6, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
19. The method of claim 7, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
20. The method of claim 8, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
21. The method of claim 12, further comprising: the storage entity of the fast access memory is a single-port static random-access memory (SP-SRAM).
22. The method of claim 1, wherein the threads of different thread groups share fast access memory by time division multiplexing.
23. The method of claim 2, wherein the threads of different thread groups share fast access memory by time division multiplexing.
24. The method of claim 3, wherein the threads of different thread groups share fast access memory by time division multiplexing.
25. The method of claim 4, wherein the threads of different thread groups share fast access memory by time division multiplexing.
26. The method of claim 5, wherein the threads of different thread groups share fast access memory by time division multiplexing.
27. The method of claim 8, wherein the threads of different thread groups share fast access memory by time division multiplexing.
28. The method of claim 12, wherein the threads of different thread groups share fast access memory by time division multiplexing.
29. The method as recited in claim 22, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
30. The method as recited in claim 23, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
31. The method as recited in claim 24, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
32. The method of claim 25, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
33. The method of claim 26, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
34. The method of claim 27, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
35. The method as recited in claim 28, further comprising: the number of access slots for time-division multiplexing of fast access memory by threads of different thread groups may be the same or different.
36. The method of claim 1, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
37. The method of claim 2, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
38. The method of claim 3, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
39. The method of claim 4, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
40. The method of claim 8, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
41. The method as recited in claim 12, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
42. The method as recited in claim 28, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
43. The method of claim 30, further comprising: the fast access memory can be formed by combining a plurality of sub-memories with the same or different capacities according to the actual capacity requirement, and the capacities of the different fast access memories can be the same or different.
44. The method of claim 1, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
45. The method of claim 2, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
46. The method of claim 3, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
47. The method of claim 4, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
48. The method of claim 5, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
49. The method of claim 6, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
50. The method of claim 7, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
51. The method of claim 8, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
52. The method of claim 22, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
53. The method of claim 23, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
54. The method of claim 24, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
55. The method of claim 25, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
56. The method of claim 26, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
57. The method of claim 27, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
58. The method of claim 28, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.
59. The method of claim 30, further comprising: when the physical memory used to compose the fast access memories exceeds actual needs, the excess can be repurposed as thread temporary-information storage, thread information-exchange storage, a cache, main memory, or storage for other purposes.

Priority Applications (1)

Application Number: CN201710977323.XA; Priority Date: 2017-10-22; Filing Date: 2017-10-22; Title: Fast access memory architecture for time division multiplexed pipelined processor

Publications (2)

Publication Number: CN109697084A (en); Publication Date: 2019-04-30
Publication Number: CN109697084B (en); Publication Date: 2021-04-09

Family

ID: 66225026

Family Applications (1)

Application Number: CN201710977323.XA; Title: Fast access memory architecture for time division multiplexed pipelined processor; Priority Date: 2017-10-22; Filing Date: 2017-10-22; Status: Active, granted as CN109697084B

Country Status (1)

Country: CN; Publication: CN109697084B (en)


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7551617B2 (en) * 2005-02-08 2009-06-23 Cisco Technology, Inc. Multi-threaded packet processing architecture with global packet memory, packet recirculation, and coprocessor
US8095932B2 (en) * 2007-08-14 2012-01-10 Intel Corporation Providing quality of service via thread priority in a hyper-threaded microprocessor
US8832712B2 (en) * 2009-09-09 2014-09-09 Ati Technologies Ulc System and method for synchronizing threads using shared memory having different buffer portions for local and remote cores in a multi-processor system
GB201001621D0 (en) * 2010-02-01 2010-03-17 Univ Catholique Louvain A tile-based processor architecture model for high efficiency embedded homogenous multicore platforms
US20140331014A1 (en) * 2013-05-01 2014-11-06 Silicon Graphics International Corp. Scalable Matrix Multiplication in a Shared Memory System
CN104391676B (en) * 2014-11-10 2017-11-10 中国航天科技集团公司第九研究院第七七一研究所 The microprocessor fetching method and its fetching structure of a kind of inexpensive high bandwidth
GB2539958B (en) * 2015-07-03 2019-09-25 Advanced Risc Mach Ltd Data processing systems
CN105183701B (en) * 2015-09-06 2018-06-26 北京北方烽火科技有限公司 1536 point FFT processing modes and relevant device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1426553A * 2000-01-21 2003-06-25 Intel Corporation Method and apparatus for pausing execution in processor
CN1323105A * 2001-03-19 2001-11-21 Shenzhen ZTE Integrated Circuit Design Co., Ltd. Correlator
CN1842769A * 2003-08-28 2006-10-04 MIPS Technologies, Inc. Instruction for initiation of concurrent instruction streams in a multithreading microprocessor
CN101322111A * 2005-04-07 2008-12-10 Sandbridge Technologies, Inc. Multithreading processor in which each thread has multiple concurrent pipelines
CN102369508A * 2008-09-04 2012-03-07 Synopsys, Inc. Temporally-assisted resource sharing in electronic systems

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on a novel time-division-switched FPGA interconnect structure; 余慧; China Masters' Theses Full-text Database, Information Science and Technology; 2013-02-15; I135-11 *



Legal Events

Code: PB01; Title: Publication
Code: SE01; Title: Entry into force of request for substantive examination
Code: GR01; Title: Patent grant