WO2016131175A1 - Method and device for accessing a data visitor directory in a multi-core system - Google Patents

Method and device for accessing a data visitor directory in a multi-core system

Info

Publication number
WO2016131175A1
Authority
WO
WIPO (PCT)
Prior art keywords
entry
shared
single pointer
data block
pointer entry
Prior art date
Application number
PCT/CN2015/073192
Other languages
English (en)
French (fr)
Inventor
顾雄礼
方磊
蔡卫光
刘鹏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to BR112017017306-9A priority Critical patent/BR112017017306B1/pt
Priority to CA2976132A priority patent/CA2976132A1/en
Priority to CN201580001247.8A priority patent/CN106164874B/zh
Priority to PCT/CN2015/073192 priority patent/WO2016131175A1/zh
Priority to CN202010209306.3A priority patent/CN111488293B/zh
Priority to EP15882315.3A priority patent/EP3249539B1/en
Priority to SG11201706340TA priority patent/SG11201706340TA/en
Priority to JP2017542831A priority patent/JP6343722B2/ja
Priority to KR1020177023526A priority patent/KR102027391B1/ko
Publication of WO2016131175A1 publication Critical patent/WO2016131175A1/zh
Priority to US15/675,929 priority patent/US20170364442A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811 Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • G06F12/0815 Cache consistency protocols
    • G06F12/0817 Cache consistency protocols using directory methods
    • G06F12/0822 Copy directories
    • G06F12/0828 Cache consistency protocols using directory methods with concurrent directory accessing, i.e. handling multiple concurrent coherency transactions
    • G06F12/084 Multiuser, multiprocessor or multiprocessing cache systems with a shared cache
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846 Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing
    • G06F2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F2212/10 Providing a specific technical effect
    • G06F2212/1041 Resource optimization
    • G06F2212/1044 Space efficiency improvement
    • G06F2212/28 Using a specific disk cache architecture
    • G06F2212/283 Plural cache memories
    • G06F2212/31 Providing disk cache in a specific location of a storage system
    • G06F2212/314 In storage network, e.g. network attached cache
    • G06F2212/60 Details of cache memory
    • G06F2212/6032 Way prediction in set-associative cache
    • G06F2212/62 Details of cache specific to multiprocessor cache arrangements
    • G06F2212/621 Coherency control relating to peripheral accessing, e.g. from DMA or I/O device

Definitions

  • The present invention relates to the field of information technology, and more particularly to a method for accessing a data visitor directory in a multi-core system, a directory cache device, a multi-core system, and a directory storage unit.
  • A data block is typically stored in shared storage space so that one or more processor cores can access it.
  • A copy of the data block is created in the private cache of each processor core that has accessed it (i.e., the data block is stored in that core's private cache), so that when a core that has already accessed the data block needs it again, it only has to read the data block from its own private cache.
  • Commonly used cache coherence solutions include snoop-based (listening-based) solutions and directory-based solutions.
  • In a snoop-based solution, when the copy of a data block in one core is modified, a broadcast message announcing the modification must be sent to the other cores that store a copy of the data block, so as to notify those cores to update or invalidate their copies.
  • In a directory-based solution, a directory records the visitor list of each data block (i.e., the cores in the multi-core processor that store the data block), and when the copy of the data block in one core is modified, a notification that the data block has been modified is sent only to the other visitors.
  • The continued growth in the number of processor cores has made snoop-based solutions a performance and bandwidth bottleneck (broadcast messages consume substantial processing resources and bandwidth), while directory-based coherence protocols are widely used because they scale better.
  • A directory commonly records the visitor list of a data block in the form of a vector.
  • For a multi-core system with N cores, each directory entry contains an N-bit vector, and whether each bit in the vector is 1 indicates whether the corresponding core holds a copy of a given data block.
  • With this approach, the size of each directory entry grows linearly with the number of cores, whereas the size of the cache used to store copies of data blocks does not grow with the number of cores. As a result, the ratio of the number of bits occupied by the directory to the number of bits occupied by the data blocks increases with the number of cores, so the storage space needed for the directory becomes larger and larger, which challenges the cache space of an on-chip multi-core processor.
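The growth described above can be illustrated with a small calculation. The following is only a sketch: the 64-byte block size is an assumption that does not come from the text, tag bits are ignored, and the function names `bitvector_entry` and `directory_overhead` are hypothetical.

```python
def bitvector_entry(holders, n_cores):
    """Build an N-bit sharer vector: bit i is 1 if core i holds a copy."""
    vec = 0
    for core in holders:
        if not 0 <= core < n_cores:
            raise ValueError("core id out of range")
        vec |= 1 << core
    return vec


def directory_overhead(n_cores, block_bytes=64):
    """Ratio of directory bits to data bits for one block.

    Assumes one N-bit vector per block and an assumed block size of
    `block_bytes` bytes; tag and state bits are ignored.
    """
    return n_cores / (block_bytes * 8)


if __name__ == "__main__":
    print(bin(bitvector_entry([0, 3], 8)))
    for n in (16, 128, 512):
        print(n, directory_overhead(n))
```

Under these assumptions, the vector alone matches the size of the data block once the core count reaches the number of bits in a block.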
  • The embodiments of the present invention provide a method for accessing a data visitor directory in a multi-core system, a directory cache device, a multi-core system, and a directory storage unit, which can reduce the storage resources occupied by the data visitor directory.
  • A method for accessing a data visitor directory in a multi-core system is provided, the multi-core system comprising a shared data cache and a plurality of processor cores, the data blocks in the shared data cache being copied to one or more of the plurality of processor cores, the multi-core system further comprising a data visitor directory, the data visitor directory being used to record visitor information of the data blocks in the shared data cache, a visitor of a data block being a processor core that holds a copy of the data block;
  • the directory includes a single-pointer entry array and a shared entry array, wherein each single-pointer entry in the single-pointer entry array is used to record information of the unique visitor of a data block, or to record association information between the single-pointer entry and a shared entry in the shared entry array, and each shared entry in the shared entry array is used to record information of multiple visitors of a data block;
  • the method includes:
  • the single-pointer entry in the single-pointer entry array is further used to identify that the data block is shared by all processor cores in the multi-core system, and the method further includes:
  • the method further includes:
  • allocating, for the first data block, a first single-pointer entry corresponding to the first data block in the single-pointer entry array, and recording the information of the first processor core in the first single-pointer entry, includes:
  • a single-pointer entry is selected according to the least recently used principle; if the selected single-pointer entry is not associated with a shared entry and records the information of a unique visitor, an invalidation message is sent to the recorded unique visitor, and the information of the first processor core is then recorded in the selected single-pointer entry;
  • if the selected single-pointer entry indicates that the data block is shared by all processor cores in the multi-core system, the invalidation message is broadcast to all of the processor cores, and the information of the first processor core is recorded in the selected single-pointer entry;
  • if the selected single-pointer entry is associated with a shared entry, the multiple visitors recorded in the associated shared entry are determined from that shared entry, an invalidation message is sent to the recorded visitors, and the information of the first processor core is recorded in the selected single-pointer entry.
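The three replacement cases above can be sketched as follows. This is an illustrative sketch, not the patented implementation: the victim entry is assumed to have already been chosen by an LRU policy, the class and function names are hypothetical, shared entries are modelled as Python sets, and freeing the associated shared entry on reallocation is an assumption the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class SinglePointerEntry:
    tag: int
    associated: bool = False  # shared-entry association bit
    all_shared: bool = False  # block is shared by all cores
    pointer: int = 0          # core id, or index of the associated shared entry


def cores_to_invalidate(victim: SinglePointerEntry,
                        shared_entries: List[Optional[Set[int]]],
                        n_cores: int) -> Set[int]:
    """Return the cores that must receive an invalidation message."""
    if victim.associated:
        # Visitors are recorded in the associated shared entry.
        return set(shared_entries[victim.pointer])
    if victim.all_shared:
        # No precise record exists, so broadcast to every core.
        return set(range(n_cores))
    # Only the unique visitor holds a copy.
    return {victim.pointer}


def reallocate(victim, new_tag, new_core, shared_entries, n_cores):
    """Reuse `victim` for a new block first accessed by `new_core`."""
    targets = cores_to_invalidate(victim, shared_entries, n_cores)
    if victim.associated:
        shared_entries[victim.pointer] = None  # assumption: free the shared entry
    victim.tag = new_tag
    victim.associated = False
    victim.all_shared = False
    victim.pointer = new_core
    return targets
```

A caller would send an invalidation message to each core in the returned set before using the reallocated entry.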
  • the method further includes:
  • allocating the first shared entry in the shared entry array includes:
  • a shared entry is selected according to the least recently used principle; if the number of visitors recorded in the selected shared entry is greater than a predetermined threshold, the single-pointer entry associated with the selected shared entry is set to identify that the data block is shared by all processor cores in the multi-core system; if the number of visitors recorded in the selected shared entry is not greater than the predetermined threshold, the information of one of the recorded visitors is written into the single-pointer entry associated with the selected shared entry, and an invalidation message is sent to the other recorded visitors.
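The threshold rule for reclaiming a shared entry might look like the sketch below. The names are hypothetical, the victim shared entry is assumed to have been chosen by LRU already, and keeping the lowest-numbered visitor is an arbitrary choice made for illustration, since the text only says that one visitor is kept.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class SinglePointerEntry:
    tag: int
    associated: bool = False  # shared-entry association bit
    all_shared: bool = False  # block is shared by all cores
    pointer: int = 0          # core id, or index of the associated shared entry


def evict_shared_entry(sharers: Set[int],
                       owner: SinglePointerEntry,
                       threshold: int) -> Set[int]:
    """Fold an evicted shared entry back into its single-pointer entry.

    Returns the set of cores that must receive an invalidation message.
    """
    if not sharers:
        raise ValueError("a shared entry must record at least one visitor")
    owner.associated = False
    if len(sharers) > threshold:
        # Too many visitors to invalidate cheaply: mark the block as shared
        # by all cores and keep every copy valid.
        owner.all_shared = True
        owner.pointer = 0
        return set()
    # Few visitors: keep one of them (lowest id, an arbitrary choice here)
    # and invalidate the rest.
    kept = min(sharers)
    owner.all_shared = False
    owner.pointer = kept
    return set(sharers) - {kept}
```

The trade-off in this sketch is that the above-threshold path loses precision (a later invalidation must be broadcast), while the below-threshold path loses cached copies.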
  • the single-pointer entry includes a tag, a shared-entry association bit, and a single pointer, where the tag is used to correspond to the data block, the shared-entry association bit is used to identify whether the single-pointer entry is associated with a shared entry, and the single pointer is used to record the information of the unique visitor of the data block when the data block has only a unique visitor, and to record the association information between the single-pointer entry and the shared entry when the single-pointer entry is associated with a shared entry;
  • the shared entry includes a sharer record structure and an association structure, wherein the sharer record structure is used to record information of a plurality of visitors of the data block, and the association structure is used to associate the shared entry with the single-pointer entry.
  • the single-pointer entry further includes a fully shared bit, where, when the single-pointer entry is not associated with a shared entry, the fully shared bit is used to identify whether the data block has only a unique visitor or is shared by all processor cores in the multi-core system.
  • a directory cache device including:
  • a directory storage unit for storing a data visitor directory in a multi-core system,
  • the multi-core system including a shared data cache and a plurality of processor cores, the data blocks in the shared data cache being copied to one or at least two of the plurality of processor cores, the data visitor directory being used to record visitor information of the data blocks in the shared data cache, a visitor of a data block being a processor core holding a copy of the data block, the directory including a single-pointer entry array and a shared entry array, wherein each single-pointer entry in the single-pointer entry array is used to record information of the unique visitor of a data block, or to record association information between the single-pointer entry and a shared entry in the shared entry array, and each shared entry in the shared entry array is used to record information of multiple visitors of a data block;
  • an execution unit, configured to:
  • the single-pointer entry in the single-pointer entry array is further used to identify that the data block is shared by all processor cores in the multi-core system, and the execution unit is further configured to:
  • after the execution unit receives the first access request sent by the first processor core, the execution unit is further configured to:
  • the execution unit is configured to:
  • if the selected single-pointer entry indicates that the data block is shared by all processor cores in the multi-core system, the invalidation message is broadcast to all of the processor cores, and the information of the first processor core is recorded in the selected single-pointer entry;
  • an invalidation message is sent to the recorded visitors, and the information of the first processor core is recorded in the selected single-pointer entry.
  • the execution unit is further configured to:
  • the execution unit is configured to:
  • a shared entry is selected according to the least recently used principle; if the number of visitors recorded in the selected shared entry is greater than a predetermined threshold, the single-pointer entry associated with the selected shared entry is set to identify that the data block is shared by all processor cores in the multi-core system; if the number of visitors recorded in the selected shared entry is not greater than the predetermined threshold, the information of one of the recorded visitors is written into the single-pointer entry associated with the selected shared entry, and an invalidation message is sent to the other recorded visitors.
  • the single-pointer entry includes a tag, a shared-entry association bit, a fully shared bit, and a single pointer, where the tag is used to correspond to the data block; the shared-entry association bit is used to identify whether the single-pointer entry is associated with a shared entry; the fully shared bit is used, when the single-pointer entry is not associated with a shared entry, to identify whether the data block has only a unique visitor or is shared by all processor cores in the multi-core system; and the single pointer is used to record the information of the unique visitor of the data block when the data block has only a unique visitor, and to record the association information between the single-pointer entry and the shared entry when the single-pointer entry is associated with a shared entry;
  • the shared entry includes a sharer record structure and an association structure, wherein the sharer record structure is used to record information of a plurality of visitors of the data block, and the association structure is used to associate the shared entry with the single-pointer entry.
  • the single-pointer entry further includes a fully shared bit, where the fully shared bit is used, when the single-pointer entry is not associated with a shared entry, to identify whether the data block has only a unique visitor or is shared by all processor cores in the multi-core system.
  • a multi-core system comprising: a plurality of processor cores, a shared data cache, and the directory cache device according to the second aspect or any possible implementation of the second aspect.
  • a directory storage unit for storing a directory in a multi-core system, the multi-core system including a shared data cache and a plurality of processor cores, the data blocks in the shared data cache being copied to one or at least two of the plurality of processor cores, the directory being used to record visitor information of the data blocks in the shared data cache, a visitor of a data block being a processor core holding a copy of the data block;
  • the directory includes:
  • a single-pointer entry array, in which each single-pointer entry is used to record information of the unique visitor of a data block, or to record association information between the single-pointer entry and a shared entry in the shared entry array;
  • a shared entry array, in which each shared entry is used to record information of multiple visitors of a data block.
  • the single-pointer entry includes a tag, a shared-entry association bit, a fully shared bit, and a single pointer, where the tag is used to correspond to the data block; the shared-entry association bit is used to identify whether the single-pointer entry is associated with a shared entry; the fully shared bit is used, when the single-pointer entry is not associated with a shared entry, to identify whether the data block has only a unique visitor or is shared by all processor cores in the multi-core system; and the single pointer is used to record the information of the unique visitor of the data block when the data block has only a unique visitor, and to record the association information between the single-pointer entry and the shared entry when the single-pointer entry is associated with a shared entry;
  • the shared entry includes a sharer record structure and an association structure, wherein the sharer record structure is used to record information of a plurality of visitors of the data block, and the association structure is used to associate the shared entry with the single-pointer entry.
  • the single-pointer entry includes a fully shared bit, where the fully shared bit is used, when the single-pointer entry is not associated with a shared entry, to identify whether the data block has only a unique visitor or is shared by all processor cores in the multi-core system.
  • the sharer record structure is a vector.
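The entry layouts described above can be modelled as two small records. This is a sketch under assumptions: field widths are not specified in the text, the association structure is modelled here as the index of the owning single-pointer entry, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SinglePointerEntry:
    tag: int          # identifies the data block
    associated: bool  # shared-entry association bit
    all_shared: bool  # fully shared bit (meaningful only if not associated)
    pointer: int      # core id, or index of the associated shared entry


@dataclass
class SharedEntry:
    sharers: int  # sharer record structure, modelled as a bit vector
    owner: int    # association structure (assumed: index of the single-pointer entry)


def entry_state(entry: SinglePointerEntry) -> str:
    """Classify what a single-pointer entry currently records."""
    if entry.associated:
        return "associated"   # visitors are in a shared entry
    if entry.all_shared:
        return "all-cores"    # block is shared by every core
    return "unique"           # pointer holds the single visitor


def sharer_ids(shared: SharedEntry) -> List[int]:
    """Decode the bit vector into a list of core ids."""
    return [i for i in range(shared.sharers.bit_length())
            if (shared.sharers >> i) & 1]
```

For instance, `sharer_ids(SharedEntry(sharers=0b10110, owner=0))` decodes to cores 1, 2, and 4.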
  • The embodiments of the present invention adopt a directory structure consisting of a single-pointer entry array and a shared entry array. When a data block has only one visitor, only a single-pointer entry is used to record the visitor's information; when a data block has multiple visitors, the visitors' information is recorded by a single-pointer entry together with an associated shared entry. In this way, the average size of the directory entries can be greatly reduced with little performance loss, so the storage resources occupied by the directory can be saved and the scalability of the system can be improved.
  • FIG. 1 is a schematic diagram of a multi-core system to which the technical solution of the embodiment of the present invention is applicable.
  • FIG. 2 is a schematic diagram of a directory of one embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a single pointer entry in accordance with an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a shared entry in accordance with an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a directory of another embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a method for accessing a directory according to an embodiment of the present invention.
  • FIG. 7a is a schematic flowchart of a method for accessing a directory according to another embodiment of the present invention.
  • FIG. 7b is a schematic flowchart of a method for accessing a directory according to still another embodiment of the present invention.
  • FIG. 8 is a schematic flowchart of a method for accessing a directory according to still another embodiment of the present invention.
  • FIG. 9a is a schematic flowchart of a method for accessing a directory according to still another embodiment of the present invention.
  • FIG. 9b is a schematic diagram of shared entry compression in accordance with one embodiment of the present invention.
  • FIG. 10 is a schematic block diagram of a directory cache device according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a multi-core system in accordance with one embodiment of the present invention.
  • The term "multi-core processor system" or "multi-core system" refers to a processing system that includes multiple processor cores (Cores) and that can be implemented as an on-chip multi-core processor or an on-board multi-core processing system.
  • An on-chip multi-core processor is a processor in which multiple processor cores (Cores) are integrated on one chip, whereas an on-board multi-core processing system is a processing system formed by packaging each of multiple processor cores as a separate processor and integrating these processors on a circuit board.
  • The term "core" is short for "processor core".
  • The core (Core), also called the kernel, is the most important component of the CPU (Central Processing Unit). It is made of monocrystalline silicon, and all computation, command reception and storage, and data processing of the CPU are performed by the processor core.
  • The term "multiple processor cores" means at least two processor cores, and covers both the multi-core (Multi-Core) and the many-core (Many Core) ranges of the prior art.
  • The term "directory cache", also referred to as "directory cache device", refers to a storage device used to store the data visitor directory in a multi-core system. The storage device is generally implemented in the form of a cache (Cache). In one implementation, the directory cache is implemented independently of the processor cores, that is, a storage space is allocated in the cache on the on-chip multi-core processing chip to serve as the cache that stores the directory. In the other, the directory cache is implemented in a distributed manner, that is, the directory is divided into several blocks, which are stored in the caches inside the respective processor cores of the on-chip multi-core processing chip.
  • The term "shared data cache" refers to a storage device used to store data blocks shared by multiple cores. To speed up access to data blocks, the storage device is generally implemented in the form of a cache. In a specific implementation, the shared data cache generally refers to a level-2 (L2) cache or a level-3 (L3) cache in a multi-core processor system.
  • The term "private data cache" refers to a storage device inside a processor core that stores the private data of that processor core.
  • The private data cache generally refers to a level-1 (L1) cache in a multi-core processor.
  • The processor core reads part of the shared data into its private data cache.
  • the term "data block” refers to the granularity at which individual processors check access to data in a multi-core processor system.
  • the data block is stored in a shared data cache of the multi-core processor system. Therefore, in general, the granularity of the data block is a Cache Line (ie, a cache line). In a specific implementation, the granularity of the data block. There may be other manifestations, such as a part of the Cache Line, or multiple Cache Lines. In this regard, this specification is not limited.
  • The term "directory", also referred to as "directory structure" or "data visitor directory", refers to a data structure that records visitor information of data blocks.
  • The data visitor directory includes a single-pointer entry array and a shared entry array, where the single-pointer entry array is composed of a plurality of single-pointer entries and the shared entry array is composed of a plurality of shared entries. The content recorded by each single-pointer entry depends on the number of visitors of the data block.
  • When the data block has only a unique visitor, the single-pointer entry records the information of that unique visitor; when the data block has multiple visitors, the single-pointer entry instead records the association information between the single-pointer entry and its corresponding shared entry. When the data block has multiple (two or more) visitors, the shared entry records the information of the multiple visitors of the data block.
  • The "data visitor directory" is composed of one or more data visitor directory entries.
  • The term "data visitor directory entry" refers to a constituent unit of the data visitor directory; each entry in the directory corresponds to a data block in the shared data cache.
  • Since the data visitor directory includes a single-pointer entry array and a shared entry array, when a data block has only one visitor, the data visitor directory entry corresponding to the data block is the single-pointer entry that records the information of that unique visitor; when a data block has multiple (two or more) visitors, the data visitor directory entry corresponding to the data block comprises the single-pointer entry that records the association information between the single-pointer entry and its corresponding shared entry, together with the shared entry that records the information of the multiple visitors of the data block.
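How a directory entry changes shape as visitors accumulate can be sketched as below. This is a minimal model under assumptions: both arrays are unbounded (so no replacement is needed), single-pointer entries are keyed by tag in a dictionary, and the class name `Directory` and method `add_visitor` are hypothetical.

```python
class Directory:
    """Toy data visitor directory with unbounded arrays (an assumption)."""

    def __init__(self):
        self.single = {}  # tag -> {"associated": bool, "pointer": int}
        self.shared = []  # each shared entry is a set of core ids

    def add_visitor(self, tag, core):
        entry = self.single.get(tag)
        if entry is None:
            # First visitor: a single-pointer entry records the core itself.
            self.single[tag] = {"associated": False, "pointer": core}
        elif entry["associated"]:
            # Already multi-visitor: record the core in the shared entry.
            self.shared[entry["pointer"]].add(core)
        elif entry["pointer"] != core:
            # Second distinct visitor: allocate a shared entry and store its
            # index in the single pointer as the association information.
            self.shared.append({entry["pointer"], core})
            entry["associated"] = True
            entry["pointer"] = len(self.shared) - 1


if __name__ == "__main__":
    d = Directory()
    for c in (2, 5, 7):
        d.add_visitor(0x40, c)
    print(d.single[0x40], d.shared)
```

In this sketch a block with one visitor never touches the shared entry array, which is the case the structure is designed to keep cheap.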
  • the term "visitor” refers to a processor core that accesses a block of data. For example, when a block of data is accessed by three processor cores, the three processor cores are said to be the data. The visitor of the block.
  • access request refers to a directory access request issued by a processor core to request a request for visitor information for a data block.
  • association information refers to, for a certain data block, when there are at least two visitors, the single pointer entry corresponding to the data block records the access of the shared entry corresponding to the single pointer entry. Index, the access index is called the association information of the single-pointer entry of the data block and the corresponding shared entry, and the associated information indicates that the single-pointer entry of the data block and the corresponding shared entry have connection relation.
  • LRU Least Recently Used
  • The term "invalidation message" means that, when an entry is reallocated, a message is sent to the visitors originally recorded in the entry to invalidate their copies of the original data block.
  • FIG. 1 is a schematic diagram of a multi-core system to which the technical solution of the embodiment of the present invention is applicable.
  • The multi-core system 100 includes a plurality of processor cores 110, a shared data cache 120, and a directory cache 130.
  • The plurality of processor cores 110 can access the data blocks 121 in the shared data cache 120.
  • A copy of a data block 121 is created in the private cache 111 of each processor core 110 that has accessed the data block 121, and a corresponding directory entry 131 in the directory cache 130 records the visitor list of the data block 121.
  • A data block 121 in the shared data cache 120 can be copied to one or at least two of the plurality of processor cores 110; a visitor of the data block 121 is a processor core that holds a copy of the data block 121.
  • a directory is a structure that records a list of visitors, and based on this, a directory can also be expressed as a directory structure.
  • the directory is stored in a Directory Cache, and in particular, can be stored in a directory storage unit in the directory cache.
  • the directory cache can be centralized or distributed.
  • The directory may be a centralized directory, that is, a buffer is set up in the multi-core system (for example, on a multi-core processor chip) to store the directory. The directory may also be a distributed directory, that is, the directory is divided into parts, and each part is stored on a respective processor core. For example, assuming that a multi-core system includes 128 processor cores, the directory can be divided into 128 parts, which are stored on the 128 processor cores, respectively.
  • FIG. 2 shows a schematic diagram of a directory 200 in accordance with an embodiment of the present invention.
  • The directory 200 includes a single pointer entry array 210 and a shared entry array 220.
  • the single pointer entry array 210 includes a plurality of single pointer entries; the shared entry array 220 includes a plurality of shared entries.
  • A single pointer entry in the single pointer entry array 210 is used to record the information of the unique visitor of a data block, or to record the association information between the single pointer entry and a shared entry in the shared entry array 220. That is, when the data block has only a unique visitor, the single pointer entry records the information of that visitor; when the data block has multiple visitors, it records the association information between the single pointer entry and a shared entry in the shared entry array 220. A shared entry is used to record the information of the multiple visitors of a data block.
  • In a scale-out application, most of the data has only one visitor.
  • the data may be private data itself or behave privately over a period of time. Based on this, most directory entries only need to record the information of one processor core in the form of a single pointer, such as the number of the processor core, which is referred to as a single pointer entry in the present invention.
  • Some directory entries still use hardware structures (e.g., vectors, limited pointers, or other forms) that can track multiple visitors; these are referred to as shared entries in the present invention. All single pointer entries form the single pointer entry array, and all shared entries form the shared entry array. The single pointer entry array can contain a large number of entries, while the shared entry array can contain a small number.
  • a single pointer entry can use fewer bits to record a visitor; a shared entry can take more bits and record multiple visitors.
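  • To make the difference in bit cost concrete, the following minimal sketch compares the visitor field of a single pointer entry with a full sharer bit vector. The function names are hypothetical, and the full bit vector is only one assumed sharer format; the text also allows limited pointers or other structures.

```python
def single_pointer_bits(num_cores: int) -> int:
    """Bits needed to name exactly one core (the single pointer field only)."""
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2.
    return max(1, (num_cores - 1).bit_length())


def bit_vector_bits(num_cores: int) -> int:
    """Bits needed by a full sharer bit vector: one bit per core (assumed format)."""
    return num_cores


for cores in (16, 64, 128):
    print(cores, single_pointer_bits(cores), bit_vector_bits(cores))
```

  • Under these assumptions, a 64-core system needs a 6-bit pointer to name one visitor, compared with 64 bits for a full vector.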
  • When a data block has only a unique visitor, only a single pointer entry is used to record that visitor, and the single pointer entry is not associated with a shared entry. When the data block has multiple visitors, the single pointer entry corresponding to the data block is associated with a shared entry, and the associated shared entry records the multiple visitors of the data block.
  • A fully shared bit may also be set in the single pointer entry to identify that the data block is shared by all processor cores in the multi-core system. In this case the single pointer entry is not associated with a shared entry. That is, when the data block is shared by all processor cores, only a single pointer entry is needed and no shared entry has to be associated.
  • The directory of the embodiment of the present invention adopts a structure of a single pointer entry array and a shared entry array. When a data block has a single visitor, only the single pointer entry records the visitor's information; when there are multiple visitors, the single pointer entry and the shared entry together record the visitors' information. In this way, the average size of a directory entry can be greatly compressed with little performance loss, which saves the storage resources occupied by the directory and improves system scalability.
  • the single pointer entry may include a tag 301, a shared entry associated bit 302, a fully shared bit 303, and a single pointer 304.
  • the tag 301 is used to correspond to a data block.
  • The tag may correspond to the address of the data block and may, for example, be a part of the address bits of the data block, so that the single pointer entry corresponding to the data block can be found according to the correspondence between the address of the data block and the tag.
  • The shared entry association bit 302 indicates whether the single pointer entry is associated with a shared entry. For example, a value of 1 indicates that a shared entry is associated with the single pointer entry; a value of 0 indicates that no shared entry is associated with it.
  • The fully shared bit 303 indicates whether the data block is shared by all processor cores or has only a unique visitor. For example, a value of 1 indicates that the data block is shared by all processor cores. When the shared entry association bit is 0 (no shared entry is associated) and the fully shared bit is also 0, the data block has only a unique visitor.
  • The single pointer 304 records the information of the unique visitor of the data block when the data block has only a unique visitor. When there are multiple visitors, the single pointer 304 records the association information between the single pointer entry and the shared entry, which points to the shared entry.
  • the information of the unique visitor may be represented as the identifier of the visitor. For example, the number of the visitor (processor core) or other identification information may be adopted.
  • the association information of a single pointer entry and a shared entry may be represented as pointer or index information. In this regard, embodiments of the invention are not limited.
  • When the shared entry association bit is 0 (no shared entry is associated) and the fully shared bit is 0 (there is only a unique visitor), the single pointer 304 records the unique visitor of the data block. When the shared entry association bit is 1, the single pointer 304 records the association information with the shared entry, which points to the shared entry associated with the single pointer entry.
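  • As an illustration of the four fields, the sketch below packs a single pointer entry into one integer and unpacks it again. The field order and the 6-bit pointer width (enough for 64 cores) are assumptions made for this example; the text does not fix a bit layout.

```python
POINTER_BITS = 6  # assumed width: enough to name one of 64 cores


def pack_entry(tag: int, assoc: int, full_shared: int, pointer: int) -> int:
    """Pack tag | association bit | fully shared bit | single pointer (assumed order)."""
    word = (tag << 1) | assoc
    word = (word << 1) | full_shared
    return (word << POINTER_BITS) | pointer


def unpack_entry(word: int):
    """Inverse of pack_entry: return (tag, assoc, full_shared, pointer)."""
    pointer = word & ((1 << POINTER_BITS) - 1)
    full_shared = (word >> POINTER_BITS) & 1
    assoc = (word >> (POINTER_BITS + 1)) & 1
    tag = word >> (POINTER_BITS + 2)
    return tag, assoc, full_shared, pointer


# A block with tag 0x2A whose single pointer holds association information 13.
word = pack_entry(0x2A, 1, 0, 13)
print(unpack_entry(word))
```

  • With this layout, the pointer field is read as a core identifier when both flag bits are 0 and as association information when the association bit is 1.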
  • The shared entry may include a sharer record structure 401, a high-order address 402, and way selection bits 403.
  • The high-order address 402 and the way selection bits 403 form an association structure that represents the association information.
  • The sharer record structure 401 is used to record the information of the multiple visitors of a data block.
  • The sharer record structure can be a vector or another structure capable of recording multiple visitors.
  • The association structure (the high-order address 402 and the way selection bits 403) is used to point to a single pointer entry.
  • a single pointer entry array is used as a primary array, and a shared entry array is used as a secondary array.
  • The single pointer entry array 510 and the shared entry array 520 both adopt a cache-like set-associative structure. The number of sets (each row of the array is one set, also called a group) is called the depth, and the number of ways (each column of the array is one way) is called the associativity.
  • The single pointer entry array has a large depth and a moderate associativity, to reduce access power consumption; the shared entry array has a small depth and a large associativity, to improve the utilization of shared entries.
  • When accessing the directory, the single pointer entry array is first searched according to the address information in the access request, for example by looking up the tags of the single pointer entries, to determine whether a matching single pointer entry exists. Subsequent access from a single pointer entry to a shared entry, and from a shared entry to a single pointer entry, can be achieved by means of a "set number + way number". In a specific implementation, the set number can be determined first and then the way number.
  • The fully shared bit 303 is an optional field. When a data block is shared by all processor cores in the multi-core system, this can be identified by the sharer record structure 401 in a shared entry. Alternatively, by adding the fully shared bit 303 to the single pointer entry and setting it to 1, the scenario in which the data block is shared by all processor cores in the multi-core system can be indicated by the single pointer entry itself.
  • FIG. 6 shows a schematic flow diagram of a method 600 of accessing a data visitor directory in a multi-core system in accordance with an embodiment of the present invention.
  • This directory is the directory of the aforementioned embodiment of the present invention.
  • the method 600 can be performed by a directory cache.
  • S610 Receive a first access request sent by the first processor core, where the first access request is used to access an entry in the directory corresponding to the first data block.
  • the address information of the data block may be carried in the first access request, and the directory may be accessed according to the address information in the access request, and the entry corresponding to the data block is searched in the directory.
  • When the first access request is received, the single pointer entry array is accessed first to determine whether a single pointer entry corresponding to the data block exists. Specifically, the single pointer entry array can be searched according to the address information in the access request.
  • For example, with the structure of the single pointer entry shown in FIG. 3, the address information carried in the access request can be compared with the tag in the single pointer entry to determine whether a single pointer entry corresponding to the data block exists.
  • According to the first access request, it is determined that the first single pointer entry corresponding to the first data block exists in the single pointer entry array.
  • When the data block (denoted as the first data block) has a corresponding single pointer entry (denoted as the first single pointer entry), it is determined according to the first single pointer entry whether a shared entry associated with the first single pointer entry exists in the shared entry array. For example, with the structure of the single pointer entry shown in FIG. 3, this can be determined from the shared entry association bit in the single pointer entry.
  • When the first single pointer entry is associated with a shared entry (denoted as the first shared entry), the multiple visitors of the first data block are determined according to the first shared entry.
  • Specifically, the associated shared entry may be determined according to the association information recorded in the single pointer entry (for example, the association information recorded by the single pointer in the single pointer entry structure described above); the shared entry is then accessed, and the multiple visitors of the data block are obtained from it.
  • access from a single pointer entry to a shared entry may be as follows.
  • The low-order bits of the set number of the single pointer entry are extracted to obtain the set number of the shared entry. That is, the set number of the shared entry is determined by the low-order bits of the set number of the single pointer entry.
  • For example, suppose the single pointer entry array has 4 ways and 64 sets, and the shared entry array has 8 ways and 16 sets.
  • The shared entry association bit of the currently accessed single pointer entry is 1, indicating that the associated shared entry needs to be accessed.
  • The set number in the 64-set single pointer entry array has 6 bits; the single pointer entry is located in set 55, whose set number is b_110111 (b_ denotes binary).
  • The shared entry array has 16 sets, so its set number has 4 bits. The lower 4 bits of b_110111, namely b_0111, are extracted, so the corresponding shared entry is located in set 7 of the shared entry array.
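  • The set (group) number derivation in this example can be sketched as a simple bit mask. The function name is hypothetical, and the sketch assumes that the number of shared entry sets is a power of two; how the way inside the shared set is chosen is not covered here and is assumed to come from the association information in the single pointer.

```python
def shared_set_from_single_set(single_set: int, shared_sets: int = 16) -> int:
    """Keep the low-order bits of the single pointer entry's set number."""
    # Assumes the number of shared entry sets is a power of two.
    assert shared_sets > 0 and shared_sets & (shared_sets - 1) == 0
    return single_set & (shared_sets - 1)


# Example from the text: set 55 = 0b110111, lower 4 bits = 0b0111 = 7.
print(shared_set_from_single_set(0b110111))
```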
  • the method 600 may further include:
  • S640: when it is determined, according to the first single pointer entry, that no first shared entry associated with the first single pointer entry exists in the shared entry array, the unique visitor of the first data block is determined according to the first single pointer entry, or it is determined that the first data block is shared by all processor cores in the multi-core system.
  • the visitor of the first data block is determined only according to the first single pointer entry.
  • the first single pointer entry may record a unique visitor of the first data block or identify that the first data block is shared by all processor cores in the multi-core system. In both cases, there is no need to associate shared entries, which can be represented by fewer bits. For specific examples, reference may be made to the foregoing embodiments, and details are not described herein again.
  • The method for accessing a directory in the embodiment of the present invention first accesses the single pointer entry, and then accesses the associated shared entry only when the single pointer entry is associated with one. When the data block has only a unique visitor, that visitor is obtained from the single pointer entry; when the data block has multiple visitors, they are obtained from the shared entry associated with the single pointer entry. In this way, the average directory entry size can be greatly compressed with little performance loss, which saves the storage resources occupied by the directory and improves the scalability of the system.
  • When no single pointer entry corresponding to the data block exists in the single pointer entry array, a corresponding single pointer entry may be allocated for the data block.
  • the method 600 may further include:
  • a single pointer entry may be allocated for the data block, and the information of the unique visitor (ie, the first processor core) is recorded in the allocated single pointer entry.
  • If no unused single pointer entry exists in the single pointer entry array, a single pointer entry is selected according to the least recently used principle, where:
  • if the selected single pointer entry is not associated with a shared entry and records the information of a unique visitor, an invalidation message is sent to the recorded unique visitor, and the information of the first processor core is recorded in the selected single pointer entry;
  • if the selected single pointer entry is not associated with a shared entry and identifies that the data block is shared by all processor cores in the multi-core system, an invalidation message is broadcast to all of the processor cores, and the information of the first processor core is recorded in the selected single pointer entry;
  • if the selected single pointer entry is associated with a shared entry, the multiple visitors recorded by the associated shared entry are determined according to that shared entry, an invalidation message is sent to the recorded visitors, and the information of the first processor core is recorded in the selected single pointer entry.
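  • The allocation cases above can be sketched as follows. This is a minimal illustration under assumptions: entries are plain dictionaries, `lru_order` is hypothetical recency bookkeeping for one set, `shared_entries` maps association information to a set of core numbers, and what happens to the shared entry that the evicted single pointer entry pointed to is left out because the text above does not describe it.

```python
def cores_to_invalidate(entry, shared_entries, all_cores):
    """Cores that must receive an invalidation message when `entry` is reused."""
    if entry["assoc"] == 1:
        # The associated shared entry records the multiple visitors.
        return sorted(shared_entries[entry["pointer"]])
    if entry["full_shared"] == 1:
        # The data block is shared by all cores: broadcast.
        return sorted(all_cores)
    # Otherwise the single pointer records the unique visitor.
    return [entry["pointer"]]


def allocate_single_pointer(ways, lru_order, tag, core, shared_entries, all_cores):
    """Allocate an entry in one set and return (way index, cores to invalidate).

    ways: list of entries (dict) with None marking unused ways.
    lru_order: way indices, least recently used first (assumed bookkeeping).
    """
    invalidate = []
    unused = [i for i, e in enumerate(ways) if e is None]
    if unused:
        way = unused[0]
    else:
        way = lru_order[0]
        invalidate = cores_to_invalidate(ways[way], shared_entries, all_cores)
    # Record the requesting core as the unique visitor of the new block.
    ways[way] = {"tag": tag, "assoc": 0, "full_shared": 0, "pointer": core}
    return way, invalidate
```

  • The caller would send invalidation messages to the returned cores before the entry is treated as belonging to the new data block.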
  • FIG. 8 is a schematic flowchart of a method for accessing a directory according to another embodiment of the present invention.
  • The address information carried in the access request may be compared with the tag in the single pointer entry to determine whether a single pointer entry corresponding to the data block exists.
  • When a single pointer entry is hit, whether it is associated with a shared entry is determined according to the shared entry association bit in the hit single pointer entry: a value of 1 indicates that a shared entry is associated, and a value of 0 indicates that no shared entry is associated.
  • an associated shared entry may be found according to a single pointer in a single pointer entry, and a visitor list may be obtained from a sharer record structure in the associated shared entry.
  • When no shared entry is associated, it is determined whether the data block is shared by all processor cores. For example, this can be determined from the fully shared bit in the single pointer entry: if the fully shared bit is 0, the data block has only a unique visitor, that is, it is not fully shared; if it is 1, the data block is shared by all processor cores, that is, it is fully shared.
  • a unique visitor can be obtained from a single pointer in a single pointer table entry.
  • the identifier of the processor core can be recorded. Taking 64 cores as an example, a 6-bit identifier can be used.
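  • The lookup flow described above can be summarized in a short sketch. The dictionary-based tables, the tuple layout `(association bit, fully shared bit, single pointer)`, and the use of a bit vector as the sharer record structure are all assumptions made for illustration.

```python
def find_visitors(tag, single_entries, shared_entries, num_cores):
    """Return the visitor list for `tag`, or None if no single pointer entry hits."""
    entry = single_entries.get(tag)
    if entry is None:
        return None  # miss: a single pointer entry would have to be allocated
    assoc, full_shared, pointer = entry
    if assoc == 1:
        # The pointer holds association information; read the sharer vector.
        vector = shared_entries[pointer]
        return [core for core in range(num_cores) if (vector >> core) & 1]
    if full_shared == 1:
        # The data block is shared by all processor cores.
        return list(range(num_cores))
    # The pointer holds the identifier of the unique visitor.
    return [pointer]


single_entries = {0x1A: (0, 0, 5), 0x2B: (1, 0, 3), 0x3C: (0, 1, 0)}
shared_entries = {3: 0b1001}  # cores 0 and 3 hold a copy
print(find_visitors(0x2B, single_entries, shared_entries, num_cores=4))
```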
  • After a single pointer entry (i.e., the first single pointer entry) is allocated for the data block (such as the foregoing first data block), the first single pointer entry records the unique visitor of the first data block (i.e., the first processor core). At this time, the first data block is private to the first processor core.
  • When the first data block is then accessed by another processor core (for example, a second processor core), a shared entry needs to be allocated in the shared entry array, and the shared entry is used to record the information of the multiple visitors (the first processor core and the second processor core).
  • the method 600 may further include:
  • When the first shared entry is allocated, if an unused shared entry exists in the shared entry array, one of the unused shared entries is selected as the first shared entry.
  • If no unused shared entry exists and a shared entry that records only one visitor exists, that shared entry is selected, and the recorded visitor is written into the single pointer entry associated with it. If neither exists, a shared entry is selected according to the least recently used principle. If the number of visitors recorded by the selected shared entry is greater than a predetermined threshold, the single pointer entry associated with the selected shared entry is set to identify that the data block is shared by all processor cores in the multi-core system; if the number of visitors is not greater than the predetermined threshold, one of the recorded visitors is written into the single pointer entry associated with the selected shared entry, and an invalidation message is sent to the other recorded visitors.
  • That is, when a shared entry is allocated, unused shared entries are used first. If there is no unused shared entry, a shared entry that is already in use must be reclaimed. A shared entry with only one visitor is preferred; if there is none, the least recently used shared entry is selected. When a shared entry with only one visitor is reclaimed, the unique visitor is written into the associated single pointer entry, so no visitor information is lost. When a shared entry with multiple visitors is reclaimed, the visitor list can be compressed and stored in the associated single pointer entry according to the number of visitors.
  • Specifically, if the number of visitors is greater than a predetermined threshold, the associated single pointer entry is set to identify that the data block is shared by all processor cores, which may be called up-conversion. If the number of visitors is not greater than the predetermined threshold, one of the visitors is written into the associated single pointer entry and an invalidation message is sent to the other visitors, that is, only one visitor is retained, which may be called down-conversion.
  • When performing up-conversion, the fully shared bit of the associated single pointer entry is set to 1, indicating that the data block is shared by all processor cores; when performing down-conversion, only one visitor is retained (e.g., visitor No. 3 in FIG. 9b) and recorded in the associated single pointer entry.
  • Reclaimed shared entries can be assigned to other data blocks. That is to say, in the embodiment of the present invention, the shared entry can be dynamically allocated according to the change of the data sharing situation. In this way, the utilization of directory resources is more flexible, thereby improving the utilization of directory resources.
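  • The reclaim rules (single visitor, up-conversion, down-conversion) can be sketched as below. The function name and the dictionary result are hypothetical, and keeping the lowest-numbered visitor during down-conversion is an arbitrary choice for this example, since the text only says that one visitor is retained.

```python
def reclaim_shared_entry(visitors, threshold):
    """Compress a reclaimed shared entry into its associated single pointer entry.

    Returns (new single pointer entry state, cores to invalidate).
    """
    visitors = sorted(visitors)
    if len(visitors) == 1:
        # Only one visitor: write it back, no information is lost.
        return {"assoc": 0, "full_shared": 0, "pointer": visitors[0]}, []
    if len(visitors) > threshold:
        # Up-conversion: mark the block as shared by all cores.
        return {"assoc": 0, "full_shared": 1, "pointer": None}, []
    # Down-conversion: keep one visitor (assumed: the lowest-numbered one)
    # and invalidate the copies held by the others.
    return {"assoc": 0, "full_shared": 0, "pointer": visitors[0]}, visitors[1:]


print(reclaim_shared_entry({3, 7}, threshold=3))
```

  • Up-conversion over-approximates the visitor list, so this sketch sends no invalidation messages in that case, whereas down-conversion trades invalidations for an exact single visitor.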
  • When a shared entry is reclaimed, the associated single pointer entry needs to be accessed from the shared entry.
  • The associated single pointer entry may be determined according to the shared entry, specifically according to the association structure in the shared entry.
  • access from a shared entry to a single pointer entry may be as follows.
  • For example, suppose the single pointer entry array has 4 ways and 64 sets, and the shared entry array has 8 ways and 16 sets.
  • The shared entry is in set 5 (b_0101), its high-order address is b_10, and its way selection bits are b_01.
  • The set number of the corresponding single pointer entry is obtained by concatenating the high-order address with the set number of the shared entry, giving b_100101, that is, 37.
  • The way selection bits in the shared entry are used to select the way.
  • The way selection bits are b_01, that is, way 1, which yields the associated single pointer entry.
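  • The reverse mapping in this example amounts to a bit concatenation. The following sketch uses a hypothetical function name and assumes a 4-bit shared entry set number, matching the 16-set shared entry array in the example.

```python
def single_entry_location(shared_set: int, high_addr: int, way_select: int,
                          shared_set_bits: int = 4):
    """Return (set number, way number) of the associated single pointer entry."""
    # The high-order address supplies the upper bits of the set number,
    # and the shared entry's own set number supplies the lower bits.
    single_set = (high_addr << shared_set_bits) | shared_set
    return single_set, way_select


# Example from the text: set b_0101, high-order address b_10, way bits b_01.
print(single_entry_location(0b0101, 0b10, 0b01))
```

  • Taking the lower 4 bits of the resulting set number gives back the shared entry's set number, which is consistent with the forward mapping from a single pointer entry to a shared entry.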
  • The sequence numbers of the above processes do not imply an order of execution. The order of execution of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
  • FIG. 10 shows a schematic block diagram of a directory cache device 1000 in accordance with an embodiment of the present invention.
  • the directory cache device 1000 includes a directory storage unit 1010 and an execution unit 1020.
  • The directory storage unit 1010 is configured to store the directory of the multi-core system. The multi-core system includes a shared data cache and a plurality of processor cores, and a data block in the shared data cache is copied to one processor core or at least two processor cores among the plurality of processor cores. The directory is used to record visitor information of the data blocks in the shared data cache, where a visitor of a data block is a processor core that holds a copy of the data block. The directory includes a single pointer entry array and a shared entry array, where each single pointer entry in the single pointer entry array is used to record the information of the unique visitor of a data block, or to record the association information between the single pointer entry and a shared entry in the shared entry array, and each shared entry in the shared entry array is used to record the information of multiple visitors of a data block.
  • the execution unit 1020 is configured to:
  • The directory cache device of the embodiment of the present invention adopts a directory structure of a single pointer entry array and a shared entry array.
  • When only a single visitor needs to be recorded, the single pointer entry is not associated with a shared entry; when multiple visitors need to be recorded, the single pointer entry is associated with a shared entry. In this way, the average directory entry size can be greatly compressed with little performance loss, which saves the storage resources occupied by the directory and improves the scalability of the system.
  • The single pointer entry includes a tag, a shared entry association bit, and a single pointer, where the tag is used to correspond to the data block, the shared entry association bit is used to identify whether the single pointer entry is associated with a shared entry, and the single pointer is used to record the information of the unique visitor of the data block when the data block has only a unique visitor, and to record the association information between the single pointer entry and the shared entry when the single pointer entry is associated with a shared entry;
  • the shared entry includes a sharer record structure and an associated structure, wherein the sharer record structure is used to record information of a plurality of visitors of the data block, the association structure being used to associate the single pointer entry.
  • the single pointer entry further includes a fully shared bit.
  • the fully shared bit is used to identify that the data block has only a unique visitor or that the data block is shared by all processor cores in the multi-core system when the single pointer entry is not associated with the shared entry.
  • the single-pointer entry in the single-pointer entry array is further used to identify that the data block is shared by all the processor cores in the multi-core system, and the execution unit 1020 is further configured to:
  • After the execution unit 1020 receives the first access request sent by the first processor core, the execution unit 1020 is further configured to:
  • the executing unit 1020 is configured to:
  • if an unused single pointer entry exists in the single pointer entry array, select one of the unused single pointer entries as the first single pointer entry, and record the information of the first processor core in it;
  • if no unused single pointer entry exists in the single pointer entry array, select a single pointer entry according to the least recently used principle; if the selected single pointer entry is not associated with a shared entry and records the information of a unique visitor, send an invalidation message to the recorded unique visitor, and then record the information of the first processor core in the selected single pointer entry;
  • if the selected single pointer entry is not associated with a shared entry and identifies that the data block is shared by all processor cores in the multi-core system, broadcast an invalidation message to all of the processor cores, and record the information of the first processor core in the selected single pointer entry;
  • if the selected single pointer entry is associated with a shared entry, determine the multiple visitors recorded by the associated shared entry, send an invalidation message to the recorded visitors, and record the information of the first processor core in the selected single pointer entry.
  • the execution unit 1020 is further configured to:
  • the executing unit 1020 is configured to:
  • if no unused shared entry exists in the shared entry array and no shared entry records only one visitor, select a shared entry according to the least recently used principle; if the number of visitors recorded by the selected shared entry is greater than a predetermined threshold, set the single pointer entry associated with the selected shared entry to identify that the data block is shared by all processor cores in the multi-core system; if the number of visitors recorded by the selected shared entry is not greater than the predetermined threshold, write one of the recorded visitors into the single pointer entry associated with the selected shared entry, and send an invalidation message to the other recorded visitors.
  • The directory stored in the directory storage unit 1010 of the directory cache device 1000 of the embodiment of the present invention may be the directory in the foregoing embodiment of the present invention, and the execution unit 1020 may perform the processes in the foregoing method embodiments. For the corresponding details, refer to the foregoing embodiments, which are not repeated here for brevity.
  • the embodiment of the invention also provides a multi-core system.
  • the multi-core system 1100 includes a plurality of processor cores 1110, a shared data cache 1120, and the directory cache device 1000 in the foregoing embodiment of the present invention.
  • the multi-core system 1100 of the embodiment of the present invention uses a new directory cache device 1000 with respect to the multi-core system 100 of FIG. 1, which includes a new directory structure provided by the embodiment of the present invention.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division of the units is only a division by logical function.
  • In actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interface, device, or unit, or may be an electrical, mechanical, or other form of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • The technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • The software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Abstract

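The present invention discloses a method for accessing a data visitor directory in a multi-core system, a directory cache device, a multi-core system, and a directory storage unit. The method includes: receiving a first access request sent by a first processor core, where the first access request is used to access the entry in the directory that corresponds to a first data block; determining, according to the first access request, that a first single pointer entry corresponding to the first data block exists in a single pointer entry array; and, when it is determined according to the first single pointer entry that a first shared entry associated with the first single pointer entry exists in a shared entry array, determining multiple visitors of the first data block according to the first shared entry. Embodiments of the present invention can save the storage resources occupied by the directory.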

Description

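Method and Device for Accessing a Data Visitor Directory in a Multi-Core System
Technical Field
The present invention relates to the field of information technology, and more specifically, to a method for accessing a data visitor directory in a multi-core system, a directory cache device, a multi-core system, and a directory storage unit.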
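Background
In applications of multi-core or many-core processors, there are scenarios in which a data block is accessed by one or more processor cores of the processor. In such scenarios, the data block is usually stored in a shared storage space so that one or more processor cores can access it. To speed up access, a copy of the data block is created in the private cache of each processor core that has accessed it (that is, the data block is stored in the private cache of the processor core), so that when a core that has accessed the data block needs it again, it only has to read the data block from its own private cache. Because copies of the data block exist in the private caches of the one or more processor cores that accessed it, consistency must be maintained among the copies in the private caches of the multiple cores; this is known as the cache coherence problem. The basic principle for solving it is that when the copy of the data block in one core is modified, the copies in the other cores must be updated or invalidated (that is, deleted), which requires determining which cores of the multi-core processor hold a copy of the data block (that is, determining the visitors of the data block).
Common cache coherence solutions include snooping-based solutions and directory-based solutions. In the former, when the copy of a data block in one core is modified, a broadcast message indicating that the data block has been modified must be sent to the other cores that store copies, to notify those cores to update or invalidate their copies. In the latter, an access directory of the data block records the list of visitors of the data block (that is, the cores in the multi-core processor that store the data block), and when the copy in one core is modified, a notification message is sent only to the other visitors. The continued growth in the number of processor cores causes snooping-based solutions to face performance and bandwidth bottlenecks (broadcast messages consume large amounts of processing resources and bandwidth), whereas directory-based coherence protocols are widely adopted thanks to their better scalability.
In a traditional directory-based coherence solution, the directory records the visitor list of a data block in the form of a vector. For a multi-core processor system with N cores, each directory entry contains an N-bit vector, and whether each bit is 1 indicates whether the corresponding one of the N cores holds a copy of a given data block. Under this mechanism, the number of directory entries grows linearly with the number of cores, while the size of the cache used to store copies of data blocks does not grow with the number of cores. As a result, the ratio of the bits occupied by the directory to the bits occupied by the data blocks increases with the number of cores, so the storage space needed for the directory becomes larger and larger, which poses a challenge to the cache space of on-chip multi-core processors.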
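Summary
Embodiments of the present invention provide a method for accessing a data visitor directory in a multi-core system, a directory cache device, a multi-core system, and a directory storage unit, which can save the storage resources occupied by the data visitor directory.
According to a first aspect, a method for accessing a data visitor directory in a multi-core system is provided. The method is applied to a multi-core system that includes a shared data cache and a plurality of processor cores, where a data block in the shared data cache is copied to one processor core or at least two processor cores among the plurality of processor cores. The multi-core system further includes a data visitor directory, which is used to record visitor information of the data blocks in the shared data cache, where a visitor of a data block is a processor core that holds a copy of the data block;
the directory includes a single pointer entry array and a shared entry array, where each single pointer entry in the single pointer entry array is used to record information of the unique visitor of a data block, or to record association information between the single pointer entry and a shared entry in the shared entry array, and each shared entry in the shared entry array is used to record information of multiple visitors of a data block;
the method includes:
receiving a first access request sent by a first processor core, where the first access request is used to access the entry in the directory that corresponds to a first data block;
determining, according to the first access request, that a first single pointer entry corresponding to the first data block exists in the single pointer entry array; and
when it is determined, according to the first single pointer entry, that a first shared entry associated with the first single pointer entry exists in the shared entry array, determining multiple visitors of the first data block according to the first shared entry.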
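With reference to the first aspect, in a first possible implementation, a single-pointer entry in the single-pointer entry array is further used to indicate that the data block is shared by all processor cores in the multi-core system, and the method further includes:
when it is determined, according to the first single-pointer entry, that no first shared entry associated with the first single-pointer entry exists in the shared entry array, determining, according to the first single-pointer entry, the sole visitor of the first data block, or determining that the first data block is shared by all processor cores in the multi-core system.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, after the receiving of the first access request sent by the first processor core, the method further includes:
determining, according to the first access request, that no single-pointer entry corresponding to the first data block exists in the single-pointer entry array; and
allocating, in the single-pointer entry array, a first single-pointer entry corresponding to the first data block, and recording information about the first processor core in the first single-pointer entry.
With reference to the second possible implementation of the first aspect, in a third possible implementation, the allocating of the first single-pointer entry and the recording of the information about the first processor core include:
if an unused single-pointer entry exists in the single-pointer entry array, selecting one unused single-pointer entry as the first single-pointer entry and recording the information about the first processor core in it;
if no unused single-pointer entry exists in the single-pointer entry array, selecting a single-pointer entry according to the least recently used principle; if the selected single-pointer entry is not associated with a shared entry and records information about a sole visitor, sending an invalidation message to the recorded sole visitor, and then recording the information about the first processor core in the selected single-pointer entry;
if the selected single-pointer entry is not associated with a shared entry and indicates that the data block is shared by all processor cores in the multi-core system, broadcasting an invalidation message to all the processor cores, and recording the information about the first processor core in the selected single-pointer entry; and
if the selected single-pointer entry is associated with a shared entry, determining, according to the associated shared entry, the multiple visitors recorded in that shared entry, sending an invalidation message to the recorded visitors, and recording the information about the first processor core in the selected single-pointer entry.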
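With reference to the second or third possible implementation of the first aspect, in a fourth possible implementation, the method further includes:
receiving a second access request sent by a second processor core, where the second access request is used to access the entry, in the directory, that corresponds to the first data block;
determining, according to the second access request, that the first single-pointer entry corresponding to the first data block exists in the single-pointer entry array;
when it is determined, according to the first single-pointer entry, that no shared entry associated with the first single-pointer entry exists in the shared entry array, determining, according to the first single-pointer entry, that the sole visitor of the first data block is the first processor core; and
allocating a first shared entry in the shared entry array, establishing an association between the first single-pointer entry and the first shared entry, and recording the information about the first processor core and information about the second processor core in the first shared entry.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the allocating of the first shared entry in the shared entry array includes:
if an unused shared entry exists in the shared entry array, selecting one unused shared entry as the first shared entry;
if no unused shared entry exists in the shared entry array and a shared entry that records information about only one visitor exists, selecting that shared entry and writing the recorded visitor information into the single-pointer entry associated with the selected shared entry; and
if no unused shared entry exists in the shared entry array and no shared entry records information about only one visitor, selecting a shared entry according to the least recently used principle; if the number of visitors recorded in the selected shared entry is greater than a predetermined threshold, setting the single-pointer entry associated with the selected shared entry to indicate that the data block is shared by all processor cores in the multi-core system; and if the number of visitors recorded in the selected shared entry is not greater than the predetermined threshold, writing information about one of the recorded visitors into the single-pointer entry associated with the selected shared entry, and sending an invalidation message to the other recorded visitors.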
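With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a sixth possible implementation, the single-pointer entry includes a tag, a shared-entry association bit, and a single pointer. The tag corresponds to the data block. The shared-entry association bit indicates whether the single-pointer entry is associated with a shared entry. The single pointer records information about the sole visitor of the data block when the data block has only a sole visitor, and records the association information between the single-pointer entry and the shared entry when the two are associated.
The shared entry includes a sharer record structure and an association structure. The sharer record structure records information about the multiple visitors of the data block, and the association structure associates the shared entry with the single-pointer entry.
With reference to the sixth possible implementation of the first aspect, in a seventh possible implementation, the single-pointer entry further includes an all-shared bit. When the single-pointer entry is not associated with a shared entry, the all-shared bit indicates either that the data block has only a sole visitor or that the data block is shared by all processor cores in the multi-core system.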
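According to a second aspect, a directory cache device is provided, including:
a directory storage unit, configured to store a data visitor directory of a multi-core system, where the multi-core system includes a shared data cache and multiple processor cores; a data block in the shared data cache is copied to one, or to at least two, of the multiple processor cores; the directory records visitor information of the data blocks in the shared data cache; a visitor of a data block is a processor core that holds a copy of the data block; the directory includes a single-pointer entry array and a shared entry array; each single-pointer entry in the single-pointer entry array records information about the sole visitor of a data block, or records association information between the single-pointer entry and a shared entry in the shared entry array; and each shared entry in the shared entry array records information about multiple visitors of a data block; and
an execution unit, configured to:
receive a first access request sent by a first processor core, where the first access request is used to access an entry, in the directory, that corresponds to a first data block;
determine, according to the first access request, that a first single-pointer entry corresponding to the first data block exists in the single-pointer entry array; and
when it is determined, according to the first single-pointer entry, that a first shared entry associated with the first single-pointer entry exists in the shared entry array, determine multiple visitors of the first data block according to the first shared entry.
With reference to the second aspect, in a first possible implementation, a single-pointer entry in the single-pointer entry array is further used to indicate that the data block is shared by all processor cores in the multi-core system, and the execution unit is further configured to:
when it is determined, according to the first single-pointer entry, that no first shared entry associated with the first single-pointer entry exists in the shared entry array, determine, according to the first single-pointer entry, the sole visitor of the first data block, or determine that the first data block is shared by all processor cores in the multi-core system.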
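With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, after receiving the first access request sent by the first processor core, the execution unit is further configured to:
determine, according to the first access request, that no single-pointer entry corresponding to the first data block exists in the single-pointer entry array; and
allocate, in the single-pointer entry array, a first single-pointer entry corresponding to the first data block, and record information about the first processor core in the first single-pointer entry.
With reference to the second possible implementation of the second aspect, in a third possible implementation, the execution unit is configured to:
if an unused single-pointer entry exists in the single-pointer entry array, select one unused single-pointer entry as the first single-pointer entry and record the information about the first processor core in it;
if no unused single-pointer entry exists in the single-pointer entry array, select a single-pointer entry according to the least recently used principle; if the selected single-pointer entry is not associated with a shared entry and records information about a sole visitor, send an invalidation message to the recorded sole visitor, and then record the information about the first processor core in the selected single-pointer entry;
if the selected single-pointer entry is not associated with a shared entry and indicates that the data block is shared by all processor cores in the multi-core system, broadcast an invalidation message to all the processor cores, and record the information about the first processor core in the selected single-pointer entry; and
if the selected single-pointer entry is associated with a shared entry, determine, according to the associated shared entry, the multiple visitors recorded in that shared entry, send an invalidation message to the recorded visitors, and record the information about the first processor core in the selected single-pointer entry.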
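With reference to the second or third possible implementation of the second aspect, in a fourth possible implementation, the execution unit is further configured to:
receive a second access request sent by a second processor core, where the second access request is used to access the entry, in the directory, that corresponds to the first data block;
determine, according to the second access request, that the first single-pointer entry corresponding to the first data block exists in the single-pointer entry array;
when it is determined, according to the first single-pointer entry, that no shared entry associated with the first single-pointer entry exists in the shared entry array, determine, according to the first single-pointer entry, that the sole visitor of the first data block is the first processor core; and
allocate a first shared entry in the shared entry array, establish an association between the first single-pointer entry and the first shared entry, and record the information about the first processor core and information about the second processor core in the first shared entry.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the execution unit is configured to:
if an unused shared entry exists in the shared entry array, select one unused shared entry as the first shared entry;
if no unused shared entry exists in the shared entry array and a shared entry that records information about only one visitor exists, select that shared entry and write the recorded visitor information into the single-pointer entry associated with the selected shared entry; and
if no unused shared entry exists in the shared entry array and no shared entry records only one visitor, select a shared entry according to the least recently used principle; if the number of visitors recorded in the selected shared entry is greater than a predetermined threshold, set the single-pointer entry associated with the selected shared entry to indicate that the data block is shared by all processor cores in the multi-core system; and if the number of visitors recorded in the selected shared entry is not greater than the predetermined threshold, write information about one of the recorded visitors into the single-pointer entry associated with the selected shared entry, and send an invalidation message to the other recorded visitors.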
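With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a sixth possible implementation, the single-pointer entry includes a tag, a shared-entry association bit, an all-shared bit, and a single pointer. The tag corresponds to the data block. The shared-entry association bit indicates whether the single-pointer entry is associated with a shared entry. When the single-pointer entry is not associated with a shared entry, the all-shared bit indicates either that the data block has only a sole visitor or that the data block is shared by all processor cores in the multi-core system. The single pointer records information about the sole visitor of the data block when the data block has only a sole visitor, and records the association information between the single-pointer entry and the shared entry when the two are associated.
The shared entry includes a sharer record structure and an association structure. The sharer record structure records information about the multiple visitors of the data block, and the association structure associates the shared entry with the single-pointer entry.
With reference to the sixth possible implementation of the second aspect, in a seventh possible implementation, the single-pointer entry further includes an all-shared bit.
When the single-pointer entry is not associated with a shared entry, the all-shared bit indicates either that the data block has only a sole visitor or that the data block is shared by all processor cores in the multi-core system.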
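According to a third aspect, a multi-core system is provided, including multiple processor cores, a shared data cache, and the directory cache device according to the second aspect or any possible implementation of the second aspect.
According to a fourth aspect, a directory storage unit is provided, configured to store a directory of a multi-core system. The multi-core system includes a shared data cache and multiple processor cores. A data block in the shared data cache is copied to one, or to at least two, of the multiple processor cores. The directory records visitor information of the data blocks in the shared data cache, and a visitor of a data block is a processor core that holds a copy of the data block. The directory includes:
a single-pointer entry array and a shared entry array, where
each single-pointer entry in the single-pointer entry array records information about the sole visitor of a data block, or records association information between the single-pointer entry and a shared entry in the shared entry array; and
each shared entry in the shared entry array records information about multiple visitors of a data block.
With reference to the fourth aspect, in a first possible implementation, the single-pointer entry includes a tag, a shared-entry association bit, an all-shared bit, and a single pointer. The tag corresponds to the data block. The shared-entry association bit indicates whether the single-pointer entry is associated with a shared entry. When the single-pointer entry is not associated with a shared entry, the all-shared bit indicates either that the data block has only a sole visitor or that the data block is shared by all processor cores in the multi-core system. The single pointer records information about the sole visitor of the data block when the data block has only a sole visitor, and records the association information between the single-pointer entry and the shared entry when the two are associated.
The shared entry includes a sharer record structure and an association structure. The sharer record structure records information about the multiple visitors of the data block, and the association structure associates the shared entry with the single-pointer entry.
With reference to the first possible implementation of the fourth aspect, in a second possible implementation, the single-pointer entry includes an all-shared bit.
When the single-pointer entry is not associated with a shared entry, the all-shared bit indicates either that the data block has only a sole visitor or that the data block is shared by all processor cores in the multi-core system.
With reference to the first or second possible implementation of the fourth aspect, in a third possible implementation, the sharer record structure is a vector.
Based on the foregoing technical solutions, the embodiments of the present invention use a directory structure that consists of a single-pointer entry array and a shared entry array. When a data block has only a single visitor, only a single-pointer entry is used to record the information about that visitor. When the data block has multiple visitors, the visitor information is recorded by associating a single-pointer entry with a shared entry. In this way the average size of a directory entry is considerably reduced with only a small performance loss, which reduces the storage resources occupied by the directory and improves system scalability.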
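Brief Description of Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required by the embodiments. Apparently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative efforts.
FIG. 1 is a schematic diagram of a multi-core system to which the technical solutions of the embodiments of the present invention can be applied.
FIG. 2 is a schematic diagram of a directory according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a single-pointer entry according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a shared entry according to an embodiment of the present invention.
FIG. 5 is a schematic diagram of a directory according to another embodiment of the present invention.
FIG. 6 is a schematic flowchart of a directory access method according to an embodiment of the present invention.
FIG. 7a is a schematic flowchart of a directory access method according to another embodiment of the present invention.
FIG. 7b is a schematic flowchart of a directory access method according to still another embodiment of the present invention.
FIG. 8 is a schematic flowchart of a directory access method according to still another embodiment of the present invention.
FIG. 9a is a schematic flowchart of a directory access method according to still another embodiment of the present invention.
FIG. 9b is a schematic diagram of shared entry compression according to an embodiment of the present invention.
FIG. 10 is a schematic block diagram of a directory cache device according to an embodiment of the present invention.
FIG. 11 is a schematic diagram of a multi-core system according to an embodiment of the present invention.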
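Description of Embodiments
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Throughout this specification, the term "multi-core processor system" or "multi-core system" refers to a processing system that includes multiple processor cores. The system may be an on-chip multi-core processor or an on-board multi-core processing system. An on-chip multi-core processor is a processor in which multiple processor cores are integrated on one chip. An on-board multi-core processing system is a processing system in which each of multiple processor cores is packaged as a separate processor and the processors are integrated on a circuit board.
Throughout this specification, the term "processor core" is short for "processor core unit". The core is the most important component of a CPU (Central Processing Unit). It is manufactured from monocrystalline silicon by a specific production process, and all computation, command reception and storage, and data processing of the CPU are performed by the processor core. The term "multiple processor cores" means at least two processor cores, and covers both multi-core and many-core applications in the prior art.
Throughout this specification, the term "directory cache", also called "directory cache device", refers to a storage device used to store the data visitor directory in a multi-core system. To speed up directory access, the storage device is generally implemented as a cache. There are at least two ways to implement the directory cache. In the first, the directory cache is independent of the processor cores, that is, a block of storage space in the cache of the on-chip multi-core processor chip is allocated as the cache that stores the directory. In the second, the directory cache is distributed, that is, the directory is divided into several blocks, and the blocks are stored in the caches inside the individual processor cores of the on-chip multi-core processor chip.
Throughout this specification, the term "shared data cache" refers to a storage device used to store data blocks shared by multiple cores. To speed up access to data blocks, the storage device is generally implemented as a cache. In a specific implementation, the shared data cache is generally the level-2 (L2) or level-3 (L3) cache of the multi-core processor system.
Throughout this specification, the term "private data cache" refers to a storage device inside a processor core that stores data private to that core. In a specific implementation, the private data cache is generally the level-1 (L1) cache of the multi-core processor. To improve the efficiency with which a processor core accesses data blocks, the core reads part of the shared data into its private data cache.
Throughout this specification, the term "data block" refers to the granularity at which the processor cores of a multi-core processor system access data. Because the data block is stored in the shared data cache of the multi-core processor system, its granularity is generally a cache line. In a specific implementation the granularity may take other forms, for example a part of a cache line or multiple cache lines. This specification places no limitation on this.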
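Throughout this specification, the term "directory", also called "directory structure" or "data visitor directory", refers to a data structure that records the visitor information of data blocks. In a specific implementation, the data visitor directory includes a single-pointer entry array and a shared entry array. The single-pointer entry array consists of many single-pointer entries, and the shared entry array consists of multiple shared entries. What a single-pointer entry records depends on the number of visitors of the data block. When the data block has only a sole visitor, the single-pointer entry records the information about that sole visitor. When the data block has multiple visitors (two or more), the single-pointer entry records the association information between itself and the corresponding shared entry, and the shared entry records the information about the multiple visitors. A data visitor directory consists of one or more data visitor directory entries.
Throughout this specification, the term "entry of the data visitor directory" refers to a constituent unit of the data visitor directory. Entries in the directory correspond to data blocks in the shared data cache. Because the data visitor directory includes a single-pointer entry array and a shared entry array, when a data block has only one visitor, the directory entry of the data block is the single-pointer entry that records the information about the sole visitor. When a data block has multiple visitors (two or more), the directory entry of the data block consists of the single-pointer entry that records the association information between itself and the corresponding shared entry, together with the shared entry that records the information about the multiple visitors.
Throughout this specification, the term "visitor" refers to a processor core that accesses a data block. For example, when a data block is accessed by three processor cores, these three processor cores are the visitors of the data block.
Throughout this specification, the term "access request" refers to a directory access request issued by a processor core to query the visitor information of a data block.
Throughout this specification, the term "association information" has the following meaning: for a data block that has at least two visitors, the single-pointer entry of the data block records an access index of the corresponding shared entry. This access index is called the association information between the single-pointer entry of the data block and the corresponding shared entry, and it indicates that the two entries are associated.
Throughout this specification, the term "least recently used (LRU) principle" means that, when an entry (a single-pointer entry or a shared entry) is allocated for a data block, the entry that has been accessed least in the most recent period is selected from the single-pointer entry array or the shared entry array and used as the entry of the data block.
Throughout this specification, the term "invalidation message" refers to a message sent, when an entry is reallocated, to the visitors previously recorded in the entry so that they invalidate their copies of the original data block.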
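FIG. 1 is a schematic diagram of a multi-core system to which the technical solutions of the embodiments of the present invention can be applied.
As shown in FIG. 1, the multi-core system 100 includes multiple processor cores 110, a shared data cache 120, and a directory cache 130. The processor cores 110 can access a data block 121 in the shared data cache 120. A copy of the data block 121 is created in the private cache 111 of each processor core 110 that has accessed it, and a corresponding directory entry 131 in the directory cache 130 records the visitor list of the data block 121.
In other words, the data block 121 in the shared data cache 120 can be copied to one, or to at least two, of the processor cores 110, and a visitor of the data block 121 is a processor core that holds a copy of it.
The directory is the structure that records visitor lists. For this reason, the directory may also be referred to as a directory structure.
The directory is stored in the directory cache, specifically in a directory storage unit of the directory cache.
The directory cache may be centralized or distributed. Correspondingly, the directory may be a centralized directory, in which case one cache area in the multi-core system (for example, a multi-core processor chip) is set aside to store the directory. The directory may also be a distributed directory, in which case the directory is partitioned and the resulting parts are stored on the individual processor cores. For example, if the multi-core system includes 128 processor cores, the directory may be divided into 128 parts that are stored in these 128 processor cores respectively.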
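FIG. 2 is a schematic diagram of a directory 200 according to an embodiment of the present invention.
As shown in FIG. 2, the directory 200 includes a single-pointer entry array 210 and a shared entry array 220.
The single-pointer entry array 210 includes multiple single-pointer entries, and the shared entry array 220 includes multiple shared entries.
A single-pointer entry in the single-pointer entry array 210 records information about the sole visitor of a data block, or records association information between the single-pointer entry and a shared entry in the shared entry array 220. That is, the single-pointer entry records the sole visitor when the data block has only a sole visitor, and records the association with a shared entry in the shared entry array 220 when the data block has multiple visitors. The shared entry records the information about the multiple visitors of the data block.
In scale-out applications, most data has only one visitor. The data may be private by nature, or may behave as private data during a period of time. Most directory entries therefore only need a single pointer to record information about one processor core, for example the core number. Such an entry is called a single-pointer entry in the present invention. To track data that has multiple visitors, some directory entries still use a hardware structure that can track multiple visitors (for example, a vector, limited pointers, or another form). Such an entry is called a shared entry in the present invention. All single-pointer entries form the single-pointer entry array, and all shared entries form the shared entry array. The single-pointer entry array may have a relatively large number of entries, and the shared entry array a relatively small number.
A single-pointer entry can use relatively few bits to record one visitor, and a shared entry can use more bits to record multiple visitors. When a data block has only a sole visitor, only a single-pointer entry is used to record that visitor, and the single-pointer entry is not associated with a shared entry. When a data block has multiple visitors, the single-pointer entry of the data block is associated with a shared entry, and the associated shared entry records the multiple visitors.
Optionally, to save the storage space occupied by shared entries, an all-shared bit may also be set in the single-pointer entry. When the all-shared bit is set to 1, it indicates that the data block is shared by all processor cores in the multi-core system, and the single-pointer entry is not associated with a shared entry. That is, when a data block is shared by all processor cores, only the single-pointer entry is needed and no shared entry has to be associated.
The directory in this embodiment of the present invention uses a single-pointer entry array plus a shared entry array. When there is a single visitor, only a single-pointer entry records the visitor information. When there are multiple visitors, the visitor information is recorded by associating a single-pointer entry with a shared entry. In this way the average size of a directory entry is considerably reduced with only a small performance loss, which reduces the storage resources occupied by the directory and improves system scalability.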
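FIG. 3 is a schematic diagram of a single-pointer entry according to an embodiment of the present invention. As shown in FIG. 3, the single-pointer entry may include a tag 301, a shared-entry association bit 302, an all-shared bit 303, and a single pointer 304.
The tag 301 corresponds to the data block. For example, the tag may correspond to the address of the data block and may specifically be a subset of the address bits, so that the single-pointer entry of a data block can be found from the correspondence between the block address and the tag.
The shared-entry association bit 302 indicates whether the single-pointer entry is associated with a shared entry. For example, a value of 1 indicates that a shared entry is associated with the single-pointer entry, and a value of 0 indicates that none is.
The all-shared bit 303 indicates either that the data block is shared by all processor cores or that the data block has only a sole visitor. For example, when the all-shared bit 303 is 1, the data block is shared by all processor cores. When the shared-entry association bit is 0 (no shared entry is associated) and the all-shared bit is also 0, the data block has only a sole visitor.
The single pointer 304 records information about the sole visitor of the data block when the data block has only a sole visitor. When there are multiple visitors, the single pointer 304 records the association information between the single-pointer entry and the shared entry, and points to the shared entry. The information about the sole visitor may be an identifier of the visitor, for example the number of the visitor (processor core) or other identification information. The association information between the single-pointer entry and the shared entry may be a pointer or index information. The embodiments of the present invention place no limitation on this.
For example, when the shared-entry association bit is 0 (no shared entry is associated) and the all-shared bit is 0 (only a sole visitor), the single pointer records the sole visitor of the data block. When the shared-entry association bit is 1 (a shared entry is associated), the single pointer 304 records the association information, which points to the shared entry associated with the single-pointer entry.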
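The three cases described above can be summarized in a few lines of code. This is a minimal sketch under stated assumptions: the class name `SinglePointerEntry`, the field names, and the function `interpret` are hypothetical names introduced for illustration, and field widths are not modeled.

```python
from dataclasses import dataclass


@dataclass
class SinglePointerEntry:
    tag: int          # tag 301: part of the data block address
    has_shared: bool  # shared-entry association bit 302
    all_shared: bool  # all-shared bit 303 (meaningful only when has_shared is False)
    pointer: int      # single pointer 304: core id or association information


def interpret(entry: SinglePointerEntry):
    """Decode what a single-pointer entry says about its data block."""
    if entry.has_shared:
        # Association bit = 1: the pointer leads to the associated shared entry.
        return ("linked", entry.pointer)
    if entry.all_shared:
        # Association bit = 0, all-shared bit = 1: every core is a visitor.
        return ("all_shared", None)
    # Both bits are 0: the pointer holds the sole visitor.
    return ("private", entry.pointer)
```

For instance, `interpret(SinglePointerEntry(0x1A, False, False, 5))` reports a private block whose sole visitor is core 5.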
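FIG. 4 is a schematic diagram of a shared entry according to an embodiment of the present invention. As shown in FIG. 4, the shared entry may include a sharer record structure 401, a high-order address 402, and way selection bits 403. The high-order address 402 and the way selection bits 403 form the association structure that represents the association information.
The sharer record structure 401 records information about the multiple visitors of the data block. It may be a vector or another structure capable of recording multiple visitors.
The association structure (the high-order address 402 and the way selection bits 403) points to the single-pointer entry.
Because a data block more often has a sole visitor than multiple visitors, in the embodiments of the present invention the single-pointer entry array serves as the primary array and the shared entry array as the secondary array. As shown in FIG. 5, both the single-pointer entry array 510 and the shared entry array 520 use a cache-like set-associative structure. The number of sets (each row of the array is one set) is called the depth, and the number of ways (each column of the array is one way) is called the associativity. The single-pointer entry array has a large depth and a moderate associativity to reduce access power consumption. The shared entry array has a small depth and a large associativity to improve the utilization of shared entries. When the directory is accessed, the single-pointer entry array is first searched according to the address information in the access request, for example by looking up the tags of the single-pointer entries, to determine whether a single-pointer entry exists. Subsequent accesses from a single-pointer entry to a shared entry, and from a shared entry to a single-pointer entry, can be implemented by using "set number + way number". In a specific implementation, the set number is determined first and then the way number.
It should be noted that, among the fields of the single-pointer entry, the all-shared bit 303 is optional. When a data block is shared by all processor cores in the multi-core system, this can be indicated by the sharer record structure 401 of a shared entry. To save the storage space of shared entries, the all-shared bit 303 can be added to the single-pointer entry; setting the all-shared bit 303 to 1 then represents the sharing scenario in which the data block is shared by all processor cores in the multi-core system.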
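FIG. 6 is a schematic flowchart of a method 600 for accessing a data visitor directory in a multi-core system according to an embodiment of the present invention. The directory is the directory described in the foregoing embodiments. The method 600 may be performed by a directory cache.
S610: Receive a first access request sent by a first processor core, where the first access request is used to access an entry, in the directory, that corresponds to a first data block.
The first access request may carry address information of the data block. The directory can be accessed according to the address information in the access request to look up the entry corresponding to the data block.
S620: Determine, according to the first access request, that a first single-pointer entry corresponding to the first data block exists in the single-pointer entry array.
When the first access request is received, the single-pointer entry array is accessed first to determine whether a single-pointer entry corresponding to the data block exists. Specifically, the single-pointer entry array can be searched according to the address information in the access request. Taking the single-pointer entry structure shown in FIG. 3 as an example, the address information carried in the access request can be compared with the tags of the single-pointer entries to determine whether a single-pointer entry corresponding to the data block exists. In this embodiment, the first single-pointer entry corresponding to the first data block exists in the single-pointer entry array.
S630: When it is determined, according to the first single-pointer entry, that a first shared entry associated with the first single-pointer entry exists in the shared entry array, determine multiple visitors of the first data block according to the first shared entry.
When the data block (the first data block) has a corresponding single-pointer entry (the first single-pointer entry), it is then determined, according to the first single-pointer entry, whether a shared entry associated with it exists in the shared entry array. Taking the structure shown in FIG. 3 as an example, this can be determined from the shared-entry association bit of the single-pointer entry. When an associated shared entry (the first shared entry) exists, the multiple visitors of the first data block are determined according to the first shared entry. Specifically, the associated shared entry can be located from the association information recorded in the single-pointer entry, for example the association information recorded in the single pointer of the structure shown in FIG. 3. The shared entry is then accessed, and the multiple visitors of the data block are obtained from it.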
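Taking the directory structure shown in FIG. 5 as an example, a shared entry can be reached from a single-pointer entry as follows.
1. Take the low-order bits of the set number of the single-pointer entry to obtain the set number of the shared entry.
Because the single-pointer entry array has more sets than the shared entry array, the set number of the shared entry can be determined from the low-order bits of the set number of the single-pointer entry.
Assume that the single-pointer entry array has 4 ways and 64 sets, and that the shared entry array has 8 ways and 16 sets.
The shared-entry association bit of the single-pointer entry being accessed is 1, which indicates that a shared entry is associated and that the shared entry array must be accessed. The set number of a 64-set single-pointer entry array has 6 bits. The single-pointer entry is in set 55, written as b_110111 (b_ denotes binary). The shared entry array has 16 sets and needs a 4-bit set number as its index. Taking the low 4 bits of b_110111 gives b_0111, so the corresponding shared entry is in set 7 of the shared entry array.
2. Access the shared entry array and read out the shared entries of all ways in that set.
According to the set number obtained in the previous step, set 7 of the shared entry array is accessed, and the 8 shared entries (8 ways) in the set are obtained.
3. Select a way among the shared entries according to the single pointer in the single-pointer entry.
Selecting among 8 ways requires 3 bits. Assuming the value of the single pointer is b_1100, the low 3 bits b_100 are used, that is, way 4, which yields the associated shared entry.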
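The three steps above amount to two bit-mask operations. The following sketch reproduces the worked example (16 sets and 8 ways in the shared entry array); the function name `shared_location` and the constant names are hypothetical, and the sketch assumes that the set count and way count are powers of two.

```python
SHARED_SETS = 16  # shared entry array depth from the example
SHARED_WAYS = 8   # shared entry array associativity from the example


def shared_location(sp_set: int, single_pointer: int):
    """Map a single-pointer entry to its associated shared entry.

    sp_set is the set number of the single-pointer entry, and
    single_pointer is the value stored in its single-pointer field.
    Returns (set number, way number) in the shared entry array.
    """
    shared_set = sp_set & (SHARED_SETS - 1)          # keep the low 4 bits
    shared_way = single_pointer & (SHARED_WAYS - 1)  # keep the low 3 bits
    return shared_set, shared_way


# Set 55 (b_110111) with single pointer b_1100 maps to set 7, way 4.
print(shared_location(0b110111, 0b1100))
```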
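In this embodiment of the present invention, optionally, as shown in FIG. 7a, the method 600 may further include:
S640: When it is determined, according to the first single-pointer entry, that no first shared entry associated with the first single-pointer entry exists in the shared entry array, determine, according to the first single-pointer entry, the sole visitor of the first data block, or determine that the first data block is shared by all processor cores in the multi-core system.
That is, when the first single-pointer entry is not associated with a first shared entry, the visitors of the first data block are determined from the first single-pointer entry alone. Specifically, the first single-pointer entry can record the sole visitor of the first data block or indicate that the first data block is shared by all processor cores in the multi-core system. Neither case requires an associated shared entry, and both can be represented with few bits. For specific examples, refer to the foregoing embodiments; details are not repeated here.
In the directory access method of this embodiment, the single-pointer entry is accessed first, and the associated shared entry is accessed only when the single-pointer entry is associated with one. The sole visitor is obtained from the single-pointer entry when the data block has only a sole visitor, and the multiple visitors are obtained from the associated shared entry when the data block has multiple visitors. In this way the average size of a directory entry is considerably reduced with only a small performance loss, which reduces the storage resources occupied by the directory and improves system scalability.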
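In this embodiment of the present invention, when no single-pointer entry corresponding to the data block exists in the single-pointer entry array, a single-pointer entry can be allocated for the data block.
Therefore, optionally, after the first access request sent by the first processor core is received, as shown in FIG. 7b, the method 600 may further include:
S662: Determine, according to the first access request, that no single-pointer entry corresponding to the first data block exists in the single-pointer entry array.
S663: Allocate, in the single-pointer entry array, a first single-pointer entry corresponding to the first data block, and record information about the first processor core in the first single-pointer entry.
That is, when a data block has no corresponding single-pointer entry, a single-pointer entry can be allocated for it, and the information about the sole visitor (the first processor core) is recorded in the allocated entry.
Specifically, if an unused single-pointer entry exists in the single-pointer entry array, one unused single-pointer entry is selected as the first single-pointer entry, and the information about the first processor core is recorded in it.
If no unused single-pointer entry exists in the single-pointer entry array, a single-pointer entry is selected according to the least recently used principle, where:
if the selected single-pointer entry is not associated with a shared entry and records information about a sole visitor, an invalidation message is sent to the recorded sole visitor, and then the information about the first processor core is recorded in the selected single-pointer entry;
if the selected single-pointer entry is not associated with a shared entry and indicates that the data block is shared by all processor cores in the multi-core system, an invalidation message is broadcast to all the processor cores, and the information about the first processor core is recorded in the selected single-pointer entry; and
if the selected single-pointer entry is associated with a shared entry, the multiple visitors recorded in the associated shared entry are determined, an invalidation message is sent to the recorded visitors, and the information about the first processor core is recorded in the selected single-pointer entry.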
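The following describes the directory access method of the embodiments of the present invention in detail with reference to an example.
FIG. 8 is a schematic flowchart of a directory access method according to still another embodiment of the present invention.
801: Access the single-pointer entry array. If a single-pointer entry is hit, perform 802; otherwise, perform 807.
For example, when the single-pointer entry and the shared entry use the structures shown in FIG. 3 and FIG. 4 respectively, the address information carried in the access request can be compared with the tags of the single-pointer entries to determine whether a single-pointer entry corresponding to the data block exists.
802: Determine whether the single-pointer entry is associated with a shared entry. If yes, perform 803; otherwise, perform 804.
For example, this can be determined from the shared-entry association bit of the hit single-pointer entry: 1 indicates that a shared entry is associated, and 0 indicates that none is.
803: Access the associated shared entry and obtain the visitor list.
For example, the associated shared entry can be found by using the single pointer of the single-pointer entry, and the visitor list is obtained from the sharer record structure of the associated shared entry.
804: Determine whether the data block is all-shared. If yes, perform 805; otherwise, perform 806.
When no shared entry is associated, it is determined whether the data block is shared by all processor cores. For example, this can be determined from the all-shared bit of the single-pointer entry: 0 indicates that the data block has only a sole visitor (not all-shared), and 1 indicates that the data block is shared by all processor cores (all-shared).
805: Determine that the data block is shared by all processor cores.
806: Obtain the sole visitor.
For example, the sole visitor can be obtained from the single pointer of the single-pointer entry.
807: Determine whether an unused single-pointer entry exists. If yes, perform 808; otherwise, perform 809.
808: Select an unused single-pointer entry and record the information about the accessing processor core.
For example, the identifier of the processor core can be recorded. With 64 cores, a 6-bit identifier can be used.
809: Select the least recently used single-pointer entry and determine whether it is associated with a shared entry. If yes, perform 810; otherwise, perform 811.
810: Invalidate the multiple visitors and record the information about the accessing processor core.
The multiple visitors recorded in the associated shared entry are determined, an invalidation message is sent to them, and then the information about the accessing processor core is recorded in the selected single-pointer entry.
811: Determine whether the selected single-pointer entry indicates all-shared. If yes, perform 812; otherwise, perform 813.
812: Broadcast an invalidation message, and then record the information about the accessing processor core in the selected single-pointer entry.
813: Invalidate the sole visitor and record the information about the accessing processor core.
The sole visitor recorded in the selected single-pointer entry is determined, an invalidation message is sent to it, and then the information about the accessing processor core is recorded in the selected single-pointer entry.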
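Steps 801 to 813 can be traced in a small software model. The sketch below is an assumption-laden simplification rather than the patented hardware: it treats the single-pointer entry array as one fully associative LRU list (an `OrderedDict`) instead of a set-associative array, models the shared entry array as a plain dictionary, frees the associated shared entry when its single-pointer entry is evicted, and returns the set of cores to invalidate instead of sending messages. The names `Entry` and `lookup_or_allocate` are hypothetical.

```python
from collections import OrderedDict


class Entry:
    """Single-pointer entry without the tag (the dict key plays that role)."""

    def __init__(self, core):
        self.has_shared = False  # shared-entry association bit
        self.all_shared = False  # all-shared bit
        self.pointer = core      # sole visitor, or index of the shared entry


def lookup_or_allocate(sp_array, shared_array, addr, core, capacity, num_cores):
    """Return (visitors, invalidated) for one directory access."""
    entry = sp_array.get(addr)
    if entry is not None:                          # 801: hit
        sp_array.move_to_end(addr)                 # mark as most recently used
        if entry.has_shared:                       # 802 -> 803
            return set(shared_array[entry.pointer]), set()
        if entry.all_shared:                       # 804 -> 805
            return set(range(num_cores)), set()
        return {entry.pointer}, set()              # 806
    invalidated = set()
    if len(sp_array) >= capacity:                  # 807 -> 809: no unused entry
        _, victim = sp_array.popitem(last=False)   # least recently used entry
        if victim.has_shared:                      # 810: invalidate recorded visitors
            invalidated = set(shared_array.pop(victim.pointer))
        elif victim.all_shared:                    # 812: broadcast
            invalidated = set(range(num_cores))
        else:                                      # 813: invalidate sole visitor
            invalidated = {victim.pointer}
    sp_array[addr] = Entry(core)                   # 808: record the requester
    return {core}, invalidated
```

With a capacity of two entries, accessing blocks 0x100 (core 3) and 0x200 (core 5), touching 0x100 again, and then accessing 0x300 evicts the entry for 0x200 and reports core 5 as the core to invalidate.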
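In this embodiment of the present invention, after a single-pointer entry (the first single-pointer entry) is allocated for a data block (the first data block described above), the first single-pointer entry records the sole visitor of the first data block (the first processor core). At this point, the first data block is private to the first processor core. When the first data block is accessed by another processor core (the second processor core), a shared entry needs to be allocated in the shared entry array to record the information about the multiple visitors (the first processor core and the second processor core).
Therefore, optionally, as shown in FIG. 9a, the method 600 may further include:
S671: Receive a second access request sent by a second processor core, where the second access request is used to access the entry, in the directory, that corresponds to the first data block.
S672: Determine, according to the second access request, that the first single-pointer entry corresponding to the first data block exists in the single-pointer entry array.
S673: When it is determined, according to the first single-pointer entry, that no shared entry associated with the first single-pointer entry exists in the shared entry array, determine, according to the first single-pointer entry, that the sole visitor of the first data block is the first processor core.
S674: Allocate a first shared entry in the shared entry array, establish an association between the first single-pointer entry and the first shared entry, and record the information about the first processor core and information about the second processor core in the first shared entry.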
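Step S674 turns a private entry into an associated pair. The following is a minimal sketch of that transition, assuming that a free shared entry index is already known and that entries are modeled as plain dictionaries. The function name `add_second_visitor` and the dictionary keys are hypothetical.

```python
def add_second_visitor(sp_entry, shared_array, free_index, second_core):
    """Associate a private single-pointer entry with a shared entry (S674).

    sp_entry is a dict with keys 'has_shared', 'all_shared', and 'pointer';
    shared_array maps a shared entry index to the set of recorded visitors.
    """
    first_core = sp_entry["pointer"]                      # S673: current sole visitor
    shared_array[free_index] = {first_core, second_core}  # record both visitors
    sp_entry["has_shared"] = True                         # set the association bit
    sp_entry["pointer"] = free_index                      # pointer now holds the association
    return shared_array[free_index]


entry = {"has_shared": False, "all_shared": False, "pointer": 3}
shared = {}
print(add_second_visitor(entry, shared, 0, 7))
```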
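Specifically, when the first shared entry is allocated, if an unused shared entry exists in the shared entry array, one unused shared entry is selected as the first shared entry.
If no unused shared entry exists in the shared entry array and a shared entry that records information about only one visitor exists, that shared entry is selected, and the recorded visitor information is written into the single-pointer entry associated with the selected shared entry.
If no unused shared entry exists in the shared entry array and no shared entry records information about only one visitor, a shared entry is selected according to the least recently used principle. If the number of visitors recorded in the selected shared entry is greater than a predetermined threshold, the single-pointer entry associated with the selected shared entry is set to indicate that the data block is shared by all processor cores in the multi-core system. If the number of visitors recorded in the selected shared entry is not greater than the predetermined threshold, information about one of the recorded visitors is written into the single-pointer entry associated with the selected shared entry, and an invalidation message is sent to the other recorded visitors.
That is, when a shared entry is allocated, an unused shared entry is preferred. If there is no unused shared entry, a shared entry that is in use must be reclaimed. A shared entry with only one visitor is preferred; if there is none, the least recently used shared entry is selected. When a shared entry that records only one visitor is reclaimed, the sole visitor is written into the associated single-pointer entry, so no visitor information is lost. When a shared entry that records multiple visitors is reclaimed, the sharing list is compressed and stored in the associated single-pointer entry in a way that depends on the number of visitors. Specifically, if the number of visitors is greater than the predetermined threshold, the associated single-pointer entry is set to indicate that the data block is shared by all processor cores; this is called up-conversion. If the number of visitors is not greater than the predetermined threshold, one of the visitors is written into the associated single-pointer entry and an invalidation message is sent to the other visitors, so that only one visitor is kept; this is called down-conversion.
For example, as shown in FIG. 9b, when up-conversion is performed, the all-shared bit of the associated single-pointer entry is set to 1, which indicates that the data block is shared by all processor cores. When down-conversion is performed, only one visitor is kept (visitor 3 in FIG. 9b) and recorded in the associated single-pointer entry.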
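The reclaim rule, including up-conversion and down-conversion, can be sketched as a pure function. This is an illustration under assumptions: the text does not state which visitor is kept during down-conversion, so the sketch arbitrarily keeps the lowest-numbered one; the function name `reclaim_shared_entry` and the shape of the return value are hypothetical.

```python
def reclaim_shared_entry(sharers, threshold):
    """Compress a shared entry into its associated single-pointer entry.

    Returns (all_shared_bit, pointer, cores_to_invalidate), describing the
    new state of the single-pointer entry and any invalidations to send.
    """
    ordered = sorted(sharers)
    if len(ordered) == 1:
        # Only one visitor: copy it back, nothing is lost.
        return False, ordered[0], []
    if len(ordered) > threshold:
        # Up-conversion: mark the block as shared by all cores.
        return True, None, []
    # Down-conversion: keep one visitor (assumed: the lowest-numbered one)
    # and invalidate the copies held by the others.
    return False, ordered[0], ordered[1:]


print(reclaim_shared_entry({3, 9}, threshold=4))
```

Up-conversion over-approximates the visitor set and therefore costs extra invalidation traffic later, while down-conversion costs immediate invalidations; the threshold trades one against the other.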
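A reclaimed shared entry can be allocated to another data block. That is, in the embodiments of the present invention, shared entries can be allocated dynamically as the data sharing pattern changes. Directory resources are therefore used more flexibly, which improves their utilization.
When a shared entry is reclaimed, the single-pointer entry is accessed from the shared entry. In this case, the associated single-pointer entry can be determined from the shared entry, specifically from the association structure of the shared entry.
Taking the directory structure shown in FIG. 5 as an example, a single-pointer entry can be reached from a shared entry as follows.
1. Concatenate the high-order address with the set number of the shared entry to obtain the set number in the single-pointer entry array.
Because the single-pointer entry array has more sets than the shared entry array, the high-order address must be combined with the set number of the shared entry to obtain the set number of the single-pointer entry.
Assume that the single-pointer entry array has 4 ways and 64 sets, and that the shared entry array has 8 ways and 16 sets.
Assume that the shared entry is in set 5 (b_0101), the high-order address is b_10, and the way selection bits are b_01. The set number of the corresponding single-pointer entry is obtained by placing the high-order address in front of the shared entry set number, which gives b_100101, that is, 37.
2. Access the single-pointer entry array and read out the single-pointer entries of all ways in that set.
According to the set number obtained in the previous step, set 37 of the single-pointer entry array is accessed, and the 4 single-pointer entries (4 ways) in the set are obtained.
3. Select a way among the single-pointer entries according to the way selection bits of the shared entry.
The way selection bits of the shared entry are used for way selection. The way selection bits are b_01, that is, way 1, which yields the associated single-pointer entry.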
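The reverse mapping is a shift and an OR. The sketch below reproduces the worked example (a 16-set shared entry array, so the set number has 4 bits); the function name `single_pointer_location` and the constant name are hypothetical.

```python
SHARED_SET_BITS = 4  # 16 sets in the shared entry array


def single_pointer_location(shared_set, high_order_address, way_select):
    """Map a shared entry back to its associated single-pointer entry.

    The high-order address supplies the set-number bits that the shared
    entry array's shorter set index does not contain.
    Returns (set number, way number) in the single-pointer entry array.
    """
    sp_set = (high_order_address << SHARED_SET_BITS) | shared_set
    return sp_set, way_select


# Set 5 (b_0101), high-order address b_10, way selection b_01 -> set 37, way 1.
print(single_pointer_location(0b0101, 0b10, 0b01))
```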
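It should be understood that, in the various embodiments of the present invention, the sequence numbers of the foregoing processes do not imply an execution order. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
FIG. 10 is a schematic block diagram of a directory cache device 1000 according to an embodiment of the present invention. As shown in FIG. 10, the directory cache device 1000 includes a directory storage unit 1010 and an execution unit 1020.
The directory storage unit 1010 is configured to store a directory of a multi-core system. The multi-core system includes a shared data cache and multiple processor cores. A data block in the shared data cache is copied to one, or to at least two, of the multiple processor cores. The directory records visitor information of the data blocks in the shared data cache, and a visitor of a data block is a processor core that holds a copy of the data block. The directory includes a single-pointer entry array and a shared entry array. Each single-pointer entry in the single-pointer entry array records information about the sole visitor of a data block, or records association information between the single-pointer entry and a shared entry in the shared entry array. Each shared entry in the shared entry array records information about multiple visitors of a data block.
The execution unit 1020 is configured to:
receive a first access request sent by a first processor core, where the first access request is used to access an entry, in the directory, that corresponds to a first data block;
determine, according to the first access request, that a first single-pointer entry corresponding to the first data block exists in the single-pointer entry array; and
when it is determined, according to the first single-pointer entry, that a first shared entry associated with the first single-pointer entry exists in the shared entry array, determine multiple visitors of the first data block according to the first shared entry.
The directory cache device in this embodiment of the present invention uses a directory structure of a single-pointer entry array plus a shared entry array. When multiple visitors do not need to be recorded, the single-pointer entry is not associated with a shared entry. When multiple visitors need to be recorded, the single-pointer entry is associated with a shared entry. The average size of a directory entry is considerably reduced with only a small performance loss, which reduces the storage resources occupied by the directory and improves system scalability.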
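In this embodiment of the present invention, optionally, the single-pointer entry includes a tag, a shared-entry association bit, and a single pointer. The tag corresponds to the data block. The shared-entry association bit indicates whether the single-pointer entry is associated with a shared entry. The single pointer records information about the sole visitor of the data block when the data block has only a sole visitor, and records the association information between the single-pointer entry and the shared entry when the two are associated.
The shared entry includes a sharer record structure and an association structure. The sharer record structure records information about the multiple visitors of the data block, and the association structure associates the shared entry with the single-pointer entry.
In this embodiment of the present invention, optionally, the single-pointer entry further includes an all-shared bit.
When the single-pointer entry is not associated with a shared entry, the all-shared bit indicates either that the data block has only a sole visitor or that the data block is shared by all processor cores in the multi-core system.
In this embodiment of the present invention, optionally, a single-pointer entry in the single-pointer entry array is further used to indicate that the data block is shared by all processor cores in the multi-core system, and the execution unit 1020 is further configured to:
when it is determined, according to the first single-pointer entry, that no first shared entry associated with the first single-pointer entry exists in the shared entry array, determine, according to the first single-pointer entry, the sole visitor of the first data block, or determine that the first data block is shared by all processor cores in the multi-core system.
In this embodiment of the present invention, optionally, after receiving the first access request sent by the first processor core, the execution unit 1020 is further configured to:
determine, according to the first access request, that no single-pointer entry corresponding to the first data block exists in the single-pointer entry array; and
allocate, in the single-pointer entry array, a first single-pointer entry corresponding to the first data block, and record information about the first processor core in the first single-pointer entry.
In this embodiment of the present invention, optionally, the execution unit 1020 is configured to:
if an unused single-pointer entry exists in the single-pointer entry array, select one unused single-pointer entry as the first single-pointer entry and record the information about the first processor core in it;
if no unused single-pointer entry exists in the single-pointer entry array, select a single-pointer entry according to the least recently used principle; if the selected single-pointer entry is not associated with a shared entry and records information about a sole visitor, send an invalidation message to the recorded sole visitor, and then record the information about the first processor core in the selected single-pointer entry;
if the selected single-pointer entry is not associated with a shared entry and indicates that the data block is shared by all processor cores in the multi-core system, broadcast an invalidation message to all the processor cores, and record the information about the first processor core in the selected single-pointer entry; and
if the selected single-pointer entry is associated with a shared entry, determine, according to the associated shared entry, the multiple visitors recorded in that shared entry, send an invalidation message to the recorded visitors, and record the information about the first processor core in the selected single-pointer entry.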
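In this embodiment of the present invention, optionally, the execution unit 1020 is further configured to:
receive a second access request sent by a second processor core, where the second access request is used to access the entry, in the directory, that corresponds to the first data block;
determine, according to the second access request, that the first single-pointer entry corresponding to the first data block exists in the single-pointer entry array;
when it is determined, according to the first single-pointer entry, that no shared entry associated with the first single-pointer entry exists in the shared entry array, determine, according to the first single-pointer entry, that the sole visitor of the first data block is the first processor core; and
allocate a first shared entry in the shared entry array, establish an association between the first single-pointer entry and the first shared entry, and record the information about the first processor core and information about the second processor core in the first shared entry.
In this embodiment of the present invention, optionally, the execution unit 1020 is configured to:
if an unused shared entry exists in the shared entry array, select one unused shared entry as the first shared entry;
if no unused shared entry exists in the shared entry array and a shared entry that records information about only one visitor exists, select that shared entry and write the recorded visitor information into the single-pointer entry associated with the selected shared entry; and
if no unused shared entry exists in the shared entry array and no shared entry records only one visitor, select a shared entry according to the least recently used principle; if the number of visitors recorded in the selected shared entry is greater than a predetermined threshold, set the single-pointer entry associated with the selected shared entry to indicate that the data block is shared by all processor cores in the multi-core system; and if the number of visitors recorded in the selected shared entry is not greater than the predetermined threshold, write information about one of the recorded visitors into the single-pointer entry associated with the selected shared entry, and send an invalidation message to the other recorded visitors.
The directory stored in the directory storage unit 1010 of the directory cache device 1000 may be the directory described in the foregoing embodiments of the present invention, and the execution unit 1020 may perform the procedures of the foregoing method embodiments. For the corresponding details, refer to the foregoing embodiments; for brevity, they are not repeated here.
An embodiment of the present invention further provides a multi-core system. As shown in FIG. 11, the multi-core system 1100 includes multiple processor cores 1110, a shared data cache 1120, and the directory cache device 1000 described in the foregoing embodiments of the present invention.
Specifically, compared with the multi-core system 100 in FIG. 1, the multi-core system 1100 in this embodiment uses the new directory cache device 1000, which contains the new directory structure provided by the embodiments of the present invention.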
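A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described in the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above in general terms of function. Whether the functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.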

Claims (21)

  1. 一种多核***中数据访问者目录的访问方法,其特征在于,应用于多核***,所述多核***包括共享数据缓存和多个处理器核,所述共享数据缓存中的数据块被复制到所述多个处理器核中的一个或者至少两个处理器核中,所述多核***还包括数据访问者目录,所述目录用于记录所述共享数据缓存中的数据块的访问者信息,所述数据块的访问者为保存有所述数据块拷贝的处理器核;
    所述目录包括单指针表项阵列和共享表项阵列,其中,所述单指针表项阵列中的每个单指针表项用于记录所述数据块的唯一访问者的信息,或者记录所述单指针表项和所述共享表项阵列中的共享表项的关联信息,所述共享表项阵列中的每个共享表项用于记录所述数据块的多个访问者的信息;
    所述方法包括:
    接收第一处理器核发送的第一访问请求,所述第一访问请求用于访问所述目录中和第一数据块对应的表项;
    根据所述第一访问请求,确定所述单指针表项阵列中存在所述第一数据块对应的第一单指针表项;
    根据所述第一单指针表项,确定在所述共享表项阵列中存在与所述第一单指针表项关联的第一共享表项时,根据所述第一共享表项确定所述第一数据块的多个访问者。
  2. 根据权利要求1所述的方法,其特征在于,所述单指针表项阵列中的单指针表项还用于标识所述数据块被所述多核***中的所有处理器核共享,所述方法还包括:
    根据所述第一单指针表项,确定在所述共享表项阵列中不存在与所述第一单指针表项关联的第一共享表项时,根据所述第一单指针表项确定所述第一数据块的唯一访问者或确定所述第一数据块被所述多核***中的所有处理器核共享。
  3. 根据权利要求1或2所述的方法,其特征在于,在接收第一处理器核发送的第一访问请求之后,所述方法还包括:
    根据所述第一访问请求,确定所述单指针表项阵列中不存在与所述第一数据块对应的单指针表项;
    在所述单指针表项阵列中为所述第一数据块分配与所述第一数据块对 应的第一单指针表项,并在所述第一单指针表项中记录所述第一处理器核的信息。
  4. 根据权利要求3所述的方法,其特征在于,所述在所述单指针表项阵列中为所述第一数据块分配与所述第一数据块对应的第一单指针表项,并在所述第一单指针表项中记录所述第一处理器核的信息,包括:
    若所述单指针表项阵列中存在未使用的单指针表项,则从所述未使用的单指针表项中选择一个单指针表项作为所述第一单指针表项,记录所述第一处理器核的信息;
    若所述单指针表项阵列中不存在未使用的单指针表项,则按照最近最少使用原则选择单指针表项,若选择的单指针表项未关联共享表项,且记录唯一访问者的信息,则向记录的唯一访问者发送无效化消息,再在所选择的单指针表项中记录所述第一处理器核的信息,
    若所选择的单指针表项未关联共享表项,且标识数据块被所述多核***中的所有处理器核共享,则向所述所有处理器核广播无效化消息,并在所选择的单指针表项中记录所述第一处理器核的信息,
    若所选择的单指针表项关联共享表项,则根据与所述所选择的单指针表项关联的共享表项确定所关联的共享表项记录的多个访问者,向记录的多个访问者发送无效化消息,并在所选择的单指针表项中记录所述第一处理器核的信息。
  5. 根据权利要求3或4所述的方法,其特征在于,所述方法还包括:
    接收第二处理器核发送的第二访问请求,所述第二访问请求用于访问所述目录中和所述第一数据块对应的表项;
    根据所述第二访问请求,确定所述单指针表项阵列中存在所述第一数据块对应的所述第一单指针表项;
    根据所述第一单指针表项,确定在所述共享表项阵列中不存在与所述第一单指针表项关联的共享表项时,根据所述第一单指针表项确定所述第一数据块的唯一访问者为所述第一处理器核;
    在所述共享表项阵列中分配第一共享表项,建立所述第一单指针表项与所述第一共享表项的关联关系,并在所述第一共享表项中记录所述第一处理器核的信息和所述第二处理器核的信息。
  6. 根据权利要求5所述的方法,其特征在于,所述在所述共享表项阵 列中分配第一共享表项,包括:
    若所述共享表项阵列中存在未使用的共享表项,则从所述未使用的共享表项中选择一个共享表项作为所述第一共享表项;
    若所述共享表项阵列中不存在未使用的共享表项且存在只记录一个访问者信息的共享表项,则选择所述只记录一个访问者信息的共享表项,将记录的一个访问者信息写入所选择的共享表项关联的单指针表项中;
    若所述共享表项阵列中不存在未使用的共享表项且不存在只记录一个访问者信息的共享表项,则按照最近最少使用原则选择共享表项,若选择的共享表项记录的访问者数量大于预定阈值,则将所选择的共享表项关联的单指针表项设置为标识数据块被所述多核***中的所有处理器核共享,若所选择的共享表项记录的访问者数量不大于预定阈值,则将记录的访问者中的一个访问者信息写入所选择的共享表项关联的单指针表项中,并向记录的访问者中的其他访问者发送无效化消息。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,
    所述单指针表项包括标签、共享表项关联位和单指针,其中,所述标签用于与所述数据块对应,所述共享表项关联位用于标识所述单指针表项和所述共享表项是否关联,所述单指针用于在所述数据块只有唯一访问者时记录所述数据块的唯一访问者的信息,在所述单指针表项和所述共享表项关联时记录所述单指针表项和所述共享表项的关联信息;
    所述共享表项包括共享者记录结构和关联结构,其中,所述共享者记录结构用于记录所述数据块的多个访问者的信息,所述关联结构用于关联所述单指针表项。
  8. 根据权利要求7所述的方法,其特征在于,所述单指针表项还包括:全共享位,
    所述全共享位用于在所述单指针表项和所述共享表项未关联时标识所述数据块只有唯一访问者或标识所述数据块被所述多核***中的所有处理器核共享。
  9. 一种目录缓存设备,其特征在于,包括:
    目录存储单元,用于保存多核***中的数据访问者目录,所述多核***包括共享数据缓存和多个处理器核,所述共享数据缓存中的数据块被复制到所述多个处理器核中的一个或者至少两个处理器核,所述目录用于记录所述 共享数据缓存中的数据块的访问者信息,所述数据块的访问者为保存有所述数据块的拷贝的处理器核,所述目录包括单指针表项阵列和共享表项阵列,其中,所述单指针表项阵列中的每个单指针表项用于记录所述数据块的唯一访问者的信息,或者记录所述单指针表项和所述共享表项阵列中的共享表项的关联信息,所述共享表项阵列中的每个共享表项用于记录所述数据块的多个访问者的信息;
    执行单元,用于:
    接收第一处理器核发送的第一访问请求,所述第一访问请求用于访问所述目录中和第一数据块对应的表项;
    根据所述第一访问请求,确定所述单指针表项阵列中存在所述第一数据块对应的第一单指针表项;
    根据所述第一单指针表项,确定在所述共享表项阵列中存在与所述第一单指针表项关联的第一共享表项时,根据所述第一共享表项确定所述第一数据块的多个访问者。
  10. 根据权利要求9所述的目录缓存设备,其特征在于,所述单指针表项阵列中的单指针表项还用于标识所述数据块被所述多核***中的所有处理器核共享,所述执行单元还用于:
    根据所述第一单指针表项,确定在所述共享表项阵列中不存在与所述第一单指针表项关联的第一共享表项时,根据所述第一单指针表项确定所述第一数据块的唯一访问者或确定所述第一数据块被所述多核***中的所有处理器核共享。
  11. 根据权利要求9或10所述的目录缓存设备,其特征在于,在所述执行单元接收所述第一处理器核发送的第一访问请求之后,所述执行单元还用于:
    根据所述第一访问请求,确定所述单指针表项阵列中不存在所述第一数据块对应的单指针表项;
    在所述单指针表项阵列中为所述第一数据块分配与所述第一数据块对应的第一单指针表项,并在所述第一单指针表项中记录所述第一处理器核的信息。
  12. 根据权利要求11所述的目录缓存设备,其特征在于,所述执行单元用于:
    若所述单指针表项阵列中存在未使用的单指针表项,则从所述未使用的单指针表项中选择一个单指针表项作为所述第一单指针表项,记录所述第一处理器核的信息;
    若所述单指针表项阵列中不存在未使用的单指针表项,则按照最近最少使用原则选择单指针表项,若选择的单指针表项未关联共享表项,且记录唯一访问者的信息,则向记录的唯一访问者发送无效化消息,再在所选择的单指针表项中记录所述第一处理器核的信息,
    若所选择的单指针表项未关联共享表项,且标识数据块被所述多核***中的所有处理器核共享,则向所述所有处理器核广播无效化消息,并在所选择的单指针表项中记录所述第一处理器核的信息,
    若所选择的单指针表项关联共享表项,则根据与所述所选择的单指针表项所关联的共享表项确定所关联的共享表项记录的多个访问者,向记录的多个访问者发送无效化消息,并在所选择的单指针表项中记录所述第一处理器核的信息。
  13. 根据权利要求11或12所述的目录缓存设备,其特征在于,所述执行单元还用于:
    接收第二处理器核发送的第二访问请求,所述第二访问请求用于访问所述目录中和所述第一数据块对应的表项;
    根据所述第二访问请求,确定所述单指针表项阵列中存在所述第一数据块对应的所述第一单指针表项;
    根据所述第一单指针表项,确定在所述共享表项阵列中不存在与所述第一单指针表项关联的共享表项时,根据所述第一单指针表项确定所述第一数据块的唯一访问者为所述第一处理器核;
    在所述共享表项阵列中分配第一共享表项,建立所述第一单指针表项与所述第一共享表项的关联关系,并在所述第一共享表项中记录所述第一处理器核的信息和所述第二处理器核的信息。
  14. 根据权利要求13所述的目录缓存设备,其特征在于,所述执行单元用于:
    若所述共享表项阵列中存在未使用的共享表项,则从所述未使用的共享表项中选择一个共享表项作为所述第一共享表项;
    若所述共享表项阵列中不存在未使用的共享表项且存在只记录一个访 问者信息的共享表项,则选择所述只记录一个访问者信息的共享表项,将记录的一个访问者信息写入所选择的共享表项关联的单指针表项中;
    若所述共享表项阵列中不存在未使用的共享表项且不存在只记录一个访问者信息的共享表项,则按照最近最少使用原则选择共享表项,若选择的共享表项记录的访问者数量大于预定阈值,则将所选择的共享表项关联的单指针表项设置为标识数据块被所述多核***中的所有处理器核共享,若所选择的共享表项记录的访问者数量不大于预定阈值,则将记录的访问者中的一个访问者信息写入所选择的共享表项关联的单指针表项中,并向记录的访问者中的其他访问者发送无效化消息。
  15. 根据权利要求9至14中任一项所述的目录缓存设备,其特征在于,
    所述单指针表项包括标签、共享表项关联位和单指针,其中,所述标签用于与所述数据块对应,所述共享表项关联位用于标识所述单指针表项和所述共享表项是否关联,所述单指针用于在所述数据块只有唯一访问者时记录所述数据块的唯一访问者的信息,在所述单指针表项和所述共享表项关联时记录所述单指针表项和所述共享表项的关联信息;
    所述共享表项包括共享者记录结构和关联结构,其中,所述共享者记录结构用于记录所述数据块的多个访问者的信息,所述关联结构用于关联所述单指针表项。
  16. 根据权利要求15中所述的目录缓存设备,其特征在于,所述单指针表项还包括全共享位,
    所述全共享位用于在所述单指针表项和所述共享表项未关联时标识所述数据块只有唯一访问者或标识所述数据块被所述多核***中的所有处理器核共享。
  17. 一种多核***,其特征在于,包括:多个处理器核,共享数据缓存和根据权利要求9至16中任一项所述的目录缓存设备。
  18. 一种目录存储单元,其特征在于,用于保存多核***中的数据访问者目录,所述多核***包括共享数据缓存和多个处理器核,所述共享数据缓存中的数据块被复制到所述多个处理器核中的一个或者至少两个处理器核,所述目录用于记录所述共享数据缓存中的数据块的访问者信息,所述数据块的访问者为保存有所述数据块的拷贝的处理器核,所述目录包括:
    单指针表项阵列和共享表项阵列;
    所述单指针表项阵列中的每个单指针表项用于记录所述数据块的唯一访问者的信息,或者记录所述单指针表项和所述共享表项阵列中的共享表项的关联信息;
    所述共享表项阵列中的每个共享表项用于记录所述数据块的多个访问者的信息。
  19. 根据权利要求18所述的目录存储单元,其特征在于,
    所述单指针表项包括标签、共享表项关联位和单指针,其中,所述标签用于与所述数据块对应,所述共享表项关联位用于标识所述单指针表项和所述共享表项是否关联,所述单指针用于在所述数据块只有唯一访问者时记录所述数据块的唯一访问者的信息,在所述单指针表项和所述共享表项关联时记录所述单指针表项和所述共享表项的关联信息;
    所述共享表项包括共享者记录结构和关联结构,其中,所述共享者记录结构用于记录所述数据块的多个访问者的信息,所述关联结构用于关联所述单指针表项。
  20. 根据权利要求19所述的目录存储单元,其特征在于,所述单指针表项包括:全共享位,
    所述全共享位用于在所述单指针表项和所述共享表项未关联时标识所述数据块只有唯一访问者或标识所述数据块被多核***中的所有处理器核共享。
  21. 根据权利要求19或20所述的目录存储单元,其特征在于,所述共享者记录结构为向量。
PCT/CN2015/073192 2015-02-16 2015-02-16 多核***中数据访问者目录的访问方法及设备 WO2016131175A1 (zh)

Priority Applications (10)

Application Number Priority Date Filing Date Title
BR112017017306-9A BR112017017306B1 (pt) 2015-02-16 Método para acessar dados em um sistema com múltiplos núcleos, dispositivo de cache de diretório, sistema com múltiplos núcleos, e unidade de armazenagem de diretório
CA2976132A CA2976132A1 (en) 2015-02-16 2015-02-16 Method for accessing data visitor directory in multi-core system and device
CN201580001247.8A CN106164874B (zh) 2015-02-16 2015-02-16 多核***中数据访问者目录的访问方法及设备
PCT/CN2015/073192 WO2016131175A1 (zh) 2015-02-16 2015-02-16 多核***中数据访问者目录的访问方法及设备
CN202010209306.3A CN111488293B (zh) 2015-02-16 2015-02-16 多核***中数据访问者目录的访问方法及设备
EP15882315.3A EP3249539B1 (en) 2015-02-16 2015-02-16 Method and device for accessing data visitor directory in multi-core system
SG11201706340TA SG11201706340TA (en) 2015-02-16 2015-02-16 Method for accessing data visitor directory in multi-core system and device
JP2017542831A JP6343722B2 (ja) 2015-02-16 2015-02-16 マルチコアシステムにおいてデータ訪問者ディレクトリにアクセスするための方法及びデバイス
KR1020177023526A KR102027391B1 (ko) 2015-02-16 2015-02-16 멀티 코어 시스템에서 데이터 방문자 디렉토리에 액세스하는 방법 및 장치
US15/675,929 US20170364442A1 (en) 2015-02-16 2017-08-14 Method for accessing data visitor directory in multi-core system and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2015/073192 WO2016131175A1 (zh) 2015-02-16 2015-02-16 多核***中数据访问者目录的访问方法及设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/675,929 Continuation US20170364442A1 (en) 2015-02-16 2017-08-14 Method for accessing data visitor directory in multi-core system and device

Publications (1)

Publication Number Publication Date
WO2016131175A1 true WO2016131175A1 (zh) 2016-08-25

Family

ID=56691906

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/073192 WO2016131175A1 (zh) 2015-02-16 2015-02-16 多核***中数据访问者目录的访问方法及设备

Country Status (8)

Country Link
US (1) US20170364442A1 (zh)
EP (1) EP3249539B1 (zh)
JP (1) JP6343722B2 (zh)
KR (1) KR102027391B1 (zh)
CN (2) CN111488293B (zh)
CA (1) CA2976132A1 (zh)
SG (1) SG11201706340TA (zh)
WO (1) WO2016131175A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684237B (zh) * 2018-11-20 2021-06-01 华为技术有限公司 基于多核处理器的数据访问方法和装置
CN112825072B (zh) * 2019-11-21 2023-02-17 青岛海信移动通信技术股份有限公司 通信终端以及数据共享方法
CN114880254A (zh) * 2022-04-02 2022-08-09 锐捷网络股份有限公司 一种表项读取方法、装置及网络设备

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101859281A (zh) * 2009-04-13 2010-10-13 廖鑫 基于集中式目录的嵌入式多核缓存一致性方法
CN104133785A (zh) * 2014-07-30 2014-11-05 浪潮集团有限公司 采用混合目录的双控存储服务器的缓存一致性实现方法

Family Cites Families (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5197139A (en) * 1990-04-05 1993-03-23 International Business Machines Corporation Cache management for multi-processor systems utilizing bulk cross-invalidate
JP3132749B2 (ja) * 1994-12-05 2001-02-05 インターナショナル・ビジネス・マシーンズ・コーポレ−ション マルチプロセッサ・データ処理システム
US5787477A (en) * 1996-06-18 1998-07-28 International Business Machines Corporation Multi-processor cache coherency protocol allowing asynchronous modification of cache data
US7509391B1 (en) * 1999-11-23 2009-03-24 Texas Instruments Incorporated Unified memory management system for multi processor heterogeneous architecture
US6922755B1 (en) * 2000-02-18 2005-07-26 International Business Machines Corporation Directory tree multinode computer system
US6725343B2 (en) * 2000-10-05 2004-04-20 Hewlett-Packard Development Company, L.P. System and method for generating cache coherence directory entries and error correction codes in a multiprocessor system
US6895476B2 (en) * 2002-10-03 2005-05-17 Hewlett-Packard Development Company, L.P. Retry-based late race resolution mechanism for a computer system
US20050273571A1 (en) * 2004-06-02 2005-12-08 Lyon Thomas L Distributed virtual multiprocessor
US7376793B2 (en) * 2005-07-21 2008-05-20 Sun Microsystems, Inc. Cache coherence protocol with speculative writestream
US20070079072A1 (en) * 2005-09-30 2007-04-05 Collier Josh D Preemptive eviction of cache lines from a directory
US7475193B2 (en) * 2006-01-18 2009-01-06 International Business Machines Corporation Separate data and coherency cache directories in a shared cache in a multiprocessor system
US8195890B1 (en) * 2006-08-22 2012-06-05 Sawyer Law Group, P.C. Method for maintaining cache coherence using a distributed directory with event driven updates
US7840759B2 (en) * 2007-03-21 2010-11-23 International Business Machines Corporation Shared cache eviction
US8065487B2 (en) * 2007-03-21 2011-11-22 International Business Machines Corporation Structure for shared cache eviction
CN100543687C (zh) * 2007-09-04 2009-09-23 杭州华三通信技术有限公司 一种多核***的资源管理方法和控制核
US20110004729A1 (en) * 2007-12-19 2011-01-06 3Leaf Systems, Inc. Block Caching for Cache-Coherent Distributed Shared Memory
US8607237B2 (en) * 2008-06-02 2013-12-10 Microsoft Corporation Collection with local lists for a multi-processor system
US8140825B2 (en) * 2008-08-05 2012-03-20 International Business Machines Corporation Systems and methods for selectively closing pages in a memory
CN101504617B (zh) * 2009-03-23 2011-05-11 华为技术有限公司 一种基于处理器共享内存的数据发送方法及装置
US9361297B2 (en) * 2009-07-30 2016-06-07 Adobe Systems Incorporated Web service-based, data binding abstraction method
US8719500B2 (en) * 2009-12-07 2014-05-06 Intel Corporation Technique for tracking shared data in a multi-core processor or multi-processor system
CN102063406B (zh) * 2010-12-21 2012-07-25 清华大学 用于多核处理器的网络共享Cache及其目录控制方法
US9411733B2 (en) * 2011-09-09 2016-08-09 University Of Rochester Sharing pattern-based directory coherence for multicore scalability (“SPACE”)
CN102346714B (zh) * 2011-10-09 2014-07-02 西安交通大学 用于多核处理器的一致性维护装置及一致***互方法
US9424191B2 (en) * 2012-06-29 2016-08-23 Intel Corporation Scalable coherence for multi-core processors
US9158689B2 (en) * 2013-02-11 2015-10-13 Empire Technology Development Llc Aggregating cache eviction notifications to a directory
US9830265B2 (en) * 2013-11-20 2017-11-28 Netspeed Systems, Inc. Reuse of directory entries for holding state information through use of multiple formats
CN107003932B (zh) * 2014-09-29 2020-01-10 华为技术有限公司 多核处理器***的缓存目录处理方法和目录控制器

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
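CN101859281A (zh) * 2009-04-13 2010-10-13 廖鑫 Embedded multi-core cache coherence method based on a centralized directory
CN104133785A (zh) * 2014-07-30 2014-11-05 浪潮集团有限公司 Cache coherence implementation method for a dual-controller storage server using a hybrid directory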

Also Published As

Publication number Publication date
JP2018508894A (ja) 2018-03-29
SG11201706340TA (en) 2017-09-28
US20170364442A1 (en) 2017-12-21
KR102027391B1 (ko) 2019-10-01
JP6343722B2 (ja) 2018-06-13
KR20170107061A (ko) 2017-09-22
EP3249539A1 (en) 2017-11-29
CN106164874B (zh) 2020-04-03
EP3249539A4 (en) 2018-01-24
CN111488293B (zh) 2024-06-25
CA2976132A1 (en) 2016-08-25
BR112017017306A2 (pt) 2019-12-17
CN111488293A (zh) 2020-08-04
CN106164874A (zh) 2016-11-23
EP3249539B1 (en) 2021-08-18

Similar Documents

Publication Publication Date Title
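CN105740164B (zh) Multi-core processor supporting cache coherence, and read/write method, apparatus, and device
KR101786871B1 (ko) Remote page fault handling apparatus and method thereof
KR100978156B1 (ko) Method, apparatus, system, and computer-readable recording medium for a line swapping scheme to reduce invalidations in a snoop filter
WO2017050014A1 (zh) Data storage processing method and apparatus
CN108139966B (zh) Method for managing a translation lookaside buffer, and multi-core processor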
US20190026225A1 (en) Multiple chip multiprocessor cache coherence operation method and multiple chip multiprocessor
US10055349B2 (en) Cache coherence protocol
US20150113230A1 (en) Directory storage method and query method, and node controller
CN110795206A (zh) System and method for facilitating cluster-level cache and memory space
WO2009140631A2 (en) Distributed computing system with universal address system and method
CN105677580A (zh) Method and apparatus for accessing a cache
US10733101B2 (en) Processing node, computer system, and transaction conflict detection method
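CN109684237B (zh) Data access method and apparatus based on a multi-core processor
CN107341114B (zh) Directory management method, node controller, and system
WO2024099448A1 (zh) Memory release and memory recovery methods, apparatus, computer device, and storage medium
JP7160792B2 (ja) System and method for storing cache location information for cache entry transfer
WO2016131175A1 (zh) Method and device for accessing a data visitor directory in a multi-core system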
US10331560B2 (en) Cache coherence in multi-compute-engine systems
WO2016049808A1 (zh) Cache directory processing method and directory controller for a multi-core processor system
US20230100746A1 (en) Multi-level partitioned snoop filter
CN114238165B (zh) Data processing method, data processing apparatus, and storage medium
WO2024082702A1 (zh) Data processing method, apparatus, chip, and computer-readable storage medium
CN117667987A (zh) Storage system, data update method, and device
BR112017017306B1 (pt) Method for accessing data in a multi-core system, directory cache device, multi-core system, and directory storage unit

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 15882315

Country of ref document: EP

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 11201706340T

Country of ref document: SG

ENP Entry into the national phase

Ref document number: 2976132

Country of ref document: CA

REEP Request for entry into the european phase

Ref document number: 2015882315

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2017542831

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20177023526

Country of ref document: KR

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112017017306

Country of ref document: BR

REG Reference to national code

Ref country code: BR

Ref legal event code: B01E

Ref document number: 112017017306

Country of ref document: BR

Free format text: SUBMIT A NEW COPY OF THE "SET OF CLAIMS FOR PROCESSING IN THE BRAZILIAN NATIONAL PHASE", SINCE THE DOCUMENT SUBMITTED ENDS ON PAGE 10 OF 14.

ENP Entry into the national phase

Ref document number: 112017017306

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20170811