WO2023108938A1 - Method and apparatus for solving the cache memory address ambiguity problem - Google Patents

Method and apparatus for solving the cache memory address ambiguity problem

Info

Publication number
WO2023108938A1
WO2023108938A1 · PCT/CN2022/082036 · CN2022082036W
Authority
WO
WIPO (PCT)
Prior art keywords
address
virtual address
cache
ambiguity
target
Prior art date
Application number
PCT/CN2022/082036
Other languages
English (en)
French (fr)
Inventor
李祖松
郇丹丹
Original Assignee
北京微核芯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京微核芯科技有限公司
Priority to EP22868428.8A (EP4227814A1)
Publication of WO2023108938A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0806 Multiuser, multiprocessor or multiprocessing cache systems
    • G06F 12/0815 Cache consistency protocols
    • G06F 12/0831 Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • G06F 12/0833 Cache consistency protocols using a bus scheme in combination with broadcast means (e.g. for invalidation or updating)
    • G06F 12/0864 Associative addressing using pseudo-associative means, e.g. set-associative or hashing
    • G06F 12/0888 Associative addressing using selective caching, e.g. bypass
    • G06F 12/0891 Associative addressing using clearing, invalidating or resetting means
    • G06F 12/0893 Caches characterised by their organisation or structure
    • G06F 12/0895 Caches characterised by the organisation or structure of parts of caches, e.g. directory or tag array
    • G06F 12/10 Address translation
    • G06F 12/1027 Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
    • G06F 12/1045 Translation look-aside buffer [TLB] associated with a data cache
    • G06F 12/1063 Translation look-aside buffer [TLB] associated with a data cache, the data cache being concurrently virtually addressed
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/10 Providing a specific technical effect
    • G06F 2212/1016 Performance improvement
    • G06F 2212/1024 Latency reduction
    • G06F 2212/1032 Reliability improvement, data loss prevention, degraded operation etc.
    • G06F 2212/50 Control mechanisms for virtual memory, cache or TLB
    • G06F 2212/507 Control mechanisms using speculative control
    • G06F 2212/62 Details of cache specific to multiprocessor cache arrangements
    • G06F 2212/621 Coherency control relating to peripheral accessing, e.g. from DMA or I/O device

Definitions

  • the present disclosure relates to the field of electronic technology, in particular to a method and device for solving the problem of address ambiguity in a cache memory.
  • a computer device can determine the instruction-fetch address through the program counter and the memory-access address through a memory access instruction, and then fetch instructions from, or read and write data at, the physical address indicated by that address.
  • when the virtual address is used as the index of the cache memory, two virtual addresses mapped to the same physical address may differ in the high bits of the Index that lie beyond the page offset. Because the same physical address is mapped to different virtual addresses, different Cache Lines can be addressed based on the Index even though they actually correspond to the same physical address; that is, there is an address ambiguity problem.
  • the bits of the cache memory index in which the virtual address and the physical address may be inconsistent are referred to in the present disclosure as address ambiguity bits.
  • embodiments of the present disclosure provide a method, device, electronic device and non-transitory computer-readable storage medium for solving the problem of address ambiguity in cache memory.
  • a method for solving the problem of cache address ambiguity comprising:
  • an apparatus for solving the problem of address ambiguity in a cache memory comprising:
  • a receiving module configured to determine a corresponding first virtual address based on the received access instruction
  • a query module configured to query an address maintenance list to determine a target item corresponding to the first virtual address when an access based on the first virtual address misses, the information recorded in the target item including a target Tag, a target address ambiguity bit and a target Cache;
  • an invalidation module configured to determine a second virtual address based on the first virtual address and the target item, and invalidate the information of the second virtual address, wherein the second virtual address and the first virtual address are mapped to the same physical address;
  • An obtaining module configured to obtain information corresponding to the access instruction, and write back the first virtual address.
  • an electronic device including:
  • the program includes instructions, and the instructions, when executed by the processor, cause the processor to execute the above-mentioned method for solving the cache address ambiguity problem.
  • a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the above-mentioned method for solving the cache memory address ambiguity problem.
  • a computer program product includes computer program code which, when run on a computer, causes the computer to perform the above-mentioned method for solving the cache memory address ambiguity problem.
  • a computer program includes computer program code which, when run on a computer, causes the computer to execute the above-mentioned method for solving the cache memory address ambiguity problem.
  • the address maintenance list can be used to determine whether there is address ambiguity, and if so, consistency between the virtual address and the physical address can be maintained based on the address maintenance list. Moreover, the address maintenance list contains little data and is easy to maintain, which can improve processing efficiency while solving address ambiguity.
  • Fig. 1 shows a schematic diagram of a memory address
  • FIG. 2 shows a schematic diagram of a Cache structure provided according to an embodiment of the present disclosure
  • FIG. 3 shows a schematic structural diagram of a set-associative Cache provided according to an embodiment of the present disclosure
  • Fig. 4 shows a schematic diagram of comparing the number of bits of Index+Block Offset and Page Offset provided according to an embodiment of the present disclosure
  • FIG. 5 shows a flow chart of a method for solving the problem of address ambiguity in a cache memory according to an embodiment of the present disclosure
  • FIG. 6 shows a flowchart of an access method provided according to an embodiment of the present disclosure
  • Fig. 7 shows a schematic diagram of an address maintenance list provided according to an embodiment of the present disclosure
  • Fig. 8 shows a schematic diagram of another address maintenance list provided according to an embodiment of the present disclosure.
  • FIG. 9 shows a schematic block diagram of an apparatus for solving the problem of cache address ambiguity provided according to an embodiment of the present disclosure
  • FIG. 10 shows a structural block diagram of an exemplary electronic device that can be used to implement the embodiments of the present disclosure.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.”
  • Relevant definitions of other terms will be given in the description below. It should be noted that concepts such as “first” and “second” mentioned in this disclosure are only used to distinguish different devices, modules or units, and are not used to limit the sequence of functions performed by these devices, modules or units or interdependence.
  • the memory address can be divided into a virtual address (Virtual Address; VA for short) and a physical address (Physical Address, PA for short).
  • the virtual address can be divided into two parts: the virtual page number (Virtual Page Number, VPN) and the page offset (Page Offset); the physical address can likewise be divided into two parts: the physical page number (Physical Page Number, PPN) and the page offset (Page Offset).
  • the memory address can be divided into three parts: cache memory tag (Cache Tag), cache memory index (Cache Index) and cache memory block offset (Cache Block Offset).
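The three-part division above can be sketched with plain bit operations. The field widths below (64-byte blocks, 256 sets) are illustrative assumptions for the example, not values fixed by the disclosure.

```python
# Illustrative decomposition of a memory address into Cache Tag,
# Cache Index and Cache Block Offset.
BLOCK_OFFSET_BITS = 6   # 2**6 = 64-byte Cache Block (assumed)
INDEX_BITS = 8          # 2**8 = 256 Cache Sets (assumed)

def split_cache_address(addr):
    """Return (tag, index, block_offset) for a memory address."""
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset
```

For example, address 0x12345678 splits into Tag 0x48D1, Index 0x59 and Block Offset 0x38 under these assumed widths.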
  • the cache memory (Cache) is located between the CPU (Central Processing Unit) and the main memory DRAM (Dynamic Random Access Memory), and is usually composed of SRAM (Static Random-Access Memory).
  • the speed of the CPU is much higher than that of the memory.
  • if the CPU accesses data directly from memory, it has to wait a certain number of clock cycles, whereas the Cache has a fast access speed and can hold part of the information the CPU has just used or recycled. If the CPU needs this information again, it can be fetched directly from the Cache, which avoids long-latency accesses to memory, reduces the waiting time of the CPU, and thus improves the efficiency of the system.
  • the Cache structure is mainly composed of two parts, a tag (Tag) part and a data (Data) part.
  • the Data part is used to store the information of a contiguous address range
  • the Tag part is used to store the common high-order address of that contiguous information.
  • a Tag and all its corresponding information form a line called a Cache Line
  • the data part in the Cache Line is called a data block (Data Block). If a piece of information can be stored in multiple places in the Cache, the multiple Cache Lines found by the same address are called a Cache Set.
  • Cache organization methods are divided into direct-mapped, set-associative and fully associative; the present disclosure mainly relates to the set-associative organization.
  • Direct-mapped and fully associative caches can be regarded as special cases of set associativity in which the number of ways is 1 or equals the number of Cache Lines, respectively.
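The relation between the two boundary cases can be stated as simple arithmetic: the number of sets is the total number of Cache Lines divided by the number of ways. A one-line sketch:

```python
# Set count for a set-associative cache; direct-mapped and fully
# associative are the ways = 1 and ways = total_lines boundary cases.
def num_cache_sets(total_lines, ways):
    assert total_lines % ways == 0
    return total_lines // ways
```

With 512 Cache Lines, a direct-mapped cache has 512 sets, a fully associative one has a single set, and a 4-way cache has 128 sets.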
  • a schematic diagram of a set-associative Cache structure is shown in FIG. 3 .
  • the address for the processor to access the memory will be divided into three parts, the tag (Tag), the index (Index) and the block offset (Block Offset).
  • the Index is used to find a set of Cache Lines in the Cache, that is, a Cache Set
  • the Tag read out from the set selected by the Index is compared with the Tag in the access address, and only if they are equal can this Cache Line be the one that is wanted. A Cache Line holds a lot of memory-access information; the really wanted information is located through the Block Offset part of the memory address together with the access width of the memory access instruction, which can locate each byte.
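The lookup just described can be condensed into a short sketch. The data layout (each set as a list of `(valid, tag, data_block)` tuples) is an illustrative assumption, not the hardware structure.

```python
# Minimal sketch of a set-associative lookup: Index selects the set,
# Tag comparison on valid lines decides hit/miss, Block Offset
# locates the byte inside the Data Block.
def cache_lookup(cache_sets, tag, index, block_offset):
    for valid, line_tag, data_block in cache_sets[index]:
        if valid and line_tag == tag:        # Tag match on a valid line: hit
            return data_block[block_offset]  # Block Offset picks the byte
    return None                              # miss

# a toy 2-set, 2-way cache for demonstration
sets = [
    [(True, 0x1A, b"ABCD"), (False, 0x2B, b"EFGH")],
    [(True, 0x3C, b"IJKL"), (True, 0x4D, b"MNOP")],
]
```

Note that a matching Tag on a line whose valid bit is clear still counts as a miss.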
  • the address ambiguity problem means that different virtual addresses correspond to the same physical address.
  • the address for accessing the Cache can be a virtual address or a physical address.
  • the address of the instruction fetched by the processor and the addresses in executed instructions are all virtual addresses, which need to be translated through the TLB (Translation Lookaside Buffer) to obtain the physical address.
  • commercial processors generally use the virtual-address Index to access the Cache Line and the physical-address Tag for Tag comparison; that is, the Cache memory address at this time is composed of the virtual-address Index and the physical-address Tag (Virtually-Indexed, Physically-Tagged, VIPT).
  • each Cache Line contains 2^b bytes of information, and the number of Cache Sets is 2^L; that is, the number of Index+Block Offset bits of the virtual address is L+b.
  • if the page size in the physical address is 2^k bytes, the number of bits of the Page Offset is k.
  • the first type, k > L+b: at this time, the capacity of the Cache is smaller than the size of a page
  • the second type, k = L+b: at this time, the capacity of the Cache is equal to the size of a page
  • the third type, k < L+b: at this time, the capacity of the Cache is greater than the size of a page.
  • in the first two cases, the Index+Block Offset bits of the virtual address do not change during translation into the physical address, and can be considered as coming from the physical address. This is equivalent to directly using the physical-address Index and the physical-address Tag (Physically-Indexed, Physically-Tagged, PIPT), in which case there is no address ambiguity problem.
  • in the third case, the Index+Block Offset of the virtual address corresponds to the [L+b-1, 0] part of the address.
  • the [k-1, 0] parts of two virtual addresses VA1 and VA2 may be the same while their [L+b-1, 0] parts differ; that is, the same physical address is addressed based on the [k-1, 0] part, but different Cache Lines are addressed based on the [L+b-1, 0] part, which is the address ambiguity problem.
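The three cases reduce to counting how many Index bits rise above the Page Offset. A worked example, assuming a 4 KB page (k = 12) and 64-byte Cache Lines (b = 6):

```python
# Number of address ambiguity bits: the Index+Block Offset bits of the
# virtual address that lie beyond the Page Offset, i.e. max(0, L+b-k).
def ambiguity_bit_count(k, L, b):
    return max(0, L + b - k)

# e.g. an 8 KB direct-mapped cache with 64 B lines has 128 sets,
# so L = 7 and L + b = 13 > k = 12: one ambiguity bit (third type).
```

With L = 6 (L+b = k, second type) or L = 5 (L+b < k, first type) the count is zero, matching the statement above that no ambiguity arises in the first two cases.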
  • the embodiments of the present disclosure provide a method for solving the problem of ambiguity of cache memory addresses, and the method may be applied to terminals, servers and/or other electronic devices with processing capabilities, which is not limited in the present disclosure.
  • the method will be introduced below with reference to the flow chart of the method for solving the cache memory address ambiguity problem shown in FIG. 5 .
  • Step 501 Determine the corresponding first virtual address based on the received access instruction.
  • Step 502 When the access misses based on the first virtual address, query the address maintenance list to determine the target item corresponding to the first virtual address.
  • the information recorded in the target item may include target Tag, target address ambiguity bit and target Cache.
  • Step 503 Determine a second virtual address based on the first virtual address and the target item, and invalidate the information of the second virtual address.
  • the second virtual address and the first virtual address are mapped to the same physical address.
  • Step 504 obtain the information corresponding to the access instruction, and write back the first virtual address.
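Steps 501 through 504 can be condensed into a runnable toy model. Everything here is deliberately simplified and the names are illustrative assumptions, not the patent's implementation: the Cache is a dict keyed by virtual address, a `translate` dict stands in for the TLB, and there is a single ambiguity bit at position k = 12.

```python
# Toy model of the miss-handling flow of steps 501-504.
PAGE_BITS = 12                 # k: Page Offset width (assumed)
AMBIG_MASK = 1 << PAGE_BITS    # one ambiguity bit just above the Page Offset

def handle_access(va, cache, maint_list, translate, memory):
    if va in cache:                       # hit: no ambiguity handling needed
        return cache[va]
    tag = translate[va] >> PAGE_BITS      # step 502: physical Tag for the lookup
    entry = maint_list.get(tag)
    if entry is not None and entry["ambig"] != (va >> PAGE_BITS) & 1:
        va2 = va ^ AMBIG_MASK             # step 503: build the second virtual
        cache.pop(va2, None)              #           address and invalidate it
    data = memory[translate[va]]          # step 504: obtain the information
    cache[va] = data                      #           and write back the first VA
    maint_list[tag] = {"ambig": (va >> PAGE_BITS) & 1}   # update the list
    return data
```

After two aliased virtual addresses access the same physical address in turn, only the most recent one holds valid information, which is exactly the invariant the address maintenance list enforces.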
  • the method may further include: updating the address maintenance list based on the first virtual address.
  • updating the address maintenance list based on the first virtual address includes:
  • the target address ambiguity bit is updated to the address ambiguity bit of the first virtual address.
  • the target Cache includes at least any one of the following: the instruction cache or data cache of the current processor core, or the instruction cache or data cache of another processor core, wherein the current processor core is the one that receives the access instruction.
  • determining the corresponding first virtual address includes:
  • a first virtual address corresponding to the memory access instruction is determined, and the first virtual address corresponding to the memory access instruction is used to access the data Cache.
  • obtaining the first virtual address corresponding to the instruction fetch includes:
  • the first virtual address corresponding to the instruction fetch is obtained.
  • querying the address maintenance list to determine the target item corresponding to the first virtual address includes:
  • a target item consistent with the Tag of the first virtual address is determined.
  • determining the target item consistent with the Tag of the first virtual address includes:
  • At least one item corresponding to the Index is acquired in the address maintenance list; among the above at least one item, an item consistent with the Tag of the first virtual address is used as a target item.
  • the method further includes: if there is no target entry corresponding to the first virtual address in the address maintenance list, generating a target entry and adding it to the address maintenance list.
  • generating the target item includes: using the Tag of the first virtual address or the physical address as the target Tag, using the address ambiguity bit of the first virtual address as the target address ambiguity bit, and using the first virtual address The corresponding current Cache is used as the target Cache to generate target items.
  • determining the second virtual address based on the first virtual address and the target item, and invalidating the information of the second virtual address includes:
  • the address ambiguity bit is replaced with the address ambiguity bit of the target item to obtain the second virtual address
  • the target Cache is accessed, and the information of the second virtual address is invalidated in the target Cache.
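The replacement step above is a masked bit substitution: keep every bit of the first virtual address except the ambiguity bits [L+b-1, k], which are overwritten with the value recorded in the target item. The bit positions below (k = 12, two ambiguity bits) are illustrative assumptions.

```python
# Sketch of forming the second virtual address from the first by
# substituting the target item's address ambiguity bits.
K = 12                                      # Page Offset width (assumed)
AMBIG_WIDTH = 2                             # ambiguity-field width (assumed)
AMBIG_MASK = ((1 << AMBIG_WIDTH) - 1) << K

def second_virtual_address(va1, target_ambig_bits):
    return (va1 & ~AMBIG_MASK) | (target_ambig_bits << K)
```

For instance, with a first virtual address 0x3ABC and recorded ambiguity bits 2'b01, the second virtual address is 0x1ABC: the Page Offset 0xABC is untouched and only bits [13:12] change.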
  • obtaining the information corresponding to the access address and writing back the first virtual address includes:
  • the information in the physical address is obtained, and the information is filled into the first virtual address.
  • the first virtual address and the second virtual address are used to access L1 Cache (Level 1 cache memory).
  • the address maintenance list is a Snoop Filter (snoop filter) list.
  • the list fields of the address maintenance list include: Tag, address ambiguity bit and Cache information.
  • the Cache information includes: a first identifier of the processor core and a second identifier of the Cache, the first identifier is used to indicate the processor core to which the Cache belongs, and the second identifier is used to indicate that the Cache is an instruction cache or a data cache .
  • the Tag of the memory address, the address ambiguity bit, and the target Cache storing instructions or data can be recorded in the address maintenance list, and the consistency of the physical address and the virtual address can be maintained using the list, ensuring that for one physical address only one virtual address stores valid information. Since the address maintenance list contains little data and is easy to maintain, it can improve processing efficiency while solving address ambiguity.
  • the following will introduce the access method for solving the cache address ambiguity problem provided by the present disclosure starting from the information being retrieved for the first time. Embodiments of the present disclosure will be discussed based on the same physical address.
  • the memory access method is as follows:
  • Step 601 Determine the corresponding virtual address based on the received access instruction.
  • the access instruction may be a fetch instruction directed to the instruction Cache, or a memory access instruction directed to the data Cache; a memory access instruction is either a load instruction that takes an operand out of the storage unit indicated by the instruction's address code, or a store instruction that writes a number into it.
  • the information stored in the virtual address may include data and instructions.
  • any processor core may receive an access initiated to the Cache. For example, when the device executes a certain calculation task, it can trigger the corresponding memory access instruction to obtain the data required for calculation. This embodiment does not limit the specific task of triggering the memory access instruction.
  • the current processor core can calculate a corresponding virtual address based on the base address and offset in the access instruction, and the virtual address can be used to access the Cache of the current processor core, that is, the current Cache corresponding to the virtual address. Specifically, based on the identifier of the base address register carried in the access instruction, the corresponding base address can be obtained from the register, and the base address can be added to the offset to determine the corresponding virtual address.
  • the instruction fetch and the memory access instruction may have different methods for determining the virtual address, and the specific processing may be as follows:
  • the first virtual address corresponding to the memory access instruction is determined.
  • the first virtual address corresponding to the instruction fetch is used for accessing the instruction Cache
  • the first virtual address corresponding to the memory fetch instruction is used for accessing the data Cache.
  • the first virtual address of the fetch instruction may be obtained based on the program counter (Program Counter, PC).
  • the value in the program counter can be used to indicate the position of the current instruction in the instruction cache, and the value is the virtual address for accessing the instruction cache.
  • the corresponding base address can be obtained from the register, and the base address can be added to the offset to determine the corresponding virtual address.
  • Step 602 based on the determined virtual address, access the Cache to perform an access operation.
  • the virtual address can include Tag, Index and Block Offset.
  • the Cache accessed here may be an L1 Cache
  • the L1 Cache may be an L1 I-Cache (level 1 instruction cache), or an L1 D-Cache (level 1 data cache).
  • the L2 Cache (level 2 cache memory) no longer needs to maintain an inclusion relationship with the L1 Cache, that is, it no longer stores all the information stored in the L1 Cache (Non-Inclusive). This reduces space occupation, giving the L2 Cache more space for other tasks and improving the performance of the system.
  • a corresponding set of Cache Lines can be determined from the Cache based on the Index of the virtual address, that is, a Cache Set can be found. Based on the Tag of the virtual address, it can be judged in a certain group of Cache Lines whether the Tag of each Cache Line is the same as the Tag of the target address, and whether the valid bit indicates that valid information is stored, that is, whether the Tag hits.
  • when the access to the Cache Line is triggered for the first time, or when the information in the Cache Line has been replaced, the access to the Cache based on the virtual address will miss, and execution may jump to step 603.
  • Step 603 when the access misses, query the address maintenance list to determine the target item corresponding to the virtual address.
  • each list field can include Tag, address ambiguity bit of the virtual address, and Cache information.
  • the foregoing Tag may refer to a Tag of a physical address.
  • the address maintenance list can be used to maintain the consistency of the physical address and the virtual address, and the physical address and the virtual address indicated in one item are a pair of virtual and real addresses mapped to each other.
  • the Cache information is used to indicate the Cache that stores valid information.
  • the Cache information may further include a first identifier of the processor core and a second identifier of the Cache; the first identifier may be used to indicate the processor core to which the Cache belongs, and the second identifier may be used to indicate whether the Cache is an instruction cache or a data cache.
  • the Tag of the physical address can be 28'h0080009; the address ambiguity bit of the virtual address can be 2'b01; the Cache information can include a processor-core bit vector (that is, the first identifier), such as 0001, meaning that in a quad-core processor the fourth processor core stores valid information for the physical address, and an inst bit (that is, the second identifier) identifying whether the Cache is an instruction cache or a data cache: an inst bit of 1 indicates an instruction cache, and an inst bit of 0 indicates a data cache.
  • the processor core bit vector and the inst bit together constitute the Cache information.
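One item of the address maintenance list, matching the worked example above, can be sketched as a small record. The field names and the dataclass form are illustrative assumptions, not the hardware layout.

```python
from dataclasses import dataclass

# One address-maintenance-list item: Tag 28'h0080009, ambiguity bits
# 2'b01, core bit vector 0001 (fourth core of a quad-core holds the
# valid information), inst bit set (instruction cache).
@dataclass
class MaintenanceEntry:
    tag: int          # physical-address Tag (28 bits in the example)
    ambig: int        # address ambiguity bits of the virtual address
    core_vector: int  # first identifier: which core's Cache holds the info
    inst: bool        # second identifier: True = instruction cache, False = data

entry = MaintenanceEntry(tag=0x0080009, ambig=0b01,
                         core_vector=0b0001, inst=True)
```
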
  • the address maintenance list can be a Snoop Filter list, and the Snoop Filter mechanism is used to monitor fetching instructions or memory access instructions, and maintain the address maintenance list, which can avoid using L2 Cache to solve the address ambiguity problem.
  • the address maintenance list may also be directories in other forms. The common feature of these directories is that they can be used to maintain consistency. This embodiment does not limit the specific directory form.
  • The processing of step 603 may be as follows: based on the address maintenance list, determine the target item whose Tag is consistent with the Tag of the virtual address.
  • The address maintenance list can be organized in a manner similar to that of the Cache, i.e., fully associative, direct-mapped, or set-associative.
  • In some embodiments, the address maintenance list adopts a fully associative organization.
  • In this case, the address maintenance list can be traversed to determine whether the Tag of each item is consistent with the Tag of the first virtual address; if so, the corresponding item is taken as the target item of the first virtual address.
  • In some embodiments, the address maintenance list is organized in a direct-mapped manner.
  • The index of each item in the address maintenance list may correspond to the Index of the physical address.
  • In this case, the item corresponding to the Index can be obtained from the address maintenance list, and it is determined whether that item's Tag is consistent with the Tag of the first virtual address; if so, this item is taken as the target item of the first virtual address. For example, when the Index of the physical address indicates the fifth row, the one item of table information in the fifth row can be obtained from the address maintenance list, and if its Tag is consistent with the Tag of the first virtual address, this item is taken as the target item.
  • In some embodiments, the address maintenance list is organized in a set-associative manner.
  • The index of each item in the address maintenance list may likewise correspond to the Index of the physical address.
  • In this case, the items corresponding to the Index can be obtained from the address maintenance list; among these items, it is determined whether the Tag of each item is consistent with the Tag of the first virtual address, and if so, the corresponding item is taken as the target item of the first virtual address.
  • For example, the address maintenance list can be organized as 4-way set-associative. When the Index of the physical address indicates the 5th row, the table information of the 5th row can be obtained from each way, yielding 4 items of table information; the item consistent with the Tag of the first virtual address is taken as the target item.
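The lookup organizations above can be sketched in plain Python. The list layout here (dict entries with a valid bit, rows of ways indexed by the physical-address Index) is an assumption for illustration only.

```python
def lookup_fully_associative(items, tag):
    # Traverse the whole list; return the first valid item whose Tag matches.
    for item in items:
        if item["valid"] and item["tag"] == tag:
            return item
    return None

def lookup_set_associative(rows, index, tag):
    # rows[index] holds the ways of one set; direct-mapped is the 1-way case.
    for item in rows[index]:
        if item["valid"] and item["tag"] == tag:
            return item
    return None

# 4-way example: the physical-address Index selects row 5, whose four ways
# are then compared against the Tag of the first virtual address.
rows = [[] for _ in range(8)]
rows[5] = [{"tag": t, "valid": True} for t in (0x1, 0x2, 0x0080009, 0x3)]
target = lookup_set_associative(rows, 5, 0x0080009)
```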
  • The list fields of the address maintenance list can also include a valid bit, which can be used to indicate whether the corresponding item is valid. For example, when the valid bit is 1, the item is valid, indicating that the information stored at the corresponding virtual address is valid; when the valid bit is 0, the item is invalid, indicating that the information stored at the item's corresponding virtual address is invalid or has been replaced.
  • The above target items include valid items and exclude invalid items.
  • When the access instruction is triggered for the first time, or the information has been replaced, the target item of the virtual address will not be found in the address maintenance list, and execution may jump to steps 604 and 605.
  • Steps 604 and 605 may be processed simultaneously or sequentially, and the specific processing sequence of steps 604 and 605 is not limited in this embodiment.
  • Step 604: retrieve the corresponding information from the lower-level storage system based on the physical address corresponding to the virtual address.
  • When no virtual-to-physical address translation has yet been performed, the virtual address may be translated through the TLB to obtain the corresponding physical address. The lower-level storage system, such as the L2 Cache or memory, is then accessed; the corresponding information is obtained through the physical address and retrieved, stored at the current virtual address, and the data is fed back to the memory access component that initiated the memory access instruction, or the instruction is returned to the instruction fetch unit that initiated the fetch.
  • Step 605: generate the target item corresponding to the virtual address and add it to the address maintenance list.
  • an item of information may be added to the address maintenance list.
  • The processing of generating the target item can be as follows: take the Tag of the above virtual address or physical address as the target Tag, the address ambiguity bits of the virtual address as the target address ambiguity bits, and the current Cache corresponding to the virtual address as the target Cache, and build the target item.
  • When there is an empty slot in the location where the address maintenance list can store the target item, the target item can be added to any empty slot; when there is no empty slot in that location, any item can be replaced by the target item based on a replacement algorithm.
  • The above replacement algorithm may include a random replacement algorithm (Random), a least-recently-used algorithm (LRU, Least Recently Used), a least-frequently-used algorithm (LFU, Least Frequently Used), etc.; this embodiment does not limit the specific replacement algorithm.
  • the location where the target item can be stored refers to the location indicated by the Index of the physical address.
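Step 605's insert-or-replace can be sketched as below. The 4-way row size and the random victim choice are illustrative assumptions; the text equally allows LRU or LFU.

```python
import random

WAYS = 4  # assumed associativity of the address maintenance list

def add_target_item(rows, index, new_item):
    """Add new_item at the row indicated by the physical-address Index.

    Prefer an empty (invalid) slot; when the row is full, replace a victim
    chosen by a replacement algorithm (random here, one of the named options).
    """
    row = rows[index]
    for i, item in enumerate(row):
        if not item["valid"]:
            row[i] = new_item               # reuse an empty slot
            return
    if len(row) < WAYS:
        row.append(new_item)                # row not yet full
        return
    row[random.randrange(WAYS)] = new_item  # replace a random victim

rows = [[] for _ in range(8)]
add_target_item(rows, 5, {"tag": 0x0080009, "ambiguity_bits": 0b01, "valid": True})
```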
  • The access process corresponding to a virtual address has now been introduced.
  • The data retrieved above can be stored on the data Cache Line corresponding to the Index of the corresponding virtual address.
  • The fetched instruction above may be stored on the instruction Cache Line corresponding to the Index of the corresponding virtual address.
  • When the processor core receives a memory access instruction accessing the virtual address again, or when the processor core again fetches the instruction corresponding to the program counter of the virtual address, after performing the above steps 601-602 it can obtain the information from the corresponding virtual address, and there is no address ambiguity problem.
  • In the following, the virtual address of another access instruction is called the first virtual address, and the virtual address indicated by the address maintenance list is called the second virtual address; the second virtual address and the first virtual address are mapped to the same physical address.
  • For the above other access instruction: execute the above step 601 to determine the corresponding first virtual address, execute the above step 602 to access the Cache and determine a miss, execute the above step 603 to determine the target item corresponding to the first virtual address, and jump to step 606 for execution.
  • The specific processing of steps 601-603 is the same as above and will not be repeated here. Step 606 is introduced below.
  • Step 606: determine the second virtual address based on the first virtual address and the target item, and invalidate the information of the second virtual address.
  • The information recorded in the target item may include the target Tag corresponding to the second virtual address, the target address ambiguity bits, and the target Cache.
  • The first virtual address and the information carried by the target item can together constitute the second virtual address indicated by the target item.
  • The processing of the above step 606 can be as follows: in the first virtual address, replace the address ambiguity bits with the target address ambiguity bits to obtain the second virtual address; then access the target Cache and, in the target Cache, invalidate the information of the second virtual address.
  • Alternatively, the target Tag in the target item can be obtained and used to construct the second virtual address; this embodiment does not limit the specific way of constructing the second virtual address.
  • the target Cache indicated by the target item can be accessed.
  • The processor core of the target Cache may be the processor core receiving the above other access, i.e., the current processor core, or another processor core, and the target Cache may be an instruction cache or a data cache. In the target Cache, invalidate the information stored at the second virtual address.
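Constructing the second virtual address is pure bit arithmetic: keep every bit of the first virtual address except the ambiguity bits, which are replaced by the target ambiguity bits from the target item. The field position assumed below (2 bits at [13:12], just above a 4 KB page offset) is an illustrative choice matching the 2-bit example earlier, not a value fixed by the disclosure.

```python
AMB_SHIFT = 12  # assumed: ambiguity bits sit just above a 4 KB page offset
AMB_WIDTH = 2   # assumed width, matching the 2'b01 example

def second_virtual_address(first_va, target_amb):
    # Clear the ambiguity-bit field of the first virtual address, then
    # insert the target address ambiguity bits recorded in the target item.
    mask = ((1 << AMB_WIDTH) - 1) << AMB_SHIFT
    return (first_va & ~mask) | ((target_amb << AMB_SHIFT) & mask)

va1 = 0x2ABC                           # ambiguity bits [13:12] = 2'b10
va2 = second_virtual_address(va1, 0b01)
# va1 and va2 agree on all bits except the ambiguity bits, so both fall in
# the same physical page but index different Cache lines.
```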
  • Step 607: obtain the information corresponding to the access instruction, and write it back to the first virtual address.
  • Part of the processing of step 607 is the same as that of step 604 above, i.e., the information at the physical address is obtained and filled into the first virtual address, which will not be repeated here.
  • Alternatively, the information at the second virtual address may be obtained and written back, and the written-back information filled into the first virtual address; in this case, the invalidation operation performed on the second virtual address may be an invalidate-with-write-back.
  • Since the address maintenance list at this time still indicates the second virtual address, execution may jump to step 608 to update the target item.
  • Step 608: update the address maintenance list based on the first virtual address.
  • When the address ambiguity bits of the first virtual address are inconsistent with the address ambiguity bits of the target item, the address ambiguity bits of the target item are updated to the address ambiguity bits of the first virtual address.
  • the target Cache of the target item is updated to the current Cache.
  • Each piece of information in the target item may be judged separately, and whenever the judgment result is inconsistent, the corresponding information is updated.
  • For the address maintenance list shown in FIG. 8, the address ambiguity bits of the first virtual address may be compared with the address ambiguity bits of the target item, the current processor core with the processor core indicated by the target item, and the current Cache with the target Cache indicated by the target item; whenever a comparison result is inconsistent, the corresponding information is updated.
  • Alternatively, the judgment may be skipped: the target address ambiguity bits are directly updated to the address ambiguity bits of the first virtual address, and the target Cache is updated to the current Cache, i.e., the information of the current Cache is written into the target Cache information.
  • Specifically, the first identifier may be updated based on the current processor core, and the second identifier may be updated based on whether the current Cache is an instruction cache or a data cache. For identical information, there is naturally no change after the update.
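The unconditional variant of step 608 (skip the comparison and simply overwrite) can be sketched as follows. The dict-based item layout is an illustrative assumption; identical values are simply rewritten unchanged.

```python
def update_target_item(item, amb_bits, core_vector, inst):
    # Overwrite directly: the target ambiguity bits become those of the
    # first virtual address, and the Cache information becomes the current
    # Cache (first identifier = owning core, second identifier = inst bit).
    item["ambiguity_bits"] = amb_bits
    item["core_vector"] = core_vector
    item["inst"] = inst

item = {"tag": 0x0080009, "ambiguity_bits": 0b01,
        "core_vector": 0b0001, "inst": True}
# The new access came from a different core's data cache, ambiguity bits 2'b10:
update_target_item(item, 0b10, 0b0100, False)
```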
  • The address maintenance list holds little data and is simple to maintain, which can improve processing efficiency while solving address ambiguity.
  • The address maintenance list can be a Snoop Filter list, and the performance of the existing system can be improved by reusing the Snoop Filter mechanism.
  • An embodiment of the present disclosure provides a device for solving the problem of address ambiguity in a cache memory, and the device is used to implement the above method for solving the problem of address ambiguity in a cache memory.
  • As shown in the schematic block diagram of the device for solving the cache address ambiguity problem, the device 900 for solving the cache memory address ambiguity problem includes: a receiving module 901, a query module 902, an invalidation module 903, and an acquisition module 904.
  • a receiving module 901 configured to determine a corresponding first virtual address based on the received access instruction
  • a query module 902, configured to, when an access based on the first virtual address misses, query the address maintenance list and determine the target item corresponding to the first virtual address, the information recorded in the target item including a target Tag, target address ambiguity bits, and a target Cache;
  • an invalidation module 903, configured to determine a second virtual address based on the first virtual address and the target item, and invalidate the information of the second virtual address, wherein the second virtual address and the first virtual address are mapped to the same physical address;
  • the obtaining module 904 is configured to obtain information corresponding to the access instruction, and write back the first virtual address.
  • In some embodiments, the device also includes an update module, the update module being used to update the address maintenance list based on the first virtual address.
  • the target Cache includes at least any of the following:
  • the instruction cache or data cache of the current processor core or the instruction cache or data cache of other processor cores, wherein the current processor core is configured to receive the access instruction.
  • In some embodiments, the receiving module 901 is used to: when a memory access instruction is received, determine the first virtual address corresponding to the memory access instruction, the first virtual address corresponding to the memory access instruction being used to access the data Cache.
  • In some embodiments, the receiving module 901 is used to: when a fetch instruction is received, obtain the first virtual address of the fetch instruction.
  • In some embodiments, the query module 902 is used to: based on the address maintenance list, determine the target item consistent with the Tag of the first virtual address.
  • In some embodiments, the query module 902 is used to: based on the Index of the physical address, acquire at least one item corresponding to the Index in the address maintenance list, and among the at least one item, take the item consistent with the Tag of the first virtual address as the target item.
  • In some embodiments, the update module is also used to: if the target item corresponding to the first virtual address does not exist in the address maintenance list, generate the target item and add it to the address maintenance list.
  • In some embodiments, the update module is used to: take the Tag of the first virtual address or of the physical address as the target Tag, the address ambiguity bits of the first virtual address as the target address ambiguity bits, and the current Cache corresponding to the first virtual address as the target Cache, to generate the target item.
  • In some embodiments, the invalidation module 903 is used to: in the first virtual address, replace the address ambiguity bits with the target address ambiguity bits to obtain the second virtual address.
  • In some embodiments, the acquisition module 904 is used to: obtain the information in the second virtual address and write it back, filling the written-back information into the first virtual address; or obtain the information in the physical address and fill it into the first virtual address.
  • In some embodiments, the first virtual address and the second virtual address are used to access the L1 Cache.
  • the address maintenance list is a Snoop Filter list.
  • the list fields of the address maintenance list include: Tag, address ambiguity bit and Cache information.
  • the Cache information includes: a first identifier of the processor core and a second identifier of the Cache, the first identifier being used to indicate the processor core to which the Cache belongs, and the second identifier being used to indicate whether the Cache is an instruction cache or a data cache.
  • In the embodiments of the present disclosure, the address maintenance list may be used to determine whether address ambiguity exists, and if so, the consistency between the virtual address and the physical address may be maintained based on the address maintenance list. Moreover, the address maintenance list holds little data and is simple to maintain, which can improve processing efficiency while solving address ambiguity.
  • An embodiment of the present disclosure also provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor.
  • The memory stores a computer program executable by the at least one processor.
  • The computer program is used to make the electronic device execute the method for solving the cache address ambiguity problem according to an embodiment of the present disclosure.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor of a computer, the computer executes the method for solving the cache memory address ambiguity problem according to an embodiment of the present disclosure.
  • Embodiments of the present disclosure also provide a computer program product, including a computer program, wherein, when the computer program is executed by the processor of a computer, the computer executes the method for solving the cache memory address ambiguity problem according to an embodiment of the present disclosure.
  • Embodiments of the present disclosure also provide a computer program, the computer program including computer program code which, when run on a computer, causes the computer to execute the method for solving the cache address ambiguity problem according to an embodiment of the present disclosure.
  • Electronic equipment is intended to mean various forms of digital electronic computer equipment, such as data center servers, notebook computers, thin clients, laptop computers, desktop computers, workstations, personal digital assistants, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • An electronic device 1000 includes a computing unit 1001, which can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 1002 or a computer program loaded from a storage unit 1008 into a random access memory (RAM) 1003. The RAM 1003 can also store various programs and data necessary for the operation of the device 1000.
  • The computing unit 1001, the ROM 1002, and the RAM 1003 are connected to each other through a bus 1004.
  • An input/output (I/O) interface 1005 is also connected to the bus 1004.
  • the input unit 1006 can be any type of device capable of inputting information to the electronic device 1000.
  • the input unit 1006 can receive input digital or character information, and generate key signal input related to user settings and/or function control of the electronic device.
  • the output unit 1007 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer.
  • the storage unit 1008 may include, but is not limited to, a magnetic disk and an optical disk.
  • The communication unit 1009 allows the electronic device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or chipsets, such as Bluetooth devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
  • The computing unit 1001 may be various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1001 include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, etc.
  • The computing unit 1001 executes the various methods and processes described above. For example, in some embodiments, the method for resolving cache address ambiguity may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1008.
  • Part or all of the computer program may be loaded and/or installed on the electronic device 1000 via the ROM 1002 and/or the communication unit 1009.
  • the computing unit 1001 may be configured in any other suitable way (for example, by means of firmware) to execute the method for solving the cache address ambiguity problem.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. Such program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or other programmable data processing devices, so that the program code, when executed by the processor or controller, causes the functions/actions specified in the flowcharts and/or block diagrams to be implemented.
  • The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard drive, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • The terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disk, optical disk, memory, programmable logic device (PLD)) for providing machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals.
  • The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic input, speech input, or tactile input).
  • The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system including any combination of such back-end components, middleware components, or front-end components.
  • The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include: local area networks (LAN), wide area networks (WAN), and the Internet.
  • a computer system may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact through a communication network.
  • the relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.


Abstract

A method for solving the cache memory address ambiguity problem, comprising: determining a corresponding first virtual address based on a received access instruction (S501); when an access based on the first virtual address misses, querying an address maintenance list to determine a target item corresponding to the first virtual address (S502), the information recorded in the target item including a target Tag, target address ambiguity bits, and a target Cache; determining a second virtual address based on the first virtual address and the target item, and invalidating the information of the second virtual address (S503), wherein the second virtual address and the first virtual address are mapped to the same physical address; and obtaining the information corresponding to the access instruction and writing it back to the first virtual address (S504). An apparatus for solving the cache memory address ambiguity problem is also disclosed.

Description

Method and Apparatus for Solving the Cache Memory Address Ambiguity Problem
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202111548542.9, filed in China on December 17, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of electronic technology, and in particular to a method and apparatus for solving the cache memory address ambiguity problem.
Background
In the field of electronic technology, a computer device can determine the instruction-fetch address through a program counter and the memory access address through a memory access instruction, and then fetch the instruction from, or read or write data at, the physical address indicated by the address.
If a virtual address is used as the index of the cache memory, then when the sum of the bit widths of the cache index (Cache Index) and the cache block offset (Cache Block Offset) is greater than the bit width of the page offset, two memory addresses may differ in the high-order bits of the Index that exceed the page offset. This is caused by the same physical address being mapped to different virtual addresses, so that different Cache lines can be addressed based on the Index even though they actually correspond to the same physical address, i.e., there is an address ambiguity problem. Correspondingly, the bits of the cache index (Cache Index) in which the virtual address and the physical address are inconsistent may be referred to as address ambiguity bits in the present disclosure.
Summary
To solve the problems of the prior art, embodiments of the present disclosure provide a method, an apparatus, an electronic device, and a non-transitory computer-readable storage medium for solving the cache memory address ambiguity problem.
According to one aspect of the present disclosure, a method for solving the cache memory address ambiguity problem is provided, the method comprising:
determining a corresponding first virtual address based on a received access instruction;
when an access based on the first virtual address misses, querying an address maintenance list to determine a target item corresponding to the first virtual address, the information recorded in the target item being used to indicate a target Tag, target address ambiguity bits, and a target Cache;
determining a second virtual address based on the first virtual address and the target item, and invalidating the information of the second virtual address, wherein the second virtual address and the first virtual address are mapped to the same physical address; and
obtaining the information corresponding to the access instruction, and writing it back to the first virtual address.
According to another aspect of the present disclosure, an apparatus for solving the cache memory address ambiguity problem is provided, the apparatus comprising:
a receiving module, configured to determine a corresponding first virtual address based on a received access instruction;
a query module, configured to, when an access based on the first virtual address misses, query an address maintenance list to determine a target item corresponding to the first virtual address, the information recorded in the target item including a target Tag, target address ambiguity bits, and a target Cache;
an invalidation module, configured to determine a second virtual address based on the first virtual address and the target item, and invalidate the information of the second virtual address, wherein the second virtual address and the first virtual address are mapped to the same physical address; and
an acquisition module, configured to obtain the information corresponding to the access instruction, and write it back to the first virtual address.
According to another aspect of the present disclosure, an electronic device is provided, comprising:
a processor; and
a memory storing a program,
wherein the program comprises instructions which, when executed by the processor, cause the processor to execute the above method for solving the cache memory address ambiguity problem.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium storing computer instructions is provided, wherein the computer instructions are used to cause a computer to execute the above method for solving the cache memory address ambiguity problem.
According to another aspect of the present disclosure, a computer program product is provided, the computer program product comprising computer program code which, when run on a computer, executes the above method for solving the cache memory address ambiguity problem.
According to another aspect of the present disclosure, a computer program is provided, the computer program comprising computer program code which, when run on a computer, causes the computer to execute the above method for solving the cache memory address ambiguity problem.
In the present disclosure, an address maintenance list can be used to determine whether address ambiguity exists, and if so, the consistency between the virtual address and the physical address can be maintained based on the address maintenance list. Moreover, the address maintenance list holds little data and is simple to maintain, which can improve processing efficiency while solving address ambiguity.
Brief Description of the Drawings
Further details, features, and advantages of the present disclosure are disclosed in the following description of exemplary embodiments with reference to the accompanying drawings, in which:
FIG. 1 shows a schematic diagram of a memory address;
FIG. 2 shows a schematic diagram of a Cache structure according to an embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a set-associative Cache structure according to an embodiment of the present disclosure;
FIG. 4 shows a schematic comparison of the bit widths of Index + Block Offset and Page Offset according to an embodiment of the present disclosure;
FIG. 5 shows a flowchart of a method for solving the cache memory address ambiguity problem according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of an access method according to an embodiment of the present disclosure;
FIG. 7 shows a schematic diagram of an address maintenance list according to an embodiment of the present disclosure;
FIG. 8 shows a schematic diagram of another address maintenance list according to an embodiment of the present disclosure;
FIG. 9 shows a schematic block diagram of an apparatus for solving the cache memory address ambiguity problem according to an embodiment of the present disclosure;
FIG. 10 shows a structural block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps recited in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit performing the steps shown. The scope of the present disclosure is not limited in this respect.
The term "comprising" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "at least partially based on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below. It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules, or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules, or units.
It should be noted that the modifiers "a/an" and "multiple" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
To clearly describe the method provided by the embodiments of the present disclosure, the techniques used are introduced below.
As shown in the memory address schematic diagram of FIG. 1, from the perspective of the operating system, a memory address can be divided into a virtual address (Virtual Address, VA) and a physical address (Physical Address, PA). The virtual address can be divided into two parts: the virtual page number (Virtual Page Number, VPN) and the page offset (Page Offset); the physical address can be divided into two parts: the physical page number (Physical Page Number, PPN) and the page offset (Page Offset).
From the perspective of the processor's Cache (cache memory), a memory address can be divided into three parts: the cache tag (Cache Tag), the cache index (Cache Index), and the cache block offset (Cache Block Offset).
1. Cache
A cache memory is located between the CPU (Central Processing Unit) and the main memory DRAM (Dynamic Random Access Memory), and is usually composed of SRAM (Static Random-Access Memory). The speed of the CPU is far higher than that of the memory; when the CPU accesses data directly from memory, it must wait a certain number of clock cycles, whereas the Cache has a fast access speed and can store part of the information the CPU has just used or uses cyclically. If the CPU needs to use that information again, it can be fetched directly from the Cache, which avoids accessing information from the long-latency memory, reduces the CPU's waiting time, and thus improves the efficiency of the system.
As shown in the Cache structure diagram of FIG. 2, a Cache is mainly composed of two parts: a tag (Tag) part and a data (Data) part. The Data part is used to store the information of a range of contiguous addresses, and the Tag part is used to store the common address of this contiguous information. A Tag and all the information corresponding to it form a row called a Cache Line, and the data part of a Cache Line is called a data block (Data Block). If a piece of information can be stored in multiple places in the Cache, the multiple Cache Lines found by the same address are called a Cache Set.
2. Cache Organization
Cache organization is divided into direct-mapped, set-associative, and fully associative; the present disclosure mainly concerns the set-associative organization. Direct-mapped and fully associative organizations can be regarded as special cases of set-associative organization whose number of ways is 1 and equal to the number of Cache lines, respectively. A schematic diagram of a set-associative Cache structure is shown in FIG. 3.
The address with which the processor accesses memory is divided into three parts: the tag (Tag), the index (Index), and the block offset (Block Offset). The Index is used to find a group of Cache Lines, i.e., a Cache Set, in the Cache; the Tag part read out using the Index is compared with the Tag in the access address, and only if they are equal is the Cache Line the desired one. A Cache Line holds many pieces of access information; the truly desired information can be found through the Block Offset part of the memory address and the access width of the memory access instruction, which can locate each byte. A Cache Line also contains a valid bit (valid) to mark whether the Cache Line holds valid information; only a memory address that has been accessed before has its information stored in the corresponding Cache Line, with the corresponding valid bit set to 1.
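The Tag/Index/Block Offset split and the hit check described above can be sketched as follows. The concrete widths (64-byte blocks, 128 sets) are illustrative assumptions, not values from the disclosure.

```python
BLOCK_OFFSET_BITS = 6   # assumed 64-byte Cache lines
INDEX_BITS = 7          # assumed 128 Cache Sets

def split_address(addr):
    # Split a memory address into (Tag, Index, Block Offset).
    block_offset = addr & ((1 << BLOCK_OFFSET_BITS) - 1)
    index = (addr >> BLOCK_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BLOCK_OFFSET_BITS + INDEX_BITS)
    return tag, index, block_offset

def hit(cache_set, tag):
    # A Cache Line hits only if its Tag matches and its valid bit is set.
    return any(line["valid"] and line["tag"] == tag for line in cache_set)

tag, index, offset = split_address(0x00801F7C)
cache_set = [{"tag": tag, "valid": True}, {"tag": 0x123, "valid": True}]
```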
3. The Address Ambiguity Problem
The address ambiguity problem means that different virtual addresses correspond to the same physical address.
The cause of the address ambiguity problem is introduced below.
The address used to access the Cache can be a virtual address or a physical address. The processor's instruction-fetch addresses and the addresses in executed instructions are all virtual addresses, which must go through virtual-to-physical address translation by the TLB (Translation Lookaside Buffer) to obtain physical addresses. From the perspective of Cache access efficiency, commercial processors generally use the virtual-address Index to access the Cache line and the physical-address Tag for the Tag comparison, i.e., the Cache memory address is composed of the virtual-address Index and the physical-address Tag (Virtually-Indexed, Physically-Tagged, VIPT).
In such a method, accessing the Cache and accessing the TLB can proceed simultaneously. Suppose that in a direct-mapped Cache each Cache Line contains 2^b bytes of information and the number of Cache Sets is 2^L; that is, the bit width of the virtual address's Index + Block Offset is L + b. Suppose the page size in the physical address is 2^k bytes; then the bit width of the Page Offset is k.
Therefore, the three cases shown in FIG. 4 can exist:
First, k > L + b: the capacity of the Cache is smaller than one page;
Second, k = L + b: the capacity of the Cache equals one page;
Third, k < L + b: the capacity of the Cache is larger than one page.
For the first and second cases above, the Index + Block Offset of the virtual address does not change during virtual-to-physical address translation and can be regarded as coming from the physical address. This is equivalent to directly using the physical-address Index and physical-address Tag (Physically-Indexed, Physically-Tagged, PIPT), in which case the address ambiguity problem does not arise.
However, for the third case, since the capacity of one Cache way is limited (e.g., 4 KB), increasing the Cache capacity requires a set-associative Cache structure to increase the number of ways (e.g., to 32 KB, corresponding to an 8-way set-associative structure). But the number of ways is limited by the Cache access time; it cannot be increased without limit in a set-associative Cache, so the only option is to increase the capacity of each way. This increases the bit width of the Index in the virtual address, making the bit width of the virtual address's Index + Block Offset greater than the bit width of the Page Offset in the physical address, i.e., k < L + b.
Since the Page Offset of the physical address corresponds to the [k-1, 0] part of the virtual address, and the Index + Block Offset of the virtual address corresponds to the [L+b-1, 0] part, there may be two virtual addresses VA1 and VA2 whose [k-1, 0] parts are the same but whose [L+b-1, 0] parts are different. That is, based on the [k-1, 0] part they address the same physical address, but based on the [L+b-1, 0] part they address different virtual addresses, i.e., there is an address ambiguity problem. The address ambiguity problem not only wastes Cache space: when the information corresponding to VA2 in the Cache is changed, the information of VA1 does not change with it, so that one physical address has two different pieces of information in the Cache. When a subsequent load instruction reads information from address VA1, what it reads will be the unchanged information, causing an information read error.
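The three cases can be checked numerically. The helper below computes the width of the address ambiguity field of a VIPT cache as (L + b) − k when k < L + b, and 0 otherwise; the example parameters (4 KB pages, 8-way caches) are assumptions for illustration.

```python
def ambiguity_bit_count(page_bytes, cache_bytes, ways):
    """Width of the address ambiguity field for a VIPT cache.

    k = log2(page size); L + b = log2(capacity of one way), since one way
    holds 2**L sets of 2**b-byte blocks. Ambiguity exists only if k < L + b.
    """
    k = page_bytes.bit_length() - 1               # page offset bits
    l_plus_b = (cache_bytes // ways).bit_length() - 1
    return max(0, l_plus_b - k)

# 4 KB pages, 32 KB 8-way cache: each way is exactly one page -> no ambiguity.
# 4 KB pages, 64 KB 8-way cache: each way is 8 KB -> one ambiguity bit.
```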
Therefore, embodiments of the present disclosure provide a method for solving the cache memory address ambiguity problem, which can be applied to terminals, servers, and/or other electronic devices with processing capability; the present disclosure does not limit this.
The method is introduced below with reference to the flowchart of the method for solving the cache memory address ambiguity problem shown in FIG. 5.
Step 501: based on a received access instruction, determine the corresponding first virtual address.
Step 502: when an access based on the first virtual address misses, query the address maintenance list to determine the target item corresponding to the first virtual address.
The information recorded in the target item may include a target Tag, target address ambiguity bits, and a target Cache.
Step 503: based on the first virtual address and the target item, determine a second virtual address, and invalidate the information of the second virtual address.
The second virtual address and the first virtual address are mapped to the same physical address.
Step 504: obtain the information corresponding to the access instruction, and write it back to the first virtual address.
In some embodiments, the method may further include: updating the address maintenance list based on the first virtual address.
In some embodiments, updating the address maintenance list based on the first virtual address includes:
when the address ambiguity bits of the first virtual address are inconsistent with the target address ambiguity bits, updating the target address ambiguity bits to the address ambiguity bits of the first virtual address.
In some embodiments, the target Cache includes at least any one of the following: the instruction Cache or data Cache of the current processor core, or the instruction Cache or data Cache of another processor core, where the current processor core is used to receive the access instruction.
In some embodiments, determining the corresponding first virtual address based on the received access instruction includes:
when a fetch instruction is received, obtaining the first virtual address corresponding to the fetch instruction, the first virtual address corresponding to the fetch instruction being used to access the instruction Cache; or
when a memory access instruction is received, determining the first virtual address corresponding to the memory access instruction, the first virtual address corresponding to the memory access instruction being used to access the data Cache.
In some embodiments, obtaining the first virtual address corresponding to the fetch instruction includes:
obtaining the first virtual address corresponding to the fetch instruction based on the program counter.
In some embodiments, querying the address maintenance list to determine the target item corresponding to the first virtual address includes:
based on the address maintenance list, determining the target item consistent with the Tag of the first virtual address.
In some embodiments, determining, based on the address maintenance list, the target item consistent with the Tag of the first virtual address includes:
traversing the address maintenance list to determine whether the Tag of each item is consistent with the Tag of the first virtual address, and if so, taking the corresponding item as the target item of the first virtual address; or
based on the Index of the physical address, obtaining at least one item corresponding to the Index in the address maintenance list, and among the at least one item, taking the item consistent with the Tag of the first virtual address as the target item.
In some embodiments, the method further includes: if the target item corresponding to the first virtual address does not exist in the address maintenance list, generating the target item and adding it to the address maintenance list.
In some embodiments, generating the target item includes: taking the Tag of the first virtual address or of the physical address as the target Tag, taking the address ambiguity bits of the first virtual address as the target address ambiguity bits, and taking the current Cache corresponding to the first virtual address as the target Cache, to generate the target item.
In some embodiments, determining the second virtual address based on the first virtual address and the target item, and invalidating the information of the second virtual address, includes:
in the first virtual address, replacing the address ambiguity bits with the address ambiguity bits of the target item to obtain the second virtual address; and
accessing the target Cache, and in the target Cache, invalidating the information of the second virtual address.
In some embodiments, obtaining the information corresponding to the access address and writing it back to the first virtual address includes:
obtaining the information in the second virtual address and writing it back, filling the written-back information into the first virtual address; or
obtaining the information in the physical address, and filling the information into the first virtual address.
In some embodiments, the first virtual address and the second virtual address are used to access the L1 Cache (level-1 cache memory).
In some embodiments, the address maintenance list is a Snoop Filter list.
In some embodiments, the list fields of the address maintenance list include: Tag, address ambiguity bits, and Cache information.
In some embodiments, the Cache information includes: a first identifier of the processor core and a second identifier of the Cache, the first identifier being used to indicate the processor core to which the Cache belongs, and the second identifier being used to indicate whether the Cache is an instruction Cache or a data Cache.
In the embodiments of the present disclosure, the address maintenance list can record the Tag of the memory address, the address ambiguity bits, and the target Cache storing the instruction or data, and the address maintenance list is used to maintain the consistency between the physical address and the virtual address, ensuring that for one physical address only one virtual address stores valid information. Since the address maintenance list holds little data and is simple to maintain, it can improve processing efficiency while solving address ambiguity.
下面将从信息第一次被调取开始,对采用本公开提供的解决高速缓冲存储器地址二义性问题的访问方法进行介绍。本公开的实施例将基于同一个物理地址进行讨论。
参照图6所示的访存方法流程图,该访存方法如下:
步骤601,基于接收到的访问指令,确定对应的虚地址。
其中,访问指令可以包括对指令Cache的取指令,还可以包括对数据Cache的访存指令,访存指令是将指令地址码指示的存储单元中的操作数取出的取数指令或写入的存数指令。相对应的,虚地址中存储的信息可以包括数据和指令。
在一种可能的实施方式中,电子设备在运行过程中,任一处理器核可以接收到对Cache发起的访问。例如,当设备执行某个计算任务时,可以触发相应的访存指令,获取计算所需的数据。本实施例对触发访存指令的具体任务不作限定。
进而,当前处理器核可以基于访问指令中的基地址和偏移量,计算对应的虚地址,该虚地址可以用于访问当前处理器核的Cache,即该虚地址对应的当前Cache。具体的,可以是基于访问指令中携带的基地址寄存器的标识,从寄存器中获取对应的基地址,将基地址与偏移量相加,确定对应的虚地址。
在一些实施例中,取指令和访存指令可以具有不同的确定虚地址的方法,具体的处理可以如下:
当接收到取指令时,获取该取指令对应的第一虚地址;或
当接收到访存指令时,确定访存指令对应的第一虚地址。
其中,取指令对应的第一虚地址用于访问指令Cache,访存指令对应的第一虚地址用于访问数据Cache。
具体的,对于取指令,可以是基于程序计数器(PC,Program counter),获取该取指令的第一虚地址。程序计数器中的数值可以用于指示当前指令在指令Cache中的位置,该数值即为访问指令Cache的虚地址。
对于访存指令,可以基于指令中携带的基地址寄存器的标识,从寄存器中获取对应的基地址,将基地址与偏移量相加,确定对应的虚地址。
步骤602,基于确定的虚地址,访问Cache进行访问操作。
其中,虚地址可以包括Tag、Index和Block Offset。
在一些实施例中,此处访问的Cache可以是L1 Cache,L1 Cache可以是L1 I-Cache(一级指令高速缓冲存储器),也可以是L1 D-Cache(一级数据高速缓冲存储器)。在此基础上,L2 Cache(二级高速缓冲存储器)可以不再维护与L1 Cache的包含关系,即不再存储L1 Cache所存储的所有信息(Non-Inclusive),减少空间的占用,使得L2 Cache可以有更多的空间来实现其他任务的处理,提高***的性能。
在一种可能的实施方式中,可以基于虚地址的Index从Cache中确定对应的一组Cache Line,也即,找到一个Cache Set。基于虚地址的Tag,可以在确定的一组Cache Line中判断每个Cache Line的Tag是否与目标地址的Tag相同,以及有效位是否指示存储有效信息,也即是Tag是否命中。如果命中,即Cache Line的Tag与目标地址的Tag相同,且有效位为1,指示存储有效信息,则通过选择器,选择命中的目标Cache Line中的Data Block,并根据Block Offset在Data Block中的相应位置读取或存储信息;如果未命中,即Cache Line的Tag与目标地址的Tag不相同,或有效位为0,指示未存储有效信息,则本次访问未命中。
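步骤602描述的查找流程可以概括为如下示意性的Python模型。其中路数、各位域宽度等结构参数均为假设值,仅用于说明“Index选组、Tag比较、有效位判断”的过程,并非本公开方法的实际实现:

```python
# 示意性结构参数(假设值):块内偏移 6 位,Index 8 位,4 路组相连
B, L, WAYS = 6, 8, 4

# cache[index] 是一个 Cache Set,每行记录 (有效位, Tag, Data Block)
cache = [[(0, None, None)] * WAYS for _ in range(1 << L)]

def split(va):
    """将虚地址拆分为 Tag、Index 和 Block Offset"""
    offset = va & ((1 << B) - 1)
    index = (va >> B) & ((1 << L) - 1)
    tag = va >> (B + L)
    return tag, index, offset

def lookup(va):
    tag, index, offset = split(va)
    for valid, line_tag, block in cache[index]:   # 在选中的 Cache Set 内逐路比较
        if valid and line_tag == tag:             # Tag 相同且有效位为 1,命中
            return block[offset]                  # 按 Block Offset 取出相应位置的信息
    return None                                   # 未命中

# 先填充一行,再分别演示命中与未命中
tag, index, _ = split(0x12345)
cache[index][0] = (1, tag, bytes(range(64)))
assert lookup(0x12345) == 0x12345 & 0x3F          # 命中,读出对应字节
assert lookup(0x99999) is None                    # 未命中
```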
当对Cache Line的访问是第一次触发时,或Cache Line的信息已被替换时,基于虚地址访问Cache将未命中,则可以跳转至步骤603执行。
步骤603,当访问未命中时,查询地址维护列表,确定虚地址对应的目标项。
其中,如图7所示的地址维护列表示意图,地址维护列表中可以记录多项信息,每项列表域可以包括Tag、虚地址的地址二义性位和Cache信息。在一些实施例中,上述Tag可以是指物理地址的Tag。地址维护列表可以用于维护物理地址和虚地址的一致性,一项中所指示的物理地址和虚地址即为一对互相映射的虚实地址。
Cache信息用于指示存储有效信息的Cache。在一些实施例中,如图8所示,Cache信息可以进一步包括处理器核的第一标识和Cache的第二标识,第一标识可以用于指示Cache所属的处理器核,第二标识可以用于指示Cache为指令Cache或数据Cache。
在一些实施例中,物理地址的Tag可以是28’h0080009;虚地址的地址二义性位可以是2’b01;Cache信息可以包括处理器核位向量(即第一标识),如0001,意思是在四核处理器中,第4个处理器核存储有物理地址的有效信息,并且用inst位(即第二标识)来标识Cache是指令Cache还是数据Cache,inst位为1表示指令Cache,inst位为0表示数据Cache。处理器核位向量和inst位共同构成Cache信息。
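按照图7、图8的描述,地址维护列表的一项可以用如下示意性的Python结构表示。字段含义沿用文中示例(28位物理地址Tag、2位地址二义性位、4位处理器核位向量和1位inst位),具体字段宽度与实现形式不限于此:

```python
from dataclasses import dataclass

@dataclass
class SnoopFilterEntry:
    tag: int         # 物理地址 Tag,文中示例为 28'h0080009
    alias_bits: int  # 虚地址的地址二义性位,文中示例为 2'b01
    core_vec: int    # 处理器核位向量(第一标识),指示哪个核存储有效信息
    inst: int        # inst 位(第二标识):1 表示指令 Cache,0 表示数据 Cache

# 文中示例:四核处理器中第 4 个核的数据 Cache 存储有该物理地址的有效信息
entry = SnoopFilterEntry(tag=0x0080009, alias_bits=0b01, core_vec=0b0001, inst=0)
assert entry.core_vec & 0b0001   # 第 4 个处理器核对应的位为 1
assert entry.inst == 0           # 数据 Cache
```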
在一些实施例中,地址维护列表可以是Snoop Filter列表,采用Snoop Filter机制对取指令或访存指令进行监听,对地址维护列表进行维护,可以避免使用L2 Cache来解决地址二义性问题。当然,地址维护列表还可以是其他形式的目录,这些目录的共同点在于可以用于维护一致性,本实施例对具体的目录形式不作限定。
相对应的步骤603的处理可以如下:基于地址维护列表,确定与虚地址的Tag一致的目标项。
地址维护列表可以采用与Cache相类似的组织方式,也即是全相连、直接相连和组相连的组织方式。
在第一种可能的实施方式中,地址维护列表采用全相连的组织方式,在步骤603中可以是遍历地址维护列表,确定每一项的Tag是否与第一虚地址的Tag一致;如果一致,则将对应的一项作为第一虚地址的目标项。
在第二种可能的实施方式中,地址维护列表采用直接相连的组织方式,此时,地址维护列表每一项的索引可以与物理地址的Index相对应。在此基础上,在步骤603中可以基于物理地址的Index,在地址维护列表中获取Index对应的一项;在上述一项中,确定该项的Tag是否与第一虚地址的Tag一致;如果一致,则将该项作为第一虚地址的目标项。例如,当物理地址的Index指示第5行时,可以对地址维护列表获取第5行的1项表信息,如果其中的Tag与第一虚地址的Tag一致,则将该项作为目标项。
在第三种可能的实施方式中,地址维护列表采用组相连的组织方式,此时,地址维护列表每一项的索引也可以与物理地址的Index相对应。在此基础上,在步骤603中可以基于物理地址的Index,在地址维护列表中获取Index对应的多项;在上述多项中,确定每一项的Tag是否与第一虚地址的Tag一致;如果一致,则将对应的一项作为第一虚地址的目标项。例如,地址维护列表可以采用组相连的组织方式构成4路,当物理地址的Index指示第5行时,可以对每路地址维护列表获取第5行的表信息,得到4项表信息,将其中与第一虚地址的Tag一致的一项作为目标项。
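以三种组织方式中最简单的全相连查询为例,步骤603的查表过程可以概括为如下示意性代码。其中每项用一个四元组简化表示,结构与字段均为假设的示例,并非本公开限定的实际实现:

```python
# 每项用 (有效位, Tag, 地址二义性位, Cache信息) 简化表示
def find_target(entries, va_tag):
    """遍历地址维护列表,返回 Tag 一致且有效的目标项;不存在则返回 None"""
    for e in entries:
        valid, tag, alias_bits, cache_info = e
        if valid and tag == va_tag:   # Tag 一致且该项有效
            return e
    return None                       # 不存在目标项,需生成并添加(对应步骤605)

entries = [(1, 0x0080009, 0b01, ("core3", "data"))]
assert find_target(entries, 0x0080009) is not None   # 查到目标项
assert find_target(entries, 0x1234567) is None       # 查询不到,走新增流程
```

直接相连与组相连两种组织方式的差别仅在于:先用物理地址的Index选出一项或一组候选项,再在候选项中做同样的Tag比较。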
在一些实施例中,地址维护列表的列表域还可以包括有效位,该有效位可以用于指示对应的一项是否有效,例如,当有效位为1时,该项有效,表明该项对应虚地址存储的信息有效;当有效位为0时,该项无效,表明该项对应虚地址存储的信息无效,或已被替换。在此基础上,上述目标项包括有效项,不包括无效项。
与步骤602同理,当访问指令是第一次触发或信息已被替换时,在地址维护列表中将查询不到虚地址的目标项,则可以跳转至步骤604、605执行。步骤604、605可以同时处理,也可以顺序处理,本实施例对步骤604、605具体的处理顺序不作限定。
步骤604,基于虚地址对应的物理地址,从下级存储***中取回相应的信息。
在一种可能的实施方式中,在没有经过虚实地址转换时,可以将虚地址经过TLB进行虚实地址转换,得到对应的物理地址。进而访问下级存储***,例如L2 Cache、内存等,通过物理地址获取对应的信息并取回,存储在当前的虚地址中,并向发起访存指令的访存部件反馈数据,或向发起取指令的取指部件返回指令。
步骤605,生成虚地址对应的目标项,并添加到地址维护列表中。
在一种可能的实施方式中,当访问是第一次触发或数据已被替换时,可以在地址维护列表中新增一项信息。
生成目标项的处理可以如下:将上述虚地址或者物理地址的Tag,作为目标Tag,将虚地址的地址二义性位作为目标地址二义性位,将虚地址对应的当前Cache作为目标Cache,生成目标项。
当地址维护列表可存储目标项的位置存在空项时,可以在任一空项中添加该目标项;当地址维护列表可存储目标项的位置不存在空项时,可以基于替换算法,将任意一项替换为该目标项。上述替换算法可以包括随机替换算法(Random)、最近最少使用算法(LRU,Least Recently Used)、访问次数最少算法(LFU,Least Frequently Used)等,本实施例对具体的替换算法不作限定。此外,还可以访问被替换的一项所指示的Cache,将被替换的一项所指示的虚地址中的信息置为无效。
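上述“有空项则填入、无空项则替换”的处理可以概括为如下示意性代码。此处以随机替换算法为例,实际实现可替换为LRU、LFU等,列表结构也仅为假设的简化形式:

```python
import random

def insert_entry(entries, new_entry):
    """向可存储目标项的位置添加新项;返回被替换的一项(无替换时返回 None)"""
    for i, e in enumerate(entries):
        if e is None:                 # 存在空项,直接填入
            entries[i] = new_entry
            return None
    victim_i = random.randrange(len(entries))   # 无空项:随机选择被替换项
    victim = entries[victim_i]        # 被替换项所指示虚地址中的信息需置为无效
    entries[victim_i] = new_entry
    return victim

slots = [None, ("tagA",)]
assert insert_entry(slots, ("tagB",)) is None     # 填入空项,无替换
victim = insert_entry(slots, ("tagC",))           # 无空项,发生替换
assert victim in (("tagA",), ("tagB",))
```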
在一些实施例中,对应于地址维护列表的索引与物理地址的Index相对应的情况,上述可存储目标项的位置是指物理地址的Index所指示的位置。
至此,当访问指令是第一次触发或信息被替换时,一个虚地址对应的访问流程已经介绍完毕。此时,在接收到访存指令的处理器核的数据Cache中,相应的虚地址Index对应的数据Cache Line上可以存储有上述取回的数据。或者在接收到取指令的处理器核的指令Cache中,相应的虚地址Index对应的指令Cache Line上可以存储有上述取回的指令。当该处理器核再次接收到访问该虚地址的访存指令时,或该处理器核再次取该虚地址的程序计数器对应的指令时,执行上述步骤601-602后可以从相应的虚地址上获取信息,不存在地址二义性的问题。
下面将对另一访问指令的访问方法进行介绍。
当另一访问指令的虚地址映射到相同物理地址上时,将产生地址二义性的问题。为了便于介绍,将上述另一访问指令的虚地址称为第一虚地址,将地址维护列表所指示的虚地址称为第二虚地址,第二虚地址与第一虚地址映射到同一物理地址。
对上述另一访问指令执行上述步骤601可以确定对应的第一虚地址,执行上述步骤602访问Cache并且可以确定未命中,执行上述步骤603可以确定第一虚地址对应的目标项,并跳转至步骤606执行。其中,步骤601-603的具体处理与上述同理,此处不再赘述。下面将对步骤606进行介绍。
步骤606,基于第一虚地址和目标项,确定第二虚地址,将第二虚地址的信息置为无效。
其中,目标项所记录的信息可以包括第二虚地址对应的目标Tag、目标地址二义性位和目标Cache。
在一种可能的实施方式中,由于第一虚地址和第二虚地址映射到同一物理地址,也即是说,二者的Tag和Index中除地址二义性位外,其余各位均相同。因此,第一虚地址和目标项所携带的信息可以共同构成目标项所指示的第二虚地址。
具体的,上述步骤606的处理可以如下:在第一虚地址中,将地址二义性位替换为目标地址二义性位,得到第二虚地址;访问目标Cache,在目标Cache中,将第二虚地址的信息置为无效。
当然,还可以采用其他方式构建第二虚地址,例如可以获取目标项中的目标Tag、目标地址二义性位和第一虚地址除Tag、地址二义性位之外的信息,构建第二虚地址。本实施例对此不作限定。
在确定第二虚地址之后,可以访问目标项所指示的目标Cache。该目标Cache的处理器核可以是接收到上述另一访问的处理器核,也即当前处理器核,也可以是另一处理器核,可以是指令Cache,也可以是数据Cache。在目标Cache中,将第二虚地址上存储的信息置为无效。
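步骤606中由第一虚地址与目标地址二义性位构造第二虚地址的位操作,可以用如下示意性代码说明。其中页内偏移位数k、二义性位宽以及示例地址均为假设值:

```python
# 示意性参数(假设值):页内偏移 12 位,地址二义性位 2 位,位于 [k+1, k] 区间
k, alias_width = 12, 2

def make_va2(va1, target_alias_bits):
    """在第一虚地址中,将地址二义性位替换为目标项的地址二义性位,得到第二虚地址"""
    mask = ((1 << alias_width) - 1) << k
    return (va1 & ~mask) | (target_alias_bits << k)

va1 = 0x3ABC                          # 假设的第一虚地址,其二义性位为 0b11
va2 = make_va2(va1, 0b01)             # 目标项记录的目标地址二义性位为 0b01

assert va2 & ((1 << k) - 1) == va1 & ((1 << k) - 1)   # 页内偏移([k-1,0])不变
assert (va2 >> k) & 0b11 == 0b01                      # 二义性位已替换为目标值
```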
进而,跳转至步骤607。
步骤607,获取访问指令对应的信息,并写回第一虚地址。
在一种可能的实施方式中,步骤607的具体处理与上述步骤604相同,即获取物理地址中的信息,将获取的信息填充到第一虚地址中,此处不再赘述。
在另一种可能的实施方式中,还可以是获取第二虚地址中的信息并写回,将写回的信息填充到第一虚地址中。此时,对第二虚地址执行的无效操作可以是无效写回。
至此,只有第一虚地址能够存储物理地址的有效信息,避免了第二虚地址继续存储该有效信息,解决了地址二义性问题。
由于此时的地址维护列表指示的是第二虚地址,可以跳转至步骤608,对目标项进行更新。
步骤608,基于第一虚地址,更新地址维护列表。
当第一虚地址的地址二义性位与目标项的地址二义性位不一致时,将目标项中的地址二义性位,更新为第一虚地址的地址二义性位。此外,当第一虚地址对应的当前Cache与目标Cache不一致时,将目标项的目标Cache,更新为当前Cache。
在一种具体的实施方式中,可以是分别对目标项中的每项信息进行判断,每当判断结果为不一致时,将相应的信息进行更新。对于图8所示的地址维护列表,可以是分别对第一虚地址的地址二义性位与目标项的地址二义性位,以及当前处理器核与目标项所指示的处理器核,当前Cache与目标项所指示的目标Cache进行判断,每当判断结果为不一致时,将相应的信息进行更新。
在另一种具体的实施方式中,也可以不执行判断,直接将目标地址二义性位,更新为第一虚地址的地址二义性位;将目标Cache,更新为当前Cache,即将当前Cache的信息更新为目标Cache信息。对于图8所示的地址维护列表,可以是基于当前处理器核更新第一标识,基于当前Cache是指令Cache或数据Cache更新第二标识。对于相同的信息,更新后自然未发生变更。
在后续的过程中,当接收到访存指令时,或接收到取指令时,均可基于上面介绍的流程进行处理,利用地址维护列表对虚地址和物理地址的一致性进行维护,解决地址二义性问题。
本公开可以获得如下有益效果:
(1)地址维护列表数据量少,维护简单,可以在解决地址二义性的同时提高处理效率。
(2)上述过程均可基于硬件实现,相比于软件的方法,时延较小。
(3)地址维护列表可以是Snoop Filter列表,通过复用Snoop Filter机制,提高现有***的性能。
(4)可以不再使用L2 Cache重复存储L1 Cache的所有信息,不再通过维护包含关系的方式来解决地址二义性问题,减少空间的占用,使得L2 Cache可以有更多的空间来实现其他任务的处理,提高现有***的性能。
本公开实施例提供了一种解决高速缓冲存储器地址二义性问题的装置,该装置用于实现上述解决高速缓冲存储器地址二义性问题的方法。如图9所示的解决高速缓冲存储器地址二义性问题的装置的示意性框图,解决高速缓冲存储器地址二义性问题的装置900包括:接收模块901,查询模块902,无效模块903,获取模块904。
接收模块901,用于基于接收到的访问指令,确定对应的第一虚地址;
查询模块902,用于当基于所述第一虚地址访问未命中时,查询地址维护列表,确定所述第一虚地址对应的目标项,所述目标项所记录的信息包括目标Tag、目标地址二义性位和目标Cache;
无效模块903,用于基于所述第一虚地址和所述目标项,确定第二虚地址,将所述第二虚地址的信息置为无效,其中,所述第二虚地址与所述第一虚地址映射到同一物理地址;
获取模块904,用于获取所述访问指令对应的信息,并写回所述第一虚地址。
在一些实施例中,所述装置还包括更新模块,所述更新模块用于:
基于所述第一虚地址,更新所述地址维护列表。
在一些实施例中,所述更新模块用于:
当所述第一虚地址的地址二义性位与所述目标地址二义性位不一致时,将所述目标地址二义性位,更新为所述第一虚地址的地址二义性位。
在一些实施例中,所述目标Cache,至少包括以下任意一种:
当前处理器核的指令Cache或数据Cache,或,其他处理器核的指令Cache或数据Cache,其中,所述当前处理器核用于接收所述访问指令。
在一些实施例中,所述接收模块901用于:
当接收到取指令时,获取所述取指令对应的第一虚地址,所述取指令对应的第一虚地址用于访问指令Cache;或
当接收到访存指令时,确定所述访存指令对应的第一虚地址,所述访存指令对应的第一虚地址用于访问数据Cache。
在一些实施例中,所述接收模块901用于:
基于程序计数器,获取所述取指令的第一虚地址。
在一些实施例中,所述查询模块902用于:
基于地址维护列表,确定与所述第一虚地址的Tag一致的目标项。
在一些实施例中,所述查询模块902用于:
遍历所述地址维护列表,确定每一项的Tag是否与所述第一虚地址的Tag一致;如果一致,则将对应的一项作为所述第一虚地址的目标项;或
基于所述物理地址的索引Index,在所述地址维护列表中获取所述Index对应的至少一项;在所述至少一项中,将与所述第一虚地址的Tag一致的一项作为目标项。
在一些实施例中,所述更新模块还用于:
如果所述地址维护列表中不存在所述第一虚地址对应的目标项,则生成所述目标项,并添加到所述地址维护列表中。
在一些实施例中,所述更新模块用于:
将所述第一虚地址或者所述物理地址的Tag作为目标Tag,将所述第一虚地址的地址二义性位作为目标地址二义性位,将所述第一虚地址对应的当前Cache作为目标Cache,生成所述目标项。
在一些实施例中,所述无效模块903用于:
在所述第一虚地址中,将地址二义性位替换为所述目标地址二义性位,得到第二虚地址;
访问所述目标Cache,在所述目标Cache中,将所述第二虚地址的信息置为无效。
在一些实施例中,所述获取模块904用于:
获取所述第二虚地址中的信息并写回,将写回的信息填充到所述第一虚地址中;或
获取所述物理地址中的信息,将信息填充到所述第一虚地址中。
在一些实施例中,所述第一虚地址和所述第二虚地址用于访问一级高速缓冲存储器L1 Cache。
在一些实施例中,所述地址维护列表为侦听过滤器Snoop Filter列表。
在一些实施例中,所述地址维护列表的列表域包括:Tag、地址二义性位和Cache信息。
在一些实施例中,所述Cache信息包括:处理器核的第一标识和Cache的第二标识,所述第一标识用于指示所述Cache所属的处理器核,所述第二标识用于指示所述Cache为指令Cache或数据Cache。
本公开实施例中,可以采用地址维护列表来确定是否存在地址二义性,若存在,则可以基于地址维护列表来维护虚地址和物理地址的一致性。并且,地址维护列表数据量少,维护简单,可以在解决地址二义性的同时提高处理效率。
本公开实施例还提供一种电子设备,包括:至少一个处理器;以及与至少一个处理器通信连接的存储器。存储器存储有能够被至少一个处理器执行的计算机程序,计算机程序在被至少一个处理器执行时用于使电子设备执行根据本公开实施例的解决高速缓冲存储器地址二义性问题的方法。
本公开实施例还提供一种存储有计算机程序的非瞬时计算机可读存储介质,其中,计算机程序在被计算机的处理器执行时用于使计算机执行根据本公开实施例的解决高速缓冲存储器地址二义性问题的方法。
本公开实施例还提供一种计算机程序产品,包括计算机程序,其中,计算机程序在被计算机的处理器执行时用于使计算机执行根据本公开实施例的解决高速缓冲存储器地址二义性问题的方法。
本公开实施例还提供一种计算机程序,所述计算机程序包括计算机程序代码,当所述计算机程序代码在计算机上运行时,以使得计算机执行根据本公开实施例的解决高速缓冲存储器地址二义性问题的方法。
参考图10,现将描述可以作为本公开的服务器或客户端的电子设备1000的结构框图,其是可以应用于本公开的各方面的硬件设备的示例。电子设备旨在表示各种形式的数字电子的计算机设备,诸如,数据中心服务器、笔记本电脑、瘦客户机、膝上型计算机、台式计算机、工作站、个人数字助理、刀片式服务器、大型计算机、和其它适合的计算机。电子设备还可以表示各种形式的移动装置,诸如,个人数字处理、蜂窝电话、智能电话、可穿戴设备和其它类似的计算装置。本文所示的部件、它们的连接和关系、以及它们的功能仅仅作为示例,并且不意在限制本文中描述的和/或者要求的本公开的实现。
如图10所示,电子设备1000包括计算单元1001,其可以根据存储在只读存储器(ROM)1002中的计算机程序或者从存储单元1008加载到随机访问存储器(RAM)1003中的计算机程序,来执行各种适当的动作和处理。在RAM 1003中,还可存储设备1000操作所需的各种程序和数据。计算单元1001、ROM 1002以及RAM 1003通过总线1004彼此相连。输入/输出(I/O)接口1005也连接至总线1004。
电子设备1000中的多个部件连接至I/O接口1005,包括:输入单元1006、输出单元1007、存储单元1008以及通信单元1009。输入单元1006可以是能向电子设备1000输入信息的任何类型的设备,输入单元1006可以接收输入的数字或字符信息,以及产生与电子设备的用户设置和/或功能控制有关的键信号输入。输出单元1007可以是能呈现信息的任何类型的设备,并且可以包括但不限于显示器、扬声器、视频/音频输出终端、振动器和/或打印机。存储单元1008可以包括但不限于磁盘、光盘。通信单元1009允许电子设备1000通过诸如因特网的计算机网络和/或各种电信网络与其他设备交换信息/数据,并且可以包括但不限于调制解调器、网卡、红外通信设备、无线通信收发机和/或芯片组,例如蓝牙设备、WiFi设备、WiMax设备、蜂窝通信设备和/或类似物。
计算单元1001可以是各种具有处理和计算能力的通用和/或专用处理组件。计算单元1001的一些示例包括但不限于中央处理单元(CPU)、图形处理单元(GPU)、各种专用的人工智能(AI)计算芯片、各种运行机器学习模型算法的计算单元、数字信号处理器(DSP)、以及任何适当的处理器、控制器、微控制器等。计算单元1001执行上文所描述的各个方法和处理。例如,在一些实施例中,解决高速缓冲存储器地址二义性问题的方法可被实现为计算机软件程序,其被有形地包含于机器可读介质,例如存储单元1008。在一些实施例中,计算机程序的部分或者全部可以经由ROM 1002和/或通信单元1009而被载入和/或安装到电子设备1000上。在一些实施例中,计算单元1001可以通过其他任何适当的方式(例如,借助于固件)而被配置为执行解决高速缓冲存储器地址二义性问题的方法。
用于实施本公开的方法的程序代码可以采用一个或多个编程语言的任何组合来编写。这些程序代码可以提供给通用计算机、专用计算机或其他可编程数据处理装置的处理器或控制器,使得程序代码当由处理器或控制器执行时使流程图和/或框图中所规定的功能/操作被实施。程序代码可以完全在机器上执行、部分地在机器上执行,作为独立软件包部分地在机器上执行且部分地在远程机器上执行或完全在远程机器或服务器上执行。
在本公开的上下文中,机器可读介质可以是有形的介质,其可以包含或存储以供指令执行***、装置或设备使用或与指令执行***、装置或设备结合地使用的程序。机器可读介质可以是机器可读信号介质或机器可读储存介质。机器可读介质可以包括但不限于电子的、磁性的、光学的、电磁的、红外的、或半导体***、装置或设备,或者上述内容的任何合适组合。机器可读存储介质的更具体示例会包括基于一个或多个线的电气连接、便携式计算机盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或快闪存储器)、光纤、便捷式紧凑盘只读存储器(CD-ROM)、光学储存设备、磁储存设备、或上述内容的任何合适组合。
如本公开使用的,术语“机器可读介质”和“计算机可读介质”指的是用于将机器指令和/或数据提供给可编程处理器的任何计算机程序产品、设备、和/或装置(例如,磁盘、光盘、存储器、可编程逻辑装置(PLD)),包括,接收作为机器可读信号的机器指令的机器可读介质。术语“机器可读信号”指的是用于将机器指令和/或数据提供给可编程处理器的任何信号。
为了提供与用户的交互,可以在计算机上实施此处描述的***和技术,该计算机具有:用于向用户显示信息的显示装置(例如,CRT(阴极射线管)或者LCD(液晶显示器)监视器);以及键盘和指向装置(例如,鼠标或者轨迹球),用户可以通过该键盘和该指向装置来将输入提供给计算机。其它种类的装置还可以用于提供与用户的交互;例如,提供给用户的反馈可以是任何形式的传感反馈(例如,视觉反馈、听觉反馈、或者触觉反馈);并且可以用任何形式(包括声输入、语音输入或者触觉输入)来接收来自用户的输入。
可以将此处描述的***和技术实施在包括后台部件的计算***(例如,作为数据服务器)、或者包括中间件部件的计算***(例如,应用服务器)、或者包括前端部件的计算***(例如,具有图形用户界面或者网络浏览器的用户计算机,用户可以通过该图形用户界面或者该网络浏览器来与此处描述的***和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算***中。可以通过任何形式或者介质的数字数据通信(例如,通信网络)来将***的部件相互连接。通信网络的示例包括:局域网(LAN)、广域网(WAN)和互联网。
计算机***可以包括客户端和服务器。客户端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有客户端-服务器关系的计算机程序来产生客户端和服务器的关系。
本公开所有实施例均可以单独被执行,也可以与其他实施例相结合被执行,均视为本公开要求的保护范围。

Claims (21)

  1. 一种解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述方法包括:
    基于接收到的访问指令,确定对应的第一虚地址;
    当基于所述第一虚地址访问未命中时,查询地址维护列表,确定所述第一虚地址对应的目标项,所述目标项所记录的信息用于指示目标标签Tag、目标地址二义性位和目标高速缓冲存储器Cache;
    基于所述第一虚地址和所述目标项,确定第二虚地址,将所述第二虚地址的信息置为无效,其中,所述第二虚地址与所述第一虚地址映射到同一物理地址;
    获取所述访问指令对应的信息,并写回所述第一虚地址。
  2. 根据权利要求1所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述方法还包括:
    基于所述第一虚地址,更新所述地址维护列表。
  3. 根据权利要求2所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述基于所述第一虚地址,更新所述地址维护列表,包括:
    当所述第一虚地址的地址二义性位与所述目标地址二义性位不一致时,将所述目标地址二义性位,更新为所述第一虚地址的地址二义性位。
  4. 根据权利要求1至3中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述目标Cache,至少包括以下任意一种:当前处理器核的指令Cache或数据Cache,或,其他处理器核的指令Cache或数据Cache,其中,所述当前处理器核用于接收所述访问指令。
  5. 根据权利要求1至4中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述基于接收到的访问指令,确定对应的第一虚地址,包括:
    当接收到取指令时,获取所述取指令对应的第一虚地址,所述取指令对应的第一虚地址用于访问指令Cache;或
    当接收到访存指令时,确定所述访存指令对应的第一虚地址,所述访存指令对应的第一虚地址用于访问数据Cache。
  6. 根据权利要求5所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述获取所述取指令对应的第一虚地址,包括:
    基于程序计数器,获取所述取指令对应的第一虚地址。
  7. 根据权利要求1至6中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述查询地址维护列表,确定所述第一虚地址对应的目标项,包括:
    基于地址维护列表,确定与所述第一虚地址的Tag一致的目标项。
  8. 根据权利要求7所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述基于地址维护列表,确定与所述第一虚地址的Tag一致的目标项,包括:
    遍历所述地址维护列表,确定每一项的Tag是否与所述第一虚地址的Tag一致;如果一致,则将对应的一项作为所述第一虚地址的目标项;或
    基于所述物理地址的索引Index,在所述地址维护列表中获取所述Index对应的至少一项;在所述至少一项中,将与所述第一虚地址的Tag一致的一项作为目标项。
  9. 根据权利要求1至8中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述方法还包括:
    如果所述地址维护列表中不存在所述第一虚地址对应的目标项,则生成所述目标项,并添加到所述地址维护列表中。
  10. 根据权利要求9所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述生成所述目标项,包括:
    将所述第一虚地址或者所述物理地址的Tag作为目标Tag,将所述第一虚地址的地址二义性位作为目标地址二义性位,将所述第一虚地址对应的当前Cache作为目标Cache,生成所述目标项。
  11. 根据权利要求1至10中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述基于所述第一虚地址和所述目标项,确定第二虚地址,将所述第二虚地址的信息置为无效,包括:
    在所述第一虚地址中,将地址二义性位替换为所述目标地址二义性位,得到第二虚地址;
    访问所述目标Cache,在所述目标Cache中,将所述第二虚地址的信息置为无效。
  12. 根据权利要求1至11中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述获取所述访问指令对应的信息,并写回所述第一虚地址,包括:
    获取所述第二虚地址中的信息并写回,将写回的信息填充到所述第一虚地址中;或
    获取所述物理地址中的信息,将信息填充到所述第一虚地址中。
  13. 根据权利要求1至12中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述第一虚地址和所述第二虚地址用于访问一级高速缓冲存储器L1 Cache。
  14. 根据权利要求1至13中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述地址维护列表为侦听过滤器Snoop Filter列表。
  15. 根据权利要求1至14中任一项所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述地址维护列表的列表域包括:Tag、地址二义性位和Cache信息。
  16. 根据权利要求15所述的解决高速缓冲存储器地址二义性问题的方法,其特征在于,所述Cache信息包括:处理器核的第一标识和Cache的第二标识,所述第一标识用于指示所述Cache所属的处理器核,所述第二标识用于指示所述Cache为指令Cache或数据Cache。
  17. 一种解决高速缓冲存储器地址二义性问题的装置,其特征在于,所述装置包括:
    接收模块,用于基于接收到的访问指令,确定对应的第一虚地址;
    查询模块,用于当基于所述第一虚地址访问未命中时,查询地址维护列表,确定所述第一虚地址对应的目标项,所述目标项所记录的信息用于指示目标Tag、目标地址二义性位和目标Cache;
    无效模块,用于基于所述第一虚地址和所述目标项,确定第二虚地址,将所述第二虚地址的信息置为无效,其中,所述第二虚地址与所述第一虚地址映射到同一物理地址;
    获取模块,用于获取所述访问指令对应的信息,并写回所述第一虚地址。
  18. 一种电子设备,包括:
    处理器;以及
    存储程序的存储器,
    其中,所述程序包括指令,所述指令在由所述处理器执行时使所述处理器执行根据权利要求1-16中任一项所述的方法。
  19. 一种存储有计算机指令的非瞬时计算机可读存储介质,其中,所述计算机指令用于使计算机执行根据权利要求1-16中任一项所述的方法。
  20. 一种计算机程序产品,其特征在于,所述计算机程序产品中包括计算机程序代码,当所述计算机程序代码在计算机上运行时,以执行如权利要求1-16中任一项所述的方法。
  21. 一种计算机程序,其特征在于,所述计算机程序包括计算机程序代码,当所述计算机程序代码在计算机上运行时,以使得计算机执行如权利要求1-16中任一项所述的方法。
PCT/CN2022/082036 2021-12-17 2022-03-21 解决高速缓冲存储器地址二义性问题的方法和装置 WO2023108938A1 (zh)
