WO2023165317A1 - Memory access method and device - Google Patents

Memory access method and device

Info

Publication number
WO2023165317A1
Authority
WO
WIPO (PCT)
Prior art keywords
address
tlb
memory access
physical address
cpu core
Prior art date
Application number
PCT/CN2023/075635
Other languages
English (en)
French (fr)
Inventor
郭凯杰
罗犇
彭开桓
Original Assignee
阿里巴巴(中国)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴(中国)有限公司 filed Critical 阿里巴巴(中国)有限公司
Publication of WO2023165317A1 publication Critical patent/WO2023165317A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/0292User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877Cache access modes
    • G06F12/0882Page mode
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures

Definitions

  • This specification relates to the field of computer technology, and in particular to a memory access method and device.
  • CPU: Central Processing Unit
  • a CPU may include one or more CPU cores.
  • the CPU core usually includes a TLB (Translation Lookaside Buffer).
  • the TLB is a high-speed hardware cache that stores the mapping relationship between virtual addresses and physical addresses, which speeds up address translation.
  • this specification provides a memory access method and device.
  • a memory access method used for memory access of a computer system
  • the computer system includes a central processing unit CPU
  • the CPU includes a plurality of CPU cores
  • each CPU core includes a translation lookaside buffer (TLB)
  • the TLB is used to cache the mapping relationship between virtual addresses and physical addresses
  • the method is applied to a CPU core and comprises:
  • sending an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its TLB for the physical address corresponding to the virtual address and, when the corresponding physical address is found, returns an address detection response carrying the found physical address;
  • the sending address detection request includes:
  • the sending address detection request includes:
  • the target CPU core is a CPU core on which other threads of the process that the requesting thread belongs to are running.
  • the target CPU core is the CPU core on which the thread initiating the memory access request was located before it was scheduled to the current CPU core.
  • the target core identifier is written by a scheduler.
  • the TLB invalidation instruction is sent after the thread initiating the memory access request is scheduled to another CPU core
  • the method also includes:
  • the physical address corresponding to the virtual address is searched for among the TLB entries in both the valid state and the invalid state.
  • a memory access device used for memory access of a computer system
  • the computer system includes a central processing unit CPU
  • the CPU includes a plurality of CPU cores
  • each CPU core includes a translation lookaside buffer (TLB)
  • the TLB is used to cache the mapping relationship between virtual addresses and physical addresses
  • the device is applied to a CPU core and includes:
  • the address search unit, in response to the memory access request, searches the TLB for a physical address corresponding to the virtual address carried by the memory access request;
  • the address detection unit sends an address detection request when the physical address corresponding to the virtual address is not found, the address detection request carrying the virtual address, so that a CPU core receiving the address detection request searches its TLB for the physical address corresponding to the virtual address and returns an address detection response when the corresponding physical address is found, the address detection response carrying the found physical address;
  • the memory access unit, in response to the received address detection response, stores the mapping relationship between the physical address carried in the address detection response and the virtual address in the TLB, and performs memory access based on the physical address.
  • the address detection unit reads the target core identifier of the target CPU core from the register, and sends an address detection request to the target CPU core based on the target core identifier.
  • the CPU comprises a plurality of CPU cores
  • each CPU core comprises a translation lookaside buffer (TLB)
  • the TLB is used to cache the mapping relationship between virtual addresses and physical addresses
  • the CPU core is configured to:
  • send an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its TLB for the physical address corresponding to the virtual address and, when the corresponding physical address is found, returns an address detection response carrying the found physical address;
  • With the above embodiments, when a CPU core does not store the physical address corresponding to the virtual address in its TLB, it sends an address detection request to other CPU cores. The other CPU cores check whether the physical address corresponding to the virtual address is stored in their respective TLBs, and can add the queried physical address to the address detection response and return it. The CPU core can then store the mapping relationship between the physical address and the virtual address in its TLB and perform memory access.
  • In this way, TLB cache sharing between CPU cores can be realized, which greatly reduces the repeated process page table queries performed by CPU cores in cross-CPU-core scenarios, reduces the waste of CPU core processing resources, effectively shortens the time consumed by address translation in the case of a TLB Miss, and improves IO performance.
  • the above technical solution provided in this specification can be realized based on existing hardware and the existing cache snooping protocol, without adding new hardware, so the cost is low and the feasibility is high.
  • Fig. 1 is a schematic flowchart of a memory access method shown in an exemplary embodiment of this specification.
  • Fig. 2 is a block diagram of a CPU shown in an exemplary embodiment of this specification.
  • Fig. 3 is a schematic flowchart of another memory access method shown in an exemplary embodiment of this specification.
  • Fig. 4 is a schematic flowchart of another memory access method shown in an exemplary embodiment of this specification.
  • Fig. 5 is a block diagram of a memory access device shown in an exemplary embodiment of this specification.
  • Although terms such as first, second, and third may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word “if” as used herein may be interpreted as “at the time of”, “when”, or “in response to a determination”.
  • the CPU core usually includes an MMU (Memory Management Unit).
  • the MMU can convert the virtual address into the physical address required for memory access, and can store the mapping relationship between the virtual address and the physical address in the TLB of the CPU core.
  • the MMU can first query the TLB for the physical address corresponding to the virtual address, and then perform memory access based on the queried physical address. If the physical address corresponding to the virtual address is not stored in the TLB, the MMU needs to look up the physical address corresponding to the virtual address in the process page table in memory.
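The lookup order described above (TLB first, process page table on a miss) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `Mmu`, the 4 KiB page size, and the single-level dictionary standing in for the process page table are all assumptions made for the example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


class Mmu:
    """Toy MMU: consult the TLB first, fall back to the page table on a miss."""

    def __init__(self, page_table):
        # Hypothetical flat page table: virtual page number -> physical frame number.
        self.page_table = page_table
        self.tlb = {}
        self.page_walks = 0  # counts the slow lookups in the page table

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        pfn = self.tlb.get(vpn)
        if pfn is None:  # TLB Miss: query the process page table
            self.page_walks += 1
            pfn = self.page_table[vpn]
            self.tlb[vpn] = pfn  # cache the mapping for later accesses
        return (pfn << PAGE_SHIFT) | offset


mmu = Mmu({0x800: 0x2})
first = mmu.translate(0x800010)   # miss, then page table lookup
second = mmu.translate(0x800020)  # hit in the TLB
```

Only the first access pays for the page table lookup; the second access to the same page is served from the TLB.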
  • ASID: Address Space ID (address space identifier)
  • a process is a carrier for running an application program, and the running of an application program usually corresponds to a process.
  • a thread is the smallest unit for performing calculations in a process. It is included in the process and is the actual operating unit in the process.
  • a process usually consists of multiple threads that share the application's memory space. When a process or thread accesses memory, it uses the ASID corresponding to the process, and the MMU can look up in the TLB the physical address that corresponds to both the ASID and the virtual address.
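Why the ASID is part of the lookup key can be shown with a small sketch: two processes may use the same virtual address for different physical pages, and the ASID keeps their TLB entries apart. The helper names `tlb_insert` and `tlb_lookup` and the second ASID value are hypothetical, chosen only for this illustration.

```python
tlb = {}  # (asid, virtual address) -> physical address


def tlb_insert(asid, vaddr, paddr):
    tlb[(asid, vaddr)] = paddr


def tlb_lookup(asid, vaddr):
    """Return the cached physical address, or None on a TLB Miss."""
    return tlb.get((asid, vaddr))


tlb_insert(7, 0x800000, 0x2000)  # process with ASID 7
tlb_insert(9, 0x800000, 0x5000)  # a different process reusing the same virtual address

hit_asid7 = tlb_lookup(7, 0x800000)
hit_asid9 = tlb_lookup(9, 0x800000)
miss = tlb_lookup(3, 0x800000)
```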
  • a process of an application program is scheduled from CPU core 1 to CPU core 2 by the scheduler.
  • multiple threads of a process run on different CPU cores.
  • This specification provides a memory access scheme for a computer system, which can improve IO performance in cross-CPU-core scenarios and save CPU core processing resources.
  • Fig. 1 is a schematic flowchart of a memory access method shown in an exemplary embodiment of this specification.
  • the memory access method can be used for memory access of a computer system
  • the computer system includes a CPU
  • the CPU includes a plurality of CPU cores
  • the CPU core includes a TLB
  • the method is applied to the CPU core, for example to the MMU of the CPU core, and includes the following steps:
  • Step 102 in response to the memory access request, search the TLB for a physical address corresponding to the virtual address carried in the memory access request.
  • a process or thread running on the CPU core can initiate a memory access request (since the thread is the actual operating unit in the process, the following describes a thread initiating the memory access request), and the memory access request usually carries the virtual address to be accessed.
  • the ASID of the application program may also be carried in the memory access request. The rest of this specification takes the case in which the ASID is carried as an example.
  • the MMU of the CPU core may first query the TLB of the CPU core, and query whether the physical address corresponding to the virtual address is cached in the TLB. For example, query the TLB for the physical address corresponding to the ASID and virtual address specified by the memory access request.
  • If the corresponding physical address is found in the TLB (TLB Hit), memory access can be performed based on the found physical address.
  • If the corresponding physical address is not found in the TLB (TLB Miss), the following step 104 can be performed.
  • the physical address can be queried based on the process page table in memory.
  • In the case of a TLB Miss, step 104 can be performed directly without querying the physical address based on the process page table in memory; alternatively, step 104 can be performed while the physical address query based on the process page table in memory proceeds in parallel. This specification does not specifically limit this.
  • Step 104: when the physical address corresponding to the virtual address is not found, send an address detection request. The address detection request carries the virtual address, so that the CPU core that receives the address detection request searches its TLB for the physical address corresponding to the virtual address and returns an address detection response if the corresponding physical address is found, where the address detection response carries the found physical address.
  • the CPU core may construct an address detection request, and add the ASID and virtual address to the address detection request.
  • the address detection request can be constructed based on the cache snooping protocol (Snoop Protocol), which is a hardware-based strategy for maintaining cache coherence in multi-core processors.
  • the address detection request may also be constructed based on other protocols, and this specification does not make any special limitation on this.
  • the CPU core may broadcast the constructed address detection request to all CPU cores.
  • the CPU core may also send the constructed address detection request to a designated CPU core.
  • the CPU core may first read the target core identifier of the target CPU core from a designated register, and then send the address detection request to the target CPU core based on the target core identifier.
  • the target CPU core can be a CPU core on which other threads of the process that the requesting thread belongs to are running; the target CPU core can also be the CPU core on which the thread that initiates the memory access request was located before it was scheduled to the current CPU core.
  • the target core identifier of the target CPU core can be written into the specified register by the scheduler.
  • the target core identifier may be added as a parameter to the address detection request.
  • the CPU core can inquire whether the physical address corresponding to the ASID carried in the address detection request and the virtual address is cached in its TLB.
  • the physical address may be added to the address detection response and returned to the CPU core that sent the address detection request.
  • an address detection response may also be returned to the CPU core that sent the address detection request, and the address detection response does not carry a physical address.
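The request and response exchanged in step 104 can be sketched as two small records plus a handler on the receiving core. The field names, the `ProbeRequest`/`ProbeResponse` types, and the convention that `target_core=None` means broadcast are assumptions for illustration; the specification does not define a message layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProbeRequest:
    asid: int
    vaddr: int
    target_core: Optional[int] = None  # assumption: None stands for a broadcast


@dataclass(frozen=True)
class ProbeResponse:
    core_id: int
    paddr: Optional[int]  # None when the responding core holds no mapping


def handle_probe(core_id, tlb, request):
    """Receiving core: look up (ASID, virtual address) in its own TLB and answer."""
    return ProbeResponse(core_id, tlb.get((request.asid, request.vaddr)))


hit = handle_probe(8, {(7, 0x800000): 0x2000}, ProbeRequest(7, 0x800000, target_core=8))
miss = handle_probe(3, {}, ProbeRequest(7, 0x800000))
```

A response without a physical address corresponds to the optional case in which the receiving core still answers although it found nothing.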
  • the CPU cores are connected by buses, for example, a ring bus (Ring Bus), a MESH network bus, etc.
  • the transmission direction of the address detection request/address detection response shown in FIG. 2 is only illustrative, representing that the address detection request and address detection response are transmitted between CPU cores, and does not represent the actual transmission path.
  • Step 106 In response to the received address detection response, store the mapping relationship between the physical address and the virtual address carried in the address detection response in the TLB, and perform memory access based on the physical address.
  • After the CPU core receives the address detection response to the address detection request it sent, on the one hand, it can extract the physical address from the address detection response and store the mapping relationship among the physical address, the virtual address, and the ASID in the TLB. On the other hand, it can perform memory access based on the physical address.
  • the CPU core may search for the physical address corresponding to the virtual address based on the process page table of the memory.
  • with the solution in this specification, when the CPU core does not store the physical address corresponding to the virtual address in its TLB, it sends an address detection request to other CPU cores; the other CPU cores check whether the physical address corresponding to the virtual address is stored in their respective TLBs, and can add the queried physical address to the address detection response and return it.
  • the CPU core can then store the mapping relationship between the physical address and the virtual address in its TLB, and perform memory access.
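Steps 102 to 106 can be put together in a short software model. This is a sketch under stated assumptions, not the hardware mechanism: the `Core` class, its `peers` list (standing in for a broadcast over the bus), and the dictionary used as the process page table are hypothetical, and the probe is modelled as a direct method call.

```python
class Core:
    def __init__(self, core_id, page_table):
        self.core_id = core_id
        self.page_table = page_table  # hypothetical: (asid, vaddr) -> paddr
        self.tlb = {}
        self.peers = []               # other cores reachable over the bus
        self.page_walks = 0

    def probe(self, asid, vaddr):
        """Answer an address detection request from another core."""
        return self.tlb.get((asid, vaddr))

    def translate(self, asid, vaddr):
        key = (asid, vaddr)
        paddr = self.tlb.get(key)            # step 102: search the local TLB
        if paddr is None:
            for peer in self.peers:          # step 104: probe the other cores
                paddr = peer.probe(asid, vaddr)
                if paddr is not None:
                    break
            if paddr is None:                # nobody answered: use the page table
                self.page_walks += 1
                paddr = self.page_table[key]
            self.tlb[key] = paddr            # step 106: store the mapping locally
        return paddr


table = {(7, 0x800000): 0x2000}
core_a, core_b = Core(0, table), Core(1, table)
core_a.peers, core_b.peers = [core_b], [core_a]

pa_first = core_a.translate(7, 0x800000)   # page table lookup on core_a
pa_second = core_b.translate(7, 0x800000)  # served from core_a's TLB via probe
```

In this model the second core never touches the page table, which is the saving the scheme aims for.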
  • the ASID used by thread 1 through thread 4 is ASID 7 in all cases.
  • the MMU of CPU core 8 searches TLB 8, but the physical address corresponding to virtual address 0x800000 and ASID 7 is not stored in TLB 8, so the corresponding physical address is queried based on the process page table in memory, and the mapping relationship among the queried physical address, the virtual address, and the ASID is stored in TLB 8. Then, CPU core 8 can perform memory access based on the queried physical address.
  • TLB entries shown in Table 2 above can be stored in TLB 8. It should be noted that Table 2 is only an illustration, and in actual implementation, the TLB entry may also include other fields such as access rights (read or write), page type, and the like.
  • the virtual address it accesses is also 0x800000.
  • the MMU of CPU core 12 searches TLB 12, but TLB 12 does not store the physical address corresponding to virtual address 0x800000 and ASID 7.
  • Ordinarily, CPU core 12 would then query the physical address based on the process page table in memory. To avoid such repeated queries, with the technical solution provided in this specification, CPU core 12 can construct an address detection request and add the virtual address 0x800000 and ASID 7 to it.
  • CPU core 12 can broadcast the address detection request to all CPU cores via the bus.
  • MESH network can be used to realize the bus design, and the delay is smaller.
  • CPU core 12 may send the address detection request to CPU core 8, where threads 1 and 2, which belong to the same process as thread 3, reside.
  • the CPU core 12 may first read the core ID 8 of the CPU core 8 from a designated register, and then add the core ID 8 as a parameter to the address detection request.
  • the address detection request is sent to the Snoop Agent, and the Snoop Agent can send the address detection request to the CPU core 8 according to the core identification 8 carried in the address detection request.
  • the core identifier 8 in the register can be written by the scheduler.
  • the scheduler knows all the threads under the same process and the CPU cores each thread runs on, and the scheduler can write the core identifiers of the CPU cores run by each thread under the process in the designated registers of these CPU cores.
  • the threads under this process run on two CPU cores, namely CPU core 8 and CPU core 12, so the scheduler can write the core identifier 8 into the designated register of CPU core 12,
  • and can write the core identifier 12 into the designated register of CPU core 8.
  • the current CPU core may not be excluded, and the core identifier 8 and the core identifier 12 are respectively written into designated registers of the CPU core 8 and the CPU core 12 .
  • in this example the process runs on two CPU cores; in other examples, it can also run on three or more CPU cores, which is not specifically limited in this specification.
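The scheduler's bookkeeping described above can be sketched as a function that, given where each thread of one process runs, computes what to write into each core's designated register. The function name `peer_core_registers`, the `include_self` flag, and the thread names are hypothetical; the placement follows the example in which threads 1 and 2 run on CPU core 8 and thread 3 runs on CPU core 12.

```python
def peer_core_registers(thread_to_core, include_self=False):
    """Return {core_id: sorted list of core identifiers to write into its register}."""
    cores = set(thread_to_core.values())
    return {
        core: sorted(cores if include_self else cores - {core})
        for core in cores
    }


placement = {"thread1": 8, "thread2": 8, "thread3": 12}

registers = peer_core_registers(placement)
registers_with_self = peer_core_registers(placement, include_self=True)
```

The `include_self=True` variant corresponds to the option of not excluding the current CPU core, so that both identifiers are written into both registers.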
  • After CPU core 8 receives the address detection request sent by CPU core 12, it finds in TLB 8 the physical address 0x2000 corresponding to virtual address 0x800000 and ASID 7, and then adds the physical address 0x2000, the virtual address 0x800000, and ASID 7 to the address detection response and returns it to CPU core 12. CPU core 12 can store the mapping relationship among the physical address 0x2000, the virtual address 0x800000, and ASID 7 in TLB 12, which also forms the TLB entry shown in Table 2. CPU core 12 can also perform memory access based on the physical address 0x2000.
  • the CPU core 8 may also only add the physical address 0x2000 to the address detection response, and this specification does not make any special limitation on this.
  • the forwarding of the address detection request and the address detection response is usually implemented by the Snoop Agent.
  • After the Snoop Agent receives the address detection responses returned by different CPU cores, it can filter out the address detection responses that do not carry a physical address, and can also deduplicate address detection responses from different CPU cores that carry the same search result, for example by returning a single address detection response carrying the aforementioned physical address to the CPU core that sent the address detection request.
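The filtering and deduplication performed by the Snoop Agent can be sketched in software as follows. The function name `collapse_responses` and the representation of a response as a `(core_id, physical address or None)` pair are assumptions for illustration only.

```python
def collapse_responses(responses):
    """Drop responses without a physical address and keep each distinct result once.

    responses: list of (core_id, paddr or None) pairs, in arrival order.
    """
    distinct = []
    for _core_id, paddr in responses:
        if paddr is not None and paddr not in distinct:
            distinct.append(paddr)
    return distinct


# Core 3 found nothing; cores 8 and 9 both report the same mapping.
forwarded = collapse_responses([(3, None), (8, 0x2000), (9, 0x2000)])
nothing = collapse_responses([(1, None), (2, None)])
```

With identical results from several cores, the requesting core receives a single physical address.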
  • the CPU core in the scenario where multiple threads of the same process run on different CPU cores, and the CPU core does not store the physical address corresponding to the virtual address in the TLB, it can send address detection requests to other CPU cores, and other The CPU core checks whether the physical address corresponding to the virtual address is stored in each TLB, and may add the queried physical address to the address detection response and return. The CPU core can then store the mapping relationship between the physical address and the virtual address in its TLB, and perform memory access.
  • In this way, TLB cache sharing between CPU cores can be realized, which greatly reduces the repeated process page table queries performed by CPU cores when multiple threads of the same process run on different CPU cores, reduces the waste of CPU core processing resources, effectively shortens the time consumed by address translation in the case of a TLB Miss, and improves IO performance.
  • a TLB entry may have three states, namely: Valid, Stale and Invalid.
  • Valid indicates that the corresponding TLB entry is valid
  • Stale indicates that the corresponding TLB entry is temporarily invalid and can be reactivated to the Valid state
  • Invalid means that when the memory corresponding to the TLB entry is released, for example, the process is destroyed, the corresponding TLB entry is destroyed, and the destroyed TLB entry cannot be activated again.
  • the TLB entry corresponding to the ASID bound to the swapped out process is set to the invalid Stale state.
  • the operating system sends a TLB invalidation command after a process is swapped out.
  • the TLB invalidation command specifies the ASID bound to the process being swapped out.
  • the TLB entry corresponding to the specified ASID is changed from the Valid state to the invalid Stale state.
  • the MMU queries the TLB entry that hits the Stale state, and then sets the state of the hit TLB entry from invalid Stale to valid Valid.
  • the TLB destroy command specifies the ASID bound to the process to be destroyed. Based on the TLB destroy command, the TLB entries corresponding to the specified ASID (including TLB entries in the Valid state and TLB entries in the invalid Stale state) are completely destroyed, for example by deleting the TLB entries so that they are in the Invalid state.
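The three entry states and the transitions caused by the invalidation and destroy commands can be sketched as a small state machine. The class and method names are hypothetical, and modelling the Invalid state as the absence of an entry is an assumption that follows the "delete the TLB entry" example above.

```python
from enum import Enum


class State(Enum):
    VALID = "Valid"
    STALE = "Stale"
    INVALID = "Invalid"


class Tlb:
    def __init__(self):
        self.entries = {}  # (asid, vaddr) -> [paddr, State]

    def insert(self, asid, vaddr, paddr):
        self.entries[(asid, vaddr)] = [paddr, State.VALID]

    def invalidate_asid(self, asid):
        """TLB invalidation command: Valid -> Stale for every entry of this ASID."""
        for (entry_asid, _vaddr), entry in self.entries.items():
            if entry_asid == asid and entry[1] is State.VALID:
                entry[1] = State.STALE

    def destroy_asid(self, asid):
        """TLB destroy command: remove Valid and Stale entries of this ASID."""
        self.entries = {k: v for k, v in self.entries.items() if k[0] != asid}

    def lookup(self, asid, vaddr):
        """A hit on a Stale entry reactivates it to Valid."""
        entry = self.entries.get((asid, vaddr))
        if entry is None:
            return None
        entry[1] = State.VALID
        return entry[0]

    def state_of(self, asid, vaddr):
        entry = self.entries.get((asid, vaddr))
        return entry[1] if entry else State.INVALID


tlb = Tlb()
tlb.insert(7, 0x800000, 0x2000)
tlb.invalidate_asid(7)
state_after_invalidate = tlb.state_of(7, 0x800000)
paddr_on_stale_hit = tlb.lookup(7, 0x800000)
state_after_hit = tlb.state_of(7, 0x800000)
tlb.destroy_asid(7)
state_after_destroy = tlb.state_of(7, 0x800000)
```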
  • the scheduler can schedule processes based on the load of each CPU core, for example, if a certain process runs on the first CPU core, the process is scheduled to run on the second CPU core, and so on.
  • scheduling a process generally refers to scheduling all threads under the process.
  • the operating system sends a TLB destroy command to the first CPU core, and then completely destroys the TLB entry corresponding to the ASID bound to the process in the TLB of the first CPU core.
  • the second CPU core After the process is dispatched to the second CPU core, when performing memory access, the second CPU core still needs to perform address translation based on the process page table of the memory.
  • the operating system sends a TLB invalidation instruction to replace the TLB destruction instruction, so as to avoid the relevant TLB entries being completely destroyed.
  • the second CPU core can construct an address detection request and request other CPU cores to assist in querying the physical address.
  • the memory access method provided in this specification may include the following steps:
  • Step 402: the first CPU core receives the TLB invalidation instruction and sets the TLB entries in the first TLB that are bound to the migrated-out process to the invalid (Stale) state.
  • the operating system does not send a TLB destroy instruction to the first CPU core, but instead sends a TLB invalidation instruction to the first CPU core, and the TLB invalidation instruction specifies the ASID bound to the migrated-out process.
  • the first CPU core sets the TLB entry corresponding to the ASID in the first TLB (i.e., the TLB of the first CPU core) from the Valid state to the invalid Stale state.
  • Step 404: the second CPU core searches the second TLB for the physical address corresponding to the virtual address in response to the memory access request of the migrated-in process.
  • the above-mentioned process scheduled to run by the second CPU core or a thread under the process initiates a memory access request to the second CPU core when performing memory access.
  • the second CPU core first searches the second TLB (that is, the TLB of the second CPU core) for the physical address corresponding to the ASID and the virtual address. If the corresponding physical address (TLB Hit) is found, memory access can be performed based on the physical address. If the corresponding physical address (TLB Miss) is not found, the following step 406 can be performed.
  • Step 406 when the second CPU core does not find the physical address, send an address detection request to the first CPU core.
  • the second CPU core Based on the query result of the aforementioned step 404, if the second CPU core does not find the physical address, it can construct an address detection request, and add the virtual address to be accessed and the process-bound ASID to the address detection request .
  • the second CPU core sends the address detection request.
  • the address detection request may be broadcast and sent, or the address detection request may be sent to the CPU core where the process was scheduled before, that is, the first CPU core.
  • construction and sending of the address detection request may refer to the specific implementation process of the foregoing embodiments, and this embodiment will not repeat them here.
  • the first core identifier in the designated register can be written by the scheduler after the process is scheduled.
  • Step 408 In response to the address detection request, the first CPU core searches the first TLB for a physical address corresponding to the virtual address.
  • in response to the address detection request, the first CPU core searches the first TLB among the TLB entries in both the valid state and the invalid state in order to find the physical address.
  • if the query hits a TLB entry in the valid state, it can indicate that different threads of the same process run on different CPU cores, that is, the thread that initiates the memory access request on the second CPU core and a thread running on the first CPU core belong to the same process.
  • if the query hits a TLB entry in the invalid state, it may indicate a process migration scenario, that is, the process originally ran on the first CPU core and was migrated to the second CPU core by the scheduler.
  • Step 410 the first CPU core adds the found physical address to the address detection response and returns it to the second CPU core.
  • Step 412 after receiving the address detection response, the second CPU core stores the mapping relationship between the physical address and the virtual address in the second TLB, and performs memory access based on the physical address.
  • steps 410-412 may refer to the description in the foregoing embodiments.
  • if no address detection response is received, the second CPU core may look up the physical address based on the process page table in memory.
  • if the CPU core does not store the physical address corresponding to the virtual address in its TLB, it can send an address detection request to other CPU cores, and the other CPU cores check whether the physical address corresponding to the virtual address is stored in their respective TLBs and can add the queried physical address to the address detection response and return it. The CPU core can then store the mapping relationship between the physical address and the virtual address in its TLB, and perform memory access.
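Steps 402 to 412 of the migration scenario can be sketched end to end. This is an illustrative model only: the `Core` class, the string constants for the entry states, and passing the previous core directly to `access` (instead of reading its identifier from a designated register) are assumptions made for the example.

```python
VALID, STALE = "Valid", "Stale"


class Core:
    def __init__(self, core_id):
        self.core_id = core_id
        self.tlb = {}  # (asid, vaddr) -> (paddr, state)
        self.page_walks = 0

    def invalidate(self, asid):
        """Step 402: mark the migrated-out process's entries Stale instead of deleting them."""
        for key, (paddr, _state) in list(self.tlb.items()):
            if key[0] == asid:
                self.tlb[key] = (paddr, STALE)

    def probe(self, asid, vaddr):
        """Step 408: search entries in both the Valid and the Stale state."""
        entry = self.tlb.get((asid, vaddr))
        return entry[0] if entry else None

    def access(self, asid, vaddr, previous_core, page_table):
        entry = self.tlb.get((asid, vaddr))        # step 404: local lookup
        if entry and entry[1] == VALID:
            return entry[0]
        paddr = previous_core.probe(asid, vaddr)   # steps 406-410: ask the old core
        if paddr is None:                          # no usable response: page table
            self.page_walks += 1
            paddr = page_table[(asid, vaddr)]
        self.tlb[(asid, vaddr)] = (paddr, VALID)   # step 412: store and use
        return paddr


first, second = Core(1), Core(2)
first.tlb[(7, 0x800000)] = (0x2000, VALID)  # process with ASID 7 ran on the first core
first.invalidate(7)                         # process migrates away from the first core

paddr = second.access(7, 0x800000, first, page_table={})
```

Because the first core keeps the entry in the Stale state, the second core obtains the translation without consulting the page table; a destroy instruction would have removed the entry and forced the page table lookup.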
  • this specification also provides embodiments of the memory access device.
  • the embodiment of the memory access device in this specification can be applied in the CPU core of the computer system, and the CPU core includes a TLB, and the TLB is used for caching the mapping relationship between the virtual address and the physical address.
  • the memory access device 500 includes: an address search unit 501 , an address detection unit 502 , a memory access unit 503 and a status flag unit 504 .
  • the address search unit 501 in response to the memory access request, searches the TLB for a physical address corresponding to the virtual address carried in the memory access request;
  • the address detection unit 502 sends an address detection request if the physical address corresponding to the virtual address is not found, and the address detection request carries the virtual address for the CPU core that receives the address detection request Searching the physical address corresponding to the virtual address in its TLB, and returning an address detection response when the corresponding physical address is found, the address detection response carrying the found physical address;
  • the memory access unit 503 in response to the received address detection response, stores the mapping relationship between the physical address carried in the address detection response and the virtual address in the TLB, and based on the physical address for memory access.
  • the address detection unit 502 reads the target core identifier of the target CPU core from the register; and sends an address detection request to the target CPU core based on the target core identifier.
  • the address detection unit 502 broadcasts the address detection request.
  • the target CPU core is a CPU core on which other threads of the process that the requesting thread belongs to are running.
  • the target CPU core is the CPU core on which the thread initiating the memory access request was located before it was scheduled to the current CPU core.
  • the target core identifier is written by a scheduler.
  • the status flag unit 504 receives a TLB invalidation instruction, and the TLB invalidation instruction is sent after the thread initiating the memory access request is scheduled to another CPU core;
  • the method also includes:
  • the address search unit 501, after receiving an address detection request sent by another CPU core, searches the TLB for the physical address corresponding to the virtual address among the TLB entries in both the valid state and the invalid state.
  • the address search unit 501 looks up the physical address corresponding to the virtual address based on the process page table in memory if the address detection response is not received.
  • because the apparatus embodiments basically correspond to the method embodiments, refer to the description of the method embodiments for the relevant parts.
  • the apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification, which those of ordinary skill in the art can understand and implement without creative effort.
  • a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
  • this specification also provides a CPU core that includes a translation lookaside buffer (TLB) for caching mappings between virtual addresses and physical addresses; the CPU core is configured to:
  • send an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
  • sending the address detection request includes broadcasting the address detection request;
  • sending the address detection request includes reading the target core identifier of a target CPU core from a register and sending the address detection request to the target CPU core based on the target core identifier;
  • the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
  • the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
  • the target core identifier is written by a scheduler.
  • the TLB invalidation instruction is sent after the thread initiating the memory access request has been scheduled to another CPU core;
  • the CPU core is also configured to:
  • search the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.
  • this specification also provides a computer-readable storage medium storing a computer program that, when executed by a CPU core, implements the following steps:
  • sending an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
  • sending the address detection request includes broadcasting the address detection request;
  • sending the address detection request includes reading the target core identifier of a target CPU core from a register and sending the address detection request to the target CPU core based on the target core identifier;
  • the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
  • the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
  • the target core identifier is written by a scheduler.
  • the TLB invalidation instruction is sent after the thread initiating the memory access request has been scheduled to another CPU core;
  • the method also includes:
  • searching the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

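This specification discloses a memory access method and apparatus applied to a CPU core. The method includes: in response to a memory access request, searching the TLB for the physical address corresponding to the virtual address carried in the memory access request; if no physical address corresponding to the virtual address is found, sending an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address; and, in response to a received address detection response, storing the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performing the memory access based on the physical address.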

Description

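Memory access method and apparatus
This application claims priority to Chinese patent application No. 202210200142.7, entitled "Memory access method and apparatus" and filed with the Chinese Patent Office on March 2, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This specification relates to the field of computer technology, and in particular to a memory access method and apparatus.
Background
The CPU (Central Processing Unit) is the computing and control core of a computer system and the final execution unit for information processing and program execution. A CPU may include one or more CPU cores. A CPU core usually includes a TLB (Translation Lookaside Buffer), a high-speed hardware cache that caches mappings between virtual addresses and physical addresses and thereby speeds up address translation.
On a multi-core CPU, different threads of the same process may run on different CPU cores, and a process or thread may also be scheduled to run on another CPU core. How to speed up address translation in such cross-core scenarios has become a pressing technical problem.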
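Summary
In view of this, this specification provides a memory access method and apparatus.
Specifically, this specification is implemented through the following technical solutions:
A memory access method for memory access in a computer system, where the computer system includes a central processing unit (CPU), the CPU includes multiple CPU cores, each CPU core includes a translation lookaside buffer (TLB), and the TLB caches mappings between virtual addresses and physical addresses. The method is applied to a CPU core and includes:
in response to a memory access request, searching the TLB for the physical address corresponding to the virtual address carried in the memory access request;
if no physical address corresponding to the virtual address is found, sending an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
in response to a received address detection response, storing the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performing the memory access based on the physical address.
Optionally, sending the address detection request includes:
broadcasting the address detection request.
Optionally, sending the address detection request includes:
reading the target core identifier of a target CPU core from a register;
sending the address detection request to the target CPU core based on the target core identifier.
Optionally, the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
Optionally, the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
Optionally, the target core identifier is written by a scheduler.
Optionally, the method further includes:
receiving a TLB invalidation instruction, which is sent after the thread initiating the memory access request has been scheduled to another CPU core;
in response to the TLB invalidation instruction, marking the mappings between virtual addresses and physical addresses that the instruction specifies in the TLB as invalid;
the method further includes:
after receiving an address detection request sent by another CPU core, searching the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.
Optionally, the method further includes:
if no address detection response is received, looking up the physical address corresponding to the virtual address in the in-memory process page table.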
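A memory access apparatus for memory access in a computer system, where the computer system includes a central processing unit (CPU), the CPU includes multiple CPU cores, each CPU core includes a translation lookaside buffer (TLB), and the TLB caches mappings between virtual addresses and physical addresses. The apparatus is applied to a CPU core and includes:
an address lookup unit that, in response to a memory access request, searches the TLB for the physical address corresponding to the virtual address carried in the memory access request;
an address detection unit that, if no physical address corresponding to the virtual address is found, sends an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
a memory access unit that, in response to a received address detection response, stores the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performs the memory access based on the physical address.
Optionally, the address detection unit reads the target core identifier of a target CPU core from a register and sends the address detection request to the target CPU core based on the target core identifier.
A central processing unit (CPU) that includes multiple CPU cores, where each CPU core includes a translation lookaside buffer (TLB) that caches mappings between virtual addresses and physical addresses, and the CPU core is configured to:
in response to a memory access request, search the TLB for the physical address corresponding to the virtual address carried in the memory access request;
if no physical address corresponding to the virtual address is found, send an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
in response to a received address detection response, store the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and perform the memory access based on the physical address.
With the above implementations, when a CPU core's TLB holds no physical address for a virtual address, the core sends an address detection request to other CPU cores. Each of those cores checks whether its own TLB holds the physical address corresponding to the virtual address and can return the physical address it finds in an address detection response. The requesting core can then store the mapping between the physical address and the virtual address in its TLB and perform the memory access.
The above implementations allow CPU cores to share TLB contents. In cross-core scenarios this greatly reduces repeated process page table lookups by CPU cores, reduces wasted core processing resources, shortens address translation time on a TLB miss, and improves I/O performance. Moreover, the technical solution provided in this specification can be implemented with existing hardware and a cache snoop protocol; it requires no new hardware, is low in cost, and is highly feasible.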
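Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a memory access method according to an exemplary embodiment of this specification.
FIG. 2 is a block diagram of a CPU according to an exemplary embodiment of this specification.
FIG. 3 is a schematic flowchart of another memory access method according to an exemplary embodiment of this specification.
FIG. 4 is a schematic flowchart of another memory access method according to an exemplary embodiment of this specification.
FIG. 5 is a block diagram of a memory access apparatus according to an exemplary embodiment of this specification.
Detailed Description
Exemplary embodiments are described in detail here, with examples shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of apparatuses and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, and so on may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly second information may be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".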
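A CPU core usually includes an MMU (Memory Management Unit). The MMU translates virtual addresses into the physical addresses needed for memory access and can store the mappings between virtual and physical addresses in the CPU core's TLB.
When an application running on a CPU core accesses memory, the MMU first looks up the physical address for the virtual address in the TLB and then performs the memory access based on the physical address found. If the TLB does not hold a physical address for the virtual address, the MMU has to look it up in the in-memory process page table.
To isolate the virtual addresses used by different applications, the ASID (Address Space ID) is introduced. After an application is initialized, the operating system generates an ASID for it and binds the ASID to the application's process identifier (PID). Different applications have different ASIDs, so different applications can use the same virtual addresses. The TLB stores mappings among the ASID, the virtual address, and the physical address.
A process is the carrier in which an application runs; a running application usually corresponds to one process. A thread is the smallest unit of execution within a process; it is contained in the process and is the unit that actually runs. A process can usually contain multiple threads, which share the application's memory space. When a process or thread accesses memory, it uses the ASID of its process, and the MMU looks up the physical address in the TLB based on the ASID and the virtual address.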
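As a minimal sketch of the ASID-tagged lookup described above, the following Python model keys each TLB entry on the pair (ASID, virtual address). The class name `Tlb`, the second ASID 9, and the physical address 0x5000 are illustrative assumptions; a real TLB is a hardware structure indexed by page number, not a dictionary.

```python
from typing import Dict, Optional, Tuple


class Tlb:
    """Toy TLB: maps (ASID, virtual address) to a physical address."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], int] = {}

    def insert(self, asid: int, vaddr: int, paddr: int) -> None:
        self._entries[(asid, vaddr)] = paddr

    def lookup(self, asid: int, vaddr: int) -> Optional[int]:
        # Returns None on a TLB miss.
        return self._entries.get((asid, vaddr))


tlb = Tlb()
# Two applications use the same virtual address; the ASID keeps them apart.
tlb.insert(asid=7, vaddr=0x800000, paddr=0x2000)
tlb.insert(asid=9, vaddr=0x800000, paddr=0x5000)  # hypothetical second application

hit_app_a = tlb.lookup(7, 0x800000)
hit_app_b = tlb.lookup(9, 0x800000)
miss = tlb.lookup(7, 0x900000)
```

Because the ASID is part of the key, the two applications can reuse virtual address 0x800000 without seeing each other's translations.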
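On multi-core CPUs, there are many scenarios in which execution spans CPU cores.
For example, a process of an application is moved by the scheduler from CPU core 1 to CPU core 2.
As another example, multiple threads of a process run on different CPU cores.
In these cross-core scenarios, the processes or threads belong to the same application and can share the same memory space. However, the TLBs of different CPU cores currently cannot be shared. In the cross-core scenarios above, the MMU of each CPU core has to look up the physical address for a virtual address in the in-memory process page table and then store the mapping in its TLB. This leads to repeated process page table lookups, wastes CPU core processing resources, and also hurts I/O performance.
This specification provides a memory access solution for computer systems that can improve I/O performance in cross-core scenarios and save CPU core processing resources.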
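FIG. 1 is a schematic flowchart of a memory access method according to an exemplary embodiment of this specification.
Referring to FIG. 1, the memory access method can be used for memory access in a computer system that includes a CPU, where the CPU includes multiple CPU cores and each CPU core includes a TLB. The method is applied to a CPU core, for example to the MMU of the CPU core, and includes the following steps:
Step 102: in response to a memory access request, search the TLB for the physical address corresponding to the virtual address carried in the memory access request.
In this specification, a process or thread running on a CPU core can initiate a memory access request (because the thread is the unit that actually runs within a process, the request is described below as being initiated by a thread). The memory access request usually carries the virtual address to be accessed. On CPUs that support ASIDs, the memory access request also carries the application's ASID; the rest of this specification assumes ASID support.
In response to the memory access request, the MMU of the CPU core first queries the core's TLB to check whether it caches a physical address for the virtual address, for example by looking up the physical address corresponding to the ASID and virtual address specified in the memory access request.
If the physical address is found in the TLB (TLB hit), the memory access is performed based on it.
If the physical address is not found in the TLB (TLB miss), step 104 below is performed.
Note that in the related art, if the physical address is not found in the TLB, it is looked up in the in-memory process page table. With the memory access solution provided in this specification, if the physical address is not found in the TLB, step 104 below can be performed directly, without first looking up the physical address in the process page table; alternatively, the process page table lookup can run in parallel with step 104. This specification places no particular restriction on this.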
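Step 104: if no physical address corresponding to the virtual address is found, send an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address.
Based on the lookup result of step 102, if the TLB does not cache a physical address for the ASID and virtual address, the CPU core constructs an address detection request and adds the ASID and the virtual address to it.
The address detection request can be constructed on the basis of a cache snoop protocol, a hardware approach to maintaining cache coherence in multi-core processors. In other examples, the address detection request can also be constructed on the basis of other protocols; this specification places no particular restriction on this.
In one example, the CPU core broadcasts the constructed address detection request to all CPU cores.
In another example, the CPU core sends the constructed address detection request to a designated CPU core. For example, the CPU core first reads the target core identifier of the target CPU core from a designated register and then sends the address detection request to the target CPU core based on that identifier.
The target CPU core may be a CPU core on which other threads of the process that owns the thread initiating the memory access request are running; it may also be the CPU core on which that thread ran before it was scheduled to the current CPU core. The target core identifier of the target CPU core can be written to the designated register by the scheduler.
When the address detection request is sent to a designated CPU core, the target core identifier read from the register can be added to the address detection request as a parameter.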
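The two sending modes can be sketched as follows. This is only an illustration under stated assumptions: the `ProbeRequest` fields, the function names, and the choice to exclude the sender from a broadcast are hypothetical and are not specified by the text, which only says that the request carries the ASID, the virtual address, and optionally the target core identifier read from a register.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProbeRequest:
    asid: int
    vaddr: int
    source_core: int
    target_core: Optional[int] = None  # None models a broadcast request


def build_request(asid: int, vaddr: int, source_core: int,
                  target_register: Optional[int]) -> ProbeRequest:
    # target_register stands for the value read from the designated register;
    # when it holds a core identifier, that identifier is added as a parameter.
    return ProbeRequest(asid, vaddr, source_core, target_register)


def recipients(request: ProbeRequest, all_cores: List[int]) -> List[int]:
    if request.target_core is not None:
        return [request.target_core]
    # Assumption: a broadcast reaches every core except the sender.
    return [core for core in all_cores if core != request.source_core]


targeted = build_request(7, 0x800000, source_core=12, target_register=8)
broadcast = build_request(7, 0x800000, source_core=12, target_register=None)
targeted_to = recipients(targeted, [8, 10, 12])
broadcast_to = recipients(broadcast, [8, 10, 12])
```

A targeted request limits traffic to the core most likely to hold the translation, while a broadcast needs no prior knowledge of where related threads run.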
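In this specification, after a CPU core receives an address detection request sent by another CPU core, it checks whether its TLB caches a physical address corresponding to the ASID and virtual address carried in the request.
If its TLB caches such a physical address, the core adds the physical address to an address detection response and returns the response to the CPU core that sent the request.
If its TLB does not cache a physical address for the ASID and virtual address, the core can still return an address detection response to the requesting CPU core; that response carries no physical address.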
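The receiving side can be sketched in the same spirit. The `ProbeResponse` type and `handle_probe` function are hypothetical names introduced for illustration; a response with `paddr=None` models a response that carries no physical address.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class ProbeResponse:
    responder_core: int
    paddr: Optional[int]  # None: the responder's TLB had no matching entry


def handle_probe(core_id: int, tlb: Dict[Tuple[int, int], int],
                 asid: int, vaddr: int) -> ProbeResponse:
    # Look up the (ASID, virtual address) pair carried in the request.
    return ProbeResponse(core_id, tlb.get((asid, vaddr)))


tlb_8 = {(7, 0x800000): 0x2000}
hit = handle_probe(8, tlb_8, asid=7, vaddr=0x800000)
miss = handle_probe(8, tlb_8, asid=7, vaddr=0x900000)
```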
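Referring to the CPU block diagram shown in FIG. 2, the CPU cores are connected by a bus, for example a ring bus or a mesh network bus. Address detection requests and address detection responses are transmitted between CPU cores over the bus.
Note that the transmission directions of the address detection request and response shown in FIG. 2 are illustrative only; they indicate that requests and responses travel between CPU cores and do not represent the actual transmission path.
Step 106: in response to a received address detection response, store the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and perform the memory access based on the physical address.
In this specification, after the CPU core receives an address detection response to the address detection request it sent, it extracts the physical address from the response and stores the mapping among the physical address, the virtual address, and the ASID in its TLB. It also performs the memory access based on the physical address.
If the CPU core receives no address detection response to the address detection request it sent, it looks up the physical address corresponding to the virtual address in the in-memory process page table.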
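Steps 102 to 106, together with the page table fallback, can be summarized in a short sketch. The function `translate`, its callback parameters, and the second mapping (0x900000 to 0x3000) are assumptions made for illustration; the sketch models the serial variant in which the page table is consulted only after the probe yields nothing.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Key = Tuple[int, int]  # (ASID, virtual address)


def translate(tlb: Dict[Key, int], asid: int, vaddr: int,
              probe: Callable[[int, int], Iterable[Optional[int]]],
              page_walk: Callable[[int, int], int]) -> Tuple[int, str]:
    key = (asid, vaddr)
    if key in tlb:                       # step 102: local TLB hit
        return tlb[key], "tlb-hit"
    for paddr in probe(asid, vaddr):     # step 104: ask other cores
        if paddr is not None:
            tlb[key] = paddr             # step 106: cache the mapping locally
            return paddr, "probe"
    paddr = page_walk(asid, vaddr)       # fallback: in-memory process page table
    tlb[key] = paddr
    return paddr, "page-walk"


peer_tlb = {(7, 0x800000): 0x2000}
page_table = {(7, 0x800000): 0x2000, (7, 0x900000): 0x3000}
local_tlb: Dict[Key, int] = {}


def probe_peers(asid: int, vaddr: int) -> Iterable[Optional[int]]:
    return [peer_tlb.get((asid, vaddr))]


def walk(asid: int, vaddr: int) -> int:
    return page_table[(asid, vaddr)]


first = translate(local_tlb, 7, 0x800000, probe_peers, walk)
second = translate(local_tlb, 7, 0x800000, probe_peers, walk)
third = translate(local_tlb, 7, 0x900000, probe_peers, walk)
```

The first access is served by a peer core, the second by the local TLB, and the third falls back to the page table because no peer holds the mapping.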
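As the above description shows, when a CPU core's TLB holds no physical address for a virtual address, the core sends an address detection request to other CPU cores. Each of those cores checks whether its own TLB holds the physical address corresponding to the virtual address and can return the physical address it finds in an address detection response. The requesting core can then store the mapping between the physical address and the virtual address in its TLB and perform the memory access.
The technical solution provided in this specification allows CPU cores to share TLB contents. In cross-core scenarios this greatly reduces repeated process page table lookups by CPU cores, reduces wasted core processing resources, shortens address translation time on a TLB miss, and improves I/O performance.
Moreover, the technical solution provided in this specification can be implemented with existing hardware and a cache snoop protocol; it requires no new hardware, is low in cost, and is highly feasible.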
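The implementation is described in detail below for each of the two cross-core scenarios mentioned above.
I. Multiple threads of the same process running on different CPU cores
In this specification, multiple threads of the same process share memory space and use the same ASID, namely the ASID bound to the process they belong to.
Table 1
Referring to the example in Table 1, suppose a process includes four threads, thread 1 to thread 4. Threads 1 and 2 run on CPU core 8, threads 3 and 4 run on CPU core 12, and all four threads use ASID 7.
Suppose thread 1 accesses memory first, at virtual address 0x800000. The MMU of CPU core 8 searches TLB 8, which holds no physical address for virtual address 0x800000 and ASID 7. The MMU therefore looks up the physical address in the in-memory process page table and stores the mapping among the physical address found, the virtual address, and the ASID in TLB 8. CPU core 8 then performs the memory access based on the physical address found.
Table 2
Suppose further that the physical address found is 0x2000; TLB 8 then stores the TLB entry shown in Table 2. Note that Table 2 is illustrative only; in an actual implementation, a TLB entry can also include other fields such as access permissions (read or write) and page type.
If thread 3 also needs to access memory at virtual address 0x800000, the MMU of CPU core 12 searches TLB 12, which holds no physical address for virtual address 0x800000 and ASID 7. In the related art, CPU core 12 would look up the physical address in the in-memory process page table. To avoid this repeated lookup, with the technical solution provided in this specification, CPU core 12 constructs an address detection request and adds virtual address 0x800000 and ASID 7 to it.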
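In one example, CPU core 12 broadcasts the address detection request to all CPU cores over the bus. In a multi-core CPU architecture, the bus can be implemented as a mesh network, which has lower latency.
In another example, referring to FIG. 3, CPU core 12 sends the address detection request to CPU core 8, where threads 1 and 2, which belong to the same process as thread 3, are running.
In this example, CPU core 12 first reads core identifier 8 of CPU core 8 from the designated register and adds it to the address detection request as a parameter. Taking the snoop protocol as an example, the address detection request is sent to the Snoop Agent, which forwards it to CPU core 8 according to core identifier 8 carried in the request.
Core identifier 8 in the register can be written by the scheduler. The scheduler knows all the threads of a process and the CPU core each thread runs on, and it can write the core identifiers of the CPU cores running the process's threads into the designated registers of those cores.
Continuing with the situation in Table 1, the threads of the process run on two CPU cores, CPU core 8 and CPU core 12. The scheduler can write core identifier 8 into the designated register of CPU core 12 and core identifier 12 into the designated register of CPU core 8. Alternatively, without excluding the current core, it can write both core identifier 8 and core identifier 12 into the designated registers of CPU core 8 and CPU core 12. Note that in the Table 1 example the process runs on two CPU cores; in other examples it can run on three or more CPU cores, and this specification places no particular restriction on this.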
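How a scheduler might derive the register contents from its thread placement can be sketched as follows. The function `peer_core_registers`, the thread names, and the use of a set per register are illustrative assumptions; the text does not specify the register format.

```python
from typing import Dict, Set


def peer_core_registers(thread_to_core: Dict[str, int],
                        include_self: bool = False) -> Dict[int, Set[int]]:
    """For each core running the process, compute the core identifiers to write."""
    cores = set(thread_to_core.values())
    return {
        core: (set(cores) if include_self else cores - {core})
        for core in cores
    }


# Placement from the Table 1 example: threads 1-2 on core 8, threads 3-4 on core 12.
placement = {"thread1": 8, "thread2": 8, "thread3": 12, "thread4": 12}
registers = peer_core_registers(placement)
registers_with_self = peer_core_registers(placement, include_self=True)
```

With `include_self=False` each core's register holds only the other core, matching the first option in the text; `include_self=True` models the variant that does not exclude the current core.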
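In this specification, still referring to FIG. 3, after CPU core 8 receives the address detection request sent by CPU core 12, it finds physical address 0x2000 for virtual address 0x800000 and ASID 7 in TLB 8. It can then add physical address 0x2000, virtual address 0x800000, and ASID 7 to an address detection response and return it to CPU core 12. CPU core 12 stores the mapping among physical address 0x2000, virtual address 0x800000, and ASID 7 in TLB 12, which yields the same TLB entry as in Table 2. CPU core 12 also performs the memory access based on physical address 0x2000.
In this specification, CPU core 8 can also add only physical address 0x2000 to the address detection response; this specification places no particular restriction on this.
Note that when physical address detection is implemented on the basis of the snoop protocol, the Snoop Agent usually forwards address detection requests and responses. For example, after receiving the address detection responses returned by different CPU cores, the Snoop Agent filters out the responses that carry no physical address. It can also deduplicate responses from different CPU cores that carry the same lookup result, for example by returning a single address detection response carrying the physical address to the CPU core that sent the request.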
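The filtering and deduplication attributed to the Snoop Agent can be sketched as below. The function `merge_responses`, the tuple layout, and the extra responder cores 10 and 14 are hypothetical; the sketch only shows the logic of dropping empty responses and collapsing identical results.

```python
from typing import Iterable, List, Optional, Tuple

Response = Tuple[int, Optional[int]]  # (responder core id, physical address or None)


def merge_responses(responses: Iterable[Response]) -> List[int]:
    """Drop responses without a physical address and collapse duplicates."""
    unique: List[int] = []
    for _core, paddr in responses:
        if paddr is None:
            continue  # responder had no matching TLB entry
        if paddr not in unique:
            unique.append(paddr)
    return unique


forwarded = merge_responses([(8, 0x2000), (10, None), (14, 0x2000)])
nothing_found = merge_responses([(10, None)])
```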
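As the above description shows, in the scenario where multiple threads of the same process run on different CPU cores, a CPU core whose TLB holds no physical address for a virtual address can send an address detection request to other CPU cores. Each of those cores checks whether its own TLB holds the physical address corresponding to the virtual address and can return the physical address it finds in an address detection response. The requesting core can then store the mapping between the physical address and the virtual address in its TLB and perform the memory access.
The technical solution provided in this specification allows CPU cores to share TLB contents. When multiple threads of the same process run on different CPU cores, this greatly reduces repeated process page table lookups by CPU cores, reduces wasted core processing resources, shortens address translation time on a TLB miss, and improves I/O performance.
Moreover, the technical solution provided in this specification can be implemented with existing hardware and a cache snoop protocol; it requires no new hardware, is low in cost, and is highly feasible.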
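II. Process migration
In the related art, a TLB entry can have one of three states: Valid, Stale, and Invalid.
Valid means the TLB entry is valid;
Stale means the TLB entry is temporarily invalid and can be reactivated to the Valid state;
Invalid means that the memory corresponding to the TLB entry has been released, for example because the process was destroyed, so the TLB entry is destroyed; a destroyed TLB entry cannot be reactivated.
After a TLB entry is created, its state is Valid. During a process switch, if a process is switched out, the TLB entries for the ASID bound to that process are set to the Stale (invalid) state. For example, the operating system sends a TLB invalidation instruction after the process is switched out; the instruction specifies the ASID bound to the switched-out process, and on that basis the TLB entries for the specified ASID are changed from Valid to Stale. When the process is switched back in and accesses memory, an MMU lookup that hits a Stale TLB entry can change the entry's state from Stale back to Valid.
After a process is destroyed, the operating system can send a TLB destroy instruction (TLB shootdown) that specifies the ASID bound to the destroyed process. On that basis, the TLB entries for the specified ASID (both Valid and Stale entries) are destroyed completely, for example by deleting them, which puts them in the Invalid state.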
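The three states and the transitions between them can be sketched as a small state machine. The class `StatefulTlb` and its method names are assumptions for illustration, and the Invalid state is modeled simply as the absence of an entry.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class State(Enum):
    VALID = "Valid"
    STALE = "Stale"


class StatefulTlb:
    def __init__(self) -> None:
        # (asid, vaddr) -> [paddr, state]; a missing key models the Invalid state.
        self.entries: Dict[Tuple[int, int], List] = {}

    def insert(self, asid: int, vaddr: int, paddr: int) -> None:
        self.entries[(asid, vaddr)] = [paddr, State.VALID]

    def invalidate(self, asid: int) -> None:
        # TLB invalidation instruction: Valid -> Stale for the given ASID.
        for (entry_asid, _vaddr), entry in self.entries.items():
            if entry_asid == asid:
                entry[1] = State.STALE

    def shootdown(self, asid: int) -> None:
        # TLB destroy instruction: remove Valid and Stale entries alike.
        self.entries = {k: e for k, e in self.entries.items() if k[0] != asid}

    def lookup(self, asid: int, vaddr: int) -> Optional[int]:
        entry = self.entries.get((asid, vaddr))
        if entry is None:
            return None
        entry[1] = State.VALID  # a hit on a Stale entry reactivates it
        return entry[0]

    def state(self, asid: int, vaddr: int) -> Optional[State]:
        entry = self.entries.get((asid, vaddr))
        return None if entry is None else entry[1]


tlb = StatefulTlb()
tlb.insert(7, 0x800000, 0x2000)
tlb.invalidate(7)
state_after_invalidate = tlb.state(7, 0x800000)
paddr_after_switch_back = tlb.lookup(7, 0x800000)
state_after_hit = tlb.state(7, 0x800000)
tlb.shootdown(7)
state_after_shootdown = tlb.state(7, 0x800000)
```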
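In this specification, the scheduler can schedule processes based on the load of each CPU core, for example by moving a process that runs on a first CPU core to a second CPU core. Scheduling a process usually means scheduling all the threads of that process.
In the related art, when a process is rescheduled, the operating system sends a TLB destroy instruction to the first CPU core, which completely destroys the TLB entries for the process's ASID in the first CPU core's TLB. After the process has been moved to the second CPU core, the second CPU core still has to perform address translation via the in-memory process page table when the process accesses memory.
To avoid this repeated lookup, the technical solution provided in this specification does two things when a process is rescheduled. First, the operating system sends a TLB invalidation instruction instead of a TLB destroy instruction, so that the relevant TLB entries are not destroyed completely. Second, on a TLB miss, the second CPU core constructs an address detection request to ask other CPU cores to help look up the physical address.
Referring to FIG. 4, in the process scheduling scenario, the memory access method provided in this specification can include the following steps:
Step 402: the first CPU core receives a TLB invalidation instruction and sets the TLB entries bound to the moved-out process in the first TLB to the invalid state.
In this embodiment, after the process is scheduled off the first CPU core, unlike in the related art, the operating system does not send a TLB destroy instruction to the first CPU core; it sends a TLB invalidation instruction that specifies the ASID bound to the moved-out process.
In response to the TLB invalidation instruction, the first CPU core changes the TLB entries for that ASID in the first TLB (that is, the TLB of the first CPU core) from Valid to Stale.
In other words, with the technical solution provided in this specification, when a process is rescheduled, the TLB entries on the CPU core where it used to run are not destroyed completely; they are set to the temporarily invalid Stale state.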
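Step 404: in response to a memory access request from the moved-in process, the second CPU core searches the second TLB for the physical address corresponding to the virtual address.
In this embodiment, when the process now running on the second CPU core, or a thread of that process, accesses memory, it issues a memory access request to the second CPU core. The second CPU core first searches the second TLB (that is, the TLB of the second CPU core) for the physical address corresponding to the ASID and virtual address. If the physical address is found (TLB hit), the memory access is performed based on it. If it is not found (TLB miss), step 406 below is performed.
Step 406: if the physical address is not found, the second CPU core sends an address detection request to the first CPU core.
Based on the lookup result of step 404, if the second CPU core does not find the physical address, it constructs an address detection request and adds the virtual address to be accessed and the ASID bound to the process to it.
The second CPU core sends the address detection request. For example, it can broadcast the request, or it can send the request to the CPU core where the process ran before it was rescheduled, namely the first CPU core.
In this embodiment, for the construction and sending of the address detection request, refer to the implementation described in the preceding embodiment; the details are not repeated here.
Note that when the second CPU core sends the address detection request to the first CPU core based on the first core identifier of the first CPU core held in a register, that identifier can be written to the designated register by the scheduler after the process is rescheduled.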
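Step 408: in response to the address detection request, the first CPU core searches the first TLB for the physical address corresponding to the virtual address.
In this embodiment, in response to the address detection request, the first CPU core searches the TLB entries in both the valid state and the invalid state in the first TLB to look up the physical address.
If the lookup hits a TLB entry in the valid state, the scenario is most likely one in which different threads of the same process run on different CPU cores; that is, the thread that initiated the memory access request on the second CPU core belongs to the same process as some threads running on the first CPU core.
If the lookup hits a TLB entry in the invalid state, the scenario is most likely process migration; that is, the process originally ran on the first CPU core and was later migrated to the second CPU core by the scheduler.
In other words, a CPU core that receives an address detection request needs to search both valid and invalid TLB entries. Once the lookup hits, the core returns the physical address found and does not need to know which scenario applies.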
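The state-agnostic lookup on the receiving core can be sketched as follows. The `answer_probe` function and the string labels are hypothetical; the returned scenario label exists only to illustrate the inference described in the text, since the core itself returns the physical address without needing to know the scenario.

```python
from typing import Dict, Optional, Tuple

Entry = Tuple[int, str]  # (physical address, "valid" or "stale")


def answer_probe(entries: Dict[Tuple[int, int], Entry],
                 asid: int, vaddr: int) -> Optional[Tuple[int, str]]:
    """Search valid and stale entries alike; return the address and likely scenario."""
    entry = entries.get((asid, vaddr))
    if entry is None:
        return None
    paddr, state = entry
    likely = ("threads of one process on different cores"
              if state == "valid" else "process migration")
    return paddr, likely


first_tlb = {
    (7, 0x800000): (0x2000, "stale"),   # left behind by a migrated process
    (9, 0x400000): (0x6000, "valid"),   # hypothetical entry of a running process
}
migrated = answer_probe(first_tlb, 7, 0x800000)
shared = answer_probe(first_tlb, 9, 0x400000)
absent = answer_probe(first_tlb, 7, 0x900000)
```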
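Step 410: the first CPU core adds the physical address found to an address detection response and returns it to the second CPU core.
Step 412: after receiving the address detection response, the second CPU core stores the mapping between the physical address and the virtual address in the second TLB and performs the memory access based on the physical address.
In this embodiment, for the implementation of steps 410 to 412, refer to the description in the preceding embodiment.
In this embodiment, if the second CPU core receives no address detection response carrying a physical address, for example none within a preset period, it looks up the physical address in the in-memory process page table.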
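The preset-period fallback can be illustrated with a software analogy. The sketch below uses Python's standard `queue.Queue` as a stand-in for the response channel, which is purely an assumption for illustration (real hardware would not use a software queue); the function names and the 50 ms timeout are likewise hypothetical.

```python
import queue
import time
from typing import Callable, Optional, Tuple


def wait_for_paddr(responses: "queue.Queue[Optional[int]]",
                   timeout_s: float) -> Optional[int]:
    """Return the first physical address that arrives within timeout_s, else None."""
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None
        try:
            paddr = responses.get(timeout=remaining)
        except queue.Empty:
            return None
        if paddr is not None:  # skip responses that carry no physical address
            return paddr


def resolve(responses: "queue.Queue[Optional[int]]", timeout_s: float,
            page_walk: Callable[[], int]) -> Tuple[int, str]:
    paddr = wait_for_paddr(responses, timeout_s)
    if paddr is not None:
        return paddr, "probe"
    return page_walk(), "page-walk"


answered: "queue.Queue[Optional[int]]" = queue.Queue()
answered.put(None)      # a core that had no matching entry
answered.put(0x2000)    # the first CPU core's answer
silent: "queue.Queue[Optional[int]]" = queue.Queue()

from_probe = resolve(answered, 0.05, lambda: 0x2000)
from_walk = resolve(silent, 0.05, lambda: 0x2000)
```

Either path yields the same physical address; the difference is only whether the page table had to be consulted.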
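As the above description shows, in the process migration scenario, a CPU core whose TLB holds no physical address for a virtual address can send an address detection request to other CPU cores. Each of those cores checks whether its own TLB holds the physical address corresponding to the virtual address and can return the physical address it finds in an address detection response. The requesting core can then store the mapping between the physical address and the virtual address in its TLB and perform the memory access.
The technical solution provided in this specification allows CPU cores to share TLB contents. In the process migration scenario this greatly reduces repeated process page table lookups by CPU cores, reduces wasted core processing resources, shortens address translation time on a TLB miss, and improves I/O performance.
Moreover, the technical solution provided in this specification can be implemented with existing hardware and a cache snoop protocol; it requires no new hardware, is low in cost, and is highly feasible.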
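Corresponding to the foregoing embodiments of the memory access method, this specification also provides embodiments of a memory access apparatus.
The embodiments of the memory access apparatus can be applied in a CPU core of a computer system, where the CPU core includes a TLB that caches mappings between virtual addresses and physical addresses. Referring to FIG. 5, the memory access apparatus 500 includes an address lookup unit 501, an address detection unit 502, a memory access unit 503, and a state marking unit 504.
The address lookup unit 501, in response to a memory access request, searches the TLB for the physical address corresponding to the virtual address carried in the memory access request;
the address detection unit 502, if no physical address corresponding to the virtual address is found, sends an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
the memory access unit 503, in response to a received address detection response, stores the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performs the memory access based on the physical address.
Optionally, the address detection unit 502 reads the target core identifier of a target CPU core from a register and sends the address detection request to the target CPU core based on the target core identifier.
Optionally, the address detection unit 502 broadcasts the address detection request.
Optionally, the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
Optionally, the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
Optionally, the target core identifier is written by a scheduler.
Optionally, the apparatus further includes:
the state marking unit 504, which receives a TLB invalidation instruction that is sent after the thread initiating the memory access request has been scheduled to another CPU core;
and which, in response to the TLB invalidation instruction, marks the mappings between virtual addresses and physical addresses that the instruction specifies in the TLB as invalid;
in addition:
the address lookup unit 501, after receiving an address detection request sent by another CPU core, searches the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.
Optionally, the address lookup unit 501, if no address detection response is received, looks up the physical address corresponding to the virtual address in the in-memory process page table.
For how the functions and roles of the units in the above apparatus are implemented, see the implementation of the corresponding steps in the above method; the details are not repeated here.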
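Because the apparatus embodiments basically correspond to the method embodiments, refer to the description of the method embodiments for the relevant parts. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Those of ordinary skill in the art can understand and implement this without creative effort.
The systems, apparatuses, modules, or units described in the above embodiments can be implemented by a computer chip or an entity, or by a product with a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smartphone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.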
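Corresponding to the foregoing embodiments of the memory access method, this specification also provides a CPU core that includes a translation lookaside buffer (TLB) for caching mappings between virtual addresses and physical addresses. The CPU core is configured to:
in response to a memory access request, search the TLB for the physical address corresponding to the virtual address carried in the memory access request;
if no physical address corresponding to the virtual address is found, send an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
in response to a received address detection response, store the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and perform the memory access based on the physical address.
Optionally, sending the address detection request includes:
broadcasting the address detection request.
Optionally, sending the address detection request includes:
reading the target core identifier of a target CPU core from a register;
sending the address detection request to the target CPU core based on the target core identifier.
Optionally, the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
Optionally, the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
Optionally, the target core identifier is written by a scheduler.
Optionally, the CPU core is further configured to:
receive a TLB invalidation instruction, which is sent after the thread initiating the memory access request has been scheduled to another CPU core;
in response to the TLB invalidation instruction, mark the mappings between virtual addresses and physical addresses that the instruction specifies in the TLB as invalid;
the CPU core is further configured to:
after receiving an address detection request sent by another CPU core, search the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.
Optionally, the CPU core is further configured to:
if no address detection response is received, look up the physical address corresponding to the virtual address in the in-memory process page table.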
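Corresponding to the foregoing embodiments of the memory access method, this specification also provides a computer-readable storage medium storing a computer program that, when executed by a CPU core, implements the following steps:
in response to a memory access request, searching the TLB for the physical address corresponding to the virtual address carried in the memory access request;
if no physical address corresponding to the virtual address is found, sending an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
in response to a received address detection response, storing the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performing the memory access based on the physical address.
Optionally, sending the address detection request includes:
broadcasting the address detection request.
Optionally, sending the address detection request includes:
reading the target core identifier of a target CPU core from a register;
sending the address detection request to the target CPU core based on the target core identifier.
Optionally, the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
Optionally, the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
Optionally, the target core identifier is written by a scheduler.
Optionally, the steps further include:
receiving a TLB invalidation instruction, which is sent after the thread initiating the memory access request has been scheduled to another CPU core;
in response to the TLB invalidation instruction, marking the mappings between virtual addresses and physical addresses that the instruction specifies in the TLB as invalid;
the method further includes:
after receiving an address detection request sent by another CPU core, searching the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.
Optionally, the steps further include:
if no address detection response is received, looking up the physical address corresponding to the virtual address in the in-memory process page table.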
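Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of this specification and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this specification shall fall within its scope of protection.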

Claims (10)

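  1. A memory access method for memory access in a computer system, where the computer system includes a central processing unit (CPU), the CPU includes multiple CPU cores, each CPU core includes a translation lookaside buffer (TLB), and the TLB caches mappings between virtual addresses and physical addresses, the method being applied to a CPU core and including:
    in response to a memory access request, searching the TLB for the physical address corresponding to the virtual address carried in the memory access request;
    if no physical address corresponding to the virtual address is found, sending an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
    in response to a received address detection response, storing the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performing the memory access based on the physical address.
  2. The method according to claim 1, where sending the address detection request includes:
    broadcasting the address detection request.
  3. The method according to claim 1, where sending the address detection request includes:
    reading the target core identifier of a target CPU core from a register;
    sending the address detection request to the target CPU core based on the target core identifier.
  4. The method according to claim 3, where the target CPU core is a CPU core on which other threads of the process that owns the thread initiating the memory access request are running.
  5. The method according to claim 3, where the target CPU core is the CPU core on which the thread initiating the memory access request ran before it was scheduled to the current CPU core.
  6. The method according to claim 4 or 5, where the target core identifier is written by a scheduler.
  7. The method according to claim 1, further including:
    receiving a TLB invalidation instruction, which is sent after the thread initiating the memory access request has been scheduled to another CPU core;
    in response to the TLB invalidation instruction, marking the mappings between virtual addresses and physical addresses that the instruction specifies in the TLB as invalid;
    the method further including:
    after receiving an address detection request sent by another CPU core, searching the TLB entries in both the valid state and the invalid state for the physical address corresponding to the virtual address.
  8. The method according to claim 1, further including:
    if no address detection response is received, looking up the physical address corresponding to the virtual address in the in-memory process page table.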
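  9. A memory access apparatus for memory access in a computer system, where the computer system includes a central processing unit (CPU), the CPU includes multiple CPU cores, each CPU core includes a translation lookaside buffer (TLB), and the TLB caches mappings between virtual addresses and physical addresses, the apparatus being applied to a CPU core and including:
    an address lookup unit that, in response to a memory access request, searches the TLB for the physical address corresponding to the virtual address carried in the memory access request;
    an address detection unit that, if no physical address corresponding to the virtual address is found, sends an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
    a memory access unit that, in response to a received address detection response, stores the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and performs the memory access based on the physical address.
  10. A central processing unit (CPU) that includes multiple CPU cores, where each CPU core includes a translation lookaside buffer (TLB) that caches mappings between virtual addresses and physical addresses, and the CPU core is configured to:
    in response to a memory access request, search the TLB for the physical address corresponding to the virtual address carried in the memory access request;
    if no physical address corresponding to the virtual address is found, send an address detection request that carries the virtual address, so that a CPU core receiving the address detection request searches its own TLB for the physical address corresponding to the virtual address and, if that physical address is found, returns an address detection response carrying the found physical address;
    in response to a received address detection response, store the mapping between the virtual address and the physical address carried in the address detection response in the TLB, and perform the memory access based on the physical address.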
PCT/CN2023/075635 2022-03-02 2023-02-13 Memory access method and apparatus WO2023165317A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210200142.7A CN114840445A (zh) 2022-03-02 2022-03-02 Memory access method and apparatus
CN202210200142.7 2022-03-02

Publications (1)

Publication Number Publication Date
WO2023165317A1 true WO2023165317A1 (zh) 2023-09-07

Family

ID=82561573

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/075635 WO2023165317A1 (zh) 2022-03-02 2023-02-13 Memory access method and apparatus

Country Status (2)

Country Link
CN (1) CN114840445A (zh)
WO (1) WO2023165317A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114840445A (zh) * 2022-03-02 2022-08-02 阿里巴巴(中国)有限公司 Memory access method and apparatus
CN116644006B (zh) * 2023-07-27 2023-11-03 浪潮电子信息产业股份有限公司 Memory page management method, system, apparatus, device, and computer medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050172099A1 (en) * 2004-01-17 2005-08-04 Sun Microsystems, Inc. Method and apparatus for memory management in a multi-processor computer system
CN102662726A (zh) * 2012-04-01 2012-09-12 龙芯中科技术有限公司 Virtual machine simulation method and computer device
US20190018800A1 (en) * 2017-07-14 2019-01-17 Advanced Micro Devices, Inc. Protecting host memory from access by untrusted accelerators
CN114064524A (zh) * 2021-11-22 2022-02-18 浪潮商用机器有限公司 Server, and method, apparatus, and medium for improving server performance
US11392508B2 (en) * 2017-11-29 2022-07-19 Advanced Micro Devices, Inc. Lightweight address translation for page migration and duplication
CN114840445A (zh) * 2022-03-02 2022-08-02 阿里巴巴(中国)有限公司 Memory access method and apparatus

Also Published As

Publication number Publication date
CN114840445A (zh) 2022-08-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23762718

Country of ref document: EP

Kind code of ref document: A1