WO2023051359A1 - Method, apparatus, processor and computing device for controlling memory bandwidth - Google Patents

Method, apparatus, processor and computing device for controlling memory bandwidth

Info

Publication number
WO2023051359A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
bandwidth
processor
accessed
memory medium
Prior art date
Application number
PCT/CN2022/120293
Other languages
English (en)
French (fr)
Inventor
陈欢
祝晓平
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP22874746.5A (published as EP4390685A1)
Publication of WO2023051359A1
Priority to US18/612,459 (published as US20240231654A1)


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 - Configuration or reconfiguration of storage systems
    • G06F3/0631 - Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177 - Initialisation or configuration control
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/76 - Architectures of general purpose stored program computers
    • G06F15/78 - Architectures of general purpose stored program computers comprising a single central processing unit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 - Improving or facilitating administration, e.g. storage management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 - Interfaces specially adapted for storage systems
    • G06F3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 - In-line storage system
    • G06F3/0683 - Plurality of storage devices

Definitions

  • the present application relates to the field of computer technology, and in particular to a method, device, processor and computing device for controlling memory bandwidth.
  • the present application provides a method, a device, a processor and a computing device for controlling memory bandwidth, which ensure the overall speed at which a processor accesses memory while increasing the memory capacity.
  • a method for controlling memory bandwidth is provided, applied to a hybrid memory system that includes multiple processors and multiple different types of memory media.
  • the first processor is associated with at least two different types of memory media.
  • the first processor is any one of the multiple processors, and the method is executed by the first processor. It specifically includes the following steps: after the first processor obtains the bandwidth required of the memory medium to be accessed, it obtains the memory bandwidth occupancy rate of the memory medium to be accessed; if it determines according to the memory bandwidth occupancy rate that the memory medium to be accessed cannot meet the bandwidth requirement, it adjusts the memory bandwidth occupancy rate by acting, as indicated by a bandwidth adjustment policy, on the factors that affect the memory bandwidth occupancy rate of the memory medium to be accessed, and uses a first bandwidth that meets the bandwidth requirement from the adjusted remaining bandwidth of the memory medium to be accessed. In this way, by dynamically adjusting the remaining bandwidth of the memory medium, the memory medium to be accessed can provide enough bandwidth for the applications running on the processor, ensuring the overall speed at which the processor accesses memory.
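  As a rough illustration of the control flow described above, the following Python sketch models the decision in a few lines. All names (`control_memory_bandwidth`, `adjust_policy`, the GB/s figures) are hypothetical and not taken from the application; the bandwidth adjustment policy is abstracted as a callable that reduces the bandwidth consumed by other traffic.

```python
def control_memory_bandwidth(required_gbps, total_gbps, used_gbps, adjust_policy):
    """Grant bandwidth on a memory medium, adjusting occupancy if needed."""
    occupancy = used_gbps / total_gbps        # memory-bandwidth occupancy rate
    remaining = total_gbps * (1 - occupancy)  # remaining bandwidth before adjustment
    if remaining >= required_gbps:
        return required_gbps                  # the medium already meets the demand
    # The medium cannot meet the demand: throttle the factors (other
    # applications) occupying its bandwidth, per the adjustment policy.
    used_after = adjust_policy(used_gbps)
    remaining_after = total_gbps - used_after
    # Use a first bandwidth meeting the requirement from the adjusted remainder.
    return min(required_gbps, remaining_after)

# Hypothetical example: a 100 GB/s medium with 95 GB/s in use and a 20 GB/s
# demand; the policy halves the bandwidth consumed by background traffic.
granted = control_memory_bandwidth(20, 100, 95, lambda used: used * 0.5)
```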
  • the factors affecting the memory bandwidth occupancy of the memory medium to be accessed include at least one of user-oriented applications and system-oriented applications run by the first processor.
  • user-facing applications include big data applications, database applications, and cloud service applications.
  • System-oriented applications include operating system management applications, memory copy and data migration.
  • adjusting the memory bandwidth occupancy rate according to the bandwidth adjustment strategy includes: restricting the bandwidth of the memory medium to be accessed that is occupied by the factors affecting its memory bandwidth occupancy rate, and obtaining the adjusted remaining bandwidth, where the remaining bandwidth after adjustment is greater than the remaining bandwidth before adjustment.
  • restricting the bandwidth of the memory medium to be accessed that is occupied by the factors affecting its memory bandwidth occupancy rate includes: determining the limited available bandwidth of the memory medium to be accessed according to the remaining bandwidth and a bandwidth threshold; the factors affecting the memory bandwidth occupancy rate then access the memory medium to be accessed within the limited available bandwidth, yielding the adjusted remaining bandwidth.
  • the bandwidth threshold is obtained according to the total bandwidth of the memory medium to be accessed and an adjustment factor.
  • the at least two different types of memory media associated with the first processor include a first memory medium and a second memory medium, and the access speed of the first memory medium is greater than that of the second memory medium; the memory medium to be accessed is the first memory medium associated with the first processor or the second memory medium associated with the first processor; or/and, the memory medium to be accessed is the first memory medium or the second memory medium associated with a processor adjacent to the first processor.
  • using the first bandwidth that meets the bandwidth requirement from the adjusted remaining bandwidth of the memory medium to be accessed includes: accessing, based on the first bandwidth, memory space allocated at a preset memory allocation granularity in the memory medium to be accessed, where the preset memory allocation granularity is larger than the page size of the memory medium.
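  The granularity constraint can be pictured with a small helper that rounds allocation requests up to the allocation unit. The 4 KiB page size and 2 MiB granularity below are illustrative assumptions, not values from the application.

```python
PAGE_SIZE = 4 * 1024           # assumed page size of the memory medium (4 KiB)
GRANULARITY = 2 * 1024 * 1024  # assumed preset allocation granularity (2 MiB)

def allocation_size(requested_bytes):
    """Round a request up to a whole number of allocation-granularity units."""
    assert GRANULARITY > PAGE_SIZE  # the granularity exceeds the page size
    units = -(-requested_bytes // GRANULARITY)  # ceiling division
    return units * GRANULARITY
```

A 3 MiB request, for instance, occupies two 2 MiB units (4 MiB) under this scheme.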
  • the first processor is connected to multiple different types of memory media through interfaces supporting memory semantics, and the interfaces include at least one interface supporting Compute Express Link™ (CXL), Cache Coherent Interconnect for Accelerators (CCIX) or unified bus (UB, or Ubus).
  • the first memory medium is a dynamic random access memory (dynamic random access memory, DRAM)
  • the second memory medium is a storage-class memory (storage-class-memory, SCM)
  • the SCM includes at least one of phase-change memory (PCM), magnetoresistive random-access memory (MRAM), resistive random access memory (RRAM/ReRAM), ferroelectric random access memory (FRAM), fast NAND or nano random access memory (Nano-RAM, NRAM).
  • the hybrid memory system is applied to a scenario of deploying large-capacity memory, and the scenario includes at least one of big data, in-memory database, or cloud service.
  • the hybrid memory system is a server or a server cluster, and the server cluster includes two or more servers.
  • an apparatus for controlling memory bandwidth includes various modules for executing the method for controlling memory bandwidth in the first aspect or any possible design of the first aspect.
  • a processor is provided, the processor is associated with at least two different types of memory media, and the processor is configured to execute the operation steps of the method for controlling memory bandwidth in the first aspect or any possible design of the first aspect.
  • In a fourth aspect, a computing device is provided, including at least one processor, a memory and multiple different types of memory media; the memory is used to store a set of computer instructions, and when the processor executes the set of computer instructions, it performs the operation steps of the method for controlling memory bandwidth in the first aspect or any possible implementation of the first aspect.
  • In a fifth aspect, a computer system is provided, including a memory, at least one processor and multiple different types of memory media, where each processor is associated with at least two different types of memory media; the memory is used to store a set of computer instructions, and when the processor executes the set of computer instructions, it performs the operation steps of the method for controlling memory bandwidth in the first aspect or any possible implementation of the first aspect.
  • A computer-readable storage medium is provided, including computer software instructions; when the computer software instructions run on a computing device, the computing device is caused to execute the operation steps of the method described in the first aspect or any possible implementation of the first aspect.
  • A computer program product is provided.
  • When the computer program product runs on a computing device, the computing device executes the operation steps of the method described in the first aspect or any possible implementation of the first aspect.
  • In an eighth aspect, a chip system is provided, including a first processor and at least two different types of memory media associated with the first processor, and is used to implement the functions of the first processor in the method of the first aspect above.
  • the chip system further includes a memory, configured to store program instructions and/or data.
  • the system-on-a-chip may consist of chips, or may include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a storage system with a three-layer structure provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a hybrid memory system provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for controlling memory bandwidth provided by an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a device for controlling memory bandwidth provided by the present application.
  • FIG. 5 is a schematic structural diagram of a computing device provided by the present application.
  • FIG. 6 is a schematic diagram of a computer system provided by the present application.
  • Memory is a memory device used to store programs and various data.
  • the access speed refers to the data transfer speed when writing data to or reading data from the memory. The access speed can also be called the read-write speed.
  • the memory can be divided into different layers according to the storage capacity and access speed.
  • FIG. 1 is a schematic diagram of a storage system with a three-layer structure provided by an embodiment of the present application. From the first layer to the third layer, the storage capacity increases step by step, the access speed decreases step by step, and the cost decreases step by step.
  • the first layer includes a register 111, a first-level cache 112, a second-level cache 113, and a third-level cache 114 located in a central processing unit (CPU).
  • the second layer contains memory that can be used as the main memory of the computer system. For example, a dynamic random access memory 121, a double data rate synchronous dynamic random access memory (double data rate synchronous DRAM, DDR SDRAM) 122.
  • Main memory can simply be called internal memory; it is the memory that exchanges information with the CPU.
  • the memory contained in the third layer can be used as auxiliary memory of the computer system.
  • For example, a network storage 131, a solid state drive (solid state disk or solid state drive, SSD) 132, and a hard disk drive (hard disk drive, HDD) 133.
  • Auxiliary memory may simply be referred to as secondary storage or external storage.
  • the storage capacity of the external memory is large, but the access speed is slow. It can be seen that the closer the memory is to the CPU, the smaller the storage capacity, the faster the access speed, the larger the bandwidth, and the smaller the access delay.
  • the access delay of DRAM 121 may range from 50 to 100 nanoseconds (ns), the access delay of network storage 131 may range from 1 to 1000 microseconds (μs), and the access delay of a solid-state drive may be on the order of 100 μs.
  • the access delay of a hard disk drive may be on the order of 1 millisecond (ms). Therefore, the memory contained in the third layer can be used as a back-end storage device.
  • the memory contained in the second layer can be used as a cache device to store data frequently accessed by the CPU, which can significantly improve the access performance of the system.
  • a system that utilizes multiple different types of storage media as memory can be referred to as a hybrid memory system.
  • the storage medium used as memory in a hybrid memory system may be called a memory medium.
  • multiple different types of memory media include a first memory medium and a second memory medium; the storage capacity of the first memory medium is lower than that of the second memory medium, the access speed of the first memory medium is higher than that of the second memory medium, the access delay of the first memory medium is lower than that of the second memory medium, and the cost of the first memory medium is higher than that of the second memory medium.
  • Storage-class memory (SCM) not only has the advantages of memory but also takes into account the characteristics of storage; simply understood, it is a new type of non-volatile memory medium. SCM is non-volatile, has extremely short access times, is low in price per bit, is solid state and has no moving parts.
  • For example, the SCM may be a phase-change memory (PCM), or an Optane memory (Optane™ Memory) developed based on 3D XPoint.
  • SCM also includes other types such as magnetoresistive random-access memory (MRAM), resistive random access memory (RRAM/ReRAM), ferroelectric random access memory (FRAM), fast NAND and nano random access memory (Nano-RAM, NRAM).
  • the storage capacity of the SCM can be hundreds of gigabytes (Gigabyte, GB), and the access delay of the SCM can be in the range of 120-400 ns.
  • the SCM may be located at the second layer in the hierarchical architecture of the storage system shown in FIG. 1 , for example, the storage class memory 123 . Since the SCM has the characteristics of large storage capacity and fast access speed, the SCM and other storage media in the second layer can be used as the memory medium in the hybrid memory system. For example, DDR and SCM are used as memory media in a hybrid memory system, or DRAM and SCM are used as memory media in a hybrid memory system.
  • FIG. 2 is a schematic diagram of a hybrid memory system provided by an embodiment of the present application.
  • the hybrid memory system 200 includes multiple processors and multiple different types of memory media. The multiple processors are connected through a quick path interconnect (QPI, also called a common system interface, CSI). Each of the multiple processors may be associated with at least one type of memory medium; that is, some processors may be associated with one type of memory medium, and some processors may be associated with two or more types of memory media.
  • the multiple processors include a processor 210 and a processor 240, and the multiple different types of memory media include a first memory medium and a second memory medium; the first memory medium may include DRAM 220 and DRAM 250, and the second memory medium may include SCM 230 and SCM 260.
  • the processor 210 is connected to the first memory medium and the second memory medium through an interface supporting memory semantics.
  • the processor 240 is connected to the first memory medium and the second memory medium through an interface supporting memory semantics.
  • the interface includes at least one interface supporting Compute Express Link™ (CXL), Cache Coherent Interconnect for Accelerators (CCIX) or unified bus (UB, or Ubus).
  • the processor 210 accesses the DRAM 220 through a parallel interface (such as UB) to perform data read and write operations as fast as possible, so as to increase the speed at which the processor 210 processes data.
  • the processor 210 is connected to the SCM 230 through a higher-speed serial interface (such as CXL) to expand the memory; this adds more memory channels, thereby providing more memory bandwidth and memory capacity and addressing the matching relationship between processor cores and memory.
  • the access delay of the processor accessing the memory connected through the serial interface is relatively large.
  • the processor 210 includes an integrated memory controller (iMC) 211 for implementing memory management and control, and multiple processor cores; the multiple processor cores can be further divided into multiple computing clusters, each of which includes multiple processor cores.
  • computing cluster 1 includes processor core 1 to processor core 8 .
  • the computing cluster 2 includes processor cores 9 to 16 .
  • Multiple computing clusters communicate through a network on chip (NoC) 212, and the network on chip 212 is used to implement communication between processor cores in different computing clusters.
  • the network-on-chip 212 may be a node controller (NC), or any other chip or logic circuit used to implement communication between processor cores.
  • Each computing cluster is connected to different types of memory media through multiple integrated memory controllers 211 .
  • all processor cores in the processor 210 may also be divided into a computing cluster.
  • hybrid memory system 200 shown in FIG. 2 only uses the processor 210 and the processor 240 as an example for illustration.
  • the hybrid memory system 200 may include two or more processors, and each processor is connected to different types of memory media through an iMC.
  • For example, the iMC 211 may serve as a memory expander.
  • the hybrid memory system 200 may also include other types of memory media, whose type differs from both the first memory medium and the second memory medium.
  • For example, random access memory (RAM) such as static random access memory (SRAM), synchronous dynamic random access memory (SDRAM) or double data rate synchronous dynamic random access memory (DDR SDRAM) can also be added to the hybrid memory system 200.
  • That is, the hybrid memory system 200 includes a mix of multiple types of memory media.
  • For ease of description, the following takes as an example a hybrid memory system 200 that includes only the first memory medium and the second memory medium, where the first memory medium is DRAM and the second memory medium is SCM.
  • the operating system running on the processors of the hybrid memory system can allocate different levels of memory media to each processor according to the type of memory medium, and record the correspondence between processors and memory media, so as to perform data read or write operations based on the correspondence between processors and the different levels of memory media.
  • Each processor may be allocated memory resources according to a hierarchical memory mechanism, which is used to indicate the levels of multiple different types of memory media in a hybrid memory system, and the hybrid memory system includes multiple levels.
  • the processor can divide the memory media in the multi-level memory system into different levels according to the physical attributes of the memory media, where the physical attributes include delay, cost, lifetime and memory capacity. The processor can divide the memory media into multiple levels according to at least one of delay, cost, lifetime and memory capacity, and the multiple levels can be sorted from high to low, from the first level to the second level.
  • For example, DRAM can be used as the first-level memory medium and SCM as the second-level memory medium, where the first level is higher than the second level.
  • the same type of memory medium can be divided into one or more levels, for example, the same type of memory medium can be divided into two or more levels according to at least one of physical attributes. For example, since the cost of DRAM produced by different manufacturers may vary, low-cost DRAM may be used as a first-level memory medium, and high-cost DRAM may be used as a second-level memory medium.
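  A minimal sketch of this tiering step, assuming latency is the only attribute considered; the media names and nanosecond figures below are illustrative, loosely based on the latency ranges quoted earlier.

```python
# Illustrative media with assumed access latencies (ns).
media = [
    {"name": "SCM",  "latency_ns": 250},
    {"name": "DRAM", "latency_ns": 80},
]

# Lower latency -> higher (earlier) level, matching DRAM as level 1
# and SCM as level 2 in the example above.
for level, medium in enumerate(sorted(media, key=lambda m: m["latency_ns"]), start=1):
    medium["level"] = level

levels = {m["name"]: m["level"] for m in media}
```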
  • the memory capacity of the system is expanded by configuring multiple different types of memory media, so that the processor can obtain as much memory resources as possible when running applications.
  • When any processor (for example, the first processor) in the system obtains a memory allocation request, it determines the memory resource to be allocated from the multiple different types of memory media according to the physical attributes of the various memory media indicated by the allocation strategy (the physical attributes include at least one of memory capacity, access delay, cost or service life), and allocates the memory resource to the logical address according to the allocation strategy. This keeps the access delay of the first processor accessing the allocated memory resource as low as possible, so that the access speed and memory capacity of the memory medium can match the computing speed of the processor as far as possible.
  • the combination of low-cost large-capacity SCM and low-latency DRAM as a hybrid memory medium can store different data hierarchically, which can reduce the hardware cost of the system.
  • the processor uses the statically allocated large page resources.
  • the processor dynamically allocates memory resources to the running applications, ensuring that the state of the system after completing the initial memory allocation is one of optimal memory performance, reducing the impact on application performance.
  • the processor allocates memory resources to running applications according to the physical attributes of the various types of memory media, so that memory resources can also be used by other applications, improving the utilization of memory resources.
  • the access speed of SCM is lower than that of DRAM. Compared with the total system bandwidth when the processor uses only DRAM to read and write data, the total system bandwidth when the processor uses both SCM and DRAM is reduced, which increases the access latency when applications running on the processor access memory, as shown in Table 1.
  • After the first processor obtains the bandwidth required of the memory medium to be accessed, the memory bandwidth occupancy rate of the memory medium to be accessed is obtained. If it is determined according to the occupancy rate that the memory medium to be accessed cannot meet the bandwidth requirement, the memory bandwidth occupancy rate is adjusted by acting, as indicated by the bandwidth adjustment policy, on the factors that affect the memory bandwidth occupancy rate of the memory medium to be accessed, and a first bandwidth that meets the bandwidth requirement is used from the adjusted remaining bandwidth of the memory medium to be accessed. In this way, by dynamically adjusting the remaining bandwidth of the memory medium, the memory medium to be accessed can provide enough bandwidth for the applications running on the processor, ensuring the overall speed at which the processor accesses memory.
  • the factors affecting the memory bandwidth occupancy of the memory medium to be accessed include at least one of user-oriented applications and system-oriented applications run by the first processor.
  • user-facing applications include big data applications, database applications, and cloud service applications.
  • System-oriented applications include operating system management applications, memory copy and data migration.
  • At least two different types of memory media associated with the first processor include a first memory medium and a second memory medium.
  • the memory medium accessed by the first processor may be the first memory medium associated with the first processor or the second memory medium associated with the first processor; or/and, the memory medium accessed by the first processor may be the first memory medium or the second memory medium associated with a processor adjacent to the first processor.
  • FIG. 3 is a schematic flowchart of a method for controlling memory bandwidth provided by an embodiment of the present application.
  • the hybrid memory system 200 is taken as an example for illustration.
  • DRAM 220 and DRAM 250 serve as first-level memory media.
  • SCM 230 and SCM 260 serve as second-level memory media.
  • Processor 210 is associated with DRAM 220 and SCM 230.
  • Processor 240 is associated with DRAM 250 and SCM 260. Assume that the processor 210 accesses the memory medium during running the application.
  • the method includes the following steps.
  • Step 310 the processor 210 obtains the bandwidth requirement of the memory medium to be accessed.
  • the bandwidth requirement is the bandwidth required by the processor 210 to access the memory medium to be accessed; that is, the amount of data per unit time that the processor 210 expects the memory medium to process when accessing it.
  • step 320 the processor 210 acquires the memory bandwidth occupancy rate of the memory medium to be accessed.
  • the processor 210 may make statistics on the real-time bandwidth of the memory medium to be accessed occupied by the running user-oriented applications and/or system-oriented applications.
  • the processor 210 may determine the memory bandwidth occupancy rate of the memory medium to be accessed according to the ratio of the real-time bandwidth of the memory medium to be accessed to the total bandwidth of the memory medium to be accessed.
  • processor 210 may count certain hardware events occurring in the system.
  • the specific hardware events include, for example, a cache miss (Cache Miss) or a branch misprediction (Branch Misprediction). Multiple events can be combined to calculate performance data such as cycles per instruction (CPI) or the cache hit rate.
  • the processor 210 calculates the real-time bandwidth of the memory medium accessed by the processor 210 by reading specific hardware events or performance data.
  • the processor 210 can also obtain the remaining bandwidth of the memory medium it accesses.
  • the processor 210 may determine the remaining bandwidth of the memory medium according to the difference between the total bandwidth and the real-time bandwidth of the memory medium.
  • the total bandwidth refers to the bandwidth determined by the hardware of the memory medium.
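  The two quantities of step 320 reduce to one division and one subtraction. A hypothetical helper (the function name and the GB/s units are assumptions for illustration):

```python
def memory_bandwidth_stats(real_time_gbps, total_gbps):
    """Return (occupancy rate, remaining bandwidth) for a memory medium."""
    occupancy = real_time_gbps / total_gbps  # ratio of real-time to total bandwidth
    remaining = total_gbps - real_time_gbps  # difference of total and real-time bandwidth
    return occupancy, remaining

# e.g. 80 GB/s of real-time traffic on a 100 GB/s medium.
occupancy, remaining = memory_bandwidth_stats(80, 100)
```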
  • Processor 210 may obtain the remaining bandwidth of at least one of DRAM 220, SCM 230, DRAM 250 or SCM 260 that it has access to. For ease of description, the following takes as an example the case where the processor 210 accesses the DRAM 220, determines from the memory bandwidth occupancy rate of the DRAM 220 that the DRAM 220 cannot meet the bandwidth requirement, and adjusts the memory bandwidth occupancy rate of the DRAM 220.
  • the bandwidth adjustment condition includes at least one of the remaining bandwidth of the memory medium being smaller than the bandwidth threshold or the remaining bandwidth not meeting the memory bandwidth requirement of the application running on the processor 210 .
  • the processor 210 determines whether the remaining bandwidth of the DRAM 220 is less than a bandwidth threshold. If the remaining bandwidth of the DRAM 220 is greater than or equal to the bandwidth threshold, the remaining bandwidth of the DRAM 220 is relatively sufficient and can support the bandwidth the processor 210 needs to run applications, so no bandwidth adjustment is required. If the remaining bandwidth of the DRAM 220 is less than the bandwidth threshold, the bandwidth of the DRAM 220 is heavily occupied and may be unable to support the bandwidth the processor 210 needs to run applications, and step 330 is performed.
  • the bandwidth threshold is obtained according to the total bandwidth of the memory medium and the adjustment factor.
  • the memory medium accessed by the processor 210 is the DRAM 220.
  • the bandwidth threshold satisfies the following formula (1): P_DRAM = B_DRAM × α, where P_DRAM represents the bandwidth threshold of the DRAM, B_DRAM represents the total bandwidth of the DRAM, and α represents an adjustment factor whose value range is 0-1.
  • the processor 210 may obtain the memory bandwidth requirements of the applications it runs. The processor 210 determines whether the remaining bandwidth of the DRAM 220 meets the memory bandwidth requirement. If the remaining bandwidth of the DRAM 220 meets the memory bandwidth requirement, no bandwidth adjustment is required. If the remaining bandwidth of the DRAM 220 does not meet the memory bandwidth requirement, step 330 is performed.
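Taken together, the two bandwidth adjustment conditions above can be sketched as follows. This assumes formula (1) has the multiplicative form P_DRAM = B_DRAM × α implied by the text; all helper names and numbers are illustrative, not the patent's.

```python
# Sketch of the step-320 decision: adjust if the remaining bandwidth is below
# the threshold, or below the memory bandwidth requirement of the applications.

def bandwidth_threshold(total_bw: float, alpha: float) -> float:
    """Assumed form of formula (1): threshold = total bandwidth * alpha."""
    assert 0.0 < alpha <= 1.0, "adjustment factor must lie in (0, 1]"
    return total_bw * alpha

def needs_adjustment(remaining_bw: float, total_bw: float, alpha: float,
                     app_demand: float) -> bool:
    """True if either bandwidth adjustment condition holds."""
    threshold = bandwidth_threshold(total_bw, alpha)
    return remaining_bw < threshold or remaining_bw < app_demand

print(needs_adjustment(remaining_bw=24.0, total_bw=100.0,
                       alpha=0.2, app_demand=20.0))  # False
print(needs_adjustment(remaining_bw=24.0, total_bw=100.0,
                       alpha=0.2, app_demand=40.0))  # True
```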
  • Step 330: the processor 210 adjusts the memory bandwidth occupancy rate of the DRAM 220 according to the bandwidth adjustment strategy, and uses, from the adjusted remaining bandwidth of the DRAM 220, the first bandwidth that meets the bandwidth requirement.
  • the processor 210 controls how the factors that affect the memory bandwidth occupancy rate of the DRAM 220 occupy the bandwidth of the DRAM 220, to obtain the adjusted remaining bandwidth, where the adjusted remaining bandwidth is greater than the remaining bandwidth before adjustment. Understandably, the adjusted memory bandwidth occupancy rate of the DRAM 220 is smaller than the unadjusted memory bandwidth occupancy rate of the DRAM 220. That is, the processor 210 controls the use of the bandwidth of the DRAM 220 by the factors affecting the memory bandwidth occupancy rate of the DRAM 220.
  • storing cold data by SCM can reduce the storage cost of the system.
  • the access speed of the DRAM is higher than that of the SCM; storing hot data in the DRAM can reduce the access delay of the hot data and increase the speed at which the processor 210 processes data.
  • Hot data refers to data whose number of times the same data is accessed within a unit period is greater than the first threshold.
  • Cold data refers to data whose number of times the same data is accessed within a unit period is less than or equal to the second threshold.
  • the first threshold and the second threshold may be the same or different; when they differ, the first threshold is greater than the second threshold.
  • the processor 210 includes a register for recording the page table management flag (access bit). The processor 210 can determine, in a fixed cycle, whether a memory page is accessed and count the number of accesses, define the above-mentioned first threshold and second threshold from the distribution of access counts across memory pages, and then judge whether data is hot or cold.
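The hot/cold judgment described above can be sketched as follows. The thresholds and access counts here are illustrative assumptions, not values from the patent; the "warm" label for counts between the two thresholds is our naming for the case where the thresholds differ.

```python
# Sketch: classifying memory pages as hot or cold from per-page access
# counts collected over a fixed cycle.

def classify(access_count: int, hot_threshold: int, cold_threshold: int) -> str:
    """Hot if count > first (hot) threshold, cold if count <= second (cold)
    threshold; anything in between when the thresholds differ."""
    assert hot_threshold >= cold_threshold
    if access_count > hot_threshold:
        return "hot"
    if access_count <= cold_threshold:
        return "cold"
    return "warm"

# Illustrative per-page access counts keyed by page address.
counts = {0x1000: 250, 0x2000: 3, 0x3000: 40}
labels = {page: classify(n, hot_threshold=100, cold_threshold=10)
          for page, n in counts.items()}
print(labels)  # {4096: 'hot', 8192: 'cold', 12288: 'warm'}
```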
  • the cold data stored in the SCM 230 may become hot data. If the hot data is stored by the SCM 230, the overall access delay of the system will increase due to the frequent access of the processor 210 to the SCM 230.
  • the hot data stored in the DRAM 220 may become cold data, and if the cold data is stored by the DRAM 220, the storage space of the DRAM 220 will be wasted.
  • the processor 210 can determine a data migration strategy according to the data distribution in the hybrid memory system, migrate the migration data set carrying hot/cold attribute identifiers between different memory media, and reduce the storage cost of the system. If the processor 210 performs frequent data migration, that is, migrating cold data stored in the DRAM 220 to the SCM 230 and hot data stored in the SCM 230 to the DRAM 220, the memory medium bandwidth becomes excessively occupied (a high memory bandwidth occupancy rate), and the remaining bandwidth of the memory medium accessed by the processor 210 may be unable to support the bandwidth the processor 210 needs to run applications. In that case, the processor 210 can adjust the remaining bandwidth of the memory medium according to the method for controlling memory bandwidth provided by this application.
  • the processor 210 determines the limited available bandwidth of the DRAM 220 according to the remaining bandwidth and the bandwidth threshold, and controls, according to the limited available bandwidth, how the factors affecting the memory bandwidth occupancy rate of the DRAM 220 access the DRAM 220, obtaining the adjusted remaining bandwidth.
  • formula (2) (the formula image is not reproduced here) determines the limited available bandwidth from the following quantities: S_DRAM represents the limited available bandwidth, P_DRAM represents the bandwidth threshold, A_DRAM represents the remaining bandwidth, a rounding symbol (not reproduced) represents taking an integer value, K represents a constant, and T represents time.
  • S_DRAM means that S_DRAM pages of DRAM are moved in and out of the DRAM 220 per unit time. If the DRAM is replaced by the SCM, S_SCM represents that S_SCM pages of SCM are moved in and out of the SCM 230 per unit time.
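Since the image of formula (2) did not survive extraction, the following is only one plausible reading of the idea: the migrator may move at most S pages per unit time, with S derived from how far the remaining bandwidth sits above the threshold. All names, the page size, and the formula shape are assumptions, not the patent's formula.

```python
# Sketch: throttling migration to a page budget per unit time so that the
# remaining bandwidth is not pushed below the bandwidth threshold.

PAGE_SIZE = 2 * 1024 * 1024  # bytes; huge pages, as the text suggests

def pages_per_unit_time(remaining_bw: float, threshold: float,
                        unit_time_s: float) -> int:
    """Pages the migrator may move in one window without driving the
    remaining bandwidth below the threshold (0 if already at/below it)."""
    spare = max(0.0, remaining_bw - threshold)   # bytes/s usable by migration
    return int(spare * unit_time_s // PAGE_SIZE)

# 8 GB/s remaining, 6 GB/s threshold, 1-second window:
print(pages_per_unit_time(8e9, 6e9, 1.0))  # 953
```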
  • memory pages are often divided into different sizes, for example, 4 KB, 2 MB, and 1 GB; 4 KB memory pages are also called small pages or small-page memory, and 2 MB or 1 GB memory pages are called huge pages or huge-page memory.
  • a memory page whose size is larger than a third threshold is called a huge page or huge-page memory, and a memory page whose size is smaller than or equal to a fourth threshold is called a small page or small-page memory.
  • the third threshold and the fourth threshold may be the same or different, and may be configured according to service requirements during specific implementation.
  • when the processor allocates memory according to the memory allocation request, it can allocate memory in huge pages. Compared with allocating memory in 4 KB small pages, this reduces the probability that the processor misses in the translation lookaside buffer (TLB) or the page table, and reduces the number of interrupts the processor generates when accessing memory.
  • pages that are moved in and out may refer to huge pages.
  • the translation lookaside buffer, also called a page-table cache, is a high-speed storage unit located inside the processor that stores some page-table entries (the virtual-address-to-physical-address translation table). If the page table were only stored in main memory, the cost of querying it would be very high; the TLB inside the processor improves the efficiency of translation from virtual addresses to physical addresses.
  • the processor 210 can access, based on the first bandwidth, memory space in the DRAM 220 allocated with a preset memory allocation granularity, where the preset memory allocation granularity is greater than the page size of the memory medium; for example, the preset memory allocation granularity is huge pages.
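The TLB benefit of huge-page allocation described above comes down to simple arithmetic, sketched below. The region size is an illustrative assumption; the point is only that fewer translation entries must cover the same region, so TLB misses become less likely.

```python
# Sketch: how many page-table/TLB entries cover the same region when mapped
# with 4 KB small pages versus 2 MB huge pages.

def entries_needed(region_bytes: int, page_bytes: int) -> int:
    """Ceiling division: number of pages (hence translations) for a region."""
    return -(-region_bytes // page_bytes)

region = 1 * 1024 * 1024 * 1024                 # an illustrative 1 GiB region
print(entries_needed(region, 4 * 1024))         # 262144 entries with 4 KB pages
print(entries_needed(region, 2 * 1024 * 1024))  # 512 entries with 2 MB pages
```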
  • the processor 210 can adjust the remaining bandwidth of at least one of the DRAM 220, the SCM 230, the DRAM 250, or the SCM 260 according to the above steps 310 to 330, so that the DRAM 220, SCM 230, DRAM 250, and SCM 260 each retain enough remaining bandwidth to allocate to the application.
  • the SCM and DRAM bandwidth consumed by migrating data can be controlled below the specified bandwidth threshold, ensuring that the remaining bandwidth allocated to the application is greater than a certain fixed proportion of the total bandwidth, which provides applications with a quality of service (QoS) guarantee at that proportion.
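The QoS guarantee above follows directly from the cap: if migration traffic never exceeds the threshold, applications keep at least the complementary share of the total bandwidth. A minimal sketch, with our own function name and an illustrative α:

```python
# Sketch: with migration capped at threshold = total_bw * alpha, applications
# are guaranteed at least a (1 - alpha) share of the total bandwidth.

def app_bandwidth_floor(total_bw: float, alpha: float) -> float:
    """Guaranteed application bandwidth when migration uses at most alpha."""
    return total_bw * (1.0 - alpha)

print(app_bandwidth_floor(100.0, 0.5))  # 50.0
```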
  • the hybrid memory system includes corresponding hardware structures and/or software modules for performing various functions.
  • the present application can be implemented in the form of hardware, or a combination of hardware and computer software, with reference to the units and method steps of the examples described in the embodiments disclosed herein. Whether a function is executed by hardware or by computer software driving hardware depends on the specific application scenario and the design constraints of the technical solution.
  • FIG. 4 is a schematic structural diagram of a possible device for controlling memory bandwidth provided by this embodiment. These devices for controlling memory bandwidth can be used to implement the functions of the first processor in the above method embodiments, and thus can also achieve the beneficial effects of the above method embodiments.
  • the device for controlling memory bandwidth may be the processor 210 shown in FIG. 3 , or may be a module (such as a chip) applied to a server.
  • the device 400 for controlling memory bandwidth includes a communication module 410 , a request module 420 , a bandwidth perception module 430 , a decision module 440 , an adjustment module 450 and an access module 460 .
  • the device 400 for controlling memory bandwidth is used to implement the functions of the processor 210 in the method embodiment shown in FIG. 3 above.
  • the communication module 410 is used for communicating with other devices. For example, receiving bandwidth requirements sent by other devices.
  • the request module 420 is configured to acquire the bandwidth requirement of the memory medium to be accessed, where the bandwidth requirement is the bandwidth required by the first processor to access the memory medium to be accessed. For example, the request module 420 is used to execute step 310 in FIG. 3 .
  • the bandwidth sensing module 430 is configured to acquire the memory bandwidth occupancy rate of the memory medium to be accessed.
  • the bandwidth sensing module 430 is used to execute step 320 in FIG. 3 .
  • the decision module 440 is configured to determine, according to the memory bandwidth occupancy rate, that the memory medium to be accessed cannot meet the bandwidth requirement.
  • the adjustment module 450 is configured to adjust the memory bandwidth occupancy rate according to a bandwidth adjustment policy, where the bandwidth adjustment policy indicates that the memory bandwidth is adjusted according to the factors affecting the memory bandwidth occupancy rate of the memory medium to be accessed; the factors include at least one of a user-oriented application and a system-oriented application run by the first processor.
  • the adjustment module 450 is used to execute step 330 in FIG. 3 .
  • the access module 460 is configured to use the first bandwidth that meets the bandwidth requirement from the adjusted remaining bandwidth of the memory medium to be accessed.
  • the adjustment module 450 is specifically configured to control factors affecting the memory bandwidth occupancy rate of the memory medium to be accessed to occupy the bandwidth of the memory medium to be accessed, to obtain the adjusted remaining bandwidth, and the adjusted remaining bandwidth Greater than the remaining bandwidth before adjustment.
  • the storage module 470 may correspond to storing bandwidth adjustment policies in the foregoing method embodiments.
  • the device 400 for controlling memory bandwidth in the embodiment of the present application may be implemented by a graphics processing unit (GPU), a neural network processing unit (NPU), an application-specific integrated circuit (ASIC), or a programmable logic device (PLD).
  • the above-mentioned PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the memory bandwidth control method shown in FIG. 3 can also be implemented by software
  • the memory bandwidth control device 400 and its modules can also be software modules.
  • the memory bandwidth control device 400 may correspond to executing the method described in the embodiment of the present application, and the above-mentioned and other operations and/or functions of each unit in the memory bandwidth control device 400 are respectively intended to implement the corresponding processes of each method; for the sake of brevity, they are not repeated here.
  • FIG. 5 is a schematic diagram of a hybrid memory system 500 provided by an embodiment of the present application.
  • the hybrid memory system 500 includes a processor 510, various types of memory media (for example, memory media 520, memory media 530), communication interface 540, storage medium 550 and bus 560.
  • the processor 510, the memory medium 520, the memory medium 530, the communication interface 540, and the storage medium 550 communicate through the bus 560, or communicate through other means such as wireless transmission.
  • various types of memory media may be used to store computer-executable instructions
  • the processor 510 is used to execute the computer-executable instructions stored in the memory medium 520 .
  • the memory medium 520 stores computer-executable instructions
  • the processor 510 may invoke the computer-executable instructions stored in the memory medium 520 to perform the following operations:
  • acquire the bandwidth requirement of the memory medium to be accessed, where the bandwidth requirement is the bandwidth required by the first processor to access the memory medium to be accessed;
  • acquire the memory bandwidth occupancy rate of the memory medium to be accessed, and determine, according to the memory bandwidth occupancy rate, that the memory medium to be accessed cannot meet the bandwidth requirement;
  • adjust the memory bandwidth occupancy rate according to a bandwidth adjustment strategy, where the bandwidth adjustment strategy indicates that the memory bandwidth is adjusted according to the factors affecting the memory bandwidth occupancy rate of the memory medium to be accessed, and use, from the adjusted remaining bandwidth of the memory medium to be accessed, the first bandwidth that meets the bandwidth requirement.
  • the processor 510 may be a CPU, for example, a processor of the X86 architecture or a processor of the ARM architecture.
  • the processor 510 can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, a system on chip (SoC), a graphics processing unit (GPU), an artificial intelligence (AI) chip, etc.
  • a general purpose processor may be a microprocessor or any conventional processor or the like.
  • the memory medium 520 may include read-only memory and random-access memory, and provides instructions and data to the processor 510 .
  • Memory medium 520 may also include non-volatile random access memory.
  • Memory medium 520 can be volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • volatile memory can be random access memory (RAM), which acts as an external cache. By way of example, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the memory medium 520 can also be a storage class memory SCM, and the SCM includes at least one of phase change memory PCM, magnetic random access memory MRAM, resistive random access memory RRAM, ferroelectric memory FRAM, fast NAND or nano random access memory NRAM .
  • the type of the memory medium 530 is similar to that of the memory medium 520 , and may be any one of the above-mentioned types of memory medium, but in the hybrid memory system 500 , the types of the memory medium 520 and the memory medium 530 are different.
  • the bus 560 may also include a power bus, a control bus, a status signal bus, and the like. However, for clarity of illustration, the various buses are labeled as bus 560 in the figure.
  • the bus 560 can be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like.
  • the bus 560 can be divided into an address bus, a data bus, a control bus, and the like.
  • FIG. 5 only takes a system including two different types of memory media as an example. In a specific implementation, more memory media of different types may be included, so as to realize hierarchical storage of data according to the different physical properties of the memory media in a hybrid memory system.
  • the hybrid memory system 500 may correspond to the memory bandwidth control device 400 in the embodiment of the present application, and may correspond to the corresponding subject performing the method according to the embodiment of the present application. The above-mentioned and other operations and/or functions of each module in the hybrid memory system 500 are respectively intended to realize the corresponding flows of the methods in FIG. 3 ; for the sake of brevity, details are not repeated here.
  • the present application also provides a processor. The processor includes an integrated circuit connected to multiple different types of memory media, and the integrated circuit is used to realize the functions of the operation steps of the method shown in FIG. 3 ; for the sake of brevity, details are not repeated here.
  • since each module in the device 400 for controlling memory bandwidth provided by the present application can be deployed in a distributed manner on multiple computers in the same environment or in different environments, the present application also provides a computer system as shown in FIG. 6 . The computer system includes a plurality of computers 600 , and each computer 600 includes a memory medium 601 , a processor 602 , a communication interface 603 , a bus 604 and a memory medium 605 . The memory medium 601 , the processor 602 , and the communication interface 603 are connected to each other through the bus 604 .
  • the memory medium 601 may be a combination of at least two of a read-only memory, a static storage device, a dynamic storage device, a random access memory, or a storage-class memory.
  • memory media includes DRAM and SCM.
  • the memory medium 601 may store computer instructions. When the computer instructions stored in the memory medium 601 are executed by the processor 602, the processor 602 and the communication interface 603 are used to execute the method for controlling memory bandwidth of the software system.
  • the memory medium may also store a data set, for example: a part of storage resources in the memory medium 601 is divided into an area for storing page tables and programs implementing the function of controlling memory bandwidth in the embodiment of the present application.
  • the type of the memory medium 605 is similar to that of the memory medium 601 , and may be any of the above-mentioned types of memory medium, but in the computer 600 , the types of the memory medium 605 and the memory medium 601 are different.
  • the processor 602 may be a general-purpose CPU, an application specific integrated circuit (ASIC), a GPU or any combination thereof.
  • Processor 602 may include one or more chips.
  • Processor 602 may include an AI accelerator, such as an NPU.
  • the communication interface 603 uses a transceiver module such as but not limited to a transceiver to implement communication between the computer 600 and other devices or communication networks. For example, memory allocation requests and the like may be obtained through the communication interface 603 .
  • the bus 604 may include a pathway for transferring information between various components of the computer 600 (eg, memory medium 601 , memory medium 605 , processor 602 , communication interface 603 ).
  • a communication path is established between each of the above-mentioned computers 600 through a communication network. Any one or more of the request module 420 , the bandwidth sensing module 430 , the decision module 440 , the adjustment module 450 and the access module 460 runs on each computer 600 .
  • Any computer 600 may be a computer (for example: a server) in a cloud data center, or a computer in an edge data center, or a terminal computing device.
  • for example, GPUs may be used to implement the function of training neural networks.
  • the method steps in this embodiment may be implemented by means of hardware, and may also be implemented by means of a processor executing software instructions.
  • Software instructions can be composed of corresponding software modules, and software modules can be stored in random access memory (random access memory, RAM), flash memory, read-only memory (read-only memory, ROM), programmable read-only memory (programmable ROM) , PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), register, hard disk, mobile hard disk, CD-ROM or known in the art any other form of storage medium.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.
  • the storage medium may also be a component of the processor.
  • the processor and storage medium can be located in the ASIC.
  • the ASIC may be located in the terminal device.
  • the processor and the storage medium may also exist in the network device or the terminal device as discrete components.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented using software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product comprises one or more computer programs or instructions. When the computer program or instructions are loaded and executed on the computer, the processes or functions described in the embodiments of the present application are executed in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, network equipment, user equipment, or other programmable devices.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer program or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrating one or more available media. The available medium can be a magnetic medium, for example, a floppy disk, hard disk, or magnetic tape; an optical medium, for example, a digital video disc (DVD); or a semiconductor medium, for example, a solid state drive (SSD).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Memory System (AREA)

Abstract

A method for controlling memory bandwidth: after any processor in the system (for example, a first processor) acquires the bandwidth required by the memory medium to be accessed, it acquires the memory bandwidth occupancy rate of the memory medium to be accessed. If it determines from the memory bandwidth occupancy rate that the memory medium to be accessed cannot meet the bandwidth requirement, it adjusts the memory bandwidth occupancy rate according to the factors, indicated by a bandwidth adjustment strategy, that affect the memory bandwidth occupancy rate of the memory medium to be accessed, and uses, from the adjusted remaining bandwidth of the memory medium to be accessed, a first bandwidth that meets the bandwidth requirement. In this way, by dynamically adjusting the remaining bandwidth of the memory medium, the memory medium to be accessed can provide enough bandwidth for the applications run by the processor, ensuring the overall speed at which the processor accesses memory.

Description

Method, Apparatus, Processor, and Computing Device for Controlling Memory Bandwidth

This application claims priority to Chinese patent application No. 202111166082.3, entitled "Method, Apparatus, Processor, and Computing Device for Controlling Memory Bandwidth", filed with the China National Intellectual Property Administration on September 30, 2021, the entire contents of which are incorporated herein by reference.

Technical Field

This application relates to the field of computer technology, and in particular to a method, apparatus, processor, and computing device for controlling memory bandwidth.
Background

With the development of multi-core processors, the number of cores in a single processor has gradually increased, and processor computing speed keeps improving. Because memory access speed and memory capacity lag seriously behind processor computing speed, the "memory wall" problem has become increasingly prominent. At present, multiple types of storage media are configured in a computer system to increase memory capacity. However, the larger the storage capacity of a storage medium, the slower its access speed and the larger its latency. Increasing the memory capacity of a computer system therefore also lowers the overall access speed of memory. Hence, how to ensure the overall speed at which the processor accesses memory while increasing memory capacity is a problem to be solved urgently.

Summary

This application provides a method, apparatus, processor, and computing device for controlling memory bandwidth, thereby ensuring the overall speed at which the processor accesses memory while increasing memory capacity.
In a first aspect, a method for controlling memory bandwidth is provided. A hybrid memory system includes multiple processors and multiple different types of memory media. A first processor is associated with at least two different types of memory media, and the first processor is any one of the multiple processors. The method is performed by the first processor and specifically includes the following steps: after the first processor acquires the bandwidth required by the memory medium to be accessed, it acquires the memory bandwidth occupancy rate of the memory medium to be accessed; if it determines from the memory bandwidth occupancy rate that the memory medium to be accessed cannot meet the bandwidth requirement, it adjusts the memory bandwidth occupancy rate according to the factors, indicated by a bandwidth adjustment strategy, that affect the memory bandwidth occupancy rate of the memory medium to be accessed, and uses, from the adjusted remaining bandwidth of the memory medium to be accessed, a first bandwidth that meets the bandwidth requirement. In this way, by dynamically adjusting the remaining bandwidth of the memory medium, the memory medium to be accessed can provide enough bandwidth for the applications run by the processor, ensuring the overall speed at which the processor accesses memory.

The factors affecting the memory bandwidth occupancy rate of the memory medium to be accessed include at least one of a user-oriented application and a system-oriented application run by the first processor. For example, user-oriented applications include big-data applications, database applications, and cloud-service applications; system-oriented applications include operating-system management applications, memory copying, and data migration.

In a possible implementation, adjusting the memory bandwidth occupancy rate according to the bandwidth adjustment strategy includes: controlling how the factors affecting the memory bandwidth occupancy rate of the memory medium to be accessed occupy the bandwidth of the memory medium to be accessed, to obtain the adjusted remaining bandwidth, where the adjusted remaining bandwidth is greater than the remaining bandwidth before adjustment.

By way of example, controlling how the factors affecting the memory bandwidth occupancy rate of the memory medium to be accessed occupy the bandwidth of the memory medium to be accessed includes: determining the limited available bandwidth of the memory medium to be accessed according to the remaining bandwidth and a bandwidth threshold; and controlling, according to the limited available bandwidth, how those factors access the memory medium to be accessed, to obtain the adjusted remaining bandwidth. The bandwidth threshold is obtained from the total bandwidth of the memory medium to be accessed and an adjustment factor.
In another possible implementation, the at least two different types of memory media associated with the first processor include a first memory medium and a second memory medium, where the access speed of the first memory medium is greater than that of the second memory medium. The memory medium to be accessed is the first memory medium or the second memory medium associated with the first processor; or/and, the memory medium to be accessed is the first memory medium or the second memory medium associated with a processor adjacent to the first processor.

In another possible implementation, using, from the adjusted remaining bandwidth of the memory medium to be accessed, the first bandwidth that meets the bandwidth requirement includes: accessing, based on the first bandwidth, memory space in the memory medium to be accessed that is allocated with a preset memory allocation granularity, where the preset memory allocation granularity is greater than the page size of the memory medium.

In another possible implementation, the first processor is connected to the multiple different types of memory media through interfaces supporting memory semantics, including at least one of Compute Express Link (CXL™), Cache Coherent Interconnect for Accelerators (CCIX), or unified bus (UB or Ubus).

In another possible implementation, the first memory medium is dynamic random access memory (DRAM), and the second memory medium is storage-class memory (SCM). The SCM includes at least one of phase-change memory (PCM), magnetoresistive random-access memory (MRAM), resistive random access memory (RRAM/ReRAM), ferroelectric random access memory (FRAM), fast NAND, or nano random access memory (Nano-RAM, NRAM).

In another possible implementation, the hybrid memory system is applied to scenarios where large-capacity memory is deployed, including at least one of big data, in-memory databases, or cloud services.

In another possible implementation, the hybrid memory system is a server or a server cluster, where a server cluster includes two or more servers.
In a second aspect, an apparatus for controlling memory bandwidth is provided. The apparatus includes modules for performing the method for controlling memory bandwidth in the first aspect or any possible design of the first aspect.

In a third aspect, a processor is provided. The processor is associated with at least two different types of memory media and is configured to perform the operation steps of the method for controlling memory bandwidth in the first aspect or any possible design of the first aspect.

In a fourth aspect, a computing device is provided. The computing device includes at least one processor, a memory, and multiple different types of memory media. The memory is used to store a set of computer instructions; when the processor executes the set of computer instructions, the operation steps of the method for controlling memory bandwidth in the first aspect or any possible implementation of the first aspect are performed.

In a fifth aspect, a computer system is provided. The computer system includes a memory, at least one processor, and multiple different types of memory media, where each processor is associated with at least two different types of memory media. The memory is used to store a set of computer instructions; when the processor executes the set of computer instructions, the operation steps of the method for controlling memory bandwidth in the first aspect or any possible implementation of the first aspect are performed.

In a sixth aspect, a computer-readable storage medium is provided, including computer software instructions. When the computer software instructions run in a computing device, the computing device is caused to perform the operation steps of the method described in the first aspect or any possible implementation of the first aspect.

In a seventh aspect, a computer program product is provided. When the computer program product runs on a computer, a computing device is caused to perform the operation steps of the method described in the first aspect or any possible implementation of the first aspect.

In an eighth aspect, a chip system is provided. The chip system includes a first processor and at least two different types of memory media associated with the first processor, and is used to implement the functions of the first processor in the method of the first aspect. In a possible design, the chip system further includes a memory for storing program instructions and/or data. The chip system may consist of a chip, or may include a chip and other discrete components.

On the basis of the implementations provided in the above aspects, this application may be further combined to provide more implementations.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a three-layer storage system provided by an embodiment of this application;

FIG. 2 is a schematic diagram of a hybrid memory system provided by an embodiment of this application;

FIG. 3 is a schematic flowchart of a method for controlling memory bandwidth provided by an embodiment of this application;

FIG. 4 is a schematic structural diagram of an apparatus for controlling memory bandwidth provided by this application;

FIG. 5 is a schematic structural diagram of a computing device provided by this application;

FIG. 6 is a schematic diagram of a computer system provided by this application.

Detailed Description

A memory is a device used to store programs and various data. The larger the storage capacity of a memory, the slower its access speed; conversely, the smaller the capacity, the faster the access speed. Access speed refers to the data transfer speed when writing data to or reading data from the memory, and may also be called read/write speed. To improve the system performance of a computer system, memories can be divided into different layers according to storage capacity and access speed.
FIG. 1 is a schematic diagram of a three-layer storage system provided by an embodiment of this application. From the first layer to the third layer, storage capacity increases level by level, while access speed and cost decrease level by level. As shown in FIG. 1, the first layer includes the registers 111, level-1 cache 112, level-2 cache 113, and level-3 cache 114 located in the central processing unit (CPU). The memories in the second layer can serve as the main memory of the computer system, for example, dynamic random access memory 121 and double data rate synchronous DRAM (DDR SDRAM) 122. The main memory, or simply memory, is the memory that exchanges information with the CPU. The memories in the third layer can serve as the auxiliary memory of the computer system, for example, network storage 131, solid state drive (SSD) 132, and hard disk drive (HDD) 133. Auxiliary memory, also called external storage, has a larger storage capacity but slower access speed than main memory. It can be seen that the closer a memory is to the CPU, the smaller its capacity, the faster its access speed, the larger its bandwidth, and the smaller its access latency. For example, the access latency of the DRAM 121 may range from 50 to 100 nanoseconds (ns), the access latency of the network storage 131 may range from 1 to 1000 microseconds (μs), the access latency of a solid state drive may be on the order of 100 μs, and the access latency of a hard disk drive may be on the order of 1 millisecond (ms). Therefore, the memories in the third layer can serve as back-end storage devices, while the memories in the second layer can serve as cache devices for storing data frequently accessed by the CPU, significantly improving the access performance of the system.

With the development of multi-core processors, the number of cores in a single processor has gradually increased and processor computing speed keeps improving, so processors demand ever higher memory access speed and memory capacity. In a possible implementation, multiple different types of storage media are used together as memory to increase memory capacity, allocating as many memory resources and as much memory bandwidth as possible to each processor core to meet the processor's demands on memory access speed and capacity. A system that uses multiple different types of storage media as memory may be called a hybrid memory system, and a storage medium used as memory in it may be called a memory medium. For example, the multiple different types of memory media include a first memory medium and a second memory medium, where the storage capacity of the first memory medium is lower than that of the second memory medium, its access speed is higher, its access latency is lower, and its cost is higher.
Storage-class memory (SCM) combines the advantages of memory with the characteristics of storage; simply put, it is a new type of non-volatile memory medium. SCM is non-volatile, has extremely short access times, is low-cost per bit, and is solid-state with no moving parts. There are many current SCM media technologies, among which phase-change memory (PCM) is the most prominent and typical, and one of the earliest memory-class media technologies with products on the market, for example, Intel® Optane™ Memory developed based on 3D XPoint. In addition, SCM includes other types such as magnetoresistive random-access memory (MRAM), resistive random access memory (RRAM/ReRAM), ferroelectric random access memory (FRAM), fast NAND, and nano random access memory (Nano-RAM, NRAM).

The storage capacity of SCM can be several hundred gigabytes (GB), and its access latency can range from 120 to 400 ns. SCM can sit in the second layer of the storage-system hierarchy shown in FIG. 1, for example, storage-class memory 123. Since SCM features both large capacity and fast access, it can be used together with the other storage media of the second layer as memory media in a hybrid memory system; for example, DDR and SCM, or DRAM and SCM, can serve as the memory media of a hybrid memory system.
By way of example, FIG. 2 is a schematic diagram of a hybrid memory system provided by an embodiment of this application. As shown in FIG. 2, the hybrid memory system 200 includes multiple processors and multiple different types of memory media. The processors are connected through quick path interconnect (QPI) (also called common system interface, CSI). Each of the processors may be associated with at least one type of memory medium; that is, some processors are associated with one type of memory medium, and some with two or more types. For example, the processors include processor 210 and processor 240; the memory media include a first memory medium and a second memory medium; the first memory medium may include DRAM 220 and DRAM 250, and the second memory medium may include SCM 230 and SCM 260. The processor 210 is connected to the first and second memory media through interfaces supporting memory semantics, as is the processor 240. The interfaces include at least one of Compute Express Link (CXL™), Cache Coherent Interconnect for Accelerators (CCIX), or unified bus (UB or Ubus). The processor 210 accesses the DRAM 220 for data reads and writes as fast as possible through a parallel interface (e.g., UB), increasing the speed at which the processor 210 processes data. The processor 210 connects the SCM 230 through a higher-rate serial interface (e.g., CXL) to expand memory; more memory channels can be extended to obtain more memory bandwidth and capacity, addressing the matching between processor cores and memory. In addition, the access latency of memory connected through a serial interface is larger than that of memory connected through a parallel interface; adjusting the configuration ratio between serially connected and parallel-connected memory satisfies the processor's demands on memory access speed and capacity.

The processor 210 further includes an integrated memory controller (iMC) 211 for memory management and control, and multiple processor cores. The cores may be further divided into multiple computing clusters, each including multiple cores; for example, as shown in FIG. 2, computing cluster 1 includes cores 1 to 8, and computing cluster 2 includes cores 9 to 16. The clusters communicate through a network on chip (NoC) 212, which implements communication between cores in different clusters. For an X86-architecture processor, the network on chip 212 may be a node controller (NC); for a processor of the advanced reduced instruction set computing machines (ARM) architecture, it may be a chip or logic circuit implementing inter-core communication. Each computing cluster is connected to different types of memory media through multiple integrated memory controllers 211. Optionally, all cores in the processor 210 may also be grouped into a single computing cluster.
值得说明的是,图2所示的混合内存***200中仅以处理器210和处理器240为例进行说明,具体实施时,混合内存***200中可以包括两个或两个以上处理器,每个处理器分别通过iMC与不同类型的内存介质相连。
可选地,iMC 211除了如图2所示被集成在混合内存***200的处理器中外,也可以以处理器外的片外芯片形式作为混合内存***200中一个端点设备(endpoint),此时,iMC 211作为内存扩展器。
可选地,混合内存***200为混合内存***,除了包括第一内存介质和第二内存介质之外,混合内存***200中还可以包括其他类型的内存介质,该内存介质的类型与第一内存介质和第二内存介质的类型不同,例如,还可以在混合内存***200中添加随机存取存储器(random access memory,RAM)或静态随机存取存储器(static RAM,SRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data date SDRAM,DDR SDRAM)等类型的内存介质中至少一种,此时,混合内存***200中包括多种类型的混合内存介质。为了便于描述,本申请的以下实施例中以混合内存***200中仅包括第一内存介质和第二内存介质,且第一内存介质为DRAM,第二内存介质可以为SCM为例进行说明。
需要说明的是,在初始化阶段,混合内存系统的处理器上运行的操作系统可以根据内存介质的类型为各个处理器分配不同等级的内存介质,并记录处理器和内存介质之间的对应关系,以便基于上述处理器和不同等级的内存介质的对应关系执行数据读取或写入操作。
每个处理器可以根据分级内存机制被分配内存资源,分级内存机制用于指示混合内存系统中多种不同类型的内存介质的等级,混合内存系统中的内存介质可以被划分为多个等级。具体地,由于不同厂商生产的内存介质的物理属性可能存在差异,处理器可以根据内存介质的物理属性将多级内存系统中的内存介质划分为不同等级,其中,物理属性包括时延、成本、寿命和内存容量中至少一种,即处理器可以根据时延、成本、寿命和内存容量中至少一种将多级内存系统中的内存介质划分为多个等级,多个等级可以按照第一等级、第二等级等从高到低的顺序排列。例如,由于DRAM的存取速度大于SCM的存取速度,则DRAM的访问时延小于SCM的访问时延,因此,可以将DRAM作为第一等级的内存介质,将SCM作为第二等级的内存介质,其中,第一等级高于第二等级。
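上述按物理属性划分内存介质等级的过程,可以用如下示意性的Python代码说明;这里仅以访问时延这一单一属性为例,介质名称与时延数值均为假设,实际实现还可综合成本、寿命、容量等属性。

```python
def rank_media(media_latency_ns):
    """按访问时延从低到高排序,时延越低等级越高(等级1为最高等级)。"""
    ordered = sorted(media_latency_ns.items(), key=lambda kv: kv[1])
    return {name: level for level, (name, _) in enumerate(ordered, start=1)}

# 假设的访问时延(单位:ns)
levels = rank_media({"DRAM": 80, "SCM": 250})
```

按该示例,DRAM被划为第一等级、SCM被划为第二等级,与正文的划分一致。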
此外,同一种类型的内存介质可以被划分为一个或多个等级,例如,根据物理属性中至少一种将同一种类型的内存介质划分为两个或两个以上等级。例如,由于不同厂商生产DRAM的成本可能存在差异,可以将低成本的DRAM作为第一等级的内存介质,将高成本的DRAM作为第二等级的内存介质。
如此,通过配置多种不同类型的内存介质来扩展系统的内存容量,使得处理器运行应用时可以获得尽可能多的内存资源。另外,系统中的任意一个处理器(如:第一处理器)获取到内存分配请求后,依据分配策略所指示的多种不同类型的内存介质的物理属性(如:物理属性包含内存容量、访问时延、成本或使用寿命中至少一种),从多种不同类型的内存介质中确定待分配的内存资源,根据分配策略为逻辑地址分配所述内存资源,确保第一处理器访问分配的内存资源的访问时延尽可能低,使内存介质的存取速度和内存容量尽可能匹配处理器的计算速度。另外,在不影响性能的前提下,通过低成本的大容量SCM和低时延的DRAM作为混合内存介质的组合,分级存储不同数据,可以降低系统的硬件成本。
相比在应用使用内存资源前预先将内存以大页面划分为大页资源、处理器使用静态分配的大页资源的方式,本申请中处理器根据多种不同类型的内存介质的物理属性给运行的应用动态地分配内存资源,保证系统在完成初始化内存分配后处于内存性能最佳的状态,降低对应用性能的影响。另外,处理器可以根据释放指令释放上述根据多种不同类型的内存介质的物理属性为运行的应用分配的内存资源,使内存资源可以被其他应用所使用,提高内存资源的利用率。
但是,SCM的存取速度低于DRAM的存取速度,相比处理器仅使用DRAM读写数据时的系统总带宽,处理器混合使用SCM和DRAM读写数据时的系统总带宽有所降低,因此,反而可能增加处理器运行应用时访问内存的访问时延,如表1所示。
表1
(表1的内容以图像形式提供(Figure PCTCN2022120293-appb-000003),未随文本提取,其对比了顺序读和随机读场景下,5线程仅访问DRAM与5线程访问DRAM、另外5线程访问SCM时的系统总带宽。)
由表1可知,无论是顺序读还是随机读,5线程访问DRAM的系统总带宽均大于5线程访问DRAM且另外5线程访问SCM时的系统总带宽。相对于5线程仅访问DRAM时的系统总带宽,5线程访问DRAM且另外5线程访问SCM时的系统总带宽最大降低达87%。
本申请提供的控制内存带宽的方法中,第一处理器获取访问待访问内存介质所需的带宽(即带宽需求)后,获取待访问内存介质的内存带宽的占用率,若根据内存带宽的占用率确定待访问内存介质无法满足带宽需求,则根据带宽调整策略所指示的、依据影响待访问内存介质的内存带宽的占用率的因素调整内存带宽的占用率,从待访问内存介质的调整后剩余带宽中使用满足带宽需求的第一带宽。如此,通过动态调整内存介质的剩余带宽,使待访问内存介质可以提供足够的带宽给处理器运行的应用所使用,从而确保处理器访问内存的整体存取速度。
其中,影响待访问内存介质的内存带宽的占用率的因素包括第一处理器运行的面向用户应用和面向系统应用中至少一个。例如,面向用户应用包括大数据应用、数据库应用和云服务应用。面向系统应用包括操作系统管理应用、内存拷贝和数据迁移。
其中,第一处理器关联的至少两种不同类型的内存介质包含第一内存介质和第二内存介质。第一处理器所访问的内存介质(也即是待访问内存介质)可以为第一处理器关联的第一内存介质或第一处理器关联的第二内存介质;或/和,第一处理器所访问的内存介质为第一处理器临近的处理器关联的第一内存介质或第一处理器临近的处理器关联的第二内存介质。
接下来,结合图3对本申请实施例提供的控制内存带宽的方法进行详细阐述。图3为本申请实施例提供的一种控制内存带宽的方法的流程示意图。在这里以混合内存系统200为例进行说明。DRAM 220和DRAM 250作为第一等级的内存介质。SCM 230和SCM 260作为第二等级的内存介质。处理器210关联DRAM 220和SCM 230。处理器240关联DRAM 250和SCM 260。假设处理器210运行应用的过程中进行内存介质访问。如图3所示,该方法包括以下步骤。
步骤310、处理器210获取待访问内存介质的带宽需求。
带宽需求为处理器210访问待访问内存介质所需的带宽。可以理解的,带宽需求即处理器210访问待访问内存介质时,期望该内存介质在单位时间内能够处理的数据量。
步骤320、处理器210获取待访问内存介质的内存带宽的占用率。
处理器210可以统计运行的面向用户应用和/或面向系统应用所占用的待访问内存介质的实时带宽。处理器210根据待访问内存介质的实时带宽和待访问内存介质的总带宽的比值可以确定待访问内存介质的内存带宽的占用率。
例如,处理器210可以统计系统中发生的特定硬件事件。特定硬件事件例如包括缓存未命中(Cache Miss)或者分支预测错误(Branch Misprediction)等。结合多个事件可以计算出每指令周期数(CPI)、缓存命中率等性能数据。处理器210通过读取特定硬件事件或性能数据来计算处理器210所访问的内存介质的实时带宽。
处理器210还可以获取其所访问的内存介质的剩余带宽。处理器210可以根据总带宽和内存介质的实时带宽之差确定内存介质的剩余带宽。总带宽是指由内存介质的硬件所决定的带宽。
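步骤320中占用率与剩余带宽的计算,可以概括为如下示意性的Python代码;其中的带宽数值为假设值,实际的实时带宽需通过统计上述硬件事件得到。

```python
def bandwidth_stats(realtime_gbps, total_gbps):
    """返回(占用率, 剩余带宽):占用率=实时带宽/总带宽,剩余带宽=总带宽-实时带宽。"""
    occupancy = realtime_gbps / total_gbps
    remaining = total_gbps - realtime_gbps
    return occupancy, remaining

# 假设总带宽100GB/s,统计得到的实时带宽为60GB/s
occ, rem = bandwidth_stats(realtime_gbps=60.0, total_gbps=100.0)
```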
处理器210可以获取其可以访问的DRAM 220、SCM 230、DRAM 250或SCM 260中至少一个的剩余带宽。为了便于描述,以处理器210访问DRAM 220,根据DRAM 220的内存带宽的占用率确定DRAM 220无法满足带宽需求,进而调整DRAM 220的内存带宽的占用率为例进行说明。
带宽调整条件包括内存介质的剩余带宽小于带宽阈值或剩余带宽不满足处理器210运行的应用的内存带宽需求中至少一个。
在一些实施例中,处理器210判断DRAM 220的剩余带宽是否小于带宽阈值。如果DRAM 220的剩余带宽大于或等于带宽阈值,表示DRAM 220的剩余带宽比较充足,可以支持分配给处理器210运行应用所需的带宽,无需进行带宽调整。如果DRAM 220的剩余带宽小于带宽阈值,表示DRAM 220的带宽被占用得过多,可能无法支持分配给处理器210运行应用所需的带宽,执行步骤330。
带宽阈值是根据内存介质的总带宽和调整因子得到的。例如,处理器210所访问的内存介质为DRAM 220,带宽阈值满足如下公式(1)。
P_DRAM = α·B_DRAM      公式(1)
其中,P_DRAM表示DRAM的带宽阈值,B_DRAM表示DRAM的总带宽,α表示调整因子,调整因子的取值范围为0~1。
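公式(1)的阈值计算及剩余带宽与阈值的比较,可以用如下示意性的Python代码表示;α与各带宽数值均为假设值。

```python
def need_adjustment(remaining_gbps, total_gbps, alpha):
    """按公式(1)计算带宽阈值P = α·B;剩余带宽小于阈值时返回True,表示需执行步骤330。"""
    threshold = alpha * total_gbps
    return remaining_gbps < threshold

# 假设总带宽100GB/s、α取0.2,则阈值为20GB/s;剩余带宽15GB/s低于阈值,需要调整
adjust = need_adjustment(remaining_gbps=15.0, total_gbps=100.0, alpha=0.2)
```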
在另一些实施例中,处理器210可以获取其运行的应用的内存带宽需求。处理器210判断DRAM 220的剩余带宽是否满足内存带宽需求。如果DRAM 220的剩余带宽满足内存带宽需求,无需进行带宽调整。如果DRAM 220的剩余带宽不满足内存带宽需求,执行步骤330。
步骤330、处理器210根据带宽调整策略调整DRAM 220的内存带宽的占用率,从DRAM 220的调整后剩余带宽中使用满足带宽需求的第一带宽。
处理器210控制影响DRAM 220的内存带宽的占用率的因素占用DRAM 220的带宽,得到调整后剩余带宽,所述调整后剩余带宽大于调整前剩余带宽。可以理解的,DRAM 220的内存带宽的调整后占用率小于DRAM 220的内存带宽的调整前占用率,也即是,处理器210限制了上述因素对DRAM 220的带宽的使用。
例如,由于SCM的硬件成本低于DRAM的硬件成本,由SCM存储冷数据,可以降低系统的存储成本。DRAM的存取速度高于SCM的存取速度,由DRAM存储热数据,可以降低处理器访问热数据的访问时延,提高处理器210处理数据的速度。
热数据,是指单位周期内同一数据被访问的次数大于第一阈值的数据。
冷数据,则是指单位周期内同一数据被访问的次数小于或等于第二阈值的数据。其中,第一阈值和第二阈值可以相同或不同,当第一阈值和第二阈值不同时,第一阈值大于第二阈值。
值得说明的是,处理器210中包括用于记录页表访问标志位(access bit)的寄存器,处理器210可以确定固定周期内一个内存页是否被访问,并统计被访问的次数,通过各个内存页访问次数的分布,定义上述第一阈值和第二阈值,进而判断数据的冷热。
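上述冷热数据的判定逻辑可以概括为如下示意性的Python代码;其中第一阈值、第二阈值与访问次数均为假设值,实际阈值需根据各内存页访问次数的分布确定。

```python
def classify_page(access_count, first_threshold, second_threshold):
    """单位周期内访问次数大于第一阈值的数据为热数据,小于或等于第二阈值的为冷数据。"""
    assert first_threshold >= second_threshold
    if access_count > first_threshold:
        return "hot"
    if access_count <= second_threshold:
        return "cold"
    return "neutral"  # 介于两阈值之间,既不判为热也不判为冷

# 假设第一阈值为50次、第二阈值为10次
label = classify_page(access_count=100, first_threshold=50, second_threshold=10)
```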
因此,存储在SCM 230的冷数据有可能变为热数据,如果由SCM 230存储热数据,由于处理器210频繁访问SCM 230,导致系统的整体访问时延增加。存储在DRAM 220的热数据有可能变为冷数据,如果由DRAM 220存储冷数据,则会浪费DRAM 220的存储空间。
由此,处理器210可以根据混合内存系统中数据分布确定数据迁移策略,进而实现带有冷热属性标识的迁移数据集在不同内存介质之间的迁移处理,降低系统的存储成本。如果处理器210进行频繁的数据迁移,即将DRAM 220中存储的冷数据迁移到SCM 230,将SCM 230中存储的热数据迁移到DRAM 220,则会占用过多的内存介质的带宽,即内存带宽的占用率较高,处理器210所访问的内存介质的剩余带宽可能无法支持分配给处理器210运行应用所需的带宽,此时处理器210可以根据本申请提供的控制内存带宽的方法对内存介质的剩余带宽进行带宽调整。
示例地,处理器210根据剩余带宽和带宽阈值确定DRAM 220的限制可用带宽;根据限制可用带宽控制影响DRAM 220的内存带宽的占用率的因素访问DRAM 220,得到调整后剩余带宽。
限制可用带宽满足公式(2)。
S_DRAM = (P_DRAM - A_DRAM)·T·β + K    公式(2)
其中,S_DRAM表示限制可用带宽,P_DRAM表示带宽阈值,A_DRAM表示剩余带宽,β表示整数系数,K表示常数,T表示时间。
在一些实施例中,如果T大于0,S_DRAM表示单位时间内迁入迁出DRAM 220的页面数,即单位时间内允许迁入迁出S_DRAM个DRAM的页面。若将DRAM替换为SCM,则S_SCM表示单位时间内迁入迁出SCM 230的页面数。
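公式(2)及其页面数解释可以用如下示意性的Python代码表示;β、K、T以及各带宽数值均为假设值,结果为负时按0处理是本示例自行补充的边界约定,并非本申请限定的处理方式。

```python
def migration_page_quota(threshold, remaining, t, beta, k):
    """按公式(2)计算S = (P - A)·T·β + K,并将其解释为单位时间内允许迁入迁出的页面数。"""
    s = (threshold - remaining) * t * beta + k
    return max(0, int(s))  # 本示例约定:结果为负时不允许迁移页面

# 假设阈值20GB/s、剩余带宽12GB/s、T=1、β=2、K=4
quota = migration_page_quota(threshold=20.0, remaining=12.0, t=1, beta=2, k=4)
```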
需要说明的是,按照当前计算机领域的发展,内存页往往被划分为不同规格,例如4KB、2MB和1GB,其中4KB的内存页也被称为小页或小页内存,2MB或1GB的内存页则被称为大页或大页内存。或者,将大小大于第三阈值的内存页称为大页或大页内存,将大小小于或等于第四阈值的内存页称为小页或小页内存。其中,第三阈值和第四阈值可以相同也可以不同,具体实施时,可以根据业务需求进行配置。
作为一种可能的实现方式,对于大容量内存的场景,为了提升数据处理的效率,往往使用大页内存进行数据的处理。例如,处理器依据内存分配请求分配内存时,可以以大页分配内存,相对于处理器以4KB小页分配内存,可以减少处理器访问转换监视缓冲器(translation lookaside buffer,TLB)或页表发生未命中的概率,并减少处理器在访问内存时产生的中断次数。因此,迁入迁出的页面可以指大页面。其中,转换监视缓冲器也称为页表缓冲器,是一块位于处理器内的高速存储单元,其中存放的是一些页表项(虚拟地址到物理地址的转换表,page table)。如果页表存储在主存储器中,查询页表所付出的代价将会很大,而位于处理器内部的TLB则可以提高从虚拟地址到物理地址的转换效率。
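大页减少TLB未命中概率的效果,可以通过比较覆盖同一内存区域所需的页数来直观理解,如下示意性的Python代码所示;区域大小为假设值。

```python
def page_entries(region_bytes, page_bytes):
    """覆盖region_bytes大小的区域所需的页数;页数越少,TLB未命中的概率越低。"""
    return region_bytes // page_bytes

region = 1 << 30                                    # 假设分配1GB内存
small_pages = page_entries(region, 4 * 1024)        # 4KB小页
huge_pages = page_entries(region, 2 * 1024 * 1024)  # 2MB大页
```

同样覆盖1GB区域,2MB大页所需的页表项数仅为4KB小页的1/512。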
处理器210可以基于第一带宽访问DRAM 220中以预设内存分配粒度分配的内存空间,所述预设内存分配粒度大于内存介质的页大小,例如,预设内存分配粒度为大页面。
在实际应用中,处理器210可以根据上述步骤310至步骤330对DRAM 220、SCM 230、DRAM 250或SCM 260中至少一个的剩余带宽进行带宽调整,确保DRAM 220、SCM 230、DRAM 250和SCM 260有足够可分配给应用的剩余带宽。
如此,通过该协同控制流程,可以将迁移数据所消耗的SCM和DRAM的带宽控制在指定的带宽阈值之下,从而确保分配给应用的剩余带宽大于总带宽的某一固定比例,可为应用提供这一比例下的服务质量(Quality of Service,QoS)保障。
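将步骤310至步骤330与上述迁移带宽控制串联起来,可以得到如下简化的示意性Python控制流程;其中的调整方式(将占用带宽压回阈值以内)只是一种便于说明的假设实现,并非本申请限定的具体调整手段,各数值亦为假设。

```python
def control_loop(demand, total, realtime, alpha):
    """返回(是否调整, 可供应用使用的剩余带宽)。"""
    threshold = alpha * total     # 公式(1):P = α·B
    remaining = total - realtime  # 步骤320:剩余带宽 = 总带宽 - 实时带宽
    if remaining >= threshold and remaining >= demand:
        return ("no_adjust", remaining)
    # 步骤330:限制数据迁移等因素占用的带宽,此处简化为将占用带宽压回 total - threshold
    adjusted_realtime = min(realtime, total - threshold)
    return ("adjusted", total - adjusted_realtime)

# 假设总带宽100GB/s、实时带宽95GB/s、α取0.2、应用需求10GB/s
state, usable = control_loop(demand=10, total=100, realtime=95, alpha=0.2)
```

该循环保证调整后分配给应用的剩余带宽不低于总带宽的固定比例α,对应正文所述的QoS保障。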
可以理解的是,为了实现上述实施例中的功能,混合内存系统包括了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本申请中所公开的实施例描述的各示例的单元及方法步骤,本申请能够以硬件或硬件和计算机软件相结合的形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用场景和设计约束条件。
上文中结合图1至图3,详细描述了根据本实施例所提供的控制内存带宽的方法,下面将结合图4,描述根据本实施例所提供的控制内存带宽装置。
图4为本实施例提供的可能的控制内存带宽装置的结构示意图。这些控制内存带宽装置可以用于实现上述方法实施例中第一处理器的功能,因此也能实现上述方法实施例所具备的有益效果。在本实施例中,该控制内存带宽装置可以是如图3所示的处理器210,还可以是应用于服务器的模块(如芯片)。
如图4所示,控制内存带宽装置400包括通信模块410、请求模块420、带宽感知模块430、决策模块440、调节模块450和访问模块460。控制内存带宽装置400用于实现上述图3中所示的方法实施例中处理器210的功能。
通信模块410用于与其他设备进行通信。例如,接收其他设备发送的带宽需求。
请求模块420用于获取待访问内存介质的带宽需求,所述带宽需求为所述第一处理器访问所述待访问内存介质所需的带宽。例如,请求模块420用于执行图3中步骤310。
带宽感知模块430,用于获取所述待访问内存介质的内存带宽的占用率。例如,带宽感知模块430用于执行图3中步骤320。
决策模块440,用于根据所述内存带宽的占用率确定所述待访问内存介质无法满足所述带宽需求。
调节模块450,用于根据带宽调整策略调整所述内存带宽的占用率,所述带宽调整策略指示依据影响所述待访问内存介质的内存带宽的占用率的因素调整内存带宽,影响所述待访问内存介质的内存带宽的占用率的因素包括所述第一处理器运行的面向用户应用和面向系统应用中至少一个。例如,调节模块450用于执行图3中步骤330。
访问模块460,用于从所述待访问内存介质的调整后剩余带宽中使用满足所述带宽需求的第一带宽。
可选地,调节模块450具体用于控制影响所述待访问内存介质的内存带宽的占用率的因素占用所述待访问内存介质的带宽,得到所述调整后剩余带宽,所述调整后剩余带宽大于调整前剩余带宽。
存储模块470对应上述方法实施例中的存储功能,用于存储带宽调整策略。
应理解的是,本申请实施例的控制内存带宽装置400可以通过图形处理器(graphics processing unit,GPU)、神经网络处理器(neural network processing unit,NPU)、特定应用集成电路(application-specific integrated circuit,ASIC)或可编程逻辑器件(programmable logic device,PLD)实现,上述PLD可以是复杂可编程逻辑器件(complex programmable logical device,CPLD)、现场可编程门阵列(field-programmable gate array,FPGA)、通用阵列逻辑(generic array logic,GAL)或其任意组合。当通过软件实现图3所示的控制内存带宽的方法时,控制内存带宽装置400及其各个模块也可以为软件模块。
根据本申请实施例的控制内存带宽装置400可对应于执行本申请实施例中描述的方法,并且控制内存带宽装置400中的各个单元的上述和其它操作和/或功能分别为了实现图3中的各个方法的相应流程,为了简洁,在此不再赘述。
图5为本申请实施例提供的一种混合内存系统500的示意图,如图所示,所述混合内存系统500包括处理器510、多种不同类型的内存介质(例如,内存介质520、内存介质530)、通信接口540、存储介质550和总线560。其中,处理器510、内存介质520、内存介质530、通信接口540和存储介质550通过总线560进行通信,也可以通过无线传输等其他手段实现通信。其中,多种类型的内存介质(例如,内存介质520)可以用于存储计算机执行指令,处理器510可以调用内存介质520中存储的计算机执行指令以执行以下操作:
获取待访问内存介质的带宽需求,所述带宽需求为访问所述待访问内存介质所需的带宽;
获取所述待访问内存介质的内存带宽的占用率;
根据所述内存带宽的占用率确定所述待访问内存介质无法满足所述带宽需求;
根据带宽调整策略调整所述内存带宽的占用率,从所述待访问内存介质的调整后剩余带宽中使用满足所述带宽需求的第一带宽,所述带宽调整策略指示依据影响所述待访问内存介质的内存带宽的占用率的因素调整内存带宽。
应理解,在本申请实施例中,处理器510可以是CPU,例如,X86架构的处理器或ARM架构的处理器。该处理器510还可以是其他通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件、片上系统(system on chip,SoC)、图形处理器(graphic processing unit,GPU)、人工智能(artificial intelligent,AI)芯片等。通用处理器可以是微处理器或者是任何常规的处理器等。
内存介质520可以包括只读存储器和随机存取存储器,并向处理器510提供指令和数据。内存介质520还可以包括非易失性随机存取存储器。内存介质520可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。可选地,内存介质520还可以是存储级内存SCM,SCM包括相变存储器PCM、磁性随机存储器MRAM、电阻型随机存储器RRAM、铁电式存储器FRAM、快速NAND或纳米随机存储器NRAM中至少一种。
内存介质530的类型与内存介质520的类型类似,也可以为上述各种内存介质类型中任意一种,但在混合内存***500中,内存介质520和内存介质530的类型不同。
总线560除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都标为总线560。总线560可以是快捷外设部件互连标准(Peripheral Component Interconnect Express,PCIe)总线、扩展工业标准结构(extended industry standard architecture,EISA)总线、统一总线(unified bus,Ubus或UB)、计算机快速链接(compute express link,CXL)、缓存一致互联协议(cache coherent interconnect for accelerators,CCIX)等。总线560可以分为地址总线、数据总线、控制总线等。
值得说明的是,图5所示的混合内存系统500中虽然以一个处理器510为例,但具体实施时,混合内存系统500中可以包括多个处理器,且每个处理器中所包括的处理器核的个数不做限定。此外,图5仅以该系统包括两个不同类型的内存介质为例,具体实施中,可以包括更多内存介质,且内存介质的种类各不相同,以按照不同类型的内存介质的物理属性实现混合内存系统中数据的分级存储。
应理解,根据本申请实施例的混合内存系统500可对应于本申请实施例中的控制内存带宽装置400,并可以对应于执行根据本申请实施例的方法中的相应主体,并且混合内存系统500中的各个模块的上述和其它操作和/或功能分别为了实现图3中的各个方法的相应流程,为了简洁,在此不再赘述。
本申请还提供一种处理器,该处理器包括集成电路,所述集成电路与多种不同类型的内存介质相连,集成电路用于实现图3所示方法中各个操作步骤的功能,为了简洁,在此不再赘述。
由于本申请提供的控制内存带宽装置400中的各个模块可以分布式地部署在同一环境或不同环境中的多个计算机上,因此,本申请还提供一种如图6所示的计算机系统,该计算机系统包括多个计算机600,每个计算机600包括内存介质601、处理器602、通信接口603、总线604和内存介质605。其中,内存介质601、处理器602、通信接口603通过总线604实现彼此之间的通信连接。
内存介质601可以是只读存储器、静态存储设备、动态存储设备、随机存取存储器或者存储级内存中至少两个的组合。例如,内存介质包含DRAM和SCM。内存介质601可以存储计算机指令,当内存介质601中存储的计算机指令被处理器602执行时,处理器602和通信接口603用于执行软件系统的控制内存带宽的方法。内存介质601还可以存储数据集合,例如:内存介质601中的一部分存储资源被划分成一个区域,用于存储页表及实现本申请实施例的控制内存带宽的功能的程序。
内存介质605的类型与内存介质601的类型类似,也可以为上述各种内存介质类型中任意一种,但在计算机600中,内存介质605和内存介质601的类型不同。
处理器602可以采用通用的CPU,应用专用集成电路(application specific integrated circuit,ASIC),GPU或其任意组合。处理器602可以包括一个或多个芯片。处理器602可以包括AI加速器,例如NPU。
通信接口603使用例如但不限于收发器一类的收发模块,来实现计算机600与其他设备或通信网络之间的通信。例如,可以通过通信接口603获取内存分配请求等。
总线604可包括在计算机600各个部件(例如,内存介质601、内存介质605、处理器602、通信接口603)之间传送信息的通路。
上述每个计算机600间通过通信网络建立通信通路。每个计算机600上运行请求模块420、带宽感知模块430、决策模块440、调节模块450和访问模块460中的任意一个或多个。任一计算机600可以为云数据中心中的计算机(例如:服务器),或边缘数据中心中的计算机,或终端计算设备。
每个计算机600上都可以部署数据库、大数据、云服务等的功能。例如,GPU用于实现训练神经网络的功能。
本实施例中的方法步骤可以通过硬件的方式来实现,也可以由处理器执行软件指令的方式来实现。软件指令可以由相应的软件模块组成,软件模块可以被存放于随机存取存储器(random access memory,RAM)、闪存、只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)、寄存器、硬盘、移动硬盘、CD-ROM或者本领域熟知的任何其它形式的存储介质中。一种示例性的存储介质耦合至处理器,从而使处理器能够从该存储介质读取信息,且可向该存储介质写入信息。当然,存储介质也可以是处理器的组成部分。处理器和存储介质可以位于ASIC中。另外,该ASIC可以位于终端设备中。当然,处理器和存储介质也可以作为分立组件存在于网络设备或终端设备中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机程序或指令。在计算机上加载和执行所述计算机程序或指令时,全部或部分地执行本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、用户设备或者其它可编程装置。所述计算机程序或指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机程序或指令可以从一个网站站点、计算机、服务器或数据中心通过有线或无线方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是集成一个或多个可用介质的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,例如,软盘、硬盘、磁带;也可以是光介质,例如,数字视频光盘(digital video disc,DVD);还可以是半导体介质,例如,固态硬盘(solid state drive,SSD)。以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (24)

  1. 一种控制内存带宽的方法,其特征在于,混合内存***包含多个处理器和多种不同类型的内存介质,第一处理器关联至少两种不同类型的内存介质,所述第一处理器为所述多个处理器中任意一个处理器,所述方法由所述第一处理器执行,包括:
    获取待访问内存介质的带宽需求,所述带宽需求为所述第一处理器中访问所述待访问内存介质所需的带宽;
    获取所述待访问内存介质的内存带宽的占用率;
    根据所述内存带宽的占用率确定所述待访问内存介质无法满足所述带宽需求;
    根据带宽调整策略调整所述内存带宽的占用率,从所述待访问内存介质的调整后剩余带宽中使用满足所述带宽需求的第一带宽,所述带宽调整策略指示依据影响所述待访问内存介质的内存带宽的占用率的因素调整内存带宽。
  2. 根据权利要求1所述的方法,其特征在于,影响所述待访问内存介质的内存带宽的占用率的因素包括所述第一处理器运行的面向用户应用和面向***应用中至少一个。
  3. 根据权利要求2所述的方法,其特征在于,所述根据带宽调整策略调整所述内存带宽的占用率,包括:
    控制影响所述待访问内存介质的内存带宽的占用率的因素占用所述待访问内存介质的带宽,得到所述调整后剩余带宽,所述调整后剩余带宽大于调整前剩余带宽。
  4. 根据权利要求3所述的方法,其特征在于,控制影响所述待访问内存介质的内存带宽的占用率的因素占用所述待访问内存介质的带宽,包括:
    根据所述剩余带宽和带宽阈值确定所述待访问内存介质的限制可用带宽;
    根据所述限制可用带宽控制影响所述待访问内存介质的内存带宽的占用率的因素访问所述待访问内存介质,得到所述调整后剩余带宽。
  5. 根据权利要求4所述的方法,其特征在于,所述带宽阈值是根据所述待访问内存介质的总带宽和调整因子得到的。
  6. 根据权利要求1-5中任一项所述的方法,其特征在于,从所述待访问内存介质的调整后剩余带宽中使用满足所述带宽需求的第一带宽包括:
    基于所述第一带宽访问所述待访问内存介质中以预设内存分配粒度分配的内存空间,所述预设内存分配粒度大于内存介质的页大小。
  7. 根据权利要求1-6中任一项所述的方法,其特征在于,所述第一处理器关联的至少两种不同类型的内存介质包含第一内存介质和第二内存介质,所述第一内存介质的存取速度大于所述第二内存介质的存取速度;
    所述待访问内存介质为所述第一处理器关联的第一内存介质或所述第一处理器关联的第二内存介质;或/和,所述待访问内存介质为所述第一处理器临近的处理器关联的第一内存介质或所述第一处理器临近的处理器关联的第二内存介质。
  8. 根据权利要求7所述的方法,其特征在于,所述第一内存介质为动态随机存取存储器DRAM,所述第二内存介质为存储级内存SCM,所述SCM包括相变存储器PCM,磁性随机存储器MRAM、电阻型随机存储器RRAM,铁电式存储器FRAM,快速NAND或纳米随机存储器NRAM中至少一种。
  9. 根据权利要求1-8中任一项所述的方法,其特征在于,所述第一处理器与所述多种不同类型的内存介质通过支持内存语义的接口相连,所述接口包括支持计算机快速链接CXL、缓存一致互联协议CCIX或统一总线UB中至少一种接口。
  10. 根据权利要求1-9中任一项所述的方法,其特征在于,所述混合内存***应用于部署大容量内存的场景,所述场景包括大数据、内存型数据库或云服务中至少一种。
  11. 根据权利要求1-10中任一项所述的方法,其特征在于,所述混合内存***为服务器或服务器集群,所述服务器集群包括两个或两个以上服务器。
  12. 一种控制内存带宽的装置,其特征在于,混合内存***包含多种不同类型的内存介质,包括:
    请求模块,用于获取待访问内存介质的带宽需求,所述带宽需求为访问所述待访问内存介质所需的带宽;
    带宽感知模块,用于获取所述待访问内存介质的内存带宽的占用率;
    决策模块,用于根据所述内存带宽的占用率确定所述待访问内存介质无法满足所述带宽需求;
    调节模块,用于根据带宽调整策略调整所述内存带宽的占用率,所述带宽调整策略指示依据影响所述待访问内存介质的内存带宽的占用率的因素调整内存带宽;
    访问模块,用于从所述待访问内存介质的调整后剩余带宽中使用满足所述带宽需求的第一带宽。
  13. 根据权利要求12所述的装置,其特征在于,影响所述待访问内存介质的内存带宽的占用率的因素包括面向用户应用和面向***应用中至少一个。
  14. 根据权利要求13所述的装置,其特征在于,所述调节模块根据带宽调整策略调整所述内存带宽的占用率时,具体用于:
    控制影响所述待访问内存介质的内存带宽的占用率的因素占用所述待访问内存介质的带宽,得到所述调整后剩余带宽,所述调整后剩余带宽大于调整前剩余带宽。
  15. 根据权利要求14所述的装置,其特征在于,所述调节模块控制影响所述待访问内存介质的内存带宽的占用率的因素占用所述待访问内存介质的带宽时,具体用于:
    根据所述剩余带宽和带宽阈值确定所述待访问内存介质的限制可用带宽;
    根据所述限制可用带宽控制影响所述待访问内存介质的内存带宽的占用率的因素访问所述待访问内存介质,得到所述调整后剩余带宽。
  16. 根据权利要求15所述的装置,其特征在于,所述带宽阈值是根据所述待访问内存介质的总带宽和调整因子得到的。
  17. 根据权利要求12-16中任一项所述的装置,其特征在于,所述访问模块从所述待访问内存介质的调整后剩余带宽中使用满足所述带宽需求的第一带宽时,具体用于:
    基于所述第一带宽访问所述待访问内存介质中以预设内存分配粒度分配的内存空间,所述预设内存分配粒度大于内存介质的页大小。
  18. 根据权利要求12-17中任一项所述的装置,其特征在于,所述多种不同类型的内存介质包含第一内存介质和第二内存介质,所述第一内存介质的存取速度大于所述第二内存介质的存取速度;所述待访问内存介质为所述第一内存介质或所述第二内存介质。
  19. 根据权利要求18所述的装置,其特征在于,所述第一内存介质为动态随机存取存储器DRAM,所述第二内存介质为存储级内存SCM,所述SCM包括相变存储器PCM,磁性随机存储器MRAM、电阻型随机存储器RRAM,铁电式存储器FRAM,快速NAND或纳米随机存储器NRAM中至少一种。
  20. 根据权利要求12-19中任一项所述的装置,其特征在于,所述混合内存***应用于部署大容量内存的场景,所述场景包括大数据、内存型数据库或云服务中至少一种。
  21. 根据权利要求12-20中任一项所述的装置,其特征在于,所述混合内存***为服务器或服务器集群,所述服务器集群包括两个或两个以上服务器。
  22. 一种处理器,其特征在于,所述处理器关联至少两种不同类型的内存介质,所述处理器用于执行上述权利要求1-11中任一项所述的方法的操作步骤。
  23. 一种计算设备,其特征在于,所述计算设备包括存储器、处理器和多种不同类型的内存介质,所述处理器关联至少两种不同类型的内存介质,所述存储器用于存储一组计算机指令;当所述处理器执行所述一组计算机指令时,执行上述权利要求1-11中任一项所述的方法的操作步骤。
  24. 一种计算机***,其特征在于,所述计算机***包括存储器、至少一个处理器和多种不同类型的内存介质,每个处理器关联至少两种不同类型的内存介质,所述存储器用于存储一组计算机指令;当所述处理器执行所述一组计算机指令时,执行上述权利要求1-11中任一项所述的方法的操作步骤。
PCT/CN2022/120293 2021-09-30 2022-09-21 控制内存带宽的方法、装置、处理器及计算设备 WO2023051359A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22874746.5A EP4390685A1 (en) 2021-09-30 2022-09-21 Method and apparatus for controlling memory bandwidth, processor and computing device
US18/612,459 US20240231654A1 (en) 2021-09-30 2024-03-21 Method and Apparatus for Controlling Internal Memory Bandwidth, Processor, and Computing Device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111166082.3A CN115904689A (zh) 2021-09-30 2021-09-30 控制内存带宽的方法、装置、处理器及计算设备
CN202111166082.3 2021-09-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/612,459 Continuation US20240231654A1 (en) 2021-09-30 2024-03-21 Method and Apparatus for Controlling Internal Memory Bandwidth, Processor, and Computing Device

Publications (1)

Publication Number Publication Date
WO2023051359A1 true WO2023051359A1 (zh) 2023-04-06

Family

ID=85767806

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120293 WO2023051359A1 (zh) 2021-09-30 2022-09-21 控制内存带宽的方法、装置、处理器及计算设备

Country Status (4)

Country Link
US (1) US20240231654A1 (zh)
EP (1) EP4390685A1 (zh)
CN (1) CN115904689A (zh)
WO (1) WO2023051359A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881016A (zh) * 2023-09-06 2023-10-13 苏州浪潮智能科技有限公司 服务器进程的处理方法及装置、存储介质及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193646A (zh) * 2017-05-24 2017-09-22 中国人民解放军理工大学 一种基于混合主存架构的高效动态页面调度方法
US20180024750A1 (en) * 2016-07-19 2018-01-25 Sap Se Workload-aware page management for in-memory databases in hybrid main memory systems
US20190050261A1 (en) * 2018-03-29 2019-02-14 Intel Corporation Arbitration across shared memory pools of disaggregated memory devices
CN110502334A (zh) * 2018-05-17 2019-11-26 上海交通大学 基于混合内存架构的带宽感知任务窃取方法、***及芯片
CN112965885A (zh) * 2019-12-12 2021-06-15 中科寒武纪科技股份有限公司 访存带宽的检测方法、装置、计算机设备及可读存储介质


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881016A (zh) * 2023-09-06 2023-10-13 苏州浪潮智能科技有限公司 服务器进程的处理方法及装置、存储介质及电子设备
CN116881016B (zh) * 2023-09-06 2024-01-19 苏州浪潮智能科技有限公司 服务器进程的处理方法及装置、存储介质及电子设备

Also Published As

Publication number Publication date
US20240231654A1 (en) 2024-07-11
EP4390685A1 (en) 2024-06-26
CN115904689A (zh) 2023-04-04

Similar Documents

Publication Publication Date Title
US11119908B2 (en) Systems and methods for memory system management
US20200301582A1 (en) Data management scheme in virtualized hyperscale environments
US20210141731A1 (en) Proactive data prefetch with applied quality of service
TWI781439B (zh) 映射未經分類之記憶體存取至經分類之記憶體存取
US20160085585A1 (en) Memory System, Method for Processing Memory Access Request and Computer System
US11494311B2 (en) Page table hooks to memory types
US10949120B2 (en) Host defined bandwidth allocation for SSD tasks
WO2023051000A1 (zh) 内存管理方法、装置、处理器及计算设备
WO2023051715A1 (zh) 数据处理的方法、装置、处理器和混合内存***
TWI764265B (zh) 用於將資料連結至記憶體命名空間的記憶體系統
WO2014178856A1 (en) Memory network
US11755241B2 (en) Storage system and method for operating storage system based on buffer utilization
US20240231654A1 (en) Method and Apparatus for Controlling Internal Memory Bandwidth, Processor, and Computing Device
US20220050722A1 (en) Memory pool management
WO2016000470A1 (zh) 一种内存控制方法和装置
US20240086315A1 (en) Memory access statistics monitoring
CN116342365A (zh) 用于经由使用可用设备存储器扩展***存储器的技术
WO2023000696A1 (zh) 一种资源分配方法及装置
CN116028386A (zh) 缓存资源的动态分配
CN115495433A (zh) 一种分布式存储***、数据迁移方法及存储装置
WO2024098795A1 (zh) 内存管理方法、装置和相关设备
WO2024045846A1 (zh) 存储介质的迁移带宽调整方法、装置、***以及芯片
US11829618B2 (en) Memory sub-system QOS pool management
WO2023241655A1 (zh) 数据处理方法、装置、电子设备以及计算机可读存储介质
Jang et al. Dynamic Clustering Page Allocation for Read-Intensive Multimedia Streaming Applications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22874746

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022874746

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022874746

Country of ref document: EP

Effective date: 20240319

NENP Non-entry into the national phase

Ref country code: DE