WO2023227004A1 - Memory access heat statistics method, related apparatus and device - Google Patents

内存访问热度统计方法、相关装置及设备 (Memory access heat statistics method, related apparatus and device)

Info

Publication number
WO2023227004A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
address
data block
processor
access
Application number
PCT/CN2023/095923
Other languages
English (en)
French (fr)
Inventor
罗兴宇 (Luo Xingyu)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023227004A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system

Definitions

  • The present application relates to the field of computers, and in particular to a memory access heat statistics method, a controller, a chip, a memory, a motherboard and a computer device.
  • This application provides a memory access heat statistics method, a controller, a chip, a memory, a motherboard and a computer device, which increase the data processing speed of the system and reduce data processing latency.
  • A first aspect provides a memory access heat statistics method, which is executed by a controller.
  • The method includes: counting, based on the operations that an application program running on the processor of a computer device performs on the memory of the computer device, the frequency with which the processor accesses each data block in the memory, so that the access heat of the data block can be determined from its access frequency.
  • The access frequency directly reflects how frequently the data block is accessed by the application program.
  • For example, counting the access frequency of the data block in which a first address is located includes: counting the access frequency of the data block containing the unit storage space to which the first address belongs.
  • The size of the data block is a multiple of the unit storage space in the memory accessible to the processor. This is because the computer device reads and writes the memory storage space at cache line (cacheline) granularity but manages it at page granularity.
  • A page can include multiple cache lines.
  • For example, the size of a data block is the size of a page in the memory accessible to the processor of the computer device.
  • The size of the unit storage space can be the size of the cache line used when the processor of the computer device accesses the memory.
  • An operation on a cache line in memory by an application running on the processor can therefore be viewed as an operation on a page.
  • The processor operates the memory at cache line granularity, and each cache line belongs to some managed page, so each operation on a cache line counts as one read or write of the page to which that cache line belongs.
  • The controller counts the access frequency of the page to which each operated cache line belongs, which effectively improves the accuracy of identifying the access heat of the page. Because access frequency is counted at page granularity, the method is compatible with the page-based memory management of the computer device and is easy to deploy.
  • Interleaving means that the processor of the computer device distributes data across multiple memories for operation.
  • For example, an application program running on the processor operates the memory of the computer device in an interleaved manner, processing data over multiple memory channels to improve the memory bandwidth utilization and processing performance of the computer device.
  • In this case, the size of the data block may also be the size of the interleaved data block in the memory that the processor of the computer device accesses in the interleaved manner.
  • In a possible design, the method further includes: identifying a second address according to the address of the data block in which the first address is located and an address mapping relationship.
  • The second address indicates the location in the controller where the access frequency of the data block is stored.
  • The address mapping relationship indicates the mapping between the address of a data block and the address of the storage space in which its access frequency is stored. This makes it convenient for the controller to obtain the access frequency of the data block stored in the first storage medium according to the second address and to update that access frequency.
  • For example, the memory includes a first storage medium, which stores the access frequencies of the data blocks in the memory, and the second address indicates the storage space in the first storage medium that holds the access frequency of the data block.
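A minimal sketch of the address mapping relationship, assuming a linear layout in which the counters of consecutive data blocks occupy consecutive 4B slots starting at a base address in the first storage medium. The names `second_address`, `BLOCK_SIZE`, `COUNTER_WIDTH` and `COUNTER_BASE` are hypothetical; the text does not specify the layout.

```python
# Hypothetical parameters for illustration only.
BLOCK_SIZE = 4 * 1024   # data block = one 4KB page (assumed)
COUNTER_WIDTH = 4       # 4B access-frequency counter per data block (assumed)
COUNTER_BASE = 0x0      # start of the counter area in the first storage medium (assumed)


def second_address(first_address: int,
                   block_size: int = BLOCK_SIZE,
                   counter_width: int = COUNTER_WIDTH,
                   counter_base: int = COUNTER_BASE) -> int:
    """Map an accessed (first) address to where its block's counter is stored."""
    block_index = first_address // block_size  # data block containing the address
    return counter_base + block_index * counter_width


# Two cache lines in the same 4KB page share one counter slot.
print(hex(second_address(0x1040)))  # 0x4
print(hex(second_address(0x1FC0)))  # 0x4
print(hex(second_address(0x2000)))  # 0x8
```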
  • Because the computer device has already allocated the capacity of the memory particles in the memory, when the processor accesses the first storage medium, the computer device has to allocate a corresponding capacity to the first storage medium out of the capacity of the memory particles. If the physical address space of the first storage medium is larger than the capacity of the first storage medium, the controller maps the physical addresses that access the first storage medium to the first storage medium. As a result, the corresponding capacity in the memory particles cannot be used, wasting storage space of the memory particles.
  • For example, if the total capacity of the memory particles in the memory is 64GB, the page size is 4KB and the access frequency of each data block is 4B wide, then 64GB / 4KB = 16M data blocks, and 16M × 4B = 64MB of memory storage space is wasted.
  • In another possible design, the method further includes: identifying the second address according to a third address indicated by the processor and the address mapping relationship, and obtaining the access frequency of the data block, where the third address is determined from the second address. This saves storage space of the memory particles and improves its utilization.
  • In yet another possible design, the method further includes: the controller determines the access heat of the data block from its access frequency and triggers data migration based on the access heat. For example, the controller feeds the access heat of the data blocks back to the processor, and the processor migrates data blocks according to their access heat, storing cold data in remote memory and hot data in local memory. In this way, when data migration is triggered based on access heat, hot data is migrated to local memory and cold data to remote memory, so the processor can obtain data from local memory as quickly as possible. This increases the data processing speed of the system, reduces data processing latency and improves the overall processing performance of the system.
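The hot/cold decision that precedes migration can be sketched as a simple threshold split. The function `plan_migration` and the fixed threshold are hypothetical; the text only says that hot data goes to local memory and cold data to remote memory, not how "hot" is decided.

```python
from typing import Dict, List, Tuple


def plan_migration(freq: Dict[int, int], hot_threshold: int) -> Tuple[List[int], List[int]]:
    """Split data blocks (keyed by block number) into hot and cold lists.

    Hot blocks are candidates for local (near) memory, cold blocks for remote
    (far) memory. The fixed threshold is an assumption of this sketch.
    """
    hot = sorted(block for block, f in freq.items() if f >= hot_threshold)
    cold = sorted(block for block, f in freq.items() if f < hot_threshold)
    return hot, cold


hot, cold = plan_migration({0: 120, 1: 3, 2: 57, 3: 0}, hot_threshold=50)
print(hot, cold)  # [0, 2] [1, 3]
```

A real policy might instead rank blocks and fill local memory up to its capacity; the threshold keeps the sketch short.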
  • A second aspect provides a memory access heat statistics method.
  • The method is executed by the processor and includes: receiving the access frequencies of interleaved data blocks sent by multiple memories, and merging these access frequencies to obtain the access frequency of a page.
  • The access frequency of an interleaved data block indicates how frequently the processor, using the interleaving method, accesses the data block held by one of the multiple memories.
  • When the application program running on the processor of the computer device operates the memory in an interleaved manner, processing data over multiple memory channels to improve memory bandwidth utilization and processing performance, the processor merges the access frequencies of the interleaved data blocks sent by the multiple memories to obtain the access frequency of the page, so that the access heat of the page can be determined from that access frequency.
  • Hot data can then be migrated to local (near) memory and cold data to remote (far) memory, so that the processor can obtain frequently accessed data from local memory as quickly as possible. This increases the data processing speed of the system, reduces data processing latency and significantly improves the access performance of the system.
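The merge step of the second aspect can be sketched as summing, per page, the interleaved-block counts reported by each memory. This assumes each memory reports its counts keyed by page number; the text does not specify the report format, and `merge_interleaved` is a hypothetical name.

```python
from typing import Dict, List


def merge_interleaved(per_memory: List[Dict[int, int]]) -> Dict[int, int]:
    """Sum the interleaved-block access frequencies reported by each memory
    into one access frequency per page (keys are page numbers)."""
    page_freq: Dict[int, int] = {}
    for reported in per_memory:  # one report per memory / memory channel
        for page, count in reported.items():
            page_freq[page] = page_freq.get(page, 0) + count
    return page_freq


# Four memories each report counts for their slice of pages 0 and 1.
reports = [{0: 5, 1: 1}, {0: 3}, {0: 2, 1: 4}, {1: 0}]
print(merge_interleaved(reports))  # {0: 10, 1: 5}
```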
  • A third aspect provides a memory access heat statistics apparatus, which includes modules for executing the memory access heat statistics method in the first aspect or any possible design of the first aspect, or for executing the memory access heat statistics method in the second aspect or any possible design of the second aspect.
  • A fourth aspect provides a controller, which includes a processing unit and a storage unit.
  • The storage unit is used to store a set of computer instructions; when the processing unit executes the set of computer instructions, it performs the operating steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect.
  • For example, the controller is a Register Clock Driver (RCD) or an expansion controller in the memory of the computer device.
  • A fifth aspect provides a chip, including a processor and a power supply circuit; the power supply circuit supplies power to the processor, and the processor executes the operating steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect.
  • A sixth aspect provides a memory, which includes a storage and the controller described in the fourth aspect.
  • The storage is used to store a set of computer instructions; when the controller executes the set of computer instructions, it performs the operating steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect.
  • A seventh aspect provides a motherboard, which includes the controller described in the fourth aspect; the controller performs the operating steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect.
  • An eighth aspect provides a computer device, which includes the motherboard described in the seventh aspect.
  • A computer-readable storage medium is further provided, including computer software instructions; when the computer software instructions run on a computing device, the computing device performs the steps of the method in the first aspect or any possible implementation of the first aspect.
  • A computer program product is also provided.
  • When the computer program product runs on a computer, it causes the computing device to perform the operating steps of the method in the first aspect or any possible implementation of the first aspect.
  • Figure 1 is a schematic structural diagram of a computer device provided by this application.
  • Figure 2 is a schematic flow chart of a memory access heat statistics method provided by this application.
  • Figure 3 is a schematic diagram of the relationship between cache lines and pages provided by this application.
  • Figure 4 is a schematic diagram of a storage structure of access frequency provided by this application.
  • Figure 5 is a schematic diagram of an interleaving method provided by this application.
  • Figure 6 is a schematic structural diagram of another computer device provided by this application.
  • Figure 7 is a schematic diagram of a memory access heat statistics device provided by this application.
  • Figure 8 is a schematic structural diagram of a controller provided by this application.
  • Access speed refers to the data transfer speed when data is written to or read from memory; it can also be called read/write speed. According to access speed, the main memory connected to the processor of a computer device can be divided into far memory (remote memory) and near memory (local memory). Main memory is also simply called memory.
  • The access speed of local memory is higher than that of remote memory.
  • For example, the local memory can be dynamic random access memory (DRAM) or double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM).
  • The remote memory can be storage class memory (SCM).
  • Hot data refers to data that is frequently accessed by the processor. If hot data is stored in local memory, the processor can obtain the data as quickly as possible, improve the system's data processing speed, reduce data processing latency, and significantly improve the system's access performance.
  • Cold data refers to data that is not accessed frequently by the processor. If cold data is stored in remote memory, the reliability of the data can be improved, and the local memory can store more hot data, improving the resource utilization of the local memory and reducing system costs.
  • A cache line (cacheline) is the unit in which a computer device reads from or writes to the memory storage space.
  • For example, the size of a cache line can be 64 bytes (B).
  • A page is the unit in which a computer device manages memory storage space.
  • For example, the page size is 4 kilobytes (KB), 2 megabytes (MB) or another size.
  • A 4KB page can be called a small page.
  • A 2MB page can be called a huge page.
  • The smaller the page, the more resources the computer device needs to manage memory; the larger the page, the fewer resources it needs.
  • a page can include multiple cache lines, i.e. the page size is a multiple of the cache line size.
  • Interleaving refers to evenly distributing the data accessed in memory across multiple memory channels in units of the unit storage space (for example, a cache line).
  • The interleaving method can be configured by the system administrator; data can be interleaved across multiple memory channels connected to one processor, or across memory channels on multiple processors.
  • Memory channels refer to the multiple memories connected to the processor in a computer device.
  • The processor can use interleaving to operate on memory. For example, the processor evenly distributes the data to be written across multiple memory channels in cache line-sized units and, correspondingly, reads data from the multiple memory channels in cache line-sized units. Data processing is thus spread over multiple memory channels, improving the memory bandwidth utilization and processing performance of the computer device.
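Cache line interleaving can be sketched as a round-robin address-to-channel mapping. The simple modulo rule is an assumption for illustration (real memory controllers may hash address bits), and `channel_of` and `channels_for` are hypothetical helpers.

```python
from typing import List

CACHE_LINE = 64  # bytes; the interleaving unit used in the text's example


def channel_of(addr: int, num_channels: int, unit: int = CACHE_LINE) -> int:
    """Channel that holds `addr` under simple round-robin interleaving (assumed)."""
    return (addr // unit) % num_channels


def channels_for(start: int, length: int, num_channels: int) -> List[int]:
    """Channel sequence for consecutive cache lines in [start, start + length)."""
    return [channel_of(a, num_channels) for a in range(start, start + length, CACHE_LINE)]


print(channels_for(0, 8 * CACHE_LINE, 4))  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Under this mapping a 2MB page spread over 8 channels leaves 2MB / 8 = 256KB of that page on each channel, which is the interleaved block size used in the Method 2 example.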
  • This application provides a memory access heat statistics method: based on the operations that an application program running on the processor of a computer device performs on the memory of the computer device, the frequency with which the processor accesses each data block in the memory is counted, the access heat of the data block is determined from this access frequency, and data migration is triggered based on the access heat.
  • The access frequency directly reflects how frequently a data block is accessed by an application program. The more times the data block is accessed by the application program, the more frequently it is accessed and the hotter it is; the fewer times it is accessed, the less frequently it is accessed and the colder it is.
  • The size of the data block is a multiple of the unit storage space in the memory accessible to the processor.
  • For example, the size of a data block is the page size used by the computer device to manage memory.
  • Alternatively, the size of the data block is the size of the interleaved data block in the memory that the processor of the computer device accesses using interleaving.
  • Figure 1 is a schematic structural diagram of a computer device provided by this application.
  • A computer device including local memory is used as an example for explanation.
  • The computer device 100 includes a processor 110 and a memory 120.
  • The processor 110 and the memory 120 are connected through a bus 130.
  • The processor 110 may be an XPU used for data processing, such as a central processing unit (CPU), a graphics processing unit (GPU), a data processing unit (DPU), a neural-network processing unit (NPU) or an embedded processor.
  • The processor 110 can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a system on chip (SoC) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • A general-purpose processor can be a microprocessor or any conventional processor.
  • The following embodiments take a CPU as an example of the processor 110.
  • The computer device 100 in Figure 1 may include one or more processors.
  • A processor may be a multi-core processor.
  • A processor here may refer to one or more devices, circuits and/or computing units for processing data (e.g., computer program instructions).
  • The processor 110 is configured to run application programs that perform read or write operations on the memory 120, and to trigger data migration based on how hot or cold the accesses to the memory 120 are.
  • The bus 130 may include a path for transferring data between the above components (e.g., the processor 110 and the memory 120), for example, access requests sent by the processor 110 to the memory 120 and access frequencies of data blocks fed back by the memory 120 to the processor 110.
  • The bus 130 may also include a power bus, a control bus, a status signal bus, etc.
  • For example, the bus 130 is a DDR bus.
  • For clarity, the various buses are all labeled as bus 130 in the figure.
  • The memory 120 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • The volatile memory can be random access memory (RAM), which is used as an external cache.
  • Many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM) and direct rambus random access memory (direct rambus RAM, DR RAM).
  • The memory 120 includes memory particles 121, a data buffer (DB) 122 and a register clock driver (RCD) 123.
  • The DB 122 connects the memory particles 121 and the RCD 123.
  • The memory particles 121 store the application data of the application programs run by the processor 110.
  • For example, the memory particles 121 may be DRAM.
  • The DB 122 drives the data signals generated by the memory controller in the processor 110: it writes data sent by the processor 110 into the memory particles 121, transmits the application data stored in the memory particles 121 to the processor 110, and transmits the access frequencies of the data blocks stored in the RCD 123 to the processor 110.
  • The RCD 123 drives the clock, address and command signals generated by the memory controller in the processor 110 to operate on the multiple memory particles 121.
  • The RCD 123 performs read or write operations on the memory particles 121 in the memory 120 according to access requests from the processor 110, counts the access frequencies of the data blocks of the memory particles 121, and stores these access frequencies.
  • The RCD 123 also obtains the access frequencies of data blocks from the storage medium 124.
  • The RCD 123 can determine the access heat of a data block from its access frequency in order to trigger data migration.
  • For example, the RCD 123 includes a storage medium 124 that stores the access frequencies of the data blocks. This avoids using storage space of the memory particles 121 to store access frequencies and improves the resource utilization of the memory particles 121.
  • The storage medium 124 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • For example, the storage medium 124 may be RAM or ROM.
  • The storage capacity of the storage medium 124 depends on the size of the data blocks for which access frequencies are counted.
  • The storage capacity of the storage medium 124 can be determined in either of the following two ways.
  • Method 1: determine the capacity of the storage medium that stores page access frequencies from the memory capacity, the page size and the bit width of the access frequency counter.
  • Specifically, the capacity of the storage medium used to store page access frequencies is $n_{1,\text{capacity}} = \dfrac{n_{\text{capacity}}}{X_{\text{page size}}} \times W_{\text{count}}$, where:
  • $n_{1,\text{capacity}}$ represents the capacity of the storage medium used to store page access frequencies;
  • $n_{\text{capacity}}$ represents the memory capacity;
  • $X_{\text{page size}}$ represents the page size;
  • $W_{\text{count}}$ represents the bit width of the access frequency counter.
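The Method 1 formula can be checked numerically with a short sketch; `page_counter_capacity` is a hypothetical helper, and 4B counters are assumed as in the text's 64GB example.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30


def page_counter_capacity(mem_capacity: int, page_size: int, counter_width: int) -> int:
    """Method 1: bytes needed to hold one access-frequency counter per page,
    n1_capacity = n_capacity / X_page_size * W_count."""
    pages = mem_capacity // page_size
    return pages * counter_width


print(page_counter_capacity(64 * GB, 4 * KB, 4) // MB)  # 64  (16M pages x 4B)
print(page_counter_capacity(64 * GB, 2 * MB, 4) // KB)  # 128 (32K pages x 4B)
```

The first line reproduces the 64MB figure quoted for 64GB of memory, 4KB pages and 4B counters; moving to 2MB pages shrinks the counter storage by a factor of 512.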
  • The counter bit width can be chosen empirically: the wider the counter, the larger the value it can record; the narrower the counter, the smaller the value it can record.
  • When the memory capacity is large, the page size is small or the counter is wide, the storage medium 124 needs more storage space.
  • In this case, the RCD 123 can be connected to an external storage medium to expand the capacity available for storing page access frequencies.
  • For example, the RCD 123 is also connected to a storage medium 125.
  • The storage medium 124 can cache the access frequencies of frequently accessed pages.
  • The storage medium 125 stores the access frequencies of the pages.
  • When the RCD 123 updates the access frequency of a page stored in the storage medium 125, two operations on the storage medium 125 are involved: reading the access frequency and writing back the updated value.
  • The RCD 123 can meet the resulting demand on the storage medium by expanding its bandwidth or increasing its access rate.
  • For example, the RCD 123 can connect at least two storage media to expand the bandwidth and increase the access rate of the storage media.
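The text does not say how the storage medium 124 decides which counters to cache. As a hedged sketch, assume a small write-back LRU cache of counters (modelling storage medium 124) in front of the full counter table (modelling storage medium 125). `CounterCache` is hypothetical, and counting backing reads and writes shows why caching helps: without it every update costs one read and one write on storage medium 125.

```python
from collections import OrderedDict
from typing import Dict


class CounterCache:
    """Write-back LRU cache of page counters (storage medium 124, assumed policy)
    in front of the full counter table (storage medium 125)."""

    def __init__(self, backing: Dict[int, int], entries: int) -> None:
        if entries < 1:
            raise ValueError("cache needs at least one entry")
        self.backing = backing                      # models storage medium 125
        self.entries = entries                      # models capacity of medium 124
        self.cache: "OrderedDict[int, int]" = OrderedDict()
        self.backing_reads = 0
        self.backing_writes = 0

    def increment(self, page: int) -> None:
        if page in self.cache:
            self.cache.move_to_end(page)            # hit: no access to medium 125
        else:
            self.backing_reads += 1                 # miss: fetch counter from medium 125
            self.cache[page] = self.backing.get(page, 0)
            if len(self.cache) > self.entries:
                victim, value = self.cache.popitem(last=False)
                self.backing[victim] = value        # write back least recently used
                self.backing_writes += 1
        self.cache[page] += 1

    def flush(self) -> None:
        """Write every cached counter back to medium 125."""
        for page, value in self.cache.items():
            self.backing[page] = value
            self.backing_writes += 1
        self.cache.clear()


backing: Dict[int, int] = {}
cache = CounterCache(backing, entries=2)
for page in [7, 7, 7, 8, 7, 9]:
    cache.increment(page)
cache.flush()
print(backing, cache.backing_reads, cache.backing_writes)  # {8: 1, 7: 4, 9: 1} 3 3
```

In this run six counter updates cost three reads and three writes on storage medium 125 instead of six of each.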
  • Method 2: determine the capacity of the storage medium that stores the access frequencies of interleaved data blocks from the memory capacity, the page size, the bit width of the access frequency counter and the number of interleaved memory channels.
  • Specifically, because each page is split across $N_{\text{channel}}$ memory channels, access frequencies are recorded per interleaved data block of size $X_{\text{page size}} / N_{\text{channel}}$, and the capacity is $n_{1,\text{capacity}} = \dfrac{n_{\text{capacity}}}{X_{\text{page size}} / N_{\text{channel}}} \times W_{\text{count}}$, where:
  • $n_{1,\text{capacity}}$ represents the capacity of the storage medium that stores the access frequencies of interleaved data blocks;
  • $n_{\text{capacity}}$ represents the memory capacity;
  • $X_{\text{page size}}$ represents the page size;
  • $W_{\text{count}}$ represents the bit width of the access frequency counter;
  • $N_{\text{channel}}$ represents the number of memory channels.
  • For example, when 2MB pages are interleaved across 8 memory channels, the RCD of each memory channel records access frequencies in 256KB units, and the computer device needs to reserve 8MB of physical address space for the storage media of the 8 memory channels.
  • The processor obtains the access frequencies of the 256KB units from the 8 memory channels and combines them into the access frequency of a complete 2MB page to perform data migration, for example by adding the 8 values.
  • When 4KB pages are interleaved across 8 memory channels, the RCD of each memory channel needs to record access frequencies in 512B units.
  • The computer device then needs to reserve 4GB of physical address space for the storage media of the 8 memory channels.
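A numeric sketch of Method 2. The text does not state the memory capacity behind its 8MB and 4GB figures; assuming 64GB of memory behind each of the 8 channels (512GB in total) and 4B counters reproduces both. `interleaved_counter_capacity` is a hypothetical helper.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30


def interleaved_counter_capacity(mem_capacity: int, page_size: int,
                                 counter_width: int, channels: int) -> int:
    """Method 2: one counter per interleaved block of size page_size / channels."""
    block = page_size // channels  # share of each page held by one channel
    return mem_capacity // block * counter_width


# Assumed: 64GB of memory behind each of 8 channels, 4B counters.
per_channel_2m = interleaved_counter_capacity(64 * GB, 2 * MB, 4, 8)  # 256KB blocks
per_channel_4k = interleaved_counter_capacity(64 * GB, 4 * KB, 4, 8)  # 512B blocks
print(8 * per_channel_2m // MB)  # 8
print(8 * per_channel_4k // GB)  # 4
```

The 512-fold gap between the two results mirrors the ratio of the page sizes, which is why interleaved counting at 4KB granularity quickly becomes expensive.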
  • Step 210: the RCD 123 determines the storage space targeted by the access request sent by the processor 110.
  • The RCD 123 receives the access request sent by the processor 110 over the memory bus (such as the DDR bus), decodes it to obtain the physical address and the operation instruction, and determines from the physical address whether the processor 110 is accessing the memory particles 121 or the storage medium 124 in the memory 120.
  • If the RCD 123 determines that the application program running on the processor 110 is accessing the memory particles 121, it determines from the operation instruction whether to perform a read or a write operation on the memory particles 121, and performs step 220 and step 230.
  • If the RCD 123 determines that the processor 110 is accessing the storage medium 124 to obtain the access frequencies of data blocks, it determines from the operation instruction to perform a read operation on the storage medium 124, and performs step 240 and step 250.
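Step 210 can be sketched as range-based decoding of the physical address. The address map below (a 64GB memory-particle range followed directly by a counter window for the storage medium 124) and the names `AccessRequest` and `route` are hypothetical; the text only says the decision is made from the physical address.

```python
from dataclasses import dataclass

GB, MB = 1 << 30, 1 << 20

# Hypothetical address map: memory particles occupy [0, PARTICLE_END) and the
# counter window for storage medium 124 follows immediately after it.
PARTICLE_END = 64 * GB
COUNTER_END = PARTICLE_END + 64 * MB


@dataclass
class AccessRequest:
    phys_addr: int
    is_write: bool


def route(req: AccessRequest) -> str:
    """Decide which path a decoded request takes (step 210)."""
    if req.phys_addr < PARTICLE_END:
        # Application data: steps 220 (read/write) and 230 (count).
        return "particle-write" if req.is_write else "particle-read"
    if req.phys_addr < COUNTER_END and not req.is_write:
        # Processor reading access frequencies: steps 240 and 250.
        return "counter-read"
    raise ValueError(f"unsupported access at {req.phys_addr:#x}")


print(route(AccessRequest(0x1000, True)))         # particle-write
print(route(AccessRequest(PARTICLE_END, False)))  # counter-read
```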
  • Step 220: the RCD 123 operates on the memory particles 121 in the memory 120.
  • For a write operation, the RCD 123 writes the data obtained through the DB 122 into the storage space indicated by the physical address.
  • For a read operation, the RCD 123 reads the data stored in the storage space indicated by the physical address and transmits it to the processor 110 through the DB 122.
  • Step 230: the RCD 123 counts the access frequency of the data block in which the first address is located.
  • For example, the access request sent by the processor 110 indicates that the application program running on the processor 110 performs a read or write operation on one cache line.
  • The RCD 123 performs the read or write operation on the cache line indicated by the physical address. Each time a cache line is read or written, the access frequency of the page to which it belongs is increased by one. The RCD 123 can maintain a counter for each page: every time the application program running on the processor 110 accesses the page, the page's counter is incremented by one. Memory access heat is thus counted at the page granularity at which the computer device manages memory storage space, improving the accuracy of the statistics.
  • For example, the physical address space of the memory has size $2^{N+1}$ and the cache line size is 64 bytes.
  • The page size can be 4KB or 2MB.
  • Physical address 0x0000 is the starting address of the first page.
  • Physical address 0x1000 is the starting address of the second page.
  • With 4KB pages, the first page contains the 64 consecutive cache lines from physical address 0x0000 up to 0x1000.
  • The second page contains the 64 consecutive cache lines from physical address 0x1000 up to 0x2000.
  • The address bits used for counting are determined by the page size. For example, if the page size is 4KB, the RCD 123 counts the access frequencies of the pages identified by address bits [N:12].
  • If the page size is 2MB, the RCD 123 counts the access frequencies of the pages identified by address bits [N:21].
  • For example, if the capacity of the memory 120 is 64GB and the page size is 2MB, the RCD 123 counts the access frequencies of the 32K (32,768) pages identified by address bits [N:21].
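Taking address bits [N:12] or [N:21] is a right shift by the page-offset width. This sketch (with a hypothetical `page_number` helper) counts a few cache-line accesses per 4KB page and checks the page count for 64GB of memory with 2MB pages.

```python
from collections import Counter


def page_number(phys_addr: int, page_shift: int) -> int:
    """Address bits [N:page_shift]: shift 12 for 4KB pages, 21 for 2MB pages."""
    return phys_addr >> page_shift


# Cache-line accesses: three in the first 4KB page, one in the second.
accesses = [0x0000, 0x0040, 0x0FC0, 0x1000]
counts = Counter(page_number(a, 12) for a in accesses)
print(counts)  # Counter({0: 3, 1: 1})

# 64GB of memory split into 2MB pages gives 2**36 >> 21 page numbers.
print((64 << 30) >> 21)  # 32768
```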
  • the first address may be an address or an address segment of a cache line in the statistical address segment.
  • RCD 123 identifies the second address according to the address of the page where the first address is located and the address mapping relationship, reads the access frequency of the accessed page from the storage medium 124 according to the second address, and updates the access frequency of the page, that is, the access frequency of the page. Add one, and write the updated access frequency of the page back to the storage medium 124 .
  • the second address is used to indicate the location of the storage space that stores the access frequency of the page.
  • the second address may be the page number of the page or the address of the page.
  • The address mapping relationship indicates the mapping between the address of a page and the address of the storage space where its access frequency is stored. For example, when the first address indicates a single address, RCD 123 determines from it the page number of the page containing the indicated cache line, and reads that page's access frequency from the storage medium 124 based on the page number. When the first address indicates an address segment, RCD 123 determines the page numbers of all pages containing the cache lines in that segment, and then reads those pages' access frequencies from the storage medium 124 based on their page numbers.
  • Step 240: RCD 123 operates on the storage medium 124 in the memory 120.
  • Because the computer device has pre-allocated the capacity of the memory particles 121 (the memory chips) in the memory 120, when the processor 110 accesses the storage medium 124, the computer device allocates to the storage medium 124, out of the capacity of the memory particles 121, a physical address space equal to the capacity of the storage medium 124.
  • RCD 123 maps the physical addresses used to access the storage medium 124 to the storage medium 124, so the corresponding capacity in the memory particles 121 cannot be used, which wastes storage space in the memory particles 121. For example, with 64GB of memory, 4KB pages, and a 4B counter per page, 64MB of memory storage space is wasted.
  • RCD 123 can determine the physical address in the storage medium 124 from the page number (page address), and store some bits of the page number together with the access frequency field in the corresponding address space of the storage medium 124.
  • The counter of each page is 4B (32 bits) wide and is divided into two parts: one part records the access frequency of the page, and the other records 8 selected bits of the page number.
  • The positions of the two fields within the counter are not limited. In one example, bit 23 to bit 0 record the access frequency of the page, and bit 31 to bit 24 record 8 bits of the page number.
  • For example, bit 31 to bit 24 hold the lower 8 bits of the page number.
  • The page number is used to generate the address at which the processor 110 accesses the space in the storage medium 124 that stores the page's access frequency, which gives the mapping relationship between the address of a data block and the address of the storage space holding its access frequency. A small amount of address space can thus be used to map the access-frequency storage of the storage medium, and RCD 123 determines, based on this address mapping relationship, the physical address at which the processor 110 accesses the storage medium 124.
  • This improves the storage space utilization of the memory particles 121.
  • Some bits of the page number are used to generate the address at which the processor 110 accesses the space in the storage medium 124 that stores the page's access frequency. For example, after selected bits are deleted from the page number, the remaining bits serve as the address at which the processor 110 accesses the page's stored access frequency in the storage medium 124.
  • The more bits are deleted from the page number, the smaller the address space the processor 110 uses to access the stored access frequencies in the storage medium 124; conversely, the fewer bits are deleted, the larger that address space.
  • For example, after RCD 123 operates on a page in the memory particles 121, it stores the lower 8 bits of the page number in bit 31 to bit 24 of the counter and adds one to the access frequency recorded in bit 23 to bit 0.
  • If the physical address obtained by RCD 123 indicates storage space in the storage medium 124, the processor 110 is accessing the storage medium 124 to obtain the access frequencies of data blocks, and RCD 123 performs a read operation on the storage medium 124 according to the operation instruction.
  • This physical address can consist of the bits that remain after the selected bits are deleted from the page number.
  • RCD 123 determines the storage space in the storage medium 124 indicated by the third address it obtains. After reading the 32-bit counter, it extracts the upper 8 bits and combines them with the third address to recover the page number; the remaining 24 bits are the access frequency of the page with that page number.
  • Step 250: RCD 123 determines the access popularity of the data block based on the access frequency.
  • If the access frequency is greater than a threshold, the data block is determined to be hot; if the access frequency is less than the threshold, the data block is determined to be cold.
  • the RCD 123 may feed back the page number and the access popularity of the page indicated by the page number to the processor 110 to trigger data migration.
  • The processor operates on memory at cache-line granularity, and each cache line belongs to some managed page, so each operation on a cache line in memory is also one read or write of the page that cache line belongs to.
  • the controller counts the access frequency of the page to which the operated cache line belongs, effectively improving the accuracy of identifying the access popularity of the page.
  • the page access frequency statistics are based on page granularity, which is compatible with the page management method of computer equipment for memory. This memory access heat statistics method is easy to use.
  • hot data can be migrated to the local memory and cold data can be migrated to the remote memory, so that the processor can obtain frequently accessed data from the local memory as quickly as possible.
  • the above embodiment explains the memory access heat statistics method using 4KB pages (small pages) and 2MB pages (large pages).
  • In interleaving, data accessing the memory is evenly distributed to multiple memory channels in units of storage space (e.g., cache lines).
  • When the controller of each memory channel performs memory access heat statistics, it counts the access frequency of the interleaved data blocks.
  • the method for the controller of each memory channel to count the access frequency of interleaved data blocks can refer to the above explanation of the page access frequency.
  • The difference when counting interleaved data blocks is that an interleaved data block is smaller than a page, and the processor 110 accesses the address space in the storage medium 124 that stores the block's access frequency based on some bits of the interleaved data block's address.
  • a 2MB page includes 512 4KB pages, the size of one cache line is 64B, and a 4KB page includes 64 cache lines.
  • The 512*64 cache lines of a 2MB page are evenly distributed across 8 memory channels, so each memory channel receives 512*8 cache lines, that is, 512 interleaved data blocks of 512B each, or 256KB in total.
  • the interleaved data block size of each memory channel is 512B.
  • The capacity of the storage medium that each memory's controller uses to store the access frequencies of interleaved data blocks is 0.5GB.
  • For 8 memory channels, the computer device would therefore need to reserve 4GB of physical address space for these storage media.
  • With the address mapping relationship, the reserved address space of each memory is 1MB; with 8 memory channels, the total reserved memory address space is 8MB.
  • The controller may determine the address of the storage space, accessed by the processor, that stores the access frequency of an interleaved data block, based on the mapping relationship between the address of the interleaved data block and the address of the storage space that stores its access frequency. That is, the controller stores 8 selected bits of the interleaved data block's address in bit 31 to bit 24 of the block's counter and adds one to bit 23 to bit 0 to record the block's access frequency. The remaining bits of the interleaved data block's address serve as the physical address at which the processor accesses the storage medium to read the block's access frequency.
  • The minimum data granularity of each access frequency counter is 4B.
  • The controller determines the physical address in the storage medium that stores an access frequency from the physical address the processor uses to access the storage medium. For example, for each memory, the processor periodically and continuously reads the access frequencies of the interleaved data blocks that the controller stores in that memory, and the controller detects accesses to the corresponding 512KB address range.
  • The physical address of this range is PA[18:0], and the controller implements an 8-bit address counter, AddrCnt[7:0].
  • After reading the 32-bit data, the controller extracts the high 8 bits and combines them with PA[18:0]; the resulting 27 bits are the address of the interleaved data block (sub-page), and the lower 24 bits of the data are its access frequency.
  • The processor obtains the 512B-granularity access frequencies from the 8 memory channels and can combine them into the access frequency of a complete 4KB page, for example by summing the access frequencies of the 8 memory channels, or by taking their mean, variance, or maximum.
  • The above uses a single access request from the processor 110 as an example to illustrate the memory access heat statistics method. If the application program running on the processor 110 sends multiple access requests to the memory of the computer device, the data block corresponding to the physical address of each access request is counted in the same way; for details, refer to the description of the above embodiment.
  • the memory 120 may refer to a local memory (which may also be called a near memory), and the RCD 123 may refer to a controller that controls the local memory to perform memory access heat statistics.
  • the computer device may include a remote memory (which may also be called a far memory), and the controller counts the access frequency of the remote memory.
  • The remote memory differs from the local memory in how it is connected to its controller: compared with the memory controller in the processor, the controller that performs memory access heat statistics for the remote memory can be an expansion controller.
  • the computer device includes a remote memory and a local memory.
  • the RCD in the local memory counts the access frequency of the local memory
  • the expansion controller connected to the remote memory counts the access frequency of the remote memory.
  • An access frequency threshold is determined. Data whose access frequency is greater than the threshold is determined to be hot data and placed in the local memory; data whose access frequency is less than the threshold is determined to be cold data and placed in the remote memory.
  • With a threshold setting that does not affect application performance, it is found that only a small amount of the accessed data is hot data, and most of it is cold data.
  • Computer equipment can be configured with a larger proportion of remote memory, reducing system costs. The reduction in the proportion of hot data can reduce data migration between local memory and remote memory, and reduce system memory bandwidth occupation and CPU overhead.
  • this application also provides a schematic structural diagram of another computer device.
  • the computer device 100 includes a processor 110 and a memory 120
  • the computer device 100 also includes a memory 610 .
  • Memory 120 can be used as a local memory
  • memory 610 can be used as a remote memory.
  • the processor 110 and the memory 610 are connected through a bus 620.
  • Memory 610 includes expansion controller 611, memory particles 612 and storage media 613.
  • Memory particles 612 include DRAM and SCM.
  • Storage medium 613 may be DRAM.
  • The expansion controller 611 is configured to perform a read or write operation in the memory particles 612 according to an access request from the processor 110, count the access frequency of the data blocks in the memory particles 612, and store the access frequency of the data blocks in the storage medium 613.
  • the expansion controller 611 can also be connected to an external storage medium 614 to expand the storage capacity of the storage medium 613 and store the access frequency of the data blocks.
  • the expansion controller 611 is also used to obtain the access frequency of the data block from the storage medium 613.
  • For the functions of the expansion controller 611, the method by which it counts the access frequencies of data blocks in the memory particles 612 of the memory 610, and how it obtains those access frequencies from the storage medium 613, refer to the related explanation of how the controller counts access frequencies for the near-end memory.
  • Memory 610 may be a pool of volatile memory or a pool of non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be ROM, PROM, EPROM, EEPROM or flash memory.
  • Volatile memory can be RAM.
  • many forms of RAM are available, such as SRAM, DRAM, SDRAM, DDR SDRAM, ESDRAM, SLDRAM, and DR RAM.
  • Bus 620 may include a path for transferring data between the above components (e.g., processor 110 and memory 610), for example the access requests sent by the processor 110 to the memory 610 and the access frequencies of data blocks fed back by the memory 610 to the processor 110.
  • the bus 620 may also include a power bus, a control bus, a status signal bus, etc.
  • bus 620 is a DDR bus.
  • the various buses are labeled bus 620 in the figure.
  • The bus 620 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a Compute Express Link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like.
  • the bus 620 can be divided into an address bus, a data bus, a control bus, etc.
  • the embodiment of this application explains the structure of the near-end memory and the remote memory included in the computer device.
  • the RCD in the local memory counts the access frequency of the local memory
  • The expansion controller connected to the remote memory counts the access frequency of the remote memory, which consumes almost no CPU performance. The processor operates on memory at cache-line granularity, and each cache line belongs to a managed page, so each operation on a cache line in memory is also one read or write of the page that cache line belongs to.
  • the controller counts the access frequency of the page to which the operated cache line belongs, which effectively improves the accuracy of identifying the access popularity of the page. It also counts the access frequency of the page at the granularity of the page, which is compatible with the page management method of computer equipment for memory.
  • This memory access heat statistics method is easy to use.
  • the threshold of access frequency is determined based on the application, and data whose access frequency is greater than the threshold is determined as hot data and placed in the local memory, and data whose access frequency is less than the threshold is determined as cold data and placed in the remote memory.
  • With a threshold setting that does not affect application performance, it is found that only a small amount of the accessed data is hot data, and most of it is cold data.
  • Computer equipment can be configured with a larger proportion of remote memory, reducing system costs. The reduction in the proportion of hot data can reduce data migration between local memory and remote memory, and reduce system memory bandwidth occupation and CPU overhead.
  • the controller includes hardware structures and/or software modules corresponding to each function.
  • the units and method steps of each example described in conjunction with the embodiments disclosed in this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software driving the hardware depends on the specific application scenarios and design constraints of the technical solution.
  • the memory access heat statistics method provided by the present application is described in detail above with reference to FIGS. 1 to 6 .
  • the memory access heat statistics device provided by the present application will be described with reference to FIG. 7 .
  • Figure 7 is a schematic structural diagram of a possible memory access heat statistics device provided by this application. These memory access heat statistics devices can be used to implement the functions of the controller in the above method embodiments, and therefore can also achieve the beneficial effects of the above method embodiments.
  • the memory access heat statistics device may be a controller as shown in FIG. 2 , or may be a module (such as a chip) applied to computer equipment.
  • the memory access heat statistics device 700 includes a communication module 710 , a statistics module 720 and a storage module 730 .
  • the memory access heat statistics device 700 is used to implement the functions of the controller in the method embodiment shown in FIG. 2 .
  • The communication module 710 is configured to determine the first address based on an obtained access request, where the access request indicates an operation by an application program, running on the processor of the computer device in which the controller is located, on the memory of the computer device.
  • The first address is a physical address in the memory.
  • the statistics module 720 is used to count the access frequency of the data block where the first address is located, and the size of the data block is a multiple of the unit storage space in the memory that is accessible to the processor. For example, the statistics module 720 is used to perform steps 210 to 250 in FIG. 2 .
  • the statistics module 720 is specifically used to count the access frequency of the data block in the unit storage space to which the first address belongs.
  • the storage module 730 is used to store the access frequency, so that the statistics module 720 can determine the access popularity of the data block based on the access frequency and trigger data migration.
  • the memory access heat statistics device 700 in the embodiment of the present application can be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • The PLD can be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a DPU, an SoC, or any combination thereof.
  • The memory access heat statistics device 700 may correspond to performing the method described in the embodiments of this application, and the above and other operations and/or functions of each unit in the device 700 are respectively intended to implement the corresponding processes of the methods in FIG. 2. For brevity, they are not repeated here.
  • FIG 8 is a schematic structural diagram of a controller 800 provided by this application.
  • the controller 800 includes a processing unit 810, a bus 820, a storage unit 830, and a communication interface 840.
  • the processing unit 810, the storage unit 830 and the communication interface 840 are connected through a bus 820.
  • the processing unit 810 may be a CPU.
  • The processing unit 810 may also be another general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • a general-purpose processor can be a microprocessor or any conventional processor, etc.
  • The communication interface 840 is used to implement communication between the controller 800 and external devices or equipment. In this embodiment, when the controller 800 implements the functions of the controller shown in FIG. 2, the communication interface 840 is used to obtain the access request.
  • Bus 820 may include a path for communicating information between the above-mentioned components (eg, processing unit 810 and storage unit 830).
  • the bus 820 may also include a power bus, a control bus, a status signal bus, etc.
  • the various buses are labeled bus 820 in the figure.
  • Bus 820 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a Compute Express Link (CXL) bus, a cache coherent interconnect for accelerators (CCIX) bus, or the like.
  • the bus 820 can be divided into an address bus, a data bus, a control bus, etc.
  • controller 800 may include multiple processors.
  • the processor may be a multi-CPU processor.
  • a processor here may refer to one or more devices, circuits, and/or computing units for processing data (eg, computer program instructions).
  • the processing unit 810 counts the access frequency of the data block where the first address is located.
  • the method of counting the access frequency of the data block may also be built into the processing unit 810, so that the processing unit 810 can count the access frequency of the data block.
  • the controller 800 includes a processing unit 810 and a storage unit 830.
  • The processing unit 810 and the storage unit 830 each denote a type of component; in specific embodiments, the number of components of each type can be determined based on business requirements.
  • the storage unit 830 may be used to store information such as access frequency in the above method embodiment.
  • Storage unit 830 may be a pool of volatile or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), or electrically erasable PROM (EEPROM).
  • Volatile memory may be random access memory (RAM), used as an external cache. Many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
  • the storage unit 830 may also correspond to the storage medium used to store computer instructions and other information in the above method embodiments, for example, a magnetic disk, such as a mechanical hard disk or a solid state hard disk.
  • the above-mentioned controller 800 may be a general-purpose device or a special-purpose device.
  • the controller 800 may also be a server or other device with computing capabilities.
  • The controller 800 may correspond to the memory access heat statistics device 700 in this embodiment and to the entity that performs any method according to FIG. 2, and the above and other operations and/or functions of each module in the device 700 are respectively intended to implement the corresponding processes of the methods in FIG. 2. For brevity, they are not described again here.
  • the controller 800 described in the embodiment of the present application may be an RCD controller in the memory of the computer device.
  • Alternatively, the controller 800 is an expansion controller connected to the memory of the computer device.
  • An embodiment of the present application also provides a chip, including: a processor and a power supply circuit; the power supply circuit is used to supply power to the processor; and the processor is used to execute the memory access heat statistical method described in the above embodiment.
  • An embodiment of the present application also provides a memory.
  • The memory includes a storage and a controller.
  • The storage is used to store a set of computer instructions.
  • When the controller executes the set of computer instructions, the memory access heat statistics method described in the above embodiment is performed.
  • An embodiment of the present application also provides a motherboard.
  • the motherboard includes a controller, and the controller executes the memory access heat statistics method described in the above embodiment.
  • the method steps in this embodiment can be implemented by hardware or by a processor executing software instructions.
  • Software instructions can consist of corresponding software modules, which can be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art.
  • An exemplary storage medium is coupled to the processor such that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and storage media may be located in an ASIC. Additionally, the ASIC can be located in a computing device. Of course, the processor and storage medium may also exist as discrete components in a computing device.
  • the computer program product includes one or more computer programs or instructions.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user equipment, or other programmable device.
  • the computer program or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
  • For example, the computer program or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center that integrates one or more available media.
  • The available medium may be a magnetic medium, such as a floppy disk, hard disk, or magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid-state drive (SSD).
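The 32-bit counter layout described above (access frequency in bit 23 to bit 0, the lower 8 bits of the page number in bit 31 to bit 24, and the remaining page-number bits used as the counter's slot address) can be modeled in a few lines. This is a minimal sketch, not the controller's implementation: the function names are hypothetical, a dict stands in for storage medium 124, and two behaviors the text does not specify are assumptions, namely restarting the count when a different page lands in an occupied slot, and saturating at 2^24 - 1.

```python
COUNT_BITS = 24                       # bits 23..0 hold the access frequency
TAG_BITS = 8                          # bits 31..24 hold the low 8 bits of the page number
COUNT_MASK = (1 << COUNT_BITS) - 1
TAG_MASK = (1 << TAG_BITS) - 1


def split_page_number(page_number: int) -> tuple[int, int]:
    """Return (slot, tag): the high bits address the counter, the low 8 bits are kept in it."""
    return page_number >> TAG_BITS, page_number & TAG_MASK


def pack(tag: int, count: int) -> int:
    """Build the 32-bit counter word from an 8-bit tag and a 24-bit count."""
    return ((tag & TAG_MASK) << COUNT_BITS) | (count & COUNT_MASK)


def unpack(word: int) -> tuple[int, int]:
    """Split a 32-bit counter word into (tag, count)."""
    return (word >> COUNT_BITS) & TAG_MASK, word & COUNT_MASK


def record_access(slots: dict[int, int], page_number: int) -> None:
    """Count one access to page_number in its counter slot.

    ASSUMPTION: if the slot currently holds another page's tag, counting
    restarts for the new page; the count saturates at 2**24 - 1.
    """
    slot, tag = split_page_number(page_number)
    old_tag, count = unpack(slots.get(slot, 0))
    if old_tag != tag:
        count = 0
    slots[slot] = pack(tag, min(count + 1, COUNT_MASK))


def read_back(slot: int, word: int) -> tuple[int, int]:
    """Recover (page number, access frequency) from a slot address and its counter word."""
    tag, count = unpack(word)
    return (slot << TAG_BITS) | tag, count
```

For example, two accesses to page 0x12345 land in slot 0x123 with tag 0x45, and `read_back(0x123, slots[0x123])` recovers `(0x12345, 2)`. Because 256 pages share each slot under this reading, the slot array is 256 times smaller than one counter per page, at the cost of collisions.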

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

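Disclosed are a memory access heat statistics method, a controller, a chip, a memory, a motherboard, and a computer device, relating to the field of computers. The method includes: based on operations performed on the memory by an application program running on a processor, counting the access frequency with which the processor accesses data blocks in the memory, and determining the access heat of the data blocks based on the access frequency. The access frequency indicates how frequently a data block is accessed by the application program. The more times a data block is accessed by the application program, the more frequently it is accessed and the hotter its access heat; the fewer times a data block is accessed, the less frequently it is accessed and the colder its access heat. Hot data can thus be migrated to near memory and cold data to far memory, so that the processor can obtain frequently accessed data from near memory as quickly as possible, increasing the system's data processing speed, reducing data processing latency, and significantly improving the system's access performance.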

Description

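Memory access heat statistics method, related apparatus, and device
This application claims priority to Chinese Patent Application No. 202210575022.5, filed with the Chinese Patent Office on May 24, 2022 and entitled "Data Migration Method", and to Chinese Patent Application No. 202211016161.0, filed on August 24, 2022 and entitled "Memory Access Heat Statistics Method, Related Apparatus, and Device", both of which are incorporated herein by reference in their entireties.
Technical Field
This application relates to the field of computers, and in particular, to a memory access heat statistics method, a controller, a chip, a memory, a motherboard, and a computer device.
Background
Ideally, storage with a fast access speed holds frequently accessed data and storage with a slow access speed holds infrequently accessed data. However, because existing memory access heat statistics methods have low accuracy, fast storage may end up holding infrequently accessed data, or slow storage may end up holding frequently accessed data, which hurts the system's data processing speed and latency.
Summary
This application provides a memory access heat statistics method, a controller, a chip, a memory, a motherboard, and a computer device, thereby increasing the system's data processing speed and reducing data processing latency.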
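According to a first aspect, a memory access heat statistics method is provided, and the method is performed by a controller. The method includes: based on operations performed on the memory of a computer device by an application program running on a processor in the computer device, counting the access frequency with which the processor accesses data blocks in the memory, so that the access heat of the data blocks can be determined based on the access frequency.
In this way, the access frequency directly indicates how frequently a data block is accessed by the application program. The more times a data block is accessed by the application program, the more frequently it is accessed and the hotter its access heat; the fewer times a data block is accessed, the less frequently it is accessed and the colder its access heat. Therefore, when data migration is triggered based on the access heat of data blocks, hot data can be migrated to near memory and cold data to far memory, so that the processor can obtain frequently accessed data from near memory as quickly as possible, increasing the system's data processing speed, reducing data processing latency, and significantly improving the system's access performance.
In a possible implementation, counting the access frequency of the data block in which the first address is located includes: counting the access frequency of the data block containing the unit storage space to which the first address belongs.
The size of a data block is a multiple of the unit storage space in the memory that is accessible to the processor. The computer device reads and writes the storage space of the memory at cache line granularity and manages it at page granularity, and one page can include multiple cache lines. The size of a data block can be the size of one page of the memory accessible to the processor in the computer device, and the size of the unit storage space can be the size of the cache line used when the processor accesses the memory. An operation by an application program running on the processor on one cache line in the memory can therefore be regarded as an operation on one page. In this way, the processor operates on memory at cache line granularity, and because each cache line belongs to some managed page, each time a cache line in the memory is operated on, the page to which it belongs is also read or written once. The controller counts the access frequency of the page to which the operated cache line belongs, which effectively improves the accuracy of identifying the page's access heat; moreover, counting access frequency at page granularity is compatible with the computer device's page-based memory management, making this memory access heat statistics method easy to use.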
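As a minimal sketch of page-granularity counting, assuming power-of-two, naturally aligned pages, the page that a cache-line access belongs to is the physical address shifted right by log2(page size), which keeps address bits [N:12] for 4KB pages and [N:21] for 2MB pages. The helper names are hypothetical, and a Python dict stands in for the hardware counters.

```python
CACHE_LINE = 64                    # bytes per cache line
PAGE_4K = 4 * 1024                 # small page
PAGE_2M = 2 * 1024 * 1024          # huge page


def page_number(pa: int, page_size: int) -> int:
    """Page containing physical address pa (page_size must be a power of two).

    Shifting by log2(page_size) keeps bits [N:12] for 4KB pages and
    bits [N:21] for 2MB pages.
    """
    shift = page_size.bit_length() - 1
    return pa >> shift


def count_page_accesses(addresses: list[int], page_size: int) -> dict[int, int]:
    """Treat every cache-line access as one access to the page that contains it."""
    counts: dict[int, int] = {}
    for pa in addresses:
        page = page_number(pa, page_size)
        counts[page] = counts.get(page, 0) + 1
    return counts


accesses = [0x0000, 0x0040, 0x0FC0, 0x1000]            # four cache-line accesses
print(PAGE_4K // CACHE_LINE, PAGE_2M // CACHE_LINE)    # 64 32768 (cache lines per page)
print(count_page_accesses(accesses, PAGE_4K))          # {0: 3, 1: 1}
print(count_page_accesses(accesses, PAGE_2M))          # {0: 4}
```

With 4KB pages, the first three accesses fall in page 0 and the fourth in page 1; with 2MB pages all four count toward the same page, which is why larger pages trade statistical resolution for fewer counters.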
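In addition, interleaving means that the processor in the computer device distributes data across multiple memories for operation. When an application program running on the processor operates on the memory of the computer device in interleaving mode, data is processed over multiple memory channels, improving the memory bandwidth utilization and processing performance of the computer device. The size of a data block can also be the size of an interleaved data block in the memory that the processor accesses in interleaving mode.
In another possible implementation, the method further includes: identifying a second address based on the address of the data block in which the first address is located and an address mapping relationship. The second address indicates the location, in the controller, at which the access frequency of the data block is stored. The address mapping relationship indicates the mapping between the address of a data block and the address of the storage space that stores its access frequency. This allows the controller to obtain, based on the second address, the access frequency of the data block stored in a first storage medium, and to update that access frequency.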
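The second-address lookup and update can be sketched as a read-modify-write on a flat array of counters. This toy model rests on assumptions not stated in the text: 4KB pages, one little-endian 32-bit counter per page stored contiguously from offset 0 of the first storage medium (so the second address is simply page number × 4), and a saturating increment. `CounterStore` and its methods are hypothetical names, not an interface defined by the source.

```python
import struct

PAGE_SHIFT = 12        # 4KB pages (assumption)
COUNTER_BYTES = 4      # one 32-bit counter per page
MAX_COUNT = 0xFFFFFFFF


class CounterStore:
    """Toy model of the first storage medium: a flat array of 32-bit counters."""

    def __init__(self, num_pages: int) -> None:
        self.medium = bytearray(num_pages * COUNTER_BYTES)

    @staticmethod
    def second_address(first_address: int) -> int:
        """Map a physical address to the byte offset of its page's counter."""
        return (first_address >> PAGE_SHIFT) * COUNTER_BYTES

    def record(self, first_address: int) -> int:
        """Read the counter, add one (saturating), write it back, and return the new value."""
        offset = self.second_address(first_address)
        (count,) = struct.unpack_from("<I", self.medium, offset)
        count = min(count + 1, MAX_COUNT)
        struct.pack_into("<I", self.medium, offset, count)
        return count

    def read(self, first_address: int) -> int:
        """Return the current access frequency of the page containing first_address."""
        return struct.unpack_from("<I", self.medium, self.second_address(first_address))[0]
```

For example, after `store = CounterStore(4)`, calling `store.record(0x1000)` and then `store.record(0x1FC0)` updates the same counter at offset 4, because both addresses lie in page 1.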
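In another possible implementation, the memory includes a first storage medium used to store the access frequencies of data blocks in the memory, and the second address indicates the storage space in the first storage medium that stores the access frequency of the data block.
In another possible implementation, because the computer device pre-allocates the capacity of the memory chips in the memory, when the processor accesses the first storage medium, the computer device allocates, from the capacity of the memory chips, a physical address space equal to the capacity of the first storage medium, and the controller maps the physical addresses used to access the first storage medium to that medium. The corresponding capacity in the memory chips then cannot be used, which wastes memory-chip storage space. For example, if the total capacity of the memory chips is 64GB, the page size is 4KB, and the access frequency of each data block is stored in 4B, 64MB of memory storage space is wasted. The method therefore further includes: identifying the second address based on a third address indicated by the processor and the address mapping relationship, and obtaining the access frequency of the data block, where the third address is determined from the second address. This saves memory-chip storage space and improves its utilization.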
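The 64MB figure follows directly from the sizes given: one 4B counter for each 4KB page across 64GB of memory. The sketch below reproduces that arithmetic with a hypothetical helper, `counter_overhead`, and, for comparison, the cost with 2MB pages and with 512B blocks, assuming one flat counter per block and no reduction from storing page-number bits inside the counter.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def counter_overhead(capacity: int, block_size: int, counter_bytes: int = 4) -> int:
    """Bytes needed to keep one fixed-width counter for every data block."""
    return (capacity // block_size) * counter_bytes


# 64GB of memory, one 4B counter per data block:
small_pages = counter_overhead(64 * GiB, 4 * KiB)   # 16M pages    -> 64 MiB
huge_pages = counter_overhead(64 * GiB, 2 * MiB)    # 32K pages    -> 128 KiB
sub_pages = counter_overhead(64 * GiB, 512)         # 128M blocks  -> 512 MiB
```

The smaller the counted block, the larger the reserved address space, which is the cost the address mapping relationship is meant to avoid.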
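In another possible implementation, the method further includes: the controller determines the access heat of the data block based on the access frequency and triggers data migration based on the access heat. For example, the controller feeds the access heat of data blocks back to the processor, and the processor controls the migration of data blocks with different access heat, for instance storing cold data in far memory and hot data in near memory. In this way, when the controller triggers data migration based on access heat, hot data can be migrated to near memory and cold data to far memory, so that the processor can obtain data from near memory as quickly as possible, increasing the system's data processing speed, reducing data processing latency, and improving the system's overall processing performance.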
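A minimal sketch of the threshold rule and the resulting migration plan: blocks above the threshold are hot and go to near memory, and blocks below it are cold and go to far memory. The text does not say what happens at exactly the threshold, so treating a tie as cold is an assumption, as are the helper names and the `near`/`far` labels.

```python
from enum import Enum


class Heat(Enum):
    HOT = "hot"
    COLD = "cold"


def classify(freq: int, threshold: int) -> Heat:
    """Above the threshold is hot; otherwise cold (ties count as cold, an assumption)."""
    return Heat.HOT if freq > threshold else Heat.COLD


def plan_migration(freqs: dict[int, int], threshold: int) -> dict[str, list[int]]:
    """Assign each page number to near memory (hot) or far memory (cold)."""
    plan: dict[str, list[int]] = {"near": [], "far": []}
    for page, freq in sorted(freqs.items()):
        tier = "near" if classify(freq, threshold) is Heat.HOT else "far"
        plan[tier].append(page)
    return plan
```

For example, `plan_migration({3: 10, 1: 2, 2: 5}, 5)` places page 3 in near memory and pages 1 and 2 in far memory.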
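According to a second aspect, a memory access heat statistics method is provided, performed by a processor, including: receiving the access frequencies of interleaved data blocks sent by multiple memories, and merging these access frequencies to obtain the access frequency of one page. The access frequency of an interleaved data block indicates how often the processor, accessing multiple memories in interleaving mode, accesses a data block in one of those memories.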
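The merge step of the second aspect can be sketched as follows, assuming one 4KB page is split into eight 512B interleaved sub-pages, one per memory channel, as in the interleaving example. The text lists summing, mean, variance, and maximum as possible ways to combine the counts; using population variance (`statistics.pvariance`) rather than sample variance is an assumption, and `merge_subpage_counts` is a hypothetical name.

```python
from statistics import mean, pvariance

SUBPAGES_PER_PAGE = 8   # 4KB page / 512B interleaved block, one per channel


def merge_subpage_counts(counts: list[int], how: str = "sum") -> float:
    """Combine the per-channel 512B sub-page counts of one 4KB page."""
    if len(counts) != SUBPAGES_PER_PAGE:
        raise ValueError(f"expected {SUBPAGES_PER_PAGE} sub-page counts, got {len(counts)}")
    if how == "sum":
        return sum(counts)
    if how == "mean":
        return mean(counts)
    if how == "max":
        return max(counts)
    if how == "variance":
        return pvariance(counts)   # population variance (assumption)
    raise ValueError(f"unknown merge mode: {how}")
```

Summing gives a page-level frequency directly comparable with the non-interleaved case, while the variance indicates whether accesses concentrate on a few 512B slices of the page.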
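In this way, when an application program running on the processor in the computer device operates on the memory in interleaving mode and processes data over multiple memory channels to improve the memory bandwidth utilization and processing performance of the computer device, the controller merges the access frequencies of interleaved data blocks sent by multiple memories to obtain the access frequency of a page, so that the page's access heat can be determined from it. When data migration is triggered based on the page's access heat, hot data can be migrated to near memory and cold data to far memory, so that the processor can obtain frequently accessed data from near memory as quickly as possible, increasing the system's data processing speed, reducing data processing latency, and significantly improving the system's access performance.
According to a third aspect, a memory access heat statistics apparatus is provided. The apparatus includes modules for performing the memory access heat statistics method in the first aspect or any possible design of the first aspect, or modules for performing the memory access heat statistics method in the second aspect or any possible design of the second aspect.
According to a fourth aspect, a controller is provided. The controller includes a processing unit and a storage unit, and the storage unit is configured to store a set of computer instructions. When the processing unit, acting as the controller in the first aspect or any possible implementation of the first aspect, executes the set of computer instructions, the operation steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect are performed. The controller is a register clock driver (Register Clock Driver, RCD) in the memory of the computer device, or an expansion controller.
According to a fifth aspect, a chip is provided, including a processor and a power supply circuit. The power supply circuit is configured to supply power to the processor, and the processor is configured to perform the operation steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect.
According to a sixth aspect, a memory is provided, including a storage and the controller according to the fourth aspect. The storage is configured to store a set of computer instructions, and when the controller executes the set of computer instructions, the operation steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect are performed.
According to a seventh aspect, a motherboard is provided, including the controller according to the fourth aspect. The controller performs the operation steps of the memory access heat statistics method in the first aspect or any possible implementation of the first aspect.
According to an eighth aspect, a computer device is provided, including the motherboard according to the seventh aspect.
According to a ninth aspect, a computer-readable storage medium is provided, including computer software instructions. When the computer software instructions run on a computing device, the computing device is caused to perform the operation steps of the method in the first aspect or any possible implementation of the first aspect.
According to a tenth aspect, a computer program product is provided. When the computer program product runs on a computer, the computing device is caused to perform the operation steps of the method in the first aspect or any possible implementation of the first aspect.
On the basis of the implementations provided in the foregoing aspects, this application may further combine them to provide more implementations.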
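Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a computer device according to this application;
FIG. 2 is a schematic flowchart of a memory access heat statistics method according to this application;
FIG. 3 is a schematic diagram of the relationship between cachelines and pages according to this application;
FIG. 4 is a schematic diagram of a storage structure for access frequencies according to this application;
FIG. 5 is a schematic diagram of an interleaving manner according to this application;
FIG. 6 is a schematic structural diagram of another computer device according to this application;
FIG. 7 is a schematic diagram of a memory access heat statistics apparatus according to this application;
FIG. 8 is a schematic structural diagram of a controller according to this application.
Description of Embodiments
For ease of description, the terms used in this application are first briefly explained.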
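Storage: a device that stores programs and data. Access speed is the data transfer rate when data is written to or read from storage; it is also called read/write speed. Based on access speed, the main storage connected to the processor of a computer device can be divided into far memory and near memory. Main storage is also called main memory or simply memory. Near memory is faster to access than far memory. For example, near memory may be dynamic random access memory (DRAM) or double data rate synchronous dynamic random access memory (DDR SDRAM), and far memory may be storage-class memory (SCM).
Hot data: data that the processor accesses frequently. Storing hot data in near memory lets the processor obtain it as quickly as possible. This increases the system's data processing speed, reduces processing latency, and noticeably improves system access performance.
Cold data: data that the processor accesses infrequently. Storing cold data in far memory can improve data reliability and leaves more near memory for hot data. This improves near-memory resource utilization and reduces system cost.
Cacheline: the unit in which a computer device reads or writes memory. A cacheline may be 64 bytes (B).
Page: the unit in which a computer device manages memory. For example, the page size may be 4 kilobytes (KB), 2 megabytes (MB), or another size. A 4 KB page may be called a small page and a 2 MB page a huge page. The smaller the page, the more resources the computer device needs to manage memory; the larger the page, the fewer. A page contains multiple cachelines, that is, the page size is a multiple of the cacheline size.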
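Interleaving: distributing the data of memory accesses evenly over multiple memory channels in units of storage space (for example, cachelines). The interleaving manner may be configured by a system administrator. Interleaving may span the memory channels connected to one processor or the memory channels of multiple processors.
Memory channels: the multiple memories connected to the processor of a computer device. The processor can access memory using interleaving. For example, the processor distributes data to be written evenly over multiple memory channels at cacheline granularity and reads data back from those channels at the same granularity. Processing data over multiple memory channels improves the memory bandwidth utilization and processing performance of the computer device.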
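The even distribution described above can be sketched as follows. This is a minimal illustration that assumes the simplest round-robin mapping, where the channel is the cacheline index modulo the number of channels. The names `channel_of` and `split_across_channels` are hypothetical, and real memory controllers often hash additional address bits when selecting a channel.

```python
CACHELINE = 64  # bytes, as in the definition of a cacheline above


def channel_of(phys_addr: int, n_channels: int) -> int:
    """Hypothetical round-robin interleave: consecutive cachelines map to
    consecutive memory channels."""
    return (phys_addr // CACHELINE) % n_channels


def split_across_channels(base: int, length: int, n_channels: int) -> dict[int, list[int]]:
    """Group the cacheline-aligned addresses of [base, base + length) by channel."""
    per_channel: dict[int, list[int]] = {c: [] for c in range(n_channels)}
    start = base - base % CACHELINE
    for addr in range(start, base + length, CACHELINE):
        per_channel[channel_of(addr, n_channels)].append(addr)
    return per_channel


if __name__ == "__main__":
    layout = split_across_channels(0, 4096, 8)
    print({ch: len(lines) for ch, lines in layout.items()})  # 8 cachelines per channel
```

With 8 channels, a 4 KB range of 64 cachelines lands as 8 cachelines on each channel.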
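To make memory access heat statistics more accurate, this application provides a memory access heat statistics method. The method observes the operations that applications running on the processor of a computer device perform on the device's memory, and counts how often the processor accesses each data block in that memory. The access heat of a data block is then determined from its access frequency, and data migration is triggered by access heat. The access frequency directly reflects how often applications access a data block: the more accesses, the hotter the block; the fewer, the colder. When migration is triggered by access heat, hot data is moved to near memory and cold data to far memory. The processor can then fetch frequently accessed data from near memory as quickly as possible, which increases data processing speed, reduces latency, and noticeably improves system access performance.
The size of a data block is a multiple of the unit storage space of the memory accessible to the processor. For example, it may be the page size the computer device uses to manage memory. It may also be the size of an interleaved data block in memory that the processor accesses in an interleaved manner.
The memory access heat statistics method provided in this application is described in detail below with reference to the accompanying drawings.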
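FIG. 1 is a schematic structural diagram of a computer device according to this application. A computer device that includes near memory is used as the example here. As shown in FIG. 1, the computer device 100 includes a processor 110 and a memory 120, connected through a bus 130.
The processor 110 may be any XPU used for data processing. Examples include a central processing unit (CPU), a graphics processing unit (GPU), a data processing unit (DPU), a neural processing unit (NPU), and an embedded neural-network processing unit (NPU). The processor 110 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a system on chip (SoC), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor. For ease of description, the following embodiments use a CPU as the processor 110.
The computer device 100 in FIG. 1 may include one or more processors, and a processor may be a multi-core processor. A processor here may refer to one or more devices, circuits, and/or computing units for processing data (for example, computer program instructions).
The processor 110 runs applications that read from or write to the memory 120. It also triggers data migration according to how hot or cold accesses to the memory 120 are.
The bus 130 may include a path for transferring data between the foregoing components, such as the processor 110 and the memory 120. Examples are access requests sent by the processor 110 to the memory 120 and data-block access frequencies returned by the memory 120 to the processor 110. In addition to a data bus, the bus 130 may include a power bus, a control bus, a status signal bus, and the like; for example, the bus 130 is a DDR bus. For clarity, however, all buses are labeled as bus 130 in the figure.
The memory 120 may be a volatile memory pool or a non-volatile memory pool, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available. These include static RAM (SRAM), DRAM, synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).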
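The memory 120 includes memory chips 121, a data buffer (DB) 122, and a register clock driver (Register Clock Driver, RCD) 123. The DB 122 connects the memory chips 121 and the RCD 123.
The memory chips 121 store application data of the applications run by the processor 110. For example, the memory chips 121 may be DRAM.
The DB 122 drives the data signals generated by the memory controller in the processor 110. Through it, data sent by the processor 110 is written into the memory chips 121. It also carries application data stored in the memory chips 121, and data-block access frequencies stored by the RCD 123, back to the processor 110.
The RCD 123 drives the clock, address, and command signals generated by the memory controller in the processor 110, and thereby operates the multiple memory chips 121. On access requests from the processor 110, the RCD 123 reads or writes the memory chips 121 in the memory 120. It also counts the access frequencies of data blocks in the memory chips 121 and stores them. The RCD 123 can also read data-block access frequencies from a storage medium 124. Optionally, the RCD 123 may determine the access heat of a data block from its access frequency and trigger data migration.
The RCD 123 contains the storage medium 124, which stores the access frequencies of data blocks. This avoids spending storage space of the memory chips 121 on access frequencies and improves the resource utilization of the memory chips 121. The storage medium 124 may be a volatile or non-volatile memory pool, or may include both; for example, it may be RAM or ROM. Its required capacity depends on the size of the data blocks whose access frequencies are counted. It can be determined in either of the following two manners.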
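Manner 1: determine the capacity of the storage medium that stores page access frequencies from the memory capacity, the page size, and the bit width of the access-frequency counter.
When page access frequencies are counted, the capacity of the storage medium that stores them is determined by the memory capacity, the page size, and the counter bit width, as given by formula (1):
$$n_{1,\text{capacity}} = \frac{n_{\text{capacity}}}{X_{\text{page size}}} \times W_{\text{count}} \tag{1}$$
Here $n_{1,\text{capacity}}$ is the capacity of the storage medium that stores page access frequencies, $n_{\text{capacity}}$ is the memory capacity, $X_{\text{page size}}$ is the page size, and $W_{\text{count}}$ is the bit width of the access-frequency counter. The counter bit width can be chosen empirically: the wider the counter, the larger the value it can record; the narrower, the smaller.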
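Example 1: with a memory capacity of 64 GB, a page size of 4 KB, and a counter width of 4 B, the storage medium needs (64 GB / 4 KB) × 4 B = 64 MB.
Example 2: with a memory capacity of 64 GB, a page size of 2 MB, and a counter width of 4 B, the storage medium needs (64 GB / 2 MB) × 4 B = 128 KB.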
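Formula (1) and its two examples can be checked with a short Python sketch. It assumes, as the examples do implicitly, that KB, MB, and GB are binary units and that $W_{\text{count}}$ is given in bytes. The function name is illustrative.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def counter_storage_bytes(mem_capacity: int, page_size: int, counter_width: int) -> int:
    """Formula (1): one counter of counter_width bytes for every page."""
    if mem_capacity % page_size:
        raise ValueError("memory capacity must be a multiple of the page size")
    return (mem_capacity // page_size) * counter_width


if __name__ == "__main__":
    print(counter_storage_bytes(64 * GiB, 4 * KiB, 4) // MiB, "MB")  # Example 1: 64 MB
    print(counter_storage_bytes(64 * GiB, 2 * MiB, 4) // KiB, "KB")  # Example 2: 128 KB
```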
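Hence the larger the page size, the smaller the storage medium needs to be, because fewer page access frequencies must be stored. The smaller the page size, the larger the storage medium.
When the computer device manages memory in 4 KB pages, the storage medium 124 needs a relatively large amount of space. In some embodiments, the RCD 123 may also be connected to an external storage medium to expand its capacity for page access frequencies. For example, as shown in FIG. 1, the RCD 123 is also connected to a storage medium 125. The storage medium 124 can cache the access frequencies of frequently accessed pages, while the storage medium 125 stores page access frequencies. Optionally, each update by the RCD 123 of a page access frequency in the storage medium 125 involves two operations on that medium: reading the count and writing it back. The RCD 123 can meet this demand by expanding the bandwidth of the storage media or raising their access rate. For example, the RCD 123 may be connected to at least two storage media to expand bandwidth and raise the access rate.
Manner 2: determine the capacity of the storage medium that stores interleaved-data-block access frequencies from the memory capacity, the page size, the counter bit width, and the number of interleaved channels.
When the access frequencies of interleaved data blocks are counted, the capacity of the storage medium that stores them is determined by the memory capacity, the page size, the counter bit width, and the number of interleaved channels.
Assume the computer device has $N_{\text{channel}}$ memory channels, the page size is $X_{\text{page size}}$ bytes, and data is interleaved across the $N_{\text{channel}}$ channels at cacheline granularity. Each channel then holds interleaved data blocks (sub-pages) of $X_{\text{page size}}/N_{\text{channel}}$ bytes. In other words, the controller of each channel manages memory in units of $X_{\text{page size}}/N_{\text{channel}}$ bytes and records an access frequency for each data block of that size. The capacity of the storage medium is given by formula (2).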
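$$n_{1,\text{capacity}} = \frac{n_{\text{capacity}}}{X_{\text{page size}}/N_{\text{channel}}} \times W_{\text{count}} \tag{2}$$
Here $n_{1,\text{capacity}}$ is the capacity of the storage medium that stores interleaved-data-block access frequencies, $n_{\text{capacity}}$ is the memory capacity, $X_{\text{page size}}$ is the page size, $W_{\text{count}}$ is the counter bit width, and $N_{\text{channel}}$ is the number of memory channels.
Example 1: suppose the page size is 2 MB ($X_{\text{page size}}$ = 2 MB) and data is interleaved across 8 memory channels at cacheline granularity ($N_{\text{channel}}$ = 8). Each channel's interleaved data block is then 256 KB. With a memory capacity of 64 GB ($n_{\text{capacity}}$ = 64 GB) and a counter width of 4 B ($W_{\text{count}}$ = 4 B), the storage medium needs 1 MB. The computer device must reserve 8 MB of physical address space for the storage media of the 8 channels. Once the processor has the per-256 KB access frequencies from all 8 channels, it can combine them into the access frequency of a complete 2 MB page and use that for data migration, for example by adding the 8 values together.
Example 2: suppose the page size is 4 KB ($X_{\text{page size}}$ = 4 KB) and data is interleaved across 8 memory channels at cacheline granularity ($N_{\text{channel}}$ = 8). Each channel's interleaved data block is then 512 B. With $n_{\text{capacity}}$ = 64 GB and $W_{\text{count}}$ = 4 B, the storage medium needs 0.5 GB. The RCD of each channel must record access frequencies in units of 512 B, and the computer device must reserve 4 GB of physical address space for the storage media of the 8 channels.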
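Formula (2) differs from formula (1) only in dividing the page size by the channel count. The sketch below reproduces both examples, including the 8 MB and 4 GB totals reserved across 8 channels. The function name is illustrative, and the same binary-unit assumption applies.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30


def interleaved_counter_bytes(mem_capacity: int, page_size: int,
                              n_channels: int, counter_width: int) -> int:
    """Formula (2): per-channel counter storage when each page is interleaved
    over n_channels, so every channel tracks sub-pages of page_size / n_channels."""
    if page_size % n_channels:
        raise ValueError("page size must divide evenly across channels")
    sub_page = page_size // n_channels
    return (mem_capacity // sub_page) * counter_width


if __name__ == "__main__":
    for page in (2 * MiB, 4 * KiB):
        per_channel = interleaved_counter_bytes(64 * GiB, page, 8, 4)
        # Example 1 -> 1 MB per channel, 8 MB total; Example 2 -> 512 MB, 4096 MB total
        print(page, per_channel // MiB, "MB per channel,", 8 * per_channel // MiB, "MB total")
```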
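It should be understood that the components of the computer device 100 shown in FIG. 1 are merely illustrative. In a specific embodiment, the number of components can be determined according to service requirements.
Next, a memory access heat statistics method provided in this application is described with reference to FIG. 2.
Step 210: The RCD 123 determines which storage space the access request sent by the processor 110 targets.
The RCD 123 receives the access request from the processor 110 over the memory bus (for example, a DDR bus) and decodes it into a physical address and an operation instruction. From the physical address, the RCD 123 determines whether the processor 110 is accessing the memory chips 121 in the memory 120 or the storage medium 124.
If the physical address indicates storage space of the memory chips 121, an application run by the processor 110 is accessing the memory chips 121. The RCD 123 determines from the operation instruction whether to read or write the memory chips 121, and performs step 220 and step 230.
If the physical address indicates storage space of the storage medium 124, the processor 110 is reading data-block access frequencies from the storage medium 124. The RCD 123 determines from the operation instruction that a read of the storage medium 124 is requested, and performs step 240 and step 250.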
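The decision in step 210 amounts to an address-range check. The sketch below assumes a hypothetical layout in which the memory chips 121 occupy the low physical addresses and a separate window decodes to the storage medium 124. The constants, the class, and the function name are illustrative. The window is treated as read-only because step 240 only describes reads.

```python
from dataclasses import dataclass

# Hypothetical physical address map (assumptions, not from the text):
DRAM_LIMIT = 64 << 30            # memory chips 121 occupy [0, 64 GB)
COUNTER_BASE = DRAM_LIMIT        # window decoded to storage medium 124
COUNTER_SIZE = 64 << 20          # 64 MB window, per Example 1 of formula (1)


@dataclass
class AccessRequest:
    phys_addr: int
    is_write: bool


def route(req: AccessRequest) -> str:
    """Step 210: pick the target storage space for a decoded request."""
    if 0 <= req.phys_addr < DRAM_LIMIT:
        return "memory_chips"    # continue with step 220 and step 230
    if COUNTER_BASE <= req.phys_addr < COUNTER_BASE + COUNTER_SIZE:
        if req.is_write:
            raise PermissionError("counter window is read-only in this sketch")
        return "counter_medium"  # continue with step 240 and step 250
    raise ValueError(f"unmapped physical address {req.phys_addr:#x}")
```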
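Step 220: The RCD 123 operates on the memory chips 121 in the memory 120.
If the operation instruction indicates a write to the storage space at the physical address in the memory chips 121, the RCD 123 writes the data received through the DB 122 into that storage space.
If the operation instruction indicates a read from the storage space at the physical address in the memory chips 121, the RCD 123 reads the data stored there and transfers it to the processor 110 through the DB 122.
Step 230: The RCD 123 counts the access frequency of the data block in which the first address is located.
Applications run by the processor 110 access the memory 120 with the cacheline as the minimum granularity. Each access request sent by the processor 110 therefore asks for one cacheline to be read or written on behalf of such an application. The RCD 123 reads or writes the cacheline at the physical address. Each time a cacheline is read or written, the access frequency of the page it belongs to increases by one. The RCD 123 may keep one counter per page, and every access by an application to a page increments that page's counter. Memory access heat is thus counted at the granularity of the pages the computer device uses to manage memory, which makes the statistics more accurate.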
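Below is a minimal model of the per-page counter in step 230. It assumes 4 KB pages, so the page number is simply the physical address shifted right by 12 bits. `PageHeat` is a hypothetical name, and a Python `Counter` stands in for the counters the RCD 123 would keep in the storage medium 124.

```python
from collections import Counter

PAGE_SHIFT = 12  # 4 KB pages; 21 would model 2 MB pages


class PageHeat:
    """One counter per page: every cacheline read or write adds one to the
    counter of the page that contains the cacheline (step 230)."""

    def __init__(self, page_shift: int = PAGE_SHIFT) -> None:
        self.page_shift = page_shift
        self.counts: Counter = Counter()

    def on_cacheline_access(self, phys_addr: int) -> None:
        self.counts[phys_addr >> self.page_shift] += 1


if __name__ == "__main__":
    heat = PageHeat()
    for addr in (0x0000, 0x0040, 0x0FC0, 0x1000):  # three lines in page 0, one in page 1
        heat.on_cacheline_access(addr)
    print(dict(heat.counts))  # {0: 3, 1: 1}
```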
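For example, as shown in (a) of FIG. 3, the physical address space of the memory is $2^{N+1}$ bytes, the cacheline size is 64 bytes, and the page size may be 4 KB or 2 MB.
As shown in (b) of FIG. 3, assume a cacheline is 64 bytes and a page is 4 KB, so a 4 KB page contains 64 consecutive cachelines. Physical address 0x000 is the start address of the first page, and 0x1000 is the start address of the second page. The first page contains the 64 consecutive cachelines from physical address 0x000 up to 0x1000. The second page contains the 64 consecutive cachelines from 0x1000 up to 0x2000.
When counting per page, the RCD 123 chooses the address bit field to count from the page size. For example, with a 4 KB page size, the RCD 123 counts the pages identified by address bits [N:12]. Here N is the index of the most significant physical address bit and follows from the capacity of the memory 120; for a 64 GB memory 120, N = 35. Bits [N:12] then identify 64 GB / 4 KB = 16M pages, and the RCD 123 counts the access frequencies of these 16M pages.
As another example, with a 2 MB page size, the RCD 123 counts the pages identified by address bits [N:21]. For a 64 GB memory 120, bits [N:21] identify 64 GB / 2 MB = 32K pages, and the RCD 123 counts the access frequencies of these 32K pages.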
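The bit-field arithmetic of these two examples can be sketched as below. The sketch assumes a power-of-two memory capacity starting at physical address 0, and the helper names are hypothetical.

```python
def top_address_bit(mem_capacity: int) -> int:
    """N in the field [N:page_shift]: index of the highest physical address bit.
    Assumes a power-of-two capacity starting at address 0."""
    if mem_capacity <= 0 or mem_capacity & (mem_capacity - 1):
        raise ValueError("capacity must be a positive power of two")
    return (mem_capacity - 1).bit_length() - 1


def pages_in_field(mem_capacity: int, page_shift: int) -> int:
    """Number of pages identified by address bits [N:page_shift]."""
    return 1 << (top_address_bit(mem_capacity) + 1 - page_shift)


def page_number(phys_addr: int, page_shift: int) -> int:
    """Value of bits [N:page_shift] of a physical address, i.e. its page number."""
    return phys_addr >> page_shift


if __name__ == "__main__":
    cap = 64 << 30
    print(top_address_bit(cap))                 # 35
    print(pages_in_field(cap, 12) >> 20, "M")   # 16 M pages of 4 KB
    print(pages_in_field(cap, 21) >> 10, "K")   # 32 K pages of 2 MB
```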
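Assume the physical address in the access request indicates a first address, and an application run by the processor 110 accesses the cacheline at the first address in the memory chips 121. The first address may be a single address or an address range of cachelines within the counted bit field. The RCD 123 identifies a second address from the address of the page containing the first address and the address mapping relationship. It reads the page's access frequency from the storage medium 124 at the second address, adds one, and writes the updated value back to the storage medium 124. The second address indicates where the page's access frequency is stored, and may be the page number or the page address. The address mapping relationship maps page addresses to the addresses of the storage space holding their access frequencies. When the first address is a single address, the RCD 123 derives the page number of the page containing that cacheline and reads that page's access frequency from the storage medium 124. When the first address is an address range, the RCD 123 derives the page numbers of all pages containing cachelines in that range and reads each of their access frequencies from the storage medium 124.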
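The read-modify-write in step 230 can be sketched as follows. The sketch assumes the simplest possible address mapping relationship, where the second address is the page number multiplied by the 4 B counter width. A plain array stands in for the storage medium 124. These choices and all names are illustrative, and the tagged, compressed layout described under step 240 is not modelled here.

```python
import array

PAGE_SHIFT = 12     # 4 KB pages
COUNTER_BYTES = 4   # 4 B counter per page


def pages_spanned(first_addr: int, length: int = 64, page_shift: int = PAGE_SHIFT) -> range:
    """Page numbers touched by the first address (one cacheline or a range)."""
    first = first_addr >> page_shift
    last = (first_addr + length - 1) >> page_shift
    return range(first, last + 1)


def second_address(page_no: int) -> int:
    """Hypothetical linear mapping relationship: byte offset of the page's
    counter in storage medium 124."""
    return page_no * COUNTER_BYTES


class CounterMedium:
    """Storage medium 124 modelled as a flat array of 32-bit counters."""

    def __init__(self, n_pages: int) -> None:
        self.words = array.array("I", [0]) * n_pages  # 'I' is 4 bytes on common platforms

    def record(self, first_addr: int, length: int = 64) -> None:
        for page in pages_spanned(first_addr, length):
            idx = second_address(page) // COUNTER_BYTES
            value = self.words[idx]        # read the stored access frequency
            self.words[idx] = value + 1    # add one and write it back
```

For example, a 128-byte access starting at 0x0FC0 straddles pages 0 and 1, so both counters are incremented.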
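Step 240: The RCD 123 operates on the storage medium 124 in the memory 120.
The computer device has already allocated the capacity of the memory chips 121 in the memory 120. When the processor 110 accesses the storage medium 124, the computer device must allocate, out of the capacity of the memory chips 121, a physical address space matching the capacity of the storage medium 124. The RCD 123 then maps physical addresses that access the storage medium 124 onto that medium. The corresponding capacity of the memory chips 121 cannot be used, so their storage space is wasted. For example, with a 4 KB page size (and, as above, 64 GB of memory with 4 B counters), 64 MB of memory is wasted.
The RCD 123 can derive the physical address in the storage medium 124 from the page number (page address). It stores a field combining some bits of the page number with the access frequency at the corresponding address in the storage medium 124. As shown in FIG. 4, each page's 4 B (32-bit) counter is split into two parts. One part records the page's access frequency, and the other records any 8 bits of the page number. Where each part sits is not limited. In this example, bit23 to bit0 record the access frequency and bit31 to bit24 record 8 bits of the page number, for example its low 8 bits. The page number is then used to generate the addresses through which the processor 110 reads the page's access frequency in the storage medium 124. This yields the mapping between data block addresses and the addresses of the storage space holding their access frequencies. A small address space is then enough to map the storage medium's access-frequency storage, which saves address space reserved for the memory 120. The RCD 123 can also use this mapping to determine the physical address through which the processor 110 accesses the storage medium 124. Together, this improves the utilization of the memory chips 121.
In some embodiments, some bits of the page number are used to generate the addresses through which the processor 110 reads page access frequencies in the storage medium 124. For example, the bits remaining after removing chosen bits from the page number form that address space. The more bits removed, the smaller the address space the processor 110 needs; the fewer bits removed, the larger it is. For example, after operating on a page in the memory chips 121, the RCD 123 stores the low 8 bits of the page number in bit31 to bit24 of the counter and increments the access frequency recorded in bit23 to bit0.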
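The counter layout of FIG. 4 (bit31 to bit24 for 8 page-number bits, bit23 to bit0 for the count) can be sketched with plain bit operations. The text leaves open which 8 bits form the tag; this sketch follows the low-8-bit example. The saturating increment on overflow is an assumption, and the function names are hypothetical.

```python
COUNT_BITS = 24
COUNT_MASK = (1 << COUNT_BITS) - 1   # bit23..bit0: access frequency
TAG_MASK = 0xFF                      # bit31..bit24: 8 bits of the page number


def pack(tag: int, count: int) -> int:
    """Build a 32-bit counter word from an 8-bit tag and a 24-bit count."""
    return ((tag & TAG_MASK) << COUNT_BITS) | (count & COUNT_MASK)


def unpack(word: int) -> tuple[int, int]:
    """Return (tag, count) from a 32-bit counter word."""
    return (word >> COUNT_BITS) & TAG_MASK, word & COUNT_MASK


def record_access(word: int, page_no: int) -> int:
    """Store the low 8 bits of page_no as the tag and add one to the count.
    Saturating at 2**24 - 1 is an assumption; the text does not say what
    happens on overflow."""
    _, count = unpack(word)
    return pack(page_no & TAG_MASK, min(count + 1, COUNT_MASK))
```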
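When the physical address received by the RCD 123 indicates storage space of the storage medium 124, the processor 110 is reading data-block access frequencies from the storage medium 124. The RCD 123 performs the read indicated by the operation instruction. This physical address may be the bits remaining after the chosen bits have been removed from the page number. For example, on receiving a third address, the RCD 123 locates the storage space of the storage medium 124 it indicates and reads 32 bits of data. The high 8 bits combined with the third address give the page number, and the remaining 24 bits are the access frequency of that page.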
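The read path is the inverse of the write path. The sketch below assumes, as in the example above, that the tag holds the page number's low 8 bits and that the third address supplies the remaining high bits as a counter index. `decode_counter_word` is a hypothetical name.

```python
TAG_BITS = 8
COUNT_MASK = (1 << 24) - 1


def decode_counter_word(third_addr: int, word: int) -> tuple[int, int]:
    """Recover (page number, access frequency) from one 32-bit word read at
    third_addr. Assumes the tag holds the page number's low 8 bits and that
    third_addr carries the remaining high bits as an entry index."""
    tag = (word >> 24) & 0xFF
    page_no = (third_addr << TAG_BITS) | tag
    return page_no, word & COUNT_MASK
```

For page 0x12345, for instance, the third address would carry 0x123 and the tag 0x45.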
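Step 250: The RCD 123 determines the access heat of the data block from its access frequency.
If the access frequency is greater than or equal to a threshold, the data block is classified as hot. If it is less than the threshold, the data block is classified as cold. The RCD 123 may report the page number and the access heat of that page to the processor 110 to trigger data migration.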
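Step 250 and the resulting migration decision might be sketched as follows. The text does not fix the threshold value; it is determined per application. The promotion/demotion policy in `migration_plan` is a hypothetical illustration of moving hot data to near memory and cold data to far memory, and all names are illustrative.

```python
from enum import Enum


class Heat(Enum):
    HOT = "hot"
    COLD = "cold"


def classify(counts: dict[int, int], threshold: int) -> dict[int, Heat]:
    """Step 250: access frequency >= threshold -> hot, otherwise cold."""
    return {page: Heat.HOT if c >= threshold else Heat.COLD for page, c in counts.items()}


def migration_plan(heat: dict[int, Heat], in_near: set[int]) -> tuple[list[int], list[int]]:
    """Hypothetical policy: promote hot pages not yet in near memory and
    demote cold pages that are. Returns (to_near, to_far)."""
    to_near = sorted(p for p, h in heat.items() if h is Heat.HOT and p not in in_near)
    to_far = sorted(p for p, h in heat.items() if h is Heat.COLD and p in in_near)
    return to_near, to_far
```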
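The processor operates on memory at cacheline granularity, and every cacheline belongs to some managed page. Each time a cacheline in memory is operated on, the page it belongs to is therefore also read or written once. By counting accesses to the pages that own the operated cachelines, the controller identifies page access heat more accurately. Counting per page is also compatible with the computer device's page-based memory management, which makes the method easy to use. When migration is triggered by data-block access heat, hot data is moved to near memory and cold data to far memory. The processor can then fetch frequently accessed data from near memory as quickly as possible, which increases data processing speed, reduces latency, and noticeably improves system access performance.
The foregoing embodiments describe the method using 4 KB pages (small pages) and 2 MB pages (huge pages).
In some other embodiments, an application run by the processor accesses the memory of the computer device in an interleaved manner. The data of memory accesses is then distributed evenly over multiple memory channels in units of storage space (for example, cachelines). The controller of each memory channel counts the access frequencies of interleaved data blocks in the same way as described above for pages.
Compared with the non-interleaved case, there is one difference. An interleaved data block is smaller than a page, and some bits of its address are used to generate the addresses through which the processor 110 reads the interleaved-data-block access frequencies in the storage medium 124.
For example, as shown in FIG. 5, a 2 MB page contains 512 pages of 4 KB. With a 64 B cacheline, each 4 KB page contains 64 cachelines. The 512 × 64 cachelines of the 2 MB page are distributed evenly over 8 memory channels, so each channel receives 512 × 512 B of cachelines. Each memory channel therefore holds a 256 KB interleaved data block.
For example, suppose the page size is 4 KB ($X_{\text{page size}}$ = 4 KB) and data is interleaved across 8 memory channels at cacheline granularity ($N_{\text{channel}}$ = 8). Each channel's interleaved data block is then 512 B. The controller in each memory records access frequencies for 64 GB / 512 B = 128M interleaved data blocks, so its storage medium for them needs 0.5 GB. The computer device must reserve 4 GB of physical address space for the storage media of the 8 channels.
As another example, with a 2 MB page size, 64 GB of memory, and a 4 B counter, each memory reserves 1 MB of address space. With 8 memory channels, 8 MB of memory address space is reserved in total.
In some embodiments, the controller can use the mapping between interleaved-data-block addresses and the addresses of the storage space holding their access frequencies to determine the address through which the processor reads those frequencies from the controller. The controller stores any 8 bits of the interleaved data block's address in bit31 to bit24 of the block's counter and increments the access frequency recorded in bit23 to bit0. The remaining bits of the block's address serve as the physical address through which the processor reads that block's access frequency in the storage medium.
For example, suppose the controller in a memory records access frequencies for 64 GB / 512 B = 128M interleaved data blocks. The physical address space the processor needs to access the storage medium then shrinks to 0.5 GB / 256 = 2 MB. Because the counter is 4 B wide, the minimum data granularity at which the processor reads a counter is 4 B, so the 2 MB physical address space can be reduced by a further factor of 4, to 512 KB. The 8 memory channels then need only 4 MB of reserved physical address space in total, so only 4 MB of memory capacity is wasted.
The controller derives, from the physical address the processor uses to access the storage medium, the physical address in the storage medium that holds the corresponding access frequency. For example, for each memory, the processor periodically reads the interleaved-data-block access frequencies stored by that memory's controller in sequence. The controller detects accesses to the corresponding 512 KB address window, addressed as PA[18:0]. It implements an 8-bit address counter AddrCnt[7:0] that increments by one each time the 512 KB window has been read through once. The controller forms the physical address MAT_Addr = {AddrCnt[7:0], PA[18:0]} and uses it to access the interleaved-data-block access frequencies stored in the controller.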
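The controller-side address generation can be modelled in a few lines. This sketch makes two assumptions, because the text only says AddrCnt advances after the window has been read through once. It treats PA[18:0] as a counter index rather than a byte address, and it detects a completed pass when the last entry of the window is read. The class name is hypothetical.

```python
PA_BITS = 19                      # PA[18:0] indexes the 512 K window
PA_MASK = (1 << PA_BITS) - 1
CNT_MASK = 0xFF                   # AddrCnt[7:0]


class SweepAddressGenerator:
    """Controller-side MAT_Addr = {AddrCnt[7:0], PA[18:0]}. AddrCnt advances
    after each complete pass over the window; a pass is assumed complete when
    the last window entry is read, and PA is assumed to index counters."""

    def __init__(self) -> None:
        self.addr_cnt = 0

    def mat_addr(self, pa: int) -> int:
        pa &= PA_MASK
        addr = (self.addr_cnt << PA_BITS) | pa
        if pa == PA_MASK:                              # window read through once
            self.addr_cnt = (self.addr_cnt + 1) & CNT_MASK
        return addr
```

After 256 complete passes the 8-bit AddrCnt wraps to zero, so the 27-bit MAT_Addr sweeps all 128M counters once per cycle.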
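After reading 32 bits of data, the controller extracts the high 8 bits and concatenates them with PA[18:0]. These 27 bits are the address of the interleaved data block (sub-page), and the low 24 data bits are that block's access frequency. Once the processor has the per-512 B access frequencies from all 8 memory channels, it can combine them into the access frequency of a complete 4 KB page. For example, it can add the 8 channels' access frequencies, or take their mean, variance, maximum, or similar.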
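Decoding a returned counter word into the 27-bit sub-page address, and merging the 8 channels' values into one page figure, could look like the sketch below. The helper names are hypothetical. `pvariance` (population variance) is one possible reading of the variance option mentioned in the text.

```python
from statistics import mean, pvariance

PA_BITS = 19
PA_MASK = (1 << PA_BITS) - 1
COUNT_MASK = (1 << 24) - 1


def decode_sub_page(pa: int, word: int) -> tuple[int, int]:
    """27-bit sub-page address = {word[31:24], PA[18:0]}; frequency = word[23:0]."""
    tag = (word >> 24) & 0xFF
    return (tag << PA_BITS) | (pa & PA_MASK), word & COUNT_MASK


def merge_channels(per_channel: list[int], how: str = "sum") -> float:
    """Combine the per-channel sub-page frequencies into one page figure."""
    reducers = {"sum": sum, "mean": mean, "max": max, "variance": pvariance}
    return reducers[how](per_channel)
```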
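The foregoing embodiments use a single access request from the processor 110 as the example. If an application run by the processor 110 sends multiple access requests to the memory of the computer device, the data block containing each request's physical address is counted in the same way.
The foregoing embodiments, with reference to the accompanying drawings, describe how the controller connects to the near memory of the computer device and how memory access heat is counted for near memory. The memory 120 in those embodiments may be near memory, and the RCD 123 may be the controller that performs memory access heat statistics for the near memory.
In some other embodiments, the computer device may include far memory, and a controller counts access frequencies for it. The difference from the near-memory arrangement is that the controller counting far-memory heat may be an extension controller, distinct from the memory controller in the processor.
Optionally, the computer device includes both far memory and near memory. The RCD in the near memory counts near-memory access frequencies, and the extension controller connected to the far memory counts far-memory access frequencies.
An access-frequency threshold is set per application. Data whose access frequency exceeds the threshold is classified as hot and placed in near memory, and data whose access frequency is below it is classified as cold and placed in far memory. With a threshold that does not hurt application performance, only a small share of the accessed memory data turns out to be hot and most of it is cold. The computer device can therefore be configured with a larger proportion of far memory, reducing system cost. The smaller share of hot data also reduces migration between near and far memory, lowering system memory bandwidth usage and CPU overhead.
As a possible implementation, this application further provides another computer device. As shown in FIG. 6, in addition to the processor 110 and the memory 120, the computer device 100 includes a memory 610. The memory 120 may serve as near memory and the memory 610 as far memory. The processor 110 and the memory 610 are connected through a bus 620.
The memory 610 includes an extension controller 611, memory chips 612, and a storage medium 613. The memory chips 612 include DRAM and SCM, and the storage medium 613 may be DRAM.
The extension controller 611 reads or writes the memory chips 612 according to access requests from the processor 110. It counts the access frequencies of data blocks in the memory chips 612 and stores them in the storage medium 613. The extension controller 611 may also be connected to an external storage medium 614 that extends the capacity of the storage medium 613 for storing data-block access frequencies.
The extension controller 611 can also read data-block access frequencies from the storage medium 613.
For the functions of the extension controller 611, the way it counts access frequencies of data blocks in the memory chips 612 of the memory 610, and the way it reads those frequencies from the storage medium 613, refer to the foregoing description of how the controller counts near-memory access frequencies.
The memory 610 may be a volatile or non-volatile memory pool, or may include both. The non-volatile memory may be ROM, PROM, EPROM, EEPROM, or flash memory. The volatile memory may be RAM. By way of example and not limitation, many forms of RAM are available, such as SRAM, DRAM, SDRAM, DDR SDRAM, ESDRAM, SLDRAM, and DR RAM.
The bus 620 may include a path for transferring data between the foregoing components, such as the processor 110 and the memory 610. Examples are access requests sent by the processor 110 to the memory 610 and data-block access frequencies returned by the memory 610 to the processor 110. In addition to a data bus, the bus 620 may include a power bus, a control bus, a status signal bus, and the like; for example, the bus 620 is a DDR bus. For clarity, all buses are labeled as bus 620 in the figure. The bus 620 may also be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), compute express link (CXL), cache coherent interconnect for accelerators (CCIX), or the like. It may be divided into an address bus, a data bus, a control bus, and so on.
This embodiment describes the structure of the near memory and far memory in the computer device. For the memory access heat statistics method for both, refer to the foregoing description for near memory. The RCD in the near memory counts near-memory access frequencies and the extension controller connected to the far memory counts far-memory access frequencies, so almost no CPU resources are consumed. Because the processor operates on memory at cacheline granularity and every cacheline belongs to some managed page, each operation on a cacheline is also one read or write of its page. Counting accesses to the owning pages therefore identifies page heat more accurately. Counting per page is also compatible with the device's page-based memory management, which makes the method easy to use. In addition, an access-frequency threshold is set per application. Data above the threshold is classified as hot and placed in near memory, and data below it is classified as cold and placed in far memory. With a threshold that does not hurt application performance, only a small share of accessed memory data turns out to be hot and most of it is cold. The computer device can therefore be configured with a larger proportion of far memory, reducing system cost. The smaller share of hot data also reduces migration between near and far memory, lowering system memory bandwidth usage and CPU overhead.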
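To implement the functions in the foregoing embodiments, the controller includes hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that the units and method steps of the examples described with the disclosed embodiments can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application scenario and the design constraints of the technical solution.
The memory access heat statistics method provided in this application is described in detail above with reference to FIG. 1 to FIG. 6. The memory access heat statistics apparatus provided in this application is described below with reference to FIG. 7.
FIG. 7 is a schematic structural diagram of a possible memory access heat statistics apparatus according to this application. Such apparatuses can implement the functions of the controller in the foregoing method embodiments and therefore also achieve their beneficial effects. In this embodiment, the apparatus may be the controller shown in FIG. 2, or a module (such as a chip) applied to a computer device.
As shown in FIG. 7, the memory access heat statistics apparatus 700 includes a communication module 710, a statistics module 720, and a storage module 730. The apparatus 700 implements the functions of the controller in the method embodiment shown in FIG. 2.
The communication module 710 is configured to determine a first address from an obtained access request. The access request indicates an operation, by an application run by the processor of the computer device in which the controller is located, on the memory of that computer device. The first address is a physical address in the memory.
The statistics module 720 is configured to count the access frequency of the data block in which the first address is located, where the size of the data block is a multiple of the unit storage space of the memory accessible to the processor. For example, the statistics module 720 performs step 210 to step 250 in FIG. 2.
The statistics module 720 is specifically configured to count the access frequency of the data block that contains the unit storage space to which the first address belongs.
The storage module 730 is configured to store access frequencies, so that the statistics module 720 can determine the access heat of the data block from its access frequency and trigger data migration.
The memory access heat statistics apparatus 700 in this embodiment may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), a DPU, an SoC, or any combination thereof. When the memory access heat statistics method shown in FIG. 2 is implemented in software, the apparatus 700 and its modules may also be software modules.
The memory access heat statistics apparatus 700 according to this embodiment may correspondingly perform the method described in the embodiments of this application. The foregoing and other operations and/or functions of the units in the apparatus 700 respectively implement the corresponding procedures of the method in FIG. 2. For brevity, details are not repeated here.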
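FIG. 8 is a schematic structural diagram of a controller 800 according to this application. As shown in the figure, the controller 800 includes a processing unit 810, a bus 820, a storage unit 830, and a communication interface 840. The processing unit 810, the storage unit 830, and the communication interface 840 are connected through the bus 820.
In this embodiment, the processing unit 810 may be a CPU. It may also be another general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The communication interface 840 enables communication between the controller 800 and external devices or components. In this embodiment, when the controller 800 implements the functions of the controller shown in FIG. 2, the communication interface 840 is used to obtain access requests.
The bus 820 may include a path for transferring information between the foregoing components, such as the processing unit 810 and the storage unit 830. In addition to a data bus, the bus 820 may include a power bus, a control bus, a status signal bus, and the like; for clarity, all buses are labeled as bus 820 in the figure. The bus 820 may be a Peripheral Component Interconnect Express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), compute express link (CXL), cache coherent interconnect for accelerators (CCIX), or the like. It may be divided into an address bus, a data bus, a control bus, and so on.
As an example, the controller 800 may include multiple processors, and a processor may be a multi-core processor. A processor here may refer to one or more devices, circuits, and/or computing units for processing data (for example, computer program instructions). In this embodiment, when the controller 800 implements the functions of the controller shown in FIG. 2, the processing unit 810 counts the access frequency of the data block in which the first address is located.
Optionally, the method of counting data-block access frequencies may also be burned into the processing unit 810, so that the processing unit 810 can count data-block access frequencies.
FIG. 8 shows only one processing unit 810 and one storage unit 830 in the controller 800 as an example. The processing unit 810 and the storage unit 830 each represent a type of component or device. In a specific embodiment, the number of each type can be determined according to service requirements.
The storage unit 830 may correspond to the storage used for access frequencies and similar information in the foregoing method embodiments. It may be a volatile or non-volatile memory pool, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available. These include static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
The storage unit 830 may also correspond to the storage medium used for computer instructions and similar information in the foregoing method embodiments, for example a disk such as a mechanical hard disk or a solid-state drive.
The controller 800 may be a general-purpose device or a dedicated device. For example, the controller 800 may also be a server or another device with computing capability.
The controller 800 according to this embodiment may correspond to the memory access heat statistics apparatus 700 in this embodiment, and to the entity that performs any method in FIG. 2. The foregoing and other operations and/or functions of the modules in the apparatus 700 respectively implement the corresponding procedures of the method in FIG. 2. For brevity, details are not repeated here.
The controller 800 in this embodiment of this application may be the RCD controller in the memory of a computer device, or an extension controller connected to the memory of a computer device.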
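An embodiment of this application further provides a chip, including a processor and a power supply circuit. The power supply circuit supplies power to the processor, and the processor performs the memory access heat statistics method described in the foregoing embodiments.
An embodiment of this application further provides a memory, including a storage and a controller. The storage stores a set of computer instructions, and when the controller executes them it performs the memory access heat statistics method described in the foregoing embodiments.
An embodiment of this application further provides a mainboard including a controller, which performs the memory access heat statistics method described in the foregoing embodiments.
The method steps in this embodiment may be implemented in hardware or by a processor executing software instructions. The software instructions may consist of corresponding software modules. These may be stored in random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from it and write information to it. The storage medium may also be part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a computing device. The processor and the storage medium may also exist in the computing device as discrete components.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented wholly or partly as a computer program product, which includes one or more computer programs or instructions. When these are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are performed wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, they may be transmitted by wired or wireless means from one website, computer, server, or data center to another. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium such as a floppy disk, hard disk, or magnetic tape; an optical medium such as a digital video disc (DVD); or a semiconductor medium such as a solid-state drive (SSD).
The foregoing descriptions are merely specific implementations of this application, and the protection scope of this application is not limited to them. Any variation or replacement readily conceived by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. The protection scope of this application shall therefore be subject to the protection scope of the claims.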

Claims (21)

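  1. A memory access heat statistics method, wherein the method is performed by a controller and comprises:
    determining a first address according to an obtained access request, wherein the access request indicates an operation, by an application run by a processor in a computer device in which the controller is located, on a memory of the computer device, and the first address is a physical address in the memory;
    counting an access frequency of a data block in which the first address is located, wherein a size of the data block is a multiple of a unit storage space of the memory accessible to the processor; and
    determining an access heat of the data block according to the access frequency.
  2. The method according to claim 1, wherein the counting an access frequency of a data block in which the first address is located comprises:
    counting an access frequency of a data block containing the unit storage space to which the first address belongs.
  3. The method according to claim 2, wherein the method further comprises:
    identifying a second address according to an address of the data block in which the first address is located and an address mapping relationship, wherein the second address indicates a location in the controller at which the access frequency of the data block is stored, and the address mapping relationship indicates a mapping between the address of the data block and an address of a storage space storing the access frequency.
  4. The method according to claim 3, wherein the method further comprises:
    identifying the second address according to a third address indicated by the processor and the address mapping relationship, and obtaining the access frequency of the data block, wherein the third address is determined from the second address.
  5. The method according to any one of claims 1 to 4, wherein the size of the data block is a size of one page of the memory accessible to the processor.
  6. The method according to any one of claims 1 to 4, wherein the size of the data block is a size of an interleaved data block in the memory accessible to the processor in an interleaved manner, and the interleaved manner means that the processor in the computer device distributes data of the operations by the running application on the memory of the computer device over a plurality of memories.
  7. The method according to any one of claims 1 to 6, wherein a size of the unit storage space is a size of a cacheline used when the processor accesses the memory.
  8. A memory access heat statistics apparatus, comprising:
    a communication module, configured to determine a first address according to an obtained access request, wherein the access request indicates an operation, by an application run by a processor in a computer device in which a controller is located, on a memory of the computer device, and the first address is a physical address in the memory; and
    a statistics module, configured to count an access frequency of a data block in which the first address is located, wherein a size of the data block is a multiple of a unit storage space of the memory accessible to the processor;
    wherein the statistics module is further configured to determine an access heat of the data block according to the access frequency.
  9. The apparatus according to claim 8, wherein, when counting the access frequency of the data block in which the first address is located, the statistics module is specifically configured to:
    count an access frequency of a data block containing the unit storage space to which the first address belongs.
  10. The apparatus according to claim 9, wherein the statistics module is further configured to:
    identify a second address according to an address of the data block in which the first address is located and an address mapping relationship, wherein the second address indicates a location in the controller at which the access frequency of the data block is stored, and the address mapping relationship indicates a mapping between the address of the data block and an address of a storage space storing the access frequency.
  11. The apparatus according to claim 10, wherein the statistics module is further configured to:
    identify the second address according to a third address indicated by the processor and the address mapping relationship, and obtain the access frequency of the data block, wherein the third address is determined from the second address.
  12. The apparatus according to any one of claims 8 to 11, wherein the size of the data block is a size of one page of the memory accessible to the processor.
  13. The apparatus according to any one of claims 8 to 11, wherein the size of the data block is a size of an interleaved data block in the memory accessible to the processor in an interleaved manner, and the interleaved manner means that the processor in the computer device distributes data of the operations by the running application on the memory of the computer device over a plurality of memories.
  14. The apparatus according to any one of claims 8 to 13, wherein a size of the unit storage space is a size of a cacheline used when the processor accesses the memory.
  15. A controller, wherein the controller comprises a storage unit and a processing unit, the storage unit is configured to store a set of computer instructions, and when the processing unit executes the set of computer instructions, the operation steps of the method according to any one of claims 1 to 7 are performed to identify the access frequency with which data blocks in a memory are accessed by applications.
  16. The controller according to claim 15, wherein the controller is a register clock driver (RCD) in a memory of a computer device.
  17. The controller according to claim 15, wherein the controller is an extension controller connected to a memory of a computer device.
  18. A chip, comprising a processor and a power supply circuit, wherein the power supply circuit is configured to supply power to the processor, and the processor is configured to perform the operation steps of the method according to any one of claims 1 to 7.
  19. A memory, wherein the memory comprises a storage and the controller according to any one of claims 15 to 17, the storage is configured to store a set of computer instructions, and when the controller executes the set of computer instructions, the operation steps of the method according to any one of claims 1 to 7 are performed to identify the access frequency with which data blocks in the memory are accessed by applications.
  20. A mainboard, wherein the mainboard comprises the controller according to any one of claims 15 to 17, and the controller performs the operation steps of the method according to any one of claims 1 to 7 to identify the access frequency with which data blocks in a memory are accessed by applications.
  21. A computer device, wherein the computer device comprises the mainboard according to claim 20.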
PCT/CN2023/095923 2022-05-24 2023-05-24 Memory access heat statistics method, related apparatus, and device WO2023227004A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202210575022 2022-05-24
CN202210575022.5 2022-05-24
CN202211016161.0 2022-08-24
CN202211016161.0A CN117149049A (zh) 2022-05-24 2022-08-24 Memory access heat statistics method, related apparatus, and device

Publications (1)

Publication Number Publication Date
WO2023227004A1 true WO2023227004A1 (zh) 2023-11-30

Family

ID=88884861

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/095923 WO2023227004A1 (zh) 2022-05-24 2023-05-24 Memory access heat statistics method, related apparatus, and device

Country Status (2)

Country Link
CN (1) CN117149049A (zh)
WO (1) WO2023227004A1 (zh)

Also Published As

Publication number Publication date
CN117149049A (zh) 2023-12-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23811075

Country of ref document: EP

Kind code of ref document: A1