WO2022160321A1 - Method and apparatus for accessing memory - Google Patents

Method and apparatus for accessing memory

Info

Publication number
WO2022160321A1
Authority
WO
WIPO (PCT)
Prior art keywords
access
memory
address
storage area
bit
Prior art date
Application number
PCT/CN2021/074562
Other languages
French (fr)
Chinese (zh)
Inventor
范团宝
崔永
俞东斌
肖勇军
刘晨
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180077267.9A (CN116472520A)
Priority to PCT/CN2021/074562 (WO2022160321A1)
Publication of WO2022160321A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication

Definitions

  • the present application relates to the technical field of memory access, and in particular, to a method and apparatus for accessing memory.
  • the low power double data rate (LPDDR) 5 memory chip defined in the memory standard has a maximum read and write access rate of 5500Mbps.
  • LPDDR5 low power double data rate
  • JEDEC Joint Electron Device Engineering Council
  • the JEDEC standard defines the constraints of continuous access, that is, it is recommended that the commands for continuous access to the memory chip hit different bank groups (BG).
  • BG interleaving is interleaving between different BGs.
  • the constraint of BG interleaving imposes strict requirements on the address sequence of the SoC accessing the memory chip, which may cause most applications to fail to obtain the highest access rate of the memory chip.
  • a DDR controller and a port physical layer (PHY) are integrated in the SoC, the memory chip includes multiple channels (Channels), and each channel includes 4 × 4 banks.
  • a patchwork control logic is integrated in the DDR controller. For example, the logic can advance the sending order of access commands that arrived later in time, piecing them together with earlier access commands so that they are sent consecutively; or it can delay the sending order of access commands that arrived earlier in time, piecing them together with later access commands so that they are sent consecutively. In BG mode, the patchwork control logic can reorder the access commands stored in the buffer of the DDR controller.
  • the access commands are arranged in the rearranged order (for example, the three access commands that access BG1 in Figure 1 are pieced together) and sent to the memory chip through the PHY.
  • the rearranged order lets the access commands satisfy the BG interleaving constraint as far as possible, which improves the probability that accesses conform to BG interleaving.
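The reordering idea described above can be illustrated with a minimal sketch. It is not the patchwork control logic of any real DDR controller: the function names and the greedy policy (always send the oldest buffered command whose BG differs from the one just sent) are assumptions made purely for illustration.

```python
def reorder_for_bg_interleave(bank_groups):
    """Greedily reorder buffered commands, given as a list of BG indices.

    Hypothetical policy: send the oldest pending command that does not hit
    the BG of the previously sent command; if none exists, send the oldest.
    """
    pending = list(bank_groups)
    ordered = []
    while pending:
        idx = next(
            (i for i, bg in enumerate(pending) if not ordered or bg != ordered[-1]),
            0,
        )
        ordered.append(pending.pop(idx))
    return ordered


def back_to_back_same_bg(sequence):
    """Count adjacent command pairs that hit the same BG."""
    return sum(1 for a, b in zip(sequence, sequence[1:]) if a == b)


if __name__ == "__main__":
    buffered = [1, 1, 1, 0, 2]
    print(back_to_back_same_bg(buffered))
    print(reorder_for_bg_interleave(buffered))
```

When every buffered command targets the same BG, no ordering helps, which is the limitation the rest of the description addresses.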
  • Embodiments of the present application provide a method and apparatus for accessing memory, which can improve the power consumption performance of a memory chip.
  • the following technical solutions are adopted in the embodiments of the present application.
  • in a first aspect, a memory access device is provided, including: a controller configured to determine, according to an access address of a first access command, whether the first access command accesses a first storage area or a second storage area in a memory chip, where the first storage area does not overlap with the second storage area; and an interleaver configured to, when the controller determines that the first access command accesses the first storage area, interleave the address of the first access command according to a first interleaving scheme to obtain an interleaved access address, and, when the controller determines that the first access command accesses the second storage area, interleave the address of the first access command according to a second interleaving scheme to obtain an interleaved access address, where the interleaved access address is used to access the memory chip.
  • the memory access device provided by the application can use different interleaving schemes to interleave the addresses of access commands for multiple different storage areas to obtain the interleaved access addresses; that is, the interleaving scheme used differs from one storage area to another. It can be understood that different interleaving schemes make the access command access different memory bank groups (e.g. BGs) of the memory chip. For example, the number of memory bank groups accessed by the access command differs, which results in different power consumption of the memory chip.
  • the first interleaving scheme can be used to enable the first access command to access fewer memory bank groups in the first storage area; for a business scenario with a high power consumption requirement, the second interleaving scheme can be adopted so that the first access command accesses more memory bank groups in the second storage area. Therefore, when accessing the memory chip, the memory access device provided by the present application can use different interleaving schemes to access the storage area corresponding to each business scenario, according to the different power consumption requirements of those scenarios, and can thereby optimize the power consumption of the memory chip in different application scenarios.
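As a rough illustration of the controller's decision described above, the following sketch picks an interleaving scheme from the access address. The address ranges, the `Region` type, and the scheme labels are hypothetical; the source only states that the two storage areas do not overlap and that each uses its own scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int   # inclusive start address (assumed value)
    end: int     # exclusive end address (assumed value)
    scheme: str  # label of the interleaving scheme used for this region


# Hypothetical, non-overlapping address ranges for the two storage areas.
REGIONS = (
    Region("first storage area", 0x0000_0000, 0x4000_0000, "first_scheme"),
    Region("second storage area", 0x4000_0000, 0x8000_0000, "second_scheme"),
)


def select_scheme(access_address):
    """Return the interleaving scheme for the region containing the address."""
    for region in REGIONS:
        if region.start <= access_address < region.end:
            return region.scheme
    raise ValueError(f"address {access_address:#x} is outside both storage areas")


if __name__ == "__main__":
    print(select_scheme(0x0000_1000))
    print(select_scheme(0x4000_0000))
```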
  • the second bandwidth required to access the second data in the second storage area is higher than the first bandwidth required to access the first data in the first storage area.
  • the access bandwidth of the memory chip can be lower, the power consumption is lower, and the energy efficiency is higher;
  • the second interleaving scheme is used for address interleaving, which can make the access bandwidth of the memory chip higher.
  • the access bandwidth actually used bandwidth
  • the second interleaving scheme adopted in this application can make the access bandwidth inside the memory chip higher, so that the internal access bandwidth of the memory chip can keep up with the access bandwidth of the memory chip's external bus, and the bandwidth usage efficiency is higher.
  • the bandwidth usage efficiency can be understood as the ratio of the bandwidth actually used by the memory chip in one access to the full bandwidth of the memory chip, or as the proportion of the data actually transmitted by the memory chip in one access to the total amount of data that could be transmitted at full bandwidth.
  • if the ratio is small, the bandwidth usage efficiency of the memory chip is low and bandwidth is wasted.
  • the small piece of data only occupies a part of the full bandwidth of the memory chip, and another part of the bandwidth is occupied but not used, that is, no data is transmitted, and part of the bandwidth is wasted.
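The ratio defined above can be written as a one-line calculation. This is a minimal sketch; the function and parameter names are assumptions, and the byte counts in the usage lines are illustrative values rather than figures from the source.

```python
def bandwidth_usage_efficiency(bytes_transferred, bytes_at_full_bandwidth):
    """Fraction of the full-bandwidth capacity actually used in one access."""
    if bytes_at_full_bandwidth <= 0:
        raise ValueError("full-bandwidth capacity must be positive")
    if not 0 <= bytes_transferred <= bytes_at_full_bandwidth:
        raise ValueError("transferred data must fit within the full bandwidth")
    return bytes_transferred / bytes_at_full_bandwidth


if __name__ == "__main__":
    # A small piece of data that fills only half of what one access could carry.
    print(bandwidth_usage_efficiency(64, 128))
    print(bandwidth_usage_efficiency(128, 128))
```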
  • the controller is further configured to split the first access command into multiple subcommands, and the interleaved access address includes the access addresses of the multiple subcommands; in the second interleaving scheme, the access addresses of the multiple subcommands are respectively used to access different memory bank groups in the second storage area, and each of the different memory bank groups includes multiple memory banks; in the first interleaving scheme, the access addresses of the multiple subcommands are used to access different banks in the same bank group in the first storage area, or the same bank in the same bank group.
  • when the second interleaving scheme is adopted so that multiple subcommands access different memory bank groups in the second storage area (the memory bank group may be the BG in this application), multiple memory bank groups are in effect processing data at the same time: multiple memory bank groups are activated and accessed simultaneously, the memory chip is accessed faster, and the bandwidth used is higher.
  • when the first interleaving scheme is adopted so that multiple subcommands access the same memory bank group in the first storage area, or even the same memory bank in the same memory bank group, only one memory bank group is in effect processing data: only one memory bank group is activated for access, the power consumption of the memory chip is lower, and the energy efficiency is higher.
  • each subcommand of the plurality of subcommands includes a plurality of bits for indicating an access address of the subcommand, each of the plurality of bits corresponds to an address line of the access address, and the amount of data accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1. It can be understood that if the amount of data accessed by the first access command is 2 × 2^(M-1) units and the first access command is divided into 2 subcommands, the amount of data accessed by each subcommand is 2^(M-1) units; the unit can be Byte.
  • the M-th bit and the N-th bit from the low-order bits are used to jointly indicate the memory bank group accessed by each subcommand, where N is an integer greater than M.
  • Multiple bits can be understood as multiple bits corresponding to the first address of the amount of data to be accessed carried by the subcommand.
  • the bandwidth usage efficiency of the memory chip is relatively high. This requires that in the address information indicated by the address lines corresponding to the first addresses of different subcommands, the address information indicating the memory bank group is different.
  • the first address of each subcommand is determined according to the first address of the previously determined subcommand and the data length accessed by each subcommand, the first address of the previous subcommand is offset by 2 M
  • the address information of the Mth bit from the low order among the multiple bits of the previous subcommand and the Mth bit from the low order of the multiple bits of the next subcommand is different; when the address information of the M-th bit corresponding to different subcommands changes, it means that the memory bank group accessed by the indicated subcommand in the multiple bits of different subcommands changes, which further means that the first access command Split subcommands can access different memory bank groups.
  • the bandwidth for accessing the second storage area is relatively high. The M-th bit from the low order among the multiple bits in the subcommand indicates the low bit of the accessed memory bank group, and the N-th bit, which is higher than the M-th bit, indicates the high bit of the accessed memory bank group.
  • the R-th bit and the S-th bit from the low-order bits are used to indicate the memory bank accessed by each subcommand, where R is an integer greater than N, and S is an integer greater than R.
  • each memory bank group includes 4 memory banks. Therefore, the address information of the memory bank can also be transmitted using the address lines corresponding to 2 of the multiple bits.
  • the memory bank includes bank0, bank1, bank2 and bank3.
  • a 2-bit address line is used to transmit the address information of the bank.
  • the bit indicating the memory bank may be higher than the bit indicating the memory bank group.
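The bit positions described above for the second interleaving scheme can be tried out in a short sketch. The concrete values M=7, N=8, R=9 and S=10 are assumptions chosen only to satisfy M < N < R < S with 2^(M-1) = 64 Byte per subcommand; the source does not fix them. Bit positions are counted from 1 at the low-order end, as in the text.

```python
def bit(address, position):
    """Return the bit at a 1-indexed position counted from the low-order end."""
    return (address >> (position - 1)) & 1


def decode_second_scheme(address, m=7, n=8, r=9, s=10):
    """Decode (bank group, bank) with hypothetical bit positions M, N, R, S.

    The M-th bit is the low bit of the bank group and the N-th bit its high
    bit; the R-th and S-th bits select the bank within the group.
    """
    bank_group = (bit(address, n) << 1) | bit(address, m)
    bank = (bit(address, s) << 1) | bit(address, r)
    return bank_group, bank


if __name__ == "__main__":
    # Four consecutive 64-Byte subcommands: each one lands in a different BG.
    for sub_address in (0, 64, 128, 192):
        print(hex(sub_address), decode_second_scheme(sub_address))
```

Because consecutive subcommands are 2^(M-1) units apart, the M-th bit toggles on every subcommand, which is why neighbouring subcommands reach different bank groups in this sketch.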
  • the P-th bit and the Q-th bit from the low-order bits are used to jointly indicate the memory bank group accessed by each subcommand, and P is an integer greater than M , Q is an integer greater than P.
  • P is an integer greater than M
  • Q is an integer greater than P.
  • the address information of the P-th bit from the low-order bit in the multiple bits of the subcommand is the same until the accessed data volume is accumulated to 2^(P-1) kb.
  • when the address information of the P-th bit corresponding to different subcommands is the same, the memory bank groups indicated in the multiple bits of the different subcommands are all the same within this address range.
  • when the access address of the command is within this continuous address range, the subcommands obtained by splitting the first access command also access the same memory bank group. Therefore, with the first interleaving scheme of the present application, the power consumption of accessing the first storage area is low and the energy efficiency is high.
  • the Jth bit and the Kth bit starting from the lower order of the plurality of bits are used to indicate the memory bank accessed by each subcommand, where J is an integer greater than Q, and K is an integer greater than J. That is, in the first interleaving scheme of the present application, the bit indicating the memory bank is also higher than the bit indicating the memory bank group.
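For contrast, the first interleaving scheme can be sketched the same way. The values P=13, Q=14, J=15 and K=16 are assumptions chosen only so that M < P < Q < J < K for a 64-Byte subcommand (M=7); the source does not specify them. Bit positions are again 1-indexed from the low-order end.

```python
def bit(address, position):
    """Return the bit at a 1-indexed position counted from the low-order end."""
    return (address >> (position - 1)) & 1


def decode_first_scheme(address, p=13, q=14, j=15, k=16):
    """Decode (bank group, bank) with hypothetical bit positions P, Q, J, K."""
    bank_group = (bit(address, q) << 1) | bit(address, p)
    bank = (bit(address, k) << 1) | bit(address, j)
    return bank_group, bank


if __name__ == "__main__":
    # Consecutive 64-Byte subcommands stay in one bank group until the
    # P-th bit changes (at address 2**(P-1) = 4096 in this sketch).
    groups = {decode_first_scheme(a)[0] for a in range(0, 4096, 64)}
    print(groups)
    print(decode_first_scheme(4096))
```

Since the bank-group bits sit well above the subcommand size, a long run of consecutive subcommands activates a single bank group, matching the low-power behaviour described for the first storage area.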
  • the controller of the present application can split the first access command into a greater number of subcommands, so that, when address interleaving is performed, the larger number of subcommands access different memory bank groups at the same time and the bandwidth usage efficiency is higher.
  • a memory access method is provided, comprising: determining, according to an access address of a first access command, whether the first access command accesses a first storage area or a second storage area in a memory chip, where the first storage area does not overlap with the second storage area; when it is determined that the first access command accesses the first storage area, interleaving the address of the first access command according to a first interleaving scheme to obtain an interleaved access address; and when it is determined that the first access command accesses the second storage area, interleaving the address of the first access command according to a second interleaving scheme to obtain an interleaved access address, where the interleaved access address is used to access the memory chip.
  • the beneficial effects of the second aspect please refer to the description of the beneficial effects of the first aspect.
  • the second bandwidth required to access the second data in the second storage area is higher than the first bandwidth required to access the first data in the first storage area.
  • the method further includes: splitting the first access command into multiple subcommands, where the interleaved access address includes the access addresses of the multiple subcommands; in the second interleaving scheme, the access addresses of the multiple subcommands are respectively used to access different memory bank groups in the second storage area, and each of the different memory bank groups includes multiple memory banks; in the first interleaving scheme, the access addresses of the multiple subcommands are used to access different banks in the same bank group in the first storage area, or the same bank in the same bank group.
  • each subcommand of the plurality of subcommands includes a plurality of bits for indicating an access address of the subcommand, each of the plurality of bits corresponds to an address line of the access address, and the amount of data accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1.
  • the M-th bit and the N-th bit from the low-order bits are used to jointly indicate the memory bank group accessed by each subcommand; N is an integer greater than M .
  • the R-th bit and the S-th bit from the low-order bits are used to indicate the memory bank accessed by each subcommand, where R is an integer greater than N, and S is an integer greater than R.
  • the P-th bit and the Q-th bit from the low-order bits are used to jointly indicate the memory bank group accessed by each subcommand, and P is an integer greater than M , Q is an integer greater than P.
  • the Jth bit and the Kth bit starting from the lower order of the plurality of bits are used to indicate the memory bank accessed by each subcommand, where J is an integer greater than Q, and K is an integer greater than J.
  • when the first access command accesses the first storage area, the first access command is split into X subcommands; when the first access command accesses the second storage area, the first access command is split into Y subcommands; Y is greater than X, and X and Y are integers greater than 1.
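The region-dependent split can be sketched as follows. The values X=2 and Y=4, the 256-Byte command length, and the function name are assumptions for illustration; the source only requires Y > X > 1.

```python
def split_access(start_address, length, count):
    """Split one access into `count` equal subcommands as (address, size) pairs."""
    if count < 2:
        raise ValueError("a command is split into at least 2 subcommands")
    if length % count:
        raise ValueError("length must divide evenly among the subcommands")
    size = length // count
    return [(start_address + i * size, size) for i in range(count)]


X_SUBCOMMANDS = 2  # hypothetical split count for the first storage area
Y_SUBCOMMANDS = 4  # hypothetical split count for the second storage area (Y > X)

if __name__ == "__main__":
    print(split_access(0x1000, 256, X_SUBCOMMANDS))
    print(split_access(0x1000, 256, Y_SUBCOMMANDS))
```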
  • a third aspect provides a communication chip, where the communication chip includes the memory access device described in the first aspect or any possible design of the first aspect.
  • in a fourth aspect, an electronic device is provided, including the memory access device according to the first aspect or any possible design of the first aspect.
  • a computer-readable storage medium is provided, comprising computer instructions which, when executed on an electronic device, cause the electronic device to perform the method described in the first aspect or any possible design of the first aspect.
  • a sixth aspect provides a computer program product that, when the computer program product runs on a computer, enables an electronic device to perform the method described in the first aspect or any possible design of the first aspect.
  • FIG. 1 is a schematic diagram of accessing a memory chip using patchwork control logic according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a storage structure of a DDR particle provided by an embodiment of the present application
  • FIG. 3 is a schematic diagram of an access to a Channel X of LPDDR5 in a BG mode provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of transmission of an access command and data output in a BG mode provided by an embodiment of the present application
  • (a) in FIG. 5 is a schematic diagram of the simulated bandwidth requirement of a GPU accessing LPDDR5 in BG mode provided by the embodiment of the present application;
  • (b) in FIG. 5 is a schematic diagram of the measured bandwidth requirement of a GPU accessing LPDDR5 in BG mode provided by the embodiment of the present application;
  • FIG. 6 is a schematic diagram of the division of a storage area according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of different service types accessing different storage areas in a game scenario provided by an embodiment of the present application.
  • FIG. 8 is a frame diagram of a memory access device provided by the application.
  • FIG. 9 is a schematic flowchart of a memory allocation process provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of a memory access method provided by an embodiment of the present application.
  • FIG. 11 is a schematic diagram of address information indicating a memory bank group provided by an embodiment of the present application.
  • FIG. 12 is a schematic diagram of address information indicating a memory bank group and a memory bank provided by an embodiment of the present application;
  • FIG. 13 is a schematic diagram of interleaving of a second interleaving scheme provided by an embodiment of the present application.
  • FIG. 14 is a schematic diagram of using a second interleaving scheme to access different BGs according to an embodiment of the present application
  • FIG. 15 is a schematic diagram of interleaving of another second interleaving scheme provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of using a second interleaving scheme to access different BGs according to an embodiment of the present application
  • FIG. 17 is a schematic diagram of address information indicating a memory bank group provided by an embodiment of the present application.
  • FIG. 18 is a schematic diagram of address information indicating a memory bank group and a memory bank provided by an embodiment of the present application
  • FIG. 19 is a schematic diagram of interleaving of a first interleaving scheme provided by an embodiment of the present application.
  • FIG. 20 is a schematic diagram of accessing the same BG using a first interleaving scheme provided by an embodiment of the present application
  • FIG. 21 is a schematic structural diagram of a SoC provided by an embodiment of the application.
  • FIG. 22 is a schematic structural diagram of a DDR controller according to an embodiment of the present application.
  • FIG. 23 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • each memory chip (which can be the DDR particle in this application) includes multiple Channels, and each Channel can be accessed through two DQ (data input/output) Byte interfaces.
  • multiple BGs can be accessed through each DQ Byte interface, a BG includes multiple banks, and each bank includes multiple small storage areas (cells) arranged in rows and columns.
  • FIG. 2 shows a schematic diagram of the structure included in the DDR particle.
  • the DDR particle/chip is an LPDDR5 chip as an example, and the LPDDR5 chip is referred to as LPDDR5 for short.
  • the DDR particle in Figure 2 includes one channel, namely Channel X. Channel X can be accessed through the DQ Byte0 interface and the DQ Byte1 interface (illustrated as DQ Byte 0 and DQ Byte 1 in Figure 2), and the burst length (BL) accessible through each DQ Byte interface can be 128 Byte, 64 Byte, etc.
  • a DQ Byte interface can access one or more of the 4 BGs (BG0, BG1, BG2, and BG3).
  • each BG corresponds to 4 banks, bank0, bank1, bank2, and bank3, and 4 BGs include 16 banks.
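The channel organisation described above (4 BGs, each with 4 banks) can be enumerated in a few lines. This is only a sketch of the naming used in the text; the constant and function names are assumptions.

```python
from itertools import product

BANK_GROUPS = ("BG0", "BG1", "BG2", "BG3")
BANKS = ("bank0", "bank1", "bank2", "bank3")


def channel_banks():
    """List every (bank group, bank) pair reachable in one channel."""
    return list(product(BANK_GROUPS, BANKS))


if __name__ == "__main__":
    banks = channel_banks()
    print(len(banks))
    print(banks[:4])
```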
  • the internal access bandwidth of LPDDR5 is only half of the access bandwidth of the external bus. In other words, the external access rate of LPDDR5 is so high that its internal access rate cannot keep up, and half of the access bandwidth of LPDDR5 transmits no data, so the bandwidth of LPDDR5 is wasted and its data transmission efficiency is lower.
  • the JEDEC standard defines continuous access constraints for BG mode; that is, it is recommended that in BG mode consecutive commands accessing LPDDR5 access different BGs, so that the commands accessing LPDDR5 are interleaved between BGs.
  • the first command sent by the CPU is used to read bank0 of BG0, followed by the second command to read bank0 of BG1 (a BG other than BG0); the BG accessed by the first command is different from the BG accessed by the second command.
  • the commands with which the CPU accesses the DDR particle are sent over the command and address bus (CA). Multiple access commands are sent to LPDDR5 in advance, and the data of the multiple access commands is then output together; that is, the data read by one access command is not output immediately after it is read, but is output together once the data of the multiple access commands has been read.
  • CA Command and Address bus
  • LPDDR5 can execute the second command in advance while the data of the first command is about to be transmitted, that is, read the data in BG1 bank0 in advance. By the time the 128bit data of BG0 bank0 is transmitted, the 128bit data of BG1 bank0 has already been read, so the data of BG1 bank0 can be output together with the data of BG0 bank0. In other words, LPDDR5 can output a 256bit BL access, so that the internal access bandwidth matches the access bandwidth of the external bus.
  • This BG interleaving access method has been defined in the JEDEC standard, that is, the access in BG mode needs to be interleaved between BGs, as shown in Figure 4, CK_t and CK_c in Figure 4 represent the clock for the device inside the SoC to send the access command (clock), the access command can be sent in advance at the rising edge time such as T0, T1 and T2.
  • the commands shown in Figure 4 are BG-interleaved read commands that access BGn, BGm, BGn, and so on, in that order.
  • through DMI (Data Mask Inversion), LPDDR5 can output data continuously for commands accessing BGn, BGm, BGn, etc.
  • DMI Data Mask Inversion
  • WCK_t and WCK_c in Figure 4 represent the clock in the DDR particle. Under one clock, DMI outputs one data, and one burst can output data of 0 to 15 clocks.
  • the number of commands to access different BGs is unbalanced in different services, and may vary greatly, so it is difficult to achieve effective BG interleaving.
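Why an unbalanced command mix defeats BG interleaving can be checked with a simple counting argument: a sequence can be arranged so that no two adjacent commands hit the same BG only if the most frequent BG accounts for at most half of the commands, rounded up. The sketch below applies that general condition; it is an illustration, not part of the source's method, and the function name is an assumption.

```python
from collections import Counter


def can_fully_interleave(bank_groups):
    """True if some ordering has no two adjacent commands on the same BG."""
    total = len(bank_groups)
    if total == 0:
        return True
    most_common_count = max(Counter(bank_groups).values())
    return most_common_count <= (total + 1) // 2


if __name__ == "__main__":
    print(can_fully_interleave([0, 1, 0, 2, 0]))  # balanced enough
    print(can_fully_interleave([0, 0, 0, 1]))     # BG0 dominates
```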
  • the number of read and write commands sent by the CPU is small, it is difficult to realize effective interleaving between BGs after the read and write commands are pieced together and reordered as shown in FIG. 1 in the prior art; or, in the access commands sent by the CPU, access BG0
  • the CPU mentioned in the above embodiment can also be replaced with other devices that have storage usage requirements. For example, (a) in FIG.
  • the GPU peak bandwidth requirement for DDR is about 60GB/s.
  • (b) in Figure 5 shows the actual measured bandwidth requirements of 4 channels of DDR particles.
  • the horizontal axis represents time in s, and the vertical axis represents bandwidth in GBps.
  • the calculated maximum bandwidth usage efficiency is 62%. Therefore, at present, the maximum bandwidth utilization efficiency of LPDDR5 measured on each SoC chip platform is only about 60%.
  • the present application provides a memory access device, which can regard the storage area of a memory chip as including a plurality of storage areas. Among these storage areas, there are storage areas for which the power consumption of the memory chip is high when they are accessed, and storage areas for which the power consumption of the memory chip is low when they are accessed.
  • the memory access device can access memory chips according to storage areas. As shown in Figure 6, the memory chip includes multiple channels: channel 0, channel 1, ...
  • the memory chip includes a total of e channels.
  • the storage area of the e channels is divided into two storage areas, a first storage area and a second storage area, each occupying part of the storage area of at least one (usually multiple) channel.
  • each area is represented by the area covered by the cross-hatched lines in FIG. 6.
  • the first storage area is a storage area with low power consumption
  • the second storage area is a storage area with high power consumption.
  • because the power consumption of the memory chip is high when the second storage area is accessed, the second storage area can be used for services that apply for large-bandwidth memory; because the power consumption of the memory chip is low when the first storage area is accessed, the first storage area can be used for services that apply for small-bandwidth memory.
  • the division of the storage area of the memory chip into multiple storage areas can be understood as a logical division of the entire storage area of the memory chip; the first storage area and the second storage area in the memory chip are not completely isolated.
  • the same BG may contain both the data of the first storage area and the data of the second storage area, and the addresses of the data of the first storage area and the data of the second storage area in the same BG do not overlap.
  • a service requiring a large-bandwidth memory application may be, for example, an application for DDR memory by a service corresponding to a CPU, a network process unit (NPU), a graphics processing unit (GPU), or a media (Media) processor in a game scenario or an artificial intelligence (AI) scenario; the bandwidth required for these services to access the memory is relatively high.
  • FIG. 7 shows a schematic diagram of different service types accessing different storage areas in a game scenario.
  • the memory access device provided by the present application can use different interleaving schemes to interleave the addresses of the access commands for multiple storage areas with different power consumption, so that the interleaved access addresses access either the storage area with higher power consumption or the storage area with lower power consumption.
  • the memory access device 80 provided by the present application may be located in a processor, and the processor may include devices such as a CPU or a GPU, and the memory access device 80 includes a controller (DDR controller) 81 and interleaver 82.
  • DDR controller controller
  • the interleaver 82 can use the first interleaving scheme to interleave when the controller 81 determines that the access power consumption is relatively low, and the interleaved access address is used to access the first storage area.
  • the interleaver 82 may use the second interleaving scheme to interleave, and the interleaved access address is used to access the second storage area. Therefore, when the memory access device 80 provided by the present application accesses the memory chip, different interleaving schemes can be used to access different storage areas according to scenarios with different power consumption requirements, so that the power consumption of the memory chip can be further reduced.
  • the CPU applies to the controller 81 to allocate memory (that is, the process of applying for the memory address range to be accessed; a specific memory address within the memory address range is the access address mentioned above).
  • the above controller 81 may be implemented in software, hardware or a combination of the two.
  • the controller 81 may include logic circuits, and may also run necessary memory management software. The following description will be made by taking the controller 81 running the software to implement software memory management as an example.
  • the implementation of software memory management in this application is different from the implementation of existing software memory management.
  • the management can allocate memory from the first storage area or the second storage area according to business needs for DDR bandwidth. That is, for services requiring high-bandwidth memory application, the software memory management can allocate memory from the second storage area, and for other non-high-bandwidth memory application requirements, the software memory management can allocate memory from the first storage area.
  • when memory is allocated from the second storage area, the accessed data occupies a larger bandwidth of the memory chip, the power consumption of the memory chip is higher, and the bandwidth usage efficiency is higher; when memory is allocated from the first storage area, the accessed data occupies a smaller bandwidth of the memory chip, the power consumption of the memory chip is lower, and power is not wasted. Therefore, this solution achieves a good balance between bandwidth utilization efficiency and power consumption, and optimizes both.
  • the memory allocation process of the application can give priority to meeting the bandwidth requirements of services with high power consumption, that is, large bandwidth requirements, and secondly satisfy, as far as possible, the user performance and energy efficiency requirements of services with low power consumption, that is, low bandwidth requirements. Exemplarily, as shown in FIG. 9, the process of allocating memory is first introduced.
  • the memory allocation process of the present application may include:
  • 1) The CPU applies to the controller 81 for memory.
  • 2) The controller 81 determines whether large-bandwidth memory needs to be applied for; if yes, the controller 81 determines whether the second storage area is powered on and then proceeds to step 3); if no, the controller 81 applies for memory from the first storage area and then proceeds to step 4). For example, the controller 81 may determine whether to apply for large-bandwidth memory according to different service scenarios.
  • 3) When the controller 81 determines that the second storage area is not powered on, it first triggers the second storage area to be powered on and then applies for memory from the second storage area. When allocating the memory, the controller 81 first determines whether the memory in the second storage area is sufficient. If the memory in the second storage area is insufficient, the memory of the second storage area can be migrated or sorted out, and then memory is allocated from the second storage area. Alternatively, if the memory of the second storage area is insufficient, memory may also be requested from the first storage area.
  • 4) When the controller 81 determines to apply for memory from the first storage area, it first determines whether the memory in the first storage area is sufficient. If it is sufficient, the controller 81 allocates memory from the first storage area; if it is insufficient, the memory of the first storage area is reclaimed and sorted out, and then memory is requested from the first storage area. It can be understood that the result of the CPU applying to the controller 81 for memory is that the controller 81 obtains the memory address range of one or more memory regions.
  • When the controller 81 has processed the application of the CPU and obtained the memory address range corresponding to the storage area, the controller 81 feeds the memory address range back to the CPU, which completes the memory allocation process. Then the CPU starts the memory access process.
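  • Exemplarily, the allocation flow described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation of the present application: the `Region` class, the `allocate` function, the bump-style bookkeeping and all sizes are hypothetical, and of the two fallbacks allowed when the second storage area is insufficient, only the fallback to the first storage area is modelled (migration and reclaiming are not).

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical model of one storage area of the memory chip."""
    name: str
    base: int             # first address of the area (assumed value)
    size: int             # capacity of the area in bytes
    used: int = 0         # bytes already handed out (simple bump allocation)
    powered: bool = True  # whether the area is currently powered on

    def free(self) -> int:
        return self.size - self.used

    def take(self, length: int) -> range:
        start = self.base + self.used
        self.used += length
        return range(start, start + length)


def allocate(first: Region, second: Region, length: int,
             high_bandwidth: bool) -> range:
    """Return the memory address range fed back to the CPU."""
    if high_bandwidth:
        if not second.powered:
            second.powered = True       # power on the second area before use
        if second.free() >= length:
            return second.take(length)
        # Second area insufficient: fall back to the first storage area.
    if first.free() < length:
        # Reclaiming and sorting out the first area is not modelled here.
        raise MemoryError("first storage area is insufficient")
    return first.take(length)
```

  • A request flagged as high-bandwidth is served from the second storage area while it has room; every other request, and any overflow, is served from the first storage area.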
  • the CPU generates an access command according to the memory address range, and the access command carries a specific memory address within the memory address range to be accessed, so that the access command is used to access the memory chip.
  • the memory address here can be, for example, the access address of the first access command mentioned below in this application.
  • the first access command is a command sent by the CPU to the controller 81 for accessing the memory chip.
  • When the controller 81 receives the first access command, it first determines the storage area to be accessed by the first access command, and the interleaver 82 performs BG interleaving according to the interleaving scheme corresponding to that storage area, so that the memory chip is accessed according to the interleaved address.
  • the memory access device 80 can be applied to a scenario where a CPU accesses a memory chip.
  • the memory access device 80 provided in the present application may be a device located in a processor, such as an SoC.
  • the memory chip may be, for example, LPDDR5.
  • Refer to FIG. 10 for a method for a CPU to access a memory chip according to an access address; that is, FIG. 10 shows a memory access method provided by the present application, and the method includes:
  • 140. The memory access device 80 determines, according to the access address of a first access command sent by the CPU, whether the first access command accesses the first storage area or the second storage area in the memory chip; the first storage area and the second storage area do not overlap, as shown in FIG. 6.
  • the memory access device 80 may make this determination according to the address range to which the access address of the first access command belongs.
  • the address ranges of the first storage area and the second storage area are different, and the controller 81 in the memory access device may determine the storage area to be accessed according to the first access command and the address ranges of the different storage areas, that is, the first access
  • when it is determined that the first access command accesses the first storage area, the memory access device 80 interleaves the address of the first access command according to the first interleaving scheme to obtain an interleaved access address; when it is determined that the first access command accesses the second storage area, the memory access device 80 interleaves the address of the first access command according to the second interleaving scheme to obtain an interleaved access address. The interleaved access address is used to access the memory chip.
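  • Exemplarily, the selection of the interleaving scheme from the access address can be sketched as follows. The two address ranges and the name `select_scheme` are assumptions made only for illustration; the present application only requires that the two storage areas have different, non-overlapping address ranges.

```python
# Hypothetical, non-overlapping address ranges of the two storage areas.
FIRST_AREA = range(0x0000_0000, 0x1000_0000)
SECOND_AREA = range(0x1000_0000, 0x2000_0000)


def select_scheme(access_address: int) -> str:
    """Pick the interleaving scheme from the area the address falls in."""
    if access_address in FIRST_AREA:
        return "first"    # interleaving within one bank group
    if access_address in SECOND_AREA:
        return "second"   # interleaving across bank groups
    raise ValueError("address is outside both storage areas")
```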
  • the second bandwidth required to access the second data in the second storage area is higher than the first bandwidth required to access the first data in the first storage area.
  • the first interleaving scheme and the second interleaving scheme provided by the present application can make the power consumption of the memory chip when accessing data in the first storage area different from the power consumption of the memory chip when accessing data in the second storage area; or, the bandwidth when the second interleaving scheme is used to access the second storage area is higher than the bandwidth when the first interleaving scheme is used to access the first storage area.
  • When the controller 81 receives the first access command sent by the CPU, the controller 81 performs command splitting, and the storage area accessed by the first access command is determined according to the address of the access command.
  • when the first access command accesses the second storage area, the interleaver 82 adopts the second interleaving scheme for the subcommands split from the first access command. If the subcommands are all used to access different BGs, the accessed data can be simultaneously output from, or written to, multiple BGs in the memory chip; in this case the bandwidth utilization efficiency of the memory chip is higher, and the power consumption is higher. When the first access command accesses the first storage area, the interleaver 82 adopts the first interleaving scheme for the subcommands split from the first access command. If the subcommands are used to access the same BG, or even the same bank in the same BG, the bandwidth usage efficiency of the memory chip is low, but because the subcommands all access a single BG, fewer banks are activated, the power consumption of the memory chip is low, and the access energy efficiency is high. Therefore, this solution achieves a good balance and comprehensive optimization between bandwidth utilization efficiency and power consumption optimization.
  • before the interleaver 82 performs address interleaving using the first interleaving scheme or the second interleaving scheme, the controller 81 can also be used to split the first access command into a plurality of subcommands, in which case the interleaved access address includes the access addresses of the multiple subcommands. The controller 81 then sends the split subcommands to the interleaver 82, and the interleaver 82 performs address interleaving by using the first interleaving scheme or the second interleaving scheme according to the accessed storage area.
  • in the second interleaving scheme, the access addresses of the multiple subcommands are respectively used to access different memory bank groups in the second storage area, and each memory bank group (bank group) of the different memory bank groups includes a plurality of memory banks (bank); in the first interleaving scheme, the access addresses of the multiple subcommands are used to access different memory banks in the same memory bank group, or the same memory bank in the same memory bank group, in the first storage area.
  • the first interleaving scheme may be a scheme that does not interleave between different memory bank groups; in other words, the first interleaving scheme performs interleaved access within the same memory bank group, that is, intra-BG interleaving or non-BG interleaving. The power consumption of this scheme is relatively lower.
  • the second interleaving scheme may be a scheme of interleaving between different memory bank groups, that is, inter-BG interleaving or simply BG interleaving. This scheme provides higher bandwidth and consumes relatively higher power.
  • when the access addresses of the split subcommands are used to access different memory bank groups in the second storage area, the data accessed by the multiple subcommands can be simultaneously output from multiple different memory bank groups, or simultaneously written to multiple different memory bank groups, so the bandwidth usage efficiency of the memory chip is high; when the access addresses of the split subcommands are used to access different memory banks in the same memory bank group in the first storage area, the memory bank group accessed by the multiple subcommands is a single memory bank group. Even if the multiple subcommands access the same memory bank in that single memory bank group, the power consumption of the memory chip is lower and the energy efficiency is higher.
  • when the first interleaving scheme accesses the same memory bank in the same memory bank group, the addresses of the access commands are interleaved (mapped) into addresses of the same memory bank in the same memory bank group.
  • the first access command in this application may be a write command or a read command. It can be understood that, referring to FIG. 8 , if the first access command is a read command, after the controller 81 reads data from the memory chip, the controller 81 also needs to combine the data read back by the subcommand and return it to the CPU.
  • the first access command sent to the controller 81 is split into a plurality of subcommands, and the plurality of subcommands are sent to the interleaver 82.
  • Multiple subcommands respectively carry the split access addresses, and each subcommand carries a different access address.
  • When the interleaver 82 receives the multiple subcommands sent by the controller 81, it interleaves the addresses carried by the multiple subcommands to obtain the interleaved access addresses, the interleaved access addresses are sent to the memory chip, and the memory chip then performs write access or read access according to the interleaved access addresses.
  • the process of interleaving addresses carried by multiple subcommands can be understood as a process of mapping the access addresses carried by each subcommand into access addresses identifiable by the memory chip. This is because the access address carried by each subcommand can be understood as: each subcommand includes multiple bits for indicating the access address of each subcommand.
  • the storage area of the memory chip to be accessed includes multiple memory bank groups, each memory bank group includes multiple memory banks, and each memory bank is further divided into multiple rows and columns. Therefore, the process of address mapping is the process of determining the internal memory bank group, the memory bank, and the row and column according to the plurality of bits included in the subcommand.
  • the address information of the access address needs to be transmitted through the address line, and the address line can also be understood as a wire for transmitting address information.
  • Each of the plurality of bits included in the subcommand corresponds to an address line of the access address. Therefore, the address lines corresponding to each bit included in the subcommand include an address line indicating a bank group, an address line indicating a memory bank, and an address line indicating a row and a column. Any address line can transmit a high level or a low level, that is, each address line can transmit a binary address bit value of 0 or 1.
  • if the burst length (BL) supported by the memory chip is $2^{M-1}$ units, the amount of data accessed by each subcommand after splitting needs to be adapted to the BL supported by the memory chip; that is, the amount of data (data length) accessed by each split subcommand is $2^{M-1}$ units, where M is an integer greater than 1.
  • the unit here can be Byte.
  • for example, if the length of the data accessed by the first access command is $(2 \times 2^{M-1})$ Byte and the first access command is split into 2 subcommands, the length of data accessed by each subcommand is $2^{M-1}$ Byte. The length of the data accessed by the first access command may be small, for example 64 Byte, that is, $(2 \times 2^{5})$ Byte.
  • the controller 81 may be configured to split the first access command into two subcommands, and the two subcommands access different memory bank groups through the second interleaving scheme.
  • the interleaver 82 can be used to directly interleave the access address of the first access command by using the first interleaving scheme to obtain the interleaved access address, which also omits the resource consumption caused by command splitting, and the access energy efficiency is higher.
  • the controller 81 of the present application can split the first access command into a greater number of subcommands, so that during address interleaving the greater number of subcommands access different memory bank groups at the same time and the bandwidth usage efficiency is higher. Therefore, in the present application, when the first access command accesses the first storage area, the first access command is split into X subcommands, such as 2 subcommands; when the first access command accesses the second storage area, the first access command is split into Y subcommands, where Y is greater than X, and X and Y are integers greater than one.
  • the number of subcommands split when the first access command accesses the second storage area is greater than the number of subcommands split when the first access command accesses the first storage area. For example, when the first access command accesses the second storage area and the accessed data length is $(4 \times 2^{M-1})$ Byte, if the first access command is split into 4 subcommands, the data length accessed by each subcommand is $2^{M-1}$ Byte.
  • for example, if the data length accessed by the first access command is 128 Byte and the first access command is divided into 4 subcommands, the data length accessed by each subcommand is 32 Byte, and the 4 subcommands can access different memory bank groups simultaneously.
  • the first access command sent by the memory access device 80 when accessing the memory chip carries the first address and the data length of the data to be accessed, and when the first access command is split into multiple subcommands, each subcommand also carries its own first address and data length ($2^{M-1}$ units).
  • the first address carried by each subcommand is different, and the data length is the same. Therefore, when performing address mapping, the first address carried by each subcommand is mapped separately to determine the memory bank group, memory bank, row and column accessed by each subcommand.
  • the first address of each subcommand can be determined according to the length of the data to be accessed: the first address carried by one subcommand can be the same as the first address carried by the first access command, and the first address carried by each other subcommand can be obtained by applying an address offset to the first address of the previous subcommand, the offset being $2^{M-1}$ units.
  • for example, the first access command carries the first address, and the data length is $(2 \times 2^{M-1})$ Byte. The first access command is split into 2 subcommands: the first address carried by one subcommand can be the first address, with a data length of $2^{M-1}$ Byte, and the first address carried by the other subcommand can be obtained by offsetting the first address by $2^{M-1}$ Byte, with a data length that is also $2^{M-1}$ Byte.
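  • Exemplarily, the splitting rule above can be sketched as follows. The function name `split_command` and its return format are assumptions for illustration only; the sketch assumes byte addressing and that the data length is a whole multiple of $2^{M-1}$ Byte.

```python
from typing import List, Tuple


def split_command(first_address: int, length: int, m: int) -> List[Tuple[int, int]]:
    """Split one access command into subcommands of 2**(m-1) bytes each.

    Returns a list of (first address, data length) pairs, one per subcommand.
    """
    unit = 1 << (m - 1)                      # 2^(M-1) bytes per subcommand
    if length <= 0 or length % unit:
        raise ValueError("length must be a positive multiple of 2**(m-1)")
    # Each subcommand starts 2^(M-1) bytes after the previous one.
    return [(first_address + i * unit, unit) for i in range(length // unit)]


# A 128-byte command split for M = 7 (64-byte subcommands) and M = 6 (32-byte).
two = split_command(0x10000000, 128, 7)
four = split_command(0x10000000, 128, 6)
```

  • With these inputs, `two` holds the first addresses 0x10000000 and 0x10000040, and `four` holds 0x10000000, 0x10000020, 0x10000040 and 0x10000060.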
  • the different subcommands can access different memory bank groups at the same time, so that the bandwidth usage efficiency of the multiple subcommands accessing the memory chip is high. That is to say, when the address information indicated by the address lines corresponding to the first addresses of different subcommands is used to access different memory bank groups, the bandwidth usage efficiency of the memory chip is relatively high. This requires that, in the address information indicated by the address lines corresponding to the first addresses of different subcommands, the address information indicating the memory bank group is different.
  • to indicate 4 memory bank groups, the address information of 2 address lines is needed. Expressed in binary, the address information of the 2 address lines can include 00, 01, 10, and 11. For example, 00 indicates BG0, 01 indicates BG1, 10 indicates BG2 and 11 indicates BG3.
  • among the address lines used to indicate the memory bank groups, the low-order address line is particularly important.
  • the memory bank groups accessed by different subcommands are different. Which address lines are used to transmit the address information of the memory bank group makes the memory bank groups accessed by different subcommands different, which can be determined according to the length of the data accessed by each subcommand.
  • the access bandwidth of the memory chip is relatively high.
  • in the second interleaving scheme, among the multiple bits included in each subcommand, the Mth bit counted from the low-order bit (that is, the lowest bit, also called the least significant bit, LSB), together with the Nth bit, may be used to indicate the memory bank group accessed by each subcommand; N is an integer greater than M.
  • the Mth bit counted from the low-order bit among the multiple bits is the low-order bit of the address information indicating the memory bank group, and the Nth bit counted from the low-order bit is the high-order bit (that is, the highest bit, also called the most significant bit, MSB) of the address information indicating the memory bank group. It has been explained above that, among the subcommands after splitting, the first address of one subcommand is the same as the first address of the first access command, and the first address of the next subcommand is obtained by offsetting the first address of that subcommand by $2^{M-1}$ units. The address information of the M-th bit counted from the low-order bit among the multiple bits of a subcommand is therefore different from that of the next subcommand, for example 0 and 1.
  • when the address information of the M-th bit differs between subcommands, the memory bank group indicated by the multiple bits of the different subcommands changes, which further means that the subcommands split from the first access command can access different memory bank groups.
  • therefore, to make the bandwidth for accessing the second storage area relatively high, the Mth bit counted from the low-order bit among the multiple bits in the subcommand is used to indicate the low-order bit of the accessed memory bank group, and the bit used to indicate the high-order bit of the accessed memory bank group, that is, the Nth bit, is higher than the Mth bit.
  • for example, if the first address of the first subcommand is the same as the first address of the first access command, the address information of the Mth bit counted from the low-order bit among the multiple bits in the first subcommand is 0, and the first address of the second subcommand is obtained by offsetting the first address of the first subcommand by $2^{M-1}$ units, then the address information of the Mth bit counted from the low-order bit among the multiple bits in the second subcommand becomes 1, and the memory bank group accessed by the second subcommand is different from the memory bank group accessed by the first subcommand.
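  • Exemplarily, the effect of the $2^{M-1}$ offset on the Mth bit can be checked with the short sketch below. The helper name `mth_bit` is hypothetical; the values M = 7 and first address 0x10000000 are taken from the 64-Byte example used elsewhere in this description.

```python
def mth_bit(address: int, m: int) -> int:
    """Value of the Mth bit counted from the low-order bit (bit index m - 1)."""
    return (address >> (m - 1)) & 1


M = 7                                  # 64-byte subcommands: 2^(M-1) = 64
first = 0x10000000                     # first address of the first subcommand
second = first + (1 << (M - 1))        # offset by 2^(M-1) units

low_bits = (mth_bit(first, M), mth_bit(second, M))
```

  • Adding $2^{M-1}$ to an address always toggles the bit at index M-1, so two consecutive subcommands always differ in the low-order bit of the bank group address; here `low_bits` is (0, 1).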
  • each memory bank group includes 4 memory banks, so the address information of the memory bank can also be transmitted by using the address line corresponding to 2 bits in multiple bits.
  • the memory bank includes bank0, bank1, bank2 and bank3, and 2-bit address lines are used to transmit the address information of each bank, for example, the address information of the bank is 00, 01, 10 and 11. Therefore, in the second interleaving scheme of the present application, as shown in FIG. 12 , on the basis of FIG.
  • the R-th bit and the S-th bit counted from the low-order bit among the multiple bits included in each subcommand are used to indicate the memory bank accessed by each subcommand, where R is an integer greater than N and S is an integer greater than R; that is, the R-th bit is the low-order bit of the address information indicating the memory bank, and the S-th bit is the high-order bit of the address information indicating the memory bank. In other words, in the second interleaving scheme of the present application, the bit indicating the memory bank is higher than the bit indicating the memory bank group.
  • the number of the above memory bank groups and the number of memory banks are both 4, so the address information used to indicate the memory bank group and the address information used to indicate the memory bank are both 2-bit information.
  • the number of memory bank groups and the number of memory banks can be set to other values, for example, the number of either the memory bank groups or the memory banks is 8 or 16, etc. Therefore, the address information for indicating the memory bank group and the address information for indicating the memory bank can also be represented by more or fewer bits of information.
  • the first interleaving scheme and the second interleaving scheme are essentially address mapping processes.
  • taking as an example the case where the address lines used are 31-bit address lines, denoted a0, a1, a2, ..., a30, each address line corresponds to a binary bit, the subcommand includes multiple bits, and the 31-bit address lines can be controlled to output a high level or a low level according to the bit values of the multiple bits of the subcommand.
  • taking as an example the case where the memory bank group is BG (including BG0, BG1, BG2, and BG3), the memory bank is bank (each BG includes bank0, bank1, bank2, and bank3), the row is Row, and the column is Col, the 7th bit (a6) and the 13th bit (a12) counted from the low-order bit among the multiple bits are used to indicate the BG accessed by each subcommand, and the 14th bit (a13) and the 15th bit (a14) are used to indicate the bank in the BG.
  • a recommended mapping between the 31-bit address lines and the BG, bank, row, and column may be as shown in FIG. 13.
  • address mapping is not performed on the lowest address line a0; it starts from the a1 address line. That is, the a0 address line does not participate in address mapping, the a6 address line and the a12 address line are the address lines for mapping BG, the a13 address line and the a14 address line are the address lines for mapping the bank, and the remaining address lines are address lines for mapping rows and columns.
  • for example, the first address carried by the first access command is 10000000 in hexadecimal, and the data length is 128 Byte. As shown in FIG. 14, assuming that the channel accessed by the first access command is channel 0, of the two split subcommands, the first address of subcommand 1 is 10000000 in hexadecimal, which is 10000000000000000000000000000 in binary, and the data length is 64 Byte; the first address of subcommand 2 is 10000040 in hexadecimal, which is 10000000000000000000001000000 in binary, and the data length is 64 Byte.
  • according to the first address of subcommand 1 and the mapping relationship provided in FIG. 13, the value of the BG0 bit is 0 and the value of the BG1 bit is 0, so the BG accessed by subcommand 1 is BG0; according to the first address of subcommand 2 and the mapping relationship provided in FIG. 13, the value of the BG0 bit is 1 and the value of the BG1 bit is 0, so the BG accessed by subcommand 2 is BG1. The BGs accessed by subcommand 1 and subcommand 2 are different, so the memory chip can simultaneously process the data accessed by subcommand 1 and subcommand 2 in different BGs of the second storage area, and the bandwidth usage efficiency is high.
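  • Exemplarily, the BG and bank selection of this example can be reproduced with the sketch below. The helper names `bit` and `decode_second_scheme` are hypothetical; the bit positions (a6 and a12 for BG, a13 and a14 for bank) are the ones stated above for the mapping of FIG. 13, and the row and column bits are not decoded.

```python
def bit(address: int, n: int) -> int:
    """Level carried by address line a<n> (0 = low, 1 = high)."""
    return (address >> n) & 1


def decode_second_scheme(address: int) -> tuple:
    """Return (BG index, bank index) for the a6/a12 + a13/a14 mapping."""
    bg = bit(address, 6) | (bit(address, 12) << 1)      # BG0 bit = a6, BG1 bit = a12
    bank = bit(address, 13) | (bit(address, 14) << 1)   # BA0 bit = a13, BA1 bit = a14
    return bg, bank


sub1 = decode_second_scheme(0x10000000)   # subcommand 1
sub2 = decode_second_scheme(0x10000040)   # subcommand 2, offset by 64 bytes
```

  • Here `sub1` is (0, 0) and `sub2` is (1, 0): the two 64-Byte subcommands land in BG0 and BG1 respectively, which is what allows them to be served at the same time.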
  • for another example, the 6th bit (a5) and the 13th bit (a12) counted from the low-order bit among the multiple bits are used to indicate the BG accessed by each subcommand, and the 14th bit (a13) and the 15th bit (a14) are used to indicate the bank in the BG.
  • a recommended mapping between the 31-bit address lines and the BG, bank, row, and column may be as shown in FIG. 15.
  • for example, the first address carried by the first access command is 10000000 in hexadecimal, and the data length is 128 Byte. As shown in FIG. 16, assuming that the channel accessed by the first access command is channel 0, among the four split subcommands, the first address of subcommand 1 is 10000000 in hexadecimal, which is 10000000000000000000000000000 in binary, and the data length is 32 Byte; the first address of subcommand 2 is 10000020 in hexadecimal.
  • according to the first address of subcommand 1 and the mapping relationship provided in FIG. 15, the value of the BG0 bit is 0 and the value of the BG1 bit is 0, so the BG accessed by subcommand 1 is BG0 (address information is 00); according to the first address of subcommand 2 and the mapping relationship provided in FIG. 15, the value of the BG0 bit is 1 and the value of the BG1 bit is 0, so the BG accessed by subcommand 2 is BG1 (address information is 01); according to the first address of subcommand 3 and the mapping relationship provided in FIG. 15, the value of the BG0 bit is 0 and the value of the BG1 bit is 0, so the BG accessed by subcommand 3 is BG0 (address information is 00); according to the first address of subcommand 4 and the mapping relationship provided in FIG. 15, the value of the BG0 bit is 1 and the value of the BG1 bit is 0, so the BG accessed by subcommand 4 is BG1 (address information is 01). That is, sub
  • if the amount of data (data length) accessed by each subcommand is $2^{M-1}$ units and the memory bank group is indicated by 2 bits, then in the first interleaving scheme of the present application, in order to keep the power consumption of the memory chip low and the access energy efficiency high when the first storage area is accessed, the P-th bit and the Q-th bit counted from the low-order bit among the multiple bits of the subcommand are used to jointly indicate the memory bank group accessed by each subcommand, where P is an integer greater than M, and Q is an integer greater than P.
  • the P-th bit counted from the low-order bit among the multiple bits of the subcommand is used to indicate the low-order bit of the 2 bits of the memory bank group accessed by the subcommand, and the Q-th bit counted from the low-order bit among the multiple bits is used to indicate the high-order bit of the 2 bits of the memory bank group accessed by the subcommand.
  • between a previous subcommand and the subsequent subcommand, the address information of the M-th bit counted from the low-order bit among the multiple bits changes. However, the access addresses of the access command before the split are all continuous, and within a very long range of addresses, among the subcommands split from the access command, the address information of the P-th bit of the previous subcommand is the same as that of the next subcommand; that is, the address information of the P-th bit changes only after $2^{P-1}$ units of data have been accessed cumulatively. In other words, the memory bank group indicated by the multiple bits of the different subcommands is the same throughout this address range, so when the access address of the first access command lies within this continuous address range, the subcommands split from the first access command all access the same memory bank group. Therefore, with the first interleaving scheme of the present application, the power consumption of accessing the first storage area is low and the energy efficiency is high.
  • in the first interleaving scheme, the J-th and K-th bits counted from the low-order bit among the multiple bits included in each subcommand are used to indicate the memory bank accessed by each subcommand, where J is an integer greater than Q, and K is an integer greater than J. That is, in the first interleaving scheme of the present application, the bit indicating the memory bank is higher than the bit indicating the memory bank group.
  • the memory bank indicated by the J-th and K-th bits counted from the low-order bit can also be the same for each of the multiple subcommands. When multiple subcommands access the same memory bank in the same memory bank group, fewer memory banks are activated, the power consumption is lower, and the energy efficiency is higher.
  • taking as an example the case where the memory bank group is BG (including BG0, BG1, BG2 and BG3) and the memory bank is bank (each BG includes bank0, bank1, bank2 and bank3), the 12th bit (a11) and the 13th bit (a12) counted from the low-order bit among the multiple bits of the subcommand are used to indicate the BG accessed by each subcommand, and the 14th bit (a13) and the 15th bit (a14) are used to indicate the bank within the BG.
  • a recommended mapping between the 31-bit address lines and the BG, bank, row, and column may be as shown in FIG. 19.
  • in FIG. 19, the a11 address line and the a12 address line are the address lines for mapping BG in the first interleaving scheme: the a11 address line corresponds to the BG0 bit, and the a12 address line corresponds to the BG1 bit. The a11 address line is the low-order address line indicating BG, and the a12 address line is the high-order address line indicating BG; that is, the BG0 bit is the low-order bit indicating the accessed BG, and the BG1 bit is the high-order bit indicating the accessed BG.
  • when the bit value of the BG1 and BG0 bits in the subcommand is 00, the BG accessed by the subcommand is BG0, the a11 address line outputs a low level, and the a12 address line outputs a low level; when the bit value is 01, the BG accessed by the subcommand is BG1, the a11 address line outputs a high level, and the a12 address line outputs a low level; when the bit value is 10, the BG accessed by the subcommand is BG2, the a11 address line outputs a low level, and the a12 address line outputs a high level; when the bit value is 11, the BG accessed by the subcommand is BG3, the a11 address line outputs a high level, and the a12 address line outputs a high level.
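  • Exemplarily, the relationship between the accessed BG and the levels on the a11 and a12 address lines can be written as the small sketch below. The function name `bg_line_levels` and the use of the strings "high" and "low" are assumptions for illustration only.

```python
def bg_line_levels(bg: int) -> dict:
    """Levels driven on a11 (BG0 bit) and a12 (BG1 bit) for BG0..BG3."""
    if not 0 <= bg <= 3:
        raise ValueError("bg must be 0, 1, 2 or 3")
    return {
        "a11": "high" if bg & 0b01 else "low",   # low-order bit of the BG index
        "a12": "high" if bg & 0b10 else "low",   # high-order bit of the BG index
    }


levels = {f"BG{i}": bg_line_levels(i) for i in range(4)}
```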
  • the a13 address line (abbreviated in Figure 19) and a14 address line in Figure 19 are the address lines for mapping the bank, and the BA0 and BA1 bits in Figure 19 indicate the bank.
  • when the value of the BA1 and BA0 bits is 00, the subcommand accesses bank0; when the value is 01, the subcommand accesses bank1; when the value is 10, the subcommand accesses bank2; and when the value is 11, the subcommand accesses bank3.
  • Bits other than BG and bank are mapped in FIG. 19 to indicate rows and columns.
  • the Row15:0 bits in FIG. 19 are bits that indicate rows
  • the Col9:6, Col5, and Col4:0 bits are bits that indicate columns.
  • the first address carried by the first access command is 10000000 in hexadecimal, and the data length is 128 Bytes.
  • the first address of subcommand 1 is 10000000 in hexadecimal, which is 10000000000000000000000000000 in binary, and the data length is 64 Bytes;
  • the first address of the other subcommand, subcommand 2, is 10000040 in hexadecimal, which is 10000000000000000000001000000 in binary, and the data length is 64 Bytes.
  • for subcommand 1, the value of the BG0 bit is 0 and the value of the BG1 bit is 0, so the BG accessed by subcommand 1 is BG0;
  • for subcommand 2, the mapping relationship provided in FIG. 19 likewise gives a BG0 bit of 0 and a BG1 bit of 0, so the BG accessed by subcommand 2 is also BG0, and subcommand 1 and subcommand 2 access the same BG.
  • for both subcommand 1 and subcommand 2, the value of the BA0 bit is 0 and the value of the BA1 bit is 0, so the banks accessed by subcommand 1 and subcommand 2 are also the same.
  • the split subcommands are accessing the same BG, or even the same bank of the same BG.
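
The worked example above can be checked with a short sketch. It assumes, as described for FIG. 19, that the BG0 and BG1 bits sit on address lines a11 and a12. The assignment of BA0 to a13 and BA1 to a14 is an assumption, since the text only says that a13 and a14 map the bank. The function names are hypothetical.

```python
def bit(addr: int, n: int) -> int:
    """Return the value carried by address line a<n> (a0 is the lowest-order bit)."""
    return (addr >> n) & 1


def decode_first_scheme(addr: int) -> tuple:
    """Return (BG index, bank index) under the first interleaving scheme."""
    # BG1 is the high-order BG bit (a12); BG0 is the low-order BG bit (a11).
    bg = (bit(addr, 12) << 1) | bit(addr, 11)
    # Assumed placement: BA1 on a14 (high order), BA0 on a13 (low order).
    bank = (bit(addr, 14) << 1) | bit(addr, 13)
    return bg, bank


first_address = 0x10000000      # first address of the 128-Byte access command
sub1 = first_address            # subcommand 1, 64 Bytes
sub2 = first_address + 64       # subcommand 2, 64 Bytes, i.e. 0x10000040

# Both subcommands land in BG0, bank0, because only bit a6 differs.
print(decode_first_scheme(sub1), decode_first_scheme(sub2))  # (0, 0) (0, 0)
```

Because the two first addresses differ only in a6, which is not one of the BG or bank lines in this mapping, splitting the command does not spread the access across bank groups.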
  • This setting reduces the number of banks accessed in the first storage area of the memory chip, so the power consumption of the memory chip is lower and the energy efficiency is higher.
  • from the first interleaving scheme and the second interleaving scheme, it can be seen that this application adopts different address interleaving schemes (or address mapping schemes), so that the memory bank groups accessed by the multiple subcommands split from the first access command are either the same or different.
  • when the memory bank groups are different, the access bandwidth of the memory chip is higher and the access efficiency is higher; when the memory bank groups are the same, the access power consumption of the memory chip is lower and the energy efficiency is higher.
  • the BG interleaving in the prior art accesses different BGs by using patchwork rearrangement logic to reorder the access commands.
  • the BG interleaving may be implemented by mapping the address of the access command to the BG according to the corresponding interleaving scheme.
  • the BG interleaving is to map the addresses of the access commands to different BGs through the second interleaving scheme, so as to access data from different BGs.
  • BG interleaving can be understood as the process of accessing different BGs, and the bandwidth of accessing memory chips is high.
  • the BG interleaving is to map the address of the access command to the same BG, or even the same bank in the same BG, through the first interleaving scheme.
  • BG interleaving can be understood as the process of accessing the same BG, and the power consumption of the memory chip is low at this time.
  • interleaving in this application can be understood as address mapping; for example, in this application it is the process of mapping the address of an access command to addresses of different BGs, or to an address of the same BG, or even to an address of the same bank of the same BG.
  • the memory access device 80 provided by the present application will be described below.
  • the following further describes the memory access device 80 provided by the present application.
  • FIG. 21 is a schematic diagram of another memory access device 80 provided by the present application.
  • the above-mentioned memory access device 80 may be a new device added to the DDR controller inside a chip.
  • the DDR controller may be, for example, a device within a chip such as an SoC or a GPU. Different from the existing SoC or GPU chip, the present application improves the hardware circuit in the DDR controller in the SoC or GPU chip, that is, a memory access device 80 is added.
  • the SoC includes a CPU, a bus, a cache, a power management unit, a first-in first-out queue cache, and a DDR controller.
  • the DDR controller includes a buffer circuit 83, a PHY 84 and the above-mentioned memory access device 80, that is, a controller 81 and an interleaver 82.
  • the buffer circuit 83 can be used to cache the access commands from the CPU in the SoC, because, before the data of the memory chip is accessed, the access commands sent by the CPU are first sent to the DDR controller and cached in the buffer circuit 83. Then, the controller 81 in the memory access device 80 requests to read an access command from the buffer circuit 83, and the buffer circuit 83 can send the access commands for accessing the memory chip to the controller 81 in the address order of the received access commands.
  • the access command carries the access address and data length of the data to be accessed.
  • the access command may be a command for a read operation or a command for a write operation.
  • the controller 81 can be used to implement the above-mentioned processes of determining the storage area for the access command and splitting the access command, after reading the access command from the buffer circuit 83.
  • the interleaver 82 is configured to implement the above-mentioned processes of interleaving the addresses of the subcommands and sending the subcommands.
  • the PHY 84 is used to drive the sending of subcommands, so as to send information such as the address information and data length of the subcommands to the memory chip, so as to realize the read operation or write operation to the memory chip.
  • the controller 81 may include a partition determination circuit 811 and a command splitting circuit 812
  • the interleaver 82 may include a first interleaving circuit 813 , a second interleaving circuit 814 and a command sending circuit 815 .
  • the partition determination circuit 811 is used to determine, according to the access address of the first access command, whether the first access command accesses the first storage area or the second storage area in the memory chip, where the first storage area and the second storage area do not overlap, and to indicate the storage area to be accessed by the first access command to the command splitting circuit 812.
  • the partition determination circuit 811 in the controller 81 may store the address range corresponding to the first storage area and the address range corresponding to the second storage area. When the partition determination circuit 811 receives the first access command, it can determine whether the first access command is used to access the first storage area or the second storage area according to the address ranges corresponding to the two storage areas.
  • if it is determined that the first access command is used to access the first storage area, the partition determination circuit 811 may be configured to send the first access command, together with an indication that the first access command is used to access the first storage area, to the command splitting circuit 812; if it is determined that the first access command is used to access the second storage area, the partition determination circuit 811 may be configured to send the first access command, together with an indication that the first access command is used to access the second storage area, to the command splitting circuit 812.
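
The address-range check performed by the partition determination circuit 811 can be sketched in software as follows. This is only an illustration of the lookup, not the hardware circuit: the address ranges, the `Area` type, and the function name are all hypothetical, since the text does not give concrete ranges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    name: str
    start: int  # first address of the area (inclusive)
    end: int    # one past the last address of the area (exclusive)


# Hypothetical, non-overlapping address ranges chosen only for illustration.
FIRST_AREA = Area("first", 0x0000_0000, 0x4000_0000)
SECOND_AREA = Area("second", 0x4000_0000, 0x8000_0000)


def determine_area(access_address: int) -> str:
    """Return which storage area an access address falls into."""
    for area in (FIRST_AREA, SECOND_AREA):
        if area.start <= access_address < area.end:
            return area.name
    raise ValueError(f"address {access_address:#x} is outside both storage areas")


# The returned name plays the role of the indication passed on with the command.
print(determine_area(0x1000_0000))  # first
print(determine_area(0x4000_0040))  # second
```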
  • the command splitting circuit 812 is configured to split the first access command according to the indication of the partition determination circuit 811 and the command splitting configuration to obtain multiple subcommands, and to send the multiple subcommands to the first interleaving circuit 813 or the second interleaving circuit 814.
  • the command splitting circuit 812 may determine how many subcommands to split the first access command into according to the burst length BL supported by the memory chip. For example, the BL supported by the memory chip is 32 Bytes, and the length of data accessed by the first access command is 64 Bytes. If the first access command is used to access the second storage area, the command splitting circuit 812 can split the first access command into 2 subcommands, each accessing 32 Bytes of data. The second interleaving scheme can then be used to interleave the addresses of the 2 subcommands, so that the 2 subcommands access different memory bank groups and the bandwidth usage efficiency of the second storage area is improved. If the first access command is used to access the first storage area, the command splitting circuit 812 may also leave the first access command unsplit, to reduce the power consumption of accessing the memory chip and improve energy efficiency.
  • when the command splitting circuit 812 splits the first access command, it also needs to determine the access address of each subcommand according to the access address of the first access command; specifically, the first address carried by each subcommand is determined according to the first address carried by the first access command and the length of data accessed by each subcommand.
  • the first address carried by one of the subcommands is the same as the first address carried by the first access command.
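
A minimal sketch of this address calculation is shown below. The function name and the tuple representation of a subcommand are hypothetical; the only behavior taken from the text is that the first subcommand keeps the first address of the original command and that each later first address is offset by the subcommand data length.

```python
def split_command(first_address: int, data_length: int, sub_length: int) -> list:
    """Split one access command into (first_address, length) subcommands."""
    if sub_length <= 0 or data_length % sub_length != 0:
        raise ValueError("data_length must be a positive multiple of sub_length")
    count = data_length // sub_length
    # Subcommand i starts i * sub_length Bytes after the original first address.
    return [(first_address + i * sub_length, sub_length) for i in range(count)]


# 128-Byte command at 0x10000000 split into two 64-Byte subcommands.
for addr, length in split_command(0x10000000, 128, 64):
    print(hex(addr), length)
```

With a 64-Byte command and a 32-Byte burst length, the same call with `sub_length=32` yields two subcommands, matching the example given for the second storage area.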
  • if the indication of the storage area to be accessed by the first access command indicates that the first access command is used to access the first storage area, the command splitting circuit 812 sends the split subcommands to the first interleaving circuit 813; if the indication indicates that the first access command is used to access the second storage area, the command splitting circuit 812 sends the split subcommands to the second interleaving circuit 814.
  • the first interleaving circuit 813 is configured to interleave the addresses of the subcommands split from the first access command according to the first interleaving scheme to obtain the interleaved access addresses; the interleaved access addresses are used to access the first storage area of the memory chip.
  • the second interleaving circuit 814 is configured to interleave the addresses of the subcommands split from the first access command according to the second interleaving scheme to obtain the interleaved access addresses; the interleaved access addresses are used to access the second storage area of the memory chip.
  • the command sending circuit 815 can be used to send the interleaved access address and data length corresponding to the subcommand to the PHY 84, so that the PHY 84 is used to drive the subcommand and send it to the memory chip.
  • once the memory chip has the interleaved access address and data length corresponding to the subcommand, it can determine the BG, bank, row, and column to be accessed according to the interleaved access address, that is, the first address of the data to be accessed by the subcommand, and then read or write data in the storage area determined by the row and column according to the data length accessed by the subcommand, so as to complete the interaction with the memory access device 80.
  • the controller 81 further includes a data splicing circuit 816 .
  • when the partition determination circuit 811 determines the storage area accessed by the first access command, it may also send to the data splicing circuit 816 the length of the data that the first access command accesses in the first storage area or the second storage area.
  • when the data splicing circuit 816 receives read-back data, the read-back data may carry a label indicating that the data belongs to the first storage area or the second storage area; the data splicing circuit 816 can then splice the data with the same label up to the data length requested by the first access command and return it to the CPU.
  • the data splicing circuit 816 is needed because the read-back data corresponds to the split subcommands, so its length is shorter than the data length that the first access command requires to be read. It can be understood that the function of the data splicing circuit 816 may, instead of being integrated in the DDR controller, be integrated in the CPU, for example as a software module executed by the CPU, which is not limited in this embodiment.
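
Since the text notes that the splicing function may exist as a software module, a rough sketch of such a module is given below. The chunk format `(label, first_address, data)` and the ordering by first address are assumptions made for illustration; the text only states that data with the same label is spliced up to the requested data length.

```python
def splice(chunks, label: str, requested_length: int) -> bytes:
    """Join read-back chunks that carry the given label into one buffer.

    chunks: iterable of (label, first_address, data) tuples, one per subcommand.
    """
    # Keep only the chunks of the requested storage area, in address order (assumed).
    same_label = sorted((c for c in chunks if c[0] == label), key=lambda c: c[1])
    data = b"".join(c[2] for c in same_label)
    if len(data) < requested_length:
        raise ValueError("not enough read-back data for the requested length")
    return data[:requested_length]


read_back = [
    ("second", 0x20, b"B" * 32),   # data returned for the second subcommand
    ("first", 0x00, b"X" * 32),    # unrelated data from the other storage area
    ("second", 0x00, b"A" * 32),   # data returned for the first subcommand
]
result = splice(read_back, "second", 64)
print(len(result))  # 64
```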
  • the principle is to patch and rearrange access commands to improve the probability of satisfying BG interleaving.
  • the bus bandwidth utilization efficiency of the memory chip in the prior art can reach 100%; for the random access mode within the Page (access addresses accessing the same Page are discontinuous) ), the bus bandwidth usage efficiency of the prior art is: 1-patch failure rate ⁇ 50%, the bus bandwidth usage efficiency of the memory chip is relatively low; for the cross-row (Row) access mode, the bus bandwidth usage efficiency of the prior art memory chip somewhere between 75% and 100%. If the required access bandwidths are all small bandwidths, the BG of the prior art leads to high power consumption and low access energy efficiency.
  • the bus bandwidth usage efficiency of the memory chip is 100%; the random access mode in the Page is 100%.
  • the bus bandwidth usage efficiency can be: 1 - 64Byte command share × patchwork failure rate × 50%.
  • the bus bandwidth usage efficiency of the first solution is improved; for the cross-Row access mode (when the row-to-row delay (Row to Row Delay, Trrd) is 5 ns), the bus bandwidth usage efficiency of the memory chip is about 75%.
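
The two efficiency expressions for random access within a Page can be compared numerically. The sketch below reads the operator between the terms as multiplication, which is an interpretation of the source text, and the input values are made up purely for illustration; they are not measurements from the application.

```python
def prior_art_efficiency(patch_failure_rate: float) -> float:
    """1 - patchwork failure rate x 50%."""
    return 1.0 - patch_failure_rate * 0.5


def first_solution_efficiency(share_64byte: float, patch_failure_rate: float) -> float:
    """1 - 64-Byte command share x patchwork failure rate x 50%."""
    return 1.0 - share_64byte * patch_failure_rate * 0.5


# Illustrative inputs only: half of the patchwork attempts fail,
# and half of the commands are 64-Byte commands.
print(prior_art_efficiency(0.5))            # 0.75
print(first_solution_efficiency(0.5, 0.5))  # 0.875
```

Because the 64-Byte command share is at most 1, the second expression is never lower than the first for the same failure rate, which is consistent with the statement that the first solution improves efficiency.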
  • the third scheme of the present application enables, in a scenario where the bandwidth for accessing the first storage area is small, centralized access to a single BG, that is, centralized access to the same BG, which can reduce the power consumption of the memory chip and improve the access energy efficiency.
  • the access command for accessing 128Byte is split into four subcommands for accessing 32Byte, or the access command for accessing 64Byte is split into two subcommands for accessing 32Byte, and the second interleaving scheme is adopted.
  • the BG interleaving will be satisfied when the subcommand accesses the BG.
  • the bus bandwidth usage efficiency of the memory chip is 100%; the random access mode in the Page is 100%.
  • the bus bandwidth usage efficiency can be 100%.
  • the bus bandwidth usage efficiency under the second solution is significantly improved; for the cross-Row access mode (when the Trrd is 5 ns), the bus bandwidth usage efficiency of the memory chip is around 75%.
  • the second solution of the present application can enable centralized access to a single BG in a scenario where the bandwidth for accessing the first storage area is small, that is, centralized access to the same BG, which can reduce the power consumption of the memory chip and improve the access energy efficiency.
  • Embodiments of the present application further provide a communication chip, where the communication chip includes the memory access device described in the embodiments of the present application.
  • the communication chip may be a chip such as an SoC or a GPU.
  • An embodiment of the present application further provides an electronic device. As shown in FIG. 23 , the electronic device includes the communication chip described in the embodiment of the present application, and the communication chip includes the memory access device 80 provided by the present application.
  • Embodiments of the present application further provide a computer-readable storage medium, including computer instructions, which, when the computer instructions are executed on the electronic device, cause the electronic device to execute the method described in the foregoing memory access method.
  • Embodiments of the present application further provide a computer program product, which, when the computer program product runs on a computer, enables an electronic device to execute the method described in the foregoing memory access method.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may be one physical unit or multiple physical units, that is, they may be located in one place, or may be distributed to multiple different places . Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units may be implemented in the form of hardware, and may also be implemented at least partially in the form of software functional units.
  • a unit can be stored in a readable storage medium if it is implemented in the form of a software functional unit and sold or used as an independent product.
  • the technical solutions of the embodiments of the present application, in essence, or the parts that contribute to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (ROM), random access memory (random access memory, RAM), magnetic disk or optical disk and other media that can store program codes.

Abstract

The present application discloses a method and apparatus for accessing a memory, relates to the technical field of memory access, and can improve the bandwidth usage efficiency of a memory chip. The memory access apparatus comprises: a controller, which is used to determine, according to the access address of a first access command, a first storage area or second storage area in a memory chip that is accessed by the first access command, the first storage area not overlapping with the second storage area; and an interleaver, which is used when the controller determines that the first access command accesses the first storage area, to interleave the address of the first access command according to a first interleaving solution so as to obtain an interleaved access address; and when the controller determines that the first access command accesses the second storage area, interleave the address of the first access command according to a second interleaving solution so as to obtain an interleaved access address, the interleaved access address being used to access the memory chip. The embodiments of the present application are used to access a memory.

Description

A method and apparatus for accessing memory
Technical Field
The present application relates to the technical field of memory access, and in particular, to a method and apparatus for accessing memory.
Background
At present, the low power double data rate (LPDDR) 5 memory chip defined in the memory standard raises the maximum read and write access rate to 5500 Mbps. However, for technical reasons, the storage-medium access rate of each manufacturer's LPDDR5 has not improved substantially. Therefore, the Joint Electron Device Engineering Council (JEDEC) defines three modes of accessing the memory chip: 8B, 16B, and BG. These modes add access constraints for high-speed access, so that the full-bandwidth access rate is reached only under special conditions.
In BG mode, when the access rate outside the memory chip is greater than 3200 Mbps and the burst length of an access is 32 Bytes, a memory bank (bank) inside the memory chip can complete only one access, that is, an access of 128 bits, within the duration of the transfer on the external bus of the memory chip, so the internal bandwidth of the memory chip is only half of the external bus bandwidth. To solve the problem that the internal bandwidth is smaller than the external bus bandwidth when the memory chip is accessed, the JEDEC standard defines a constraint on consecutive accesses: it recommends that consecutive commands accessing the memory chip hit different bank groups (BG). That is, when a system on chip (SoC) accesses the memory chip, adjacent access commands satisfy the BG interleave constraint. BG interleaving is interleaving between different BGs. In practice, however, the BG interleaving constraint places strict requirements on the order of the addresses with which the SoC accesses the memory chip, so most applications may be unable to obtain the highest access rate of the memory chip.
At present, as shown in FIG. 1, the DDR controller and the port physical layer (PHY) are integrated in the SoC, and the memory chip includes multiple channels, each of which includes 4 × 4 banks. Patchwork control logic is integrated in the DDR controller. For example, this logic moves forward an access command that would be sent later in time so that it is sent back to back with earlier access commands, or delays an access command that would be sent earlier so that it is sent back to back with later access commands. In BG mode, the patchwork control logic can rearrange the order of the access commands stored in the buffer of the DDR controller and send them to the memory chip through the PHY in the rearranged order (for example, the three access commands that access BG1 in FIG. 1 are pieced together). The rearranged order makes the access commands satisfy the BG interleaving constraint as far as possible, which raises the probability that accesses conform to BG interleaving.
However, in actual scenarios, the numbers of access commands that access different BGs are unbalanced across services and may differ greatly, so effective BG interleaving is difficult to achieve. Furthermore, not all scenarios in practical applications require BG interleaving of access commands. In a scenario with a low power consumption requirement, such as a non-gaming load, continuing to perform BG interleaving according to the patchwork control logic wastes power in the memory chip, and the power consumption performance is not sufficiently optimized.
Summary of the Invention
Embodiments of the present application provide a method and apparatus for accessing memory, which can improve the power consumption performance of a memory chip. To achieve this purpose, the embodiments of the present application adopt the following technical solutions.
In a first aspect, a memory access device is provided. The memory access device includes: a controller, configured to determine, according to the access address of a first access command, whether the first access command accesses a first storage area or a second storage area in a memory chip, where the first storage area and the second storage area do not overlap; and an interleaver, configured to: when the controller determines that the first access command accesses the first storage area, interleave the address of the first access command according to a first interleaving scheme to obtain an interleaved access address; and when the controller determines that the first access command accesses the second storage area, interleave the address of the first access command according to a second interleaving scheme to obtain an interleaved access address. The interleaved access address is used to access the memory chip.
Thus, where this application divides the memory chip into different storage areas, the memory access device provided by this application can use different interleaving schemes for the different storage areas when interleaving the address of an access command to obtain the interleaved access address; that is, different storage areas are accessed with different interleaving schemes. It can be understood that different interleaving schemes can cause an access command to access different specific memory bank groups (for example, BGs) of the memory chip, for example a different number of memory bank groups, which in turn leads to different power consumption of the memory chip. For example, for a service scenario with a low power consumption requirement, the first interleaving scheme can be used so that the first access command accesses fewer memory bank groups in the first storage area; for a service scenario with a high power consumption requirement, the second interleaving scheme can be used to access more memory bank groups in the second storage area. Therefore, when accessing the memory chip, the memory access device provided by this application can access the storage area corresponding to each service scenario with a different interleaving scheme, according to the power consumption requirement of that scenario, which optimizes the power consumption of the memory chip across application scenarios.
In a possible design, the second bandwidth required to access second data in the second storage area is higher than the first bandwidth required to access first data in the first storage area. In this way, when the first access command accesses the first storage area and the first interleaving scheme is used for address interleaving, the access bandwidth of the memory chip is lower, the power consumption is lower, and the energy efficiency is higher. When a second access command accesses the second storage area and the second interleaving scheme is used for address interleaving, the access bandwidth of the memory chip is higher. To address the problem that the access bandwidth inside the memory chip (the bandwidth actually used) is low and cannot match the access bandwidth of the external bus of the memory chip, the second interleaving scheme used in this application when accessing the second storage area makes the access bandwidth inside the chip higher, so that the internal access bandwidth can support the access bandwidth of the external bus and the bandwidth usage efficiency is higher.
In the embodiments of the present application, the bandwidth usage efficiency can be understood as the ratio of the bandwidth actually used in one access to the memory chip to the full bandwidth of the memory chip, or as the ratio of the amount of data actually transmitted by the memory chip in one access to the total amount of data that could be transmitted over the full bandwidth. A small ratio means that the bandwidth usage efficiency of the memory chip is low and bandwidth is wasted. For example, when the memory chip transmits a small block of data, the block uses only part of the full bandwidth of the memory chip; the remaining bandwidth is occupied but carries no data, so that part of the bandwidth is wasted.
In a possible design, the controller is further configured to split the first access command into multiple subcommands, and the interleaved access address includes the access addresses of the multiple subcommands. In the second interleaving scheme, the access addresses of the multiple subcommands are respectively used to access different memory bank groups in the second storage area, where each of the different memory bank groups includes multiple memory banks. In the first interleaving scheme, the access addresses of the multiple subcommands are used to access different memory banks in the same memory bank group, or the same memory bank in the same memory bank group, in the first storage area. In this way, when the second interleaving scheme causes the multiple subcommands to access different memory bank groups in the second storage area (a memory bank group may be a BG in this application), multiple memory bank groups process data at the same time, that is, multiple memory bank groups are activated and accessed simultaneously, so the memory chip is accessed faster and more bandwidth is used. When the first interleaving scheme causes the multiple subcommands to access the same memory bank group in the first storage area, or even the same memory bank in that group, only one memory bank group processes data and only one memory bank group is activated, so the power consumption of the memory chip is lower and the energy efficiency is higher.
In a possible design, each of the multiple subcommands includes multiple bits that indicate the access address of that subcommand, each of the multiple bits corresponds to one address line of the access address, and the amount of data accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1. For example, if the first access command accesses 2×2^(M-1) units of data and is split into two subcommands, each subcommand accesses 2^(M-1) units, where the unit may be the byte.
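The splitting rule can be sketched as follows. This is an illustration only: the function name, the tuple layout, and the example value M = 6 (32-byte subcommands) are assumptions, and the unit is taken to be the byte.

```python
def split_command(start_addr: int, length: int, m: int) -> list[tuple[int, int]]:
    """Split one access into subcommands of 2**(m-1) bytes each.

    Returns (start_address, size) pairs. Hypothetical helper for illustration.
    """
    if m <= 1:
        raise ValueError("M must be an integer greater than 1")
    sub_size = 1 << (m - 1)
    if length <= 0 or length % sub_size:
        raise ValueError("length must be a positive multiple of the subcommand size")
    # Each start address is the previous one offset by the subcommand size.
    return [(start_addr + i * sub_size, sub_size) for i in range(length // sub_size)]


# With M = 6, a 64-byte access becomes two 32-byte subcommands.
print(split_command(0x1000, 64, 6))  # [(4096, 32), (4128, 32)]
```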
In a possible design, in the second interleaving scheme, the M-th bit and the N-th bit, counted from the least significant bit of the multiple bits, jointly indicate the memory bank group accessed by each subcommand, where N is an integer greater than M. The multiple bits may be understood as the bits of the start address of the data to be accessed that is carried by the subcommand. In this application, the bandwidth usage efficiency of the memory chip is high when the address information indicated by the address lines of the start addresses of different subcommands is used to access different memory bank groups. This requires that the part of the address information that indicates the memory bank group differ between the start addresses of different subcommands. Among the subcommands obtained by splitting, the start address of each subcommand is determined from the start address of the preceding subcommand and the data length accessed by each subcommand. When the start address of the preceding subcommand is offset by 2^(M-1) units to obtain the start address of the following subcommand, the M-th bit of the preceding subcommand differs from the M-th bit of the following subcommand. Because the M-th bit changes from one subcommand to the next, the memory bank group indicated by the bits of the subcommands also changes, which means that the subcommands obtained by splitting the first access command can access different memory bank groups. Therefore, to obtain a high bandwidth for accessing the second storage area in the second interleaving scheme, the M-th bit of the subcommand is used as the low-order bit of the accessed memory bank group. Correspondingly, the bit used as the high-order bit of the accessed memory bank group, that is, the N-th bit, is higher than the M-th bit.
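The effect of placing the low-order bank-group bit at the M-th address bit can be checked with a short sketch. The concrete positions M = 6 and N = 9 and the helper names are assumptions chosen for illustration; the application only requires N > M. Bits are numbered from 1 at the least significant bit, as in the text.

```python
def bit(addr: int, position: int) -> int:
    """Return the address bit at a 1-based position counted from the LSB."""
    return (addr >> (position - 1)) & 1


def bank_group_second_scheme(addr: int, m: int, n: int) -> int:
    """Bank group index with the M-th bit as low bit and the N-th bit as high bit."""
    return (bit(addr, n) << 1) | bit(addr, m)


M, N = 6, 9                # hypothetical bit positions, N > M
SUB_SIZE = 1 << (M - 1)    # each subcommand covers 2**(M-1) = 32 bytes

# Four consecutive subcommands: the M-th bit toggles on every step,
# so each subcommand lands in a different bank group than its predecessor.
starts = [i * SUB_SIZE for i in range(4)]
print([bank_group_second_scheme(a, M, N) for a in starts])  # [0, 1, 0, 1]
```

Adding 2^(M-1) to an address always flips the M-th bit, whatever carries propagate upward, which is why adjacent subcommands can never share a bank group under this mapping.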
In a possible design, the R-th bit and the S-th bit, counted from the least significant bit of the multiple bits, indicate the memory bank accessed by each subcommand, where R is an integer greater than N and S is an integer greater than R. Typically each memory bank group includes four memory banks, so the address information of the memory bank can be carried on the address lines corresponding to two of the multiple bits. For example, the memory banks are bank0, bank1, bank2, and bank3, and two address lines carry the bank address. In the second interleaving scheme of this application, the bits indicating the memory bank may be higher than the bits indicating the memory bank group.
In a possible design, in the first interleaving scheme, the P-th bit and the Q-th bit, counted from the least significant bit of the multiple bits, jointly indicate the memory bank group accessed by each subcommand, where P is an integer greater than M and Q is an integer greater than P. Assume that the access addresses of the access commands before splitting are contiguous. Over a long address range, the P-th bit of each subcommand obtained by splitting is then the same as the P-th bit of the following subcommand: the P-th bit changes only after 2^(P-1) units of data have been accessed cumulatively, and until then the P-th bit of every subcommand is the same. When the P-th bit is the same for different subcommands, the memory bank group indicated by the bits of those subcommands is the same throughout this address range. If the access address of the first access command lies within this contiguous address range, the subcommands obtained by splitting the first access command therefore all access the same memory bank group. Therefore, so that accessing the first storage area consumes less power and is more energy efficient in the first interleaving scheme, the bits that indicate the accessed memory bank group, that is, the P-th bit and the Q-th bit, are higher than the M-th bit.
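A matching sketch for the first interleaving scheme shows consecutive subcommands staying in one bank group. The positions M = 6, P = 13, and Q = 14 and the helper names are assumptions for illustration; the application only requires Q > P > M. The address unit is taken to be the byte.

```python
def bit(addr: int, position: int) -> int:
    """Return the address bit at a 1-based position counted from the LSB."""
    return (addr >> (position - 1)) & 1


def bank_group_first_scheme(addr: int, p: int, q: int) -> int:
    """Bank group index with the P-th bit as low bit and the Q-th bit as high bit."""
    return (bit(addr, q) << 1) | bit(addr, p)


M, P, Q = 6, 13, 14        # hypothetical bit positions, Q > P > M
SUB_SIZE = 1 << (M - 1)    # 32-byte subcommands
WINDOW = 1 << (P - 1)      # the P-th bit first changes after 2**(P-1) = 4096 bytes

# Every subcommand inside the first window maps to the same bank group.
groups = {bank_group_first_scheme(a, P, Q) for a in range(0, WINDOW, SUB_SIZE)}
print(groups)                                   # {0}
print(bank_group_first_scheme(WINDOW, P, Q))    # 1
```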
In a possible design, the J-th bit and the K-th bit, counted from the least significant bit of the multiple bits, indicate the memory bank accessed by each subcommand, where J is an integer greater than Q and K is an integer greater than J. That is, in the first interleaving scheme of this application, the bits indicating the memory bank are also higher than the bits indicating the memory bank group.
In a possible design, when the first access command accesses the first storage area, the first access command is split into X subcommands; when the first access command accesses the second storage area, the first access command is split into Y subcommands, where Y is greater than X and X and Y are integers greater than 1. That is, to improve the bandwidth usage efficiency when the first access command accesses the second storage area, the controller may split the first access command into a larger number of subcommands, so that after address interleaving more subcommands access different memory bank groups at the same time and the bandwidth is used more efficiently.
According to a second aspect, a memory access method is provided. The method includes: determining, according to the access address of a first access command, whether the first access command accesses a first storage area or a second storage area in a memory chip, where the first storage area and the second storage area do not overlap; when it is determined that the first access command accesses the first storage area, interleaving the address of the first access command according to a first interleaving scheme to obtain an interleaved access address; and when it is determined that the first access command accesses the second storage area, interleaving the address of the first access command according to a second interleaving scheme to obtain an interleaved access address, where the interleaved access address is used to access the memory chip. For the beneficial effects of the second aspect, refer to the description of the beneficial effects of the first aspect.
In a possible design, the second bandwidth required to access second data in the second storage area is higher than the first bandwidth required to access first data in the first storage area.
In a possible design, the method further includes: splitting the first access command into multiple subcommands, where the interleaved access address includes the access addresses of the multiple subcommands. In the second interleaving scheme, the access addresses of the multiple subcommands are used to access different memory bank groups in the second storage area, where each of the different memory bank groups includes multiple memory banks. In the first interleaving scheme, the access addresses of the multiple subcommands are used to access different memory banks in the same memory bank group, or the same memory bank in the same memory bank group, in the first storage area.
In a possible design, each of the multiple subcommands includes multiple bits that indicate the access address of that subcommand, each of the multiple bits corresponds to one address line of the access address, and the amount of data accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1.
In a possible design, in the second interleaving scheme, the M-th bit and the N-th bit, counted from the least significant bit of the multiple bits, jointly indicate the memory bank group accessed by each subcommand, where N is an integer greater than M.
In a possible design, the R-th bit and the S-th bit, counted from the least significant bit of the multiple bits, indicate the memory bank accessed by each subcommand, where R is an integer greater than N and S is an integer greater than R.
In a possible design, in the first interleaving scheme, the P-th bit and the Q-th bit, counted from the least significant bit of the multiple bits, jointly indicate the memory bank group accessed by each subcommand, where P is an integer greater than M and Q is an integer greater than P.
In a possible design, the J-th bit and the K-th bit, counted from the least significant bit of the multiple bits, indicate the memory bank accessed by each subcommand, where J is an integer greater than Q and K is an integer greater than J.
In a possible design, when the first access command accesses the first storage area, the first access command is split into X subcommands; when the first access command accesses the second storage area, the first access command is split into Y subcommands, where Y is greater than X and X and Y are integers greater than 1.
According to a third aspect, a communication chip is provided. The communication chip includes the memory access apparatus according to the first aspect or any possible design of the first aspect.
According to a fourth aspect, an electronic device is provided. The electronic device includes the memory access apparatus according to the first aspect or any possible design of the first aspect.
According to a fifth aspect, a computer-readable storage medium is provided. The storage medium includes computer instructions that, when run on an electronic device, cause the electronic device to perform the method according to the first aspect or any possible design of the first aspect.
According to a sixth aspect, a computer program product is provided. When the computer program product runs on a computer, an electronic device is caused to perform the method according to the first aspect or any possible design of the first aspect.
Brief Description of Drawings
FIG. 1 is a schematic diagram of accessing a memory chip using patchwork control logic according to an embodiment of this application;
FIG. 2 is a schematic diagram of the storage structure of a DDR die according to an embodiment of this application;
FIG. 3 is a schematic diagram of access to one Channel X of LPDDR5 in BG mode according to an embodiment of this application;
FIG. 4 is a schematic diagram of the sending of access commands and the output of data in BG mode according to an embodiment of this application;
(a) in FIG. 5 is a schematic diagram of the simulated bandwidth requirement of a GPU accessing LPDDR5 in BG mode according to an embodiment of this application;
(b) in FIG. 5 is a schematic diagram of the measured bandwidth requirement of a GPU accessing LPDDR5 in BG mode according to an embodiment of this application;
FIG. 6 is a schematic diagram of the division of storage areas according to an embodiment of this application;
FIG. 7 is a schematic diagram of different service types accessing different storage areas in a gaming scenario according to an embodiment of this application;
FIG. 8 is a block diagram of a memory access apparatus according to this application;
FIG. 9 is a schematic flowchart of memory allocation according to an embodiment of this application;
FIG. 10 is a schematic flowchart of a memory access method according to an embodiment of this application;
FIG. 11 is a schematic diagram of address information indicating a memory bank group according to an embodiment of this application;
FIG. 12 is a schematic diagram of address information indicating a memory bank group and a memory bank according to an embodiment of this application;
FIG. 13 is a schematic diagram of interleaving in a second interleaving scheme according to an embodiment of this application;
FIG. 14 is a schematic diagram of accessing different BGs using a second interleaving scheme according to an embodiment of this application;
FIG. 15 is a schematic diagram of interleaving in another second interleaving scheme according to an embodiment of this application;
FIG. 16 is a schematic diagram of accessing different BGs using a second interleaving scheme according to an embodiment of this application;
FIG. 17 is a schematic diagram of address information indicating a memory bank group according to an embodiment of this application;
FIG. 18 is a schematic diagram of address information indicating a memory bank group and a memory bank according to an embodiment of this application;
FIG. 19 is a schematic diagram of interleaving in a first interleaving scheme according to an embodiment of this application;
FIG. 20 is a schematic diagram of accessing the same BG using a first interleaving scheme according to an embodiment of this application;
FIG. 21 is a schematic structural diagram of an SoC according to an embodiment of this application;
FIG. 22 is a schematic structural diagram of a DDR controller according to an embodiment of this application;
FIG. 23 is a schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed Description
In this application, the BG in BG mode may be understood as follows. Each memory chip (which may be a DDR die in this application) includes multiple channels, and each channel supports access through two DQ Byte interfaces, where DQ denotes the data input/output signals of the memory chip and each DQ Byte is one byte-wide group of those signals. Multiple BGs can be accessed through each DQ Byte interface; each BG includes multiple banks, and each bank includes multiple small storage cells arranged in rows and columns. FIG. 2 shows the structure inside a DDR die. This document takes an LPDDR5 chip as an example of the DDR die/chip and hereafter refers to the LPDDR5 chip simply as LPDDR5. The DDR die in FIG. 2 includes one channel, Channel X, which can be accessed through the DQ Byte0 interface and the DQ Byte1 interface, and the burst length (BL) accessible through each DQ Byte interface may be, for example, 128 Byte or 64 Byte. One DQ Byte interface can access one or more of four BGs (BG0, BG1, BG2, and BG3). In the Channel X of LPDDR5 shown in FIG. 2, each BG has four banks (bank0, bank1, bank2, and bank3), so the four BGs include 16 banks in total.
The background of the problem addressed by this application is described first. The LPDDR5 standard JESD209-5A published by JEDEC specifies that the highest rate grade for read and write access to LPDDR5 by a central processing unit (CPU) or another device is 5500 Mbps, about 29% higher than that of LPDDR4. For technical reasons, however, the access rate inside the storage cells of the LPDDR5 produced by the various manufacturers has not increased to the same degree. JEDEC therefore defines three modes and attaches access constraints to the different modes, so that the full-bandwidth access rate of LPDDR5 is reached only under specific conditions. The three modes are shown in Table 1.
Table 1
Figure PCTCN2021074562-appb-000001
As Table 1 shows, in BG mode, taking a CPU as an example, the CPU can access LPDDR5 at a rate between 3200 Mbps and 5500 Mbps. Because this access rate exceeds 3200 Mbps, for an access with BL = 32 Byte = 256 bit supported by LPDDR5, LPDDR5 can complete only one bank access, that is, transfer 128 bits of data, within the duration of the burst on the external bus, as shown in FIG. 3. The internal access bandwidth of LPDDR5 is therefore only half of the access bandwidth of the external bus. In other words, the external access rate is too high for the internal circuitry to keep up with, half of the access bandwidth carries no data, LPDDR5 bandwidth is wasted, and the data transfer efficiency of LPDDR5 is low.
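The "half" in the paragraph above follows from the figures it quotes. Writing the ratio of internal to external data per burst as η (a symbol introduced here only for this illustration):

```latex
\mathrm{BL} = 32\ \text{Byte} = 32 \times 8\ \text{bit} = 256\ \text{bit},
\qquad
\eta = \frac{128\ \text{bit (one bank access)}}{256\ \text{bit (one external burst)}} = \frac{1}{2} = 50\%
```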
To address the problem that the internal access bandwidth of LPDDR5 is lower than the access bandwidth of its external bus, the JEDEC standard defines a consecutive-access constraint for BG mode: in BG mode, consecutive commands that access LPDDR5 should target different BGs, that is, the access commands should be BG-interleaved. For example, for an access with BL = 32 Byte supported by LPDDR5, when the CPU performs read access to LPDDR5, the first command sent by the CPU reads bank0 of BG0, and the second command that immediately follows reads bank0 of BG1 (a BG other than BG0), so the first and second commands access different BGs. The CPU sends access commands to LPDDR5 over the command and address (CA) bus ahead of the data output for those commands; the data that a command reads is not output as soon as it is read, but is output together once the data of the multiple access commands has been read. LPDDR5 can therefore execute the second command, that is, read the data in bank0 of BG1, before that data is due to be transferred. By the time the 128 bits from bank0 of BG0 have been transferred, the 128 bits from bank0 of BG1 have already been read, so the data of BG1 bank0 can be output together with the data of BG0 bank0. LPDDR5 can thus output a full 256-bit burst, and the internal and external access bandwidths become equal.
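Whether a command stream satisfies this recommendation can be checked by counting adjacent commands that target the same BG. The sketch below is a simplification: it models a command as a hypothetical (bank_group, bank) pair and ignores all timing parameters of the standard.

```python
def back_to_back_same_bg(commands: list[tuple[int, int]]) -> int:
    """Count adjacent command pairs that target the same bank group.

    Each command is a (bank_group, bank) pair; zero means fully BG-interleaved.
    """
    return sum(1 for prev, cur in zip(commands, commands[1:]) if prev[0] == cur[0])


interleaved = [(0, 0), (1, 0), (0, 1), (1, 1)]   # BG0, BG1, BG0, BG1
clustered = [(0, 0), (0, 1), (1, 0), (1, 1)]     # BG0, BG0, BG1, BG1

print(back_to_back_same_bg(interleaved))  # 0
print(back_to_back_same_bg(clustered))    # 2
```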
This BG-interleaved access method is defined in the JEDEC standard: accesses in BG mode need to be interleaved across BGs, as shown in FIG. 4. In FIG. 4, CK_t and CK_c denote the clock with which a device in the SoC sends access commands, and access commands can be sent in advance at rising edges such as T0, T1, and T2. For example, the commands shown in FIG. 4 are BG-interleaved read commands that access BGn, BGm, and BGn in that order. When LPDDR5 outputs data through DMI (Data Mask Inversion), the data for the commands accessing BGn, BGm, and BGn is output continuously, in the same BG-interleaved order BGn, BGm, BGn. WCK_t and WCK_c in FIG. 4 denote the clock inside the DDR die; DMI outputs one data item per clock, and one burst outputs data on clocks 0 to 15.
The above is a general introduction to the application scenario of this embodiment. For further background on BG mode and BG interleaving, refer to the LPDDR5 standard JESD209-5A published by JEDEC; this embodiment does not describe it in detail, and a person skilled in the art may refer to the existing standard.
In practice, the numbers of commands that access different BGs are unbalanced across services and may differ greatly, so effective BG interleaving is difficult to achieve. For example, when the CPU sends few read and write commands, it is difficult to piece together and reorder them, as in the prior art of FIG. 1, so that they interleave effectively across BGs. Likewise, when most of the access commands sent by the CPU target BG0 and few target the other BGs, effective interleaving across BGs is difficult. The CPU in the above description may be replaced by any other device that uses the memory. For example, (a) in FIG. 5 shows the simulated bandwidth of a graphics processing unit (GPU) accessing the DDR die; the horizontal axis is time in s and the vertical axis is bandwidth in GB/s. The GPU accesses the DDR die through a cache (cacheline = 128 B), and when the accessed data addresses are not contiguous, the peak bandwidth that the GPU requires from the DDR is about 60 GB/s. (b) in FIG. 5 shows the measured bandwidth demand on four channels of the DDR die; the horizontal axis is time in s and the vertical axis is bandwidth in GB/s, and the calculated maximum bandwidth usage efficiency is 62%. At present, the maximum bandwidth usage efficiency measured for LPDDR5 on the various SoC platforms is thus only about 60%. The bandwidth usage efficiency of LPDDR5 is therefore low, and the underlying reason is that in these scenarios the commands accessing the BGs of LPDDR5 are unbalanced, which makes effective interleaving across BGs difficult. In addition, not every application scenario needs interleaving across BGs, and applying this approach where it is not needed increases power consumption. Moreover, the current technical solution treats the entire storage area of LPDDR5 as a single partition: in every application scenario, whether a heavy-load, high-bandwidth service or a light-load, low-bandwidth service accesses LPDDR5, the same piecing-together and reordering strategy is applied, so multiple banks are active at the same time and power consumption increases. To address these problems, a memory access solution is proposed below that takes application requirements into account and thereby optimizes power consumption.
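Why an unbalanced command mix defeats reordering can be illustrated with a simple counting argument: a set of pending commands can be arranged so that no two neighbours share a BG only if the most frequent BG accounts for at most half of the commands, rounded up. The sketch below applies that check; it is an idealized model, not the reordering logic of any real controller, and it ignores ordering dependencies and queue depth. The function name is hypothetical.

```python
from collections import Counter


def can_fully_interleave(bank_groups: list[int]) -> bool:
    """True if the commands can be ordered with no two adjacent ones on the same BG."""
    if not bank_groups:
        return True
    most_frequent = max(Counter(bank_groups).values())
    # The dominant BG must fit into every other slot of the sequence.
    return most_frequent <= (len(bank_groups) + 1) // 2


print(can_fully_interleave([0, 1, 2, 3, 0, 1]))  # True: balanced mix
print(can_fully_interleave([0, 0, 0, 0, 1]))     # False: BG0 dominates
```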
This application therefore provides a memory access apparatus that treats the storage area of a memory chip as comprising multiple storage areas, some of which cause higher power consumption in the memory chip when accessed and some of which cause lower power consumption. The memory access apparatus accesses the memory chip according to storage area. As shown in FIG. 6, the memory chip includes multiple channels: channel 0, channel 1, ..., channel e, where e is a positive integer and each channel may have the structure shown in FIG. 2. The memory chip thus includes a total of e channels of the kind shown in FIG. 2. The storage areas of these channels are divided into two storage areas, a first storage area and a second storage area, each of which occupies part of the storage area of at least one (usually several) channel. In FIG. 6, each area is shown as a cross-hatched region. The first storage area has lower power consumption and the second storage area has higher power consumption. Put differently, because the memory chip consumes more power when the second storage area is accessed, the second storage area can serve services that request memory with high bandwidth requirements; because the memory chip consumes less power when the first storage area is accessed, the first storage area can serve services that request memory with low bandwidth requirements.
Note that, in this application, dividing the storage area of the memory chip into multiple storage areas is a logical division of the entire storage area; the first storage area and the second storage area are not completely isolated from each other within the memory chip. For example, the same BG may hold data of both the first storage area and the second storage area, but within that BG the addresses of the data of the first storage area do not overlap the addresses of the data of the second storage area.
For example, services that request high-bandwidth memory may be services in a game scenario, or requests for DDR memory made by services corresponding to a CPU, a network processing unit (NPU), a graphics processing unit (GPU), or a media processor in an artificial intelligence (AI) scenario; these services require relatively high bandwidth when accessing memory. FIG. 7 is a schematic diagram of different service types accessing different storage areas in a game scenario. As FIG. 7 shows, services such as the data or code segments of the operating system kernel, high-fidelity (HiFi) firmware, and the modem, as well as user private data (process data) and the file cache, do not demand much bandwidth, so their memory can all be allocated from the first storage area. Services such as texture/vertex buffers, the GPU job queue (input and output after GPU drawing), and the frame buffer demand higher bandwidth, so their memory can be allocated from the second storage area. In this way, because storage areas with different power consumption are accessed, the power consumption of the memory chip better matches actual application requirements, power is not wasted, and the energy efficiency of the memory chip improves.
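The service-to-area policy described for FIG. 7 can be sketched as a simple lookup. This is only an illustrative sketch: the service identifiers and the `region_for_service` helper are hypothetical names chosen here, not part of the application.

```python
from enum import Enum


class Region(Enum):
    FIRST = "first storage area (lower bandwidth, lower power)"
    SECOND = "second storage area (higher bandwidth, higher power)"


# Services that the FIG. 7 description treats as bandwidth-hungry.
# The string identifiers are hypothetical labels for illustration.
HIGH_BANDWIDTH_SERVICES = {"texture_vertex_buffer", "gpu_job_queue", "frame_buffer"}


def region_for_service(service: str) -> Region:
    """Pick the storage area a service's memory should be allocated from."""
    if service in HIGH_BANDWIDTH_SERVICES:
        return Region.SECOND
    # Kernel data/code, HiFi firmware, modem, process data, file cache, etc.
    return Region.FIRST
```

A caller would pass the service label of the requester and allocate from whichever area is returned.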
Based on this, for multiple storage areas with different power consumption, the memory access device provided in this application may use different interleaving schemes to interleave the address of an access command and obtain an interleaved access address, so that the interleaved access address accesses either a storage area with higher power consumption or a storage area with lower power consumption. For example, as shown in FIG. 8, the memory access device 80 provided in this application may be located in a processor, and the processor may include devices such as a CPU or a GPU. The memory access device 80 includes a controller (DDR controller) 81 and an interleaver 82. When the controller 81 determines that the first storage area, which has lower power consumption, is to be accessed, the interleaver 82 may interleave using the first interleaving scheme, and the interleaved access address is used to access the first storage area. When the controller 81 determines that the second storage area, which has higher power consumption, is to be accessed, the interleaver 82 may interleave using the second interleaving scheme, and the interleaved access address is used to access the second storage area. Thus, when accessing the memory chip, the memory access device 80 can use different interleaving schemes to access different storage areas according to scenarios with different power consumption requirements, and using different interleaving schemes in this way further reduces the power consumption of the memory chip.
It can be understood that, before the CPU in the processor sends a command to the memory chip, it must first request memory from the controller 81 in the processor, that is, request the access address to be carried in the access command when the access command is sent. Because this application divides the storage area of the memory chip into multiple storage areas with different access bandwidths, and assuming the division into a first storage area and a second storage area shown in FIG. 6, the controller 81, unlike in the prior art, must consider whether to allocate memory from the first storage area or from the second storage area. Taking as an example a memory chip that is a DDR memory and a controller 81 that is a DDR memory controller, before introducing the hardware structure of the memory access device 80, the following first describes how, before sending a command to the memory chip, the CPU requests the controller 81 to allocate memory (that is, requests the memory address range to be accessed; a specific memory address within that range is the access address mentioned above).
The controller 81 may be implemented in software, hardware, or a combination of the two. When the controller 81 is hardware, it may include logic circuits and may also run the necessary memory management software. The following description takes as an example the controller 81 running that software to implement software memory management.
Because this application divides the storage area of the memory chip into a first storage area and a second storage area, software memory management in this application differs from existing software memory management: it can allocate memory from the first storage area or the second storage area according to the service's DDR bandwidth needs. That is, for services that request high-bandwidth memory, software memory management can allocate memory from the second storage area; for other services, which do not request high-bandwidth memory, it can allocate memory from the first storage area. When memory is allocated from the second storage area, the accessed data occupies more of the memory chip's bandwidth, the chip's power consumption is higher, and bandwidth usage efficiency is higher. When memory is allocated from the first storage area, the accessed data occupies less of the chip's bandwidth, the chip's power consumption is lower, and power is not wasted. This solution therefore achieves a good balance and overall optimization between bandwidth usage efficiency and power consumption.
Based on this division of storage areas, the memory allocation flow of this application can first satisfy the bandwidth needs of services with higher power consumption, that is, services with high bandwidth requirements, and then satisfy as far as possible the user performance and energy efficiency requirements of services with lower power consumption, that is, services without high bandwidth requirements. The memory allocation process is described first, as shown in FIG. 9. The memory allocation flow of this application may include: 1) The CPU requests memory from the controller 81. 2) The controller 81 determines whether high-bandwidth memory needs to be requested. If yes, the controller 81 determines whether the second storage area is powered on and then proceeds to step 3); if no, the controller 81 requests memory from the first storage area and then proceeds to step 4). For example, the controller 81 may determine whether high-bandwidth memory needs to be requested according to the service scenario.
Because the second storage area is accessed less frequently than the first storage area, the second storage area can be powered off when no memory is being requested from it, in order to reduce the power consumption of the memory chip. Step 3) may therefore be: 3) When the controller 81 determines that the second storage area is not powered on, it first triggers power-on of the second storage area and then requests memory from the second storage area. When the controller 81 determines to allocate memory, it first determines whether the second storage area has sufficient memory. If not, it may first migrate or defragment memory in the second storage area and then allocate memory from it. Alternatively, if the second storage area does not have sufficient memory, memory may be requested from the first storage area instead.
4) When the controller 81 determines to request memory from the first storage area, it first determines whether the first storage area has sufficient memory. If so, it allocates memory from the first storage area; if not, it may first reclaim and defragment memory in the first storage area and then request memory from the first storage area. It can be understood that the result of the CPU requesting memory from the controller 81 is that the controller 81 obtains the memory address ranges of one or more memory regions.
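Steps 1) to 4) above can be sketched as follows. This is a simplified model under stated assumptions, not the controller's actual implementation: the `Area` class, its `reclaimable_bytes` field, and the `compact` method are hypothetical stand-ins for migration, defragmentation, and reclaim, and the sketch combines the two alternatives the text gives for a full second storage area (compact first, then fall back to the first storage area).

```python
from dataclasses import dataclass


@dataclass
class Area:
    name: str
    free_bytes: int
    reclaimable_bytes: int = 0  # hypothetical: what migration/reclaim could free
    powered_on: bool = True

    def compact(self) -> None:
        """Stand-in for memory migration, defragmentation, or reclaim."""
        self.free_bytes += self.reclaimable_bytes
        self.reclaimable_bytes = 0


def allocate(size: int, high_bandwidth: bool, first: Area, second: Area) -> str:
    """Return the name of the area that served the request."""
    if high_bandwidth:
        # Step 3: power on the second area if needed, then try to allocate.
        if not second.powered_on:
            second.powered_on = True
        if second.free_bytes < size:
            second.compact()
        if second.free_bytes >= size:
            second.free_bytes -= size
            return second.name
        # Otherwise fall through and request from the first area instead.
    # Step 4: allocate from the first area, reclaiming first if it is short.
    if first.free_bytes < size:
        first.compact()
    if first.free_bytes < size:
        raise MemoryError("insufficient memory")
    first.free_bytes -= size
    return first.name
```

In this model the return value only names the area; a real allocator would return the memory address range that is fed back to the CPU.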
After processing the CPU's request and obtaining the memory address range corresponding to the storage area, the controller 81 returns the memory address range to the CPU, which completes memory allocation. The CPU then begins the memory access process. During memory access, the CPU generates an access command according to the memory address range; the access command carries a specific memory address within the range to be accessed, so that the access command can be used to access the memory chip. This memory address may be, for example, the access address of the first access command mentioned below. The first access command is the command sent by the CPU to the controller 81 for accessing the memory chip. On receiving the first access command, the controller 81 first determines the storage area to be accessed by the first access command, and the interleaver 82 then performs BG interleaving according to the interleaving scheme corresponding to that storage area, so that the memory chip is accessed according to the interleaved address.
Accordingly, the memory access device 80 proposed in this application can be applied to scenarios in which a CPU accesses a memory chip. For example, the memory access device 80 may be a device located in a processor, such as an SoC. The memory chip may be, for example, an LPDDR5 chip. For the method by which the CPU accesses the memory chip according to an access address, refer to FIG. 10, which shows a memory access method provided in this application. The method includes: 140. The memory access device 80 determines, according to the access address of the first access command sent by the CPU, whether the first access command accesses the first storage area or the second storage area in the memory chip; as shown in FIG. 6, the first storage area and the second storage area do not overlap.
For example, the memory access device 80 may make this determination according to the address range to which the address of the first access command belongs. The first storage area and the second storage area have different address ranges, and the controller 81 in the memory access device can determine the storage area to be accessed from the first access command and the address ranges of the storage areas: when the address of the first access command falls within the address range of the first storage area, the first access command is determined to access the first storage area; when it falls within the address range of the second storage area, the first access command is determined to access the second storage area.
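Step 140 amounts to a range check on the access address. The sketch below illustrates it with made-up address ranges; the boundaries `0x8000_0000` and `0xC000_0000` and the function name `target_area` are assumptions for illustration only, since the application does not give concrete addresses.

```python
# Hypothetical, non-overlapping address ranges for the two storage areas.
FIRST_AREA = range(0x0000_0000, 0x8000_0000)
SECOND_AREA = range(0x8000_0000, 0xC000_0000)


def target_area(access_address: int) -> str:
    """Decide which storage area an access command targets (step 140)."""
    if access_address in FIRST_AREA:
        return "first"
    if access_address in SECOND_AREA:
        return "second"
    raise ValueError(f"address {access_address:#x} is outside both storage areas")
```

The result then selects the interleaving scheme: the first scheme for "first", the second scheme for "second".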
Further, 141. When it is determined that the first access command accesses the first storage area, the memory access device 80 interleaves the address of the first access command according to the first interleaving scheme to obtain an interleaved access address; when it is determined that the first access command accesses the second storage area, it interleaves the address of the first access command according to the second interleaving scheme to obtain an interleaved access address. The interleaved access address is used to access the memory chip. For example, the second bandwidth required to access second data in the second storage area is higher than the first bandwidth required to access first data in the first storage area. With the first and second interleaving schemes provided in this application, the power consumption of the memory chip when data in the first storage area is accessed differs from its power consumption when data in the second storage area is accessed; in other words, the bandwidth when the second interleaving scheme is used to access the second storage area is higher than the bandwidth when the first interleaving scheme is used to access the first storage area.
Whenever the controller 81 receives the first access command from the CPU, it splits the command, and the storage area accessed by the first access command is determined from the address of the command. When the first access command accesses the second storage area, the interleaver 82 applies the second interleaving scheme to the subcommands obtained by splitting the first access command. If the subcommands all access different BGs, the accessed data can be output from multiple BGs of the memory chip at the same time or written to multiple BGs at the same time; the chip's bandwidth usage efficiency is then higher, and so is its power consumption. When the first access command accesses the first storage area, the interleaver 82 applies the first interleaving scheme to the subcommands. If the subcommands all access the same BG, or even the same bank in the same BG, the chip's bandwidth usage efficiency is lower, but because the subcommands are concentrated on a single BG, fewer banks are activated, the chip's power consumption is lower, and access energy efficiency is higher. This solution therefore achieves a good balance and overall optimization between bandwidth usage efficiency and power consumption.
Therefore, in some embodiments, before the interleaver 82 performs address interleaving using the first or second interleaving scheme, the controller 81 may further be configured to split the first access command into multiple subcommands; the interleaved access address then includes the access addresses of the multiple subcommands. The controller 81 sends the subcommands to the interleaver 82, and the interleaver 82 performs address interleaving using the first or second interleaving scheme according to the storage area accessed. In the second interleaving scheme, the access addresses of the multiple subcommands are used to access different memory bank groups in the second storage area; each memory bank group (bank group, BG) includes multiple memory banks (banks). In the first interleaving scheme, the access addresses of the multiple subcommands are used to access different memory banks in the same memory bank group, or the same memory bank in the same memory bank group, in the first storage area. The first interleaving scheme may therefore be a scheme that does not interleave across different memory bank groups; in other words, it performs interleaved access within a single memory bank group, that is, intra-BG interleaving or non-BG interleaving, and its power consumption is relatively lower. The second interleaving scheme may be a scheme that interleaves across different memory bank groups, that is, inter-BG interleaving, or simply BG interleaving, and its power consumption is relatively higher.
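The contrast between the two schemes can be illustrated with a toy model that only tracks which bank group each subcommand lands in. The round-robin spreading used for the second scheme, the count of four bank groups, and the names `NUM_BANK_GROUPS`, `home_bg`, and `bank_groups_touched` are all assumptions made for this sketch; the actual mapping depends on the address bits chosen by the interleaver.

```python
NUM_BANK_GROUPS = 4  # assumed number of bank groups per channel


def bank_groups_touched(num_subcommands: int, scheme: str, home_bg: int = 0) -> list[int]:
    """Return the bank group index each subcommand is mapped to."""
    if scheme == "first":
        # Intra-BG (non-BG) interleaving: every subcommand stays in one BG.
        return [home_bg] * num_subcommands
    if scheme == "second":
        # Inter-BG interleaving: consecutive subcommands go to different BGs.
        return [(home_bg + i) % NUM_BANK_GROUPS for i in range(num_subcommands)]
    raise ValueError(f"unknown scheme: {scheme}")


def distinct_bank_groups(num_subcommands: int, scheme: str) -> int:
    """Count how many bank groups are active for one split command."""
    return len(set(bank_groups_touched(num_subcommands, scheme)))
```

With four subcommands, the first scheme keeps one bank group active, while the second scheme in this toy model activates four, which is the bandwidth-versus-power trade-off the text describes.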
Based on the above solution, when the access addresses of the subcommands are used to access different memory bank groups in the second storage area, the data accessed by the subcommands can be output from multiple memory bank groups at the same time, or written to multiple memory bank groups at the same time, and the bandwidth usage efficiency of the memory chip is higher. When the access addresses of the subcommands are used to access different memory banks in the same memory bank group, or the same memory bank in the same memory bank group, in the first storage area, the subcommands access a single memory bank group. When write or read access is performed on a single memory bank group, only one memory bank group is accessed, and the subcommands may even access the same memory bank within it, so the power consumption of the memory chip is lower and its energy efficiency is higher. When the first interleaving scheme accesses the same memory bank in the same memory bank group, this can be understood as interleaving (mapping) the addresses of the access command to addresses in the same memory bank of the same memory bank group. The first access command in this application may be a write command or a read command. With reference to FIG. 8, if the first access command is a read command, then after reading the data from the memory chip, the controller 81 also needs to splice together the data read back by the subcommands and return the result to the CPU.
From the above description, it can be understood that when the memory access device 80 accesses the memory chip, the controller 81 splits the first access command into multiple subcommands and sends them to the interleaver 82. Each subcommand carries its own access address obtained from the split, and the access addresses carried by the subcommands differ from one another. On receiving the subcommands from the controller 81, the interleaver 82 interleaves the addresses they carry to obtain interleaved access addresses. The interleaved access addresses are sent to the memory chip, which then performs write access or read access according to them.
Interleaving the addresses carried by the subcommands can be understood as mapping the access address carried by each subcommand to an access address that the memory chip can recognize. This is because each subcommand includes multiple bits that indicate its access address, while the storage area of the memory chip to be accessed includes multiple memory bank groups, each memory bank group includes multiple memory banks, and each memory bank is further divided into multiple rows and columns. Address mapping is therefore the process of determining the memory bank group, the memory bank, and the row and column from the multiple bits of the subcommand. In implementation, the address information of the access address is transmitted over address lines, which can be understood as wires that carry address information. Each of the multiple bits of the subcommand corresponds to one address line of the access address. The address lines corresponding to the bits of a subcommand therefore include address lines indicating the memory bank group, address lines indicating the memory bank, and address lines indicating the row and column. Any address line can carry a high level or a low level, that is, each address line carries a binary address bit whose value is 0 or 1.
In some embodiments, if the burst length (BL) supported by the memory chip when it is accessed is 2^(M-1) units, the amount of data accessed by each subcommand after splitting must match that BL; that is, the amount of data (data length) accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1. The unit here may be the byte. For example, if the data length accessed by the first access command is (2×2^(M-1)) bytes and the first access command is split into 2 subcommands, each subcommand accesses 2^(M-1) bytes. For example, if the BL supported by the memory chip is 64 bytes (M=7) and the first access command accesses 128 bytes, the first access command is split into 2 subcommands, each accessing 64 bytes. As another example, if the BL supported by the memory chip is 32 bytes (M=6) and the first access command accesses 64 bytes, the first access command can be split into 2 subcommands, each accessing 32 bytes.
For example, the data length accessed by the first access command may be small, such as 64 bytes, that is, (2×2^5) bytes. When the first access command accesses the second storage area or the first storage area, the controller 81 may split the first access command into 2 subcommands, and the 2 subcommands access different memory banks through the second interleaving scheme. As a special case, when the first access command accesses the first storage area and the data length it accesses is small (for example, 64 bytes), the bandwidth requirement may be low and the first access command need not be split: the interleaver 82 may directly interleave the access address of the first access command using the first interleaving scheme to obtain the interleaved access address. This also avoids the resource cost of command splitting and gives higher access energy efficiency.
To further improve bandwidth usage efficiency when the first access command accesses the second storage area, the controller 81 may split the first access command into a larger number of subcommands; during address interleaving, a larger number of subcommands accessing different memory bank groups at the same time yields higher bandwidth usage efficiency. Therefore, in this application, when the first access command accesses the first storage area, it is split into X subcommands, for example 2 subcommands; when the first access command accesses the second storage area, it is split into Y subcommands, where Y is greater than X, and X and Y are integers greater than 1. That is, the number of subcommands produced when the first access command accesses the second storage area is greater than the number produced when it accesses the first storage area. For example, when the first access command accesses the second storage area and the accessed data length is (4×2^(M-1)) bytes, the first access command is split into 4 subcommands, each accessing 2^(M-1) bytes. For example, if the BL supported by the memory chip is 32 bytes (M=6) and the first access command accesses 128 bytes, the first access command is split into 4 subcommands, each accessing 32 bytes, and the 4 subcommands can access different memory bank groups at the same time.
Usually, the first access command sent by the memory access device 80 when accessing the memory chip carries the head address and the data length of the data to be accessed. When the first access command is split into multiple subcommands, each subcommand likewise carries its own head address and data length (2^(M-1) units); the head addresses of the subcommands differ, and their data lengths are the same. Address mapping is therefore performed separately on the head address carried by each subcommand to determine the memory bank group, memory bank, row, and column that the subcommand accesses. The head address of each subcommand can be determined from the length of the data to be accessed: one subcommand carries the same head address as the first access command, and each other subcommand carries a head address obtained by offsetting the head address of the preceding subcommand by 2^(M-1) units. For example, if the first access command carries a first head address and a data length of (2×2^(M-1)) bytes and is split into 2 subcommands, one subcommand may carry the first head address with a data length of 2^(M-1) bytes, and the other may carry a head address obtained by offsetting the first head address by 2^(M-1) bytes, also with a data length of 2^(M-1) bytes.
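The splitting rule described above (equal lengths of 2^(M-1) bytes, head addresses offset by 2^(M-1) bytes each) can be written as a short function. The function name `split_command` and the tuple representation of a subcommand are choices made for this sketch, and the sketch assumes the total length is a whole multiple of the burst length.

```python
def split_command(head_address: int, length: int, m: int) -> list[tuple[int, int]]:
    """Split an access command into subcommands of 2**(m-1) bytes each.

    Returns a list of (head_address, length) pairs, one per subcommand.
    """
    if m <= 1:
        raise ValueError("M must be an integer greater than 1")
    burst = 1 << (m - 1)  # burst length supported by the memory chip, in bytes
    if length % burst != 0:
        raise ValueError("length must be a multiple of the burst length")
    return [(head_address + i * burst, burst) for i in range(length // burst)]
```

For a 128-byte command at head address 0x1000 with M=6, this yields four 32-byte subcommands at 0x1000, 0x1020, 0x1040, and 0x1060.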
Based on the above description, it can be understood that, in some embodiments, if the head addresses of different subcommands are mapped to different memory bank groups during address mapping, the subcommands can access different memory bank groups at the same time, which makes the bandwidth usage efficiency of the subcommands accessing the memory chip higher. In other words, the bandwidth usage efficiency of the memory chip is higher when the address information indicated by the address lines corresponding to the head addresses of different subcommands is used to access different memory bank groups. This requires that, within the address information indicated by the address lines corresponding to the head addresses of different subcommands, the address information indicating the memory bank group differs.
As explained above, when address information is transmitted over the address lines, the head address is converted into a binary number (the multiple bits mentioned above), and those bits are then converted into address information indicated by the high and low levels of the address lines. When one channel of the memory chip contains 4 memory bank groups, the address information of 2 address lines is needed to indicate the 4 memory bank groups; expressed in binary, the address information of the 2 address lines can be 00, 01, 10, or 11. For example, 00 indicates BG0, 01 indicates BG1, 10 indicates BG2, and 11 indicates BG3. For the address lines corresponding to the head addresses of different subcommands to indicate different memory bank groups, the low-order address line among the address lines that indicate the memory bank group is particularly important. When the level of that low-order address line changes between the head addresses of different subcommands, the subcommands access different memory bank groups. Which address lines are used to carry the memory bank group address information, so that different subcommands access different memory bank groups, can be determined from the data length accessed by each subcommand.
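The two-line encoding of the four bank groups is a plain 2-bit binary number. A minimal sketch follows; the function name `bank_group_from_lines` is hypothetical, and each argument is the logic level (0 or 1) of one address line.

```python
def bank_group_from_lines(high_line: int, low_line: int) -> str:
    """Decode two bank-group address lines (levels 0 or 1) into a BG label."""
    if high_line not in (0, 1) or low_line not in (0, 1):
        raise ValueError("address line levels must be 0 or 1")
    return f"BG{(high_line << 1) | low_line}"
```

Toggling only the low line switches between BG0 and BG1 (or BG2 and BG3), which is why the low-order line decides whether consecutive subcommands land in different bank groups.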
In some embodiments, if the amount of data (data length) accessed by each subcommand is 2^(M-1) units and the memory bank group is indicated by 2 bits, then in the second interleaving scheme of the present application, in order to achieve a high access bandwidth of the memory chip when the second storage area is accessed, as shown in FIG. 11, the M-th bit and the N-th bit of the multiple bits of the subcommand, counted from the low-order end (that is, from the lowest bit, also called the least significant bit (LSB)), may be used to jointly indicate the memory bank group accessed by each subcommand, where N is an integer greater than M.
It can be understood that, in the present application, the M-th bit counted from the low-order end of the multiple bits is the low-order bit of the address information indicating the memory bank group, and the N-th bit counted from the low-order end is the high-order bit of that address information (that is, its highest bit, also called the most significant bit (MSB)). As explained above, among the split subcommands, the first address of one subcommand is the same as the first address of the first access command, and offsetting the first address of a subcommand by 2^(M-1) units yields the first address of the next subcommand. The M-th bit of a subcommand therefore differs from the M-th bit of the next subcommand, for example 0 and 1 respectively. When the M-th bit changes from one subcommand to the next, the bits indicating the accessed memory bank group change, which in turn means that the subcommands split from the first access command can access different memory bank groups. Therefore, in order to achieve a high bandwidth for accessing the second storage area under the second interleaving scheme, the present application uses the M-th bit, counted from the low-order end of the multiple bits of the subcommand, as the low-order bit indicating the accessed memory bank group. Correspondingly, the high-order bit indicating the accessed memory bank group, that is, the N-th bit, is higher than the M-th bit.
For example, assume that the first access command is split into 2 subcommands. The first address of the first subcommand is the same as the first address of the first access command, and the M-th bit of the first subcommand, counted from the low-order end, is 0. The first address of the second subcommand is obtained by offsetting the first address of the first subcommand by 2^(M-1) units, so the M-th bit of the second subcommand becomes 1, and the second subcommand accesses a memory bank group different from the one accessed by the first subcommand.
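By way of illustration only, the splitting and offset rule described above can be sketched in Python. The helper names `split_command` and `bit_from_low`, the choice M = 7 (64-unit subcommands), and the sample first address 0x1000 are assumptions made for this sketch and are not taken from the application.

```python
def split_command(first_address, length, sub_length):
    """Split an access command into subcommands of sub_length units each.

    The first subcommand keeps the command's first address; each later
    one is offset by sub_length from the previous one.
    """
    if length % sub_length != 0:
        raise ValueError("length must be a multiple of sub_length")
    return [(first_address + offset, sub_length)
            for offset in range(0, length, sub_length)]


def bit_from_low(value, position):
    """Return the bit at a 1-indexed position counted from the LSB."""
    return (value >> (position - 1)) & 1


M = 7  # assumed: each subcommand covers 2**(M-1) = 64 units
subcommands = split_command(0x1000, 128, 2 ** (M - 1))
m_bits = [bit_from_low(address, M) for address, _ in subcommands]

for (address, size), m_bit in zip(subcommands, m_bits):
    print(f"first address {address:#x}, length {size}, M-th bit = {m_bit}")
```

In this sketch the M-th bit goes from 0 to 1 between the two subcommands, which is the change that selects a different memory bank group when that bit is used as the low-order bank group bit.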
Further, in the second interleaving scheme, when the amount of data (data length) accessed by each subcommand is 2^(M-1) units, the M-th bit and the N-th bit of the multiple bits of the subcommand, counted from the low-order end, may be used to jointly indicate the memory bank group accessed by each subcommand. Each memory bank group usually includes 4 memory banks, so the address lines corresponding to 2 of the multiple bits can likewise be used to carry the memory bank address information. For example, the memory banks include bank0, bank1, bank2, and bank3, and 2 address lines carry the address information of each bank, which is, for example, 00, 01, 10, or 11. Accordingly, in the second interleaving scheme of the present application, as shown in FIG. 12 and building on FIG. 11, the R-th bit and the S-th bit of the multiple bits included in each subcommand, counted from the low-order end, are used to indicate the memory bank accessed by each subcommand, where R is an integer greater than N and S is an integer greater than R. The R-th bit is the low-order bit of the address information indicating the memory bank, and the S-th bit is the high-order bit of that address information. That is, in the second interleaving scheme of the present application, the bits indicating the memory bank are higher than the bits indicating the memory bank group.
It should be understood that, in the above, the number of memory bank groups and the number of memory banks are both 4, so the address information indicating the memory bank group and the address information indicating the memory bank are both 2-bit information. In practice, the number of memory bank groups and the number of memory banks can be set to other values; for example, either of them may be 8 or 16. The address information indicating the memory bank group and the address information indicating the memory bank may therefore be represented by more or fewer bits. The above description of the embodiments and the accompanying drawings are intended only to aid understanding and do not limit the technical solutions of this embodiment.
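By way of illustration only, the relationship between the number of memory bank groups (or memory banks) and the number of address bits needed to select one of them can be sketched as follows. The function name `select_bits_needed` is hypothetical, and the sketch assumes the count is a power of two, as in the 4, 8, and 16 examples above.

```python
def select_bits_needed(count):
    """Number of address bits needed to select one of `count` items.

    Assumes `count` is a power of two, as in the 4, 8 and 16 examples.
    """
    if count < 1 or count & (count - 1):
        raise ValueError("count must be a power of two")
    return count.bit_length() - 1


for count in (4, 8, 16):
    print(count, "items ->", select_bits_needed(count), "address bits")
```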
It can be understood from the above description that the first interleaving scheme and the second interleaving scheme are, in essence, address mapping processes. Generally, when the address of an access command is mapped to the address of a memory bank group, the address of a memory bank, and the row and column addresses within the memory bank, 31 address lines are used, denoted a0, a1, a2, ..., a30. Each address line corresponds to one binary bit. A subcommand includes multiple bits, and the values of those bits determine whether each of the 31 address lines outputs a high level or a low level.
For example, take the memory bank group to be a BG (including BG0, BG1, BG2, and BG3), the memory bank to be a bank (each BG including bank0, bank1, bank2, and bank3), the row to be Row, and the column to be Col. The present application gives scheme 1 as an example: if the burst length (BL) indicated by the memory chip is 64 bytes, that is, 2^6 bytes, then M = 7 according to the above description. When the BG is indicated by two address lines and the bank is also indicated by two address lines, in the second interleaving scheme the 7th bit counted from the low-order end of the multiple bits (a6), together with one bit higher than the 7th bit, jointly indicates the BG accessed by each subcommand. Assuming that N = 13, R = 14, and S = 15 in the second interleaving scheme (see FIG. 13), the 7th bit (a6) and the 13th bit (a12) counted from the low-order end indicate the BG accessed by each subcommand, and the 14th bit (a13) and the 15th bit (a14) indicate the bank within the BG.
Based on the above example, one recommended mapping between the 31 address lines and the BG, bank, row, and column may be as shown in FIG. 13. It should be noted that, in FIG. 13, the a0 address line at the low-order end is not address-mapped, and address mapping starts from the a1 address line. According to the principle described above for FIG. 13, apart from the a0 address line, which does not take part in address mapping, in the second interleaving scheme the a6 and a12 address lines are the address lines that map the BG, the a13 and a14 address lines are the address lines that map the bank, and the remaining address lines map the row and column.
Based on the address line mapping illustrated in FIG. 13, assume that the first address carried by the first access command is hexadecimal 10000000 and the data length is 128 bytes. As shown in FIG. 14, assume that the channel accessed by the first access command is channel 0. Of the 2 subcommands obtained by splitting, subcommand 1 has a first address of hexadecimal 10000000, which is 10000000000000000000000000000 in binary, and a data length of 64 bytes; subcommand 2 has a first address of hexadecimal 10000040, which is 10000000000000000000001000000 in binary, and a data length of 64 bytes. From the first address of subcommand 1 and the mapping of FIG. 13, the BG0 bit is 0 and the BG1 bit is 0, so subcommand 1 accesses BG0. From the first address of subcommand 2 and the mapping of FIG. 13, the BG0 bit is 1 and the BG1 bit is 0, so subcommand 2 accesses BG1. Subcommand 1 and subcommand 2 thus access different BGs, the memory chip can process the data accessed by subcommand 1 and subcommand 2 simultaneously in different BGs of the second storage area, and the bandwidth usage efficiency is high.
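By way of illustration only, the worked example above can be checked with a short Python sketch. It assumes, as stated in the text, that the BG0 (low) bit is carried on a6 and the BG1 (high) bit on a12; the names `line_value` and `bank_group` are hypothetical helpers introduced for this sketch.

```python
BG_LOW_LINE = 6    # a6 carries the BG0 (low) bit in the example above
BG_HIGH_LINE = 12  # a12 carries the BG1 (high) bit in the example above


def line_value(address, line):
    """Value carried on address line a<line> (a0 is the least significant bit)."""
    return (address >> line) & 1


def bank_group(address):
    """Bank group index formed from the BG1 (high) and BG0 (low) bits."""
    return (line_value(address, BG_HIGH_LINE) << 1) | line_value(address, BG_LOW_LINE)


sub1, sub2 = 0x10000000, 0x10000040
print(format(sub1, "b"), "-> BG", bank_group(sub1))
print(format(sub2, "b"), "-> BG", bank_group(sub2))
```

Under these assumptions the two subcommands resolve to BG0 and BG1 respectively, which agrees with the values derived in the text.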
In some embodiments, the present application gives scheme 2 as a further example: if the data length accessed by the first access command is 128 bytes and the first access command is used to access the second storage area, the first access command can also be split into 4 subcommands, each accessing a data length of 32 bytes, that is, 2^5 bytes, so M = 6. In the second interleaving scheme, the 6th bit counted from the low-order end of the multiple bits (a5), together with one bit higher than the 6th bit, then jointly indicates the BG accessed by each subcommand. Assuming that N = 13, R = 14, and S = 15 in the second interleaving scheme (see FIG. 15), the 6th bit (a5) and the 13th bit (a12) counted from the low-order end indicate the BG accessed by each subcommand, and the 14th bit (a13) and the 15th bit (a14) indicate the bank within the BG.
Based on the above example, one recommended mapping between the 31 address lines and the BG, bank, row, and column may be as shown in FIG. 15. Again assume that the first address carried by the first access command is hexadecimal 10000000 and the data length is 128 bytes. As shown in FIG. 16, assume that the channel accessed by the first access command is channel 0. Of the 4 subcommands obtained by splitting, subcommand 1 has a first address of hexadecimal 10000000, which is 10000000000000000000000000000 in binary, and a data length of 32 bytes; subcommand 2 has a first address of hexadecimal 10000020, which is 10000000000000000000000100000 in binary, and a data length of 32 bytes; subcommand 3 has a first address of hexadecimal 10000040, which is 10000000000000000000001000000 in binary, and a data length of 32 bytes; subcommand 4 has a first address of hexadecimal 10000060, which is 10000000000000000000001100000 in binary, and a data length of 32 bytes. From the first address of subcommand 1 and the mapping of FIG. 15, the BG0 bit is 0 and the BG1 bit is 0, so subcommand 1 accesses BG0 (address information 00). From the first address of subcommand 2 and the mapping of FIG. 15, the BG0 bit is 1 and the BG1 bit is 0, so subcommand 2 accesses BG1 (address information 01). From the first address of subcommand 3 and the mapping of FIG. 15, the BG0 bit is 0 and the BG1 bit is 0, so subcommand 3 accesses BG0 (address information 00). From the first address of subcommand 4 and the mapping of FIG. 15, the BG0 bit is 1 and the BG1 bit is 0, so subcommand 4 accesses BG1 (address information 01). That is, subcommand 1 and subcommand 3 both access BG0, and subcommand 2 and subcommand 4 both access BG1, so the split subcommands are again spread across different BGs.
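By way of illustration only, the four-subcommand example above can be reproduced by grouping the subcommands by the BG they resolve to. The sketch assumes, as in the text, that a5 carries the low BG bit and a12 the high BG bit; the names `BG_LINES`, `bank_group`, and `by_group` are hypothetical.

```python
from collections import defaultdict

BG_LINES = (5, 12)  # (low, high): a5 and a12, as in the example above


def bank_group(address, lines=BG_LINES):
    """Bank group index built from the low and high BG address lines."""
    low, high = lines
    return (((address >> high) & 1) << 1) | ((address >> low) & 1)


first_address, total_length, sub_length = 0x10000000, 128, 32
by_group = defaultdict(list)
for index, offset in enumerate(range(0, total_length, sub_length), start=1):
    by_group[bank_group(first_address + offset)].append(index)

for group, indices in sorted(by_group.items()):
    print(f"BG{group}: subcommands {indices}")
```

The grouping alternates between BG0 and BG1 from one subcommand to the next, so adjacent subcommands never target the same BG in this example.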
In some embodiments, another example of scheme 2 may be as follows: if the data length accessed by the first access command is 64 bytes and the first access command is used to access the second storage area, the first access command can be split into 2 subcommands, each accessing a data length of 32 bytes, that is, 2^5 bytes, so M = 6. The address mapping in this case can also follow the mapping shown in FIG. 15.
In some embodiments, if the amount of data (data length) accessed by each subcommand is 2^(M-1) units and the memory bank group is indicated by 2 bits, then in the first interleaving scheme of the present application, in order to achieve low power consumption of the memory chip and high access energy efficiency when the first storage area is accessed, as shown in FIG. 17, the P-th bit and the Q-th bit of the multiple bits of the subcommand, counted from the low-order end, are used to jointly indicate the memory bank group accessed by each subcommand, where P is an integer greater than M and Q is an integer greater than P. It can be understood that the P-th bit is the low-order bit of the 2 bits indicating the memory bank group accessed by the subcommand, and the Q-th bit is the high-order bit of those 2 bits.
In the first interleaving scheme, P is greater than M. When the first address of a subcommand is offset by 2^(M-1) units to obtain the first address of the next subcommand, the M-th bit, counted from the low-order end, changes between the two subcommands. However, assuming that the access addresses of the access commands before splitting are contiguous, then over a long address range the P-th bit of one subcommand is the same as the P-th bit of the next subcommand; the P-th bit changes only after 2^(P-1) units of data have been accessed cumulatively. In other words, the bits indicating the memory bank group accessed by the different subcommands are the same throughout this address range, so when the access address of the first access command falls within this contiguous address range, the subcommands split from the first access command all access the same memory bank group. This is how the first interleaving scheme of the present application achieves low power consumption and high energy efficiency when accessing the first storage area.
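By way of illustration only, the length of the address window over which the P-th bit stays constant can be expressed as a count of consecutive subcommands. The function name and the sample values M = 7 and P = 12 are assumptions made for this sketch, and the count applies to a window aligned to 2^(P-1) units.

```python
def subcommands_per_bank_group_window(m, p):
    """How many consecutive subcommands share the same P-th bit.

    Each subcommand covers 2**(m-1) units and the P-th bit (1-indexed
    from the LSB) changes only every 2**(p-1) units, so the ratio is
    2**(p-m). Requires p > m, as in the first interleaving scheme.
    """
    if p <= m:
        raise ValueError("the first interleaving scheme requires P > M")
    return 2 ** (p - 1) // 2 ** (m - 1)


# Assumed values for illustration: M = 7 (64-unit subcommands), P = 12 (line a11).
count = subcommands_per_bank_group_window(7, 12)
print(count)
```

The larger P is relative to M, the longer a contiguous access stream stays in one memory bank group.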
Further, the first interleaving scheme indicates the memory bank on a principle similar to that of the second interleaving scheme. As shown in FIG. 18, in the first interleaving scheme of the present application, the J-th bit and the K-th bit of the multiple bits included in each subcommand, counted from the low-order end, are used to indicate the memory bank accessed by each subcommand, where J is an integer greater than Q and K is an integer greater than J. That is, in the first interleaving scheme of the present application, the bits indicating the memory bank are higher than the bits indicating the memory bank group.
In some embodiments, to further reduce the power consumption and raise the energy efficiency achievable with the first interleaving scheme, the memory bank indicated by the J-th bit and the K-th bit, counted from the low-order end of the multiple bits, may also be the same for the multiple subcommands. When multiple subcommands access the same memory bank in the same memory bank group, fewer memory banks are activated, so power consumption is lower and energy efficiency is higher.
For example, the present application gives scheme 3: take the memory bank group to be a BG (including BG0, BG1, BG2, and BG3), the memory bank to be a bank (each BG including bank0, bank1, bank2, and bank3), the row to be Row, and the column to be Col. If the burst length (BL) indicated by the memory chip is 64 bytes, that is, 2^6 bytes, then M = 7 according to the above description; the BG is indicated by two address lines, and the bank is also indicated by two address lines.
In the first interleaving scheme, the 8th bit or a higher bit counted from the low-order end of the multiple bits of the subcommand (a7 or a bit above a7), together with one bit higher than the 8th bit, jointly indicates the BG accessed by each subcommand. Assume that Q = 13, J = 14, and K = 15 in the first interleaving scheme. Taking FIG. 19 as an example, the 8th bit (a7) and the 13th bit (a12) counted from the low-order end of the multiple bits of the subcommand indicate the BG accessed by each subcommand, and the 14th bit (a13) and the 15th bit (a14) indicate the bank within the BG.
Based on the above example, one recommended mapping between the 31 address lines and the BG, bank, row, and column may be as shown in FIG. 19. In the first interleaving scheme shown in FIG. 19, the a11 and a12 address lines are the address lines that map the BG, and BG0 and BG1 in FIG. 19 are the two bits indicating the BG. The a11 address line corresponds to the BG0 bit and the a12 address line corresponds to the BG1 bit; a11 is the low-order address line indicating the BG and a12 is the high-order address line indicating the BG. That is, the BG0 bit is the low-order bit and the BG1 bit is the high-order bit of the accessed BG. Written with the BG1 bit first and the BG0 bit second, the two-bit value works as follows. When the value is 00, the subcommand accesses BG0, the a11 address line outputs a low level, and the a12 address line outputs a low level. When the value is 01, the subcommand accesses BG1, the a11 address line outputs a high level, and the a12 address line outputs a low level. When the value is 10, the subcommand accesses BG2, the a11 address line outputs a low level, and the a12 address line outputs a high level. When the value is 11, the subcommand accesses BG3, the a11 address line outputs a high level, and the a12 address line outputs a high level.
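By way of illustration only, the correspondence between the BG index and the levels on the a11 and a12 address lines can be written as a small lookup. The names `LEVELS` and `bank_group_line_levels` are hypothetical and introduced only for this sketch.

```python
LEVELS = ("low", "high")  # index 0 -> low level, index 1 -> high level


def bank_group_line_levels(bank_group):
    """Levels on a11 (BG0, low bit) and a12 (BG1, high bit) for a BG index."""
    if bank_group not in range(4):
        raise ValueError("expected a bank group index from 0 to 3")
    return {"a11": LEVELS[bank_group & 1], "a12": LEVELS[(bank_group >> 1) & 1]}


for index in range(4):
    print(f"BG{index}:", bank_group_line_levels(index))
```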
Similarly to the address mapping that indicates the BG in the first interleaving scheme, the a13 address line (abbreviated in FIG. 19) and the a14 address line in FIG. 19 are the address lines that map the bank, and the BA0 and BA1 bits in FIG. 19 are the two bits indicating the bank. When the BA1 and BA0 bits are 00, the subcommand accesses bank0; when they are 01, the subcommand accesses bank1; when they are 10, the subcommand accesses bank2; and when they are 11, the subcommand accesses bank3. The bits in FIG. 19 other than those mapping the BG and bank indicate the row and column. For example, the Row15:0 bits in FIG. 19 indicate the row, and the Col9:6, Col5, and Col4:0 bits indicate the column.
It can be understood that, of the 31 address lines, two (a11 and a12) indicate the BG accessed by the subcommand, two (a13 and a14) indicate the bank accessed by the subcommand, and the remaining address lines indicate the row and column accessed by the subcommand. These numbers do not limit this embodiment, however. The actual number of address lines used to indicate the BG and bank depends on the requirements of the application, for example on the number of subcommands into which a command is split.
For the first interleaving scheme, based on the address line mapping illustrated in FIG. 19, assume that the first address carried by the first access command is hexadecimal 10000000 and the data length is 128 bytes. As shown in FIG. 20, of the 2 subcommands obtained by splitting, subcommand 1 has a first address of hexadecimal 10000000, which is 10000000000000000000000000000 in binary, and a data length of 64 bytes; subcommand 2 has a first address of hexadecimal 10000040, which is 10000000000000000000001000000 in binary, and a data length of 64 bytes. From the first address of subcommand 1 and the mapping of FIG. 19, the BG0 bit is 0 and the BG1 bit is 0, so subcommand 1 accesses BG0. From the first address of subcommand 2 and the mapping of FIG. 19, the BG0 bit is 0 and the BG1 bit is 0, so subcommand 2 also accesses BG0. Subcommand 1 and subcommand 2 thus access the same BG.
As a further example, in both subcommand 1 and subcommand 2 the BA0 bit is 0 and the BA1 bit is 0, so subcommand 1 and subcommand 2 also access the same bank. The split subcommands therefore access the same BG, and even the same bank of the same BG. With this arrangement, fewer banks are accessed in the first storage area of the memory chip, so the power consumption of the memory chip is lower and its energy efficiency is higher.
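By way of illustration only, the first interleaving example above can be checked with a short sketch that decodes both the BG and the bank. It assumes, as in the text, that a11 and a12 carry the BG bits and that a13 and a14 carry the bank bits; the helper names `field` and `decode_first_scheme` are hypothetical.

```python
def field(address, low_line, high_line):
    """Two-bit field with its low bit on a<low_line> and high bit on a<high_line>."""
    return (((address >> high_line) & 1) << 1) | ((address >> low_line) & 1)


def decode_first_scheme(address):
    """Return (bank group, bank) using the a11/a12 and a13/a14 address lines."""
    return field(address, 11, 12), field(address, 13, 14)


targets = [decode_first_scheme(address) for address in (0x10000000, 0x10000040)]
print(targets)
```

Both subcommands decode to the same (BG, bank) pair under these assumptions, matching the result stated in the text.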
From the above description of the first interleaving scheme and the second interleaving scheme, it can be seen that, by using different address interleaving schemes (in other words, address mapping schemes), the present application allows the multiple subcommands split from the first access command to access either the same memory bank group or different memory bank groups. When the memory bank groups differ, the access bandwidth of the memory chip is higher and access is more efficient; when the memory bank groups are the same, the access power consumption of the memory chip is lower and its energy efficiency is higher.
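By way of illustration only, the contrast between the two schemes can be made concrete by counting how often consecutive subcommands change BG over one contiguous stream. The sketch assumes 64-byte subcommands, the a6/a12 BG lines of the second interleaving example, and the a11/a12 BG lines of the first interleaving example; the stream start address and the helper names are hypothetical.

```python
def bank_group(address, low_line, high_line):
    """Bank group index built from the given low and high BG address lines."""
    return (((address >> high_line) & 1) << 1) | ((address >> low_line) & 1)


def count_switches(addresses, low_line, high_line):
    """Count how often consecutive subcommands land in different bank groups."""
    groups = [bank_group(a, low_line, high_line) for a in addresses]
    return sum(1 for prev, cur in zip(groups, groups[1:]) if prev != cur)


# 32 contiguous 64-byte subcommands (2 KiB) starting at an assumed address.
stream = [0x10000000 + 64 * i for i in range(32)]
second_scheme = count_switches(stream, 6, 12)   # a6/a12 carry the BG bits
first_scheme = count_switches(stream, 11, 12)   # a11/a12 carry the BG bits
print(second_scheme, first_scheme)
```

In this sketch the second interleaving scheme changes BG at every subcommand boundary, while the first interleaving scheme keeps the whole stream in one BG.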
It can be understood that BG interleaving in the prior art accesses different BGs by using patchwork reordering logic to sort the access commands. In the present application, BG interleaving may instead be implemented by mapping the address of an access command to a BG according to the corresponding interleaving scheme. For the second storage area, which serves high-bandwidth service scenarios, the second interleaving scheme maps the addresses of access commands to different BGs, so that data is accessed from different BGs. In this case, BG interleaving can be understood as a process of accessing different BGs, and the bandwidth for accessing the memory chip is high. For the first storage area, which serves low-bandwidth service scenarios, the first interleaving scheme maps the addresses of access commands to the same BG, or even to the same bank in the same BG. In this case, BG interleaving can be understood as a process of accessing the same BG, and the power consumption of the memory chip is low. Interleaving in the present application can thus be understood as address mapping, for example the process of mapping the address of an access command to addresses of different BGs, to addresses of the same BG, or even to addresses of the same bank of the same BG.
Based on the memory access method described above, in which different interleaving schemes are used to access different storage areas, the memory access device 80 provided by the present application is further described below. FIG. 21 is a schematic diagram of another memory access device 80 provided by the present application.
It can be understood from the above that the solution of the present application involves determining the storage area accessed by the first access command, splitting the command, interleaving the addresses, and sending the subcommands. The memory access device 80 may therefore be a device newly added to a DDR controller within a chip. The DDR controller may be, for example, a device within a chip such as an SoC or a GPU. Unlike existing SoC or GPU chips, the present application improves the hardware circuit of the DDR controller in the SoC or GPU chip by adding the memory access device 80. As shown in FIG. 21, taking an SoC as an example, the SoC includes a CPU, a bus, a cache, a power management unit, a first-in first-out queue buffer, a DDR controller, and so on. The DDR controller includes a buffer circuit 83, a PHY 84, and the memory access device 80, that is, a controller 81 and an interleaver 82.
The buffer circuit 83 may be used to buffer access commands from the CPU in the SoC. Before data in the memory chip is accessed, the CPU sends multiple access commands to the DDR controller in advance, and these commands are buffered in the buffer circuit 83. The controller 81 in the memory access device 80 then requests access commands from the buffer circuit 83, and the buffer circuit 83 may send the access commands for accessing the memory chip to the controller 81 in the address order of the received access commands. Each access command carries the access address and the data length of the data to be accessed. An access command may be a read command or a write command.
The controller 81 may be used to perform, after reading an access command from the buffer circuit 83, the above processes of determining the storage area and splitting the command. The interleaver 82 is used to perform the above processes of interleaving the addresses of the subcommands and sending the subcommands. The PHY 84 is used to drive the sending of the subcommands, so that information such as the address and data length of each subcommand is sent to the memory chip to perform a read or write operation on the memory chip.
In some embodiments, as shown in FIG. 22, the controller 81 may include a partition determination circuit 811 and a command splitting circuit 812, and the interleaver 82 may include a first interleaving circuit 813, a second interleaving circuit 814, and a command sending circuit 815. The partition determination circuit 811 is used to determine, according to the access address of the first access command, whether the first access command accesses the first storage area or the second storage area in the memory chip, where the first storage area and the second storage area do not overlap, and to indicate to the command splitting circuit 812 the storage area to be accessed by the first access command.
In some embodiments, the partition determination circuit 811 in the controller 81 may store the address range corresponding to the first storage area and the address range corresponding to the second storage area. When the partition determination circuit 811 receives the first access command, it may determine, according to the address ranges of the two storage areas, whether the first access command is used to access the first storage area or the second storage area. If the first access command is determined to access the first storage area, the partition determination circuit 811 may send to the command splitting circuit 812 the first access command together with an indication that the first access command is used to access the first storage area. If the first access command is determined to access the second storage area, the partition determination circuit 811 may send to the command splitting circuit 812 the first access command together with an indication that the first access command is used to access the second storage area.
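The address-range check performed by the partition determination circuit can be sketched in software as follows. This is only an illustrative model: the `Region` type, the `classify` function, and the two address ranges are hypothetical and are not specified in this application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # inclusive
    end: int    # exclusive


# Hypothetical, non-overlapping address ranges for the two storage areas.
FIRST_AREA = Region("first", 0x0000_0000, 0x4000_0000)
SECOND_AREA = Region("second", 0x4000_0000, 0x8000_0000)


def classify(access_addr: int) -> str:
    """Return which storage area an access address falls in."""
    for region in (FIRST_AREA, SECOND_AREA):
        if region.start <= access_addr < region.end:
            return region.name
    raise ValueError(f"address {access_addr:#x} is outside both storage areas")


print(classify(0x0000_1000))
print(classify(0x4000_0000))
```

The returned name plays the role of the indication that is passed on, together with the access command, to the command splitting step.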
The command splitting circuit 812 is used to split the first access command into multiple subcommands according to the indication from the partition determination circuit 811 and the command splitting configuration, and to send the subcommands to the first interleaving circuit 813 or the second interleaving circuit 814.
In some embodiments, the command splitting circuit 812 may determine, according to the burst length (BL) supported by the memory chip, how many subcommands the first access command is split into. For example, suppose the BL supported by the memory chip is 32 Bytes and the first access command accesses 64 Bytes of data. If the first access command is used to access the second storage area, the command splitting circuit 812 may split the first access command into 2 subcommands, each accessing 32 Bytes of data. The second interleaving scheme can then be applied to the addresses of the 2 subcommands so that they access different memory bank groups, which improves the bandwidth utilization of the second storage area. If the first access command is used to access the first storage area, the command splitting circuit 812 may also leave the first access command unsplit, to reduce the power consumption of accessing the memory chip and improve energy efficiency.
It can be understood that, when splitting the first access command, the command splitting circuit 812 also needs to determine the access address of each subcommand from the access address of the first access command. Specifically, the first address carried by each subcommand may be determined from the first address carried by the first access command and the data length accessed by each subcommand. The first address carried by one of the subcommands is the same as the first address carried by the first access command. Then, if the indication of the storage area to be accessed indicates that the first access command is used to access the first storage area, the command splitting circuit 812 sends the split subcommands to the first interleaving circuit 813. If the indication indicates that the first access command is used to access the second storage area, the command splitting circuit 812 sends the split subcommands to the second interleaving circuit 814.
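The address derivation for subcommands can be sketched as below. The function name `split_command` and the requirement that the command length be a whole multiple of the subcommand length are assumptions made for this illustration.

```python
def split_command(first_addr: int, length: int, sub_len: int) -> list[tuple[int, int]]:
    """Split one access command into (first_address, length) subcommands.

    Each subcommand's first address is the command's first address plus a
    multiple of the subcommand length, so the first subcommand shares the
    first address of the original command.
    """
    if sub_len <= 0 or length % sub_len != 0:
        raise ValueError("length must be a positive multiple of sub_len")
    return [(first_addr + offset, sub_len) for offset in range(0, length, sub_len)]


# A 64-Byte access split for a 32-Byte burst length.
subcommands = split_command(0x2000, 64, 32)
print([(hex(addr), size) for addr, size in subcommands])
```

With a 64-Byte command and a 32-Byte subcommand length this yields two subcommands, matching the example given for the second storage area.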
The first interleaving circuit 813 is used to interleave, according to the first interleaving scheme, the addresses of the subcommands obtained by splitting the first access command, to obtain interleaved access addresses; these interleaved access addresses are used to access the first storage area of the memory chip. The second interleaving circuit 814 is used to interleave, according to the second interleaving scheme, the addresses of the subcommands obtained by splitting the first access command, to obtain interleaved access addresses; these interleaved access addresses are used to access the second storage area of the memory chip. The command sending circuit 815 may be used to send the interleaved access address and data length of each subcommand to the PHY 84, so that the PHY 84 drives the subcommand to the memory chip. Once the memory chip has obtained the interleaved access address and data length of a subcommand, it can determine the BG, bank, row, and column to be accessed from the interleaved access address, that is, the first address of the data to be accessed by the subcommand. It then reads data from or writes data to the storage location determined by the row and column according to the data length accessed by the subcommand, thereby completing the interaction with the memory access device 80.
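How a memory chip might recover the BG, bank, row, and column from an interleaved address can be sketched with a simple bit-field decoder. The field layout in `LAYOUT` is entirely hypothetical (it is not the mapping of any figure in this application or of any particular memory device); it only shows the mechanism of slicing one address into fields.

```python
# Hypothetical field layout: name -> (lowest bit position, width in bits).
LAYOUT = {
    "offset": (0, 5),   # byte offset within a 32-Byte burst (assumed)
    "bg": (5, 2),       # bank group
    "bank": (7, 2),     # bank within the bank group
    "column": (9, 6),
    "row": (15, 16),
}


def decode(addr: int) -> dict[str, int]:
    """Slice an interleaved address into its fields."""
    return {
        name: (addr >> low) & ((1 << width) - 1)
        for name, (low, width) in LAYOUT.items()
    }


def encode(**fields: int) -> int:
    """Inverse of decode, used here only to build a test address."""
    addr = 0
    for name, value in fields.items():
        low, width = LAYOUT[name]
        if value < 0 or value >> width:
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        addr |= value << low
    return addr


example = encode(row=3, column=10, bank=2, bg=1, offset=0)
print(hex(example), decode(example))
```

Changing which bit positions feed the `bg` and `bank` fields is what distinguishes one interleaving scheme from another in this model.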
In some embodiments, if the first access command is a read command, as shown in FIG. 22, the controller 81 further includes a data splicing circuit 816. When the partition determination circuit 811 has determined the storage area accessed by the first access command, it may also send to the data splicing circuit 816 the data length that the first access command accesses in the first storage area or the second storage area. When the data splicing circuit 816 receives read-back data, the read-back data may carry a tag indicating whether the data belongs to the first storage area or the second storage area. The data splicing circuit 816 can then splice data with the same tag up to the data length required by the first access command and return it to the CPU. The data splicing circuit 816 is needed because the read-back data corresponds to the split subcommands, so each piece is shorter than the data length that the first access command needs to read. It can be understood that the function of the data splicing circuit 816 need not be integrated in the DDR controller; it may instead be integrated in the CPU, for example as a software module executed by the CPU, which is not limited in this embodiment.
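Since the text notes that splicing could also exist as a software module, a minimal software sketch is given here. The chunk format `(tag, offset, payload)`, the ordering by offset, and the function name `splice` are assumptions for illustration; the application only states that data with the same tag is spliced to the required length.

```python
from collections import defaultdict


def splice(chunks: list[tuple[str, int, bytes]], expected_len: dict[str, int]) -> dict[str, bytes]:
    """Join read-back pieces that share a tag into one buffer per tag.

    chunks: (tag, offset within the original command, payload) per subcommand.
    expected_len: data length required by the original command, per tag.
    """
    by_tag: dict[str, list[tuple[int, bytes]]] = defaultdict(list)
    for tag, offset, payload in chunks:
        by_tag[tag].append((offset, payload))

    result = {}
    for tag, parts in by_tag.items():
        parts.sort(key=lambda part: part[0])  # restore subcommand order
        data = b"".join(payload for _, payload in parts)
        if len(data) != expected_len[tag]:
            raise ValueError(f"tag {tag!r}: got {len(data)} bytes, expected {expected_len[tag]}")
        result[tag] = data
    return result


# Two 32-Byte read-backs arriving out of order for a 64-Byte read.
pieces = [("second", 32, b"B" * 32), ("second", 0, b"A" * 32)]
spliced = splice(pieces, {"second": 64})
print(len(spliced["second"]))
```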
If the implementation principle, address mapping, bandwidth gain when accessing the second storage area, and energy-efficiency gain when accessing the first storage area of the memory access device 80 provided by this application are compared with the implementation principle, address mapping, bandwidth gain, and energy-efficiency gain of the existing patchwork rearrangement control logic (the existing scheme), then, under certain access conditions, the results obtained by the different schemes may be as shown in Table 2.
Table 2
[Table 2 images: Figure PCTCN2021074562-appb-000002, Figure PCTCN2021074562-appb-000003]
As can be seen from Table 2, the existing patchwork rearrangement scheme works by piecing together and reordering access commands to increase the probability that BG interleaving is satisfied. For the intra-page continuous access mode (accesses to the same page with consecutive access addresses), the bus bandwidth utilization of the memory chip in the prior art can reach 100%. For the intra-page random access mode (accesses to the same page with non-consecutive access addresses), the bus bandwidth utilization in the prior art is 1 - (patchwork failure rate × 50%), which is relatively low. For the cross-row (Row) access mode, the bus bandwidth utilization of the memory chip in the prior art is roughly between 75% and 100%. When all the required access bandwidths are small, BG interleaving in the prior art leads to high power consumption and low access energy efficiency.
For scheme one provided by this application, in which an access command accessing 128 Bytes is split into 2 subcommands each accessing 64 Bytes and the second interleaving scheme is used, the subcommands satisfy BG interleaving when accessing BGs. Under the address mapping of the second interleaving scheme shown in FIG. 13 of this application, when access commands access the second storage area, the bus bandwidth utilization of the memory chip is 100% for the intra-page continuous access mode. For the intra-page random access mode, the bus bandwidth utilization may be 1 - (share of 64-Byte commands × patchwork failure rate × 50%), which is higher than that of the existing scheme. For the cross-row access mode (when the row-to-row delay, Trrd, is 5 ns), the bus bandwidth utilization of the memory chip is about 75%.
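The two intra-page random-access expressions (existing scheme and scheme one) can be compared with a small worked calculation. The input values below, a 40% patchwork failure rate and a 50% share of 64-Byte commands, are made-up numbers chosen only to show how the formulas behave; they are not results from Table 2.

```python
def efficiency_existing(failure_rate: float) -> float:
    """Existing scheme: 1 - (patchwork failure rate x 50%)."""
    return 1.0 - failure_rate * 0.5


def efficiency_scheme_one(share_64b: float, failure_rate: float) -> float:
    """Scheme one: 1 - (share of 64-Byte commands x patchwork failure rate x 50%)."""
    return 1.0 - share_64b * failure_rate * 0.5


# Hypothetical inputs, for illustration only.
failure_rate = 0.4
share_64b = 0.5

existing = efficiency_existing(failure_rate)
scheme_one = efficiency_scheme_one(share_64b, failure_rate)
print(f"existing scheme: {existing:.0%}, scheme one: {scheme_one:.0%}")
```

Because the share of 64-Byte commands is at most 1, the scheme-one expression is never lower than the existing-scheme expression for the same failure rate, which is why the text describes it as an improvement.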
When an access command accessing 128 Bytes is split into 2 subcommands each accessing 64 Bytes and the first interleaving scheme is used, under the address mapping of the first interleaving scheme shown in FIG. 19 of this application, scheme three of this application allows accesses to be concentrated on a single BG, that is, on the same BG, in scenarios where the bandwidth for accessing the first storage area is small. This reduces the power consumption of the memory chip and improves access energy efficiency.
For scheme two provided by this application, in which an access command accessing 128 Bytes is split into 4 subcommands each accessing 32 Bytes, or an access command accessing 64 Bytes is split into 2 subcommands each accessing 32 Bytes, and the second interleaving scheme is used, the subcommands satisfy BG interleaving when accessing BGs. Under the address mapping of the second interleaving scheme shown in FIG. 15 of this application, when access commands access the second storage area, the bus bandwidth utilization of the memory chip is 100% for the intra-page continuous access mode. For the intra-page random access mode, the bus bandwidth utilization may also be 100%, a clear improvement over the existing scheme. For the cross-row access mode (when Trrd is 5 ns), the bus bandwidth utilization of the memory chip is about 75%. In addition, scheme two of this application allows accesses to be concentrated on a single BG, that is, on the same BG, in scenarios where the bandwidth for accessing the first storage area is small, which reduces the power consumption of the memory chip and improves access energy efficiency.
For the specific implementation of each module in the memory access device of this application, refer to the description of the memory access method in this application; details are not repeated here. An embodiment of this application further provides a communication chip that includes the memory access device described in the embodiments of this application. The communication chip may be, for example, a chip such as an SoC or a GPU. An embodiment of this application further provides an electronic device. As shown in FIG. 23, the electronic device includes the communication chip described in the embodiments of this application, and the communication chip includes the memory access device 80 provided by this application. An embodiment of this application further provides a computer-readable storage medium including computer instructions that, when run on an electronic device, cause the electronic device to perform the memory access method described above. An embodiment of this application further provides a computer program product that, when run on a computer, causes an electronic device to perform the memory access method described above.
From the description of the above embodiments, those skilled in the art will understand that, for convenience and brevity of description, only the above division into functional modules is used as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another device, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or at least partly in the form of a software functional unit. If a unit is implemented as a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is only a specific implementation of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (22)

  1. A memory access device, characterized in that the memory access device comprises:
    a controller, configured to determine, according to an access address of a first access command, whether the first access command accesses a first storage area or a second storage area in a memory chip, wherein the first storage area and the second storage area do not overlap;
    an interleaver, configured to: when the controller determines that the first access command accesses the first storage area, interleave the address of the first access command according to a first interleaving scheme to obtain an interleaved access address; and when the controller determines that the first access command accesses the second storage area, interleave the address of the first access command according to a second interleaving scheme to obtain the interleaved access address, wherein the interleaved access address is used to access the memory chip.
  2. The memory access device according to claim 1, characterized in that a second bandwidth required for accessing second data in the second storage area is higher than a first bandwidth required for accessing first data in the first storage area.
  3. The memory access device according to claim 1 or 2, characterized in that the controller is further configured to split the first access command into multiple subcommands, and the interleaved access address includes the access addresses of the multiple subcommands;
    in the second interleaving scheme, the access addresses of the multiple subcommands are respectively used to access different memory bank groups in the second storage area, and each of the different memory bank groups includes multiple memory banks;
    in the first interleaving scheme, the access addresses of the multiple subcommands are used to access different memory banks in a same memory bank group in the first storage area, or a same memory bank in the same memory bank group.
  4. The memory access device according to claim 3, characterized in that each of the multiple subcommands includes multiple bits used to indicate the access address of the subcommand, each of the multiple bits corresponds to one address line of the access address, and the amount of data accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1.
  5. The memory access device according to claim 4, characterized in that, in the second interleaving scheme, the M-th bit and the N-th bit of the multiple bits, counted from the least significant bit, are used to jointly indicate the memory bank group accessed by each subcommand, where N is an integer greater than M.
  6. The memory access device according to claim 5, characterized in that the R-th bit and the S-th bit of the multiple bits, counted from the least significant bit, are used to indicate the memory bank accessed by each subcommand, where R is an integer greater than N and S is an integer greater than R.
  7. The memory access device according to any one of claims 4-6, characterized in that, in the first interleaving scheme, the P-th bit and the Q-th bit of the multiple bits, counted from the least significant bit, are used to jointly indicate the memory bank group accessed by each subcommand, where P is an integer greater than M and Q is an integer greater than P.
  8. The memory access device according to claim 7, characterized in that the J-th bit and the K-th bit of the multiple bits, counted from the least significant bit, are used to indicate the memory bank accessed by each subcommand, where J is an integer greater than Q and K is an integer greater than J.
  9. The memory access device according to any one of claims 3-8, characterized in that, when the first access command accesses the first storage area, the first access command is split into X subcommands; and
    when the first access command accesses the second storage area, the first access command is split into Y subcommands;
    Y is greater than X, and X and Y are integers greater than 1.
  10. A memory access method, characterized in that the method comprises:
    determining, according to an access address of a first access command, whether the first access command accesses a first storage area or a second storage area in a memory chip, wherein the first storage area and the second storage area do not overlap;
    when it is determined that the first access command accesses the first storage area, interleaving the address of the first access command according to a first interleaving scheme to obtain an interleaved access address; and
    when it is determined that the first access command accesses the second storage area, interleaving the address of the first access command according to a second interleaving scheme to obtain the interleaved access address, wherein the interleaved access address is used to access the memory chip.
  11. The method according to claim 10, characterized in that a second bandwidth required for accessing second data in the second storage area is higher than a first bandwidth required for accessing first data in the first storage area.
  12. The method according to claim 10 or 11, characterized in that the method further comprises:
    splitting the first access command into multiple subcommands, wherein the interleaved access address includes the access addresses of the multiple subcommands;
    in the second interleaving scheme, the access addresses of the multiple subcommands are respectively used to access different memory bank groups in the second storage area, and each of the different memory bank groups includes multiple memory banks;
    in the first interleaving scheme, the access addresses of the multiple subcommands are used to access different memory banks in a same memory bank group in the first storage area, or a same memory bank in the same memory bank group.
  13. The method according to claim 12, characterized in that each of the multiple subcommands includes multiple bits used to indicate the access address of the subcommand, each of the multiple bits corresponds to one address line of the access address, and the amount of data accessed by each subcommand is 2^(M-1) units, where M is an integer greater than 1.
  14. The method according to claim 13, characterized in that, in the second interleaving scheme, the M-th bit and the N-th bit of the multiple bits, counted from the least significant bit, are used to jointly indicate the memory bank group accessed by each subcommand, where N is an integer greater than M.
  15. The method according to claim 14, characterized in that the R-th bit and the S-th bit of the multiple bits, counted from the least significant bit, are used to indicate the memory bank accessed by each subcommand, where R is an integer greater than N and S is an integer greater than R.
  16. The method according to claim 13, characterized in that, in the first interleaving scheme, the P-th bit and the Q-th bit of the multiple bits, counted from the least significant bit, are used to jointly indicate the memory bank group accessed by each subcommand, where P is an integer greater than M and Q is an integer greater than P.
  17. The method according to claim 16, characterized in that the J-th bit and the K-th bit of the multiple bits, counted from the least significant bit, are used to indicate the memory bank accessed by each subcommand, where J is an integer greater than Q and K is an integer greater than J.
  18. The method according to any one of claims 12-17, characterized in that, when the first access command accesses the first storage area, the first access command is split into X subcommands; and
    when the first access command accesses the second storage area, the first access command is split into Y subcommands;
    Y is greater than X, and X and Y are integers greater than 1.
  19. A communication chip, wherein the communication chip comprises the memory access apparatus according to any one of claims 1-9.
  20. An electronic device, wherein the electronic device comprises the memory access apparatus according to any one of claims 1-9.
  21. A computer-readable storage medium, comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method according to any one of claims 10-18.
  22. A computer program product, wherein, when the computer program product is run on a computer, the electronic device is caused to perform the method according to any one of claims 10-18.
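
The bit assignments in claims 14-17 and the splitting rule in claim 18 can be illustrated with a minimal Python sketch. It is not the claimed implementation. All concrete values are assumptions chosen for illustration: the bit positions (0-based here, with M=5, N=6, R=7, S=8 and P=7, Q=8, J=9, K=10), the subcommand sizes of 64 and 32 bytes, and the names `InterleaveScheme`, `decode`, and `split_command`. The sketch also assumes that the first interleaving scheme applies to the first storage area and the second scheme to the second storage area.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InterleaveScheme:
    """Hypothetical description of one interleaving scheme."""
    bg_bits: Tuple[int, int]    # address bits that jointly select the bank group
    bank_bits: Tuple[int, int]  # address bits that select the bank
    sub_size: int               # bytes covered by one subcommand (assumed)


def _bit(addr: int, pos: int) -> int:
    """Return the address bit at 0-based position pos."""
    return (addr >> pos) & 1


def decode(addr: int, scheme: InterleaveScheme) -> Tuple[int, int]:
    """Map an address to (bank group, bank) under the given scheme."""
    lo, hi = scheme.bg_bits
    bank_group = _bit(addr, lo) | (_bit(addr, hi) << 1)
    lo, hi = scheme.bank_bits
    bank = _bit(addr, lo) | (_bit(addr, hi) << 1)
    return bank_group, bank


def split_command(addr: int, length: int,
                  scheme: InterleaveScheme) -> List[Tuple[int, int]]:
    """Split one access command into subcommands and decode each one."""
    return [decode(a, scheme)
            for a in range(addr, addr + length, scheme.sub_size)]


# Assumed example values: the first scheme uses higher bits (P < Q < J < K),
# the second scheme uses lower bits (M < N < R < S) and a finer granularity.
FIRST_SCHEME = InterleaveScheme(bg_bits=(7, 8), bank_bits=(9, 10), sub_size=64)
SECOND_SCHEME = InterleaveScheme(bg_bits=(5, 6), bank_bits=(7, 8), sub_size=32)

if __name__ == "__main__":
    print(split_command(0, 128, FIRST_SCHEME))
    print(split_command(0, 128, SECOND_SCHEME))
```

With these assumed values, a 128-byte command starting at address 0 is split into X = 2 subcommands under the first scheme, both falling in bank group 0, and into Y = 4 subcommands under the second scheme, which fall in bank groups 0, 1, 2, and 3 in turn. This matches the relation Y > X in claim 18 and shows how placing the bank-group bits at lower positions spreads consecutive subcommands across different bank groups.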

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180077267.9A CN116472520A (en) 2021-01-30 2021-01-30 Method and device for accessing memory
PCT/CN2021/074562 WO2022160321A1 (en) 2021-01-30 2021-01-30 Method and apparatus for accessing memory

Publications (1)

Publication Number Publication Date
WO2022160321A1 (en) 2022-08-04

Family

ID=82652894

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074562 WO2022160321A1 (en) 2021-01-30 2021-01-30 Method and apparatus for accessing memory

Country Status (2)

Country Link
CN (1) CN116472520A (en)
WO (1) WO2022160321A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120137090A1 (en) * 2010-11-29 2012-05-31 Sukalpa Biswas Programmable Interleave Select in Memory Controller
CN105612501A (en) * 2013-10-03 2016-05-25 高通股份有限公司 System and method for uniform interleaving of data across a multiple-channel memory architecture with asymmetric storage capacity
US9696941B1 (en) * 2016-07-11 2017-07-04 SK Hynix Inc. Memory system including memory buffer
CN107180001A (en) * 2016-03-10 2017-09-19 华为技术有限公司 Access dynamic RAM DRAM method and bus

Also Published As

Publication number Publication date
CN116472520A (en) 2023-07-21

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21921922; Country of ref document: EP; Kind code of ref document: A1
WWE WIPO information: entry into national phase. Ref document number: 202180077267.9; Country of ref document: CN
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 21921922; Country of ref document: EP; Kind code of ref document: A1