WO2020124609A1 - Processing chip, method and related device - Google Patents

Processing chip, method and related device

Info

Publication number
WO2020124609A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
memory
length
block
read
Prior art date
Application number
PCT/CN2018/122946
Other languages
English (en)
French (fr)
Inventor
包雅林
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2018/122946 priority Critical patent/WO2020124609A1/zh
Priority to CN201880100446.8A priority patent/CN113227984B/zh
Publication of WO2020124609A1 publication Critical patent/WO2020124609A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the invention relates to the field of chip technology, in particular to a processing chip, method and related equipment.
  • the queues of different users can be distinguished based on the Media Access Control (MAC) address, the Internet Protocol (IP) address, or the Transmission Control Protocol (TCP) connection relationship, etc.
  • the queue depth of any user may be increased or decreased through multiple access sources (such as N channels, data interfaces, pipelines, or planes) in every clock cycle. For this type of queue, determining the actual depth of multiple user queues within one clock cycle requires implementing, inside the chip, queue depth calculations for N access sources.
  • the technical problem to be solved by the embodiments of the present invention is to provide a processing chip, method and related equipment to improve the calculation efficiency of the data length of multiple access sources.
  • with the processing chip provided by the embodiment of the present invention, M copies of the data length of the target data corresponding to each Block are repeatedly stored in the M 1R1W memories of each of the N Blocks in the first memory; when the data length of the target data corresponding to any one or more of the N Blocks changes, the initial length stored in one of the 1R1W memories of the corresponding Block is read, and the M data lengths stored in the M 1R1W memories of that Block are updated.
  • optionally, the target data may include multiple types of data (for example, data of multiple users).
  • when a certain type of data is increased or decreased through one or more of the N access sources (such as N channels, data interfaces, pipelines, or planes), the Block in which the data length of this type of data is stored allows at most M read operations and M write operations within one clock cycle; one of the M read operations can be used to read the initial length of this type of data (to calculate the updated data length), and the M write operations can be used to write the M updated copies of the data length of this type of data, so that the total length of M types of data can be calculated in the same clock cycle.
  • therefore, the processing chip in the embodiment of the present invention can calculate the total length of at most M types of data in one clock cycle; while ensuring that the data length of the target data is updated in real time, it implements a method for calculating the length of M types of data from N access sources inside the chip, which improves the efficiency and accuracy of the data length calculation for multi-type data from multiple access sources.
  • in a possible implementation, the chip further includes: a second memory connected to the controller, and N data interfaces connected to the second memory, where the N data interfaces are in one-to-one correspondence with the N memory blocks; each of the N data interfaces is used to write data to the second memory or read data from the second memory; the second memory is used to store the data written through the N data interfaces; the target data S i corresponding to the i-th Block is specifically the data stored in the second memory through the data interface corresponding to the i-th Block.
  • the processing chip provided by the embodiment of the present invention further includes a second memory and N data interfaces connected to it, and the N data interfaces correspond one-to-one to the N Blocks; therefore, the target data corresponding to a Block is the data written or read through the data interface corresponding to that Block.
  • the second memory is used to store various types of data written through N data interfaces, and the N data interfaces can be regarded as N access sources of the processing chip. When data is written or read through a data interface, the data length stored in the block corresponding to the data interface is read and updated to ensure the accuracy of the data length of the data.
  • each 1R1W memory includes K storage units with a bit width of W;
  • S i includes K types of data, and the length of the data of the k-th type of data s k stored in the second memory through the data interface corresponding to the i-th Block is denoted L ik, k = 1, 2, 3, ... K; the data length of S i comprises K data lengths: L i1, L i2, L i3 ... L iK.
  • each of the M 1R1W memories in the i-th Block stores the K data lengths, and the K data lengths are stored in one-to-one correspondence with the K storage units in one 1R1W memory; the controller is specifically configured to, when data s g is written or read through the data interface corresponding to the j-th Block, read the L jg stored in the corresponding storage unit of one of the 1R1W memories of the j-th Block, and update, according to the change in the length of s g, the M copies of L jg stored in the corresponding storage units of the M 1R1W memories of the j-th Block, where M is an integer greater than or equal to N;
  • L jg is the length of the data of the g-th type of data s g stored in the second memory through the data interface corresponding to the j-th Block, and the g-th type of data is the g-th of the K types of data, 1 ≤ g ≤ K, and g is an integer.
  • the depth of the M 1R1W memories in each of the N blocks in the first memory is K, and the bit width is W.
  • the target data includes K-type data
  • for each type of data, the length of that data stored in the second memory through a given data interface is stored in exactly one of the storage units of each 1R1W memory in the Block corresponding to that data interface, and the M copies of this data length are stored in the M 1R1W memories of that Block, respectively. Therefore, when a data interface has data transmission (writing or reading), the controller reads the initial length stored in the fixed storage unit corresponding to this type of data in one of the 1R1W memories of the corresponding Block, so as to calculate and update the M copies of the data length stored in the corresponding storage units of the M 1R1W memories.
  • in summary, the processing chip in the embodiment of the present invention can calculate the total data length of at most M types of data in the same clock cycle; since M is an integer greater than or equal to N, when all N data interfaces have data transmission (writing or reading) of N different types of data, respectively, the processing chip can simultaneously support calculating the total data length of the N types of data from the N access sources, which improves the calculation efficiency of the data length of multiple access sources.
  • in a possible implementation, the processing chip further includes a calculation unit connected to the controller and the first memory: the controller is also used to, within the same clock cycle, read the data length of s g from one of the 1R1W memories of each of the N Blocks (i.e., L 1g, L 2g, L 3g ... L Ng) and send them to the calculation unit; the calculation unit is used to calculate, from the read data lengths, the total data length S of s g in the second memory, where S = L 1g + L 2g + ... + L Ng, 1 ≤ g ≤ K, and g is an integer.
  • the processing chip provided by the embodiment of the present invention further includes a calculation unit connected to the controller and the first memory; the calculation unit receives the data lengths, in each of the N Blocks, of one or several types of data read by the controller within the same clock cycle, and calculates the total data length of that one type or those several types of data according to the received data lengths.
  • in a possible implementation, the controller is further configured to control the writing or reading of s g according to the total data length S of s g in the second memory.
  • with the processing chip provided by the embodiment of the present invention, the controller also controls the reading and writing of any one or several types of data according to the total data length of that data calculated by the calculation unit, so as to realize data scheduling and control based on data length in different scenarios.
  • in a possible implementation, the processing chip further includes a calculation unit connected to the controller and the first memory: the controller is also used to, within the same clock cycle, read from each of the N Blocks the stored data lengths of T types of data and send them to the calculation unit;
  • the T types of data are the data written or read, within the same clock cycle, through T of the N data interfaces; reading the data lengths of the T types of data from any one of the N Blocks includes reading the data lengths of the T types of data from T of the 1R1W memories of that Block, with the data length of one type of data read from one 1R1W memory;
  • the T types of data are T of the K types of data, where M is an integer greater than or equal to N, and 2 ≤ T ≤ M; the calculation unit is used to calculate the total data length of each of the T types of data in the second memory.
  • with the processing chip provided by the embodiment of the present invention, when T (2 ≤ T ≤ M) types of data are transmitted through the N data interfaces in the same clock cycle, the controller can read, in that same clock cycle, the data lengths of the T types of data in each of the N Blocks, so that T read operations and T write operations are generated in each of the N Blocks. Since each Block includes M 1R1W memories, and M is an integer greater than or equal to N, the total data length of the T types of data can be calculated within the same clock cycle.
  • it can be understood that the data lengths of the T types of data in the N Blocks that the controller sends to the calculation unit may be read in the same clock cycle as the M write operations (which update the data length in each 1R1W memory of the corresponding Block), or may be read and sent in a clock cycle after the M write operations.
  • in the former case, the current data length is read before the latest data length is written, and the latest total data length is calculated in combination with the current length change known to the controller; that is, sending the data length to the calculation unit and updating the M 1R1W memories occur in the same clock cycle. In the latter case, after the latest data length has been updated, the data length is sent to the calculation unit to calculate the total data length; that is, updating the M 1R1W memories and sending the data length to the calculation unit occur in different clock cycles.
  • in summary, in the embodiment of the present invention, the total data length of at most M types of data can be calculated in the same clock cycle; because M is an integer greater than or equal to N, when all N data interfaces have data transmission (writing or reading) of N different types of data, respectively, the processing chip in the embodiment of the present invention can simultaneously support calculating the total data length of the N types of data; for example, the trigger condition for calculating the total data length is that any one or several of the N data interfaces have data transmission.
  • optionally, the processing chip in the embodiment of the present invention may also, depending on the application scenario, support calculating the total data length of M types of data; for example, the trigger condition for calculating the total data length is not that a data interface has data transmission, but that the total data length of the M types of data is calculated periodically.
  • the present application provides a processing method, which is applied to a processing device.
  • the i-th Block stores M copies of the data length of S i, and the M copies of the data length of S i are stored in the M 1R1W memories of the i-th Block, with one 1R1W memory storing one copy of the data length of S i;
  • when the data length of the target data S j corresponding to the j-th Block changes, the data length of S j stored in one of the 1R1W memories of the j-th Block is read, and according to the change of the data length of S j, the M copies of the data length of S j stored in the M 1R1W memories of the j-th Block are updated, where 1 ≤ j ≤ N, and j is an integer.
  • in a possible implementation, the processing device further includes: a second memory connected to the controller, and N data interfaces connected to the second memory, where the N data interfaces are in one-to-one correspondence with the N memory blocks; the method further includes: writing data to the second memory or reading data from the second memory through each of the N data interfaces, and storing the data written through the N data interfaces into the second memory; the target data S i corresponding to the i-th Block is specifically the data stored in the second memory through the data interface corresponding to the i-th Block.
  • each 1R1W memory includes K storage units with a bit width of W;
  • the data length of S i comprises K data lengths: L i1, L i2, L i3 ... L iK;
  • each of the M 1R1W memories in the i-th Block stores the K data lengths, and the K data lengths are stored in one-to-one correspondence with the K storage units in one 1R1W memory; when data s g is written or read through the data interface corresponding to the j-th Block, the L jg stored in the corresponding storage unit of one of the 1R1W memories of the j-th Block is read, and according to the change in the length of s g, the M copies of L jg stored in the corresponding storage units of the M 1R1W memories of the j-th Block are updated; where M is an integer greater than or equal to N, L jg is the length of the data of the g-th type of data s g stored in the second memory through the data interface corresponding to the j-th Block, and the g-th type of data is the g-th of the K types of data, 1 ≤ g ≤ K, and g is an integer.
  • the method further comprises: controlling the writing or reading of s g according to the total data length S of s g in the second memory.
  • the method further includes: within the same clock cycle, reading from each of the N Blocks the stored data lengths of T types of data and sending them to the calculation unit; the T types of data are the data written or read, within the same clock cycle, through T of the N data interfaces; reading the data lengths of the T types of data from any one of the N Blocks includes reading the data lengths of the T types of data from T of the 1R1W memories of that Block, with the data length of one type of data read from one 1R1W memory; the T types of data are T of the K types of data, where M is an integer greater than or equal to N, and 2 ≤ T ≤ N; and calculating the total data length of each of the T types of data in the second memory.
  • the present application provides a system-on-a-chip, the system-on-a-chip includes a processing chip provided in any one of the implementation manners of the first aspect.
  • the system-on-chip may be composed of a processing chip, or may include a processing chip and other discrete devices.
  • the present application provides an electronic device, including a processing chip provided in any one of the implementation manners of the first aspect above, and a discrete device coupled to the chip.
  • FIG. 1 is a schematic structural diagram of a processing chip provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of another processing chip provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a block provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a storage form of K-type data provided in an embodiment of the present invention in a first memory
  • FIG. 5 is a schematic structural diagram of yet another processing chip provided by an embodiment of the present invention.
  • FIG. 6 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
  • the register is a part of the central processor, which is related to the CPU. Registers are high-speed storage components with limited storage capacity. They can be used to temporarily store instructions, data, and addresses. In the control unit of the central processor, the registers included are the instruction register (IR) and the program counter (PC). In the arithmetic and logic components of the central processor, the registers included are accumulators (ACC).
  • Memory in the broad sense covers almost all categories of storage: registers and internal memory are both types of memory, and any hardware with storage capability can be called a memory; hard disks can be classified as external storage.
  • Cache is a buffer for data exchange. When a piece of hardware wants to read data, it first looks for the required data in the cache; if the data is found it is used directly, and if not, the hardware looks in memory. Because the cache runs much faster than memory, its role is to help the hardware run faster. Because the cache usually uses RAM (non-permanent storage whose contents are lost when power is turned off), files are still written to the hard disk or other storage for permanent storage after use.
  • Internal memory is also a type of memory and likewise covers a wide scope. It is generally divided into read-only memory, random access memory, and cache memory (CACHE). Read-only memory is widely used; it is usually a readable chip integrated on the hardware, its function is to identify and control the hardware, and its characteristic is that it can only be read but not written. The characteristic of random access memory is that it can be both read and written, and all its data disappears after the power is turned off; this is the so-called internal memory.
  • CACHE is a kind of memory inside the CPU that is very fast but has a small capacity.
  • Queue is a first-in first-out (FIFO) linear table data structure.
  • Common operations such as inserting at the tail of the table and deleting data at the head.
  • the types of queues include linked list structure, fixed buffer structure, etc.
  • Commonly used queue space is dynamically allocated from the heap; in tasks with frequent data operations this brings problems such as degraded real-time performance of the system and memory fragmentation.
  • Queue length calculation formula: nCount = (rear - front + nSize) % nSize. Among them, the tail of the queue (rear) is the end designated for inserting data; the head of the queue (front) is the end designated for deleting data; enqueue is the action of inserting data; dequeue is the action of deleting data.
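  • To make the formula above concrete, the following minimal C sketch computes the length of a circular queue from its front and rear indices; the names nSize, front and rear are taken from the formula, and the example values are illustrative only.

```c
#include <stdio.h>

/* Minimal sketch of the circular-queue length formula quoted above:
 * nCount = (rear - front + nSize) % nSize. */
typedef struct {
    int front;  /* index of the head: where data is removed (dequeue) */
    int rear;   /* index of the tail: where data is inserted (enqueue) */
    int nSize;  /* capacity of the ring buffer */
} RingQueue;

int queue_length(const RingQueue *q)
{
    /* Adding nSize before the modulo keeps the result non-negative
     * even when rear has wrapped around past front. */
    return (q->rear - q->front + q->nSize) % q->nSize;
}

int main(void)
{
    RingQueue q = { .front = 6, .rear = 2, .nSize = 8 };
    printf("queue length = %d\n", queue_length(&q)); /* prints 4 */
    return 0;
}
```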
  • Stacks and queues both store data in a specific range of storage units, and these data can be retrieved and used again. The difference is that a stack is like a very narrow bucket: the data stored last must be taken out first ("last in, first out"), whereas a queue is different, following "first in, first out".
  • A queue is a bit like an everyday line of people queuing to buy things: those who line up first buy first, and those who line up later buy later, that is, "first in, first out".
  • Sometimes in the data structure there may be data queues that are queued according to size or according to certain conditions. At this time, the queue is a special queue, and it is not necessary to read data according to the principle of "first in first out”.
  • Random access memory (RAM) is memory whose storage-unit contents can be fetched or stored as needed, with an access speed independent of the storage unit's location. This kind of memory loses its contents when the power is turned off, so it is mainly used to store programs in short-term use.
  • the random access memory is divided into static random access memory (Static RAM, SRAM) and dynamic random access memory (Dynamic RAM, DRAM).
  • Modulus is the operation of finding the remainder. For example, the remainder of 10 divided by 4 is 2, and the result of modulo is 2.
  • At present, the chip hardware implementation methods for realizing the queue depth calculation of N access sources mainly include the following:
  • Method one: directly use the multi-port read-write cache provided by the chip foundry, that is, a cache that N access sources can read and write simultaneously within one clock cycle.
  • Method two: increase the clock frequency, and spread the multiple cache reads and writes originally completed in one clock cycle across multiple clock cycles.
  • Method three: use registers in the chip to implement the queue depth calculation.
  • However, method one requires the chip foundry to provide customized cache units. Existing chip foundries generally provide at most 2R2W caches, and N cannot be expanded indefinitely. Moreover, a customized cache unit is not universal, and another chip process may have no corresponding cache. In addition, the customized cache unit has a large area and high power consumption, which is inconvenient for Application Specific Integrated Circuit (ASIC) integration and cannot be modified.
  • the technical problem actually to be solved by the present application is how to ensure that the internal clock frequency, area, power consumption, etc. of the chip are as balanced as possible, and flexibly realize the efficient calculation of the data length of the N access source.
  • FIG. 1 is a schematic structural diagram of a processing chip according to an embodiment of the present invention.
  • the processing chip 10 includes a controller 101 and a first memory 102 connected to the controller 101.
  • the first memory 102 includes N memory blocks (Block), and each Block includes M one-read-one-write (1R1W) memories; N is an integer greater than 1, and M is an integer greater than 1. Among them,
  • each of the N Blocks is used to store the data length of the target data corresponding to that Block, and the data length characterizes the length of the target data corresponding to the Block.
  • the specific storage format is that M copies of the same data length are stored in the M 1R1W memories of the Block, respectively.
  • the target data corresponding to a Block may be data written or read through the data interface connected to the Block, or data that has a pre-established mapping relationship with the Block (for example, data carrying the MAC address, IP address, identity ID, or TCP connection relationship bound to the Block).
  • This embodiment of the present invention does not specifically limit this, that is, the correspondence between the Block and the target data can be set differently according to different application scenarios.
  • the first Block (take Block 1 in FIG. 1 as an example) is used to store the data length of the target data S 1 corresponding to the first Block, where a total of M copies of the data length of S 1 are stored in Block 1, and the M copies are stored in the M 1R1W memories of Block 1; that is, 1R1W memory 1, 1R1W memory 2, 1R1W memory 3 ... 1R1W memory M each store one copy of the data length of S 1, and so on.
  • the controller 101 is configured to, when the data length of the target data S j corresponding to the j-th Block changes, read the data length of S j stored in one of the 1R1W memories of the j-th Block, and update, according to the change of the data length of S j, the M copies of the data length of S j stored in the M 1R1W memories of the j-th Block, where 1 ≤ j ≤ N, and j is an integer.
  • that is, the controller 101 reads the data length of the target data stored in the Block corresponding to the changed target data, i.e., the current initial length of the target data, and then writes the calculated updated data length to each 1R1W memory (a total of M 1R1W memories) in that Block.
  • for example, when the data length of the target data is the depth of a queue (also called the length of the queue), the first memory 102 may specifically be a queue depth memory, where the queue depth refers to the total number of bytes of all packets cached in the queue.
  • when a packet is enqueued, the controller 101 reads the queue depth from one of the 1R1W memories of the corresponding Block in the queue depth memory, adds the current packet length to obtain the new queue depth, and then writes the new queue depth back to all M 1R1W memories in the corresponding Block.
  • when a packet is dequeued, the controller 101 reads the queue depth from one of the 1R1W memories of the corresponding Block, subtracts the current dequeued packet length to obtain the new queue depth, and then writes the new queue depth back to all M 1R1W memories in the corresponding Block.
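  • As an illustration of the enqueue/dequeue handling described above, the following C sketch models one Block as M replicated depth arrays: each update reads one copy and writes all M copies. The sizes M and K, the type and function names, and the sequential code are assumptions for illustration only; the real chip performs the one read and the M writes within a single clock cycle in hardware.

```c
#include <stdint.h>

#define M 4          /* number of 1R1W memories per Block (assumed value) */
#define K 1024       /* storage units (queues) per 1R1W memory (assumed)  */

/* One Block: M replicated 1R1W memories, each holding K queue depths. */
typedef struct {
    uint32_t mem[M][K];   /* mem[m][k] = depth copy m of queue k */
} Block;

/* Enqueue: read the depth from one copy (1 read), add the packet length,
 * then write the new depth back to all M copies (M writes). */
void on_enqueue(Block *b, int queue_id, uint32_t pkt_len)
{
    uint32_t depth = b->mem[0][queue_id] + pkt_len;  /* read one copy */
    for (int m = 0; m < M; m++)
        b->mem[m][queue_id] = depth;                 /* write all M copies */
}

/* Dequeue: same pattern, subtracting the departed packet length. */
void on_dequeue(Block *b, int queue_id, uint32_t pkt_len)
{
    uint32_t depth = b->mem[0][queue_id] - pkt_len;
    for (int m = 0; m < M; m++)
        b->mem[m][queue_id] = depth;
}
```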
  • a data interface can be understood as an access source, corresponding to a dequeue port or an enqueue port. Therefore, when a certain enqueue port or dequeue port receives a certain user's data packet, the controller 101 then uses the corresponding Block to update the length of the data received or sent by that user through that port.
  • for example, the controller 101 reads from one of the 1R1W memories of Block 2 (such as 1R1W memory 2) that the currently stored data length of S 2 is 128 bytes, then determines through calculation that the updated data length is 256 bytes, and then writes 256 (e.g., in binary form) as the data length of the target data S 2 to all 1R1W memories of Block 2 (including 1R1W memory 2).
  • the 1R1W memory in the embodiment of the present invention is a one-read-one-write memory, which supports one read operation and one write operation within one clock cycle.
  • reading the data length of S j above is one read operation on one 1R1W memory in the j-th Block; updating the M copies of the data length of S j performs one write operation on each 1R1W memory in the j-th Block, for a total of M write operations. That is, one update requires one read and M writes, which does not exceed the upper limit of M reads and M writes that the M 1R1W memories in a Block can provide within one clock cycle.
  • optionally, the 1R1W memory in this application may also be a multi-read multi-write memory. With the characteristics of a multi-read multi-write memory, the initial lengths of multiple target data can be read and the changed data lengths of multiple target data can be updated; the principle is the same as that of the above one-read-one-write memory and is not repeated here.
  • with the processing chip provided by the embodiment of the present invention, M copies of the data length of the target data corresponding to each Block are repeatedly stored in the M 1R1W memories of each of the N Blocks in the first memory; when the data length of the target data corresponding to any one or more of the N Blocks changes, the initial length stored in one of the 1R1W memories of the corresponding Block is read, and the M data lengths stored in the M 1R1W memories of that Block are updated.
  • optionally, the target data may include multiple types of data (for example, data of multiple users).
  • when a certain type of data is increased or decreased through one or more of the N access sources, the Block in which the data length of this type of data is stored allows at most M read operations and M write operations within one clock cycle; one of the M read operations can be used to read the initial length of this type of data (to calculate the updated data length), and the M write operations can be used to write the M updated copies of the data length of this type of data, so that the total length of M types of data can be calculated in the same clock cycle.
  • therefore, the processing chip in the embodiment of the present invention can calculate the total length of at most M types of data in one clock cycle; while ensuring that the data length of the target data is updated in real time, it implements a method for calculating the length of M types of data from N access sources inside the chip, which improves the efficiency and accuracy of the data length calculation for multi-type data from multiple access sources.
  • FIG. 2 is a schematic structural diagram of another processing chip provided by an embodiment of the present invention.
  • the processing chip 10 includes a controller 101 and a first memory 102 connected to the controller 101.
  • it also includes a second memory 103 connected to the controller 101, and N data interfaces connected to the second memory 103, where N is an integer greater than 1, wherein the first memory 102 includes N memory blocks,
  • Each Block includes M one-read-one-write 1R1W memories; the N data interfaces correspond to the N memory blocks Block one-to-one.
  • M takes an integer greater than or equal to N.
  • Each of the N data interfaces is used to write data to or read data from the second memory.
  • each data interface is connected to the external interface of the processing chip 10.
  • N external interfaces are used as an example.
  • the N external interfaces can simultaneously input data packets of the same or different users, and each data packet carries a user ID and has a certain data length.
  • the controller 101 can perform related control on a user's data messages based on the total amount of that user's data messages (that is, the data messages carrying that user ID) stored in the second memory 103 (e.g., discarding the message, back-pressuring the port, or billing, etc.).
  • the second memory 103 is used to store data written through the N data interfaces. For example, after the processing chip 10 receives the data message of each interface, the data message is cached in the second memory 103, and at the same time, the user ID and message length of each data message and related control information are sent to the controller 101.
  • for the functions of the N Blocks in the first memory 102, reference may be made to the related description of the N Blocks in FIG. 1 above, and details are not described here again.
  • the controller 101 is configured to, when the data interface corresponding to the j-th Block has S j input or output, read the data length of S j stored in one of the 1R1W memories of the j-th Block, and update, according to the change of the data length of S j, the M copies of the data length of S j stored in the M 1R1W memories of the j-th Block, where 1 ≤ j ≤ N, and j is an integer.
  • the reason why the controller reads the initial data length stored in one of the 1R1W memories of the corresponding Block and updates the M data lengths stored in the M 1R1W memories of that Block is to allow at most M access sources to simultaneously access and read the updated M data lengths in the Block within one clock cycle. Further, for the function of the controller 101, reference may be made to the related description of the controller 101 in FIG. 1 above, and details are not described here again.
  • the processing chip provided by the embodiment of the present invention further includes a second memory and N data interfaces connected to it, and the N data interfaces correspond one-to-one to the N Blocks; therefore, the target data corresponding to a Block is the data written or read through the data interface corresponding to that Block.
  • the second memory is used to store various types of data written through N data interfaces, and the N data interfaces can be regarded as N access sources of the processing chip. When data is written or read through a data interface, the data length stored in the block corresponding to the data interface is read and updated to ensure the accuracy of the data length of the data.
  • FIG. 3 is a schematic structural diagram of a Block provided by an embodiment of the present invention.
  • the block may be any one of the N blocks in the first memory 102 provided in FIG. 1 or FIG. 2 in this application. among them,
  • each block includes M 1R1W memories, and each 1R1W memory includes K storage units with a bit width of W;
  • the data length of S i comprises K data lengths: L i1, L i2, L i3 ... L iK;
  • each of the M 1R1W memories in the i-th Block stores the K data lengths, and the K data lengths are stored in one-to-one correspondence with the K storage units in one 1R1W memory.
  • the 1R1W memory in this application includes multiple storage units; each storage unit stores the same data bit width and is the smallest unit of the 1R1W memory (this application assumes that a storage unit can store a data bit width of W). Therefore, the 1R1W memory is implemented with a W-in, W-out read-write method when writing and reading data; that is, the 1R1W memory can only write data to one storage unit per clock cycle, and can only read the data stored in one storage unit per clock cycle.
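  • The following C sketch is a rough behavioural model of a single 1R1W memory under the constraint just described: K storage units of bit width W, at most one read and one write per clock cycle. The structure and function names, and the choice of K and of a 32-bit unit, are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define K 1024   /* storage units per 1R1W memory (assumed) */

/* Behavioural model of one 1R1W memory: K storage units of width W
 * (W <= 32 here), at most one read and one write per clock cycle. */
typedef struct {
    uint32_t unit[K];
    bool read_used;    /* has the read port been used this cycle?  */
    bool write_used;   /* has the write port been used this cycle? */
} Mem1R1W;

void new_cycle(Mem1R1W *m)
{
    m->read_used = false;
    m->write_used = false;
}

uint32_t mem_read(Mem1R1W *m, int addr)
{
    assert(!m->read_used && addr >= 0 && addr < K);  /* one read per cycle */
    m->read_used = true;
    return m->unit[addr];
}

void mem_write(Mem1R1W *m, int addr, uint32_t value)
{
    assert(!m->write_used && addr >= 0 && addr < K); /* one write per cycle */
    m->write_used = true;
    m->unit[addr] = value;
}
```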
  • queues can generally be grouped according to different users and different services.
  • assume that the target data is data messages carrying user IDs, and that the K types of data are the data messages of K different users (carrying different user IDs).
  • the first memory 102 may specifically be a queue depth memory
  • the second memory may specifically be a data buffer.
  • the depth of each 1R1W memory in each Block in the queue depth memory is K (the number of queues that can be stored), and the bit width is W (used to save the length of one queue), where it is sufficient that the bit width W is greater than the upper limit of the queue length corresponding to each user's cache quota.
  • when a packet is dequeued, the controller 101 reads the queue depth from a storage unit in one of the 1R1W memories of the corresponding Block, subtracts the current dequeued packet length to obtain the new queue depth, and then writes the new queue depth back to the corresponding storage unit in all M 1R1W memories of the corresponding Block; similarly, when a packet is enqueued, the current packet length is added to obtain the new queue depth, and the corresponding queue depth update is performed, which is not repeated here. Further optionally, when the length of the queue exceeds the bit width W, the problem of length wrap-around can be solved by cyclic calculation, that is, the new queue length is taken modulo before being stored.
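  • The modulo-based wrap-around handling mentioned above can be sketched as follows, assuming depths are stored in W-bit storage units; the value of W and the helper names are illustrative. Storing depths modulo 2^W keeps enqueue and dequeue updates consistent with each other even after the counter wraps.

```c
#include <stdint.h>

#define W 16                          /* bit width of a storage unit (assumed) */
#define DEPTH_MASK ((1u << W) - 1u)   /* depths are stored modulo 2^W */

/* Stored depth after an enqueue: add, then wrap into W bits. */
uint32_t depth_after_enqueue(uint32_t stored, uint32_t pkt_len)
{
    return (stored + pkt_len) & DEPTH_MASK;
}

/* Stored depth after a dequeue: the same modular arithmetic keeps the
 * increments and decrements consistent modulo 2^W, so the stored value
 * does not "invert" when the counter wraps. */
uint32_t depth_after_dequeue(uint32_t stored, uint32_t pkt_len)
{
    return (stored - pkt_len) & DEPTH_MASK;
}
```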
  • FIG. 4 is a schematic diagram of a storage form of K-type data provided in an embodiment of the present invention in a first memory.
  • each of the M 1R1W memories (including 1R1W memory 1, 1R1W memory 2, ... 1R1W memory M) stores the data lengths of the K types of data;
  • that is, M duplicate sets of data lengths are stored across the M 1R1W memories in the same Block.
  • the first storage unit in the 1R1W memory 1 of the first block stores the data length L 11 of the data of the first type stored in the second memory through the data interface 1
  • the second storage unit in the 1R1W memory 2 stores the data length L 22 of the second type of data stored in the second memory through the data interface 2.
  • the controller 101 is specifically configured to, when data s g is written or read through the data interface corresponding to the j-th Block, read the L jg stored in the corresponding storage unit of one of the 1R1W memories of the j-th Block, and update, according to the change in the length of s g, the M copies of L jg stored in the corresponding storage units of the M 1R1W memories of the j-th Block; where M is an integer greater than or equal to N, and
  • L jg is the length of the data of the g-th type of data s g stored in the second memory through the data interface corresponding to the j-th Block, and the g-th type of data is the g-th of the K types of data, 1 ≤ g ≤ K, and g is an integer.
  • that is, when a data interface has data transmission, the controller 101 controls reading of the corresponding storage unit (the storage unit corresponding to the g-th type of data) in one of the 1R1W memories in the Block corresponding to that data interface.
  • for example, when the second type of data is input through data interface 2, the controller 101 reads the data length L 22 stored in the storage unit corresponding to the second type of data (assume storage unit 2) in one of the 1R1W memories (assume 1R1W memory 1) of the second Block, calculates the updated data length based on the read L 22 and the length of the second type of data input through data interface 2, and finally writes the updated L 22 to the second storage unit of each 1R1W memory in the second Block.
  • the K-type data is data of K users, and each user's data carries different user IDs to distinguish data between different users.
  • Each user data is stored in the second memory through the data interface, and the data storage amount (ie, data length) of each user in the second memory is stored in each block in the first memory.
  • the length of the data stored in the block corresponding to the data interface needs to be updated in time.
  • the corresponding Block needs to go through a one-read-M-write process: the user's data length needs to be updated, and before the update the initial length of the user's data stored in the Block must first be known, which takes at least one read operation on one 1R1W memory. Further, after reading the user's initial data length, the lengths stored in all the 1R1W memories of the Block need to be updated in combination with the length of the data written or read through the interface, because even though the embodiment of the present invention allows M access sources to read at the same time, any access source may read the length stored in any of the M 1R1W memories, so it must be ensured that the user's data length is the latest everywhere; when updating, the data lengths stored in all M 1R1W memories in the Block need to be updated, that is, M write operations must be performed in the same clock cycle.
  • the depth of the M 1R1W memories in each of the N blocks in the first memory is K, and the bit width is W.
  • the target data includes K-type data
  • for each type of data, the length of that data stored in the second memory through a given data interface is stored in exactly one of the storage units of each 1R1W memory in the Block corresponding to that data interface, and the M copies of this data length are stored in the M 1R1W memories of that Block, respectively. Therefore, when a data interface has data transmission (writing or reading), the controller reads the initial length stored in the fixed storage unit corresponding to this type of data in one of the 1R1W memories of the corresponding Block, so as to calculate and update the M copies of the data length stored in the corresponding storage units of the M 1R1W memories.
  • in summary, the processing chip in the embodiment of the present invention can calculate the total data length of at most M types of data in the same clock cycle; since M is an integer greater than or equal to N, when all N data interfaces have data transmission (writing or reading) of N different types of data, respectively, the processing chip can simultaneously support calculating the total data length of the N types of data from the N access sources, which improves the calculation efficiency of the data length of multiple access sources.
  • FIG. 5 is a schematic structural diagram of yet another processing chip provided by an embodiment of the present invention.
  • the processing chip may further include a calculation unit 104 connected to the controller 101 and the first memory 102. Among them,
  • the controller 101 is also used to, within the same clock cycle, read the data length of s g from one of the 1R1W memories of each of the N Blocks (i.e., L 1g, L 2g, L 3g ... L Ng) and send them to the calculation unit. It should be noted that each data interface can only write or read the data message of one storage unit per clock cycle; of course, the length of a data message is not fixed but variable. Therefore, when data messages are distinguished by user, each data interface can only write or read one user's data message in each clock cycle.
  • the user's data is ultimately controlled according to the amount of cache it occupies (that is, the total queue length).
  • when the processing chip wants to calculate the total length of a user's data in the second memory 103, it needs to know the total amount of that user's data stored in the second memory 103 through the N data interfaces, which requires knowing the data lengths of that user's data recorded in the N Blocks. Therefore, to calculate the total length of a user's data in the second memory 103, one read operation needs to be performed in each Block. Since the processing chip 10 in the embodiment of the present invention includes N data interfaces, and each data interface writes or reads at most one data packet in one clock cycle, the data lengths of at most N users will change in the same clock cycle.
  • the processing chip needs to calculate the total data length of N users at most in one clock cycle.
  • calculating the total data length of each user requires one read operation in each Block, so calculating the total data lengths of N users occupies N read operations in each Block; for one Block, the N read operations within one clock cycle are distributed across N of the M 1R1W memories. That is, the data length of each user is read from one storage unit of one 1R1W memory in the Block; the data lengths of different users are stored in different storage units (that is, different addresses) of a 1R1W memory and therefore do not interfere with each other, so each Block can be accessed by N access sources simultaneously within the same clock cycle.
  • further optionally, the controller 101 also controls the writing or reading of s g based on the total data length S of s g in the second memory; that is, it calculates the depth of the queue and performs various control operations on the messages according to the calculated queue depth. The controller is used to send the message length and user ID to the queue depth memory, supports simultaneous access by N user IDs, obtains the cached depths of the N user IDs, and sends them to the queue depth (per user ID) calculation unit.
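  • The per-user total-length calculation described above can be sketched in C as reading that user's length L ig from one 1R1W memory of every Block i and summing over the N Blocks; the sizes N, M and K and the function names are assumptions for illustration only.

```c
#include <stdint.h>

#define N 4          /* number of Blocks / data interfaces (assumed) */
#define M 4          /* 1R1W memories per Block, with M >= N         */
#define K 1024       /* users (queues) tracked per memory            */

typedef struct {
    uint32_t mem[M][K];   /* replicated depth copies, as in FIG. 3 and FIG. 4 */
} Block;

/* Total cached length of user g across the whole chip: read L_ig from
 * one 1R1W memory of every Block i and sum. Because the M copies are
 * kept identical, any of the M memories of a Block may serve this read,
 * which is what lets up to M such sums proceed in the same clock cycle. */
uint64_t total_length(const Block blocks[N], int g, int copy)
{
    uint64_t total = 0;
    for (int i = 0; i < N; i++)
        total += blocks[i].mem[copy][g];   /* L_ig */
    return total;
}
```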
  • the controller 101 also reads the data length of the T-type data storage from each of the N blocks in the same clock cycle and sends them to the calculation unit
  • the T types of data are the data written or read, within the same clock cycle, through T of the N data interfaces; reading the data lengths of the T types of data from any one of the N Blocks includes reading the data lengths of the T types of data from T of the 1R1W memories of that Block, with the data length of one type of data read from one 1R1W memory, and
  • the T-type data is T-type data among the K-type data, where M takes an integer greater than or equal to N, and 2 ⁇ T ⁇ M.
  • that is, when the controller needs to calculate the total lengths of multiple types of data (T types of data) in the second memory within the same clock cycle, it can obtain the data lengths of the T types of data in the N Blocks within that clock cycle, where the maximum value of T is M, because the M 1R1W memories in any Block provide at most M read operations in one clock cycle.
  • the calculation unit is used to calculate the total length of each of the T types of data in the second memory. That is, the processing chip in the embodiment of the present invention can obtain, within one clock cycle, the data lengths of at most T types of data in the N Blocks, so the calculation unit can separately calculate, according to the data lengths sent by the controller, the total length of each of the T types of data in the second memory.
  • with the processing chip provided by the embodiment of the present invention, when T (2 ≤ T ≤ M) types of data are transmitted through the N data interfaces in the same clock cycle, the controller can read, in that same clock cycle, the data lengths of the T types of data in each of the N Blocks, so that T read operations and T write operations are generated in each of the N Blocks. Since each Block includes M 1R1W memories, and M is an integer greater than or equal to N, the total data length of the T types of data can be calculated within the same clock cycle.
  • it can be understood that the data lengths of the T types of data in the N Blocks that the controller sends to the calculation unit may be read in the same clock cycle as the M write operations (which update the data length in each 1R1W memory of the corresponding Block), or may be read and sent in a clock cycle after the M write operations.
  • in the former case, the current data length is read before the latest data length is written, and the latest total data length is calculated in combination with the current length change known to the controller; that is, sending the data length to the calculation unit and updating the M 1R1W memories occur in the same clock cycle. In the latter case, after the latest data length has been updated, the data length is sent to the calculation unit to calculate the total data length; that is, updating the M 1R1W memories and sending the data length to the calculation unit occur in different clock cycles.
  • in summary, in the embodiment of the present invention, the total data length of at most M types of data can be calculated in the same clock cycle; because M is an integer greater than or equal to N, when all N data interfaces have data transmission (writing or reading) of N different types of data, respectively, the processing chip in the embodiment of the present invention can simultaneously support calculating the total data length of the N types of data; for example, the trigger condition for calculating the total data length is that any one or several of the N data interfaces have data transmission.
  • optionally, the processing chip in the embodiment of the present invention may also, depending on the application scenario, support calculating the total data length of M types of data; for example, the trigger condition for calculating the total data length is not that a data interface has data transmission, but that the total data length of the M types of data is calculated periodically, etc.
  • when a user's packet is enqueued or dequeued through an access source, the amount of cache occupied by that user needs to be updated, so the queue length corresponding to the user's cache amount needs to be read from the address for that user in the Block, and the latest queue length is then updated according to the enqueue or dequeue situation described above.
  • this involves one read and M writes, where the one read refers to reading the queue length corresponding to the user from any one 1R1W memory of the Block, after which the queue length is updated according to the dequeue or enqueue situation described above.
  • the queue length is updated not in one copy but in M copies, because the same M copies are stored in a Block; to keep the information consistent and to subsequently calculate the total queue lengths of up to M users at the same time, the current latest queue length of that user must be updated in all M memories.
  • the ultimate purpose of the controller is to calculate the current total cache amount of the user on the entire system (because the data of the same user may pass through any of the N access sources above Dequeue or enqueue, so each access source may affect the user's cache capacity on the system, that is, the queue length corresponding to the user's cache capacity on the entire system is affected by each access source), Therefore, the controller needs to read the queue length corresponding to the user recorded in each block, and calculate the total data length.
  • the first memory and the second memory in this application may include volatile memory, such as random-access memory (RAM); they may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive, or a solid-state drive (SSD); they may also include a combination of the above types of memory.
  • the structure of the processing chip in the embodiment of the present invention includes, but is not limited to, the structures in FIGS. 1 to 5 described above.
  • FIG. 6 is a schematic flowchart of a data processing method provided by an embodiment of the present invention, which can be applied to the processing chip described in FIGS. 1 to 5 above.
  • the method can be applied to a processing device, and the processing device includes a controller and a first memory connected to the controller; the first memory includes N memory blocks (Block), and each Block includes M one-read-one-write 1R1W memories; N is an integer greater than 1, and M is an integer greater than 1; the processing method includes the following steps S101-S102.
  • Step S101 In each of the N blocks, the data length of the target data S i corresponding to the i-th block is stored.
  • the i-th Block stores M copies of the data length of the S i
  • Step S102 When the data length of the target data S j corresponding to the jth Block changes, read the data length of S j stored in one of the 1R1W memories of the jth Block, and change according to the data length of S j , Updating the data length of M copies of the S j stored in the M 1R1W memories of the jth Block, where 1 ⁇ j ⁇ N, and j is an integer.
  • the processing device further includes: a second memory connected to the controller, and N data interfaces connected to the second memory, and the N data interfaces are connected to the N memory blocks Block one-to-one correspondence; the method further includes:
  • Step S103 Write data to or read data from the second memory through each of the N data interfaces.
  • Step S104: Store the data written through the N data interfaces into the second memory; wherein, the target data S i corresponding to the i-th Block is specifically the data stored in the second memory through the data interface corresponding to the i-th Block.
  • each 1R1W memory includes K storage units with a bit width of W;
  • each of the M 1R1W memories in the i-th Block stores the K data lengths, and the K data lengths are stored in one-to-one correspondence with the K storage units in one 1R1W memory;
  • Step S105: When data s m is written or read through the data interface corresponding to the j-th Block, read the L jm stored in the corresponding storage unit of one of the 1R1W memories of the j-th Block, and according to the change in the length of s m, update the M copies of L jm stored in the corresponding storage units of the M 1R1W memories of the j-th Block;
  • L jm is the length of the data of the m-th type of data s m stored in the second memory through the data interface corresponding to the j-th Block, and the m-th type of data is the m-th of the K types of data, 1 ≤ m ≤ K, and m is an integer.
  • the method further includes: Step S106: Within the same clock cycle, read the data length of s g from one of the 1R1W memories of each of the N Blocks and send it to the calculation unit; Step S107: Calculate, according to the read data lengths, the total data length S of s g in the second memory.
  • the method further includes:
  • Step S108: Control the writing or reading of s g according to the total data length S of s g in the second memory.
  • the method further includes:
  • Step S109 In the same clock cycle, read the data length of the T-type data storage from each of the N blocks and send it to the calculation unit;
  • the T types of data are the data written or read, within the same clock cycle, through T of the N data interfaces; wherein, reading the data lengths of the T types of data from any one of the N Blocks includes reading the data lengths of the T types of data from T of the 1R1W memories of that Block, with the data length of one type of data read from one 1R1W memory; the T types of data are T of the K types of data, where M is an integer greater than or equal to N, and 2 ≤ T ≤ N;
  • Step S110 Calculate the total length of the T-type data in the second memory separately.
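  • A compact software walk-through of steps S105 and S109-S110 under assumed sizes is sketched below; the array layout and the function and variable names are illustrative only, not the claimed hardware implementation.

```c
#include <stdint.h>

#define N 4      /* Blocks / data interfaces (assumed) */
#define M 4      /* 1R1W memories per Block            */
#define K 8      /* types of data (users) per memory   */

static uint32_t len[N][M][K];   /* len[j][copy][m] = L_jm, replicated M times */

/* Step S105: data of type m is written through interface j with length
 * delta; read one copy of L_jm, then refresh all M copies. */
void update_on_write(int j, int m, uint32_t delta)
{
    uint32_t updated = len[j][0][m] + delta;
    for (int c = 0; c < M; c++)
        len[j][c][m] = updated;
}

/* Steps S109-S110: for T types of data, read each type's length from a
 * different 1R1W memory of every Block and sum across the N Blocks. */
void total_lengths(const int types[], int T, uint64_t out[])
{
    for (int t = 0; t < T; t++) {
        out[t] = 0;
        for (int j = 0; j < N; j++)
            out[t] += len[j][t % M][types[t]];  /* distinct copy per type */
    }
}
```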
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical, or other forms.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted through the computer-readable storage medium.
  • the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device including a server, a data center, and the like integrated with one or more available media.
  • the available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital versatile disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A processing chip, method and related device, wherein the processing chip (10) includes: a controller (101) and a first memory (102) connected to the controller (101); the first memory (102) includes N memory blocks (Block), and each Block includes M one-read-one-write (1R1W) memories; the i-th Block of the N Blocks is used to store the data length of the target data Si corresponding to the i-th Block, i = 1, 2, 3, ... N; the controller (101) is used to, when the data length of the target data Sj corresponding to the j-th Block changes, read the data length of Sj stored in one of the 1R1W memories of the j-th Block, and update, according to the change of the data length of Sj, the M copies of the data length of Sj stored in the M 1R1W memories of the j-th Block. This method can improve the calculation efficiency of the data length of multiple access sources.

Description

Processing chip, method and related device
Technical Field
The present invention relates to the field of chip technology, and in particular to a processing chip, method and related device.
Background
Among the chips used in various communication and electronic devices, many functions need to perform operations based on the length of data (for example, the depth of a queue or the length of a data message), such as discarding messages, back-pressuring ports, or billing based on the length of a queue.
Suppose the system needs to support 1M user queues and schedule each user queue based on its depth. The queues of different users can be distinguished based on the Media Access Control (MAC) address, the Internet Protocol (IP) address, or the Transmission Control Protocol (TCP) connection relationship, etc. In the actual monitoring and scheduling process, within each clock cycle, the queue depth of any user may be increased or decreased through multiple access sources (for example, N channels, data interfaces, pipelines, or planes). For this type of queue, determining the actual depth of multiple user queues within one clock cycle requires an implementation, inside the chip, of queue depth calculations for N access sources.
Therefore, how to efficiently calculate the data length of N access sources inside the chip is a problem to be solved urgently.
Summary of the Invention
The technical problem to be solved by the embodiments of the present invention is to provide a processing chip, method and related device, so as to improve the calculation efficiency of the data length of multiple access sources.
In a first aspect, an embodiment of the present invention provides a processing chip, which may include: a controller and a first memory connected to the controller; the first memory includes N memory blocks (Block), and each Block includes M one-read-one-write (1R1W) memories; N is an integer greater than 1, and M is an integer greater than 1; the i-th Block of the N Blocks is used to store the data length of the target data S i corresponding to the i-th Block, i = 1, 2, 3, ... N; the i-th Block stores M copies of the data length of S i, and the M copies of the data length of S i are stored in the M 1R1W memories of the i-th Block, with one 1R1W memory storing one copy of the data length of S i; the controller is used to, when the data length of the target data S j corresponding to the j-th Block changes, read the data length of S j stored in one of the 1R1W memories of the j-th Block, and update, according to the change of the data length of S j, the M copies of the data length of S j stored in the M 1R1W memories of the j-th Block, where 1 ≤ j ≤ N, and j is an integer.
With the processing chip provided by the embodiment of the present invention, M copies of the data length of the target data corresponding to each Block are repeatedly stored in the M 1R1W memories of each of the N Blocks in the first memory; when the data length of the target data corresponding to any one or more of the N Blocks changes, the initial length stored in one of the 1R1W memories of the corresponding Block is read, and the initial lengths stored in the M 1R1W memories of that Block are updated. Optionally, the target data may include multiple types of data (for example, data of multiple users). Therefore, in the embodiment of the present invention, when a certain type of data is increased or decreased through one or more of the N access sources (for example, N channels, data interfaces, pipelines, or planes), the Block in which the data length of this type of data is stored allows at most M read operations and M write operations within one clock cycle; one of the M read operations can be used to read the initial length of this type of data (to calculate the updated data length), and the M write operations can be used to write the M updated copies of the data length of this type of data, so that the total length of M types of data can be calculated in the same clock cycle. That is, when the total length of the data of M types of data written through the N access sources (minus the data read out) needs to be calculated in the same clock cycle, the M read operations in each Block (one per type of data) can be used to read out the data lengths in each Block, and the total data length is finally obtained by summation. Therefore, the processing chip in the embodiment of the present invention can calculate the total length of at most M types of data in one clock cycle; while ensuring that the data length of the target data is updated in real time, it implements a method for calculating the length of M types of data from N access sources inside the chip, which improves the efficiency and accuracy of the data length calculation for multi-type data from multiple access sources.
在一种可能的实现方式中,所述芯片还包括:与所述控制器连接的第二存储器,以及与所述第二存储器连接的N个数据接口,所述N个数据接口与所述N个存储块Block一一对应;所述N个数据接口中的每个数据接口,用于向所述第二存储器写入数据,或从所述第二存储器中读出数据;所述第二存储器,用于存储通过所述N个数据接口写入的数据;其中,与所述第i个Block对应的目标数据S i,具体为通过所述第i个Block对应的数据接口存储至所述第二存储器中的数据。
本发明实施例提供的处理芯片,还包括第二存储器以及与之连接的N个数据接口,且该N个数据接口与N个Block一一对应,因此,Block对应的目标数据即为通过该Block对应的数据接口写入或者读出的数据。该第二存储器用于存储通过N个数据接口写入的各类数据,且该N个数据接口可以看做是该处理芯片的N个访问源。当有数据通过某个数据接口写入或者读出时,则对该数据接口对应的Block中存储的数据长度进行读取和更新,以保证该数据的数据长度的精确性。
在一种可能的实现方式中,每个1R1W存储器包括K个位宽为W的存储单元;S i包括K类数据,将第k类数据s k通过第i个Block对应的数据接口存储至所述第二存储器中的数据长度记为L ik,k=1、2、3、……K;所述S i的数据长度包括K个数据长度:L i1,L i2,L i3……L iK;所述第i个Block中的M个1R1W存储器中的每一个1R1W存储器存储所述K个数据长度,且所述K个数据长度一一对应的存储在一个1R1W存储器中的K个所述存储单元中;所述控制器,具体用于在第j个Block对应的数据接口有s g写入或读出的情况下,读取所述第j个Block的其中一个1R1W存储器的对应存储单元中存储的L jg,并根据s g的长度变化,更新所述第j个Block中的M个1R1W存储器中的对应存储单元中存储的M份所述L jg,其中,M取大于或者等于N的整数;其中,L jg为第g类数据s g通过第j个Block对应的数据接口存储至所述第二存储器中的数据长度,所述第g类数据为所述K类数据中的第g类数据,1≤g≤K,且g为整数。
本发明实施例提供的处理芯片,其第一存储器中的N个Block中的每一个Block中的M个1R1W存储器的深度均为K,位宽均为W。当目标数据包括K类数据时,则每一类数据通过某个数据接口存储至第二存储器中的数据长度,恰好存储于该数据接口对应的Block中的1R1W存储器中的其中一个存储单元中,且M份该数据的数据长度分别存储于该Block的M个1R1W存储器中。因此,当某个数据接口有数据传输(写入或读出)时,则通过控制器读取对应Block中的其中一个1R1W存储器中的存储单元(该类数据对应的固定的存 储单元)中所存储的初始长度,以便于计算并更新M个1R1W存储器中对应的存储单元中存储的M份数据长度。综上,本发明实施例中的处理芯片可以实现,在同一个时钟周期内最多计算M类数据的数据总长度,且因为M取大于或者等于N的整数,因此,当N个数据接口均有数据传输(写入或读出)时,且分别为N个不同类数据时,则本发明实施例中的处理芯片可以同时支持计算N访问源的N类数据的数据总长度,提升了多访问源的数据长度的计算效率。
在一种可能的实现方式中,所述处理芯片还包括与所述控制器和所述第一存储器连接的计算单元:所述控制器,还用于在同一个时钟周期内,从所述N个Block中的每一个Block的其中一个1R1W存储器中读取s g的数据长度,并发送至所述计算单元;包括L 1g,L 2g,L 3g……L Ng;所述计算单元,用于根据读取的所述s g的数据长度,计算s g在所述第二存储器中的数据总长度S,其中,
S=L 1g+L 2g+L 3g+……+L Ng,
1≤g≤K,且g为整数,i=1、2、3、……N。
本发明实施例提供的处理芯片,还包括与控制器和第一存储器连接的计算单元,该计算单元接收控制器在同一个时钟周期内读取的某一类或几类数据在N个Block中的每一个Block中的数据长度,并根据接收到的数据长度计算该一类或者几类数据的数据总长度。
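To illustrate the summation performed by the computing unit, here is a small self-contained sketch (the nested-list layout and the sizes N=4, M=4, K=8 are assumptions for the example): the total length S of class g is obtained by reading L ig from one 1R1W copy in each of the N Blocks, which costs only one read port per Block in the modelled cycle, and summing the N values.

```python
# lengths[i][copy][g] = data length of class g written through data interface i,
# as stored in one of the M identical 1R1W copies of Block i.
N, M, K = 4, 4, 8            # illustrative sizes: N interfaces/Blocks, M copies, K classes

lengths = [[[0] * K for _ in range(M)] for _ in range(N)]

# Suppose class g = 2 has already accumulated these lengths per interface:
for i, length in enumerate([128, 64, 0, 256]):
    for copy in range(M):
        lengths[i][copy][2] = length

def total_length(g, copy_per_block):
    """One clock cycle: read class g from ONE copy in each Block (one read port
    per Block is enough), then sum over the N Blocks: S = sum_i L_ig."""
    return sum(lengths[i][copy_per_block[i]][g] for i in range(N))

print(total_length(2, copy_per_block=[0, 0, 0, 0]))   # 128 + 64 + 0 + 256 = 448
```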
在一种可能的实现方式中,所述控制器,还用于根据所述s g在所述第二存储器中的数据总长度S,控制所述s g的写入或读出。
本发明实施例提供的处理芯片,控制器还根据计算单元计算的任意一类或几类数据的数据总长度对该类数据的读写进行控制,以实现不同场景下的基于数据长度的数据调度与控制。
在一种可能的实现方式中,所述处理芯片还包括与所述控制器和所述第一存储器连接的计算单元:所述控制器,还用于在同一个时钟周期内,分别从所述N个Block中的各个Block中读取T类数据存储的数据长度,并发送至所述计算单元;所述T类数据为在同一个时钟周期内分别通过所述N个数据接口中的T个数据接口写入或者读出的数据;其中,从所述N个Block中的任意一个Block中读取所述T类数据的数据长度,包括从所述任意一个Block的T个1R1W存储器中分别读出的所述T类数据的数据长度,且从一个1R1W存储器中读出一类数据的数据长度,所述T类数据为所述K类数据中的其中T类数据,其中,M取大于或者等于N的整数,2≤T≤M;所述计算单元,用于分别计算所述T类数据在所述第二存储器中的数据总长度。
本发明实施例提供的处理芯片,当同一个时钟周期内,N个数据接口中有T(2≤T≤M)类数据的传输时,那么控制器可以在同一个时钟周期内,读取该T类数据分别在N个Block中的数据长度,因而分别在N个Block中的每一个Block中产生T次读操作,以及T次写操作。由于每个Block中包括M个1R1W存储器,且M取大于或者等于N的整数,因此可以实现在同一个时钟周期内的T类数据的数据总长度的计算。可以理解的是,控制器向计算单元发送的T类数据分别在N个Block中的数据长度,可以是与M次写操作(更新对应Block中的每个1R1W存储器中的数据长度)在同一个时钟周期读取的,也可以是在M次写操作之后的时钟周期读取并发送的。前一种,可以理解为在未写入最新数据长度之前就读取当前数据长度,并结合控制器中获知的当前更新的长度计算最新的数据总长度,即将数据长度发送至计算单元和更新M个1R1W存储器是在同一个时钟周期;后一种可以 理解为当更新了最新的数据长度之后,再将数据长度发送至计算单元计算数据的总长度,即更新M个1R1W存储器和将数据长度发送至计算单元是在不同时钟周期。综上,本发明实施例,在同一个时钟周期内最多可以计算M类数据的数据总长度,且因为M取大于或者等于N的整数,所以,当N个数据接口均有数据传输(写入或读出)时,且分别为N个不同类数据时,则本发明实施例中的处理芯片可以同时支持计算N类数据的数据总长度,比如,计算数据的总长度的触发条件为N个数据接口中任意一个或几个有数据传输。可选的,本发明实施例中的处理芯片也可以根据应用场景的不同,同时支持计算M类数据的数据总长度,比如,计算数据的总长度的触发条件不为数据接口有数据传输,而是周期性计算M类数据的数据总长度等。
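One possible way to serve T per-class totals in the same cycle without port conflicts, sketched under our own assumption that the t-th queried class is read from copy t of every Block (the text above only requires that the T reads land on T different 1R1W memories), is shown below; the read counters assert that no memory is read more than once per modelled cycle.

```python
N, M, K = 4, 4, 8                      # M >= N, so up to M concurrent per-class queries
# copies[i][m][k]: M identical copies of the K lengths held by Block i
copies = [[[(i + 1) * 10 + k for k in range(K)] for _ in range(M)] for i in range(N)]

def totals_same_cycle(classes):
    """Query T classes in one cycle. The t-th class is served by copy t of every
    Block, so each 1R1W memory sees at most one read in the cycle."""
    T = len(classes)
    assert T <= M, "at most M per-class totals per clock cycle"
    read_count = [[0] * M for _ in range(N)]        # per-memory read-port accounting
    results = {}
    for t, g in enumerate(classes):
        total = 0
        for i in range(N):
            read_count[i][t] += 1
            assert read_count[i][t] <= 1            # 1R1W: one read per memory per cycle
            total += copies[i][t][g]
        results[g] = total
    return results

print(totals_same_cycle([1, 5, 7]))    # T = 3 class totals computed in one modelled cycle
```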
第二方面,本申请提供一种处理方法,应用于处理装置,所述处理装置包括控制器、与所述控制器连接的第一存储器;其中,所述第一存储器包括N个存储块Block,每个Block包括M个一读一写1R1W存储器;N为大于1的整数,M为大于1的整数;所述方法可包括:在所述N个Block中的每个Block中,存储与所述第i个Block对应的目标数据S i的数据长度,i=1、2、3、……N;其中,所述第i个Block中存储M份所述S i的数据长度,且M份所述S i的数据长度分别存储在所述第i个Block的M个1R1W存储器中,一个1R1W存储器存储一份所述S i的数据长度;在第j个Block对应的目标数据S j的数据长度变化时,读取所述第j个Block的其中一个1R1W存储器中存储的S j的数据长度,并根据S j的数据长度变化,更新所述第j个Block的M个1R1W存储器中存储的M份所述S j的数据长度,其中,1≤j≤N,且j为整数。
在一种可能的实现方式中,所述处理装置还包括:与所述控制器连接的第二存储器,以及与所述第二存储器连接的N个数据接口,所述N个数据接口与所述N个存储块Block一一对应;所述方法还包括:通过所述N个数据接口中的每个数据接口,向所述第二存储器写入数据,或从所述第二存储器中读出数据;将通过所述N个数据接口写入的数据存储至所述第二存储器;其中,与所述第i个Block对应的目标数据S i,具体为通过所述第i个Block对应的数据接口存储至所述第二存储器中的数据。
在一种可能的实现方式中,每个1R1W存储器包括K个位宽为W的存储单元;S i包括K类数据,将第k类数据s k通过第i个Block对应的数据接口存储至所述第二存储器中的数据长度记为L ik,k=1、2、3、……K;所述S i的数据长度包括K个数据长度:L i1,L i2,L i3……L iK;所述第i个Block中的M个1R1W存储器中的每一个1R1W存储器存储所述K个数据长度,且所述K个数据长度一一对应的存储在一个1R1W存储器中的K个所述存储单元中;在第j个Block对应的数据接口有s g写入或读出的情况下,读取所述第j个Block的其中一个1R1W存储器的对应存储单元中存储的L jg,并根据s g的长度变化,更新所述第j个Block中的M个1R1W存储器中的对应存储单元中存储的M份所述L jg;其中,M取大于或者等于N的整数,L jg为第g类数据s g通过第j个Block对应的数据接口存储至所述第二存储器中的数据长度,所述第g类数据为所述K类数据中的第g类数据,1≤g≤K,且g为整数。
在一种可能的实现方式中,所述方法还包括:在同一个时钟周期内,从所述N个Block 中的每一个Block的其中一个1R1W存储器中读取s g的数据长度,并发送至所述计算单元;包括L 1g,L 2g,L 3g……L Ng;根据读取的所述s g的数据长度,计算s g在所述第二存储器中的数据总长度S,其中,
S=L 1g+L 2g+L 3g+……+L Ng,
1≤g≤K,且g为整数,i=1、2、3、……N。
在一种可能的实现方式中,所述方法还包括:根据所述s g在所述第二存储器中的数据总长度S,控制所述s g的写入或读出。
在一种可能的实现方式中,所述方法还包括:在同一个时钟周期内,分别从所述N个Block中的各个Block中读取T类数据存储的数据长度,并发送至所述计算单元;所述T类数据为在同一个时钟周期内分别通过所述N个数据接口中的T个数据接口写入或者读出的数据;其中,从所述N个Block中的任意一个Block中读取所述T类数据的数据长度,包括从所述任意一个Block的T个1R1W存储器中分别读出的所述T类数据的数据长度,且从一个1R1W存储器中读出一类数据的数据长度,所述T类数据为所述K类数据中的其中T类数据,其中,M取大于或者等于N的整数,2≤T≤N;分别计算所述T类数据在所述第二存储器中的数据总长度。
第三方面,本申请提供一种片上***芯片,该片上***芯片包括上述第一方面的任意一种实现方式所提供的处理芯片。该片上***芯片,可以由处理芯片构成,也可以包含处理芯片和其他分立器件。
第四方面,本申请提供一种电子设备,包括上述第一方面中的任意一种实现方式所提供的处理芯片以及耦合于所述芯片的分立器件。
附图说明
为了更清楚地说明本发明实施例或背景技术中的技术方案,下面将对本发明实施例或背景技术中所需要使用的附图进行说明。
图1是本发明实施例提供的一种处理芯片的结构示意图;
图2是本发明实施例提供的另一种处理芯片的结构示意图;
图3是本发明实施例提供的一种Block的结构示意图;
图4为本发明实施例提供的K类数据在第一存储器中的存储形式示意图;
图5是本发明实施例提供的又一种处理芯片的结构示意图;
图6是本发明实施例提供的一种数据处理方法的流程示意图。
具体实施方式
下面将结合本发明实施例中的附图,对本发明实施例进行描述。
本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、***、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
首先,对本申请中的部分用语进行解释说明,以便于本领域技术人员理解。
(1)寄存器,是中央处理器内的组成部份,它跟CPU有关。寄存器是有限存贮容量的高速存贮部件,它们可用来暂存指令、数据和位址。在中央处理器的控制部件中,包含的寄存器有指令寄存器(IR)和程序计数器(PC)。在中央处理器的算术及逻辑部件中,包含的寄存器有累加器(ACC)。
(2)存储器,范围较大,它几乎涵盖了所有关于存储的范畴。寄存器,内存,都是存储器中的一种。凡是有存储能力的硬件,都可以称之为存储器。硬盘则可以归入外存储器行列。
(3)缓存,就是数据交换的缓冲区(称作Cache),当某一硬件要读取数据时,会首先从缓存中查找需要的数据,如果找到了则直接执行,找不到的话则从内存中找。由于缓存的运行速度比内存快得多,故缓存的作用就是帮助硬件更快地运行。因为缓存往往使用的是RAM(断电即掉的非永久储存),所以在用完后还是会把文件送到硬盘等存储器里永久存储。
(3)内存,即内存储器,它也是存储器中的一种,包涵的范围也很大,一般分为只读存储器和随即存储器,以及高速缓冲存储器(CACHE),只读存储器应用广泛,它通常是一块在硬件上集成的可读芯片,作用是识别与控制硬件,它的特点是只可读取,不能写入。随机存储器的特点是可读可写,断电后一切数据都消失,也即是通常所说的内存。CACHE是在CPU中速度非常块,而容量却很小的一种存储器。
(4)队列,是一种先进先出(FIFO)的线性表数据结构,常见的操作如在表的尾部***,在头部删除数据。队列的类型有链表结构、固定缓冲区结构等。常用的队列空间都是动态地从堆中申请,在数据量操作频繁的任务中,带来***实时性和内存碎片等问题。队列长度计算公式:nCount=(rear-front+nSize)%nSize。其中,队尾:队列中指定了用来***数据的一端;队头:队列中指定了用来删除数据的一端;入队:数据的***动作;出队:数据的删除动作。
(5)栈和队列都是在一个特定范围的存储单元中存储的数据,这些数据都可以重新被取出使用。不同的是,栈就象一个很窄的桶先存进去的数据只能最后才能取出来,而队列则不一样,即“先进后出”。队列有点象日常排队买东西的人的“队列”先排队的人先买,后排队的人后买,即“先进先出”。有时在数据结构中还有可能出现按照大小排队或按照一定条件排队的数据队列,这时的队列属于特殊队列,就不一定按照“先进先出”的原则读取数据了。
(6)随机存取存储器(RAM-random access memory,RAM)随机存储器。存储单元的内容可按需随意取出或存入,且存取的速度与存储单元的位置无关的存储器。这种存储器在断电时将丢失其存储内容,故主要用于存储短时间使用的程序。按照存储信息的不同, 随机存储器又分为静态随机存储器(Static RAM,SRAM)和动态随机存储器(Dynamic RAM,DRAM)。
(7)取模,就是求余数的运算,例如10除以4的余数是2,于是取模的结果就是2。对于整型数a,b来说,取模运算的方法都是:1、求整数商:c=a/b;2、计算模:r=a-c*b。
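The queue-length formula in item (4) and the two-step modulo recipe in item (7) above can be checked with a short generic Python snippet (an illustration of the two formulas only, not code from the patent):

```python
def queue_count(front, rear, n_size):
    # item (4): nCount = (rear - front + nSize) % nSize for a circular buffer of nSize slots
    return (rear - front + n_size) % n_size

def take_mod(a, b):
    # item (7): c = a / b (integer quotient), then r = a - c * b (remainder)
    c = a // b
    return a - c * b

print(queue_count(front=2, rear=7, n_size=10))   # 5 elements, rear has not wrapped
print(queue_count(front=7, rear=2, n_size=10))   # 5 elements, rear has wrapped past the end
print(take_mod(10, 4))                           # 2, as in the example above
```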
首先,基于背景技术中提出的技术缺陷,进一步分析本申请需要解决的技术问题及应用场景。实现N访问源的队列深度的芯片硬件实现方式,主要包括以下几种方法:
方法一:直接使用芯片生产工厂提供的多端口读写的缓存。例如,使用芯片生产厂家1个时钟周期N访问源可以同时读写的缓存。或者,使用芯片生产厂家定制化的硬核实现。
方法二:提高时钟频率,把原本在1个时钟周期的多次读写缓存分配到多个时钟周期完成。
方法三:在芯片中使用寄存器实现队列的深度计算。
综上所述,现有技术中主要存在以下缺陷。
1)方法一的缺点在于,需要芯片厂家提供定制化的缓存单元。现有芯片生产的工厂一般最多提供2R2W的缓存,N不能无限扩展。定制化的缓存单元不具有通用性,换一个芯片生产就不应有对应的缓存。定制的缓存单元面积大,功耗大,不便于专用集成电路(Application Specific Integrated Circuit,ASIC)集成,无法修改。
2)方法二的缺点在于,时钟频率提升有极限,不能无限的提高。
3)方法三的缺点在于,由于受到寄存器实现和物理限制,对于队列数目大的情况,芯片使用寄存器拥塞严重,无法实现;对于中等规模的队列数据,若可以实现,对应的芯片面积是使用缓存的5倍以上。
因此,本申请实际要解决的技术问题在于,如何尽可能的保证芯片内部的时钟频率、面积、功耗等均衡的情况下,灵活实现N访问源的数据长度的高效计算。
基于上述,本申请提供一种处理芯片。请参见图1,图1是本发明实施例提供的一种处理芯片的结构示意图,如图1中所示,处理芯片10包括控制器101、与控制器101连接的第一存储器102,其中,所述第一存储器102包括N个存储块Block,每个Block包括M个一读一写1R1W存储器;且N为大于1的整数,M为大于1的整数。其中,
所述N个Block中的第i个Block,用于存储与所述第i个Block对应的目标数据S i的数据长度,i=1、2、3、……N;其中,所述第i个Block中存储M份所述S i的数据长度,且M份所述S i的数据长度分别存储在所述第i个Block的M个1R1W存储器中,一个1R1W存储器存储一份所述S i的数据长度。即每一个Block中都存储有M份相同的数据长度,该数据长度表征的是该Block所对应的目标数据的长度,具体存储形式为该M份相同的数据长度分别存储在该Block中的M个1R1W存储器中。可选的,与Block对应的目标数据可以是通过Block连接的数据接口所写入或者读出的数据,或者,是与Block预先建立了映射关系(如携带了与该Block绑定的MAC地址/IP地址/身份标识ID/或者TCP连接关系等)的数据。本发明实施例对此不作具体限定,即Block与目标数据之间的对应关系可以依据不同的应用场景进行不同的设置。
例如,第1个Block用于存储与该第1个Block(以图1中的Block1为例)对应的目标数据S 1的数据长度,其中,Block1中一共存储有M份S 1的数据长度,且M份所述S i的数据长度分别存储在Block1的M个1R1W存储器中,如1R1W存储器1、1R1W存储器2、1R1W存储器3……1R1W存储器M中分别都存储有一份该S 1的数据长度,以此类推。
控制器101,用于在第j个Block对应的目标数据S j的数据长度变化时,读取所述第j个Block的其中一个1R1W存储器中存储的S j的数据长度,并根据S j的数据长度变化,更新所述第j个Block的M个1R1W存储器中存储的M份所述S j的数据长度,其中,1≤j≤N,且j为整数。即当任意一个或者多个Block对应的目标数据的数据长度发生变化时(例如,有对应的目标数据的写入或者读出时),则控制器101读取该变化的目标数据对应的Block中存储的该目标数据的数据长度,也即是该目标数据当前的初始长度;然后再将经过计算确定更新后的数据长度写入到该Block中的每一个1R1W存储器(一共M个1R1W存储器)中。
可选的,当目标数据的数据存储类型为队列时,目标数据的数据长度则为队列的深度(也可称之为队列的长度),第一存储器102具体可以为队列深度存储器,其中,队列深度是指队列缓存的所有包的总字节数。当有包入队时,则控制器101从队列深度存储器中对应的Block的其中一个1R1W存储器中读出队列深度,加上当前入队的包长作为新的队列深度,再将新的队列深度写回对应的Block中的所有M个1R1W存储器中。当有包出队时,则控制器101从对应的Block的其中一个1R1W存储器中读出队列深度,减去当前出队的包长作为新的队列深度,再将新的队列深度写回对应的Block中的所有M个1R1W存储器中。可以理解的是,一个数据接口可以理解为一个访问源,对应一个出队端口或者一个入队端口,因此,当某个入队端口或出队端口收到了某个用户的数据包,控制器101则通过对应的Block去更新该用户通过该端口收或发的数据长度。
例如,第2个Block(以图1中的Block 2为例)对应的目标数据S 2的数据长度变化时,比如通过数据接口写入了128个字节,此时,控制器101读取Block 2中的其中一个1R1W存储器(如1R1W存储器2)中存储的S 2当前的数据长度为128个字节,然后通过计算确定出更新后的数据长度为256个字节后,则重新向Block 2中的所有1R1W存储器(包括1R1W存储器2)写入该目标数据S 2的数据长度256(例如以二进制形式写入)。
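This worked example can be sketched as a small enqueue/dequeue routine (illustrative only; M=4 and the helper names on_enqueue and on_dequeue are our own): the controller reads the current depth from one copy, adds or subtracts the packet length, and writes the new depth back to all M copies.

```python
M = 4                                   # copies of the depth kept for Block 2
block2_depth = [128] * M                # every 1R1W copy currently stores 128 bytes

def on_enqueue(depths, packet_len, read_copy=0):
    """Read the depth from one copy, add the packet length, write the new depth
    back to all M copies (the one-read / M-write update described above)."""
    new_depth = depths[read_copy] + packet_len
    for m in range(len(depths)):
        depths[m] = new_depth
    return new_depth

def on_dequeue(depths, packet_len, read_copy=0):
    new_depth = depths[read_copy] - packet_len
    for m in range(len(depths)):
        depths[m] = new_depth
    return new_depth

print(on_enqueue(block2_depth, 128))    # 128 + 128 = 256, written to all copies
print(on_dequeue(block2_depth, 64))     # 256 - 64 = 192
```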
需要说明的是,本发明实施例中的1R1W存储器(1R1W memory)即一读一写存储器,支持在一个时钟周期内进行一次读操作和一次写操作。例如,上述读取S j的数据长度则是针对第j个Block中的一个1R1W存储器进行了一次读操作;上述更新M份S j的数据长度,则是针对第j个Block中的每个1R1W存储器进行了一次写操作,一共M次写操作。即一次更新需要进行一读M写的操作,因此没有超出一个Block中的M个1R1W存储器在一个时钟周期内所能提供的M读M写的上限。可以理解的是,根据处理芯片10的实际应用需求,本申请中的1R1W存储器也可以是多读多写存储器,假设一个Block对应多个目标数据,则可以根据多读多写存储器的特性,同时读取多个目标数据的初始长度、以及更新多个目标数据的变化后的数据长度,其原理与上述一读一写存储器相同,在此不再赘述。
本发明实施例提供的处理芯片,通过在第一存储器中的N个Block中的每一个Block中的M个1R1W存储器中,分别重复存储M份与该Block对应的目标数据的数据长度,当 N个Block中的任意一个或多个Block对应的目标数据的数据长度发生变化时,读取对应的Block中的其中一个1R1W存储器中存储的初始长度,并更新该Block中的M个1R1W存储器中存储的初始长度。可选的,目标数据可以包括多类数据(例如包括多个用户的数据)。因此,在本发明实施例中,当某类数据通过N个访问源(例如N个通道、数据接口、流水或平面)中的一个或多个进行增加或减少时,由于该类数据的数据长度所存储于的Block,在一个时钟周期内,最多被允许M次读操作和M次写操作,而M次读操作中的其中一次可用于读取该类数据的初始长度(以计算更新后的数据长度),M次写操作则可用于写入该类数据的M份更新后的数据长度,以便于在同一个时钟周期内可以计算M类数据的总长度。即当需要在同一个时钟周期内计算M类数据通过N个访问源写入(除去读出的)的数据总长度时,则可以使用每一个Block中的M次读操作(每一次对应一类数据)读出在各个Block中的数据长度,最终进行求和得到数据总长度。因此,本发明实施例中的处理芯片,在一个时钟周期内,最多可以允许计算M类数据的总长度,在保证了目标数据的数据长度实时更新的情况下,实现了芯片内部的N访问源的M类数据长度的计算方法,提升了多访问源的多类数据的数据长度计算的效率和精确性。
本申请提供另一种处理芯片。请参见图2,图2是本发明实施例提供的另一种处理芯片的结构示意图,如图2中所示,处理芯片10包除了包括控制器101、与控制器101连接的第一存储器102以外,还包括与控制器101连接的第二存储器103,以及与第二存储器103连接的N个数据接口,N为大于1的整数;其中,所述第一存储器102包括N个存储块Block,每个Block包括M个一读一写1R1W存储器;所述N个数据接口与所述N个存储块Block一一对应。可选的,M取大于或者等于N的整数。
所述N个数据接口中的每个数据接口,用于向所述第二存储器写入数据,或从所述第二存储器中读出数据。可选的,每个数据接口都与处理芯片10的外部接口连接,图2中以N个外部接口为例,N个外部接口可以同时输入相同或不同用户的数据报文,每个数据报文携带用户ID并具有一定的数据长度。控制器101可以基于某个用户的数据报文(即携带该用户ID的数据报文)在第二存储器103中的总存储量,对该用户的数据报文进行相关控制(例如丢弃报文、反压端口或计费等)。
第二存储器103,用于存储通过所述N个数据接口写入的数据。例如,在处理芯片10接收到各个接口的数据报文后,把数据报文缓存到第二存储器103中,同时把每个数据报文的用户ID和报文长度以及相关控制信息发送到控制器101。
所述N个Block中的第i个Block,用于存储通过所述第i个Block对应的数据接口存储至所述第二存储器103中的数据S i的数据长度,i=1、2、3、……N;其中,所述第i个Block中存储M份所述S i的数据长度,且M份所述S i的数据长度分别存储在所述第i个Block的M个1R1W存储器中,一个1R1W存储器存储一份所述S i的数据长度。进一步地,第一存储器102中的N个Block的功能中可以参照上述图1中N个Block的相关描述,在此不再赘述。
控制器101,用于在第j个Block对应的数据接口有S j的输入或者输出时,读取所述第j个Block的其中一个1R1W存储器中存储的S j的数据长度,并根据S j的数据长度变化, 更新所述第j个Block的M个1R1W存储器中存储的M份所述S j的数据长度,其中,1≤j≤N,且j为整数。即当所述N个数据接口中的任意一个或者多个数据接口有数据输入或者输出时,则控制器控制读取对应的Block中的其中一个1R1W存储器中存储的初始数据长度,并更新该Block中的M个1R1W存储器中存储的M份数据长度,以在一个时钟周期内最多允许M个访问源同时访问读取该Block中更新后的M份数据长度。进一步地,控制器101的功能可以参照上述图1中控制器101的相关描述,在此不再赘述。
本发明实施例提供的处理芯片,还包括第二存储器以及与之连接的N个数据接口,且该N个数据接口与N个Block一一对应,因此,Block对应的目标数据即为通过该Block对应的数据接口写入或者读出的数据。该第二存储器用于存储通过N个数据接口写入的各类数据,且该N个数据接口可以看做是该处理芯片的N个访问源。当有数据通过某个数据接口写入或者读出时,则对该数据接口对应的Block中存储的数据长度进行读取和更新,以保证该数据的数据长度的精确性。
作为对图1或图2中Block的细化,图3是本发明实施例提供的一种Block的结构示意图。Block可以为本申请中图1或图2提供的第一存储器102中的N个Block中的任意一个Block。其中,
如图3所示,每个Block包括M个1R1W存储器,每个1R1W存储器包括K个位宽为W的存储单元;第i个Block对应的目标数据S i包括K类数据,且将第k类数据s k通过第i个Block对应的数据接口存储至所述第二存储器103中的数据长度记为L ik,k=1、2、3、……K;所述S i的数据长度包括K个数据长度:L i1,L i2,L i3……L iK;所述第i个Block中的M个1R1W存储器中的每一个1R1W存储器存储所述K个数据长度,且所述K个数据长度一一对应的存储在一个1R1W存储器中的K个所述存储单元中。
具体地,本申请中的1R1W存储器包含多个存储单元,每个存储单元存储的数据位宽相等且为该1R1W存储器的最小单元(本申请中假设存储单元所能存储的数据位宽为W)。因此,1R1W存储器在进行数据写入和读出时,会按照W进W出的读写方式进行实现,即1R1W存储器每个时钟周期只能将数据写入到一个存储单元中,同时也只能将一个存储单元中存储的数据读出。
可选的,当目标数据的数据存储类型为队列时,一般可以按照不同的用户、不同的业务进行队列分组。例如,所述目标数据为携带用户ID的数据报文,所述K类数据为K个不同用户(携带不同用户ID)的数据报文。第一存储器102具体可以为队列深度存储器,第二存储器具体可以为数据缓存器。并且队列深度存储器中的每个Block中的每个1R1W存储器深度为K(可以存储的队列数量),位宽为W(用来保存一个队列的长度),其中,位宽W大于每个用户的缓存量对应的队列长度的上限值即可。当有包出队时,则控制器101从对应的Block的其中一个1R1W存储器中的存储单元中读出队列深度,减去当前出队的包长作为新的队列深度,再将新的队列深度写回对应的Block中的所有M个1R1W存储器中对应的存储单元中;同理,当有包入队时,则加上当前入队的包长作为新的队列深度,并进行相关的队列深度的更新操作,此处不再赘述。进一步可选的,当队列的长度超过位宽W时,则可以通过循环计算的方式来解决长度翻转的问题,即对新的队列长度进行取模 之后再存储。
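One possible reading of this wrap-around handling, sketched below under our own assumptions (a 16-bit cell width W and a helper named update_stored_depth), is to keep the stored depth modulo 2^W and to apply every increment or decrement modulo 2^W as well, so that updates remain consistent even after the stored value wraps:

```python
W = 16                        # assumed cell width in bits; depths are stored modulo 2**W
MOD = 1 << W

def update_stored_depth(stored, delta):
    """Apply an enqueue (+delta) or dequeue (-delta) to a depth kept modulo 2**W.
    Modular addition/subtraction keeps the stored value consistent across a wrap."""
    return (stored + delta) % MOD

d = 0
d = update_stored_depth(d, 65_000)     # close to the 16-bit limit
d = update_stored_depth(d, 1_000)      # true depth 66_000 wraps to 66_000 - 65_536 = 464
print(d)                               # 464: the stored value has wrapped
d = update_stored_depth(d, -2_000)     # dequeue 2_000 -> true depth 64_000
print(d)                               # 64_000, consistent again despite the earlier wrap
```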
如图4所示,图4为本发明实施例提供的K类数据在第一存储器中的存储形式示意图。例如,针对第一存储器中的N个Block中的任意一个Block,其M个1R1W存储器(包括1R1W存储器1、1R1W存储器2、……1R1W存储器M)中的每一个1R1W存储器中均存储有K类数据的数据长度,即同一个Block中的M个1R1W存储器之间存储的是M份重复的数据长度。N个Block中的任意一个1R1W存储器分别存储的数据长度为L i1,L i2,L i3……L iK,i=1、2、3、……N。比如,第1个Block中的1R1W存储器1中的第一个存储单元中存储的是第1类数据通过数据接口1存储至第二存储器中的数据的数据长度L 11,第2个Block中的1R1W存储器2中的第二个存储单元中存储的是第2类数据通过数据接口2存储至第二存储器中的数据长度L 22,具体可参见图4中的标注,此处不再赘述。
控制器101,用于在第j个Block对应的数据接口有s g写入或读出的情况下,读取所述第j个Block的其中一个1R1W存储器的对应存储单元中存储的L jg,并根据s g的长度变化,更新所述第j个Block中的M个1R1W存储器中的对应存储单元中存储的M份所述L jg;其中,M取大于或者等于N的整数,L jg为第g类数据s g通过第j个Block对应的数据接口存储至所述第二存储器中的数据长度,所述第g类数据为所述K类数据中的第g类数据,1≤g≤K,且g为整数。
具体地,当任意一个数据接口有数据(假设为第g类数据)输入或输出时,控制器101则控制读取该数据接口对应的Block中的其中一个1R1W存储器中的对应的存储单元(第g类数据所对应的存储单元)中的数据长度。例如,当第2个数据接口中有第2类数据的输入时,则控制器101读取第2个Block中的其中一个1R1W存储器(假设为1R1W存储器1)中的存储单元2(假设第2类数据对应存储单元2)中的数据长度L 22,并根据该第2类数据通过第2个数据接口输入的数据长度以及读取的L 22,计算得到更新后的L 22,最终,将更新后的L 22写入到第2个Block中的每一个1R1W存储器中的第2个存储单元中。
例如,所述K类数据为K个用户的数据,每个用户的数据之间携带不同的用户ID,以区分不同用户之间的数据。每个用户数据通过数据接口存储在第二存储器中,而每个用户在第二存储器中的数据存储量(即数据长度)则存储在第一存储器中的各个Block中。当某个用户的数据通过某个数据接口写入至第二存储器中,或者从第二存储器中被读出后,其在该数据接口对应的Block中所存储的数据长度需要及时更新。此时对应的Block中需要经历一读M写的过程:更新该用户的数据长度之前,首先需要获知该用户的数据在该Block中所存储的初始长度,即需要进行至少一个1R1W存储器的读操作;进一步地,读取了该用户的初始数据长度之后,需要结合通过该接口写入或者读出的数据的数据长度,更新该Block中所有1R1W存储器中存储的长度。原因在于,本发明实施例提供的Block虽然可以同时被M个访问源读取,但任意一个访问源读取该Block的M个1R1W存储器中任意一个所存储的长度时,都要保证读到的该用户的数据长度是最新的,因此,在更新的时候需要更新该Block中M个1R1W存储器中所存储的数据长度,即需要在同一个时钟周期内进行M次写操作。
本发明实施例提供的处理芯片,其第一存储器中的N个Block中的每一个Block中的M个1R1W存储器的深度均为K,位宽均为W。当目标数据包括K类数据时,则每一类数据通过某个数据接口存储至第二存储器中的数据长度,恰好存储于该数据接口对应的Block 中的1R1W存储器中的其中一个存储单元中,且M份该数据的数据长度分别存储于该Block的M个1R1W存储器中。因此,当某个数据接口有数据传输(写入或读出)时,则通过控制器读取对应Block中的其中一个1R1W存储器中的存储单元(该类数据对应的固定的存储单元)中所存储的初始长度,以便于计算并更新M个1R1W存储器中对应的存储单元中存储的M份数据长度。综上,本发明实施例中的处理芯片可以实现,在同一个时钟周期内最多计算M类数据的数据总长度,且因为M取大于或者等于N的整数,因此,当N个数据接口均有数据传输(写入或读出)时,且分别为N个不同类数据时,则本发明实施例中的处理芯片可以同时支持计算N访问源的N类数据的数据总长度,提升了多访问源的数据长度的计算效率。
基于上述图3提供的处理芯片,进一步地,请参见图5,图5是本发明实施例提供的又一种处理芯片的结构示意图,如图5中所示,所述处理芯片还可以包括与所述控制器101和所述第一存储器102连接的计算单元104。其中,
控制器101,还用于在同一个时钟周期内,从所述N个Block中的每一个Block的其中一个1R1W存储器中读取s g的数据长度,并发送至所述计算单元;包括L 1g,L 2g,L 3g……L Ng;需要说明的是,每个时钟周期内,每个数据接口只能写入或者读出一个存储单元中的数据报文,当然数据报文的长度不是固定的,是可变的。因此当依据用户的不同来区分数据报文时,则每个始终周期内,每个数据接口则只能写入或者读出一个用户的数据报文。
计算单元104,用于根据读取的所述s g的数据长度,计算s g在所述第二存储器中的数据总长度S,其中,
S=L 1g+L 2g+L 3g+……+L Ng,
1≤g≤K,且g为整数,i=1、2、3、……N。例如,读取队列深度存放单元中的每个Block中的某个用户的队列长度,以及根据当前端口的数据报文的进出情况以及数据报文的长度,计算该用户在系统中所占用的总缓存量(即总的队列长度),最终根据缓存量进行相应的控制。
当处理芯片要计算某个用户的数据在第二存储器103中的总长度,则需要获知该用户的数据通过所述N个数据接口存储至第二存储器103中的数据总量,也即是需要获知N个Block中存储的该用户的数据的数据长度。因此,计算一个用户在第二存储器103中的数据总长度,需要在每个Block中进行一个读操作。由于本发明实施例中的处理芯片10包括N个数据接口,而每个数据接口在一个时钟周期内最多写入或者读出一个数据报文,因此本发明实施例中的处理芯片在同一个时钟周期内最多有N个用户的数据长度会发生变化。
假设计算每个用户的数据总长度的条件为该用户的数据长度发生变化时,则计算该用户数据的总长度,那么该处理芯片在一个时钟周期内最多需要计算N个用户的数据总长度。而每一个用户的数据总长度的计算需要占用一个Block中的一个读操作,因此计算N个用户的数据总长度占用一个Block中的N个读操作,而针对一个Block来说,该一个时钟周期内的N个读操作分布在M个1R1W存储器中的N个1R1W存储器中,即每个用户的数据长度从一个Block的其中一个1R1W存储器中的存储单元中,不同的用户的数据分布在不同的存储单元中,并且每个用户的数据长度存储在1R1W的不同存储单元(即不同的地址)中,因此互相之间不会干扰,所以可以在同一个时钟周期内每一个Block中都可以同时被N个访问源进行访问。
在一种可能的实现方式中,控制器101,还根据所述s g在所述第二存储器中的数据总长度S,控制所述s g的写入或读出。例如,控制器101计算队列的深度,并根据队列深度对报文执行各类控制操作;具体地,控制器101把报文长度和用户ID发送到队列深度存放单元(即第一存储器),该单元同时支持N个用户ID的访问,得到N个用户ID在缓存中的深度,再把数据送到队列深度(用户ID)计算单元,由计算单元计算各用户的总队列深度。
在一种可能的实现方式中,控制器101,还在同一个时钟周期内,分别从所述N个Block中的各个Block中读取T类数据存储的数据长度,并发送至所述计算单元;所述T类数据为在同一个时钟周期内分别通过所述N个数据接口中的T个数据接口写入或者读出的数据;其中,从所述N个Block中的任意一个Block中读取所述T类数据的数据长度,包括从所述任意一个Block的T个1R1W存储器中分别读出的所述T类数据的数据长度,且从一个1R1W存储器中读出一类数据的数据长度,所述T类数据为所述K类数据中的其中T类数据,其中,M取大于或者等于N的整数,2≤T≤M。即当控制器需要在同一个时钟周期内计算多类数据(T类数据)在第二存储器中的总长度时,可以在同一个时钟周期内获取该T类数据分别在N个Block中的数据长度,其中T的取值最大为M,因为任意一个Block中的M个1R1W存储器在一个时钟周期内最多提供M次读操作。M取大于或者等于N的整数。当T等于M,且M等于N时,则可以对应到,表示在同一个时钟周期内,需要计算N类数据的数据总长度
计算单元,用于分别计算所述T类数据在所述第二存储器中的数据总长度。即本发明实施例中的处理芯片,在一个时钟周期内最多可以获取T类数据分别在N个Block中的数据长度,因此计算单元可以根据控制器发送过来的数据分别计算所述T类数据在所述第二存储器中的数据总长度。
本发明实施例提供的处理芯片,当同一个时钟周期内,N个数据接口中有T(2≤T≤M)类数据的传输时,那么控制器可以在同一个时钟周期内,读取该T类数据分别在N个Block中的数据长度,因而分别在N个Block中的每一个Block中产生T次读操作,以及T次写操作。由于每个Block中包括M个1R1W存储器,且M取大于或者等于N的整数,因此可以实现在同一个时钟周期内的T类数据的数据总长度的计算。可以理解的是,控制器向计算单元发送的T类数据分别在N个Block中的数据长度,可以是与M次写操作(更新对应Block中的每个1R1W存储器中的数据长度)在同一个时钟周期读取的,也可以是在M次写操作之后的时钟周期读取并发送的。前一种,可以理解为在未写入最新数据长度之前就读取当前数据长度,并结合控制器中获知的当前更新的长度计算最新的数据总长度,即将数据长度发送至计算单元和更新M个1R1W存储器是在同一个时钟周期;后一种可以理解为当更新了最新的数据长度之后,再将数据长度发送至计算单元计算数据的总长度,即更新M个1R1W存储器和将数据长度发送至计算单元是在不同时钟周期。综上,本发明实施例,在同一个时钟周期内最多可以计算M类数据的数据总长度,且因为M取大于或者等于N的整数,所以,当N个数据接口均有数据传输(写入或读出)时,且分别为N个不同类数据时,则本发明实施例中的处理芯片可以同时支持计算N类数据的数据总长度,比如,计算数据的总长度的触发条件为N个数据接口中任意一个或几个有数据传输。可选的,本发明实施例中的处理芯片也可以根据应用场景的不同,同时支持计算M类数据的数据总长度,比如,计算数据的总长度的触发条件不为数据接口有数据传输,而是周期性计 算M类数据的数据总长度等。
综上,在实际应用场景中,例如计算用户在系统中的缓存量的场景中,针对某一个数据接口(也可以称之为访问源)来说,当某一个用户的数据报文(例如以队列存储方式存储)通过该数据接口入队或者是出队时,则作为系统的控制器来讲,需要执行以下两项操作。
其一:针对该数据报文入队或者出队所通过的访问源所对应的Block,需要更新该用户通过该访问源所占有的缓存量,因此需要读取该Block中的针对该用户的当前缓存量对应的队列长度,然后再根据上述入队或者出队的情况,更新最新的队列长度,在此过程中,涉及到一读N写,其中一读,是指读取该Block中的任意一个1R1W存储器中的该用户对应队列长度,然后根据上述出队或者入队情况更新队列长度,此时更新队列长度,不是更新一个,而是更新M个,原因在于一个Block中存储了M份相同的队列长度信息,若需要保持信息的一致性以及后续同时计算M个用户的队列总长度,则需要更新M个Memory的关于该用户的当前的最新队列长度。
其二:从全局角度来讲,控制器的最终目的是需要计算该用户在整个系统上当前的总的缓存量(因为同一个用户的数据可能会通过上述N个访问源中的任意一个访问源进行出队或者入队,因此每个访问源均可能对该用户在系统上的缓存量产生影响,即该用户在整个系统上的缓存量对应的队列长度受到每个访问源的影响),所以需要控制器读取每个Block中记录的该用户对应的队列长度,进行整体数据总长度的计算。
本申请中的第一存储器和第二存储器可以包括易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM);也可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);还可以包括上述种类的存储器的组合。可以理解的是,本发明实施例中的处理芯片的结构包括但不仅限于上述图1-图5中的结构。
请参见图6,是本发明实施例提供的一种数据处理方法的流程示意图,可应用于上述图1-图5中所述的处理芯片,该方法可以应用于处理装置,所述处理装置包括控制器、与所述控制器连接的第一存储器;其中,所述第一存储器包括N个存储块Block,每个Block包括M个一读一写1R1W存储器;N为大于1的整数,M为大于1的整数;该处理方法包括以下步骤S101-步骤S102。
步骤S101:在所述N个Block中的每个Block中,存储与所述第i个Block对应的目标数据S i的数据长度。
具体地,所述第i个Block中存储M份所述S i的数据长度,且M份所述S i的数据长度分别存储在所述第i个Block的M个1R1W存储器中,一个1R1W存储器存储一份所述S i的数据长度,i=1、2、3、……N。
步骤S102:在第j个Block对应的目标数据S j的数据长度变化时,读取所述第j个Block的其中一个1R1W存储器中存储的S j的数据长度,并根据S j的数据长度变化,更新所述第j个Block的M个1R1W存储器中存储的M份所述S j的数据长度,其中,1≤j≤N,且j 为整数。
在一种可能的实现方式中,所述处理装置还包括:与所述控制器连接的第二存储器,以及与所述第二存储器连接的N个数据接口,所述N个数据接口与所述N个存储块Block一一对应;所述方法还包括:
步骤S103:通过所述N个数据接口中的每个数据接口,向所述第二存储器写入数据,或从所述第二存储器中读出数据。
步骤S104:将通过所述N个数据接口写入的数据存储至所述第二存储器;其中,与所述第i个Block对应的目标数据S i,具体为通过所述第i个Block对应的数据接口存储至所述第二存储器中的数据。
在一种可能的实现方式中,每个1R1W存储器包括K个位宽为W的存储单元;S i包括K类数据,将第k类数据s k通过第i个Block对应的数据接口存储至所述第二存储器中的数据长度记为L ik,k=1、2、3、……K;所述S i的数据长度包括K个数据长度:L i1,L i2,L i3……L iK;所述第i个Block中的M个1R1W存储器中的每一个1R1W存储器存储所述K个数据长度,且所述K个数据长度一一对应的存储在一个1R1W存储器中的K个所述存储单元中;
步骤S105:在第j个Block对应的数据接口有s g写入或读出的情况下,读取所述第j个Block的其中一个1R1W存储器的对应存储单元中存储的L jg,并根据s g的长度变化,更新所述第j个Block中的M个1R1W存储器中的对应存储单元中存储的M份所述L jg;
其中,M取大于或者等于N的整数,L jg为第g类数据s g通过第j个Block对应的数据接口存储至所述第二存储器中的数据长度,所述第g类数据为所述K类数据中的第g类数据,1≤g≤K,且g为整数。
在一种可能的实现方式中,所述方法还包括:
步骤S106:在同一个时钟周期内,从所述N个Block中的每一个Block的其中一个1R1W存储器中读取s g的数据长度,并发送至所述计算单元;包括L 1g,L 2g,L 3g……L Ng。
步骤S107:根据读取的所述s g的数据长度,计算s g在所述第二存储器中的数据总长度S,其中,
S=L 1g+L 2g+L 3g+……+L Ng,
1≤g≤K,且g为整数,i=1、2、3、……N。
在一种可能的实现方式中,所述方法还包括:
步骤S108:根据所述s g在所述第二存储器中的数据总长度S,控制所述s g的写入或读出。
在一种可能的实现方式中,所述方法还包括:
步骤S109:在同一个时钟周期内,分别从所述N个Block中的各个Block中读取T类数据存储的数据长度,并发送至所述计算单元;
具体地,所述T类数据为在同一个时钟周期内分别通过所述N个数据接口中的T个数据接口写入或者读出的数据;其中,从所述N个Block中的任意一个Block中读取所述T类数据的数据长度,包括从所述任意一个Block的T个1R1W存储器中分别读出的所述T类数据的数据长度,且从一个1R1W存储器中读出一类数据的数据长度,所述T类数据为所述K类数据中的其中T类数据,其中,M取大于或者等于N的整数,2≤T≤N;
步骤S110:分别计算所述T类数据在所述第二存储器中的数据总长度。
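Putting steps S101 to S110 together, the compact sketch below (the sizes, packet lengths and user indices are invented for illustration) stores packets arriving on the data interfaces, performs the one-read/M-write length update in the corresponding Block, and sums one user's per-Block lengths to obtain the total amount buffered in the second memory:

```python
# A compact end-to-end walk-through of steps S101-S110 (illustrative model only).
N, M, K = 2, 2, 4                       # 2 interfaces/Blocks, 2 copies each, 4 user classes
first_memory = [[[0] * K for _ in range(M)] for _ in range(N)]   # S101: N Blocks x M copies x K lengths
second_memory = []                                               # packets written through the interfaces

def write_packet(interface, user, length):
    """S103/S104: store the packet; S102/S105: one read + M writes in Block `interface`."""
    second_memory.append((interface, user, length))
    block = first_memory[interface]
    new_len = block[0][user] + length            # read one copy
    for copy in block:                           # update all M copies
        copy[user] = new_len

def user_total(user):
    """S106/S107: read the user's length from one copy of every Block and sum."""
    return sum(first_memory[i][0][user] for i in range(N))

write_packet(interface=0, user=1, length=100)
write_packet(interface=1, user=1, length=50)
write_packet(interface=1, user=3, length=200)
print(user_total(1))    # 150 bytes of user 1 currently buffered (S108 would act on this value)
print(user_total(3))    # 200
```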
需要说明的是,本发明实施例中所描述的处理方法中的具体流程以及处理装置的相关功能,可参见上述图1-图5中所述的处理芯片实施例中的相关描述,此处不再赘述。
以上所述仅为本发明的几个实施例,本领域的技术人员依据申请文件公开的可以对本发明进行各种改动或变型而不脱离本发明的精神和范围。例如本发明实施例的附图中的各个部件具体形状或结构是可以根据实际应用场景进行调整的。
在本申请所提供的几个实施例中,应该理解到,所揭露的***、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者通过所述计算机可读存储介质进行传输。所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,数字通用光盘(digital versatile disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。

Claims (13)

  1. 一种处理芯片,其特征在于,包括:控制器、与所述控制器连接的第一存储器;其中,所述第一存储器包括N个存储块Block,每个Block包括M个一读一写1R1W存储器;N为大于1的整数,M为大于1的整数;
    所述N个Block中的第i个Block,用于存储与所述第i个Block对应的目标数据S i的数据长度,i=1、2、3、……N;其中,所述第i个Block中存储M份所述S i的数据长度,且M份所述S i的数据长度分别存储在所述第i个Block的M个1R1W存储器中,一个1R1W存储器存储一份所述S i的数据长度;
    所述控制器,用于在第j个Block对应的目标数据S j的数据长度变化时,读取所述第j个Block的其中一个1R1W存储器中存储的S j的数据长度,并根据S j的数据长度变化,更新所述第j个Block的M个1R1W存储器中存储的M份所述S j的数据长度,其中,1≤j≤N,且j为整数。
  2. 如权利要求1的处理芯片,其特征在于,所述芯片还包括:与所述控制器连接的第二存储器,以及与所述第二存储器连接的N个数据接口,所述N个数据接口与所述N个存储块Block一一对应;
    所述N个数据接口中的每个数据接口,用于向所述第二存储器写入数据,或从所述第二存储器中读出数据;
    所述第二存储器,用于存储通过所述N个数据接口写入的数据;
    其中,与所述第i个Block对应的目标数据S i,具体为通过所述第i个Block对应的数据接口存储至所述第二存储器中的数据。
  3. 如权利要求2的处理芯片,其特征在于,每个1R1W存储器包括K个位宽为W的存储单元;S i包括K类数据,将第k类数据s k通过第i个Block对应的数据接口存储至所述第二存储器中的数据长度记为L ik,k=1、2、3、……K;所述S i的数据长度包括K个数据长度:L i1,L i2,L i3……L iK;所述第i个Block中的M个1R1W存储器中的每一个1R1W存储器存储所述K个数据长度,且所述K个数据长度一一对应的存储在一个1R1W存储器中的K个所述存储单元中;
    所述控制器,具体用于在第j个Block对应的数据接口有s g写入或读出的情况下,读取所述第j个Block的其中一个1R1W存储器的对应存储单元中存储的L jg,并根据s g的长度变化,更新所述第j个Block中的M个1R1W存储器中的对应存储单元中存储的M份所述L jg,其中,M取大于或者等于N的整数;
    其中,L jg为第g类数据s g通过第j个Block对应的数据接口存储至所述第二存储器中的数据长度,所述第g类数据为所述K类数据中的第g类数据,1≤g≤K,且g为整数。
  4. 如权利要求3所述的处理芯片,其特征在于,所述处理芯片还包括与所述控制器和所述第一存储器连接的计算单元:
    所述控制器,还用于在同一个时钟周期内,从所述N个Block中的每一个Block的其中一个1R1W存储器中读取s g的数据长度,并发送至所述计算单元;包括L 1g,L 2g,L 3g……L Ng
    所述计算单元,用于根据读取的所述s g的数据长度,计算s g在所述第二存储器中的数据总长度S,其中,
    S=L 1g+L 2g+L 3g+……+L Ng,
    1≤g≤K,且g为整数,i=1、2、3、……N。
  5. 如权利要求4所述的处理芯片,其特征在于,
    所述控制器,还用于根据所述s g在所述第二存储器中的数据总长度S,控制所述s g的写入或读出。
  6. 如权利要求1-3任意一项所述的处理芯片,其特征在于,所述处理芯片还包括与所述控制器和所述第一存储器连接的计算单元:
    所述控制器,还用于在同一个时钟周期内,分别从所述N个Block中的各个Block中读取T类数据存储的数据长度,并发送至所述计算单元;所述T类数据为在同一个时钟周期内分别通过所述N个数据接口中的T个数据接口写入或者读出的数据;其中,从所述N个Block中的任意一个Block中读取所述T类数据的数据长度,包括从所述任意一个Block的T个1R1W存储器中分别读出的所述T类数据的数据长度,且从一个1R1W存储器中读出一类数据的数据长度,所述T类数据为所述K类数据中的其中T类数据,其中,M取大于或者等于N的整数,2≤T≤M;
    所述计算单元,用于分别计算所述T类数据在所述第二存储器中的数据总长度。
  7. 一种处理方法,其特征在于,应用于处理装置,所述处理装置包括控制器、与所述控制器连接的第一存储器;其中,所述第一存储器包括N个存储块Block,每个Block包括M个一读一写1R1W存储器;N为大于1的整数,M为大于1的整数;所述方法包括:
    在所述N个Block中的每个Block中,存储与所述第i个Block对应的目标数据S i的数据长度,i=1、2、3、……N;其中,所述第i个Block中存储M份所述S i的数据长度,且M份所述S i的数据长度分别存储在所述第i个Block的M个1R1W存储器中,一个1R1W存储器存储一份所述S i的数据长度;
    在第j个Block对应的目标数据S j的数据长度变化时,读取所述第j个Block的其中一个1R1W存储器中存储的S j的数据长度,并根据S j的数据长度变化,更新所述第j个Block的M个1R1W存储器中存储的M份所述S j的数据长度,其中,1≤j≤N,且j为整数。
  8. 如权利要求7的处理方法,其特征在于,所述处理装置还包括:与所述控制器连接的第二存储器,以及与所述第二存储器连接的N个数据接口,所述N个数据接口与所述N个存储块Block一一对应;所述方法还包括:
    通过所述N个数据接口中的每个数据接口,向所述第二存储器写入数据,或从所述第二存储器中读出数据;
    将通过所述N个数据接口写入的数据存储至所述第二存储器;其中,与所述第i个Block对应的目标数据S i,具体为通过所述第i个Block对应的数据接口存储至所述第二存储器中 的数据。
  9. 如权利要求8的处理方法,其特征在于,每个1R1W存储器包括K个位宽为W的存储单元;S i包括K类数据,将第k类数据s k通过第i个Block对应的数据接口存储至所述第二存储器中的数据长度记为L ik,k=1、2、3、……K;所述S i的数据长度包括K个数据长度:L i1,L i2,L i3……L iK;所述第i个Block中的M个1R1W存储器中的每一个1R1W存储器存储所述K个数据长度,且所述K个数据长度一一对应的存储在一个1R1W存储器中的K个所述存储单元中;
    在第j个Block对应的数据接口有s g写入或读出的情况下,读取所述第j个Block的其中一个1R1W存储器的对应存储单元中存储的L jg,并根据s g的长度变化,更新所述第j个Block中的M个1R1W存储器中的对应存储单元中存储的M份所述L jg;其中,M取大于或者等于N的整数,L jg为第g类数据s g通过第j个Block对应的数据接口存储至所述第二存储器中的数据长度,所述第g类数据为所述K类数据中的第g类数据,1≤g≤K,且g为整数。
  10. 如权利要求9所述的处理方法,其特征在于,所述方法还包括:
    在同一个时钟周期内,从所述N个Block中的每一个Block的其中一个1R1W存储器中读取s g的数据长度,并发送至所述计算单元;包括L 1g,L 2g,L 3g……L Ng
    根据读取的所述s g的数据长度,计算s g在所述第二存储器中的数据总长度S,其中,
    S=L 1g+L 2g+L 3g+……+L Ng,
    1≤g≤K,且g为整数,i=1、2、3、……N。
  11. 如权利要求10所述的处理方法,其特征在于,所述方法还包括:
    根据所述s g在所述第二存储器中的数据总长度S,控制所述s g的写入或读出。
  12. 如权利要求7-9任意一项所述的处理方法,其特征在于,所述方法还包括:
    在同一个时钟周期内,分别从所述N个Block中的各个Block中读取T类数据存储的数据长度,并发送至所述计算单元;所述T类数据为在同一个时钟周期内分别通过所述N个数据接口中的T个数据接口写入或者读出的数据;其中,从所述N个Block中的任意一个Block中读取所述T类数据的数据长度,包括从所述任意一个Block的T个1R1W存储器中分别读出的所述T类数据的数据长度,且从一个1R1W存储器中读出一类数据的数据长度,所述T类数据为所述K类数据中的其中T类数据,其中,M取大于或者等于N的整数,2≤T≤N;
    分别计算所述T类数据在所述第二存储器中的数据总长度。
  13. 一种电子设备,其特征在于,包括:
    如权利要求1至6任一所述的处理芯片,以及耦合于所述处理芯片的分立器件。
PCT/CN2018/122946 2018-12-22 2018-12-22 一种处理芯片、方法及相关设备 WO2020124609A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/122946 WO2020124609A1 (zh) 2018-12-22 2018-12-22 一种处理芯片、方法及相关设备
CN201880100446.8A CN113227984B (zh) 2018-12-22 2018-12-22 一种处理芯片、方法及相关设备

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/122946 WO2020124609A1 (zh) 2018-12-22 2018-12-22 一种处理芯片、方法及相关设备

Publications (1)

Publication Number Publication Date
WO2020124609A1 true WO2020124609A1 (zh) 2020-06-25

Family

ID=71100006

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/122946 WO2020124609A1 (zh) 2018-12-22 2018-12-22 一种处理芯片、方法及相关设备

Country Status (2)

Country Link
CN (1) CN113227984B (zh)
WO (1) WO2020124609A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118132012A (zh) * 2024-05-07 2024-06-04 杭州海康威视数字技术股份有限公司 数据写入方法、数据读取方法、***及电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1834941A (zh) * 2005-03-18 2006-09-20 恩益禧电子股份有限公司 具有闪存的半导体设备
CN105637475A (zh) * 2014-09-16 2016-06-01 华为技术有限公司 并行访问方法及***
CN106095328A (zh) * 2015-04-29 2016-11-09 马维尔以色列(M.I.S.L.)有限公司 每个周期具有一个读端口和一个或多个写端口的多组存储器
US20170262646A1 (en) * 2016-03-11 2017-09-14 CNEXLABS, Inc. Computing system with non-orthogonal data protection mechanism and method of operation thereof
CN107888512A (zh) * 2017-10-20 2018-04-06 深圳市楠菲微电子有限公司 动态共享缓冲存储器及交换机
CN107948094A (zh) * 2017-10-20 2018-04-20 西安电子科技大学 一种高速数据帧无冲突入队处理的装置及方法

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169460A (zh) * 2010-02-26 2011-08-31 航天信息股份有限公司 变长数据管理方法及装置
US8923089B2 (en) * 2012-12-21 2014-12-30 Lsi Corporation Single-port read multiple-port write storage device using single-port memory cells
CN103413569B (zh) * 2013-07-22 2016-03-09 华为技术有限公司 一读且一写静态随机存储器
CN104484129A (zh) * 2014-12-05 2015-04-01 盛科网络(苏州)有限公司 一读一写存储器、多读多写存储器及其读写方法
CN106297861B (zh) * 2016-07-28 2019-02-22 盛科网络(苏州)有限公司 可扩展的多端口存储器的数据处理方法及数据处理***

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1834941A (zh) * 2005-03-18 2006-09-20 恩益禧电子股份有限公司 具有闪存的半导体设备
CN105637475A (zh) * 2014-09-16 2016-06-01 华为技术有限公司 并行访问方法及***
CN106095328A (zh) * 2015-04-29 2016-11-09 马维尔以色列(M.I.S.L.)有限公司 每个周期具有一个读端口和一个或多个写端口的多组存储器
US20170262646A1 (en) * 2016-03-11 2017-09-14 CNEXLABS, Inc. Computing system with non-orthogonal data protection mechanism and method of operation thereof
CN107888512A (zh) * 2017-10-20 2018-04-06 深圳市楠菲微电子有限公司 动态共享缓冲存储器及交换机
CN107948094A (zh) * 2017-10-20 2018-04-20 西安电子科技大学 一种高速数据帧无冲突入队处理的装置及方法

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118132012A (zh) * 2024-05-07 2024-06-04 杭州海康威视数字技术股份有限公司 数据写入方法、数据读取方法、***及电子设备

Also Published As

Publication number Publication date
CN113227984B (zh) 2023-12-15
CN113227984A (zh) 2021-08-06

Similar Documents

Publication Publication Date Title
CN114780458B (zh) 数据处理的方法和存储***
US8732360B2 (en) System and method for accessing memory
US20210342103A1 (en) Method and apparatus for performing multi-object transformations on a storage device
WO2016179968A1 (zh) 一种队列管理方法、装置及存储介质
WO2018107681A1 (zh) 一种队列操作中的处理方法、装置及计算机存储介质
US11915788B2 (en) Indication in memory system or sub-system of latency associated with performing an access command
WO2019061270A1 (zh) 数据缓存装置及控制方法、数据处理芯片、数据处理***
WO2018041074A1 (zh) 一种内存设备的访问方法、装置和***
US20180284993A1 (en) Performing data operations in a storage area network
US9092275B2 (en) Store operation with conditional push of a tag value to a queue
WO2017148292A1 (zh) 一种级联板、ssd远程共享访问的***和方法
WO2019153702A1 (zh) 一种中断处理方法、装置及服务器
US11425057B2 (en) Packet processing
WO2023179433A1 (zh) 流表存储及报文转发方法、装置、计算设备及介质
US9137780B1 (en) Synchronizing multicast data distribution on a computing device
CN110232029A (zh) 一种基于索引的fpga中ddr4包缓存的实现方法
US10241922B2 (en) Processor and method
CN108090018A (zh) 数据交换方法及***
US20200242040A1 (en) Apparatus and Method of Optimizing Memory Transactions to Persistent Memory Using an Architectural Data Mover
TWI763131B (zh) 網路介面裝置、包含該網路介面裝置之電子裝置,及網路介面裝置的操作方法
WO2020124609A1 (zh) 一种处理芯片、方法及相关设备
CN117591023A (zh) 一种基于硬件卸载的分散聚集列表查询写入读取方法及装置
CN109117288B (zh) 一种低延迟旁路的消息优化方法
WO2015000103A1 (en) System and method for data storage
WO2017219749A1 (zh) 一种缓存管理方法、装置及计算机存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18943670

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18943670

Country of ref document: EP

Kind code of ref document: A1