CN116627867B - Data interaction system, method, large-scale operation processing method, equipment and medium

Info

Publication number: CN116627867B (application number CN202310915197.0A)
Authority: CN (China)
Prior art keywords: memory, processor, read, PCIe endpoint, operation result
Legal status: Active
Other languages: Chinese (zh)
Other versions: CN116627867A
Inventors: 黄广奎, 李仁刚, 张闯, 王敏, 谢志勇
Current Assignee: Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee: Suzhou Inspur Intelligent Technology Co Ltd
Events: application filed by Suzhou Inspur Intelligent Technology Co Ltd; priority to CN202310915197.0A; publication of CN116627867A; application granted; publication of CN116627867B; anticipated expiration tracked

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 - Handling requests for interconnection or transfer
    • G06F 13/16 - Handling requests for interconnection or transfer for access to memory bus
    • G06F 13/1668 - Details of memory controller
    • G06F 13/38 - Information transfer, e.g. on bus
    • G06F 13/40 - Bus structure
    • G06F 13/4004 - Coupling between buses
    • G06F 13/4022 - Coupling between buses using switching circuits, e.g. switching matrix, connection or expansion network
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0602 - Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 - Improving I/O performance
    • G06F 3/0628 - Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0629 - Configuration or reconfiguration of storage systems
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Computer Hardware Design (AREA)
  • Multi Processors (AREA)

Abstract

The present invention relates to the field of communications technologies, and in particular to a data interaction system and method, a large-scale operation processing method, a device, and a medium. The system comprises: a first processor and a second processor, which execute different operations to generate an operation result and an operation completion indication; a first memory for storing the operation result; and a read-write controller connected to the first processor, the second processor and the first memory, with a built-in second memory that stores the operation completion indication and the storage information of the operation result in the first memory. The read-write controller gates the first memory to communicate with the processor that generated the operation result, so that this processor writes the operation result into the first memory; and, when the read-write controller detects that the operation completion indication and the storage information have been written into the second memory, it gates the first memory to communicate with the other processor, which then reads the operation result from the first memory. The scheme of the invention significantly improves data interaction efficiency.

Description

Data interaction system, method, large-scale operation processing method, equipment and medium
Technical Field
The present invention relates to the field of communications technologies, and in particular to a data interaction system, a data interaction method, a large-scale operation processing method, a device, and a medium.
Background
New technologies are developing rapidly, and algorithms in fields such as artificial intelligence and genetic analysis are growing ever larger in scale; for reasons of resources and the like, such algorithms are split to run on multiple processors. Splitting in this way reduces the difficulty of implementing the algorithm on each processor and improves overall efficiency. There are also secrecy-related algorithms that are split across multiple processors for the sake of algorithm confidentiality. Because one algorithm may need data produced by another, data interaction between the different processors is required.
At present, the conventional way to implement data interaction between processors is to add a dual-port RAM (also called DPRAM) chip between the two processors, as shown in fig. 1. The principle is as follows: CPU1 and CPU2 define part of the DPRAM space as a control space (e.g., the gray area in fig. 1) and the rest as a data space. CPU1/CPU2 learn the data information of the data space by monitoring the information in the control space, thereby operating correctly. For example, if CPU1 has data to pass to CPU2, CPU1 first writes the data into a segment of the DPRAM address space, then writes 1 to the CPU1 output control register and writes the start address and data length into the corresponding registers. After detecting that 1 has been written to the control register, CPU2 reads the start address and data length and reads the data from the corresponding DPRAM address space accordingly. However, the conventional processor data interaction approach has the following drawbacks: because of the capacity limitation of the DPRAM, the amount of data exchanged at one time is relatively small. When a large amount of data must be exchanged, the processor has to write data and control information to the DPRAM many times, which is inefficient and wastes processor resources; improvement is therefore needed.
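For concreteness, the conventional DPRAM handshake described above can be sketched in C. The control-space layout (flag, start address and length at fixed offsets), the helper names and the data-space offset are assumptions for illustration only; memory barriers and cache handling, which a real implementation would need, are omitted.

    #include <stdint.h>
    #include <string.h>

    /* Hypothetical control-space layout (byte offsets inside the DPRAM). */
    enum { CTRL_FLAG = 0x00, CTRL_ADDR = 0x04, CTRL_LEN = 0x08, DATA_OFF = 0x100 };

    static void write_reg(volatile uint8_t *dpram, uint32_t off, uint32_t v)
    {
        *(volatile uint32_t *)(dpram + off) = v;
    }

    static uint32_t read_reg(volatile uint8_t *dpram, uint32_t off)
    {
        return *(volatile uint32_t *)(dpram + off);
    }

    /* CPU1 side: copy the payload into the data space, then publish it. */
    void cpu1_send(volatile uint8_t *dpram, const uint8_t *buf, uint32_t len)
    {
        memcpy((void *)(dpram + DATA_OFF), buf, len);
        write_reg(dpram, CTRL_ADDR, DATA_OFF);  /* start address of the payload    */
        write_reg(dpram, CTRL_LEN, len);        /* data length                     */
        write_reg(dpram, CTRL_FLAG, 1);         /* write 1 on the control register */
    }

    /* CPU2 side: monitor the control space, then fetch the payload. */
    uint32_t cpu2_receive(volatile uint8_t *dpram, uint8_t *out)
    {
        while (read_reg(dpram, CTRL_FLAG) != 1)
            ;                                   /* poll until CPU1 has written 1   */
        uint32_t addr = read_reg(dpram, CTRL_ADDR);
        uint32_t len  = read_reg(dpram, CTRL_LEN);
        memcpy(out, (const void *)(dpram + addr), len);
        write_reg(dpram, CTRL_FLAG, 0);         /* clear the flag for the next round */
        return len;
    }

The sketch also makes the drawback visible: every transfer larger than the DPRAM data space forces the whole write-flag-poll-read cycle to be repeated.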
Disclosure of Invention
In view of the foregoing, it is desirable to provide a data interaction system, a data interaction method, a large-scale arithmetic processing device, and a medium.
According to a first aspect of the present invention there is provided a data interaction system, the system comprising:
a first processor and a second processor for performing different operations to generate an operation result and an operation completion indication, respectively;
the first memory is used for storing the operation result;
the read-write controller is respectively connected with the first processor, the second processor and the first memory, a second memory is arranged in the read-write controller, the second memory is a three-port memory, three ports are respectively connected with the first processor, the second processor and the read-write controller, and the second memory is used for storing the operation completion instruction and storing the storage information of the operation result in the first memory;
the read-write controller is used for gating the first memory to communicate with the processor generating the operation result so that the processor generating the operation result writes the operation result into the first memory; and when the read-write controller detects that the operation is completed and the stored information is written into the second memory, gating the first memory to communicate with another processor so that the other processor reads the operation result from the first memory based on the stored information in the second memory.
In some embodiments, the read-write controller includes a first PCIe endpoint and a second PCIe endpoint;
the first PCIe endpoint is connected with the PCIe root complex of the first processor, and the second PCIe endpoint is connected with the PCIe root complex of the second processor.
In some embodiments, the read-write controller further comprises a gating control unit;
the gating control unit is respectively connected with the first PCIe endpoint, the second PCIe endpoint and the first memory, and is used for gating the first processor to access the first memory through the first PCIe endpoint or gating the second processor to access the first memory through the second PCIe endpoint.
In some embodiments, the read-write controller further comprises an internal logic unit, the second memory comprising a first port, a second port, and a third port;
the first port is connected with the first PCIe endpoint, the second port is connected with the second PCIe endpoint, the third port is connected with the internal logic unit, and the internal logic unit is also connected with the first PCIe endpoint and the second PCIe endpoint respectively.
In some embodiments, the first processor is configured to:
Executing the first operation to generate a first operation result, and generating a first operation completion instruction when the first operation execution is completed;
sending a communication request to the gating control unit to write the first operation result into the first memory through the first PCIe endpoint, and acquiring a starting address and a data length of the first operation result in the first memory to obtain first storage information;
and writing the first storage information and the first operation completion instruction into preset addresses of the second memory respectively.
In some embodiments, the internal logic unit is configured to:
performing loop detection on the preset address of the second memory;
and responding to the writing of a first operation completion instruction in the preset address, sending a communication request to the gating control unit so as to send a first interrupt to the second processor through the second PCIe endpoint.
In some embodiments, the second processor is configured to:
in response to receiving a first interrupt, checking an interrupt source of the first interrupt to read a starting address and a data length of the first operation result from a preset address of the second memory;
And sending a communication request to the gating control unit to read the first operation result from the first memory through the second PCIe endpoint based on the starting address and the data length of the first operation result.
In some embodiments, the second processor is further configured to:
executing a second operation on the first operation result to generate a second operation result, and generating a second operation completion instruction when the second operation is completed;
sending a communication request to the gating control unit to write the second operation result into the first memory through the second PCIe endpoint, and acquiring the initial address and the data length of the second operation result in the first memory to obtain second storage information;
and writing the second storage information and the second operation completion instruction into preset addresses of the second memory respectively.
In some embodiments, the internal logic unit is further configured to:
and responding to the writing of a second operation completion instruction in the preset address, and sending a second interrupt to the first processor through the first PCIe endpoint.
In some embodiments, the first processor is further configured to:
In response to receiving a second interrupt, checking an interrupt source of the second interrupt to read a starting address and a data length of the second operation result from a preset address of the second memory;
and sending a communication request to the gating control unit to read the second operation result from the first memory through the first PCIe endpoint based on the starting address and the data length of the second operation result, and executing subsequent operation based on the second operation result.
In some embodiments, the gating control unit is configured to:
placing the communication requests sent by the first processor and the second processor into a queue according to time sequence;
and sequentially taking out communication requests from the queue, and gating the first memory to be connected with the first PCIe endpoint or gating the first memory to be connected with the second PCIe endpoint based on the communication requests.
In some embodiments, the gating control unit is further configured to:
if the sender responding to the communication request is a first processor, the first memory is gated to be connected with the first PCIe endpoint;
if the sender responding to the communication request is the second processor, the first memory is gated to connect with the second PCIe endpoint.
In some embodiments, the read-write controller is built using a field programmable gate array.
In some embodiments, the gating control unit is a Crossbar switching unit.
In some embodiments, the first processor and the second processor are each selected from any one of a central processing unit, a graphics processor, and an application specific integrated circuit.
In some embodiments, the first memory is a double rate synchronous dynamic random access memory and the second memory is a random access memory.
In some embodiments, the system further comprises a third memory;
the third memory is connected with the first processor, and is a double-rate synchronous dynamic random access memory and is used for storing data participating in calculation when the first processor executes operation.
In some embodiments, the system further comprises a fourth memory;
the fourth memory is connected with the second processor, and is a double-rate synchronous dynamic random access memory and is used for storing data participating in calculation when the second processor executes operation.
According to a second aspect of the present invention, there is provided a data interaction method comprising:
Performing different operations with the first processor and the second processor;
responding to the processor to generate an operation result and an operation completion instruction, and sending a communication request to the read-write controller;
the read-write controller gates the first memory to communicate with the processor generating the operation result based on the communication request;
the processor generating the operation result stores the operation result into a first memory, stores the operation completion instruction and the storage information of the operation result in the first memory into a second memory which is arranged in the read-write controller, wherein the second memory is a three-port memory, and three ports are respectively connected with the first processor, the second processor and the read-write controller; and
in response to the read-write controller detecting the operation completion indication and the stored information being written to the second memory, the first memory is gated to communicate with another processor to cause the other processor to read the operation result from the first memory based on the stored information in the second memory.
According to a third aspect of the present invention, there is provided a large-scale operation processing method, the large-scale operation including a first operation and a second operation involving calculation using a result of execution of the first operation, the method comprising:
The first processor and the second processor of the data interaction system respectively process the first operation and the second operation, and exchange operation results between the first operation and the second operation to perform subsequent operations based on the exchanged operation results.
According to a fourth aspect of the present invention, there is also provided an electronic device including:
at least one processor; and
a memory storing a computer program runnable on the processor, where the processor performs the aforementioned data interaction method when executing the program.
According to a fifth aspect of the present invention there is also provided a computer readable storage medium storing a computer program which when executed by a processor performs the aforementioned data interaction method.
According to the data interaction system, a read-write controller is arranged between the first processor and the second processor, the first memory is mounted on the read-write controller to store the operation results, the second memory built into the read-write controller stores the storage information of the operation results and the operation completion indication, and the first processor and the second processor are alternately connected to the first memory through detection of the operation completion indication. The first processor and the second processor therefore do not need to execute data read-write control operations, which saves computing resources, remarkably improves data interaction efficiency, and gives the scheme good universality.
In addition, the invention also provides a data interaction method, a large-scale operation processing method, an electronic device and a computer readable storage medium, which can also realize the technical effects, and are not repeated here.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a conventional processor data interaction implementation;
FIG. 2 is a schematic diagram of a data interaction system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another data interaction system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a second memory connection built in a read-write controller according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data interaction system with dual CPUs according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of algorithm allocation of dual CPUs according to an embodiment of the present invention;
fig. 7 is a flow chart of a data interaction method according to another embodiment of the present invention;
fig. 8 is an internal structural diagram of an electronic device according to another embodiment of the present invention;
fig. 9 is a block diagram of a computer readable storage medium according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used to distinguish two entities or parameters that share the same name but are different. The terms "first" and "second" are only used for convenience of expression and should not be construed as limiting the embodiments of the present invention, and this will not be repeated in the following embodiments.
In one embodiment, referring to FIG. 2, the present invention provides a data interaction system 100, specifically, the system comprises:
a first processor 110 and a second processor 120, the first processor 110 and the second processor 120 configured to perform different operations to generate an operation result and an operation completion indication, respectively;
In this embodiment, the first processor and the second processor may be existing chips for performing data operations, such as a central processing unit (Central Processing Unit, abbreviated as CPU), a graphics processor (Graphics Processing Unit, abbreviated as GPU), a Microprocessor/microcontroller (Microprocessor/Micro controller Unit, abbreviated as MPU/MCU), a neural network processor (Neural Network Processing Unit, abbreviated as NPU), a tensor processor (Tensor Processing Unit, abbreviated as TPU), and an Application-specific integrated circuit (Application-specific integrated circuit, ASIC). It should be noted that the first processor 110 and the second processor 120 may be the same chip or different chips, for example, the first processor 110 and the second processor 120 may be both CPUs, or the first processor 110 may be, for example, a CPU, and the second processor 120 may be an ASIC. In this embodiment, the operations may be data addition, data multiplication, data inversion, etc., and may also be a certain data processing operation in the encryption and decryption processes, where in the specific implementation process, the operations executed by the first processor 110 and the second processor 120 may be allocated according to the application scenario and the processing task.
A first memory 130, where the first memory 130 is used to store the operation result. The first memory 130 is preferably a direct access memory (Direct Access Memory, abbreviated as DAM), and its capacity may be selected according to the size of the interaction data or the requirements of the interaction scenario; for example, the capacity may be set according to the maximum amount of data that the first processor 110 and the second processor 120 need to transfer during data interaction, and, to ensure that the data exchange can be completed in one interaction, the capacity of the first memory 130 may be set relatively large.
A read-write controller 140, wherein the read-write controller 140 is respectively connected with the first processor 110, the second processor 120 and the first memory 130, a second memory 141 is built in the read-write controller 140, the second memory 141 is a three-port memory, three ports are respectively connected with the first processor 110, the second processor 120 and the read-write controller 140, and the second memory 141 is used for storing the operation completion instruction and storing the storage information of the operation result in the first memory 130;
in the present embodiment, the second memory 141 is preferably a random access memory (Random Access Memory, simply referred to as RAM), and since the second memory 141 only stores the storage information of the operation result, its capacity can be smaller than that of the first memory 130. Through the three ports of the second memory 141, the first processor 110, the second processor 120 and the read-write controller 140 can all communicate with the second memory 141: the first processor 110 and the second processor 120 can write data into and read data from the second memory 141 through two of the ports, and the read-write controller 140 can read data from the second memory 141 through its own connection port. The operation completion indication may be represented by a one-bit binary number; for example, after the processor finishes an operation it generates a signal and assigns it the value "1" to indicate that the operation is complete. It should be noted that the specific form and assignment of the operation completion indication in this embodiment are only illustrative.
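The contents kept in the second memory 141 can be pictured as a small control block at a preset address. The field names and the fixed offsets below are assumptions for illustration; the embodiment only requires that the operation completion indication and the storage information (start address and data length) sit at preset addresses known to both processors and to the read-write controller.

    #include <stdint.h>

    /* One direction of the control block kept in the second memory. */
    typedef struct {
        volatile uint32_t done;        /* operation completion indication, 1 = operation finished   */
        volatile uint32_t start_addr;  /* start address of the operation result in the first memory */
        volatile uint32_t data_len;    /* data length of the operation result, in bytes             */
    } tpram_ctrl_t;

    /* Hypothetical preset offsets inside the second memory, one per direction. */
    #define TPRAM_CPU1_TO_CPU2  0x0000u   /* written by the first processor, read by the second */
    #define TPRAM_CPU2_TO_CPU1  0x0040u   /* written by the second processor, read by the first */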
Wherein the read-write controller 140 is configured to gate the first memory 130 to communicate with the processor that generates the operation result (referred to below as the target processor), so that the target processor writes the operation result into the first memory 130; and, when the read-write controller 140 detects that the operation completion instruction and the stored information have been written into the second memory 141, to gate the first memory 130 to communicate with the other processor, so that the other processor reads the operation result from the first memory 130 based on the stored information in the second memory 141.
In this embodiment, the target processor is the processor that generates an operation result, i.e. the processor that produces the data to be exchanged, and the other processor is the processor that needs to acquire the exchanged data. Both the first processor 110 and the second processor 120 may act as the target processor. For example, consider three sequentially executed operations 1, 2 and 3, where operation 2 needs the execution result of operation 1 and operation 3 needs the execution result of operation 2. Assume the first processor 110 executes operations 1 and 3 and the second processor 120 executes operation 2; then the first processor 110 is the target processor when it completes operation 1, and, likewise, the second processor 120 becomes the new target processor when it acquires the execution result of operation 1 and executes operation 2.
According to the above data interaction system, a read-write controller is arranged between the first processor and the second processor, the first memory is mounted on the read-write controller to store the operation results, the second memory built into the read-write controller stores the storage information of the operation results and the operation completion indication, and the first processor and the second processor are alternately connected to the first memory through detection of the operation completion indication. The first processor and the second processor therefore do not need to execute data read-write control operations, which saves computing resources, remarkably improves data interaction efficiency, and gives the scheme good universality.
In some embodiments, referring to fig. 3, the read/write controller 240 is built using a field programmable gate array.
In some embodiments, in order to facilitate understanding of the present invention, a field programmable gate array (Field Programmable Gate Array, abbreviated as FPGA) is taken as an example of a read-write controller, and the connection relationship and functions of the first processor 210, the second processor 220, and the first memory 230 refer to the definitions of the first processor 110, the second processor 120, and the first memory 130 in fig. 2, and the read-write controller 240 of this embodiment has a built-in second memory 241 (refer to the definition of the second memory 141 in fig. 2), and specifically, the read-write controller 240 includes a first PCIe endpoint 242 and a second PCIe endpoint 243; the PCIe endpoint is an End Point (EP for short).
The first PCIe endpoint 242 is connected to the PCIe root complex 211 of the first processor, and the second PCIe endpoint 243 is connected to the PCIe root complex 221 of the second processor. A PCIe root complex is also referred to as a Root Complex (abbreviated as RC).
In some embodiments, as shown in fig. 3, the read-write controller 240 further includes a strobe control unit 244;
the gating control unit 244 is respectively connected to the first PCIe endpoint 242, the second PCIe endpoint 243 and the first memory 230, and the gating control unit 244 is configured to gate the first processor 210 to access the first memory 230 through the first PCIe endpoint 242 or to gate the second processor 220 to access the first memory 230 through the second PCIe endpoint 243.
In some embodiments, referring to fig. 4, the read/write controller 240 further includes an internal logic unit 245, and the second memory 241 includes a first port 2411, a second port 2412, and a third port 2413;
the first port 2411 is connected to the first PCIe endpoint 242, the second port 2412 is connected to the second PCIe endpoint 243, the third port 2413 is connected to the internal logic unit 245, and the internal logic unit 245 is also connected to the first PCIe endpoint 242 and the second PCIe endpoint 243, respectively.
In some embodiments, referring to fig. 3 and 4, the first processor 210 is configured to:
executing the first operation to generate a first operation result, and generating a first operation completion instruction when the first operation execution is completed;
sending a communication request to the gating control unit 244 to write the first operation result into the first memory 230 through the first PCIe endpoint 242, and obtaining a start address and a data length of the first operation result in the first memory 230 to obtain first storage information;
the first stored information and the first operation completion instruction are written into preset addresses of the second memory 241, respectively.
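A minimal writer-side sketch of these three clauses in C follows. The tpram_ctrl_t layout and dma_write_to_ddr() are hypothetical stand-ins for the BAR-mapped second memory and for a vendor DMA helper that also issues the communication request to the gating control unit 244; the ordering of the writes is the important part.

    #include <stdint.h>
    #include <stddef.h>

    /* Assumed layout of the control block at the preset address of the second memory. */
    typedef struct {
        volatile uint32_t done;        /* first operation completion indication           */
        volatile uint32_t start_addr;  /* start address of the result in the first memory */
        volatile uint32_t data_len;    /* data length of the result, in bytes             */
    } tpram_ctrl_t;

    /* Assumed vendor helper: DMA `len` bytes to `ddr_addr` through the first PCIe
     * endpoint; the communication request to the gating control unit is taken to
     * happen inside this call. Returns 0 on success. */
    extern int dma_write_to_ddr(uint32_t ddr_addr, const void *src, size_t len);

    int cpu1_publish_result(const void *result, size_t len,
                            uint32_t ddr_addr, tpram_ctrl_t *ctrl)
    {
        /* Write the first operation result into the first memory. */
        if (dma_write_to_ddr(ddr_addr, result, len) != 0)
            return -1;

        /* Record the first storage information (start address and data length). */
        ctrl->start_addr = ddr_addr;
        ctrl->data_len   = (uint32_t)len;

        /* Raise the completion indication last, so the internal logic unit never
         * sees "done" before the storage information is in place. */
        ctrl->done = 1;
        return 0;
    }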
In some embodiments, referring to fig. 3 and 4, the internal logic unit 245 is configured to:
performing loop detection on the preset address of the second memory 241;
in response to writing the first operation complete indication in the preset address, a communication request is sent to the gating control unit 244 to send a first interrupt to the second processor 220 through the second PCIe endpoint 243.
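The internal logic unit is FPGA logic, but its behaviour can be mirrored by a small C polling routine, assuming hypothetical helpers for the communication request to the gating control unit and for raising the interrupt through the second PCIe endpoint.

    #include <stdint.h>
    #include <stdbool.h>

    extern volatile uint32_t *tpram_done_cpu1;  /* preset address written by the first processor            */
    extern void request_path_to_ep2(void);      /* assumed: communication request to the gating control unit */
    extern void send_irq_to_cpu2(void);         /* assumed: first interrupt via the second PCIe endpoint     */

    /* Called repeatedly (loop detection); signals each completion exactly once. */
    void internal_logic_poll(bool *signalled)
    {
        if (!*signalled && *tpram_done_cpu1 == 1) {
            request_path_to_ep2();
            send_irq_to_cpu2();
            *signalled = true;
        } else if (*tpram_done_cpu1 == 0) {
            *signalled = false;                 /* re-arm once the flag has been cleared */
        }
    }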
In some embodiments, referring to fig. 3 and 4, the second processor 220 is configured to:
In response to receiving the first interrupt, checking an interrupt source of the first interrupt to read a start address and a data length of the first operation result from a preset address of the second memory 241;
a communication request is sent to the gating control unit 244 to read the first operation result from the first memory 230 through the second PCIe endpoint 243 based on the start address and the data length of the first operation result.
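A matching reader-side sketch for the second processor, again with hypothetical helpers: irq_source_is_tpram_done() stands for the interrupt-source check, and dma_read_from_ddr() for a DMA read through the second PCIe endpoint that also issues the communication request to the gating control unit. The transfer bound and buffer are assumptions.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    typedef struct {
        volatile uint32_t done;        /* completion indication                 */
        volatile uint32_t start_addr;  /* where the result sits in the first memory */
        volatile uint32_t data_len;    /* result length in bytes                */
    } tpram_ctrl_t;

    extern bool irq_source_is_tpram_done(int irq);                           /* assumed */
    extern int  dma_read_from_ddr(uint32_t ddr_addr, void *dst, size_t len); /* assumed */

    #define MAX_XFER (1u << 20)        /* assumed upper bound for one exchange */
    static uint8_t rx_buf[MAX_XFER];

    void cpu2_irq_handler(int irq, tpram_ctrl_t *ctrl)
    {
        if (!irq_source_is_tpram_done(irq))     /* check the interrupt source */
            return;

        uint32_t addr = ctrl->start_addr;       /* first storage information  */
        uint32_t len  = ctrl->data_len;

        if (len <= MAX_XFER && dma_read_from_ddr(addr, rx_buf, len) == 0) {
            /* run the second operation on rx_buf[0..len) here */
        }
        ctrl->done = 0;                         /* consume the completion indication */
    }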
In some embodiments, referring to fig. 3, the second processor 220 is further configured to:
executing a second operation on the first operation result to generate a second operation result, and generating a second operation completion instruction when the second operation is completed;
sending a communication request to the gating control unit 244 to write the second operation result into the first memory 230 through the second PCIe endpoint 243, and obtaining a start address and a data length of the second operation result in the first memory 230 to obtain second storage information;
and writing the second storage information and the second operation completion instruction into preset addresses of the second memory 241 respectively.
In some embodiments, referring to fig. 3, the internal logic unit 245 is further configured to:
In response to writing a second operation complete indication in the preset address, a second interrupt is sent to the first processor 210 through the first PCIe endpoint 242.
In some embodiments, the first processor 210 is further configured to:
in response to receiving the second interrupt, checking an interrupt source of the second interrupt to read a start address and a data length of the second operation result from a preset address of the second memory 241;
a communication request is sent to the gating control unit 244 to read the second operation result from the first memory 230 through the first PCIe endpoint 242 based on the start address and data length of the second operation result and to perform a subsequent operation based on the second operation result.
In some embodiments, the gating control unit 244 is configured to:
placing the communication requests sent by the first processor 210 and the second processor 220 into a queue according to time sequence;
the communication requests are sequentially dequeued and either the first memory 230 is gated to connect with the first PCIe endpoint 242 or the first memory 230 is gated to connect with the second PCIe endpoint 243 based on the communication requests.
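The gating control unit itself is hardware (a Crossbar, see below), but the described arbitration can be modelled in C as a simple ring-buffer queue served in arrival order; the queue depth and switch_ddr_to_ep() are assumptions.

    #include <stdbool.h>

    typedef enum { SENDER_CPU1, SENDER_CPU2 } sender_t;

    #define QLEN 16                        /* assumed queue depth */
    static sender_t queue[QLEN];
    static unsigned head, tail;

    extern void switch_ddr_to_ep(int ep);  /* assumed: 1 = first PCIe endpoint, 2 = second */

    /* Called when a communication request arrives; returns false if the queue is full. */
    bool gating_enqueue(sender_t who)
    {
        if ((tail + 1) % QLEN == head)
            return false;
        queue[tail] = who;
        tail = (tail + 1) % QLEN;
        return true;
    }

    /* Takes requests out in time order and gates the first memory accordingly. */
    void gating_serve_next(void)
    {
        if (head == tail)
            return;                        /* nothing pending */
        sender_t who = queue[head];
        head = (head + 1) % QLEN;
        switch_ddr_to_ep(who == SENDER_CPU1 ? 1 : 2);
    }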
In some embodiments, the gating control unit 244 is further configured to:
if the sender responding to the communication request is the first processor 210, the first memory 230 is gated to connect with the first PCIe endpoint 242;
if the sender responding to the communication request is the second processor 220, the first memory 230 is gated to connect with the second PCIe endpoint 243.
In some embodiments, the gating control unit is a Crossbar switch unit; the Crossbar is typically provided by the FPGA vendor and is an IP module capable of interconnecting multiple AXI-4 interfaces.
In some embodiments, the first processor and the second processor are each selected from any one of a central processing unit, a graphics processor, and an application specific integrated circuit.
In some embodiments, the first memory is a double rate synchronous dynamic random access memory and the second memory is a random access memory. Preferably, the first memory may employ DDR4.
In some embodiments, referring to fig. 5, the system further includes a third memory 250;
the third memory 250 is connected to the first processor 210, and the third memory 250 is a double-rate synchronous dynamic random access memory, and is used for storing data participating in calculation when the first processor 210 performs operation.
In some embodiments, as shown in FIG. 5, the system further includes a fourth memory 260;
the fourth memory 260 is connected to the second processor 220, and the fourth memory 260 is a double-rate synchronous dynamic random access memory, and is used for storing data participating in calculation when the second processor 220 performs operation.
In another embodiment, to facilitate understanding of the scheme of the present invention, a scenario in which two CPUs perform an encryption operation is taken as an example. Referring again to fig. 5, this embodiment proposes a data interaction system that uses the PCIe channels of an FPGA in combination with DDR4 to handle the data interaction between the two CPUs. The overall architecture of the system includes the following parts:
CPU1 and CPU2, whose PCIe Root Complex ends are denoted RC1 and RC2, respectively;
the FPGA, which includes the following components: two PCIe endpoints with DMA capability provided by the FPGA vendor, denoted EP1 and EP2, where EP1 is connected to RC1 and EP2 is connected to RC2; a three-port random access memory (Triple Port Random Access Memory, TPRAM for short), a memory module whose three ports support independent reads and writes, used to control the data interaction flow; an internal logic unit, i.e. the FPGA internal processing logic configured for the CPU algorithm; and a Crossbar, an IP module capable of connecting several AXI-4 interfaces, which can be provided by the FPGA vendor. The three ports of the TPRAM are connected to EP1, EP2 and the internal logic unit respectively, and the internal logic unit is also connected to EP1 and EP2 respectively.
One DDR4 is mounted on the FPGA through the Crossbar and is denoted DDR4 1; the Crossbar is also connected to EP1 and EP2 respectively, so DDR4 1 can be connected to either CPU1 or CPU2. The other two DDR4 modules are mounted on CPU1 and CPU2 respectively and are denoted DDR4 2 and DDR4 3.
the operation of the system will be described in detail as follows:
in the first step, CPU1 and CPU2 determine the algorithm segmentation and the data interaction flow. As shown in fig. 6, assume the encryption algorithm includes 2n sequentially executed operations, each of which depends on the execution result of the previous one; the odd-numbered operations may be allocated to CPU1 and the even-numbered operations to CPU2, so CPU1 and CPU2 will repeatedly perform data interaction.
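A scheduling sketch of this static split, assuming placeholder functions for the node computations and for the exchange described in the following steps; in the real system the two CPUs run concurrently rather than from a single loop.

    #include <stddef.h>

    extern void run_on_cpu1(size_t node);  /* placeholder: executes an odd-numbered node              */
    extern void run_on_cpu2(size_t node);  /* placeholder: executes an even-numbered node             */
    extern void exchange(size_t node);     /* placeholder: DDR4 1/TPRAM exchange of the node result   */

    /* Static split of 2n sequentially dependent operations. */
    void schedule_nodes(size_t two_n)
    {
        for (size_t node = 1; node <= two_n; ++node) {
            if (node % 2 == 1)
                run_on_cpu1(node);         /* odd order  -> CPU1 */
            else
                run_on_cpu2(node);         /* even order -> CPU2 */
            exchange(node);                /* each node feeds the next one */
        }
    }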
In the second step, after CPU1 finishes algorithm node 1, it transfers the intermediate data produced by that node to DDR4 1 on the FPGA through a DMA flow.
In the third step, CPU1 writes the completion indication of algorithm node 1, together with the start position of the processed data in DDR4 1 and the total data length, into a predefined address of the TPRAM.
In the fourth step, after the internal logic unit detects the completion indication of CPU1 node 1, it sets the corresponding interrupt indication and sends an interrupt to CPU2.
In the fifth step, after CPU2 receives the interrupt, it checks the interrupt source, reads the start address and data length of the CPU1 node 1 data from the TPRAM, and then reads the data into CPU2 through DMA for further processing or application.
In the sixth step, after CPU2 finishes processing the data, the data produced by CPU2 algorithm node 1 is transferred to DDR4 1 on the FPGA through DMA.
In the seventh step, CPU2 writes the completion indication of its algorithm node 1, the start position of the data in DDR4 1 and the total data length into the position corresponding to CPU2 node 1 in the TPRAM.
In the eighth step, after the internal logic unit detects the completion indication of CPU2 node 1, it sets the corresponding interrupt indication and sends an interrupt to CPU1.
In the ninth step, after CPU1 receives the interrupt, it checks the interrupt source, reads the start address and data length of the CPU2 node 1 data from the TPRAM, and reads the data into CPU1 through DMA for further processing or application.
In the tenth step, depending on the specific division of the algorithm, the above steps may be repeated until all algorithm nodes have been executed.
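One complete round trip seen from CPU1, tying the second, third, eighth and ninth steps together; the tpram_ctrl_t layout and the DMA/interrupt helpers are the same assumptions used in the earlier sketches.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        volatile uint32_t done;
        volatile uint32_t start_addr;
        volatile uint32_t data_len;
    } tpram_ctrl_t;

    extern int  dma_write_to_ddr(uint32_t ddr_addr, const void *src, size_t len);  /* assumed */
    extern int  dma_read_from_ddr(uint32_t ddr_addr, void *dst, size_t len);       /* assumed */
    extern void wait_for_irq_from_fpga(void);   /* assumed: blocks until EP1 raises the interrupt */

    /* Publish the node output (second and third steps), then collect CPU2's
     * answer after the interrupt (eighth and ninth steps). */
    int cpu1_exchange_round(const void *out, size_t out_len, uint32_t ddr_addr,
                            tpram_ctrl_t *to_cpu2, tpram_ctrl_t *from_cpu2,
                            void *in, size_t in_cap, size_t *in_len)
    {
        if (dma_write_to_ddr(ddr_addr, out, out_len) != 0)   /* second step */
            return -1;
        to_cpu2->start_addr = ddr_addr;                      /* third step  */
        to_cpu2->data_len   = (uint32_t)out_len;
        to_cpu2->done       = 1;

        wait_for_irq_from_fpga();                /* fourth to eighth steps happen on the FPGA and CPU2 */
        if (from_cpu2->data_len > in_cap)
            return -1;
        if (dma_read_from_ddr(from_cpu2->start_addr, in, from_cpu2->data_len) != 0)
            return -1;                                       /* ninth step  */
        *in_len = from_cpu2->data_len;
        from_cpu2->done = 0;
        return 0;
    }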
The data interaction system of this embodiment has at least the following beneficial technical effects: the dual-PCIe-port FPGA device lets the two CPU systems work cooperatively and offers good universality; in addition, exchanging data by DMA improves data interaction efficiency, and the large DDR4 capacity on the FPGA allows large-scale data to be exchanged in one pass, so the scheme can be applied to large-scale algorithm models or used to protect an algorithm.
In yet another embodiment, referring to fig. 7, the present embodiment provides a data interaction method 300, which includes the following steps:
step 301, performing different operations by using the first processor and the second processor;
step 302, in response to the processor generating an operation result and an operation completion instruction, sending a communication request to the read-write controller;
step 303, the read-write controller gates the first memory to communicate with the processor generating the operation result based on the communication request;
step 304, a processor generating the operation result stores the operation result in a first memory, stores the operation completion instruction and the storage information of the operation result in the first memory in a second memory built in the read-write controller, wherein the second memory is a three-port memory, and three ports are respectively connected with the first processor, the second processor and the read-write controller; and
in response to the read-write controller detecting the operation completion instruction and the stored information being written into the second memory, the first memory is gated to communicate with another processor to cause the other processor to read the operation result from the first memory based on the stored information in the second memory, step 305.
According to the above data interaction method, a read-write controller is arranged between the first processor and the second processor, the first memory is mounted on the read-write controller to store the operation result, the second memory built into the read-write controller stores the storage information of the operation result and the operation completion indication, and the first processor and the second processor are alternately connected to the first memory by detecting the operation completion indication. The first processor and the second processor therefore do not need to execute data read-write control operations, which saves computing resources, remarkably improves data interaction efficiency, and offers better universality.
It should be noted that, the specific limitation of the data interaction method may be referred to the limitation of the data interaction system hereinabove, and will not be described herein.
In still another embodiment, the present invention also provides a large-scale operation processing method including a first operation and a second operation involving calculation using a result of the first operation, the method including: the first processor and the second processor of the data interaction system described in the above embodiments process the first operation and the second operation, respectively, and exchange the operation result between the two to perform subsequent operations based on the exchanged operation result.
According to another aspect of the present invention, there is provided an electronic device, which may be a server, and an internal structure thereof is shown in fig. 8. The electronic device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the electronic device is configured to provide computing and control capabilities. The memory of the electronic device includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the electronic device is for storing data. The network interface of the electronic device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements the data interaction method described above, and specifically comprises the steps of:
performing different operations with the first processor and the second processor;
responding to the processor to generate an operation result and an operation completion instruction, and sending a communication request to the read-write controller;
the read-write controller gates the first memory to communicate with the target processor generating the operation result based on the communication request;
The target processor stores the operation result into a first memory, stores the operation completion instruction and the storage information of the operation result in the first memory into a second memory which is arranged in the read-write controller, wherein the second memory is a three-port memory, and three ports are respectively connected with the first processor, the second processor and the read-write controller; and
in response to the read-write controller detecting the operation completion indication and the stored information being written to the second memory, the first memory is gated to communicate with another processor to cause the other processor to read the operation result from the first memory based on the stored information in the second memory.
According to still another aspect of the present invention, a computer readable storage medium is provided, as shown in fig. 9, on which a computer program is stored, the computer program implementing the data interaction method described above when executed by a processor, specifically including performing the steps of:
performing different operations with the first processor and the second processor;
responding to the processor to generate an operation result and an operation completion instruction, and sending a communication request to the read-write controller;
The read-write controller gates the first memory to communicate with the target processor generating the operation result based on the communication request;
the target processor stores the operation result into a first memory, stores the operation completion instruction and the storage information of the operation result in the first memory into a second memory which is arranged in the read-write controller, wherein the second memory is a three-port memory, and three ports are respectively connected with the first processor, the second processor and the read-write controller; and
in response to the read-write controller detecting the operation completion indication and the stored information being written to the second memory, the first memory is gated to communicate with another processor to cause the other processor to read the operation result from the first memory based on the stored information in the second memory.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope of this description.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (19)

1. A data interaction system, the system comprising:
a first processor and a second processor for performing different operations to generate an operation result and an operation completion indication, respectively;
the first memory is used for storing the operation result, wherein the first memory is a direct access memory;
The read-write controller is respectively connected with the first processor, the second processor and the first memory, a second memory is arranged in the read-write controller, the second memory is a three-port memory, three ports are respectively connected with the first processor, the second processor and the read-write controller, the second memory is used for storing the operation completion instruction and storing the storage information of the operation result in the first memory, wherein the second memory is a random access memory, and the capacity of the second memory is smaller than that of the first memory;
wherein, the read-write controller is used for: gating the first memory to communicate with a processor that generates the operation result, such that the processor that generates the operation result writes the operation result to the first memory; and when the read-write controller detects that the operation is completed and the stored information is written into the second memory, gating the first memory to communicate with another processor so that the other processor reads the operation result from the first memory based on the stored information in the second memory;
The read-write controller comprises a first PCIe endpoint and a second PCIe endpoint;
the first PCIe endpoint is connected with the PCIe root complex of the first processor, and the second PCIe endpoint is connected with the PCIe root complex of the second processor;
the read-write controller also comprises a gating control unit;
the gating control unit is respectively connected with the first PCIe endpoint, the second PCIe endpoint and the first memory, and is used for gating the first processor to access the first memory through the first PCIe endpoint or gating the second processor to access the first memory through the second PCIe endpoint;
the read-write controller also comprises an internal logic unit, and the second memory comprises a first port, a second port and a third port;
the first port is connected with the first PCIe endpoint, the second port is connected with the second PCIe endpoint, the third port is connected with the internal logic unit, and the internal logic unit is also connected with the first PCIe endpoint and the second PCIe endpoint respectively.
2. The data interaction system of claim 1, wherein the first processor is configured to:
Executing the first operation to generate a first operation result, and generating a first operation completion instruction when the first operation execution is completed;
sending a communication request to the gating control unit to write the first operation result into the first memory through the first PCIe endpoint, and acquiring a starting address and a data length of the first operation result in the first memory to obtain first storage information;
and writing the first storage information and the first operation completion instruction into preset addresses of the second memory respectively.
3. The data interaction system of claim 2, wherein the internal logic unit is configured to:
performing loop detection on the preset address of the second memory;
and responding to the writing of a first operation completion instruction in the preset address, sending a communication request to the gating control unit so as to send a first interrupt to the second processor through the second PCIe endpoint.
4. A data interaction system as claimed in claim 3, wherein the second processor is configured to:
in response to receiving a first interrupt, checking an interrupt source of the first interrupt to read a starting address and a data length of the first operation result from a preset address of the second memory;
And sending a communication request to the gating control unit to read the first operation result from the first memory through the second PCIe endpoint based on the starting address and the data length of the first operation result.
5. The data interaction system of claim 4, wherein the second processor is further configured to:
executing a second operation on the first operation result to generate a second operation result, and generating a second operation completion instruction when the second operation is completed;
sending a communication request to the gating control unit to write the second operation result into the first memory through the second PCIe endpoint, and acquiring the initial address and the data length of the second operation result in the first memory to obtain second storage information;
and writing the second storage information and the second operation completion instruction into preset addresses of the second memory respectively.
6. The data interaction system of claim 5, wherein the internal logic unit is further configured to:
and responding to the writing of a second operation completion instruction in the preset address, and sending a second interrupt to the first processor through the first PCIe endpoint.
7. The data interaction system of claim 6, wherein the first processor is further configured to:
in response to receiving a second interrupt, checking an interrupt source of the second interrupt to read a starting address and a data length of the second operation result from a preset address of the second memory;
and sending a communication request to the gating control unit to read the second operation result from the first memory through the first PCIe endpoint based on the starting address and the data length of the second operation result, and executing subsequent operation based on the second operation result.
8. The data interaction system of claim 7, wherein the gating control unit is further configured to:
placing the communication requests sent by the first processor and the second processor into a queue according to time sequence;
and sequentially taking out communication requests from the queue, and gating the first memory to be connected with the first PCIe endpoint or gating the first memory to be connected with the second PCIe endpoint based on the communication requests.
9. The data interaction system of claim 8, wherein the gating control unit is further configured to:
If the sender responding to the communication request is a first processor, the first memory is gated to be connected with the first PCIe endpoint;
if the sender responding to the communication request is a second processor, the first memory is gated to connect with the second PCIe endpoint.
10. The data interaction system of claim 1, wherein the read-write controller is built using a field programmable gate array.
11. The data interaction system of claim 10, wherein the gating control unit is a Crossbar switching unit.
12. The data interaction system of any of claims 1-11, wherein the first processor and the second processor are each any one of a central processing unit, a graphics processor, and an application-specific integrated circuit.
13. The data interaction system of any of claims 1-11, wherein the first memory is a double data rate synchronous dynamic random access memory and the second memory is a random access memory.
14. The data interaction system of any of claims 1-11, wherein the system further comprises a third memory;
the third memory is connected with the first processor, is a double data rate synchronous dynamic random access memory, and is used for storing data participating in calculation when the first processor executes operations.
15. The data interaction system of any of claims 1-11, wherein the system further comprises a fourth memory;
the fourth memory is connected with the second processor, is a double data rate synchronous dynamic random access memory, and is used for storing data participating in calculation when the second processor executes operations.
16. A data interaction method, characterized in that the data interaction method comprises:
performing different operations with a first processor and a second processor respectively;
in response to a processor generating an operation result and an operation completion instruction, sending a communication request to a read-write controller;
the read-write controller gating a first memory, based on the communication request, to communicate with the processor that generated the operation result, wherein the first memory is a direct access memory;
the processor that generated the operation result storing the operation result into the first memory, and storing the operation completion instruction and the storage information of the operation result in the first memory into a second memory arranged in the read-write controller, wherein the second memory is a three-port random access memory whose three ports are respectively connected with the first processor, the second processor and the read-write controller, and whose capacity is smaller than that of the first memory; and
in response to the read-write controller detecting that the operation completion instruction and the storage information have been written into the second memory, gating the first memory to communicate with the other processor so that the other processor reads the operation result from the first memory based on the storage information in the second memory;
the read-write controller comprises a first PCIe endpoint and a second PCIe endpoint;
the first PCIe endpoint is connected with the PCIe root complex of the first processor, and the second PCIe endpoint is connected with the PCIe root complex of the second processor;
the read-write controller also comprises a gating control unit;
the gating control unit is respectively connected with the first PCIe endpoint, the second PCIe endpoint and the first memory, and is used for gating the first processor to access the first memory through the first PCIe endpoint or gating the second processor to access the first memory through the second PCIe endpoint;
the read-write controller also comprises an internal logic unit, and the second memory comprises a first port, a second port and a third port;
the first port is connected with the first PCIe endpoint, the second port is connected with the second PCIe endpoint, the third port is connected with the internal logic unit, and the internal logic unit is also connected with the first PCIe endpoint and the second PCIe endpoint respectively.
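Seen from either processor, the claim-16 method reduces to a publish-notify-read round. The sketch below strings those steps together; every helper is an assumed wrapper around the read-write controller, and the flag value 0x1u is arbitrary, not a value taken from the patent.

    #include <stdint.h>

    extern uint64_t store_result_in_first_memory(const void *res, uint32_t len);           /* returns the starting address used */
    extern void     store_info_in_second_memory(uint64_t start, uint32_t len, uint32_t f); /* storage information + completion instruction */
    extern void     wait_for_peer_interrupt(void);                                         /* blocks until the controller interrupts us */
    extern void     read_peer_result(void *dst, uint32_t max_len);                         /* gated read of the peer's result */

    void exchange_round(const void *my_result, uint32_t len, void *peer_result, uint32_t peer_max)
    {
        uint64_t start = store_result_in_first_memory(my_result, len); /* operation result into the shared first memory */
        store_info_in_second_memory(start, len, 0x1u);                  /* the read-write controller detects this write */
        wait_for_peer_interrupt();                                      /* peer is gated in, computes, publishes its result */
        read_peer_result(peer_result, peer_max);                        /* subsequent operation uses the exchanged result */
    }
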
17. A large-scale operation processing method, characterized in that the large-scale operation comprises a first operation and a second operation whose calculation uses the result of the first operation, the method comprising:
processing the first operation and the second operation with the first processor and the second processor of the data interaction system of any of claims 1-15, respectively, and exchanging the operation results between the two processors so that subsequent operations are performed based on the exchanged operation results.
18. An electronic device, comprising:
at least one processor; and
a memory storing a computer program executable by the processor, wherein the processor performs the data interaction method of claim 16 when executing the program.
19. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the data interaction method of claim 16.
CN202310915197.0A 2023-07-25 2023-07-25 Data interaction system, method, large-scale operation processing method, equipment and medium Active CN116627867B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310915197.0A CN116627867B (en) 2023-07-25 2023-07-25 Data interaction system, method, large-scale operation processing method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310915197.0A CN116627867B (en) 2023-07-25 2023-07-25 Data interaction system, method, large-scale operation processing method, equipment and medium

Publications (2)

Publication Number Publication Date
CN116627867A (en) 2023-08-22
CN116627867B (en) 2023-11-03

Family

ID=87613877

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310915197.0A Active CN116627867B (en) 2023-07-25 2023-07-25 Data interaction system, method, large-scale operation processing method, equipment and medium

Country Status (1)

Country Link
CN (1) CN116627867B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116804915B (en) * 2023-08-28 2023-12-15 腾讯科技(深圳)有限公司 Data interaction method, processor, device and medium based on memory

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577562A (en) * 2017-09-19 2018-01-12 南京南瑞继保电气有限公司 A kind of method of data interaction, equipment and computer-readable recording medium
CN111857546A (en) * 2019-04-28 2020-10-30 伊姆西Ip控股有限责任公司 Method, network adapter and computer program product for processing data
CN112799587A (en) * 2020-11-23 2021-05-14 哲库科技(北京)有限公司 Processor system, inter-core communication method, processor, and memory unit
CN114546913A (en) * 2022-01-21 2022-05-27 山东云海国创云计算装备产业创新中心有限公司 Method and device for high-speed data interaction among multiple hosts based on PCIE interface
CN116301667A (en) * 2023-05-24 2023-06-23 山东浪潮科学研究院有限公司 Database system, data access method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN116627867A (en) 2023-08-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant