CN117171075B - Electronic equipment and task processing method - Google Patents

Publication number: CN117171075B
Application number: CN202311405894.8A
Authority: CN (China)
Prior art keywords: coprocessor, task, DMA, processed, bus
Legal status: Active (granted)
Other versions: CN117171075A
Other languages: Chinese (zh)
Inventors: 苏运强, 张�荣
Assignee: Shanghai Xinlianxin Intelligent Technology Co ltd
Classifications: Advance Control; Multi Processors
Abstract

An embodiment of the invention provides an electronic device and a task processing method. The electronic device comprises a central processing unit (CPU), a coprocessor subsystem, a direct memory access controller (DMAC), and a memory. The coprocessor subsystem comprises a coprocessor management unit (CPMU), a sub-bus, and coprocessors (CPs) directly connected to the sub-bus. The CPU stores a task to be processed in the memory and sends the CPMU a task processing request instructing it to process the task. The coprocessor subsystem reads the task from the memory over the bus via the DMAC. The CPMU processes the task within the coprocessor subsystem by controlling at least one CP, and the coprocessor subsystem writes the processing result of the task back to the memory over the bus via the DMAC. The CPMU then sends a task completion response to the CPU, and the CPU obtains the processing result from the memory according to the task completion response.

Description

Electronic equipment and task processing method
Technical Field
Embodiments of the invention relate to the technical field of coprocessors, and in particular to an electronic device and a task processing method.
Background
Direct memory access (Direct Memory Access, DMA) refers to a high-speed data transfer operation.
At present, when multiple coprocessors sequentially process data to produce a processing result required by the CPU, DMA transfers are needed between the coprocessors and the memory. A coprocessor is a processor developed and applied to assist the CPU with processing work that the CPU cannot execute or executes inefficiently.
However, because a DMA transfer can monopolize the bus, the CPU may lose control of the bus during the transfer, and as the number of coprocessors increases, the large volume of DMA traffic can affect system stability. In addition, during DMA transfers, unencrypted data is inevitably buffered in the memory, which lowers data security.
In summary, reducing the number of DMA transfers is a technical problem that currently needs to be solved.
Disclosure of Invention
Embodiments of the invention provide an electronic device and a task processing method that address the prior-art problems of degraded system stability and reduced security of transmitted data caused by an excessive number of DMA transfers.
In a first aspect, an embodiment of the present invention provides an electronic device, including: a CPU, a coprocessor subsystem, a direct memory access controller (DMAC), and a memory, where the CPU, the DMAC, and the memory are each directly connected to the bus. The coprocessor subsystem includes a coprocessor management unit (CPMU), a sub-bus, and coprocessors (CPs) directly connected to the sub-bus, and any two CPs exchange data over the sub-bus. The CPU is configured to store a task to be processed in the memory and to send the CPMU a task processing request instructing it to process the task. The coprocessor subsystem is configured to read the task from the memory over the bus via the DMAC. The CPMU is configured to process the task within the coprocessor subsystem by controlling at least one CP. The coprocessor subsystem is further configured to write the processing result of the task into the memory over the bus via the DMAC. The CPMU is further configured to send a task completion response to the CPU, and the CPU is further configured to obtain the processing result from the memory according to the task completion response.
In this scheme, introducing the CPMU to control the coprocessors reduces the computational load on the CPU. Because data is transferred between coprocessors over the sub-bus, intermediate results need not be relayed through the CPU and the memory; this improves the security of data transfer, speeds up data flow between coprocessors, raises data processing efficiency, reduces the number of DMA transfers, and avoids frequent occupation of the bus.
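The flow described above can be illustrated with a minimal software model. This is only a sketch: the class names, methods, and the three example processing stages are hypothetical and not taken from the patent; the point it demonstrates is that intermediate results stay inside the subsystem, so exactly two DMA transfers touch the bus.

```python
# Hypothetical sketch of the claimed task flow; all names are illustrative.
class Memory:
    def __init__(self):
        self.cells = {}

class DMAC:
    """Counts bus transfers so the two-DMA property can be checked."""
    def __init__(self, memory):
        self.memory = memory
        self.transfers = 0
    def read(self, addr):
        self.transfers += 1
        return self.memory.cells[addr]
    def write(self, addr, value):
        self.transfers += 1
        self.memory.cells[addr] = value

class CP:
    """A coprocessor that applies one processing stage."""
    def __init__(self, fn):
        self.fn = fn
    def process(self, data):
        return self.fn(data)

class CPMU:
    """Drives the pipeline; intermediate results never leave the subsystem."""
    def __init__(self, cps, dmac):
        self.cps = cps
        self.dmac = dmac
    def handle(self, task_addr, result_addr):
        data = self.dmac.read(task_addr)      # DMA transfer 1: memory -> subsystem
        for cp in self.cps:                   # intermediate hops use the sub-bus only
            data = cp.process(data)
        self.dmac.write(result_addr, data)    # DMA transfer 2: subsystem -> memory
        return "task-complete"

mem = Memory()
dmac = DMAC(mem)
cpmu = CPMU([CP(lambda d: d + 1), CP(lambda d: d * 2), CP(lambda d: d - 3)], dmac)

mem.cells["task"] = 10                        # CPU stores the task in memory
resp = cpmu.handle("task", "result")
print(resp, mem.cells["result"], dmac.transfers)  # -> task-complete 19 2
```

However many CPs participate, the DMA transfer count in this model stays at two, which is the property the scheme relies on.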
Optionally, the coprocessor subsystem further includes a DMA coprocessor directly connected to the sub-bus. The CPMU is specifically configured to send a first DMA request to the DMAC through the DMA coprocessor, so that the task to be processed is read from the memory over the bus and stored in the DMA coprocessor. The DMA coprocessor is configured to send the task to the first CP over the sub-bus and to obtain the processing result from the second CP over the sub-bus, where the first CP is the CP that processes the task first and the second CP is the CP that processes the task last. The CPMU is further specifically configured to send a second DMA request to the DMAC through the DMA coprocessor, so that the processing result stored in the DMA coprocessor is written into the memory over the bus.
In this scheme, the coprocessors need DMA transfers in only two places: one to fetch the task to be processed from the memory, and one to store the processing result back to the memory; data transfer between the other coprocessors is carried over the sub-bus. Since only one DMA transfer task can execute at a time, it suffices to divide the coprocessors into the DMA coprocessor and the other coprocessors, where the other coprocessors need no DMA units. This simplifies the design of the other coprocessors and saves circuit area.
Optionally, each CP is directly connected to the CPMU through a control line, while the CPMU is not directly connected to the sub-bus. The CPMU is configured to designate the sender and the receiver between any two CPs, and any two CPs transfer data through a handshake protocol.
In this scheme, because the sub-bus does not distinguish master and slave devices, the CPMU is connected to each CP through a control line to designate the sender and the receiver between any two CPs. This allows multiple CPs to be scheduled and allocated effectively, enables the coprocessors to transfer data in a pipelined manner, and improves coprocessor utilization.
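The control-line/handshake split can be sketched as follows. All names here are hypothetical; the sketch assumes a simple valid-flag handshake, which is one common way such a protocol could work, not necessarily the one the patent uses.

```python
# Illustrative sketch: the CPMU designates sender and receiver over control
# lines, then the two CPs complete the transfer on the sub-bus themselves.
class SubBus:
    def __init__(self):
        self.valid = False   # asserted by the sender while data is driven
        self.data = None

class CP:
    def __init__(self, name):
        self.name = name
        self.inbox = None
    def send(self, bus, payload):
        bus.valid = True
        bus.data = payload
    def receive(self, bus):
        if bus.valid:            # sample only while valid is asserted
            self.inbox = bus.data
            bus.valid = False    # acknowledge; sender may release the bus

class CPMU:
    """Connected to each CP by a control line, not to the sub-bus itself."""
    def transfer(self, bus, sender, receiver, payload):
        sender.send(bus, payload)    # role designation via control lines
        receiver.receive(bus)

bus = SubBus()
cp1, cp2 = CP("CP1"), CP("CP2")
CPMU().transfer(bus, cp1, cp2, "intermediate-result")
print(cp2.inbox)  # -> intermediate-result
```

Note that the CPMU never reads or writes the data itself; it only decides which CP drives the bus and which CP samples it.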
Optionally, there are multiple sub-buses, and the coprocessor subsystem further includes multiplexers (MUXes), with each CP having a corresponding MUX. The CPMU is further configured to designate, through the MUX, the sub-bus a CP uses for data transfer.
Optionally, there are multiple tasks to be processed; the DMA coprocessor includes an input DMA coprocessor and an output DMA coprocessor; and the CPMU is further configured to control the CPs to process the multiple tasks in parallel.
Optionally, each CP includes a temporary storage area, an operation unit, a control unit, and a communication unit; the control unit is connected to the CPMU, and the communication unit is connected to the sub-bus.
Optionally, the DMA coprocessor includes a DMA unit, a first temporary storage area, a first communication unit, a first control unit, a second temporary storage area, a second control unit, and a second communication unit. The DMA unit, the first temporary storage area, the first control unit, and the first communication unit form the input DMA coprocessor; the DMA unit, the second temporary storage area, the second control unit, and the second communication unit form the output DMA coprocessor.
Optionally, the coprocessor subsystem further includes a buffer coprocessor, which includes a temporary storage area, a communication unit, and a simplified control unit that controls the temporary storage area and the communication unit.
In a second aspect, an embodiment of the present invention provides a task processing method, including: the CPU stores a task to be processed in the memory and sends the CPMU a task processing request instructing it to process the task; the coprocessor subsystem reads the task from the memory over the bus via the DMAC; the CPMU processes the task by controlling at least one CP, where the at least one CP and the CPMU belong to the same coprocessor subsystem and the task is processed within that subsystem; the coprocessor subsystem writes the processing result of the task into the memory over the bus via the DMAC; the CPMU sends a task completion response to the CPU; and the CPU obtains the processing result from the memory according to the task completion response.
Optionally, the CPMU processing the task by controlling at least one CP includes: the CPMU determines each CP used to process the task and controls those CPs to process it in sequence; after any CP finishes its part of the task, it sends its sub-result to the next CP over the sub-bus under the control of the CPMU.
Optionally, the coprocessor subsystem reading the task from the memory via the DMAC over the bus includes: the CPMU instructs the DMA coprocessor to send a first DMA request to the DMAC, so that the task is read from the memory over the bus and stored in the DMA coprocessor, where the DMA coprocessor is located in the coprocessor subsystem and directly connected to the sub-bus. Controlling each CP in sequence to process the task includes: the CPMU controls the DMA coprocessor to send the task to the first CP over the sub-bus and to obtain the processing result from the second CP over the sub-bus, where the first CP is the CP that processes the task first and the second CP is the CP that processes the task last. The coprocessor subsystem writing the processing result into the memory via the DMAC over the bus includes: the CPMU sends a second DMA request to the DMAC through the DMA coprocessor, so that the processing result stored in the DMA coprocessor is written into the memory over the bus.
Optionally, there are multiple tasks to be processed, and the CPMU is further configured to control the multiple tasks to be processed in parallel within the coprocessor subsystem.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a program that, when executed on a computer, causes the computer to perform the task processing method of the second aspect.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. It is apparent that the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a first DMAC system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a second DMAC system according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a method for DMA transfer according to an embodiment of the present invention;
FIG. 4 is a method interaction diagram for obtaining a processing result of a coprocessor according to an embodiment of the present invention;
FIG. 5 is an interaction diagram of a method for obtaining target data according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a coprocessor CP according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a DMA coprocessor according to an embodiment of the present invention;
FIG. 10 is a flowchart of a task processing method according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a dual sub-bus according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
In the following, some terms used in this application are explained to facilitate understanding by those skilled in the art; the explanations are illustrative rather than limiting.
1. Direct memory access (Direct Memory Access, DMA): copying data from one address space to another, providing high-speed data transfer between a peripheral and the memory or between memory and memory. The CPU initiates the transfer, but the transfer itself is carried out and completed by the DMA controller (DMAC).
2. Multiplexer (MUX): a device that selects one of several analog or digital input signals and forwards it to a single output. A data selector with 2^n inputs has n select lines, whose value determines which input is routed to the output. Data selectors are mainly used to increase the amount of data that can be transmitted over a network within a given time and bandwidth.
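The 2^n-input selection behavior can be sketched directly. This is an illustrative Python model, not part of the patent:

```python
def mux(select, inputs):
    """Route one of 2**n inputs to the output using an n-bit select value."""
    assert 0 <= select < len(inputs), "select value out of range"
    return inputs[select]

# Four inputs (2**2), so a 2-bit select line picks one of them.
inputs = ["CP0", "CP1", "CP2", "CP3"]
print(mux(0b10, inputs))  # -> CP2
```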
3. Coprocessor management unit (Co-Processor Management Unit, CPMU): connects multiple coprocessors in hardware and realizes data transfer between coprocessors in the form of task queues.
4. Coprocessor subsystem (Co-Processor Subsystem, CPSS): an organic combination of the CPMU, various coprocessors, and other auxiliary components.
5. Bus: buses are divided into data buses, address buses, and control buses. Devices on a bus are divided into master devices and slave devices; a master device holds bus control rights and can initiate data transfers, while a slave device has only bus usage rights and can only supply or accept data at a master device's request.
6. Data bus: the bus that carries data information (including instruction data) between the CPU and the memory and between the CPU and I/O interface devices. Because these signals travel to and from the CPU, the information on the data bus is transferred bidirectionally. The width of the data bus determines the speed of data transfer between the CPU and the outside. Each line can carry one bit of binary data at a time, and the data bus is the set of all data lines.
7. Address bus: a computer bus through which the CPU, or a DMA-capable unit, communicates the physical address it wants to access (read or write) in the computer's memory components; that is, the address bus determines on which system component, and at which address within that component, a data transfer occurs. The CPU or DMAC designates a memory cell through the address bus. The width of the address bus determines the addressing range, and the address bus is the set of all address lines.
8. Control bus: mainly used to transmit control signals and timing signals. Some control signals flow from the CPU to the memory and I/O device interface circuits, such as read/write signals, chip-select signals, and interrupt response signals. The CPU controls external devices through the control bus; its width determines the CPU's control capability over external devices, and the control bus is the set of all control lines.
In order to facilitate understanding of the present solution, an application scenario of the present solution is described below.
The CPU performs many functions, such as transferring data, performing calculations, and controlling program flow, and it is the core of system operation. In one possible scenario, data needs to be transferred between the memory and a peripheral. In general, byte information can be transferred with input/output instructions or by the interrupt method, but this consumes a great deal of CPU time and data is easily lost. Data transfer can therefore be accomplished by DMA. In the DMA transfer mode, data is exchanged directly between the memory and a peripheral, or between peripherals, without being relayed through a CPU accumulator; the intermediate links are reduced, and updating the memory address and reporting the end of the transfer are handled by hardware circuits, so the data transfer speed is greatly improved.
It should be noted that the controller in the DMA transfer mode is not the CPU but the DMAC, where the DMAC may be a programmable LSI or take another form; this is not limited here. The DMAC is a hardware control circuit for high-speed data transfer between the memory and peripherals, a special-purpose processor for direct data transfer.
Fig. 1 is a schematic structural diagram of a first DMAC system according to an embodiment of the invention. Between the DMAC and the peripheral there are two signal interactions: a DMA request signal sent by the peripheral to the DMAC, and a DMA response signal sent by the DMAC to the peripheral. Between the DMAC and the CPU there are likewise two signal interactions: a bus request signal sent by the DMAC to the CPU, and a bus response signal sent by the CPU to the DMAC. It should be noted that the DMAC system in fig. 1 is the first DMAC system architecture, in which the data lines of the peripheral are connected to the DMAC. In this architecture the DMAC serves as a relay between the peripheral and the bus: the peripheral's data lines connect to the DMAC, which in turn connects to the data bus, and the DMAC sometimes must convert the data stream received from the peripheral into one acceptable to the data bus.
The DMAC system architecture also has a second form, shown in fig. 2, in which the peripheral's data lines are connected directly to the data bus. In this architecture the DMAC is only responsible for handshaking with the CPU and the peripheral, negotiating bus usage rights, and setting the address bus appropriately; otherwise the DMAC operates as in fig. 1. The peripheral's data lines deliver the data stream directly to the data bus.
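The difference between the two architectures lies only in the data path; a hypothetical sketch of a memory read under each (component names illustrative):

```python
# Illustrative comparison of the two DMAC data paths described above.
def read_data_path(arch):
    """Return the components the data traverses during a memory read."""
    if arch == "first":
        # First architecture: the DMAC relays (and may reformat) the stream.
        return ["memory", "data bus", "DMAC", "peripheral"]
    if arch == "second":
        # Second architecture: the DMAC only arbitrates; data bypasses it.
        return ["memory", "data bus", "peripheral"]
    raise ValueError(f"unknown architecture: {arch}")

print(read_data_path("first"))   # -> ['memory', 'data bus', 'DMAC', 'peripheral']
print(read_data_path("second"))  # -> ['memory', 'data bus', 'peripheral']
```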
To facilitate understanding, the following description takes the first DMAC system architecture as an example, but it applies equally to the second.
Data transfer between a peripheral and the memory may mean the peripheral storing target data into the memory, or the peripheral reading target data from the memory; this is not limited here. To facilitate understanding, the following takes the peripheral reading target data from the memory as an example of how the memory and a peripheral exchange data directly by DMA transfer.
Fig. 3 is an interaction diagram of a DMA transfer method according to an embodiment of the present invention. The specific steps of the DMA transfer are as follows:
In step 301, the peripheral sends a DMA request signal to the DMAC.
In the embodiment of the invention, for the peripheral to read the target data directly from the memory, a DMA transfer is required. The peripheral therefore first sends a DMA request signal to the DMAC, where the DMA request signal is used to request a DMA transfer.
In step 302, the DMAC sends a bus request signal to the CPU.
In the embodiment of the invention, for the peripheral to read the target data from the memory over the bus, the DMAC must acquire control of the bus. Therefore, after receiving the DMA request signal from the peripheral, the DMAC sends a bus request signal to the CPU, where the bus request signal is used to request taking over control of the bus. Optionally, the bus request signal is an interrupt.
In step 303, the CPU sends a bus response signal to the DMAC.
In the embodiment of the invention, after receiving the bus request signal from the DMAC, the CPU performs interrupt processing once the current bus cycle ends, places the control bus, data bus, and address bus in a high-impedance state, i.e., relinquishes control of the bus, and at the same time sends a bus response signal to the DMAC.
In step 304, the DMAC gains control of the bus.
In the embodiment of the invention, after the DMAC receives the bus response signal, it takes over control of the bus, where the bus response signal is used to inform the DMAC that the CPU has relinquished bus control.
In step 305, the DMAC sends a DMA response signal to the peripheral.
In the embodiment of the invention, once the DMAC has obtained bus control, the DMA transfer can proceed; the DMAC sends its response signal to the peripheral, where the response signal is used to inform the peripheral that the DMA transfer may begin.
In step 306, the DMAC sends an address signal to the memory.
In step 307, the DMAC sends a read control signal to the memory.
In step 308, the DMAC sends a write control signal to the peripheral.
In step 309, the peripheral reads the target data from the memory.
In the embodiment of the invention, since there are two DMAC system architectures: if the architecture is the first type, the target data read from the memory is transferred over the data bus to the DMAC, which, acting as a relay, sets the address bus and delivers the data to the peripheral. If the architecture is the second type, the DMAC is only responsible for setting the address bus according to its handshake information with the peripheral, and the target data is transferred in order over the data bus from the memory directly to the peripheral.
In step 310, the DMAC sends an end signal to the CPU.
In the embodiment of the invention, after the peripheral has obtained the target data from the memory, the DMAC sends an end signal to the CPU, where the end signal is used to request releasing the DMAC's control of the bus; after receiving the end signal, the CPU processes it and reclaims control of the bus. Optionally, the end signal is an interrupt.
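The bus-arbitration part of this sequence (steps 301 to 305 and step 310) can be sketched as a tiny ownership state machine. All names here are illustrative:

```python
# Minimal sketch of bus arbitration: the CPU owns the bus until the DMAC
# requests it, and reclaims it when the end signal arrives.
class Bus:
    def __init__(self):
        self.owner = "CPU"

class BusArbiter:
    def __init__(self, bus):
        self.bus = bus
    def dma_request(self):
        # DMAC asks for the bus; the CPU tri-states its lines and grants it.
        if self.bus.owner == "CPU":
            self.bus.owner = "DMAC"
            return "bus granted"
        return "busy"
    def end_signal(self):
        # End of transfer: the CPU reclaims bus control.
        self.bus.owner = "CPU"
        return "bus reclaimed"

bus = Bus()
arb = BusArbiter(bus)
print(arb.dma_request(), bus.owner)  # -> bus granted DMAC
print(arb.end_signal(), bus.owner)   # -> bus reclaimed CPU
```

The key point matching the text is that the CPU cannot use the bus while its owner is the DMAC, which is exactly the window during which the transfer of steps 306 to 309 happens.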
There are many tasks that the CPU cannot perform or performs inefficiently, such as signal transfer between devices, management of access devices, graphics processing, and audio processing; these require a dedicated processor, the coprocessor. A coprocessor is a processor developed and applied to assist the central processing unit with processing work it cannot perform or performs inefficiently. Steps 301 to 310 above describe one DMA transfer process. DMA transfer is widely used; in one possible scenario, if a program running on the CPU needs the processing result of a coprocessor, the result can be obtained by DMA transfer. A method for obtaining a processing result from a coprocessor by DMA transfer is described below.
Fig. 4 is an interaction diagram of a method for obtaining a processing result of a coprocessor according to an embodiment of the present invention. The method comprises the following steps:
In step 401, the coprocessor sends an interrupt to the CPU.
In the embodiment of the invention, after the coprocessor finishes processing the data to be processed and obtains the target data, it sends an interrupt to the CPU, where the interrupt is used to inform the CPU to suspend its current task and respond to the coprocessor.
In step 402, the CPU sends the coprocessor a request to return the processing result.
In step 403, the coprocessor sends a DMA request signal to the DMAC.
In step 404, the DMAC sends a bus request signal to the CPU.
In step 405, the CPU sends a bus response signal to the DMAC.
In step 406, the DMAC gains control of the bus.
In step 407, the DMAC sends a reply signal of the DMAC to the coprocessor.
In step 408, the DMAC sends an address signal to the memory.
In step 409, the DMAC sends a write control signal to the memory.
In step 410, the DMAC sends a read control signal to the coprocessor.
In step 411, the coprocessor writes the target data to memory.
In step 412, the DMAC sends an end signal to the CPU.
In one possible scenario, the CPU may be externally connected to multiple coprocessors. For example, the CPU may be connected to three coprocessors, and a program in the CPU needs the target data obtained by the three coprocessors sequentially processing the data to be processed. To facilitate understanding, the three coprocessors are denoted CP1, CP2, and CP3, and the order of processing the data is CP1 -> CP2 -> CP3. An implementation of how the target data is obtained is described below.
Fig. 5 shows an interaction diagram of a method for obtaining target data according to an embodiment of the present invention, where the method includes the following steps:
In step 501, the CPU sends data information to the memory.
In the embodiment of the invention, the data information includes the first data to be processed and the address at which the first data to be processed is stored.
In step 502, the CPU sends CP1 a request to process the first data to be processed.
In step 503, CP1 sends a DMA request to the DMAC.
In step 504, CP1 reads the first data to be processed from the memory.
It should be noted that the DMA transfer process between steps 503 and 504 is omitted here; the detailed process is shown in fig. 4 and is not repeated.
In step 505, CP1 processes the first data to be processed to obtain the second data to be processed.
In step 506, CP1 sends a DMA request to the DMAC.
In step 507, CP1 transfers the second data to be processed to the memory.
It should be noted that the DMA transfer process between steps 506 and 507 is omitted here; the detailed process is shown in fig. 4 and is not repeated.
In step 508, CP1 sends an interrupt to the CPU.
In step 509, the CPU performs interrupt processing.
In step 510, the CPU sends CP2 a request to process the second data to be processed.
In step 511, CP2 sends a DMA request to the DMAC.
In step 512, CP2 reads the second data to be processed from the memory.
It should be noted that the DMA transfer process between steps 511 and 512 is omitted here; the detailed process is shown in fig. 4 and is not repeated.
In step 513, CP2 processes the second data to be processed to obtain the third data to be processed.
In step 514, CP2 sends a DMA request to the DMAC.
In step 515, CP2 transfers the third data to be processed to the memory.
It should be noted that the DMA transfer process between steps 514 and 515 is omitted here; the detailed process is shown in fig. 4 and is not repeated.
In step 516, CP2 sends an interrupt to the CPU.
In step 517, the CPU performs interrupt processing.
In step 518, the CPU sends CP3 a request to process the third data to be processed.
In step 519, the CP3 sends a DMA request to the DMAC.
In step 520, CP3 reads the third data to be processed from the memory.
It should be noted that the DMA transfer process between steps 519 and 520 is omitted here; the detailed process is shown in fig. 4 and is not repeated.
In step 521, CP3 processes the third data to be processed to obtain the target data.
In step 522, CP3 sends a DMA request to the DMAC.
In step 523, CP3 transfers the target data to the memory.
It should be noted that the DMA transfer process between steps 522 and 523 is omitted here; the detailed process is shown in fig. 4 and is not repeated.
In step 524, CP3 sends an interrupt to the CPU.
In step 525, the CPU performs interrupt processing.
As can be seen from steps 501 to 525, although a program on the CPU needs only the target data of step 523, the intermediate results computed by CP1 and CP2 must be written back to the memory so that the coprocessor in the following step can fetch its input from the memory before it can begin further computation. The problem is that the same flow is repeated on different coprocessors: while CP1, CP2, and CP3 process the data, 6 DMA requests and 15 interrupt-handling events occur, where each DMA request accounts for 2 interrupt-handling events, as described for fig. 3 and not repeated here. Yet only the DMA request of step 503, the DMA request of step 522, and the interrupt processing of step 525 are necessary; everything else arises because, under this system architecture, data flowing between coprocessors must be relayed through the CPU and the memory. During each DMA transfer the DMAC monopolizes the bus, so the CPU cannot use the bus to read the memory and partially or completely loses control of the system bus. A large number of DMA requests affects the real-time behavior of the system, especially an RTOS, and the additional interrupt handling incurs the extra overhead of trapping into kernel mode and of process context switches, which also harms real-time performance.
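The DMA and interrupt counts above can be tallied directly. A minimal sketch, assuming, as stated in the text, that each DMA request costs two interrupt-handling events (the bus request and the end-of-transfer signal) and each coprocessor raises one completion interrupt:

```python
# Tally of the conventional three-coprocessor flow (steps 501-525).
coprocessors = 3
dma_requests = coprocessors * 2              # each CP: one read + one write-back
interrupts_per_dma = 2                       # bus request + end-of-transfer signal
completion_interrupts = coprocessors         # steps 508, 516, 524
total_interrupts = dma_requests * interrupts_per_dma + completion_interrupts

print(dma_requests, total_interrupts)        # -> 6 15
```

With the coprocessor subsystem of the invention, the first factor drops from one read and one write per CP to a single read and a single write for the whole pipeline, which is the saving the scheme claims.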
In one scenario, the encryption and decryption keys must be kept in a security coprocessor in hardware. If external encrypted data is to be processed, it must first be decrypted by the security coprocessor, then subjected to complex operations by other coprocessors, and finally re-encrypted by the security coprocessor. If this is implemented as in fig. 5, the data decrypted by the security coprocessor must be written back to memory with a DMA request before it can be delivered by DMA to the other coprocessors, and the final operation result must likewise be transferred from memory to the security coprocessor with a DMA request. Such an implementation presents a serious data security problem, since the unencrypted data is inevitably buffered in memory.
In summary, the embodiment of the invention provides an electronic device that, by introducing a coprocessor subsystem, reduces the number of DMA requests and interrupts and thereby improves the efficiency of data transmission.
As shown in fig. 6, a schematic structural diagram of an electronic device according to an embodiment of the present invention is provided. The electronic device 600 includes: a CPU 610, a coprocessor subsystem 620, a DMAC 630 and a memory 640, where the CPU 610, the DMAC 630 and the memory 640 are each directly connected to the bus. The coprocessor subsystem 620 includes a CPMU 621, a sub-bus 622 and coprocessors CP 623 directly coupled to the sub-bus 622; any two CPs 623 perform data transmission through the sub-bus 622, and the CPMU 621 is connected to each CP 623 via a control line. The CPU 610 is configured to store a task to be processed in the memory 640 and send a task processing request, instructing that the task be processed, to the CPMU 621. The coprocessor subsystem 620 is configured to read the task to be processed from the memory 640 via the bus through the DMAC 630. The CPMU 621 is configured to process the task within the coprocessor subsystem 620 by controlling at least one CP 623. The coprocessor subsystem 620 is further configured to write the processing result of the task into the memory 640 via the bus through the DMAC 630. The CPMU 621 is further configured to send a task completion response to the CPU 610, and the CPU 610 is further configured to obtain the processing result from the memory 640 according to the task completion response.
In the embodiment of the invention, introducing the CPMU to control the coprocessors reduces the computing load placed on the CPU. Data is transmitted between the coprocessors through the sub-bus, so intermediate results no longer pass through the CPU and memory; this improves the security of data transmission, speeds up the flow of data between coprocessors, improves data processing efficiency, reduces the number of DMA transfers, and avoids frequent occupation of the bus.
Fig. 7 is a schematic diagram of an electronic device according to an embodiment of the present invention. In fig. 7, RAM represents the memory and CP1, CP2 and CP3 represent coprocessors. The coprocessors further include a DMA coprocessor, which comprises an input DMA coprocessor (denoted I in fig. 7) and an output DMA coprocessor (denoted O).
Optionally, the coprocessor subsystem further includes a DMA coprocessor directly connected to the sub-bus. The CPMU is specifically configured to instruct the DMA coprocessor to send a first DMA request to the DMAC, so that the task to be processed is read from the memory via the bus and stored in the DMA coprocessor. The DMA coprocessor is configured to send the task to be processed to the first CP through the sub-bus and to obtain the processing result from the second CP through the sub-bus; the first CP is the CP that processes the task first, and the second CP is the CP that processes the task last. The CPMU is further specifically configured to send a second DMA request to the DMAC through the DMA coprocessor and to write the processing result stored in the DMA coprocessor into the memory via the bus.
In the embodiment of the invention, DMA transfer is needed in only two places in this scheme: once to fetch the task to be processed from the memory, and once to store the processing result back into the memory. All other data transmission between coprocessors is carried out through the sub-bus.
Optionally, each CP is directly connected to the CPMU through a control line, while the CPMU is not directly connected to the sub-bus. The CPMU is configured to designate, for any two CPs, which is the sender and which is the receiver; the two CPs then perform the data transmission through a handshake protocol.
In the embodiment of the invention, the sub-bus does not distinguish between master and slave devices; the CPMU, connected to each CP through a control line, designates the sender and receiver among any two CPs. In this way the CPs can be scheduled and allocated effectively, data transmission between coprocessors can be pipelined, and the utilization of the coprocessors is improved.
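The control flow described above can be sketched behaviorally. This is not the patent's hardware design but a minimal software model of it; all class and method names (`CP`, `SubBus`, `CPMU`, `on_control`, `schedule`) are illustrative assumptions:

```python
class CP:
    """A coprocessor reachable from the CPMU only via its control line."""
    def __init__(self, name):
        self.name = name
        self.buffer = None   # temporary storage area for received data
        self.mode = "idle"

    def on_control(self, command):
        # the CPMU drives the control line with "send" or "receive"
        self.mode = command

class SubBus:
    """Sub-bus with no address or control lines: a transfer happens only
    after a valid/ready style handshake between the two selected CPs."""
    def transfer(self, sender, receiver, data):
        assert sender.mode == "send" and receiver.mode == "receive"
        valid, ready = True, True          # sender asserts valid, receiver asserts ready
        if valid and ready:
            receiver.buffer = data
        sender.mode = receiver.mode = "idle"

class CPMU:
    """Designates sender and receiver, then lets the two CPs handshake."""
    def schedule(self, bus, sender, receiver, data):
        sender.on_control("send")
        receiver.on_control("receive")
        bus.transfer(sender, receiver, data)
```

Because the CPMU alone decides which pair of CPs talks, the sub-bus itself needs neither an address bus nor a control bus, which is the point made in the surrounding text.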
Optionally, there are a plurality of sub-buses; the coprocessor subsystem further includes multiplexers MUX, and each CP is provided with a corresponding MUX. The CPMU is further configured to select, through the MUX, the sub-bus that a CP uses for data transmission.
Optionally, there are a plurality of tasks to be processed; the DMA coprocessor includes an input DMA coprocessor and an output DMA coprocessor; and the CPMU is further configured to control the plurality of tasks to be processed in parallel across the CPs.
Optionally, any CP includes a temporary storage area, an operation unit, a control unit and a communication unit; the control unit is connected to the CPMU, and the communication unit is connected to the sub-bus. The temporary storage area of any coprocessor can be used to hold intermediate results received from other coprocessors.
Optionally, the coprocessor subsystem further includes a buffer coprocessor comprising a temporary storage area, a communication unit and a simplified control unit; the simplified control unit controls only the temporary storage area and the communication unit. The buffer coprocessor has no operation unit; its role is to provide a holding place for data that momentarily has nowhere else to go, which further improves the compactness of the pipeline.
Fig. 8 is a schematic structural diagram of a coprocessor CP according to an embodiment of the present invention. The left side of the figure shows a conventional coprocessor, which generally has a register area, an operation unit, a control unit and a DMA unit. In this scheme the coprocessors are divided into a DMA coprocessor and a plurality of CPs; since a CP no longer needs the DMA function, it only needs the ability to compute and to communicate with the other coprocessors. As shown on the right of fig. 8, any CP includes a temporary storage area, an operation unit, a control unit and a communication unit; the control unit is connected to the CPMU, and the communication unit is connected to the sub-bus. The circuit of a CP designed in this way is much simpler, which saves circuit area.
Fig. 9 is a schematic diagram of a DMA coprocessor according to an embodiment of the present invention.
The DMA coprocessor comprises a DMA unit, a first temporary storage area, a first communication unit, a first control unit, a second temporary storage area, a second control unit and a second communication unit; the DMA unit, the first temporary storage area, the first control unit and the first communication unit form an input DMA coprocessor; the DMA unit, the second temporary storage area, the second control unit and the second communication unit form an output DMA coprocessor. Because the DMA coprocessor needs to acquire the address of the task to be processed from the CPMU, the DMA coprocessor and the CPMU are connected through a control line. Alternatively, the DMA coprocessor may be designed directly as part of the CPMU's IP core, due to the strong coupling between the DMA coprocessor and the CPMU.
To facilitate understanding of the present solution, the following example uses a set of coprocessors comprising an input DMA coprocessor I, an output DMA coprocessor O, a first CP, a second CP and a third CP. The input DMA coprocessor reads the task to be processed from the memory through a DMA transfer; the output DMA coprocessor stores the processing result of the task into the memory; and the first, second and third CPs process the task. The processing order is, in sequence: input DMA coprocessor, first CP, third CP, second CP, output DMA coprocessor. The first CP is the CP that processes the task first, and the second CP is the CP that processes the task last.
As shown in fig. 10, a flowchart of a task processing method according to an embodiment of the present invention is provided, where the method is applied to the electronic device, and the method includes the following steps:
In step 1001, the CPU stores the task to be processed in the memory.
In the embodiment of the invention, if a program running in a CPU needs to obtain a processing result obtained by sequentially processing tasks to be processed by a plurality of coprocessors, the CPU needs to store the tasks to be processed in a memory at first, so that the subsequent coprocessors can conveniently acquire the tasks to be processed from the memory through DMA transmission. The task to be processed may be data to be processed, or may be other tasks, which are not limited herein.
In step 1002, the CPU sends a task processing request, instructing that the task to be processed be processed, to the CPMU.
In the embodiment of the invention, the CPMU is the coprocessor management unit, configured to control the data transmission of each coprocessor. After storing the task to be processed in the memory, the CPU sends the CPMU a task processing request instructing that the task be processed; this request includes the processing order of the coprocessors, which may be preset or determined according to the specific situation and is not limited here.
In step 1003, the CPMU sends a first instruction to the input DMA coprocessor.
In the embodiment of the invention, after a CPMU receives a task processing request for indicating to process a task to be processed, a first instruction is sent to an input DMA coprocessor according to a processing sequence in the task processing request. The first instruction is used for indicating the input DMA coprocessor to send a DMA request to the DMAC, so that the subsequent input DMA coprocessor can conveniently read the task to be processed from the memory through DMA transmission. For example, the processing sequence in the task processing request is sequentially an input DMA coprocessor, a first CP, a third CP, a fourth CP, a second CP, and an output DMA coprocessor, and optionally, the input DMA coprocessor and the output DMA coprocessor may be the same DMA coprocessor.
In step 1004, the input DMA coprocessor sends a first DMA request to the DMAC.
In the embodiment of the invention, after the input DMA coprocessor receives the first instruction sent by the CPMU, the input DMA coprocessor sends the first DMA request to the DMAC according to the instruction of the first instruction, so that the subsequent input DMA coprocessor can conveniently read the task to be processed from the memory through DMA transmission, and further realize the processing of the task to be processed by a plurality of CPs.
In step 1005, the input DMA coprocessor reads the pending task from memory via the bus.
In the embodiment of the present invention, the DMA transfer process between step 1004 and step 1005 is omitted; the detailed process is shown in fig. 4 and is not repeated here. Because the task to be processed must be handled in sequence according to the processing order of the coprocessors, the input DMA coprocessor, which has the DMA function, first reads the task from memory through a DMA transfer so that the other coprocessors can subsequently process it.
In step 1006, the input DMA coprocessor sends a first notification message to the CPMU.
In the embodiment of the invention, after reading the task to be processed and storing it in its cache, the input DMA coprocessor sends a first notification message to the CPMU, notifying the CPMU that the task has been read. This makes it convenient for the CPMU to control the subsequent processing of the task according to the processing order of the coprocessors.
In step 1007, the CPMU sends a second instruction to the input DMA coprocessor and a third instruction to the first CP.
In the embodiment of the invention, after the CPMU receives the first notification message from the input DMA coprocessor, it determines the sender and receiver on the sub-bus according to the processing order of the coprocessors. It sends a second instruction to the input DMA coprocessor, instructing it to transmit the task to be processed over the sub-bus, and a third instruction to the first CP, instructing it to receive data from the sub-bus. In this transfer the sender on the sub-bus is the input DMA coprocessor and the receiver is the first CP. Because the control logic between the coprocessors and the CPMU is simple and requires low delay, each coprocessor has an independent control line to the CPMU. For example, the CPMU sends the second instruction to the input DMA coprocessor over that coprocessor's control line, and the third instruction to the first CP over the first CP's control line.
In step 1008, the input DMA coprocessor sends the task to be processed to the first CP via the sub-bus.
In the embodiment of the invention, the input DMA coprocessor sends the task to be processed to the first CP through the sub-bus. Since the sub-bus does not distinguish between master and slave devices, sender and receiver have equal status and two coprocessors must be selected for each transfer; it can be seen that the sub-bus therefore needs no address bus. And since any two coprocessors transfer data via a handshake protocol, the sub-bus needs no control bus either. In this way the first CP does not need a DMA function to read the task from memory: the input DMA coprocessor delivers the task directly through the sub-bus, which simplifies the design of the first CP and saves circuit area.
In step 1009, the first CP processes the task to be processed.
In the embodiment of the invention, the first CP obtains the task to be processed from the input DMA coprocessor through the sub-bus and then processes it until the first CP has completed its own portion of the task.
In step 1010, the first CP sends a second notification message to the CPMU.
In the embodiment of the invention, when the first CP has finished its own task portion, it sends a second notification message to the CPMU, notifying the CPMU that the first CP has completed its portion of the task.
In step 1011, the CPMU sends a fourth instruction to the first CP and a fifth instruction to the third CP.
In the embodiment of the invention, after the CPMU receives the second notification message, it knows that the first CP has finished its own task portion. It then sends a fourth instruction to the first CP, instructing it to transmit the sub-result of its task portion to the third CP through the sub-bus, and a fifth instruction to the third CP, instructing it to receive data from the sub-bus.
In step 1012, the first CP sends the sub-result of the task portion of the first CP itself to the third CP via the sub-bus.
In the embodiment of the invention, the third CP and the first CP perform data transmission through the sub-bus, so that the third CP does not need a DMA function to read the task to be processed from the memory, and the first CP directly transmits the task to be processed to the third CP through the sub-bus, thereby simplifying the design of the third CP and saving the circuit area.
In step 1013, the third CP processes the sub-result of the task portion of the first CP itself.
In the embodiment of the invention, the third CP obtains the sub-result of the first CP's task portion from the first CP through the sub-bus and then processes it until the third CP has completed its own portion of the task.
In step 1014, the third CP sends a third notification message to the CPMU.
In the embodiment of the invention, when the third CP has finished its own task portion, it sends a third notification message to the CPMU, notifying the CPMU that the third CP has completed its portion of the task.
In step 1015, the CPMU sends a sixth instruction to the third CP and a seventh instruction to the second CP.
In the embodiment of the invention, after the CPMU receives the third notification message, it knows that the third CP has finished its own task portion. It then sends a sixth instruction to the third CP, instructing it to transmit the sub-result of its task portion through the sub-bus, and a seventh instruction to the second CP, instructing it to receive data from the sub-bus.
In step 1016, the third CP sends the sub-result of the third CP's own task portion to the second CP via the sub-bus.
In the embodiment of the invention, the third CP and the second CP perform data transmission through the sub-bus, so that the second CP does not need a DMA function to read the task to be processed from the memory, and the third CP directly transmits the task to be processed to the second CP through the sub-bus, thereby simplifying the design of the second CP and saving the circuit area.
In step 1017, the second CP processes the sub-result of the third CP's own task portion.
In the embodiment of the invention, the second CP obtains the sub-result of the third CP's task portion from the third CP through the sub-bus and then processes it until the second CP has completed its own portion of the task.
In step 1018, the second CP sends a fourth notification message to the CPMU.
In the embodiment of the invention, when the second CP has finished its own task portion, it sends a fourth notification message to the CPMU, notifying the CPMU that the second CP has completed its portion of the task.
In step 1019, the CPMU sends an eighth instruction to the second CP and a ninth instruction to the output DMA coprocessor.
In the embodiment of the invention, after the CPMU receives the fourth notification message, it knows that the second CP has finished its own task portion. It then sends an eighth instruction to the second CP, instructing it to transmit the processing result of the task to be processed through the sub-bus, and a ninth instruction to the output DMA coprocessor, instructing it to receive data from the sub-bus. It should be noted that, in one possible case, since there is only one bus, DMA transfers cannot proceed in parallel — the input DMA coprocessor and the output DMA coprocessor cannot perform DMA transfers at the same time — so the input DMA coprocessor and the output DMA coprocessor may be the same DMA coprocessor.
In step 1020, the second CP sends the processing result of the task to be processed to the output DMA coprocessor through the sub-bus.
In the embodiment of the invention, since the second CP has no DMA function, it must send the processing result of the task to the output DMA coprocessor through the sub-bus, so that the result can be stored in the memory and read from there by the program on the CPU.
In step 1021, the CPMU sends a tenth instruction to the output DMA coprocessor.
In the embodiment of the invention, the tenth instruction instructs the output DMA coprocessor to send a DMA request to the DMAC, so that the output DMA coprocessor can subsequently store the processing result of the task into the memory through a DMA transfer, allowing the program on the CPU to read the processing result from the memory.
In step 1022, the output DMA coprocessor sends a second DMA request to the DMAC.
In the embodiment of the invention, after the output DMA coprocessor receives the tenth instruction sent by the CPMU, the output DMA coprocessor sends the second DMA request to the DMAC according to the instruction of the tenth instruction, thereby facilitating the subsequent output DMA coprocessor to store the processing result of the task to be processed into the memory through DMA transmission.
In step 1023, the output DMA coprocessor stores the processing result of the task to be processed into the memory via the bus.
In step 1024, the output DMA coprocessor sends a fifth notification message to the CPMU.
In the embodiment of the invention, after storing the processing result of the task in the memory, the output DMA coprocessor sends a fifth notification message to the CPMU, notifying the CPMU that the processing result has been stored in the memory.
In step 1025, the CPMU sends a task completion response to the CPU.
In the embodiment of the invention, after the CPMU learns that the processing result of the task has been stored in the memory, it sends a task completion response to the CPU, informing the CPU that the task has been processed and that the result is in the memory. Optionally, the task completion response is an interrupt.
In step 1026, the CPU acquires the processing result from the memory according to the task completion response.
As can be seen from steps 1001 to 1026, the plurality of coprocessors transmit data through the sub-bus, which reduces the number of DMA transfers performed by the coprocessors, the time the bus is held exclusively by DMA transfers, and the impact of DMA transfers on the CPU. The number of interrupt-handling events is also reduced, so the CPU switches between kernel mode and user mode less often, reducing overhead. Using the sub-bus for data transmission also avoids temporarily storing unencrypted data in memory, improving the security of data transmission. The CPU can complete other tasks without interference throughout the whole process.
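The orchestration of steps 1003 to 1025 can be condensed into a sketch: the CPMU walks the processing order, and for each adjacent pair of coprocessors issues a send instruction to the producer and a receive instruction to the consumer, waiting for the completion notification before moving on. The function and event names below are illustrative assumptions, not part of the patent:

```python
def run_task(order, process):
    """order: coprocessor names in processing order, e.g. ["I", "CP1", "CP3", "CP2", "O"];
    process: maps a CP name to its per-stage function (DMA coprocessors just forward)."""
    log = []
    data = "task"                                    # read from memory by the input DMA coprocessor
    log.append(("dma_request", order[0]))            # first and only inbound DMA
    for src, dst in zip(order, order[1:]):
        log.append(("instruct_send", src))           # CPMU control line to the sender
        log.append(("instruct_recv", dst))           # CPMU control line to the receiver
        data = process.get(dst, lambda d: d)(data)   # dst works on what it received
        log.append(("notify_done", dst))             # dst notifies the CPMU
    log.append(("dma_request", order[-1]))           # only outbound DMA: write result back
    log.append(("task_complete_interrupt",))         # single interrupt to the CPU
    return data, log
```

Running this with the example order I, CP1, CP3, CP2, O shows exactly two DMA requests and one interrupt, matching the analysis above.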
In the embodiment of the invention, since any CP sends a notification to the CPMU after it finishes its part of a task, the CPMU can arrange the order in which the CPs process tasks according to their working states, so that every CP is used in a pipeline. Note that the time a CP spends processing a task is often comparable to the DMA transfer time of that task. If the final processing result had to be transferred back to memory by the second CP itself, that CP's operation function would sit idle during the DMA transfer, wasting its resources; likewise, without a DMA coprocessor, the first CP would have to perform the initial DMA transfer itself. If the CPU issues multiple tasks, the first CP of one task would then idle its operation function during a DMA transfer and could not serve the other tasks. To avoid this waste of CP resources, the scheme introduces the DMA coprocessor, so that in a possible scenario multiple tasks to be processed are handled in parallel across the CPs: the DMA coprocessors perform the DMA transfers while the plurality of CPs process the plurality of tasks. To simplify the following example, the control lines between the CPMU and each coprocessor, the control instructions the CPMU sends to the coprocessors, and the notification messages the coprocessors send to the CPMU are not spelled out; they are as described in the complete embodiment above. For example, the plurality of coprocessors includes an input DMA coprocessor, a first CP, a second CP, a third CP and an output DMA coprocessor, and the CPU issues three tasks to be processed whose processing orders on the coprocessors are identical.
The processing order of the coprocessors is: input DMA coprocessor, first CP, third CP, second CP, output DMA coprocessor. The input DMA coprocessor fetches the first task from memory through a DMA transfer and then passes it to the first CP through the sub-bus. While the first CP processes the first task, the input DMA coprocessor fetches the second task from memory through a DMA transfer; by the time the second task has been fetched, the first CP has finished its portion of the first task. While the input DMA coprocessor passes the second task to the first CP through the sub-bus, the first CP passes the sub-result of its portion of the first task to the third CP through the sub-bus. While the input DMA coprocessor fetches the third task from memory through a DMA transfer, the first CP processes the second task and the third CP processes the first task; by the time the third task has been fetched, the first CP has finished its portion of the second task and the third CP has finished processing the sub-result of the first CP's portion of the first task.
While the input DMA coprocessor passes the third task to the first CP through the sub-bus, the first CP passes the sub-result of its portion of the second task to the third CP, and the third CP passes the sub-result of its portion of the first task to the second CP. Then, while the second CP processes the sub-result belonging to the first task, the third CP processes the sub-result belonging to the second task and the first CP processes the third task. While the second CP passes the processing result of the first task to the output DMA coprocessor through the sub-bus, the third CP passes the sub-result of its portion of the second task to the second CP, and the first CP passes the sub-result of its portion of the third task to the third CP. While the second CP processes the sub-result belonging to the second task, the third CP processes the sub-result belonging to the third task. While the second CP passes the processing result of the second task to the output DMA coprocessor through the sub-bus, the third CP passes the sub-result of its portion of the third task to the second CP. While the output DMA coprocessor writes the processing result of the second task back into memory through a DMA transfer, the second CP processes the sub-result belonging to the third task. Finally, the second CP passes the processing result of the third task to the output DMA coprocessor, which writes it back into memory.
This example has three such tasks to be processed. It should be noted that, in practical applications, different tasks may use the coprocessors in different orders; the example above is given to facilitate understanding and does not limit the solution.
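The pipelined schedule traced above follows the classic pattern in which task k enters each stage one slot behind task k-1. Under the simplifying assumption (not stated in the text) that every stage takes one time slot, the schedule can be generated mechanically:

```python
def pipeline_schedule(stages, num_tasks):
    """Return {time_slot: [(stage, task_index), ...]} for an ideal pipeline
    where every stage takes one slot. Purely illustrative of the fig. 10
    pipeline; real stage latencies will differ."""
    schedule = {}
    for task in range(num_tasks):
        for s, stage in enumerate(stages):
            t = task + s                       # task `task` occupies stage s at slot task+s
            schedule.setdefault(t, []).append((stage, task))
    return schedule

stages = ["I", "CP1", "CP3", "CP2", "O"]       # processing order from the example
sched = pipeline_schedule(stages, 3)
# total time: num_tasks + len(stages) - 1 = 7 slots, versus 15 if run sequentially
```

At slot 2, for instance, the input DMA coprocessor is fetching task 2 while CP1 works on task 1 and CP3 on task 0, matching the overlap described in the text.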
In a possible case, if multiple coprocessors must cooperate to complete tasks, a single sub-bus forces data transfers to queue, so they cannot proceed in parallel and transfer efficiency is low. With multiple sub-buses, data transfers can be parallelized; see fig. 11. For example, suppose there are two sub-buses and the coprocessors are a DMA coprocessor, a first CP, a second CP and a third CP; to show that the DMA coprocessor has both input and output functions, fig. 11 denotes the input DMA coprocessor by I, the output DMA coprocessor by O, the first CP by CP1, the second CP by CP2 and the third CP by CP3. Fig. 11 shows two sub-buses, sub-bus 0 and sub-bus 1. To avoid conflicts between them, each CP is provided with a corresponding MUX whose input and output are both bus-width wide; the select line of each MUX is connected to the CPMU, which chooses through the MUX which sub-bus a CP will use. With two sub-buses the MUX is a two-way MUX; with three sub-buses, a three-way MUX; with N sub-buses, an N-way MUX, where N is a positive integer greater than or equal to 2. For example, if the CPMU needs to transfer data from CP3 to CP1 while simultaneously transferring data from CP2 to O, the specific steps are as follows: first, the CPMU selects sub-bus 0 for CP3 and CP1 and sub-bus 1 for CP2 and O; the CPMU then sends a transmit request to CP3 and CP2 via the control lines and a receive request to CP1 and O; CP3 and CP1 connect to sub-bus 0 and perform a handshake on it, and CP2 and O connect to sub-bus 1 and perform a handshake on it. The remaining CPs, having received no transmit or receive request, do not connect to any sub-bus.
After the handshakes complete, CP3 enters transmit mode and CP1 enters receive mode, and data flows from CP3 to CP1 over sub-bus 0; likewise, CP2 enters transmit mode and O enters receive mode, and data flows from CP2 to O over sub-bus 1. When the transfers finish, CP1, CP2, CP3 and O each send a transfer-completion signal to the CPMU. CP1 begins processing the received data; O obtains a target address from the CPMU over its control line, handshakes with the DMAC, and starts writing the received data back to the corresponding address in memory by DMA transfer. The CPMU then marks CP2 and CP3 as idle; while CP1 is computing, O is performing the DMA transfer, which allows the CPMU to control the coprocessors to process data in a pipelined manner.
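The sub-bus selection and concurrent transfers described above can be sketched as a minimal software simulation. This is an illustrative model only, assuming invented names (`Mux`, `Coprocessor`, `CPMU`, `transfer`); the actual design is hardware with MUX select lines, control lines and handshakes, not software objects.

```python
# Minimal sketch of the CPMU scheduling two concurrent sub-bus transfers.
# All class and method names are invented for illustration.

class Mux:
    """N-way MUX in front of a CP; the select line is driven by the CPMU."""
    def __init__(self, n_subbuses):
        self.n = n_subbuses
        self.selected = None          # no sub-bus connected by default

    def select(self, subbus_id):
        assert 0 <= subbus_id < self.n
        self.selected = subbus_id

class Coprocessor:
    def __init__(self, name, n_subbuses):
        self.name = name
        self.mux = Mux(n_subbuses)
        self.idle = True
        self.data = None

class CPMU:
    """Assigns one sub-bus per transfer so two transfers can run in parallel."""
    def __init__(self, n_subbuses):
        self.n = n_subbuses
        self.log = []

    def transfer(self, sender, receiver, subbus_id):
        # Select the same sub-bus for both endpoints, then "handshake" and move data.
        sender.mux.select(subbus_id)
        receiver.mux.select(subbus_id)
        receiver.data = sender.data               # data flows over the sub-bus
        self.log.append((sender.name, receiver.name, subbus_id))
        sender.idle = True                        # sender marked idle on completion

cpmu = CPMU(n_subbuses=2)
cp1, cp2, cp3 = (Coprocessor(n, 2) for n in ("CP1", "CP2", "CP3"))
out = Coprocessor("O", 2)
cp3.data, cp2.data = "partial-A", "result-B"

# The two transfers do not conflict because they use different sub-buses.
cpmu.transfer(cp3, cp1, subbus_id=0)   # CP3 -> CP1 on sub-bus 0
cpmu.transfer(cp2, out, subbus_id=1)   # CP2 -> O  on sub-bus 1
```

With more sub-buses, the same `Mux` object models an N-way MUX by raising `n_subbuses`, mirroring the N-way MUX case described above.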
Based on the same technical idea, the embodiments of the present application also provide a computer-readable storage medium storing a computer program executable by a computing device, which when run on the computing device, causes the computing device to perform the steps of the task processing method described above.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. An electronic device, comprising: a processor CPU, a coprocessor subsystem, a direct memory access controller DMAC and a memory; the CPU, the DMAC and the memory are each directly connected to a bus;
the coprocessor subsystem comprises a coprocessor management unit CPMU, a sub-bus, a coprocessor CP directly connected with the sub-bus and a DMA coprocessor; any two CPs perform data transmission through the sub-buses; the DMA coprocessor is directly connected with the sub-bus; the CPMU is not directly connected with the sub-bus;
the CPU is used for storing a task to be processed into the memory and sending a task processing request for indicating to process the task to be processed to the CPMU;
the coprocessor subsystem is used for reading the task to be processed from the memory through the DMAC via the bus;
The CPMU is used for processing the task to be processed in the coprocessor subsystem by controlling at least one CP;
the coprocessor subsystem is also used for writing the processing result of the task to be processed into the memory through the bus by the DMAC;
the CPMU is further used for sending a task completion response to the CPU;
the CPU is further used for acquiring the processing result from the memory according to the task completion response;
the CPMU is specifically configured to instruct the DMA coprocessor to send a first DMA request to the DMAC, so that the task to be processed is read from the memory through the bus and stored into the DMA coprocessor;
the DMA coprocessor is used for sending the task to be processed to a first CP through the sub-bus and acquiring the processing result from a second CP through the sub-bus; the first CP is the CP that processes the task to be processed first; the second CP is the CP that processes the task to be processed last;
the CPMU is further specifically configured to send a second DMA request to the DMAC through the DMA coprocessor, and write the processing result stored in the DMA coprocessor into the memory through the bus.
2. The electronic device of claim 1, wherein any CP is directly connected to the CPMU through a control line;
the CPMU is further used for determining a sender and a receiver in any two CPs; any two CPs perform data transmission through a handshake protocol.
3. The electronic device of claim 1 or 2, wherein there are a plurality of sub-buses; the coprocessor subsystem further comprises a multiplexer MUX; each CP is provided with a corresponding MUX;
the CPMU is further configured to designate a sub-bus for data transmission corresponding to the CP through the MUX.
4. The electronic device of claim 3, wherein there are a plurality of tasks to be processed;
the DMA coprocessor comprises an input DMA coprocessor and an output DMA coprocessor;
the CPMU is further used for controlling the plurality of tasks to be processed to be processed in parallel across the CPs.
5. The electronic device of claim 3, wherein any CP comprises a temporary storage area, an arithmetic unit, a control unit and a communication unit;
the control unit is connected with the CPMU; the communication unit is connected with the sub-bus.
6. The electronic device of claim 4, wherein the DMA coprocessor comprises a DMA unit, a first temporary storage area, a first communication unit, a first control unit, a second temporary storage area, a second control unit and a second communication unit;
The DMA unit, the first temporary storage area, the first control unit and the first communication unit form the input DMA coprocessor;
the DMA unit, the second temporary storage area, the second control unit and the second communication unit form the output DMA coprocessor.
7. The electronic device of claim 3, wherein the coprocessor subsystem further comprises a buffer coprocessor;
the buffer type coprocessor comprises a temporary storage area, a communication unit and a simplified version control unit; the simplified version control unit is used for controlling the temporary storage area and the communication unit.
8. A task processing method, applied to the electronic device as claimed in claim 1, comprising:
the CPU stores the task to be processed into a memory, and sends a task processing request for indicating to process the task to be processed to the CPMU;
the CPMU instructs a DMA coprocessor to send a first DMA request to the DMAC, so that the task to be processed is read from the memory through the bus and stored into the DMA coprocessor; the DMA coprocessor is positioned in the coprocessor subsystem and is directly connected with the sub-bus; the CPMU is not directly connected with the sub-bus;
the CPMU determines the CPs that are to process the task to be processed, controls the DMA coprocessor to send the task to be processed to a first CP through the sub-bus, and obtains the processing result from a second CP through the sub-bus; the first CP is the CP that processes the task to be processed first; the second CP is the CP that processes the task to be processed last;
after any one of the CPs finishes processing its own part of the task, it sends the sub-result of that part to the next CP through a sub-bus under the control of the CPMU;
the CPMU sends a second DMA request to the DMAC through the DMA coprocessor, and the processing result stored in the DMA coprocessor is written into the memory through the bus;
the at least one CP and the CPMU belong to the same coprocessor subsystem; the task to be processed is processed in the coprocessor subsystem;
the CPMU sends a task completion response to the CPU;
and the CPU acquires the processing result from the memory according to the task completion response.
9. The method of claim 8, wherein there are a plurality of tasks to be processed;
the CPMU further controls the plurality of tasks to be processed to be processed in parallel in the coprocessor subsystem.
10. A computer readable storage medium, characterized in that it stores a computer program executable by a computer device, which program, when run on the computer device, causes the computer device to perform the steps of the method of claim 8 or 9.
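The end-to-end flow of the method of claim 8 (CPU stores the task, the first DMA reads it into the coprocessor subsystem, a chain of CPs processes it, the second DMA writes the result back, and the CPU reads the result) can be sketched as a minimal simulation. All names here are hypothetical: memory is modeled as a dict, each DMA transfer as a copy, and each CP as a function.

```python
# Minimal end-to-end simulation of the task processing method of claim 8.
# All names are hypothetical; this is a software sketch of a hardware flow.

memory = {}

def cpu_submit(task_addr, task):
    """CPU stores the task to be processed in memory and builds a task
    processing request for the CPMU."""
    memory[task_addr] = task
    return {"task_addr": task_addr, "result_addr": "res0"}

def cpmu_handle(request, cp_chain):
    """CPMU: first DMA request reads the task from memory into the DMA
    coprocessor; each CP processes its part and forwards the sub-result;
    second DMA request writes the final result back to memory."""
    staged = memory[request["task_addr"]]      # first DMA: memory -> DMA coprocessor
    for cp in cp_chain:                        # sub-results forwarded over sub-buses
        staged = cp(staged)
    memory[request["result_addr"]] = staged    # second DMA: DMA coprocessor -> memory
    return "done"                              # task completion response to the CPU

# Hypothetical two-CP chain: CP1 doubles the value, CP2 adds one.
request = cpu_submit("task0", 10)
response = cpmu_handle(request, [lambda x: x * 2, lambda x: x + 1])
result = memory[request["result_addr"]] if response == "done" else None
```

The CPU only touches memory and the request/response pair; all CP-to-CP traffic stays inside the coprocessor subsystem, which is the point of the claimed architecture.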
CN202311405894.8A 2023-10-27 2023-10-27 Electronic equipment and task processing method Active CN117171075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311405894.8A CN117171075B (en) 2023-10-27 2023-10-27 Electronic equipment and task processing method


Publications (2)

Publication Number Publication Date
CN117171075A CN117171075A (en) 2023-12-05
CN117171075B true CN117171075B (en) 2024-02-06

Family

ID=88941622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311405894.8A Active CN117171075B (en) 2023-10-27 2023-10-27 Electronic equipment and task processing method

Country Status (1)

Country Link
CN (1) CN117171075B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0330110A2 (en) * 1988-02-25 1989-08-30 Fujitsu Limited Direct memory access controller
CN1115058A (en) * 1993-12-16 1996-01-17 国际商业机器公司 Protected programmable memory cartridge and computer system using same
CN111324558A (en) * 2020-02-05 2020-06-23 苏州浪潮智能科技有限公司 Data processing method and device, distributed data stream programming framework and related components
CN113407352A (en) * 2021-07-20 2021-09-17 北京百度网讯科技有限公司 Method, processor, device and readable storage medium for processing task
CN113515483A (en) * 2020-04-10 2021-10-19 华为技术有限公司 Data transmission method and device
CN115098245A (en) * 2022-05-31 2022-09-23 北京旷视科技有限公司 Task processing method, electronic device, storage medium, and computer program product
CN115701593A (en) * 2021-08-02 2023-02-10 辉达公司 Using a hardware sequencer in a direct memory access system of a system on a chip
CN115981833A (en) * 2021-10-15 2023-04-18 华为技术有限公司 Task processing method and device
CN116136790A (en) * 2021-11-17 2023-05-19 澜起科技股份有限公司 Task processing method and device
CN116243983A (en) * 2023-03-31 2023-06-09 昆仑芯(北京)科技有限公司 Processor, integrated circuit chip, instruction processing method, electronic device, and medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11144417B2 (en) * 2018-12-31 2021-10-12 Texas Instruments Incorporated Debug for multi-threaded processing
CN112463709A (en) * 2019-09-09 2021-03-09 上海登临科技有限公司 Configurable heterogeneous artificial intelligence processor
US20220188155A1 (en) * 2020-12-11 2022-06-16 Ut-Battelle, Llc Hierarchical task scheduling for accelerators




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant