CN115016847B - Method and device for improving pipeline throughput and electronic equipment - Google Patents

Method and device for improving pipeline throughput and electronic equipment

Info

Publication number
CN115016847B
CN115016847B CN202210941984.8A
Authority
CN
China
Prior art keywords
group
tasks
decomposition unit
registers
distributing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210941984.8A
Other languages
Chinese (zh)
Other versions
CN115016847A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd filed Critical Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202210941984.8A priority Critical patent/CN115016847B/en
Publication of CN115016847A publication Critical patent/CN115016847A/en
Application granted granted Critical
Publication of CN115016847B publication Critical patent/CN115016847B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098Register arrangements
    • G06F9/3012Organisation of register space, e.g. banked or distributed register file
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multi Processors (AREA)

Abstract

The invention provides a method and an apparatus for improving pipeline throughput, and an electronic device, and relates to data processing technology. An upper-layer decomposition unit distributes a first group of registers to a lower-layer decomposition unit, which distributes a first group of the tasks to be processed according to the first group of registers. While the lower-layer decomposition unit is distributing the first group of tasks, the upper-layer decomposition unit distributes a second group of registers to it, and the lower-layer decomposition unit temporarily stores them. In normal mode, once the lower-layer decomposition unit has finished distributing the first group of tasks, and provided the second group of registers has been fully distributed and configured, a second group of the tasks to be processed can be distributed immediately according to the second group of registers. In safe mode, the second group of tasks is distributed according to the second group of registers only after the first group of tasks has finished executing and the second group of registers has been distributed and configured. The computing unit executes the distributed tasks to be processed.

Description

Method and device for improving pipeline throughput and electronic equipment
Technical Field
The present invention relates to data processing technologies, and in particular, to a method and an apparatus for improving throughput of a pipeline, and an electronic device.
Background
The software delivers the data to be processed to the hardware in units of packets. Each packet comprises two parts: configuration registers and tasks to be processed.
In the prior art, when the upper-layer decomposition unit finishes distributing the previous packet (n) and is about to switch to the next packet (n+1), an unpredictable wait is unavoidable. All the lower-layer decomposition units are still distributing the tasks of packet (n) at this point, so if the upper-layer decomposition unit immediately began broadcasting the configuration-register information of packet (n+1), the lower-layer decomposition units would distribute the tasks of packet (n) according to the configuration of packet (n+1), which would be a serious error. The register configuration of packet (n+1) cannot be broadcast until the upper-layer decomposition unit knows that all lower-layer decomposition units have distributed the tasks of packet (n) to the computing units.
At packet-switch time, the upper-layer decomposition unit therefore has to stall the pipeline to wait for feedback from all the lower-layer decomposition units, which is a significant loss of pipeline throughput.
Disclosure of Invention
The embodiments of the present invention provide a method and an apparatus for improving pipeline throughput, and an electronic device, which can improve the throughput efficiency of a pipeline.
In a first aspect of the embodiments of the present invention, a method for improving pipeline throughput is provided, which is applied to a normal mode and includes:
distributing, by an upper-layer decomposition unit, a first group of registers to a lower-layer decomposition unit, and distributing a first group of the tasks to be processed according to the first group of registers;
while the lower-layer decomposition unit is distributing the first group of tasks, distributing, by the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit, the second group of registers being temporarily stored by the lower-layer decomposition unit;
in the normal mode, after the first group of tasks has been distributed, if the second group of registers has also been fully distributed and configured, immediately distributing a second group of the tasks to be processed according to the second group of registers;
and executing, by the computing unit, the distributed tasks to be processed.
Optionally, in a possible implementation of the first aspect, before the second group of tasks is distributed, the method further includes:
determining that the second group of registers has been fully distributed, and determining that the first group of tasks has been fully distributed.
Optionally, in a possible implementation of the first aspect, configuration of the second group of registers is completed by the time the first group of tasks has been distributed.
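The latency-hiding effect described above can be made concrete with a small numeric sketch (all timings are invented for illustration): without the staged second register set, each packet pays its register-configuration time plus its task-distribution time in series; with the staged set, packet (n+1)'s register transfer runs during packet (n)'s task distribution, so after the first packet only the longer of the two durations is paid.

```python
def serial_cost(t_reg, t_task, n_packets):
    # Prior-art pipeline: register configuration and task distribution
    # happen back-to-back for every packet.
    return n_packets * (t_reg + t_task)

def overlapped_cost(t_reg, t_task, n_packets):
    # With the staged second register set: the first packet still pays its
    # own register configuration up front, but every subsequent packet's
    # register transfer is hidden behind the previous packet's tasks.
    return t_reg + t_task + (n_packets - 1) * max(t_reg, t_task)
```

For example, with a register time of 2 and a task time of 10 over 4 packets, the serial cost is 48 while the overlapped cost is 42; the savings grow as packets get smaller and switching gets more frequent.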
In a second aspect of the embodiments of the present invention, a method for improving pipeline throughput is provided, which is applied to a safe mode and includes:
distributing, by an upper-layer decomposition unit, a first group of registers to a lower-layer decomposition unit, and distributing a first group of the tasks to be processed according to the first group of registers;
while the first group of tasks is being distributed, distributing, by the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit, the registers being temporarily stored by the lower-layer decomposition unit;
after the upper-layer decomposition unit receives the information that execution of the first group of tasks is complete, if distribution of the second group of registers is also complete, distributing a second group of the tasks to be processed according to the second group of registers;
and executing, by the computing unit, the distributed tasks to be processed.
Optionally, in a possible implementation of the second aspect, distributing a second group of the tasks to be processed according to the second group of registers after the upper-layer decomposition unit receives the information that execution of the first group of tasks is complete includes:
generating, by the lower-layer decomposition unit, sub-information on the first group of tasks executed in the computing unit, and transmitting the sub-information to the upper-layer decomposition unit;
receiving, by the upper-layer decomposition unit, all the sub-information, and determining from it that execution of the first group of tasks is complete;
and distributing a second group of the tasks to be processed according to the second group of registers.
In a third aspect of the embodiments of the present invention, a method for improving pipeline throughput is provided, including:
distributing, by an upper-layer decomposition unit, a first group of registers to a lower-layer decomposition unit, and distributing a first group of the tasks to be processed according to the first group of registers;
while the first group of tasks is being distributed, distributing, by the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit, the registers being temporarily stored by the lower-layer decomposition unit;
if the current mode is a normal mode, distributing a second group of the tasks to be processed according to the second group of registers after the first group of tasks has been distributed and the second group of registers has been distributed and configured;
if the current mode is a safe mode, after the upper-layer decomposition unit receives the information that execution of the first group of tasks is complete, and if distribution and configuration of the second group of registers is complete, distributing a second group of the tasks to be processed according to the second group of registers;
and executing, by the computing unit, the distributed tasks to be processed.
Optionally, in a possible implementation of the third aspect, after the upper-layer decomposition unit distributes the second group of registers to the lower-layer decomposition unit during distribution of the first group of tasks, the method further includes:
determining whether the current mode is the normal mode or the safe mode.
In a fourth aspect of the embodiments of the present invention, an apparatus for improving pipeline throughput is provided, which is applied to a normal mode and includes:
a first module, configured to distribute, via the upper-layer decomposition unit, a first group of registers to the lower-layer decomposition unit, and distribute a first group of the tasks to be processed according to the first group of registers;
a second module, configured to distribute, via the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit while the lower-layer decomposition unit is distributing the first group of tasks, the second group of registers being temporarily stored by the lower-layer decomposition unit;
a distribution module, configured to cause the lower-layer decomposition unit to immediately distribute a second group of the tasks to be processed according to the second group of registers after the first group of tasks has been distributed and the second group of registers has been distributed and configured;
and an execution module, configured to cause the computing unit to execute the distributed tasks to be processed.
In a fifth aspect of the embodiments of the present invention, an apparatus for improving pipeline throughput is provided, which is applied to a safe mode and includes:
a first module, configured to distribute, via the upper-layer decomposition unit, a first group of registers to the lower-layer decomposition unit, and distribute a first group of the tasks to be processed according to the first group of registers;
a second module, configured to distribute, via the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit while the lower-layer decomposition unit is distributing the first group of tasks, the second group of registers being temporarily stored by the lower-layer decomposition unit;
a normal module, configured to, if the current mode is the normal mode, distribute a second group of the tasks to be processed according to the second group of registers after the first group of tasks has been distributed and the second group of registers has been distributed;
a safety module, configured to, if the current mode is the safe mode, distribute a second group of the tasks to be processed according to the second group of registers after the upper-layer decomposition unit has received the information that execution of the first group of tasks is complete and the second group of registers has been distributed;
and an execution module, configured to cause the computing unit to execute the distributed tasks to be processed.
In a sixth aspect of the embodiments of the present invention, an electronic device is provided, including: a memory, a processor, and a computer program, where the computer program is stored in the memory, and the processor runs the computer program to perform the method according to the first, second, or third aspect of the present invention.
In a seventh aspect of the embodiments of the present invention, a readable storage medium is provided, in which a computer program is stored; the computer program, when executed by a processor, implements the method according to any of the first, second, or third aspects of the present invention.
According to the method, the apparatus, and the electronic device for improving pipeline throughput provided herein, the second group of registers is configured in the lower-layer decomposition unit while the first group of tasks is being distributed, so that configuring the registers of packet (n+1) overlaps in time with processing the tasks of packet (n). The pipeline therefore has no bubbles: the tasks of packet (n+1) are distributed immediately after the tasks of packet (n) have been distributed, waiting is eliminated, the duration of configuring the registers of packet (n+1) is hidden, and pipeline throughput is greatly improved.
Drawings
Fig. 1 is a schematic diagram of a hardware architecture according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a normal mode packet switching sequence in the prior art.
FIG. 3 is a diagram of a prior art secure mode packet switching sequence.
Fig. 4 is a flowchart illustrating a method for improving pipeline throughput according to an embodiment of the present invention.
FIG. 5 is a timing diagram of a lower decomposition unit according to an embodiment of the present invention.
Fig. 6 is a schematic diagram of a normal mode packet switching timing sequence according to an embodiment of the present invention.
Fig. 7 is a schematic diagram of a security mode packet switching timing sequence according to an embodiment of the present invention.
Fig. 8 is a flowchart illustrating a method for improving pipeline throughput according to an embodiment of the present invention.
Fig. 9 is a flowchart illustrating a method for improving pipeline throughput according to an embodiment of the present invention.
Fig. 10 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein.
It should be understood that, in the various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. "Comprising A, B and C" or "comprising A, B, C" means that all three of A, B, and C are comprised; "comprising A, B or C" means that one of A, B, and C is comprised; "comprising A, B and/or C" means that any one, any two, or all three of A, B, and C are comprised.
It should be understood that, in the present invention, "B corresponding to A", "A corresponds to B", or "B corresponds to A" means that B is associated with A and that B can be determined from A. Determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information. The matching of A and B means that the similarity between A and B is greater than or equal to a preset threshold.
As used herein, "if" may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context.
The technical means of the present invention will be described in detail with reference to specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a schematic diagram of a hardware architecture according to an embodiment of the present invention. The software delivers the data to be processed to the hardware in packets, the contents of which include two parts: configuration registers and tasks to be processed. To improve execution efficiency (hardware parallelism), the hardware parses packets at two levels. The upper-layer decomposition unit broadcasts the configuration-register information to all lower-layer decomposition units, then decomposes the tasks in the packet into thread bundles and distributes them to each lower-layer decomposition unit in sequence. Each lower-layer decomposition unit processes the thread bundles it receives according to the received configuration-register information, decomposing them into wavefronts that are delivered to the computing unit for execution.
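The two-level decomposition described above can be sketched in Python as follows; all class names, method names, and the bundle and wavefront sizes are illustrative assumptions, not the actual hardware interfaces of this embodiment.

```python
class LowerUnit:
    """Lower-layer decomposition unit: splits thread bundles into wavefronts."""

    def __init__(self):
        self.config = None
        self.wavefronts = []   # (config, wavefront) pairs queued for compute

    def load_config(self, config):
        # Receive the broadcast configuration-register information.
        self.config = config

    def process_bundle(self, bundle, wavefront_size=4):
        # Decompose a thread bundle into wavefronts, each tagged with the
        # currently loaded register configuration.
        for i in range(0, len(bundle), wavefront_size):
            self.wavefronts.append((self.config, bundle[i:i + wavefront_size]))


class UpperUnit:
    """Upper-layer decomposition unit: broadcasts registers, then tasks."""

    def __init__(self, lower_units):
        self.lower_units = lower_units

    def dispatch_packet(self, config, tasks, bundle_size=8):
        # 1) Broadcast the configuration registers to all lower-layer units.
        for lu in self.lower_units:
            lu.load_config(config)
        # 2) Split the packet's tasks into thread bundles and distribute
        #    them to the lower-layer units in sequence (round-robin).
        bundles = [tasks[i:i + bundle_size]
                   for i in range(0, len(tasks), bundle_size)]
        for idx, bundle in enumerate(bundles):
            self.lower_units[idx % len(self.lower_units)].process_bundle(bundle)
```

For instance, dispatching a 16-task packet across two lower units yields two wavefronts per unit, each carrying the broadcast configuration.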
As in the conventional pipeline design of Fig. 1, when the upper-layer decomposition unit finishes processing the previous packet (n) and is about to switch to the next packet (n+1), it faces an unpredictable wait. All the lower-layer decomposition units are still processing the tasks of packet (n) at this point, so if the upper-layer decomposition unit immediately began broadcasting the configuration-register information of packet (n+1), the lower-layer decomposition units would process the tasks of packet (n) according to the configuration of packet (n+1), which would be a serious error. The register configuration of packet (n+1) cannot be broadcast until the upper-layer decomposition unit knows that all lower-layer decomposition units have distributed the tasks of packet (n) to the computing units. Fig. 2 shows the packet-switching sequence in the normal mode.
In addition, if packet (n+1) needs to use the final result of packet (n), knowing that the lower-layer decomposition units have finished distributing the tasks of packet (n) is still not enough; it must be known that the computing units have safely finished executing all the tasks of packet (n). Fig. 3 shows the packet-switching sequence in the safe mode.
At packet-switch time, the upper-layer decomposition unit has to stall the pipeline to wait for feedback from all the lower-layer decomposition units (the computing units' execution results are also fed back to the upper-layer decomposition unit through the lower-layer decomposition units), which is an efficiency loss for pipeline throughput. If a packet contains many tasks, the time spent waiting for register distribution is a small proportion of each packet's processing time, and the loss is acceptable; if there are fewer tasks, packet switching becomes more frequent, the proportion of time spent waiting for register distribution rises, and the loss is considerable.
To solve this technical problem, the main concept of the present invention is to add a set of configuration registers to the lower-layer decomposition unit and to overlap, in time, the configuration registers of packet (n+1) with the tasks of packet (n), so that the pipeline has no bubbles: the tasks of packet (n+1) are distributed immediately after the tasks of packet (n) have been distributed, waiting is eliminated, the duration of configuring the registers of packet (n+1) is hidden, and pipeline throughput is greatly improved.
Before that, the two packet-switching modes used in this scheme need to be explained. One is the normal mode, in which the second packet does not need to use the result of the first packet, so packet switching can be completed without the upper-layer decomposition unit issuing a distribution permission. The other is the safe mode, in which the second packet needs to use the result of the first packet; only after the upper-layer decomposition unit has collected, from all the lower-layer decomposition units, the information that the computing units have finished executing can it send a distribution permission to all the lower-layer decomposition units, allowing the second packet to be distributed.
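The distinction between the two modes can be summarized as a small decision rule; the function name and boolean inputs below are illustrative assumptions, not hardware signals.

```python
NORMAL, SAFE = "normal", "safe"

def may_distribute_next(mode, regs_configured, prev_distributed, prev_executed):
    # Packet (n+1)'s registers must be staged and configured in either mode.
    if not regs_configured:
        return False
    if mode == NORMAL:
        # Normal mode: no distribution permission from the upper unit is
        # needed; it suffices that packet (n)'s tasks have been distributed.
        return prev_distributed
    # Safe mode: packet (n+1) uses packet (n)'s results, so distribution
    # also waits until every compute unit has finished executing packet (n).
    return prev_distributed and prev_executed
```

For example, with registers staged and packet (n) fully distributed but not yet executed, the rule permits switching in normal mode and forbids it in safe mode.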
Referring to Fig. 4, which is a flowchart of a method for improving pipeline throughput according to an embodiment of the present invention, the execution subject of the method shown in Fig. 4 may be a software and/or hardware device. The execution subject of the present application may include, but is not limited to, at least one of: user equipment, network equipment, and the like. The user equipment may include, but is not limited to, a computer, a smartphone, a personal digital assistant (PDA), the electronic equipment mentioned above, and the like. The network device may include, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud consisting of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: one super virtual computer composed of a group of loosely coupled computers. This embodiment does not limit this. The method for improving pipeline throughput includes steps S401 to S404, is applied to the normal mode, and specifically includes the following steps.
S401, distributing, by the upper-layer decomposition unit, a first group of registers to the lower-layer decomposition unit, and distributing a first group of the tasks to be processed according to the first group of registers.
Specifically, the upper-layer decomposition unit broadcasts the configuration-register information to all the lower-layer decomposition units, then decomposes the tasks in the packet into thread bundles and distributes them to each lower-layer decomposition unit in sequence. Each lower-layer decomposition unit processes the thread bundles it receives according to the received configuration-register information, decomposing them into wavefronts that are delivered to the computing unit for execution.
Illustratively, the tasks to be processed in this step include packet (n) and the next packet (n+1); the first group of registers is first configured in the lower-layer decomposition unit, and then packet (n) is distributed. Packet (n) is the first group of tasks mentioned above, and the next packet (n+1) is the second group of tasks. It should be noted that, in the present application, "first group of tasks" and "second group of tasks" are relative terms rather than references to specific tasks; the second group of tasks is simply the group of tasks to be processed immediately after the first group.
S402, while the first group of tasks is being distributed, distributing, by the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit, the registers being temporarily stored by the lower-layer decomposition unit.
Specifically, referring to Fig. 2, in the prior art, distributing registers and distributing tasks each take a certain amount of time. The time required to distribute tasks is usually longer, but when the tasks are light it is possible for register distribution to take longer.
In contrast, referring to Figs. 5 and 6, in the present scheme, while the lower-layer decomposition unit is distributing the first group of tasks, the upper-layer decomposition unit configures the second group of registers in the lower-layer decomposition unit, which temporarily stores them. Configuring the registers of packet (n+1) overlaps in time with processing the tasks of packet (n), so the pipeline has no bubble: after the lower-layer decomposition unit has distributed the tasks of packet (n), it immediately distributes the tasks of packet (n+1), eliminating the wait; the duration of configuring the registers of packet (n+1) is hidden, and pipeline throughput is greatly improved.
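The added register set behaves like a double buffer. A minimal ping-pong sketch (the class and attribute names are illustrative assumptions) of the temporary storage and the selection switch:

```python
class RegisterBanks:
    """Double-buffered configuration registers in a lower-layer unit."""

    def __init__(self):
        self.active = None   # configuration driving the current packet (n)
        self.staged = None   # configuration received early for packet (n+1)

    def stage(self, config):
        # Temporarily store packet (n+1)'s registers while packet (n)'s
        # tasks are still being distributed.
        self.staged = config

    def switch(self):
        # The "selection module": once packet (n)'s tasks are all
        # distributed, promote the staged bank. No reconfiguration wait.
        assert self.staged is not None, "next packet's registers not ready"
        self.active, self.staged = self.staged, None
        return self.active
```

Because `switch()` only swaps which bank is selected, packet (n+1)'s tasks can start distributing in the very next cycle after packet (n)'s last task.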
S403, distributing a second group of the tasks to be processed.
Specifically, the second group of tasks is distributed in this step, achieving seamless switching.
In some embodiments, before the second group of tasks is distributed, the method further includes determining that the second group of registers has been fully distributed and determining that the first group of tasks has been fully distributed. It will be appreciated that the second group of tasks must be distributed according to the second group of registers, so distribution can begin only once both the registers and the first group of tasks have been fully distributed.
In practical applications, configuration of the second group of registers may be completed by the time the first group of tasks has been distributed, so as to achieve seamless switching.
S404, the computing unit executes the distributed tasks to be processed.
It can be understood that seamless switching is achieved in this scheme: when packet (n) has been fully distributed, the register distribution of packet (n+1) is already complete, and the configuration takes effect simply by switching through the selection module, without waiting for the unexecuted remainder of packet (n). The register-configuration time of subsequent packets is hidden, the waiting time in each scenario is reduced to a different degree, the pipeline stays free of bubbles, and pipeline throughput is improved.
It should be noted that, in this scheme, the normal mode does not require determining whether packet (n) has finished executing.
The scheme further includes an apparatus for improving pipeline throughput, applied to the normal mode, the apparatus including:
a first module, configured to distribute, via the upper-layer decomposition unit, a first group of registers to the lower-layer decomposition unit, and distribute a first group of the tasks to be processed according to the first group of registers;
a second module, configured to distribute, via the upper-layer decomposition unit, a second group of registers to the lower-layer decomposition unit while the lower-layer decomposition unit is distributing the first group of tasks, the second group of registers being temporarily stored by the lower-layer decomposition unit;
a distribution module, configured to cause the lower-layer decomposition unit to immediately distribute a second group of the tasks to be processed according to the second group of registers after the first group of tasks has been distributed and the second group of registers has been distributed and configured;
and an execution module, configured to cause the computing unit to execute the distributed tasks to be processed.
The apparatus of the foregoing embodiment may be correspondingly used to execute the steps in the method embodiment shown in fig. 4, and the implementation principle and technical effects are similar, which are not described herein again.
Referring to Fig. 8, which is a flowchart of a method for improving pipeline throughput according to an embodiment of the present invention, the execution subject of the method shown in Fig. 8 may be a software and/or hardware device. The execution subject of the present application may include, but is not limited to, at least one of: user equipment, network equipment, and the like. The user equipment may include, but is not limited to, a computer, a smartphone, a personal digital assistant (PDA), and the electronic equipment mentioned above. The network device may include, but is not limited to, a single network server, a server group consisting of multiple network servers, or a cloud consisting of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: one super virtual computer composed of a group of loosely coupled computers. This embodiment does not limit this. The method for improving pipeline throughput includes steps S701 to S704, is applied to the safe mode, and specifically includes the following steps.
S701, distributing a first group of registers to a lower decomposition unit based on an upper decomposition unit, and distributing a first group of tasks in the tasks to be processed.
S702, in the process of distributing the first group of tasks, distributing a second group of registers to the lower decomposition unit based on the upper decomposition unit, and temporarily storing the second group of registers by the lower decomposition unit.
S703, after the upper decomposition unit receives the information that the execution of the first group of tasks is finished, and after the distribution and configuration of the second group of registers are finished, distributing a second group of tasks in the to-be-processed tasks.
Specifically, referring to fig. 7, in the secure mode, only the upper decomposition unit can collect the information of completion of execution of the computing units transmitted by all the lower decomposition units, and after the information is collected, a distribution permission can be sent to all the lower decomposition units to allow the second packet to be distributed.
In some embodiments, the distributing, after the upper decomposition unit receives the information that the execution of the first group of tasks is finished, a second group of tasks among the to-be-processed tasks includes:
generating sub information of the executed first group of tasks in the computing unit according to the lower decomposition unit, and transmitting the sub information to the upper decomposition unit;
receiving all the sub information according to the upper decomposition unit, and determining that the first group of tasks are completely executed according to all the sub information;
and if the second group of registers are distributed and configured, distributing a second group of tasks in the tasks to be processed.
It can be understood that, in the above steps, the lower decomposition unit is used to collect the corresponding sub information, and then the sub information is transmitted to the upper decomposition unit, and the upper decomposition unit performs statistics.
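The sub-information collection described above behaves like a completion barrier: each lower decomposition unit reports when its computing units finish, and only the report that completes the set unlocks distribution of the next packet. A minimal sketch, assuming hypothetical names (`UpperUnit`, `report_done`) not taken from the patent:

```python
# Completion-barrier sketch of the sub-information aggregation: the upper
# decomposition unit grants distribution permission only after every lower
# decomposition unit has reported that its computing units finished.
class UpperUnit:
    def __init__(self, num_lower_units):
        self.num_lower_units = num_lower_units
        self.reports = set()  # ids of lower units that have reported

    def report_done(self, lower_unit_id):
        """Record one lower unit's sub information; return True once the
        reports from all lower units have been collected."""
        self.reports.add(lower_unit_id)
        return len(self.reports) == self.num_lower_units

upper = UpperUnit(num_lower_units=3)
results = [upper.report_done(i) for i in range(3)]
print(results)  # permission unlocks only on the last report: [False, False, True]
```

This mirrors the statement that the upper decomposition unit, not any lower unit, performs the final statistics before permitting the second packet.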
S704, the computing unit executes the distributed tasks to be processed.
Step S701, step S702, and step S704 are similar to those in the embodiment described in fig. 4, and are not repeated herein.
It can be understood that, in the secure mode, although it is still necessary to wait for the tasks of packet (n) to finish executing, the waiting time is only the time from the completion of task distribution to the completion of execution of packet (n); the register configuration time of the subsequent packet is still hidden, and the waiting time in each scenario is reduced to a different extent, so that no bubble exists in the pipeline and the pipeline throughput is improved.
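The waiting-time claim above can be illustrated with simple arithmetic, under assumed per-packet cycle counts (all numbers here are hypothetical, not from the patent): in a fully serial scheme each packet pays for register configuration, task distribution, and execution in sequence, while in the secure mode the configuration of the next packet overlaps with the current packet's distribution and execution, so only the distribution-to-execution tail remains visible.

```python
# Illustrative cycle counts (hypothetical assumptions, not from the patent).
config, distribute, execute = 4, 6, 10

# Fully serial: every packet pays configuration + distribution + execution.
serial_per_packet = config + distribute + execute

# Secure mode: the next packet's register configuration is temporarily
# stored during the current packet's distribution/execution, so its cost
# is hidden; only distribution and the wait for execution remain visible.
secure_per_packet = distribute + execute

print(serial_per_packet, secure_per_packet)  # 20 16: the config time is hidden
```

The per-packet saving equals exactly the hidden register configuration time, which is the "waiting time optimized in different ranges" described above.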
Referring to fig. 9, which is a schematic flowchart of a method for improving pipeline throughput according to an embodiment of the present invention, an execution subject of the method shown in fig. 9 may be a software and/or hardware device. The execution subject of the present application may include, but is not limited to, at least one of: user equipment, network equipment, etc. The user equipment may include, but is not limited to, a computer, a smart phone, a Personal Digital Assistant (PDA), and the electronic devices mentioned above. The network device may include, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a cloud consisting of a large number of computers or network servers based on cloud computing, wherein cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers. The present embodiment does not limit this. The method for improving the pipeline throughput comprises steps S801 to S805, is executed in a server, and specifically comprises the following steps:
S801, distributing a first group of registers to a lower decomposition unit based on an upper decomposition unit, and distributing a first group of tasks in the to-be-processed tasks;
S802, in the process of distributing the first group of tasks, distributing a second group of registers to the lower decomposition unit based on the upper decomposition unit, and temporarily storing the second group of registers by the lower decomposition unit;
S803, if the current mode is the normal mode and the second group of registers are distributed and configured, distributing a second group of tasks in the tasks to be processed;
S804, if the current mode is the secure mode, after the upper decomposition unit receives the information that the execution of the first group of tasks is finished and the distribution and configuration of the second group of registers are finished, distributing a second group of tasks in the to-be-processed tasks;
S805, executing the distributed tasks to be processed by the computing unit.
In some embodiments, in the process of distributing the first group of tasks, after distributing the second group of registers to the lower decomposition unit based on the upper decomposition unit, the method further includes determining whether the current mode is the normal mode or the secure mode.
It can be understood that, in the present solution, whether the current mode is the normal mode or the secure mode is determined automatically; for example, the upper decomposition unit may notify the lower decomposition unit, between packets, whether the current packet belongs to the normal mode or the secure mode, and the corresponding distribution operation is then performed.
It should be noted that, in practical applications, if packet (n + 1) does not include a task, the present solution may directly overwrite the configuration of packet (n + 1) with the register configuration of packet (n + 2), which is equivalent to there being no need to process packet (n + 1) in the hardware timing.
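The empty-packet handling above can be sketched as a lookup that skips over task-less packets, so that their register configurations are simply overwritten by the next packet that actually carries tasks; the packet and field names here are illustrative assumptions, not from the patent.

```python
# Skip task-less packets: their register configuration is overwritten by
# the configuration of the next packet that actually carries tasks.
def next_register_config(packets, i):
    """Return the register configuration to apply after packet i."""
    j = i + 1
    while j < len(packets) and not packets[j]["tasks"]:
        j += 1  # packet j has no tasks; its config is covered by the next one
    return packets[j]["regs"] if j < len(packets) else None

packets = [
    {"regs": "R0", "tasks": ["t0"]},
    {"regs": "R1", "tasks": []},      # empty packet: needs no hardware slot
    {"regs": "R2", "tasks": ["t2"]},
]
print(next_register_config(packets, 0))  # R1 is skipped -> R2
```

In effect, an empty packet consumes no hardware timing slot, because its configuration never has to reach the computing units before being replaced.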
The invention also provides a device for improving the throughput of the production line, which is applied to a safe mode and comprises the following components:
the first module is used for distributing a first group of registers to the lower decomposition unit based on the upper decomposition unit and distributing a first group of tasks in the tasks to be processed according to the first group of registers;
the second module is used for distributing a second group of registers to the lower-layer decomposition unit based on the upper-layer decomposition unit in the process of distributing the first group of tasks of the lower-layer decomposition unit, and temporarily storing the second group of registers by the lower-layer decomposition unit;
the common module is used for distributing a second group of tasks in the tasks to be processed according to a second group of registers after the first group of tasks are distributed and the second group of registers are distributed if the current mode is the common mode;
the safety module is used for receiving information that the execution of the first group of tasks is finished according to the upper decomposition unit if the current mode is the safety mode, and distributing a second group of tasks in the tasks to be processed according to a second group of registers after the second group of registers are distributed;
and the execution module is used for executing the distributed tasks to be processed by the computing unit.
Referring to fig. 10, which is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present invention, the electronic device 90 includes: a processor 91, memory 92 and computer programs; wherein
A memory 92 for storing the computer program, which may also be a flash memory (flash). The computer program is, for example, an application program, a functional module, or the like that implements the above method.
A processor 91 for executing the computer program stored in the memory to implement the steps performed by the apparatus in the above method. Reference may be made in particular to the description relating to the previous method embodiments.
Alternatively, the memory 92 may be separate or integrated with the processor 91.
When the memory 92 is a device independent of the processor 91, the apparatus may further include:
a bus 93 for connecting the memory 92 and the processor 91.
The present invention also provides a readable storage medium, in which a computer program is stored, which, when being executed by a processor, is adapted to implement the methods provided by the various embodiments described above.
The readable storage medium may be a computer storage medium or a communication medium. Communication media includes any medium that facilitates transfer of a computer program from one place to another. Computer storage media may be any available media that can be accessed by a general purpose or special purpose computer. For example, a readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Additionally, the ASIC may reside in user equipment. Of course, the processor and the readable storage medium may also reside as discrete components in a communication device. The readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The present invention also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the device may read the execution instructions from the readable storage medium, and the execution of the execution instructions by the at least one processor causes the device to implement the methods provided by the various embodiments described above.
In the above embodiments of the electronic device, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and these modifications or substitutions do not depart from the spirit of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for improving pipeline throughput, characterized in that the method is applied to a normal mode, in the normal mode, a second packet can complete packet switching without using the result of a first packet and without the sending of a distribution permission by an upper decomposition unit, and the content of a packet comprises two parts: a register configuration and tasks to be processed; the method comprises the following steps:
distributing a first group of registers to a lower decomposition unit based on an upper decomposition unit, and distributing a first group of tasks in the tasks to be processed according to the first group of registers;
distributing a second group of registers to the lower decomposition unit based on the upper decomposition unit in the process of distributing the first group of tasks of the lower decomposition unit, and temporarily storing the second group of registers by the lower decomposition unit;
after the first group of tasks are distributed, if the second group of registers are distributed and configured, the lower layer decomposition unit can distribute a second group of tasks in the tasks to be processed immediately according to the second group of registers;
and the computing unit executes the distributed tasks to be processed.
2. The method of claim 1, wherein, before the lower decomposition unit distributes the second group of tasks, the method further comprises:
and determining that the second group of registers are completely distributed, and determining that the first group of tasks are completely distributed.
3. The method of claim 1 or 2, wherein if the distribution and configuration of the second group of registers are completed only after the distribution of the first group of tasks is finished, the distribution of the second group of tasks waits until the configuration of the second group of registers is completed.
4. A method for improving pipeline throughput, characterized in that it is applied to a secure mode, in the secure mode, a second packet requires the result of a first packet, and after an upper decomposition unit has collected the information on the completion of the execution of the computing units transmitted by all lower decomposition units, it sends a distribution permission to all the lower decomposition units to allow the distribution of the second packet to start; the content of a packet comprises two parts: a register configuration and tasks to be processed; the method comprises the following steps:
distributing a first group of registers to a lower decomposition unit based on an upper decomposition unit, wherein the lower decomposition unit distributes a first group of tasks in the tasks to be processed according to the first group of registers;
in the process of distributing the first group of tasks, distributing a second group of registers to a lower-layer decomposition unit based on the upper-layer decomposition unit, and temporarily storing the registers by the lower-layer decomposition unit;
receiving information that the execution of the first group of tasks is finished according to the upper-layer decomposition unit, and after the distribution and configuration of the second group of registers are finished, distributing a second group of tasks in the tasks to be processed by the lower-layer decomposition unit according to the second group of registers;
and the computing unit executes the distributed tasks to be processed.
5. The method of claim 4, wherein the receiving, according to the upper decomposition unit, the information that the execution of the first group of tasks is finished, and after the distribution and configuration of the second group of registers are finished, distributing, by the lower decomposition unit, the second group of tasks in the tasks to be processed according to the second group of registers, comprises:
generating sub information of the executed first group of tasks in the computing unit according to the lower decomposition unit, and transmitting the sub information to the upper decomposition unit;
receiving all the sub information according to the upper decomposition unit, and determining that the execution of the first group of tasks is finished according to all the sub information;
and distributing a second group of tasks in the tasks to be processed according to a second group of registers.
6. A method for improving pipeline throughput, comprising:
distributing a first group of registers to a lower layer decomposition unit based on an upper layer decomposition unit, and distributing a first group of tasks in the tasks to be processed according to the first group of registers;
distributing a second group of registers to the lower decomposition unit based on the upper decomposition unit in the process of distributing the first group of tasks of the lower decomposition unit, and temporarily storing the second group of registers by the lower decomposition unit;
if the current mode is the common mode, after the first group of tasks are distributed and the second group of registers are distributed, distributing a second group of tasks in the tasks to be processed according to the second group of registers;
if the current mode is a safe mode, receiving information of the completion of the execution of the first group of tasks according to the upper decomposition unit, and distributing a second group of tasks in the to-be-processed tasks according to a second group of registers after the second group of registers are distributed;
the computing unit executes the distributed tasks to be processed;
in the normal mode, the second packet can complete packet switching without using the result of the first packet and without the sending of a distribution permission by the upper decomposition unit; in the secure mode, the second packet needs to use the result of the first packet, and after the upper decomposition unit collects the information that the execution of the computing units transmitted by all the lower decomposition units is completed, the upper decomposition unit sends a distribution permission to all the lower decomposition units to allow the second packet to be distributed; the content of a packet includes two parts: a register configuration and tasks to be processed.
7. The method of claim 6, wherein during the first set of task distribution, after distributing the second set of registers to the lower decomposition unit based on the upper decomposition unit, further comprising:
and judging whether the current mode is a common mode or a safe mode.
8. An apparatus for improving pipeline throughput, applied to a normal mode, in which a second packet does not need to use the result of a first packet, and packet switching can be completed without the issuing of a distribution permission by an upper decomposition unit; the content of a packet includes two parts: a register configuration and tasks to be processed; the apparatus comprises:
the first module is used for distributing a first group of registers to the lower decomposition unit based on the upper decomposition unit and distributing a first group of tasks in the tasks to be processed according to the first group of registers;
the second module is used for distributing a second group of registers to the lower-layer decomposition unit based on the upper-layer decomposition unit in the process of distributing the first group of tasks of the lower-layer decomposition unit, and temporarily storing the second group of registers by the lower-layer decomposition unit;
the distribution module is used for the lower decomposition unit to distribute, immediately after the first group of tasks are distributed and the second group of registers are distributed and configured, a second group of tasks in the tasks to be processed according to the second group of registers;
and the execution module is used for executing the distributed tasks to be processed by the computing unit.
9. An apparatus for improving pipeline throughput, applied to a secure mode, in which a second packet needs to use the result of a first packet, and after an upper decomposition unit collects the information that the execution of the computing units transmitted by all lower decomposition units is completed, a distribution permission is sent to all the lower decomposition units to allow the distribution of the second packet to start; the content of a packet includes two parts: a register configuration and tasks to be processed; the apparatus comprises:
the system comprises a first module, a second module and a third module, wherein the first module is used for distributing a first group of registers to a lower-layer decomposition unit based on an upper-layer decomposition unit and distributing a first group of tasks in the tasks to be processed according to the first group of registers;
the second module is used for distributing a second group of registers to the lower-layer decomposition unit based on the upper-layer decomposition unit in the process of distributing the first group of tasks of the lower-layer decomposition unit, and temporarily storing the second group of registers by the lower-layer decomposition unit;
the common module is used for distributing a second group of tasks in the tasks to be processed according to a second group of registers after the first group of tasks are distributed and the second group of registers are distributed if the current mode is the common mode;
the safety module is used for receiving information that the execution of the first group of tasks is finished according to the upper decomposition unit if the current mode is the safety mode, and distributing a second group of tasks in the tasks to be processed according to a second group of registers after the second group of registers are distributed;
and the execution module is used for executing the distributed tasks to be processed by the computing unit.
10. An electronic device, comprising: memory, a processor and a computer program, the computer program being stored in the memory, the processor running the computer program to perform the method of any of claims 1 to 7.
CN202210941984.8A 2022-08-08 2022-08-08 Method and device for improving throughput of assembly line and electronic equipment Active CN115016847B (en)

Publications (2)

Publication Number Publication Date
CN115016847A CN115016847A (en) 2022-09-06
CN115016847B true CN115016847B (en) 2022-12-20



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant