CN114185600A - Acceleration framework generation method and device for target detection task and electronic equipment - Google Patents

Acceleration framework generation method and device for target detection task and electronic equipment

Info

Publication number
CN114185600A
Authority
CN
China
Prior art keywords
target detection
detection task
processing
flow
hardware processing
Prior art date
Legal status
Pending
Application number
CN202111336130.9A
Other languages
Chinese (zh)
Inventor
黄雷
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111336130.9A
Publication of CN114185600A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an acceleration framework generation method and device for a target detection task, and electronic equipment, and relates to the technical field of artificial intelligence, in particular to the fields of deep learning and computer vision. The specific implementation scheme is as follows: acquiring a plurality of processing flows of a target detection task; determining, according to the plurality of processing flows, N different hardware processing units occupied by the target detection algorithm flow used by the target detection task, wherein N is a positive integer; creating N pipelines according to the N different hardware processing units, wherein each hardware processing unit corresponds to one pipeline; and generating an acceleration framework for the target detection task according to the plurality of processing flows and the N pipelines. The method and the device can improve the utilization of the hardware processing units and the execution efficiency of the algorithm, so that the performance of the target detection algorithm is fully exploited.

Description

Acceleration framework generation method and device for target detection task and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence technologies, in particular to the fields of deep learning and computer vision, and more specifically to a method and an apparatus for generating an acceleration framework for a target detection task, an electronic device, and a storage medium.
Background
Target detection is a processing method that combines the segmentation and the identification of a target, and it is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, and aerospace.
At present, an AI (Artificial Intelligence) chip has a plurality of computing units, such as a CPU (Central Processing Unit), an NPU (Neural-network Processing Unit) or a GPU (Graphics Processing Unit), and an image processing unit. In the related art, when target detection is performed on frames read from a camera in real time, the detection may involve computation on several of these units, such as the CPU, the NPU, and the image processing unit. How to improve the algorithm performance of target detection has become a technical problem that urgently needs to be solved.
Disclosure of Invention
The application provides an acceleration framework generation method and device for a target detection task and electronic equipment.
According to a first aspect of the present application, there is provided an acceleration framework generation method for an object detection task, including:
acquiring a plurality of processing flows of a target detection task;
determining N different hardware processing units occupied by a target detection algorithm flow used by the target detection task according to the plurality of processing flows, wherein N is a positive integer;
creating N pipelines according to the N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline;
and generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
According to a second aspect of the present application, there is provided an acceleration framework generation apparatus for an object detection task, comprising:
the acquisition module is used for acquiring a plurality of processing flows of the target detection task;
a determining module, configured to determine, according to the multiple processing flows, N different hardware processing units occupied by a target detection algorithm flow used by the target detection task, where N is a positive integer;
the creating module is used for creating N pipelines according to the N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline;
and the generating module is used for generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the aforementioned first aspect.
According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the preceding first aspect.
According to the technical solution of the present application, the processing flows of the target detection task are obtained and the hardware processing units occupied by the target detection algorithm flow are determined, which provides a basis for subsequently creating the pipelines. One pipeline is created for each distinct hardware processing unit, and the acceleration framework of the target detection task is generated by combining the pipelines with the plurality of processing flows, so that the pipelines execute in parallel without interfering with one another and each hardware processing unit can execute processing flows continuously. Because the generated acceleration framework has one pipeline per hardware processing unit and these pipelines run in parallel, the utilization of the hardware units and the execution efficiency of the algorithm are improved, the problem that target detection performance is constrained by every link in the flow is solved, the performance bottleneck caused by over-reliance on a single hardware processing unit is overcome, and the performance of the target detection algorithm is fully exploited.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flowchart of an acceleration framework generation method for object detection task provided in the present application;
FIG. 2 is a flowchart of another acceleration framework generation method for a target detection task according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of another acceleration framework generation method for object detection task according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of another acceleration framework generation method for object detection task according to an embodiment of the present disclosure;
FIG. 5 is an exemplary diagram of an acceleration framework for object detection tasks provided by an embodiment of the present application;
FIG. 6 is a block diagram illustrating an accelerating framework generating apparatus for object detection task according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of another acceleration framework generation apparatus for object detection task according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of another acceleration framework generation apparatus for object detection task according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of another acceleration framework generation apparatus for object detection task according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments of the application for the understanding of the same, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Currently, an AI (Artificial Intelligence) chip has a plurality of computing units, such as a CPU (Central Processing Unit), an NPU (Neural-network Processing Unit) or a GPU (Graphics Processing Unit), and an image processing unit. In the related art, the conventional scheme for real-time target detection on frames read from a camera executes the algorithm flow serially, and the flow may involve computation on several computing units such as the CPU, the NPU, and the image processing unit. Because the flow invokes each computing unit only when it is needed, the hardware computing units sit idle much of the time. This idling prevents balanced use of hardware resources, leaves the performance of the algorithm constrained by every link in the flow, and keeps hardware throughput low. In addition, when a certain step or processing unit fails unexpectedly, the whole flow goes down, so a local problem has a large impact on the whole. As a result, the performance of the target detection algorithm is not fully exploited.
In view of the above problems, the present application provides an acceleration framework generation method for a target detection task. It should be noted that the method according to the embodiments of the present application is applicable to the acceleration framework generation device for a target detection task according to the embodiments of the present application, and that this device may be configured on an electronic device. FIG. 1 is a flowchart of an acceleration framework generation method for a target detection task provided in the present application. As shown in FIG. 1, the method includes the following steps:
Step 101, acquiring a plurality of processing flows of a target detection task.
In this embodiment of the present application, the target detection task may be a task of performing target detection on a plurality of images in a certain video, and detecting whether a target object is included in the video.
As an example, the target detection task is to detect whether an object B appears in the video a, and the processing flow of the target detection task is a specific operation flow of detecting whether the object B is included in the video a.
It should be noted that the main flow of the target detection algorithm can be roughly divided into four steps:
(1) Image data acquisition: the original image data to be processed by the target detection algorithm is acquired from a camera, a network, or the like. This step mostly runs on the CPU or an image generation unit.
(2) Image data preprocessing: the original image data is processed by image processing, quantization, or normalization to obtain data that is easier to process. The computation in this step mostly runs on the CPU or an image processing unit.
(3) Image data inference: the preprocessed data is fed to the target detection algorithm for inference. Because the amount of computation is large, this step runs on a GPU or NPU.
(4) Inference result analysis: the output of the target detection model is processed by regression, filtering, noise reduction, and the like to obtain the target detection result. This computation runs on the CPU.
In the embodiment of the present application, the processing flow may be split for the target detection task based on the steps of the target detection algorithm, so as to obtain a plurality of processing flows of the target detection task.
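The split described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Step` class, the `DETECTION_STEPS` table, the function `build_processing_flows`, and the particular unit assigned to each step are all assumptions chosen to mirror the four steps listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    unit: str  # hardware processing unit assumed to run this step


# Hypothetical assignment of the four algorithm steps to hardware units.
DETECTION_STEPS = [
    Step("acquire", "CPU"),
    Step("preprocess", "CPU"),
    Step("infer", "NPU"),
    Step("analyze", "CPU"),
]


def build_processing_flows(num_frames):
    """Return one processing flow (an ordered list of steps) per frame."""
    return {frame_id: list(DETECTION_STEPS) for frame_id in range(num_frames)}


flows = build_processing_flows(3)
```

Each entry of `flows` is the ordered step list for one frame, which is the input the later steps use to find the occupied hardware units.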
Step 102, determining N different hardware processing units occupied by a target detection algorithm flow used by the target detection task according to the plurality of processing flows, wherein N is a positive integer.
For example, in response to the triggering of the target detection task, three processing flows of the target detection task are obtained, assuming that the hardware processing unit occupied by the processing flow 1 is a GPU, assuming that the hardware processing unit occupied by the processing flow 2 is an NPU, and assuming that the hardware processing unit occupied by the processing flow 3 is a GPU, it can be determined that the three processing flows occupy two different hardware processing units.
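The counting in this example can be sketched in a few lines. The function name `distinct_units` and the dictionary layout are illustrative assumptions; the flow-to-unit assignment is the one given in the example above.

```python
def distinct_units(flows):
    """Collect the distinct hardware units used by all flows, in first-seen order."""
    seen = []
    for units in flows.values():
        for unit in units:
            if unit not in seen:
                seen.append(unit)
    return seen


# Flow-to-unit assignment taken from the example: flows 1 and 3 use the GPU, flow 2 the NPU.
flows = {"flow1": ["GPU"], "flow2": ["NPU"], "flow3": ["GPU"]}
units = distinct_units(flows)
n = len(units)
```

Here `n` is the value N from step 102: three flows, but only two distinct hardware processing units.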
Step 103, creating N pipelines according to the N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline.
For example, in response to the triggering of the target detection task, three processing flows of the target detection task are obtained, assuming that the hardware processing unit occupied by the processing flow 1 is a GPU, assuming that the hardware processing unit occupied by the processing flow 2 is a GPU, and assuming that the hardware processing unit occupied by the processing flow 3 is an NPU, it can be determined that the three processing flows occupy two different hardware processing units, and two pipelines are created according to the GPU and the NPU, namely, a GPU pipeline and an NPU pipeline.
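One possible shape for "one pipeline per hardware processing unit" is a worker thread with its own task queue. The sketch below assumes that design; the `Pipeline` class and `create_pipelines` function are hypothetical names, and the tasks are plain Python callables standing in for real GPU or NPU work.

```python
import queue
import threading


class Pipeline:
    """A task queue served by one worker thread, standing in for one hardware unit."""

    def __init__(self, unit):
        self.unit = unit
        self.tasks = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            task = self.tasks.get()
            if task is None:  # sentinel: stop the worker
                break
            task()

    def start(self):
        self.thread.start()

    def submit(self, task):
        self.tasks.put(task)

    def stop(self):
        self.tasks.put(None)
        self.thread.join()


def create_pipelines(units):
    """Create exactly one pipeline for each distinct hardware unit."""
    return {unit: Pipeline(unit) for unit in units}


pipelines = create_pipelines(["GPU", "NPU"])
results = []
for p in pipelines.values():
    p.start()
pipelines["GPU"].submit(lambda: results.append("gpu-task"))
pipelines["NPU"].submit(lambda: results.append("npu-task"))
for p in pipelines.values():
    p.stop()
```

The two workers run independently, so a slow task on one pipeline does not block the other.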
Step 104, generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
It should be noted that the acceleration framework of the target detection task has many application scenarios. As an example, suppose the target detection task is to detect whether an object B appears in a video A. Frame images are obtained from the video A and input into the acceleration framework one by one, each frame image is processed in the N pipelines according to its processing flow, and the acceleration framework finally outputs whether the object B is present in each frame image, from which it can be determined whether the object B appears in the video A.
According to the acceleration framework generation method for the target detection task, the processing flows of the target detection task are obtained and the hardware processing units occupied by the target detection algorithm flow are determined, which provides a basis for subsequently creating the pipelines. One pipeline is created for each distinct hardware processing unit, and the acceleration framework of the target detection task is generated by combining the pipelines with the plurality of processing flows, so that the pipelines execute in parallel without interfering with one another and each hardware processing unit can execute processing flows continuously.
For example, in the related art, after the CPU preprocesses one piece of original image data, it must wait until the detection result for that data has been obtained through inference and analysis before it can preprocess new original image data. In the present application, after the CPU preprocesses one piece of original image data, it can start preprocessing new original image data immediately, without waiting for the detection result of the previous data. Because the generated acceleration framework has one pipeline per hardware processing unit and these pipelines run in parallel without interfering with one another, the utilization of the hardware units and the execution efficiency of the algorithm are improved, the problem that target detection performance is constrained by every link in the flow is solved, the performance bottleneck caused by over-reliance on a single hardware processing unit is overcome, and the performance of the target detection algorithm is fully exploited.
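To give a rough sense of the difference between the serial scheme and the pipelined one, the following back-of-the-envelope model may help. It is only an idealized estimate: the per-frame busy times are invented numbers, data transfer between units is ignored, and the busiest unit is assumed never to wait for input once the pipeline is full.

```python
def serial_time(unit_load_ms, frames):
    """Serial scheme: every frame occupies every unit in turn before the next frame starts."""
    return frames * sum(unit_load_ms.values())


def pipelined_time_estimate(unit_load_ms, frames):
    """Idealized pipelined scheme: after the first frame, the busiest unit sets the pace."""
    if frames == 0:
        return 0
    return sum(unit_load_ms.values()) + (frames - 1) * max(unit_load_ms.values())


# Hypothetical per-frame busy time of each unit, in milliseconds.
load = {"CPU": 5, "NPU": 10}
serial = serial_time(load, 100)
pipelined = pipelined_time_estimate(load, 100)
```

With these assumed numbers the serial scheme takes 1500 ms for 100 frames and the pipelined estimate is 1005 ms, since the CPU work for later frames overlaps with NPU inference on earlier ones.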
It should be noted that, in some embodiments of the present application, the target detection task may be split into steps, and the processing flows of the target detection task are obtained from the result of that split. FIG. 2 is a flowchart of another acceleration framework generation method for a target detection task according to an embodiment of the present application.
As shown in fig. 2, the acceleration framework generation method for the object detection task includes the following steps:
Step 201, in response to the triggering of the target detection task, splitting the target detection task into steps.
As an example, the target detection task is to perform target detection on multiple images in a certain video. Splitting the task into steps gives the following result: the images are extracted from the video, image data preprocessing is performed on each image, image data inference is performed on the preprocessed result, and finally the inference result is analyzed to obtain the target detection result.
As another example, the target detection task is to detect whether an object B appears in a video A, and the task is split into the following steps: (1) acquiring a plurality of original images from the video A; (2) preprocessing the original images, where the preprocessing includes image processing, quantization, normalization, or the like, to obtain data that is easier to process; (3) performing algorithm inference on the preprocessed data; (4) performing regression, filtering, noise reduction, and other processing on the inference result to obtain the target detection result.
Step 202, obtaining a plurality of processing flows of the target detection task according to the step splitting result of the target detection task.
As an example, the target detection task is to detect whether an object B appears in a video A, and the task is split into the following steps: (1) acquiring a plurality of original images from the video A; (2) preprocessing the original images, where the preprocessing includes image processing, quantization, normalization, or the like, to obtain data that is easier to process; (3) performing algorithm inference on the preprocessed data; (4) performing regression, filtering, noise reduction, and other processing on the inference result to obtain the target detection result. Once this split has been obtained, a plurality of processing flows can be determined from it, where each processing flow corresponds to the target detection flow of one original image. Taking one processing flow as an example: an extracted original image is preprocessed, algorithm inference is performed on the preprocessed data, and regression, filtering, noise reduction, and other processing are performed on the inference result to obtain the target detection result of that original image.
Step 203, determining N different hardware processing units occupied by a target detection algorithm flow used by the target detection task according to the plurality of processing flows, wherein N is a positive integer.
In the embodiment of the present application, step 203 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 204, creating N pipelines according to N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline.
In the embodiment of the present application, step 204 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 205, generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
In the embodiment of the present application, step 205 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
According to the acceleration framework generation method for the target detection task, splitting the target detection task into steps makes it convenient to obtain the plurality of processing flows of the task. This provides the basis for creating the pipelines from the processing flows, which in turn raises the utilization of each hardware processing unit and the execution speed of the algorithm, and ensures that the acceleration framework of the target detection task can be generated smoothly.
It should be noted that in some embodiments of the present application, different pipelines may be created by determining hardware processing units. Fig. 3 is a flowchart of another acceleration framework generation method for an object detection task according to an embodiment of the present application.
As shown in fig. 3, the acceleration framework generation method for the object detection task includes the following steps:
Step 301, acquiring a plurality of processing flows of the target detection task.
In the embodiment of the present application, step 301 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 302, splitting each processing flow into steps according to the different hardware computing resources occupied by the process executed in each step.
Step 303, after splitting, connecting the steps of each processing flow in series again to obtain the hardware processing units occupied by each processing flow.
Step 304, counting the hardware processing units occupied by each processing flow, and determining, according to the counting result, the N different hardware processing units occupied by the target detection algorithm flow used by the target detection task.
Step 305, creating N pipelines according to N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline.
As an implementation, the algorithm flow is subdivided and the hardware processing unit occupied by each step is marked. The pipelines for the different hardware processing units may be created using multithreading or multiprocessing techniques.
It should be noted that the split steps of each processing flow are added, in their execution order, to the pipeline that corresponds to the hardware computing resource each step occupies.
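The assignment of split steps to per-unit pipelines can be sketched as a simple grouping. The data layout below, in which each flow is a list of `(step_name, unit)` pairs, and the function name `assign_steps_to_pipelines` are assumptions made for illustration only.

```python
from collections import defaultdict


def assign_steps_to_pipelines(flows):
    """Group every step of every flow under the hardware unit it occupies.

    Steps are appended in execution order, and each entry records which flow
    it belongs to and its position within that flow.
    """
    pipelines = defaultdict(list)
    for flow_id, steps in flows.items():
        for order, (step_name, unit) in enumerate(steps):
            pipelines[unit].append((flow_id, order, step_name))
    return dict(pipelines)


# Hypothetical flows: each step is marked with the unit it is assumed to occupy.
flows = {
    "frame0": [("preprocess", "CPU"), ("infer", "NPU"), ("analyze", "CPU")],
    "frame1": [("preprocess", "CPU"), ("infer", "NPU"), ("analyze", "CPU")],
}
pipelines = assign_steps_to_pipelines(flows)
```

The number of keys in `pipelines` is N, and the recorded flow id and step position are what a later stage would use to reconnect the steps of one flow across pipelines.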
Step 306, generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
In the embodiment of the present application, step 306 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
According to the acceleration framework generation method for the target detection task, the N different hardware processing units occupied by the target detection algorithm flow are determined from the hardware processing units occupied by the individual processing flows, which fixes the number of pipelines to be created. The pipelines then execute in parallel without interfering with one another, which improves the utilization of the hardware processing units and the execution efficiency of the algorithm.
It should be noted that fig. 4 is a flowchart of another acceleration framework generation method for an object detection task according to an embodiment of the present application.
As shown in FIG. 4, in some embodiments of the present application, a producer-consumer relationship is added to the pipelines on the basis of the input-output relationship between steps, so as to obtain the acceleration framework of the target detection task. The acceleration framework generation method for the target detection task includes the following steps:
Step 401, acquiring a plurality of processing flows of the target detection task.
In the embodiment of the present application, step 401 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 402, determining N different hardware processing units occupied by a target detection algorithm flow used by a target detection task according to a plurality of processing flows, wherein N is a positive integer.
In the embodiment of the present application, step 402 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 403, creating N pipelines according to N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline.
In the embodiment of the present application, step 403 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 404, analyzing the input-output relationship of each processing flow in each pipeline according to the relationship among the steps of each processing flow.
As an implementation, the IO (Input/Output) relationship of each pipeline is analyzed according to the target detection algorithm flow and converted into a producer-consumer relationship. The producer-consumer relationship is added to the pipelines according to the steps of each processing flow, where the preceding step is the producer and the following step is the consumer.
Step 405, adding a producer-consumer relationship to each pipeline according to the input-output relationship of each processing flow in each pipeline, to obtain the acceleration framework of the target detection task.
In one implementation, as shown in FIG. 5, the acceleration framework of the target detection task is organized along two directions, transverse and longitudinal: (1) Transverse: the framework is divided into producer services and consumer services that match the IO of each processing unit, so that each producer is paired with the consumer of its output. (2) Longitudinal: a plurality of pipelines are allocated according to the number of hardware processing units, and the pipelines execute in parallel without interfering with one another.
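A minimal sketch of such a framework, assuming one worker thread and one inbox queue per hardware unit, is shown below. The step functions are trivial placeholders (a single number stands in for an image), and the names `STEPS` and `run_framework` are hypothetical. Each step acts as the producer for the step after it by putting its output into the inbox of the pipeline that owns the next step.

```python
import queue
import threading

# Hypothetical steps: (name, unit, function). The functions are placeholders only.
STEPS = [
    ("preprocess", "CPU", lambda pixel: pixel / 255.0),
    ("infer", "NPU", lambda value: value > 0.5),
    ("analyze", "CPU", lambda hit: "B present" if hit else "B absent"),
]


def run_framework(frames):
    units = sorted({unit for _, unit, _ in STEPS})
    inboxes = {unit: queue.Queue() for unit in units}  # one inbox per pipeline
    results = {}
    done = threading.Event()

    def worker(unit):
        while True:
            item = inboxes[unit].get()
            if item is None:  # sentinel: shut this pipeline down
                return
            frame_id, step_idx, data = item
            out = STEPS[step_idx][2](data)
            nxt = step_idx + 1
            if nxt < len(STEPS):
                # Producer side: hand the output to the pipeline owning the next step.
                inboxes[STEPS[nxt][1]].put((frame_id, nxt, out))
            else:
                results[frame_id] = out
                if len(results) == len(frames):
                    done.set()

    threads = [threading.Thread(target=worker, args=(u,)) for u in units]
    for t in threads:
        t.start()
    for frame_id, pixel in enumerate(frames):
        inboxes[STEPS[0][1]].put((frame_id, 0, pixel))
    if frames:
        done.wait()
    for u in units:
        inboxes[u].put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(frames))]


outputs = run_framework([200, 30, 255])
```

Because the CPU worker can preprocess the next frame while the NPU worker is still running inference on the previous one, neither unit has to wait for a frame to finish the whole flow. Only the worker that owns the final step writes to `results`, which keeps the sketch free of extra locking.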
According to the acceleration framework generation method for the target detection task, a producer-consumer relationship is added to each pipeline. Combined with the pipelines, this allows each step of a processing flow to run independently and efficiently, keeps data processing synchronized and consistent, and improves the timeliness of target detection. The utilization of the hardware processing units and the execution efficiency of the algorithm are thereby improved, the problem that target detection performance is constrained by every link in the flow is solved, the performance bottleneck caused by over-reliance on a single hardware processing unit is overcome, and the performance of the target detection algorithm is fully exploited.
In order to implement the above embodiments, the present application further provides an acceleration framework generation apparatus for an object detection task.
Fig. 6 is a block diagram of an acceleration framework generating apparatus for an object detection task according to an embodiment of the present application. As shown in fig. 6, the acceleration framework generating apparatus for the object detection task may include: an acquisition module 610, a determination module 620, a creation module 630, and a generation module 640.
The obtaining module 610 is configured to obtain a plurality of processing flows of the target detection task.
The determining module 620 is configured to determine, according to the multiple processing flows, N different hardware processing units occupied by a target detection algorithm flow used by the target detection task, where N is a positive integer.
A creating module 630, configured to create N pipelines according to N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline.
A generating module 640 for generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines
According to the acceleration framework generation apparatus for the target detection task, the processing flows of the target detection task are obtained and the hardware processing units occupied by the target detection algorithm flow are determined, which provides the basis for subsequently creating the pipelines. The pipelines are then created according to the different hardware processing units and combined with the plurality of processing flows to generate the acceleration framework of the target detection task. The pipelines execute in parallel without interfering with one another, so each hardware processing unit can execute processing flows continuously.
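As a non-limiting illustration, the cooperation of the four modules might be sketched as follows. The names `Flow`, `Pipeline`, `acquire_flows`, `determine_units`, `create_pipelines`, and `generate_framework`, the example task, and the simplification that each processing flow occupies exactly one hardware processing unit are all assumptions made for this sketch and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """One processing flow of the detection task (hypothetical model)."""
    name: str
    unit: str  # hardware processing unit the flow occupies, e.g. "CPU"


@dataclass
class Pipeline:
    """One pipeline, bound to exactly one hardware processing unit."""
    unit: str
    flows: list = field(default_factory=list)


def acquire_flows(task):
    # Obtaining module: the task is assumed to arrive as (name, unit) pairs.
    return [Flow(name, unit) for name, unit in task]


def determine_units(flows):
    # Determining module: the N distinct units, in order of first use.
    return list(dict.fromkeys(flow.unit for flow in flows))


def create_pipelines(units):
    # Creating module: one pipeline per hardware processing unit.
    return {unit: Pipeline(unit) for unit in units}


def generate_framework(flows, pipelines):
    # Generating module: place each flow on the pipeline of its unit,
    # preserving the execution order of the flows.
    for flow in flows:
        pipelines[flow.unit].flows.append(flow.name)
    return pipelines


# Hypothetical task description used only for this example.
task = [("decode", "CPU"), ("preprocess", "CPU"),
        ("inference", "GPU"), ("postprocess", "CPU")]
flows = acquire_flows(task)
units = determine_units(flows)
framework = generate_framework(flows, create_pipelines(units))
```

In this sketch N equals 2, and the resulting framework holds one CPU pipeline and one GPU pipeline.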
For example, in the related art, after the CPU preprocesses a piece of original image data, it must wait until the detection result has been obtained from the preprocessing result through operations such as inference and analysis before it can preprocess new original image data. In the present application, after the CPU preprocesses the original image data, it can directly process new original image data without waiting for the detection result of the earlier data. The acceleration framework generated in the present application therefore has a plurality of pipelines, one for each hardware processing unit, and the pipelines execute in parallel without interfering with one another. This improves the utilization of the hardware processing units and the execution efficiency of the algorithm, removes the constraint that each link in the flow places on target detection performance, overcomes the performance bottleneck caused by excessive dependence on a single hardware processing unit, and allows the target detection algorithm to reach its full performance.
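The benefit described in this example can be estimated with a simple idealized model. The sketch below assumes a linear chain of stages, each running on its own hardware processing unit, and ignores data transfer and scheduling overhead. The stage durations (4 ms for CPU preprocessing, 10 ms for GPU inference) and the function names are hypothetical values chosen for illustration, not measurements from the application.

```python
def serial_time(stage_ms, n_frames):
    # Related-art behavior: a frame passes through every stage
    # before the next frame is started.
    return n_frames * sum(stage_ms)


def pipelined_time(stage_ms, n_frames):
    # Pipelined behavior: the first frame pays the full latency, and every
    # later frame completes one bottleneck-stage interval after the previous.
    return sum(stage_ms) + (n_frames - 1) * max(stage_ms)


stage_ms = [4, 10]   # hypothetical: CPU preprocessing, GPU inference
n_frames = 100

print(serial_time(stage_ms, n_frames))     # 1400
print(pipelined_time(stage_ms, n_frames))  # 1004
```

Under these assumptions the total time is bounded by the slowest stage rather than by the sum of all stages, which is the effect the paragraph above attributes to parallel pipelines.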
In some embodiments of the present application, fig. 7 is a block diagram of an acceleration framework generating apparatus for an object detection task according to another embodiment of the present application. As shown in fig. 7, the obtaining module 710 in the acceleration framework generating apparatus for the object detection task may include: a first splitting unit 711 and an obtaining unit 712.
The first splitting unit 711 is configured to perform step splitting on a target detection task in response to a trigger of the target detection task.
The obtaining unit 712 is configured to obtain multiple processing flows of the target detection task according to the step splitting result of the target detection task.
According to the acceleration framework generation apparatus for the target detection task, splitting the target detection task into steps makes it easy to obtain the plurality of processing flows of the task. This provides a basic guarantee that the pipelines can be created smoothly from the processing flows, so that the utilization of each hardware processing unit and the execution efficiency of the algorithm can subsequently be improved, and it ensures that the acceleration framework of the target detection task is generated smoothly.
Here, modules 710-740 in fig. 7 have the same functions and structures as modules 610-640 in fig. 6.
In some embodiments of the present application, fig. 8 is a block diagram of an acceleration framework generating apparatus for an object detection task according to another embodiment of the present application. As shown in fig. 8, the determining module 820 in the acceleration framework generating apparatus for the object detection task may include: a second splitting unit 821, a concatenation unit 822, and a statistics unit 823.
The second splitting unit 821 is configured to split each processing flow in steps according to different hardware computing resources occupied by the processes executed in the steps.
The concatenation unit 822 is configured to reconnect the steps of each processing flow in series after splitting, so as to obtain the hardware processing unit occupied by each processing flow.
The statistical unit 823 is configured to perform statistics on the hardware processing units occupied by each processing flow, and determine, according to a statistical result, N different hardware processing units occupied by the target detection algorithm flow used by the target detection task.
It should be noted that, after the N pipelines are created according to the N different hardware processing units, the split steps of each processing flow are added to the corresponding pipelines according to the execution order of the steps and the hardware computing resources occupied by the processes executed in the steps.
According to the acceleration framework generation apparatus for the target detection task, the N different hardware processing units occupied by the target detection algorithm flow used by the target detection task are determined from the hardware processing units occupied by the individual processing flows, which fixes the number of pipelines to be created. The pipelines execute in parallel without interfering with one another, which improves the utilization of the hardware processing units and the execution efficiency of the algorithm.
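One possible reading of the splitting, concatenation, and statistics operations is sketched below. Each processing flow is modeled as an ordered list of (step name, hardware unit) pairs. Consecutive steps that occupy the same unit are grouped into a segment, the distinct units are counted across all flows, and the segments are appended to the pipeline of their unit in execution order. The function names, the step names, and the data layout are assumptions for illustration only.

```python
from itertools import groupby


def split_by_unit(flow_steps):
    """Split one flow into runs of consecutive steps sharing a hardware unit."""
    return [(unit, [name for name, _ in run])
            for unit, run in groupby(flow_steps, key=lambda step: step[1])]


def count_units(split_flows):
    """Collect the distinct units over all flows, in order of first use."""
    return list(dict.fromkeys(
        unit for segments in split_flows for unit, _ in segments))


def fill_pipelines(split_flows, units):
    """Create one pipeline per unit and append segments in execution order."""
    pipelines = {unit: [] for unit in units}
    for segments in split_flows:
        for unit, names in segments:
            pipelines[unit].extend(names)
    return pipelines


# Hypothetical processing flows: preprocessing, inference, postprocessing.
flows = [
    [("decode", "CPU"), ("resize", "CPU"), ("normalize", "GPU")],
    [("inference", "GPU")],
    [("nms", "CPU"), ("draw_boxes", "CPU")],
]
split_flows = [split_by_unit(flow) for flow in flows]
units = count_units(split_flows)          # N = len(units)
pipelines = fill_pipelines(split_flows, units)
```

Here the preprocessing flow is split into a CPU segment and a GPU segment, N is 2, and the GPU pipeline receives the `normalize` step ahead of `inference` because that is their execution order.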
Wherein 810-840 in FIG. 8 and 710-740 in FIG. 7 have the same functions and structures.
In some embodiments of the present application, fig. 9 is a block diagram of an acceleration framework generating apparatus for a target detection task according to another embodiment of the present application. As shown in fig. 9, the generating module 940 in the acceleration framework generating apparatus for the target detection task may include: a relationship determining unit 941 and a generating unit 942.
The relationship determining unit 941 is configured to analyze an input/output relationship of each processing flow in each pipeline according to a relationship between respective steps of each processing flow.
The generating unit 942 is configured to add a producer-consumer relationship to each pipeline according to an input-output relationship of each processing flow in each pipeline, so as to obtain an acceleration framework of the target detection task.
According to the acceleration framework generation apparatus for the target detection task, a producer-consumer relationship is added to each pipeline, so that each step of the processing flow can run independently and efficiently on its pipeline while data processing stays synchronized and consistent. This improves the timeliness of target detection, the utilization of the hardware processing units, and the execution efficiency of the algorithm. It also removes the constraint that each link in the flow places on target detection performance, overcomes the performance bottleneck caused by excessive dependence on a single hardware processing unit, and allows the target detection algorithm to reach its full performance.
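A producer-consumer relationship between two pipelines can be illustrated with thread-safe queues. In the minimal sketch below, the output of one stage is the input of the next, each stage runs in its own thread, and a sentinel value signals the end of the stream. The functions `preprocess` and `infer` are placeholders standing in for CPU and GPU work, and the queue sizes and names are assumptions for this example rather than details from the application.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream


def run_stage(work, inbox, outbox):
    # Each stage consumes from `inbox` and produces into `outbox`.
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown downstream
            break
        outbox.put(work(item))


def preprocess(frame):
    # Placeholder for CPU preprocessing.
    return frame * 2


def infer(tensor):
    # Placeholder for GPU inference.
    return tensor + 1


# The bounded middle queue keeps the producer from running far ahead.
raw, ready, detected = queue.Queue(), queue.Queue(maxsize=4), queue.Queue()
workers = [
    threading.Thread(target=run_stage, args=(preprocess, raw, ready)),
    threading.Thread(target=run_stage, args=(infer, ready, detected)),
]
for worker in workers:
    worker.start()
for frame in range(5):
    raw.put(frame)
raw.put(SENTINEL)
for worker in workers:
    worker.join()

results = []
while True:
    item = detected.get()
    if item is SENTINEL:
        break
    results.append(item)
print(results)  # [1, 3, 5, 7, 9]
```

Because each stage has a single worker and the queues are first-in first-out, results leave the framework in the order the frames entered, while the preprocessing stage is free to start on the next frame as soon as it has handed the previous one to the queue.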
Wherein 910-940 in FIG. 9 and 810-840 in FIG. 8 have the same functions and structures.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 10 is a block diagram of an electronic device for the acceleration framework generation method for an object detection task according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 10, the electronic device includes: one or more processors 1001, a memory 1002, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 10 illustrates an example with one processor 1001.
The memory 1002 is a non-transitory computer readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for accelerated framework generation for object detection tasks provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method for accelerated framework generation for target detection tasks provided herein.
The memory 1002, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method for accelerated framework generation for object detection tasks in the embodiments of the present application (e.g., the obtaining module 610, the determining module 620, the creating module 630, and the generating module 640 shown in fig. 6). The processor 1001 executes various functional applications of the server and data processing, i.e., implements the method for accelerated framework generation for object detection tasks in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 1002.
The memory 1002 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created through use of the electronic device for acceleration framework generation for the object detection task, and the like. Further, the memory 1002 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1002 may optionally include memory located remotely from the processor 1001, which may be connected over a network to the electronic device for acceleration framework generation for the object detection task. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for accelerated framework generation for object detection tasks may further comprise: an input device 1003 and an output device 1004. The processor 1001, the memory 1002, the input device 1003, and the output device 1004 may be connected by a bus or other means, and the bus connection is exemplified in fig. 10.
The input device 1003 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for acceleration framework generation for the object detection task. Examples of the input device include a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 1004 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The Server can be a cloud Server, also called a cloud computing Server or a cloud host, and is a host product in a cloud computing service system, so as to solve the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service ("Virtual Private Server", or simply "VPS"). The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to the technical solution of the embodiments of the present application, the processing flows of the target detection task are obtained and the hardware processing units occupied by the target detection algorithm flow are determined, which provides the basis for subsequently creating the pipelines. The pipelines are then created according to the different hardware processing units and combined with the plurality of processing flows to generate the acceleration framework of the target detection task. The pipelines execute in parallel without interfering with one another, so each hardware processing unit can execute processing flows continuously.
For example, in the related art, after the CPU preprocesses a piece of original image data, it must wait until the detection result has been obtained from the preprocessing result through operations such as inference and analysis before it can preprocess new original image data. In the present application, after the CPU preprocesses the original image data, it can directly process new original image data without waiting for the detection result of the earlier data. The acceleration framework generated in the present application therefore has a plurality of pipelines, one for each hardware processing unit, and the pipelines execute in parallel without interfering with one another. This improves the utilization of the hardware processing units and the execution efficiency of the algorithm, removes the constraint that each link in the flow places on target detection performance, overcomes the performance bottleneck caused by excessive dependence on a single hardware processing unit, and allows the target detection algorithm to reach its full performance.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (13)

1. An acceleration framework generation method for an object detection task, comprising:
acquiring a plurality of processing flows of a target detection task;
determining N different hardware processing units occupied by a target detection algorithm flow used by the target detection task according to the plurality of processing flows, wherein N is a positive integer;
creating N pipelines according to the N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline;
and generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
2. The method of claim 1, wherein the acquiring a plurality of processing flows of a target detection task comprises:
responding to the trigger of a target detection task, and performing step splitting on the target detection task;
and acquiring a plurality of processing flows of the target detection task according to the step splitting result of the target detection task.
3. The method according to claim 1, wherein the determining, according to the plurality of processing flows, N different hardware processing units occupied by a target detection algorithm flow used by the target detection task includes:
according to different hardware computing resources occupied by the executed processes in the steps, performing step splitting on each processing flow;
after splitting, connecting the steps of each processing flow in series again to obtain a hardware processing unit occupied by each processing flow;
and counting the hardware processing units occupied by each processing flow, and determining N different hardware processing units occupied by the target detection algorithm flow used by the target detection task according to the counting result.
4. The method of claim 3, after said creating N pipelines, further comprising:
adding the split steps of each processing flow to the corresponding pipelines according to the execution order of the steps and the hardware computing resources occupied by the processes executed in the steps.
5. The method of claim 1, wherein said generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines comprises:
analyzing the input-output relationship of each processing flow in each pipeline according to the relationship among the steps of each processing flow;
and adding a producer-consumer relationship to each pipeline according to the input-output relationship of each processing flow in each pipeline to obtain the acceleration framework of the target detection task.
6. An acceleration framework generation apparatus for an object detection task, comprising:
the acquisition module is used for acquiring a plurality of processing flows of the target detection task;
a determining module, configured to determine, according to the multiple processing flows, N different hardware processing units occupied by a target detection algorithm flow used by the target detection task, where N is a positive integer;
the creating module is used for creating N pipelines according to the N different hardware processing units; wherein each hardware processing unit corresponds to a pipeline;
and the generating module is used for generating an acceleration framework of the target detection task according to the plurality of processing flows and the N pipelines.
7. The apparatus of claim 6, wherein the acquisition module comprises:
the first splitting unit is used for responding to the triggering of a target detection task and splitting the target detection task in steps;
and the acquisition unit is used for acquiring a plurality of processing flows of the target detection task according to the step splitting result of the target detection task.
8. The apparatus of claim 6, wherein the determining module comprises:
the second splitting unit is used for performing step splitting on each processing flow according to the different hardware computing resources occupied by the processes executed in the steps;
the concatenation unit is used for reconnecting the steps of each processing flow in series after splitting so as to obtain the hardware processing unit occupied by each processing flow;
and the statistical unit is used for counting the hardware processing units occupied by each processing flow and determining N different hardware processing units occupied by the target detection algorithm flow used by the target detection task according to the statistical result.
9. The apparatus of claim 8, wherein the creating module is further configured to:
add the split steps of each processing flow to the corresponding pipelines according to the execution order of the steps and the hardware computing resources occupied by the processes executed in the steps.
10. The apparatus of claim 6, wherein the generating module comprises:
the relation determining unit is used for analyzing the input-output relationship of each processing flow in each pipeline according to the relationship among the steps of each processing flow;
and the generating unit is used for adding a producer-consumer relationship to each pipeline according to the input-output relationship of each processing flow in each pipeline to obtain the acceleration framework of the target detection task.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN202111336130.9A 2021-11-11 2021-11-11 Acceleration framework generation method and device for target detection task and electronic equipment Pending CN114185600A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111336130.9A CN114185600A (en) 2021-11-11 2021-11-11 Acceleration framework generation method and device for target detection task and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111336130.9A CN114185600A (en) 2021-11-11 2021-11-11 Acceleration framework generation method and device for target detection task and electronic equipment

Publications (1)

Publication Number Publication Date
CN114185600A true CN114185600A (en) 2022-03-15

Family

ID=80601502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111336130.9A Pending CN114185600A (en) 2021-11-11 2021-11-11 Acceleration framework generation method and device for target detection task and electronic equipment

Country Status (1)

Country Link
CN (1) CN114185600A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160034617A1 (en) * 2014-07-31 2016-02-04 National Instruments Corporation Prototyping an Image Processing Algorithm and Emulating or Simulating Execution on a Hardware Accelerator to Estimate Resource Usage or Performance
CN106358003A (en) * 2016-08-31 2017-01-25 华中科技大学 Video analysis and accelerating method based on thread level flow line
CN111340237A (en) * 2020-03-05 2020-06-26 腾讯科技(深圳)有限公司 Data processing and model operation method, device and computer equipment
CN111782403A (en) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 Data processing method and device and electronic equipment
CN112650590A (en) * 2020-12-29 2021-04-13 北京奇艺世纪科技有限公司 Task processing method, device and system, and task distribution method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
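李林;张盛兵;吴鹃;: "面向图像识别的深度学习VLIW处理器设计" [Design of a deep learning VLIW processor for image recognition], 西北工业大学学报 [Journal of Northwestern Polytechnical University], no. 01, 15 February 2020 (2020-02-15) *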


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination