CN111489281A - Detection method based on GPU and CPU cooperative operation - Google Patents

Info

Publication number
CN111489281A
Authority
CN
China
Prior art keywords
gpu
cpu
calculation
cut
calculated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010271990.8A
Other languages
Chinese (zh)
Inventor
张兴全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Aochuang Medical Technology Co ltd
Original Assignee
Changzhou Aochuang Medical Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Aochuang Medical Technology Co ltd filed Critical Changzhou Aochuang Medical Technology Co ltd
Priority to CN202010271990.8A priority Critical patent/CN111489281A/en
Publication of CN111489281A publication Critical patent/CN111489281A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a detection method based on cooperative operation of a GPU and a CPU, comprising the following steps: cutting a picture to be calculated into N cut pictures of equal size; distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation; each time the GPU completes its calculation task, distributing X of the remaining cut pictures to the GPU for calculation; each time the CPU completes its calculation task, distributing Y of the remaining cut pictures to the CPU for calculation; the GPU and the CPU completing calculation tasks in turn until all N cut pictures have been calculated, where MX + SY = N, M is the number of calculation tasks completed by the GPU, and S is the number of calculation tasks completed by the CPU; and adding the calculation results of the cut pictures calculated by the GPU to those of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated. The invention improves detection speed and shortens detection time.

Description

Detection method based on GPU and CPU cooperative operation
Technical Field
The invention relates to a detection method based on GPU and CPU cooperative operation.
Background
In recent years, deep learning has been widely used for image processing and image content understanding. It has achieved great success in image classification, image recognition, and similar tasks, and in some respects its false detection rate is now lower than that of human beings.
Deep learning aims to build a neural network that simulates the human brain for analysis and learning. The network comprises an input layer, an output layer, and a plurality of hidden layers. When a picture is input, the initial "low-level" feature representation is gradually converted into a "high-level" feature representation through layer-by-layer processing, which makes classification or prediction easier.
Target detection methods based on deep learning fall mainly into two categories: a "two-step method" based on region proposals and a "one-step method" based on regression. The two-step method first generates about 2000 region proposals on a picture and then feeds these proposals to a classifier for classification and position refinement; it is represented by R-CNN, Fast R-CNN, and the like. The one-step method feeds a picture directly into a network and outputs the type and position of the target object in a single step; it is represented by YOLOv1, YOLOv2, YOLOv3, SSD, and the like. The two-step method has high detection accuracy but low speed, whereas the one-step method has lower accuracy but high detection speed.
The workflow of a typical detection algorithm is as follows: first, the trained model is loaded onto the GPU; second, the CPU reads pictures from the hard disk and transfers them to the GPU, and the GPU performs the detection calculation; third, the GPU returns the detection result to the CPU, which reports it to the user.
This detection method has the following problems:
1. Computing resources are wasted and computing efficiency is low: in the second step, while the GPU performs detection, the CPU sits idle and waits, so its computing resources are wasted.
2. For a large picture, detection is slow if the picture is processed directly, and if the picture is resized to a smaller size, the image resolution is reduced, which seriously affects the detection result.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a detection method based on cooperative operation of a GPU and a CPU that improves detection speed and shortens detection time.
To solve the above technical problem, a first technical solution of the present invention is a detection method based on cooperative operation of a GPU and a CPU, comprising the following steps:
cutting a picture to be calculated into N cut pictures of equal size;
distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation, where X + Y = N;
and adding the calculation results of the cut pictures calculated by the GPU to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
Further, N = 64, X = 48, and Y = 16.
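The final step, adding the GPU results to the CPU results, is not spelled out in the text. One plausible reading, sketched below in Python, is that each cut picture yields detections in its own local coordinates and that these are shifted back into the coordinates of the full picture before being concatenated. The box format (label, x, y, w, h), the row-major tile numbering, the 8 × 8 grid of 128 × 128 cut pictures (one layout consistent with N = 64), and the names tile_origin and merge_detections are all assumptions made for illustration, not details confirmed by the patent.

```python
def tile_origin(index, cols=8, tile_w=128, tile_h=128):
    """Top-left corner of tile `index` in the full picture (row-major numbering assumed)."""
    row, col = divmod(index, cols)
    return col * tile_w, row * tile_h


def merge_detections(per_tile):
    """Combine per-tile detections into one list in full-picture coordinates.

    per_tile maps a tile index to a list of (label, x, y, w, h) boxes whose
    x and y are measured from that tile's own top-left corner (an assumed format).
    """
    merged = []
    for index in sorted(per_tile):
        offset_x, offset_y = tile_origin(index)
        for label, x, y, w, h in per_tile[index]:
            # Shift the box from tile coordinates to picture coordinates.
            merged.append((label, x + offset_x, y + offset_y, w, h))
    return merged
```

Because the merge depends only on the tile index, it does not matter which device produced a given result; GPU and CPU outputs can be placed in the same dictionary before merging.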
To solve the above technical problem, a second technical solution of the present invention is a detection method based on cooperative operation of a GPU and a CPU, comprising the following steps:
cutting a picture to be calculated into N cut pictures of equal size;
distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation; each time the GPU completes its calculation task, distributing X of the remaining cut pictures to the GPU for calculation; each time the CPU completes its calculation task, distributing Y of the remaining cut pictures to the CPU for calculation; the GPU and the CPU completing calculation tasks in turn until all N cut pictures have been calculated, where MX + SY = N, M is the number of calculation tasks completed by the GPU, and S is the number of calculation tasks completed by the CPU;
and adding the calculation results of the cut pictures calculated by the GPU to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
Further, a thread is started for distributing the calculation tasks of the GPU and the CPU, wherein
the thread is normally in a blocked state; when the GPU or the CPU requests a calculation task, the thread becomes active and distributes the corresponding calculation task to the GPU or the CPU.
Further, N = 64, X = 4, and Y = 1.
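As a worked illustration of the constraint MX + SY = N, the values N = 64, X = 4, and Y = 1 given above can be substituted directly. The particular split M = 13, S = 12 shown below is only a hypothetical outcome chosen to satisfy the equation; the actual values of M and S depend on the relative speed of the two devices at run time and are not stated in the text.

```latex
\[
  M X + S Y = N, \qquad N = 64,\; X = 4,\; Y = 1
  \;\Longrightarrow\; 4M + S = 64 .
\]
\[
  \text{For example, } M = 13,\; S = 12: \quad 4 \cdot 13 + 12 = 52 + 12 = 64 .
\]
\[
  \text{Extreme cases: } (M, S) = (16, 0) \text{ (GPU only)}, \qquad (M, S) = (0, 64) \text{ (CPU only)} .
\]
```

In the hypothetical split, the GPU would process 52 cut pictures and the CPU 12.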
With the above technical solutions, the invention has the following beneficial effects:
1. Unlike the prior art, in which only the GPU or only the CPU is used for calculation, the invention has the GPU and the CPU operate cooperatively, which reduces the waste of CPU computing resources and increases the detection speed for target objects.
2. The invention cuts a large picture into pieces instead of shrinking it with resize, so the resolution is not reduced and the detection result is not made inaccurate; detection accuracy is preserved while detection speed remains high. The invention provides two task allocation schemes, static allocation and dynamic allocation, and detection speed is improved by up to 22 percent.
Drawings
FIG. 1 shows a first embodiment of the detection method based on cooperative operation of a GPU and a CPU according to the present invention;
FIG. 2 shows a second embodiment of the detection method based on cooperative operation of a GPU and a CPU according to the present invention.
Detailed Description
In order that the present invention may be more readily and clearly understood, a more particular description of the invention briefly described above will be rendered by reference to specific embodiments that are illustrated in the appended drawings.
Example one
As shown in FIG. 1, a detection method based on cooperative operation of a GPU and a CPU comprises the following steps:
cutting a picture to be calculated into N cut pictures of equal size;
distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation, where X + Y = N;
and adding the calculation results of the cut pictures calculated by the GPU to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
Specifically, the picture to be calculated is resized to 1024 × 1024 with the resize function of OpenCV;
the picture is divided into an 8 × 8 grid by a cut_image function, each cut picture being 128 × 128;
the first 48 of the 64 cut pictures are converted into GPU variables through variable (cuda), and the last 16 are converted into CPU variables through variable (CPU), so that they are calculated by the GPU and the CPU respectively;
and the calculation results of the cut pictures calculated by the GPU are added to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
Tests with the YOLOv3 algorithm showed a reduction in detection time of about 15%.
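The static scheme of this embodiment can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the picture is represented as a plain list of pixel rows so that the example needs only the standard library, the body of cut_image is a guess at what a function of that name would do, and detect_on is a hypothetical placeholder that stands in for the real detector (such as YOLOv3) running on the named device. Only the 8 × 8 grid and the 48/16 split come from the text.

```python
from concurrent.futures import ThreadPoolExecutor


def cut_image(image, rows=8, cols=8):
    """Split a 2-D picture (list of equal-length rows) into rows * cols equal tiles.

    Tiles are returned in row-major order. The body is an assumption; the
    source only names the function.
    """
    height, width = len(image), len(image[0])
    if height % rows or width % cols:
        raise ValueError("picture size must be divisible by the grid size")
    tile_h, tile_w = height // rows, width // cols
    tiles = []
    for r in range(rows):
        band = image[r * tile_h:(r + 1) * tile_h]
        for c in range(cols):
            tiles.append([row[c * tile_w:(c + 1) * tile_w] for row in band])
    return tiles


def detect_on(device, tiles):
    """Hypothetical stand-in for the detector: reports the device and tile size."""
    return [(device, len(tile), len(tile[0])) for tile in tiles]


def static_detect(image, gpu_share=48):
    """Give the first gpu_share tiles to the GPU and the rest to the CPU, then combine."""
    tiles = cut_image(image)
    gpu_tiles, cpu_tiles = tiles[:gpu_share], tiles[gpu_share:]
    # Two threads, one per device, so that neither device waits for the other to start.
    with ThreadPoolExecutor(max_workers=2) as pool:
        gpu_future = pool.submit(detect_on, "gpu", gpu_tiles)
        cpu_future = pool.submit(detect_on, "cpu", cpu_tiles)
        return gpu_future.result() + cpu_future.result()
```

For a 1024 × 1024 picture, static_detect returns 64 results, the first 48 tagged "gpu" and the last 16 tagged "cpu". With a real detector the two submitted calls would run on different hardware, which is where the overlap in time would come from.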
Example two
As shown in FIG. 2, a detection method based on cooperative operation of a GPU and a CPU comprises the following steps:
cutting a picture to be calculated into N cut pictures of equal size;
distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation; each time the GPU completes its calculation task, distributing X of the remaining cut pictures to the GPU for calculation; each time the CPU completes its calculation task, distributing Y of the remaining cut pictures to the CPU for calculation; the GPU and the CPU completing calculation tasks in turn until all N cut pictures have been calculated, where MX + SY = N, M is the number of calculation tasks completed by the GPU, and S is the number of calculation tasks completed by the CPU;
and adding the calculation results of the cut pictures calculated by the GPU to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
A thread is started for distributing the GPU and CPU calculation tasks, wherein
the thread is normally in a blocked state; when the GPU or the CPU requests a calculation task, the thread becomes active and distributes the corresponding calculation task to the GPU or the CPU.
N = 64, X = 4, and Y = 1.
Specifically, the picture to be calculated is resized to 1024 × 1024 with the resize function of OpenCV;
the picture is divided into an 8 × 8 grid by a cut_image function, each cut picture being 128 × 128;
4 cut pictures are distributed to the GPU for calculation and 1 cut picture is distributed to the CPU for calculation; each time the GPU completes its calculation task, 4 of the remaining cut pictures are distributed to the GPU for calculation; each time the CPU completes its calculation task, 1 of the remaining cut pictures is distributed to the CPU for calculation; the GPU and the CPU complete calculation tasks in turn until all 64 cut pictures have been calculated;
specifically, a thread for allocating the GPU and CPU calculation tasks may be started, wherein
the thread is normally in a blocked state; when the GPU or the CPU requests a calculation task, the thread becomes active and distributes the corresponding calculation task to the GPU or the CPU.
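The allocation thread described above can be sketched with Python's standard threading and queue modules. The sketch assumes one worker thread per device and a shared request queue on which the allocation thread blocks; the names dispatcher, worker, dynamic_detect, and BATCH, and the detect callback that stands in for the real detector, are hypothetical. Only the batch sizes X = 4 and Y = 1 and the blocked-until-requested behaviour come from the text.

```python
import queue
import threading

BATCH = {"gpu": 4, "cpu": 1}  # X and Y from the text


def dispatcher(tiles, requests, n_workers):
    """Allocation thread: blocked on requests.get() until a device asks for work."""
    next_index = 0
    finished = 0
    while finished < n_workers:
        device, reply = requests.get()  # blocks here; wakes when a request arrives
        batch = tiles[next_index:next_index + BATCH[device]]
        next_index += len(batch)  # the last batch may be shorter than X
        if not batch:
            finished += 1  # an empty batch tells the worker to stop
        reply.put(batch)


def worker(device, requests, results, detect):
    """Device thread: request a batch, process it, repeat until no work is left."""
    reply = queue.Queue(maxsize=1)
    while True:
        requests.put((device, reply))
        batch = reply.get()
        if not batch:
            return
        for tile in batch:
            results.put((device, detect(tile)))


def dynamic_detect(tiles, detect):
    """Run the allocation thread and one worker per device; return all results."""
    requests, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=dispatcher, args=(tiles, requests, 2))]
    for device in ("gpu", "cpu"):
        threads.append(threading.Thread(target=worker, args=(device, requests, results, detect)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results.get() for _ in range(results.qsize())]
```

A device that finishes its batch sooner simply asks again sooner, so it ends up with more cut pictures without any explicit speed measurement. One difference from the text is worth noting: if fewer than 4 cut pictures remain when the GPU asks, this sketch hands out a shorter final batch, so MX + SY = N holds exactly only when every batch is full.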
Tests on a large number of pictures showed that the method of the first embodiment has a disadvantage: the GPU and the CPU do not finish their calculations at the same time. For some pictures the GPU finishes first and then waits for the CPU; for others the opposite happens and the CPU waits for the GPU.
For this reason, this embodiment improves on the method of the first embodiment by allocating the calculation tasks between the GPU and the CPU more effectively. With the method of the second embodiment, whichever device calculates faster is allocated more calculation tasks and whichever calculates more slowly is allocated fewer, so that the total calculation time of the two devices stays approximately equal. In testing, detection speed improved by 22%.
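The balancing effect can be illustrated with a small deterministic simulation in Python. The per-picture timings used in the usage note (1 time unit on the GPU, 6 on the CPU) are hypothetical values chosen only to show the mechanism; they are not measurements from the patent and do not reproduce its 15% or 22% figures. The function names are likewise assumptions.

```python
def static_makespan(n_gpu, n_cpu, t_gpu, t_cpu):
    """Finish time when each device gets a fixed share up front."""
    return max(n_gpu * t_gpu, n_cpu * t_cpu)


def dynamic_makespan(n, x, y, t_gpu, t_cpu):
    """Finish time when the device that becomes free first takes its next batch.

    x and y are the GPU and CPU batch sizes; t_gpu and t_cpu are the assumed
    per-picture processing times.
    """
    free_at = {"gpu": 0.0, "cpu": 0.0}
    batch = {"gpu": x, "cpu": y}
    cost = {"gpu": t_gpu, "cpu": t_cpu}
    remaining = n
    while remaining > 0:
        device = min(free_at, key=free_at.get)  # earliest-free device asks next
        take = min(batch[device], remaining)
        free_at[device] += take * cost[device]
        remaining -= take
    return max(free_at.values())
```

With the hypothetical timings, static_makespan(48, 16, 1.0, 6.0) is dominated by the CPU's 16 pictures, while dynamic_makespan(64, 4, 1, 1.0, 6.0) ends with both devices finishing at nearly the same moment. The size of the gain depends entirely on the assumed timings.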
The above embodiments describe in further detail the technical problems solved, the technical solutions, and the advantages of the present invention. It should be understood that they are only examples of the present invention and are not intended to limit it; any modification, equivalent substitution, improvement, or the like made within the spirit and principle of the present invention shall fall within its protection scope.
In the description of the present invention, it is to be understood that the terms indicating an orientation or positional relationship are based on the orientation or positional relationship shown in the drawings only for the convenience of describing the present invention and simplifying the description, and do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; either directly or indirectly through intervening media, either internally or in any other relationship. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings or the orientations or positional relationships that the products of the present invention are conventionally placed in use, and are only used for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the devices or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Furthermore, the terms "horizontal", "vertical", "overhang" and the like do not imply that the components are required to be absolutely horizontal or overhang, but may be slightly inclined. For example, "horizontal" merely means that the direction is more horizontal than "vertical" and does not mean that the structure must be perfectly horizontal, but may be slightly inclined.
In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are not in direct contact but are in contact through another feature between them. Also, the first feature being "on", "above", or "over" the second feature includes the first feature being directly above or obliquely above the second feature, or merely means that the first feature is at a higher level than the second feature. The first feature being "under", "below", or "beneath" the second feature includes the first feature being directly under or obliquely under the second feature, or simply means that the first feature is at a lower level than the second feature.

Claims (5)

1. A detection method based on cooperative operation of a GPU and a CPU, characterized in that the method comprises the following steps:
cutting a picture to be calculated into N cut pictures of equal size;
distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation, where X + Y = N;
and adding the calculation results of the cut pictures calculated by the GPU to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
2. The detection method according to claim 1, wherein
N = 64, X = 48, and Y = 16.
3. A detection method based on cooperative operation of a GPU and a CPU, characterized in that the method comprises the following steps:
cutting a picture to be calculated into N cut pictures of equal size;
distributing X cut pictures to the GPU for calculation and Y cut pictures to the CPU for calculation; each time the GPU completes its calculation task, distributing X of the remaining cut pictures to the GPU for calculation; each time the CPU completes its calculation task, distributing Y of the remaining cut pictures to the CPU for calculation; the GPU and the CPU completing calculation tasks in turn until all N cut pictures have been calculated, where MX + SY = N, M is the number of calculation tasks completed by the GPU, and S is the number of calculation tasks completed by the CPU;
and adding the calculation results of the cut pictures calculated by the GPU to the calculation results of the cut pictures calculated by the CPU to obtain the detection result of the picture to be calculated.
4. The detection method according to claim 3, wherein
a thread is started for distributing the GPU and CPU calculation tasks, wherein
the thread is normally in a blocked state; when the GPU or the CPU requests a calculation task, the thread becomes active and distributes the corresponding calculation task to the GPU or the CPU.
5. The detection method according to claim 3, wherein
N = 64, X = 4, and Y = 1.
CN202010271990.8A 2020-04-09 2020-04-09 Detection method based on GPU and CPU cooperative operation Pending CN111489281A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010271990.8A CN111489281A (en) 2020-04-09 2020-04-09 Detection method based on GPU and CPU cooperative operation

Publications (1)

Publication Number Publication Date
CN111489281A true CN111489281A (en) 2020-08-04

Family

ID=71794724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010271990.8A Pending CN111489281A (en) 2020-04-09 2020-04-09 Detection method based on GPU and CPU cooperative operation

Country Status (1)

Country Link
CN (1) CN111489281A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520900A (en) * 2009-03-30 2009-09-02 中国人民解放军第三军医大学第一附属医院 Method and special equipment for quickening CR/DR/CT graphic display and graphic processing by utilizing GPU
US20110023040A1 (en) * 2009-07-24 2011-01-27 Apple Inc. Power-efficient interaction between multiple processors
CN103617626A (en) * 2013-12-16 2014-03-05 武汉狮图空间信息技术有限公司 Central processing unit (CPU) and graphics processing unit (GPU)-based remote-sensing image multi-scale heterogeneous parallel segmentation method
CN104267940A (en) * 2014-09-17 2015-01-07 武汉狮图空间信息技术有限公司 Quick map tile generation method based on CPU+GPU
CN104869398A (en) * 2015-05-21 2015-08-26 大连理工大学 Parallel method of realizing CABAC in HEVC based on CPU+GPU heterogeneous platform
CN106951322A (en) * 2017-02-28 2017-07-14 中国科学院深圳先进技术研究院 Method and system for acquiring an image cooperative processing program in a CPU/GPU heterogeneous environment
CN107945098A (en) * 2017-11-24 2018-04-20 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN109871352A (en) * 2017-12-01 2019-06-11 北京搜狗科技发展有限公司 A kind of cooperated computing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination