CN111078195A - Target capture parallel acceleration method based on OPENCL - Google Patents

Publication number
CN111078195A
Authority
CN
China
Prior art keywords
opencl
target
parallel
data
acceleration method
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811215057.8A
Other languages
Chinese (zh)
Inventor
吴志佳
陈小林
李荅群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Original Assignee
Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-10-18
Filing date
2018-10-18
Publication date
2020-04-28
Application filed by Changchun Institute of Optics Fine Mechanics and Physics of CAS
Priority to CN201811215057.8A
Publication of CN111078195A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Image Processing (AREA)

Abstract

The embodiment of the invention discloses an OPENCL-based parallel acceleration method for target capture. The method divides the capture function required by the system into four parts: a global image preprocessing step, an image data parallel division step, a target capture step and a target decision step. Each part is assigned to either the host side or the device side, and the host-side and device-side programs execute in parallel. The method markedly improves the real-time performance of capturing high-speed targets and targets in large-size images, and the added decision mechanism markedly reduces the probability of capturing the wrong target.

Description

Target capture parallel acceleration method based on OPENCL
Technical Field
The invention relates to the technical field of image processing, and in particular to an OPENCL-based target capture parallel acceleration method for photoelectric theodolites that handle high-speed targets and large-size images.
Background
In target tracking, real-time performance is an important technical index. When the image to be processed is small, a traditional serial CPU program can generally meet the real-time requirement. When the image is large, however, processing time under a serial program grows with image size. The effect is most severe in the target capture stage, where the search area is the full image, so real-time performance during capture drops significantly as image size increases.
OPENCL is a general-purpose programming framework for heterogeneous multi-core processor hardware. It has broad hardware support: most mainstream processors, including CPUs, GPUs, DSPs and FPGAs, support it, so OPENCL-based programs are highly portable across platforms, and this portability gives the program designer several implementation options. An OPENCL-based program can make full use of the concurrency available on an existing processor platform, which makes the framework well suited to image processing scenarios that run a single task under strict real-time requirements.
The photoelectric theodolite is an important observation and measurement instrument on a test range, and its capture capability directly determines the quality of subsequent tracking. Therefore, to address the long processing time of the target capture stage in photoelectric theodolites that handle high-speed targets and large-size images, an OPENCL-based target capture parallel acceleration method is needed.
Disclosure of Invention
To address the long processing time of the target capture stage in photoelectric theodolites that handle high-speed targets and large-size images, the invention provides an OPENCL-based target capture parallel acceleration method. The method divides the capture function required by the system into a global image preprocessing step, an image data parallel division step, a target capture step and a target decision step. This markedly improves the real-time performance of capturing high-speed targets and targets in large-size images, and the added decision mechanism markedly reduces the probability of a wrong capture.
The specific scheme is as follows. An OPENCL-based target capture parallel acceleration method comprises: a global image preprocessing step, which runs on the device side of OPENCL; an image data parallel division step, which runs on the host side of the OPENCL framework; a target capture step, which runs on the device side of OPENCL; and a target decision step, which runs on the host side of the OPENCL framework. The host-side program and the device-side program are executed in parallel.
Preferably, the global image preprocessing step includes computing the gray-level distribution statistics of the global image and the segmentation threshold.
Preferably, the global image preprocessing step uses multi-core data-parallel processing.
Preferably, the image data parallel division step includes dividing the data of the global image so that no data write conflicts occur, which provides efficiently accessible data for the parallel kernel algorithm and reduces overall memory access latency during data processing.
Preferably, the target capture step includes a target tracking algorithm, which is a common piece of code executable on multiple arithmetic units.
Preferably, the target decision step includes collecting the capture results of all processing units and, after a combined evaluation based on prior information about the target, outputting the target with the highest confidence.
Preferably, the method is performed as follows: first, after the host side acquires an image, it divides the data and packs it into the standard format of the OPENCL framework; then, the host side starts the device side, whose arithmetic units execute the global image preprocessing step and the target capture step; finally, after receiving the completion signal from the arithmetic units on the device side, the host side starts the target decision step and outputs the capture result.
According to the technical scheme, the embodiment of the invention has the following advantages:
The OPENCL-based target capture parallel acceleration method provided by the embodiment of the invention divides the capture function required by the system into four parts (a global image preprocessing step, an image data parallel division step, a target capture step and a target decision step) and assigns each part to either the host side or the device side, which execute in parallel, thereby effectively speeding up target capture. Further, the method offers a generally applicable way to divide the computation of a target capture task: the division keeps the task of each individual processor simple, since each processor either organizes data structures or performs computation on data. Further, the method is based on the OPENCL heterogeneous multi-core programming framework and uses a data division that is entirely free of write conflicts, which reduces overall memory access latency during computation and effectively increases data processing speed. Further, the method can be applied to the target capture system of a photoelectric theodolite that handles high-speed targets and large-size images, markedly improving the theodolite's real-time target capture performance.
Drawings
Fig. 1 is a schematic flowchart of the steps of an OPENCL-based target capture parallel acceleration method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the data division according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of an OPENCL-based host-side program according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of an OPENCL-based device-side program according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of the cooperation flow between the OPENCL host side and device side according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a schematic flowchart of the steps of an OPENCL-based target capture parallel acceleration method provided in an embodiment of the present invention. The method mainly comprises four steps:
Step S1: a global image preprocessing step, which runs on the device side of OPENCL. The global image preprocessing step computes the gray-level distribution statistics of the global image and the segmentation threshold, and uses multi-core data-parallel processing.
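Step S1 can be illustrated with a minimal sketch, written here in Python rather than as an OPENCL kernel. The patent does not say how the segmentation threshold is computed; Otsu's method is used below purely as an assumed example, and the function names, the 8-bit gray range, and the thread pool standing in for the device's computing units are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def block_histogram(rows):
    """Count 8-bit gray levels in one block of image rows."""
    hist = [0] * 256
    for row in rows:
        for pixel in row:
            hist[pixel] += 1
    return hist


def merge_histograms(partials):
    """Sum the per-block histograms bin by bin."""
    return [sum(bins) for bins in zip(*partials)]


def otsu_threshold(hist):
    """Gray level maximizing between-class variance (Otsu, an assumed choice)."""
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_level, best_score = 0, -1.0
    below_count = 0  # pixels with gray level <= current level
    below_sum = 0    # sum of their gray levels
    for level, count in enumerate(hist):
        below_count += count
        below_sum += level * count
        above_count = total - below_count
        if below_count == 0 or above_count == 0:
            continue
        mean_below = below_sum / below_count
        mean_above = (weighted_total - below_sum) / above_count
        score = below_count * above_count * (mean_below - mean_above) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def global_preprocess(image, num_units):
    """Histogram each row block in parallel, then merge and threshold."""
    rows_per_unit = len(image) // num_units  # assumes an even division
    blocks = [image[u * rows_per_unit:(u + 1) * rows_per_unit]
              for u in range(num_units)]
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        partials = list(pool.map(block_histogram, blocks))
    hist = merge_histograms(partials)
    return hist, otsu_threshold(hist)
```

Each block produces its own partial histogram, so no two workers write to the same bins; the merge happens once after all blocks finish.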
Step S2: an image data parallel division step, which runs on the host side of the OPENCL framework. This step divides the data of the global image so that no data write conflicts occur, which provides efficiently accessible data for the parallel kernel algorithm and reduces overall memory access latency during data processing.
Fig. 2 is a schematic diagram of the data division according to an embodiment of the present invention. The image data is divided by rows, and each group of rows is assigned to one computing unit of a processor. The number of rows each computing unit processes depends on the total number of image rows and the total number of computing units on the computing platform in use. At design time, the number of computing units taking part in the computation is adjusted so that it divides the total number of image rows evenly, which guarantees that access to the image data is entirely free of write conflicts.
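The row-based division can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and "adjusting the number of computing units" is interpreted as picking the largest unit count, up to the number available, that divides the row count evenly.

```python
def choose_unit_count(total_rows, max_units):
    """Largest unit count <= max_units that divides total_rows evenly."""
    if total_rows <= 0 or max_units <= 0:
        raise ValueError("total_rows and max_units must be positive")
    for units in range(min(max_units, total_rows), 0, -1):
        if total_rows % units == 0:
            return units


def partition_rows(total_rows, max_units):
    """Return disjoint half-open row ranges [start, end), one per unit."""
    units = choose_unit_count(total_rows, max_units)
    rows_per_unit = total_rows // units
    return [(u * rows_per_unit, (u + 1) * rows_per_unit)
            for u in range(units)]
```

For a 1080-row image on a platform with 16 computing units, 16 does not divide 1080, so the sketch falls back to 15 units of 72 rows each. Because the ranges are disjoint and cover every row exactly once, no two units ever touch the same row.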
Step S3: a target capture step, which runs on the device side of OPENCL. The target capture step includes a target tracking algorithm, which is a common piece of code executable on multiple arithmetic units. The kernel of the target tracking algorithm is designed on the OPENCL heterogeneous multi-core processing framework.
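The patent does not disclose the tracking algorithm itself. As a stand-in only, the sketch below shows what "a common piece of code executable on multiple arithmetic units" might look like: one function that every unit runs on its own row range, here reporting the centroid and area of the pixels above the segmentation threshold. The function name, the candidate fields, and the centroid approach are all assumptions.

```python
def capture_block(image, row_start, row_end, threshold):
    """Find one candidate in rows [row_start, row_end), or return None.

    The candidate is the centroid and pixel count of all pixels brighter
    than `threshold`; this is a placeholder for the real capture kernel.
    """
    count = 0
    row_sum = 0
    col_sum = 0
    for r in range(row_start, row_end):
        for c, pixel in enumerate(image[r]):
            if pixel > threshold:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return {"row": row_sum / count, "col": col_sum / count, "area": count}
```

The function reads only its own rows and returns a value instead of writing to shared state, so the same code can run on every unit without coordination.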
Step S4: a target decision step, which runs on the host side of the OPENCL framework. The target decision step collects the capture results of all processing units and, after a combined evaluation based on prior information about the target, outputs the target with the highest confidence.
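A minimal sketch of the decision step follows. The patent does not define the confidence measure or the form of the prior information, so the scoring formula, the prior fields (`area`, `row`, `col`, `search_radius`), and the function names below are hypothetical.

```python
import math


def confidence(candidate, prior):
    """Score in (0, 1]; equals 1.0 when area and position match the prior."""
    area_error = abs(candidate["area"] - prior["area"]) / prior["area"]
    distance = math.hypot(candidate["row"] - prior["row"],
                          candidate["col"] - prior["col"])
    return 1.0 / (1.0 + area_error + distance / prior["search_radius"])


def decide(candidates, prior):
    """Pick the highest-confidence candidate; None if no unit found one."""
    valid = [c for c in candidates if c is not None]
    if not valid:
        return None
    return max(valid, key=lambda c: confidence(c, prior))
```

Units that found nothing contribute `None`, which the host simply skips. A false candidate far from the expected position or with an implausible area scores low, which is one way a decision mechanism could reduce wrong captures.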
In steps S1 to S4, the host-side program and the device-side program are executed in parallel.
In this embodiment, although steps S1 to S4 are numbered sequentially and the flowchart shows them in order, they need not be executed strictly in the order shown; several steps may also be executed in parallel.
In one embodiment, steps S1 to S4 are executed as follows: first, after the host side acquires an image, it divides the data and packs it into the standard format of the OPENCL framework; then, the host side starts the device side, whose arithmetic units execute the global image preprocessing step and the target capture step; finally, after receiving the completion signal from the arithmetic units on the device side, the host side starts the target decision step and outputs the capture result.
Fig. 3 is a schematic flowchart of an OPENCL-based host-side program provided in an embodiment of the present invention.
Step T11: the OPENCL host side checks whether the image frame synchronization signal has arrived. If it has, go to step T12; otherwise, repeat step T11.
Step T12: perform the data division shown in Fig. 2 and pack the data into the standard OPENCL data format.
Step T13: start the capture step on the device side.
Step T14: query the device-side completion flag. If the computation for the current frame has finished, go to step T15; otherwise, repeat step T14.
Step T15: run the capture decision mechanism.
Step T16: output the captured target.
Step T17: determine whether capture is finished. If so, output the final captured target with the highest confidence; otherwise, return to step T11.
Fig. 4 is a schematic flowchart of an OPENCL-based device-side program according to an embodiment of the present invention.
Step T21: the OPENCL device side checks for the start signal from the host side. If the start signal has been received, go to step T22; otherwise, repeat step T21.
Step T22: each computing unit uses its own global number to locate and read its portion of the divided data shown in Fig. 2.
Step T23: perform the data computation in parallel according to the OPENCL kernel of the algorithm.
Step T24: determine whether all computation on the device side has finished. If so, go to step T25; otherwise, repeat step T24.
Step T25: send the completion flag.
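Steps T22 to T25 can be sketched in Python, with threads standing in for the device's computing units. In an OPENCL kernel the global number would come from `get_global_id(0)`; here it is passed in explicitly. The function names and the placeholder computation (brightest pixel per block) are assumptions, not the patent's kernel.

```python
from concurrent.futures import ThreadPoolExecutor


def work_item(global_id, image, rows_per_unit, results):
    """One computing unit: find its rows from its global id, then compute."""
    first_row = global_id * rows_per_unit                # T22: locate own data
    block = image[first_row:first_row + rows_per_unit]
    # T23: placeholder computation; each unit writes only its own slot,
    # so no two units ever write to the same location.
    results[global_id] = max(max(row) for row in block)


def run_device(image, num_units):
    """Run all work items, wait for them, and report completion."""
    rows_per_unit = len(image) // num_units  # assumes an even division
    results = [None] * num_units
    with ThreadPoolExecutor(max_workers=num_units) as pool:
        futures = [pool.submit(work_item, gid, image, rows_per_unit, results)
                   for gid in range(num_units)]
        for future in futures:               # T24: wait until all have finished
            future.result()                  # re-raises any worker error
    completion_flag = True                   # T25: signal completion
    return results, completion_flag
```

Indexing both the input rows and the output slot by the global number is what makes the division free of write conflicts: the mapping from unit to data is fixed before any unit starts.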
Fig. 5 is a schematic diagram of the cooperation flow between the OPENCL host side and device side according to an embodiment of the present invention. The host side and the device side follow a time-ordered flow: the host side handles all synchronization, while the device side is only responsible for receiving the start signal from the host side and completing all data computation at high speed.
Step S11: the host side divides the data;
Step S12: the host side starts the parallel capture kernel algorithm on the device side;
Step S13: the device side runs the parallel capture kernel algorithm;
Step S14: the host side runs the target capture decision mechanism;
Step S15: the host side outputs the captured target.
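The start/completion handshake in steps S11 to S15 can be sketched with two Python threads and two events. This is an illustration only: the `Device` class, its attributes, and `host_process_frame` are hypothetical names, the kernel and decision functions are supplied by the caller, and the host here blocks on the completion event instead of polling a flag as in step T14.

```python
import threading


class Device:
    """Stand-in for the device side: wait for start, compute, signal done."""

    def __init__(self, kernel):
        self.kernel = kernel
        self.start = threading.Event()
        self.done = threading.Event()
        self.blocks = []
        self.results = []
        self.stopping = False
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            self.start.wait()                 # wait for the host's start signal
            self.start.clear()
            if self.stopping:
                return
            # S13: run the same kernel over every block
            self.results = [self.kernel(block) for block in self.blocks]
            self.done.set()                   # completion signal to the host

    def shutdown(self):
        self.stopping = True
        self.start.set()
        self.thread.join()


def host_process_frame(device, frame, num_units, decide):
    """Host side of one frame: divide, start device, wait, decide."""
    rows = len(frame) // num_units            # assumes an even division
    device.blocks = [frame[u * rows:(u + 1) * rows]
                     for u in range(num_units)]           # S11
    device.start.set()                                    # S12
    device.done.wait()                        # host blocks until completion
    device.done.clear()
    return decide(device.results)                         # S14, S15
```

All synchronization lives on the host side of the handshake; the device loop does nothing but wait for a start signal, compute, and raise the completion signal, which matches the division of responsibility described above.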
The OPENCL-based target capture parallel acceleration method provided by the embodiment of the invention divides the capture function required by the system into four parts (a global image preprocessing step, an image data parallel division step, a target capture step and a target decision step) and assigns each part to either the host side or the device side, which execute in parallel, thereby effectively speeding up target capture.
The method also offers a generally applicable way to divide the computation of a target capture task: the division keeps the task of each individual processor simple, since each processor either organizes data structures or performs computation on data.
The method is based on the OPENCL heterogeneous multi-core programming framework and uses a data division that is entirely free of write conflicts, which reduces overall memory access latency during computation and effectively increases data processing speed.
The method can be applied to the target capture system of a photoelectric theodolite that handles high-speed targets and large-size images, markedly improving the theodolite's real-time target capture performance.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (7)

1. An OPENCL-based target capture parallel acceleration method, the method comprising:
a global image preprocessing step, which runs on the device side of OPENCL;
an image data parallel division step, which runs on the host side of the OPENCL framework;
a target capture step, which runs on the device side of OPENCL;
a target decision step, which runs on the host side of the OPENCL framework;
wherein the host-side program and the device-side program are executed in parallel.
2. The OPENCL-based target capture parallel acceleration method as claimed in claim 1, wherein the global image preprocessing step includes computing the gray-level distribution statistics of the global image and the segmentation threshold.
3. The OPENCL-based target capture parallel acceleration method as claimed in claim 2, wherein the global image preprocessing step uses multi-core data-parallel processing.
4. The OPENCL-based target capture parallel acceleration method as claimed in claim 1, wherein the image data parallel division step includes dividing the data of the global image so that no data write conflicts occur, which provides efficiently accessible data for the parallel kernel algorithm and reduces overall memory access latency during data processing.
5. The OPENCL-based target capture parallel acceleration method as claimed in claim 1, wherein the target capture step includes a target tracking algorithm, which is a common piece of code executable on multiple arithmetic units.
6. The OPENCL-based target capture parallel acceleration method as claimed in claim 1, wherein the target decision step includes collecting the capture results of all processing units and, after a combined evaluation based on prior information about the target, outputting the target with the highest confidence.
7. The OPENCL-based target capture parallel acceleration method as claimed in claim 1, wherein the method is performed by the following steps:
first, after the host side acquires an image, it divides the data and packs it into the standard format of the OPENCL framework;
then, the host side starts the device side, whose arithmetic units execute the global image preprocessing step and the target capture step;
finally, after receiving the completion signal from the arithmetic units on the device side, the host side starts the target decision step and outputs the capture result.
CN201811215057.8A 2018-10-18 2018-10-18 Target capture parallel acceleration method based on OPENCL Pending CN111078195A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811215057.8A CN111078195A (en) 2018-10-18 2018-10-18 Target capture parallel acceleration method based on OPENCL

Publications (1)

Publication Number Publication Date
CN111078195A true CN111078195A (en) 2020-04-28

Family

ID=70308743

Country Status (1)

Country Link
CN (1) CN111078195A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8427359B1 (en) * 2011-01-06 2013-04-23 Sandia Corporation Tracking moving radar targets with parallel, velocity-tuned filters
CN103325124A (en) * 2012-03-21 2013-09-25 东北大学 Target detecting and tracking system and method using background differencing method based on FPGA
CN103679746A (en) * 2012-09-24 2014-03-26 中国航天科工集团第二研究院二O七所 object tracking method based on multi-information fusion
CN107563392A (en) * 2017-09-07 2018-01-09 西安电子科技大学 The YOLO object detection methods accelerated using OpenCL

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘丹: "视频运动目标检测与跟踪算法的GPU并行优化", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988395A (en) * 2021-04-20 2021-06-18 宁波兰茜生物科技有限公司 Pathological analysis method and device of extensible heterogeneous edge computing framework
CN116027363A (en) * 2023-03-27 2023-04-28 厦门大学 GNSS anti-deception baseband device accelerated by heterogeneous parallel processor
CN116027363B (en) * 2023-03-27 2023-08-04 厦门大学 GNSS anti-deception baseband device accelerated by heterogeneous parallel processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200428
