CN111738920A - FPGA (field-programmable gate array) framework for panoramic stitching acceleration and panoramic image stitching method - Google Patents


Info

Publication number: CN111738920A (application CN202010535149.5A); granted and published as CN111738920B
Authority: CN (China)
Prior art keywords: image, core, panoramic, pyramid, feature
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 蔡晓军, 祝瑶佳, 赵梦莹, 申兆岩, 贾智平
Assignee (original and current): Shandong University
Application filed by Shandong University

Classifications

    • G - PHYSICS › G06 - COMPUTING; CALCULATING OR COUNTING › G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T1/20 - Processor architectures; processor configuration, e.g. pipelining
    • G06T7/33 - Determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06T2207/20016 - Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T2207/20221 - Image fusion; image merging
    • Y02D10/00 - Energy-efficient computing, e.g. low-power processors, power management or thermal management


Abstract

The invention discloses an FPGA (field-programmable gate array) framework oriented to panoramic stitching acceleration and a panoramic image stitching method. The FPGA framework comprises: a first VDMA module and a second VDMA module, used respectively to read a first image and a second image to be stitched and to transmit them to the first IP core; a first IP core, used to detect the feature points of the first and second images, compute the gradient magnitude and direction of the pixels, and transmit the results to the second IP core; a second IP core, used to generate feature descriptors from the obtained gradient magnitudes and directions and transmit them to the MICROBLAZE soft core; and a MICROBLAZE soft core, used to perform feature-point matching and image fusion according to the feature descriptors. By exploiting the high parallelism and low power consumption of the FPGA, the invention effectively guarantees the real-time performance of the stitching process.

Description

FPGA (field-programmable gate array) framework for panoramic stitching acceleration and panoramic image stitching method
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a panoramic stitching acceleration-oriented FPGA (field programmable gate array) framework and a panoramic image stitching method.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Image stitching has long been a research hotspot in image processing, and in recent years many image feature recognition algorithms for stitching have been proposed, of which the SIFT algorithm is the most representative. SIFT (scale-invariant feature transform) is a computer-vision algorithm for detecting and describing local features in an image; these features are invariant to rotation and scale change, and retain a degree of stability under viewpoint changes and noise. SIFT also extracts many feature points: even a few objects can generate a large number of SIFT feature vectors. The algorithm searches for extreme points in scale space and extracts their position, scale, and rotation invariants. SIFT offers high matching accuracy and excellent robustness.
However, most current image feature recognition algorithms run on a CPU and are limited by the CPU's poor parallel processing capability, resulting in slow feature recognition and poor real-time performance. Feature recognition is the technical core of image stitching and the most time-consuming stage of the whole panoramic stitching process, so its speed directly determines the speed of image stitching.
To address the poor real-time performance of image stitching, academia has worked on accelerating feature recognition with parallel techniques, and many researchers have proposed accelerating the stitching process with a GPU (graphics processing unit). For example, Heymann et al. optimized the SIFT (scale-invariant feature transform) algorithm for the GPU, achieving roughly a 20x speedup over a CPU. Yuan et al. used the CUDA framework to run the algorithm multithreaded on GPU hardware; the speedup grows markedly with data volume, reaching up to 180x on 8-megapixel images and essentially achieving real-time feature recognition. Others exploit the multi-core nature of the CPU: Zhang et al. accelerated SIFT on a multi-core system and reached 45 frames/s on 640 x 480 images. These accelerations are significant, but the high power consumption of such platforms prevents their use in the embedded field.
In addition, when the SIFT algorithm is used as a common feature extraction method in image matching, the Gaussian kernel used for Gaussian blurring is floating-point, while an FPGA is poor at floating-point computation. The Gaussian pyramid also has many groups and layers (4 groups of 6 layers each in the standard algorithm), and SIFT extracts numerous feature points: even small images to be stitched can generate a large number of SIFT feature vectors, imposing a heavy resource burden and delay cost on subsequent computation.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides an FPGA architecture oriented to panoramic stitching acceleration and a panoramic image stitching method, which execute reading, feature detection, and other steps in parallel on the two images to be stitched, guaranteeing the real-time performance of the stitching process.
In order to achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
An FPGA architecture oriented to panoramic stitching acceleration, comprising:
a first VDMA module and a second VDMA module, used respectively to read a first image and a second image to be stitched and to transmit them to the first IP core;
a first IP core, used to detect the feature points of the first image and the second image, compute the gradient magnitude and direction of the pixels, and transmit the results to the second IP core;
a second IP core, used to generate feature descriptors from the obtained gradient magnitudes and directions of the pixels and transmit them to the MICROBLAZE soft core;
and a MICROBLAZE soft core, used to perform feature-point matching and image fusion according to the feature descriptors.
Further, the system comprises an ARM core that controls the operation of the first VDMA module, the second VDMA module, the first IP core, the second IP core, and the MICROBLAZE soft core.
Further, after the image fusion is completed by the MICROBLAZE soft core, the fused image is transmitted back to the DDR through the first VDMA module for storage.
Further, data interaction between the MICROBLAZE soft core and the PS is realized by using a BRAM shared data area and an interrupt processing mechanism.
One or more embodiments provide a panoramic image stitching method, including the steps of:
reading in parallel a first image and a second image to be stitched, establishing the scale space, performing feature detection, computing the gradient magnitude and direction of the pixels, and generating feature descriptors;
performing feature point matching based on the obtained feature descriptors of the first image and the second image;
and carrying out image fusion on the overlapped area after the first image and the second image are spliced.
Further, establishing the scale space specifically comprises:
constructing a Gaussian pyramid for the first image/second image, the Gaussian pyramid comprising two groups of four image layers each, and generating the corresponding differential pyramid, comprising two groups of three image layers each;
the two groups of the Gaussian pyramid, the two groups of the differential pyramid, and the image layers within each group of the differential pyramid are each generated in parallel.
Further, in the process of constructing the Gaussian pyramid, a Gaussian blur function with an integer data type is adopted for image smoothing.
Further, feature detection and the computation of the pixels' gradient magnitude and direction are realized in parallel:
feature detection is performed on the middle layers of the differential pyramid; all pixels are divided into three groups, the three groups undergo feature detection in parallel, and their gradient magnitudes and directions are simultaneously computed in parallel, yielding a feature-point coordinate set and a set of the gradients and directions of all pixels.
Further, performing feature-point matching comprises: building a KDTREE from the feature descriptors and performing the feature-point matching search; and removing mismatched points with the RANSAC algorithm.
Further, image fusion is performed by a weighted average method.
The above one or more technical solutions have the following beneficial effects:
(1) Aiming at the large computational load and poor real-time performance of the SIFT panoramic stitching process, the invention provides an FPGA (field-programmable gate array) architecture oriented to panoramic stitching acceleration; the high parallelism and low power consumption of the FPGA effectively guarantee the real-time performance of the stitching process.
(2) In order to ensure the overall performance of the SIFT panorama stitching process, the invention provides a software and hardware combined image processing scheme, namely, the extraction of image feature points is realized by a hardware method, and the matching of the image feature points and the image fusion are completed by a software method. Under the condition of limited FPGA resources, the time delay of image feature point extraction is greatly reduced, so that the time delay of the whole splicing process is reduced, and the overall performance of the SIFT panorama splicing process in the running of an FPGA platform is ensured.
(3) Aiming at realizing image feature-point extraction in hardware, the invention designs a reasonable parallel strategy that minimizes the delay of feature-point extraction under the FPGA's limited resources.
(4) The SIFT algorithm is simplified without affecting subsequent image matching and fusion: an integer Gaussian kernel is selected, the number of groups and layers of the Gaussian pyramid in scale space is reduced, and the screening thresholds of the feature detection stage are tuned to keep the number of feature points within a reasonable range. Both the parallel effect and the computational load are taken into account, so that too few feature points do not mask the benefit of parallelism or cause stitching to fail, and too many do not impose a huge resource burden and delay cost on subsequent computation; the whole feature extraction stage is thus accelerated.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
Fig. 1 is a heterogeneous SoC architecture (ARM + FPGA) oriented to panoramic stitching acceleration in one or more embodiments of the present invention;
FIG. 2 is a flow of parallel processing in a feature extraction stage in one or more embodiments of the invention;
fig. 3 is a data processing flow of the panorama stitching process.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, and the terms "comprises" and/or "comprising" specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
The existing SIFT feature detection mainly comprises the following 4 basic steps:
1. Scale-space extremum detection: image locations are searched over all scales, and potential interest points invariant to scale and rotation are identified by a Gaussian derivative function.
The scale space is represented by a Gaussian pyramid during implementation, and the construction of the Gaussian pyramid comprises the following two steps:
(1) performing Gaussian smoothing on the image;
(2) and performing down-sampling on the image.
The image pyramid model is formed by repeatedly down-sampling an original image to obtain a series of images of different sizes, arranged from large to small, bottom to top. The original image is the first layer of the pyramid, and each new down-sampled image forms one layer (one image per layer), for n layers in total. To make the scale continuous, the Gaussian pyramid adds Gaussian filtering on top of simple down-sampling: each image in a layer of the image pyramid is Gaussian-blurred with different parameters, where Octave denotes the number of image groups one image can generate and Interval denotes the number of image layers within one group. In addition, in the down-sampling process, the initial (bottom) image of each group of the Gaussian pyramid is obtained by down-sampling the third-from-last image of the previous group.
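The pyramid construction just described can be sketched in software. This is a minimal model with hypothetical helper names, operating on 1-D rows for brevity; the patent's hardware implementation is written in HLS and is not reproduced here.

```python
def gaussian_blur_1d(row, kernel):
    """Blur one row with an odd-length kernel (edge samples replicated)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - r, 0), len(row) - 1)
            acc += w * row[j]
        out.append(acc)
    return out

def downsample(row):
    """Keep every second sample: the base image of the next octave."""
    return row[::2]

def build_pyramid(base, octaves=2, intervals=4, kernel=(0.25, 0.5, 0.25)):
    """Gaussian pyramid of `octaves` groups with `intervals` layers each.
    Per the text: layers inside a group are produced serially (data
    dependence), and each new group starts by down-sampling the
    third-from-last image of the previous group."""
    pyramid = []
    img = list(base)
    for _ in range(octaves):
        group = [img]
        for _ in range(intervals - 1):
            group.append(gaussian_blur_1d(group[-1], kernel))
        pyramid.append(group)
        img = downsample(group[-3])  # third-from-last image of the group
    return pyramid
```

The 3-tap kernel here is a placeholder; the actual blur parameters per layer come from the SIFT scale schedule.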
2. Key-point localization: at each candidate location, a fine model is fitted to determine position and scale. Key points are selected according to their degree of stability.
3. Direction determination: one or more directions are assigned to each keypoint location based on the local gradient direction of the image. All subsequent operations on the image data are transformed with respect to the orientation, scale and location of the keypoints, providing invariance to these transformations.
4. Description of key points: local gradients of the image are measured at a selected scale in a neighborhood around each keypoint. These gradients are transformed into a representation that allows for relatively large local shape deformations and illumination variations.
Example one
In order to solve the problems that the existing acceleration method is high in power consumption and cannot be applied to the embedded field, the embodiment provides a Field Programmable Gate Array (FPGA) architecture oriented to panoramic stitching acceleration by combining the characteristics of the FPGA, and aims to reduce the time delay of the SIFT panoramic stitching process and improve the real-time performance. The FPGA not only has extremely high parallelism, but also has the programmable characteristic, and the circuit structure can be changed at any time according to actual needs so as to adapt to different application scenes.
The FPGA architecture comprises an ARM core, and a first VDMA module, a second VDMA module, a first IP core, a second IP core, a MICROBLAZE soft core, a DDR controller and a BRAM controller which are connected with the ARM core. The input ends of the first VDMA module and the second VDMA module are both connected with the ARM core, the output ends of the first VDMA module and the second VDMA module are both connected with the input end of the first IP core, the output end of the first IP core is connected with the input end of the second IP core, the output end of the second IP core is connected with the input end of the MICROBLAZE soft core, and the BRAM controller is further connected with the MICROBLAZE soft core.
The ARM core mainly controls the running of the VDMA modules, the HLS IP cores, and the MICROBLAZE soft core.
The first VDMA module and the second VDMA module are respectively used for reading a first image and a second image to be spliced from the DDR and transmitting the first image and the second image to the first IP core;
the VDMA module is a DMA specially used for video image processing, the VDMA can conveniently realize a double-buffer mechanism and a multi-buffer mechanism, can efficiently realize data access, can well fit with an ARM internal framework, shortens the development period, can convert data into AXI4-Stream data Stream types, and facilitates the acceleration processing of images by an IP core written by HLS.
The first IP core is used for performing feature-point detection on the first image and the second image, computing the gradient magnitude and direction of the pixels, and transmitting the results to the second IP core;
the second IP core is used for determining the main direction of the feature point and generating a feature descriptor set according to the obtained gradient amplitude and direction of the pixel point, and transmitting the result to the MICROBLAZE soft core;
both the two IP cores are HLS custom IP cores.
The MICROBLAZE soft core is used for carrying out feature point matching and image fusion on the first image and the second image according to the feature descriptor set; and after the image splicing is completed, the spliced image is returned to the DDR by using the first VDMA and is stored.
Meanwhile, a BRAM shared data area and an interrupt handling mechanism implement data interaction between MICROBLAZE and the PS. This architecture therefore effectively addresses the poor real-time performance and high power consumption caused by the high computational complexity of the SIFT panorama stitching technique, exploiting the parallel capability of the FPGA to the greatest extent to reduce the delay of the stitching process.
Example two
Based on the FPGA architecture provided by the first embodiment, the embodiment provides the panoramic image stitching method based on the FPGA architecture, and the SIFT panoramic image stitching process is completed in a soft and hard combination mode.
The method comprises the following specific steps:
step 1: reading a first image and a second image to be spliced from the DDR in parallel based on two VDMA modules;
step 2: designing IP cores in the Vivado HLS IDE and realizing the feature-point extraction of the first image and the second image in parallel;
the specific steps of extracting the features of the first image and the second image to be spliced in parallel comprise:
step 2.1: establishing a scale space of a first image and a second image through a first IP core;
the specific process is as follows:
(1) constructing a Gaussian pyramid for the first image/the second image, wherein the Gaussian pyramid comprises two groups, and each group comprises four layers of images with different scales; (2) and generating a differential pyramid corresponding to the first image/the second image, wherein the differential pyramid comprises two groups, and each group comprises three layers of images with different scales.
An integer-typed Gaussian blur function is adopted to replace the floating-point Gaussian kernel function of the original SIFT algorithm.
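One plausible way to obtain such an integer-typed Gaussian blur function is to scale a normalized float kernel by a power of two and round, so that blurring becomes integer multiply-accumulate followed by a right shift. The patent does not give its exact kernel; the sketch below is an assumption.

```python
import math

def float_gaussian_kernel(radius, sigma):
    """Normalized 1-D float Gaussian kernel of length 2*radius + 1."""
    ks = [math.exp(-(i * i) / (2 * sigma * sigma))
          for i in range(-radius, radius + 1)]
    s = sum(ks)
    return [k / s for k in ks]

def integer_gaussian_kernel(radius, sigma, shift=8):
    """Quantize the float kernel to integers scaled by 2**shift,
    so a blur needs only integer MACs and one right shift.
    (Illustrative quantization; not the patent's actual weights.)"""
    return [round(k * (1 << shift))
            for k in float_gaussian_kernel(radius, sigma)]

def blur_pixel_int(window, ikernel, shift=8):
    """Apply the integer kernel to one pixel window."""
    acc = sum(w * p for w, p in zip(ikernel, window))
    return acc >> shift
```

Because the right shift is a fixed wire rearrangement in hardware, this form maps naturally onto FPGA DSP slices, which is the motivation the text gives for avoiding floating point.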
The two groups of images of the Gaussian pyramid are generated in parallel, while the layers within a group are generated serially due to data dependence.
The images within and between the groups of the differential pyramid are generated in parallel, as follows. The parallel generation of the Gaussian difference pyramids is based on the two groups of Gaussian pyramids generated in parallel. Because each differential-pyramid layer is computed as the difference of two adjacent Gaussian-pyramid layers, the middle layers are duplicated so that the layers within a differential-pyramid group can also be generated in parallel. For example, the first layer of the differential pyramid is the difference of the first and second layers of the Gaussian pyramid, and the second layer is the difference of the second and third layers; by duplicating the shared second Gaussian layer, the first and second layers of the differential-pyramid group are generated in parallel.
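In software, the differential (difference-of-Gaussian) group is simply the layer-wise difference of adjacent Gaussian layers; the hardware duplication of the shared middle layers changes scheduling, not values. A minimal sketch:

```python
def dog_group(gauss_group):
    """Difference-of-Gaussian group: layer i = gauss[i+1] - gauss[i]
    (element-wise, rows modelled as flat lists). As the text notes,
    adjacent DoG layers share a Gaussian layer; in hardware that shared
    layer is copied so the subtractions can run in parallel. This
    software model just computes the same values serially."""
    return [
        [b - a for a, b in zip(lower, upper)]
        for lower, upper in zip(gauss_group, gauss_group[1:])
    ]
```

A group of four Gaussian layers yields three DoG layers, matching the two-group, three-layer differential pyramid described above.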
Step 2.2: performing feature point detection of the first image and the second image in parallel through a first IP core, and calculating gradient amplitude and direction of pixel points;
Feature points are detected from the differential pyramids of the first image and the second image respectively, as follows: feature detection is performed on the middle layers of the first/second image's differential pyramid in five stages executed in order (removing points with low pixel values, removing edge points, selecting extreme points, locating the extreme points, and removing low-contrast extreme points), finally yielding the number of feature points and their coordinate values in the first/second image.
The two groups of differently scaled images of the differential pyramid undergo feature-point detection in parallel. Within the detection for each group, the process operates on single pixels of the differential pyramid's middle-layer image, and the number of pixels is large; after weighing resources against parallel performance, all pixels of the middle-layer image are divided into three groups, giving three parallel feature-detection paths that require three independent sets of detection resources. Finally, the feature-point coordinates obtained by the three parallel paths are merged and the feature-point counts accumulated, achieving partial parallelization of feature-point detection and localization.
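The extremum-selection stage and the three-way pixel grouping can be sketched as follows. This is a Python software model with hypothetical function names: only the "select extreme points" stage is shown, and the other four filtering stages are omitted.

```python
def is_extremum(below, mid, above, y, x):
    """True if mid[y][x] is a strict max or min over its 3x3x3
    neighbourhood across three adjacent DoG layers."""
    v = mid[y][x]
    neigh = [layer[y + dy][x + dx]
             for layer in (below, mid, above)
             for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    neigh.remove(v)  # drop one occurrence of the centre value
    return v > max(neigh) or v < min(neigh)

def detect_features(dog_layers):
    """Split interior pixels into three groups, detect each group
    independently (mirroring the three parallel hardware paths),
    then merge the per-path results."""
    below, mid, above = dog_layers
    h, w = len(mid), len(mid[0])
    coords = [(y, x) for y in range(1, h - 1) for x in range(1, w - 1)]
    groups = [coords[i::3] for i in range(3)]  # three independent sets
    found = []
    for g in groups:  # each loop iteration is one path; parallel in HW
        found += [(y, x) for (y, x) in g
                  if is_extremum(below, mid, above, y, x)]
    return sorted(found)
```

The round-robin split (`coords[i::3]`) is one possible partition; the patent does not specify how pixels are assigned to the three paths.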
For each detection path, the number of extracted feature points is controlled by adjusting the image boundary, the pixel-value threshold, the principal-curvature threshold, the contrast threshold, the offset threshold, and so on, so that a suitable number of feature points is extracted: pixels far from the boundary, with accurate coordinates, high pixel values, and high contrast, are selected as feature points.
Determining the direction and gradient of the feature points specifically comprises:
and calculating the gradient amplitude and direction of all pixel points, generating a gradient histogram based on the gradient amplitude and direction of all 9 × 9 pixels near each feature point, and finally determining the main direction of each feature point. Because there is no data relevance to calculating the gradient amplitude and direction of all pixel points and the feature point location, we adopt a parallel processing method for the two calculation processes, because the gradient and direction of all pixel points are calculated by taking each pixel point as a unit, and the feature point location adopts a three-way parallel processing mode, so that three-way feature point location and three-way pixel point gradient and direction calculation processes can be executed in parallel based on 3 independent pixel point sets, and finally, a feature point coordinate set is obtained, and a set of the gradient and direction of all pixel points is obtained. Therefore, after the coordinates of the feature point are calculated, the gradient amplitude and the direction of the pixels around the feature point can be directly used for generating the feature descriptor, and the calculation time is greatly reduced compared with the prior serial design.
In addition, the computation of the feature points' gradient magnitude is simplified mathematically without affecting the determination of their principal direction: the costly square-root operation is replaced by a simple absolute-value operation, simplifying the computation of the principal direction.
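A sketch of the simplified gradient computation follows. The patent says only that an absolute-value operation replaces the square root, so the common |dx| + |dy| form used below is an assumption.

```python
import math

def gradient(img, y, x):
    """Central-difference gradient at (y, x): returns the absolute-value
    magnitude approximation, the exact magnitude, and the direction."""
    dx = img[y][x + 1] - img[y][x - 1]
    dy = img[y + 1][x] - img[y - 1][x]
    mag_exact = math.hypot(dx, dy)   # sqrt(dx^2 + dy^2): costly on FPGA
    mag_approx = abs(dx) + abs(dy)   # absolute-value substitute
    direction = math.atan2(dy, dx)   # orientation is computed as before
    return mag_approx, mag_exact, direction
```

The direction term is untouched by the substitution, which is why the principal-direction result is preserved: the histogram's dominant bin depends far more on orientation than on the exact magnitude weighting.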
Step 2.3: determining the main direction of the feature point through a second IP core and generating a feature descriptor set;
The last stage of feature extraction generates feature descriptors from the gradient magnitudes and directions of the pixels around each feature point, together with the feature point's coordinates and principal direction. In the descriptor-generation process, all feature points are grouped; three sets of circuits process the feature-point groups in parallel, and the descriptors obtained by each group are finally merged. This yields the feature descriptor set and completes the feature extraction stage. The parallel processing of feature extraction is illustrated in Fig. 2.
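The group-and-merge scheme can be modelled in a few lines of Python. This is a software sketch: `describe` stands in for the real descriptor computation and is a hypothetical parameter.

```python
def build_descriptors(feature_points, describe):
    """Split the feature points into three groups, run `describe` on
    each group (three circuit copies in hardware), merge the results."""
    groups = [feature_points[i::3] for i in range(3)]
    out = []
    for g in groups:  # each iteration is one circuit; parallel in HW
        out += [describe(p) for p in g]
    return out
```

Because descriptor generation for different feature points is independent, any partition into three groups yields the same descriptor set, only faster in hardware.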
At this point, the process of feature extraction is complete.
In this embodiment, at every stage of the SIFT feature detection process, the computation is either simplified or executed in parallel: in the pyramid-construction stage an integer-typed Gaussian blur function replaces the floating-point Gaussian kernel of the original SIFT algorithm and the number of Gaussian-pyramid layers is reduced; in the feature-localization stage the feature-point selection conditions are controlled; and parallelism is fully introduced throughout the computation. This makes SIFT feature detection more suitable for the FPGA, allowing the SIFT algorithm to be deployed and accelerated on the FPGA without affecting the panoramic stitching result.
It is emphasized that the pre-processing and feature detection of the two images involved in the above steps are performed in parallel.
Step 3: image matching is realized through the Vivado Microblaze IP;
A KDTREE is established from the feature descriptors, the BBF algorithm is run on the KDTREE to search for the matching point of each feature point, and finally the RANSAC algorithm is used to screen out mismatched points. The specific steps are as follows:
(1) Based on the feature point information obtained in the feature extraction stage, namely the feature descriptors, the KDTREE algorithm is used for the feature matching search to obtain the nearest neighbor point (figure 2) of each target feature point (figure 1), i.e. its matching point, until all feature points are matched. (2) Since the matching process produces mismatched feature point pairs, the optimal homography matrix obtained by the RANSAC algorithm is used to remove the mismatched point pairs.
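As a software sketch of the matching search, the brute-force loop below stands in for the KDTREE + BBF search (BBF approximates this exhaustive nearest-neighbor result at lower cost). The nearest/second-nearest ratio test is an added filter commonly paired with SIFT matching, not stated in the patent, and the RANSAC homography screening is omitted here:

```python
def squared_dist(a, b):
    # Squared Euclidean distance between two descriptor vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_features(desc1, desc2, ratio=0.8):
    # For each descriptor in desc1, find its nearest neighbor in desc2.
    # Accept the match only if the nearest neighbor is clearly closer
    # than the second nearest (assumed Lowe-style ratio test).
    matches = []
    for i, d in enumerate(desc1):
        dists = sorted((squared_dist(d, q), j) for j, q in enumerate(desc2))
        if len(dists) > 1 and dists[0][0] < (ratio ** 2) * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches
```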
And 4, step 4: processing the overlapped part of the images by a Vivado Microblaze IP in a smooth transition mode of a weighted average method to realize image fusion, and returning the finally spliced images to the DDR again through the VDMA and storing the images. And finishing the splicing process completely to obtain the spliced image.
Those skilled in the art will appreciate that the modules or steps of the present invention described above can be implemented by general-purpose computing means; alternatively, they can be implemented with program code executable by computing means, so that they may be stored in storage means and executed by the computing means, or fabricated separately as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. The present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made, without inventive effort, on the basis of the technical solution of the present invention.

Claims (10)

1. An FPGA architecture for panoramic stitching acceleration, characterized in that it comprises:
a first VDMA module and a second VDMA module, respectively used for reading a first image and a second image to be stitched and transmitting the first image and the second image to a first IP core;
the first IP core, used for detecting the feature points of the first image and the second image, calculating the gradient magnitude and direction of the pixel points, and transmitting the calculation result to a second IP core;
the second IP core, used for generating feature descriptors according to the obtained gradient magnitudes and directions of the pixel points and transmitting the feature descriptors to a MICROBLAZE soft core; and
the MICROBLAZE soft core, used for performing feature point matching and image fusion according to the feature descriptors.
2. The FPGA architecture for panoramic stitching acceleration according to claim 1, characterized by further comprising an ARM core that controls the operation of the first VDMA module, the second VDMA module, the first IP core, the second IP core, and the MICROBLAZE soft core.
3. The FPGA architecture for panoramic stitching acceleration according to claim 1, wherein, after image fusion is completed by the MICROBLAZE soft core, the fused image is transmitted back to the DDR through the first VDMA module for storage.
4. The FPGA architecture for panoramic stitching acceleration according to claim 1, wherein data interaction between the MICROBLAZE soft core and the PS is implemented using a BRAM shared data area and an interrupt handling mechanism.
5. A panoramic image stitching method, characterized in that it comprises:
reading in parallel a first image and a second image to be stitched, establishing a scale space, detecting features, calculating the gradient magnitude and direction of pixel points, and generating feature descriptors;
performing feature point matching based on the obtained feature descriptors of the first image and the second image; and
performing image fusion on the overlapping region after the first image and the second image are stitched.
6. The panoramic image stitching method according to claim 5, wherein establishing the scale space specifically comprises:
constructing a Gaussian pyramid for the first image/the second image, wherein the Gaussian pyramid comprises two groups, and each group comprises four layers of images; generating a corresponding differential pyramid, wherein the differential pyramid comprises two groups, and each group comprises three layers of images;
the two groups of images of the Gaussian pyramid, the two groups of images of the differential pyramid, and the images within each group of the differential pyramid are each generated in parallel.
7. The panoramic image stitching method of claim 6, wherein in the process of constructing the Gaussian pyramid, a Gaussian blur function with an integer data type is adopted for image smoothing.
8. The panoramic image stitching method according to claim 5, wherein feature detection and calculation of gradient magnitude and direction of pixel points are realized in a parallel manner:
performing feature detection on the middle layers of the differential pyramid, dividing all pixel points into three groups, performing feature detection on the three groups of pixel points in parallel, and simultaneously calculating the gradient magnitude and direction of the three groups of pixel points in parallel, so as to obtain the feature point coordinate set and the set of gradients and directions of all pixel points.
9. The panoramic image stitching method according to claim 5, wherein the performing feature point matching comprises: establishing KDTREE based on the feature descriptor, and performing feature point matching search; and removing mismatching points based on RANSAC algorithm.
10. The panoramic image stitching method according to claim 5, wherein the image fusion is performed by a weighted average method.
CN202010535149.5A 2020-06-12 2020-06-12 FPGA chip oriented to panoramic stitching acceleration and panoramic image stitching method Active CN111738920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010535149.5A CN111738920B (en) 2020-06-12 2020-06-12 FPGA chip oriented to panoramic stitching acceleration and panoramic image stitching method


Publications (2)

Publication Number Publication Date
CN111738920A true CN111738920A (en) 2020-10-02
CN111738920B CN111738920B (en) 2023-08-29

Family

ID=72648924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010535149.5A Active CN111738920B (en) 2020-06-12 2020-06-12 FPGA chip oriented to panoramic stitching acceleration and panoramic image stitching method

Country Status (1)

Country Link
CN (1) CN111738920B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107918927A (en) * 2017-11-30 2018-04-17 武汉理工大学 A kind of matching strategy fusion and the fast image splicing method of low error
CN108416732A (en) * 2018-02-02 2018-08-17 重庆邮电大学 A kind of Panorama Mosaic method based on image registration and multi-resolution Fusion
CN108960251A (en) * 2018-05-22 2018-12-07 东南大学 A kind of images match description generates the hardware circuit implementation method of scale space
WO2019047284A1 (en) * 2017-09-05 2019-03-14 平安科技(深圳)有限公司 Methods for feature extraction and panoramic stitching, and apparatus thereof, device, readable storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
薛顺瑞 (XUE Shunrui): "Feature point detection for the SIFT algorithm with FPGA-based parallel processing", 《电视技术》 (Video Engineering) *
袁柳 (YUAN Liu): "Design of an FPGA-based image processing framework", 《数字技术与应用》 (Digital Technology & Application) *

Also Published As

Publication number Publication date
CN111738920B (en) 2023-08-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant