Method and device for quickly registering visible light and infrared images
Technical Field
The invention relates to the field of image registration in image processing application, in particular to a method and a device for quickly registering visible light and infrared images.
Background
Image fusion is a technology that processes image data of the same target collected through multiple source channels, extracts the useful information from each channel, and finally synthesizes a high-quality image. In recent years, unmanned aerial vehicle (UAV) technology has developed rapidly. Airborne photoelectric imaging systems provide the capability of acquiring aerial images and are widely applied in fields such as aerial reconnaissance, traffic monitoring, and field search and rescue; a visible light camera and an infrared thermal imager are generally used on an airborne photoelectric platform to acquire images. Compared with a visible light image, an infrared image reflects the difference in outward radiation energy between the target and the background, and has obvious advantages at night and in haze weather. At the same time, the infrared image also has defects such as low pixel resolution, poor contrast, and fuzzy edges. Fusing the visible light image and the infrared image combines the respective advantages of the two, so that the images captured by the UAV have higher image quality and better environmental adaptability.
When a UAV imaging platform acquires a target image, due to factors such as fuselage design, camera installation, and atmospheric refraction, the visible light image and the infrared image acquired from the same scene have a slight spatial difference: the positions, orientations, and the like of certain targets are inconsistent between the two images. This adversely affects the image fusion result, so the visible light image and the infrared image must be registered with high precision before fusion.
Image registration refers to finding the mapping relationship between source images and estimating the relative motion parameters between them, so that corresponding points of the images to be registered, acquired from the same scene, coincide exactly at the same real-space pixel positions. The most common registration method in the prior art is image registration based on feature matching: feature point detection and matching are performed on the infrared image and the visible light image, a transformation matrix between the two images is established by finding the best-matching feature point pairs, registration of the two images is thereby realized, and the corresponding pixel points are subsequently fused. To ensure the matching effect, existing feature matching algorithms such as SIFT and SURF have complex pipelines: generally, feature points are first extracted and descriptors computed, the descriptor relationships are then used to match feature points between the two images, and after the matching result is obtained, a certain criterion is applied to screen and retain good matches. This registration process has two obvious defects. First, the whole process has many steps and high complexity; descriptor computation and feature point matching in traditional feature matching algorithms are time-consuming, which is unfavorable for real-time processing on the airborne embedded equipment of a UAV. Second, the applicability is not wide enough: feature matching algorithms work well in scenes with clear object edges and obvious features, but in some special scenes the gray gradient directions of heterogeneous images may be reversed in the neighborhood of the same-name (corresponding) point, so the accuracy of feature point matching is low and the registration and fusion effect is greatly reduced.
For an airborne imaging platform with high real-time requirements, another simple and efficient registration idea is to compare the similarity between the visible light image edges and the infrared image edges over a certain range of spatial offsets, and to take the offset at which the similarity is maximum as the registration result. However, to obtain an offset with pixel-level precision, the search must be performed in both the horizontal and vertical directions, and the large number of search iterations increases the time cost of the algorithm, so the real-time requirement of the platform cannot be met. Improving the computation speed of registration while guaranteeing the fusion effect is therefore a research direction for image registration and fusion algorithms in the UAV field.
Disclosure of Invention
In view of the technical defects in the prior art, embodiments of the present invention provide a method and an apparatus for fast registration of visible light and infrared images, which overcome the above problems or at least partially solve them. An algorithm for fast and automatic registration of visible light and infrared images based on a two-stage search is provided, aiming to solve the problem of slow registration of visible light and infrared images. The specific scheme is as follows:
as a first aspect of the present invention, there is provided a method for fast registration of visible and infrared images, the method comprising:
step 1, inputting a visible light image and an infrared image, respectively graying the visible light image and the infrared image, respectively obtaining grayscale images of the visible light image and the infrared image, respectively extracting edge information of the two grayscale images, and obtaining edge images of the two grayscale images, namely a visible light image edge image and an infrared image edge image;
step 2, performing first-stage translation traversal on the infrared image edge map, calculating the goodness of fit between the infrared image edge map and the visible light image edge map in the current state after each translation, and recording the position offset (x1, y1) when the goodness of fit is maximum until the first-stage translation traversal is finished;
step 3, taking the position offset obtained by the first-stage translation traversal as a reference, performing a second-stage translation traversal in a preset range around it, and finding the offset at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum within that range. Like the first stage, the second-stage translation traversal also calculates the goodness of fit between the infrared image edge map and the visible light image edge map after each translation; the offset at which the goodness of fit is maximum within the preset range is taken as the final registration result.
step 4, performing a translation transformation on the infrared image using the offset obtained in step 3, aligning the infrared image with the visible light image, and completing the image registration.
Further, in step 1, edge information of two gray-scale images is extracted by using a Sobel operator, specifically:
setting an element value matrix of a gray scale image of the visible light image or the infrared image as I;
respectively convolving I with two 3×3 convolution kernels to calculate the horizontal gradient image Gx and the vertical gradient image Gy of the grayscale image, the specific formulas being as follows:

Gx = [-1 0 +1; -2 0 +2; -1 0 +1] * I

Gy = [-1 -2 -1; 0 0 0; +1 +2 +1] * I
wherein I is the element value matrix of the grayscale image, * represents the convolution operation, and f(x, y) is the pixel value of the pixel point with coordinates (x, y) in the element value matrix, taking values from 0 to 255; the convolution result at point (x, y) is then specifically calculated as follows:
Gx(x,y) = (-1)*f(x-1,y-1) + 0*f(x,y-1) + 1*f(x+1,y-1)
        + (-2)*f(x-1,y)   + 0*f(x,y)   + 2*f(x+1,y)
        + (-1)*f(x-1,y+1) + 0*f(x,y+1) + 1*f(x+1,y+1);
Gy(x, y) is calculated using the same method. After Gx(x, y) and Gy(x, y) are obtained, the approximate gradient at point (x, y) is calculated as:

G(x, y) = |Gx(x, y)| + |Gy(x, y)|
and calculating the approximate gradient of each pixel point based on the formula so as to obtain the edge information of the whole gray level image.
Further, in step 2, in order to reduce the time of the translation traversal, the first-stage translation traversal performs a coarse traversal with a large step length, using a preset m pixels as the step length, to obtain the position offset (x1, y1) at which the goodness of fit reaches its maximum; the second-stage translation traversal then performs a fine traversal with a small step length, using a preset n pixels as the step length, and finds, within a preset surrounding range centered on (x1, y1), the offset (x2, y2) at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum, which is used as the final offset.
Further, m is 5n or more.
Further, the goodness of fit of the two edge maps in the translation traversal is measured by the overall difference of pixel values, specifically: after each translation, the pixel values of the visible light edge map and the translated infrared edge map are compared point by point, and the number of pixels whose difference is smaller than a preset value is counted; the larger this number, the more similar the two edge maps.
As a second aspect of the present invention, a device for rapidly registering visible light and infrared images is provided, the device including an image input module, a graying processing module, an edge image extraction module, a first-stage traversal module, a second-stage traversal module, and a registration module;
the image input module is used for inputting visible light images and infrared images;
the graying processing module is used for respectively graying the visible light image and the infrared image and respectively acquiring grayscale images of the visible light image and the infrared image;
the edge image extraction module is used for respectively extracting edge information of the two gray-scale images to obtain edge images of the two gray-scale images, namely a visible light image edge image and an infrared image edge image;
the first-stage traversal module is used for performing first-stage translation traversal on the infrared image edge map, calculating the goodness of fit between the infrared image edge map and the visible light image edge map in the current state after each translation, and recording the offset (x1, y1) when the goodness of fit reaches the maximum until the translation traversal in the first stage is finished;
the second-stage traversal module is used for performing a second-stage translation traversal in a preset range around the position offset obtained by the first-stage translation traversal, taking that offset as a reference, to find the offset at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum within that range;
and the registration module is used for performing translation transformation on the infrared image by using the offset obtained by the second-stage traversal module, aligning the infrared image with the visible light image and finishing image registration.
Further, the edge image extraction module extracts edge information of two gray-scale images by using a Sobel operator, specifically:
setting an element value matrix of a gray scale image of the visible light image or the infrared image as I;
respectively convolving I with two 3×3 convolution kernels to calculate the horizontal gradient image Gx and the vertical gradient image Gy of the grayscale image, the specific formulas being as follows:

Gx = [-1 0 +1; -2 0 +2; -1 0 +1] * I

Gy = [-1 -2 -1; 0 0 0; +1 +2 +1] * I
wherein I is the element value matrix of the grayscale image, * represents the convolution operation, and f(x, y) is the pixel value of the pixel point with coordinates (x, y) in the element value matrix, taking values from 0 to 255; the convolution result at point (x, y) is then specifically calculated as follows:
Gx(x,y) = (-1)*f(x-1,y-1) + 0*f(x,y-1) + 1*f(x+1,y-1)
        + (-2)*f(x-1,y)   + 0*f(x,y)   + 2*f(x+1,y)
        + (-1)*f(x-1,y+1) + 0*f(x,y+1) + 1*f(x+1,y+1);
Gy(x, y) is calculated using the same method. After Gx(x, y) and Gy(x, y) are obtained, the approximate gradient at point (x, y) is calculated as:

G(x, y) = |Gx(x, y)| + |Gy(x, y)|
and calculating the approximate gradient of each pixel point based on the formula so as to obtain the edge information of the whole gray level image.
Further, in order to reduce the time of the translation traversal, the first-stage translation traversal performs a coarse traversal with a large step length, using a preset m pixels as the step length, to obtain the position offset (x1, y1) at which the goodness of fit reaches its maximum; the second-stage translation traversal then performs a fine traversal with a small step length, using a preset n pixels as the step length, and finds, within a preset surrounding range centered on (x1, y1), the offset (x2, y2) at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum, which is used as the final offset.
Further, m is 5n or more.
Further, the goodness of fit of the two edge maps in the translation traversal is measured by the overall difference of pixel values, specifically: after each translation, the pixel values of the visible light edge map and the translated infrared edge map are compared point by point, and the number of pixels whose difference is smaller than a preset value is counted; the larger this number, the more similar the two edge maps.
The invention has the following beneficial effects:
the method for quickly registering the visible light image and the infrared image based on the two-stage search abandons a complex characteristic matching process, firstly utilizes a large step length to quickly search in a translation traversal process, and then carries out subdivision search in a small area, simplifies a registering process and improves the running speed of an algorithm.
Drawings
Fig. 1 is a schematic flowchart of a method for rapidly registering visible light and infrared images according to an embodiment of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention; obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
As shown in fig. 1, as a first aspect of the present invention, there is provided a method for fast registration of visible light and infrared images, the method comprising:
step 1, inputting a visible light image and an infrared image, respectively graying the visible light image and the infrared image, respectively obtaining grayscale images of the visible light image and the infrared image, respectively extracting edge information of the two grayscale images, and obtaining edge images of the two grayscale images, namely a visible light image edge image and an infrared image edge image.
Specifically, a visible light image and an infrared image captured in the same scene at the same time are input, wherein the visible light image is a three-channel color image, and the infrared image is a single-channel image.
To facilitate processing and edge extraction, the visible light image is converted to grayscale. In this embodiment, the input visible light image is in YUV format; the Y channel represents the brightness of the image, and the Y component is extracted and used directly as the pixel values of the grayscale image.
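As a hedged illustration of this graying step (not part of the patent text): when the visible input is RGB rather than YUV, an equivalent luma plane can be computed with the standard BT.601 weights. The function name and the list-of-rows data layout below are assumptions of the sketch.

```python
# Sketch: computing a luma (Y) plane from an RGB visible image with the
# BT.601 weights, equivalent to taking the Y channel of a YUV input.
# `rgb` is assumed to be a list of rows of (R, G, B) tuples.
def rgb_to_gray(rgb):
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

print(rgb_to_gray([[(255, 255, 255), (0, 0, 0)]]))  # [[255, 0]]
```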
Limited by the imaging principle of the camera, the visible light and the infrared image shot at the same time have slight difference in spatial position, and need to be registered to further perform accurate fusion, and the default horizontal and vertical offsets are all within the range of 40 pixels in the embodiment.
Specifically, the Sobel operator is utilized to extract the image edges.
Specifically, if the image is imagined as a continuous function, an image edge is where the pixel value is in transition, that is, where the rate of change (the derivative) is largest. However, an image is a two-dimensional discrete function, so the derivative becomes a difference, which is referred to as the gradient of the image.
The Sobel operator is a discrete differential operator that can be used to calculate the approximate gradient of the image grayscale; the larger the gradient, the more likely a pixel belongs to an edge. The Sobel operator combines Gaussian smoothing and differential derivation and is also called a first-order differential operator; derivatives are taken in the horizontal and vertical directions to obtain the gradient images of the image in the x direction and the y direction.
The operator enlarges differences through its weights. The Sobel operator performs the gradient calculation with two 3×3 weighted convolution kernels; the horizontal and vertical gradient images of the grayscale image of the visible light or infrared image, Gx and Gy respectively, are obtained by convolving I with the two kernels:

Gx = [-1 0 +1; -2 0 +2; -1 0 +1] * I

Gy = [-1 -2 -1; 0 0 0; +1 +2 +1] * I

wherein I is the element value matrix of the grayscale image, * represents the convolution operation, and f(x, y) is the pixel value of the pixel point with coordinates (x, y) in the element value matrix, taking values from 0 to 255; the convolution result Gx(x, y) is specifically calculated as follows:

Gx(x,y) = (-1)*f(x-1,y-1) + 0*f(x,y-1) + 1*f(x+1,y-1)
        + (-2)*f(x-1,y)   + 0*f(x,y)   + 2*f(x+1,y)
        + (-1)*f(x-1,y+1) + 0*f(x,y+1) + 1*f(x+1,y+1)
calculation of G Using the same methody(x, y) in obtaining Gx(x, y) and GyAfter (x, y), the approximate gradient of the (x, y) point is calculated as:
calculating the approximate gradient of each pixel point based on the formula so as to obtain the edge information of the whole gray level image;
the Sobel operator detects the edge according to the gray weighting difference of upper, lower, left and right adjacent points of the pixel point, the phenomenon that the edge reaches an extreme value is achieved, the Sobel operator has a smoothing effect on noise, and provides more accurate edge direction information, so that the Sobel operator is a more common edge detection method.
Because the visible light image and the infrared image differ in picture detail, and only the edge information of the objects in the picture is consistent between the two, the edges of the grayed images are further extracted for registration.
Step 2, performing first-stage translation traversal on the infrared image edge map, calculating the goodness of fit between the infrared image edge map and the visible light image edge map in the current state after each translation, and recording the position offset (x1, y1) when the goodness of fit is maximum until the first-stage translation traversal is finished;
Step 3, taking the position offset obtained by the first-stage translation traversal as a reference, a second-stage translation traversal is performed in a preset range around it to find the offset at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum within that range. In the second-stage translation traversal, the goodness of fit between the infrared image edge map and the visible light image edge map is likewise calculated after each translation, and the offset at which the goodness of fit is maximum within the preset range is obtained as the final registration result.
Specifically, after the edge maps of the visible light image and the infrared image are obtained, the registration process formally begins. The aim of this step is to find the optimal offset through a two-stage traversal so that the goodness of fit between the visible light and infrared edge maps is highest. For the two edge maps obtained in the previous step, the best edge match is the offset that makes the two images most similar; for simple and efficient computation, and to reduce the time consumed by the many traversal passes, the similarity of the two edge maps during traversal is measured by the pixel value difference.
In a single traversal pass, all pixel points of the visible light and infrared edge maps are compared one by one, and the number of points whose corresponding pixel value difference is smaller than 20 is counted; the proportion of such points among all pixel points is used as the similarity index. The similarity Sim is calculated as:

Sim = num / (Width × Height)

where Width and Height respectively represent the width and height of the image, and num represents the number of pixels in the two edge maps whose pixel values at corresponding points differ by less than 20. A larger Sim value indicates that the two images are more similar and the edge coincidence degree is higher.
Specifically, the translation traversal is divided into two phases.
In the first stage of translation traversal, two variables are first set to represent the offsets in x and y respectively. In the first pass, the two variables step from -40 to 40 with m pixels as the step length, m preferably being 5, and the traversal is realized with a double loop. In each iteration, the infrared edge map is translated according to the current loop variables x0 and y0, and the similarity Sim between the translated image and the visible light edge map is then calculated. After the traversal ends, the offset (x1, y1) at which Sim is maximum is recorded; since the step length of this first pass is 5, the offset obtained at this point is a rough estimate.
in the second stage of translation traversal, the (x1, y1) offset is used as a reference, translation traversal is performed again within the range of 5 pixels around the offset by using m pixels as a step length, m is preferably 1, the similarity is calculated, and finally an accurate offset is obtained; specifically, with X1 and Y1 as centers, making two secondary offset value variables at this stage from-10 to 10, taking 1 pixel as a step size to take values step by step, similarly calculating a similarity Sim value between the image after each translation and the visible light edge image, and after the traversal is finished, recording the global offset (X2, Y2) when Sim is maximum as a final result of the registration.
The infrared original image is subjected to translation transformation by using the spatial offset (x2, y2) obtained by two-stage traversal, so that the infrared original image can be aligned with the visible light image, the registration process is completed, and preparation is made for image fusion.
As a second embodiment of the present invention, a device for rapidly registering visible light and infrared images is provided, where the device includes an image input module, a graying processing module, an edge image extraction module, a first-stage traversal module, a second-stage traversal module, and a registration module;
the image input module is used for inputting visible light images and infrared images;
the graying processing module is used for respectively graying the visible light image and the infrared image and respectively acquiring grayscale images of the visible light image and the infrared image;
the edge image extraction module is used for respectively extracting edge information of the two gray-scale images to obtain edge images of the two gray-scale images, namely a visible light image edge image and an infrared image edge image;
the first-stage traversal module is used for performing first-stage translation traversal on the infrared image edge map, calculating the goodness of fit between the infrared image edge map and the visible light image edge map in the current state after each translation, and recording the offset (x1, y1) when the goodness of fit reaches the maximum until the translation traversal in the first stage is finished;
the second-stage traversal module is used for performing a second-stage translation traversal in a preset range around the position offset obtained by the first-stage translation traversal, taking that offset as a reference, to find the offset at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum within that range;
and the registration module is used for performing translation transformation on the infrared image by using the offset obtained by the second-stage traversal module, aligning the infrared image with the visible light image and finishing image registration.
Preferably, the edge image extraction module extracts edge information of two gray-scale images by using a Sobel operator, specifically:
setting an element value matrix of a gray scale image of the visible light image or the infrared image as I;
respectively convolving I with two 3×3 convolution kernels to calculate the horizontal gradient image Gx and the vertical gradient image Gy of the grayscale image, the specific formulas being as follows:

Gx = [-1 0 +1; -2 0 +2; -1 0 +1] * I

Gy = [-1 -2 -1; 0 0 0; +1 +2 +1] * I
wherein I is the element value matrix of the grayscale image, * represents the convolution operation, and f(x, y) is the pixel value of the pixel point with coordinates (x, y) in the element value matrix, taking values from 0 to 255; the convolution result at point (x, y) is then specifically calculated as follows:
Gx(x,y) = (-1)*f(x-1,y-1) + 0*f(x,y-1) + 1*f(x+1,y-1)
        + (-2)*f(x-1,y)   + 0*f(x,y)   + 2*f(x+1,y)
        + (-1)*f(x-1,y+1) + 0*f(x,y+1) + 1*f(x+1,y+1);
Gy(x, y) is calculated using the same method. After Gx(x, y) and Gy(x, y) are obtained, the approximate gradient at point (x, y) is calculated as:

G(x, y) = |Gx(x, y)| + |Gy(x, y)|
and calculating the approximate gradient of each pixel point based on the formula so as to obtain the edge information of the whole gray level image.
Preferably, in order to reduce the time of the translation traversal, the first-stage translation traversal performs a coarse traversal with a large step length, using a preset m pixels as the step length, to obtain the position offset (x1, y1) at which the goodness of fit reaches its maximum; the second-stage translation traversal then performs a fine traversal with a small step length, using a preset n pixels as the step length, and finds, within a preset surrounding range centered on (x1, y1), the offset (x2, y2) at which the goodness of fit between the infrared image edge map and the visible light image edge map is maximum, which is used as the final offset.
Preferably, m is greater than or equal to 5n.
Preferably, the goodness of fit of the two edge maps in the translation traversal is measured by the overall difference of pixel values, specifically: after each translation, the pixel values of the visible light edge map and the translated infrared edge map are compared point by point, and the number of pixels whose difference is smaller than a preset value is counted; the larger this number, the more similar the two edge maps.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.