CN108460794B - Binocular three-dimensional infrared salient target detection method and system - Google Patents

Binocular three-dimensional infrared salient target detection method and system

Info

Publication number
CN108460794B
CN108460794B (application CN201611136900.4A)
Authority
CN
China
Prior art keywords
image
infrared
pixel point
infrared image
region
Prior art date
Legal status
Active
Application number
CN201611136900.4A
Other languages
Chinese (zh)
Other versions
CN108460794A (en)
Inventor
柏连发 (Bai Lianfa)
张超 (Zhang Chao)
韩静 (Han Jing)
张毅 (Zhang Yi)
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN201611136900.4A priority Critical patent/CN108460794B/en
Publication of CN108460794A publication Critical patent/CN108460794A/en
Application granted granted Critical
Publication of CN108460794B publication Critical patent/CN108460794B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image

Landscapes

  • Image Processing (AREA)
  • Measurement Of Optical Distance (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a binocular stereo infrared salient object detection method and system. On the basis of existing LARK salient object detection, local features of the image are added to construct a local region covariance, and the brightness and spatial information of image regions are introduced, extending local saliency detection to global saliency detection and finally obtaining a satisfactory saliency map. Meanwhile, to allow portable deployment, the invention provides a DSP + FPGA hardware real-time processing system, which realizes infrared salient target extraction, salient target ranging and final colorized output of the salient target, while meeting the real-time requirement of the final processing.

Description

Binocular three-dimensional infrared salient target detection method and system
Technical Field
The invention belongs to the technical field of salient object detection and object identification, and particularly relates to a binocular three-dimensional infrared salient object detection method and system.
Background
Vision is one of the principal ways in which people perceive and recognize the world. As technology has developed, people are no longer content to perceive only the surface of things with the naked eye; they hope to mine the information hidden in what the eye observes. With the development of computer technology and its capacity for fast computation, computer processing of visual images has gradually matured into the new field of computer vision. The main function of computer vision is to sense a two-dimensional image space and, through computation, expand it into three-dimensional space, obtaining two- or even three-dimensional spatial information from the image and helping replace the human eye in understanding the world more deeply. The knowledge required for computer vision spans disciplines such as statistics, psychology, signals and systems, and data mining. Computer vision originated in the United States in the 1950s; around the 1960s, Professor Roberts of MIT characterized the external world as a three-dimensional "blocks world". Later programs restored the three-dimensional information of computer vision, formally extending two-dimensional image information to three-dimensional processing and marking the birth of stereoscopic vision technology.
Salient object detection aims to highlight the objects or regions of an image that stand out visually, and how to improve the performance of saliency detection algorithms is a fundamental issue that has drawn wide attention in recent years. Saliency detection has broad applications in computer vision and image processing tasks, such as image/video compression and segmentation, and content-aware image resizing including image stitching. The extraction of saliency information is also exploited in higher-level vision tasks such as object detection and face recognition, and many saliency detection algorithms are used to capture different saliency cues. Most conventional saliency models rely mainly on center-surround filters or image statistics to identify regions that are locally complex or high-contrast, or rare (improbable) in appearance. The Shannon self-information method uses the negative logarithm of the probability of image pixels to measure the improbability of local salient target information, and further serves as a top-down saliency model.
Disclosure of Invention
The invention provides a binocular stereo infrared salient object detection method and system. On the basis of conventional LARK salient object detection, local features of the image are added to construct a local region covariance, and the brightness and spatial information of image regions are introduced, extending local saliency detection to global saliency detection and finally obtaining a satisfactory saliency map. Meanwhile, to allow portable deployment, the invention provides a DSP + FPGA hardware real-time processing platform, which realizes infrared salient target extraction, salient target ranging and final colorized output of the salient target, while meeting the real-time requirement of the final processing.
In order to solve the technical problem, the invention provides a binocular three-dimensional infrared salient target detection method, which comprises the following steps:
step 1, performing epipolar correction on the infrared image sequences acquired by the binocular camera using the same set of transformation matrices;
step 2, using the covariance matrix of local features as the feature descriptor for significance detection, the significant value being calculated as in formula (1):

$$S(r_k)=\sum_{r_k\neq r_i}\exp\!\left(-\frac{D_S(r_k,r_i)}{\sigma_s^{2}}\right)B(r_k,r_i)\,\rho(C_k,C_i) \qquad (1)$$

wherein S(r_k) is the significant value of pixel point k; D_S(r_k, r_i) is the spatial distance weight between different regions r_k and r_i in the infrared image; σ_s is the spatial weight adjusting coefficient; B(r_k, r_i) is the brightness relationship of region r_k and region r_i, defined in terms of the region luminance sums, where Sum(r_k) is the sum of the pixel luminance values of region r_k and Sum(r_i) is the sum of the pixel luminance values of region r_i; ρ(C_k, C_i) is the similarity between the feature covariance matrices C_k and C_i, measured as

$$\rho(C_l,C_i)=\sqrt{\sum_{m=1}^{d}\ln^{2}\lambda_m(C_l,C_i)}$$

wherein λ_m is the generalized eigenvalue of the feature covariance matrix C_l at pixel point l and the feature covariance matrix C_i at pixel point i, calculated as in formula (2):

$$\lambda_m C_l x_m - C_i x_m = 0,\quad m = 1, 2, \dots, d \qquad (2)$$

wherein x_m is a generalized eigenvector and d is the number of features of the feature vector; the feature covariance matrix C_i at pixel point i is calculated as in formula (3):

$$C_i=\frac{1}{n-1}\sum_{k=1}^{n}(h_k-u_i)(h_k-u_i)^{T} \qquad (3)$$

wherein h_k is the feature vector at pixel point k within the window centered on pixel point i; u_i is the mean feature vector; n is the total number of pixels in the selection window; the feature vector h_k at each pixel point is given by formula (4):

$$h_k=[I(x,y),\ I_{ve}(x,y),\ I_{le}(x,y),\ K(x,y),\ x,\ y] \qquad (4)$$

wherein I(x, y) is the pixel gray value of the image; I_{ve}(x, y) and I_{le}(x, y) are the vertical and horizontal gradient values of the image; K(x, y) is the LARK kernel of the infrared image; x and y are the abscissa and ordinate of the pixel point in the infrared image;
step 3, after extracting the significant target of the infrared image, calibrating the boundary of the significant target with a connected domain and binarizing it; then selecting the central point position of the significant target and measuring the final distance from the parallax of the central pixels of the significant target in the left and right images according to the triangulation method.
The invention also provides a binocular three-dimensional infrared salient target detection system, which comprises two infrared cameras, a variable voltage power supply, a DSP processor, an FPGA and a VGA display. The infrared cameras serve as a binocular camera, collecting two infrared images and outputting them to the FPGA. The FPGA receives the infrared images collected by the infrared cameras and sends them to the DSP; at the same time it receives the image result processed by the DSP and sends it to the VGA display for display. The DSP processes the binocular infrared images to acquire the position and distance information of the significant target in the infrared image, and transmits the image to the FPGA.
Further, the method for processing the binocular infrared image by the DSP comprises the following steps:
step 1, performing epipolar correction on the infrared image sequences acquired by the binocular camera using the same set of transformation matrices;
step 2, using the covariance matrix of local features as the feature descriptor for significance detection, the significant value being calculated as in formula (1):

$$S(r_k)=\sum_{r_k\neq r_i}\exp\!\left(-\frac{D_S(r_k,r_i)}{\sigma_s^{2}}\right)B(r_k,r_i)\,\rho(C_k,C_i) \qquad (1)$$

wherein S(r_k) is the significant value of pixel point k; D_S(r_k, r_i) is the spatial distance weight between different regions r_k and r_i in the infrared image; σ_s is the spatial weight adjusting coefficient; B(r_k, r_i) is the brightness relationship of region r_k and region r_i, defined in terms of the region luminance sums, where Sum(r_k) is the sum of the pixel luminance values of region r_k and Sum(r_i) is the sum of the pixel luminance values of region r_i; ρ(C_k, C_i) is the similarity between the feature covariance matrices C_k and C_i, measured as

$$\rho(C_l,C_i)=\sqrt{\sum_{m=1}^{d}\ln^{2}\lambda_m(C_l,C_i)}$$

wherein λ_m is the generalized eigenvalue of the feature covariance matrix C_l at pixel point l and the feature covariance matrix C_i at pixel point i, calculated as in formula (2):

$$\lambda_m C_l x_m - C_i x_m = 0,\quad m = 1, 2, \dots, d \qquad (2)$$

wherein x_m is a generalized eigenvector and d is the number of features of the feature vector; the feature covariance matrix C_i at pixel point i is calculated as in formula (3):

$$C_i=\frac{1}{n-1}\sum_{k=1}^{n}(h_k-u_i)(h_k-u_i)^{T} \qquad (3)$$

wherein h_k is the feature vector at pixel point k within the window centered on pixel point i; u_i is the mean feature vector; n is the total number of pixels in the selection window; the feature vector h_k at each pixel point is given by formula (4):

$$h_k=[I(x,y),\ I_{ve}(x,y),\ I_{le}(x,y),\ K(x,y),\ x,\ y] \qquad (4)$$

wherein I(x, y) is the pixel gray value of the image; I_{ve}(x, y) and I_{le}(x, y) are the vertical and horizontal gradient values of the image; K(x, y) is the LARK kernel of the infrared image; x and y are the abscissa and ordinate of the pixel point in the infrared image;
step 3, after extracting the significant target of the infrared image, calibrating the boundary of the significant target with a connected domain and binarizing it; then selecting the central point position of the significant target and measuring the final distance from the parallax of the central pixels of the significant target in the left and right images according to the triangulation method.
Compared with the prior art, the invention has the following remarkable advantages: (1) for infrared images, whose salient target structure is not obvious and whose background is complex, the method can effectively determine the position of the infrared salient target and extract the contour feature information of the salient target; (2) on the DSP6678 + FPGA binocular hardware platform, the reasonable allocation of the eight-core computation within the DSP6678 and the preprocessing of the image by the FPGA shorten the computation time of the system and realize final real-time processing; (3) the system design is simple, its stability is high, and it can be carried portably; meanwhile, functions such as salient target ranging make the final system more practical.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the system of the present invention.
Fig. 3 is a schematic diagram of binocular infrared epipolar line correction results of the present invention.
Fig. 4 is a schematic illustration of salient object detection in accordance with the present invention.
FIG. 5 is a diagram illustrating DSP multi-core task allocation in the present invention.
Detailed Description
It is easily understood that, according to the technical solution of the present invention, those skilled in the art can conceive of various embodiments of the binocular stereo infrared salient object detection method and system without departing from the essential spirit of the present invention. Therefore, the following detailed description and the accompanying drawings are merely illustrative of the technical solution of the present invention and should not be construed as the whole of the present invention or as a limitation of it.
With reference to the accompanying drawings, the specific flow of the infrared significant target extraction provided by the invention and its hardware implementation is as follows:
1. binocular infrared significant target extraction method
Step one: epipolar correction of the collected binocular infrared images
The left and right infrared images acquired by the binocular stereo night-vision system have a certain parallax in the horizontal direction, and operations such as ranging of a significant target can be performed using this parallax. In a binocular stereo system the two lenses also have a certain error in the vertical direction, so epipolar correction is selected to eliminate this error and improve the detection precision. Traditional epipolar correction measures the transformation matrix of each image pair collected by the binocular camera and then corrects that pair, but this computation adds complexity to a hardware platform that must process in real time. The binocular stereo system adopted by the invention performs real-time detection and places strict requirements on the computation time of the algorithm; an algorithm with high precision and low computation is more suitable for a real-time hardware platform, so the invention selects the epipolar correction method with a single transformation matrix.
Denote the two left and right infrared images collected by the cameras as I_L and I_R; the epipolar transformation matrices L_1 and R_1 act on them as follows:

$$X_{L1}=L_1\otimes I_L,\quad X_{R1}=R_1\otimes I_R \qquad (5)$$

wherein X_{L1} is the result image after epipolar correction of the left camera; X_{R1} is the result image after epipolar correction of the right camera; the operator ⊗ denotes the final corrected image obtained from the input image by rotation and translation under the action of the transformation matrix; L_1 and R_1 are the transformation matrices of the left and right cameras. L_1 is determined by the focal length f_L and the imaging width w_L and height h_L of the left camera (formula (6)), and R_1 by the focal length f_R and the imaging width w_R and height h_R of the right camera (formula (7)). Since the camera resolution does not change, L_1 and R_1 are determined by the current focal length of the camera.
In general, the epipolar rectification matrices differ between different acquired image pairs, so the rectification matrix of each image pair would have to be computed separately, increasing the complexity of the system. The single-matrix epipolar correction method instead corrects every acquired image pair with the same set of transformation matrices, provided the focal length of the camera is fixed.
When a pair of images I_L^{(n)} and I_R^{(n)} is acquired, the final corrected images X_L^{(n)} and X_R^{(n)} are obtained after the conversion; the concrete transformation is:

$$X_L^{(n)}=L_1\otimes I_L^{(n)},\quad X_R^{(n)}=R_1\otimes I_R^{(n)} \qquad (8)$$
when the focal length of the camera is stable and unchanged, the same transformation matrix can be utilized to satisfy all image pairs acquired by the binocular camera. In order to reduce the calculated amount, two infrared images acquired in a binocular stereo mode are subjected to epipolar line correction by taking the left infrared image as a reference image, the error of the two images in the vertical direction is reduced, and a specific transformation formula is as follows:
Figure BDA0001177094800000061
wherein, L and R represent that the current left and right images I are collectedLAnd IRThe calculation method of the transformation matrix of the left camera and the right camera is shown in the formulas (2) and (3); xLAnd XRThe two-path infrared image after limit correction.
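As an illustrative sketch of this single-matrix correction step: the fragment below applies a fixed pair of transformation matrices to every incoming image pair. It assumes L1 and R1 have been computed offline for the current focal length (the identity values are placeholders), and uses OpenCV's warpPerspective as a stand-in for the rotation-and-translation operator ⊗.

```python
import cv2
import numpy as np

# Placeholder matrices; in the system they would be computed offline from
# the camera focal lengths and image sizes (formulas (6) and (7)).
L1 = np.eye(3, dtype=np.float64)
R1 = np.eye(3, dtype=np.float64)

def rectify_pair(img_left: np.ndarray, img_right: np.ndarray):
    """Single-matrix epipolar correction: the same L1 and R1 are applied
    to every image pair while the focal length stays fixed (formula (8))."""
    h, w = img_left.shape[:2]
    x_l = cv2.warpPerspective(img_left, L1, (w, h))
    x_r = cv2.warpPerspective(img_right, R1, (w, h))
    return x_l, x_r
```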
Step two: detecting salient objects of infrared images
The invention uses a global-contrast saliency detection method based on regional feature covariance information. The method takes the local feature covariance matrix as the feature descriptor for saliency detection, introduces regional brightness information to increase the contrast between the significant target and the background, and introduces spatial weighting information to extend local saliency detection to global saliency detection, yielding the final saliency map.
2.1 computing local feature covariance
The change of the image gradient effectively indicates the degree of homogeneity of a local area. The background part of an image is generally flat and its gradient change is not obvious, so its homogeneity is high and it is not salient; the structural information of the significant target part is obvious and its gradient change is large, so its homogeneity is low and the significant target stands out. For an infrared image, whose structural information is not obvious, brightness information must be emphasized to distinguish the salient region from the background region. When LARK is used as the feature, its main feature information is the gradient information of the image, which is not obvious in an infrared image, so other feature information needs to be introduced to strengthen the saliency detection of the infrared image.
Traditional image features mainly comprise the gradient information of the image, its morphological erosion and dilation, the information entropy, and so on. Each pixel point in the image can be described by a feature matrix composed of several feature vectors at that point; for an image region R of size M × N, each pixel describes its feature information by a d-dimensional feature vector. The invention constructs the feature matrix from the following feature information of each pixel point to represent the feature information of the image at that point:
$$h_k=[I(x,y),\ I_{ve}(x,y),\ I_{le}(x,y),\ K(x,y),\ x,\ y] \qquad (9)$$
wherein I(x, y) is the pixel gray value of the image and the basic feature element in image saliency detection; I_{ve}(x, y) and I_{le}(x, y) are the vertical and horizontal gradient values of the image, representing its structural feature information; K(x, y) is the LARK kernel value of the image, representing the change of local structural information and its differences; x and y are the abscissa and ordinate of the pixel point, representing its position information.
According to the multi-feature matrix of each pixel point, a covariance matrix is then introduced to fuse these features.
To obtain the feature covariance of each pixel point, a pixel window of size m × m is selected around it, and the covariance matrix at the window center position i, i.e. the covariance matrix of the center pixel point, is obtained. The feature covariance matrix at pixel point i can be expressed as:

$$C_i=\frac{1}{n-1}\sum_{k=1}^{n}(h_k-u_i)(h_k-u_i)^{T} \qquad (10)$$

wherein C_i is the feature covariance matrix at that point; h_k is the feature vector at pixel point k within the window; u_i is the mean feature vector; n is the total number of pixels in the selection window.
The feature covariance matrix C_i is a symmetric matrix: the elements on its diagonal are the variances of the individual features, and the off-diagonal elements are the correlations between the features. The feature distance between two pixel points in an image, i.e. the similarity between their two covariance matrices, is measured by the nearest-neighbor method as:

$$\rho(C_l,C_i)=\sqrt{\sum_{m=1}^{d}\ln^{2}\lambda_m(C_l,C_i)} \qquad (11)$$

wherein λ_m is the generalized eigenvalue of C_l and C_i, calculated as:

$$\lambda_m C_l x_m - C_i x_m = 0,\quad m = 1, 2, \dots, d \qquad (12)$$

wherein x_m is the generalized eigenvector and d is the number of features of the feature vector.
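A minimal numerical sketch of formulas (9) through (12), assuming a single-channel grayscale image. The lark_kernel below is only a gradient-magnitude stand-in for the true LARK kernel value K(x, y), whose computation the patent does not spell out; window handling is restricted to interior pixels for brevity.

```python
import numpy as np
from scipy.linalg import eigvals

def lark_kernel(img):
    # Stand-in for K(x, y): gradient magnitude, NOT the real LARK kernel.
    gy, gx = np.gradient(img.astype(np.float64))
    return np.sqrt(gx ** 2 + gy ** 2)

def feature_stack(img):
    """Per-pixel feature vectors h_k = [I, I_ve, I_le, K, x, y] (formula (9))."""
    h, w = img.shape
    gy, gx = np.gradient(img.astype(np.float64))    # vertical, horizontal gradients
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)  # pixel coordinates
    return np.stack([img.astype(np.float64), gy, gx, lark_kernel(img), xs, ys], axis=-1)

def covariance_at(features, i, j, m=5):
    """Feature covariance C_i over an m x m window around interior pixel (i, j),
    i.e. (1/(n-1)) * sum_k (h_k - u_i)(h_k - u_i)^T (formula (10))."""
    half = m // 2
    win = features[i - half:i + half + 1, j - half:j + half + 1]
    return np.cov(win.reshape(-1, features.shape[-1]), rowvar=False)

def cov_similarity(c_l, c_i):
    """rho(C_l, C_i) from the generalized eigenvalues lambda_m satisfying
    lambda_m * C_l x_m = C_i x_m (formulas (11) and (12))."""
    lam = np.clip(np.real(eigvals(c_i, c_l)), 1e-12, None)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))
```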
2.2 adding luminance region information
For an infrared image, human visual attention is drawn more to areas of the image where the brightness contrast is very large. Besides using the contrast of features in the infrared image to detect saliency, the luminance relationship also plays a significant role in saliency detection and is used as another judgment factor. For human vision, high luminance contrast between adjacent regions attracts visual attention more than low luminance contrast. While the covariance of the pixel features is calculated, a brightness relationship is introduced to enhance the final saliency result: the brightness relationship B(r_k, r_i) of two regions r_k and r_i is defined in terms of their luminance sums (formula (13)), where Sum(r_k) is the sum of the pixel luminance values of region r_k.
2.3 adding spatially weighted regional contrast
To better introduce the influence of spatial information on the saliency detection result and extend local saliency detection to global saliency detection, the invention introduces a spatial weight. The spatial weight strengthens the spatial interaction between regions, so that the saliency contribution between adjacent regions is enhanced and that between distant regions is weakened, while the influence of the global information on the final saliency result is still taken into account.

The invention defines the saliency of the final image as:

$$S(r_k)=\sum_{r_k\neq r_i}\exp\!\left(-\frac{D_S(r_k,r_i)}{\sigma_s^{2}}\right)B(r_k,r_i)\,\rho(C_k,C_i) \qquad (14)$$

wherein D_S(r_k, r_i) is the spatial distance weight between regions r_k and r_i; σ_s controls the spatial weight strength: the larger σ_s is, the smaller the influence of the spatial weight, so that the contrast of farther regions contributes more to the significant value of the current region; B(r_k, r_i) is the luminance region information, and the more distinct the luminance contrast of a region, the greater its contribution; ρ(C_k, C_i) is the similarity of the feature covariances of the two regions; S(r_k) is the significant value of region r_k.

The invention defines the distance between two regions as the Euclidean distance between their centers of gravity; for the final saliency detection, σ_s² = 0.4, with the pixel coordinates uniformly normalized to the interval [0, 1].
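The aggregation in formula (14) can be sketched as below. This assumes region descriptors (gravity center normalized to [0, 1], luminance sum, feature covariance) have already been computed; since the patent gives B(r_k, r_i) only as an equation image, a normalized luminance-contrast term is used here as an assumed form.

```python
import numpy as np
from scipy.linalg import eigvals

def cov_similarity(c_l, c_i):
    lam = np.clip(np.real(eigvals(c_i, c_l)), 1e-12, None)  # formula (12)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))         # formula (11)

def saliency(regions, sigma_s2=0.4):
    """Global-contrast saliency S(r_k) per formula (14); each region is a dict
    with 'centroid' (in [0,1]^2), 'lum_sum' and 'cov'."""
    s = np.zeros(len(regions))
    for k, rk in enumerate(regions):
        for i, ri in enumerate(regions):
            if i == k:
                continue
            d_s = np.linalg.norm(rk['centroid'] - ri['centroid'])  # gravity-center distance
            w = np.exp(-d_s / sigma_s2)                            # spatial weight
            # Assumed form of B(r_k, r_i): normalized luminance contrast.
            b = abs(rk['lum_sum'] - ri['lum_sum']) / (rk['lum_sum'] + ri['lum_sum'] + 1e-12)
            s[k] += w * b * cov_similarity(rk['cov'], ri['cov'])
    return s
```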
Step three: ranging the detected salient target
After the significant target of the infrared image is extracted, the boundary of the significant target is calibrated using connected-domain calibration. The target pixel selected by the invention is the central point of each significant-target connected domain in the two significant-target images. Since the two images have been epipolar-corrected, the significant targets lie on the same horizontal line, and the offset between them is only the parallax of the pixels in the horizontal direction. Denoting the measured horizontal parallax as d, with the known focal length of the two cameras in the X-axis direction f_x = 2390.42 and baseline distance B = 43.4 cm, the triangulation formula gives:
$$Z=\frac{f_x\cdot B}{d} \qquad (15)$$
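A one-function sketch of this ranging step, using the calibrated values quoted above (f_x = 2390.42 pixels, B = 43.4 cm):

```python
def distance_cm(disparity_px: float, fx: float = 2390.42, baseline_cm: float = 43.4) -> float:
    """Stereo triangulation Z = f_x * B / d (formula (15))."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite range")
    return fx * baseline_cm / disparity_px

# Example: a salient-target center with 50 px of horizontal parallax
# lies at roughly 2075 cm from the baseline.
print(round(distance_cm(50.0)))
```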
according to the calculation between the parallaxes, the corresponding distance size between the salient objects can be obtained. And selecting different color bars to represent the distance according to different distance intervals, and marking the position information of the remarkable target on the original image. 2. System platform set-up
To extract the significant target in binocular stereo night vision, the invention builds a binocular stereo night-vision hardware system consisting only of two infrared cameras with a resolution of 640 × 512, a variable voltage power supply, a DSP TMS320C6678 processor, an FPGA (Spartan-6 chip), and a VGA display output (display resolution 768 × 576). The hardware collects the binocular infrared images with the two infrared cameras, corrects the two infrared images, extracts the significant target of the images, marks the position information of the significant target on the original infrared image, measures the distance of the significant target from the parallax of the two infrared images, and finally outputs the result through the display screen.
Step 1: communication between the camera and the FPGA
The infrared camera and the FPGA are connected through a PAL interface. The PAL interface uses an ADV7180 decoding chip manufactured by ADI for image input. The chip supports simultaneous 3-channel input with dynamic single-channel switched output, automatically identifies and demodulates analog composite video signals of the NTSC, PAL and SECAM systems, and, after AD conversion and decoding, outputs a line synchronization signal HS, a field synchronization signal VS, a FIELD flag signal, and 8-bit YCbCr 4:2:2 image data conforming to the ITU-R BT.656 standard.
Step 2: communication between the FPGA and the DSP
The process of establishing a data transmission link between the DSP6678 and the FPGA through the SRIO port is as follows: first the SRIO port on the DSP side is initialized and its parameters are configured, including the device ID address of the SRIO port and the final data transmission rate. After successful initialization, the FPGA sends data to the DSP followed by a Doorbell packet; on receiving the Doorbell packet the DSP triggers an interrupt, reads the data sent by the FPGA, and the communication connection is established.
The specific image transmission workflow is as follows:
(1) The FPGA reads an image from its DDR3, sends it to the DSP through SWRITE data packets, and after a complete image has been sent, sends a doorbell packet to the DSP and waits for the DSP to return a doorbell response packet; after receiving the response packet it enters the corresponding transmission delay and then sends the next image.
(2) After receiving the SWRITE packets transmitted by the FPGA, the DSP stores the data to the memory area of the destination core through internal DMA. When the DSP receives the doorbell packet of the FPGA, the doorbell value in the program changes from 1 to 0, the program jumps out of its idle loop, and the corresponding algorithm processing module starts to execute. After the eight-core processing is completed, core 7 transmits the final result of the DSP to the FPGA in the form of SWRITE packets.
(3) After receiving the SWRITE packets of the DSP, the FPGA stores the data into the corresponding DDR3 and later displays the image through the VGA display.
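A schematic, language-level simulation of this doorbell handshake, with Python queues standing in for the SRIO link and the DMA buffer; it illustrates the sequencing only and is not DSP/FPGA driver code.

```python
from queue import Queue
from threading import Thread

data_link, doorbell, response = Queue(), Queue(), Queue()

def fpga_send(frames):
    for frame in frames:
        data_link.put(frame)      # SWRITE payload: one complete image
        doorbell.put("doorbell")  # doorbell packet marks end of frame
        response.get()            # wait for the DSP's doorbell response packet
        # a transmission delay would be inserted here before the next image

def dsp_receive(n_frames, received):
    for _ in range(n_frames):
        doorbell.get()                    # doorbell triggers the interrupt
        received.append(data_link.get())  # read the frame (DMA in hardware)
        response.put("ack")               # return a doorbell response packet

frames = [f"frame-{i}" for i in range(3)]
received = []
dsp = Thread(target=dsp_receive, args=(len(frames), received))
dsp.start()
fpga_send(frames)
dsp.join()
print(received)  # ['frame-0', 'frame-1', 'frame-2']
```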
Step 3: DSP inter-core communication
In the invention the binocular stereo input is two infrared images. Parallel processing is used to reduce the computation time of significant target extraction; at the same time, the epipolar correction of the images requires the global information of both images and cannot be performed on each image separately. The binocular stereo night-vision significant target extraction is therefore finally realized in a stream-processing mode combined with parallelism.
The task of core 0 is mainly to establish the communication link with the FPGA, receive the two infrared images from the FPGA, perform epipolar correction on them, and pass them to the next cores; cores 1, 2, 3 and 4 each carry out the significant target extraction of an infrared image; core 5 performs the ranging between the two infrared images; core 6 performs the colorized display of the infrared images; core 7 communicates between the DSP and the FPGA and transmits the processed image to the FPGA. The eight cores of the DSP6678 communicate using Message communication. The advantage of the Message mode is that a single core can receive the queue message of the previous core and send a queue message to the next core; any inter-core communication among the eight cores of the TMS320C6678 only requires knowing the queue address between the cores. In addition, MessageQ_alloc() creates dynamic memory space in the core memory, and the length of the message queue can be controlled arbitrarily, so the Message communication mode has obvious advantages for processing and transmitting large amounts of data.
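A toy model of this eight-core stream allocation, with threads and queues standing in for the C6678 cores and the TI IPC MessageQ channels; the per-stage processing is reduced to string tagging.

```python
from queue import Queue
from threading import Thread

q_rect, q_sal, q_range, q_out = Queue(), Queue(), Queue(), Queue()

def core0(pairs):                 # FPGA link + epipolar correction
    for left, right in pairs:
        q_rect.put((f"rect({left})", f"rect({right})"))
    q_rect.put(None)

def cores1to4():                  # significant target extraction (parallel in hardware)
    while (pair := q_rect.get()) is not None:
        q_sal.put(tuple(f"salient({img})" for img in pair))
    q_sal.put(None)

def core5():                      # ranging between the two images
    while (pair := q_sal.get()) is not None:
        q_range.put((pair, "distance"))
    q_range.put(None)

def core6():                      # colorized display output
    while (item := q_range.get()) is not None:
        q_out.put(f"colorized{item}")
    q_out.put(None)

stages = [Thread(target=lambda: core0([("L0", "R0"), ("L1", "R1")])),
          Thread(target=cores1to4), Thread(target=core5), Thread(target=core6)]
for t in stages:
    t.start()
while (result := q_out.get()) is not None:
    print(result)                 # core 7 would forward this to the FPGA
for t in stages:
    t.join()
```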
Repeated tests show that the method achieves a satisfactory application effect and an ideal imaging result; it can be widely applied to military detection, biomedicine, automatic driving of unmanned vehicles and the like, and has a good application prospect.

Claims (2)

1. A binocular stereo infrared salient object detection method is characterized by comprising the following steps:
step 1, performing epipolar correction on the infrared image sequences acquired by the binocular camera using the same set of transformation matrices;
step 2, using the covariance matrix of local features as the feature descriptor for significance detection, the significant value being calculated as in formula (1):

$$S(r_k)=\sum_{r_k\neq r_i}\exp\!\left(-\frac{D_S(r_k,r_i)}{\sigma_s^{2}}\right)B(r_k,r_i)\,\rho(C_k,C_i) \qquad (1)$$

wherein S(r_k) is the significant value of pixel point k; D_S(r_k, r_i) is the spatial distance weight between different regions r_k and r_i in the infrared image; σ_s is the spatial weight adjusting coefficient; B(r_k, r_i) is the brightness relationship of region r_k and region r_i, defined in terms of the region luminance sums, where Sum(r_k) is the sum of the pixel luminance values of region r_k and Sum(r_i) is the sum of the pixel luminance values of region r_i; ρ(C_k, C_i) is the similarity between the feature covariance matrices C_k and C_i, measured as

$$\rho(C_l,C_i)=\sqrt{\sum_{m=1}^{d}\ln^{2}\lambda_m(C_l,C_i)}$$

wherein λ_m is the generalized eigenvalue of the feature covariance matrix C_l at pixel point l and the feature covariance matrix C_i at pixel point i, calculated as in formula (2):

$$\lambda_m C_l x_m - C_i x_m = 0,\quad m = 1, 2, \dots, d \qquad (2)$$

wherein x_m is a generalized eigenvector and d is the number of features of the feature vector; the feature covariance matrix C_i at pixel point i is calculated as in formula (3):

$$C_i=\frac{1}{n-1}\sum_{k=1}^{n}(h_k-u_i)(h_k-u_i)^{T} \qquad (3)$$

wherein h_k is the feature vector at pixel point k within the window centered on pixel point i; u_i is the mean feature vector; n is the total number of pixels in the selection window; the feature vector h_k at each pixel point is given by formula (4):

$$h_k=[I(x,y),\ I_{ve}(x,y),\ I_{le}(x,y),\ K(x,y),\ x,\ y] \qquad (4)$$

wherein I(x, y) is the pixel gray value of the image; I_{ve}(x, y) and I_{le}(x, y) are the vertical and horizontal gradient values of the image; K(x, y) is the LARK kernel of the infrared image; x and y are the abscissa and ordinate of the pixel point in the infrared image;
step 3, after extracting the significant target of the infrared image, calibrating the boundary of the significant target with a connected domain and binarizing it; then selecting the central point position of the significant target and measuring the final distance from the parallax of the central pixels of the significant target in the left and right images according to the triangulation method.
2. A binocular stereo infrared salient object detection system is characterized by comprising two infrared cameras, a variable voltage power supply, a DSP processor, an FPGA and a VGA display;
the infrared camera is used as a binocular camera for collecting two infrared images and outputting the infrared images to the FPGA;
the FPGA receives the infrared image collected by the infrared camera and sends the infrared image to the DSP; meanwhile, receiving an image result processed by the DSP and sending the image result to a VGA display for display;
the DSP processes the binocular infrared image to acquire position information and distance information of a significant target in the infrared image, and transmits the image to the FPGA;
the method for processing the binocular infrared image by the DSP comprises the following steps:
step 1, performing epipolar correction on the infrared image sequences acquired by the binocular camera using the same set of transformation matrices;
step 2, using the covariance matrix of local features as the feature descriptor for significance detection, the significant value being calculated as in formula (1):

$$S(r_k)=\sum_{r_k\neq r_i}\exp\!\left(-\frac{D_S(r_k,r_i)}{\sigma_s^{2}}\right)B(r_k,r_i)\,\rho(C_k,C_i) \qquad (1)$$

wherein S(r_k) is the significant value of pixel point k; D_S(r_k, r_i) is the spatial distance weight between different regions r_k and r_i in the infrared image; σ_s is the spatial weight adjusting coefficient; B(r_k, r_i) is the brightness relationship of region r_k and region r_i, defined in terms of the region luminance sums, where Sum(r_k) is the sum of the pixel luminance values of region r_k and Sum(r_i) is the sum of the pixel luminance values of region r_i; ρ(C_k, C_i) is the similarity between the feature covariance matrices C_k and C_i, measured as

$$\rho(C_l,C_i)=\sqrt{\sum_{m=1}^{d}\ln^{2}\lambda_m(C_l,C_i)}$$

wherein λ_m is the generalized eigenvalue of the feature covariance matrix C_l at pixel point l and the feature covariance matrix C_i at pixel point i, calculated as in formula (2):

$$\lambda_m C_l x_m - C_i x_m = 0,\quad m = 1, 2, \dots, d \qquad (2)$$

wherein x_m is a generalized eigenvector and d is the number of features of the feature vector; the feature covariance matrix C_i at pixel point i is calculated as in formula (3):

$$C_i=\frac{1}{n-1}\sum_{k=1}^{n}(h_k-u_i)(h_k-u_i)^{T} \qquad (3)$$

wherein h_k is the feature vector at pixel point k within the window centered on pixel point i; u_i is the mean feature vector; n is the total number of pixels in the selection window; the feature vector h_k at each pixel point is given by formula (4):

$$h_k=[I(x,y),\ I_{ve}(x,y),\ I_{le}(x,y),\ K(x,y),\ x,\ y] \qquad (4)$$

wherein I(x, y) is the pixel gray value of the image; I_{ve}(x, y) and I_{le}(x, y) are the vertical and horizontal gradient values of the image; K(x, y) is the LARK kernel of the infrared image; x and y are the abscissa and ordinate of the pixel point in the infrared image;
step 3, after extracting the significant target of the infrared image, calibrating the boundary of the significant target with a connected domain and binarizing it; then selecting the central point position of the significant target and measuring the final distance from the parallax of the central pixels of the significant target in the left and right images according to the triangulation method.
CN201611136900.4A 2016-12-12 2016-12-12 Binocular three-dimensional infrared salient target detection method and system Active CN108460794B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611136900.4A CN108460794B (en) 2016-12-12 2016-12-12 Binocular three-dimensional infrared salient target detection method and system


Publications (2)

Publication Number Publication Date
CN108460794A CN108460794A (en) 2018-08-28
CN108460794B true CN108460794B (en) 2021-12-28

Family

ID=63228813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611136900.4A Active CN108460794B (en) 2016-12-12 2016-12-12 Binocular three-dimensional infrared salient target detection method and system

Country Status (1)

Country Link
CN (1) CN108460794B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084822A (en) * 2019-05-05 2019-08-02 中国人民解放军战略支援部队航天工程大学 A kind of target acquisition real time processing system and method towards the in-orbit application of satellite
CN110824317A (en) * 2019-12-06 2020-02-21 国网天津市电力公司 Transformer partial discharge source rapid positioning system based on thermal imaging technology
CN111951299B (en) * 2020-07-01 2022-09-16 中国科学院上海技术物理研究所 Infrared aerial target detection method
CN113822352B (en) * 2021-09-15 2024-05-17 中北大学 Infrared dim target detection method based on multi-feature fusion
CN115170792B (en) * 2022-09-07 2023-01-10 烟台艾睿光电科技有限公司 Infrared image processing method, device and equipment and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
KR101121712B1 (en) * 2010-03-31 2012-03-09 경북대학교 산학협력단 Providing device of eye scan path


Non-Patent Citations (2)

Title
An infrared salient object stereo matching algorithm based on epipolar rectification; Yi Zhang et al.; Springer; 2015-12-19; pp. 53-60 *
Low-illumination binocular stereo salient target distance measurement method and implementation (低照度双目立体显著目标距离测定方法与实现); Wan Yilong et al.; Infrared and Laser Engineering (红外与激光工程); 2015-03-31; Vol. 44, No. 3; pp. 1053-1060 *

Also Published As

Publication number Publication date
CN108460794A (en) 2018-08-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant