CN110992304A - Two-dimensional image depth measuring method and application thereof in vehicle safety monitoring

Info

Publication number: CN110992304A
Application number: CN201911044348.XA
Authority: CN (China)
Prior art keywords: image, two-dimensional image, block, blocks, frequency domain
Priority and filing date: 2019-10-30
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN110992304B (en)
Inventors: 郭宇翔, 郭中阳
Current and original assignee: Zhejiang Libang Hexin Automotive Brake System Co., Ltd.
Publication of CN110992304A: 2020-04-10
Publication of CN110992304B (grant): 2023-07-07

Classifications

    • G06T 7/0002: Image analysis; inspection of images, e.g. flaw detection
    • G06F 18/2135: Pattern recognition; feature extraction based on approximation criteria, e.g. principal component analysis
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/11: Image analysis; region-based segmentation
    • G06T 7/44: Analysis of texture based on statistical description of texture, using image operators, e.g. filters, edge density metrics or local histograms
    • G06T 2207/20021: Indexing scheme for image analysis; dividing image into blocks, subimages or windows
    • Y02T 10/40: Climate change mitigation in road transport; engine management systems


Abstract

The application discloses a depth measurement method for two-dimensional images and its application in vehicle safety monitoring. The monocular camera acquiring the image is focused at a distance, so image regions near the focus show sharp texture while regions near the camera are blurred; depth information is then obtained by comparing the correlation between image texture and blur. Applied to vehicle safety monitoring, this depth information lets the vehicle estimate its distance to other road users, supporting blind spot detection, automatic emergency braking, adaptive cruise control, and similar functions. The measurement method recovers three-dimensional depth information from a single planar two-dimensional frame with a small computational load over the whole depth measurement process; it imposes no requirements on the pixel color, brightness, static background, or application scene of the acquired image, so it is broadly applicable and performs well in real time; and because it does not depend on accumulated subjective user experience, it is highly reliable.

Description

Two-dimensional image depth measuring method and application thereof in vehicle safety monitoring
Technical Field
The invention relates to a two-dimensional image depth measurement method and its application in vehicle safety monitoring, and belongs to the field of vehicle safety monitoring.
Background
The growing adoption of advanced driver assistance system (ADAS) technology has effectively improved automotive safety. Within the ADAS function set, camera applications attract more attention than radar and ultrasonic sensor applications. High-cost stereoscopic binocular cameras are not preferred, because mass-market automotive products impose extremely demanding cost-control requirements.
The monocular camera is an obvious fit for vehicles because it is already ubiquitous in daily life, automobiles included. An automobile is a moving platform, and locating its position requires environment-sensing technology. The image acquired by a monocular camera is a flat, two-dimensional one, so a two-dimensional image algorithm is needed to compute depth information, i.e. the distances between the vehicle and the static and dynamic objects around it.
At present, the following methods exist for extracting image depth information from a monocular camera:
(1) image segmentation with foreground/background comparison: this requires color and spatial position information for every pixel; the calibration of classes and distance information is computation-heavy and demands high pixel quality, raising problems of computing capacity and cost;
(2) multiple spatial scales: several dimensions are defined so the image contains the necessary information, a multi-dimensional space diagram is built with its coordinate axes, the overall spatial structure is interpreted from subjective experience, and analysis and classification in a low-dimensional space yield distance information. Because the method is premised on close ties to subjective experience, the reliability of its results is questionable;
(3) semantic segmentation and labeling: every pixel in the image is assigned to a category, achieving pixel-level classification; labels then distinguish scenes and objects so that distance can be estimated. Distinguishing scene categories carries a large computational load, and the results are coarse, losing detail;
(4) fusion of motion and geometry: each frame of a two-dimensional video is segmented into a static background and a dynamic foreground, and a geometric depth map of the static background is generated from geometric information to compute distance. In a real driving scene the supposedly static background keeps changing, so recognition drifts and real-time response is delayed;
(5) line-segment feature extraction: a structured environment of point features must be built before line-segment features can be established, but points depend strongly on the environment and the method performs poorly in scenes with missing texture. Moreover, such applications often need point and line features extracted from the images of a binocular camera.
In view of the above, the present inventors studied the problem and developed the two-dimensional image depth measurement method described herein, together with its application in vehicle safety monitoring.
Disclosure of Invention
One objective of the present invention is to provide a two-dimensional image depth measurement method that obtains depth information from a correlation comparison between texture and blur, and from it an estimate of distance values.
In order to achieve the above object, the solution of the present invention is:
a two-dimensional image depth measurement method comprises the following steps:
1) uniformly dividing the two-dimensional image into N blocks, setting one block as a reference position block and setting the rest N-1 blocks as to-be-detected blocks;
2) carrying out coarse and fine analysis on N blocks of the two-dimensional image so as to enhance the saliency of image texture features and highlight blurred edge sharpness information;
3) performing principal component analysis on the extracted image texture features to reduce the image data dimension and obtain the edge line feature quantity of the pixel set;
4) and analyzing and processing the edge line characteristic quantity of the pixel set through a spatial frequency domain to obtain the texture density of an image spatial frequency domain, and calculating the distance information of each block to be detected relative to the reference position block according to the texture density.
Preferably, the coarse-to-fine analysis uses down-sampling at a set scaling to obtain 3-6 images including the original, after which texture features are extracted from each image. Its purpose is to strengthen, across pixel maps of different resolution, the texture features and blurred-edge sharpness information to be extracted, and to improve the confidence of feature extraction.
Preferably, the spatial-frequency-domain analysis uses the discrete cosine transform (DCT). The DCT involves only real-valued computation, so there is no information loss relative to the principal component analysis results.
The two-dimensional image measured by the method of the invention has the following characteristics: the monocular camera shooting it is focused far away, so texture is sharp near the focus and the image close to the camera is blurred. Depth information follows from a correlation comparison between texture and blur, and quantitative calibration of the far/near depth information yields an estimate of the distance value.
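As a concrete illustration of this quantitative calibration, the Python sketch below fits a monotone mapping from texture density to metric distance. The sample density/distance pairs and the log-linear model are illustrative assumptions, not values from the patent.

```python
import numpy as np

# (texture_density, distance_m) pairs from an assumed calibration session,
# e.g. targets placed at known distances from the far-focused camera
densities = np.array([0.15, 0.30, 0.45, 0.60, 0.75])
distances_m = np.array([2.0, 5.0, 10.0, 20.0, 40.0])

# Fit log(distance) as a linear (monotone) function of texture density
a, b = np.polyfit(densities, np.log(distances_m), 1)

def estimate_distance(density: float) -> float:
    """Map a block's spatial-frequency texture density to a distance value."""
    return float(np.exp(a * density + b))

print(estimate_distance(0.5))  # e.g. a block of intermediate texture density
```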
With this two-dimensional image depth measurement method, three-dimensional depth information is obtained from a single planar two-dimensional frame; the computational load of the whole depth measurement process is small; no requirements are imposed on the pixel color, brightness, static background, or application scene of the acquired image, so applicability is high and real-time performance good; the method does not depend on accumulated subjective user experience and is therefore highly reliable; in addition, block division speeds up the feature extraction computation.
Another objective of the invention is to provide a vehicle safety monitoring method that acquires images with a low-cost monocular camera and derives three-dimensional depth information from single-frame planar two-dimensional image processing, so that the positions of static or moving objects around the vehicle can be inferred and safe distances sensed.
In order to achieve the above object, the solution of the present invention is:
a vehicle safety monitoring method specifically comprises the following steps:
1) the method comprises the steps that a monocular camera arranged on a vehicle is used for shooting the vehicle environment in real time, a shot two-dimensional image is uniformly divided into N blocks, one block is set as a reference position block, and the rest N-1 blocks are blocks to be detected;
2) carrying out coarse and fine analysis on N blocks of the two-dimensional image so as to enhance the saliency of image texture features and highlight blurred edge sharpness information;
3) performing principal component analysis on the extracted image texture features to reduce the image data dimension and obtain the edge line feature quantity of the pixel set;
4) performing spatial frequency domain analysis processing on the edge line characteristic quantity of the pixel set to obtain spatial frequency domain result data of each block;
5) the space frequency domain result data of each block is connected in series and parallel to form an input interface of an iterative nerve;
6) and inputting the space frequency domain result data based on a deep learning neural network model, and performing interactive iterative optimization calculation on the convolutional layer and the pooling layer in the deep learning iterative neural network to obtain the distance information between the static or moving object and the self reference point in the vehicle monitoring range.
Preferably, the coarse-to-fine analysis uses down-sampling at a set scaling to obtain 3-6 images including the original, after which texture features are extracted from each image. Its purpose is to strengthen, across pixel maps of different resolution, the texture features and blurred-edge sharpness information to be extracted, and to improve the confidence of feature extraction.
Preferably, the spatial-frequency-domain analysis uses the discrete cosine transform (DCT). The DCT involves only real-valued computation, so there is no information loss relative to the principal component analysis results.
In this vehicle safety monitoring method, the monocular camera is focused far away, so texture is sharp near the focus and the image close to the camera is blurred. Depth information follows from a correlation comparison between texture and blur, and quantitative calibration of the far/near depth information yields distance estimates. The distance between the ego vehicle and other road users (automobiles, electric bicycles, pedestrians, and the like) can thus be estimated, enabling Blind Spot Detection (BSD); if fused with radar sensor data, other ADAS functions such as Automatic Emergency Braking (AEB) and Adaptive Cruise Control (ACC) can be implemented reliably.
The invention is described in further detail below with reference to the figures and specific embodiments.
Drawings
FIG. 1 is a flowchart of the two-dimensional image depth measurement method according to the present embodiment;
FIG. 2 is a two-dimensional image captured by the monocular camera according to the present embodiment;
FIG. 3 is a schematic view of the coarse-to-fine analysis layer structure of block A according to the present embodiment;
FIG. 4(a) is a schematic diagram of the eigenvalues and eigenvectors extracted by principal component analysis of texture features in this embodiment;
FIG. 4(b) is a schematic diagram of the image edge lines extracted by principal component analysis of texture features in this embodiment;
FIG. 5 shows the principal component analysis representation of an image and the corresponding texture density in the spatial frequency domain according to the present embodiment;
FIG. 6 is a distribution diagram of the driver's direct-view blind zones according to the present embodiment;
FIG. 7 is a distribution diagram of the driver's CMS field of view according to the present embodiment;
FIG. 8 is a flowchart of the vehicle safety monitoring method according to the present embodiment;
FIG. 9 is a schematic diagram of the data connection layer and the deep learning neural network according to the present embodiment.
Detailed Description
The main working principle of the two-dimensional image depth measurement method disclosed in this embodiment is as follows: the monocular camera collecting the two-dimensional image is set to focus far away, so the acquired image has sharp texture near the focus and blur near the camera itself. Far/near depth information is obtained from a correlation comparison between texture and blur, and quantitative calibration of that information yields an estimate of the distance value. The specific measurement method, shown in fig. 1, comprises the following steps:
s101, uniformly dividing the two-dimensional image into N blocks, setting one block as a reference position block, and setting the rest N-1 blocks as to-be-detected blocks.
The method specifically comprises the following steps: firstly, a two-dimensional image shot by a monocular camera is uniformly divided into N blocks, in this embodiment, 8 blocks are divided into 8 × 8 blocks, one block (e.g., block B shown in FIG. 2) is set as a reference position block, and the other 63 blocks are to-be-detected blocks; by performing the relatively near-far information processing on the other 63 blocks with reference to the base position block B, the distance information of the 63 blocks with respect to the base position block B block can be obtained. The feature extraction calculation speed can be accelerated by dividing the two-dimensional image into a plurality of blocks (generally, m-th power of 2).
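A minimal Python sketch of this block division follows; the 480 × 640 frame size, the random stand-in frame, and the grid cell chosen as reference block B are illustrative assumptions.

```python
import numpy as np

def split_into_blocks(image: np.ndarray, grid: int = 8) -> np.ndarray:
    """Uniformly divide a 2-D image into grid x grid blocks.

    The image is cropped so each side is divisible by the grid size;
    the result has shape (grid, grid, block_h, block_w).
    """
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid
    cropped = image[:bh * grid, :bw * grid]
    return cropped.reshape(grid, bh, grid, bw).swapaxes(1, 2)

frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in frame
blocks = split_into_blocks(frame)       # 64 blocks of 60 x 80 pixels each
reference_block = blocks[7, 0]          # assumed position of reference block B
blocks_to_measure = [blocks[i, j] for i in range(8) for j in range(8)
                     if (i, j) != (7, 0)]   # the remaining 63 blocks
```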
S102, performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the salience of image texture features and highlight blurred-edge sharpness information.
The coarse-to-fine analysis is as follows: each block is down-sampled. As shown in fig. 3, with an image pyramid scaling factor of 2, the original block (layer 0) is repeatedly scaled by 1/2 to obtain 4 images (including the original), and texture features are then extracted from each. This strengthens, at different pixel resolutions, the texture features and blurred-edge sharpness information to be extracted, and improves the confidence of feature extraction. The number of down-sampling layers per block depends on the application scenario and is typically 3-6.
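A sketch of the pyramid construction, assuming OpenCV. The 4-layer depth follows fig. 3; using the Canny detector (with illustrative thresholds) as a stand-in texture/edge extractor is an assumption, since the patent does not name a specific extractor.

```python
import cv2
import numpy as np

block = np.random.randint(0, 256, (60, 80), dtype=np.uint8)  # stand-in block

def coarse_to_fine_pyramid(block: np.ndarray, layers: int = 4):
    """Layer 0 is the original block; each further layer is 1/2-scale."""
    pyramid = [block]
    for _ in range(layers - 1):
        pyramid.append(cv2.pyrDown(pyramid[-1]))  # Gaussian blur + 2x decimation
    return pyramid

def extract_texture_edges(pyramid):
    """Edge/texture maps per layer (Canny as an assumed stand-in extractor)."""
    return [cv2.Canny(layer, 50, 150) for layer in pyramid]

pyramid = coarse_to_fine_pyramid(block)   # 4 images incl. the original
edge_maps = extract_texture_edges(pyramid)
```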
S103, performing principal component analysis on the extracted image texture features to reduce the image data dimension and obtain the edge-line feature quantities of the pixel set.
Specifically: after the coarse-to-fine analysis of a block, principal component analysis (PCA) is applied to the extracted image texture features. Fig. 4(a) shows the distribution of the (blue, rendered gray) pixels in block A; principal component analysis yields the feature vectors, the long-arrow segment being orthogonal to the short-arrow segment. The long-arrow vector is the principal component; taken as the primary element of the pixel lattice, it is also called an edge line. Many edge lines are produced this way, shown as straight segments in fig. 4(b). The method estimates distance on the principle that, with the camera focused far away, texture is sharp and dense at a distance and blurred and sparse nearby; after the principal component analysis, the edge-line density of the texture is computed in the spatial frequency domain, and depth distance information is calibrated and estimated on that basis.
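A sketch of this step, under the assumption that the "pixel set" is the set of coordinates of texture/edge pixels within one block: the eigenvectors of their 2 × 2 covariance matrix give the orthogonal long-arrow and short-arrow components of fig. 4(a), the former read as the edge-line direction.

```python
import numpy as np

def edge_line_pca(edge_map: np.ndarray):
    """PCA over the coordinates of nonzero (edge) pixels in a block.

    Returns (principal_axis, minor_axis, eigenvalues); the principal axis
    corresponds to the long-arrow vector, i.e. the edge line.
    """
    ys, xs = np.nonzero(edge_map)
    if xs.size < 2:
        return None                         # block has too little texture
    pts = np.column_stack((xs, ys)).astype(float)
    pts -= pts.mean(axis=0)                 # center the pixel set
    cov = np.cov(pts, rowvar=False)         # 2 x 2 covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    principal, minor = evecs[:, 1], evecs[:, 0]   # orthogonal by construction
    return principal, minor, evals[::-1]

edge_map = np.zeros((60, 80), dtype=np.uint8)   # stand-in edge map
edge_map[20:40, 10:70:2] = 255                  # synthetic striped texture
axes = edge_line_pca(edge_map)
```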
S104, processing the edge-line feature quantities of the pixel set by spatial-frequency-domain analysis to obtain the texture density of the image in the spatial frequency domain, and computing from that density the distance of each block to be measured relative to the reference position block.
The edge-line feature quantities of the pixel set are processed with a spatial-frequency-domain analysis such as the discrete cosine transform (DCT), giving the texture density of the image in the spatial frequency domain, from which the distance of each block to be measured relative to the reference position block is computed. Because the monocular camera focuses sharply at a distance, a far image region has a higher spatial-frequency texture density and a near region a lower one; comparison therefore estimates the distance of each block to be measured relative to the reference block. In fig. 5, the left side is the principal component analysis representation of the image and the right side the texture density representation in the spatial frequency domain. The DCT involves only real-valued computation, so there is no information loss relative to the principal component analysis results.
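A sketch of the spatial-frequency-domain step using SciPy's 2-D DCT. Reading "texture density" as the fraction of DCT energy outside the low-frequency corner is an assumed concrete measure; the patent does not specify the density metric.

```python
import numpy as np
from scipy.fft import dctn

def texture_density(edge_map: np.ndarray, cutoff: int = 4) -> float:
    """High-frequency energy fraction of the block's 2-D DCT (real-valued)."""
    coeffs = dctn(edge_map.astype(float), norm="ortho")
    energy = coeffs ** 2
    low = energy[:cutoff, :cutoff].sum()         # low-frequency corner
    return 1.0 - low / (energy.sum() + 1e-12)    # dense texture -> higher value

def relative_depth(block_edges: np.ndarray, ref_edges: np.ndarray) -> float:
    """> 0: block appears farther than reference block B; < 0: nearer
    (valid under the far-focus assumption: far = sharp, dense texture)."""
    return texture_density(block_edges) - texture_density(ref_edges)
```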
With this two-dimensional image depth measurement method, three-dimensional depth information is obtained from a single planar two-dimensional frame; the computational load of the whole process is small; no requirements are imposed on the pixel color, brightness, static background, or application scene of the acquired image, so applicability is high and real-time performance good; the method does not depend on accumulated subjective user experience and is therefore highly reliable; and block division further speeds up the feature extraction computation.
The method can be applied to vehicle safety monitoring: several monocular cameras are mounted around the vehicle, depth measurement is performed on the two-dimensional images they collect, and the distances between the vehicle and other moving or static objects around it are obtained, supporting blind spot detection, lane change assistance, overtaking assistance, and similar functions. If fused with vehicle radar sensor data, other ADAS functions such as Automatic Emergency Braking (AEB) and Adaptive Cruise Control (ACC) can be implemented reliably.
The following takes vehicle blind-zone obstacle detection as a detailed example.
Because of the vehicle's A, B and C pillars, blind zones ➀➁➃➅ may exist in the driver's rear direct-view area, as shown in fig. 6. The indirect field of view through the side rearview mirror mostly covers regions ➂➄. With an electronic rearview mirror camera monitoring system (CMS), however, the range the camera can cover is as shown in fig. 7.
The vehicle safety monitoring method disclosed in the embodiment, as shown in fig. 8, specifically includes the following steps:
s201, shooting in real time and dividing blocks.
Firstly, real-time shooting is carried out through a monocular camera arranged on a vehicle rearview mirror, a shot frame of two-dimensional image is uniformly divided into N blocks, the block is divided into 8 × 8 blocks in the embodiment, one block (such as a block B shown in fig. 7) is set as a reference position block, the rest 63 blocks are to-be-detected blocks, and a block A in fig. 8 is one to-be-detected block; the relative distance information processing is performed on the other 63 blocks with reference to the reference position block B, and the distance information of the 63 blocks with respect to the reference position block B can be obtained. Dividing a plurality of blocks (typically 2 to the power of m, depending on the pixel size of the image) can speed up the feature extraction computation. The reference position block is determined according to an application scene, for example, if the blind area of the rear area of the side surface of the automobile is detected as an application purpose, if the monocular camera arranged on the door mirror is in the mirror taking range, the area near the wheel close to the tail of the automobile can be taken as the reference position block as shown in block B of fig. 8.
S202, coarse-to-fine analysis.
After the two-dimensional image is divided into blocks, the 64 blocks undergo coarse-to-fine analysis to enhance the salience of image texture features and highlight blurred-edge sharpness information. Taking block A as an example: as shown in fig. 3, block A is down-sampled; with an image pyramid scaling factor of 2, the original block A (layer 0) is repeatedly scaled by 1/2 to obtain 4 images (including the original), and texture features are then extracted from each. The purpose is to strengthen, at different pixel resolutions, the texture features and blurred-edge sharpness information to be extracted, and to improve the confidence of feature extraction. Fig. 3 shows the coarse-to-fine pyramid layer structure of block A. The number of down-sampling layers per block depends on the application scenario; for automotive scenes 3 or 4 layers suffice to extract the required texture features while avoiding long analysis times and high computational cost.
S203, principal component analysis.
After the coarse-to-fine analysis, principal component analysis (PCA) is applied to the extracted image texture features to reduce the image data dimension and obtain the edge-line feature quantities of the pixel set.
Fig. 4(a) shows the distribution of the (blue, rendered gray) pixels in block A; principal component analysis yields the feature vectors, the long-arrow segment orthogonal to the short-arrow segment. The long-arrow vector is the principal component; taken as the primary element of the pixel lattice, it is also called an edge line. Many edge lines are produced this way, shown as straight segments in fig. 4(b). The method estimates distance on the principle that, with the camera focused far away, texture is sharp and dense at a distance and blurred and sparse nearby; after the principal component analysis, the edge-line density of the texture is computed in the spatial frequency domain, and depth distance information is calibrated and estimated on that basis.
S204, spatial-frequency-domain analysis.
Spatial-frequency-domain processing of the edge-line feature quantities obtained by principal component analysis yields each block's spatial-frequency-domain result data, i.e. the texture density of the image in the spatial frequency domain. Fig. 5 shows the principal component analysis of the image on the left and the texture density in the spatial frequency domain on the right. Since the camera's far focus gives a sharp far image with higher spatial-frequency texture density, while the texture density of the near image is lower, the distance of an obstacle in the vehicle's blind zone can be inferred by comparison, as the distance schematic on the right of fig. 5 illustrates.
The spatial-frequency-domain analysis uses the discrete cosine transform (DCT), which involves only real-valued computation, so there is no information loss relative to the principal component analysis results.
S205, data connection.
To enter the iterative deep learning neural network, after each block has been processed by the preceding steps, the blocks' spatial-frequency-domain result data must be connected in series-parallel as the input interface of the iterative neural network, as shown at ④ in fig. 9.
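A sketch of this data connection, assuming each of the 64 blocks contributes a fixed-length vector of low-frequency DCT coefficients stacked into a single network input; retaining an 8 × 8 corner of coefficients per block is an assumption about the vector length.

```python
import numpy as np
from scipy.fft import dctn

def block_dct_vector(edge_map: np.ndarray, k: int = 8) -> np.ndarray:
    """Flatten the k x k lowest-frequency DCT coefficients of one block."""
    coeffs = dctn(edge_map.astype(float), norm="ortho")
    return coeffs[:k, :k].ravel()               # length k*k = 64

def connect_blocks(edge_maps) -> np.ndarray:
    """Stack the per-block vectors: shape (64 blocks, 64 features)."""
    return np.stack([block_dct_vector(e) for e in edge_maps])

stand_in = [np.random.rand(60, 80) for _ in range(64)]  # stand-in edge maps
network_input = connect_blocks(stand_in)                # (64, 64) input array
```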
S206, deep learning neural network computation.
The spatial-frequency-domain result data are fed into a deep learning neural network model, and interactive iterative optimization between the convolutional and pooling layers of the deep learning iterative neural network yields the distance between static or moving objects in the vehicle's monitoring range and the vehicle's own reference point. The model is obtained by offline training on big data and combines convolutional and pooling layers; its parameters are shown in fig. 9. During online processing, the machine learning capability embodied by the convolutional neural network makes it fast to obtain the distance between objects in the monitoring range and the reference point.
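A sketch of such a network in PyTorch, taking the (64, 64) connected data of S205 as input and emitting one distance per block relative to the reference point. The layer sizes, the 1-D convolution layout, and the weights (which the patent obtains by offline big-data training) are all assumptions.

```python
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    """Convolutional + pooling layers mapping block spectra to distances."""
    def __init__(self, n_blocks: int = 64, n_features: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, 128, kernel_size=3, padding=1),  # conv layer
            nn.ReLU(),
            nn.MaxPool1d(2),                                       # pooling layer
            nn.Conv1d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Flatten(),
            nn.Linear(64 * (n_blocks // 4), n_blocks),  # one distance per block
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, n_blocks), the connected data from S205
        return self.net(x)

model = DepthNet()          # weights would come from offline training
x = torch.randn(1, 64, 64)  # one frame's connected block features
distances = model(x)        # (1, 64) distances to the reference point
```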
The vehicle safety monitoring method suits driver assistance systems built on the widely deployed vehicle-mounted monocular camera; unlike a radar echo, the image signal is insensitive to the material composition of the detected obstacle.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.

Claims (6)

1. A two-dimensional image depth measurement method, characterized by comprising the following steps:
1) uniformly dividing the two-dimensional image into N blocks, setting one block as a reference position block and the remaining N-1 blocks as blocks to be measured;
2) performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the salience of image texture features and highlight blurred-edge sharpness information;
3) performing principal component analysis on the extracted image texture features to reduce the image data dimension and obtain the edge-line feature quantities of the pixel set;
4) processing the edge-line feature quantities of the pixel set by spatial-frequency-domain analysis to obtain the texture density of the image in the spatial frequency domain, and computing from that density the distance of each block to be measured relative to the reference position block.
2. The two-dimensional image depth measurement method of claim 1, characterized in that: the coarse-to-fine analysis uses down-sampling at a set scaling to obtain 3-6 images including the original, after which texture features are extracted from each image.
3. The two-dimensional image depth measurement method of claim 1, characterized in that: the spatial-frequency-domain analysis uses the discrete cosine transform (DCT).
4. A vehicle safety monitoring method, characterized by comprising the following steps:
1) shooting the vehicle environment in real time with a monocular camera mounted on the vehicle, uniformly dividing a captured two-dimensional image into N blocks, setting one block as a reference position block and the remaining N-1 blocks as blocks to be measured;
2) performing coarse-to-fine analysis on the N blocks of the two-dimensional image to enhance the salience of image texture features and highlight blurred-edge sharpness information;
3) performing principal component analysis on the extracted image texture features to reduce the image data dimension and obtain the edge-line feature quantities of the pixel set;
4) processing the edge-line feature quantities of the pixel set by spatial-frequency-domain analysis to obtain each block's spatial-frequency-domain result data;
5) connecting the blocks' spatial-frequency-domain result data in series-parallel to form the input interface of an iterative neural network;
6) feeding the spatial-frequency-domain result data into a deep learning neural network model, whose convolutional and pooling layers perform interactive iterative optimization to obtain the distance between static or moving objects in the vehicle's monitoring range and the vehicle's own reference point.
5. The vehicle safety monitoring method of claim 4, characterized in that: the coarse-to-fine analysis uses down-sampling at a set scaling to obtain 3-6 images including the original, after which texture features are extracted from each image.
6. The vehicle safety monitoring method of claim 4, characterized in that: the spatial-frequency-domain analysis uses the discrete cosine transform (DCT).
CN201911044348.XA, filed 2019-10-30 (priority date 2019-10-30): Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring. Active; granted as CN110992304B (en)

Priority Applications (1)

CN201911044348.XA (granted as CN110992304B): priority date 2019-10-30, filing date 2019-10-30. Title: Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Applications Claiming Priority (1)

CN201911044348.XA (granted as CN110992304B): priority date 2019-10-30, filing date 2019-10-30. Title: Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Publications (2)

CN110992304A: published 2020-04-10
CN110992304B (en): published 2023-07-07

Family

ID=70082660

Family Applications (1)

CN201911044348.XA (Active; granted as CN110992304B): priority date 2019-10-30, filing date 2019-10-30. Title: Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring

Country Status (1)

CN: CN110992304B (en)

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020134151A1 (en) * 2001-02-05 2002-09-26 Matsushita Electric Industrial Co., Ltd. Apparatus and method for measuring distances
WO2008118305A1 (en) * 2007-03-26 2008-10-02 Trw Automotive U.S. Llc Forward looking sensor system
US20110081042A1 (en) * 2009-10-07 2011-04-07 Samsung Electronics Co., Ltd. Apparatus and method for adjusting depth
US20120002871A1 (en) * 2010-07-01 2012-01-05 Miao Hu Method of Estimating Depths from a Single Image Displayed on Display
CN102324104A (en) * 2011-06-03 2012-01-18 清华大学 Space structure modeling method and system based on single-image defocusing information
US20120320045A1 (en) * 2011-06-20 2012-12-20 Mstar Semiconductor, Inc. Image Processing Method and Apparatus Thereof
JP2013185905A (en) * 2012-03-07 2013-09-19 Sony Corp Information processing apparatus, method, and program
WO2017156905A1 (en) * 2016-03-16 2017-09-21 深圳创维-Rgb电子有限公司 Display method and system for converting two-dimensional image into multi-viewpoint image
WO2018000752A1 (en) * 2016-06-27 2018-01-04 浙江工商大学 Monocular image depth estimation method based on multi-scale cnn and continuous crf
CN108932734A (en) * 2018-05-23 2018-12-04 浙江商汤科技开发有限公司 Depth recovery method and device, the computer equipment of monocular image
CN108759667A (en) * 2018-05-29 2018-11-06 福州大学 Front truck distance measuring method based on monocular vision and image segmentation under vehicle-mounted camera
CN110008848A (en) * 2019-03-13 2019-07-12 华南理工大学 A kind of travelable area recognizing method of the road based on binocular stereo vision
CN110189294A (en) * 2019-04-15 2019-08-30 杭州电子科技大学 RGB-D image significance detection method based on depth Analysis on confidence

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YOUSUN KAN et al.: "Texture Structure Classification and Depth Estimation using Multi-Scale Local Autocorrelation Features"
叶华; 谭冠政: "Manifold learning of depth labels for a single image" (单幅图像的深度标签流形学习), Infrared and Laser Engineering (红外与激光工程), no. 06
胡天宇: "Research on depth map generation methods for two-dimensional images" (二维图像的深度图生成方法研究)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220036731A1 (en) * 2020-12-21 2022-02-03 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for detecting vehicle lane change, roadside device, and cloud control platform

Also Published As

CN110992304B (en): published 2023-07-07

Similar Documents

Publication Publication Date Title
CN108983219B (en) Fusion method and system for image information and radar information of traffic scene
CN110163904B (en) Object labeling method, movement control method, device, equipment and storage medium
CN108242062B (en) Target tracking method, system, terminal and medium based on depth feature flow
Rezaei et al. Robust vehicle detection and distance estimation under challenging lighting conditions
EP1329850B1 (en) Apparatus, program and method for detecting both stationary objects and moving objects in an image
CN112215306B (en) Target detection method based on fusion of monocular vision and millimeter wave radar
CN111209825B (en) Method and device for dynamic target 3D detection
CN114820465B (en) Point cloud detection model training method and device, electronic equipment and storage medium
CN112997187A (en) Two-dimensional object bounding box information estimation based on aerial view point cloud
CN108645375B (en) Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system
CN112215074A (en) Real-time target identification and detection tracking system and method based on unmanned aerial vehicle vision
CN105741234B (en) It is anchored automatically vision-aided system based on the unmanned boat that three-dimensional panorama is looked around
CN111738071B (en) Inverse perspective transformation method based on motion change of monocular camera
Wu et al. Raindrop detection and removal using salient visual features
CN111738033B (en) Vehicle driving information determination method and device based on plane segmentation and vehicle-mounted terminal
EP3293700A1 (en) 3d reconstruction for vehicle
CN114495064A (en) Monocular depth estimation-based vehicle surrounding obstacle early warning method
CN113408324A (en) Target detection method, device and system and advanced driving assistance system
CN114114312A (en) Three-dimensional target detection method based on fusion of multi-focal-length camera and laser radar
CN117058646B (en) Complex road target detection method based on multi-mode fusion aerial view
CN112465735A (en) Pedestrian detection method, device and computer-readable storage medium
CN116978009A (en) Dynamic object filtering method based on 4D millimeter wave radar
CN113281718B (en) 3D multi-target tracking system and method based on laser radar scene flow estimation
CN110992304B (en) Two-dimensional image depth measurement method and application thereof in vehicle safety monitoring
US20240193788A1 (en) Method, device, computer system for detecting pedestrian based on 3d point clouds

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant