CN111295667A - Image stereo matching method and driving assisting device - Google Patents

Image stereo matching method and driving assisting device

Info

Publication number
CN111295667A
Authority
CN
China
Prior art keywords
cost
pixel point
views
plane
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201980005230.8A
Other languages
Chinese (zh)
Other versions
CN111295667B (en)
Inventor
Zhou Xiaolin (周啸林)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhuoyu Technology Co ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd
Publication of CN111295667A
Application granted
Publication of CN111295667B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A method of image stereo matching and a driving assistance device. The method includes: acquiring multiple views, which are two-dimensional views of the same scene; determining, based on a preset spatial plane, the surface cost of each corresponding pixel point in the multiple views under the preset spatial plane; and determining disparity maps of the multiple views in the scene according to the surface cost. The method avoids the poor noise resistance of computing the disparity map from pixel-level similarity costs alone, and can improve both stereo matching accuracy and noise resistance.

Description

Image stereo matching method and driving assisting device
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent files or records.
Technical Field
The present application relates to the field of image processing, and in particular, to a method for image stereo matching and a driving assistance device.
Background
In recent years, the rapid development of vehicle-to-everything (V2X) technology has given rise to the concept of the smart car. A smart car can implement functions such as intelligent driving. It should be understood that, in order to implement such functions, the smart car needs to perceive its surrounding environment. For example, a smart car is equipped with environment detection sensors that provide it with information about the surroundings; in particular, this environment information includes dense depth information of the three-dimensional (3D) scene. Dense depth information of a 3D scene can be used for 3D reconstruction, 3D drivable-area detection, obstacle detection and tracking, 3D lane line detection, and the like.
The mainstream dense 3D depth sensing approaches applied in the field of smart-car environment perception are three-dimensional laser radar (3D LiDAR) technology and stereo vision technology. Stereo vision, owing to its low cost and dense point clouds, has become a research hotspot in this field. In general, the method of obtaining parallax by capturing two images from different angles with two cameras is called binocular stereo vision, and it consists of three main steps: first, the camera system is calibrated to obtain the intrinsic and extrinsic camera parameters; then the matching relationship between pixel points in the two images, namely the disparity, is found; finally, the three-dimensional information of the scene is recovered from the matching relationship and the camera parameters. The core part is the process of finding the matching relationship between pixel points, which is called stereo matching.
Existing stereo matching methods generally target ordinary binocular image processing only; in certain specific scenes they do not achieve good results, and the matching results are noisy. For example, in stereo matching of traffic scenes, the road-surface features are not kept well continuous, which makes subsequent calculations inaccurate. A stereo matching method that achieves better results in such specific scenes is therefore needed.
Disclosure of Invention
The present application provides an image stereo matching method and a driving assistance device, which can improve the accuracy and the noise resistance of image stereo matching.
In a first aspect, a method for stereo matching of images is provided, including:
acquiring a plurality of views, wherein the views are a plurality of two-dimensional views of the same scene;
determining the surface cost of each corresponding pixel point in the multiple views under a preset space plane based on the preset space plane;
determining disparity maps for the multiple views under the scene according to the face cost.
The image stereo matching method provided by the present application no longer determines the disparity maps of the multiple views from pixel-level similarity costs alone; instead, it determines the surface cost of each pixel point under the preset spatial plane and then determines the disparity maps of the multiple views according to that surface cost. By introducing the concept of surface cost, the method can improve both the stereo matching accuracy and the noise resistance.
In a second aspect, there is provided a driving assistance apparatus comprising:
at least one memory for storing computer-executable instructions;
at least one processor, individually or collectively, configured to: accessing the at least one memory and executing the computer-executable instructions to perform operations comprising:
acquiring a plurality of views, wherein the views are a plurality of two-dimensional views of the same scene;
determining the surface cost of each corresponding pixel point in the multiple views under a preset space plane based on the preset space plane;
determining disparity maps for the multiple views under the scene according to the face cost.
In a third aspect, there is provided a computer-readable storage medium having stored thereon instructions which, when run on a computer, cause the computer to perform the method of image stereo matching of the first aspect.
In a fourth aspect, a vehicle is provided that includes the driving assist apparatus of the second aspect.
Drawings
FIG. 1 is a diagram of a system architecture suitable for use with embodiments of the present application.
Fig. 2 is a schematic flow chart of a stereo matching method 100 according to an embodiment of the present application.
Fig. 3(a) - (d) are schematic diagrams of scanning pixel points according to an embodiment of the present disclosure.
Fig. 4 is a schematic flowchart of calculating a face cost according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a driving assistance device 50 according to an embodiment of the present application.
FIG. 6 is a schematic block diagram of a vehicle provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.
First, a scene to which the stereo matching method provided in the embodiments of the present application can be applied is described with reference to FIG. 1. FIG. 1 is a diagram of a system architecture suitable for use with embodiments of the present application. The system includes view acquisition devices (such as camera #1 and camera #2 shown in FIG. 1) and an internal view processing chip (whose internal structure is shown in FIG. 1).
Specifically, FIG. 1 shows binocular stereo vision three-dimensional measurement. Binocular stereo vision is an important form of machine vision: based on the parallax principle, it uses imaging devices to acquire two images of the object to be measured from different positions and obtains the three-dimensional geometric information of the object by calculating the positional deviation between corresponding points in the images.
The principle of head-up binocular stereo imaging can be derived from FIG. 1. For example, the distance between the projection center (OL) of camera #1 and the projection center (OR) of camera #2 is called the baseline distance L. Assume that camera #1 and camera #2 view the same feature point P of an object at the same time, and that the three-dimensional coordinates of the feature point are P(x, y, z). Camera #1 and camera #2 each acquire an image of the feature point P: the image of P acquired by camera #1 has coordinates (xleft, yleft), and the image of P acquired by camera #2 has coordinates (xright, yright). Assuming further that Pleft and Pright lie in the same plane, their ordinates are equal, i.e., yleft = yright = y. From the trigonometric relationship, the following relationships can be obtained:
xleft = f × x / z
xright = f × (x − L) / z
yleft = yright = f × y / z
where f is the focal length of camera #1 and camera #2; since binocular stereo vision typically uses the same type of camera for the left and right eyes, the focal lengths of camera #1 and camera #2 are taken to be equal here.
Binocular stereo vision refers to a visual processing method that obtains three-dimensional information of a scene by fusing two images of the same scene captured by two cameras. Binocular stereo vision matches the pixel points that correspond to the same spatial physical point across the different images; the difference between these corresponding pixel points is called the disparity. The disparity between Pleft and Pright in FIG. 1 is: disparity = xleft − xright. The three-dimensional coordinates (x1, y1, z1) of the feature point P in the coordinate system formed by camera #1 and camera #2 can thus be calculated:
x1 = L × xleft / disparity
y1 = L × yleft / disparity
z1 = L × f / disparity
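As an illustration of the triangulation above (a minimal sketch only, not part of the patent; the function and variable names are ours, and a rectified camera pair with equal focal lengths is assumed), the depth and 3D coordinates can be recovered from the disparity as follows:

def triangulate(x_left, y_left, x_right, f, baseline):
    """Recover the 3D coordinates of a point from a rectified stereo pair.

    x_left, y_left: image coordinates of the point in camera #1 (pixels)
    x_right:        x coordinate of the same point in camera #2 (same row after rectification)
    f:              shared focal length of the two cameras (pixels)
    baseline:       distance L between the two projection centers
    """
    disparity = x_left - x_right            # disparity = xleft - xright
    if disparity <= 0:
        raise ValueError("the point must have positive disparity")
    z = f * baseline / disparity            # depth from similar triangles
    x = baseline * x_left / disparity
    y = baseline * y_left / disparity
    return x, y, z

# Example: f = 700 px, baseline = 0.12 m, disparity = 35 px  ->  z = 2.4 m
print(triangulate(400.0, 260.0, 365.0, 700.0, 0.12))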
Binocular stereo vision is described in detail above in connection with FIG. 1. Specifically, after the binocular stereo images (Pleft and Pright shown in FIG. 1) are acquired by binocular stereo vision, the process of finding the degree of correlation between them is called stereo matching. The stereo matching method provided by the embodiments of the present application can be applied to determining the degree of correlation of the binocular stereo images shown in FIG. 1.
It should be understood that fig. 1 is only an example of a scenario in which the stereo matching method provided in the embodiment of the present application can be applied, and does not set any limit to the scope of the present application. The stereo matching method provided by the embodiment of the application can be applied to the process of determining the degree of correlation of a plurality of stereo images, for example, image matching using more cameras.
To facilitate understanding of the stereo matching method provided in the embodiments of the present application, a brief description will first be made of several basic concepts involved in the embodiments of the present application:
1. Stereo matching.
Stereo matching can be regarded as the process of searching for the degree of correlation between two groups of data; it restores the three-dimensional information of a scene from multiple two-dimensional images obtained of the same scene.
2. Hamming distance.
The Hamming distance is used in error-control coding for data transmission. It denotes the number of corresponding bit positions in which two words of the same length differ; the Hamming distance between two words x and y is written L(x, y). Performing an exclusive-OR (XOR) operation on the two bit strings and counting the number of 1 bits in the result gives the Hamming distance.
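For example (a minimal sketch only, with the bit strings packed into Python integers for brevity), the XOR-and-count computation looks as follows:

def hamming_distance(x: int, y: int) -> int:
    """Hamming distance between two equal-length bit strings packed into
    integers: XOR them and count the 1 bits in the result."""
    return bin(x ^ y).count("1")

# 0b1011 and 0b1110 differ in two bit positions
print(hamming_distance(0b1011, 0b1110))  # 2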
3. Robustness.
Robustness is the key to a system surviving abnormal and dangerous situations. For example, whether a piece of software hangs or crashes under input errors, disk failures, network overload, or deliberate attack is a measure of its robustness. "Robustness" also means that a control system maintains certain other performance characteristics under perturbations of certain (structural, size) parameters. In this application, robustness refers to noise resistance.
4. Color distance.
Also called color difference, a quantity of interest in colorimetry. The color difference can be computed simply as the Euclidean distance in a color space, or with more complex, perceptually uniform formulas.
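For example (a minimal sketch of the simple Euclidean variant mentioned above, not of any particular perceptual formula):

import math

def color_distance(c1, c2):
    """Simple color difference: Euclidean distance between two RGB triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

print(color_distance((120, 80, 40), (100, 90, 40)))  # about 22.36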
The foregoing briefly introduces a scenario to which the stereo matching method provided in the embodiment of the present application can be applied and related basic concepts related to the present application, and the stereo matching method provided in the present application is described in detail below with reference to fig. 2 to 4.
The embodiment of the present application provides a stereo matching method 100. Fig. 2 is a schematic flow chart of the stereo matching method 100 according to an embodiment of the present application. As shown in FIG. 2, the method 100 includes:
s110, acquiring a plurality of views.
The acquired multiple views are multiple two-dimensional views of the same scene. The multiple views may also be referred to as multi-view views based on the binocular view described above in fig. 1. Specifically, the multiple views acquired in the present application are corrected multiple views. However, how to correct the multiple views acquired by the multiple view acquisition devices is not limited in this application. For example, a plurality of view correction methods referred to in the present application include:
the method comprises the steps of simultaneously obtaining multi-channel Red Green Blue (RGB) color images of the same scene through a multi-view camera, calibrating cameras of the multi-view camera respectively, obtaining internal and external parameters of a plurality of cameras corresponding to the multi-view camera, carrying out view correction on a plurality of views according to the obtained internal and external parameters of the plurality of cameras respectively, removing fisheye effects and position error influences of the plurality of cameras, and obtaining a plurality of corrected views.
It should be understood that the multiple view correcting method described above is only an example of the multiple view correcting method for facilitating understanding of the present application, and is not intended to limit the present application at all, and in the embodiment of the present application, how to correct multiple views is not limited, and the multiple views may be corrected by any multiple view correcting method, and details are not described here.
Further, the multiple views acquired by which multiple view acquisition device is used in the present application is not limited, and may be any device capable of acquiring multiple views, for example, the above-mentioned multi-view camera, where the multi-view camera may be a multi-channel video acquisition device combined by multiple cameras of the same specification.
As a possible implementation, the multiple views may be two views, i.e. the dual view described above.
Specifically, the method for stereo matching of images provided by the embodiment of the present application does not determine disparity maps of multiple views based on similarity costs at pixel levels any more, but determines a surface cost of each pixel under the preset spatial plane, and then determines disparity maps of multiple views according to the surface cost. In the present application, the surface cost of each pixel point under a preset spatial plane may be referred to as a global surface cost.
Further, the global surface cost in the embodiment of the present application is obtained by calculating based on the similarity cost of the pixel point, that is, the method flowchart shown in fig. 2 further includes S120, which determines the similarity cost of each pixel point in the multiple views.
Specifically, the present application does not limit the algorithm used to calculate the similarity cost of each pixel point in the multiple views. Any algorithm for calculating the similarity cost of pixel points in an existing stereo matching algorithm may be used.
For completeness of a scheme, a scheme for calculating a similarity cost of each pixel point in a plurality of views is provided in this embodiment of the present application, and it should be understood that the calculation scheme for the similarity cost of the pixel point is provided by way of example only, and does not limit the protection scope of the present application.
The calculation scheme of the similarity cost of the pixel point comprises the following steps: after the corrected multiple views are acquired, feature points in each of the multiple views are determined. Wherein, the feature points in the view are composed of preset key points and feature descriptors.
Determining similarity costs between corresponding feature points in the plurality of views.
For ease of understanding, the following briefly introduces the concept of feature points:
The concept of feature points was proposed in existing image matching algorithms in order to efficiently and accurately match the same object across two images taken from different viewing angles, which is also the first step in many computer vision applications. Although images exist in a computer as grayscale matrices, the same object in two images cannot be found accurately using grayscale alone, because grayscale is affected by illumination and changes when the viewing angle changes. It is therefore necessary to find features that remain unchanged as the camera moves and rotates (as the viewing angle changes), and to use these invariant features to find the same object in images from different viewing angles.
To perform image matching well, a representative region (a local feature map) needs to be selected from the image and feature points determined within it, or feature points can be determined directly, for example corners, edges, and pixel points in certain feature regions of the image. Corners are the easiest to identify, that is, they have the highest distinctiveness, so many computer vision pipelines extract corners as the feature points used for matching. However, simple corners do not fully meet the need: a corner detected by the camera from far away may no longer be a corner up close, and corners may change when the camera rotates. For this reason, computer vision researchers have designed many more stable feature points that do not change with camera movement, rotation, or changes in illumination.
Specifically, a feature point of a view is composed of two parts: a keypoint and a descriptor. The keypoint is the position of the feature point in the image; some feature points also carry orientation and scale information. A descriptor is usually a vector that describes, in a hand-designed way, the information of the pixels around the keypoint. Descriptors are generally designed so that features with similar appearance have similar descriptors; therefore, during matching, two feature points can be considered the same feature point as long as their descriptors are close in the vector space.
A descriptor of a feature is usually a well-designed vector that describes information of a keypoint and its surrounding pixels. The descriptors are typically characterized as follows:
invariance: the finger feature does not change with the rotation of the enlargement and reduction of the image.
Robustness: insensitive to noise, light or other small deformations
Differentiability: each feature descriptor is unique and exclusive, and minimizes the similarity between each other. The feature descriptor is usually a vector, and the distance between two feature descriptors can reflect the similarity between two feature points, i.e. the two feature points are not the same. Depending on the descriptor, different distance metrics may be selected. If the descriptor is a floating point type descriptor, the Euclidean distance of the descriptor can be used; for binary descriptors, their hamming distance can be used.
Specifically, the algorithm for determining the corresponding feature points in the multiple views may be the census algorithm, and the similarity cost of the pixel points may be the Hamming distance between the corresponding feature points. It should be understood that the census algorithm, and the use of the Hamming distance between corresponding feature points as the similarity cost, are only examples of how the similarity cost of each pixel point in the multiple views can be obtained in the present application; the matching algorithm of the present application is not limited to obtaining the corresponding feature points with the census algorithm and using the Hamming distance between them as the similarity cost. The present application does not limit the process of calculating the similarity cost of each pixel point in the multiple views; any method of calculating the similarity cost of pixel points in an existing stereo matching algorithm may be adopted, and the possibilities are not enumerated here.
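As an illustration only of the census-plus-Hamming option mentioned above (a minimal sketch, not the patent's implementation; the window size, the wrap-around border handling, and the function names are our assumptions), a per-pixel similarity cost could be computed roughly as follows:

import numpy as np

def census_transform(gray, radius=2):
    """5x5 census transform: each pixel becomes a bit string recording
    whether each neighbour is darker than the centre pixel."""
    h, w = gray.shape
    codes = np.zeros((h, w), dtype=np.uint32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            # neighbours wrap around the image border here; a real
            # implementation would handle the border explicitly
            shifted = np.roll(np.roll(gray, dy, axis=0), dx, axis=1)
            codes = (codes << 1) | (shifted < gray).astype(np.uint32)
    return codes

def pixel_cost(census_left, census_right, d):
    """Similarity cost of left pixel (y, x) against right pixel (y, x - d):
    the Hamming distance between their census codes."""
    shifted = np.roll(census_right, d, axis=1)
    xor = census_left ^ shifted
    bits = np.unpackbits(xor.view(np.uint8).reshape(*xor.shape, 4), axis=-1)
    return bits.sum(axis=-1)   # per-pixel popcount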
After the similarity cost of each pixel point in the multiple views has been calculated, the surface cost of each pixel point can be determined from that similarity cost based on a preset spatial plane. That is, S130 is executed to determine the surface cost of the pixel points under the preset spatial plane.
For any one pixel point in each corresponding pixel point in a plurality of views, a preset space plane is determined, and a plane parameter of the preset space plane is calculated. Specifically, based on the preset spatial plane, the surface cost of each pixel point under the preset spatial plane is determined according to the similarity cost of the pixel points calculated in S120. In some embodiments, the predetermined spatial plane is at least two planes, the at least two planes including a parallel plane and a perpendicular plane. Wherein, the parallel plane refers to a plane parallel to the road surface on which the vehicle runs, and the vertical plane refers to a plane which is perpendicular to the road surface on which the vehicle runs and the normal direction of which is consistent with or opposite to the extending direction of the road surface; in some general cases, the parallel plane may be a horizontal plane, and the vertical plane may be a vertical plane. In other embodiments, the predetermined spatial plane may further include a second vertical plane in addition to the parallel plane and the vertical plane, the second vertical plane being a plane perpendicular to a road surface on which the vehicle travels and a normal direction of which is perpendicular to a road surface extending direction. The preset space plane comprises a plurality of pixel points, and the parallax values of the pixel points are different.
The specific process of calculating the surface cost of the pixel point under the preset space plane based on the plane parameter of the preset space plane and the similarity cost of the pixel point comprises the following steps:
Assume that the plane equation of the preset spatial plane is d = a × x + b × y + c, where (x, y) are the coordinates of a pixel point and d is its disparity. The disparity of a pixel point p = (x, y) on the plane can then be obtained as:
disp(p) = a × x + b × y + c
Since a single pixel has very poor robustness and is easily affected by noise, the embodiments of the present application define the surface cost of each pixel point under the preset spatial plane. Specifically, the surface cost of a pixel point under the preset spatial plane is the result of a weighted average of the similarity costs of the other pixel points on the preset spatial plane, calculated using the plane parameters (a and b above). Further, the weight given to each of those other pixel points is related to its color distance and coordinate distance from the pixel point, expressed by the following formula:
w(p, q) = exp(−Δc(p, q)/σr − Δs(p, q)/σs), where Δc(p, q) and Δs(p, q) denote the color distance and the coordinate distance between pixel point q and pixel point p
In the above formula, p is the pixel point, q is any one of the other pixel points on the preset spatial plane except p, and σr and σs are parameters that control the influence of the color distance and the coordinate distance between pixel point q and pixel point p.
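As an illustration only (a minimal sketch of a weight that decays exponentially with the color distance and coordinate distance, as described above; the exact functional form used by the patent is given in its equation and may differ, and the function names and example values here are our assumptions):

import math

def support_weight(color_p, color_q, coord_p, coord_q, sigma_r, sigma_s):
    """Weight of pixel q when aggregating the cost of pixel p: decays with
    the color distance and the coordinate distance between q and p."""
    dc = math.dist(color_p, color_q)   # color distance (Euclidean here)
    ds = math.dist(coord_p, coord_q)   # coordinate distance
    return math.exp(-(dc / sigma_r + ds / sigma_s))

# A nearby pixel with a similar color receives a relatively high weight
print(support_weight((120, 80, 40), (118, 82, 41), (10, 10), (11, 10), 20.0, 8.0))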
How to determine the surface cost of each pixel point under a preset spatial plane based on the preset spatial plane according to the similarity cost in the embodiment of the present application is described in detail below with reference to fig. 3 and 4. First, how to perform pixel scanning in the embodiment of the present application is described with reference to fig. 3, where fig. 3 is a schematic diagram of scanning pixels provided in the embodiment of the present application.
In order to increase the calculation rate, in the embodiment of the present application, a method of separately calculating rows and columns of an image may be used to calculate the surface cost of a pixel point under the preset spatial plane. The specific process comprises the following steps:
Step one: as shown in FIG. 3(a), for a pixel point p, scan from left to right in the preset spatial plane, and denote the surface cost of the pixel point p under the preset spatial plane as Plane_cost(p, a, b), where a and b are the plane parameters of the preset spatial plane; the similarity cost of the pixel point p was calculated in S120 above and is not repeated here. Specifically, for the pixel point p, scanning from left to right in the preset spatial plane includes:
denoting the pixel point on the left side of the pixel point p in the preset spatial plane as p_left; since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_left) of the pixel point p_left on the left side of the pixel point p and the disparity disp(p) of the pixel point p can be calculated as:
disp(p) = disp(p_left) + a; without loss of generality, this application assumes a > 0;
specifically, the matching cost of the pixel point on the left side of the pixel point in the preset spatial plane may be calculated according to the weight of the pixel point on the left side multiplied by the sum of the similarity cost of the pixel point on the left side and the similarity cost of the pixel point:
the matching cost result over all pixel points on the left side of the pixel point p in the preset spatial plane is Plane_cost_x_left_right(p, a) = w(p, p_left) × Plane_cost_x_left_right(p_left) + Pixel_cost(p, a), which can be obtained by dynamic programming. Plane_cost_x_left_right(p, a) may be called the first matching cost.
Step two: as shown in FIG. 3(b), for the pixel point p, scan from right to left in the preset spatial plane, and denote the surface cost of the pixel point p under the preset spatial plane as Plane_cost(p, a, b), where a and b are the plane parameters of the preset spatial plane, and denote the similarity cost of the pixel point p as Pixel_cost(p), calculated in S120. Specifically, for the pixel point p, scanning from right to left in the preset spatial plane includes:
denoting the pixel point on the right side of the pixel point p in the preset spatial plane as p_right; since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_right) of the pixel point p_right on the right side of the pixel point p and the disparity disp(p) of the pixel point p can be calculated as:
disp(p) = disp(p_right) − a;
specifically, the matching cost of the pixel point on the right side of the pixel point in the preset spatial plane may be calculated according to the weight of the pixel point on the right side multiplied by the sum of the similarity cost of the pixel point on the right side and the similarity cost of the pixel point:
the matching cost result over all pixel points on the right side of the pixel point p in the preset spatial plane is Plane_cost_x_right_left(p, a) = w(p, p_right) × Plane_cost_x_right_left(p_right) + Pixel_cost(p, a), and the right-to-left scanning result, namely Plane_cost_x_right_left(p, a), can be obtained by dynamic programming. Plane_cost_x_right_left(p, a) may be called the second matching cost.
Step three: as shown in FIG. 3(c), for the pixel point p, scan from top to bottom in the preset spatial plane, and denote the surface cost of the pixel point p under the preset spatial plane as Plane_cost(p, a, b), where a and b are the plane parameters of the preset spatial plane, and denote the similarity cost of the pixel point p as Pixel_cost(p), calculated in S120. Specifically, for the pixel point p, scanning from top to bottom in the preset spatial plane includes:
denoting the pixel point above the pixel point p in the preset spatial plane as p_up; since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_up) of the pixel point p_up above the pixel point p and the disparity disp(p) of the pixel point p can be calculated as:
disp(p) = disp(p_up) + b; without loss of generality, this application assumes b > 0;
specifically, the matching cost of the pixel point above the pixel point in the preset spatial plane may be calculated according to the weight of the pixel point above multiplied by the sum of the similarity cost of the pixel point above and the similarity cost of the pixel point:
the matching cost result over all pixel points above the pixel point p in the preset spatial plane is Plane_cost_y_up_down(p, b) = w(p, p_up) × Plane_cost_y_up_down(p_up) + Pixel_cost(p, b), which can be obtained by dynamic programming; this is the top-to-bottom scanning result Plane_cost_y_up_down(p, b). Plane_cost_y_up_down(p, b) may be called the third matching cost.
Step four: as shown in FIG. 3(d), for the pixel point p, scan from bottom to top in the preset spatial plane, and denote the surface cost of the pixel point p under the preset spatial plane as Plane_cost(p, a, b), where a and b are the plane parameters of the preset spatial plane, and denote the similarity cost of the pixel point p as Pixel_cost(p), calculated in S120. Specifically, for the pixel point p, scanning from bottom to top in the preset spatial plane includes:
denoting the pixel point below the pixel point p in the preset spatial plane as p_down; since the plane equation of the preset spatial plane is known, the relationship between the disparity disp(p_down) of the pixel point p_down below the pixel point p and the disparity disp(p) of the pixel point p can be calculated as:
disp(p) = disp(p_down) − b;
specifically, the matching cost of the pixel point below the pixel point in the preset spatial plane may be calculated according to the weight of the pixel point below multiplied by the sum of the similarity cost of the pixel point below and the similarity cost of the pixel point:
the matching cost result over all pixel points below the pixel point p in the preset spatial plane is Plane_cost_y_down_up(p, b) = w(p, p_down) × Plane_cost_y_down_up(p_down) + Pixel_cost(p, b), which can be obtained by dynamic programming; this is the bottom-to-top scanning result Plane_cost_y_down_up(p, b). Plane_cost_y_down_up(p, b) may be called the fourth matching cost.
Illustratively, the surface cost of the pixel point p under the preset spatial plane with plane equation d = a × x + b × y + c is obtained through the process shown in FIG. 4. FIG. 4 is a schematic flow chart of calculating the surface cost provided in an embodiment of the present application, and it includes S210-S230.
And S210, calculating to obtain left-to-right scanning results and right-to-left scanning results.
Specifically, calculating the left-to-right scanning result of the pixel points in the preset spatial plane is shown in fig. 3(a) and step one in fig. 3, which is not repeated here; similarly, the calculation of the scanning result of the pixel points in the preset spatial plane from right to left is shown in fig. 3(b) and step two in fig. 3, which is not repeated here.
S220, calculating to obtain a first average value of the left-to-right scanning result and the right-to-left scanning result.
Specifically, the average value of the first matching cost Plane_cost_x_left_right(p, a) obtained in S210 and the second matching cost Plane_cost_x_right_left(p, a) is calculated to obtain Plane_cost_x.
And S211, calculating to obtain scanning results from top to bottom and from bottom to top.
Specifically, calculating and calculating the scanning result of the pixel points in the preset spatial plane from top to bottom is shown in fig. 3(c) and step three in fig. 3, which is not repeated here; similarly, calculating the scanning result of the pixel points in the preset spatial plane from bottom to top is shown in fig. 3(d) and step four in fig. 3, which is not repeated here.
And S221, calculating a second average value of the scanning results from top to bottom and from bottom to top.
Specifically, the average value of the third matching cost Plane_cost_y_up_down(p, b) obtained in S211 and the fourth matching cost Plane_cost_y_down_up(p, b) is calculated to obtain Plane_cost_y.
And S230, calculating the surface cost of the pixel point under the preset space plane.
And averaging the first average Plane _ cost _ x calculated in the step S220 and the second average Plane _ cost _ y calculated in the step S221 to obtain the surface cost of the pixel point under the preset spatial Plane.
The method flow shown in FIG. 4 can compute the weighted average matching cost in the vertical coordinate direction of a pixel point and the weighted average matching cost in the horizontal coordinate direction of the pixel point in parallel, which makes the processing easy to accelerate.
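As an illustration only (a compact sketch of the four scans of FIG. 3 and the averaging of FIG. 4, not the patent's implementation: pixel_cost is assumed to already hold, per pixel, the similarity cost evaluated at the disparity that the preset spatial plane assigns to it, and the neighbour weight w is simplified to a constant instead of the color/coordinate-distance weight defined above):

import numpy as np

def plane_cost(pixel_cost, w=0.8):
    """Aggregate the per-pixel similarity cost into a surface cost by four
    directional scans (left-right, right-left, top-bottom, bottom-top) and
    averaging, following steps one to four and FIG. 4."""

    def scan(cost, axis, reverse):
        out = np.array(cost, dtype=np.float64)
        order = range(out.shape[axis])
        if reverse:
            order = reversed(range(out.shape[axis]))
        prev = None
        for i in order:
            line = np.take(out, i, axis=axis)
            if prev is not None:
                # recurrence: cost(p) = w * cost(neighbour) + pixel_cost(p)
                line = line + w * prev
            if axis == 0:
                out[i, :] = line
            else:
                out[:, i] = line
            prev = line
        return out

    left_right = scan(pixel_cost, axis=1, reverse=False)    # step one
    right_left = scan(pixel_cost, axis=1, reverse=True)     # step two
    up_down = scan(pixel_cost, axis=0, reverse=False)       # step three
    down_up = scan(pixel_cost, axis=0, reverse=True)        # step four
    plane_cost_x = 0.5 * (left_right + right_left)          # first average (S220)
    plane_cost_y = 0.5 * (up_down + down_up)                # second average (S221)
    return 0.5 * (plane_cost_x + plane_cost_y)              # surface cost (S230)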
As described above, according to the stereo matching method provided in the embodiment of the present application, the surface cost of each pixel point under the preset spatial plane can be calculated through the processes shown in fig. 3 and fig. 4. Because the calculation process of the surface cost of each pixel point under the preset space plane needs to be based on the plane parameters of the preset space plane, the surface cost of the pixel point under the preset space plane can be vividly abbreviated as the surface cost of the pixel point.
When the stereo matching method provided by the embodiments of the present application is applied to a traffic scene, the traffic scene conforms to the Manhattan-world assumption and basically consists of horizontal and vertical planes; therefore, in traffic-scene applications, the preset spatial plane may comprise the two plane equations of a fronto-parallel (forward-looking) plane and a ground plane.
After the surface cost of each pixel point under the preset spatial plane has been determined from the similarity cost based on the preset spatial plane, S140 is executed to determine the disparity maps of the multiple views.
Specifically, the disparity maps of the multiple views in the scene are determined according to the surface cost of each pixel point under the preset spatial plane calculated in S130. Further, in order to support large disparity variations, the stereo matching method provided by the embodiments of the present application introduces a total variation smoothing term into the defined energy equation when calculating the disparity map. The energy equation is expressed as the sum of a data term and a smoothing term, namely the energy of the disparity map to be solved, where the data term is the normalized norm of the second matching cost of all the pixel points and the smoothing term is the total variation. Illustratively, the energy equation is expressed as:
E(D) = Planes(D) + Pairwise_cost(D)
where D denotes the final disparity map to be solved, Planes denotes the normalized norm of the surface cost of all the pixel points under the preset spatial plane calculated in S130, and Pairwise_cost uses the total variation.
The energy equation thus combines the surface cost under the preset spatial plane with the total variation, and the disparity map is finally obtained by solving the energy equation.
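As an illustration only (a minimal sketch of evaluating, not minimising, an energy of this form; the normalisation of the surface cost, the smoothing weight lam, and the array layout are our assumptions):

import numpy as np

def energy(disparity, face_cost, lam=1.0):
    """E(D) = data term + smoothing term.

    disparity: (H, W) integer disparity map D
    face_cost: (H, W, D_max) normalised surface cost per pixel and disparity level
    lam:       weight of the total-variation smoothing term (assumed)
    """
    h, w = disparity.shape
    rows, cols = np.indices((h, w))
    data = face_cost[rows, cols, disparity].sum()        # Planes(D)
    tv = (np.abs(np.diff(disparity, axis=0)).sum()       # vertical neighbour differences
          + np.abs(np.diff(disparity, axis=1)).sum())    # horizontal neighbour differences
    return data + lam * tv                               # Pairwise_cost = total variation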
It should be understood that the present application does not limit how the above energy equation is solved; any algorithm used to solve energy equations in existing stereo matching methods may be adopted, for example Belief Propagation, Graph Cut, or TV-L1. For example, given the resolution and real-time requirements of traffic scenes, the Viterbi algorithm may be used.
Finally, plane fitting (PlaneFit) is performed on the disparity map using the plane parameters of the preset spatial plane to obtain sub-pixel precision. It should be understood that the plane fitting method referred to in the present application may be any existing plane fitting method, and the details are omitted here.
The method for stereo image matching provided by the embodiment of the present application is described in detail above with reference to fig. 2 to fig. 4, and it should be understood that, in each of the above method embodiments, the sequence numbers of the above processes do not mean the order of execution, and the order of execution of the processes should be determined by their functions and inherent logic, and should not limit the implementation processes of the embodiment of the present application in any way. The driving assistance device provided by the embodiment of the present application will be described in detail with reference to fig. 5, and the vehicle provided by the embodiment of the present application will be described in detail with reference to fig. 6.
Fig. 5 is a schematic block diagram of a driving assistance device 50 according to an embodiment of the present application. As shown in fig. 5, the driving assistance apparatus 50 includes:
at least one memory 501 for storing computer-executable instructions;
at least one processor 502, individually or collectively, for: accessing the at least one memory and executing the computer-executable instructions to perform operations comprising:
acquiring a plurality of views, wherein the views are a plurality of two-dimensional views of the same scene;
determining a similarity cost of each corresponding pixel point in the plurality of views;
determining the surface cost of each pixel point under a preset space plane according to the similarity cost based on the preset space plane;
determining disparity maps for the multiple views under the scene according to the face cost.
In some embodiments, the processor 502 is specifically configured to:
and determining a plurality of local feature maps corresponding to the plurality of views respectively, wherein the Hamming distance between every two local feature maps in the plurality of local feature maps is used as the similarity cost of pixel points in the two local feature maps.
In some embodiments, the processor 502 is further configured to:
and acquiring a local feature map of each view in the plurality of views based on a census algorithm.
In some embodiments, the processor 502 is further configured to:
determining a first matching cost, a second matching cost, a third matching cost and a fourth matching cost according to the preset spatial plane parameters and the similarity cost of the pixel point, wherein the average value of the first matching cost, the second matching cost, the third matching cost and the fourth matching cost is the surface cost of the pixel point under the preset spatial plane;
the first matching cost is the matching cost of a pixel point on the left side of the pixel point in the preset spatial plane, the second matching cost is the matching cost of a pixel point on the right side of the pixel point in the preset spatial plane, the third matching cost is the matching cost of a pixel point on the upper side of the pixel point in the preset spatial plane, and the fourth matching cost is the matching cost of a pixel point on the lower side of the pixel point in the preset spatial plane.
In some embodiments, the predetermined spatial plane is at least two planes, the at least two planes including a parallel plane and a perpendicular plane.
In some embodiments, the processor 502 is specifically configured to:
determining a first average of the first matching cost and the second matching cost;
determining a second average of the third matching cost and the fourth matching cost;
and the average value of the first average value and the second average value is the surface cost of the pixel point under the preset space plane.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of a pixel point on the left of the pixel point;
the first matching cost is the weight of the pixel point on the left side of the pixel point multiplied by the sum of the similarity cost of the pixel point on the left side and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of the pixel point on the right side of the pixel point;
the second matching cost is the sum of the weight of the pixel point on the right side of the pixel point multiplied by the similarity cost of the pixel point on the right side and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of the pixel point above the pixel point;
and the third matching cost is the sum of the weight of the pixel point above the pixel point multiplied by the similarity cost of the pixel point above and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of the pixel points below the pixel points;
the fourth matching cost is the sum of the weight of the pixel point below the pixel point multiplied by the similarity cost of the pixel point below and the similarity cost of the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining the weight of each pixel point in the plane other than the pixel point, wherein the weight is related to the color distance and the coordinate distance between that pixel point and the pixel point.
In some embodiments, the processor 502 is specifically configured to:
determining an energy equation, wherein the energy equation is expressed as the sum of a data item and a smoothing item and is called the energy of the disparity map needing to be solved;
the data item is a normalized norm of the second matching cost of all the pixel points, and the smooth item is total variation;
and solving the energy equation to obtain the disparity map.
In some embodiments, the driving assistance device 50 may further include a visual sensor (not shown) for acquiring the aforementioned views.
It should be understood that the driving assistance device 50 may be an integrated device or apparatus, such as a device with at least one memory 501 and at least one processor 502, which may acquire data of existing sensors of the vehicle through an on-board communication bus, wireless communication, etc., and perform the aforementioned processing on the data; or a device or apparatus that also includes at least one visual sensor, such that the driver assistance device 50 may obtain the sensor data itself and perform the aforementioned processing on the data. The driver assistance device 50 can then be easily mounted or removed as a separate device or apparatus. The driver assistance device 50 may also be a distributed device or apparatus, such as at least one memory 501, at least one processor 502, disposed or mounted dispersed on the vehicle, in which case the driver assistance device 50 may be mounted in the vehicle as a front-mounted device or apparatus; or may also include at least one vision sensor at the same time, and the vision sensor may be mounted in the same or a different location as the memory 501 or the processor 502. The driver assistance device 50 can now be more easily arranged on the vehicle as a plurality of discrete devices or apparatuses.
It should be understood that the driving assistance device 50 may also be implemented by a corresponding software module, which is not described in detail herein.
In other embodiments, the driver assistance device 50 and the above-mentioned vision sensor are two modules, i.e. the vision sensor is not necessarily integrated in the driver assistance device 50. As shown in fig. 6, fig. 6 is a schematic view of a vehicle according to an embodiment of the present application. The schematic view includes a driving assistance device 50 and a vision sensor 60.
It should be understood that the vehicle shown in fig. 6 includes the driving assistance device 50 described above and a vision sensor for acquiring the above-described plurality of views.
It should be further understood that the vehicle shown in fig. 6 includes the driving assistance device 50 and the vision sensor 60 by way of example, and in practical applications, the vehicle may include only the driving assistance device 50, the vision sensor 60 may be a peripheral device of the vehicle, or the vision sensor 60 may be integrated on the driving assistance device 50, which is not illustrated in this application.
It should be understood that the processor mentioned in the embodiments of the present Application may be a Central Processing Unit (CPU), and may also be other general purpose processors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Field Programmable Gate Arrays (FPGA) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory referred to in the embodiments of the application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double data rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous link SDRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
Embodiments of the present application further provide a computer-readable storage medium, on which instructions are stored, and when the instructions are executed on a computer, the instructions cause the computer to execute the method of each of the above method embodiments.
An embodiment of the present application further provides a computing device, which includes the computer-readable storage medium.
The embodiment of the application also provides a vehicle, which comprises a driving assisting device 50.
In some embodiments, the vehicle further comprises a vision sensor for acquiring the view.
The embodiment of the application can be applied to the traffic scene, especially the field of intelligent automobile environment perception.
It should be understood that the division of circuits, sub-units of the various embodiments of the present application is illustrative only. Those of ordinary skill in the art will appreciate that the various illustrative circuits, sub-circuits, and sub-units described in connection with the embodiments disclosed herein can be split or combined.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The processes or functions according to the embodiments of the present application are generated in whole or in part when the computer instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions can be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., Digital Video Disk (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
It should be understood that in the embodiment of the present application, "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (28)

1. A method for stereo matching of images, comprising:
acquiring a plurality of views, wherein the views are a plurality of two-dimensional views of the same scene;
determining, based on a preset space plane, the surface cost of each corresponding pixel point in the multiple views under the preset space plane;
determining disparity maps for the multiple views in the scene according to the surface cost.
2. The method of claim 1, wherein the plurality of views are two views obtained by a binocular vision acquisition device.
3. The method of claim 1, further comprising:
determining a similarity cost for each corresponding pixel point in the plurality of views.
4. The method of claim 3, further comprising:
determining the surface cost of each pixel point according to the similarity cost based on the preset space plane;
wherein the surface cost of each pixel point is a global surface cost.
5. The method according to claim 3 or 4, wherein prior to said determining a similarity cost for each corresponding pixel point in said plurality of views, said method further comprises:
determining feature points in each of the plurality of views.
6. The method of claim 5, wherein the determining the similarity cost for each corresponding pixel point in the plurality of views comprises:
determining similarity costs between corresponding feature points in the plurality of views.
7. The method of claim 6, wherein the similarity cost between the corresponding feature points comprises:
a Hamming distance between the corresponding feature points.
8. The method according to any one of claims 5-7, further comprising:
determining the feature points through a census algorithm.
9. The method according to any one of claims 1-8, wherein the preset space plane is at least two planes, the at least two planes comprising a parallel plane and a vertical plane.
10. The method according to any one of claims 4-9, wherein the determining, according to the similarity cost, the surface cost of each pixel point under the preset space plane comprises:
determining, according to the plane parameters of the preset space plane and the similarity cost of the pixel point, a matching cost of the pixel point on the left side of the pixel point, a matching cost of the pixel point on the right side, a matching cost of the pixel point on the upper side, and a matching cost of the pixel point on the lower side;
wherein the average value of the matching cost of the left pixel point, the matching cost of the right pixel point, the matching cost of the upper pixel point, and the matching cost of the lower pixel point is the surface cost of the pixel point under the preset space plane.
11. The method of claim 10, further comprising:
determining a first average value of the matching cost of the left pixel point and the matching cost of the right pixel point;
determining a second average value of the matching cost of the upper pixel point and the matching cost of the lower pixel point;
wherein the average value of the first average value and the second average value is the surface cost of the pixel point under the preset space plane.
12. The method according to any one of claims 1-11, wherein the determining disparity maps for the multiple views in the scene according to the surface cost comprises:
determining an energy equation, wherein the energy equation, expressed as the sum of a data term and a smoothing term, is the energy of the disparity map to be solved; the data term is a normalized norm of a second matching cost of all pixel points, and the smoothing term is a total variation;
and solving the energy equation to obtain the disparity map.
13. A driving assistance apparatus, comprising:
at least one memory for storing computer-executable instructions;
at least one processor, individually or collectively, configured to: accessing the at least one memory and executing the computer-executable instructions to perform operations comprising:
acquiring a plurality of views, wherein the views are a plurality of two-dimensional views of the same scene;
determining, based on a preset space plane, the surface cost of each corresponding pixel point in the multiple views under the preset space plane;
determining disparity maps for the multiple views in the scene according to the surface cost.
14. The driving assistance apparatus according to claim 13, wherein the plurality of views are two views obtained by a binocular vision acquisition device.
15. The driving assistance apparatus according to claim 13, wherein the processor is specifically configured to:
determining a similarity cost for each corresponding pixel point in the plurality of views.
16. The driving assistance apparatus according to claim 15, wherein the processor is specifically configured to:
determining the surface cost of each pixel point according to the similarity cost based on the preset space plane;
wherein the surface cost of each pixel point is a global surface cost.
17. The driving assistance apparatus according to claim 15 or 16, wherein the processor is specifically configured to:
determining feature points in each of the plurality of views.
18. The driving assistance apparatus according to claim 17, wherein the determining, by the processor, a similarity cost for each corresponding pixel point in the plurality of views comprises:
determining similarity costs between corresponding feature points in the plurality of views.
19. The driving assistance apparatus according to claim 18, wherein the similarity cost between the corresponding feature points comprises:
a Hamming distance between the corresponding feature points.
20. The driving assistance apparatus according to any one of claims 17 to 19, characterized in that the processor is specifically configured to:
determining the feature points through a census algorithm.
21. The driving assistance apparatus according to any one of claims 13-20, wherein the preset space plane is at least two planes, the at least two planes comprising a parallel plane and a vertical plane.
22. The driving assistance apparatus according to any one of claims 13 to 21, characterized in that the processor is specifically configured to:
determining, according to the plane parameters of the preset space plane and the similarity cost of the pixel point, a matching cost of the pixel point on the left side of the pixel point, a matching cost of the pixel point on the right side, a matching cost of the pixel point on the upper side, and a matching cost of the pixel point on the lower side;
wherein the average value of the matching cost of the left pixel point, the matching cost of the right pixel point, the matching cost of the upper pixel point, and the matching cost of the lower pixel point is the surface cost of the pixel point under the preset space plane.
23. The driving assistance apparatus according to claim 22, wherein the processor is specifically configured to:
determining a first average value of the matching cost of the left pixel point and the matching cost of the right pixel point;
determining a second average value of the matching cost of the upper pixel point and the matching cost of the lower pixel point;
wherein the average value of the first average value and the second average value is the surface cost of the pixel point under the preset space plane.
24. The driving assistance apparatus according to any one of claims 13 to 23, characterized in that the processor is specifically configured to:
determining an energy equation, wherein the energy equation, expressed as the sum of a data term and a smoothing term, is the energy of the disparity map to be solved;
wherein the data term is a normalized norm of the second matching cost of all the pixel points, and the smoothing term is a total variation;
and solving the energy equation to obtain the disparity map.
25. The driving assistance apparatus according to any one of claims 13-24, further comprising a vision sensor configured to obtain the views.
26. A vehicle, comprising the driving assistance apparatus according to any one of claims 13 to 25.
27. The vehicle according to claim 26, further comprising at least two vision sensors configured to acquire the views.
28. A computer-readable storage medium having stored thereon instructions which, when executed on a computer, cause the computer to perform the method of image stereo matching according to any one of claims 1 to 12.
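The census-based similarity cost recited in claims 3 and 5-8 (and mirrored in claims 15-20) can be illustrated with a short sketch. The Python code below is only one plausible reading of those claims, not the patented implementation: the 5x5 window, the dense application of the census transform to every pixel rather than to selected feature points, and the names census_transform, hamming_cost and similarity_cost_volume are assumptions introduced here for illustration, and rectified two-view input is assumed.

    import numpy as np

    def census_transform(img, window=5):
        # Encode each pixel as a bit string recording which neighbours in the
        # window are darker than the centre pixel.
        h, w = img.shape
        r = window // 2
        padded = np.pad(img, r, mode="edge")
        codes = np.zeros((h, w), dtype=np.uint32)
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                neighbour = padded[r + dy:r + dy + h, r + dx:r + dx + w]
                codes = (codes << 1) | (neighbour < img).astype(np.uint32)
        return codes

    def hamming_cost(code_a, code_b):
        # Similarity cost = number of differing bits between two census codes.
        diff = np.bitwise_xor(code_a, code_b)
        bits = np.unpackbits(diff.view(np.uint8)).reshape(diff.shape + (-1,))
        return bits.sum(axis=-1)

    def similarity_cost_volume(left, right, max_disparity):
        # Per-pixel similarity cost for every candidate disparity d, with the
        # right view shifted by d (rectified grayscale views assumed).
        cl, cr = census_transform(left), census_transform(right)
        h, w = left.shape
        vol = np.full((max_disparity, h, w), 10**6, dtype=np.int32)  # sentinel cost
        for d in range(max_disparity):
            vol[d, :, d:] = hamming_cost(cl[:, d:], cr[:, :w - d])
        return vol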
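Claims 9-11 (and 21-23) describe the surface cost of a pixel under a preset space plane as the average of the matching costs of its left, right, upper and lower neighbouring pixel points, obtained from the plane parameters and the similarity cost. The sketch below is one possible interpretation in which the plane is written in disparity space as d = a*x + b*y + c; this parameterisation, the rounding to integer disparities, and the treatment of out-of-range neighbours are assumptions, not details taken from the claims. The vol argument is the similarity-cost volume from the previous sketch.

    import numpy as np

    def plane_disparity(a, b, c, x, y):
        # Disparity that the preset plane (a, b, c) predicts at pixel (x, y),
        # assuming the plane is written in disparity space as d = a*x + b*y + c.
        return a * x + b * y + c

    def surface_cost(vol, a, b, c, x, y):
        # Average of the four neighbours' matching costs under the plane
        # (claim 10), taken as the mean of the left/right average and the
        # up/down average (claim 11).
        dmax, h, w = vol.shape
        left_right, up_down = [], []
        neighbours = (((x - 1, y), left_right), ((x + 1, y), left_right),
                      ((x, y - 1), up_down), ((x, y + 1), up_down))
        for (nx, ny), bucket in neighbours:
            if 0 <= nx < w and 0 <= ny < h:
                d = int(round(plane_disparity(a, b, c, nx, ny)))
                if 0 <= d < dmax:
                    bucket.append(vol[d, ny, nx])
        if not left_right or not up_down:
            return float("inf")  # the plane is not evaluable at this pixel
        return 0.5 * (float(np.mean(left_right)) + float(np.mean(up_down)))

Evaluating this quantity for several preset planes (for example a parallel plane and a vertical plane, as in claim 9) and comparing the results per pixel is one conceivable use, but the claims do not prescribe that step.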
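Claims 12 and 24 define an energy equation whose value, expressed as the sum of a data term and a smoothing term, is the energy of the disparity map to be solved, with the smoothing term being a total variation. The sketch below only evaluates such an energy for a candidate disparity map; the L1 form of the data term, the anisotropic total variation, and the smooth_weight factor are assumptions, and the claims do not specify the solver used to minimise the energy, so none is implemented here (a winner-take-all baseline is included purely for comparison).

    import numpy as np

    def disparity_energy(vol, disparity, smooth_weight=1.0):
        # Data term: matching cost of every pixel at its candidate disparity
        # (an L1-style norm is assumed here).
        dmax, h, w = vol.shape
        ys, xs = np.mgrid[0:h, 0:w]
        d = np.clip(disparity.astype(np.intp), 0, dmax - 1)
        data_term = float(vol[d, ys, xs].sum())
        # Smoothing term: anisotropic total variation of the disparity map.
        tv = float(np.abs(np.diff(disparity, axis=0)).sum()
                   + np.abs(np.diff(disparity, axis=1)).sum())
        return data_term + smooth_weight * tv

    def winner_take_all(vol):
        # Baseline that minimises the data term alone; a total-variation
        # regularised solver would trade some data cost for a smoother map.
        return np.argmin(vol, axis=0)

Any candidate disparity map, for example winner_take_all(vol), can then be scored with disparity_energy and compared against smoother alternatives produced by a total-variation solver.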
CN201980005230.8A 2019-04-24 2019-04-24 Method for stereo matching of images and auxiliary driving device Active CN111295667B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/084141 WO2020215257A1 (en) 2019-04-24 2019-04-24 Image stereo matching method and assisted driving apparatus

Publications (2)

Publication Number Publication Date
CN111295667A true CN111295667A (en) 2020-06-16
CN111295667B CN111295667B (en) 2024-05-14

Family

ID=71030447

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980005230.8A Active CN111295667B (en) 2019-04-24 2019-04-24 Method for stereo matching of images and auxiliary driving device

Country Status (2)

Country Link
CN (1) CN111295667B (en)
WO (1) WO2020215257A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052862B (en) * 2021-04-12 2024-06-18 北京机械设备研究所 Multi-level optimization-based stereo matching method, device and equipment in outdoor scene
CN113345001A (en) * 2021-05-19 2021-09-03 智车优行科技(北京)有限公司 Disparity map determination method and device, computer-readable storage medium and electronic device
WO2023028857A1 (en) * 2021-08-31 2023-03-09 华为技术有限公司 Stereo matching method, processing circuit, storage medium, and program product
CN115994955B (en) * 2023-03-23 2023-07-04 深圳佑驾创新科技有限公司 Camera external parameter calibration method and device and vehicle
CN116228889B (en) * 2023-04-27 2023-08-15 合肥工业大学 Mobile calibration device, camera array system calibration device and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103366354B (en) * 2012-03-27 2016-09-07 富士通株式会社 Method and system for stereo matching
EP2887312A1 (en) * 2013-12-18 2015-06-24 Nokia Corporation Method, apparatus and computer program product for depth estimation of stereo images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110222727A1 (en) * 2010-03-10 2011-09-15 Vinay Sharma Object Localization Using Tracked Object Trajectories
CN109544622A (en) * 2018-11-06 2019-03-29 深圳市爱培科技术股份有限公司 A kind of binocular vision solid matching method and system based on MSER

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘建国 (LIU Jianguo) et al.: "Research on an efficient binocular matching algorithm based on slanted aggregation windows" (一种基于倾斜聚集窗口的高效双目匹配算法研究) *

Also Published As

Publication number Publication date
WO2020215257A1 (en) 2020-10-29
CN111295667B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
WO2021233029A1 (en) Simultaneous localization and mapping method, device, system and storage medium
CN111295667B (en) Method for stereo matching of images and auxiliary driving device
CN108520536B (en) Disparity map generation method and device and terminal
US10846844B1 (en) Collaborative disparity decomposition
KR102415501B1 (en) Method for assuming parameter of 3d display device and 3d display device thereof
JP7018566B2 (en) Image pickup device, image processing method and program
WO2018120040A1 (en) Obstacle detection method and device
US20140340489A1 (en) Online coupled camera pose estimation and dense reconstruction from video
US20190020861A1 (en) High-speed and tunable scene reconstruction systems and methods using stereo imagery
CN110223222B (en) Image stitching method, image stitching device, and computer-readable storage medium
US20120177284A1 (en) Forming 3d models using multiple images
US11995858B2 (en) Method, apparatus and electronic device for stereo matching
US11307595B2 (en) Apparatus for acquisition of distance for all directions of moving body and method thereof
CN106815869B (en) Optical center determining method and device of fisheye camera
US20200294269A1 (en) Calibrating cameras and computing point projections using non-central camera model involving axial viewpoint shift
WO2023028880A1 (en) External parameter calibration method for vehicle-mounted camera and related apparatus
Kuschk Large scale urban reconstruction from remote sensing imagery
US11282180B1 (en) Object detection with position, pose, and shape estimation
Chang et al. Robust stereo matching with trinary cross color census and triple image-based refinements
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN114648639B (en) Target vehicle detection method, system and device
CN115375836A (en) Point cloud fusion three-dimensional reconstruction method and system based on multivariate confidence filtering
CN116188349A (en) Image processing method, device, electronic equipment and storage medium
CN112215048B (en) 3D target detection method, device and computer readable storage medium
WO2014033731A2 (en) A system and method for depth estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240522

Address after: Building 3, Xunmei Science and Technology Plaza, No. 8 Keyuan Road, Science and Technology Park Community, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province, 518057, 1634

Patentee after: Shenzhen Zhuoyu Technology Co.,Ltd.

Country or region after: China

Address before: 518057 Shenzhen Nanshan High-tech Zone, Shenzhen, Guangdong Province, 6/F, Shenzhen Industry, Education and Research Building, Hong Kong University of Science and Technology, No. 9 Yuexingdao, South District, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: SZ DJI TECHNOLOGY Co.,Ltd.

Country or region before: China
