CN115018926A - Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium


Info

Publication number: CN115018926A
Authority: CN (China)
Prior art keywords: image, vehicle-mounted camera, pitch angle, coordinate value
Legal status: Pending
Application number: CN202210580410.2A
Other languages: Chinese (zh)
Inventors: 马文广, 姬鹏飞, 黄凯明
Current Assignee: Streamax Technology Co Ltd
Original Assignee: Streamax Technology Co Ltd
Application filed by Streamax Technology Co Ltd
Priority to CN202210580410.2A

Classifications

    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G01P 3/00 - Measuring linear or angular speed; measuring differences of linear or angular speeds
    • G06F 17/15 - Correlation function computation including computation of convolution operations
    • G06F 17/18 - Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G06N 3/02 - Neural networks (computing arrangements based on biological models)
    • G06N 3/08 - Learning methods
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/269 - Analysis of motion using gradient-based methods
    • G06T 2207/10016 - Video; image sequence
    • G06T 2207/20081 - Training; learning
    • G06T 2207/20084 - Artificial neural networks [ANN]
    • G06T 2207/30252 - Vehicle exterior; vicinity of vehicle
    • G06T 2207/30256 - Lane; road marking


Abstract

The application discloses a method, a device, equipment and a storage medium for determining a pitch angle of a vehicle-mounted camera, and belongs to the field of computer technologies. The method includes: acquiring a first image and a second image; inputting the first image and the second image into an optical flow estimation model to output an optical flow graph through the optical flow estimation model; determining a focal length of the vehicle-mounted camera according to the optical flow graph; obtaining a coordinate value of the optical center in the second image and a coordinate value of the vanishing point in the second image; and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image. Because the focal length of the vehicle-mounted camera is determined accurately from the optical flow graph, and the pitch angle is then determined from this focal length together with the coordinate values of the optical center and of the vanishing point in the second image, the accuracy of the pitch angle of the vehicle-mounted camera can be improved.

Description

Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining a pitch angle of a vehicle-mounted camera.
Background
At present, fields such as automatic driving and assisted driving are developing rapidly, and the calibrated parameters of a vehicle-mounted camera installed on a vehicle are very important in these fields; for example, they are needed in tasks such as lane line detection, vehicle tracking and distance measurement. The calibrated parameters of the vehicle-mounted camera generally include the heading angle, the pitch angle and the roll angle of the vehicle-mounted camera, among which the pitch angle is the most important.
In the related art, after a technician mounts a vehicle-mounted camera, a reference object marked with a reference line is placed in front of the vehicle. The technician then finds a target position on the horizontal ground and measures the distance between the target position and the vehicle-mounted camera, the distance between the reference object and the vehicle-mounted camera, the height of the reference line and the height of the vehicle-mounted camera. Finally, the pitch angle of the vehicle-mounted camera is calculated from these measured distances and heights.
However, this process requires manual operation by technicians, which places strict requirements on their skill, and manual measurements by different technicians contain errors, so the pitch angle of the vehicle-mounted camera determined in this way is not accurate enough.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for determining a pitch angle of a vehicle-mounted camera, which can improve the accuracy of the pitch angle of the vehicle-mounted camera. The technical scheme is as follows:
in a first aspect, a pitch angle determination method for a vehicle-mounted camera is provided, and the method includes:
acquiring a first image and a second image, wherein the first image is a previous frame video image in two adjacent frame video images in a video shot by a vehicle-mounted camera, and the second image is a next frame video image in the two frame video images;
inputting the first image and the second image into an optical flow estimation model to output an optical flow graph through the optical flow estimation model, the optical flow graph being used for indicating displacement values of pixel points belonging to the same object between the first image and the second image;
determining the focal length of the vehicle-mounted camera according to the optical flow graph;
obtaining the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
In the method, a first image and a second image are obtained, the first image and the second image being two adjacent frames of video images in a video shot by a vehicle-mounted camera. The first image and the second image are then input into an optical flow estimation model to output an optical flow graph through the optical flow estimation model. Because the optical flow graph can accurately indicate the displacement values of pixel points belonging to the same object between the first image and the second image, the focal length of the vehicle-mounted camera determined according to the optical flow graph is more accurate. Finally, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image are obtained, and the pitch angle of the vehicle-mounted camera is determined according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image. In this way, the focal length of the vehicle-mounted camera is determined accurately from the optical flow graph, the pitch angle is then determined from this accurately determined focal length together with the coordinate values of the optical center and of the vanishing point in the second image, and the accuracy of the pitch angle of the vehicle-mounted camera can therefore be improved.
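For orientation only, the following is a minimal sketch of how these steps could be chained together, assuming NumPy image arrays and two callables, flow_model and vp_model, that stand in for the optical flow estimation model and the vanishing point prediction model described below; all names are hypothetical and this is not the patent's reference implementation:

```python
import math
import numpy as np

def estimate_initial_pitch(first_image, second_image, angular_velocity, flow_model, vp_model):
    """Hypothetical end-to-end sketch of the claimed steps (not the patent's implementation)."""
    # Steps 1-2: optical flow graph between the two adjacent video frames
    flow = flow_model(first_image, second_image)        # assumed shape: H x W x 2 (u, v)

    # Step 3: focal length from the mean lateral displacement and the vehicle's angular velocity
    delta_u = float(np.mean(flow[..., 0]))              # average lateral displacement (pixels)
    f = -delta_u / angular_velocity                     # f = -delta_u / omega

    # Step 4: optical centre (image centre) ordinate and vanishing point ordinate
    height, width = second_image.shape[:2]
    v0 = height / 2.0
    prob = vp_model(second_image)                       # assumed shape: H x W probability matrix
    v1, _u1 = np.unravel_index(np.argmax(prob), prob.shape)

    # Step 5: initial pitch angle from the focal length and the two ordinates
    return math.atan((v1 - v0) / f)
```

The value returned here corresponds to the initial pitch angle; the lane-line-based correction described in the optional embodiments would be added on top of it.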
Optionally, the optical flow estimation model comprises a first encoding module, a second encoding module and a first decoding module;
the first encoding module comprises m first convolution layers, the second encoding module comprises m second convolution layers, the m first convolution layers are used for extracting a feature map of the first image, the m second convolution layers are used for extracting a feature map of the second image, and m is an integer greater than or equal to 2;
the first decoding module comprises n optical flow estimation modules; a first optical flow estimation module of the n optical flow estimation modules is configured to: output an optical flow graph according to the feature map extracted by the mth first convolution layer of the m first convolution layers and the feature map extracted by the mth second convolution layer of the m second convolution layers; an ith optical flow estimation module of the n optical flow estimation modules is configured to: output an optical flow graph according to the optical flow graph output by the (i-1)th optical flow estimation module of the n optical flow estimation modules, the feature map extracted by the (m-i+1)th first convolution layer of the m first convolution layers and the feature map extracted by the (m-i+1)th second convolution layer of the m second convolution layers; and the optical flow graph output by the nth optical flow estimation module of the n optical flow estimation modules is the optical flow graph output by the optical flow estimation model, where n is an integer greater than or equal to 2, and i is an integer greater than or equal to 2 and less than or equal to n.
Optionally, the outputting an optical flow graph according to the optical flow graph output by the (i-1)th optical flow estimation module of the n optical flow estimation modules, the feature map extracted by the (m-i+1)th first convolution layer of the m first convolution layers, and the feature map extracted by the (m-i+1)th second convolution layer of the m second convolution layers includes:
up-sampling the optical flow graph output by the i-1 optical flow estimation module to obtain a target optical flow graph;
transforming, according to the target optical flow graph, the feature map extracted by the (m-i+1)th first convolution layer to obtain a target feature map;
determining the matching cost between the target feature map and the feature map extracted by the (m-i + 1) th second convolution layer;
outputting an optical flow graph according to the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer.
Optionally, the displacement values of the pixel points belonging to the same object between the first image and the second image include a lateral displacement value, and the determining the focal length of the vehicle-mounted camera according to the optical flow graph includes:
determining an average value of lateral displacement values of all pixel points indicated by the optical flow graph;
acquiring the angular speed of a vehicle provided with the vehicle-mounted camera;
dividing the average value by the angular velocity to obtain a first ratio;
determining a negative of the first ratio as a focal length of the onboard camera.
Optionally, the obtaining coordinate values of vanishing points in the second image includes:
inputting the second image into a vanishing point prediction model so as to output a vanishing point probability matrix through the vanishing point prediction model, wherein the probability value in the vanishing point probability matrix is the probability value that each pixel point in the second image is a vanishing point, and the pixel point corresponding to the maximum probability value in the vanishing point probability matrix is a vanishing point in the second image;
and determining the coordinate value of the pixel point corresponding to the maximum probability value in the probability matrix of the vanishing point in the second image as the coordinate value of the vanishing point in the second image.
Optionally, the determining a pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image includes:
identifying two lane lines of a road contained in the second image;
determining intersection coordinate values of the two lane lines in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
Optionally, the determining a pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image includes:
determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
determining a correction pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the intersection point coordinate value and the coordinate value of the vanishing point in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera.
Optionally, the determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image includes:
determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image by the following formula;
θ_init = arctan((v1 - v0) / f)
where θ_init is the initial pitch angle of the vehicle-mounted camera, f is the focal length of the vehicle-mounted camera, v1 is the ordinate value among the coordinate values of the vanishing point in the second image, and v0 is the ordinate value among the coordinate values of the optical center in the second image.
Optionally, the determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera includes:
determining the pitch angle of the vehicle-mounted camera through the following formula according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera;
θ = θ_init + Δθ
where θ is the pitch angle of the vehicle-mounted camera, θ_init is the initial pitch angle of the vehicle-mounted camera, and Δθ is the corrected pitch angle of the vehicle-mounted camera.
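As a small numerical illustration of the two formulas above (a sketch; variable names and the example values are hypothetical):

```python
import math

def initial_pitch(f, v1, v0):
    # theta_init = arctan((v1 - v0) / f)
    return math.atan((v1 - v0) / f)

def corrected_pitch(theta_init, delta_theta):
    # theta = theta_init + delta_theta
    return theta_init + delta_theta

# Example: focal length 1200 px, vanishing point ordinate 40 px below the optical centre,
# and a small lane-line-based correction of 0.005 rad.
theta = corrected_pitch(initial_pitch(1200.0, 400.0, 360.0), 0.005)
```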
In a second aspect, a pitch angle determination apparatus for a vehicle-mounted camera is provided, the apparatus including:
a first acquisition module, configured to acquire a first image and a second image, where the first image is a previous frame of video image of two adjacent frames of video images in a video shot by a vehicle-mounted camera, and the second image is a next frame of video image of the two frames of video images;
an optical flow generation module, configured to input the first image and the second image into an optical flow estimation model to output an optical flow graph through the optical flow estimation model, where the optical flow graph is used to indicate displacement values of pixel points belonging to the same object between the first image and the second image;
the first determining module is used for determining the focal length of the vehicle-mounted camera according to the optical flow graph;
the second acquisition module is used for acquiring the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
and the second determining module is used for determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
Optionally, the optical flow estimation model comprises a first encoding module, a second encoding module and a first decoding module;
the first encoding module comprises m first convolution layers, the second encoding module comprises m second convolution layers, the m first convolution layers are used for extracting a feature map of the first image, the m second convolution layers are used for extracting a feature map of the second image, and m is an integer greater than or equal to 2;
the first decoding module comprises n optical flow estimation modules; a first optical flow estimation module of the n optical flow estimation modules is configured to: output an optical flow graph according to the feature map extracted by the mth first convolution layer of the m first convolution layers and the feature map extracted by the mth second convolution layer of the m second convolution layers; an ith optical flow estimation module of the n optical flow estimation modules is configured to: output an optical flow graph according to the optical flow graph output by the (i-1)th optical flow estimation module of the n optical flow estimation modules, the feature map extracted by the (m-i+1)th first convolution layer of the m first convolution layers and the feature map extracted by the (m-i+1)th second convolution layer of the m second convolution layers; and the optical flow graph output by the nth optical flow estimation module of the n optical flow estimation modules is the optical flow graph output by the optical flow estimation model, where n is an integer greater than or equal to 2, and i is an integer greater than or equal to 2 and less than or equal to n.
Optionally, the ith optical flow estimation module is configured to:
up-sampling the optical flow graph output by the i-1 optical flow estimation module to obtain a target optical flow graph;
transforming, according to the target optical flow graph, the feature map extracted by the (m-i+1)th first convolution layer to obtain a target feature map;
determining the matching cost between the target feature map and the feature map extracted by the (m-i + 1) th second convolution layer;
outputting an optical flow graph according to the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer.
Optionally, displacement values of the pixel points belonging to the same object between the first image and the second image include a lateral displacement value, and the first determining module is configured to:
determining an average value of lateral displacement values of all pixel points indicated by the optical flow graph;
acquiring the angular speed of a vehicle provided with the vehicle-mounted camera;
dividing the average value by the angular velocity to obtain a first ratio;
determining a negative of the first ratio as a focal length of the onboard camera.
Optionally, the second obtaining module is configured to:
inputting the second image into a vanishing point prediction model so as to output a vanishing point probability matrix through the vanishing point prediction model, wherein the probability value in the vanishing point probability matrix is the probability value that each pixel point in the second image is a vanishing point, and the pixel point corresponding to the maximum probability value in the vanishing point probability matrix is a vanishing point in the second image;
and determining the coordinate value of the pixel point corresponding to the maximum probability value in the vanishing point probability matrix in the second image as the coordinate value of the vanishing point in the second image.
Optionally, the second determining module includes:
an identification unit configured to identify two lane lines of a road included in the second image;
a first determination unit, configured to determine intersection coordinate values of the two lane lines in the second image;
and the second determining unit is used for determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the intersection point coordinate value, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
Optionally, the second determining unit is configured to:
determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
determining a correction pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the intersection point coordinate value and the coordinate value of the vanishing point in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera.
Optionally, the second determining unit is configured to:
determining an initial pitch angle of the vehicle-mounted camera through a following formula according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
θ_init = arctan((v1 - v0) / f)
where θ_init is the initial pitch angle of the vehicle-mounted camera, f is the focal length of the vehicle-mounted camera, v1 is the ordinate value among the coordinate values of the vanishing point in the second image, and v0 is the ordinate value among the coordinate values of the optical center in the second image.
Optionally, the second determining unit is configured to:
determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera by the following formula;
θ = θ_init + Δθ
where θ is the pitch angle of the vehicle-mounted camera, θ_init is the initial pitch angle of the vehicle-mounted camera, and Δθ is the corrected pitch angle of the vehicle-mounted camera.
In a third aspect, a computer device is provided, the computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the computer program, when executed by the processor, implementing the above-mentioned pitch angle determination method for an in-vehicle camera.
In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the above-described pitch angle determination method for an in-vehicle camera.
In a fifth aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the steps of the above-described pitch angle determination method for an in-vehicle camera.
It is to be understood that, for the beneficial effects of the second aspect, the third aspect, the fourth aspect and the fifth aspect, reference may be made to the description of the first aspect, and details are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a pitch angle determining method for a vehicle-mounted camera according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of an optical flow estimation model provided in an embodiment of the present application;
fig. 3 is a schematic structural diagram of a vanishing point prediction model provided in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an attention submodule provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of another attention sub-module provided in the embodiments of the present application;
FIG. 6 is a schematic diagram illustrating a pitch angle determining method for a vehicle-mounted camera according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of a pitch angle determining apparatus of a vehicle-mounted camera according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that "a plurality" in this application refers to two or more. In the description of the present application, "/" means "or" unless otherwise stated; for example, A/B may mean A or B. "And/or" herein only describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, for the convenience of clearly describing the technical solutions of the present application, the terms "first", "second", and the like are used to distinguish identical or similar items having substantially the same functions and effects. Those skilled in the art will appreciate that the terms "first", "second", and the like do not limit the quantity or the execution order, nor do they denote relative importance.
Before explaining the embodiments of the present application in detail, an application scenario of the embodiments of the present application will be described.
The calibrated parameters of the vehicle-mounted camera generally include the heading angle, the pitch angle and the roll angle of the vehicle-mounted camera, among which the pitch angle is the most important. The pitch angle of the vehicle-mounted camera refers to the included angle between the optical axis of the vehicle-mounted camera and the horizontal plane. If the pitch angle of the vehicle-mounted camera is determined inaccurately, the processing results of the video images shot by the vehicle-mounted camera contain errors, which affects the driving system's estimation of the state of the vehicle on the road, such as the estimation of the distance to other vehicles, and therefore affects the decisions of the driving system and the normal running of the vehicle.
The pitch angle determining method of the vehicle-mounted camera provided by the embodiments of the application is applied to scenarios in which the pitch angle of the vehicle-mounted camera needs to be determined. Specifically, two adjacent frames of video images shot by the vehicle-mounted camera during the running of the vehicle are acquired and input into an optical flow estimation model to obtain an optical flow graph, so that the focal length of the vehicle-mounted camera can be calculated according to the optical flow graph. The pitch angle of the vehicle-mounted camera is then determined according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center of the video image and the coordinate value of the vanishing point. In this way, the focal length of the vehicle-mounted camera is determined accurately from the optical flow graph, the pitch angle is determined from this accurately determined focal length together with the coordinate values of the optical center and of the vanishing point, and the accuracy of the pitch angle of the vehicle-mounted camera can therefore be improved.
The following explains the pitch angle determining method of the vehicle-mounted camera provided in the embodiment of the present application in detail.
Fig. 1 is a flowchart of a pitch angle determining method of a vehicle-mounted camera according to an embodiment of the present application. The method can be applied to a terminal, such as a vehicle-mounted terminal. Referring to fig. 1, the method includes the following steps.
Step 101: the terminal acquires a first image and a second image.
In the running process of a vehicle, a vehicle-mounted camera shoots a video containing a road, and a terminal obtains two adjacent frames of video images in the video shot by the vehicle-mounted camera. The first image is a previous frame video image in two adjacent frame video images in the video shot by the vehicle-mounted camera, and the second image is a next frame video image in the two frame video images. For example, the first image is a video image at time t in a video captured by the onboard camera, and the second image is a video image at time t +1 in the video captured by the onboard camera.
Optionally, the terminal may acquire the first image and the second image in real time, or may acquire them at intervals of a preset duration. The preset duration may be set in advance by a technician according to driving requirements.
That is, the first image and the second image may be the two adjacent video images most recently acquired by the terminal from the video captured by the vehicle-mounted camera. In this case, each time the terminal acquires a first image and a second image, it may perform the following steps 102 to 105 to determine the latest pitch angle of the vehicle-mounted camera.
Step 102: the terminal inputs the first image and the second image into the optical flow estimation model to output an optical flow diagram through the optical flow estimation model.
The optical flow estimation model is used for carrying out optical flow estimation on the input first image and the input second image to output an optical flow diagram. The optical flow graph is a visual image, and the optical flow graph is used for accurately indicating displacement values of pixel points belonging to the same object between a first image and a second image, for example: the optical flow graph is used to indicate how much a pixel point belonging to the same object moves from time t to time t + 1.
Because the capability of a neural network to predict optical flow is far superior to that of traditional methods that output optical flow based on feature matching, the optical flow graph output through the optical flow estimation model can indicate more accurate displacement values of the pixel points belonging to the same object between the first image and the second image.
Illustratively, the optical flow estimation model includes a first encoding module, a second encoding module, and a first decoding module; the first encoding module includes m first convolution layers, the second encoding module includes m second convolution layers, the first decoding module includes n optical flow estimation modules, and the first encoding module and the second encoding module process the input image using the same weight (also referred to as a parameter), specifically, the weight in the jth first convolution layer is the same as the weight in the jth second convolution layer, that is, the jth first convolution layer and the jth second convolution layer share the weight. m is an integer greater than or equal to 2, n is an integer greater than or equal to 2, and m + 1 = n. j is an integer greater than or equal to 1 and less than or equal to m.
As shown in fig. 2, fig. 2 is a schematic structural diagram of the optical flow estimation model, the input of the optical flow estimation model is a first image 201 and a second image 202, the optical flow estimation model includes m first convolution layers 203, m second convolution layers 204, n optical flow estimation modules 205, and the output of the optical flow estimation model is an optical flow graph 206. The m first convolution layers 203 are connected in sequence, and the m second convolution layers 204 are connected in sequence; the first optical flow estimation module 205 of the n optical flow estimation modules 205 is connected to the mth first convolution layer 203 and the mth second convolution layer 204, the ith optical flow estimation module 205 is connected to the (i-1) th optical flow estimation module 205, the (m-i + 1) th first convolution layer 203 and the (m-i + 1) th second convolution layer 204, that is, the ith optical flow estimation module 205 and the (m-i + 1) th first convolution layer 203 are connected in a cross-layer manner, and the ith optical flow estimation module 205 and the (m-i + 1) th second convolution layer 204 are also connected in a cross-layer manner. i is an integer greater than or equal to 2 and less than or equal to n.
The m first convolution layers 203 are used to extract the feature map of the first image 201, and the m second convolution layers 204 are used to extract the feature map of the second image 202.
The first optical flow estimation module 205 of the n optical flow estimation modules 205 is configured to: output an optical flow graph according to the feature map extracted by the mth first convolution layer 203 of the m first convolution layers 203 and the feature map extracted by the mth second convolution layer 204 of the m second convolution layers 204.
The size of the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is the same as the size of the optical flow map output by the first optical flow estimation module 205.
Optionally, the operation of the first optical flow estimation module 205 outputting an optical flow graph according to the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 may be: the first optical flow estimation module 205 determines the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204, and outputs an optical flow graph according to this matching cost.
Matching cost is used for measuring the difference between the two feature maps, and the larger the matching cost is, the larger the difference between the two feature maps is; the smaller the matching cost, the smaller the difference between the two feature maps. That is, the larger the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is, the larger the difference between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is; the smaller the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is, the smaller the difference between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is.
In this way, by instructing the first optical flow estimation module 205 to output an optical flow graph using the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204, the optical flow graph output by the first optical flow estimation module 205 can be made more accurate.
The operation of the first optical flow estimation module 205 in determining the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 may be: according to the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204, the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is obtained through the following formula:
cost_m = (1 / K) * Σ_{k=1}^{K} F1_m(k) · F2_m(k)
where cost_m is the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204, F1_m is the feature map extracted by the mth first convolution layer 203, F2_m is the feature map extracted by the mth second convolution layer 204, F1_m(k) and F2_m(k) denote their kth channels, and K is the number of channels of the feature map extracted by the mth first convolution layer 203 or the feature map extracted by the mth second convolution layer 204.
The operation of the first optical flow estimation module 205 outputting an optical flow graph according to the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 may be: according to this matching cost, an optical flow graph is obtained through the following formula:
flow_1 = CNN_1(cost_m)
where flow_1 is the optical flow graph output by the first optical flow estimation module 205, and CNN_1( ) represents a first convolution network.
The first convolution network may be preset and arranged in the first optical flow estimation module 205; the input of the first convolution network is the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204, and the output of the first convolution network is an optical flow graph. That is, the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204 is input into the first convolution network for processing, so that an optical flow graph can be obtained.
The ith optical flow estimation module 205 of the n optical flow estimation modules 205 is configured to: outputting an optical flow graph according to the optical flow graph output by the i-1 th optical flow estimation module 205 in the n optical flow estimation modules 205, the feature map extracted by the m-i +1 th first convolution layer 203 in the m first convolution layers 203, and the feature map extracted by the m-i +1 th second convolution layer 204 in the m second convolution layers 204.
In this way, the feature map extracted by the (m-i + 1) th first convolution layer 203 and the feature map extracted by the (m-i + 1) th second convolution layer 204 are used as the input of the i-th optical flow estimation module 205, and the i-th optical flow estimation module 205 processes the feature map extracted by the (m-i + 1) th first convolution layer 203 and the feature map extracted by the (m-i + 1) th second convolution layer 204, so that the output optical flow map can be more accurate.
In this case, the optical flow map output by the nth optical flow estimation module 205 of the n optical flow estimation modules 205 is the optical flow map 206 output by the optical flow estimation model.
Optionally, the operation of the ith optical flow estimation module 205 outputting an optical flow graph according to the optical flow graph output by the (i-1)th optical flow estimation module 205, the feature map extracted by the (m-i+1)th first convolution layer 203, and the feature map extracted by the (m-i+1)th second convolution layer 204 may be: the ith optical flow estimation module 205 up-samples the optical flow graph output by the (i-1)th optical flow estimation module 205 to obtain a target optical flow graph; transforms, according to the target optical flow graph, the feature map extracted by the (m-i+1)th first convolution layer 203 to obtain a target feature map; determines the matching cost between the target feature map and the feature map extracted by the (m-i+1)th second convolution layer 204; and outputs an optical flow graph according to the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer 204.
In this case, the feature map extracted by the (m-i+1)th first convolution layer 203 is transformed first, and the matching cost between the target feature map obtained through the transformation and the feature map extracted by the (m-i+1)th second convolution layer 204 is then determined, so the matching cost is relatively accurate, and a more accurate optical flow graph can be output according to the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer 204.
Optionally, the factor by which the ith optical flow estimation module 205 up-samples the optical flow graph output by the (i-1)th optical flow estimation module 205 may be set according to the factor by which the size of the input data and the size of the output data of the (m-i+2)th second convolution layer differ. In this way, the size of the target optical flow graph in the subsequent operations is ensured to be the same as the size of the feature map extracted by the (m-i+1)th first convolution layer 203 and the size of the feature map extracted by the (m-i+1)th second convolution layer 204. For example, if the size of the input data and the size of the output data of the (m-i+2)th second convolution layer differ by a factor of 2, the ith optical flow estimation module 205 may up-sample the optical flow graph output by the (i-1)th optical flow estimation module 205 by a factor of 2.
The operation of the ith optical flow estimation module 205 transforming the feature map extracted by the (m-i+1)th first convolution layer 203 according to the target optical flow graph to obtain the target feature map may be: according to the target optical flow graph, the feature map extracted by the (m-i+1)th first convolution layer 203 is transformed by the following formula:
W_i = Wrap(F1_{m-i+1}, upsample(flow_{i-1}))
where W_i is the target feature map, Wrap( ) represents a transform operation, F1_{m-i+1} is the feature map extracted by the (m-i+1)th first convolution layer 203, flow_{i-1} is the optical flow graph output by the (i-1)th optical flow estimation module 205, and upsample(flow_{i-1}) is the target optical flow graph.
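A common way to realise such a transform operation is bilinear warping with the flow field; the following PyTorch-style sketch is one possible form and is not taken from the patent:

```python
import torch
import torch.nn.functional as F

def warp(feat, flow):
    """Warp a feature map (B, C, H, W) with a flow field (B, 2, H, W) given in pixels.

    Each output pixel samples `feat` at the location shifted by the (upsampled)
    optical flow, using bilinear interpolation."""
    b, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(feat.device)  # (1, 2, H, W)
    # Shift the grid by the flow and normalise to [-1, 1] as required by grid_sample
    new = grid + flow
    new_x = 2.0 * new[:, 0] / max(w - 1, 1) - 1.0
    new_y = 2.0 * new[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((new_x, new_y), dim=3)                          # (B, H, W, 2)
    return F.grid_sample(feat, sample_grid, mode="bilinear", align_corners=True)
```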
The operation of the ith optical flow estimation module 205 determining the matching cost between the target feature map and the feature map extracted by the (m-i+1)th second convolution layer 204 is similar to the operation of determining the matching cost between the feature map extracted by the mth first convolution layer 203 and the feature map extracted by the mth second convolution layer 204, and details are not described again in this embodiment of the application.
The operation of the ith optical flow estimation module 205 outputting an optical flow graph according to the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer 204 may be: according to the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer 204, an optical flow graph is obtained through the following formula:
flow_i = CNN_2(upsample(flow_{i-1}), cost_i, F2_{m-i+1})
where flow_i is the optical flow graph output by the ith optical flow estimation module 205, cost_i is the matching cost between the target feature map and the feature map extracted by the (m-i+1)th second convolution layer 204, F2_{m-i+1} is the feature map extracted by the (m-i+1)th second convolution layer 204, and CNN_2( ) is a second convolution network.
The second convolution network may be preset and arranged in the ith optical flow estimation module 205; the input of the second convolution network is the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer 204, and the output of the second convolution network is an optical flow graph. That is, the target optical flow graph, the matching cost and the feature map extracted by the (m-i+1)th second convolution layer 204 are input into the second convolution network for processing, so as to obtain the optical flow graph.
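Putting the pieces together, the ith optical flow estimation module could be sketched as below, reusing the warp() and matching_cost() helpers sketched earlier; the layer sizes, the up-sampling factor of 2 and the flow rescaling are assumptions of this sketch rather than details from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlowEstimationModule(nn.Module):
    """Sketch of the ith optical flow estimation module (i >= 2)."""

    def __init__(self, feat_channels):
        super().__init__()
        # CNN_2: takes (upsampled flow, matching cost, second feature map) and predicts flow
        self.cnn2 = nn.Sequential(
            nn.Conv2d(2 + 1 + feat_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 2, kernel_size=3, padding=1),
        )

    def forward(self, prev_flow, feat1, feat2, scale=2):
        # Upsample the optical flow graph of module i-1 to the current resolution
        # (flow values are rescaled with the resolution, an assumption of this sketch)
        target_flow = F.interpolate(prev_flow, scale_factor=scale, mode="bilinear",
                                    align_corners=True) * scale
        # Warp the first feature map with the target flow, then compute the matching cost
        warped = warp(feat1, target_flow)
        cost = matching_cost(warped, feat2)
        # Predict the optical flow graph from the target flow, the cost and the second feature map
        return self.cnn2(torch.cat((target_flow, cost, feat2), dim=1))
```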
It is noted that before the optical flow estimation model is used to perform optical flow estimation on the first image and the second image to obtain the optical flow graph, the optical flow estimation model also needs to be trained.
Specifically, the terminal may obtain a plurality of training samples, and train the neural network model using the plurality of training samples to obtain the optical flow estimation model.
The plurality of training samples may be preset. Each training sample in the plurality of training samples includes a sample image and a sample mark; the sample image is a video image pair, the video image pair is two adjacent frames of video images shot by the vehicle-mounted camera, and the sample mark is the optical flow graph corresponding to the sample image. That is, the input data in each of the plurality of training samples is a video image pair, and the sample mark is the optical flow graph corresponding to the video image pair, where this optical flow graph is used to indicate the displacement values of pixel points belonging to the same object between the two video images of the pair.
The neural network model may include a plurality of network layers including an input layer, a plurality of hidden layers, and an output layer. The input layer is responsible for receiving input data; the output layer is responsible for outputting the processed data; the plurality of hidden layers are positioned between the input layer and the output layer and are responsible for processing data, and the plurality of hidden layers are invisible to the outside. The neural network model is identical in structure to the optical flow estimation model shown in fig. 2.
When the terminal uses a plurality of training samples to train the neural network model, for each training sample in the plurality of training samples, the input data in the training sample can be input into the neural network model to obtain output data; determining a loss value between the output data and a sample marker in the training sample by a loss function; and adjusting parameters in the neural network model according to the loss value. After parameters in the neural network model are adjusted based on each training sample in the plurality of training samples, the neural network model with the adjusted parameters is the optical flow estimation model.
Optionally, the training of the optical flow estimation model provided in the embodiments of the present application is divided into two stages: a pre-training stage and an optimization stage. In the pre-training stage, a mean square error loss function is used to determine the loss value between the output data and the sample mark in the training sample, for example by the following formula:
loss_pre-train = Σ_{s=1}^{n} (1/F) * Σ_{r=1}^{F} || flow_{s,r} - flow*_{r} ||²
where loss_pre-train is the loss value between the output data and the sample mark in the pre-training stage, F is the number of pixel points in the output data, s denotes the sth optical flow estimation module of the n optical flow estimation modules, r denotes the rth pixel point of the output data, flow_{s,r} is the optical flow of the rth pixel point in the optical flow graph output by the sth optical flow estimation module, flow*_{r} is the optical flow of the rth pixel point in the sample mark, s is an integer greater than or equal to 1 and less than or equal to n, F is a positive integer, and r is an integer greater than or equal to 1 and less than or equal to F.
Thus, the convergence speed of the neural network model can be increased by using the mean square error loss function in the pre-training stage.
In the optimization stage, a mean absolute error loss function is used to determine the loss value between the output data and the sample mark in the training sample, for example by the following formula:
loss_fine-tune = Σ_{s=1}^{n} (1/F) * Σ_{r=1}^{F} | flow_{s,r} - flow*_{r} |
where loss_fine-tune is the loss value between the output data and the sample mark in the optimization stage, and the remaining symbols have the same meanings as in the pre-training loss.
Thus, the optical flow quality can be further improved by using the average absolute error loss function in the optimization stage.
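A sketch of the two training losses, assuming each module's optical flow graph is bilinearly resized to the resolution of the sample mark (how resolutions are matched, and the absence of per-module weights, are assumptions of this sketch):

```python
import torch
import torch.nn.functional as F

def pretrain_loss(flows, gt_flow):
    """Mean-squared-error pre-training loss summed over the n estimation modules."""
    loss = 0.0
    for flow in flows:  # list of per-module optical flow graphs, shape (B, 2, h, w)
        flow = F.interpolate(flow, size=gt_flow.shape[-2:], mode="bilinear", align_corners=True)
        loss = loss + ((flow - gt_flow) ** 2).sum(dim=1).mean()
    return loss

def finetune_loss(flows, gt_flow):
    """Mean-absolute-error loss used in the optimisation stage."""
    loss = 0.0
    for flow in flows:
        flow = F.interpolate(flow, size=gt_flow.shape[-2:], mode="bilinear", align_corners=True)
        loss = loss + (flow - gt_flow).abs().sum(dim=1).mean()
    return loss
```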
The operation of the terminal adjusting the parameters in the neural network model according to the loss value may refer to related technologies, which are not described in detail in this embodiment.
For example, the terminal may adjust any one parameter z in the neural network model by the following formula:
z* = z - α · dz
where z* is the adjusted parameter, z is the parameter before adjustment, α is a learning rate that may be preset (for example, α may be 0.001 or 0.000001, which is not limited in this embodiment of the application), and dz is the partial derivative of the loss function with respect to z, which can be obtained from the loss value.
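The update rule itself is plain gradient descent; as a minimal sketch:

```python
def gradient_step(params, grads, alpha=0.001):
    """Apply z* = z - alpha * dz to every parameter z, given its gradient dz."""
    return [z - alpha * dz for z, dz in zip(params, grads)]
```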
Optionally, the training data set of the optical flow estimation model may be an open data set, or may be a video image captured by a vehicle-mounted camera collected on a network, which is not limited in this embodiment of the application.
Since the optical flow graph output by the optical flow estimation model is used for indicating the displacement value of the pixel point belonging to the same object between the first image and the second image, the terminal can determine the focal length of the vehicle-mounted camera according to the optical flow graph, and then continue to execute step 103.
Step 103: and the terminal determines the focal length of the vehicle-mounted camera according to the optical flow diagram.
The displacement values of the pixel points belonging to the same object between the first image and the second image include a lateral displacement value, that is, a displacement value of the pixel points belonging to the same object in a horizontal direction.
When the vehicle turns, that is, when the vehicle-mounted camera rotates horizontally about the vertical direction by a certain angle within a unit time, this rotation causes the row pixels in the image to shift by a corresponding amount, that is, the pixel points belonging to the same object are displaced in the horizontal direction (i.e., in the transverse direction). The focal length of the vehicle-mounted camera is equivalent to a scaling factor, and the ratio of the displacement generated by the corresponding row pixels in the image to that rotation angle is the focal length of the vehicle-mounted camera.
In this case, the terminal may determine the focal length of the vehicle-mounted camera according to the displacement value of the relatively accurate pixel point indicated by the accurately obtained optical flow graph during the vehicle driving process, so that the determined focal length of the vehicle-mounted camera may be relatively accurate.
Specifically, the operation of step 103 may be: the terminal determines the average value of the transverse displacement values of all the pixel points indicated by the optical flow diagram; acquiring the angular velocity of a vehicle provided with a vehicle-mounted camera; dividing the average value by the angular velocity to obtain a first ratio; and determining the negative number of the first ratio as the focal length of the vehicle-mounted camera.
The average value is the distance offset between the first image and the second image, and the angular velocity of the vehicle is measured while the vehicle is running. The process of determining the focal length of the vehicle-mounted camera is thus a process of determining the focal length by the following formula:

f = −Δu / ω

wherein f is the focal length of the vehicle-mounted camera, Δu is the average value, i.e., the distance offset, and ω is the angular velocity.
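As a rough Python sketch of this step; the tensor shape, the yaw-rate value, and the assumption that the flow and the angular velocity refer to the same time interval are illustrative, not mandated by the embodiment:

import numpy as np

def focal_from_flow(flow_u, yaw_rate):
    """Estimate focal length as the negative of (mean lateral flow) / (angular velocity).
    flow_u: array of horizontal displacement values (pixels over the frame interval);
    yaw_rate: camera rotation about the vertical axis over the same interval (radians)."""
    delta_u = float(np.mean(flow_u))      # distance offset between the two images
    return -delta_u / yaw_rate

# e.g. pixels shift left by ~12 px on average while the vehicle yaws +0.01 rad between frames
flow_u = np.full((480, 640), -12.0)
print(focal_from_flow(flow_u, 0.01))      # -> 1200.0 (pixels)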
Step 104: and the terminal acquires the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
The optical center is the center of the lens of the vehicle-mounted camera and is also the center of the second image, so the coordinate value of the optical center in the second image is (W/2, H/2), where W and H are the width and height of the second image, respectively.
The vanishing point is the visual intersection of two lane lines parallel to each other in the second image, typically used for calibration of camera parameters.
Optionally, the operation of the terminal acquiring the coordinate values of the vanishing point in the second image may be: the terminal inputs the second image into a vanishing point prediction model so as to output a vanishing point probability matrix through the vanishing point prediction model, wherein the probability value in the vanishing point probability matrix is the probability value that each pixel point in the second image is a vanishing point, and the pixel point corresponding to the maximum probability value in the vanishing point probability matrix is a vanishing point in the second image; and determining the coordinate value of the pixel point corresponding to the maximum probability value in the vanishing point probability matrix in the second image as the coordinate value of the vanishing point in the second image.
In this way, when the second image is input to the vanishing point predicting model while the vehicle is running, the coordinate values of the vanishing point in the second image can be automatically obtained.
Optionally, after the terminal determines the coordinate value, in the second image, of the pixel point corresponding to the maximum probability value in the vanishing point probability matrix as the coordinate value of the vanishing point in the second image, a Gaussian heat map can be generated according to the coordinate value of the vanishing point in the second image, so as to visualize the vanishing point.
For example, the terminal may set the kernel radius of the Gaussian heat map to 9, take the coordinate value of the vanishing point in the second image as the central point of the Gaussian heat map, and set the Gaussian value of the central point to 1; the color corresponding to a Gaussian value of 1 in the Gaussian heat map is red, that is, the red region in the Gaussian heat map is the position of the vanishing point.
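A possible Python sketch of reading out the vanishing point from the probability matrix and rendering a Gaussian heat map is shown below; the mapping from the kernel radius 9 to a Gaussian sigma is an assumption of the sketch:

import numpy as np

def vanishing_point_from_probs(prob):
    """Coordinate (u, v) of the pixel with the maximum probability in the matrix."""
    v, u = np.unravel_index(np.argmax(prob), prob.shape)   # row -> y, column -> x
    return int(u), int(v)

def gaussian_heatmap(shape, center, radius=9):
    """Gaussian heat map with value 1 at the vanishing point, for visualization only."""
    h, w = shape
    u0, v0 = center
    uu, vv = np.meshgrid(np.arange(w), np.arange(h))
    sigma = radius / 3.0                                   # assumed radius-to-sigma relation
    return np.exp(-((uu - u0) ** 2 + (vv - v0) ** 2) / (2 * sigma ** 2))

prob = np.random.rand(120, 160)                            # stand-in probability matrix
u, v = vanishing_point_from_probs(prob)
heat = gaussian_heatmap(prob.shape, (u, v))
print((u, v), heat[v, u])                                  # the center value is 1.0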
Illustratively, the vanishing point prediction model includes a third encoding module, an attention module, and a second decoding module in sequence, as shown in fig. 3, fig. 3 is a schematic structural diagram of the vanishing point prediction model, input data of the vanishing point prediction model is input data 301, the vanishing point prediction model includes a third encoding module 302, an attention module 303, and a second decoding module 304, output data of the vanishing point prediction model is output data 305, the third encoding module 302 is configured to perform a convolution operation, the attention module 303 is configured to extract an attention feature map, and the second decoding module 304 is configured to perform a deconvolution operation. The third encoding module 302, the attention module 303 and the second decoding module are connected in sequence.
In this case, providing the attention module 303 in the vanishing point prediction model can enlarge the receptive field of the vanishing point prediction model and enhance the expressive power of the feature map, thereby improving the accuracy of vanishing point prediction.
Optionally, the third encoding module 302 includes a plurality of third convolution layers, and the convolution kernel sizes of the third convolution layers may be the same as or different from one another; the second decoding module 304 includes a plurality of deconvolution layers, and the convolution kernel size of each deconvolution layer may be set according to the convolution kernel size of the corresponding third convolution layer in the third encoding module 302. For example, in fig. 3, the third encoding module 302 includes three third convolution layers and the second decoding module 304 includes three deconvolution layers: the convolution kernel size of the first third convolution layer is 7 × 7, the convolution kernel sizes of the second and third third convolution layers are both 4 × 4, and their step size is set to 2. Since the second decoding module 304 is used to decode the feature map back to the same size as the input data of the vanishing point prediction model, the convolution kernel sizes of the first and second deconvolution layers are 4 × 4 with a step size of 2, and the convolution kernel size of the third deconvolution layer is 7 × 7.
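For orientation only, a rough PyTorch-style skeleton of such an encoder-attention-decoder layout might look as follows; the channel counts, strides, the softmax read-out and the identity placeholder for the attention module are assumptions of the sketch, while the 7 × 7 and 4 × 4 kernel sizes follow the example above:

# Rough sketch of the encoder -> attention -> decoder layout described above.
import torch
from torch import nn

class VanishingPointNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                       # third encoding module
            nn.Conv2d(3, 32, 7, stride=1, padding=3), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.attention = nn.Identity()                      # placeholder for the attention module
        self.decoder = nn.Sequential(                       # second decoding module
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 7, stride=1, padding=3),
        )

    def forward(self, x):
        logits = self.decoder(self.attention(self.encoder(x)))
        b, _, h, w = logits.shape
        # normalize over all pixels so the output reads as a vanishing point probability matrix
        return torch.softmax(logits.view(b, -1), dim=1).view(b, 1, h, w)

probs = VanishingPointNet()(torch.randn(1, 3, 128, 160))
print(probs.shape, probs.sum().item())                      # (1, 1, 128, 160), sums to ~1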
The attention module 303 includes a plurality of attention sub-modules, each of which may be implemented by two possible structures as follows.
A first possible structure is shown in fig. 4, where fig. 4 is a schematic structural diagram of an attention submodule, input data of the attention submodule is input data 401, the attention submodule includes a fourth convolution layer 402, a horizontal pooling layer 403, a vertical pooling layer 404, a first decoding layer 405, a second decoding layer 406, a first fusion layer 407, an active layer 408, a second fusion layer 409, and a third fusion layer 410, and output data of the attention submodule is an attention feature map 411. The fourth convolution layer 402 is connected to the horizontal pooling layer 403 and the vertical pooling layer 404, respectively, the horizontal pooling layer 403 is connected to the first decoding layer 405, the vertical pooling layer 404 is connected to the second decoding layer 406, the first decoding layer 405 and the second decoding layer 406 are both connected to the first merging layer 407, the first merging layer 407 is connected to the active layer 408, the active layer 408 and the fourth convolution layer 402 are both connected to the second merging layer 409, and the second merging layer 409 and the input data 401 are both connected to the third merging layer 410.
The fourth convolution layer 402 is used for performing a convolution operation on the input data 401 of the attention submodule to obtain a first feature map; the transverse pooling layer 403 is used for performing a pooling operation on the first feature map by using a pooling kernel with a size of 1 × W to obtain a transverse feature map; the longitudinal pooling layer 404 is used for performing a pooling operation on the first feature map by using a pooling kernel with a size of H × 1 to obtain a longitudinal feature map; the first decoding layer 405 is used for up-sampling the transverse feature map to obtain a transverse attention feature map; the second decoding layer 406 is used for up-sampling the longitudinal feature map to obtain a longitudinal attention feature map; the first fusion layer 407 is used for adding the transverse attention feature map and the longitudinal attention feature map to obtain a fused feature map; the activation layer 408 is used for performing an activation operation on the fused feature map to obtain an activation feature map including a transverse attention feature and a longitudinal attention feature; the second fusion layer 409 is used for performing a product operation on the activation feature map and the first feature map to obtain a second feature map; the third fusion layer 410 is used for adding the input data 401 of the attention submodule and the second feature map to obtain the attention feature map 411. Here, W is the width of the first feature map, and H is the height of the first feature map.
Because the second image contains the transverse semantics and the longitudinal semantics, the transverse attention mechanism and the longitudinal attention mechanism are adopted to gather context information from the transverse direction and the longitudinal direction respectively, long-distance dependence of the features is modeled, and the transverse features and the longitudinal features in the first feature map are extracted, so that the expression capability of the features can be enhanced, and the accuracy of vanishing point prediction is improved.
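A minimal PyTorch-style sketch of this first structure, assuming average pooling, a Sigmoid activation and unchanged channel counts, is given below:

# Illustrative sketch of the first attention submodule structure (fig. 4): a 1xW
# transverse pooling branch and an Hx1 longitudinal pooling branch are upsampled,
# added, activated, multiplied back onto the feature map, then added to the input.
import torch
from torch import nn

class AttentionSubmoduleV1(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv4 = nn.Conv2d(channels, channels, 3, padding=1)   # fourth convolution layer

    def forward(self, x):
        feat = self.conv4(x)                              # first feature map
        h, w = feat.shape[-2:]
        trans = feat.mean(dim=3, keepdim=True)            # 1xW pooling -> (B, C, H, 1)
        longi = feat.mean(dim=2, keepdim=True)            # Hx1 pooling -> (B, C, 1, W)
        trans_attn = trans.expand(-1, -1, h, w)           # first decoding layer (upsample)
        longi_attn = longi.expand(-1, -1, h, w)           # second decoding layer (upsample)
        fused = trans_attn + longi_attn                   # first fusion layer (add)
        act = torch.sigmoid(fused)                        # activation layer
        second = act * feat                               # second fusion layer (product)
        return x + second                                 # third fusion layer (residual add)

out = AttentionSubmoduleV1(16)(torch.randn(1, 16, 32, 40))
print(out.shape)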
A second possible structure is shown in fig. 5, where fig. 5 is a schematic structural diagram of an attention submodule. The input data of the attention submodule is input data 501, the attention submodule includes a fourth convolution layer 502, a transverse pooling layer 503, a longitudinal pooling layer 504, a fifth convolution layer 505, a sixth convolution layer 506, a first decoding layer 507, a second decoding layer 508, a first fusion layer 509, a seventh convolution layer 510, an activation layer 511, a second fusion layer 512, an eighth convolution layer 513, and a third fusion layer 514, and the output data of the attention submodule is an attention feature map 515. The fourth convolution layer 502 is connected to the transverse pooling layer 503 and the longitudinal pooling layer 504, respectively; the transverse pooling layer 503, the fifth convolution layer 505 and the first decoding layer 507 are connected in sequence, and the longitudinal pooling layer 504, the sixth convolution layer 506 and the second decoding layer 508 are connected in sequence; the first decoding layer 507 and the second decoding layer 508 are both connected to the first fusion layer 509; the first fusion layer 509, the seventh convolution layer 510, and the activation layer 511 are connected in sequence; the activation layer 511 and the fourth convolution layer 502 are both connected to the second fusion layer 512; the second fusion layer 512 is connected to the eighth convolution layer 513, and both the eighth convolution layer 513 and the input data 501 are connected to the third fusion layer 514.
The fourth convolutional layer 502 is used for performing convolution operation on the input data 501 of the attention submodule to obtain a first feature map; the transverse pooling layer 503 is used for pooling the first feature map by using a pooling kernel with the size of 1 xW to obtain a transverse feature map; the longitudinal pooling layer 504 is used for pooling the first feature map by using a pooling core with the size of H × 1 to obtain a longitudinal feature map; the fifth convolutional layer 505 is used for performing convolution operation on the transverse feature map to obtain a third feature map; the sixth convolution layer 506 is used for performing convolution operation on the longitudinal feature map to obtain a fourth feature map; the first decoding layer 507 is used for up-sampling the third feature map to obtain a transverse attention feature map; the second decoding layer 508 is configured to perform upsampling on the fourth feature map to obtain a longitudinal attention feature map; the first fusion layer 509 is used for adding the transverse attention feature map and the longitudinal attention feature map to obtain a fusion feature map; the seventh convolutional layer 510 is used for performing convolution operation on the fused feature map to obtain a fifth feature map; the activation layer 511 is used for performing an activation operation on the fifth feature map to obtain an activation feature map including a lateral attention feature and a longitudinal attention feature; the second fusion layer 512 is used for multiplying the activation characteristic map by the first characteristic map to obtain a second characteristic map; the eighth convolution layer 513 is configured to perform convolution operation on the second feature map to obtain a sixth feature map; the third fusion layer 514 is used to add the input data 501 of the attention submodule to the sixth feature map to obtain an attention feature map 515.
In this case, the fifth convolution layer 505, the sixth convolution layer 506, and the eighth convolution layer 513 are added to the attention submodule to perform feature integration on the feature maps, which can enlarge the receptive field of the attention submodule and thereby improve the accuracy of vanishing point prediction. In addition, by adding the seventh convolution layer 510, the dimensionality of the feature map can be reduced, thereby saving computational resources.
Alternatively, the pooling operations of the horizontal pooling layer 503 and the vertical pooling layer 504 may be average-value pooling operations, or maximum-value pooling operations, which is not limited in this embodiment.
Optionally, the activation function of the activation layer 511 may be a Sigmoid activation function, a Softmax activation function, a Rectified Linear Unit (ReLU) activation function, and the like, which is not limited in this embodiment of the application.
For example: the convolution kernel size of the fourth convolution layer 502 may be set to 3 × 3 with a dilation factor size set to 2. Then, the first feature maps extracted from the fourth convolutional layer 502 are respectively input into the horizontal pooling layer 503 with the pooling kernel size of 1 × W and the vertical pooling layer 504 with the pooling kernel size of H × 1, and the average value pooling operation is respectively performed on the first feature maps:
x_trans(c, p) = (1/W) Σ_{q=1}^{W} x(c, p, q),   x_long(c, q) = (1/H) Σ_{p=1}^{H} x(c, p, q)

wherein p is the row index of a pixel in the first feature map, q is the column index of a pixel in the first feature map, c is the channel index of the first feature map, x_trans is the transverse feature map, and x_long is the longitudinal feature map. The transverse feature map x_trans is then input into the fifth convolution layer 505 with a convolution kernel size of 3 × 1 for a convolution operation to obtain a third feature map, and the longitudinal feature map x_long is input into the sixth convolution layer 506 with a convolution kernel size of 1 × 3 for a convolution operation to obtain a fourth feature map. Since the sizes of the third feature map and the fourth feature map differ from the size of the feature map of the input data 501 of the attention submodule, the subsequent fusion operation cannot be performed directly; therefore, the third feature map is input into the first decoding layer 507 to obtain a transverse attention feature map, and the fourth feature map is input into the second decoding layer 508 to obtain a longitudinal attention feature map, so that the transverse attention feature map and the longitudinal attention feature map have the same size. The transverse attention feature map and the longitudinal attention feature map are then additively fused to obtain a fused feature map including the global attention feature, and this fused feature map is input into the seventh convolution layer 510 with a convolution kernel size of 1 × 1 for a convolution operation to reduce the dimensionality of the fused feature map. After the activation feature map including the transverse attention feature and the longitudinal attention feature is obtained, the activation feature map is multiplied by the first feature map, that is, the obtained attention features are fused with the first feature map, so that a feature map including the fused attention features, namely the second feature map, is obtained. Finally, the second feature map is input into the eighth convolution layer 513 with a convolution kernel size of 3 × 3 for a convolution operation to obtain a sixth feature map, and the sixth feature map is fused with the input data 501 of the attention submodule to obtain the attention feature map 515 output by the attention submodule.
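A compact PyTorch-style sketch of this second structure, using the kernel sizes from the example above (3 × 3 dilated, 3 × 1, 1 × 3, 1 × 1, 3 × 3) but with assumed channel counts, an assumed Sigmoid activation, and the 1 × 1 convolution kept at the same channel count so the product with the first feature map is well-defined, is given below:

# Sketch of the second attention submodule structure (fig. 5); details are assumptions.
import torch
from torch import nn

class AttentionSubmoduleV2(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv4 = nn.Conv2d(channels, channels, 3, padding=2, dilation=2)  # fourth conv layer
        self.conv5 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))    # fifth conv (3x1)
        self.conv6 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))    # sixth conv (1x3)
        self.conv7 = nn.Conv2d(channels, channels, 1)                         # seventh conv (1x1)
        self.conv8 = nn.Conv2d(channels, channels, 3, padding=1)              # eighth conv (3x3)

    def forward(self, x):
        feat = self.conv4(x)                                # first feature map
        h, w = feat.shape[-2:]
        third = self.conv5(feat.mean(dim=3, keepdim=True))  # transverse pooling + 3x1 conv
        fourth = self.conv6(feat.mean(dim=2, keepdim=True)) # longitudinal pooling + 1x3 conv
        fused = third.expand(-1, -1, h, w) + fourth.expand(-1, -1, h, w)  # decode + add
        act = torch.sigmoid(self.conv7(fused))              # 1x1 conv + activation layer
        sixth = self.conv8(act * feat)                      # product fusion + 3x3 conv
        return x + sixth                                    # residual fusion -> attention map

print(AttentionSubmoduleV2(16)(torch.randn(1, 16, 32, 40)).shape)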
It should be noted that before the vanishing point prediction model is used to predict the vanishing point of the second image and output the vanishing point probability matrix, the vanishing point prediction model needs to be trained.
Specifically, the terminal may obtain a plurality of training samples, and train the neural network model using the plurality of training samples to obtain the vanishing point prediction model.
The plurality of training samples may be preset. Each training sample in the plurality of training samples comprises a sample image and a sample mark, the sample image is an image containing a vanishing point, and the sample mark is a coordinate value of the vanishing point contained in the sample image. That is, the input data in each of the plurality of training samples is a sample image including a vanishing point, and the sample is marked as a coordinate value of the vanishing point included in the sample image.
The neural network model may include a plurality of network layers including an input layer, a plurality of hidden layers, and an output layer. The input layer is responsible for receiving input data; the output layer is responsible for outputting the processed data; the plurality of hidden layers are located between the input layer and the output layer, are responsible for processing data, and are invisible to the outside. The structure of the neural network model may be the same as that of the vanishing point prediction model whose attention submodule adopts the structure shown in fig. 4, or the same as that of the vanishing point prediction model whose attention submodule adopts the structure shown in fig. 5.
When the terminal uses a plurality of training samples to train the neural network model, for each training sample in the plurality of training samples, input data in the training sample can be input into the neural network model to obtain output data; determining a loss value between the output data and a sample marker in the training sample by a loss function; and adjusting parameters in the neural network model according to the loss value. After the parameters in the neural network model are adjusted based on each training sample in the plurality of training samples, the neural network model with the adjusted parameters is the vanishing point prediction model.
Optionally, the vanishing point prediction model provided in this embodiment of the present application uses a mean square error loss function to determine a loss value between the output data and the sample marker in this training sample in the training stage. Specifically, the operation of determining the loss value between the output data and the sample label in the training sample by using the mean square error loss function is described in detail in step 102, and is not described herein again.
The operation of the terminal adjusting the parameters in the neural network model according to the loss value may refer to related technologies, which are not described in detail in this embodiment.
For example, the terminal may adjust any one of the parameters in the neural network model by the formula

z' = z − α · dz

wherein z' is the adjusted parameter, z is the parameter before adjustment, α is the learning rate, which may be preset, for example, to 0.001 or 0.000001, and the like, which is not limited in this embodiment of the application, and dz is the partial derivative of the loss function with respect to z, which can be obtained from the loss value.
Alternatively, the training data set of the vanishing point prediction model may be a public road data set, such as the Cityscapes data set, or road scene images gathered on the network.
Step 105: and the terminal determines the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
In this case, since the focal length of the onboard camera is a relatively accurate focal length determined according to the optical flow graph, the pitch angle of the onboard camera is determined according to the relatively accurate focal length of the onboard camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image, so that the determined pitch angle of the onboard camera can be relatively accurate.
Specifically, the operation of step 105 may be: the terminal identifies two lane lines of a road contained in the second image; determining the coordinate value of the intersection point of two lane lines in the second image; and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
The coordinate value of the intersection point of the two lane lines in the second image is the coordinate value of the real vanishing point in the second image obtained by identifying the two lane lines, and the coordinate value of the vanishing point in the second image acquired in the step 104 is the coordinate value of the predicted vanishing point in the second image obtained by prediction.
Optionally, the manner in which the terminal identifies the two lane lines of the road included in the second image may be: the edge detection operator is used for detecting the edge information of the second image, and then two lane lines of a road are extracted from the edge information of the second image. Alternatively, the edge detection operator may be a canny operator, a sobel operator, or the like, and the method for extracting two lane lines of the road from the edge information of the second image may be hough transform, and the embodiment of the present application does not limit this.
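As a hedged illustration of this step, the following Python sketch uses OpenCV's Canny edge detector and probabilistic Hough transform to extract lane-line segments and intersects a left-leaning and a right-leaning line; the thresholds, the synthetic test image and the slope-based pairing are assumptions, not values from the embodiment:

import cv2
import numpy as np

def detect_lane_lines(gray_image):
    edges = cv2.Canny(gray_image, 50, 150)                        # edge information
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=60,
                            minLineLength=80, maxLineGap=20)
    return [] if lines is None else [l[0] for l in lines]         # (x1, y1, x2, y2) segments

def line_intersection(l1, l2):
    """Intersection of the two infinite lines through the given segments."""
    x1, y1, x2, y2 = map(float, l1)
    x3, y3, x4, y4 = map(float, l2)
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if abs(d) < 1e-9:
        return None                                               # parallel in the image
    px = ((x1 * y2 - y1 * x2) * (x3 - x4) - (x1 - x2) * (x3 * y4 - y3 * x4)) / d
    py = ((x1 * y2 - y1 * x2) * (y3 - y4) - (y1 - y2) * (x3 * y4 - y3 * x4)) / d
    return px, py

gray = np.zeros((480, 640), dtype=np.uint8)
cv2.line(gray, (100, 470), (310, 200), 255, 3)                    # synthetic left lane line
cv2.line(gray, (540, 470), (330, 200), 255, 3)                    # synthetic right lane line
segments = detect_lane_lines(gray)
left = [s for s in segments if (s[2] - s[0]) * (s[3] - s[1]) < 0]
right = [s for s in segments if (s[2] - s[0]) * (s[3] - s[1]) > 0]
if left and right:
    print(line_intersection(left[0], right[0]))                   # approximate intersection point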
The operation of determining the pitch angle of the vehicle-mounted camera by the terminal according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image can be realized through the following steps (1) to (3):
(1) and the terminal determines the initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
Since the coordinate value of the vanishing point in the second image is the predicted coordinate value of the vanishing point in the second image, the initial pitch angle of the vehicle-mounted camera determined by the predicted coordinate value of the vanishing point in the second image is the predicted initial pitch angle of the vehicle-mounted camera.
Specifically, as shown in fig. 6, fig. 6 is a schematic diagram for determining an initial pitch angle of the vehicle-mounted camera, and fig. 6 includes a second image 601, an optical center 602 of the second image, a vanishing point 603 of the second image, the vehicle-mounted camera 604, and an optical axis 605. In fig. 6, the optical axis 605, the focal length f of the vehicle-mounted camera, and the Y-axis form a right triangle, and the terminal may determine the initial pitch angle of the vehicle-mounted camera 604 according to the focal length f of the vehicle-mounted camera 604, the coordinate value of the optical center 602 in the second image, and the coordinate value of the vanishing point 603 in the second image, by the following formula:
θ_init = arctan((v_1 − v_0) / f)

wherein θ_init is the initial pitch angle of the vehicle-mounted camera 604, f is the focal length of the vehicle-mounted camera 604, v_1 is the ordinate value among the coordinate values of the vanishing point 603 in the second image, and v_0 is the ordinate value among the coordinate values of the optical center 602 in the second image.
(2) And the terminal determines the correction pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point and the coordinate value of the vanishing point in the second image.
The corrected pitch angle is an angle at which the initial pitch angle of the onboard camera needs to be corrected.
Because the coordinate value of the vanishing point in the second image is the predicted coordinate value of the predicted vanishing point in the second image, and the coordinate value of the intersection point is the coordinate value of the real vanishing point in the second image, the coordinate value of the predicted vanishing point in the second image may have a difference from the coordinate value of the real vanishing point, so that a difference may exist between the determined initial pitch angle and the real pitch angle of the vehicle-mounted camera.
Under the condition, the difference between the initial pitch angle and the real pitch angle of the vehicle-mounted camera can be determined according to the focal length of the vehicle-mounted camera, the intersection point coordinate value and the coordinate value of the vanishing point in the second image obtained through prediction, and the difference is the corrected pitch angle of the vehicle-mounted camera. The initial pitch angle of the vehicle-mounted camera can be corrected by using the corrected pitch angle, so that the real pitch angle of the vehicle-mounted camera can be obtained.
Specifically, the operation of determining the correction pitch angle of the vehicle-mounted camera by the terminal according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point, and the coordinate value of the vanishing point in the second image may be: the terminal determines a correction pitch angle of the vehicle-mounted camera according to the focal length and the intersection point coordinate value of the vehicle-mounted camera and the coordinate value of the vanishing point in the second image through the following formula:
Δθ = arctan((v_2 − v_1) / f)

wherein Δθ is the corrected pitch angle of the vehicle-mounted camera, and v_2 is the ordinate value among the intersection coordinate values.
(3) And the terminal determines the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera.
After the terminal knows the difference between the initial pitch angle and the true pitch angle of the vehicle-mounted camera, it can correct the initial pitch angle according to this difference, that is, add the corrected pitch angle to the initial pitch angle. Through the correction process described by the following formula, a more accurate pitch angle of the vehicle-mounted camera can be obtained.
Specifically, according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera, the pitch angle of the vehicle-mounted camera is determined by the following formula:
θ = θ_init + Δθ

wherein θ is the pitch angle of the vehicle-mounted camera.
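Putting steps (1) to (3) together, and using the arctangent relationships written out above (which are reconstructions from the described geometry), a small Python sketch with made-up numbers is:

import math

def pitch_angle(f, v0, v1, v2):
    """Pitch angle from focal length f, optical-center ordinate v0, predicted vanishing
    point ordinate v1 and lane-line intersection ordinate v2 (all in pixels)."""
    theta_init = math.atan((v1 - v0) / f)     # step (1): initial pitch angle
    delta_theta = math.atan((v2 - v1) / f)    # step (2): correction pitch angle
    return theta_init + delta_theta           # step (3): corrected pitch angle

# illustrative numbers: 1200 px focal length, 1280x720 image (v0 = 360)
print(math.degrees(pitch_angle(1200.0, 360.0, 348.0, 344.0)))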
Under the condition, the terminal firstly determines the initial pitch angle of the vehicle-mounted camera, then determines the corrected pitch angle of the vehicle-mounted camera, and then corrects the initial pitch angle by using the corrected pitch angle, so that the determined pitch angle of the vehicle-mounted camera is more accurate, and the accuracy of the pitch angle of the vehicle-mounted camera is improved.
The pitch angle determining method of the vehicle-mounted camera provided by the embodiment of the application can automatically determine the pitch angle of the vehicle-mounted camera while the vehicle is running, without any intervention by technical staff, and can therefore effectively avoid the inaccuracy in pitch angle determination caused by factors such as the proficiency and measurement errors of technical staff. In addition, the method does not require many input parameters, so that even if the internal parameters of the vehicle-mounted camera become severely distorted during subsequent use due to aging, damage, and other causes, the pitch angle of the vehicle-mounted camera can still be determined accurately.
In the embodiment of the application, the terminal acquires the first image and the second image, the first image and the second image are two adjacent frames of video images in a video shot by the vehicle-mounted camera, then the first image and the second image are input into the optical flow estimation model to output the optical flow graph through the optical flow estimation model, and the optical flow graph can accurately indicate the displacement value of pixel points belonging to the same object between the first image and the second image, so that the focal length of the vehicle-mounted camera is determined according to the optical flow graph, and the determined focal length of the vehicle-mounted camera can be more accurate. And finally, obtaining the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image, and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image. Therefore, the focal length of the vehicle-mounted camera can be accurately determined according to the optical flow diagram, the pitch angle of the vehicle-mounted camera can be determined according to the accurately determined focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image, and the accuracy of the pitch angle of the vehicle-mounted camera can be improved.
Fig. 7 is a schematic structural diagram of a pitch angle determining apparatus of a vehicle-mounted camera according to an embodiment of the present application. The pitch angle determination means of the onboard camera may be implemented by software, hardware, or a combination of both as part or all of a computer device, which may be a computer device shown in fig. 8 below. Referring to fig. 7, the apparatus includes: a first acquisition module 701, an optical flow generation module 702, a first determination module 703, a second acquisition module 704, and a second determination module 705.
A first obtaining module 701, configured to obtain a first image and a second image, where the first image is a previous video image in two adjacent video images in a video captured by a vehicle-mounted camera, and the second image is a next video image in the two video images;
an optical flow generation module 702, configured to input the first image and the second image into an optical flow estimation model to output an optical flow graph through the optical flow estimation model, where the optical flow graph is used to indicate displacement values of pixel points belonging to the same object between the first image and the second image;
a first determining module 703, configured to determine a focal length of the onboard camera according to the optical flow map;
a second obtaining module 704, configured to obtain coordinate values of an optical center in the second image and coordinate values of a vanishing point in the second image;
the second determining module 705 is configured to determine a pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image.
Optionally, the optical flow estimation model comprises a first encoding module, a second encoding module and a first decoding module;
the first coding module comprises m first convolution layers, the second coding module comprises m second convolution layers, the m first convolution layers are used for extracting a feature map of the first image, the m second convolution layers are used for extracting a feature map of the second image, and m is an integer greater than or equal to 2;
the first decoding module comprises n optical flow estimation modules, a first one of the n optical flow estimation modules is configured to: outputting a light flow diagram according to the feature maps extracted by the m first convolution layers in the m first convolution layers and the feature maps extracted by the m second convolution layers in the m second convolution layers; the ith optical flow estimation module of the n optical flow estimation modules is used for: outputting a light flow graph according to the light flow graph output by the i-1 st light flow estimation module in the n light flow estimation modules, the feature graph extracted by the m-i +1 th first convolution layer in the m first convolution layers and the feature graph extracted by the m-i +1 th second convolution layer in the m second convolution layers; the optical flow graph output by the nth optical flow estimation module in the n optical flow estimation modules is the optical flow graph output by the optical flow estimation model, n is an integer greater than or equal to 2, and i is an integer greater than or equal to 2 and less than or equal to n.
Optionally, the i-1 th optical flow estimation module is configured to:
up-sampling the optical flow graph output by the i-1 optical flow estimation module to obtain a target optical flow graph;
converting the feature map extracted from the (m-i + 1) th first convolution layer according to the target light flow map to obtain a target feature map;
determining the matching cost between the target feature map and the feature map extracted by the (m-i + 1) th second convolution layer;
outputting a light flow graph according to the target light flow graph, the matching cost and the feature graph extracted from the (m-i + 1) th second convolution layer.
Optionally, displacement values of pixel points belonging to the same object between the first image and the second image include a lateral displacement value, and the first determining module 703 is configured to:
determining the average value of the transverse displacement values of all the pixel points indicated by the optical flow diagram;
acquiring the angular velocity of a vehicle provided with a vehicle-mounted camera;
dividing the average value by the angular velocity to obtain a first ratio;
and determining the negative number of the first ratio as the focal length of the vehicle-mounted camera.
Optionally, the second obtaining module 704 is configured to:
inputting the second image into a vanishing point prediction model so as to output a vanishing point probability matrix through the vanishing point prediction model, wherein the probability value in the vanishing point probability matrix is the probability value that each pixel point in the second image is a vanishing point, and the pixel point corresponding to the maximum probability value in the vanishing point probability matrix is a vanishing point in the second image;
and determining the coordinate value of the pixel point corresponding to the maximum probability value in the vanishing point probability matrix in the second image as the coordinate value of the vanishing point in the second image.
Optionally, the second determining module 705 includes:
an identifying unit configured to identify two lane lines of a road included in the second image;
a first determination unit for determining an intersection coordinate value of the two lane lines in the second image;
and the second determining unit is used for determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
Optionally, the second determining unit is configured to:
determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
determining a correction pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the intersection point and the coordinate value of the vanishing point in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera.
Optionally, the second determining unit is configured to:
determining an initial pitch angle of the vehicle-mounted camera through the following formula according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
θ_init = arctan((v_1 − v_0) / f)

wherein θ_init is the initial pitch angle of the vehicle-mounted camera, f is the focal length of the vehicle-mounted camera, v_1 is the ordinate value among the coordinate values of the vanishing point in the second image, and v_0 is the ordinate value among the coordinate values of the optical center in the second image.
Optionally, the second determining unit is configured to:
determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera by the following formula;
θ = θ_init + Δθ

wherein θ is the pitch angle of the vehicle-mounted camera, θ_init is the initial pitch angle of the vehicle-mounted camera, and Δθ is the corrected pitch angle of the vehicle-mounted camera.
In the embodiment of the application, a first image and a second image are obtained, the first image and the second image are two adjacent frames of video images in a video shot by a vehicle-mounted camera, then the first image and the second image are input into an optical flow estimation model to output an optical flow graph through the optical flow estimation model, and the optical flow graph can accurately indicate displacement values of pixel points belonging to the same object between the first image and the second image, so that the focal length of the vehicle-mounted camera is determined according to the optical flow graph, and the determined focal length of the vehicle-mounted camera can be more accurate. And finally, obtaining the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image, and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image. Therefore, the focal length of the vehicle-mounted camera can be accurately determined according to the optical flow diagram, the pitch angle of the vehicle-mounted camera can be determined according to the accurately determined focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image, and the accuracy of the pitch angle of the vehicle-mounted camera can be improved.
It should be noted that: in the pitch angle determining apparatus for a vehicle-mounted camera provided in the above embodiment, when determining the pitch angle of the vehicle-mounted camera, only the division of the above functional modules is used for illustration, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the above described functions.
Each functional unit and module in the above embodiments may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present application.
The pitch angle determining device of the vehicle-mounted camera and the pitch angle determining method of the vehicle-mounted camera provided by the embodiment belong to the same concept, and the specific working processes and the brought technical effects of the units and the modules in the embodiment can be referred to the embodiment part of the method, and are not repeated herein.
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 8, the computer device 8 includes: a processor 80, a memory 81 and a computer program 82 stored in the memory 81 and operable on the processor 80, the processor 80 implementing the steps in the pitch angle determination method of the vehicle-mounted camera in the above-described embodiment when executing the computer program 82.
The computer device 8 may be a general purpose computer device or a special purpose computer device. In a specific implementation, the computer device 8 may be a terminal, and optionally, may be a vehicle-mounted terminal, and the embodiment of the present application does not limit the type of the computer device 8. Those skilled in the art will appreciate that fig. 8 is merely an example of the computer device 8 and does not constitute a limitation of the computer device 8, and may include more or less components than those shown, or combine certain components, or different components, such as input output devices, network access devices, etc.
The Processor 80 may be a Central Processing Unit (CPU), and the Processor 80 may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor or any conventional processor.
The storage 81 may in some embodiments be an internal storage unit of the computer device 8, such as a hard disk or a memory of the computer device 8. The memory 81 may also be an external storage device of the computer device 8 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the computer device 8. Further, the memory 81 may also include both an internal storage unit of the computer device 8 and an external storage device. The memory 81 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, and other programs. The memory 81 may also be used to temporarily store data that has been output or is to be output.
An embodiment of the present application further provides a computer device, where the computer device includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, the processor implementing the steps of any of the various method embodiments described above when executing the computer program.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the steps in the foregoing method embodiments may be implemented.
The embodiments of the present application provide a computer program product, which when run on a computer causes the computer to perform the steps of the above-described method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the above method embodiments may be implemented by a computer program, which may be stored in a computer readable storage medium and used by a processor to implement the steps of the above method embodiments. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include at least: any entity or apparatus capable of carrying computer program code to a photographing apparatus/terminal device, a recording medium, computer Memory, ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, etc. The computer-readable storage medium referred to herein may be a non-volatile storage medium, in other words, a non-transitory storage medium.
It should be understood that all or part of the steps for implementing the above embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative, and for example, a module or a unit may be divided into only one logical function, and may be implemented in other ways, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (12)

1. A pitch angle determination method of a vehicle-mounted camera, the method comprising:
acquiring a first image and a second image, wherein the first image is a previous frame video image in two adjacent frame video images in a video shot by a vehicle-mounted camera, and the second image is a next frame video image in the two frame video images;
inputting the first image and the second image into an optical flow estimation model to output an optical flow graph by the optical flow estimation model, the optical flow graph indicating displacement values of pixel points belonging to the same object between the first image and the second image;
determining the focal length of the vehicle-mounted camera according to the light flow graph;
acquiring a coordinate value of an optical center in the second image and a coordinate value of a vanishing point in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image.
2. The method of claim 1, wherein the optical flow estimation model comprises a first encoding module, a second encoding module, and a first decoding module;
the first encoding module comprises m first convolution layers, the second encoding module comprises m second convolution layers, the m first convolution layers are used for extracting a feature map of the first image, the m second convolution layers are used for extracting a feature map of the second image, and m is an integer greater than or equal to 2;
the first decoding module comprises n optical flow estimation modules, a first one of the n optical flow estimation modules is configured to: outputting a light flow graph according to the feature graph extracted from the mth first convolution layer in the m first convolution layers and the feature graph extracted from the mth second convolution layer in the m second convolution layers; an ith optical flow estimation module of the n optical flow estimation modules is configured to: outputting a light flow diagram according to the light flow diagram output by the i-1 th light flow estimation module in the n light flow estimation modules, the feature diagram extracted by the m-i +1 th first convolution layer in the m first convolution layers and the feature diagram extracted by the m-i +1 th second convolution layer in the m second convolution layers; the optical flow graph output by the nth optical flow estimation module in the n optical flow estimation modules is an optical flow graph output by the optical flow estimation model, wherein n is an integer greater than or equal to 2, and i is an integer greater than or equal to 2 and less than or equal to n.
3. The method of claim 2, wherein outputting a light flow graph according to a light flow graph output by an i-1 th of the n light flow estimation modules, an m-i +1 th of the m first convolution layers extracted feature maps, and an m-i +1 th of the m second convolution layers extracted feature maps comprises:
up-sampling the optical flow graph output by the i-1 optical flow estimation module to obtain a target optical flow graph;
according to the target light flow graph, converting the feature graph extracted from the (m-i + 1) th first convolution layer to obtain a target feature graph;
determining the matching cost between the target feature map and the feature map extracted by the (m-i + 1) th second convolution layer;
outputting a light flow graph according to the target light flow graph, the matching cost and the feature graph extracted from the (m-i + 1) th second convolution layer.
4. The method of any one of claims 1 to 3, wherein the displacement values of the pixel points belonging to the same object between the first image and the second image comprise lateral displacement values, and wherein determining the focal length of the onboard camera according to the light flow graph comprises:
determining an average value of lateral displacement values of all pixel points indicated by the optical flow graph;
acquiring the angular speed of a vehicle provided with the vehicle-mounted camera;
dividing the average value by the angular velocity to obtain a first ratio;
determining a negative of the first ratio as a focal length of the onboard camera.
5. The method according to any one of claims 1 to 3, wherein the obtaining of coordinate values of vanishing points in the second image comprises:
inputting the second image into a vanishing point prediction model so as to output a vanishing point probability matrix through the vanishing point prediction model, wherein the probability value in the vanishing point probability matrix is the probability value that each pixel point in the second image is a vanishing point, and the pixel point corresponding to the maximum probability value in the vanishing point probability matrix is a vanishing point in the second image;
and determining the coordinate value of the pixel point corresponding to the maximum probability value in the vanishing point probability matrix in the second image as the coordinate value of the vanishing point in the second image.
6. The method of any one of claims 1 to 3, wherein determining the pitch angle of the vehicle-mounted camera based on the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image comprises:
identifying two lane lines of the road contained in the second image;
determining the coordinate value of the intersection point of the two lane lines in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the intersection point coordinate value, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image.
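For the intersection step of claim 6, a minimal sketch is given below: each detected lane line is represented by two image points and the two lines are intersected in homogeneous coordinates. Lane detection itself is outside this sketch, and the two-point line representation is an assumption made purely for illustration.

```python
import numpy as np

def line_intersection(p1, p2, q1, q2):
    """Intersection of line (p1, p2) and line (q1, q2); each point is (u, v) in pixels."""
    # Homogeneous line through two points is their cross product.
    l1 = np.cross([*p1, 1.0], [*p2, 1.0])
    l2 = np.cross([*q1, 1.0], [*q2, 1.0])
    x = np.cross(l1, l2)                  # intersection in homogeneous coordinates
    if abs(x[2]) < 1e-9:
        raise ValueError("lane lines are parallel in the image")
    return float(x[0] / x[2]), float(x[1] / x[2])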
7. The method of claim 6, wherein determining the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the intersection point coordinate value, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image comprises:
determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
determining a corrected pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the intersection point coordinate value and the coordinate value of the vanishing point in the second image;
and determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera.
8. The method of claim 7, wherein determining an initial pitch angle of the vehicle-mounted camera based on the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image comprises:
determining an initial pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image by the following formula;
θ_init = arctan((v_1 − v_0) / f)
wherein θ_init is the initial pitch angle of the vehicle-mounted camera, f is the focal length of the vehicle-mounted camera, v_1 is the ordinate value in the coordinate value of the vanishing point in the second image, and v_0 is the ordinate value in the coordinate value of the optical center in the second image.
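A quick numeric check of the claim 8 formula under the reconstruction above: with an assumed focal length of 1000 px, a vanishing point ordinate of 540 px and an optical center ordinate of 500 px (illustrative values, not taken from the patent), the initial pitch is arctan((540 − 500) / 1000) ≈ 0.04 rad ≈ 2.29°.

```python
import math

theta_init = math.atan((540 - 500) / 1000)   # illustrative values only
print(round(math.degrees(theta_init), 2))    # -> 2.29
```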
9. The method of claim 7, wherein determining the pitch angle of the vehicle-mounted camera from the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera comprises:
determining the pitch angle of the vehicle-mounted camera according to the initial pitch angle and the corrected pitch angle of the vehicle-mounted camera by the following formula;
θ = θ_init + Δθ
wherein θ is the pitch angle of the vehicle-mounted camera, θ_init is the initial pitch angle of the vehicle-mounted camera, and Δθ is the corrected pitch angle of the vehicle-mounted camera.
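Putting claims 7 to 9 together, a minimal sketch is shown below. The claims do not spell out the formula for the corrected pitch angle Δθ; here it is illustrated, purely as an assumption, as the angle subtended at the estimated focal length by the vertical offset between the lane-line intersection and the vanishing point. Only the θ = θ_init + Δθ summation is taken directly from claim 9.

```python
import math

def pitch_angle(f: float, v_vanishing: float, v_optical_center: float, v_intersection: float) -> float:
    theta_init = math.atan((v_vanishing - v_optical_center) / f)     # claim 8 (reconstructed form)
    delta_theta = math.atan((v_intersection - v_vanishing) / f)      # assumed form of the correction (claim 7)
    return theta_init + delta_theta                                  # claim 9: theta = theta_init + delta_theta
```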
10. A pitch angle determination apparatus for a vehicle-mounted camera, characterized by comprising:
a first acquisition module, configured to acquire a first image and a second image, wherein the first image is the earlier frame of two adjacent video frames in a video captured by the vehicle-mounted camera, and the second image is the later frame of the two video frames;
an optical flow generation module, configured to input the first image and the second image into an optical flow estimation model to output an optical flow map through the optical flow estimation model, wherein the optical flow map is used to indicate displacement values of pixel points belonging to the same object between the first image and the second image;
a first determination module, configured to determine the focal length of the vehicle-mounted camera according to the optical flow map;
a second acquisition module, configured to acquire the coordinate value of the optical center in the second image and the coordinate value of the vanishing point in the second image;
and a second determination module, configured to determine the pitch angle of the vehicle-mounted camera according to the focal length of the vehicle-mounted camera, the coordinate value of the optical center in the second image, and the coordinate value of the vanishing point in the second image.
11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the method of any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202210580410.2A 2022-05-26 2022-05-26 Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium Pending CN115018926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210580410.2A CN115018926A (en) 2022-05-26 2022-05-26 Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210580410.2A CN115018926A (en) 2022-05-26 2022-05-26 Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium

Publications (1)

Publication Number Publication Date
CN115018926A true CN115018926A (en) 2022-09-06

Family

ID=83071257

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210580410.2A Pending CN115018926A (en) 2022-05-26 2022-05-26 Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium

Country Status (1)

Country Link
CN (1) CN115018926A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116193108A (en) * 2023-04-24 2023-05-30 知行汽车科技(苏州)股份有限公司 Online self-calibration method, device, equipment and medium for camera

Similar Documents

Publication Publication Date Title
WO2022126377A1 (en) Traffic lane line detection method and apparatus, and terminal device and readable storage medium
CN112528878A (en) Method and device for detecting lane line, terminal device and readable storage medium
CN111830953A (en) Vehicle self-positioning method, device and system
CN111191611B (en) Traffic sign label identification method based on deep learning
CN111551167B (en) Global navigation auxiliary method based on unmanned aerial vehicle shooting and semantic segmentation
CN104077760A (en) Rapid splicing system for aerial photogrammetry and implementing method thereof
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN107895375A (en) The complicated Road extracting method of view-based access control model multiple features
CN113158768A (en) Intelligent vehicle lane line detection method based on ResNeSt and self-attention distillation
CN113313047B (en) Lane line detection method and system based on lane structure prior
CN111383204A (en) Video image fusion method, fusion device, panoramic monitoring system and storage medium
CN116453121B (en) Training method and device for lane line recognition model
CN111382625A (en) Road sign identification method and device and electronic equipment
CN116012817A (en) Real-time panoramic parking space detection method and device based on double-network deep learning
CN115018926A (en) Method, device and equipment for determining pitch angle of vehicle-mounted camera and storage medium
CN109523570B (en) Motion parameter calculation method and device
CN111488762A (en) Lane-level positioning method and device and positioning equipment
Barua et al. An Efficient Method of Lane Detection and Tracking for Highway Safety
CN115457780B (en) Vehicle flow and velocity automatic measuring and calculating method and system based on priori knowledge set
CN116883802A (en) Multispectral camera and radar feature level data fusion method and system
CN116630917A (en) Lane line detection method
CN116580230A (en) Target detection method and training method of classification model
CN115565155A (en) Training method of neural network model, generation method of vehicle view and vehicle
CN116917936A (en) External parameter calibration method and device for binocular camera
US20240177499A1 (en) Method for detecting lane lines and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination