CN114723779A - Vehicle positioning method and device and computer readable storage medium

Vehicle positioning method and device and computer readable storage medium

Info

Publication number
CN114723779A
CN114723779A
Authority
CN
China
Prior art keywords
vehicle
image
current
frame
pose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110016641.6A
Other languages
Chinese (zh)
Inventor
关倩仪
苏威霖
王建明
陈泽武
张力锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Automobile Group Co Ltd
Original Assignee
Guangzhou Automobile Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Automobile Group Co Ltd filed Critical Guangzhou Automobile Group Co Ltd
Priority to CN202110016641.6A priority Critical patent/CN114723779A/en
Publication of CN114723779A publication Critical patent/CN114723779A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30248 Vehicle exterior or interior
    • G06T2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vehicle positioning method, a vehicle positioning device and a computer-readable storage medium, wherein the method comprises the following steps: acquiring a driving scene image and pose information of a vehicle in the current driving state; extracting semantic features of the current driving scene image by using a semantic segmentation network, wherein the semantic features comprise scene features that are relatively stable in space and time; and performing loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map. By combining a visual inertial odometer with the stable scene features extracted by the semantic segmentation network, the method effectively improves the robustness and accuracy of vehicle positioning and achieves a more stable and accurate vehicle relocalization result in dynamic scenes.

Description

Vehicle positioning method and device and computer readable storage medium
Technical Field
The present invention relates to the field of positioning and navigation technologies, and in particular, to a vehicle positioning method and apparatus, and a computer-readable storage medium.
Background
In recent years, automatic driving technology has developed rapidly, and automatic parking has become one of the focal points of research and development in the field of automatic driving. No longer limited to the parking maneuver itself, automatic parking has expanded into a comprehensive parking system comprising autonomous low-speed cruising, parking-space search, parking, and summon response.
The existing automatic parking technology is mainly realized by a vehicle positioning algorithm based on visual SLAM: an on-board camera acquires images, from which visual feature points are extracted and tracked, and the pose of the vehicle itself and the spatial positions of the feature points are estimated; global pose optimization is then carried out using loop detection to realize vehicle positioning, while the feature-point information and pose information are stored to construct a map that can be loaded and reused when the vehicle is subsequently relocalized.
However, in practical applications, the areas of a parking lot where visual feature points are concentrated may change over time, for example because of parked vehicles or moving pedestrians, so that after a period of time the vehicle can no longer be relocalized at the current moment using a map constructed by the prior art. Moreover, the feature points and vehicle pose information acquired through a camera sensor mounted on the vehicle are strongly affected by the measurement range of the camera, the ambient lighting of the parking lot, and rapid motion, and the accuracy of the resulting vehicle positioning is often low.
Disclosure of Invention
The invention provides a vehicle positioning and map-building method and system, which realize vehicle relocalization and map construction through a visual inertial odometer and a semantic segmentation network model, thereby solving the problems of low vehicle positioning accuracy and poor relocalization performance in the prior art.
In a first aspect, an embodiment of the present invention provides a vehicle positioning method, including:
acquiring a driving scene image and pose information of a vehicle in a current driving state;
extracting semantic features of the current driving scene image by using a semantic segmentation network, wherein the semantic features comprise scene features which are relatively stable in space and time;
and performing loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map.
In one embodiment, the pose information is output by a visual inertial odometer and is associated with a key frame of the driving scene image; the driving scene image is acquired by a vehicle-mounted forward-looking camera; the visual inertial odometer comprises a vehicle-mounted surround-view fisheye camera and an IMU.
In one embodiment, the method further comprises: extracting the pose information corresponding to the current frame and the key frames of the scene images acquired from the vehicle-mounted surround-view fisheye camera, and image-coding the differences between the scene images according to the pose information;
when the result returned by the visual inertial odometer is normal: if the current frame is a key frame, storing the image and pose information of the current frame; if the current frame is not a key frame, discarding it and reading the next frame;
and when the result returned by the visual inertial odometer is abnormal: matching, according to an image-coding threshold, the key frame in the key-frame set that is most similar to the current frame and re-solving the visual inertial odometry from it; if no matching key frame exists, discarding the current frame and reading the next frame.
In one embodiment, after the visual inertial odometer returns a normal result and the current frame is determined to be a key frame, the method further includes:
performing trajectory alignment between the visual information of the key frame and the IMU pre-integration result in a loosely coupled manner, and solving the visual feature-point information and the IMU state-variable information by a nonlinear method;
and performing nonlinear optimization on the feature-point information and the IMU state-variable information to obtain the pose information of the scene image at the current moment.
In one embodiment, the extracting semantic features of the current driving scene image by using a semantic segmentation network includes:
inputting the current driving scene image into a semantic segmentation network, identifying static features, semi-static features and dynamic features in scene features of the current driving scene image, and generating a mask for eliminating the semi-static features and the dynamic features;
and outputting, under the action of the mask, the static features as the semantic features.
In one embodiment, the driving scene image in the current driving state includes a key frame image acquired by a current forward-looking undistorted camera;
the performing loop detection on the semantic features against the loaded, most recently updated historical map comprises:
extracting static features of the key frame image acquired by the current forward-looking undistorted camera, wherein the static features comprise the spatial coordinates of the key-frame image feature points;
extracting the BRIEF descriptors corresponding to the feature points, converting the descriptors into word vectors of a bag-of-words model, and then performing similarity matching between these word vectors and the feature-point word vectors of the corresponding historical frames in the historical map;
and judging whether a key frame acquired by the current forward-looking undistorted camera and a corresponding historical frame in the historical map form a loop or not according to a preset similarity threshold.
In one embodiment, the performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map includes:
solving for the change of the vehicle pose at the current moment relative to the historical moment according to the spatial coordinates of the matched feature points of the key frame acquired by the current forward-looking undistorted camera and of the historical frame, together with the extrinsic parameters between the corresponding camera and the IMU; and performing nonlinear global pose graph optimization and updating the map by minimizing the pose residual between the two loop frames and the pose residuals between all key frames along the global trajectory.
In a second aspect, an embodiment of the present invention further provides a vehicle positioning apparatus, including:
the driving data acquisition unit is used for acquiring a driving scene image and pose information of the vehicle in the current driving state;
the semantic feature extraction unit is used for extracting semantic features of the current driving scene image by utilizing a semantic segmentation network, wherein the semantic features comprise scene features which are relatively stable in space and time;
and the positioning unit is used for performing loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map.
In one embodiment, the pose information is output by a visual inertial odometer and is associated with a key frame of the driving scene image; the driving scene image is acquired by a vehicle-mounted forward-looking camera; the visual inertial odometer comprises a vehicle-mounted surround-view fisheye camera and an IMU.
In one embodiment, the driving data acquisition unit is specifically configured to: extract the pose information corresponding to the current frame and the key frames of the scene images acquired from the vehicle-mounted surround-view fisheye camera, and image-code the differences between the scene images according to the pose information;
when the result returned by the visual inertial odometer is normal: if the current frame is a key frame, store the image and pose information of the current frame; if the current frame is not a key frame, discard it and read the next frame;
and when the result returned by the visual inertial odometer is abnormal: match, according to the image-coding threshold, the key frame in the key-frame set that is most similar to the current frame and re-solve the visual inertial odometry from it; if no matching key frame exists, discard the current frame and read the next frame.
In one embodiment, the driving data acquisition unit is further configured to, after the visual inertial odometer returns a normal result and the current frame is determined to be a key frame:
perform trajectory alignment between the visual information of the key frame and the IMU pre-integration result in a loosely coupled manner, and solve the visual feature-point information and the IMU state-variable information by a nonlinear method;
and perform nonlinear optimization on the feature-point information and the IMU state-variable information to obtain the pose information of the scene image at the current moment.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of the above embodiments.
Compared with the prior art, the embodiments of the invention have the following beneficial effects:
the vehicle positioning method, device and computer-readable storage medium provided by the invention combine four surround-view fisheye cameras with IMU information to obtain the vehicle pose information, and on this basis introduce a deep-learning semantic segmentation network to obtain the relatively stable static regions of the environment, so that the positioning result retains only static feature points. This improves the robustness and accuracy of the positioning method and yields a more stable vehicle relocalization result.
Drawings
In order to illustrate the technical solution of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart diagram of a vehicle locating method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the distribution positions of the vehicle-mounted sensors provided by the embodiment of the invention;
fig. 3 is a schematic structural diagram of a vehicle positioning device provided in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be understood that the step numbers used herein are only for convenience of description and are not used as limitations on the order in which the steps are performed.
It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The term "and/or" refers to and includes any and all possible combinations of one or more of the associated listed items.
As shown in fig. 1, an embodiment of the present invention provides a vehicle positioning method, which specifically includes the following steps:
s11: and acquiring a driving scene image and pose information of the vehicle in the current driving state.
In this embodiment, the driving scene image is acquired by a vehicle-mounted forward-looking undistorted camera.
Generally, the on-board sensors mounted on an automobile include four surround-view fisheye cameras, an inertial measurement unit (IMU), and a forward-looking undistorted camera. As shown in fig. 2, the four fisheye cameras are installed in the same positions as those used for an ordinary car's 360-degree surround-view imaging function: near the front and rear logos of the car and below the left and right side rearview mirrors, facing outward. The IMU sensor is arranged at the center of the rear axle of the vehicle. The forward-looking undistorted camera is arranged behind the rearview mirror below the windshield, facing straight ahead.
Specifically, the intrinsic parameters of the on-board sensors and the extrinsic parameters of the system can be obtained with a calibration tool. The intrinsic parameters comprise the pixel center, focal length, and distortion coefficients of each on-board camera, as well as the white-noise and random-walk noise errors of the IMU accelerometer and gyroscope. The extrinsic parameters of the system comprise the rotation and translation matrices between each sensor coordinate system and the vehicle coordinate system; the position of the IMU is set as the origin of the vehicle coordinate system, and the rotation and translation matrices between each of the four fisheye cameras and the IMU, and between the forward-looking camera and the IMU, are calibrated respectively.
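For illustration, a minimal sketch of how such calibrated extrinsics can be applied is given below; the numerical values and the helper name cam_to_vehicle are assumptions for illustration, not part of the embodiment.

```python
import numpy as np

# Hypothetical calibrated extrinsics for one camera: a point expressed in the
# camera coordinate system maps into the vehicle (IMU-origin) coordinate
# system as p_vehicle = R_ci @ p_cam + T_ci. Values here are placeholders.
R_ci = np.eye(3)                      # rotation matrix, camera -> IMU
T_ci = np.array([1.8, 0.0, 0.5])      # translation in metres

def cam_to_vehicle(p_cam):
    """Transform a 3D point from the camera frame to the vehicle frame,
    whose origin is placed at the IMU as described above."""
    return R_ci @ p_cam + T_ci

print(cam_to_vehicle(np.array([0.0, 0.0, 4.0])))  # a point 4 m ahead of the camera
```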
In a particular embodiment, the pose information is output by the visual inertial odometer and is associated with the key frames of the driving scene images; the driving scene image is acquired by the vehicle-mounted forward-looking camera; the visual inertial odometer comprises the vehicle-mounted surround-view fisheye cameras and the IMU.
In this embodiment, the pose information corresponding to the current frame and the key frames of the scene images acquired from the vehicle-mounted surround-view fisheye cameras is extracted, and the differences between the scene images are image-coded according to the pose information.
When the result returned by the visual inertial odometer is normal: if the current frame is a key frame, the image and pose information of the current frame are stored; if the current frame is not a key frame, it is discarded and the next frame is read.
When the result returned by the visual inertial odometer is abnormal: the key frame in the key-frame set that is most similar to the current frame is matched according to the image-coding threshold, and the visual inertial odometry is re-solved from it; if no matching key frame exists, the current frame is discarded and the next frame is read.
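The frame-handling logic just described can be sketched as follows; the odometer interface (vio.process, vio.reinitialize_from), the encode helper, and the scalar image code are assumptions for illustration, since the embodiment does not specify these internals.

```python
# Sketch of the frame-handling logic above, under assumed interfaces:
# vio.process() returns (status, pose); encode() yields the scalar image code
# used to compare frames; is_key_frame() applies the key-frame criterion.
keyframes = []  # stored (image, pose, code) triples

def handle_frame(vio, frame, encode, is_key_frame, code_threshold):
    status, pose = vio.process(frame)
    if status == "normal":
        if is_key_frame(frame, pose):
            keyframes.append((frame, pose, encode(frame, pose)))
        return pose                      # non-key frames are simply discarded
    # Abnormal return: find the stored key frame most similar to the current
    # frame by image code and re-solve the odometry from it.
    code = encode(frame, None)
    best = min(keyframes, key=lambda kf: abs(kf[2] - code), default=None)
    if best is not None and abs(best[2] - code) < code_threshold:
        vio.reinitialize_from(best[1])
        return best[1]
    return None                          # no match: discard and read the next frame
```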
After the visual inertial odometer returns a normal result and the current frame is determined to be a key frame, the visual information of the key frame is trajectory-aligned with the IMU pre-integration result in a loosely coupled manner, and the visual feature-point information and the IMU state-variable information are solved by a nonlinear method; the feature-point information and the IMU state variables are then nonlinearly optimized to obtain the pose information of the scene image at the current moment.
In this embodiment, the visual inertial odometer is preferably implemented by extending the open-source monocular visual-inertial fusion localization algorithm VINS-Mono to obtain the vehicle pose information. The specific steps of this preferred implementation are as follows:
First, sparse edge features are extracted from the images acquired by the four fisheye cameras; these edge features are mainly concentrated on static marks such as parked vehicles and the parking-lot environment.
When the acquired image is the first frame, N Harris feature points are extracted from each image, with the pixel distance between feature points no less than d, where N is a preset hyperparameter for the number of feature points and d is a hyperparameter for the minimum pixel distance between feature points.
When the acquired image is not the first frame, the pixel coordinates of the feature points of the previous frame are tracked among the feature points of the current frame by the LK optical flow method. If the number m of tracked feature points is less than the preset hyperparameter N, (N-m) new Harris feature points are extracted from the current frame, keeping the number of feature points of the current frame at N.
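A sketch of this extract-and-track loop using OpenCV is shown below; the values N = 150 and d = 20 are illustrative stand-ins for the hyperparameters, which the embodiment leaves unspecified.

```python
import cv2
import numpy as np

N, d = 150, 20  # assumed values for the feature-count and minimum-distance hyperparameters

def track_features(prev_gray, cur_gray, prev_pts):
    """Track the previous frame's Harris corners into the current frame with
    LK optical flow, then top the set back up to N points."""
    if prev_pts is None:  # first frame: N Harris corners at least d pixels apart
        return cv2.goodFeaturesToTrack(cur_gray, maxCorners=N, qualityLevel=0.01,
                                       minDistance=d, useHarrisDetector=True)
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, prev_pts, None)
    tracked = nxt[status.ravel() == 1].reshape(-1, 1, 2)
    m = len(tracked)
    if m < N:  # replenish (N - m) fresh Harris corners
        extra = cv2.goodFeaturesToTrack(cur_gray, maxCorners=N - m, qualityLevel=0.01,
                                        minDistance=d, useHarrisDetector=True)
        if extra is not None:
            tracked = np.vstack([tracked, extra.astype(tracked.dtype)])
    return tracked
```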
The pixel coordinates of the feature points tracked between the two frames and the system extrinsic parameters Ri and Ti of the corresponding camera are acquired (Ri is the rotation parameter and Ti the translation parameter, i = 1, 2, 3, 4); the feature-point pixel coordinates and the extrinsic parameters are triangulated according to the epipolar geometric relationship, and the spatial coordinates of the feature points tracked in each image and the pose information of the vehicle at the current frame in the world coordinate system are computed.
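The triangulation step admits a compact two-view sketch with OpenCV; the world-to-camera convention for (Ri, Ti) and the shared intrinsic matrix K are simplifying assumptions, since the embodiment does not spell out its exact parameterization.

```python
import cv2
import numpy as np

def triangulate(K, R1, T1, R2, T2, pts1, pts2):
    """Triangulate pixel correspondences (pts1, pts2: Nx2 arrays) seen from two
    camera poses into 3D world points. (Ri, Ti) are taken as world-to-camera."""
    P1 = K @ np.hstack([R1, T1.reshape(3, 1)])   # 3x4 projection matrix, view 1
    P2 = K @ np.hstack([R2, T2.reshape(3, 1)])   # 3x4 projection matrix, view 2
    X_h = cv2.triangulatePoints(P1, P2, pts1.T.astype(np.float64),
                                pts2.T.astype(np.float64))
    return (X_h[:3] / X_h[3]).T                  # dehomogenize to Nx3
```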
The visual feature-point information corresponding to each frame captured by the four fisheye cameras is inserted into a sliding window until the number of key frames in the window equals the preset sliding-window-size hyperparameter W; PnP is then used to solve the spatial coordinates of the feature points observed by all key frames in the window and the vehicle poses of the key frames at the corresponding moments, completing the visual initialization.
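The PnP solve used for this initialization can be sketched as follows; the intrinsics and the placeholder correspondences are assumptions for illustration, with real triangulated points and key-frame observations taking their place.

```python
import cv2
import numpy as np

# Placeholder correspondences; in the pipeline these are the triangulated
# 3D feature points and their 2D observations in a key frame.
object_pts = np.random.rand(20, 3).astype(np.float32)
image_pts = np.random.rand(20, 2).astype(np.float32) * 300
K = np.array([[320.0, 0.0, 320.0],   # assumed pinhole intrinsics
              [0.0, 320.0, 180.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, distCoeffs=None)
R, _ = cv2.Rodrigues(rvec)  # (R, tvec) is the key frame's world-to-camera pose
```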
The visual feature-point information is aligned with the state vector obtained by IMU pre-integration, and the state-variable estimates of the tightly coupled visual-IMU system are solved iteratively by a nonlinear method based on the principle of minimizing the visual reprojection error and the IMU observation residual. The state variables comprise the poses of the key frames in the sliding window, the spatial positions of the visual feature points observed by the key frames, the IMU biases, and the state-estimation covariance matrix.
When a new frame is inserted into the sliding window, whether the latest frame is a key frame is further determined by judging whether the average parallax of the feature points tracked between the two frames meets a preset threshold. If so, the first frame in the sliding window is marginalized; if not, the penultimate frame is marginalized, and the new frame is inserted into the sliding window.
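The marginalization decision can be summarized in a few lines; the threshold value is an assumed stand-in for the preset threshold, and the rule is a simplification of the VINS-Mono policy referenced above.

```python
import numpy as np

PARALLAX_THRESH = 10.0  # pixels; assumed value for the preset parallax threshold

def marginalization_decision(prev_pts, cur_pts):
    """Apply the average-parallax rule: enough parallax means the frame is kept
    as a key frame and the oldest window frame is marginalized; otherwise the
    penultimate (near-duplicate) frame is marginalized instead."""
    parallax = np.linalg.norm(cur_pts - prev_pts, axis=1).mean()
    return "marginalize_oldest" if parallax >= PARALLAX_THRESH \
        else "marginalize_penultimate"
```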
When a new key frame is inserted into the sliding window, the visual feature-point information and the IMU state variables of all key frames in the window are nonlinearly optimized to obtain the optimal estimate of the system state variables, which is the vehicle pose information at the current moment, specifically the vehicle position P and the vehicle attitude R.
The visual inertial odometer provided by this embodiment effectively compensates for the instability of vision caused by illumination, dynamic objects, and rapid motion, thereby further improving the positioning accuracy of the vehicle.
S12: extracting semantic features of the current driving scene image by using a semantic segmentation network, wherein the semantic features comprise scene features that are relatively stable in space and time.
In this embodiment, the semantic features exclude the regions of the scene image that change after a period of time, such as moving vehicles and pedestrians as well as statically parked vehicles.
In a specific embodiment, the current driving scene image may be input into a semantic segmentation network, the static, semi-static, and dynamic features among the scene features of the current driving scene image are identified, and a mask for rejecting the semi-static and dynamic features is generated; under the action of the mask, the static features are output as the semantic features.
The semantic segmentation network is preferably a U-Net, which is built as follows:
scene data collected by the forward-looking undistorted camera are extracted, the static regions in the scene data are labeled, a training data set for the semantic segmentation network model is obtained from the scene data and the labels, the semantic segmentation network model is trained with this data set, and the image mask is output.
Specifically, the forward-looking undistorted camera is used to acquire scene images of an underground parking lot. The static environment regions of the underground parking lot in the scene images, including walls, pillars, and markings, are defined as the valid region; regions whose position changes over time, such as pedestrians and moving vehicles, are defined as dynamic regions; and regions that change after a period of time, such as statically parked cars, are defined as semi-static regions.
The valid regions of the images obtained by the forward-looking undistorted camera are then labeled to generate image mask labels; specifically, the label value of the valid region is set to 0 and the label value of the dynamic region is set to 255, yielding the training data set, and the U-Net is trained with the labeled training data set.
In this embodiment, the resolution of the original images captured by the forward-looking undistorted camera is 640 × 360; the input images may be scaled to 0.5 times the original size, and after dynamic-scene segmentation by the trained U-Net, a mask image of size 320 × 180 is output.
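An inference-side sketch of this step is given below; the tiny stub network stands in for the trained U-Net, and the output mask is inverted relative to the label convention above (255 marks the static region) so that it can be passed directly to OpenCV routines, which treat nonzero mask pixels as allowed.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn

# Stub standing in for the trained U-Net; the real network is trained on the
# labelled parking-lot data set described above.
unet = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())

def segment_static_mask(bgr_frame):
    """Scale a 640x360 frame by 0.5, run the segmentation network, and return
    a 320x180 uint8 mask with 255 in the static (valid) region."""
    small = cv2.resize(bgr_frame, (320, 180))
    x = torch.from_numpy(small).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        prob = unet(x)[0, 0].numpy()  # per-pixel probability of "static"
    return np.where(prob > 0.5, 255, 0).astype(np.uint8)
```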
After dynamic-scene segmentation with the U-Net semantic segmentation network model, a mask that retains only the static region is generated; extracting image feature points within the mask region yields the semantic features of the scene image. Since these are fixed static features that do not easily change as parked vehicles come and go, they effectively improve the vehicle relocalization accuracy.
In a specific embodiment, after the vehicle pose information is obtained through the visual inertial odometer, the spatial positions of the feature points of the current scene image can be obtained by triangulating the key-frame pose information of the vehicle, the feature-point coordinates of the current scene image obtained through the semantic segmentation network model, and the system extrinsic parameters, thereby associating the feature points with the vehicle pose information.
Specifically, a scene image captured by the forward-looking undistorted camera is obtained and scaled to 320 × 180, the scaled image is input into the trained semantic segmentation network, and an image mask is output; the mask retains only the valid region of the image, with a value of 0 in the valid region and 255 in the dynamic region.
The image feature points are then extracted using the image mask: when the image is the first frame, Ns Harris feature points are extracted from the image, with the distance between feature points no less than ds, where Ns is the preset hyperparameter for the number of feature points in the valid region of the forward-looking camera and ds is the corresponding hyperparameter for the minimum pixel distance between feature points; when the image is not the first frame, the pixel coordinates of the feature points of the previous frame are tracked among the feature points of the current frame by the LK optical flow method, and if the number of tracked feature points ms is less than Ns, (Ns - ms) new Harris feature points are extracted from the current image, keeping the number of feature points of the current frame at Ns.
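Restricting the feature extraction to the static region is a one-call operation in OpenCV, as sketched below; the values of Ns and ds are assumed, since the embodiment leaves them as hyperparameters.

```python
import cv2

Ns, ds = 100, 15  # assumed values for the hyperparameters Ns and ds

def extract_static_features(gray_320x180, static_mask):
    """Extract Harris corners only inside the static region: the `mask`
    argument makes OpenCV ignore pixels where the mask is zero, so the
    255-valued static mask from the segmentation step is passed directly."""
    return cv2.goodFeaturesToTrack(gray_320x180, maxCorners=Ns, qualityLevel=0.01,
                                   minDistance=ds, mask=static_mask,
                                   useHarrisDetector=True)
```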
After the initialization of the visual inertial odometer is completed, each frame is triangulated using the vehicle pose information P and R of the key frame and the extrinsic parameters describing the relative position between the forward-looking camera and the IMU, so that the key-frame vehicle pose information is associated with the coordinates of the image feature points observed by the forward-looking camera, and the spatial coordinates of the current image feature points are obtained.
S13: performing loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map.
In a specific embodiment, when the vehicle is positioned for the first time, the static features of the key frame image acquired by the current forward-looking undistorted camera are extracted, the static features comprising the spatial coordinates of the key-frame image feature points; the BRIEF descriptors corresponding to the feature points are extracted and converted into word vectors of a bag-of-words model, and similarity matching is performed between these word vectors and the feature-point word vectors of the historical frames in the historical map; whether the key frame acquired by the current forward-looking undistorted camera and a historical frame in the historical map form a loop is then judged according to a preset similarity threshold. If so, the global pose graph is optimized; if not, no correction of the global pose graph is needed, and the vehicle positioning is thereby completed.
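A self-contained sketch of the bag-of-words matching is given below. ORB descriptors (an oriented variant of BRIEF) stand in for the BRIEF descriptors of the text, and the random vocabulary is a placeholder for one trained offline; both substitutions are assumptions for illustration.

```python
import cv2
import numpy as np

K = 64  # vocabulary size (assumed); real vocabularies are trained offline
rng = np.random.default_rng(0)
vocabulary = rng.integers(0, 256, size=(K, 32), dtype=np.uint8)  # placeholder words

orb = cv2.ORB_create()

def bow_vector(gray, mask=None):
    """Describe an image as a normalized histogram of visual words."""
    _, desc = orb.detectAndCompute(gray, mask)
    hist = np.zeros(K)
    if desc is None:
        return hist
    for d in desc:  # assign each descriptor to its nearest word (Hamming distance)
        dists = np.unpackbits(vocabulary ^ d, axis=1).sum(axis=1)
        hist[dists.argmin()] += 1
    return hist / max(hist.sum(), 1)

def is_loop(vec_cur, vec_hist, threshold=0.8):
    """Similarity matching of word vectors against a preset threshold."""
    denom = np.linalg.norm(vec_cur) * np.linalg.norm(vec_hist)
    return denom > 0 and float(vec_cur @ vec_hist) / denom > threshold
```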
The global pose graph optimization comprises: solving for the change of the vehicle pose at the current moment relative to the historical moment according to the spatial coordinates of the matched feature points in the two images (the key frame acquired by the current forward-looking undistorted camera and the historical frame in the historical map) and the extrinsic parameters between the corresponding camera and the IMU (the rotation parameter Ri and the translation parameter Ti, i = 1, 2, 3, 4), and performing nonlinear global pose graph optimization by minimizing the pose residual between the two loop frames and the pose residuals between all key frames along the global trajectory. That is, the global trajectory is corrected using the relative pose relationship of the loop, eliminating the accumulated error of the visual inertial odometer.
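The structure of this optimization can be illustrated on a 2D toy problem: odometry edges chain consecutive key-frame poses, one loop edge ties the revisited pose back to its historical counterpart, and the residuals over both edge types are minimized jointly. The SE(2) parameterization and the scipy solver are simplifying assumptions; a production system would optimize over SE(3) with a solver such as Ceres or g2o.

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose(pa, pb):
    """Express pose pb = (x, y, yaw) in the frame of pose pa."""
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    c, s = np.cos(pa[2]), np.sin(pa[2])
    return np.array([c * dx + s * dy, -s * dx + c * dy, pb[2] - pa[2]])

def residuals(flat_poses, odom_meas, loop):
    poses = flat_poses.reshape(-1, 3)
    res = [relative_pose(poses[i], poses[i + 1]) - m      # odometry edges
           for i, m in enumerate(odom_meas)]
    i, j, meas = loop
    res.append(relative_pose(poses[i], poses[j]) - meas)  # loop-closure edge
    res.append(poses[0])                                  # anchor the first pose
    return np.concatenate(res)

odom = [np.array([1.0, 0.0, np.pi / 2])] * 4               # drive a 1 m square
steps = np.random.normal([1.0, 0.0, np.pi / 2], 0.05, (4, 3))
init = np.vstack([np.zeros(3), np.cumsum(steps, axis=0)])  # drifted initial guess
loop = (0, 4, np.zeros(3))                                 # frame 4 revisits frame 0
sol = least_squares(residuals, init.ravel(), args=(odom, loop))
optimized = sol.x.reshape(-1, 3)                           # drift-corrected trajectory
```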
In this embodiment, the vehicle pose information at the moment corresponding to each key frame after global pose graph optimization, together with the observed image feature-point word vectors, is stored as binary files, and the map constructed from these files can be loaded for vehicle relocalization.
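A minimal sketch of such a binary map file follows; the per-key-frame record layout is an assumption for illustration, since the embodiment only states that poses and word vectors are serialized.

```python
import pickle

def save_map(path, keyframes):
    """keyframes: list of records such as {"pose": (P, R), "bow": word_vector},
    one per key frame after global pose graph optimization."""
    with open(path, "wb") as f:
        pickle.dump(keyframes, f)

def load_map(path):
    """Load a previously saved map for vehicle relocalization."""
    with open(path, "rb") as f:
        return pickle.load(f)
```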
When the vehicle is relocalized, the historical map is loaded and the associated images are matched against the map data through loop detection; when the matching result forms a loop, the vehicle pose information is corrected according to the relative pose relationship between the associated images and the map data, and the map is updated with the corrected vehicle pose information and the associated images.
Specifically, before the vehicle enters the parking lot again, the most recently updated historical map is loaded; the BRIEF descriptors corresponding to the static feature points of the key frame image acquired by the current forward-looking undistorted camera are extracted and converted into word vectors of the bag-of-words model; similarity matching is performed between these word vectors and the feature-point word vectors of the loaded map; whether the current key frame and a map key frame form a loop is judged according to the preset similarity threshold; and the global pose graph is optimized according to the loop result, thereby completing the vehicle relocalization.
This embodiment optimizes the global pose graph through loop detection and eliminates the accumulated drift of the visual inertial odometer; at the same time, compared with the prior art, a map constructed from the key-frame poses along the vehicle's global driving trajectory and the corresponding feature-point information achieves a higher matching success rate during vehicle relocalization.
As shown in fig. 3, another embodiment of the present invention further provides a vehicle positioning device, which includes a driving data collecting unit 101, a semantic feature extracting unit 102, and a positioning unit 103.
The driving data acquisition unit 101 is configured to acquire a driving scene image and pose information of the vehicle in a current driving state.
The semantic feature extraction unit 102 is configured to extract semantic features of the current driving scene image by using a semantic segmentation network, where the semantic features include scene features that are relatively stable in space and time.
The positioning unit 103 is configured to perform loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, perform global pose graph optimization on the driving trajectory of the vehicle according to the pose information and update the map.
Since the information exchange and execution processes among the units of the device are based on the same concept as the method embodiments of the present invention, reference may be made to the description of the method embodiments for details, which are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method according to any one of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and may include the processes of the embodiments of the methods when executed. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A vehicle positioning method, characterized by comprising:
acquiring a driving scene image and pose information of a vehicle in a current driving state;
extracting semantic features of the current driving scene image by using a semantic segmentation network, wherein the semantic features comprise scene features which are relatively stable in space and time;
and performing loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map.
2. The vehicle positioning method of claim 1, wherein the pose information is output by a visual inertial odometer and is associated with a key frame of the driving scene image; wherein
the driving scene image is acquired by a vehicle-mounted forward-looking camera; and the visual inertial odometer comprises a vehicle-mounted surround-view fisheye camera and an IMU.
3. The vehicle positioning method according to claim 2, characterized by further comprising:
extracting the pose information corresponding to the current frame and the key frames of the scene images acquired from the vehicle-mounted surround-view fisheye camera, and image-coding the differences between the scene images according to the pose information;
when the result returned by the visual inertial odometer is normal: if the current frame is a key frame, storing the image and pose information of the current frame; if the current frame is not a key frame, discarding it and reading the next frame;
and when the result returned by the visual inertial odometer is abnormal: matching, according to an image-coding threshold, the key frame in the key-frame set that is most similar to the current frame and re-solving the visual inertial odometry from it; if no matching key frame exists, discarding the current frame and reading the next frame.
4. The vehicle positioning method according to claim 3, further comprising, after the visual inertial odometer returns a normal result and the current frame is determined to be a key frame:
performing trajectory alignment between the visual information of the key frame and the IMU pre-integration result in a loosely coupled manner, and solving the visual feature-point information and the IMU state-variable information by a nonlinear method;
and performing nonlinear optimization on the feature-point information and the IMU state-variable information to obtain the pose information of the scene image at the current moment.
5. The vehicle positioning method according to any one of claims 1 to 4, wherein the extracting semantic features of the current driving scene image by using a semantic segmentation network comprises:
inputting the current driving scene image into a semantic segmentation network, identifying static features, semi-static features and dynamic features in scene features of the current driving scene image, and generating a mask for eliminating the semi-static features and the dynamic features;
and outputting, under the action of the mask, the static features as the semantic features.
6. The vehicle positioning method according to claim 5, wherein the driving scene image in the current driving state comprises a key frame image acquired by a current forward-looking undistorted camera;
the performing loop detection on the semantic features against the loaded, most recently updated historical map comprises:
extracting static features of the key frame image acquired by the current forward-looking undistorted camera, wherein the static features comprise the spatial coordinates of the key-frame image feature points;
extracting the BRIEF descriptors corresponding to the feature points, converting the descriptors into word vectors of a bag-of-words model, and then performing similarity matching between these word vectors and the feature-point word vectors of the corresponding historical frames in the historical map;
and judging whether a key frame acquired by the current forward-looking undistorted camera and a corresponding historical frame in a historical map form a loop or not according to a preset similarity threshold.
7. The vehicle positioning method according to claim 6, wherein the performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map comprises:
solving for the change of the vehicle pose at the current moment relative to the historical moment according to the spatial coordinates of the matched feature points of the key frame acquired by the current forward-looking undistorted camera and of the historical frame, together with the extrinsic parameters between the corresponding camera and the IMU; and performing nonlinear global pose graph optimization and updating the map by minimizing the pose residual between the two loop frames and the pose residuals between all key frames along the global trajectory.
8. A vehicle positioning apparatus, comprising:
the driving data acquisition unit is used for acquiring a driving scene image and pose information of the vehicle in the current driving state;
the semantic feature extraction unit is used for extracting semantic features of the current driving scene image by utilizing a semantic segmentation network, wherein the semantic features comprise scene features which are relatively stable in space and time;
and the positioning unit is used for performing loop detection on the semantic features against the loaded, most recently updated historical map, and, if a loop is detected, performing global pose graph optimization on the driving trajectory of the vehicle according to the pose information and updating the map.
9. The vehicle positioning apparatus of claim 8, wherein the pose information is output by a visual inertial odometer and is associated with a key frame of the driving scene image; wherein
the driving scene image is acquired by a vehicle-mounted forward-looking camera; and the visual inertial odometer comprises a vehicle-mounted surround-view fisheye camera and an IMU (Inertial Measurement Unit);
the driving data acquisition unit is specifically configured to:
extract the pose information corresponding to the current frame and the key frames of the scene images acquired from the vehicle-mounted surround-view fisheye camera, and image-code the differences between the scene images according to the pose information;
when the result returned by the visual inertial odometer is normal: if the current frame is a key frame, store the image and pose information of the current frame; if the current frame is not a key frame, discard it and read the next frame;
and when the result returned by the visual inertial odometer is abnormal: match, according to the image-coding threshold, the key frame in the key-frame set that is most similar to the current frame and re-solve the visual inertial odometry from it; if no matching key frame exists, discard the current frame and read the next frame.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the vehicle localization method according to any one of claims 1 to 7.
CN202110016641.6A 2021-01-06 2021-01-06 Vehicle positioning method and device and computer readable storage medium Pending CN114723779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110016641.6A CN114723779A (en) 2021-01-06 2021-01-06 Vehicle positioning method and device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110016641.6A CN114723779A (en) 2021-01-06 2021-01-06 Vehicle positioning method and device and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114723779A 2022-07-08

Family

ID=82233879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110016641.6A Pending CN114723779A (en) 2021-01-06 2021-01-06 Vehicle positioning method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114723779A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965944A (en) * 2023-03-09 2023-04-14 安徽蔚来智驾科技有限公司 Target information detection method, device, driving device, and medium
CN117542008A (en) * 2023-10-12 2024-02-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Semantic point cloud fusion automatic driving scene identification method and storage medium
WO2024077935A1 (en) * 2022-10-12 2024-04-18 中国第一汽车股份有限公司 Visual-slam-based vehicle positioning method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107869989A (en) * 2017-11-06 2018-04-03 东北大学 A kind of localization method and system of the fusion of view-based access control model inertial navigation information
CN110136199A (en) * 2018-11-13 2019-08-16 北京初速度科技有限公司 A kind of vehicle location based on camera, the method and apparatus for building figure
CN111156984A (en) * 2019-12-18 2020-05-15 东南大学 Monocular vision inertia SLAM method oriented to dynamic scene
CN111693047A (en) * 2020-05-08 2020-09-22 中国航空工业集团公司西安航空计算技术研究所 Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107869989A (en) * 2017-11-06 2018-04-03 东北大学 A kind of localization method and system of the fusion of view-based access control model inertial navigation information
CN110136199A (en) * 2018-11-13 2019-08-16 北京初速度科技有限公司 A kind of vehicle location based on camera, the method and apparatus for building figure
CN111156984A (en) * 2019-12-18 2020-05-15 东南大学 Monocular vision inertia SLAM method oriented to dynamic scene
CN111693047A (en) * 2020-05-08 2020-09-22 中国航空工业集团公司西安航空计算技术研究所 Visual navigation method for micro unmanned aerial vehicle in high-dynamic scene

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024077935A1 (en) * 2022-10-12 2024-04-18 中国第一汽车股份有限公司 Visual-slam-based vehicle positioning method and apparatus
CN115965944A (en) * 2023-03-09 2023-04-14 安徽蔚来智驾科技有限公司 Target information detection method, device, driving device, and medium
CN115965944B (en) * 2023-03-09 2023-05-09 安徽蔚来智驾科技有限公司 Target information detection method, device, driving device and medium
CN117542008A (en) * 2023-10-12 2024-02-09 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Semantic point cloud fusion automatic driving scene identification method and storage medium

Similar Documents

Publication Publication Date Title
CN109345588B (en) Tag-based six-degree-of-freedom attitude estimation method
CN109631896B (en) Parking lot autonomous parking positioning method based on vehicle vision and motion information
CN111830953B (en) Vehicle self-positioning method, device and system
CN109443348B (en) Underground garage position tracking method based on fusion of look-around vision and inertial navigation
CN114723779A (en) Vehicle positioning method and device and computer readable storage medium
CN111862673B (en) Parking lot vehicle self-positioning and map construction method based on top view
CN111862672A (en) Parking lot vehicle self-positioning and map construction method based on top view
CN107167826B (en) Vehicle longitudinal positioning system and method based on variable grid image feature detection in automatic driving
Parra et al. Robust visual odometry for vehicle localization in urban environments
CN108481327B (en) Positioning device, positioning method and robot for enhancing vision
CN110411457B (en) Positioning method, system, terminal and storage medium based on stroke perception and vision fusion
CN111210477A (en) Method and system for positioning moving target
US11158065B2 (en) Localization of a mobile unit by means of a multi hypothesis kalman filter method
CN111986261B (en) Vehicle positioning method and device, electronic equipment and storage medium
CN108544494B (en) Positioning device, method and robot based on inertia and visual characteristics
CN115936029B (en) SLAM positioning method and device based on two-dimensional code
CN110458885B (en) Positioning system and mobile terminal based on stroke perception and vision fusion
CN111928857B (en) Method and related device for realizing SLAM positioning in dynamic environment
CN116643291A (en) SLAM method for removing dynamic targets by combining vision and laser radar
JP7337617B2 (en) Estimation device, estimation method and program
WO2021204867A1 (en) A system and method to track a coupled vehicle
KR102249381B1 (en) System for generating spatial information of mobile device using 3D image information and method therefor
CN212044739U (en) Positioning device and robot based on inertial data and visual characteristics
CN113327270A (en) Visual inertial navigation method, device, equipment and computer readable storage medium
García-García et al. 3D visual odometry for road vehicles

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination