CN114973181B - Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium - Google Patents

Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium

Info

Publication number
CN114973181B
CN114973181B (application CN202210902286.7A)
Authority
CN
China
Prior art keywords
image
data
perception model
environment perception
bev
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210902286.7A
Other languages
Chinese (zh)
Other versions
CN114973181A (en)
Inventor
王汝卓
田良宇
王雅儒
程建伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Jimu Intelligent Technology Co ltd
Original Assignee
Wuhan Jimu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Jimu Intelligent Technology Co ltd filed Critical Wuhan Jimu Intelligent Technology Co ltd
Priority to CN202210902286.7A priority Critical patent/CN114973181B/en
Publication of CN114973181A publication Critical patent/CN114973181A/en
Application granted granted Critical
Publication of CN114973181B publication Critical patent/CN114973181B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588: Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/04: Inference or reasoning models
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an environment perception method, device and equipment for a multi-view BEV (bird's eye view) perspective and a storage medium, and relates to the technical field of intelligent driving. The method comprises the following steps: inputting a preprocessed image to be perceived into an environment perception model; inputting the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model into the environment perception model; encoding the image to be perceived through the environment perception model to generate an encoded image to be perceived, performing matrix transformations on the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy to generate a heat map, performing a matrix transformation between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system, inputting the voxel features to a 3D target detection head, a drivable area segmentation head and a lane line segmentation head respectively, and outputting the inference results. Multi-task environment perception is thereby realized.

Description

Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of intelligent driving technologies, and in particular to an environment perception method, apparatus, device and storage medium for a multi-view bird's eye view (BEV) perspective.
Background
Intelligent driving means equipping an automobile with advanced sensors and other devices and applying new technologies such as artificial intelligence so that the automobile has intelligent driving capability, assisting the driver in completing driving tasks safely and conveniently. The core problems of intelligent driving are generally divided into three major parts: environment perception, decision planning and control execution. Environment perception is the first link in realizing intelligent driving and a necessary condition for information interaction between the vehicle and the outside world: the intelligent driving vehicle obtains information around the vehicle through various sensors (such as cameras, millimeter-wave radar, ultrasonic radar, lidar and the like), generates data such as pictures, videos, point clouds and electromagnetic waves, and realizes functions such as 2D/3D target detection, semantic segmentation and depth estimation through strategies such as deep learning perception algorithms.
At present, deep learning algorithms in the field of automatic driving are updated and iterated rapidly; various novel algorithms are continuously proposed in subdivided fields such as vehicle/pedestrian/traffic light/traffic sign detection, lane line recognition, target tracking, semantic segmentation and dynamic target trajectory prediction, and a great amount of research is focused on multi-task learning.
For example, in 2020 a deep learning algorithm realizing semantic segmentation of targets under a multi-view BEV perspective was proposed: Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D. However, the Lift-Splat-Shoot model only solves the semantic segmentation task under the multi-view BEV perspective, and its precision cannot meet the requirement of automatic driving functions on model perception precision.
In 2021, a deep learning algorithm realizing 3D target detection and tracking on laser point cloud data, the CenterPoint deep learning model, was proposed; it takes lidar point cloud data as the model input for training and inference. In an actual automatic driving solution, however, the price of a lidar sensor is much higher than that of a camera sensor, and although the 3D target detection accuracy of the CenterPoint model is high, it cannot read image information collected by a camera sensor to realize the 3D target detection function.
In 2021, a panoptic driving perception model, YOLOP, which is a monocular deep learning model, was also proposed, realizing 2D target detection and semantic segmentation of the drivable area (DA) and lane lines. However, the YOLOP model targets perception tasks in a monocular 2D scene and cannot solve perception tasks in a multi-view or 3D scene.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide an environment perception method for a multi-view BEV perspective, to solve the problem in the prior art that multi-task environment perception cannot be realized under a multi-view BEV perspective. The method comprises the following steps:
acquiring a pure visual image of the surrounding environment of the vehicle through a multi-view sensor, preprocessing the image to generate an image to be perceived, and inputting the image to be perceived into an environment perception model;
inputting the internal parameter data and the external parameter data of the multi-view sensor and the data representing the on-line data enhancement strategy in the environment perception model training process into the environment perception model;
encoding the image to be perceived through the environment perception model to generate an encoded image to be perceived, performing matrix transformations on the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model to generate a heat map, performing a matrix transformation between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system, inputting the voxel features to a 3D target detection head, a drivable area segmentation head and a lane line segmentation head respectively, and outputting an inference result of 3D target detection, an inference result of drivable area BEV segmentation and an inference result of lane line BEV segmentation.
The embodiment of the present specification further provides an environment perception device for a multi-view BEV perspective, so as to solve the problem that multi-task environment perception cannot be realized under a multi-view BEV perspective in the prior art. The device includes:
the image preprocessing module is used for preprocessing an image to generate an image to be perceived, wherein the image is a pure visual image of the surrounding environment of the vehicle acquired by the multi-view sensor;
the first data receiving module is arranged in the environment perception model and used for receiving the input image to be perceived;
the second data receiving module is arranged in the environment perception model and used for receiving the input internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model;
the image encoder module is arranged in the environment perception model and used for encoding the image to be perceived to generate an encoded image to be perceived;
the first matrix transformation module is arranged in the environment perception model and used for performing matrix transformations on the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model to generate a heat map;
the second matrix transformation module is arranged in the environment perception model and used for performing a matrix transformation between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system;
the 3D target detection head module is arranged in the environment perception model and used for receiving the voxel features, performing 3D target detection on them and outputting an inference result of 3D target detection;
the drivable area segmentation head module is arranged in the environment perception model and used for receiving the voxel features, performing drivable area segmentation on them and outputting an inference result of drivable area BEV segmentation;
and the lane line segmentation head module is arranged in the environment perception model and used for receiving the voxel features, performing lane line segmentation on them and outputting an inference result of lane line BEV segmentation.
The embodiment of the invention further provides computer equipment, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements any of the above environment perception methods for a multi-view BEV perspective, so as to solve the problem that multi-task environment perception under a multi-view BEV perspective cannot be realized in the prior art.
An embodiment of the present invention further provides a computer-readable storage medium, in which a computer program for executing any of the above environment perception methods for a multi-view BEV perspective is stored, so as to solve the problem that multi-task environment perception cannot be realized under a multi-view BEV perspective in the prior art.
Compared with the prior art, the beneficial effects that can be achieved by at least one technical solution adopted in the embodiments of this specification include at least the following: 3D target detection, drivable area segmentation under a multi-view BEV perspective and lane line segmentation can be realized through the environment perception model, that is, multi-task environment perception is realized with a single model; in addition, the environment perception model introduces the internal parameter data and external parameter data of the multi-view sensor and the online data enhancement strategy, which helps improve the robustness and precision of the model; and the environment perception model performs environment perception based on pure visual images collected by the multi-view sensor, so that, compared with schemes that perform environment perception on radar data, the use of radar equipment for data collection can be avoided, which effectively reduces the cost of environment perception.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of an environment sensing method for a multi-view BEV view according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of image preprocessing provided by an embodiment of the present invention;
FIG. 3 is a data flow diagram of an environment awareness model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a computer device according to an embodiment of the present invention;
fig. 5 is a block diagram of an environment sensing apparatus for a multi-view BEV viewing angle according to an embodiment of the present invention.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. It should be apparent that the described embodiments are only a few embodiments of the present application, and not all embodiments. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The prior art cannot realize perception tasks in a multi-view or 3D scene. After a great deal of research, the inventors of the present application trained an environment perception model through a deep learning algorithm, and further provide an environment perception method under a multi-view BEV perspective based on this environment perception model, so as to realize multi-task environment perception covering 3D target detection, drivable area segmentation under the multi-view BEV perspective and lane line segmentation.
The present application provides an environment perception method for a multi-view BEV (bird's eye view) perspective. As shown in FIG. 1, the method comprises:
step S102: acquiring a pure visual image of the surrounding environment of the vehicle through a multi-view sensor, preprocessing the image to generate an image to be perceived, and inputting the image to be perceived into an environment perception model;
step S104: inputting the internal parameter data and the external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the environment perception model training process into the environment perception model;
step S106: the method comprises the steps of coding the image to be perceived through the environment perception model to generate the numbered image to be perceived, carrying out matrix transformation on internal parameter data and external parameter data of the multi-view sensor and data representing an on-line data enhancement strategy in the training process of the environment perception model to generate a thermodynamic diagram, carrying out matrix transformation on the numbered image to be perceived and the thermodynamic diagram to obtain voxel characteristics based on a self-vehicle coordinate system, inputting the voxel characteristics to a 3D target detection head, a travelable region segmentation head and a lane line segmentation head respectively, and outputting a reasoning result of 3D target detection, a reasoning result of travelable region BEV segmentation and a reasoning result of lane line BEV segmentation.
Compared with the prior art, the environment perception method for a multi-view BEV perspective provided by the embodiments of this specification can achieve at least the following beneficial effects: 3D target detection, drivable area segmentation under a multi-view BEV perspective and lane line segmentation can be realized through the environment perception model, that is, multi-task environment perception is realized with a single model; in addition, the environment perception model introduces the internal parameter data and external parameter data of the multi-view sensor and the online data enhancement strategy, which helps improve the robustness and precision of the model; and the environment perception model performs environment perception based on pure visual images collected by the multi-view sensor, so that, compared with schemes that perform environment perception on radar data, the use of radar equipment for data collection can be avoided, which effectively reduces the cost of environment perception.
In specific implementation, the multi-view sensor can be an image acquisition device such as a camera or video camera, which acquires images of the vehicle's surrounding environment from a plurality of different angles; the pure visual image refers to two-dimensional picture data, for example picture data and video data collected by a camera.
In specific implementation, in order to improve detection and segmentation performance in low-illumination scenes (e.g., at night) and further improve the robustness and precision of the environment perception model, this embodiment proposes to preprocess the image to generate the image to be perceived before performing environment perception based on the image, for example by preprocessing the image according to the data representing the online data enhancement strategy in the training process of the environment perception model, where this data includes: a random scaling matrix, a random cropping matrix, a random mirroring matrix, a random rotation matrix, a matrix of random color-space variation and a matrix of added random noise, the matrix of random color-space variation including a matrix that randomly changes the hue, color saturation and brightness value in the color space. Specifically, operations such as random scaling, random cropping, random mirroring, random rotation, random color-space variation and addition of random noise are performed on the image to generate the image to be perceived.
In specific implementation, in order to improve the robustness and precision of the environment perception model, the sample images are also preprocessed before model training. For example, as shown in Fig. 2, original sample image data are subjected to fixed scaling and fixed cropping to generate test images, which are input into the model for testing; original sample image data are subjected to online data enhancement operations such as random scaling, random cropping, random mirroring, random rotation, random HSV (Hue, Saturation, Value) color-space change and addition of random noise (Solar) to generate training images, which are input into the model for training, where the random HSV change includes random changes of hue, saturation and value in the HSV color space.
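A minimal sketch of how such an online augmentation policy might be sampled and encoded as a matrix that is later fed into the model is shown below; the function name, parameter ranges and matrix convention are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def sample_online_augmentation(img_shape, train=True):
    """Hypothetical sketch: sample the online data-enhancement policy and return the
    3x3 matrix describing its geometric part (scaling, cropping, mirroring, rotation)."""
    h, w = img_shape[:2]
    if not train:
        # test-time branch of Fig. 2: fixed scaling and fixed cropping only
        scale, crop_x, crop_y, flip, rot_deg = 0.5, 0, 0, False, 0.0
    else:
        scale = np.random.uniform(0.4, 0.6)                   # random scaling
        crop_x = np.random.randint(0, max(1, int(w * 0.1)))   # random cropping offsets
        crop_y = np.random.randint(0, max(1, int(h * 0.1)))
        flip = np.random.rand() < 0.5                         # random mirroring
        rot_deg = np.random.uniform(-5.0, 5.0)                # random rotation

    T = np.eye(3)
    T[0, 0] = T[1, 1] = scale
    T[0, 2], T[1, 2] = -crop_x, -crop_y
    if flip:
        T = np.diag([-1.0, 1.0, 1.0]) @ T
    c, s = np.cos(np.deg2rad(rot_deg)), np.sin(np.deg2rad(rot_deg))
    T = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]) @ T
    # photometric operations (HSV jitter, added noise) act on pixel values only and do not
    # change this geometric matrix; applying the actual warp to the image is omitted here
    return T
```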
In specific implementation, in order to further improve the accuracy of the above-mentioned environment perception model, in this embodiment, it is proposed that in the training process of the environment perception model, the training image may include a purely visual image of the vehicle surrounding environment and radar data, and in the environment perception model thus trained, when predicting the purely visual image, the output inference result may include not only information of the purely visual image (for example, a specific target) but also information of the radar data (for example, information of a distance between targets, etc.).
In specific implementation, in order to realize multi-task environment perception under a multi-view BEV perspective, this embodiment proposes that, in the environment perception model, the image to be perceived is encoded by an image encoder based on a convolutional neural network to generate the encoded image to be perceived.
Specifically, as shown in Fig. 3, after the image to be perceived (i.e., input 2 in Fig. 3) is input into the environment perception model, it is encoded by the image encoder based on a convolutional neural network (CNN1), generating the encoded image to be perceived (i.e., feature map 2 in Fig. 3).
In specific implementation, the structure of the convolutional neural network for implementing the image encoder is not specifically limited in the embodiment of the present application, and may be implemented by using different Convolutional Neural Network (CNN) structures.
In specific implementation, as shown in Fig. 3, after the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model are input into the environment perception model, a heat map (i.e., feature map 1 in Fig. 3) is generated by performing a series of matrix transformations on the internal parameter matrix, the external parameter matrix and the data representing the online data enhancement strategy.
In specific implementation, as shown in Fig. 3, a matrix transformation is performed between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system, so that the 2D image to be perceived is lifted into 3D.
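One common way to realize such a transformation, assumed here purely for illustration and in the spirit of Lift-Splat-Shoot rather than as the patented implementation, is to lift feature-map pixels over a set of candidate depths and map them into the ego frame using the augmentation, intrinsic and extrinsic matrices. The sketch below shows this for a single camera; the pose convention (4x4 camera-to-ego extrinsics) is an assumption.

```python
import torch

def frustum_to_ego(depth_bins, intrinsics, extrinsics, aug_matrix, H_feat, W_feat):
    """Hypothetical sketch: build per-camera frustum points and express them in the
    ego-vehicle frame, undoing the online image augmentation first."""
    # pixel grid of the feature map
    ys, xs = torch.meshgrid(torch.arange(H_feat, dtype=torch.float32),
                            torch.arange(W_feat, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)        # (H, W, 3)

    # undo the augmentation so pixel coordinates refer to the original camera image
    pix = pix @ torch.linalg.inv(aug_matrix).T                      # (H, W, 3)

    points = []
    for d in depth_bins:                                            # lift over candidate depths
        cam = (pix * d) @ torch.linalg.inv(intrinsics).T            # 3D points in camera frame
        cam_h = torch.cat([cam, torch.ones_like(cam[..., :1])], dim=-1)
        ego = cam_h @ extrinsics.T                                  # camera -> ego (4x4 extrinsics)
        points.append(ego[..., :3])
    return torch.stack(points)                                      # (D, H, W, 3) ego-frame points
```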
In specific implementation, in order to realize multi-task perception, as shown in Fig. 3, this embodiment proposes that the drivable area segmentation head and the lane line segmentation head may be implemented with the same decoder based on a convolutional neural network (CNN2); that is, the decoders realizing the drivable area segmentation head and the lane line segmentation head can adopt the same convolutional neural network (CNN2) structure. The structure of this decoder is not specifically limited in the embodiments of the present application and may be implemented with different convolutional neural network (CNN) structures.
In specific implementation, the 3D target detection head can be implemented using the existing CenterHead 3D target detection head of CenterPoint.
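To illustrate the shared decoder described above, a minimal sketch of a CNN-based BEV segmentation head is given below; the layer configuration is assumed for illustration and is not specified by the patent.

```python
import torch.nn as nn

class BEVSegmentationHead(nn.Module):
    """Hypothetical CNN decoder usable for both drivable-area and lane-line BEV segmentation."""

    def __init__(self, in_channels, num_classes=1):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(in_channels // 2),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // 2, num_classes, kernel_size=1),
        )

    def forward(self, bev_features):
        # bev_features: (B, C, H_bev, W_bev) voxel/BEV features in the ego-vehicle frame
        return self.decoder(bev_features)  # per-cell logits, later supervised with a BCE loss
```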
In order to further improve the precision of the environment perception model, this embodiment proposes to increase the precision of the true values used in the process of training the environment perception model, for example, as shown in Fig. 3:
inputting laser point cloud data aligned with an image frame into the environment perception model (i.e. input 3 in fig. 3);
for current-frame laser point cloud data among the laser point cloud data aligned with the image frames, enhancing the current-frame laser point cloud data with motion-compensated historical laser point cloud data to obtain enhanced current-frame laser point cloud data, where the historical laser point cloud data comprise data within a preset time interval; specifically, an IMU (Inertial Measurement Unit) may be used to perform motion compensation on the historical point cloud data;
and taking the enhanced current-frame laser point cloud data as the true value, performing supervised learning on the three-dimensional depth information inferred from the encoded image to be perceived and on the inference result of 3D target detection.
In specific implementation, when generating the true values for supervising model learning during training, the current-frame laser point cloud data are enhanced with motion-compensated historical point cloud data; that is, the motion-compensated laser point clouds of historical frames are used to supplement the laser point cloud of the current frame, enhancing it in the temporal dimension. The enhanced current-frame laser point cloud is used as the true value to supervise the three-dimensional depth information inferred from the encoded image to be perceived (i.e., feature map 2 in Fig. 3) and the inference result of 3D target detection, which improves the precision of the environment perception model, makes the features of the true value richer, and alleviates the problem that an occluded object in the current frame yields too few feature points in the true value. In addition, although the previously published BEVerse model introduces deep learning modules for temporal learning, such as a GRU (Gated Recurrent Unit), to ensure model inference accuracy in multi-task learning of 3D detection, semantic segmentation and trajectory prediction, its parameter count is large and does not meet the critical real-time requirement of embedded model deployment.
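A minimal sketch of this temporal enhancement of the true value is given below, assuming ego poses (e.g., integrated from IMU/odometry) are available for motion compensation; the function name, the pose convention and the data layout are assumptions for illustration.

```python
import numpy as np

def enhance_current_frame(curr_points, history, curr_pose):
    """Hypothetical sketch: densify the current LiDAR frame with motion-compensated
    historical frames before using it as the supervision true value.

    curr_points : (N, 3) points of the current frame, in the current ego frame
    history     : list of (points (M, 3), pose (4, 4)) pairs within a preset time window,
                  where each pose is an ego-to-world transform (e.g., from IMU/odometry)
    curr_pose   : (4, 4) ego-to-world transform of the current frame
    """
    world_to_curr = np.linalg.inv(curr_pose)
    merged = [curr_points]
    for pts, pose in history:
        pts_h = np.concatenate([pts, np.ones((pts.shape[0], 1))], axis=1)  # homogeneous coords
        # past ego frame -> world -> current ego frame (motion compensation)
        compensated = (world_to_curr @ pose @ pts_h.T).T[:, :3]
        merged.append(compensated)
    return np.concatenate(merged, axis=0)  # enhanced current-frame point cloud
```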
In specific implementation, in order to further improve the precision of the environment perception model, this embodiment further proposes to increase the precision of the true values in the process of training the environment perception model. For example, as shown in Fig. 3, during training of the environment perception model, map data (i.e., input 4 in Fig. 3) are input into the environment perception model, where the map data comprise semantic information of the vehicle's surrounding environment; specifically, this semantic information may include information such as road edges and lane lines;
and taking the map data as a true value, and performing supervised learning on the inference result of the drivable area BEV segmentation and the inference result of the lane line BEV segmentation.
Specifically, the true value is produced from high-precision map data containing semantic information of the vehicle's surrounding environment, so that the features of the true value are richer, which alleviates the problem that an occluded object in the current frame yields too few feature points in the true value.
In the specific implementation process, by increasing the precision of the true values during training of the environment perception model, supervised learning is performed not only on the predicted 3D detection, drivable area and lane line results, but also on the depth information predicted by the environment perception model, using the enhanced laser point cloud data.
In specific implementation, in order to further improve the precision of the environment perception model, this embodiment draws on the idea of ablation experiments: in the process of training the environment perception model, the image encoder that encodes the image to be perceived (for example, the CNN1-based image encoder in Fig. 3) is first trained on the ImageNet dataset (the ImageNet project is a large visual database used for research on visual object recognition software), and the trained image encoder is then used as the initial weights for training the environment perception model.
In specific implementation, in order to further improve the precision of the environment perception model, this embodiment proposes a CDA (curriculum data augmentation) strategy for data enhancement: in the process of training the environment perception model, the strength of the online data enhancement strategy is increased as the training period increases.
Specifically, increasing the strength of the online data enhancement strategy may mean increasing the magnitude of the adjustments described by the data representing the online data enhancement strategy during training of the environment perception model, so that the difference between the enhanced image data and the original image data becomes larger.
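A minimal sketch of such a strength schedule, with an assumed linear ramp and an illustrative use on the random-scaling range, is shown below; the function names and values are assumptions.

```python
def augmentation_strength(epoch, total_epochs, max_strength=1.0):
    """Hypothetical CDA schedule: augmentation strength grows linearly with the training period."""
    return max_strength * min(1.0, (epoch + 1) / total_epochs)

# example: widen the random-scaling range of the online augmentation as training progresses
def scale_range(epoch, total_epochs, base=0.05, extra=0.15):
    s = augmentation_strength(epoch, total_epochs)
    return 1.0 - (base + extra * s), 1.0 + (base + extra * s)
```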
In specific implementation, in order to further improve the precision of the environment perception model, this embodiment proposes that, in the environment perception model, the drivable area segmentation head and the lane line segmentation head use a BCE (Binary Cross Entropy) loss function, the 3D target detection head uses a loss function combining a Focal loss and an L1 loss, and the loss functions of the drivable area segmentation head, the lane line segmentation head and the 3D target detection head are combined in weighted form for supervised learning, where the weight of the loss function of the 3D target detection head is greater than the weight of the loss function of the drivable area segmentation head and the weight of the loss function of the lane line segmentation head respectively, so that the environment perception model regresses more easily and its precision is further improved.
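A minimal sketch of the weighted multi-task loss is given below; the weight values, the output/target dictionary layout and the CenterNet-style focal loss are assumptions chosen for illustration, not details fixed by the patent.

```python
import torch
import torch.nn.functional as F

def focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """CenterNet-style focal loss on a predicted center heatmap with values in (0, 1)."""
    pos = gt.eq(1).float()
    neg = 1.0 - pos
    loss_pos = -(torch.log(pred + eps) * (1.0 - pred) ** alpha * pos).sum()
    loss_neg = -(torch.log(1.0 - pred + eps) * pred ** alpha * (1.0 - gt) ** beta * neg).sum()
    return (loss_pos + loss_neg) / pos.sum().clamp(min=1.0)

def multitask_loss(outputs, targets, w_det=2.0, w_da=1.0, w_lane=1.0):
    """Hypothetical weighted combination of the three head losses; the detection weight
    is chosen larger than each segmentation weight, as described above."""
    loss_da = F.binary_cross_entropy_with_logits(outputs["drivable_area"],
                                                 targets["drivable_area"])       # BCE
    loss_lane = F.binary_cross_entropy_with_logits(outputs["lane_line"],
                                                   targets["lane_line"])         # BCE
    loss_det = focal_loss(outputs["det_heatmap"], targets["det_heatmap"]) \
               + F.l1_loss(outputs["det_boxes"], targets["det_boxes"])            # Focal + L1
    return w_det * loss_det + w_da * loss_da + w_lane * loss_lane
```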
In specific implementation, in order to further improve the accuracy of the environment sensing model, in this embodiment, in the training process of the environment sensing model, a Warm-up (pre-heating) strategy and a cosine annealing strategy are adopted to optimize the environment sensing model.
In specific implementation, the Warm-up strategy is specifically: first train the model on the training set with a smaller learning rate for a small number of iterations, and then train it with a larger learning rate for a larger number of iterations, so as to reduce the oscillation of the convergence curve in the early stage of training when the model subsequently uses the larger learning rate.
In specific implementation, the cosine annealing strategy is specifically: during model training, the learning rate is decreased along a cosine curve as the number of iterations increases, so that the model approaches a local optimum as training proceeds, increasing the precision of the final model.
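A minimal sketch combining the two strategies into a single per-step learning-rate function is given below; all hyperparameter values are illustrative assumptions.

```python
import math

def learning_rate(step, warmup_steps, total_steps,
                  base_lr=1e-3, warmup_start_lr=1e-5, min_lr=1e-6):
    """Hypothetical warm-up + cosine-annealing schedule (all values illustrative)."""
    if step < warmup_steps:
        # warm-up phase: ramp from a small learning rate up to the base learning rate
        return warmup_start_lr + (base_lr - warmup_start_lr) * step / max(1, warmup_steps)
    # cosine annealing phase: decay from base_lr toward min_lr as training proceeds
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```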
In this embodiment, a computer device is provided, as shown in fig. 4, and includes a memory 402, a processor 404, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements any of the above-mentioned environment sensing methods for multi-view BEV views.
In particular, the computer device may be a computer terminal, a server or a similar computing device.
In the present embodiment, there is provided a computer-readable storage medium storing a computer program for executing any of the above multi-view BEV perspective environment perception methods.
In particular, computer-readable storage media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer-readable storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, a computer-readable storage medium does not include transitory computer-readable media such as modulated data signals and carrier waves.
In this embodiment, an environment perception apparatus for a multi-view BEV perspective is provided; as shown in Fig. 5, the apparatus includes:
the image preprocessing module 502, used for preprocessing an image to generate an image to be perceived, wherein the image is a pure visual image of the surrounding environment of the vehicle acquired by a multi-view sensor;
a first data receiving module 504, disposed in the environment perception model, for receiving the input image to be perceived;
a second data receiving module 506, disposed in the environment perception model, for receiving the input internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model;
an image encoder module 508, disposed in the environment perception model, configured to encode the image to be perceived to generate an encoded image to be perceived;
a first matrix transformation module 510, disposed in the environment perception model, for performing matrix transformations on the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model to generate a heat map;
a second matrix transformation module 512, disposed in the environment perception model, for performing a matrix transformation between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system;
a 3D target detection head module 514, disposed in the environment perception model, configured to receive the voxel features, perform 3D target detection on them and output an inference result of 3D target detection;
a drivable area segmentation head module 516, disposed in the environment perception model, configured to receive the voxel features, perform drivable area segmentation on them and output an inference result of drivable area BEV segmentation;
and a lane line segmentation head module 518, disposed in the environment perception model, configured to receive the voxel features, perform lane line segmentation on them and output an inference result of lane line BEV segmentation.
In an embodiment, the image preprocessing module is configured to preprocess the image to generate the image to be perceived according to the data representing the online data enhancement strategy in the training process of the environment perception model, where this data includes: a random scaling matrix, a random cropping matrix, a random mirroring matrix, a random rotation matrix, a matrix of random color-space variation and a matrix of added random noise, the matrix of random color-space variation including a matrix that randomly changes the hue, color saturation and brightness value in the color space.
In one embodiment, the image encoder module is implemented by a convolutional neural network based image encoder.
In one embodiment, the apparatus further comprises:
the third data receiving module is used for receiving laser point cloud data which are input into the environment perception model and are aligned with the image frame in the process of training the environment perception model;
the data enhancement module is used for enhancing the current frame laser point cloud data by adopting historical laser point cloud data after motion compensation aiming at the current frame laser point cloud data in the laser point cloud data aligned with the image frame to obtain the enhanced current frame laser point cloud data, wherein the historical laser point cloud data comprise data at preset time intervals;
and the first supervised learning module is used for taking the enhanced current-frame laser point cloud data as the true value and performing supervised learning on the three-dimensional depth information inferred from the encoded image to be perceived and on the inference result of 3D target detection.
In one embodiment, the apparatus further comprises:
the fourth data receiving module is used for inputting map data into the environment perception model in the process of training the environment perception model, wherein the map data comprise semantic information of the surrounding environment of the vehicle;
and the second supervised learning module is used for carrying out supervised learning on the inference result of the drivable area BEV segmentation and the inference result of the lane line BEV segmentation by taking the map data as a true value.
In one embodiment, the above apparatus further comprises:
and the model training module is used for firstly adopting an ImageNet data set to train an image encoder for encoding the image to be perceived in the environment perception model in the process of training the environment perception model, and then training the environment perception model by taking the weight of the trained image encoder as an initial weight.
In one embodiment, the above apparatus further comprises:
and the model training module is also used for increasing the intensity of the online data enhancement strategy along with the increase of the training period in the process of training the environment perception model.
In one embodiment, in the environment perception model, the drivable area segmentation head module and the lane line segmentation head module are implemented using the same CNN-based decoder.
In one embodiment, in the environment perception model, the drivable area segmentation head module and the lane line segmentation head module use a BCE loss function, the 3D target detection head uses a loss function combining a Focal loss and an L1 loss, and the loss function of the drivable area segmentation head, the loss function of the lane line segmentation head and the loss function of the 3D target detection head are combined in weighted form for supervised learning, wherein the weight of the loss function of the 3D target detection head is greater than the weight of the loss function of the drivable area segmentation head and the weight of the loss function of the lane line segmentation head respectively.
The embodiment of the invention achieves the following technical effects: compared with the prior art, the beneficial effects that can be achieved by at least one technical solution adopted in the embodiments of this specification include at least the following: 3D target detection, drivable area segmentation under a multi-view BEV perspective and lane line segmentation can be realized through the environment perception model, that is, multi-task environment perception is realized with a single model; in addition, the environment perception model introduces the internal parameter data and external parameter data of the multi-view sensor and the online data enhancement strategy, which helps improve the robustness and precision of the model; and the environment perception model performs environment perception based on pure visual images collected by the multi-view sensor, so that, compared with schemes that perform environment perception on radar data, the use of radar equipment for data collection can be avoided, which effectively reduces the cost of environment perception.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes may be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (14)

1. An environment perception method for a multi-view BEV perspective, characterized by comprising the following steps:
acquiring a pure visual image of the surrounding environment of the vehicle through a multi-view sensor, preprocessing the image to generate an image to be perceived, and inputting the image to be perceived into an environment perception model;
inputting the internal parameter data and the external parameter data of the multi-view sensor and the data representing the on-line data enhancement strategy in the environment perception model training process into the environment perception model;
encoding the image to be perceived through the environment perception model to generate an encoded image to be perceived, performing matrix transformations on the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model to generate a heat map, performing a matrix transformation between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system, inputting the voxel features to a 3D target detection head, a drivable area segmentation head and a lane line segmentation head respectively, and outputting an inference result of 3D target detection, an inference result of drivable area BEV segmentation and an inference result of lane line BEV segmentation.
2. The environment perception method for a multi-view BEV perspective according to claim 1, wherein preprocessing the image to generate an image to be perceived comprises:
preprocessing the image according to the data representing the online data enhancement strategy in the training process of the environment perception model to generate the image to be perceived, wherein this data comprises: a random scaling matrix, a random cropping matrix, a random mirroring matrix, a random rotation matrix, a matrix of random color-space variation and a matrix of added random noise, the matrix of random color-space variation comprising a matrix that randomly changes the hue, color saturation and brightness value in the color space.
3. The environment perception method for a multi-view BEV perspective according to claim 1, wherein encoding the image to be perceived to generate an encoded image to be perceived comprises:
encoding the image to be perceived through an image encoder based on a convolutional neural network to generate the encoded image to be perceived.
4. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
inputting laser point cloud data aligned with an image frame into the environment perception model in the process of training the environment perception model;
aiming at current frame laser point cloud data in the laser point cloud data aligned with the image frame, enhancing the current frame laser point cloud data by adopting historical laser point cloud data after motion compensation to obtain the enhanced current frame laser point cloud data, wherein the historical laser point cloud data comprises data of a preset time interval;
and taking the enhanced current-frame laser point cloud data as the true value, and performing supervised learning on the three-dimensional depth information inferred from the encoded image to be perceived and on the inference result of 3D target detection.
5. The environment perception method for a multi-view BEV perspective according to claim 4, further comprising:
inputting map data into the environmental perception model in the process of training the environmental perception model, wherein the map data comprises semantic information of the surrounding environment of the vehicle;
and taking the map data as a true value, and performing supervised learning on the inference result of the drivable area BEV segmentation and the inference result of the lane line BEV segmentation.
6. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
in the process of training the environment perception model, an ImageNet data set is firstly adopted to train an image encoder for encoding the image to be perceived in the environment perception model, and then the weight of the trained image encoder is used as an initial weight to train the environment perception model.
7. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
in the process of training the environment perception model, along with the increase of the training period, the strength of the online data enhancement strategy is increased.
8. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
in the environment perception model, the travelable region division head and the lane line division head are implemented by the same decoder based on CNN.
9. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
in the environment perception model, the travelable region dividing head and the lane line dividing head adopt BCE Loss functions, the 3D target detection head adopts a Loss function combining a Focal Loss function and an L1 Loss function, the Loss function of the travelable region dividing head, the Loss function of the lane line dividing head and the Loss function of the 3D target detection head are combined in a weighting mode to carry out supervised learning, wherein the weight of the Loss function of the 3D target detection head is respectively greater than the weight of the travelable region dividing head and the weight of the Loss function of the lane line dividing head.
10. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
and in the training process of the environment perception model, optimizing the environment perception model by adopting a preheating strategy and a cosine annealing strategy.
11. The environment perception method for a multi-view BEV perspective according to any one of claims 1 to 3, further comprising:
in the training process of the environment perception model, the training image comprises a purely visual image of the surrounding environment of the vehicle and radar data.
12. An environment perception apparatus for a multi-view BEV perspective, characterized by comprising:
the image preprocessing module is used for preprocessing an image to generate an image to be perceived, wherein the image is a pure visual image of the surrounding environment of the vehicle, which is acquired by a multi-view sensor;
the first data receiving module is arranged in the environment perception model and used for receiving the input image to be perceived;
the second data receiving module is arranged in the environment perception model and used for receiving input internal parameter data and external parameter data of the multi-view sensor and data representing an online data enhancement strategy in the environment perception model training process;
the image encoder module is arranged in the environment perception model and used for encoding the image to be perceived to generate an encoded image to be perceived;
the first matrix transformation module is arranged in the environment perception model and used for performing matrix transformations on the internal parameter data and external parameter data of the multi-view sensor and the data representing the online data enhancement strategy in the training process of the environment perception model to generate a heat map;
the second matrix transformation module is arranged in the environment perception model and used for performing a matrix transformation between the encoded image to be perceived and the heat map to obtain voxel features based on the ego-vehicle coordinate system;
the 3D target detection head module is arranged in the environment perception model and used for receiving the voxel features, performing 3D target detection on them and outputting an inference result of 3D target detection;
the drivable area segmentation head module is arranged in the environment perception model and used for receiving the voxel features, performing drivable area segmentation on them and outputting an inference result of drivable area BEV segmentation;
and the lane line segmentation head module is arranged in the environment perception model and used for receiving the voxel features, performing lane line segmentation on them and outputting an inference result of lane line BEV segmentation.
13. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the environment perception method for a multi-view BEV perspective of any one of claims 1 to 11 when executing the computer program.
14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the environment perception method for a multi-view BEV perspective of any one of claims 1 to 11.
CN202210902286.7A 2022-07-29 2022-07-29 Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium Active CN114973181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210902286.7A CN114973181B (en) 2022-07-29 2022-07-29 Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210902286.7A CN114973181B (en) 2022-07-29 2022-07-29 Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114973181A CN114973181A (en) 2022-08-30
CN114973181B (en) 2022-10-14

Family

ID=82969741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210902286.7A Active CN114973181B (en) 2022-07-29 2022-07-29 Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114973181B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116704472B (en) * 2023-05-15 2024-04-02 Xiaomi Automobile Technology Co., Ltd. Image processing method, device, apparatus, medium, and program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070139A (en) * 2019-04-28 2019-07-30 吉林大学 Small sample towards automatic Pilot environment sensing is in ring learning system and method
CN112396043A (en) * 2021-01-21 2021-02-23 北京主线科技有限公司 Vehicle environment information perception method and device, electronic equipment and storage medium
CN112650220A (en) * 2020-12-04 2021-04-13 东风汽车集团有限公司 Automatic vehicle driving method, vehicle-mounted controller and system
CN113255779A (en) * 2021-05-28 2021-08-13 中国航天科工集团第二研究院 Multi-source perception data fusion identification method and system and computer readable storage medium
CN114332494A (en) * 2021-12-22 2022-04-12 北京邮电大学 Three-dimensional target detection and identification method based on multi-source fusion under vehicle-road cooperation scene

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7514166B2 (en) * 2020-11-06 2024-07-10 株式会社Subaru Vehicle driving support device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070139A (en) * 2019-04-28 2019-07-30 吉林大学 Small sample towards automatic Pilot environment sensing is in ring learning system and method
CN112650220A (en) * 2020-12-04 2021-04-13 东风汽车集团有限公司 Automatic vehicle driving method, vehicle-mounted controller and system
CN112396043A (en) * 2021-01-21 2021-02-23 北京主线科技有限公司 Vehicle environment information perception method and device, electronic equipment and storage medium
CN113255779A (en) * 2021-05-28 2021-08-13 中国航天科工集团第二研究院 Multi-source perception data fusion identification method and system and computer readable storage medium
CN114332494A (en) * 2021-12-22 2022-04-12 北京邮电大学 Three-dimensional target detection and identification method based on multi-source fusion under vehicle-road cooperation scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Autonomous driving perception and computing based on multimodal fusion; Zhang Yanyong et al.; Journal of Computer Research and Development; 2020-09-01 (Issue 09); full text *

Also Published As

Publication number Publication date
CN114973181A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114723955B (en) Image processing method, apparatus, device and computer readable storage medium
US10915793B2 (en) Method and system for converting point cloud data for use with 2D convolutional neural networks
Hou et al. Multiview detection with feature perspective transformation
CN111666921B (en) Vehicle control method, apparatus, computer device, and computer-readable storage medium
US11151734B2 (en) Method and system for generating synthetic point cloud data using a generative model
US20220343138A1 (en) Analysis of objects of interest in sensor data using deep neural networks
Akan et al. Stretchbev: Stretching future instance prediction spatially and temporally
EP3992908A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
CN113536920B (en) Semi-supervised three-dimensional point cloud target detection method
US20220301099A1 (en) Systems and methods for generating object detection labels using foveated image magnification for autonomous driving
CN115605918A (en) Spatio-temporal embedding
Ouyang et al. A cgans-based scene reconstruction model using lidar point cloud
DE102022114201A1 (en) Neural network for object detection and tracking
CN114973181B (en) Multi-view BEV (bird's eye view) perspective environment perception method, device, equipment and storage medium
EP3663965A1 (en) Method for predicting multiple futures
Paravarzar et al. Motion prediction on self-driving cars: A review
US12008762B2 (en) Systems and methods for generating a road surface semantic segmentation map from a sequence of point clouds
JP2022191188A (en) System and method for training prediction system
EP3992909A1 (en) Two-stage depth estimation machine learning algorithm and spherical warping layer for equi-rectangular projection stereo matching
CN116452953A (en) Fusion perception-based robot target detection method, apparatus and medium
US20230105331A1 (en) Methods and systems for semantic scene completion for sparse 3d data
Foster Object detection and sensor data processing for off-road autonomous vehicles
US20240135721A1 (en) Adversarial object-aware neural scene rendering for 3d object detection
Schieber Camera and LiDAR based Deep Feature Fusion for 3D Semantic Segmentation
CN117710917A (en) End-to-end multi-mode multi-task automatic driving sensing method and device based on long-short time sequence mixed coding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant