WO2023040068A1 - 感知模型训练方法、基于感知模型的场景感知方法 - Google Patents

感知模型训练方法、基于感知模型的场景感知方法 Download PDF

Info

Publication number
WO2023040068A1
WO2023040068A1 PCT/CN2021/135453 CN2021135453W WO2023040068A1 WO 2023040068 A1 WO2023040068 A1 WO 2023040068A1 CN 2021135453 W CN2021135453 W CN 2021135453W WO 2023040068 A1 WO2023040068 A1 WO 2023040068A1
Authority
WO
WIPO (PCT)
Prior art keywords
center point
data
perception
loss function
scene
Prior art date
Application number
PCT/CN2021/135453
Other languages
English (en)
French (fr)
Inventor
贾楠
徐倩
杨鑫
Original Assignee
惠州市德赛西威汽车电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 惠州市德赛西威汽车电子股份有限公司 filed Critical 惠州市德赛西威汽车电子股份有限公司
Publication of WO2023040068A1 publication Critical patent/WO2023040068A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/24 - Classification techniques
    • G06F 18/25 - Fusion techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/80 - Geometric correction
    • G06T 7/00 - Image analysis
    • G06T 7/80 - Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration

Definitions

  • the present application relates to the technical field of automotive electronics, and more specifically, to a perception model training method and a perception model-based scene perception method.
  • in recent years, thanks to technical breakthroughs in artificial intelligence, ADAS (Advanced Driver Assistance Systems) and driverless technology have developed rapidly.
  • a vehicle can obtain information about its current environment through on-board sensors such as on-board cameras; the on-board processing system then processes and refines this sensor information with perception algorithms so that the vehicle control system can make corresponding decisions.
  • as an important part of a driverless system, the perception algorithm provides the prerequisites and guarantees for vehicle safety.
  • related technologies based on image classification algorithms can only give rough information: a classification algorithm can report that there are people and vehicles in the scene, but it cannot produce more detailed information such as "fewer than 5 people", "more than 10 vehicles" or the presence of traffic signs, and it perceives small objects in the scene, such as traffic lights and traffic signs, only weakly. By contrast, the present technology not only reports conditions such as weather, illumination, time period and road section, but also provides more specific structured information for sensitive objects such as pedestrians, vehicles and traffic signs.
  • the present application provides a perception model training method and a perception model-based scene perception method.
  • a perception model training method applied in automotive electronic products, said method comprises the following steps:
  • S1 initializes the parameters of the perception model;
  • S2 acquires image data and performs a forward operation on the image data to obtain the total value of the loss function of the image data;
  • S3 performs a backward operation according to the total value of the loss function to update the parameters of the perception model;
  • S4 iterates steps S2 to S3, and when the number of iterations reaches a preset number, the final perception model is obtained.
  • in step S1, the perception model parameters are initialized by assigning random values.
  • in step S2, acquiring the image data includes the following steps:
  • S21 collects multiple images through the vehicle-mounted camera;
  • S22 preprocesses each image and annotates it to form a data set.
  • step S22 includes:
  • S221 calibrates the image data through a camera calibration algorithm to obtain distortion parameters;
  • S222 obtains the corrected coordinates through the correction formula and obtains the corrected image through a bilinear interpolation algorithm;
  • S223 performs image scaling and applies 2D-frame and multi-label annotation to the image;
  • in the correction formula, k1, k2, p1, p2 and k3 are the distortion parameters, x and y are the pixel coordinates after correction, x′ and y′ are the pixel coordinates before correction, and r = x² + y².
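  • as a rough illustration, a minimal sketch of applying a radial-tangential distortion model of this kind is given below; because the publication reproduces the formula only as an image, the exact arrangement of the terms, the function name and the use of NumPy are assumptions, and the distortion parameters are the ones produced by the camera calibration step.

```python
import numpy as np

def distorted_coords(x, y, k1, k2, k3, p1, p2):
    """Map corrected coordinates (x, y) to the pre-correction coordinates
    (x', y') with a standard radial-tangential model, following the
    document's convention r = x^2 + y^2 (works element-wise on NumPy arrays).
    In practice x, y would be normalized camera coordinates."""
    r = x ** 2 + y ** 2
    radial = 1 + k1 * r + k2 * r ** 2 + k3 * r ** 3
    x_prime = x * radial + 2 * p1 * x * y + p2 * (r + 2 * x ** 2)
    y_prime = y * radial + p1 * (r + 2 * y ** 2) + 2 * p2 * x * y
    return x_prime, y_prime

# To build the corrected image, look up (x', y') for every corrected pixel (x, y)
# and sample the original image there with bilinear interpolation (steps S222 / S52).
```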
  • in step S2, the forward operation on the image data is computed with a convolutional neural network algorithm; the specific steps include:
  • S23 maps the image data to a feature map through the CNN feature extraction module;
  • S24 feeds the feature map into a linear mapping layer that maps it to a C_scene × 1 matrix; this C_scene × 1 matrix is the multi-label classification data output by the multi-label classification branch, and C_scene is the number of classes of the multi-label classification branch;
  • S25 performs 2^(n-2)-times upsampling on the feature map to obtain a detection-head feature map whose width and height are W/R and H/R of the original image, where R = 4 is the downsampling factor;
  • in S26, the detection-head feature map is passed through separate convolution layers to obtain the data output by three branches: center point data, center point offset data, and size data.
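  • to make the data flow of S23-S26 concrete, a minimal PyTorch sketch is shown below; the backbone is only a stand-in for MobileNetV2/CSPNet, and the channel widths, C_scene, the number of detection classes and the use of bilinear upsampling are illustrative assumptions rather than the patent's exact architecture.

```python
import torch
import torch.nn as nn

class PerceptionModel(nn.Module):
    """Minimal sketch of steps S23-S26: backbone -> multi-label branch,
    upsampling -> detection-head feature map -> three conv branches."""

    def __init__(self, c_scene=10, num_classes=8, feat_ch=64, n=5):
        super().__init__()
        # S23: feature extraction; output stride 2**n (n = 5 -> 1/32 of the input)
        layers, ch = [], 3
        for _ in range(n):
            layers += [nn.Conv2d(ch, feat_ch, 3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
            ch = feat_ch
        self.backbone = nn.Sequential(*layers)   # stand-in for MobileNetV2 / CSPNet
        # S24: multi-label classification branch (global pool + linear map -> C_scene x 1)
        self.cls_head = nn.Linear(feat_ch, c_scene)
        # S25: 2**(n-2)-times upsampling -> detection-head feature map at stride R = 4
        self.upsample = nn.Upsample(scale_factor=2 ** (n - 2), mode="bilinear",
                                    align_corners=False)
        # S26: three convolutional branches
        self.heatmap = nn.Conv2d(feat_ch, num_classes, 3, padding=1)  # center point data
        self.offset = nn.Conv2d(feat_ch, 2, 3, padding=1)             # center point offset data
        self.size = nn.Conv2d(feat_ch, 2, 3, padding=1)               # size (w, h) data

    def forward(self, x):               # x: B x 3 x H x W, H and W divisible by 2**n
        f = self.backbone(x)
        scene_logits = self.cls_head(f.mean(dim=(2, 3)))   # B x C_scene
        d = self.upsample(f)
        return scene_logits, torch.sigmoid(self.heatmap(d)), self.offset(d), self.size(d)
```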
  • the total value of the loss function is the sum of the multi-label separation loss function and the 2D frame object detection loss function.
  • the multi-label separation loss function is calculated by formula (2), in which Y_i is the true value of category i and Ŷ_i is the estimated value of category i.
  • the 2D frame object detection loss function is the sum of a center point loss function, a center point offset loss function, and a size loss function;
  • the center point loss function is calculated by a formula in which α and β are the hyperparameters of the center point loss function, the remaining terms are the true and predicted values of the center point position, W and H are the width and height of the original image, R is the downsampling factor, and C is the number of detection categories;
  • the center point offset loss function is calculated by formula (4), in which p ∈ R² is the true value of the center point position and the other terms are the predicted center point position and the predicted center point offset;
  • the size loss function is calculated by a formula in which the predicted 2D box size is compared with the true 2D box size s_k.
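  • the loss formulas themselves appear only as images in the publication, so the sketch below assumes the usual CenterNet-style terms: binary cross-entropy for the multi-label branch, a penalty-reduced focal loss for the center-point heat map, and L1 losses for the offset and size branches; the hyperparameter values and the masking scheme are likewise assumptions.

```python
import torch
import torch.nn.functional as F

def total_loss(scene_logits, scene_gt, heat_pred, heat_gt,
               off_pred, off_gt, size_pred, size_gt, mask,
               alpha=2.0, beta=4.0):
    """Sketch of the total loss: multi-label loss + 2D-box detection loss
    (center point + center point offset + size). heat_pred is assumed to be
    the sigmoid heat map, scene_gt a multi-hot float tensor, and mask marks
    the annotated center locations (1 there, 0 elsewhere)."""
    # multi-label branch: binary cross-entropy over the C_scene labels
    l_cls = F.binary_cross_entropy_with_logits(scene_logits, scene_gt)

    # center-point branch: penalty-reduced focal loss over the heat map
    pos = heat_gt.eq(1).float()
    neg = 1.0 - pos
    eps = 1e-6
    l_pos = -((1 - heat_pred) ** alpha) * torch.log(heat_pred + eps) * pos
    l_neg = -((1 - heat_gt) ** beta) * (heat_pred ** alpha) * torch.log(1 - heat_pred + eps) * neg
    n_obj = pos.sum().clamp(min=1)
    l_center = (l_pos + l_neg).sum() / n_obj

    # offset and size branches: L1 losses evaluated only at annotated centers
    l_off = (F.l1_loss(off_pred, off_gt, reduction="none") * mask).sum() / n_obj
    l_size = (F.l1_loss(size_pred, size_gt, reduction="none") * mask).sum() / n_obj

    return l_cls + l_center + l_off + l_size
```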
  • in step S3, performing the backward operation according to the total value of the loss function to update the perception model parameters includes:
  • performing the backward calculation through the chain rule of differentiation according to the total value of the loss function, and updating the parameters of the perception model.
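  • building on the PerceptionModel and total_loss sketches above, one training iteration could look as follows; the Adam optimizer, the learning rate and the batch layout are assumptions, since the text only specifies that the parameters are updated by a backward pass through the chain rule.

```python
import torch

model = PerceptionModel()                                   # sketch defined above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)   # optimizer choice is an assumption

def train_step(model, batch):
    images, targets = batch                        # ground-truth tensors for every branch
    optimizer.zero_grad()
    scene_logits, heat, off, size = model(images)  # forward operation (part of step S2)
    loss = total_loss(scene_logits, targets["scene"], heat, targets["heat"],
                      off, targets["off"], size, targets["size"], targets["mask"])
    loss.backward()                                # backward operation via autograd (step S3)
    optimizer.step()                               # parameter update
    return loss.item()
```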
  • in addition, the present application also provides a scene perception method based on a perception model, where the perception model is obtained through the above-mentioned perception model training method; the scene perception method includes:
  • S5 acquires a scene image and preprocesses it to obtain an input image;
  • S6 inputs the input image into the perception model for inference to obtain an inference result;
  • S7 parses the inference result and performs information fusion to obtain a scene perception result.
  • step S5 includes:
  • S51 calibrates the scene image through a camera calibration algorithm to obtain distortion parameters;
  • S52 obtains the corrected coordinates through the correction formula and obtains the corrected image through a bilinear interpolation algorithm;
  • S53 performs image scaling to obtain the input image;
  • in the correction formula, k1, k2, p1, p2 and k3 are the distortion parameters, x and y are the pixel coordinates after correction, x′ and y′ are the pixel coordinates before correction, and r = x² + y².
  • step S6 includes:
  • the input image is input into the perception model, and the inference result output by the perception model is the multi-label classification data together with the center point data, center point offset data, and size data of the 2D frame.
  • the parsing and information fusion of the inference result to obtain a scene perception result includes:
  • S71 analyzes the center point data, center point offset data, and size data information to obtain 2D detection frame information
  • S72 parses the multi-label classification data to obtain scene information
  • S73 fuses 2D detection frame information and classification information to obtain the final scene perception result.
  • in step S71, parsing the three branches of information (center point, center point offset, and size) to obtain the 2D detection frame information includes:
  • S711 predicts the coarse position coordinates of the 2D frame center point from the center point data;
  • S712 predicts the offset of the 2D frame center point from the center point offset data and adds it to the coarse center point to obtain the final precise 2D frame center position coordinates;
  • S713 predicts the width and height of the 2D frame from the size data;
  • S714 fuses the center point data, center point offset data, and size data to obtain the complete 2D detection frame information.
  • the step S73 includes: obtaining object information through the 2D detection frame information, and predicting scene information through multi-label classification data.
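  • a minimal sketch of S71-S73 follows; the peak threshold, the simple max-pooling non-maximum suppression and the class-index conventions are assumptions, and `heat`, `off` and `size` are the tensors produced by the detection branches of the model sketch above.

```python
import torch
import torch.nn.functional as F

def decode_detections(heat, off, size, score_thr=0.3, R=4):
    """S711-S714: turn the three branch outputs of one image into 2D boxes.
    heat: 1 x C x H x W (after sigmoid), off/size: 1 x 2 x H x W."""
    pooled = F.max_pool2d(heat, 3, stride=1, padding=1)
    heat = heat * (pooled == heat).float()          # keep only local maxima of the heat map
    boxes = []
    c_idx, y_idx, x_idx = torch.where(heat[0] > score_thr)
    for c, y, x in zip(c_idx.tolist(), y_idx.tolist(), x_idx.tolist()):
        dx, dy = off[0, 0, y, x].item(), off[0, 1, y, x].item()   # S712: refine the coarse center
        w, h = size[0, 0, y, x].item(), size[0, 1, y, x].item()   # S713: box width / height
        cx, cy = (x + dx) * R, (y + dy) * R                       # back to input-image pixels
        boxes.append((c, heat[0, c, y, x].item(),
                      cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

def fuse(boxes, scene_labels, vehicle_class=0):
    """S72/S73: combine the 2D detections with the multi-label scene output."""
    return {"scene": scene_labels,
            "boxes": boxes,
            "num_vehicles": sum(1 for c, *_ in boxes if c == vehicle_class)}
```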
  • the present application trains a perception model and obtains scene image data through the perception model with high accuracy; because the convolutional neural network algorithm performs the 2D object detection task and the multi-target classification task at the same time, the amount of computation is greatly reduced, and the perception result can be calculated in real time on the vehicle processor.
  • FIG. 1 is a flowchart of a perception model training method according to an embodiment of the present application.
  • FIG. 2 is a flowchart of a scene perception method according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a scene image of an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a rectified image according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a convolutional network algorithm according to an embodiment of the present application.
  • in addition, terms such as "first" and "second" are used for descriptive purposes only; they mainly distinguish different devices, elements or components (whose specific types and structures may be the same or different) and neither state nor imply the relative importance or quantity of the indicated devices, elements or components.
  • the present application provides a perceptual model training method, which is applied to automotive electronic products, and the method includes the following steps:
  • step S1 initializes the parameters of the perception model; in step S1, the parameters of the perception model are initialized by assigning random values.
  • the random number generation function of the system can be used to generate corresponding random numbers and assign values to the parameters of the perception model.
  • S2 obtains the image data and performs the forward operation on it to obtain the total value of the loss function of the image data; in step S2, obtaining the image data includes the following steps: S21 collects multiple images through the vehicle-mounted camera; S22 preprocesses each image and annotates it to form a data set.
  • the forward operation on the image data is computed with a convolutional neural network algorithm, and the specific steps include: S23 maps the image data to a feature map through the CNN feature extraction module; S24 feeds the feature map into a linear mapping layer that maps it to a C_scene × 1 matrix, which is the multi-label classification data output by the multi-label classification branch, C_scene being the number of classes of that branch; S25 performs 2^(n-2)-times upsampling on the feature map to obtain a detection-head feature map whose width and height are W/R and H/R of the original image, where R = 4 is the downsampling factor; S26 passes the detection-head feature map through separate convolution layers to obtain three branches: center point data, center point offset data, and size data.
  • the forward operation therefore performs feature extraction, feature classification, feature upsampling, and 2D-frame object detection on the image data in sequence according to the convolutional neural network algorithm; it yields the multi-label classification data as well as the center point data, center point offset data, and size data of the 2D frame, and the total value of the loss function of the image data is calculated from the forward operation.
  • step S3 performs a backward operation to update the parameters of the perception model according to the total value of the loss function; in step S3, the backward operation is the reverse derivation of the forward operation, and the parameters of the perception model are updated according to the total value of the loss function.
  • step S4 iterates steps S2 to S3, and when the number of iterations reaches a preset number, a final perceptual model is obtained.
  • in step S4, the application uses the perception model parameters generated by the previous S3 as the perception model parameters for the next S2, iterating S2 and S3 multiple times.
  • the preset number of iterations can be determined by stopping once the outputs obtained for the input image data remain consistently in agreement with the pre-acquired 2D frames and multi-label annotations, at which point the final perception model is generated. In this embodiment, the preset number may be 100,000 iterations.
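  • as an illustration of step S4, the iteration over S2 and S3 can be written as an outer loop over a data loader; the 100,000-iteration budget follows this embodiment, while `train_loader` and the use of the train_step sketch above are assumptions.

```python
MAX_ITERS = 100_000          # preset number of iterations from this embodiment

iteration = 0
while iteration < MAX_ITERS:
    for batch in train_loader:              # S2: acquire (pre-annotated) image data
        loss = train_step(model, batch)     # S2 forward + S3 backward / parameter update
        iteration += 1
        if iteration >= MAX_ITERS:
            break
# the parameters reached at this point constitute the final perception model (end of S4)
```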
  • in this embodiment, the application is based on a CNN network model and has high accuracy; multi-task learning reuses the feature extraction module so that the 2D object detection task and the multi-target classification task are performed simultaneously, which greatly reduces the amount of computation and allows the perception result to be calculated in real time on the vehicle processor.
  • step S2 acquiring image data includes the following steps:
  • step S21 collects a plurality of image data through the vehicle-mounted camera; in step S21, the image data of the present application can be obtained through the vehicle-mounted camera, and the image data is continuously obtained by the vehicle driving on the road.
  • Step S22 preprocesses each image data, and marks each image data to form a data set.
  • step S22 includes: S221 calibrates the image data through the camera calibration algorithm to obtain the distortion parameters; S222 obtains the corrected coordinates through the correction formula and obtains the corrected image shown in FIG. 4 through the bilinear interpolation algorithm; S223 performs image scaling and applies 2D-frame and multi-label annotation to the image.
  • in the correction formula, k1, k2, p1, p2 and k3 are the distortion parameters, x and y are the pixel coordinates after correction, x′ and y′ are the pixel coordinates before correction, and r = x² + y².
  • the camera calibration algorithm may be Zhang Zhengyou's calibration algorithm
  • the image scaling may use a bilinear interpolation method to scale the image to a uniform size, such as 1280 pixels × 720 pixels.
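  • for the scaling step, a one-line OpenCV call with bilinear interpolation is enough; the 1280 × 720 target follows the example in the text, while the use of OpenCV itself is an assumption.

```python
import cv2

def scale_image(corrected_img):
    # bilinear interpolation to the uniform size used in this embodiment
    return cv2.resize(corrected_img, (1280, 720), interpolation=cv2.INTER_LINEAR)
```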
  • the 2D-frame and multi-label labeling of the images can be done manually by drawing 2D frames around, and assigning multiple labels to, the features in the image data.
  • in step S2, the forward operation on the image data is computed with a convolutional neural network algorithm, where the convolutional neural network comprises a feature extraction module, a multi-label classification module, an upsampling module, and a 2D-frame object detection module; the specific steps include:
  • step S23 maps the image data to a feature map through the CNN feature extraction module; in step S23, a lightweight network such as MobileNetV2 or CSPNet is used as the feature extraction module, and after passing through it the input picture is mapped to a feature map whose width and height are 1/2^n of those of the input image, with n typically 5.
  • step S24 feeds the feature map into a linear mapping layer that maps it to a C_scene × 1 matrix, the multi-label classification data output by the multi-label classification branch, where C_scene is the number of classes of that branch.
  • S25 performs 2^(n-2)-times upsampling on the feature map to obtain a detection-head feature map whose width and height are W/R and H/R of the original image, where R = 4 is the downsampling factor; in this embodiment, the upsampling module is used for this step in order to carry out 2D-frame object detection.
  • S26 passes the detection-head feature map through separate convolution layers to obtain the output data of three branches: center point data, center point offset data, and size data.
  • in step S26, the detection-head feature map passes through the convolution layers to obtain the three branches: the coordinates of the center point position are obtained from the center point data, the offset of the center point position is obtained from the center point offset data, and the width and height of the 2D box are obtained from the size data.
  • referring to FIG. 5, the perception model of the present application has four computation branches, i.e., computation operations: the multi-label classification branch, the center point branch, the center point offset branch, and the size branch; the linear mapping layer, the upsampling layer, and the convolution layers are the concrete operations within these branches.
  • in the multi-label classification branch, the feature map is input and processed by the linear mapping layer, and the branch outputs the multi-label classification data, i.e., a C_scene × 1 matrix.
  • in the center point branch, the detection-head feature map is input and processed by a convolution layer, and the branch outputs the center point data as a matrix.
  • in the center point offset branch, the detection-head feature map is input and processed by a convolution layer, and the branch outputs the center point offset data as a matrix.
  • in the size branch, the detection-head feature map is input and processed by a convolution layer, and the branch outputs the size data as a matrix. The center point branch, the center point offset branch, and the size branch together constitute the 2D-frame object detection task.
  • after performing feature extraction in S23, the application performs the multi-label classification task through S24 to obtain the multi-label classification data, and obtains the center point data, center point offset data, and size data through the S25 upsampling and S26 2D-frame detection tasks.
  • the total value of the loss function is the sum of the multi-label separation loss function and the 2D frame object detection loss function.
  • the multi-label separation loss function is calculated by formula (2), in which Y_i is the true value of category i and Ŷ_i is the estimated value of category i.
  • the 2D frame object detection loss function is the sum of the center point loss function, the center point offset loss function, and the size loss function.
  • the center point loss function is calculated by a formula in which α and β are the hyperparameters of the center point loss function, the remaining terms are the true and predicted values of the center point position, W and H are the width and height of the original image, R is the downsampling factor, and C is the number of detection categories.
  • the center point offset loss function is calculated by formula (4), in which p ∈ R² is the true value of the center point position and the other terms are the predicted center point position and the predicted center point offset.
  • the size loss function is calculated by a formula in which the predicted 2D box size is compared with the true 2D box size s_k.
  • the present application calculates the total value of the loss function by calculating the loss function of each part of the forward operation.
  • in step S3, a backward operation is performed to update the perception model parameters; this includes performing the backward calculation through the chain rule of differentiation according to the total value of the loss function and updating the perception model parameters.
  • in this embodiment, following the chain rule, the backward calculation proceeds in the reverse order of the steps of the forward operation, and the perception model parameters are computed and updated from the input image, the outputs of the forward operation, and the total value of the loss function.
  • the backward calculation is implemented with machine learning libraries such as PyTorch or TensorFlow.
  • the present application also provides a perception model-based scene perception method, where the perception model is obtained through the above-mentioned perception model training method; the scene perception method includes:
  • step S5 acquires a scene image and preprocesses it to obtain an input image; in step S5, the scene image is acquired in real time by the camera while the vehicle is driving, and the step specifically includes: S51 calibrates the scene image through the camera calibration algorithm to obtain the distortion parameters; S52 obtains the corrected coordinates through the correction formula and obtains the corrected image through the bilinear interpolation algorithm; S53 performs image scaling to obtain the input image.
  • in the correction formula, k1, k2, p1, p2 and k3 are the distortion parameters, x and y are the pixel coordinates after correction, x′ and y′ are the pixel coordinates before correction, and r = x² + y².
  • the camera calibration algorithm may be Zhang Zhengyou's calibration algorithm
  • the image scaling may use a bilinear interpolation method to scale the image to a uniform size, such as 1280 pixels × 720 pixels.
  • step S6 inputs the input image into the perception model for inference and obtains the inference result; in step S6, the inference result output by the perception model is the multi-label classification data together with the center point data, center point offset data, and size data of the 2D frame.
  • specifically, the forward operation on the input image is computed with the convolutional neural network algorithm, which comprises the feature extraction module, the multi-label classification module, the upsampling module, and the 2D-frame object detection module. The specific steps include: the input image is mapped to a feature map through the CNN feature extraction module, for which a lightweight network such as MobileNetV2 or CSPNet is used; the width and height of the feature map are 1/2^n of those of the input image, with n typically 5. The feature map is fed into the linear mapping layer and mapped to a C_scene × 1 matrix, the multi-label classification data output by the multi-label classification branch, where C_scene is the number of classes of that branch. The feature map is upsampled by 2^(n-2) to obtain a detection-head feature map whose width and height are W/R and H/R of the original image, where R = 4 is the downsampling factor.
  • the detection-head feature map then passes through the convolution layers to obtain the three branches: the coordinates of the center point position are obtained from the center point data, the offset of the center point position is obtained from the center point offset data, and the width and height of the 2D box are obtained from the size data.
  • referring to FIG. 5, this application extracts the features of the picture through the feature extraction module, then performs the multi-label classification task through the multi-label classification module to obtain the multi-label classification data, and performs the 2D-frame detection task through the upsampling module and the 2D-frame object detection module to obtain the center point data, center point offset data, and size data.
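  • reusing the PerceptionModel sketch above, inference in step S6 reduces to a single forward pass without gradients; the 0.5 decision threshold on the multi-label branch and the H × W × 3 uint8 input layout are assumptions, and padding the input to a multiple of the backbone stride is omitted here.

```python
import torch

def infer(model, input_image):
    """Step S6: run the trained perception model on one preprocessed image
    (input_image: H x W x 3 uint8 NumPy array after correction and scaling)."""
    model.eval()
    with torch.no_grad():
        x = torch.from_numpy(input_image).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        scene_logits, heat, off, size = model(x)
    scene_labels = (torch.sigmoid(scene_logits)[0] > 0.5).nonzero().flatten().tolist()
    return scene_labels, heat, off, size
```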
  • step S7 analyzes the inference results and fuses information to obtain scene perception results.
  • the inference result is analyzed and information fusion is performed to obtain the scene perception result, including: S71 analyzing the center point data, center point offset data, and size data information to obtain 2D detection frame information; S72 analyzing the multi-label classification data to obtain the scene information; S73 fuses 2D detection frame information and classification information to obtain the final scene perception result.
  • in some embodiments, in step S71, the three branches of information (center point, center point offset, and size) are parsed to obtain the 2D detection frame information, including: S711 predicts the coarse position coordinates of the 2D frame center point from the center point data; S712 predicts the offset of the 2D frame center point from the center point offset data and adds it to the coarse center point to obtain the final precise 2D frame center position coordinates; S713 predicts the width and height of the 2D frame from the size data; S714 fuses the center point data, center point offset data, and size data to obtain the complete 2D detection frame information.
  • step S73 includes: obtaining object information through 2D detection frame information, and predicting scene information through multi-label classification data.
  • the positions and number of objects can be obtained from the 2D detection frame information, and from this the objects contained in the picture and their quantities are known. For example, as shown in FIG. 3, since two vehicles are detected, the perception result "fewer than 5 vehicles" can be output, and since a traffic light is detected, the perception result "traffic light" can be output.
  • the multi-label classification data predicts the scene information.
  • for example, as shown in FIG. 3, perception results such as "sunny", "daytime", "urban road", "asphalt road" and "intersection" can be output.
  • perception result of the existing technology: sunny, daytime, urban road.
  • perception result of the scene perception method of the present application: sunny, daytime, urban road, asphalt road, intersection, fewer than 5 vehicles, traffic light, traffic sign. More objects are recognized, and more precisely.
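  • finally, turning the fused output into structured statements like those in the comparison above could look like the sketch below; the scene label names, the class indices, and the "fewer than 5" rule are illustrative assumptions that mirror the FIG. 3 example, and `result` is the dictionary returned by the fuse() sketch earlier.

```python
SCENE_NAMES = ["sunny", "daytime", "urban road", "asphalt road", "intersection"]
TRAFFIC_LIGHT_CLASS = 1      # class-index convention is an assumption

def describe(result):
    """Summarise the fused output (see fuse() above) as perception statements."""
    statements = [SCENE_NAMES[i] for i in result["scene"] if i < len(SCENE_NAMES)]
    if result["num_vehicles"] < 5:
        statements.append("fewer than 5 vehicles")
    if any(c == TRAFFIC_LIGHT_CLASS for c, *_ in result["boxes"]):
        statements.append("traffic light")
    return statements
```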

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

本申请涉及感知模型训练方法、基于感知模型的场景感知方法;应用于汽车电子产品中,所述感知模型训练方法包括以下步骤:S1初始化感知模型参数;S2获取图像数据,并对所述图像数据进行前向运算,获得图像数据的损失函数总值;S3根据所述损失函数总值,进行后向运算对感知模型参数进行更新;S4迭代步骤S2到S3,当迭代次数达到预设次数时,获得最终感知模型。本申请的有益效果是:本申请训练感知模型,并通过感知模型获取场景图像的数据,准确度高,同时通过卷积神经网络算法进行2D物体检测任务和多目标分类任务,大大降低计算量,可在车载处理器上进行实时计算,得到感知结果。

Description

感知模型训练方法、基于感知模型的场景感知方法 技术领域
本申请涉及汽车电子技术领域,更具体地,涉及感知模型训练方法、基于感知模型的场景感知方法。
背景技术
近年来借助人工智能领域的技术突破,ADAS(高级驾驶辅助***)和无人驾驶技术有了飞速发展。车辆可通过车载摄像头等车载传感器获取车辆当前环境信息,之后车载处理***通过感知算法对传感器信息进行处理提炼,供车辆控制***做出相应决策。感知算法作为无人驾驶***的重要部分,为车辆的安全提供先决条件和保障。
相关技术中基于图像分类算法,能粗略的给出大致信息,但不能给出更加具体的信息,比如分类算法能给出场景中有人和车辆,但是不能得到例如:人少于5,车辆多于10,交通标志牌等更加细致的信息,而且相关技术对于场景中小尺寸物体的感知力弱,如:交通灯,交通标志牌。相比下本技术不但能给出诸如天气,光照,时段,路段等情况,还不能针对更加敏感物体,如行人、车辆,交通标识等给出更加具体的结构化信息。
发明内容
本申请为克服上述现有技术中的问题,本申请提供感知模型训练方法、基于感知模型的场景感知方法。
一种感知模型训练方法,应用于汽车电子产品中,所述方法包括以下步骤:
S1初始化感知模型参数;
S2获取图像数据,并对所述图像数据进行前向运算,获得图像数据的损失函数总值;
S3根据所述损失函数总值,进行后向运算对感知模型参数进行更新;
S4迭代步骤S2到S3,当迭代次数达到预设次数时,获得最终感知模型。
可选地,在步骤S1中,所述感知模型参数通过随机值赋值进行初始化。
可选地,在步骤S2中,所述获取图像数据,包括以下步骤:
S21通过车载摄像采集多个图像数据;
S22对每一所述图像数据进行预处理,并在每一图像数据标注形成数据集。
可选地,所述步骤S22包括:
S221通过相机标定算法,对所述图像数据进行标定获得畸变参数;
S222通过矫正公式获得获取矫正坐标,并通过双线性差值算法得到矫正图像;
S223进行图像缩放,并对图像进行2D框和多标签标注;
其中,所述矫正公式为:
Figure PCTCN2021135453-appb-000001
k 1,k 2,p 1,p 2,k3为畸变参数,x,y为矫正后像素坐标,x′,y′为矫正前像素坐标,r=x 2+y 2
可选地,在步骤S2中,所述对所述图像数据进行前向运算基于卷积神经网络算法进行计算,具体步骤包括:
S23通过CNN特征提取模块,将图像数据映射为特征图;
S24将所述特征图链接线性映射层,将特征图映射为C scene×1的矩阵,C scene×1的矩阵为多标签分类分支输出的多标签分类数据,C scene为多标签分类分支的分类数;
S25对所述特征图进行2 n-2倍上采样,得到宽高为原图像
Figure PCTCN2021135453-appb-000002
的检测头特征图,其中R=4为下采样因子。
S26所述检测头特征图分别经过卷积层得到三个分支输出的数据:中心点数据、中心点偏移数据、尺寸数据。
可选地,所述损失函数总值为多标签分离损失函数与2D框物体检测损失函数之和。
可选地,所述多标签分离损失函数,通过以下公式计算:
Figure PCTCN2021135453-appb-000003
其中,Y i为类别真值,
Figure PCTCN2021135453-appb-000004
为类别估计值。
可选地,所述2D框物体检测损失函数为中心点损失函数、中心点偏移损失函数、尺寸损失函数之和;
其中,
中心点损失函数通过以下公式计算:
Figure PCTCN2021135453-appb-000005
其中α,β为中心点损失函数的超参数,
Figure PCTCN2021135453-appb-000006
为中心点位置真值,
Figure PCTCN2021135453-appb-000007
为中心点位置预测值,W,H为原图像宽,高,R为下采样因子,C为检测类别数。
中心点偏移损失函数通过以下公式计算:
Figure PCTCN2021135453-appb-000008
(4),其中,p∈R 2为中心点位置真实值,
Figure PCTCN2021135453-appb-000009
为中心点位置预测值,
Figure PCTCN2021135453-appb-000010
为中心点位置偏移预测值
尺寸损失函数通过以下公式计算:
Figure PCTCN2021135453-appb-000011
其中,
Figure PCTCN2021135453-appb-000012
为2D框尺寸预测值,s k为2D框尺寸真值。
可选地,在步骤S23中,所述根据所述损失函数总值,进行后向运算对感知模型参数进行更新,包括:
根据所述损失函数总值通过链式求导法则进行后向运算,对感知模型参数进行更新。
此外,本申请还提供一种基于感知模型的场景感知方法,所述感知模型通过上述的一种感知模型训练方法获得;所述场景感知方法包括:
S5获取场景图像,并对所述输入图像进行预处理,获得输入图像;
S6将所述输入图像,输入所述感知模型进行推断,获得推断结果;
S7对所述推断结果进行解析和信息融合得出场景感知结果。
可选地,所述步骤S5包括:
S51通过相机标定算法,对所述场景图像进行标定获得畸变参数;
S52通过矫正公式获得获取矫正坐标,并通过双线性差值算法得到矫正图像;
S53进行图像缩放,获得输入图像;
其中,所述矫正公式为:
Figure PCTCN2021135453-appb-000013
k 1,k 2,p 1,p 2,k3为畸变参数,x,y为矫正后像素坐标,x′,y′为矫正前像素坐标,r=x 2+y 2
可选地,所述步骤S6包括:
将所述输入图像输入到感知模型,所述感知模型输出的推断结果为多标签分类数据,以及2D框的中心点数据、中心点偏移数据、尺寸数据。
可选地,在所述步骤S7中,所述对所述推断结果进行解析和信息融合得出场景感知结果,包括:
S71解析中心点数据、中心点偏移数据、尺寸数据信息得到2D检测框信息;
S72解析多标签分类数据得到场景信息;
S73融合2D检测框信息和分类信息得到最终场景感知结果。
可选地,在步骤S71中,所述解析中心点,中心点偏移,尺寸三个分支信息得到2D检测框信息,包括:
S711通过中心点数据预测模糊2D框中心点位置坐标;
S712通过中心点偏移数据可以预测2D框中心点的偏移,与模糊中心点相加得到最后的精确2D框中心位置坐标;
S713通过尺寸数据可以预测2D框的宽高信息;
S714通过中心点数据、中心点偏移数据、尺寸数据的信息融合得到完整2D检测框信息。
可选地,所述步骤S73包括:通过所述2D检测框信息获得物体信息,通过多标签分类数据预测场景信息。
与现有技术相比,本申请的有益效果是:本申请训练感知模型,并通过感知模型获取场景图像的数据,准确度高,同时通过卷积神经网络算法进行2D物体检测任务和多目标分类任务,大大降低计算量,可在车载处理器上进行实时计算,得到感知结果。
附图说明
图1为本申请实施例的感知模型训练方法流程图。
图2为本申请实施例的场景感知方法流程图。
图3为本申请实施例的场景图像示意图。
图4为本申请实施例的矫正图像示意图。
图5为本申请实施例的卷积网络算法示意图。
具体实施方式
下面结合具体实施方式对本申请作进一步的说明。
本申请实施例的附图中相同或相似的标号对应相同或相似的部件;在本申请的描述中,需要理解的是,若有术语“上”、“下”、“左”、“右”、“顶”、“底”、“内”、“外”等指示的方位或位置关系为基于附图所示的方位或位置关系,仅是为了便于描述本申请和简化描述,而不是指示或暗示所指的装置或元件必须具有特定的方位、以特定的方位构造和操作,因此附图中描述位置关系的用语仅用于示例性说明,不能理解为对本专利的限制。
此外,若有“第一”、“第二”等术语仅用于描述目的,主要是用于区分不同的装置、元件或组成部分(具体的种类和构造可能相同也可能不同),并非用于表明或暗示所指示装置、元件或组成部分的相对重要性和数量,而不能理解为指示或者暗示相对重要性。
在如图1所示的实施例中,本申请提供了一种感知模型训练方法,应用于汽车电子产品中,本方法包括以下步骤:
S1初始化感知模型参数;在步骤S1中,感知模型参数通过随机值赋值进行初始化。可通过***的随机数生成函数,生成相应随机数并对感知模型的参数进行赋值。
S2获取图像数据,并对图像数据进行前向运算,获得图像数据的损失函数总值;在步骤2中,获取图像数据,包括以下步骤:S21通过车载摄像采集多个图像数据;S22对每一图像数据进行预处理,并在每一图像数据标注形成数据集。对图像数据进行前向运算基于卷积神经网络算法进行计算,具体步骤包括:S23通过CNN特征提取模块,将图像数据映射为特征图;S24将特征图链接线性映射层,将特征图映射为C scene×1的矩阵,C scene×1的矩阵为多标签分类分支的输出的多标签分类数据,C scene为多标签分类分支的分类数;S25对特征图进行2 n-2倍上采样,得到宽高为原图像
Figure PCTCN2021135453-appb-000014
的检测头特征图,其中R=4为下采样因子。S26检 测头特征图分别经过卷积层得到三个分支:中心点数据、中心点偏移数据、尺寸数据。前向运算即根据卷积神经网络算法对图像数据依次进行特征提取、特征分类、特征上采样、特征2D框物体检测;通过前向运算获取多标签分类分支,以及2D框的中心点数据、中心点偏移数据、尺寸数据。且根据前向运算,计算图像数据的损失函数总值。
S3根据损失函数总值,进行后向运算对感知模型参数进行更新;在步骤S3中,后向运算为前向运算的反向推导,并根据损失函数总值,对感知模型的函数进行更新。
S4迭代步骤S2到S3,当迭代次数达到预设次数时,获得最终感知模型。在步骤S4,本申请通过用过上一次S3生成的感知模型参数,作为下一次S2的感知模型参数对S2、S3多次迭代,预设次数可以根据最终输入图像数据,获得数据与预先获取2D框和多标签标注连续一致时,则停止迭代,生成最终的感知模型。在本实施例中,预设次数可以是10万次。
在本实施例中,本申请基于CNN的网络模型,准确度高,同时利用多任务学习复用特征提取模块同时进行2D物体检测任务和多目标分类任务,大大降低计算量,可在车载处理器上进行实时计算,得到感知结果。
在一些实施例中,在步骤S2中,获取图像数据,包括以下步骤:
S21通过车载摄像采集多个图像数据;在步骤S21中,本申请的图像数据可通过车载摄像头进行获取,通过车辆在道路行驶持续获得图像数据。
S22对每一图像数据进行预处理,并在每一图像数据标注形成数据集。在步骤S22中包括:S221通过相机标定算法,对图像数据进行标定获得畸变参数;S222通过矫正公式获得获取矫正坐标,并通过双线性差值算法得到矫正图像如图4;S223进行图像缩放,并对图像进行2D框和多标签标注;其中,矫正公式为:
Figure PCTCN2021135453-appb-000015
k 1,k 2,p 1,p 2,k3为畸变参数,x,y为矫正后像素坐标,x′,y′为矫正前像素坐标,r=x 2+y 2。在本实施例中,相机标定算法可以是张正友标定算法,图像 缩放可以是采用双线性差值方法,将图像缩放到统一尺寸,如1280像素x720像素。对图像进行2D框和多标签标注可以通过人工对图像数据内的特征进行2D框选择及多标签标注。
在一些实施例中,在步骤S2中,对图像数据进行前向运算基于卷积神经网络算法进行计算,其中,卷积神经网络算法包括特征提取模块、多标签分类模块、上采样模块、2D框物体检测模块;具体步骤包括:
S23通过CNN特征提取模块,将图像数据映射为特征图;在步骤S23中,轻量级网络作为特征提取模块,例如MobileNetV2,CSPNet,经过特征提取模块后输入图片被映射为特征图,特征图宽高为输入图像的1/2 n,n通常为5。
S24将特征图链接线性映射层,将特征图映射为C scene×1的矩阵,C scene×1的矩阵为多标签分类分支输出的多标签分类数据,C scene为多标签分类分支的分类数;
S25对特征图进行2 n-2倍上采样,得到宽高为原图像
Figure PCTCN2021135453-appb-000016
的检测头特征图,其中R=4为下采样因子。在本实施例中,为了进行2D框物体检测,需要利用上采样模块对特征图进行2 n-2倍上采样,得到宽高为原图像
Figure PCTCN2021135453-appb-000017
的检测头特征图,其中R=4为下采样因子。
S26检测头特征图分别经过卷积层得到三个分支输出的数据:中心点数据、中心点偏移数据、尺寸数据。在步骤S26中,检测头特征图分别经过卷积层得到三个分支:中心点数据可得到中心点位置的坐标、中心点偏移数据可得到中心点位置的偏移值、尺寸数据可到2D框的宽高。
在本实施例中,参见图5,在本申请的感知模型中,有四个计算分支,即计算操作,分别为多标签分类分支、中心点分支、中心点偏移分支、尺寸分支,而线性映射层、上采样、卷积层是上述分支中具体的操作。在多标签分类分支中,输入特征图,通过线性映射层操作,多标签分类分支输出多标签分类数据,即C scene×1的矩阵。在中心点分支中,输入检测头特征图,通过卷积层操作,中心点分支输出中心点数据,即
Figure PCTCN2021135453-appb-000018
的矩阵。在中心点偏移分支中,输入检测头特征图,通过卷积层操作,中心点偏移分支输出中心点偏移数据,即
Figure PCTCN2021135453-appb-000019
的矩阵。在尺寸分支中,输入检测头特征图,通过卷积层操作,尺寸分支输出尺寸数据,即
Figure PCTCN2021135453-appb-000020
的矩阵其中,中心点分支,中心点偏移分支,尺寸分支,一起组成2D框物体检测任务。本申请将图片进行S23进行特征提取后,通过S24进行多标签 分类任务,获取多标签分类数据;通过S24上采样、S25 2D框检测任务获取中心点数据、中心点偏移数据、尺寸数据。
在一些实施例中,损失函数总值为多标签分离损失函数与2D框物体检测损失函数之和。
其中,多标签分离损失函数,通过以下公式计算:
Figure PCTCN2021135453-appb-000021
Y i为类别真值,
Figure PCTCN2021135453-appb-000022
为类别估计值。
2D框物体检测损失函数为中心点损失函数、中心点偏移损失函数、尺寸损失函数之和;
中心点损失函数通过以下公式计算:
Figure PCTCN2021135453-appb-000023
其中α,β为中心点损失函数的超参数,
Figure PCTCN2021135453-appb-000024
为中心点位置真值,
Figure PCTCN2021135453-appb-000025
为中心点位置预测值,W,H为原图像宽,高,R为下采样因子,C为检测类别数。
中心点偏移损失函数通过以下公式计算:
Figure PCTCN2021135453-appb-000026
(4),其中,p∈R 2为中心点位置真实值,
Figure PCTCN2021135453-appb-000027
为中心点位置预测值,
Figure PCTCN2021135453-appb-000028
为中心点位置偏移预测值
尺寸损失函数通过以下公式计算:
Figure PCTCN2021135453-appb-000029
其中,
Figure PCTCN2021135453-appb-000030
为2D框尺寸预测值,s k为2D框尺寸真值。
在本实施例中,本申请通过计算前向运算各部分的损失函数,计算损失函数总值。
在一些实施例中,在步骤S23中,根据损失函数总值,进行后向运算对感知模型参数进行更新,包括:根据损失函数总值通过链式求导法则进行后向运算,对感知模型参数进行更新。在本实施例中,根据链式求导法则,沿前向运算的各 步骤的反向进行后向运算,将前向运算输入图像和输出的结果、损失函数总值,计算并更新感知模型参数。其中,后向运算都是机器学习库,如PyToch、TensorFlow等模型实现。
在如图2所示的实施例中,本申请还提供一种基于感知模型的场景感知方法,感知模型通过上述的一种感知模型训练方法获得;场景感知方法包括:
S5获取场景图像,并对输入图像进行预处理,获得输入图像;在步骤S5中,场景图像通过车辆行驶时通过摄像头进行实时获取,具体包括:S51通过相机标定算法,对场景图像进行标定获得畸变参数;S52通过矫正公式获得获取矫正坐标,并通过双线性差值算法得到矫正图像;S53进行图像缩放,获得输入图像。S53进行图像缩放,获得输入图像;其中,矫正公式为:
Figure PCTCN2021135453-appb-000031
k 1,k 2,p 1,p 2,k3为畸变参数,x,y为矫正后像素坐标,x′,y′为矫正前像素坐标,r=x 2+y 2。在本实施例中,相机标定算法可以是张正友标定算法,图像缩放可以是采用双线性差值方法,将图像缩放到统一尺寸,如1280像素x720像素。
S6将输入图像,输入感知模型进行推断,获得推断结果;在步骤S6中,将输入图像输入到感知模型,感知模型输出的推断结果为多标签分类数据,以及2D框的中心点数据、中心点偏移数据、尺寸数据。具体地,对输入图像进行前向运算基于卷积神经网络算法进行计算,其中,卷积神经网络算法包括特征提取模块、多标签分类模块、上采样模块、2D框物体检测模块;具体步骤包括:通过CNN特征提取模块,将图像数据映射为特征图;本申请通过轻量级网络作为特征提取模块,例如MobileNetV2,CSPNet,经过特征提取模块后输入图片被映射为特征图,特征图宽高为输入图像的1/2 n,n通常为5。将特征图链接线性映射层,将特征图映射为C scene×1的矩阵,C scene×1的矩阵为多标签分类分支输出的多标签分类数据,C scene为多标签分类分支的分类数;;对特征图进行2 n-2倍上采样,得到宽高为原图像
Figure PCTCN2021135453-appb-000032
的检测头特征图,其中R=4为下采样因子。在本实施例中,为了进行2D框物体检测,需要利用上采样模块对特征图进行2 n-2倍上采样,得到宽高为原图像
Figure PCTCN2021135453-appb-000033
的检测头特征图,其中R=4为下采样因子。检测头特征 图分别经过卷积层得到三个分支:中心点数据、中心点偏移数据、尺寸数据。在步骤S26中,检测头特征图分别经过卷积层得到三个分支:中心点数据可得到中心点位置的坐标、中心点偏移数据可得到中心点位置的偏移值、尺寸数据可到2D框的宽高。参见图5,本申请将图片通过特征提取模块进行特征提取后,通过多标签分类模块进行多标签分类任务,获取多标签分类数据;通过上采样模块上采样、2D框物体检测模块进行2D框检测任务获取中心点数据、中心点偏移数据、尺寸数据。
S7对推断结果进行解析和信息融合得出场景感知结果。在步骤S7中,对推断结果进行解析和信息融合得出场景感知结果,包括:S71解析中心点数据、中心点偏移数据、尺寸数据信息得到2D检测框信息;S72解析多标签分类数据得到场景信息;S73融合2D检测框信息和分类信息得到最终场景感知结果。
在一些实施例中,在步骤71中,解析中心点,中心点偏移,尺寸三个分支信息得到2D检测框信息,包括:S711通过中心点数据预测模糊2D框中心点位置坐标;S712通过中心点偏移数据可以预测2D框中心点的偏移,与模糊中心点相加得到最后的精确2D框中心位置坐标;S713通过尺寸数据可以预测2D框的宽高信息;S714通过中心点数据、中心点偏移数据、尺寸数据的信息融合得到完整2D检测框信息。
在一些实施例中,步骤S73包括:通过2D检测框信息获得物体信息,通过多标签分类数据预测场景信息。在本实施例中,通过2D检测框信息可以得到物***置和数量,进而可以得到该张图片中包含的物体和数量信息。举例说明:如图3所示,由于检测到2辆车,可输出“车辆少于5”感知结果;由于检测到交通灯,可输出“交通灯”感知结果。多标签分类数据可以预测场景信息。举例说明:如图3所示,可输出“晴天”,“白天”,“城市道路”,“柏油马路”,“路口”等感知结果。现有技术的感知结果:晴天,白天,城市道路。本申请场景感知方法的感知结果:晴天,白天,城市道路,柏油马路,路口,车辆少于5,交通灯,交通标志牌。识别物体更多,更加精确。
显然,本申请的上述实施例仅仅是为清楚地说明本申请所作的举例,而并非是对本申请的实施方式的限定。对于所属领域的普通技术人员来说,在上述说明的基础上还可以做出其它不同形式的变化或变动。这里无需也无法对所有的实施方式予以穷举。凡在本申请的精神和原则之内所作的任何修改、等同替换和改进 等,均应包含在本申请权利要求的保护范围之内。

Claims (15)

  1. 一种感知模型训练方法,其特征在于,应用于汽车电子产品中,所述方法包括以下步骤:
    S1初始化感知模型参数;
    S2获取图像数据,并对所述图像数据进行前向运算,获得图像数据的损失函数总值;
    S3根据所述损失函数总值,进行后向运算对感知模型参数进行更新;
    S4迭代步骤S2到S3,当迭代次数达到预设次数时,获得最终感知模型。
  2. 根据权利要求1所述的一种感知模型训练方法,其特征在于,在步骤S1中,所述感知模型参数通过随机值赋值进行初始化。
  3. 根据权利要求1所述的一种感知模型训练方法,其特征在于,在步骤S2中,所述获取图像数据,包括以下步骤:
    S21通过车载摄像采集多个图像数据;
    S22对每一所述图像数据进行预处理,并在每一图像数据标注形成数据集。
  4. 根据权利要求3所述的一种感知模型训练方法,其特征在于,所述步骤S22包括:
    S221通过相机标定算法,对所述图像数据进行标定获得畸变参数;
    S222通过矫正公式获得获取矫正坐标,并通过双线性差值算法得到矫正图像;
    S223进行图像缩放,并对图像进行2D框和多标签标注;
    其中,所述矫正公式为:
    Figure PCTCN2021135453-appb-100001
    k 1,k 2,p 1,p 2,k3为畸变参数,x,y为矫正后像素坐标,x′,y′为矫正前像素坐标,r=x 2+y 2
  5. 根据权利要求1所述的一种感知模型训练方法,其特征在于,在步骤S2中,所述对所述图像数据进行前向运算基于卷积神经网络算法进行计算,具体步骤包括:
    S23通过CNN特征提取模块,将图像数据映射为特征图;
    S24将所述特征图链接线性映射层,将特征图映射为C scene×1的矩阵,C scene×1的矩阵为多标签分类分支输出的多标签分类数据数据,C scene为多标签分类分支的分类数;
    S25对所述特征图进行2 n-2倍上采样,得到宽高为原图像
    Figure PCTCN2021135453-appb-100002
    的检测头特征图,其中R=4为下采样因子。
    S26所述检测头特征图分别经过卷积层得到三个分支输出数据:中心点数据、中心点偏移数据、尺寸数据。
  6. 根据权利要求1所述的一种感知模型训练方法,其特征在于,所述损失函数总值为多标签分离损失函数与2D框物体检测损失函数之和。
  7. 根据权利要求6所述的一种感知模型训练方法,其特征在于,所述多标签分离损失函数,通过以下公式计算:
    Figure PCTCN2021135453-appb-100003
    (2),其中,Y i为类别真值,
    Figure PCTCN2021135453-appb-100004
    为类别估计值。
  8. 根据权利要求6所述的一种感知模型训练方法,其特征在于,所述2D框物体检测损失函数为中心点损失函数、中心点偏移损失函数、尺寸损失函数之和;
    其中,
    中心点损失函数通过以下公式计算:
    Figure PCTCN2021135453-appb-100005
    其中α,β为中心点损失函数的超参数,
    Figure PCTCN2021135453-appb-100006
    为中心点位置真值,
    Figure PCTCN2021135453-appb-100007
    为中心点位置预测值,W,H为原图像宽,高,R为下采样因子,C为检测类别数。
    中心点偏移损失函数通过以下公式计算:
    Figure PCTCN2021135453-appb-100008
    (4),其中,p∈R 2为中心点位置真实值,
    Figure PCTCN2021135453-appb-100009
    为中心点位置预测值,
    Figure PCTCN2021135453-appb-100010
    为中心点位置偏移预测值
    尺寸损失函数通过以下公式计算:
    Figure PCTCN2021135453-appb-100011
    其中,
    Figure PCTCN2021135453-appb-100012
    为2D框尺寸预测值,s k为2D框尺寸真值。
  9. 根据权利要求1所述的一种感知模型训练方法,其特征在于,在步骤S23中,所述根据所述损失函数总值,进行后向运算对感知模型参数进行更新,包括:
    根据所述损失函数总值通过链式求导法则进行后向运算,对感知模型参数进行更新。
  10. 一种基于感知模型的场景感知方法,其特征在于,所述感知模型通过权利要求1-9任一项所述的一种感知模型训练方法获得;所述场景感知方法包括:
    S5获取场景图像,并对所述输入图像进行预处理,获得输入图像;
    S6将所述输入图像,输入所述感知模型进行推断,获得推断结果;
    S7对所述推断结果进行解析和信息融合得出场景感知结果。
  11. 根据权利要求10所述的场景感知方法,其特征在于,所述步骤S5包括:
    S51通过相机标定算法,对所述场景图像进行标定获得畸变参数;
    S52通过矫正公式获得获取矫正坐标,并通过双线性差值算法得到矫正图像;
    S53进行图像缩放,获得输入图像;
    其中,所述矫正公式为:
    Figure PCTCN2021135453-appb-100013
    k 1,k 2,p 1,p 2,k3为畸变参数,x,y为矫正后像素坐标,x′,y′为矫正前像素坐标,r=x 2+y 2
  12. 根据权利要求10所述的场景感知方法,其特征在于,所述步骤S600包括:
    将所述输入图像输入到感知模型,所述感知模型输出的推断结果为多标签分类数据,以及2D框的中心点数据、中心点偏移数据、尺寸数据。
  13. 根据权利要求10所述的场景感知方法,其特征在于,在所述步骤S7中,所述对所述推断结果进行解析和信息融合得出场景感知结果,包括:
    S71解析中心点数据、中心点偏移数据、尺寸数据信息得到2D检测框信息;
    S72解析多标签分类数据得到场景信息;
    S73融合2D检测框信息和分类信息得到最终场景感知结果。
  14. 根据权利要求13所述的场景感知方法,其特征在于,在步骤S71中,所 述解析中心点,中心点偏移,尺寸三个分支信息得到2D检测框信息,包括:
    S711通过中心点数据预测模糊2D框中心点位置坐标;
    S712通过中心点偏移数据可以预测2D框中心点的偏移,与模糊中心点相加得到最后的精确2D框中心位置坐标;
    S713通过尺寸数据可以预测2D框的宽高信息;
    S714通过中心点数据、中心点偏移数据、尺寸数据的信息融合得到完整2D检测框信息。
  15. 根据权利要求13所述的场景感知方法,其特征在于,所述步骤S73包括:通过所述2D检测框信息获得物体信息,通过多标签分类数据预测场景信息。
PCT/CN2021/135453 2021-09-16 2021-12-03 感知模型训练方法、基于感知模型的场景感知方法 WO2023040068A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111091669.2A CN113780453A (zh) 2021-09-16 2021-09-16 感知模型训练方法、基于感知模型的场景感知方法
CN202111091669.2 2021-09-16

Publications (1)

Publication Number Publication Date
WO2023040068A1 true WO2023040068A1 (zh) 2023-03-23

Family

ID=78851817

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135453 WO2023040068A1 (zh) 2021-09-16 2021-12-03 感知模型训练方法、基于感知模型的场景感知方法

Country Status (2)

Country Link
CN (1) CN113780453A (zh)
WO (1) WO2023040068A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117376716A (zh) * 2023-10-17 2024-01-09 深圳深知未来智能有限公司 一种细粒度场景感知的动态ae控制方法


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508580A (zh) * 2017-09-15 2019-03-22 百度在线网络技术(北京)有限公司 交通信号灯识别方法和装置
KR20190105541A (ko) * 2019-08-26 2019-09-17 엘지전자 주식회사 오염로그를 이용하여 청소를 수행하는 인공 지능 로봇 및 그 방법
CN112016605A (zh) * 2020-08-19 2020-12-01 浙江大学 一种基于边界框角点对齐和边界匹配的目标检测方法
CN112464911A (zh) * 2020-12-21 2021-03-09 青岛科技大学 基于改进YOLOv3-tiny的交通标志检测与识别方法
CN112906617A (zh) * 2021-03-08 2021-06-04 济南大学 一种基于手部检测的驾驶员异常行为识别方法与***

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUANG YUEZHEN, WANG NAIZHOU; LIANG TIANCAI; ZHAO QINGLI: "Vehicle Recognition Method Based on Improved CenterNet", HUANAN LIGONG DAXUE XUEBAO - JOURNAL OF SOUTH CHINA UNIVERSITY OF TECHNOLOGY, GUANGZHOU, CH, vol. 49, no. 7, 31 July 2021 (2021-07-31), CH , pages 94 - 102, XP093048125, ISSN: 1000-565X, DOI: 10.12141/j.issn.1000-565X.200496 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630755A (zh) * 2023-04-10 2023-08-22 雄安创新研究院 一种检测场景图像中的文本位置的方法、***和存储介质
CN116630755B (zh) * 2023-04-10 2024-04-02 雄安创新研究院 一种检测场景图像中的文本位置的方法、***和存储介质
CN116229426A (zh) * 2023-05-09 2023-06-06 华东交通大学 基于全景环视图像的无人驾驶泊车停车位检测方法
CN116630751A (zh) * 2023-07-24 2023-08-22 中国电子科技集团公司第二十八研究所 一种融合信息瓶颈和不确定性感知的可信目标检测方法
CN116630751B (zh) * 2023-07-24 2023-10-31 中国电子科技集团公司第二十八研究所 一种融合信息瓶颈和不确定性感知的可信目标检测方法
CN117152138A (zh) * 2023-10-30 2023-12-01 陕西惠宾电子科技有限公司 一种基于无监督学习的医学图像肿瘤目标检测方法
CN117152138B (zh) * 2023-10-30 2024-01-16 陕西惠宾电子科技有限公司 一种基于无监督学习的医学图像肿瘤目标检测方法
CN117854028A (zh) * 2024-03-07 2024-04-09 南京信息工程大学 一种自动驾驶多任务场景分析方法及***
CN117854028B (zh) * 2024-03-07 2024-05-24 南京信息工程大学 一种自动驾驶多任务场景分析方法及***

Also Published As

Publication number Publication date
CN113780453A (zh) 2021-12-10

Similar Documents

Publication Publication Date Title
WO2023040068A1 (zh) 感知模型训练方法、基于感知模型的场景感知方法
Zhou et al. Automated evaluation of semantic segmentation robustness for autonomous driving
US11458912B2 (en) Sensor validation using semantic segmentation information
CN111169468B (zh) 一种自动泊车的***及方法
US20180307921A1 (en) Image-Based Pedestrian Detection
US11551365B2 (en) Methods and systems for computer-based determining of presence of objects
US20170039436A1 (en) Fusion of RGB Images and Lidar Data for Lane Classification
US11348263B2 (en) Training method for detecting vanishing point and method and apparatus for detecting vanishing point
CN112088380A (zh) 图像分割
US20230358533A1 (en) Instance segmentation imaging system
US11586865B2 (en) Apparatus, system and method for fusing sensor data to do sensor translation
US20230222671A1 (en) System for predicting near future location of object
US11483480B2 (en) Simulated rolling shutter image data
US20210150410A1 (en) Systems and Methods for Predicting Instance Geometry
Masihullah et al. Attention based coupled framework for road and pothole segmentation
Alkhorshid et al. Road detection through supervised classification
Choi et al. Methods to detect road features for video-based in-vehicle navigation systems
CN115705693A (zh) 用于传感器数据的标注的方法、***和存储介质
Wei et al. Creating semantic HD maps from aerial imagery and aggregated vehicle telemetry for autonomous vehicles
WO2023017317A1 (en) Environmentally aware prediction of human behaviors
CN113920479A (zh) 一种目标检测网络构建和目标检测方法、装置及电子设备
Sharma et al. Deep Learning-Based Object Detection and Classification for Autonomous Vehicles in Different Weather Scenarios of Quebec, Canada
Zheng et al. Exploring OpenStreetMap availability for driving environment understanding
Salzmann et al. Online Path Generation from Sensor Data for Highly Automated Driving Functions
Khosroshahi Learning, classification and prediction of maneuvers of surround vehicles at intersections using lstms

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE