WO2021175006A1 - Vehicle image detection method and apparatus, and computer device and storage medium - Google Patents


Info

Publication number
WO2021175006A1
Authority
WO
WIPO (PCT)
Prior art keywords: image, pooling layer, network, neural network, information
Application number
PCT/CN2021/070733
Other languages
French (fr)
Chinese (zh)
Inventor
丁晶晶
Original Assignee
深圳壹账通智能科技有限公司
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021175006A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/08 Detecting or categorising vehicles

Definitions

  • This application relates to the field of computer vision, and in particular to a vehicle image detection method, device, computer equipment and storage medium.
  • the embodiments of the present application provide a vehicle image detection method, device, computer equipment, and storage medium to solve the problem of low vehicle image detection and positioning accuracy.
  • a vehicle image detection method including:
  • the vehicle image recognition model including a convolutional neural network and a frame network;
  • the convolutional layer of the convolutional neural network adopts a 3D convolution kernel, and the pooling layer adopts a 3D pooling layer;
  • a vehicle image detection device including:
  • the first acquisition module is configured to acquire an image to be identified, the image to be identified is obtained after image processing of a vehicle image acquired by a three-dimensional image acquisition device;
  • the second acquisition module is used to acquire a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
  • the first input module is used to input the image to be recognized into the convolutional neural network to obtain the output features of the pooling layer of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer;
  • the second input module is used to input the output features of the pooling layer into the box network, and use a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
  • a computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when the processor executes the computer-readable instructions:
  • the vehicle image recognition model including a convolutional neural network and a frame network;
  • the convolutional layer of the convolutional neural network adopts a 3D convolution kernel, and the pooling layer adopts a 3D pooling layer;
  • One or more readable storage media storing computer readable instructions, where when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:
  • the vehicle image recognition model including a convolutional neural network and a frame network;
  • the convolutional layer of the convolutional neural network adopts a 3D convolution kernel, and the pooling layer adopts a 3D pooling layer;
  • the aforementioned vehicle image detection method, device, computer equipment and storage medium improve the convolution kernel and pooling layer of the convolutional neural network, expanding the convolutional neural network to support convolution of 3D images. A frame network is constructed, and a three-dimensional sliding frame is used to identify the output features of each pooling layer, so the position information of each component can be output during detection; further convolution and pooling operations in the classification network then determine whether the position information of each component is the position information of a damaged part. On the premise of ensuring the efficiency of identifying each component, the position of the damaged part can be located more accurately.
  • FIG. 1 is a schematic diagram of an application environment of a vehicle image detection method in an embodiment of the present application
  • FIG. 2 is a flowchart of a vehicle image detection method in an embodiment of the present application
  • FIG. 3 is another flowchart of the vehicle image detection method in an embodiment of the present application.
  • FIG. 4 is another flowchart of the vehicle image detection method in an embodiment of the present application.
  • FIG. 5 is another flowchart of the vehicle image detection method in an embodiment of the present application.
  • FIG. 6 is another flowchart of the vehicle image detection method in an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of the vehicle image detection device in an embodiment of the present application;
  • FIG. 8 is another schematic block diagram of the vehicle image detection device in an embodiment of the present application;
  • FIG. 9 is another schematic block diagram of the vehicle image detection device in an embodiment of the present application;
  • FIG. 10 is another schematic block diagram of the vehicle image detection device in an embodiment of the present application;
  • FIG. 11 is another schematic block diagram of the vehicle image detection device in an embodiment of the present application.
  • Fig. 12 is a schematic diagram of a computer device in an embodiment of the present application.
  • the embodiment of the present application provides a vehicle image detection method, and the image detection method can be applied in the application environment shown in FIG. 1.
  • the image detection method is applied in an image detection system.
  • the image detection system includes a client and a server as shown in Figure 1.
  • the client and the server communicate through the network to solve the problem of low vehicle image detection and positioning accuracy.
  • the client, also called the user side, refers to the program that corresponds to the server and provides local services to the user.
  • the client can be installed on, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a vehicle image detection method is provided.
  • the method is applied to the server in FIG. 1 as an example for description, and includes the following steps:
  • S11 Acquire an image to be identified, where the image to be identified is obtained after image processing is performed on a vehicle image collected by a three-dimensional image acquisition device.
  • the image to be recognized is a vehicle image that needs to be recognized, and the image to be recognized is obtained after image processing is performed on a vehicle image collected by a three-dimensional image acquisition device.
  • the data structure of the vehicle image collected by the three-dimensional image acquisition device may be: 1*H*W*D, where 1 indicates that the vehicle image is a single-channel image, H is the length data of the vehicle image, W is the width data of the vehicle image, and D is the depth-of-field data of the vehicle image.
  • the data of the depth of field is set to 512, and the specific data of H and W depends on the pixel situation of the three-dimensional image acquisition device.
  • the three-dimensional image acquisition device may be a ToF 3D camera, a binocular camera, or another image acquisition device that can acquire a depth channel.
  • the image processing steps may include image enhancement, normalization and other processing procedures.
  • the image processing process may further include: performing data reconstruction (resize) on H and W in the data structure of the image to be recognized, so as to set the values of H and W to 512, respectively.
  • data reconstruction may use any of the following interpolation methods: nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation.
  • preferably, since the data is reconstructed in both the H and W directions, the bilinear interpolation method can be used to reconstruct the data in the two directions of H and W.
  • data reconstruction is performed on H and W in the vehicle image data structure so that the data of H and W are both 512; the initial setting of D is 512. Therefore, after data reconstruction, the vehicle image data structure is: 1*512*512*512.
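As an illustration, the H/W data-reconstruction step above might be sketched as follows in Python/NumPy. This is a hypothetical sketch: the text specifies only the target size (512) and the bilinear interpolation method, so the function names and slice-by-slice strategy are assumptions.

```python
import numpy as np

def resize_bilinear_2d(img, out_h, out_w):
    """Resize a 2D array to (out_h, out_w) with bilinear interpolation."""
    in_h, in_w = img.shape
    # Map each output pixel back to a fractional input coordinate.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    # Blend the four neighboring input pixels for every output pixel.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def reconstruct_volume(vol, size=512):
    """Resize H and W of an (H, W, D) volume slice-by-slice; D stays as captured."""
    out = np.stack([resize_bilinear_2d(vol[:, :, d], size, size)
                    for d in range(vol.shape[2])], axis=2)
    return out[np.newaxis]  # prepend the single channel: 1*H*W*D
```

With `size=512` and D preset to 512, the output has the 1*512*512*512 structure described above.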
  • S12 Obtain a vehicle image recognition model, where the vehicle image recognition model includes a convolutional neural network and a frame network.
  • the vehicle image recognition model is obtained by training using a convolutional neural network and a frame network, and is used to recognize the image to be recognized.
  • the convolutional neural network includes five major layers: the first layer is the input layer; the second to fourth layers are intermediate layers, and each of them contains three sub-layers, one of which is a pooling layer; the fifth layer is the output layer, which contains only one pooling layer.
  • the box network is constructed by combining the pooling layers of the second to fifth layers of the convolutional neural network, and the results of those pooling layers are used as the input of the box network. The essence of the box network is a neural network classifier with a sliding window: on the second to fifth layers, 3D stereo boxes are slid, and each stereo box is connected to the classifier to distinguish whether there is an object in the box and, if so, the type of the object, in order to determine whether it is the location information of the target type.
  • the output feature of the pooling layer refers to the result of all pooling layers from the second layer to the fifth layer of the convolutional neural network.
  • the image to be recognized is input into the convolutional neural network, and a convolution operation is performed from the input layer to the convolutional layer to obtain the convolutional-layer output features of the image to be recognized. The output features of the convolutional layer are then input to the pooling layer, and a pooling operation is performed from the convolutional layer to the pooling layer to obtain the output features of the pooling layer.
  • the purpose of the pooling operation from the convolutional layer to the pooling layer is to reduce the number of output features of the convolutional layer obtained from the previous layer, and to further reduce the range of output features.
  • the size of the 3D convolution kernel is: 3*3*3, and the size of the 3D pooling layer is: 2*2*2.
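For illustration, a naive NumPy sketch of a single 3*3*3 convolution and a 2*2*2 max-pooling step as described above. This is not the patent's implementation; a real network would use an optimized deep-learning library and many kernels per layer.

```python
import numpy as np

def conv3d_valid(vol, kernel):
    """Naive 'valid' 3D convolution (cross-correlation form) of a
    single-channel volume with one small kernel, e.g. 3*3*3."""
    kh, kw, kd = kernel.shape
    H, W, D = vol.shape
    out = np.empty((H - kh + 1, W - kw + 1, D - kd + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(vol[i:i+kh, j:j+kw, k:k+kd] * kernel)
    return out

def maxpool3d(vol, s=2):
    """2*2*2 max pooling with stride 2: halves every spatial dimension."""
    H, W, D = vol.shape
    v = vol[:H - H % s, :W - W % s, :D - D % s]
    return v.reshape(H // s, s, W // s, s, D // s, s).max(axis=(1, 3, 5))
```

On a 512*512*512 input, each `maxpool3d` call reduces the volume to 256, then 128, 64, and 32 per side, matching the layer sizes described later.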
  • S14 Input the output features of the pooling layer into the frame network, and use a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
  • the three-dimensional sliding frame refers to a three-dimensional frame that slides on each pooling layer, and the size of the three-dimensional frame used by the three-dimensional sliding frame in pooling layers of different levels is different.
  • the identification information of the image to be identified refers to the position information of each automobile component obtained when the three-dimensional sliding frame recognizes the output feature of each pooling layer in the 3D pooling layer.
  • the size of the three-dimensional sliding box that recognizes the identification information of the image to be recognized in the box network is input to the classification network. The middle layer of the classification network is a multi-layer convolutional network; convolution and pooling operations are performed on the input three-dimensional sliding-box selection, and after the output layer is fully connected, softmax classification is performed to determine whether the identification information of the image to be recognized is the location information of a damaged part.
  • the convolutional neural network is expanded to support convolution of 3D images; a frame network is constructed, and a three-dimensional sliding frame is used to identify the output features of each pooling layer in the 3D pooling layer, so that the position information of each component can be output during detection. Through further convolution and pooling operations in the classification network, it can be determined whether the position information of each component is the position information of a damaged part. On the premise of ensuring the efficiency of identifying each component, the position of the damaged part can be located more accurately.
  • the vehicle image recognition model further includes a hotspot network
  • the vehicle image detection method further includes:
  • S15 If the identification information of the image to be identified is the first type of information, input the output features of the pooling layer into a hotspot network, where the first type of information indicates that there is a damaged part in the image to be identified.
  • after the frame network identifies the output features of the pooling layer, the position information of each car component in the image to be identified can be obtained, and after classification by the classification network it can be identified whether there is a damaged part in the image to be identified. If there is a damaged part, the identification information of the image to be identified is the first type of information; in that case the hotspot network is activated, and the output features of the pooling layer are input into the hotspot network. Otherwise, the hotspot network is not activated, and the output features of the pooling layer are not input to the hotspot network.
  • S16 In the hotspot network, perform convolution-based recognition on the output features of the pooling layer to obtain hotspot area information.
  • the output features of the pooling layer are input to the hotspot network; a convolutional network is connected between the input layer and the output layer of the hotspot network.
  • convolution recognition is performed on the output features of the pooling layer to identify the key points of the key parts of the first type of information, and the area where the dot-matrix values of the key points are most concentrated is extracted as the hotspot area information.
  • the output layer of the hotspot network outputs a three-dimensional one-hot bitmap.
  • the key part is the location of the damaged part. Since each pooling layer from the third layer to the fifth layer will contain multiple key points similar to the key part, the dot-matrix values of all such key points are compared, and the area where the dot-matrix values are most concentrated is selected as the hotspot area information. Since the identification information of the image to be identified is identified by the classification network, multiple items of first-type information will be obtained, and there will be multiple key parts identified by the convolution network in the hotspot network, so there will be more than one item of hotspot area information.
  • S17 Perform median filtering on the hot spot area information to obtain filtered hot spot area information.
  • performing median filtering on the hotspot area information refers to setting the gray value of each pixel in the hotspot area information to the median of the gray values of all pixels in a neighborhood window around that point. Since the convolution of the output features of the pooling layer causes a certain loss at the edges, median filtering is used because it can effectively protect the edge signal of the image.
  • the size of the filter kernel of the median filter is 1/8 of the input size of the hotspot network, rounded up. Since the input size of the hotspot network is determined by the output features of the pooling layer, the filter kernel sizes on pooling layers of different levels are also different. For example, if the size of the second pooling layer is 256*256*256, the filter kernel size used on the second layer is 32*32*32.
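The kernel-sizing rule and the effect of median filtering can be illustrated with a small NumPy sketch. This is hypothetical, and shown in 1-D for brevity; the text applies the same idea over a k*k*k three-dimensional neighborhood.

```python
import math
import numpy as np

def median_kernel_size(input_size):
    """Kernel size rule from the text: 1/8 of the hotspot-network
    input size, rounded up."""
    return math.ceil(input_size / 8)

def median_filter_1d(signal, k):
    """Minimal 1-D median filter with edge replication: each sample is
    replaced by the median of a k-wide window around it."""
    pad = k // 2
    padded = np.pad(signal, pad, mode='edge')
    return np.array([np.median(padded[i:i + k]) for i in range(len(signal))])
```

For the 256*256*256 second-layer example this yields a 32*32*32 kernel; the filter suppresses isolated spikes while leaving flat edge regions intact, which is why it is preferred here for edge protection.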
  • S18 Perform area maximum extraction on the filtered hotspot area information to obtain key position information of the image to be recognized.
  • the dot matrix values of the hot spot area information dot map are compared, and the area with the maximum dot matrix value of the filtered hot spot area information dot map is used as the key position information area of the image to be recognized.
  • the key position information of the image to be recognized refers to the three-dimensional coordinates of the key point of the key part.
  • if the frame network identifies a damaged car part in the output features of the pooling layer, the hotspot network is activated. The hotspot network can extract the key positions of the key parts identified in the frame network and output the three-dimensional coordinates of those key positions; on the premise of ensuring that the damaged part can be accurately identified, the three-dimensional coordinates of the damaged part are output, which improves the recognition precision and accuracy.
  • extracting the maximum value of the area from the filtered hotspot area information to obtain the key position information of the image to be recognized includes:
  • since the identification information of the image to be recognized is recognized by the classification network, multiple items of first-type information will be obtained, so there will also be multiple key parts recognized by the convolution network in the hotspot network, and hence multiple items of hotspot area information, each of which is filtered. The dot-matrix values of all the filtered hotspot area information bitmaps are extracted and compared; the area with the maximum dot-matrix value is taken as the area of the key position information of the image to be recognized, and the maximum value within that area is extracted as the key position information of the image to be recognized, that is, the three-dimensional coordinates of the damaged part of the image to be recognized.
  • in this way, by comparing the dot-matrix values of all the filtered hotspot area information bitmaps and extracting the area with the maximum value, the three-dimensional coordinates of the key position of the damaged part can be output, which improves the recognition rate of the damaged part and the positioning accuracy.
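The area-maximum extraction over multiple filtered hotspot maps might be sketched as follows. This is a hypothetical NumPy illustration that uses a global maximum per map as a stand-in for the dot-matrix comparison described above.

```python
import numpy as np

def key_position(filtered_hotspots):
    """Pick the 3-D coordinate of the overall maximum across a list of
    filtered (H, W, D) hotspot maps; returns (map_index, (h, w, d))."""
    best_idx, best_val, best_coord = -1, -np.inf, None
    for idx, vol in enumerate(filtered_hotspots):
        # Location of this map's largest dot-matrix value.
        coord = np.unravel_index(np.argmax(vol), vol.shape)
        val = vol[coord]
        if val > best_val:
            best_idx, best_val, best_coord = idx, val, coord
    return best_idx, best_coord
```

The returned coordinate tuple plays the role of the three-dimensional coordinates of the damaged part.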
  • the use of a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified includes:
  • S141 Determine the size and number of three-dimensional sliding boxes on each pooling layer according to the size of each pooling layer.
  • the dimensionality of the convolutional neural network is reduced by a factor of 2*2*2 every time it passes through a pooling layer, so the size and number of three-dimensional sliding boxes on each pooling layer differ. The size of the first layer of the convolutional neural network, that is, the input layer, is 512*512*512.
  • the method for determining the size and number of the three-dimensional sliding boxes of each pooling layer is as follows:
  • after the second pooling layer, the dimension is reduced to 256*256*256; the second layer is divided into 16 equal parts, and the size of each three-dimensional sliding frame on the second layer is 16*16*16.
  • after the third pooling layer, the dimension is reduced to 128*128*128; the third layer is divided into 8 equal parts, and the equivalent size of each three-dimensional sliding frame on the third layer is 16*16*16, but in terms of field of view, the size of each three-dimensional sliding frame on the third layer corresponds to 32*32*32.
  • after the fourth pooling layer, the dimension is reduced to 64*64*64; the fourth layer is divided into 8 equal parts, and the equivalent size of each three-dimensional sliding frame on the fourth layer is 16*16*16, but in terms of field of view, the size of each three-dimensional sliding frame on the fourth layer corresponds to 64*64*64.
  • after the fifth pooling layer, the dimension is reduced to 32*32*32; the fifth layer is divided into 8 equal parts, and the equivalent size of each three-dimensional sliding frame on the fifth layer is 16*16*16, but in terms of field of view, the size of each three-dimensional sliding frame on the fifth layer corresponds to 128*128*128.
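The halving of the feature volume per pooling layer, and the resulting number of sliding positions per axis under the two-pixel stride used by the frame network, can be computed as follows. The per-layer box sizes above are quoted from the text; this sketch only reproduces the dimension halving and the position count.

```python
def feature_size(layer, input_size=512):
    """Spatial size of the feature volume at a given layer (1 = input
    layer); each pooling layer halves every dimension."""
    return input_size // (2 ** (layer - 1))

def positions_per_axis(feature, box, stride=2):
    """Number of sliding positions along one axis for a box slid with
    the two-pixel stride described for the frame network."""
    return (feature - box) // stride + 1
```

For example, on the second layer (256 per side) a 16*16*16 box slid with stride 2 has 121 positions along each axis.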
  • S142 Control the three-dimensional sliding frame to slide on each pooling layer, and identify the output characteristics of the pooling layer to obtain identification information.
  • the identification information refers to the frame-selection information recognized while the three-dimensional sliding frame slides on each pooling layer. Specifically, the output features of the pooling layer are input into the box network as its input, and the three-dimensional sliding box of each pooling layer is controlled to start sliding recognition at the starting point of the corresponding pooling layer, sliding at intervals of two pixels. As the three-dimensional sliding frame of each layer slides on the corresponding pooling layer, the output features of the pooling layer are identified and the identification information is obtained.
  • the position information of the target type is the position information of each component identified by the three-dimensional sliding frame in the output feature of the pooling layer.
  • the three-dimensional sliding frame determines whether the identification information is the position information of a detected component, and if so, also determines the type of the detected component. If the position information of each component is included in the identification information, the frame-selection size of the three-dimensional sliding box is input into the classification network, the convolutional neural network and the frame network are trained, and it is further determined whether the position information of the target type is the position information of the damaged part.
  • the structure of the classification network includes:
  • the input layer is the selection size of the three-dimensional sliding box
  • the middle layer is a multi-layer convolutional network
  • each convolutional network layer has a pooling layer, the step size of each pooling layer is 2, and the size of the last layer after convolution is 2*2*2;
  • the output layer is fully connected on the output layer, and softmax classification is performed after the full connection.
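A minimal sketch of the classification head described above: the flattened 2*2*2 final features are fully connected to class logits and passed through softmax. The shapes and class count are illustrative assumptions, since the text does not specify them.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax applied after the fully connected
    output layer."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def classify(flattened_features, weights, bias):
    """Fully connect the flattened 2*2*2 feature block (8 values) to
    class logits, then apply softmax to get class probabilities."""
    return softmax(flattened_features @ weights + bias)
```

The class with the highest probability decides whether the framed region is, for example, the location of a damaged part.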
  • a three-dimensional sliding frame is used on the pooling layer through the frame network.
  • in the process of recognizing the output features of the pooling layer, it is possible to determine whether a detection object has been recognized, and to determine the type of the detection object.
  • the selection size of the three-dimensional sliding box is input to the classification network, and convolution-based classification is performed on it, which improves the accuracy of recognition.
  • before obtaining the vehicle image recognition model, the vehicle image detection method further includes:
  • S19 Obtain a training sample set, where the training sample set includes a vehicle sample image and corresponding annotation data, and the vehicle sample image is obtained after image processing is performed on an image collected by a three-dimensional image collection device.
  • the vehicle sample image is a vehicle sample image that needs to be trained
  • the annotation data is used to classify the vehicle sample image
  • the name of the vehicle sample image classification is marked with a number.
  • the data structure of the vehicle sample image collected by the three-dimensional image acquisition device may be: 1*H*W*D, where 1 indicates that the vehicle image is a single-channel image, H is the length data of the vehicle image, W is the width data of the vehicle image, and D is the depth-of-field data of the vehicle image.
  • the data of the depth of field is set to 512, and the specific data of H and W depends on the pixel situation of the three-dimensional image acquisition device.
  • the three-dimensional image acquisition device may be a ToF 3D camera, a binocular camera, or another image acquisition device that can acquire a depth channel.
  • the image processing steps may include processing procedures such as image enhancement and normalization.
  • the method includes: adopting a resizing method in each depth channel.
  • the resizing method may include the following interpolation methods: nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation.
  • the bilinear interpolation method is adopted in this application.
  • the data of H and W in the vehicle image data structure are reconstructed so that both are 512; the initial setting of D is 512. Therefore, after data reconstruction, the vehicle image data structure is: 1*512*512*512.
  • S20 Obtain a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network.
  • the preset neural network model is composed of a convolutional neural network, a frame network, and a hotspot network, and is used to perform recognition training on the training sample set.
  • the structure of the convolutional neural network has 5 layers. The first layer is the input layer; the data structure of the input layer is 512*512*512, the size of the convolution kernel is 3*3*3, and the first layer has a total of 64 convolution kernels;
  • the second to fourth layers are intermediate layers.
  • the size of the convolution kernel of the intermediate layer is: 3*3*3.
  • the step size of the pooling layer is 2, that is, after each pooling layer, the dimensionality of the data structure is reduced by 2*2*2;
  • the fifth layer is the output layer, the output layer contains only one pooling layer, and the step size of the pooling layer is 2.
  • the box network is constructed by combining the pooling layers of the second to fifth layers of the convolutional neural network, and the results of those pooling layers are used as the input of the box network. The essence of the box network is a neural network classifier with a sliding window: on the second to fifth layers, 3D stereo boxes are slid, and each stereo box is connected to the classifier to distinguish whether there is an object in the box and, if so, the type of the object, in order to determine whether it is the location information of the target type.
  • the hotspot network is constructed by combining the pooling layers of the third to fifth layers of the convolutional neural network, built at the position equivalent to the frame network on the third to fifth layers, and the results of the pooling layers from the third layer to the fifth layer of the convolutional neural network are used as the input of the hotspot network.
  • the output of the hotspot network is a three-dimensional one-hot bitmap: a 0/x dot map consistent with the input dimensions, where x marks a key part and 0 a non-key part.
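The three-dimensional 0/x dot map could be represented as below. This is a hypothetical sketch; the marker value x = 1 and the coordinate-list interface are assumptions for illustration.

```python
import numpy as np

def one_hot_bitmap(shape, key_coords, x=1):
    """Build a three-dimensional 0/x dot map: the value x at key-part
    lattice points, 0 everywhere else."""
    bitmap = np.zeros(shape)
    for c in key_coords:
        bitmap[c] = x
    return bitmap
```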
  • a convolutional network is connected between the input layer and the output layer of the hotspot network.
  • the vehicle image recognition model is based on a preset neural network model and is a model obtained after training according to the training sample set.
  • the process of establishing the vehicle image recognition model includes: inputting the training sample set into the preset neural network model; defining the network structures of the convolutional neural network, the frame network and the hotspot network in the preset neural network model; initializing the network weights; defining the forward propagation process; iteratively training the preset neural network model with the defined forward propagation process to obtain a trained model; and testing and verifying the trained model to obtain the vehicle image recognition model.
  • the vehicle sample image is collected by the three-dimensional image acquisition device, and the data structure of the vehicle sample image is readjusted so that the length and width pixels match the pixels set in the depth direction, which avoids the problem that vehicle sample images collected by different three-dimensional image acquisition devices have different pixel dimensions. The vehicle image recognition model obtained by training the preset neural network model on the training sample set can identify and judge subsequent vehicle images more accurately and quickly.
  • a vehicle image detection device is provided, and the vehicle image detection device corresponds to the vehicle image detection method in the above-mentioned embodiment in a one-to-one correspondence.
  • the vehicle image detection device includes a first acquisition module 11, a second acquisition module 12, a first input module 13 and a second input module 14.
  • the detailed description of each functional module is as follows:
  • the first acquisition module 11 is configured to acquire an image to be identified, which is obtained after image processing of a vehicle image acquired by a three-dimensional image acquisition device;
  • the second acquisition module 12 is configured to acquire a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
  • the first input module 13 is configured to input the image to be recognized into the convolutional neural network to obtain the output features of the pooling layer of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer;
  • the second input module 14 is used to input the output features of the pooling layer into the box network, and use a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
  • the vehicle image detection device further includes:
  • the third input module 15 is configured to input the output features of the pooling layer into the hotspot network when the identification information of the image to be recognized is the first type of information, where the first type of information indicates that there is a damaged part in the image to be recognized;
  • the first recognition module 16 is configured to perform convolutional recognition on the output features of the pooling layer in the hotspot network to obtain hotspot area information;
  • the median filtering module 17 is configured to perform median filtering on the hotspot area information to obtain filtered hotspot area information;
  • the information extraction module 18 is used for extracting the maximum value of the area from the filtered hotspot area information to obtain the key position information of the image to be recognized.
  • the information extraction module 18 further includes:
  • the third obtaining module 181 is configured to obtain the dot matrix values of the filtered hotspot area information dot map;
  • the first comparison module 182 is configured to compare the dot matrix values of the filtered hotspot area information dot map, and extract the maximum value of the filtered hotspot area information.
  • the second input module 14 further includes:
  • the first calculation module 141 is configured to determine the size and number of three-dimensional sliding frames on each pooling layer according to the size of each pooling layer;
  • the second recognition module 142 is configured to control the three-dimensional sliding frame to slide on each pooling layer, to identify the output characteristics of the pooling layer, and to obtain identification information;
  • the fourth input module 143 is configured to input the frame selection size of the three-dimensional sliding frame into the classification network to perform convolution when the identification information is the position information of the target category.
  • the vehicle image detection device further includes:
  • the third acquisition module 19 is configured to acquire a training sample set, the training sample set includes a vehicle sample image and corresponding annotation data, the vehicle sample image is obtained after image processing is performed on an image collected by a three-dimensional image acquisition device;
  • the fourth acquisition module 20 is configured to acquire a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network;
  • the first training module 21 is configured to use the training sample set to train the preset neural network model to obtain a vehicle image recognition model.
  • Each module in the above-mentioned vehicle image detection device may be implemented in whole or in part by software, hardware, or a combination thereof.
  • the above-mentioned modules may be embedded in hardware in, or be independent of, the processor in the computer device, or may be stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 12.
  • the computer device includes a processor, a memory, a network interface, and a database connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium.
  • the database of the computer equipment is used to store the data used in the above-mentioned vehicle image detection method.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instruction is executed by the processor to realize a vehicle image detection method.
  • a computer device is provided, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer readable instructions:
  • the vehicle image recognition model including a convolutional neural network and a frame network;
  • the convolutional layer of the convolutional neural network adopts a 3D convolution kernel, and the pooling layer adopts 3D pooling layer;
  • one or more readable storage media storing computer readable instructions are provided.
  • the readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media; the readable storage medium stores computer readable instructions, and when the computer readable instructions are executed by one or more processors, the one or more processors implement the following steps:
  • the vehicle image recognition model including a convolutional neural network and a frame network;
  • the convolutional layer of the convolutional neural network adopts a 3D convolution kernel, and the pooling layer adopts 3D pooling layer;
  • the computer-readable instructions may be stored in a non-volatile computer readable storage medium or a volatile computer readable storage medium; when the computer readable instructions are executed, the processes of the above-mentioned method embodiments may be included.
  • any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory.
  • Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
  • the readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.


Abstract

Disclosed are a vehicle image detection method and apparatus, and a computer device and a storage medium. By improving the convolution kernel and pooling layers of a convolutional neural network, the convolutional neural network is expanded to support the convolution of 3D images. A box network is constructed, and the output features of each of 3D pooling layers are identified by using a three-dimensional sliding box, so that in a detection process, the position information of each component can be output, and by means of further convolution and pooling operations in a classification network, it can be determined whether the position information of each component is the position information of a damaged part, and the position information of the damaged part can be located more accurately while ensuring the efficiency of identifying each component.

Description

Vehicle image detection method, device, computer equipment and storage medium
This application claims priority to Chinese patent application No. 202010142987.6, filed with the Chinese Patent Office on March 4, 2020 and entitled "Vehicle image detection method, device, computer equipment and storage medium", the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of computer vision, and in particular to a vehicle image detection method, device, computer equipment and storage medium.
Background
With the development of science and technology, vehicles occupy an increasingly important position in people's daily lives. When a vehicle is damaged due to a collision or other circumstances, the damage needs to be assessed. Technologies currently exist that detect and identify the damaged parts in vehicle images through image recognition algorithms. However, the inventor realized that the recognition accuracy and detection precision of the vehicle image recognition and detection technologies currently applied to vehicle components are relatively low, so the degree of damage to vehicle components cannot be accurately determined.
Summary
The embodiments of the present application provide a vehicle image detection method, device, computer equipment, and storage medium to solve the problem of low accuracy in vehicle image detection and positioning.
A vehicle image detection method, including:
acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
inputting the image to be identified into the convolutional neural network to obtain the pooling layer output features of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer;
inputting the pooling layer output features into the frame network, and using a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified.
A vehicle image detection device, including:
a first acquisition module, configured to acquire an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
a second acquisition module, configured to acquire a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
a first input module, configured to input the image to be identified into the convolutional neural network to obtain the pooling layer output features of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer;
a second input module, configured to input the pooling layer output features into the frame network, and use a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified.
A computer device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer readable instructions:
acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
inputting the image to be identified into the convolutional neural network to obtain the pooling layer output features of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer;
inputting the pooling layer output features into the frame network, and using a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified.
One or more readable storage media storing computer readable instructions, where when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:
acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
inputting the image to be identified into the convolutional neural network to obtain the pooling layer output features of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer;
inputting the pooling layer output features into the frame network, and using a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified.
In the above vehicle image detection method, device, computer equipment, and storage medium, the convolution kernel and pooling layers of the convolutional neural network are improved so that the network is expanded to support convolution of 3D images. A frame network is constructed, and a three-dimensional sliding box is used to identify the output features of each pooling layer in the 3D pooling layers, so that the position information of each component can be output during detection; through further convolution and pooling operations in a classification network, it can be determined whether the position information of each component is the position information of a damaged part. While the efficiency of identifying each component is ensured, the position of the damaged part can also be located more precisely.
The details of one or more embodiments of the present application are set forth in the following drawings and description, and other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Description of the drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an application environment of a vehicle image detection method in an embodiment of the present application;
FIG. 2 is a flowchart of a vehicle image detection method in an embodiment of the present application;
FIG. 3 is another flowchart of the vehicle image detection method in an embodiment of the present application;
FIG. 4 is another flowchart of the vehicle image detection method in an embodiment of the present application;
FIG. 5 is another flowchart of the vehicle image detection method in an embodiment of the present application;
FIG. 6 is another flowchart of the vehicle image detection method in an embodiment of the present application;
FIG. 7 is a schematic block diagram of the license plate verification device in an embodiment of the present application;
FIG. 8 is another schematic block diagram of the license plate verification device in an embodiment of the present application;
FIG. 9 is another schematic block diagram of the license plate verification device in an embodiment of the present application;
FIG. 10 is another schematic block diagram of the license plate verification device in an embodiment of the present application;
FIG. 11 is another schematic block diagram of the license plate verification device in an embodiment of the present application;
FIG. 12 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are part, rather than all, of the embodiments of the present application. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of this application.
The embodiment of the present application provides a vehicle image detection method, which can be applied in the application environment shown in FIG. 1. Specifically, the image detection method is applied in an image detection system that includes a client and a server as shown in FIG. 1; the client and the server communicate through a network to solve the problem of low accuracy in vehicle image detection and positioning. The client, also called the user terminal, refers to a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server can be implemented as an independent server or as a server cluster composed of multiple servers.
In an embodiment, as shown in FIG. 2, a vehicle image detection method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
S11: Acquire an image to be identified, where the image to be identified is obtained after image processing is performed on a vehicle image collected by a three-dimensional image acquisition device.
The image to be identified is a vehicle image that needs to be recognized, obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device. Specifically, the data structure of the vehicle image collected by the three-dimensional image acquisition device may be 1*H*W*D, where 1 indicates that the vehicle image is a single-channel image, H is the length of the vehicle image, W is the width of the vehicle image, and D is the depth of field of the vehicle image. For example, the depth-of-field value is set to 512, and the specific values of H and W depend on the pixel resolution of the three-dimensional image acquisition device. Optionally, the three-dimensional image acquisition device may be a ToF 3D camera, a binocular camera, or another image acquisition device that can capture a depth channel.
The image processing steps may include image enhancement, normalization, and other processing procedures. Preferably, the image processing process may further include performing data reconstruction (resize) on H and W in the data structure of the image to be identified, so as to set the values of H and W to 512 respectively. Optionally, data reconstruction may be implemented with the following interpolation methods: nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation. Since the data is reconstructed only in the H and W directions, the bilinear interpolation method may preferably be used.
For example, data reconstruction is performed on H and W in the vehicle image data structure so that both H and W become 512, while D is initially set to 512. Therefore, after data reconstruction, the vehicle image data structure is 1*512*512*512.
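The resize step can be illustrated with a pure-NumPy bilinear interpolation applied to the H and W axes only, leaving the depth axis D untouched. Toy sizes replace 512 to keep the sketch fast; NumPy and the exact pixel-centre coordinate convention are assumptions, since the text does not prescribe an implementation.

```python
import numpy as np

def resize_bilinear_2d(img, out_h, out_w):
    """Bilinear resize of one 2-D depth slice (H*W)."""
    in_h, in_w = img.shape
    # Pixel-centre sample coordinates in the input grid.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]
    wx = (xs - x0)[None, :]
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bot * wy

def resize_volume_hw(vol, out_h, out_w):
    """Resize only the H and W axes of a 1*H*W*D volume; D is untouched."""
    d = vol.shape[-1]
    out = np.empty((1, out_h, out_w, d), dtype=float)
    for k in range(d):  # bilinear interpolation is applied per depth slice
        out[0, :, :, k] = resize_bilinear_2d(vol[0, :, :, k], out_h, out_w)
    return out

# Toy stand-in for resizing H and W to 512: a 1*100*80*4 volume -> 1*64*64*4.
vol = np.random.default_rng(1).random((1, 100, 80, 4))
resized = resize_volume_hw(vol, 64, 64)
```

The nearest-neighbor and bicubic options mentioned above would differ only in how the sample coordinates are rounded or weighted.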
S12: Obtain a vehicle image recognition model, where the vehicle image recognition model includes a convolutional neural network and a frame network.
The vehicle image recognition model is obtained by training a convolutional neural network and a frame network, and is used to recognize the image to be identified.
Specifically, the convolutional neural network contains five major layers: the first layer is the input layer; the second to fourth layers are intermediate layers, each of which contains three sub-layers, one of which is a pooling layer; and the fifth layer is the output layer, which contains only one pooling layer.
The frame network is constructed by combining the pooling layers of the second to fifth layers of the convolutional neural network, and the results of these pooling layers serve as the input of the frame network. In essence, the frame network is a neural network classifier with a sliding window: on the second to fifth layers, 3D stereo boxes are slid over the features, each stereo box is connected to a classifier, and the classifier distinguishes whether an object exists in the box and the type of the object, so as to determine whether the box contains position information of a target category.
S13: Input the image to be identified into the convolutional neural network to obtain the pooling layer output features of the convolutional neural network, where the convolutional layer of the convolutional neural network adopts a 3D convolution kernel and the pooling layer adopts a 3D pooling layer.
The pooling layer output features refer to the results of all pooling layers from the second to the fifth layer of the convolutional neural network. Specifically, the image to be identified is input into the convolutional neural network; a convolution operation is performed from the input layer to the convolutional layer to obtain the convolutional layer output features of the image to be identified, and these features are then input into the pooling layer, where a pooling operation is performed to obtain the pooling layer output features. The purpose of the pooling operation from the convolutional layer to the pooling layer is to reduce the number of convolutional layer output features obtained from the previous layer and to further narrow the range of the output features.
Optionally, the size of the 3D convolution kernel is 3*3*3, and the size of the 3D pooling layer is 2*2*2.
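A hedged sketch of the 2*2*2 pooling operation described above, using max pooling over non-overlapping blocks (max pooling is an assumed choice; the text does not name the pooling function):

```python
import numpy as np

def max_pool_3d(x, k=2):
    """Non-overlapping k*k*k max pooling over a 3-D feature volume,
    matching the 2*2*2 pooling layer size given above."""
    h, w, d = x.shape
    x = x[:h // k * k, :w // k * k, :d // k * k]  # trim to multiples of k
    return x.reshape(h // k, k, w // k, k, d // k, k).max(axis=(1, 3, 5))

feat = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
pooled = max_pool_3d(feat)  # (4, 4, 4) -> (2, 2, 2)
```

Each application of this operation halves H, W, and D, which is how successive pooling layers shrink the feature volume that is later passed to the frame network.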
S14: Input the pooling layer output features into the frame network, and use a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified.
The three-dimensional sliding box refers to a three-dimensional stereo box that slides on each pooling layer; the size of the box differs between pooling layers at different levels. The identification information of the image to be identified refers to the position information of each vehicle component obtained when the three-dimensional sliding box identifies the output features of each pooling layer in the 3D pooling layers.
The pooling layer output features are input into the frame network as its input; the three-dimensional sliding box of each pooling layer is controlled to start sliding recognition at the starting point of the corresponding pooling layer and to slide at an interval of two pixels. As the three-dimensional sliding box of each layer slides over the corresponding pooling layer, the pooling layer output features are identified, so that the identification information of the image to be identified, namely the position information of each vehicle component, can be recognized. Further, the size of each three-dimensional sliding box in which identification information was recognized is input into a classification network whose intermediate layers form a multi-layer convolutional network; convolution and pooling operations are performed on the input, and after full connection at the output layer, softmax classification is performed to determine whether the identification information of the image to be identified is the position information of a damaged part.
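The sliding behaviour above can be sketched as follows: enumerate three-dimensional box origins at two-voxel intervals over a pooling-layer output, then score each crop with a classifier ending in softmax. The layer size, box size, and dummy class scores are illustrative assumptions, not values from the patent.

```python
import numpy as np
from itertools import product

def sliding_box_origins(layer_shape, box, stride=2):
    """All origins of a three-dimensional sliding box that moves over a
    pooling-layer output at the two-voxel interval described above."""
    ranges = [
        range(0, dim - side + 1, stride)
        for dim, side in zip(layer_shape, box)
    ]
    return list(product(*ranges))

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

# Illustrative (assumed) sizes: an 8*8*8 pooling output, a 4*4*4 box.
layer = np.random.default_rng(2).random((8, 8, 8))
origins = sliding_box_origins(layer.shape, box=(4, 4, 4))

# Each crop would feed the classifier; softmax is shown on dummy scores.
z, y, x = origins[0]
crop = layer[z:z + 4, y:y + 4, x:x + 4]
class_probs = softmax(np.array([0.1, 2.0, 0.3]))
```

Since the described design uses a different box size on each pooling level, `sliding_box_origins` would be called once per layer with that layer's box dimensions.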
In this embodiment, by improving the convolution kernel and pooling layers of the convolutional neural network, the network is expanded to support convolution of 3D images. A frame network is constructed, and a three-dimensional sliding box is used to identify the output features of each pooling layer in the 3D pooling layers, so that the position information of each component can be output during detection; through further convolution and pooling operations in the classification network, it can be determined whether the position information of each component is the position information of a damaged part. While the efficiency of identifying each component is ensured, the position of the damaged part can also be located more precisely.
In one embodiment, the vehicle image recognition model further includes a hotspot network, and after the identification information of the vehicle image is obtained, the vehicle image detection method further includes:

S15: if the identification information of the image to be identified is first-type information, input the pooling-layer output features into the hotspot network, where the first-type information indicates that a damaged part exists in the image to be identified.
Specifically, after the three-dimensional sliding box recognizes the output features of each pooling layer among the 3D pooling layers, the position information of each vehicle component in the image to be identified is obtained, and after classification by the classification network, it can be determined whether a damaged part exists in the image to be identified. If a damaged part exists, the identification information of the image to be identified is first-type information. Further, if the identification information is first-type information, the hotspot network is activated, and the pooling-layer output features are input into it.

In another specific embodiment, if the identification information of the image to be identified is not first-type information, that is, no damaged part exists in the image to be identified, the hotspot network is not activated, and the pooling-layer output features are not input into it.
S16: in the hotspot network, perform convolutional classification recognition on the pooling-layer output features to obtain hotspot region information.

Specifically, after the hotspot network is activated, the pooling-layer output features are input into it. A convolutional classification network connects the input layer and the output layer of the hotspot network; this convolutional classification network performs convolutional classification recognition on the pooling-layer output features, marks the key points of the key parts of the first-type information, and extracts, as the hotspot region information, the region where the lattice values of the key points are most concentrated. The output layer of the hotspot network outputs a three-dimensional one-hot lattice map.

Here, the key part is the position information of the damaged part. Since each pooling layer from the third to the fifth layer contains multiple key points resembling the key part, the lattice-value counts of all such key points are compared, and the region where the key points' lattice values are most concentrated is selected as the hotspot region information. Because the identification information of the image to be identified yields multiple pieces of first-type information after classification by the classification network, multiple key parts are also recognized by the convolutional classification network in the hotspot network, and there is accordingly more than one piece of hotspot region information.
S17: perform median filtering on the hotspot region information to obtain filtered hotspot region information.
Specifically, performing median filtering on the hotspot region information means setting the gray value of each pixel in the hotspot region information to the median of the gray values of all pixels within a neighborhood window around that pixel. Since the convolutional classification of the pooling-layer output features causes some loss at the edges, median filtering is adopted to protect the edge signal of the image, which it does effectively.
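The neighborhood-median operation described above can be illustrated with a minimal pure-Python sketch. This is not the patent's implementation; it reduces the 3-D lattice to one dimension for clarity, and edge samples are handled by clamping the window to the signal, which is one common convention among several.

```python
import statistics

def median_filter_1d(values, window):
    """Replace each sample with the median of its neighborhood window.

    A 1-D analogue of the median filtering described above; the patent
    applies the same idea to the 3-D hotspot-region lattice.
    """
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(statistics.median(values[lo:hi]))
    return out

# The isolated spike (9) is removed, while the step edge before the
# run of 5s survives -- the edge-preserving property motivating the
# choice of a median filter over a mean filter here.
signal = [0, 0, 9, 0, 0, 5, 5, 5]
print(median_filter_1d(signal, 3))
```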
Here, the kernel of the median filter has a size equal to 1/8 of the hotspot-network input size, rounded up. Because the hotspot-network input size is determined by the pooling-layer output features, the filter kernel size differs across pooling layers of different levels. For example, if the second pooling layer has size 256*256*256, the filter kernel used on that layer has size 32*32*32.
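The kernel-size rule just stated (one eighth of the input edge length, rounded up) is simple enough to sketch directly; the function name is illustrative, not from the patent, and the input is assumed to be the edge length of a cubic feature map.

```python
import math

def median_kernel_size(input_edge):
    """Edge length of the median-filter kernel: 1/8 of the
    hotspot-network input edge length, rounded up, per the rule above."""
    return math.ceil(input_edge / 8)

# The 256*256*256 second-layer example above yields a 32*32*32 kernel;
# an edge length that is not a multiple of 8, such as 100, rounds up.
print(median_kernel_size(256))  # 32
print(median_kernel_size(100))  # 13
```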
S18: perform region-maximum extraction on the filtered hotspot region information to obtain the key position information of the image to be identified.

Since there are multiple pieces of hotspot region information, there are correspondingly multiple pieces of filtered hotspot region information, so it cannot be determined directly which one gives the three-dimensional coordinates of the key point of the key part, that is, the key position information. The lattice values of all filtered hotspot-region lattice maps therefore need to be compared, and the region whose filtered lattice map has the maximum lattice value is taken as the region of the key position information of the image to be identified.

Here, the key position information of the image to be identified refers to the three-dimensional coordinates of the key point of the key part.

In this embodiment, if the box network recognizes a damaged part of the vehicle in the pooling-layer output features, the hotspot network is activated. The hotspot network extracts the key positions of the key parts recognized by the box network and outputs their three-dimensional coordinates; thus, on the premise of accurately identifying the damaged part, the three-dimensional coordinates of the damaged part are output, improving recognition precision and accuracy.
In an embodiment, performing region-maximum extraction on the filtered hotspot region information to obtain the key position information of the image to be identified includes:

S181: obtain the lattice values of all filtered hotspot-region lattice maps.

Since the identification information of the image to be identified yields multiple pieces of first-type information after classification by the classification network, multiple key parts are also recognized by the convolutional classification network in the hotspot network; there are therefore multiple pieces of hotspot region information and, in turn, multiple pieces of filtered hotspot region information, and the lattice values of all filtered hotspot-region lattice maps are extracted.

S182: compare the lattice values of all filtered hotspot-region lattice maps, and extract the maximum value of the filtered hotspot region information.
After the lattice values of all filtered hotspot-region lattice maps are obtained, they are compared; the region whose filtered lattice map has the maximum lattice value is taken as the region of the key position information of the image to be identified, and the maximum value within that region is extracted as the key position information of the image to be identified, that is, the three-dimensional coordinates of the damaged part of the image.
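The compare-then-extract logic of S181 and S182 can be sketched as follows. This is an illustrative simplification: each filtered lattice map is modelled as a dictionary from (x, y, z) coordinates to lattice values, a representation chosen for brevity rather than one stated in the patent.

```python
def extract_key_position(filtered_maps):
    """Scan all filtered hotspot-region lattice maps, find the global
    maximum lattice value, and return (map index, 3-D coordinates,
    value) -- the key-position candidate per S181/S182 above.

    filtered_maps: list of {(x, y, z): lattice_value} dicts, a
    simplifying assumption for this sketch.
    """
    best = None
    for idx, lattice in enumerate(filtered_maps):
        coord, value = max(lattice.items(), key=lambda kv: kv[1])
        if best is None or value > best[2]:
            best = (idx, coord, value)
    return best

maps = [
    {(0, 0, 0): 1, (1, 2, 3): 4},   # first filtered hotspot region
    {(5, 5, 5): 9, (2, 2, 2): 3},   # second, containing the global max
]
print(extract_key_position(maps))  # (1, (5, 5, 5), 9)
```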
In this embodiment, since many key parts containing the damaged part may be recognized, the lattice values of all filtered hotspot-region lattice maps are compared and the region maximum is extracted in order to further determine the key position of the damaged part; the three-dimensional coordinates of that key position can then be output, improving both the recognition rate for the damaged part and the accuracy of its positioning.
In one embodiment, using a three-dimensional sliding box to recognize the output features of each pooling layer among the 3D pooling layers to obtain the identification information of the image to be identified includes:

S141: determine the size and number of three-dimensional sliding boxes on each pooling layer according to the size of that pooling layer.

Within the convolutional neural network, each pooling layer reduces the dimensions by 2*2*2, so the size and number of three-dimensional sliding boxes differ from layer to layer; the first layer of the convolutional neural network, that is, the input layer, has size 512*512*512.

Specifically, the size and number of the three-dimensional sliding boxes of each pooling layer are determined as follows:
Second layer: after the second layer's pooling layer, the dimensions are reduced to 256*256*256; the layer is divided into 16 equal parts, giving each three-dimensional sliding box on the second layer a size of 16*16*16.

Third layer: after the third layer's pooling layer, the dimensions are reduced to 128*128*128; the layer is divided into 8 equal parts, giving each three-dimensional sliding box on the third layer an equivalent size of 16*16*16, although in terms of field of view each box on the third layer covers 32*32*32.

Fourth layer: after the fourth layer's pooling layer, the dimensions are reduced to 64*64*64; the layer is divided into 8 equal parts, giving each three-dimensional sliding box on the fourth layer an equivalent size of 16*16*16, although in terms of field of view each box on the fourth layer covers 64*64*64.
Fifth layer: after the fifth layer's pooling layer, the dimensions are reduced to 32*32*32; the layer is divided into 8 equal parts, giving each three-dimensional sliding box on the fifth layer an equivalent size of 16*16*16, although in terms of field of view each box on the fifth layer covers 128*128*128.
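The per-layer sizes listed above follow a regular pattern that can be tabulated programmatically: the feature-map edge halves at each pooling layer, and the field of view of a 16*16*16 box doubles with each deeper layer. A sketch under those assumptions follows; note the layer-2 field of view of 16 is inferred from the pattern, since the text states field-of-view sizes only for layers 3 through 5, and the function name is illustrative.

```python
def sliding_box_plan(input_edge=512, box_edge=16):
    """For pooling layers 2-5, derive the cubic feature-map edge
    (halved once per pooling layer from the 512*512*512 input) and the
    field of view covered by a 16*16*16 sliding box on that layer,
    which doubles with each deeper layer per the examples above."""
    plan = {}
    for layer in range(2, 6):
        layer_edge = input_edge >> (layer - 1)   # 256, 128, 64, 32
        field_of_view = box_edge << (layer - 2)  # 16, 32, 64, 128
        plan[layer] = (layer_edge, field_of_view)
    return plan

for layer, (edge, fov) in sorted(sliding_box_plan().items()):
    print(f"layer {layer}: edge {edge}, box field of view {fov}")
```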
S142: control the three-dimensional sliding box to slide over each pooling layer, and recognize the pooling-layer output features to obtain identification information.
Here, the identification information refers to the box-selection information recognized while the three-dimensional sliding box slides over each pooling layer. Specifically, the pooling-layer output features are input into the box network as its input; the three-dimensional sliding box of each pooling layer is controlled to begin sliding recognition at the starting point of the corresponding pooling layer and to slide in steps of two pixels, and as each layer's three-dimensional sliding box slides over its corresponding pooling layer, the pooling-layer output features are recognized and the identification information is obtained.
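The two-pixel sliding step just described determines how many positions each box visits. A short sketch, assuming the box starts at the layer origin and moves in stride-2 steps along each axis (the patent does not spell out boundary handling, so the position count here stops where the box still fits entirely inside the layer):

```python
def sliding_positions(layer_edge, box_edge, stride=2):
    """Count positions a cubic sliding box visits along one axis when
    it starts at the layer origin and moves in two-pixel steps, as
    described above; the total is the cube of the per-axis count."""
    along_axis = (layer_edge - box_edge) // stride + 1
    return along_axis, along_axis ** 3

# On the 256-edge second pooling layer with a 16-edge box:
print(sliding_positions(256, 16))  # (121, 1771561)
```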
S143: if the identification information is position information of a target category, input the box-selection size of the three-dimensional sliding box into the classification network for convolutional classification.

Here, the position information of the target category is the position information of each component recognized by the three-dimensional sliding box in the pooling-layer output features. While recognizing the pooling-layer output features, the three-dimensional sliding box determines whether the identification information is the position information of a detected component and, if so, also determines the component's type. If the identification information contains the position information of each component, the box-selection size of the three-dimensional sliding box is input into the classification network, the convolutional neural network and the box network are trained, and it is further determined whether the position information of the target category is the position information of a damaged part.

Specifically, the structure of the classification network includes:

an input layer, namely the box-selection size of the three-dimensional sliding box;

a middle layer, which is a multi-layer convolutional network in which each convolutional layer has a pooling layer with a stride of 2, the size after the last convolutional layer being 2*2*2; and
an output layer, on which full connection is performed, followed by softmax classification.
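The softmax step at the output layer converts the fully connected layer's raw scores into class probabilities. A minimal sketch of that final step, in the numerically stable form that subtracts the peak logit; the two-class logits are placeholders, not values from the patent:

```python
import math

def softmax(logits):
    """Numerically stable softmax, as applied after the fully
    connected output layer of the classification network above."""
    peak = max(logits)                          # guards against overflow
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

# Two-class sketch: e.g. "damaged part" vs. "not damaged"; the logits
# stand in for the fully connected layer's output.
probs = softmax([2.0, 0.5])
print(probs)
```

The resulting probabilities always sum to 1, so thresholding or taking the argmax yields the class decision.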
In this embodiment, by using a three-dimensional sliding box on the pooling layers through the box network, it can be determined during recognition of the pooling-layer output features whether a detected object is present and what its type is; once the detected object and its type are recognized, the box-selection size of the three-dimensional sliding box is input into the classification network for convolutional classification, which improves recognition accuracy.
In one embodiment, before the vehicle image recognition model is obtained, the vehicle image detection method further includes:

S19: obtain a training sample set, where the training sample set includes vehicle sample images and corresponding annotation data, the vehicle sample images being obtained by applying image processing to images collected by a three-dimensional image acquisition device.

Here, the vehicle sample images are the vehicle sample images to be used for training, and the annotation data is used to classify the vehicle sample images and to mark the name of each classification with a number. Specifically, the data structure of a vehicle sample image collected by the three-dimensional image acquisition device may be 1*H*W*D, where 1 indicates that the vehicle image is a single-channel image, H is the length of the vehicle image, W is its width, and D is its depth of field. For example, the depth-of-field value is set to 512, while the specific values of H and W depend on the pixel resolution of the three-dimensional image acquisition device. Optionally, the three-dimensional image acquisition device may be a ToF 3D camera, a binocular camera, or any other image acquisition device capable of capturing a depth channel.

The image processing steps may include operations such as image enhancement and normalization. Preferably, a resize method is applied in each depth channel; optionally, the resize method may use one of the following interpolation schemes: nearest-neighbor interpolation, bilinear interpolation, or bicubic interpolation. Preferably, bilinear interpolation is adopted in this application.
The H and W values in the vehicle image data structure are reconstructed so that both become 512, while D is initially set to 512. After this data reconstruction, the vehicle image data structure is therefore 1*512*512*512.
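The bilinear resize used to bring H and W to 512 can be illustrated on a single depth-channel slice. This pure-Python sketch uses align-corners-style sample spacing, one of several conventions, and a tiny 2x2 slice instead of a camera image; it is not the patent's implementation.

```python
def bilinear_resize(slice2d, out_h, out_w):
    """Resize one depth-channel slice (list of lists) to out_h x out_w
    with bilinear interpolation -- the resize variant preferred above
    for reconstructing H and W to 512."""
    in_h, in_w = len(slice2d), len(slice2d[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = slice2d[y0][x0] * (1 - fx) + slice2d[y0][x1] * fx
            bot = slice2d[y1][x0] * (1 - fx) + slice2d[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out

tiny = [[0.0, 2.0],
        [2.0, 4.0]]
print(bilinear_resize(tiny, 3, 3))
```

In practice each of the D depth channels would be resized this way, after which the structure is 1*512*512*512.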
S20: obtain a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network.

Here, the preset neural network model is composed of a convolutional neural network, a box network, and a hotspot network, and is used for recognition training on the training sample set.

The structure of the convolutional neural network has five layers in total. The first layer is the input layer, whose data structure is 512*512*512; the convolution kernel size is 3*3*3, and the first layer has 64 convolution kernels in total.

The second to fourth layers are intermediate layers with a convolution kernel size of 3*3*3; each time the network passes through a major layer in depth, the number of convolution kernels doubles. Each intermediate layer contains three sub-layers, one of which includes a pooling layer with a stride of 2, so that each pooling layer reduces the dimensions of the data structure by 2*2*2.
The fifth layer is the output layer, which contains only a pooling layer with a stride of 2.
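The backbone just described can be summarized by tracing the feature-map edge and kernel count per layer: stride-2 pooling in layers 2 through 5 halves each dimension from the 512*512*512 input, and the kernel count starts at 64 and doubles at each intermediate major layer. A sketch under those stated rules (the output layer holds only a pooling layer, so no kernel count is listed for it):

```python
def backbone_shapes(input_edge=512, first_kernels=64):
    """Trace the cubic feature-map edge (one stride-2 pooling per
    layer from layer 2 onward) and the doubling kernel counts of the
    intermediate layers through the five-layer network above."""
    edges = {1: input_edge}
    for layer in range(2, 6):
        edges[layer] = edges[layer - 1] // 2      # 256, 128, 64, 32
    kernels = {1: first_kernels}
    for layer in range(2, 5):
        kernels[layer] = kernels[layer - 1] * 2   # 128, 256, 512
    return edges, kernels

edges, kernels = backbone_shapes()
print(edges)    # {1: 512, 2: 256, 3: 128, 4: 64, 5: 32}
print(kernels)  # {1: 64, 2: 128, 3: 256, 4: 512}
```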
The box network is constructed on the pooling layers of the second to fifth layers of the convolutional neural network, with the results of those pooling layers serving as its input. In essence, the box network is a neural network classifier with a sliding window: on the second to fifth layers, 3D boxes slide over the features, and each box is connected to a classifier that determines whether an object is present in the box and, if so, its type, thereby determining whether the box contains position information of the target category.

The hotspot network is constructed on the pooling layers of the third to fifth layers of the convolutional neural network, at positions equivalent to the box network on the third to fifth layers, with the results of the pooling layers of the second to fifth layers serving as its input. The output of the hotspot network is a three-dimensional one-hot lattice map, that is, a 0/x lattice map consistent with the input, where x denotes a key part and 0 denotes a non-key part. A convolutional classification network connects the input layer and the output layer of the hotspot network.
S21: train the preset neural network model with the training sample set to obtain the vehicle image recognition model.

Here, the vehicle image recognition model is the model obtained by training the preset neural network model on the training sample set. Building the vehicle image recognition model includes: inputting the training sample set into the preset neural network model; defining the network structures of the convolutional neural network, the box network, and the hotspot network in the preset neural network model and initializing the network weights; defining the forward-propagation process; iteratively training the preset neural network model with the defined forward-propagation process to obtain a trained model; and testing and validating the trained model to obtain the vehicle image recognition model.

In this embodiment, vehicle sample images are collected by the three-dimensional image acquisition device and their data structure is readjusted so that the pixels along the length and width match those set along the depth direction, avoiding the problem of different three-dimensional image acquisition devices producing vehicle sample images with different resolutions; the vehicle image recognition model obtained by training the preset neural network model on the training sample set can recognize and judge the vehicle images to be identified subsequently, more accurately and more quickly.

It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
In an embodiment, a vehicle image detection apparatus is provided, which corresponds one-to-one to the vehicle image detection method in the foregoing embodiments. As shown in FIG. 7, the vehicle image detection apparatus includes a first acquisition module 11, a second acquisition module 12, a first input module 13, and a second input module 14. The functional modules are described in detail as follows:

the first acquisition module 11 is configured to acquire an image to be identified, the image being obtained by applying image processing to a vehicle image collected by a three-dimensional image acquisition device;

the second acquisition module 12 is configured to acquire a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a box network;

the first input module 13 is configured to input the image to be identified into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers; and

the second input module 14 is configured to input the pooling-layer output features into the box network and use a three-dimensional sliding box to recognize the output features of each pooling layer among the 3D pooling layers to obtain the identification information of the image to be identified.
Preferably, as shown in FIG. 8, the vehicle image detection apparatus further includes:

a third input module 15, configured to input the pooling-layer output features into the hotspot network when the identification information of the image to be identified is first-type information, the first-type information indicating that a damaged part exists in the image to be identified;

a first recognition module 16, configured to perform convolutional classification recognition on the pooling-layer output features in the hotspot network to obtain hotspot region information;

a median filtering module 17, configured to perform median filtering on the hotspot region information to obtain filtered hotspot region information; and

an information extraction module 18, configured to perform region-maximum extraction on the filtered hotspot region information to obtain the key position information of the image to be identified.
Preferably, as shown in FIG. 9, the information extraction module 18 further includes:

a third acquisition module 181, configured to obtain the lattice values of the filtered hotspot-region lattice maps; and

a first comparison module 182, configured to compare the lattice values of the filtered hotspot-region lattice maps and extract the maximum value of the filtered hotspot region information.

Preferably, as shown in FIG. 10, the second input module 14 further includes:

a first calculation module 141, configured to determine the size and number of three-dimensional sliding boxes on each pooling layer according to the size of that pooling layer;

a second recognition module 142, configured to control the three-dimensional sliding box to slide over each pooling layer and recognize the pooling-layer output features to obtain identification information; and

a fourth input module 143, configured to input the box-selection size of the three-dimensional sliding box into the classification network for convolutional classification when the identification information is position information of a target category.
Preferably, as shown in FIG. 11, the vehicle image detection apparatus further includes:

a third acquisition module 19, configured to obtain a training sample set including vehicle sample images and corresponding annotation data, the vehicle sample images being obtained by applying image processing to images collected by a three-dimensional image acquisition device;

a fourth acquisition module 20, configured to obtain a preset neural network model including a convolutional neural network, a box network, and a hotspot network; and

a first training module 21, configured to train the preset neural network model with the training sample set to obtain a vehicle image recognition model.

For the specific limitations of the vehicle image detection apparatus, refer to the limitations of the vehicle image detection method above, which are not repeated here. Each module in the vehicle image detection apparatus may be implemented wholly or partly in software, hardware, or a combination of the two. The modules may be embedded in or independent of a processor in a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 12. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database; the internal memory provides an environment for running the operating system and the computer-readable instructions in the readable storage medium. The database of the computer device stores the data used in the vehicle image detection method above. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer-readable instructions implement a vehicle image detection method.
在一个实施例中,提供了一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机可读指令,处理器执行计算机可读指令时实现如下步骤:In one embodiment, a computer device is provided, including a memory, a processor, and computer readable instructions stored in the memory and capable of running on the processor, and the processor implements the following steps when the processor executes the computer readable instructions:
获取待识别图像,所述待识别图像为通过三维图像采集设备采集的车辆图像经过图像处理之后得到的;Acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
获取车辆图像识别模型,所述车辆图像识别模型包括卷积神经网络和框网络;Acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
将所述待识别图像输入至所述卷积神经网络中，得到所述卷积神经网络的池化层输出特征，所述卷积神经网络的卷积层采用3D卷积核，池化层采用3D池化层；Input the image to be recognized into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers;
将所述池化层输出特征输入至框网络中,采用三维滑动框对3D池化层中每一池化层输出特征进行识别,得到所述待识别图像的识别信息。Input the output features of the pooling layer into the frame network, and use a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
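The steps above — extracting pooling-layer output features with 3D convolution kernels and 3D pooling, then scanning them with a three-dimensional sliding box — can be illustrated with a minimal NumPy sketch. This is not the application's actual implementation: the kernel values, pooling size, box size, and score threshold below are illustrative assumptions.

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D convolution, standing in for the 3D convolution kernel step."""
    d, h, w = kernel.shape
    D, H, W = volume.shape
    out = np.zeros((D - d + 1, H - h + 1, W - w + 1))
    for z in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                out[z, y, x] = np.sum(volume[z:z+d, y:y+h, x:x+w] * kernel)
    return out

def maxpool3d(volume, size=2):
    """Non-overlapping 3D max pooling, standing in for the 3D pooling layer."""
    D, H, W = (s // size for s in volume.shape)
    out = np.zeros((D, H, W))
    for z in range(D):
        for y in range(H):
            for x in range(W):
                out[z, y, x] = volume[z*size:(z+1)*size,
                                      y*size:(y+1)*size,
                                      x*size:(x+1)*size].max()
    return out

def slide_boxes(features, box=2, stride=1, threshold=0.5):
    """Scan the pooled feature volume with a 3D sliding box; report the corner
    of every box whose mean activation exceeds a (hypothetical) threshold."""
    hits = []
    D, H, W = features.shape
    for z in range(0, D - box + 1, stride):
        for y in range(0, H - box + 1, stride):
            for x in range(0, W - box + 1, stride):
                if features[z:z+box, y:y+box, x:x+box].mean() > threshold:
                    hits.append((z, y, x))
    return hits

# demo: 8x8x8 pseudo "3D vehicle image" -> conv (6x6x6) -> pool (3x3x3) -> box scan
rng = np.random.default_rng(0)
features = maxpool3d(conv3d_valid(rng.random((8, 8, 8)), np.ones((3, 3, 3)) / 27.0))
candidates = slide_boxes(features, box=2, stride=1, threshold=0.5)
```

In practice each of these loops would be a learned layer in a deep-learning framework; the sketch only shows the data flow the steps describe.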
在一个实施例中，提供了一个或多个存储有计算机可读指令的可读存储介质，本实施例所提供的可读存储介质包括非易失性可读存储介质和易失性可读存储介质；该可读存储介质上存储有计算机可读指令，计算机可读指令被一个或多个处理器执行时，使得一个或多个处理器实现如下步骤：In one embodiment, one or more readable storage media storing computer-readable instructions are provided. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media. The readable storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors implement the following steps:
获取待识别图像,所述待识别图像为通过三维图像采集设备采集的车辆图像经过图像处理之后得到的;Acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
获取车辆图像识别模型,所述车辆图像识别模型包括卷积神经网络和框网络;Acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
将所述待识别图像输入至所述卷积神经网络中，得到所述卷积神经网络的池化层输出特征，所述卷积神经网络的卷积层采用3D卷积核，池化层采用3D池化层；Input the image to be recognized into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers;
将所述池化层输出特征输入至框网络中,采用三维滑动框对3D池化层中每一池化层输出特征进行识别,得到所述待识别图像的识别信息。 Input the output features of the pooling layer into the frame network, and use a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
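The sliding-box scan above also implies a simple counting relationship: given the size of a pooling layer's output, the number of three-dimensional sliding-box placements follows directly from the layer, box, and stride dimensions. A small sketch (the box shape and stride values are illustrative assumptions, not values fixed by this application):

```python
from math import prod

def box_positions(layer_shape, box_shape, stride=1):
    """Placements of a 3D sliding box along each axis of a pooling-layer output."""
    return tuple((dim - box) // stride + 1 for dim, box in zip(layer_shape, box_shape))

def box_count(layer_shape, box_shape, stride=1):
    """Total number of sliding-box placements on one pooling layer."""
    return prod(box_positions(layer_shape, box_shape, stride))
```

For example, a 8x8x8 pooled feature volume scanned with a 3x3x3 box at stride 1 yields 6 placements per axis, 216 in total; doubling the stride reduces this to 27.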
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程，是可以通过计算机可读指令来指令相关的硬件来完成，所述的计算机可读指令可存储于一非易失性计算机可读取存储介质或者易失性计算机可读取存储介质中，该计算机可读指令在执行时，可包括如上述各方法的实施例的流程。其中，本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用，均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限，RAM以多种形式可得，诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink) DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。本实施例所提供的可读存储介质包括非易失性可读存储介质和易失性可读存储介质。A person of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer-readable instructions instructing the relevant hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium or a volatile computer-readable storage medium, and when executed, may include the processes of the foregoing method embodiments. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM). The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.
所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，仅以上述各功能单元、模块的划分进行举例说明，实际应用中，可以根据需要而将上述功能分配由不同的功能单元、模块完成，即将所述装置的内部结构划分成不同的功能单元或模块，以完成以上描述的全部或者部分功能。Those skilled in the art can clearly understand that, for convenience and brevity of description, the division into the above functional units and modules is only used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device is divided into different functional units or modules to complete all or part of the functions described above.
以上所述实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围，均应包含在本申请的保护范围之内。The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all be included within the protection scope of this application.

Claims (20)

  1. 一种车辆图像检测方法,其中,所述车辆图像检测方法包括: A vehicle image detection method, wherein the vehicle image detection method includes:
    获取待识别图像,所述待识别图像为通过三维图像采集设备采集的车辆图像经过图像处理之后得到的;Acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
    获取车辆图像识别模型,所述车辆图像识别模型包括卷积神经网络和框网络;Acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
    将所述待识别图像输入至所述卷积神经网络中，得到所述卷积神经网络的池化层输出特征，所述卷积神经网络的卷积层采用3D卷积核，池化层采用3D池化层；Input the image to be recognized into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers;
    将所述池化层输出特征输入至框网络中,采用三维滑动框对3D池化层中每一池化层输出特征进行识别,得到所述待识别图像的识别信息。Input the output features of the pooling layer into the frame network, and use a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
  2. 如权利要求1所述的车辆图像检测方法,其中,所述车辆图像识别模型还包括热点网络,在所述得到所述车辆图像的识别信息之后,所述车辆图像检测方法,还包括: The vehicle image detection method according to claim 1, wherein the vehicle image recognition model further comprises a hotspot network, and after the identification information of the vehicle image is obtained, the vehicle image detection method further comprises:
    若所述待识别图像的识别信息为第一类型信息,则将所述池化层输出特征输入至热点网络中,所述第一类型信息为指示所述待识别图像中存在损伤部位;If the identification information of the image to be identified is the first type of information, input the output feature of the pooling layer into the hotspot network, where the first type of information indicates that there is a damaged part in the image to be identified;
    在所述热点网络中，对所述池化层输出特征进行卷积分类识别，得到热点区域信息；In the hotspot network, perform convolutional classification recognition on the pooling-layer output features to obtain hotspot area information;
    对所述热点区域信息进行中值滤波,得到滤波后的热点区域信息;Performing median filtering on the hot spot area information to obtain filtered hot spot area information;
    对滤波后的热点区域信息进行区域最大值提取,得到所述待识别图像的关键位置信息。The area maximum value extraction is performed on the filtered hot area information to obtain the key position information of the image to be recognized.
  3. 如权利要求2所述的车辆图像检测方法,其中,所述对滤波后的热点区域信息进行区域最大值提取,得到所述待识别图像的关键位置信息,包括: The vehicle image detection method according to claim 2, wherein said extracting the maximum value of the area from the filtered hotspot area information to obtain the key position information of the image to be recognized comprises:
    获取所述滤波后的热点区域信息点阵图的点阵值;Acquiring the bitmap value of the filtered hot spot area information bitmap;
    比较所述滤波后的热点区域信息点阵图的点阵值,提取所述滤波后的热点区域信息的最大值。Comparing the dot matrix values of the filtered hot spot area information dot map, and extracting the maximum value of the filtered hot spot area information.
  4. 如权利要求1所述的车辆图像检测方法，其中，所述采用三维滑动框对3D池化层中每一池化层输出特征进行识别，得到所述待识别图像的识别信息，包括： 4. The vehicle image detection method according to claim 1, wherein using a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified comprises:
    根据每一池化层的大小,确定每一池化层上三维滑动框的大小和个数;According to the size of each pooling layer, determine the size and number of three-dimensional sliding boxes on each pooling layer;
    控制所述三维滑动框在每一池化层上滑动,对所述池化层输出特征进行识别,得到识别信息;Controlling the three-dimensional sliding frame to slide on each pooling layer, and identifying output features of the pooling layer to obtain identification information;
    若所述识别信息为目标种类的位置信息，则将所述三维滑动框的框选大小输入至分类网络进行卷积分类。If the identification information is the position information of the target category, input the frame-selection size of the three-dimensional sliding box into the classification network for convolutional classification.
  5. 如权利要求1所述的车辆图像检测方法,其中,在所述获取车辆图像识别模型之前,所述车辆图像检测方法还包括: 5. The vehicle image detection method according to claim 1, wherein, before said obtaining the vehicle image recognition model, the vehicle image detection method further comprises:
    获取训练样本集,所述训练样本集包括车辆样本图像和对应的标注数据,所述车辆样本图像为三维图像采集设备采集的图像经过图像处理之后得到的;Acquiring a training sample set, the training sample set including a vehicle sample image and corresponding annotation data, the vehicle sample image is obtained after image processing is performed on an image collected by a three-dimensional image acquisition device;
    获取预设的神经网络模型,所述预设的神经网络模型包括卷积神经网络、框网络和热点网络;Acquiring a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network;
    采用所述训练样本集对所述预设的神经网络模型进行训练,得到车辆图像识别模型。The training sample set is used to train the preset neural network model to obtain a vehicle image recognition model.
  6. 一种车辆图像检测装置,其中,包括: A vehicle image detection device, which includes:
    第一获取模块,用于获取待识别图像,所述待识别图像为通过三维图像采集设备采集的车辆图像经过图像处理之后得到的;The first acquisition module is configured to acquire an image to be identified, the image to be identified is obtained after image processing of a vehicle image acquired by a three-dimensional image acquisition device;
    第二获取模块,用于获取车辆图像识别模型,所述车辆图像识别模型包括卷积神经网络和框网络;The second acquisition module is used to acquire a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
    第一输入模块，用于将所述待识别图像输入至所述卷积神经网络中，得到所述卷积神经网络的池化层输出特征，所述卷积神经网络的卷积层采用3D卷积核，池化层采用3D池化层；The first input module is configured to input the image to be recognized into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers;
    第二输入模块,用于将所述池化层输出特征输入至框网络中,采用三维滑动框对3D池化层中每一池化层输出特征进行识别,得到所述待识别图像的识别信息。The second input module is used to input the output features of the pooling layer into the box network, and use a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified .
  7. 如权利要求6所述的车辆图像检测装置,其中,所述车辆图像检测装置还包括: 7. The vehicle image detection device of claim 6, wherein the vehicle image detection device further comprises:
    第三输入模块，用于在所述待识别图像的识别信息为第一类型信息时，将所述池化层输出特征输入至热点网络中，所述第一类型信息为指示所述待识别图像中存在损伤部位；The third input module is configured to input the pooling-layer output features into the hotspot network when the identification information of the image to be recognized is the first type of information, where the first type of information indicates that there is a damaged part in the image to be recognized;
    第一识别模块，用于在所述热点网络中，对所述池化层输出特征进行卷积分类识别，得到热点区域信息；The first recognition module is configured to perform, in the hotspot network, convolutional classification recognition on the pooling-layer output features to obtain hotspot area information;
    中值滤波模块,用于对所述热点区域信息进行中值滤波,得到滤波后的热点区域信息;The median filtering module is configured to perform median filtering on the hot spot area information to obtain filtered hot spot area information;
    信息提取模块,用于对滤波后的热点区域信息进行区域最大值提取,得到所述待识别图像的关键位置信息。The information extraction module is used for extracting the area maximum value from the filtered hot area information to obtain the key position information of the image to be recognized.
  8. 如权利要求6所述的车辆图像检测装置,其中,所述信息提取模块还包括: The vehicle image detection device according to claim 6, wherein the information extraction module further comprises:
    第三获取模块,用于获取所述滤波后的热点区域信息点阵图的点阵值;The third acquiring module is used to acquire the dot matrix value of the filtered hot spot area information dot matrix image;
    第一比较模块,用于比较所述滤波后的热点区域信息点阵图的点阵值,提取所述滤波后的热点区域信息的最大值。The first comparison module is configured to compare the dot matrix values of the filtered hot spot area information dot map, and extract the maximum value of the filtered hot spot area information.
  9. 如权利要求6所述的车辆图像检测装置,其中,所述第二输入模块还包括:The vehicle image detection device according to claim 6, wherein the second input module further comprises:
    第一计算模块,用于根据每一池化层的大小,确定每一池化层上三维滑动框的大小和个数;The first calculation module is used to determine the size and number of three-dimensional sliding boxes on each pooling layer according to the size of each pooling layer;
    第二识别模块,用于控制所述三维滑动框在每一池化层上滑动,对所述池化层输出特征进行识别,得到识别信息;The second recognition module is used to control the three-dimensional sliding frame to slide on each pooling layer, to identify the output characteristics of the pooling layer, and to obtain identification information;
    第四输入模块，用于在所述识别信息为目标种类的位置信息时，将所述三维滑动框的框选大小输入至分类网络进行卷积分类。The fourth input module is configured to input the frame-selection size of the three-dimensional sliding box into the classification network for convolutional classification when the identification information is the position information of the target category.
  10. 如权利要求6所述的车辆图像检测装置，其中，所述车辆图像检测装置还包括：10. The vehicle image detection device according to claim 6, wherein the vehicle image detection device further comprises:
    第三获取模块,用于获取训练样本集,所述训练样本集包括车辆样本图像和对应的标注数据,所述车辆样本图像为三维图像采集设备采集的图像经过图像处理之后得到的;The third acquisition module is configured to acquire a training sample set, the training sample set includes a vehicle sample image and corresponding annotation data, the vehicle sample image is obtained after image processing is performed on an image collected by a three-dimensional image acquisition device;
    第四获取模块,用于获取预设的神经网络模型,所述预设的神经网络模型包括卷积神经网络、框网络和热点网络;The fourth acquisition module is used to acquire a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network;
    第一训练模块,用于采用所述训练样本集对所述预设的神经网络模型进行训练,得到车辆图像识别模型。The first training module is configured to use the training sample set to train the preset neural network model to obtain a vehicle image recognition model.
  11. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and can run on the processor, wherein the processor implements the following steps when the processor executes the computer-readable instructions:
    获取待识别图像,所述待识别图像为通过三维图像采集设备采集的车辆图像经过图像处理之后得到的;Acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
    获取车辆图像识别模型,所述车辆图像识别模型包括卷积神经网络和框网络;Acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
    将所述待识别图像输入至所述卷积神经网络中，得到所述卷积神经网络的池化层输出特征，所述卷积神经网络的卷积层采用3D卷积核，池化层采用3D池化层；Input the image to be recognized into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers;
    将所述池化层输出特征输入至框网络中,采用三维滑动框对3D池化层中每一池化层输出特征进行识别,得到所述待识别图像的识别信息。Input the output features of the pooling layer into the frame network, and use a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
  12. 如权利要求11所述的计算机设备，其中，所述车辆图像识别模型还包括热点网络，在所述得到所述车辆图像的识别信息之后，所述处理器执行所述计算机可读指令时还实现如下步骤：12. The computer device according to claim 11, wherein the vehicle image recognition model further includes a hotspot network, and after the identification information of the vehicle image is obtained, the processor further implements the following steps when executing the computer-readable instructions:
    若所述待识别图像的识别信息为第一类型信息,则将所述池化层输出特征输入至热点网络中,所述第一类型信息为指示所述待识别图像中存在损伤部位;If the identification information of the image to be identified is the first type of information, input the output feature of the pooling layer into the hotspot network, where the first type of information indicates that there is a damaged part in the image to be identified;
    在所述热点网络中，对所述池化层输出特征进行卷积分类识别，得到热点区域信息；In the hotspot network, perform convolutional classification recognition on the pooling-layer output features to obtain hotspot area information;
    对所述热点区域信息进行中值滤波,得到滤波后的热点区域信息;Performing median filtering on the hot spot area information to obtain filtered hot spot area information;
    对滤波后的热点区域信息进行区域最大值提取,得到所述待识别图像的关键位置信息。The area maximum value extraction is performed on the filtered hot area information to obtain the key position information of the image to be recognized.
  13. 如权利要求12所述的计算机设备,其中,所述对滤波后的热点区域信息进行区域最大值提取,得到所述待识别图像的关键位置信息,包括:The computer device according to claim 12, wherein said extracting the maximum value of the area from the filtered hotspot area information to obtain the key position information of the image to be recognized comprises:
    获取所述滤波后的热点区域信息点阵图的点阵值;Acquiring the bitmap value of the filtered hot spot area information bitmap;
    比较所述滤波后的热点区域信息点阵图的点阵值,提取所述滤波后的热点区域信息的最大值。Comparing the dot matrix values of the filtered hot spot area information dot map, and extracting the maximum value of the filtered hot spot area information.
  14. 如权利要求11所述的计算机设备，其中，所述采用三维滑动框对3D池化层中每一池化层输出特征进行识别，得到所述待识别图像的识别信息，包括：14. The computer device according to claim 11, wherein using a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified comprises:
    根据每一池化层的大小,确定每一池化层上三维滑动框的大小和个数;According to the size of each pooling layer, determine the size and number of three-dimensional sliding boxes on each pooling layer;
    控制所述三维滑动框在每一池化层上滑动,对所述池化层输出特征进行识别,得到识别信息;Controlling the three-dimensional sliding frame to slide on each pooling layer, and identifying output features of the pooling layer to obtain identification information;
    若所述识别信息为目标种类的位置信息，则将所述三维滑动框的框选大小输入至分类网络进行卷积分类。If the identification information is the position information of the target category, input the frame-selection size of the three-dimensional sliding box into the classification network for convolutional classification.
  15. 如权利要求11所述的计算机设备,其中,在所述获取车辆图像识别模型之前,所述处理器执行所述计算机可读指令时还实现如下步骤:The computer device according to claim 11, wherein, before said acquiring the vehicle image recognition model, the processor further implements the following steps when executing the computer-readable instruction:
    获取训练样本集,所述训练样本集包括车辆样本图像和对应的标注数据,所述车辆样本图像为三维图像采集设备采集的图像经过图像处理之后得到的;Acquiring a training sample set, the training sample set including a vehicle sample image and corresponding annotation data, the vehicle sample image is obtained after image processing is performed on an image collected by a three-dimensional image acquisition device;
    获取预设的神经网络模型,所述预设的神经网络模型包括卷积神经网络、框网络和热点网络;Acquiring a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network;
    采用所述训练样本集对所述预设的神经网络模型进行训练,得到车辆图像识别模型。The training sample set is used to train the preset neural network model to obtain a vehicle image recognition model.
  16. 一个或多个存储有计算机可读指令的可读存储介质,其中,所述计算机可读指令被一个或多个处理器执行时,使得所述一个或多个处理器执行如下步骤:One or more readable storage media storing computer readable instructions, where when the computer readable instructions are executed by one or more processors, the one or more processors execute the following steps:
    获取待识别图像,所述待识别图像为通过三维图像采集设备采集的车辆图像经过图像处理之后得到的;Acquiring an image to be identified, where the image to be identified is obtained after image processing of a vehicle image collected by a three-dimensional image acquisition device;
    获取车辆图像识别模型,所述车辆图像识别模型包括卷积神经网络和框网络;Acquiring a vehicle image recognition model, the vehicle image recognition model including a convolutional neural network and a frame network;
    将所述待识别图像输入至所述卷积神经网络中，得到所述卷积神经网络的池化层输出特征，所述卷积神经网络的卷积层采用3D卷积核，池化层采用3D池化层；Input the image to be recognized into the convolutional neural network to obtain the pooling-layer output features of the convolutional neural network, where the convolutional layers of the convolutional neural network use 3D convolution kernels and the pooling layers are 3D pooling layers;
    将所述池化层输出特征输入至框网络中,采用三维滑动框对3D池化层中每一池化层输出特征进行识别,得到所述待识别图像的识别信息。Input the output features of the pooling layer into the frame network, and use a three-dimensional sliding frame to identify the output features of each pooling layer in the 3D pooling layer to obtain the identification information of the image to be identified.
  17. 如权利要求16所述的可读存储介质，其中，所述车辆图像识别模型还包括热点网络，在所述得到所述车辆图像的识别信息之后，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器还执行如下步骤：17. The readable storage medium according to claim 16, wherein the vehicle image recognition model further includes a hotspot network, and after the identification information of the vehicle image is obtained, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further execute the following steps:
    若所述待识别图像的识别信息为第一类型信息,则将所述池化层输出特征输入至热点网络中,所述第一类型信息为指示所述待识别图像中存在损伤部位;If the identification information of the image to be identified is the first type of information, input the output feature of the pooling layer into the hotspot network, where the first type of information indicates that there is a damaged part in the image to be identified;
    在所述热点网络中，对所述池化层输出特征进行卷积分类识别，得到热点区域信息；In the hotspot network, perform convolutional classification recognition on the pooling-layer output features to obtain hotspot area information;
    对所述热点区域信息进行中值滤波,得到滤波后的热点区域信息;Performing median filtering on the hot spot area information to obtain filtered hot spot area information;
    对滤波后的热点区域信息进行区域最大值提取,得到所述待识别图像的关键位置信息。The area maximum value extraction is performed on the filtered hot area information to obtain the key position information of the image to be recognized.
  18. 如权利要求17所述的可读存储介质，其中，所述对滤波后的热点区域信息进行区域最大值提取，得到所述待识别图像的关键位置信息，包括：18. The readable storage medium according to claim 17, wherein extracting the area maximum value from the filtered hotspot area information to obtain the key position information of the image to be recognized comprises:
    获取所述滤波后的热点区域信息点阵图的点阵值;Acquiring the bitmap value of the filtered hot spot area information bitmap;
    比较所述滤波后的热点区域信息点阵图的点阵值,提取所述滤波后的热点区域信息的最大值。Comparing the dot matrix values of the filtered hot spot area information dot map, and extracting the maximum value of the filtered hot spot area information.
  19. 如权利要求16所述的可读存储介质，其中，所述采用三维滑动框对3D池化层中每一池化层输出特征进行识别，得到所述待识别图像的识别信息，包括：19. The readable storage medium according to claim 16, wherein using a three-dimensional sliding box to identify the output features of each pooling layer in the 3D pooling layers to obtain the identification information of the image to be identified comprises:
    根据每一池化层的大小,确定每一池化层上三维滑动框的大小和个数;According to the size of each pooling layer, determine the size and number of three-dimensional sliding boxes on each pooling layer;
    控制所述三维滑动框在每一池化层上滑动,对所述池化层输出特征进行识别,得到识别信息;Control the three-dimensional sliding frame to slide on each pooling layer, and identify the output characteristics of the pooling layer to obtain identification information;
    若所述识别信息为目标种类的位置信息，则将所述三维滑动框的框选大小输入至分类网络进行卷积分类。If the identification information is the position information of the target category, input the frame-selection size of the three-dimensional sliding box into the classification network for convolutional classification.
  20. 如权利要求16所述的可读存储介质，其中，在所述获取车辆图像识别模型之前，所述计算机可读指令被一个或多个处理器执行时，使得所述一个或多个处理器还执行如下步骤：20. The readable storage medium according to claim 16, wherein, before the acquiring of the vehicle image recognition model, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further execute the following steps:
    获取训练样本集,所述训练样本集包括车辆样本图像和对应的标注数据,所述车辆样本图像为三维图像采集设备采集的图像经过图像处理之后得到的;Acquiring a training sample set, the training sample set including a vehicle sample image and corresponding annotation data, the vehicle sample image is obtained after image processing is performed on an image collected by a three-dimensional image acquisition device;
    获取预设的神经网络模型,所述预设的神经网络模型包括卷积神经网络、框网络和热点网络;Acquiring a preset neural network model, where the preset neural network model includes a convolutional neural network, a box network, and a hotspot network;
    采用所述训练样本集对所述预设的神经网络模型进行训练,得到车辆图像识别模型。The training sample set is used to train the preset neural network model to obtain a vehicle image recognition model.
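The hotspot post-processing recited in claims 2–3 — median filtering of the hotspot area information followed by region-maximum extraction to obtain the key position — can be sketched in NumPy. This is a minimal illustration under assumptions not stated in the claims: a 2D heat map and a 3x3 filter window.

```python
import numpy as np

def median_filter2d(img, k=3):
    """k x k median filter over a 2D heat map (edges handled by reflection padding),
    suppressing isolated spikes while preserving broad hotspot regions."""
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty_like(img, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.median(padded[y:y+k, x:x+k])
    return out

def region_maximum(heat):
    """Return the (row, col) of the maximum of the filtered heat map and its value,
    i.e. the key position extracted from the hotspot area information."""
    idx = np.unravel_index(np.argmax(heat), heat.shape)
    return idx, heat[idx]
```

Usage follows the claimed order: `region_maximum(median_filter2d(hotspot_info))`, where `hotspot_info` is the hotspot-area array produced by the hotspot network.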
PCT/CN2021/070733 2020-03-04 2021-01-08 Vehicle image detection method and apparatus, and computer device and storage medium WO2021175006A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010142987.6 2020-03-04
CN202010142987.6A CN111461170A (en) 2020-03-04 2020-03-04 Vehicle image detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021175006A1 true WO2021175006A1 (en) 2021-09-10

Family

ID=71680889

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070733 WO2021175006A1 (en) 2020-03-04 2021-01-08 Vehicle image detection method and apparatus, and computer device and storage medium

Country Status (2)

Country Link
CN (1) CN111461170A (en)
WO (1) WO2021175006A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780480A (en) * 2021-11-11 2021-12-10 深圳佑驾创新科技有限公司 Method for constructing multi-target detection and category identification model based on YOLOv5
CN114067282A (en) * 2021-11-04 2022-02-18 北京智芯原动科技有限公司 End-to-end vehicle pose detection method and device
CN114973055A (en) * 2022-03-25 2022-08-30 成都臻识科技发展有限公司 Vehicle motion state detection method, device, equipment and storage medium
CN115375997A (en) * 2022-08-23 2022-11-22 黑龙江工程学院 Sea surface target detection method, sea surface target detection device and terminal equipment
CN115684172A (en) * 2022-10-09 2023-02-03 迁安市福运机动车检测有限公司 Automobile appearance detection system and using method thereof
CN116629289A (en) * 2023-05-23 2023-08-22 深圳市牛加技术有限公司 Optical lattice two-dimensional coordinate recognition method and device based on convolutional neural network
CN116935229A (en) * 2023-09-12 2023-10-24 山东博昂信息科技有限公司 Method and system for identifying hook-in state of ladle hook
CN117315357A (en) * 2023-09-27 2023-12-29 广东省新黄埔中医药联合创新研究院 Image recognition method and related device based on traditional Chinese medicine deficiency-excess syndrome differentiation classification
CN117496274A (en) * 2023-12-29 2024-02-02 墨卓生物科技(浙江)有限公司 Classification counting method, system and storage medium based on liquid drop images

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461170A (en) * 2020-03-04 2020-07-28 深圳壹账通智能科技有限公司 Vehicle image detection method and device, computer equipment and storage medium
CN112216640B (en) * 2020-10-19 2021-08-06 高视科技(苏州)有限公司 Semiconductor chip positioning method and device
CN114648678A (en) * 2022-03-29 2022-06-21 清华大学 Countermeasure sample detection method, apparatus, computer device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295502A (en) * 2016-07-25 2017-01-04 厦门中控生物识别信息技术有限公司 A kind of method for detecting human face and device
US20180260793A1 (en) * 2016-04-06 2018-09-13 American International Group, Inc. Automatic assessment of damage and repair costs in vehicles
CN108875648A (en) * 2018-06-22 2018-11-23 深源恒际科技有限公司 A method of real-time vehicle damage and component detection based on mobile video stream
CN109948498A (en) * 2019-03-13 2019-06-28 中南大学 A kind of dynamic gesture identification method based on 3D convolutional neural networks algorithm
CN111461170A (en) * 2020-03-04 2020-07-28 深圳壹账通智能科技有限公司 Vehicle image detection method and device, computer equipment and storage medium


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114067282B (en) * 2021-11-04 2024-05-07 北京智芯原动科技有限公司 End-to-end vehicle pose detection method and device
CN114067282A (en) * 2021-11-04 2022-02-18 北京智芯原动科技有限公司 End-to-end vehicle pose detection method and device
CN113780480A (en) * 2021-11-11 2021-12-10 深圳佑驾创新科技有限公司 Method for constructing multi-target detection and category identification model based on YOLOv5
CN114973055A (en) * 2022-03-25 2022-08-30 成都臻识科技发展有限公司 Vehicle motion state detection method, device, equipment and storage medium
CN115375997A (en) * 2022-08-23 2022-11-22 黑龙江工程学院 Sea surface target detection method, sea surface target detection device and terminal equipment
CN115375997B (en) * 2022-08-23 2023-10-31 黑龙江工程学院 Sea surface target detection method, target detection device and terminal equipment
CN115684172A (en) * 2022-10-09 2023-02-03 迁安市福运机动车检测有限公司 Automobile appearance detection system and using method thereof
CN116629289A (en) * 2023-05-23 2023-08-22 深圳市牛加技术有限公司 Optical lattice two-dimensional coordinate recognition method and device based on convolutional neural network
CN116935229A (en) * 2023-09-12 2023-10-24 山东博昂信息科技有限公司 Method and system for identifying hook-in state of ladle hook
CN117315357B (en) * 2023-09-27 2024-04-30 广东省新黄埔中医药联合创新研究院 Image recognition method and related device based on traditional Chinese medicine deficiency-excess syndrome differentiation classification
CN117315357A (en) * 2023-09-27 2023-12-29 广东省新黄埔中医药联合创新研究院 Image recognition method and related device based on traditional Chinese medicine deficiency-excess syndrome differentiation classification
CN117496274A (en) * 2023-12-29 2024-02-02 墨卓生物科技(浙江)有限公司 Classification counting method, system and storage medium based on liquid drop images
CN117496274B (en) * 2023-12-29 2024-06-11 墨卓生物科技(浙江)有限公司 Classification counting method, system and storage medium based on liquid drop images

Also Published As

Publication number Publication date
CN111461170A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
WO2021175006A1 (en) Vehicle image detection method and apparatus, and computer device and storage medium
WO2021017261A1 (en) Recognition model training method and apparatus, image recognition method and apparatus, and device and medium
WO2021135499A1 (en) Damage detection model training and vehicle damage detection methods, device, apparatus, and medium
CN110135406B (en) Image recognition method and device, computer equipment and storage medium
WO2020215676A1 (en) Residual network-based image identification method, device, apparatus, and storage medium
WO2021196873A1 (en) License plate character recognition method and apparatus, electronic device, and storage medium
CN110163193B (en) Image processing method, image processing device, computer-readable storage medium and computer equipment
US11354797B2 (en) Method, device, and system for testing an image
WO2018166116A1 (en) Car damage recognition method, electronic apparatus and computer-readable storage medium
WO2022134354A1 (en) Vehicle loss detection model training method and apparatus, vehicle loss detection method and apparatus, and device and medium
CN111968134B (en) Target segmentation method, device, computer readable storage medium and computer equipment
WO2023130648A1 (en) Image data enhancement method and apparatus, computer device, and storage medium
CN111191568A (en) Method, device, equipment and medium for identifying copied image
WO2021217940A1 (en) Vehicle component recognition method and apparatus, computer device, and storage medium
CN111860582B (en) Image classification model construction method and device, computer equipment and storage medium
CN110210519B (en) Classification method, computer device, and storage medium
CN110826484A (en) Vehicle weight recognition method and device, computer equipment and model training method
WO2021189959A1 (en) Brain midline recognition method and apparatus, and computer device and storage medium
WO2019042080A1 (en) Image data processing system and method
CN111583184A (en) Image analysis method, network, computer device, and storage medium
CN112241646A (en) Lane line recognition method and device, computer equipment and storage medium
CN112836756A (en) Image recognition model training method and system and computer equipment
CN110188813B (en) Image feature classification method, computer device, and storage medium
WO2021169625A1 (en) Method and apparatus for detecting reproduced network photograph, computer device, and storage medium
CN110728680A (en) Automobile data recorder detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21764041

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21764041

Country of ref document: EP

Kind code of ref document: A1