CN113963060B - Vehicle information image processing method and device based on artificial intelligence and electronic equipment


Info

Publication number
CN113963060B
CN113963060B, CN202111108599.7A, CN202111108599A
Authority
CN
China
Prior art keywords
regression
vehicle information
image
vehicle
processing
Prior art date
Legal status
Active
Application number
CN202111108599.7A
Other languages
Chinese (zh)
Other versions
CN113963060A (en)
Inventor
林愉欢
汪铖杰
刘永
吴凯
张舒翼
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202111108599.7A
Publication of CN113963060A
Application granted
Publication of CN113963060B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle
    • G06T 2207/30256 Lane; Road marking


Abstract

The application provides a vehicle information image processing method and device based on artificial intelligence, an electronic device, a computer-readable storage medium, and a computer program product. The method is applied to the map field and includes the following steps: generating a vehicle information common template corresponding to a plurality of vehicle information types based on vehicle information image samples of the plurality of vehicle information types, and acquiring a calibration point set of the vehicle information common template; performing feature extraction processing on a vehicle information image including target vehicle information to obtain a feature map of the vehicle information image; performing calibration point mask regression processing on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image, and performing calibration point position regression processing on the feature map to obtain a predicted coordinate corresponding to each calibration point in the vehicle information image; and determining a plurality of target calibration points corresponding to the vehicle information type of the target vehicle information based on the prediction mask, and determining the position of the target vehicle information in the vehicle information image based on the predicted coordinate of each target calibration point. Through the application, the position detection efficiency of the target vehicle information can be improved.

Description

Vehicle information image processing method and device based on artificial intelligence and electronic equipment
Technical Field
The present application relates to electronic map technologies, and in particular, to a vehicle information image processing method and apparatus based on artificial intelligence, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial Intelligence (AI) refers to theories, methods, techniques, and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results.
An electronic map needs to reflect changes of the various road elements in the real world accurately and efficiently. Fixed-point detection of vehicle information (road markings) makes it possible to compare the spatial positions of the corresponding map elements with those of the master-library elements accurately, so that changes in vehicle information can be updated to the electronic map in real time; efficient and accurate fixed-point detection of vehicle information is therefore particularly important.
The related art provides key point detection techniques for target objects, such as face key point detection. However, because vehicle information comes in many complex and varied types, the key point detection techniques of the related art cannot be efficiently applied to vehicle information detection in a way that adapts to the different types of vehicle information.
Disclosure of Invention
The embodiments of the present application provide a vehicle information image processing method and apparatus based on artificial intelligence, an electronic device, a computer-readable storage medium, and a computer program product. Through a common template and calibration point mask regression processing, the positions of various types of target vehicle information can be detected adaptively, which improves the position detection efficiency of the target vehicle information.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a vehicle information image processing method based on artificial intelligence, which comprises the following steps:
performing feature extraction processing on a vehicle information image including target vehicle information to obtain a feature map of the vehicle information image;
generating a vehicle information public template corresponding to a plurality of vehicle information types based on vehicle information image samples of the plurality of vehicle information types, and acquiring a calibration point set of the vehicle information public template;
wherein the set of calibration points comprises a plurality of calibration points corresponding to each of the vehicle information types;
performing calibration point mask regression processing on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image, and performing calibration point position regression processing on the feature map to obtain a prediction coordinate corresponding to each calibration point in the vehicle information image;
and determining a plurality of target calibration points corresponding to the vehicle information type of the target vehicle information from the calibration point set based on the prediction mask, and determining the position of the target vehicle information in the vehicle information image based on the predicted coordinate of each target calibration point.
The embodiment of the application provides a vehicle information image processing apparatus based on artificial intelligence, including:
the characteristic module is used for carrying out characteristic extraction processing on the vehicle information image comprising the target vehicle information to obtain a characteristic diagram of the vehicle information image;
the template module is used for generating vehicle information public templates corresponding to a plurality of vehicle information types based on vehicle information image samples of the plurality of vehicle information types and acquiring a calibration point set of the vehicle information public templates;
wherein the set of calibration points comprises a plurality of calibration points corresponding to each of the vehicle information types;
the regression module is used for performing calibration point mask regression processing on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image, and performing calibration point position regression processing on the feature map to obtain a prediction coordinate corresponding to each calibration point in the vehicle information image;
and the position module is used for determining a plurality of target calibration points corresponding to the vehicle information type of the target vehicle information from the calibration point set based on the prediction mask, and determining the position of the target vehicle information in the vehicle information image based on the predicted coordinate of each target calibration point.
In the foregoing solution, the template module is further configured to: extracting the vehicle information of each vehicle information image sample; carrying out size adjustment processing on the plurality of vehicle messages to obtain vehicle messages with the same size; and combining a plurality of vehicle messages with the same size to obtain a vehicle message common template corresponding to the plurality of vehicle message types.
In the foregoing solution, the feature module is further configured to: extracting the convolution characteristic of the vehicle communication image, and performing maximum pooling processing on the convolution characteristic of the vehicle communication image to obtain the pooling characteristic of the vehicle communication image; and carrying out residual iteration processing on the pooling characteristics of the vehicle communication image for multiple times to obtain a residual iteration processing result of the vehicle communication image, and taking the residual iteration processing result as a characteristic diagram of the vehicle communication image.
In the foregoing solution, the calibration point mask regression processing is performed through a mask regression network, where the mask regression network includes a first pooling layer, a first fully connected layer, and a first feature extraction layer, and the regression module is further configured to: perform mask feature extraction processing on the feature map of the vehicle information image a plurality of times through the first feature extraction layer to obtain mask regression features of the feature map; perform average pooling processing on the mask regression features of the feature map through the first pooling layer to obtain an average mask pooling feature of the feature map; and perform first fully connected processing on the average mask pooling feature of the feature map through the first fully connected layer to obtain a prediction mask for each calibration point.
In the foregoing solution, the calibration point position regression processing is performed through a coordinate regression network, where the coordinate regression network includes a second pooling layer, a second fully connected layer, and a second feature extraction layer; the regression module is further configured to: perform position feature extraction processing on the feature map of the vehicle information image a plurality of times through the second feature extraction layer to obtain position regression features of the feature map; perform average pooling processing on the position regression features of the feature map through the second pooling layer to obtain an average position pooling feature of the feature map; and perform second fully connected processing on the average position pooling feature of the feature map through the second fully connected layer to obtain a predicted coordinate for each calibration point.
In the above scheme, the calibration point position regression processing is performed by M regression networks and a post-coordinate regression network, where M is an integer greater than or equal to 1; the regression module is further configured to: performing pre-regression processing on the vehicle information image through the M regression networks to obtain a pre-regression processing result of the vehicle information image; and carrying out post coordinate regression processing on the pre-regression processing result through the post coordinate regression network to obtain a predicted coordinate for each calibration point.
In the foregoing solution, the regression module is further configured to: when the value of M is 1, perform pre-regression processing on the feature map through the single regression network to obtain a pre-regression processing result of the vehicle information image; when the value of M is greater than 1, perform regression processing on the input of the mth regression network through the mth regression network among the M regression networks, and transmit the mth regression processing result output by the mth regression network to the (m+1)th regression network to continue the regression processing, obtaining a corresponding (m+1)th regression processing result; wherein m is an integer variable whose value increases from 1 and satisfies 1 ≤ m ≤ M - 1; when m is 1, the input of the mth regression network is the feature map; when 2 ≤ m ≤ M - 1, the input of the mth regression network is the (m-1)th regression processing result output by the (m-1)th regression network; and when m takes the value M - 1, the output of the (m+1)th regression network is the pre-regression processing result.
In the foregoing, the mth regression network includes an mth coordinate regression network and an mth heatmap stacking network, and the regression module is further configured to: performing mth coordinate regression processing on the input of the mth regression network through the mth coordinate regression network to obtain an mth predicted coordinate for each calibration point; selecting a plurality of valid index points from the plurality of index points based on the prediction mask for each of the index points; determining an mth heat map corresponding to the vehicle communication image based on the plurality of valid calibration points; and performing up-sampling processing on the feature map, and stacking the up-sampling processing result of the feature map and the mth heat map to obtain the mth regression processing result.
In the foregoing solution, the regression module is further configured to: performing the following for each of the index points: when the prediction mask corresponding to the index point is smaller than a mask threshold value, determining the corresponding index point as an effective index point; acquiring the m-th predicted coordinate of each effective calibration point, and executing the following processing aiming at each image coordinate of the vehicle information image: determining a distance between the image coordinates and each of the m-th predicted coordinates; obtaining a minimum distance of a plurality of said distances; when the minimum distance is larger than a distance threshold value, determining the heat value of the image coordinate as one; determining a calculated value negatively correlated to the minimum distance as a heat value of the image coordinate when the minimum distance is not greater than the distance threshold; and generating an mth heat map corresponding to the vehicle information image based on the heat value of each image coordinate.
In the above solution, the calibration point position regression processing is performed through a heat map regression network, the heat map regression network including N cascaded down-sampling layers, N cascaded up-sampling layers, and a convolution layer, where N is an integer greater than or equal to 2; the regression module is further configured to: perform down-sampling processing on the feature map through the N cascaded down-sampling layers to obtain a down-sampling processing result; perform convolution processing on the down-sampling processing result through the convolution layer to obtain heat map features of the feature map; perform up-sampling processing on the heat map features through the N cascaded up-sampling layers to obtain a predicted heat map corresponding to the feature map; and determine, based on the predicted heat map, a predicted coordinate for each of the calibration points.
In the foregoing solution, the regression module is further configured to: performing the following for each of the index points: acquiring heat values of a plurality of image coordinates corresponding to the calibration points in the prediction heat map; based on the heat value, performing descending order processing on a plurality of image coordinates of the prediction heat map; and taking the image coordinate ordered at the head as the predicted coordinate of the index point.
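By way of illustration only, the following PyTorch sketch shows one possible shape of such a heat map regression network and of the ranking step described above; the value of N, the channel widths, and the use of strided and transposed convolutions are assumptions of this sketch rather than details taken from the patent.

```python
import torch
import torch.nn as nn

class HeatMapRegressionNetwork(nn.Module):
    """N cascaded down-sampling layers, a convolution layer, and N cascaded
    up-sampling layers producing one predicted heat map per calibration point."""
    def __init__(self, in_channels=64, num_points=29, n=2, width=64):
        super().__init__()
        downs, ups = [], []
        for _ in range(n):
            downs.append(nn.Sequential(
                nn.Conv2d(in_channels, width, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            in_channels = width
        for _ in range(n):
            ups.append(nn.Sequential(
                nn.ConvTranspose2d(width, width, 4, stride=2, padding=1),
                nn.ReLU(inplace=True)))
        self.down = nn.Sequential(*downs)
        self.mid = nn.Conv2d(width, width, 3, padding=1)
        self.up = nn.Sequential(*ups)
        self.head = nn.Conv2d(width, num_points, 1)

    def forward(self, feature_map):
        return self.head(self.up(self.mid(self.down(feature_map))))

def decode_heat_maps(heat_maps):
    """For each calibration point channel, take the image coordinate whose heat
    value ranks first as the predicted coordinate of that calibration point."""
    b, k, h, w = heat_maps.shape
    flat = heat_maps.view(b, k, -1)
    idx = flat.argmax(dim=-1)                                  # top-ranked coordinate index
    xs = idx % w
    ys = torch.div(idx, w, rounding_mode="floor")
    return torch.stack([xs, ys], dim=-1).float()               # (batch, K, 2) as (x, y)

heat_maps = HeatMapRegressionNetwork()(torch.randn(1, 64, 60, 46))  # stand-in feature map
coords = decode_heat_maps(heat_maps)                                # (1, 29, 2)
```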
An embodiment of the present application provides an electronic device, including:
a memory for storing executable instructions;
and the processor is used for realizing the vehicle information image processing method based on artificial intelligence provided by the embodiment of the application when the executable instructions stored in the memory are executed.
The embodiment of the application provides a computer-readable storage medium, which stores executable instructions and is used for realizing the artificial intelligence-based vehicle communication image processing method provided by the embodiment of the application when being executed by a processor.
The embodiment of the application provides a computer program product, which comprises a computer program or instructions, and the computer program or instructions are executed by a processor to realize the artificial intelligence-based vehicle communication image processing method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
the method comprises the steps of obtaining a calibration point set of a vehicle information public template corresponding to a plurality of vehicle information types, carrying out calibration point mask regression processing and calibration point position regression processing on a feature map of a vehicle information image to obtain a prediction mask corresponding to each calibration point in the vehicle information image and a prediction coordinate corresponding to each calibration point.
Drawings
FIG. 1 is a diagram illustrating detection of key points of a human face in the related art;
fig. 2A is a schematic diagram of a vehicle communication provided in an embodiment of the present application;
fig. 2B is a schematic diagram of a vehicle communication provided in an embodiment of the present application;
fig. 2C is a schematic diagram of a vehicle communication provided in an embodiment of the present application;
FIG. 2D is a schematic diagram of an obscured vehicle message in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an artificial intelligence-based vehicle telematics system provided in an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 5 is a schematic structural diagram of a beacon fixed-point detection model of a vehicle communication image processing method provided in an embodiment of the present application;
fig. 6A is a schematic flowchart of an artificial intelligence-based vehicle communication image processing method provided in an embodiment of the present application;
fig. 6B is a schematic flowchart of an artificial intelligence-based vehicle communication image processing method according to an embodiment of the present application;
fig. 6C is a schematic flowchart of an artificial intelligence-based vehicle communication image processing method according to an embodiment of the present application;
fig. 6D is a schematic flowchart of an artificial intelligence-based vehicle communication image processing method according to an embodiment of the present application;
fig. 7 is a schematic three-dimensional reconstruction diagram of a vehicle communication image processing method provided in an embodiment of the present application;
fig. 8 is a vehicle-body identification comparison diagram of a vehicle-body identification image processing method provided in the embodiment of the present application;
fig. 9 is a general template diagram of a vehicle communication image processing method provided in an embodiment of the present application;
fig. 10 is a schematic diagram of a relationship between a mask and a coordinate in a vehicle communication image processing method according to an embodiment of the present application;
fig. 11 is a schematic diagram of a heatmap regression network provided by an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first", "second", and "third" are only used to distinguish similar objects and do not denote a particular order. It should be understood that, where permitted, the specific order or sequence may be interchanged, so that the embodiments of the application described herein can be practiced in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Vehicle information: road markings such as lines, arrows, characters, elevation marks, raised road markers, and contour marks applied to the road surface to convey guidance, restriction, warning, and other traffic information to traffic participants. Their function is to regulate and guide traffic.
2) Calibration points (landmarks): a group of key point positions predefined for a vehicle information arrow, for example the tip points of a straight arrow. The peripheral frame of the vehicle information can be obtained from these key point positions, so that the position of the vehicle information in the image can be detected; the key point positions of the same vehicle information arrow in different frames can also be used as matching point data for downstream three-dimensional reconstruction, improving reconstruction accuracy and success rate.
3) Heat map: because of its rich color variation and intuitive presentation, a heat map is often used to provide additional texture or supervision information. Key points can be encoded with a heat map such that the closer a pixel is to a key point, the higher the corresponding value on the heat map.
4) Intelligent Vehicle-road coordination system (IVICS), referred to as the vehicle-road cooperative system for short, is a development direction of the Intelligent Transportation System (ITS). The vehicle-road cooperative system adopts technologies such as advanced wireless communication and the new generation of the Internet to implement dynamic, real-time vehicle-to-vehicle and vehicle-to-road information interaction in all directions, and carries out active vehicle safety control and cooperative road management on the basis of full-time dynamic traffic information collection and fusion, fully realizing effective cooperation among people, vehicles, and roads, ensuring traffic safety and improving traffic efficiency, thereby forming a safe, efficient, and environmentally friendly road traffic system.
In the related art, key point detection can be performed for human faces and human bodies. Taking face key point detection as an example, after the calibration positions of the face key points are given, any image containing a face is provided as the model input, and a prediction corresponding to the calibration positions is output in the form of coordinates. Referring to fig. 1, fig. 1 is a schematic diagram of face key point detection in the related art, in which the key points of parts such as eyebrows, mouth, nose, and eyes are detected and identified in the face. The applicant has found that the related art does not apply face key point detection to vehicle information key point detection, and even if it were applied, it would not transfer well, because there are many types of vehicle information and the calibration positions and numbers of key points differ between types. Referring to figs. 2A, 2B, and 2C, which are schematic diagrams of vehicle information provided in the embodiments of the present application: fig. 2A shows vehicle information in the form of a straight arrow, where each corner point of the straight arrow is a calibration point (for example, calibration points 1-5 and calibration points 28-29), and connecting the corner points forms the corresponding vehicle information, so the corner points are taken as calibration points; fig. 2B shows vehicle information in the form of a U-turn arrow, where each corner point of the U-turn arrow is a calibration point (for example, calibration points 21-29); and fig. 2C shows vehicle information in the form of a three-way arrow, where each corner point of the three-way arrow is a calibration point (for example, calibration points 1-20 and calibration points 28-29). As can be seen from figs. 2A, 2B, and 2C, different vehicle information has different corner points and therefore different key points, whereas the object targeted by face key point detection is the human face, and different face images share the same calibration points: the eyebrows, mouth, eyes, and nose of all faces are in similar relative positions. Face key point detection therefore cannot perform key point detection on multiple types of vehicle information with the same model. Referring to fig. 2D, which is a schematic diagram of occluded vehicle information in an embodiment of the present application, vehicle information images may be blurred, occluded, or damaged; the face key point technique would still output calibration point positions in these situations, but outputting calibration point positions in such cases would cause detection errors in vehicle information detection.
The embodiment of the application provides an artificial intelligence-based vehicle letter image processing method, an artificial intelligence-based vehicle letter image processing device, electronic equipment, a computer-readable storage medium and a computer program product, and the positions of various types of target vehicle letters can be adaptively detected through regression processing of a public template and a calibration point mask, so that the position detection efficiency of the target vehicle letters is improved.
The vehicle communication image processing method provided by the embodiment of the application can be implemented by various electronic devices, for example, can be implemented by a terminal device or a server alone, or can be implemented by the terminal and the server in a cooperation manner.
An exemplary application of the electronic device implemented as a server in a car communication image processing system is described below, referring to fig. 3, fig. 3 is a schematic structural diagram of an artificial intelligence-based car communication image processing system provided in an embodiment of the present application, a terminal 400 is connected to the server 200 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
In some embodiments, the functions of the artificial-intelligence-based vehicle information image processing system are implemented based on the server 200. While the terminal 400 is being used by a user, who may be a road information collector, the terminal 400 collects vehicle information image samples and transmits them to the server 200, so that the server 200 trains a vehicle information fixed-point detection model based on a plurality of loss functions. The trained vehicle information fixed-point detection model is integrated in the server 200. In response to the terminal 400 receiving a vehicle information image shot by the user, the terminal 400 transmits the vehicle information image to the server 200; the server 200 determines the position of the target vehicle information in the vehicle information image through the vehicle information fixed-point detection model and transmits the position to the terminal 400, so that the terminal 400 performs a map image processing task in the intelligent vehicle-road coordination system based on the vehicle information image and the position of the target vehicle information.
In other embodiments, when the vehicle information image processing method provided by the embodiment of the present application is implemented by a terminal alone, in various application scenarios described above, the terminal may run the vehicle information fixed point detection model to determine the location of the target vehicle information, and the terminal performs a map image processing task in the intelligent vehicle-road coordination system based on the vehicle information image and the location of the target vehicle information.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may include, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent appliance, a vehicle-mounted terminal, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
Next, a structure of an electronic device for implementing the artificial intelligence based vehicle communication image processing method according to the embodiment of the present application is described, and as described above, the electronic device according to the embodiment of the present application may be the server 200 in fig. 3. Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device provided in an embodiment of the present application, and the server 200 shown in fig. 4 includes: at least one processor 210, memory 250, at least one network interface 220. The various components in server 200 are coupled together by a bus system 240. It is understood that the bus system 240 is used to enable communications among the components. The bus system 240 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 240 in fig. 4.
The Processor 210 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor, or the like.
The memory 250 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 250 optionally includes one or more storage devices physically located remotely from processor 210.
The memory 250 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 250 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 250 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 251, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks; a network communication module 252 for reaching other computing devices via one or more (wired or wireless) network interfaces 220, exemplary network interfaces 220 including: Bluetooth, Wireless Fidelity (WiFi), Universal Serial Bus (USB), and the like.
In some embodiments, the artificial intelligence based car letter image processing device provided by the embodiment of the present application can be implemented in software, and fig. 4 shows an artificial intelligence based car letter image processing device 255 stored in a memory 250, which can be software in the form of programs and plug-ins, and the like, and includes the following software modules: a feature module 2551, a template module 2552, a regression module 2553 and a location module 2554, which are logical and thus can be arbitrarily combined or further split depending on the functionality implemented, the functionality of each of which will be described below.
In some embodiments, the terminal or the server may implement the artificial intelligence based vehicle information image processing method provided by the embodiment of the present application by running a computer program. For example, the computer program may be a native program or a software module in an operating system; can be a local (Native) Application program (APP), i.e. a program that needs to be installed in an operating system to run, such as a map APP; or may be an applet, i.e. a program that can be run only by downloading it to the browser environment; but also an applet that can be embedded into any APP. In general, the computer programs described above may be any form of application, module or plug-in.
The artificial intelligence based car communication image processing method provided by the embodiment of the present application will be described in conjunction with an exemplary application and implementation of the server 200 provided by the embodiment of the present application.
Referring to fig. 5, fig. 5 is a schematic structural diagram of the vehicle information fixed-point detection model of the vehicle information image processing method provided in the embodiment of the present application. The vehicle information fixed-point detection model adopts a two-stage, coarse-to-fine key point regression scheme. The input of the model is a vehicle information image, such as a ground arrow image, whose spatial dimensions are adjusted to a fixed size, for example a height of 238 pixels, a width of 184 pixels, and 3 channels. After a feature map is obtained through the feature extraction network serving as the skeleton network, the feature map is fed into a key point branch (coordinate regression network) and a mask branch (mask regression network). The key point branch has two stages: the first stage is a coarse regression of the key points and the second stage is a fine regression of the key points, and the final output of this branch is the coordinates of the key points corresponding to the common template. The mask branch regresses the mask, and the output mask is used to select which key point coordinates (outputs of the key point branch) should be retained and output.
In some embodiments, the input of the mask branch is the feature map obtained from the feature extraction network serving as the skeleton network. The feature map is passed through the mask regression network to obtain a mask tensor of 1 × 29 dimensions, and a mapping is applied to each element: elements greater than 0.5 are set to 1, and the other elements are set to 0, yielding the final mask tensor.
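A minimal sketch of the element-wise mapping described above; the values are illustrative stand-ins for the 1 × 29 mask tensor.

```python
import torch

# Elements greater than 0.5 become 1, all others become 0.
raw_mask = torch.tensor([[0.93, 0.12, 0.61, 0.07]])   # stand-in for a 1 x 29 mask output
final_mask = (raw_mask > 0.5).float()                  # tensor([[1., 0., 1., 0.]])
```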
In some embodiments, the first-stage input of the key point branch is likewise the feature map obtained through the feature extraction network. The feature map is passed through the first-stage key point regression network to obtain a 1 × 58-dimensional key point coordinate tensor for the first stage, a key point set is extracted through the mask (the set of key points whose mask value is 0), and a heat map of height 238 pixels and width 184 pixels is generated from this set by placing a bounded distance field at each key point position, as shown in formula (1):
heatvalue(x, y) = max(0, 1 - D(x, y) / c), where D(x, y) = min over l in Landmarks of ||(x, y) - l||   (1);
where (x, y) are the spatial coordinates of the heat map, heatvalue(x, y) is the heat value at that coordinate, Landmarks is the key point set of the first stage of the key point branch after selection by the mask, l is a key point in that set, D(x, y) is the distance from (x, y) to its nearest key point, and c is a distance threshold constant that can be adjusted according to the resolution; the heat value is set to zero once the distance exceeds c, and c may take the value 5.
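Assuming the linear bounded-distance reading of formula (1) given above, the heat map generation can be sketched as follows; the 238 × 184 resolution and c = 5 come from the description, while the linear fall-off is an interpretation rather than a confirmed detail.

```python
import numpy as np

def bounded_distance_heat_map(landmarks, height=238, width=184, c=5.0):
    """Place a bounded distance field at each retained key point: the heat value
    falls off with the distance to the nearest key point and is set to zero once
    that distance exceeds c (one possible reading of formula (1))."""
    heat = np.zeros((height, width), dtype=np.float32)
    if not landmarks:
        return heat
    ys, xs = np.mgrid[0:height, 0:width]
    pts = np.asarray(landmarks, dtype=np.float32)                 # (K, 2) as (x, y)
    d = np.sqrt((xs[..., None] - pts[:, 0]) ** 2 + (ys[..., None] - pts[:, 1]) ** 2)
    d_min = d.min(axis=-1)                                        # distance to nearest key point
    return np.clip(1.0 - d_min / c, 0.0, 1.0).astype(np.float32)

heat_map = bounded_distance_heat_map([(92.0, 119.0), (30.0, 40.0)])  # hypothetical key points
print(heat_map.shape, heat_map.max())                                # (238, 184) 1.0
```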
In some embodiments, the heat map and the up-sampled feature map are stacked together as the input of the second stage of the key point branch. The up-sampled feature map has a height of 238 pixels and a width of 184 pixels, and the second-stage regression network yields a key point coordinate tensor of 1 × 58 dimensions; there are 58 dimensions in total because there are 29 key points and each key point has two coordinates, one on the horizontal axis and one on the vertical axis.
In some embodiments, if the value of the mask is 1 (indicating that the key point does not belong to this vehicle information type, or that the key point is blurred, occluded, or damaged), the corresponding key point coordinates need to be set to (0, 0). Referring to fig. 10, fig. 10 is a schematic diagram of the relationship between the mask and the coordinates provided in the embodiment of the present application, showing the correspondence between the mask tensor and the key point tensor: every two elements of the key point coordinate tensor (1 × 58 dimensions) represent one key point coordinate, and each element of the mask (1 × 29 dimensions) output by the mask branch corresponds to one key point. The corrected key point tensor and the mask are the final output of the vehicle information fixed-point detection model.
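A short sketch of the correction step of fig. 10, with random stand-in tensors in place of the real branch outputs:

```python
import torch

# A mask value of 1 marks a key point that does not belong to this vehicle
# information type or is blurred/occluded/damaged; its coordinates become (0, 0).
coords = torch.rand(1, 29, 2) * torch.tensor([184.0, 238.0])   # 1 x 58 coordinate tensor, reshaped
mask = (torch.rand(1, 29) > 0.5).float()                        # stand-in 1 x 29 mask
corrected = coords * (1.0 - mask).unsqueeze(-1)                 # masked key points set to (0, 0)
```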
Referring to fig. 6A, fig. 6A is a schematic flowchart of the artificial-intelligence-based vehicle information image processing method provided in the embodiment of the present application, which will be described with reference to steps 101 to 104 shown in fig. 6A; the execution order of step 101 and step 102 is not limited.
In step 101, a characteristic extraction process is performed on a vehicle information image including a target vehicle information to obtain a characteristic map of the vehicle information image.
In some embodiments, referring to fig. 6B, fig. 6B is a schematic flowchart of a method for processing a vehicle information image based on artificial intelligence provided in an embodiment of the present application, and in step 101, a feature extraction process is performed on a vehicle information image including a target vehicle information to obtain a feature map of the vehicle information image, which may be implemented in steps 1011 and 1012.
In step 1011, the convolution feature of the vehicle communication image is extracted, and the maximum pooling process is performed on the convolution feature of the vehicle communication image, so as to obtain the pooling feature of the vehicle communication image.
In step 1012, residual iteration processing is performed on the pooled features of the vehicle communication image for multiple times to obtain a residual iteration processing result of the vehicle communication image, and the residual iteration processing result is used as a feature map of the vehicle communication image.
As an example, steps 1011 and 1012 are performed by the skeleton network in fig. 5. The skeleton network includes a convolution network, a pooling network, and a plurality of cascaded residual networks. The convolution features of the vehicle information image are extracted by the convolution network; the convolution features are max-pooled by the pooling network to obtain the pooling features of the vehicle information image; the pooling features are then processed by the plurality of cascaded residual networks over multiple levels of residual processing, and the residual iteration processing result is used as the feature map of the vehicle information image. The structure of the skeleton network is shown in Table 1.
Table 1: skeleton network structure (reproduced as an image in the original publication; it lists the initial convolution layer, the max pooling layer, and the cascaded residual blocks of the skeleton network)
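Since Table 1 is only reproduced as an image, the following PyTorch sketch illustrates the kind of skeleton network described in steps 1011 and 1012 (initial convolution, max pooling, cascaded residual blocks); the exact layer counts, channel widths, and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))

class SkeletonNetwork(nn.Module):
    """Feature extraction: convolution -> max pooling -> cascaded residual blocks."""
    def __init__(self, in_channels=3, width=64, num_blocks=4):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(width),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(num_blocks)])

    def forward(self, x):
        return self.blocks(self.stem(x))

image = torch.randn(1, 3, 238, 184)        # H=238, W=184, 3 channels as in the description
feature_map = SkeletonNetwork()(image)
print(feature_map.shape)                   # torch.Size([1, 64, 60, 46])
```

With a 238 × 184 × 3 input, as used elsewhere in the description, this sketch produces a feature map at roughly one quarter of the input resolution.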
In step 102, based on the vehicle information image samples of the plurality of vehicle information types, a vehicle information common template corresponding to the plurality of vehicle information types is generated, and a calibration point set of the vehicle information common template is obtained.
In some embodiments, the calibration point set includes a plurality of calibration points corresponding to each vehicle information type. In step 102, a vehicle information common template corresponding to a plurality of vehicle information types is generated based on the vehicle information image samples of the plurality of vehicle information types, which can be achieved by the following technical solution: extracting the vehicle information from each vehicle information image sample; performing size adjustment processing on the plurality of extracted vehicle information to obtain vehicle information of the same size; and merging the plurality of vehicle information of the same size to obtain a vehicle information common template corresponding to the plurality of vehicle information types.
As an example, referring to fig. 9, fig. 9 is a schematic diagram of the common template of the vehicle information image processing method provided in the embodiment of the present application. Fig. 2A shows a vehicle information image whose vehicle information type is a straight arrow, fig. 2B a U-turn arrow, and fig. 2C a three-way arrow. The vehicle information in the images of figs. 2A, 2B, and 2C is extracted, and the extracted straight arrow, U-turn arrow, and three-way arrow are normalized in size so that the three extracted markings have the same size and the same direction. The straight arrow, U-turn arrow, and three-way arrow of the same size and direction are then merged to obtain the common template shown in fig. 9, which contains calibration points 1 to 29. In addition to the vehicle information shown in figs. 2A, 2B, and 2C, the embodiments of the present application can also merge other vehicle information types, such as a right-turn arrow, a left-turn arrow, a straight-plus-right-turn arrow, a straight-plus-U-turn arrow, a left-turn-plus-U-turn arrow, and a left-turn-plus-right-turn arrow, to obtain the vehicle information common template.
As an example, the calibration point set of the vehicle information common template includes a plurality of calibration points corresponding to each vehicle information type; the set consists of calibration points 1 to 29, where the straight arrow corresponds to calibration points 1 to 5 and 28 to 29, the U-turn arrow corresponds to calibration points 21 to 29, and the three-way arrow corresponds to calibration points 1 to 16, 20, and 28 to 29.
In some embodiments, in addition to road surface arrows, the vehicle information may also be road signs, for example a speed-limit-40 sign or a no-left-turn sign. When the embodiment of the present application is implemented, all vehicle information types may be merged to obtain a vehicle information common template covering all vehicle information types, or only some of the vehicle information types may be merged, for example only the road-surface-arrow types; the source vehicle information types of the merging process determine which vehicle information types can subsequently undergo calibration point detection, as shown in the sketch below.
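A minimal sketch of assembling a common template and its per-type calibration point indices from already extracted and resized vehicle information; the type names and coordinates are hypothetical and do not reproduce the 29-point layout of fig. 9.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

def build_common_template(
    per_type_points: Dict[str, List[Tuple[int, Point]]]
) -> Tuple[Dict[int, Point], Dict[str, List[int]]]:
    """Merge per-type calibration points (already extracted from samples and
    resized to a common size and orientation) into one common template.
    Returns the merged point set and, per type, the indices it uses."""
    template: Dict[int, Point] = {}
    type_to_indices: Dict[str, List[int]] = {}
    for vi_type, points in per_type_points.items():
        type_to_indices[vi_type] = []
        for idx, xy in points:
            template.setdefault(idx, xy)       # shared corners keep one canonical position
            type_to_indices[vi_type].append(idx)
    return template, type_to_indices

# Hypothetical example: two types sharing calibration points 28 and 29.
template, type_to_indices = build_common_template({
    "straight_arrow": [(1, (50, 10)), (2, (70, 40)), (28, (45, 230)), (29, (75, 230))],
    "u_turn_arrow":   [(21, (20, 60)), (22, (20, 120)), (28, (45, 230)), (29, (75, 230))],
})
print(len(template), type_to_indices["straight_arrow"])
```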
In step 103, performing a calibration point mask regression process on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image, and performing a calibration point position regression process on the feature map to obtain a prediction coordinate corresponding to each calibration point in the vehicle information image.
In some embodiments, referring to fig. 6C, fig. 6C is a schematic flowchart of the artificial-intelligence-based vehicle information image processing method provided in the embodiments of the present application. In step 103, calibration point mask regression processing is performed on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image, which may be implemented through steps 1031 to 1033.
In step 1031, the feature map of the vehicle information image is subjected to mask feature extraction processing multiple times by the first feature extraction layer to obtain the mask regression features of the feature map.
In step 1032, the mask regression features of the feature map are subjected to average pooling processing by the first pooling layer to obtain the average mask pooling feature of the feature map.
In step 1033, a first full-link process is performed on the average mask pooled feature of the feature map through the first full-link layer to obtain a prediction mask for each index point.
As an example, the calibration point mask regression processing is performed by a mask regression network that includes a first pooling layer, a first fully connected layer, and a first feature extraction layer. The feature map of the vehicle information image is subjected to mask feature extraction processing multiple times by the first feature extraction layer to obtain the mask regression features of the feature map, where each feature extraction step performs convolution feature extraction, normalization, and activation. The mask regression features are then subjected to average pooling by the first pooling layer to obtain the average mask pooling feature of the feature map, and the average mask pooling feature is subjected to first fully connected processing by the first fully connected layer to obtain a prediction mask for each calibration point. With the vehicle information common template of the above example there are 29 calibration points, so the prediction mask is a 1 × 29 tensor whose element values lie between 0 and 1; the larger the value of the prediction mask, the higher the probability that the corresponding calibration point does not belong to the vehicle information type of the target vehicle information, or the higher the probability that the corresponding calibration point is occluded.
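A sketch of a mask regression head consistent with this description (repeated convolution, normalization and activation, average pooling, then a fully connected layer producing one value per calibration point); the channel width and number of convolution blocks are assumptions.

```python
import torch
import torch.nn as nn

class MaskRegressionHead(nn.Module):
    """Calibration point mask branch: conv feature extraction -> average pooling ->
    fully connected layer -> one value in (0, 1) per calibration point."""
    def __init__(self, in_channels=64, num_points=29):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)           # average pooling over the feature map
        self.fc = nn.Linear(in_channels, num_points)  # first fully connected layer

    def forward(self, feature_map):
        x = self.pool(self.extract(feature_map)).flatten(1)
        return torch.sigmoid(self.fc(x))              # prediction mask, shape (batch, 29)
```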
As an example, the mask regression network is trained based on a cross-entropy loss function: the predicted probability that each calibration point (each calibration point in the vehicle information common template) in a vehicle information image sample belongs to the invalid calibration points is determined through the mask regression network, and the predicted probability and the pre-labeled category are substituted into the cross-entropy loss function to determine the parameters of the mask regression network at which the cross-entropy loss function reaches its minimum. A calibration point is labeled as invalid when it satisfies at least one of the following conditions: the calibration point does not belong to the vehicle information type corresponding to the vehicle information image sample, the calibration point is occluded, or the calibration point is worn.
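A minimal illustration of such a training step, assuming binary labels where 1 marks an invalid calibration point; the tensors are random stand-ins for network outputs and annotations.

```python
import torch
import torch.nn.functional as F

pred_mask = torch.rand(8, 29, requires_grad=True)   # stand-in for mask branch outputs in (0, 1)
labels = torch.randint(0, 2, (8, 29)).float()       # stand-in annotations: 1 = invalid point
loss = F.binary_cross_entropy(pred_mask, labels)    # cross-entropy over calibration points
loss.backward()
```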
In some embodiments, the calibration point position regression processing is performed through a coordinate regression network that includes a second pooling layer, a second fully connected layer, and a second feature extraction layer. In step 103, calibration point position regression processing is performed on the feature map to obtain the predicted coordinate corresponding to each calibration point in the vehicle information image, which can be achieved by the following technical solution: performing position feature extraction processing on the feature map of the vehicle information image multiple times through the second feature extraction layer to obtain the position regression features of the feature map; performing average pooling processing on the position regression features of the feature map through the second pooling layer to obtain the average position pooling feature of the feature map; and performing second fully connected processing on the average position pooling feature of the feature map through the second fully connected layer to obtain a predicted coordinate for each calibration point.
As an example, the calibration point position regression processing is performed by a coordinate regression network that includes a second pooling layer, a second fully connected layer, and a second feature extraction layer. The feature map of the vehicle information image is subjected to position feature extraction processing multiple times by the second feature extraction layer to obtain the position regression features of the feature map, where each feature extraction step performs convolution feature extraction, normalization, and activation. The position regression features are then subjected to average pooling by the second pooling layer to obtain the average position pooling feature of the feature map, and the average position pooling feature is subjected to second fully connected processing by the second fully connected layer to obtain a predicted coordinate for each calibration point. With the vehicle information common template of the above example there are 29 calibration points, so the predicted coordinates form a 1 × 58 tensor, each calibration point corresponding to 2 dimensions: the abscissa dimension and the ordinate dimension.
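The coordinate regression head can be sketched analogously; the channel width and block count are again assumptions, while the 29 × 2 = 58-dimensional output follows the description.

```python
import torch
import torch.nn as nn

class CoordinateRegressionHead(nn.Module):
    """Calibration point position branch: conv feature extraction -> average pooling ->
    fully connected layer -> (x, y) for each of the 29 calibration points."""
    def __init__(self, in_channels=64, num_points=29):
        super().__init__()
        self.extract = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(in_channels, num_points * 2)   # 29 points x 2 coordinates = 58

    def forward(self, feature_map):
        x = self.pool(self.extract(feature_map)).flatten(1)
        coords = self.fc(x)                                 # shape (batch, 58)
        return coords.view(coords.size(0), -1, 2)           # shape (batch, 29, 2)
```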
As an example, the training of the coordinate regression network is also performed based on a cross entropy loss function, where the predicted coordinates of each calibration point (each calibration point in the public train letter template) in the train letter image sample are determined through the coordinate regression network, and the predicted coordinates and the pre-labeled coordinates are substituted into the cross entropy loss function to determine the parameters of the coordinate regression network when the cross entropy loss function takes the minimum value.
In some embodiments, referring to fig. 6D, fig. 6D is a schematic flowchart of an artificial intelligence-based vehicle-communication image processing method provided in an embodiment of the present application, where the calibration-point location regression process is performed through M regression networks and a post-coordinate regression network, where M is an integer greater than or equal to 1; the regression processing of the position of the calibration point is performed on the feature map in step 103 to obtain the predicted coordinates corresponding to each calibration point in the vehicle-communication image, which can be realized through steps 1034 and 1035.
In step 1034, the pre-regression processing is performed on the car letter image through the M regression networks to obtain a pre-regression processing result of the car letter image.
In some embodiments, step 1034 of performing pre-regression processing on the vehicle information image through M regression networks to obtain the pre-regression processing result of the vehicle information image may be implemented by the following technical solution: when the value of M is 1, the feature map is pre-regressed through the single regression network to obtain the pre-regression processing result of the vehicle information image; when the value of M is greater than 1, regression processing is performed on the input of the mth regression network through the mth regression network among the M regression networks, and the mth regression processing result output by the mth regression network is transmitted to the (m+1)th regression network to continue the regression processing, obtaining the corresponding (m+1)th regression processing result; wherein m is an integer variable whose value increases from 1 and satisfies 1 ≤ m ≤ M - 1; when m is 1, the input of the mth regression network is the feature map; when 2 ≤ m ≤ M - 1, the input of the mth regression network is the (m-1)th regression processing result output by the (m-1)th regression network; and when m takes the value M - 1, the output of the (m+1)th regression network is the pre-regression processing result.
As an example, referring to fig. 5, the processing performed by the key point regression network shown in the first stage in fig. 5, the heat map generation processing, and the stacking processing all belong to the pre-regression processing and are performed by any one of the M regression networks. For example, when M is 1, as shown in fig. 5, the feature map is pre-regressed by the single regression network, and the pre-regression processing result of the vehicle information image serves as the input of the key point regression network in the second stage, which is the post coordinate regression network; when M is greater than 1, the feature map is input into the multiple cascaded regression networks, and the output of the last regression network is used as the pre-regression processing result, i.e., as the input of the key point regression network in the second stage.
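A minimal sketch of this cascade wiring is given below, assuming each pre-regression stage is a callable that maps its input tensor (the feature map for the first stage, the previous stage's stacked output otherwise) to its own stacked output; the function and argument names are illustrative.

```python
def run_pre_regression(feature_map, regression_stages):
    """Chain the M pre-regression stages described above.

    `regression_stages` is a list of callables; the first stage consumes the
    feature map, each later stage consumes the previous stage's output, and
    the last stage's output is the pre-regression result fed to the post
    coordinate regression network."""
    x = feature_map
    for stage in regression_stages:
        x = stage(x)
    return x
```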
In some embodiments, the mth regression network includes an mth coordinate regression network and an mth heat map stacking network, and the regression processing performed on the input of the mth regression network through the mth regression network of the M regression networks can be implemented by the following technical solution: performing mth coordinate regression processing on the input of the mth regression network through the mth coordinate regression network to obtain an mth predicted coordinate for each calibration point; selecting a plurality of valid calibration points from the plurality of calibration points based on the prediction mask of each calibration point; determining an mth heat map corresponding to the vehicle information image based on the plurality of valid calibration points; and performing up-sampling processing on the feature map, and stacking the up-sampling processing result of the feature map with the mth heat map to obtain an mth regression processing result.
As an example, referring to fig. 5, the processing performed by the key point regression network shown in the first stage in fig. 5, the heat map generation processing, and the stacking processing all belong to the pre-regression processing and are performed by any one of the M regression networks. Taking the value of M as 3 and the value of m as 2 for description: the 2nd coordinate regression processing is performed on the input of the 2nd regression network through the 2nd coordinate regression network to obtain the 2nd predicted coordinate for each calibration point, and a plurality of valid calibration points are selected from the plurality of calibration points based on the prediction mask of each calibration point. When the prediction mask for calibration point 1 is greater than the mask threshold, calibration point 1 is characterized as masked; when the prediction mask for calibration point 1 is not greater than the mask threshold, the prediction mask is assigned zero and calibration point 1 is not masked, i.e., it is retained as a valid calibration point. The 2nd heat map corresponding to the vehicle information image is then determined based on the valid calibration points, up-sampling processing is performed on the feature map, and the up-sampling result of the feature map is stacked with the 2nd heat map to obtain the 2nd regression processing result.
In some embodiments, the selection of the plurality of valid calibration points from the plurality of calibration points based on the prediction mask of each calibration point may be implemented by the following technical solution: the following processing is performed for each calibration point: when the prediction mask of the corresponding calibration point is smaller than the mask threshold, the corresponding calibration point is determined as a valid calibration point. The determination of the mth heat map corresponding to the vehicle information image based on the valid calibration points may be implemented by the following technical solution: the mth predicted coordinate of each valid calibration point is acquired, and the following processing is performed for each image coordinate of the vehicle information image: determining the distance between the image coordinate and each mth predicted coordinate; obtaining the minimum distance among the plurality of distances; when the minimum distance is greater than the distance threshold, determining the heat value of the image coordinate as one; when the minimum distance is not greater than the distance threshold, determining a calculated value negatively correlated with the minimum distance as the heat value of the image coordinate; and generating the mth heat map corresponding to the vehicle information image based on the heat value of each image coordinate.
By way of example, the heat map is generated by placing a bounded distance field at the calibration point locations, see equation (2):

$$
H(x, y) =
\begin{cases}
1, & D(x, y) > c \\
f\bigl(D(x, y)\bigr), & D(x, y) \le c
\end{cases},
\qquad
D(x, y) = \min_{l \in \mathrm{Landmarks}} \bigl\lVert (x, y) - l \bigr\rVert_2
\tag{2}
$$

where (x, y) are the spatial coordinates of the heat map, H(x, y) is the heat value at that coordinate, Landmarks is the set of calibration points retained after the first stage of the calibration point branch is screened by the mask, l is a calibration point in that set, D(x, y) is the distance from (x, y) to the nearest retained calibration point, f(·) is a value negatively correlated with that distance, and c is a distance threshold that is a constant, can be adjusted according to the resolution, and may be 5; the distance field is bounded by c.
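The following is a small NumPy sketch of this bounded distance field, assuming the 238 × 184 example resolution. Since the exact negatively correlated form is not recoverable from the text, the sketch uses 1 - D/c inside the threshold purely as one possible choice; the function name and array layout are assumptions.

```python
import numpy as np

def bounded_distance_heatmap(valid_points, height=238, width=184, c=5.0):
    """Place a bounded distance field at the valid calibration points.

    valid_points: iterable of (x, y) predicted coordinates kept by the mask.
    Following the description above: coordinates whose minimum distance to a
    valid point exceeds the threshold c get heat value 1; otherwise the heat
    value decreases with the distance (1 - D/c here, one possible choice)."""
    ys, xs = np.mgrid[0:height, 0:width]
    if len(valid_points) == 0:
        return np.ones((height, width), dtype=np.float32)
    pts = np.asarray(valid_points, dtype=np.float32)            # (K, 2) as (x, y)
    # distance from every pixel to every valid calibration point
    d = np.sqrt((xs[..., None] - pts[:, 0]) ** 2 + (ys[..., None] - pts[:, 1]) ** 2)
    dmin = d.min(axis=-1)                                        # nearest-point distance
    heat = np.where(dmin > c, 1.0, 1.0 - dmin / c)               # piecewise, as in the text
    return heat.astype(np.float32)
```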
In step 1035, post-coordinate regression processing is performed on the pre-regression processing result through the post-coordinate regression network to obtain the predicted coordinates for each calibration point.
As an example, the post coordinate regression processing is implemented by a post coordinate regression network that includes a third feature extraction layer, a third pooling layer, and a third fully connected layer. The third feature extraction layer performs position feature extraction processing on the pre-regression processing result a plurality of times (each pass consisting of convolution feature extraction, normalization, and activation) to obtain position regression features of the pre-regression processing result; the third pooling layer performs average pooling on these position regression features to obtain average position pooling features of the pre-regression processing result; and the third fully connected layer performs third fully connected processing on the average position pooling features to obtain a predicted coordinate for each calibration point. If the vehicle information common template of the above example is adopted, there are 29 calibration points and the predicted coordinates form a 1 × 58 tensor, with each calibration point occupying 2 dimensions: an abscissa dimension and an ordinate dimension.
As an example, the post coordinate regression network and the M regression networks are in a cascade relationship, and the M regression networks are likewise cascaded with one another. The post coordinate regression network and the M regression networks are trained as a whole, and this training is also based on a cross entropy loss function: the predicted coordinates of each calibration point (each calibration point in the vehicle information common template) in the vehicle information image sample are determined through the coordinate regression networks, and the predicted coordinates and the pre-labeled coordinates are substituted into the cross entropy loss function to determine the parameters of the coordinate regression networks at which the cross entropy loss function takes its minimum value.
In some embodiments, the calibration point position regression processing is performed by a heat map regression network that includes N cascaded down-sampling layers, N cascaded up-sampling layers, and a convolutional layer, where N is an integer greater than or equal to 2; the calibration point position regression processing performed on the feature map in step 103 to obtain the predicted coordinates corresponding to each calibration point in the vehicle information image can be implemented by the following technical solution: performing down-sampling processing on the feature map through the N cascaded down-sampling layers to obtain a down-sampling processing result; performing convolution processing on the down-sampling processing result through the convolutional layer to obtain heat map features of the feature map; performing up-sampling processing on the heat map features through the N cascaded up-sampling layers to obtain a predicted heat map corresponding to the feature map; and determining the predicted coordinates for each calibration point based on the predicted heat map.
As an example, referring to fig. 11, fig. 11 is a schematic diagram of the heat map regression network provided in this embodiment of the present application. The heat map regression network may adopt a U-shaped network structure composed of N down-sampling networks, N up-sampling networks, and a convolutional network (convolutional layer), where N is an integer greater than or equal to 2. Each down-sampling network is a residual network, and each up-sampling network is also a residual network. The heat map features of the feature map are output after passing through the plurality of down-sampling networks and the convolutional network, and the heat map features are up-sampled by the N cascaded up-sampling layers to obtain the predicted heat map corresponding to the feature map; the predicted coordinates for each calibration point are then determined based on the predicted heat map, i.e., the locations of the calibration points are obtained indirectly by calculating the locations of the maximum values of the heat map.
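A minimal PyTorch sketch of such a U-shaped heat map regression network is given below, assuming N = 2 stages, simple residual blocks, and one output heat map per calibration point; the block design, channel counts, and class names are illustrative assumptions, and the input spatial size is assumed divisible by 2^N.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))

class HeatmapRegressionNet(nn.Module):
    """U-shaped sketch: N cascaded down-sampling residual stages, a middle
    convolution producing heat map features, then N cascaded up-sampling
    residual stages and a 1x1 head giving one heat map per calibration point."""
    def __init__(self, in_channels=256, num_points=29, n_stages=2):
        super().__init__()
        self.down = nn.ModuleList(
            [nn.Sequential(ResidualBlock(in_channels), nn.MaxPool2d(2)) for _ in range(n_stages)]
        )
        self.mid = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.up = nn.ModuleList(
            [nn.Sequential(nn.Upsample(scale_factor=2), ResidualBlock(in_channels)) for _ in range(n_stages)]
        )
        self.head = nn.Conv2d(in_channels, num_points, 1)

    def forward(self, feature_map):
        x = feature_map
        for stage in self.down:
            x = stage(x)
        x = self.mid(x)
        for stage in self.up:
            x = stage(x)
        return self.head(x)          # (batch, 29, H, W) predicted heat maps
```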
In some embodiments, the determination of the predicted coordinates for each calibration point based on the predicted heat map may be implemented by the following technical solution: the following processing is performed for each calibration point: acquiring the heat values of a plurality of image coordinates corresponding to the calibration point in the predicted heat map; sorting the plurality of image coordinates of the predicted heat map in descending order based on the heat values; and taking the image coordinate ranked first as the predicted coordinate of the calibration point.
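A short sketch of this ranking step, assuming one predicted heat map per calibration point in which larger heat values indicate the point location; the function name is illustrative.

```python
import torch

def coords_from_heatmaps(heatmaps):
    """heatmaps: (num_points, H, W) predicted heat maps.
    For each calibration point, rank image coordinates by heat value in
    descending order and keep the first one as the predicted coordinate."""
    num_points, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(num_points, -1).argmax(dim=1)   # top-ranked coordinate
    ys = torch.div(flat_idx, w, rounding_mode="floor")
    xs = flat_idx % w
    return torch.stack([xs, ys], dim=1)                         # (num_points, 2) as (x, y)
```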
In step 104, a plurality of target calibration points corresponding to the type of the target vehicle information are determined from the calibration point set based on the prediction mask, and the position of the target vehicle information in the vehicle information image is determined based on the predicted coordinates of each target calibration point.
In some embodiments, when the prediction mask of a calibration point is less than the mask threshold, the corresponding calibration point is determined to be a target calibration point; and the position of the target vehicle information in the vehicle information image is determined based on the predicted coordinates of each target calibration point.
As an example, when the prediction mask of a calibration point is less than the mask threshold, the corresponding calibration point is determined to be a valid calibration point; for example, when the prediction mask for calibration point 1 is 0 (less than the mask threshold), calibration point 1 is determined to be a valid calibration point, which indicates that the point does not need to be masked. The position of the target vehicle information in the vehicle information image is then determined based on the predicted coordinates of each target calibration point; for example, if the valid calibration points are calibration points 1 to 5 and calibration points 28 to 29 in fig. 9, the position of the entire straight arrow in the vehicle information image can be determined based on the predicted coordinates of calibration points 1 to 5 and calibration points 28 to 29.
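A minimal sketch of this selection step is shown below. How the retained coordinates are aggregated into a position is not fixed by the text, so the axis-aligned bounding box used here is only one possible choice; the function name and array layout are assumptions.

```python
import numpy as np

def locate_target(pred_mask, pred_coords, mask_threshold=0.5):
    """pred_mask: (29,) mask scores; pred_coords: (29, 2) as (x, y), both NumPy arrays.
    Keep calibration points whose mask is below the threshold and summarise
    their predicted coordinates as an axis-aligned bounding box."""
    keep = pred_mask < mask_threshold
    pts = pred_coords[keep]
    if len(pts) == 0:
        return None
    x_min, y_min = pts.min(axis=0)
    x_max, y_max = pts.max(axis=0)
    return (x_min, y_min, x_max, y_max)
```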
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
In some embodiments, in an electronic map, the data of various signs and marking lines is very important for the driving navigation experience of a user, so information changes of various road elements in the real world need to be reflected in the electronic map accurately and efficiently. Referring to fig. 7, fig. 7 is a three-dimensional reconstruction diagram of the vehicle information image processing method provided in the embodiment of the present application: a ground traffic image of the real world is reconstructed into a three-dimensional effect diagram by a three-dimensional reconstruction method, and information such as buildings and vehicle information in the real-world ground traffic image is retained in the three-dimensional effect diagram; for example, the first vehicle information 701 in the real-world ground traffic image is retained as the second vehicle information 702 in the three-dimensional effect diagram. The production chain of map data is long and includes operations such as data acquisition, off-line identification, and data updating: the raw data acquired every day in major cities across the country is processed off-line, the element data master database of each city map is updated, and the updates are then pushed to clients such as the map App. The off-line processing needs to detect and identify the content and position of various vehicle information, and also needs to perform differential comparison with historical data to find change information of road traffic. Referring to fig. 8, fig. 8 is a vehicle information comparison diagram of the vehicle information image processing method provided in the embodiment of the present application: compared with the image acquired in July 2020, in the image acquired in May 2020 the speed limit value (vehicle information) of the speed limit sign at the same geographic position has changed and the vehicle restriction sign has disappeared, and the third vehicle information 801 in the image acquired in July 2020 is replaced by the fourth vehicle information 802. Such element changes are very important for the navigation experience of the electronic map and need to be processed in time; computer vision technology can greatly improve the degree of automation of the data processing, while also imposing higher requirements on the precision and efficiency of the vision algorithm.
In order to realize the differential comparison of data, a three-dimensional reconstruction mode can be adopted. The first step is to perform three-dimensional reconstruction on historical data; the second step is to use a positioning technology to calculate the pose of a newly acquired image in the three-dimensional map, so as to judge, through the projection relationship, whether road elements in the three-dimensional map have changed. In the second step, calibration point detection can help the corresponding map elements and master-database elements perform an accurate comparison of spatial positions, which improves the accuracy of the differential comparison. In the process of using the terminal, the user may be a road information collector: the terminal collects vehicle information image samples and sends them to the server, so that the server trains the vehicle information calibration point detection model based on a plurality of loss functions, and the trained vehicle information calibration point detection model is integrated in the server. When the terminal receives a vehicle information image shot by the user, the terminal sends the vehicle information image to the server, and the server determines the position of the target vehicle information in the vehicle information image through the vehicle information calibration point detection model and sends the position to the terminal, so that the terminal can execute map task image processing based on the vehicle information image.
The embodiment of the application provides a vehicle information common template and a vehicle information calibration point detection model. Under the definition of the vehicle information common template, the key points of all types of vehicle information can be detected uniformly through the vehicle information calibration point detection model, and the mask branch in the model effectively solves the problem of missing key points caused by breakage, occlusion, blurring, and the like of ground arrows in actual application scenarios, providing accurate key point information for downstream tasks.
In some embodiments, the present application provides a general template (the vehicle information common template) covering multiple types of vehicle information; referring to fig. 9, it contains a total of 29 key points (calibration points). The ground key point detection task is to give the positions of the calibration points of the target vehicle information in the input vehicle information image, and to mask out the key points that are missing due to blurring, breakage, or occlusion.
Given the vehicle information common template, if face detection technology were applied directly to vehicle information detection, only the predicted positions of the 29 fixed calibration points could be regressed, and corresponding position prediction results could not be output for the various types of vehicle information, unless type recognition were first performed on the vehicle information and regression processing were carried out through a separate detection network for each type to obtain the corresponding calibration point positions, which would lead to the disadvantages of a complex model structure and a large amount of calculation.
In some embodiments, the vehicle information calibration point detection model adopts a two-stage, coarse-to-fine key point regression approach. The input of the model is a vehicle information image; for example, the spatial size of the ground arrow image (vehicle information image) is adjusted to a fixed size, such as a height of 238 pixels and a width of 184 pixels. After the feature map is obtained through the feature extraction network serving as the skeleton network, the feature map is fed into a key point branch (coordinate regression network) and a mask branch (mask regression network) respectively. The key point branch is divided into two stages: the first stage is a coarse regression of the key points and the second stage is a fine regression of the key points, and the final output of this branch is the coordinates of the corresponding key points on the common template. The mask branch is used to regress the mask, and the output mask is used to select which key point coordinates (the output of the key point branch) should be retained and output.
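The following sketch wires these pieces together for a single image, assuming the backbone, mask branch, two key point stages, and heat map generator are available as callables (for instance, the sketches elsewhere in this section); names, shapes, and the 238 × 184 size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def detect_calibration_points(image, backbone, mask_branch, stage1, stage2,
                              heatmap_fn, out_size=(238, 184), mask_threshold=0.5):
    """Illustrative wiring of the two-branch, two-stage design described above."""
    feature_map = backbone(image)                       # skeleton-network feature map
    mask = mask_branch(feature_map)                     # (1, 29) mask scores in [0, 1]
    coarse = stage1(feature_map)                        # (1, 58) coarse coordinates
    points = coarse.view(-1, 29, 2)[0]
    # keep the key points whose mask score is below the threshold
    valid = [tuple(p.tolist()) for i, p in enumerate(points) if mask[0, i] < mask_threshold]
    heat = torch.as_tensor(heatmap_fn(valid), dtype=torch.float32)
    heat = heat.view(1, 1, out_size[0], out_size[1])
    up = F.interpolate(feature_map, size=out_size, mode="bilinear", align_corners=False)
    stacked = torch.cat([up, heat], dim=1)              # stack heat map with up-sampled features
    fine = stage2(stacked)                              # (1, 58) refined coordinates
    return mask, fine
```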
In some embodiments, the input of the mask branch is the feature map obtained by the feature extraction network serving as the skeleton network. The feature map is passed through the mask regression network to obtain a 1 × 29-dimensional mask tensor, and a mapping process is applied to each element: elements greater than 0.5 are set to 1 and the other elements are set to 0, yielding the final mask tensor.
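A one-function sketch of this mapping step, assuming the mask regression network outputs a (1, 29) tensor of scores in [0, 1]; the function name is illustrative.

```python
import torch

def binarize_mask(mask_scores, threshold=0.5):
    """mask_scores: (1, 29) output of the mask regression network.
    Elements greater than the threshold are set to 1, the rest to 0."""
    return (mask_scores > threshold).to(mask_scores.dtype)
```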
In some embodiments, the first-stage input of the key point branch is likewise the feature map obtained through the feature extraction network. The feature map passes through the first-stage key point regression network to obtain the first-stage 1 × 58-dimensional key point coordinate tensor, and a key point set is extracted using the mask (the set composed of the key points whose mask value is 0). A heat map with a height of 238 pixels and a width of 184 pixels is then generated from this set by placing a bounded distance field at the key point positions, as shown in formula (3):
$$
H(x, y) =
\begin{cases}
1, & D(x, y) > c \\
f\bigl(D(x, y)\bigr), & D(x, y) \le c
\end{cases},
\qquad
D(x, y) = \min_{l \in \mathrm{Landmarks}} \bigl\lVert (x, y) - l \bigr\rVert_2
\tag{3}
$$

where (x, y) are the spatial coordinates of the heat map, H(x, y) is the heat value at that coordinate, Landmarks is the key point set retained after the first stage of the key point branch is screened by the mask, l is a key point in that set, D(x, y) is the distance from (x, y) to the nearest retained key point, f(·) is a value negatively correlated with that distance, and c is a distance threshold that is a constant, can be adjusted according to the resolution, and may be 5; the distance field is bounded by c.
In some embodiments, the heat map and the up-sampled feature map are stacked together as the input of the second stage of the key point branch. The up-sampled feature map has a height of 238 pixels and a width of 184 pixels, and the second-stage regression network yields a 1 × 58-dimensional key point coordinate tensor; there are 58 dimensions in total because there are 29 key points and each key point has two coordinates, one on the horizontal axis and one on the vertical axis.
In some embodiments, if the value of the mask is 1 (indicating that the key point does not belong to the vehicle information type in question, or that the key point is blurred, occluded, or damaged), the corresponding key point coordinate needs to be set to (0, 0). Referring to fig. 10, fig. 10 is a schematic diagram of the relationship between the mask and the coordinates in the vehicle information image processing method provided by the embodiment of the present application; the correspondence between the mask tensor and the key point tensor is shown in fig. 10, where every two elements in the key point tensor represent one key point coordinate and each element of the mask branch output corresponds to one key point. The corrected key point tensor and the mask tensor are the final outputs of the vehicle information calibration point detection model.
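A small sketch of this correction step, assuming a (1, 58) key point tensor and a (1, 29) binary mask tensor; the function name is illustrative.

```python
import torch

def apply_mask_to_coords(coords, mask):
    """coords: (1, 58) key point tensor; mask: (1, 29) binary mask tensor.
    Wherever the mask is 1 (point absent, blurred, occluded or damaged),
    the corresponding (x, y) pair is set to (0, 0)."""
    pts = coords.view(-1, 29, 2).clone()
    pts[mask.bool()] = 0.0
    return pts.view(-1, 58)
```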
With the artificial intelligence-based vehicle information image processing method of the embodiment of the application, the prediction mask and the predicted coordinates are combined so that the method is suitable for position detection of multiple vehicle information types. Introducing the prediction mask effectively identifies calibration points that are occluded or damaged, which improves the accuracy of position detection and provides accurate vehicle information key point position data for downstream tasks; based on this key point position data, the point matching relationship of associated arrows can be provided in three-dimensional scene reconstruction, and in the map service the score given by the mask branch of the vehicle information calibration point detection model can be combined with the type of the vehicle information to judge whether the vehicle information belongs to a high-position information type.
Although the vehicle information calibration point detection model provided by the embodiment of the application adopts a cascaded, coarse-to-fine mode, the key points may also be predicted with a single stage or with other structures; furthermore, instead of directly regressing the coordinate points, a key point heat map may be regressed and the key point positions obtained indirectly by calculating the locations of the maximum values of the heat map.
Continuing with the exemplary structure in which the artificial intelligence-based vehicle information image processing apparatus 255 provided by the embodiments of the present application is implemented as software modules, in some embodiments, as shown in fig. 4, the software modules of the artificial intelligence-based vehicle information image processing apparatus 255 stored in the memory 250 may include: a feature module 2551, configured to perform feature extraction processing on the vehicle information image including the target vehicle information to obtain a feature map of the vehicle information image; a template module 2552, configured to generate a vehicle information common template corresponding to a plurality of vehicle information types based on vehicle information image samples of the plurality of vehicle information types, and to obtain a calibration point set of the vehicle information common template, wherein the calibration point set includes a plurality of calibration points corresponding to each vehicle information type; a regression module 2553, configured to perform calibration point mask regression processing on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image, and to perform calibration point position regression processing on the feature map to obtain a predicted coordinate corresponding to each calibration point in the vehicle information image; and a location module 2554, configured to determine, based on the prediction mask, a plurality of target calibration points corresponding to the vehicle information type of the target vehicle information from the calibration point set, and to determine, based on the predicted coordinates of each target calibration point, the position of the target vehicle information in the vehicle information image.
In some embodiments, the template module 2552 is further configured to: extract the vehicle information from each vehicle information image sample; perform size adjustment processing on the plurality of pieces of vehicle information to obtain vehicle information of the same size; and combine the plurality of pieces of vehicle information of the same size to obtain the vehicle information common template corresponding to the plurality of vehicle information types.
In some embodiments, the feature module 2551 is further configured to: extract the convolution features of the vehicle information image, and perform maximum pooling processing on the convolution features of the vehicle information image to obtain the pooling features of the vehicle information image; and perform residual iteration processing on the pooling features of the vehicle information image a plurality of times to obtain a residual iteration processing result of the vehicle information image, and take the residual iteration processing result as the feature map of the vehicle information image.
In some embodiments, the calibration point mask regression processing is performed through a mask regression network, and the mask regression network includes a first pooling layer, a first fully connected layer, and a first feature extraction layer; the regression module 2553 is further configured to: perform mask feature extraction processing on the feature map of the vehicle information image a plurality of times through the first feature extraction layer to obtain mask regression features of the feature map; perform average pooling on the mask regression features of the feature map through the first pooling layer to obtain average mask pooling features of the feature map; and perform first fully connected processing on the average mask pooling features of the feature map through the first fully connected layer to obtain the prediction mask for each calibration point.
In some embodiments, the calibration point position regression processing is performed through a coordinate regression network, and the coordinate regression network includes a second pooling layer, a second fully connected layer, and a second feature extraction layer; the regression module 2553 is further configured to: perform position feature extraction processing on the feature map of the vehicle information image a plurality of times through the second feature extraction layer to obtain position regression features of the feature map; perform average pooling on the position regression features of the feature map through the second pooling layer to obtain average position pooling features of the feature map; and perform second fully connected processing on the average position pooling features of the feature map through the second fully connected layer to obtain the predicted coordinate for each calibration point.
In some embodiments, the calibration point position regression processing is performed through M regression networks and a post coordinate regression network, where M is an integer greater than or equal to 1; the regression module 2553 is further configured to: perform pre-regression processing on the vehicle information image through the M regression networks to obtain a pre-regression processing result of the vehicle information image; and perform post coordinate regression processing on the pre-regression processing result through the post coordinate regression network to obtain the predicted coordinate for each calibration point.
In some embodiments, the regression module 2553 is further configured to: when the value of M is 1, perform pre-regression processing on the feature map through the single regression network to obtain the pre-regression processing result of the vehicle information image; when the value of M is greater than 1, perform regression processing on the input of the mth regression network among the M regression networks through the mth regression network, and transmit the mth regression processing result output by the mth regression network to the (m + 1)th regression network to continue the regression processing, so as to obtain a corresponding (m + 1)th regression processing result; wherein m is an integer variable whose value increases from 1 over the range 1 ≤ m ≤ M - 1; when m is 1, the input of the mth regression network is the feature map; when 2 ≤ m ≤ M - 1, the input of the mth regression network is the (m - 1)th regression processing result output by the (m - 1)th regression network; and when the value of m is M - 1, the output of the (m + 1)th regression network is the pre-regression processing result.
In some embodiments, the mth regression network includes an mth coordinate regression network and an mth heat map stacking network, and the regression module 2553 is further configured to: perform mth coordinate regression processing on the input of the mth regression network through the mth coordinate regression network to obtain an mth predicted coordinate for each calibration point; select a plurality of valid calibration points from the plurality of calibration points based on the prediction mask of each calibration point; determine an mth heat map corresponding to the vehicle information image based on the plurality of valid calibration points; and perform up-sampling processing on the feature map, and stack the up-sampling processing result of the feature map with the mth heat map to obtain the mth regression processing result.
In some embodiments, the regression module 2553 is further configured to: perform the following processing for each calibration point: when the prediction mask of the corresponding calibration point is smaller than the mask threshold, determine the corresponding calibration point as a valid calibration point; acquire the mth predicted coordinate of each valid calibration point, and perform the following processing for each image coordinate of the vehicle information image: determine the distance between the image coordinate and each mth predicted coordinate; obtain the minimum distance among the plurality of distances; when the minimum distance is greater than the distance threshold, determine the heat value of the image coordinate as one; when the minimum distance is not greater than the distance threshold, determine a calculated value negatively correlated with the minimum distance as the heat value of the image coordinate; and generate the mth heat map corresponding to the vehicle information image based on the heat value of each image coordinate.
In some embodiments, the calibration point position regression processing is performed by a heat map regression network that includes N cascaded down-sampling layers, N cascaded up-sampling layers, and a convolutional layer, where N is an integer greater than or equal to 2; the regression module 2553 is further configured to: perform down-sampling processing on the feature map through the N cascaded down-sampling layers to obtain a down-sampling processing result; perform convolution processing on the down-sampling processing result through the convolutional layer to obtain heat map features of the feature map; perform up-sampling processing on the heat map features through the N cascaded up-sampling layers to obtain a predicted heat map corresponding to the feature map; and determine the predicted coordinates for each calibration point based on the predicted heat map.
In some embodiments, the regression module 2553 is further configured to: perform the following processing for each calibration point: acquire the heat values of a plurality of image coordinates corresponding to the calibration point in the predicted heat map; sort the plurality of image coordinates of the predicted heat map in descending order based on the heat values; and take the image coordinate ranked first as the predicted coordinate of the calibration point.
Embodiments of the present application provide a computer program product or computer program including computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the artificial intelligence-based vehicle information image processing method described in the embodiments of the application.
The embodiment of the application provides a computer-readable storage medium storing executable instructions, and when the executable instructions are executed by a processor, the processor is caused to execute the artificial intelligence-based vehicle information image processing method provided by the embodiments of the application.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, for example in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiments of the application the calibration point set of the vehicle information common template corresponding to a plurality of vehicle information types is obtained, calibration point mask regression processing and calibration point position regression processing are performed on the feature map of the vehicle information image to obtain the prediction mask and the predicted coordinate corresponding to each calibration point in the vehicle information image, and the prediction mask and the predicted coordinates are combined so that the method is suitable for position detection of the plurality of vehicle information types.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. An artificial intelligence-based vehicle information image processing method is characterized by comprising the following steps:
generating a vehicle information common template corresponding to a plurality of vehicle information types based on vehicle information image samples of the plurality of vehicle information types, and acquiring a calibration point set of the vehicle information common template;
wherein the set of calibration points comprises a plurality of calibration points corresponding to each of the vehicle information types;
carrying out feature extraction processing on a vehicle information image including target vehicle information to obtain a feature map of the vehicle information image;
performing calibration point mask regression processing on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image;
performing pre-regression processing on the vehicle information image through M regression networks to obtain a pre-regression processing result of the vehicle information image, wherein M is an integer greater than or equal to 1;
carrying out post coordinate regression processing on the pre-regression processing result through a post coordinate regression network to obtain a predicted coordinate for each calibration point;
and determining a plurality of target calibration points corresponding to the vehicle information type of the target vehicle information from the calibration point set based on the prediction mask, and determining the position of the target vehicle information in the vehicle information image based on the predicted coordinate of each target calibration point.
2. The method according to claim 1, wherein generating the vehicle information common template corresponding to the plurality of vehicle information types based on the vehicle information image samples of the plurality of vehicle information types comprises:
extracting the vehicle information of each vehicle information image sample;
carrying out size adjustment processing on the plurality of pieces of vehicle information to obtain vehicle information of the same size;
and combining the plurality of pieces of vehicle information of the same size to obtain the vehicle information common template corresponding to the plurality of vehicle information types.
3. The method according to claim 1, wherein the performing a feature extraction process on the vehicle information image including the target vehicle information to obtain a feature map of the vehicle information image comprises:
extracting the convolution features of the vehicle information image, and performing maximum pooling processing on the convolution features of the vehicle information image to obtain the pooling features of the vehicle information image;
and carrying out residual iteration processing on the pooling features of the vehicle information image a plurality of times to obtain a residual iteration processing result of the vehicle information image, and taking the residual iteration processing result as the feature map of the vehicle information image.
4. The method according to claim 1, wherein the calibration point mask regression processing is performed through a mask regression network, the mask regression network includes a first pooling layer, a first fully connected layer, and a first feature extraction layer, and the performing the calibration point mask regression processing on the feature map to obtain the prediction mask corresponding to each calibration point in the vehicle information image includes:
performing mask feature extraction processing on the feature map of the vehicle information image a plurality of times through the first feature extraction layer to obtain mask regression features of the feature map;
performing average pooling processing on the mask regression features of the feature map through the first pooling layer to obtain average mask pooling features of the feature map;
and performing first full-connection processing on the average mask pooling features of the feature map through the first full-connection layer to obtain the prediction mask for each calibration point.
5. The method according to claim 1, wherein the pre-regression processing of the vehicle information image through the M regression networks to obtain the pre-regression processing result of the vehicle information image comprises:
when the value of M is 1, performing pre-regression processing on the feature map through the regression network to obtain a pre-regression processing result of the vehicle-information image;
when the value of M is greater than 1, performing regression processing on the input of the mth regression network through the mth regression network among the M regression networks, and transmitting the mth regression processing result output by the mth regression network to the (m + 1)th regression network to continue the regression processing, so as to obtain a corresponding (m + 1)th regression processing result; wherein m is an integer variable whose value increases from 1 over the range 1 ≤ m ≤ M - 1;
when m is 1, the input of the mth regression network is the feature map; when 2 ≤ m ≤ M - 1, the input of the mth regression network is the (m - 1)th regression processing result output by the (m - 1)th regression network;
and when the value of m is M - 1, the output of the (m + 1)th regression network is the pre-regression processing result.
6. The method of claim 5, wherein the mth regression network comprises an mth coordinate regression network and an mth heat map stack network, and wherein performing regression processing on the input of the mth regression network through the mth regression network of the M regression networks comprises:
performing mth coordinate regression processing on the input of the mth regression network through the mth coordinate regression network to obtain an mth predicted coordinate for each calibration point;
selecting a plurality of valid calibration points from the plurality of calibration points based on the prediction mask for each of the calibration points;
determining an mth heat map corresponding to the vehicle information image based on the plurality of valid calibration points;
and performing up-sampling processing on the feature map, and stacking the up-sampling processing result of the feature map and the mth heat map to obtain the mth regression processing result.
7. The method of claim 6,
selecting the plurality of valid calibration points from the plurality of calibration points based on the prediction mask for each of the calibration points, comprising:
performing the following processing for each of the calibration points: when the prediction mask corresponding to the calibration point is smaller than a mask threshold value, determining the corresponding calibration point as a valid calibration point;
the determining an mth heat map corresponding to the vehicle information image based on the plurality of valid calibration points comprises:
acquiring the mth predicted coordinate of each valid calibration point, and performing the following processing for each image coordinate of the vehicle information image:
determining a distance between the image coordinates and each of the m-th predicted coordinates;
obtaining a minimum distance of a plurality of said distances;
when the minimum distance is larger than a distance threshold value, determining the heat value of the image coordinate as one;
determining a calculated value negatively correlated to the minimum distance as a heat value of the image coordinate when the minimum distance is not greater than the distance threshold;
and generating an mth heat map corresponding to the vehicle information image based on the heat value of each image coordinate.
8. An artificial intelligence-based vehicle information image processing device, characterized by comprising:
the characteristic module is used for carrying out characteristic extraction processing on the vehicle information image comprising the target vehicle information to obtain a characteristic diagram of the vehicle information image;
the template module is used for generating a vehicle information common template corresponding to a plurality of vehicle information types based on vehicle information image samples of the plurality of vehicle information types and acquiring a calibration point set of the vehicle information common template;
wherein the set of calibration points comprises a plurality of calibration points corresponding to each of the vehicle information types;
the regression module is used for performing calibration point mask regression processing on the feature map to obtain a prediction mask corresponding to each calibration point in the vehicle information image; performing pre-regression processing on the vehicle information image through M regression networks to obtain a pre-regression processing result of the vehicle information image, wherein M is an integer greater than or equal to 1; carrying out post coordinate regression processing on the pre-regression processing result through a post coordinate regression network to obtain a predicted coordinate for each calibration point;
and the position module is used for determining a plurality of target calibration points corresponding to the vehicle information type of the target vehicle information from the calibration point set based on the prediction mask, and determining the position of the target vehicle information in the vehicle information image based on the predicted coordinate of each target calibration point.
9. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the artificial intelligence-based vehicle information image processing method of any one of claims 1 to 7 when executing the executable instructions stored in the memory.
10. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the artificial intelligence-based vehicle information image processing method of any one of claims 1 to 7.
CN202111108599.7A 2021-09-22 2021-09-22 Vehicle information image processing method and device based on artificial intelligence and electronic equipment Active CN113963060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111108599.7A CN113963060B (en) 2021-09-22 2021-09-22 Vehicle information image processing method and device based on artificial intelligence and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111108599.7A CN113963060B (en) 2021-09-22 2021-09-22 Vehicle information image processing method and device based on artificial intelligence and electronic equipment

Publications (2)

Publication Number Publication Date
CN113963060A CN113963060A (en) 2022-01-21
CN113963060B true CN113963060B (en) 2022-03-18

Family

ID=79462335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111108599.7A Active CN113963060B (en) 2021-09-22 2021-09-22 Vehicle information image processing method and device based on artificial intelligence and electronic equipment

Country Status (1)

Country Link
CN (1) CN113963060B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116340807B (en) * 2023-01-10 2024-02-13 中国人民解放军国防科技大学 Broadband Spectrum Signal Detection and Classification Network


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3671542A1 (en) * 2018-12-18 2020-06-24 Visteon Global Technologies, Inc. Method for multilane detection using a convolutional neural network
US10410352B1 (en) * 2019-01-25 2019-09-10 StradVision, Inc. Learning method and learning device for improving segmentation performance to be used for detecting events including pedestrian event, vehicle event, falling event and fallen event using edge loss and test method and test device using the same

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101608924A (en) * 2009-05-20 2009-12-23 电子科技大学 A kind of method for detecting lane lines based on gray scale estimation and cascade Hough transform
CN108052908A (en) * 2017-12-15 2018-05-18 郑州日产汽车有限公司 Track keeping method
CN108805018A (en) * 2018-04-27 2018-11-13 淘然视界(杭州)科技有限公司 Road signs detection recognition method, electronic equipment, storage medium and system
CN109190662A (en) * 2018-07-26 2019-01-11 北京纵目安驰智能科技有限公司 A kind of three-dimensional vehicle detection method, system, terminal and storage medium returned based on key point
CN109657593A (en) * 2018-12-12 2019-04-19 深圳职业技术学院 A kind of trackside information fusion method and system
CN110188705A (en) * 2019-06-02 2019-08-30 东北石油大学 A kind of remote road traffic sign detection recognition methods suitable for onboard system
CN110619279A (en) * 2019-08-22 2019-12-27 天津大学 Road traffic sign instance segmentation method based on tracking
CN110532961A (en) * 2019-08-30 2019-12-03 西安交通大学 A kind of semantic traffic lights detection method based on multiple dimensioned attention mechanism network model
CN110728720A (en) * 2019-10-21 2020-01-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for camera calibration
CN111079563A (en) * 2019-11-27 2020-04-28 北京三快在线科技有限公司 Traffic signal lamp identification method and device, electronic equipment and storage medium
CN113326836A (en) * 2020-02-28 2021-08-31 深圳市丰驰顺行信息技术有限公司 License plate recognition method and device, server and storage medium
CN112001235A (en) * 2020-07-13 2020-11-27 浙江大华汽车技术有限公司 Vehicle traffic information generation method and device and computer equipment
CN111986125A (en) * 2020-07-16 2020-11-24 浙江工业大学 Method for multi-target task instance segmentation
CN112052843A (en) * 2020-10-14 2020-12-08 福建天晴在线互动科技有限公司 Method for detecting key points of human face from coarse to fine

Also Published As

Publication number Publication date
CN113963060A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN112307978B (en) Target detection method and device, electronic equipment and readable storage medium
JPWO2007083494A1 (en) Graphic recognition apparatus, graphic recognition method, and graphic recognition program
CN111931683B (en) Image recognition method, device and computer readable storage medium
CN112712138A (en) Image processing method, device, equipment and storage medium
CN112733672A (en) Monocular camera-based three-dimensional target detection method and device and computer equipment
CN113963060B (en) Vehicle information image processing method and device based on artificial intelligence and electronic equipment
CN113033497A (en) Lane line recognition method, device, equipment and computer-readable storage medium
CN115273032A (en) Traffic sign recognition method, apparatus, device and medium
CN112183542A (en) Text image-based recognition method, device, equipment and medium
CN114802261A (en) Parking control method, obstacle recognition model training method and device
CN111210411B (en) Method for detecting vanishing points in image, method for training detection model and electronic equipment
CN115689946B (en) Image restoration method, electronic device and computer program product
CN116843901A (en) Medical image segmentation model training method and medical image segmentation method
CN115760886B (en) Land parcel dividing method and device based on unmanned aerial vehicle aerial view and related equipment
CN116543143A (en) Training method of target detection model, target detection method and device
CN112529116B (en) Scene element fusion processing method, device and equipment and computer storage medium
CN114067120B (en) Augmented reality-based navigation paving method, device and computer readable medium
CN115661444A (en) Image processing method, device, equipment, storage medium and product
CN109711363B (en) Vehicle positioning method, device, equipment and storage medium
CN114757819A (en) Structure-guided style deviation correction type style migration method and system
CN117693768A (en) Semantic segmentation model optimization method and device
CN111968145A (en) Box type structure identification method and device, electronic equipment and storage medium
CN111340050B (en) Map road full-factor feature extraction method and system
CN114596698B (en) Road monitoring equipment position judging method, device, storage medium and equipment
CN112434591B (en) Lane line determination method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant