CN114445690A

CN114445690A - License plate detection method, model training method, device, medium, and program product

Info

Publication number: CN114445690A
Application number: CN202210113830.XA
Authority: CN
Inventors: 张丽; 杜悦艺; 孙亚生
Original assignee: Baidu Online Network Technology Beijing Co Ltd
Current assignee: Baidu Online Network Technology Beijing Co Ltd
Priority date: 2022-01-30
Filing date: 2022-01-30
Publication date: 2022-05-06

Abstract

The disclosure provides a license plate detection method, a model training method, a device, a medium, and a program product, and relates to the field of computer technologies, in particular to the field of deep learning. The specific implementation scheme is as follows: acquiring N feature maps of an image to be recognized based on a multi-head attention mechanism, wherein the N feature maps are different in size, and N is an integer greater than 1; performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map, wherein M is an integer less than or equal to N; and detecting license plate information in the image to be recognized based on the fused semantic feature map. The disclosure can improve the accuracy of license plate detection.

Description

License plate detection method, model training method, device, medium, and program product

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a license plate detection method, a model training method, a device, a medium, and a program product.

Background

As the number of vehicles increases, license plate detection becomes increasingly important. Current license plate detection technology mainly detects license plates using methods such as edge detection, color segmentation, and wavelet transformation.

Disclosure of Invention

The present disclosure provides a license plate detection method, a model training method, a device, a medium, and a program product.

According to an aspect of the present disclosure, there is provided a license plate detection method, including:

acquiring N feature maps of an image to be recognized based on a multi-head attention mechanism, wherein the N feature maps are different in size, and N is an integer greater than 1;

performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map, wherein M is an integer less than or equal to N;

and detecting license plate information in the image to be recognized based on the fused semantic feature map.

According to another aspect of the present disclosure, there is provided a license plate detection model training method, including:

acquiring a training sample image and label information of the training sample image;

performing a prediction operation on the training sample image through a model to be trained to obtain a prediction result, wherein the prediction operation comprises the following steps: acquiring N feature maps of the training sample image based on a multi-head attention mechanism, wherein the N feature maps are different in size; performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map; detecting license plate information in the training sample image based on the fused semantic feature map; N is an integer greater than 1, and M is an integer less than or equal to N;

and adjusting parameters of the model to be trained based on the prediction result and the label information to obtain a license plate detection model.

According to another aspect of the present disclosure, there is provided a license plate detecting device including:

the acquisition module is used for acquiring N feature maps of the image to be recognized based on a multi-head attention mechanism, wherein the N feature maps are different in size, and N is an integer greater than 1;

the recognition module is used for performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map, wherein M is an integer less than or equal to N;

and the detection module is used for detecting the license plate information in the image to be recognized based on the fused semantic feature map.

According to another aspect of the present disclosure, there is provided a license plate detection model training device, including:

the acquisition module is used for acquiring a training sample image and label information of the training sample image;

the prediction module is used for executing a prediction operation on the training sample image through the model to be trained to obtain a prediction result, and the prediction operation comprises the following steps: acquiring N feature maps of the training sample image based on a multi-head attention mechanism, wherein the N feature maps are different in size; performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map; detecting license plate information in the training sample image based on the fused semantic feature map; N is an integer greater than 1, and M is an integer less than or equal to N;

and the adjusting module is used for adjusting the parameters of the model to be trained based on the prediction result and the label information to obtain a license plate detection model.

According to another aspect of the present disclosure, there is provided an electronic device including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a license plate detection method or a license plate detection model training method provided by the present disclosure.

According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute a license plate detection method or a license plate detection model training method provided by the present disclosure.

According to another aspect of the present disclosure, a computer program product is provided, which comprises a computer program, which when executed by a processor implements the license plate detection method or the license plate detection model training method provided by the present disclosure.

According to the license plate detection method and device, the N feature maps of the image to be recognized are obtained based on the multi-head attention mechanism, the semantic feature maps of the M feature maps in the N feature maps are fused, and then the license plate information in the image to be recognized is detected based on the fused semantic feature maps, so that the accuracy of license plate detection can be improved.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a flow chart of a license plate detection method provided by the present disclosure;

FIG. 2 is a schematic diagram of a Swin Transformer unit provided by the present disclosure;

FIG. 3 is a schematic diagram of a license plate detection provided by the present disclosure;

FIG. 4 is a flowchart of a license plate detection model training method provided by the present disclosure;

FIG. 5 is a schematic illustration of one type of data pre-processing provided by the present disclosure;

FIG. 6 is a schematic illustration of model training provided by the present disclosure;

FIG. 7 is a block diagram of a license plate detection device provided by the present disclosure;

FIG. 8 is a block diagram of a license plate detection model training device provided by the present disclosure;

FIG. 9 is a block diagram of an electronic device for implementing the license plate detection method or the license plate detection model training method of an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Referring to FIG. 1, FIG. 1 is a flowchart of a license plate detection method provided by the present disclosure. As shown in FIG. 1, the method includes the following steps:

Step S101, acquiring N feature maps of an image to be recognized based on a multi-head attention mechanism, wherein the N feature maps are different in size, and N is an integer greater than 1.

The image to be recognized may be a currently detected image or may be a previously acquired image. In addition, the image to be recognized may be a dynamic image in a video or a still picture.

The acquiring of N feature maps of the image to be recognized based on the multi-head attention mechanism may be: extracting one feature map of the image to be recognized based on the multi-head attention mechanism, and then extracting a further feature map based on the multi-head attention mechanism on the basis of that feature map, until N feature maps of different sizes are obtained.

Step S102, performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map, wherein M is an integer less than or equal to N.

The semantic recognition of M feature maps in the N feature maps may be performed on every one of the N feature maps, or on only some of the N feature maps. For example, when N is equal to 4, semantic recognition may be performed on 3 or on all 4 of the 4 feature maps.

Step S103, detecting license plate information in the image to be recognized based on the fused semantic feature map.

The detecting of the license plate information in the image to be recognized based on the fused semantic feature map may be detecting a license plate position in the image to be recognized based on the fused semantic feature map, for example, detecting the coordinate information of the upper left corner, the lower left corner, the upper right corner, and the lower right corner of the license plate. In some embodiments, the license plate text information in the image to be recognized may also be detected, which is not limited herein.

According to the method, the N feature maps of the image to be recognized can be obtained based on the multi-head attention mechanism, the semantic feature maps of the M feature maps in the N feature maps are fused, and the license plate information in the image to be recognized is detected based on the fused semantic feature maps, so that the accuracy of license plate detection can be improved.

The above method in the present disclosure is performed by an electronic device, for example: monitoring equipment, mobile phones, computers, servers and other electronic equipment.
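The data flow of steps S101 to S103 can be sketched as three pluggable stages. The following is a minimal illustration only, not the disclosed model: `toy_backbone`, `avg_pool2`, and `detect_plate` are hypothetical names, the backbone is a plain 2x2 average pooling stand-in rather than a multi-head attention network, and the neck and head are placeholders supplied by the caller.

```python
from typing import Callable, List

FeatureMap = List[List[float]]  # a 2-D grid of values; real maps also carry channels


def avg_pool2(fm: FeatureMap) -> FeatureMap:
    """Halve height and width by averaging each 2x2 block."""
    return [
        [
            (fm[r][c] + fm[r][c + 1] + fm[r + 1][c] + fm[r + 1][c + 1]) / 4.0
            for c in range(0, len(fm[0]) - 1, 2)
        ]
        for r in range(0, len(fm) - 1, 2)
    ]


def toy_backbone(image: FeatureMap, n: int = 4) -> List[FeatureMap]:
    """Stand-in for S101: return n feature maps of decreasing size."""
    maps, current = [], image
    for _ in range(n):
        current = avg_pool2(current)
        maps.append(current)
    return maps


def detect_plate(
    image: FeatureMap,
    backbone: Callable[[FeatureMap], List[FeatureMap]],
    neck: Callable[[List[FeatureMap]], FeatureMap],
    head: Callable[[FeatureMap], object],
    m: int = 3,
):
    """Run S101 -> S102 -> S103 with caller-supplied stages."""
    feature_maps = backbone(image)   # S101: N feature maps of different sizes
    selected = feature_maps[-m:]     # choose M <= N of them (here: the last M)
    fused = neck(selected)           # S102: semantic recognition and fusion
    return head(fused)               # S103: license plate information


image = [[float(r + c) for c in range(32)] for r in range(32)]
result = detect_plate(
    image,
    backbone=toy_backbone,
    neck=lambda maps: maps[0],                # placeholder for S102
    head=lambda fm: (len(fm), len(fm[0])),    # placeholder for S103
)
```

Choosing the last M maps is an arbitrary choice made for this sketch; the disclosure only requires that M be less than or equal to N.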

As an optional implementation, the acquiring N feature maps of the image to be recognized based on the multi-head attention mechanism includes:

based on a window multi-head self-attention (W-MSA) mechanism and a shift-window multi-head self-attention (SW-MSA) mechanism, N feature maps of the image to be recognized are obtained.

The above-mentioned obtaining N feature maps of the image to be recognized based on the W-MSA mechanism and the SW-MSA mechanism may be that a partial feature map is obtained based on the W-MSA mechanism, and another feature map is obtained based on the SW-MSA mechanism on the basis of the partial feature map.

In this embodiment, the N feature maps of the image to be recognized are obtained based on the W-MSA mechanism and the SW-MSA mechanism, so that the fixed window of the W-MSA mechanism and the moving window of the SW-MSA mechanism are both utilized. Local modeling can thus be realized, giving a good local understanding of the image to be recognized, and global modeling can also be realized, giving a good overall understanding of the image, which improves the accuracy of license plate detection. In addition, the fixed window of the W-MSA mechanism and the moving window of the SW-MSA mechanism reduce the computation amount of the multi-head attention mechanism.
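The difference between the fixed window and the moving window can be illustrated on a small grid of basic units. The sketch below assumes the moving window is realized as a cyclic shift of the grid followed by the same fixed partition, which is how the published Swin Transformer implements it; the disclosure itself only states that the window is moved. The function names are hypothetical, and the masking that real implementations apply to wrapped-around windows is omitted.

```python
from typing import List

Grid = List[List[int]]


def window_partition(grid: Grid, window: int) -> List[List[int]]:
    """Split an H x W grid into non-overlapping window x window tiles.

    Each returned list holds the basic units of one window in row-major order.
    """
    height, width = len(grid), len(grid[0])
    assert height % window == 0 and width % window == 0, "pad the grid first"
    windows = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            windows.append(
                [
                    grid[r][c]
                    for r in range(top, top + window)
                    for c in range(left, left + window)
                ]
            )
    return windows


def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` units.

    Partitioning the rolled grid is equivalent to moving the window
    layout down and right by `shift` units on the original grid.
    """
    height, width = len(grid), len(grid[0])
    return [
        [grid[(r + shift) % height][(c + shift) % width] for c in range(width)]
        for r in range(height)
    ]


# A 4x4 grid whose entries are just unit identifiers 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
fixed_windows = window_partition(grid, window=2)
shifted_windows = window_partition(cyclic_shift(grid, shift=1), window=2)
```

In this example the first fixed window groups units 0, 1, 4, 5, while the first shifted window groups units 5, 6, 9, 10, one from each of the four fixed windows, so information can cross the fixed window boundaries.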

Optionally, the obtaining N feature maps of the image to be recognized based on the W-MSA mechanism and the SW-MSA mechanism includes:

acquiring a first feature map of the image to be recognized based on the W-MSA mechanism, acquiring a second feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the first feature map, acquiring a third feature map of the image to be recognized based on the W-MSA mechanism on the basis of the second feature map, and acquiring a fourth feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the third feature map.

The obtaining of the first feature map of the image to be recognized based on the W-MSA mechanism may be: taking the image to be recognized as an input, performing dot products between the basic units in a first window based on the W-MSA mechanism, and performing weighted summation to obtain a feature result of each basic unit, so as to obtain the first feature map.

The obtaining of the second feature map of the image to be recognized based on the SW-MSA mechanism may be: taking the first feature map as an input, performing dot products between the basic units in a window obtained by moving the first window based on the SW-MSA mechanism, and performing weighted summation to obtain a feature result of each basic unit, so as to obtain the second feature map.

The obtaining of the third feature map of the image to be recognized based on the W-MSA mechanism may be: taking the second feature map as an input, performing dot products between the basic units in a second window based on the W-MSA mechanism, and performing weighted summation to obtain a feature result of each basic unit, so as to obtain the third feature map. The sizes of the first window and the second window may be different.

The obtaining of the fourth feature map of the image to be recognized based on the SW-MSA mechanism may be: taking the third feature map as an input, performing dot products between the basic units in a window obtained by moving the second window based on the SW-MSA mechanism, and performing weighted summation to obtain a feature result of each basic unit, so as to obtain the fourth feature map.

In this embodiment, the W-MSA mechanism and the SW-MSA mechanism are applied alternately, so that the fixed window and the moving window can together provide a good overall understanding of the image, making license plate detection more accurate. In addition, outputting 4 feature maps of different sizes yields simple and effective hierarchical visual features. The 4 feature maps of different sizes can effectively identify license plate targets of different sizes: for example, the last feature map can identify large license plate targets, and the earlier feature maps can identify small license plate targets.

In some embodiments, the number of feature maps is not limited to 4. For example, in some embodiments or scenarios an even number of feature maps such as 2, 6, or more may be acquired.
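The disclosure does not state the sizes of the feature maps. As a purely illustrative calculation, the sketch below assumes the downsampling schedule of the published Swin Transformer (the first stage reduces resolution by a factor of 4 and each later stage halves it again); `stage_sizes`, its parameters, and the input size of 224 are hypothetical and chosen only because they divide evenly.

```python
from typing import List


def stage_sizes(input_size: int, first_stride: int = 4, stages: int = 4) -> List[int]:
    """Side length of the feature map after each stage (assumed schedule)."""
    sizes = []
    stride = first_stride
    for _ in range(stages):
        sizes.append(input_size // stride)
        stride *= 2  # every later stage halves the resolution again
    return sizes


sizes = stage_sizes(224)
```

Under this assumption the earlier, larger maps keep fine spatial detail suited to small license plates, and the last, smallest map summarizes large regions suited to large license plates.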

Optionally, a vector of a first target basic unit in the first feature map is calculated based on vectors of all other units in a first window in which the first target basic unit is located, where the first target basic unit is any basic unit in the first feature map, and the first window is a fixed window corresponding to the W-MSA mechanism;

the vector of a second target basic unit in the second feature map is calculated based on vectors of all other units in a second window where the second target basic unit is located, the second target basic unit is any basic unit in the second feature map, and the second window is a window obtained by moving the first window and corresponds to an SW-MSA mechanism;

a vector of a third target basic unit in the third feature map is calculated based on vectors of all other units in a third window in which the third target basic unit is located, the third target basic unit is any basic unit in the third feature map, and the third window is a fixed window corresponding to the W-MSA mechanism;

the vector of a fourth target basic unit in the fourth feature map is calculated based on vectors of all other units in a fourth window in which the fourth target basic unit is located, the fourth target basic unit is any basic unit in the fourth feature map, and the fourth window is a window obtained by moving the third window and corresponds to the SW-MSA mechanism.

In this embodiment, each basic unit (a basic unit may also be understood as a numerical value) in the feature map and all other basic units (or all other numerical values) in the window may be subjected to dot product to obtain the similarity between each basic unit and all other units, and then weighted summation is performed based on the similarity vector and the vector matrix of each basic unit to obtain the final result of each basic unit.
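The dot product and weighted summation described above can be sketched for a single window as follows. This is a minimal illustration under stated assumptions: scaling the scores by the square root of the vector length and turning them into weights with a softmax follow standard scaled dot-product attention and are not spelled out in the disclosure, each unit also attends to itself as in standard self-attention, and `window_attention` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Convert similarity scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(
    queries: List[Vector], keys: List[Vector], values: List[Vector]
) -> List[Vector]:
    """For every unit in a window, weight the value vectors by similarity."""
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Dot product of this unit with every unit in the window.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors gives the unit's result.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


units = [[1.0, 0.0], [0.0, 1.0]]  # two basic units in one window
out = window_attention(units, units, units)
```

Because each unit is most similar to itself in this toy input, its own value vector receives the largest weight in its result.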

The window may be moved by uniformly shifting the fixed window downward, rightward, and so on, so that each basic unit in the feature map can communicate and interact in depth with different basic units, which improves the accuracy of license plate detection.

In the embodiment, each unit of the feature map is calculated based on the vectors of other units, so that deep communication and interaction among the basic units of the feature map are enhanced, and the accuracy of license plate detection is improved.

In addition, in this embodiment, the multi-head attention mechanism may use multiple mappings. That is, multiple mapping methods may be applied to each basic unit in the feature map, so that each basic unit is mapped into multiple different vectors that respectively participate in the attention operation, and the results of the multiple attention operations are finally combined. This can further improve the accuracy of license plate detection.
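The multiple mappings and the final combination can be sketched as follows. This is an illustration under assumptions: each head is modeled as its own query, key, and value projection matrix, the head results are combined by concatenation, the random matrices stand in for learned weights, and the output projection used by standard multi-head attention is omitted. All names are hypothetical.

```python
import math
import random
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attend(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention over the units of one window."""
    scale = math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / scale for kj in k]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append(
            [sum(e / total * vj[d] for e, vj in zip(exps, v)) for d in range(len(v[0]))]
        )
    return out


def multi_head_attention(units: Matrix, heads: List[Dict[str, Matrix]]) -> Matrix:
    """Map the units once per head, attend per head, then concatenate."""
    per_head = [
        attend(matmul(units, h["wq"]), matmul(units, h["wk"]), matmul(units, h["wv"]))
        for h in heads
    ]
    return [sum((head[i] for head in per_head), []) for i in range(len(units))]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
units = random_matrix(4, 6, rng)  # 4 basic units in a window, 6 channels each
heads = [
    {name: random_matrix(6, 3, rng) for name in ("wq", "wk", "wv")}
    for _ in range(2)             # 2 heads, 3 channels per head
]
combined = multi_head_attention(units, heads)
```

Each head sees the same units through a different mapping, so the concatenated result has one 3-channel slice per head.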

Optionally, the obtaining a first feature map of the image to be recognized based on the W-MSA mechanism, obtaining a second feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the first feature map, obtaining a third feature map of the image to be recognized based on the W-MSA mechanism on the basis of the second feature map, and obtaining a fourth feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the third feature map includes:

acquiring a first feature map of the image to be recognized based on the W-MSA mechanism through a first Swin Transformer unit in a first Swin Transformer network in a pre-acquired target model, wherein the first Swin Transformer network comprises the first Swin Transformer unit, a second Swin Transformer unit, a third Swin Transformer unit and a fourth Swin Transformer unit, the first Swin Transformer unit and the third Swin Transformer unit comprise a W-MSA layer, and the second Swin Transformer unit and the fourth Swin Transformer unit comprise an SW-MSA layer;

acquiring, by the second Swin Transformer unit, a second feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the first feature map;

acquiring, by the third Swin Transformer unit, a third feature map of the image to be recognized based on the W-MSA mechanism on the basis of the second feature map;

and acquiring, by the fourth Swin Transformer unit, a fourth feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the third feature map.

The target model may be a Swin Transformer model. The Swin Transformer model combines simple visual characteristics with the powerful Transformer architecture, and has strong feature extraction capability, feature expression capability, and model construction capability, so that a better license plate detection effect can be achieved. The target model is not limited to the Swin Transformer model in the present disclosure, and may be another model that includes a first Swin Transformer network and a second Swin Transformer network.

The first, second, third, and fourth Swin Transformer units may be understood as 4 stages of the first Swin Transformer network, where the feature map size of each stage is different, so that targets of different sizes can be better identified.

The first Swin Transformer network is a pre-trained Swin Transformer network used for acquiring feature maps of the image to be recognized.

In the first Swin Transformer network, the Swin Transformer units use W-MSA and SW-MSA. W-MSA is a window multi-head attention mechanism: the Swin Transformer unit divides a feature map into a plurality of small windows in a regular manner, and W-MSA performs the multi-head attention operation within each window, that is, each basic unit in a window performs the multi-head attention operation with the other basic units in that window, which saves computation.

In this embodiment, SW-MSA is a moving-window multi-head attention mechanism. It adopts a calculation strategy in which, after the first Swin Transformer unit performs W-MSA, the fixed window of the first Swin Transformer unit is uniformly moved downward and rightward according to a certain rule, and the multi-head attention mechanism is then performed in the moved window. The multi-head attention operation therefore covers basic units from multiple windows of the previous Swin Transformer unit, while the computation amount remains unchanged. Thus, two successive Swin Transformer units use W-MSA and SW-MSA in turn, so that local modeling is realized and the picture is well understood locally; meanwhile, global modeling is realized and the picture is well understood as a whole. The computation amount of the multi-head attention mechanism is also saved.

In addition, in some embodiments, the multi-head attention mechanism in the Swin Transformer units of the first Swin Transformer network may use the relative position information of the basic units for prediction. This is consistent with the intuition of computer vision: although different license plate targets are at different absolute positions in an image, they all carry the same semantics of a license plate target. The accuracy of license plate detection can thus be further improved.
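One common way to encode relative position inside a window is an index table in which every pair of basic units with the same row and column offset shares one entry. The sketch below follows the indexing scheme of the published Swin Transformer as an assumption; the disclosure does not specify how the relative position information is represented, and `relative_position_index` is a hypothetical name.

```python
from typing import List


def relative_position_index(window: int) -> List[List[int]]:
    """index[i][j] identifies the (row, column) offset from unit j to unit i.

    Offsets range from -(window - 1) to window - 1 on each axis, so there are
    (2 * window - 1) ** 2 distinct entries that a learned bias table could use.
    """
    coords = [(r, c) for r in range(window) for c in range(window)]
    span = 2 * window - 1
    return [
        [(r1 - r2 + window - 1) * span + (c1 - c2 + window - 1) for (r2, c2) in coords]
        for (r1, c1) in coords
    ]


index = relative_position_index(2)
```

Pairs with the same offset get the same index wherever they sit in the window, which is what makes the encoding independent of absolute position.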

In one embodiment, the first and second Swin Transformer units, or alternatively the third and fourth Swin Transformer units, may be as shown in FIG. 2. One Swin Transformer unit (the first or the third Swin Transformer unit) includes layer normalization (LN), W-MSA, residual connections, and a multilayer perceptron (MLP), and the other Swin Transformer unit (the second or the fourth Swin Transformer unit) includes LN, SW-MSA, residual connections, and an MLP.

In this embodiment, before the multi-head attention processing, the input is first normalized by an LN layer and then processed by the multi-head attention module (W-MSA for the first or third Swin Transformer unit, SW-MSA for the second or fourth Swin Transformer unit). A residual connection adds the output of the multi-head attention module to the input of the LN layer. The result is then processed by LN and the MLP, and finally a second residual connection adds the MLP output to the result of the first residual connection, so as to obtain the feature map output by the Swin Transformer unit.
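The order of operations inside one Swin Transformer unit can be sketched with the attention and MLP passed in as functions. This is a structural illustration only: `swin_unit` is a hypothetical name, the layer normalization omits the learned scale and shift, and `mean_mixing` and `relu_mlp` are toy stand-ins rather than W-MSA, SW-MSA, or a trained MLP.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]  # one row per basic unit, one column per channel


def layer_norm(rows: Matrix, eps: float = 1e-5) -> Matrix:
    """Normalize each unit's channels to zero mean and unit variance."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((x - mean) ** 2 for x in row) / len(row)
        out.append([(x - mean) / math.sqrt(var + eps) for x in row])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def swin_unit(
    x: Matrix,
    attention: Callable[[Matrix], Matrix],
    mlp: Callable[[Matrix], Matrix],
) -> Matrix:
    # LN -> W-MSA or SW-MSA -> residual connection with the LN input
    attended = add(x, attention(layer_norm(x)))
    # LN -> MLP -> residual connection with the first residual result
    return add(attended, mlp(layer_norm(attended)))


def mean_mixing(rows: Matrix) -> Matrix:
    """Toy stand-in for attention: every unit receives the window mean."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    return [list(mean) for _ in rows]


def relu_mlp(rows: Matrix) -> Matrix:
    """Toy stand-in for the MLP: element-wise ReLU."""
    return [[max(0.0, v) for v in row] for row in rows]


x = [[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]]
y = swin_unit(x, attention=mean_mixing, mlp=relu_mlp)
```

Because both sub-blocks are wrapped in residual connections, a unit whose attention and MLP output zero leaves its input unchanged, which is a quick sanity check for the wiring.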

Optionally, the target model further includes: a second Swin Transformer network;

the detecting the license plate information in the image to be recognized based on the fused semantic feature map comprises the following steps:

and inputting the fused semantic feature map into the second Swin Transformer network to predict license plate information, so as to obtain license plate information in the image to be recognized.

The second Swin Transformer network is a pre-trained detection head network used for detecting license plate information.

In the embodiment, the Swin Transformer network is used for predicting the license plate information, so that the Swin Transformer network global modeling and attention mechanism can be utilized, and the accuracy of license plate detection is improved.

It should be noted that the present disclosure is not limited to predicting license plate information through the second Swin Transformer network. For example, the license plate information may be predicted based on the fused semantic feature map through a multilayer perceptron or a simple convolutional neural network in the target model, so as to obtain the license plate information in the image to be recognized.

Optionally, the target model further includes: a feature pyramid network (FPN);

the semantic recognition of the M feature maps in the N feature maps to obtain M semantic feature maps, and the fusion of the M semantic feature maps to obtain a fusion semantic feature map includes:

and performing semantic recognition on M feature maps in the N feature maps through the FPN to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map.

In this embodiment, M feature maps of different sizes may be input to the FPN module, so that feature fusion of the high-level, middle-level, and low-level semantic feature maps can be achieved, realizing enhanced feature extraction. Finally, the three feature maps after feature fusion are input into a detection head network, which outputs the confidence and the position of the license plate.
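The fusion of high-level, middle-level, and low-level maps can be illustrated with the top-down pathway commonly used in feature pyramid networks. The sketch below is a simplification under assumptions: each level is exactly half the size of the previous one, upsampling is nearest-neighbor, fusion is element-wise addition, and the lateral and smoothing convolutions of a real FPN are omitted. `upsample2` and `fpn_fuse` are hypothetical names.

```python
from typing import List

FeatureMap = List[List[float]]


def upsample2(fm: FeatureMap) -> FeatureMap:
    """Double height and width by repeating every value (nearest neighbor)."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fpn_fuse(maps: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse maps ordered from largest (low level) to smallest (high level).

    Starting from the smallest map, each fused map is upsampled and added to
    the next larger map, so high-level semantics flow down to every level.
    """
    fused = [maps[-1]]
    for lateral in reversed(maps[:-1]):
        top_down = upsample2(fused[0])
        fused.insert(
            0,
            [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lateral, top_down)],
        )
    return fused


low = [[1.0] * 4 for _ in range(4)]    # low-level map, 4x4
mid = [[10.0] * 2 for _ in range(2)]   # middle-level map, 2x2
high = [[100.0]]                       # high-level map, 1x1
fused = fpn_fuse([low, mid, high])
```

The function returns one fused map per input level, which matches the three fused feature maps that are passed on to the detection head in this embodiment.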

It should be noted that, in the present disclosure, semantic recognition through FPN is not limited, and semantic recognition and fusion may also be performed through other semantic recognition neural networks.

Optionally, the image to be recognized may be preprocessed before the target model performs license plate detection. For example, as shown in FIG. 3, the following steps are included:

inputting original image data;

data preprocessing, which may include unifying the image data size, for example to a size of 840 x 3, and may also include normalization processing;

detecting the license plate through the trained model;

and outputting the confidence of the license plate and the specific position information of the license plate.

Therefore, an original picture to be predicted can undergo data preprocessing and be input into the license plate detection model finally obtained in the model training process. Through inference by the model, the confidence and the specific position (the coordinates of the upper left corner, the lower left corner, the upper right corner, and the lower right corner) of each license plate in the picture are output, so that the license plate detection result is obtained.
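The size unification and normalization steps can be sketched as follows. This is an illustration under assumptions: the image is a nested list of RGB pixels, resizing is nearest-neighbor, and normalization maps channel values from the range 0 to 255 to the range -1 to 1 using a mean and standard deviation of 0.5. The disclosure does not specify the interpolation method or the normalization constants, and the target size is left as a parameter.

```python
from typing import List, Sequence

Pixel = Sequence[float]
Image = List[List[Pixel]]  # rows of RGB pixels


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize to out_h x out_w by picking the nearest source pixel."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(
    image: Image,
    mean: Sequence[float] = (0.5, 0.5, 0.5),
    std: Sequence[float] = (0.5, 0.5, 0.5),
) -> List[List[List[float]]]:
    """Scale channels to [0, 1], then subtract the mean and divide by std."""
    return [
        [[(ch / 255.0 - m) / s for ch, m, s in zip(px, mean, std)] for px in row]
        for row in image
    ]


raw = [
    [(255, 0, 255), (0, 0, 0)],
    [(10, 20, 30), (255, 255, 255)],
]
resized = resize_nearest(raw, out_h=4, out_w=4)  # target size is a free parameter
model_input = normalize(resized)
```

The same two functions would be applied at training time and at inference time so that the model always sees inputs of one size and one value range.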

Referring to FIG. 4, FIG. 4 is a flowchart of a license plate detection model training method provided by the present disclosure. As shown in FIG. 4, the method includes the following steps:

Step S401, acquiring a training sample image and label information of the training sample image.

The training sample image is obtained in advance, for example: a large number of training sample images obtained in advance from a license plate image database.

In one embodiment, the training sample images may include training sample images obtained from a license plate data set (e.g., the CCPD data set) and/or from a manually constructed data set. The license plate data set is a large data set for license plate recognition, and may include, but is not limited to, at least one of the following:

normal license plate pictures in natural scenes that are relatively easy to recognize;

relatively blurred pictures;

challenging pictures that are difficult to recognize;

pictures in which the license plate part is overexposed or too dark;

pictures in which the license plate is relatively far from or very close to the camera;

pictures with a large horizontal tilt;

pictures taken in severe weather such as rain, snow, and heavy fog;

pictures of various green license plates.

In addition, so that the model can recognize license plates of various Chinese vehicle types such as trucks, buses, electric vehicles, and motorcycles, special license plates such as printed license plates, enlarged license plates, and painted license plates, and license plates in various complex natural scenes such as contamination or damage, overexposure, and excessive tilt, license plate data of the corresponding types is also constructed manually. Therefore, the model can detect various license plates (including blue plates, yellow plates, green plates, single-line license plates, double-line license plates, and the like) of automobiles, trucks, buses, motorcycles, and the like, and can also detect printed license plates, enlarged license plates, painted license plates, and license plates in complex natural scenes such as overexposure, excessive tilt, and contamination or damage.

In addition, the label information may be four coordinates of the license plate in each image, namely the upper left corner, the lower left corner, the upper right corner, and the lower right corner. For example, for the manually constructed data set, the images may be manually annotated, and the four corner coordinates of the license plate in each image are then extracted as label information.
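A label made of four corner coordinates can be held in a small record type. The sketch below is one possible representation, not the format used by the disclosure or by any particular data set: `PlateLabel` and its methods are hypothetical, and the axis-aligned bounding box is included only as a convenience that some detection pipelines need.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass(frozen=True)
class PlateLabel:
    """Four corner coordinates of one license plate in an image."""

    top_left: Point
    bottom_left: Point
    top_right: Point
    bottom_right: Point

    def corners(self) -> Tuple[Point, Point, Point, Point]:
        return (self.top_left, self.bottom_left, self.top_right, self.bottom_right)

    def bounding_box(self) -> Tuple[float, float, float, float]:
        """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the plate."""
        xs = [x for x, _ in self.corners()]
        ys = [y for _, y in self.corners()]
        return (min(xs), min(ys), max(xs), max(ys))


# A slightly tilted plate: the corners do not form an axis-aligned rectangle.
label = PlateLabel(
    top_left=(10.0, 20.0),
    bottom_left=(12.0, 60.0),
    top_right=(110.0, 18.0),
    bottom_right=(112.0, 58.0),
)
box = label.bounding_box()
```

Keeping all four corners rather than only a box preserves the tilt of the plate, which matters for the inclined samples listed above.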

In addition, before training, in order to perform batch training (for example, each training step is based on 8 picture samples instead of 1) and to give the model the ability to recognize high-resolution pictures, the sizes of the pictures are first unified to 840 × 3 or another unified size, and a normalization operation is then performed. To enhance the generalization capability of the model, data enhancement operations such as random color change and random brightness change may also be performed. For example, as shown in FIG. 5, the following steps are included before training:

constructing a data set;

merging the license plate data set and the manual construction data set;

reading and marking the data to obtain original data with a label;

and preprocessing the data to obtain training data after data enhancement.
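The preprocessing described above (unify size, normalize, augment, then group into batches of 8) can be sketched as follows. This is an illustrative sketch under assumptions: the function names, the nearest-neighbour resize, the division by 255, and the augmentation ranges are all hypothetical choices, and the target size is left as a parameter rather than fixed.

```python
import numpy as np


def resize_nearest(image: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C image (kept simple on purpose)."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return image[rows[:, None], cols[None, :]]


def normalize(image: np.ndarray) -> np.ndarray:
    """Map 8-bit pixel values to [0, 1]; the exact normalization is an assumption."""
    return image.astype(np.float32) / 255.0


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random per-channel color scaling and random light/dark change."""
    color = rng.uniform(0.9, 1.1, size=(1, 1, image.shape[2]))
    brightness = rng.uniform(0.8, 1.2)
    return np.clip(image * color * brightness, 0.0, 1.0).astype(np.float32)


def make_batches(images, size: int, batch_size: int = 8, seed: int = 0):
    """Resize, normalize, and augment each image, then stack into batches."""
    rng = np.random.default_rng(seed)
    processed = [augment(normalize(resize_nearest(img, size, size)), rng)
                 for img in images]
    return [np.stack(processed[i:i + batch_size])
            for i in range(0, len(processed), batch_size)]
```

Unifying the size first is what makes `np.stack` possible, since a batch must have one common shape.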

Step S402, performing a prediction operation on the training sample image through the model to be trained to obtain a prediction result, wherein the prediction operation comprises: acquiring N feature maps of the training sample image based on a multi-head attention mechanism, wherein the N feature maps are different in size; performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map; and detecting license plate information in the training sample image based on the fused semantic feature map; N is an integer greater than 1, and M is an integer less than or equal to N.

For the above prediction operation, reference may be made to the description of the embodiment shown in fig. 1, which is not repeated herein.

Step S403, adjusting parameters of the model to be trained based on the prediction result and the label information to obtain a license plate detection model.

The adjusting of the parameters of the model to be trained may be updating and optimizing the parameters through multiple iterations until convergence.

In one embodiment, stochastic gradient descent (SGD) may be used for training, and the parameters are updated and optimized through multiple iterations until convergence. For example, as shown in fig. 6, the method comprises the following steps:

inputting data after data enhancement;

constructing a network model;

judging whether convergence occurs;

if not, updating the network parameters through the SGD;

reading the training data and returning to the step of judging whether convergence occurs or not;

if so, training is complete.
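The loop formed by the steps above can be sketched on a toy problem. The sketch below is not the disclosed network: a linear model stands in for the detection model, and the convergence test (mean squared error below `tol`), the learning rate, and the function name `train_sgd` are assumptions chosen only to make the control flow concrete.

```python
import numpy as np


def train_sgd(x, y, lr=0.1, batch_size=8, tol=1e-10, max_iters=10_000, seed=0):
    """Mini-batch SGD on a toy linear model; returns (weights, bias, steps)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])  # "constructing a network model" (toy stand-in)
    b = 0.0
    for step in range(max_iters):
        # "judging whether convergence occurs": assumed criterion, loss below tol
        loss = float(np.mean((x @ w + b - y) ** 2))
        if loss < tol:
            return w, b, step  # "if so, training is complete"
        # "reading the training data": draw one batch of samples
        idx = rng.choice(len(x), size=batch_size, replace=False)
        err = x[idx] @ w + b - y[idx]
        # "updating the network parameters through the SGD"
        w -= lr * 2.0 * (x[idx].T @ err) / batch_size
        b -= lr * 2.0 * float(err.mean())
    return w, b, max_iters
```

In practice the convergence check is usually made on a validation metric or a loss plateau rather than an absolute threshold; the threshold is used here only because the toy data is noise-free.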

In this embodiment, the license plate detection model can be obtained through training, so that the accuracy of license plate detection can be improved when the license plate detection model is used to detect license plates.

Optionally, the license plate detection model includes a first Swin Transformer network, the first Swin Transformer network includes a first Swin Transformer unit, a second Swin Transformer unit, a third Swin Transformer unit and a fourth Swin Transformer unit, the first Swin Transformer unit and the third Swin Transformer unit include a W-MSA layer, and the second Swin Transformer unit and the fourth Swin Transformer unit include an SW-MSA layer;

the first Swin Transformer network is used for acquiring 4 feature maps with different sizes of the training sample images.
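One way a stack of stages can yield feature maps of different sizes is the patch merging step of the publicly described Swin Transformer design, in which each 2 × 2 neighbourhood is concatenated along the channel axis so that height and width are halved. The sketch below assumes that design; the function names are hypothetical, the learned channel projection that normally follows merging is omitted, and zero padding of odd sizes is an assumption.

```python
import numpy as np


def patch_merge(feat: np.ndarray) -> np.ndarray:
    """Halve H and W by concatenating each 2x2 neighbourhood along channels."""
    h, w, _ = feat.shape
    # Zero-pad odd sizes so every 2x2 block is complete (assumed behaviour).
    feat = np.pad(feat, ((0, h % 2), (0, w % 2), (0, 0)))
    parts = [feat[0::2, 0::2], feat[1::2, 0::2],
             feat[0::2, 1::2], feat[1::2, 1::2]]
    return np.concatenate(parts, axis=-1)  # shape (ceil(H/2), ceil(W/2), 4C)


def build_pyramid(feat: np.ndarray, stages: int = 4):
    """Return `stages` feature maps, each half the spatial size of the previous."""
    maps = [feat]
    for _ in range(stages - 1):
        maps.append(patch_merge(maps[-1]))
    return maps
```

With four stages this produces four maps whose spatial sizes differ by a factor of two per stage, matching the "4 feature maps with different sizes" described above.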

Optionally, the license plate detection model further includes: a second Swin Transformer network, which is used for predicting license plate information based on the fused semantic feature map.

Optionally, the license plate detection model further includes: a feature pyramid network (FPN), which is used for performing semantic recognition on the M feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map.
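The FPN step can be sketched with a lateral projection per map followed by a top-down pathway. This is a sketch under stated assumptions, not the disclosed implementation: random weights stand in for learned 1 × 1 projections, nearest-neighbour resizing stands in for upsampling, and the final fusion rule (resize every semantic map to the largest size and average) is an assumed choice, since the text here does not specify how the M semantic maps are fused.

```python
import numpy as np


def resize_nearest(feat: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C feature map."""
    h, w = feat.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return feat[rows[:, None], cols[None, :]]


def fpn_fuse(feature_maps, out_channels: int = 8, seed: int = 0):
    """feature_maps: M maps of shape (H_i, W_i, C_i), ordered large to small."""
    rng = np.random.default_rng(seed)
    # Lateral 1x1 projections to a common channel count.
    # Random weights stand in for learned ones in this sketch.
    laterals = [f @ rng.normal(scale=0.1, size=(f.shape[-1], out_channels))
                for f in feature_maps]
    # Top-down pathway: start at the smallest map, upsample, add to the next.
    semantic = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        h, w = lateral.shape[:2]
        semantic.insert(0, lateral + resize_nearest(semantic[0], h, w))
    # Assumed fusion rule: bring every semantic map to the largest size, average.
    h, w = semantic[0].shape[:2]
    fused = np.mean([resize_nearest(s, h, w) for s in semantic], axis=0)
    return semantic, fused
```

The top-down additions let the high-resolution maps inherit semantics from the coarser maps, which is the usual motivation for an FPN in detection models.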

The license plate detection model may be a target model in the embodiment shown in fig. 1, and specific reference may be made to relevant descriptions of the embodiment shown in fig. 1, which is not described herein again.

It should be noted that the license plate detection model training method may be executed by an electronic device, for example a computer or a server, and the electronic device executing the license plate detection model training method and the electronic device executing the license plate detection method may be different devices or the same device.

In the technical solution of the present disclosure, the acquisition, storage, application, and the like of the personal information of the users involved all comply with the provisions of relevant laws and regulations, and do not violate public order and good customs.

Referring to fig. 7, fig. 7 shows a license plate detection device provided by the present disclosure. As shown in fig. 7, a license plate detection device 700 includes:

an obtaining module 701, configured to obtain N feature maps of an image to be identified based on a multi-head attention mechanism, where the N feature maps are different in size, and N is an integer greater than 1;

an identification module 702, configured to perform semantic identification on M feature maps in the N feature maps to obtain M semantic feature maps, and fuse the M semantic feature maps to obtain a fused semantic feature map, where M is an integer smaller than or equal to N;

a detecting module 703, configured to detect license plate information in the image to be recognized based on the fused semantic feature map.
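The division of the device into three modules can be pictured as a simple composition. The sketch below is hypothetical: the class name, the use of plain callables for modules 701 to 703, and the choice of the last M feature maps are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


@dataclass
class LicensePlateDetectionDevice:
    """Hypothetical composition of the three modules of device 700."""
    obtain: Callable[[Any], List[Any]]        # obtaining module 701
    identify: Callable[[Sequence[Any]], Any]  # identification module 702
    detect: Callable[[Any], Any]              # detecting module 703
    m: int = 3  # M, the number of feature maps passed to module 702

    def run(self, image: Any) -> Any:
        feature_maps = self.obtain(image)
        if len(feature_maps) < 2:
            raise ValueError("N must be an integer greater than 1")
        if not 1 <= self.m <= len(feature_maps):
            raise ValueError("M must be less than or equal to N")
        # Which M of the N maps are used is an assumption (here: the last M).
        fused = self.identify(feature_maps[-self.m:])
        return self.detect(fused)
```

For example, stub modules can be plugged in to check the data flow before real networks are available.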

Optionally, the obtaining module 701 is configured to obtain N feature maps of the image to be recognized based on a window multi-head self-attention layer W-MSA mechanism and a shift window multi-head self-attention layer SW-MSA mechanism.

Optionally, the obtaining module 701 is configured to obtain a first feature map of the image to be recognized based on a W-MSA mechanism, obtain a second feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the first feature map, obtain a third feature map of the image to be recognized based on the W-MSA mechanism on the basis of the second feature map, and obtain a fourth feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the third feature map.

the vector of a fourth target basic unit in the fourth feature map is calculated based on the vectors of all other units in a fourth window in which the fourth target basic unit is located, the fourth target basic unit is any basic unit in the fourth feature map, and the fourth window is a window obtained by moving the third window and corresponds to the SW-MSA mechanism.
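The difference between a fixed window (W-MSA) and a moved window (SW-MSA) can be illustrated with a small numerical sketch. The code below is a simplification under assumptions: attention is single-head with no learned query, key, or value projections, the window is moved by half its size using a cyclic shift, and the attention mask that real SW-MSA implementations apply to wrapped-around regions is omitted. All function names are hypothetical.

```python
import numpy as np


def window_partition(feat: np.ndarray, win: int) -> np.ndarray:
    """Split an H x W x C map into non-overlapping win x win windows."""
    h, w, c = feat.shape
    assert h % win == 0 and w % win == 0, "map size must be a multiple of win"
    x = feat.reshape(h // win, win, w // win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, c)


def shifted_window_partition(feat: np.ndarray, win: int) -> np.ndarray:
    """Move the window grid by win // 2 via a cyclic shift, then partition."""
    shift = win // 2
    return window_partition(np.roll(feat, (-shift, -shift), axis=(0, 1)), win)


def window_self_attention(windows: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention inside each window (Q = K = V = input)."""
    c = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(c)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ windows  # each unit mixes the units in its own window
```

Because each unit only attends to units in its own window, alternating fixed and moved windows is what lets information cross window boundaries from one unit to the next.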

Optionally, the obtaining module 701 is configured to:

acquiring a first feature map of the image to be recognized based on a W-MSA mechanism through a first Swin Transformer unit in a first Swin Transformer network in a pre-acquired target model, wherein the first Swin Transformer network comprises a first Swin Transformer unit, a second Swin Transformer unit, a third Swin Transformer unit and a fourth Swin Transformer unit, the first Swin Transformer unit and the third Swin Transformer unit comprise a W-MSA layer, and the second Swin Transformer unit and the fourth Swin Transformer unit comprise an SW-MSA layer;

Optionally, the target model further comprises: a second Swin Transformer network;

the detection module 703 is configured to input the fused semantic feature map to the second Swin Transformer network to perform license plate information prediction, so as to obtain license plate information in the image to be recognized.

The license plate detection device provided by the embodiment of the present disclosure can implement each process of the method embodiment shown in fig. 1 and achieve the same technical effect; details are not repeated here.

Referring to fig. 8, fig. 8 shows a license plate detection model training device provided by the present disclosure. As shown in fig. 8, a license plate detection model training device 800 includes:

an obtaining module 801, configured to obtain a training sample image and label information of the training sample image;

a prediction module 802, configured to perform a prediction operation on the training sample image through a model to be trained to obtain a prediction result, where the prediction operation includes: acquiring N feature maps of the training sample image based on a multi-head attention mechanism, wherein the N feature maps are different in size; performing semantic recognition on M feature maps in the N feature maps to obtain M semantic feature maps, and fusing the M semantic feature maps to obtain a fused semantic feature map; and detecting license plate information in the training sample image based on the fused semantic feature map; N is an integer greater than 1, and M is an integer less than or equal to N;

and an adjusting module 803, configured to adjust parameters of the model to be trained based on the prediction result and the label information, to obtain a license plate detection model.

the first Swin Transformer network is used for acquiring 4 feature maps with different sizes of the training sample image.

The license plate detection model training device provided by the embodiment of the present disclosure can implement each process of the method embodiment shown in fig. 4 and achieve the same technical effect; details are not repeated here.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 901 may be any of a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the license plate detection method or the license plate detection model training method. For example, in some embodiments, the license plate detection method or the license plate detection model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When loaded into the RAM 903 and executed by the computing unit 901, the computer program may perform one or more steps of the license plate detection method or the license plate detection model training method described above. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the license plate detection method or the license plate detection model training method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A license plate detection method comprises the following steps:

2. The method of claim 1, wherein the acquiring N feature maps of the image to be recognized based on the multi-head attention mechanism comprises:

based on a window multi-head self-attention layer W-MSA mechanism and a shift window multi-head self-attention layer SW-MSA mechanism, N feature maps of the image to be recognized are obtained.

3. The method of claim 2, wherein the obtaining of the N feature maps of the image to be recognized based on the W-MSA mechanism and the SW-MSA mechanism comprises:

4. The method of claim 3, wherein the vector of the first target elementary unit in the first feature map is calculated based on vectors of all other units in a first window in which the first target elementary unit is located, the first target elementary unit is any elementary unit in the first feature map, and the first window is a fixed window corresponding to the W-MSA mechanism;

the vector of a second target basic unit in the second feature map is calculated based on vectors of all other units in a second window where the second target basic unit is located, the second target basic unit is any basic unit in the second feature map, and the second window is a window obtained by moving the first window and corresponds to the SW-MSA mechanism;

5. Method according to claim 3 or 4, wherein the obtaining of the first feature map of the image to be recognized based on the W-MSA mechanism, the obtaining of the second feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the first feature map, the obtaining of the third feature map of the image to be recognized based on the W-MSA mechanism on the basis of the second feature map, and the obtaining of the fourth feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the third feature map comprises:

acquiring a first feature map of the image to be recognized based on a W-MSA mechanism through a first Swin Transformer unit in a first Swin Transformer network in a pre-acquired target model, wherein the first Swin Transformer network comprises the first Swin Transformer unit, a second Swin Transformer unit, a third Swin Transformer unit and a fourth Swin Transformer unit, the first Swin Transformer unit and the third Swin Transformer unit comprise a W-MSA layer, and the second Swin Transformer unit and the fourth Swin Transformer unit comprise an SW-MSA layer;

acquiring a third feature map of the image to be recognized based on the W-MSA mechanism through the third Swin Transformer unit on the basis of the second feature map;

6. The method of claim 5, wherein the target model further comprises: a second Swin Transformer network;

7. A license plate detection model training method comprises the following steps:

8. The method of claim 7, wherein the license plate detection model comprises a first Swin Transformer network, the first Swin Transformer network comprising first, second, third, and fourth Swin Transformer units, the first and third Swin Transformer units comprising a W-MSA layer, the second and fourth Swin Transformer units comprising an SW-MSA layer;

9. The method of claim 8, the license plate detection model further comprising: a second Swin Transformer network, which is used for predicting license plate information based on the fused semantic feature map.

10. A license plate detection device comprising:

the acquisition module is used for acquiring N feature maps of the image to be recognized based on a multi-head attention mechanism, wherein the N feature maps are different in size, and N is an integer larger than 1;

11. The apparatus of claim 10, wherein the acquisition module is to acquire N feature maps of an image to be identified based on a window multi-headed self-attention layer W-MSA mechanism and a shift-window multi-headed self-attention layer SW-MSA mechanism.

12. The apparatus of claim 11, wherein the obtaining module is configured to obtain a first feature map of the image to be recognized based on a W-MSA mechanism, obtain a second feature map of the image to be recognized based on a SW-MSA mechanism, obtain a third feature map of the image to be recognized based on the W-MSA mechanism on the basis of the second feature map, and obtain a fourth feature map of the image to be recognized based on the SW-MSA mechanism on the basis of the third feature map.

13. The apparatus of claim 12, wherein a vector of a first target elementary unit in the first feature map is calculated based on vectors of all other units in a first window in which the first target elementary unit is located, the first target elementary unit is any elementary unit in the first feature map, and the first window is a fixed window corresponding to the W-MSA mechanism;

14. The apparatus of claim 12 or 13, wherein the acquisition module is to:

15. The apparatus of claim 14, wherein the target model further comprises: a second Swin Transformer network;

the detection module is used for inputting the fused semantic feature map into the second Swin Transformer network to predict license plate information, so as to obtain license plate information in the image to be recognized.

16. A license plate detection model training device comprises:

17. The apparatus of claim 16, wherein the license plate detection model comprises a first Swin Transformer network, the first Swin Transformer network comprising first, second, third, and fourth Swin Transformer units, the first and third Swin Transformer units comprising a W-MSA layer, the second and fourth Swin Transformer units comprising an SW-MSA layer;

18. The apparatus of claim 17, the license plate detection model further comprising: a second Swin Transformer network, which is used for predicting license plate information based on the fused semantic feature map.

19. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6 or causing the computer to perform the method of any one of claims 7-9.

20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6, and which, when executed by a processor, implements the method according to any one of claims 7-9.