CN112016556B - Multi-type license plate recognition method - Google Patents
- Publication number
- CN112016556B (application CN202010851716.8A)
- Authority
- CN
- China
- Prior art keywords
- license plate
- character
- semantic
- position information
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/625—License plates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Character Discrimination (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention discloses a multi-type license plate recognition method, which comprises the following steps: extracting features from the license plate image to be recognized to obtain a corresponding feature map; extracting semantic information and position information from the feature map; and fusing the semantic information and position information with an attention mechanism, and predicting the license plate character string through a shared classifier. The method has the following advantages: first, by explicitly modeling the position information of the license plate characters, character positions can be acquired efficiently and used directly for recognition, with no separate processing per license plate type; second, the attention mechanism fuses semantic and position information without manual handling of the semantic information, avoiding any complicated post-processing steps, so the algorithm is efficient and highly robust; finally, the shared classifier improves data utilization, requires only a single recognition pass, and helps alleviate the data imbalance problem.
Description
Technical Field
The invention relates to the field of intelligent transportation, in particular to a multi-type license plate recognition method.
Background
Intelligent transportation systems use advanced technology to improve traffic safety and mobility, thereby improving productivity and having a broad impact on daily life. The license plate number is the unique "identity" of a motor vehicle, and its automatic recognition is one of the basic core components of intelligent transportation systems. It has been widely studied and is applied in traffic-related fields such as road tolling, parking management, and traffic law enforcement.
Before the rise of deep learning, license plate recognition was mainly divided into two steps: character segmentation and character recognition. Character segmentation splits the license plate image into images each containing a single character, facilitating subsequent recognition, so segmentation quality directly affects recognition performance. Character segmentation techniques mainly include connected-component analysis and projection analysis: connected-component analysis is simple to implement, projection analysis is unaffected by license plate tilt, and both can be applied in simple scenes with clear plates and uniform illumination, such as parking lots and toll stations. In complex scenes such as in-vehicle devices and traffic monitoring, however, segmentation is easily degraded by environmental conditions such as uneven illumination, low resolution, and motion blur. By contrast, character recognition improved greatly after the introduction of CNNs. Extracting the position information of license plate characters therefore remains a major obstacle for traditional methods.
Deep convolutional neural networks, with their strong feature extraction and learning capacity, have been widely adopted in license plate recognition with good results. In the patent "License plate detection and holistic recognition method based on deep learning", features extracted by the network are classified directly by seven classifiers, each forced to attend to the character at a corresponding position. The patent "Efficient and accurate license plate recognition method" recasts license plate recognition as a semantic segmentation task and obtains the character string by post-processing the semantic segmentation map of the plate. Both methods avoid explicitly extracting character position information, but the classifiers in the former struggle to localize characters, and the series of hyper-parameters in the latter reduces the method's generalization.
In addition, the above methods mainly consider a single license plate type (e.g., single-row car plates), but practical license plate recognition systems must handle multiple plate types, i.e., multi-type recognition must be considered. A few approaches have addressed this problem. The patents "License plate identification method, device, equipment and medium" and "A license plate recognition method based on license plate classification and LSTM" split double-row plates and splice them into single-row plates for recognition, which involves complex operations. The patent "License plate recognition method and device" uses a recurrent neural network to recognize single- and double-row plates, but cyclic decoding makes recognition inefficient.
In summary, deep-learning-based license plate recognition has developed vigorously, but efficient and accurate recognition of multiple license plate types with a single unified model remains unsolved.
Disclosure of Invention
The invention aims to provide a multi-type license plate recognition method that requires no separate processing per license plate type, is lightweight and efficient, is highly robust, has broad application prospects, and can uniformly recognize different types of license plates in complex traffic environments.
The purpose of the invention is realized by the following technical scheme:
a multi-type license plate recognition method comprises the following steps:
extracting the features of the license plate image to be recognized to obtain a corresponding feature map;
extracting semantic information and position information of the feature map;
and (3) fusing semantic information and position information by adopting an attention mechanism, and realizing the prediction of the license plate character strings through a shared classifier.
According to the technical scheme provided by the invention: first, by explicitly modeling the position information of the license plate characters, character positions can be acquired efficiently and used directly for recognition, with no separate processing per license plate type; second, the attention mechanism fuses semantic and position information without manual handling of the semantic information, avoiding any complicated post-processing steps, so the algorithm is efficient and highly robust; finally, the shared classifier improves data utilization, requires only a single recognition pass, and helps alleviate the data imbalance problem.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a network structure diagram of a multi-type license plate recognition method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a multi-type license plate recognition network including training and testing according to an embodiment of the present invention;
FIG. 3 is an example of a map of the location segmentation of different types of license plates provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of fusion of semantic information and location information provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a multi-type license plate recognition method, which adopts an attention mechanism to process multi-type problems, constructs an end-to-end multi-type license plate recognition network and can directly output license plate character strings for input different license plate images. Fig. 1 provides a main composition structure and a working process of a multi-type license plate recognition network, and the main process of the recognition method is as follows:
1. and extracting the features of the license plate image to be recognized to obtain a corresponding feature map.
As shown in fig. 1, this step is implemented by a feature extraction module, which mainly comprises convolution layers, activation layers, and downsampling layers connected in sequence. Illustratively, the Spatial Path and Context Path in BiSeNet may be used as the feature extraction module, with the downsampling multiple n set to 8.
The license plate image to be recognized is converted into a feature map with smaller space size through a feature extraction module, and the spatial resolution and the processing calculated amount can be balanced by adjusting the down-sampling multiple n.
Assuming the license plate image to be recognized has size h × w × c′, the feature map extracted by the feature extraction module has size (h/n) × (w/n) × c, where h is the image height, w the image width, c′ the number of channels of the input image (e.g., 3 for an RGB image), n the downsampling multiple, and c the number of channels of the feature map.
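As a minimal sketch of this size arithmetic (the feature channel count c = 64 is a hypothetical value, not specified in the text; downsampling rounds up, since padding can occur):

```python
import math

def feature_map_size(h, w, n, c):
    """Spatial size after n-fold downsampling (ceil accounts for padding),
    with the channel count changed to c feature channels."""
    return (math.ceil(h / n), math.ceil(w / n), c)

# Values from the embodiment: a 50 x 160 plate image, n = 8; c = 64 is assumed.
print(feature_map_size(50, 160, 8, 64))  # -> (7, 20, 64)
```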
2. And extracting semantic information and position information of the feature map.
As shown in fig. 1, the extraction of the semantic information and the position information is realized by a semantic information extraction module and a position information extraction module, respectively. Illustratively, a Feature Fusion Module (Feature Fusion Module) in the BiSeNet may be used as the semantic information extraction Module and the position information extraction Module.
1) Semantic information extraction: predict the semantic information of all license plate characters. The semantic information output by the semantic information extraction module is a license plate character semantic segmentation map of size (Cs + 1) × (h/n) × (w/n), where Cs is the number of character categories, covering the characters of all license plate types, and the extra 1 is the background category. Each element of the semantic segmentation map is the character-semantic category score of the corresponding pixel.
2) Position information extraction: predict the position information of all license plate characters. The position information output by the position information extraction module is a license plate character position segmentation map of size (Cp + 1) × (h/n) × (w/n), where Cp is the maximum length of a license plate character string and the extra 1 is the background category. Each element of the position segmentation map is the position-category score of the character to which the corresponding pixel belongs. The i-th channel of the position segmentation map is the segmentation map of the i-th character in the string, i = 1, …, Cp; that is, the channel order expresses the character order and is unaffected by the license plate type.
3. And (3) fusing semantic information and position information by adopting an attention mechanism, and realizing the prediction of the license plate character strings through a shared classifier.
As shown in fig. 1, this step is implemented by a license plate string prediction module with attention fused, and the main steps are as follows:
1) According to the license plate character position segmentation map, region selection is performed on the semantic segmentation map by spatial attention: each channel of the position segmentation map except the background channel is multiplied by the semantic segmentation map, yielding Cp new semantic segmentation maps. Each new map keeps only the semantic information of a single character, so a dedicated semantic feature is constructed for each character, independent of the license plate type.
Each channel of the position segmentation map corresponds to the position of one character: within a channel, pixel values at that character's position are close to 1 and the remaining values are close to 0. The semantic segmentation map contains the semantic information of all characters. In the fusion operation, the position segmentation map is split by channel, each channel is copied and expanded to the dimensions of the semantic segmentation map, and then multiplied by it; after multiplication, each new semantic segmentation map retains the semantic information of only one character. For example, multiplying an 8-channel position segmentation map channel by channel with a 24-channel semantic segmentation map yields 8 new 24-channel semantic segmentation maps, each retaining the semantics of one character.
Illustratively, a position attention map can be generated from the position segmentation map using softmax, so that each position-category score lies between 0 and 1 and the character positions are more prominent, before performing the fusion. In particular, the background channels of both the position segmentation map and the semantic segmentation map are discarded during fusion.
As shown in FIG. 1, the marked fusion symbol denotes fusing semantic information and position information with the attention mechanism.
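The channel-wise multiplication described above can be sketched with NumPy broadcasting (random arrays stand in for real network outputs; background channels are assumed already dropped):

```python
import numpy as np

Cp, Cs, H, W = 8, 65, 7, 20          # positions, character classes, feature-map size
pos = np.random.rand(Cp, H, W)       # position segmentation map
sem = np.random.rand(Cs, H, W)       # semantic segmentation map

# Each position channel is expanded over all semantic channels and multiplied:
# fused[i] keeps only the semantic evidence at character position i.
fused = pos[:, None, :, :] * sem[None, :, :, :]   # shape (Cp, Cs, H, W)
print(fused.shape)                                # -> (8, 65, 7, 20)
```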
2) After the semantic features of all characters are concatenated along the batch-size dimension, all characters are processed in one pass by a shared classifier, yielding a Cp × Cl probability matrix, where Cp is the longest character-string length across all license plate types (corresponding to Cp positions) and Cl is the number of character categories Cs plus one no-character category.
3) The Cp × Cl probability matrix is post-processed: taking the highest-probability character at each position and discarding no-character predictions gives the final license plate recognition result.
Illustratively, the shared classifier may be implemented by a convolutional layer, a downsample layer, and a fully-connected layer, which are connected in sequence.
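A minimal sketch of the post-processing in step 3), using a toy alphabet in place of the real 65-class character set:

```python
import numpy as np

CHARS = list("0123456789") + ["A", "B"]  # toy alphabet; the real set has 65 classes
BLANK = len(CHARS)                       # index of the no-character class (Cl = Cs + 1)

def decode(probs):
    """probs: (Cp, Cl) matrix; argmax per position, drop no-character slots."""
    idx = probs.argmax(axis=1)
    return "".join(CHARS[i] for i in idx if i != BLANK)

# A 7-character plate padded to Cp = 8: the last row predicts the no-character class.
probs = np.zeros((8, len(CHARS) + 1))
for pos, ch in enumerate("0123456"):
    probs[pos, CHARS.index(ch)] = 1.0
probs[7, BLANK] = 1.0
print(decode(probs))  # -> "0123456"
```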
Compared with traditional license plate recognition algorithms, the scheme of the invention needs no character segmentation, improving both efficiency and robustness. Compared with other deep learning algorithms, it explicitly learns character positions and builds an end-to-end model, avoiding complex post-processing steps and improving robustness. Sharing the character classifier keeps the system lightweight, and a single model can process multiple license plate types, giving better scene adaptability.
The main process for realizing the multi-type license plate recognition based on the multi-type license plate recognition network is described above. The multi-type license plate recognition network needs to be trained in advance and used for multi-type license plate recognition after training. As shown in fig. 2, in the training stage, a plurality of types of license plate images are collected first, and a license plate recognition data set used for training is constructed. And training a multi-type license plate recognition network on the constructed license plate recognition data set to obtain a trained multi-type license plate recognition network model. In the using stage, firstly, the license plate image to be recognized is zoomed to a proper size, and then forward reasoning (namely the process introduced above) is carried out through the multi-type license plate recognition network to obtain a final recognition result. The main description is as follows:
First, the training stage.
1. And constructing a license plate recognition data set used for training.
Collect license plate images of different types, such as yellow plates of large vehicles, yellow-green plates of large new-energy vehicles, blue plates of small vehicles, and green plates of small new-energy vehicles, and scale them to a fixed size (e.g., 50 × 160), keeping the number of images of each type as close as possible. Then produce, for each image, a semantic segmentation label (i.e., a license plate character semantic segmentation label map), a position segmentation label (i.e., a license plate character position segmentation label map), and a license plate character-string label; the semantic and position segmentation labels are optional.
1) The semantic segmentation label marks the category of each character in the license plate image, in the form of a matrix of the same size as the image. Specifically, each character is annotated with a tight rectangular box; pixels inside the box are labeled with that character's category, and all other pixels are labeled as background. License plate characters comprise Chinese characters, digits, and letters, 65 categories in total under the Chinese motor-vehicle license plate specification: the Chinese characters are the abbreviations of 22 provinces, 5 autonomous regions, and 4 municipalities, 31 categories in total; the digits 0-9 give 10 categories; and the letters A-Z excluding I and O give 24 categories.
2) The position segmentation label marks the position of each character in the license plate image, in the same form as the semantic segmentation label, with the label inside each rectangular box being the character-position category. For example: for a 7-character single-row car plate, positions are represented by the discrete values {1, 2, …, 7} from left to right; for a 7-character yellow double-row plate, the position order runs top to bottom and left to right, again represented by {1, 2, …, 7}; for an 8-character new-energy plate, positions are {1, 2, …, 8} from left to right.
3) The license plate character-string label marks the character sequence of the plate. The maximum sequence length equals the maximum plate length (e.g., 8 for new-energy plates); when a plate has fewer characters (e.g., an ordinary 7-character car plate), it is padded at the end with the no-character category so that all character sequences have the same length.
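A minimal sketch of this padding, assuming "-" as the no-character placeholder token:

```python
NO_CHAR = "-"   # placeholder for the no-character category (an assumed symbol)
MAX_LEN = 8     # longest plate: 8 characters (new-energy plates)

def pad_label(plate: str) -> str:
    """Pad a plate string at the end so all label sequences share the same length."""
    assert len(plate) <= MAX_LEN
    return plate + NO_CHAR * (MAX_LEN - len(plate))

print(pad_label("皖A12345"))  # a 7-character plate -> "皖A12345-"
```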
2. And (5) training.
During training, the input of the network is a license plate image from the recognition data set, and the output is the predicted license plate character string; the internal processing procedure is steps 1 to 3 described above and is not repeated here.
After the network outputs the result, updating the network parameters through a loss function; the loss function for the training phase is:
L = α1·Ls + α2·Lp + LC
where Ls, Lp, and LC are the loss functions for semantic information extraction, position information extraction, and license plate character-string prediction, respectively; α1 and α2 are weights balancing the three loss terms.
Illustratively, a PyTorch deep learning framework can be used for establishing a multi-type license plate recognition network, and a back propagation algorithm and a gradient descent strategy are used for reducing a loss function value and converging a model, so that a well-trained license plate recognition model is finally obtained.
In the embodiment of the invention, the loss function Ls for semantic information extraction and the loss function Lp for position information extraction may both use the cross-entropy loss; the formula is:
L = -Σ_{i=1}^{M} y_i · log(p_i)
where M is the number of classes, y_i is an indicator variable that is 1 if class i equals the labeled class of the training sample and 0 otherwise, and p_i is the predicted probability that the training sample belongs to class i.
As will be appreciated by those skilled in the art, Ls and Lp are obtained by substituting M = Cs + 1 and M = Cp + 1, respectively, into the above formula.
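The cross-entropy formula can be checked numerically with a short sketch (M = 3 classes chosen purely for illustration):

```python
import numpy as np

def cross_entropy(p, y):
    """L = -sum_i y_i * log(p_i) over M classes; y one-hot, p predicted probabilities."""
    return -np.sum(y * np.log(p))

# M = 3 classes; the sample belongs to class 1 (0-indexed), predicted with p = 0.7.
p = np.array([0.2, 0.7, 0.1])
y = np.array([0.0, 1.0, 0.0])
print(round(cross_entropy(p, y), 4))  # -> 0.3567  (= -ln 0.7)
```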
The loss function LC for license plate character-string prediction is mainly used to train the shared classifier. During training, every license plate number is padded to the maximum plate length Cp (padding the end of the string with the no-character category), and the loss function LC is:
LC = Σ_{l=1}^{Cp} L_l
where L_l is the cross-entropy loss of the single character at position l.
Second, the testing stage (i.e., the subsequent use stage).
As described above, after the network training is completed, the recognition result can be obtained through forward reasoning, and the specific process is 1 to 3 as described above; specific numerical values will be described below as examples.
As shown in fig. 1, take the license plate image on the left as an example: the image is resized to a fixed size close to the license plate aspect ratio as input to the recognition model; for example, it may be resized to 50 × 160.
Through the semantic information extraction module, the resulting semantic segmentation map has size (Cs + 1) × (h/n) × (w/n); through the position information extraction module, the resulting position segmentation map has size (Cp + 1) × (h/n) × (w/n). Illustratively, with downsampling multiple n = 8, h = 50, w = 160, 65 character classes, and at most 8 characters, the semantic segmentation map excluding the background channel has size 65 × 7 × 20 and the position segmentation map has size 8 × 7 × 20. Position segmentation maps of several license plate types are shown in fig. 3, arranged from left to right in channel order, i.e., character order. The spatial size 7 arises because the computation rounds up: downsampling by 8 is usually performed as three successive 2× downsamplings, each of which may involve padding, so the final result is ceil(50/8) = 7.
Then the semantic and position segmentation maps are fused, as shown in fig. 4: each channel of the position segmentation map is multiplied by the semantic segmentation map, retaining only the semantic information belonging to that channel, so that a dedicated semantic feature is constructed for each character (the numbers in fig. 4 indicate the semantic information of each character before and after the operation). The fused features are concatenated in order, preserving the character order, and sent to the shared classifier to obtain a Cp × Cl probability matrix; taking the highest-probability character at each position and discarding no-character predictions gives the final license plate recognition result.
Illustratively, since the longest license plate number has 8 characters and there are 65 character classes, Cp is 8 and Cl is 66. For the four license plate types in fig. 2, the network output is shown in the right part of fig. 1, where "-" denotes the no-character category and the order of the characters matches the channel order of the position segmentation map.
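The class counts quoted above can be verified with a short sketch (the 31 Chinese abbreviations are counted rather than enumerated, since they are single Chinese characters):

```python
# 31 province-level abbreviations (Chinese characters), 10 digits, 24 letters (A-Z minus I, O)
digits = [chr(c) for c in range(ord("0"), ord("9") + 1)]
letters = [chr(c) for c in range(ord("A"), ord("Z") + 1) if chr(c) not in ("I", "O")]
num_chinese = 22 + 5 + 4          # provinces + autonomous regions + municipalities
Cs = num_chinese + len(digits) + len(letters)
Cl = Cs + 1                        # plus the no-character category
print(len(digits), len(letters), Cs, Cl)  # -> 10 24 65 66
```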
Through the description of the above embodiments, it is clear to those skilled in the art that the above embodiments may be implemented by software, or by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the foregoing division of each functional module is merely used as an example, and in practical applications, the foregoing function distribution may be completed by different functional modules as needed, that is, the internal structure of the network is divided into different functional modules to complete all or part of the above-described functions.
The above description is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are also within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (5)
1. A multi-type license plate recognition method is characterized by comprising the following steps:
extracting the features of the license plate image to be recognized to obtain a corresponding feature map;
extracting semantic information and position information of the feature map;
an attention mechanism is adopted to fuse semantic information and position information, and a shared classifier is used for predicting license plate character strings;
wherein adopting an attention mechanism to fuse the semantic information and the position information and realizing the prediction of the license plate character string through the shared classifier comprises the following steps:
the semantic information is a license plate character semantic segmentation map, and the position information is a license plate character position segmentation map;
according to the license plate character position segmentation map, region selection is performed on the license plate character semantic segmentation map in a spatial-attention manner: each channel of the license plate character position segmentation map, except the background channel, is multiplied by the license plate character semantic segmentation map to obtain C_p new license plate character semantic segmentation maps, each of which retains only the semantic information of a single character, so that a dedicated semantic feature is constructed for each character; C_p represents the maximum license plate string length;
the semantic features of all the characters are spliced along the batch-size dimension, and all the characters are then processed in one pass by the shared classifier to obtain a probability matrix of size C_p × C_l, wherein C_p, the longest character string length over all license plate types, corresponds to C_p positions, and C_l is the number of character classes C_s plus a no-character category;
the C_p × C_l probability matrix is post-processed, and the final license plate recognition result is obtained by taking the maximum-probability character at each position and discarding the no-character outputs.
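A minimal sketch of this post-processing step, assuming a hypothetical 66-entry alphabet whose last entry is the no-character class (the alphabet contents are illustrative, not the patent's actual character set):

```python
import string
import numpy as np

# Hypothetical alphabet: 65 character classes plus a trailing no-character class "-".
ALPHABET = list(string.ascii_uppercase + string.ascii_lowercase
                + string.digits + "#$%") + ["-"]  # 66 entries, illustrative only

def decode(probs: np.ndarray) -> str:
    """Decode a C_p x C_l probability matrix: take the max-probability class
    at each of the C_p positions and drop the no-character outputs."""
    best = probs.argmax(axis=1)                   # one class index per position
    chars = [ALPHABET[i] for i in best]
    return "".join(c for c in chars if c != "-")  # discard no-character slots

# Build an 8 x 66 demo matrix; the last slot predicts the no-character class.
demo = np.zeros((8, 66))
for pos, cls in enumerate([10, 11, 12, 13, 14, 15, 16, 65]):
    demo[pos, cls] = 1.0
print(decode(demo))  # the 8th slot is discarded, leaving a 7-character string
```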
2. The method of claim 1, wherein the extracting features of the license plate image to be recognized to obtain the corresponding feature map comprises:
for a license plate image to be recognized of size h × w × c', a feature map of size h/n × w/n × c is extracted through a feature extraction module;
the feature extraction module comprises a convolution layer, an activation layer and a down-sampling layer which are sequentially connected; wherein h denotes the image height, w the image width, c' the number of channels of the license plate image to be recognized, n the down-sampling multiple, and c the number of channels of the feature map.
3. The method of claim 1, wherein the extracting semantic information from the feature map comprises:
for the feature map of size h/n × w/n × c, the semantic information is extracted through a semantic information extraction module, wherein h/n, w/n and c denote the height, width and number of channels of the feature map, respectively;
the semantic information is represented by a license plate character semantic segmentation map of size (C_s + 1) × h/n × w/n, wherein C_s is the number of character classes, covering the characters of all license plate types, 1 denotes the background category, and each element of the license plate character semantic segmentation map represents the character semantic category score of the corresponding pixel point.
4. The method of claim 1, wherein the extracting the location information of the feature map comprises:
for the feature map of size h/n × w/n × c, the position information is extracted through a position information extraction module, wherein h/n, w/n and c denote the height, width and number of channels of the feature map, respectively;
the position information is represented by a license plate character position segmentation map of size (C_p + 1) × h/n × w/n, wherein C_p denotes the maximum length of the license plate character string, 1 denotes the background category, and each element of the license plate character position segmentation map represents the position category score of the character to which the corresponding pixel point belongs; the ith channel is the position segmentation map of the ith character in the license plate character string, i.e., the channel order expresses the character order, with i = 1, ..., C_p.
5. The method according to claim 1, wherein the method is implemented by constructing an end-to-end multi-type license plate recognition network, and the loss function of the multi-type license plate recognition network in the training stage is:
L = α_1·L_s + α_2·L_p + L_C
wherein L_s, L_p and L_C are the loss function for semantic information extraction, the loss function for position information extraction and the loss function for license plate character string prediction, respectively; α_1 and α_2 are weights that balance the three loss terms.
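The training loss above is a weighted sum of three terms; a trivial sketch (the default weight values are illustrative — the patent does not fix α_1 and α_2):

```python
def total_loss(L_s: float, L_p: float, L_c: float,
               alpha1: float = 1.0, alpha2: float = 1.0) -> float:
    """Weighted sum of the three training losses, as in claim 5:
    L = alpha1 * L_s + alpha2 * L_p + L_c.
    Default weights are illustrative; the patent leaves them unspecified."""
    return alpha1 * L_s + alpha2 * L_p + L_c

print(total_loss(0.5, 0.25, 1.0))  # 1.75 with unit weights
```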
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010851716.8A CN112016556B (en) | 2020-08-21 | 2020-08-21 | Multi-type license plate recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112016556A CN112016556A (en) | 2020-12-01 |
CN112016556B true CN112016556B (en) | 2022-07-15 |
Family
ID=73505498
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112329787B (en) * | 2020-12-07 | 2023-06-30 | 北京同方软件有限公司 | Water level character positioning method based on information attention degree |
CN112488108A (en) * | 2020-12-11 | 2021-03-12 | 广州小鹏自动驾驶科技有限公司 | Parking space number identification method and device, electronic equipment and storage medium |
CN112580628B (en) * | 2020-12-22 | 2023-08-01 | 浙江智慧视频安防创新中心有限公司 | Attention mechanism-based license plate character recognition method and system |
KR20220127555A (en) * | 2021-03-11 | 2022-09-20 | 한국전자통신연구원 | Apparatus and Method for Recognizing Number Plate based on Weakly-Supervised Localization |
CN113313115B (en) * | 2021-06-11 | 2023-08-04 | 浙江商汤科技开发有限公司 | License plate attribute identification method and device, electronic equipment and storage medium |
CN113554030B (en) * | 2021-07-27 | 2022-08-16 | 上海大学 | Multi-type license plate recognition method and system based on single character attention |
CN115131776B (en) * | 2022-04-12 | 2023-05-12 | 公安部交通管理科学研究所 | License plate image character classification recognition method integrating new energy automobile features |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239778A (en) * | 2017-06-09 | 2017-10-10 | 中国科学技术大学 | The licence plate recognition method of efficiently and accurately |
WO2019169532A1 (en) * | 2018-03-05 | 2019-09-12 | 深圳前海达闼云端智能科技有限公司 | License plate recognition method and cloud system |
CN111008632A (en) * | 2019-10-17 | 2020-04-14 | 安徽清新互联信息科技有限公司 | License plate character segmentation method based on deep learning |
CN111008639A (en) * | 2019-10-17 | 2020-04-14 | 安徽清新互联信息科技有限公司 | Attention mechanism-based license plate character recognition method |
CN111027539A (en) * | 2019-10-17 | 2020-04-17 | 安徽清新互联信息科技有限公司 | License plate character segmentation method based on spatial position information |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10719743B2 (en) * | 2018-01-19 | 2020-07-21 | Arcus Holding A/S | License plate reader using optical character recognition on plural detected regions |
Non-Patent Citations (2)
Title |
---|
A CNN-Based Approach for Automatic License Plate Recognition in the Wild; Meng Dong et al.; Semantic Scholar; 20171205; pp. 1-12 *
License plate similar character recognition based on deep learning; Pan Xiang et al.; Computer Science (《计算机科学》); 20170615; pp. 239-241, 257 *
Legal Events

Date | Code | Title | Description
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||