CN112580536A - High-order video vehicle and license plate detection method and device


Info

Publication number
CN112580536A
Authority
CN
China
Prior art keywords
vehicle
loss value
information
pruning
training
Prior art date
Legal status
Pending
Application number
CN202011541635.4A
Other languages
Chinese (zh)
Inventor
唐健
高声荣
王浩
石伟
陶昆
Current Assignee
Shenzhen Jieshun Science and Technology Industry Co Ltd
Original Assignee
Shenzhen Jieshun Science and Technology Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Jieshun Science and Technology Industry Co Ltd
Priority to CN202011541635.4A
Publication of CN112580536A


Classifications

    • G06V20/40 - Scenes; scene-specific elements in video content
    • G06F18/214 - Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045 - Combinations of networks
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/625 - License plates
    • G06V2201/08 - Detecting or categorising vehicles

Abstract

The embodiment of the application discloses a method and a device for detecting vehicles and license plates in high-order video, which are used for realizing real-time detection of vehicles and license plates in a high-order scene. The method in the embodiment of the application comprises the following steps: acquiring, through high-order video, an image containing information of a vehicle to be detected, wherein the information comprises the category and position information of the vehicle and of the vehicle's license plate; inputting the image containing the information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training on the category and position feature information of the vehicles and vehicle license plates in a learning sample set; and generating target information data according to the detection result of the target model, wherein the target information data are the category and position data of the vehicle to be detected and of its license plate.

Description

High-order video vehicle and license plate detection method and device
Technical Field
The embodiment of the application relates to the field of intelligent security monitoring, in particular to a method and a device for detecting high-order video vehicles and license plates.
Background
With the development of society, the automobile has become an essential means of transportation for most families. Against the background of a continuously growing number of automobiles in China and a still substantial shortfall of parking spaces, "difficult parking and disorderly parking" has become one of the main causes of traffic congestion, and intelligent parking is at present, apart from traffic control and license-plate restrictions, an important means of relieving it. To manage automobiles better, high-order video intelligent parking systems have emerged on the market. A high-order video intelligent parking system uses high-mounted video cameras as acquisition equipment to realize intelligent on-street parking: through the cameras' artificial-intelligence algorithms, the system accurately identifies vehicle states, parking-space conditions and license plate information.
However, the embedded equipment adopted by existing systems has limited computing resources, and large-scale acquisition and processing of vehicle states, parking-space conditions and license plate information occupies a large amount of storage space and computing power, so that the resource-constrained platform becomes overloaded and cannot meet the system's requirement for real-time detection.
Disclosure of Invention
The embodiment of the application provides a method and a device for detecting a high-order video vehicle and a license plate, which are used for detecting the category and the license plate information of the vehicle in real time.
In a first aspect, an embodiment of the present application provides a method for detecting a high-order video vehicle and a license plate, including:
acquiring, through high-order video, an image containing information of a vehicle to be detected, wherein the information comprises the category and position information of the vehicle and of the vehicle's license plate;
inputting the image containing the information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training the characteristic information of the types and the positions of the vehicles and the license plates of the vehicles in a learning sample set;
and generating target information data according to the detection result of the target model, wherein the target information data are the types and the position data of the vehicles and the license plates of the vehicles to be detected.
Optionally, before the image containing the information of the vehicle to be detected is acquired through the high-order video, the detection method further includes:
acquiring a positive sample set and a negative sample set, wherein the positive sample set is an image sample set containing a vehicle, and the negative sample set is an image sample set not containing the vehicle;
respectively extracting an image from the positive sample set and the negative sample set, covering the image extracted from the negative sample set in a non-vehicle position area of the image extracted from the positive sample set to obtain a training image, and generating a training sample set;
extracting a training sample from the training sample set as a target detection sample;
inputting the target detection sample into an initial model to generate training data, wherein the initial model is a model established based on a neural network, and the training data is data containing the category and position information of a vehicle and a vehicle license plate;
calculating a total loss value according to the training data, the image containing the vehicle information and the image not containing the vehicle information;
generating an initial model input numerical value, wherein the initial model input numerical value is the number of times that the target detection sample is input into the initial model;
judging whether the input times of the initial model is greater than 1, if not, updating the initial model parameters according to the total loss value, extracting another training sample from the training sample set, and performing the following steps: inputting the updated initial model, calculating the total loss value and judging whether the total loss value reaches a preset value;
if yes, judging whether the total loss value reaches a preset value;
and if so, determining the initial model as a target model.
Optionally, after determining whether the total loss value reaches a preset value, the detection method further includes:
if not, updating the initial model parameters according to the total loss value, inputting the target detection sample into the updated initial model again, and performing the following steps: and calculating the total loss value and judging whether the total loss value is smaller than a preset value.
Optionally, the calculating a total loss value according to the training data, the image containing the vehicle information, and the image not containing the vehicle information includes:
calculating an xywh total loss value from the training data, the image containing the vehicle information, and the image not containing the vehicle information;
calculating a confidence total loss value of the training data, the image containing the vehicle information and the image not containing the vehicle information;
calculating a category total loss value of the training data and the image containing the vehicle information;
and adding the xywh total loss value, the confidence coefficient total loss value and the category total loss value to obtain a total loss value.
Optionally, before inputting the target detection sample into the initial model generation training data, the detection method further includes:
pruning the Darknet53 neural network by using a channel pruning mode;
the initial model was generated using the pruned Darknet53 as the underlying network.
Optionally, the pruning the Darknet53 neural network by using a channel pruning method includes:
introducing a scaling factor to a Darknet53 neural network channel, multiplying the scaling factor by the output of the channel, and combining the network weight and the scaling factor to generate a pruning network function, wherein the scaling factor is a parameter gamma of a BN layer;
generating a pruning training time value, wherein the pruning training time value is the time of inputting the target detection sample into a pruning network function;
judging whether the pruning training times are larger than 1, if so, judging whether the loss value of the scaling factor is equal to the loss value of the scaling factor calculated at the previous time;
if not, updating the parameter gamma of the BN layer, extracting another training sample from the training sample set, and performing the following steps: inputting the updated pruning network function, performing loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time;
if yes, setting a pruning threshold, sorting the channels according to the weight of the BN layer, retaining the top-weighted channels that fall outside the pruning proportion, and performing channel pruning on the remaining channels.
Optionally, after determining whether the updated scaling factor loss value is equal to the scaling factor loss value calculated in the previous time, the detection method further includes:
if not, updating the parameter gamma of the BN layer according to the loss value of the scaling factor, inputting the target detection sample into the updated pruning network function again, and performing the following steps: inputting the updated pruning network function, carrying out loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time.
Optionally, before generating the initial model by using the pruned Darknet53 as the base network, the detection method further includes:
and connecting the feature extraction layer of the pruning-processed Darknet53 network with an SE module, wherein the SE module is used for learning the correlation among channels and screening the feature pertinence of the channels.
Optionally, the obtaining an image including information of a vehicle to be detected through a high-order video includes:
acquiring high-order video data through a high-order camera;
intercepting continuous frame images of the high-order video data within a preset time;
and acquiring an image containing information of the pre-detected vehicle through the continuous frame images.
The embodiment of the present application provides a detection apparatus for a high-order video vehicle and a license plate from a second aspect, which includes:
the system comprises a first acquisition unit, a second acquisition unit and a third acquisition unit, wherein the first acquisition unit is used for acquiring an image containing information of a pre-detected vehicle through a high-order video, and the information of the pre-detected vehicle is the information of the type and the position of the vehicle and the license plate of the vehicle;
the data input unit is used for inputting the image containing the information of the pre-detected vehicles into a target model, and the target model is a model obtained by training the types and the position characteristic information of the vehicles and the license plates of the vehicles in a learning sample set;
and the generating unit is used for generating target information data according to the detection result of the target model, wherein the target information data are the types and the position data of the vehicles and the license plates of the vehicles to be detected.
Optionally, the detection apparatus further includes:
the second acquisition unit is used for acquiring a positive sample set and a negative sample set, wherein the positive sample set is an image sample set containing the vehicle, and the negative sample set is an image sample set not containing the vehicle;
the sample generation unit is used for extracting one image from each of the positive sample set and the negative sample set, covering the image extracted from the negative sample set on a non-vehicle position area of the image extracted from the positive sample set to obtain a training image, and generating a training sample set;
the sample extraction unit is used for extracting a training sample from the training sample set as a target detection sample;
the sample input unit is used for inputting the target detection sample into an initial model to generate training data, the initial model is a model established based on a neural network, and the training data is data containing the category and the position information of a vehicle and a vehicle license plate;
a calculating unit, configured to calculate a total loss value according to the training data, the image containing the vehicle information, and the image not containing the vehicle information;
an input frequency generation unit, configured to generate an initial model input frequency value, where the initial model input frequency value is a frequency for inputting the target detection sample into the initial model;
the first judgment unit is used for judging whether the initial model input times value is larger than 1 or not;
a first executing unit, configured to, when the first determining unit determines that the initial model input order value is not greater than 1, update the initial model parameter according to the total loss value, extract another training sample from the training sample set, and perform the following steps: inputting the updated initial model, calculating the total loss value and judging whether the total loss value reaches a preset value;
the second judging unit is used for judging whether the total loss value reaches a preset value or not when the first judging unit determines that the initial model input number of times is greater than 1;
a second executing unit, configured to determine that the initial model is a target model when the second determining unit determines that the total loss value reaches a preset value.
Optionally, the detection apparatus further includes:
a third executing unit, configured to, when the second determining unit determines that the total loss value does not reach a preset value, update the initial model parameters according to the total loss value, re-input the target detection sample to the updated initial model, and perform the following steps: and calculating the total loss value and judging whether the total loss value is smaller than a preset value.
Optionally, the computing unit includes:
a coordinate total loss value calculation module, configured to calculate an xywh total loss value of the training data, the image containing the vehicle information, and the image not containing the vehicle information;
a confidence total loss value calculation module for calculating a confidence total loss value of the training data, the image containing the vehicle information and the image not containing the vehicle information;
the category total loss value calculation module is used for calculating the category total loss values of the training data and the images containing the vehicle information;
and the total loss value calculating module is used for adding the xywh total loss value, the confidence coefficient total loss value and the category total loss value to obtain a total loss value.
Optionally, the detection apparatus further includes:
the channel pruning unit is used for pruning the Darknet53 neural network in a channel pruning mode;
and the model generating unit is used for generating an initial model by taking the Darknet53 subjected to pruning as a basic network.
Optionally, the channel pruning unit includes:
the parameter processing module is used for introducing a scaling factor to a Darknet53 neural network channel, multiplying the scaling factor by the output of the channel, and combining the network weight and the scaling factor to generate a pruning network function, wherein the scaling factor is a parameter gamma of a BN layer;
a pruning frequency generation module, configured to generate a pruning training frequency value, where the pruning training frequency value is a frequency at which the target detection sample is input to a pruning network function;
the pruning times judging module is used for judging whether the pruning training times value is greater than 1;
a fourth executing module, configured to determine whether a loss value of the scaling factor is equal to a scaling factor loss value calculated in the previous time when the pruning times judging module determines that the pruning training times value is greater than 1;
a third judgment execution module, configured to update the parameter γ of the BN layer and extract another training sample from the training sample set when the pruning times judgment module determines that the pruning training times value is not greater than 1, and perform the following steps: inputting the updated pruning network function, performing loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time;
a fifth executing module, configured to, when the third judgment execution module determines that the updated scaling factor loss value is equal to the scaling factor loss value calculated the previous time, set a pruning threshold, sort the channels according to the weight of the BN layer, retain the top-weighted channels that fall outside the pruning proportion, and perform channel pruning on the remaining channels.
Optionally, the detection apparatus further includes:
a sixth executing module, configured to, when the third determining and executing module determines that the updated scaling factor loss value is not equal to the scaling factor loss value calculated in the previous time, update the parameter γ of the BN layer according to the scaling factor loss value, re-input the target detection sample to the updated pruning network function, and perform the following steps: inputting the updated pruning network function, carrying out loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time.
Optionally, the detection apparatus further includes:
and the connection unit is used for connecting the feature extraction layer of the pruning Darknet53 network with an SE module, and the SE module is used for learning the correlation among channels and screening the feature pertinence of the channels.
Optionally, the first obtaining unit includes:
the second acquisition module is used for acquiring high-order video data through the high-order camera;
the image intercepting module is used for intercepting continuous frame images of the high-order video data within preset time;
and the third acquisition module is used for acquiring an image containing information of the vehicle to be detected through the continuous frame images.
In a third aspect, an embodiment of the present application provides a high-order video vehicle and a license plate detection device, including:
the device comprises a processor, a memory, an input and output unit and a bus;
the processor is connected with the memory, the input and output unit and the bus;
the processor specifically performs the following operations:
acquiring, through high-order video, an image containing information of a vehicle to be detected, wherein the information comprises the category and position information of the vehicle and of the vehicle's license plate;
inputting the image containing the information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training the characteristic information of the types and the positions of the vehicles and the license plates of the vehicles in a learning sample set;
and generating target information data according to the detection result of the target model, wherein the target information data are the types and the position data of the vehicles and the license plates of the vehicles to be detected.
Optionally, the processor is further configured to perform the operations of any of the alternatives of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
the image containing the information of the vehicle to be detected, which is acquired from the high-level video, is input into the trained target model, and then the category and position information data containing the vehicle and the vehicle license plate of the vehicle to be detected are generated according to the detection result of the target model, so that the real-time detection of the vehicle and the license plate in the high-level scene can be realized on the premise of not occupying a large amount of storage space and computing resources.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a high-order video vehicle and a license plate detection method in an embodiment of the present application;
Figs. 2-1, 2-2, and 2-3 are schematic flowcharts illustrating another embodiment of a high-order video vehicle and license plate detection method according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of an embodiment of a high-order video vehicle and a license plate detection device in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of another embodiment of a high-order video vehicle and a license plate detection device according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of another embodiment of a high-order video vehicle and a license plate detection device in an embodiment of the present application.
Detailed Description
To make those skilled in the art better understand the technical solution of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by a person skilled in the art without inventive effort on the basis of the embodiments of the present invention shall fall within the scope of the present invention.
The embodiment of the application provides a method and a device for detecting a high-order video vehicle and a license plate, which are used for detecting the vehicle and the license plate in a high-order scene in real time.
In this embodiment, the method for detecting a high-order video vehicle and a license plate can be implemented in a system, a server, or a terminal, and is not specifically limited. For convenience of description, the embodiment of the present application uses the system as an example for the execution subject.
Referring to fig. 1, an embodiment of a method for detecting a high-order video vehicle and a license plate in an embodiment of the present application includes:
101. the system acquires an image containing information of a pre-detected vehicle through a high-order video;
In order to realize accurate timing, charging, parking-space monitoring and the like for users, the category and position information of the vehicle and the vehicle's license plate must be detected in real time; acquiring an image of the vehicle information to be detected is therefore a necessary precondition. To ensure the completeness of the acquired vehicle image, the system usually obtains high-order video within a preset time through a high-mounted camera and then obtains, from this video, the image containing the vehicle information to be detected.
102. The system inputs the image containing information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training on the category and position feature information of the vehicles and vehicle license plates in a learning sample set;
Before the system obtains the feature information of the vehicle to be detected, it must input the image acquired from the high-order video into a target model. The target model is a model trained on the feature information of the vehicle images in a sample set; the sample-set images may be obtained through various channels, which are not limited here. For example: the system acquires, through high-order video, an image containing a white minibus to be detected and inputs the image into the target model; the target model analyzes the category and position feature information of the vehicle and its license plate and generates a detection result. The generated detection result includes the position coordinates of the vehicle, the position coordinates of the license plate, and the like.
103. The system generates target information data according to the detection result of the target model, wherein the target information data are the types and the position data of the vehicles and the license plates of the vehicles to be detected.
The system generates target information data from the detection result produced by the target model in step 102. For example: the system can preset the length and width ranges of a vehicle and a license plate; after the target model has analyzed the image containing the white minibus to be detected and produced a detection result, the system computes, from the position coordinates in that result, the distance between the vehicle (and its license plate) and the high-mounted camera together with their length-to-width proportions, and determines from these the position of the minibus and of its license plate, thereby generating the target information data.
In this embodiment, the system first obtains an image containing information of a vehicle to be detected from a high-level video, then inputs the image into a target model obtained by training the category and position feature information of the vehicle and the license plate of the vehicle in a learning sample set, and finally generates the category and position data of the vehicle and the license plate of the vehicle to be detected according to a detection result output by the target model. Therefore, the real-time detection of the vehicles and the license plates in the high-level scene can be realized on the premise of not occupying a large amount of storage space and computing resources.
For the purpose of clearly describing the detection method of the high-order video vehicle and the license plate, the embodiment of the present application will be described in detail with reference to fig. 2-1, 2-2, and 2-3.
Referring to figs. 2-1, 2-2, and 2-3, another embodiment of the method for detecting a high-order video vehicle and a license plate in the embodiment of the present application includes:
201. the method comprises the steps that a system obtains a positive sample set and a negative sample set, wherein the positive sample set is an image sample set containing a vehicle, and the negative sample set is an image sample set not containing the vehicle;
202. the system extracts one image from each of the positive sample set and the negative sample set, covers the image extracted from the negative sample set in the non-vehicle position area of the image extracted from the positive sample set to obtain a training image, and generates a training sample set;
the system needs to be trained before inputting the pre-detection image into the target model. The necessary condition for training the model is to input the sample set into the initial model for training, so that a corresponding sample set needs to be obtained. The sample set is divided into a positive sample set and a negative sample set, images contained in the negative sample set are all images which are irrelevant to vehicles and are randomly taken, images in the positive sample set are all images containing vehicles, in order to increase the diversity of the samples and remove the redundancy of the samples, one image needs to be randomly extracted in the positive sample set and the negative sample set respectively, areas except the vehicles which are pre-detected in the images in the positive sample set need to be replaced, the replaced areas are areas with the same size and are randomly cut out from the images in the negative sample to be replaced, so that a new training image is generated, and the training images are integrated to generate a training sample set.
203. The system extracts a training sample from a training sample set as a target detection sample;
in the training process, one training sample is randomly extracted from a training sample set as a target detection sample during each training.
204. The system introduces a scaling factor to a Darknet53 neural network channel, the scaling factor is multiplied by the output of the channel, and a pruning network function is generated by combining the network weight and the scaling factor, wherein the scaling factor is a parameter gamma of a BN layer;
The Darknet53 neural network is the backbone network of the YOLOv3 detection model. To obtain a lighter-weight network, the system performs channel pruning on the neural network so that it converges quickly and attains better performance. To this end, the parameter γ of the BN layer is used as the scaling factor for network pruning: a scaling factor is introduced for each channel and multiplied by the channel's output, and the pruning network function is generated by combining the network weights with the scaling factors.
The objective function of the pruning network is:

L = \sum_{(x,y)} l(f(x, W), y) + \lambda \sum_{\gamma \in \Gamma} g(\gamma)

where (x, y) represents the training sample data, W is the trainable parameters of the network, λ is the balance factor between the two terms, and g(γ) = |γ| is the penalty term, added to reduce overfitting.
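In an implementation, this objective can be realized by adding an L1 penalty on the BN scaling factors to the ordinary training loss. The following PyTorch sketch shows one possible way to do this; the value of λ and the function names are assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

def scaling_factor_penalty(model, lam=1e-4):
    """Sum of |gamma| over all BN layers, i.e. the g(gamma) = |gamma| term,
    weighted by the balance factor lambda (lam is illustrative)."""
    reg = torch.zeros(())
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            reg = reg + m.weight.abs().sum()
    return lam * reg

# pruning-network objective: ordinary detection loss plus the sparsity term
# loss = detection_loss(model(x), y) + scaling_factor_penalty(model)
```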
205. The system generates a pruning training times value;
the system generates an input-count value that records how many times the currently trained target sample has been input into the pruning network. After each group of target detection samples finishes training, the next group of training samples is marked as the target detection sample, and the generated input count is reset.
206. The system judges whether the pruning training times value is greater than 1, if yes, step 207 is executed; if not, go to step 208;
Since the diversity of the pruning process needs to be ensured, pruning training must be carried out multiple times, so the system needs to judge whether the pruning training times value is greater than 1; if so, step 207 is executed; if not, step 208 is executed.
207. The system judges whether the loss value of the scaling factor is equal to the loss value of the scaling factor calculated in the previous time; if yes, go to step 209; if not, go to step 210;
When the system judges that the pruning training times value is greater than 1, it determines that more than one round of pruning training has taken place, so the result is no longer singular, and it can judge whether the loss value of the scaling factor is equal to the scaling factor loss value calculated the previous time; if yes, step 209 is executed; if not, step 210 is executed.
208. The system updates the parameters of the BN layer, extracts another training sample from the training sample set, and carries out the following steps: inputting the updated pruning network function, performing loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time; if yes, go to step 209; if not, go to step 210;
When the system judges that the pruning training times value is not greater than 1, it determines that only one round of pruning training has taken place and the result is still singular. It therefore updates the parameters of the BN layer, performs pruning training on the Darknet53 neural network again, and judges whether the updated scaling factor loss value is equal to the scaling factor loss value calculated the previous time; if yes, step 209 is executed; if not, step 210 is executed.
209. The system sets a pruning threshold, sorts the channels according to the weight of the BN layer, retains the top-weighted channels that fall outside the pruning proportion, and performs channel pruning on the remaining channels;
the system decides whether to exit the loop according to the input count of the trained target detection samples. After the groups of target detection samples have been input into the pruning network for pruning, the Darknet53 neural network converges, and the channel-pruning process can be determined to be complete. At this point the system performs channel pruning by setting a pruning threshold. For example, the pruning threshold is set to 0.75: when the updated scaling factor loss value is judged equal to the one calculated the previous time, the function is determined to have converged; the channels are then sorted according to the weight of the BN layer, the top twenty-five percent of channels are retained, channel pruning is performed on the remaining seventy-five percent, and the pruned Darknet53 network is finally obtained.
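A possible realization of this threshold step, continuing the PyTorch sketch above (the 0.75 ratio follows the example in the text; everything else is illustrative):

```python
import torch
import torch.nn as nn

def bn_channel_masks(model, prune_ratio=0.75):
    """Rank all BN gammas, derive the cut-off so that prune_ratio of the
    channels (those with the smallest weights) are removed, and return a
    per-layer boolean mask of the channels to keep."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    cut = torch.sort(gammas).values[int(gammas.numel() * prune_ratio)]
    return {name: m.weight.detach().abs() >= cut
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```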
Step 211 is performed after step 209 is performed.
210. The system updates the parameters of the BN layer according to the loss value of the scaling factor, re-inputs the target detection sample into the updated pruning network function, and performs the following steps: inputting the updated pruning network function, performing loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time;
when the updated scaling factor loss value is not equal to the scaling factor loss value calculated in the previous time, determining that the function is not converged, and channel training pruning needs to be continued, so that the system needs to update the parameter of the BN layer again according to the scaling factor loss value, re-input the target detection sample into the updated pruning network function, and perform the following steps: inputting the updated pruning network function, carrying out loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time.
211. The system connects the feature extraction layer of the Darknet53 network subjected to pruning with the SE module;
The SE module mentioned in this embodiment comprises a global pooling layer, an FC layer, a ReLU layer, an FC layer and a Sigmoid layer; the SE module is mainly used for learning the correlation among channels and screening channel-wise attention.
The SE module is connected behind the feature extraction layer of the network of the Darknet53 subjected to pruning processing, so that the model can screen out more critical information for the current detection target from a plurality of features and suppress other useless features.
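The layer sequence described above maps directly onto a standard Squeeze-and-Excitation block; the following PyTorch sketch is one common formulation (the reduction ratio of 16 is an assumption, not given in the patent):

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Global pooling -> FC -> ReLU -> FC -> Sigmoid, then channel-wise
    reweighting of the input feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # global pooling layer
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * weights                              # suppress less useful channels
```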
212. The system takes Darknet53 which is subjected to pruning as a basic network to generate an initial model;
213. The system inputs a target detection sample into the initial model to generate training data;
after the system generates an initial model by using the pruned Darknet53 network as a basic network, the model needs to be further trained, so that target detection samples need to be input into the initial model to generate training data.
214. The system calculates the xywh total loss value of the training data, the image containing the vehicle information and the image not containing the vehicle information;
215. the system calculates confidence total loss values of training data, images containing vehicle information and images not containing vehicle information;
216. the system calculates training data and a category total loss value of the image containing the vehicle information;
217. the system adds the xywh total loss value, the confidence coefficient total loss value and the category total loss value to obtain a total loss value;
After the system inputs the target detection sample into the initial model, a multitask loss function must be calculated for the sample, and optimization iterations of the model are carried out according to the result, so as to train the target model.
The calculation formula of the loss function for multitasking is shown as formula (1):
L_{det} = \lambda_1 L_1 + \lambda_2 L_2 + \lambda_3 L_3 + \lambda_4 L_4 + L_5    formula (1)

where L_1 is the xywh loss of the predictors responsible for an object, L_2 the xywh loss of the predictors not responsible for an object, L_3 the confidence loss of the predictors responsible for an object, L_4 the confidence loss of the predictors not responsible for an object, and L_5 the class loss of the predictors responsible for an object.

L_1 is calculated as formula (2):

L_1 = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 + (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right]

L_2 is calculated as formula (3):

L_2 = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left[ (x_{ij} - \hat{x}_{ij})^2 + (y_{ij} - \hat{y}_{ij})^2 + (w_{ij} - \hat{w}_{ij})^2 + (h_{ij} - \hat{h}_{ij})^2 \right]

L_3 is calculated as formula (4):

L_3 = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( conf_{ij} - \widehat{conf}_{ij} \right)^2

L_4 is calculated as formula (5):

L_4 = \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( conf_{ij} - \widehat{conf}_{ij} \right)^2

L_5 is calculated as formula (6):

L_5 = -\sum_{i=0}^{s^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left[ \hat{p}_i(c) \log p_i(c) + (1 - \hat{p}_i(c)) \log (1 - p_i(c)) \right]

where λ is the weight of the corresponding loss term, s represents the size of the divided grid, B represents the number of boxes per grid cell, x and y are the center coordinates of the prediction box, w and h are the width and height of the prediction box, the hat (^) marks the corresponding label value, conf is the confidence of the prediction box, and p(c) is the predicted category; formula (6) uses the cross-entropy loss when calculating the class loss. The label here refers to the circumscribed prediction rectangular box of the detection target.
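As a concrete reading of formulas (1)-(6), the sketch below computes the five terms and their weighted sum in PyTorch, assuming predictions and labels are flattened per predictor box, that pred_cls has already passed through a sigmoid, and that obj_mask marks the boxes responsible for an object; the lambda weights are illustrative, the patent does not give their values:

```python
import torch
import torch.nn.functional as F

def multitask_loss(pred_xywh, pred_conf, pred_cls,
                   tgt_xywh, tgt_conf, tgt_cls,
                   obj_mask, lambdas=(5.0, 0.5, 1.0, 0.5)):
    noobj = ~obj_mask
    l1 = ((pred_xywh - tgt_xywh)[obj_mask] ** 2).sum()   # formula (2)
    l2 = ((pred_xywh - tgt_xywh)[noobj] ** 2).sum()      # formula (3)
    l3 = ((pred_conf - tgt_conf)[obj_mask] ** 2).sum()   # formula (4)
    l4 = ((pred_conf - tgt_conf)[noobj] ** 2).sum()      # formula (5)
    l5 = F.binary_cross_entropy(                         # formula (6), cross entropy
        pred_cls[obj_mask], tgt_cls[obj_mask], reduction='sum')
    lam1, lam2, lam3, lam4 = lambdas
    return lam1 * l1 + lam2 * l2 + lam3 * l3 + lam4 * l4 + l5
```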
218. The system generates an initial model input times value;
the system generates an input-count value that records how many times the currently trained target sample has been input into the initial model for training. After each group of target detection samples finishes training, the next group of training samples is marked as the target detection sample, and the generated input count is reset.
219. The system judges whether the input numerical value of the initial model is greater than 1; if yes, go to step 221; if not, go to step 220;
as the diversity of the initial model training process needs to be ensured, multiple times of input training needs to be performed, so that the system needs to judge whether the input times of the initial model is greater than 1, if so, the step 221 is executed; if not, go to step 220.
220. The system updates the initial model parameters according to the total loss value, extracts another training sample from the training sample set, and carries out the following steps: inputting the updated initial model, calculating the total loss value and judging whether the total loss value reaches a preset value;
When the system judges that the initial model input times value is not greater than 1, it determines that the training result of the initial model is still singular and needs further training. The system therefore updates the initial model parameters according to the total loss value, extracts another training sample from the training sample set, and performs the following steps: inputting it into the updated initial model, calculating the total loss value, and judging whether the total loss value reaches the preset value.
221. The system judges whether the total loss value reaches a preset value; if yes, go to step 222; if not, go to step 223;
when the system judges that the input numerical value of the initial model is greater than 1, the training result of the initial model is determined to have diversity, and the system can judge whether the total loss value reaches a preset value; if yes, go to step 222; if not, go to step 223.
222. The system determines the initial model as a target model;
When the system judges that the total loss value reaches the preset value, it determines that the initial network has converged, and the trained initial model becomes the target model.
223. The system updates the initial model parameters according to the total loss value, re-inputs the target detection sample into the updated initial model, and carries out the following steps: calculating the total loss value and judging whether the total loss value is smaller than a preset value;
when the system judges that the total loss value does not reach the preset value, the initial network is determined not to be converged, the system needs to continuously update the initial model parameters according to the total loss value, the target detection sample is input to the updated initial model again, and the steps are carried out: and calculating the total loss value and judging whether the total loss value is smaller than a preset value.
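Steps 218-223 amount to a loop that keeps updating the model until the total loss reaches the preset value; a compressed sketch follows (the preset value of 0.05 is an assumption, as the patent gives no number):

```python
import random

def train_until_converged(model, train_set, loss_fn, optimizer, preset=0.05):
    """Repeat: draw a target detection sample, compute the total loss,
    update the parameters, and stop once the loss reaches the preset value."""
    while True:
        image, labels = random.choice(train_set)
        loss = loss_fn(model(image), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if loss.item() <= preset:
            return model        # the converged initial model is the target model
```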
224. The system acquires high-order video data through a high-order camera;
before formally acquiring an image containing a vehicle to be detected, which is obtained through a high-level video, the system needs to acquire the high-level video through a high-level camera.
225. Intercepting continuous frame images of high-order video data within preset time by a system;
226. the system acquires an image containing information of a pre-detected vehicle through continuous frame images;
To ensure that an image containing the vehicle to be detected can be acquired, the system may be configured to capture consecutive frame images of the high-order video data within a certain period, and then perform recognition and analysis on the objects in the consecutive frames, thereby acquiring the image containing the information of the vehicle to be detected.
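A minimal OpenCV sketch of steps 224-226 (the stream URL, duration and frame rate are illustrative assumptions):

```python
import cv2

def grab_consecutive_frames(stream_url, seconds=2.0, fps=25):
    """Capture consecutive frames from the high-mounted camera for a preset
    period; the frames are then passed to recognition and analysis."""
    cap = cv2.VideoCapture(stream_url)
    frames = []
    for _ in range(int(seconds * fps)):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```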
227. The system inputs the image containing information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training on the category and position feature information of the vehicles and vehicle license plates in a learning sample set;
228. the system generates target information data according to the detection result of the target model, wherein the target information data are the types and the position data of the vehicles and the license plates of the vehicles to be detected.
Steps 227 and 228 in this embodiment are similar to steps 102 and 103 in the previous embodiment, and are not described again here.
In this embodiment, before generating the initial model based on the Darknet53 network, the system performs channel pruning on the Darknet53 network to obtain a narrower network, and connects the SE module behind the feature extraction layer, so that the model can screen out, from among many features, the information most critical to the current detection task and suppress other useless features.
As described above, in the embodiment of the present application, the method for detecting a high-order video vehicle and a license plate is described, and then, a device for detecting a high-order video vehicle and a license plate in the embodiment of the present application is described below:
referring to fig. 3, an embodiment of a device for detecting a high-order video vehicle and a license plate in an embodiment of the present application includes:
a first obtaining unit 301, configured to obtain, through high-order video, an image including information of a vehicle to be detected, where the information includes the category and position information of the vehicle and of the vehicle's license plate;
the data input unit 302 is used for inputting an image containing information of a vehicle to be detected into a target model, wherein the target model is a model obtained by training the types and the position characteristic information of the vehicle and the license plate of the vehicle in a learning sample set;
the generating unit 303 is configured to generate target information data according to a detection result of the target model, where the target information data is data of a type and a position of a vehicle of the pre-detected vehicle and a license plate of the vehicle.
In the embodiment of the application, after the first obtaining unit 301 obtains the image including the category and the position information of the vehicle and the license plate, the image is input to the target model through the data input unit 302, and after the target model outputs the detection result, the generation unit 303 generates the target information data from the detection result output by the target model. Therefore, on the premise of not occupying a large amount of storage space and computing resources, images needing to be detected are input into the target model, the target model is obtained by training the learning sample set vehicle and vehicle license plate category and position characteristic information, corresponding detection results can be output, required information data are generated according to the detection results, and the real-time detection of the category and position information of the high-order video vehicle and the license plate is achieved.
Referring to fig. 4, another embodiment of a device for detecting a high-order video vehicle and a license plate in an embodiment of the present application includes:
a second obtaining unit 401, configured to obtain a positive sample set and a negative sample set, where the positive sample set is an image sample set including a vehicle, and the negative sample set is an image sample set not including a vehicle;
a sample generation unit 402, configured to extract one image from each of the positive sample set and the negative sample set, cover the image extracted from the negative sample set in a non-vehicle position area of the image extracted from the positive sample set, obtain a training image, and generate a training sample set;
a sample extracting unit 403, configured to extract a training sample from the training sample set as a target detection sample;
a channel pruning unit 404, configured to prune the Darknet53 neural network in a channel pruning manner;
the connection unit 405 is configured to connect the feature extraction layer of the pruning-processed Darknet53 network with an SE module, where the SE module is configured to learn correlation between channels and filter feature pertinence of the channels;
a model generating unit 406, configured to generate an initial model using the pruned Darknet53 as a base network;
a sample input unit 407, configured to input a target detection sample into an initial model to generate training data, where the initial model is a model established based on a neural network, and the training data is data including a vehicle and category and position information of a license plate of the vehicle;
a calculation unit 408 for calculating a total loss value from the training data, the image containing the vehicle information, and the image not containing the vehicle information;
an input frequency generation unit 409, configured to generate an initial model input frequency value, where the initial model input frequency value is a frequency for inputting a target detection sample into an initial model;
a first judging unit 410, configured to judge whether the initial model input order value is greater than 1;
a first executing unit 411, configured to, when the first determining unit 410 determines that the initial model input order value is not greater than 1, update the initial model parameters according to the total loss value, extract another training sample from the training sample set, and perform the following steps: inputting the updated initial model, calculating the total loss value and judging whether the total loss value reaches a preset value;
a second judging unit 412, configured to, when the first judging unit 410 determines that the initial model input times value is greater than 1, judge whether the total loss value reaches a preset value;
a third executing unit 413, configured to, when the second determining unit 412 determines that the total loss value does not reach the preset value, update the initial model parameters according to the total loss value, re-input the target detection sample into the updated initial model, and perform the following steps: calculating the total loss value and judging whether the total loss value is smaller than a preset value;
a second executing unit 414, configured to determine that the initial model is the target model when the second determining unit 412 determines that the total loss value reaches the preset value;
a first obtaining unit 415, configured to obtain, through the high-level video, an image including information of a vehicle to be detected, where the information of the vehicle to be detected includes a category and a position of the vehicle and a license plate of the vehicle;
a data input unit 416, configured to input an image including information of a vehicle to be detected into a target model, where the target model is a model obtained by training classes and position feature information of vehicles and license plates of the vehicles in a learning sample set;
the generating unit 417 is configured to generate target information data according to a detection result of the target model, where the target information data is data of a vehicle of the pre-detected vehicle, and a type and a position of a license plate of the vehicle.
In this embodiment, the channel pruning unit 404 includes a parameter processing module 4041, a pruning frequency generation module 4042, a pruning times judging module 4043, a fourth executing module 4044, a third judgment execution module 4045, a fifth executing module 4046, and a sixth executing module 4047.
The parameter processing module 4041 is configured to introduce a scaling factor to the Darknet53 neural network channel, multiply the scaling factor by an output of the channel, and combine the network weight and the scaling factor to generate a pruning network function, where the scaling factor is a parameter γ of the BN layer;
a pruning frequency generation module 4042, configured to generate a pruning training frequency value, where the pruning training frequency value is a frequency at which the target detection sample is input to the pruning network function;
the pruning times judging module 4043 is used for judging whether the pruning training times value is greater than 1;
a fourth executing module 4044, configured to determine, when the pruning times judging module 4043 determines that the pruning training times value is greater than 1, whether the loss value of the scaling factor is equal to the scaling factor loss value calculated in the previous time;
a third judgment execution module 4045, configured to update the parameter γ of the BN layer and extract another training sample from the training sample set when the pruning times judgment module 4043 determines that the pruning training times value is not greater than 1, and perform the following steps: inputting the updated pruning network function, performing loss calculation of the updated scaling factor, and judging whether the updated scaling factor loss value is equal to the scaling factor loss value calculated at the previous time;
a fifth executing module 4046, configured to, when the third judgment execution module 4045 determines that the updated scaling factor loss value is equal to the previously calculated scaling factor loss value, set a pruning threshold, sort the channels according to the γ weights of the BN layer, prune the channels whose scaling factors are smaller than the pruning threshold, and reserve the channels whose scaling factors are larger than the pruning threshold;
a sixth executing module 4047, configured to, when the third judgment execution module 4045 determines that the updated scaling factor loss value is not equal to the previously calculated scaling factor loss value, update the parameter γ of the BN layer according to the scaling factor loss value, re-input the target detection sample into the updated pruning network function, and perform the following steps: calculating the updated scaling factor loss and judging whether the updated scaling factor loss value is equal to the previously calculated scaling factor loss value.
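Modules 4041 to 4047 follow the network-slimming recipe: the BN scaling factor γ multiplies each channel's output, an L1 term drives unimportant γ values toward zero during training, and a threshold over the sorted γ values decides which channels survive. A minimal PyTorch sketch follows; the sparsity weight and the 30% pruning ratio are illustrative assumptions, not values from this application.

import torch
import torch.nn as nn

def pruning_network_loss(model, task_loss, l1_weight=1e-4):
    # Module 4041: combine the network weights and the scaling factors by
    # adding an L1 penalty on every BN gamma, so that low-importance
    # channels are driven toward zero during sparsity training.
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            task_loss = task_loss + l1_weight * m.weight.abs().sum()
    return task_loss

def channel_keep_masks(model, prune_ratio=0.3):
    # Module 4046: sort all BN gammas, place the pruning threshold at the
    # chosen ratio, and keep only the channels whose gamma clears it.
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.sort(gammas).values[int(len(gammas) * prune_ratio)]
    return {name: m.weight.detach().abs() >= threshold  # True = keep
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}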
In this embodiment, the first obtaining unit 415 includes a second obtaining module 4151, an image capturing module 4152, and a third obtaining module 4153.
A second obtaining module 4151, configured to obtain high-order video data through a high-order camera;
an image capturing module 4152, configured to intercept consecutive frame images of the high-order video data within a preset time;
a third obtaining module 4153, configured to obtain the image containing the information of the vehicle to be detected from the consecutive frame images.
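A minimal sketch of modules 4151 to 4153 using OpenCV; the video path and the length of the preset time window are illustrative assumptions.

import cv2

def capture_consecutive_frames(video_path, preset_seconds=2.0):
    # Module 4151 obtains the high-order video data; modules 4152/4153
    # intercept the consecutive frames within the preset time window.
    capture = cv2.VideoCapture(video_path)
    fps = capture.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if unreported
    frames = []
    for _ in range(int(fps * preset_seconds)):
        ok, frame = capture.read()
        if not ok:
            break
        frames.append(frame)
    capture.release()
    return frames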
In the above embodiment, the functions of each unit and each module correspond to the steps in the embodiment shown in fig. 2, and are not described herein again.
Referring to fig. 5, the high-order video vehicle and license plate detection apparatus in the embodiment of the present application is described in detail below; another embodiment of the high-order video vehicle and license plate detection apparatus in the embodiment of the present application includes:
a processor 501, a memory 502, an input/output unit 503, and a bus 504;
the processor 501 is connected to the memory 502, the input/output unit 503, and the bus 504;
the processor 501 specifically executes the following operations:
acquiring, through a high-order video, an image containing information of a vehicle to be detected, wherein the information of the vehicle to be detected comprises the category and position information of the vehicle and of the vehicle's license plate;
inputting the image containing the information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training on the category and position feature information of the vehicles and vehicle license plates in a learning sample set;
and generating target information data according to the detection result of the target model, wherein the target information data are the category and position data of the vehicle to be detected and of its license plate.
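The three operations executed by the processor 501 amount to a short inference pipeline. The sketch below assumes the trained target model returns class indices, confidence scores, and boxes per detection; that return signature and the label set are assumptions of the sketch, not details from this application.

import torch

CLASS_NAMES = ["vehicle", "license_plate"]  # assumed label set

@torch.no_grad()
def detect_vehicle_and_plate(target_model, image_tensor, score_threshold=0.5):
    # Run the target model on one image acquired from the high-order
    # video and emit the category and position data per detection.
    target_model.eval()
    classes, scores, boxes = target_model(image_tensor.unsqueeze(0))
    target_information = []
    for c, s, b in zip(classes, scores, boxes):
        if s.item() >= score_threshold:
            target_information.append({
                "category": CLASS_NAMES[int(c)],
                "position": [round(v, 1) for v in b.tolist()],
            })
    return target_information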
In this embodiment, the functions of the processor 501 correspond to the steps in the embodiments described in fig. 1 to fig. 4, and are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

Claims (10)

1. A detection method for high-order video vehicles and license plates is characterized by comprising the following steps:
acquiring, through a high-order video, an image containing information of a vehicle to be detected, wherein the information of the vehicle to be detected comprises the category and position information of the vehicle and of the vehicle's license plate;
inputting the image containing the information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training on the category and position feature information of the vehicles and the vehicle license plates in a learning sample set;
and generating target information data according to the detection result of the target model, wherein the target information data are the category and position data of the vehicle to be detected and of its license plate.
2. The detection method according to claim 1, wherein before the acquiring, through the high-order video, the image containing the information of the vehicle to be detected, the detection method further comprises:
acquiring a positive sample set and a negative sample set, wherein the positive sample set is a set of image samples containing vehicles, and the negative sample set is a set of image samples containing no vehicles;
extracting one image from each of the positive sample set and the negative sample set, covering a non-vehicle area of the image extracted from the positive sample set with the image extracted from the negative sample set to obtain a training image, and generating a training sample set;
extracting a training sample from the training sample set as a target detection sample;
inputting the target detection sample into an initial model to generate training data, wherein the initial model is a model established based on a neural network, and the training data is data containing the category and position information of a vehicle and a vehicle license plate;
calculating a total loss value according to the training data, the image containing the vehicle information and the image not containing the vehicle information;
generating an initial model input count, wherein the initial model input count is the number of times the target detection sample has been input into the initial model;
judging whether the initial model input count is greater than 1; if not, updating the initial model parameters according to the total loss value, extracting another training sample from the training sample set, and performing the following steps: inputting the sample into the updated initial model, calculating the total loss value, and judging whether the total loss value reaches a preset value;
if the initial model input count is greater than 1, judging whether the total loss value reaches the preset value;
and if the total loss value reaches the preset value, determining the initial model as the target model.
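For illustration, the sample-composition step recited in claim 2, covering a non-vehicle region of a positive image with a patch cut from a negative image, can be sketched in Python as below. The claim does not specify how the non-vehicle region is chosen, so the rejection sampling and patch size here are assumptions.

import numpy as np

def compose_training_image(positive, negative, vehicle_boxes, patch=(64, 64)):
    # positive/negative: HxWx3 uint8 arrays; vehicle_boxes: (x1, y1, x2, y2)
    # pixel boxes that the pasted negative patch must not overlap.
    ph, pw = patch
    rng = np.random.default_rng()
    for _ in range(100):  # rejection-sample a non-vehicle location
        y = int(rng.integers(0, positive.shape[0] - ph))
        x = int(rng.integers(0, positive.shape[1] - pw))
        overlaps = any(x < bx2 and x + pw > bx1 and y < by2 and y + ph > by1
                       for bx1, by1, bx2, by2 in vehicle_boxes)
        if not overlaps:
            training_image = positive.copy()
            training_image[y:y + ph, x:x + pw] = negative[:ph, :pw]
            return training_image
    return positive.copy()  # no free region found; keep the positive as-is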
3. The detection method according to claim 2, wherein after the judging whether the total loss value reaches the preset value, the detection method further comprises:
if the total loss value does not reach the preset value, updating the initial model parameters according to the total loss value, inputting the target detection sample into the updated initial model again, and performing the following steps: calculating the total loss value and judging whether the total loss value reaches the preset value.
4. The detection method according to claim 2, wherein the calculating a total loss value from the training data, the image containing vehicle information, and the image not containing vehicle information includes:
calculating an xywh total loss value of the training data, the image containing the vehicle information, and the image not containing the vehicle information;
calculating a confidence total loss value of the training data, the image containing the vehicle information and the image not containing the vehicle information;
calculating a category total loss value of the training data and the image containing the vehicle information;
and adding the xywh total loss value, the confidence total loss value, and the category total loss value to obtain the total loss value.
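For illustration, the three-term sum in claim 4 matches the YOLO-style objective that a Darknet53 backbone suggests. In the sketch below, the MSE box term and binary cross-entropy confidence and category terms are common defaults assumed here, not choices recited in the claim.

import torch.nn.functional as F

def total_loss(pred_xywh, true_xywh, pred_conf, true_conf, pred_cls, true_cls):
    loss_xywh = F.mse_loss(pred_xywh, true_xywh, reduction="sum")
    loss_conf = F.binary_cross_entropy_with_logits(pred_conf, true_conf,
                                                   reduction="sum")
    loss_cls = F.binary_cross_entropy_with_logits(pred_cls, true_cls,
                                                  reduction="sum")
    # Claim 4: the xywh, confidence, and category total loss values are
    # added to obtain the total loss value.
    return loss_xywh + loss_conf + loss_cls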
5. The detection method according to claim 3, wherein before the inputting the target detection sample into the initial model to generate training data, the detection method further comprises:
pruning the Darknet53 neural network in a channel pruning manner;
and generating the initial model by using the pruned Darknet53 as the base network.
6. The detection method according to claim 5, wherein the pruning the Darknet53 neural network in a channel pruning manner comprises:
introducing a scaling factor for each channel of the Darknet53 neural network, multiplying the scaling factor by the output of the channel, and combining the network weights and the scaling factors to generate a pruning network function, wherein the scaling factor is the parameter γ of the BN layer;
generating a pruning training times value, wherein the pruning training times value is the number of times the target detection sample has been input into the pruning network function;
judging whether the pruning training times value is greater than 1; if so, judging whether the scaling factor loss value is equal to the previously calculated scaling factor loss value;
if not, updating the parameter γ of the BN layer, extracting another training sample from the training sample set, and performing the following steps: inputting the sample into the updated pruning network function, calculating the updated scaling factor loss, and judging whether the updated scaling factor loss value is equal to the previously calculated scaling factor loss value;
and if the updated scaling factor loss value is equal to the previously calculated scaling factor loss value, setting a pruning threshold, sorting the channels according to the γ weights of the BN layer, pruning the channels whose scaling factors are smaller than the pruning threshold, and reserving the channels whose scaling factors are larger than the pruning threshold.
7. The detection method according to claim 6, wherein after the judging whether the updated scaling factor loss value is equal to the previously calculated scaling factor loss value, the detection method further comprises:
if the updated scaling factor loss value is not equal to the previously calculated scaling factor loss value, updating the parameter γ of the BN layer according to the scaling factor loss value, inputting the target detection sample into the updated pruning network function again, and performing the following steps: calculating the updated scaling factor loss and judging whether the updated scaling factor loss value is equal to the previously calculated scaling factor loss value.
8. The detection method according to claim 7, wherein before the generating the initial model by using the pruned Darknet53 as the base network, the detection method further comprises:
connecting the feature extraction layer of the pruned Darknet53 network with an SE module, wherein the SE module is configured to learn the correlation among channels and perform targeted screening of channel features.
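For illustration, the SE module of claim 8 is the standard squeeze-and-excitation block: global average pooling squeezes each channel to a scalar, two fully connected layers learn the inter-channel correlations, and a sigmoid gate re-weights the channels. A PyTorch sketch; the reduction ratio of 16 is the usual default and an assumption here.

import torch
import torch.nn as nn

class SEModule(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global spatial average
        self.excite = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel gate in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        weights = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        return x * weights  # targeted re-weighting of channel features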
9. The detection method according to any one of claims 1 to 8, wherein the acquiring, through the high-order video, the image containing the information of the vehicle to be detected comprises:
acquiring high-order video data through a high-order camera;
intercepting consecutive frame images of the high-order video data within a preset time;
and acquiring the image containing the information of the vehicle to be detected from the consecutive frame images.
10. A detection apparatus for high-order video vehicles and license plates, characterized by comprising:
a first acquisition unit, configured to acquire, through a high-order video, an image containing information of a vehicle to be detected, wherein the information of the vehicle to be detected is the category and position information of the vehicle and of the vehicle's license plate;
a data input unit, configured to input the image containing the information of the vehicle to be detected into a target model, wherein the target model is a model obtained by training on the category and position feature information of the vehicles and vehicle license plates in a learning sample set;
and a generating unit, configured to generate target information data according to the detection result of the target model, wherein the target information data are the category and position data of the vehicle to be detected and of its license plate.
CN202011541635.4A 2020-12-23 2020-12-23 High-order video vehicle and license plate detection method and device Pending CN112580536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011541635.4A CN112580536A (en) 2020-12-23 2020-12-23 High-order video vehicle and license plate detection method and device

Publications (1)

Publication Number Publication Date
CN112580536A true CN112580536A (en) 2021-03-30

Family

ID=75139137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011541635.4A Pending CN112580536A (en) 2020-12-23 2020-12-23 High-order video vehicle and license plate detection method and device

Country Status (1)

Country Link
CN (1) CN112580536A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635656A (en) * 2018-11-12 2019-04-16 平安科技(深圳)有限公司 Vehicle attribute recognition methods, device, equipment and medium neural network based
CN110633747A (en) * 2019-09-12 2019-12-31 网易(杭州)网络有限公司 Compression method, device, medium and electronic device for target detector
CN110909666A (en) * 2019-11-20 2020-03-24 西安交通大学 Night vehicle detection method based on improved YOLOv3 convolutional neural network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chai Jiangyun: "Research on Vehicle Attribute Recognition Methods in Traffic Surveillance", Wanfang Data, 5 November 2020 (2020-11-05), pages 3-4 *
Zhong Yueqi: "Principles and Applications of Artificial Intelligence Technology", Donghua University Press, 30 September 2020, pages 191-192 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569844A (en) * 2021-06-29 2021-10-29 深圳市捷顺科技实业股份有限公司 License plate detection method and device
CN113807302A (en) * 2021-09-26 2021-12-17 重庆紫光华山智安科技有限公司 License plate recognition algorithm accuracy rate testing method and device, electronic equipment and readable storage medium
CN116958952A (en) * 2023-07-11 2023-10-27 重庆大学 License plate target detection method suitable for expressway monitoring video
CN116958952B (en) * 2023-07-11 2024-04-30 重庆大学 License plate target detection method suitable for expressway monitoring video
CN117218613A (en) * 2023-11-09 2023-12-12 中远海运特种运输股份有限公司 Vehicle snapshot recognition system and method
CN117218613B (en) * 2023-11-09 2024-03-19 中远海运特种运输股份有限公司 Vehicle snapshot recognition system and method

Similar Documents

Publication Publication Date Title
CN112580536A (en) High-order video vehicle and license plate detection method and device
CN111126258B (en) Image recognition method and related device
CN109978893B (en) Training method, device, equipment and storage medium of image semantic segmentation network
CN110807385B (en) Target detection method, target detection device, electronic equipment and storage medium
CN110188635B (en) Plant disease and insect pest identification method based on attention mechanism and multi-level convolution characteristics
CN104268594B (en) A kind of video accident detection method and device
CN108596277A (en) A kind of testing vehicle register identification method, apparatus and storage medium
CN109101934A (en) Model recognizing method, device and computer readable storage medium
CN104063719A (en) Method and device for pedestrian detection based on depth convolutional network
CN111160481B (en) Adas target detection method and system based on deep learning
CN112257799A (en) Method, system and device for detecting household garbage target
CN112750147A (en) Pedestrian multi-target tracking method and device, intelligent terminal and storage medium
CN113641906A (en) System, method, device, processor and medium for realizing similar target person identification processing based on fund transaction relation data
CN114783021A (en) Intelligent detection method, device, equipment and medium for wearing of mask
CN115984537A (en) Image processing method and device and related equipment
CN114282607A (en) Double-sieve model-based dispersion trajectory analysis method and system
CN109543610B (en) Vehicle detection tracking method, device, equipment and storage medium
CN112818832B (en) Weak supervision object positioning device and method based on component perception
CN113469019B (en) Landscape image characteristic value calculation method, device, equipment and storage medium
CN110659384B (en) Video structured analysis method and device
Ma et al. Lane change analysis and prediction using mean impact value method and logistic regression model
CN117911679B (en) Hull identification system and method based on image enhancement and tiny target identification
Ogawa et al. Identifying Parking Lot Occupancy with YOLOv5
WO2022142922A1 (en) Road safety assessment method, video processing center, and storage medium
CN113378745A (en) People and vehicle counting method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination