CN110610123A - Multi-target vehicle detection method and device, electronic equipment and storage medium - Google Patents

Multi-target vehicle detection method and device, electronic equipment and storage medium

Info

Publication number
CN110610123A
CN110610123A CN201910614995.3A
Authority
CN
China
Prior art keywords
image
feature
size
detected
candidate region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910614995.3A
Other languages
Chinese (zh)
Inventor
傅慧源
马华东
耿欢
关俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN201910614995.3A
Publication of CN110610123A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 20/54: Surveillance or monitoring of activities, e.g. for recognising suspicious objects of traffic, e.g. cars on the road, trains or boats
    • G: PHYSICS
    • G08: SIGNALLING
    • G08G: TRAFFIC CONTROL SYSTEMS
    • G08G 1/00: Traffic control systems for road vehicles
    • G08G 1/01: Detecting movement of traffic to be counted or controlled
    • G08G 1/017: Detecting movement of traffic to be counted or controlled identifying vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a multi-target vehicle detection method and apparatus, an electronic device, and a storage medium, wherein the method comprises the following steps: acquiring an image to be detected; performing feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images of different sizes; analyzing the feature images of corresponding sizes respectively by using at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image; and analyzing the regions within the predicted vehicle frames in each feature image by using a pre-trained target classification network of the machine learning algorithm to obtain a vehicle detection result for the image to be detected. In this multi-target vehicle detection method, feature images of multiple sizes are analyzed, so vehicles of different sizes can be recognized and the success rate of recognizing vehicles of multiple sizes is improved.

Description

Multi-target vehicle detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a multi-target vehicle detection method and apparatus, an electronic device, and a storage medium.
Background
Target detection has long been an important component of, and challenge in, the field of computer vision. In recent years, with the rapid development of multimedia technology and the continuous advancement of image processing technology, a great number of excellent results have appeared in the field of target detection. Vehicle detection in traffic scenes, as a precondition for vehicle tracking and a basis for analyzing abnormal vehicle behavior, has become a popular topic in research fields such as intelligent transportation and autonomous driving.
With the advent of machine learning algorithms, particularly convolutional neural networks, computer vision techniques have evolved rapidly. In the prior art, a pre-trained convolutional neural network extracts a feature map of the image to be detected, and the position of a vehicle in the image is determined by pooling and classification. However, the prior art can only recognize vehicles of a fixed size, and the success rate of recognizing vehicles of multiple sizes is low.
Disclosure of Invention
The embodiment of the application aims to provide a multi-target vehicle detection method, a multi-target vehicle detection device, electronic equipment and a storage medium, so that the success rate of identifying multi-size vehicles is improved. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present application discloses a multi-target vehicle detection method, including:
acquiring an image to be detected;
performing feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, wherein the feature images are different in size;
respectively analyzing the feature images with corresponding sizes by utilizing at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image;
and analyzing the regions within the predicted vehicle frames in each feature image by using the pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected.
Optionally, the at least two feature images include a shallow feature image and a deep feature image, the size of the shallow feature image is larger than that of the deep feature image, and the at least two candidate region generators include a shallow candidate region generator and a deep candidate region generator;
the at least two candidate region generators using the pre-trained machine learning algorithm respectively analyze the feature images with corresponding sizes to obtain the predicted vehicle frame in each feature image, and the method comprises the following steps:
analyzing the shallow feature image through the shallow candidate region generator to obtain a predicted vehicle frame of the shallow feature image;
and analyzing the deep characteristic image through the deep candidate region generator to obtain a predicted vehicle frame of the deep characteristic image.
Optionally, the feature extraction network includes four convolutional layers, and performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm to obtain at least two feature images includes:
performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm, acquiring the shallow feature image from the second convolutional layer of the feature extraction network, and acquiring the deep feature image from the fourth convolutional layer of the feature extraction network.
Optionally, the size of the shallow feature image is 1/4 of the size of the image to be detected, and the size of the deep feature image is 1/16 of the size of the image to be detected.
Optionally, the at least two feature images include a first feature image, a second feature image and a third feature image, a size of the first feature image is larger than a size of the second feature image, a size of the second feature image is larger than a size of the third feature image, and the at least two candidate region generators include a first candidate region generator, a second candidate region generator and a third candidate region generator;
analyzing the feature images of corresponding sizes respectively by using the at least two candidate region generators of the pre-trained machine learning algorithm to obtain the predicted vehicle frame in each feature image includes the following steps:
analyzing the first characteristic image through the first candidate region generator to obtain a predicted vehicle frame of the first characteristic image;
analyzing the second characteristic image through the second candidate region generator to obtain a predicted vehicle frame of the second characteristic image;
and analyzing the third characteristic image through the third candidate region generator to obtain a predicted vehicle frame of the third characteristic image.
Optionally, the feature extraction network includes four convolutional layers, and performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm to obtain at least two feature images includes:
performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm, acquiring the first feature image from the second convolutional layer of the feature extraction network, acquiring the second feature image from the third convolutional layer of the feature extraction network, and acquiring the third feature image from the fourth convolutional layer of the feature extraction network.
Optionally, the size of the first feature image is 1/4 of the size of the image to be detected, the size of the second feature image is 1/8 of the size of the image to be detected, and the size of the third feature image is 1/16 of the size of the image to be detected.
Optionally, the candidate region generator is an RPN, and the target classification network is an RCNN.
In a second aspect, an embodiment of the present application provides a multi-target vehicle detection apparatus, including:
the image acquisition module is used for acquiring an image to be detected;
the feature extraction module is used for performing feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, wherein the feature images differ in size;
the detection frame prediction module is used for analyzing the feature images of corresponding sizes respectively by using at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image;
and the detection result generation module is used for analyzing the regions within the predicted vehicle frames in each feature image by using the pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected.
Optionally, the at least two feature images include a shallow feature image and a deep feature image, the size of the shallow feature image is larger than that of the deep feature image, and the at least two candidate region generators include a shallow candidate region generator and a deep candidate region generator;
the detection frame prediction module is specifically configured to:
analyzing the shallow feature image through the shallow candidate region generator to obtain a predicted vehicle frame of the shallow feature image;
and analyzing the deep characteristic image through the deep candidate region generator to obtain a predicted vehicle frame of the deep characteristic image.
Optionally, the feature extraction network includes four convolutional layers, and the feature extraction module is specifically configured to:
and utilizing a pre-trained feature extraction network of a machine learning algorithm to extract features of the image to be detected, acquiring the shallow feature image from a second layer of convolution layer of the feature extraction network, and acquiring the deep feature image from a fourth layer of convolution layer of the feature extraction network.
Optionally, the size of the shallow feature image is 1/4 of the size of the image to be detected, and the size of the deep feature image is 1/16 of the size of the image to be detected.
Optionally, the at least two feature images include a first feature image, a second feature image and a third feature image, a size of the first feature image is larger than a size of the second feature image, a size of the second feature image is larger than a size of the third feature image, and the at least two candidate region generators include a first candidate region generator, a second candidate region generator and a third candidate region generator;
the detection frame prediction module is specifically configured to:
analyzing the first characteristic image through the first candidate region generator to obtain a predicted vehicle frame of the first characteristic image;
analyzing the second characteristic image through the second candidate region generator to obtain a predicted vehicle frame of the second characteristic image;
and analyzing the third characteristic image through the third candidate region generator to obtain a predicted vehicle frame of the third characteristic image.
Optionally, the feature extraction network includes four convolutional layers, and the feature extraction module is specifically configured to:
and utilizing a pre-trained feature extraction network of a machine learning algorithm to extract features of the image to be detected, acquiring the first feature image from a second layer of convolution layer of the feature extraction network, acquiring the second feature image from a third layer of convolution layer of the feature extraction network, and acquiring the third feature image from a fourth layer of convolution layer of the feature extraction network.
Optionally, the size of the first feature image is 1/4 of the size of the image to be detected, the size of the second feature image is 1/8 of the size of the image to be detected, and the size of the third feature image is 1/16 of the size of the image to be detected.
Optionally, the candidate region generator is an RPN, and the target classification network is an RCNN.
In a third aspect, an embodiment of the present application further discloses an electronic device, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete mutual communication through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the multi-target vehicle detection method according to any one of the first aspect described above when executing the program stored in the memory.
In a fourth aspect, an embodiment of the present application further discloses a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the method for detecting multiple target vehicles according to any one of the above first aspects is implemented.
The embodiments of the present application provide a multi-target vehicle detection method and apparatus, an electronic device, and a storage medium. An image to be detected is acquired; feature extraction is performed on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images of different sizes; the feature images of corresponding sizes are analyzed respectively by at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image; and the regions within the predicted vehicle frames in each feature image are analyzed by a pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected. By analyzing feature images of multiple sizes, vehicles of different sizes can be recognized and the success rate of recognizing vehicles of multiple sizes is improved; moreover, because the predicted vehicle frames produced by the different candidate region generators are classified by the same target classification network, the complexity of the network can be reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram of a multi-target vehicle detection method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a feature extraction network according to an embodiment of the present application;
FIG. 3 is another schematic diagram of a multi-target vehicle detection method of an embodiment of the present application;
FIG. 4 is a schematic diagram of a multi-target vehicle detection apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In a conventional deep-learning detection network, the position of a vehicle is predicted from the feature map obtained after multi-layer feature extraction; in this process, the pixel values belonging to the vehicle are repeatedly compressed and abstracted. As a result, for a small vehicle target, only a small number of features remain after multi-layer feature extraction, which interferes with correct detection. On an actual urban road, surveillance cameras often capture vehicles moving from far to near, so vehicles appear in a wide range of sizes, and a classical target detection model has difficulty detecting large and small targets at the same time, leading to problems such as low detection accuracy.
It will be appreciated that, for small target vehicles, features are extracted effectively in the shallower feature maps and more of them are retained there. Smaller targets are therefore easier to detect on shallow features, while larger targets are easier to detect on deep features.
In order to realize the simultaneous detection of vehicle targets with different sizes, the embodiment of the application provides a multi-target vehicle detection method, and with reference to fig. 1, the method comprises the following steps:
and S101, acquiring an image to be detected.
The multi-target vehicle detection method can be implemented by an electronic device; specifically, the electronic device may be a smart camera, a digital video recorder, a server, or the like.
The electronic device acquires the image to be detected. The image to be detected should have a preset size; if it does not, it needs to be resized to the preset size. The preset size is the same as the size of the sample images used to train the machine learning algorithm and can be set according to actual conditions.
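As a purely illustrative sketch of this pre-processing step (the 512x512 preset size, the OpenCV dependency, and the function name are assumptions for the example, not requirements of the embodiment):

    import cv2

    PRESET_SIZE = (512, 512)  # (width, height); assumed here, must match the training sample size

    def load_image_to_detect(path):
        image = cv2.imread(path)               # H x W x 3 array in BGR order
        if image.shape[1::-1] != PRESET_SIZE:  # (width, height) of the loaded image
            image = cv2.resize(image, PRESET_SIZE)
        return image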
S102, extracting the features of the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, wherein the feature images are different in size.
Unlike the prior art, in which a feature image of only one scale is obtained, in the embodiments of the present application feature images of multiple sizes of the image to be detected are extracted from the feature extraction network of the pre-trained machine learning algorithm.
When the at least two feature images include a shallow feature image and a deep feature image, optionally, the feature extraction network includes four convolutional layers, and performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm to obtain at least two feature images includes:
performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm, acquiring the shallow feature image from the second convolutional layer of the feature extraction network, and acquiring the deep feature image from the fourth convolutional layer of the feature extraction network.
The feature extraction network includes four convolutional layers; each convolutional layer reduces the size of the image and deepens the features. For example, as shown in fig. 2, the image size becomes 1/2 of the original image size after the first convolutional layer, 1/4 after the second convolutional layer, 1/8 after the third convolutional layer, and 1/16 after the fourth convolutional layer.
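As an illustration only, a minimal PyTorch-style sketch of such a four-stage backbone; the channel widths, kernel sizes, and use of stride-2 convolutions are assumptions for the example and are not specified by this embodiment:

    import torch
    import torch.nn as nn

    class FourStageBackbone(nn.Module):
        # Four convolutional stages; each stage halves the spatial size of its input.
        def __init__(self):
            super().__init__()
            def stage(cin, cout):
                return nn.Sequential(
                    nn.Conv2d(cin, cout, kernel_size=3, stride=2, padding=1),
                    nn.BatchNorm2d(cout),
                    nn.ReLU(inplace=True),
                )
            self.stage1 = stage(3, 32)     # 1/2 of the input size
            self.stage2 = stage(32, 64)    # 1/4  -> shallow feature image
            self.stage3 = stage(64, 128)   # 1/8
            self.stage4 = stage(128, 256)  # 1/16 -> deep feature image

        def forward(self, x):
            c1 = self.stage1(x)
            c2 = self.stage2(c1)  # shallow feature image (1/4 of the input size)
            c3 = self.stage3(c2)
            c4 = self.stage4(c3)  # deep feature image (1/16), computed from c2
            return c2, c4

    # Example: a 512x512 input yields a 128x128 shallow map and a 32x32 deep map.
    # backbone = FourStageBackbone()
    # shallow, deep = backbone(torch.randn(1, 3, 512, 512))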
The feature image output by the second convolutional layer is taken as the shallow feature image, and the feature image output by the fourth convolutional layer is taken as the deep feature image. As noted above, for small target vehicles, features are extracted effectively and retained in greater numbers in the shallower feature maps, so smaller targets are easier to detect on shallow features and larger targets are easier to detect on deep features. Therefore, detection frames for smaller vehicles are extracted from the shallow feature image, and detection frames for larger vehicles are extracted from the deep feature image.
Optionally, the size of the shallow feature image is 1/4 of the size of the to-be-detected image, and the size of the deep feature image is 1/16 of the size of the to-be-detected image.
The size of the shallow feature image being 1/4 of the size of the image to be detected means that the shallow feature image is 1/4 of the preset size; likewise, the size of the deep feature image being 1/16 of the size of the image to be detected means that the deep feature image is 1/16 of the preset size. Since the one-sixteenth-size feature map is computed from the one-quarter-size feature map, the two candidate region generators are fused into one network, part of the computation is shared, and detection is completed with a single input.
The inventors have found through research that when the size of the shallow feature image is 1/4 of the size of the image to be detected and the size of the deep feature image is 1/16 of the size of the image to be detected, the detection of both large-size and small-size vehicles is accommodated while the complexity of the machine learning algorithm remains low.
When the at least two feature images include a first feature image, a second feature image and a third feature image, in a possible implementation for complex scenes, the feature extraction network includes four convolutional layers, and performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm to obtain at least two feature images includes:
performing feature extraction on the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm, acquiring the first feature image from the second convolutional layer of the feature extraction network, acquiring the second feature image from the third convolutional layer, and acquiring the third feature image from the fourth convolutional layer.
Optionally, the size of the first feature image is 1/4 of the size of the image to be detected, the size of the second feature image is 1/8 of the size of the image to be detected, and the size of the third feature image is 1/16 of the size of the image to be detected. The inventors have found through research that with these sizes the method is well suited to scenes with complex backgrounds and achieves a good detection effect.
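For the three-scale variant, the same backbone sketch simply also exposes the 1/8-scale output of the third convolutional layer (again an assumption-level illustration building on the example above):

    # Three-scale variant of the FourStageBackbone sketch above:
    # the 1/8-scale feature map of the third stage is returned as well.
    def forward_three_scale(backbone, x):
        c1 = backbone.stage1(x)
        c2 = backbone.stage2(c1)  # 1/4  -> first feature image
        c3 = backbone.stage3(c2)  # 1/8  -> second feature image
        c4 = backbone.stage4(c3)  # 1/16 -> third feature image
        return c2, c3, c4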
S103, analyzing the feature images of corresponding sizes respectively by using at least two candidate region generators of the pre-trained machine learning algorithm to obtain the predicted vehicle frame in each feature image.
The feature image of each size corresponds to one candidate region generator, and during training each candidate region generator is trained on the feature images of its corresponding size extracted from the sample images.
In one possible embodiment, the at least two feature images include a shallow feature image and a deep feature image, the shallow feature image has a size larger than that of the deep feature image, and the at least two candidate region generators include a shallow candidate region generator and a deep candidate region generator;
the above-mentioned at least two candidate region generators using the above-mentioned machine learning algorithm trained in advance, analyze the characteristic image of the corresponding size separately, get the vehicle frame of prediction in every above-mentioned characteristic image, including:
and step one, analyzing the shallow feature image through the shallow candidate area generator to obtain a predicted vehicle frame of the shallow feature image.
And step two, analyzing the deep characteristic image through the deep candidate region generator to obtain a predicted vehicle frame of the deep characteristic image.
In one possible embodiment, the at least two feature images include a first feature image, a second feature image, and a third feature image, the first feature image has a size larger than that of the second feature image, the second feature image has a size larger than that of the third feature image, and the at least two candidate region generators include a first candidate region generator, a second candidate region generator, and a third candidate region generator;
the above-mentioned at least two candidate region generators using the above-mentioned machine learning algorithm trained in advance, analyze the characteristic image of the corresponding size separately, get the vehicle frame of prediction in every above-mentioned characteristic image, including:
the first candidate region generator analyzes the first feature image to obtain a predicted vehicle frame of the first feature image.
And step two, analyzing the second characteristic image through the second candidate region generator to obtain a predicted vehicle frame of the second characteristic image.
And step three, analyzing the third characteristic image through the third candidate region generator to obtain a predicted vehicle frame of the third characteristic image.
During training, the shallow candidate region generator is trained on the shallow feature images of the sample images, and the deep candidate region generator is trained on the deep feature images of the sample images. Specifically, the candidate region generator may be an RPN (Region Proposal Network).
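As a rough sketch of one such candidate region generator per scale (a standard RPN head; the 3x3 convolution, the anchor count of 9 and the channel widths are assumptions, and anchor generation and box decoding are omitted):

    import torch.nn as nn

    class RPNHead(nn.Module):
        # One candidate region generator attached to one feature scale.
        def __init__(self, in_channels, num_anchors=9):
            super().__init__()
            self.conv = nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1)
            self.relu = nn.ReLU(inplace=True)
            self.cls = nn.Conv2d(in_channels, num_anchors, kernel_size=1)      # objectness per anchor
            self.reg = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)  # box offsets per anchor

        def forward(self, feature_map):
            t = self.relu(self.conv(feature_map))
            return self.cls(t), self.reg(t)

    # shallow_rpn = RPNHead(in_channels=64)   # applied to the 1/4-scale feature image
    # deep_rpn    = RPNHead(in_channels=256)  # applied to the 1/16-scale feature image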
S104, analyzing the regions within the predicted vehicle frames in each feature image by using the pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected.
The regions within the predicted vehicle frames in each feature image are classified by the pre-trained target classification network of the machine learning algorithm to judge whether each region is a vehicle. For any predicted vehicle frame, if the region within it is a vehicle, the frame is mapped to the corresponding position in the image to be detected; otherwise, the frame is discarded. Specifically, the target classification network may be an RCNN (Region-based Convolutional Neural Network).
Since the predicted vehicle frames generated by the different candidate region generators may overlap, in one possible embodiment duplicate vehicle detection results from the candidate region generators are eliminated by non-maximum suppression to obtain the vehicle detection result of the image to be detected.
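A minimal post-processing sketch using torchvision's non-maximum suppression operator; the score and IoU thresholds are illustrative assumptions:

    from torchvision.ops import nms

    def merge_detections(boxes, vehicle_scores, score_thr=0.5, iou_thr=0.5):
        # boxes: [N, 4] predicted vehicle frames already mapped back to image coordinates.
        # vehicle_scores: [N] vehicle-class scores from the target classification network.
        keep = vehicle_scores > score_thr   # drop regions not classified as vehicles
        boxes, scores = boxes[keep], vehicle_scores[keep]
        kept = nms(boxes, scores, iou_thr)  # suppress duplicate detections of the same vehicle
        return boxes[kept], scores[kept]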
In the embodiments of the present application, by analyzing feature images of multiple sizes, vehicles of different sizes can be recognized and the success rate of recognizing vehicles of multiple sizes is improved. Moreover, the predicted vehicle frames produced by the different candidate region generators are classified by the same target classification network, which reduces the complexity of the network compared with configuring a separate classification network for each candidate region generator.
In one possible implementation, a multi-target vehicle detection method provided by the embodiment of the present application may be as shown in fig. 3, and includes:
step one, an image to be detected is obtained.
Step two, extracting the features of the image to be detected by utilizing a pre-trained feature extraction network of a machine learning algorithm, acquiring a shallow feature image from a second layer of convolution layer of the feature extraction network, and acquiring a deep feature image from a fourth layer of convolution layer of the feature extraction network, wherein the feature extraction network comprises four layers of convolution layers; the size of the shallow feature image is 1/4 of the size of the image to be detected, and the size of the deep feature image is 1/16 of the size of the image to be detected.
Analyzing the shallow feature image through a shallow candidate region generator of a pre-trained machine learning algorithm to obtain a predicted vehicle frame of the shallow feature image; and analyzing the deep characteristic image through a deep candidate region generator of a pre-trained machine learning algorithm to obtain a predicted vehicle frame of the deep characteristic image.
And step four, analyzing the regions in the predicted vehicle frame of the shallow characteristic image and the predicted vehicle frame of the deep characteristic image respectively by using a pre-trained target classification network of a machine learning algorithm to obtain the vehicle detection result of the image to be detected.
In this embodiment, the one-sixteenth-size feature map is computed from the one-quarter-size feature map, the two candidate region generators are fused into one network, part of the computation is shared, and detection is completed with a single input. By analyzing feature images of multiple sizes, vehicles of different sizes can be recognized and the success rate of recognizing vehicles of multiple sizes is improved; classifying the predicted vehicle frames from the different candidate region generators with the same target classification network reduces the complexity of the network compared with configuring a separate classification network for each candidate region generator.
In the following, the process of pre-training the machine learning algorithm is described, taking two candidate region generators as an example:
and step A, constructing an initial machine learning algorithm, which comprises a shared feature extraction network, two candidate region generators and a target classification network.
The backbone network of the machine learning algorithm is implemented with MobileNet, a lightweight, real-time, high-performance network structure; the inventors' experiments show that it is well suited to the detection of targets of multiple sizes. Of course, other types of neural networks, such as VGG16 (Visual Geometry Group network) and ResNet, can also be used. The shared feature extraction network consists of several convolutional layers; each convolutional layer yields a feature sample of a specific size, so the final feature sample is smaller and contains deeper features of the original image. Specifically, the feature extraction network includes four convolutional layers, one candidate region generator is cascaded after the second convolutional layer, and the other candidate region generator is cascaded after the fourth convolutional layer. The candidate region generator may be an RPN, and the target classification network may be an RCNN.
And B, acquiring a plurality of sample images, and labeling the vehicles in each sample image, wherein the vehicles in the sample images are different in size.
The sample images may be from real road monitoring scene images, the size of the vehicles in the plurality of sample images should be as large as possible, and the vehicle frame is labeled for the vehicle in each sample image.
Step C, inputting the labeled sample images into the initial machine learning algorithm for training to obtain the pre-trained machine learning algorithm.
The parameters of the shared feature extraction network are adjusted according to the difference between the regions where targets may exist, generated by the candidate region generators, and the manually labeled ground truth (vehicle frames). When the number of iterations reaches a first preset number, the feature extraction network is obtained. The "regions of interest" (the regions of the predicted vehicle frames) generated by the candidate region generators from the multi-scale feature maps are subjected to ROI pooling to produce pooled feature samples. Each pooled feature sample is processed by the target classification network to obtain a vehicle prediction. The parameters of the shared feature extraction network and the detector network are then adjusted according to the difference between the regions predicted by the target classification network and the manually labeled ground truth. When the number of iterations reaches a second preset number, the feature extraction network and the detector network are obtained.
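As an assumption-level sketch of the shared detection head described above: proposals from both generators are ROI-pooled from their respective feature maps and scored by one head; the 1x1 projection convolutions, the 7x7 pooled size, the hidden width of 1024 and the two-class output are all assumptions for the example.

    import torch.nn as nn
    from torchvision.ops import roi_pool

    class SharedDetectionHead(nn.Module):
        # Shared target classification network: ROI pooling, classification and box refinement.
        def __init__(self, shallow_channels, deep_channels, common_channels=256, num_classes=2):
            super().__init__()
            # Assumed: 1x1 convolutions bring both feature maps to a common width
            # so that a single head can score proposals from either generator.
            self.proj_shallow = nn.Conv2d(shallow_channels, common_channels, kernel_size=1)
            self.proj_deep = nn.Conv2d(deep_channels, common_channels, kernel_size=1)
            self.fc = nn.Sequential(
                nn.Linear(common_channels * 7 * 7, 1024),
                nn.ReLU(inplace=True),
            )
            self.cls = nn.Linear(1024, num_classes)      # vehicle vs. background
            self.reg = nn.Linear(1024, num_classes * 4)  # refined box coordinates

        def forward(self, feature_map, rois, spatial_scale, is_shallow):
            # rois: [K, 5] rows of (batch_index, x1, y1, x2, y2) in image coordinates;
            # spatial_scale maps them onto the feature map (1/4 shallow, 1/16 deep).
            proj = self.proj_shallow if is_shallow else self.proj_deep
            pooled = roi_pool(proj(feature_map), rois, output_size=(7, 7),
                              spatial_scale=spatial_scale)
            x = self.fc(pooled.flatten(1))
            return self.cls(x), self.reg(x)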
An embodiment of the present application provides a multi-target vehicle detection apparatus; referring to fig. 4, the apparatus includes:
an image obtaining module 401, configured to obtain an image to be detected;
a feature extraction module 402, configured to perform feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, where sizes of the feature images are different;
a detection frame prediction module 403, configured to analyze feature images with corresponding sizes respectively by using at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each of the feature images;
a detection result generating module 404, configured to analyze, by using the pre-trained target classification network of the machine learning algorithm, a region in the predicted vehicle frame in each of the feature images to obtain a vehicle detection result of the image to be detected.
Optionally, the at least two feature images include a shallow feature image and a deep feature image, the size of the shallow feature image is larger than that of the deep feature image, and the at least two candidate region generators include a shallow candidate region generator and a deep candidate region generator;
the detection frame prediction module 403 is specifically configured to:
analyzing the shallow feature image through the shallow candidate area generator to obtain a predicted vehicle frame of the shallow feature image;
and analyzing the deep characteristic image through the deep candidate region generator to obtain a predicted vehicle frame of the deep characteristic image.
Optionally, the feature extraction network includes four convolutional layers, and the feature extraction module 402 is specifically configured to:
and extracting the features of the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm, acquiring the shallow feature image from a second layer of convolution layer of the feature extraction network, and acquiring the deep feature image from a fourth layer of convolution layer of the feature extraction network.
Optionally, the size of the shallow feature image is 1/4 of the size of the to-be-detected image, and the size of the deep feature image is 1/16 of the size of the to-be-detected image.
Optionally, the at least two feature images include a first feature image, a second feature image, and a third feature image, a size of the first feature image is larger than a size of the second feature image, a size of the second feature image is larger than a size of the third feature image, and the at least two candidate region generators include a first candidate region generator, a second candidate region generator, and a third candidate region generator;
the detection frame prediction module 403 is specifically configured to:
analyzing the first characteristic image through the first candidate region generator to obtain a predicted vehicle frame of the first characteristic image;
analyzing the second characteristic image through the second candidate region generator to obtain a predicted vehicle frame of the second characteristic image;
and analyzing the third feature image by the third candidate region generator to obtain a predicted vehicle frame of the third feature image.
Optionally, the feature extraction network includes four convolutional layers, and the feature extraction module 402 is specifically configured to:
the image to be detected is subjected to feature extraction by using a feature extraction network of a pre-trained machine learning algorithm, the first feature image is acquired from a second layer convolution layer of the feature extraction network, the second feature image is acquired from a third layer convolution layer of the feature extraction network, and the third feature image is acquired from a fourth layer convolution layer of the feature extraction network.
Optionally, the size of the first feature image is 1/4 of the size of the to-be-detected image, the size of the second feature image is 1/8 of the size of the to-be-detected image, and the size of the third feature image is 1/16 of the size of the to-be-detected image.
Optionally, the candidate region generator is an RPN, and the target classification network is an RCNN.
The embodiment of the application also discloses an electronic device, as shown in fig. 5. Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 complete communication with each other through the communication bus 504;
a memory 503 for storing a computer program;
the processor 501 is configured to implement the following method steps when executing the program stored in the memory 503:
acquiring an image to be detected;
performing feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, wherein the feature images are different in size;
respectively analyzing the feature images with corresponding sizes by utilizing at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image;
and analyzing the regions within the predicted vehicle frames in each feature image by using the pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected.
Optionally, the processor 501, when being configured to execute the program stored in the memory 503, may further implement any one of the above-described multi-target vehicle detection methods.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 504 may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figure, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
In another aspect, the present application further discloses a computer-readable storage medium, in which a computer program is stored, and when being executed by a processor, the computer program implements the method steps of any one of the above-mentioned multi-target vehicle detection methods.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A multi-target vehicle detection method, the method comprising:
acquiring an image to be detected;
performing feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, wherein the feature images are different in size;
respectively analyzing the feature images with corresponding sizes by utilizing at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image;
and analyzing the regions within the predicted vehicle frames in each feature image by using the pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected.
2. The method of claim 1, wherein the at least two feature images comprise a shallow feature image and a deep feature image, the shallow feature image having a size larger than the deep feature image, and the at least two candidate region generators comprise a shallow candidate region generator and a deep candidate region generator;
and analyzing the feature images of corresponding sizes respectively by using the at least two candidate region generators of the pre-trained machine learning algorithm to obtain the predicted vehicle frame in each feature image comprises:
analyzing the shallow feature image through the shallow candidate region generator to obtain a predicted vehicle frame of the shallow feature image;
and analyzing the deep characteristic image through the deep candidate region generator to obtain a predicted vehicle frame of the deep characteristic image.
3. The method of claim 2, wherein the feature extraction network comprises four convolutional layers, and the feature extraction of the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm to obtain at least two feature images comprises:
and utilizing a pre-trained feature extraction network of a machine learning algorithm to extract features of the image to be detected, acquiring the shallow feature image from a second layer of convolution layer of the feature extraction network, and acquiring the deep feature image from a fourth layer of convolution layer of the feature extraction network.
4. The method of claim 3, wherein the size of the shallow feature image is 1/4 of the size of the image to be detected, and the size of the deep feature image is 1/16 of the size of the image to be detected.
5. The method of claim 1, wherein the at least two feature images comprise a first feature image, a second feature image, and a third feature image, wherein the size of the first feature image is larger than the size of the second feature image, wherein the size of the second feature image is larger than the size of the third feature image, and wherein the at least two candidate region generators comprise a first candidate region generator, a second candidate region generator, and a third candidate region generator;
and analyzing the feature images of corresponding sizes respectively by using the at least two candidate region generators of the pre-trained machine learning algorithm to obtain the predicted vehicle frame in each feature image comprises:
analyzing the first characteristic image through the first candidate region generator to obtain a predicted vehicle frame of the first characteristic image;
analyzing the second characteristic image through the second candidate region generator to obtain a predicted vehicle frame of the second characteristic image;
and analyzing the third characteristic image through the third candidate region generator to obtain a predicted vehicle frame of the third characteristic image.
6. The method of claim 5, wherein the feature extraction network comprises four convolutional layers, and the feature extraction of the image to be detected by using the pre-trained feature extraction network of the machine learning algorithm to obtain at least two feature images comprises:
and utilizing a pre-trained feature extraction network of a machine learning algorithm to extract features of the image to be detected, acquiring the first feature image from a second layer of convolution layer of the feature extraction network, acquiring the second feature image from a third layer of convolution layer of the feature extraction network, and acquiring the third feature image from a fourth layer of convolution layer of the feature extraction network.
7. The method according to claim 6, wherein the size of the first feature image is 1/4 of the size of the image to be detected, the size of the second feature image is 1/8 of the size of the image to be detected, and the size of the third feature image is 1/16 of the size of the image to be detected.
8. A multi-target vehicle detection apparatus, the apparatus comprising:
the image acquisition module is used for acquiring an image to be detected;
the feature extraction module is used for performing feature extraction on the image to be detected by using a pre-trained feature extraction network of a machine learning algorithm to obtain at least two feature images, wherein the feature images differ in size;
the detection frame prediction module is used for analyzing the feature images of corresponding sizes respectively by using at least two candidate region generators of the pre-trained machine learning algorithm to obtain a predicted vehicle frame in each feature image;
and the detection result generation module is used for analyzing the regions within the predicted vehicle frames in each feature image by using the pre-trained target classification network of the machine learning algorithm to obtain the vehicle detection result of the image to be detected.
9. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete communication with each other through the communication bus;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-7.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN201910614995.3A 2019-07-09 2019-07-09 Multi-target vehicle detection method and device, electronic equipment and storage medium Pending CN110610123A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910614995.3A CN110610123A (en) 2019-07-09 2019-07-09 Multi-target vehicle detection method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910614995.3A CN110610123A (en) 2019-07-09 2019-07-09 Multi-target vehicle detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110610123A true CN110610123A (en) 2019-12-24

Family

ID=68890410

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910614995.3A Pending CN110610123A (en) 2019-07-09 2019-07-09 Multi-target vehicle detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110610123A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507958A (en) * 2020-04-15 2020-08-07 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN111583660A (en) * 2020-05-22 2020-08-25 济南博观智能科技有限公司 Vehicle steering behavior detection method, device, equipment and storage medium
CN111652114A (en) * 2020-05-29 2020-09-11 深圳市商汤科技有限公司 Object detection method and device, electronic equipment and storage medium
CN111738153A (en) * 2020-06-22 2020-10-02 创新奇智(上海)科技有限公司 Image recognition analysis method and device, electronic equipment and storage medium
CN113221750A (en) * 2021-05-13 2021-08-06 杭州飞步科技有限公司 Vehicle tracking method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218621A (en) * 2013-04-21 2013-07-24 北京航空航天大学 Identification method of multi-scale vehicles in outdoor video surveillance
CN103473566A (en) * 2013-08-27 2013-12-25 东莞中国科学院云计算产业技术创新与育成中心 Multi-scale-model-based vehicle detection method
US20150178911A1 (en) * 2013-12-24 2015-06-25 Hyundai Motor Company Vehicle detecting method and system
CN108830199A (en) * 2018-05-31 2018-11-16 京东方科技集团股份有限公司 Identify method, apparatus, readable medium and the electronic equipment of traffic light signals
CN109711427A (en) * 2018-11-19 2019-05-03 深圳市华尊科技股份有限公司 Object detection method and Related product

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103218621A (en) * 2013-04-21 2013-07-24 北京航空航天大学 Identification method of multi-scale vehicles in outdoor video surveillance
CN103473566A (en) * 2013-08-27 2013-12-25 东莞中国科学院云计算产业技术创新与育成中心 Multi-scale-model-based vehicle detection method
US20150178911A1 (en) * 2013-12-24 2015-06-25 Hyundai Motor Company Vehicle detecting method and system
CN108830199A (en) * 2018-05-31 2018-11-16 京东方科技集团股份有限公司 Identify method, apparatus, readable medium and the electronic equipment of traffic light signals
CN109711427A (en) * 2018-11-19 2019-05-03 深圳市华尊科技股份有限公司 Object detection method and Related product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
王全东 et al., "Improved Faster R-CNN Algorithm for Multi-scale Tank and Armored Vehicle Target Detection", 《计算机辅助设计与图形学学报》 (Journal of Computer-Aided Design & Computer Graphics) *
黄继鹏 et al., "Multi-scale Faster-RCNN Detection Algorithm for Small Targets", 《计算机研究与发展》 (Journal of Computer Research and Development) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507958A (en) * 2020-04-15 2020-08-07 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN111507958B (en) * 2020-04-15 2023-05-26 全球能源互联网研究院有限公司 Target detection method, training method of detection model and electronic equipment
CN111583660A (en) * 2020-05-22 2020-08-25 济南博观智能科技有限公司 Vehicle steering behavior detection method, device, equipment and storage medium
CN111652114A (en) * 2020-05-29 2020-09-11 深圳市商汤科技有限公司 Object detection method and device, electronic equipment and storage medium
CN111738153A (en) * 2020-06-22 2020-10-02 创新奇智(上海)科技有限公司 Image recognition analysis method and device, electronic equipment and storage medium
CN113221750A (en) * 2021-05-13 2021-08-06 杭州飞步科技有限公司 Vehicle tracking method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
CN110610123A (en) Multi-target vehicle detection method and device, electronic equipment and storage medium
CN108388879B (en) Target detection method, device and storage medium
CN109492577B (en) Gesture recognition method and device and electronic equipment
CN109727275B (en) Object detection method, device, system and computer readable storage medium
CN111767783A (en) Behavior detection method, behavior detection device, model training method, model training device, electronic equipment and storage medium
CN111222395A (en) Target detection method and device and electronic equipment
CN106648078B (en) Multi-mode interaction method and system applied to intelligent robot
CN110460838B (en) Lens switching detection method and device and computer equipment
CN111274926B (en) Image data screening method, device, computer equipment and storage medium
CN111210399A (en) Imaging quality evaluation method, device and equipment
CN110866428B (en) Target tracking method, device, electronic equipment and storage medium
CN110956615A (en) Image quality evaluation model training method and device, electronic equipment and storage medium
CN111008576A (en) Pedestrian detection and model training and updating method, device and readable storage medium thereof
CN112001362A (en) Image analysis method, image analysis device and image analysis system
CN114445768A (en) Target identification method and device, electronic equipment and storage medium
CN108875500B (en) Pedestrian re-identification method, device and system and storage medium
CN108509826B (en) Road identification method and system for remote sensing image
CN117292338B (en) Vehicle accident identification and analysis method based on video stream analysis
CN114220087A (en) License plate detection method, license plate detector and related equipment
CN109271902B (en) Infrared weak and small target detection method based on time domain empirical mode decomposition under complex background
CN110751623A (en) Joint feature-based defect detection method, device, equipment and storage medium
CN113033500B (en) Motion segment detection method, model training method and device
CN115393755A (en) Visual target tracking method, device, equipment and storage medium
CN115311680A (en) Human body image quality detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20191224