CN114495571A - Parking space state detection method and device based on cross-layer coupling network and storage medium - Google Patents


Info

Publication number: CN114495571A
Application number: CN202210402663.0A
Granted publication: CN114495571B
Authority: CN (China)
Original language: Chinese (zh)
Legal status: granted; active
Inventors: 张超 (Zhang Chao), 张波 (Zhang Bo)
Assignees: University of Science and Technology Beijing (USTB); Innotitan Intelligent Equipment Technology Tianjin Co., Ltd.
Prior art keywords: layer, cross, feature map, convolution, parking space


Classifications

    • G08G1/14: Traffic control systems for road vehicles indicating individual free spaces in parking areas
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Fusion techniques of extracted features
    • G06N3/045: Neural network architectures; combinations of networks
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G08G1/0125: Traffic data processing
    • G08G1/0137: Measuring and analysing of parameters relative to traffic conditions for specific applications
    • G06T2207/20081: Indexing scheme for image analysis; training, learning
    • G06T2207/20084: Indexing scheme for image analysis; artificial neural networks [ANN]


Abstract

The invention provides a parking space state detection method, device, and storage medium based on a cross-layer coupling network, relating to the technical field of parking space detection. It aims to solve the technical problem that prior-art detection techniques struggle to achieve good results on parking space and vehicle targets that are small, shot from different angles, or even affected by imaging distortion, and it improves the cross-layer coupling network's ability to detect parking spaces and vehicles.

Description

Parking space state detection method and device based on cross-layer coupling network and storage medium
Technical Field
The invention relates to the technical field of parking space detection, in particular to a parking space state detection method and device based on a cross-layer coupling network and a storage medium.
Background
Nowadays, a shortage of parking resources causes traffic congestion, illegal parking, parking disputes, and similar problems that hinder urban traffic development and the maintenance of public order. Detecting the positions and number of vacant spaces in a parking lot therefore helps drivers park quickly and can effectively relieve queuing and congestion at the lot.
Current deep-learning-based object detection technology can automatically identify and locate targets, with the advantages of high detection accuracy and low application cost. However, most existing deep-learning parking space detection algorithms simply apply a general-purpose object detection network, which is not well targeted to the task: it is difficult to obtain good results on parking space and vehicle targets that are small, shot from different angles, or even affected by imaging distortion. An accurate and reliable parking space state detection method tailored to the characteristics of the task is therefore urgently needed.
Disclosure of Invention
The invention aims to provide a parking space state detection method based on a cross-layer coupling network, to solve the technical problem that prior-art detection techniques struggle to obtain good detection results on parking space and vehicle targets under different shooting angles. The technical effects produced by the preferred schemes among the technical schemes provided by the invention are described in detail below.
In order to achieve the purpose, the invention provides the following technical scheme:
the invention provides a parking space state detection method based on a cross-layer coupling network, characterized by comprising the following steps:
step 100: constructing a parking space state detection data set;
step 200: designing a cross-layer coupling network;
step 300: training and testing the cross-layer coupling network through the parking space state detection data set to obtain a cross-layer coupling model;
step 400: and inputting the image of the parking lot to be detected into the cross-layer coupling model to obtain the number and coordinate information of empty parking spaces and parked parking spaces.
Preferably, in step 100, the specific steps of constructing the parking space state detection data set are as follows:
step 101: collecting images of a plurality of parking lots, carrying out vacant parking space labeling and parked vehicle labeling on the images, and generating a labeling file corresponding to each labeled image;
the specific meaning of the vacant parking space marking and the parked vehicle marking is that the regions of the vacant parking spaces and the parked parking spaces in the image are circled through software, and a label is set for each circled region.
Step 102: and obtaining a parking space state detection data set based on each marked image and the corresponding marking file thereof, and dividing the parking space state detection data set into a training set and a test set according to a proportion.
Preferably, in step 200, the specific steps of designing the cross-layer coupling network are:
step 201: selecting a densely connected network as the backbone network to obtain output feature maps of the parking spaces and vehicles;
the parking space state detection needs to determine a backbone network for extracting parking space characteristics and vehicle characteristics from the image, and compared with other backbone networks, the DenseNet (dense connection network) has the advantages of high characteristic utilization rate, reduced gradient disappearance, relatively small parameter quantity and the like.
Step 202: carrying out characteristic coupling on output characteristic graphs with adjacent characteristic graphs to obtain a coupling characteristic graph;
in the neural network, the high-level characteristic diagram contains more semantic information which is helpful for target classification than the low-level characteristic diagram, the low-level characteristic diagram contains more detail information which is helpful for accurate positioning than the high-level characteristic diagram, wherein the high-level characteristic diagram and the bottom layer are relative concepts, therefore, the coupling of the low-level characteristic diagram and the high-level characteristic diagram can enhance the information expression capability of the characteristic diagram and obtain better detection effect on the parking spaces with fuzzy parking space lines and smaller sizes. In addition, the correlation of the feature information between the adjacent feature layers is strong, so that the feature coupling is carried out based on the adjacent layers, and the original feature representation cannot be interfered and damaged.
Step 203: and sequentially inputting the coupling characteristic diagram into a regional suggestion network and an ROI Align layer to obtain candidate target regions of parking spaces and vehicles, and inputting each candidate target region into a regression branch and a classification branch of fast RCNN to obtain coordinate information and category information of the candidate target regions.
Preferably, the feature coupling between an output feature map and its adjacent feature maps comprises the following steps:
S1: obtaining the adjacent low-level feature map and the adjacent high-level feature map of the output feature map;
S2: inputting the adjacent high-level feature map into an equal-size convolution block to obtain a first feature map;
S3: inputting the first feature map into an upsampling convolution block to obtain a second feature map;
S4: inputting the output feature map into a convolution layer to obtain a third feature map whose channel count is the same as that of the second feature map, and carrying out matrix multiplication of the second feature map and the third feature map to obtain a semantic feature map;
S5: inputting the adjacent low-level feature map into an equal-size convolution block to obtain a fourth feature map;
S6: inputting the fourth feature map into a downsampling convolution block to obtain a fifth feature map;
S7: carrying out matrix multiplication of the fifth feature map and the semantic feature map to obtain a detail feature map;
S8: inputting the detail feature map into an asymmetric convolution network, performing convolution with three parallel convolution layers to obtain three intermediate feature maps, and adding the three intermediate feature maps to obtain the final coupling feature map.
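The coupling pipeline of steps S1 to S7 can be sketched minimally in NumPy. This is an illustrative shape-level sketch, not the patented implementation: the 1 × 1 convolutions use random weights, batch normalization is omitted, nearest-neighbour repetition stands in for the 2× bilinear upsampling, the multiplication of same-shaped maps is realized as an element-wise product, and all function names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, out_ch, seed=0):
    """1x1 convolution as per-pixel channel mixing: (H, W, C_in) -> (H, W, out_ch)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], out_ch)) * 0.1
    return x @ w

def equal_size_block(x, out_ch=64):
    """1x1 conv + sigmoid (batch norm omitted); spatial size unchanged."""
    return sigmoid(conv1x1(x, out_ch))

def upsample_block(x, out_ch=64):
    """1x1 conv + 2x upsampling (nearest stands in for bilinear) + sigmoid."""
    y = conv1x1(x, out_ch)
    y = y.repeat(2, axis=0).repeat(2, axis=1)
    return sigmoid(y)

def downsample_block(x, out_ch=64):
    """1x1 conv + 2x2 max pooling + sigmoid; spatial size halved."""
    y = conv1x1(x, out_ch)
    h, w, c = y.shape
    y = y.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))
    return sigmoid(y)

def couple(d_low, d_mid, d_high, out_ch=64):
    """Cross-layer coupling of adjacent maps D_{k-1}, D_k, D_{k+1} (steps S1-S7)."""
    second = upsample_block(equal_size_block(d_high, out_ch), out_ch)  # S2-S3
    third = conv1x1(d_mid, out_ch)                                     # S4
    semantic = second * third              # element-wise coupling     # S4
    fifth = downsample_block(equal_size_block(d_low, out_ch), out_ch)  # S5-S6
    return fifth * semantic                                            # S7

# Toy shapes mirroring D_2 (256,256,32), D_3 (128,128,32), D_4 (64,64,32), scaled down 16x.
d2 = np.ones((16, 16, 32)); d3 = np.ones((8, 8, 32)); d4 = np.ones((4, 4, 32))
n3 = couple(d2, d3, d4)
print(n3.shape)  # (8, 8, 64)
```

Running it confirms that the low-level and high-level branches meet the middle map at a common spatial size and a unified 64-channel depth before the asymmetric convolution of step S8.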
Preferably, the equal-size convolution block comprises a convolution layer with 1 × 1 convolution kernels (64 of them), a batch normalization layer, and a Sigmoid activation layer; the number of kernels is fixed at 64 so that the channel count is unified to 64 for feature coupling.
Preferably, the upsampling convolution block comprises a convolution layer with 1 × 1 convolution kernels (64 of them), a 2× bilinear interpolation upsampling operation, and a Sigmoid activation layer.
Preferably, the downsampling convolution block comprises a convolution layer with 1 × 1 convolution kernels (64 of them), a max pooling operation with a 2 × 2 pooling kernel, and a Sigmoid activation layer.
Preferably, the parallel convolution layers are convolution layers with 3 × 3, 1 × 3, and 3 × 1 convolution kernels.
The present invention also provides an electronic device, comprising:
a memory for storing program instructions;
a processor for calling the program instructions stored in the memory to implement the parking space state detection method based on the cross-layer coupling network described above.
A computer-readable storage medium:
the computer-readable storage medium stores program code for implementing the above parking space state detection method based on a cross-layer coupling network.
In the parking space state detection method, device, and storage medium based on a cross-layer coupling network, a cross-layer coupling structure is constructed that applies feature coupling and asymmetric convolution to the feature maps of adjacent layers. Each layer's feature map thus gains richer detail and semantic information without its initial features being disturbed or damaged, while the model's robustness to target rotation and deformation is strengthened; this improves the network's ability to detect parking spaces and vehicles of different sizes and shooting angles, even under imaging distortion.
Meanwhile, the method can automatically detect and output the positions of vacant parking spaces and the number of occupied spaces from parking lot images captured by a visible-light camera, so that staff and drivers can quickly grasp the overall parking situation and the exact positions of vacant spaces, enabling fast parking and effectively improving the utilization of urban parking space resources.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a parking space state detection method according to an embodiment of a parking space state detection method based on a cross-layer coupling network of the present invention;
fig. 2 is a schematic structural diagram of a cross-layer coupling network according to an embodiment of a parking space state detection method based on the cross-layer coupling network of the present invention;
fig. 3 is a schematic diagram of a cross-layer coupling structure of an embodiment of a parking space state detection method based on a cross-layer coupling network according to the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the technical solutions of the invention are described in detail below. It should be understood that the described embodiments are merely exemplary of the invention and not restrictive of its full scope. All other embodiments obtained by those skilled in the art from the embodiments herein without creative effort fall within the protection scope of the invention.
The invention aims to provide a parking space state detection method based on a cross-layer coupling network; the flow of the method is shown in Fig. 1, and the specific steps are as follows:
step 1: constructing a parking space state detection data set;
step 2: designing a cross-layer coupling network;
Step 3: training and testing the cross-layer coupling network with the parking space state detection data set to obtain a cross-layer coupling model;
Step 4: inputting the image of the parking lot to be detected into the cross-layer coupling model to obtain the number and coordinate information of vacant and occupied parking spaces.
In this embodiment, constructing the parking space state detection data set comprises the following steps:
S1: collecting overhead images of a plurality of parking lots with a visible-light camera under different parking densities, illumination conditions, shooting heights, and shooting angles, so that the collected images cover the various detection situations encountered in practical application;
S2: annotating the collected images with the annotation software Labelme. Specifically, the regions of vacant and occupied parking spaces in each image are outlined manually in Labelme, and a label is set for each outlined region. For vacant spaces, the specific position of each vacant space is marked in each parking lot image, i.e., the image region of each vacant space is outlined and labeled "vacant parking space". For parked vehicles, the specific position or image region of each parked vehicle is marked and labeled "parked parking space". Through these steps an annotation file is obtained for each image, containing the position or image region of every vehicle or vacant space together with its label;
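As an illustration of what such an annotation file might hold, the following sketch assumes Labelme's JSON layout (a "shapes" list whose entries carry a "label" and polygon "points"); the field names, file name, and coordinate values are illustrative assumptions, not the patent's specification.

```python
# Hypothetical annotation in a Labelme-style JSON layout; all values illustrative.
annotation = {
    "imagePath": "lot_001.jpg",
    "shapes": [
        {"label": "vacant parking space", "points": [[10, 10], [60, 10], [60, 110], [10, 110]]},
        {"label": "parked parking space", "points": [[70, 10], [120, 10], [120, 110], [70, 110]]},
        {"label": "vacant parking space", "points": [[130, 10], [180, 10], [180, 110], [130, 110]]},
    ],
}

def count_labels(ann):
    """Count annotated regions per label in one annotation file."""
    counts = {}
    for shape in ann["shapes"]:
        counts[shape["label"]] = counts.get(shape["label"], 0) + 1
    return counts

print(count_labels(annotation))  # {'vacant parking space': 2, 'parked parking space': 1}
```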
S3: the parking lot images collected in S1 and the annotation files generated in S2 together form the parking space state detection data set, which is randomly divided at a ratio of 7:3 into a training set and a test set.
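The 7:3 random split can be sketched in a few lines of standard-library Python; the file names and fixed seed are illustrative assumptions.

```python
import random

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Randomly split (image, annotation) pairs into train/test sets at a 7:3 ratio."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical example: 10 (image, annotation-file) pairs.
pairs = [(f"lot_{i}.jpg", f"lot_{i}.json") for i in range(10)]
train, test = split_dataset(pairs)
print(len(train), len(test))  # 7 3
```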
Because camera installation positions vary across parking lots, a cross-layer coupling structure is designed to strengthen the network's detection of parking spaces and vehicles of different sizes and shooting angles, even under imaging distortion; the structure is built on the output feature maps of the densely connected network.
Fig. 2 is a schematic structural diagram of the cross-layer coupling network, and in this embodiment, by taking an example of inputting a parking lot image to be detected with dimensions of 2048 × 2048 × 3, the building steps and operation of the cross-layer coupling network are specifically shown:
the parking space state detection needs to determine a backbone network for extracting parking space characteristics and vehicle characteristics from the image, and compared with other backbone networks, the DenseNet (dense connection network) has the advantages of high characteristic utilization rate, reduced gradient disappearance, relatively small parameter quantity and the like.
More specifically, DenseNet-121 has the fewest convolution operations in the DenseNet family and computes faster, so it is used as the backbone for extracting parking space and vehicle features in this embodiment. The DenseNet backbone can also be replaced by another backbone network.
DenseNet-121 consists of five convolution stages. In this embodiment the output feature maps of the stages are denoted D_1, D_2, D_3, D_4, and D_5; after the image is processed by DenseNet-121, the dimensions of the five feature maps are 512 × 512 × 32, 256 × 256 × 32, 128 × 128 × 32, 64 × 64 × 32, and 32 × 32 × 32.
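The stated spatial dimensions imply downsampling factors of 4, 8, 16, 32, and 64 relative to the 2048 × 2048 input, which a quick check confirms:

```python
# Spatial sizes of D_1..D_5 for a 2048x2048 input, given the downsampling
# factors of 4, 8, 16, 32 and 64 implied by the dimensions stated above.
input_size = 2048
strides = [4, 8, 16, 32, 64]
sizes = [input_size // s for s in strides]
print(sizes)  # [512, 256, 128, 64, 32]
```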
In a neural network, a high-level feature map contains more semantic information, helpful for object classification, than a low-level one, and a low-level feature map contains more detail information, helpful for accurate localization; "high-level" and "low-level" are relative concepts. Taking the five feature maps in Fig. 2 as an example, the adjacent high-level feature map of D_3 is D_4, and the adjacent low-level feature map of D_4 is D_3.
Coupling low-level and high-level feature maps therefore enhances the expressive capability of the feature maps and yields better detection of parking spaces with blurred lines or small sizes. Moreover, feature information is strongly correlated between adjacent feature layers, so coupling adjacent layers does not disturb or damage the original feature representation.
Fig. 3 is a schematic diagram of a cross-layer coupling structure, and in this embodiment, a design process of the cross-layer coupling structure is shown by taking an output characteristic diagram D _3 of a backbone network as an example:
First, the adjacent high-level feature map D_4 of D_3 (dimension 64 × 64 × 32) is input into an equal-size convolution block, which outputs a feature map of dimension 64 × 64 × 64; the channel count is thus converted from 32 to 64.
The equal-size convolution block comprises a convolution layer with 1 × 1 kernels (64 of them), a batch normalization layer, and a Sigmoid activation layer. The kernel count is fixed at 64 to unify the channel count at 64 for feature coupling, and the block lets the feature map extract richer channel features.
This feature map is then input into an upsampling convolution block, which outputs a feature map of dimension 128 × 128 × 64; the spatial size is doubled, from 64 × 64 to 128 × 128.
The upsampling convolution block comprises a convolution layer with 1 × 1 kernels (64 of them), a 2× bilinear interpolation upsampling operation, and a Sigmoid activation layer.
Next, for the subsequent coupling operation, the feature map D_3 is input into a convolution layer with 1 × 1 kernels (64 of them) to obtain a 64-channel feature map.
The same-dimension feature maps obtained from D_4 and D_3 are then combined by matrix multiplication to obtain the semantic feature map S_3; multiplication retains more feature information than addition, which is why it is adopted for feature coupling.
Subsequently, the adjacent low-level feature map D_2 of D_3 (dimension 256 × 256 × 32) is input into an equal-size convolution block, which outputs a feature map of dimension 256 × 256 × 64; the channel count is converted from 32 to 64 to extract richer channel features.
This equal-size convolution block likewise comprises a convolution layer with 1 × 1 kernels (64 of them), a batch normalization layer, and a Sigmoid activation layer.
This feature map is then input into a downsampling convolution block to obtain a feature map of dimension 128 × 128 × 64; its spatial size is halved.
The downsampling convolution block comprises a convolution layer with 1 × 1 kernels (64 of them), a max pooling operation with a 2 × 2 pooling kernel, and a Sigmoid activation layer.
The feature map output from the D_2 branch is then multiplied with the semantic feature map S_3 to obtain the detail feature map N_3.
N_3 (dimension 128 × 128 × 64) is input into an asymmetric convolution network (ACNet): three parallel convolution layers with 3 × 3, 1 × 3, and 3 × 1 kernels each perform a convolution, and the three resulting feature maps are added to obtain the final coupling feature map A_3 of dimension 128 × 128 × 64.
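A single-channel NumPy sketch of this three-branch operation, with random kernels for illustration. It also checks the property that makes asymmetric convolution attractive: summing the 3 × 3, 1 × 3, and 3 × 1 branches equals one 3 × 3 convolution whose kernel absorbs the 1 × 3 into its middle row and the 3 × 1 into its middle column.

```python
import numpy as np

def conv2d_same(x, k):
    """'Same'-padded single-channel 2-D correlation of x with kernel k."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = (xp[i:i + kh, j:j + kw] * k).sum()
    return out

rng = np.random.default_rng(0)
k33 = rng.standard_normal((3, 3))
k13 = rng.standard_normal((1, 3))
k31 = rng.standard_normal((3, 1))
x = rng.standard_normal((8, 8))

# Three parallel branches, summed (the ACNet operation in the embodiment).
branch_sum = conv2d_same(x, k33) + conv2d_same(x, k13) + conv2d_same(x, k31)

# Equivalent fused kernel: fold the 1x3 into the middle row, the 3x1 into the middle column.
fused = k33.copy()
fused[1, :] += k13[0, :]
fused[:, 1] += k31[:, 0]
assert np.allclose(branch_sum, conv2d_same(x, fused))
print(branch_sum.shape)  # (8, 8)
```

The horizontal and vertical branches strengthen the kernel's central cross, which is one reading of why the structure helps with rotated and distorted targets.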
The asymmetric convolution network with its three parallel 3 × 3, 1 × 3, and 3 × 1 convolution layers is adopted to improve the network's detection accuracy for parking spaces and vehicles under different shooting angles, even imaging distortion.
With the constructed cross-layer coupling structure, the feature maps of adjacent layers undergo feature coupling and asymmetric convolution, so each layer's feature map gains richer detail and semantic information without its initial features being disturbed or damaged, while the model's robustness to target rotation and deformation is strengthened; this improves the network's ability to detect parking spaces and vehicles of different sizes and shooting angles, even under imaging distortion.
In the same manner, cross-layer coupling structures are built for D_2, D_3, and D_4, yielding the coupling feature maps A_2, A_3, and A_4 shown in Fig. 2.
Finally, the coupling feature maps A_2, A_3, and A_4 are input in turn into a Region Proposal Network (RPN) and an ROI Align layer to obtain candidate target regions for parking spaces and vehicles, and each candidate region is fed into the two detection branches of Faster R-CNN, a regression branch and a classification branch, to obtain its coordinate and category information; the category is the label "vacant parking space" or "parked parking space".
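A simplified single-channel sketch of the ROI Align idea used here: bilinearly sample a fixed grid of points inside a candidate box, so the pooled patch avoids the quantization of plain ROI pooling. The real operator works per channel and averages several samples per output bin; this sketch assumes one sample per bin, and the box coordinates are illustrative.

```python
import numpy as np

def roi_align(feat, box, out_size=7):
    """Bilinearly sample an out_size x out_size grid inside box = (x1, y1, x2, y2)
    on a single-channel (H, W) feature map."""
    bx1, by1, bx2, by2 = box
    ys = np.linspace(by1, by2, out_size)
    xs = np.linspace(bx1, bx2, out_size)
    h, w = feat.shape
    out = np.empty((out_size, out_size))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            y0, x0 = int(np.floor(y)), int(np.floor(x))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            out[i, j] = (feat[y0, x0] * (1 - dy) * (1 - dx)
                         + feat[y0, x1] * (1 - dy) * dx
                         + feat[y1, x0] * dy * (1 - dx)
                         + feat[y1, x1] * dy * dx)
    return out

feat = np.arange(64, dtype=float).reshape(8, 8)
patch = roi_align(feat, (1.0, 1.0, 5.0, 5.0))
print(patch.shape)  # (7, 7)
```

Each candidate region is thereby turned into a fixed-size patch that the regression and classification branches can consume regardless of the box's original size.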
The overall design of the cross-layer coupling network is completed based on the above processes.
The designed cross-layer coupling network is then trained to obtain the cross-layer coupling model.
The network is trained on the training set of the parking space state detection data set, updating the network parameters with the Faster R-CNN loss function and the Adam optimizer until the detection speed and accuracy on the test set meet the preset requirements, yielding the final cross-layer coupling model.
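The Faster R-CNN loss itself is beyond a short sketch, but the parameter-update rule of the named optimizer can be shown in a few lines of pure Python; minimizing a toy quadratic stands in for the detection loss, and the hyperparameter values are the usual Adam defaults, not values from the patent.

```python
import math

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates, bias correction, step."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy stand-in for the detection loss: minimize f(theta) = theta^2 from theta = 1.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t, lr=0.01)
print(round(theta, 4))
```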
The image of the parking lot to be detected is input into the cross-layer coupling model to obtain the coordinates of vacant and occupied parking spaces; counting these coordinates gives the numbers of vacant and occupied spaces respectively.
If the number M of vacant spaces is greater than 0, the number M, the specific position information, and the total number N of occupied spaces are output; otherwise, the message "Lot full; no parking available at the moment" is output.
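The output rule can be sketched as a small formatting function; the coordinate format and message wording are illustrative assumptions, not the patent's exact strings.

```python
def report(vacant_coords, parked_coords):
    """Format the detector's final output: counts and positions, or a lot-full notice.
    Coordinates here are illustrative box centres, not the model's actual format."""
    m, n = len(vacant_coords), len(parked_coords)
    if m > 0:
        return f"{m} vacant space(s) at {vacant_coords}; {n} occupied space(s)"
    return "Lot full; no parking available at the moment"

print(report([(120, 80)], [(40, 60), (200, 60)]))
# 1 vacant space(s) at [(120, 80)]; 2 occupied space(s)
```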
The above description covers only specific embodiments of the present invention, but the protection scope of the invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall fall within the protection scope of the invention. The protection scope of the invention is therefore defined by the appended claims.

Claims (10)

1. A parking space state detection method based on a cross-layer coupling network, characterized by comprising the following steps:
step 100: constructing a parking space state detection data set;
step 200: constructing a cross-layer coupling network;
step 300: training and testing the cross-layer coupling network through the parking space state detection data set to obtain a cross-layer coupling model;
step 400: and inputting the image of the parking lot to be detected into the cross-layer coupling model to obtain the number and coordinate information of empty parking spaces and parked parking spaces.
2. The parking space state detection method based on the cross-layer coupling network according to claim 1, characterized in that, in step 100, the specific steps of constructing the parking space state detection data set are:
step 101: collecting images of a plurality of parking lots, carrying out vacant parking space labeling and parked vehicle labeling on the images, and generating a labeling file corresponding to each labeled image;
step 102: and obtaining a parking space state detection data set based on each marked image and the corresponding marking file thereof, and dividing the parking space state detection data set into a training set and a test set according to a proportion.
3. The parking space state detection method based on the cross-layer coupling network according to claim 1, characterized in that in step 200, the specific steps of constructing the cross-layer coupling network are as follows:
step 201: selecting a densely connected network as the backbone network to acquire output feature maps of the parking spaces and vehicles;
step 202: carrying out feature coupling on each output feature map with its adjacent feature maps to obtain a coupling feature map;
step 203: sequentially inputting the coupling feature map into a region proposal network and an ROI Align layer to obtain candidate target regions of parking spaces and vehicles, and inputting each candidate target region into the regression branch and the classification branch of Fast RCNN to obtain the coordinate information and category information of the candidate target regions.
4. The parking space state detection method based on the cross-layer coupling network according to claim 3, characterized in that the feature coupling of an output feature map with its adjacent feature maps comprises the following steps:
s1: acquiring the adjacent low-level feature map and the adjacent high-level feature map of the output feature map;
s2: inputting the adjacent high-level feature map into an equal-size convolution block to obtain a first feature map;
s3: inputting the first feature map into an upsampling convolution block to obtain a second feature map;
s4: inputting the output feature map into a convolution layer to obtain a third feature map, so that the number of channels of the third feature map is the same as that of the second feature map, and carrying out matrix multiplication on the second feature map and the third feature map to obtain a semantic feature map;
s5: inputting the adjacent low-level feature map into an equal-size convolution block to obtain a fourth feature map;
s6: inputting the third feature map into a downsampling convolution block to obtain a fifth feature map;
s7: carrying out matrix multiplication on the fifth feature map and the semantic feature map to obtain a detail feature map;
s8: inputting the detail feature map into an asymmetric convolution network, performing convolution operations with three parallel convolution layers respectively to obtain three intermediate feature maps, and adding the three intermediate feature maps to obtain the final coupling feature map.
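The data flow in steps s1–s8 can be traced at the level of tensor shapes. The pure-Python sketch below rests on several assumptions the claim does not fix: the adjacent high-level map has half, and the adjacent low-level map twice, the spatial size of the current output map; "matrix multiplication" is read as an element-wise (Hadamard) product, which preserves shape; and in s6 the fourth (low-level) feature map is downsampled rather than the third, so that the product in s7 is shape-consistent.

```python
# Shape-level trace of steps s1-s8; shapes are (channels, height, width).
# All concrete sizes below are illustrative assumptions, not claim content.

def conv_block(shape, out_ch=64):      # 1x1 conv: spatial size unchanged
    _, h, w = shape
    return (out_ch, h, w)

def upsample_block(shape):             # 1x1 conv + 2x bilinear upsampling
    c, h, w = conv_block(shape)
    return (c, 2 * h, 2 * w)

def downsample_block(shape):           # 1x1 conv + 2x2 max pooling
    c, h, w = conv_block(shape)
    return (c, h // 2, w // 2)

def hadamard(a, b):                    # element-wise product: shapes must match
    assert a == b, (a, b)
    return a

out_map  = (256, 28, 28)   # s1: current backbone stage (assumed sizes)
high_map = (512, 14, 14)   # adjacent deeper stage
low_map  = (128, 56, 56)   # adjacent shallower stage

f1 = conv_block(high_map)              # s2
f2 = upsample_block(f1)                # s3 -> (64, 28, 28)
f3 = conv_block(out_map)               # s4: match channel count
semantic = hadamard(f2, f3)            # s4 -> (64, 28, 28)
f4 = conv_block(low_map)               # s5
# s6: the claim names the third feature map as input here; this sketch
# downsamples the fourth (low-level) map instead so s7 is shape-consistent.
f5 = downsample_block(f4)              # -> (64, 28, 28)
detail = hadamard(f5, semantic)        # s7
coupled = detail                       # s8: 3x3/1x3/3x1 convs preserve shape
```

Under these assumptions every stage of the coupling path ends up at the spatial resolution of the current output map, which is what lets the element-wise products combine high-level semantics with low-level detail.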
5. The parking space state detection method based on the cross-layer coupling network according to claim 4, characterized in that the equal-size convolution block comprises a convolution layer with 1 × 1 convolution kernels and 64 kernels, a batch normalization layer, and a Sigmoid activation layer.
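To make the equal-size convolution block concrete: a 1 × 1 convolution is a per-pixel linear map across channels, so the spatial size is unchanged ("equal-size"). The pure-Python sketch below uses 2 output channels instead of the claimed 64 and hypothetical weights, and omits batch normalization (at inference it can be folded into the weights and bias).

```python
import math

def conv1x1_sigmoid(x, weights, bias):
    """1x1 convolution followed by Sigmoid on a feature map.
    x: [C_in][H][W] nested lists; weights: [C_out][C_in]; bias: [C_out]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    out = []
    for k in range(len(weights)):
        plane = []
        for i in range(h):
            row = []
            for j in range(w):
                z = bias[k] + sum(weights[k][c] * x[c][i][j] for c in range(c_in))
                row.append(1.0 / (1.0 + math.exp(-z)))  # Sigmoid activation
            plane.append(row)
        out.append(plane)
    return out

# Toy input: 2 channels, 2x2 spatial; 2 output channels (the claim uses 64)
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.5, 0.5], [0.5, 0.5]]]
w = [[1.0, 0.0], [0.0, 2.0]]   # hypothetical weights
b = [0.0, -1.0]
y = conv1x1_sigmoid(x, w, b)   # spatial size is unchanged: still 2x2
```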
6. The parking space state detection method based on the cross-layer coupling network according to claim 4, characterized in that the upsampling convolution block comprises a convolution layer with 1 × 1 convolution kernels and 64 kernels, a 2× bilinear interpolation upsampling operation, and a Sigmoid activation layer.
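A minimal pure-Python sketch of the 2× bilinear interpolation upsampling, using half-pixel source coordinates with edge clamping — one common convention; the claim does not specify which variant is used.

```python
def bilinear_upsample_2x(x):
    """2x bilinear upsampling of a single-channel map (nested lists),
    with half-pixel source coordinates and edge clamping (assumed)."""
    h, w = len(x), len(x[0])

    def sample(s, n):
        # Clamp the source coordinate, then split into index and fraction.
        s = max(0.0, min(s, n - 1.0))
        i0 = int(s)
        i1 = min(i0 + 1, n - 1)
        return i0, i1, s - i0

    out = []
    for di in range(2 * h):
        i0, i1, ti = sample((di + 0.5) / 2 - 0.5, h)
        row = []
        for dj in range(2 * w):
            j0, j1, tj = sample((dj + 0.5) / 2 - 0.5, w)
            top = (1 - tj) * x[i0][j0] + tj * x[i0][j1]
            bot = (1 - tj) * x[i1][j0] + tj * x[i1][j1]
            row.append((1 - ti) * top + ti * bot)
        out.append(row)
    return out

y = bilinear_upsample_2x([[1.0, 2.0], [3.0, 4.0]])  # 2x2 -> 4x4
```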
7. The parking space state detection method based on the cross-layer coupling network according to claim 4, characterized in that the downsampling convolution block comprises a convolution layer with 1 × 1 convolution kernels and 64 kernels, a max pooling operation with a 2 × 2 pooling kernel, and a Sigmoid activation layer.
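The max pooling step halves each spatial dimension. A minimal sketch, assuming stride 2 (the claim gives only the 2 × 2 pooling kernel):

```python
def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumed) on a single-channel map."""
    h, w = len(x), len(x[0])
    return [[max(x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1])
             for j in range(0, w - 1, 2)]
            for i in range(0, h - 1, 2)]

p = max_pool_2x2([[1, 3, 2, 0],
                  [5, 4, 1, 1],
                  [0, 2, 9, 6],
                  [1, 1, 7, 8]])
# halves each spatial dimension: 4x4 -> 2x2
```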
8. The parking space state detection method based on the cross-layer coupling network according to claim 4, characterized in that the three parallel convolution layers are convolution layers with 3 × 3, 1 × 3 and 3 × 1 convolution kernels, respectively.
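By linearity of convolution, the three parallel branches can be fused into a single 3 × 3 convolution: embed the 1 × 3 kernel into the middle row and the 3 × 1 kernel into the middle column of the 3 × 3 kernel (the idea behind asymmetric convolution blocks such as ACNet). The pure-Python sketch below checks this on a single channel with hypothetical kernels.

```python
def conv2d_same(x, k):
    """'Same'-padded 2D convolution (cross-correlation) of a
    single-channel map x with kernel k, both nested lists."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    ii, jj = i + a - ph, j + b - pw
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding
                        s += k[a][b] * x[ii][jj]
            out[i][j] = s
    return out

x = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [4.0, 0.0, 1.0]]
k33 = [[1, 0, -1], [0, 2, 0], [-1, 0, 1]]   # hypothetical kernels
k13 = [[0.5, 1.0, 0.5]]
k31 = [[1.0], [0.0], [-1.0]]

# Sum of the three parallel branch outputs (step s8 of claim 4).
branch_sum = [[a + b + c for a, b, c in zip(r1, r2, r3)]
              for r1, r2, r3 in zip(conv2d_same(x, k33),
                                    conv2d_same(x, k13),
                                    conv2d_same(x, k31))]

# Fused kernel: 1x3 added to the middle row, 3x1 to the middle column.
fused = [row[:] for row in k33]
for b in range(3):
    fused[1][b] += k13[0][b]
for a in range(3):
    fused[a][1] += k31[a][0]
fused_out = conv2d_same(x, fused)
```

The equality of `branch_sum` and `fused_out` holds for any input and kernels, which is why such a block can be trained with three branches and deployed as one 3 × 3 convolution.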
9. An electronic device, comprising:
a memory for storing program instructions;
a processor, configured to invoke the program instructions stored in the memory to implement the method for detecting a parking space state based on a cross-layer coupling network according to any one of claims 1 to 8.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a program code for implementing the method for detecting parking space status based on the cross-layer coupling network according to any one of claims 1 to 8.
CN202210402663.0A 2022-04-18 2022-04-18 Parking space state detection method and device based on cross-layer coupling network and storage medium Active CN114495571B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210402663.0A CN114495571B (en) 2022-04-18 2022-04-18 Parking space state detection method and device based on cross-layer coupling network and storage medium


Publications (2)

Publication Number Publication Date
CN114495571A true CN114495571A (en) 2022-05-13
CN114495571B CN114495571B (en) 2022-07-26

Family

ID=81489407

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210402663.0A Active CN114495571B (en) 2022-04-18 2022-04-18 Parking space state detection method and device based on cross-layer coupling network and storage medium

Country Status (1)

Country Link
CN (1) CN114495571B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461211A (en) * 2020-03-31 2020-07-28 中国科学院计算技术研究所 Feature extraction method for lightweight target detection and corresponding detection method
CN111563508A (en) * 2020-04-20 2020-08-21 华南理工大学 Semantic segmentation method based on spatial information fusion
KR20200119369A (en) * 2019-03-22 2020-10-20 홍익대학교 산학협력단 Apparatus and method for detecting object
CN111797782A (en) * 2020-07-08 2020-10-20 上海应用技术大学 Vehicle detection method and system based on image features
CN112016532A (en) * 2020-10-22 2020-12-01 腾讯科技(深圳)有限公司 Vehicle detection method and device
CN112364855A (en) * 2021-01-14 2021-02-12 北京电信易通信息技术股份有限公司 Video target detection method and system based on multi-scale feature fusion
CN112836633A (en) * 2021-02-02 2021-05-25 蔚来汽车科技(安徽)有限公司 Parking space detection method and parking space detection system
CN113033363A (en) * 2021-03-15 2021-06-25 西南交通大学 Vehicle dense target detection method based on deep learning
CN113313094A (en) * 2021-07-30 2021-08-27 北京电信易通信息技术股份有限公司 Vehicle-mounted image target detection method and system based on convolutional neural network
CN113723356A (en) * 2021-09-15 2021-11-30 北京航空航天大学 Heterogeneous characteristic relation complementary vehicle weight recognition method and device


Non-Patent Citations (1)

Title
JIANG FENG: "Real-time Semantic Segmentation Based on Dual-path Network Feature Fusion", China Master's Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN114495571B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
CN107944450B (en) License plate recognition method and device
CN108986465B (en) Method, system and terminal equipment for detecting traffic flow
CN113468967B (en) Attention mechanism-based lane line detection method, attention mechanism-based lane line detection device, attention mechanism-based lane line detection equipment and attention mechanism-based lane line detection medium
CN113537105B (en) Parking space detection method and device
CN109714526B (en) Intelligent camera and control system
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN112329881B (en) License plate recognition model training method, license plate recognition method and device
CN115035295B (en) Remote sensing image semantic segmentation method based on shared convolution kernel and boundary loss function
CN110334719B (en) Method and system for extracting building image in remote sensing image
CN111932933B (en) Urban intelligent parking space detection method and equipment and readable storage medium
CN114049356A (en) Method, device and system for detecting structure apparent crack
CN111931729B (en) Pedestrian detection method, device, equipment and medium based on artificial intelligence
CN114913498A (en) Parallel multi-scale feature aggregation lane line detection method based on key point estimation
CN111860411A (en) Road scene semantic segmentation method based on attention residual error learning
CN115482518A (en) Extensible multitask visual perception method for traffic scene
CN116168240A (en) Arbitrary-direction dense ship target detection method based on attention enhancement
CN113361528B (en) Multi-scale target detection method and system
CN110119736B (en) License plate position identification method and device and electronic equipment
CN116229406B (en) Lane line detection method, system, electronic equipment and storage medium
CN114495571B (en) Parking space state detection method and device based on cross-layer coupling network and storage medium
CN113628180A (en) Semantic segmentation network-based remote sensing building detection method and system
CN113378642A (en) Method for detecting illegal occupation buildings in rural areas
CN115497075A (en) Traffic target detection method based on improved convolutional neural network and related device
CN111507902A (en) High-resolution image acquisition method and device
CN113011415A (en) Improved target detection method and system based on Grid R-CNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant