WO2022205685A1 - Traffic sign recognition method based on a lightweight network - Google Patents

Traffic sign recognition method based on a lightweight network

Info

Publication number
WO2022205685A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolution
separable
channel
asymmetric
traffic sign
Prior art date
Application number
PCT/CN2021/107294
Other languages
English (en)
French (fr)
Inventor
魏宪
郭杰龙
杨晓迪
李杰
俞辉
张剑锋
邵东恒
唐晓亮
Original Assignee
泉州装备制造研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 泉州装备制造研究所 filed Critical 泉州装备制造研究所
Publication of WO2022205685A1 publication Critical patent/WO2022205685A1/zh
Priority to US18/340,090 priority Critical patent/US11875576B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/32Normalisation of the pattern dimensions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • The invention relates to a traffic sign recognition method.
  • As a relatively mature application in the field of computer vision, image recognition has attracted increasing attention from all walks of life. In academia, image recognition competitions on public datasets emerge one after another, and the convolutional neural network models designed for them keep setting new performance records. In industry, image recognition is applied in many areas such as face recognition, traffic sign recognition, and food safety inspection.
  • Owing to the superior performance of convolutional neural networks in image recognition, many intelligent applications now need to be deployed on small mobile or embedded terminal devices, while traffic sign recognition algorithms based on convolutional neural networks place high demands on the computing power and storage space of the computing platform, which hinders their use on intelligent terminal devices. Therefore, making the convolutional-neural-network-based traffic sign recognition algorithm lightweight and pruning the model can greatly reduce the computational cost and storage requirements of the algorithm, enabling it to run quickly and accurately on a vehicle-mounted platform; this has important practical value.
  • The purpose of the present invention is to address the deficiencies of the prior art and provide a method that reduces the parameter scale and computational load of the network while maintaining recognition accuracy, thereby improving the recognition speed of the neural network model deployed in a vehicle-mounted platform environment.
  • A traffic sign recognition method based on a lightweight neural network, characterized in that it comprises the following steps:
  • Step 1: obtain original traffic sign image data;
  • Step 2: data preprocessing: preprocess the original traffic sign image data to obtain a traffic sign dataset with a training set and a test set;
  • Step 3: set the initial training hyperparameters, input the training set part of the traffic sign dataset into the lightweight neural network model for training, and evaluate the trained lightweight neural network model on the test set part of the dataset;
  • Step 4: check whether the recognition accuracy of the model on the test set reaches 90% or above; if not, adjust the training hyperparameters and go to Step 3; otherwise, go to Step 5;
  • Step 5: prune the lightweight neural network model with an initial pruning rate of 50%, then retrain the pruned model on the training set of the traffic sign image data and evaluate the retrained pruned model on the test set of the traffic sign dataset;
  • Step 6: check the recognition accuracy of the retrained pruned model. If the accuracy loss is within 1%, save the model, continue to increase the pruning rate in steps of 2%, and go to Step 5. If the accuracy loss exceeds 1%, judge whether this is the result of the first pruning: if so, reduce the pruning rate by 10% and return to Step 5; if not, go to Step 7;
  • Step 7: save the lightweight neural network model from the last pruning;
  • Step 8: deploy the lightweight neural network model from the last pruning in the vehicle-mounted system to recognize traffic signs on the road.
  • The lightweight neural network model includes a convolutional feature extraction part and a classifier part. The feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of separable asymmetric convolution; the separable asymmetric convolution includes a first separable asymmetric convolution and a second separable asymmetric convolution.
  • The first separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size. The corresponding elements of the two single-channel feature maps are summed, each summed channel is batch-normalized and passed through the ReLU activation function in turn, the newly formed channels are merged and channel-shuffled, and finally a 1×1 convolution with a stride of 1 is applied to the output channels, with the number of convolution kernels set equal to the number of input channels.
  • The second separable asymmetric convolution proceeds in the same way, except that the final 1×1 convolution over the output channels uses a stride of 2, completing the downsampling of the feature map; the number of convolution kernels again equals the number of input channels.
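  • To make the two variants concrete, the following PyTorch sketch implements one separable asymmetric convolution block. It is a minimal illustration rather than the patented implementation: the per-branch paddings of (0, 1) and (1, 0) and the number of shuffle groups are assumptions chosen so that the two branch outputs match in size, since the text only states that corresponding paddings are used.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    # Rearrange (N, C, H, W) channels across `groups` groups, ShuffleNet-style.
    n, c, h, w = x.shape
    return (x.view(n, groups, c // groups, h, w)
             .transpose(1, 2).contiguous().view(n, c, h, w))

class SeparableAsymmetricConv(nn.Module):
    """stride=1 gives the first variant; stride=2 the downsampling second variant."""
    def __init__(self, channels: int = 64, stride: int = 1, shuffle_groups: int = 4):
        super().__init__()
        # Per-channel (depthwise) 1x3 and 3x1 convolutions, stride 1.
        self.conv1x3 = nn.Conv2d(channels, channels, (1, 3), 1, (0, 1),
                                 groups=channels, bias=False)
        self.conv3x1 = nn.Conv2d(channels, channels, (3, 1), 1, (1, 0),
                                 groups=channels, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.groups = shuffle_groups
        # Pointwise 1x1 convolution; kernel count equals the input channel count.
        self.pointwise = nn.Conv2d(channels, channels, 1, stride, bias=False)
        self.bn_pw = nn.BatchNorm2d(channels)

    def forward(self, x):
        a = F.relu(self.conv1x3(x))          # 1x3 branch followed by ReLU
        b = F.relu(self.conv3x1(x))          # 3x1 branch followed by ReLU
        y = F.relu(self.bn(a + b))           # elementwise sum, BN, ReLU
        y = channel_shuffle(y, self.groups)  # channel merge and shuffle
        return F.relu(self.bn_pw(self.pointwise(y)))  # 1x1 conv; stride 2 halves H, W
```

  • For example, `SeparableAsymmetricConv(64, stride=2)(torch.randn(1, 64, 64, 64))` returns a tensor of shape (1, 64, 32, 32), matching the downsampling behavior described above.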
  • The structure of the traditional 3×3 convolution is: 3 input channels, 64 output channels, a 3×3 kernel size, 64 convolution kernels, a stride of 1, and zero padding; after the traditional 3×3 convolution, feature maps of size 64×64 with 64 channels are obtained.
  • Separable asymmetric convolution layers 2-5 use the first separable asymmetric convolution, in which the solid-line part of the residual connection uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 2-5, feature maps of size 64×64 with 64 channels are obtained;
  • layer 6 uses the second separable asymmetric convolution; after layer 6, feature maps of size 32×32 with 64 channels are obtained;
  • layers 7-11 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 7-11, feature maps of size 32×32 with 64 channels are obtained;
  • layer 12 uses the second separable asymmetric convolution; after layer 12, feature maps of size 16×16 with 64 channels are obtained;
  • layers 13-15 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 13-15, feature maps of size 16×16 with 64 channels are obtained;
  • layer 16 uses the second separable asymmetric convolution; after layer 16, feature maps of size 8×8 with 64 channels are obtained;
  • layer 17 uses the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels; after layer 17, feature maps of size 8×8 with 64 channels are obtained.
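  • Stacking the layers listed above gives the 17-layer feature extractor. The sketch below (reusing SeparableAsymmetricConv and the imports from the previous block) is a simplified reading of the description: it omits the residual bypass branches drawn in Figure 3, and it assumes padding 1 in the first 3×3 convolution so that a 64×64 input indeed yields 64×64 maps, since literal zero padding would shrink the map to 62×62.

```python
def build_feature_extractor() -> nn.Sequential:
    # Layer 1: traditional 3x3 convolution, 3 -> 64 channels.
    layers = [nn.Conv2d(3, 64, 3, stride=1, padding=1, bias=False),
              nn.BatchNorm2d(64), nn.ReLU(inplace=True)]
    # Layers 2-17: the downsampling second variant sits at layers 6, 12 and 16.
    for layer_idx in range(2, 18):
        stride = 2 if layer_idx in (6, 12, 16) else 1
        layers.append(SeparableAsymmetricConv(64, stride=stride))
    return nn.Sequential(*layers)

# A 64x64 RGB input ends as 8x8 maps with 64 channels, as in the description.
feats = build_feature_extractor()(torch.randn(1, 3, 64, 64))
assert feats.shape == (1, 64, 8, 8)
```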
  • A BN layer and an activation layer are added after each convolution operation in the separable asymmetric convolution; the activation function used in the activation layers is the ReLU function.
  • The classifier part consists of three layers of separable fully connected modules.
  • The first-layer separable fully connected module first reshapes the previous layer's 8×8, 64-channel feature maps into a 64×64 matrix, then initializes two weight matrices A-1 (64×64) and B-1 (64×64); matrix A-1 is multiplied with the reshaped input, and the result is multiplied with matrix B-1, giving a 64×64 output matrix for the next layer.
  • The second-layer separable fully connected module initializes two weight matrices A-2 (64×64) and B-2 (64×64); matrix A-2 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-2, giving a 64×64 output matrix for the next layer.
  • The third-layer separable fully connected module initializes two weight matrices A-3 (1×64) and B-3 (64×64); matrix A-3 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-3, giving a 1×64 output matrix. Finally, the output matrix is flattened by a Flatten operation and passed through the softmax activation function for the recognition task over 64 categories of traffic signs.
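  • A minimal sketch of this three-layer separable classifier follows, assuming PyTorch; the initialization scale and the placement of ReLU between the layers are assumptions, since the text only fixes the matrix shapes and the multiplication order.

```python
class SeparableFC(nn.Module):
    # Computes A @ X @ B in place of a dense weight acting on vec(X).
    def __init__(self, a: int, h: int, w: int, b: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(a, h) * 0.02)  # left factor (a x h)
        self.B = nn.Parameter(torch.randn(w, b) * 0.02)  # right factor (w x b)

    def forward(self, X):            # X: (batch, h, w)
        return self.A @ X @ self.B   # -> (batch, a, b)

class Classifier(nn.Module):
    def __init__(self, num_classes: int = 64):
        super().__init__()
        self.fc1 = SeparableFC(64, 64, 64, 64)          # A-1, B-1
        self.fc2 = SeparableFC(64, 64, 64, 64)          # A-2, B-2
        self.fc3 = SeparableFC(1, 64, 64, num_classes)  # A-3 (1x64), B-3 (64x64)

    def forward(self, feats):                     # feats: (batch, 64, 8, 8)
        X = feats.reshape(feats.size(0), 64, 64)  # 64 channels of 8x8 -> 64x64
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        logits = self.fc3(X).flatten(1)           # (batch, 1, 64) -> (batch, 64)
        return F.softmax(logits, dim=1)           # 64 traffic sign categories
```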
  • The data preprocessing includes determining the size of the traffic sign image data and selecting suitable candidate boxes to crop the original images, uniformly resizing the cropped color images to a resolution of 64×64, and dividing the cropped data into categories. Data augmentation is used to expand each category; specifically, the images are slightly translated horizontally or vertically, the image saturation is adjusted, and whitening is applied, so that the number of samples is consistent across traffic sign categories. The traffic sign image data are then labeled, and finally the training set and test set are split at a ratio of 8:2 to construct the traffic sign dataset.
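  • One possible torchvision rendering of this preprocessing pipeline is sketched below; the translation magnitude, saturation range, and the per-image whitening formula are illustrative assumptions, since the text does not give exact values.

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.Resize((64, 64)),                                 # uniform 64x64 crops
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # slight h/v shift
    transforms.ColorJitter(saturation=0.3),                      # saturation jitter
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t - t.mean()) / (t.std() + 1e-6)),  # whitening
])
```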
  • The present invention constructs a lightweight traffic sign recognition model through a lightweight neural network design and a model pruning method; the model has fewer parameters and a faster recognition speed, and can achieve high-precision traffic sign recognition on a vehicle-mounted platform;
  • the number of output channels of every convolution layer in the feature extraction part of the present invention is 64, and in many convolution layers the number of input channels equals the number of output channels; this greatly reduces memory access cost and thus speeds up the traffic sign recognition model;
  • the separable asymmetric convolution of the present invention requires fewer parameters than depthwise separable convolution. The overall network model also draws on the idea of residual learning, connecting the input feature map to the output through a bypass connection; this effectively prevents vanishing and exploding gradients and increases the stability of the network, thereby improving the training effect;
  • the classifier part of the present invention uses fewer parameters than a traditional fully connected layer. By decomposing the weight matrix of each fully connected layer and retraining two small weight matrices per layer, it not only reduces the number of parameters but also prevents the overfitting caused by an excessive number of parameters;
  • the model pruning method of the present invention operates on the depthwise separable asymmetric convolution: the importance of each convolution kernel in the pointwise convolution part is judged by computing its L1 norm, a pruning rate is set, and the kernels of the pointwise convolution part are pruned accordingly; the pruned model has fewer parameters and, to a certain extent, a regularization effect.
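  • As a sketch, the L1-norm pruning of a pointwise convolution could look as follows; masking filters to zero instead of physically removing them is an assumption made to keep the example short.

```python
@torch.no_grad()
def prune_pointwise(conv: nn.Conv2d, rate: float) -> None:
    """Zero out the fraction `rate` of 1x1 filters with the smallest L1 norm."""
    w = conv.weight                      # shape (N, M, 1, 1)
    l1 = w.abs().sum(dim=(1, 2, 3))      # L1 norm of each of the N filters
    n_keep = max(1, int((1 - rate) * w.size(0)))
    theta = l1.sort(descending=True).values[n_keep - 1]  # pruning threshold
    mask = (l1 >= theta).float().view(-1, 1, 1, 1)
    w.mul_(mask)                         # filters below theta are set to zero
```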
  • Fig. 1 is the overall flowchart of the present invention;
  • Fig. 2 shows the separable asymmetric convolution;
  • Fig. 3 shows the lightweight neural network model;
  • Fig. 4 illustrates the process of replacing a traditional fully connected layer's weight matrix with two separable weight matrices;
  • Fig. 5 is the pruning flowchart.
  • Referring to Fig. 1, the traffic sign recognition method based on a lightweight network includes the following steps:
  • Step 1: obtain original traffic sign picture data. A camera is used to capture a large number of traffic signs in road and street scenes, obtaining traffic sign pictures or videos at different times of day, in different weather conditions, and from different angles, thereby acquiring the original traffic sign data;
  • Step 2: data preprocessing. Determine the size of the traffic sign image data, select suitable candidate boxes, and crop the original images; uniformly resize the cropped color (RGB) pictures to a resolution of 64×64; divide the cropped data into categories and expand each category with data augmentation, which includes slight horizontal or vertical translation of the images, adjustment of image saturation, whitening, and so on, so that the number of samples per traffic sign category is consistent. Then label the traffic sign images, and finally split the training set and test set at a ratio of 8:2 to construct the traffic sign dataset;
  • Step 3: build the lightweight neural network model, feed the traffic sign dataset preprocessed in Step 2 into the network in batches for training, and evaluate the trained model on the test set part of the dataset;
  • Step 4: check whether the recognition accuracy of the model on the test set reaches 90% or above. If not, adjust hyperparameters such as the learning rate, batch size, and number of iterations and go to Step 3; if the result is satisfactory, go to Step 5;
  • Step 5: prune the trained network model with an initial pruning rate of 50%, and retrain the pruned network model;
  • Step 6: after pruning, check the recognition accuracy of the retrained pruned model. If the final accuracy loss is within 1%, save the model, continue to increase the pruning rate in steps of 2%, and go to Step 5. If the accuracy loss exceeds 1%, judge whether this is the result of the first pruning: if so, reduce the pruning rate by a 10% step and return to Step 5; otherwise, go to Step 7;
  • Step 7: save the lightweight neural network model from the last pruning;
  • Step 8: deploy the lightweight neural network model from the last pruning in the vehicle-mounted system to recognize traffic signs on the road, and display and/or voice-announce the recognition results.
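  • The Step 4 to Step 8 control flow can be summarized by the following sketch; `prune_and_retrain` and `evaluate` stand in for routines the text describes only at this level of detail, and rates are written as fractions.

```python
def compress(model, baseline_acc, prune_and_retrain, evaluate):
    rate, first_pruning, saved = 0.50, True, None
    while True:
        candidate = prune_and_retrain(model, rate)
        acc_loss = baseline_acc - evaluate(candidate)
        if acc_loss <= 0.01:          # loss within 1%: save and prune harder
            saved, first_pruning = candidate, False
            rate += 0.02
        elif first_pruning:           # first pruning too aggressive: back off 10%
            rate -= 0.10
            first_pruning = False
        else:                         # Step 7: keep the last saved model
            return saved
```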
  • Referring to Fig. 3, the lightweight neural network model includes a convolutional feature extraction part and a classifier part;
  • the convolutional feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of self-designed separable asymmetric convolution modules, in which:
  • the traditional 3×3 convolution has 3 input channels, 64 output channels, a 3×3 kernel size, 64 kernels, a stride of 1, and zero padding;
  • after the traditional 3×3 convolution, feature maps of size 64×64 with 64 channels are obtained.
  • The separable asymmetric convolution is divided into a first separable asymmetric convolution and a second separable asymmetric convolution.
  • The first separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size. The corresponding elements of the two feature maps are summed, each summed channel is batch-normalized and passed through the ReLU activation function in turn, the newly formed channels are merged and channel-shuffled, and finally a 1×1 convolution with a stride of 1 is applied to the output channels, with the number of convolution kernels set equal to the number of input channels.
  • The second separable asymmetric convolution proceeds in the same way, except that the final 1×1 convolution uses a stride of 2, completing the downsampling of the feature map; the number of convolution kernels again equals the number of input channels.
  • Separable asymmetric convolution layers 2-5 use the first separable asymmetric convolution, in which the solid-line part of the residual connection uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 2-5, feature maps of size 64×64 with 64 channels are obtained.
  • Layer 6 uses the second separable asymmetric convolution; after layer 6, feature maps of size 32×32 with 64 channels are obtained.
  • Layers 7-11 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 7-11, feature maps of size 32×32 with 64 channels are obtained.
  • Layer 12 uses the second separable asymmetric convolution; after layer 12, feature maps of size 16×16 with 64 channels are obtained.
  • Layers 13-15 use the first separable asymmetric convolution, with the same dotted-line (stride 2) and solid-line (stride 1) 1×1 residual convolutions of 64 kernels; after layers 13-15, feature maps of size 16×16 with 64 channels are obtained.
  • Layer 16 uses the second separable asymmetric convolution; after layer 16, feature maps of size 8×8 with 64 channels are obtained.
  • Layer 17 uses the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels; after layer 17, feature maps of size 8×8 with 64 channels are obtained.
  • To speed up training convergence, a BN layer and an activation layer are added after each convolution operation in the separable asymmetric convolution module; the activation functions used in the activation layers are all ReLU functions.
  • To further reduce the number of parameters, the classifier part is connected after the feature extraction part and designed as three layers of separable fully connected modules.
  • The first-layer separable fully connected module first reshapes the previous layer's 8×8, 64-channel feature maps into a 64×64 matrix, then initializes two weight matrices A-1 (64×64) and B-1 (64×64); matrix A-1 is multiplied with the reshaped input, and the result is multiplied with matrix B-1, giving a 64×64 output matrix for the next layer.
  • The second-layer separable fully connected module initializes two weight matrices A-2 (64×64) and B-2 (64×64); matrix A-2 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-2, giving a 64×64 output matrix for the next layer.
  • The third-layer separable fully connected module initializes two weight matrices A-3 (1×64) and B-3 (64×64); matrix A-3 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-3, giving a 1×64 output matrix.
  • Finally, the output matrix is flattened, and the softmax activation function is used for the recognition task over 64 categories of traffic signs.
  • To reduce the overall parameter count and computation, model pruning is performed on the trained lightweight neural network model.
  • The depthwise separable convolution in the MobileNetV1 network consists of two convolution types: the first is a channel-by-channel grouped convolution, and the second is a pointwise convolution. Ignoring the bias parameters, the parameter count of the channel-by-channel grouped convolution is:
  • R_1 = D_K × D_K × 1 × 1 × M
  • where D_K × D_K is the kernel size (3×3 is common in practice) and M is the number of input channels.
  • The parameter count of the pointwise convolution is:
  • R_2 = 1 × 1 × M × N
  • where N is the number of convolution kernels, i.e., the number of output channels.
  • The separable asymmetric convolution module designed for this method likewise consists of two convolution types. Unlike the first convolution of depthwise separable convolution, this method first separates the features of each input channel, then applies a 1×3 convolution and a 3×1 convolution with a stride of 1 to each channel, using corresponding paddings for the different convolutions so that the two single-channel feature maps produced after the nonlinear ReLU activation function have the same size; the corresponding elements of the two feature maps are then summed. Ignoring the bias parameters, the parameter count of this process can be expressed as:
  • R_3 = (1 × D_K + D_K × 1) × 1 × 1 × M
  • The parameter saving of the separable asymmetric convolution module designed by this method relative to MobileNetV1 depends on the difference in the first convolution type, and amounts to:
  • R_1 - R_3 = (D_K × D_K - 2 × D_K) × M
  • With D_K = 3 and M = 64, this saves 192 parameters per module.
  • For the lightweight separable fully connected module designed for this method, consider the traditional fully connected module in the figure. If the input vector is X ∈ R^n and the output vector is Y ∈ R^m, the fully connected layer can be expressed as:
  • Y = σ(WX + b)    (1)
  • where W ∈ R^(m×n) and b ∈ R^m are the learnable weight matrix and bias, and σ(·) is a nonlinear activation function. The weight matrix W can be decomposed into two small weight matrices A ∈ R^(a×h) and B ∈ R^(w×b), with n = hw and m = ab, such that:
  • vec⁻¹(Y) = σ(A vec⁻¹(X) B + vec⁻¹(b))
  • where vec⁻¹(·) is the operation that converts a column vector into the corresponding matrix. Therefore, the fully connected layer of (1) can be rewritten as the product of the input matrix vec⁻¹(X) with two small parameter matrices A and B; the rewritten structure is called a separable layer, in which Ŷ = vec⁻¹(Y) ∈ R^(a×b) is the output of the separable layer, X̂ = vec⁻¹(X) ∈ R^(h×w) is its input, and the bias is learnable. The parameter count of a separable layer is ha + wb + hw, while that of a fully connected layer is ab × hw + hw. Ignoring the bias parameters in both cases, the ratio of the parameter counts can be expressed as:
  • (ha + wb) / (ab × hw) = 1/(ah) + 1/(bw)
  • Since a, b, h, and w are in general much larger than 1, this ratio is far smaller than 1, so replacing fully connected layers with separable layers greatly reduces the number of parameters.
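  • For the concrete 64×64 case used in this classifier (h = w = a = b = 64), the ratio can be checked numerically:

```python
h = w = a = b = 64
fc_params = (a * b) * (h * w)   # dense weight: 4096 x 4096 = 16777216
sep_params = h * a + w * b      # two 64x64 factors:        8192
print(sep_params / fc_params)   # 0.00048828125 = 1/(a*h) + 1/(b*w)
```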
  • In the separable asymmetric convolution module, most of the computation is concentrated in the pointwise convolution, so pruning focuses on the pointwise convolution layers. Suppose the channels merged after channel shuffling number M, denoted (F_1, F_2, ..., F_M), each channel of size D_K × D_K; a filter K then has size 1 × 1 × M with weights (k_1, ..., k_M), and the convolution with a 1 × 1 × M filter K can be described as:
  • G = Σ_{i=1..M} k_i F_i
  • This yields one output feature map G, where k_i F_i denotes multiplying the weight coefficient k_i with every element of the feature map F_i. For N filters, N feature maps (G_1, ..., G_N) are obtained.
  • This pruning method ranks the convolution kernels by importance, computing the L1 norm of each trained pointwise convolution filter, namely:
  • ||K||_1 = Σ_{i=1..M} |k_i|
  • The importance of the channels is sorted by the size of the L1 norm: the larger the L1 norm, the more important the convolution filter. Figure 5 shows the pruning process of this method; if the L1 norm corresponding to a dotted-line filter is small, the corresponding filter is deleted.
  • In the concrete implementation, the traffic sign deep neural network model is compressed mainly by balancing the pruning rate λ (initially set to 50%) against the accuracy.
  • An accuracy-drop threshold η (1%) is first defined for the model to ensure that compression stays within the allowed range of accuracy loss.
  • The method first computes the L1 norm of each filter in the pointwise convolution, sorts the L1 norms in ascending order, and determines the pruning threshold θ from the pruning rate λ according to:
  • n_p = (1 - λ) n_w
  • where n_w is the number of filters in the pointwise convolution and n_p is the number of filters remaining after pruning. The L1 norms of the filters W are then traversed from largest to smallest while counting; when the count reaches n_p, the L1 norm value at that point is the pruning threshold θ. Filters whose L1 norm is below θ are set to zero, yielding the pruned model W_p.
  • After pruning, the model is fine-tuned and retrained. If the accuracy loss after pruning exceeds 1%, it is judged whether this is the result of the first pruning; if so, the pruning rate is reduced by a 10% step. If the final accuracy loss is within 1%, the model is saved and the pruning rate continues to be increased in 2% steps, and pruning continues until the accuracy loss exceeds the set accuracy-drop threshold η (1%); at that point, the last saved pruned model is the required traffic sign recognition model.
  • The model can thus guarantee a large compression ratio while maintaining accuracy.
  • The present invention provides a traffic sign recognition method based on a lightweight neural network.
  • The lightweight neural network model is obtained by constructing a lightweight neural network model and then training and pruning it.
  • The lightweight neural network model includes a convolutional feature extraction part and a classifier part; the feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of separable asymmetric convolution, and the classifier part consists of three layers of separable fully connected modules.
  • This recognition method maintains recognition accuracy while reducing the parameter scale and computational load of the network, thereby improving the recognition speed of the neural network model deployed in a vehicle-mounted platform environment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

A traffic sign recognition method based on a lightweight neural network: a lightweight neural network model is constructed, trained, and pruned to obtain the final model. The lightweight neural network model includes a convolutional feature extraction part and a classifier part; the feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of separable asymmetric convolution, and the classifier part consists of three layers of separable fully connected modules. The method reduces the parameter scale and computational load of the network while maintaining recognition accuracy, thereby improving the recognition speed of the neural network model when deployed in a vehicle-mounted platform environment.

Description

A Traffic Sign Recognition Method Based on a Lightweight Network
Technical Field
The invention relates to a traffic sign recognition method.
Background Art
As a relatively mature application in the field of computer vision, image recognition has attracted increasing attention from all walks of life. In academia, image recognition competitions on public datasets emerge one after another, and the various convolutional neural network models designed for them keep setting new performance records. In industry, image recognition is applied in many areas such as face recognition, traffic sign recognition, and food safety inspection.
Owing to the superior performance of convolutional neural networks in image recognition, many intelligent applications now need to be deployed on small mobile or embedded terminal devices, while traffic sign recognition algorithms based on convolutional neural networks place high demands on the computing power and storage space of the computing platform, which hinders their use on intelligent terminal devices. Therefore, making the convolutional-neural-network-based traffic sign recognition algorithm lightweight and pruning the model can greatly reduce the computational cost and storage requirements of the algorithm, enabling it to run quickly and accurately on a vehicle-mounted platform, which has important practical value.
Summary of the Invention
The purpose of the present invention is to address the deficiencies of the prior art and provide a method that reduces the parameter scale and computational load of the network while maintaining recognition accuracy, thereby improving the recognition speed of the neural network model deployed in a vehicle-mounted platform environment.
The purpose of the present invention is achieved through the following technical solution:
A traffic sign recognition method based on a lightweight neural network, characterized in that it comprises the following steps:
Step 1: obtain original traffic sign image data;
Step 2: data preprocessing: preprocess the original traffic sign image data to obtain a traffic sign dataset with a training set and a test set;
Step 3: set the initial training hyperparameters, input the training set part of the traffic sign dataset into the lightweight neural network model for training, and evaluate the trained lightweight neural network model on the test set part of the dataset;
Step 4: check whether the recognition accuracy of the model on the test set reaches 90% or above; if not, adjust the training hyperparameters and go to Step 3; otherwise, go to Step 5;
Step 5: prune the lightweight neural network model with an initial pruning rate of 50%, then retrain the pruned model on the training set of the traffic sign image data and evaluate the retrained pruned model on the test set of the traffic sign dataset;
Step 6: check the recognition accuracy of the retrained pruned model; if the accuracy loss is within 1%, save the model, continue to increase the pruning rate in steps of 2%, and go to Step 5; if the accuracy loss exceeds 1%, judge whether this is the result of the first pruning: if so, reduce the pruning rate by a 10% step and return to Step 5; if not, go to Step 7;
Step 7: save the lightweight neural network model from the last pruning;
Step 8: deploy the lightweight neural network model from the last pruning in the vehicle-mounted system to recognize traffic signs on the road.
The lightweight neural network model includes a convolutional feature extraction part and a classifier part. The feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of separable asymmetric convolution, the separable asymmetric convolution including a first separable asymmetric convolution and a second separable asymmetric convolution. The first separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size. The corresponding elements of the two single-channel feature maps are summed, each summed channel is batch-normalized and passed through the ReLU activation function in turn, the newly formed channels are merged and channel-shuffled, and finally a 1×1 convolution with a stride of 1 is applied to the output channels, with the number of convolution kernels set equal to the number of input channels.
The second separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size. The corresponding elements of the two single-channel feature maps are summed, each summed channel is batch-normalized and passed through the ReLU activation function in turn, the newly formed channels are merged and channel-shuffled, and finally a 1×1 convolution with a stride of 2 is applied to the output channels, completing the downsampling of the feature map, with the number of convolution kernels set equal to the number of input channels.
The traditional 3×3 convolution has the following structure: 3 input channels, 64 output channels, a 3×3 kernel size, 64 kernels, a stride of 1, and zero padding; after the traditional 3×3 convolution, feature maps of size 64×64 with 64 channels are obtained.
Separable asymmetric convolution layers 2-5 use the first separable asymmetric convolution, in which the solid-line part of the residual connection uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 2-5, feature maps of size 64×64 with 64 channels are obtained;
layer 6 uses the second separable asymmetric convolution; after layer 6, feature maps of size 32×32 with 64 channels are obtained;
layers 7-11 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 7-11, feature maps of size 32×32 with 64 channels are obtained;
layer 12 uses the second separable asymmetric convolution; after layer 12, feature maps of size 16×16 with 64 channels are obtained;
layers 13-15 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 13-15, feature maps of size 16×16 with 64 channels are obtained;
layer 16 uses the second separable asymmetric convolution; after layer 16, feature maps of size 8×8 with 64 channels are obtained;
layer 17 uses the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels; after layer 17, feature maps of size 8×8 with 64 channels are obtained.
A BN layer and an activation layer are added after each convolution operation in the separable asymmetric convolution; the activation functions used in the activation layers are all ReLU functions.
The classifier part consists of three layers of separable fully connected modules. The first-layer separable fully connected module first reshapes the previous layer's 8×8, 64-channel feature maps into a 64×64 matrix, then initializes two weight matrices A-1 (64×64) and B-1 (64×64); matrix A-1 is multiplied with the reshaped input, and the result is multiplied with matrix B-1, giving a 64×64 output matrix for the next layer;
the second-layer separable fully connected module initializes two weight matrices A-2 (64×64) and B-2 (64×64); matrix A-2 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-2, giving a 64×64 output matrix for the next layer;
the third-layer separable fully connected module initializes two weight matrices A-3 (1×64) and B-3 (64×64); matrix A-3 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-3, giving a 1×64 output matrix; finally, the output matrix is flattened by a Flatten operation and passed through the softmax activation function for the recognition task over 64 categories of traffic signs.
The data preprocessing includes determining the size of the traffic sign image data and selecting suitable candidate boxes to crop the original images, uniformly resizing the cropped color images to a resolution of 64×64, dividing the cropped data into categories, and expanding each category with data augmentation, which specifically includes slight horizontal or vertical translation of the images, adjustment of image saturation, and whitening, so that the number of samples per traffic sign category is consistent; the traffic sign image data are then labeled, and finally the training set and test set are split at a ratio of 8:2 to construct the traffic sign dataset.
The present invention has the following beneficial effects:
1. The present invention constructs a lightweight traffic sign recognition model through a lightweight neural network design and a model pruning method; the model has fewer parameters and a faster recognition speed, and can achieve high-precision traffic sign recognition on a vehicle-mounted platform.
2. The number of output channels of every convolution layer in the feature extraction part of the present invention is 64, and in many convolution layers the number of input channels equals the number of output channels; this greatly reduces memory access cost and thus speeds up the traffic sign recognition model.
3. The separable asymmetric convolution of the present invention requires fewer parameters than depthwise separable convolution. The overall network model also draws on the idea of residual learning, connecting the input feature map to the output through a bypass connection; this effectively prevents vanishing and exploding gradients and increases the stability of the network, improving the training effect.
4. The classifier part of the present invention uses fewer parameters than traditional fully connected layers; by decomposing the weight matrix of each fully connected layer and retraining two small weight matrices per layer, it not only reduces the number of parameters but also prevents the overfitting caused by an excessive number of parameters.
5. The model pruning method of the present invention operates on the depthwise separable asymmetric convolution: the importance of each convolution kernel in the pointwise convolution part is judged by computing its L1 norm, a pruning rate is set, and the kernels of the pointwise convolution part are pruned accordingly; the pruned model has fewer parameters and, to a certain extent, a regularization effect.
Brief Description of the Drawings
The present invention is described in further detail below with reference to the accompanying drawings.
Figure 1 is the overall flowchart of the present invention;
Figure 2 shows the separable asymmetric convolution;
Figure 3 shows the lightweight neural network model;
Figure 4 illustrates the process of replacing a traditional fully connected layer's weight matrix with two separable weight matrices;
Figure 5 is the pruning flowchart.
Detailed Description of the Embodiments
Referring to Figure 1, the traffic sign recognition method based on a lightweight network provided by the present invention includes the following steps:
Step 1: obtain original traffic sign picture data. A camera is used to capture a large number of traffic signs in road and street scenes, obtaining traffic sign pictures or videos at different times of day, in different weather conditions, and from different angles, thereby acquiring the original traffic sign data;
Step 2: data preprocessing. Determine the size of the traffic sign image data, select suitable candidate boxes, and crop the original images; uniformly resize the cropped color (RGB) pictures to a resolution of 64×64; divide the cropped data into categories and expand each category with data augmentation, which specifically includes slight horizontal or vertical translation of the images, adjustment of image saturation, whitening, and so on, so that the number of samples per traffic sign category is consistent; then label the traffic sign images, and finally split the training set and test set at a ratio of 8:2 to construct the traffic sign dataset;
Step 3: build the lightweight neural network model, feed the traffic sign dataset preprocessed in Step 2 into the network in batches for training, and evaluate the trained model on the test set part of the dataset;
Step 4: check whether the recognition accuracy of the model on the test set reaches 90% or above; if not, adjust hyperparameters such as the learning rate, batch size, and number of iterations and go to Step 3; if the result is satisfactory, go to Step 5;
Step 5: prune the trained network model with an initial pruning rate of 50%, and retrain the pruned network model;
Step 6: after pruning, check the recognition accuracy of the retrained pruned model. If the final accuracy loss is within 1%, save the model, continue to increase the pruning rate in steps of 2%, and go to Step 5. If the accuracy loss exceeds 1%, judge whether this is the result of the first pruning: if so, reduce the pruning rate by a 10% step and return to Step 5; otherwise, go to Step 7;
Step 7: save the lightweight neural network model from the last pruning;
Step 8: deploy the lightweight neural network model from the last pruning in the vehicle-mounted system to recognize traffic signs on the road, and display and/or voice-announce the recognition results.
Referring to Figure 3, the lightweight neural network model includes a convolutional feature extraction part and a classifier part; the feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of self-designed separable asymmetric convolution modules, in which:
Traditional 3×3 convolution structure: 3 input channels, 64 output channels, a 3×3 kernel size, 64 kernels, a stride of 1, and zero padding; after the traditional 3×3 convolution, feature maps of size 64×64 with 64 channels are obtained.
The separable asymmetric convolution is divided into a first separable asymmetric convolution and a second separable asymmetric convolution.
The first separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size. The corresponding elements of the two single-channel feature maps are summed, each summed channel is batch-normalized and passed through the ReLU activation function in turn, the newly formed channels are merged and channel-shuffled, and finally a 1×1 convolution with a stride of 1 is applied to the output channels, with the number of convolution kernels set equal to the number of input channels.
The second separable asymmetric convolution proceeds in the same way, except that the final 1×1 convolution over the output channels uses a stride of 2, completing the downsampling of the feature map; the number of convolution kernels again equals the number of input channels.
Separable asymmetric convolution layers 2-5 use the first separable asymmetric convolution, in which the solid-line part of the residual connection uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 2-5, feature maps of size 64×64 with 64 channels are obtained.
Layer 6 uses the second separable asymmetric convolution; after layer 6, feature maps of size 32×32 with 64 channels are obtained.
Layers 7-11 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 7-11, feature maps of size 32×32 with 64 channels are obtained.
Layer 12 uses the second separable asymmetric convolution; after layer 12, feature maps of size 16×16 with 64 channels are obtained.
Layers 13-15 use the first separable asymmetric convolution, with the same dotted-line (stride 2) and solid-line (stride 1) 1×1 residual convolutions of 64 kernels; after layers 13-15, feature maps of size 16×16 with 64 channels are obtained.
Layer 16 uses the second separable asymmetric convolution; after layer 16, feature maps of size 8×8 with 64 channels are obtained.
Layer 17 uses the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels; after layer 17, feature maps of size 8×8 with 64 channels are obtained.
To speed up training convergence, a BN layer and an activation layer are added after each convolution operation in the separable asymmetric convolution module; the activation functions used in the activation layers are all ReLU functions.
To further reduce the number of parameters, the classifier part is connected after the feature extraction part and designed as three layers of separable fully connected modules. The first-layer separable fully connected module first reshapes the previous layer's 8×8, 64-channel feature maps into a 64×64 matrix, then initializes two weight matrices A-1 (64×64) and B-1 (64×64); matrix A-1 is multiplied with the reshaped input, and the result is multiplied with matrix B-1, giving a 64×64 output matrix for the next layer.
The second-layer separable fully connected module initializes two weight matrices A-2 (64×64) and B-2 (64×64); matrix A-2 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-2, giving a 64×64 output matrix for the next layer.
The third-layer separable fully connected module initializes two weight matrices A-3 (1×64) and B-3 (64×64); matrix A-3 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-3, giving a 1×64 output matrix. Finally, the output matrix is flattened by a Flatten operation and passed through the softmax activation function for the recognition task over 64 categories of traffic signs.
Further, to reduce the overall parameter count and computation, model pruning is performed on the trained lightweight neural network model.
The depthwise separable convolution in the MobileNetV1 network consists of two convolution types: the first is a channel-by-channel grouped convolution, and the second is a pointwise convolution. Ignoring the influence of bias parameters, the parameter count of the channel-by-channel grouped convolution is:
R_1 = D_K × D_K × 1 × 1 × M
where D_K × D_K is the kernel size (3×3 is common in practice) and M is the number of input channels. The parameter count of the pointwise convolution is:
R_2 = 1 × 1 × M × N
where N is the number of convolution kernels, i.e., the number of output channels.
As shown in Figure 2, the separable asymmetric convolution module designed for this method likewise consists of two convolution types. Unlike the first convolution of depthwise separable convolution, our method first separates the features of each input channel, then applies a 1×3 convolution and a 3×1 convolution with a stride of 1 to each channel, using corresponding paddings for the different convolutions so that the two single-channel feature maps produced after the nonlinear ReLU activation function have the same size; the corresponding elements of the two feature maps are then summed. Ignoring the influence of bias parameters, the parameter count of this process can be expressed as:
R_3 = (1 × D_K + D_K × 1) × 1 × 1 × M
Unlike the second convolution of depthwise separable convolution, although both use pointwise convolution, our method shuffles the merged channels before that convolution, effectively resolving the poor flow of information between channels and improving the feature extraction ability of the separable asymmetric convolution module. The parameter saving of the separable asymmetric convolution module designed by this method relative to MobileNetV1 therefore depends on the difference in the first convolution type, and amounts to:
R_1 - R_3 = (D_K × D_K - 2 × D_K) × M
When D_K = 3 and M = 64, under the same conditions, our method saves 192 parameters compared with the depthwise separable convolution in the MobileNetV1 network.
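The saving can be verified with a few lines of Python (illustrative only):

```python
D_K, M = 3, 64
R1 = D_K * D_K * 1 * 1 * M            # depthwise 3x3 in MobileNetV1: 576
R3 = (1 * D_K + D_K * 1) * 1 * 1 * M  # 1x3 plus 3x1 branches:        384
print(R1 - R3)                        # 192 parameters saved per module
```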
As shown in Figure 4, for the lightweight separable fully connected module designed in this method, consider the traditional fully connected module in the figure. If the input vector is X ∈ R^n and the output vector is Y ∈ R^m, the fully connected layer can be expressed as:
Y = σ(WX + b)    (1)
where W ∈ R^(m×n) and b ∈ R^m denote the learnable weight matrix and bias, respectively, and σ(·) is a nonlinear activation function. The weight matrix W can be decomposed into two small weight matrices A ∈ R^(a×h) and B ∈ R^(w×b) such that:
vec⁻¹(Y) = σ(A vec⁻¹(X) B + vec⁻¹(b))
where n = hw and m = ab. Here vec⁻¹(·) is the operation that converts a column vector into the corresponding matrix. Therefore, the fully connected layer of (1) can be rewritten as the product of the input matrix vec⁻¹(X) with the two small parameter matrices A and B; we call the rewritten structure a separable layer, in which Ŷ = vec⁻¹(Y) ∈ R^(a×b) is the output of the separable layer, X̂ = vec⁻¹(X) ∈ R^(h×w) is its input, and b̂ = vec⁻¹(b) is a learnable bias. Accordingly, the parameter count of a separable layer is ha + wb + hw, while that of a fully connected layer is ab × hw + hw. Ignoring the influence of the bias parameters in both cases, the ratio of parameter counts can be expressed as:
(ha + wb) / (ab × hw) = 1/(ah) + 1/(bw)
Since a, b, h, and w are in general much larger than 1, it can be inferred that this ratio is far smaller than 1. Therefore, replacing the fully connected layer with the separable layer structure greatly reduces the number of parameters.
In the separable asymmetric convolution module, as shown in Figure 2, most of the computation is concentrated in the pointwise convolution, so this method focuses pruning on the pointwise convolution layers. Suppose the channels merged after channel shuffling in Figure 2 number M, denoted (F_1, F_2, ..., F_M), each channel of size D_K × D_K; a filter K then has size 1 × 1 × M with weights (k_1, k_2, ..., k_M), and the convolution with a 1 × 1 × M filter K can be described as:
G = Σ_{i=1..M} k_i F_i
This yields one output feature map G, where k_i F_i denotes multiplying the weight coefficient k_i with every element of the feature map F_i. For N filters, N feature maps are obtained, which can be written as (G_1, G_2, ..., G_N).
This pruning method ranks the convolution kernels by importance, computing the L1 norm of each trained pointwise convolution filter, namely:
||K||_1 = Σ_{i=1..M} |k_i|
The importance of the channels is sorted by the size of the L1 norm, i.e., the larger the L1 norm, the more important the convolution filter. Figure 5 shows the pruning process of this method: if the L1 norm corresponding to a dotted-line filter is small, the corresponding filter is deleted.
In the concrete implementation, the traffic sign deep neural network model is compressed mainly by balancing the relationship between the pruning rate λ (the initial pruning rate is set to 50%) and the accuracy. Specifically, an accuracy-drop threshold η (1%) of the model is first defined to ensure that model compression is carried out within the allowed range of accuracy loss. The method first computes the L1 norm of each filter in the pointwise convolution, then sorts the L1 norms in ascending order and determines the pruning threshold θ from the pruning rate λ according to:
n_p = (1 - λ) n_w
where n_w is the number of filters in the pointwise convolution and n_p is the number of filters after pruning. The L1 norms of the filters W are then traversed from largest to smallest while counting; when the count reaches n_p, the L1 norm value at that point is the pruning threshold θ. The filters whose L1 norm is smaller than the pruning threshold θ are set to zero, finally yielding the pruned model W_p:
W_p^(i) = W^(i) if ||W^(i)||_1 ≥ θ, and W_p^(i) = 0 otherwise.
After pruning, the model is fine-tuned and retrained. If the accuracy loss of the pruned model exceeds 1%, it is judged whether this is the result of the first pruning; if so, the pruning rate is reduced by a 10% step. If the final accuracy loss after training is within 1%, the model is saved and the pruning rate continues to be increased in 2% steps; pruning continues until the accuracy loss exceeds the set accuracy-drop threshold η (1%), at which point the last saved pruned model is the required traffic sign recognition model. The model can thus guarantee a large compression ratio while maintaining accuracy.
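A sketch of the threshold computation described above, under the assumption that filters are masked to zero rather than physically removed:

```python
import torch

def pruning_threshold(l1_norms: torch.Tensor, lam: float) -> torch.Tensor:
    """theta such that n_p = (1 - lam) * n_w filters survive."""
    n_w = l1_norms.numel()
    n_p = max(1, int((1 - lam) * n_w))
    return l1_norms.sort(descending=True).values[n_p - 1]

def apply_pruning(weight: torch.Tensor, lam: float) -> torch.Tensor:
    # weight: (N, M, 1, 1) pointwise filters; returns the pruned W_p.
    l1 = weight.abs().sum(dim=(1, 2, 3))
    theta = pruning_threshold(l1, lam)
    return weight * (l1 >= theta).float().view(-1, 1, 1, 1)
```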
The traffic sign recognition accuracy and speed of this embodiment were tested through the following experiments.
An MPSoC ZCU106 development board was used as the embedded test platform to test the lightweight traffic sign recognition model proposed by the present invention. The experimental procedure was as follows:
1) Train the designed lightweight neural network on a GPU platform with the processed traffic sign dataset, and prune the trained lightweight neural network model to obtain the pruned model.
2) Deploy the trained network model on the ARM processor through format conversion.
3) Test the lightweight traffic sign recognition network with the processed traffic sign dataset; the ncnn deep learning inference framework was used for deployment and testing, verifying the practicality of the method on the embedded platform.
The above is only a preferred embodiment of the present invention and does not limit the scope of implementation of the present invention; equivalent changes and modifications made according to the scope of the patent claims and the contents of the specification shall remain within the scope covered by this patent.
Industrial Applicability
The present invention is a traffic sign recognition method based on a lightweight neural network: a lightweight neural network model is constructed, trained, and pruned to obtain the final model. The lightweight neural network model includes a convolutional feature extraction part and a classifier part; the feature extraction part consists of 1 layer of traditional 3×3 convolution and 16 layers of separable asymmetric convolution, and the classifier part consists of three layers of separable fully connected modules. The method reduces the parameter scale and computational load of the network while maintaining recognition accuracy, thereby improving the recognition speed of the neural network model when deployed in a vehicle-mounted platform environment.

Claims (8)

  1. A traffic sign recognition method based on a lightweight neural network, characterized in that it comprises the following steps:
    Step 1: obtain original traffic sign image data;
    Step 2: data preprocessing: preprocess the original traffic sign image data to obtain a traffic sign dataset with a training set and a test set;
    Step 3: set the initial training hyperparameters, input the training set part of the traffic sign dataset into the lightweight neural network model for training, and evaluate the trained lightweight neural network model on the test set part of the dataset;
    Step 4: check whether the recognition accuracy of the model on the test set reaches 90% or above; if not, adjust the training hyperparameters and go to Step 3; otherwise, go to Step 5;
    Step 5: prune the lightweight neural network model with an initial pruning rate of 50%, then retrain the pruned model on the training set of the traffic sign image data and evaluate the retrained pruned model on the test set of the traffic sign dataset;
    Step 6: check the recognition accuracy of the retrained pruned model; if the accuracy loss is within 1%, save the model, continue to increase the pruning rate in steps of 2%, and go to Step 5; if the accuracy loss exceeds 1%, judge whether this is the result of the first pruning: if so, reduce the pruning rate by a 10% step and return to Step 5; if not, go to Step 7;
    Step 7: save the lightweight neural network model from the last pruning;
    Step 8: deploy the lightweight neural network model from the last pruning in the vehicle-mounted system to recognize traffic signs on the road.
  2. The traffic sign recognition method based on a lightweight network according to claim 1, characterized in that the lightweight neural network model includes a convolutional feature extraction part and a classifier part;
    the convolutional feature extraction part includes 1 layer of traditional 3×3 convolution and 16 layers of separable asymmetric convolution, the separable asymmetric convolution including a first separable asymmetric convolution and a second separable asymmetric convolution;
    the classifier part includes three layers of separable fully connected modules.
  3. The traffic sign recognition method based on a lightweight network according to claim 2, characterized in that the first separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size; the corresponding elements of the two single-channel feature maps are summed, and each summed channel is batch-normalized and passed through the ReLU activation function in turn; the newly formed channels are then merged and channel-shuffled, and finally a 1×1 convolution with a stride of 1 is applied to the output channels, with the number of convolution kernels set equal to the number of input channels;
    the second separable asymmetric convolution first separates the features of each input channel; each channel then passes through a 1×3 convolution and a 3×1 convolution with a stride of 1 and zero padding, each followed by the nonlinear ReLU activation function, yielding two single-channel feature maps of the same size; the corresponding elements of the two single-channel feature maps are summed, and each summed channel is batch-normalized and passed through the ReLU activation function in turn; the newly formed channels are then merged and channel-shuffled, and finally a 1×1 convolution with a stride of 2 is applied to the output channels, completing the downsampling of the feature map, with the number of convolution kernels set equal to the number of input channels.
  4. The traffic sign recognition method based on a lightweight network according to claim 2, characterized in that the traditional 3×3 convolution has the following structure: 3 input channels, 64 output channels, a 3×3 kernel size, 64 kernels, a stride of 1, and zero padding; after the traditional 3×3 convolution, feature maps of size 64×64 with 64 channels are obtained.
  5. The traffic sign recognition method based on a lightweight network according to claim 3, characterized in that separable asymmetric convolution layers 2-5 use the first separable asymmetric convolution, in which the solid-line part of the residual connection uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 2-5, feature maps of size 64×64 with 64 channels are obtained;
    layer 6 uses the second separable asymmetric convolution; after layer 6, feature maps of size 32×32 with 64 channels are obtained;
    layers 7-11 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 7-11, feature maps of size 32×32 with 64 channels are obtained;
    layer 12 uses the second separable asymmetric convolution; after layer 12, feature maps of size 16×16 with 64 channels are obtained;
    layers 13-15 use the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels, and the solid-line part uses a 1×1 convolution with a stride of 1 and 64 kernels; after layers 13-15, feature maps of size 16×16 with 64 channels are obtained;
    layer 16 uses the second separable asymmetric convolution; after layer 16, feature maps of size 8×8 with 64 channels are obtained;
    layer 17 uses the first separable asymmetric convolution, in which the dotted-line part of the residual connection uses a 1×1 convolution with a stride of 2 and 64 kernels; after layer 17, feature maps of size 8×8 with 64 channels are obtained.
  6. The traffic sign recognition method based on a lightweight network according to claim 5, characterized in that a BN layer and an activation layer are added after each convolution operation in the separable asymmetric convolution, and the activation functions used in the activation layers are all ReLU functions.
  7. The traffic sign recognition method based on a lightweight network according to claim 2, 3, 4, 5 or 6, characterized in that the first-layer separable fully connected module of the classifier part first reshapes the previous layer's 8×8, 64-channel feature maps into a 64×64 matrix, then initializes two weight matrices A-1 (64×64) and B-1 (64×64); matrix A-1 is multiplied with the reshaped input, and the result is multiplied with matrix B-1, giving a 64×64 output matrix for the next layer;
    the second-layer separable fully connected module initializes two weight matrices A-2 (64×64) and B-2 (64×64); matrix A-2 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-2, giving a 64×64 output matrix for the next layer;
    the third-layer separable fully connected module initializes two weight matrices A-3 (1×64) and B-3 (64×64); matrix A-3 is multiplied with the previous layer's 64×64 output matrix, and the result is multiplied with matrix B-3, giving a 1×64 output matrix; finally, the output matrix is flattened by a Flatten operation and passed through the softmax activation function for the recognition task over 64 categories of traffic signs.
  8. The traffic sign recognition method based on a lightweight network according to claim 1, 2, 3 or 4, characterized in that the data preprocessing includes determining the size of the traffic sign image data and selecting suitable candidate boxes to crop the original traffic sign images, uniformly resizing the cropped color images to a resolution of 64×64, dividing the cropped data into categories, and expanding each category with data augmentation, which specifically includes slight horizontal or vertical translation of the images, adjustment of image saturation, and whitening, so that the number of samples per traffic sign category is consistent; the traffic sign image data are then labeled, and the training set and test set are split at a ratio of 8:2 to construct the traffic sign dataset.
PCT/CN2021/107294 2021-03-29 2021-07-20 Traffic sign recognition method based on a lightweight network WO2022205685A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/340,090 US11875576B2 (en) 2021-03-29 2023-06-23 Traffic sign recognition method based on lightweight neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110334426.0 2021-03-29
CN202110334426 2021-03-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/340,090 Continuation US11875576B2 (en) 2021-03-29 2023-06-23 Traffic sign recognition method based on lightweight neural network

Publications (1)

Publication Number Publication Date
WO2022205685A1 true WO2022205685A1 (zh) 2022-10-06

Family

ID=78090099

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/107294 WO2022205685A1 (zh) 2021-03-29 2021-07-20 Traffic sign recognition method based on a lightweight network

Country Status (3)

Country Link
US (1) US11875576B2 (zh)
CN (1) CN113537138B (zh)
WO (1) WO2022205685A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115810183A (zh) * 2022-12-09 2023-03-17 燕山大学 Traffic sign detection method based on an improved VFNet algorithm
CN116108403A (zh) * 2022-11-16 2023-05-12 北京理工大学 Shallow convolutional neural network structure with an attention mechanism, optimization method, and electronic device
CN116110022A (zh) * 2022-12-10 2023-05-12 河南工业大学 Lightweight traffic sign detection method and system based on response knowledge distillation
CN116405127A (zh) * 2023-06-09 2023-07-07 北京理工大学 Compression method and apparatus for an underwater acoustic communication preamble detection model
CN117336057A (zh) * 2023-10-10 2024-01-02 中国矿业大学(北京) Lightweight malicious traffic classification method based on deep learning

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022205685A1 (zh) 2021-03-29 2022-10-06 泉州装备制造研究所 Traffic sign recognition method based on a lightweight network
CN113723377B (zh) * 2021-11-02 2022-01-11 南京信息工程大学 Traffic sign detection method based on an LD-SSD network
CN114972952B (zh) * 2022-05-29 2024-03-22 重庆科技学院 Industrial part defect recognition method based on model lightweighting
CN116520990B (zh) * 2023-04-28 2023-11-24 暨南大学 Sign language recognition method, system, and glove based on a lightweight neural network
CN116704476B (zh) * 2023-06-12 2024-06-04 郑州轻工业大学 Traffic sign detection method based on an improved Yolov4-tiny algorithm
CN117113010B (zh) * 2023-10-24 2024-02-09 北京化工大学 Power transmission corridor safety monitoring method and system based on lightweight convolutional networks
CN117437519B (zh) * 2023-11-06 2024-04-12 北京市智慧水务发展研究院 Water level recognition method and device without a staff gauge
CN117231524B (zh) * 2023-11-14 2024-01-26 浙江嘉源和达水务有限公司 Pump cavitation state monitoring and diagnosis method and system
CN117593674B (zh) * 2024-01-18 2024-05-03 南昌大学 Lightweight real-time detection method for UAV aerial photography targets

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242180A (zh) * 2020-01-03 2020-06-05 南京邮电大学 Image recognition method and system based on a lightweight convolutional neural network
CN111444760A (zh) * 2020-02-19 2020-07-24 天津大学 Traffic sign detection and recognition method based on pruning and knowledge distillation
CN111626328A (zh) * 2020-04-16 2020-09-04 湘潭大学 Image recognition method and device based on a lightweight deep neural network
CN113537138A (zh) * 2021-03-29 2021-10-22 泉州装备制造研究所 Traffic sign recognition method based on a lightweight neural network

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11586417B2 (en) * 2018-09-28 2023-02-21 Qualcomm Incorporated Exploiting activation sparsity in deep neural networks
KR102646695B1 (ko) * 2019-01-15 2024-03-12 Portland State University Feature pyramid warping for video frame interpolation
US20200293864A1 (en) * 2019-03-12 2020-09-17 Qualcomm Incorporated Data-aware layer decomposition for neural network compression
CN110188705B (zh) * 2019-06-02 2022-05-06 东北石油大学 Long-range traffic sign detection and recognition method suitable for vehicle-mounted systems
US20220156554A1 (en) * 2019-06-04 2022-05-19 Northeastern University Lightweight Decompositional Convolution Neural Network
US20210089921A1 (en) * 2019-09-25 2021-03-25 Nvidia Corporation Transfer learning for neural networks
US11712224B2 (en) * 2019-10-11 2023-08-01 GE Precision Healthcare LLC Method and systems for context awareness enabled ultrasound scanning
CN110929603B (zh) * 2019-11-09 2023-07-14 北京工业大学 Weather image recognition method based on a lightweight convolutional neural network
WO2021166058A1 (ja) * 2020-02-18 2021-08-26 日本電気株式会社 Image recognition device, image recognition method, and recording medium
CN111311629B (zh) * 2020-02-21 2023-12-01 京东方科技集团股份有限公司 Image processing method, image processing apparatus, and device
CN111914797B (zh) * 2020-08-17 2022-08-12 四川大学 Traffic sign recognition method based on a multi-scale lightweight convolutional neural network
CN112001385B (zh) * 2020-08-20 2024-02-06 长安大学 Cross-domain target detection and understanding method, system, device, and storage medium
US11819363B2 (en) * 2020-09-01 2023-11-21 GE Precision Healthcare LLC Systems and methods to improve resolution of ultrasound images with a neural network
US12008695B2 (en) * 2020-09-25 2024-06-11 GE Precision Healthcare LLC Methods and systems for translating magnetic resonance images to pseudo computed tomography images
US11636608B2 (en) * 2020-10-21 2023-04-25 Smart Engines Service, LLC Artificial intelligence using convolutional neural network with Hough transform
US20220327189A1 (en) * 2021-04-09 2022-10-13 Qualcomm Incorporated Personalized biometric anti-spoofing protection using machine learning and enrollment data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111242180A (zh) * 2020-01-03 2020-06-05 南京邮电大学 Image recognition method and system based on a lightweight convolutional neural network
CN111444760A (zh) * 2020-02-19 2020-07-24 天津大学 Traffic sign detection and recognition method based on pruning and knowledge distillation
CN111626328A (zh) * 2020-04-16 2020-09-04 湘潭大学 Image recognition method and device based on a lightweight deep neural network
CN113537138A (zh) * 2021-03-29 2021-10-22 泉州装备制造研究所 Traffic sign recognition method based on a lightweight neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DANG LANXUE, PANG PEIDONG, LEE JAY: "Depth-Wise Separable Convolution Neural Network with Residual Connection for Hyperspectral Image Classification", REMOTE SENSING, vol. 12, no. 20, pages 3408, XP055972559, DOI: 10.3390/rs12203408 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108403A (zh) * 2022-11-16 2023-05-12 北京理工大学 Shallow convolutional neural network structure with an attention mechanism, optimization method, and electronic device
CN116108403B (zh) * 2022-11-16 2023-06-16 北京理工大学 Shallow convolutional neural network structure with an attention mechanism, optimization method, and electronic device
CN115810183A (zh) * 2022-12-09 2023-03-17 燕山大学 Traffic sign detection method based on an improved VFNet algorithm
CN115810183B (zh) * 2022-12-09 2023-10-24 燕山大学 Traffic sign detection method based on an improved VFNet algorithm
CN116110022A (zh) * 2022-12-10 2023-05-12 河南工业大学 Lightweight traffic sign detection method and system based on response knowledge distillation
CN116110022B (zh) * 2022-12-10 2023-09-05 河南工业大学 Lightweight traffic sign detection method and system based on response knowledge distillation
CN116405127A (zh) * 2023-06-09 2023-07-07 北京理工大学 Compression method and apparatus for an underwater acoustic communication preamble detection model
CN116405127B (zh) * 2023-06-09 2023-09-12 北京理工大学 Compression method and apparatus for an underwater acoustic communication preamble detection model
CN117336057A (zh) * 2023-10-10 2024-01-02 中国矿业大学(北京) Lightweight malicious traffic classification method based on deep learning

Also Published As

Publication number Publication date
CN113537138A (zh) 2021-10-22
CN113537138B (zh) 2023-04-18
US11875576B2 (en) 2024-01-16
US20230334872A1 (en) 2023-10-19

Similar Documents

Publication Publication Date Title
WO2022205685A1 (zh) Traffic sign recognition method based on a lightweight network
CN108830855B (zh) Fully convolutional network semantic segmentation method based on multi-scale low-level feature fusion
CN107292256B (zh) Facial expression recognition method using a deep convolutional wavelet neural network based on auxiliary tasks
CN111310773B (zh) Efficient license plate localization method using a convolutional neural network
CN112396002A (zh) Lightweight remote sensing target detection method based on SE-YOLOv3
CN114202672A (zh) Small target detection method based on an attention mechanism
CN113052210A (zh) Fast low-light target detection method based on a convolutional neural network
US20210158166A1 (en) Semi-structured learned threshold pruning for deep neural networks
CN111460980B (zh) Multi-scale detection method for small pedestrians based on multi-semantic feature fusion
CN111696101A (zh) Lightweight Solanaceae disease recognition method based on SE-Inception
CN109711422A (zh) Image data processing and model building method, apparatus, computer device, and storage medium
CN106156777B (zh) Text image detection method and device
CN113780211A (zh) Lightweight aircraft detection method based on improved Yolov4-tiny
CN114187450A (zh) Remote sensing image semantic segmentation method based on deep learning
CN113177560A (zh) Universal lightweight deep learning vehicle detection method
CN110991349B (zh) Lightweight vehicle attribute recognition method based on metric learning
CN113326930A (zh) Data processing method, neural network training method, and related apparatus and device
US20220132050A1 (en) Video processing using a spectral decomposition layer
US20220121949A1 (en) Personalized neural network pruning
CN111680739A (zh) Multi-task parallel method and system for target detection and semantic segmentation
CN114882530B (zh) Method for constructing a lightweight convolutional neural network model for pedestrian detection
CN109190666B (zh) Flower image classification method based on an improved deep neural network
CN108537226A (zh) License plate recognition method and device
CN113298032A (zh) Vehicle target detection method for UAV-view images based on deep learning
CN113449671A (zh) Pedestrian re-identification method and device with multi-scale multi-feature fusion

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21934329

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21934329

Country of ref document: EP

Kind code of ref document: A1