AU2017101803A4 - Deep learning based image classification of dangerous goods of gun type - Google Patents

Deep learning based image classification of dangerous goods of gun type

Info

Publication number
AU2017101803A4
Authority
AU
Australia
Prior art keywords
deep learning
self
model
training
dangerous goods
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
AU2017101803A
Inventor
Mufei Chen
Quansen Wang
Hang Yin
Renyuan ZHANG
Hongbo Zhu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chen Mufei Ms
Original Assignee
Chen Mufei Ms
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chen Mufei Ms filed Critical Chen Mufei Ms
Priority to AU2017101803A priority Critical patent/AU2017101803A4/en
Application granted granted Critical
Publication of AU2017101803A4 publication Critical patent/AU2017101803A4/en
Ceased legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

This invention lies in the field of digital image processing and is, in particular, a deep learning enabled technique for image classification of dangerous goods. The invention consists of the following steps: firearm image preprocessing and splitting of the post-processed data set into a training set and a test set, wherein the training set is cross-validated for hyper-parameter searching and model parameter learning. Each training image is treated as an NDArray and is then fed into a deep Convolutional Neural Network (CNN) for forward evaluation. The training process learns a set of optimal parameters across all layers of the neural net with the goal of minimizing the empirical loss. The set of learned parameters, together with our model, is then stored and used to predict labels for the test set. The closeness between the true labels and the predicted labels is the measure of model performance that we look for. The final validated deep learning model is able to classify new sensitive images and thus achieve the goal of dangerous goods identification with favorable accuracy. This invention does not require manual feature selection. It is a state-of-the-art image classification technique with high performance and reliability, based on deep learning, for identifying dangerous goods of gun type.

Description

DESCRIPTION
TITLE
Deep learning based image classification of dangerous goods of gun type
FIELD OF THE INVENTION
This invention is in the field of digital image processing and serves as a method of dangerous goods identification powered by deep learning.
BACKGROUND OF THE INVENTION
With the rapid popularization of the Internet, especially the mobile Internet, the amount of multimedia data stored in the form of images and video is increasing at a rate we could not previously imagine. Fast and accurate object detection in images from a variety of sources has therefore become a hotspot of scientific research. In this invention, we focus on identifying dangerous goods of gun type, including but not limited to handguns, submachine guns, and ammunition. Most traditional monitoring methods rely on human labor. However, given the scale of multimedia content, it is nearly impossible to satisfy the actual needs of Internet monitoring by manual inspection alone. Moreover, traditional visual recognition methods, such as SIFT and SURF, have limitations in extracting features from images. Other solutions that utilize deep learning are usually not end-to-end, resulting in massive storage cost and separate training stages. We must also deal with the problem of overfitting, because the network architecture consists of millions of parameters.
Deep learning is a subarea of machine learning. It has become the most intriguing research field since 2012, the year in which the Rectified Linear Unit (ReLU) was introduced to resolve the issue of gradient vanishing during back propagation. It overtakes other machine learning algorithms, such as SVM and logistic regression, in model performance. Since then, many emerging real-world applications and systems based on deep learning have caught up with human abilities in fields such as image classification, natural language processing, and decision making in complex systems.
The idea of artificial neural networks (ANN) was introduced in the 1950s to mathematically model the function of biological neurons. Early research on artificial neural networks reached a plateau several times. When back propagation is used to update the parameters of an ANN, the gradient of the loss function with respect to each weight starts to diminish as the network gets deeper. This is known as gradient vanishing, and performance drops if more layers are added to the network. Because of this, the deepest networks of the previous decade contained only about three layers, as opposed to the roughly 100 layers of the recently introduced Inception Net. After the invention of the SVM, artificial neural networks received little attention and research interest, because the former offered superior accuracy and is intrinsically a convex optimization problem that can be solved via its dual. Only in the last few years, with the development of high-performance computing platforms and big-data processing techniques, together with ReLU, has deep learning begun to attract attention, and it has so far been the most powerful component of Artificial Intelligence. The classical examples of deep learning are hand-written digit classification and house price prediction, which can also be handled by traditional machine learning models, but with lower scalability and performance. In this invention, we focus on real-world image classification and use TensorFlow as the deep learning framework to implement our model. We choose TensorFlow because it is developed and maintained by Google and has the largest deep learning community and number of GitHub contributors.
SUMMARY
In order to overcome the shortcomings and deficiencies of existing technology, this invention proposes an image classification method for firearms based on deep learning. By combining deep learning with image recognition for firearms and adopting a layer-by-layer initialization training mode, the proposed method significantly reduces training time and overcomes some of the technical difficulties of the training process. The method gives full play to the advantages of deep learning and automatic feature learning.
The technical solution of this invention is implemented as follows:
Our deep learning image classification method for firearms comprises a firearm image database, a convolutional neural network, and fully connected layers. The images of the firearm image database are imported into the model, and representative characteristics are output after forward evaluation. Finally, the classifier is able to identify images containing sensitive objects, and the result is presented.
In summary, the deep neural network contains an input layer, intermediate hidden layers, and an output layer. The following steps are included:
Step (1), the input layer preprocesses the original images, including cropping, scaling and segmentation (an illustrative preprocessing sketch follows this list).
Step (2), the hidden layers combine bottom-up unsupervised learning and top-down supervised learning.
Step (3), the output layer creates CNN feature maps, including combinations of peripheral features, morphological features, and color features of the original image; these are used for the classification task processed in the fully connected (FC) layers.
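As an illustration of Step (1), a minimal preprocessing sketch is given below. It is not part of the claimed method: the use of OpenCV and NumPy, the central-crop strategy, and the helper name preprocess are assumptions made for illustration, while the 32x32 target size is taken from the Procedure section. Segmentation and object localization are not sketched here.

import numpy as np
import cv2   # assumption: OpenCV for the basic image operations

def preprocess(image, target_size=(32, 32)):
    # Crop the central square region, then scale to a uniform size.
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = image[top:top + side, left:left + side]
    scaled = cv2.resize(cropped, target_size, interpolation=cv2.INTER_AREA)
    return scaled.astype(np.float32) / 255.0   # normalize pixel values to [0, 1]

# Example: a dummy 480x640 RGB array stands in for a downloaded firearm photo.
dummy = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
print(preprocess(dummy).shape)   # (32, 32, 3)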
Optionally, we use an autoencoder to initialize the model parameters. It is a bottom-up unsupervised learning method with a symmetrical architecture: the input is encoded and compressed down to the middle layer and is then decompressed back to an approximation of the original, with some information lost. The encoding and decoding parameters are optimized to minimize the reconstruction error (measured as the distance between the output and the input).
Optionally, the specific process of top-down supervised learning works as follows: during the training phase, the error is transmitted from top to bottom, and the parameters of each layer are fine-tuned accordingly.
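To make these two optional stages concrete, a minimal sketch of autoencoder pretraining followed by supervised fine-tuning is given below, written in the same TensorFlow 1.x style as the code in the Procedure section. The layer sizes, learning rates, and variable names are illustrative assumptions, not values prescribed by the invention.

import tensorflow as tf

# Illustrative sizes: flattened 32x32x3 input, 256-unit bottleneck, 2 output classes.
n_input, n_hidden, n_classes = 32 * 32 * 3, 256, 2

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])

# Symmetrical encoder/decoder used for bottom-up unsupervised pretraining.
enc_w = tf.Variable(tf.truncated_normal([n_input, n_hidden], stddev=0.1))
enc_b = tf.Variable(tf.zeros([n_hidden]))
dec_w = tf.Variable(tf.truncated_normal([n_hidden, n_input], stddev=0.1))
dec_b = tf.Variable(tf.zeros([n_input]))

code = tf.nn.relu(tf.matmul(x, enc_w) + enc_b)    # encode / compress to the middle layer
reconstruction = tf.matmul(code, dec_w) + dec_b   # decode / decompress back to the input size

# Reconstruction error: distance between the decoder output and the input.
pretrain_loss = tf.reduce_mean(tf.square(reconstruction - x))
pretrain_step = tf.train.AdamOptimizer(1e-3).minimize(pretrain_loss)

# Top-down supervised fine-tuning: a classifier head on the pretrained encoder;
# the error is propagated from top to bottom and all parameters are fine-tuned.
out_w = tf.Variable(tf.truncated_normal([n_hidden, n_classes], stddev=0.1))
out_b = tf.Variable(tf.zeros([n_classes]))
logits = tf.matmul(code, out_w) + out_b
finetune_loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
finetune_step = tf.train.AdamOptimizer(1e-4).minimize(finetune_loss)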
DESCRIPTION OF DRAWING
Figure 1 is the flow block diagram of model training in our invention.
Figure 2 is the flow block diagram of trained model making predictions.
Figure 3 shows the generic structure of the deep neural network.
DESCRIPTION OF PREFERRED EMBODIMENT
Network Design
We implement this model as a convolutional neural network and evaluate it on dataset from ImageNet and other sources. The initial convolutional layers of the network extract features from the image while the fully connected layers predict the class probabilities.
Our network architecture is inspired by the GoogLeNet model for image classification and the YOLO model for real-time object detection. Figure 1 displays the architecture of our detection network. Our network has four convolutional layers followed by two fully connected layers. Each convolutional layer consists of three components: convolution, batch normalization, and pooling. The convolution itself is a 3x3 kernel (whose entries are learned during training) that is slid across the entire image. The stride is set to one and zero padding is added to preserve the peripheral information.
Right after the convolution, a pooling operation down-samples the previous output, followed by a non-linear activation. These components are grouped together in this order and repeated four times before the 2D array is flattened into a 1D vector. This vector is then fed into two consecutive fully connected layers to produce the final prediction. L2 regularization and random dropout are included in the training stage to fight against overfitting and optimize the performance of the final classifier.
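For orientation, the network shape described above can be written compactly with the Keras Sequential API. This is only an illustrative reconstruction: the 32x32 input size, 3x3 kernels with stride 1 and SAME padding, the convolution/batch-normalization/pooling/activation ordering, the two fully connected layers, dropout, and L2 regularization come from the text, while the filter counts, the dense-layer width, and the dropout rate are assumed; the low-level implementation listed in the Procedure section differs in detail.

import tensorflow as tf

l2 = tf.keras.regularizers.l2(5e-4)   # L2 strength reused from the training code in the Procedure

def build_classifier(num_classes=2):
    layers = [tf.keras.layers.InputLayer(input_shape=(32, 32, 3))]
    for filters in (16, 32, 64, 128):          # four convolutional blocks (filter counts assumed)
        layers += [
            tf.keras.layers.Conv2D(filters, 3, strides=1, padding='same',
                                   kernel_regularizer=l2),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.MaxPooling2D(2),   # down-sample right after convolution
            tf.keras.layers.ReLU(),            # non-linear activation
        ]
    layers += [
        tf.keras.layers.Flatten(),             # 2D feature maps -> 1D vector
        tf.keras.layers.Dense(128, activation='relu', kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.5),          # random dropout against overfitting
        tf.keras.layers.Dense(num_classes),    # class logits
    ]
    return tf.keras.Sequential(layers)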
Procedure
The procedure of this invention is implemented as follows:
1. Download a firearm-related image dataset from reliable sources.
2. Data set cleaning: we extract the useful information from the data set, localize the object (gun/knife) shown in each image, and delete images of poor quality.
3. Image preprocessing: we scale the images to a uniform 32x32 shape and create a path for the image dataset.
4. Train/test splitting: we use stratified sampling so that roughly the same proportion of positive instances appears in both the training and the test set. The ratio of training set to test set is approximately 10:1 (an illustrative scikit-learn sketch of this split is given after this procedure).
5. This invention uses a deep learning model that contains four convolutional layers and two fully connected layers. The model definition is as follows (TensorFlow 1.x style; the function is defined inside the network class, so `self` refers to the enclosing instance):

# The code fragments below assume: import tensorflow as tf, import numpy as np
def model(data_flow, train=True):
    # @data_flow: original inputs
    # @return: logits
    # Define Convolutional Layers
    for i, (weights, biases, config) in enumerate(
            zip(self.conv_weights, self.conv_biases, self.conv_config)):
        with tf.name_scope(config['name'] + '_model'):
            with tf.name_scope('convolution'):
                # default [1, 1, 1, 1] stride and SAME padding
                data_flow = tf.nn.conv2d(data_flow, filter=weights,
                                         strides=[1, 1, 1, 1], padding='SAME')
                data_flow = data_flow + biases
                if not train:
                    self.visualize_filter_map(data_flow, how_many=config['out_depth'],
                                              display_size=32 // (i // 2 + 1),
                                              name=config['name'] + '_conv')
            if config['activation'] == 'relu':
                data_flow = tf.nn.relu(data_flow)
                if not train:
                    self.visualize_filter_map(data_flow, how_many=config['out_depth'],
                                              display_size=32 // (i // 2 + 1),
                                              name=config['name'] + '_relu')
            else:
                raise Exception('Activation Func can only be Relu right now. You passed',
                                config['activation'])
            if config['pooling']:
                data_flow = tf.nn.max_pool(data_flow,
                                           ksize=[1, self.pooling_scale, self.pooling_scale, 1],
                                           strides=[1, self.pooling_stride, self.pooling_stride, 1],
                                           padding='SAME')
                if not train:
                    self.visualize_filter_map(data_flow, how_many=config['out_depth'],
                                              display_size=32 // (i // 2 + 1) // 2,
                                              name=config['name'] + '_pooling')
    # Define Fully Connected Layers
    for i, (weights, biases, config) in enumerate(
            zip(self.fc_weights, self.fc_biases, self.fc_config)):
        if i == 0:
            # Flatten the 2D feature maps into a 1D vector before the first FC layer
            shape = data_flow.get_shape().as_list()
            data_flow = tf.reshape(data_flow, [shape[0], shape[1] * shape[2] * shape[3]])
        with tf.name_scope(config['name'] + '_model'):
            # Dropout on the last fully connected layer, applied only during training
            if train and i == len(self.fc_weights) - 1:
                data_flow = tf.nn.dropout(data_flow, self.dropout_rate, seed=4926)
            data_flow = tf.matmul(data_flow, weights) + biases
            if config['activation'] == 'relu':
                data_flow = tf.nn.relu(data_flow)
            elif config['activation'] is None:
                pass
            else:
                raise Exception('Activation Func can only be Relu or None right now. You passed',
                                config['activation'])
    return data_flow

6. The model is trained with the post-processed training data using back propagation. Both L2 regularization and dropout are used to fight against overfitting and to increase the robustness of the features extracted by the convolutional layers. Model parameters are updated through stochastic gradient descent, with the loss function computed on each mini-batch.

# Training computation
logits = model(self.tf_train_samples)
with tf.name_scope('loss'):
    self.loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=self.tf_train_labels))
    self.loss += self.apply_regularization(_lambda=5e-4)
    self.train_summaries.append(tf.summary.scalar('Loss', self.loss))

# Learning rate decay
global_step = tf.Variable(0)
learning_rate = tf.train.exponential_decay(
    learning_rate=self.base_learning_rate,
    global_step=global_step * self.train_batch_size,
    decay_steps=100,
    decay_rate=self.decay_rate,
    staircase=True
)

# Optimizer
with tf.name_scope('optimizer'):
    if self.optimizeMethod == 'gradient':
        self.optimizer = tf.train \
            .GradientDescentOptimizer(learning_rate) \
            .minimize(self.loss)
    elif self.optimizeMethod == 'momentum':
        self.optimizer = tf.train \
            .MomentumOptimizer(learning_rate, 0.5) \
            .minimize(self.loss)
    elif self.optimizeMethod == 'adam':
        self.optimizer = tf.train \
            .AdamOptimizer(learning_rate) \
            .minimize(self.loss)

7. Testing: we take unlabeled images (from the test set) as input and evaluate all the nodes in the neural network to output the final prediction.

def test(self, test_samples, test_labels, *, data_iterator):
    if self.saver is None:
        self.define_model()
    if self.writer is None:
        self.writer = tf.summary.FileWriter('./board', tf.get_default_graph())
    print('Before session')
    with tf.Session(graph=tf.get_default_graph()) as session:
        self.saver.restore(session, self.save_path)
        accuracies = []
        confusionMatrices = []
        for i, samples, labels in data_iterator(test_samples, test_labels,
                                                chunkSize=self.test_batch_size):
            result = session.run(
                self.test_prediction,
                feed_dict={self.tf_test_samples: samples}
            )
            # self.writer.add_summary(summary, i)
            accuracy, cm = self.accuracy(result, labels, need_confusion_matrix=True)
            accuracies.append(accuracy)
            confusionMatrices.append(cm)
            print('Test Accuracy: %.1f%%' % accuracy)
        print('Average Accuracy:', np.average(accuracies))
        print('Standard Deviation:', np.std(accuracies))
        self.print_confusion_matrix(np.add.reduce(confusionMatrices))
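The stratified 10:1 train/test split described in step 4 can be reproduced, for example, with scikit-learn; the sketch below is illustrative only. The placeholder data, the use of scikit-learn, and the random seed are assumptions, while the stratification and the roughly 10:1 ratio come from the procedure above.

import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the preprocessed 32x32 firearm images and their labels.
images = np.random.rand(1100, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 2, size=1100)

x_train, x_test, y_train, y_test = train_test_split(
    images, labels,
    test_size=1 / 11,     # training set : test set is approximately 10 : 1
    stratify=labels,      # keep roughly the same proportion of positive instances in both sets
    random_state=0)

print(x_train.shape, x_test.shape)   # (1000, 32, 32, 3) (100, 32, 32, 3)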

Claims (2)

  CLAIM
  1. A method for image identification based on deep learning, comprising: getting the representative characteristics by importing the image database to the model; and using the classifier to identify images containing sensitive objects.
AU2017101803A 2017-12-24 2017-12-24 Deep learning based image classification of dangerous goods of gun type Ceased AU2017101803A4 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2017101803A AU2017101803A4 (en) 2017-12-24 2017-12-24 Deep learning based image classification of dangerous goods of gun type

Publications (1)

Publication Number Publication Date
AU2017101803A4 true AU2017101803A4 (en) 2018-02-15

Family

ID=61167718

Family Applications (1)

Application Number Title Priority Date Filing Date
AU2017101803A Ceased AU2017101803A4 (en) 2017-12-24 2017-12-24 Deep learning based image classification of dangerous goods of gun type

Country Status (1)

Country Link
AU (1) AU2017101803A4 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325144A (en) * 2018-08-27 2019-02-12 章云娟 Content complexity detection method
CN110210413A (en) * 2019-06-04 2019-09-06 哈尔滨工业大学 A kind of multidisciplinary paper content detection based on deep learning and identifying system and method
CN110210413B (en) * 2019-06-04 2022-12-16 哈尔滨工业大学 Multidisciplinary test paper content detection and identification system and method based on deep learning
CN110909784A (en) * 2019-11-15 2020-03-24 北京奇艺世纪科技有限公司 Training method and device of image recognition model and electronic equipment
CN110909784B (en) * 2019-11-15 2022-09-02 北京奇艺世纪科技有限公司 Training method and device of image recognition model and electronic equipment
CN111738290A (en) * 2020-05-14 2020-10-02 北京沃东天骏信息技术有限公司 Image detection method, model construction and training method, device, equipment and medium
CN111738290B (en) * 2020-05-14 2024-04-09 北京沃东天骏信息技术有限公司 Image detection method, model construction and training method, device, equipment and medium
CN112750117A (en) * 2021-01-15 2021-05-04 重庆邮电大学 Blood cell image detection and counting method based on convolutional neural network
CN112750117B (en) * 2021-01-15 2024-01-26 河南中抗医学检验有限公司 Blood cell image detection and counting method based on convolutional neural network
WO2022235241A1 (en) * 2021-05-03 2022-11-10 Bahcesehir Universitesi A ballistic solution system and a method thereof
CN113954072A (en) * 2021-11-05 2022-01-21 中国矿业大学 Vision-guided wooden door workpiece intelligent identification and positioning system and method
CN113954072B (en) * 2021-11-05 2024-05-28 中国矿业大学 Visual-guided intelligent wood door workpiece recognition and positioning system and method
CN114494891B (en) * 2022-04-15 2022-07-22 中国科学院微电子研究所 Hazardous article identification device and method based on multi-scale parallel detection
CN114494891A (en) * 2022-04-15 2022-05-13 中国科学院微电子研究所 Dangerous article identification device and method based on multi-scale parallel detection

Legal Events

Date Code Title Description
FGI Letters patent sealed or granted (innovation patent)
MK22 Patent ceased section 143a(d), or expired - non payment of renewal fee or expiry