CN111986156A - Axe-shaped sharp tool detection method, system, device and storage medium - Google Patents


Publication number
CN111986156A
CN111986156A
Authority
CN
China
Legal status: Pending
Application number
CN202010702240.1A
Other languages
Chinese (zh)
Inventor
黄翰
孙梦托
冯夫健
徐杨
百晓
董志诚
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202010702240.1A
Publication of CN111986156A

Classifications

    • G06T 7/0004: Image analysis; industrial image inspection
    • G06F 18/23213: Pattern recognition; clustering with a fixed number of clusters, e.g. K-means clustering
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06T 7/60: Image analysis; analysis of geometric attributes
    • G06T 2207/10004: Image acquisition modality; still image, photographic image


Abstract

The invention discloses a method, a system, a device and a storage medium for detecting an axe-shaped sharp tool, wherein the method comprises the following steps: establishing a sharps detection model; acquiring image data, performing sharps object recognition detection on the image data using the sharps detection model, and outputting a detection result. The sharps detection model is obtained by training a YOLOv3 neural network, and the sizes of the bounding boxes used in the detection process are calculated by a k-means clustering algorithm. The method applies neural-network-based computer vision to assist the field of frontier defense security, which is of innovative significance. In addition, because the bounding box sizes used in detection are obtained by k-means clustering, the model generates bounding boxes closer to the actual size of axe-shaped sharps, improving detection accuracy. The method can be widely applied in the technical field of image data processing.

Description

Axe-shaped sharp tool detection method, system, device and storage medium
Technical Field
The invention relates to the technical field of image data processing, in particular to an axe-shaped sharp instrument detection method, system, device and storage medium.
Background
According to the World Health Organization, over 15,000 people die from violent crimes every year, so the timely discovery of potential dangers is critical to the safety of citizens. Current approaches to anticipating sharps incidents rely primarily on detecting dangerous objects in surveillance video, but existing surveillance and control systems still require manual supervision and intervention. One of the most effective ways to address this problem is to equip surveillance cameras with an accurate sharps detection system, reducing the need for human involvement.
An axe-shaped sharp tool has a well-defined shape, but its appearance varies greatly when viewed from different angles in a complex environment, so it is easily misidentified as another object; occlusion, illumination, complex backgrounds and other problems add further difficulty to its detection.
Disclosure of Invention
In order to solve one of the above technical problems, an object of the present invention is to provide an axe-shaped sharps detection method, system, device and storage medium.
The technical scheme adopted by the invention is as follows:
an axe-shaped sharp tool detection method comprises the following steps:
establishing a sharp instrument detection model;
acquiring image data, carrying out sharp instrument object identification detection on the image data by adopting the sharp instrument detection model, and outputting a detection result;
wherein the sharps detection model is obtained by training a YOLOv3 neural network, and the sizes of the bounding boxes used in the detection process are calculated by a k-means clustering algorithm.
Further, the establishing a sharps detection model includes:
constructing a picture data set, and calibrating all picture data in the picture data set to obtain an xml file;
after data enhancement is carried out on the calibrated picture data aiming at different scaling ratios, rotation angles and brightness, a training data set is obtained from the picture data set;
inputting the training data set into a YOLOv3 neural network for training, and obtaining a sharp instrument detection model after the training is completed.
Further, the calibrating all the picture data in the picture data set to obtain the xml file includes:
classifying the sharps object and determining a class name;
calibrating all picture data in the picture data set by adopting a labelImg tool, calibrating the category and position information of the sharps object, and generating an xml file;
wherein the position information includes four coordinate points.
Further, the data enhancement of the calibrated picture data for different scaling, rotation angle and brightness includes:
performing contrast stretching on the picture data in the picture data set, keeping the corresponding calibration information in the xml file unchanged, and adding the picture data obtained through the contrast stretching into the picture data set;
adding random noise into the picture data of the picture data set, keeping the corresponding calibration information in the xml file unchanged, and adding the picture data added with random noise processing into the picture data set;
and carrying out multi-scale change on the picture data in the picture data set, carrying out corresponding coordinate change on the corresponding calibration information in the xml file according to the scale change, and adding the picture data obtained through the multi-scale change into the picture data set.
Further, the inputting the training data set into the YOLOv3 neural network for training, and obtaining a sharps detection model after the training is completed, includes:
training by adopting a deep learning frame Darknet, and inputting a training data set into a Darknet-53 network structure for convolution and feature extraction;
dividing picture data into a plurality of grids, each grid comprising a plurality of bounding boxes, each bounding box having a confidence;
using the non-maximum suppression (NMS) algorithm to suppress bounding boxes whose confidence is lower than a preset threshold, and outputting the final bounding boxes and predicted categories;
iterating the model, and stopping training when the number of iterations reaches a preset count, to obtain the sharps detection model;
wherein the confidence is defined as:

$$\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{\text{truth}}_{\text{pred}}$$

where $\Pr(\text{Object})$ represents the probability of the object class appearing on this grid, and $\mathrm{IOU}^{\text{truth}}_{\text{pred}}$ is the intersection-over-union between the ground-truth bounding box and the predicted bounding box.
Further, the loss function used in the training process is:

$$
\begin{aligned}
Loss ={} & \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sigma(t_x)-\sigma(\hat{t}_x)\right)^2+\left(\sigma(t_y)-\sigma(\hat{t}_y)\right)^2\right] \\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(t_w-\hat{t}_w\right)^2+\left(t_h-\hat{t}_h\right)^2\right] \\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C-\hat{C}\right)^2+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C-\hat{C}\right)^2 \\
&+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$

wherein $\sigma(t_x)$, $\sigma(t_y)$ are the horizontal and vertical offsets of the prior rectangular box's center point from the grid point at its upper-left corner, and $\sigma(\hat{t}_x)$, $\sigma(\hat{t}_y)$ are the corresponding offsets for the posterior rectangular box; $\sigma$ is the activation function; $t_w$, $t_h$ are the width and height of the prior box and $\hat{t}_w$, $\hat{t}_h$ those of the posterior box; $C$, $\hat{C}$ are the prior and posterior confidences; $p_i(c)$, $\hat{p}_i(c)$ are the prior and posterior probabilities for class $c$; $\mathbb{1}_{ij}^{obj}$ indicates that the $j$-th prediction box of grid $i$ is responsible for an object ($\mathbb{1}_{ij}^{noobj}$ being its complement); $S^2$ is the number of grids and $B$ is the number of prediction boxes per grid cell.
Further, the performing sharps object recognition detection on the image data by using the sharps detection model includes:
resizing the input image data and dividing it into a plurality of grids, each grid predicting a plurality of bounding boxes, each bounding box having a class probability and a confidence;
threshold-screening the predicted bounding boxes, discarding those whose confidence is lower than a set threshold;
performing coordinate transformation on the remaining bounding boxes in the image data to obtain the final coordinate information of the sharps object;
drawing the corresponding coordinate information in the image data using the OpenCV image algorithm library, and labelling the confidence.
The other technical scheme adopted by the invention is as follows:
an axe-shaped sharps detection system, comprising:
the model establishing module is used for establishing a sharp instrument detection model;
the image detection module is used for acquiring image data, carrying out sharp instrument object identification detection on the image data by adopting the sharp instrument detection model and outputting a detection result;
wherein the sharps detection model is obtained by training a YOLOv3 neural network, and the sizes of the bounding boxes used in the detection process are calculated by a k-means clustering algorithm.
The other technical scheme adopted by the invention is as follows:
an axe-shaped sharps detection device, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The other technical scheme adopted by the invention is as follows:
a storage medium having stored therein processor-executable instructions for performing the method as described above when executed by a processor.
The invention has the beneficial effects that: the method focuses on the auxiliary effect of computer vision based on the neural network on the frontier defense safety field, and has innovative significance; in addition, the size of the boundary box adopted in the detection process is obtained through calculation of a k-means clustering algorithm, so that the boundary box closer to the size of the axe-shaped sharp tool is generated by the model, and the detection accuracy of the model is improved; the axe-shaped sharp tool detection device has a certain effect on detection of axe-shaped sharp tools in a frontier defense environment, and is beneficial to frontier defense personnel to timely handle dangerous situations.
Drawings
FIG. 1 is a flow chart illustrating steps of a method for detecting an axe-shaped sharp instrument according to an embodiment of the present invention;
fig. 2 is a block diagram of an axe-shaped sharps detection system in the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more, and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including it. If "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in fig. 1, the present embodiment provides an axe-shaped sharps detection method, which is capable of resisting interference of a complex scene, including but not limited to the following steps:
s1, shooting a large number of images containing the axe-shaped sharp instruments held by people from the class monitoring scene in the class frontier environment to form a picture data set of the axe-shaped sharp instruments.
Specifically, step S1 includes steps S11-S14:
and S11, analyzing the types of the common axe-shaped sharp instruments in the frontier defense environment, and determining the axe-shaped sharp instrument data requirements meeting the requirements.
And S12, downloading a correlation diagram containing the axe-shaped sharp weapon held by the crawler to form a part of the data set.
S13, taking a picture of the axe-shaped sharp instrument held by a person under the condition of an open plain, grassland or mountain as background, and forming a part of a data set
And S14, screening all the collected pictures and removing wrong pictures.
S2, annotating all image data to obtain xml files recording the object categories and their position information in each picture, as required for YOLOv3 training.
Labeling the picture through step S2, specifically performing image labeling through the following steps S21-S22:
and S21, downloading the labelImg and configuring. Wherein labelImg is an open source image tag annotation software.
S22, using labelImg, draw a box around the target object in each picture and assign the class name "axe"; the calibration box should contain only the object's pixels as far as possible. The object's class and position information (xmin, ymin, xmax, ymax) are thus calibrated, and an xml file is generated, from which the class and position of the object in the picture can later be parsed.
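As an illustration, labelImg writes its annotations in the standard PASCAL VOC xml layout, so the class and position information calibrated above can be read back with a short parser. The sketch below assumes that layout; the file path and the "axe" class name are only examples, not values fixed by the patent.

```python
import xml.etree.ElementTree as ET

def parse_voc_annotation(xml_path):
    """Parse a labelImg PASCAL-VOC xml file into (class, box) tuples.

    Returns a list of (name, (xmin, ymin, xmax, ymax)) entries,
    one per annotated object.
    """
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        name = obj.findtext("name")  # e.g. "axe"
        box = obj.find("bndbox")
        coords = tuple(int(float(box.findtext(k)))
                       for k in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name, coords))
    return objects
```

A later training script can call `parse_voc_annotation` on every xml file to recover the labels for YOLOv3 training.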
S3, performing data enhancement on the processed data set with different scaling ratios, rotation angles, brightness levels and the like, and randomly dividing the data into a training data set, a validation data set and a test data set in the ratio 7:2:1.
The image is enhanced through the step S3, the data volume is expanded, and the robustness of the model is improved, wherein the step S3 specifically includes the steps S31 to S34:
and S31, performing contrast stretching on all pictures in the data set obtained in the step S2, keeping the marking information in the corresponding xml unchanged, and adding the pictures subjected to contrast stretching into a new data set.
And S32, adding random noise to all the pictures in the data set obtained in the step S2, keeping the marking information in the corresponding xml unchanged, and simultaneously adding the random noise into a new data set.
And S33, carrying out multi-scale change on all pictures in the data set obtained in the step S2, simultaneously carrying out corresponding coordinate change on the corresponding label in the xml according to the scale change, and adding the obtained data into a new data set.
And S34, randomly generating a training data set, a verification data set and a test data set according to the ratio of 7:2:1 for the enhanced data set.
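The enhancement and split steps S31 to S34 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's actual pipeline: the contrast stretch, noise level and nearest-neighbour rescaling are simplified stand-ins, and boxes are assumed to be (xmin, ymin, xmax, ymax) tuples as calibrated in the xml files.

```python
import random
import numpy as np

def contrast_stretch(img):
    """Linearly stretch intensities to [0, 255]; box labels are unchanged (S31)."""
    lo, hi = int(img.min()), int(img.max())
    if hi == lo:
        return img.copy()
    return np.round((img.astype(np.float64) - lo) * (255.0 / (hi - lo))).astype(np.uint8)

def add_noise(img, sigma=10.0, seed=0):
    """Add Gaussian noise; box labels are again unchanged (S32)."""
    noise = np.random.default_rng(seed).normal(0.0, sigma, img.shape)
    return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

def rescale(img, boxes, factor):
    """Nearest-neighbour rescale of the image, with the (xmin, ymin, xmax, ymax)
    boxes scaled by the same factor: the coordinate change that must be
    written back into the xml (S33)."""
    h, w = img.shape[:2]
    rows = np.clip((np.arange(int(h * factor)) / factor).astype(int), 0, h - 1)
    cols = np.clip((np.arange(int(w * factor)) / factor).astype(int), 0, w - 1)
    scaled = [tuple(int(round(c * factor)) for c in box) for box in boxes]
    return img[rows][:, cols], scaled

def split_dataset(items, ratios=(0.7, 0.2, 0.1), seed=42):
    """Random 7:2:1 split into training, validation and test sets (S34)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_val = int(len(items) * ratios[1])
    return (items[:n_train], items[n_train:n_train + n_val],
            items[n_train + n_val:])
```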
And S4, generating the anchor box (namely the bounding box) size corresponding to the data set by adopting a clustering algorithm.
In this embodiment, the anchor sizes obtained by clustering on the VOC data set are not used, because the sizes of its 20 target classes differ greatly, making those clustered anchor sizes unreasonable for axe-shaped sharps; instead, a k-means clustering algorithm is applied to calculate anchor sizes for this application.
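One plausible reading of this step, hedged here as a sketch rather than the patent's exact code, is the anchor-clustering procedure popularised by YOLO: k-means over the labelled box sizes, using 1 - IoU as the distance so that large and small boxes are treated fairly. The farthest-point initialisation is an added assumption for stability.

```python
import numpy as np

def iou_wh(boxes, clusters):
    """IoU between (w, h) sizes, as if all boxes shared a top-left corner."""
    w = np.minimum(boxes[:, None, 0], clusters[None, :, 0])
    h = np.minimum(boxes[:, None, 1], clusters[None, :, 1])
    inter = w * h
    union = ((boxes[:, 0] * boxes[:, 1])[:, None]
             + (clusters[:, 0] * clusters[:, 1])[None, :] - inter)
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster labelled box sizes with 1 - IoU as distance.

    Returns k (w, h) anchors sorted by area.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation keeps the k seeds spread out.
    clusters = [boxes[rng.integers(len(boxes))]]
    while len(clusters) < k:
        dist = (1.0 - iou_wh(boxes, np.array(clusters))).min(axis=1)
        clusters.append(boxes[int(np.argmax(dist))])
    clusters = np.array(clusters)
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, clusters), axis=1)
        new = np.array([np.median(boxes[assign == j], axis=0)
                        if (assign == j).any() else clusters[j]
                        for j in range(k)])
        if np.allclose(new, clusters):
            break
        clusters = new
    return clusters[np.argsort(clusters[:, 0] * clusters[:, 1])]
```

Fed with the (w, h) of every calibrated box in the data set, this yields 9 anchors sized for axe-shaped sharps rather than the generic VOC anchors.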
And S5, inputting the training data set into the neural network to extract the ax-shaped sharp instrument characteristics, generating a predicted ax-shaped sharp instrument prediction boundary box according to the anchor box, and performing multi-scale fusion prediction by using a similar FPN network to improve the accuracy of the boundary box and class prediction.
Training the model through the step S5, and specifically comprising the steps S51-S57:
S51, train with the deep learning framework Darknet, inputting the training data into the Darknet-53 network structure for convolution and feature extraction. The Darknet-53 network structure consists of 53 convolutional layers and residual structures; each convolutional layer is followed by a batch normalization layer and an activation function. Leaky ReLU is chosen as the activation function because it solves the problem that a neuron stops learning once a plain ReLU enters the negative interval. The initial training parameters are set as follows: initial learning rate: 0.0001; weight decay: 0.0005; momentum: 0.9.
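For reference, the scalar form of the Leaky ReLU mentioned above is simple; the slope 0.1 is Darknet's conventional value and is assumed here, as the patent does not state it.

```python
def leaky_relu(x, alpha=0.1):
    """Leaky ReLU: identity for positive inputs, a small slope alpha for
    negative ones, so the neuron keeps a nonzero gradient in the negative
    interval (unlike plain ReLU, whose gradient there is zero)."""
    return x if x > 0 else alpha * x
```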
S52, in the multi-scale fusion prediction of the target axe-shaped sharps through the FPN-like network, the 13x13 feature map output by layer 82, the 26x26 feature map output by layer 94 and the 52x52 feature map output by layer 106 are divided into 13x13, 26x26 and 52x52 grids respectively.
S53, according to the anchor boxes, three prior boxes are set at each down-sampling scale from step S52, nine prior boxes in total. The feature maps of the three scales are divided into different numbers of grids; through the YOLOv3 prediction process, each grid generates 3 anchor boxes, and each anchor box is responsible for predicting one target bounding box. Each bounding box outputs five basic values: x, y, w, h and confidence, where (x, y) is the center coordinate of the predicted bounding box and (w, h) is its width and height; each bounding box also carries the probability of the single class (axe-shaped sharps). The confidence is defined as:
$$\text{confidence} = \Pr(\text{Object}) \times \mathrm{IOU}^{\text{truth}}_{\text{pred}}$$

where $\Pr(\text{Object})$ represents the probability of the object category appearing in this grid, and $\mathrm{IOU}^{\text{truth}}_{\text{pred}}$ is the intersection-over-union: the area of intersection between the ground-truth bounding box and the predicted bounding box, divided by the area of their union.
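The intersection-over-union and the resulting confidence can be written directly from the definition above; the corner-format box layout (xmin, ymin, xmax, ymax) is assumed for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union else 0.0

def confidence(pr_object, truth_box, pred_box):
    """confidence = Pr(Object) * IOU(truth, pred), as defined above."""
    return pr_object * iou(truth_box, pred_box)
```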
S54, the loss function is set as:

$$
\begin{aligned}
Loss ={} & \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(\sigma(t_x)-\sigma(\hat{t}_x)\right)^2+\left(\sigma(t_y)-\sigma(\hat{t}_y)\right)^2\right] \\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left[\left(t_w-\hat{t}_w\right)^2+\left(t_h-\hat{t}_h\right)^2\right] \\
&+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}\left(C-\hat{C}\right)^2+\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}\left(C-\hat{C}\right)^2 \\
&+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c\in classes}\left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
$$

wherein $\sigma(t_x)$, $\sigma(t_y)$ are the horizontal and vertical offsets of the prior rectangular box's center point from the grid point at its upper-left corner, and $\sigma(\hat{t}_x)$, $\sigma(\hat{t}_y)$ are the corresponding offsets for the posterior rectangular box; $\sigma$ is the activation function, for which this embodiment uses the sigmoid function $\sigma(x)=\frac{1}{1+e^{-x}}$; $t_w$, $t_h$ are the width and height of the prior box and $\hat{t}_w$, $\hat{t}_h$ those of the posterior box; $C$, $\hat{C}$ are the prior and posterior confidences; and $p_i(c)$, $\hat{p}_i(c)$ are the prior and posterior probabilities for class $c$. In addition, $\mathbb{1}_{ij}^{obj}$ equals 1 when the $j$-th prediction box of grid $i$ is responsible for an object of the target class, and 0 otherwise; $\mathbb{1}_{ij}^{noobj}$ is its complement, equal to 1 when no object is present. $S^2$ represents the number of grids, 7 × 7 in this example; $B$ represents the number of prediction boxes per grid cell, 3 in the invention. In the loss function, the first and second lines use the sum of squared errors as the loss for position prediction, the third line as the loss for confidence prediction, and the fourth line as the loss for class probability.
S55, using the non-maximum suppression (NMS) algorithm, suppress predicted bounding boxes whose confidence is lower than a given threshold, and output the final predicted bounding boxes and predicted categories.
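A minimal greedy NMS matching the description in S55 could look like the following. The score and IoU thresholds are illustrative defaults, not values taken from the patent.

```python
def _iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Greedy non-maximum suppression.

    Drops low-confidence boxes, then repeatedly keeps the highest-scoring
    remaining box and suppresses boxes overlapping it too much.
    Returns indices into `boxes` of the surviving predictions.
    """
    order = [i for i in sorted(range(len(scores)), key=lambda i: -scores[i])
             if scores[i] >= score_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if _iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```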
S56, update the weights and biases of the convolutional neural network by stochastic gradient descent.
S57, stop training after 40,000 iterations, and save the trained model.
S6, package the trained model file and the network structure description file together into a Python SDK; OpenCV is used to read images or video, which are then identified to obtain the position information of axe-shaped sharps in the test data.
Detecting video data or picture data by using a trained model, acquiring each frame of picture in the video data when the video data is the video data, and then detecting, wherein the method specifically comprises the following steps of S61-S65:
and S61, extracting each frame in the test video as an input image.
S62, adjusting each picture to 416 × 416 size, and dividing it evenly into 7 × 7 = 49 grids.
S63, for each of the 49 grids generated in the previous step, YOLOv3 predicts 3 bounding boxes, each with 6 predicted values: 4 are the center-point coordinates and the width and height of the predicted bounding box, one is the class probability, and the last is the confidence.
S64, threshold-screen the predicted 7x7x3 target windows, discarding windows whose confidence is below the set threshold, then use non-maximum suppression to remove redundant windows.
S65, transform the coordinates of the objects detected in the remaining windows back into the picture, apply non-maximum suppression again to remove repeated windows detecting the same object, and obtain the final object coordinate information; then draw the corresponding position information on the original image with OpenCV.
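The coordinate transformation and drawing in S64 and S65 can be sketched as below. `decode_box` is a hypothetical helper that assumes the network outputs center-format boxes in normalised [0, 1] image coordinates; `draw_detection` uses OpenCV's standard rectangle and text routines, consistent with the embodiment drawing results via OpenCV, but the colours and label text are illustrative.

```python
def decode_box(cx, cy, w, h, img_w, img_h):
    """Map a center-format box in normalised [0, 1] image coordinates back
    to integer pixel corners (xmin, ymin, xmax, ymax), clamped to the image."""
    xmin = int(round((cx - w / 2) * img_w))
    ymin = int(round((cy - h / 2) * img_h))
    xmax = int(round((cx + w / 2) * img_w))
    ymax = int(round((cy + h / 2) * img_h))
    return (max(xmin, 0), max(ymin, 0),
            min(xmax, img_w - 1), min(ymax, img_h - 1))

def draw_detection(img, box, conf, label="axe"):
    """Draw one detection box with its confidence; requires opencv-python."""
    import cv2
    xmin, ymin, xmax, ymax = box
    cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 0, 255), 2)
    cv2.putText(img, "%s %.2f" % (label, conf), (xmin, max(ymin - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
    return img
```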
In summary, compared with the prior art, the method of the embodiment has the following beneficial effects:
(1) Among low-level features, the method focuses on the shape features of the axe-shaped sharps and de-emphasizes unrelated features of other objects. Data collection stays as close as possible to real scenes; the problem of insufficient training data is addressed by data enhancement, transfer learning and similar methods; and after labelling, the data is divided into a training set, a validation set and a test set for training.
(2) To avoid the original anchors being unsuited to the size characteristics of the detected object, this embodiment does not use the original 9 anchors; instead, a k-means clustering algorithm is applied to the labelled data to calculate 9 new anchors for model training, helping the model generate detection boxes closer to the size of axe-shaped sharps and improving model accuracy.
(3) Multiple data enhancements are applied to the data set, improving the robustness of the model.
The method provided by the embodiment focuses on the auxiliary effect of computer vision based on the deep neural network on the frontier defense safety field, and has innovative significance. And the method has high accuracy and good real-time performance, has certain effect on the detection of the axe-shaped sharp tool in the frontier defense environment, and is favorable for frontier defense personnel to timely handle dangerous situations.
As shown in fig. 2, the present embodiment further provides an axe-shaped sharps detection system, which includes:
the model establishing module is used for establishing a sharp instrument detection model;
the image detection module is used for acquiring image data, carrying out sharp instrument object identification detection on the image data by adopting the sharp instrument detection model and outputting a detection result;
wherein the sharps detection model is obtained by training a YOLOv3 neural network, and the sizes of the bounding boxes used in the detection process are calculated by a k-means clustering algorithm.
The axe-shaped sharp tool detection system of the embodiment can execute the axe-shaped sharp tool detection method provided by the method embodiment of the invention, can execute any combination implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.
This embodiment also provides an axe form sharp weapon detection device, includes:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The axe-shaped sharp tool detection device of the embodiment can execute the axe-shaped sharp tool detection method provided by the method embodiment of the invention, can execute any combination implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.
The present embodiments also provide a storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the method as described above.
The embodiment also provides a storage medium, which stores an instruction or a program capable of executing the axe-shaped sharps detection method provided by the embodiment of the method of the invention, and when the instruction or the program is run, the method can be executed by any combination of the embodiment of the method, and the method has corresponding functions and beneficial effects.
It will be understood that all or some of the steps and systems of the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, as is known to those skilled in the art, communication media typically embodies computer-readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism, and includes any information delivery media.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims (10)

1. An axe-shaped sharp tool detection method is characterized by comprising the following steps:
establishing a sharp instrument detection model;
acquiring image data, carrying out sharp instrument object identification detection on the image data by adopting the sharp instrument detection model, and outputting a detection result;
the sharps detection model is obtained by training a YOLOv3 neural network, and the sizes of the bounding boxes used in the detection process are computed by a k-means clustering algorithm.
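The k-means anchor computation named in claim 1 can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation: it assumes the labelled boxes are available as (width, height) pairs and uses 1 − IoU as the clustering distance, the metric commonly paired with YOLO anchor selection.

```python
import random

def iou_wh(box, anchor):
    """IoU between two (w, h) pairs, both anchored at the origin."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster labelled box sizes with 1 - IoU as the distance metric."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            # assign each box to the closest center (largest IoU = smallest 1 - IoU)
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        new_centers = []
        for i, c in enumerate(clusters):
            if c:  # new center = mean width/height of the cluster
                new_centers.append((sum(b[0] for b in c) / len(c),
                                    sum(b[1] for b in c) / len(c)))
            else:  # keep an empty cluster's old center
                new_centers.append(centers[i])
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)
```

The resulting k sizes would be written into the YOLOv3 anchor configuration before training.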
2. An axe-shaped sharps detection method as claimed in claim 1, wherein the establishing of the sharps detection model comprises:
constructing a picture data set, and calibrating all picture data in the picture data set to obtain an xml file;
performing data enhancement on the calibrated picture data for different scaling ratios, rotation angles, and brightness levels, and then obtaining a training data set from the picture data set;
inputting the training data set into a YOLOv3 neural network for training, and obtaining a sharp instrument detection model after the training is completed.
3. The axe-shaped sharps detection method according to claim 2, wherein the calibrating all picture data in the picture data set to obtain an xml file comprises:
classifying the sharps object and determining a class name;
calibrating all picture data in the picture data set by adopting a labelImg tool, calibrating the category and position information of the sharps object, and generating an xml file;
wherein the position information includes four coordinate points.
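The labelImg tool referenced in claim 3 writes Pascal VOC-style xml. A minimal reader for such files might look like the following; this is a sketch assuming the standard VOC layout with `object`, `name`, and `bndbox` elements (the four coordinate points of the claim map to `xmin`, `ymin`, `xmax`, `ymax`).

```python
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_path):
    """Read a labelImg (Pascal VOC) annotation file into a list of
    (class_name, xmin, ymin, xmax, ymax) tuples."""
    root = ET.parse(xml_path).getroot()
    objects = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        coords = tuple(int(float(box.find(t).text))
                       for t in ("xmin", "ymin", "xmax", "ymax"))
        objects.append((name,) + coords)
    return objects
```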
4. The axe-shaped sharps detection method as claimed in claim 2, wherein the data enhancement of the calibrated picture data for different scaling ratios, rotation angles, and brightness comprises:
performing contrast stretching on the picture data in the picture data set, keeping the corresponding calibration information in the xml file unchanged, and adding the picture data obtained through the contrast stretching into the picture data set;
adding random noise into the picture data of the picture data set, keeping the corresponding calibration information in the xml file unchanged, and adding the picture data added with random noise processing into the picture data set;
and carrying out multi-scale change on the picture data in the picture data set, carrying out corresponding coordinate change on the corresponding calibration information in the xml file according to the scale change, and adding the picture data obtained through the multi-scale change into the picture data set.
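The three enhancement steps of claim 4 could be prototyped as below. This is a hedged sketch (nearest-neighbour rescaling, Gaussian noise), not the patent's exact procedure; it illustrates why contrast stretching and noise leave the xml calibration unchanged while rescaling must apply the same change to the box coordinates.

```python
import numpy as np

def contrast_stretch(img):
    """Linearly stretch pixel intensities to the full 0-255 range;
    box labels are unchanged because the geometry is untouched."""
    lo, hi = img.min(), img.max()
    if hi == lo:
        return img.copy()
    return ((img - lo) * 255.0 / (hi - lo)).astype(np.uint8)

def add_noise(img, sigma=10, seed=0):
    """Add Gaussian noise; labels again stay as-is."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(np.float64) + rng.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def rescale(img, boxes, fx, fy):
    """Nearest-neighbour rescale; each (x1, y1, x2, y2) box is scaled
    by the same factors so the annotation follows the image."""
    h, w = img.shape[:2]
    new_h, new_w = int(h * fy), int(w * fx)
    ys = (np.arange(new_h) / fy).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / fx).astype(int).clip(0, w - 1)
    out = img[ys][:, xs]
    new_boxes = [(int(x1 * fx), int(y1 * fy), int(x2 * fx), int(y2 * fy))
                 for (x1, y1, x2, y2) in boxes]
    return out, new_boxes
```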
5. The axe-shaped sharps detection method as claimed in claim 2, wherein the inputting the training data set into a YOLOv3 neural network for training, and obtaining a sharps detection model after the training is completed comprises:
inputting the training data set into a Darknet-53 network structure for convolution and feature extraction;
dividing picture data into a plurality of grids, each grid comprising a plurality of bounding boxes, the bounding boxes having confidence levels;
using the non-maximum suppression (NMS) algorithm to suppress bounding boxes whose confidence is lower than a preset threshold, and outputting the final bounding box and predicted category;
iterating the model and stopping training when the number of iterations reaches a preset number, thereby obtaining the sharps detection model;
wherein the confidence is defined as:
Confidence = Pr(Object) × IOU(pred, truth)
where Pr(Object) represents the probability of an object appearing in this grid, and IOU(pred, truth) is the intersection-over-union (cross-over ratio) between the predicted bounding box and the ground-truth box.
6. The axe-shaped sharps detection method according to claim 2, wherein the loss function adopted in the training process is:
loss = Σ_i [ (σ(t_x) − σ(t̂_x))² + (σ(t_y) − σ(t̂_y))² + (t_w − t̂_w)² + (t_h − t̂_h)² + (C − Ĉ)² + Σ_c (p_i(c) − p̂_i(c))² ]
wherein σ(t_x), σ(t_y) are respectively the horizontal and vertical coordinate offsets of the center point of the prior rectangular box relative to the grid point at its upper left corner, σ(t̂_x), σ(t̂_y) are the corresponding offsets of the posterior rectangular box, σ is the activation function, t_w, t_h are respectively the width and height of the prior box, t̂_w, t̂_h are respectively the width and height of the posterior box, C and Ĉ are respectively the prior and posterior confidences, and p_i(c), p̂_i(c) are respectively the prior and posterior probabilities for class c.
7. The axe-shaped sharps detection method as claimed in claim 1, wherein the performing sharps object identification detection on the image data by using the sharps detection model comprises:
after resizing the input image data, dividing it into a plurality of grids, each grid predicting a plurality of bounding boxes having class probabilities and confidences;
performing threshold screening on the predicted bounding boxes, filtering out those whose confidence is lower than a set threshold;
performing coordinate transformation on the remaining bounding boxes in the image data to obtain the final coordinate information of the sharps object;
drawing the corresponding coordinate information in the image data using the OpenCV image algorithm library, and labeling the confidence.
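The coordinate transformation of claim 7 typically maps boxes predicted on the square, letterboxed network input back to original-image pixels. The patent does not specify the exact mapping, so the following is a sketch under that letterboxing assumption.

```python
def to_original_coords(box, net_size, orig_w, orig_h):
    """Map a (x1, y1, x2, y2) box predicted on the letterboxed
    net_size x net_size input back to original-image pixels."""
    scale = min(net_size / orig_w, net_size / orig_h)  # uniform resize factor
    pad_x = (net_size - orig_w * scale) / 2            # horizontal letterbox padding
    pad_y = (net_size - orig_h * scale) / 2            # vertical letterbox padding
    x1, y1, x2, y2 = box
    return (int((x1 - pad_x) / scale), int((y1 - pad_y) / scale),
            int((x2 - pad_x) / scale), int((y2 - pad_y) / scale))
```

The resulting coordinates can then be drawn and labeled with OpenCV's `cv2.rectangle` and `cv2.putText`, as the claim describes.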
8. An axe-shaped sharps detection system, comprising:
the model establishing module is used for establishing a sharp instrument detection model;
the image detection module is used for acquiring image data, carrying out sharp instrument object identification detection on the image data by adopting the sharp instrument detection model and outputting a detection result;
the sharps detection model is obtained by training a YOLOv3 neural network, and the sizes of the bounding boxes used in the detection process are computed by a k-means clustering algorithm.
9. An axe-shaped sharps detection device, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the axe-shaped sharps detection method according to any one of claims 1-7.
10. A storage medium having stored therein a program executable by a processor, wherein the program executable by the processor is adapted to perform the method of any one of claims 1-7 when executed by the processor.
CN202010702240.1A 2020-07-20 2020-07-20 Axe-shaped sharp tool detection method, system, device and storage medium Pending CN111986156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010702240.1A CN111986156A (en) 2020-07-20 2020-07-20 Axe-shaped sharp tool detection method, system, device and storage medium

Publications (1)

Publication Number Publication Date
CN111986156A true CN111986156A (en) 2020-11-24

Family

ID=73437878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010702240.1A Pending CN111986156A (en) 2020-07-20 2020-07-20 Axe-shaped sharp tool detection method, system, device and storage medium

Country Status (1)

Country Link
CN (1) CN111986156A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381043A (en) * 2020-11-27 2021-02-19 华南理工大学 Flag detection method
CN113780311A (en) * 2021-09-09 2021-12-10 广东电网有限责任公司 Tower vine climbing detection method, device, equipment and storage medium
CN114612470A (en) * 2022-05-10 2022-06-10 浙江浙能航天氢能技术有限公司 Hydrogen-sensitive adhesive tape color change detection method based on improved image self-adaptive YOLO

Citations (5)

Publication number Priority date Publication date Assignee Title
CN109325454A (en) * 2018-09-28 2019-02-12 合肥工业大学 A kind of static gesture real-time identification method based on YOLOv3
CN109829429A (en) * 2019-01-31 2019-05-31 福州大学 Security protection sensitive articles detection method under monitoring scene based on YOLOv3
CN109934121A (en) * 2019-02-21 2019-06-25 江苏大学 A kind of orchard pedestrian detection method based on YOLOv3 algorithm
CN110263660A (en) * 2019-05-27 2019-09-20 魏运 A kind of traffic target detection recognition method of adaptive scene changes
CN111401148A * 2020-02-27 2020-07-10 江苏大学 Road multi-target detection method based on improved multilevel YOLOv3

Non-Patent Citations (1)

Title
Liu Shuchun, He Pan, Ma Jianqi, Wang Jiajun: "Deep Practice of OCR: Text Recognition Based on Deep Learning", Beijing: China Machine Press, pages 122-127 *

Similar Documents

Publication Publication Date Title
CN111986156A (en) Axe-shaped sharp tool detection method, system, device and storage medium
CN111461213B (en) Training method of target detection model and target rapid detection method
CN103679186B (en) The method and apparatus of detect and track target
US20100192024A1 (en) Methods and systems for detection of anomalies in digital data streams
CN112149491A (en) Method for determining a trust value of a detected object
KR102391853B1 (en) System and Method for Processing Image Informaion
CN111784624A (en) Target detection method, device, equipment and computer readable storage medium
CN111639667B (en) Image recognition method, device, electronic equipment and computer readable storage medium
CN111814690A (en) Target re-identification method and device and computer readable storage medium
US20170053172A1 (en) Image processing apparatus, and image processing method
Der et al. Probe-based automatic target recognition in infrared imagery
CN117576724A (en) Unmanned plane bird detection method, system, equipment and medium
CN115620022A (en) Object detection method, device, equipment and storage medium
CN111444816A (en) Multi-scale dense pedestrian detection method based on fast RCNN
JP7006782B2 (en) Information processing equipment, control methods, and programs
KR102424098B1 (en) Drone detection apparatus using deep learning and method thereof
CN111985334B (en) Gun detection method, system, device and storage medium
CN117218545A (en) LBP feature and improved Yolov 5-based radar image detection method
CN111402185B (en) Image detection method and device
CN115205793B (en) Electric power machine room smoke detection method and device based on deep learning secondary confirmation
CN116612272A (en) Intelligent digital detection system for image processing and detection method thereof
CN114119970B (en) Target tracking method and device
CN110738229B (en) Fine-grained image classification method and device and electronic equipment
CN112598738B (en) Character positioning method based on deep learning
JP7349290B2 (en) Object recognition device, object recognition method, and object recognition program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination