CN116824511A - Tool identification method and device based on deep learning and color space - Google Patents

Tool identification method and device based on deep learning and color space

Info

Publication number
CN116824511A
Authority
CN
China
Prior art keywords
tool
data
image
network
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310975350.9A
Other languages
Chinese (zh)
Inventor
陆彬
孟思宏
李琳
姜德田
范以云
龙如兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xingwei Technology Beijing Co ltd
Original Assignee
Xingwei Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xingwei Technology Beijing Co ltd filed Critical Xingwei Technology Beijing Co ltd
Priority to CN202310975350.9A priority Critical patent/CN116824511A/en
Publication of CN116824511A publication Critical patent/CN116824511A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of information processing and discloses a tool identification method and device based on deep learning and color space. Through an improved HSV color space, the trajectory of the brightness value V is adjusted according to the distinctive illumination characteristics of outdoor tool images, changing V from a fixed value to a dynamic value, so that the original characteristics of the data can still be expressed and the accuracy of feature extraction is ensured. Meanwhile, a ResNet50 network is used to avoid algorithm overfitting, and human-shape detection is combined with deep learning target detection to ensure the accuracy of tool identification.

Description

Tool identification method and device based on deep learning and color space
Technical Field
The application relates to the field of information processing, in particular to a tool identification method and device based on deep learning and color space.
Background
At present, with the continuous development of artificial intelligence technology, more and more new business demands are emerging. The application of artificial intelligence in the security industry is becoming increasingly popular, and the former practice of providing security by manual work and video monitoring alone is changing. Workers in scenes such as factories, shops and work parks are required to wear uniform dress to facilitate management. Installing cameras and having staff watch the surveillance video to check whether workers are dressed correctly requires, in parks with more complex scenes and more workers, more monitoring points and more manpower and other resources, consuming time, labor and money. For ordinary merchants, placing cameras requires a large amount of wiring and energy consumption, which has become a headache in factories, parks, shops and other places. How to improve recognition accuracy and image-processing efficiency has therefore become a problem demanding attention in the security field.
Disclosure of Invention
In order to solve one of the above problems, the application provides a recognition method and device based on deep learning and color space, applied to tool recognition. The method comprises the following steps:
Step 101, acquiring a tool monitoring image dataset;
Step 102, performing data preprocessing on the images in the tool image dataset: screening the images in the tool image dataset and then unifying their resolution;
Step 103, performing data enhancement on the tool images in the preprocessed tool image dataset, where enhancing the color space in the data comprises: when different illumination causes different visual perception of a scene, calculating with the Value channel and extracting object edges by computing gradients; when the saturation of the foreground is high and the background uses a low-saturation color to set off the foreground, extracting the information of the Saturation channel; when the scene is indoors and its style is single, increasing the weight of the Hue channel; and changing the value of the brightness V from static to dynamic;
Step 104, constructing a tool identification network, using a residual network to avoid overfitting based on the characteristic that tools differ little from one another except in color; sending the data-enhanced tool image dataset into a target-detection human-shape network model for detection and extracting the tool-image human-shape dataset;
Step 105, training the tool classification network with the data-enhanced tool image dataset to obtain a tool identification model;
Step 106, performing tool classification detection on the tool monitoring image to be detected with the tool identification model.
Preferably, the data preprocessing in step 102 further includes: organizing the dataset and dividing the total dataset into a test set and a training set; the training set is used to train the model parameters of the tool classification network.
Preferably, changing the value of the brightness V from static to dynamic in step 103 includes: selecting a sigmoid function to describe the dynamic change trajectory of the V value, the sigmoid function being S(x) = 1 / (1 + e^(-x)).
The sigmoid function has a value range of (0, 1) and maps the real-valued V to the interval (0, 1).
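As a minimal illustration of this mapping (added for clarity, not code from the application), the sigmoid can be evaluated and rescaled to an 8-bit V value as follows:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    """S(x) = 1 / (1 + e^-x), with values in the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Map a real-valued brightness control variable into (0, 1);
# rescaling by 255 yields a dynamic 8-bit V value.
v = np.linspace(-6, 6, 5)
print(np.round(sigmoid(v) * 255).astype(int))  # [  1  12 128 243 254]
```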
Preferably, the step 105 specifically includes:
training a human-shape detection model with the data-enhanced tool image dataset, so as to obtain the human-shape data in the images with the human-shape detection model, and setting the maximum number of data-enhanced pictures to 50000.
Preferably, using the residual network to avoid overfitting in step 104 specifically includes: configuring the tool identification network to include a ResNet50.
Preferably, the image in the tooling image data set is a video image shot by a monitoring camera.
Preferably, step 105 specifically further includes: training the tool image classification network with the human-shape dataset to obtain a tool image classification model; the training set is used to train the tool image classification network, the learning rate follows a cosine schedule with initial learning rate r = 0.00001, and the gradient model uses mini-batch gradient descent; after 300 epochs of training, a judgment is made as to whether the error and accuracy meet the requirements; if so, training stops, otherwise training continues until the requirements are met.
Preferably, in step 104 the data-enhanced tool image dataset is sent into the target-detection human-shape network model for detection and the tool-image human-shape dataset is extracted, wherein the target detection algorithm is the YOLOv3 or YOLOv5-s algorithm.
Preferably, the method is applied to tool recognition of staff of a gas station.
The application also provides a tool identification device based on deep learning and color space for implementing the above method, the device comprising: a data set acquisition module for acquiring a tool image dataset;
the data preprocessing module is used for preprocessing the data of the images in the tooling image dataset; the data enhancement module is used for enhancing the data of the tooling images in the tooling image dataset after the data preprocessing;
the tool image classification network construction module is used for constructing a tool identification network, and the tool identification network comprises a ResNet50 network;
the tool image classification network training module is used for training a tool identification network by adopting the tool image data set with enhanced data to obtain a tool identification model;
and the tool identification module is used for identifying the tool of the tool image to be detected by adopting the tool identification model.
The application changes the former practice of performing tool identification by manual work alone; an artificial intelligence algorithm replaces manual tool identification, reducing the investment of labor and saving funds. Meanwhile, the improved HSV color-space data enhancement method solves the problems of algorithm overfitting and low recognition accuracy in conventional tool recognition. Preferably there is also provided an apparatus comprising a processor and a memory, the memory storing a computer program, the processor executing the computer program on the memory to implement the above method.
The application avoids algorithm overfitting by using a ResNet50 network, which is more conducive to engineering deployment. The improved HSV color space adjusts the trajectory of the brightness V value according to the distinctive illumination characteristics of outdoor tools, changing it from a fixed value to a dynamic value, so that the original characteristics of the data can be expressed and the accuracy of feature extraction is ensured. Human-shape detection is combined with deep learning target detection, ensuring the accuracy of tool identification.
Drawings
The features and advantages of the present application will be more clearly understood by reference to the accompanying drawings, which are schematic and should not be interpreted as limiting the application in any way.
FIG. 1 is a flow chart of an identification method provided by the application.
FIG. 2 is a flow chart of a preferred refinement of the identification method provided by the application.
FIG. 3 is a schematic diagram of the dynamic change track of V value in the present application.
Fig. 4 is a schematic view of the structure of the device of the present application.
Detailed Description
These and other features and characteristics of the present application, as well as the methods of operation and functions of the related elements of structure, the combination of parts and economies of manufacture, may be better understood with reference to the following description and the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the application. It will be understood that the figures are not drawn to scale. Various block diagrams are used in the description of the various embodiments according to the present application.
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In this context "/" means "or" for example, a/B may mean a or B; "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone.
It should be noted that, in order to clearly describe the technical solution of the embodiments of the present application, in the embodiments of the present application, the terms "first", "second", and the like are used to distinguish the same item or similar items having substantially the same function or effect, and those skilled in the art will understand that the terms "first", "second", and the like do not limit the number and execution order. For example, the first information and the second information are used to distinguish between different information, and not to describe a particular order of information.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
Example 1
The application provides a tool identification method based on deep learning and color space, comprising: acquiring a tool identification image dataset; performing data preprocessing on the tool identification image dataset; and performing data enhancement on the preprocessed image data, including but not limited to flipping, rotation, cropping, noise addition, blurring, masking, Cutout, random erasing, MixUp, color transformation and the like. The color transformation is an improved HSV color-space transformation that uses a varying V value to determine the maximum value of RGB. A tool image classification network is then constructed, comprising a human-shape detection network and a tool classification network.
The human-shape data captured by the human-shape detection network is fed into the tool classification network, which uses an improved ResNet50 as its backbone network and is trained on the human-shape data extracted by the human-shape detection network to obtain a tool image classification model; the tool image classification model is then used to classify the images to be detected.
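A minimal sketch of how such a classifier could be assembled, assuming a standard torchvision ResNet50 backbone; the class count and the replaced head are illustrative assumptions, since the patent does not publish its network code:

```python
import torch.nn as nn
from torchvision import models

def build_tool_classifier(num_classes: int = 5) -> nn.Module:
    """ResNet50 backbone with its fully connected head replaced for the
    tool classes; the residual connections help avoid overfitting on
    data whose classes differ mainly in color."""
    net = models.resnet50(weights=None)  # pretrained weights could be used instead
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net
```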
Preferably: in the step of acquiring the tool identification image data set, the tool video is manufactured into the image data set, data screening is carried out, and the image size resolution is processed in a unified mode.
Preferably: in the step of data enhancement of the image data after the data preprocessing, the data enhancement of the tooling image data set after the data preprocessing is performed by adopting a data enhancement mode of Mosaic, mixUp, randomErasing, hideAndSeek and GridMask, overturning, rotating, cutting, noise adding, blurring, masking, cutout, random, wearing, mixup, color conversion and other methods.
Preferably: in the step of constructing the tooling image classification network, the image classification network comprises a human shape detection network tooling classification network. The human shape data captured by the human shape detection network is sent into the tool classification network, the tool classification network comprises an improved resnet50 serving as a main network, human shape data extracted by the human shape detection network is received for training, and a tool image classification model is obtained.
Preferably: in the tool identification step, the tool image classification model is used to detect and identify the tool image to be detected.
The beneficial technical effects of the scheme provided by the application are as follows: the collected data undergoes data preprocessing, and the preprocessed data then undergoes data enhancement including but not limited to flipping, rotation, cropping, noise addition, blurring, masking, Cutout, random erasing, MixUp, color conversion and the like, where the color conversion uses an improved HSV color-space method to enhance the data; a tool image classification network is constructed; the tool classification network is trained with the data-enhanced tool image dataset to obtain a tool classification model; and the tool classification model performs tool identification on the tool image to be detected.
In a specific embodiment, as shown in FIGS. 1-2, a tool identification method and device based on deep learning and color space are provided, the method comprising:
and 101, acquiring a tool monitoring image data set.
Step 102, performing data preprocessing on the images in the tool image dataset. The images in the tool image dataset are video images shot by monitoring cameras.
The step 102 specifically includes:
and after the images in the tool image dataset are screened, resolution unification processing is carried out. Because intelligent monitoring equipment is more in variety, the resolution of video images shot by the monitoring cameras is different, the network regression effect can be influenced, the resolution is unified, and the resolution of the video images is unified to be the same fixed size.
The data preprocessing also includes dataset organization: the total dataset is divided into a test set and a training set. The training set is mainly used to train the parameters of the tool classification network model for the tool identification method based on deep learning and color space.
The test set is mainly used for verifying the accuracy of the tool classification network model and ensuring that the tool classification network model can be used for actual engineering.
Step 103, performing data enhancement on the tool images in the preprocessed tool image dataset. Step 103 specifically comprises enhancing the preprocessed tool image dataset using Mosaic, MixUp, RandomErasing, HideAndSeek and GridMask, together with flipping, rotation, cropping, noise addition, blurring, masking, Cutout, random erasing, color conversion and other methods, as sketched below.
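A sketch of such an augmentation pipeline using standard torchvision transforms; only the listed methods that torchvision provides directly are shown (Mosaic, HideAndSeek and GridMask are typically custom implementations), and all probabilities and magnitudes are illustrative assumptions:

```python
from torchvision import transforms

# Subset of the listed augmentations available directly in torchvision;
# probabilities and magnitudes are illustrative assumptions.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),           # flipping
    transforms.RandomRotation(degrees=15),            # rotation
    transforms.RandomResizedCrop(size=224),           # cropping
    transforms.GaussianBlur(kernel_size=5),           # blurring
    transforms.ColorJitter(brightness=0.4, hue=0.1),  # color conversion
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),                 # random erasing / cutout-style masking
])
```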
Data enhancement plays a vital role in deep learning network training. Because the parameters of the network change according to changes in the characteristics of the data, the quality of the data directly determines the quality of model training. Obviously, the more distinct the features and the more easily a convolutional network can extract the color changes, the more accurate the expression of the trained model.
Experiments found that since surveillance video is continuous, the resulting video images are also continuous, and in most cases the variation between consecutive frames of image data is minimal.
Experiments also show that, regardless of the order of the features across images, the features extracted by the final network tend toward a common feature; this follows from the basic principle of neural network regression. Based on this principle, features that are more distinct should be supplied to the neural network. Experiments found color to be an extremely important feature for tool classification: in real scenes the gap between tool shapes is almost negligible, and the only feature that can distinguish the tools is color. Based on this, the application focuses on improving the color-space part of data enhancement.
In image processing, RGB images are generally not processed directly, mainly because RGB is far from human visual perception, whereas HSV is a commonly used color space. HSV refers to Hue, Saturation and Value (brightness), and different channels are usually used for different problems. The most common method in the prior art is to convert an image to gray scale. Experiments show that in scenes where different illumination causes different visual perception, the Value channel is used for calculation, and object edges can be conveniently extracted by computing gradients. Meanwhile, experiments show that when the foreground saturation is high and the background uses a low-saturation color to set off the foreground, the information of the Saturation channel is very useful. In some indoor scenes with a single style, i.e., where in general an object has only one color, the weight of the Hue channel is increased. By selecting a suitable channel according to the HSV color-space settings for different situations, most image preprocessing work can be completed.
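A sketch of this per-scene channel selection, assuming OpenCV's HSV conversion; the scene labels are illustrative assumptions:

```python
import cv2

def select_hsv_channel(bgr_image, scene: str):
    """Pick the HSV channel suggested for each scene type:
    V for illumination-dominated scenes (edges via gradients),
    S for high-saturation foregrounds, H for single-style indoor scenes."""
    h, s, v = cv2.split(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV))
    if scene == "varying_illumination":
        # gradient magnitude on the Value channel exposes object edges
        gx = cv2.Sobel(v, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(v, cv2.CV_32F, 0, 1)
        return cv2.magnitude(gx, gy)
    if scene == "saturated_foreground":
        return s
    return h  # single-style indoor scene: weight the Hue channel
```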
According to a large number of observations and experiments, the tool scenario involves both outdoor and indoor situations; outdoor tools in particular differ greatly under different illumination conditions, which makes data feature extraction difficult and model training inaccurate. The control point is the brightness V.
The standard RGB-to-HSV relations, with R, G and B taking values from 0 to 255 and max = max(R, G, B), min = min(R, G, B), are:
V = max
S = (max - min) / max x 255 (S = 0 when max = 0)
H = 60 x (G - B) / (max - min) when max = R; H = 60 x (2 + (B - R) / (max - min)) when max = G; H = 60 x (4 + (R - G) / (max - min)) when max = B.
When both S and V take full values, at red the corresponding R is 255 and B is 0 while G varies from 0 to 255 toward yellow; between yellow and green, R varies from 255 down to 0 while B stays 0 and G stays 255; the other hue segments can be deduced by analogy.
The RGB-to-V formula is V = max: the value of V determines the maximum value of RGB, so when S is at its full value of 255, adjusting V directly adjusts the maximum RGB component.
For the saturation S: when max is fixed, the effect of S is to determine the value of min. Once V and S are determined, sliding the hue H leaves two of the RGB channels at min and max respectively while the third varies between min and max.
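These relations can be checked with a small pure-Python conversion; this is a sketch of the standard formulas above, not code from the application:

```python
def rgb_to_hsv(r: int, g: int, b: int):
    """Standard RGB -> HSV with R, G, B in [0, 255]; V equals max(R, G, B)."""
    mx, mn = max(r, g, b), min(r, g, b)
    v = mx
    s = 0 if mx == 0 else round((mx - mn) / mx * 255)
    if mx == mn:
        h = 0.0
    elif mx == r:
        h = (60 * (g - b) / (mx - mn)) % 360
    elif mx == g:
        h = 60 * (2 + (b - r) / (mx - mn))
    else:
        h = 60 * (4 + (r - g) / (mx - mn))
    return h, s, v

print(rgb_to_hsv(255, 128, 0))  # orange: H is about 30.1, S = 255, V = 255
```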
In general, the brightness V is processed by setting a fixed value in advance, thereby controlling the maximum value of RGB to realize the color transformation. Obviously, this method cannot follow the real-time changes of the data and cannot well reflect the true color characteristics of the image data. The value of the brightness V is therefore changed from static to dynamic, so that the brightness can restore the true color of the image data.
From experimental observation, the continuity of the video makes the image data continuous, so the image features are continuous; such continuous features cannot be allowed to accumulate indefinitely, because when all picture data tends toward one feature, the network's predictions overfit. The sigmoid function is therefore chosen to describe the dynamic change trajectory of the V value.
The trajectory is shown in FIG. 3.
The sigmoid function is S(x) = 1 / (1 + e^(-x)). Its value range is (0, 1), and it maps a real-valued V to the interval (0, 1); the effect is better when the differences between features are complex or not particularly large.
It can be seen from FIG. 3 that as the controlled variable V is pushed toward either end, the RGB color values keep increasing; a given feature of the image data keeps growing as the amount of data increases, but stops growing once it reaches a certain value. This prevents the model's expressiveness from collapsing onto a single feature, which would overfit the model and reduce its recognition accuracy. Therefore, the brightness V value is controlled through the sigmoid function, thereby controlling the RGB values, so that the HSV color-space data enhancement better reflects the color characteristics of the image data and the tool classification network extracts more accurate data features.
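A minimal sketch of this dynamic-V adjustment, assuming OpenCV; the bounded gain construction and its range are illustrative assumptions about one way to let a sigmoid of a control variable t drive the V channel:

```python
import cv2
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def enhance_v_dynamic(bgr_image, t: float):
    """Scale the V channel by a sigmoid-bounded gain so the brightness
    adjustment saturates instead of growing without limit (anti-overfit).
    t is the control variable sweeping the sigmoid's input axis."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float32)
    gain = 0.5 + sigmoid(t)  # bounded in (0.5, 1.5) by construction
    hsv[..., 2] = np.clip(hsv[..., 2] * gain, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```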
Step 104, constructing a tool identification network. The tool identification network includes a ResNet50: because the tools differ little from one another except in color, a residual network is needed to avoid overfitting, and ResNet50 exactly fits this requirement. Step 104 specifically includes:
and sending the tooling image dataset with the enhanced data into a target detection humanoid network model for detection, and extracting the tooling image humanoid dataset, wherein the humanoid dataset is also called a tooling dataset because the humanoid dataset is extracted from the tooling dataset. The target detection algorithm is a yolov 3 algorithm or a yolov5-s algorithm.
Step 105, training the tool classification network with the data-enhanced tool image dataset to obtain a tool classification model. Step 105 specifically includes:
and training a human shape detection model by adopting the tool image dataset after data enhancement to human shape position information in the tool image, extracting human shape data in the image by using the human shape detection model to manufacture a dataset, setting a maximum number=50000 of a data enhancement picture, and training the tool image classification network by adopting the human shape dataset if the data quantity is more, so as to obtain a tool image classification model.
The training set is used to train the network model. The learning rate follows a cosine schedule with initial learning rate r = 0.00001, and the gradient model uses mini-batch gradient descent. After 300 epochs of training, a judgment is made as to whether the error and accuracy meet the requirements: if they do, training stops, otherwise training continues until they are met. After the accuracy of the improved tool identification algorithm is verified on the test set and meets the requirement, the parameters of the established algorithm model are selected as the parameters of the final model, and the tool identification algorithm with these parameters is used in actual engineering. A sketch of this training regime follows.
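The sketch below assumes PyTorch with SGD, a cosine learning-rate schedule, the stated initial rate r = 0.00001, and a check every 300 epochs; the accuracy threshold and the evaluate callable are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

def train(model, train_loader, evaluate, required_acc=0.95, device="cpu"):
    """Mini-batch gradient descent with a cosine learning-rate schedule;
    every 300 epochs the error/accuracy requirement is checked."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=1e-5)  # initial rate r = 0.00001
    sched = CosineAnnealingLR(opt, T_max=300)
    epoch = 0
    while True:
        for x, y in train_loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()
        sched.step()
        epoch += 1
        if epoch % 300 == 0 and evaluate(model) >= required_acc:
            break  # requirements met; otherwise keep training
    return model
```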
Step 106, performing tool classification detection on the tool monitoring image to be detected with the tool identification model.
FIG. 4 is a schematic structural diagram of the tool identification device based on deep learning and color space. As shown in FIG. 4, the device includes:
a data set acquisition module 201, configured to acquire a tool image data set.
The data preprocessing module 202 is configured to perform data preprocessing on the images in the tooling image dataset.
And the data enhancement module 203 is configured to perform data enhancement on the tooling images in the tooling image dataset after data preprocessing.
The tool image classification network construction module 204 is configured to construct a tool identification network, where the tool identification network includes a ResNet50 network; since the tools differ little from one another except in color, a residual network is required to avoid overfitting, and ResNet50 exactly fits this requirement.
The tool image classification network training module 205 is configured to train the tool identification network by using the tool image data set after the data enhancement, and obtain a tool identification model.
The tool identification module 206 is configured to identify a tool for the to-be-detected tool image by using the tool identification model.
The data preprocessing module 202 specifically includes a data preprocessing unit, configured to perform resolution unification processing on images in the tool image dataset.
The data enhancement module 203 specifically includes a data enhancement unit that enhances the preprocessed tool image dataset using Mosaic, MixUp, RandomErasing, HideAndSeek and GridMask, together with flipping, rotation, cropping, noise addition, blurring, masking, Cutout, random erasing, color conversion and other methods.
The tool image classification network training module 205 specifically includes a tool image classification network training unit configured to take the data-enhanced tool image dataset, send it into the human-shape detection network to capture human-shape data, and then send the human shapes into the tool classification network, which uses an improved ResNet50 as its backbone network and is trained on the human-shape data extracted by the human-shape detection network to obtain the tool image classification model.
the application solves the problems that the manpower monitoring cost is high and the personnel cannot monitor the working negligence in place in the mode of monitoring by the manpower alone, provides a more practical HSV color space data enhancement method and improves the accuracy and efficiency of tool identification.
It will be appreciated by those skilled in the art that all or part of the flows of the above embodiment methods may be implemented by a computer program instructing related hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD) or a Solid State Drive (SSD); the storage medium may also comprise a combination of the above kinds of memory.
As used in this disclosure, the terms "component," "module," "apparatus," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, the components may be, but are not limited to: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components may reside within a process and/or thread of execution and a component may be localized on one computer and/or distributed between two or more computers. Furthermore, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local device, distributed device, and/or across a network such as the internet with other devices by way of the signal).
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims (10)

1. A tool identification method based on deep learning and color space, the method comprising:
step 101, acquiring a tool monitoring image data set;
step 102, performing data preprocessing on the images in the tool image dataset: screening the images in the tool image dataset and then unifying their resolution;
step 103, performing data enhancement on the tool images in the preprocessed tool image dataset, wherein enhancing the HSV color space in the data comprises: when different illumination causes different visual perception of a scene, calculating with the Value channel and extracting object edges by computing gradients; when the saturation of the foreground is high and the background uses a low-saturation color to set off the foreground, extracting the information of the Saturation channel; when the scene is indoors and its style is single, increasing the weight of the Hue channel; and changing the value of the brightness V from static to dynamic;
step 104, constructing a tool identification network, using a residual network to avoid overfitting based on the characteristic that tools differ little from one another except in color; sending the data-enhanced tool image dataset into a target-detection human-shape network model for detection and extracting the tool-image human-shape dataset;
step 105, training a tool classification network by adopting a tool image data set with enhanced data to obtain a tool identification model;
step 106, performing tool classification detection on the tool monitoring image to be detected with the tool identification model.
2. The method of claim 1, wherein the data preprocessing in step 102 further comprises: organizing the dataset and dividing the total dataset into a test set and a training set; the training set is used to train the model parameters of the tool classification network.
3. The method of claim 2, wherein changing the value of the brightness V from static to dynamic in step 103 comprises: selecting a sigmoid function to describe the dynamic change trajectory of the V value, the sigmoid function being S(x) = 1 / (1 + e^(-x));
the sigmoid function has a value range of (0, 1) and maps the real-valued V to the interval (0, 1).
4. A method as claimed in claim 3, wherein: the step 105 specifically includes:
training a human-shape detection model with the data-enhanced tool image dataset, so as to obtain the human-shape data in the images with the human-shape detection model, and setting the maximum number of data-enhanced pictures to 50000.
5. The method of claim 4, wherein using the residual network to avoid overfitting in step 104 specifically comprises: configuring the tool identification network to include a ResNet50.
6. The method of claim 5, wherein: the images in the tool image data set are video images shot through the monitoring camera.
7. The method of claim 6, wherein step 105 specifically further comprises: training the tool image classification network with the human-shape dataset to obtain a tool image classification model; the training set is used to train the tool image classification network, the learning rate follows a cosine schedule with initial learning rate r = 0.00001, and the gradient model uses mini-batch gradient descent; after 300 epochs of training, a judgment is made as to whether the error and accuracy meet the requirements; if so, training stops, otherwise training continues until the requirements are met.
8. The method of claim 7, wherein the data-enhanced tool image dataset in step 104 is sent into the target-detection human-shape network model for detection and the tool-image human-shape dataset is extracted, wherein the target detection algorithm is the YOLOv5-s algorithm.
9. The method as recited in claim 8, wherein: the method is applied to tool identification of gas station staff.
10. A tool recognition device based on deep learning and color space for implementing the method of any one of claims 1-9, characterized in that the device comprises: a data set acquisition module for acquiring a tool image dataset;
the data preprocessing module is used for preprocessing the data of the images in the tooling image dataset; the data enhancement module is used for enhancing the data of the tooling images in the tooling image dataset after the data preprocessing;
the tool image classification network construction module is used for constructing a tool identification network, and the tool identification network comprises a ResNet50 network;
the tool image classification network training module is used for training a tool identification network by adopting the tool image data set with enhanced data to obtain a tool identification model;
and the tool identification module is used for identifying the tool of the tool image to be detected by adopting the tool identification model.
CN202310975350.9A 2023-08-03 2023-08-03 Tool identification method and device based on deep learning and color space Pending CN116824511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310975350.9A CN116824511A (en) 2023-08-03 2023-08-03 Tool identification method and device based on deep learning and color space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310975350.9A CN116824511A (en) 2023-08-03 2023-08-03 Tool identification method and device based on deep learning and color space

Publications (1)

Publication Number Publication Date
CN116824511A 2023-09-29

Family

ID=88122295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310975350.9A Pending CN116824511A (en) 2023-08-03 2023-08-03 Tool identification method and device based on deep learning and color space

Country Status (1)

Country Link
CN (1) CN116824511A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020164270A1 (en) * 2019-02-15 2020-08-20 平安科技(深圳)有限公司 Deep-learning-based pedestrian detection method, system and apparatus, and storage medium
CN113052194A (en) * 2019-12-27 2021-06-29 杭州深绘智能科技有限公司 Garment color cognition system based on deep learning and cognition method thereof
CN111368727A (en) * 2020-03-04 2020-07-03 西安咏圣达电子科技有限公司 Dressing detection method, storage medium, system and device for power distribution room inspection personnel
CN111401314A (en) * 2020-04-10 2020-07-10 上海东普信息科技有限公司 Dressing information detection method, device, equipment and storage medium
CN113129236A (en) * 2021-04-25 2021-07-16 中国石油大学(华东) Single low-light image enhancement method and system based on Retinex and convolutional neural network
CN113269161A (en) * 2021-07-16 2021-08-17 四川九通智路科技有限公司 Traffic signboard detection method based on deep learning
CN114782268A (en) * 2022-04-19 2022-07-22 南京航空航天大学 Low-illumination image enhancement method for improving SURF algorithm
CN114708618A (en) * 2022-04-21 2022-07-05 河南众诚信息科技股份有限公司 Intelligent work clothes identification method and system for intelligent park based on classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
徐鹏程; 刘本永: "Interactive behavior recognition based on image enhancement and deep CNN learning", 通信技术 (Communications Technology), no. 03, 10 March 2019 (2019-03-10) *

Similar Documents

Publication Publication Date Title
US20230289979A1 (en) A method for video moving object detection based on relative statistical characteristics of image pixels
CN107256225B (en) Method and device for generating heat map based on video analysis
US9483709B2 (en) Visual saliency estimation for images and video
CN113139521B (en) Pedestrian boundary crossing monitoring method for electric power monitoring
CN110059694A (en) The intelligent identification Method of lteral data under power industry complex scene
CN101828201B (en) Image processing device and method, and learning device, method
US8687887B2 (en) Image processing method, image processing apparatus, and image processing program
CN109918971B (en) Method and device for detecting number of people in monitoring video
CN108596102B (en) RGB-D-based indoor scene object segmentation classifier construction method
CN102915446A (en) Plant disease and pest detection method based on SVM (support vector machine) learning
CN110717896A (en) Plate strip steel surface defect detection method based on saliency label information propagation model
CN111353452A (en) Behavior recognition method, behavior recognition device, behavior recognition medium and behavior recognition equipment based on RGB (red, green and blue) images
CN109685045A (en) A kind of Moving Targets Based on Video Streams tracking and system
CN111127360B (en) Gray image transfer learning method based on automatic encoder
CN104281839A (en) Body posture identification method and device
CN113111878B (en) Infrared weak and small target detection method under complex background
CN102340620B (en) Mahalanobis-distance-based video image background detection method
CN113409355A (en) Moving target identification system and method based on FPGA
CN113435452A (en) Electrical equipment nameplate text detection method based on improved CTPN algorithm
CN108876672A (en) A kind of long-distance education teacher automatic identification image optimization tracking and system
CN102510437B (en) Method for detecting background of video image based on distribution of red, green and blue (RGB) components
CN108833776A (en) A kind of long-distance education teacher automatic identification optimization tracking and system
CN111611866A (en) Flame detection and identification method and system based on YCrCb and LAB color spaces
CN116824511A (en) Tool identification method and device based on deep learning and color space
CN107341456B (en) Weather sunny and cloudy classification method based on single outdoor color image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination