CN110110665B - Detection method for hand area in driving environment - Google Patents
Detection method for hand area in driving environment
- Publication number
- CN110110665B (application CN201910378179.7A)
- Authority
- CN
- China
- Prior art keywords
- hand
- convolution
- driving environment
- training
- data set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/107—Static hand or arm
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Human Computer Interaction (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for detecting hand regions in a driving environment, comprising the following steps: step 1) preparing a data set, captured in a real driving environment by cameras mounted at different positions in the cab; the data set is divided into a training image set and a test image set, data expansion is applied, and new hand bounding boxes are generated; step 2) constructing a hand-detection convolutional neural network structure that completes feature extraction and fusion with a multi-scale framework, exploiting feature information at different scales; step 3) performing end-to-end training with the Adam (adaptive moment estimation) optimization algorithm, sampling randomly from the training image set and stopping training when the loss function L is stable; step 4) applying non-maximum suppression to eliminate redundant candidate boxes and obtain the optimal hand bounding box; step 5) outputting the detection result. The method is easy to implement and is suitable for hand-region labeling in the cab environment.
Description
Technical Field
The invention belongs to the field of target detection of computer vision, and particularly relates to a method for detecting a hand area in a driving environment.
Background
Human hand detection, classification and tracking have been studied for many years and find applications in many fields, such as virtual reality, human-computer interaction and driver behavior monitoring. Because hand regions in natural images are disturbed by many factors, such as illumination change, occlusion, hand-shape variation, viewpoint change and low hand resolution, detection of hand regions in natural images has still not reached human-level accuracy, and many applications must rely on inefficient manual inspection. Research into accurate detection of human hand regions in natural environments is therefore very important. The aim of this invention is to detect hand regions in static images of a vehicle-cab environment; a novel method based on deep learning is investigated, providing a technical basis for tasks such as driver behavior detection.
Using skin-color information in hand detection is an effective strategy that has achieved good results in many approaches. A two-stage approach is proposed in document [1] [A. Mittal, A. Zisserman, and P. H. S. Torr, Hand detection using multiple proposals, in British Machine Vision Conference, 2011], which uses three complementary detectors based on context, skin color and sliding-window shape to generate hand-region candidate boxes, and then assigns each candidate box a confidence probability through a classifier. A disadvantage of this type of method is that, when detecting hand regions in natural images, detection performance is strongly affected by skin-color variation under complex lighting conditions. Methods employing multimodal information can also yield good results in certain applications; for example, document [2] [E. Ohn-Bar, S. Martin, A. Tawari, and M. M. Trivedi, Head, eye, and hand patterns for driver activity recognition, in Proc. ICPR, 2014] combines head, eye and hand cues for driver activity recognition. However, this method does not provide high detection accuracy for the hand region because of the limitations of the selected HOG features. Document [3] [X. Zhu, X. Jia, and K.-Y. K. Wong, Pixel-level hand detection with shape-aware structured forests, in Proceedings of the Asian Conference on Computer Vision, Springer, 2014, pp. 64-78] adopts a shape-aware structured-forest algorithm to detect hand regions pixel by pixel; although it works well for first-person views, scanning the whole image pixel by pixel is too time-consuming. The chain model [4] [L. Karlinsky, M. Dinerstein, D. Harari, and S. Ullman, The chain model for detecting parts by their context, in Proceedings of Computer Vision and Pattern Recognition, IEEE, 2010, pp. 25-32] is another hand-region detection scheme; it determines the hand region by decomposing the human body into parts, but when occlusion occurs the hand is difficult to detect.
With the rapid development of deep learning, target detection based on convolutional neural networks has made great progress, for example the candidate-region-based R-CNN series (R-CNN, Fast R-CNN, R-FCN) and the YOLO series of detection networks. Although these achieve good results for objects such as cats, dogs, pedestrians, cars and sofas, when the object occupies a relatively small area of the image (e.g., a human hand) or is occluded, detection accuracy with the original network structures is not high, and more efficient structures must be designed. Document [5] [Lu Ding, Yong Wang, et al., Multi-scale representations for robust detection and classification, arXiv:1804.08220v1 [cs.CV], 2018] proposes a multi-scale R-FCN network structure comprising 5 convolution layers; it generates hand-region candidate boxes at different scales, extracts and fuses feature maps from different layers, and thereby obtains the detected hand bounding box. Document [6] [T. Hoang Ngan Le, Kha Gia Quach, Chenchen Zhu, et al., Robust Hand Detection and Classification in Vehicles and in the Wild, CVPRW 2018, pp. 39-46] also takes the R-FCN structure as its basic framework, fuses features of different layers in a multi-scale manner, and screens hand regions from the candidate boxes. Document [7] [Xiaoming Deng, Ye Yuan, Yinda Zhang, et al., Joint Hand Detection and Rotation Estimation by Using CNN, arXiv:1612.02742v1 [cs.CV], 2016] designs a joint network for hand-region detection and hand rotation estimation, completing the final hand-region detection through feature sharing.
Disclosure of Invention
The invention aims to provide a method for detecting hand regions in a driving environment as a new hand-detection network structure: no skin-color model needs to be established and no additional feature extractor is required; the network model is trained on an RGB (red, green and blue) data set collected in the cab environment, realizes detection of human hand regions, and is suitable for labeling hand regions in the cab environment.
The technical scheme of the invention is as follows: a detection method for a hand area in a driving environment specifically comprises the following steps:
step 1) preparing a data set, captured in a real driving environment by cameras mounted at different positions in the cab; the data set is divided into a training image set and a test image set, data expansion is applied, and new hand bounding boxes are generated;
step 2) constructing a hand-detection convolutional neural network structure, using a multi-scale framework to complete feature extraction and fusion with feature information at different scales;
step 3) performing end-to-end training with the Adam (adaptive moment estimation) optimization algorithm, sampling randomly from the training image set and stopping training when the loss function L is stable;
the loss function L is formulated as follows:
L = L_c + L_r (1)
where L_c evaluates whether pixels inside and outside the hand bounding box are correctly classified, and L_r evaluates whether the vertex positions of the hand bounding box are correctly regressed;
L_c = -α·p*·(1-p)^γ·log p - (1-α)·(1-p*)·p^γ·log(1-p) (2)
where p* denotes the true pixel label, p denotes the network-estimated probability that the pixel lies inside the hand bounding box, α is the positive/negative sample balance factor, and γ is chosen empirically;
in the regression loss L_r, C_i and C_i* respectively denote the regression result and the true value of the hand bounding-box coordinates;
step 4) applying non-maximum suppression to eliminate redundant candidate boxes and obtain the optimal hand bounding box;
and 5) outputting the detection result.
As a preferred technical solution, the training image set in step 1) is randomly divided into a training subset and a validation subset at a 9:1 ratio.
As a preferred technical solution, the data-expansion methods for the data set in step 1) include horizontal flipping, vertical flipping, random-angle rotation, translation, Gaussian blurring and sharpening; after expansion the training data grows to at least 22,000 images.
As a preferred technical solution, the data expansion in step 1) includes the following rules:
expansion rule 1: brightness enhancement by a factor of 1.2-1.5, scaling by 0.7-1.5, translation by 40 pixels in the x direction and 60 pixels in the y direction;
expansion rule 2: random cropping of 0-16 pixels from the edges, and horizontal flipping with 50% probability;
expansion rule 3: vertical flipping with 100% probability, plus Gaussian blur with mean 0 and variance 3;
expansion rule 4: random rotation with an upper angle limit of 45 degrees, addition of Gaussian white noise at a 20% noise level, and random sharpening with 50% probability.
As a preferable technical solution, the new hand bounding box in step 1) is generated as follows: taking the four borders of the original hand bounding box as reference, each border is shrunk inward by a specified length d = 0.2·l_min, where l_min is the shortest side of the bounding box; the part inside the shrunken frame is labeled 1 and the part outside is labeled 0.
As a preferred technical scheme, the feature extraction and fusion in step 2) comprises three convolution modules and an up-sampling feature fusion process, and specifically comprises the following steps:
the input-layer image size is 256×256; the first convolution module ConvB_1 comprises two convolution layers and one max-pooling layer, with 3×3 convolution kernels and 64 channels; the second convolution module ConvB_2 comprises two convolution layers and one max-pooling layer, with 3×3 kernels and 128 channels; the third convolution module ConvB_3 comprises three convolution layers and one max-pooling layer, with 3×3 kernels and 256 channels; all pooling layers have 2×2 kernels with stride 2;
the feature map output by the third convolution module ConvB_3 is upsampled, doubling its size; a Dropout mechanism randomly drops 20% of the channels of the feature map output by the second convolution module ConvB_2, and the two are concatenated; the fused feature map FusF_1 is normalized and then fed into a cascaded 1×1 and 3×3 convolution group ConvC_1 with 128 channels in total; the convolution output passes through a 3×3 convolution layer with 32 kernels before reaching the output layer; the output layer comprises two branches: branch 1 predicts, through a single-channel 1×1 convolution, the probability that each pixel lies inside the target region, and branch 2 predicts, through a 4-channel 1×1 convolution, the coordinate values of the vertices of the target bounding box.
As a preferable technical scheme, the detection result in step 5) includes the following objective quantitative evaluation indexes: average precision AP, average recall AR, comprehensive evaluation index F1-score, and detection speed FPS;
assuming TP denotes a real target that is estimated, FP denotes an estimate that is not a real target, and FN denotes a real target that is not estimated, then AP = TP/(TP+FP), AR = TP/(TP+FN), and F1 = 2·AP·AR/(AP+AR);
FPS is described by the frame rate.
The invention has the advantages that:
1. The method for detecting hand regions in a driving environment has high accuracy, good applicability, low computational complexity, short running time, and a simple, efficient training process, reaching a measured speed of 42 fps.
2. The invention establishes a hand-detection model with a deep convolutional neural network structure, can extract more comprehensive hand-related features, and is more robust to occlusion, uneven illumination, scale change, shape change and the like.
Drawings
The invention is further described with reference to the following figures and examples:
fig. 1 is a schematic diagram of detection results for different illumination, different hand shapes, different sizes of hands, and different numbers of hands.
Detailed Description
Example: because hand regions vary considerably in size across images, feature maps of different depths are used to represent hands of different sizes, with deeper features attending to larger hand regions and shallower features to smaller ones. To reduce computational cost, the invention adopts the idea of a U-shaped convolutional neural network structure and merges the feature maps step by step, specifically as follows:
step 1) preparing a data set, captured in a real driving environment by cameras mounted at different positions in the cab and used to study the performance of hand-region detection under cluttered backgrounds, complex lighting conditions and frequent occlusion; the data set is divided into a training image set and a test image set, data expansion is applied, and new hand bounding boxes are generated;
the data set comprises 5500 training images and 5500 testing images, and the image size is uniformly adjusted to 256 multiplied by 256 during training and testing; training image sets were as follows 9: the 1-scale random division is performed on a training subset and a verification subset, wherein the training subset comprises 4950 images, the verification subset comprises 550 images, and the test image set comprises 5500 images. The camera view includes: moving camera, fixed at left front camera driver, fixed at right front camera driver, fixed at back, fixed at right side of driver, fixed at top, worn on driver head, etc.
A deep neural network requires training on massive data to obtain a good model, so the data set must be expanded beyond the original data. The expansion methods include horizontal flipping, vertical flipping, random-angle rotation, translation, Gaussian blurring and sharpening; after expansion the training data grows to at least 22,000 images.
Data augmentation contains the following rules:
expansion rule 1: brightness enhancement by a factor of 1.2-1.5, scaling by 0.7-1.5, translation by 40 pixels in the x direction and 60 pixels in the y direction;
expansion rule 2: random cropping of 0-16 pixels from the edges, and horizontal flipping with 50% probability;
expansion rule 3: vertical flipping with 100% probability, plus Gaussian blur with mean 0 and variance 3;
expansion rule 4: random rotation with an upper angle limit of 45 degrees, addition of Gaussian white noise at a 20% noise level, and random sharpening with 50% probability.
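A minimal pure-Python sketch of how a few of these expansion rules could be applied is shown below. The function name, the fixed random seed, and the list-of-rows grayscale image representation are illustrative assumptions; the scaling, translation, rotation and noise operations of rules 1 and 4 are omitted for brevity.

```python
import random

def augment(img, rng=None):
    """Apply a partial sketch of expansion rules 1-3 to a grayscale image
    given as a list of rows (pixel values in 0-255). Parameter ranges follow
    the text; everything else here is an illustrative assumption."""
    rng = rng or random.Random(0)          # fixed seed for reproducibility
    # Rule 1 (partial): brightness enhancement by a factor in [1.2, 1.5]
    factor = rng.uniform(1.2, 1.5)
    img = [[min(255, int(p * factor)) for p in row] for row in img]
    # Rule 2 (partial): horizontal flip with 50% probability
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
    # Rule 3 (partial): vertical flip applied with 100% probability
    return img[::-1]
```

In practice a library such as imgaug or torchvision would implement the full rule set; this sketch only shows the rule structure.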
The hand bounding boxes in the original data set are given as vertex coordinates. Since the network output in this patent uses the probability that a pixel falls inside the bounding box, the original hand bounding box must be processed into a new form. The new hand bounding box is generated as follows: taking the four borders of the original hand bounding box as reference, each border is shrunk inward by a specified length d = 0.2·l_min, where l_min is the shortest side of the bounding box; the part inside the shrunken frame is labeled 1 and the part outside is labeled 0.
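The boundary-shrinking rule above can be expressed compactly; the helper names below are mine, not the patent's, and boxes are assumed axis-aligned (x1, y1, x2, y2) tuples.

```python
def shrink_box(box):
    """Shrink a box inward by d = 0.2 * l_min on each of its four borders,
    where l_min is the box's shortest side, as described in the text."""
    x1, y1, x2, y2 = box
    d = 0.2 * min(x2 - x1, y2 - y1)   # l_min = shortest side of the box
    return (x1 + d, y1 + d, x2 - d, y2 - d)

def pixel_label(px, py, shrunk):
    """Pixels inside the shrunken frame are labeled 1, outside 0."""
    x1, y1, x2, y2 = shrunk
    return 1 if x1 <= px <= x2 and y1 <= py <= y2 else 0
```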
Step 2) constructing a hand-detection convolutional neural network structure, using a multi-scale framework to complete feature extraction and fusion with feature information at different scales;
the feature extraction and fusion comprises three convolution modules and an up-sampling feature fusion process, and specifically comprises the following steps:
the input-layer image size is 256×256; the first convolution module ConvB_1 comprises two convolution layers and one max-pooling layer, with 3×3 convolution kernels and 64 channels; the second convolution module ConvB_2 comprises two convolution layers and one max-pooling layer, with 3×3 kernels and 128 channels; the third convolution module ConvB_3 comprises three convolution layers and one max-pooling layer, with 3×3 kernels and 256 channels; all pooling layers have 2×2 kernels with stride 2;
the feature map output by the third convolution module ConvB_3 is upsampled, doubling its size; a Dropout mechanism randomly drops 20% of the channels of the feature map output by the second convolution module ConvB_2, and the two are concatenated; the fused feature map FusF_1 is normalized and then fed into a cascaded 1×1 and 3×3 convolution group ConvC_1 with 128 channels in total; the convolution output passes through a 3×3 convolution layer with 32 kernels before reaching the output layer; the output layer comprises two branches: branch 1 predicts, through a single-channel 1×1 convolution, the probability that each pixel lies inside the target region, and branch 2 predicts, through a 4-channel 1×1 convolution, the coordinate values of the vertices of the target bounding box.
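The spatial bookkeeping implied by this architecture can be checked with a short sketch. It assumes the 3×3 convolutions are 'same'-padded (size-preserving), so only the 2×2 stride-2 pools change the spatial size; the function and key names are illustrative.

```python
def trace_shapes(size=256):
    """Trace (channels, height, width) through the three convolution modules
    and the fusion step described above. A bookkeeping sketch only."""
    shapes, s = {}, size
    for name, ch in [("ConvB_1", 64), ("ConvB_2", 128), ("ConvB_3", 256)]:
        s //= 2                              # each module's 2x2/stride-2 pool
        shapes[name] = (ch, s, s)
    up = shapes["ConvB_3"][1] * 2            # upsample ConvB_3, doubling size
    assert up == shapes["ConvB_2"][1]        # sizes must match for fusion
    # Dropout zeroes ~20% of ConvB_2's channels but keeps the channel count,
    # so channel-wise concatenation yields 256 + 128 channels.
    shapes["FusF_1"] = (256 + 128, up, up)
    shapes["branch_1"] = (1, up, up)         # per-pixel inside-box probability
    shapes["branch_2"] = (4, up, up)         # box-vertex coordinate maps
    return shapes
```

This confirms that the doubled ConvB_3 output (64×64) matches ConvB_2's output size, which is what makes the concatenation in the fusion step well-defined.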
Step 3) performing end-to-end training with the Adam (adaptive moment estimation) optimization algorithm, sampling randomly from the training image set and stopping training when the loss function L is stable;
the loss function L is formulated as follows:
L = L_c + L_r (1)
where L_c evaluates whether pixels inside and outside the hand bounding box are correctly classified, and L_r evaluates whether the vertex positions of the hand bounding box are correctly regressed;
L_c = -α·p*·(1-p)^γ·log p - (1-α)·(1-p*)·p^γ·log(1-p) (2)
where p* denotes the true pixel label, p denotes the network-estimated probability that the pixel lies inside the hand bounding box, α is the positive/negative sample balance factor, and γ is chosen empirically; setting γ = 2 in the experiments gave better results;
in the regression loss L_r, C_i and C_i* respectively denote the regression result and the true value of the hand bounding-box coordinates;
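Equation (2) can be written directly as a per-pixel function. γ = 2 follows the text, while α = 0.25 is an assumed default value, since the source does not state what α was set to.

```python
import math

def l_c(p_star, p, alpha=0.25, gamma=2.0):
    """Per-pixel classification loss of equation (2), a focal-style loss:
    p_star is the true pixel label (0 or 1), p the estimated probability
    that the pixel lies inside the hand bounding box. alpha = 0.25 is an
    assumed value, not given in the source; gamma = 2 follows the text."""
    return (-alpha * p_star * (1 - p) ** gamma * math.log(p)
            - (1 - alpha) * (1 - p_star) * p ** gamma * math.log(1 - p))
```

The (1-p)^γ factor down-weights already well-classified pixels, so training focuses on hard examples, which suits the small fraction of hand pixels in a cab image.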
Step 4) during target detection, a large number of mutually overlapping candidate boxes with different confidences are generated at the same target position; non-maximum suppression is applied to eliminate the redundant candidate boxes and obtain the optimal hand bounding box.
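The non-maximum suppression of step 4 can be sketched as the classic greedy procedure below; the 0.5 overlap threshold is an assumed value, not taken from the source.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box, drop
    boxes whose IoU with it exceeds `thresh`, repeat on the remainder.
    Returns indices of the kept boxes; threshold value is assumed."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best, order = order[0], order[1:]
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```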
Step 5) outputting the detection result; the detection result includes the following objective quantitative evaluation indexes: average precision AP, average recall AR, comprehensive evaluation index F1-score, and detection speed FPS;
assuming TP denotes a real target that is estimated, FP denotes an estimate that is not a real target, and FN denotes a real target that is not estimated, then AP = TP/(TP+FP), AR = TP/(TP+FN), and F1 = 2·AP·AR/(AP+AR);
FPS is described by the frame rate.
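The evaluation indexes follow directly from the TP/FP/FN definitions above. This sketch uses the standard precision/recall/F1 formulas; the patent's exact averaging protocol (over images or thresholds) is not specified, so the function below is a simplified illustration.

```python
def detection_metrics(tp, fp, fn):
    """AP, AR and F1-score from TP/FP/FN counts, using the standard
    definitions the evaluation section implies."""
    ap = tp / (tp + fp)              # precision: estimates that are real targets
    ar = tp / (tp + fn)              # recall: real targets that were found
    f1 = 2 * ap * ar / (ap + ar)     # harmonic mean of AP and AR
    return ap, ar, f1
```

As a sanity check, plugging the patent's AP = 98.3% and AR = 86.7% into the F1 formula reproduces the 92.2% figure reported in Table 1.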
The performance of hand-region detection in RGB static images of the cab environment is evaluated by subjective visual inspection and objective quantitative indexes. Fig. 1 shows detection results for a few typical examples; the method performs well under different illumination conditions and for different hand shapes, hand sizes and numbers of hands.
Quantitative results of the method on the test set are shown in Table 1, where its performance is compared with the best competition result on the VIVA data set.
TABLE 1 Quantitative evaluation indexes for hand-region detection on the test set

| Method | AP (%) | AR (%) | F1 (%) | FPS |
|---|---|---|---|---|
| This patent | 98.3 | 86.7 | 92.2 | 42 |
| Background document [6] | 94.8 | 74.7 | - | 4.65 |
The foregoing embodiments merely illustrate the principles and utilities of the present invention and are not intended to limit it. Any person skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the present invention. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas of the present invention shall be covered by the claims of the present invention.
Claims (7)
1. A method for detecting a hand area in a driving environment is characterized by comprising the following steps:
step 1) preparing a data set, captured in a real driving environment by cameras mounted at different positions in the cab; the data set is divided into a training image set and a test image set, data expansion is applied, and new hand bounding boxes are generated;
Step 2) constructing a hand detection convolutional neural network structure, and completing feature extraction and fusion by adopting a multi-scale framework and utilizing feature information on different scales;
step 3) performing end-to-end training with the Adam (adaptive moment estimation) optimization algorithm, sampling randomly from the training image set and stopping training when the loss function L is stable;
the loss function L is formulated as follows:
L = L_c + L_r (1)
where L_c evaluates whether pixels inside and outside the hand bounding box are correctly classified, and L_r evaluates whether the vertex positions of the hand bounding box are correctly regressed;
L_c = -α·p*·(1-p)^γ·log p - (1-α)·(1-p*)·p^γ·log(1-p) (2)
where p* denotes the true pixel label, p denotes the network-estimated probability that the pixel lies inside the hand bounding box, α is the positive/negative sample balance factor, and γ is chosen empirically;
in the regression loss L_r, C_i and C_i* respectively denote the regression result and the true value of the hand bounding-box coordinates;
step 4) applying non-maximum suppression to eliminate redundant candidate boxes and obtain the optimal hand bounding box;
and 5) outputting the detection result.
2. The method for detecting the hand region in the driving environment according to claim 1, wherein the training image set in step 1) is randomly divided into a training subset and a validation subset at a 9:1 ratio.
3. The method for detecting the hand region in the driving environment according to claim 1, wherein the data-expansion methods for the data set in step 1) include horizontal flipping, vertical flipping, random-angle rotation, translation, Gaussian blurring and sharpening, and the expanded training data grows to at least 22,000 images.
4. The method for detecting the hand region in the driving environment of claim 1, wherein the data expansion in step 1) comprises the following rules:
expansion rule 1: brightness enhancement by a factor of 1.2-1.5, scaling by 0.7-1.5, translation by 40 pixels in the x direction and 60 pixels in the y direction;
expansion rule 2: random cropping of 0-16 pixels from the edges, and horizontal flipping with 50% probability;
expansion rule 3: vertical flipping with 100% probability, plus Gaussian blur with mean 0 and variance 3;
expansion rule 4: random rotation with an upper angle limit of 45 degrees, addition of Gaussian white noise at a 20% noise level, and random sharpening with 50% probability.
5. The method for detecting a hand region in a driving environment according to claim 1, wherein the new hand bounding box in step 1) is generated as follows: taking the four borders of the original hand bounding box as reference, each border is shrunk inward by a specified length d = 0.2·l_min, where l_min is the shortest side of the bounding box; the part inside the shrunken frame is labeled 1 and the part outside is labeled 0.
6. The method for detecting the hand region in the driving environment according to claim 1, wherein the feature extraction and fusion in step 2) includes three convolution modules and an upsampling feature fusion process, and specifically includes the following steps:
the input-layer image size is 256×256; the first convolution module ConvB_1 comprises two convolution layers and one max-pooling layer, with 3×3 convolution kernels and 64 channels; the second convolution module ConvB_2 comprises two convolution layers and one max-pooling layer, with 3×3 kernels and 128 channels; the third convolution module ConvB_3 comprises three convolution layers and one max-pooling layer, with 3×3 kernels and 256 channels; all pooling layers have 2×2 kernels with stride 2;
the feature map output by the third convolution module ConvB_3 is upsampled, doubling its size; a Dropout mechanism randomly drops 20% of the channels of the feature map output by the second convolution module ConvB_2, and the two are concatenated; the fused feature map FusF_1 is normalized and then fed into a cascaded 1×1 and 3×3 convolution group ConvC_1 with 128 channels in total; the convolution output passes through a 3×3 convolution layer with 32 kernels before reaching the output layer; the output layer comprises two branches: branch 1 predicts, through a single-channel 1×1 convolution, the probability that each pixel lies inside the target region, and branch 2 predicts, through a 4-channel 1×1 convolution, the coordinate values of the vertices of the target bounding box.
7. The method for detecting the hand region in the driving environment according to claim 1, wherein the detection result in step 5) includes the following objective quantitative evaluation indexes: average precision AP, average recall AR, comprehensive evaluation index F1-score, and detection speed FPS;
assuming TP denotes a real target that is estimated, FP denotes an estimate that is not a real target, and FN denotes a real target that is not estimated, then AP = TP/(TP+FP), AR = TP/(TP+FN), and F1 = 2·AP·AR/(AP+AR);
FPS is described by the frame rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910378179.7A CN110110665B (en) | 2019-05-08 | 2019-05-08 | Detection method for hand area in driving environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110110665A CN110110665A (en) | 2019-08-09 |
CN110110665B true CN110110665B (en) | 2021-05-04 |
Family
ID=67488704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910378179.7A Active CN110110665B (en) | 2019-05-08 | 2019-05-08 | Detection method for hand area in driving environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110110665B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364805B (en) * | 2020-11-21 | 2023-04-18 | 西安交通大学 | Rotary palm image detection method |
CN112686888A (en) * | 2021-01-27 | 2021-04-20 | 上海电气集团股份有限公司 | Method, system, equipment and medium for detecting cracks of concrete sleeper |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102129673A (en) * | 2011-04-19 | 2011-07-20 | 大连理工大学 | Color digital image enhancing and denoising method under random illumination |
CN109086779A (en) * | 2018-07-28 | 2018-12-25 | 天津大学 | A kind of attention target identification method based on convolutional neural networks |
CN109635750A (en) * | 2018-12-14 | 2019-04-16 | 广西师范大学 | A kind of compound convolutional neural networks images of gestures recognition methods under complex background |
CN109711288A (en) * | 2018-12-13 | 2019-05-03 | 西安电子科技大学 | Remote sensing ship detecting method based on feature pyramid and distance restraint FCN |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10996372B2 (en) * | 2017-08-25 | 2021-05-04 | Exxonmobil Upstream Research Company | Geophysical inversion with convolutional neural networks |
CN108875732B (en) * | 2018-01-11 | 2022-07-12 | 北京旷视科技有限公司 | Model training and instance segmentation method, device and system and storage medium |
- 2019-05-08: CN application CN201910378179.7A granted as patent CN110110665B (Active)
Non-Patent Citations (2)
Title |
---|
HBE: Hand Branch Ensemble Network for Real-time 3D Hand Pose Estimation; Yidan Zhou et al.; ECCV 2018; 2018-12-31; pp. 1-16 *
Adaptive enhancement convolutional neural network image recognition; Liu Wanjun et al.; Journal of Image and Graphics; 2017-12-31; pp. 1723-1736 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||