CN112163545A - Head feature extraction method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112163545A
CN112163545A
Authority
CN
China
Prior art keywords
human body
trained
body detection
image
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011087869.6A
Other languages
Chinese (zh)
Inventor
杨建权
赵阳
朱涛
张天麒
李高杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Hualu Group Co Ltd
Beijing E Hualu Information Technology Co Ltd
Original Assignee
China Hualu Group Co Ltd
Beijing E Hualu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Hualu Group Co Ltd, Beijing E Hualu Information Technology Co Ltd filed Critical China Hualu Group Co Ltd
Priority to CN202011087869.6A priority Critical patent/CN112163545A/en
Publication of CN112163545A publication Critical patent/CN112163545A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a head feature extraction method and device, an electronic device and a storage medium. The method comprises the following steps: acquiring an image to be detected; inputting the image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected; and inputting the human body detection frame into a pre-trained head feature detection classification model to obtain the human head features of the human body detection frame. By first obtaining the human body detection frame of the image to be detected and then detecting the head features within that frame, the method exploits the fact that, for the same pedestrian, the body occupies a far larger portion of the frame than the head, so missed detections are less likely and the accuracy of head feature detection is improved.

Description

Head feature extraction method and device, electronic equipment and storage medium
Technical Field
The invention relates to the field of neural networks, in particular to a head feature extraction method and device, electronic equipment and a storage medium.
Background
With the development of deep learning technology, computer vision is applied ever more widely in social production and daily life, from game consoles that recognize gestures to police forces that intelligently track criminals through road surveillance: computers are gradually acquiring the human abilities to see, understand, analyze and respond. Every city has installed a considerable number of road surveillance cameras, which serve to record road conditions, regulate road behavior, trace how incidents unfolded, and deter accidents.
How to apply computer vision technology to automatically mine useful information from video has long been an important topic in smart city development. Object detection algorithms have always played a major role in road surveillance: a deep learning model can automatically infer the positions of objects of interest in a video. In the related art, head features are generally extracted directly for feature recognition. In a real street-scene surveillance video, however, the camera is typically mounted far away and high up to capture an overall view of a crowd of pedestrians or vehicles, and the head of a single pedestrian occupies only a small proportion of the frame's pixels, so heads may be missed and the head feature detection accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a head feature extraction method and device, an electronic device and a storage medium, to solve the prior-art problem that missed head detections lead to low head feature detection accuracy.
According to a first aspect, an embodiment of the present invention provides a head feature extraction method, including the following steps: acquiring an image to be detected; inputting the image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected; and inputting the human body detection frame into a pre-trained head feature detection classification model to obtain the human body head features of the human body detection frame.
Optionally, the human body detection neural network model is a YOLOv3 neural network model, and the head feature detection classification model is a MobileNet classification network.
Optionally, the training process of the human detection neural network model includes: acquiring a first training sample, wherein the first training sample comprises scene images of different regions, different time periods and different illumination conditions and human body pre-labeling information in the scene images; acquiring a first pre-training neural network model trained according to a target data set; and carrying out transfer learning on the first pre-training neural network model according to the first training sample to obtain a human body detection neural network model.
Optionally, the training process of the head feature detection classification model includes: acquiring a second training sample, wherein the second training sample comprises a multi-class sample label, and the multi-class sample label is obtained according to a multi-label binarization function in a target function library; inputting the second training sample to a second pre-trained neural network model; and when the second pre-training neural network model meets the preset condition, obtaining a head feature detection classification model.
Optionally, the method further comprises: and when the second training sample is input into the second pre-training neural network model and the feature extraction error occurs, repeatedly inputting the second training sample with the feature extraction error into the second pre-training neural network model, and performing iterative training for the target times.
Optionally, obtaining the second training sample comprises: acquiring an image to be trained; inputting the image to be trained into a human body detection YOLO V3 model trained in advance to obtain a human body detection frame in the image to be trained; inputting the human body detection frame into a pre-trained feature classification YOLO V3 model to obtain a label corresponding to the human body detection frame, and constructing according to the human body detection frame and the corresponding label to obtain the second training sample.
Optionally, inputting the image to be detected into a pre-trained human detection neural network model, including: adjusting the size of the image to be detected to a first target size, and inputting the image to be detected of the first target size into a human body detection neural network model trained in advance; and/or inputting the human body detection frame into a pre-trained head feature detection classification model, comprising: and adjusting the size of the human body detection frame to a second target size, and inputting the human body detection frame with the second target size into a pre-trained head feature detection classification model.
According to a second aspect, an embodiment of the present invention provides a head feature extraction device, including: the image acquisition module to be detected is used for acquiring an image to be detected; the human body detection module is used for inputting the image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected; and the head characteristic detection module is used for inputting the human body detection frame into a pre-trained head characteristic detection classification model to obtain the human body head characteristics of the human body detection frame.
According to a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the head feature extraction method according to the first aspect or any of the embodiments of the first aspect when executing the program.
According to a fourth aspect, an embodiment of the present invention provides a storage medium, on which computer instructions are stored, and the instructions, when executed by a processor, implement the steps of the head feature extraction method according to the first aspect or any of the embodiments of the first aspect.
The technical scheme of the invention has the following advantages:
according to the head feature extraction method/device provided by the embodiments of the invention, the human body detection frame of the image to be detected is obtained first, and the head features inside the detection frame are then detected, so that missed detections are less likely and the accuracy of head feature detection is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a specific example of a head feature extraction method in an embodiment of the present invention;
fig. 2 is a schematic block diagram of a specific example of a head feature extraction device in an embodiment of the present invention;
fig. 3 is a schematic block diagram of a specific example of an electronic device in the embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the two elements may be directly connected or indirectly connected through an intermediate medium, or may be communicated with each other inside the two elements, or may be wirelessly connected or wired connected. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The present embodiment provides a method for extracting head features, as shown in fig. 1, including the following steps:
s101, acquiring an image to be detected;
for example, the image to be detected may be a street view image containing a pedestrian or a non-motor-vehicle rider, or an image containing a human body in some specific environment. It may be captured by a camera deployed at a street or extracted as a frame from a video. This embodiment does not limit the image to be detected or the way it is acquired; those skilled in the art can determine them as needed.
S102, inputting an image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected;
for example, the pre-trained human body detection neural network model may be a model trained with YOLO V3 as its framework, or a model trained with a framework such as YOLO V4 or EfficientDet. The image to be detected is input into the pre-trained human body detection neural network model to obtain the human body detection frame in the image to be detected.
S103, inputting the human body detection frame into a pre-trained head feature detection classification model to obtain the human body head features of the human body detection frame.
Illustratively, the pre-trained head feature detection classification model may be a model trained on a framework such as the residual network ResNet series or the densely connected network DenseNet series. The human body detection frame is input into the pre-trained head feature detection classification model to obtain the head features of the human body detection frame. The head features can be chosen according to actual needs: for helmet-wearing enforcement, the head feature may be whether a helmet is worn; for checking whether masks are worn in public places, the head feature may be whether a mask is worn; for tracking a target, the head feature may be an accessory on the head (e.g., glasses, earrings), and so on.
According to the head feature extraction method provided by the embodiment of the invention, the human body detection frame of the image to be detected is obtained first, and the head features inside the detection frame are then detected. Because the body occupies a far larger portion of the frame than the head, missed detections are less likely and the accuracy of head feature detection is improved.
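The two-stage flow of steps S101 to S103 can be sketched as follows. Note that `detect_bodies` and `classify_head` are hypothetical stand-ins for the trained detection and classification models, with hard-coded stub outputs; a real implementation would run YOLO V3 and MobileNet inference here.

```python
def detect_bodies(image):
    """Stand-in for the pre-trained human body detection model (S102).

    Returns a list of body bounding boxes (x, y, w, h). Stub values
    replace real YOLO V3 inference for illustration.
    """
    return [(100, 50, 80, 200), (300, 60, 75, 190)]

def classify_head(body_crop):
    """Stand-in for the head feature detection classification model (S103).

    Returns a 0/1 vector over head attributes, e.g. [helmet, glasses,
    mask]. Stub values replace real MobileNet inference.
    """
    return [0, 1, 1]

def extract_head_features(image):
    """Two-stage extraction: detect whole bodies first, then classify
    the head features inside each body detection frame."""
    results = []
    for box in detect_bodies(image):          # S102: body detection
        x, y, w, h = box
        crop = ("crop", x, y, w, h)           # placeholder for the image crop
        results.append((box, classify_head(crop)))  # S103: head features
    return results

features = extract_head_features("street_view.jpg")
```

Because detection runs on the large body region rather than the small head region, the first stage is far less likely to miss a pedestrian, which is the core idea of the method.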
As an optional implementation manner of this embodiment, the human body detection neural network model is a YOLOv3 neural network model, and the head feature detection classification model is a MobileNet classification network. The processing speed of the YOLOv3 neural network model (about 40 ms per frame) meets the requirement of real-time video processing, and it maintains a relatively high mAP (mAP@0.5 = 58) despite its outstanding speed. MobileNet is a lightweight classification network: its depthwise-separable network structure reduces the model's computation by nearly an order of magnitude with an accuracy loss of less than one percent, so the classification speed can be improved.
As an optional implementation manner of this embodiment, the training process of the human detection neural network model includes:
firstly, acquiring a first training sample, wherein the first training sample comprises scene images with different regions, different time periods and different illumination conditions and human body pre-labeling information in the scene images;
for example, the first training sample may be obtained by using cameras deployed in different regions to capture street-view images in different time periods, weather conditions and illumination conditions, or by obtaining scene images for different regions, weather conditions and illumination conditions from a database, and then pre-labeling the human body positions in the obtained street-view/scene images. The pre-labeling may be done manually, or by first applying the official YOLO yolov3.weights file to a YOLO V3 neural network model, labeling the human body positions in the scene images with that model, and then manually correcting and refining the labels; the yolov3.weights file contains neural network weight parameters trained on the ImageNet and coco datasets. This embodiment does not limit how the first training sample is obtained; those skilled in the art can determine it as needed.
Secondly, acquiring a first pre-training neural network model trained according to a target data set;
illustratively, the target dataset may be the ImageNet dataset and/or the coco dataset. The first pre-trained neural network model trained on the target dataset may be obtained by downloading the yolov3.weights file from the official YOLO website and loading it into the YOLO v3 neural network framework.
And thirdly, performing transfer learning on the first pre-trained neural network model according to the first training sample to obtain the human body detection neural network model.
Illustratively, a FineTune strategy is adopted to fine-tune the first pre-trained neural network model, because the distribution of scene images in the actual scene differs from that of the ImageNet and/or coco datasets, which would otherwise cause false and missed detections.
Because the ImageNet and coco datasets contain a huge volume of person data covering the distribution of person data in many kinds of scenes, transfer learning, compared with training from scratch on the first training sample, helps improve the model's generalization to unfamiliar data distributions and avoids limiting the model's adaptability to the distribution of the first training sample. Specifically, the transfer learning freezes the first 81 layers of YOLO v3 and only trains and adjusts the weight coefficients of the later layers. The command to freeze the weights of the first 81 layers is as follows:

darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81

This produces a pre-trained model named yolov3.conv.81 in the current path. The first pre-trained neural network model is then trained with the first training sample. During training, the validation-set accuracy first rises and then falls; the fall indicates that the model has over-fitted, so the point of highest validation accuracy is taken as the optimal weights of the training, and training stops when the validation accuracy peaks. The command to train the weights of the layers after layer 81 using the first training sample is as follows:

darknet detector train cfg/coco.data cfg/yolov3.cfg yolov3.conv.81 -gpus 0,1
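The freeze-then-fine-tune idea behind these darknet commands can be illustrated abstractly. This is not darknet code; the layer count of 106 is an assumption about the YOLO v3 architecture, while the 81-layer cutoff comes from the text.

```python
def split_trainable(num_layers, freeze_up_to):
    """Mark the first `freeze_up_to` layers as frozen and the rest as
    trainable, mirroring `darknet partial ... 81` followed by training
    only the later layers on the new data."""
    return [{"index": i, "trainable": i >= freeze_up_to}
            for i in range(num_layers)]

# Freeze the first 81 layers (assumed 106-layer YOLO v3 backbone):
layers = split_trainable(num_layers=106, freeze_up_to=81)
frozen = sum(1 for layer in layers if not layer["trainable"])
trainable = sum(1 for layer in layers if layer["trainable"])
```

Only the `trainable` layers would receive gradient updates during fine-tuning on the first training sample; the frozen layers keep the general person-detection features learned from ImageNet/coco.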
as an optional implementation manner of this embodiment, the training process of the head feature detection classification model includes:
firstly, obtaining a second training sample, wherein the second training sample comprises a multi-class sample label, and the multi-class sample label is obtained according to a multi-label binarization function in a target function library;
illustratively, the objective function library may be the sklearn library, and the multi-label binarization function may be its MultiLabelBinarizer. The second training sample may be obtained by manually labeling head features on the human bodies found in the scene images, for example putting a helmet label on every helmet-wearing head frame in the current scene image, a glasses label on every glasses-wearing head frame, a mask label on every mask-wearing head frame, and so on. This embodiment does not limit how the head feature labeling is performed; those skilled in the art can determine it as needed. After the head features of all scene images are labeled, the MultiLabelBinarizer function in the sklearn library is used to convert labels such as helmet/no helmet, glasses/no glasses and mask/no mask into one-dimensional 0/1 vectors (0 meaning absent, 1 meaning present); each position of the fused 0/1 vector corresponds to a fixed label. For example, the vector [0, 1, 1, …] means [no helmet, with glasses, with mask, …].
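The label fusion described above can be sketched without sklearn; the `binarize` helper below is a minimal stand-in that mimics what MultiLabelBinarizer produces for a fixed label order (the label names follow the example in the text).

```python
def binarize(sample_labels, classes):
    """Convert per-sample label sets into fixed-order 0/1 vectors,
    as sklearn's MultiLabelBinarizer does (1 = attribute present)."""
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for labels in sample_labels:
        vec = [0] * len(classes)
        for label in labels:
            vec[index[label]] = 1
        vectors.append(vec)
    return vectors

classes = ["helmet", "glasses", "mask"]
# One head with glasses and a mask, one head with only a helmet:
vecs = binarize([{"glasses", "mask"}, {"helmet"}], classes)
```

Each vector position always maps to the same attribute, so a single classifier output can predict all head attributes at once instead of forcing mutually exclusive classes.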
Secondly, inputting a second training sample into a second pre-training neural network model;
and thirdly, when the second pre-training neural network model meets the preset condition, obtaining a head feature detection classification model.
For example, the predetermined condition may be that the loss function value of the second pre-trained neural network model is smaller than a preset threshold, or that the accuracy of the validation set reaches a preset threshold. The preset condition is not limited in this embodiment, and can be determined by those skilled in the art as needed.
According to the head feature extraction method provided by the embodiment of the invention, multi-class sample labels are used during training instead of simple single-class labels, avoiding mutual exclusion among features, so the representation capability of head feature extraction is enhanced.
As an optional implementation manner of this embodiment, the head feature extraction method further includes:
and when the second training sample is input into the second pre-training neural network model and the feature extraction error occurs, repeatedly inputting the second training sample with the feature extraction error into the second pre-training neural network model, and performing iterative training for the target times.
Illustratively, when the second training sample is input into the second pre-trained neural network model and the extracted features are manually verified, the samples with extraction errors are placed into an error-prone feature set; this set usually contains data that are hard even for human eyes to recognize, with blurred head features (caused by human motion, camera shake, and the like). The error-prone feature set is merged into the training set for incremental training; after three rounds of iterative training, the model's head feature extraction accuracy can exceed 95%, so the accuracy of head feature extraction is improved.
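The error-prone-set retraining loop can be sketched as follows. Here `predict` and the dictionary "model" are hypothetical stand-ins: a real system would run the classification network and apply gradient updates, whereas this toy version simply records corrected labels.

```python
def predict(model, sample):
    """Hypothetical stand-in for running model inference on a sample."""
    return model.get(sample)

def mine_hard_examples(model, samples, labels):
    """Collect the samples whose extracted features are wrong into the
    error-prone set, as described in the text."""
    return [(s, y) for s, y in zip(samples, labels)
            if predict(model, s) != y]

def iterative_training(model, samples, labels, target_rounds=3):
    """Re-feed the error-prone set into the model for a target number
    of iterative training rounds (the text uses three)."""
    for _ in range(target_rounds):
        hard = mine_hard_examples(model, samples, labels)
        for sample, label in hard:   # incremental training step
            model[sample] = label    # stand-in for a weight update
    return model

model = {"a": 1}                     # toy "model": sample -> label
model = iterative_training(model, ["a", "b"], [1, 0])
```

The point of the loop is that only the samples the model currently gets wrong are re-fed, concentrating training effort on blurred, hard-to-recognize heads.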
As an optional implementation manner of this embodiment, the obtaining the second training sample includes:
firstly, acquiring an image to be trained;
for example, the image to be trained may be a street view image containing a pedestrian or a non-motor vehicle driver, or may be an image containing a human body in a specific environment. The acquisition mode of the image to be trained can be shooting through a camera arranged at the street or frame-extracting video, the embodiment does not limit the image to be trained and the mode of acquiring the image to be trained, and a person skilled in the art can determine the mode as required.
Secondly, inputting the image to be trained into a human body detection YOLO V3 model trained in advance to obtain a human body detection frame in the image to be trained; inputting the human body detection frame into a pre-trained feature classification YOLO V3 model to obtain a label corresponding to the human body detection frame, and constructing according to the human body detection frame and the corresponding label to obtain a second training sample.
Illustratively, a pre-trained human detection YOLO V3 model and a pre-trained feature classification YOLO V3 model are concatenated to obtain a head detection box and a corresponding label. And taking the head detection frame and the corresponding label as a second training sample, namely taking the detection result of the pre-trained neural network model as the training sample of the head feature detection classification model.
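The cascade that bootstraps the second training sample can be sketched with stand-in models; both `yolo_body` and `yolo_feature` are hypothetical stubs for the two pre-trained YOLO V3 models, returning fixed values for illustration.

```python
def yolo_body(image):
    """Stub for the pre-trained human detection YOLO V3 model:
    returns the body crops found in one image."""
    return ["crop_0", "crop_1"]

def yolo_feature(crop):
    """Stub for the pre-trained feature classification YOLO V3 model:
    returns the label for one body crop."""
    return "helmet"

def build_second_training_sample(images):
    """Cascade the two models: each (crop, label) pair becomes one
    auto-labeled training sample for the classification model."""
    samples = []
    for image in images:
        for crop in yolo_body(image):
            samples.append((crop, yolo_feature(crop)))
    return samples

dataset = build_second_training_sample(["frame_1.jpg"])
```

Run over many video frames, this cascade turns two small hand-labeled models into a large auto-labeled dataset, which is the labeling-cost saving the text describes.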
Compared with the feature classification YOLO V3 model, the head feature detection classification model has a simpler network and a faster processing speed, so in this embodiment the head feature detection classification model is chosen to perform head feature detection on the actual image to be detected. During the training of the head feature detection classification model, however, the pictures input into the network are human-shaped frames while the classification depends on local features of the head, so a small amount of data (a Baidu dataset) cannot teach the model to focus on the feature points of interest, and the model's generalization capability tends to be poor. A large number of training samples is therefore required to achieve a good classification effect.
In this embodiment, the pre-trained human detection YOLO V3 model and the pre-trained feature classification YOLO V3 model are connected in series; the input of the pre-trained feature classification YOLO V3 model is the output of the pre-trained human detection YOLO V3 model, which already contains the position information of the head, so a model trained on a small dataset (500+) has relatively good generalization capability. Using neural network models trained on small datasets to provide a large number of training samples for the head feature detection classification model saves the cost of extensive manual labeling, speeds up the deployment of the project, and offers an approach for quickly building a large-scale classification dataset.
As an optional implementation manner of this embodiment, inputting an image to be detected to a pre-trained human detection neural network model includes: adjusting the size of an image to be detected to a first target size, and inputting the image to be detected of the first target size into a human body detection neural network model trained in advance; and/or
Inputting a human body detection frame into a pre-trained head feature detection classification model, comprising: and adjusting the size of the human body detection frame to a second target size, and inputting the human body detection frame with the second target size into the pre-trained head feature detection classification model.
Illustratively, the first target size may be 608 × 608, and the second target size may be 416 × 416 (or 320 × 320); this embodiment does not limit the first and second target sizes, and those skilled in the art can determine them as needed. Adjusting the size of the image to be detected and/or the human body detection frame improves both the accuracy and the speed of detection.
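The resizing step can be sketched as a plain aspect-preserving (letterbox) computation. The 608 and 416/320 sizes come from the text; the letterbox strategy itself is an assumption, since YOLO-style preprocessing commonly pads rather than stretches, and the patent does not specify the resize method.

```python
def letterbox_dims(width, height, target):
    """Compute the scaled size and symmetric padding needed to fit an
    image into a target x target square without distorting its aspect
    ratio (the remaining border would be filled with a constant color)."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return new_w, new_h, pad_x, pad_y

# A 1920x1080 surveillance frame resized for the 608x608 detection input:
dims = letterbox_dims(1920, 1080, 608)
```

The same helper applies to the second stage: a body crop would be fitted into the 416 × 416 (or 320 × 320) classification input.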
An embodiment of the present invention provides a head feature extraction device, as shown in fig. 2, including:
an image to be detected acquisition module 201, configured to acquire an image to be detected; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The human body detection module 202 is used for inputting the image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The head feature detection module 203 is configured to input the human body detection frame into a pre-trained head feature detection classification model to obtain human body head features of the human body detection frame. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
As an optional implementation manner of this embodiment, the human body detection neural network model is a YOLOv3 neural network model, and the head feature detection classification model is a MobileNet classification network. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
As an optional implementation manner of this embodiment, the human body detection module includes:
the first training sample acquisition module is used for acquiring a first training sample, wherein the first training sample comprises scene images of different regions, different time periods and different illumination conditions and human body pre-labeling information in the scene images; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The first pre-training neural network model acquisition module is used for acquiring a first pre-training neural network model trained according to a target data set; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
And the human body detection neural network model determining module is used for carrying out transfer learning on the first pre-trained neural network model according to the first training sample to obtain the human body detection neural network model. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
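A minimal NumPy sketch of the transfer-learning step, under the assumption that transfer learning here means freezing a backbone pre-trained on the target data set (e.g. a public detection data set) and fitting only a new task head on the first training sample; all shapes, data, and the squared loss are toy stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a backbone pre-trained on the target data set; in this
# transfer-learning sketch it is frozen while a new head is fitted.
backbone_w = rng.normal(size=(8, 4))
backbone_before = backbone_w.copy()
head_w = np.zeros((4, 1))  # re-initialised head, trained on the new samples

x = rng.normal(size=(16, 8))   # toy stand-in for the new scene images
y = rng.normal(size=(16, 1))   # toy stand-in for the pre-labelled targets

feats = x @ backbone_w          # backbone features, computed once (frozen)
for _ in range(200):            # gradient descent on the head only
    grad = feats.T @ (feats @ head_w - y) / len(x)
    head_w -= 0.01 * grad
```

The point of the sketch is structural: the loop never touches `backbone_w`, so the pre-trained weights are preserved while `head_w` adapts to the new samples.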
As an optional implementation manner of this embodiment, the head feature detection module includes:
the second training sample acquisition module is used for acquiring a second training sample, wherein the second training sample comprises a multi-classification sample label, and the multi-classification sample label is obtained according to a multi-label binarization function in the target function library; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The second training sample input module is used for inputting a second training sample to the second pre-training neural network model; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
And the head feature detection classification model determining module is used for obtaining a head feature detection classification model when the second pre-training neural network model meets the preset condition. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
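The "multi-label binarization function in the target function library" is not named in the embodiment; it plausibly corresponds to something like scikit-learn's `MultiLabelBinarizer`, which is an assumption here. A minimal plain-Python equivalent, showing how per-sample head-feature label sets become the fixed-length multi-classification sample labels:

```python
def multilabel_binarize(samples, classes=None):
    """Minimal analogue of a multi-label binarization function:
    turns per-sample label sets into fixed-length 0/1 vectors."""
    if classes is None:
        classes = sorted({label for s in samples for label in s})
    index = {c: i for i, c in enumerate(classes)}
    rows = []
    for s in samples:
        row = [0] * len(classes)
        for label in s:
            row[index[label]] = 1  # mark each label present in the sample
        rows.append(row)
    return classes, rows

# Hypothetical head-feature labels for three crops.
labels = [{"hat", "mask"}, {"glasses"}, {"hat"}]
classes, y = multilabel_binarize(labels)
print(classes, y)
```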
As an optional implementation manner of this embodiment, the apparatus further includes:
and the iterative training module is used for, when a feature extraction error occurs after the second training sample is input into the second pre-trained neural network model, repeatedly inputting the second training sample having the feature extraction error into the second pre-trained neural network model and performing iterative training for a target number of times. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
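The iterative-training rule — re-feed the samples whose features were extracted incorrectly, for a target number of rounds — can be sketched with a hypothetical toy model that "learns" a sample after one failed round; the model and its update are placeholders, not the embodiment's network.

```python
class ToyModel:
    """Hypothetical classifier that learns a sample after seeing it fail once."""
    def __init__(self):
        self.known = set()

    def predict_ok(self, sample):
        return sample in self.known  # "no feature extraction error"

    def fit_on(self, sample):
        self.known.add(sample)       # stand-in for one training update

def iterate_on_errors(model, samples, target_rounds):
    """Repeatedly re-feed error samples for at most target_rounds rounds."""
    errors = list(samples)
    for _ in range(target_rounds):
        errors = [s for s in errors if not model.predict_ok(s)]
        for s in errors:
            model.fit_on(s)
        if not errors:
            break
    return errors

model = ToyModel()
remaining = iterate_on_errors(model, ["a", "b", "c"], target_rounds=3)
print(remaining)
```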
As an optional implementation manner of this embodiment, the second training sample obtaining module includes:
the image to be trained acquisition module is used for acquiring an image to be trained; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The image to be trained input module is used for inputting the image to be trained into a human body detection YOLO V3 model trained in advance to obtain a human body detection frame in the image to be trained; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
And the second training sample determining module is used for inputting the human body detection frame into a pre-trained feature classification YOLO V3 model to obtain a label corresponding to the human body detection frame, and constructing the second training sample from the human body detection frame and the corresponding label. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
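This auto-labelling construction of the second training sample — one pre-trained model produces the crops, another labels them — can be sketched with stand-in functions; the box coordinates, toy image, and threshold below are illustrative assumptions, not values from the embodiment.

```python
def detect_boxes(image):
    # Placeholder for the pre-trained human detection YOLO V3 model.
    return [(0, 0, 2, 2), (2, 2, 4, 4)]

def classify_box(crop):
    # Placeholder for the pre-trained feature classification YOLO V3 model,
    # used here purely as an automatic labeller; threshold is arbitrary.
    return "hat" if sum(map(sum, crop)) > 4 else "no_hat"

def build_training_samples(image):
    """Pair each detected crop with its auto-generated label to form
    the (crop, label) entries of the second training sample."""
    samples = []
    for (x1, y1, x2, y2) in detect_boxes(image):
        crop = [row[x1:x2] for row in image[y1:y2]]
        samples.append((crop, classify_box(crop)))
    return samples

image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [1, 1, 2, 2],
         [1, 1, 2, 2]]
samples = build_training_samples(image)
print([label for _, label in samples])
```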
As an optional implementation manner of this embodiment, the human body detection module includes:
the first size adjusting module is used for adjusting the size of the image to be detected to a first target size and inputting the image to be detected of the first target size to a pre-trained human body detection neural network model; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again. And/or
A head feature detection module comprising: and the second size adjusting module is used for adjusting the size of the human body detection frame to a second target size and inputting the human body detection frame with the second target size into the pre-trained head feature detection classification model. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The embodiment of the present application also provides an electronic device, as shown in fig. 3, including a processor 310 and a memory 320, where the processor 310 and the memory 320 may be connected by a bus or in other manners.
Processor 310 may be a Central Processing Unit (CPU). The Processor 310 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or any combination thereof.
The memory 320, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the head feature extraction method in the embodiment of the present invention. The processor 310 executes the various functional applications and data processing of the electronic device by running the non-transitory software programs, instructions, and modules stored in the memory 320.
The memory 320 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 320 may optionally include memory located remotely from the processor, which may be connected to the processor via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 320 and, when executed by the processor 310, perform a head feature extraction method as in the embodiment shown in fig. 1.
The details of the electronic device may be understood with reference to the corresponding related description and effects in the embodiment shown in fig. 1, and are not described herein again.
The present embodiment also provides a computer storage medium storing computer-executable instructions that can execute the head feature extraction method in any of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the above kinds.
It should be understood that the above examples are provided only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.

Claims (10)

1. A head feature extraction method is characterized by comprising the following steps:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected;
and inputting the human body detection frame into a pre-trained head feature detection classification model to obtain the human body head features of the human body detection frame.
2. The method of claim 1, wherein the human detection neural network model is a YOLOv3 neural network model and the head feature detection classification model is a MobileNet classification network.
3. The method of claim 1, wherein the training process of the human detection neural network model comprises:
acquiring a first training sample, wherein the first training sample comprises scene images of different regions, different time periods and different illumination conditions and human body pre-labeling information in the scene images;
acquiring a first pre-training neural network model trained according to a target data set;
and carrying out transfer learning on the first pre-training neural network model according to the first training sample to obtain a human body detection neural network model.
4. The method of claim 1, wherein the training process of the head feature detection classification model comprises:
acquiring a second training sample, wherein the second training sample comprises a multi-class sample label, and the multi-class sample label is obtained according to a multi-label binarization function in a target function library;
inputting the second training sample to a second pre-trained neural network model;
and when the second pre-training neural network model meets the preset condition, obtaining a head feature detection classification model.
5. The method of claim 4, further comprising:
and when a feature extraction error occurs after the second training sample is input into the second pre-trained neural network model, repeatedly inputting the second training sample having the feature extraction error into the second pre-trained neural network model, and performing iterative training for a target number of times.
6. The method of claim 4, wherein obtaining second training samples comprises:
acquiring an image to be trained;
inputting the image to be trained into a human body detection YOLO V3 model trained in advance to obtain a human body detection frame in the image to be trained;
inputting the human body detection frame into a pre-trained feature classification YOLO V3 model to obtain a label corresponding to the human body detection frame, and constructing the second training sample from the human body detection frame and the corresponding label.
7. The method of claim 1, wherein inputting the image to be detected into a pre-trained human detection neural network model comprises: adjusting the size of the image to be detected to a first target size, and inputting the image to be detected of the first target size into a human body detection neural network model trained in advance; and/or
Inputting the human body detection frame into a pre-trained head feature detection classification model, comprising: and adjusting the size of the human body detection frame to a second target size, and inputting the human body detection frame with the second target size into a pre-trained head feature detection classification model.
8. A head feature extraction device characterized by comprising:
the image acquisition module to be detected is used for acquiring an image to be detected;
the human body detection module is used for inputting the image to be detected into a pre-trained human body detection neural network model to obtain a human body detection frame in the image to be detected;
and the head characteristic detection module is used for inputting the human body detection frame into a pre-trained head characteristic detection classification model to obtain the human body head characteristics of the human body detection frame.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the head feature extraction method according to any one of claims 1 to 7 are implemented when the program is executed by the processor.
10. A storage medium having stored thereon computer instructions, which when executed by a processor, carry out the steps of the head feature extraction method of any one of claims 1 to 7.
CN202011087869.6A 2020-10-12 2020-10-12 Head feature extraction method and device, electronic equipment and storage medium Pending CN112163545A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011087869.6A CN112163545A (en) 2020-10-12 2020-10-12 Head feature extraction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112163545A true CN112163545A (en) 2021-01-01

Family

ID=73866529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011087869.6A Pending CN112163545A (en) 2020-10-12 2020-10-12 Head feature extraction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112163545A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018121690A1 (en) * 2016-12-29 2018-07-05 北京市商汤科技开发有限公司 Object attribute detection method and device, neural network training method and device, and regional detection method and device
CN108416265A (en) * 2018-01-30 2018-08-17 深圳大学 A kind of method for detecting human face, device, equipment and storage medium
US10083352B1 (en) * 2017-05-22 2018-09-25 Amazon Technologies, Inc. Presence detection and detection localization
CN110414428A (en) * 2019-07-26 2019-11-05 厦门美图之家科技有限公司 A method of generating face character information identification model
CN110490099A (en) * 2019-07-31 2019-11-22 武汉大学 A kind of subway common location stream of people's analysis method based on machine vision
CN111275058A (en) * 2020-02-21 2020-06-12 上海高重信息科技有限公司 Safety helmet wearing and color identification method and device based on pedestrian re-identification
CN111489284A (en) * 2019-01-29 2020-08-04 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN111598066A (en) * 2020-07-24 2020-08-28 之江实验室 Helmet wearing identification method based on cascade prediction

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112733754A (en) * 2021-01-15 2021-04-30 上海有个机器人有限公司 Infrared night vision image pedestrian detection method, electronic device and storage medium
CN112883840A (en) * 2021-02-02 2021-06-01 中国人民公安大学 Power transmission line extraction method based on key point detection
CN112883840B (en) * 2021-02-02 2023-07-07 中国人民公安大学 Power transmission line extraction method based on key point detection
CN113240671A (en) * 2021-06-16 2021-08-10 重庆科技学院 Water turbine runner blade defect detection method based on YoloV4-Lite network
CN113762190A (en) * 2021-09-15 2021-12-07 中科微至智能制造科技江苏股份有限公司 Neural network-based parcel stacking detection method and device
CN113762190B (en) * 2021-09-15 2024-03-29 中科微至科技股份有限公司 Method and device for detecting package stacking based on neural network
CN115713715A (en) * 2022-11-22 2023-02-24 天津安捷物联科技股份有限公司 Human behavior recognition method and system based on deep learning
CN115713715B (en) * 2022-11-22 2023-10-31 天津安捷物联科技股份有限公司 Human behavior recognition method and recognition system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination