CN110598592A - Intelligent real-time video monitoring method suitable for nursing places - Google Patents

Intelligent real-time video monitoring method suitable for nursing places

Info

Publication number
CN110598592A
CN110598592A (application CN201910805510.9A)
Authority
CN
China
Prior art keywords
detection
detection frame
frame
network
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910805510.9A
Other languages
Chinese (zh)
Inventor
袁贤
彭富明
孙瑜
陈阳阳
杨涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Tech University
Original Assignee
Nanjing Tech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Tech University
Priority to CN201910805510.9A
Publication of CN110598592A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques based on distances to training or reference patterns
    • G06F 18/24147 Distances to closest patterns, e.g. nearest neighbour classification
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention provides an intelligent real-time video monitoring method suitable for nursing places, which comprises the following steps: inputting the image sequence into a designed detection network; multi-model fusion of the detection network; context-based non-main category suppression; attribute identification of the person object; and tracking of the person object of interest.

Description

Intelligent real-time video monitoring method suitable for nursing places
Technical Field
The invention relates to an intelligent monitoring technology for nursing institutions, in particular to an intelligent real-time video monitoring method suitable for nursing places.
Background
The core technologies of intelligent video monitoring are target identification and tracking. Most current deep-learning-based tracking is built on detection. Research on deep-learning target detection covers related fields such as target detection from single-frame images, target detection from video, and target detection from three-dimensional point clouds. With the development of deep learning, convolutional neural networks have been applied to image-based target detection mainly on single-frame images, while video-based target detection has gradually attracted attention; by exploiting the temporal information of the video and the relevance of its context, end-to-end object detection can be realized. The difficulty is that changes in viewing angle, illumination and object deformation can cause detection in the image to fail, and video in particular often suffers from motion blur and low resolution.
After a moving object is detected, in the application background of a medical environment it is further necessary to identify whether the object is a patient and to acquire the patient's identity information. At present, target recognition can rely on traditional hand-crafted features or on abstract depth features obtained through a convolutional neural network, and can reach very high accuracy. Target identification is realized by classification methods, such as classification based on feature distance, classification based on statistical learning, and classification based on logistic regression. Classification methods are numerous and mature; the main difficulty is to design a suitable combination of feature extraction and classification.
Disclosure of Invention
The invention aims to provide an intelligent real-time video monitoring method suitable for nursing places.
The technical scheme for realizing the purpose of the invention is as follows: an intelligent real-time video monitoring method suitable for nursing places, comprising the following steps:
inputting the image sequence into a designed detection network;
multi-model fusion of the detection network;
context-based non-main category suppression;
attribute identification of the person object;
tracking of the person object of interest.
Further, the detection networks include a ResNet residual network and a GoogLeNet network.
Further, the multi-model fusion of the detection network takes the union of the detection results of the multiple models.
Further, the non-main categories of the multi-model fused image are suppressed by using target window statistics over the multi-frame information of the video sequence, specifically:
step S301, let A be the set of all bounding boxes over multiple frames of the video;
step S302, sum the detection scores of each bounding box in A for each category to obtain the total score of each category over the video frames;
step S303, sort the object categories from high to low by total score and set a threshold; if an object's category score is below the threshold, the object is judged to be of a non-main category, its bounding box is deleted, and the object does not undergo person-object attribute identification.
Further, the attribute identification of the person object specifically comprises:
inputting the image with the non-main categories suppressed into a VGG-16 convolutional neural network;
fusing the information of the feature maps of convolutional layers 1, 3 and 5 of the VGG-16 network;
after the fused depth features are obtained, calculating the Euclidean distance between a frame of the monitoring video and the templates in a library by a classification method based on feature distance;
if the template image with the minimum Euclidean distance to the bounding box is found and that distance is below a preset threshold, judging that the target object in the detection frame is the corresponding object in the library, assigning the attribute entry of the template object to the detection frame as semantic information, and marking the detection frame as a patient detection frame;
if the distance is larger than the preset threshold, judging that the target object in the detection frame is not that template object in the library;
and if the Euclidean distances between the image in the detection frame and all the images in the library exceed the threshold, judging that the object in the detection frame does not belong to the patient group and abandoning subsequent tracking of that detection frame.
Further, a Deep-SORT tracking algorithm is adopted to track the patient detection frame.
Compared with the prior art, the invention has the following advantages: (1) by adopting multi-model fusion and context-based non-main category suppression, the accuracy of moving-target detection can be improved; (2) features of the detected moving-target image are extracted under a VGG-16 framework, multi-layer features are fused, and similarity measurement with the fused depth features determines whether the target in a detection frame is a patient; this improves the accuracy of person identification, and because person attribute identification is introduced, more reference information is provided for monitoring and nursing staff than in other application scenarios where the moving target is merely outlined by a bounding box.
The invention is further described below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic diagram of a framework flow of a target identification and tracking method provided by the present invention.
Fig. 2 is a schematic diagram of the multi-model fusion of two SSD-based detectors provided by the present invention.
Fig. 3 is a schematic diagram of a framework flow of the target attribute identification method provided by the present invention.
Detailed Description
The framework flow of the target recognition and tracking method is shown in fig. 1. A video image sequence is obtained by an ordinary camera installed at a suitable position in the nursing place and is input, respectively, into an improved SSD framework with a ResNet residual network and an improved SSD framework with a GoogLeNet network, as shown in fig. 2. Different detectors respond to the single-frame image, extract different features, and produce different detection frame (bounding box) results for an object. The improvement lies in the SSD detection framework: the original SSD classifies with a VGG network, whereas here a ResNet residual network and a GoogLeNet network are used on the SSD framework.
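To make the two-detector stage concrete, the following is a minimal sketch in Python/PyTorch. The patent's detectors are improved SSD frameworks with ResNet and GoogLeNet backbones; since those exact models cannot be reconstructed from the text, off-the-shelf torchvision detectors are used here purely as stand-ins, and the function name `detect` is illustrative.

```python
# Sketch only: two independent detectors run on the same frame and each
# returns its own set of bounding boxes. Stand-in torchvision models are
# used in place of the patent's ResNet-SSD and GoogLeNet-SSD.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

detector_a = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()
detector_b = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

@torch.no_grad()
def detect(model, frame_rgb):
    """Run one detector on a single RGB frame (H x W x 3 uint8 array)."""
    out = model([to_tensor(frame_rgb)])[0]
    # Each result: boxes (N x 4, xyxy), labels (N), scores (N)
    return out["boxes"].numpy(), out["labels"].numpy(), out["scores"].numpy()

# boxes_a, labels_a, scores_a = detect(detector_a, frame)
# boxes_b, labels_b, scores_b = detect(detector_b, frame)
```

Each detector independently returns its own boxes, labels and scores for the same frame; these two result sets are what the fusion step described next merges.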
The two different sets of detection frames obtained by the two detection methods in the previous step then undergo multi-model fusion, as shown in fig. 2; the core of multi-model fusion is the strategy chosen for fusing the multi-modal information. In the application context of a nursing place, the results of the multi-model detection are merged as a union in order to reduce the missed-detection rate. To make full use of the detection frame information, detection repositioning is performed, i.e. the position information of the detection frames is re-corrected according to the overlap of the bounding boxes and their scores. The SSD detection frames based on the residual network (denoted A) and the SSD detection frames based on the GoogLeNet network (denoted B) are integrated so that they jointly predict the detection result, and the detections of A and B are collected together. For example, if A detects that patient 1 appears in region X of the picture while B detects no patient, the final result is that patient 1 appears in region X; if A and B both detect patient 1, but in regions X and Y respectively, the final result is that the patient appears in the region X ∪ Y.
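A rough sketch of the union fusion and detection repositioning follows (plain NumPy). It assumes that "re-correcting the position information according to the overlap of the bounding boxes and the scores" means merging same-class boxes with high IoU into a single score-weighted box; the IoU threshold and helper names are illustrative.

```python
import numpy as np

def iou(box, boxes):
    """IoU of one xyxy box against an array of xyxy boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-9)

def fuse_union(boxes_a, scores_a, labels_a, boxes_b, scores_b, labels_b, iou_thr=0.5):
    """Union of two detectors' outputs; overlapping same-class boxes are
    merged into one score-weighted box (detection repositioning)."""
    boxes = np.vstack([boxes_a, boxes_b])
    scores = np.concatenate([scores_a, scores_b])
    labels = np.concatenate([labels_a, labels_b])
    used = np.zeros(len(boxes), dtype=bool)
    fused = []
    for i in np.argsort(-scores):                      # highest score first
        if used[i]:
            continue
        same = (labels == labels[i]) & ~used
        same &= iou(boxes[i], boxes) >= iou_thr
        idx = np.where(same)[0]
        w = scores[idx][:, None]
        merged_box = (boxes[idx] * w).sum(0) / w.sum() # score-weighted average
        fused.append((merged_box, labels[i], scores[idx].max()))
        used[idx] = True
    return fused
```

Because the result is a union, a box detected by only one of A and B is still kept, which matches the goal of reducing the missed-detection rate.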
The corrected detection frames obtained in the previous step are then optimized to eliminate interference in a complex environment. Making full use of the information along the video-stream time sequence, context-based non-main category suppression is designed and used: all detection frames detected over multiple frames of the video are statistically analysed, non-main category objects with low scores are suppressed, and the detection result is further optimized. Specifically (a sketch follows these steps):
step S301, let A be the set of all bounding boxes over multiple frames of the video; when a moving object is detected, a rectangular frame surrounds it; this rectangular frame is the bounding box and delimits a local area of the detection frame, and only this local area is used for classification or regression, which reduces the amount of computation;
step S302, sum the detection scores of each bounding box in A for each category to obtain the total score of each category over the video frames;
step S303, sort the object categories from high to low by total score; categories whose total score is far lower than that of the main categories are suppressed, and the bounding boxes of those non-main categories are removed.
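A minimal sketch of steps S301 to S303, assuming each frame's detections are available as (label, score, box) tuples. The patent only states that a threshold is set; here the threshold is taken, as an assumption, to be a fraction of the top category's total score.

```python
from collections import defaultdict

def suppress_non_main_categories(frames_dets, score_ratio=0.1):
    """frames_dets: list over video frames, each a list of (label, score, box).
    Sums detection scores per category over all frames (the set A of bounding
    boxes), ranks categories by total score, and drops boxes whose category
    total is far below that of the leading (main) categories."""
    totals = defaultdict(float)
    for dets in frames_dets:
        for label, score, _ in dets:
            totals[label] += score                     # step S302: per-category total
    if not totals:
        return frames_dets
    main_total = max(totals.values())                  # step S303: rank by total
    keep = {c for c, s in totals.items() if s >= score_ratio * main_total}
    return [[d for d in dets if d[0] in keep] for dets in frames_dets]
```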
For the detection result obtained in the previous step, the identity information of the detected target is further required, as shown in fig. 3. A VGG-16 convolutional neural network is used to extract features: besides the feature map of the top convolutional layer, the larger feature map of an earlier layer is also used, and the last-layer feature map is upsampled and then fused with the earlier-layer feature map to improve spatial resolution; the information of the feature maps of convolutional layers 1, 3 and 5 of VGG-16 is used for the fusion. After the fused depth features are obtained, the Euclidean distance between the detection frame and each template in the library is calculated by a classification method based on feature distance. If, after traversing the templates, the template image with the minimum Euclidean distance to the detection frame is found and that distance is below a certain threshold, the target object in the detection frame is judged to be the corresponding object in the library, the attribute entry of the template object is assigned to the detection frame as semantic information, and the detection frame is marked as a patient detection frame. If the distance is larger than the threshold, the target object in the detection frame is judged not to be that template object; if the Euclidean distances between the image in the detection frame and all the images in the library exceed the threshold, the object in the detection frame is judged not to belong to the patient group, and subsequent tracking of the detection frame is abandoned. For example: suppose three patients A, B and C are in the template library and the threshold is set to 10; if the Euclidean distances between the actually obtained image and A, B and C are 40 (greater than 10, so not A), 56 (greater than 10, so not B) and 78 (greater than 10, so not C), then since all the distances (40, 56, 78) exceed 10, the monitored target is judged not to be a patient, is directly discarded, and is not tracked subsequently.
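The multi-layer feature fusion and template matching can be sketched roughly as below (Python/PyTorch). Which torchvision VGG-16 layers correspond to the patent's "convolution layers 1, 3 and 5" is an interpretation (here the outputs of convolutional blocks 1, 3 and 5); the global average pooling to a descriptor, the omitted ImageNet input normalization, and the distance threshold are likewise assumptions, and the threshold must be calibrated on the template library.

```python
import torch
import torch.nn.functional as F
import torchvision

vgg = torchvision.models.vgg16(weights="DEFAULT").features.eval()
# Indices in vgg16.features at which convolutional blocks 1, 3 and 5 end
# (assumed reading of "convolution layers 1, 3 and 5").
TAPS = {4: "block1", 16: "block3", 30: "block5"}

@torch.no_grad()
def fused_descriptor(img_chw):
    """Fuse shallow and deep VGG-16 feature maps into one descriptor."""
    feats, x = {}, img_chw.unsqueeze(0)
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in TAPS:
            feats[TAPS[i]] = x
    # Upsample the deeper (smaller) maps to the shallow map's size and concatenate.
    size = feats["block1"].shape[-2:]
    fused = torch.cat([F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                       for f in feats.values()], dim=1)
    return fused.mean(dim=(2, 3)).squeeze(0)           # global average pool

def match_to_library(crop_chw, library, dist_thr=10.0):
    """library: dict of template name -> descriptor. Returns the matched
    patient name, or None if every Euclidean distance exceeds the threshold
    (the target is then judged not to belong to the patient group)."""
    d = fused_descriptor(crop_chw)
    dists = {name: torch.dist(d, t).item() for name, t in library.items()}
    best = min(dists, key=dists.get)
    return best if dists[best] < dist_thr else None
```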
Finally, to achieve stable tracking of the target, an efficient Deep-SORT-based tracking algorithm is adopted: the tracker compensates for the detector, the tracking algorithm transfers detection frames with higher scores to adjacent frames, and non-maximum suppression eliminates redundant detection frames, improving accuracy. This joint optimization further improves the accuracy and precision of object detection in the video. The information and detection frame of the target object are displayed in real time on a human-machine interaction interface, supplemented by measures such as an audible alarm, effectively helping the staff to nurse the patients.
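The non-maximum suppression mentioned above can be sketched as follows (plain NumPy). The Deep-SORT tracker itself, a Kalman-filter motion model combined with appearance features for data association, is not reproduced here; the IoU threshold is illustrative.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Standard non-maximum suppression: keep the highest-scoring box and
    drop any remaining box that overlaps it by more than iou_thr."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        overlap = inter / (area_i + area_r - inter + 1e-9)
        order = rest[overlap < iou_thr]
    return np.array(keep)
```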

Claims (6)

1. An intelligent real-time video monitoring method suitable for nursing places, characterized by comprising the following steps:
inputting the image sequence into a designed detection network;
multi-model fusion of the detection network;
context-based non-main category suppression;
attribute identification of the person object;
tracking of the person object of interest.
2. The method of claim 1, wherein the detection networks include a ResNet residual network and a GoogLeNet network.
3. The method of claim 1, wherein the multi-model fusion of the detection network takes the union of the results of the multi-model detection.
4. The method according to claim 1, wherein the non-main categories of the multi-model fused image are suppressed by using target window statistics over the multi-frame information of the video sequence, specifically:
step S301, let A be the set of all bounding boxes over multiple frames of the video;
step S302, sum the detection scores of each bounding box in A for each category to obtain the total score of each category over the video frames;
step S303, sort the object categories from high to low by total score and set a threshold; if an object's category score is below the threshold, the object is judged to be of a non-main category, its bounding box is deleted, and the object does not undergo person-object attribute identification.
5. The method of claim 1, wherein the attribute identification of the person object specifically comprises: inputting the image with the non-main categories suppressed into a VGG-16 convolutional neural network;
fusing the information of the feature maps of convolutional layers 1, 3 and 5 of the VGG-16 network;
after the fused depth features are obtained, calculating the Euclidean distance between a frame of the monitoring video and the templates in a library by a classification method based on feature distance;
if the template image with the minimum Euclidean distance to the bounding box is found and that distance is below a preset threshold, judging that the target object in the detection frame is the corresponding object in the library, assigning the attribute entry of the template object to the detection frame as semantic information, and marking the detection frame as a patient detection frame;
if the distance is larger than the preset threshold, judging that the target object in the detection frame is not that template object in the library;
and if the Euclidean distances between the image in the detection frame and all the images in the library exceed the threshold, judging that the object in the detection frame does not belong to the patient group and abandoning subsequent tracking of that detection frame.
6. The method of claim 5, wherein the patient detection frame is tracked using a Deep-SORT tracking algorithm.
CN201910805510.9A 2019-08-29 2019-08-29 Intelligent real-time video monitoring method suitable for nursing places Withdrawn CN110598592A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910805510.9A CN110598592A (en) 2019-08-29 2019-08-29 Intelligent real-time video monitoring method suitable for nursing places

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910805510.9A CN110598592A (en) 2019-08-29 2019-08-29 Intelligent real-time video monitoring method suitable for nursing places

Publications (1)

Publication Number Publication Date
CN110598592A 2019-12-20

Family

ID=68856179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910805510.9A Withdrawn CN110598592A (en) 2019-08-29 2019-08-29 Intelligent real-time video monitoring method suitable for nursing places

Country Status (1)

Country Link
CN (1) CN110598592A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325279A (en) * 2020-02-26 2020-06-23 福州大学 Pedestrian and personal sensitive article tracking method fusing visual relationship
CN111325279B (en) * 2020-02-26 2022-06-10 福州大学 Pedestrian and personal sensitive article tracking method fusing visual relationship
CN118152989A (en) * 2024-05-11 2024-06-07 飞狐信息技术(天津)有限公司 Video auditing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20191220)