CN112287877A - Multi-role close-up shot tracking method - Google Patents

Multi-role close-up shot tracking method

Info

Publication number
CN112287877A
Authority
CN
China
Prior art keywords
video
close
path
video data
deep learning
Prior art date
Legal status
Granted
Application number
CN202011294296.4A
Other languages
Chinese (zh)
Other versions
CN112287877B (en)
Inventor
方倩 (Fang Qian)
Current Assignee
Suzhou Aikor Intelligent Technology Co ltd
Original Assignee
Shanghai Sike Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Sike Intelligent Technology Co., Ltd.
Priority to CN202011294296.4A
Publication of CN112287877A
Application granted
Publication of CN112287877B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G06V 40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-role close-up shot tracking method, which comprises the following steps: acquiring multiple channels of video data, constructing a deep learning model based on a CNN network, and performing human body detection and face detection on the multi-channel video data through the deep learning model; matching the identities of the persons appearing in each video channel according to the human body detection results and the face detection results; and selecting the shot with the optimal viewing angle for each identified person and pushing the optimal-view image and/or video stream corresponding to each person. The invention can accurately pick out the relevant person data in a monitoring system, improves the capability of analysing video data, provides high-quality analysis results, and provides close-up images and/or video streams of the detected persons in real time.

Description

Multi-role close-up shot tracking method
Technical Field
The invention relates to the technical field of video image processing, and in particular to a multi-role close-up shot tracking method.
Background
At present, deep learning technology continues to develop and advance and has become one of the most popular research directions. Convolutional neural networks (CNNs) are an important class of deep learning algorithms; they are very good at handling image-related problems, are now widely used in the field of computer vision, and play an important role in face detection, image retrieval, and similar tasks.
In the prior art, persons of interest often need to be monitored and identified. Most existing methods monitor these persons in real time with video surveillance equipment, build models on the large data sets obtained from the surveillance, extract features, and output data related to the monitored persons. In many scenarios, however, monitoring and recognition based on raw video data cannot effectively meet customized service requirements. For example, because child abuse incidents at childcare institutions occur from time to time, guardians often want to view the institution's surveillance video in real time in order to ensure the safety of their children and keep home care consistent with institutional care. This need is difficult to meet for the following reasons: (1) directly viewing the video through a traditional monitoring system would invade the privacy of other children; (2) displaying the full picture would reveal the institution's distinctive educational content and weaken its competitiveness; (3) even if some institutions allow guardians to view the surveillance video in real time through mobile phone or computer software, the shooting angle of each surveillance camera is fixed while the children move around, so the camera cannot adapt to the monitored person's movements in real time; guardians therefore cannot always see a close-up of their child, must spend time locating the child in each monitoring picture, and all guardians see the same data, with no customized distribution of video data. Based on these practical problems, there is an urgent need for a method that can track close-up shots of specific persons in a specific scene in real time from multiple surveillance video channels and automatically generate highly customized video data, so as to meet the customized data requirements of such scenarios.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a multi-role close-up shot tracking method.
In order to achieve this purpose, the technical solution of the invention is as follows:
a multi-character close-up shot tracking method, comprising the steps of:
obtaining multi-channel video data, constructing a deep learning model based on a CNN network, and passing the deep learning model
Respectively carrying out human body detection and human face detection on the multi-channel video data;
respectively carrying out identity matching on people appearing in each path of video according to the human body detection result and the human face detection result;
and respectively selecting the lens with the optimal view angle for different identity characters, and pushing the image and/or video stream with the optimal view angle corresponding to the different identity characters.
Preferably, before the step of pushing the optimal-view images and/or video streams corresponding to the different persons, the method further comprises cropping the central region around each person within that person's optimal-view shot and performing high-definition image restoration on the cropped region.
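By way of a non-limiting illustration, the central-region interception and the subsequent restoration step could look like the following sketch (the OpenCV calls, the box format, the 20% margin, and the use of bicubic upscaling as a stand-in for a dedicated high-definition restoration model are assumptions, not part of the claimed method):

```python
import cv2
import numpy as np

def close_up_crop(frame: np.ndarray, box, margin: float = 0.2, out_size=(512, 512)) -> np.ndarray:
    """Crop the central region around a person box (x1, y1, x2, y2) and upscale it."""
    h, w = frame.shape[:2]
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1
    # Expand the box by a margin so the crop reads as a close-up rather than a tight box.
    x1 = max(0, int(x1 - margin * bw)); y1 = max(0, int(y1 - margin * bh))
    x2 = min(w, int(x2 + margin * bw)); y2 = min(h, int(y2 + margin * bh))
    crop = frame[y1:y2, x1:x2]
    # Stand-in for the "high-definition image restoration" step: a super-resolution or
    # restoration network could be applied here; bicubic resizing is only a placeholder.
    return cv2.resize(crop, out_size, interpolation=cv2.INTER_CUBIC)
```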
Preferably, the shot with the optimal viewing angle is the shot in which the largest number of face-detection key points is detected among the multiple video channels.
Preferably, the key points of the face detection include the inner and outer corners of the left eye, the nose heel point, the inner and outer corners of the right eye, the nose root point, the left and right nasal wings, the nasal septum point, the left and right lips, the upper and lower lips, and the chin point.
Preferably, performing human body detection and face detection on the multi-channel video data specifically comprises the following steps:
constructing a deep learning model based on a CNN network, extracting image features from the multi-channel video data, and generating a plurality of position box predictions and category predictions;
performing loss calculation between the position box predictions and category predictions and the label boxes, respectively, to obtain the corresponding loss values;
and updating the parameters of the deep learning model according to the loss values.
Preferably, the loss function adopted for the position prediction is the smooth L1 loss:
$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5\,x_{1}^{2}, & -1<x_{1}<1\\ \lvert x_{2}\rvert-0.5, & x_{2}>1\ \text{or}\ x_{2}<-1\end{cases}$$
where $x_1$ and $x_2$ both denote the difference between the predicted position and the true position, with value ranges $-1<x_1<1$ and $x_2>1$ or $x_2<-1$, respectively.
Preferably, the loss function of the classification prediction is a cross-entropy function:
$$L_{cls}=-\sum_{i} y'_{i}\,\log(y_{i})$$
where $y'_i$ denotes the data label and $y_i$ denotes the predicted probability value.
Preferably, the specific method for matching the identities of the persons in each video channel according to the human body detection results and the face detection results comprises the following steps:
calling a pre-trained feature vector extraction model, and extracting the feature vectors of the persons in each video channel from the video streams;
calculating the Euclidean distance between every two feature vectors;
obtaining a similarity result for the persons in each video channel according to the calculated Euclidean distances;
and matching the identities of the persons in each video channel according to the similarity results.
Preferably, the Euclidean distance is calculated by the formula:
$$d(m,n)=\sqrt{\sum_{i}\left(m_{i}-n_{i}\right)^{2}}$$
where $m_i$ and $n_i$ are the elements of any two feature vectors taken from different video streams.
Preferably, the multi-channel video data is acquired from different angles through a plurality of monitoring acquisition devices.
Based on the above technical solution, the invention has the following beneficial effects: the invention takes deep learning technology as its core, first using deep learning target detection to solve pedestrian detection and face detection, then reusing the feature vectors of the detection network to match person identities across the video channels, automatically selecting the shot with the best viewing angle for each person appearing in the scene, and generating a close-up shot of each person in real time under that best-view shot. Even if the optimal viewing angle changes because a person moves, the invention always captures and processes the channel with the optimal viewing angle among all video streams, continuously tracks each person's close-up shot, provides close-up images and/or video streams of the detected persons in real time, and improves the user experience.
Drawings
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of the multi-role close-up shot tracking method of the invention;
FIG. 2 is a schematic diagram of video stream input and output in the multi-role close-up shot tracking method of the invention;
FIG. 3 is a flow chart of the algorithm functions in the multi-role close-up shot tracking method of the invention;
FIG. 4 is a schematic diagram of the deep learning training in the multi-role close-up shot tracking method of the invention;
FIG. 5 is a schematic diagram of the person identity matching in the multi-role close-up shot tracking method of the invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Example one
As shown in FIGS. 1 to 5, the multi-role close-up shot tracking method of the invention comprises the following steps:
1. In certain scenarios (for example, a kindergarten, nursery, or nursing home), multiple surveillance cameras capture video from multiple viewing angles.
2. The multi-channel video data acquired in step 1 are input into an AI inference edge server for comprehensive processing. The specific processing steps, as shown in FIG. 1, are as follows: perform real-time human body detection and face detection on each video channel to obtain the human body detection results and face recognition results of the corresponding channels; match the identities of the persons appearing in each channel, so that each person with a specific identity corresponds to the surveillance videos from multiple viewing angles; select the optimal viewing angle for each identified person according to the number of face-detection key points (the more key points detected, the better the viewing angle), and crop the central region of each person's optimal view to obtain that person's exclusive 'close-up shot'; finally, perform high-definition restoration on each close-up shot and push the resulting video stream output.
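The following Python sketch outlines this per-frame processing loop on the edge server. The callables `detect`, `match_identities`, and `make_close_up`, and the dictionary layout of their results, are illustrative assumptions rather than a prescribed implementation:

```python
from typing import Callable, Dict, Sequence

def process_frames(frames: Sequence,
                   detect: Callable,             # frame -> detections (box, face key points, feature)
                   match_identities: Callable,   # per-camera detections -> {person: {camera index: detection}}
                   make_close_up: Callable) -> Dict:
    """One pass over time-aligned frames, one frame per surveillance camera."""
    per_camera = [detect(f) for f in frames]      # human body / face detection on every channel
    identities = match_identities(per_camera)     # cross-view identity matching
    close_ups = {}
    for person, views in identities.items():
        # Best view: the camera whose detection exposes the most face key points.
        best_cam = max(views, key=lambda c: len(views[c].get("keypoints", [])))
        close_ups[person] = make_close_up(frames[best_cam], views[best_cam])
    return close_ups                              # pushed downstream as close-up images / video streams
```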
The algorithm workflow is illustrated using FIG. 3 as an example. Assume that two video streams, 1 and 2, are captured by different cameras from two viewing angles of the same scene, in which there are two people: an adult and a child. First, a deep learning detection algorithm performs human body detection and face detection on the videos from both viewing angles, and video streams 1 and 2 each yield two detection results, ID1 and ID2. The IDs detected in the two video streams are then matched: ID2 in video stream 1 corresponds to ID1 in video stream 2 and is the same person (the adult), while ID1 in video stream 1 corresponds to ID2 in video stream 2 and is the same person (the child). The adult and the child therefore each have shots from two viewing angles; the adult, for example, appears as ID2 in video stream 1 and as ID1 in video stream 2. Next, the two shots of the adult are compared and the adult's optimal viewing angle is selected according to the number of face-detection key points, where the key points include the inner and outer corners of the left eye, the nose heel point, the inner and outer corners of the right eye, the nose root point, the left and right nasal wings, the nasal septum point, the left and right lips, the upper and lower lips, and the chin point. More key points can be detected for the adult under camera 2, so camera 2 provides the better viewing angle for the adult; likewise, camera 1 provides the better viewing angle for the child. Finally, the central region is cropped from each person's optimal shot and the resulting video streams are pushed.
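A minimal numeric illustration of this best-view rule, with camera names and key-point counts invented purely for the example above:

```python
# Invented face key-point counts per matched person and per camera.
detections = {
    "adult": {"camera 1": 7, "camera 2": 13},
    "child": {"camera 1": 13, "camera 2": 5},
}
# Pick, for each person, the camera in which the most key points are visible.
best_view = {person: max(views, key=views.get) for person, views in detections.items()}
print(best_view)   # {'adult': 'camera 2', 'child': 'camera 1'}
```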
Implementation details:
and respectively carrying out human body detection and face detection on the multi-path video data: the target detection network using deep learning as the core is not limited to single-stage, double-stage or anchor free, anchor base and other frameworks.
The training process is shown in FIG. 4. A large amount of image data annotated with person position label boxes is required and is fed into a deep learning model that uses a CNN (convolutional neural network) as its backbone. The model progressively extracts image features and finally outputs predictions of a number of human body/face position boxes together with the corresponding category predictions. Loss calculation is performed between these predictions and the label boxes to obtain the network's loss values on a batch of data, and the parameters of the deep learning model are updated according to the loss values, so that the model's subsequent position and category predictions move closer to the true values.
The loss function used for the position prediction is the smooth L1 loss:
$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5\,x_{1}^{2}, & -1<x_{1}<1\\ \lvert x_{2}\rvert-0.5, & x_{2}>1\ \text{or}\ x_{2}<-1\end{cases}$$
where $x_1$ and $x_2$ both denote the difference between the predicted position and the true position, with value ranges $-1<x_1<1$ and $x_2>1$ or $x_2<-1$, respectively.
The loss function for the category prediction is the cross-entropy loss:
$$L_{cls}=-\sum_{i} y'_{i}\,\log(y_{i})$$
where $y'_i$ denotes the data label and $y_i$ denotes the predicted probability value. It can be seen that joint human body detection/face detection is trained as a multi-task learning problem.
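A minimal PyTorch sketch of the two loss terms defined above (the tensor shapes, the one-hot label encoding, and the unweighted summation are assumptions; in practice a framework's built-in detection losses would typically be used):

```python
import torch

def smooth_l1(pred_boxes: torch.Tensor, target_boxes: torch.Tensor) -> torch.Tensor:
    x = pred_boxes - target_boxes            # difference between prediction and ground truth
    quadratic = 0.5 * x ** 2                 # branch used where |x| < 1  (the x1 range above)
    linear = x.abs() - 0.5                   # branch used where |x| >= 1 (the x2 range above)
    return torch.where(x.abs() < 1, quadratic, linear).sum()

def cross_entropy(pred_probs: torch.Tensor, one_hot_labels: torch.Tensor) -> torch.Tensor:
    # one_hot_labels play the role of y'_i, pred_probs the role of y_i
    return -(one_hot_labels * torch.log(pred_probs.clamp_min(1e-12))).sum(dim=1).mean()

# The detection loss on a batch is a (possibly weighted) sum of both terms,
# and its gradient drives the parameter update described in the text.
```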
Matching the identities (IDs) of the persons appearing in each video channel according to the human body detection and face detection results: in order to match the identities of persons across different video streams (viewing angles), the invention computes the Euclidean distance between the feature vectors output by the human body/face detection network. The specific method is shown in FIG. 5: the detection model performs inference on the different videos to obtain the corresponding output features, and the Euclidean distances between the features corresponding to the detection results of the different video streams are computed to match person identities. Essentially, the pairwise Euclidean distances between feature vectors are computed according to the formula:
$$d(m,n)=\sqrt{\sum_{i}\left(m_{i}-n_{i}\right)^{2}}$$
where $m_i$ and $n_i$ are the elements of any two feature vectors taken from different video streams.
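The cross-stream identity matching can be sketched as follows (NumPy; the greedy nearest-neighbour assignment and the distance threshold `max_dist` are assumptions, since the text only specifies the distance itself):

```python
import numpy as np

def match_identities(feats_1: np.ndarray, feats_2: np.ndarray, max_dist: float = 1.0):
    """feats_1: (N1, D) and feats_2: (N2, D) detection feature vectors from two video streams."""
    # d(m, n) = sqrt(sum_i (m_i - n_i)^2) for every pair of feature vectors
    dists = np.sqrt(((feats_1[:, None, :] - feats_2[None, :, :]) ** 2).sum(axis=-1))
    matches = []
    for i in range(dists.shape[0]):
        j = int(dists[i].argmin())           # nearest detection in the second stream
        if dists[i, j] <= max_dist:          # smaller distance = higher similarity (assumed threshold)
            matches.append((i, j))
    return matches                           # pairs of (index in stream 1, index in stream 2)
```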
The above description covers only preferred embodiments of the multi-role close-up shot tracking method disclosed in the present invention and is not intended to limit the scope of the embodiments of the present disclosure. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the embodiments of the present disclosure shall be included in the protection scope of the embodiments of the present disclosure.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are all described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.

Claims (10)

1. A multi-role close-up shot tracking method, comprising the following steps:
acquiring multiple channels of video data, constructing a deep learning model based on a CNN network, and performing human body detection and face detection on the multi-channel video data through the deep learning model;
matching the identities of the persons appearing in each video channel according to the human body detection results and the face detection results;
and selecting the shot with the optimal viewing angle for each identified person, and pushing the optimal-view image and/or video stream corresponding to each person.
2. The multi-role close-up shot tracking method according to claim 1, wherein before the step of pushing the optimal-view images and/or video streams corresponding to the different persons, the method further comprises cropping the central region around each person within that person's optimal-view shot, and performing high-definition image restoration on the cropped region.
3. The multi-role close-up shot tracking method according to claim 1, wherein the shot with the optimal viewing angle is the shot in which the largest number of face-detection key points is detected among the multiple video channels.
4. The multi-role close-up shot tracking method according to claim 3, wherein the key points of the face detection include the inner and outer corners of the left eye, the nose heel point, the inner and outer corners of the right eye, the nose root point, the left and right nasal wings, the nasal septum point, the left and right lips, the upper and lower lips, and the chin point.
5. The multi-role close-up shot tracking method according to claim 1, wherein performing human body detection and face detection on the multi-channel video data comprises:
constructing a deep learning model based on a CNN network, extracting image features from the multi-channel video data, and generating a plurality of position box predictions and category predictions;
performing loss calculation between the position box predictions and category predictions and the label boxes, respectively, to obtain the corresponding loss values; and updating the parameters of the deep learning model according to the loss values.
6. The multi-role close-up shot tracking method according to claim 5, wherein the loss function used for the position prediction is the smooth L1 loss:
$$\mathrm{smooth}_{L1}(x)=\begin{cases}0.5\,x_{1}^{2}, & -1<x_{1}<1\\ \lvert x_{2}\rvert-0.5, & x_{2}>1\ \text{or}\ x_{2}<-1\end{cases}$$
where $x_1$ and $x_2$ both denote the difference between the predicted position and the true position, with value ranges $-1<x_1<1$ and $x_2>1$ or $x_2<-1$, respectively.
7. The multi-role close-up shot tracking method according to claim 5, wherein the loss function of the classification prediction is a cross-entropy function:
$$L_{cls}=-\sum_{i} y'_{i}\,\log(y_{i})$$
where $y'_i$ denotes the data label and $y_i$ denotes the predicted probability value.
8. The multi-role close-up shot tracking method according to claim 1, wherein the specific method for matching the identities of the persons in each video channel according to the human body detection results and the face detection results comprises:
calling a pre-trained feature vector extraction model, and extracting the feature vectors of the persons in each video channel from the video streams;
calculating the Euclidean distance between every two feature vectors;
obtaining a similarity result for the persons in each video channel according to the calculated Euclidean distances;
and matching the identities of the persons in each video channel according to the similarity results.
9. The multi-role close-up shot tracking method according to claim 8, wherein the Euclidean distance is calculated by the formula:
$$d(m,n)=\sqrt{\sum_{i}\left(m_{i}-n_{i}\right)^{2}}$$
where $m_i$ and $n_i$ are the elements of any two feature vectors taken from different video streams.
10. The multi-role close-up shot tracking method according to claim 1, wherein the multi-channel video data are acquired from different angles by a plurality of monitoring acquisition devices.
CN202011294296.4A 2020-11-18 2020-11-18 Multi-role close-up shot tracking method Active CN112287877B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011294296.4A CN112287877B (en) 2020-11-18 2020-11-18 Multi-role close-up shot tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011294296.4A CN112287877B (en) 2020-11-18 2020-11-18 Multi-role close-up shot tracking method

Publications (2)

Publication Number Publication Date
CN112287877A true CN112287877A (en) 2021-01-29
CN112287877B CN112287877B (en) 2022-12-02

Family

ID=74397916

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011294296.4A Active CN112287877B (en) 2020-11-18 2020-11-18 Multi-role close-up shot tracking method

Country Status (1)

Country Link
CN (1) CN112287877B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114542874A (en) * 2022-02-23 2022-05-27 常州工业职业技术学院 Device for automatically adjusting photographing height and angle and control system thereof

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564052A (en) * 2018-04-24 2018-09-21 南京邮电大学 Multi-cam dynamic human face recognition system based on MTCNN and method
CN109117803A (en) * 2018-08-21 2019-01-01 腾讯科技(深圳)有限公司 Clustering method, device, server and the storage medium of facial image
CN109543560A (en) * 2018-10-31 2019-03-29 百度在线网络技术(北京)有限公司 Dividing method, device, equipment and the computer storage medium of personage in a kind of video
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN109919977A (en) * 2019-02-26 2019-06-21 鹍骐科技(北京)股份有限公司 A kind of video motion personage tracking and personal identification method based on temporal characteristics
CN110414415A (en) * 2019-07-24 2019-11-05 北京理工大学 Human bodys' response method towards classroom scene
WO2020001082A1 (en) * 2018-06-30 2020-01-02 东南大学 Face attribute analysis method based on transfer learning
CN110852219A (en) * 2019-10-30 2020-02-28 广州海格星航信息科技有限公司 Multi-pedestrian cross-camera online tracking system
CN111401238A (en) * 2020-03-16 2020-07-10 湖南快乐阳光互动娱乐传媒有限公司 Method and device for detecting character close-up segments in video
CN111783749A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Face detection method and device, electronic equipment and storage medium
CN111815675A (en) * 2020-06-30 2020-10-23 北京市商汤科技开发有限公司 Target object tracking method and device, electronic equipment and storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564052A (en) * 2018-04-24 2018-09-21 南京邮电大学 Multi-cam dynamic human face recognition system based on MTCNN and method
WO2020001082A1 (en) * 2018-06-30 2020-01-02 东南大学 Face attribute analysis method based on transfer learning
CN109117803A (en) * 2018-08-21 2019-01-01 腾讯科技(深圳)有限公司 Clustering method, device, server and the storage medium of facial image
CN109543560A (en) * 2018-10-31 2019-03-29 百度在线网络技术(北京)有限公司 Dividing method, device, equipment and the computer storage medium of personage in a kind of video
WO2020155873A1 (en) * 2019-02-02 2020-08-06 福州大学 Deep apparent features and adaptive aggregation network-based multi-face tracking method
CN109829436A (en) * 2019-02-02 2019-05-31 福州大学 Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network
CN109919977A (en) * 2019-02-26 2019-06-21 鹍骐科技(北京)股份有限公司 A kind of video motion personage tracking and personal identification method based on temporal characteristics
CN109919097A (en) * 2019-03-08 2019-06-21 中国科学院自动化研究所 Face and key point combined detection system, method based on multi-task learning
CN110414415A (en) * 2019-07-24 2019-11-05 北京理工大学 Human bodys' response method towards classroom scene
CN110852219A (en) * 2019-10-30 2020-02-28 广州海格星航信息科技有限公司 Multi-pedestrian cross-camera online tracking system
CN111401238A (en) * 2020-03-16 2020-07-10 湖南快乐阳光互动娱乐传媒有限公司 Method and device for detecting character close-up segments in video
CN111815675A (en) * 2020-06-30 2020-10-23 北京市商汤科技开发有限公司 Target object tracking method and device, electronic equipment and storage medium
CN111783749A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Face detection method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
肖旭章 (Xiao Xuzhang): "Research and Implementation of a Multi-Camera Scheduling System Based on Face Tracking and Recognition", China Master's Theses Full-text Database (Information Science and Technology) *


Also Published As

Publication number Publication date
CN112287877B (en) 2022-12-02

Similar Documents

Publication Publication Date Title
Sabir et al. Recurrent convolutional strategies for face manipulation detection in videos
EP3467707B1 (en) System and method for deep learning based hand gesture recognition in first person view
Laraba et al. 3D skeleton‐based action recognition by representing motion capture sequences as 2D‐RGB images
CN105590091B (en) Face recognition method and system
US20170032182A1 (en) System for adaptive real-time facial recognition using fixed video and still cameras
CN112530019B (en) Three-dimensional human body reconstruction method and device, computer equipment and storage medium
Carneiro et al. Fight detection in video sequences based on multi-stream convolutional neural networks
CN109063580A (en) Face identification method, device, electronic equipment and storage medium
WO2022120843A1 (en) Three-dimensional human body reconstruction method and apparatus, and computer device and storage medium
Shah et al. Multi-view action recognition using contrastive learning
CN110348371A (en) Human body three-dimensional acts extraction method
CN112906520A (en) Gesture coding-based action recognition method and device
Zhou et al. A study on attention-based LSTM for abnormal behavior recognition with variable pooling
CN112287877B (en) Multi-role close-up shot tracking method
Badhe et al. Artificial neural network based indian sign language recognition using hand crafted features
Kaur et al. Violence detection in videos using deep learning: A survey
Tur et al. Isolated sign recognition with a siamese neural network of RGB and depth streams
CN110543813B (en) Face image and gaze counting method and system based on scene
Xu et al. Beyond two-stream: Skeleton-based three-stream networks for action recognition in videos
CN104751144B (en) A kind of front face fast appraisement method of facing video monitoring
Uçan et al. Deepfake and security of video conferences
Tang et al. A Survey on Human Action Recognition based on Attention Mechanism
Deotale et al. Optimized hybrid RNN model for human activity recognition in untrimmed video
CN115393963A (en) Motion action correcting method, system, storage medium, computer equipment and terminal
Liu et al. User-generated-video summarization using sparse modelling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210714

Address after: 215000 room d303, building g-1, shazhouhu science and Technology Innovation Park, Huachang Road, yangshe Town, Zhangjiagang City, Suzhou City, Jiangsu Province

Applicant after: Suzhou aikor Intelligent Technology Co.,Ltd.

Address before: 201601 building 6, 351 sizhuan Road, Sijing Town, Songjiang District, Shanghai

Applicant before: Shanghai Sike Intelligent Technology Co.,Ltd.

GR01 Patent grant