CN108647599A - Human behavior recognition method combining 3D skip-layer connections and a recurrent neural network - Google Patents
Human behavior recognition method combining 3D skip-layer connections and a recurrent neural network
- Publication number
- CN108647599A CN108647599A CN201810394571.6A CN201810394571A CN108647599A CN 108647599 A CN108647599 A CN 108647599A CN 201810394571 A CN201810394571 A CN 201810394571A CN 108647599 A CN108647599 A CN 108647599A
- Authority
- CN
- China
- Prior art keywords
- video
- recognition
- neural network
- recurrent neural
- feature
- Prior art date: 2018-04-27
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24147—Distances to closest patterns, e.g. nearest neighbour classification
Abstract
The present invention discloses a human behavior recognition method combining 3D skip-layer connections and a recurrent neural network, comprising the following steps: Step 1, divide each video into N segments and extract L frames from each segment, where N and L are natural numbers; Step 2, extract spatio-temporal features from the video with a trained 3D convolutional neural network, and concatenate the spatio-temporal features of different levels into a high-dimensional feature vector; Step 3, normalize the high-dimensional feature vector obtained in Step 2; Step 4, feed the normalized high-dimensional feature vector from Step 3 into a recurrent neural network for feature fusion; Step 5, classify the fused features from Step 4 to obtain the action category of the video. The method requires no manual extraction of low-level motion information; compared with hand-crafted motion-feature design methods it is more robust, and it can effectively handle long videos.
Description
Technical field
The invention belongs to the technical field of computer vision recognition, and in particular relates to a human behavior recognition method combining 3D convolutional skip-layer connections and a recurrent neural network.
Background art
Human behavior recognition has important application prospects and market value in fields such as video surveillance, human-computer interaction, and virtual reality, so video-based human action recognition has become one of the research hotspots in computer vision. Meanwhile, as deep learning, and convolutional neural networks in particular, has achieved notable results in computer vision, human behavior recognition based on convolutional neural networks has attracted much attention from researchers.
The patent CN201611117772.9, "Behavior recognition method based on trajectory and convolutional neural network feature extraction", first extracts trajectories from the input image/video data, then uses a convolutional neural network to extract convolutional features, combines the trajectories with trajectory-constrained convolutional-layer features to extract stacked local Fisher vector features, and finally trains a support vector machine model for classification.
The patent CN201510527937.9, "Human behavior recognition method based on 3D convolutional neural networks", first screens and stores images showing apparent human behavior features, then extracts five channels of information from the stored images, namely grayscale, x- and y-direction gradients, and optical flow, extracts convolutional features from the five channels with a convolutional neural network, and finally performs classification.
Both of the above methods require low-dimensional motion information to be extracted from the video data in advance; the raw video data cannot be fed directly into the network, so end-to-end classification prediction cannot be achieved.
The patent CN201610047682.0, "Behavior recognition method based on deep learning and multi-scale information", first splits a deep video into multiple video segments, then learns each video segment with its own branch neural network, then performs a simple fusion connection of the high-level representations learned by the parallel branches, and finally feeds the fused representation into fully connected and classification layers for recognition. With this method, a long input video makes the fused feature dimension too high, so the network becomes difficult to train.
In summary, although action recognition based on convolutional neural networks has been studied extensively at home and abroad, problems remain, such as the need for manual motion-information extraction from the video data in advance or the inability to handle long videos.
Summary of the invention
The object of the present invention is to provide a human behavior recognition method combining 3D skip-layer connections and a recurrent neural network that requires no manual extraction of low-level motion information; compared with hand-crafted motion-feature design methods, the invention is more robust and can effectively handle long videos.
To achieve the above object, the solution of the invention is:
A human behavior recognition method combining 3D skip-layer connections and a recurrent neural network, comprising the following steps:
Step 1: divide each video into N segments and extract L frames from each segment, where N and L are natural numbers;
Step 2: extract spatio-temporal features from the video with a trained 3D convolutional neural network, and concatenate the spatio-temporal features of different levels into a high-dimensional feature vector;
Step 3: normalize the high-dimensional feature vector obtained in Step 2;
Step 4: feed the normalized high-dimensional feature vector from Step 3 into a recurrent neural network for feature fusion;
Step 5: classify the fused features from Step 4 to obtain the action category of the video.
In Step 1 above, a video is discarded if its total frame count is below 48; if the total frame count is not divisible by L, the last few frames are discarded.
In Step 1 above, dividing each video into N segments and extracting L frames from each segment means: a video is divided evenly by frame count into N = 3 parts, each part containing the same number of frames, and L = 16 frames are extracted at equal intervals from each part.
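For illustration only, the following is a minimal NumPy sketch of this sampling scheme; the function name sample_segments and the exact interval arithmetic are our assumptions, since the patent specifies only the N = 3 parts, L = 16 frames, and the 48-frame minimum.

```python
import numpy as np

def sample_segments(frames, n_parts=3, frames_per_part=16):
    """Split a video of shape (T, H, W, 3) into n_parts equal parts and
    pick frames_per_part frames at equal intervals from each part."""
    total = frames.shape[0]
    if total < n_parts * frames_per_part:   # patent: discard videos shorter than 48 frames
        return None
    total -= total % n_parts                # patent: drop the last few frames if they do not divide evenly
    part_len = total // n_parts
    segments = []
    for p in range(n_parts):
        idx = np.linspace(p * part_len, (p + 1) * part_len - 1,
                          frames_per_part).astype(int)
        segments.append(frames[idx])
    return np.stack(segments)               # shape (3, 16, H, W, 3)

clips = sample_segments(np.zeros((120, 240, 320, 3), dtype=np.float32))
print(clips.shape)                          # (3, 16, 240, 320, 3)
```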
The detailed process of Step 2 above is:
Transfer learning: the convolution and pooling layers of a trained C3D network serve as the feature extractor. Spatio-temporal features are extracted from each 16-frame input obtained in Step 1, yielding an output vector of pool5num dimensions; extracting spatio-temporal features from the whole video gives a result represented by the two-dimensional tensor (3, pool5num), where pool5num is the output dimension of pooling layer 5 of the feature extractor.
Skip-layer concatenation: for each 16-frame input, the outputs of pooling layers 1, 2, 3, and 5 of the feature extractor are concatenated into a feature vector of poolall_num dimensions. Applying this concatenation to the whole video gives a result represented by the two-dimensional tensor (3, poolall_num), where poolall_num = pool1num + pool2num + pool3num + pool5num, and pool1num, pool2num, pool3num are the output dimensions of pooling layers 1, 2, and 3, respectively, as sketched below.
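The sketch below illustrates the skip-layer concatenation with a toy 3D CNN standing in for the C3D feature extractor; the layer widths are invented for brevity, the pool outputs are simply flattened before concatenation, and PyTorch's channel-first layout (segments, channels, frames, height, width) replaces the patent's channel-last tensors.

```python
import torch
import torch.nn as nn

class TinyC3D(nn.Module):
    """Toy stand-in for the C3D feature extractor; layer widths are illustrative."""
    def __init__(self):
        super().__init__()
        self.conv1, self.pool1 = nn.Conv3d(3, 8, 3, padding=1), nn.MaxPool3d((1, 2, 2))
        self.conv2, self.pool2 = nn.Conv3d(8, 16, 3, padding=1), nn.MaxPool3d(2)
        self.conv3, self.pool3 = nn.Conv3d(16, 32, 3, padding=1), nn.MaxPool3d(2)
        self.conv5, self.pool5 = nn.Conv3d(32, 64, 3, padding=1), nn.MaxPool3d(2)

    def forward(self, x):                        # x: (segments, 3, 16, 112, 112)
        p1 = self.pool1(torch.relu(self.conv1(x)))
        p2 = self.pool2(torch.relu(self.conv2(p1)))
        p3 = self.pool3(torch.relu(self.conv3(p2)))
        p5 = self.pool5(torch.relu(self.conv5(p3)))
        # skip-layer connection: concatenate the flattened pool-1, 2, 3, 5 outputs
        return torch.cat([t.flatten(1) for t in (p1, p2, p3, p5)], dim=1)

clips = torch.randn(3, 3, 16, 112, 112)          # one video: 3 clips of 16 frames
features = TinyC3D()(clips)                      # (3, poolall_num)
print(features.shape)
```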
In Step 3 above, the detailed normalization process is:
The mean E[x^(k)] and variance Var[x^(k)] of each dimension k of the high-dimensional feature vectors from Step 2 are computed over the entire training set, and each dimension of a feature vector is then standardized. The standardization formula is:

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])

where x^(k) denotes an activation value and x̂^(k) the standardized value.
Then x̂^(k) is transformed by the following formula to obtain the new value y^(k) after scaling by γ^(k) and shifting by β^(k); y^(k) is the feature value after normalization:

y^(k) = γ^(k) · x̂^(k) + β^(k)

where γ^(k) and β^(k) are recurrent-neural-network parameters obtained by network learning.
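A minimal NumPy sketch of this two-step normalization, with gamma and beta fixed for illustration (in the invention they are learned); the small eps term is our addition for numerical safety:

```python
import numpy as np

def standardize(features, gamma, beta, eps=1e-5):
    """Per-dimension standardization followed by the learned affine transform.
    features: (num_samples, dim); gamma, beta: (dim,) scale and shift."""
    mean = features.mean(axis=0)                     # E[x^(k)] over the training set
    var = features.var(axis=0)                       # Var[x^(k)]
    x_hat = (features - mean) / np.sqrt(var + eps)   # standardized value
    return gamma * x_hat + beta                      # y^(k) = gamma^(k) * x_hat^(k) + beta^(k)

feats = 3.0 * np.random.randn(100, 8) + 1.0
y = standardize(feats, gamma=np.ones(8), beta=np.zeros(8))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1 per dimension
```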
In Step 4 above, feeding the normalized high-dimensional feature vectors from Step 3 into the recurrent neural network for feature fusion proceeds as follows: the normalized two-dimensional tensor (3, poolall_num) is fed into the recurrent neural network, whose time-step count is 3 and which contains one hidden layer of 256 neurons.
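A hedged PyTorch sketch of this fusion stage follows; the patent does not name the recurrent cell type, so a vanilla nn.RNN is assumed, and POOLALL_NUM and NUM_CLASSES are invented placeholders. The linear head feeding a softmax corresponds to the Step 5 classifier described next.

```python
import torch
import torch.nn as nn

POOLALL_NUM = 1024    # invented placeholder for the concatenated feature dimension
NUM_CLASSES = 101     # invented placeholder for the number of action categories

class FusionRNN(nn.Module):
    def __init__(self):
        super().__init__()
        # one hidden layer of 256 neurons, unrolled over 3 time steps (one per segment)
        self.rnn = nn.RNN(input_size=POOLALL_NUM, hidden_size=256, batch_first=True)
        self.classifier = nn.Linear(256, NUM_CLASSES)   # linear layer feeding a softmax

    def forward(self, x):                 # x: (batch, 3, POOLALL_NUM)
        _, h_n = self.rnn(x)              # h_n: (1, batch, 256), the fused temporal feature
        return self.classifier(h_n.squeeze(0))

logits = FusionRNN()(torch.randn(4, 3, POOLALL_NUM))
probs = torch.softmax(logits, dim=1)      # multi-class softmax classification (Step 5)
print(probs.shape)                        # (4, NUM_CLASSES)
```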
In Step 5 above, a multi-class Softmax classifier performs linear classification on the output of the recurrent neural network from Step 4 (the classifier head in the sketch above plays this role).
After adopting the above scheme, the beneficial effects of the present invention are as follows:
(1) The C3D network extracts the spatio-temporal information of the video directly, so no motion information needs to be extracted from the video data in advance, and recognition is end to end.
(2) The feature information of different levels extracted by the convolution kernels is concatenated; compared with hand-designed low-level motion features extracted from the video, the low-level spatio-temporal information output by the convolution kernels is more robust and more complete.
(3) Concatenating the features of different levels in the feature extractor yields a high-dimensional feature vector containing information from different levels; this step noticeably improves recognition accuracy.
(4) Normalizing the high-dimensional feature vector accelerates network convergence.
(5) The recurrent neural network performs further temporal fusion on the normalized feature vectors, so the overall network structure can handle long video inputs.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the network structure of the present invention;
Fig. 3 is a detail view of the recurrent neural network.
Detailed description of the embodiments
The technical scheme and beneficial effects of the present invention are described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention provides a human behavior recognition method combining 3D skip-layer connections and a recurrent neural network; the detailed process is embodied in the following steps.
Video segmentation: a video is divided evenly by frame count into 3 parts, and 16 frames are extracted at equal intervals from each part to form a clip; a video is discarded if its total frame count is below 48, and if the total frame count is not divisible by 3, the last few frames are discarded.
After segmentation, a video is represented by the 5-dimensional tensor (3, 16, H, W, 3) and each 16-frame clip by the 4-dimensional tensor (16, H, W, 3), where the leading 3 indicates that the video is divided evenly into 3 parts, 16 is the number of frames extracted from each part, H and W are the height and width of a frame, and the trailing 3 is the number of channels (RGB pictures here).
The training-set videos are divided according to the above principle, after which each video in the training set is represented as a 5-dimensional tensor (3, 16, H, W, 3). Each video is then scaled to size 3 × 16 × 128 × 171 × 3, so each video is represented by the 5-dimensional tensor (3, 16, 128, 171, 3), where 16 is the number of frames per clip and 128, 171, 3 are the height, width, and channel count of each frame. A sketch of this scaling step follows.
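A short illustrative sketch of the scaling step, assuming OpenCV for the per-frame resize (the interpolation method is not specified in the patent):

```python
import cv2
import numpy as np

def scale_video(video, h=128, w=171):
    """Resize every frame of a (3, 16, H, W, 3) video to height h, width w."""
    out = np.empty((video.shape[0], video.shape[1], h, w, 3), dtype=np.float32)
    for s in range(video.shape[0]):                  # segments
        for f in range(video.shape[1]):              # frames per segment
            out[s, f] = cv2.resize(video[s, f], (w, h))  # cv2 expects (width, height)
    return out
```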
For each video, the 5-dimensional tensor (3, 16, 128, 171, 3) is converted into three 4-dimensional tensors of the form (16, 128, 171, 3). All training-set data are processed in this way, so each video consists of three consecutive 4-dimensional tensors (16, 128, 171, 3).
The mean over all 4-dimensional tensors (16, 128, 171, 3) of the training set is computed; the resulting mean is itself a 4-dimensional tensor mean = (16, 128, 171, 3). This mean is subtracted from every clip in the training set so that the pixel values in the training set are distributed around zero; this step reduces the influence of noise on classification.
For each video, the three consecutive mean-subtracted 4-dimensional tensors (16, 128, 171, 3) are converted back into a 5-dimensional tensor (3, 16, 128, 171, 3).
All video data in the training set are converted into this 5-dimensional representation (3, 16, 128, 171, 3), and the mean-subtracted 5-dimensional tensors are cropped to size (3, 16, 112, 112, 3), as sketched below.
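A NumPy sketch of the mean subtraction and cropping described above; the patent does not state whether the 112 × 112 crop is random or centered, so a center crop is shown, and the training clips here are random placeholders:

```python
import numpy as np

def center_crop(clips, size=112):
    """Crop (N, 16, 128, 171, 3) clips to (N, 16, size, size, 3) at the center."""
    _, _, H, W, _ = clips.shape
    top, left = (H - size) // 2, (W - size) // 2
    return clips[:, :, top:top + size, left:left + size, :]

train_clips = np.random.rand(30, 16, 128, 171, 3).astype(np.float32)  # placeholder data
mean = train_clips.mean(axis=0)        # the (16, 128, 171, 3) training-set mean
centered = train_clips - mean          # distributes pixel values around zero
cropped = center_crop(centered)        # (30, 16, 112, 112, 3)
```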
The processed videos are fed into the C3D feature extractor: for each video, its three 16-frame clips are fed in one after another, each as a 4-dimensional tensor (16, 112, 112, 3), and each feeding outputs a pool5num-dimensional vector, so the features of each video are finally represented by the two-dimensional tensor (3, pool5num), where pool5num is the output dimension of pooling layer 5 of the feature extractor.
For each video, the outputs of pooling layers 1, 2, 3, and 5 of the feature extractor are concatenated, as shown in Fig. 2. The concatenated high-dimensional feature is represented by the two-dimensional tensor (3, poolall_num), where poolall_num = pool1num + pool2num + pool3num + pool5num, and pool1num, pool2num, pool3num are the output dimensions of pooling layers 1, 2, and 3, respectively.
The entire training set is passed through the feature extractor and the concatenation operation, yielding the high-dimensional feature training data.
The high-dimensional feature training data are then fed into the recurrent neural network, as shown in Fig. 2. A normalization layer is applied before the recurrent neural network, also shown in Fig. 2; it is added to accelerate network convergence and improve the converged result.
The normalization operation consists of two steps. First, the features are standardized: the mean E[x^(k)] and variance Var[x^(k)] of each dimension of the high-dimensional feature are computed over the entire training set, and each activation input x^(k) is standardized, with x̂^(k) denoting the standardized value:

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])

Second, so as not to reduce the expressive power of the feature vector, x̂^(k) is transformed by the following formula into the new value y^(k) after scaling by γ^(k) and shifting by β^(k); y^(k) is the feature value after normalization:

y^(k) = γ^(k) · x̂^(k) + β^(k)

where γ^(k) and β^(k) are obtained by network learning.
Using backpropagation, the recurrent-neural-network parameters and the parameters γ^(k), β^(k) are trained, yielding the trained network.
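A condensed, illustrative PyTorch training step under the same assumptions as the earlier sketches; nn.BatchNorm1d stands in for the normalization layer and supplies γ^(k) and β^(k), while the feature sizes, labels, and optimizer settings are placeholders:

```python
import torch
import torch.nn as nn

POOLALL_NUM, NUM_CLASSES = 1024, 101        # invented placeholder sizes

bn = nn.BatchNorm1d(POOLALL_NUM)            # learns gamma^(k), beta^(k)
rnn = nn.RNN(POOLALL_NUM, 256, batch_first=True)
head = nn.Linear(256, NUM_CLASSES)
params = list(bn.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=0.01)
loss_fn = nn.CrossEntropyLoss()             # cross-entropy over softmax outputs

x = torch.randn(8, 3, POOLALL_NUM)          # a batch of high-dimensional features
y = torch.randint(0, NUM_CLASSES, (8,))     # placeholder action labels
opt.zero_grad()
x = bn(x.reshape(-1, POOLALL_NUM)).reshape(8, 3, POOLALL_NUM)  # normalize every time step
_, h_n = rnn(x)                             # fuse the 3 time steps
loss = loss_fn(head(h_n.squeeze(0)), y)
loss.backward()                             # backpropagation trains RNN weights and gamma, beta
opt.step()
```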
To predict an input video, the video is divided evenly by frame count into 3 parts and 16 frames are extracted at equal intervals from each part to form a clip; the video is then represented by the 5-dimensional tensor (3, 16, H, W, 3).
The video to be predicted, (3, 16, H, W, 3), is first scaled to size (3, 16, 128, 171, 3); the mean mean = (16, 128, 171, 3) is then subtracted from each 16-frame clip, and each frame is cropped at the picture center, so the processed video to be predicted is represented by the 5-dimensional tensor (3, 16, 112, 112, 3).
The processed video to be predicted, (3, 16, 112, 112, 3), is converted into three 4-dimensional tensors (16, 112, 112, 3) and fed into the network one after another, giving the concatenated high-dimensional feature (3, poolall_num).
The high-dimensional feature (3, poolall_num) of the video to be predicted is fed into the trained BN layer and recurrent neural network to obtain the prediction output.
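Putting the inference steps together, a hedged end-to-end sketch that reuses the helper functions and modules from the earlier sketches (all of them illustrative stand-ins, not the patent's actual implementation):

```python
import numpy as np
import torch

def predict(frames, extractor, bn, rnn, head, mean):
    """frames: (T, H, W, 3) video array; extractor/bn/rnn/head: trained
    modules from the sketches above; mean: the (16, 128, 171, 3) training mean."""
    clips = sample_segments(frames)                    # (3, 16, H, W, 3)
    clips = scale_video(clips)                         # (3, 16, 128, 171, 3)
    clips = center_crop(clips - mean)                  # (3, 16, 112, 112, 3)
    x = torch.from_numpy(np.ascontiguousarray(clips)).float()
    x = x.permute(0, 4, 1, 2, 3)                       # channel-first for Conv3d
    with torch.no_grad():                              # bn and rnn assumed in eval mode
        feats = extractor(x)                           # (3, poolall_num) skip-layer features
        feats = bn(feats).unsqueeze(0)                 # normalize, add batch dimension
        _, h_n = rnn(feats)                            # temporal fusion over the 3 clips
        probs = torch.softmax(head(h_n.squeeze(0)), dim=1)
    return int(probs.argmax(dim=1))                    # predicted action category
```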
The above embodiment merely illustrates the technical idea of the present invention and does not limit the scope of protection of the invention; any change made on the basis of the technical solution in accordance with the technical idea proposed by the invention falls within the scope of protection of the present invention.
Claims (7)
1. A human behavior recognition method combining 3D skip-layer connections and a recurrent neural network, characterized by comprising the following steps:
Step 1: divide each video into N segments and extract L frames from each segment, where N and L are natural numbers;
Step 2: extract spatio-temporal features from the video with a trained 3D convolutional neural network, and concatenate the spatio-temporal features of different levels into a high-dimensional feature vector;
Step 3: normalize the high-dimensional feature vector obtained in Step 2;
Step 4: feed the normalized high-dimensional feature vector from Step 3 into a recurrent neural network for feature fusion;
Step 5: classify the fused features from Step 4 to obtain the action category of the video.
2. The human behavior recognition method combining 3D skip-layer connections and a recurrent neural network according to claim 1, characterized in that in Step 1 a video is discarded if its total frame count is below 48, and if the total frame count is not divisible by L, the last few frames are discarded.
3. The human behavior recognition method combining 3D skip-layer connections and a recurrent neural network according to claim 1, characterized in that in Step 1 dividing each video into N segments and extracting L frames from each segment means: a video is divided evenly by frame count into N = 3 parts, each part containing the same number of frames, and L = 16 frames are extracted at equal intervals from each part.
4. The human behavior recognition method combining 3D skip-layer connections and a recurrent neural network according to claim 2, characterized in that the detailed process of Step 2 is:
transfer learning: the convolution and pooling layers of a trained C3D network serve as the feature extractor; spatio-temporal features are extracted from each 16-frame input obtained in Step 1, yielding an output vector of pool5num dimensions; extracting spatio-temporal features from the whole video gives a result represented by the two-dimensional tensor (3, pool5num), where pool5num is the output dimension of pooling layer 5 of the feature extractor;
skip-layer concatenation: for each 16-frame input, the outputs of pooling layers 1, 2, 3, and 5 of the feature extractor are concatenated into a feature vector of poolall_num dimensions; applying this concatenation to the whole video gives a result represented by the two-dimensional tensor (3, poolall_num), where poolall_num = pool1num + pool2num + pool3num + pool5num, and pool1num, pool2num, pool3num are the output dimensions of pooling layers 1, 2, and 3, respectively.
5. The human behavior recognition method combining 3D skip-layer connections and a recurrent neural network according to claim 1, characterized in that in Step 3 the detailed normalization process is:
the mean E[x^(k)] and variance Var[x^(k)] of each dimension of the high-dimensional feature vectors from Step 2 are computed over the entire training set, and each dimension of a feature vector is then standardized; the standardization formula is:

x̂^(k) = (x^(k) − E[x^(k)]) / √(Var[x^(k)])

where x^(k) denotes an activation value and x̂^(k) the standardized value;
then x̂^(k) is transformed by the following formula to obtain the new value y^(k) after scaling by γ^(k) and shifting by β^(k), and y^(k) is the feature value after normalization:

y^(k) = γ^(k) · x̂^(k) + β^(k)

where γ^(k) and β^(k) are recurrent-neural-network parameters obtained by network learning.
6. The human behavior recognition method combining 3D skip-layer connections and a recurrent neural network according to claim 1, characterized in that in Step 4 feeding the normalized high-dimensional feature vectors from Step 3 into the recurrent neural network for feature fusion means: the normalized two-dimensional tensor (3, poolall_num) is fed into the recurrent neural network, whose time-step count is 3 and which contains one hidden layer of 256 neurons.
7. The human behavior recognition method combining 3D skip-layer connections and a recurrent neural network according to claim 1, characterized in that in Step 5 a multi-class Softmax classifier performs linear classification on the output of the recurrent neural network from Step 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810394571.6A CN108647599B (en) | 2018-04-27 | 2018-04-27 | Human behavior recognition method combining 3D (three-dimensional) jump layer connection and recurrent neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108647599A true CN108647599A (en) | 2018-10-12 |
CN108647599B CN108647599B (en) | 2022-04-15 |
Family
ID=63747937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810394571.6A Active CN108647599B (en) | 2018-04-27 | 2018-04-27 | Human behavior recognition method combining 3D (three-dimensional) jump layer connection and recurrent neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647599B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017211395A1 (en) * | 2016-06-07 | 2017-12-14 | Toyota Motor Europe | Control device, system and method for determining the perceptual load of a visual and dynamic driving scene |
CN106599907A (en) * | 2016-11-29 | 2017-04-26 | 北京航空航天大学 | Multi-feature fusion-based dynamic scene classification method and apparatus |
CN107506712A (en) * | 2017-08-15 | 2017-12-22 | 成都考拉悠然科技有限公司 | Human behavior recognition method based on 3D deep convolutional networks |
CN107811626A (en) * | 2017-09-10 | 2018-03-20 | 天津大学 | Arrhythmia classification method based on a one-dimensional convolutional neural network and the S-transform |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961037A (en) * | 2019-03-20 | 2019-07-02 | 中共中央办公厅电子科技学院(北京电子科技学院) | Abnormal behavior recognition method for examination room video surveillance |
CN109977854A (en) * | 2019-03-25 | 2019-07-05 | 浙江新再灵科技股份有限公司 | Abnormal behavior detection and analysis system for elevator surveillance environments |
CN110839156A (en) * | 2019-11-08 | 2020-02-25 | 北京邮电大学 | Future frame prediction method and model based on video image |
CN111460889A (en) * | 2020-02-27 | 2020-07-28 | 平安科技(深圳)有限公司 | Abnormal behavior identification method, device and equipment based on voice and image characteristics |
CN111460889B (en) * | 2020-02-27 | 2023-10-31 | 平安科技(深圳)有限公司 | Abnormal behavior recognition method, device and equipment based on voice and image characteristics |
CN112449155A (en) * | 2020-10-21 | 2021-03-05 | 苏州怡林城信息科技有限公司 | Video monitoring method and system for protecting privacy of personnel |
CN112863482A (en) * | 2020-12-31 | 2021-05-28 | 思必驰科技股份有限公司 | Speech synthesis method and system with rhythm |
Also Published As
Publication number | Publication date |
---|---|
CN108647599B (en) | 2022-04-15 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |