CN108171148A - Method and system for establishing a lip-reading learning cloud platform - Google Patents
Method and system for establishing a lip-reading learning cloud platform
- Publication number
- CN108171148A CN108171148A CN201711432189.1A CN201711432189A CN108171148A CN 108171148 A CN108171148 A CN 108171148A CN 201711432189 A CN201711432189 A CN 201711432189A CN 108171148 A CN108171148 A CN 108171148A
- Authority
- CN
- China
- Prior art keywords
- lip reading
- cloud platform
- data
- lip
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The present invention provides a method and system for establishing a lip-reading learning cloud platform, comprising: acquiring lip-reading input, the lip reading comprising lip and tongue movements and a corresponding sentence; extracting the lip reading, splitting the lip and tongue movements into image data and the sentence into voice data, and transmitting the image data and voice data to the lip-reading learning cloud platform for data training; storing the trained data on a master node designated by the lip-reading learning cloud platform to form a training database; and building the distributed storage of the lip-reading learning cloud platform, organizing the training-database data onto other nodes of the platform as needed. By improving the accuracy with which the lip-reading model extracts sentences, the present invention improves the efficiency of lip-reading learning and promotes its development.
Description
Technical field
Embodiments of the present invention relate to the field of communication technology, and in particular to a method and system for establishing a lip-reading learning cloud platform.
Background art
Lip reading plays a crucial role in human communication and speech understanding: when the video of a person saying one phoneme is dubbed with a different phoneme spoken by someone else, the listener perceives a third, different phoneme.
In implementing the present invention, the inventors found that the prior art has at least the following problems:
Lip reading is a notoriously difficult task for humans. Apart from the lips, and sometimes the tongue and teeth, most lip-reading cues are ambiguous and hard to distinguish without linguistic context.
Automating lip reading is therefore an important goal. Machine lip readers have great practical potential, for example in improved hearing aids, silent dictation in public spaces, private conversation, speech recognition in noisy environments, biometric identification, and the processing of silent films. Machine lip reading is difficult because spatiotemporal features, such as position and motion, must be extracted from video. Although deep learning methods attempt to extract these features end to end, existing work only performs classification of single words rather than sentence-level sequence prediction.
Most current lip-reading learning relies on manual offline training and on online learning software. Human language, however, is affected by regional variation and differing nationalities, and every region has its dialects. Manual offline training is based on the official standard language, so its applicability in individual regions falls far short of expectations and learning does not reach the intended effect. Online lip-reading learning software likewise considers only the official standard language and ignores local dialects. Moreover, extraction is performed word by word within a sentence rather than predicted at the level of the entire sentence; this is a serious drawback, and extraction accuracy falls short of requirements.
It should be noted that the above introduction of the technical background is intended only to facilitate a clear and complete explanation of the technical solution of the present invention and to aid the understanding of those skilled in the art. It should not be assumed that these solutions are known to those skilled in the art merely because they are discussed in the background section of the present invention.
Summary of the invention
In view of the above problems, embodiments of the present invention aim to provide a method and system for establishing a lip-reading learning cloud platform that improve the accuracy with which the lip-reading model extracts sentences, thereby improving the efficiency of lip-reading learning and promoting its development.
To achieve the above object, an embodiment of the present invention provides a method for establishing a lip-reading learning cloud platform, comprising: acquiring lip-reading input, the lip reading comprising lip and tongue movements and a corresponding sentence; extracting the lip reading, splitting the lip and tongue movements into image data and the sentence into voice data, and transmitting the image data and voice data to the lip-reading learning cloud platform for data training; storing the trained data on a master node designated by the lip-reading learning cloud platform to form a training database; and building the distributed storage of the platform, organizing the training-database data onto other nodes of the platform as needed.
Further, the method also comprises: building a distributed-system hardware platform with at least two nodes, each node comprising a central processing unit (CPU) and a graphics processing unit (GPU); using the gRPC library for low-level inter-process communication; and using the tools provided by TensorFlow to define the cluster's cluster_spec and configure the multi-machine, multi-GPU setup.
Further, the lip reading is extracted, specifically: the lip reading is extracted through TensorFlow.
Further, transmitting the image data and voice data to the lip-reading learning cloud platform for data training and forming the training database comprises: splitting the lip and tongue movements into image data and the sentence into voice data; partitioning the data according to the data-association model's partitioning algorithm, packaging the voice data and image data into training tasks, and assigning them to different worker nodes; on each worker node, the CPU dispatches tasks to multiple GPUs, each GPU sends its training data to the CPU on completing a training task, and the CPU averages the training data and updates the parameters; when a single node's training task is complete, it broadcasts its data to the other nodes of the lip-reading learning cloud platform and waits for their training data; and after all nodes have completed their computing tasks, the final training data is stored on the designated master node, forming the training database.
Further, the neural-network architecture of the lip-reading learning cloud platform is a convolutional neural network with 128 convolution kernels and 16 layers, whose names and roles are defined as: init, network initialization; conv1, convolution with rectified-linear activation; pool1, max pooling; norm1, local response normalization; conv2, convolution with rectified-linear activation; pool2, max pooling; som, self-organizing-structure input layer; som2, self-organizing-structure output layer; norm2, local response normalization; hand1, manual network perturbation based on intermediate results; conv3, convolution with rectified-linear activation; pool3, max pooling; re, recurrent residual computation; local3, fully connected layer with rectified-linear activation; local4, fully connected layer with rectified-linear activation; and softmax_linear, a linear transform that outputs logits.
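For reference, the 16 named layers above can be summarized as an ordered list. This is only an illustrative inventory of the names and roles given in the text, not an executable network definition.

```python
# Illustrative summary of the 16-layer stack described in the patent text.
# The (name, role) pairs follow the definitions above; nothing here is executable
# network code, it only records the ordering.
LAYERS = [
    ("init", "network initialization"),
    ("conv1", "convolution with rectified-linear activation"),
    ("pool1", "max pooling"),
    ("norm1", "local response normalization"),
    ("conv2", "convolution with rectified-linear activation"),
    ("pool2", "max pooling"),
    ("som", "self-organizing-structure input layer"),
    ("som2", "self-organizing-structure output layer"),
    ("norm2", "local response normalization"),
    ("hand1", "manual network perturbation based on intermediate results"),
    ("conv3", "convolution with rectified-linear activation"),
    ("pool3", "max pooling"),
    ("re", "recurrent residual computation"),
    ("local3", "fully connected layer with rectified-linear activation"),
    ("local4", "fully connected layer with rectified-linear activation"),
    ("softmax_linear", "linear transform that outputs logits"),
]
```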
Further, the convolutional neural network includes a feedback self-oscillation mechanism that allows information to be passed across layers; specifically, the pool3 layer passes residual information back to the hand1 layer.
Further, the convolutional neural network includes a recursive structure at the re layer; specifically, an Elman network structure is used, with the hand1, conv3, pool3, and re layers acting as a hidden layer that performs recursive feedback.
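As an illustration of the Elman-style recursion, one recurrent step and its unrolling over a sequence might look like the following. The scalar weights and tanh nonlinearity are assumptions for a minimal sketch, not the patent's actual layer.

```python
import math

def elman_step(x_t, h_prev, w_xh=1.0, w_hh=0.5, b=0.0):
    """One Elman recurrent step (scalar sketch): the new hidden state
    depends on the current input and the previous hidden state."""
    return math.tanh(w_xh * x_t + w_hh * h_prev + b)

def elman_run(xs, h0=0.0):
    """Unroll the recurrence over an input sequence, returning all hidden states."""
    h, states = h0, []
    for x in xs:
        h = elman_step(x, h)
        states.append(h)
    return states
```

In the patent's architecture the hidden state would be the joint activity of the hand1, conv3, pool3, and re layers rather than a scalar; the sketch only shows the feedback pattern.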
Further, the convolutional neural network includes an SOM network structure: the som and som2 layers are the self-organizing-structure input and output layers respectively; the som-layer neurons are arranged as a matrix in two-dimensional space, each neuron has a weight vector, and after the som layer receives an input vector the som2-layer neurons are activated and adjusted according to the self-organizing-network training method.
To achieve the above object, an embodiment of the present invention also provides a lip-reading learning cloud platform system, comprising: an acquisition unit for acquiring lip-reading input, the lip reading comprising lip and tongue movements and a corresponding sentence; an extraction unit for splitting the lip and tongue movements into image data and the sentence into voice data, and transmitting the image data and voice data to worker nodes of the lip-reading learning cloud platform for data training; a master-node unit for storing the trained data on the master node designated by the platform to form a training database; and a building unit for constructing the distributed storage of the platform and organizing the training-database data onto the platform's other nodes as needed.
In summary, the method and system for establishing a lip-reading learning cloud platform provided by embodiments of the present invention use a TensorFlow lip-reading model to extract whole sentences, which is more accurate than the previous word-by-word extraction. Meanwhile, the cloud-platform design lets users learn at any time and makes it easy to exchange with other learners.
Description of the drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the method for establishing a lip-reading learning cloud platform provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the neural-network structure between the hand1 and re layers provided by an embodiment of the present invention.
Specific embodiment
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
Embodiments of the present invention build the lip-reading learning cloud platform on distributed TensorFlow. TensorFlow is the second-generation machine-learning system developed by Google on the basis of DistBelief, and its name comes from its own operating principle: a tensor is an N-dimensional array, flow denotes computation on a dataflow graph, and TensorFlow describes tensors flowing from one end of the graph to the other. TensorFlow is a system that carries complex data structures into an artificial neural network for analysis and processing.
TensorFlow expresses high-level machine-learning computations, greatly simplifies the first-generation system, and offers better flexibility and scalability. A highlight of TensorFlow is its support for distributed computation on heterogeneous devices: models run automatically on every platform, from mobile phones and single CPUs/GPUs to distributed systems of hundreds or thousands of GPU cards.
TensorFlow can build a lip-reading model at the sentence level: an end-to-end, speaker-independent deep model that learns spatiotemporal visual features and a sequence model. On the GRID corpus, the sentence-level model built on TensorFlow achieves 93.4% accuracy, surpassing experienced human lip readers and the previous best accuracy of 79.6%. A TensorFlow-based lip-reading model can therefore effectively improve reading accuracy and thus help improve the efficiency of lip-reading learning.
The model is trained end to end to make speaker-independent sentence-level predictions. It operates at the character level, using a spatiotemporal convolutional neural network (STCNN), LSTMs, and connectionist temporal classification (CTC) loss.
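CTC maps frame-level label paths onto shorter label sequences by first merging repeated labels and then removing blanks. A minimal sketch of that collapsing rule follows; the integer labels and the choice of 0 as the blank index are illustrative assumptions.

```python
def ctc_collapse(path, blank=0):
    """Collapse a CTC alignment: merge consecutive repeats, then drop blanks.

    This is the many-to-one mapping at the heart of CTC decoding; the full
    CTC loss sums the probabilities of all paths that collapse to a target.
    """
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

For example, the frame path `[1, 1, 0, 1, 2, 2, 0]` collapses to `[1, 1, 2]`: the blank between the two 1s keeps them from merging, which is how CTC can emit repeated characters.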
Experiments on the public sentence-level dataset GRID corpus (Cooke et al., 2006) show that the sentence-level model built with TensorFlow reaches 93.4% word accuracy. By comparison, the previous best speaker-independent word-classification result on this task is 79.6%. The model's performance was also compared with that of hearing-impaired people who can lip-read: on average, on the same sentences, it performs 1.78 times better.
To implement the method of the present invention, a distributed-system hardware platform must be built, for example with two nodes, which requires:
1. each node with an Intel Core i5-7400 CPU and a GeForce GTX 1050 GPU;
2. a switch with 100 G bandwidth.
Low-level inter-process communication uses the gRPC library, and the tools provided by TensorFlow define the cluster's cluster_spec and configure the multi-machine, multi-GPU setup.
These are only examples and are not limiting.
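Concretely, the two-node cluster just described might be expressed as a cluster_spec of the following shape. The host names, ports, and job split are illustrative assumptions, and the TensorFlow calls are kept in comments so the sketch stays dependency-free.

```python
# Hypothetical cluster layout for two machines: one parameter-server task on
# the first node's CPU and one worker task per machine driving the GPUs.
cluster_spec = {
    "ps": ["node1.example.com:2222"],
    "worker": ["node1.example.com:2223", "node2.example.com:2222"],
}

# With TensorFlow installed, this dict would typically be wrapped as:
#   cluster = tf.train.ClusterSpec(cluster_spec)
#   server = tf.train.Server(cluster, job_name="worker", task_index=0)
# where the server processes communicate over gRPC, as the text describes.
n_workers = len(cluster_spec["worker"])
```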
An embodiment of the present invention provides a method for establishing a lip-reading learning cloud platform. Referring to Fig. 1, the method can be roughly divided into a background training stage and a service-provision stage, and may specifically comprise the following steps:
Step S1: acquire lip-reading input; the lip reading comprises lip and tongue movements and a corresponding sentence.
Step S2: extract the lip reading through TensorFlow, splitting the lip and tongue movements into image data and the sentence into voice data, and transmit the image data and voice data to the lip-reading learning cloud platform for data training.
In this embodiment, the data flow during the background training stage is as follows. The data is partitioned according to the data-association model's partitioning algorithm, and the voice data and image data are packaged into tasks assigned to different worker nodes. On each worker node, the CPU (central processing unit) dispatches the tasks to multiple GPUs (graphics processing units); each time a GPU completes a computing task, it sends the data to the CPU, which averages the data and updates the parameters. When a single node's task is complete, it broadcasts its training data to the other nodes of the lip-reading learning cloud platform and waits for their data. Once all nodes have completed their computing tasks, the final training data is stored by the designated master node.
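The CPU-side step described above, averaging the per-GPU results and updating the parameters, can be sketched in plain Python. The element-wise average and the plain-SGD update with an assumed learning rate are illustrative choices, since the text only says the CPU computes the average and updates the parameters.

```python
def average_updates(updates):
    """CPU-side step: element-wise average of the per-GPU updates.

    `updates` is a list with one flat parameter-update vector per GPU.
    """
    n = len(updates)
    return [sum(vals) / n for vals in zip(*updates)]

def apply_update(params, avg_update, lr=0.1):
    """Apply the averaged update as a plain SGD step (assumed update rule)."""
    return [p - lr * g for p, g in zip(params, avg_update)]
```

A node would call `average_updates` on the gradients gathered from its GPUs, then `apply_update` on the shared parameters before broadcasting, matching the data flow described above.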
The neural-network architecture of the lip-reading learning cloud platform is a convolutional neural network with 128 convolution kernels and 16 layers; the layer names and descriptions are those defined above, from init through softmax_linear.
It is worth noting that the neural-network structure between the hand1 and re layers is shown in Fig. 2. The embodiments of the present invention make at least the following three important improvements to the network structure:
First, a feedback self-oscillation mechanism is added to the self-organization rule, allowing information to be passed across layers: the pool3 layer passes residual information back to the hand1 layer.
Second, there is a recursive structure at the re layer: using an Elman network structure, the hand1, conv3, pool3, and re layers act as a hidden layer that performs recursive feedback.
Third, a partial SOM network structure: the som and som2 layers are the self-organizing-structure input and output layers respectively. The som-layer neurons are arranged as a matrix in two-dimensional space, and each neuron has a weight vector; after the som layer receives an input vector, the som2-layer neurons are activated and adjusted according to the self-organizing-network training method.
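As a rough sketch of the self-organizing-map behaviour in the third improvement, the following pure-Python fragment finds the winning neuron for an input vector and moves its weight vector toward the input. The Euclidean metric, the learning rate, and the absence of a neighbourhood function are simplifying assumptions.

```python
def som_bmu(weights, x):
    """Index of the best-matching unit: the neuron whose weight vector
    has the smallest squared Euclidean distance to the input x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))

def som_update(weights, x, bmu, lr=0.5):
    """Move the winning neuron's weight vector toward the input
    (simplified SOM update with no neighbourhood function)."""
    weights[bmu] = [wi + lr * (xi - wi) for wi, xi in zip(weights[bmu], x)]
    return weights
```

In a full SOM the neurons would sit on the two-dimensional grid the text describes and the update would also pull in the winner's neighbours with a decaying radius; this sketch keeps only the winner-take-all core.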
Step S3: store the trained data on the master node designated by the lip-reading learning cloud platform to form the training database.
In this embodiment, once all nodes have completed their computing tasks, the final training data is stored by the designated master node, forming the training database.
Step S4: build the distributed storage of the lip-reading learning cloud platform using Hadoop, and organize the training-database data onto the platform's other nodes as needed.
In this embodiment, during the service-provision stage, the trained data is first dumped onto the master node; Hadoop is then used to build the distributed storage, and the data is organized efficiently across several master and slave nodes.
An embodiment of the present invention also provides a lip-reading learning cloud platform system, comprising:
an acquisition unit for acquiring lip-reading input, the lip reading comprising lip and tongue movements and a corresponding sentence;
an extraction unit for splitting the lip and tongue movements into image data and the sentence into voice data, and transmitting the image data and voice data to worker nodes of the lip-reading learning cloud platform for data training;
a master-node unit for storing the trained data on the master node designated by the platform to form a training database; and
a building unit for constructing the distributed storage of the platform and organizing the training-database data onto the platform's other nodes as needed.
Specifically, the extraction unit extracts the lip reading through TensorFlow; splits the lip and tongue movements into image data and the sentence into voice data; partitions the data according to the data-association model's partitioning algorithm, packages the voice data and image data into training tasks, and assigns them to different worker nodes. On each worker node the CPU dispatches tasks to multiple GPUs; each time a GPU completes a training task, it sends the training data to the CPU, which averages it and updates the parameters. When a single node's training task is complete, it broadcasts its data to the other nodes of the lip-reading learning cloud platform and waits for their training data.
The master-node unit stores the training data after all nodes have completed their computing tasks, forming the training database.
The technical details of the lip-reading learning cloud platform system are similar to those of the method for establishing the platform described above and are therefore not repeated here.
In summary, the method and system for establishing a lip-reading learning cloud platform provided by embodiments of the present invention use a TensorFlow lip-reading model to extract whole sentences, which is more accurate than the previous word-by-word extraction. Meanwhile, the cloud-platform design lets users learn at any time and makes it easy to exchange with other learners.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others.
Finally, it should be noted that the above description of the various embodiments of the present invention is provided for those skilled in the art. It is not intended to be exhaustive or to limit the invention to the single disclosed embodiment. As noted above, numerous alternatives and variations of the present invention will be apparent to those of ordinary skill in the art. Therefore, although some alternative embodiments have been discussed specifically, other embodiments will be apparent to, or relatively easily derived by, those skilled in the art. The present invention is intended to cover all alternatives, modifications, and variations of the invention discussed herein, as well as other embodiments falling within the spirit and scope of the above application.
Claims (10)
1. A method for establishing a lip-reading learning cloud platform, characterized by comprising:
acquiring lip-reading input, the lip reading comprising lip and tongue movements and a corresponding sentence;
extracting the lip reading, splitting the lip and tongue movements into image data and the sentence into voice data, and transmitting the image data and voice data to the lip-reading learning cloud platform for data training;
storing the trained data on a master node designated by the lip-reading learning cloud platform to form a training database; and
building the distributed storage of the lip-reading learning cloud platform, and organizing the training-database data onto other nodes of the platform as needed.
2. The method for establishing a lip-reading learning cloud platform according to claim 1, characterized in that the method further comprises:
building a distributed-system hardware platform with at least two nodes, each node comprising a central processing unit (CPU) and a graphics processing unit (GPU); and
using the gRPC library for low-level inter-process communication, and using the tools provided by TensorFlow to define the cluster's cluster_spec and configure the multi-machine, multi-GPU setup.
3. The method for establishing a lip-reading learning cloud platform according to claim 2, characterized in that extracting the lip reading is specifically:
extracting the lip reading through TensorFlow.
4. The method for establishing a lip-reading learning cloud platform according to claim 3, characterized in that transmitting the image data and voice data to the lip-reading learning cloud platform for data training and forming the training database comprises:
splitting the lip and tongue movements into image data and the sentence into voice data;
partitioning the data according to the data-association model's partitioning algorithm, packaging the voice data and image data into training tasks, and assigning them to different worker nodes;
on each worker node, dispatching the tasks from the CPU to multiple GPUs, each GPU sending its training data to the CPU on completing a training task, and the CPU averaging the training data and updating the parameters;
when a single node's training task is complete, broadcasting its data to the other nodes of the lip-reading learning cloud platform and waiting for the training data of the other nodes; and
after all nodes have completed their computing tasks, storing the final training data on the designated master node to form the training database.
5. The method for establishing a lip-reading learning cloud platform according to claim 4, characterized in that the neural-network architecture of the lip-reading learning cloud platform is a convolutional neural network with 128 convolution kernels and 16 layers, whose names and roles are defined as: init, network initialization; conv1, convolution with rectified-linear activation; pool1, max pooling; norm1, local response normalization; conv2, convolution with rectified-linear activation; pool2, max pooling; som, self-organizing-structure input layer; som2, self-organizing-structure output layer; norm2, local response normalization; hand1, manual network perturbation based on intermediate results; conv3, convolution with rectified-linear activation; pool3, max pooling; re, recurrent residual computation; local3, fully connected layer with rectified-linear activation; local4, fully connected layer with rectified-linear activation; and softmax_linear, a linear transform that outputs logits.
6. The method for establishing a lip-reading learning cloud platform according to claim 5, characterized in that the convolutional neural network includes a feedback self-oscillation mechanism that allows information to be passed across layers; specifically, the pool3 layer passes residual information back to the hand1 layer.
7. The method for establishing a lip-reading learning cloud platform according to claim 5, characterized in that the convolutional neural network includes a recursive structure at the re layer; specifically, an Elman network structure is used, with the hand1, conv3, pool3, and re layers acting as a hidden layer that performs recursive feedback.
8. The method for establishing a lip-reading learning cloud platform according to claim 5, characterized in that the convolutional neural network includes an SOM network structure; specifically, the som and som2 layers are the self-organizing-structure input and output layers respectively, the som-layer neurons are arranged as a matrix in two-dimensional space, each neuron has a weight vector, and after the som layer receives an input vector the som2-layer neurons are activated and adjusted according to the self-organizing-network training method.
9. A lip-reading learning cloud platform system, characterized by comprising:
an acquisition unit for acquiring lip-reading input, the lip reading comprising lip and tongue movements and a corresponding sentence;
an extraction unit for splitting the lip and tongue movements into image data and the sentence into voice data, and transmitting the image data and voice data to worker nodes of the lip-reading learning cloud platform for data training;
a master-node unit for storing the trained data on the master node designated by the lip-reading learning cloud platform to form a training database; and
a building unit for constructing the distributed storage of the lip-reading learning cloud platform and organizing the training-database data onto the platform's other nodes as needed.
10. The lip-reading learning cloud platform system according to claim 9, characterized in that the extraction unit extracts the lip reading through TensorFlow; splits the lip and tongue movements into image data and the sentence into voice data; partitions the data according to the data-association model's partitioning algorithm, packages the voice data and image data into training tasks, and assigns them to different worker nodes; on each worker node the CPU dispatches the tasks to multiple GPUs, each GPU sends its training data to the CPU on completing a training task, and the CPU averages the training data and updates the parameters; when a single node's training task is complete, it broadcasts its data to the other nodes of the lip-reading learning cloud platform and waits for the training data of the other nodes; and
the master-node unit stores the training data after all nodes have completed their computing tasks, forming the training database.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201711432189.1A | 2017-12-26 | 2017-12-26 | Method and system for establishing a lip-reading learning cloud platform |
Publications (1)

| Publication Number | Publication Date |
|---|---|
| CN108171148A | 2018-06-15 |
Family
ID=62520954
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201711432189.1A | Method and system for establishing a lip-reading learning cloud platform | 2017-12-26 | 2017-12-26 |

Country Status (1)

| Country | Link |
|---|---|
| CN | CN108171148A (en) |
-
2017
- 2017-12-26 CN CN201711432189.1A patent/CN108171148A/en not_active Withdrawn
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109524006A (en) * | 2018-10-17 | 2019-03-26 | 天津大学 | A Mandarin Chinese lip-reading recognition method based on deep learning |
CN109524006B (en) * | 2018-10-17 | 2023-01-24 | 天津大学 | Mandarin Chinese lip-reading recognition method based on deep learning |
CN111988652A (en) * | 2019-05-23 | 2020-11-24 | 北京地平线机器人技术研发有限公司 | Method and device for extracting lip-reading training data |
CN111988652B (en) * | 2019-05-23 | 2022-06-03 | 北京地平线机器人技术研发有限公司 | Method and device for extracting lip-reading training data |
CN113033098A (en) * | 2021-03-26 | 2021-06-25 | 山东科技大学 | Ocean target detection deep learning model training method based on AdaRW algorithm |
CN113033098B (en) * | 2021-03-26 | 2022-05-17 | 山东科技大学 | Ocean target detection deep learning model training method based on AdaRW algorithm |
CN113239902A (en) * | 2021-07-08 | 2021-08-10 | 中国人民解放军国防科技大学 | Lip-reading recognition method and device based on a dual-discriminator generative adversarial network |
CN113239902B (en) * | 2021-07-08 | 2021-09-28 | 中国人民解放军国防科技大学 | Lip-reading recognition method and device based on a dual-discriminator generative adversarial network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7337953B2 (en) | Speech recognition method and device, neural network training method and device, and computer program | |
EP4047598B1 (en) | Voice matching method and related device | |
US11315570B2 (en) | Machine learning-based speech-to-text transcription cloud intermediary | |
Sun et al. | Speech emotion recognition based on DNN-decision tree SVM model | |
JP6873333B2 (en) | Method using voice recognition system and voice recognition system | |
CN108171148A (en) | Method and system for establishing a lip-reading learning cloud platform | |
Nakkiran et al. | Compressing deep neural networks using a rank-constrained topology | |
Kamaruddin et al. | Cultural dependency analysis for understanding speech emotion | |
CN108172218B (en) | Voice modeling method and device | |
CN109887484A (en) | Speech recognition and speech synthesis method and device based on paired-associate learning | |
CN106683661A (en) | Role separation method and device based on voice | |
CN105390141A (en) | Sound conversion method and sound conversion device | |
CN111696572B (en) | Voice separation device, method and medium | |
US20190259384A1 (en) | Systems and methods for universal always-on multimodal identification of people and things | |
Ault et al. | On speech recognition algorithms | |
CN111192659A (en) | Pre-training method for depression detection and depression detection method and device | |
WO2021203880A1 (en) | Speech enhancement method, neural network training method, and related device | |
CN108877812B (en) | Voiceprint recognition method and device and storage medium | |
Milde et al. | Using representation learning and out-of-domain data for a paralinguistic speech task. | |
Jansson | Single-word speech recognition with convolutional neural networks on raw waveforms | |
US11100940B2 (en) | Training a voice morphing apparatus | |
CN113555032A (en) | Multi-speaker scene recognition and network training method and device | |
CN115393933A (en) | Video face emotion recognition method based on frame attention mechanism | |
Wu et al. | A sequential contrastive learning framework for robust dysarthric speech recognition | |
Kumar et al. | Designing neural speaker embeddings with meta learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
TA01 | Transfer of patent application right | Effective date of registration: 2020-11-11. Address after: No. 2-3167, Zone A, Nonggang City, No. 2388 Donghuan Avenue, Hongjia Street, Jiaojiang District, Taizhou City, Zhejiang Province, 318015. Applicant after: Taizhou Jiji Intellectual Property Operation Co.,Ltd. Address before: No. 3666 Sixian Road, Songjiang District, Shanghai, 201616. Applicant before: Phicomm (Shanghai) Co.,Ltd. |
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 2018-06-15 |