CN108986801A - Human-computer interaction method, apparatus and human-computer interaction terminal - Google Patents
Human-computer interaction method, apparatus and human-computer interaction terminal
- Publication number
- CN108986801A CN108986801A CN201710408396.7A CN201710408396A CN108986801A CN 108986801 A CN108986801 A CN 108986801A CN 201710408396 A CN201710408396 A CN 201710408396A CN 108986801 A CN108986801 A CN 108986801A
- Authority
- CN
- China
- Prior art keywords
- gesture
- feature
- text
- target
- speech samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
- G10L15/19—Grammatical context, e.g. disambiguation of the recognition hypotheses based on word sequence rules
- G10L15/197—Probabilistic grammars, e.g. word n-grams
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
An embodiment of the present invention provides a human-computer interaction method, apparatus and human-computer interaction terminal. The method comprises: obtaining control information conveyed by a user, the control information including voice information; extracting text features of the voice information; determining the text feature vectors corresponding to the text features; determining, according to a pre-trained speech classification model, the speech sample matching the text feature vectors, the speech classification model representing the belonging probability of a text feature vector to a corresponding speech sample; taking the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information; and generating a target control instruction according to the voice control instruction. The embodiment of the present invention can improve the naturalness and intelligence of human-computer interaction and lower the user threshold of human-computer interaction, thereby providing strong support for the popularization of human-computer interaction.
Description
Technical field
The present invention relates to the field of human-computer interaction technology, and in particular to a human-computer interaction method, apparatus and human-computer interaction terminal.
Background art
Human-computer interaction refers to the technology by which a user and a machine communicate with each other, so that the machine understands the user's intent. Specifically, through human-computer interaction, the user can convey control information to the machine, so that the machine completes the work the user intends.
Human-computer interaction is widely applied in many fields, including mobile phone control and automatic driving. In particular, with the development of robot (e.g., service robot) technology, how to better apply human-computer interaction technology to robot control has become a key point in advancing robot technology.
The inventors of the present invention found that an urgent problem for current human-computer interaction technology is how to improve the naturalness and intelligence of human-computer interaction, so that the user threshold of human-computer interaction is lowered and the technology can be widely adopted.
Summary of the invention
In view of this, embodiments of the present invention provide a human-computer interaction method, apparatus and human-computer interaction terminal, so as to improve the naturalness and intelligence of human-computer interaction and lower the user threshold of human-computer interaction, thereby providing strong support for the popularization of human-computer interaction.
To achieve the above object, the embodiment of the present invention provides the following technical solutions:
A human-computer interaction method, comprising:
obtaining control information conveyed by a user, the control information including voice information;
extracting text features of the voice information;
determining the text feature vectors corresponding to the text features;
determining, according to a pre-trained speech classification model, the speech sample matching the text feature vectors; the speech classification model represents the belonging probability of a text feature vector to a corresponding speech sample;
taking the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
generating a target control instruction according to the voice control instruction.
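The claimed steps can be illustrated end to end with a minimal sketch. Everything below (the utterances, the keyword features, the command names, and the keyword-overlap scoring standing in for the trained belonging-probability model) is a hypothetical illustration, not the patent's implementation.

```python
def extract_text_features(utterance):
    # Stand-in for speech-to-text plus text feature (keyword) extraction.
    return utterance.lower().split()

# Toy stand-in for the pre-trained speech classification model: known
# speech samples mapped to their voice control instructions.
SAMPLES = {
    "move forward": "CMD_FORWARD",
    "turn left": "CMD_LEFT",
}

def match_speech_sample(features):
    # Keyword overlap serves as a crude proxy for belonging probability.
    def overlap(sample):
        return len(set(features) & set(sample.split()))
    best = max(SAMPLES, key=overlap)
    return best if overlap(best) > 0 else None

def target_control_instruction(utterance):
    features = extract_text_features(utterance)
    sample = match_speech_sample(features)
    return SAMPLES.get(sample)  # None when no sample matches

print(target_control_instruction("please move forward now"))  # CMD_FORWARD
```

In this sketch the voice control instruction of the matched sample directly becomes the target control instruction; in the claimed method the target instruction is generated from it in a further step.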
An embodiment of the present invention also provides a human-computer interaction apparatus, comprising:
a control information obtaining module, configured to obtain control information conveyed by a user, the control information including voice information;
a text feature extraction module, configured to extract text features of the voice information;
a text feature vector determining module, configured to determine the text feature vectors corresponding to the text features;
a speech sample determining module, configured to determine, according to a pre-trained speech classification model, the speech sample matching the text feature vectors; the speech classification model represents the belonging probability of a text feature vector to a corresponding speech sample;
a voice instruction determining module, configured to take the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
a target instruction generating module, configured to generate a target control instruction according to the voice control instruction.
An embodiment of the present invention also provides a human-computer interaction terminal, comprising at least one memory and at least one processor; the memory stores a program, and the processor calls the program; the program is configured to:
obtain control information conveyed by a user, the control information including voice information;
extract the text features of the voice information;
determine the text feature vectors corresponding to the text features;
determine, according to a pre-trained speech classification model, the speech sample matching the text feature vectors; the speech classification model represents the belonging probability of a text feature vector to a corresponding speech sample;
take the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
generate a target control instruction according to the voice control instruction.
Based on the above technical solution, the human-computer interaction method provided by the embodiments of the present invention can perform text feature extraction on the voice information in the control information conveyed by the user and determine the corresponding text feature vectors; the speech sample matching the text feature vectors can then be determined according to the pre-trained speech classification model; the voice control instruction corresponding to the determined speech sample is taken as the voice control instruction of the voice information, and a target control instruction is generated from it, realizing the generation of target control instructions for the machine during human-computer interaction.
Since the pre-trained speech classification model can accurately define the probability that each text feature vector belongs to each possibly intended speech sample, the correspondence between speech samples and text feature vectors becomes more accurate. Therefore, with the embodiments of the present invention, a user can conduct human-computer interaction in a way similar to person-to-person communication: after the user conveys natural voice information to the human-computer interaction terminal, the terminal can use the speech classification model to accurately identify the speech sample matching the conveyed voice information, and thereby identify, through the matched speech sample, the voice control instruction intended by that voice information. With the embodiments of the present invention, the user can convey voice information more naturally, and the human-computer interaction terminal can accurately match the speech sample of the user's voice information through the speech classification model, realizing accurate determination of the intended voice control instruction; this improves the naturalness and intelligence of human-computer interaction, lowers the user's threshold for human-computer interaction, and provides strong support for its popularization.
Detailed description of the invention
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a structural block diagram of the human-computer interaction system provided by an embodiment of the present invention;
Fig. 2 is another structural block diagram of the human-computer interaction system provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of a human-computer interaction terminal;
Fig. 4 is a flowchart of the method for constructing the speech classification model provided by an embodiment of the present invention;
Fig. 5 is a flowchart of the human-computer interaction method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of an example of human-computer interaction;
Fig. 7 is another flowchart of the human-computer interaction method provided by an embodiment of the present invention;
Fig. 8 is a flowchart of a method for processing gesture posture features with an improved particle filter;
Fig. 9 is a flowchart of the target object recognition method provided by an embodiment of the present invention;
Fig. 10 is a structural block diagram of the human-computer interaction apparatus provided by an embodiment of the present invention;
Fig. 11 is another structural block diagram of the human-computer interaction apparatus provided by an embodiment of the present invention;
Fig. 12 is yet another structural block diagram of the human-computer interaction apparatus provided by an embodiment of the present invention;
Fig. 13 is a further structural block diagram of the human-computer interaction apparatus provided by an embodiment of the present invention;
Fig. 14 is still another structural block diagram of the human-computer interaction apparatus provided by an embodiment of the present invention;
Fig. 15 is a further structural block diagram of the human-computer interaction apparatus provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The human-computer interaction method provided by the embodiments of the present invention can be applied to robot control, mobile phone control, automatic driving, and the like. For ease of illustration, the method will mainly be introduced below in the context of service robot control; of course, the principles of using the method for mobile phone control, automatic driving, etc. are consistent with those for service robot control, and the descriptions can be cross-referenced.
It should be noted that a service robot is one kind of robot; service robots can be divided into professional-field service robots and personal/home service robots. Service robots have a wide range of applications and are mainly engaged in work such as maintenance, repair, transport, cleaning, security, rescue and monitoring.
Optionally, Fig. 1 is an optional structural block diagram of the human-computer interaction system provided by an embodiment of the present invention. Referring to Fig. 1, the human-computer interaction system may include a human-computer interaction terminal 10 and a service robot 11; the human-computer interaction terminal 10 and the service robot 11 can exchange information through the Internet.
Based on the human-computer interaction system shown in Fig. 1, the user can convey control information to the human-computer interaction terminal; after the terminal understands the control instruction corresponding to the control information conveyed by the user, it can transmit the control instruction to the service robot through the Internet, and the service robot executes the control instruction to complete the work the user intends.
Optionally, the way the user conveys control information to the human-computer interaction terminal may be voice, or voice combined with gestures, etc.
Further, the service robot can transmit its status information and/or vision-based environmental information to the human-computer interaction terminal through the Internet, so that the terminal shows the user the robot's status information and/or the environmental information around the service robot (which can be displayed on the terminal's display screen), allowing the user to convey control information more effectively.
The human-computer interaction system shown in Fig. 1 can transmit information between the human-computer interaction terminal and the service robot through the Internet, realizing the user's remote control of the service robot. Of course, Fig. 1 shows only one optional structure of the human-computer interaction system; optionally, the embodiment of the present invention does not exclude the case where the human-computer interaction terminal is built into the service robot, as shown in Fig. 2, so that the terminal controls the service robot's work through local communication (in forms such as a local wired connection or a wireless local area network). Apart from the communication mode being changed from Internet communication to local communication, the system shown in Fig. 2 may otherwise be similar to that shown in Fig. 1.
Optionally, the human-computer interaction terminal may be regarded as the interaction platform between the service robot and the user, and as the control terminal that controls the service robot. The terminal can be set up separately from the service robot, exchanging information through the Internet, or can be built into the service robot; after understanding the control instruction corresponding to the user's control information, the terminal transmits the corresponding control instruction to the service robot, thereby controlling the robot's control elements (such as motors) and completing the work the user intends.
In the embodiments of the present invention, the human-computer interaction terminal can load a corresponding program to implement the human-computer interaction method provided by the embodiments; the program can be stored by the terminal's memory and called and executed by the terminal's processor. Optionally, Fig. 3 shows an optional structure of the human-computer interaction terminal; referring to Fig. 3, the terminal may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4.
In the embodiments of the present invention, there is at least one of each of the processor 1, communication interface 2, memory 3 and communication bus 4, and the processor 1, communication interface 2 and memory 3 communicate with one another through the communication bus 4; obviously, the communication connections of the processor 1, communication interface 2, memory 3 and communication bus 4 shown in Fig. 3 are only optional.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 3 may include a high-speed RAM memory, and may also include a non-volatile memory, e.g., at least one magnetic disk memory.
The memory 3 stores a program, and the processor 1 calls the program stored in the memory 3 to implement the human-computer interaction method provided by the embodiments of the present invention.
Voice is a common way for a user to convey control information. The human-computer interaction method provided by the embodiments of the present invention is introduced below for the case where the control information conveyed by the user to the human-computer interaction terminal includes voice. The method described below is applicable to service robot control, mobile phone control, automatic driving, and the like.
To improve the naturalness and intelligence of human-computer interaction, it is important to make the service robot understand the intent of the user's voice more accurately and quickly. The embodiment of the present invention therefore considers constructing an accurate and efficient speech classification model, so as to identify more accurately and quickly the speech sample corresponding to the voice conveyed by the user, and to determine, through the voice control instruction corresponding to that speech sample, the voice control instruction intended by the user's voice.
Fig. 4 is a flowchart of the method for constructing the speech classification model provided by an embodiment of the present invention. The construction method can be implemented by a background server, and the trained speech classification model can be imported into the human-computer interaction terminal, which then identifies the speech sample corresponding to the user's voice; of course, the construction of the speech classification model may also be implemented by the human-computer interaction terminal itself.
Referring to Fig. 4, this method may include:
Step S100: obtain a training corpus, the training corpus recording the speech samples of each voice control instruction, where one voice control instruction corresponds to at least one speech sample.
The training corpus records the speech samples of each voice control instruction collected in advance by the embodiment of the present invention, and in the training corpus one voice control instruction corresponds to at least one speech sample; with the speech samples of each voice control instruction, the speech classification model can be trained using a machine learning algorithm.
Optionally, a speech sample may be the user's natural language, and a voice control instruction may be a control instruction, understandable by the service robot, into which the natural speech is converted.
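A possible in-memory shape for such a corpus is sketched below; the instruction names and sample phrasings are illustrative assumptions, not data from the patent.

```python
# One voice control instruction maps to at least one natural-language
# speech sample, as step S100 requires.
training_corpus = {
    "CMD_FORWARD": ["move forward", "go ahead", "walk straight on"],
    "CMD_STOP": ["stop", "halt right there"],
}

def samples_of(instruction):
    # Return all speech samples recorded for one voice control instruction.
    return training_corpus[instruction]

print(len(samples_of("CMD_FORWARD")))  # 3
```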
Step S110: extract the text features of each speech sample to obtain multiple text features.
The embodiment of the present invention can perform text feature extraction on each speech sample; the text features extracted from one speech sample may number at least one, so that performing text feature extraction on each speech sample yields multiple text features.
Optionally, text features extracted from different speech samples may repeat; the repeated text features can be de-duplicated so that no duplicate text features exist among the obtained multiple text features.
Optionally, the text features of a speech sample may be regarded as features, in forms such as keywords, extracted from the text obtained by converting the speech sample to text.
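This keyword-style extraction with de-duplication can be sketched minimally; the stop-word list and whitespace tokenization are illustrative assumptions, not the patent's method.

```python
STOP_WORDS = {"please", "the", "a", "now"}

def extract_features(transcript):
    # Tokenize the converted text, drop stop words, and de-duplicate
    # while preserving first-occurrence order.
    seen, features = set(), []
    for word in transcript.lower().split():
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            features.append(word)
    return features

print(extract_features("Please move move forward now"))  # ['move', 'forward']
```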
Step S120: perform feature vector weighting on each text feature to obtain the text feature vector of each text feature.
Optionally, the embodiment of the present invention can use TF-IDF (term frequency-inverse document frequency, a technique for information retrieval) to perform feature vector weighting on each text feature, thereby obtaining, for each text feature, the corresponding text feature vector, and in total multiple text feature vectors.
It should be noted that TF-IDF is a statistical method used to assess the importance of a word for a document in a corpus: the importance of a word increases in proportion to the number of times it appears in the document, but at the same time decreases in inverse proportion to the number of times it appears in the corpus.
Optionally, for a text feature, the embodiment of the present invention can determine the number of occurrences of the text feature's words in the corresponding speech sample (the speech sample corresponding to a text feature may be regarded as the speech sample from which the text feature was extracted) and the number of occurrences in the training corpus, and thereby determine, from these occurrence counts, the importance of the text feature in the corresponding speech sample; the corresponding text feature vector is then determined according to this importance. The importance is proportional to the number of occurrences of the text feature's words in the speech sample, and inversely proportional to the number of occurrences of those words in the corpus.
Optionally, when obtaining the text feature vectors, if there are n words, then an n-dimensional text feature vector can be obtained accordingly.
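The TF-IDF weighting described above can be computed directly. The tiny corpus below is an illustrative assumption; it only shows that a word appearing in many samples (here "move") receives a lower weight than a more distinctive word ("forward"), matching the proportionality stated above.

```python
import math

# Toy training corpus: three tokenized speech-sample transcripts.
corpus = [
    ["move", "forward"],
    ["move", "back"],
    ["turn", "left"],
]

def tf_idf(term, doc, corpus):
    tf = doc.count(term) / len(doc)           # frequency in the sample
    df = sum(1 for d in corpus if term in d)  # samples containing the term
    idf = math.log(len(corpus) / df)          # rarer terms weigh more
    return tf * idf

print(tf_idf("forward", corpus[0], corpus) > tf_idf("move", corpus[0], corpus))  # True
```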
Step S130: model, according to a machine learning algorithm, the belonging probabilities of each text feature vector with the corresponding speech samples, to obtain the speech classification model.
Optionally, the speech samples corresponding to a text feature vector can be understood as the speech samples the text feature vector is intended to express, numbering at least one; the belonging probability of a text feature vector with a corresponding speech sample may be regarded as the probability that the text feature vector belongs to that speech sample.
Optionally, since the text features extracted from different speech samples may be identical, one text feature may correspond to at least one speech sample; correspondingly, the speech samples corresponding to the text feature vector of a text feature may also number at least one. The text feature vector of a text feature can represent the importance of the text feature in the corresponding speech sample, so the embodiment of the present invention can determine the belonging probability of a text feature vector with each corresponding speech sample according to the importance represented by the text feature vector.
Then, using a machine learning algorithm, each text feature vector and the belonging probabilities of each text feature vector with each corresponding speech sample are modeled to obtain the speech classification model; optionally, the speech classification model can represent the belonging probability of a text feature vector with a corresponding speech sample.
Through the text feature vectors and the belonging probabilities of text feature vectors with their corresponding speech samples, the embodiment of the present invention can accurately define the probability that each text feature vector belongs to each possibly intended speech sample, making the correspondence between speech samples and text feature vectors more accurate. With the speech classification model trained in this way, the speech sample to which a natural-language utterance belongs can be accurately determined through its text feature vectors, realizing accurate identification of the speech sample corresponding to the natural language conveyed by the user; subsequently, by taking the voice control instruction of the identified speech sample as the voice control instruction of the natural language, the voice control instruction matching the user's natural language can be accurately determined. This improves the accuracy with which the service robot identifies the voice control instruction intended by the user's natural language, and makes intelligent and natural human-computer interaction possible.
Optionally, the embodiment of the present invention can use the maximum entropy algorithm to model each text feature vector and the membership probability of its corresponding voice control instruction, obtaining a maximum entropy classification model with a uniform probability distribution (one form of the speech classification model); the maximum entropy classification model expresses the membership probability between a text feature vector and its corresponding speech samples.
Optionally, in specific modeling, the embodiment of the present invention can use the following formula:

p*(y|x) = (1/Z(x)) · exp( Σ_{i=1}^{n} λ_i · f_i(x, y) )

where f_i(x, y) is the characteristic function of the i-th text feature vector, and n is the number of characteristic functions, equal to the number of text feature vectors; f_i(x, y) is taken as 1 if the i-th text feature vector and the corresponding speech sample appear in the same collected natural-language utterance, and as 0 otherwise; λ_i is the weight corresponding to f_i(x, y), a Lagrange multiplier; Z(x) is the normalization factor; and p* is the expression of the maximum entropy classification model.
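As an illustrative sketch (not the patent's implementation), the conditional probability p*(y|x) above can be evaluated directly from binary characteristic functions f_i and weights λ_i; the feature functions, weights and class labels below are toy assumptions.

```python
import math

def maxent_prob(features, weights, classes, x):
    """Conditional maximum-entropy model:
    p(y|x) = exp(sum_i lambda_i * f_i(x, y)) / Z(x).

    features: list of binary characteristic functions f_i(x, y) -> 0 or 1
    weights:  list of Lagrange multipliers lambda_i, one per feature
    classes:  candidate speech samples y
    """
    scores = {y: math.exp(sum(w * f(x, y) for f, w in zip(features, weights)))
              for y in classes}
    z = sum(scores.values())           # normalization factor Z(x)
    return {y: s / z for y, s in scores.items()}

# Toy features: fire when a word co-occurs with a speech sample (hypothetical labels).
features = [lambda x, y: 1 if ("move" in x and y == "MOVE") else 0,
            lambda x, y: 1 if ("stop" in x and y == "STOP") else 0]
weights = [2.0, 2.0]
probs = maxent_prob(features, weights, ["MOVE", "STOP"], {"move", "forward"})
best = max(probs, key=probs.get)       # speech sample with highest membership probability
```

The argmax over p*(y|x) is exactly the "select the speech sample with the highest membership probability" step used later in step S230.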
Optionally, when modeling with the maximum entropy algorithm, the modeling process is driven by the known information: it satisfies the known information as far as possible while making no assumptions about the unknown, so the probabilities of all kinds of relevant or irrelevant observations can be applied jointly to text feature vector classification, and its performance is better than other machine learning algorithms such as Bayesian methods. The embodiment of the present invention therefore preferably uses the maximum entropy algorithm to build a speech classification model in the form of a maximum entropy classification model; however, this is only a preferred choice, and the embodiment of the present invention does not exclude other machine learning algorithms such as Bayesian methods.
After the speech classification model is obtained by training, the human-computer interaction terminal uses it to process the control information containing speech conveyed by the user, so as to identify the speech sample matching the user's speech, and takes the voice control instruction corresponding to that speech sample as the voice control instruction of the speech conveyed by the user.
Fig. 5 is a flowchart of the man-machine interaction method provided by an embodiment of the present invention. The method can be applied to a human-computer interaction terminal. Referring to Fig. 5, the method may include:
Step S200, obtaining the control information conveyed by the user, the control information including voice information.
Optionally, the human-computer interaction terminal can collect the control information conveyed by the user through a detector; the control information may include voice information conveyed by the user, and may also include gesture information conveyed by the user. This embodiment discusses the case where the control information includes voice information; the case where it includes gesture information will be described later.
Optionally, the detector may take the form of a speech detector such as a microphone, or a non-contact visual detector such as a three-dimensional camera or an infrared imager; the form of the detector can be set according to the type of the control information and is not fixedly limited.
Step S210, extracting the text feature of the voice information.
Optionally, the embodiment of the present invention can convert the voice information into text and extract the corresponding text feature from the converted text, thereby obtaining the text feature of the voice information.
Step S220, determining the text feature vector corresponding to the text feature.
Optionally, the embodiment of the present invention can determine the text feature vector corresponding to the text feature by TF-IDF; optionally, when determining the text feature vector corresponding to the text feature, the embodiment of the present invention may combine the voice information with the training corpus.
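A minimal sketch of the TF-IDF weighting mentioned above, assuming whitespace-tokenized text and a small illustrative corpus (the patent does not fix a tokenizer or smoothing scheme):

```python
import math
from collections import Counter

def tfidf_vectors(corpus):
    """TF-IDF weighting: a term frequent in one text but rare across the
    training corpus gets a large weight, reflecting its importance in that
    text (the 'importance' a text feature vector represents)."""
    docs = [doc.split() for doc in corpus]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return vectors

vecs = tfidf_vectors(["move forward one meter", "stop moving now", "move left"])
```

Here "forward" outweighs "move" in the first text because "move" also occurs elsewhere in the corpus, so its inverse document frequency is lower.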
Step S230, determining the speech sample matching the text feature vector according to the pre-trained speech classification model.
Optionally, since the pre-trained speech classification model can express the membership probability between a text feature vector and its corresponding speech samples, the embodiment of the present invention can, through the speech classification model, determine the speech samples the text feature vector may belong to and the membership probability for each of them, and then select the speech sample with the highest membership probability as the speech sample matching the text feature vector.
Step S240, taking the voice control instruction corresponding to the identified speech sample as the voice control instruction of the voice information.
Step S250, generating a target control instruction according to the voice control instruction.
The target control instruction can be the final control instruction generated by the human-computer interaction terminal for the service robot. When voice control is used alone, the embodiment of the present invention can use the voice control instruction directly as the target control instruction; when the user also uses gestures, the gesture control instruction corresponding to the user's gesture also serves as a parameter of the target control instruction, so that the target control instruction is generated by combining the voice control instruction expressed by the user's speech with the gesture control instruction expressed by the user's gesture.
Of course, when a target object in the scene of the service robot's environment needs to be controlled (the target object can be understood as the object operated by the service robot under the user's control; this is only one optional control situation), the environment scene may also be combined to identify the target object, so that the service robot performs the control operation corresponding to the target control instruction on the identified target object.
The man-machine interaction method provided by the embodiment of the present invention can perform text feature extraction on the voice information in the control information conveyed by the user and determine the corresponding text feature vector; the speech sample matching the text feature vector can then be determined according to the pre-trained speech classification model; the voice control instruction corresponding to the identified speech sample is taken as the voice control instruction of the voice information, and the target control instruction is generated from it, realizing the generation of the target control instruction for the machine during human-computer interaction.
Since the pre-trained speech classification model can precisely define the probability that each text feature vector belongs to each speech sample it may be intended to express, the correspondence between speech samples and text feature vectors is more accurate. Through the embodiment of the present invention, the user can therefore carry out human-computer interaction in a way similar to person-to-person communication: after the user conveys voice information to the human-computer interaction terminal in natural speech, the terminal can use the speech classification model to accurately identify the speech sample matching the voice information conveyed by the user, and thereby identify the voice control instruction intended by that voice information. With the embodiment of the present invention, the user can convey voice information more naturally, and the human-computer interaction terminal can accurately match the speech sample of the user's voice information through the speech classification model, achieving accurate determination of the voice control instruction the voice information is intended to convey. This improves the naturalness and intelligence of human-computer interaction, lowers the communication threshold for users carrying out human-computer interaction, and provides strong support for the popularization of human-computer interaction.
An example of human-computer interaction using voice information according to the embodiment of the present invention can be as shown in Fig. 6. The user speaks to the human-computer interaction terminal to control the service robot; after the terminal obtains the voice conveyed by the user, it converts the speech into text, extracts the text feature of the text and determines the corresponding text feature vector, determines the speech sample matching the text feature vector through the maximum entropy classification model, and thereby determines the voice control instruction corresponding to that speech sample. The terminal transmits the voice control instruction to the service robot over the Internet, and the service robot executes it, realizing remote control of the service robot by the user. Of course, the human-computer interaction terminal in Fig. 6 may also be built into the service robot.
The embodiment of the present invention can also realize human-computer interaction in combination with user gestures. In this process, the human-computer interaction terminal needs to understand the gesture control instruction corresponding to the user's gesture; to determine this instruction more accurately, the recognition of the user's gesture must be optimized, so that improving the accuracy of user gesture recognition serves to improve the accuracy with which the gesture control instruction is determined.
To improve the accuracy of user gesture recognition, the embodiment of the present invention can improve the recognition accuracy of the gesture position and of the gesture posture separately, and then fuse the determined gesture position with the gesture posture, thereby improving the recognition accuracy of the user's gesture.
Correspondingly, in the method shown in Fig. 5, the control information conveyed by the user can also include gesture information. Optionally, Fig. 7 shows another flowchart of the man-machine interaction method provided by an embodiment of the present invention; the method can be carried out by the human-computer interaction terminal. Referring to Fig. 7, the method may include:
Step S300, obtaining the control information conveyed by the user, the control information including voice information and gesture information; the gesture information includes a gesture position feature and a gesture posture feature.
Optionally, the gesture information can be the original gesture feature information represented by user gesture images of consecutive multiple frames (such as a user gesture image sequence); the original gesture feature information can be extracted from the user gesture images of consecutive multiple frames and can be expressed by the gesture position feature and the gesture posture feature. Optionally, the gesture position feature is a feature related to the position of the gesture, such as the coordinates, speed and acceleration of the hand on the X, Y and Z axes; the gesture posture feature is, for example, the rotation angle of the hand about each axis of the XYZ coordinate system.
Optionally, the user gesture images can be acquired by a non-contact image detector such as a three-dimensional camera or an infrared imager. For example, a stereoscopic vision camera or an infrared imaging sensor can detect and recognize the human hand in real time; relevant sensor hardware includes binocular vision cameras, the Kinect somatosensory sensor and the Leap Motion sensor. Taking the Leap Motion sensor as an example, when the hand is placed over the detection zone, the sensor can acquire three-dimensional gesture images at high frequency and return the rectangular coordinates of the hand relative to the Leap Motion base coordinate system (one form of the gesture position feature) and the rotation angles of the palm about the three coordinate axes (one form of the gesture posture feature), thereby obtaining the gesture information expressed by the gesture position feature and the gesture posture feature.
Optionally, the gesture images of the embodiment of the present invention can be three-dimensional, so that by capturing the three-dimensional gesture of the hand, the user's interaction intention can be recognized and converted into an interaction instruction. Unlike traditional two-dimensional gesture interaction, three-dimensional gesture data has advantages such as rich semantic expression and intuitive mapping.
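The two feature streams described above can be sketched as a simple per-frame record split into a position stream and a posture stream; the field names and `split_features` helper are illustrative, not an actual sensor API.

```python
from dataclasses import dataclass

@dataclass
class GestureFrame:
    """One frame of raw gesture information, as it might be read from a
    non-contact detector such as a Leap Motion sensor (names are assumptions)."""
    position: tuple      # (x, y, z) palm coordinates on the XYZ axes
    velocity: tuple      # (vx, vy, vz) hand speed
    acceleration: tuple  # (ax, ay, az) hand acceleration
    rotation: tuple      # rotation angles about the X/Y/Z axes

def split_features(frames):
    """Separate a gesture image sequence into the position-feature stream
    (to be smoothed by adaptive interval Kalman filtering) and the
    posture-feature stream (to be refined by improved particle filtering)."""
    pos = [(f.position, f.velocity, f.acceleration) for f in frames]
    pose = [f.rotation for f in frames]
    return pos, pose
```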
Step S310, extracting the text feature of the voice information.
Step S320, determining the text feature vector corresponding to the text feature.
Step S330, determining the speech sample matching the text feature vector according to the pre-trained speech classification model.
Step S340, taking the voice control instruction corresponding to the identified speech sample as the voice control instruction of the voice information.
Optionally, for the processing of step S310 to step S340, reference can be made to step S210 to step S240 shown in Fig. 5. In the method shown in Fig. 7, there is also parallel processing of the user gesture images, with the following steps.
Step S350, processing the gesture position feature according to adaptive interval Kalman filtering to obtain a target gesture position feature; and processing the gesture posture feature according to improved particle filtering to obtain a target gesture posture feature.
Optionally, while filtering with the measurement data, the adaptive interval Kalman filter continuously judges from the filtering itself whether the dynamics of the system have changed, and estimates and corrects the model parameters and noise statistics, so as to improve the filter design and reduce the actual filtering error. Processing the original gesture position feature extracted from the user gesture images by adaptive interval Kalman filtering can filter out the influence of detector noise and hand muscle jitter on the user's gesture, improving the accuracy of the processed gesture position feature (i.e. the target gesture position feature).
Processing the quaternion components of the original gesture posture feature extracted from the user gesture images by improved particle filtering can make the quaternion components of the processed gesture posture feature (i.e. the target gesture posture feature) closer to the true quaternion components, improving the accuracy of the processed gesture posture feature.
Step S360, fusing the target gesture position feature and the target gesture posture feature to determine the gesture feature of the user.
The embodiment of the present invention can fuse the target gesture position feature processed by adaptive interval Kalman filtering with the target gesture posture feature processed by improved particle filtering, realizing the determination of the user's gesture feature. Through adaptive interval Kalman filtering and improved particle filtering, the embodiment of the present invention can constrain the spatio-temporal correlation of the gesture position and posture, so as to eliminate the instability and ambiguity of the three-dimensional gesture data as much as possible.
Step S370, determining the gesture control instruction corresponding to the gesture feature.
Optionally, the embodiment of the present invention can set up a gesture control instruction library that records the gesture feature (which can be three-dimensional) corresponding to each gesture control instruction, so that after determining the gesture feature of the user gesture images, the embodiment of the present invention can determine the gesture control instruction corresponding to that gesture feature from the gesture features recorded for each gesture control instruction in the library.
Step S380, generating the target control instruction according to the voice control instruction and the gesture control instruction.
Optionally, step S350 to step S370 and step S310 to step S340 can be parallel, being the processing of control information in user-gesture-image form and in user-speech form respectively; there need be no definite order between step S350 to step S370 and step S310 to step S340.
Optionally, the target control instruction for the service robot can generally be described in the form of a control vector composed of four variables, (C_dir, C_opt, C_val, C_unit), where C_dir is the operation direction keyword, C_opt and C_val are a pair of operation descriptors, namely the operation keyword and the operation value, and C_unit is the operation unit; these variables are called voice control variables, and under normal circumstances all four can be specified by the voice control instruction. Correspondingly, when generating the target control instruction according to the voice control instruction, the embodiment of the present invention can determine the voice control variables corresponding to the voice control instruction, which include the operation direction keyword indicated by the voice control instruction, the operation keyword, the operation value corresponding to the operation keyword, and the operation unit, and then describe the target control instruction with the control vector these voice control variables constitute, realizing the generation of the target control instruction.
When voice and gesture control are combined, the embodiment of the present invention can add a new variable C_hand, i.e. the target control instruction can be modified to be described by a control vector of the following five variables:
(C_dir, C_opt, C_hand, C_val, C_unit);
and when gesture control is not needed, C_hand = NULL.
Correspondingly, when generating the target control instruction according to the voice control instruction and the gesture control instruction, the embodiment of the present invention can determine the voice control variables corresponding to the voice control instruction, which include the operation direction keyword indicated by the voice control instruction, the operation keyword, the operation value corresponding to the operation keyword, and the operation unit; at the same time it determines the gesture control variable corresponding to the gesture control instruction. Combining the voice control variables with the gesture control variable then forms the control vector (C_dir, C_opt, C_hand, C_val, C_unit) describing the target control instruction, realizing the generation of the target control instruction.
Optionally, the embodiment of the present invention can consider that gesture-assisted control is needed when a user gesture image is captured within the detection range of the visual detector; otherwise it considers that gesture control is not needed, and the control vector composed of (C_dir, C_opt, C_val, C_unit) can be determined from the voice information (possibly combined with the environment scene of the service robot).
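The assembly of the five-variable control vector above can be sketched as follows; the dictionary keys and the `build_control_vector` helper are illustrative assumptions, with `None` standing in for the NULL value of C_hand.

```python
def build_control_vector(voice_cmd, gesture_cmd=None):
    """Assemble the target control instruction (C_dir, C_opt, C_hand, C_val, C_unit).
    When no gesture is captured within the detector's range, C_hand is None
    (NULL) and the vector degenerates to the voice-only four-variable form."""
    return (voice_cmd["dir"],   # C_dir: operation direction keyword
            voice_cmd["opt"],   # C_opt: operation keyword
            gesture_cmd,        # C_hand: gesture control variable (or NULL)
            voice_cmd["val"],   # C_val: operation value
            voice_cmd["unit"])  # C_unit: operation unit

cmd = {"dir": "forward", "opt": "move", "val": 1, "unit": "meter"}
v_only = build_control_vector(cmd)                  # voice control alone
combined = build_control_vector(cmd, "point_dir")   # voice + gesture control
```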
Optionally, the means of processing the gesture position feature according to adaptive interval Kalman filtering is introduced below. It should be noted that, in the process of obtaining gesture images through a non-contact visual detector, the user gesture expressed by the gesture images may carry detector noise, which often gives the determined user gesture instability, ambiguity and fuzziness. In addition, when the user performs a gesture operation, unintended motion such as muscle jitter inevitably appears due to human factors, which makes the determined user gesture imprecise. Therefore, the embodiment of the present invention can process the original gesture position feature in the user gesture images by adaptive interval Kalman filtering, so as to filter out the influence of detector noise and hand muscle jitter on the user's gesture.
Optionally, the model of the adaptive interval Kalman filter can be expressed as follows:

x_{k+1} = (Φ + ΔΦ_k) x_k + (Γ + ΔΓ_k) u_k + w_k
z_k = (H + ΔH_k) x_k + v_k

where x_k is the n×1 state vector at time k; in order for the Kalman filter to better estimate the hand position data, variables for hand speed and hand acceleration are introduced into the state vector. Φ is the n×n state transition matrix, designed according to the relationship between displacement, velocity and acceleration; its elements are the coefficients of the position, velocity and acceleration variables satisfying the kinematic formulas. Γ is the n×l control output matrix, a constant matrix determined by gravitational acceleration, which keeps the output of the elements of the system input vector u_k consistent with the state vector. w_k and v_k are noise vectors, with w_k usually Gaussian-distributed. z_k is the m×1 measurement vector at time k (its elements are consistent with those of x_k, obtained from the measurement at time k, e.g. the position, speed and acceleration of the hand in the X, Y and Z directions). H is the m×n observation matrix, a constant matrix describing the relationship between the measurement vector and the state vector. The three matrices introduced here with the Δ symbol denote unknown but norm-bounded perturbation matrices.
Correspondingly, the state x_k of the gesture position feature at time k can be expressed as follows:

x_k = [p_{x,k}, V_{x,k}, A_{x,k}, p_{y,k}, V_{y,k}, A_{y,k}, p_{z,k}, V_{z,k}, A_{z,k}]^T

where p_{x,k}, p_{y,k}, p_{z,k} are the coordinates of the hand on the three X, Y, Z axes in space at time k; V_{x,k}, V_{y,k}, V_{z,k} are the speeds of the hand in the X, Y, Z directions at time k; and A_{x,k}, A_{y,k}, A_{z,k} are the accelerations of the hand in the X, Y, Z directions at time k. Because the adaptive interval Kalman filter is an estimator, it can use the gesture coordinates, gesture speed and acceleration of the previous moment to estimate the position of the current moment more accurately.
In this process, the noise vector can be expressed as w_k = [0, 0, w_x, 0, 0, w_y, 0, 0, w_z]^T, where (w_x, w_y, w_z) is the process noise of the palm's acceleration (noise that does not conform to the overall law of the gesture's acceleration change); this noise vector can be filtered out in the model of the adaptive interval Kalman filter. Thus, by processing the state x_{k-1} of the gesture position feature at time k-1 and the noise vector with the adaptive interval Kalman filter model stated above, the target gesture position feature at time k, with the noise filtered out and the muscle jitter eliminated, can be obtained.
It can be seen that the embodiment of the present invention can determine the law of gesture acceleration change according to the acceleration corresponding to the gesture position feature, filter out noise deviating from that law through the model of the adaptive interval Kalman filter, and, using that model, estimate the gesture coordinates, gesture speed and acceleration of the current moment from those of the previous moment in the noise-filtered gesture position feature, determining the target gesture position feature of the current moment.
The accuracy of the target gesture position feature obtained through adaptive interval Kalman filtering is thus improved, and it can be used to perform coarse control operations on the service robot (since the user cannot, without external aids, move the hand with millimeter-level accuracy, the control performed on the service robot is coarse control).
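As a minimal sketch of the smoothing role described above, the following implements one predict/update cycle of a plain constant-acceleration Kalman filter for a single axis with state [position, velocity, acceleration]; the adaptive and interval extensions of the patent (online noise-statistics correction, the bounded perturbation matrices ΔΦ, ΔΓ, ΔH) are omitted, and the noise parameters are assumptions.

```python
import numpy as np

def kalman_step(x, P, z, dt=0.02, q=1e-3, r=1e-2):
    """One predict/update cycle for one axis, state x = [p, V, A].
    z measures the full state (position, speed, acceleration), as the
    measurement vector in the text does."""
    F = np.array([[1.0, dt, 0.5 * dt**2],   # kinematic transition:
                  [0.0, 1.0, dt],           # p' = p + V*dt + A*dt^2/2, etc.
                  [0.0, 0.0, 1.0]])
    H = np.eye(3)                            # observation matrix
    x = F @ x                                # predict
    P = F @ P @ F.T + q * np.eye(3)
    S = H @ P @ H.T + r * np.eye(3)
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x + K @ (z - H @ x)                  # correct with the measurement
    P = (np.eye(3) - K @ H) @ P
    return x, P
```

Feeding it noisy hand-position measurements suppresses jitter that does not fit the position/velocity/acceleration kinematics, which is the effect the adaptive interval filter exploits.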
Optionally, as for the means of processing the gesture posture feature according to improved particle filtering, Fig. 8 shows a method flowchart of processing the gesture posture feature by improved particle filtering, provided by an embodiment of the present invention. The method can be executed by the human-computer interaction terminal; referring to Fig. 8, the method may include:
Step S400, obtaining the rotation angles, about each axis of the three-dimensional coordinate system, of the hand represented by the gesture posture feature.
Step S410, determining the quaternion components according to the rotation angles of the hand about each axis of the three-dimensional coordinate system.
Optionally, the quaternion algorithm can be used to estimate the direction of a rigid body, and the quaternion components can be calculated accordingly. A quaternion is a set of hypercomplex numbers that can describe the attitude of a rigid body in space; in the embodiment of the present invention, the quaternion components can refer to the attitude of the hand. Correspondingly, the embodiment of the present invention can determine, from the original gesture posture feature extracted from the gesture images, the rotation angles of the hand about each axis of the three-dimensional coordinate system (the rotation angles can be one kind of information contained in the original gesture posture feature), and then determine the corresponding quaternion components with the quaternion algorithm.
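The conversion from per-axis rotation angles to quaternion components in step S410 can be sketched as the standard Euler-to-quaternion formula; the ZYX rotation order below is an assumption, since the patent does not fix one.

```python
import math

def euler_to_quaternion(roll, pitch, yaw):
    """Convert the hand's rotation angles about the X/Y/Z axes (radians)
    into quaternion components (w, x, y, z) describing its attitude,
    assuming ZYX rotation order."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)
```

A zero rotation maps to the identity quaternion (1, 0, 0, 0), and every output is a unit quaternion, as an attitude representation requires.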
Step S420, determining the posterior probability of the hand particles according to improved particle filtering.
Optionally, in order to reduce the error brought by using the quaternion algorithm, improved particle filtering is used to enhance the data fusion (what is fused is the attitude data of each particle expressing the hand; the improved particle filter algorithm can choose a better importance density function or optimize the resampling process, with the aim of obtaining accurate hand attitude data). The improved particle filter can process the resampled particles using Markov chain Monte Carlo, so as to improve the diversity of the particles, avoid the local convergence phenomenon of standard particle filtering, and improve the accuracy of data estimation.
Step S430, iteratively processing the quaternion components according to the posterior probability, obtaining target quaternion components and thereby the target gesture posture feature.
Optionally, the target quaternion components can approach the true quaternion components of the hand gesture.
Optionally, when determining the posterior probability of the hand particles, at time t_k the approximate value of the posterior probability of the hand particles can be defined as:

p(x_k | z_{1:k}) ≈ Σ_{i=1}^{N} ω_{i,k} · δ(x_k − x_{i,k})

where x_{i,k} is the i-th state particle at time t_k, N is the number of samples, ω_{i,k} is the normalized weight of the i-th particle at time t_k, and δ is the Dirac function; x_k can be the hand state, which in the embodiment of the present invention is the 4 elements of the quaternion and can be used to indicate the attitude of the hand.
Thus, through the posterior probability of the hand particles, the hand particles (i.e. the quaternion components of the original hand posture feature) can be calculated iteratively, making the particle states approach the true values more and more closely and yielding the true three-dimensional gesture posture (i.e. the target gesture posture feature).
The specific iteration can follow the formula:

x_{i,k} = x_{i,k} + K_k · ( z_k − h(x_{i,k}) − v_{i,k} )

where K_k is the Kalman gain, z_k is the observation, h is the observation operator, and v_{i,k} is the observation error of the i-th particle state at time t_k.
The rigid-body attitude is then expressed with the quaternion (the purpose of calculating the quaternion components is to obtain the rigid-body attitude); at time t_{k+1} the quaternion components of each particle can be expressed as:

q_{i,k+1} = q_{i,k} + (t/2) · Ω(ω) · q_{i,k}

where ω denotes the angular velocity (Ω(ω) being its quaternion rate matrix) and t is the sample time, thereby obtaining the target gesture posture feature.
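The weighted-particle posterior above can be sketched in scalar form: the hand state is estimated as the weight-normalized sum over particles, and multinomial resampling draws a new particle set from the weights. The MCMC diversification step of the improved filter is only noted in a comment, not implemented; function names are illustrative.

```python
import random

def particle_estimate(particles, weights):
    """Weighted-sum approximation of the posterior:
    estimate = sum_i w_i * x_i / sum_i w_i over the particle set."""
    total = sum(weights)
    return sum(w * x for x, w in zip(particles, weights)) / total

def resample(particles, weights, rng=random):
    """Multinomial resampling. An improved particle filter would then
    perturb the resampled set (e.g. with a Markov chain Monte Carlo move)
    to restore particle diversity and avoid local convergence."""
    return rng.choices(particles, weights=weights, k=len(particles))
```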
Through improved particle filtering, the embodiment of the present invention can process the original gesture posture feature extracted from the user gesture images, greatly improving the estimation accuracy of the gesture posture feature, which can likewise support coarse control operations on the service robot.
It should be explained here that the calculation of the particle weights needs to combine the position estimation result of the Kalman filter, since the position and posture of the three-dimensional gesture data are associated in space and time: the velocity and acceleration of the gesture are directional, and their direction must be calculated in the body coordinate system determined by the posture, while the displacement of the gesture in three dimensions needs the posture to be estimated. Therefore, by combining adaptive interval Kalman filtering, the spatio-temporal constraint between position and posture can improve the precision of data estimation: accurate position data allows the particle weights to be calculated better, yielding accurate attitude data, and accurate attitude data in turn allows the position data to be estimated better through velocity and acceleration. By processing and fusing the hand position and attitude data with adaptive interval Kalman filtering and improved particle filtering, the three-dimensional gesture feature of the user can be estimated better, improving the accuracy and robustness of the determined user gesture feature.
Optionally, further, after fusing the target gesture position feature and the target gesture posture feature to determine the gesture feature of the user, the embodiment of the present invention can filter out gesture features the user does not intend to express by a damping method, and further improve the accuracy of gesture recognition by introducing a virtual spring coefficient. Specifically, this can be realized by the following formula:

F = k·D, if D ≤ τ; otherwise the robot does not respond to the three-dimensional gesture input

where F is the robot control instruction input, k is the virtual spring coefficient, D is the distance moved by the hand, and τ is the elastic limit threshold; when D is greater than τ, the robot does not respond to the three-dimensional gesture input. That is, after determining the user's gesture feature, if the distance of the hand movement corresponding to the gesture feature is greater than the set elastic limit threshold, the embodiment of the present invention needs to filter out that gesture feature. Considering that violent hand movement may occur during interaction (different from muscle jitter, here meaning frequent large-range movement of the hand), the three-dimensional gesture data at such moments are unintended input data, so these data are filtered out to keep the system stable.
Correspondingly, when determining the gesture control instruction corresponding to the gesture feature, the embodiment of the present invention can determine the gesture control instruction corresponding to the unfiltered gesture feature.
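The virtual-spring gating just described can be sketched as a one-line threshold rule; the default values of k and τ are illustrative assumptions.

```python
def spring_filter(distance, k=1.0, tau=0.3):
    """Virtual-spring gating of hand movement: within the elastic limit tau
    the control input grows as F = k * D; beyond it, the movement is treated
    as an unintended large-range motion and ignored (the robot does not
    respond, modeled here as F = 0)."""
    return k * distance if distance <= tau else 0.0
```

Small, deliberate motions thus pass through proportionally, while sudden large sweeps are discarded to keep the system stable.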
An optional application example of the embodiment of the present invention may be as follows:
The user says the voice "move in this direction" to the human-computer interaction terminal to control the service robot, and makes a pointing gesture; after the human-computer interaction terminal obtains the voice conveyed by the user, it converts the speech into text, extracts the text feature of the text, determines the text feature vector corresponding to the text feature, and determines the speech sample matching the text feature vector through the maximum entropy classification model, thereby determining the movement-related voice control instruction corresponding to the speech sample;
meanwhile, the human-computer interaction terminal obtains the gesture position feature and gesture posture feature of the user gesture image, processes the gesture position feature according to the adaptive interval Kalman filter, processes the gesture posture feature according to the improved particle filter, and fuses the processed gesture position feature and gesture posture feature to determine the gesture feature of the user; based on the gesture feature, the human-computer interaction terminal can determine the gesture control instruction related to the moving direction;
the human-computer interaction terminal can then, according to the determined voice control instruction and gesture control instruction, control the service robot to move in the direction indicated by the user; that is, the operation instruction the human-computer interaction terminal obtains from the user's natural language is "move", and the moving direction is the direction of the user's finger.
In this human-computer interaction process, the user can combine voice and gesture, so that the exchange between the user and the service robot can be similar to the exchange between users; the human-computer interaction is very convenient and direct, which improves the naturality and intelligence of human-computer interaction, lowers the communication threshold for the user to perform human-computer interaction, and provides strong support for the popularization of human-computer interaction.
Optionally, in some human-computer interaction scenarios, the service robot generally needs to operate on a target object in the environment scene according to the user's control; if the user instructs the service robot to "pick up the cup on the ground", the service robot needs to recognize the target object "cup" in the environment scene, without the user telling the robot which object is the "cup", where the "cup" is, and other such information, so that the service robot can autonomously identify the cup in the environment scene and execute the "pick up" operation; it can be seen that cognition of the environment gives the service robot a certain autonomy, and the user's control appears very simple; therefore, accurately identifying the target object in the environment scene helps promote the naturality and intelligence of human-computer interaction.
Optionally, Fig. 9 shows a target object recognition method flow provided in an embodiment of the present invention; the method can be applied to a human-computer interaction terminal. Referring to Fig. 9, the method may include:
Step S500, obtaining an environment scene image.
Optionally, the embodiment of the present invention may collect the environment scene image through an image collecting device, such as a camera, preset on the service robot; the environment scene image may be considered the image of the environment scene where the service robot is located;
optionally, if the human-computer interaction terminal interacts with the service robot through the Internet, the human-computer interaction terminal may obtain the environment scene image collected by the service robot through the Internet; if the service robot has a built-in human-computer interaction terminal, the human-computer interaction terminal may obtain the environment scene image collected by the image collecting device of the service robot.
Step S510, determining the HOG feature of the environment scene image.
Optionally, the embodiment of the present invention may use the HOG (Histogram of Oriented Gradient) feature to describe the image features in the environment scene image; obviously, the HOG feature is only one optional form of image feature, and image features of other forms may also be used in the embodiment of the present invention.
HOG is primarily used to compute statistics of local image gradient direction information. Relative to other feature descriptors, the advantage of HOG is that its algorithm operates on local cell units of the image, giving it good geometric and photometric invariance.
To describe the image features in the environment scene image using the HOG feature, the environment scene image may first be divided into a certain number of sub-images, and each sub-image is then divided into cell units according to certain rules; then, for each sub-image, the gradient orientation histogram (i.e., the HOG feature) of the pixels in each cell unit is acquired, and the density of each gradient orientation histogram in the sub-image is calculated, so as to normalize each cell unit in the sub-image according to the density; finally, the normalization results of the sub-images are combined to determine the HOG feature of the environment scene image.
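The cell-and-normalize procedure above can be sketched minimally with NumPy; this simplifies the full HOG descriptor (no block-level normalization across neighboring cells, no gradient interpolation), and all sizes are illustrative:

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Simplified HOG: per-cell gradient-orientation histograms with
    per-cell L2 normalization (the full descriptor normalizes over
    blocks of neighboring cells)."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)                          # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0    # unsigned orientation
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):          # walk the cell grid
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-6))
    return np.concatenate(feats)

img = np.random.rand(32, 32)
f = hog_features(img)
print(f.shape)  # (144,) : 4 x 4 cells x 9 orientation bins
```

Magnitude-weighted orientation histograms make strong edges dominate each cell's descriptor, which is what gives HOG its robustness to small photometric changes.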
Step S520, extracting the target keyword from the voice information conveyed by the user.
Optionally, the target keyword may be the keyword of the target object to be identified in the environment scene, carried in the voice information conveyed by the user; optionally, the target keyword is usually the object of the sentence (the noun of the target object to be operated on in the environment scene, etc.), following the action word in the voice information or associated with an action word in the voice information.
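A toy sketch of picking the target keyword as the noun following the action word; the verb list and stop-word handling are hypothetical, and a real system would use part-of-speech tagging on the recognized text:

```python
# Hypothetical keyword extraction: find the action word, then take the
# first non-stop-word after it as the target keyword. Word lists are
# illustrative only.

ACTION_WORDS = {"pick", "grab", "fetch", "move", "bring"}
STOP_WORDS = {"up", "the", "a", "an", "this", "that"}

def target_keyword(text):
    words = text.lower().split()
    for i, w in enumerate(words):
        if w in ACTION_WORDS:
            # first non-stop-word after the action word
            for cand in words[i + 1:]:
                if cand not in STOP_WORDS:
                    return cand
    return None

print(target_keyword("pick up the cup"))  # cup
```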
Step S530, matching, according to the pre-trained object classification model, the HOG feature corresponding to the target keyword from the HOG features of the environment scene image.
Step S540, determining the object corresponding to the matched HOG feature in the environment scene image as the identified target object.
Optionally, the target object may be considered the object operated on by the service robot under the user's control, i.e., the object targeted when the target control instruction is executed.
In this process, the object classification model can indicate the HOG feature corresponding to each object, and the training and learning of the object classification model are of great importance to the accuracy and efficiency of target identification; here, the embodiment of the present invention may use a deep learning method to train the object classification model; deep learning carries out learning from unlabeled data, which is closer to the learning mode of the human brain, so that concepts can be mastered autonomously through training; in the face of massive data, deep learning algorithms can accomplish what traditional artificial intelligence algorithms cannot, and the output results become more accurate as the amount of processed data increases, which will substantially increase the efficiency of computer information processing; moreover, the training methods of deep learning differ greatly according to the established network structure; in order to allow the robot to complete online learning and obtain the object classification model by training within a short period of time, the embodiment of the present invention intends to adopt a two-stage learning method;
optionally, for any object, the embodiment of the present invention may determine a candidate set from a reduced feature set containing the image features of the object (referred to as the first feature set), and then use a larger, more reliable feature set containing the image features of the object (referred to as the second feature set) to rank the features in the candidate set (the ranking mode may be, for example, in descending order of HOG feature values; the specific ranking rule is not strictly limited), i.e., the image features of the object contained in the second feature set exceed the image features of the object contained in the first feature set; the features at the set ranks in the candidate set are then chosen as the training features of the object, thereby obtaining the training features of the object; performing this processing for any object, the training features of each object can be obtained, and the object classification model is then trained according to the training features of each object.
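The two-stage selection above can be sketched as follows; this is a hypothetical reading in which the small first feature set proposes a candidate set, the larger second feature set re-scores and ranks the candidates in descending order, and the top-ranked candidates become the training features of the object. The feature names and scores stand in for real HOG feature values:

```python
# Hypothetical two-stage training-feature selection. Stage 1: the small
# first feature set proposes candidates. Stage 2: the larger second
# feature set ranks them; the top ranks become the training features.

def select_training_features(first_set, second_set, top_k=2):
    # Stage 1: candidates are features the small first set already covers.
    candidates = [f for f in second_set if f in first_set]
    # Stage 2: rank candidates by the larger set's score, descending.
    ranked = sorted(candidates, key=lambda f: second_set[f], reverse=True)
    return ranked[:top_k]

first = {"edge": 0.4, "corner": 0.3, "blob": 0.2}
second = {"edge": 0.9, "corner": 0.7, "blob": 0.1, "ridge": 0.5}
print(select_training_features(first, second))  # ['edge', 'corner']
```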
Optionally, in human-computer interaction, the robot can identify unknown objects with the aid of the user's empirical knowledge, or correct itself after recognition errors; this requires establishing a training model with labeled data, which can update the learning network parameters of the robot. With the cooperation of the user, on the one hand the robot can better understand the features (Features) of an unknown object through the user's description; on the other hand, the robot can correctly recognize the object through the experience shared by the user (Ground-truth);
in the learning process, the aim is to find the optimal parameters that maximize the recognition accuracy of the system; here, the data input by the user during cooperation for correcting the machine parameters is used as the feature values (Features) and label data (Ground-truth) of the learning network parameters of the robot, so as to update the learning network parameters of the robot according to the feature values and label data.
With the human-computer interaction method provided in the embodiment of the present invention, the user can convey control information to the human-computer interaction terminal by voice, or in the form of voice combined with gesture; the way the user performs human-computer interaction can be similar to the exchange between users, and the human-computer interaction is very convenient and direct; meanwhile, the human-computer interaction terminal can identify the target object in combination with the environment scene of the service robot, without the user further describing the target object to be operated on in the conveyed control information, so that the user's human-computer interaction process appears very simple; it can be seen that the human-computer interaction method provided in the embodiment of the present invention improves the naturality and intelligence of human-computer interaction, lowers the communication threshold for the user to perform human-computer interaction, and provides strong support for the popularization of human-computer interaction.
The human-computer interaction device provided in an embodiment of the present invention is introduced below; the human-computer interaction device described below may be considered the program modules that a human-computer interaction terminal needs in order to realize the human-computer interaction method provided in an embodiment of the present invention. The human-computer interaction device content described below and the human-computer interaction method content described above may be cross-referenced.
Figure 10 is a structural block diagram of a human-computer interaction device provided in an embodiment of the present invention; the device can be applied to a human-computer interaction terminal. Referring to Figure 10, the device may include:
Control information obtaining module 100, for obtaining the control information conveyed by the user, the control information including voice information;
Text feature extraction module 200, for extracting the text feature of the voice information;
Text feature vector determining module 300, for determining the text feature vector corresponding to the text feature;
Speech sample determining module 400, for determining, according to the pre-trained speech classification model, the speech sample matching the text feature vector; the speech classification model indicating the attribution probability of a text feature vector to a corresponding speech sample;
Voice instruction determining module 500, for using the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
Target instruction generation module 600, for generating the target control instruction according to the voice control instruction.
Optionally, the speech sample determining module 400, for determining, according to the pre-trained speech classification model, the speech sample matching the text feature vector, is specifically configured to:
determine, according to the speech classification model, the speech samples to which the text feature vector may belong, together with the attribution probability of each speech sample to which it may belong;
choose the speech sample with the highest attribution probability as the speech sample matching the text feature vector.
Optionally, Figure 11 shows another structural block diagram of the human-computer interaction device provided in an embodiment of the present invention; as shown in Figure 10 and Figure 11, the device may also include:
Speech classification model training module 700, for obtaining a training corpus, the training corpus recording the speech samples of each voice control instruction, one voice control instruction corresponding to at least one speech sample; extracting the text feature of each speech sample to obtain multiple text features; performing feature vector weighting on each text feature respectively to obtain the text feature vector of each text feature; and modeling, according to a machine learning algorithm, the attribution probability of each text feature vector to the corresponding speech sample, to obtain the speech classification model.
Optionally, the speech classification model training module 700, for performing feature vector weighting on each text feature respectively to obtain the text feature vector of each text feature, is specifically configured to:
for a text feature, determine the number of occurrences of the words of the text feature in the corresponding speech sample, and the number of occurrences in the training corpus;
determine, according to the number of occurrences of the words of the text feature in the corresponding speech sample and in the training corpus, the importance level of the text feature in the corresponding speech sample; wherein the importance level is proportional to the number of occurrences of the words of the text feature in the speech sample, and inversely proportional to the number of occurrences of the words of the text feature in the corpus;
determine the text feature vector corresponding to the text feature according to the importance level.
Optionally, the speech classification model training module 700, for modeling, according to a machine learning algorithm, the attribution probability of each text feature vector to the corresponding speech sample to obtain the speech classification model, is specifically configured to:
model, using a maximum entropy algorithm, the attribution probability of each text feature vector to the corresponding voice control instruction, to obtain a maximum entropy classification model with uniform probability distribution.
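The weighting described above — proportional to a term's occurrences in its speech sample and inversely proportional to its occurrences across the corpus — is the familiar TF-IDF scheme. A minimal sketch follows; the toy corpus and the exact TF-IDF variant are illustrative rather than the patent's:

```python
import math

# Minimal TF-IDF sketch of the importance weighting described above:
# a term's weight grows with its count in the speech sample and shrinks
# with its spread across the whole training corpus.

corpus = [
    "move forward quickly",
    "move backward",
    "pick up the cup",
]

def tfidf(term, sample, corpus):
    tf = sample.split().count(term)                    # in-sample count
    df = sum(term in doc.split() for doc in corpus)    # corpus spread
    idf = math.log(len(corpus) / (1 + df)) + 1.0
    return tf * idf

# "move" appears in many samples, so it weighs less than "cup".
print(tfidf("move", corpus[0], corpus) < tfidf("cup", corpus[2], corpus))  # True
```

The resulting weighted vectors would then be fed to the maximum entropy classifier (often implemented as multinomial logistic regression) to model the attribution probabilities.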
Optionally, the embodiment of the present invention may also perform human-computer interaction in combination with user gestures; correspondingly, the control information may also include gesture information; the gesture information may include: a gesture position feature and a gesture posture feature extracted from a user gesture image;
optionally, Figure 12 shows yet another structural block diagram of the human-computer interaction device provided in an embodiment of the present invention; as shown in Figure 10 and Figure 12, the device may also include:
Adaptive interval Kalman filter processing module 800, for processing the gesture position feature according to an adaptive interval Kalman filter to obtain a target gesture position feature;
Improved particle filter processing module 900, for processing the gesture posture feature according to an improved particle filter to obtain a target gesture posture feature;
Gesture feature determining module 1000, for fusing the target gesture position feature and the target gesture posture feature to determine the gesture feature of the user;
Gesture control instruction determining module 1100, for determining the gesture control instruction corresponding to the gesture feature;
correspondingly, the target instruction generation module 600, for generating the target control instruction according to the voice control instruction, is specifically configured to:
generate the target control instruction according to the voice control instruction and the gesture control instruction.
Optionally, the adaptive interval Kalman filter processing module 800, for processing the gesture position feature according to the adaptive interval Kalman filter to obtain the target gesture position feature, is specifically configured to:
determine the gesture acceleration change rule according to the acceleration corresponding to the gesture position feature;
filter, according to the model of the adaptive interval Kalman filter, the noise deviating from the gesture acceleration change rule;
estimate, using the model of the adaptive interval Kalman filter, the gesture coordinates, gesture speed and acceleration of the current moment according to the gesture coordinates, gesture speed and acceleration of the previous moment in the noise-filtered gesture position feature, to determine the target gesture position feature of the current moment.
Optionally, the improved particle filter processing module 900, for processing the gesture posture feature according to the improved particle filter to obtain the target gesture posture feature, is specifically configured to:
obtain the rotation angles, around each axis of the three-dimensional coordinate system, of the human hand represented by the gesture posture feature;
determine the quaternion components according to the rotation angles of the human hand around each axis of the three-dimensional coordinate system;
determine the posterior probability of the human hand particles according to the improved particle filter;
iteratively process the quaternion components according to the posterior probability to obtain the target quaternion components, i.e., the target gesture posture feature.
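The angles-to-quaternion step above is a standard conversion; the sketch below shows it for rotation angles (roll, pitch, yaw) about the three coordinate axes. The particle-filter reweighting of the quaternion components is not shown:

```python
import math

# Standard conversion from rotation angles about the three coordinate
# axes (roll, pitch, yaw) to quaternion components (w, x, y, z).

def euler_to_quaternion(roll, pitch, yaw):
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

q = euler_to_quaternion(0.1, 0.2, 0.3)
norm = sum(c * c for c in q) ** 0.5
print(round(norm, 6))  # 1.0 : a rotation quaternion is unit length
```

Working in quaternions rather than raw angles avoids gimbal lock and makes the iterative reweighting of posture hypotheses numerically stable.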
Optionally, the target instruction generation module 600, for generating the target control instruction according to the voice control instruction and the gesture control instruction, is specifically configured to:
determine the voice control variables corresponding to the voice control instruction, the voice control variables including: the operation direction keyword indicated by the voice control instruction, the operation keyword, the operating value corresponding to the operation keyword, and the operating unit; and determine the gesture control variables corresponding to the gesture control instruction;
combine the voice control variables and the gesture control variables to form the control vector describing the target control instruction.
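A hypothetical sketch of combining the two kinds of control variables into one control vector; the field names and the tuple representation of the gesture direction are illustrative, not the patent's:

```python
# Hypothetical combination of voice control variables (direction keyword,
# operation keyword, operating value, operating unit) with a gesture
# control variable (a pointing direction) into one control vector.

from dataclasses import dataclass, asdict

@dataclass
class VoiceControl:
    direction_keyword: str   # e.g. "this direction" (deictic, ambiguous alone)
    operation_keyword: str   # e.g. "move"
    value: float             # operating value
    unit: str                # operating unit

def control_vector(voice, gesture_direction):
    vec = asdict(voice)
    # the gesture resolves the deictic direction word into a real direction
    vec["direction"] = gesture_direction
    return vec

v = VoiceControl("this direction", "move", 1.0, "m")
print(control_vector(v, (0.0, 1.0, 0.0))["operation_keyword"])  # move
```

The point of the combination is that neither modality is complete alone: the voice carries the operation and magnitude, while the gesture supplies the concrete direction.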
Optionally, Figure 13 shows yet another structural block diagram of the human-computer interaction device provided in an embodiment of the present invention; as shown in Figure 12 and Figure 13, the device may also include:
Gesture feature filtering module 1200, for filtering out the gesture feature of the user if the distance of the corresponding hand movement is greater than the set elastic limit threshold;
correspondingly, the gesture control instruction determining module 1100 can be used to determine the gesture control instruction corresponding to the unfiltered gesture feature.
Optionally, Figure 14 shows still another structural block diagram of the human-computer interaction device provided in an embodiment of the present invention; as shown in Figure 10 and Figure 14, the device may also include:
Target object recognition module 1300, for obtaining an environment scene image; determining the image features of the environment scene image; extracting the target keyword from the voice information conveyed by the user; matching, according to the pre-trained object classification model, the image feature corresponding to the target keyword from the image features of the environment scene image, the object classification model indicating the image feature corresponding to each object; and determining the object corresponding to the matched image feature in the environment scene image as the identified target object, the target object being the object targeted when the target control instruction is executed.
Optionally, the training of the object classification model may be realized by an object classification model training module, as shown in Figure 15; Figure 15 shows still another structural block diagram of the human-computer interaction device provided in an embodiment of the present invention; as shown in Figure 14 and Figure 15, the device may also include:
Object classification model training module 1400, for, for any object, determining a candidate set from a first feature set containing the image features of the object, ranking the features in the candidate set by a second feature set containing the image features of the object, and choosing the features at the set ranks in the candidate set as the training features of the object, so as to obtain the training features of each object; wherein the image features of the object contained in the second feature set exceed the image features of the object contained in the first feature set; and training the object classification model according to the training features of each object.
Optionally, the human-computer interaction device provided in the embodiment of the present invention may also be used to:
use the data input by the user for correcting the machine parameters as the feature values and label data of the learning network parameters of the robot; and update the learning network parameters of the robot according to the feature values and label data.
Optionally, the module architecture of the human-computer interaction device described above may be loaded into a human-computer interaction terminal in program form. The structure of the human-computer interaction terminal may be as shown in Fig. 3, comprising: at least one memory and at least one processor; wherein the memory stores a program, and the processor calls the program; the program is used for:
obtaining control information conveyed by a user, the control information including voice information;
extracting a text feature of the voice information;
determining a text feature vector corresponding to the text feature;
determining, according to a pre-trained speech classification model, a speech sample matching the text feature vector; the speech classification model indicating the attribution probability of a text feature vector to a corresponding speech sample;
using the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
generating a target control instruction according to the voice control instruction.
Each embodiment in this specification is described in a progressive manner; the emphasis of each embodiment is its difference from the other embodiments, and the same or similar parts of the embodiments may refer to each other. As for the device disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively simple, and relevant points may refer to the description of the method part.
Those skilled in the art may further appreciate that the units and algorithm steps described in conjunction with the embodiments disclosed herein can be realized with electronic hardware, computer software, or a combination of the two; in order to clearly demonstrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to achieve the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in conjunction with the embodiments disclosed herein may be implemented directly with hardware, a software module executed by a processor, or a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principle defined herein can be realized in other embodiments without departing from the core idea or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (15)
1. A human-computer interaction method, characterized by comprising:
obtaining control information conveyed by a user, the control information comprising voice information;
extracting a text feature of the voice information;
determining a text feature vector corresponding to the text feature;
determining, according to a pre-trained speech classification model, a speech sample matching the text feature vector; the speech classification model indicating the attribution probability of a text feature vector to a corresponding speech sample;
using the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
generating a target control instruction according to the voice control instruction.
2. The human-computer interaction method according to claim 1, characterized in that the determining, according to the pre-trained speech classification model, the speech sample matching the text feature vector comprises:
determining, according to the speech classification model, the speech samples to which the text feature vector may belong, together with the attribution probability of each speech sample to which it may belong;
choosing the speech sample with the highest attribution probability as the speech sample matching the text feature vector.
3. The human-computer interaction method according to claim 1 or 2, characterized by further comprising:
obtaining a training corpus, the training corpus recording speech samples of each voice control instruction, one voice control instruction corresponding to at least one speech sample;
extracting the text feature of each speech sample to obtain multiple text features;
performing feature vector weighting on each text feature respectively to obtain the text feature vector of each text feature;
modeling, according to a machine learning algorithm, the attribution probability of each text feature vector to the corresponding speech sample, to obtain the speech classification model.
4. The human-computer interaction method according to claim 3, characterized in that the performing feature vector weighting on each text feature respectively to obtain the text feature vector of each text feature comprises:
for a text feature, determining the number of occurrences of the words of the text feature in the corresponding speech sample, and the number of occurrences in the training corpus;
determining, according to the number of occurrences of the words of the text feature in the corresponding speech sample and in the training corpus, the importance level of the text feature in the corresponding speech sample; wherein the importance level is proportional to the number of occurrences of the words of the text feature in the speech sample, and inversely proportional to the number of occurrences of the words of the text feature in the corpus;
determining the text feature vector corresponding to the text feature according to the importance level.
5. The human-computer interaction method according to claim 3, characterized in that the modeling, according to a machine learning algorithm, the attribution probability of each text feature vector to the corresponding speech sample to obtain the speech classification model comprises:
modeling, using a maximum entropy algorithm, the attribution probability of each text feature vector to the corresponding voice control instruction, to obtain a maximum entropy classification model with uniform probability distribution.
6. The human-computer interaction method according to claim 1, characterized in that the control information further comprises gesture information; the gesture information comprising: a gesture position feature and a gesture posture feature extracted from a user gesture image;
the method further comprising:
processing the gesture position feature according to an adaptive interval Kalman filter to obtain a target gesture position feature; and processing the gesture posture feature according to an improved particle filter to obtain a target gesture posture feature;
fusing the target gesture position feature and the target gesture posture feature to determine the gesture feature of the user;
determining the gesture control instruction corresponding to the gesture feature;
the generating a target control instruction according to the voice control instruction comprising:
generating the target control instruction according to the voice control instruction and the gesture control instruction.
7. The human-computer interaction method according to claim 6, characterized in that the processing the gesture position feature according to the adaptive interval Kalman filter to obtain the target gesture position feature comprises:
determining a gesture acceleration change rule according to the acceleration corresponding to the gesture position feature;
filtering, according to the model of the adaptive interval Kalman filter, noise deviating from the gesture acceleration change rule;
estimating, using the model of the adaptive interval Kalman filter, the gesture coordinates, gesture speed and acceleration of the current moment according to the gesture coordinates, gesture speed and acceleration of the previous moment in the noise-filtered gesture position feature, to determine the target gesture position feature of the current moment.
8. The man-machine interaction method according to claim 6, characterized in that said processing the gesture posture feature according to an improved particle filter to obtain the target gesture posture feature comprises:
obtaining the rotation angles, about each axis of a three-dimensional coordinate system, of the human hand represented by the gesture posture feature;
determining quaternion components according to the rotation angles of the human hand about each axis of the three-dimensional coordinate system;
determining the posterior probability of each hand particle according to the improved particle filter;
iteratively processing the quaternion components according to the posterior probability to obtain target quaternion components, thereby obtaining the target gesture posture feature.
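The "improved" particle filter itself is not specified in this excerpt. As a sketch of the two steps the claim does name, the code below pairs the textbook Euler-angle-to-quaternion conversion with a toy bootstrap particle filter over a single rotation angle; the Gaussian weights play the role of the per-particle posterior probability.

```python
import math
import random

def euler_to_quaternion(roll, pitch, yaw):
    """Quaternion components (w, x, y, z) from rotation angles about the
    three coordinate axes, as in the claim's angle-to-quaternion step."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)

def particle_filter_angle(observations, n=500, noise=0.05):
    """Toy bootstrap particle filter: estimate one rotation angle from
    noisy observations by weighting particles with a Gaussian likelihood
    (the per-particle 'posterior probability') and resampling."""
    particles = [random.uniform(-math.pi, math.pi) for _ in range(n)]
    for z in observations:
        weights = [math.exp(-((p - z) ** 2) / (2 * noise ** 2))
                   for p in particles]
        particles = random.choices(particles, weights=weights, k=n)
        particles = [p + random.gauss(0, noise / 2) for p in particles]  # jitter
    return sum(particles) / n

random.seed(1)
true_angle = 0.8
obs = [true_angle + random.gauss(0, 0.05) for _ in range(30)]
est = particle_filter_angle(obs)          # filtered rotation angle
q = euler_to_quaternion(est, 0.0, 0.0)    # corresponding quaternion
```

The resulting quaternion is unit-norm by construction, which is what makes it usable as a posture representation.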
9. The man-machine interaction method according to claim 6, characterized in that said generating a target control instruction according to the voice control instruction and the gesture control instruction comprises:
determining voice control variables corresponding to the voice control instruction, the voice control variables including: the operation direction keyword indicated by the voice control instruction, the operation keyword, the operation value corresponding to the operation keyword, and the operation unit; and determining gesture control variables corresponding to the gesture control instruction;
combining the voice control variables and the gesture control variables to form a control vector describing the target control instruction.
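A minimal sketch of the combination step: the voice control variables (direction keyword, operation keyword, value, unit) and the gesture control variable are packed into one structure describing the target instruction. The field names and example values are illustrative, not from the patent.

```python
def build_control_vector(voice_cmd, gesture_cmd):
    """Combine the voice control variables with the gesture control
    variable into a single control vector (here a dict) that describes
    the target control instruction."""
    return {
        "direction": voice_cmd["direction"],   # operation direction keyword
        "operation": voice_cmd["operation"],   # operation keyword
        "value": voice_cmd["value"],           # operation value
        "unit": voice_cmd["unit"],             # operation unit
        "gesture": gesture_cmd,                # gesture control variable
    }

vec = build_control_vector(
    {"direction": "forward", "operation": "move", "value": 2, "unit": "meters"},
    "point_left",
)
```

A downstream executor would then read the vector's fields to drive the robot or terminal.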
10. The man-machine interaction method according to claim 1, characterized in that the method further comprises:
obtaining an environment scene image;
determining image features of the environment scene image;
extracting a target keyword from the voice information conveyed by the user;
matching, according to a pre-trained object-class model, the image feature corresponding to the target keyword from among the image features of the environment scene image, the object-class model indicating the image features corresponding to each object;
determining the object in the environment scene image that corresponds to the matched image feature as the recognized target object, the target object being the object at which the target control instruction is directed when executed.
11. The man-machine interaction method according to claim 10, characterized in that the method further comprises:
for any object, determining a candidate set from a first feature set containing image features of the object, ranking the features in the candidate set according to a second feature set containing image features of the object, and choosing the features at set ranking positions in the candidate set as the training features of the object, so as to obtain the training features of each object; wherein the second feature set contains more image features of the object than the first feature set does;
training the object-class model according to the training features of each object.
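The selection in claim 11 can be sketched as follows: the candidate set holds the object's features from the first feature set, the larger second feature set supplies the scores used to rank them, and the features at the set positions become training features. Feature names and scores here are invented for illustration.

```python
def select_training_features(first_set, second_set, positions=(0, 1, 2)):
    """Rank the candidate set (features from the first feature set) by
    scores recorded in the second feature set, then keep the features at
    the set ranking positions as the object's training features."""
    candidates = list(first_set)                           # candidate set
    ordered = sorted(candidates,
                     key=lambda f: second_set.get(f, 0.0),
                     reverse=True)                         # rank by 2nd set
    return [ordered[i] for i in positions if i < len(ordered)]

# the second feature set contains more of the object's features
first_set = ["edge", "color", "corner", "texture"]
second_set = {"edge": 0.7, "color": 0.9, "corner": 0.1,
              "texture": 0.4, "shape": 0.95}
feats = select_training_features(first_set, second_set)
```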
12. The man-machine interaction method according to claim 1, characterized in that the method further comprises:
taking the data input by the user for correcting the robot's parameters as the feature values and label data for the robot's learning network parameters;
updating the robot's learning network parameters according to the feature values and the label data.
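A minimal stand-in for the update in claim 12: each user correction is treated as one (feature value, label) training pair, and the "learning network" is reduced to a linear model updated by a single gradient step per correction. The learning rate and inputs are illustrative.

```python
def update_parameters(weights, feature, label, lr=0.1):
    """Take one gradient step on a linear model using a user correction
    as the (feature value, label) pair."""
    pred = sum(w * x for w, x in zip(weights, feature))
    err = pred - label                    # error against the user's label
    return [w - lr * err * x for w, x in zip(weights, feature)]

weights = [0.0, 0.0]
# the user repeatedly corrects the robot toward output 1.0 for this input
for _ in range(50):
    weights = update_parameters(weights, [1.0, 0.5], 1.0)
prediction = weights[0] * 1.0 + weights[1] * 0.5
```

After repeated corrections the model's prediction converges to the user-supplied label, which is the intended effect of the claim's update step.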
13. A human-computer interaction device, characterized by comprising:
a control information obtaining module, configured to obtain the control information conveyed by a user, the control information including voice information;
a text feature extraction module, configured to extract text features of the voice information;
a text feature vector determining module, configured to determine the text feature vector corresponding to the text features;
a speech sample determining module, configured to determine, according to a pre-trained speech classification model, the speech sample matching the text feature vector, the speech classification model indicating the probability that a text feature vector belongs to the corresponding speech sample;
a voice instruction determining module, configured to take the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
a target instruction generating module, configured to generate a target control instruction according to the voice control instruction.
14. The human-computer interaction device according to claim 13, characterized in that the control information further includes gesture information, the gesture information including: a gesture position feature and a gesture posture feature extracted from an image of the user's gesture;
the device further comprises:
an adaptive interval Kalman filter processing module, configured to process the gesture position feature according to an adaptive interval Kalman filter to obtain a target gesture position feature;
an improved particle filter processing module, configured to process the gesture posture feature according to an improved particle filter to obtain a target gesture posture feature;
a gesture feature determining module, configured to fuse the target gesture position feature and the target gesture posture feature to determine the user's gesture feature;
a gesture control instruction determining module, configured to determine the gesture control instruction corresponding to the gesture feature;
wherein the target instruction generating module is configured to generate the target control instruction specifically by: generating the target control instruction according to the voice control instruction and the gesture control instruction.
15. A human-computer interaction terminal, characterized by comprising: at least one memory and at least one processor;
the memory stores a program, and the processor calls the program; the program is configured to:
obtain the control information conveyed by a user, the control information including voice information;
extract text features of the voice information;
determine the text feature vector corresponding to the text features;
determine, according to a pre-trained speech classification model, the speech sample matching the text feature vector, the speech classification model indicating the probability that a text feature vector belongs to the corresponding speech sample;
take the voice control instruction corresponding to the determined speech sample as the voice control instruction of the voice information;
generate a target control instruction according to the voice control instruction.
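The voice pipeline in claim 15 can be sketched end to end: extract text features (a bag-of-words here), form the text feature vector, pick the speech sample with the highest "belonging" score (a dot product here, standing in for the learned belonging probability), and emit that sample's control instruction. The vocabulary, samples, and command names are all invented for the example.

```python
def voice_to_command(text, sample_vectors, sample_commands, vocab):
    """Map recognized text to a control instruction via the speech-sample
    matching step of claim 15 (toy features and scores)."""
    words = text.lower().split()
    vec = [words.count(w) for w in vocab]            # text feature vector
    def score(sample_vec):                           # belonging-score stand-in
        return sum(a * b for a, b in zip(vec, sample_vec))
    best = max(sample_vectors, key=lambda name: score(sample_vectors[name]))
    return sample_commands[best]                     # matched sample's command

vocab = ["move", "stop", "forward", "back"]
samples = {"go_forward": [1, 0, 1, 0], "halt": [0, 1, 0, 0]}
commands = {"go_forward": "CMD_MOVE_FORWARD", "halt": "CMD_STOP"}
cmd = voice_to_command("please move forward now", samples, commands, vocab)
```

In the patent the matching is done by a pre-trained classification model rather than a raw dot product, but the data flow — text, feature vector, best-matching sample, that sample's instruction — is the same.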
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710408396.7A CN108986801B (en) | 2017-06-02 | 2017-06-02 | Man-machine interaction method and device and man-machine interaction terminal |
PCT/CN2018/088169 WO2018219198A1 (en) | 2017-06-02 | 2018-05-24 | Man-machine interaction method and apparatus, and man-machine interaction terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710408396.7A CN108986801B (en) | 2017-06-02 | 2017-06-02 | Man-machine interaction method and device and man-machine interaction terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108986801A true CN108986801A (en) | 2018-12-11 |
CN108986801B CN108986801B (en) | 2020-06-05 |
Family
ID=64455698
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710408396.7A Active CN108986801B (en) | 2017-06-02 | 2017-06-02 | Man-machine interaction method and device and man-machine interaction terminal |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108986801B (en) |
WO (1) | WO2018219198A1 (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902155A (en) * | 2018-12-29 | 2019-06-18 | 清华大学 | Multi-modal dialog condition processing method, device, medium and calculating equipment |
CN110047487A (en) * | 2019-06-05 | 2019-07-23 | 广州小鹏汽车科技有限公司 | Awakening method, device, vehicle and the machine readable media of vehicle-mounted voice equipment |
CN110444204A (en) * | 2019-07-22 | 2019-11-12 | 北京艾米智能机器人科技有限公司 | A kind of offline intelligent sound control device and its control method |
CN110491390A (en) * | 2019-08-21 | 2019-11-22 | 深圳市蜗牛智能有限公司 | A kind of method of controlling switch and device |
CN111026320A (en) * | 2019-12-26 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Multi-mode intelligent text processing method and device, electronic equipment and storage medium |
CN112037786A (en) * | 2020-08-31 | 2020-12-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method, device, equipment and storage medium |
CN112102831A (en) * | 2020-09-15 | 2020-12-18 | 海南大学 | Cross-data, information and knowledge modal content encoding and decoding method and component |
CN112101219A (en) * | 2020-09-15 | 2020-12-18 | 济南大学 | Intention understanding method and system for elderly accompanying robot |
CN112257434A (en) * | 2019-07-02 | 2021-01-22 | Tcl集团股份有限公司 | Unmanned aerial vehicle control method, system, mobile terminal and storage medium |
CN112307974A (en) * | 2020-10-31 | 2021-02-02 | 海南大学 | User behavior content coding and decoding method of cross-data information knowledge mode |
CN112395456A (en) * | 2021-01-20 | 2021-02-23 | 北京世纪好未来教育科技有限公司 | Audio data classification method, audio data training device, audio data medium and computer equipment |
CN112527113A (en) * | 2020-12-09 | 2021-03-19 | 北京地平线信息技术有限公司 | Method and apparatus for training gesture recognition and gesture recognition network, medium, and device |
CN112908328A (en) * | 2021-02-02 | 2021-06-04 | 安通恩创信息技术(北京)有限公司 | Equipment control method, system, computer equipment and storage medium |
CN113569712A (en) * | 2021-07-23 | 2021-10-29 | 北京百度网讯科技有限公司 | Information interaction method, device, equipment and storage medium |
CN113674742A (en) * | 2021-08-18 | 2021-11-19 | 北京百度网讯科技有限公司 | Man-machine interaction method, device, equipment and storage medium |
CN113779201A (en) * | 2021-09-16 | 2021-12-10 | 北京百度网讯科技有限公司 | Method and device for recognizing instruction and voice interaction screen |
CN114490971A (en) * | 2021-12-30 | 2022-05-13 | 重庆特斯联智慧科技股份有限公司 | Robot control method and system based on man-machine conversation interaction |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111382267B (en) * | 2018-12-29 | 2023-10-10 | 深圳市优必选科技有限公司 | Question classification method, question classification device and electronic equipment |
CN111796926A (en) * | 2019-04-09 | 2020-10-20 | Oppo广东移动通信有限公司 | Instruction execution method and device, storage medium and electronic equipment |
CN111862946B (en) * | 2019-05-17 | 2024-04-19 | 北京嘀嘀无限科技发展有限公司 | Order processing method and device, electronic equipment and storage medium |
CN112527095A (en) * | 2019-09-18 | 2021-03-19 | 奇酷互联网络科技(深圳)有限公司 | Man-machine interaction method, electronic device and computer storage medium |
CN111177375B (en) * | 2019-12-16 | 2023-06-02 | 医渡云(北京)技术有限公司 | Electronic document classification method and device |
CN113726686A (en) * | 2020-05-26 | 2021-11-30 | 中兴通讯股份有限公司 | Flow identification method and device, electronic equipment and storage medium |
CN112102830B (en) * | 2020-09-14 | 2023-07-25 | 广东工业大学 | Coarse granularity instruction identification method and device |
CN112783324B (en) * | 2021-01-14 | 2023-12-01 | 科大讯飞股份有限公司 | Man-machine interaction method and device and computer storage medium |
CN115476366B (en) * | 2021-06-15 | 2024-01-09 | 北京小米移动软件有限公司 | Control method, device, control equipment and storage medium for foot robot |
CN113539243A (en) * | 2021-07-06 | 2021-10-22 | 上海商汤智能科技有限公司 | Training method of voice classification model, voice classification method and related device |
CN114166204A (en) * | 2021-12-03 | 2022-03-11 | 东软睿驰汽车技术(沈阳)有限公司 | Repositioning method and device based on semantic segmentation and electronic equipment |
CN117122859B (en) * | 2023-09-08 | 2024-03-01 | 广州普鸿信息科技服务有限公司 | Intelligent voice interaction fire-fighting guard system and method |
CN117611924B (en) * | 2024-01-17 | 2024-04-09 | 贵州大学 | Plant leaf phenotype disease classification method based on graphic subspace joint learning |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120239396A1 (en) * | 2011-03-15 | 2012-09-20 | At&T Intellectual Property I, L.P. | Multimodal remote control |
CN102945672A (en) * | 2012-09-29 | 2013-02-27 | 深圳市国华识别科技开发有限公司 | Voice control system for multimedia equipment, and voice control method |
CN104423543A (en) * | 2013-08-26 | 2015-03-18 | 联想(北京)有限公司 | Information processing method and device |
CN104656877A (zh) * | 2013-11-18 | 2015-05-27 | 李君 | Human-machine interaction method based on gesture and speech recognition control as well as apparatus and application of human-machine interaction method |
CN106095109A (en) * | 2016-06-20 | 2016-11-09 | 华南理工大学 | The method carrying out robot on-line teaching based on gesture and voice |
CN106125925A (en) * | 2016-06-20 | 2016-11-16 | 华南理工大学 | Method is arrested based on gesture and voice-operated intelligence |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9152376B2 (en) * | 2011-12-01 | 2015-10-06 | At&T Intellectual Property I, L.P. | System and method for continuous multimodal speech and gesture interaction |
CN107150347B (en) * | 2017-06-08 | 2021-03-30 | 华南理工大学 | Robot perception and understanding method based on man-machine cooperation |
2017-06-02: CN application CN201710408396.7A filed; granted as CN108986801B (status: active)
2018-05-24: PCT application PCT/CN2018/088169 filed (published as WO2018219198A1)
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109902155A (en) * | 2018-12-29 | 2019-06-18 | 清华大学 | Multi-modal dialog condition processing method, device, medium and calculating equipment |
CN110047487A (en) * | 2019-06-05 | 2019-07-23 | 广州小鹏汽车科技有限公司 | Awakening method, device, vehicle and the machine readable media of vehicle-mounted voice equipment |
CN112257434B (en) * | 2019-07-02 | 2023-09-08 | Tcl科技集团股份有限公司 | Unmanned aerial vehicle control method, unmanned aerial vehicle control system, mobile terminal and storage medium |
CN112257434A (en) * | 2019-07-02 | 2021-01-22 | Tcl集团股份有限公司 | Unmanned aerial vehicle control method, system, mobile terminal and storage medium |
CN110444204A (en) * | 2019-07-22 | 2019-11-12 | 北京艾米智能机器人科技有限公司 | A kind of offline intelligent sound control device and its control method |
CN110491390A (en) * | 2019-08-21 | 2019-11-22 | 深圳市蜗牛智能有限公司 | A kind of method of controlling switch and device |
CN111026320B (en) * | 2019-12-26 | 2022-05-27 | 腾讯科技(深圳)有限公司 | Multi-mode intelligent text processing method and device, electronic equipment and storage medium |
CN111026320A (en) * | 2019-12-26 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Multi-mode intelligent text processing method and device, electronic equipment and storage medium |
CN112037786A (en) * | 2020-08-31 | 2020-12-04 | 百度在线网络技术(北京)有限公司 | Voice interaction method, device, equipment and storage medium |
CN112102831A (en) * | 2020-09-15 | 2020-12-18 | 海南大学 | Cross-data, information and knowledge modal content encoding and decoding method and component |
CN112101219A (en) * | 2020-09-15 | 2020-12-18 | 济南大学 | Intention understanding method and system for elderly accompanying robot |
CN112101219B (en) * | 2020-09-15 | 2022-11-04 | 济南大学 | Intention understanding method and system for elderly accompanying robot |
CN112307974A (en) * | 2020-10-31 | 2021-02-02 | 海南大学 | User behavior content coding and decoding method of cross-data information knowledge mode |
CN112527113A (en) * | 2020-12-09 | 2021-03-19 | 北京地平线信息技术有限公司 | Method and apparatus for training gesture recognition and gesture recognition network, medium, and device |
CN112395456B (en) * | 2021-01-20 | 2021-04-13 | 北京世纪好未来教育科技有限公司 | Audio data classification method, audio data training device, audio data medium and computer equipment |
CN112395456A (en) * | 2021-01-20 | 2021-02-23 | 北京世纪好未来教育科技有限公司 | Audio data classification method, audio data training device, audio data medium and computer equipment |
CN112908328A (en) * | 2021-02-02 | 2021-06-04 | 安通恩创信息技术(北京)有限公司 | Equipment control method, system, computer equipment and storage medium |
CN113569712A (en) * | 2021-07-23 | 2021-10-29 | 北京百度网讯科技有限公司 | Information interaction method, device, equipment and storage medium |
CN113569712B (en) * | 2021-07-23 | 2023-11-14 | 北京百度网讯科技有限公司 | Information interaction method, device, equipment and storage medium |
CN113674742A (en) * | 2021-08-18 | 2021-11-19 | 北京百度网讯科技有限公司 | Man-machine interaction method, device, equipment and storage medium |
CN113779201A (en) * | 2021-09-16 | 2021-12-10 | 北京百度网讯科技有限公司 | Method and device for recognizing instruction and voice interaction screen |
CN113779201B (en) * | 2021-09-16 | 2023-06-30 | 北京百度网讯科技有限公司 | Method and device for identifying instruction and voice interaction screen |
CN114490971A (en) * | 2021-12-30 | 2022-05-13 | 重庆特斯联智慧科技股份有限公司 | Robot control method and system based on man-machine conversation interaction |
CN114490971B (en) * | 2021-12-30 | 2024-04-05 | 重庆特斯联智慧科技股份有限公司 | Robot control method and system based on man-machine interaction |
Also Published As
Publication number | Publication date |
---|---|
WO2018219198A1 (en) | 2018-12-06 |
CN108986801B (en) | 2020-06-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108986801A (en) | A kind of man-machine interaction method, device and human-computer interaction terminal | |
Mariappan et al. | Real-time recognition of Indian sign language | |
CN110135249B (en) | Human behavior identification method based on time attention mechanism and LSTM (least Square TM) | |
CN110555481A (en) | Portrait style identification method and device and computer readable storage medium | |
CN103984416A (en) | Gesture recognition method based on acceleration sensor | |
CN110741377A (en) | Face image processing method and device, storage medium and electronic equipment | |
Santhalingam et al. | Sign language recognition analysis using multimodal data | |
CN111680550B (en) | Emotion information identification method and device, storage medium and computer equipment | |
CN111444488A (en) | Identity authentication method based on dynamic gesture | |
CN105917356A (en) | Contour-based classification of objects | |
CN111813910A (en) | Method, system, terminal device and computer storage medium for updating customer service problem | |
Neverova | Deep learning for human motion analysis | |
CN110807391A (en) | Human body posture instruction identification method for human-unmanned aerial vehicle interaction based on vision | |
Yan et al. | Human-object interaction recognition using multitask neural network | |
CN110427864B (en) | Image processing method and device and electronic equipment | |
Kumarage et al. | Real-time sign language gesture recognition using still-image comparison & motion recognition | |
CN113449548A (en) | Method and apparatus for updating object recognition model | |
CN110991279A (en) | Document image analysis and recognition method and system | |
CN111798367A (en) | Image processing method, image processing device, storage medium and electronic equipment | |
CN106599926A (en) | Expression picture pushing method and system | |
CN104635930A (en) | Information processing method and electronic device | |
Li et al. | [Retracted] Human Motion Representation and Motion Pattern Recognition Based on Complex Fuzzy Theory | |
CN111611917A (en) | Model training method, feature point detection device, feature point detection equipment and storage medium | |
CN111571567A (en) | Robot translation skill training method and device, electronic equipment and storage medium | |
CN116503654A (en) | Multimode feature fusion method for carrying out character interaction detection based on bipartite graph structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||