CN106512393A - Application voice control method and system suitable for virtual reality environment

Info

Publication number: CN106512393A
Application number: CN201610899523.3A
Authority: CN
Inventors: 曹志强; 谢英臣
Original assignee: Shanghai Outsider Mdt Infotech Ltd
Current assignee: Shanghai Outsider Mdt Infotech Ltd
Priority date: 2016-10-14
Filing date: 2016-10-14
Publication date: 2017-03-22

Abstract

The invention provides an application voice control method and system suitable for a virtual reality environment. The method includes: a voice acquisition step of acquiring a voice input command from a user; a voice command recognition step of extracting one or more voice input words from the user's voice input command and matching those words to obtain a voice command; and a control command acquisition step of acquiring the control command associated with the voice command. The method and system avoid the limitation on command input caused by the lack of hardware input devices (such as a mouse and keyboard) in a virtual reality game environment; greatly improve the feedback speed of obtaining a result from a voice command; let the user control when input is captured instead of monitoring input continuously; and reduce interference from players' unintended speech and external sounds.

Description

Application voice control method and system suitable for a virtual reality environment

Technical field

The present invention relates to the technical fields of virtual reality games and voice technology, and in particular to an application voice control method and system suitable for a virtual reality environment.

Background technology

As virtual reality technology has gradually matured, people have paid increasing attention to it, and virtual reality games are one of the focal points.

The electronic game industry has developed over many decades, and people have become accustomed to controlling games with a mouse and keyboard. In a virtual reality environment, however, hardware limitations prevent people from controlling a game with a mouse and keyboard. How to let players experience game content comfortably and naturally in a virtual reality environment has become a major problem for virtual reality game developers.

Over the years, voice technology has developed considerably and has gradually moved from highly specialized research and production fields into everyday life. The best known of these technologies is speech recognition, which relies on a huge sample library, recognizes vocabulary with complex recognition algorithms, and assembles complete sentences using artificial neural networks and grammar-rule-based speech processing mechanisms. This requires substantial material and human resources, and small and medium-sized enterprises can hardly bear the associated costs. Because of the size of the database and the complexity of the algorithms, speech recognition has relatively high latency and cannot provide the immediate feedback people need when using game software for entertainment. Moreover, human language is extremely complex, so the accuracy of speech recognition is inversely related to the length of the spoken input.

For the above reasons, no company in the computer game field has yet applied voice technology to the control of game systems; game systems are still controlled through traditional input methods such as the keyboard and mouse.

Summary of the invention

In view of the defects in the prior art, the object of the present invention is to provide an application voice control method and system suitable for a virtual reality environment.

The application voice control method suitable for a virtual reality environment provided by the present invention includes:

Voice acquisition step: acquiring a voice input command from the user;

Voice command recognition step: extracting one or more voice input words from the user's voice input command, and obtaining a voice command by matching the voice input words;

Control command acquisition step: acquiring the control command associated with the voice command.

Preferably, the voice acquisition step includes:

Acquisition time window setting step: determining a voice acquisition time window according to the user's operation;

Time-limited voice acquisition step: acquiring the user's voice input command within the voice acquisition time window;

Sentence break determination step: while the user's voice input command is being acquired, treating any pause in speech that is greater than or equal to a pause-time threshold as a sentence break marker.

Preferably, the acquisition time window setting step includes:

Time window start time setting step: outside a voice acquisition time window, taking the moment at which the user operates the input device as the start time of the current voice acquisition time window;

Time window end time setting step: while the current voice acquisition time window is open, taking the moment at which the user operates the input device as the end time of that voice acquisition time window.

Preferably, the voice command recognition step includes:

Word splitting step: according to the language model library, extracting one or more voice input words from the user's voice input command, and forming the one or more voice input words into a group to be recognized;

Matching step: matching the group to be recognized against the language model library to obtain the speech recognition group in the language model library that matches the group to be recognized;

wherein speech recognition groups correspond one-to-one with voice commands.

Preferably, the language model library module is built solely from the voice commands, through:

Voice command presetting step: presetting one or more voice commands, wherein the voice commands are stored in the language model library;

Speech recognition group construction step: for each voice command, constructing a speech recognition group from one or more keywords extracted from that voice command, wherein the speech recognition group is stored in the language model library module;

Command association step: establishing a one-to-one association between speech recognition groups and control commands, wherein the association is stored in the language model library module.

The application voice control system suitable for a virtual reality environment provided by the present invention includes:

Voice acquisition module: acquires a voice input command from the user;

Voice command recognition module: extracts one or more voice input words from the user's voice input command, and obtains a voice command by matching the voice input words;

Control command acquisition module: acquires the control command associated with the voice command.

Preferably, the voice acquisition module includes:

Acquisition time window setting module: determines a voice acquisition time window according to the user's operation;

Time-limited voice acquisition module: acquires the user's voice input command within the voice acquisition time window;

Sentence break determination module: while the user's voice input command is being acquired, treats any pause in speech that is greater than or equal to a pause-time threshold as a sentence break marker.

Preferably, the acquisition time window setting module includes:

Time window start time setting module: outside a voice acquisition time window, takes the moment at which the user operates the input device as the start time of the current voice acquisition time window;

Time window end time setting module: while the current voice acquisition time window is open, takes the moment at which the user operates the input device as the end time of that voice acquisition time window.

Preferably, the voice command recognition module includes:

Word splitting module: according to the language model library, extracts one or more voice input words from the user's voice input command, and forms the one or more voice input words into a group to be recognized;

Matching module: matches the group to be recognized against the language model library to obtain the speech recognition group in the language model library that matches the group to be recognized;

wherein speech recognition groups correspond one-to-one with voice commands.

Preferably, the system further includes:

Voice command presetting module: presets one or more voice commands, wherein the voice commands are stored in the language model library;

Speech recognition group construction module: for each voice command, constructs a speech recognition group from one or more keywords extracted from that voice command, wherein the speech recognition group is stored in the language model library module;

Command association module: establishes a one-to-one association between speech recognition groups and control commands, wherein the association is stored in the language model library module;

wherein the language model library module is built solely from the voice commands.

Compared with the prior art, the present invention has the following beneficial effects:

1. It compensates for and avoids the situation in which, in a virtual reality game environment, command input is extremely limited because hardware input devices (such as a mouse and keyboard) are lacking. For example, with the existing HTC VIVE virtual reality game input devices, the user can control the game only through two handheld controllers, each of which has only 6 buttons.

2. The feedback speed of obtaining a result from a voice command is significantly improved. Editing the language model library reduces its size; in addition, because the grammar-rule-based speech processing mechanism is discarded and only the spoken words themselves are matched, the computation required for recognition is also greatly reduced.

3. The player controls when input is captured, rather than the system monitoring input at all times, which reduces interference from the player's unintended speech and from external sounds. A pause-time marker is set so that the player controls the pause time, reducing sentence break errors caused by the brief pauses of natural speech.

4. The recognition rate for long sentences is substantially improved. Because human language is complex and unpredictable, it is very difficult for a computer to assemble complete sentences using a grammar-rule-based speech processing mechanism, so conventional speech recognition has a relatively low recognition rate for long sentences. The method and system of the present invention instead match and screen the keywords in a voice command, so the more keywords a voice command contains, the more easily it is matched correctly, which greatly increases the probability of recognizing long sentences.

5. The cost of building a usable voice control system is significantly reduced. Acoustic models, dictionaries, and even large-vocabulary language models already exist for many languages and are available for download, but most of the content of these huge model libraries is not actually needed; it cannot simply be deleted because of constraints imposed by the speech recognition algorithm and considerations of software content updates. Meanwhile, most enterprises cannot bear the cost of collecting dedicated speech data. With the method and system of the present invention, a vendor can edit a language model library suited to its own game software, which both allows voice resources required by content updates to be added and removes the constraints of acquiring a huge acoustic model library and of complex semantic processing. Vendors thus have more ways to bring enjoyment to people and to create more value for society.

6. It is closer to people's everyday habits and has an extremely low learning cost. The keyboard and mouse have existed for decades, yet many special groups of people still need a long time to learn and master them. Speech, by contrast, is a skill that everyone already has; it needs no relearning and is easier to accept, understand, and remember.

7. Interaction and control in a virtual reality environment become better and more natural. In daily life people are used to interacting and controlling through speech and gestures, and what virtual reality games emphasize is precisely a strong sense of immersion. With the method and system of the present invention, people are freed from the limitation of hand-only control and can interact and control in the more natural way of combining voice and gestures.

Brief description of the drawings

Other features, objects, and advantages of the present invention will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the drawings:

Fig. 1 is a module relationship diagram of the present invention.

Fig. 2 is a schematic diagram of the speech processing principle of the present invention.

Fig. 3 is a flow chart of the steps of the present invention.

Detailed description of the embodiments

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any way. It should be noted that those of ordinary skill in the art can make a number of changes and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention.

The application voice control method suitable for a virtual reality environment provided by the present invention includes a voice acquisition step of acquiring a voice input command from the user. Preferably, the voice acquisition step, the acquisition time window setting step, and the voice command recognition step each include the sub-steps set out for them above, and speech recognition groups correspond one-to-one with voice commands.

The present invention also provides an application voice control system suitable for a virtual reality environment. The system can be implemented by following the step flow of the application voice control method suitable for a virtual reality environment. The system is described in detail below.

The application voice control system suitable for a virtual reality environment includes: a voice command presetting module, which presets one or more voice commands, wherein the voice commands are stored in the language model library; a speech recognition group construction module, which, for each voice command, constructs a speech recognition group from one or more keywords extracted from that voice command, wherein the speech recognition group is stored in the language model library module; and a command association module, which establishes a one-to-one association between speech recognition groups and control commands, wherein the association is stored in the language model library module. The language model library module is built solely from the voice commands.

Specifically, a traditional language model library module (language model and dictionary) contains a huge amount of information covering an entire language, such as word pronunciations, occurrence probabilities, and word combinations. The present invention instead builds the language model and dictionary only from the voice commands involved in an application such as a game, rather than using the model and dictionary of the entire language. This greatly reduces the size of the language model and dictionary and thereby improves the accuracy and speed of speech recognition. In the speech recognition group construction module, the words in a voice command can be divided into two priority levels, high and low, and the high-priority words are then used as keywords. The language model library module includes a language model and a dictionary. The information stored in the language model constrains the word search: it defines the probability that a given word follows the previously recognized word, so that unlikely words can be excluded from the matching process. For example, if "I" has already been recognized, the probability that "eat" follows is high, whereas the probability that "egg" follows is extremely low. The dictionary contains the mapping from words to phonemes. The pronunciation of every word consists of phonemes, but because people pronounce words differently a word may have multiple mappings; for example, the phonemes of "Fire" may be "F AY ER" or "F AY R", and storing both improves the probability of recognition.
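The word-to-phoneme dictionary with multiple pronunciations per word can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the names `PRONUNCIATIONS` and `words_for_phonemes` are hypothetical, the "fire" entries are the two variants quoted in the text, and the "map" entry is an assumed example.

```python
# Hypothetical pronunciation dictionary: each word maps to one or more
# phoneme sequences. The "fire" variants come from the text above; the
# "map" entry is an assumed example for illustration only.
PRONUNCIATIONS = {
    "fire": [("F", "AY", "ER"), ("F", "AY", "R")],
    "map": [("M", "AE", "P")],
}


def words_for_phonemes(phonemes, dictionary=PRONUNCIATIONS):
    """Return every word that has the given phoneme sequence as a variant."""
    target = tuple(phonemes)
    return [word for word, variants in dictionary.items() if target in variants]


# Both pronunciation variants resolve to the same word.
print(words_for_phonemes(["F", "AY", "ER"]))
print(words_for_phonemes(["F", "AY", "R"]))
```

Keeping several variants per word means a speaker's pronunciation only has to match one of them, which is the effect the text attributes to multiple mappings.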

The application voice control system suitable for a virtual reality environment further includes: a voice acquisition module, which acquires a voice input command from the user; a voice command recognition module, which extracts one or more voice input words from the user's voice input command and obtains a voice command by matching the voice input words; and a control command acquisition module, which acquires the control command associated with the voice command.

The voice acquisition module includes: an acquisition time window setting module, which determines a voice acquisition time window according to the user's operation; a time-limited voice acquisition module, which acquires the user's voice input command within the voice acquisition time window; and a sentence break determination module, which, while the user's voice input command is being acquired, treats any pause in speech that is greater than or equal to a pause-time threshold as a sentence break marker. The acquisition time window setting module includes: a time window start time setting module, which, outside a voice acquisition time window, takes the moment at which the user operates the input device as the start time of the current voice acquisition time window; and a time window end time setting module, which, while the current voice acquisition time window is open, takes the moment at which the user operates the input device as the end time of that voice acquisition time window.

Specifically, the input device can be a designated button on the virtual device. By activating this button, the user controls when voice input starts and ends, so the game system does not need to monitor voice input at all times. While the designated button on the virtual device has not been activated, the system is outside the voice acquisition time window; any voice input command the user utters is treated as invalid and is not passed to the game system, which largely avoids interference from the user's unintended speech and from other sounds. In addition, a pause in speech of a certain duration (for example, a pause lasting 1 second) is used as the sentence break marker: after the user has spoken a continuous segment of speech, once the pause reaches 1 second the system automatically determines that this command input has ended. In this way the user controls the pauses between statements, which avoids sentence break errors caused by the brief pauses of natural speech.
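The button-controlled acquisition window and the pause-based sentence break can be sketched in Python as follows. This is a minimal sketch under assumptions that are not in the source: words arrive as already-recognized `(start_time, end_time, text)` tuples in seconds, successive button presses alternately open and close a window, and the function names are hypothetical. The 1-second default follows the example in the text.

```python
from math import inf


def capture_windows(button_presses):
    """Pair successive button presses into (start, end) acquisition windows.

    An unpaired final press leaves the last window open (end = infinity).
    """
    presses = sorted(button_presses)
    windows = []
    for i in range(0, len(presses), 2):
        end = presses[i + 1] if i + 1 < len(presses) else inf
        windows.append((presses[i], end))
    return windows


def split_commands(words, button_presses, pause_threshold=1.0):
    """Group words spoken inside a window into commands.

    words: list of (start_time, end_time, text) tuples, times in seconds.
    A gap >= pause_threshold between two words is treated as a sentence break.
    Words spoken outside every window are ignored.
    """
    commands = []
    for start, end in capture_windows(button_presses):
        inside = sorted(w for w in words if w[0] >= start and w[1] <= end)
        current, last_end = [], None
        for w_start, w_end, text in inside:
            if current and w_start - last_end >= pause_threshold:
                commands.append(current)
                current = []
            current.append(text)
            last_end = w_end
        if current:
            commands.append(current)
    return commands


words = [
    (0.2, 0.5, "hello"),   # spoken before the button is pressed: ignored
    (2.0, 2.3, "show"), (2.4, 2.6, "me"), (2.9, 3.1, "map"),
    (4.3, 4.6, "fire"), (4.7, 5.0, "debris"),  # follows a 1.2 s pause
    (7.0, 7.4, "oops"),    # spoken after the window is closed: ignored
]
print(split_commands(words, button_presses=[1.5, 6.0]))
```

Short gaps between words inside one command (0.1 s to 0.3 s in the sample data) do not trigger a break, which mirrors the stated goal of tolerating the brief pauses of natural speech.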

The voice command recognition module includes: a word splitting module, which, according to the language model library, extracts one or more voice input words from the user's voice input command and forms the one or more voice input words into a group to be recognized; and a matching module, which matches the group to be recognized against the language model library to obtain the speech recognition group in the language model library that matches the group to be recognized; wherein speech recognition groups correspond one-to-one with voice commands.

Specifically, the voice input words in the group to be recognized are matched and screened against the voice words contained in each speech recognition group; the speech recognition group with the highest matching degree is selected, and this result is used as an index to look up the corresponding game command, which is then used to control the game system. Both the voice input words and the voice words are words, so matching is performed word against word.

The application voice control system suitable for a virtual reality environment further includes a device control module, which controls the game system according to the control command.

A preferred embodiment of the present invention is described below.

Example 1: using the voice command "show me the map" to achieve the effect of "open the map interface" in a game

Example 1 is implemented through the following steps:

Step 1: Suppose there are three voice commands, "show me the map", "show myself", and "fire debris". The related words ("show", "me", "the", "map", "myself", "fire", "debris") form the game's language model library.

Step 2: Each voice command is split and then recombined according to the recognition priority of its words, yielding the corresponding speech recognition group, as follows:

Voice command	Speech recognition group after splitting and recombination
show me the map	"show" + "me" + "map"
show myself	"show" + "myself"
fire debris	"fire" + "debris"
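Step 2 can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the `LOW_PRIORITY` set and the name `build_recognition_group` are assumptions, and the only low-priority word the example actually shows being dropped is "the".

```python
# Assumed low-priority words; the example only shows "the" being dropped.
LOW_PRIORITY = {"the"}


def build_recognition_group(command, low_priority=LOW_PRIORITY):
    """Split a voice command into words and keep the high-priority ones, in order."""
    return tuple(word for word in command.lower().split() if word not in low_priority)


commands = ["show me the map", "show myself", "fire debris"]
groups = {command: build_recognition_group(command) for command in commands}
for command, group in groups.items():
    print(command, "->", " + ".join(group))
```

Tuples are used so that a recognition group can later serve as a dictionary key when it is associated with a control command.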

Step 3: Each speech recognition group is stored in association with a game command, for lookup in the later steps, as follows:

Speech recognition group after splitting and recombination	Game control command
"show" + "me" + "map"	Open the map interface
"show" + "myself"	Open the character interface
"fire" + "debris"	Release a fireball
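The association in Step 3 amounts to a lookup table keyed by recognition group. A sketch in Python, in which the table name, the function names, and the command identifiers such as `open_map_interface` are all hypothetical labels for the game commands listed above:

```python
# Hypothetical association table: recognition group -> game control command.
COMMAND_TABLE = {
    ("show", "me", "map"): "open_map_interface",
    ("show", "myself"): "open_character_interface",
    ("fire", "debris"): "release_fireball",
}


def is_one_to_one(table):
    """Check that no two recognition groups share a control command."""
    return len(set(table.values())) == len(table)


def lookup_control_command(group, table=COMMAND_TABLE):
    """Return the control command for a recognition group, or None if unknown."""
    return table.get(tuple(group))


print(is_one_to_one(COMMAND_TABLE))
print(lookup_control_command(["show", "me", "map"]))
```

The `is_one_to_one` check reflects the requirement that recognition groups and control commands be associated one-to-one; a real system might enforce this when the table is edited.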

Step 4: The voice input command spoken by the user is acquired and converted into a group to be recognized. For example, if the user says the command "show me a map", it is split into the voice input words "show" + "me" + "a" + "map".

Step 5: The group to be recognized, "show" + "me" + "a" + "map", is matched against each of the preset speech recognition groups. The three keywords "show" + "me" + "map" all occur in the group to be recognized and in the correct order, so the matching degree for that group is 100%. The full results are as follows:

Speech recognition group	Matching degree
"show" + "me" + "map"	100%
"show" + "myself"	50%
"fire" + "debris"	0%

The results are screened, and the speech recognition group with the highest matching degree, "show" + "me" + "map", is selected.
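The source does not give a formula for the matching degree. One reading that reproduces the 100% / 50% / 0% values in the example is the fraction of a group's keywords that appear, in order, in the group to be recognized. The Python sketch below implements that assumed reading; `match_degree` and `best_group` are hypothetical names.

```python
def match_degree(candidate, group):
    """Fraction of the group's keywords found in the candidate, in order.

    This scoring rule is an assumption chosen to agree with the example table.
    """
    position = 0
    hits = 0
    for keyword in group:
        try:
            # Search only after the previously matched word to respect order.
            position = candidate.index(keyword, position) + 1
            hits += 1
        except ValueError:
            continue
    return hits / len(group)


def best_group(candidate, groups):
    """Select the recognition group with the highest matching degree."""
    return max(groups, key=lambda group: match_degree(candidate, group))


candidate = ["show", "me", "a", "map"]
groups = [("show", "me", "map"), ("show", "myself"), ("fire", "debris")]
for group in groups:
    print(" + ".join(group), match_degree(candidate, group))
print(best_group(candidate, groups))
```

Because the extra word "a" in the spoken input is simply skipped, the user's phrasing can deviate from the preset command and still select the intended group. A practical system would probably also want a minimum threshold so that a 0% match is not dispatched; that is not addressed in the source.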

Step 6: In the associated storage module, the control command corresponding to the selected speech recognition group is looked up (see Step 3), here "open the map interface", and this game command is sent to the game control system;

Step 7: After receiving the "open the map interface" command, the game control system performs the corresponding game feedback, and the flow ends.

As the above example shows:

The present invention can build a small, targeted language model library by selecting from an existing language model library only the speech data that the software requires, which greatly reduces the volume of data and saves the cost of collecting raw speech data.

Moreover, because matching is performed by word recognition rather than by recognizing the meaning of speech, the amount of computation is greatly reduced, which improves the feedback speed of voice commands.

In addition, because matching is performed against speech recognition groups, each group contains only keywords set in advance, and the more keywords there are, the more accurate the match. This not only improves the recognition success rate for long voice commands but also tolerates deviations in how the user phrases a voice command, making commands easier to remember and use.

The above is only a preferred embodiment of the present invention and is not intended to limit its protection scope. Those skilled in the field of virtual reality games can devise many other modifications, equivalents, and improved embodiments, including but not limited to using voice commands to release skills in a game or to control other game units in a game. Such modifications and embodiments fall within the spirit and scope disclosed in this application and should be included within the protection scope of the present invention.

Those skilled in the art know that, in addition to implementing the system provided by the present invention and its devices, modules, and units purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices, modules, and units achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. The system provided by the present invention and its devices, modules, and units can therefore be regarded as a hardware component, and the devices, modules, and units it includes for realizing various functions can be regarded as structures within that hardware component; the devices, modules, and units for realizing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.

Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the particular embodiments described; those skilled in the art can make various changes or modifications within the scope of the claims without affecting the substance of the present invention. Provided they do not conflict, the embodiments of this application and the features in those embodiments can be combined with one another in any way.

Claims

1. An application voice control method suitable for a virtual reality environment, characterized by including:

Voice acquisition step: acquiring a voice input command from the user;

Voice command recognition step: extracting one or more voice input words from the user's voice input command, and obtaining a voice command by matching the voice input words;

2. The application voice control method suitable for a virtual reality environment according to claim 1, characterized in that the voice acquisition step includes:

Sentence break determination step: while the user's voice input command is being acquired, treating any pause in speech that is greater than or equal to a pause-time threshold as a sentence break marker.

3. The application voice control method suitable for a virtual reality environment according to claim 2, characterized in that the acquisition time window setting step includes:

Time window start time setting step: outside a voice acquisition time window, taking the moment at which the user operates the input device as the start time of the current voice acquisition time window;

Time window end time setting step: while the current voice acquisition time window is open, taking the moment at which the user operates the input device as the end time of that voice acquisition time window.

4. The application voice control method suitable for a virtual reality environment according to claim 1, characterized in that the voice command recognition step includes:

Word splitting step: according to the language model library, extracting one or more voice input words from the user's voice input command, and forming the one or more voice input words into a group to be recognized;

Matching step: matching the group to be recognized against the language model library to obtain the speech recognition group in the language model library that matches the group to be recognized;

wherein speech recognition groups correspond one-to-one with voice commands.

5. The application voice control method suitable for a virtual reality environment according to claim 4, characterized in that the language model library module is built solely from the voice commands, through:

Voice command presetting step: presetting one or more voice commands, wherein the voice commands are stored in the language model library;

Speech recognition group construction step: for each voice command, constructing a speech recognition group from one or more keywords extracted from that voice command, wherein the speech recognition group is stored in the language model library module;

Command association step: establishing a one-to-one association between speech recognition groups and control commands, wherein the association is stored in the language model library module.

6. An application voice control system suitable for a virtual reality environment, characterized by including:

Voice acquisition module: acquires a voice input command from the user;

Voice command recognition module: extracts one or more voice input words from the user's voice input command, and obtains a voice command by matching the voice input words;

7. The application voice control system suitable for a virtual reality environment according to claim 6, characterized in that the voice acquisition module includes:

Sentence break determination module: while the user's voice input command is being acquired, treats any pause in speech that is greater than or equal to a pause-time threshold as a sentence break marker.

8. The application voice control system suitable for a virtual reality environment according to claim 7, characterized in that the acquisition time window setting module includes:

Time window start time setting module: outside a voice acquisition time window, takes the moment at which the user operates the input device as the start time of the current voice acquisition time window;

Time window end time setting module: while the current voice acquisition time window is open, takes the moment at which the user operates the input device as the end time of that voice acquisition time window.

9. The application voice control system suitable for a virtual reality environment according to claim 6, characterized in that the voice command recognition module includes:

Word splitting module: according to the language model library, extracts one or more voice input words from the user's voice input command, and forms the one or more voice input words into a group to be recognized;

Matching module: matches the group to be recognized against the language model library to obtain the speech recognition group in the language model library that matches the group to be recognized;

wherein speech recognition groups correspond one-to-one with voice commands.

10. The application voice control system suitable for a virtual reality environment according to claim 9, characterized by including:

Voice command presetting module: presets one or more voice commands, wherein the voice commands are stored in the language model library;

Speech recognition group construction module: for each voice command, constructs a speech recognition group from one or more keywords extracted from that voice command, wherein the speech recognition group is stored in the language model library module;

Command association module: establishes a one-to-one association between speech recognition groups and control commands, wherein the association is stored in the language model library module;