CN109712620A - Voice interactive method, interactive voice equipment and storage device - Google Patents


Info

Publication number
CN109712620A
CN109712620A (application CN201811585283.5A)
Authority
CN
China
Prior art keywords
data
voice
natural language
language processing
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811585283.5A
Other languages
Chinese (zh)
Inventor
徐小峰
张晨
黄海斌
刘欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Guangdong Midea White Goods Technology Innovation Center Co Ltd
Original Assignee
Midea Group Co Ltd
Guangdong Midea White Goods Technology Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd and Guangdong Midea White Goods Technology Innovation Center Co Ltd
Priority to CN201811585283.5A
Publication of CN109712620A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

This application discloses a voice interaction method, a voice interaction device, and a storage device. The method includes: obtaining to-be-responded voice data; performing natural language processing on the to-be-responded voice data using non-network-end data; and responding based on a result of the natural language processing. The above scheme can reduce the influence of network conditions on the device's voice interaction.

Description

Voice interaction method, voice interaction device, and storage device
Technical field
This application relates to the field of speech processing, and in particular to a voice interaction method, a voice interaction device, and a storage device.
Background art
With the continuous development of information technology, human-machine interaction techniques are widely used. Voice interaction, as a new generation of interaction mode following keyboard interaction and touch-screen interaction, is convenient and efficient, and is gradually being applied in many fields. For example, in the field of household appliances, smart home devices such as speakers can add a voice interaction mode on top of their traditional functions to interact with users.
Currently, when carrying out voice interaction with a user, a device needs to process the user voice data it has collected and then respond to it. However, because processing voice data is relatively complex, the voice data is usually sent to a dedicated language-processing platform at the network end, and the device responds based on the processing result fed back by that platform. This voice interaction mode, which depends on on-line processing, is limited by the network conditions the device is currently in: if the current network conditions are poor, the device may be unable to respond to the user's voice data in time, or unable to respond at all, which severely affects normal voice interaction between the device and the user.
Summary of the invention
The technical problem mainly solved by this application is to provide a voice interaction method, a voice interaction device, and a storage device that can reduce the influence of network conditions on voice interaction.
To solve the above problem, a first aspect of this application provides a voice interaction method, including: obtaining to-be-responded voice data; performing natural language processing on the voice data using non-network-end data; and responding based on a result of the natural language processing.
To solve the above problem, a second aspect of this application provides a voice interaction device, including a memory and a processor coupled to each other, wherein the processor executes program instructions stored in the memory to implement the above method.
To solve the above problem, a third aspect of this application provides a storage device storing processor-executable program instructions, the program instructions being used to execute the above method.
In the above scheme, natural language processing is performed on the to-be-responded voice data using non-network-end data. Because this processing does not depend on network-end data, off-line voice interaction can be realized, which reduces or even avoids the influence of network conditions on the device's voice interaction. Moreover, because data interaction with the network end takes a certain amount of time, a device that directly relies on non-network-end data for natural language processing, and then responds based on the result of that processing on the voice data, can respond faster than one that relies on network-end data.
Description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the voice interaction method of this application;
Fig. 2 is a schematic flowchart of step S120 in another embodiment of the voice interaction method of this application;
Fig. 3a is a schematic flowchart of step S223 in yet another embodiment of the voice interaction method of this application;
Fig. 3b is a structural schematic diagram of an embodiment of the voice interaction device of this application;
Fig. 4a is a schematic flowchart of yet another embodiment of the voice interaction method of this application;
Fig. 4b is a structural schematic diagram of an embodiment of the voice interaction system of this application;
Fig. 5 is a structural schematic diagram of another embodiment of the voice interaction device of this application;
Fig. 6 is a structural schematic diagram of an embodiment of the storage device of this application.
Detailed description of the embodiments
The scheme of the embodiments of this application is described in detail below with reference to the accompanying drawings.
In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of this application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects. Furthermore, "multiple" herein means two or more.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of an embodiment of the voice interaction method of this application. In this embodiment, the method is executed by a voice interaction device, which can be any device with processing capability, for example, a smart home device, a computer, a tablet computer, a mobile phone, etc. The smart home device can be a household appliance such as an air conditioner, an electric rice cooker, a microwave oven, a refrigerator, or a speaker. Specifically, the method includes the following steps:
S110: Obtain to-be-responded voice data.
Specifically, the voice interaction device is equipped with a voice collecting circuit, which may include a sound collector such as a microphone and a filter-amplifier circuit for filtering and amplifying the electrical signal collected by the sound collector. The voice interaction device detects voice data in the local environment through the voice collecting circuit and uses it as the to-be-responded voice data. Taking a smart air conditioner as an example, the smart air conditioner can detect the voice data issued by the user through the voice collecting circuit, use that voice data as the to-be-responded voice data, and then execute the following S120 to realize interaction with the user.
It can be understood that, to reduce the circuit complexity and thus the size of the voice interaction device, the voice interaction device can also obtain the to-be-responded voice data from another voice capture device. For example, the voice interaction device can be connected to a voice capture device via USB, Bluetooth, or a similar means; after detecting voice data in the local environment, the voice capture device transmits the detected voice data to the voice interaction device as the to-be-responded voice data via a communication mode such as a USB data line or Bluetooth, so that the voice interaction device obtains the to-be-responded voice data.
S120: Perform natural language processing on the voice data using non-network-end data.
Here, the non-network-end data is data that is not remote data obtained through a communication network such as the Internet, a local area network, or a mobile communication network. The non-network-end data may include local data of the voice interaction device and data stored by a processing device electrically connected to the voice interaction device (for example, a processing device connected via a USB interface, a bus interface, or an input/output interface).
In this embodiment, natural language processing is performed on the to-be-responded voice data using non-network-end data; that is, the process does not need to rely on any network-end data and can be realized relying only on the non-network-end data of the voice interaction device. As a result, even when off-line, the voice interaction device can still process the to-be-responded voice data by itself without being affected by network conditions. The natural language processing may include recognizing the to-be-responded voice data to obtain text data, performing semantic understanding on the text data, and generating corresponding response data according to the semantic result.
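The offline pipeline described above (local recognition, local semantic understanding, local response generation) can be sketched as follows. This is a minimal illustration under invented data: the two lookup tables stand in for the acoustic/language models and the semantic rules, which the text does not specify concretely, and the audio identifier is a hypothetical placeholder for raw voice data.

```python
# All tables below are local (non-network-end) data: no network call is made.
LOCAL_ASR_TABLE = {  # stand-in for local acoustic/language models
    "audio_001": "turn on the air conditioner",
}
LOCAL_INTENTS = {    # stand-in for local semantic-understanding rules
    "turn on the air conditioner": {"device": "air conditioner", "action": "on"},
}

def offline_voice_interaction(audio_id):
    """Recognize, understand, and respond using only local data."""
    text = LOCAL_ASR_TABLE.get(audio_id)   # speech recognition -> text data
    if text is None:
        return None                        # local data cannot recognize it
    intent = LOCAL_INTENTS.get(text)       # text data -> semantic result
    if intent is None:
        return None                        # local data cannot understand it
    # respond: an execution instruction plus reply data for the user
    return {
        "execute": intent,
        "reply": "OK, I have turned %s the %s" % (intent["action"], intent["device"]),
    }
```

If either local table lacks the needed entry, the function returns `None`, which corresponds to the offline-failure case handled by the fallback embodiment later in the description.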
S130: Respond based on the result of the natural language processing.
For example, the natural language processing of the to-be-responded voice data in S120 yields corresponding response data, such as an execution instruction or reply data for informing the user. The voice interaction device executes the execution instruction, or plays or displays the reply data by sound, to respond to the to-be-responded voice data input by the user and realize interaction with the user.
It can be understood that realizing voice interaction requires performing natural language processing on the to-be-responded voice data and then responding based on the natural language processing result. Since natural language processing requires considerable processing capability, the voice interaction device of this application is configured with a processing capability that can realize natural language processing without depending on network-end data, and can thus realize the off-line voice interaction of this application.
In this embodiment, natural language processing is performed on the to-be-responded voice data using non-network-end data. Because this processing does not depend on network-end data, off-line voice interaction can be realized, reducing or even avoiding the influence of network conditions on the device's voice interaction. Moreover, because data interaction with the network end takes a certain amount of time, a voice interaction device that directly relies on non-network-end data to perform natural language processing on the to-be-responded voice data, and responds based on the result, can respond faster than one that relies on network-end data.
In the following, the process of performing natural language processing on the to-be-responded voice data using non-network-end data is explained with concrete examples.
Please refer to Fig. 2 and Fig. 3b. Fig. 2 is a schematic flowchart of S120 in another embodiment of the voice interaction method of this application, and Fig. 3b is a structural schematic diagram of an embodiment of the voice interaction device of this application. In this embodiment, performing natural language processing on the to-be-responded voice data in S120 specifically includes the following sub-steps:
S221: Convert the to-be-responded voice data into text data.
To enable the subsequent natural language processing, the voice interaction device first converts the to-be-responded voice data obtained in S110 using speech recognition technology to obtain the corresponding text data.
Specifically, the voice interaction device can convert the to-be-responded voice data into text data based on the vocabulary in a local lexicon. For example, the voice interaction device locally stores an acoustic model, a language model, and a pronunciation dictionary. The acoustic model is a knowledge representation of differences in acoustics, phonetics, environmental variables, speaker gender, accent, and the like. The language model is a knowledge representation of how word sequences are composed. The pronunciation dictionary (lexicon) contains a mapping from words to phones, and its role is to connect the acoustic model and the language model. At least one of the pre-stored acoustic model, language model, and pronunciation dictionary includes a number of vocabulary entries (for example, the pronunciation dictionary includes several word-to-pronunciation mappings); the database or data file used to store this vocabulary is referred to as the local lexicon. Specifically, the voice interaction device first performs pattern matching between the to-be-responded voice data and the acoustic model, and finally recognizes the text data in combination with the pronunciation dictionary and the language model.
The number of vocabulary entries pre-stored in the voice interaction device affects the accuracy of speech recognition: for example, if a certain word in the to-be-responded voice data has not been pre-stored, the voice interaction device cannot correctly convert that spoken word into text. In theory, therefore, the more vocabulary the voice interaction device pre-stores, the better the speech recognition experience. On the other hand, the more vocabulary it pre-stores, the higher the requirement on its processing capability; that is, it may need a processor with higher processing capability. The vocabulary pre-stored in the voice interaction device can therefore be determined by jointly considering the required speech recognition accuracy and the processing capability the device can support. In one application scenario, the local lexicon of the voice interaction device can pre-store a medium-scale vocabulary, for example, between 1,000 and 10,000 entries, such as 1,000, 2,000, 5,000, or 10,000.
In addition, the local lexicon includes at least one of the following: vocabulary of a set field and general vocabulary. The voice interaction device can thereby recognize voice data of the set field and of the general field accordingly. The set field, also called a vertical field, refers to a preset field with a certain industry background; the vocabulary of the set field consists of words strongly associated with that field. For example, the set field may be the household-appliance field, or more narrowly the air-conditioning field or the microwave-oven field of household appliances; vocabulary of the air-conditioning field includes "open", "air conditioner", "cooling", "heating", and so on. The general field, in contrast to a vertical field, is not aimed at a single industry but is generally applicable across many fields; general vocabulary consists of words that may be used in any field, such as common spoken words: "today", "tomorrow", "please", "help me", "once", "weather", "how", and so on.
In one application scenario, considering that the vocabulary determines the processing capability required of the voice interaction device, the voice interaction device can be limited to voice interaction in the set field, thereby reducing the vocabulary that needs to be pre-stored. For example, the local lexicon of the voice interaction device stores the vocabulary of the set field and the general vocabulary likely to appear in exchanges within that set field, such as the spoken words mentioned above. For example, if the voice interaction device is a fan, the set field is the fan field, so its set-field vocabulary includes "switch", "wind speed", and the like. Meanwhile, because spoken language is casual, many semantically insignificant auxiliary words may also appear, such as "help me", "for me", "please", and "once" in sentences like "help me turn on the fan", "turn on the fan for me", and "please turn on the fan once". These general words carry little meaning for understanding, but for speech recognition they must still be covered. For interaction in a given set field, the vocabulary actually spoken is usually limited; with a certain amount of common vocabulary, for example more than 2,000 entries, exchanges can be basically unimpeded. Therefore, the voice interaction device only needs to pre-store the vocabulary of its set field plus some related general vocabulary, reaching a certain vocabulary size, and interaction in the set field can be realized.
In this embodiment, the step of converting the voice data into text is realized by the automatic speech recognition (ASR) module 31 in the voice interaction device. The ASR module 31 can store local data files such as the above acoustic model, pronunciation dictionary, and language model, and is used to convert the voice data into a symbol sequence that a computer can recognize, namely text.
It can be understood that, in other embodiments, the local data files used for speech recognition, such as the acoustic model, pronunciation dictionary, and language model, may also be stored not locally in the voice interaction device but in another processing device electrically connected to it.
S222: Judge whether the text data belongs to the content of the set field. If so, execute S223; otherwise, end the process.
Specifically, the voice interaction device analyzes the text data obtained through S221 to determine whether the content involved in the text data belongs to the set field. In this embodiment, the voice interaction device is locally preset with a classification model 32, and S222 can specifically include: analyzing the text data using the local classification model 32, and determining, based on the analysis result, whether the text data belongs to the content of the set field. For example, the voice interaction device pre-stores a classification model built using an algorithm such as deep learning; the ASR module 31 inputs the converted text data into the classification model 32 for processing and obtains a likelihood result that the text data belongs to the set field, which may include a confidence degree and/or score. The voice interaction device compares the obtained likelihood result with a preset threshold; if the likelihood result is greater than the preset threshold, it can determine that the text data belongs to the content of the set field.
It is understood that in other embodiments, interactive voice equipment is not limited to the interactive voice in setting field When, S222 can not also be executed namely interactive voice equipment executes S221 and is not necessarily to judge directly to execute S223 later.
S223: Perform natural language processing on the text data.
Specifically, when judging that the natural language to be processed belongs to the content of the set field, the voice interaction device performs natural language processing on the text data in response to that judgment result. The voice interaction device can use an existing natural language processing approach to process the text data to obtain response data matching the text data, the response data being, for example, an execution instruction, or reply data for informing the user.
In this embodiment, as shown in Fig. 3a, step S223 specifically includes the following sub-steps:
S223a: Perform semantic understanding on the text data to obtain a semantic result.
Specifically, natural language understanding is first performed on the text data corresponding to the to-be-responded voice data; that is, the text data is converted into slot information that a computer can understand, and the semantic information conveyed by the text data can be obtained from the slot information. This embodiment is illustrated in combination with Fig. 3b: the voice interaction device can be equipped with a natural language understanding (NLU) module 33, and sub-step S223a can be realized by the NLU module 33. For example, the NLU module 33 receives the classification result output by the classification model 32, and when that result indicates that the text data belongs to the content of the set field, it performs semantic understanding on the text data.
In one embodiment, sub-step S223a can include: performing slot filling on the text data to obtain slot information, where the slot information is used to indicate the semantic result of the text data. A slot (also called a semantic slot) refers to the representation of semantic understanding, usually described by several parameters. For example, for the text data corresponding to the to-be-responded voice data "How is the weather in Shenzhen today", there is the following slot information: {field: weather; time: today; address: Shenzhen}. The slots can be set differently depending on the field the voice interaction device targets; for example, if the set field is the household-appliance field, the slot information can include device control, kitchen, scene, and so on.
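The weather example above can be reproduced with a minimal rule-based slot filler. The regular-expression patterns are invented for this one example; a real NLU module would use a trained model or a much richer grammar.

```python
import re

def fill_slots(text_data):
    """Toy slot filling for {field; time; address} (illustrative rules only)."""
    slots = {"field": None, "time": None, "address": None}
    if "weather" in text_data.lower():
        slots["field"] = "weather"
    m = re.search(r"\b(today|tomorrow)\b", text_data, re.IGNORECASE)
    if m:
        slots["time"] = m.group(1).lower()
    m = re.search(r"\bin (\w+)", text_data)   # naive location pattern
    if m:
        slots["address"] = m.group(1)
    return slots
```

The resulting dictionary is the machine-readable semantic result that S223b consumes to generate the first response data.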
In this embodiment, the voice interaction device first filters out text data outside the set field, so it only needs to perform semantic understanding on text data of the set field, which can reduce the processing capability required of the voice interaction device to realize semantic understanding.
S223b: Generate first response data based on the semantic result.
Specifically, the first response data can be a matched execution instruction that is generated and executed, or reply data generated for interaction. The voice interaction device can be equipped with a dialog management (DM) module 34 (also called an application logic module 34), a natural language generation (NLG) module 35, and a text-to-speech (TTS) module 36. Sub-step S223b can be executed by the DM module 34 in combination with the NLG module 35 and the TTS module 36, or by some of these modules.
In one application scenario, the voice interaction device executing S223b includes: generating an execution instruction matched with the semantic result, in which case executing S130 specifically includes executing the execution instruction; and/or generating reply data matched with the semantic result, where the reply data can be text reply data or speech reply data. Generating text reply data matched with the semantic result includes: determining reply content matched with the semantic result, and performing natural language generation on the reply content to obtain the text reply data. Generating speech reply data matched with the semantic result includes: determining reply content matched with the semantic result; performing natural language generation on the reply content to obtain text reply data; and converting the text reply data into speech reply data. Correspondingly, executing S130 specifically includes: displaying the text reply data, or playing the speech reply data by voice.
The execution instruction can be generated directly by the DM module 34 and output to the execution unit 37 of the voice interaction device, which executes it; the execution unit 37 is, for example, the blade driver of a fan or the compressor motor of an air conditioner. For reply data, the DM module 34 first determines the content to be replied, and the NLG module 35 converts that content into text-type reply data. To reply by display, the text-type reply data is output to the display unit of the voice interaction device (not shown in Fig. 3b), which displays it. To reply by voice broadcast, the NLG module 35 outputs the text-type reply data to the TTS module 36, which performs text-to-speech conversion to obtain speech-type reply data; the speech-type reply data is then output to the sound playing unit 38 of the voice interaction device, which plays it. It can be understood that the voice interaction device can generate both an execution instruction and reply data. For example, when an air conditioner receives the user's voice data "Help me turn on the air conditioner", it performs natural language processing on it and generates an air-conditioner-on instruction and the reply data "OK, I have turned on the air conditioner for you"; the air conditioner starts itself in response to the instruction and broadcasts the reply data "OK, I have turned on the air conditioner for you" by voice.
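The DM → NLG → TTS chain just described can be sketched as three small functions. The decision rule, the reply template, and the byte-encoding "synthesis" are all invented placeholders; a real TTS module 36 would produce audio, not UTF-8 bytes.

```python
def dialog_management(semantic_result):
    """DM module 34 stand-in: decide the execution instruction and reply content."""
    if semantic_result.get("action") == "on":
        return {"instruction": ("power", "on"),
                "reply_content": "turned on the air conditioner for you"}
    return {"instruction": None, "reply_content": "not understood the request"}

def natural_language_generation(reply_content):
    """NLG module 35 stand-in: render reply content as text-type reply data."""
    return "OK, I have %s." % reply_content

def text_to_speech(text_reply):
    """TTS module 36 stand-in: a real device would synthesize audio here."""
    return text_reply.encode("utf-8")

dm_out = dialog_management({"action": "on"})
text_reply = natural_language_generation(dm_out["reply_content"])
speech_reply = text_to_speech(text_reply)
```

Note how the instruction goes straight from DM to the execution unit, while the reply takes the longer NLG (and, for voice broadcast, TTS) path, matching the two output routes in Fig. 3b.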
It can be understood that the modules or units included in the voice interaction device described in Fig. 3b can be different program parts, or functional module groups realized by different circuits.
In this embodiment, the voice interaction device filters out voice data outside the set field and performs natural language processing only on voice data of the set field, which can reduce the processing capability required for the voice interaction device to realize speech recognition and other natural language processing, thereby reducing the implementation difficulty and cost of the voice interaction device. Moreover, because the relevant vocabulary of the set field is limited, more related general vocabulary can be added within the vocabulary range that the processing capability of the voice interaction device can support, allowing more reliable and accurate understanding of natural language in the set field; genuine natural language interaction can therefore be realized to a certain extent.
Please refer to Fig. 4a, which is a schematic flowchart of yet another embodiment of the voice interaction method of this application. In this embodiment, the method is executed by a voice interaction device, which can be any device with processing capability, for example, a smart home device, a computer, a tablet computer, a mobile phone, etc. Specifically, the method includes the following steps:
S410: Obtain to-be-responded voice data.
S420: Perform natural language processing on the to-be-responded voice data using non-network-end data.
S430: Respond based on the result of the natural language processing obtained using the non-network-end data.
For the explanation of steps S410-S430, please refer to the related descriptions of the above embodiments, which are not repeated here.
S440: Judge whether natural language processing can be performed on the to-be-responded voice data using the non-network-end data. If not, execute S450; otherwise, the process can end, and when new to-be-responded voice data is received, the voice interaction method of this application continues to be executed.
In this embodiment, step S420 relies on non-network-end data to perform natural language processing on the to-be-responded voice data. As described in the above embodiments, realizing this processing depends on vocabulary and other data pre-stored in the voice interaction device. Since the processing capability of the voice interaction device is limited, its pre-stored related data is also limited, so there may be cases where the locally pre-stored data cannot be used to perform speech recognition and subsequent natural language processing on the voice data. Therefore, after executing S420, the voice interaction device monitors whether it has performed natural language processing on the to-be-responded voice data and responded based on the processing result; for example, it detects whether a response action to the to-be-responded voice data (such as executing an instruction or playing reply data by voice) occurs within a preset time. If no response action to the to-be-responded voice data is detected within the preset time, it determines that natural language processing cannot be performed on the to-be-responded voice data using the non-network-end data; that is, off-line voice interaction cannot currently be realized. At this point, the voice interaction device then considers relying on the network end to perform natural language processing on the to-be-responded voice data.
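The monitoring step just described, where absence of a response within a preset time triggers the network-side fallback, can be sketched as follows. The callables and the one-second preset time are illustrative assumptions; a real device would observe its own response actions rather than a return value.

```python
import time

def respond_with_fallback(offline_nlp, online_nlp, voice_data, preset_time=1.0):
    """S420/S440/S450: try non-network-end processing first, then fall back."""
    start = time.monotonic()
    result = offline_nlp(voice_data)            # S420: offline attempt
    elapsed = time.monotonic() - start
    if result is not None and elapsed <= preset_time:
        return result, "offline"                # S430: response action observed
    return online_nlp(voice_data), "online"     # S450/S460: rely on network end
```

The `elapsed` check models the "within a preset time" condition: even a successful offline result that arrives too late would be treated as a failure and handed to the network side.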
S450: Rely on network-end data to perform natural language processing on the to-be-responded voice data.
The voice interaction device can first detect whether network communication can currently be realized; if not, it initiates a network connection request to establish a communication link for subsequent data interaction. After determining that network communication is achievable, the voice interaction device relies on network-end data to perform natural language processing on the to-be-responded voice data. For example, as shown in Fig. 4b, the voice interaction system includes an interconnected voice interaction device 41 and network-side equipment 42, where the voice interaction device 41 is the voice interaction device described above, and the network-side equipment 42 can be a remote server or the like, used to perform speech recognition and natural language processing on the received voice data to generate second response data. Specifically, the voice interaction device 41 sends the to-be-responded voice data to the network-side equipment 42, and then receives the second response data fed back by the network-side equipment 42, where the second response data is obtained by the network-side equipment performing natural language processing on the to-be-responded voice data, such as the execution instruction or reply data described above. The process by which the network-side equipment performs natural language processing on the to-be-responded voice data can refer to the process by which the voice interaction device itself performs it, and is not repeated here.
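The on-line path of S450 amounts to a send/receive exchange with the network-side equipment 42. In the sketch below the network call is simulated by a plain function whose output is an invented placeholder; a real device would use a socket or HTTP client, and the server's actual ASR/NLP behavior is not specified by the text.

```python
def network_side_nlp(voice_data):
    """Stand-in for network-side equipment 42 (e.g., a remote server);
    its recognition/processing result here is an invented placeholder."""
    return {"instruction": ("air_conditioner", "on"),
            "reply": "OK, I have turned on the air conditioner for you"}

def online_voice_interaction(voice_data):
    # 1) Send the to-be-responded voice data to the network side.
    # 2) Receive the second response data it feeds back.
    second_response = network_side_nlp(voice_data)
    return second_response
```

The returned second response data is then handled exactly like the first response data in S130: execute the instruction and/or broadcast the reply.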
S460: Respond based on the result of the natural language processing obtained using network-side data.
For example, after receiving the second response data fed back by the network side, the voice interaction device executes the second response data, e.g., executes the corresponding instruction or plays the reply data as speech. For details, refer to the description of step S130 above.
In the present embodiment, when offline voice interaction cannot be achieved, the voice interaction device can further rely on network-side data to perform natural language processing on the voice data to be responded to and respond accordingly, i.e., achieve online voice interaction. By combining offline and online processing, the reliability of voice interaction is ensured.
Referring to Fig. 5, Fig. 5 is a structural schematic diagram of an embodiment of the voice interaction device of the present application. In this embodiment, the voice interaction device 50 includes a memory 51, a processor 52, and a communication circuit 53, where the communication circuit 53 and the memory 51 are each coupled to the processor 52. Specifically, the components of the voice interaction device 50 may be coupled by a bus, or the processor of the voice interaction device 50 may be connected to each of the other components individually. The voice interaction device 50 may be any device with sufficient processing capacity, such as a computer, a mobile phone, or a smart home device. The smart home device may be a household appliance such as an air conditioner, rice cooker, microwave oven, refrigerator, or speaker.
The communication circuit 53 is used to communicate with other devices. For example, when the voice interaction device needs online natural language processing, the communication circuit 53 communicates with the network-side device.
The memory 51 stores the program instructions executed by the processor 52 and the data the processor 52 uses during processing, such as the non-network-side data. The memory 51 includes a non-volatile storage portion for storing the above program instructions.
The processor 52 controls the operation of the voice interaction device 50 and may also be called a CPU (Central Processing Unit). The processor 52 may be an integrated circuit chip with signal processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. In addition, the processor 52 may be implemented jointly by multiple integrated circuit chips.
In the present embodiment, the processor 52 calls the program instructions stored in the memory 51 to execute the steps of any of the method embodiments above.
For example, the processor 52 is configured to: obtain voice data to be responded to; perform natural language processing on the voice data to be responded to using non-network-side data; and respond based on the result of the natural language processing (e.g., control components such as the above-mentioned execution unit and voice playing unit to perform an operation matching the natural language processing result).
In some embodiments, the natural language processing performed by the processor 52 on the voice data to be responded to includes: converting the voice data to be responded to into text data, and performing natural language processing on the text data.
Further, the memory 51 stores a lexicon, and when converting the voice data to be responded to into text data, the processor 52 may convert the voice data into text data based on the vocabulary in the local lexicon.
The number of words in the local lexicon may be more than 1,000 and less than 10,000.
The local lexicon may include at least one of the following: words of a set field and universal words.
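The lexicon-constrained conversion can be illustrated with a toy sketch. Real acoustic decoding is out of scope here, so "recognition candidates" stand in for the speech recognizer's hypotheses; the example words, the HVAC field, and the function names are all illustrative assumptions, not from the patent.

```python
# Words of a set field (here, illustratively, air conditioning) plus
# universal words; in practice the local lexicon holds 1,000-10,000 entries.
DOMAIN_WORDS = {"air conditioner", "temperature", "cool", "heat"}
UNIVERSAL_WORDS = {"turn on", "turn off", "increase", "decrease"}
LOCAL_LEXICON = DOMAIN_WORDS | UNIVERSAL_WORDS


def to_text(candidates):
    """Keep only candidate words present in the local lexicon, so the
    device can transcribe speech without network-side vocabulary."""
    return " ".join(w for w in candidates if w in LOCAL_LEXICON)
```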
In some embodiments, after converting the voice data to be responded to into text data, the processor 52 further judges whether the text data belongs to the content of a set field; the processor 52 then performs natural language processing on the text data in response to the text data belonging to the content of the set field.
In some embodiments, the memory 51 stores a classification model, and to judge whether the text data belongs to the content of the set field, the processor 52 analyzes the text data using the local classification model and determines, based on the analysis result, whether the text data belongs to the content of the set field.
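As a minimal stand-in for the stored classification model, a keyword-scoring classifier shows the shape of the field check. The keywords, weights, and threshold below are invented for the sketch; the patent does not specify the model type.

```python
# Illustrative stand-in for the local classification model: a keyword
# score decides whether the text belongs to the set field (here, HVAC).
FIELD_KEYWORDS = {"air conditioner": 2.0, "temperature": 1.5, "cool": 1.0, "heat": 1.0}
THRESHOLD = 1.0  # assumed decision threshold


def belongs_to_set_field(text: str) -> bool:
    """Return True if the text data belongs to the content of the set field."""
    score = sum(w for kw, w in FIELD_KEYWORDS.items() if kw in text)
    return score >= THRESHOLD
```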
In some embodiments, the natural language processing performed by the processor 52 on the text data includes: performing semantic understanding on the text data to obtain a semantic result; and generating first response data based on the semantic result.
In some embodiments, generating the first response data based on the semantic result may include generating an instruction matching the semantic result; responding based on the result of the natural language processing then includes executing the instruction.
In some embodiments, generating the first response data based on the semantic result includes generating reply data matching the semantic result, where the reply data are text reply data or voice reply data; responding based on the result of the natural language processing then includes displaying the text reply data or playing the voice reply data.
To generate the voice reply data matching the semantic result, the processor 52 may: determine reply content matching the semantic result; perform natural language generation on the reply content to obtain text reply data; and convert the text reply data into voice reply data.
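The semantic understanding → reply content → natural language generation → speech pipeline can be sketched end to end. Every function body below is a toy stand-in (a real intent model, NLG component, and TTS engine would replace them); the intent names and the 26-degree example value are invented for illustration.

```python
def semantic_understanding(text: str) -> dict:
    """Toy intent extraction yielding the semantic result."""
    if "temperature" in text:
        return {"intent": "query_temperature", "device": "air conditioner"}
    return {"intent": "unknown"}


def reply_content_for(semantics: dict) -> dict:
    """Determine reply content matching the semantic result."""
    if semantics["intent"] == "query_temperature":
        return {"device": semantics["device"], "value_c": 26}
    return {"error": "not understood"}


def generate_text_reply(content: dict) -> str:
    """Natural language generation: reply content -> text reply data."""
    if "value_c" in content:
        return f"The {content['device']} is set to {content['value_c']} degrees."
    return "Sorry, I did not understand."


def to_speech(text_reply: str) -> bytes:
    """Text reply data -> voice reply data; a TTS engine replaces this stub."""
    return text_reply.encode("utf-8")
```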
In some embodiments, the processor 52 is further configured to judge whether natural language processing can be performed on the voice data to be responded to using the non-network-side data; in response to determining that natural language processing cannot be performed on the voice data to be responded to using the non-network-side data, the processor 52 relies on network-side data to perform natural language processing on the voice data to be responded to.
Further, to judge whether natural language processing can be performed on the voice data using the non-network-side data, the processor 52 may detect whether a response action to the voice data to be responded to occurs within a preset time. To rely on network-side data for the natural language processing, the processor 52 may send the voice data to be responded to to the network-side device through the communication circuit 53, and receive, through the communication circuit 53, second response data fed back by the network-side device, wherein the second response data are obtained by the network-side device performing natural language processing on the voice data to be responded to.
It can be understood that in other embodiments, if the voice interaction device does not need to communicate with the network side, the voice interaction device may omit the above-mentioned communication circuit 53.
Referring to Fig. 6, the present application also provides an embodiment of a storage device. In this embodiment, the storage device 60 stores program instructions 61 that can be run by a processor, the program instructions being used to execute the methods in the above embodiments.
The storage device 60 may specifically be a medium capable of storing program instructions, such as a USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk; it may also be a server storing the program instructions, which can send the stored program instructions to other devices for execution or run them itself.
In some embodiments, the storage device 60 may also be the memory shown in Fig. 5.
In the above solution, natural language processing is performed on the voice data to be responded to using non-network-side data. Since this process does not depend on network-side data, offline voice interaction can be achieved, reducing or even eliminating the influence of network conditions on voice interaction. Moreover, because data exchange with the network side takes a certain amount of time, having the device perform natural language processing directly on non-network-side data and respond based on the result can speed up voice interaction compared with relying on network-side data.
In the several embodiments provided in this application, it should be understood that the disclosed methods and devices may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules or units is only a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or some of the steps of the methods of the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disk.

Claims (13)

1. A voice interaction method, characterized by comprising:
obtaining voice data to be responded to;
performing natural language processing on the voice data to be responded to using non-network-side data;
responding based on a result of the natural language processing.
2. The method according to claim 1, characterized in that the performing natural language processing on the voice data to be responded to comprises:
converting the voice data to be responded to into text data;
performing natural language processing on the text data.
3. The method according to claim 2, characterized in that the converting the voice data to be responded to into text data comprises:
converting the voice data to be responded to into text data based on vocabulary in a local lexicon;
wherein the local lexicon comprises at least one of the following: words of a set field and universal words.
4. The method according to claim 2, wherein after the converting the voice data to be responded to into text data, the method further comprises:
judging whether the text data belongs to content of a set field;
the performing natural language processing on the text data comprises:
performing natural language processing on the text data in response to the text data belonging to the content of the set field.
5. The method according to claim 4, wherein the judging whether the text data belongs to content of a set field comprises:
analyzing the text data using a local classification model, and determining, based on an analysis result, whether the text data belongs to the content of the set field.
6. The method according to claim 2, characterized in that the performing natural language processing on the text data comprises:
performing semantic understanding on the text data to obtain a semantic result;
generating first response data based on the semantic result.
7. The method according to claim 6, characterized in that the generating first response data based on the semantic result comprises:
generating an instruction matching the semantic result;
the responding based on the result of the natural language processing comprises:
executing the instruction.
8. The method according to claim 6, characterized in that the generating first response data based on the semantic result comprises:
generating reply data matching the semantic result, wherein the reply data are text reply data or voice reply data;
the responding based on the result of the natural language processing comprises:
displaying the text reply data or playing the voice reply data.
9. The method according to claim 8, characterized in that the generating reply data matching the semantic result comprises:
determining reply content matching the semantic result;
performing natural language generation on the reply content to obtain text reply data;
converting the text reply data into voice reply data.
10. The method according to claim 1, characterized in that the method further comprises:
judging whether natural language processing can be performed on the voice data to be responded to using the non-network-side data;
in response to natural language processing being unable to be performed on the voice data to be responded to using the non-network-side data, relying on network-side data to perform natural language processing on the voice data to be responded to.
11. The method according to claim 10, characterized in that the judging whether natural language processing can be performed on the voice data to be responded to using the non-network-side data comprises:
detecting whether there is a response action to the voice data to be responded to within a preset time;
the relying on network-side data to perform natural language processing on the voice data to be responded to comprises:
sending the voice data to be responded to to a network-side device;
receiving second response data fed back by the network-side device, wherein the second response data are obtained by the network-side device performing natural language processing on the voice data to be responded to.
12. A voice interaction device, characterized by comprising a memory and a processor coupled to each other; wherein the processor is configured to execute program instructions stored in the memory to implement the method of any one of claims 1 to 11.
13. A storage device, characterized in that it stores program instructions runnable by a processor, the program instructions being used to implement the method of any one of claims 1 to 11.
CN201811585283.5A 2018-12-24 2018-12-24 Voice interactive method, interactive voice equipment and storage device Pending CN109712620A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811585283.5A CN109712620A (en) 2018-12-24 2018-12-24 Voice interactive method, interactive voice equipment and storage device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811585283.5A CN109712620A (en) 2018-12-24 2018-12-24 Voice interactive method, interactive voice equipment and storage device

Publications (1)

Publication Number Publication Date
CN109712620A true CN109712620A (en) 2019-05-03

Family

ID=66256173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811585283.5A Pending CN109712620A (en) 2018-12-24 2018-12-24 Voice interactive method, interactive voice equipment and storage device

Country Status (1)

Country Link
CN (1) CN109712620A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106486122A (en) * 2016-12-26 2017-03-08 旗瀚科技有限公司 A kind of intelligent sound interacts robot
CN108320747A (en) * 2018-02-08 2018-07-24 广东美的厨房电器制造有限公司 Appliances equipment control method, equipment, terminal and computer readable storage medium
WO2018199374A1 (en) * 2017-04-24 2018-11-01 엘지전자 주식회사 Audio device and control method therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190503