CN109830240A

CN109830240A - Method, apparatus and system for identifying a user's specific identity based on a voice operation instruction

Info

Publication number: CN109830240A
Application number: CN201910229227.6A
Authority: CN
Inventors: Liu Hongqiang (刘红强)
Original assignee: Chumen Wenwen Information Technology Co Ltd
Current assignee: Chumen Wenwen Information Technology Co Ltd
Priority date: 2019-03-25
Filing date: 2019-03-25
Publication date: 2019-05-31

Abstract

Embodiments of the invention disclose a method, apparatus and system for identifying a user's specific identity based on a voice operation instruction. The method comprises: preprocessing a pre-acquired voice operation instruction to obtain feature tags corresponding to the voice operation instruction; inputting the feature tags into a pre-established training model for prediction to obtain a prediction result; and determining, according to the prediction result, the specific identity of the user who issued the voice operation instruction. In this way, the user's specific identity can be identified effectively, and the voice operation instruction is executed only when the specific identity is determined to correspond to that instruction, so that special functions can be provided to specific groups of people.

Description

Method, apparatus and system for identifying a user's specific identity based on a voice operation instruction

Technical field

Embodiments of the present invention relate to speech signal processing technology, and in particular to a method, apparatus and system for identifying a user's specific identity based on a voice operation instruction.

Background art

With the rapid development of intelligent speech recognition technology, more and more voice interaction products provide convenient services. For example, a smart speaker can recognize a user's voice instruction and then perform the corresponding operation.

However, current intelligent voice interaction products can only recognize the user's voice operation instruction; they do not distinguish the specific identity of the user who issued it. This is a limitation for special functions provided exclusively for specific groups of people. For example, a smart speaker may offer a children's audiobook playback function whose condition of use is that playback is executed only after a voice control instruction issued by a child has been recognized. Traditional speech recognition technology, however, cannot accurately determine whether the current speaker is a child, so certain voice services provided exclusively for children cannot be executed normally.

The technical problem addressed by this application is therefore how to identify, from the voice operation instruction input by a user, the specific identity of the user who issued it, and then execute the special operation corresponding to that identity.

Summary of the invention

To this end, embodiments of the present invention provide a method, apparatus and system for identifying a user's specific identity based on a voice operation instruction, to solve the technical problem that existing speech recognition devices cannot identify a user's identity from the user's voice operation instruction, so that special operations corresponding to a user's specific identity cannot be executed accurately.

To achieve the above objectives, embodiments of the present invention provide the following technical solutions:

According to a first aspect of the embodiments of the present invention, a method for identifying a user's specific identity based on a voice operation instruction is provided, the method comprising:

Preprocessing a pre-acquired voice operation instruction to obtain feature tags corresponding to the voice operation instruction;

Inputting the feature tags into a pre-established training model for prediction to obtain a prediction result;

Determining, according to the prediction result, the specific identity of the user who issued the voice operation instruction.
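The three steps can be pictured as a small pipeline. The following is only an illustrative sketch: the function names, the assumption that the model returns a score per candidate identity, and the toy preprocessor and model are hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical signatures: a preprocessor that turns instruction text into feature
# tags, and a model that returns a score for each candidate identity.
Preprocessor = Callable[[str], List[str]]
Model = Callable[[List[str]], Dict[str, float]]


def identify_user(text: str, preprocess: Preprocessor, model: Model) -> Optional[str]:
    """Preprocess -> predict -> decide, mirroring the three method steps."""
    tags = preprocess(text)               # step 1: feature tags for the instruction
    if not tags:
        return None                       # no usable tags: treat the instruction as invalid
    scores = model(tags)                  # step 2: prediction result from the trained model
    return max(scores, key=scores.get)    # step 3: take the highest-scoring identity


# Toy stand-ins, for illustration only.
def toy_preprocess(text: str) -> List[str]:
    vocabulary = {"play", "children", "audiobook", "news"}
    return [w for w in text.lower().split() if w in vocabulary]


def toy_model(tags: List[str]) -> Dict[str, float]:
    if "audiobook" in tags:
        return {"child": 0.8, "adult": 0.2}
    return {"child": 0.1, "adult": 0.9}


print(identify_user("Play a children audiobook", toy_preprocess, toy_model))  # child
```

Returning None for an instruction with no tags corresponds to the case, described below, in which the instruction is treated as invalid and no further processing takes place.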

Further, preprocessing the pre-acquired voice operation instruction to obtain the feature tags corresponding to the voice operation instruction specifically includes:

Performing semantic recognition on the pre-acquired voice operation instruction to obtain semantic text content;

Extracting semantic features from the semantic text content according to a first preset rule;

Matching feature tags corresponding to the semantic features from a pre-established database.

Further, when no feature tag corresponding to the semantic features is matched from the pre-established database, the voice operation instruction is determined to be invalid and subsequent operations are stopped;

Alternatively, when at least one feature tag corresponding to the semantic features is matched from the pre-established database, valid feature tags are screened from the at least one feature tag according to a second preset rule, so that the valid feature tags can subsequently be input into the pre-established training model for prediction.

Further, before the pre-acquired voice operation instruction is preprocessed to obtain the corresponding feature tags, the method further includes:

Obtaining, by voiceprint recognition technology, the user type corresponding to the pre-acquired voice operation instruction;

Determining the pre-established training model according to the obtained user type.

Further, the pre-established training model is a decision tree model.

Further, the steps for constructing the pre-established training model include:

Receiving and storing a plurality of voice operation instruction samples issued by different users;

Classifying the voice operation instruction samples by voiceprint recognition technology, grouping the samples issued by the same user into one class, and labeling each class of samples with its user type;

Preprocessing each voice operation instruction sample to obtain feature tags corresponding to that sample;

Inputting the feature tags corresponding to the voice operation instruction samples into a training model for training until an optimal training model is obtained that can accurately identify the feature tags corresponding to the voice operation instruction samples of a given user type, and using the optimal training model as the pre-established training model for that user type.

According to a second aspect of the embodiments of the present invention, an apparatus for identifying a user's specific identity based on a voice operation instruction is provided, the apparatus comprising:

A processing unit, configured to preprocess a pre-acquired voice operation instruction to obtain feature tags corresponding to the voice operation instruction;

A prediction unit, configured to input the feature tags into a pre-established training model for prediction to obtain a prediction result;

An identity recognition unit, configured to determine, according to the prediction result, the specific identity of the user who issued the voice operation instruction.

Further, the processing unit is also configured to obtain, by voiceprint recognition technology, the user type corresponding to the pre-acquired voice operation instruction before preprocessing that instruction to obtain the corresponding feature tags.

According to a third aspect of the embodiments of the present invention, a system for identifying a user's specific identity based on a voice operation instruction is provided, the system comprising a processor and a memory;

The memory is configured to store one or more program instructions;

The processor is configured to run the one or more program instructions to perform any of the method steps of the method for identifying a user's specific identity based on a voice operation instruction described above.

According to a fourth aspect of the embodiments of the present invention, a computer storage medium is provided, which contains one or more program instructions used by a system for identifying a user's specific identity based on a voice operation instruction to perform any of the method steps of the method for identifying a user's specific identity based on a voice operation instruction described above.

Embodiments of the present invention have the following advantages: the voice operation instruction issued by the user is preprocessed to obtain the corresponding feature tags, which are then input into a pre-established training model for prediction. Having been trained on a large amount of sample data, the training model can identify users of different specific identities from their voice operation instructions; for example, the specific identity may be elderly person, middle-aged person, young person or child. Suppose the voice operation instruction issued by the user is to play a children's audiobook: if the method identifies the user as a child, the playback function is executed; otherwise it is not. In this way, the user's specific identity can be identified effectively, and the voice operation instruction is executed only when the specific identity is determined to correspond to it, so that special functions can be provided to specific groups of people.

Brief description of the drawings

To describe the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely exemplary, and a person of ordinary skill in the art can derive other implementation drawings from them without creative effort.

The structures, proportions, sizes and so on depicted in this specification are only intended to accompany the content disclosed in the specification so that those skilled in the art can understand and read it; they are not intended to limit the conditions under which the invention can be implemented and therefore have no substantive technical significance. Any structural modification, change in proportion or adjustment of size that does not affect the effects the invention can produce or the objectives it can achieve shall still fall within the scope covered by the technical content disclosed herein.

Fig. 1 is a schematic flowchart of a method for identifying a user's specific identity based on a voice operation instruction according to Embodiment 1 of the present invention;

Fig. 2 is a schematic structural diagram of an apparatus for identifying a user's specific identity based on a voice operation instruction according to Embodiment 2 of the present invention;

Fig. 3 is a schematic structural diagram of a system for identifying a user's specific identity based on a voice operation instruction according to Embodiment 3 of the present invention.

Detailed description of embodiments

The implementations of the present invention are described below through specific embodiments, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

Embodiment 1 of the present invention provides a method for identifying a user's specific identity based on a voice operation instruction. The method is mainly applied to scenarios in which the voice interaction system should execute a particular voice operation instruction only when it is issued by a user with a specific identity. For example, if the user's voice operation instruction is to play a children's audiobook, the voice interaction system executes it only when it detects that the user who issued the instruction is a child; if the user is not a child, the system does not respond to the instruction. In other words, the method applies to a system that executes special functions in response to special operation instructions issued by specific groups of people. As shown in Fig. 1, the method includes the following steps:

Step 110: Preprocess the pre-acquired voice operation instruction to obtain feature tags corresponding to the voice operation instruction.

It should be noted that the description here assumes that the voice interaction system has already obtained the voice operation instruction issued by the user. The process by which the system collects external voice signals and processes them to obtain the voice operation instruction is not the focus of the present invention and can be implemented with existing techniques, so it is not described in detail here.

After the voice operation instruction is obtained, the corresponding preprocessing can be performed.

Optionally, the preprocessing may include analyzing the voice operation instruction, specifically:

Semantic recognition is performed on the pre-acquired voice operation instruction to obtain semantic text content. In a specific example, semantic recognition can be performed using natural language processing (NLP) technology. Semantic features are then extracted from the semantic text content according to the first preset rule.

Specifically, the first preset rule may be word segmentation: the semantic text content is segmented to obtain preset keywords, which serve as the semantic features. For example, when the voice operation instruction is "play the song First Lady by Zhang Jie", the keywords may include "Zhang Jie", "song" and "First Lady".
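Because Chinese text has no spaces between words, the word segmentation in the first preset rule needs a segmenter. The sketch below assumes forward maximum matching against a preset keyword dictionary, one classic dictionary-based segmentation method; the patent does not say which segmenter is used, and the dictionary contents are hypothetical. Only tokens that are dictionary keywords are kept as semantic features.

```python
from typing import Iterable, List


def extract_keywords(text: str, keywords: Iterable[str]) -> List[str]:
    """Segment `text` by forward maximum matching and return the keyword tokens."""
    vocab = set(keywords)
    max_len = max((len(w) for w in vocab), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink it until a dictionary word
        # (or, failing that, a single character) is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return [t for t in tokens if t in vocab]


# "播放一首张杰的歌曲第一夫人" roughly means "play the song First Lady by Zhang Jie".
print(extract_keywords("播放一首张杰的歌曲第一夫人", ["张杰", "歌曲", "第一夫人"]))
# ['张杰', '歌曲', '第一夫人']
```

Characters that match no keyword fall through as single-character tokens and are then discarded, which is how the action word and function words drop out of the result.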

Finally, feature tags corresponding to the semantic features are matched from the pre-established database.

Thus, when the semantic features are "Zhang Jie", "song" and "First Lady", they are first used for positioning, i.e. the category corresponding to the semantic features is matched in the pre-established database; here the category is clearly music. Feature tags corresponding to the semantic features are then looked up among the music-category tags. For example, the tags corresponding to "Zhang Jie" may include "singer" and "favorite singer of the post-80s and post-90s generations", the tag corresponding to "song" may be "song", and the tag corresponding to "First Lady" may be "song title".
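The two-stage lookup (first locate the category, then read the tags within that category) might look like the minimal sketch below. The nested-dictionary layout of the database, the rule of choosing the category with the most keyword hits, and the sample entries are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical pre-established database: category -> keyword -> feature tags.
TAG_DB: Dict[str, Dict[str, List[str]]] = {
    "music": {
        "张杰": ["singer", "favorite singer of the post-80s/post-90s"],
        "歌曲": ["song"],
        "第一夫人": ["song title"],
    },
    "story": {
        "故事": ["story"],
        "儿童": ["children"],
    },
}


def match_tags(features: List[str],
               db: Dict[str, Dict[str, List[str]]]) -> Tuple[Optional[str], List[str]]:
    """Locate the category with the most keyword hits, then collect its tags."""
    best_category, best_hits = None, 0
    for category, table in db.items():
        hits = sum(1 for f in features if f in table)
        if hits > best_hits:
            best_category, best_hits = category, hits
    if best_category is None:
        return None, []  # nothing matched: the caller can treat the instruction as invalid
    table = db[best_category]
    tags = [tag for f in features for tag in table.get(f, [])]
    return best_category, tags


print(match_tags(["张杰", "歌曲", "第一夫人"], TAG_DB))
# ('music', ['singer', 'favorite singer of the post-80s/post-90s', 'song', 'song title'])
```

An empty result corresponds to the missing-value case described next, in which the instruction is discarded.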

Optionally, although in the example above the voice operation instruction can be matched to corresponding feature tags, in other cases no corresponding feature tag can be matched, or the matched feature tags are invalid. This process can be understood as filtering missing values: when no semantics can be parsed from the user's voice operation instruction, or the parsed semantic instruction cannot be processed, the instruction needs to be filtered out.

For example, if a sentence the user says unintentionally during a conversation is picked up by the voice interaction system, it is unlikely to match any feature tag in the database; the voice operation instruction can then be determined to be invalid and subsequent operations stopped.

Alternatively, when at least one feature tag corresponding to the semantic features is matched from the pre-established database, for example 2 or 3 tags, the feature tags still need to be filtered: tags that do not conform to convention are deleted, and the valid feature tags are input into the pre-established training model for prediction. The screening can follow the second preset rule, which may specifically include processes such as filtering outliers, filtering duplicate data and filtering noise data.

Here, filtering outliers means removing feature tags whose feature vectors (computed from the tags) deviate too far from the group; in this embodiment, that is, whose distance from the cluster centre exceeds a preset threshold. Filtering duplicate data means that if the collected instructions are the same instruction issued at the same time, their feature tags are necessarily duplicates as well, and the duplicated feature tags are filtered out. Filtering noise data includes using a regression algorithm to find outliers and remove the noise.
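The three filters of the second preset rule can be sketched as follows. This is only one possible reading, under stated assumptions: the record layout (timestamp, instruction text, feature vector), Euclidean distance to the centroid for the outlier filter, an ordinary least-squares line for the regression-based noise filter, and all thresholds are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical record: (timestamp, instruction text, feature vector built from its tags)
Record = Tuple[float, str, Tuple[float, ...]]


def drop_duplicates(records: Sequence[Record]) -> List[Record]:
    """The same instruction at the same time yields duplicate tags; keep one copy."""
    seen = set()
    kept: List[Record] = []
    for record in records:
        key = (record[0], record[1])
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept


def drop_outliers(records: Sequence[Record], threshold: float) -> List[Record]:
    """Drop records whose feature vector lies farther than `threshold` from the centroid."""
    if not records:
        return []
    dim = len(records[0][2])
    centroid = [sum(r[2][i] for r in records) / len(records) for i in range(dim)]
    return [r for r in records if math.dist(r[2], centroid) <= threshold]


def drop_regression_noise(points: Sequence[Tuple[float, float]],
                          max_residual: float) -> List[Tuple[float, float]]:
    """Fit y = a + b*x by least squares and drop points far from the fitted line."""
    n = len(points)
    if n < 2:
        return list(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return list(points)
    b = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    a = mean_y - b * mean_x
    return [(x, y) for x, y in points if abs(y - (a + b * x)) <= max_residual]
```

A plain least-squares fit is itself pulled toward large outliers, so a robust regression may be preferable in practice; the simple fit keeps the sketch short.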

The processed voice operation instruction is stored in text form, i.e. the voice operation instruction is converted into text before being stored. The feature tags are then input into the pre-established training model for prediction, i.e. step 120 is performed.

Step 120: Input the feature tags into the pre-established training model for prediction to obtain a prediction result.

Specifically, the pre-established training model is the optimal training model obtained after training on a large amount of sample data. To train it, known identity labels must be assigned to the sample data, for example elderly person, middle-aged person, young person and child, or other identity labels. Optionally, the training model may be a decision tree model. The training process may specifically include:

A plurality of voice operation instruction samples issued by different users are received and stored.

The voice operation instruction samples are classified by voiceprint recognition technology, with the samples issued by the same user grouped into one class; after classification, each class of samples is labeled with its user type.

Each voice operation instruction sample is preprocessed to obtain its corresponding feature tags. The processing is similar to that described in step 110 and is not repeated here.

Then, the feature tags corresponding to the voice operation instruction samples are input into the training model for training until an optimal training model is obtained that can accurately identify the feature tags corresponding to the voice operation instruction samples of a given user type; this optimal training model is used as the pre-established training model for that user type.
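The patent names a decision tree model but gives no details of it. The sketch below therefore assumes a minimal ID3-style tree (information gain over binary "tag present" features), written in plain Python so it runs without dependencies; the sample tags and identity labels are invented. A production system would more likely use an established library and select the "optimal" model by its accuracy on held-out samples.

```python
import math
from collections import Counter
from typing import FrozenSet, List, Set, Tuple, Union

Sample = Tuple[FrozenSet[str], str]   # (feature tags, identity label)
Tree = Union[str, tuple]              # a leaf label, or (tag, subtree_if_present, subtree_if_absent)


def entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(samples: List[Sample], tags: Set[str], depth: int = 0, max_depth: int = 5) -> Tree:
    labels = [label for _, label in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not tags or depth == max_depth:
        return majority
    base = entropy(labels)
    best_tag, best_gain = None, 0.0
    for tag in sorted(tags):  # sorted for deterministic tie-breaking
        present = [y for x, y in samples if tag in x]
        absent = [y for x, y in samples if tag not in x]
        if not present or not absent:
            continue  # this tag does not split the current samples
        remainder = (len(present) * entropy(present) + len(absent) * entropy(absent)) / len(samples)
        gain = base - remainder
        if gain > best_gain:
            best_tag, best_gain = tag, gain
    if best_tag is None:
        return majority
    rest = tags - {best_tag}
    yes = [s for s in samples if best_tag in s[0]]
    no = [s for s in samples if best_tag not in s[0]]
    return (best_tag,
            build_tree(yes, rest, depth + 1, max_depth),
            build_tree(no, rest, depth + 1, max_depth))


def predict(tree: Tree, tags: Set[str]) -> str:
    while isinstance(tree, tuple):
        tag, if_present, if_absent = tree
        tree = if_present if tag in tags else if_absent
    return tree


training: List[Sample] = [
    (frozenset({"cartoon", "story"}), "child"),
    (frozenset({"story", "bedtime"}), "child"),
    (frozenset({"nursery rhyme"}), "child"),
    (frozenset({"news", "stock"}), "adult"),
    (frozenset({"singer", "song"}), "adult"),
    (frozenset({"opera"}), "elderly"),
]
all_tags = set().union(*(x for x, _ in training))
tree = build_tree(training, all_tags)
print(predict(tree, {"story"}), predict(tree, {"news"}), predict(tree, {"opera"}))
# child adult elderly
```

For the per-user models described later, the same routine would simply be trained once per user type on that type's labeled samples.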

Finally, once the optimal training model is obtained, the feature tags obtained in step 110 are input into it to predict the specific identity of the user corresponding to the voice operation instruction acquired in step 110, i.e. step 130 is performed.

Step 130: Determine, according to the prediction result, the specific identity of the user who issued the voice operation instruction.

After the user's specific identity is determined to be valid, the corresponding operation is executed according to the voice operation instruction. Whether the identity is valid is judged by whether that identity has permission to make the voice interaction system execute the operation instruction. If it has permission, the voice interaction system executes the voice operation instruction; otherwise it does not. Optionally, the system can also send the user a prompt indicating that the user lacks permission to have the corresponding action executed for the issued operation instruction.
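The permission check can be sketched as a lookup from operation to the identities allowed to trigger it. The table contents, the rule that unlisted operations are open to everyone, and the message wording are assumptions for illustration only.

```python
from typing import Dict, Set, Tuple

# Hypothetical permission table: operation -> identities allowed to trigger it.
# Operations that are not listed are assumed to be open to everyone.
PERMISSIONS: Dict[str, Set[str]] = {
    "play_children_audiobook": {"child"},
}


def dispatch(operation: str, identity: str, permissions: Dict[str, Set[str]]) -> Tuple[bool, str]:
    """Return (executed, message) for an operation requested by a user of `identity`."""
    allowed = permissions.get(operation)
    if allowed is not None and identity not in allowed:
        # Optional feedback prompt when the identity lacks permission.
        return False, f"Sorry, you are not permitted to run '{operation}'."
    return True, f"Executing '{operation}'."


print(dispatch("play_children_audiobook", "child", PERMISSIONS))
print(dispatch("play_children_audiobook", "adult", PERMISSIONS))
```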

Optionally, the following steps also need to be performed before the above method, i.e. before step 110:

The user type corresponding to the pre-acquired voice operation instruction is obtained by voiceprint recognition technology, and the pre-established training model is then determined according to the obtained user type.

Specifically, voiceprint features are extracted from each voice operation instruction by voiceprint recognition technology, a group of feature description vectors is formed from the voiceprint features, and unique identification information, such as an ID, is generated for this group of feature vectors. The identification information is matched against the identification information in the pre-established database to determine the type to which the voice operation instruction belongs. Note that the type in this embodiment is distinguished per user, e.g. user A, user B, user C and so on. The voice operation instructions of user A are stored in a first preset storage location, those of user B in a second preset storage location, and so on. After the user type is determined, the pre-established training model also needs to be determined according to the user type.
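A minimal sketch of the matching and per-user storage, assuming that each voice operation instruction has already been turned into a fixed-length voiceprint vector by some speaker-embedding front end (not shown). Cosine similarity, the 0.8 threshold and the dictionary used as "preset storage locations" are all assumptions; the patent does not specify how identification information is compared.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_user(vector: Sequence[float], enrolled: Dict[str, Sequence[float]],
               threshold: float = 0.8) -> Optional[str]:
    """Return the enrolled user ID with the most similar voiceprint, if above threshold."""
    best_id, best_score = None, threshold
    for user_id, reference in enrolled.items():
        score = cosine_similarity(vector, reference)
        if score >= best_score:
            best_id, best_score = user_id, score
    return best_id


def store_instruction(text: str, vector: Sequence[float],
                      enrolled: Dict[str, Sequence[float]],
                      storage: Dict[str, List[str]]) -> Optional[str]:
    """Store the instruction text in the bucket of the matched user (one bucket per user)."""
    user_id = match_user(vector, enrolled)
    if user_id is not None:
        storage.setdefault(user_id, []).append(text)
    return user_id


enrolled = {"user_A": [1.0, 0.0, 0.0], "user_B": [0.0, 1.0, 0.0]}
storage: Dict[str, List[str]] = {}
print(store_instruction("play music", [0.1, 0.95, 0.0], enrolled, storage), storage)
```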

Specifically, in this application, different user types correspond to different training models. This mainly takes into account that in certain special groups, for example within one family, an adult's voice may sound like a child's. How, then, can the user's identity be identified correctly from the features of that user's voice operation instructions rather than being misidentified as a child?

The solution is to establish a training model corresponding to that user identity: the user's voice operation instruction samples are specially marked, and after feature tags are extracted from them and from the voice operation instructions of other users, these tags are input into the training model corresponding to that user identity for training, until that training model can accurately identify the user's identity.

In other words, to determine the identity characteristics of each user more accurately, a corresponding training model can be built for each user. The specific construction process has been described in detail above and is not repeated here.

Naturally, after the user type corresponding to the voice operation instruction has been determined by voiceprint recognition technology, the corresponding pre-established training model needs to be selected for each user type. In this way, the specific identity of each class of user can be identified accurately, and according to that identity it is determined whether the user has permission to make the voice interaction system execute such an operation instruction.
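Selecting the model per user type amounts to a registry keyed by the voiceprint-derived user type. In the minimal sketch below, the fallback to a generic model for users without a dedicated model and the two toy models are assumptions; the toy case mirrors the adult with a child-like voice described above.

```python
from typing import Callable, Dict, FrozenSet

Model = Callable[[FrozenSet[str]], str]  # feature tags -> predicted identity


def identify_with_user_model(user_type: str, tags: FrozenSet[str],
                             models: Dict[str, Model], fallback: Model) -> str:
    """Use the model trained for this voiceprint user type, else a generic model."""
    model = models.get(user_type, fallback)
    return model(tags)


# Toy models: the generic one treats story requests as coming from a child,
# while the model trained for user_C (an adult with a child-like voice) does not.
def generic_model(tags: FrozenSet[str]) -> str:
    return "child" if "story" in tags else "adult"


def user_c_model(tags: FrozenSet[str]) -> str:
    return "adult"


models = {"user_C": user_c_model}
print(identify_with_user_model("user_C", frozenset({"story"}), models, generic_model))  # adult
print(identify_with_user_model("user_X", frozenset({"story"}), models, generic_model))  # child
```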

Further optionally, if several users who have permission ask the voice interaction system to execute their respective operation instructions at the same time, and this exceeds the range of functions the system can perform simultaneously, the system may issue an operation-instruction error response.
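The patent states only that an error response is issued when simultaneous requests from permitted users exceed what the system can perform at once. The sketch below assumes a simple numeric `capacity` and a first-come, first-served policy in which the requests beyond capacity receive the error; both are hypothetical choices.

```python
from typing import List, Tuple


def respond(requests: List[Tuple[str, str]], capacity: int) -> List[Tuple[str, str]]:
    """requests: (user_id, operation) pairs from permitted users arriving at the same moment."""
    responses: List[Tuple[str, str]] = []
    for index, (user_id, operation) in enumerate(requests):
        if index < capacity:
            responses.append((user_id, f"EXECUTE {operation}"))
        else:
            # Beyond what the system can do simultaneously: operation-instruction error.
            responses.append((user_id, f"ERROR: cannot run {operation} now"))
    return responses


print(respond([("A", "play_music"), ("B", "play_children_audiobook")], capacity=1))
```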

With the method for identifying a user's specific identity based on a voice operation instruction provided by this embodiment of the present invention, the voice operation instruction issued by the user is preprocessed to obtain the corresponding feature tags, which are then input into a pre-established training model for prediction. Having been trained on a large amount of sample data, the training model can identify users of different specific identities from their voice operation instructions; for example, the specific identity may be elderly person, middle-aged person, young person or child. Suppose the user's voice operation instruction is to play a children's audiobook: if the method identifies the user as a child, the playback function is executed; otherwise it is not. In this way, the user's specific identity can be identified effectively, and the voice operation instruction is executed only when the identity is determined to correspond to it, so that special functions can be provided to specific groups of people.

Corresponding to Embodiment 1 above, Embodiment 2 of the present invention further provides an apparatus for identifying a user's specific identity based on a voice operation instruction. As shown in Fig. 2, the apparatus includes a processing unit 201, a prediction unit 202 and an identity recognition unit 203.

The processing unit 201 is configured to preprocess a pre-acquired voice operation instruction to obtain feature tags corresponding to the voice operation instruction;

The prediction unit 202 is configured to input the feature tags into a pre-established training model for prediction to obtain a prediction result;

The identity recognition unit 203 is configured to determine, according to the prediction result, the specific identity of the user who issued the voice operation instruction.

Optionally, processing unit 201 is specifically used for, and carries out semantics recognition processing to the voice operating instruction of pre-acquiring, obtains Take semantic text content；

Optionally, processing unit 201 is specifically used for, corresponding with semantic feature when not being matched to from pre-established database Feature tag when, determine voice operating instruction ignore, stop executing subsequent operation；

Optionally, processing unit is also used to, and is pre-processed, is obtained and institute's predicate in the voice operating instruction to pre-acquiring Before the corresponding feature tag of sound operational order, by sound groove recognition technology in e, the voice operating instruction for obtaining pre-acquiring is corresponding User type；

Optionally, device further include: taxon 204 and receiving unit 205；

Receiving unit 205, for receiving and storing a plurality of voice operating instruction sample, a plurality of voice operating instruction sample by Different user issues；

Taxon 204 will be same for classifying to voice operating instruction sample according to sound groove recognition technology in e The voice operating instruction sample that one user issues is classified as one kind, and operational order sample of users type is marked；

Processing unit 201 is also used to, and is pre-processed respectively to each voice operating instruction sample, is obtained and grasp with voice Make the corresponding feature tag of instruction sample；

Optionally, pre-established training pattern is decision-tree model.

The functions performed by each component of the apparatus for identifying a user's specific identity based on a voice operation instruction provided by this embodiment have been described in detail in Embodiment 1 above and are not repeated here.

With the apparatus for identifying a user's specific identity based on a voice operation instruction provided by this embodiment of the present invention, the voice operation instruction issued by the user is preprocessed to obtain the corresponding feature tags, which are then input into a pre-established training model for prediction. Having been trained on a large amount of sample data, the training model can identify users of different specific identities from their voice operation instructions; for example, the specific identity may be elderly person, middle-aged person, young person or child. Suppose the user's voice operation instruction is to play a children's audiobook: if the apparatus identifies the user as a child, the playback function is executed; otherwise it is not. In this way, the user's specific identity can be identified effectively, and the voice operation instruction is executed only when the identity is determined to correspond to it, so that special functions can be provided to specific groups of people.

Corresponding to the above embodiments, Embodiment 3 of the present invention further provides a system for identifying a user's specific identity based on a voice operation instruction. As shown in Fig. 3, the system includes a processor 301 and a memory 302;

The memory 302 is configured to store one or more program instructions;

The processor 301 is configured to run the one or more program instructions to perform any of the method steps of the method for identifying a user's specific identity based on a voice operation instruction described in the above embodiment.

With the system for identifying a user's specific identity based on a voice operation instruction provided by this embodiment of the present invention, the voice operation instruction issued by the user is preprocessed to obtain the corresponding feature tags, which are then input into a pre-established training model for prediction. Having been trained on a large amount of sample data, the training model can identify users of different specific identities from their voice operation instructions; for example, the specific identity may be elderly person, middle-aged person, young person or child. Suppose the user's voice operation instruction is to play a children's audiobook: if the system identifies the user as a child, the playback function is executed; otherwise it is not. In this way, the user's specific identity can be identified effectively, and the voice operation instruction is executed only when the identity is determined to correspond to it, so that special functions can be provided to specific groups of people.

Corresponding to the above embodiments, an embodiment of the present invention further provides a computer storage medium containing one or more program instructions, which are used by a system for identifying a user's specific identity based on a voice operation instruction to perform the method for identifying a user's specific identity based on a voice operation instruction described above.

Although the present invention has been described in detail above with general descriptions and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on the basis of the present invention. Therefore, such modifications or improvements made without departing from the spirit of the present invention fall within the scope of protection claimed by the present invention.

Claims

1. A method for identifying a user's specific identity based on a voice operating instruction, characterized in that the method comprises:

preprocessing a pre-acquired voice operating instruction to obtain feature tags corresponding to the voice operating instruction;

inputting the feature tags into a pre-established training model for prediction to obtain a prediction result; and

determining, according to the prediction result, the specific identity of the user who issued the voice operating instruction.
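The three steps of claim 1 compose into a simple pipeline. The sketch below assumes the instruction has already been transcribed to text and that the model returns an identity label directly, so the prediction result is taken as the identity; `toy_preprocess` and `toy_model` are hypothetical stand-ins that exist only to make the example runnable.

```python
from typing import Callable, List


def identify_specific_identity(
    instruction_text: str,
    preprocess: Callable[[str], List[str]],
    model: Callable[[List[str]], str],
) -> str:
    """Instruction -> feature tags -> prediction result -> specific identity."""
    tags = preprocess(instruction_text)  # step 1: obtain feature tags
    prediction = model(tags)             # step 2: predict with the pre-established model
    return prediction                    # step 3 (assumed): prediction is the identity label


# Hypothetical stand-ins, not the patent's preprocessing or model.
def toy_preprocess(text: str) -> List[str]:
    return text.lower().split()


def toy_model(tags: List[str]) -> str:
    return "child" if "cartoon" in tags else "adult"


print(identify_specific_identity("Play a cartoon song", toy_preprocess, toy_model))  # child
```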

2. The method according to claim 1, characterized in that preprocessing the pre-acquired voice operating instruction to obtain the feature tags corresponding to the voice operating instruction specifically comprises:

performing semantic recognition on the pre-acquired voice operating instruction to obtain semantic text content;

extracting semantic features from the semantic text content according to a first preset rule; and

matching, from a pre-established database, the feature tags corresponding to the semantic features.
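Claim 2 leaves the "first preset rule" and the database contents unspecified. As a minimal sketch, the snippet below assumes the rule is "keep words from a fixed vocabulary" and models the pre-established database as an in-memory dictionary; both `FEATURE_VOCABULARY` and `TAG_DATABASE` are hypothetical. It starts from the semantic text content, so the speech recognition step is not shown.

```python
import re
from typing import Dict, List

# Hypothetical first preset rule: only these words count as semantic features.
FEATURE_VOCABULARY = {"cartoon", "story", "news", "opera", "homework"}

# Hypothetical pre-established database: semantic feature -> feature tags.
TAG_DATABASE: Dict[str, List[str]] = {
    "cartoon": ["children_content"],
    "homework": ["education", "children_content"],
    "opera": ["traditional_music"],
    "news": ["news"],
}


def extract_semantic_features(text: str) -> List[str]:
    """Apply the assumed first preset rule to the recognized text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w in FEATURE_VOCABULARY]


def match_feature_tags(features: List[str]) -> List[str]:
    """Look up each feature; keep tags in first-seen order without duplicates."""
    tags: List[str] = []
    for feature in features:
        for tag in TAG_DATABASE.get(feature, []):
            if tag not in tags:
                tags.append(tag)
    return tags


features = extract_semantic_features("Help me with my homework, then play a cartoon")
print(features)                      # ['homework', 'cartoon']
print(match_feature_tags(features))  # ['education', 'children_content']
```

Note that "story" passes the vocabulary rule but has no database entry, so it yields no tag; that is the situation claim 3 addresses.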

3. The method according to claim 2, characterized in that, when no feature tag corresponding to the semantic features is matched from the pre-established database, the voice operating instruction is determined to be ignored and execution of subsequent operations stops;

or, when at least one feature tag corresponding to the semantic features is matched from the pre-established database, valid feature tags are screened from the at least one feature tag according to a second preset rule, so that the valid feature tags are subsequently input into the pre-established training model for prediction.
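The two branches of claim 3 can be sketched as follows. The "second preset rule" is not defined in the source. Here it is assumed to keep only tags the model was trained on (the hypothetical `MODEL_TAGS`). The sketch also assumes that an instruction is ignored when screening leaves no valid tag, a case the claim does not address.

```python
from typing import List, Optional

# Hypothetical second preset rule: only tags known to the model are valid.
MODEL_TAGS = {"children_content", "education", "news", "traditional_music"}


def screen_feature_tags(matched_tags: List[str]) -> Optional[List[str]]:
    """Return the valid tags for prediction, or None when the instruction is ignored."""
    if not matched_tags:
        # No tag matched in the database: ignore the instruction, stop processing.
        return None
    valid = [tag for tag in matched_tags if tag in MODEL_TAGS]
    # Assumption: if screening removes every tag, the instruction is also ignored.
    return valid or None


print(screen_feature_tags([]))                      # None
print(screen_feature_tags(["education", "weather"]))  # ['education']
```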

4. The method according to any one of claims 1 to 3, characterized in that, before the pre-acquired voice operating instruction is preprocessed to obtain the feature tags corresponding to the voice operating instruction, the method further comprises:

5. The method according to any one of claims 1 to 3, characterized in that constructing the pre-established training model comprises:

receiving and storing a plurality of voice operating instruction samples issued by different users;

classifying the voice operating instruction samples by voiceprint recognition technology, grouping the samples issued by the same user into one class, and labeling each class of samples with its user type; and

preprocessing each voice operating instruction sample to obtain the feature tags corresponding to that sample, and inputting those feature tags into a training model for training until an optimal training model is obtained that accurately identifies the feature tags corresponding to the voice operating instruction samples of a given user type, the optimal training model then being used as the pre-established training model for that user type.
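The data-preparation part of claim 5 (group samples by speaker, label each group with a user type, preprocess every sample into feature tags) can be sketched as below. The voiceprint step is stubbed: each sample is assumed to already carry a speaker cluster id (the hypothetical `voiceprint_id`), and the speaker-to-user-type mapping is assumed to come from manual labeling. The output is a flat list of `(feature tags, user type)` pairs that a classifier can be trained on.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class InstructionSample:
    text: str           # recognized text of a voice operating instruction sample
    voiceprint_id: str  # hypothetical: speaker cluster id from voiceprint recognition


def build_training_set(
    samples: List[InstructionSample],
    user_type_of_speaker: Dict[str, str],
    preprocess: Callable[[str], List[str]],
) -> List[Tuple[List[str], str]]:
    """Group samples per speaker, label each group, and emit (tags, user type) pairs."""
    by_speaker: Dict[str, List[InstructionSample]] = defaultdict(list)
    for sample in samples:
        by_speaker[sample.voiceprint_id].append(sample)

    labeled: List[Tuple[List[str], str]] = []
    for speaker, speaker_samples in by_speaker.items():
        user_type = user_type_of_speaker[speaker]  # label for the whole speaker class
        for sample in speaker_samples:
            labeled.append((preprocess(sample.text), user_type))
    return labeled


samples = [
    InstructionSample("play cartoon", "v1"),
    InstructionSample("play opera", "v2"),
    InstructionSample("homework help", "v1"),
]
print(build_training_set(samples, {"v1": "child", "v2": "elderly"}, lambda t: t.lower().split()))
```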

6. The method according to any one of claims 1 to 3, characterized in that the pre-established training model is a decision tree model.
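Claim 6 names a decision tree model but does not specify the algorithm. The sketch below assumes an ID3-style tree (information-gain splits) over binary "tag present / absent" features, written in plain Python so it runs without external libraries. The tags and labels in the example are hypothetical.

```python
import math
from collections import Counter
from typing import FrozenSet, List, Tuple, Union

Sample = Tuple[FrozenSet[str], str]            # (feature tags, user type label)
Tree = Union[str, Tuple[str, "Tree", "Tree"]]  # leaf label or (tag, if_present, if_absent)


def entropy(labels: List[str]) -> float:
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def majority(labels: List[str]) -> str:
    return Counter(labels).most_common(1)[0][0]


def build_tree(samples: List[Sample], tags: FrozenSet[str]) -> Tree:
    labels = [label for _, label in samples]
    if len(set(labels)) == 1 or not tags:
        return majority(labels)
    base = entropy(labels)
    best_tag, best_gain = None, 0.0
    for tag in sorted(tags):  # sorted for deterministic tie-breaking
        present = [label for feats, label in samples if tag in feats]
        absent = [label for feats, label in samples if tag not in feats]
        if not present or not absent:
            continue  # this tag does not split the samples
        remainder = (len(present) * entropy(present) + len(absent) * entropy(absent)) / len(samples)
        gain = base - remainder  # information gain of splitting on this tag
        if gain > best_gain:
            best_tag, best_gain = tag, gain
    if best_tag is None:
        return majority(labels)
    rest = tags - {best_tag}
    with_tag = [s for s in samples if best_tag in s[0]]
    without_tag = [s for s in samples if best_tag not in s[0]]
    return (best_tag, build_tree(with_tag, rest), build_tree(without_tag, rest))


def predict(tree: Tree, feats: FrozenSet[str]) -> str:
    while not isinstance(tree, str):
        tag, if_present, if_absent = tree
        tree = if_present if tag in feats else if_absent
    return tree


training: List[Sample] = [
    (frozenset({"children_content"}), "child"),
    (frozenset({"children_content", "education"}), "child"),
    (frozenset({"news"}), "adult"),
    (frozenset({"traditional_music", "news"}), "elderly"),
    (frozenset({"traditional_music"}), "elderly"),
]
all_tags = frozenset().union(*(feats for feats, _ in training))
tree = build_tree(training, all_tags)
print(tree)  # ('children_content', 'child', ('traditional_music', 'elderly', 'adult'))
print(predict(tree, frozenset({"news"})))  # adult
```

A library implementation such as a CART-based classifier would serve the same role; ID3 is used here only because it fits in a short, dependency-free example.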

7. A device for identifying a user's specific identity based on a voice operating instruction, characterized in that the device comprises:

a predicting unit, configured to input the feature tags into a pre-established training model for prediction to obtain a prediction result; and

an identity recognizing unit, configured to determine, according to the prediction result, the specific identity of the user who issued the voice operating instruction.

8. The device according to claim 7, characterized in that the processing unit is further configured to obtain, by voiceprint recognition technology, the user type corresponding to the pre-acquired voice operating instruction before the pre-acquired voice operating instruction is preprocessed to obtain the feature tags corresponding to the voice operating instruction.
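Claim 8 does not describe how voiceprint recognition yields a user type. One common approach, assumed here for illustration, compares a speaker embedding of the incoming instruction with enrolled reference embeddings by cosine similarity and returns the user type of the best match above a threshold. The embedding extractor is not shown, and the enrolled vectors, speaker names, and the 0.8 threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def user_type_by_voiceprint(
    embedding: List[float],
    enrolled: Dict[str, Tuple[List[float], str]],  # speaker -> (reference embedding, user type)
    threshold: float = 0.8,
) -> Optional[str]:
    """Return the user type of the closest enrolled speaker, or None if none is close enough."""
    best_type, best_score = None, threshold
    for reference, user_type in enrolled.values():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_type, best_score = user_type, score
    return best_type


enrolled = {"spk1": ([1.0, 0.0], "child"), "spk2": ([0.0, 1.0], "elderly")}
print(user_type_by_voiceprint([0.9, 0.1], enrolled))  # child
print(user_type_by_voiceprint([0.7, 0.7], enrolled))  # None
```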

9. A system for identifying a user's specific identity based on a voice operating instruction, characterized in that the system comprises a processor and a memory;

the memory is configured to store one or more program instructions; and

the processor is configured to run the one or more program instructions to perform the method according to any one of claims 1 to 6.

10. A computer storage medium, characterized in that the computer storage medium contains one or more program instructions, the one or more program instructions being executed by a system for identifying a user's specific identity based on a voice operating instruction to perform the method according to any one of claims 1 to 6.