CN110211578A

CN110211578A - Speaker control method, device and equipment

Info

Publication number: CN110211578A
Application number: CN201910304851.8A
Authority: CN
Inventors: 戚耀文
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2019-04-16
Filing date: 2019-04-16
Publication date: 2019-09-06
Anticipated expiration: 2039-04-16
Also published as: CN110211578B

Abstract

Embodiments of the present invention provide a speaker control method, device and equipment. The method comprises: after at least two speakers detect default voice messaging, obtaining the radio reception energy information of the at least two speakers, where the radio reception energy information indicates the sound size at which a speaker receives the default voice messaging; and determining, according to the radio reception energy information of the at least two speakers, a target speaker among the at least two speakers, and waking up the target speaker. This improves the accuracy of speaker control.

Description

Speaker control method, device and equipment

Technical field

Embodiments of the present invention relate to the field of computer technology, and in particular to a speaker control method, device and equipment.

Background technique

Currently, smart speakers are deployed in many scenes (such as home scenes and laboratory scenes), and users can control the smart speakers by voice.

In practical applications, multiple smart speakers may be deployed in the same scene. When a user needs to wake up one of the smart speakers, the user can say the wake-up word. However, when the user's voice is too quiet, the speaker cannot detect it, so the smart speaker cannot be woken up. When the user's voice is too loud, other smart speakers may be woken up along with the intended one, so that smart speakers are falsely woken up. It follows that the accuracy of controlling smart speakers is poor.

Summary of the invention

Embodiments of the present invention provide a speaker control method, device and equipment that improve the accuracy of speaker control.

In a first aspect, an embodiment of the present invention provides a speaker control method, comprising:

after at least two speakers detect default voice messaging, obtaining the radio reception energy information of the at least two speakers, where the radio reception energy information indicates the sound size at which a speaker receives the default voice messaging; and

determining, according to the radio reception energy information of the at least two speakers, a target speaker among the at least two speakers, and waking up the target speaker.

In a possible embodiment, at least two microphones are provided in each speaker, and the radio reception energy information includes the radio reception energy value at which each microphone receives the default voice messaging. Determining the target speaker among the at least two speakers according to the radio reception energy information of the at least two speakers comprises:

determining the radio reception average energy of each speaker according to the radio reception energy values at which the at least two microphones in that speaker receive the default voice messaging; and

determining the target speaker among the at least two speakers according to the radio reception average energy of each speaker.

In a possible embodiment, determining the target speaker among the at least two speakers according to the radio reception average energy of each speaker comprises:

determining at least one first speaker among the at least two speakers according to the radio reception average energy of each speaker, where the first speaker has the maximum radio reception average energy among the at least two speakers; and

determining the target speaker among the at least one first speaker.

In a possible embodiment, determining the target speaker among the at least one first speaker comprises:

when the number of the at least one first speaker is 1, determining the at least one first speaker as the target speaker; and

when the number of the at least one first speaker is greater than 1, obtaining the maximum radio reception energy value corresponding to each first speaker, determining at least one second speaker among the at least one first speaker according to the maximum radio reception energy value corresponding to each first speaker, and determining the target speaker among the at least one second speaker; where the maximum radio reception energy value is the maximum of the radio reception energy values of the at least two microphones in the first speaker, and the second speaker has the largest maximum radio reception energy value among the at least one first speaker.

In a possible embodiment, determining the target speaker among the at least one second speaker comprises:

when the number of the at least one second speaker is 1, determining the at least one second speaker as the target speaker; and

when the number of the at least one second speaker is greater than 1, obtaining, for each second speaker, the radio reception energy difference between the microphone with the maximum radio reception energy value and its adjacent microphone, and determining the target speaker according to the radio reception energy difference corresponding to each second speaker.

In a possible embodiment, determining the target speaker according to the radio reception energy difference corresponding to each second speaker comprises:

determining, among the at least one second speaker, at least one third speaker with the smallest radio reception energy difference;

when the number of the at least one third speaker is 1, determining the at least one third speaker as the target speaker; and

when the number of the at least one third speaker is greater than 1, determining any one of the at least one third speaker as the target speaker.

In a possible embodiment, the method further comprises:

after the at least two speakers detect the default voice messaging, obtaining the default voice messaging;

obtaining the default vocal print corresponding to each speaker and the vocal print corresponding to the default voice messaging; and

determining, according to the default vocal print corresponding to each speaker and the vocal print corresponding to the default voice messaging, a target speaker among the at least two speakers, and waking up the target speaker, where the vocal print of the target speaker matches the vocal print corresponding to the default voice messaging.

In a possible embodiment, the at least two speakers are located in the same local area network.

In a possible embodiment, the at least two speakers are smart speakers.

In a second aspect, an embodiment of the present invention provides a speaker control device, comprising a first obtaining module, a determining module and a wake-up module, wherein:

the first obtaining module is configured to, after at least two speakers detect default voice messaging, obtain the radio reception energy information of the at least two speakers, where the radio reception energy information indicates the sound size at which a speaker receives the default voice messaging;

the determining module is configured to determine a target speaker among the at least two speakers according to the radio reception energy information of the at least two speakers; and

the wake-up module is configured to wake up the target speaker.

In a possible embodiment, at least two microphones are provided in each speaker, and the radio reception energy information includes the radio reception energy value at which each microphone receives the default voice messaging; the determining module is specifically configured to:

In a possible embodiment, the determining module is specifically configured to:

determine the target speaker among the at least one first speaker.

In a possible embodiment, the determining module is specifically configured to:

In a possible embodiment, the device further comprises a second obtaining module, wherein:

the second obtaining module is configured to, after the at least two speakers detect the default voice messaging, obtain the default voice messaging, and obtain the default vocal print corresponding to each speaker and the vocal print corresponding to the default voice messaging;

the determining module is further configured to determine, according to the default vocal print corresponding to each speaker and the vocal print corresponding to the default voice messaging, a target speaker among the at least two speakers, where the vocal print of the target speaker matches the vocal print corresponding to the default voice messaging; and

the wake-up module is further configured to wake up the target speaker.

In a third aspect, an embodiment of the present invention provides a speaker control device, comprising at least one processor and a memory, wherein:

the memory stores computer-executable instructions; and

the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the speaker control method according to any one of the first aspect.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the speaker control method according to any one of the first aspect.

With the speaker control method, device and equipment provided by the embodiments of the present invention, after at least two speakers detect default voice messaging, the server can obtain the radio reception energy information of the at least two speakers, determine a target speaker among them according to that information, and wake up the target speaker. In this process, even if multiple speakers detect the user's default voice messaging at the same time, the server can still select the one target speaker with the best radio reception effect among them and wake up only that speaker. This avoids unnecessarily waking up too many speakers when the user's voice is too loud, reduces the probability that a speaker is falsely woken up, and thereby improves the accuracy of speaker control.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic diagram of an application scenario of the speaker control method provided by an embodiment of the present invention;

Fig. 2 is a flow diagram of a speaker control method provided by an embodiment of the present invention;

Fig. 3A is a schematic diagram of a speaker provided by an embodiment of the present invention;

Fig. 3B is a schematic diagram of a speaker provided by an embodiment of the present invention;

Fig. 4 is a flow diagram of another speaker control method provided by an embodiment of the present invention;

Fig. 5 is a flow diagram of a speaker control method provided by an embodiment of the present invention;

Fig. 6 is a flow diagram of another speaker control method provided by an embodiment of the present invention;

Fig. 7 is a schematic diagram of speakers provided by an embodiment of the present invention;

Fig. 8 is a structural diagram of a speaker control device provided by an embodiment of the present invention;

Fig. 9 is a structural diagram of another speaker control device provided by an embodiment of the present invention;

Fig. 10 is a hardware structural diagram of a speaker control device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Fig. 1 is a schematic diagram of an application scenario of the speaker control method provided by an embodiment of the present invention. Referring to Fig. 1, the scenario includes multiple speakers (for example, speaker 1, speaker 2, speaker 3 and speaker 4) and a server. The speakers are located in the same local area network, and each speaker can communicate with the server. A speaker has two states, a dormant state and a wake-up state. In the dormant state, the speaker can perform audio monitoring; after detecting default voice messaging, the speaker can obtain the radio reception energy information for the received default voice messaging, which indicates the sound size at which the speaker receives it. The speaker sends the radio reception energy information to the server. The server selects one of the speakers according to the radio reception energy information of the speakers and wakes up the selected speaker. Once woken up, the speaker can play audio.

In this application, even if multiple speakers detect the user's default voice messaging at the same time, the server can still select the one speaker with the best speech recognition effect among them and wake up the selected speaker. This avoids unnecessarily waking up too many speakers when the user's voice is too loud, reduces the probability that a speaker is falsely woken up, and thereby improves the accuracy of smart speaker control.

The technical solutions of this application are described in detail below through specific embodiments. It should be noted that the following embodiments can be combined with each other, and identical or similar content is not repeated across embodiments.

Fig. 2 is a flow diagram of a speaker control method provided by an embodiment of the present invention. Referring to Fig. 2, the method may include:

S201, after at least two speakers detect default voice messaging, obtain the radio reception energy information of the at least two speakers.

The executing subject of this embodiment may be a server, or a speaker control device arranged in the server. Optionally, the speaker control device may be implemented in software, or in a combination of software and hardware.

Optionally, the speakers involved in the embodiments of the present invention may be smart speakers, that is, speakers that at least have functions such as audio monitoring, speech recognition, voice information processing and communication with a server.

Optionally, the at least two speakers are located in the same local area network, that is, they access the same local area network and communicate with the server through it.

Optionally, the at least two speakers are at different positions within the local area network. For example, the at least two speakers may be placed at different positions in a home.

The default voice messaging is the wake-up word used to wake up a speaker. For example, the default voice messaging may be ", small degree speaker", "hello, speaker", "the small small degree of degree", and so on.

The speakers in the same local area network may share the same wake-up word. That is, all speakers in the local area network can be woken up by the same wake-up word.

In practical applications, the at least two speakers can perform audio monitoring. After detecting the default voice messaging, each of them obtains its own radio reception energy value and sends it to the server.

Optionally, when the at least two speakers send their radio reception energy values to the server, they may also send an identifier of the local area network where they are located, so that the server can identify all speakers in the same local area network according to that identifier.
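As a rough illustration of how a server might use the local area network identifier, the sketch below groups incoming reports by that identifier. The report layout and the field names `network_id`, `speaker_id` and `mic_energies` are hypothetical and are not taken from the source.

```python
from collections import defaultdict


def group_by_network(reports):
    """Group speaker reports by the identifier of the LAN they came from."""
    groups = defaultdict(list)
    for report in reports:
        groups[report["network_id"]].append(report)
    return dict(groups)


# Hypothetical reports: field names and values are illustrative only.
reports = [
    {"network_id": "lan-1", "speaker_id": "speaker 1", "mic_energies": [3, 5]},
    {"network_id": "lan-2", "speaker_id": "speaker 9", "mic_energies": [2, 2]},
    {"network_id": "lan-1", "speaker_id": "speaker 2", "mic_energies": [4, 6]},
]
groups = group_by_network(reports)
```

Only the reports that share one identifier would then compete to become the target speaker.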

The radio reception energy information indicates the sound size at which a speaker receives the default voice messaging.

Optionally, multiple microphones may be arranged in one speaker, and each microphone can receive voice information. Accordingly, the radio reception energy information of a speaker may include the radio reception energy value of each microphone in that speaker. The larger a microphone's radio reception energy value, the louder the sound that microphone received.

S202, determine a target speaker among the at least two speakers according to the radio reception energy information of the at least two speakers.

Optionally, the speaker with the best radio reception effect may be determined among the at least two speakers according to their radio reception energy information, and that speaker is determined as the target speaker. The speaker with the best radio reception effect is usually the speaker nearest to the user, and the voice broadcast effect of that speaker is also better.

It should be noted that the process of determining the target speaker is described in the embodiment shown in Fig. 4 and is not repeated here.

S203, wake up the target speaker.

Optionally, after the server determines the target speaker, the server may send a wake-up instruction to the target speaker, so that the target speaker switches to the wake-up state according to the instruction.

It should be noted that when only one speaker detects the default voice messaging, the server obtains the radio reception energy information of only that one speaker. Accordingly, the server determines that speaker as the target speaker and wakes it up.
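The flow of S201 to S203, including the single-speaker case, can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, the wake-up instruction is modelled as a plain method call rather than a network message, and the multi-speaker rule shown here is only a placeholder (highest average energy); the fuller selection procedure is the subject of the Fig. 4 embodiment.

```python
from enum import Enum
from statistics import fmean


class State(Enum):
    DORMANT = "dormant"
    AWAKE = "awake"


class Speaker:
    def __init__(self, speaker_id):
        self.speaker_id = speaker_id
        self.state = State.DORMANT  # speakers monitor audio while dormant

    def on_wake_instruction(self):
        # S203: switch to the wake-up state on the server's instruction.
        self.state = State.AWAKE


def handle_reports(reports, speakers):
    """reports maps speaker id -> list of microphone energy values (S201)."""
    if not reports:
        return None  # assumption: nothing to do when no speaker reported
    if len(reports) == 1:
        # Only one speaker detected the wake-up word: it is the target.
        target_id = next(iter(reports))
    else:
        # S202, placeholder rule: pick the highest average energy.
        target_id = max(reports, key=lambda sid: fmean(reports[sid]))
    speakers[target_id].on_wake_instruction()
    return target_id


speakers = {sid: Speaker(sid) for sid in ("speaker 1", "speaker 2", "speaker 3")}
target = handle_reports({"speaker 2": [4, 6], "speaker 3": [1, 3]}, speakers)
```

Speakers that reported but were not chosen simply remain dormant, which is how the false wake-ups described in the background are avoided.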

With the speaker control method provided by this embodiment, after at least two speakers detect default voice messaging, the server can obtain the radio reception energy information of the at least two speakers, determine a target speaker among them according to that information, and wake up the target speaker. In this process, even if multiple speakers detect the user's default voice messaging at the same time, the server can still select the one target speaker with the best radio reception effect among them and wake up only that speaker. This avoids unnecessarily waking up too many speakers when the user's voice is too loud, reduces the probability that a speaker is falsely woken up, and thereby improves the accuracy of speaker control.

On the basis of the embodiment shown in Fig. 2, optionally, at least two microphones are provided in each speaker, and each microphone can receive voice information. Optionally, the speaker may be a three-dimensional speaker. When the speaker is a cylinder, microphones at adjacent positions may be called adjacent microphones. When the speaker is a cube, multiple microphones may be arranged on different sides of the speaker; microphones that are on the same side and adjacent in position may be called adjacent microphones, or microphones that are on different sides but adjacent in position may also be called adjacent microphones. The microphones in a speaker are described below with reference to Figs. 3A and 3B.

Fig. 3A is a schematic diagram of a speaker provided by an embodiment of the present invention. Referring to Fig. 3A, the speaker is a cylinder, and microphone A, microphone B, microphone C and microphone D are arranged on its side surface. Microphone A and microphone B, microphone B and microphone C, and microphone C and microphone D are then adjacent microphones.

Fig. 3B is a schematic diagram of a speaker provided by an embodiment of the present invention. Referring to Fig. 3B, the speaker is a cube. Microphone E, microphone F and microphone G are arranged on one side of the speaker, and microphone H and microphone I are arranged on another side. Microphone E and microphone F, microphone F and microphone G, and microphone H and microphone I are then adjacent microphones. Alternatively, microphone G and microphone H may also be called adjacent microphones.

Fig. 4 is a flow diagram of another speaker control method provided by an embodiment of the present invention. Referring to Fig. 4, the method may include:

S401, determine the radio reception average energy of each speaker according to the radio reception energy values at which the at least two microphones in that speaker receive the default voice messaging.

For any speaker, the radio reception energy information of the speaker includes the radio reception energy value at which each microphone in the speaker receives the default voice messaging.

For any speaker, the average of the radio reception energy values of the microphones in the speaker may be determined as the radio reception average energy of the speaker.

For example, suppose a speaker has 3 microphones, denoted microphone 1, microphone 2 and microphone 3. If the radio reception energy values at which microphone 1, microphone 2 and microphone 3 receive the default voice messaging are a, b and c respectively, the radio reception average energy of the speaker is (a+b+c)/3.
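The averaging in S401 can be written in a few lines. This is a minimal sketch; the function name and the sample values are illustrative, and the unit of the energy values is not specified in the source.

```python
from statistics import fmean


def average_energy(mic_energies):
    """Return the mean energy over one speaker's microphones (S401)."""
    if not mic_energies:
        raise ValueError("at least one microphone energy value is required")
    return fmean(mic_energies)


# Three microphones with values a, b, c: the result is (a + b + c) / 3.
a, b, c = 2.0, 4.0, 6.0
average = average_energy([a, b, c])
```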

S402, determine at least one first speaker among the at least two speakers according to the radio reception average energy of each speaker.

Among the at least two speakers, the first speaker has the maximum radio reception average energy.

Optionally, the number of first speakers may be 1 or more than 1.

S403, judge whether the number of first speakers is greater than 1.

If so, execute S405.

If not, execute S404.

S404, determine the one first speaker as the target speaker.

When the number of first speakers is not greater than 1, the number of first speakers is 1, so that first speaker can be determined as the target speaker.

S405, obtain the maximum radio reception energy value corresponding to each first speaker.

The maximum radio reception energy value is the maximum of the radio reception energy values of the at least two microphones in the first speaker.

For example, suppose a first speaker has 3 microphones, denoted microphone 1, microphone 2 and microphone 3. If the radio reception energy value of microphone 1 is the largest of the three, the radio reception energy value of microphone 1 is determined as the maximum radio reception energy value corresponding to that first speaker.

S406, determine at least one second speaker among the at least one first speaker according to the maximum radio reception energy value corresponding to each first speaker.

Among the at least one first speaker, the second speaker has the largest maximum radio reception energy value.

Optionally, the number of second speakers may be 1 or more than 1.

S407, judge whether the number of second speakers is greater than 1.

If so, execute S409.

If not, execute S408.

S408, determine the one second speaker as the target speaker.

When the number of second speakers is not greater than 1, the number of second speakers is 1, so that second speaker can be determined as the target speaker.

S409, obtain, for each second speaker, the radio reception energy difference between the microphone with the maximum radio reception energy value and its adjacent microphone.

For example, suppose the second speaker is as shown in Fig. 3B and microphone E has the maximum radio reception energy in the second speaker. The microphone adjacent to microphone E is microphone F, so the radio reception energy difference corresponding to the second speaker is the difference between the radio reception energy values of microphone E and microphone F.

As another example, suppose the second speaker is as shown in Fig. 3B and microphone F has the maximum radio reception energy in the second speaker. The microphones adjacent to microphone F are microphone E and microphone G. Suppose the difference between the radio reception energy values of microphone F and microphone E is difference 1, and the difference between those of microphone F and microphone G is difference 2. The radio reception energy difference corresponding to the second speaker is then the smaller of difference 1 and difference 2.
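The two Fig. 3B examples can be reproduced with a short function. This is a sketch under stated assumptions: the energy values are made up, only same-side neighbours are treated as adjacent, and if several microphones tie for the maximum the first one found is used (the source does not say how such a tie is handled).

```python
def energy_difference(energies, adjacency):
    """Smallest gap between the loudest microphone and its neighbours (S409)."""
    loudest = max(energies, key=energies.get)
    return min(energies[loudest] - energies[n] for n in adjacency[loudest])


# Same-side adjacency of the cube-shaped speaker in Fig. 3B.
adjacency = {"E": ["F"], "F": ["E", "G"], "G": ["F"], "H": ["I"], "I": ["H"]}

# Microphone E is loudest and has one neighbour, F: the difference is 9 - 6.
one_neighbour = energy_difference(
    {"E": 9, "F": 6, "G": 5, "H": 2, "I": 1}, adjacency
)

# Microphone F is loudest with neighbours E and G: difference 1 is 9 - 5,
# difference 2 is 9 - 7, and the smaller of the two is kept.
two_neighbours = energy_difference(
    {"E": 5, "F": 9, "G": 7, "H": 2, "I": 1}, adjacency
)
```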

S410, determine, among the at least one second speaker, at least one third speaker with the smallest radio reception energy difference.

Optionally, the number of third speakers may be 1 or more than 1.

S411, judge whether the number of third speakers is greater than 1.

If so, execute S413.

If not, execute S412.

S412, determine the one third speaker as the target speaker.

When the number of third speakers is not greater than 1, the number of third speakers is 1, so that third speaker can be determined as the target speaker.

S413, determine any one of the at least one third speaker as the target speaker.

Because the speakers among the at least one third speaker have the same radio reception average energy, the same maximum radio reception energy value and the same radio reception energy difference, any one of them can be selected as the target speaker.

In the embodiment shown in Fig. 4, at least one first speaker with the maximum radio reception average energy is first determined among the at least two speakers. If the number of first speakers is 1, that first speaker is determined as the target speaker. If the number of first speakers is greater than 1, at least one second speaker with the largest maximum radio reception energy value is determined among the first speakers. If the number of second speakers is 1, that second speaker is determined as the target speaker. If the number of second speakers is greater than 1, at least one third speaker with the smallest radio reception energy difference is determined among the second speakers. If the number of third speakers is 1, that third speaker is determined as the target speaker; if it is greater than 1, any one of the third speakers is selected as the target speaker. In this way, the target speaker that is determined is the speaker with the best radio reception effect.
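The tie-breaking sequence of Fig. 4 can be sketched as a chain of filters in which ties survive each stage. This is an illustrative sketch rather than the patented implementation: the `Report` type, the helper names and the sample values are hypothetical, energy values are compared for exact equality (a real system would probably need a tolerance), and "any one" in S413 is taken to mean the first remaining candidate.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Report:
    speaker_id: str
    energies: dict   # microphone name -> energy value
    adjacency: dict  # microphone name -> list of adjacent microphone names


def average_energy(report):
    return fmean(list(report.energies.values()))


def max_energy(report):
    return max(report.energies.values())


def energy_difference(report):
    loudest = max(report.energies, key=report.energies.get)
    neighbours = report.adjacency[loudest]
    return min(report.energies[loudest] - report.energies[n] for n in neighbours)


def keep_best(candidates, metric, best=max):
    """Keep every candidate whose metric equals the best value (ties survive)."""
    values = [metric(c) for c in candidates]
    target = best(values)
    return [c for c, v in zip(candidates, values) if v == target]


def select_target(reports):
    first = keep_best(reports, average_energy)               # S401-S402
    if len(first) == 1:
        return first[0]                                      # S404
    second = keep_best(first, max_energy)                    # S405-S406
    if len(second) == 1:
        return second[0]                                     # S408
    third = keep_best(second, energy_difference, best=min)   # S409-S410
    return third[0]                                          # S412 or S413


# Hypothetical speakers with three mutually adjacent microphones each.
RING = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
reports = [
    Report("speaker 1", {"A": 4, "B": 6, "C": 2}, RING),
    Report("speaker 2", {"A": 5, "B": 6, "C": 1}, RING),
    Report("speaker 3", {"A": 3, "B": 3, "C": 3}, RING),
]
target = select_target(reports)
```

In this sample, speaker 1 and speaker 2 tie on both average energy and maximum energy, so the energy difference decides between them. Later metrics are only computed when an earlier stage leaves more than one candidate, which mirrors the branching in Fig. 4.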

The speaker control method shown in the above method embodiments is described in detail below through a specific example with reference to Fig. 5.

Fig. 5 is a flow diagram of a speaker control method provided by an embodiment of the present invention. Referring to Fig. 5, 6 speakers are arranged in a local area network, denoted speaker 1, speaker 2, speaker 3, speaker 4, speaker 5 and speaker 6. Assume that the wake-up word of each of the 6 speakers is ", small degree".

In practical applications, when the user needs to wake up the speaker nearest to the user (or with the best radio reception effect), the user can say ", small degree". Suppose that after the user says ", small degree", speaker 2, speaker 4, speaker 5 and speaker 6, which are closer to the user, detect the voice information. Speaker 2, speaker 4, speaker 5 and speaker 6 then each send their own radio reception energy information to the server. The radio reception energy information of each speaker includes the radio reception energy values of the microphones arranged in that speaker.

The server first determines the speakers with the maximum radio reception average energy according to the radio reception energy information of speaker 2, speaker 4, speaker 5 and speaker 6. Suppose these are speaker 4, speaker 5 and speaker 6. Because more than one speaker has the maximum radio reception average energy, the server determines, among speaker 4, speaker 5 and speaker 6, the speakers with the largest maximum radio reception energy value. Suppose these are speaker 4 and speaker 6. Because more than one speaker has the largest maximum radio reception energy value, the server determines, between speaker 4 and speaker 6, the speaker with the smallest radio reception energy difference. Suppose this is speaker 6. Speaker 6 is then determined as the target speaker and woken up.
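The outcome of this example can be checked with made-up numbers. The sketch below assumes the three metrics have already been computed for each reporting speaker; all values are hypothetical and chosen only so that the ties fall as in the example. It also shows an equivalent formulation: because each stage only breaks ties left by the previous one, the procedure amounts to taking the maximum of the tuple (average, maximum, negated difference).

```python
# Hypothetical metrics for the four speakers that detected the wake-up word.
metrics = {
    "speaker 2": {"average": 3, "maximum": 5, "difference": 1},
    "speaker 4": {"average": 5, "maximum": 8, "difference": 3},
    "speaker 5": {"average": 5, "maximum": 7, "difference": 1},
    "speaker 6": {"average": 5, "maximum": 8, "difference": 1},
}


def rank(speaker_id):
    """Higher average first, then higher maximum, then smaller difference."""
    m = metrics[speaker_id]
    return (m["average"], m["maximum"], -m["difference"])


target = max(metrics, key=rank)
```

Unlike the step-by-step flow, this formulation computes all three metrics for every speaker up front, which is simpler to write but does slightly more work.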

In the above process, after the user says the wake-up word ", small degree" to the speakers, even if multiple speakers detect the wake-up word, the server determines the one speaker among them that is nearest to the user and has the best radio reception effect, and wakes up that speaker. This avoids unnecessarily waking up too many speakers when the user's voice is too loud, reduces the probability that a speaker is falsely woken up, and thereby improves the accuracy of smart speaker control.

On the basis of any of the above embodiments, optionally, a corresponding default vocal print may be set for each speaker in advance, so that only a voice with that vocal print can wake up the speaker. The speaker control method in this case is described below with reference to the embodiment shown in Fig. 6.

Fig. 6 is a flow diagram of another speaker control method provided by an embodiment of the present invention. Referring to Fig. 6, the method may include:

S601, after at least two speakers detect default voice messaging, obtain the default voice messaging.

Optionally, after the at least two speakers detect the default voice messaging, they send the detected default voice messaging to the server.

Optionally, the default voice messaging that the at least two speakers send to the server is the same.

Optionally, when the at least two speakers send the default voice messaging to the server, they may also send an identifier of the local area network where they are located, so that the server can identify all speakers in the same local area network according to that identifier.

S602, obtain the default vocal print corresponding to each speaker and the vocal print corresponding to the default voice messaging.

Optionally, the default vocal print corresponding to each speaker may be stored in the server in advance.

After the server receives the default voice messaging, it can perform recognition processing on the voice messaging to obtain the vocal print corresponding to the default voice messaging.

S603, determine a target speaker among the at least two speakers according to the default vocal print corresponding to each speaker and the vocal print corresponding to the default voice messaging.

The vocal print of the target speaker matches the vocal print corresponding to the default voice messaging.

Optionally, speakers and default vocal prints may be in a one-to-one relationship, in which case the server can determine exactly one speaker among the at least two speakers according to the default voice messaging.

Optionally, speakers and default vocal prints may also be in a many-to-one relationship, that is, a voice with one default vocal print can wake up multiple speakers. Accordingly, the server may find multiple speakers whose vocal prints match the vocal print corresponding to the default voice messaging. In that case, any one of the matching speakers may be determined as the target speaker, or the target speaker may be determined among the matching speakers by the method shown in the embodiments of Fig. 2 to Fig. 5.

S604, wake up target speaker.

It should be noted that the implementation procedure of S604 may refer to the implementation procedure of S203, no longer repeated herein.
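Steps S602 to S603 can be sketched in Python as below. Voiceprints are treated as opaque values compared by a pluggable `matches` function that defaults to plain equality. That default, the names `match_speakers` and `choose_target`, and the optional `tie_breaker` hook (standing in for the energy-based method of Fig. 2 to Fig. 5) are assumptions for illustration; the source does not say how voiceprints are represented or compared.

```python
def match_speakers(preset_voiceprints, utterance_voiceprint,
                   matches=lambda a, b: a == b):
    """Return IDs of speakers whose preset voiceprint matches the utterance."""
    return [
        speaker_id
        for speaker_id, preset in preset_voiceprints.items()
        if matches(preset, utterance_voiceprint)
    ]


def choose_target(candidates, tie_breaker=None):
    """Pick one target among matching speakers, or None if nothing matched."""
    if not candidates:
        return None
    if len(candidates) == 1 or tie_breaker is None:
        # Many-to-one case: any one matching speaker may be chosen.
        return candidates[0]
    # Alternatively, delegate to another selection method.
    return tie_breaker(candidates)
```

Keeping matching separate from the final choice means the one-to-one and many-to-one cases share the same code path.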

The method shown in the Fig. 6 embodiment is described below by a specific example with reference to Fig. 7.

Fig. 7 is a schematic diagram of speakers provided by an embodiment of the present invention. Referring to Fig. 7, speaker 1, speaker 2, speaker 3 and speaker 4 are provided in a local area network. Assume it is preset that the voiceprint of user 1 corresponds to speaker 1, the voiceprint of user 2 corresponds to speaker 2, the voiceprint of user 3 corresponds to speaker 3, and the voiceprint of user 4 corresponds to speaker 4. That is, only the wake-up word spoken by user 1 can wake up speaker 1, only the wake-up word spoken by user 2 can wake up speaker 2, only the wake-up word spoken by user 3 can wake up speaker 3, and only the wake-up word spoken by user 4 can wake up speaker 4.

In actual application, after user 1 says the wake-up word, assume that speaker 1, speaker 2, speaker 3 and speaker 4 all hear the wake-up word; each of them then sends the wake-up word to the server. The server determines that the voiceprint of the wake-up word corresponds to speaker 1, and therefore wakes up speaker 1.
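The Fig. 7 example can be replayed with a small Python sketch. The table `PRESET_VOICEPRINT`, the function `speakers_to_wake`, and the use of user labels in place of real voiceprints are assumptions made purely for illustration.

```python
# Hypothetical one-to-one presets mirroring the Fig. 7 example;
# a user label stands in for that user's voiceprint.
PRESET_VOICEPRINT = {
    "speaker 1": "user 1",
    "speaker 2": "user 2",
    "speaker 3": "user 3",
    "speaker 4": "user 4",
}


def speakers_to_wake(reporting_speakers, utterance_voiceprint):
    """Return the reporting speakers whose preset voiceprint matches."""
    return [
        speaker
        for speaker in reporting_speakers
        if PRESET_VOICEPRINT.get(speaker) == utterance_voiceprint
    ]


# All four speakers hear user 1, but only speaker 1 is selected.
heard = ["speaker 1", "speaker 2", "speaker 3", "speaker 4"]
print(speakers_to_wake(heard, "user 1"))  # ['speaker 1']
```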

In the embodiments shown in Fig. 6 and Fig. 7, the correspondence between speakers and voiceprints can be preset. In this way, even if multiple speakers hear the user's preset voice information at the same time, the server can still select, among the multiple speakers, the one speaker whose preset voiceprint matches the voiceprint of the preset voice information, and wake up that speaker. This avoids unnecessarily waking up too many speakers when the user speaks too loudly, reduces the probability that a speaker is falsely woken up, and thereby improves the accuracy of controlling the smart speakers.

Fig. 8 is a schematic structural diagram of a speaker control device provided by an embodiment of the present invention. Referring to Fig. 8, the speaker control device 10 includes a first acquisition module 11, a determining module 12 and a wake-up module 13, wherein:

The first acquisition module 11 is configured to, after at least two speakers detect preset voice information, obtain the sound pickup energy information of the at least two speakers, the sound pickup energy information indicating the loudness with which a speaker received the preset voice information;

The determining module 12 is configured to determine a target speaker among the at least two speakers according to the sound pickup energy information of the at least two speakers;

The wake-up module 13 is configured to wake up the target speaker.

It should be noted that the speaker control device provided by the embodiment of the present invention can execute the technical solutions shown in the above method embodiments; its implementation principles and beneficial effects are similar and are not repeated here.
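One way to picture the three-module structure of Fig. 8 is as a small Python class that composes three callables. The class name `SpeakerControlDevice`, the method `handle_detection` and the callable signatures are hypothetical; the source describes the modules only by their functions.

```python
class SpeakerControlDevice:
    """Composes the three modules of Fig. 8 as plain callables (a sketch)."""

    def __init__(self, acquire, determine, wake):
        self.acquire = acquire      # role of first acquisition module 11
        self.determine = determine  # role of determining module 12
        self.wake = wake            # role of wake-up module 13

    def handle_detection(self, speaker_ids):
        """Run acquire -> determine -> wake and return the chosen target."""
        energy_info = self.acquire(speaker_ids)
        target = self.determine(energy_info)
        self.wake(target)
        return target
```

Passing the modules in as callables keeps the sketch independent of how energy information is actually collected or how a wake-up command is delivered.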

In a possible implementation, at least two microphones are provided in the speaker, and the sound pickup energy information includes the sound pickup energy value with which each microphone received the preset voice information; the determining module 12 is specifically configured to:

In a possible implementation, the determining module 12 is specifically configured to:

determine the target speaker among the at least one first speaker.

In a possible implementation, the determining module 12 is specifically configured to:

In a possible implementation, the determining module 12 is specifically configured to:

Fig. 9 is a schematic structural diagram of another speaker control device provided by an embodiment of the present invention. On the basis of the embodiment shown in Fig. 8, and referring to Fig. 9, the speaker control device 10 further includes a second acquisition module 14, wherein:

The second acquisition module 14 is configured to, after the at least two speakers detect the preset voice information, obtain the preset voice information, and obtain the preset voiceprint corresponding to each speaker and the voiceprint corresponding to the preset voice information;

The determining module 12 is further configured to determine a target speaker among the at least two speakers according to the preset voiceprint corresponding to each speaker and the voiceprint corresponding to the preset voice information, where the voiceprint of the target speaker matches the voiceprint corresponding to the preset voice information;

The wake-up module 13 is further configured to wake up the target speaker.

Fig. 10 is a schematic diagram of the hardware structure of a speaker control device provided by an embodiment of the present invention. As shown in Fig. 10, the speaker control device 20 includes at least one processor 21 and a memory 22, where the processor 21 and the memory 22 are connected via a bus 23.

Optionally, the speaker control device 20 may also include a communication component, and the communication component may include a receiver and/or a transmitter.

In a specific implementation, the at least one processor 21 executes the computer-executable instructions stored in the memory 22, so that the at least one processor 21 performs the speaker control method described above.

For the specific implementation process of the processor 21, reference may be made to the above method embodiments; the implementation principles and technical effects are similar and are not repeated in this embodiment.

In the embodiment shown in Fig. 10 above, it should be understood that the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the invention may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

The memory may include high-speed RAM, and may also include non-volatile memory (NVM), for example at least one magnetic disk memory.

The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses can be divided into address buses, data buses, control buses and so on. For ease of representation, the bus in the drawings of this application is not limited to only one bus or one type of bus.

The application also provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the speaker control method described above is implemented.

The above computer-readable storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc. The readable storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary readable storage medium is coupled to the processor, so that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may be located in an application-specific integrated circuit (ASIC). Of course, the processor and the readable storage medium may also exist in the device as discrete components.

The division of the units is merely a division by logical function; in actual implementation there may be other ways of dividing them. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A speaker control method, characterized by comprising:

after at least two speakers detect preset voice information, obtaining sound pickup energy information of the at least two speakers, the sound pickup energy information indicating the loudness with which a speaker received the preset voice information;

determining a target speaker among the at least two speakers according to the sound pickup energy information of the at least two speakers, and waking up the target speaker.

2. The method according to claim 1, characterized in that at least two microphones are provided in the speaker, and the sound pickup energy information includes the sound pickup energy value with which each microphone received the preset voice information; the determining a target speaker among the at least two speakers according to the sound pickup energy information of the at least two speakers comprises:

determining the average sound pickup energy of each speaker according to the sound pickup energy values with which the at least two microphones in each speaker received the preset voice information;

3. The method according to claim 2, characterized in that the determining a target speaker among the at least two speakers according to the average sound pickup energy of each speaker comprises:

determining at least one first speaker among the at least two speakers according to the average sound pickup energy of each speaker, wherein, among the at least two speakers, the average sound pickup energy of the first speaker is the largest;

determining the target speaker among the at least one first speaker.

4. The method according to claim 3, characterized in that the determining the target speaker among the at least one first speaker comprises:

when the number of the at least one first speaker is 1, determining the at least one first speaker as the target speaker;

when the number of the at least one first speaker is greater than 1, obtaining the maximum sound pickup energy value corresponding to each first speaker, determining at least one second speaker among the at least one first speaker according to the maximum sound pickup energy value corresponding to each first speaker, and determining the target speaker among the at least one second speaker; wherein the maximum sound pickup energy value is the maximum of the sound pickup energy values of the at least two microphones in the first speaker, and, among the at least one first speaker, the maximum sound pickup energy value of the second speaker is the largest.

5. The method according to claim 4, characterized in that the determining the target speaker among the at least one second speaker comprises:

when the number of the at least one second speaker is 1, determining the at least one second speaker as the target speaker;

when the number of the at least one second speaker is greater than 1, obtaining, for each second speaker, the sound pickup energy difference between the microphone with the largest sound pickup energy value and its adjacent microphone, and determining the target speaker according to the sound pickup energy difference corresponding to each second speaker.

6. The method according to claim 5, characterized in that the determining the target speaker according to the sound pickup energy difference corresponding to each second speaker comprises:

when the number of the at least one third speaker is 1, determining the at least one third speaker as the target speaker;

when the number of the at least one third speaker is greater than 1, determining any one of the at least one third speaker as the target speaker.

7. The method according to any one of claims 1-6, characterized in that the method further comprises:

determining a target speaker among the at least two speakers according to the preset voiceprint corresponding to each speaker and the voiceprint corresponding to the preset voice information, and waking up the target speaker, wherein the voiceprint of the target speaker matches the voiceprint corresponding to the preset voice information.

8. The method according to any one of claims 1-7, characterized in that the at least two speakers are located in the same local area network.

9. The method according to any one of claims 1-8, characterized in that the at least two speakers are smart speakers.

10. A speaker control device, characterized by comprising a first acquisition module, a determining module and a wake-up module, wherein:

the first acquisition module is configured to, after at least two speakers detect preset voice information, obtain sound pickup energy information of the at least two speakers, the sound pickup energy information indicating the loudness with which a speaker received the preset voice information;

the determining module is configured to determine a target speaker among the at least two speakers according to the sound pickup energy information of the at least two speakers;

the wake-up module is configured to wake up the target speaker.

11. The device according to claim 10, characterized in that at least two microphones are provided in the speaker, and the sound pickup energy information includes the sound pickup energy value with which each microphone received the preset voice information; the determining module is specifically configured to:

12. The device according to claim 11, characterized in that the determining module is specifically configured to:

determine the target speaker among the at least one first speaker.

13. The device according to claim 12, characterized in that the determining module is specifically configured to:

14. The device according to claim 13, characterized in that the determining module is specifically configured to:

15. The device according to claim 14, characterized in that the determining module is specifically configured to:

16. The device according to any one of claims 10-15, characterized in that the device further comprises a second acquisition module, wherein:

the second acquisition module is configured to, after the at least two speakers detect the preset voice information, obtain the preset voice information, and obtain the preset voiceprint corresponding to each speaker and the voiceprint corresponding to the preset voice information;

the determining module is further configured to determine a target speaker among the at least two speakers according to the preset voiceprint corresponding to each speaker and the voiceprint corresponding to the preset voice information, wherein the voiceprint of the target speaker matches the voiceprint corresponding to the preset voice information;

the wake-up module is further configured to wake up the target speaker.

17. The device according to any one of claims 10-16, characterized in that the at least two speakers are located in the same local area network.

18. The device according to any one of claims 10-17, characterized in that the at least two speakers are smart speakers.

19. A speaker control device, characterized by comprising at least one processor and a memory;

the memory stores computer-executable instructions;

the at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor performs the speaker control method according to any one of claims 1-9.

20. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the speaker control method according to any one of claims 1-9 is implemented.