CN113393835A - Voice interaction system, method and voice equipment


Info

Publication number: CN113393835A (application CN202010168054.4A)
Authority: CN (China)
Prior art keywords: voice, user, devices, broadcast, interaction
Legal status: Pending
Application number: CN202010168054.4A
Other languages: Chinese (zh)
Inventor: 李岳冰
Current Assignee: Alibaba Group Holding Ltd
Original Assignee: Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority application: CN202010168054.4A
Publication: CN113393835A

Classifications

    • G10L 15/22 - Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • H04W 4/029 - Location-based management or tracking services
    • H04W 4/06 - Selective distribution of broadcast services, e.g. multimedia broadcast multicast service [MBMS]; services to user groups; one-way selective calling services
    • H04W 4/70 - Services for machine-to-machine communication [M2M] or machine type communication [MTC]
    • H04W 76/10 - Connection setup

Abstract

A voice interaction system, a voice interaction method, and corresponding devices are provided. The voice interaction system comprises a plurality of voice devices and an operating device. The voice devices are used for voice interaction; the operating device performs the operations needed to acquire user state information of a user. Based on the acquired user state information, a voice device is selected from the plurality of voice devices to conduct the current voice interaction, and the selected voice device interacts with the user by voice. The voice interaction scheme of the invention dynamically selects among a plurality of voice devices by acquiring the user's state. Specifically, a device worn or carried by the user transmits a broadcast, the voice devices receive the broadcast and calculate its attenuation, and the device best suited for interacting with the user is determined accordingly, improving the accuracy and usability of the voice service provided to the user.

Description

Voice interaction system, method and voice equipment
Technical Field
The present invention relates to the field of information technologies, and in particular, to a voice interaction system, method, and voice device.
Background
With the development of voice interaction technology, more and more users install smart speakers at home as intelligent interaction hubs. The smart speaker is the product of upgrading the loudspeaker with network technology and serves as a tool with which home users access the internet by voice. It can be used to request songs, shop online, or check the weather forecast, and can also control smart home devices, for example opening the curtains, setting the refrigerator temperature, or preheating the water heater.
Although smart speakers typically employ far-field sound pickup so that they can receive voice input beyond a range of 3-5 m, such far-field pickup often cannot cover every location in a home, especially considering obstructions such as room doors, walls, and different floors. To improve the reachability of voice interaction, a plurality of voice interaction terminals may be arranged in a home. When a plurality of voice interaction terminals exist, how to properly select the terminal that interacts with the user becomes a problem to be solved in the field.
Disclosure of Invention
In order to solve at least one of the above problems, the present invention proposes a new voice solution that can dynamically select the device to conduct voice interaction with the user from a plurality of voice devices by acquiring user status information, thereby making the voice devices easier to use and improving the reach rate of the voice interaction system as a whole.
According to a first aspect of the present invention, a voice interaction system is provided, which includes a plurality of voice devices and an operating device. The voice devices are used for voice interaction; the operating device performs the operations required to acquire user state information of a user. A voice device for conducting the current voice interaction with the user is selected from the plurality of voice devices based on the acquired user state information, and the selected voice device interacts with the user by voice.
The operating device may be a device whose location can represent the location of the user, and may preferably include: a wearable device worn by the user; and a device carried and/or operated by the user. The operating device may transmit a broadcast that is received by the plurality of voice devices or by other devices, and based on the signal strength of the received broadcast, the voice device with the least attenuation is determined to conduct the voice interaction with the user.
The above system is particularly applicable to voice call scenarios. To this end, according to a second aspect of the present invention, a voice call system is provided, comprising a plurality of voice devices and a user state sensing device. The voice devices are used for voice interaction with a user and are bound to the same call object; the sensing device is used for sensing user state information. A voice device to answer an incoming voice call is selected from the plurality of voice devices based on the sensed user state information, and the selected voice device answers the incoming call so that the user can conduct the voice call.
According to a third aspect of the present invention, a voice device is presented, which communicates with at least one other voice device. The voice device comprises: an interaction unit for conducting voice interaction with a user; and a communication unit for communicating with the at least one other voice device. The voice device for conducting the current voice interaction with the user is dynamically selected from the voice device and the at least one other voice device based on user state information of the user.
The voice device may be a smart speaker acting as a central node; the speaker may collect the reception strength of the broadcast received by each voice device and determine the voice device with the least attenuation to conduct the voice interaction.
The voice device may also be a voice sticker for increasing the reachability of voice interaction, which may report the reception strength of the broadcast it receives and may be used to interact with the user when its attenuation is the least.
According to a fourth aspect of the present invention, there is provided a smart device comprising: a sensing unit for sensing that the device is being carried and/or used by a user; and a broadcasting unit for transmitting a broadcast, which is received by the plurality of voice devices so that the voice device to conduct voice interaction with the user can be determined based on the broadcast reception strength. The smart device may in particular be realized as a smart watch.
According to a fifth aspect of the present invention, a voice interaction method is provided, including: acquiring user state information of a user; selecting, from a plurality of voice devices, a voice device for conducting the current voice interaction with the user based on the acquired user state information; and notifying the selected voice device to conduct the voice interaction with the user.
Therefore, the voice interaction system, the voice interaction method, and the voice device of the present invention can dynamically select among a plurality of voice devices by acquiring the user state. In particular, the device most suitable for interacting with the user (e.g., for answering a voice call) may be determined by having a device the user wears or carries transmit a broadcast and having the voice devices receive the broadcast and calculate the attenuation, thereby improving the accuracy and ease of use of the voice service provided to the user.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 illustrates one scenario in which dynamic selection of a speech device is required.
FIG. 2 is a schematic diagram illustrating the components of a voice interaction system, according to one embodiment of the present invention.
Figures 3A-B illustrate one example of voice interaction in accordance with the present invention.
Fig. 4 is a block diagram of a voice call system according to an embodiment of the present invention.
FIG. 5 illustrates an example composition of a voice interaction system at a larger scale.
Fig. 6 shows a schematic composition diagram of a speech device according to the invention.
FIG. 7 shows a flow diagram of a voice interaction method according to an embodiment of the invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
With the development of voice interaction technology, more and more users install smart speakers at home as intelligent interaction hubs. The smart speaker is the product of upgrading the loudspeaker with network technology and serves as a tool with which home users access the internet by voice. It can be used to request songs, shop online, or check the weather forecast, and can also control smart home devices, for example opening the curtains, setting the refrigerator temperature, or preheating the water heater.
Although smart speakers typically employ far-field sound pickup so that they can receive voice input beyond a range of 3-5 m, such far-field pickup often cannot cover every location in a home, especially considering obstructions such as room doors, walls, and different floors. To improve the reachability of voice interaction, a plurality of voice interaction terminals may be arranged in a home. When a plurality of voice interaction terminals exist, how to properly select the terminal that interacts with the user becomes a problem to be solved in the field.
FIG. 1 illustrates one scenario in which a voice device needs to be selected for interaction. Specifically, fig. 1 shows a scenario in which a user uses three voice devices in the home, all bound to the same number b, to answer an incoming call from number a via the PSTN (Public Switched Telephone Network).
After purchasing a smart speaker (or another voice device), the user can activate a PSTN phone card (corresponding to a phone number) through the network and then bind the card to the speaker. Specifically, the user can bind a given SIM card to the smart speaker through a dynamic binding mode in the mobile phone APP. When there are multiple networked voice devices in the user's home (for example, several smart speakers, or one smart speaker plus several smart voice stickers), the user can freely switch the correspondence between the voice devices and SIM cards through the APP's settings, so that any number can be bound to a given speaker.
Specifically, in the cloud (at the server), the service provider may purchase a specific number segment from, for example, a mobile operator as numbers for binding to devices such as smart speakers, thereby enabling interworking between the PSTN and SIP (Session Initiation Protocol).
As shown in the figure, another user calls number b using a mobile phone whose SIM card number is a. Number a is a conventional mobile phone number issued by the mobile operator to an ordinary subscriber, while number b belongs to a SIP number segment purchased by the service provider from the mobile operator, and the binding between number b and user B's smart speaker is recorded at the service provider's server. Therefore, when the PSTN receives a request from number a to call number b, it determines from the number segment that number b is a SIP number supported by a specific service provider and connects to that provider's SIP service platform. The platform then looks up the corresponding smart speaker according to the binding information stored in its database and sends signaling to the speaker. The speaker can then issue a voice prompt, such as "incoming call", or determine the caller from number a and announce "incoming call from A". User B may then answer via the smart speaker and complete the call with user A.
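For illustration, this number-segment routing step can be sketched as follows. The segment value, the binding table, and the notify_speaker helper are hypothetical; the patent does not specify an implementation.

```python
# Minimal sketch of the PSTN-to-SIP routing decision described above.
# SIP_SEGMENTS, BINDINGS, and notify_speaker are illustrative assumptions.

SIP_SEGMENTS = ("95188",)  # hypothetical number segment owned by the provider

# binding database: callee number -> IDs of the smart speakers bound to it
BINDINGS = {
    "9518800001": ["speaker-livingroom", "speaker-bedroom", "speaker-kitchen"],
}

def route_incoming_call(caller: str, callee: str) -> list[str]:
    """Return the speaker IDs that should be signaled for this call."""
    if not callee.startswith(SIP_SEGMENTS):
        raise ValueError(f"{callee} is not in a provider-owned SIP segment")
    speakers = BINDINGS.get(callee, [])
    for speaker_id in speakers:
        notify_speaker(speaker_id, caller)  # send SIP signaling to the device
    return speakers

def notify_speaker(speaker_id: str, caller: str) -> None:
    print(f"signaling {speaker_id}: incoming call from {caller}")

route_incoming_call("13800000000", "9518800001")
```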
In the case where the user's home is equipped with three networked voice devices 110 as shown in fig. 1, all bound to number b, an existing solution may have all three devices announce an incoming call simultaneously, for example by issuing the voice prompt "incoming call" at the same time; alternatively, a smartphone with the management APP installed may prompt the user, who then selects one voice device from the list of bound devices, for example the one in the living room, to take the call.
Clearly, the existing simultaneous-notification solution can result in excessive notifications, for example waking a child sleeping in the bedroom, and increases unnecessary power consumption of the voice devices. The phone-based selection scheme, in turn, adds steps to the voice interaction and reduces user friendliness.
Therefore, the invention provides a new voice interaction scheme that can dynamically select the device to conduct voice interaction with the user from a plurality of voice devices by acquiring user state information, thereby making the voice devices easier to use and improving the overall reach rate of the voice interaction system.
FIG. 2 is a schematic diagram illustrating the components of a voice interaction system according to one embodiment of the present invention. As shown in FIG. 2, the voice interaction system includes a plurality of voice devices, such as voice devices 210-1, 210-2, and 210-3. The three voice devices are used for voice interaction and may be the same or different devices. In one embodiment, all three may be implemented as smart speakers. In other embodiments, the three devices may include one smart speaker and two smart voice stickers (i.e., voice interaction terminals with a simpler structure). In different implementations, the three devices may adopt different decision strategies. In one embodiment, one voice device may serve as a central node for information collection and decision making, whether deciding by itself or communicating with a server. FIG. 2 shows an example with voice device 210-2 as the central node. In other embodiments, another device may serve as the central node, such as a decision-making device in an edge computing system, a server, or another locally deployed device.
In addition to the voice devices, the system may also include an operating device 230, which performs the operations needed to acquire user state information of the user. In other words, the operating device 230 itself performs some operation that enables the voice interaction system to acquire the user state information. As described in detail below, the user state information may be acquired directly by the operating device 230 through its operation, or by other devices in the system (e.g., the voice devices) based on the operating device's operation; the invention is not limited in this respect.
Here, the "user status information" may refer to information for deciding which voice device the user currently uses is optimal. Further, "user state information" may be information for characterizing communication conditions between the user and the respective voice devices. Depending on the propagation characteristics of the sound waves, the quality of the interaction of the user with the speech device depends not only on the relative distance but also on the orientation of the user with respect to the speech device and even on the conditions of obstacles and reflectors in between. Thus, in different embodiments, the communication conditions between the user and the respective speech devices may be estimated from different angles.
Once the user state information has been acquired based on the operating device's operation, the voice device for conducting the current voice interaction may be selected from the plurality of voice devices based on that information. The selected voice device then interacts with the user by voice.
Therefore, in the PSTN call scenario shown in fig. 1 and in conventional VoIP (Voice over Internet Protocol) call scenarios, the voice device with the currently best communication conditions can be dynamically selected to answer the call by acquiring user state information that characterizes the communication conditions between the user and each voice device, improving the convenience of voice operation and sparing the user the extra step of designating an interaction terminal.
In order to accurately characterize the communication conditions between the user and the respective voice devices, the operating device 230 may be a device whose location can represent the location of the user. To this end, the operating device 230 may be a wearable device the user is wearing, or a device the user is carrying and/or operating. The operating device 230 may determine that it is being worn, carried, or operated by the user through a variety of mechanisms. For example, when the operating device 230 is a wearable device such as a smart watch or a Bluetooth headset, it may determine that it is being worn by measuring the user's vital signs (e.g., detecting a pulse) or by sensing a characteristic motion posture. When the operating device is a carried or operated device, it can make the determination by confirming that it is being operated. Operation here means hands-on operation rather than remote operation: for example, a remote control being handled in person qualifies, but a home appliance being remotely controlled does not. In addition, a smartphone can determine that it is being carried by the user (e.g., in a pocket) through its camera, acceleration sensor, and so on.
Since the location of the operating device 230 can represent the location of the user, the operating device 230 may characterize the user's current location and/or orientation by, for example, reporting its own position, thereby facilitating the acquisition of user state information.
Since signal attenuation characterizes the quality of voice interaction between the user and a voice device more comprehensively, actively transmitting a broadcast and measuring its attenuation can better approximate the attenuation that the user's own voice signal would experience. In a preferred embodiment, the operation performed by the operating device to acquire user state information may include transmitting a broadcast. The transmitted broadcast may be received by a plurality of broadcast receiving devices included in the voice interaction system, and the user state information may then be determined from the reception strength of the broadcast at each receiving device. In a simpler and more preferred implementation, the plurality of broadcast receiving devices may include one or more of the voice devices, although in some embodiments additional devices may receive the broadcast and derive an attenuation coefficient for each voice device from the devices' relative positions. Alternatively, the plurality of voice devices may directly serve as the broadcast receiving devices. The system can therefore select the voice device with the least signal attenuation to interact with the user, according to the strength (or attenuation coefficient) with which each voice device receives the broadcast transmitted by the operating device that represents the user's location.
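The selection rule just described amounts to picking the receiver with the strongest, i.e., least attenuated, broadcast. A minimal sketch, assuming RSSI values in dBm keyed by device name:

```python
# Minimal sketch of the selection rule above: the voice device that
# receives the operating device's broadcast with the highest RSSI
# (least attenuation) conducts the interaction. RSSI is in dBm, so
# "highest" means closest to zero.

def select_voice_device(rssi_by_device: dict[str, float]) -> str:
    """Pick the device that received the broadcast most strongly."""
    if not rssi_by_device:
        raise ValueError("no voice device received the broadcast")
    return max(rssi_by_device, key=rssi_by_device.get)
```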
In order to determine the user state information from the collected information, the voice interaction system of the present invention may further include a confirmation node, which determines the user state information of the user based on the operation performed by the operating device and dynamically selects the voice device for conducting the current voice interaction with the user from the plurality of voice devices. In one simplest embodiment, the confirmation node may be a central voice device among the plurality of voice devices, such as voice device 210-2 designated as the central node in FIG. 2. In other embodiments, the confirmation node may be a server communicating with the local voice interaction system, an edge computing device, or a local central node that does not itself serve as a voice device.
In the illustration of fig. 2, dashed arrows represent communication between devices, and the solid arrow represents the voice interaction of the selected voice device with the user. The operating device 230 may directly transmit the user state information, or the data for generating it, to voice device 210-2 acting as the central node. Voice device 210-2 also aggregates information from the other voice devices 210-1 and 210-3, generates user state information (e.g., based on the broadcast signal strengths at the three voice devices), determines that voice device 210-1 is best suited for voice interaction with the user (e.g., has the least broadcast attenuation), and enables voice device 210-1 to interact with the user, for example to notify the user of an incoming call or to give some other notification or answer.
When the user actively initiates a voice interaction, the voice devices can directly determine which device should interact with the user by comparing the strength with which each receives the user's voice. The scheme of the present invention is therefore especially suited to situations where a voice device must initiate the interaction itself, for example when an external voice call needs to be answered.
Figures 3A-B illustrate one example of voice interaction in accordance with the present invention. As shown in fig. 3A, because the home is large and has several rooms, the user has deployed a plurality of voice devices in it. These voice devices (e.g., smart speakers supporting calls) may be registered under the same account of the user. The user wears a smart watch on the wrist. The smart watch may determine that it is being worn by measuring the user's heartbeat, blood oxygen level, and the like. The smart watch may then send out a broadcast, e.g., an omnidirectional broadcast, either on its own initiative or on an external command (e.g., from a central node or a server). The broadcast has a specific format, for example a Bluetooth Low Energy (BLE) broadcast of a specific format, to be received by the four voice devices located in the living room, master bedroom, second bedroom, and kitchen, respectively.
If only one speaker receives the broadcast from the designated wearable device, that speaker is selected as the current interaction speaker. If several speakers receive the broadcast sent by the designated smart watch, the received signal strength is used as the basis for selecting the main speaker. As shown in fig. 3A, the user wearing the smart watch is in the living room. The living room device receives the broadcast with an RSSI (received signal strength indicator) of -10 dBm. The master bedroom speaker receives it at -40 dBm, being farther from the user and/or more obstructed than the living room device; the kitchen and second bedroom speakers receive it at -50 dBm and -60 dBm respectively, being farther still and thus more attenuated. The living room speaker is naturally selected as the main speaker.
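As a usage example, plugging the figures from Fig. 3A into the select_voice_device sketch above yields the same choice:

```python
# Applying select_voice_device (sketched earlier) to the Fig. 3A values:
rssi = {
    "living-room": -10.0,
    "master-bedroom": -40.0,
    "kitchen": -50.0,
    "second-bedroom": -60.0,
}
print(select_voice_device(rssi))  # -> "living-room"
```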
If the user moves around during the interaction, for example walking from the living room to the second bedroom along the curved dashed route shown in fig. 3B, the RSSI of the broadcast received by each speaker changes accordingly. Now the smart speaker in the second bedroom receives the broadcast with the strongest RSSI, -10 dBm, indicating that the user is closest to it; the living room speaker receives it at -50 dBm, indicating that the user is farther away; the master bedroom speaker receives it at -40 dBm; and the kitchen speaker receives no signal. The second bedroom speaker can thus be switched in as the interaction speaker.
The above scheme is especially suitable for answering network calls, or conventional calls accessed via the network. During the call, the operating device (the smart watch worn by the user) can continuously send broadcast signals, such as BLE broadcasts, and the system can switch the voice device conducting the current interaction based on the continuously acquired user state information, for example switching the device carrying the call according to signal strength as the call proceeds, so that the conversation stays smooth even if the user moves back and forth in a multi-speaker environment. Voice interaction can thus adapt itself based on the wearable device's signal strength, reducing the chance of missed or misdirected calls.
In other embodiments, the operating device 230 may capture the user's current state by means other than transmitting a broadcast, so as to obtain a user state signal representing the signal attenuation between the user and the voice devices. In one embodiment, the operating device 230 may be a camera-equipped device that captures images of the user, from which the user's current location and/or orientation can be determined, either locally or through cloud or central-node processing. For example, the operating device 230 may be a depth camera capable of determining the user's three-dimensional position from a captured image. In this case, the operating device 230 may capture a face and infer the user's location through face recognition and the pixel size of the detected face. The location information can be used to determine where the user is and thereby select the appropriate voice device for interaction.
In another embodiment, the operating device 230 may perform an infrared tracking scan of the user. For example, an infrared sensor mounted on a smart air conditioner may image and track the user and direct air at the user's colder parts (e.g., the feet) in heating mode. The user's position can then be determined from the infrared information, and an appropriate voice device selected for the interaction.
In one embodiment, the operating device 230 may obtain the user's current voice input, determine the user's position or orientation from it, and select the appropriate voice device for interaction accordingly.
In one embodiment, the device state of the operating device 230 may also indicate the user's location. For example, if at eight in the evening only the light and the television in the living room are on (say, turned on by the user's voice half an hour earlier), it can be concluded that the user is in the living room, and the living room smart speaker can be selected directly for the interaction.
The operating device 230 may be one or more of the voice devices themselves; for example, a voice device with a camera may determine the user's position by capturing and analyzing an image of the user. The operating device 230 may also be another networked internet-of-things device. It should be appreciated that the voice interaction system of the present invention may be implemented as a smart internet of things, in which the operations or states of multiple devices are collected and combined to judge the user's current position (whether a precise position or a general area) or the attenuation relative to the voice devices, and thereby determine which voice device joins the interaction.
As described above, "user state information" in the present invention may refer to information for deciding which voice device the user currently uses is optimal. Further, "user state information" may be information for characterizing a communication condition between the user and each voice device, and the operation device may perform an operation of capturing the current state of the user, thereby acquiring the user state information.
In some embodiments, besides selecting which device interacts, the obtained user state may be used to determine how many voice devices are selected, or the voice interaction sound effect of the selected device.
Specifically, although one voice device is generally selected for voice interaction based on the user state in order to avoid "one call and multiple responses", the interaction scheme of the present invention is also applicable to a special scenario in which multiple voice devices are simultaneously awakened for voice interaction with the user.
For example, the operating device can collect the user's current location as user state information; if the user is found to be midway between two voice devices with identical attenuation, both devices can be woken for the interaction simultaneously. Furthermore, the operating device may collect the user's current acoustic environment as user state information: for example, a voice device may capture the user's voice and analyze the background noise in it, so that one or more devices far from the noise source (e.g., a television that is on) are selected for the interaction.
Alternatively or additionally, the interaction scheme of the present invention may determine the voice interaction sound effects of the selected voice device based on the user state information. Here, "sound effect" is meant broadly, i.e., the way sound is presented: for example, the playback volume, the sound field distribution, or the sound profile (a call mode that keeps only the natural speech band, a music mode with full-band high fidelity, and even finer-grained modes within the music mode).
For example, the volume may be turned up when the current background noise is determined to be high or the user is far from the voice device. When the user is far away or constantly moving, one or more voice devices can be activated, and their sound presentation maintained through real-time sound field adjustment.
Furthermore, the number of selected voice devices and/or their voice interaction sound effects may also depend on the current voice interaction scene. For example, in a call scenario a call mode emphasizing the human voice band may be selected, while in a music playback scenario a music mode may be started. More specifically, in music mode, two or more voice devices can be turned on to provide left and right stereo channels, or even 5.1-channel sound, and so on.
For example, the user may give the voice instruction "XX (voice device wake word), I want to dance". The user's current location can then be determined from the user state information acquired by the operating device. Based on the location, the voice device closest to the user can be turned on to give the voice feedback "OK, now playing XXXX (song name) by XXX (singer name)". The song may be a dance track chosen by the voice interaction system from the user's interaction history combined with the user's current intention to dance. More voice devices may then join the playback, for example two voice devices playing the song's left and right channels respectively, to present a more immersive effect for dancing.
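As a sketch of this multi-device playback idea, the two devices nearest the user (strongest RSSI) could be given the left and right channels; the function name and ranking rule are illustrative assumptions, not taken from the patent.

```python
# Assign stereo channels to the two devices nearest the user,
# ranked by the RSSI of the user's broadcast (strongest first).

def assign_stereo_channels(rssi_by_device: dict[str, float]) -> dict[str, str]:
    ranked = sorted(rssi_by_device, key=rssi_by_device.get, reverse=True)
    if len(ranked) < 2:
        return {ranked[0]: "mono"} if ranked else {}
    return {ranked[0]: "left", ranked[1]: "right"}

print(assign_stereo_channels({"living-room": -10.0, "kitchen": -50.0, "bedroom": -35.0}))
# -> {'living-room': 'left', 'bedroom': 'right'}
```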
The voice interaction scheme is particularly suitable for implementation as a voice call system comprising a plurality of voice devices for conducting calls and a user state sensing device. Here, "voice call" refers to a VoIP network call or a conventional call accessed from the PSTN. Where the voice device includes a display screen, "voice call" may also refer to an interaction that includes audio call information, such as a video call. Fig. 4 is a block diagram of a voice call system according to an embodiment of the present invention. As shown in FIG. 4, the system includes a plurality of voice devices, such as voice devices 410-1, 410-2, and 410-3. A voice device is a device with voice interaction functionality and may in particular be implemented as a smart speaker. As in fig. 2, the three voice devices are used for voice interaction and may be the same or different devices. Multiple voice devices may be bound to the same call object, e.g., to number b.
In addition to the voice devices, the system may also include a sensing device 430 for sensing user state information. Based on the sensed user state information, a voice device to answer an incoming voice call is selected from the plurality of voice devices, and the selected device answers the call so that the user can converse.
As before, the sensing device 430 may include a device worn and/or carried by the user and may be used to transmit a broadcast. The voice devices receive the broadcast and determine its reception strength, and the voice device with the highest reception strength is selected to answer the incoming call.
Preferably, the system may further comprise a server 420 for communicating with the voice devices and notifying them of the incoming voice call, e.g., a call from number a. The voice device selected to answer the call can be determined by the server and/or by the voice devices themselves.
The server 420 may connect externally to the PSTN via a gateway, for example to accept a conventional PSTN call such as a call from number a, or may connect directly to a SIP network for network calls, in which case number a may be a network telephone number. Each voice device 410 can conduct voice interaction with the user. In one embodiment, the voice devices 410 may be various smart speakers or other smart voice devices or accessories, of the same or different models. The server 420 communicates with the voice devices; for example, it may obtain voice data uploaded by the voice devices for semantic analysis and subsequent processing, or act on semantic data uploaded by the voice devices.
In the present invention, the plurality of voice devices 410 are bound to the same call object. Here, the "call object" bound to multiple voice devices is an object that can be called via a PSTN number or a network call (e.g., a registered VoIP user name) and that can be bound to a smart device, for example a number from a special number segment. A voice device may be bound to the call object (e.g., number b) through an operation on the device itself, or through an APP installed on a mobile phone. "Binding" a voice device to a call object means registering the association between them on the server. In one embodiment, the handset performing the binding operation may itself belong to the voice call system of the present invention; in that case the system further includes an intelligent terminal for requesting the server to bind the call object to the plurality of voice devices.
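A minimal sketch of the server-side binding registry described above, under assumed names (the patent does not define the server's data model); a phone APP would call bind_device on the user's behalf:

```python
# Server-side "binding": associate a call object (e.g., number b)
# with the user's voice devices. Names are illustrative assumptions.

from collections import defaultdict

bindings: dict[str, set[str]] = defaultdict(set)

def bind_device(call_object: str, device_id: str) -> None:
    """Register device_id under the given call object on the server."""
    bindings[call_object].add(device_id)

def unbind_device(call_object: str, device_id: str) -> None:
    bindings[call_object].discard(device_id)

bind_device("number-b", "speaker-livingroom")
bind_device("number-b", "voice-sticker-kitchen")
print(bindings["number-b"])
```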
When a call with number a is to be conducted, the voice device to answer the call can be dynamically selected from the plurality of voice devices, and the dynamically selected device then accesses the call via the server.
In practice, the dynamic selection among the bound voice devices may be implemented at the server 420, or locally among the voice devices. In a server-side embodiment, the server may dynamically select the answering device by, for example, receiving user state information reported by one or more voice devices, or by other internet-of-things devices on the voice devices' network. In a locally implemented embodiment, the server 420 notifies the plurality of voice devices 410 of the call, and the voice devices 410 themselves determine which device to connect; the server 420 then connects the device chosen by the voice devices 410 to the voice call.
As shown by the dashed box in fig. 4, the voice devices 410 may belong to the same local area network or internet of things; for example, they may be several smart speakers or smart voice accessories deployed in the same home. The voice devices 410 may each connect to the server 420 as shown in fig. 4, for example via WiFi to the same router and through the router to the internet. In some embodiments, the voice devices 410 may include a central device through which the others reach the server 420. Other connection topologies between the voice devices 410 are also possible, and the application is not limited in this respect.
Where each of the voice devices 410 can communicate directly with the server 420, each device may report its current state (e.g., the RSSI of the received broadcast) so that the server 420 can decide which device to connect. Where the voice devices 410 communicate with the server 420 via a central voice device, the central device can aggregate the states of the others and either make the decision locally or report to the server for the decision.
In practical applications, dynamically selecting the voice device to answer a PSTN voice call according to the user state may include at least one of the following (a combined ranking is sketched below): selecting the voice device the user is currently using as the device to answer; selecting the voice device closest to the user; and selecting the device based on its power consumption.
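These criteria can be combined into a single ranking. The field names and the priority order (active use first, then proximity, then power supply) are assumptions for illustration; the patent only names the criteria.

```python
# Combine the selection criteria listed above into one ranking.

from dataclasses import dataclass

@dataclass
class DeviceState:
    device_id: str
    rssi: float            # strength of the user's broadcast, in dBm
    active_session: bool   # user is currently interacting with this device
    battery_powered: bool  # e.g., a battery-powered voice sticker

def pick_device(devices: list[DeviceState]) -> DeviceState:
    def score(d: DeviceState) -> tuple:
        # prefer an actively used device, then the strongest signal,
        # then mains-powered devices over battery-powered ones
        return (d.active_session, d.rssi, not d.battery_powered)
    return max(devices, key=score)
```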
For example, another user dials number b using phone number a (i.e., calls user B's smart speaker). In the called user's home, different smart speakers are deployed in different rooms: voice device id 1 in the living room, id 2 in the bedroom, and id 3 in the kitchen, all bound to number b. When the voice call arrives, the device to answer may be dynamically selected according to the user state (e.g., a user state estimated from information collected by internet-of-things devices).
In one embodiment, if no user state information is collected, all three devices may announce simultaneously, for example "call from XX", or one device may announce based on a default setting, for example the living room device announcing "call from XX".
If the collected user state information indicates that the user is using a particular device, is closest to it, or is within its coverage, that device may be used for the announcement. For example, if the user is currently interacting with the bedroom voice device, the device with id 2 can be connected directly: it announces "call from XX", and the user connects the call by answering the announcement. In addition, a voice device can further confirm the user's identity through voiceprint verification. For example, when different voice devices are being used by different people, voiceprint recognition lets the system choose the device the target user is actually using as the answering device; and when the person using a voice device is not the target user, the target user's state can be judged from other information.
As described above, the voice device closest to the user may also be selected to answer the call. This decision may be made by the voice devices themselves; for example, multiple voice devices may estimate the user's location and/or orientation from how loudly their microphones pick up the user. The determination may also rely on information acquired by devices other than the voice devices. For example, the system may determine the user's intention to use a voice device from the user's operation of another networked device. The plurality of voice devices may be smart voice devices belonging to the same internet of things, so the usage status of other devices on that network can indicate the user state: if a Bluetooth ceiling lamp in the kitchen has just been switched on, it can be inferred that the user is currently in the kitchen, and the kitchen voice device can answer the PSTN call. As shown in fig. 4, the determination may also be made via a Bluetooth device worn by the user (e.g., a smart watch serving as sensing device 430) or another Bluetooth Low Energy device. The sensing device 430 may indirectly locate the user, who is co-located with the Bluetooth device, using the angle of arrival (AoA) and angle of departure (AoD) features of the Bluetooth 5.1 standard. In the example of fig. 4, the smart watch worn by the user is also associated with number b; it receives the notification of the incoming call from server 420 and displays it on the watch screen. The smart watch 430 then actively sends BLE broadcasts; voice devices 410-1, 410-2, and 410-3 receive them and report their respective RSSIs to the server 420. The server 420 thereby determines that voice device 410-1 has the least signal attenuation and connects it to the call from number a, and the user completes the call by answering the voice prompt of device 410-1.
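The incoming-call flow of fig. 4 can be sketched as follows; the class shape and the RSSI figures are illustrative, not from the patent.

```python
# Sketch of the fig. 4 flow: the watch sends a BLE broadcast, each voice
# device reports its RSSI, and the server connects the call to the
# device with the least attenuation (highest RSSI).

class Server:
    def __init__(self) -> None:
        self.reports: dict[str, float] = {}

    def report_rssi(self, device_id: str, rssi_dbm: float) -> None:
        self.reports[device_id] = rssi_dbm

    def connect_call(self, caller: str) -> str:
        best = max(self.reports, key=self.reports.get)
        print(f"connecting call from {caller} to {best}")
        return best

server = Server()
# RSSI values as reported by voice devices 410-1..410-3 (illustrative):
server.report_rssi("410-1", -12.0)
server.report_rssi("410-2", -47.0)
server.report_rssi("410-3", -58.0)
server.connect_call("number-a")  # -> "410-1", least attenuation
```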
In addition, the voice device to answer a call may also be selected based on its power consumption. For example, when distances are comparable, a mains-powered smart speaker may be chosen over a battery-powered smart voice sticker for the call.
In other embodiments, the user state on which the dynamic selection is based may be state information other than location. A dynamic answering decision may also be made from the user's physical state, for example when the user wears a networked device such as a sports watch that can reflect it. If the watch indicates the user is lying down asleep, the bedroom voice device may be chosen, but with a gentler notification mode, e.g., a flashing blue signal light rather than a direct voice announcement.
After a voice device has been dynamically selected for the call, the call can also be dynamically switched between at least two voice devices as the user state changes during the call. Such changes include at least one of: a change in the user's current location; and a change in the user's current orientation. The changes can be detected by the voice devices, by other internet-of-things devices (e.g., a smart air conditioner with infrared tracking), or likewise via the Bluetooth 5.1 specification. The user can therefore move freely during the voice call.
The dynamic switch between the two voice devices may be delayed for a predetermined time after the change in user state is detected. For example, if the user is on a call in the living room and goes to the study to pick up a book, the call need not be handed to the study device the instant the position change is sensed; with a slight delay, the system can first determine whether the user has gone to the study briefly and will return to the living room, or will stay in the study to continue the call, before deciding whether to switch. Similarly, to ensure call quality, the volume of the device carrying the call can be adjusted as the user state changes, including both the playback volume heard by the current user and the volume transmitted to the other party, so that both parties converse within a fairly stable loudness range.
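One way to realize the predetermined delay is a hold timer: a new best device must remain best for a set time before the call is handed over. A minimal sketch under that assumption; all names are illustrative.

```python
# Debounced handover: switch only after the new best device has stayed
# best for `hold_seconds`, so brief excursions (e.g., fetching a book
# from the study) do not cause call churn.

import time

class CallSwitcher:
    def __init__(self, current: str, hold_seconds: float = 10.0) -> None:
        self.current = current
        self.hold = hold_seconds
        self._candidate: str | None = None
        self._since: float = 0.0

    def update(self, best_device: str) -> str:
        """Feed the currently best device; returns the device carrying the call."""
        now = time.monotonic()
        if best_device == self.current:
            self._candidate = None          # user came back; cancel handover
        elif best_device != self._candidate:
            self._candidate, self._since = best_device, now
        elif now - self._since >= self.hold:
            self.current = best_device      # stable long enough: switch
            self._candidate = None
        return self.current
```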
It should be understood that the server 420 of the present invention may be a cloud-based service platform, which may provide SIP calls and other services, such as internet-of-things services. In other words, the server usually serves multiple sets of voice devices simultaneously. FIG. 5 illustrates an example composition of a voice interaction system at this larger scale. As shown, the server 520 is connected to a plurality of sets of voice devices 510 and operating devices 530 via a network. Each set of voice devices and operating devices may be a local internet of things connected via Bluetooth (BT), e.g. a Bluetooth mesh network or a BLE network. Each group of voice devices is assigned a call object (i.e., a SIP number), and each device or accessory within the group is bound to that number. Each group can dynamically select a particular voice device or accessory to answer a call, e.g., based on the user state.
In some embodiments, the dynamic answering function of the present invention may require separate activation. The server may then determine, from the user's state recorded at the server, whether to enable the function of dynamically selecting a voice device for voice interaction (e.g., a voice call) among the plurality of voice devices. For example, the user may pay for the dynamic selection function; the paid activation state is recorded at the server, which then enables the service for that user.
In addition, the server may also choose, based on preset conditions such as user settings or the current call state, one of the following behaviors for connecting a voice call: dynamically selecting the answering device from the plurality of voice devices; simultaneously ringing some or all of the voice devices bound to the same call object; or using a default voice device to answer. For example, during a period set by the user, only the default device may take calls, while the dynamic selection service is enabled at other times; or all voice devices may be rung automatically, for example when an important contact calls several times in a row.
The above-described system of the invention is particularly suitable for implementation as a teleconferencing system. For example, a plurality of voice devices may be provided in a conference room, each of which may include both a microphone and a speaker, and may be able to determine the current location of the user speaking based on, for example, an infrared sensor or the bluetooth 5.1 specification, thereby selecting the appropriate voice device to talk. The teleconferencing system described above may be part of a video conferencing system and is particularly suitable for deployment in larger scenes (e.g., larger conference rooms, reporting halls, or full-floor offices where, for example, model presentations are required, etc.).
The invention may also be implemented as a voice device. Fig. 6 shows a schematic composition diagram of a voice device according to the invention. As shown in fig. 6, the voice device 600 may include an interaction unit 610 and a communication unit 620. The interaction unit 610 is used for interacting with the user and may include, for example, a microphone for collecting the user's voice and a speaker for announcements; depending on the implementation, it may further include all or part of the modules on the voice processing chain, such as semantic analysis and understanding modules. In different embodiments, all or part of the voice processing chain may instead be implemented in the cloud or in the central voice device. In one embodiment, the voice device 600 may be a smart speaker; in other embodiments, it may be a simpler voice accessory, such as a smart voice sticker. The communication unit 620 is used to communicate with at least one other voice device and may have different communication capabilities in different embodiments. For example, when the voice device 600 is a smart speaker, the communication unit 620 may include short-range and long-range communication subunits, e.g., Bluetooth and WiFi subunits for communicating with nearby internet-of-things devices and with a remote cloud server, respectively. When the voice device 600 is a simpler voice accessory, the communication unit 620 may provide only short-range communication, for example Bluetooth, for communicating with the other voice devices.
Here, the voice device 600 and at least one other voice device are bound to the same call object. Similarly, a voice device to be accessed for a voice call can be dynamically selected from the voice device and at least one other voice device based on user status information of a user, and the dynamically selected voice device is accessed for the voice call.
The binding of the voice device and the at least one other voice device to the same call object may be accomplished via the server. When the server receives the voice call request, the server may notify the voice device and at least one other voice device, for example, a call from the number a, or select a device to be accessed from the voice devices.
Since the answering device is selected dynamically according to the user state, the voice device the user is currently using may be selected to answer the call, or the voice device closest to the user may be. The voice device and the at least one other voice device are smart voice devices belonging to the same internet of things, for example the voice devices 210 shown in fig. 2. This internet of things may also supply the user state information, determined from the current user state collected by its networked devices.
When the voice device 600 is dynamically selected to answer a voice call, the call may be played through the interaction unit 610 (e.g., its speaker). In addition, the call may be dynamically switched onto or away from the voice device as the user state changes during the call.
Further, the communication unit 620 may be configured to: receiving a broadcast sent by a user carrying and/or wearing equipment; and determining a broadcast reception strength of the broadcast.
When the voice device serves as the central node of the internet of things, or as the determining device, the communication unit 620 may be configured to receive the broadcast reception strengths of the at least one other voice device, and the device further comprises a selection unit configured to compare its own broadcast reception strength with those of the others, generate the user state information, and, based on it, select the voice device with the highest broadcast reception strength to conduct the current voice interaction with the user.
When the voice device is an ordinary node, for example a smart voice sticker with simple functions, the communication unit 620 may be configured to: send its own measured broadcast reception strength; receive the user status information; and perform the current voice interaction with the user when the user status information indicates that this device has been selected.
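The ordinary-node side of the same exchange is essentially report-then-wait, as in the sketch below. The LocalLink class is a stand-in for the short-range communication unit; a real device would use an actual transport such as Bluetooth, and the message shapes shown are invented.

```python
class LocalLink:
    """Toy stand-in for the short-range communication unit."""
    def __init__(self, inbox):
        self.inbox = inbox          # messages "pushed" by the central node
    def send(self, msg):
        print("sent:", msg)
    def receive(self):
        return self.inbox.pop(0)

def ordinary_node_step(link, device_id, own_rssi):
    # Report the locally measured strength, then act on the returned status.
    link.send({"type": "rssi_report", "device": device_id, "rssi": own_rssi})
    status = link.receive()
    if status.get("selected_device") == device_id:
        print(f"{device_id}: selected, starting voice interaction")
    else:
        print(f"{device_id}: not selected, staying idle")

link = LocalLink(inbox=[{"selected_device": "kitchen"}])
ordinary_node_step(link, "kitchen", -68)
```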
In addition, the interaction unit 610 may determine the sound effect of the voice interaction with the user based on the user status information and/or the current voice interaction scene, for example the volume level, a call mode or a music mode, or stereo playback in combination with other voice devices.
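A simple mapping from scene and user state to a playback profile might look like the sketch below; the thresholds, profile fields, and scene names are all invented for illustration.

```python
def choose_sound_profile(scene, user_distance_m):
    """Pick volume and playback mode from the interaction scene and an
    estimated user distance (illustrative policy only)."""
    if scene == "call":
        return {"mode": "call", "volume": 0.6, "stereo_with_peers": False}
    # Music: play louder when the user is farther away, and join nearby
    # devices for stereo only when the user is close to both.
    return {"mode": "music",
            "volume": 0.4 if user_distance_m < 2.0 else 0.8,
            "stereo_with_peers": user_distance_m < 2.0}

print(choose_sound_profile("music", 3.5))
```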
The present invention may also be embodied as a smart device comprising: a sensing unit for sensing that the device is being carried and/or used by a user; and a broadcast unit for transmitting a broadcast, which is received by the plurality of voice devices in order to determine, based on the broadcast reception strengths, the voice device that will perform the voice interaction with the user. The smart device may further include a communication unit for receiving an instruction to transmit the broadcast. The instruction may be issued by at least one of: a server; one of the plurality of voice devices; and an edge computing device.
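On the wearable side, the sensing unit gates the broadcast unit, so the device only advertises while it is actually worn or carried. The sketch below stubs both the sensor and the radio with trivial placeholders; in practice the broadcast might be a BLE advertisement.

```python
import time

class WearableBeacon:
    """Toy model of the smart device: broadcast only while worn/carried."""
    def __init__(self, device_id):
        self.device_id = device_id

    def is_worn(self) -> bool:
        # A real sensing unit might read a skin-contact or motion sensor.
        return True

    def broadcast_once(self) -> None:
        if self.is_worn():
            print(f"advertising presence of {self.device_id}")

    def run(self, rounds=3, interval_s=0.1) -> None:
        for _ in range(rounds):
            self.broadcast_once()
            time.sleep(interval_s)

WearableBeacon("smart-watch-01").run(rounds=2)
```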
The smart device may be a wearable smart device, and in particular may be implemented as a smart watch.
The dynamic access scheme of the present invention may also be implemented as a voice interaction method. The voice interaction method may be performed by a voice interaction system such as that shown in fig. 2, and in some cases, may also be performed by a voice device, such as a smart speaker, serving as a central node. FIG. 7 shows a flow diagram of a voice interaction method according to an embodiment of the invention.
In step S710, user status information of the user is acquired. In step S720, a voice device performing current voice interaction with the user is selected from a plurality of voice devices based on the acquired user status information of the user. In step S730, the selected voice device is notified to perform voice interaction with the user.
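The three steps compose naturally into one round of a selection loop, as sketched below. The three callables are injection points whose signatures are assumed for this sketch; in a deployment they would be backed by the mechanisms described elsewhere in this document (camera, infrared, broadcast reception strengths, and so on).

```python
def voice_interaction_round(get_user_status, select_device, notify):
    status = get_user_status()      # S710: acquire user status information
    device = select_device(status)  # S720: pick the device for this interaction
    notify(device)                  # S730: tell the chosen device to engage
    return device

chosen = voice_interaction_round(
    get_user_status=lambda: {"kitchen": -68, "bedroom": -45},
    select_device=lambda status: max(status, key=status.get),
    notify=lambda device: print(f"notify {device} to start the interaction"),
)
```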
In one embodiment, obtaining the user status information of the user may include at least one of: capturing an image of the user; performing infrared tracking scanning of the user; acquiring the user's current voice input; and acquiring the device state of a device operated by the user.
Alternatively or additionally, obtaining the user status information of the user may include: determining a device being carried and/or used by the user; instructing that device to transmit a broadcast; receiving the broadcast at the plurality of voice devices, each of which determines its broadcast reception strength; and determining the user status information according to those broadcast reception strengths.
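Turning reception strengths into user status typically relies on a signal-attenuation model. A common choice is the log-distance path-loss model, RSSI = P_1m - 10 * n * log10(d), inverted below to estimate the wearable-to-device distance; the 1-metre reference power and path-loss exponent are typical indoor calibration values assumed for this sketch, not figures from the patent.

```python
def estimate_distance_m(rssi_dbm, p_at_1m_dbm=-59.0, path_loss_exp=2.0):
    """Invert the log-distance path-loss model:
    rssi = p_at_1m - 10 * n * log10(d)
      =>  d = 10 ** ((p_at_1m - rssi) / (10 * n))
    """
    return 10 ** ((p_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exp))

readings = {"kitchen": -68.0, "bedroom": -45.0}
distances = {dev: estimate_distance_m(rssi) for dev, rssi in readings.items()}
nearest = min(distances, key=distances.get)
print(distances, "->", nearest)   # the bedroom device is closest to the user
```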
Further, the method may include determining the number of selected voice devices and/or the voice interaction sound effects of the selected voice devices based on the user status information and/or the voice interaction scenario.
The voice interaction system, the interaction method, and the corresponding voice device according to the present invention have been described in detail above with reference to the accompanying drawings. By acquiring the user state, they enable dynamic selection among multiple voice devices. In particular, the device best suited to interact with the user (e.g., to answer a voice call) can be determined by having a device the user wears or carries transmit a broadcast while the voice devices receive it and calculate the attenuation, thereby improving the accuracy and ease of use of the voice services provided to the user.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the steps defined in the above method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (35)

1. A voice interaction system, comprising a plurality of voice devices and an operation device, wherein
the voice devices are used for voice interaction;
the operation device is used for performing the operations required to acquire user status information of a user;
a voice device for performing a current voice interaction with the user is selected from the plurality of voice devices based on the acquired user status information; and
the selected voice device performs the voice interaction with the user.
2. The system of claim 1, wherein the operation device comprises:
a wearable device worn by the user; and
a device carried and/or operated by the user.
3. The system of claim 2, wherein the operation device is configured to perform the operations required to obtain the user status information of the user, including:
transmitting a broadcast,
and the voice interaction system further comprises:
a plurality of broadcast receiving devices, each for receiving the broadcast and determining a reception strength of the broadcast, wherein the user status information is determined according to the reception strengths determined by the plurality of broadcast receiving devices.
4. The system of claim 3, wherein the plurality of broadcast receiving devices includes one or more of the voice devices.
5. The system of claim 1, further comprising:
and the confirmation node is used for performing the operation required for acquiring the user state information of the user based on the operation equipment, determining the user state information of the user, and dynamically selecting the voice equipment for performing the current voice interaction with the user from the plurality of voice equipment.
6. The system of claim 5, wherein the confirmation node comprises at least one of:
a server;
an edge computing device;
a central voice device of the plurality of voice devices; and
a local central node of the system.
7. The system of claim 1, wherein the user status information is used to characterize at least one of:
the current location of the user;
the current orientation of the user; and
attenuation coefficient of the speech signal emitted by the user.
8. The system of claim 1, wherein the voice interaction of the selected voice device with the user is a voice interaction actively initiated by the selected voice device.
9. The system of claim 1, wherein the operation device is configured to perform the operations required to obtain the user status information of the user, including:
capturing the current state of the user.
10. The system of claim 9, wherein capturing the current state of the user comprises at least one of:
capturing an image of the user;
performing infrared tracking scanning on the user;
acquiring the current voice input of the user; and
acquiring the device state of a device operated by the user.
11. The system of claim 9, wherein the operation device is at least one of:
one or more of the voice devices; and
a networked internet-of-things device.
12. The system of claim 1, wherein the number of selected voice devices and/or the voice interaction sound effects is determined based on the user status information.
13. The system of claim 12, wherein the user status information includes a current location of the user and/or a current sound environment of the user, and
determining the number of the selected voice devices and/or the voice interaction sound effect based on the user state information comprises:
and determining the number of the selected voice devices and/or voice interaction sound effects based on the current position of the user and/or the current sound environment of the user.
14. The system of claim 1, wherein the number of selected voice devices and/or voice interaction sound effects is determined based on a voice interaction scenario.
15. The system of claim 1, wherein a voice device performing a current voice interaction with the user is switched based on the acquired current user status information of the user.
16. A voice call system, comprising a plurality of voice devices and a user status sensing device, wherein
the voice devices are used for voice interaction with a user;
the sensing device is used for sensing user status information;
the plurality of voice devices are bound to the same call object;
a voice device for answering an incoming voice call is selected from the plurality of voice devices based on the sensed user status information; and
the selected voice device answers the incoming call so that the user can conduct the voice call.
17. The system of claim 16, wherein the sensing device comprises:
a device worn and/or carried by the user,
the sensing device is used to send a broadcast, and
the voice devices receive the broadcast and determine its reception strength, wherein the voice device with the highest reception strength is selected as the voice device for answering the incoming voice call.
18. The system of claim 16, further comprising:
a server for communicating with the voice devices and notifying the plurality of voice devices of the incoming voice call,
wherein the voice device that answers the incoming voice call is determined by the server and/or the plurality of voice devices.
19. A voice device in communication with at least one other voice device, wherein the voice device comprises:
an interaction unit for voice interaction with a user; and
a communication unit for communicating with the at least one other voice device;
wherein a voice device for performing a current voice interaction with the user is dynamically selected from the voice device and the at least one other voice device based on user status information of the user.
20. The device of claim 19, wherein the voice device and the at least one other voice device are bound to the same call object.
21. The device of claim 20, wherein a voice device for answering an incoming voice call is dynamically selected from the voice device and the at least one other voice device, and
the dynamically selected voice device answers the incoming call to conduct the voice call.
22. The device of claim 21, wherein the voice device and at least one other voice device are a plurality of smart voice devices belonging to the same internet of things, and the internet of things further comprises:
a networked device, wherein,
the user status information is determined according to the current user state acquired by the networked device.
23. The device of claim 22, wherein the voice device is dynamically connected to or disconnected from the voice call according to changes in the current user state during the call.
24. The device of claim 19, wherein the communication unit is to:
receiving a broadcast sent by a device carried and/or worn by the user; and
determining a broadcast reception strength of the broadcast.
25. The device of claim 24, wherein the communication unit is to:
receiving the broadcast reception strengths of the at least one other voice device,
and the device further comprises:
a selection unit for comparing its own broadcast reception strength with the received broadcast reception strengths, generating the user status information, and selecting, based on the user status information, the voice device with the highest broadcast reception strength as the voice device for performing the current voice interaction with the user.
26. The device of claim 24, wherein the communication unit is to:
sending its own broadcast reception strength;
receiving the user status information; and
performing the current voice interaction with the user when the user status information indicates that this voice device is selected.
27. The device of claim 19, wherein the interaction unit determines a voice interaction sound effect for the voice interaction with the user based on the user status information and/or a current voice interaction scene.
28. A smart device, comprising:
a sensing unit for sensing that the device is being carried and/or used by a user;
a broadcast unit for transmitting a broadcast, the broadcast being received by a plurality of voice devices to determine, based on the broadcast reception strengths, a voice device to perform voice interaction with a user.
29. The device of claim 28, further comprising:
a communication unit for receiving an instruction to transmit a broadcast.
30. The device of claim 29, wherein the instruction to transmit the broadcast is issued by at least one of:
a server side;
one of the plurality of voice devices; and
an edge computing device.
31. The device of claim 28, wherein the smart device is a wearable smart device.
32. A voice interaction method, comprising:
acquiring user state information of a user;
selecting voice equipment for performing current voice interaction with the user from a plurality of voice equipment based on the acquired user state information of the user;
notifying the selected voice device to perform the voice interaction with the user.
33. The method of claim 32, wherein obtaining user status information of a user comprises at least one of:
capturing an image of the user;
performing infrared tracking scanning on the user;
acquiring the current voice input of the user; and
acquiring the device state of a device operated by the user.
34. The method of claim 32, wherein acquiring user status information of the user comprises:
determining an operating device being carried and/or used by a user;
instructing the operating device to transmit a broadcast;
receiving, by the plurality of voice devices, the broadcast and determining respective broadcast reception strengths; and
determining the user status information according to the broadcast reception strengths.
35. The method of claim 32, further comprising:
determining the number of the selected voice devices and/or a voice interaction sound effect of the selected voice devices based on the user status information and/or a voice interaction scenario.
CN202010168054.4A 2020-03-11 2020-03-11 Voice interaction system, method and voice equipment Pending CN113393835A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010168054.4A CN113393835A (en) 2020-03-11 2020-03-11 Voice interaction system, method and voice equipment

Publications (1)

Publication Number Publication Date
CN113393835A true CN113393835A (en) 2021-09-14

Family

ID=77615514

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010168054.4A Pending CN113393835A (en) 2020-03-11 2020-03-11 Voice interaction system, method and voice equipment

Country Status (1)

Country Link
CN (1) CN113393835A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101304582A (en) * 2008-06-25 2008-11-12 深圳华为通信技术有限公司 Method for reminding signal strength and terminal
CN103458138A * 2012-05-31 2013-12-18 *** Communications Group Co. Intercom method, system and device
CN106162870A (en) * 2016-07-21 2016-11-23 贵州力创科技发展有限公司 The method obtaining customer position information based on the scanning of Wi Fi hot terminal
CN106782540A (en) * 2017-01-17 2017-05-31 联想(北京)有限公司 Speech ciphering equipment and the voice interactive system including the speech ciphering equipment
US20190123930A1 (en) * 2017-10-19 2019-04-25 Libre Wireless Technologies, Inc. Multiprotocol Audio/Voice Internet-Of-Things Devices and Related System

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114051112A (en) * 2021-09-29 2022-02-15 浪潮软件科技有限公司 Intelligent audio and video call system and method based on home edge calculation
CN114051112B (en) * 2021-09-29 2023-07-18 浪潮软件科技有限公司 Intelligent audio and video call system and method based on family edge calculation
CN115001890A (en) * 2022-05-31 2022-09-02 四川虹美智能科技有限公司 Intelligent household appliance control method and device based on response-free
CN115001890B (en) * 2022-05-31 2023-10-31 四川虹美智能科技有限公司 Intelligent household appliance control method and device based on response-free

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code (country code: HK; legal event code: DE; document number: 40058140)