WO2020015473A1 - 交互方法及装置 - Google Patents

交互方法及装置 (Interaction Method and Device)

Info

Publication number
WO2020015473A1
WO2020015473A1  PCT/CN2019/090504  CN2019090504W
Authority
WO
WIPO (PCT)
Prior art keywords
user
sensing area
users
voice
interactive
Prior art date
Application number
PCT/CN2019/090504
Other languages
English (en)
French (fr)
Chinese (zh)
Inventor
朱碧军
陈志远
俞静飞
Original Assignee
钉钉控股(开曼)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 钉钉控股(开曼)有限公司 filed Critical 钉钉控股(开曼)有限公司
Priority to JP2021525345A priority Critical patent/JP2021533510A/ja
Priority to SG11202100352YA priority patent/SG11202100352YA/en
Publication of WO2020015473A1 publication Critical patent/WO2020015473A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Definitions

  • One or more embodiments of the present specification relate to the field of electronic technology, and in particular, to an interaction method and device.
  • the electronic device may complete the above-mentioned interaction process with the user by displaying related content on the screen, voice-playing related content, and the like.
  • one or more embodiments of the present specification provide an interaction method and device.
  • an interaction method including:
  • When the target interactive object of the interactive content is a part of the users in the sensing area, information of the target interactive object is displayed to the users in the sensing area.
  • an interactive device including:
  • a detection unit that detects a user in a sensing area
  • a providing unit that provides interactive content to users in the sensing area
  • a display unit that, when the target interactive object of the interactive content is a part of the users in the sensing area, displays the information of the target interactive object to the users in the sensing area.
  • FIG. 1 is a schematic architecture diagram of an interaction system according to an exemplary embodiment
  • FIG. 2 is a flowchart of an interaction method according to an exemplary embodiment
  • FIG. 3 is a schematic diagram of an interaction scenario provided by an exemplary embodiment
  • FIG. 4 is a schematic diagram of interaction with internal employees according to an exemplary embodiment
  • FIG. 5 is a schematic diagram of guiding a user position through interactive content according to an exemplary embodiment
  • FIG. 6 is a schematic diagram of an interactive device actively initiating an interaction with a user according to an exemplary embodiment
  • FIG. 7 is another schematic diagram of guiding a user's location through interactive content according to an exemplary embodiment
  • FIG. 8 is a schematic diagram of a normal interaction scenario provided by an exemplary embodiment
  • FIG. 9 is a schematic diagram of adjusting interaction content according to an associated event according to an exemplary embodiment.
  • FIG. 10 is a schematic diagram of designating a speaker by an interactive device according to an exemplary embodiment
  • FIG. 11 is a schematic diagram of another speaker designated by an interactive device according to an exemplary embodiment
  • FIG. 12 is a schematic diagram of another speaker designation by an interactive device according to an exemplary embodiment
  • FIG. 13 is a schematic diagram of a speaking sequence of a designated external person according to an exemplary embodiment
  • FIG. 14 is a schematic diagram of labeling an interaction object according to an exemplary embodiment
  • FIG. 15 is a schematic diagram of labeling a target interaction object according to an exemplary embodiment
  • FIG. 16 is a schematic diagram of determining a source user of a user's voice according to an exemplary embodiment
  • FIG. 17 is a schematic diagram of determining a source direction of an audio message according to an exemplary embodiment
  • FIG. 18 is a schematic diagram of labeling a source user of a user's voice according to an exemplary embodiment
  • FIG. 19 is a schematic structural diagram of a device according to an exemplary embodiment
  • FIG. 20 is a block diagram of an interaction apparatus according to an exemplary embodiment.
  • the steps of the corresponding method are not necessarily performed in the order shown and described in this specification.
  • the method may include more or fewer steps than described in this specification.
  • a single step described in this specification may be divided into multiple steps for description in other embodiments; and multiple steps described in this specification may be combined into a single step for description in other embodiments.
  • the interaction scheme of this specification may be applied to an interactive device.
  • the interactive device may be an electronic device dedicated to implementing an interactive function; or, the interactive device may be a multifunctional electronic device with interactive functions.
  • The interactive device may include a PC, a tablet device, a notebook computer, a wearable device (such as smart glasses), etc.; one or more embodiments of this specification are not limited to this.
  • the interactive device can run an interactive system to implement an interactive solution.
  • The application of the interactive system can be pre-installed on the interactive device so that it can be started and run there; of course, when technologies such as HTML5 are used, the interactive system can be obtained and run without installing the application on the interactive device.
  • FIG. 1 is a schematic architecture diagram of an interaction system provided by an exemplary embodiment.
  • the interaction system may include a server 11, a network 12, and an interaction device 13.
  • The server 11 can run the server-side program of the interactive system to realize related processing functions, and the interactive device 13 can run the client-side program of the interactive system to realize information presentation, human-computer interaction, and other functions, so that the interactive system is implemented cooperatively by the server 11 and the interactive device 13.
  • the server 11 may be a physical server including an independent host, or the server 11 may be a virtual server carried by a host cluster.
  • the interactive device 13 may be an electronic device dedicated to implementing interactive functions; or, the interactive device 13 may be a multifunctional electronic device with interactive functions.
  • The interactive device 13 may include a PC, a tablet device, a notebook computer, a wearable device (such as smart glasses), etc.; one or more embodiments of this specification are not limited to this.
  • the network 12 for interaction between the interaction device 13 and the server 11 may include various types of wired or wireless networks.
  • the network 12 may include a Public Switched Telephone Network (PSTN) and the Internet.
  • The client application of the interactive system can be pre-installed on the interactive device so that the client can be started and run there; of course, when an online "client" based on technologies such as HTML5 is used, the client can be obtained and run without installing the corresponding application on the interactive device.
  • the above-mentioned interactive system may be implemented based on a mobile group office platform.
  • The mobile group office platform can implement communication functions and can also serve as an integrated platform for many other functions, such as processing internal group events like approval events (e.g., leave requests, office supply applications, and financial approvals), attendance events, task events, and log events, as well as external events such as ordering and purchasing; this is not limited in one or more embodiments of the present specification. Likewise, the mobile group office platform can implement the above-mentioned interactive system.
  • The mobile group office platform can host instant messaging applications in related technologies, such as enterprise instant messaging (EIM) applications like Microsoft's Skype for Business, etc.
  • the instant messaging function is only one of the communication functions supported by the mobile group office platform.
  • the mobile group office platform can also implement more other functions such as those described above, which will not be repeated here.
  • the "group” in this specification may include various organizations such as enterprises, schools, military units, hospitals, and public institutions, and this application does not limit this.
  • The above interactive system can also be implemented based on any other type of application, such as an ordinary instant messaging application, and is not limited to mobile group office platforms or similar scenarios; this specification does not limit this.
  • FIG. 2 is a flowchart of an interaction method according to an exemplary embodiment. As shown in FIG. 2, the method may be applied to an interactive device, and may include the following steps:
  • Step 202 Detect a user in a sensing area.
  • The interactive device has a certain sensing distance, and the coverage of that sensing distance constitutes a sensing area, such as a sector-shaped (or arbitrarily shaped) area with a radius of 3 m; by detecting the sensing area, it can be determined whether any users are present in it.
  • the interactive device can detect the user in the sensing area in any manner. For example, the interactive device may determine whether a user exists in the sensing area by implementing face detection.
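Below is a minimal illustrative sketch (not part of the original disclosure) of deciding whether any user is present in the sensing area by running face detection on a camera frame; the specification does not name a particular detector, so OpenCV's Haar cascade is used purely as an example.

```python
# Sketch only: count faces in one camera frame as a proxy for "users in the sensing area".
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def users_in_sensing_area(frame) -> int:
    """Return the number of faces detected in the current camera frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces)

cap = cv2.VideoCapture(0)          # the interactive device's built-in camera
ok, frame = cap.read()
if ok and users_in_sensing_area(frame) > 0:
    print("User detected in sensing area")
```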
  • Step 204 Provide interactive content to users in the sensing area.
  • the interactive device may provide interactive content in any one or multiple combinations, which is not limited in this specification.
  • the interactive device may include a display screen, and the interactive content is displayed on the display screen, so as to provide the interactive content to users in the sensing area;
  • the interactive device may include a speaker and broadcast the interactive content as voice through the speaker, so as to provide the interactive content to users in the sensing area;
  • the interactive device may include several indicator lights and control the on/off state, color, and blinking mode of the indicator lights, so as to provide the interactive content to users in the sensing area.
  • the sensing area may include a near-field sensing area and a far-field sensing area.
  • The near-field sensing area is closer to the interactive device than the far-field sensing area; that is, the "near field" and "far field" here are relative concepts. For example, a range of 0 to 1.5 m can be defined as the near-field sensing area, and a range of 1.5 to 3 m can be defined as the far-field sensing area.
  • The interactive device may provide interactive content to users in the near-field sensing area; and the interactive device may issue guidance information to users in the far-field sensing area to guide them from the far-field sensing area into the near-field sensing area, so that they become users in the near-field sensing area and interactive content can then be provided to them.
  • A user in the far-field sensing area has a certain probability of wishing to interact with the interactive device. Since the far-field sensing area may not provide a good interaction effect due to the long distance, guidance information can be sent to that user.
  • The interactive device can send guidance information in any one manner or a combination of manners, which is not limited in this specification. For example, the interactive device may display the guidance information on a display screen, play the guidance information as voice through a speaker, or light a prompt light or make it flash, thereby guiding a user to enter the near-field sensing area.
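A small sketch, under the zone boundaries given above as examples (0–1.5 m near field, 1.5–3 m far field), of classifying a user's distance and choosing between providing interactive content and showing guidance; how the distance itself is estimated (depth camera, face size, etc.) is left open by the description.

```python
# Illustrative only: zone classification and the matching device reaction.
NEAR_FIELD_MAX_M = 1.5
FAR_FIELD_MAX_M = 3.0

def zone_for(distance_m: float) -> str:
    if distance_m <= NEAR_FIELD_MAX_M:
        return "near_field"
    if distance_m <= FAR_FIELD_MAX_M:
        return "far_field"
    return "outside"

def react_to(distance_m: float) -> str:
    zone = zone_for(distance_m)
    if zone == "near_field":
        return "provide_interactive_content"
    if zone == "far_field":
        return "show_guidance"      # e.g. "Please come within 1.5 meters"
    return "ignore"

assert react_to(0.8) == "provide_interactive_content"
assert react_to(2.2) == "show_guidance"
```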
  • When the interactive device can perform attendance operations, by identifying the users in the sensing area, an attendance operation can be performed automatically during the attendance time period on users in the sensing area who have not yet checked in, regardless of whether they are located in the near-field sensing area or the far-field sensing area.
  • When a user's dwell duration reaches the corresponding preset duration, the interactive device may determine that the user may wish to interact and provide interactive content to the user; since a user usually approaches the interactive device only when wanting to interact with it, the second preset duration can be appropriately shorter than the first preset duration to shorten the user's waiting time.
  • the interactive device may actively provide interactive content to users in the sensing area, similar to the "greeting" behavior between users.
  • For example, the interactive content may include "What can I help you with?", etc.; such content guides the users in the sensing area so as to assist them in completing related events.
  • the interaction device may determine whether a user in the sensing area satisfies a preset condition, so as to provide interactive content only to users who meet the preset condition.
  • The preset condition may include at least one of the following: the dwell duration in the far-field sensing area reaches the first preset duration, the dwell duration in the near-field sensing area reaches the second preset duration, the user gazes toward the interactive device, the user's face is facing the interactive device or the angle between the two is smaller than a preset angle, etc.; this specification does not limit this.
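A hedged sketch of such a preset-condition check; the threshold values (3 s, 1 s, 30 degrees) are illustrative assumptions taken from the examples in this description, not limits of the claims.

```python
# Sketch only: interactive content is offered when at least one condition holds.
from dataclasses import dataclass

@dataclass
class UserState:
    far_field_dwell_s: float
    near_field_dwell_s: float
    is_gazing_at_device: bool
    face_angle_deg: float          # angle between face orientation and the device

def meets_preset_condition(u: UserState,
                           first_preset_s: float = 3.0,
                           second_preset_s: float = 1.0,
                           preset_angle_deg: float = 30.0) -> bool:
    return (u.far_field_dwell_s >= first_preset_s
            or u.near_field_dwell_s >= second_preset_s
            or u.is_gazing_at_device
            or u.face_angle_deg < preset_angle_deg)
```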
  • The interaction device can obtain the associated event of a user in the sensing area, so that when the interactive content is related to the associated event, the interactive device can adjust, according to the status information of the associated event, the interactive content provided to that user; and when there is no associated event related to the interactive content, default interactive content can be provided. For example, when the interactive content is related to attendance, for a user detected in the sensing area during normal working hours, the interactive content may be "Are you sure you want to leave early?"; and if the associated event of the user includes a sick leave approval event and the leave period involved in that approval event has been reached, the interactive content can be adjusted to "Are you sure you want to get off work?", even though it is still within normal working hours.
  • For another example, when the interactive content is related to visits by external persons, if the user in the sensing area is an external person and the corresponding associated event is a visit reservation event, the interactive content may be "Do you need help reaching the person you are visiting?"; if there is no corresponding visit reservation event, the interactive content can be "Please say who you are visiting."
  • the interactive device may determine the identity type of the user in the sensing area, and then adjust the interactive content provided to the user in the sensing area according to the identity type.
  • The identity type may include whether the user in the sensing area is a member of the group or a person outside the group, the department to which the user belongs, etc., so as to provide users in the sensing area with interactive content consistent with their identity type.
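The sketch below (an assumed structure, not the disclosed implementation) combines the identity type with an optional associated event to pick the interactive content, mirroring the attendance and visitor examples given above.

```python
# Illustrative mapping from identity type / associated event to interactive content.
from typing import Optional

def interactive_content(identity_type: str, associated_event: Optional[str]) -> str:
    if identity_type == "group_member":
        if associated_event == "sick_leave_approved":
            return "Are you sure you want to get off work?"
        return "Are you sure you want to leave early?"      # default during working hours
    # person outside the group
    if associated_event == "visit_reservation":
        return "Do you need help reaching the person you are visiting?"
    return "Please say who you are visiting."
```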
  • Step 206 When the target interactive object of the interactive content is a part of the users in the sensing area, display the information of the target interactive object to the users in the sensing area.
  • When there are multiple users in the sensing area, the target interactive object may be only a part of those users (the number of which may be one or more). Displaying the information of the target interactive object ensures that each user in the sensing area can clearly tell whether he or she is the target interactive object of the interactive content.
  • The image information of each user in the sensing area may be obtained separately, and a corresponding avatar picture generated for each user. When no interaction is taking place, the avatar pictures of all users can be shown at the same time; when interacting with the target interactive object, only the avatar picture corresponding to the target interactive object may be shown and the avatar pictures of other users blocked, or the avatar picture of the target interactive object may be displayed differently from those of other users.
  • the differentiated display may include displaying the avatar picture of the target interactive object in the central area, and displaying the avatar picture of other users in the edge area.
  • The differentiated display may also include displaying the avatar picture of the target interactive object normally (in color) while the avatar pictures of other users are grayed out, etc.; this specification does not limit this.
  • When the target interactive object of the interactive content is all of the users in the sensing area (the number of which may be one or more), there is no need to show the information of the target interactive object to the users in the sensing area, which helps users pay more attention to the interactive content itself; of course, even if the information of the target interactive object is displayed in this scenario, it does not affect the implementation of the technical solution of this specification.
  • The users in the sensing area may change continuously; when the target interactive object of the interactive content changes from some of the users in the sensing area to all of them (for example, because a user who is not the target interactive object leaves the sensing area), it is possible to stop displaying the information of the target interactive object to the users in the sensing area, so as to achieve a smooth transition from the scenario "the target interactive object of the interactive content is a part of the users in the sensing area" to the scenario "the target interactive object of the interactive content is all of the users in the sensing area".
  • the interaction device may determine the identity information of the user who is the target interaction object as the information of the target interaction object; and then, display the identity information to the users in the sensing area.
  • The interactive device can display the above-mentioned identity information to the users in the sensing area while providing the interactive content; alternatively, providing the interactive content and displaying the identity information may not be implemented at the same time, and this specification does not limit this.
  • the interactive device may identify the user in the sensing area.
  • The interactive device may identify the user through physiological feature recognition such as face recognition, fingerprint recognition, iris recognition, gait recognition, and voiceprint recognition, or in any other manner; this specification does not limit this.
  • The identity information may include the title of the first user (such as a name, nickname, job title, or other type). For example, when the title of the first user is "Xiaobai", the interactive device can show the users in the sensing area "Xiaobai, what do you need?", where "Xiaobai" is the information of the target interactive object and "What do you need?" is the interactive content.
  • The identity information may include visual characteristic description information for the second user; for example, the visual characteristic description information may include at least one of the following: estimated gender, estimated height, estimated age, skin color, clothing, accessories, distance to the interactive device, orientation angle to the interactive device, etc. For example, the interactive device can show the users in the sensing area "This man in a black shirt, what do you need?", where "this man in a black shirt" (based on the estimated gender and clothing) is the information of the target interactive object and "What do you need?" is the interactive content.
  • The interactive device may display user reference information corresponding to the users in the sensing area, determine the user reference information corresponding to the user who is the target interactive object as the information of the target interactive object, and then highlight that user reference information to the users in the sensing area. For example, the interactive device may collect images of the users in the sensing area and display the collected user images as the user reference information; correspondingly, the interactive device may enhance the visual effect of the user image corresponding to the target interactive object, such as circling the corresponding user image or displaying an arrow icon near it, or weaken the visual effect of the user images corresponding to non-target interactive objects, such as partially occluding them.
  • the interactive device may obtain an event assistance request from a user in the sensing area, and then respond to the event assistance request to assist in completing a corresponding event.
  • For example, a user in the sensing area can send an interactive voice "Call employee Xiaohei" to the interactive device, and the interactive device can clearly determine that the event assistance request is a call request for the employee "Xiaohei" and thus initiate a call to that employee.
  • Users in the sensing area can also send event assistance requests in other ways, such as making preset limb movements, etc., which is not limited in this specification.
  • the interactive device may receive response information returned by the user in the sensing area for the interactive content, and the response information includes the event assistance request.
  • For example, when the interactive content is "Xiaobai, can I help you?", the user in the sensing area can reply "Call employee Xiaohei", and the interactive device can clearly determine that the event assistance request is a call request for the employee "Xiaohei" and initiate a call to that employee.
  • the manner in which the interactive device provides interactive content is not necessarily related to the manner in which the user in the sensing area returns response information, and the two may be the same or different, which is not limited in this specification.
  • When there are multiple users in the sensing area, the interactive device may select the users to be assisted according to a preset order and then prompt the selected users in turn to issue their event assistance requests. In this way, users in the sensing area issue event assistance requests one after another, avoiding the confusion that arises when multiple users issue event assistance requests at the same time and the interactive device cannot accurately tell which request corresponds to which user; this helps improve the efficiency and success rate with which the interactive device assists each user.
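A small sketch of prompting users in a preset order so that assistance requests do not overlap; the ordering criterion used here (distance to the device, nearest first) is only one of several options mentioned later in this description.

```python
# Illustrative only: derive a speaking order and prompt each selected user in turn.
def speaking_order(users):
    """users: list of (user_id, distance_m); returns ids nearest-first."""
    return [uid for uid, _ in sorted(users, key=lambda u: u[1])]

for uid in speaking_order([("Xiaohei", 2.1), ("Xiaobai", 0.9)]):
    print(f"{uid}, please speak")
```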
  • The interactive device may perform semantic recognition on the collected user voice to obtain the event assistance request, and may perform voice feature recognition on the user voice to determine its source user. Then, even if multiple users in the sensing area speak at the same time, the interactive device can distinguish each user's speech content and respond to the corresponding event assistance requests, improving the efficiency of assistance to each user.
  • The interactive device can identify the users in the sensing area in advance to obtain the identity information of each user. When there are multiple users in the sensing area, the interactive device can perform voice feature recognition on the collected user voice based only on the voice characteristics of the already identified users to determine the source user of the voice. Compared with performing voice recognition against the full voice feature library, the time required for voice feature recognition can be greatly reduced.
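Below is a hedged sketch of that optimisation: the incoming voice is compared only against the voiceprints of users already identified in the sensing area rather than the whole library. The embedding and similarity functions are generic placeholders, not a specific voiceprint library.

```python
# Sketch only: restrict voiceprint matching to the users present in the sensing area.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def source_user(voice_embedding: np.ndarray,
                present_users: dict,          # user_id -> enrolled voiceprint vector
                threshold: float = 0.7):
    """Return the present user whose voiceprint best matches, or None."""
    best_id, best_score = None, threshold
    for user_id, enrolled in present_users.items():
        score = cosine(voice_embedding, enrolled)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id
```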
  • When the user in the sensing area is a member of the group, the interactive device may assist in completing a corresponding group management event in response to the event assistance request; when the user in the sensing area is a person outside the group, the interactive device may respond to the event assistance request by sending a reminder message to the associated group members, assisting the person outside the group to establish communication with the associated group members, or directing the person outside the group to the place where the visited event is handled; when the user in the sensing area is an administrator, the interactive device can assist in completing a corresponding device management event in response to the event assistance request.
  • the interactive device may receive a user voice issued by a user in the sensing area, and respond to the user voice.
  • the user in the sensing area can actively send user voice to the interactive device.
  • The user voice may be used to send an event assistance request to the interactive device, to greet the interactive device, or to issue control instructions to the interactive device; this specification does not limit this.
  • Alternatively, a user in the sensing area may send a corresponding user voice to the interactive device in response to an interactive operation performed by the interactive device. For example, the interactive operation performed by the interactive device is to ask the user in the sensing area what help is needed, and the user voice issued by the user can inform the interactive device of the kind of help needed, etc.; this specification does not limit this.
  • The interactive device can perform semantic recognition on the user voice. Since the same pronunciation may correspond to multiple characters or words, and since there may be some distortion or noise interference while the interactive device picks up the user voice, the interactive device may obtain multiple semantic recognition results after recognizing the user voice.
  • The interactive device can score each semantic recognition result according to a predefined semantic recognition algorithm to obtain a corresponding confidence level; when the confidence level reaches a preset value, the reliability of the corresponding semantic recognition result is sufficiently high.
  • When no semantic recognition result is sufficiently reliable, the interactive device may display the multiple semantic recognition result options to the user in the sensing area so that the user can choose the one that accurately expresses the user's true intention, and then respond to the user voice according to the semantic recognition result corresponding to the selected option. During the selection process, the user can read out the semantic recognition result he or she wants to select, or the position (such as "the first one" or "the leftmost one") of the desired option, etc.; this specification does not limit this.
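A minimal sketch of that confidence-gated flow: respond directly when one result is reliable enough, otherwise present the candidate options for the user to choose. The threshold and result format are illustrative assumptions.

```python
# Illustrative only: gate the response on the recognition confidence.
CONFIDENCE_THRESHOLD = 0.8

def handle_recognition(results):
    """results: list of (semantic_result, confidence), sorted highest-first."""
    top_result, top_conf = results[0]
    if top_conf >= CONFIDENCE_THRESHOLD:
        return ("respond", top_result)
    # Reliability too low: let the user pick among the candidate options.
    return ("ask_user_to_choose", [r for r, _ in results])

print(handle_recognition([("call employee Xiaohei", 0.92)]))
print(handle_recognition([("call employee Xiaohei", 0.55),
                          ("call employee Xiaohui", 0.41)]))
```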
  • The interactive device may determine the source direction of the user voice and respond toward a user located in that direction. In one case, after determining the source direction, the interactive device directly assumes that only the user who uttered the voice is located in that direction, so it can respond directly toward the source direction, for example by playing interactive voice. In another case, the interactive device may determine which users are located in the source direction; if there are multiple users in that direction, the interactive device may further determine the source user of the user voice and respond to that source user.
  • The interactive device has a built-in microphone array through which the user voice can be received; the microphone array includes a first microphone disposed relatively to the left and a second microphone disposed relatively to the right. The time difference between the first and second microphones receiving the user voice determines its source direction: for example, when the user in the sensing area is on the left side, the first microphone receives the user voice earlier than the second microphone, and when the user is on the right side, the second microphone receives the user voice earlier than the first microphone. For the specific scheme of determining the source direction from the reception time difference, reference may be made to related technical solutions in the prior art, and details are not described here again.
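As one such prior-art-style approach, the sketch below estimates the left/right arrival-time difference with a simple cross-correlation; it is an assumption for illustration, not the method claimed here.

```python
# Sketch only: decide left/right from the inter-microphone time delay.
import numpy as np

def source_side(left: np.ndarray, right: np.ndarray) -> str:
    """Estimate whether the speaker is to the left or right of the array."""
    corr = np.correlate(left, right, mode="full")
    lag = int(np.argmax(corr) - (len(right) - 1))
    # lag > 0 means the left channel lags the right one, i.e. the sound reached
    # the right (second) microphone first, so the speaker is on the right side.
    if lag > 0:
        return "right"
    if lag < 0:
        return "left"
    return "center"
```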
  • When there are multiple users in the source direction, the interactive device may determine the source user of the user voice according to the facial actions of each of those users (obtained, for example, by image collection through a camera built into the interactive device) and respond to that source user. The facial actions may include movements of one or more parts such as the cheeks, mouth, and chin, which are not limited in this specification. Taking mouth actions as an example: if only one user's mouth opens and closes while the user voice is received, that user can be determined as the source user of the voice; or, although several users' mouths open and close, only one user's opening and closing count, amplitude, etc. match the user voice, and that user can be determined as the source user of the voice.
  • When the interactive device is mounted on a wall, a user can usually only pass in front of the interactive device and emit a user voice there; when the interactive device is assembled in other ways, users may appear in front of or behind the interactive device, so that an audio message collected by the interactive device may come from a user located in front or behind. If there is a user in the sensing area of the interactive device while other users speak as they pass behind it, the interactive device may mistake their speech for a user voice from the user in the sensing area. Therefore, after receiving an audio message, the interactive device can determine whether the audio message is a user voice issued by a user in the sensing area based on the source direction of the audio message and on whether a user exists in the sensing area.
  • The interactive device has a built-in microphone array including a third microphone relatively close to the sensing area and a fourth microphone relatively far from it. When an audio message is received through the microphone array, the source direction of the audio message is determined from how the third and fourth microphones receive the high-frequency part of the message: when the source direction is on the side relatively close to the sensing area, the high-frequency part is partly absorbed by the housing of the interactive device, so the high-frequency part received by the fourth microphone is smaller than that received by the third microphone; and when the source direction is on the side relatively far from the sensing area, the high-frequency part is likewise absorbed by the housing, so the high-frequency part received by the third microphone is smaller than that received by the fourth microphone.
  • When the source direction is on the side relatively close to the sensing area and a user exists in the sensing area, the interactive device may determine that the audio message is a user voice issued by a user in the sensing area; otherwise, for example when the source direction is on the side relatively far from the sensing area, or when the source direction is on the side relatively close to the sensing area but no user exists there, the interactive device may determine that the audio message is not a user voice issued by a user in the sensing area.
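A hedged sketch of that front/back check: compare the high-frequency energy received by the sensing-area-side (third) and rear-facing (fourth) microphones, and only accept the audio as a user voice when it comes from the sensing-area side and a user has actually been detected there. The cut-off frequency is an illustrative assumption.

```python
# Sketch only: housing absorption makes the far-side microphone receive less
# high-frequency energy, which indicates which side the audio came from.
import numpy as np

def high_freq_energy(signal: np.ndarray, sample_rate: int, cutoff_hz: float = 2000.0) -> float:
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(spectrum[freqs >= cutoff_hz].sum())

def audio_from_sensing_area(third_mic: np.ndarray, fourth_mic: np.ndarray,
                            sample_rate: int, user_present: bool) -> bool:
    front = high_freq_energy(third_mic, sample_rate)   # microphone near the sensing area
    back = high_freq_energy(fourth_mic, sample_rate)   # microphone away from it
    came_from_front = front > back
    return came_from_front and user_present
```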
  • The microphone array may include one or more first microphones and one or more second microphones; similarly, the microphone array may include one or more third microphones and one or more fourth microphones.
  • The microphone array does not necessarily need to include four microphones at the same time; in other words, the first, second, third, and fourth microphones described above merely denote the roles the microphones play in realizing the related functions, and the microphone array can contain a smaller number of microphones.
  • For example, the microphone array can contain three microphones, of which microphone 1 and microphone 2 are arranged side by side in the left-right direction, and microphone 3 is located in front of or behind microphone 1 and microphone 2, so that microphones 1 to 3 form a positional relationship similar to the shape of the character "品". Microphone 1 and microphone 2 can then serve, for example, as the first microphone and the third microphone described above.
  • The interactive device can detect the number of users in the sensing area, for example by collecting images through a camera and then performing face detection and counting, which is not limited in this specification.
  • The interactive device can display an avatar picture corresponding to each user in order to characterize these users; when the users in the sensing area increase, decrease, or change, the avatar pictures displayed on the device can change accordingly.
  • The avatar picture of the source user of the user voice can be displayed differently from the avatar pictures of other users, so that when the user sees the change he or she can be sure that the interactive device has successfully received the user voice and identified its source user, without having to worry that the interactive device failed to receive the voice or recognized it incorrectly.
  • The avatar picture of the source user of the user voice can be distinguished from the avatar pictures of other users in any way, and this specification does not limit this. For example, the source user's avatar picture can be displayed in the central area while other users' avatar pictures are displayed in the edge area; or the source user's avatar picture can be enlarged while other users' avatar pictures are displayed normally or reduced; or the source user's avatar picture can be displayed normally (in color) while other users' avatar pictures are grayed out.
  • FIG. 3 is a schematic diagram of an interaction scenario provided by an exemplary embodiment. As shown in FIG. 3, it is assumed that an interactive device 3 is provided in an office of an enterprise AA, and an enterprise WeChat client runs on the interactive device 3, so that the interactive device 3 may implement the interaction scheme of the present specification based on the enterprise WeChat client.
  • The interactive device 3 is equipped with a camera 31, and the camera 31 forms a corresponding shooting area 32 that serves as the sensing area of the interactive device 3. Accordingly, the interactive device 3 can determine the users who enter the shooting area 32 from the images captured by the camera 31 in the shooting area 32, such as the user 4 who has entered the shooting area 32 in FIG. 3.
  • The interactive device 3 may also determine the users who enter the sensing area through sound detection, infrared detection, or other methods, which is not limited in this specification.
  • FIG. 4 is a schematic diagram of interaction with internal employees according to an exemplary embodiment.
  • the interactive device 3 may be equipped with a screen 33, which may be used to display a user image 41 corresponding to the user 4 collected by the camera 31.
  • The interactive device 3 can identify the user 4, for example through face recognition based on the face image collected by the camera 31, and this specification does not limit this. Assuming the interactive device 3 recognizes that the user 4 is the internal employee "Xiaobai", the corresponding identity information 42 may be displayed on the screen 33; for example, the identity information 42 may be the user 4's title "Xiaobai".
  • The interactive device 3 can query the attendance data of the internal employee "Xiaobai" and, when appropriate, automatically perform an attendance operation for "Xiaobai".
  • The interactive device 3 may provide corresponding interactive content to the user 4; for example, the interactive content may include a label 43 shown on the screen 33, and the information contained in the label 43 is "clock in", indicating that the type of attendance operation is clocking in for work. The interactive content can also be provided to the user 4 in other forms.
  • For example, the interactive device 3 contains a speaker 34, through which a voice message such as "Xiaobai, you have clocked in successfully" can be played.
  • Similarly, the interactive device 3 can perform automatic clock-in attendance operations for other internal employees of the enterprise AA, and can likewise perform automatic clock-out attendance operations for internal employees of the enterprise AA, which will not be repeated here.
  • The sensing area of the interactive device 3 may be divided into multiple sub-areas according to the distance from the interactive device 3; for example, the shooting area 32 is divided into a far-field shooting area 321 (at a distance of 1.5 to 3.0 m from the interactive device 3) and a near-field shooting area 322 (at a distance of 0 to 1.5 m from the interactive device 3).
  • The interactive device 3 can perform the above-mentioned automatic attendance operation on the user 4 regardless of whether the user 4 is in the far-field shooting area 321 or the near-field shooting area 322.
  • When the user 4 is located in the far-field shooting area 321, the interactive device 3 assumes by default that the user 4 is only passing by and has no willingness to interact, and does not actively initiate interaction with the user 4 (that is, provides no interactive content to the user 4); however, if the continuous stay of the user 4 in the far-field shooting area 321 reaches the first preset duration (such as 3 s), the interactive device 3 may determine that the user 4 is willing to interact, and can therefore provide the user 4 with interactive content.
  • Similarly, when the user 4 is located in the near-field shooting area 322, the interactive device 3 may assume by default that the user 4 is only passing by and has no willingness to interact, and therefore may not actively initiate interaction with the user 4 (that is, provide no interactive content to the user 4); however, if the continuous stay of the user 4 in the near-field shooting area 322 reaches the second preset duration, the interactive device 3 may determine that the user 4 is willing to interact, so that the user 4 can be provided with interactive content.
  • The behavior of the user 4 actively entering the near-field shooting area 322 may itself imply a certain willingness to interact, so the second preset duration may be appropriately shorter than the first preset duration.
  • For example, the first preset duration is 3 s and the second preset duration is 1 s.
  • Alternatively, the second preset duration may even be 0, which is equivalent to the interactive device 3 assuming by default that a user 4 entering the near-field shooting area 322 is willing to interact, so that the user 4 can be provided with interactive content without delay.
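A minimal sketch of tracking the continuous stay time per zone and deciding when to initiate interaction, using the example values above (3 s far field, 1 s near field, optionally 0 s); the timestamps and the zone classifier are assumed to come from the detection loop.

```python
# Illustrative only: per-zone dwell timer that triggers interaction.
class DwellTracker:
    def __init__(self, far_threshold_s: float = 3.0, near_threshold_s: float = 1.0):
        self.far_threshold_s = far_threshold_s
        self.near_threshold_s = near_threshold_s
        self._zone = None
        self._entered_at = None

    def update(self, zone: str, now_s: float) -> bool:
        """Return True when the user's continuous stay warrants starting interaction."""
        if zone != self._zone:              # zone changed: restart the timer
            self._zone, self._entered_at = zone, now_s
            return zone == "near_field" and self.near_threshold_s == 0
        dwell = now_s - self._entered_at
        if zone == "far_field":
            return dwell >= self.far_threshold_s
        if zone == "near_field":
            return dwell >= self.near_threshold_s
        return False
```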
  • FIG. 5 is a schematic diagram of guiding a user's position through interactive content according to an exemplary embodiment.
  • The interactive device 3 may display interactive content 511 in text form in an interactive display area 51 on the screen 33; for example, the interactive content 511 is "Please come within 1.5 meters", guiding the user 4 to move from the far-field shooting area 321 to the near-field shooting area 322.
  • The interactive device 3 can also play the interactive content as voice through the speaker 34, such as "Xiaobai, you are a bit far from me" (where "Xiaobai" is the identity information and "you are a bit far from me" is the interactive content), guiding the user 4 to move from the far-field shooting area 321 to the near-field shooting area 322.
  • The interactive device 3 can also control the indicator light 35 to produce, for example, a breathing flash, which attracts the attention of the user 4 and is equivalent to conveying the interactive content to the user 4, thereby guiding the user 4 to move from the far-field shooting area 321 to the near-field shooting area 322.
  • the interactive device 3 may use one of the above-mentioned text form, voice form, and light form to convey the interactive content, which is not limited in this specification.
  • FIG. 6 is a schematic diagram of an interactive device actively initiating an interaction with a user according to an exemplary embodiment.
  • The interactive device 3 can play interactive content as voice through the speaker 34, such as "Xiaobai, what can I help you with?".
  • The interactive device 3 can also display interactive content 512 in text form in the interactive display area 51; for example, the interactive content 512 is "Try saying this: 'Call Zhang San'", guiding the user 4 to express the purpose of the interaction to the interactive device 3 by voice.
  • The interactive device 3 does not have to guide the user 4 from the far-field shooting area 321 to the near-field shooting area 322; it can also directly guide the user 4 in the far-field shooting area 321 to state the purpose of the interaction.
  • The interactive device 3 may also detect the surrounding ambient noise: when the noise level is greater than a preset value, it first guides the user 4 from the far-field shooting area 321 to the near-field shooting area 322 and then guides the user 4 to state the purpose of the interaction; when the noise level is less than the preset value, it directly guides the user 4 in the far-field shooting area 321 to state the purpose of the interaction.
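A hedged sketch of that noise-dependent choice; the RMS-based level estimate and the threshold value are illustrative assumptions (the description only says "a preset value").

```python
# Illustrative only: pick the far-field strategy from the ambient noise level.
import numpy as np

def noise_level_dbfs(ambient: np.ndarray) -> float:
    rms = np.sqrt(np.mean(np.square(ambient.astype(np.float64))))
    return 20 * np.log10(max(rms, 1e-12))

def far_field_strategy(ambient: np.ndarray, threshold_dbfs: float = -30.0) -> str:
    if noise_level_dbfs(ambient) > threshold_dbfs:
        return "guide_to_near_field_first"
    return "ask_interaction_purpose_directly"
```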
  • FIG. 7 is another schematic diagram of guiding a user's position through interactive content according to an exemplary embodiment.
  • The interactive device 3 collects a user image 71 of a user through the camera 31, but the user is an outsider to the enterprise AA and the interactive device 3 cannot obtain the user's title. Therefore, when guiding the user to move from the far-field shooting area 321 to the near-field shooting area 322, the interactive device 3 can display interactive content 513 in text form in the interactive display area 51; for example, the interactive content 513 is "Please come within 1.5 meters".
  • The interactive device 3 can also play the interactive content as voice through the speaker 34, such as "Hello, you are a little far from me" (omitting the user's identity information).
  • The interactive device 3 can also control the indicator light 35 to produce, for example, a breathing flash, thereby guiding the user to move from the far-field shooting area 321 to the near-field shooting area 322.
  • The interactive device 3 can access the enterprise WeChat server to learn the associated events of the users in the sensing area, and may change the provided interactive content based on those associated events.
  • FIG. 8 is a schematic diagram of a normal interaction scenario provided by an exemplary embodiment. As shown in FIG. 8, it is assumed that the interactive device 3 detects the user 4 located in the shooting area 32 during working hours and identifies that the user 4 is an internal employee of the enterprise AA.
  • FIG. 9 is a schematic diagram of adjusting interactive content according to an associated event according to an exemplary embodiment. As shown in FIG. 9, it is assumed that the interactive device 3 detects the user 4 located in the shooting area 32 during working hours and identifies that the user 4 is an internal employee of the enterprise AA. If the interactive device 3 finds that the user 4 has submitted a sick leave approval event and the leave time specified by that approval event has been reached, the interactive content 515 shown in the interactive display area 51 may be "Are you sure you want to get off work?".
  • FIG. 10 is a schematic diagram of designating a speaker by an interactive device according to an exemplary embodiment. As shown in FIG. 10, it is assumed that there are multiple users in the shooting area 32, corresponding respectively to user images 81-82 etc. shown on the screen 33. For example, the interactive device 3 can recognize that the user corresponding to the user image 81 is "Xiaobai" and the user corresponding to the user image 82 is "Xiaohei", etc., and can display each user's title as identity information next to the corresponding user image.
  • For example, the identity information 91 shown above the user image 81 is "Xiaobai", and the identity information 92 shown above the user image 82 is "Xiaohei". Because the interaction capability of the interactive device 3 is limited, and so that the interactive device 3 can clearly understand each user's interaction purpose, the interactive device 3 may only interact with some of the users at any one time.
  • The interactive device 3 may select the target interactive object (that is, the "part of the users" described above) in a certain manner, for example in ascending order of each user's distance from the interactive device 3, in ascending order of the angle between each user's face orientation and the shooting direction of the camera 31, or in descending order of each user's height, etc.; this specification does not limit this. It is assumed that the interactive device 3 wants to interact with the user "Xiaobai" corresponding to the user image 81. To avoid misunderstanding by other users in the sensing area, when providing the interactive content the interactive device 3 needs to make clear to the users in the sensing area that the target interactive object of the interactive content is the user "Xiaobai".
  • For example, when the interactive device 3 plays the interactive content "What can I help you with?" through the speaker 34, it can prepend the identity information of the user "Xiaobai", so that the content actually played is "Xiaobai, what can I help you with?"; other users can then be clear that the target interactive object of the interactive content "What can I help you with?" is the user "Xiaobai".
  • FIG. 11 is another schematic diagram of speaker designation by an interactive device according to an exemplary embodiment.
  • When several users speak at once, the interactive device 3 may not be able to accurately determine each user's interaction purpose because the sound is chaotic, or may not be able to respond to multiple users' interaction purposes at the same time, or for other reasons; the interactive device 3 can then provide interactive content that guides these users to express their interaction purposes in turn.
  • For example, the interactive device 3 may show interactive content 516 in the interactive display area 51, and the interactive content 516 may include "Please do not speak at the same time". Further, when the interactive device 3 determines that the speaking order is that the user "Xiaobai" speaks first and the user "Xiaohei" speaks later, the interactive device 3 can play the interactive content "I can't hear you clearly; xx, please speak first" through the speaker 34 and insert the title of the user "Xiaobai", so that the content actually played is "I can't hear you clearly; Xiaobai, please speak first", and other users can clearly understand that the target interactive object of this interactive content is the user "Xiaobai".
  • FIG. 12 is another schematic diagram of speaker designation by an interactive device according to an exemplary embodiment. As shown in FIG. 12, when the interactive device 3 determines that the speaking order is that the user "Xiaobai" speaks first and the user "Xiaohei" speaks later, the interactive device 3 can mark the user image 81 corresponding to the user "Xiaobai", for example by adding a marker box 810 around the face area; then, even if the interactive content is simply "What can I help you with?" or "Please speak", each user can still clearly tell that the target interactive object of the interactive content is the user "Xiaobai".
  • Similarly, the interactive device 3 may show an interactive text 517 in the interactive display area 51; the interactive text 517 includes the title of the user "Xiaobai" in addition to the interactive content "xx, please speak", so the entire content of the interactive text 517 is "Xiaobai, please speak", which also indicates to each user that the current target interactive object is the user "Xiaobai".
  • FIG. 13 is a schematic diagram of designating the speaking order of external personnel according to an exemplary embodiment. As shown in FIG. 13, it is assumed that the users corresponding to the user images 81-82 are external to the enterprise AA and the interactive device 3 cannot obtain their titles, but the identity information of each user can be expressed in other ways in order to indicate the target interactive object of the interactive content.
  • For example, if the interactive device 3 determines that the target interactive object is the user corresponding to the user image 81, and the user image 81 corresponds to a female user while the user image 82 corresponds to a male user, the identity information of each user can be expressed by gender, such as "this lady" or "this gentleman". Therefore, when the voice content played by the interactive device 3 through the speaker 34 is "I can't hear you clearly; would this lady please speak first", all users in the shooting area 32 can determine that the interactive content is "I can't hear you clearly; would xx please speak first" and, based on the identity information "this lady", that the target interactive object is the user corresponding to the user image 81.
  • In some scenarios the user does not need to respond to the interactive content, for example the interactive content "you have clocked in successfully" in the embodiment shown in FIG. 4; in other scenarios the interactive content expects a response from the user, and the response may include an event assistance request initiated by the user, so that the interactive device 3 assists the user in completing the corresponding event. For example, the interactive content in the embodiment shown in FIG. 9 is "Are you sure you want to get off work?".
  • If the user "Xiaobai" replies affirmatively, the interactive device 3 can determine that the user "Xiaobai" has initiated an event assistance request for a clock-out attendance event, so the interactive device 3 can assist in completing that clock-out attendance event.
  • For another example, the interactive device 3 in FIG. 13 issues the voice "I can't hear you clearly; would this lady please speak first", and if the response returned by the female user is "Call Xiaobai", then based on semantic analysis the interactive device 3 can determine that an event assistance request for a "call event" has been initiated and that the call target is the user "Xiaobai", so the interactive device 3 can initiate a call to the user "Xiaobai", thereby assisting in completing the "call event".
  • Instead of responding to the interactive content, a user in the shooting area 32 can also directly initiate an event assistance request to the interactive device 3, and the interactive device 3 can likewise assist in completing the corresponding event.
  • The interactive device 3 can ensure that multiple users in the shooting area 32 speak in sequence, so that the interactive device 3 can determine the event assistance request initiated by each user and assist in completing the corresponding events respectively.
  • Alternatively, the interactive device 3 can receive user voices from multiple users at the same time, accurately separate each user voice based on voice characteristics, and determine the source user of each voice through voice feature recognition (such as voiceprint recognition).
  • For example, the interactive device 3 may directly compare the collected user voice with a sound feature library; the sound feature library may include the voiceprint features of all internal employees of the enterprise AA, so that the internal employee corresponding to the collected user voice can be determined based on the comparison result.
  • The interactive device 3 can also identify the users in the shooting area 32 by other means such as face recognition and cross-check that identification result against the comparison result obtained from the sound feature library, so as to prevent internal employees of the enterprise AA from being impersonated.
  • For example, if the two results do not match (for instance, a user B impersonating another employee), the interactive device 3 may refuse to complete the corresponding assistance event and issue an alarm prompt to the user B.
  • Alternatively, the interactive device 3 may first identify the users in the shooting area 32 by face recognition or other means, for example recognizing that the users in the shooting area 32 are users A and B of the enterprise AA. Then, when the interactive device 3 collects two user voices, it only needs to compare them with the voiceprint features of users A and B to determine which voice comes from user A and which from user B, without comparing against the other voiceprint features in the sound feature library, which can greatly improve comparison efficiency.
  • In addition, the users in the shooting area 32 may include an administrator, and the interactive device 3 may respond to the administrator's event assistance request to assist in completing the corresponding device management event, such as adjusting the welcome content on the screen 33, adjusting the volume of the speaker 34, or adjusting the area ranges of the far-field shooting area 321 and the near-field shooting area 322.
  • FIG. 14 is a schematic diagram of marking an interaction object according to an exemplary embodiment. After the interactive device 3 shoots the shooting area 32 through the camera 31, it can mark the detected users located in the shooting area 32, so that a user can clearly determine whether he has been detected by the interactive device 3 and can therefore interact with the interactive device 3. As shown in FIG. 14, when the interactive device 3 detects that a user exists in the shooting area 32, it can generate a corresponding avatar picture 1401 for the user based on the captured image and display the avatar picture 1401 on the screen 33; when another user in the shooting area 32 is also detected by the interactive device 3, the corresponding avatar picture 1402 of that user may also be displayed on the screen 33; similarly, when other users enter the shooting area 32, the interactive device 3 may also display their corresponding avatar pictures on the screen 33, which is not repeated here.
  • When the user corresponding to the avatar picture 1402 leaves the shooting area 32 or is no longer detected, the interactive device 3 may delete the avatar picture 1402 from the screen; the situation of other users is similar and is not repeated here.
  • In this way, when a user sees his avatar picture on the screen 33, the corresponding user can determine that he has been detected by the interactive device 3 and is taken as an interaction object, and can thus interact with the interactive device 3; if a user who wishes to interact with the interactive device 3 does not see the corresponding avatar picture on the screen 33, it indicates that the user may not have entered the shooting area 32, or may have entered the shooting area 32 but not been successfully detected by the interactive device 3, and the user can take measures such as entering or re-entering the shooting area 32 until the user's avatar picture is displayed on the screen 33.
  • FIG. 15 is a schematic diagram of marking a target interactive object according to an exemplary embodiment. It is assumed that the interactive device 3 recognizes the user "Xiaobai" and the user "Xiaohei" within the shooting area 32 and determines the user "Xiaobai" as the target interaction object. As shown in FIG. 15, the interactive device 3 can display the avatar picture 1401 corresponding to the user "Xiaobai" at normal scale in the central area of the screen 33 (relatively far from the edges of the screen 33), and display the avatar picture 1402 corresponding to the user "Xiaohei" at a reduced scale in the edge area of the screen 33.
  • Thus, when the interactive device 3 issues the interactive voice "Can I help you" through the speaker 34, it can be determined, according to the display scale and display position of the avatar pictures 1401 and 1402, that the target interactive object corresponding to the interactive voice is the user "Xiaobai" corresponding to the avatar picture 1401 rather than the user "Xiaohei" corresponding to the avatar picture 1402. A simple sketch of this marking logic follows.
  • Besides receiving interactive content, a user in the shooting area 32 can also actively interact with the interactive device 3, for example by sending a user voice to the interactive device 3, so that the interactive device 3 can respond to the user voice to meet the needs of the user who issued it. The user voice may be issued in response to an interactive voice from the interactive device 3, or may be actively sent to the interactive device 3 by the user in the shooting area 32, which is not limited in this specification.
  • FIG. 16 is a schematic diagram of determining a source user of a user's voice according to an exemplary embodiment.
  • A microphone array may be built into the interactive device 3, and the microphone array may include a microphone 36 and a microphone 37, where the microphone 36 is disposed toward the left and the microphone 37 is disposed toward the right.
  • When a user in the shooting area 32 issues a user voice such as "I need to reserve a 15-person conference room", if the microphone 36 receives the user voice earlier than the microphone 37, it indicates that the source user of the user voice is relatively closer to the microphone 36 and farther from the microphone 37, so it can be determined that the source user is located relatively more to the left in the shooting area 32; for example, in combination with the collected image, it can be determined that the source user is the user "Xiaobai".
  • Conversely, if the microphone 37 receives the user voice earlier than the microphone 36, it indicates that the source user of the user voice is relatively closer to the microphone 37 and relatively farther from the microphone 36, so it can be determined that the source user is located relatively more to the right in the shooting area 32; for example, in combination with the image collected in FIG. 10, it can be determined that the source user is the user "Xiaohei".
  • If the microphone 36 and the microphone 37 receive the user voice at the same time, or at almost the same time, it indicates that the source user of the user voice is located between the microphone 36 and the microphone 37, which is equivalent to being directly in front of the interactive device 3, so it can be determined that the source user is located in the middle of the shooting area 32. A rough sketch of this left/middle/right judgment is given below.
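  • The following sketch only illustrates the arrival-time comparison and is not part of the embodiment; the crude amplitude-threshold onset detector, the 16 kHz sampling rate and the 0.2 ms tolerance are assumptions made for the example.

```python
import numpy as np

def first_arrival(signal, threshold=0.1, sample_rate=16000):
    """Time (in seconds) of the first sample whose amplitude exceeds the threshold,
    used here as a crude stand-in for when the voice reaches the microphone."""
    idx = int(np.argmax(np.abs(signal) > threshold))
    return idx / sample_rate

def source_side(sig_mic36, sig_mic37, tolerance=0.0002):
    """Microphone 36 is disposed toward the left, microphone 37 toward the right."""
    t_left = first_arrival(sig_mic36)
    t_right = first_arrival(sig_mic37)
    if abs(t_left - t_right) <= tolerance:
        return "middle of the shooting area"
    # The microphone that hears the voice earlier is closer to the speaker.
    return "left side" if t_left < t_right else "right side"
```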
  • In some cases a user may be located behind the interactive device 3 rather than in front of the screen 33 and the camera 31; such a user is obviously not in the shooting area 32, but may still be located near the interactive device 3, so that when the interactive device 3 receives an audio message such as "I need to reserve a 15-person conference room", the audio message is not necessarily a user voice issued by a user in the shooting area 32, but may instead be an interference voice from the user behind the device. Therefore, in order to avoid misinterpreting an interference voice as a user voice, the source direction of the audio message needs to be judged: an audio message originating from the front may be a user voice issued by a user in the shooting area 32, while an audio message originating from the rear is an interference voice.
  • FIG. 17 is a schematic diagram of determining a source direction of an audio message according to an exemplary embodiment.
  • As shown in FIG. 17, a microphone array may be built into the interactive device 3, and the microphone array may include a microphone 36, a microphone 37, and a microphone 38. In the left-right direction (the horizontal direction in FIG. 17), the microphone 36 is disposed toward the left and the microphone 37 is disposed toward the right; in the front-rear direction, the microphone 36 and the microphone 37 are located at the front of the interactive device 3 and relatively close to the shooting area 32, while the microphone 38 is located at the back of the interactive device 3 and relatively far from the shooting area 32; the microphones 36 to 38 are all located within the interactive device 3.
  • For an audio message issued by a user near the interactive device 3: if the user is located in front of the interactive device 3, the audio message is transmitted from the front of the interactive device 3 and passes through the interactive device 3, and the high-frequency part of the audio message is partially absorbed by the housing of the interactive device 3, so when the microphones 36 to 38 receive the high-frequency part of the audio message, the high-frequency signal strength received by the microphone 38 located at the back of the interactive device 3 is, owing to the absorption by the housing, smaller than the high-frequency signal strength received by the microphones 36-37; if the user is located behind the interactive device 3, so that the audio message is transmitted from the rear of the interactive device 3 and passes through the interactive device 3, the high-frequency part of the audio message is likewise partially absorbed by the housing, so when the microphones 36 to 38 receive the high-frequency part of the audio message, the high-frequency signal strength received by the microphones 36-37 located at the front of the interactive device 3 is smaller than the high-frequency signal strength received by the microphone 38.
  • Therefore, by comparing the strengths of the high-frequency signals received by the microphones 36-37 and the microphone 38, it can be determined whether the source direction of the audio message is in front of or behind the interactive device 3. If the source direction is determined to be behind the interactive device 3, the source user of the audio message cannot be a user in the shooting area 32, that is, the audio message is an interference voice; if the source direction of the audio message is determined to be in front of the interactive device 3, the source user of the audio message may be a user in the shooting area 32. A brief sketch of this comparison follows.
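  • The sketch below only illustrates the idea of comparing high-frequency energy at the front and back microphones; the FFT-based energy measure, the 4 kHz cutoff and the 1.2 margin are assumptions introduced for the example and are not taken from the embodiment.

```python
import numpy as np

def high_freq_energy(signal, sample_rate=16000, cutoff_hz=4000):
    """Energy of the signal above cutoff_hz, computed from its spectrum."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(np.abs(spectrum[freqs >= cutoff_hz]) ** 2))

def front_or_rear(front_signals, back_signal, margin=1.2):
    """front_signals: signals from microphones 36 and 37 (front of device 3);
    back_signal: signal from microphone 38 (back of device 3).
    The housing attenuates high frequencies on the side facing away from the
    speaker, so the side with stronger high-frequency content faces the source."""
    front_energy = float(np.mean([high_freq_energy(s) for s in front_signals]))
    back_energy = high_freq_energy(back_signal)
    if front_energy > margin * back_energy:
        return "front"   # may be a user voice from the shooting area 32
    if back_energy > margin * front_energy:
        return "rear"    # interference voice, not from the shooting area 32
    return "uncertain"
```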
  • In the latter case, the judgment can be further refined in combination with other conditions. For example, image collection can be performed through the camera 31 on the interactive device 3, and if a user exists in the shooting area 32, it can be determined that the above audio message originates from that user. For another example, if there are multiple users in the shooting area 32, the facial actions of each user can be taken into account, such as whether a mouth opening and closing motion occurs while the audio message is being received and whether the time at which the motion occurs is consistent with the signal changes of the audio message, so that the user whose facial motion matches the audio message is determined as the source user of the audio message. For yet another example, if there are multiple users in the shooting area 32, the source direction of the audio message identified by the microphones 36-37 (toward the left, toward the right, or in the middle) can be used to determine the user located in the corresponding direction as the source user of the audio message; if there are still multiple users in the same direction, the facial actions of each user described above can be further combined to filter out the user whose facial actions match the audio message, thereby determining the source user of the audio message. This decision cascade is sketched below.
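  • The following sketch is only an illustration of that cascade; the dictionary format of the detected users, the mouth-motion timestamps and the 0.3 s matching tolerance are assumptions made for the example.

```python
def pick_source_user(users, voice_direction, voice_activity_times, tol=0.3):
    """users: list of dicts such as {"id": "xiaobai", "direction": "left",
    "mouth_motion_times": [1.2, 1.9]} produced by the camera 31 pipeline.
    First filter by the direction estimated from microphones 36-37, then keep
    the user whose mouth movements best line up with the voice activity."""
    in_direction = [u for u in users if u["direction"] == voice_direction]
    if len(in_direction) == 1:
        return in_direction[0]["id"]
    best_id, best_hits = None, -1
    for user in in_direction or users:
        hits = sum(
            any(abs(t - m) <= tol for m in user["mouth_motion_times"])
            for t in voice_activity_times
        )
        if hits > best_hits:
            best_id, best_hits = user["id"], hits
    return best_id
```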
  • FIG. 18 is a schematic diagram of marking the source user of a user voice according to an exemplary embodiment. After the interactive device 3 determines that the source user of the user voice is the user corresponding to the avatar picture 1401, it may keep the avatar picture 1401 displayed in its original color mode and display the avatar pictures 1402 corresponding to other users in a grayed-out manner, so that the users in the shooting area 32 can quickly confirm whether the interactive device 3 has correctly recognized the source user of the user voice and thus ensure that there are no deviations in the subsequent interaction.
  • In addition, when the interactive device 3 recognizes the user voice, unfavorable factors such as the source user having an accent, the external environment being too noisy, or distortion occurring during sound pickup may affect the accuracy of the semantic recognition performed by the interactive device 3. Therefore, during recognition the interactive device 3 can score each candidate semantic recognition result separately; candidate semantic recognition results with low confidence (for example, lower than a preset score) can be directly discarded, and if the number of candidate semantic recognition results with high confidence (for example, higher than the preset score) is 1, that result can be directly used as the semantic recognition result.
  • If there are several candidate semantic recognition results with higher confidence, the interactive device 3 can show the source user options corresponding to these candidate semantic recognition results. For example, the option 1801 shown in FIG. 18 is "1. I need to reserve a 15-person conference room" and the option 1802 is "2. I need to reserve a 45-person conference room", for the source user "Xiaobai" to select and confirm. The user "Xiaobai" can inform the interactive device 3 of the selection of option 1801 by issuing a confirmation voice containing, for example, "the first one", "the previous one", or "the one for 15 people"; the interactive device 3 can then determine that the semantic recognition result corresponding to the above user voice is "I need to reserve a 15-person conference room" and respond further, such as assisting the user "Xiaobai" in completing the reservation of the relevant conference room. A short sketch of this disambiguation step is shown below.
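  • The sketch below only illustrates the confidence-based filtering and disambiguation described above; the (text, confidence) candidate format, the 0.8 threshold and the fallback when no candidate is confident are assumptions made for the example.

```python
def resolve_recognition(candidates, threshold=0.8):
    """candidates: list of (text, confidence) pairs from the speech recognizer.
    Returns ("result", text) when exactly one candidate is confident enough,
    otherwise ("ask_user", numbered options) so the source user can confirm."""
    confident = [(text, conf) for text, conf in candidates if conf >= threshold]
    if len(confident) == 1:
        return "result", confident[0][0]
    if not confident:
        # Not specified by the embodiment: here we simply offer all candidates.
        confident = candidates
    options = [f"{i + 1}. {text}" for i, (text, _) in enumerate(confident)]
    return "ask_user", options

action, payload = resolve_recognition([
    ("I need to reserve a 15-person conference room", 0.86),
    ("I need to reserve a 45-person conference room", 0.84),
])
print(action, payload)  # -> ask_user ['1. I need to ...', '2. I need to ...']
```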
  • FIG. 19 is a schematic structural diagram of a device according to an exemplary embodiment.
  • the device includes a processor 1902, an internal bus 1904, a network interface 1906, a memory 1908, and a non-volatile memory 1910.
  • The processor 1902 reads the corresponding computer program from the non-volatile memory 1910 into the memory 1908 and then runs it, forming the interactive apparatus at the logical level. Of course, in addition to software implementations, one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is to say, the execution body of the following processing flow is not limited to logic units, and may also be hardware or a logic device.
  • the interactive device may include:
  • the detection unit 2001 detects a user in a sensing area
  • the providing unit 2002 provides interactive content to users in the sensing area
  • the first display unit 2003 when the target interactive object of the interactive content is a part of the users in the sensing area, displays the information of the target interactive object to the users in the sensing area.
  • the first display unit 2003 is specifically configured to:
  • Optionally, the apparatus further includes an identification unit 2004, which identifies the users in the sensing area; when the identity of a first user who is the target interaction object is successfully identified, the identity information of the first user includes the title of the first user; when the identity of a second user who is the target interaction object is not successfully identified, the identity information of the second user includes visual feature description information of the second user.
  • the first display unit 2003 is specifically configured to: determine user reference information corresponding to the user who is the target interactive object as the information of the target interactive object; and highlight the determined user reference information to the users in the sensing area.
  • the second display unit 2005 is specifically configured to:
  • Optional also includes:
  • a management unit 2006, which, when the target interactive object of the interactive content is changed from some users in the sensing area to all users, suspends displaying the information of the target interactive object to the users in the sensing area.
  • Optional also includes:
  • the request obtaining unit 2007 obtains an event assistance request from a user in the sensing area
  • the assistance unit 2008 responds to the event assistance request to assist in completing a corresponding event.
  • the request obtaining unit 2007 is specifically configured to:
  • the request obtaining unit 2007 is specifically configured to:
  • a prompt is issued to the selected users in order to make the selected users issue corresponding event assistance requests.
  • the request obtaining unit 2007 is specifically configured to:
  • the assistance unit 2008 is specifically configured to:
  • the sensing area includes a near-field sensing area and a far-field sensing area; the providing unit 2002 is specifically configured to:
  • an event acquiring unit 2009 which acquires an associated event of a user in the sensing area
  • the providing unit 2002 is specifically configured to: when the interactive content is related to the associated event, adjust the interactive content provided to the user in the sensing area according to the status information of the associated event.
  • the method further includes: a determining unit 2010, which determines an identity type of a user in the sensing area;
  • the providing unit 2002 is specifically configured to adjust the interactive content provided to users in the sensing area according to the identity type.
  • Optional also includes:
  • the response unit 2012 responds to the user's voice.
  • response unit 2012 is specifically configured to:
  • response unit 2012 is specifically configured to:
  • the response unit 2012 determines the source direction of the user voice in the following manner:
  • the microphone array including a first microphone disposed relatively to the left and a second microphone disposed relatively to the right;
  • the response unit 2012 responds to a user located in the direction of the source of the user's voice in the following manner:
  • Optional also includes:
  • the audio receiving unit 2013 receives audio messages through a microphone array including a third microphone relatively close to the sensing area and a fourth microphone relatively far from the sensing area;
  • the direction determining unit 2014 determines a source direction of the audio message according to the reception of the high-frequency part in the audio message by the third microphone and the fourth microphone;
  • the source determining unit 2015 determines that the audio message is a user voice issued by a user in the sensing area when the source direction is a side relatively close to the sensing area and a user exists in the sensing area.
  • Optional also includes:
  • the avatar display unit 2016 displays avatar pictures corresponding to each user when there are multiple users in the sensing area
  • the distinguishing display unit 2017 distinguishably displays the avatar picture of the source user of the user voice from the avatar pictures of other users.
  • the system, device, module, or unit described in the foregoing embodiments may be specifically implemented by a computer chip or entity, or a product with a certain function.
  • A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • a computer includes one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and / or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic disk storage, quantum memory, graphene-based storage media, or other magnetic storage devices or any other non-transmission media, which can be used to store information accessible to computing devices. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • It should also be noted that the terms first, second, third, and the like may be used in one or more embodiments of the present specification to describe various information, but the information should not be limited by these terms; these terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the present specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
PCT/CN2019/090504 2018-01-30 2019-06-10 交互方法及装置 WO2020015473A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021525345A JP2021533510A (ja) 2018-01-30 2019-06-10 相互作用の方法及び装置
SG11202100352YA SG11202100352YA (en) 2018-01-30 2019-06-10 Interaction method and device

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201810089149 2018-01-30
CN201810806493.6A CN110096251B (zh) 2018-01-30 2018-07-20 交互方法及装置
CN201810806493.6 2018-07-20

Publications (1)

Publication Number Publication Date
WO2020015473A1 true WO2020015473A1 (zh) 2020-01-23

Family

ID=67443561

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/090504 WO2020015473A1 (zh) 2018-01-30 2019-06-10 交互方法及装置

Country Status (5)

Country Link
JP (1) JP2021533510A (ja)
CN (1) CN110096251B (ja)
SG (1) SG11202100352YA (ja)
TW (1) TW202008115A (ja)
WO (1) WO2020015473A1 (ja)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111078010B (zh) * 2019-12-06 2023-03-14 智语科技(江门)有限公司 一种人机交互方法、装置、终端设备及可读存储介质
CN111416871A (zh) * 2020-03-27 2020-07-14 乌鲁木齐明华智能电子科技有限公司 一种多方智能远程应答机制方法
CN111986678B (zh) * 2020-09-03 2023-12-29 杭州蓦然认知科技有限公司 一种多路语音识别的语音采集方法、装置
CN112767931A (zh) * 2020-12-10 2021-05-07 广东美的白色家电技术创新中心有限公司 语音交互方法及装置
CN115101048B (zh) * 2022-08-24 2022-11-11 深圳市人马互动科技有限公司 科普信息交互方法、装置、***、交互设备和存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130085757A1 (en) * 2011-09-30 2013-04-04 Kabushiki Kaisha Toshiba Apparatus and method for speech recognition
CN105872685A (zh) * 2016-03-24 2016-08-17 深圳市国华识别科技开发有限公司 智能终端控制方法和***、智能终端
CN105856257A (zh) * 2016-06-08 2016-08-17 以恒激光科技(北京)有限公司 适用于前台接待的智能机器人
CN106161155A (zh) * 2016-06-30 2016-11-23 联想(北京)有限公司 一种信息处理方法及主终端
CN107408027A (zh) * 2015-03-31 2017-11-28 索尼公司 信息处理设备、控制方法及程序
CN107451544A (zh) * 2017-07-14 2017-12-08 深圳云天励飞技术有限公司 信息显示方法、装置、设备及监控***
CN107483493A (zh) * 2017-09-18 2017-12-15 广东美的制冷设备有限公司 交互式日程提醒方法、装置、存储介质及智能家居***
CN108037699A (zh) * 2017-12-12 2018-05-15 深圳市天颐健康科技有限公司 机器人、机器人的控制方法和计算机可读存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004312513A (ja) * 2003-04-09 2004-11-04 Casio Comput Co Ltd 入場管理システムおよびプログラム
US8390680B2 (en) * 2009-07-09 2013-03-05 Microsoft Corporation Visual representation expression based on player expression
JP5857674B2 (ja) * 2010-12-22 2016-02-10 株式会社リコー 画像処理装置、及び画像処理システム
CN103095907B (zh) * 2012-09-14 2015-06-03 中兴通讯股份有限公司 一种移动终端中通过短信改变联系人状态的方法和装置
CN103500473A (zh) * 2013-09-04 2014-01-08 苏州荣越网络技术有限公司 一种手机打卡***
US9542544B2 (en) * 2013-11-08 2017-01-10 Microsoft Technology Licensing, Llc Correlated display of biometric identity, feedback and user interaction state
CN105590128A (zh) * 2016-03-01 2016-05-18 成都怡康科技有限公司 用于校园智能管理评价的智能卡/智能手环
CN106357871A (zh) * 2016-09-29 2017-01-25 维沃移动通信有限公司 一种扩音方法及移动终端
CN106910259A (zh) * 2017-03-03 2017-06-30 泸州市众信信息技术有限公司 一种可多途径打卡的考勤设备

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130085757A1 (en) * 2011-09-30 2013-04-04 Kabushiki Kaisha Toshiba Apparatus and method for speech recognition
CN107408027A (zh) * 2015-03-31 2017-11-28 索尼公司 信息处理设备、控制方法及程序
CN105872685A (zh) * 2016-03-24 2016-08-17 深圳市国华识别科技开发有限公司 智能终端控制方法和***、智能终端
CN105856257A (zh) * 2016-06-08 2016-08-17 以恒激光科技(北京)有限公司 适用于前台接待的智能机器人
CN106161155A (zh) * 2016-06-30 2016-11-23 联想(北京)有限公司 一种信息处理方法及主终端
CN107451544A (zh) * 2017-07-14 2017-12-08 深圳云天励飞技术有限公司 信息显示方法、装置、设备及监控***
CN107483493A (zh) * 2017-09-18 2017-12-15 广东美的制冷设备有限公司 交互式日程提醒方法、装置、存储介质及智能家居***
CN108037699A (zh) * 2017-12-12 2018-05-15 深圳市天颐健康科技有限公司 机器人、机器人的控制方法和计算机可读存储介质

Also Published As

Publication number Publication date
CN110096251A (zh) 2019-08-06
JP2021533510A (ja) 2021-12-02
SG11202100352YA (en) 2021-02-25
CN110096251B (zh) 2024-02-27
TW202008115A (zh) 2020-02-16

Similar Documents

Publication Publication Date Title
WO2020015473A1 (zh) 交互方法及装置
US11580983B2 (en) Sign language information processing method and apparatus, electronic device and readable storage medium
CN110291489B (zh) 计算上高效的人类标识智能助理计算机
US10514881B2 (en) Information processing device, information processing method, and program
JP5456832B2 (ja) 入力された発話の関連性を判定するための装置および方法
US20180181197A1 (en) Input Determination Method
CN106575361B (zh) 提供视觉声像的方法和实现该方法的电子设备
US20170188173A1 (en) Method and apparatus for presenting to a user of a wearable apparatus additional information related to an audio scene
WO2021052306A1 (zh) 声纹特征注册
JP2012186622A (ja) 情報処理装置、情報処理方法およびプログラム
KR102616403B1 (ko) 전자 장치 및 그의 메시지 전달 방법
WO2020088483A1 (zh) 一种音频控制方法及电子设备
US20230048330A1 (en) In-Vehicle Speech Interaction Method and Device
US20140044307A1 (en) Sensor input recording and translation into human linguistic form
US10499164B2 (en) Presentation of audio based on source
US20210280186A1 (en) Method and voice assistant device for managing confidential data as a non-voice input
CN113220590A (zh) 语音交互应用的自动化测试方法、装置、设备及介质
JP4585380B2 (ja) 次発言者検出方法、装置、およびプログラム
US20230298578A1 (en) Dynamic threshold for waking up digital assistant
US20210243252A1 (en) Digital media sharing
JP2023180943A (ja) 情報処理装置、情報処理方法およびプログラム
US11315544B2 (en) Cognitive modification of verbal communications from an interactive computing device
KR20220111574A (ko) 전자 장치 및 그 제어 방법
CN114360206B (zh) 一种智能报警方法、耳机、终端和***
US12032155B2 (en) Method and head-mounted unit for assisting a hearing-impaired user

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19837583

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021525345

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19837583

Country of ref document: EP

Kind code of ref document: A1
