WO2020114213A1 - Voice user interface display method and conference terminal - Google Patents

Voice user interface display method and conference terminal

Info

Publication number
WO2020114213A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
voice
conference
information
identity information
Prior art date
Application number
PCT/CN2019/118081
Other languages
English (en)
French (fr)
Inventor
郑明辉
肖靖
王耕
赵光耀
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP19894084.3A priority Critical patent/EP3869504A4/en
Publication of WO2020114213A1 publication Critical patent/WO2020114213A1/zh
Priority to US17/331,953 priority patent/US20210286867A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31User authentication
    • G06F21/32User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1822Conducting the conference, e.g. admission, detection, selection or grouping of participants, correlating users to one or more conference sessions, prioritising transmission
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/02Details
    • H04L12/16Arrangements for providing special services to substations
    • H04L12/18Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L12/1813Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L12/1831Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/18Speech classification or search using natural language modelling
    • G10L15/1815Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L2015/088Word spotting
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223Execution procedure of a spoken command
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L2015/227Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of the speaker; Human-factor methodology

Definitions

  • the embodiments of the present application relate to the field of information processing technology, and in particular, to a voice user interface display method and a conference terminal.
  • voice interaction technology is gradually applied in various industries, such as home smart speakers, voice-controlled car terminals, personal voice assistants, and voice-controlled conference systems.
  • the voice-controlled conference system is applied in public places such as conference rooms, and its distinctive characteristic is that its users are not fixed. For example, the organizer and participants change from meeting to meeting.
  • the voice-controlled conference system presents a unified user interface for all users.
  • the embodiments of the present application provide a method for displaying a voice user interface and a conference terminal, which solves the technical problem that the current voice-controlled conference system cannot meet different requirements of different users for the conference system.
  • an embodiment of the present application provides a method for displaying a voice user interface, including:
  • when receiving voice information input by the user to the conference terminal, collecting the user's voice, where the voice information includes a voice wake-up word or voice information beginning with a voice wake-up word;
  • generating, according to the user's identity information, the conference status of the conference terminal, and the user's voice instruction, user interface information that matches the user; and displaying the user interface information.
  • the user's voice is collected when receiving voice information input by the user.
  • the user's voice instruction can be obtained according to the voice information input by the user.
  • the user's identity information can be obtained in real time according to the user's voice.
  • user interface information matching the user can be displayed. Considering the user's identity information makes it possible to identify different users' needs for the conference and to generate user interface information in a targeted manner, which meets the different needs of different users for the conference system, improves the diversity of the user interface information display, and improves the user's experience of using the conference system.
  • the user's voice command is used to wake up the conference terminal; according to the user's identity information, the conference state of the conference terminal, and the user's voice command, user interface information matching the user is generated, including:
  • the type of user is used to indicate the user's familiarity with the completion of conference control tasks by entering voice information;
  • a conference operation prompt message and a voice input interface are generated according to the conference status.
  • the method further includes:
  • if the user type indicates that the user is a skilled user, a voice input interface is generated.
  • the method further includes:
  • according to the meeting status and role information, generating meeting operation prompt information and a voice input interface.
  • the user type is determined according to the conference status and the user's identity information, including:
  • the historical meeting record includes at least one of the following data: the last occurrence time of different meeting control tasks, the cumulative number of tasks used, and the task success rate;
  • the user type is determined according to the conference status and the user's historical conference records, including:
  • obtaining, from the historical conference record, data of at least one conference control task associated with the conference status, and determining the type of the user according to the data of the at least one conference control task.
  • determining the type of user according to the data of at least one conference control task includes:
  • if the data of the conference control task includes the last occurrence time, and the time interval between the last occurrence time and the current time is greater than or equal to the first preset threshold, and/or, if the data of the conference control task includes the cumulative number of tasks used, and the cumulative number of tasks used is less than or equal to the second preset threshold, and/or, if the data of the conference control task includes the task success rate, and the task success rate is less than or equal to the third preset threshold, it is determined that the user is a novice user with respect to the conference control task;
  • for each conference control task, if each item among the last occurrence time, the cumulative number of tasks used, and the task success rate that is included in the data of the conference control task meets its respective preset condition, it is determined that the user is a skilled user with respect to the conference control task; the preset condition corresponding to the last occurrence time is that the time interval between the last occurrence time and the current time is less than the first preset threshold, the preset condition corresponding to the cumulative number of tasks used is that the cumulative number of tasks used is greater than the second preset threshold, and the preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
  • the user's voice command is used to perform a conference control task after waking up the conference terminal.
  • the operation result of the user's voice command includes multiple candidate objects; generating user interface information that matches the user based on the user's identity information, the current conference status of the conference terminal, and the user's voice instruction includes:
  • sorting multiple candidate objects according to the user's identity information to generate user interface information matching the user includes:
  • multiple candidate objects are sorted to generate user interface information that matches the user.
  • obtaining the user's voice instruction according to the voice information includes:
  • the user voice commands are generated after the server semantically understands the voice information.
  • it also includes:
  • Obtain the user's identity information based on the user's voice including:
  • obtaining the user's identity information according to the user's voice and avatar includes:
  • determining the user's identity information according to the user's face information and a face information database.
  • obtaining the user's identity information according to the user's voice and avatar also includes:
  • determining the user's identity information according to the user's voiceprint information and a voiceprint information database.
  • an embodiment of the present application provides a display device for a voice user interface, including:
  • the receiving module is used to collect the user's voice when receiving voice information input by the user to the conference terminal;
  • the voice information includes a voice wake-up word or voice information beginning with the voice wake-up word;
  • the first obtaining module is used to obtain the user's identity information according to the user's voice
  • the second obtaining module is used to obtain the user's voice instruction according to the voice information
  • the generating module is used to generate user interface information matching the user according to the user's identity information, the conference status of the conference terminal and the user's voice instruction;
  • the display module is used to display user interface information.
  • the user's voice command is used to wake up the conference terminal;
  • the generation module includes:
  • the first determining unit is used to determine the user type according to the conference status and the user's identity information, and the user type is used to indicate the user's familiarity with the completion of the conference control task by inputting voice information;
  • the first generating unit is configured to generate conference operation prompt information and a voice input interface according to the conference status if the user type indicates that the user is a novice user.
  • the generating module further includes:
  • the second generating unit is configured to generate a voice input interface if the type of user indicates that the user is a skilled user.
  • the generation module further includes:
  • the first obtaining unit is used to obtain the role information of the user in the conference
  • the first generating unit is specifically used for:
  • meeting status and role information According to the meeting status and role information, generate meeting operation prompt information and voice input interface.
  • the first determining unit includes:
  • the first obtaining subunit is used to obtain the user's historical meeting record based on the user's identity information, and the historical meeting record includes at least one of the following data: the last occurrence time of different meeting control tasks, the cumulative number of tasks used, and the task success rate ;
  • the determination subunit is used to determine the type of the user according to the conference status and the user's historical conference record.
  • the determination subunit is specifically used for:
  • the type of the user is determined according to the data of at least one conference control task associated with the conference status.
  • the determination subunit is specifically used for:
  • if the data of the conference control task includes the last occurrence time, and the time interval between the last occurrence time and the current time is greater than or equal to the first preset threshold, and/or, if the data of the conference control task includes the cumulative number of tasks used, and the cumulative number of tasks used is less than or equal to the second preset threshold, and/or, if the data of the conference control task includes the task success rate, and the task success rate is less than or equal to the third preset threshold, it is determined that the user is a novice user with respect to the conference control task;
  • for each conference control task, if each item among the last occurrence time, the cumulative number of tasks used, and the task success rate that is included in the data of the conference control task meets its respective preset condition, it is determined that the user is a skilled user with respect to the conference control task; wherein the preset condition corresponding to the last occurrence time is that the time interval between the last occurrence time and the current time is less than the first preset threshold, the preset condition corresponding to the cumulative number of tasks used is that the cumulative number of tasks used is greater than the second preset threshold, and the preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
  • the user's voice instruction is used to perform a conference control task after waking up the conference terminal.
  • the operation result of the user's voice instruction includes multiple candidate objects; the generation module includes:
  • the third generating unit is configured to sort multiple candidate objects according to the user's identity information and generate user interface information that matches the user.
  • the third generating unit includes:
  • the second obtaining subunit is used to obtain the correlation between each candidate object and the identity information of the user;
  • the generating subunit is used to sort multiple candidate objects according to each relevance and generate user interface information matching the user.
  • the second acquisition module is specifically used to:
  • the user voice commands are generated after the server semantically understands the voice information.
  • the receiving module is also used to collect the user's avatar when receiving the voice information input by the user to the conference terminal;
  • the first obtaining module is specifically used to obtain the user's identity information according to the user's voice and avatar.
  • the first acquisition module includes:
  • a second determining unit configured to determine the user's position relative to the conference terminal according to the user's voice
  • the collection unit is used to collect the user's face information according to the user's position relative to the conference terminal;
  • the third determining unit is used to determine the user's identity information according to the user's face information and the face information database.
  • the first acquisition module further includes:
  • a second obtaining unit configured to obtain the user's voiceprint information according to the user's voice
  • the fourth determining unit is used to determine the user's identity information according to the user's voiceprint information and voiceprint information database.
  • an embodiment of the present application provides a conference terminal, including: a processor, a memory, and a display;
  • the memory is used to store program instructions
  • the display is used to display user interface information according to the control of the processor
  • the processor is used to call and execute the program instructions stored in the memory.
  • the conference terminal is used to execute the method of any implementation manner of the first aspect described above.
  • an embodiment of the present application provides a chip system.
  • the chip system includes a processor, and may further include a memory, for implementing the method of any implementation manner of the first aspect described above.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • an embodiment of the present application provides a program that, when executed by a processor, is used to execute the method of any implementation manner of the foregoing first aspect.
  • an embodiment of the present application provides a computer program product containing instructions that, when run on a computer, cause the computer to execute the method of any implementation manner of the first aspect described above.
  • an embodiment of the present application provides a computer-readable storage medium that stores instructions, and when the instructions run on a computer, the computer is allowed to execute the method of any implementation manner of the first aspect described above.
  • FIG. 1 is a schematic structural diagram of a conference system to which an embodiment of this application is applicable;
  • FIG. 2 is a schematic diagram of software modules in a conference system involved in an embodiment of this application;
  • FIG. 3 is a flowchart of a method for displaying a voice user interface provided in Embodiment 1 of this application;
  • FIG. 4 is a flowchart of a method for displaying a voice user interface provided in Embodiment 2 of this application;
  • FIG. 5 is a schematic diagram of a voice user interface provided in Embodiment 2 of the present application in a scenario
  • FIG. 6 is a schematic diagram of a voice user interface provided in Embodiment 2 of the present application in another scenario
  • FIG. 7 is a schematic diagram of a help prompt area provided in Embodiment 2 of the present application.
  • FIG. 8 is a schematic diagram of a historical meeting record provided in Embodiment 2 of the application.
  • FIG. 9 is a schematic diagram of a voice user interface provided in Embodiment 3 of the present application in a scenario
  • FIG. 10 is a schematic diagram of a voice user interface provided in Embodiment 3 of the present application in another scenario;
  • FIG. 11 is a schematic diagram of a voice user interface provided in Embodiment 3 of the present application in another scenario;
  • FIG. 12 is a schematic structural diagram of a display device of a voice user interface provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of a conference terminal provided by an embodiment of this application.
  • FIG. 1 is a schematic structural diagram of a conference system to which an embodiment of this application is applicable.
  • the conference system may include: a conference terminal 100 and a server.
  • the server may include at least one of a local server 200 and a remote server 300.
  • the type of the remote server 300 may be a traditional server or a cloud server.
  • Communication can be performed between the conference terminal 100 and the local server 200, between the conference terminal 100 and the remote server 300, and between the local server 200 and the remote server 300.
  • the communication method may be wired communication or wireless communication. The computing power of the conference terminal 100 is usually limited, while the server has powerful computing power. Therefore, communication between the conference terminal 100 and the server can supplement and assist the data processing of the conference terminal 100.
  • Each device in the conference system can be pre-installed with a software program or application (Application, APP), and through voice recognition technology and semantic understanding technology, realize the voice interaction task between the user and the conference system.
  • the conference terminal 100 may include: a sound collection device, a sound playback device, a shooting device, a memory, a processor, and so on.
  • the sound collection device is used to obtain the voice input by the user.
  • the shooting device can collect images or videos in the conference environment.
  • the sound playback device can play the voice part of the result of the voice interaction task.
  • the conference terminal 100 may further include a transceiver.
  • the transceiver is used to communicate with other devices and transmit data or instructions.
  • the conference terminal 100 may further include a display screen.
  • the display screen is used to display the displayable part of the result of the voice interaction task.
  • the conference terminal 100 can also perform data transmission with an external display device, so that the display device displays the displayable part of the result of the voice interaction task.
  • the voice interaction task may also be called a voice task, a conference control task, and so on.
  • the embodiments of the present application do not limit the functions implemented by the voice interaction task.
  • the user speaks the voice wake-up word "Xiaowei" to the conference terminal in the monitoring state.
  • the voice interaction task may be to wake up the conference terminal.
  • the conference terminal enters the standby state from the monitoring state to wait for the user to continue to input voice.
  • the voice input window interface can be displayed on the display screen of the conference terminal.
  • the user says "Xiaowei, please call user A" to the conference terminal in the conference.
  • the voice interaction task can be to initiate a call after waking up the conference terminal. After the task is performed, the conference terminal can be woken up and call user A. At this time, the interface of calling user A may be displayed on the display screen of the conference terminal.
  • the sound collection device may include a microphone or a microphone array.
  • the sound playing device may include a speaker or a loudspeaker.
  • the shooting device may be a camera of any resolution.
  • FIG. 2 is a schematic diagram of a software module in a conference system involved in an embodiment of the present application.
  • the microphone array, speakers and display screen are hardware modules.
  • the voice input by the user can be obtained through the microphone array.
  • the voice recognition engine 20 can process the voice and convert the voice to text.
  • the semantic understanding engine 21 can obtain the meaning contained in the text, and parse the text into an intent. In this example, the user's intention is to call A.
  • the dialog management module 22 outputs executable instructions that can be recognized by the business.
  • the instruction may also be called a voice instruction, a user voice instruction, a conference instruction, a conference control instruction, and so on.
  • the central control module 23 executes the instruction to obtain the execution result. If the execution result includes a part that requires voice playback, the voice synthesis engine 28 processes and plays through the speaker. If the execution result includes a part that needs to be displayed on the screen, it is displayed on the display screen through the processing of the graphical user interface module 29.
  • the user type, user identity, and conference status may be considered at the same time. By executing the instruction based on the above factors, an operation result matching the user type, user identity, and conference status can be obtained. The flexibility of the interface display is improved, and the user experience is improved.
  • the identity recognition engine 24 may use at least one of sound source localization technology, sound source tracking technology, voiceprint recognition technology, face recognition technology, and lip movement recognition technology to obtain the user's identity information from the user information database 27.
  • the identity recognition engine 24 outputs the user's identity information to the central control module 23.
  • the identity type determination unit 25 may determine the type of user.
  • the type of user is used to indicate the proficiency of the user to complete the conference control task by inputting voice. It should be noted that for different conference control tasks, the same user type may be different. For example, user B often organizes meetings, so for meeting control tasks such as entering a meeting, initiating a meeting, and adding participants to the meeting, user B may be a skilled user. However, if user B just organizes the meeting and does not participate in the follow-up meeting, then user B may be a novice user for meeting control tasks such as ending the meeting, screen sharing during the meeting, or viewing the meeting place.
  • the identity type determination unit 25 outputs the user's type to the central control module 23.
  • the prompt information management unit 26 may push different prompt information to the central control module 23 according to the conference status.
  • the central control module 23 executes instructions to obtain execution results based on the outputs of the identity recognition engine 24, the identity type determination unit 25, the prompt information management unit 26, and the dialogue management module 22.
  • the conference system involved in the embodiments of the present application can realize the functions implemented by the modules shown in FIG. 2.
  • the specific division of the modules is not limited, and the division and name of each module in FIG. 2 are only an example.
  • the embodiments of the present application do not limit the installation positions of the modules in FIG. 2.
  • the voice recognition engine 20, the semantic understanding engine 21, the dialogue management module 22, the identity type determination unit 25, and the identity recognition engine 24 may be provided on the conference terminal, local server, or remote server.
  • the central control module 23, the speech synthesis engine 28, and the graphical user interface module 29 may be set on the conference terminal.
  • the execution subject may be a display device of a voice user interface or a conference terminal.
  • the method for displaying the voice user interface provided in this embodiment may include:
  • S301 Collect voice of the user when receiving voice information input by the user to the conference terminal.
  • the voice information includes voice wake-up words or voice information beginning with voice wake-up words.
  • the voice information input by the user may include only the voice wake word. For example, "Xiaowei”. It can also be a voice message beginning with a voice wake word. For example, “Xiaowei, please call user A”, “Xiaowei, please share the screen of conference room B", “Xiaowei, I want to end the meeting", and so on.
  • the conference terminal is provided with a sound collection device. While the user inputs voice information to the conference terminal, the conference terminal can collect the user's voice.
  • a shooting device may be provided on the conference terminal. While the user inputs voice information to the conference terminal, the conference terminal can collect the user's avatar.
  • the user's voice is collected at the same time. Since the user's voice is highly distinctive information, the user's identity information can be obtained in real time through the user's voice, which improves the timeliness of obtaining the user's identity information.
  • a personalized conference display interface can be customized for the user. For example, for users in different departments, the interface can be displayed in different display styles.
  • the user's identity information may include at least one of the following:
  • obtaining the user's identity information according to the user's voice may include:
  • the user's avatar is also highly distinctive information.
  • the user's voice and avatar are used to obtain the user's identity information, which further improves the accuracy of the user's identity information. This is especially true for scenarios where there are a large number of people using conference terminals and the staff changes frequently, for example, large enterprises with a large number of employees.
  • the conference terminal can perform voice recognition and semantic understanding on the voice to obtain the user's voice instruction.
  • the user voice instruction can be executed by the conference terminal.
  • the execution order of S302 and S303 is not limited. For example, it can be executed before and after, or simultaneously.
  • S304 Generate user interface information matching the user according to the user's identity information, the conference status of the conference terminal, and the user's voice instruction.
  • the user interface information matching the user may be different.
  • the user's graduation time is July 2018, and the user's onboarding time is August 2018. It is currently November 2018. This means that the user is a new employee who has just graduated and has been employed for 3 months.
  • the conference state of the conference terminal is the monitoring state. User voice commands are used to wake up the conference terminal. Then, after the conference terminal enters the standby state from the monitoring state, the displayed user interface information matching the user may include prompt information related to entering the conference.
  • the user's onboarding time is 2014. It is currently 2018. This means that the user is an employee who has been working for 4 years. It can be determined that the user is familiar with the meeting process.
  • the conference terminal when the conference terminal enters the standby state from the monitoring state, it may not display any prompt information, and only display the voice input window.
  • the conference status is used to indicate the execution phase and execution status of the conference or the conference terminal.
  • This embodiment does not limit the specific classification of the conference state.
  • the conference status may include at least one of the following: not joined the conference, joined the conference, the screen is being shared during the conference, viewing the conference site during the conference, and so on.
  • the method for displaying the voice user interface collects the user's voice when receiving the voice information input by the user to the conference terminal.
  • the user's voice instruction can be obtained according to the voice information input by the user.
  • the user's identity information can be obtained in real time according to the user's voice.
  • user interface information matching the user can be displayed. Considering the user's identity information makes it possible to identify different users' needs for the conference and to generate user interface information in a targeted manner, which meets the different needs of different users for the conference system, improves the diversity of the user interface information display, and improves the user's experience of using the conference system.
  • obtaining the user's identity information according to the user's voice may include:
  • determining the user's identity information according to the user's voiceprint information and a voiceprint information database.
  • voiceprint recognition technology can be used to obtain the user's voiceprint information, and then find a match in the voiceprint information database to determine the user's identity information.
  • the voiceprint information database can be updated periodically.
  • obtaining the user's identity information according to the user's voice and avatar may include:
  • the user's position relative to the conference terminal is determined according to the user's voice.
  • the user's face information is collected.
  • the user's identity information is determined according to the user's face information and a face information database.
  • sound source tracking technology, sound source localization technology, or lip movement recognition technology may be used to determine the position of the user relative to the conference terminal.
  • the user's face information is collected using face recognition technology or the like according to the user's position relative to the conference terminal. After that, according to the user's face information, a match is found in the face information database to determine the user's identity information.
  • the face information database can be updated periodically.
  • the user's position relative to the conference terminal may include the user's direction relative to the conference terminal.
  • obtaining the user's identity information according to the user's voice and avatar may also include:
  • the user's identity information is determined according to the user's voiceprint information and a voiceprint information database.
  • a match can be found in the voiceprint information database to determine the user's identity information. Since the user's identity information is determined based on the user's voiceprint feature and face matching, the accuracy of the user's identity information is further improved.
  • obtaining the user's voice instruction according to the voice information includes:
  • the user voice commands are generated after the server semantically understands the voice information.
  • the conference terminal itself can perform voice recognition and semantic understanding, and generate a user voice instruction according to the voice information input by the user. It simplifies the process of acquiring user voice commands.
  • data transmission can be performed between the conference terminal and the server, and the server performs voice recognition and semantic understanding on the voice information input by the user.
  • the server only needs to return the user's voice command to the conference terminal.
  • the hardware configuration of the conference terminal is reduced, which is easy to implement.
  • This embodiment provides a method for displaying a voice user interface, which includes: when receiving voice information input by a user into a conference terminal, collecting the user's voice, obtaining the user's identity information according to the user's voice, obtaining the user's voice instruction based on the voice information, and according to the user Identity information, conference status of the conference terminal, and user voice instructions, generate user interface information that matches the user, and display the user interface information.
  • the method for displaying a voice user interface provided in this embodiment collects the user's voice when receiving voice information input by the user.
  • the user's identity information can be obtained in real time according to the user's voice.
  • the user's usage requirements for the conference can be identified, and user interface information can be generated in a targeted manner, which meets the different needs of different users for the conference system, improves the diversity of information displayed on the user interface, and enhances the user's experience of using the conference system.
  • FIG. 4 is a flowchart of a method for displaying a voice user interface provided in Embodiment 2 of the present application.
  • the method for displaying a voice user interface provided in this embodiment on the basis of the embodiment shown in FIG. 3, provides an implementation manner of the method for displaying a voice user interface in a scenario in which user voice commands are used to wake up a conference terminal.
  • S304 Generate user interface information matching the user according to the user's identity information, the conference status of the conference terminal, and the user's voice instruction, which may include:
  • S401 Determine the user type according to the conference status and the user's identity information.
  • the type of user is used to indicate the familiarity of the user to complete the conference control task by inputting voice information.
  • if the user type indicates that the user is a skilled user, a voice input interface is generated.
  • the conference state is different, and the user's familiarity with completing the conference control task by inputting voice information may be different.
  • if the user type indicates that the user is a novice user, meeting operation prompt information and a voice input interface may be generated.
  • the conference operation prompt information can play a good guiding role for novice users, improving the efficiency and accuracy with which novice users input voice and increasing the success rate of novice users in completing conference control tasks.
  • if it is determined that the user is a skilled user, there is no need to display guidance prompts.
  • only the voice input interface is generated, and users can directly input voice to complete the corresponding conference control tasks. Because the time and steps for displaying conference operation prompt information are saved, the guidance process is skipped and the efficiency with which skilled users complete conference control tasks is improved. This meets the conference needs of skilled users and improves the user experience.
  • FIG. 5 is a schematic diagram of a voice user interface provided in Embodiment 2 of the present application in a scenario, which is suitable for novice users.
  • the currently displayed voice user interface is the monitoring screen. It should be noted that the monitoring screen may be different in different conference states. This example does not limit the monitor screen.
  • the voice user interface may include a help prompt area 101 and a voice input interface 102.
  • the help prompt area 101 can display the prompt information of the conference operation.
  • this example does not limit the display position, display content, and display style of the help prompt area 101 and the voice input interface 102.
  • the help prompt area 101 may be displayed in a conspicuous area of the voice user interface, which is convenient for novice users to see.
  • FIG. 6 is a schematic diagram of a voice user interface provided in Embodiment 2 of the present application in another scenario, which is suitable for skilled users.
  • the currently displayed voice user interface is the monitoring screen.
  • the monitoring screen can refer to the description of FIG. 5.
  • the voice user interface may include a voice input interface 102.
  • the voice user interface of this example is simpler, and there is no redundant prompt information, which improves the meeting experience of the skilled user.
  • the method for displaying a voice user interface may further include:
  • the conference operation prompt information and the voice input interface are generated according to the conference status, which may include:
  • according to the meeting status and role information, generating meeting operation prompt information and a voice input interface.
  • conference states there can be multiple conference states.
  • the conference control tasks involved in different conference states may also be different.
  • there can also be multiple user roles in the conference, for example, conference chairperson and non-conference chairperson. This embodiment does not limit the division of user roles in the conference.
  • the conference operation prompt information and voice input interface are generated according to the conference state and the user's role information in the conference, which further improves the matching degree between the prompt information and the user and improves the user's experience of using the conference system.
  • FIG. 7 is a schematic diagram of a help prompting area provided in Embodiment 2 of the present application. On the basis of the example shown in FIG. 5, the help prompt area is explained for different conference states and different role information.
  • the conference operation prompt information may include information included in the help prompt area 101.
  • the conference status is: joined conference, multipoint conference, before the interactive voice response (IVR) broadcast of the conference-end reminder.
  • the user's role information in the conference is: the conference chairperson.
  • the conference operation prompt information may include information included in the help prompt area.
  • the conference status is: joined conference, multipoint conference, after IVR broadcasts the reminder of the end of the conference.
  • the user's role information in the conference is: the conference chairperson.
  • the conference operation prompt information may include information included in the help prompt area.
  • the conference status is: Joined conference, multipoint conference.
  • the user's role information in the conference is: non-conference chairperson.
  • the conference operation prompt information may include information included in the help prompt area.
  • the conference status is: joined conference, point-to-point conference. Since it is a point-to-point conference, user role information in the conference is not involved.
  • the conference operation prompt information may include information included in the help prompt area.
  • the conference state is: a point-to-point call initiated by a non-voice call is in progress.
  • the conference operation prompt information may include information included in the help prompt area.
  • determining the user type according to the conference status and the user's identity information may include:
  • the historical meeting record includes at least one of the following data: the last occurrence time of different meeting control tasks, the cumulative number of task usage times, and the task success rate.
  • FIG. 8 is a schematic diagram of historical meeting records provided in Embodiment 2 of the present application.
  • the historical conference record of the user is stored in the historical conference record database.
  • User 1's historical meeting record includes data of multiple meeting control tasks. For example, task 1 to task n.
  • the type of user can be determined based on the meeting status and the user's historical meeting record.
  • the manner of recording data in the historical conference record database is not limited.
  • data can be stored in the form of a table.
  • the historical meeting record database can be updated periodically.
  • determining the user type according to the conference status and the user's historical conference records may include:
  • according to the conference status, data of at least one conference control task associated with the conference status is obtained from the user's historical conference record, and the type of the user is determined according to the data of the at least one conference control task.
  • For a conference, from the creation of the conference to the end of the conference, there can be multiple conference states.
  • the conference control tasks involved in different conference states may also be different.
  • the accuracy of determining the user type is further improved by determining the user type based on the data of at least one conference control task associated with the conference status.
  • determining the type of user based on the data of at least one meeting control task may include:
  • if the data of the conference control task includes the last occurrence time, and the time interval between the last occurrence time and the current time is greater than or equal to the first preset threshold, and/or, if the data of the conference control task includes the cumulative number of tasks used, and the cumulative number of tasks used is less than or equal to the second preset threshold, and/or, if the data of the conference control task includes the task success rate, and the task success rate is less than or equal to the third preset threshold, it is determined that the user is a novice user with respect to the conference control task.
  • Among the last occurrence time, the cumulative number of tasks used, and the task success rate, as long as one type of data satisfies the corresponding novice-user condition, it can be determined that the user is a novice user.
  • For example, suppose the data of the meeting control task includes the last occurrence time and the task success rate.
  • the time interval between the last occurrence time and the current time is greater than or equal to the first preset threshold.
  • the task success rate is greater than the third preset threshold. Since the last occurrence time satisfies the condition that the corresponding user is a novice user, even if the task success rate does not satisfy the condition that the corresponding user is a novice user, the user is determined to be a novice user.
  • the specific values of the first preset threshold, the second preset threshold, and the third preset threshold are not limited.
  • determining the type of user based on the data of at least one meeting control task may include:
  • the preset condition corresponding to the last occurrence time is that the time interval between the last occurrence time and the current time is less than the first preset threshold
  • the preset condition corresponding to the cumulative usage count of the task is that the cumulative usage count of the task is greater than the second preset Threshold
  • the preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
  • the user can be determined to be a skilled user only when every item of data included among the last occurrence time, the cumulative number of tasks used, and the task success rate meets its respective preset condition.
  • For example, if the data of the meeting control task includes the last occurrence time and the task success rate, the user can be determined to be a skilled user for the meeting control task only when the time interval between the last occurrence time and the current time is less than the first preset threshold and the task success rate is greater than the third preset threshold.
  • If the data of the meeting control task includes the last occurrence time, the cumulative number of tasks used, and the task success rate, the user can be determined to be a skilled user for the meeting control task only when the time interval between the last occurrence time and the current time is less than the first preset threshold, the cumulative number of tasks used is greater than the second preset threshold, and the task success rate is greater than the third preset threshold.
  • This embodiment provides a method for displaying a voice user interface, which determines the type of user according to the conference status and the user's identity information. If the type of the user indicates that the user is a novice user, a conference operation prompt message and a voice input interface are generated according to the conference status. If the type of user indicates that the user is a skilled user, a voice input interface is generated.
  • the meeting operation prompt information can play a good guiding role for novice users, improve the efficiency and accuracy of novice users to input voice, and improve the success rate of meeting control tasks.
  • For skilled users, redundant prompt messages are not displayed, which skips the guidance process and improves the efficiency with which skilled users complete meeting control tasks. This meets the different needs of different users for the conference system and enhances the user experience.
  • Embodiment 3 of the present application also provides a method for displaying a voice user interface. Based on the embodiment shown in FIG. 3, this embodiment provides an implementation manner of a method for displaying a voice user interface in a scenario where a user voice command is used to perform a conference control task after waking up a conference terminal.
  • the user's voice instruction is used to perform the conference control task after waking up the conference terminal. If the operation result of the user's voice instruction includes multiple candidate objects.
  • S304 Generate user interface information matching the user according to the user's identity information, the current conference status of the conference terminal, and the user's voice instruction, which may include:
  • the voice input by user 1 is "Xiaowei, call Li Jun.”
  • the generated user voice command is used to call Li Jun after waking up the conference terminal.
  • Suppose there are multiple users named Li Jun in the company.
  • the operation result of the user's voice instruction includes multiple candidate objects. Multiple candidate objects need to be sorted according to the user's identity information to generate user interface information matching the user. Therefore, the matching degree between the displayed candidate results and the user is improved, and the user experience is improved.
  • sorting multiple candidate objects according to the user's identity information to generate user interface information matching the user may include:
  • multiple candidate objects are sorted to generate user interface information that matches the user.
  • FIG. 9 is a schematic diagram of a voice user interface provided in Embodiment 3 of the present application in a scenario.
  • user 1 wants to call Li Jun.
  • the department of user 1 is department 1.
  • users whose name matches "Li Jun" are screened out first, and an identical name is considered more relevant; users with different names are ranked lower.
  • Then, according to department relevance, the "Li Jun" in the same department as user 1 is ranked first, yielding the final sort order.
  • FIG. 10 is a schematic diagram of a voice user interface provided in Embodiment 3 of the present application in another scenario.
  • User 2 wants to call Li Jun.
  • the department of user 2 is department 3.
  • users in the same department as user 2 are considered more relevant and are ranked first, yielding the final sort order.
  • if there is only one candidate object, the user interface information is directly displayed.
  • FIG. 11 is a schematic diagram of a voice user interface provided in Embodiment 3 of the present application in another scenario. As shown in Figure 11, if there is only one "Li Jun” in the company, dial Li Jun's phone directly and display the interface for calling "Li Jun”.
  • This embodiment provides a method for displaying a voice user interface.
  • a user's voice command is used to perform a conference control task after waking up the conference terminal.
  • if the operation result of the user's voice command includes multiple candidate objects, the multiple candidate objects are sorted according to the user's identity information to generate user interface information that matches the user. The matching degree between the displayed candidate results and the user is improved, and the user experience is improved.
  • the display device 120 of the voice user interface provided in this embodiment may include: a receiving module 1201, a first acquiring module 1202, a second acquiring module 1203, a generating module 1204, and a display module 1205.
  • the receiving module 1201 is configured to collect the user's voice when receiving voice information input by the user into the conference terminal; the voice information includes a voice wake-up word or voice information beginning with the voice wake-up word;
  • the first obtaining module 1202 is configured to obtain the user's identity information according to the user's voice
  • the second obtaining module 1203 is configured to obtain user voice instructions according to the voice information
  • the generating module 1204 is configured to generate user interface information matching the user according to the user's identity information, the conference status of the conference terminal, and the user's voice instruction;
  • the display module 1205 is configured to display the user interface information.
  • the user voice instruction is used to wake up the conference terminal;
  • the generating module 1204 includes:
  • a first determining unit configured to determine the type of the user according to the conference status and the user's identity information, and the user's type is used to indicate the user's familiarity with completing conference control tasks by inputting voice information;
  • the first generating unit is configured to, if the type of the user indicates that the user is a novice user, generate conference operation prompt information and a voice input interface according to the conference state.
  • the generating module 1204 further includes:
  • the second generating unit is configured to generate a voice input interface if the type of the user indicates that the user is a skilled user.
  • the generating module 1204 further includes:
  • a first obtaining unit configured to obtain the role information of the user in the conference
  • the first generating unit is specifically used to:
  • the conference operation prompt information and the voice input interface are generated.
  • the first determining unit includes:
  • the first obtaining subunit is used to obtain the user's historical meeting record according to the user's identity information, the historical meeting record including at least one of the following data: the last time when different meeting control tasks occur, and task accumulation Use frequency and task success rate;
  • the determining subunit is configured to determine the type of the user according to the conference status and the historical conference record of the user.
  • the determining subunit is specifically used to:
  • the type of the user is determined according to the data of the at least one conference control task.
  • the determining subunit is specifically used to:
  • the data of the conference control task includes the last occurrence time, and the time interval between the last occurrence time and the current time is greater than or equal to the first preset threshold, and/or, if the meeting The data of the control task includes the cumulative number of tasks used, and the cumulative number of tasks used is less than or equal to the second preset threshold, and/or, if the data of the conference control task includes the task success rate, and the task success rate is less than or equal to the Three preset thresholds, it is determined that the user is a novice user with respect to the conference control task;
  • the meeting control task is a skilled user; wherein the preset condition corresponding to the last occurrence time is that the time interval between the last occurrence time and the current time is less than the first preset threshold, and the preset corresponding to the cumulative number of times the task is used The condition is that the cumulative number of times the task is used is greater than the second preset threshold, and the preset condition corresponding to the task success rate is that the task success rate is greater than the third preset threshold.
  • the user voice instruction is used to perform a conference control task after waking up the conference terminal, and the operation result of the user voice instruction includes multiple candidate objects;
  • the generation module 1204 includes:
  • the third generating unit is configured to sort the plurality of candidate objects according to the identity information of the user, and generate user interface information matching the user.
  • the third generating unit includes:
  • a second obtaining subunit configured to obtain the correlation between each candidate object and the identity information of the user
  • a generating subunit configured to sort the plurality of candidate objects according to the respective relevance degrees, and generate user interface information matching the user.
  • the second obtaining module 1203 is specifically used to:
  • the user voice instruction is generated after the server semantically understands the voice information.
  • the receiving module 1201 is also used to:
  • the first obtaining module 1202 is specifically used to obtain the user's identity information according to the user's voice and avatar.
  • the first obtaining module 1202 includes:
  • a second determining unit configured to determine the position of the user relative to the conference terminal according to the voice of the user
  • a collection unit configured to collect face information of the user according to the user's position relative to the conference terminal
  • the third determining unit is configured to determine the user's identity information according to the user's face information and face information database.
  • the first obtaining module 1202 further includes:
  • a second obtaining unit configured to obtain voiceprint information of the user according to the user's voice
  • the fourth determining unit is configured to determine the user's identity information based on the user's voiceprint information and voiceprint information database.
  • the display device of the voice user interface provided in the embodiment of the present application may be used to execute the technical solution in the embodiment of the method for displaying a voice user interface of the present application.
  • the implementation principle and technical effect are similar, and are not repeated here.
  • the conference terminal 130 provided in this embodiment may include: a processor 1301, a memory 1302, and a display 1303;
  • the memory 1302 is used to store program instructions
  • the display 1303 is configured to display user interface information according to the control of the processor 1301;
  • the processor 1301 is used to call and execute the program instructions stored in the memory 1302, and when the processor 1301 executes the program instructions stored in the memory 1302, the conference terminal is used to execute the above-mentioned voice user of the present application
  • the technical solution in the embodiment of the display method of the interface has similar implementation principles and technical effects, which will not be repeated here.
  • FIG. 13 only shows a simplified design of the conference terminal.
  • the conference terminal may further include any number of transceivers, processors, memories, and/or communication units, etc., which is not limited in the embodiments of the present application.
  • the conference terminal may also include functional units such as a microphone, a speaker, and buttons.
  • An embodiment of the present application also provides a chip system.
  • the chip system includes a processor, and may further include a memory, which is used to implement the technical solution in the above-mentioned voice user interface display method embodiment of the present application.
  • the implementation principles and technical effects are similar. I won't repeat them here.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • An embodiment of the present application also provides a program which, when executed by a processor, is used to execute the technical solution in the embodiment of the method for displaying a voice user interface of the present application.
  • the implementation principles and technical effects are similar, and are not repeated here. .
  • Embodiments of the present application also provide a computer program product containing instructions that, when run on a computer, cause the computer to execute the technical solution in the embodiment of the method for displaying a voice user interface of the present application.
  • the implementation principles and technical effects are similar. I won't repeat them here.
  • Embodiments of the present application also provide a computer-readable storage medium, in which instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is allowed to execute the embodiment of the method for displaying the voice user interface described above
  • the technical solution in is similar in implementation principle and technical effect, and will not be repeated here.
  • the processors involved in the embodiments of the present application may be general-purpose processors, digital signal processors, application specific integrated circuits, field programmable gate arrays or other programmable logic devices, discrete gates or transistor logic devices, and discrete hardware components, which may be implemented or Perform the disclosed methods, steps, and logical block diagrams in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or any conventional processor.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied and executed by a hardware processor, or may be executed and completed by a combination of hardware and software modules in the processor.
  • the memory involved in the embodiments of the present application may be a non-volatile memory, such as a hard disk (HDD) or a solid-state drive (SSD), etc., or a volatile memory (volatile memory), for example Random access memory (random-access memory, RAM).
  • the memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited thereto.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the units is only a division of logical functions.
  • there may be other divisions for example, multiple units or components may be combined or Can be integrated into another system, or some features can be ignored, or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a dedicated computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be from a website site, computer, server or data center Transmit to another website, computer, server or data center via wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device including a server, a data center, and the like integrated with one or more available media.
  • the usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), optical medium (e.g., DVD), or semiconductor medium (e.g., Solid State Disk (SSD)), or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

一种语音用户界面的显示方法和会议终端,所述方法包括:在接收用户输入会议终端的语音信息时,采集用户的声音。根据用户输入的语音信息可以获取用户语音指令。根据用户的声音可以实时获取用户的身份信息。进而,根据用户的身份信息、用户语音指令和会议终端当前的会议状态,可以显示与用户匹配的用户界面信息。由于考虑了用户的身份信息,可以识别出不同用户对会议的使用需求,针对性的生成用户界面信息,满足了不同用户对于会议***的不同需求,提升了用户界面信息显示的多样性,提升了用户对于会议***的使用感受。

Description

语音用户界面的显示方法和会议终端
本申请要求于2018年12月03日提交中国专利局、申请号为201811467420.5、申请名称为《语音用户界面的显示方法和会议终端》的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及信息处理技术领域,尤其涉及一种语音用户界面的显示方法和会议终端。
背景技术
随着人工智能的兴起,语音交互技术逐渐在各个行业中应用,例如,家庭智能音箱、语音控制车载终端、个人语音助手、语音控制会议***等。
其中,语音控制会议***应用在会议室等公用场所,其独特性在于用户的不固定性。比如,每一次会议的组织者和参与者都在发生变化。目前,语音控制会议***对于所有用户均呈现统一的用户界面。
但是,参与会议的用户不同,在会议中的需求可能不同。例如,对于可以熟练使用会议***的用户来说,用户希望可以高效地完成语音会议控制任务。对于初次使用会议***的用户来说,用户希望获得更多的帮助引导。目前的语音控制会议***,无法满足不同用户对于会议***的不同需求。
发明内容
本申请实施例提供一种语音用户界面的显示方法和会议终端,解决了目前的语音控制会议***无法满足不同用户对于会议***的不同需求的技术问题。
第一方面,本申请实施例提供一种语音用户界面的显示方法,包括:
接收用户输入会议终端的语音信息时,采集用户的声音;语音信息包括语音唤醒词或以语音唤醒词开头的语音信息;
根据用户的声音获取用户的身份信息;
根据语音信息获取用户语音指令;
根据用户的身份信息、会议终端的会议状态和用户语音指令,生成与用户匹配的用户界面信息;
显示用户界面信息。
第一方面提供的语音用户界面的显示方法中,通过在接收用户输入的语音信息时,采集用户的声音。根据用户输入的语音信息可以获取用户语音指令。根据用户的声音可以实时获取用户的身份信息。进而,根据用户的身份信息、用户语音指令和会议终端当前的会议状态,可以显示与用户匹配的用户界面信息。由于考虑了用户的身份信息,可以识别出 不同用户对会议的使用需求,针对性的生成用户界面信息,满足了不同用户对于会议***的不同需求,提升了用户界面信息显示的多样性,提升了用户对于会议***的使用感受。
在一种可能的实现方式中,用户语音指令用于唤醒会议终端;根据用户的身份信息、会议终端的会议状态和用户语音指令,生成与用户匹配的用户界面信息,包括:
根据会议状态和用户的身份信息确定用户的类型,用户的类型用于指示用户通过输入语音信息完成会议控制任务的熟悉程度;
若用户的类型指示用户为生手用户,则根据会议状态生成会议操作提示信息和语音输入界面。
在一种可能的实现方式中,方法还包括:
若用户的类型指示用户为熟练用户,则生成语音输入界面。
在一种可能的实现方式中,若会议状态指示用户已经加入会议,还包括:
获取用户在会议中的角色信息;
根据会议状态生成会议操作提示信息和语音输入界面,包括:
根据会议状态和角色信息,生成会议操作提示信息和语音输入界面。
在一种可能的实现方式中,根据会议状态和用户的身份信息确定用户的类型,包括:
根据用户的身份信息获取用户的历史会议记录,历史会议记录包括下列数据中的至少一项:不同会议控制任务的最近一次发生时间、任务累计使用次数和任务成功率;
根据会议状态和用户的历史会议记录确定用户的类型。
在一种可能的实现方式中,根据会议状态和用户的历史会议记录确定用户的类型,包括:
获取用户的历史会议记录中与会议状态关联的至少一种会议控制任务的数据;
根据至少一种会议控制任务的数据,确定用户的类型。
在一种可能的实现方式中,根据至少一种会议控制任务的数据,确定用户的类型,包括:
针对每种会议控制任务,若该会议控制任务的数据中包括最近一次发生时间、且最近一次发生时间与当前时间之间的时间间隔大于或等于第一预设阈值,和/或,若该会议控制任务的数据中包括任务累计使用次数、且任务累计使用次数小于或等于第二预设阈值,和/或,若该会议控制任务的数据中包括任务成功率、且任务成功率小于或等于第三预设阈值,则确定用户相对于该会议控制任务为生手用户;
针对每种会议控制任务,若该会议控制任务的数据中包括的最近一次发生时间、任务累计使用次数和任务成功率中的至少一种均满足各自对应的预设条件,则确定用户相对于该会议控制任务为熟练用户;其中,最近一次发生时间对应的预设条件为最近一次发生时间与当前时间之间的时间间隔小于第一预设阈值,任务累计使用次数对应的预设条件为任务累计使用次数大于第二预设阈值,任务成功率对应的预设条件为任务成功率大于第三预设阈值。
在一种可能的实现方式中,用户语音指令用于唤醒会议终端后执行会议控制任务,用户语音指令的运行结果包括多个候选对象;根据用户的身份信息、会议终端当前的会议状态和用户语音指令,生成与用户匹配的用户界面信息,包括:
根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息。
在一种可能的实现方式中,根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息,包括:
获取各候选对象与用户的身份信息之间的相关度;
根据各相关度,对多个候选对象进行排序,生成与用户匹配的用户界面信息。
在一种可能的实现方式中,根据语音信息获取用户语音指令,包括:
对语音信息进行语义理解,生成用户语音指令;
或者,
向服务器发送语音信息;
接收服务器发送的用户语音指令,用户语音指令为服务器对语音信息进行语义理解后生成的。
在一种可能的实现方式中,还包括:
在接收用户输入会议终端的语音信息时,采集用户的头像;
根据用户的声音获取用户的身份信息,包括:
根据用户的声音和头像获取用户的身份信息。
在一种可能的实现方式中,根据用户的声音和头像获取用户的身份信息,包括:
根据用户的声音确定用户相对于会议终端的位置;
根据用户相对于会议终端的位置,采集用户的人脸信息;
根据用户的人脸信息和人脸信息库,确定用户的身份信息。
在一种可能的实现方式中,根据用户的声音和头像获取用户的身份信息,还包括:
根据用户的声音获取用户的声纹信息;
根据用户的声纹信息和声纹信息库,确定用户的身份信息。
第二方面,本申请实施例提供一种语音用户界面的显示装置,包括:
接收模块,用于接收用户输入会议终端的语音信息时,采集用户的声音;语音信息包括语音唤醒词或以语音唤醒词开头的语音信息;
第一获取模块,用于根据用户的声音获取用户的身份信息;
第二获取模块,用于根据语音信息获取用户语音指令;
生成模块,用于根据用户的身份信息、会议终端的会议状态和用户语音指令,生成与用户匹配的用户界面信息;
显示模块,用于显示用户界面信息。
在一种可能的实现方式中,用户语音指令用于唤醒会议终端;生成模块,包括:
第一确定单元,用于根据会议状态和用户的身份信息确定用户的类型,用户的类型用于指示用户通过输入语音信息完成会议控制任务的熟悉程度;
第一生成单元,用于若用户的类型指示用户为生手用户,则根据会议状态生成会议操作提示信息和语音输入界面。
在一种可能的实现方式中,生成模块还包括:
第二生成单元,用于若用户的类型指示用户为熟练用户,则生成语音输入界面。
在一种可能的实现方式中,若会议状态指示用户已经加入会议,生成模块还包括:
第一获取单元,用于获取用户在会议中的角色信息;
第一生成单元具体用于:
根据会议状态和角色信息,生成会议操作提示信息和语音输入界面。
在一种可能的实现方式中,第一确定单元,包括:
第一获取子单元,用于根据用户的身份信息获取用户的历史会议记录,历史会议记录包括下列数据中的至少一项:不同会议控制任务的最近一次发生时间、任务累计使用次数和任务成功率;
确定子单元,用于根据会议状态和用户的历史会议记录确定用户的类型。
在一种可能的实现方式中,确定子单元具体用于:
获取用户的历史会议记录中与会议状态关联的至少一种会议控制任务的数据;
根据至少一种会议控制任务的数据,确定用户的类型。
在一种可能的实现方式中,确定子单元具体用于:
针对每种会议控制任务,若该会议控制任务的数据中包括最近一次发生时间、且最近一次发生时间与当前时间之间的时间间隔大于或等于第一预设阈值,和/或,若该会议控制任务的数据中包括任务累计使用次数、且任务累计使用次数小于或等于第二预设阈值,和/或,若该会议控制任务的数据中包括任务成功率、且任务成功率小于或等于第三预设阈值,则确定用户相对于该会议控制任务为生手用户;
针对每种会议控制任务,若该会议控制任务的数据中包括的最近一次发生时间、任务累计使用次数和任务成功率中的至少一种均满足各自对应的预设条件,则确定用户相对于该会议控制任务为熟练用户;其中,最近一次发生时间对应的预设条件为最近一次发生时间与当前时间之间的时间间隔小于第一预设阈值,任务累计使用次数对应的预设条件为任务累计使用次数大于第二预设阈值,任务成功率对应的预设条件为任务成功率大于第三预设阈值。
在一种可能的实现方式中,用户语音指令用于唤醒会议终端后执行会议控制任务,用户语音指令的运行结果包括多个候选对象;生成模块,包括:
第三生成单元,用于根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息。
在一种可能的实现方式中,第三生成单元,包括:
第二获取子单元,用于获取各候选对象与用户的身份信息之间的相关度;
生成子单元,用于根据各相关度,对多个候选对象进行排序,生成与用户匹配的用户界面信息。
在一种可能的实现方式中,第二获取模块具体用于:
对语音信息进行语义理解,生成用户语音指令;
或者,
向服务器发送语音信息;
接收服务器发送的用户语音指令,用户语音指令为服务器对语音信息进行语义理解后生成的。
在一种可能的实现方式中,接收模块,还用于:
在接收用户输入会议终端的语音信息时,采集用户的头像;
第一获取模块,具体用于:根据用户的声音和头像获取用户的身份信息。
在一种可能的实现方式中,第一获取模块,包括:
第二确定单元,用于根据用户的声音确定用户相对于会议终端的位置;
采集单元,用于根据用户相对于会议终端的位置,采集用户的人脸信息;
第三确定单元,用于根据用户的人脸信息和人脸信息库,确定用户的身份信息。
在一种可能的实现方式中,第一获取模块,还包括:
第二获取单元,用于根据用户的声音获取用户的声纹信息;
第四确定单元,用于根据用户的声纹信息和声纹信息库,确定用户的身份信息。
第三方面,本申请实施例提供一种会议终端,包括:处理器、存储器和显示器;
其中,存储器,用于存储程序指令;
显示器,用于根据处理器的控制显示用户界面信息;
处理器,用于调用并执行存储器中存储的程序指令,当处理器执行存储器存储的程序指令时,会议终端用于执行上述第一方面的任意实现方式的方法。
第四方面,本申请实施例提供一种芯片***,该芯片***包括处理器,还可以包括存储器,用于实现上述第一方面的任意实现方式的方法。该芯片***可以由芯片构成,也可以包含芯片和其他分立器件。
第五方面,本申请实施例提供一种程序,该程序在被处理器执行时用于执行上述第一方面的任意实现方式的方法。
第六方面,本申请实施例提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面的任意实现方式的方法。
第七方面,本申请实施例提供一种计算机可读存储介质,计算机可读存储介质中存储有指令,当指令在计算机上运行时,使得计算机执行上述第一方面的任意实现方式的方法。
附图说明
图1为本申请实施例适用的会议***的结构示意图;
图2为本申请实施例涉及的会议***中软件模块的示意图;
图3为本申请实施例一提供的语音用户界面的显示方法的流程图;
图4为本申请实施例二提供的语音用户界面的显示方法的流程图;
图5为本申请实施例二提供的语音用户界面在一种场景下的示意图;
图6为本申请实施例二提供的语音用户界面在另一种场景下的示意图;
图7为本申请实施例二提供的帮助提示区的示意图;
图8为本申请实施例二提供的历史会议记录的示意图;
图9为本申请实施例三提供的语音用户界面在一种场景下的示意图;
图10为本申请实施例三提供的语音用户界面在另一种场景下的示意图;
图11为本申请实施例三提供的语音用户界面在又一种场景下的示意图;
图12为本申请实施例提供的语音用户界面的显示装置的结构示意图;
图13为本申请实施例提供的会议终端的结构示意图。
具体实施方式
图1为本申请实施例适用的会议***的结构示意图。如图1所示,会议***可以包括: 会议终端100和服务器。可选的,服务器可以包括本地服务器200和远端服务器300中的至少一种。远端服务器300的类型可以为传统服务器或者云服务器。会议终端100与本地服务器200之间、会议终端100与远端服务器300之间、本地服务器200与远端服务器300之前均可以进行通信。通信方式可以为有线通信或者无线通信。由于会议终端100的计算能力通常有限,而服务器具有强大的计算能力。因此,会议终端100通过与服务器之间的通信,可以弥补、协助会议终端100的数据处理。
会议***中的各个设备可以预先安装软件程序或者应用程序(Application,APP),通过语音识别技术和语义理解技术,实现用户与会议***之间的语音交互任务。
需要说明的是,本申请实施例对于会议***中会议终端100和服务器的数量不做限定。
会议终端100可以包括:声音采集设备、声音播放设备、拍摄设备、存储器和处理器,等等。其中,声音采集设备用于获取用户输入的语音。拍摄设备可以采集会议环境中的图像或者视频。声音播放设备可以播放语音交互任务结果中的语音部分。可选的,会议终端100还可以包括收发器。所述收发器用于与其他设备进行通信,传输数据或者指令。可选的,会议终端100还可以包括显示屏。显示屏用于显示语音交互任务结果中的可显示部分。可选的,若会议终端100本身不具有显示屏,会议终端100还可以与外部的显示设备进行数据传输,已使显示设备显示语音交互任务结果中的可显示部分。
下面通过示例对语音交互任务进行说明。
在本申请的一些实施方式或者场景中,语音交互任务也可以称为语音任务、会议控制任务,等等。本申请实施例对于语音交互任务实现的功能不做限定。
例如,用户对处于监听状态的会议终端说出语音唤醒词“小微”。语音交互任务可以为唤醒会议终端。该任务执行后,会议终端从监听状态进入待机状态,以等待用户继续输入语音。此时,会议终端的显示屏上可以显示语音输入窗口界面。
又例如,用户对处于会议中的会议终端说出“小微,请呼叫用户A”。语音交互任务可以为唤醒会议终端后发起呼叫。该任务执行后,会议终端可以被唤醒,并执行呼叫用户A。此时,会议终端的显示屏上可以显示正在呼叫用户A的界面。
需要说明的是,本申请实施例对于会议终端100的形状和产品类型不做限定。
需要说明的是,本申请实施例对于会议终端100中各个部件的实现方式不做限定。例如,声音采集设备可以包括麦克风或者麦克风阵列。声音播放设备可以包括喇叭或者扬声器。拍摄设备可以为具有不同像素的摄像头。
下面从软件层面对会议***进行说明。
示例性的,图2为本申请实施例涉及的会议***中软件模块的示意图。其中,麦克风阵列、扬声器和显示屏为硬件模块。
如图2所示,通过麦克风阵列可以获取用户输入的语音。例如,“小微,请呼叫用户A”。语音识别引擎20可以对语音进行处理,将语音转换为文字。语义理解引擎21可以获取文字包含的含义,将文字解析为意图。在本示例中,用户的意图为呼叫A。之后,对话管理模块22输出业务可识别的、可执行的指令。需要说明的是,在本申请的一些实施方式或者场景中,指令也可以称为语音指令、用户语音指令、会议指令、会议控制指令,等等。中控模块23获取到指令后,执行该指令,获得执行结果。执行结果中如果包括需要语音 播放的部分,则通过语音合成引擎28的处理,通过扬声器进行播放。执行结果中如果包括需要屏幕显示的部分,则通过图形用户界面模块29的处理,通过显示屏进行显示。
在本申请实施例中,执行指令时,可以同时考虑用户类型、用户身份和会议状态。基于上述因素执行该指令,可以获得与用户类型、用户身份和会议状态匹配的指令运行结果。提升了界面显示的灵活性,提升了用户感受。
具体的,身份识别引擎24可以利用声源定位技术、声源跟踪技术、声纹识别技术、人脸识别技术和唇动识别技术等中的至少一种技术,在用户信息数据库27中获取用户的身份信息。身份识别引擎24将用户的身份信息输出给中控模块23。
身份类型判定单元25可以确定用户的类型。用户的类型用于指示用户通过输入语音完成会议控制任务的熟练程度。需要说明的是,针对不同的会议控制任务,同一个用户的类型可能不同。例如,用户B经常组织会议,那么,对于进入会议、发起会议、将参会人员加入会议等会议控制任务,用户B可能为熟练用户。但是,如果用户B只是组织会议,不参加后续的会议,那么,对于结束会议、会议进行中的屏幕共享或观看会场等会议控制任务,用户B可能为生手用户。身份类型判定单元25将用户的类型输出给中控模块23。
提示信息管理单元26可以根据会议状态向中控模块23推送不同的提示信息。
最终,中控模块23根据身份识别引擎24、身份类型判定单元25、提示信息管理单元26和会话管理模块22的输出,执行指令获得执行结果。
需要说明的是,本申请实施例涉及的会议***,可以实现图2所示各个模块实现的功能。但是,对于模块的具体划分不做限定,图2中各个模块的划分和名称仅是一种示例。而且,本申请实施例对于图2中各个模块的设置位置不做限定。例如,语音识别引擎20、语义理解引擎21、对话管理模块22、身份类型判定单元25、身份识别引擎24可以设置在会议终端、本地服务器或者远端服务器上。中控模块23、语音合成引擎28、图形用户界面模块29可以设置在会议终端上。
下面以具体实施例对本申请的技术方案进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。
需要说明,本申请说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。
图3为本申请实施例一提供的语音用户界面的显示方法的流程图。本实施例提供的语音用户界面的显示方法,执行主体可以为语音用户界面的显示装置或者会议终端。如图3所示,本实施例提供的语音用户界面的显示方法,可以包括:
S301、接收用户输入会议终端的语音信息时,采集用户的声音。
其中,语音信息包括语音唤醒词或以语音唤醒词开头的语音信息。
具体的,用户与会议终端进行语音交互之前,需要首先通过语音唤醒词唤醒会议终端。用户输入的语音信息,可以仅包括语音唤醒词。例如,“小微”。也可以是以语音唤醒词开头的语音信息。例如,“小微,请呼叫用户A”、“小微,请共享会议室B的屏幕”、“小微,我要结束会议”,等等。会议终端上设置有声音采集设备。在用户向会议终端输入语音信息的同时,会议终端可以采集用户的声音。
可选的,会议终端上可以设置有拍摄设备。在用户向会议终端输入语音信息的同时, 会议终端可以采集用户的头像。
需要说明的是,本实施例对于语音唤醒词的实现方式不做限定。
S302、根据用户的声音获取用户的身份信息。
具体的,在接收用户输入的语音信息时,同时采集用户的声音。由于用户的声音是非常具有辨识度的信息,通过用户的声音可以实时获取用户的身份信息,提升了获取用户身份信息的时效性。
之后,可以根据用户的身份信息,确定用户是否为合法用户,以及为用户定制个性化的会议显示界面。例如,对于不同部门的用户,可以按照不同的显示风格显示界面。
可选的,用户的身份信息可以包括下列中的至少一项:
姓名、性别、年龄、毕业时间、工作经历、入职时间、工作部门、工号、工位、座机号码、手机号码、目前是否出差、出差地点、兴趣爱好,等等。
可选的,若接收用户输入会议终端的语音信息时,还采集了用户的头像,S302,根据用户的声音获取用户的身份信息,可以包括:
根据用户的声音和头像获取用户的身份信息。
具体的,用户的头像也是非常具有辨识度的信息。同时利用用户的声音和头像一起获取用户的身份信息,进一步提升了用户身份信息的准确性。尤其是对于使用会议终端的人数众多且人员变化较为频繁的场景,例如,员工数量很多的大型企业。
S303、根据语音信息获取用户语音指令。
具体的,会议终端获取用户输入的语音信息之后,可以对语音进行语音识别和语义理解,获取用户语音指令。所述用户语音指令可以被会议终端执行。
需要说明的是,在本实施例中,对于S302和S303的执行顺序不做限定。例如,可以前后执行,也可以同时执行。
S304、根据用户的身份信息、会议终端的会议状态和用户语音指令,生成与用户匹配的用户界面信息。
S305、显示用户界面信息。
具体的,用户的身份信息不同、会议状态不同、用户语音指令不同,与用户匹配的用户界面信息就可能不同。
下面通过示例进行说明。
在一个示例中,用户的毕业时间为2018年7月,用户的入职时间为2018年8月。当前为2018年11月。说明该用户是刚刚毕业的入职3个月的新员工。假设,会议终端的会议状态为监听状态。用户语音指令用于唤醒会议终端。那么,会议终端由监听状态进入待机状态后,显示的与用户匹配的用户界面信息可以包括与进入会议相关的提示信息。
在另一个示例中,用户的入职时间为2014年。当前为2018年。说明该用户是已经工作了4年的员工。可以确定该用户熟悉会议流程。在与上个示例同样的场景下,当会议终端由监听状态进入待机状态后,可以不显示任何提示信息,仅显示语音输入窗口。
其中,会议状态用于指示会议或者会议终端的执行阶段和执行状态。本实施例对于会议状态的具体分类不做限定。
可选的,会议状态可以包括下列中的至少一种:未加入会议、已加入会议、会议中正在共享屏幕、会议中观看会场,等等。
可见,本实施例提供的语音用户界面的显示方法,在接收用户输入会议终端的语音信息时,采集用户的声音。根据用户输入的语音信息可以获取用户语音指令。根据用户的声音可以实时获取用户的身份信息。进而,根据用户的身份信息、用户语音指令和会议终端当前的会议状态,可以显示与用户匹配的用户界面信息。由于考虑了用户的身份信息,可以识别出不同用户对会议的使用需求,针对性的生成用户界面信息,满足了不同用户对于会议***的不同需求,提升了用户界面信息显示的多样性,提升了用户对于会议***的使用感受。
可选的,S302中,根据语音信息获取用户语音指令,可以包括:
根据用户的声音获取用户的声纹信息。
根据用户的声纹信息和声纹信息库,确定用户的身份信息。
具体的,可以采用声纹识别技术等获取用户的声纹信息,进而在声纹信息库中查找匹配,确定用户的身份信息。
可选的,声纹信息库可以周期更新。
可选的,根据用户的声音和头像获取用户的身份信息,可以包括:
根据用户的声音确定用户相对于会议终端的位置。
根据用户相对于会议终端的位置,采集用户的人脸信息。
根据用户的人脸信息和人脸信息库,确定用户的身份信息。
具体的,可以采用声源跟踪技术、声源定位技术或者唇动识别技术等,确定用户相对于会议终端的位置。进而,在拍摄设备采集的图像或者视频中,根据用户相对于会议终端的位置,采用人脸识别技术等采集该用户的人脸信息。之后,根据用户的人脸信息在人脸信息库中查找匹配,确定用户的身份信息。
可选的,人脸信息库可以周期更新。
可选的,用户相对于会议终端的位置可以包括用户相对于会议终端的方向。
可选的,S302中,根据用户的声音和头像获取用户的身份信息,还可以包括:
根据用户的声音获取用户的声纹信息。
根据用户的声纹信息和声纹信息库,确定用户的身份信息。
在该种实现方式中,获取用户的声纹信息后,可以在声纹信息库中查找匹配,确定用户的身份信息。由于基于用户的声纹特征和人脸匹配一起确定用户的身份信息,进一步提升了用户身份信息的准确性。
可选的,S303中,根据语音信息获取用户语音指令,包括:
对语音信息进行语义理解,生成用户语音指令。
或者,
向服务器发送语音信息。
接收服务器发送的用户语音指令,用户语音指令为服务器对语音信息进行语义理解后生成的。
具体的,在一种实现方式中,会议终端自身可以进行语音识别和语义理解,根据用户输入的语音信息生成用户语音指令。简化了获取用户语音指令的处理流程。
在另一种实现方式中,会议终端与服务器之间可以进行数据传输,由服务器对用户输入的语音信息进行语音识别和语义理解。服务器将用户语音指令返回给会议终端即可。降 低了会议终端的硬件配置,易于实现。
本实施例提供一种语音用户界面的显示方法,包括:接收用户输入会议终端的语音信息时,采集用户的声音,根据用户的声音获取用户的身份信息,根据语音信息获取用户语音指令,根据用户的身份信息、会议终端的会议状态和用户语音指令,生成与用户匹配的用户界面信息,显示用户界面信息。本实施例提供的语音用户界面的显示方法,在接收用户输入的语音信息时,采集用户的声音。根据用户的声音可以实时获取用户的身份信息。由于考虑了用户的身份信息、会议终端的会议状态和用户希望执行的语音交互任务,可以识别出不同用户对会议的使用需求,针对性的生成用户界面信息,满足了不同用户对于会议***的不同需求,提升了用户界面信息显示的多样性,提升了用户对于会议***的使用感受。
图4为本申请实施例二提供的语音用户界面的显示方法的流程图。本实施例提供的语音用户界面的显示方法,在图3所示实施例的基础上,提供了用户语音指令用于唤醒会议终端的场景下,语音用户界面的显示方法的一种实现方式。
如图4所示,用户语音指令用于唤醒会议终端。S304,根据用户的身份信息、会议终端的会议状态和用户语音指令,生成与用户匹配的用户界面信息,可以包括:
S401、根据会议状态和用户的身份信息确定用户的类型。
其中,用户的类型用于指示用户通过输入语音信息完成会议控制任务的熟悉程度。
S402、若用户的类型指示用户为生手用户,则根据会议状态生成会议操作提示信息和语音输入界面。
S403、若用户的类型指示用户为熟练用户,则生成语音输入界面。
具体的,对于同一个用户而言,会议状态不同,用户通过输入语音信息完成会议控制任务的熟悉程度可能不同。当确定用户为生手用户时,可以生成会议操作提示信息和语音输入界面。通过会议操作提示信息,可以对生手用户起到很好的会议引导作用,提升了新手用户输入语音的效率和准确率,同时提升了新手用户完成会议控制任务的成功率。满足了新手用户的会议需求。当确定用户为熟练用户时,是不需要提示引导的。此时,仅生成语音输入界面。用户可以直接输入语音完成相应的会议控制任务。由于节约了显示会议操作提示信息的时间和步骤,节省了引导流程,提升了熟练用户完成会议控制任务的效率。满足了熟练用户的会议需求。提升了用户感受。
下面通过具体示例进行说明。
可选的,在一个示例中,图5为本申请实施例二提供的语音用户界面在一种场景下的示意图,适用于新手用户。
如图5左侧所示,当前显示的语音用户界面为监听画面。需要说明的是,监听画面在不同的会议状态下可能不同。本示例对于监听画面不做限定。当会议终端接收到语音唤醒词后,如图5右侧所示,语音用户界面可以包括帮助提示区101和语音输入界面102。帮助提示区101中可以显示会议操作提示信息。
需要说明的是,本示例对于帮助提示区101和语音输入界面102的显示位置、显示内容和显示风格不做限定。
可选的,帮助提示区101可以显示在语音用户界面的醒目区域,便于新手用户看到。
可选的,在另一个示例中,图6为本申请实施例二提供的语音用户界面在另一种场景下的示意图,适用于熟练用户。
如图6左侧所示,当前显示的语音用户界面为监听画面。监听画面可以参见图5的说明。当会议终端接收到语音唤醒词后,如图6右侧所示,语音用户界面可以包括语音输入界面102。相比于图5右侧的语音用户界面,对于熟练用户而言,本示例的语音用户界面更加简单,没有冗余的提示信息,提升了熟练用户的会议感受。
可选的,本实施例提供的语音用户界面的显示方法,若会议状态指示用户已经加入会议,还可以包括:
获取用户在会议中的角色信息。
相应的,S402中,根据会议状态生成会议操作提示信息和语音输入界面,可以包括:
根据会议状态和角色信息,生成会议操作提示信息和语音输入界面。
具体的,对于一个会议,从创建会议开始直至会议结束的整个过程中,会议状态可以有多种。不同的会议状态涉及的会议控制任务也可能不同。当用户已经加入会议,用户在会议中的角色也可以有多种。例如,会议***、非会议***。本实施例对于用户的会议中的角色划分不做限定。
因此,根据不同的会议状态,如果会议状态指示用户已经加入会议,根据会议状态和用户在会议中的角色信息,生成会议操作提示信息和语音输入界面,进一步提升了提示信息与用户的匹配度,提升了用户对于会议***的使用感受。
下面通过具体示例进行说明。
图7为本申请实施例二提供的帮助提示区的示意图。在图5所示示例的基础上,针对不同的会议状态、不同的角色信息对帮助提示区进行说明。
如图7中(a)所示,会议状态为:未入会。会议操作提示信息可以包括帮助提示区101中包括的信息。
如图7中(b)所示,会议状态为:已入会,多点会议,互动式语音应答(Interactive Voice Response,IVR)播报会议结束提醒之前。用户在会议中的角色信息为:会议***。会议操作提示信息可以包括帮助提示区中包括的信息。
将(a)场景与(b)场景进行比较,可见,当用户未入会时,涉及的会议控制任务可以包括“加入会议”,不会涉及“退出会议”。但是,如果用户已入会,将不会涉及“加入会议”,可能涉及“退出会议”。
如图7中(c)所示,会议状态为:已入会,多点会议,IVR播报会议结束提醒之后。用户在会议中的角色信息为:会议***。会议操作提示信息可以包括帮助提示区中包括的信息。
如图7中(d)所示,会议状态为:已入会,多点会议。用户在会议中的角色信息为:非会议***。会议操作提示信息可以包括帮助提示区中包括的信息。
如图7中(e)所示,会议状态为:已入会,点对点会议。由于是点对点会议,不涉及用户在会议中的角色信息。会议操作提示信息可以包括帮助提示区中包括的信息。
如图7中(f)所示,会议状态为:非语音发起的点对点呼叫,正在呼叫。会议操作提示信息可以包括帮助提示区中包括的信息。
可选的,S401中,根据会议状态和用户的身份信息确定用户的类型,可以包括:
根据用户的身份信息获取用户的历史会议记录,历史会议记录包括下列数据中的至少一项:不同会议控制任务的最近一次发生时间、任务累计使用次数和任务成功率。
根据会议状态和用户的历史会议记录确定用户的类型。
下面结合图8进行说明。
图8为本申请实施例二提供的历史会议记录的示意图。如图8所示,历史会议记录库中存储有用户的历史会议记录。针对一个具体的用户,例如,用户1。用户1的历史会议记录包括多个会议控制任务的数据。例如,任务1~任务n。针对用户1的每个任务,可以包括下列中的至少一项:最近一次发生时间、任务累计使用次数和任务成功率。其中,最近一次发生时间与当前时间越近、任务累计使用次数越多、任务成功率越高,说明用户对该会议控制任务越熟悉。反之,最近一次发生时间与当前时间越远、任务累计使用次数越少、任务成功率越低,说明用户对该会议控制任务越不熟悉。根据会议状态和用户的历史会议记录可以确定用户的类型。
需要说明的是,本实施例对于历史会议记录库中记录数据的方式不做限定。例如,可以采用表格的形式存储数据。
可选的,历史会议记录库可以周期性更新。
可选的,根据会议状态和用户的历史会议记录确定用户的类型,可以包括:
获取用户的历史会议记录中与会议状态关联的至少一种会议控制任务的数据。
根据至少一种会议控制任务的数据,确定用户的类型。
具体的,对于一个会议,从创建会议开始直至会议结束的整个过程中,会议状态可以有多种。不同的会议状态涉及的会议控制任务也可能不同。通过对与会议状态关联的至少一种会议控制任务的数据确定用户的类型,进一步提高了确定用户类型的准确性。
可选的,根据至少一种会议控制任务的数据,确定用户的类型,可以包括:
针对每种会议控制任务,若该会议控制任务的数据中包括最近一次发生时间、且最近一次发生时间与当前时间之间的时间间隔大于或等于第一预设阈值,和/或,若该会议控制任务的数据中包括任务累计使用次数、且任务累计使用次数小于或等于第二预设阈值,和/或,若该会议控制任务的数据中包括任务成功率、且任务成功率小于或等于第三预设阈值,则确定用户相对于该会议控制任务为生手用户。
具体的,在确定用户针对一个会议控制任务为生手用户的条件上,对于最近一次发生时间、任务累计使用次数和任务成功率,只要其中存在一种数据满足其对应的用户为生手用户的条件,就可以确定用户为生手用户。
例如,如果会议控制任务的数据中包括最近一次发生时间和任务成功率。在一种实现方式中,最近一次发生时间与当前时间之间的时间间隔大于或等于第一预设阈值。任务成功率大于第三预设阈值。由于最近一次发生时间满足其对应的用户为生手用户的条件,即使任务成功率不满足其对应的用户为生手用户的条件,也确定该用户为生手用户。
需要说明的是,本实施例对于第一预设阈值、第二预设阈值、第三预设阈值的具体取值不做限定。
需要说明的是,若确定用户为生手用户使用的数据种类为多种,对于判断各种数据是否满足对应的用户为生手用户条件的执行顺序不做限定。
可选的,根据至少一种会议控制任务的数据,确定用户的类型,可以包括:
针对每种会议控制任务,若该会议控制任务的数据中包括的最近一次发生时间、任务累计使用次数和任务成功率中的至少一种均满足各自对应的预设条件,则确定用户相对于该会议控制任务为熟练用户。其中,最近一次发生时间对应的预设条件为最近一次发生时间与当前时间之间的时间间隔小于第一预设阈值,任务累计使用次数对应的预设条件为任务累计使用次数大于第二预设阈值,任务成功率对应的预设条件为任务成功率大于第三预设阈值。
具体的,在确定用户针对一个会议控制任务为熟练用户的条件上,对于最近一次发生时间、任务累计使用次数和任务成功率,必须所有的数据均满足用户为生手用户的条件,才可以确定用户为熟练用户。
例如,如果会议控制任务的数据中包括最近一次发生时间和任务成功率,只有在最近一次发生时间与当前时间之间的时间间隔小于第一预设阈值,且,任务成功率大于第三预设阈值时,才可以确定用户针对该会议控制任务为熟练用户。
又例如,如果会议控制任务的数据中包括最近一次发生时间、任务累计使用次数和任务成功率,只有在最近一次发生时间与当前时间之间的时间间隔小于第一预设阈值,且,任务累计使用次数大于第二预设阈值,且,任务成功率大于第三预设阈值时,才可以确定用户针对该会议控制任务为熟练用户。
需要说明的是,若确定用户为熟练用户使用的数据种类为多种,对于判断各种数据是否满足对应的用户为熟练用户条件的执行顺序不做限定。
本实施例提供一种语音用户界面的显示方法,根据会议状态和用户的身份信息确定用户的类型。若用户的类型指示用户为生手用户,则根据会议状态生成会议操作提示信息和语音输入界面。若用户的类型指示用户为熟练用户,则生成语音输入界面。对于生手用户,通过会议操作提示信息,可以对生手用户起到很好的引导作用,提升了新手用户输入语音的效率和准确率,提升了完成会议控制任务的成功率。对于熟练用户,避免显示冗余的提示信息,节省了引导流程,提升了熟练用户完成会议控制任务的效率。满足了不同用户对于会议***的不同需求,提升了用户感受。
本申请实施例三还提供一种语音用户界面的显示方法。本实施例在图3所示实施例的基础上,提供了用户语音指令用于唤醒会议终端后执行会议控制任务的场景下,语音用户界面的显示方法的一种实现方式。
在本实施例中,用户语音指令用于唤醒会议终端后执行会议控制任务。若用户语音指令的运行结果包括多个候选对象。S304,根据用户的身份信息、会议终端当前的会议状态和用户语音指令,生成与用户匹配的用户界面信息,可以包括:
根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息。
下面通过示例进行说明。
假设,用户1输入的语音为“小微,呼叫李军”。生成的用户语音指令用于唤醒会议终端后呼叫李军。但是公司中有多个李军。而且,由于用户输入的是语音,与“李军”语音相同的名字有很多,例如,李俊、李君,等等。此时,用户语音指令的运行结果包括多个候选对象。需要根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息。从而提升显示的候选结果与用户的匹配度,提升了用户感受。
可选的,根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息,可以包括:
获取各候选对象与用户的身份信息之间的相关度。
根据各相关度,对多个候选对象进行排序,生成与用户匹配的用户界面信息。
下面通过示例进行说明。
可选的,在一个示例中,图9为本申请实施例三提供的语音用户界面在一种场景下的示意图。如图9所示,用户1想要呼叫李军。用户1的部门为部门1。首先,将与“李军”名字相同的用户筛选出来,认为名字相同的相关度更高。将名字不同的用户排在后面。然后,对于多个“李军”,可以按照部门的相关度,将与用户1相同部门的“李军”排在前面。得到最终的排序。
可选的,在另一个示例中,图10为本申请实施例三提供的语音用户界面在另一种场景下的示意图。如图10所示,用户2想要呼叫李军。用户2的部门为部门3。首先,将与用户2部门相同的用户筛选出来,认为部门相同的相关度更高。将名字不同的用户排在后面。然后,对于部门3中的多个候选用户,将与“李军”名字相同的用户排在前面。得到最终的排序。
可选的,若用户语音指令的运行结果唯一,则直接显示用户界面信息。
下面通过示例进行说明。
可选的,图11为本申请实施例三提供的语音用户界面在又一种场景下的示意图。如图11所示,如果公司中只有一个“李军”,则直接拨打李军的电话,并显示呼叫“李军”的界面。
本实施例提供一种语音用户界面的显示方法,当用户语音指令用于唤醒会议终端后执行会议控制任务,若用户语音指令的运行结果包括多个候选对象,则根据用户的身份信息对多个候选对象进行排序,生成与用户匹配的用户界面信息。提升了显示的候选结果与用户的匹配度,提升了用户感受。
图12为本申请实施例提供的语音用户界面的显示装置的结构示意图。如图12所示,本实施例提供的语音用户界面的显示装置120可以包括:接收模块1201、第一获取模块1202、第二获取模块1203、生成模块1204和显示模块1205。
其中,接收模块1201,用于接收用户输入会议终端的语音信息时,采集所述用户的声音;所述语音信息包括语音唤醒词或以所述语音唤醒词开头的语音信息;
第一获取模块1202,用于根据所述用户的声音获取所述用户的身份信息;
第二获取模块1203,用于根据所述语音信息获取用户语音指令;
生成模块1204,用于根据所述用户的身份信息、会议终端的会议状态和所述用户语音指令,生成与所述用户匹配的用户界面信息;
显示模块1205,用于显示所述用户界面信息。
在一种可能的实现方式中,所述用户语音指令用于唤醒所述会议终端;所述生成模块1204,包括:
第一确定单元,用于根据所述会议状态和所述用户的身份信息确定所述用户的类型,所述用户的类型用于指示所述用户通过输入语音信息完成会议控制任务的熟悉程度;
第一生成单元,用于若所述用户的类型指示所述用户为生手用户,则根据所述会议状态生成会议操作提示信息和语音输入界面。
在一种可能的实现方式中,所述生成模块1204还包括:
第二生成单元,用于若所述用户的类型指示所述用户为熟练用户,则生成语音输入界面。
在一种可能的实现方式中,若所述会议状态指示所述用户已经加入会议,所述生成模块1204还包括:
第一获取单元,用于获取所述用户在所述会议中的角色信息;
所述第一生成单元具体用于:
根据所述会议状态和所述角色信息,生成所述会议操作提示信息和所述语音输入界面。
在一种可能的实现方式中,所述第一确定单元,包括:
第一获取子单元,用于根据所述用户的身份信息获取所述用户的历史会议记录,所述历史会议记录包括下列数据中的至少一项:不同会议控制任务的最近一次发生时间、任务累计使用次数和任务成功率;
确定子单元,用于根据所述会议状态和所述用户的历史会议记录确定所述用户的类型。
在一种可能的实现方式中,所述确定子单元具体用于:
获取所述用户的历史会议记录中与所述会议状态关联的至少一种会议控制任务的数据;
根据所述至少一种会议控制任务的数据,确定所述用户的类型。
在一种可能的实现方式中,所述确定子单元具体用于:
针对每种会议控制任务,若该会议控制任务的数据中包括最近一次发生时间、且最近一次发生时间与当前时间之间的时间间隔大于或等于第一预设阈值,和/或,若该会议控制任务的数据中包括任务累计使用次数、且任务累计使用次数小于或等于第二预设阈值,和/或,若该会议控制任务的数据中包括任务成功率、且任务成功率小于或等于第三预设阈值,则确定所述用户相对于该会议控制任务为生手用户;
针对每种会议控制任务,若该会议控制任务的数据中包括的最近一次发生时间、任务累计使用次数和任务成功率中的至少一种均满足各自对应的预设条件,则确定所述用户相对于该会议控制任务为熟练用户;其中,最近一次发生时间对应的预设条件为最近一次发生时间与当前时间之间的时间间隔小于所述第一预设阈值,任务累计使用次数对应的预设条件为任务累计使用次数大于所述第二预设阈值,任务成功率对应的预设条件为任务成功率大于所述第三预设阈值。
在一种可能的实现方式中,所述用户语音指令用于唤醒所述会议终端后执行会议控制任务,所述用户语音指令的运行结果包括多个候选对象;所述生成模块1204,包括:
第三生成单元,用于根据所述用户的身份信息对所述多个候选对象进行排序,生成与所述用户匹配的用户界面信息。
在一种可能的实现方式中,所述第三生成单元,包括:
第二获取子单元,用于获取各候选对象与所述用户的身份信息之间的相关度;
生成子单元,用于根据各所述相关度,对所述多个候选对象进行排序,生成与所述用户匹配的用户界面信息。
在一种可能的实现方式中,所述第二获取模块1203具体用于:
对所述语音信息进行语义理解,生成所述用户语音指令;
或者,
向服务器发送所述语音信息;
接收所述服务器发送的所述用户语音指令,所述用户语音指令为所述服务器对所述语音信息进行语义理解后生成的。
在一种可能的实现方式中,接收模块1201,还用于:
在接收用户输入会议终端的语音信息时,采集用户的头像;
第一获取模块1202,具体用于:根据用户的声音和头像获取用户的身份信息。
在一种可能的实现方式中,所述第一获取模块1202,包括:
第二确定单元,用于根据所述用户的声音确定所述用户相对于所述会议终端的位置;
采集单元,用于根据所述用户相对于所述会议终端的位置,采集所述用户的人脸信息;
第三确定单元,用于根据所述用户的人脸信息和人脸信息库,确定所述用户的身份信息。
在一种可能的实现方式中,所述第一获取模块1202,还包括:
第二获取单元,用于根据所述用户的声音获取所述用户的声纹信息;
第四确定单元,用于根据所述用户的声纹信息和声纹信息库,确定所述用户的身份信息。
本申请实施例提供的语音用户界面的显示装置,可以用于执行本申请上述语音用户界面的显示方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
图13为本申请实施例提供的会议终端的结构示意图。如图13所示,本实施例提供的会议终端130可以包括:处理器1301、存储器1302和显示器1303;
其中,所述存储器1302,用于存储程序指令;
所述显示器1303,用于根据所述处理器1301的控制显示用户界面信息;
所述处理器1301,用于调用并执行所述存储器1302中存储的程序指令,当所述处理器1301执行所述存储器1302存储的程序指令时,所述会议终端用于执行本申请上述语音用户界面的显示方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
可以理解的是,图13仅仅示出了会议终端的简化设计。在其他的实施方式中,会议终端还可以包含任意数量的收发器、处理器、存储器和/或通信单元等,本申请实施例中对此并不作限制。此外,会议终端中还可以包括麦克风、扬声器、按键等功能单元。
本申请实施例还提供一种芯片***,该芯片***包括处理器,还可以包括存储器,用于实现本申请上述语音用户界面的显示方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。该芯片***可以由芯片构成,也可以包含芯片和其他分立器件。
本申请实施例还提供一种程序,该程序在被处理器执行时用于执行本申请上述语音用户界面的显示方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行本申请上述语音用户界面的显示方法实施例中的技术方案,其实现原理和技术 效果类似,此处不再赘述。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行本申请上述语音用户界面的显示方法实施例中的技术方案,其实现原理和技术效果类似,此处不再赘述。
本申请实施例中涉及的处理器可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
本申请实施例中涉及的存储器可以是非易失性存储器,比如硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD)等,还可以是易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM)。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用硬件加软件功能单元的形式实现。
本领域普通技术人员可以理解,在本申请的各种实施例中,上述各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在上述各实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用 介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。

Claims (15)

  1. 一种语音用户界面的显示方法,其特征在于,包括:
    接收用户输入会议终端的语音信息时,采集所述用户的声音;所述语音信息包括语音唤醒词或以所述语音唤醒词开头的语音信息;
    根据所述用户的声音获取所述用户的身份信息;
    根据所述语音信息获取用户语音指令;
    根据所述用户的身份信息、会议终端的会议状态和所述用户语音指令,生成与所述用户匹配的用户界面信息;
    显示所述用户界面信息。
  2. 根据权利要求1所述的方法,其特征在于,所述用户语音指令用于唤醒所述会议终端;所述根据所述用户的身份信息、会议终端的会议状态和所述用户语音指令,生成与所述用户匹配的用户界面信息,包括:
    根据所述会议状态和所述用户的身份信息确定所述用户的类型,所述用户的类型用于指示所述用户通过输入语音信息完成会议控制任务的熟悉程度;
    若所述用户的类型指示所述用户为生手用户,则根据所述会议状态生成会议操作提示信息和语音输入界面。
  3. 根据权利要求2所述的方法,其特征在于,还包括:
    若所述用户的类型指示所述用户为熟练用户,则生成语音输入界面。
  4. 根据权利要求2所述的方法,其特征在于,若所述会议状态指示所述用户已经加入会议,还包括:
    获取所述用户在所述会议中的角色信息;
    所述根据所述会议状态生成会议操作提示信息和语音输入界面,包括:
    根据所述会议状态和所述角色信息,生成所述会议操作提示信息和所述语音输入界面。
  5. 根据权利要求2-4任一项所述的方法,其特征在于,所述根据所述会议状态和所述用户的身份信息确定所述用户的类型,包括:
    根据所述用户的身份信息获取所述用户的历史会议记录,所述历史会议记录包括下列数据中的至少一项:不同会议控制任务的最近一次发生时间、任务累计使用次数和任务成功率;
    根据所述会议状态和所述用户的历史会议记录确定所述用户的类型。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述会议状态和所述用户的历史会议记录确定所述用户的类型,包括:
    获取所述用户的历史会议记录中与所述会议状态关联的至少一种会议控制任务的数据;
    根据所述至少一种会议控制任务的数据,确定所述用户的类型。
  7. 根据权利要求6所述的方法,其特征在于,所述根据所述至少一种会议控制任务的数据,确定所述用户的类型,包括:
    针对每种会议控制任务,若该会议控制任务的数据中包括最近一次发生时间、且最近一次发生时间与当前时间之间的时间间隔大于或等于第一预设阈值,和/或,若该会议控制 任务的数据中包括任务累计使用次数、且任务累计使用次数小于或等于第二预设阈值,和/或,若该会议控制任务的数据中包括任务成功率、且任务成功率小于或等于第三预设阈值,则确定所述用户相对于该会议控制任务为生手用户;
    针对每种会议控制任务,若该会议控制任务的数据中包括的最近一次发生时间、任务累计使用次数和任务成功率中的至少一种均满足各自对应的预设条件,则确定所述用户相对于该会议控制任务为熟练用户;其中,最近一次发生时间对应的预设条件为最近一次发生时间与当前时间之间的时间间隔小于所述第一预设阈值,任务累计使用次数对应的预设条件为任务累计使用次数大于所述第二预设阈值,任务成功率对应的预设条件为任务成功率大于所述第三预设阈值。
  8. 根据权利要求1所述的方法,其特征在于,所述用户语音指令用于唤醒所述会议终端后执行会议控制任务,所述用户语音指令的运行结果包括多个候选对象;所述根据所述用户的身份信息、会议终端当前的会议状态和所述用户语音指令,生成与所述用户匹配的用户界面信息,包括:
    根据所述用户的身份信息对所述多个候选对象进行排序,生成与所述用户匹配的用户界面信息。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述用户的身份信息对所述多个候选对象进行排序,生成与所述用户匹配的用户界面信息,包括:
    获取各候选对象与所述用户的身份信息之间的相关度;
    根据各所述相关度,对所述多个候选对象进行排序,生成与所述用户匹配的用户界面信息。
  10. 根据权利要求1-9任一项所述的方法,其特征在于,所述根据所述语音信息获取用户语音指令,包括:
    对所述语音信息进行语义理解,生成所述用户语音指令;
    或者,
    向服务器发送所述语音信息;
    接收所述服务器发送的所述用户语音指令,所述用户语音指令为所述服务器对所述语音信息进行语义理解后生成的。
  11. 根据权利要求1-10任一项所述的方法,其特征在于,还包括:
    在所述接收用户输入会议终端的语音信息时,采集所述用户的头像;
    所述根据所述用户的声音获取所述用户的身份信息,包括:
    根据所述用户的声音和头像获取所述用户的身份信息。
  12. 根据权利要求11所述的方法,其特征在于,所述根据所述用户的声音和头像获取所述用户的身份信息,包括:
    根据所述用户的声音确定所述用户相对于所述会议终端的位置;
    根据所述用户相对于所述会议终端的位置,采集所述用户的人脸信息;
    根据所述用户的人脸信息和人脸信息库,确定所述用户的身份信息。
  13. 根据权利要求12所述的方法,其特征在于,所述根据所述用户的声音和头像获取所述用户的身份信息,还包括:
    根据所述用户的声音获取所述用户的声纹信息;
    根据所述用户的声纹信息和声纹信息库,确定所述用户的身份信息。
  14. 一种会议终端,其特征在于,包括:处理器、存储器和显示器;
    其中,所述存储器,用于存储程序指令;
    所述显示器,用于根据所述处理器的控制显示用户界面信息;
    所述处理器,用于调用并执行所述存储器中存储的程序指令,当所述处理器执行所述存储器存储的程序指令时,所述会议终端用于执行如权利要求1至13中任一项所述的方法。
  15. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在计算机上运行时,使得计算机执行如权利要求1至13中任一项所述的方法。
PCT/CN2019/118081 2018-12-03 2019-11-13 语音用户界面的显示方法和会议终端 WO2020114213A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19894084.3A EP3869504A4 (en) 2018-12-03 2019-11-13 VOICE UI DISPLAY PROCEDURE AND CONFERENCE DEVICE
US17/331,953 US20210286867A1 (en) 2018-12-03 2021-05-27 Voice user interface display method and conference terminal

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811467420.5 2018-12-03
CN201811467420.5A CN111258528B (zh) 2018-12-03 2018-12-03 语音用户界面的显示方法和会议终端

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/331,953 Continuation US20210286867A1 (en) 2018-12-03 2021-05-27 Voice user interface display method and conference terminal

Publications (1)

Publication Number Publication Date
WO2020114213A1 true WO2020114213A1 (zh) 2020-06-11

Family

ID=70955239

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118081 WO2020114213A1 (zh) 2018-12-03 2019-11-13 语音用户界面的显示方法和会议终端

Country Status (4)

Country Link
US (1) US20210286867A1 (zh)
EP (1) EP3869504A4 (zh)
CN (1) CN111258528B (zh)
WO (1) WO2020114213A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113314115A (zh) * 2021-05-28 2021-08-27 深圳创维-Rgb电子有限公司 终端设备的语音处理方法、终端设备及可读存储介质
CN113450795A (zh) * 2021-06-28 2021-09-28 深圳七号家园信息技术有限公司 一种具有语音唤醒功能的图像识别方法及***
EP3955099A1 (en) * 2020-08-11 2022-02-16 Beijing Xiaomi Mobile Software Co., Ltd. Method and device for controlling the operation mode of a terminal device, and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112445475B (zh) * 2020-11-06 2024-07-05 杭州讯酷科技有限公司 一种基于数据表推荐的***快速构建方法
CN112287691B (zh) * 2020-11-10 2024-02-13 深圳市天彦通信股份有限公司 会议录音方法及相关设备
CN113301211B (zh) * 2020-12-16 2024-04-19 阿里巴巴集团控股有限公司 通话处理方法、装置、通信***及数据展示方法
CN112788004B (zh) * 2020-12-29 2023-05-09 上海掌门科技有限公司 一种通过虚拟会议机器人执行指令的方法、设备与计算机可读介质
CN113271431B (zh) * 2021-05-18 2022-11-29 中国工商银行股份有限公司 一种界面展示方法、装置及设备
WO2023065205A1 (en) * 2021-10-21 2023-04-27 Citrix Systems, Inc. Voice assisted remote screen sharing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103730120A (zh) * 2013-12-27 2014-04-16 深圳市亚略特生物识别科技有限公司 电子设备的语音控制方法及***
CN106887227A (zh) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 一种语音唤醒方法及***
US20170359467A1 (en) * 2016-06-10 2017-12-14 Glen A. Norris Methods and Apparatus to Assist Listeners in Distinguishing Between Electronically Generated Binaural Sound and Physical Environment Sound
US20180293221A1 (en) * 2017-02-14 2018-10-11 Microsoft Technology Licensing, Llc Speech parsing with intelligent assistant
CN108881783A (zh) * 2017-05-09 2018-11-23 腾讯科技(深圳)有限公司 实现多人会话的方法和装置、计算机设备和存储介质

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8165884B2 (en) * 2008-02-15 2012-04-24 Microsoft Corporation Layered prompting: self-calibrating instructional prompting for verbal interfaces
US20090210491A1 (en) * 2008-02-20 2009-08-20 Microsoft Corporation Techniques to automatically identify participants for a multimedia conference event
US20120089392A1 (en) * 2010-10-07 2012-04-12 Microsoft Corporation Speech recognition user interface
US8768707B2 (en) * 2011-09-27 2014-07-01 Sensory Incorporated Background speech recognition assistant using speaker verification
JP6028429B2 (ja) * 2012-07-10 2016-11-16 富士ゼロックス株式会社 表示制御装置、サービス提供装置、及びプログラム
US20140282178A1 (en) * 2013-03-15 2014-09-18 Microsoft Corporation Personalized community model for surfacing commands within productivity application user interfaces
CN105094738B (zh) * 2015-08-24 2018-08-31 联想(北京)有限公司 一种控制方法及电子设备
CN105306872B (zh) * 2015-10-21 2019-03-01 华为技术有限公司 控制多点视频会议的方法、装置和***
CN105867720A (zh) * 2015-12-15 2016-08-17 乐视致新电子科技(天津)有限公司 一种信息提示方法及装置
CN107370981A (zh) * 2016-05-13 2017-11-21 中兴通讯股份有限公司 一种视频会议中参会人员的信息提示方法和装置
CN106331582A (zh) * 2016-09-23 2017-01-11 北京联华博创科技有限公司 远程视频开庭方法及装置
US11310294B2 (en) * 2016-10-31 2022-04-19 Microsoft Technology Licensing, Llc Companion devices for real-time collaboration in communication sessions
KR101889279B1 (ko) * 2017-01-16 2018-08-21 주식회사 케이티 음성 명령에 기반하여 서비스를 제공하는 시스템 및 방법
US10438594B2 (en) * 2017-09-08 2019-10-08 Amazon Technologies, Inc. Administration of privileges by speech for voice assistant system
US11127405B1 (en) * 2018-03-14 2021-09-21 Amazon Technologies, Inc. Selective requests for authentication for voice-based launching of applications
US10867610B2 (en) * 2018-05-04 2020-12-15 Microsoft Technology Licensing, Llc Computerized intelligent assistant for conferences
US11289100B2 (en) * 2018-10-08 2022-03-29 Google Llc Selective enrollment with an automated assistant
US20200184979A1 (en) * 2018-12-05 2020-06-11 Nice Ltd. Systems and methods to determine that a speaker is human using a signal to the speaker

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103730120A (zh) * 2013-12-27 2014-04-16 深圳市亚略特生物识别科技有限公司 电子设备的语音控制方法及***
CN106887227A (zh) * 2015-12-16 2017-06-23 芋头科技(杭州)有限公司 一种语音唤醒方法及***
US20170359467A1 (en) * 2016-06-10 2017-12-14 Glen A. Norris Methods and Apparatus to Assist Listeners in Distinguishing Between Electronically Generated Binaural Sound and Physical Environment Sound
US20180293221A1 (en) * 2017-02-14 2018-10-11 Microsoft Technology Licensing, Llc Speech parsing with intelligent assistant
CN108881783A (zh) * 2017-05-09 2018-11-23 腾讯科技(深圳)有限公司 实现多人会话的方法和装置、计算机设备和存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3869504A4

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3955099A1 (en) * 2020-08-11 2022-02-16 Beijing Xiaomi Mobile Software Co., Ltd. Method and device for controlling the operation mode of a terminal device, and storage medium
US11756545B2 (en) 2020-08-11 2023-09-12 Beijing Xiaomi Mobile Software Co., Ltd. Method and device for controlling operation mode of terminal device, and medium
CN113314115A (zh) * 2021-05-28 2021-08-27 深圳创维-Rgb电子有限公司 终端设备的语音处理方法、终端设备及可读存储介质
CN113450795A (zh) * 2021-06-28 2021-09-28 深圳七号家园信息技术有限公司 一种具有语音唤醒功能的图像识别方法及***

Also Published As

Publication number Publication date
EP3869504A1 (en) 2021-08-25
CN111258528B (zh) 2021-08-13
US20210286867A1 (en) 2021-09-16
CN111258528A (zh) 2020-06-09
EP3869504A4 (en) 2022-04-06

Similar Documents

Publication Publication Date Title
WO2020114213A1 (zh) 语音用户界面的显示方法和会议终端
US11095468B1 (en) Meeting summary service
US9329833B2 (en) Visual audio quality cues and context awareness in a virtual collaboration session
US9372543B2 (en) Presentation interface in a virtual collaboration session
US9426421B2 (en) System and method for determining conference participation
US9154531B2 (en) Systems and methods for enhanced conference session interaction
JP7222965B2 (ja) コンピュータによって実現される会議予約方法、装置、機器及び媒体
US10459985B2 (en) Managing behavior in a virtual collaboration session
US9262747B2 (en) Tracking participation in a shared media session
US8204759B2 (en) Social analysis in multi-participant meetings
US11627006B1 (en) Utilizing a virtual assistant as a meeting agenda facilitator
US10297255B2 (en) Data processing system with machine learning engine to provide automated collaboration assistance functions
US20180211223A1 (en) Data Processing System with Machine Learning Engine to Provide Automated Collaboration Assistance Functions
US11824647B2 (en) Promotion of users in collaboration sessions
US10972297B2 (en) Data processing system with machine learning engine to provide automated collaboration assistance functions
US20140241515A1 (en) Location Aware Conferencing System And Method
JP2015526933A (ja) モバイル・デバイスからの開始ディテールの送信
US20240080351A1 (en) Methods and systems for verbal polling during a conference call discussion
KR20170126667A (ko) 회의 기록 자동 생성 방법 및 그 장치
US20190386840A1 (en) Collaboration systems with automatic command implementation capabilities
US11805159B2 (en) Methods and systems for verbal polling during a conference call discussion
US11755340B2 (en) Automatic enrollment and intelligent assignment of settings
US20240021217A1 (en) Methods and systems for pre-recorded participation in a conference
JP2024093360A (ja) コミュニケーション支援システム、コミュニケーション支援装置、コミュニケーション支援方法、及びプログラム
KR20240011841A (ko) 과거 인터렉션에 기초하여 세컨더리 자동화된 어시스턴트에 관련 쿼리 제공

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19894084

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019894084

Country of ref document: EP

Effective date: 20210520

NENP Non-entry into the national phase

Ref country code: DE