WO2024051611A1 - Human-computer interaction method and related device

Human-computer interaction method and related device (人机交互方法及相关装置)

Info

Publication number
WO2024051611A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice input
user
wake
terminal
free
Application number
PCT/CN2023/116615
Other languages
English (en)
French (fr)
Inventor
李凌飞
沈波
任亮亮
张跃
徐平
吴奇强
吴雪晨
谭彬林
耿安峰
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2024051611A1

Classifications

    • G10L 15/18 Speech classification or search using natural language modelling
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G06F 40/30 Semantic analysis
    • G06F 40/35 Discourse or dialogue representation

Definitions

  • the present application relates to the field of terminal technology, and in particular to human-computer interaction methods and related devices.
  • voice interaction has become one of the commonly used and important human-computer interaction methods.
  • most voice interactions require the user to first wake up the terminal through a preset wake-up word and then carry out subsequent interactions. This method is cumbersome and results in a poor user experience.
  • Some manufacturers also provide a wake-up-free function, that is, the user can directly input predefined wake-up-free instructions without waking up the terminal in advance.
  • However, the predefined wake-up-free instructions are fixed and limited, and the terminal is easily triggered by accident while the user is chatting, which affects the user experience.
  • This application provides a human-computer interaction method and related devices, in order to improve the user's interactive experience.
  • In a first aspect, this application provides a human-computer interaction method, which can be executed by a terminal, by a component configured in the terminal (such as a chip or chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
  • the method includes: receiving a first voice input from a user; and making a corresponding response to the first voice input if it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction.
  • the above-mentioned first wake-up-free instruction is used to instruct the terminal to perform the operation corresponding to the first wake-up-free instruction without inputting a preset wake-up word.
  • the terminal receives the first voice input from the user, and when the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal makes a corresponding response to it. That is, without waking up the terminal in advance, even if the sentence input by the user's voice is not the predefined first wake-up-free instruction, the terminal can respond as long as the input is semantically similar to that instruction.
  • This helps solve the problem that the predefined first wake-up-free instructions are fixed and limited, which causes the terminal to be unresponsive, and in turn helps improve the user's interactive experience.
  • before making a corresponding response to the first voice input, the above method further includes: confirming the semantics of the first voice input to the user.
  • after the terminal receives the first voice input, it can confirm with the user whether the recognized semantics of the first voice input are correct. On the one hand, this improves accuracy; on the other hand, it prevents the terminal from responding when the user mentions the first voice input by accident. For example, if the user mentions the first voice input by mistake, the user can give a negative reply when the terminal asks for confirmation, so that the terminal does not continue to perform the corresponding operation, which helps improve the user's experience.
  • confirming the semantics of the first voice input to the user includes: confirming the semantics of the first voice input to the user through a prompt box and/or voice broadcast.
  • the terminal can confirm the semantics of the first voice input to the user through a prompt box containing those semantics, through voice broadcast, or through a combination of the prompt box and voice broadcast.
  • the above method further includes: prompting the user with a first wake-up-free instruction.
  • the terminal can also prompt the user to directly use the predefined first wake-up-free command next time.
  • the terminal may prompt the user with the first wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
  • before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; if the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input to the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
  • the terminal learns and generates a second wake-up-free instruction through the above method.
  • the second wake-up-free instruction can be used to instruct the terminal to perform a corresponding operation without inputting a preset wake-up word.
  • if the terminal receives the second voice input from the user and the second voice input is semantically similar to the first wake-up-free instruction, the terminal confirms the semantics with the user. If the user confirms that the semantics of the second voice input are correct, a corresponding second wake-up-free instruction is generated, so that the next time the terminal receives the second voice input without being woken up in advance, it can respond to it.
  • In this way, the number of wake-up-free instructions that can instruct the terminal to perform corresponding operations without inputting a preset wake-up word is greatly increased, which in turn helps improve the user's interactive experience.
  • in some implementations, that the first voice input is semantically similar to the predefined first wake-up-free instruction includes: the first voice input being the same as the second wake-up-free instruction.
  • after receiving the first voice input, the terminal determines whether the first voice input and the first wake-up-free instruction are semantically similar.
  • One way is to perform semantic analysis on the first voice input and the predefined first wake-up-free instruction to determine whether their semantics are similar.
  • Another way is that the terminal can determine whether the first voice input and the generated second wake-up-free instruction are the same.
  • the second wake-up-free instruction is an instruction generated based on the second voice input that is semantically similar to the first wake-up-free instruction.
  • if the first voice input is the same as the generated second wake-up-free instruction, the first voice input is semantically similar to the first wake-up-free instruction, so the terminal can also respond to the first voice input.
  • the above two methods can be used in combination or separately, which greatly improves the terminal's flexibility in determining whether the first voice input and the first wake-up-free instruction are semantically similar.
  • receiving the second voice input from the user includes: receiving the second voice input multiple times continuously within a preset time range.
  • when the terminal receives the second voice input multiple times continuously within the preset time range, it confirms the semantics of the second voice input to the user. This effectively avoids the case where the user accidentally mentions the second voice input and the terminal mistakenly concludes that the user wants it to perform the corresponding operation, which helps improve the user's interactive experience.
  • In a second aspect, this application provides a human-computer interaction method, which can likewise be executed by a terminal, by a component configured in the terminal (such as a chip or chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
  • the method includes: receiving a first voice input from the user; and making a corresponding response to the first voice input if the preset wake-up word is not received but the first voice input contains the target object,
  • the target object is an object whose number of mentions reaches a preset threshold in other voice inputs received before the first voice input, and the preset wake-up word is used to wake up the terminal.
  • when the terminal is not woken up in advance and receives the first voice input from the user, if the first voice input contains an object whose number of mentions in previous voice inputs reaches the preset threshold, the terminal responds accordingly. That is, by learning from previous voice inputs and saving target objects whose mention counts have reached a preset threshold, the terminal can respond to any received voice input that contains such a target object even without being woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interactive experience.
  • in some implementations, before receiving the first voice input from the user, the above method further includes: receiving a preset wake-up word from the user; receiving a second voice input from the user; and, when the number of times a first object contained in the second voice input is mentioned in the second voice input and its previous voice inputs exceeds a preset threshold, determining the first object as the target object.
  • the terminal may record the number of times the first object is mentioned in voice inputs. If that number exceeds a preset threshold, the first object is determined as the target object, so that the user can subsequently issue a voice input containing the target object without waking up the terminal in advance, and the terminal can respond after receiving it. That is, there is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interactive experience.
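  • The following is a minimal sketch, in Python, of the mention-counting idea described above. The class name, the threshold value, and the way objects are extracted from a voice input are illustrative assumptions, not details disclosed by the application.

```python
from collections import Counter

class TargetObjectLearner:
    """Learns 'target objects' from voice inputs, as sketched above."""

    def __init__(self, mention_threshold: int = 3):
        self.mention_threshold = mention_threshold  # the preset threshold
        self.mention_counts = Counter()
        self.target_objects: set[str] = set()

    def observe(self, objects_in_input: list[str]) -> None:
        """Record objects mentioned in a voice input that followed a wake-up word."""
        for obj in objects_in_input:
            self.mention_counts[obj] += 1
            if self.mention_counts[obj] >= self.mention_threshold:
                self.target_objects.add(obj)  # the object becomes a target object

    def should_respond_without_wake_word(self, objects_in_input: list[str]) -> bool:
        """A wake-word-free input is answered if it contains a learned target object."""
        return any(obj in self.target_objects for obj in objects_in_input)

# "song B" becomes a target object after three mentions, so a later voice
# input containing it is answered even though no wake-up word was received.
learner = TargetObjectLearner(mention_threshold=3)
for _ in range(3):
    learner.observe(["song B"])
print(learner.should_respond_without_wake_word(["song B"]))  # True
```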
  • the above method further includes: based on the target object, generating a wake-up-free instruction including the target object; and prompting the user for the wake-up-free instruction.
  • the terminal can generate a wake-up-free instruction including the target object based on the target object, and prompt the user that the above wake-up-free instruction can be used directly next time, without waking up the terminal in advance, and the terminal can respond accordingly.
  • the terminal may prompt the user with the above wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
  • In a third aspect, this application provides a human-computer interaction method, which can be executed by a terminal, by a component configured in the terminal (such as a chip or chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
  • the method includes: receiving a first voice input from the user, the first voice input belonging to a first instruction set, where the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions; and, when a preset condition is met, responding to the first voice input.
  • after the terminal receives the first voice input that is semantically similar to a predefined wake-up-free instruction, it responds to the first voice input when the preset condition is met. That is to say, for a first voice input whose semantics are similar to those of a predefined wake-up-free instruction, the terminal responds only if the preset condition is met, rather than under all circumstances. This prevents the terminal from responding when the user mentions the first voice input by mistake. It is conceivable that the first voice input may be more colloquial than the predefined wake-up-free instructions; if a response were made under any circumstances, the terminal would likely be triggered frequently during the user's ordinary conversation. Therefore, by responding only when the preset condition is met, the user's interactive experience is greatly improved.
  • the above-mentioned preset condition includes at least one of the following: the number of users within a preset range from the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to a preset group; or the time when the first voice input is received falls within a preset period.
  • the number of users within the preset range from the terminal not exceeding the threshold means that the first voice input is responded to when there are few users near the terminal. It is not difficult to understand that if there are few surrounding users, the possibility that the user mentioned the first voice input by mistake is smaller, that is, the user may really want the terminal to perform the corresponding operation. Conversely, the more surrounding users there are, the greater the possibility that the first voice input was mentioned by mistake.
  • the user being in a predefined position means, for example, that the terminal responds to the first voice input coming from the user closest to the terminal, or from a user in a specific position to whom the terminal is expected to provide service.
  • the user from whom the first voice input comes does not belong to a preset group, such as children or the elderly. It is understandable that instructions issued by members of a preset group may be dangerous, and the terminal may choose not to respond to them.
  • the time when the first voice input is received falls within a preset period.
  • the preset period may be, for example, working hours. During such periods, the terminal can respond to the above-mentioned first voice input; in other time periods, the terminal responds only to the predefined wake-up-free instructions. In summary, the above preset conditions can effectively prevent the terminal from responding when the user mentions the first voice input by mistake.
  • in some implementations, the above method is applied to a car. The number of users within a preset range from the terminal not exceeding a threshold includes: there being a passenger in the car; or, the above-mentioned user being in a predefined position includes: the user being in the driver's seat.
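  • The following is a minimal sketch of evaluating such preset conditions for the in-car scenario. The data model, the concrete threshold values, and the choice to require all conditions at once are illustrative assumptions; the application only requires at least one of the listed conditions.

```python
from dataclasses import dataclass
from datetime import time

@dataclass
class Context:
    users_in_range: int   # number of users within the preset range of the terminal
    speaker_seat: str     # e.g. "driver", "rear-left"
    speaker_group: str    # e.g. "adult", "child", "elderly"
    now: time

def preset_conditions_met(ctx: Context) -> bool:
    few_users = ctx.users_in_range <= 1                      # assumed threshold
    in_predefined_position = ctx.speaker_seat == "driver"
    not_in_preset_group = ctx.speaker_group not in {"child", "elderly"}
    in_preset_period = time(9, 0) <= ctx.now <= time(18, 0)  # e.g. working hours
    # The application allows any combination of these checks; requiring all
    # of them here is simply one possible policy.
    return few_users and in_predefined_position and not_in_preset_group and in_preset_period

ctx = Context(users_in_range=1, speaker_seat="driver", speaker_group="adult", now=time(10, 30))
print(preset_conditions_met(ctx))  # True
```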
  • In a fourth aspect, the present application provides a human-computer interaction method, which can be executed by a terminal, by a component configured in the terminal (such as a chip or chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
  • the method includes: without receiving a preset wake-up word from the user, determining according to a first voice input from the user that the first voice input is used to request navigation; asking the user for the destination of the requested navigation; and providing navigation services to the user based on the destination fed back by the user.
  • without being woken up in advance, after the terminal receives the first voice input from the user and determines that its intention is to request navigation, it can ask the user for the navigation destination and, according to the user's feedback, determine the destination and provide navigation services. There is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interactive experience.
  • the above method further includes: generating a wake-up-free instruction including a destination; and prompting the user for the wake-up-free instruction.
  • the terminal can generate a wake-up-free command including the above destination, and prompt the user that the above wake-up-free command can be used directly next time, and the terminal can respond accordingly.
  • the terminal may prompt the user with the above wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
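  • The following is a minimal sketch of this wake-word-free navigation flow. All function names here (detect_navigation_intent, ask_user, navigate_to, prompt_user) are illustrative stubs introduced for the example, not interfaces disclosed by the application.

```python
def handle_wake_free_input(text: str, instruction_library: set[str]) -> None:
    if not detect_navigation_intent(text):
        return  # not a navigation request; without a wake-up word, stay silent
    destination = ask_user("Where would you like to navigate to?")
    navigate_to(destination)
    # Generate a wake-up-free instruction containing the destination and
    # prompt the user so that it can be used directly next time.
    new_instruction = f"Navigate to {destination}"
    instruction_library.add(new_instruction)
    prompt_user(f'Next time you can directly say "{new_instruction}".')

def detect_navigation_intent(text: str) -> bool:
    return any(k in text.lower() for k in ("navigate", "directions", "route"))

def ask_user(question: str) -> str:
    print("[voice]", question)
    return "location A"  # stand-in for the user's spoken reply

def navigate_to(destination: str) -> None:
    print(f"Starting navigation to {destination}")

def prompt_user(text: str) -> None:
    print("[toast]", text)

handle_wake_free_input("I need directions", set())
```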
  • In a fifth aspect, this application provides a human-computer interaction method, which can be executed by a terminal, by a component configured in the terminal (such as a chip or chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
  • the method includes: receiving a first voice input from the user, where the first voice input does not belong to the predefined wake-up-free instructions; and, if the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guiding the user to input the above-mentioned first wake-up-free instruction.
  • the terminal receives the first voice input.
  • the first voice input does not belong to the predefined wake-up-free instructions, but the first voice input is semantically similar to the first wake-up-free instruction among the predefined wake-up-free instructions.
  • the terminal guides the user to input the corresponding first wake-up-free instruction, so that after the user inputs the first wake-up-free instruction, the terminal can respond accordingly. Compared with the terminal not responding and giving no prompt, this can greatly improve the user's interactive experience.
  • the above-mentioned guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
  • the terminal can guide the user to input the first wake-up-free instruction through a prompt box containing the first wake-up-free instruction, through voice broadcast, or through a combination of the prompt box and voice broadcast.
  • in some implementations, guiding the user to input the first wake-up-free instruction through a prompt box and/or voice broadcast includes: prompting the user to input the first wake-up-free instruction through a prompt box containing the first wake-up-free instruction; and, when the number of prompts through the prompt box reaches a preset threshold within a preset time period but the user still does not issue the first wake-up-free instruction, guiding the user to input the first wake-up-free instruction through voice broadcast.
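  • The following is a minimal sketch of this escalation logic: the terminal guides the user through a prompt box and falls back to voice broadcast once the prompt-box count reaches the threshold within the time window. The threshold, the window length, and the show_toast/broadcast callbacks are illustrative assumptions.

```python
import time

class WakeFreeInstructionGuide:
    def __init__(self, max_toasts: int = 3, window_seconds: float = 86400.0):
        self.max_toasts = max_toasts          # preset threshold of prompt-box prompts
        self.window_seconds = window_seconds  # preset time period
        self.toast_times: list[float] = []

    def guide(self, instruction: str) -> None:
        now = time.time()
        # Keep only prompt-box timestamps that fall inside the time window.
        self.toast_times = [t for t in self.toast_times if now - t < self.window_seconds]
        if len(self.toast_times) < self.max_toasts:
            self.toast_times.append(now)
            self.show_toast(f'Try saying: "{instruction}"')
        else:
            # The user has not adopted the instruction despite repeated prompts,
            # so escalate to voice broadcast.
            self.broadcast(f'You can say "{instruction}" directly next time.')

    def show_toast(self, text: str) -> None:
        print("[toast]", text)

    def broadcast(self, text: str) -> None:
        print("[voice]", text)

guide = WakeFreeInstructionGuide(max_toasts=2, window_seconds=3600)
for _ in range(3):
    guide.guide("Navigate to the company")  # the third call escalates to voice
```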
  • In a sixth aspect, the present application provides a computer device, including units for implementing the method in any one of the first to fifth aspects and any possible implementation manner thereof. It should be understood that each unit can implement the corresponding function by executing a computer program.
  • In a seventh aspect, the present application provides a computer device, including a processor configured to execute the method described in any one of the first to fifth aspects and any possible implementation manner thereof.
  • the computer device may further include a memory for storing computer readable instructions, and the processor reads the computer readable instructions so that the computer device can implement the methods described in the above aspects.
  • the computer device may also include a communication interface for the computer device to communicate with other devices.
  • the communication interface may be a transceiver, a circuit, a bus, a module or other types of communication interfaces.
  • In an eighth aspect, this application provides a vehicle for implementing the method in any one of the first to fifth aspects and any possible implementation manner thereof, or including the computer device described in the sixth aspect or the seventh aspect.
  • In a ninth aspect, the present application provides a chip system, which includes at least one processor and is used to support the implementation of the functions involved in any one of the above first to fifth aspects and any possible implementation manner thereof, for example, receiving or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, the memory is used to store program instructions and data, and the memory is located within the processor or outside the processor.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • In a tenth aspect, the present application provides a computer-readable storage medium in which computer-readable instructions are stored. When the computer-readable instructions are executed by a computer, the computer implements the method in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects.
  • In an eleventh aspect, the present application provides a computer program product, including computer-readable instructions. When the computer-readable instructions are executed by a computer, the computer implements the method in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects.
  • Figure 1 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a scenario applicable to the human-computer interaction method provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of a known human-computer interaction method
  • Figure 4 is a schematic diagram of another known human-computer interaction method
  • Figure 5 is a schematic flow chart of the first human-computer interaction method provided by the embodiment of the present application.
  • Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by the embodiment of the present application.
  • Figure 7 is a schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • Figure 8 is another schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • Figure 9 is a schematic flow chart of the second human-computer interaction method provided by the embodiment of the present application.
  • Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by the embodiment of the present application.
  • Figure 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of the present application.
  • Figure 12 is a schematic flow chart of the third human-computer interaction method provided by the embodiment of the present application.
  • Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario provided by the embodiment of the present application;
  • Figure 14 is a schematic flow chart of the fourth human-computer interaction method provided by the embodiment of the present application.
  • Figure 15 is an interactive schematic diagram for guiding a user to issue a first wake-up-free instruction provided by an embodiment of the present application
  • Figure 16 is a schematic flow chart of the fifth human-computer interaction method provided by the embodiment of the present application.
  • Figure 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by the embodiment of the present application.
  • the methods provided by the embodiments of this application can be applied to terminals such as mobile phones, tablet computers, smart watches, smart speakers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, personal computers (PCs), ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), and distributed devices.
  • FIG. 1 shows a schematic structural diagram of a terminal 100.
  • the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a microcontroller unit (MCU), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processor, etc.
  • the application processor outputs sound signals through the audio module 170 (such as the speaker 170A, etc.), or displays images or videos through the display screen 194 .
  • the controller may be the nerve center and command center of the terminal 100.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory, which may hold instructions or data that the processor 110 has recently used or used cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory. This avoids repeated access and reduces the waiting time of the processor 110, thus improving system efficiency.
  • the processor 110 can perform different operations to implement different functions by executing instructions.
  • the instruction may be an instruction pre-stored in the memory before the device leaves the factory, or an instruction read from a newly installed application (APP) during use. The embodiments of this application do not limit this.
  • processor 110 may include one or more interfaces.
  • interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a secure digital input and output (SDIO) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a universal synchronous/asynchronous receiver/transmitter (USART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transmit data between the terminal 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other terminals.
  • the interface connection relationships between the modules illustrated in this application are only schematic illustrations and do not constitute a structural limitation on the terminal 100 .
  • in other embodiments, the terminal 100 may also adopt interface connection methods different from those in the above embodiment, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the terminal 100 . While charging the battery 142, the charging management module 140 can also provide power to the terminal through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in terminal 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied to the terminal 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi)), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), a fifth generation (5G) communication system, BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), etc.
  • the terminal 100 can implement display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), etc.
  • terminal 100 may include one or more display screens 194.
  • the display screen 194 can be used to display a prompt box, which contains a predefined wake-up-free instruction.
  • the prompt box is used to prompt the user to directly use the above-mentioned wake-up-free instruction next time; that is, without waking up the terminal in advance, the user can realize voice interaction with the terminal through the above wake-up-free instruction.
  • the terminal 100 can implement the shooting function through the ISP, camera 193, video codec, GPU, display screen 194, application processor, etc.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • terminal 100 may include one or more cameras 193.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • Terminal 100 may support one or more video codecs.
  • the terminal 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, such as saving music, video and other files in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the terminal 100 .
  • the internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and at least one application program required for a function (such as a sound playback function, an image playback function, etc.).
  • the storage data area may store data created during use of the terminal 100 (such as audio data, phone book, etc.).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the terminal 100 can implement audio functions through the audio module 170, such as the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the terminal 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by bringing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mike" or a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the terminal 100 may be equipped with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, etc.
  • the microphone 170C can be used to receive voice input from the user, that is, can be used to collect sound signals from the user.
  • the headphone interface 170D is used to connect wired headphones.
  • the headphone interface 170D can be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the buttons 190 include a power-on button (also called a power button), a volume button, etc.
  • the button 190 may be a mechanical button or a touch button.
  • the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects. Touch operations in different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the terminal 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the terminal 100 may support one or more SIM card interfaces.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the terminal 100 adopts eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
  • the terminal 100 may include more or fewer components than shown, or some components may be combined, or some components may be separated, or may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • FIG 2 is a schematic diagram of a scenario applicable to the method provided by the embodiment of this application.
  • the user can input operations that he wants the terminal to perform through voice to achieve interaction with the terminal (a mobile phone is used as an example in Figure 2).
  • voice interaction has become one of the important and commonly used human-computer interaction methods.
  • the user can first wake up the terminal through the preset wake-up word.
  • the user can first wake up the voice assistant (or smart assistant, intelligent assistant, etc., this application does not limit this) through the preset wake-up word. , and then achieve subsequent interactions.
  • Some manufacturers also provide a wake-up-free function, that is, users do not need to wake up the voice assistant in advance and can directly interact with the terminal through predefined wake-up-free instructions.
  • the predefined wake-up-free instructions are fixed and limited. If the wake-up-free instructions input by the user's voice are inaccurate, the terminal will become unresponsive and the user experience will be poor.
  • Figure 3 shows a known human-computer interaction method.
  • the user wakes up the terminal through a preset wake-up word in advance. More specifically, the user first wakes up the voice assistant in the terminal through a preset wake-up word.
  • for example, if the wake-up word is "Xiaoyi Xiaoyi", then in response to the user inputting "Xiaoyi Xiaoyi" by voice, the voice assistant replies "I am here".
  • the user inputs "Navigate to location A" through voice.
  • the voice assistant replies "Okay, let's start navigating for you" and displays the route to location A through the user interface. It can be seen that the entire interaction process is relatively cumbersome, resulting in poor user experience.
  • FIG 4 shows another known human-computer interaction method.
  • users can directly voice input predefined wake-up-free instructions to interact with the terminal.
  • the user voice inputs "Navigate to the company", and in response to the user's voice input of "Navigate to the company", the terminal displays the route to the company through the user interface, in which the location of the user's company is pre-stored on the terminal.
  • however, if the user's voice input is not a predefined wake-up-free instruction, the voice assistant will not respond. That is, because the predefined wake-up-free instructions are fixed and limited, the voice assistant may be unable to respond to the user's voice input, resulting in poor user experience.
  • this application provides a human-computer interaction method.
  • the method includes: when the terminal receives a first voice input from the user that is semantically similar to a predefined first wake-up-free instruction, responding accordingly to the first voice input. That is, without waking up the terminal in advance, even if the sentence input by the user's voice is not the predefined first wake-up-free instruction, the terminal can recognize it and respond as long as it is semantically similar to the predefined first wake-up-free instruction. This helps alleviate the problem of terminal unresponsiveness caused by the fixed and limited predefined first wake-up-free instructions, which in turn helps improve the user's voice interaction experience.
  • words such as “first” and “second” are used to distinguish identical or similar items with basically the same functions and effects.
  • the first voice input and the second voice input are only used to distinguish different voice inputs, and their order is not limited.
  • words such as "first" and "second" do not limit the number or the order of execution.
  • "at least one item” refers to one item or multiple items.
  • “And/or” describes the association of associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the related objects are in an “or” relationship, but it does not exclude the situation that the related objects are in an “and” relationship. The specific meaning can be understood based on the context.
  • the embodiments shown below can be executed by the terminal, by components configured in the terminal (such as chips, chip systems, etc.), or by a logic module or software that can realize all or part of the terminal's functions, which is not limited in the embodiments of this application.
  • the terminal may have a structure as shown in FIG. 1 , or may have more or less structures than in FIG. 1 , which is not limited in the embodiments of the present application.
  • Figure 5 is a schematic flow chart of the first human-computer interaction method provided by the embodiment of the present application. As shown in Figure 5, the method 500 may include step 501 and step 502. Each step shown in Figure 5 will be described in detail below.
  • Step 501 Receive first voice input from the user.
  • the first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.
  • a first voice input is received from the user.
  • the first voice input may be, for example, "navigate to location A", "navigate to the company", "go to work", "play song B", "I want to listen to song B", etc.
  • the embodiments of this application do not place any restrictions on the specific content of the first voice input.
  • Step 502 If the first voice input is semantically similar to the predefined first wake-up-free instruction, make a corresponding response to the first voice input.
  • the first wake-up-free instruction is used to instruct the terminal to perform an operation corresponding to the first wake-up-free instruction without inputting a preset wake-up word.
  • One possible implementation method is that after the terminal receives the first voice input from the user, it performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on natural language processing (NLP).
  • another possible implementation is that the terminal determines whether the first voice input belongs to a second wake-up-free instruction obtained based on learning of voice input, where the second wake-up-free instruction has similar semantics to the predefined first wake-up-free instruction but uses different wording.
  • for example, the predefined first wake-up-free instruction is "Navigate to the company", and the second wake-up-free instruction obtained based on learning of voice input is "Go to work". The semantics of the two are similar, but the wording is different: the second wake-up-free instruction is more colloquial, while the first wake-up-free instruction is a standard human-computer interaction term.
  • if the first voice input belongs to the second wake-up-free instruction, the terminal makes a corresponding response to the first voice input.
  • the above two methods can also be combined: the terminal can first determine whether the first voice input belongs to the second wake-up-free instruction obtained based on learning of voice input; if it does, the terminal responds to it; if it does not, the terminal further performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on NLP. If the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal makes a corresponding response to the first voice input; if they are not similar, the terminal does not respond.
  • the above-mentioned predefined first wake-up-free instruction and/or the second wake-up-free instruction obtained based on learning of voice input can be stored in an instruction library.
  • after receiving the first voice input, the terminal determines whether to respond based on the first wake-up-free instruction and the second wake-up-free instruction stored in the instruction library. If the first voice input is semantically similar to the first wake-up-free instruction, the terminal responds accordingly to the first voice input.
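  • The following is a minimal sketch of this two-stage check: an exact match against the learned second wake-up-free instructions first, then an NLP-based semantic-similarity check against the predefined first wake-up-free instructions. The toy embed function and the 0.8 threshold are placeholders standing in for a real sentence encoder; they are assumptions, not part of the disclosure.

```python
import math

def embed(text: str) -> dict[str, float]:
    # Placeholder: a real system would use a trained NLP sentence encoder.
    return {token: 1.0 for token in text.lower().split()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def should_respond(voice_input: str,
                   predefined: list[str],
                   learned: set[str],
                   threshold: float = 0.8) -> bool:
    if voice_input in learned:        # stage 1: learned (second) instructions
        return True
    v = embed(voice_input)            # stage 2: semantic similarity to predefined
    return any(cosine(v, embed(p)) >= threshold for p in predefined)

learned_library = {"Go to work"}
predefined_library = ["Navigate to the company"]
print(should_respond("Go to work", predefined_library, learned_library))  # True
```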
  • the above method further includes: confirming the semantics of the first voice input to the user.
  • after the terminal receives the first voice input from the user, if the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal asks the user whether the above semantics are correct. If the user replies that they are, the terminal responds accordingly to the first voice input.
  • the terminal can ask the user whether the semantics are correct through voice broadcast, through a prompt box (such as a toast) containing the semantics of the first voice input, or through a combination of a prompt box and voice broadcast.
  • the embodiments of this application do not limit the method used by the terminal to query the user for semantics.
  • the above method further includes: prompting the user with a first wake-up-free instruction. That is to say, in addition to performing the operation indicated by the first voice input (such as navigating to the company), the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time to instruct the terminal to perform the corresponding operation.
  • Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by the embodiment of the present application.
  • In response to the user's voice input of "Go to work", the terminal asks "Do you want to navigate to the company?"; the user replies "Yes" by voice, and in response to the user's reply, the terminal displays the route to the company through the user interface.
  • The terminal shown in Figure 6 asking the user "Do you want to navigate to the company?" through a voice broadcast is only an example and should not constitute any limitation on the embodiments of this application.
  • For example, the terminal can also ask the user "Do you want to navigate to the company?" through a prompt box (such as a toast), or through a prompt box (such as a toast) combined with a voice broadcast.
  • the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time through a prompt box and/or a voice broadcast. As shown in Figure 6, the terminal prompts the user to "try navigating to the company next time” through voice broadcast.
  • Optionally, before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; in a case where the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
  • For example, the terminal determines whether the predefined first wake-up-free instructions contain an instruction semantically similar to the second voice input; semantic analysis of the two can be performed based on NLP. If it is determined that the second voice input is semantically similar to a certain predefined first wake-up-free instruction, the terminal asks the user whether that semantics is correct. If the user replies that it is, the terminal generates the second wake-up-free instruction corresponding to the second voice input. In addition, the terminal can also save the second voice input in the instruction library.
  • receiving the second voice input from the user includes: receiving the above-mentioned second voice input multiple times continuously within a preset time range. That is, if the terminal receives the above-mentioned second voice input multiple times continuously within a preset time range, the terminal then confirms the semantics of the second voice input to the user. In this way, it can effectively prevent the user from mistakenly mentioning the second voice input in the chat conversation, causing the terminal to respond, thereby improving the user's experience.
  • For example, after the terminal has received the above voice input multiple times in succession, it asks the user "Do you want to navigate to the company?" through a prompt box and/or a voice broadcast (for example, see Figure 6), and in response to the user's confirmation operation, displays the route to the company through the user interface.
  • the terminal can also prompt the user to directly use the first wake-up-free command next time through a prompt box and/or voice broadcast. As shown in Figure 6, the terminal prompts the user to "try navigating to the company next time” through voice broadcast.
  • FIG. 7 is a schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • Step 701 Receive second voice input from the user.
  • the terminal receives a second voice input from the user.
  • the second voice input includes: “Go to work”, “Is there a traffic jam on the road?", “Avoid the congested road”, “Choose a smooth road”, etc., which will not be listed here.
  • Step 702 Determine whether the second voice input is semantically similar to the predefined first wake-up-free instruction.
  • After receiving the second voice input from the user, the terminal determines whether the predefined first wake-up-free instructions contain an instruction semantically similar to the second voice input. If they do not, step 703 is executed, that is, the terminal does not respond to the second voice input; if the second voice input is semantically similar to a first wake-up-free instruction, step 704 is executed, that is, the user is asked whether the second voice input carries that semantics.
  • Step 703 Do not respond to the second voice input.
  • Step 704 Ask the user whether the second voice input is the above semantics.
  • If the user replies that the second voice input does not carry the above semantics, the terminal does not respond to the second voice input; if the user replies that it does, the terminal executes step 705.
  • the terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast. This application does not place any restrictions on the terminal's query method.
  • Step 705 Generate a second wake-up-free instruction and respond to the second voice input.
  • the terminal determines the second voice input as the second wake-up-free command, saves it in the command library, and responds to the second voice input.
  • In addition, the terminal can also prompt the user to directly use the first wake-up-free instruction next time through a prompt box and/or a voice broadcast.
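  • The flow of steps 701 to 705 can be sketched compactly; in the following rough sketch the instruction library, the confirmation dialog, and the response are reduced to simple stand-ins (the similarity helper and threshold are assumptions):

```python
# Rough sketch of the Figure 7 learning flow (steps 701-705); helpers are
# illustrative stand-ins rather than a prescribed implementation.
from difflib import SequenceMatcher

predefined = ["navigate to the company", "check whether there is congestion"]
learned: dict[str, str] = {}  # second wake-up-free instruction -> predefined instruction

def most_similar(text: str, threshold: float = 0.4) -> str | None:
    best = max(predefined, key=lambda p: SequenceMatcher(None, text, p).ratio())
    return best if SequenceMatcher(None, text, best).ratio() >= threshold else None

def on_second_voice_input(text: str, user_confirms) -> None:
    match = most_similar(text)                     # step 702: semantically similar?
    if match is None:
        return                                     # step 703: do not respond
    if not user_confirms(f"Do you mean '{match}'?"):
        return                                     # step 704: user denies, no response
    learned[text] = match                          # step 705: generate second instruction
    print(f"responding with: {match}")             #           and respond to the input

on_second_voice_input("navigate to company", user_confirms=lambda q: True)
print(learned)  # {'navigate to company': 'navigate to the company'}
```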
  • FIG. 8 is another schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • the method shown in Figure 8 is a method in which the terminal triggers an inquiry to the user after receiving the second voice input multiple times in succession.
  • Step 801 Receive second voice input from the user.
  • the terminal receives a second voice input from the user.
  • the second voice input includes: “Go to work”, “Is there a traffic jam on the road?", “Avoid the congested road”, “Choose a smooth road”, etc., which will not be listed here.
  • Step 802 Determine whether the second voice input is received multiple times continuously.
  • After receiving the second voice input from the user, the terminal determines whether the second voice input has been received multiple times continuously within a preset time range. If it has, the terminal executes step 804; otherwise, the terminal executes step 803, that is, it does not respond to the second voice input.
  • Step 803 Do not respond to the second voice input.
  • Step 804 Ask the user whether the second voice input is the above semantics.
  • If the user replies that the second voice input does not carry the above semantics, the terminal does not respond to the second voice input; if the user replies that it does, the terminal executes step 805.
  • the terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast. This application does not place any restrictions on the terminal's query method.
  • Step 805 Generate a second wake-up-free instruction and respond to the second voice input.
  • the terminal determines the second voice input as the second wake-up-free command, saves it in the command library, and responds to the second voice input.
  • In addition, the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time through a prompt box and/or a voice broadcast.
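  • The distinguishing step of Figure 8 is the repeat check in step 802; one way to count consecutive receipts of the same second voice input within a preset time range (window length and repeat count below are illustrative assumptions) is sketched here:

```python
# Illustrative sketch of step 802: trigger the inquiry only after the same
# second voice input has been received several times within a time window.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 30.0   # preset time range (illustrative)
REQUIRED_REPEATS = 2    # "multiple times" threshold (illustrative)

_recent = defaultdict(deque)  # voice input -> timestamps of recent receipts

def received_repeatedly(text: str, now: float | None = None) -> bool:
    now = time.monotonic() if now is None else now
    stamps = _recent[text]
    stamps.append(now)
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()                      # drop receipts outside the window
    return len(stamps) >= REQUIRED_REPEATS    # step 802 decision

print(received_repeatedly("go to work", now=0.0))  # False: first receipt
print(received_repeatedly("go to work", now=5.0))  # True: second receipt within 30 s
```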
  • In summary, the terminal receives the first voice input from the user and, when the first voice input is semantically similar to the predefined first wake-up-free instruction, responds accordingly to the first voice input. That is, without waking up the terminal in advance, even if the sentence the user speaks is not the predefined first wake-up-free instruction, the terminal can respond as long as it is semantically similar to that instruction. This helps solve the problem that the predefined first wake-up-free instructions are fixed and limited, which leaves the terminal unresponsive, and in turn helps improve the user's interactive experience.
  • Figure 9 is a schematic flow chart of the second human-computer interaction method provided by the embodiment of the present application.
  • the method 900 may include step 901 and step 902. Each step shown in Figure 9 will be described in detail below.
  • Step 901 Receive first voice input from the user.
  • the first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.
  • Exemplarily, the first voice input may be "navigate to location A", "depart for location A", "I want to go to location A", "Play song B", "I want to listen to song B", and so on; the embodiments of this application do not limit the specific content of the first voice input.
  • Step 902 If the preset wake-up word is not received but the first voice input contains the target object, make a corresponding response to the first voice input.
  • The above target object is an object whose number of mentions reached a preset threshold in other voice inputs received before the first voice input, and the above preset wake-up word is used to wake up the terminal; more specifically, it is used to wake up the voice assistant (or smart assistant, etc., which this application does not limit) in the terminal.
  • When the terminal receives the first voice input without being woken up in advance, if the first voice input contains the target object, it responds accordingly to the first voice input; if the first voice input does not contain the target object, it does not respond to the first voice input.
  • the above-mentioned target object may be, for example, a location, a media name (such as a song title), or an artist name, etc.
  • This application does not limit the specific content of the target object.
  • Optionally, before receiving the first voice input from the user, the above method further includes: receiving a preset wake-up word from the user; receiving a second voice input from the user; and, in a case where the number of times the first object contained in the second voice input is mentioned in the second voice input and previously received voice inputs exceeds a preset threshold, determining the first object as the target object.
  • the first object may be, for example, a location, a media name (such as a song title), an artist name, etc. This application does not limit the specific content of the target object.
  • After receiving the second voice input, the terminal determines whether the second voice input contains the first object.
  • If the second voice input contains the first object, the terminal determines the number of times the first object has been mentioned. If the number of mentions of the first object in the current second voice input and the previously received voice inputs exceeds a preset threshold, the first object is determined as the target object, so that the next time the user directly speaks a voice input containing the target object, the terminal can respond accordingly.
  • For example, if the terminal determines location A as the target object, there is no need to wake up the terminal in advance next time: the user directly says "navigate to location A" by voice, and after the terminal receives this voice input and determines that it contains location A, the user interface displays the route to location A. In this way, the user does not need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user experience.
  • the terminal may record the number of times the first object is mentioned in the voice input, and each time the first object is mentioned, the corresponding number is incremented by 1.
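  • A sketch of this bookkeeping (the threshold value and the in-memory storage are assumptions for illustration) could look like:

```python
# Illustrative sketch of counting mentions of a first object and promoting it
# to a target object once the count reaches a preset threshold.
from collections import Counter

PRESET_THRESHOLD = 3  # illustrative value

mention_counts: Counter[str] = Counter()
target_objects: set[str] = set()

def record_mention(first_object: str) -> None:
    mention_counts[first_object] += 1        # each mention increments the count by 1
    if mention_counts[first_object] >= PRESET_THRESHOLD:
        target_objects.add(first_object)     # promote to target object

for _ in range(3):
    record_mention("location A")
print("location A" in target_objects)  # True: wake-up-free use is now possible
```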
  • the terminal can also generate a wake-up-free instruction including the target object based on the target object; prompt the user with the wake-up-free instruction, so that the user can directly use the above wake-up-free instruction next time to control the terminal to perform the corresponding operation.
  • the terminal can prompt the user with the above-mentioned wake-up-free instruction through a prompt box and/or voice broadcast.
  • the terminal gives a voice prompt to the user, "Next time, try direct navigation to location A", where location A is the target object.
  • Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by the embodiment of the present application.
  • In response to the user's voice input operation of "Xiaoyi Xiaoyi", the terminal replies "I am here", that is, the terminal is woken up. Further, in response to the user's voice input operation of "navigate to location A", the terminal replies "OK, starting navigation for you", and displays the route to location A through the user interface.
  • the terminal can prompt the user through voice broadcasting to "try next time and just use navigation to go to location A.” In other words, next time the user does not need to wake up the terminal in advance and directly inputs "Navigation to location A" by voice, the terminal can display the route to location A through the user interface.
  • FIG. 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of the present application.
  • Step 1101 Receive a preset wake-up word from the user.
  • the above preset wake-up words are used to wake up the terminal, and more specifically, are used to wake up the voice assistant in the terminal.
  • Step 1102 Receive second voice input from the user.
  • After the terminal is woken up, it receives the second voice input from the user.
  • Exemplarily, the second voice input includes: "Navigate to location A", "Depart for location A", "I want to go to location A", etc., which are not listed one by one here.
  • Step 1103 Determine whether the second voice input includes the first object.
  • the first object includes, for example, but is not limited to: location, media name (such as song title) or artist name, etc.
  • The terminal determines whether the second voice input contains the first object (such as location A). If the second voice input does not contain the first object, step 1104 is executed; if it does, step 1105 is executed.
  • Step 1104: Respond to the second voice input.
  • Step 1105 Determine whether the number of times the first object is mentioned in the current voice input and its previously received voice input exceeds a preset threshold.
  • If the number of times the first object is mentioned in the current voice input and the previously received voice inputs does not exceed the preset threshold, step 1104 is executed, that is, the terminal responds to the second voice input; if the number of mentions exceeds the preset threshold, step 1106 is executed.
  • Step 1106 Determine the first object as the target object.
  • the terminal can also generate a wake-up-free instruction based on the target object, and prompt the user to directly use the above wake-up-free instruction next time.
  • the terminal can prompt the user to directly use the above wake-up-free instruction next time through a prompt box and/or voice broadcast.
  • In summary, when the terminal has not been woken up in advance and receives the first voice input from the user, if the first voice input contains an object whose number of mentions in previous voice inputs has reached the preset threshold, the terminal responds accordingly. That is, by learning from previous voice inputs and saving target objects whose mention counts have reached the preset threshold, the terminal can respond to any received voice input containing those target objects even without being woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interactive experience.
  • Figure 12 is a schematic flow chart of the third human-computer interaction method provided by the embodiment of the present application.
  • the method 1200 may include step 1201 and step 1202. Each step shown in Figure 12 will be described in detail below.
  • Step 1201 Receive a first voice input from a user.
  • the first voice input belongs to a first instruction set, and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.
  • the above-mentioned predefined wake-up-free command is used to instruct the terminal to perform operations corresponding to the wake-up-free command without inputting a preset wake-up word.
  • In one implementation, the instruction library pre-stores a first instruction set and a second instruction set. The instructions in the second instruction set are predefined wake-up-free instructions, and the instructions in the first instruction set are semantically similar to those predefined wake-up-free instructions.
  • the terminal receives the first voice input and determines that the first voice input belongs to the first instruction set.
  • In another implementation, the instruction library pre-stores a predefined second instruction set and a first instruction set learned from voice inputs, whose instructions correspond to the instructions in the second instruction set. The terminal receives the first voice input and determines that it belongs to the first instruction set.
  • Table 1 is an example of the first instruction set and the second instruction set pre-stored in the instruction library.
  • the instructions in the second instruction set are predefined wake-up-free instructions, such as "check whether there is congestion”, “slide down”, “reduce the page”, “navigate to the company”, “navigate home” ", etc.
  • The instructions in the first instruction set are semantically similar to the instructions in the second instruction set, such as "Is there a traffic jam on the road?", "Scroll down", "Zoom out", "Go to work", "I want to go home", etc. It can be seen that the instructions in the first instruction set are semantically similar to the instructions in the second instruction set, but worded differently.
  • The instructions in the first instruction set are more colloquial, while the instructions in the second instruction set are standard human-computer interaction instructions.
  • Optionally, the first instruction set may be further divided into a first instruction subset 1 and a first instruction subset 2.
  • the instructions in the first instruction sub-set 2 are more colloquial than the instructions in the first instruction sub-set 1.
  • the conditions for the terminal to respond to the instructions in the first instruction subset 2 are stricter than the conditions for responding to the instructions in the first instruction subset 1 .
  • Step 1202 If the preset conditions are met, respond to the above-mentioned first voice input.
  • The above preset conditions include at least one of the following: the number of users within a preset range of the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to a preset group; or, the time at which the first voice input is received falls within a preset period.
  • The number of users within a preset range of the terminal does not exceed a threshold: that is, when the number of users near the terminal is small, the terminal can respond to the first voice input. It is not difficult to understand that if there are few surrounding users, the possibility that the user mentioned the first voice input by mistake is smaller, that is, the user may really want the terminal to perform the corresponding operation; conversely, if there are many surrounding users, the possibility of a mistaken mention is greater. Therefore, the above preset conditions can effectively prevent the terminal from responding when the user mentions the first voice input by mistake.
  • The user is in a predefined position: for example, the terminal responds to the first voice input from the user closest to itself; or the user is at a scenic spot, where it is more likely that the user genuinely wants the terminal's service, so the terminal responds to the first voice input from the user.
  • the user from whom the first voice input comes does not belong to the preset group, such as children, the elderly, etc. It is understandable that for the preset group, instructions issued by them may be dangerous, and the terminal may not respond to them.
  • the time when the first voice input is received falls within a preset period.
  • The preset period may be, for example, working hours (or commuting hours). During these periods, the terminal can respond to the first voice input; in other periods, the terminal responds only to the predefined wake-up-free instructions.
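  • The preset conditions above can be combined into a simple predicate; in the following sketch the field names, thresholds, and the "at least one" combination are assumptions chosen to mirror the list above:

```python
# Illustrative sketch of evaluating the preset conditions before responding.
from dataclasses import dataclass
from datetime import time as dtime

@dataclass
class Context:
    nearby_users: int     # users within the preset range of the terminal
    user_position: str    # e.g. "driver seat"
    user_group: str       # e.g. "adult", "child"
    received_at: dtime    # time the first voice input was received

def preset_conditions_met(ctx: Context) -> bool:
    conditions = [
        ctx.nearby_users <= 1,                           # few surrounding users
        ctx.user_position == "driver seat",              # predefined position
        ctx.user_group not in {"child"},                 # not in a preset group
        dtime(8, 0) <= ctx.received_at <= dtime(10, 0),  # preset (commuting) period
    ]
    return any(conditions)  # "at least one of the following"

ctx = Context(nearby_users=3, user_position="rear seat",
              user_group="adult", received_at=dtime(9, 0))
print(preset_conditions_met(ctx))  # True: received within the preset period
```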
  • the following takes the above method applied to a car as an example (for example, the terminal uses a car machine as an example), and enumerates the response of the terminal to the first voice input in the above scenarios.
  • Scenario 1 When there is one passenger in the car, the car machine responds to the first voice input; or, when there are multiple passengers in the car, the car machine does not respond to the first voice input.
  • the car computer can determine the number of people currently in the car based on the camera in the car. When there is one passenger in the car, that is, when there is only the driver in the car, the car computer responds to the first voice input. When there are multiple passengers in the car, the car machine does not respond to the first voice input.
  • In addition, the car machine can respond to the instructions in the second instruction set regardless of whether there are one or more passengers in the car. In this way, when there are multiple passengers in the car, the possibility of accidentally waking up the car machine during a chat can be greatly reduced.
  • Scenario 2 When the voice input comes from the main driver, the car machine responds to the first voice input; or when the voice input comes from other passengers other than the main driver, the car machine does not respond to the first voice input.
  • For example, the car machine can determine whether the first voice input comes from the driver or from another passenger based on the seat from which it originates. If the first voice input comes from the driver, the car machine responds to it; if it comes from another passenger, the car machine does not respond to it. In addition, the car machine can respond to instructions in the second instruction set whether they come from the driver or from other passengers.
  • Scenario 3 When the user from whom the first voice input comes does not belong to the preset group, the car machine responds to the first voice input; or, when the user from whom the first voice input comes belongs to the preset group, the car machine Does not respond to first voice input.
  • For example, the car machine can determine whether the first voice input comes from a preset group of people. Taking a child as an example, if the first voice input comes from a child, the car machine does not respond to it; if it does not come from a child, the car machine responds to it. In this way, the situation where a child speaks the first voice input by mistake and causes the car machine to respond can be effectively avoided.
  • Scenario 4: When the time at which the voice input is received falls within the preset period, the car machine responds to the first voice input; or, when it does not fall within the preset period, the car machine does not respond to the first voice input.
  • For example, the preset period is the working period. If the car machine receives the first voice input during the working period, it can respond to it; if it receives the first voice input during a non-working period, it may not respond to it.
  • It should be noted that, when the terminal determines to respond to the first voice input, it can first confirm the semantics of the voice input with the user, and respond to the first voice input in response to the user's operation of confirming that semantics.
  • the vehicle responds to the first voice input when the first voice input comes from the driver and the time when the voice input is received falls within a preset period.
  • the car machine responds to the first voice input when there is only one passenger in the car and the time when the first voice input is received falls within a preset time period. For the sake of brevity, they are not listed here.
  • Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario provided by the embodiment of the present application.
  • the method described in Figure 13 is a combination of scenario two and scenario four.
  • Step 1301 Receive first voice input from the user.
  • Without the car machine being woken up in advance, in response to the user's operation of speaking the first voice input, the car machine receives the first voice input from the user.
  • the first voice input belongs to a first instruction set, and instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.
  • Step 1302 Determine whether the first voice input comes from the driver.
  • After the car machine receives the first voice input, it determines whether the first voice input comes from the driver. If it does not come from the driver, the car machine executes step 1303; if it does, the car machine executes step 1304.
  • Step 1303, do not respond to the first voice input.
  • the vehicle machine does not respond to the first voice input.
  • the vehicle machine can respond to instructions in the second instruction set from the user.
  • Step 1304 Determine whether the time when the first voice input is received falls within a preset period.
  • the vehicle computer continues to determine whether the time when the first voice input is received falls within the preset period. If the time when the first voice input is received falls within the preset period, the vehicle machine may execute step 1305; if the time when the first voice input is received does not fall within the preset period, the vehicle machine may execute step 1306.
  • Step 1305, respond to the first voice input.
  • the vehicle machine can respond to the first voice input.
  • Step 1306 Respond to the first voice input, but need to ask the user.
  • That is, the car machine can respond to the first voice input, but before responding it needs to confirm the semantics of the first voice input with the user, and it responds to the first voice input once the semantics are confirmed.
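  • Combining scenario two and scenario four as in Figure 13 yields a short decision routine; in this sketch the confirmation step is reduced to a callback and the commuting window is an assumed example:

```python
# Sketch of the Figure 13 decision flow (steps 1301-1306); the inputs are
# illustrative assumptions rather than a prescribed in-vehicle API.
from datetime import time as dtime

def handle_first_voice_input(text: str, from_driver: bool,
                             received_at: dtime, user_confirms) -> None:
    if not from_driver:
        return                                     # step 1303: do not respond
    in_preset_period = dtime(8, 0) <= received_at <= dtime(10, 0)
    if in_preset_period:
        print(f"responding to: {text}")            # step 1305: respond directly
    elif user_confirms(f"Do you mean '{text}'?"):  # step 1306: ask the user first
        print(f"responding to: {text}")

handle_first_voice_input("go to work", from_driver=True,
                         received_at=dtime(9, 30), user_confirms=lambda q: True)
```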
  • In summary, after the terminal receives a first voice input that is semantically similar to a predefined wake-up-free instruction, it responds to the first voice input only when the preset conditions are met; it does not respond in all circumstances. This prevents the terminal from responding when the user mentions the first voice input by mistake. It is conceivable that the first voice input may be more colloquial than the predefined wake-up-free instructions; if the terminal responded in all circumstances, responses would likely be triggered frequently during the user's conversations. Therefore, by setting preset conditions and responding only when they are met, the user's interactive experience is greatly improved.
  • Figure 14 is a schematic flowchart of the fourth human-computer interaction method provided by an embodiment of the present application.
  • the method 1400 may include step 1401 and step 1402. Each step shown in Figure 14 will be described in detail below.
  • Step 1401 Receive the first voice input from the user.
  • the first voice input does not belong to the predefined wake-up-free instructions.
  • the first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.
  • the first voice input from the user is received.
  • Exemplarily, the first voice input may be "depart for the company", "navigate to the company", "depart for work", etc. The embodiments of this application do not limit the specific content of the first voice input.
  • Step 1402 If the first voice input is semantically similar to the first no-wake-up instruction among the predefined no-wake-up instructions, guide the user to input the first no-wake-up instruction.
  • the terminal determines that the first voice input and the first wake-up-free instruction have similar semantics, and then guides the user to input the first wake-up-free instruction so that the terminal responds to the first wake-up-free instruction.
  • the terminal may determine based on semantic analysis in natural language processing that the first voice input is semantically similar to the first wake-up-free instruction.
  • the above voice input is “Go to work”
  • the first wake-up-free instruction with similar semantics is “Navigate to the company”.
  • After the terminal receives the voice input "Go to work", it determines that this voice input does not belong to the predefined wake-up-free instructions, and recognizes that its semantics are similar to "Navigate to the company". Therefore, the terminal can guide the user to say "Navigate to the company".
  • the above-mentioned guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or voice broadcast.
  • the terminal can guide the user to issue the first wake-up-free instruction through a prompt box. For example, after the terminal determines that the voice input has similar semantics to the first wake-up-free instruction in the command library, the terminal displays the first wake-up-free instruction on the user interface through a prompt box. The terminal can also guide the user to issue the first wake-up-free instruction through voice broadcast. For example, after the terminal determines that the above-mentioned voice input has similar semantics to the first no-wake-up command in the command library, the terminal reminds the user by voice to use the above-mentioned first no-wake-up command. The terminal can guide the user to issue the first wake-up-free command through a prompt box and voice broadcast.
  • For example, after the terminal determines that the above voice input is semantically similar to the first wake-up-free instruction in the instruction library, the terminal first displays the first wake-up-free instruction on the user interface through a prompt box; if the user has not issued the first wake-up-free instruction within a preset time range, the terminal then prompts the user by voice to use the first wake-up-free instruction. Alternatively, the prompt box displays the first wake-up-free instruction on the user interface while a voice prompt simultaneously reminds the user to use it.
  • This application does not limit the terminal boot method.
  • Optionally, guiding the user to issue the first wake-up-free instruction through a prompt box and/or a voice broadcast includes: prompting the user to issue the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction; and, when the number of prompts through the prompt box within a preset time range reaches a preset threshold but the user has not issued the first wake-up-free instruction, guiding the user to issue it through a voice broadcast.
  • For example, the terminal prompts the user to issue the first wake-up-free instruction through a prompt box for the first time, the prompt box containing the first wake-up-free instruction; the second time, it prompts the user through the prompt box again. If, within 1 minute, the number of prompts through the prompt box reaches two but the user still has not issued the first wake-up-free instruction, the user is guided to issue it through a voice broadcast.
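  • The escalation from prompt box to voice broadcast can be captured with a per-instruction counter; the one-minute window and the threshold of two prompts below follow the example above, while the output calls are stand-ins:

```python
# Sketch of escalating guidance: prompt boxes first, then a voice broadcast
# once the prompt-box count within the time window exceeds the threshold.
PROMPT_THRESHOLD = 2     # prompt-box prompts before escalating (as in the example)
WINDOW_SECONDS = 60.0    # preset time range (one minute in the example)

_prompt_times: dict[str, list[float]] = {}

def guide_user(instruction: str, now: float) -> None:
    times = [t for t in _prompt_times.get(instruction, []) if now - t <= WINDOW_SECONDS]
    times.append(now)
    _prompt_times[instruction] = times
    print(f'toast: try saying "{instruction}"')       # prompt box each time
    if len(times) > PROMPT_THRESHOLD:
        print(f'voice: try saying "{instruction}"')   # escalate to voice broadcast

for t in (0.0, 20.0, 40.0):                           # third prompt adds the voice broadcast
    guide_user("Navigate to the company", now=t)
```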
  • Figure 15 is an interactive schematic diagram for guiding a user to issue a first wake-up-free instruction provided by an embodiment of the present application.
  • For example, the terminal determines that the voice input "Depart for work" is semantically similar to the first wake-up-free instruction "Navigate to the company" in the instruction library, so it prompts the user for the first time through the prompt box to "try saying navigate to the company". The second time, the user still says "Depart for work", and the terminal continues to prompt the user through the prompt box to "try saying navigate to the company". The third time, the user still says "Depart for work", and the terminal prompts the user through the prompt box and, at the same time, through a voice broadcast to "try saying navigate to the company".
  • In summary, the terminal receives a first voice input that does not belong to the predefined wake-up-free instructions but is semantically similar to a first wake-up-free instruction among them. The terminal then guides the user to input the corresponding first wake-up-free instruction, so that once the user inputs it, the terminal responds accordingly. Compared with the terminal neither responding nor prompting, this greatly improves the user's interactive experience.
  • Figure 16 is a schematic flow chart of the fifth human-computer interaction method provided by the embodiment of the present application.
  • the method may include steps 1601 to 1605. Each step shown in Figure 16 will be described in detail below.
  • Step 1601 Receive first voice input from the user.
  • In response to the user's operation of speaking the first voice input, the terminal receives the first voice input from the user.
  • the first voice input is received without receiving a preset wake-up word from the user.
  • Exemplarily, the first voice input includes: "Navigate to location A", "I want to go to location A", "Where is location A", and so on, which are not listed one by one here.
  • Step 1602 Determine whether the first voice input is used to request navigation.
  • the terminal determines whether the intention of the first voice input is to request navigation. If the first voice input is not used to request navigation, the terminal performs step 1603; if the first voice input is used to request navigation, the terminal performs step 1604.
  • Step 1603 Do not respond to the first voice input.
  • Step 1604 Ask the user for the destination requesting navigation.
  • The terminal asks the user for the destination of the requested navigation. For example, if the voice input is "Navigate to location A", the terminal receives it and determines that the first voice input is used for navigation. Further, the terminal asks the user for the navigation destination, such as asking "Where do you want to go?". The user replies "location A", and after receiving the user's feedback, the terminal obtains the route to location A from the cloud.
  • The terminal can ask the user through a voice broadcast, through a prompt box (such as a toast), or through a prompt box (such as a toast) combined with a voice broadcast. This application does not place any restrictions on the terminal's query method. For example, the terminal asks the user through a prompt box (such as a toast) the first two times, and the third time asks the user through a prompt box (such as a toast) combined with a voice broadcast.
  • Step 1605 Provide navigation services to the user based on the destination fed back by the user.
  • After obtaining the route to the above destination, the terminal provides navigation services to the user, for example, displaying directions to the destination through the user interface.
  • the terminal can also generate a wake-up-free instruction including the above destination based on the destination.
  • the terminal can also prompt the user through prompt boxes and/or voice broadcasts that the above-mentioned wake-up-free instruction can be used directly next time.
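  • A compact sketch of this flow (the intent cues, the user dialog, and the cloud route lookup are simplified stand-ins for whatever the terminal actually uses) might be:

```python
# Sketch of the Figure 16 flow (steps 1601-1605); intent detection, the user
# dialog, and the route service are simplified stand-ins.
NAV_KEYWORDS = ("navigate", "go to", "where is")  # illustrative intent cues

def is_navigation_request(text: str) -> bool:
    return any(k in text.lower() for k in NAV_KEYWORDS)   # step 1602

def handle_voice_input(text: str, ask_user, fetch_route) -> None:
    if not is_navigation_request(text):
        return                                            # step 1603: no response
    destination = ask_user("Where do you want to go?")    # step 1604
    route = fetch_route(destination)                      # e.g. obtained from the cloud
    print(f"showing route: {route}")                      # step 1605: navigation service

handle_voice_input("navigate to location A",
                   ask_user=lambda q: "location A",
                   fetch_route=lambda d: f"route to {d}")
```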
  • FIG 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by the embodiment of the present application.
  • In response to the user's voice input operation of "navigate to location A", the terminal asks the user "Where do you want to go?" through a voice broadcast.
  • The user replies "location A", and in response to the user's reply, the terminal displays the route to location A through the user interface.
  • In summary, without the terminal being woken up in advance, after the terminal receives the first voice input from the user and finds that its intention is to request navigation, it can ask the user for the navigation destination and, according to the destination fed back by the user, provide navigation services to the user. There is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interactive experience.
  • An embodiment of the present application also provides a terminal, which includes corresponding modules for performing the steps performed by the terminal in any one of the embodiments described in FIGS. 5 to 17 .
  • the terminal can be used to implement the method described in any of the embodiments described in Figures 5 to 17.
  • the modules included in the terminal can be implemented by software and/or hardware.
  • An embodiment of the present application also provides a terminal, which includes a memory and a processor, where the memory is used to store a computer program and the processor is used to call and execute the computer program, so that the terminal implements the method described in any one of the embodiments shown in Figures 5 to 17.
  • An embodiment of the present application also provides a vehicle, on which a terminal as described above is deployed.
  • the terminal may be a vehicle machine, for example.
  • This application also provides a chip system, which includes at least one processor and is used to implement the method described in any one of the embodiments described in FIGS. 5 to 17 .
  • the chip system further includes a memory, the memory is used to store program instructions and data, and the memory is located within the processor or outside the processor.
  • the chip system can be composed of chips or include chips and other discrete devices.
  • This application also provides a computer program product, which includes computer-readable instructions. When the computer-readable instructions are run by a computer, the method described in any one of the embodiments shown in Figures 5 to 17 is implemented.
  • This application also provides a computer-readable storage medium that stores computer-readable instructions.
  • When the computer-readable instructions are executed by a computer, the method described in any one of the embodiments shown in Figures 5 to 17 is implemented.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • The above processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • It should be noted that the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • The volatile memory may be random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • In this application, the term "unit" may be used to refer to computer-related entities, hardware, firmware, a combination of hardware and software, software, or software in execution.
  • In addition, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as discrete components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • each functional unit may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • software When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions (programs). When the computer program instructions (program) are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in or transmitted from one computer-readable storage medium to another, e.g., the computer instructions may be transferred from a website, computer, server, or data center Transmission to another website, computer, server or data center through wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more available media integrated.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVD)), or semiconductor media (e.g., solid state disks (SSD)), etc.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Provided are a human-computer interaction method and related apparatus. The method includes: a terminal receives a first voice input from a user and, in a case where the first voice input is semantically similar to a predefined first wake-up-free instruction, responds accordingly to the first voice input. That is, without waking up the terminal in advance, as long as the received first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal can perform the operation corresponding to the first voice input. This solves the problem of the terminal being unresponsive because the first wake-up-free instructions are fixed and limited and, compared with the terminal responding only to the predefined first wake-up-free instructions, greatly improves the user's interactive experience.

Description

Human-computer interaction method and related apparatus
This application claims priority to the Chinese patent application No. 202211079452.4, entitled "Human-computer interaction method and related apparatus", filed with the Chinese Patent Office on September 5, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of terminal technology, and in particular, to a human-computer interaction method and related apparatus.
Background
As smart terminals become more and more widespread, voice interaction has become one of the common and important human-computer interaction methods. At present, most voice interaction requires the user to first wake up the terminal with a preset wake-up word before the subsequent interaction can take place; this approach is cumbersome and results in a poor user experience. Some manufacturers also provide a wake-up-free function, that is, the user can directly input predefined wake-up-free instructions without waking up the terminal in advance; however, the predefined wake-up-free instructions are fixed and limited, and the terminal is easily woken up by mistake while the user is chatting, which affects the user experience.
Therefore, it is desirable to provide a human-computer interaction method to improve the user's interactive experience.
Summary
This application provides a human-computer interaction method and related apparatus, in order to improve the user's interactive experience.
In a first aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, or by a component configured in the terminal (such as a chip or a chip system), or may be implemented by a logic module or software capable of realizing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user; and, in a case where it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction, responding accordingly to the first voice input, where the first wake-up-free instruction is used to instruct the terminal to perform the operation corresponding to the first wake-up-free instruction without a preset wake-up word being input.
Based on the above technical solution, the terminal receives the first voice input from the user and, in a case where the first voice input is semantically similar to the predefined first wake-up-free instruction, responds accordingly to it. That is, without the terminal being woken up in advance, even if the sentence spoken by the user is not the predefined first wake-up-free instruction, the terminal can respond as long as the sentence is semantically similar to that instruction. This helps solve the problem of the terminal being unresponsive because the predefined first wake-up-free instructions are fixed and limited, and in turn helps improve the user's interactive experience.
With reference to the first aspect, in some possible implementations of the first aspect, before responding accordingly to the first voice input, the above method further includes: confirming the semantics of the first voice input with the user.
After the terminal receives the first voice input, it can confirm with the user whether the recognized semantics of the first voice input are correct. On the one hand, this improves accuracy; on the other hand, it prevents the terminal from responding when the user mentioned the first voice input by mistake. For example, if the user mentioned the first voice input by mistake, the user can give a negative reply when the terminal asks for confirmation, so that the terminal does not continue to perform the corresponding operation, which helps improve the user's experience.
Optionally, confirming the semantics of the first voice input with the user includes: confirming the semantics of the first voice input with the user through a prompt box and/or a voice broadcast.
The terminal can confirm the semantics of the first voice input with the user through a prompt box that contains the semantics of the first voice input, through a voice broadcast, or through a combination of a prompt box and a voice broadcast. Providing these multiple confirmation methods greatly improves the terminal's flexibility when confirming semantics with the user.
With reference to the first aspect, in some possible implementations of the first aspect, the above method further includes: prompting the user with the first wake-up-free instruction.
The terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time. For example, the terminal can prompt the user with the first wake-up-free instruction through a prompt box and/or a voice broadcast. This application does not limit the prompting method.
With reference to the first aspect, in some possible implementations of the first aspect, before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; in a case where the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
The terminal learns and generates the second wake-up-free instruction through the above method, and the second wake-up-free instruction can be used to instruct the terminal to perform the corresponding operation without a preset wake-up word being input. Specifically, after the terminal receives the second voice input from the user, in a case where the second voice input is semantically similar to the first wake-up-free instruction, the terminal confirms the semantics with the user; if the user confirms that the semantics of the second voice input are correct, the terminal generates the corresponding second wake-up-free instruction, so that the next time the terminal receives the second voice input without being woken up in advance, it can respond to it. In other words, this greatly increases the number of wake-up-free instructions that can instruct the terminal to perform corresponding operations without a preset wake-up word being input, which in turn helps improve the user's interactive experience.
With reference to the first aspect, in some possible implementations of the first aspect, the first voice input being semantically similar to the predefined first wake-up-free instruction includes: the first voice input being identical to the second wake-up-free instruction.
After the terminal receives the first voice input, it determines whether the first voice input is semantically similar to the first wake-up-free instruction. One way is to perform semantic analysis on the first voice input and the predefined first wake-up-free instruction to determine whether they are semantically similar. Another way is for the terminal to determine whether the first voice input is identical to the generated second wake-up-free instruction. It can be understood that the second wake-up-free instruction is an instruction generated from the second voice input that is semantically similar to the first wake-up-free instruction; if the first voice input is identical to the generated second wake-up-free instruction, the first voice input is semantically similar to the first wake-up-free instruction, so the terminal can also respond to the first voice input. The above two ways can be used in combination or separately, which greatly improves the terminal's flexibility in determining whether the first voice input is semantically similar to the first wake-up-free instruction.
With reference to the first aspect, in some possible implementations of the first aspect, receiving the second voice input from the user includes: receiving the second voice input multiple times continuously within a preset time range.
In other words, the terminal confirms the semantics of the second voice input with the user only when it has received the second voice input multiple times continuously within a preset time range. In this way, when the user mentions the second voice input by mistake, the terminal is effectively prevented from mistakenly assuming that the user wants the corresponding operation performed, which helps improve the user's interactive experience.
In a second aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, or by a component configured in the terminal (such as a chip or a chip system), or may be implemented by a logic module or software capable of realizing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user; and, in a case where a preset wake-up word has not been received but the first voice input contains a target object, responding accordingly to the first voice input, where the target object is an object whose number of mentions in other voice inputs received before the first voice input reached a preset threshold, and the preset wake-up word is used to wake up the terminal.
Based on the above technical solution, when the terminal has not been woken up in advance and receives the first voice input from the user, if the first voice input contains an object whose number of mentions in previous voice inputs reached the preset threshold, the terminal responds accordingly. That is, by learning from previous voice inputs and saving target objects whose mention counts reached the preset threshold, the terminal can respond to any received voice input containing such a target object even without being woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interactive experience.
With reference to the second aspect, in some possible implementations of the second aspect, before receiving the first voice input from the user, the above method further includes: receiving a preset wake-up word from the user; receiving a second voice input from the user; and, in a case where the number of times a first object contained in the second voice input is mentioned in the second voice input and its previous voice inputs exceeds a preset threshold, determining the first object as the target object.
The terminal can record the number of times the first object is mentioned in voice inputs; if that number exceeds the preset threshold, the first object is determined as the target object, so that the user can subsequently issue a voice input containing the target object without waking up the terminal in advance, and the terminal can respond upon receiving that voice input. That is, there is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interactive experience.
With reference to the second aspect, in some possible implementations of the second aspect, the above method further includes: generating, based on the target object, a wake-up-free instruction containing the target object; and prompting the user with the wake-up-free instruction.
The terminal can generate, based on the target object, a wake-up-free instruction containing the target object, and prompt the user that this wake-up-free instruction can be used directly next time, so that the terminal can respond accordingly without being woken up in advance. The terminal can prompt the user with the wake-up-free instruction through a prompt box and/or a voice broadcast. This application does not limit the prompting method.
In a third aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, or by a component configured in the terminal (such as a chip or a chip system), or may be implemented by a logic module or software capable of realizing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user, where the first voice input belongs to a first instruction set and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions; and responding to the first voice input in a case where preset conditions are met.
Based on the above technical solution, after the terminal receives a first voice input that is semantically similar to a predefined wake-up-free instruction, it responds to the first voice input in a case where the preset conditions are met. That is, for a first voice input semantically similar to a predefined wake-up-free instruction, the terminal responds only when the preset conditions are met, not in all circumstances, which prevents the terminal from responding when the user mentions the first voice input by mistake. It is conceivable that the first voice input may be more colloquial than the predefined wake-up-free instructions; if the terminal responded in all circumstances, responses would likely be triggered frequently during the user's conversations. Therefore, by setting preset conditions and responding only when they are met, the user's interactive experience is greatly improved.
With reference to the third aspect, in some possible implementations of the third aspect, the preset conditions include at least one of the following: the number of users within a preset range of the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to a preset group; or, the time at which the first voice input is received falls within a preset period.
The number of users within a preset range of the terminal not exceeding a threshold means that the terminal can respond to the first voice input when the number of users near the terminal is small. It is not difficult to understand that if there are few surrounding users, the possibility that the user mentioned the first voice input by mistake is smaller, that is, the user may really want the terminal to perform the corresponding operation; conversely, if there are many surrounding users, the possibility of a mistaken mention is greater. The user being in a predefined position means, for example, that the terminal responds to the first voice input from the user closest to itself, or that the user is at a scenic spot, where it is more likely that the user wants the terminal's service, so the terminal responds to the first voice input from the user. The user from whom the first voice input comes not belonging to a preset group concerns groups such as children and the elderly; it can be understood that instructions issued by such groups may be dangerous, and the terminal may not respond to them. The time at which the first voice input is received falling within a preset period concerns periods such as working hours, during which the terminal can respond to the first voice input; in other periods, the terminal may respond only to the predefined wake-up-free instructions. In summary, the above preset conditions can effectively prevent the terminal from responding when the user mentions the first voice input by mistake.
With reference to the third aspect, in some possible implementations of the third aspect, the above method is applied to a vehicle, and the number of users within a preset range of the terminal not exceeding a threshold includes: there being one passenger in the vehicle; or, the user being in a predefined position includes: the user being in the driver's seat.
In a fourth aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, or by a component configured in the terminal (such as a chip or a chip system), or may be implemented by a logic module or software capable of realizing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: in a case where a preset wake-up word has not been received from a user, determining, according to a first voice input from the user, that the first voice input is used to request navigation; asking the user for the destination of the requested navigation; and providing navigation services to the user based on the destination fed back by the user.
Based on the above technical solution, without the terminal being woken up in advance, after the terminal receives the first voice input from the user and finds that its intention is to request navigation, it can ask the user for the navigation destination and, according to the destination fed back by the user, provide navigation services to the user. There is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interactive experience.
With reference to the fourth aspect, in some possible implementations of the fourth aspect, the above method further includes: generating a wake-up-free instruction containing the destination; and prompting the user with the wake-up-free instruction.
The terminal can generate a wake-up-free instruction containing the destination and prompt the user that this wake-up-free instruction can be used directly next time, upon which the terminal can respond accordingly. The terminal can prompt the user with the wake-up-free instruction through a prompt box and/or a voice broadcast. This application does not limit the prompting method.
In a fifth aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, or by a component configured in the terminal (such as a chip or a chip system), or may be implemented by a logic module or software capable of realizing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user, where the first voice input does not belong to predefined wake-up-free instructions; and, in a case where the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guiding the user to input the first wake-up-free instruction.
Based on the above technical solution, the terminal receives a first voice input that does not belong to the predefined wake-up-free instructions but is semantically similar to a first wake-up-free instruction among them, and the terminal then guides the user to input the corresponding first wake-up-free instruction, so that once the user inputs it, the terminal responds accordingly. Compared with the terminal neither responding nor prompting, this greatly improves the user's interactive experience.
With reference to the fifth aspect, in some possible implementations of the fifth aspect, guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
The terminal can guide the user to input the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction, through a voice broadcast, or through a combination of a prompt box and a voice broadcast. Providing these multiple methods greatly improves the terminal's flexibility when guiding the user to input the first wake-up-free instruction.
With reference to the fifth aspect, in some possible implementations of the fifth aspect, guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast includes: prompting the user to input the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction; and, in a case where the number of prompts through the prompt box within a preset time range reaches a preset threshold but the user has not issued the first wake-up-free instruction, guiding the user to input the first wake-up-free instruction through a voice broadcast.
In a sixth aspect, this application provides a computer device, including units for implementing the methods in the first to fifth aspects and any possible implementation of the first to fifth aspects. It should be understood that each unit can realize the corresponding function by executing a computer program.
In a seventh aspect, this application provides a computer device, including a processor, where the processor is used to execute the methods described in the first to fifth aspects and any possible implementation of the first to fifth aspects.
The computer device may further include a memory for storing computer-readable instructions, where the processor reads the computer-readable instructions so that the computer device can implement the methods described in the above aspects. The computer device may further include a communication interface used for the computer device to communicate with other devices; exemplarily, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface.
In an eighth aspect, this application provides a vehicle used to implement the methods in the first to fifth aspects and any possible implementation of the first to fifth aspects, or including any computer device described in the sixth or seventh aspect.
In a ninth aspect, this application provides a chip system, including at least one processor, used to support the implementation of the functions involved in the first to fifth aspects and any possible implementation of the first to fifth aspects, for example, receiving or processing the data and/or information involved in the above methods.
In a possible design, the chip system further includes a memory used to store program instructions and data, and the memory is located inside or outside the processor.
The chip system may be composed of chips, or may include chips and other discrete devices.
In a tenth aspect, this application provides a computer-readable storage medium storing computer-readable instructions; when the computer-readable instructions are executed by a computer, the computer implements the methods in the first to fifth aspects and any possible implementation of the first to fifth aspects.
In an eleventh aspect, this application provides a computer program product, including computer-readable instructions; when the computer-readable instructions are run by a computer, the computer implements the methods in the first to fifth aspects and any possible implementation of the first to fifth aspects.
It should be understood that the sixth to eleventh aspects of this application correspond to the technical solutions of the first to fifth aspects, and the beneficial effects obtained by each aspect and its corresponding feasible implementations are similar, so they are not repeated here.
Brief Description of the Drawings
Figure 1 is a schematic structural diagram of a terminal provided by an embodiment of this application;
Figure 2 is a schematic diagram of a scenario applicable to the human-computer interaction methods provided by the embodiments of this application;
Figure 3 is a schematic diagram of a known human-computer interaction method;
Figure 4 is a schematic diagram of another known human-computer interaction method;
Figure 5 is a schematic flowchart of the first human-computer interaction method provided by an embodiment of this application;
Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by an embodiment of this application;
Figure 7 is a schematic flowchart of learning the terms of voice inputs provided by an embodiment of this application;
Figure 8 is another schematic flowchart of learning the terms of voice inputs provided by an embodiment of this application;
Figure 9 is a schematic flowchart of the second human-computer interaction method provided by an embodiment of this application;
Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by an embodiment of this application;
Figure 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of this application;
Figure 12 is a schematic flowchart of the third human-computer interaction method provided by an embodiment of this application;
Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario, provided by an embodiment of this application;
Figure 14 is a schematic flowchart of the fourth human-computer interaction method provided by an embodiment of this application;
Figure 15 is an interaction schematic diagram of guiding the user to issue the first wake-up-free instruction provided by an embodiment of this application;
Figure 16 is a schematic flowchart of the fifth human-computer interaction method provided by an embodiment of this application;
Figure 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by an embodiment of this application.
Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
The method provided by the embodiments of this application can be applied to terminals such as mobile phones, tablet computers, smart watches, smart speakers, wearable devices, in-vehicle devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, personal computers (PC), ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), and distributed devices.
It should be noted that the embodiments of this application do not place any limitation on the specific type of the terminal.
Exemplarily, Figure 1 shows a schematic structural diagram of a terminal 100. As shown in Figure 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and so on.
The processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a microcontroller unit (MCU), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in one or more processors.
The application processor outputs sound signals through the audio module 170 (such as the speaker 170A), or displays images or videos through the display screen 194.
The controller may be the nerve center and command center of the terminal 100. The controller can generate operation control signals according to instruction operation codes and timing signals, and complete the control of fetching and executing instructions.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory can hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from the memory. Repeated accesses are avoided and the waiting time of the processor 110 is reduced, thereby improving the efficiency of the system.
The processor 110 can perform different operations by executing instructions to realize different functions. The instructions may be, for example, instructions pre-stored in the memory before the device leaves the factory, or instructions read from a new application (APP) after the user installs it during use; the embodiments of this application do not place any limitation on this.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a secure digital input and output (SDIO) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a universal synchronous asynchronous receiver/transmitter (USART), a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc. The USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transfer data between the terminal 100 and peripheral devices. It can also be used to connect headphones and play audio through the headphones. The interface can also be used to connect other terminals.
It can be understood that the interface connection relationships between the modules illustrated in this application are only schematic illustrations and do not constitute a structural limitation on the terminal 100. In other embodiments, the terminal 100 may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.
充电管理模块140用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块140可以通过终端100的无线充电线圈接收无线充电输入。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为终端供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110、内部存储器121、外部存储器、显示屏194、摄像头193和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量、电池循环次数、电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块141也可以设置于处理器110中。在另一些实施例中,电源管理模块141和充电管理模块140也可以设置于同一个器件中。
终端100的无线通信功能可以通过天线1、天线2、移动通信模块150、无线通信模块160、调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。终端100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在终端100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器、开关、功率放大器、低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器170A,受话器170B等)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
无线通信模块160可以提供应用在终端100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络)、蓝牙(bluetooth,BT)、全球导航卫星系统(global navigation satellite system,GNSS)、调频(frequency modulation,FM)、近距离无线通信技术(near field communication,NFC)、红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波，将电磁波信号调频以及滤波处理，将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号，对其进行调频，放大，经天线2转为电磁波辐射出去。
在一些实施例中，终端100的天线1和移动通信模块150耦合，天线2和无线通信模块160耦合，使得终端100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM)，通用分组无线服务(general packet radio service,GPRS)，码分多址接入(code division multiple access,CDMA)，宽带码分多址(wideband code division multiple access,WCDMA)，时分码分多址(time-division code division multiple access,TD-SCDMA)，长期演进(long term evolution,LTE)，第五代(5th generation,5G)通信系统，BT，GNSS，WLAN，NFC，FM，和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS)，全球导航卫星系统(global navigation satellite system,GLONASS)，北斗卫星导航系统(BeiDou navigation satellite system,BDS)，准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
终端100可以通过GPU、显示屏194,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194用于显示图像、视频等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)、有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED)、柔性发光二极管(flex light-emitting diode,FLED)、迷你LED(Mini LED)、微LED(Micro LED)、微OLED(Micro-OLED)、量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中，终端100可以包括一个或多个显示屏194。
在本申请中,显示屏194可以用于显示提示框,该提示框中包含预定义的免唤醒指令,该提示框用于提示用户下一次可以直接使用上述免唤醒指令,也即,无需预先唤醒终端,即可以通过上述免唤醒指令实现与终端的语音交互。
终端100可以通过ISP,摄像头193,视频编解码器,GPU,显示屏194以及应用处理器等实现拍摄功能。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。在一些实施例中,终端100可以包括一个或多个摄像头193。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当终端100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。终端100可以支持一种或多种视频编解码器。这样,终端100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1、MPEG2、MPEG3、MPEG4等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展终端100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码，所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令，从而执行终端100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中，存储程序区可存储操作系统，至少一个功能所需的应用程序(比如声音播放功能，图像播放功能等)等。存储数据区可存储终端100使用过程中所创建的数据(比如音频数据，电话本等)等。此外，内部存储器121可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件，闪存器件，通用闪存存储器(universal flash storage,UFS)等。
终端100可以通过音频模块170,如扬声器170A、受话器170B、麦克风170C和耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放、录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。终端100可以通过扬声器170A收听音乐,或收听免提通话。
受话器170B,也称“听筒”,用于将音频电信号转换成声音信号。当终端100接听电话或语音信息时,可以通过将受话器170B靠近人耳接听语音。
麦克风170C，也称“话筒”，“传声器”，用于将声音信号转换为电信号。当拨打电话或发送语音信息时，用户可以通过人嘴靠近麦克风170C发声，将声音信号输入到麦克风170C。终端100可以设置至少一个麦克风170C。在另一些实施例中，终端100可以设置两个麦克风170C，除了采集声音信号，还可以实现降噪功能。在另一些实施例中，终端100还可以设置三个、四个或更多麦克风170C，实现采集声音信号，降噪，还可以识别声音来源，实现定向录音功能等。
在本申请中,麦克风170C可以用于接收来自用户的语音输入,也即,可以用于采集来自用户的声音信号。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130,也可以是3.5mm的开放移动终端平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
按键190包括开机键(或称电源键)、音量键等。按键190可以是机械按键,也可以是触摸式按键。终端100可以接收按键输入,产生与终端100的用户设置以及功能控制有关的键信号输入。
马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏194不同区域的触摸操作,马达191也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195，或从SIM卡接口195拔出，实现和终端100的接触和分离。终端100可以支持一个或多个SIM卡接口。SIM卡接口195可以支持Nano SIM卡、Micro SIM卡、SIM卡等。同一个SIM卡接口195可以同时插入多张卡。所述多张卡的类型可以相同，也可以不同。SIM卡接口195也可以兼容不同类型的SIM卡。SIM卡接口195也可以兼容外部存储卡。终端100通过SIM卡和网络交互，实现通话以及数据通信等功能。在一些实施例中，终端100采用eSIM，即：嵌入式SIM卡。eSIM卡可以嵌在终端100中，不能和终端100分离。
本申请示意的结构并不构成对终端100的具体限定。在另一些实施例中,终端100可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
为便于理解本申请实施例提供的人机交互方法,下面将对适用于本申请实施例提供的人机交互方法的场景进行说明。可理解地,本申请实施例描述的应用场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定。
图2是适用于本申请实施例提供的方法的场景示意图。如图2所示，用户可以通过语音输入希望终端执行的操作，以实现与终端(图2中以手机为例)的交互。在某些场景中，语音交互成为重要且常用的人机交互方式之一。例如，用户驾驶车辆的过程中，可以通过语音实现与车机(终端的一示例)的交互。目前，用户可以先通过预设的唤醒词唤醒终端，更为详细地，用户可以先通过预设的唤醒词唤醒终端中的语音助手(或智慧助手、智能助手等，本申请对此不作限定)，进而实现后续的交互，这种方式比较繁琐，导致用户体验不佳。还有一部分厂家提供了免唤醒功能，也即，用户无需预先唤醒语音助手，直接通过预定义的免唤醒指令即可实现与终端的交互。但是预定义的免唤醒指令固定且有限，如果用户语音输入的免唤醒指令不准确，则终端无响应，用户体验不佳。
下面将结合图3和图4详细描述上述两种已知的人机交互方法。
图3示出了一种已知的人机交互方法。如图3所示,用户预先通过预设的唤醒词唤醒终端,更为详细地,用户先通过预设的唤醒词唤醒终端中的语音助手,如图3中示出的,唤醒词为“小艺小艺”,响应于用户通过语音输入“小艺小艺”的操作,语音助手回复“我在”。接着,用户通过语音输入“导航去地点A”,响应于用户语音输入“导航去地点A”的操作,语音助手回复“好的,开始为你导航”,并通过用户界面显示前往地点A的路线。可以看出,整个交互过程比较繁琐,导致用户体验不佳。
图4示出了另一种已知的人机交互方法。如图4所示，用户可以直接语音输入预定义的免唤醒指令，实现与终端的交互。例如，用户语音输入“导航去公司”，响应于用户语音输入“导航去公司”的操作，终端通过用户界面显示前往公司的路线，其中，终端上预存有该用户公司的地点。如果用户语音输入其他相似意图(或语义)的语句，如“出发去工作”、“我想去工作”、“我想去公司”等，语音助手均不会做出响应。总的来说，预定义的免唤醒指令固定且有限，很有可能导致语音助手无法响应用户的语音输入，导致用户体验不佳。
为提高用户的人机交互体验,本申请提供了一种人机交互方法,该方法包括:终端在接收到的来自用户的第一语音输入与预定义的第一免唤醒指令语义相似的情况下,对第一语音输入做出相应的响应,也就是说,在未预先唤醒终端的情况下,即使用户语音输入的语句不是预定义的第一免唤醒指令,只要与预定义的第一免唤醒指令语义相似,终端便可以识别并响应,有利于缓解预定义的第一免唤醒指令固定且有限导致的终端无响应的问题,进而有利于提高用户的语音交互体验。
为便于清楚描述本申请实施例的技术方案,首先做出如下说明。
第一,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。例如,第一语音输入和第二语音输入仅仅是为了区分不同的语音输入,并不对其先后顺序进行限定。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和位置进行限定,并且“第一”、“第二”等字样也并不限定一定不同。
第二,在本申请中,“至少一项(个)”是指一项(个)或者多项(个)。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系,但并不排除表示前后关联对象是一种“和”的关系的情况,具体表示的含义可以结合上下文进行理解。
第三,在本申请中,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、***、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
下面将结合具体的实施例详细描述本申请提供的人机交互方法。
应理解,下文所示的实施例可以由终端执行,或者,也可以由配置在终端中的部件(如芯片、芯片***等)执行,或者,还可以由能够实现全部或部分终端功能的逻辑模块或软件实现,本申请实施例对此不作限定。该终端可以具有如图1所示的结构,或具有比图1更多或更少的结构,本申请实施例对此不作限定。
图5是本申请实施例提供的第一种人机交互方法的示意性流程图。如图5所示,方法500可以包括步骤501和步骤502。下面将详细描述图5所示的各个步骤。
步骤501,接收来自用户的第一语音输入。
其中,该第一语音输入可以是用户未预先唤醒终端的情况下,终端接收到的来自用户的语音输入。
示例性地,响应于用户的语音操作,接收来自用户的第一语音输入,该第一语音输入例如可以是“导航去地点A”、“导航去公司”、“出发去工作”、“播放歌曲B”、“我想听歌曲B”等,本申请实施例对第一语音输入的具体内容不作任何限定。
步骤502,在第一语音输入与预定义的第一免唤醒指令语义相似的情况下,对第一语音输入做出相应的响应。
其中,第一免唤醒指令用于在不输入预设的唤醒词的情况下指示终端执行第一免唤醒指令对应的操作。
一种可能的实现方式是,终端接收到来自用户的第一语音输入后,基于自然语言处理(natural language processing,NLP),对第一语音输入和预定义的第一免唤醒指令做语义分析,在第一语音输入与预定义的第一免唤醒指令语义相似的情况下,对第一语音输入做出相应的响应。
另一种可能的实现方式是,终端接收到来自用户的第一语音输入后,判断第一语音输入是否属于基于对语音输入学习得到的第二免唤醒指令,其中,基于对语音输入学习得到的第二免唤醒指令与预定义的第一免唤醒语义相似,也就是说,第二免唤醒指令与预定义的第一免唤醒指令语义相似,但用语不同。例如,预定义的第一免唤醒指令为“导航去公司”,基于对语音输入学习得到的第二免唤醒指令为“出发去工作”,二者语义相似,但用语不同,第二免唤醒指令更口语化,第一免唤醒指令是标准的人机交互用语。在第一语音输入属于基于对语音输入学习得到的第二免唤醒指令的情况下(也即,第一语音输入与预定义的第一免唤醒指令语义相似),终端对第一语音输入做出相应的响应。
上述两种可能的实现方式可以只使用一种，也可以结合使用。当二者结合使用时，例如，终端接收到第一语音输入后，可以先判断第一语音输入是否属于基于对语音输入学习得到的第二免唤醒指令，如果属于，则对其做出响应。如果不属于，则进一步基于NLP，对第一语音输入和预定义的第一免唤醒指令做语义分析。如果第一语音输入与预定义的第一免唤醒指令语义相似，则终端对第一语音输入做出相应的响应；如果不相似，则终端不对第一语音输入做出相应的响应。
应理解，上述预定义的第一免唤醒指令和/或基于对语音输入学习得到的第二免唤醒指令可以存储于指令库中。终端在接收到第一语音输入之后，基于指令库中存储的第一免唤醒指令和第二免唤醒指令，确定是否对其做出相应的响应。如果第一语音输入与第一免唤醒指令语义相似，则终端对第一语音输入做出相应的响应。
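下面给出上述两步匹配流程的一个极简示意代码：先查询指令库中已学习的第二免唤醒指令，再与预定义的第一免唤醒指令做语义相似度分析。其中的句向量模型paraphrase-multilingual-MiniLM-L12-v2、相似度阈值0.8以及指令内容均为示例性假设，并非对实际实现方式的限定。

```python
# 极简示意：两步匹配——先精确匹配已学习的第二免唤醒指令，
# 再对预定义的第一免唤醒指令做语义相似度分析（阈值0.8为示例值）。
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

predefined = ["导航去公司", "导航回家", "播放歌曲B"]    # 第一免唤醒指令（示例）
learned = {"出发去工作": "导航去公司"}                  # 第二免唤醒指令 -> 第一免唤醒指令

def match_wakeup_free(utterance: str, threshold: float = 0.8):
    """返回与语音输入匹配的免唤醒指令；不匹配则返回 None。"""
    if utterance in learned:                  # 第一步：查已学习的第二免唤醒指令
        return learned[utterance]
    emb = model.encode(utterance, convert_to_tensor=True)    # 第二步：语义相似度分析
    cand = model.encode(predefined, convert_to_tensor=True)
    scores = util.cos_sim(emb, cand)[0]
    best = int(scores.argmax())
    return predefined[best] if float(scores[best]) >= threshold else None
```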
可选地,在对第一语音输入做出相应的响应之前,上述方法还包括:向用户确认第一语音输入的语义。
示例性地,终端接收到来自用户的第一语音输入后,在第一语音输入与预定义的第一免唤醒指令语义相似的情况下,向用户询问上述语义是否正确,如果用户回复上述语义正确,则终端对上述第一语音输入做出相应的响应。
其中,终端可以通过语音播报的方式向用户询问语义是否正确,还可以通过提示框(如toast)向用户询问语义是否正确,上述提示框中包含上述第一语音输入的语义,或者,还可以通过提示框(如toast)加上语音播报的方式向用户询问语义是否正确。本申请实施例对终端向用户询问语义时所使用的方式不作限定。
可选地，上述方法还包括：向用户提示第一免唤醒指令。也就是说，终端除了执行第一语音输入所指示的操作(如导航去公司)外，还可以向用户提示下一次可以直接使用预定义的第一免唤醒指令来指示终端执行相应的操作。
图6是本申请实施例提供的第一种人机交互方法的交互示意图。如图6所示,响应于用户语音输入“出发去工作”的操作,终端询问“是要导航去公司吗”,用户语音回复“是的”,响应于用户的回复,终端通过用户界面显示前往公司的路线。其中,图6中所示的终端通过语音播报的方式询问用户“是要导航去公司吗”仅为示例,不应对本申请实施例构成任何限定。在其他的实施例中,终端还可以通过提示框(如toast)询问用户“是要导航去公司吗”,或者,还可以通过提示框(如toast)加上语音播报的方式询问用户“是要导航去公司吗”。
可选地，终端还可以通过提示框和/或语音播报的方式，向用户提示下次可以直接使用预定义的第一免唤醒指令。如图6中所示的，终端通过语音播报的方式，提示用户“下次试试说导航去公司”。
下面将详细描述终端基于对语音输入的学习得到第二免唤醒指令的过程。
可选地,在接收来自用户的第一语音输入之前,上述方法还包括:接收来自用户的第二语音输入;在第二语音输入与第一免唤醒指令语义相似的情况下,向用户确认第二语音输入的语义;响应于用户确认第二语音输入的语义的操作,生成与第二语音输入对应的第二免唤醒指令。
示例性地,终端接收到来自用户的第二语音输入后,判断预定义的第一免唤醒指令中是否包含与上述第二语音输入语义相似的指令,例如,可以基于NLP对二者做语义分析,如果确定上述第二语音输入与某一预定义的第一免唤醒指令具有相似的语义,则向用户询问上述语义是否正确,如果用户回复上述语义正确,则终端生成与第二语音输入对应的第二免唤醒指令。另外,终端还可以将上述第二语音输入保存在指令库中。
可选地,接收来自用户的第二语音输入,包括:在预设时长范围内连续多次接收到上述第二语音输入。也即,如果终端在预设时长范围内连续多次接收到上述第二语音输入,终端再向用户确认第二语音输入的语义。这样一来,可以有效地避免用户聊天对话中误提及上述第二语音输入导致终端做出响应,进而有利于提高用户的体验。
示例性地，用户在1分钟内连续两次说出“出发去工作”，“出发去工作”与预定义的第一免唤醒指令“导航去公司”语义相似，则终端连续接收到上述语音输入后，通过提示框和/或语音播报的方式(例如可以参看图6所示)，向用户询问“是要导航去公司吗”，并响应于用户的确认操作，通过用户界面显示前往公司的路线。另外，终端还可以通过提示框和/或语音播报的方式，向用户提示下次可以直接使用第一免唤醒指令。如图6中所示的，终端通过语音播报的方式，提示用户“下次试试说导航去公司”。
图7是本申请实施例提供的对语音输入的用语进行学习的流程示意图。
步骤701,接收来自用户的第二语音输入。
响应于用户的语音操作,终端接收到来自用户的第二语音输入。例如,第二语音输入包括:“出发去工作”、“路上堵车吗”、“避开拥堵的道路”、“选择一条畅通的道路”等等,此处不再一一列举。
步骤702,判断第二语音输入与预定义的第一免唤醒指令是否语义相似。
终端接收到来自用户的第二语音输入后,判断预定义的第一免唤醒指令中是否包含与上述第二语音输入语义相似的指令,如果确定预定义的第一免唤醒指令中不包含与上述第二语音输入语义相似的指令,则执行步骤703,即不响应该第二语音输入;如果上述第二语音输入与某一第一免唤醒指令语义相似,则执行步骤704,即,向用户询问上述第二语音输入是否是上述语义。
步骤703,不响应该第二语音输入。
步骤704,向用户询问上述第二语音输入是否是上述语义。
如果用户回复上述第二语音输入不是上述语义,则终端不响应上述第二语音输入;如果用户回复上述第二语音输入是上述语义,则终端执行步骤705。
终端可以通过语音播报的方式询问用户,还可以通过提示框(如toast)询问用户,或者,还可以通过提示框(如toast)加上语音播报的方式询问用户。本申请对终端的询问方式不作任何限定。
步骤705,生成第二免唤醒指令,并响应第二语音输入。
如果用户回复上述第二语音输入是上述语义,则终端将上述第二语音输入确定为第二免唤醒指令,保存在指令库中,并响应上述第二语音输入。
可选地，终端还可以通过提示框和/或语音播报的方式，向用户提示下一次可以直接使用第一免唤醒指令。关于图7所示的方法的示例可以参看步骤502的相关示例，此处不再列举。
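结合图7的步骤701至705，下面给出学习第二免唤醒指令的极简示意。其中is_similar、ask_user_confirm、respond均为假设的接口名，分别表示语义相似度判断、向用户确认语义的交互以及对语音输入做出响应，具体实现不在此限定。

```python
# 极简示意：图7的学习流程——第二语音输入与某一第一免唤醒指令语义相似时，
# 先向用户确认语义，确认后生成第二免唤醒指令并保存到指令库。
instruction_db = {}   # 指令库：第二免唤醒指令 -> 对应的第一免唤醒指令

def learn_from_input(utterance, predefined, is_similar, ask_user_confirm, respond):
    matched = next((c for c in predefined if is_similar(utterance, c)), None)
    if matched is None:
        return                                    # 步骤703：不响应该第二语音输入
    if not ask_user_confirm(f"是要{matched}吗"):   # 步骤704：向用户询问语义
        return                                    # 用户否认语义，不响应
    instruction_db[utterance] = matched           # 步骤705：生成并保存第二免唤醒指令
    respond(matched)                              # 响应该第二语音输入
```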
图8是本申请实施例提供的对语音输入的用语进行学习的又一流程示意图。图8所示的方法是终端连续多次接收到第二语音输入后，再触发向用户询问的方法。
步骤801,接收来自用户的第二语音输入。
响应于用户的语音操作,终端接收到来自用户的第二语音输入。例如,第二语音输入包括:“出发去工作”、“路上堵车吗”、“避开拥堵的道路”、“选择一条畅通的道路”等等,此处不再一一列举。
步骤802,判断是否连续多次接收到第二语音输入。
终端接收到来自用户的第二语音输入后,判断在预设时长范围内是否连续多次接收到上述第二语音输入。如果终端在预设时长范围内连续多次接收到上述第二语音输入,再执行步骤804,否则,终端执行步骤803,也即不响应该第二语音输入。
步骤803,不响应该第二语音输入。
步骤804,向用户询问上述第二语音输入是否是上述语义。
如果用户回复上述第二语音输入不是上述语义,则终端不响应上述第二语音输入;如果用户回复上述第二语音输入是上述语义,则终端执行步骤805。
终端可以通过语音播报的方式询问用户,还可以通过提示框(如toast)询问用户,或者,还可以通过提示框(如toast)加上语音播报的方式询问用户。本申请对终端的询问方式不作任何限定。
步骤805,生成第二免唤醒指令,并响应上述第二语音输入。
如果用户回复上述第二语音输入是上述语义,则终端将上述第二语音输入确定为第二免唤醒指令,保存在指令库中,并响应上述第二语音输入。
可选地，终端还可以通过提示框和/或语音播报的方式，向用户提示下一次可以直接使用预定义的第一免唤醒指令。
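图8与图7相比多了“预设时长范围内连续多次接收到同一第二语音输入”这一触发条件。下面给出基于滑动时间窗计数的极简示意，其中窗口60秒、门限2次均为示例性取值，并非对预设时长和预设门限的限定。

```python
# 极简示意：在滑动时间窗内对同一第二语音输入计数，达到预设门限才触发向用户的询问。
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # 预设时长（秒），示例性假设
THRESHOLD = 2         # 预设门限（次数），示例性假设

_history = defaultdict(deque)   # 语音输入文本 -> 最近出现时刻的队列

def hit_threshold(utterance: str) -> bool:
    """记录一次语音输入；窗口内累计次数达到门限时返回 True。"""
    now = time.monotonic()
    q = _history[utterance]
    q.append(now)
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()               # 丢弃滑出窗口的旧记录
    return len(q) >= THRESHOLD
```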
基于上述技术方案,终端接收来自用户的第一语音输入,在上述第一语音输入与预定义的第一免唤醒指令语义相似的情况下,对第一语音输入做出相应的响应,也即,在未预先唤醒终端的情况下,即使用户语音输入的语句不是预定义的第一免唤醒指令,只要与预定义的第一免唤醒指令语义相似,终端便可以做出响应,有利于解决预定义的第一免唤醒指令固定且有限导致的终端无响应的问题,进而有利于提高用户的交互体验。
图9是本申请实施例提供的第二种人机交互方法的示意性流程图。
如图9所示,方法900可以包括步骤901和步骤902。下面将详细描述图9所示的各个步骤。
步骤901,接收来自用户的第一语音输入。
其中,该第一语音输入可以是用户未预先唤醒终端的情况下,终端接收到的来自用户的语音输入。
示例性地,响应于用户的语音操作,接收来自用户的第一语音输入,该第一语音输入例如可以是“导航去地点A”、“出发去地点A”、“我想去地点A”、“播放歌曲B”、“我想听歌曲B”等,本申请实施例对第一语音输入的具体内容不作任何限定。
步骤902,在未接收到预设的唤醒词,但第一语音输入包含目标对象的情况下,对第一语音输入做出相应的响应。
其中,上述目标对象是在第一语音输入之前接收到的其他语音输入中被提及次数达到预设门限的对象,上述预设的唤醒词用于唤醒终端,更为详细地,上述预设的唤醒词用于唤醒终端中的语音助手(或智慧助手、智能助手等,本申请对此不作限定)。
换言之,终端在未预先被唤醒的情况下,接收到第一语音输入时,若第一语音输入中包含目标对象,则对第一语音输入做出相应的响应;若第一语音输入中不包含目标对象,则不对第一语音输入做出响应。
可选地,上述目标对象例如可以是地点、媒体名(如歌曲名)或艺术家名等,本申请对目标对象的具体内容不作限定。
下面将详细描述确定目标对象的过程,也即,对语音输入中的第一对象进行学习的过程。
可选地,在接收来自用户的第一语音输入之前,上述方法还包括:接收来自用户的预设的唤醒词;接收来自用户的第二语音输入;在第二语音输入中包含的第一对象在第二语音输入及其之前的语音输入中被提及的次数超过预设门限的情况下,将第一对象确定为目标对象。
其中,第一对象例如可以是地点、媒体名(如歌曲名)或艺术家名等,本申请对目标对象的具体内容不作限定。
示例性地,终端基于接收到的来自用户的预设的唤醒词,被唤醒后,接收到来自用户的第二语音输入时,判断该第二语音输入中是否包含第一对象,在上述第二语音输入中包含第一对象的情况下,判断上述第一对象被提及次数,如果上述第一对象在当前第二语音输入及其之前接收到的语音输入中被提及次数超过预设门限,则将上述第一对象确定为目标对象,以便于下次用户直接说出包含上述目标对象的语音输入时,终端可以做出相应的响应。例如,终端将地点A确定为目标对象,则下一次无需预先唤醒终端,用户直接语音输入“导航去地点A”,终端接收到上述语音输入后,确定上述语音输入中包含地点A,则通过用户界面展示前往地点A的路线。这样一来,下次用户无需预先唤醒终端,简化了交互过程,有利于提高用户的体验。
另外,终端可以记录第一对象在语音输入中被提及的次数,每被提及一次,其对应的次数加1。
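下面给出对第一对象计数并判定目标对象的极简示意。其中门限取3次仅为示例，extract_objects表示从语音输入中抽取地点、媒体名等对象的槽位识别接口，属于假设的占位接口，具体实现不在此限定。

```python
# 极简示意：统计第一对象被提及的次数，达到预设门限即确定为目标对象。
from collections import Counter

MENTION_THRESHOLD = 3          # 预设门限，示例值
mention_count = Counter()      # 第一对象 -> 被提及次数
target_objects = set()         # 已确定的目标对象

def on_voice_input(utterance: str, extract_objects):
    for obj in extract_objects(utterance):    # 如地点A、歌曲B等
        mention_count[obj] += 1               # 每被提及一次，次数加1
        if mention_count[obj] >= MENTION_THRESHOLD:
            target_objects.add(obj)           # 确定为目标对象
```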
可选地,终端还可以基于目标对象,生成包含目标对象的免唤醒指令;向用户提示免唤醒指令,以便于用户下次可以直接使用上述免唤醒指令来控制终端执行对应的操作。
其中，终端可以通过提示框和/或语音播报的方式，向用户提示上述免唤醒指令。
示例性地,终端向用户语音提示“下次试试直接说导航去地点A”,其中,地点A为目标对象。
图10是本申请实施例提供的第二种人机交互方法的交互示意图。如图10所示,响应于用户语音输入“小艺小艺”的操作,终端回复“我在”,也即,终端被唤醒。进一步地,响应于用户语音输入“导航去地点A”的操作,终端回复“好的,开始为你导航”,终端通过用户界面显示前往地点A的路线。如果地点A在当前语音输入及其之前的语音输入中出现的次数超过预设门限,则终端可以通过语音播报的方式提示用户“下次试试直接说导航去地点A”。也就是说,下次用户无需预先唤醒终端,直接语音输入“导航去地点A”,终端即可通过用户界面显示前往地点A的路线。
图11是本申请实施例提供的对第二语音输入中的第一对象进行学习的流程示意图。
步骤1101,接收来自用户的预设的唤醒词。
上述预设的唤醒词用于唤醒终端,更为详细地,用于唤醒终端中的语音助手。
步骤1102,接收来自用户的第二语音输入。
终端被唤醒后,接收来自用户的第二语音输入。例如,第二语音输入包括:“导航去地点A”、“出发去地点A”、“我想去地点A”等等,此处不再一一列举。
步骤1103,判断上述第二语音输入是否包含第一对象。
其中,第一对象例如包括但不限于:地点、媒体名(如歌曲名)或艺术家名等。
示例性地，终端接收到第二语音输入后，判断该第二语音输入中是否包含第一对象(如地点A)。如果该第二语音输入中不包含第一对象，则执行步骤1104；如果该第二语音输入中包含第一对象，则执行步骤1105。
步骤1104，响应该第二语音输入。
步骤1105,判断第一对象在当前语音输入及其之前接收到的语音输入中被提及次数是否超过预设门限。
如果第一对象在当前语音输入及其之前接收到的其他语音输入中被提及次数未超过预设门限，则执行步骤1104，即，响应该第二语音输入；如果第一对象在当前语音输入及其之前接收到的其他语音输入中被提及次数超过预设门限，则执行步骤1106。
步骤1106,将第一对象确定为目标对象。
另外,终端还可以基于该目标对象生成免唤醒指令,并提示用户下次直接使用上述免唤醒指令。
可选地，终端可以通过提示框和/或语音播报的方式，向用户提示下一次可以直接使用上述免唤醒指令。
基于上述技术方案,终端未被预先唤醒的情况下,接收到来自用户的第一语音输入后,若该第一语音输入中包含之前语音输入中被提及次数达到预设门限的对象,则对其做出相应的响应,也即,通过对之前语音输入的学习,保存被提及次数达到预设门限的目标对象后,只要接收到的语音输入中包含上述目标对象,即使不预先唤醒终端,终端也可以对其做出相应的响应,节省了唤醒终端的时间,简化了交互流程,有利于提高用户的交互体验。
图12是本申请实施例提供的第三种人机交互方法的示意性流程图。
如图12所示,方法1200可以包括步骤1201和步骤1202。下面将详细描述图12所示的各个步骤。
步骤1201,接收来自用户的第一语音输入,该第一语音输入属于第一指令集合,该第一指令集合中的指令与预定义的免唤醒指令语义相似。
其中,上述预定义的免唤醒指令用于在不输入预设的唤醒词的情况下指示终端执行免唤醒指令对应的操作。
一种可能的实现方式是,指令库中预存有预定义的第一指令集合和第二指令集合,第一指令集合中的指令与预定义的免唤醒指令语义相似,第二指令集合中的指令为预定义的免唤醒指令。终端接收到第一语音输入,确定该第一语音输入属于第一指令集合。
另一种可能的实现方式是,指令库中预存有预定义的第二指令集合和基于语音输入学习到的与第二指令集合中的指令对应的第一指令集合,终端接收到第一语音输入,确定该第一语音输入属于第一指令集合。其中,终端基于语音输入学习到的与第二指令集合中的指令对应的第一指令集合的方法可以参看图5和图9的相关描述,此处不再赘述。
表1是指令库中预存的第一指令集合和第二指令集合的示例。
表1
第一指令集合　　　　第二指令集合
路上堵车吗　　　　　查看是否拥堵
下滑　　　　　　　　向下滑动
缩小　　　　　　　　把页面缩小
出发去工作　　　　　导航去公司
我想回家　　　　　　导航回家
如表1所示,第二指令集合中的指令是预定义的免唤醒指令,如“查看是否拥堵”、“向下滑动”、“把页面缩小”、“导航去公司”、“导航回家”等,第一指令集合中的指令是与第二指令集合中的指令语义相似的指令,如“路上堵车吗”、“下滑”、“缩小”、“出发去工作”、“我想回家”等。可以看出,第一指令集合中的指令与第二指令集合中的指令语义相似,但用语不同,第一指令集合中的指令更口语化,第二指令集合中的指令是标准的人机交互指令。
应理解,上述指令的划分仅为示例,不应对本申请实施例构成任何限定,在其他实施例中,也可以是不同的划分形式,例如,第一指令集合可以继续划分为第一指令子集合1、第一指令子集合2,第一指令子集合2中的指令比第一指令子集合1中的指令更口语化。终端响应第一指令子集合2中的指令的条件比响应第一指令子集合1中的指令的条件更严格。
步骤1202,在满足预设条件的情况下,响应上述第一语音输入。
可选地,上述预设条件包括以下至少一项:与终端距离处于预设范围内的用户的数量不超过阈值;用户处于预定义的位置;第一语音输入所来自的用户不属于预设人群;或,接收到第一语音输入的时间落入预设时段。
其中，与终端距离处于预设范围内的用户的数量不超过阈值，也即，在与终端距离处于预设范围内的用户的数量较少的情况下，可以响应上述第一语音输入。不难理解，周围用户数量越少，用户误提及第一语音输入的可能性越小，也即，用户可能确实是希望终端执行对应的操作；相对地，周围用户数量越多，用户误提及第一语音输入的可能性越大。因此，上述预设条件可以有效地避免用户误提及第一语音输入导致终端响应。
用户处于预定义的位置，例如，终端只响应来自距离自身最近的用户的第一语音输入；又如，用户处于景区时，希望终端提供服务的可能性更大，终端可以响应来自用户的第一语音输入。
第一语音输入所来自的用户不属于预设人群,预设人群例如小孩、老人等,可以理解,对于预设人群,其发出的指令可能存在危险性,终端可以不对其做出响应。
接收到第一语音输入的时间落入预设时段,预设时段例如可以是上班时段(或称为通勤时段),这些时段终端可以响应上述第一语音输入,如果是其他时段,终端可以只响应预定义的免唤醒指令。
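下面给出将上述预设条件组织为判定函数的极简示意。示意中要求四项条件同时满足仅为演示，实际方案可以按“以下至少一项”任意组合；其中的人数阈值、预设时段等取值均为示例性假设。

```python
# 极简示意：预设条件的判定。此处四项条件同时满足仅为演示，实际可任意组合。
from dataclasses import dataclass
from datetime import time as dtime

@dataclass
class Context:
    nearby_users: int         # 与终端距离处于预设范围内的用户数量
    speaker_is_driver: bool   # 说话人是否处于主驾位置（预定义的位置）
    speaker_is_child: bool    # 说话人是否属于预设人群（如小孩）
    now: dtime                # 接收到第一语音输入的时间

USER_LIMIT = 1                           # 人数阈值，示例值
COMMUTE = (dtime(7, 0), dtime(10, 0))    # 预设时段（如上班时段），示例值

def preset_conditions_met(ctx: Context) -> bool:
    return (ctx.nearby_users <= USER_LIMIT            # 数量不超过阈值
            and ctx.speaker_is_driver                 # 处于预定义的位置
            and not ctx.speaker_is_child              # 不属于预设人群
            and COMMUTE[0] <= ctx.now <= COMMUTE[1])  # 落入预设时段
```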
下面以上述方法应用于车为例(如,终端以车机为例),列举上述几种场景下,终端对第一语音输入的响应情况。
场景一:在车内存在一个乘客的情况下,车机响应第一语音输入;或,在车内存在多个乘客的情况下,车机不响应上述第一语音输入。
示例性地,车机可以基于车内的摄像头判断当前车内的人数,在车内存在一个乘客,也即,车内只有主驾的情况下,车机响应第一语音输入。在车内存在多个乘客的情况下,车机不响应第一语音输入。另外,车机在车内存在一个或多个乘客的情况下,均可以响应第二指令集合中的指令。这样一来,可以大大降低车内存在多个乘客的情况下,聊天对话中误唤醒车机的可能性。
场景二:在语音输入来自主驾的情况下,车机响应第一语音输入;或在语音输入来自除主驾之外的其他乘客的情况下,车机不响应第一语音输入。
示例性地,车机接收到第一语音输入后,可以基于与座椅的交互,获取到该第一语音输入是来自于主驾还是其他乘客,若该第一语音输入来自于主驾,则车机响应该第一语音输入;若该第一语音输入来自于其他乘客,则车机不响应该第一语音输入。另外,无论是来自主驾还是其他乘客的第二指令集合中的指令,车机均可以响应。
场景三:在第一语音输入所来自的用户不属于预设人群的情况下,车机响应第一语音输入;或,在第一语音输入所来自的用户属于预设人群的情况下,车机不响应第一语音输入。
示例性地,车机可以判断该第一语音输入是否来自于预设人群,以小孩为例,如果该第一语音输入来自于小孩,则车机不响应上述第一语音输入,如果该第一语音输入不是来自于小孩,则车机响应上述第一语音输入,这样一来,可以有效地避免小孩误说出第一语音输入导致的车机做出响应的情况。
场景四:在接收到语音输入的时间落入预设时段的情况下,车机响应第一语音输入;或,在接收到语音输入的时间未落入预设时段的情况下,车机不响应第一语音输入。
示例性地,预设时段以上班时段为例,车机如果在上班时段内接收到第一语音输入,则可以响应上述第一语音输入;如果在非上班时段内接收到上述第一语音输入,可以不响应上述第一语音输入。
应理解，上文所述的几个场景中，在终端确定响应第一语音输入的情况下，可以先向用户确认语音输入的语义，响应于用户确认上述语义的操作，响应上述第一语音输入。
还应理解,上述几种可能的场景也可以结合,例如,车机在第一语音输入来自于主驾,且接收到语音输入的时间落入预设时段的情况下,响应上述第一语音输入。又例如,车机在车内只有一个乘客,且接收到第一语音输入的时间落入预设时段的情况下,响应上述第一语音输入。为了简洁,此处不再一一列举。
图13是本申请实施例提供的根据场景确定是否响应第一语音输入的流程示意图。图13所示的方法是场景二和场景四结合的情况。
步骤1301,接收来自用户的第一语音输入。
在未预先唤醒车机的情况下，响应于用户输入第一语音输入的操作，车机接收到来自用户的第一语音输入。该第一语音输入属于第一指令集合，该第一指令集合中的指令与预定义的免唤醒指令语义相似。
步骤1302,判断第一语音输入是否来自于主驾。
车机接收到第一语音输入之后,判断该第一语音输入是否来自于主驾,若上述第一语音输入不是来自于主驾,则车机执行步骤1303;若上述第一语音输入来自于主驾,则执行步骤1304。
步骤1303,不响应该第一语音输入。
若上述第一语音输入不是来自于主驾,则车机不响应上述第一语音输入。另外,车机可以响应来自用户的第二指令集合中的指令。
步骤1304,判断接收到第一语音输入的时间是否落入预设时段。
若上述第一语音输入来自于主驾,则车机继续判断接收到第一语音输入的时间是否落入预设时段。如果接收到第一语音输入的时间落入预设时段,则车机可以执行步骤1305;若接收到第一语音输入的时间未落入预设时段,则车机可以执行步骤1306。
步骤1305,响应该第一语音输入。
如果接收到第一语音输入的时间落入预设时段,则车机可以响应该第一语音输入。
步骤1306,响应该第一语音输入,但需要向用户询问。
若接收到第一语音输入的时间未落入预设时段,则车机可以响应该第一语音输入,但是在响应该第一语音输入之前需要向用户确认该第一语音输入的语义,在用户确认语义的情况下,再响应上述第一语音输入。
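按照图13的判断顺序，下面给出场景二与场景四结合时的极简决策示意。其中confirm_with_user表示向用户确认语义的交互接口，respond表示对语音输入做出响应，均为假设的接口名。

```python
# 极简示意：图13的决策流程——先判断是否来自主驾，再判断时间是否落入预设时段。
def handle_first_input(utterance, from_driver, in_preset_period,
                       confirm_with_user, respond):
    if not from_driver:
        return                     # 步骤1303：不响应该第一语音输入
    if in_preset_period:
        respond(utterance)         # 步骤1305：直接响应
    elif confirm_with_user(utterance):
        respond(utterance)         # 步骤1306：向用户确认语义后再响应
```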
基于上述技术方案,终端接收到与预定义的免唤醒指令语义相似的第一语音输入后,在满足预设条件的情况下,响应第一语音输入,也就是说,对于与预定义的免唤醒指令语义相似的第一语音输入,满足预设条件,终端才会做出相应的响应,并不是任何情况下都能响应,这样可以避免用户误提及第一语音输入导致终端响应。可以想象,第一语音输入可能相对预定义的免唤醒指令来说比较口语化,如果任何情况下都做出响应,很可能出现用户交谈过程中频繁触发终端响应的情况,因此,通过设置预设条件,在满足预设条件的情况下,终端才会做出相应的响应,有利于大大提高用户的交互体验。
图14是本申请实施例提供的第四种人机交互方法的流程示意图。
如图14所示,该方法1400可以包括步骤1401和步骤1402。下面将详细描述图14所示的各个步骤。
步骤1401,接收来自用户的第一语音输入,该第一语音输入不属于预定义的免唤醒指令。
其中,该第一语音输入可以是用户未预先唤醒终端的情况下,终端接收到的来自用户的语音输入。
示例性地,响应于用户的语音操作,接收来自用户的第一语音输入,该第一语音输入例如可以是“出发去公司”、“导航到公司”、“出发去工作”等,本申请实施例对语音输入的具体内容不作任何限定。
步骤1402,在上述第一语音输入与预定义的免唤醒指令中的第一免唤醒指令语义相似的情况下,引导用户输入第一免唤醒指令。
终端确定出上述第一语音输入与第一免唤醒指令具有相似的语义,则引导用户输入第一免唤醒指令,以便于终端对上述第一免唤醒指令做出响应。
其中,终端可以基于自然语言处理中的语义分析确定出第一语音输入与第一免唤醒指令语义相似。
示例性地，上述语音输入为“出发去工作”，与其具有相似语义的第一免唤醒指令为“导航去公司”，终端接收到“出发去工作”的语音输入后，确定上述语音输入不属于预定义的免唤醒指令，并识别出上述语音输入的语义与“导航去公司”相似。因此，终端可以引导用户说出“导航去公司”。
可选地,上述引导用户输入第一免唤醒指令,包括:通过提示框和/或语音播报,引导用户输入第一免唤醒指令。
终端可以通过提示框引导用户发出第一免唤醒指令。例如,终端确定出上述语音输入与指令库中的第一免唤醒指令具有相似的语义之后,通过提示框在用户界面上显示上述第一免唤醒指令。终端还可以通过语音播报的方式引导用户发出第一免唤醒指令。例如,终端确定出上述语音输入与指令库中的第一免唤醒指令具有相似的语义之后,语音提醒用户使用上述第一免唤醒指令。终端可以通过提示框加上语音播报的方式,引导用户发出第一免唤醒指令。
例如，终端确定出上述语音输入与指令库中的第一免唤醒指令具有相似的语义之后，先通过提示框在用户界面上显示上述第一免唤醒指令，如果预设时长范围内用户仍未发出上述第一免唤醒指令，则终端语音提醒用户使用上述第一免唤醒指令；或，通过提示框在用户界面上显示上述第一免唤醒指令，同时语音提醒用户使用上述第一免唤醒指令。本申请对终端的引导方式不作限定。
可选地,通过提示框和/或语音播报,引导用户发出第一免唤醒指令,包括:通过提示框提示用户发出第一免唤醒指令,提示框中包含第一免唤醒指令;在预设时长范围内通过提示框提示的次数达到预设门限,但用户未发出第一免唤醒指令的情况下,通过语音播报,引导用户发出第一免唤醒指令。
示例性地,终端第一次通过提示框提示用户发出第一免唤醒指令,提示框中包含第一免唤醒指令,第二次还是通过提示框提示用户发出第一免唤醒指令,在1分钟内通过提示框提示的次数达到两次,但用户未发出第一免唤醒指令的情况下,通过语音播报,引导用户发出第一免唤醒指令。
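下面给出该提示升级策略的极简示意：预设时长内通过提示框提示的次数达到门限而用户仍未发出第一免唤醒指令时，升级为提示框加语音播报。其中窗口与门限取值仅为示例性假设，show_toast、speak为假设的接口名。

```python
# 极简示意：提示框提示次数达到门限后，升级为提示框加语音播报。
import time

PROMPT_WINDOW = 60    # 预设时长（秒），示例值
PROMPT_LIMIT = 2      # 预设门限（次数），示例值
_prompt_log = []      # 记录通过提示框提示的时刻

def guide_user(command, show_toast, speak):
    now = time.monotonic()
    _prompt_log[:] = [t for t in _prompt_log if now - t <= PROMPT_WINDOW]
    _prompt_log.append(now)
    show_toast(f"试试说{command}")        # 先通过提示框引导
    if len(_prompt_log) > PROMPT_LIMIT:   # 达到门限仍未收到指令：加语音播报
        speak(f"试试说{command}")
```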
图15是本申请实施例提供的引导用户发出第一免唤醒指令的交互示意图。
响应于用户语音输入“出发去工作”的操作,终端确定出该语音输入与指令库中的第一免唤醒指令“导航去公司”具有相似的语义,因此,第一次通过提示框提示用户“试试说导航去公司”,第二次用户还是使用的“出发去工作”,终端继续通过提示框提示用户“试试说导航去公司”,第三次用户还是使用的“出发去工作”,终端则通过提示框提示用户“试试说导航去公司”,并通过语音提示用户“试试说导航去公司”。
基于上述技术方案,终端接收到第一语音输入,该第一语音输入不属于预定义的免唤醒指令,但该第一语音输入与预定义的免唤醒指令中的第一免唤醒指令语义相似,则终端引导用户输入对应的第一免唤醒指令,以便于用户输入第一免唤醒指令后,终端对其做出相应的响应,相比于终端不响应也不提示,可以大大提高用户的交互体验。
图16是本申请实施例提供的第五种人机交互方法的示意性流程图。
如图16所示,该方法可以包括步骤1601至步骤1605。下面将详细描述图16所示的各个步骤。
步骤1601,接收来自用户的第一语音输入。
响应于用户输入第一语音输入的操作,终端接收到来自用户的第一语音输入。该第一语音输入是在未接收到来自用户的预设的唤醒词的情况下接收的,例如,第一语音输入包括:“导航去地点A”、“我想去地点A”、“地点A在哪里”等等,此处不再一一列举。
步骤1602,判断上述第一语音输入是否是用于请求导航。
换言之，终端接收到上述第一语音输入后，判断该第一语音输入的意图是否为请求导航。如果该第一语音输入不是用于请求导航，则终端执行步骤1603；如果该第一语音输入是用于请求导航，则终端执行步骤1604。
步骤1603,不响应该第一语音输入。
步骤1604,向用户询问请求导航的目的地。
如果该第一语音输入是用于请求导航，则终端向用户询问请求导航的目的地。例如，该语音输入为“导航去地点A”，则终端确定出该第一语音输入用于请求导航，进一步地，终端向用户询问导航的目的地，如向用户询问“你想去哪里”。用户反馈“地点A”，则终端接收到用户的反馈后，从云端获取到地点A的路线。
终端可以通过语音播报的方式询问用户,还可以通过提示框(如toast)询问用户,或者,还可以通过提示框(如toast)加上语音播报的方式询问用户。本申请对终端的询问方式不作任何限定。例如,终端前两次通过提示框(如toast)询问用户,第三次通过提示框(如toast)加上语音播报的方式询问用户。
步骤1605,基于用户反馈的目的地,为用户提供导航服务。
终端获取到上述目的地的路线后,为用户提供导航服务。例如,通过用户界面显示目的地的路线。
可选地，终端还可以基于该目的地，生成包含上述目的地的免唤醒指令，终端还可以通过提示框和/或语音播报的方式，向用户提示下一次可以直接使用上述免唤醒指令。
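结合图16的步骤1601至1605，下面给出导航意图处理的极简示意。其中detect_intent、ask、fetch_route_from_cloud、show_route均为假设的接口名，意图识别与路线获取的具体实现不在此限定。

```python
# 极简示意：图16的流程——识别导航意图、询问目的地、基于用户反馈提供导航。
def handle_navigation(utterance, detect_intent, ask,
                      fetch_route_from_cloud, show_route):
    if detect_intent(utterance) != "navigation":
        return                                    # 步骤1603：不响应该第一语音输入
    destination = ask("你想去哪里")                # 步骤1604：询问请求导航的目的地
    route = fetch_route_from_cloud(destination)   # 从云端获取目的地的路线
    show_route(route)                             # 步骤1605：为用户提供导航服务
```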
图17是本申请实施例提供的第五种人机交互方法的交互示意图。如图17所示,响应于用户语音输入“导航去地点A”的操作,终端通过语音播报的方式向用户询问“你要去哪里”。用户回复“地点A”,响应于用户的回复,终端通过用户界面向用户展示前往地点A的路线。
基于上述技术方案，在未预先唤醒终端的情况下，终端接收到来自用户的第一语音输入后，发现其意图是想请求导航，便可以向用户询问导航的目的地，并根据用户反馈的目的地，向用户提供导航服务，无需预先唤醒终端，简化了交互流程，有利于提高用户的交互体验。
本申请实施例还提供了一种终端,该终端包括用于执行上述图5至图17所述实施例中任意一个实施例中终端所执行的步骤的相应的模块。该终端可以用于实现上述图5至图17所述实施例中任意一个实施例中所述的方法。该终端包括的模块可以通过软件和/或硬件方式实现。
本申请实施例还提供一种终端,该终端包括存储器和处理器,其中,存储器用于存储计算机程序,处理器用于调用并执行计算机程序,以使得该终端实现上述图5至图17所述实施例中任意一个实施例中所述的方法。
本申请实施例还提供一种车辆,该车辆上部署有如前所述的终端,所述终端例如可以是车机。
本申请还提供了一种芯片系统，所述芯片系统包括至少一个处理器，用于实现上述图5至图17所述实施例中任意一个实施例中所述的方法。
在一种可能的设计中，所述芯片系统还包括存储器，所述存储器用于保存程序指令和数据，存储器位于处理器之内或处理器之外。
该芯片系统可以由芯片构成，也可以包含芯片和其他分立器件。
本申请还提供一种计算机程序产品,所述计算机程序产品包括计算机可读指令,当所述计算机可读指令被计算机运行时,实现上述图5至图17所述实施例中任意一个实施例中所述的方法。
本申请还提供一种计算机可读存储介质,所述计算机可读存储介质存储有计算机可读指令。当所述计算机可读指令被计算机运行时,实现上述图5至图17所述实施例中任意一个实施例中所述的方法。
应理解,本申请实施例中的处理器可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其它可编程逻辑器件、分立门电路或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
还应理解，本申请实施例中的存储器可以是易失性存储器或非易失性存储器，或可包括易失性和非易失性存储器两者。其中，非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM)，其用作外部高速缓存。通过示例性但不是限制性说明，许多形式的RAM可用，例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。应注意，本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
本说明书中使用的术语“单元”、“模块”等,可用于表示计算机相关的实体、硬件、固件、硬件和软件的组合、软件、或执行中的软件。
本领域普通技术人员可以意识到，结合本文中所公开的实施例描述的各种说明性逻辑块(illustrative logical block)和步骤(step)，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。在本申请所提供的几个实施例中，应该理解到，所揭露的装置、设备和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。
所述作为分立部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
在上述实施例中，各功能单元的功能可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时，可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令(程序)。在计算机上加载和执行所述计算机程序指令(程序)时，全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其它可编程装置。所述计算机指令可以存储在计算机可读存储介质中，或者从一个计算机可读存储介质向另一个计算机可读存储介质传输，例如，所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如，软盘、硬盘、磁带)、光介质(例如，数字通用光盘(digital video disc,DVD))、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (23)

  1. 一种人机交互方法,其特征在于,包括:
    接收来自用户的第一语音输入;
    在确定所述第一语音输入与预定义的第一免唤醒指令语义相似的情况下,对所述第一语音输入做出相应的响应,所述第一免唤醒指令用于在不输入预设的唤醒词的情况下指示终端执行所述第一免唤醒指令对应的操作。
  2. 如权利要求1所述的方法,其特征在于,在所述接收来自用户的第一语音输入之前,所述方法还包括:
    接收来自所述用户的第二语音输入;
    在所述第二语音输入与所述第一免唤醒指令语义相似的情况下,向所述用户确认所述第二语音输入的语义;
    响应于所述用户确认所述第二语音输入的语义的操作,生成与所述第二语音输入对应的第二免唤醒指令。
  3. 如权利要求2所述的方法,其特征在于,所述第一语音输入与预定义的第一免唤醒指令语义相似,包括:
    所述第一语音输入与所述第二免唤醒指令相同。
  4. 如权利要求2或3所述的方法,其特征在于,所述接收来自用户的第二语音输入,包括:
    在预设时长范围内连续多次接收到所述第二语音输入。
  5. 如权利要求1至4中任一项所述的方法,其特征在于,在对所述第一语音输入做出相应的响应之前,所述方法还包括:
    向所述用户确认所述第一语音输入的语义。
  6. 如权利要求5所述的方法,其特征在于,所述向所述用户确认所述第一语音输入的语义,包括:
    通过提示框和/或语音播报,向所述用户确认所述第一语音输入的语义。
  7. 如权利要求1至6中任一项所述的方法,其特征在于,所述方法还包括:
    向所述用户提示所述第一免唤醒指令。
  8. 一种人机交互方法,其特征在于,包括:
    接收来自用户的第一语音输入;
    在未接收到预设的唤醒词,但所述第一语音输入包含目标对象的情况下,对所述第一语音输入做出相应的响应,所述目标对象是在所述第一语音输入之前接收到的其他语音输入中被提及次数达到预设门限的对象,所述预设的唤醒词用于唤醒终端。
  9. 如权利要求8所述的方法,其特征在于,在所述接收来自用户的第一语音输入之前,所述方法还包括:
    接收来自所述用户的预设的唤醒词;
    接收来自所述用户的第二语音输入;
    在所述第二语音输入中包含的第一对象在所述第二语音输入及其之前的语音输入中被提及的次数超过所述预设门限的情况下,将所述第一对象确定为目标对象。
  10. 如权利要求9所述的方法,其特征在于,所述方法还包括:
    基于所述目标对象,生成包含所述目标对象的免唤醒指令;
    向所述用户提示所述免唤醒指令。
  11. 一种人机交互方法,其特征在于,包括:
    接收来自用户的第一语音输入,所述第一语音输入属于第一指令集合,所述第一指令集合中的指令与预定义的免唤醒指令语义相似;
    在满足预设条件的情况下,响应所述第一语音输入。
  12. 如权利要求11所述的方法,其特征在于,所述预设条件包括以下至少一项:
    与终端距离处于预设范围内的用户的数量不超过阈值;
    用户处于预定义的位置;
    所述第一语音输入所来自的用户不属于预设人群;或,
    接收到所述第一语音输入的时间落入预设时段。
  13. 如权利要求12所述的方法,其特征在于,所述方法应用于车,所述与终端距离处于预设范围内的用户的数量不超过阈值,包括:所述车内存在一个乘客;或,
    所述用户处于预定义的位置,包括:所述用户处于主驾的位置。
  14. 一种人机交互方法,其特征在于,包括:
    在未接收到来自用户的预设的唤醒词的情况下,根据来自所述用户的第一语音输入,确定所述第一语音输入用于请求导航;
    向所述用户询问请求导航的目的地;
    基于所述用户反馈的所述目的地,为所述用户提供导航服务。
  15. 如权利要求14所述的方法,其特征在于,所述方法还包括:
    生成包含所述目的地的免唤醒指令;
    向所述用户提示所述免唤醒指令。
  16. 一种人机交互方法,其特征在于,包括:
    接收来自用户的第一语音输入,所述第一语音输入不属于预定义的免唤醒指令;
    在确定所述第一语音输入与预定义的免唤醒指令中的第一免唤醒指令语义相似的情况下,引导所述用户输入所述第一免唤醒指令。
  17. 如权利要求16所述的方法,其特征在于,所述引导所述用户输入所述第一免唤醒指令,包括:
    通过提示框和/或语音播报,引导所述用户输入所述第一免唤醒指令。
  18. 如权利要求17所述的方法,其特征在于,所述通过提示框和/或语音播报,引导所述用户输入所述第一免唤醒指令,包括:
    通过所述提示框提示所述用户输入所述第一免唤醒指令,所述提示框中包含所述第一免唤醒指令;
    在预设时长范围内通过所述提示框提示的次数达到预设门限,但所述用户未发出所述第一免唤醒指令的情况下,通过所述语音播报,引导所述用户输入所述第一免唤醒指令。
  19. 一种计算机设备,其特征在于,包括用于执行如权利要求1至18中任一项所述方法的单元。
  20. 一种计算机设备,其特征在于,包括处理器和存储器,其中,
    所述存储器用于存储计算机可读指令;
    所述处理器用于读取所述计算机可读指令,以使得所述计算机设备实现如权利要求1至18中任一项所述的方法。
  21. 一种车辆,其特征在于,用于实现如权利要求1至18中任一项所述的方法;或,包括如权利要求19或20所述的计算机设备。
  22. 一种计算机可读存储介质,其特征在于,所述存储介质中存储有计算机可读指令,当所述计算机可读指令被计算机执行时,实现如权利要求1至18中任一项所述的方法。
  23. 一种计算机程序产品,其特征在于,所述计算机程序产品包括计算机可读指令,当所述计算机可读指令被计算机运行时,实现如权利要求1至18中任一项所述的方法。
PCT/CN2023/116615 2022-09-05 2023-09-01 人机交互方法及相关装置 WO2024051611A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211079452.4A CN117690423A (zh) 2022-09-05 2022-09-05 人机交互方法及相关装置
CN202211079452.4 2022-09-05

Publications (1)

Publication Number Publication Date
WO2024051611A1 true WO2024051611A1 (zh) 2024-03-14

Family

ID=90133973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116615 WO2024051611A1 (zh) 2022-09-05 2023-09-01 人机交互方法及相关装置

Country Status (2)

Country Link
CN (1) CN117690423A (zh)
WO (1) WO2024051611A1 (zh)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509225A (zh) * 2018-03-28 2018-09-07 联想(北京)有限公司 一种信息处理方法及电子设备
CN108520748A (zh) * 2018-02-01 2018-09-11 百度在线网络技术(北京)有限公司 一种智能设备功能引导方法及***
CN108735216A (zh) * 2018-06-12 2018-11-02 广东小天才科技有限公司 一种基于语义识别的语音搜题方法及家教设备
WO2020073288A1 (zh) * 2018-10-11 2020-04-16 华为技术有限公司 一种触发电子设备执行功能的方法及电子设备
CN111028846A (zh) * 2019-12-25 2020-04-17 北京梧桐车联科技有限责任公司 免唤醒词注册的方法和装置
CN111354360A (zh) * 2020-03-17 2020-06-30 北京百度网讯科技有限公司 语音交互处理方法、装置和电子设备
CN111816192A (zh) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 语音设备及其控制方法、装置和设备
CN112802465A (zh) * 2019-11-14 2021-05-14 北京安云世纪科技有限公司 一种语音控制方法及***
US20210183386A1 (en) * 2019-08-15 2021-06-17 Huawei Technologies Co., Ltd. Voice Interaction Method and Apparatus, Terminal, and Storage Medium
CN114155855A (zh) * 2021-12-17 2022-03-08 海信视像科技股份有限公司 语音识别方法、服务器以及电子设备
CN114594923A (zh) * 2022-02-16 2022-06-07 北京梧桐车联科技有限责任公司 车载终端的控制方法、装置、设备及存储介质
CN115662410A (zh) * 2022-08-12 2023-01-31 安徽讯飞寰语科技有限公司 车机语音交互方法、车机
CN115705844A (zh) * 2021-08-12 2023-02-17 上海擎感智能科技有限公司 语音交互配置方法、电子设备和计算机可读介质

Also Published As

Publication number Publication date
CN117690423A (zh) 2024-03-12

Similar Documents

Publication Publication Date Title
WO2021051989A1 (zh) 一种视频通话的方法及电子设备
WO2021078284A1 (zh) 一种内容接续方法及电子设备
WO2021027267A1 (zh) 语音交互方法、装置、终端及存储介质
WO2021052282A1 (zh) 数据处理方法、蓝牙模块、电子设备与可读存储介质
WO2020078337A1 (zh) 一种翻译方法及电子设备
WO2021204098A1 (zh) 语音交互方法及电子设备
WO2021000817A1 (zh) 环境音处理方法及相关装置
WO2020239013A1 (zh) 一种交互方法和终端设备
WO2020073288A1 (zh) 一种触发电子设备执行功能的方法及电子设备
WO2020006711A1 (zh) 一种消息的播放方法及终端
WO2022161077A1 (zh) 语音控制方法和电子设备
CN113488042B (zh) 一种语音控制方法及电子设备
WO2023024852A1 (zh) 一种短消息通知方法和电子终端设备
JP7234379B2 (ja) スマートホームデバイスによってネットワークにアクセスするための方法および関連するデバイス
CN113133095A (zh) 一种降低移动终端功耗的方法及移动终端
CN114115770A (zh) 显示控制的方法及相关装置
EP4221172A1 (en) Control method and apparatus for electronic device
CN113301544B (zh) 一种音频设备间语音互通的方法及设备
WO2024051611A1 (zh) 人机交互方法及相关装置
EP4354831A1 (en) Cross-device method and apparatus for synchronizing navigation task, and device and storage medium
CN113950037B (zh) 一种音频播放方法及终端设备
WO2022135254A1 (zh) 一种编辑文本的方法、电子设备和***
CN114327198A (zh) 控制功能推送方法及设备
CN113141665B (zh) 按需系统消息的接收方法、装置和用户设备
WO2022068654A1 (zh) 一种终端设备交互方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23862305

Country of ref document: EP

Kind code of ref document: A1