CN112017652A - Interaction method and terminal equipment

Interaction method and terminal equipment

Info

Publication number: CN112017652A
Application number: CN201910472665.5A
Authority: CN (China)
Prior art keywords: input content, terminal device, content, control instruction, terminal equipment
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 胡鹏, 黄德才
Current assignee: Huawei Technologies Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority application: CN201910472665.5A
Related application: PCT/CN2020/092888 (WO2020239013A1)

Classifications

    • G10L 17/22: Speaker identification or verification techniques; interactive procedures, man-machine interfaces
    • G10L 15/22: Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/26: Speech recognition; speech-to-text systems
    • H04M 1/725: Substation equipment; mobile telephones and cordless telephones (devices for establishing wireless links to base stations without route selection)
    • G10L 2015/223: Execution procedure of a spoken command
    (Parent classes: G Physics; G10 Musical instruments, acoustics; G10L Speech analysis techniques or speech synthesis, speech recognition, speech or voice processing techniques, speech or audio coding or decoding; H Electricity; H04 Electric communication technique; H04M Telephonic communication.)

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the present application provide an interaction method and a terminal device, relate to the field of terminals, and enhance the flexibility with which a user can remotely operate an intelligent voice interaction device, making fuller use of that device's capabilities. The method includes the following steps: a first terminal device receives first input content from a second terminal device, where the first input content includes first voice content and/or first text content; the first terminal device determines a control instruction corresponding to the first input content; and the first terminal device processes the control instruction corresponding to the first input content. The embodiments of the present application apply to remote interaction scenarios.

Description

Interaction method and terminal equipment
Technical Field
The present application relates to the field of terminals, and in particular, to an interaction method and a terminal device.
Background
Current intelligent voice interaction devices (e.g., smart speakers) connect users and devices through voice. When the user is near the smart speaker, the user can control it directly by voice, for example asking it to play music or tell a story. However, when the user is far from the smart speaker (e.g., more than 5 meters away), the user cannot interact with it at all. In other words, the smart speaker can be used only at close range; when the user is away from it, the speaker sits idle and its resources are wasted.
To address this problem, as shown in fig. 1, the smart speaker and a mobile phone may communicate through wireless fidelity (WiFi) or the Internet, and the user may send instructions to the smart speaker by operating specific menus in an application (APP) on the mobile phone, controlling the smart speaker to perform operations such as playing music.
However, in this scheme the user can perform only the limited operations preset in the mobile phone APP, such as playing or pausing music, which restricts the functions of the smart speaker.
Disclosure of Invention
The embodiments of the present application provide an interaction method and a terminal device, which enhance the flexibility with which a user can remotely operate an intelligent voice interaction device and make fuller use of that device's capabilities.
In a first aspect, an embodiment of the present application provides an interaction method, including: a first terminal device receives first input content from a second terminal device, where the first input content includes first voice content and/or first text content; the first terminal device determines a control instruction corresponding to the first input content; and the first terminal device processes the control instruction corresponding to the first input content.
Based on the method provided by this embodiment, the first terminal device can receive the first input content from the second terminal device, determine the control instruction corresponding to the first input content, and process that control instruction. The control instruction is thus determined by the first terminal device from the first input content (content the user enters remotely through the second terminal device), rather than the user being restricted to the limited instructions preset in a mobile phone APP. This enhances the flexibility of remote interaction between the user and the first terminal device and makes full use of the first terminal device's capabilities.
In one possible implementation, the first terminal device determining the control instruction corresponding to the first input content includes: the first terminal device sends the first input content to a server, and the first terminal device receives, from the server, the control instruction corresponding to the first input content. The server may include an automatic speech recognition (ASR) engine and a natural language processing (NLP) engine: the ASR engine may convert the first voice content into first text information, and the NLP engine may derive the control instruction from the semantics of the first text information.
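For illustration only, the server-side flow just described can be sketched as an ASR stage feeding an NLP stage. The sketch below is a minimal Python mock-up under that assumption, not the claimed implementation; every class, function, and string in it is hypothetical.

```python
# Minimal sketch of the server pipeline: ASR converts first voice
# content to first text information, NLP maps that text to a control
# instruction. All names and the stubbed logic are hypothetical.
from dataclasses import dataclass, field

@dataclass
class ControlInstruction:
    vertical: str                       # recognized intent ("vertical")
    slots: dict = field(default_factory=dict)

def asr_engine(first_voice_content: bytes) -> str:
    """Stub: convert speech audio into text."""
    return "please clean the living room"      # placeholder transcription

def nlp_engine(first_text_information: str) -> ControlInstruction:
    """Stub: derive a control instruction from the text's semantics."""
    if "clean" in first_text_information:
        return ControlInstruction("clean", {"area": "living room"})
    return ControlInstruction("unknown")

def determine_control_instruction(first_input_content) -> ControlInstruction:
    """Voice and/or text in, control instruction out."""
    if isinstance(first_input_content, bytes):  # first voice content
        first_input_content = asr_engine(first_input_content)
    return nlp_engine(first_input_content)      # first text content
```

On the device side, the first terminal device would simply upload the content and wait for the returned instruction.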
In one possible implementation, the first terminal device processing the control instruction corresponding to the first input content includes: the first terminal device executes the control instruction corresponding to the first input content; or the first terminal device sends the control instruction corresponding to the first input content to a third terminal device. The first terminal device can thus receive the control instruction issued by the server and forward it to the third terminal device, thereby controlling the third terminal device, fully exploiting the first terminal device's capabilities, and improving the user experience.
In one possible implementation, the method further includes: the first terminal device sends a first response message corresponding to the first input content to the second terminal device. The first response message may include the control instruction as transcribed by the server and the execution result after the smart speaker executes that instruction. In this way, after inputting the first input content through the second terminal device, the user can see both the control instruction obtained from that input and the result of executing it, improving the user experience.
In one possible implementation, the first terminal device processing the control instruction corresponding to the first input content includes: the first terminal device receives second input content, where the second input content includes second voice content and/or second text content; and if the moment at which the first terminal device received the second input content is earlier than the moment at which it received the first input content, the first terminal device processes the control instruction corresponding to the second input content before processing the control instruction corresponding to the first input content. That is, when processing different control instructions the smart speaker may follow a first-come, first-executed strategy: the control instruction obtained first is executed first.
In a second aspect, an embodiment of the present application provides an interaction method, including: a second terminal device receives first input content from a user, where the first input content includes first voice content and/or first text content; the second terminal device sends the first input content to a first terminal device; the second terminal device receives, from the first terminal device, a first response message corresponding to the first input content; and the second terminal device broadcasts the first response message by voice or displays it on a display screen.
Based on the method provided by this embodiment, after receiving the first input content from the user, the second terminal device sends it to the first terminal device, then receives the corresponding first response message from the first terminal device and broadcasts it by voice or displays it on a display screen. In this way, after inputting the first input content through the second terminal device, the user can see both the control instruction obtained from that input and the result of executing it, improving the user experience.
In one possible implementation, the method further includes: the second terminal device receives a second response message from the first terminal device, where the second response message corresponds to second input content; and the second terminal device broadcasts the second response message by voice or displays it on the display screen.
In this way, if another user (different from the user who input the first input content) inputs voice content (the second voice content) at the speaker itself, or inputs the second input content through a fourth terminal device (for example, a mobile phone or a smart wearable device), the user of the second terminal device can learn of that other user's operation of the first terminal device (including the control instruction corresponding to the second input content and the execution result of that instruction). This improves the user's control over the first terminal device and thus the user experience.
For technical effects of the second aspect and various possible implementations thereof, reference may be made to the technical effects of the first aspect and various possible implementations thereof, which are not described herein in detail.
In a third aspect, an embodiment of the present application provides a first terminal device, including: a receiving unit, configured to receive first input content from a second terminal device, where the first input content includes first voice content and/or first text content; a determining unit, configured to determine a control instruction corresponding to the first input content; and a processing unit, configured to process the control instruction corresponding to the first input content.
In one possible implementation, the determining unit is configured to: send the first input content to the server through the sending unit; and receive, through the receiving unit, the control instruction corresponding to the first input content from the server.
In one possible implementation, the processing unit is configured to: execute the control instruction corresponding to the first input content; or send, through the sending unit, the control instruction corresponding to the first input content to the third terminal device.
In one possible implementation, the sending unit is further configured to send a first response message corresponding to the first input content to the second terminal device.
In one possible implementation, the processing unit is configured to: receive second input content through the receiving unit, where the second input content includes second voice content and/or second text content; and if the moment at which the first terminal device received the second input content is earlier than the moment at which it received the first input content, process the control instruction corresponding to the second input content before processing the control instruction corresponding to the first input content.
In a fourth aspect, an embodiment of the present application provides a second terminal device, including: a receiving unit, configured to receive first input content from a user, the first input content including first voice content and/or first text content; a sending unit, configured to send the first input content to a first terminal device; the receiving unit being further configured to receive, from the first terminal device, a first response message corresponding to the first input content; and a processing unit, configured to broadcast the first response message by voice or display it on a display screen.
In one possible implementation, the receiving unit is further configured to receive a second response message from the first terminal device, where the second response message corresponds to second input content; and the processing unit is further configured to broadcast the second response message by voice or display it on the display screen.
In a fifth aspect, the present application further provides an apparatus, which may be a first terminal device or a chip. The apparatus comprises a processor for implementing any one of the interaction methods provided by the first aspect. The apparatus may also include a memory for storing program instructions and data, which may be memory integrated within the apparatus or off-chip memory disposed external to the apparatus. The memory is coupled to the processor, and the processor can call and execute the program instructions stored in the memory, so as to implement any one of the interaction methods provided by the first aspect. The apparatus may also include a communication interface for the apparatus to communicate with other devices (e.g., a second terminal device).
In a sixth aspect, the present application further provides an apparatus, which may be a second terminal device or a chip. The apparatus includes a processor for implementing any one of the interaction methods provided by the second aspect. The apparatus may also include a memory for storing program instructions and data, which may be memory integrated within the apparatus or off-chip memory disposed external to the apparatus. The memory is coupled to the processor, and the processor can call and execute the program instructions stored in the memory, so as to implement any one of the interaction methods provided by the second aspect. The apparatus may also include a communication interface for the apparatus to communicate with another device (e.g., a first terminal device).
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium, which includes instructions, when executed on a computer, causing the computer to perform any one of the interaction methods provided in the first aspect or the second aspect.
In an eighth aspect, the present application provides a computer program product containing instructions, which when run on a computer, causes the computer to perform any one of the interaction methods provided in the first aspect or the second aspect.
In a ninth aspect, an embodiment of the present application provides a chip system, where the chip system includes a processor and may further include a memory, and is configured to implement any one of the interaction methods provided in the first aspect or the second aspect. The chip system may be formed by a chip, and may also include a chip and other discrete devices.
In a tenth aspect, an embodiment of the present application provides an interactive system, where the system includes the first terminal device in the third aspect and the second terminal device in the fourth aspect.
Drawings
Fig. 1 is a schematic view of a communication architecture between a smart speaker and a mobile phone in the prior art;
fig. 2 is a schematic diagram of an interaction method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of another interaction method applicable to the embodiments of the present application;
fig. 4 is a schematic internal structure diagram of a second terminal device according to an embodiment of the present disclosure;
fig. 5 is a schematic internal structure diagram of a first terminal device according to an embodiment of the present disclosure;
fig. 6 is a schematic signal interaction diagram suitable for the interaction method provided in the embodiments of the present application;
fig. 7 is a schematic desktop diagram of a mobile phone according to an embodiment of the present application;
fig. 8 is a schematic view of an input interface of a smart speaker APP according to an embodiment of the present application;
fig. 9 is a schematic view of an input interface of another smart speaker APP according to an embodiment of the present application;
fig. 10 is a schematic internal structure diagram of another first terminal device according to an embodiment of the present application;
fig. 11 is a schematic internal structure diagram of another second terminal device according to an embodiment of the present application.
Detailed Description
The embodiments of the present application provide an interaction method and a terminal device, applied to an interaction system formed by a first terminal device and a second terminal device, in which a user can remotely interact with the first terminal device through the second terminal device. For example, the method applies to an interaction system formed by a mobile phone and a paired smart home product (for example, a smart speaker). The first terminal device and the second terminal device may communicate with each other through new radio access technology (New RAT), Long Term Evolution (LTE), Bluetooth (BT), WiFi, or other protocols, which is not limited in this application.
As shown in fig. 2, which is an architectural schematic diagram of the interaction system provided in this embodiment, the system may include a first terminal device (e.g., a smart speaker 10b), a second terminal device (e.g., a mobile phone 10a), a first network device (e.g., an internet server 11), and a second network device (e.g., a cloud server 12). The first terminal device may receive, through the internet server 11, the first input content sent by the second terminal device, and may send a first response message for the first input content back to the second terminal device through the internet server 11. The cloud server 12 may be used to parse voice content and/or text content: for example, it may convert voice content into text information through an ASR engine and convert the text information into a control instruction through an NLP engine, so that the first terminal device responds according to the control instruction. The cloud server 12 may be a server corresponding to the smart speaker APPs installed on the mobile phone 10a and the smart speaker 10b, or may be a third-party server corresponding to a smart speaker program integrated in another APP, which is not limited in this application.
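For orientation only, the relay role of the internet server 11 in fig. 2 can be pictured with the hypothetical sketch below; the class, device names, and message fields are invented for illustration and are not part of the embodiments.

```python
# Hypothetical sketch of message relaying in the fig. 2 architecture:
# the internet server forwards input content one way and response
# messages the other way. All names here are illustrative.

class InternetServerRelay:
    """Stands in for internet server 11: a store-and-forward mailbox."""
    def __init__(self):
        self.inbox = {}                             # device id -> messages
    def send(self, to_device: str, message: dict):
        self.inbox.setdefault(to_device, []).append(message)
    def receive(self, device: str) -> list:
        return self.inbox.pop(device, [])

relay = InternetServerRelay()
# Second terminal device (mobile phone 10a) sends first input content:
relay.send("smart_speaker_10b",
           {"type": "first_input_content", "text": "play some light music"})
# First terminal device (smart speaker 10b) handles it and replies:
for _msg in relay.receive("smart_speaker_10b"):
    relay.send("mobile_phone_10a",
               {"type": "first_response_message", "result": "playing light music"})
print(relay.receive("mobile_phone_10a"))
```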
The first terminal device and the internet server 11, the second terminal device and the internet server 11, and the first terminal device and the cloud server 12 may communicate wirelessly, for example through a radio access network device (e.g., a base station). In an LTE network, the base station may be an evolved NodeB (eNB). In a fifth generation (5G) mobile communication network, the base station may be a next generation NodeB (gNB), a new radio eNB, a macro base station, a micro base station, a high frequency base station, or a transmission and reception point (TRP), etc.
The first terminal device provided in this embodiment of the present application may be any of various smart home devices or a UE; the smart home devices may be, for example, a smart speaker, a smart television, a smart refrigerator, a smart washing machine, a smart rice cooker, a smart dishwasher, or a smart sweeping robot. The second terminal device may be a user equipment (UE), for example a mobile phone, a tablet computer, a desktop computer, a laptop computer, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a vehicle-mounted terminal, or another such device.
In one possible design, as shown in fig. 3, the interaction system may further include a third terminal device (e.g., a sweeping robot 10c) connected to the first terminal device. The third terminal device may be any of various smart home devices, a UE, or the like.
As shown in fig. 4, the second terminal device in the communication system architecture may specifically be a mobile phone 100. The handset 100 may include a processor 110, an internal memory 120, a camera 130, a display 140, a radio frequency module 150, a communication module 160, an antenna 1, an antenna 2, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, and the like.
The structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the mobile phone 100. The phone may include more or fewer components than shown, combine or split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-Network Processing Unit (NPU), etc. The different processing units may be independent devices or may be integrated in the same processor.
The controller can be a decision maker that directs the components of the handset 100 to work in concert as instructed; it is the neural center and command center of the handset 100. The controller generates operation control signals according to the instruction operation code and timing signals, completing the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, this memory is a cache. It may hold instructions or data that the processor has just used or uses cyclically; if the processor needs them again, it can fetch them directly from this memory. This avoids repeated accesses and reduces processor wait time, improving system efficiency.
In some embodiments, the processor 110 may include an interface. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface.
The wireless communication function of the mobile phone 100 can be implemented by the antenna 1, the antenna 2, the rf module 150, the communication module 160, a modem, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the handset 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the cellular network antenna may be multiplexed into a wireless local area network diversity antenna. In some embodiments, the antenna may be used in conjunction with a tuning switch.
The RF module 150 can provide solutions for wireless communication applied to the handset 100, including second generation (2G), third generation (3G), fourth generation (4G), and fifth generation (5G) communication. The RF module may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The RF module receives electromagnetic waves through the antenna 1, filters and amplifies the received signals, and passes them to the modem for demodulation. The RF module 150 can also amplify signals modulated by the modem and convert them, through the antenna 1, into radiated electromagnetic waves. In some embodiments, at least some functional modules of the RF module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the RF module 150 may be disposed in the same device as at least some modules of the processor 110.
The modem may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to a speaker, a receiver, etc.) or displays an image or video through a display screen. In some embodiments, the modem may be a stand-alone device. In some embodiments, the modem may be separate from the processor, in the same device as the rf module or other functional module.
The communication module 160 may provide a communication processing module including a solution of Wireless Local Area Network (WLAN) (for example, WiFi), bluetooth, Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and other wireless communications, which is applied to the mobile phone 100. The communication module 160 may be one or more devices integrating at least one communication processing module. The communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor. The communication module 160 may also receive a signal to be transmitted from the processor, frequency-modulate and amplify the signal, and convert the signal into electromagnetic waves via the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the handset 100 is coupled to the radio frequency module 150 and the antenna 2 is coupled to the communication module 160. So that the handset 100 can communicate with networks and other devices via wireless communication techniques. The wireless communication technology may include global system for mobile communications (GSM), General Packet Radio Service (GPRS), code division multiple access (code division multiple access, CDMA), Wideband Code Division Multiple Access (WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), LTE, 5G New wireless communication (New Radio, NR), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include a Global Positioning System (GPS), a global navigation satellite system (GLONASS), a beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a Satellite Based Augmentation System (SBAS).
The mobile phone 100 implements the display function through the GPU, the display screen 140, and the application processor. The GPU is a microprocessor for image processing and is connected with a display screen and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 140 is used to display images, video, and the like. The display screen includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the handset 100 may include 1 or N display screens, where N is a positive integer greater than 1.
The mobile phone 100 may implement a shooting function through the ISP, the camera 130, the video codec, the GPU, the display screen 140, the application processor, and the like.
The ISP is used for processing data fed back by the camera. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be located in camera 130.
The camera 130 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the handset 100 may include 1 or N cameras, N being a positive integer greater than 1.
Internal memory 120 may be used to store computer-executable program code, which includes instructions. The processor 110 performs the various functional applications and data processing of the handset 100 by executing the instructions stored in the internal memory 120. The memory 120 may include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function). The data storage area may store data created during use of the handset 100 (e.g., audio data, a phonebook), and the like. Further, the memory 120 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, another non-volatile solid-state storage device, universal flash storage (UFS), or the like.
The mobile phone 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The cellular phone 100 can listen to music through a speaker or listen to a hands-free call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the handset 100 receives a call or voice information, it can receive voice by placing the receiver close to the ear.
The microphone 170C, also referred to as a "mouthpiece", is used to convert sound signals into electrical signals. When making a call or sending voice information, a user can input a voice signal by speaking close to the microphone. The handset 100 may be provided with at least one microphone. In some embodiments, the handset 100 may be provided with two microphones, which besides collecting sound signals can also implement noise reduction. In some embodiments, the handset 100 may be provided with three, four, or more microphones, which collect sound signals and reduce noise, and can further identify sound sources to implement a directional recording function.
The headphone interface 170D is used to connect a wired headphone. The earphone interface may be a USB interface, or may be an open mobile platform (OMTP) standard interface of 3.5mm, or a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
As shown in fig. 5, the first terminal device in the communication system architecture may be, for example, a smart home device 200. The smart home device 200 may include a processor 201, a display 202, a storage module 203, a communication module 204, a radio frequency module 205, an antenna 01, an antenna 02, a microphone 206, a speaker 207, and the like. The functions of the components can be referred to the above related description, and are not described herein.
In some embodiments, the antenna 01 of the smart home device 200 is coupled to the communication module, and the antenna 02 is coupled to the radio frequency module. So that the smart home device 200 can communicate with a network and other devices through a wireless communication technology. The wireless communication technology may include LTE, 5G NR, WLAN, and the like. Thus, the smart home device 200 may interact with the internet server 11 and the cloud server 12 (which may also be referred to as a cloud).
It is to be understood that the smart home device 200 described above may have more or fewer components than those shown in fig. 5, may combine two or more components, or may have a different configuration of components. The various components shown in fig. 5 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing or application specific integrated circuits.
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. In the description of the present application, unless otherwise specified, "at least one" means one or more and "a plurality of" means two or more. In addition, to describe the technical solutions clearly, terms such as "first" and "second" are used to distinguish between identical or similar items whose functions and effects are substantially the same. Those skilled in the art will understand that such terms do not limit quantity or execution order, and do not indicate relative importance.
For the sake of understanding, the interaction method provided by the embodiments of the present application is specifically described below with reference to the accompanying drawings.
As shown in fig. 6, an embodiment of the present application provides an interaction method, described here using a smart speaker as the first terminal device and a mobile phone as the second terminal device. The method includes:
601. The mobile phone receives first input content from a user, where the first input content includes first voice content and/or first text content.
The user's mobile phone can install a smart speaker APP client, through which the user can send the first voice content and/or the first text content to the smart speaker.
For example, when a user wishes to remotely operate the smart speaker at home, as shown in fig. 7, the user may click the icon 702 of the smart speaker APP on the desktop 701 of the mobile phone. When the mobile phone detects the click, it may start the smart speaker APP and display the graphical user interface (GUI) shown in fig. 8, which may be referred to as the input interface 801. The input interface 801 may include a voice input icon 802 and a voice-input prompt 803 such as "please input voice", or the prompt may read "please say the operation you wish to perform" (not shown in fig. 8); the voice-input prompt may be displayed after the user clicks the voice input icon 802. The text input section may include a text box 804 and a text-input prompt 805 such as "please input text", or the prompt may read "please edit the operation you wish to perform" (not shown in fig. 8), and so on.
Optionally, the user may operate the smart speaker through a smart speaker applet or official account inside another APP on the mobile phone (e.g., the WeChat APP), or may log in to the smart speaker's web page through a browser to operate it, which is not limited in this application.
If the first input content includes first voice content, the mobile phone can pick up the voice signal (the first voice content) uttered by the user through its microphone. For example, to perform voice input the user may click the voice input icon; the mobile phone then turns on the microphone to detect the voice signal spoken by the user, and the first voice content may be "please turn on the sweeping robot at home to clean the living room" or "please clean the living room". After the user has stopped speaking for a preset time interval, the mobile phone may treat the voice input as finished. Optionally, the mobile phone may inform the user that the voice input is complete through a prompt tone or a prompt message. Optionally, the mobile phone may constrain the length of voice the user can input, for example requiring it to be greater than 1 s and at most 60 s. Optionally, if the volume of the user's voice is too low (for example, below 20 dB), the phone may prompt the user, through a prompt tone or message, to speak up so that the microphone can pick up the voice content better.
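A minimal sketch of these input checks follows; the thresholds (1 s, 60 s, 20 dB) come from the paragraph above, while the function and its returned prompt strings are hypothetical.

```python
# Illustrative validation of a recorded voice clip against the limits
# described above: duration greater than 1 s and at most 60 s, volume
# of at least 20 dB. Names and messages are hypothetical.

MIN_DURATION_S = 1.0
MAX_DURATION_S = 60.0
MIN_VOLUME_DB = 20.0

def check_voice_input(duration_s: float, volume_db: float) -> str:
    if not (MIN_DURATION_S < duration_s <= MAX_DURATION_S):
        return "prompt: voice must be longer than 1 s and no longer than 60 s"
    if volume_db < MIN_VOLUME_DB:
        return "prompt: please speak up so the microphone can pick you up"
    return "accepted"

print(check_voice_input(duration_s=3.2, volume_db=18.0))  # too quiet
```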
If the first input content includes first text content, the mobile phone can receive the text content (the first text content) that the user enters through an input device (e.g., a touch screen or a keyboard). For example, the first text content may be a passage of thoughts or reflections the user has written, or an essay or poem the user has copied from a browser or from an application such as WeChat. When entering text, the user may click the text box 804; a cursor blinks in the text box to prompt the user to enter text, and optionally an input method (e.g., a pinyin input method or a handwriting input method) or shortcuts (copy, paste, etc.) may be displayed around the text box (e.g., below or above it) for the user's convenience.
If the first input content includes both first voice content and first text content, the mobile phone can pick up the voice content (the first voice content) through the microphone and receive the text content (the first text content) through the input device. For example, the first text content may be a text the user has edited, and the first voice content may be "please read this text (i.e., the first text content) aloud in a deep/high/low voice". The user may input the first voice content before the first text content, input the first text content before the first voice content, or input both at the same time, which is not limited in this application.
Optionally, the first input content may include image information, such as a screenshot of a web page or a screenshot of booking information for a train or plane ticket. For example, through the smart speaker APP the user may input the first voice content or first text content "book the flight number shown in the picture" together with a screenshot of the flight information.
In this way, when the user is not near the smart speaker, the Internet environment lets the mobile phone's microphone pick up the user's voice content and/or its input device receive the user's text content, so that the mobile phone and the smart speaker interact remotely. The user can operate the smart speaker remotely without obstruction and communicate with other household members through it, making full use of the smart speaker's capabilities in the home.
602. The mobile phone sends the first input content to the smart speaker.
The mobile phone can send the voice content (first voice content) and/or text content (first text content) entered by the user directly to the smart speaker, without performing any processing on the first input content such as semantic extraction or matching it to an operation instruction.
For example, when the user says "please turn on the sweeping robot at home to clean the living room", the mobile phone can send the audio of this sentence directly to the smart speaker, without matching a corresponding operation instruction against the sentence's semantics.
603. The smart speaker receives the first input content from the mobile phone.
604. The smart speaker determines the control instruction corresponding to the first input content.
The smart speaker may upload the first input content to a server (e.g., the cloud server). If the first input content is the first voice content, the ASR engine of the cloud server may convert it into first text information, and the NLP engine of the cloud server may derive the control instruction from the semantics of that text. The control instruction may include the vertical category and slots of the first text information. For example, if the user says "please help me book a plane ticket to Beijing tomorrow morning" or "I want to book a plane ticket to Beijing tomorrow morning", the NLP engine may identify the vertical category of the first text information as the intent "book a plane ticket" according to the keywords, and determine that the slots of this vertical include: departure time "tomorrow morning" and destination "Beijing". Alternatively, if the user says "tell a little joke", the NLP engine may identify the vertical as the intent "tell a story" according to the keywords, with slots: story type "joke" and story length "short". The cloud server then sends the control instruction to the smart speaker, and the smart speaker receives it.
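The vertical-plus-slots structure just described can be written down concretely. The dictionaries below mirror the two examples in this paragraph; the field names themselves are a hypothetical encoding, not a format defined by the embodiments.

```python
# Illustrative encoding of the control instructions extracted by the
# NLP engine in the two examples above. The values mirror the text;
# the field names are a hypothetical choice.

book_ticket_instruction = {
    "vertical": "book a plane ticket",
    "slots": {
        "departure_time": "tomorrow morning",
        "destination": "Beijing",
    },
}

tell_story_instruction = {
    "vertical": "tell a story",
    "slots": {
        "story_type": "joke",
        "story_length": "short",
    },
}
```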
If the first input content is text content, the NLP engine of the cloud server can derive the control instruction from the semantics of the text directly; the cloud server then issues the control instruction to the smart speaker, and the smart speaker receives it.
605. The smart speaker processes the control instruction corresponding to the first input content.
The control instruction corresponding to the first input content may target the smart speaker itself, meaning the smart speaker must respond according to the instruction; or it may target a third terminal device (for example, a household sweeping robot), in which case the smart speaker forwards the instruction to the third terminal device and the third terminal device responds according to it.
If the control instruction targets the smart speaker, the smart speaker executes the control instruction corresponding to the first input content. For example, if the control instruction is PLAY, the smart speaker plays music.
If the control instruction targets the third terminal device, the smart speaker may send the control instruction corresponding to the first input content to the third terminal device, so that the third terminal device executes it, thereby controlling the third terminal device to perform corresponding operations such as powering on or off. The third terminal device may be a terminal device paired with the smart speaker, for example a sweeping robot, an air conditioner, a refrigerator, a washing machine, or a smart curtain paired with the speaker.
For example, suppose a user needs to remotely operate the sweeping robot at home, but the mobile phone is not paired with the robot: the phone holds no instructions for controlling the robot and cannot control it directly. In that case the phone can control the robot through the smart speaker. The user can input, through the smart speaker APP client on the phone, the voice content "please turn on the sweeping robot at home to clean the living room". After receiving this voice content, the smart speaker can upload it to the cloud server; the cloud server's ASR engine converts the voice content into text information, its NLP engine derives the control instruction "clean the living room" from the semantics of that text, and the cloud server sends the instruction to the smart speaker. The smart speaker receives the instruction and forwards it to the sweeping robot, thereby controlling the robot, making full use of the smart speaker's capabilities, and improving the user experience.
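The execute-or-forward decision in step 605 can be sketched as follows; everything here, from the target field to the paired-device registry, is a hypothetical illustration of the behavior described above, not the claimed implementation.

```python
# Illustrative dispatch for step 605: execute an instruction addressed
# to the speaker locally, forward one addressed to a paired third
# terminal device. All names are hypothetical.

class PairedDevice:
    def __init__(self, name: str):
        self.name = name
    def send(self, instruction: dict) -> str:
        return f"{self.name} executing {instruction['vertical']!r}"

def process_instruction(instruction: dict, paired_devices: dict) -> str:
    target = instruction.get("target", "speaker")
    if target == "speaker":
        # Instruction targets the smart speaker itself.
        return f"speaker executing {instruction['vertical']!r}"
    device = paired_devices.get(target)
    if device is None:
        return f"error: no paired device named {target!r}"
    return device.send(instruction)             # forward to third device

paired = {"sweeping_robot": PairedDevice("sweeping_robot")}
print(process_instruction(
    {"vertical": "clean the living room", "target": "sweeping_robot"}, paired))
```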
606. The smart speaker sends a first response message corresponding to the first input content to the mobile phone.
If the control instruction targets the smart speaker, the smart speaker sends the first response message to the mobile phone after executing the instruction. The first response message may include the control instruction as transcribed by the cloud server and the execution result after the smart speaker executes it, so that after inputting the first input content through the second terminal device, the user can see both the control instruction obtained from that input and the result of executing it, improving the user experience. For example, user A may send the voice content "play Moonlight over the Lotus Pond" through the phone's APP client; after receiving the voice content, the smart speaker determines the corresponding control instruction through the cloud server, plays the song accordingly, and sends the playing state (i.e., the first response message corresponding to the first input content) to the phone's APP client.
If the control instruction targets the third terminal device, then after receiving, from the third terminal device, the execution result corresponding to the first input content, the smart speaker may send the first response message to the mobile phone. Here the first response message may include the control instruction as transcribed by the cloud server and the execution result after the third terminal device executes it.
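In either case the first response message carries the same two pieces of information; a hypothetical minimal shape for it might be:

```python
# Hypothetical shape of the first response message in step 606: the
# control instruction as transcribed by the cloud server, plus the
# result of executing it locally or on the third terminal device.

first_response_message = {
    "control_instruction": "clean the living room",
    "executed_by": "sweeping_robot",      # or "smart_speaker"
    "execution_result": "cleaning started",
}
```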
607. The mobile phone receives, from the smart speaker, the first response message corresponding to the first input content.
608. The mobile phone broadcasts the first response message by voice or displays it on the display screen.
That is, the mobile phone can broadcast by voice, and/or display on the display screen, the control instruction transcribed by the cloud server and the smart speaker's execution result (i.e., the response message).
As shown in fig. 9, assume user 805 inputs by voice: "Play half an hour of light music at dinner; dinner starts at 18:50." The mobile phone may display in the APP the voice content 806 input by user 805, the control instruction 808 produced by the smart speaker 807, "18:50 start playing light music for 30 minutes", the execution result 809 "18:50 start playing light music", and the execution result 810 "19:20 finish playing light music". Optionally, the phone may also display in the APP the time information 811 of the voice content input by user 805, the time information 812 at which the phone received the control instruction 808 sent by the smart speaker, the time information 813 at which it received the execution result 809, and the time information 814 at which it received the execution result 810. Further, the phone can broadcast the control instruction 808, the execution result 809, and the execution result 810 to the user by voice; or the phone may not display them at all and only broadcast them by voice. For example, when the phone detects that the user is listening to music and its screen is dark, it broadcasts the response message fed back by the smart speaker by voice without lighting up the screen, saving power.
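This last behavior amounts to a simple output-modality choice, sketched hypothetically below: display the response when the screen is on, otherwise announce it by voice without waking the display.

```python
# Illustrative choice between on-screen display and voice broadcast of
# a response message, following the power-saving behavior described
# above. All names are hypothetical.

def deliver_response(message: str, screen_on: bool) -> str:
    if screen_on:
        return f"display: {message}"        # render in the APP timeline
    # Screen is dark (e.g., the user is listening to music): speak the
    # message instead of lighting the screen, saving power.
    return f"voice broadcast: {message}"

print(deliver_response("18:50 start playing light music for 30 minutes",
                       screen_on=False))
```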
In addition, the interaction method may further include:
609. The smart speaker receives second input content, where the second input content includes second voice content and/or second text content.
The second input content may be voice content (second voice content) that another user (different from the one who input the first input content) speaks at the speaker itself, or second voice content and/or second text content that another user sends through a fourth terminal device (for example, that user's mobile phone or smart wearable device).
If the moment at which the smart speaker received the second input content is earlier than the moment at which it received the first input content, the smart speaker processes the control instruction corresponding to the second input content before the one corresponding to the first input content; if it is later, the smart speaker processes the control instruction corresponding to the first input content first. That is, when processing different control instructions the smart speaker follows a first-come, first-executed strategy.
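A minimal sketch of this ordering, assuming each input is stamped with its receive time; the heap-based queue and its names are hypothetical:

```python
# Illustrative first-come, first-executed ordering: instructions are
# processed strictly by the time the speaker received the underlying
# input, whether it arrived by voice or from the APP. Names are
# hypothetical.
import heapq

command_queue = []      # min-heap of (receive_time, instruction)

def on_input_received(receive_time: float, instruction: str):
    heapq.heappush(command_queue, (receive_time, instruction))

def process_next() -> str:
    if not command_queue:
        return "idle"
    t, instruction = heapq.heappop(command_queue)
    return f"t={t}: executing {instruction!r}"

on_input_received(10.0, "play music")       # first input content
on_input_received(9.5, "pause playback")    # second input content, earlier
print(process_next())                       # pause playback runs first
print(process_next())                       # then play music
```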
The order in which the smart speaker processes different control instructions is illustrated below with specific scenes:
Scene 1: user A sends the voice content "play music" through the phone's APP client. After receiving it, the smart speaker determines the corresponding instruction through the cloud server, plays music accordingly, and can send the playing state to the phone's APP client. While the music is playing, user B, standing next to the speaker, pauses playback with a voice instruction. The expected result: the smart speaker pauses the music and can send the paused state to the phone's APP client.
Scene 2: the smart speaker is playing music, and user B, next to the speaker, pauses it with a voice instruction; user A then sends the voice content "play music" through the phone's APP client, and after receiving it the smart speaker determines the corresponding instruction through the cloud server. The expected result: the smart speaker resumes playing after the pause and can send the playing state to the phone's APP client.
Scene 3: user A sends the voice content "play music" through the phone's APP client; after receiving it, the smart speaker determines the corresponding instruction through the cloud server and plays music accordingly. While the music is playing (for example, after 1/3 of the song has played), user B requests playback by voice at the speaker, and the content user B requests is exactly the same as what user A requested (for example, both ask for "Moonlight over the Lotus Pond"). The expected result: the smart speaker continues playing "Moonlight over the Lotus Pond" without starting it over, which meets both users' needs and avoids executing the same instruction repeatedly.
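Scene 3 implies a duplicate check before execution. A hypothetical sketch, assuming the speaker tracks the task it is currently executing:

```python
# Illustrative duplicate suppression for scene 3: if a newly decoded
# instruction matches the task already being executed, keep playing
# instead of restarting. All names are hypothetical.

current_task = {"vertical": "play", "item": "Moonlight over the Lotus Pond"}

def handle_instruction(instruction: dict) -> str:
    if instruction == current_task:
        return "already playing; continue without restarting"
    return f"switching to {instruction['item']!r}"

print(handle_instruction(
    {"vertical": "play", "item": "Moonlight over the Lotus Pond"}))
```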
It should be noted that there is no required execution order between step 609 and step 603: step 609 may be executed before step 603, after step 603, or at the same time as step 603, which is not specifically limited in this embodiment.
It can be understood that, after processing the control instruction corresponding to the second input content, the smart speaker may send a second response message corresponding to the second input content to the mobile phone. The mobile phone receives the second response message from the smart speaker and broadcasts it by voice or displays it on the display screen. In this way, if another user (different from the user who input the first input content) inputs voice content (the second voice content) at the speaker, or inputs the second input content through a fourth terminal device (for example, that user's mobile phone or smart wearable device), the user of the mobile phone can learn how the other user operated the smart speaker (including the control instruction corresponding to the second input content and the execution result of that instruction). This strengthens the user's control over the smart speaker and improves the user experience.
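This "keep the remote user informed" behavior is essentially an observer-style notification. A minimal sketch under that reading, with hypothetical names (NotifyingSpeaker, subscribe) that the patent does not define:

```python
class NotifyingSpeaker:
    """Sketch: pushes a response message to every subscribed phone after any instruction."""

    def __init__(self):
        self.subscribers = []   # e.g. phone APP sessions that interacted with the speaker

    def subscribe(self, phone_callback):
        self.subscribers.append(phone_callback)

    def process(self, who, instruction):
        # Execute the instruction, then push the result (the second response message)
        # to every subscribed phone so remote users learn of the operation.
        result = f"{who}: executed '{instruction}'"
        for notify in self.subscribers:
            notify(result)

speaker = NotifyingSpeaker()
speaker.subscribe(lambda msg: print(f"phone shows: {msg}"))
speaker.process("user B (at the speaker)", "pause")   # phone user learns of B's operation
```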
Based on the method provided in the embodiments of this application, the second terminal device (for example, a mobile phone) can receive first input content from the user and send it to the first terminal device (for example, a smart voice interaction device). The first terminal device receives the first input content from the second terminal device, determines the corresponding control instruction through the server, and processes that instruction. Because the control instruction is determined by the first terminal device based on content the user inputs remotely through the second terminal device (that is, the first input content), the user is not limited to the preset instructions available in the mobile phone APP. This enhances the flexibility of remote interaction between the user and the first terminal device and makes full use of the first terminal device's capabilities.
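To make the round trip concrete, the following sketch traces the flow end to end; every class and method name (Phone, Speaker, CloudServer, resolve, handle_input) is invented for illustration and is not an API defined by the patent:

```python
class CloudServer:
    """Maps free-form input content to a control instruction (stands in for step 604)."""

    def resolve(self, input_content):
        # Unconstrained text in, structured instruction out: the user is not
        # limited to a preset instruction list in the phone APP.
        if "play" in input_content.lower():
            return {"action": "play", "target": "music"}
        return {"action": "unknown"}

class Speaker:
    """First terminal device: resolves and processes the control instruction."""

    def __init__(self, server):
        self.server = server

    def handle_input(self, input_content):
        instruction = self.server.resolve(input_content)   # step 604
        result = f"executed {instruction['action']}"       # step 605
        return f"response: {result}"                       # step 606: first response message

class Phone:
    """Second terminal device: forwards the user's input and renders the response."""

    def __init__(self, speaker):
        self.speaker = speaker

    def send(self, input_content):                         # steps 601-602
        response = self.speaker.handle_input(input_content)
        print(response)                                    # step 608: display or voice broadcast

phone = Phone(Speaker(CloudServer()))
phone.send("play some relaxing music")   # free-form content, not a preset command
```

The point the embodiment stresses is visible in CloudServer.resolve: the instruction is derived on the device side from free-form input, so the phone APP does not need to enumerate commands in advance.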
The embodiments above describe the method provided in this application from the perspective of the first terminal device, the second terminal device, and the interaction between them. To implement the functions of the method, the first terminal device and the second terminal device may each include a hardware structure and/or a software module, so that the functions are implemented as a hardware structure, a software module, or a combination of a hardware structure and a software module. Whether a given function is implemented as a hardware structure, a software module, or both depends on the specific application and the design constraints of the technical solution.
When functional modules are divided according to each function, fig. 10 shows a schematic diagram of a possible structure of the apparatus 10 in the foregoing embodiments. The apparatus may be a first terminal device, and the first terminal device includes: a receiving unit 1001, a determining unit 1002, and a processing unit 1003. In this embodiment of the application, the receiving unit 1001 is configured to receive first input content from a second terminal device, where the first input content includes first voice content and/or first text content; the determining unit 1002 is configured to determine a control instruction corresponding to the first input content; and the processing unit 1003 is configured to process the control instruction corresponding to the first input content. Optionally, the first terminal device may further include a sending unit 1004 (not shown in fig. 10), configured to send a first response message corresponding to the first input content to the second terminal device.
In the method embodiment shown in fig. 6, the receiving unit 1001 is configured to support the first terminal device in executing process 603 in fig. 6; the determining unit 1002 is configured to support the first terminal device in executing process 604 in fig. 6; the processing unit 1003 is configured to support the first terminal device in executing process 605 in fig. 6; and the sending unit 1004 is configured to support the first terminal device in executing process 606 in fig. 6. For the details of each step in the method embodiment above, refer to the functional description of the corresponding functional module; the details are not repeated here.
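As a reading aid only (the patent defines functional units, not code), the unit-to-process mapping for apparatus 10 might look like the following hypothetical sketch:

```python
class FirstTerminalDevice:
    """Hypothetical sketch of apparatus 10: one method per functional unit."""

    def receiving_unit(self, first_input_content):      # unit 1001, process 603
        self.processing_unit(self.determining_unit(first_input_content))

    def determining_unit(self, first_input_content):    # unit 1002, process 604
        return {"instruction": f"resolved({first_input_content})"}

    def processing_unit(self, control_instruction):     # unit 1003, process 605
        print(f"processing {control_instruction}")
        self.sending_unit("first response message")

    def sending_unit(self, response_message):           # optional unit 1004, process 606
        print(f"sending '{response_message}' to the second terminal device")

FirstTerminalDevice().receiving_unit("play music")
```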
When functional modules are divided according to corresponding functions, fig. 11 shows a schematic diagram of a possible structure of the apparatus 11 in the foregoing embodiments. The apparatus may be a second terminal device, and the second terminal device includes: a receiving unit 1101, a sending unit 1102, and a processing unit 1103. In this embodiment of the application, the receiving unit 1101 is configured to receive first input content from a user, where the first input content includes first voice content and/or first text content; the sending unit 1102 is configured to send the first input content to a first terminal device; the receiving unit 1101 is further configured to receive a first response message corresponding to the first input content from the first terminal device; and the processing unit 1103 is configured to broadcast the first response message by voice or display it on the display screen.
In the method embodiment shown in fig. 6, the receiving unit 1101 is configured to support the second terminal device in executing processes 601 and 607 in fig. 6; the sending unit 1102 is configured to support the second terminal device in executing process 602 in fig. 6; and the processing unit 1103 is configured to support the second terminal device in executing process 608 in fig. 6. For the details of each step in the method embodiment above, refer to the functional description of the corresponding functional module; the details are not repeated here.
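A matching hypothetical sketch for apparatus 11, again with invented names and a trivial stand-in for the first terminal device:

```python
class SpeakerStub:
    """Trivial stand-in for the first terminal device."""

    def handle(self, content):
        return f"first response message for '{content}'"

class SecondTerminalDevice:
    """Hypothetical sketch of apparatus 11: the phone-side functional units."""

    def __init__(self, first_terminal_device):
        self.peer = first_terminal_device

    def receiving_unit(self, first_input_content):        # unit 1101, processes 601/607
        self.sending_unit(first_input_content)

    def sending_unit(self, first_input_content):          # unit 1102, process 602
        response = self.peer.handle(first_input_content)  # assumes the peer exposes a handler
        self.processing_unit(response)

    def processing_unit(self, first_response_message):    # unit 1103, process 608
        print(f"voice-broadcasting or displaying: {first_response_message}")

SecondTerminalDevice(SpeakerStub()).receiving_unit("play music")
```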
The division of modules in the embodiments of this application is schematic and reflects only a logical division of functions; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processor, may exist alone physically, or two or more modules may be integrated into one module. An integrated module may be implemented in the form of hardware or in the form of a software functional module. For example, in the embodiments of this application, the receiving unit and the sending unit may be integrated into a transceiver unit.
The method provided in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network appliance, user equipment, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or a data center that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid-state drive (SSD)), among others.
It will be apparent to those skilled in the art that various changes and modifications may be made to the embodiments of this application without departing from the spirit and scope of the application. The present application is intended to cover such modifications and variations provided that they fall within the scope of the claims of this application and their equivalents.

Claims (20)

1. An interaction method, comprising:
receiving, by a first terminal device, first input content from a second terminal device, wherein the first input content comprises first voice content and/or first text content;
determining, by the first terminal device, a control instruction corresponding to the first input content; and
processing, by the first terminal device, the control instruction corresponding to the first input content.
2. The interaction method according to claim 1, wherein the determining, by the first terminal device, of the control instruction corresponding to the first input content comprises:
sending, by the first terminal device, the first input content to a server; and
receiving, by the first terminal device, the control instruction corresponding to the first input content from the server.
3. The interaction method according to claim 1 or 2, wherein the processing, by the first terminal device, of the control instruction corresponding to the first input content comprises:
executing, by the first terminal device, the control instruction corresponding to the first input content; or
sending, by the first terminal device, the control instruction corresponding to the first input content to a third terminal device.
4. The interaction method according to any one of claims 1 to 3, wherein the method further comprises:
sending, by the first terminal device, a first response message corresponding to the first input content to the second terminal device.
5. The interaction method according to any one of claims 1 to 4, wherein the processing, by the first terminal device, of the control instruction corresponding to the first input content comprises:
receiving, by the first terminal device, second input content, wherein the second input content comprises second voice content and/or second text content; and
if the first terminal device receives the second input content earlier than the first input content, processing, by the first terminal device, the control instruction corresponding to the first input content after processing the control instruction corresponding to the second input content.
6. An interaction method, comprising:
receiving, by a second terminal device, first input content from a user, wherein the first input content comprises first voice content and/or first text content;
sending, by the second terminal device, the first input content to a first terminal device;
receiving, by the second terminal device, a first response message corresponding to the first input content from the first terminal device; and
broadcasting, by the second terminal device, the first response message by voice, or displaying the first response message on a display screen.
7. The interaction method according to claim 6, wherein the method further comprises:
receiving, by the second terminal device, a second response message from the first terminal device, wherein the second response message corresponds to second input content; and
broadcasting, by the second terminal device, the second response message by voice, or displaying the second response message on a display screen.
8. A first terminal device, comprising:
a receiving unit, configured to receive first input content from a second terminal device, wherein the first input content comprises first voice content and/or first text content;
a determining unit, configured to determine a control instruction corresponding to the first input content; and
a processing unit, configured to process the control instruction corresponding to the first input content.
9. The first terminal device according to claim 8, wherein the determining unit is configured to:
send the first input content to a server through a sending unit; and
receive, through the receiving unit, the control instruction corresponding to the first input content from the server.
10. The first terminal device according to claim 8 or 9, wherein the processing unit is configured to:
execute the control instruction corresponding to the first input content; or
send the control instruction corresponding to the first input content to a third terminal device through the sending unit.
11. The first terminal device according to any one of claims 8 to 10, wherein the sending unit is further configured to:
send a first response message corresponding to the first input content to the second terminal device.
12. The first terminal device according to any one of claims 8 to 11, wherein the processing unit is configured to:
receive, through the receiving unit, second input content, wherein the second input content comprises second voice content and/or second text content; and
if the first terminal device receives the second input content earlier than the first input content, process the control instruction corresponding to the first input content after processing the control instruction corresponding to the second input content.
13. A second terminal device, comprising:
a receiving unit, configured to receive first input content from a user, wherein the first input content comprises first voice content and/or first text content;
a sending unit, configured to send the first input content to a first terminal device,
wherein the receiving unit is further configured to receive a first response message corresponding to the first input content from the first terminal device; and
a processing unit, configured to broadcast the first response message by voice or display the first response message on a display screen.
14. The second terminal device according to claim 13, wherein the receiving unit is further configured to:
receive a second response message from the first terminal device, wherein the second response message corresponds to second input content; and
wherein the processing unit is further configured to broadcast the second response message by voice or display the second response message on a display screen.
15. A first terminal device, wherein the first terminal device comprises a processor and a memory;
the memory is configured to store computer-executable instructions, and when the first terminal device runs, the processor executes the computer-executable instructions stored in the memory to cause the first terminal device to perform the interaction method according to any one of claims 1 to 5.
16. A second terminal device, wherein the second terminal device comprises a processor and a memory;
the memory is configured to store computer-executable instructions, and when the second terminal device runs, the processor executes the computer-executable instructions stored in the memory to cause the second terminal device to perform the interaction method according to claim 6 or 7.
17. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the interaction method of any one of claims 1-5.
18. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the interaction method of claim 6 or 7.
19. A chip system comprising a processor and a memory, the processor executing computer-executable instructions stored by the memory to implement the interaction method of any one of claims 1-5.
20. A chip system comprising a processor and a memory, the processor executing computer-executable instructions stored by the memory to implement the interaction method of claim 6 or 7.