WO2023098467A1 - Speech analysis method, electronic device, readable storage medium, and chip system - Google Patents

Speech analysis method, electronic device, readable storage medium, and chip system

Info

Publication number
WO2023098467A1
Authority: WIPO (PCT)
Prior art keywords: terminal device, information, voice, application program, interface
Application number: PCT/CN2022/131980
Other languages: English (en), French (fr)
Inventors: 张腾, 王斌, 孙峰, 庄効谚
Original Assignee: 华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2023098467A1


Classifications

    • G: PHYSICS
        • G08: SIGNALLING
            • G08B: SIGNALLING OR CALLING SYSTEMS; ORDER TELEGRAPHS; ALARM SYSTEMS
                • G08B 21/00: Alarms responsive to a single specified undesired or abnormal condition and not otherwise provided for
                    • G08B 21/18: Status alarms
                        • G08B 21/24: Reminder alarms, e.g. anti-loss alarms
        • G10: MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L 15/00: Speech recognition
                    • G10L 15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
                    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L 2015/223: Execution procedure of a spoken command

Definitions

  • The present application relates to the field of terminal technology, and in particular to a speech analysis method, an electronic device, a readable storage medium, and a chip system.
  • Terminal devices can not only perform actions based on received user-triggered operations such as clicks, but can also detect the user's voice through a voice assistant and perform actions based on the user's voice.
  • Specifically, the terminal device can detect the voice command issued by the user through the voice assistant, analyze the voice command in combination with the interface content displayed on the current interface of the terminal device, determine the user intention corresponding to the voice command, and then control the terminal device to perform actions that match the user intention.
  • However, the terminal device cannot accurately understand the user's intention in some scenarios, and may trigger wrong operations or ask the user repeatedly, resulting in low interaction efficiency between the terminal device and the user.
  • the present application provides a speech analysis method, an electronic device, a readable storage medium, and a chip system, which solve the problem of low interaction efficiency between a terminal device and a user in certain scenarios in the prior art.
  • In a first aspect, a speech analysis method is provided, including: acquiring a voice instruction and information sent by a running application program, where the voice instruction is used to instruct a terminal device to perform an operation, and the information sent by the application program includes reminder information for reminding the user; and determining, according to the voice instruction and the reminder information, the user intention corresponding to the voice instruction.
  • In this way, by combining the reminder information with the voice instruction, the accuracy of determining the user intention corresponding to the voice instruction can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.
  • In a possible implementation, before determining the user intention corresponding to the voice instruction according to the voice instruction and the reminder information, the method further includes: acquiring a first application program list and a second application program list, where the first application program list is a list of application programs installed on the terminal device, and the second application program list is a list of application programs currently running on the terminal device; and determining, according to the first application program list and the second application program list, the identifier of the application program corresponding to the voice instruction and the running state of the application program.
  • Correspondingly, the determining the user intention corresponding to the voice instruction according to the voice instruction and the reminder information includes: if the running state of the application program is running in the background, determining, according to the reminder information, the voice instruction, and the identifier of the application program, the user intention corresponding to the voice instruction; and if the running state of the application program is running in the foreground, acquiring, according to the current interface of the application program, the interface information corresponding to the current interface, and determining, according to the voice instruction, the reminder information, and the interface information, the user intention corresponding to the voice instruction.
  • In this way, the application program corresponding to the voice instruction can be determined according to the first application program list and the second application program list, the running state of that application program can then be determined, and the user intention corresponding to the voice instruction can be determined in different ways according to different running states, which increases the flexibility of determining the user intention.
  • Moreover, when the application program runs in the foreground, the interface information of the application program can be continuously obtained, so that the user intention corresponding to the voice instruction can be determined according to the voice instruction and the reminder information in combination with the acquired interface information, which can improve the accuracy of determining the user intention. A minimal sketch of this branching logic is given below.
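  • As an illustrative aid only, not part of the claimed method, the foreground/background branching above can be sketched in Python; every name here (AppState, resolve_intent_inputs, get_interface_info, and the sample data) is a hypothetical placeholder:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class AppState:
    app_id: str          # identifier of the application matched to the voice instruction
    foreground: bool     # True if the app currently runs in the foreground

def resolve_intent_inputs(voice_cmd: str, reminder: str, app: AppState,
                          get_interface_info: Callable[[str], dict]) -> dict:
    """Assemble the inputs for intent determination, branching on the running state:
    background -> use the app identifier; foreground -> also fetch interface info."""
    inputs = {"voice_command": voice_cmd, "reminder": reminder, "app_id": app.app_id}
    if app.foreground:
        # Foreground apps additionally contribute the current interface's information.
        inputs["interface_info"] = get_interface_info(app.app_id)
    return inputs

# Toy usage: a map app running in the foreground.
fake_interface = lambda app_id: {"scene": "map_navigation"}
print(resolve_intent_inputs("switch route", "found a faster route",
                            AppState("map_app", foreground=True), fake_interface))
```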
  • In a possible implementation, the acquiring the interface information corresponding to the current interface according to the current interface of the application program includes: extracting the current interface to obtain the interface content included in the current interface; and analyzing the interface content to obtain the interface information corresponding to the application program.
  • In a possible implementation, the acquiring a voice instruction and information sent by a running application program includes: acquiring the voice instruction at a first moment; and acquiring, according to the first moment, the information sent by each application program running within a preset time before the first moment.
  • In this way, the workload of obtaining the information sent by the application programs can be reduced, the efficiency of obtaining the voice instruction and the information sent by the application programs can be improved, and the variety and flexibility of obtaining that information can be increased.
  • In a possible implementation, the acquiring the information sent by the running application program includes: obtaining, in real time, the information sent by the running application program.
  • In this way, after the voice instruction is collected, the acquired information can be combined in time to determine the user intention corresponding to the voice instruction, thereby improving the efficiency of determining the user intention and increasing the variety and flexibility of obtaining the information sent by the application program.
  • In a possible implementation, the acquiring the information sent by the running application program includes: acquiring, through a preset interface, the audio data broadcast by the terminal device; and converting the audio data by using automatic speech recognition (ASR) technology to obtain the information in text form sent by the application program.
  • In this way, the audio data sent by the application program is obtained and converted into information in text form, which can improve the flexibility of obtaining the information sent by the application program.
  • In a possible implementation, the acquiring the information sent by the running application program includes: extracting, through a preset interface, the text data sent by the application program to obtain the information in text form sent by the application program.
  • In this way, both the efficiency and the flexibility of obtaining the information sent by the application program can be improved.
  • In a possible implementation, the method further includes: converting the voice instruction by using ASR technology to obtain a text instruction in text form; and the determining the user intention corresponding to the voice instruction according to the voice instruction and the reminder information includes: determining the user intention corresponding to the voice instruction according to the text instruction and the reminder information.
  • In a possible implementation, the converting the voice instruction by using the ASR technology to obtain a text instruction in text form includes: denoising the voice instruction by using voice enhancement technology to obtain a denoised voice instruction; and converting the denoised voice instruction by using the ASR technology to obtain the text instruction in text form.
  • In this way, the accuracy of converting the voice instruction into the text instruction can be improved, thereby improving the accuracy of determining the user intention.
  • In a possible implementation, before acquiring the voice instruction and the information sent by the running application program, the method further includes: establishing, according to multiple types of sample data, multiple association relationships between different types of sample data, where the multiple types of sample data include: sample reminder information, sample interface content, sample voice instructions, and sample user intentions, and the multiple association relationships include: the association relationship between the sample user intention and the sample reminder information, the association relationship between the sample user intention and the sample voice instruction, and the association relationship between the sample user intention and the sample interface content; and performing training according to the multiple association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
  • Correspondingly, the determining the user intention corresponding to the voice instruction according to the voice instruction and the reminder information includes: determining, through the fusion model, the user intention corresponding to the voice instruction in combination with the voice instruction and the reminder information.
  • In this way, the acquired voice instruction and reminder information are analyzed by the fusion model, and the user intention that matches the voice instruction and the reminder information is output by the fusion model, which can improve the accuracy of determining the user intention.
  • In a possible implementation, after determining the user intention, the method further includes: invoking an intent execution interface according to the user intention to perform an operation matching the user intention.
  • In a possible implementation, the method is applied in a multi-device scenario, where the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device.
  • Correspondingly, the acquiring a voice instruction and information sent by a running application program includes: the first terminal device acquires the voice instruction and the information sent by the application program run by the first terminal device; the first terminal device sends an information request instruction to the second terminal device according to the voice instruction, where the information request instruction is used to instruct the second terminal device to obtain, and feed back to the first terminal device, the information sent by the application program running on the second terminal device; and the first terminal device receives the information, sent by the running application program, that is fed back by the second terminal device.
  • In this way, any terminal device that collects voice instructions in the multi-device scenario can control other devices in the scenario according to the voice instructions, which can improve the flexibility of controlling terminal devices with voice instructions.
  • In a second aspect, a speech analysis apparatus is provided, including:
  • a first acquiring module, configured to acquire a voice instruction and information sent by a running application program, where the voice instruction is used to instruct the terminal device to perform an operation, and the information sent by the application program includes reminder information for reminding the user; and
  • a first determining module, configured to determine, according to the voice instruction and the reminder information, the user intention corresponding to the voice instruction.
  • the device further includes:
  • a second acquiring module, configured to acquire a first application program list and a second application program list, where the first application program list is a list of application programs installed on the terminal device, and the second application program list is a list of application programs currently running on the terminal device; and
  • a second determining module configured to determine the identifier of the application program corresponding to the voice instruction and the running state of the application program according to the first application program list and the second application program list;
  • The first determining module is specifically configured to: if the running state of the application program is running in the background, determine, according to the reminder information, the voice instruction, and the identifier of the application program, the user intention corresponding to the voice instruction; and if the running state of the application program is running in the foreground, acquire, according to the current interface of the application program, the interface information corresponding to the current interface, and determine, according to the voice instruction, the reminder information, and the interface information, the user intention corresponding to the voice instruction.
  • In a possible implementation, the first determining module is further specifically configured to extract the current interface to obtain the interface content included in the current interface, and to analyze the interface content to obtain the interface information corresponding to the application program.
  • In a possible implementation, the first acquiring module is specifically configured to acquire the voice instruction at a first moment, and to acquire, according to the first moment, the information sent by each application program running within a preset time before the first moment.
  • In a possible implementation, the first acquiring module is specifically configured to acquire, in real time, the information sent by the running application program.
  • In a possible implementation, the first acquiring module is further specifically configured to acquire, through a preset interface, the audio data broadcast by the terminal device, and to convert the audio data by using automatic speech recognition (ASR) technology to obtain the information in text form sent by the application program.
  • In a possible implementation, the first acquiring module is further specifically configured to extract, through a preset interface, the text data sent by the application program to obtain the information in text form sent by the application program.
  • the device further includes:
  • a conversion module, configured to convert the voice instruction by using ASR technology to obtain a text instruction in text form; and
  • the first determining module is further specifically configured to determine the user intention corresponding to the voice instruction according to the text instruction and the reminder information.
  • In a possible implementation, the conversion module is specifically configured to denoise the voice instruction by using voice enhancement technology to obtain a denoised voice instruction, and to convert the denoised voice instruction by using the ASR technology to obtain the text instruction in text form.
  • the device further includes:
  • an establishing module, configured to establish, according to multiple types of sample data, multiple association relationships between different types of sample data;
  • the multiple types of sample data include: sample reminder information, sample interface content, sample voice instructions, and sample user intentions;
  • the multiple association relationships include: the association relationship between the sample user intention and the sample reminder information, the association relationship between the sample user intention and the sample voice instruction, and the association relationship between the sample user intention and the sample interface content; and
  • the training module is used to perform training according to the various association relationships to obtain a fusion model, and the fusion model is a single model or a model group composed of multiple models.
  • In a possible implementation, the first determining module is further specifically configured to determine, through the fusion model, the user intention corresponding to the voice instruction in combination with the voice instruction and the reminder information.
  • the device further includes:
  • An executing module configured to call an intent execution interface according to the user intent, and execute an operation matching the user intent.
  • In a possible implementation, the apparatus is applied in a multi-device scenario, where the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device.
  • The first acquiring module is further specifically configured for the first terminal device to acquire the voice instruction and the information sent by the application programs run by the first terminal device, to send an information request instruction to the second terminal device according to the voice instruction, and then to receive the information, sent by the running application program, that is fed back by the second terminal device, where the information request instruction is used to instruct the second terminal device to obtain, and feed back to the first terminal device, the information sent by the application program running on the second terminal device.
  • In a third aspect, an electronic device is provided, including a processor, where the processor is configured to run a computer program stored in a memory, so as to implement the speech analysis method described in any one of the above first aspect.
  • In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program; when the computer program is executed by a processor, the speech analysis method described in any one of the above first aspect is implemented.
  • In a fifth aspect, a chip system is provided, including a memory and a processor, where the processor executes the computer program stored in the memory so as to implement the speech analysis method described in any one of the above first aspect.
  • FIG. 1A is a schematic interface diagram of a shopping application program provided by an embodiment of the present application.
  • FIG. 1B is a schematic interface diagram of a system setting provided by the embodiment of the present application.
  • FIG. 1C is a schematic interface diagram of a map application program provided by the embodiment of the present application.
  • FIG. 2 is a schematic diagram of a speech analysis scene involved in a speech analysis method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flow chart of a speech analysis method provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of obtaining reminder information based on a software architecture provided by an embodiment of the present application.
  • FIG. 5 is a schematic flow diagram of another method of acquiring reminder information based on the software architecture provided by the embodiment of the present application.
  • FIG. 6 is a schematic flow diagram of a multi-device voice analysis provided by an embodiment of the present application.
  • FIG. 7 is a structural block diagram of a speech analysis device provided by an embodiment of the present application.
  • FIG. 8 is a structural block diagram of another speech analysis device provided by an embodiment of the present application.
  • FIG. 9 is a structural block diagram of another speech analysis device provided by an embodiment of the present application.
  • FIG. 10 is a structural block diagram of another speech analysis device provided by an embodiment of the present application.
  • FIG. 11 is a structural block diagram of another speech analysis device provided by the embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 13 is a software structural block diagram of an electronic device according to an embodiment of the present application.
  • In practical applications, the voice assistant of the terminal device can be turned on.
  • The terminal device can collect the voice command issued by the user through the voice assistant, and analyze the voice command to obtain the information it includes and determine the user intention corresponding to the voice command. For example, the terminal device can turn on or off the air conditioner, the stereo, and the lights through the voice assistant.
  • The voice assistant can determine the user intention corresponding to the voice command based on the interface content displayed on the current interface. For example, referring to FIG. 1A, if the current interface of the terminal device is the interface of a shopping application program, the voice assistant can perform shopping operations according to the voice command; referring to FIG. 1B, if the current interface of the terminal device is the system settings interface, the voice assistant can perform the operation of turning on Bluetooth according to the voice command.
  • If the voice assistant still cannot determine the user intention after combining the interface content displayed on the current interface, the voice assistant no longer controls the terminal device to perform the operation corresponding to the voice command, or the voice assistant needs to ask the user to obtain more information, so that the user intention corresponding to the voice command is determined according to the content of the user's answer.
  • For example, referring to FIG. 1C, the map application reminds the user that "a faster route is currently found, and the destination can be reached 5 minutes earlier". If the voice assistant detects the voice command "switch route" issued by the user but cannot determine the corresponding user intention according to the voice command, the voice assistant will no longer control the terminal device to perform the operation of switching routes in the map application, or the voice assistant needs to ask the user again.
  • For another example, in a scenario where two induction cookers are both working, one of the induction cookers reminds the user that "the fire has been on high for 20 minutes, it is recommended to lower the temperature and simmer slowly". If the voice assistant detects that the voice command issued by the user is "adjust the temperature to 200 degrees", the voice assistant may control both induction cookers to be adjusted to 200 degrees, causing a device to perform an incorrect operation.
  • In view of this, this application proposes a voice analysis method.
  • The voice assistant obtains the various information sent by the application program and, according to the reminder information used to remind the user among the various information, based on the voice command issued by the user and combined with the interface content of the current interface, can accurately determine the user intention corresponding to the voice command, control the terminal device to perform operations that match the user intention, and improve the efficiency of the user's interaction with the voice assistant.
  • FIG. 2 is a schematic diagram of a voice analysis scenario involved in a voice analysis method provided in an embodiment of the present application.
  • the voice analysis scenario may include at least one terminal device 210, wherein each terminal device 210 may be located in the same network
  • Each terminal device may be any of various types of devices such as a smart TV, a smart speaker, a router, or a projector; the embodiment of the present application does not limit the type of the terminal device.
  • In practical applications, the terminal device 210 can turn on the voice assistant during operation, use the voice assistant to obtain the voice instructions issued by the user and the various information issued by the currently running application programs, and then determine, according to the voice command and the reminder information among the various information, the user intention corresponding to the voice command.
  • Specifically, when the voice assistant detects that the user has issued a voice command, the voice assistant can analyze the voice command and, at the same time, obtain the information sent by the application programs within a preset time; according to the reminder information used to remind the user among the sent information, combined with the interface content displayed on the current interface of the terminal device, the fusion model obtained through pre-training is used to output the user intention corresponding to the voice command, so that the terminal device 210 can be controlled by the voice assistant to perform operations matching the user intention.
  • For example, after detecting the voice command "switch route" issued by the user, the voice assistant can also obtain the information issued by the map application program, and based on the "found a faster route" reminder information among the various information, combined with the interface currently displayed by the map application program, it can be determined that the user intention corresponding to the voice command is to switch the navigation route to the faster route that has been found.
  • Correspondingly, the voice assistant can control the terminal device to switch the navigation route in the map application program.
  • For another example, after the voice assistant detects that the user has issued the voice command "adjust the temperature to 200 degrees", it can obtain the information sent by each induction cooker; according to the "suggest lowering the temperature" reminder message issued by one of the induction cookers, the voice assistant can determine that the user intention corresponding to the voice command is to adjust the temperature of the induction cooker that sent the reminder message to 200 degrees, and the voice assistant can then control that induction cooker to lower its temperature to 200 degrees.
  • the voice assistant first detects the voice command issued by the user, and then determines the user's intention according to the reminder information in the obtained various information.
  • the voice assistant can also obtain various information including reminder information issued by the application program in real time, and then when it detects the voice command issued by the user, it can combine the reminder information in the various information to determine the user's intention.
  • the embodiment of the present application does not limit the sequence of obtaining the reminder information and obtaining the voice instruction.
  • the voice instruction involved in the embodiment of the present application may be an instruction for instructing the terminal device to perform an operation.
  • the voice command is used to instruct the terminal device to purchase commodities, to enable or disable a certain function, or to adjust the status of the terminal device.
  • The embodiment of the present application does not limit the operation that the voice command instructs the terminal device to perform.
  • the application program can send various types of information during running, which may include reminder information for reminding the user.
  • the voice assistant can obtain various information sent by the application program in various ways.
  • For example, the voice assistant can intercept the information sent by the application program through a preset interface; it can also receive the various information that the application program actively sends to the voice assistant, and it can also obtain the information sent by the application program in other ways.
  • The embodiment of the present application does not limit the type of information sent by the application program or the method of obtaining that information.
  • It should be noted that the above speech analysis scenario including multiple devices is described for convenience of explanation.
  • In practical applications, the speech analysis method can be applied in many different speech analysis scenarios, such as the scenario of controlling smart home devices through a voice assistant, the scenario of controlling vehicle-mounted devices through a voice assistant, and other scenarios in which a terminal device is controlled by a voice assistant; the embodiment of the present application does not limit the speech analysis scenario.
  • Fig. 3 is a schematic flowchart of a speech analysis method provided by the embodiment of the present application. As an example but not a limitation, the method can be applied to the above-mentioned terminal device. Referring to Fig. 3, the method includes:
  • Step 301: perform training according to multiple types of sample data to obtain a fusion model.
  • the fused model may be a single model, or a model group composed of multiple models, and each model in the model group may work together.
  • the embodiment of the present application does not limit the number of models in the fused model.
  • For example, if the fusion model is a model group, the first model in the model group can determine the domain to which the voice command belongs based on the received voice command, reminder information, interface content, and other information; afterwards, the voice command, reminder information, interface content, and other information can be input into the model corresponding to that domain in the model group, and the user intention output by that model can be obtained.
  • In practical applications, the terminal device can perform voice interaction with the user through the voice assistant, and the voice assistant can analyze the voice command issued by the user through the pre-trained fusion model and determine the user intention corresponding to the voice command, so as to control the terminal device to perform operations matching the user intention. Therefore, before the terminal device performs voice interaction with the user, the terminal device can perform training according to multiple types of sample data to obtain the fusion model.
  • the various sample data may include: sample reminder information, sample interface content, sample voice instructions, and sample user intentions.
  • the sample reminder information can be the information that the application program reminds the user.
  • the sample interface content can be the content corresponding to each interface displayed by the terminal device when the application program is started.
  • the sample voice command can be the voice issued by the user to the terminal device.
  • The sample user intentions can be different operations performed by the terminal device according to voice instructions, interface content, and reminder information.
  • During training, the terminal device may establish association relationships between different types of sample data according to the multiple types of sample data, so that the established association relationships can be used; then, sample data such as the sample reminder information, sample interface content, and sample voice commands in the association relationships are input into a preset initial model to obtain the initial user intention output by the initial model. Afterwards, the terminal device can adjust the initial model according to the initial user intention output by the initial model, combined with the sample user intention corresponding to the sample data input into the initial model, and repeat the above process until the trained model can accurately output the sample user intentions corresponding to sample data such as sample reminder information, sample interface content, and sample voice commands.
  • In a possible implementation, the terminal device can establish association relationships between different types of sample data based on the multiple types of sample data, combined with detected user-triggered operations; that is, the terminal device can establish an association relationship between any sample user intention and a certain piece of sample data of another type.
  • In a possible implementation, the terminal device may establish, for each sample user intention, association relationships between that sample user intention and the other three types of sample data. That is, for each sample user intention, an association relationship between the sample user intention and at least one piece of sample reminder information may be established, an association relationship between the sample user intention and at least one sample voice instruction may be established, or an association relationship between the sample user intention and at least one piece of sample interface content may be established, so as to obtain the association relationships between the various types of sample data.
  • For example, the terminal device can associate the sample user intention "switch to a faster route" with the sample reminder information "currently found a faster route, and can arrive at the destination 5 minutes earlier", associate the sample user intention "switch to a faster route" with the sample voice instruction "switch route", or associate the sample user intention "switch to a faster route" with the interface corresponding to the map application in the sample interface content.
  • After the terminal device establishes the association relationships based on part of the sample data (such as 80%), the terminal device can use the association relationships as label data, input a large amount of sample data (such as sample reminder information, sample interface content, and sample voice instructions) into the preset initial model to obtain the initial user intention output by the initial model, and then compare, according to the association relationships serving as the label data, the difference between the sample user intention corresponding to the sample data and the initial user intention, so as to adjust the parameters of the initial model. After the initial model is trained multiple times in the above manner, so that it can accurately output the sample user intentions corresponding to the sample data, the training of the initial model is completed and the fusion model is obtained.
  • Moreover, the terminal device can also use the remaining sample data (such as 20%) among the large number of sample data to test the fusion model, so as to determine the degree of similarity between the initial user intention output by the fusion model and the sample user intention, and thereby determine, according to the obtained degree of similarity, whether to further train the fusion model. A simplified sketch of this procedure follows.
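  • The following is a deliberately simplified sketch of the 80/20 train-and-test procedure, not the patent's actual fusion model: a single scikit-learn text classifier stands in for the trained model, the three sample inputs are fused into one feature string, and all sample data are invented placeholders (assuming scikit-learn is available).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each tuple is one association relationship:
# (sample reminder info, sample interface content, sample voice command) -> sample user intent.
samples = [
    ("found a faster route", "map navigation interface", "switch route", "switch_route"),
    ("found a faster route", "map navigation interface", "change route", "switch_route"),
    ("suggest lowering temperature", "cooker status interface", "adjust to 200 degrees", "adjust_cooker"),
    ("suggest lowering temperature", "cooker status interface", "turn it down", "adjust_cooker"),
    ("found a faster route", "map navigation interface", "use the new route", "switch_route"),
]
X = [" | ".join(s[:3]) for s in samples]   # fuse the three input types into one string
y = [s[3] for s in samples]                # the sample user intent serves as the label

# Roughly 80% of the associations train the model; the remainder tests it.
n_train = int(0.8 * len(samples))
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X[:n_train], y[:n_train])
print("held-out accuracy:", model.score(X[n_train:], y[n_train:]))
```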
  • It should be noted that the fusion model may not only be obtained through training by the terminal device based on a large number of sample data; the server may also perform training based on the sample data to obtain the fusion model. The embodiment of the present application does not limit this.
  • sample data may be obtained through manual collection, and of course may also be collected through other methods, and the embodiment of the present application does not limit the collection method of sample data.
  • the reminder information used to remind the user in the application program may be intercepted manually.
  • After the fusion model is obtained through training, the terminal device can add the fusion model to the preset voice assistant, so that the voice assistant can determine the user intention through the fusion model; the terminal device can also set an interface corresponding to the voice assistant, so that the voice assistant can invoke the fusion model through this interface. The embodiment of the present application does not limit the manner in which the voice assistant and the fusion model work together.
  • Step 302: acquire the information sent by the currently running application programs.
  • the application program may send various types of information during the running process, and the various types of information sent by the application program may include reminder information, and the reminder information is information for reminding the user.
  • The reminder information may be obtained by converting the voice broadcast by the application program, or may be obtained from the text information of the application program; the embodiment of the present application does not limit the reminder information.
  • In practical applications, the terminal device can start different application programs during operation, and each application program can send out various information including reminder information. The voice assistant can obtain the various information sent by the application programs, so that in subsequent steps the user intention corresponding to the voice command can be determined according to the reminder information among the various information, combined with the voice command issued by the user.
  • It should be noted that the fusion model trained in step 301 is trained based on a large number of pieces of sample reminder information, and the sample reminder information used for training the fusion model is collected manually. Therefore, the fusion model can recognize the reminder information among the various information of the application programs acquired by the voice assistant, so that in subsequent steps the user intention can be output according to the reminder information. However, the fusion model does not recognize the other information among the various information apart from the reminder information; when the voice assistant inputs such other information into the fusion model, the fusion model cannot output a user intention and instead outputs an abnormal result (for example, the fusion model may output empty, or output "other").
  • In practical applications, the terminal device can start the voice assistant after startup, and the voice assistant can monitor the various running application programs and obtain the various information they send (for example, by intercepting the various information sent by the application programs).
  • the application program can remind the user of the reminder information corresponding to the changing scene in the form of voice broadcast according to the constantly changing scene where the terminal device is located.
  • Correspondingly, when the application program broadcasts the reminder information, the voice assistant can obtain the audio data of the reminder information through the preset interface, and then use automatic speech recognition (ASR) technology to convert the audio data into text and obtain the reminder information of the application program.
  • For example, referring to FIG. 4, the map application at the APP layer can use the media player in the system layer to broadcast the audio data of "found a faster route" according to the received traffic information and the current location of the terminal device; the voice assistant can obtain the audio data from the player through the preset interface and convert the audio data to obtain the corresponding text "found a faster route", that is, the reminder information of the map application program. A hedged sketch of this interception-and-transcription step is given below.
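  • As a sketch only: the interception hook (on_media_broadcast) and the preset interface are hypothetical stand-ins, and the SpeechRecognition package is used here merely as one example of an off-the-shelf ASR component; the patent does not name a specific library.

```python
import speech_recognition as sr

def transcribe_broadcast(wav_path: str) -> str:
    """Convert broadcast audio data into text with an off-the-shelf ASR engine."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)          # read the whole clip
    return recognizer.recognize_google(audio)      # ASR: audio -> text

def on_media_broadcast(app_id: str, wav_path: str) -> tuple:
    """Hypothetical hook called when the system-layer player broadcasts audio;
    returns (app_id, reminder text) for the voice assistant to consume."""
    return app_id, transcribe_broadcast(wav_path)

# e.g. on_media_broadcast("map_app", "broadcast.wav")
# -> ("map_app", "found a faster route")
```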
  • In a possible implementation, if the application program does not enable the voice broadcast function, or the application program does not have the voice broadcast function, the application program may remind the user by displaying text information.
  • Correspondingly, the voice assistant can obtain the reminder information of the application program in text form through the preset interface.
  • the voice assistant can also obtain reminder information in other ways.
  • the voice assistant can obtain the text information corresponding to the broadcast audio data in the application program through a preset interface, so that the obtained text information can be used as the reminder information of the application program.
  • In a possible implementation, when the application program detects the need to remind the user, the application program can also actively send the audio data or text information to be broadcast to the voice assistant, so that the voice assistant does not need to obtain the audio data or text information again. That is, referring to FIG. 5, the application program can complete the action of sending the reminder information to the voice assistant at the APP layer, and the voice assistant can obtain the reminder information without going through the player at the system layer.
  • Moreover, after actively sending the reminder information, the application program can continue to broadcast the audio data or display the text information to the user, or the audio data or text information can be broadcast through the voice assistant; the embodiment of the present application does not limit this.
  • the terminal device may also pre-train a recognition model for recognizing reminder information based on a large number of samples of reminder information collected manually, combined with other information of the application program. Afterwards, the terminal device can use the recognition model to recognize various information obtained by the voice assistant, so as to obtain reminder information.
  • the voice assistant can also determine the reminder information sent by the application program in other ways, and the embodiment of the present application does not limit the method of obtaining the reminder information.
  • Step 303: obtain the voice command issued by the user, and convert the voice command to obtain a text command.
  • the voice instruction is an instruction for instructing the terminal device to perform an operation.
  • the user may issue a voice command according to the reminder information, instructing the terminal device or the application program to perform related operations.
  • the terminal device can collect the voice commands issued by the user through the voice assistant, and convert them into text commands.
  • In practical applications, after the terminal device is started, the preset voice assistant can be turned on first, so that the voice commands issued by the user can be continuously collected through the voice assistant, the terminal device can be controlled to perform corresponding operations according to the voice commands, and the efficiency of voice interaction between the terminal device and the user can be improved.
  • the voice assistant can call the microphone of the terminal device, collect audio data through the microphone, and obtain voice instructions from the user.
  • the voice assistant can use voice enhancement technology to filter the noise in the voice command to obtain the filtered voice command, that is, the voice uttered by the user.
  • the voice assistant can use ASR to perform text conversion on the filtered voice commands to obtain text commands.
  • the voice assistant can collect voice instructions by means of a wake-up word.
  • the voice assistant can continuously collect audio data through the microphone.
  • the voice assistant can use the audio data within a preset time after the wake-up word as a voice command.
  • For example, the voice assistant can use the audio data "switch route" collected within 10 seconds after the wake-up word as the voice command, filter it through voice enhancement, and then convert it into text through ASR technology to obtain the text command "switch route".
  • In a possible implementation, the voice assistant can also obtain voice commands in combination with an ending word. After the voice assistant detects the wake-up word, the audio data after the wake-up word can be used as the voice command until the voice assistant detects the ending word.
  • For example, if the wake-up word is "Little E" and the ending word is "goodbye", and the audio data collected by the voice assistant is "Little E, switch routes, goodbye", then after the voice assistant detects the wake-up word "Little E", it can use the subsequently collected audio data "switch routes" as the voice command; when the voice assistant detects the ending word "goodbye", it no longer uses the collected audio data as the voice command. A small sketch of this windowing logic follows.
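  • A minimal sketch of the wake-word/ending-word windowing described above, assuming an upstream ASR that yields timestamped text fragments; the wake word, ending word, and 10-second window come from the examples in the text, everything else is invented.

```python
WAKE_WORD, END_WORD, WINDOW_SECONDS = "little e", "goodbye", 10

def extract_commands(fragments):
    """Yield voice commands from (timestamp, text) ASR fragments: a command is the
    audio after the wake-up word, closed by the ending word or the preset window."""
    command, started_at = [], None
    for ts, text in fragments:
        lowered = text.lower()
        if started_at is None:
            if WAKE_WORD in lowered:
                started_at = ts
        elif END_WORD in lowered or ts - started_at > WINDOW_SECONDS:
            yield " ".join(command)
            command, started_at = [], None
        else:
            command.append(text)

stream = [(0, "Little E"), (1, "switch"), (2, "routes"), (3, "goodbye")]
print(list(extract_commands(stream)))   # -> ['switch routes']
```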
  • It should be noted that the terminal device can also execute step 303 first and then execute step 302; that is, the voice assistant first collects the voice command issued by the user and then obtains the information issued by the application programs within a preset time, so as to combine the reminder information they sent with the voice command and determine the user intention corresponding to the voice command.
  • That is, the voice assistant does not obtain the information sent by the application programs in real time; instead, it first collects the voice command issued by the user and, according to the time at which the voice command was collected, selects the information sent by the application programs within a preset time before that moment, so as to obtain the reminder information sent by the application programs.
  • For example, suppose the preset time is 2 minutes. If the voice assistant of the terminal device collects the user's voice command at 11:10, the terminal device can traverse the application programs that ran from 11:08 to 11:10 and obtain the information sent by each of those application programs during that period. The sketch below illustrates this time-window filtering.
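  • The 11:08-to-11:10 traversal can be sketched as a simple time-window filter; the message store and its contents are invented placeholders.

```python
from datetime import datetime, timedelta

PRESET_WINDOW = timedelta(minutes=2)

# (emission time, app_id, message) records kept by the voice assistant; all invented.
messages = [
    (datetime(2022, 1, 1, 11, 7), "music_app", "now playing"),
    (datetime(2022, 1, 1, 11, 9), "map_app", "found a faster route"),
]

def messages_in_window(command_time: datetime):
    """Keep only information emitted within the preset time before the voice command."""
    return [m for m in messages
            if command_time - PRESET_WINDOW <= m[0] <= command_time]

# Voice command collected at 11:10 -> only the 11:09 map reminder survives.
print(messages_in_window(datetime(2022, 1, 1, 11, 10)))
```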
  • the terminal device may also only perform step 303 without performing step 302.
  • the embodiment of the present application does not limit the sequence of steps 302 and 303, nor does it limit whether the terminal device performs step 302.
  • the user can also issue an instruction according to the current scene, and the terminal device can obtain the voice instruction issued by the user.
  • For example, even if the map application does not send out a "shorter route found" reminder message, when the user finds that there is a traffic jam ahead or needs to change the destination, the user can issue a "switch route" or "change destination" voice command to the voice assistant.
  • the voice assistant can also continuously obtain information corresponding to the current scene through the terminal device or the application program, so as to continuously determine the current environment of the user.
  • Correspondingly, if the voice assistant detects that the user's current environment does not match the status of the terminal device or the status of the application program, the voice assistant can remind the user, so as to determine, according to the voice command fed back by the user, whether the status of the terminal device or the application program needs to be adjusted.
  • the map application may send the traffic jam information to the voice assistant as a reminder.
  • the voice assistant can ask the user "There is a traffic jam on section A, do you want to switch the navigation route?" If the voice assistant detects that the user answers "switch the route", the voice assistant can instruct the map application to switch the navigation route.
  • Step 304: determine, according to the text command corresponding to the voice command, the application program corresponding to the text command.
  • After acquiring the text command, the terminal device can search for the application program corresponding to the text command. Because an application program running in the foreground of the terminal device can send reminder information, and an application program running in the background may also send reminder information, the terminal device can first determine the application program corresponding to the text command, so that in subsequent steps the terminal device can obtain the interface of the application program according to the determined application program, or determine the user intention according to the text command and the reminder information.
  • Specifically, the terminal device may acquire a first application program list and a second application program list, where the first application program list is a list of the application programs installed on the terminal device, and the second application program list is a list of the application programs currently running on the terminal device, including application programs running in the foreground and application programs running in the background.
  • The terminal device can then search, from the first application program list and the second application program list, for the application program matching the text command corresponding to the voice command, so as to determine the running state corresponding to the matching application program, where the running state is used to indicate whether the terminal device is running the application program, and whether the application program is running in the foreground or in the background.
  • If the running state indicates that the application program corresponding to the text command is running in the foreground of the terminal device, the terminal device may perform step 305 to obtain the interface information of the application program. If the running state indicates that the application program corresponding to the text command is running in the background of the terminal device, the terminal device can obtain the identifier of the application program, skip step 305 and execute step 306, and determine, through the fusion model and in combination with the determined identifier of the application program, the user intention corresponding to the voice command.
  • For example, if the text command relates to navigation, the application program related to the text command can be an application program with a navigation function; if the text command relates to playback, the related application program can be an application program with the function of playing audio and video data; and if the text command is "open application B", the related application program can be an application program whose name is the same as or similar to the application name in the text command. A sketch of this list-based matching is given below.
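  • A sketch of matching the text command against the two lists; the capability table and running-state map are invented stand-ins for whatever metadata the terminal device actually keeps.

```python
# Hypothetical capability index: which installed app serves which kind of command.
CAPABILITIES = {
    "map_app": ("navigate", "switch route", "change destination"),
    "player_app": ("play", "pause"),
}

def match_app(text_cmd, installed, running):
    """Return (app_id, running_state); running_state is 'foreground',
    'background', or 'not_running' (looked up in the second list)."""
    lowered = text_cmd.lower()
    for app_id in installed:                       # first application program list
        by_name = app_id.lower() in lowered        # e.g. "open application B"
        by_capability = any(c in lowered for c in CAPABILITIES.get(app_id, ()))
        if by_name or by_capability:
            return app_id, running.get(app_id, "not_running")  # second list
    return None

print(match_app("switch route", ["map_app", "player_app"], {"map_app": "foreground"}))
# -> ('map_app', 'foreground')
```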
  • Step 305: if the application program corresponding to the text command is an application program running in the foreground of the terminal device, acquire the interface information according to the interface content of the current interface of the terminal device.
  • the current interface is an interface currently displayed by the terminal device.
  • the interface content may include various elements displayed in the current interface.
  • the interface content may include elements such as text, images, and controls.
  • In addition, the interface content may also include: the text position corresponding to each text, the image position corresponding to each image, and information such as the category and position corresponding to each control.
  • The interface information is information determined according to the interface content, and is used to indicate the scene where the terminal device is currently located and the actions associated with the current scene (such as the action currently being performed by the terminal device, and/or actions that the terminal device may perform).
  • For example, the interface information corresponding to the current interface may include: the scene where the terminal device is currently located is a map navigation scene, the action currently being performed by the terminal device is navigating to the destination, and the actions that may be performed by the terminal device include: changing the route, changing the destination, and stopping navigation.
  • In practical applications, the terminal device can obtain the current interface of the terminal device, that is, the interface corresponding to the application program, through system services, identify each element in the interface content, and then parse the elements to obtain the interface information of the application program, as sketched below.
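  • The extraction-then-analysis of the interface content can be sketched over a toy view tree; the element structure and the scene rule are invented for illustration.

```python
# Toy interface content extracted from the current interface via system services.
interface_content = {
    "texts": [{"value": "Navigating to destination", "pos": (0, 0)}],
    "controls": [
        {"category": "button", "label": "Switch route", "pos": (10, 80)},
        {"category": "button", "label": "Stop navigation", "pos": (60, 80)},
    ],
}

def parse_interface(content):
    """Analyze the interface content to derive the interface information:
    the current scene, the current action, and the possible actions."""
    labels = [c["label"] for c in content["controls"]]
    texts = [t["value"] for t in content["texts"]]
    is_nav = any("navigat" in s.lower() or "route" in s.lower() for s in labels + texts)
    return {
        "scene": "map_navigation" if is_nav else "unknown",
        "current_action": texts[0] if texts else None,
        "possible_actions": labels,
    }

print(parse_interface(interface_content))
```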
  • Step 306: determine the user intention corresponding to the voice command according to the information sent by the application program and the text command corresponding to the voice command.
  • In this step, the user intention may be obtained as the output of the fusion model trained in step 301.
  • If the application program corresponding to the text command runs in the background, the voice assistant of the terminal device can input the information issued by the application program, the text command corresponding to the voice command, and the identifier corresponding to the application program into the fusion model, and obtain the user intention output by the fusion model.
  • Alternatively, if the application program runs in the foreground, the voice assistant can input the information sent by the application program, the text command corresponding to the voice command, and the interface information corresponding to the application program into the fusion model, and obtain the user intention output by the fusion model.
  • In practical applications, after the terminal device obtains the interface information of the application program, it can input the text command, the interface information, and the obtained information issued by the application program into the fusion model at the same time; the fusion model analyzes the text command, the interface information, and the information sent by the application program, and outputs the operation matching them, that is, the user intention corresponding to the voice command, so that the terminal device can perform the matching operation according to the user intention in subsequent steps.
  • In a possible implementation, if the fusion model is a model group, the terminal device can first input each piece of information into the first model in the model group, analyze the information through the first model, and output the domain to which the voice command belongs; then, from the multiple models in the model group, the terminal device can determine the model matching the domain of the voice command, input the text command, the interface information, and the information sent by the application program into that model, and analyze the information through that model to obtain the user intention corresponding to the voice command. A toy sketch of this two-stage routing follows.
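  • A toy sketch of the two-stage model group: a first model routes the inputs to a domain, and a per-domain model outputs the intent. Rule-based classes stand in for the trained models; all names and rules are invented.

```python
class DomainRouter:
    """First model in the group: maps the fused inputs to a domain."""
    def predict(self, text_cmd, reminder, interface_info):
        if "route" in text_cmd or interface_info.get("scene") == "map_navigation":
            return "navigation"
        return "general"

class NavigationIntentModel:
    """Domain-specific model: outputs the user intention for navigation inputs."""
    def predict(self, text_cmd, reminder, interface_info):
        if "switch" in text_cmd and "faster route" in reminder:
            return "switch_to_faster_route"
        return "unknown"

DOMAIN_MODELS = {"navigation": NavigationIntentModel()}

def fused_intent(text_cmd, reminder, interface_info):
    domain = DomainRouter().predict(text_cmd, reminder, interface_info)
    model = DOMAIN_MODELS.get(domain)
    return model.predict(text_cmd, reminder, interface_info) if model else "unknown"

print(fused_intent("switch route", "found a faster route", {"scene": "map_navigation"}))
# -> switch_to_faster_route
```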
  • In a possible implementation, if the application program corresponding to the text command runs in the background, the terminal device skips step 305 after executing step 304, inputs the text command, the information sent by the application program, and the identifier of the application program obtained in step 304 into the fusion model, and obtains the user intention output by the fusion model.
  • It should be noted that, when the terminal device inputs the information sent by the application programs into the fusion model, if that information was obtained by performing step 302 first and then step 303, the terminal device can select the information sent by the application programs within a preset time and input it into the fusion model; if that information was obtained by performing step 303 first and then step 302, the terminal device can input all the information sent by the application programs within the preset time into the fusion model.
  • the terminal device may also use other methods to select the information input into the fusion model, which is not limited in this embodiment of the present application.
  • Step 307: perform an operation matching the user intention.
  • After the terminal device determines the user intention corresponding to the voice command, it can call the preset intent execution interface to control the terminal device to perform the operation matching the user intention, respond to the voice command issued by the user, and complete the voice interaction between the terminal device and the user, thereby reducing the processes and steps required for voice interaction between the terminal device and the user.
  • It should be noted that step 301 is an optional step, and the terminal device may perform step 301 only once; that is, after the fusion model is obtained through training, there is no need to train the fusion model again each time the user intention corresponding to a voice command is subsequently determined.
  • In addition, step 305 is also an optional step. If the terminal device determines in step 304 that the application program corresponding to the voice command is an application program running in the foreground of the terminal device, the terminal device can perform step 305; if the application program corresponding to the voice command is an application program running in the background of the terminal device, the terminal device may skip step 305 and execute step 306.
  • the above voice analysis method can also be applied in a scenario of multiple devices.
  • For example, it can be applied to the scenario of controlling different smart home devices through the terminal device, to the scenario of controlling vehicle-mounted devices through the terminal device, and to other scenarios including multiple devices; this is not limited in the embodiment of the present application.
  • FIG. 6 shows a schematic flow chart of voice analysis performed by multiple devices.
  • the first terminal device and the second terminal device are taken as examples to illustrate a method for any terminal device to access the network for voice analysis.
  • The process can include the following steps:
  • Step 601. When accessing the network, the first terminal device broadcasts the first application program list to other devices in the network, and requests the second terminal device for the second application program list.
  • the first application program list may be a list of various application programs currently running on the first terminal device.
  • the second application program list may be a list of application programs currently running on the second terminal device.
  • the network accessed by the first terminal device may be a local area network, a wide area network, or the Internet.
  • For example, in a smart home scenario, the first terminal device may access a local area network; in a distributed application scenario, the first terminal device may access a wide area network, which is not limited in this embodiment of the present application.
  • After the first terminal device detects access to the network, it can generate list request information, obtain the first application program list of the first terminal device, and broadcast the first application program list and the list request information to other terminal devices in the network, so that the other terminal devices in the network can receive the first application program list and feed back their corresponding application program lists to the first terminal device according to the list request information.
  • the application programs may be continuously opened or closed, and the first application program list of the first terminal device is constantly changing.
  • When the first terminal device detects that an application program is newly opened or that an application program is closed, it can update the first application program list and broadcast the updated first application program list to other devices in the network.
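  • A minimal sketch of maintaining and broadcasting the first application program list on open/close events is shown below; the `broadcast` callable stands in for whatever network transport the devices actually use, and is an assumption.

```python
# Sketch of step 601 plus the update rule above: keep the first
# application program list current and rebroadcast it on every change.

class AppListBroadcaster:
    def __init__(self, broadcast):
        self.running_apps = set()   # first application program list
        self.broadcast = broadcast  # assumed network-send callable

    def on_app_opened(self, app_id):
        self.running_apps.add(app_id)
        self.broadcast(sorted(self.running_apps))

    def on_app_closed(self, app_id):
        self.running_apps.discard(app_id)
        self.broadcast(sorted(self.running_apps))
```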
  • Step 602 the second terminal device receives the first application program list of the first terminal device, and feeds back the second application program list of the second terminal device to the first terminal device.
  • Step 603 when the first terminal device detects the voice command sent by the user, it converts the voice command to obtain a text command, and obtains reminder information and interface information of the first terminal device.
  • Step 604 the first terminal device sends an information request instruction to the second terminal device.
  • the information request instruction is used to request reminder information and interface information of the second terminal device.
  • the embodiment of the present application is described by performing step 603 first and then performing step 604 as an example.
  • the first terminal device may also perform step 603 and step 604 at the same time, that is, the first terminal device may also send an information request command to the second terminal device while converting the voice command.
  • the first terminal device may also perform step 604 first, and then perform step 603, and this embodiment of the present application does not limit the order in which the first terminal device performs step 603 and step 604.
  • When the first terminal device executes step 604 first and then executes step 603, the first terminal device does not obtain the reminder information and interface information of the second terminal device in response to the detected voice command; instead, it uses a process similar to steps 302 to 303 above to periodically obtain the reminder information and interface information of the second terminal device in advance, so that when the voice command issued by the user is detected, this information can be used in the subsequent steps to determine the user intention corresponding to the voice command.
  • Step 605 The second terminal device obtains the reminder information and interface information of the second terminal device according to the information request instruction sent by the first terminal device, and feeds back the reminder information and interface information of the second terminal device to the first terminal device.
  • The process of step 603 and step 605 is similar to the process of obtaining the information sent by the application program in step 302 and the process of obtaining the interface information in step 305, and will not be repeated here.
  • Step 606. The first terminal device determines the application program corresponding to the voice command and the user intention corresponding to the voice command according to the multiple pieces of acquired reminder information combined with the text command.
  • If, in step 602 and step 605, the first terminal device has acquired the interface information of the application program, the first terminal device may also combine the acquired interface information to determine the user intention. However, if the first terminal device has not obtained the interface information of the application program in step 602 and step 605, the first terminal device can combine the determined application identifier of the application program to determine the user intention, which will not be repeated here.
  • Step 607 If the application program corresponding to the voice command is an application program run by the second terminal device, the first terminal device sends the user intention to the second terminal device.
  • After the first terminal device determines that the voice command issued by the user is aimed at an application program of the second terminal device, the first terminal device can send the user intention obtained through recognition and analysis to the second terminal device, so that the second terminal device can call the intention execution interface according to the user intention and perform the operation corresponding to the user intention, thereby realizing multi-device cooperative work.
  • Step 608 the second terminal device performs an operation corresponding to the user intention according to the received user intention.
  • Step 609 If the application program corresponding to the voice command is an application program run by the first terminal device, the first terminal device performs an operation corresponding to the user's intention according to the user's intention.
  • In step 603, step 605, step 606, step 608, and step 609, the process of obtaining the interface information and reminder information, determining the user intention according to the interface information and the reminder information combined with the text command, and then performing the corresponding operation according to the user intention is similar to the process from step 302 to step 306, and will not be repeated here.
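  • The dispatch decision of steps 606 to 609 can be sketched as follows; the helper callables (`send_to_second`, `execute`) are hypothetical placeholders, not interfaces defined by this embodiment.

```python
# Sketch of steps 606-609: execute locally when the target application
# runs on the first terminal device, otherwise forward the user intention
# to the second terminal device. The callables are hypothetical.

def dispatch_intent(app_id, user_intent, local_apps, send_to_second, execute):
    if app_id in local_apps:
        execute(user_intent)                      # step 609: local execution
    else:
        send_to_second({"app": app_id,            # step 607: forward intention
                        "intent": user_intent})   # step 608 runs on device 2
```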
  • The embodiment of the present application takes the voice assistant of the terminal device as an example; some applications installed on the terminal device can also perform voice interaction with the user, so an installed application program can likewise use the above voice analysis method to determine the user intention according to the voice command, thereby realizing control of the terminal device or of various application programs.
  • For example, the user can issue voice commands to a map application through voice interaction, and the map application can switch the navigation route, change the destination, control the on-board equipment, or control the vehicle according to the voice command.
  • In summary, by acquiring the information sent by the running application program when the voice command is obtained, and using the information sent by the application program as a factor for determining the user intention, the voice analysis method can improve the accuracy of determining the user intention corresponding to the voice command, thereby improving the efficiency of voice interaction between the terminal device and the user.
  • When receiving the voice command from the user, the terminal device can convert the voice command to obtain the text command, obtain the information sent by the application program, analyze the text command and the information sent by the application program through the pre-trained fusion model, and output the user intention corresponding to the voice command.
  • In this way, the accuracy of determining the user intention corresponding to the voice command can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.
  • the terminal device can also obtain the interface information of the application program, and the fusion model combines the interface information of the application program on the basis of the text command and the information sent by the application program, so that the user intention corresponding to the voice command can be determined more accurately, improving the accuracy of determining the user intention.
  • any terminal device that has collected voice commands in a multi-device scenario can control other devices in the multi-device scenario according to the voice commands, which can improve the flexibility of controlling terminal devices with voice commands.
  • applications with voice interaction functions can also determine the user intent corresponding to the voice command based on the voice command, combined with reminder information and interface information, which can improve the versatility and flexibility of the application to analyze voice commands and obtain user intent.
  • FIG. 7 is a structural block diagram of a speech analysis device provided by the embodiment of the present application. For the convenience of description, only the parts related to the embodiment of the present application are shown.
  • the device includes:
  • the first acquiring module 701 is configured to acquire a voice command and information sent by a running application, the voice command is used to instruct the terminal device to perform an operation, and the information sent by the application includes reminder information for reminding the user;
  • the first determining module 702 is configured to determine the user intention corresponding to the voice command according to the voice command and the reminder information.
  • the device also includes:
  • the second obtaining module 703 is configured to obtain a first application program list and a second application program list, the first application program list is a list of applications installed on the terminal device, and the second application program list is a list of applications currently running on the terminal device;
  • the second determining module 704 is configured to determine the identifier of the application program corresponding to the voice instruction and the running state of the application program according to the first application program list and the second application program list;
  • the first determination module 702 is specifically configured to: if the running state of the application program is running in the background, determine the user intention corresponding to the voice command according to the reminder information, the voice command, and the application program identifier; if the running state of the application program is running in the foreground, obtain the interface information corresponding to the current interface according to the current interface of the application program, and determine the user intention corresponding to the voice command according to the voice command, the reminder information, and the interface information.
  • the first determination module 702 is also specifically configured to extract the current interface to obtain interface content included in the current interface; analyze the interface content to obtain interface information corresponding to the application program.
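  • The foreground/background branching performed by the first determination module 702 can be sketched as below, assuming hypothetical `determine_intent` and `parse_interface` helpers.

```python
# Sketch of module 702's branching: background apps contribute their
# identifier, foreground apps contribute parsed interface information.
# determine_intent() and parse_interface() are assumed helpers.

def determine_user_intent(voice_cmd, reminder_info, app_id, state,
                          current_interface, determine_intent, parse_interface):
    if state == "background":
        return determine_intent(voice_cmd, reminder_info, app_id=app_id)
    # Foreground: extract and parse the current interface first.
    interface_info = parse_interface(current_interface)
    return determine_intent(voice_cmd, reminder_info,
                            interface_info=interface_info)
```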
  • the first acquiring module 701 is specifically configured to acquire the voice command at a first moment, and, according to the first moment, acquire the information issued by each application program running within a preset time before the first moment.
  • the first acquiring module 701 is specifically configured to acquire information sent by the running application program in real time.
  • the first obtaining module 701 is also specifically configured to obtain the audio data broadcast by the terminal device through a preset interface, and use automatic speech recognition (ASR) technology to convert the audio data to obtain the information in text form sent by the application program.
  • the first obtaining module 701 is also specifically configured to extract the text data sent by the application program through a preset interface to obtain information in text form sent by the application program.
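  • The two acquisition paths of the first obtaining module 701 (audio broadcast converted by ASR, and direct text extraction) can be sketched as follows; `asr()` and the two preset interfaces are assumed placeholders.

```python
# The two acquisition paths, as assumed placeholders: an audio path that
# captures broadcast audio and converts it with ASR, and a text path that
# extracts text data directly through a preset interface.

def get_app_info_from_audio(audio_interface, asr):
    audio = audio_interface.read()  # audio data broadcast by the device
    return asr(audio)               # information in text form, via ASR

def get_app_info_from_text(text_interface):
    return text_interface.read()    # text data sent by the application
```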
  • the device also includes:
  • the conversion module 705 is used to convert the voice command by using ASR technology to obtain a text command in text form;
  • the first determining module 702 is further specifically configured to determine the user intention corresponding to the voice instruction according to the text instruction and the reminder information.
  • the conversion module 705 is specifically configured to denoise the voice command by using speech enhancement technology to obtain a denoised voice command, and to convert the denoised voice command by using the ASR technology to obtain the text command in text form.
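  • A minimal sketch of the conversion module 705 pipeline, assuming generic `denoise` and `asr` engines:

```python
# Conversion module 705 as a two-step pipeline; denoise() and asr()
# stand in for the speech enhancement and recognition engines used.

def voice_command_to_text(voice_cmd, denoise, asr):
    clean_cmd = denoise(voice_cmd)  # speech-enhancement denoising
    return asr(clean_cmd)           # text command in text form
```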
  • the device also includes:
  • the establishment module 706 is configured to establish, according to multiple kinds of sample data, multiple association relationships between different kinds of sample data, where the multiple kinds of sample data include: sample reminder information, sample interface content, sample voice commands, and sample user intentions, and the multiple association relationships include: the association relationship between the sample user intention and the sample reminder information, the association relationship between the sample user intention and the sample voice command, and the association relationship between the sample user intention and the sample interface content;
  • the training module 707 is configured to perform training according to various association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
  • the first determination module 702 is further specifically configured to determine the user intention corresponding to the voice command through the fusion model, in combination with the voice command and the reminder information.
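  • As an illustration of how the association relationships between sample data could be turned into training pairs for the fusion model, the sketch below joins the associated sample fields as features and uses the sample user intention as the label; the encoding is an assumption, and any classifier could be trained on the resulting pairs.

```python
# Assumed encoding of the association relationships into training pairs:
# each sample's voice command, reminder information, and interface content
# are joined as features, labeled with the sample user intention.

def build_training_pairs(samples):
    pairs = []
    for s in samples:
        features = " | ".join([s["voice"], s["reminder"], s["interface"]])
        pairs.append((features, s["intent"]))  # (input, label)
    return pairs
```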
  • the device also includes:
  • the execution module 708 is configured to invoke an intent execution interface according to the user intent, and execute an operation matching the user intent.
  • the apparatus is applied in a multi-device scenario, where the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device;
  • the first obtaining module 701 is also specifically configured for the first terminal device to obtain the voice command and the information sent by the application program running on the first terminal device, and to send an information request instruction to the second terminal device according to the voice command, where the information request instruction is used to instruct the second terminal device to obtain, and feed back to the first terminal device, the information issued by the application program running on the second terminal device, the first terminal device receiving the fed-back information.
  • By obtaining the information sent by the running application program when the voice command is obtained, and using the information sent by the application program as a factor for determining the user intention, the speech analysis device can improve the accuracy of determining the user intention corresponding to the voice command, thereby improving the efficiency of voice interaction between the terminal device and the user.
  • FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device may include a processor 1210, an external memory interface 1220, an internal memory 1221, a universal serial bus (universal serial bus, USB) interface 1230, a charging management module 1240, a power management module 1241, a battery 1242, an antenna 1, an antenna 2, Mobile communication module 1250, wireless communication module 1260, audio module 1270, speaker 1270A, receiver 1270B, microphone 1270C, earphone interface 1270D, sensor module 1280, button 1290, motor 1291, indicator 1292, camera 1293, display screen 1294, and user An identification module (subscriber identification module, SIM) card interface 1295 and the like.
  • the sensor module 1280 can include pressure sensor 1280A, gyroscope sensor 1280B, air pressure sensor 1280C, magnetic sensor 1280D, acceleration sensor 1280E, distance sensor 1280F, proximity light sensor 1280G, fingerprint sensor 1280H, temperature sensor 1280J, touch sensor 1280K, ambient light Sensor 1280L, bone conduction sensor 1280M, etc.
  • the structure shown in the embodiment of the present invention does not constitute a specific limitation on the electronic device.
  • the electronic device may include more or fewer components than shown in the illustrations, or combine certain components, or separate certain components, or arrange different components.
  • the illustrated components can be realized in hardware, software or a combination of software and hardware.
  • the processor 1210 may include one or more processing units, for example: the processor 1210 may include an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processing unit (neural-network processing unit, NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller may be the nerve center and command center of the electronic equipment.
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 1210 for storing instructions and data.
  • the memory in processor 1210 is a cache memory. This memory may hold instructions or data that processor 1210 has just used or recycled. If the processor 1210 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 1210 is reduced, thereby improving the efficiency of the system.
  • processor 1210 may include one or more interfaces.
  • the interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous transmitter (universal asynchronous receiver/transmitter, UART) interface, mobile industry processor interface (mobile industry processor interface, MIPI), general-purpose input and output (general-purpose input/output, GPIO) interface, subscriber identity module (subscriber identity module, SIM) interface, and /or universal serial bus (universal serial bus, USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (serial data line, SDA) and a serial clock line (serial clock line, SCL).
  • processor 1210 may include multiple sets of I2C buses.
  • the processor 1210 can be respectively coupled to the touch sensor 1280K, the charger, the flashlight, the camera 1293 and the like through different I2C bus interfaces.
  • the processor 1210 may be coupled to the touch sensor 1280K through the I2C interface, so that the processor 1210 and the touch sensor 1280K communicate through the I2C bus interface to realize the touch function of the electronic device.
  • the I2S interface can be used for audio communication.
  • processor 1210 may include multiple sets of I2S buses.
  • the processor 1210 may be coupled to the audio module 1270 through an I2S bus to implement communication between the processor 1210 and the audio module 1270 .
  • the audio module 1270 can transmit audio signals to the wireless communication module 1260 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal.
  • the audio module 1270 and the wireless communication module 1260 can be coupled through a PCM bus interface.
  • the audio module 1270 can also transmit audio signals to the wireless communication module 1260 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 1210 and the wireless communication module 1260 .
  • the processor 1210 communicates with the Bluetooth module in the wireless communication module 1260 through the UART interface to realize the Bluetooth function.
  • the audio module 1270 can transmit audio signals to the wireless communication module 1260 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 1210 with the display screen 1294, the camera 1293 and other peripheral devices.
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 1210 communicates with the camera 1293 through the CSI interface to realize the shooting function of the electronic device.
  • the processor 1210 communicates with the display screen 1294 through the DSI interface to realize the display function of the electronic device.
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 1210 with the camera 1293 , the display screen 1294 , the wireless communication module 1260 , the audio module 1270 , the sensor module 1280 and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 1230 is an interface conforming to the USB standard specification, specifically, it may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 1230 can be used to connect a charger to charge the electronic device, and can also be used to transmit data between the electronic device and peripheral devices. It can also be used to connect headphones and play audio through them. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules shown in the embodiment of the present invention is only a schematic illustration, and does not constitute a structural limitation of the electronic device.
  • the electronic device may also adopt different interface connection methods in the above embodiments, or a combination of multiple interface connection methods.
  • the wireless communication function of the electronic device can be realized by the antenna 1, the antenna 2, the mobile communication module 1250, the wireless communication module 1260, the modem processor and the baseband processor.
  • the wireless communication module 1260 can provide wireless communication solutions including wireless local area networks (wireless local area networks, WLAN) (such as wireless fidelity (wireless fidelity, Wi-Fi) networks), Bluetooth (bluetooth, BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR), and other wireless communication technologies.
  • the wireless communication module 1260 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 1260 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 1210 .
  • the wireless communication module 1260 can also receive the signal to be sent from the processor 1210 , frequency-modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the electronic device is coupled to the mobile communication module 1250, and the antenna 2 is coupled to the wireless communication module 1260, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc.
  • the GNSS may include a global positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a Beidou navigation satellite system (beidou navigation satellite system, BDS), a quasi-zenith satellite system (quasi-zenith satellite system, QZSS) and/or satellite based augmentation systems (satellite based augmentation systems, SBAS).
  • the electronic device realizes the display function through the GPU, the display screen 1294, and the application processor.
  • the GPU is a microprocessor for image processing, connected to the display screen 1294 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 1210 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display screen 1294 is used to display images, videos and the like.
  • Display 1294 includes a display panel.
  • the display panel can be a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flex light-emitting diode, FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light emitting diodes (quantum dot light emitting diodes, QLED), etc.
  • the electronic device may include 1 or N display screens 1294, where N is a positive integer greater than 1.
  • the electronic device can realize the shooting function through ISP, camera 1293 , video codec, GPU, display screen 1294 and application processor.
  • the ISP is used to process the data fed back by the camera 1293 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can also perform algorithm optimization on image noise, brightness, and skin color.
  • ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be located in the camera 1293.
  • Camera 1293 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • DSP converts digital image signals into standard RGB, YUV and other image signals.
  • the electronic device may include 1 or N cameras 1293, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when an electronic device selects a frequency point, a digital signal processor is used to perform Fourier transform on the frequency point energy, etc.
  • Video codecs are used to compress or decompress digital video.
  • An electronic device may support one or more video codecs.
  • the electronic device can play or record video in multiple encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the NPU is a neural-network (NN) computing processor.
  • Applications such as intelligent cognition of electronic devices can be realized through NPU, such as: image recognition, face recognition, speech recognition, text understanding, etc.
  • the external memory interface 1220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 1210 through the external memory interface 1220 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 1221 may be used to store computer-executable program codes including instructions.
  • the processor 1210 executes various functional applications and data processing of the electronic device by executing instructions stored in the internal memory 1221 .
  • the internal memory 1221 may include an area for storing programs and an area for storing data.
  • the stored program area can store an operating system, at least one application program required by a function (such as a sound playing function, an image playing function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the electronic device.
  • the internal memory 1221 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (universal flash storage, UFS) and the like.
  • the electronic device can implement audio functions through the audio module 1270, the speaker 1270A, the receiver 1270B, the microphone 1270C, the earphone interface 1270D, and the application processor. Such as music playback, recording, etc.
  • the audio module 1270 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signal.
  • the audio module 1270 may also be used to encode and decode audio signals.
  • the audio module 1270 may be set in the processor 1210 , or some functional modules of the audio module 1270 may be set in the processor 1210 .
  • Loudspeaker 1270A, also called a "horn", is used to convert audio electrical signals into sound signals.
  • the electronic device can listen to music through speaker 1270A, or listen to hands-free calls.
  • Receiver 1270B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the receiver 1270B can be placed close to the human ear to receive the voice.
  • Microphone 1270C, also called a "mic", is used to convert sound signals into electrical signals. When making a phone call or sending a voice message, the user can put his mouth close to the microphone 1270C to make a sound and input the sound signal into the microphone 1270C.
  • the electronic device may be provided with at least one microphone 1270C. In other embodiments, the electronic device can be provided with two microphones 1270C, which can also implement a noise reduction function in addition to collecting sound signals. In some other embodiments, the electronic device can also be provided with three, four or more microphones 1270C to realize the collection of sound signals, noise reduction, identification of sound sources, and realization of directional recording functions, etc.
  • the earphone interface 1270D is used for connecting wired earphones.
  • the earphone interface 1270D may be a USB interface 1230, or a 3.5mm open mobile terminal platform (open mobile terminal platform, OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 1280A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 1280A may be located on display screen 1294 .
  • pressure sensors 1280A such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors.
  • a capacitive pressure sensor may be comprised of at least two parallel plates with conductive material.
  • the electronic device detects the intensity of the touch operation according to the pressure sensor 1280A.
  • the electronic device may also calculate the touched position according to the detection signal of the pressure sensor 1280A.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example: when a touch operation with a touch operation intensity less than the first pressure threshold acts on the short message application icon, an instruction to view short messages is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the icon of the short message application, the instruction of creating a new short message is executed.
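  • The short-message example above amounts to a threshold dispatch, sketched below with an assumed normalized pressure scale and threshold value:

```python
# Threshold dispatch for the short-message example; the pressure scale
# and threshold value are assumptions.

FIRST_PRESSURE_THRESHOLD = 0.5  # normalized touch pressure, example value

def on_message_icon_touch(pressure):
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_short_messages"       # light press: view messages
    return "create_new_short_message"      # firm press: new message
```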
  • the gyroscope sensor 1280B can be used to determine the motion posture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (ie, x, y, and z axes) can be determined by the gyro sensor 1280B.
  • the gyro sensor 1280B can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor 1280B detects the shake angle of the electronic device, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device through reverse movement to achieve anti-shake.
  • the gyroscope sensor 1280B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 1280C is used to measure air pressure.
  • the electronic device calculates the altitude through the air pressure value measured by the air pressure sensor 1280C to assist positioning and navigation.
  • the magnetic sensor 1280D includes a Hall sensor.
  • the electronic device may detect opening and closing of the flip holster using the magnetic sensor 1280D.
  • the electronic device can detect the opening and closing of the flip according to the magnetic sensor 1280D. Then according to the detected opening and closing state of the holster or the opening and closing state of the flip cover, features such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 1280E can detect the acceleration of the electronic device in various directions (generally three axes). When the electronic device is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of electronic devices, and can be used in applications such as horizontal and vertical screen switching, pedometers, etc.
  • Distance sensor 1280F used to measure distance.
  • Electronic devices can measure distance via infrared or laser light. In some embodiments, when shooting a scene, the electronic device can use the distance sensor 1280F for distance measurement to achieve fast focusing.
  • Proximity light sensor 1280G may include, for example, light emitting diodes (LEDs) and light detectors, such as photodiodes.
  • the light emitting diodes may be infrared light emitting diodes.
  • Electronic devices emit infrared light outwards through light-emitting diodes.
  • Electronic devices use photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object in the vicinity of the electronic device. When insufficient reflected light is detected, the electronic device may determine that there is no object in the vicinity of the electronic device.
  • the electronic device can use the proximity light sensor 1280G to detect that the user holds the electronic device close to the ear to make a call, so as to automatically turn off the screen to save power.
  • the proximity light sensor 1280G can also be used in leather case mode, automatic unlock and lock screen in pocket mode.
  • the ambient light sensor 1280L is used for sensing ambient light brightness.
  • the electronic device can adaptively adjust the brightness of the display screen 1294 according to the perceived ambient light brightness.
  • the ambient light sensor 1280L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 1280L can also cooperate with the proximity light sensor 1280G to detect whether the electronic device is in the pocket to prevent accidental touch.
  • the fingerprint sensor 1280H is used to collect fingerprints. Electronic devices can use the collected fingerprint features to unlock fingerprints, access application locks, take pictures with fingerprints, answer incoming calls with fingerprints, etc.
  • the temperature sensor 1280J is used to detect temperature.
  • the electronic device uses the temperature detected by the temperature sensor 1280J to implement a temperature treatment strategy. For example, when the temperature reported by the temperature sensor 1280J exceeds a threshold, the electronic device may reduce the performance of a processor located near the temperature sensor 1280J, so as to reduce power consumption and implement thermal protection.
  • the electronic device when the temperature is lower than another threshold, the electronic device heats the battery 1242 to avoid abnormal shutdown of the electronic device caused by low temperature.
  • the electronic device boosts the output voltage of the battery 1242 to avoid abnormal shutdown caused by low temperature.
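  • The temperature treatment strategy can be sketched as a simple policy function; the threshold values below are placeholders, not the device's real parameters.

```python
# Assumed thermal policy mirroring the description above; thresholds
# and callables are placeholders, not real device parameters.

HIGH_TEMP_C = 45  # assumed throttling threshold
LOW_TEMP_C = 0    # assumed low-temperature threshold

def apply_thermal_policy(temp_c, throttle_cpu, heat_battery, boost_voltage):
    if temp_c > HIGH_TEMP_C:
        throttle_cpu()     # reduce nearby processor performance
    elif temp_c < LOW_TEMP_C:
        heat_battery()     # warm the battery to avoid abnormal shutdown
        boost_voltage()    # raise battery output voltage
```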
  • Touch sensor 1280K also known as "touch panel”.
  • the touch sensor 1280K can be arranged on the display screen 1294, and the touch sensor 1280K and the display screen 1294 form a touch screen, also called “touch screen”.
  • the touch sensor 1280K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations can be provided through the display screen 1294 .
  • the touch sensor 1280K may also be disposed on the surface of the electronic device, which is different from the position of the display screen 1294 .
  • the bone conduction sensor 1280M can acquire vibration signals.
  • the bone conduction sensor 1280M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 1280M can also contact the human pulse and receive the blood pressure beating signal.
  • the bone conduction sensor 1280M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 1270 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 1280M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 1280M, so as to realize the heart rate detection function.
  • the keys 1290 include a power key, a volume key, and the like. Key 1290 may be a mechanical key. It can also be a touch button.
  • the electronic device can receive key input and generate key signal input related to user settings and function control of the electronic device.
  • the motor 1291 can generate a vibrating prompt.
  • the motor 1291 can be used for incoming call vibration prompts, and can also be used for touch vibration feedback.
  • touch operations applied to different applications may correspond to different vibration feedback effects.
  • the motor 1291 can also correspond to different vibration feedback effects for touch operations on different areas of the display screen 1294 .
  • Touch operations in different application scenarios (for example: time reminders, receiving information, alarm clocks, games, etc.) may also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 1292 can be an indicator light, and can be used to indicate charging status, power change, and can also be used to indicate messages, missed calls, notifications, and the like.
  • the SIM card interface 1295 is used for connecting a SIM card.
  • the SIM card can be connected and separated from the electronic device by being inserted into the SIM card interface 1295 or pulled out from the SIM card interface 1295 .
  • the electronic device can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 1295 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 1295 at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface 1295 is also compatible with different types of SIM cards.
  • the SIM card interface 1295 is also compatible with external memory cards.
  • the electronic device interacts with the network through the SIM card to realize functions such as calling and data communication.
  • the electronic device adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the electronic device and cannot be separated from the electronic device.
  • the software system of the electronic device may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the Android system with layered architecture is taken as an example to illustrate the software structure of the electronic device.
  • FIG. 13 is a software structural block diagram of an electronic device according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the Android system is divided into four layers: from top to bottom, the application program layer, the application program framework layer, the Android runtime (Android runtime) and system libraries, and the kernel layer.
  • the application layer can consist of a series of application packages.
  • the application package may include application programs such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include window manager, content provider, view system, phone manager, resource manager, notification manager, etc.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • Said data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebook, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on.
  • the view system can be used to build applications.
  • a display interface can consist of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of electronic devices. For example, the management of call status (including connected, hung up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and so on.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify the download completion, message reminder, etc.
  • the notification manager can also be a notification that appears on the top status bar of the system in the form of a chart or scroll bar text, such as a notification of an application running in the background, or a notification that appears on the screen in the form of a dialog window.
  • For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, or the indicator light flashes.
  • the Android Runtime includes core library and virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • a system library can include multiple function modules. For example: surface manager (surface manager), media library (Media Libraries), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of various commonly used audio and video formats, as well as still image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing, etc.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
  • When the touch sensor 1280K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer.
  • the kernel layer processes touch operations into original input events (including touch coordinates, time stamps of touch operations, and other information). Raw input events are stored at the kernel level.
  • the application framework layer obtains the original input event from the kernel layer and identifies the control corresponding to the input event. Take, as an example, the touch operation being a click operation on the control corresponding to the camera application icon.
  • the camera application calls the interface of the application framework layer to start the camera application, and then starts the camera driver by calling the kernel layer.
  • Camera 1293 captures still images or video.
  • An embodiment of the present application also provides an electronic device, including: a processor, configured to run a computer program stored in a memory, so as to implement one or more steps in any one of the above methods.
  • the embodiment of the present application also provides a computer-readable storage medium, the computer-readable storage medium stores instructions, and when it is run on a computer or a processor, the computer or the processor executes one of the above-mentioned methods or multiple steps.
  • the embodiment of the present application also provides a computer program product including instructions.
  • the computer program product runs on a computer or a processor, it causes the computer or the processor to perform one or more steps in any one of the above methods.
  • An embodiment of the present application also provides a chip system, the chip system includes a memory and a processor, and the processor executes a computer program stored in the memory to implement one or more steps in any one of the above methods.
  • all or part of them may be implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in or transmitted via a computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, DVD), or a semiconductor medium (for example, a solid state disk (solid state disk, SSD)), etc.
  • the processes can be completed by a computer program instructing related hardware; the program can be stored in a computer-readable storage medium, and when the program is executed, the processes of the foregoing method embodiments may be included.
  • the aforementioned storage medium includes: ROM or random access memory RAM, magnetic disk or optical disk, and other various media that can store program codes.
  • the disclosed devices and methods may be implemented in other ways.
  • the system embodiments described above are only illustrative.
  • the division of the modules or units is only a logical function division.
  • multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the procedures in the methods of the above embodiments of the present application can be completed by instructing related hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium.
  • the computer program When executed by a processor, the steps in the above-mentioned various method embodiments can be realized.
  • the computer program includes computer program code, and the computer program code may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable medium may at least include: any entity or device capable of carrying computer program codes to a terminal device, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random-access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media.
  • Examples of software distribution media include a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
  • In some jurisdictions, according to legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Emergency Management (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

This application is applicable to the field of terminal technology and provides a voice parsing method, an electronic device, a readable storage medium, and a chip system. The method includes: acquiring a voice instruction and information sent by a running application, where the voice instruction is used to instruct a terminal device to perform an operation, and the information sent by the application includes reminder information used to remind a user; and determining, according to the voice instruction and the reminder information, a user intent corresponding to the voice instruction. By acquiring the information sent by the application and using it as a factor in determining the user intent, the accuracy of determining the user intent corresponding to the voice instruction can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.

Description

Voice parsing method, electronic device, readable storage medium, and chip system
This application claims priority to Chinese Patent Application No. 202111453243.7, entitled "Voice parsing method, electronic device, readable storage medium, and chip system", filed with the China National Intellectual Property Administration on November 30, 2021, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technology, and in particular to a voice parsing method, an electronic device, a readable storage medium, and a chip system.
Background
With the continuous development of terminal devices, a terminal device can not only perform actions according to received user-triggered operations such as taps, but can also detect the user's speech through a voice assistant and perform actions according to it.
In the prior art, a terminal device can detect, through a voice assistant, a voice instruction issued by the user, parse the voice instruction in combination with the interface content displayed on the current interface of the terminal device, determine the user intent corresponding to the voice instruction, and then control the terminal device to perform an operation matching the user intent.
However, in some scenarios the terminal device cannot accurately understand the user intent and may trigger a wrong operation or repeatedly ask the user, resulting in low interaction efficiency between the terminal device and the user.
Summary
This application provides a voice parsing method, an electronic device, a readable storage medium, and a chip system, which solve the prior-art problem of low interaction efficiency between a terminal device and the user in certain scenarios.
To achieve the above objective, this application adopts the following technical solutions:
According to a first aspect, a voice parsing method is provided, including:
acquiring a voice instruction and information sent by a running application, where the voice instruction is used to instruct a terminal device to perform an operation, and the information sent by the application includes reminder information used to remind a user;
determining, according to the voice instruction and the reminder information, a user intent corresponding to the voice instruction.
By acquiring, when the voice instruction is acquired, the information sent by the running application and also using that information as a factor in determining the user intent, the accuracy of determining the user intent corresponding to the voice instruction can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.
In a first possible implementation of the first aspect, before the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction, the method further includes:
acquiring a first application list and a second application list, where the first application list is a list of the applications installed on the terminal device, and the second application list is a list of the applications currently running on the terminal device;
determining, according to the first application list and the second application list, an identifier of the application corresponding to the voice instruction and a running state of the application;
where the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction includes:
if the running state of the application is background running, determining the user intent corresponding to the voice instruction according to the reminder information, the voice instruction, and the identifier of the application;
if the running state of the application is foreground running, acquiring, according to the current interface of the application, interface information corresponding to the current interface, and determining the user intent corresponding to the voice instruction according to the voice instruction, the reminder information, and the interface information.
Determining the application corresponding to the voice instruction according to the first application list and the second application list, and thereby its running state, allows the user intent to be determined in different ways for different running states, which improves the flexibility of determining the user intent.
If the running state of the application is foreground running, the interface information of the application can further be acquired, so that the user intent corresponding to the voice instruction is determined from the voice instruction and the reminder information in combination with the acquired interface information, which improves the accuracy of determining the user intent.
Based on the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the acquiring, according to the current interface of the application, the interface information corresponding to the current interface includes:
extracting the current interface to obtain the interface content included in the current interface;
parsing the interface content to obtain the interface information corresponding to the application.
Based on any one of the foregoing possible implementations of the first aspect, in a third possible implementation of the first aspect, the acquiring a voice instruction and information sent by a running application includes:
acquiring the voice instruction at a first moment;
acquiring, according to the first moment, the information sent by each of the applications running within a preset time before the first moment.
By acquiring the voice instruction at the first moment and then acquiring the information sent by applications within the preset time before the first moment, the workload of acquiring the information sent by applications can be reduced, and the diversity and flexibility of acquiring the voice instruction and the information sent by applications can be improved.
Based on any one of the foregoing possible implementations of the first aspect other than the third, in a fourth possible implementation of the first aspect, the acquiring information sent by a running application includes:
acquiring, in real time, the information sent by the running application.
By acquiring the information sent by the application in real time, the user intent corresponding to the voice instruction can be determined promptly, in combination with the acquired information, when the voice instruction is acquired, which improves the efficiency of determining the user intent as well as the diversity and flexibility of acquiring the information sent by applications.
Based on any one of the foregoing possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the acquiring information sent by a running application includes:
acquiring, through a preset interface, audio data broadcast by the terminal device;
converting the audio data by using the automatic speech recognition (ASR) technique to obtain the information, in text form, sent by the application.
By acquiring the audio data sent by the application and converting the audio data to obtain the information in text form, the flexibility of acquiring the information sent by applications can be improved.
Based on any one of the foregoing possible implementations of the first aspect other than the fifth, in a sixth possible implementation of the first aspect, the acquiring information sent by a running application includes:
extracting, through a preset interface, text data sent by the application to obtain the information, in text form, sent by the application.
By acquiring the information sent by the application through extraction of text data, the efficiency and flexibility of acquiring the information sent by applications can be improved.
Based on any one of the foregoing possible implementations of the first aspect, in a seventh possible implementation of the first aspect, before the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction, the method further includes:
converting the voice instruction by using the ASR technique to obtain a text instruction in text form;
where the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction includes:
determining, according to the text instruction and the reminder information, the user intent corresponding to the voice instruction.
By converting the voice instruction into the text instruction and determining the user intent based on the text instruction in combination with the reminder information, the flexibility and diversity of determining the user intent can be improved.
Based on the seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, the converting the voice instruction by using the ASR technique to obtain a text instruction in text form includes:
denoising the voice instruction by using a speech enhancement technique to obtain a denoised voice instruction;
converting the denoised voice instruction by using the ASR technique to obtain the text instruction in text form.
By denoising the voice instruction and then converting the denoised voice instruction into the text instruction, the accuracy of the converted text instruction can be improved, and hence the accuracy of determining the user intent.
Based on any one of the foregoing possible implementations of the first aspect, in a ninth possible implementation of the first aspect, before the acquiring a voice instruction and information sent by a running application, the method further includes:
establishing, according to multiple types of sample data, multiple association relationships between different types of sample data, where the multiple types of sample data include sample reminder information, sample interface content, sample voice instructions, and sample user intents, and the multiple association relationships include: an association between the sample user intent and the sample reminder information, an association between the sample user intent and the sample voice instruction, and an association between the sample user intent and the sample interface content;
performing training according to the multiple association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
Based on the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction includes:
determining, through the fusion model and in combination with the voice instruction and the reminder information, the user intent corresponding to the voice instruction.
By parsing the acquired voice instruction and reminder information through the fusion model to obtain the user intent output by the fusion model that matches the voice instruction and the reminder information, the accuracy of determining the user intent can be improved.
Based on any one of the foregoing possible implementations of the first aspect, in an eleventh possible implementation of the first aspect, after the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction, the method further includes:
invoking, according to the user intent, an intent execution interface to perform an operation matching the user intent.
Based on any one of the foregoing possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, the method is applied in a multi-device scenario, the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device;
where the acquiring a voice instruction and information sent by a running application includes:
acquiring, by the first terminal device, a voice instruction and information sent by an application running on the first terminal device;
sending, by the first terminal device according to the voice instruction, an information request instruction to the second terminal device, where the information request instruction is used to instruct the second terminal device to acquire, and feed back to the first terminal device, information sent by an application running on the second terminal device;
receiving, by the first terminal device, the information, fed back by the second terminal device, sent by the running application.
In the multi-device scenario, any terminal device that collects a voice instruction can control the other devices in the scenario according to the voice instruction, which improves the flexibility of controlling terminal devices by voice instructions.
According to a second aspect, a voice parsing apparatus is provided, including:
a first acquiring module, configured to acquire a voice instruction and information sent by a running application, where the voice instruction is used to instruct a terminal device to perform an operation, and the information sent by the application includes reminder information used to remind a user;
a first determining module, configured to determine, according to the voice instruction and the reminder information, a user intent corresponding to the voice instruction.
In a first possible implementation of the second aspect, the apparatus further includes:
a second acquiring module, configured to acquire a first application list and a second application list, where the first application list is a list of the applications installed on the terminal device, and the second application list is a list of the applications currently running on the terminal device;
a second determining module, configured to determine, according to the first application list and the second application list, an identifier of the application corresponding to the voice instruction and a running state of the application;
where the first determining module is specifically configured to: if the running state of the application is background running, determine the user intent corresponding to the voice instruction according to the reminder information, the voice instruction, and the identifier of the application; and if the running state of the application is foreground running, acquire, according to the current interface of the application, interface information corresponding to the current interface, and determine the user intent corresponding to the voice instruction according to the voice instruction, the reminder information, and the interface information.
Based on the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the first determining module is further specifically configured to extract the current interface to obtain the interface content included in the current interface, and parse the interface content to obtain the interface information corresponding to the application.
Based on any one of the foregoing possible implementations of the second aspect, in a third possible implementation of the second aspect, the first acquiring module is specifically configured to acquire the voice instruction at a first moment, and acquire, according to the first moment, the information sent by each of the applications running within a preset time before the first moment.
Based on any one of the foregoing possible implementations of the second aspect other than the third, in a fourth possible implementation of the second aspect, the first acquiring module is specifically configured to acquire, in real time, the information sent by the running application.
Based on any one of the foregoing possible implementations of the second aspect, in a fifth possible implementation of the second aspect, the first acquiring module is further specifically configured to acquire, through a preset interface, audio data broadcast by the terminal device, and convert the audio data by using the automatic speech recognition (ASR) technique to obtain the information, in text form, sent by the application.
Based on any one of the foregoing possible implementations of the second aspect other than the fifth, in a sixth possible implementation of the second aspect, the first acquiring module is further specifically configured to extract, through a preset interface, text data sent by the application to obtain the information, in text form, sent by the application.
Based on any one of the foregoing possible implementations of the second aspect, in a seventh possible implementation of the second aspect, the apparatus further includes:
a conversion module, configured to convert the voice instruction by using the ASR technique to obtain a text instruction in text form;
where the first determining module is further specifically configured to determine, according to the text instruction and the reminder information, the user intent corresponding to the voice instruction.
Based on the seventh possible implementation of the second aspect, in an eighth possible implementation of the second aspect, the conversion module is specifically configured to denoise the voice instruction by using a speech enhancement technique to obtain a denoised voice instruction, and convert the denoised voice instruction by using the ASR technique to obtain the text instruction in text form.
Based on any one of the foregoing possible implementations of the second aspect, in a ninth possible implementation of the second aspect, the apparatus further includes:
an establishing module, configured to establish, according to multiple types of sample data, multiple association relationships between different types of sample data, where the multiple types of sample data include sample reminder information, sample interface content, sample voice instructions, and sample user intents, and the multiple association relationships include: an association between the sample user intent and the sample reminder information, an association between the sample user intent and the sample voice instruction, and an association between the sample user intent and the sample interface content;
a training module, configured to perform training according to the multiple association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
Based on the ninth possible implementation of the second aspect, in a tenth possible implementation of the second aspect, the first determining module is further specifically configured to determine, through the fusion model and in combination with the voice instruction and the reminder information, the user intent corresponding to the voice instruction.
Based on any one of the foregoing possible implementations of the second aspect, in an eleventh possible implementation of the second aspect, the apparatus further includes:
an execution module, configured to invoke, according to the user intent, an intent execution interface to perform an operation matching the user intent.
Based on any one of the foregoing possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the apparatus is applied in a multi-device scenario, the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device;
where the first acquiring module is further specifically configured to: acquire, by the first terminal device, a voice instruction and information sent by an application running on the first terminal device; send, according to the voice instruction, an information request instruction to the second terminal device; and then receive the information, fed back by the second terminal device, sent by the running application, where the information request instruction is used to instruct the second terminal device to acquire, and feed back to the first terminal device, information sent by an application running on the second terminal device.
According to a third aspect, an electronic device is provided, including a processor, where the processor is configured to run a computer program stored in a memory to implement the voice parsing method according to any one of the implementations of the first aspect.
According to a fourth aspect, a computer-readable storage medium is provided, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the voice parsing method according to any one of the implementations of the first aspect.
According to a fifth aspect, a chip system is provided, where the chip system includes a memory and a processor, and the processor executes a computer program stored in the memory to implement the voice parsing method according to any one of the implementations of the first aspect.
It can be understood that, for the beneficial effects of the second to fifth aspects, reference may be made to the related description of the first aspect, and details are not repeated here.
Brief Description of Drawings
FIG. 1A is a schematic diagram of an interface of a shopping application according to an embodiment of this application;
FIG. 1B is a schematic diagram of a system settings interface according to an embodiment of this application;
FIG. 1C is a schematic diagram of an interface of a map application according to an embodiment of this application;
FIG. 2 is a schematic diagram of a voice parsing scenario involved in a voice parsing method according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a voice parsing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a software-architecture-based process of acquiring reminder information according to an embodiment of this application;
FIG. 5 is a schematic diagram of another software-architecture-based process of acquiring reminder information according to an embodiment of this application;
FIG. 6 is a schematic flowchart of voice parsing performed by multiple devices according to an embodiment of this application;
FIG. 7 is a structural block diagram of a voice parsing apparatus according to an embodiment of this application;
FIG. 8 is a structural block diagram of another voice parsing apparatus according to an embodiment of this application;
FIG. 9 is a structural block diagram of another voice parsing apparatus according to an embodiment of this application;
FIG. 10 is a structural block diagram of another voice parsing apparatus according to an embodiment of this application;
FIG. 11 is a structural block diagram of another voice parsing apparatus according to an embodiment of this application;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 13 is a software structure block diagram of an electronic device according to an embodiment of this application.
具体实施方式
以下描述中,为了说明而不是为了限定,提出了诸如特定***结构、技术之类的具体细节,以便透彻理解本申请实施例。然而,本领域的技术人员应当清楚,在没有这些具体细节的其它实施例中也可以实现本申请。在其它情况中,省略对众所周知的模型训练方法、语音解析方法、界面解析方法和电子设备的详细说明,以免不必要的细节妨碍本申请的描述。
以下实施例中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本申请的限制。如在本申请的说明书和所附权利要求书中所使用的那样,单数表达形式“一个”、“所述”、“上述”和“该”旨在也包括例如“一个或多个”这种表达形式,除非其上下文中明确地有相反指示。
First, a scenario in which a terminal device performs voice interaction with a user is introduced.
During operation, the terminal device can enable its voice assistant, collect through the voice assistant a voice instruction issued by the user, parse the voice instruction to obtain the information it contains, and determine the user intent corresponding to the voice instruction. For example, through the voice assistant the terminal device can turn on or off an air conditioner, a speaker, or a lamp.
If the information included in the voice instruction is incomplete and the voice assistant cannot understand the user's intent, the voice assistant can determine the user intent corresponding to the voice instruction in combination with the interface content displayed on the current interface. For example, referring to FIG. 1A, if the current interface of the terminal device is the interface of a shopping application, the voice assistant can perform a purchasing operation according to the voice instruction; referring to FIG. 1B, if the current interface of the terminal device is a system settings interface, the voice assistant can perform an operation of enabling Bluetooth according to the voice instruction.
Further, if the voice assistant still cannot determine the user intent after combining the interface content displayed on the current interface, the voice assistant no longer controls the terminal device to perform the operation corresponding to the voice instruction, or the voice assistant needs to ask the user to obtain more information, so as to determine the user intent corresponding to the voice instruction according to the content of the user's answer.
For example, referring to FIG. 1C, while the terminal device has a map application open for route navigation, the map application reminds the user that "a faster route has been found, and the destination can be reached 5 minutes earlier". If the voice assistant detects the voice instruction "switch route" issued by the user but cannot determine the corresponding user intent from the voice instruction alone, the voice assistant no longer controls the terminal device to perform the route-switching operation on the map application, or the voice assistant needs to ask the user again.
Alternatively, in a multi-device scenario, two induction cookers are both in working state, and one of them reminds the user that "high heat has been on for 20 minutes; it is recommended to lower the temperature and simmer on low heat". If the voice assistant detects that the voice instruction issued by the user is "set the temperature to 200 degrees", the voice assistant may control both induction cookers to adjust to 200 degrees, causing the devices to perform a wrong operation.
Therefore, this application proposes a voice parsing method in which the voice assistant acquires the various kinds of information sent by applications and, according to the reminder information among that information, based on the voice instruction issued by the user and in combination with the interface content displayed on the current interface, can accurately determine the user intent corresponding to the voice instruction and control the terminal device to perform an operation matching the user intent, improving the efficiency of the interaction between the user and the voice assistant.
Referring to FIG. 2, FIG. 2 is a schematic diagram of a voice parsing scenario involved in a voice parsing method according to an embodiment of this application. The voice parsing scenario may include at least one terminal device 210, where the terminal devices 210 may be located in the same network, and each terminal device may be one of multiple types of devices such as a smart TV, a smart speaker, a router, or a projector; the embodiments of this application do not limit the type of terminal device.
For each terminal device 210 of the at least one terminal device 210, the terminal device 210 can enable the voice assistant during operation, acquire through the voice assistant the voice instruction issued by the user and the various information sent by the currently running applications, and then determine the user intent corresponding to the voice instruction according to the voice instruction and the reminder information among the various information.
Specifically, when the voice assistant detects that the user issues a voice instruction, the voice assistant can parse the voice instruction and at the same time acquire the information sent by applications within a preset time, and, according to the reminder information among the sent information and in combination with the interface content displayed on the current interface of the terminal device, output the user intent corresponding to the voice instruction through a pre-trained fusion model, so that the voice assistant can control the terminal device 210 to perform an operation matching the user intent.
For example, corresponding to the map navigation scenario above, after detecting the user's voice instruction "switch route", the voice assistant can further acquire the information sent by the map application and, according to the reminder information "a faster route has been found" among the various information and in combination with the interface currently displayed by the map application, determine that the user intent corresponding to the voice instruction is to switch the navigation route to the faster route that was found. Accordingly, the voice assistant can control the terminal device to switch the navigation route in the map application.
Similarly, corresponding to the multi-device scenario above, after detecting the user's voice instruction "set the temperature to 200 degrees", the voice assistant can acquire the information sent by each induction cooker and, according to the reminder information "it is recommended to lower the temperature" sent by one of them, determine that the user intent corresponding to the voice instruction is to set the temperature of the induction cooker that sent the reminder to 200 degrees; the voice assistant can then control that induction cooker to lower its temperature to 200 degrees.
It should be noted that the above describes a scenario in which the voice assistant first detects the user's voice instruction and then determines the user intent according to the reminder information among the acquired information. In practical applications, the voice assistant may also acquire in real time the various information, including reminder information, sent by applications, and then determine the user intent in combination with the reminder information when the voice instruction is detected. The embodiments of this application do not limit the order of acquiring the reminder information and acquiring the voice instruction.
Moreover, the voice instruction involved in the embodiments of this application may be an instruction used to instruct the terminal device to perform an operation; for example, the voice instruction instructs the terminal device to purchase goods, enable or disable a function, or adjust the state of the terminal device. The embodiments of this application do not limit the operation the voice instruction instructs the terminal device to perform.
In addition, an application can send multiple types of information during operation, which may include reminder information used to remind the user. Correspondingly, the voice assistant can acquire the information sent by the application in multiple ways: it can intercept the information through a preset interface, receive information actively sent to it by the application, or acquire the information in other ways. The embodiments of this application limit neither the type of information sent by applications nor the way it is acquired.
Furthermore, the multi-device voice parsing scenario above is given for ease of description. In practical applications, the voice parsing method can be applied in many different voice parsing scenarios, such as controlling smart home devices through a voice assistant, controlling in-vehicle devices through a voice assistant, and other scenarios of controlling terminal devices through a voice assistant; the embodiments of this application do not limit the scenario.
FIG. 3 is a schematic flowchart of a voice parsing method according to an embodiment of this application. By way of example and not limitation, the method can be applied to the foregoing terminal device. Referring to FIG. 3, the method includes:
Step 301: Perform training according to multiple types of sample data to obtain a fusion model.
The fusion model may be a single model or a model group composed of multiple models, where the models in the group can work together; the embodiments of this application do not limit the number of models in the fusion model. For example, if the fusion model is a model group, then in the process of determining the user intent through the fusion model, the first model in the group can determine, according to the received voice instruction, reminder information, interface content, and other information, the domain to which the voice instruction belongs; then the voice instruction, reminder information, interface content, and other information can be input into the model in the group corresponding to that domain to obtain the user intent output by that model.
The terminal device can interact with the user by voice through the voice assistant; the voice assistant can parse the user's voice instruction through the pre-trained fusion model to determine the user intent corresponding to the voice instruction, and thereby control the terminal device to perform an operation matching the user intent. Therefore, before the terminal device interacts with the user by voice, the terminal device can perform training according to multiple types of sample data to obtain the fusion model.
The multiple types of sample data may include sample reminder information, sample interface content, sample voice instructions, and sample user intents. The sample reminder information may be information with which an application reminds the user; the sample interface content may be the content corresponding to each interface displayed by the terminal device when an application is open; the sample voice instructions may be sounds uttered by the user to the terminal device; and the sample user intents may be the different operations performed by the terminal device according to voice instructions, interface content, and reminder information.
In an optional implementation, the terminal device can establish, according to the multiple types of sample data, association relationships between different types of sample data. The sample reminder information, sample interface content, sample voice instructions, and other sample data in the multiple association relationships are then input into a preset initial model to obtain an initial user intent output by the initial model. After that, the terminal device can adjust the initial model according to the initial user intent output by the initial model, in combination with the sample user intent that, in the multiple association relationships, corresponds to the sample data input into the initial model, and repeat this process until the trained model can fairly accurately output the sample user intent corresponding to sample data such as the sample reminder information, sample interface content, and sample voice instruction.
Specifically, the terminal device can first establish, according to each type of sample data and in combination with detected user-triggered operations, association relationships between the different types of sample data; that is, the terminal device can associate any sample user intent with a particular piece of sample data of another type.
In establishing the association relationships, for each sample user intent the terminal device can establish associations between that sample user intent and the other three types of sample data. That is, for each sample user intent, an association can be established between the sample user intent and at least one piece of sample reminder information, between the sample user intent and at least one sample voice instruction, and between the sample user intent and at least one piece of sample interface content, thereby obtaining the association relationships among the multiple types of sample data.
For example, the terminal device can associate the sample user intent "switch to the faster route" with the sample reminder information "a faster route has been found, and the destination can be reached 5 minutes earlier", with the sample voice instruction "switch route", and with the interface of the map application in the sample interface content.
After the terminal device has established the multiple association relationships from part of the sample data (for example, 80%), the terminal device can use the association relationships as label data, input a large amount of sample data (such as sample reminder information, sample interface content, and sample voice instructions) into the preset initial model to obtain the initial user intent output by the initial model, and then, according to the association relationships serving as label data, compare the difference between the sample user intent corresponding to the sample data and the initial user intent, thereby adjusting the parameters of the initial model. After the initial model has been trained multiple times in this way so that it can fairly accurately output the sample user intents corresponding to the multiple types of sample data, the training of the initial model is completed and the fusion model is obtained.
In addition, after training the fusion model, the terminal device can use the remaining sample data (for example, 20%) to test the fusion model, determine the similarity between the initial user intent output by the fusion model and the sample user intent, and thereby determine, according to the obtained similarity, whether the fusion model needs further training.
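By way of illustration, the following is a minimal sketch of the training procedure described above, in which triples of (sample reminder information, sample interface content, sample voice instruction) associated with sample user intents are used to fit a model. The bag-of-words encoding, the network shape, and the example labels are assumptions made for the sketch; this application does not prescribe a particular model architecture or feature representation.

```python
# Minimal training sketch (assumptions: bag-of-words features and a tiny classifier).
import torch
import torch.nn as nn

# Each sample pairs (reminder text, interface content, voice instruction) with an intent label.
samples = [
    ("faster route found", "map navigation screen", "switch route", "switch_to_faster_route"),
    ("high heat for 20 minutes", "cooker status screen", "set temperature to 200", "adjust_cooker_temp"),
]
intents = sorted({s[3] for s in samples})
vocab = sorted({w for s in samples for field in s[:3] for w in field.split()})

def encode(reminder, interface, instruction):
    # Concatenate the three text fields into one bag-of-words vector.
    vec = torch.zeros(len(vocab))
    for w in f"{reminder} {interface} {instruction}".split():
        if w in vocab:
            vec[vocab.index(w)] += 1.0
    return vec

model = nn.Sequential(nn.Linear(len(vocab), 32), nn.ReLU(), nn.Linear(32, len(intents)))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(50):  # iterate until the model reproduces the labeled intents
    for reminder, interface, instruction, intent in samples:
        logits = model(encode(reminder, interface, instruction))
        target = torch.tensor(intents.index(intent))
        loss = loss_fn(logits.unsqueeze(0), target.unsqueeze(0))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Splitting `samples` roughly 80/20 into a training portion and a held-out portion, and measuring how often the predicted intent matches the labeled intent on the held-out portion, would mirror the testing step described above.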
It should be noted that, in practical applications, the fusion model can be trained not only by the terminal device from a large amount of sample data but also by a server; the embodiments of this application do not limit the entity that performs the training.
Moreover, the large amount of sample data may be collected manually, or of course in other ways; the embodiments of this application do not limit the way sample data is collected either. For example, the reminder information used by applications to remind the user can be captured through manual operation.
In addition, after the fusion model is trained, the terminal device can incorporate the fusion model into the preset voice assistant so that the voice assistant can determine the user intent through the fusion model; the terminal device can also set up an interface corresponding to the voice assistant so that the voice assistant can invoke the fusion model through that interface. The embodiments of this application do not limit the way the voice assistant and the fusion model work together.
Step 302: Acquire the information sent by the currently running applications.
An application can send multiple types of information during operation, and the multiple types of information sent by the application may include reminder information, which is information used to remind the user. The reminder information may be obtained by converting speech broadcast by the application, or from the application's text information; the embodiments of this application do not limit the reminder information.
The terminal device can launch different applications during operation, and each application can send multiple kinds of information including reminder information. The voice assistant can acquire the information sent by the applications so that, in subsequent steps, it can determine the user intent corresponding to the voice instruction according to the reminder information among the information, in combination with the voice instruction issued by the user.
Correspondingly, the fusion model trained in step 301 is trained from a large amount of sample reminder information, and the sample reminder information used for training is collected manually. Therefore, among the various application information acquired by the voice assistant, the fusion model can recognize the reminder information and, in subsequent steps, output the user intent according to it. The fusion model does not recognize information other than reminder information; if the voice assistant inputs such other information into the fusion model, the fusion model cannot output a user intent and instead outputs an exception (for example, the fusion model may output empty or something else).
In an optional embodiment, the terminal device can enable the voice assistant after startup; the voice assistant can monitor each running application and acquire the multiple kinds of information sent by the applications (for example, by intercepting them). Correspondingly, during operation an application can, according to the constantly changing scenario in which the terminal device is located, remind the user by voice broadcast of reminder information corresponding to the changed scenario. When the application broadcasts reminder information by voice, the voice assistant can acquire the audio data of that reminder information through a preset interface and then convert the audio data to text by using the automatic speech recognition (ASR) technique to obtain the application's reminder information.
For example, referring to FIG. 4, in a scenario where a map application is navigating, the map application at the APP layer can, according to the received traffic information and the current location of the terminal device, broadcast the audio data "a faster route has been found" through the media player at the system layer. The voice assistant can acquire that audio data from the media player through a preset interface and convert it to obtain the corresponding text "a faster route has been found", that is, the reminder information of the map application.
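The FIG. 4 flow can be pictured with the following sketch. `MediaPlayerHook` and `asr_transcribe` are hypothetical placeholders standing in for the preset interface to the system-layer media player and the ASR engine; neither name comes from this application.

```python
# Sketch of intercepting broadcast audio and converting it to a text reminder.
import time

class MediaPlayerHook:
    """Hypothetical hook on the system media player that yields broadcast audio buffers."""
    def poll_audio(self) -> bytes:
        return b""  # raw PCM bytes of the app's spoken reminder, if one is playing

def asr_transcribe(pcm_bytes: bytes) -> str:
    """Placeholder for an automatic speech recognition call."""
    return "a faster route has been found"  # text recovered from the broadcast

reminders = []  # (timestamp, app, text) records kept for later intent parsing
hook = MediaPlayerHook()
audio = hook.poll_audio()
if audio:  # only record when the application actually broadcast something
    reminders.append((time.time(), "map_app", asr_transcribe(audio)))
```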
In another optional embodiment, similar to the acquisition of voice-broadcast reminder information above, if the application has not enabled voice broadcasting or does not have a voice broadcasting capability and instead reminds the user by displaying text information, the voice assistant can, when the application issues a reminder, acquire the application's reminder information in text form through a preset interface.
It should be noted that, when the application broadcasts audio data, the voice assistant can also acquire the reminder information in other ways. For example, the voice assistant can acquire, through a preset interface, the text information in the application that corresponds to the broadcast audio data and use the acquired text information as the application's reminder information.
In addition, when the application detects that the user needs to be reminded, the application can also actively send the audio data or text information to be broadcast to the voice assistant, in which case the voice assistant does not need to acquire the audio data or text information again. That is, referring to FIG. 5, the application can complete the action of sending reminder information to the voice assistant at the APP layer, and the voice assistant can acquire the reminder information without going through the media player at the system layer.
Moreover, after sending the audio data or text information to the voice assistant, the application can continue to broadcast the audio data or display the text information to the user, or the audio data or text information can be broadcast through the voice assistant; the embodiments of this application do not limit the way the reminder information is broadcast.
It should also be noted that the terminal device can pre-train, from a large amount of manually collected sample reminder information in combination with other application information, a recognition model for recognizing reminder information. The terminal device can then use this recognition model to recognize the various information acquired by the voice assistant and obtain the reminder information. Of course, in practical applications the voice assistant can also determine the reminder information sent by applications in other ways; the embodiments of this application do not limit the way reminder information is acquired.
Step 303: Acquire the voice instruction issued by the user, and convert the voice instruction to obtain a text instruction.
The voice instruction is an instruction used to instruct the terminal device to perform an operation.
After an application of the terminal device sends reminder information, the user may issue a voice instruction according to the reminder information, instructing the terminal device or the application to perform a related operation. Correspondingly, the terminal device can collect the user's voice instruction through the voice assistant and convert it to obtain a text instruction.
Moreover, after startup the terminal device can first enable the preset voice assistant so that voice instructions issued by the user can be collected continuously through the voice assistant, and the terminal device can be controlled to perform the corresponding operations according to the voice instructions, improving the efficiency of voice interaction between the terminal device and the user.
Specifically, after the terminal device enables the voice assistant, the voice assistant can invoke the terminal device's microphone and collect audio data through the microphone to obtain the voice instruction issued by the user. The voice assistant can then use a speech enhancement technique to filter the noise in the voice instruction to obtain a filtered voice instruction, that is, the sound uttered by the user. The voice assistant can use ASR to convert the filtered voice instruction to text to obtain the text instruction.
Further, in the course of collecting voice instructions, the voice assistant can collect them by means of a wake word. The voice assistant can collect audio data continuously through the microphone, and when the collected audio data includes the wake word, the voice assistant can take the audio data within a preset time after the wake word as the voice instruction.
For example, if the wake word is "Xiao E" and the preset time is 10 seconds, and the audio data collected by the voice assistant is "Xiao E, switch route", then after detecting the wake word "Xiao E" the voice assistant can take the audio data "switch route" within 10 seconds after the wake word as the voice instruction, filter it through speech enhancement, and convert it to text by the ASR technique to obtain the text instruction "switch route".
In addition, on the basis of the wake word, the voice assistant can also acquire the voice instruction in combination with an end word. After detecting the wake word, the voice assistant can take all audio data after the wake word as the voice instruction until the voice assistant detects the end word.
For example, if the wake word is "Xiao E" and the end word is "goodbye", and the audio data collected by the voice assistant is "Xiao E, switch route, goodbye", then after detecting the wake word "Xiao E" the voice assistant can take the subsequently collected audio data "switch route" as the voice instruction, and when the voice assistant detects the end word "goodbye", it stops taking the collected audio data as the voice instruction.
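A minimal sketch of the wake-word and end-word segmentation described in these examples follows. It assumes the audio has already been transcribed into a word stream, and uses "xiao_e" and "goodbye" as stand-ins for the wake word "Xiao E" and the end word.

```python
# Sketch of wake-word / end-word segmentation over a transcribed word stream.
from typing import Iterable, List

def extract_command(words: Iterable[str], wake: str = "xiao_e", end: str = "goodbye") -> List[str]:
    command, capturing = [], False
    for w in words:
        if w == wake:
            capturing = True        # everything after the wake word is the instruction...
            continue
        if capturing and w == end:  # ...until the end word is heard
            break
        if capturing:
            command.append(w)
    return command

# e.g. extract_command(["xiao_e", "switch", "route", "goodbye"]) -> ["switch", "route"]
```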
It should be noted that the embodiments of this application are described with step 302 performed before step 303. In practical applications, the terminal device may also perform step 303 first and then step 302, meaning that the voice assistant first collects the voice instruction issued by the user and then acquires the information sent by applications within a preset time, so as to determine the user intent corresponding to the voice instruction according to the reminder information sent by the applications in combination with the voice instruction.
Specifically, in this case the voice assistant does not acquire the information sent by applications in real time; instead, it first collects the voice instruction issued by the user and, according to the moment at which the voice instruction was collected, selects the information sent by applications within the preset time before that moment, thereby obtaining the reminder information sent by the applications.
For example, if the preset time is 2 minutes and the terminal device's voice assistant collects a voice instruction issued by the user at 11:10, the terminal device can traverse the applications running from 11:08 to 11:10 and acquire the information sent by each application running between 11:08 and 11:10.
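The selection of information within the preset time can be sketched as a simple timestamp filter over the reminder records collected earlier; the two-minute window and the record layout are illustrative assumptions.

```python
# Sketch of selecting application messages within the preset time before the instruction.
PRESET_WINDOW_S = 120  # e.g. the 2-minute preset time from the example above

def recent_reminders(reminders, instruction_time):
    # `reminders` holds (timestamp, app, text) records, as in the earlier sketch.
    return [r for r in reminders
            if instruction_time - PRESET_WINDOW_S <= r[0] <= instruction_time]
```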
Of course, the terminal device may also perform only step 303 and not step 302; the embodiments of this application limit neither the order of steps 302 and 303 nor whether the terminal device performs step 302.
Specifically, even when no application has reminded the user, the user can issue an instruction according to the current scenario, and the terminal device can acquire the voice instruction issued by the user. For example, in a scenario where a map application is navigating, the map application has not issued the reminder "a shorter route has been found", but when the user notices a traffic jam ahead or currently needs to change the destination, the user can issue the voice instruction "switch route" or "change destination" to the voice assistant.
In addition, the voice assistant can also continuously acquire, through the terminal device or the applications, information corresponding to the current scenario so as to continuously determine the environment the user is currently in. When the voice assistant detects that the user's current environment does not match the state of the terminal device or of an application, the voice assistant can remind the user and then determine, according to the voice instruction fed back by the user, whether the state of the terminal device or the application needs to be adjusted.
For example, in a scenario where a map application is navigating, if the map application detects a traffic jam on the planned route, the map application can send that traffic jam information to the voice assistant as reminder information. The voice assistant can ask the user "There is a traffic jam on road section A; switch the navigation route?"; if the voice assistant detects that the user answers "switch route", the voice assistant can instruct the map application to switch the navigation route.
Step 304: Determine, according to the text instruction corresponding to the voice instruction, the application corresponding to the text instruction.
After obtaining the text instruction, the terminal device can look up the application corresponding to the text instruction. Since an application running in the foreground of the terminal device can send reminder information and an application running in the background may also send reminder information, the terminal device can first determine, according to the text instruction corresponding to the voice instruction, the application corresponding to the text instruction, so that in subsequent steps the terminal device can acquire the interface of the determined application, or determine the user intent according to the text instruction and the reminder information.
In an optional embodiment, the terminal device can acquire a first application list and a second application list, where the first application list is a list of the applications installed on the terminal device and the second application list is a list of the applications currently running on the terminal device, including applications running in the foreground and applications running in the background.
The terminal device can then, according to the first application list and the second application list and in combination with the text instruction corresponding to the voice instruction, look up the application associated with the text instruction in the first application list and the second application list.
The terminal device can look up, in the first application list and the second application list according to the text instruction corresponding to the voice instruction, the application matching the text instruction, and thereby determine the running state corresponding to the matched application, where the running state indicates whether the terminal device is running the application and whether the terminal device is running the application in the foreground or in the background.
If the running state indicates that the application corresponding to the text instruction is an application running in the foreground of the terminal device, the terminal device can perform step 305 to acquire the application's interface information. If the running state indicates that the application corresponding to the text instruction is an application running in the background of the terminal device, the terminal device can acquire the application's identifier, skip step 305, and perform step 306 to determine, through the fusion model and in combination with the determined application identifier, the user intent corresponding to the voice instruction.
For example, if the text instruction is "change the destination to location A", the application related to the text instruction may be an application with a navigation function; if the text instruction is "lower the volume to 20", the application related to the text instruction may be an application capable of playing audio and video; if the text instruction is "open application B", the application related to the text instruction may be the application whose name is identical or similar to the application name in the text instruction.
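The lookup in step 304 might be sketched as follows. The keyword matching and the example lists are illustrative assumptions, since this application does not specify how the text instruction is matched against the two application lists.

```python
# Sketch of step 304: match the text instruction to an application and its running state.
installed = {"map_app": ["navigate", "route", "destination"], "music_app": ["volume", "play"]}
running = {"map_app": "foreground", "music_app": "background"}

def match_app(text_instruction: str):
    for app, keywords in installed.items():
        if any(k in text_instruction for k in keywords):
            return app, running.get(app)  # state is None if installed but not running
    return None, None

app, state = match_app("switch route")
if state == "foreground":
    pass  # step 305: additionally gather the current interface information
elif state == "background":
    pass  # skip step 305; pass the app identifier to the fusion model in step 306
```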
Step 305: If the application corresponding to the text instruction is an application running in the foreground of the terminal device, acquire interface information according to the interface content of the terminal device's current interface.
The current interface is the interface currently displayed by the terminal device. The interface content may include the elements displayed in the current interface; for example, the interface content may include elements such as text, images, and controls, and may also include information such as the positions of the text, the positions of the images, and the categories and positions of the controls.
The interface information is information determined from the interface content and is used to represent the scenario the terminal device is currently in, as well as the actions associated with that scenario (such as the action the terminal device is currently performing and/or the actions the terminal device may perform).
For example, if the current interface is the interface of a map application, the interface information corresponding to the current interface may include: the scenario the terminal device is currently in is a map navigation scenario; the action the terminal device is currently performing is navigating to a destination; and the actions the terminal device may perform include switching the route, changing the destination, stopping navigation, and so on.
In an optional embodiment, the terminal device can acquire the terminal device's current interface, that is, the interface corresponding to the application, through a system service, recognize the elements in the interface content, and then parse the recognized elements to obtain the application's interface information.
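A sketch of step 305 follows. It assumes the current interface is available as a simple element tree, whereas a real implementation would obtain the interface through the system service mentioned above.

```python
# Sketch of step 305: extract interface content and parse it into interface information.
current_interface = {
    "app": "map_app",
    "elements": [
        {"type": "text", "value": "A faster route has been found", "pos": (40, 120)},
        {"type": "button", "value": "Switch", "pos": (40, 200)},
    ],
}

def parse_interface(ui):
    texts = [e["value"] for e in ui["elements"] if e["type"] == "text"]
    actions = [e["value"] for e in ui["elements"] if e["type"] == "button"]
    # Scene plus current/possible actions, as described for the map example above.
    return {"scene": ui["app"], "displayed_text": texts, "possible_actions": actions}
```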
Step 306: Determine, according to the information sent by the application and the text instruction corresponding to the voice instruction, the user intent corresponding to the voice instruction.
The user intent can be output by the fusion model trained in step 301. For example, if the terminal device has not obtained the application's interface information, the terminal device's voice assistant can input the information sent by the application, the text instruction corresponding to the voice instruction, and the application's identifier into the fusion model to obtain the user intent output by the fusion model; if the terminal device has obtained the application's interface information, the voice assistant can input the information sent by the application, the text instruction corresponding to the voice instruction, and the application's interface information into the fusion model to obtain the user intent output by the fusion model.
In an optional embodiment, if the application corresponding to the voice instruction is a foreground application, then after obtaining the application's interface information the terminal device can input the text instruction, the interface information, and the acquired information sent by the application into the fusion model simultaneously, parse them through the fusion model, and output the operation matching the text instruction, the interface information, and the information sent by the application, that is, the user intent corresponding to the voice instruction, so that the terminal device can perform the matching operation according to that user intent in subsequent steps.
For example, if the fusion model consists of a model group, the terminal device can first input the various information into the first model of the group, which analyzes the information and outputs the domain to which the voice instruction belongs; the terminal device then determines, from the multiple models of the group, the model matching the domain of the voice instruction, inputs the text instruction, the interface information, the information sent by the application, and other information into that model, and parses the information through that model to obtain the user intent corresponding to the voice instruction.
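The model-group variant of step 306 can be sketched as a two-stage lookup: a first model routes the inputs to a domain, and the domain's model outputs the intent. The rule-based stand-ins below are assumptions for illustration only; in the application both stages are trained models.

```python
# Sketch of the model-group variant of step 306 (two-stage inference).
def domain_model(text_instruction, reminder, interface_info):
    return "navigation"  # first model: classify the instruction's domain

def navigation_model(text_instruction, reminder, interface_info):
    if "faster route" in reminder and "route" in text_instruction:
        return "switch_to_faster_route"
    return None  # exception case: no intent recognized

DOMAIN_MODELS = {"navigation": navigation_model}

def infer_intent(text_instruction, reminder, interface_info):
    domain = domain_model(text_instruction, reminder, interface_info)
    model = DOMAIN_MODELS[domain]  # pick the domain-specific model from the group
    return model(text_instruction, reminder, interface_info)
```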
In another optional embodiment, if the application corresponding to the voice instruction is a background application, then after completing step 304 the terminal device skips step 305 and inputs the text instruction, the information sent by the application, and the application identifier acquired in step 304 into the fusion model to obtain the user intent output by the fusion model.
It should be noted that, when the terminal device inputs the information sent by applications into the fusion model: if that information was acquired by performing step 302 before step 303, the terminal device can select, according to the moment at which the voice instruction was acquired, the information sent by applications within the preset time and input it into the fusion model; if that information was acquired by performing step 303 before step 302, the terminal device can input all the acquired information sent by applications within the preset time into the fusion model.
Of course, the terminal device can also select the information input into the fusion model in other ways, which is not limited in the embodiments of this application.
Step 307: Perform the operation matching the user intent.
After determining the user intent corresponding to the voice instruction, the terminal device can invoke a preset intent execution interface, control the terminal device to perform the operation matching the user intent, respond to the voice instruction issued by the user, and complete the voice interaction between the terminal device and the user, thereby reducing the procedures and steps required for the voice interaction between the terminal device and the user.
It should be noted that, in practical applications, step 301 is optional: the terminal device may perform step 301 only once, that is, once the fusion model has been trained, there is no need to perform step 301 again each subsequent time the user intent corresponding to a voice instruction is determined. Step 305 is also optional: if the terminal device determines in step 304 that the application corresponding to the voice instruction is running in the foreground, the terminal device can perform step 305; if the terminal device determines in step 304 that the application is running in the background, the terminal device can skip step 305 and perform step 306.
In addition, the above embodiments are described with a single terminal device as an example; in practical applications, the voice parsing method can also be applied in multi-device scenarios. For example, it can be applied in a scenario of controlling different smart home devices through a terminal device, in a scenario of controlling in-vehicle devices through a terminal device, and in other scenarios including multiple devices, which is not limited in the embodiments of this application.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of voice parsing performed by multiple devices. Taking a first terminal device and a second terminal device as an example, FIG. 6 illustrates a method by which any terminal device that accesses the network performs voice parsing. The method may include:
Step 601: When accessing the network, the first terminal device broadcasts a first application list to the other devices in the network and requests the second application list of the second terminal device.
The first application list may be a list of the applications currently running on the first terminal device. Similarly, the second application list may be a list of the applications currently running on the second terminal device.
Moreover, the network accessed by the first terminal device may be a local area network, a wide area network, or the Internet. For example, in a smart home scenario the first terminal device may access a local area network, and in a distributed application scenario the first terminal device may access a wide area network; this is not limited in the embodiments of this application.
When the first terminal device detects that it has accessed the network, it can generate list request information, acquire its first application list, and broadcast the first application list and the list request information to the other terminal devices in the network, so that the other terminal devices can receive the first application list and feed back their corresponding application lists to the first terminal device according to the list request information.
It should be noted that, during operation, the first terminal device may keep opening applications and may keep closing applications, so the first application list of the first terminal device changes continuously. Correspondingly, when the first terminal device detects that a new application has been opened or that an application has been closed, it can update the first application list and broadcast the updated first application list to the other devices in the network.
Step 602: The second terminal device receives the first application list of the first terminal device and feeds back its second application list to the first terminal device.
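Steps 601 and 602 can be sketched as a simple broadcast of the running-application list; the UDP transport, the JSON message format, and the port number are illustrative assumptions, since this application does not specify the transport.

```python
# Sketch of broadcasting the running-application list to peers on the network.
import json
import socket

BROADCAST_PORT = 50000  # hypothetical port

def broadcast_app_list(running_apps):
    msg = json.dumps({"type": "app_list", "apps": running_apps}).encode()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(msg, ("255.255.255.255", BROADCAST_PORT))

# Re-broadcast whenever an application is opened or closed, keeping peers current.
broadcast_app_list(["map_app", "music_app"])
```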
Step 603: When detecting a voice instruction issued by the user, the first terminal device converts the voice instruction to obtain a text instruction, and acquires the first terminal device's reminder information and interface information.
Step 604: The first terminal device sends an information request instruction to the second terminal device.
The information request instruction is used to request the second terminal device's reminder information and interface information. The embodiments of this application are described with step 603 performed before step 604; in practical applications, the first terminal device may also perform steps 603 and 604 simultaneously, that is, while converting the voice instruction, the first terminal device can also send the information request instruction to the second terminal device.
Of course, the first terminal device may also perform step 604 first and then step 603; the embodiments of this application do not limit the order in which the first terminal device performs steps 603 and 604. However, when the first terminal device performs step 604 before step 603, the first terminal device does not acquire the second terminal device's reminder information and interface information in response to a detected voice instruction; instead, it acquires the second terminal device's reminder information and interface information periodically, obtaining them in advance by a process similar to steps 302 and 303 above, so that when a voice instruction issued by the user is detected, the user intent corresponding to the voice instruction can be determined in the subsequent steps.
Step 605: According to the information request instruction sent by the first terminal device, the second terminal device acquires the second terminal device's reminder information and interface information and feeds them back to the first terminal device.
The processes of acquiring the reminder information and interface information in steps 603 and 605 are similar to the processes of acquiring the information sent by applications and acquiring the interface information in steps 302 and 305, and are not repeated here.
Step 606: According to the multiple pieces of acquired reminder information and in combination with the text instruction, the first terminal device determines the application corresponding to the voice instruction and the user intent corresponding to the voice instruction.
Corresponding to steps 602 and 605: if in steps 602 and 605 the first terminal device obtained the applications' interface information, then in performing step 606 the first terminal device can also determine the user intent in combination with the acquired interface information. However, if in steps 602 and 605 the first terminal device did not obtain the applications' interface information, then in performing step 606 the first terminal device can determine the user intent according to the reminder information and the text instruction in combination with the determined application's identifier; details are not repeated here.
Step 607: If the application corresponding to the voice instruction is an application running on the second terminal device, the first terminal device sends the user intent to the second terminal device.
When the first terminal device determines that the user issued the voice instruction for an application running on the second terminal device, the first terminal device can send the user intent obtained by recognition and parsing to the second terminal device, so that the second terminal device can invoke an intent execution interface according to the user intent and control the second terminal device to perform the operation corresponding to the user intent, achieving multi-device collaboration.
Step 608: The second terminal device performs, according to the received user intent, the operation corresponding to the user intent.
Step 609: If the application corresponding to the voice instruction is an application running on the first terminal device, the first terminal device performs, according to the user intent, the operation corresponding to the user intent.
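Steps 606 to 609 can be sketched as follows, reusing the `infer_intent` stand-in from the earlier sketch; `send_intent` is a hypothetical placeholder for the device-to-device channel used to hand the user intent to the second terminal device.

```python
# Sketch of steps 606-609: determine the intent, then execute locally or dispatch it.
def send_intent(device_id: str, intent: str):
    print(f"dispatch {intent!r} to {device_id}")  # placeholder for the real transport

def handle_instruction(text_instruction, local_reminders, remote_reminders):
    # Combine reminders from both devices to decide which application was meant.
    for device_id, reminder, interface_info in local_reminders + remote_reminders:
        intent = infer_intent(text_instruction, reminder, interface_info)
        if intent is None:
            continue
        if device_id == "first_device":
            return intent               # step 609: execute on the first terminal device
        send_intent(device_id, intent)  # steps 607-608: second terminal device executes
        return intent
```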
It should be noted that the processes in steps 603, 605, 606, 608, and 609 of acquiring the interface information and reminder information, determining the user intent according to the interface information and reminder information in combination with the text instruction, and then performing the corresponding operation according to the user intent are similar to the processes of steps 302 to 306, and are not repeated here.
In addition, the embodiments of this application take the terminal device's voice assistant as an example; some applications installed on the terminal device can also interact with the user by voice, in which case an installed application can likewise use the above voice parsing method to determine the user intent from a voice instruction and control the terminal device or the applications.
For example, while driving a vehicle it is inconvenient for the user to operate a map application, so the user can issue voice instructions to the map application through voice interaction, and the map application can, according to the voice instructions, switch the navigation route, change the destination, control in-vehicle devices, or control the vehicle.
In summary, in the voice parsing method provided by the embodiments of this application, when a voice instruction is acquired, the information sent by running applications is acquired, and the information sent by applications is also used as a factor in determining the user intent, which can improve the accuracy of determining the user intent corresponding to the voice instruction and thereby improve the efficiency of voice interaction between the terminal device and the user.
When the voice instruction issued by the user is received, the voice instruction can be converted into a text instruction, the information sent by applications can be acquired, and the text instruction and the information sent by applications can be parsed through the pre-trained fusion model to output the user intent corresponding to the voice instruction. By acquiring the information sent by applications and using it as a factor in determining the user intent, the accuracy of determining the user intent corresponding to the voice instruction can be improved, thereby improving the efficiency of voice interaction between the terminal device and the user.
Moreover, when the user instruction corresponds to an application running in the foreground of the terminal device, the terminal device can also acquire the application's interface information; on the basis of the text instruction and the information sent by the application, the fusion model, combined with the application's interface information, can determine the user intent corresponding to the voice instruction more accurately, improving the accuracy of determining the user intent.
In addition, in a multi-device scenario, any terminal device that collects a voice instruction can control the other devices in the scenario according to the voice instruction, which improves the flexibility of controlling terminal devices by voice instructions.
Furthermore, an application with a voice interaction capability can also determine, according to a voice instruction and in combination with the reminder information and interface information, the user intent corresponding to the voice instruction, which broadens the applicability and flexibility of applications parsing voice instructions and obtaining user intent.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
Corresponding to the voice parsing method described in the foregoing embodiments, FIG. 7 is a structural block diagram of a voice parsing apparatus according to an embodiment of this application; for ease of description, only the parts related to the embodiments of this application are shown.
Referring to FIG. 7, the apparatus includes:
a first acquiring module 701, configured to acquire a voice instruction and information sent by a running application, where the voice instruction is used to instruct a terminal device to perform an operation, and the information sent by the application includes reminder information used to remind a user;
a first determining module 702, configured to determine, according to the voice instruction and the reminder information, a user intent corresponding to the voice instruction.
Optionally, referring to FIG. 8, the apparatus further includes:
a second acquiring module 703, configured to acquire a first application list and a second application list, where the first application list is a list of the applications installed on the terminal device, and the second application list is a list of the applications currently running on the terminal device;
a second determining module 704, configured to determine, according to the first application list and the second application list, an identifier of the application corresponding to the voice instruction and a running state of the application;
where the first determining module 702 is specifically configured to: if the running state of the application is background running, determine the user intent corresponding to the voice instruction according to the reminder information, the voice instruction, and the identifier of the application; and if the running state of the application is foreground running, acquire, according to the current interface of the application, interface information corresponding to the current interface, and determine the user intent corresponding to the voice instruction according to the voice instruction, the reminder information, and the interface information.
Optionally, the first determining module 702 is further specifically configured to extract the current interface to obtain the interface content included in the current interface, and parse the interface content to obtain the interface information corresponding to the application.
Optionally, the first acquiring module 701 is specifically configured to acquire the voice instruction at a first moment, and acquire, according to the first moment, the information sent by each of the applications running within a preset time before the first moment.
Optionally, the first acquiring module 701 is specifically configured to acquire, in real time, the information sent by the running application.
Optionally, the first acquiring module 701 is further specifically configured to acquire, through a preset interface, audio data broadcast by the terminal device, and convert the audio data by using the automatic speech recognition (ASR) technique to obtain the information, in text form, sent by the application.
Optionally, the first acquiring module 701 is further specifically configured to extract, through a preset interface, text data sent by the application to obtain the information, in text form, sent by the application.
Optionally, referring to FIG. 9, the apparatus further includes:
a conversion module 705, configured to convert the voice instruction by using the ASR technique to obtain a text instruction in text form;
where the first determining module 702 is further specifically configured to determine, according to the text instruction and the reminder information, the user intent corresponding to the voice instruction.
Optionally, the conversion module 705 is specifically configured to denoise the voice instruction by using a speech enhancement technique to obtain a denoised voice instruction, and convert the denoised voice instruction by using the ASR technique to obtain the text instruction in text form.
Optionally, referring to FIG. 10, the apparatus further includes:
an establishing module 706, configured to establish, according to multiple types of sample data, multiple association relationships between different types of sample data, where the multiple types of sample data include sample reminder information, sample interface content, sample voice instructions, and sample user intents, and the multiple association relationships include: an association between the sample user intent and the sample reminder information, an association between the sample user intent and the sample voice instruction, and an association between the sample user intent and the sample interface content;
a training module 707, configured to perform training according to the multiple association relationships to obtain a fusion model, where the fusion model is a single model or a model group composed of multiple models.
Optionally, the first determining module 702 is further specifically configured to determine, through the fusion model and in combination with the voice instruction and the reminder information, the user intent corresponding to the voice instruction.
Optionally, referring to FIG. 11, the apparatus further includes:
an execution module 708, configured to invoke, according to the user intent, an intent execution interface to perform an operation matching the user intent.
Optionally, the apparatus is applied in a multi-device scenario, the multi-device scenario includes a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device;
where the first acquiring module 701 is further specifically configured to: acquire, by the first terminal device, a voice instruction and information sent by an application running on the first terminal device; send, according to the voice instruction, an information request instruction to the second terminal device, where the information request instruction is used to instruct the second terminal device to acquire, and feed back to the first terminal device, information sent by an application running on the second terminal device; and receive the information, fed back by the second terminal device, sent by the running application.
In summary, the voice parsing apparatus provided by the embodiments of this application acquires, when a voice instruction is acquired, the information sent by running applications and also uses that information as a factor in determining the user intent, which can improve the accuracy of determining the user intent corresponding to the voice instruction and thereby improve the efficiency of voice interaction between the terminal device and the user.
The following describes the electronic device involved in the embodiments of this application, taking a terminal device as an example. Referring to FIG. 12, FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of this application.
The electronic device may include a processor 1210, an external memory interface 1220, an internal memory 1221, a universal serial bus (USB) interface 1230, a charging management module 1240, a power management module 1241, a battery 1242, an antenna 1, an antenna 2, a mobile communication module 1250, a wireless communication module 1260, an audio module 1270, a speaker 1270A, a receiver 1270B, a microphone 1270C, a headset jack 1270D, a sensor module 1280, a button 1290, a motor 1291, an indicator 1292, a camera 1293, a display screen 1294, a subscriber identification module (SIM) card interface 1295, and the like. The sensor module 1280 may include a pressure sensor 1280A, a gyroscope sensor 1280B, a barometric pressure sensor 1280C, a magnetic sensor 1280D, an acceleration sensor 1280E, a distance sensor 1280F, an optical proximity sensor 1280G, a fingerprint sensor 1280H, a temperature sensor 1280J, a touch sensor 1280K, an ambient light sensor 1280L, a bone conduction sensor 1280M, and the like.
It can be understood that the structure illustrated in this embodiment of the present invention does not constitute a specific limitation on the electronic device. In other embodiments of this application, the electronic device may include more or fewer components than shown, or combine some components, or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 1210 may include one or more processing units. For example, the processor 1210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. Different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device. The controller can generate operation control signals according to instruction operation codes and timing signals to complete the control of instruction fetching and execution.
A memory may also be provided in the processor 1210 for storing instructions and data. In some embodiments, the memory in the processor 1210 is a cache, which can store instructions or data that the processor 1210 has just used or uses cyclically. If the processor 1210 needs to use those instructions or data again, it can call them directly from the memory, which avoids repeated accesses, reduces the waiting time of the processor 1210, and thus improves the efficiency of the system.
In some embodiments, the processor 1210 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, and the like.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 1210 may include multiple groups of I2C buses. The processor 1210 can be separately coupled to the touch sensor 1280K, a charger, a flash, the camera 1293, and the like through different I2C bus interfaces. For example, the processor 1210 can be coupled to the touch sensor 1280K through an I2C interface so that the processor 1210 and the touch sensor 1280K communicate through the I2C bus interface to implement the touch function of the electronic device.
The I2S interface can be used for audio communication. In some embodiments, the processor 1210 may include multiple groups of I2S buses. The processor 1210 can be coupled to the audio module 1270 through an I2S bus to implement communication between the processor 1210 and the audio module 1270. In some embodiments, the audio module 1270 can transmit audio signals to the wireless communication module 1260 through the I2S interface to implement the function of answering calls through a Bluetooth headset.
The PCM interface can also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 1270 and the wireless communication module 1260 may be coupled through a PCM bus interface. In some embodiments, the audio module 1270 can also transmit audio signals to the wireless communication module 1260 through the PCM interface to implement the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial communication and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 1210 and the wireless communication module 1260; for example, the processor 1210 communicates with the Bluetooth module in the wireless communication module 1260 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 1270 can transmit audio signals to the wireless communication module 1260 through the UART interface to implement the function of playing music through a Bluetooth headset.
The MIPI interface can be used to connect the processor 1210 with peripheral devices such as the display screen 1294 and the camera 1293. MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 1210 and the camera 1293 communicate through a CSI interface to implement the photographing function of the electronic device, and the processor 1210 and the display screen 1294 communicate through a DSI interface to implement the display function of the electronic device.
The GPIO interface can be configured by software and can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface can be used to connect the processor 1210 with the camera 1293, the display screen 1294, the wireless communication module 1260, the audio module 1270, the sensor module 1280, and the like. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 1230 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 1230 can be used to connect a charger to charge the electronic device, to transfer data between the electronic device and peripheral devices, or to connect headphones and play audio through them. The interface can also be used to connect other electronic devices, such as AR devices.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment of the present invention are only schematic and do not constitute a structural limitation on the electronic device. In other embodiments of this application, the electronic device may also adopt interface connection manners different from those in the above embodiments, or a combination of multiple interface connection manners.
The wireless communication function of the electronic device can be implemented through the antenna 1, the antenna 2, the mobile communication module 1250, the wireless communication module 1260, the modem processor, the baseband processor, and the like.
The wireless communication module 1260 can provide wireless communication solutions applied to the electronic device, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 1260 may be one or more devices integrating at least one communication processing module. The wireless communication module 1260 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 1210. The wireless communication module 1260 can also receive signals to be sent from the processor 1210, perform frequency modulation and amplification on them, and convert them into electromagnetic waves for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device is coupled to the mobile communication module 1250 and the antenna 2 is coupled to the wireless communication module 1260, so that the electronic device can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, and the like. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device implements the display function through the GPU, the display screen 1294, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 1294 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 1210 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 1294 is used to display images, videos, and the like. The display screen 1294 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flex light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and the like. In some embodiments, the electronic device may include 1 or N display screens 1294, where N is a positive integer greater than 1.
The electronic device can implement the photographing function through the ISP, the camera 1293, the video codec, the GPU, the display screen 1294, the application processor, and the like.
The ISP is used to process the data fed back by the camera 1293. For example, when taking a photo, the shutter is opened, light is transmitted through the lens to the camera's photosensitive element, the light signal is converted into an electrical signal, and the camera's photosensitive element transmits the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also perform algorithm optimization on the noise, brightness, and skin color of the image, and can optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 1293.
The camera 1293 is used to capture still images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal and then transmits the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device may include 1 or N cameras 1293, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device selects a frequency point, the digital signal processor is used to perform Fourier transform and the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device may support one or more video codecs, so that the electronic device can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor that processes input information quickly by drawing on the structure of biological neural networks, for example the transfer mode between neurons of the human brain, and can also learn continuously by itself. Applications such as intelligent cognition of the electronic device, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 1220 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device. The external memory card communicates with the processor 1210 through the external memory interface 1220 to implement the data storage function, for example saving files such as music and videos on the external memory card.
The internal memory 1221 can be used to store computer-executable program code, where the executable program code includes instructions. The processor 1210 executes various functional applications and data processing of the electronic device by running the instructions stored in the internal memory 1221. The internal memory 1221 may include a program storage area and a data storage area. The program storage area can store the operating system and the applications required for at least one function (such as a sound playing function and an image playing function); the data storage area can store data created during the use of the electronic device (such as audio data and a phone book). In addition, the internal memory 1221 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device can implement audio functions, such as music playing and recording, through the audio module 1270, the speaker 1270A, the receiver 1270B, the microphone 1270C, the headset jack 1270D, the application processor, and the like.
The audio module 1270 is used to convert digital audio information into analog audio signal output and also to convert analog audio input into digital audio signals. The audio module 1270 can also be used to encode and decode audio signals. In some embodiments, the audio module 1270 may be provided in the processor 1210, or some functional modules of the audio module 1270 may be provided in the processor 1210.
The speaker 1270A, also called a "horn", is used to convert audio electrical signals into sound signals. The electronic device can listen to music or to hands-free calls through the speaker 1270A.
The receiver 1270B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device answers a call or a voice message, the voice can be heard by bringing the receiver 1270B close to the ear.
The microphone 1270C, also called a "mic" or "mike", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 1270C to input the sound signal into the microphone 1270C. The electronic device may be provided with at least one microphone 1270C. In other embodiments, the electronic device may be provided with two microphones 1270C, which, in addition to collecting sound signals, can implement a noise reduction function. In still other embodiments, the electronic device may be provided with three, four, or more microphones 1270C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The headset jack 1270D is used to connect wired headsets. The headset jack 1270D may be the USB interface 1230, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 1280A is used to sense pressure signals and can convert pressure signals into electrical signals. In some embodiments, the pressure sensor 1280A may be provided on the display screen 1294. There are many types of pressure sensors 1280A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates with conductive material; when a force acts on the pressure sensor 1280A, the capacitance between the electrodes changes, and the electronic device determines the strength of the pressure according to the change in capacitance. When a touch operation acts on the display screen 1294, the electronic device detects the strength of the touch operation according to the pressure sensor 1280A, and can also calculate the position of the touch according to the detection signal of the pressure sensor 1280A. In some embodiments, touch operations acting on the same touch position but with different strengths may correspond to different operation instructions. For example, when a touch operation with a strength below a first pressure threshold acts on a messaging application icon, an instruction to view the message is executed; when a touch operation with a strength greater than or equal to the first pressure threshold acts on the messaging application icon, an instruction to create a new message is executed.
The gyroscope sensor 1280B can be used to determine the motion posture of the electronic device. In some embodiments, the angular velocities of the electronic device around three axes (namely the x, y, and z axes) can be determined through the gyroscope sensor 1280B. The gyroscope sensor 1280B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 1280B detects the shaking angle of the electronic device, calculates from that angle the distance the lens module needs to compensate, and lets the lens counteract the shaking of the electronic device through reverse motion, achieving image stabilization. The gyroscope sensor 1280B can also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 1280C is used to measure air pressure. In some embodiments, the electronic device calculates the altitude from the air pressure value measured by the barometric pressure sensor 1280C to assist positioning and navigation.
The magnetic sensor 1280D includes a Hall sensor. The electronic device can use the magnetic sensor 1280D to detect the opening and closing of a flip cover or holster. In some embodiments, when the electronic device is a flip phone, the electronic device can detect the opening and closing of the flip according to the magnetic sensor 1280D, and then set features such as automatic unlocking on flip-open according to the detected open or closed state of the holster or flip.
The acceleration sensor 1280E can detect the magnitude of the electronic device's acceleration in all directions (generally three axes). When the electronic device is stationary, the magnitude and direction of gravity can be detected. It can also be used to recognize the posture of the electronic device, for applications such as landscape/portrait switching and pedometers.
The distance sensor 1280F is used to measure distance. The electronic device can measure distance by infrared or laser. In some embodiments, in shooting scenarios, the electronic device can use the distance sensor 1280F to measure distance to achieve fast focusing.
The optical proximity sensor 1280G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. The electronic device emits infrared light outward through the light-emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device; when insufficient reflected light is detected, the electronic device can determine that there is no object nearby. The electronic device can use the optical proximity sensor 1280G to detect that the user is holding the device close to the ear during a call, so as to automatically turn off the screen to save power. The optical proximity sensor 1280G can also be used for automatic unlocking and screen locking in holster mode and pocket mode.
The ambient light sensor 1280L is used to sense the ambient light brightness. The electronic device can adaptively adjust the brightness of the display screen 1294 according to the perceived ambient light brightness. The ambient light sensor 1280L can also be used to automatically adjust the white balance when taking photos, and can cooperate with the optical proximity sensor 1280G to detect whether the electronic device is in a pocket to prevent accidental touches.
The fingerprint sensor 1280H is used to collect fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, accessing application locks, fingerprint photographing, fingerprint call answering, and the like.
The temperature sensor 1280J is used to detect temperature. In some embodiments, the electronic device executes a temperature processing policy using the temperature detected by the temperature sensor 1280J. For example, when the temperature reported by the temperature sensor 1280J exceeds a threshold, the electronic device reduces the performance of a processor located near the temperature sensor 1280J in order to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is below another threshold, the electronic device heats the battery 1242 to avoid an abnormal shutdown caused by low temperature. In still other embodiments, when the temperature is below yet another threshold, the electronic device boosts the output voltage of the battery 1242 to avoid an abnormal shutdown caused by low temperature.
The touch sensor 1280K is also called a "touch panel". The touch sensor 1280K may be provided on the display screen 1294, and the touch sensor 1280K and the display screen 1294 form a touchscreen, also called a "touch screen". The touch sensor 1280K is used to detect touch operations acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 1294. In other embodiments, the touch sensor 1280K may also be provided on the surface of the electronic device at a position different from that of the display screen 1294.
The bone conduction sensor 1280M can acquire vibration signals. In some embodiments, the bone conduction sensor 1280M can acquire the vibration signal of the vibrating bone of the human vocal part. The bone conduction sensor 1280M can also contact the human pulse and receive the blood pressure beating signal. In some embodiments, the bone conduction sensor 1280M may also be provided in a headset, combined into a bone conduction headset. The audio module 1270 can parse out the voice signal based on the vibration signal of the vibrating bone of the vocal part acquired by the bone conduction sensor 1280M to implement the voice function. The application processor can parse heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 1280M to implement the heart rate detection function.
The button 1290 includes a power button, a volume button, and the like. The button 1290 may be a mechanical button or a touch button. The electronic device can receive button inputs and generate key signal inputs related to user settings and function control of the electronic device.
The motor 1291 can generate vibration alerts. The motor 1291 can be used for incoming call vibration alerts as well as for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playing) can correspond to different vibration feedback effects; for touch operations acting on different areas of the display screen 1294, the motor 1291 can also correspond to different vibration feedback effects. Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effects can also be customized.
The indicator 1292 may be an indicator light, which can be used to indicate the charging state and battery changes, and can also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 1295 is used to connect SIM cards. A SIM card can be inserted into or pulled out of the SIM card interface 1295 to make contact with or separate from the electronic device. The electronic device can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 1295 can support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 1295 at the same time, and the types of the multiple cards may be the same or different. The SIM card interface 1295 is also compatible with different types of SIM cards and with external memory cards. The electronic device interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device uses an eSIM, that is, an embedded SIM card; the eSIM card can be embedded in the electronic device and cannot be separated from it.
The software system of the electronic device may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservices architecture, or a cloud architecture. The embodiments of the present invention take the Android system with a layered architecture as an example to illustrate the software structure of the electronic device.
FIG. 13 is a software structure block diagram of an electronic device according to an embodiment of this application.
The layered architecture divides the software into several layers, each with a clear role and division of labor, and the layers communicate through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 13, the application packages may include applications such as Camera, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, and Messages.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer includes some predefined functions.
As shown in FIG. 13, the application framework layer may include a window manager, content providers, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the display screen size, determine whether there is a status bar, lock the screen, capture the screen, and the like.
Content providers are used to store and retrieve data and make it accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system can be used to build applications. A display interface may be composed of one or more views; for example, a display interface including a message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication functions of the electronic device, for example management of call states (including connected, hung up, and the like).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar; it can be used to convey notification-type messages and can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify download completion, message reminders, and the like. The notification manager may also present notifications in the status bar at the top of the system in the form of charts or scrollbar text, such as notifications of applications running in the background, or notifications appearing on the screen in the form of dialog windows, for example prompting text information in the status bar, sounding a prompt tone, vibrating the electronic device, or flashing the indicator light.
Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for the scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functional functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of multiple common audio and video formats, as well as still image files and the like. The media libraries can support multiple audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphics processing library is used to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The following illustrates the workflow of the electronic device's software and hardware with a photographing capture scenario as an example.
When the touch sensor 1280K receives a touch operation, a corresponding hardware interrupt is sent to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as the touch coordinates and the timestamp of the touch operation). The raw input event is stored at the kernel layer. The application framework layer obtains the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation being a single-tap operation and the control corresponding to the tap being the control of the camera application icon as an example, the camera application calls the interface of the application framework layer to launch the camera application, then starts the camera driver by calling the kernel layer, and captures still images or videos through the camera 1293.
An embodiment of this application further provides an electronic device, including a processor, where the processor is configured to run a computer program stored in a memory to implement one or more steps of any of the foregoing methods.
An embodiment of this application further provides a computer-readable storage medium storing instructions that, when run on a computer or processor, cause the computer or processor to perform one or more steps of any of the foregoing methods.
An embodiment of this application further provides a computer program product containing instructions. When the computer program product runs on a computer or processor, the computer or processor is caused to perform one or more steps of any of the foregoing methods.
An embodiment of this application further provides a chip system, where the chip system includes a memory and a processor, and the processor executes a computer program stored in the memory to implement one or more steps of any of the foregoing methods.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
A person of ordinary skill in the art can understand that all or part of the processes for implementing the methods of the above embodiments can be completed by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when the program is executed, it may include the processes of the foregoing method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a random access memory (RAM), a magnetic disk, or an optical disk.
Those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional units and modules is used as an example for illustration. In practical applications, the above functions can be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above integrated units can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of this application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, the descriptions of the respective embodiments have their own emphases; for a part that is not detailed or described in one embodiment, reference may be made to the related descriptions of other embodiments.
A person of ordinary skill in the art may realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods may be implemented in other ways. For example, the system embodiments described above are only illustrative; for example, the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated units can be implemented in the form of hardware or in the form of software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, this application implements all or part of the processes in the methods of the above embodiments by instructing related hardware through a computer program; the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above method embodiments can be realized. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or apparatus capable of carrying the computer program code to the terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not be electrical carrier signals and telecommunications signals.
Finally, it should be noted that the above are only specific implementations of this application, but the protection scope of this application is not limited thereto; any variation or replacement within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (16)

  1. A voice parsing method, comprising:
    acquiring a voice instruction and information sent by a running application, wherein the voice instruction is used to instruct a terminal device to perform an operation, and the information sent by the application comprises reminder information used to remind a user;
    determining, according to the voice instruction and the reminder information, a user intent corresponding to the voice instruction.
  2. The method according to claim 1, wherein before the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction, the method further comprises:
    acquiring a first application list and a second application list, wherein the first application list is a list of the applications installed on the terminal device, and the second application list is a list of the applications currently running on the terminal device;
    determining, according to the first application list and the second application list, an identifier of the application corresponding to the voice instruction and a running state of the application;
    wherein the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction comprises:
    if the running state of the application is background running, determining the user intent corresponding to the voice instruction according to the reminder information, the voice instruction, and the identifier of the application;
    if the running state of the application is foreground running, acquiring, according to the current interface of the application, interface information corresponding to the current interface, and determining the user intent corresponding to the voice instruction according to the voice instruction, the reminder information, and the interface information.
  3. The method according to claim 2, wherein the acquiring, according to the current interface of the application, interface information corresponding to the current interface comprises:
    extracting the current interface to obtain interface content comprised in the current interface;
    parsing the interface content to obtain the interface information corresponding to the application.
  4. The method according to any one of claims 1 to 3, wherein the acquiring a voice instruction and information sent by a running application comprises:
    acquiring the voice instruction at a first moment;
    acquiring, according to the first moment, the information sent by each of the applications running within a preset time before the first moment.
  5. The method according to any one of claims 1 to 3, wherein the acquiring information sent by a running application comprises:
    acquiring, in real time, the information sent by the running application.
  6. The method according to any one of claims 1 to 5, wherein the acquiring information sent by a running application comprises:
    acquiring, through a preset interface, audio data broadcast by the terminal device;
    converting the audio data by using the automatic speech recognition (ASR) technique to obtain the information, in text form, sent by the application.
  7. The method according to any one of claims 1 to 5, wherein the acquiring information sent by a running application comprises:
    extracting, through a preset interface, text data sent by the application to obtain the information, in text form, sent by the application.
  8. The method according to any one of claims 1 to 7, wherein before the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction, the method further comprises:
    converting the voice instruction by using the ASR technique to obtain a text instruction in text form;
    wherein the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction comprises:
    determining, according to the text instruction and the reminder information, the user intent corresponding to the voice instruction.
  9. The method according to claim 8, wherein the converting the voice instruction by using the ASR technique to obtain a text instruction in text form comprises:
    denoising the voice instruction by using a speech enhancement technique to obtain a denoised voice instruction;
    converting the denoised voice instruction by using the ASR technique to obtain the text instruction in text form.
  10. The method according to any one of claims 1 to 9, wherein before the acquiring a voice instruction and information sent by a running application, the method further comprises:
    establishing, according to multiple types of sample data, multiple association relationships between different types of sample data, wherein the multiple types of sample data comprise sample reminder information, sample interface content, sample voice instructions, and sample user intents, and the multiple association relationships comprise: an association between the sample user intent and the sample reminder information, an association between the sample user intent and the sample voice instruction, and an association between the sample user intent and the sample interface content;
    performing training according to the multiple association relationships to obtain a fusion model, wherein the fusion model is a single model or a model group composed of multiple models.
  11. The method according to claim 10, wherein the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction comprises:
    determining, through the fusion model and in combination with the voice instruction and the reminder information, the user intent corresponding to the voice instruction.
  12. The method according to any one of claims 1 to 11, wherein after the determining, according to the voice instruction and the reminder information, the user intent corresponding to the voice instruction, the method further comprises:
    invoking, according to the user intent, an intent execution interface to perform an operation matching the user intent.
  13. The method according to any one of claims 1 to 12, wherein the method is applied in a multi-device scenario, the multi-device scenario comprises a first terminal device and a second terminal device, and the first terminal device is connected to the second terminal device;
    wherein the acquiring a voice instruction and information sent by a running application comprises:
    acquiring, by the first terminal device, a voice instruction and information sent by an application running on the first terminal device;
    sending, by the first terminal device according to the voice instruction, an information request instruction to the second terminal device, wherein the information request instruction is used to instruct the second terminal device to acquire, and feed back to the first terminal device, information sent by an application running on the second terminal device;
    receiving, by the first terminal device, the information, fed back by the second terminal device, sent by the running application.
  14. An electronic device, comprising a processor, wherein the processor is configured to run a computer program stored in a memory to implement the voice parsing method according to any one of claims 1 to 13.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the voice parsing method according to any one of claims 1 to 13.
  16. A chip system, wherein the chip system comprises a memory and a processor, and the processor executes a computer program stored in the memory to implement the voice parsing method according to any one of claims 1 to 13.
PCT/CN2022/131980 2021-11-30 2022-11-15 Voice parsing method, electronic device, readable storage medium, and chip system WO2023098467A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111453243.7 2021-11-30
CN202111453243.7A CN116206602A (zh) 2021-11-30 2023-06-02 Voice parsing method, electronic device, readable storage medium, and chip system

Publications (1)

Publication Number Publication Date
WO2023098467A1 true WO2023098467A1 (zh) 2023-06-08

Family

ID=86510056

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/131980 WO2023098467A1 (zh) 2021-11-30 2022-11-15 Voice parsing method, electronic device, readable storage medium, and chip system

Country Status (2)

Country Link
CN (1) CN116206602A (zh)
WO (1) WO2023098467A1 (zh)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108806674A (zh) * 2017-05-05 2018-11-13 北京搜狗科技发展有限公司 Positioning and navigation method and apparatus, and electronic device
CN108897517A (zh) * 2018-06-27 2018-11-27 联想(北京)有限公司 Information processing method and electronic device
CN109741740A (zh) * 2018-12-26 2019-05-10 苏州思必驰信息科技有限公司 Voice interaction method and apparatus based on external triggering
CN111949240A (zh) * 2019-05-16 2020-11-17 阿里巴巴集团控股有限公司 Interaction method, storage medium, service program, and device
CN110866179A (zh) * 2019-10-08 2020-03-06 上海博泰悦臻网络技术服务有限公司 Voice-assistant-based recommendation method, terminal, and computer storage medium

Also Published As

Publication number Publication date
CN116206602A (zh) 2023-06-02

Similar Documents

Publication Publication Date Title
WO2021052263A1 (zh) Voice assistant display method and apparatus
RU2766255C1 (ru) Voice control method and electronic device
WO2020211701A1 (zh) Model training method, emotion recognition method, and related apparatus and device
CN110910872B (zh) Voice interaction method and apparatus
US11922935B2 (en) Voice interaction method and apparatus, terminal, and storage medium
CN110138959B (zh) Method for displaying a prompt of a human-machine interaction instruction, and electronic device
CN115866121B (zh) Application interface interaction method, electronic device, and computer-readable storage medium
US20220350450A1 (en) Processing Method for Waiting Scenario in Application and Apparatus
EP4064284A1 (en) Voice detection method, prediction model training method, apparatus, device, and medium
WO2020029306A1 (zh) Image shooting method and electronic device
CN111819533B (zh) Method for triggering an electronic device to perform a function, and electronic device
US11868463B2 (en) Method for managing application permission and electronic device
WO2021052139A1 (zh) Gesture input method and electronic device
WO2021218429A1 (zh) Application window management method, terminal device, and computer-readable storage medium
CN116233300A (zh) Method for controlling communication service state, terminal device, and readable storage medium
WO2023207667A1 (zh) Display method, automobile, and electronic device
WO2023071940A1 (zh) Cross-device navigation task synchronization method, apparatus, device, and storage medium
CN114444000A (zh) Page layout file generation method and apparatus, electronic device, and readable storage medium
CN113742460A (zh) Method and apparatus for generating a virtual character
WO2022007757A1 (zh) Cross-device voiceprint registration method, electronic device, and storage medium
WO2023098467A1 (zh) Voice parsing method, electronic device, readable storage medium, and chip system
CN113867851A (zh) Electronic device operation guidance information recording method, acquisition method, and terminal device
WO2023116669A1 (zh) Video generation system and method, and related apparatus
CN114115772B (zh) Off-screen display method and apparatus
WO2023109636A1 (zh) Application card display method and apparatus, terminal device, and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22900272

Country of ref document: EP

Kind code of ref document: A1