CN112055072B

CN112055072B - Cloud audio input method and device, cloud system, electronic equipment and storage medium

Info

Publication number: CN112055072B
Application number: CN202010902321.6A
Authority: CN
Inventors: 陈晓峰; 刘智勇
Original assignee: Beijing IQIYI Science and Technology Co Ltd
Current assignee: Beijing IQIYI Science and Technology Co Ltd
Priority date: 2020-08-31
Filing date: 2020-08-31
Publication date: 2023-06-30
Anticipated expiration: 2040-08-31
Also published as: CN112055072A

Abstract

Embodiments of the invention relate to a cloud audio input method, a cloud system, an electronic device and a storage medium. The method is applied to a cloud system comprising a cloud device and a user terminal, and comprises the following steps: when the cloud device detects a trigger event for a voice function, it generates an audio acquisition request, which requests the user terminal to acquire audio, and sends the request to the user terminal; the user terminal responds to the audio acquisition request by calling an audio module to acquire audio data; and the user terminal sends the audio data to the cloud device. Cloud audio input is thereby realized, meeting the user's voice requirements when using a cloud application.

Description

Cloud audio input method and device, cloud system, electronic equipment and storage medium

Technical Field

Embodiments of the invention relate to the field of cloud communication, and in particular to a cloud audio input method and device, a cloud system, an electronic device and a storage medium.

Background

A cloud application is a new type of application that connects to and controls a remote server (or a server cluster, i.e., a cloud server) over the Internet or a local area network to complete business logic or computing tasks. A cloud application may run on a cloud device, which is typically a virtual device built on a cloud server.

In practice, a user may need voice services when using a cloud application: for example, the user may want to talk to other users while playing a cloud game, or to control the cloud application by voice to perform some operation.

However, because the cloud application runs on a virtual cloud device that has no physical audio input device such as a microphone, the cloud device cannot collect audio data, and the user's voice requirements when using the cloud application cannot be met.

Disclosure of Invention

In view of this, embodiments of the present invention provide a cloud audio input method, a cloud audio input device, a cloud system, an electronic device, and a storage medium, which realize cloud audio input and meet the user's voice requirements when using a cloud application.

In a first aspect, an embodiment of the present invention provides a cloud audio input method, which is applied to a cloud system, where the cloud system includes a cloud device and a user terminal, and the method includes:

when the cloud device detects a trigger event for a voice function, generating an audio acquisition request and sending it to the user terminal, wherein the audio acquisition request is used to request the user terminal to acquire audio;

responding, by the user terminal, to the audio acquisition request by calling an audio module to acquire audio data;

and sending, by the user terminal, the audio data to the cloud device.

In a second aspect, the present invention provides a cloud audio input method applied to a cloud device in a cloud system, wherein the cloud system further comprises a user terminal; the method comprises:

when a trigger event for a voice function is detected, generating an audio acquisition request and sending it to the user terminal, so that the user terminal responds to the audio acquisition request by calling an audio module to acquire audio data, wherein the audio acquisition request is used to request the user terminal to acquire audio;

and receiving the audio data from the user terminal.

In a third aspect, the present invention provides a cloud audio input method applied to a user terminal in a cloud system, wherein the cloud system further comprises a cloud device; the method comprises:

receiving an audio acquisition request from the cloud device, wherein the audio acquisition request is used to request the user terminal to acquire audio;

in response to the audio acquisition request, calling an audio module to acquire audio data;

and sending the audio data to the cloud device.

In a fourth aspect, the present invention provides a cloud system comprising a cloud device and a user terminal, the cloud device being communicatively connected to the user terminal;

the cloud device is configured to perform the cloud audio input method according to the second aspect;

the user terminal is configured to perform the cloud audio input method according to the third aspect.

In a fifth aspect, the present invention provides a cloud audio input device, the device comprising:

a processing unit, configured to generate an audio acquisition request when a trigger event for a voice function is detected, wherein the audio acquisition request is used to request the user terminal to acquire audio;

a sending unit, configured to send the audio acquisition request to the user terminal, so that the user terminal responds to the audio acquisition request by calling an audio module to acquire audio data;

and a receiving unit, configured to receive the audio data sent by the user terminal.

In a sixth aspect, the present invention provides a cloud audio input device, the device comprising:

a receiving unit, configured to receive an audio acquisition request from the cloud device, wherein the audio acquisition request is used to request the user terminal to acquire audio;

a calling unit, configured to call the audio module, in response to the audio acquisition request, to acquire audio data;

and a sending unit, configured to send the audio data to the cloud device.

In a seventh aspect, the present invention provides an electronic device comprising a processor and a memory, wherein the processor is configured to execute a cloud audio input program stored in the memory to implement the cloud audio input method according to the second aspect or the third aspect.

In an eighth aspect, the present invention provides a storage medium storing one or more programs executable by one or more processors to implement the cloud audio input method according to any of the second or third aspects.

According to the technical solution provided by the embodiments of the invention, when the cloud device detects a trigger event for the voice function, it generates an audio acquisition request and sends it to the user terminal; the user terminal responds by calling the audio module to acquire audio data and sends the audio data to the cloud device. In this way, while running a cloud application, the cloud device can acquire audio by using the audio module associated with the user terminal. Even if the cloud device itself has no audio acquisition capability, it can obtain audio data through interaction with the user terminal, which realizes cloud audio input and meets the user's voice requirements when using the cloud application.

Drawings

FIG. 1 is a schematic diagram of the system architecture of a cloud system according to an embodiment of the present invention;

FIG. 2 is a flowchart of a cloud audio input method according to an exemplary embodiment of the present invention;

FIG. 3 is a schematic diagram of an application interface of a cloud application displayed by a user terminal;

FIG. 4 is a schematic diagram of the system architecture of a cloud phone;

FIG. 5 is a schematic diagram of the process in which a cloud device sends an audio acquisition request to a user terminal;

FIG. 6 is a schematic diagram of the system architecture of a user terminal;

FIG. 7 is a schematic diagram of the process in which a user terminal reads audio data collected by an audio module;

FIG. 8 is a flowchart of another cloud audio input method according to an exemplary embodiment of the present invention;

FIG. 9 is a schematic diagram of the process in which a user terminal starts an audio module for audio acquisition;

FIG. 10 is a schematic diagram of the process in which a cloud device receives audio data from a cloud communication module;

FIG. 11 is a block diagram of a cloud audio input device according to an exemplary embodiment of the present invention;

FIG. 12 is a block diagram of a cloud audio input device according to an exemplary embodiment of the present invention;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.

To facilitate understanding of the embodiments of the present invention, the system architecture to which the invention relates is first described by way of example.

Referring to FIG. 1, a schematic diagram of the system architecture of a cloud system according to an embodiment of the present invention is shown.

As shown in FIG. 1, the cloud system 100 may include a cloud device 101 and a user terminal 102, the cloud device 101 being communicatively connected to the user terminal 102.

The cloud device 101 may be a virtual device or a physical device. For example, the cloud device 101 may be a logical device virtualized on a cloud server using virtualization technology; in particular, multiple logical devices may be virtualized on any one cloud server. When the cloud device 101 is a physical device, it includes, but is not limited to, a smartphone, a tablet computer, a laptop computer, a desktop computer, a server, and the like.

In practice, the cloud device 101 may provide a corresponding network service by installing a cloud application (also called a cloud application program or cloud app); for example, the cloud device 101 provides a cloud game service by installing a cloud game application, that is, the cloud application runs on the cloud device. Specifically, one or more cloud applications may run on any one cloud device. When multiple cloud applications are installed, different cloud applications generally have different application identifiers, although in some scenarios multiple cloud applications with identical application identifiers may be installed on one cloud device.

The user terminal 102 is a physical device that may be a variety of electronic devices supporting a display screen including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

In practice, the user terminal 102 may be used to provide corresponding network services by installing a client application, such as the user terminal 102 providing video or image playback services by installing a video player.

In one embodiment, the cloud device and the user terminal establish a communication connection through a cloud communication module. Optionally, the cloud communication module may be built into the cloud device or be independent of it, and may be used to connect one or more cloud applications on the cloud device to their corresponding user terminals, and/or to connect one or more cloud devices to their corresponding user terminals. The term "module" is used broadly here and may refer to a class, function, thread, process, or the like; for example, the cloud communication module may be a cloud application service program.
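The routing role of the cloud communication module can be sketched as a lookup table from cloud application to terminal connection. This is an illustrative sketch only: the class `TerminalRouter`, the `(device_id, instance_id)` key, and the callback-style connection are all hypothetical. An instance id is used instead of the application identifier because, as noted above, one cloud device may host several applications with the same identifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical routing key: (cloud device id, cloud application instance id).
RouteKey = Tuple[str, str]
SendFn = Callable[[bytes], None]


@dataclass
class TerminalRouter:
    """Illustrative stand-in for the cloud communication module's routing role."""

    routes: Dict[RouteKey, SendFn] = field(default_factory=dict)

    def register(self, device_id: str, instance_id: str, send: SendFn) -> None:
        # Bind a cloud application instance to its user terminal's connection.
        self.routes[(device_id, instance_id)] = send

    def forward_to_terminal(self, device_id: str, instance_id: str, message: bytes) -> bool:
        # Look up the user terminal for this instance and forward the message.
        send = self.routes.get((device_id, instance_id))
        if send is None:
            return False  # no user terminal is connected for this instance
        send(message)
        return True
```

In a real deployment the `send` callback would wrap whatever channel connects the module to the terminal; here any callable accepting bytes works.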

In the cloud system 100 shown in FIG. 1, on the one hand, the cloud device 101 runs the cloud application and sends the data stream produced while the application runs to the user terminal 102, which displays the received data stream. On the other hand, the user terminal 102 collects the user's operation data and sends it (raw or processed) to the cloud device 101; the cloud device 101 responds to the user operation based on the received operation data and sends a response data stream back to the user terminal 102, which in turn displays it. The cloud device 101 and the user terminal 102 thus form a closed loop through which the user can use, from the user terminal 102, a cloud application installed on the cloud device 101. Because the cloud application does not actually run on the user terminal 102, this helps reduce the resources the user terminal 102 has to spend.

Taking a cloud game application as an example, the cloud device 101 sends the video stream of the cloud game application to the user terminal 102, which may display it through a client application such as a video player. Each time the user terminal 102 collects the user's operation data, it sends the data to the cloud device 101, which responds, refreshes the video stream accordingly, and sends the refreshed video stream to the user terminal 102. In this way, the user can play the cloud game application installed on the cloud device 101 through the user terminal 102.

It will be appreciated that the number of networks and devices in fig. 1 is merely illustrative. The cloud system 100 may include any number of networks and devices as desired, as the invention is not limited in this regard.

In practical applications of the cloud system 100 shown in FIG. 1, a user is likely to need voice services when using a cloud application. For example, in a cloud game scenario, different users may need to communicate by voice or by text; for text communication, the user can speak to the cloud application, which converts the received speech into text. As another example, in a smart home control scenario, the user may issue a voice command to the cloud application, which then performs the operation corresponding to that command.

However, when the cloud device is a virtual device, it has no physical audio input device such as a microphone and therefore cannot collect audio data. When the cloud device is a physical device, it is not necessarily equipped with an audio input device and so may also be unable to collect audio data. Moreover, even if the cloud device does have an audio input device, the user is facing the user terminal rather than the cloud device, so the audio input device on the cloud device cannot capture audio on the user's side. As a result, the user's voice requirements when using the cloud application cannot be met, which greatly limits the development of cloud applications.

Accordingly, the invention provides a cloud audio input method that realizes audio input to the cloud device (i.e., the cloud) and meets the user's voice requirements when using a cloud application. Note that "audio input" is used here in a broad sense: it may cover audio alone or audio combined with other multimedia data. Audio includes, but is not limited to, one or more of speech, music, and songs. The other multimedia data includes, but is not limited to, one or more of text, graphics, images, and animations. For example, in a cloud game scenario, a video call between users involves inputting a combination of audio and images to the cloud.

The cloud audio input method provided by the invention is described below through embodiments with reference to the accompanying drawings.

Referring to FIG. 2, a flowchart of a cloud audio input method according to an exemplary embodiment of the present invention is shown. As shown in FIG. 2, the process may include the following steps:

Step 201, when the cloud device detects a trigger event for a voice function, it generates an audio acquisition request.

In practice, the cloud device sends the application interface of the cloud application to the user terminal, which displays it. FIG. 3 is a schematic diagram of an application interface of a cloud application displayed by a user terminal. The user terminal may monitor the user's trigger events on the displayed application interface, such as click, double-click, and slide events. Whenever the user terminal detects a trigger event, it sends description information of the trigger event to the cloud device, so that the cloud device can detect trigger events for the cloud application.

The embodiments of the invention concern trigger events for the voice function of the cloud application. Accordingly, after receiving the description information of a trigger event from the user terminal, the cloud device may determine from that information whether the trigger event is a trigger event for the voice function of the cloud application.

As an optional implementation, the description information of the trigger event includes the position coordinates of the trigger event on the application interface, and the cloud application can determine from these coordinates whether the trigger event is a trigger event for its voice function. For example, when the user clicks the microphone icon 301 on the application interface 300, the user terminal includes the clicked position coordinates in the description information of the trigger event and sends it to the cloud device; the cloud device then determines from the received coordinates whether the user clicked the microphone icon 301, and thereby whether the trigger event is a trigger event for the voice function of the cloud application.
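The coordinate check described above is essentially a hit test of the clicked point against the bounds of microphone icon 301. The sketch below assumes a hypothetical description-information format (a dict with `type`, `x`, `y`) and made-up icon bounds; the patent specifies neither. Only click-type events are treated as voice triggers here, which is itself an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        # Half-open bounds so adjacent icons never both claim a boundary pixel.
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


# Hypothetical bounds of microphone icon 301 on application interface 300.
MIC_ICON_BOUNDS = Rect(left=1000, top=600, width=80, height=80)


def is_voice_trigger(event: dict) -> bool:
    """Return True if the trigger event's description hits the microphone icon."""
    if event.get("type") not in {"click", "double_click"}:
        return False  # e.g. slide events are not treated as voice triggers
    return MIC_ICON_BOUNDS.contains(event["x"], event["y"])
```

For example, `is_voice_trigger({"type": "click", "x": 1040, "y": 640})` returns `True` with the bounds above.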

In the embodiments of the invention, when the cloud device detects a trigger event for the voice function, it generates an audio acquisition request for requesting the user terminal to acquire audio.

In one embodiment, the audio acquisition request carries the audio parameters supported by the cloud application (hereinafter referred to as target audio parameters for ease of description). In practice, different cloud applications may support different target audio parameters. The purpose of carrying the target audio parameters in the request is explained in step 203 below.
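One possible shape for an audio acquisition request that carries target audio parameters is sketched below. The patent does not define a wire format; the JSON encoding and the field names (`app_id`, `sample_rate_hz`, `channels`, `bits_per_sample`) are assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AudioParams:
    # Hypothetical target audio parameters supported by the cloud application.
    sample_rate_hz: int
    channels: int
    bits_per_sample: int


@dataclass(frozen=True)
class AudioAcquisitionRequest:
    app_id: str
    target: AudioParams

    def to_json(self) -> str:
        # asdict() recurses into the nested AudioParams dataclass.
        return json.dumps({"type": "audio_acquisition_request", **asdict(self)})

    @staticmethod
    def from_json(text: str) -> "AudioAcquisitionRequest":
        obj = json.loads(text)
        return AudioAcquisitionRequest(app_id=obj["app_id"], target=AudioParams(**obj["target"]))
```

A self-describing text format keeps the request readable on both sides; a compact binary encoding would work equally well.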

Step 202, the cloud device sends an audio acquisition request to the user terminal.

In one example, the cloud device is a cloud phone implemented on the Android system, as shown in FIG. 4. The cloud phone includes, but is not limited to, cloud applications, an audio service layer, and a hardware abstraction layer.

The audio service layer may be AudioFlinger (hereinafter AF). AF is a Native-layer system service and the audio hub of the Android system: it bridges the layers above and below it, providing access interfaces to upper-layer services and managing audio devices through the hardware abstraction layer.

The hardware abstraction layer may be the Audio HAL (Hardware Abstraction Layer). The Audio HAL is the foundation on which the Android framework layer runs; operations on the hardware required by the upper layers of the Android system are generally carried out by the Audio HAL calling the relevant APIs (Application Programming Interfaces).

However, when the cloud phone is a virtual device without a physical audio input device, or a physical device that lacks one, the Audio HAL cannot call the relevant APIs to drive an audio input device of the cloud phone. Therefore, in the embodiments of the present invention, when the cloud device detects a trigger event for the voice function, the Audio HAL does not try to drive an audio input device of the cloud phone; instead, an audio acquisition request is sent to the user terminal, asking it to drive the audio input device on the user terminal to acquire audio. As shown in FIG. 5, the process in which the cloud device sends the audio acquisition request to the user terminal includes the following steps:

Step 501, when any cloud application on the cloud device detects a trigger event for the voice function, that cloud application generates an audio module start request and sends it to the audio service layer.

The audio module start request is used to request that the audio acquisition device be started.

Step 502, the audio service layer forwards the audio module start request to the hardware abstraction layer.

Step 503, in response to the audio module start request, the hardware abstraction layer sends an audio acquisition request to the cloud communication module.

In one embodiment, the hardware abstraction layer may send the audio acquisition request to the cloud communication module through an inter-process communication (IPC) mechanism.

Step 504, the cloud communication module forwards the audio acquisition request to the user terminal.

In one embodiment, the cloud communication module may send the received audio acquisition request to the user terminal over a data channel between the two, such as a WebRTC (Web Real-Time Communication) long-lived connection.

Through the flow shown in FIG. 5, a cloud device implemented on the Android system sends the audio acquisition request to the user terminal.
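The relay chain of steps 501 to 504 can be modelled with plain objects, one per layer. This is a simplified sketch, not Android code: the class names are hypothetical, direct method calls stand in for the IPC mechanism and for the WebRTC data channel, and the message dictionaries are an assumed format.

```python
from typing import Callable, Dict

Message = Dict[str, object]
Send = Callable[[Message], None]


class CloudCommModule:
    """Stand-in for the cloud communication module (hypothetical)."""

    def __init__(self, send_to_terminal: Send) -> None:
        self.send_to_terminal = send_to_terminal  # e.g. a WebRTC data channel

    def on_ipc_message(self, msg: Message) -> None:
        # Step 504: forward the audio acquisition request to the user terminal.
        self.send_to_terminal(msg)


class VirtualAudioHal:
    """Audio HAL of a cloud phone without a physical microphone (hypothetical)."""

    def __init__(self, comm: CloudCommModule) -> None:
        self.comm = comm  # a direct call stands in for the IPC channel

    def open_input(self, start_req: Message) -> None:
        # Step 503: instead of driving local hardware, ask the user terminal
        # to capture audio by emitting an audio acquisition request.
        self.comm.on_ipc_message(
            {"type": "audio_acquisition_request", "target": start_req["target"]}
        )


class AudioServiceLayer:
    def __init__(self, hal: VirtualAudioHal) -> None:
        self.hal = hal

    def start_audio_module(self, start_req: Message) -> None:
        # Step 502: forward the audio module start request to the HAL.
        self.hal.open_input(start_req)


def on_voice_trigger(service: AudioServiceLayer, target: Message) -> None:
    # Step 501: the cloud application issues an audio module start request.
    service.start_audio_module({"type": "audio_module_start", "target": target})
```

The key design point the sketch reflects is that only the HAL changes: the cloud application and the service layer issue the same start request they would on a device with a real microphone.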

Step 203, the user terminal responds to the audio acquisition request by calling the audio module to acquire audio.

In practice, the user terminal responds to the audio acquisition request by calling an audio module to acquire audio data. Optionally, the audio module may be built into the user terminal, or it may be an external audio input device communicatively connected to the user terminal, including but not limited to a headset (including a Bluetooth headset) or a microphone. The external audio input device may be connected to the user terminal by, but not limited to, a wired, Bluetooth, or Wi-Fi connection. Thus, calling the audio module to acquire audio means either calling an audio module built into the user terminal or calling an audio module communicatively connected to it.

In one embodiment, before calling the audio module to acquire audio, the user terminal first checks whether the client application used to display the cloud application's data stream has permission to call the audio module. This call permission may be requested in advance. If the check shows that the client application has the call permission, the user terminal calls the audio module directly to acquire audio; if not, the user terminal first generates and outputs a notification message prompting the user to grant the client application permission to call the audio module, so that the user can grant it based on the notification.

As an optional implementation, the user terminal may check whether the client application has the call permission for the audio module by reading local setting information. The local setting information records at least the call permissions of multiple client applications on the user terminal for the audio module; an example is shown in Table 1 below:

TABLE 1

Client application	Has call permission for the audio module
A application	Yes
B application	No
…	…

Based on this, the user terminal can look up the client application's call permission for the audio module in the local setting information shown in Table 1.
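The permission check against the local setting information can be sketched as a dictionary lookup modelled on Table 1. The notification callback and message text are hypothetical, and treating an application missing from the table as "not permitted" is an assumption rather than something the patent states.

```python
from typing import Callable, Dict

# Local setting information modelled on Table 1 (names are the table's
# placeholders): client application -> has call permission for the audio module.
LOCAL_SETTINGS: Dict[str, bool] = {"A application": True, "B application": False}


def ensure_audio_permission(
    app: str, settings: Dict[str, bool], notify: Callable[[str], None]
) -> bool:
    """Return True if `app` may call the audio module; otherwise output a
    notification asking the user to grant the permission and return False."""
    if settings.get(app, False):  # unknown applications are treated as not permitted
        return True
    notify(f"Allow {app} to use the audio module to capture audio?")
    return False
```

On a real Android terminal this lookup would be replaced by the platform's runtime permission check for microphone access.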

In one embodiment, as described above, the audio acquisition request may carry target audio parameters. In this embodiment, the user terminal responds to the audio acquisition request by first setting the audio acquisition parameters of the audio module based on the target audio parameters carried in the request, and then calling the audio module to acquire audio, thereby obtaining audio data.

In practice, for a given type of audio parameter, the audio module may support one or more values (for example, both mono and stereo), and these do not necessarily include the target audio parameter. Accordingly, setting the audio acquisition parameters of the audio module based on the target audio parameters carried in the audio acquisition request includes:

when the user terminal determines that the audio module supports the target audio parameter, setting the audio acquisition parameter of the audio module to the target audio parameter;

when the user terminal determines that the audio module does not support the target audio parameter and supports only one audio parameter, setting the audio acquisition parameter of the audio module to that sole supported parameter;

when the user terminal determines that the audio module does not support the target audio parameter but supports multiple audio parameters, selecting one of the supported audio parameters and setting the audio acquisition parameter of the audio module to the selected parameter.

As one optional implementation, the user terminal randomly selects one of the multiple audio parameters supported by the audio module.

As another optional implementation, the user terminal selects, from the multiple audio parameters supported by the audio module, the one closest to the target audio parameter. For example, if the target audio sampling rate is 50 Hz and the audio module supports sampling rates of 40 Hz and 200 Hz, the user terminal may select 40 Hz as the matching audio parameter. When several supported parameters are equally close to the target (for example, when the audio module supports 40 Hz, 60 Hz, and 200 Hz), one of them may be chosen at random, or the larger or the smaller one may be chosen; the invention does not limit this.

Setting the audio acquisition parameters of the audio module based on the target audio parameters carried in the audio acquisition request avoids, as far as possible, the situation in which the cloud application on the cloud device cannot parse the audio data received from the user terminal.

Step 204, the user terminal sends the audio data to the cloud device.

In the embodiments of the invention, the call permission may also include (or be regarded as) an audio read permission; that is, a user terminal that has the call permission also has permission to read the audio data. Alternatively, the call permission may be independent of the audio read permission, in which case the user terminal must also verify that it has the audio read permission before reading the audio data. Like the call permission, the audio read permission may be requested in advance. If the user terminal has the audio read permission, it can read the audio data directly; if not, it can request the permission in the same way as the call permission, which is not described further here.

In one embodiment, the user terminal may actively read the audio data collected by the audio module and send it to the cloud device; that is, the user terminal actively sends the collected audio data to the cloud device.

In another embodiment, after receiving the audio acquisition request from the cloud device, the user terminal reads the audio data collected by the audio module and sends it to the cloud device; that is, the user terminal sends the collected audio data passively. The specific implementation of this embodiment is described in the flow shown in FIG. 8 below.

As an optional implementation, the user terminal may read the audio data collected by the audio module in a loop, sending the read data to the cloud device each time a certain number of bytes has been read.
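The loop-and-send behaviour can be sketched with any file-like byte source standing in for the audio module. The chunk size of 640 bytes is an assumed value (20 ms of 16 kHz, mono, 16-bit PCM: 16000 × 0.02 × 2 = 640); the patent only says "a certain number of bytes".

```python
import io
from typing import BinaryIO, Callable


def pump_audio(source: BinaryIO, send: Callable[[bytes], None], chunk_size: int = 640) -> int:
    """Read audio from `source` in fixed-size chunks and send each chunk as
    soon as it is read. Returns the total number of bytes sent."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of stream (a live microphone would block instead)
            break
        send(chunk)  # the last chunk may be shorter than chunk_size
        total += len(chunk)
    return total


if __name__ == "__main__":
    sent = []
    n = pump_audio(io.BytesIO(b"\x00" * 1500), sent.append)
    print(n, [len(c) for c in sent])
```

Sending per chunk, rather than after capture ends, keeps latency bounded by the chunk duration, which matters for live voice chat.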

In one example, the user terminal is a smartphone implemented on the Android system, as shown in FIG. 6. The user terminal includes, but is not limited to, a client, an audio service layer, a hardware abstraction layer, an audio module, and an audio buffer. The audio service layer may be AF, and the hardware abstraction layer may be the Audio HAL; both are described above and are not repeated here.

For the user terminal shown in FIG. 6, the process in which the user terminal reads the audio data collected by the audio module, shown in FIG. 7, includes the following steps:

Step 701, the client sends an audio data read request to the audio service layer.

The audio data read request is used to request reading of audio data.

Step 702, the audio service layer forwards the audio data read request to the hardware abstraction layer.

Step 703, in response to the audio data read request, the hardware abstraction layer reads audio data from the audio buffer of the audio module and sends it to the audio service layer.

In response to the audio data read request, the hardware abstraction layer may call the relevant API to read audio data from the audio buffer of the audio module. The audio buffer stores the audio data acquired by the audio module.

Step 704, the audio service layer forwards the received audio data to the client.

Thus, the description of the flow shown in fig. 7 is completed.
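The fig. 7 read path can be modelled as a chain of layers in which each layer forwards the request to the next. The classes below are hypothetical stand-ins for the client, the audio service layer, the hardware abstraction layer, and the audio buffer. They do not use AudioFlinger or Audio HAL interfaces. In this sketch, reading also drains the buffer; that behaviour is an assumption, not part of the embodiment.

```python
from collections import deque


class AudioBuffer:
    """Hypothetical stand-in for the audio module's buffer of captured audio."""

    def __init__(self) -> None:
        self._chunks = deque()

    def write(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def read_all(self) -> bytes:
        data = b"".join(self._chunks)
        self._chunks.clear()           # this sketch drains what it returns
        return data


class HardwareAbstractionLayer:
    def __init__(self, buffer: AudioBuffer) -> None:
        self.buffer = buffer

    def handle_read_request(self) -> bytes:
        # Step 703: read from the audio buffer and return the data upward.
        return self.buffer.read_all()


class AudioServiceLayer:
    def __init__(self, hal: HardwareAbstractionLayer) -> None:
        self.hal = hal

    def handle_read_request(self) -> bytes:
        # Step 702 forwards the request down; step 704 forwards the data back up.
        return self.hal.handle_read_request()


class Client:
    def __init__(self, service: AudioServiceLayer) -> None:
        self.service = service

    def read_audio(self) -> bytes:
        # Step 701: issue the audio data read request.
        return self.service.handle_read_request()
```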

Step 205, the cloud device receives audio data from the user terminal.

In the embodiment of the invention, after receiving the audio data from the user terminal, the cloud device can operate on the received audio data to provide the corresponding service. For example, the cloud device may forward the received audio data to other devices, such as other user terminals or audio playback devices. In one exemplary application scenario, the cloud device forwards voice received from one user terminal to other user terminals. This enables voice interaction among different users of the cloud application.

This completes the description of the flow shown in fig. 2.

In the flow shown in fig. 2, when the cloud device detects a trigger event for a voice function, it generates an audio acquisition request and sends it to the user terminal. In response, the user terminal calls the audio module to collect audio data and sends the audio data to the cloud device.
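To illustrate the voice-interaction scenario, the relay below forwards audio received from one user terminal to every other terminal in the same session. The session and terminal identifiers and the callback-based transport are assumptions made for this sketch.

```python
from collections import defaultdict
from typing import Callable, Dict

Send = Callable[[bytes], None]


class VoiceRelay:
    """Hypothetical cloud-side relay: forwards one terminal's audio to the other terminals."""

    def __init__(self) -> None:
        # session id -> {terminal id: callback that delivers audio to that terminal}
        self.sessions: Dict[str, Dict[str, Send]] = defaultdict(dict)

    def join(self, session: str, terminal: str, send: Send) -> None:
        self.sessions[session][terminal] = send

    def on_audio(self, session: str, sender: str, chunk: bytes) -> int:
        """Forward `chunk` to every terminal in `session` except `sender`; return the recipient count."""
        members = self.sessions.get(session, {})
        recipients = [t for t in members if t != sender]
        for terminal in recipients:
            members[terminal](chunk)
        return len(recipients)
```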

On the basis of the flow shown in fig. 2, please refer to fig. 8, which is a flowchart of another cloud audio input method according to an exemplary embodiment of the present invention. As shown in fig. 8, the process may include the steps of:

Step 801, when detecting a trigger event for a voice function, the cloud device generates an audio acquisition request.

Step 801 is described in detail under step 201 above and is not repeated here.

Step 802, the cloud device sends an audio acquisition request to the user terminal through the cloud communication module.

When the cloud device and the user terminal are communicatively connected through the cloud communication module, sending the audio acquisition request from the cloud device to the user terminal includes the following: the cloud device sends the audio acquisition request to the cloud communication module, and the cloud communication module forwards it to the user terminal.

Step 803, in response to the audio acquisition request, the user terminal calls the audio module to collect audio and obtain audio data, and sends a first indication message to the cloud device through the cloud communication module.

The first indication message indicates that the audio module has been turned on.

As one embodiment, the audio module on the user terminal may already be on. In this case, when the user terminal determines, in response to the audio acquisition request, that the audio module is on, it can call the audio module to collect audio data and send the first indication message to the cloud device through the cloud communication module.

As another embodiment, the audio module on the user terminal may be off. In this case, when the user terminal determines, in response to the audio acquisition request, that the audio module is off, it first turns the audio module on. It then calls the audio module to collect audio data and sends the first indication message to the cloud device through the cloud communication module.
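The two embodiments of step 803 differ only in whether the audio module must be started first. The following is a minimal sketch that assumes a hypothetical `AudioModule` object and a dictionary-shaped first indication message. The field names are illustrative and are not defined by the embodiment. Audio collection itself (`capture`) is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AudioModule:
    """Hypothetical audio module on the user terminal."""
    on: bool = False
    captured: List[bytes] = field(default_factory=list)

    def start(self) -> None:
        self.on = True

    def capture(self, frame: bytes) -> None:
        if not self.on:
            raise RuntimeError("audio module is off")
        self.captured.append(frame)


def handle_audio_acquisition_request(module: AudioModule, send_to_cloud: Callable[[dict], None]) -> bool:
    """Step 803: start the module if it is off, then report the first indication message.

    Returns True if the module had to be started.
    """
    started_now = not module.on
    if started_now:          # second embodiment: the module was off
        module.start()
    send_to_cloud({"type": "first_indication", "audio_module_on": module.on})
    return started_now
```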

In one example, based on the user terminal shown in fig. 6, and as shown in fig. 9, the process by which the user terminal starts the audio module for audio collection includes the following steps:

Step 901, the client sends an audio module start request to the audio service layer.

The audio module start request is used to request that the audio module be started.

Step 902, the audio service layer forwards the audio module start request to the hardware abstraction layer.

Step 903, in response to the audio module start request, the hardware abstraction layer starts the audio module.

In response to the audio module start request, the hardware abstraction layer may call the relevant API to start the audio module.

Through the flow shown in fig. 9, the Android-based user terminal starts its audio module.
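The following sketches the fig. 9 start path. Each layer forwards the start request to the next, and the hardware abstraction layer finally turns the module on. All class and method names are hypothetical. The assignment in `handle_start_request` stands in for whatever driver or HAL call a real device would make.

```python
class AudioModule:
    """Hypothetical audio module; `started` stands in for the real hardware state."""

    def __init__(self) -> None:
        self.started = False


class HardwareAbstractionLayer:
    def __init__(self, module: AudioModule) -> None:
        self.module = module

    def handle_start_request(self) -> bool:
        # Step 903: start the module (a real HAL would call into the audio driver here).
        self.module.started = True
        return self.module.started


class AudioServiceLayer:
    def __init__(self, hal: HardwareAbstractionLayer) -> None:
        self.hal = hal

    def handle_start_request(self) -> bool:
        # Step 902: forward the start request to the hardware abstraction layer.
        return self.hal.handle_start_request()


class Client:
    def __init__(self, service: AudioServiceLayer) -> None:
        self.service = service

    def start_audio_module(self) -> bool:
        # Step 901: send the audio module start request to the audio service layer.
        return self.service.handle_start_request()
```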

Step 804, in response to the first indication message, the cloud device sends an audio acquisition request to the user terminal through the cloud communication module.

As one embodiment, when the cloud device receives the first indication message, it can determine that the user terminal has started audio collection. It can then send an audio acquisition request to the user terminal to request the collected audio data.

When the cloud device is communicatively connected to the user terminal through the cloud communication module, sending the audio acquisition request from the cloud device to the user terminal includes the following: the cloud device sends the audio acquisition request to the cloud communication module, and the cloud communication module forwards it to the user terminal.

Step 805, in response to the audio acquisition request, the user terminal sends the audio data to the cloud device through the cloud communication module.

As one embodiment, the user terminal sends the audio data to the cloud device only after receiving the audio acquisition request from the cloud device. In other words, the user terminal sends the audio data passively.

When the cloud device and the user terminal are communicatively connected through the cloud communication module, sending the audio data from the user terminal to the cloud device includes the following: the user terminal sends the audio data to the cloud communication module, and the cloud communication module forwards it to the cloud device.

As one embodiment, after receiving the audio data from the user terminal, the cloud communication module stores the audio data in a cloud storage area. It then sends the cloud device a second indication message indicating that the audio data has been stored there. In response to the second indication message, the cloud device reads the audio data from the cloud storage area.
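The store-then-notify handshake can be sketched as follows. The sketch assumes an in-memory list as the cloud storage area and a callback as the channel from the cloud communication module to the cloud device. The message shape `{"type": "second_indication"}` is hypothetical.

```python
from typing import Callable, List


class CloudStorageArea:
    """Hypothetical in-memory cloud storage area."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def store(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def drain(self) -> bytes:
        data = b"".join(self._chunks)
        self._chunks.clear()
        return data


class CloudCommunicationModule:
    def __init__(self, storage: CloudStorageArea, notify_cloud_device: Callable[[dict], None]) -> None:
        self.storage = storage
        self.notify = notify_cloud_device

    def on_audio_from_terminal(self, chunk: bytes) -> None:
        self.storage.store(chunk)                     # store the data first,
        self.notify({"type": "second_indication"})    # then tell the cloud device


class CloudDevice:
    def __init__(self, storage: CloudStorageArea) -> None:
        self.storage = storage
        self.received: List[bytes] = []

    def on_message(self, msg: dict) -> None:
        if msg.get("type") == "second_indication":
            self.received.append(self.storage.drain())  # read from the storage area
```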

Optionally, each cloud device may correspond to one or more cloud storage areas, and one cloud storage area may correspond to multiple cloud devices. In other words, different cloud devices may share the same cloud storage area or use different ones. When one cloud storage area corresponds to multiple cloud devices, those cloud devices may all belong to the same cloud server or may be distributed across multiple cloud servers.

Optionally, one cloud storage area may correspond to a single cloud device. Specifically, the cloud storage area is then a built-in storage area of that cloud device.

As one embodiment, the audio format of the audio data collected by the audio module may differ from the audio format supported by the cloud device (hereinafter, for convenience, the target audio format). In that case, the cloud storage area stores format-converted audio data that is in the target audio format. This prevents the cloud device from being unable to recognize the received audio data.

As an optional implementation, when the audio format of the collected audio data differs from the target audio format, the user terminal may convert the audio data and send the converted audio data to the cloud communication module. Format-converted audio data is then what gets stored in the cloud storage area.

As another optional implementation, when the audio format of the collected audio data differs from the target audio format, the cloud communication module may convert the received audio data (that is, the audio data collected by the audio module) and store the converted audio data in the cloud storage area. This likewise ensures that format-converted audio data is stored in the cloud storage area.

In one example, the target audio format may be included in the target audio parameters. The user terminal or the cloud communication module can therefore convert the audio data collected by the audio module according to the target audio format contained in the target audio parameters.

In another example, the user terminal or the cloud communication module may record in advance the target audio format supported by each cloud application. On this basis, it can look up locally the target audio format supported by the target cloud application and convert the audio data collected by the audio module accordingly. Here, the target cloud application is the cloud application for which a trigger event for a voice function was detected in step 201 above.

It should be noted that, in practice, different cloud applications may correspond to different target audio formats, and a single cloud application may correspond to one or more target audio formats. When a cloud application corresponds to multiple target audio formats, the user terminal or the cloud communication module may, as one optional implementation, select one of them (for example, at random) and convert the collected audio data into that format. As another optional implementation, the user terminal or the cloud communication module may convert the collected audio data into each of the target audio formats.
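Either the user terminal or the cloud communication module can run the same conversion routine. As a minimal example, assume the audio module captures 16-bit little-endian stereo PCM and the target audio format has the same sample rate and sample width but is mono. The conversion is then a downmix that averages the left and right samples. `AudioFormat` and the function names are illustrative. Other conversions, such as resampling or encoding, are deliberately left unimplemented.

```python
import struct
from typing import NamedTuple


class AudioFormat(NamedTuple):
    sample_rate: int
    channels: int
    bits: int


def stereo_to_mono_s16le(pcm: bytes) -> bytes:
    """Downmix interleaved 16-bit little-endian stereo PCM to mono by averaging L and R."""
    if len(pcm) % 4:
        raise ValueError("stereo 16-bit frames are 4 bytes each")
    n = len(pcm) // 2
    samples = struct.unpack(f"<{n}h", pcm)
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, n, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)


def convert_if_needed(pcm: bytes, source: AudioFormat, target: AudioFormat) -> bytes:
    """Return `pcm` unchanged if the formats match, else convert it to `target`."""
    if source == target:
        return pcm
    if source.bits == 16 and source.channels == 2 and source._replace(channels=1) == target:
        return stereo_to_mono_s16le(pcm)
    raise NotImplementedError(f"{source} -> {target} is not covered by this sketch")
```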
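The per-application lookup and the two selection strategies (pick one format, or convert to all of them) might look like the following. The application identifiers and format labels in `SUPPORTED_FORMATS` are made-up placeholders for the table that the user terminal or cloud communication module records in advance.

```python
import random
from typing import Dict, List, Optional

# Hypothetical table recorded in advance: cloud application -> supported target formats.
SUPPORTED_FORMATS: Dict[str, List[str]] = {
    "app_a": ["pcm_16k_mono_s16"],
    "app_b": ["pcm_16k_mono_s16", "pcm_48k_mono_s16"],
}


def pick_target_formats(app_id: str, convert_to_all: bool = False,
                        rng: Optional[random.Random] = None) -> List[str]:
    """Return the target format(s) the collected audio should be converted into."""
    formats = SUPPORTED_FORMATS.get(app_id)
    if not formats:
        raise KeyError(f"no target audio format recorded for {app_id!r}")
    if convert_to_all or len(formats) == 1:
        return list(formats)          # convert once per supported format
    chooser = rng if rng is not None else random
    return [chooser.choice(formats)]  # or pick a single format at random
```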

Step 806, the cloud device receives the audio data from the cloud communication module.

In one example, the cloud device is a cloud phone based on the Android system. As shown in fig. 10, the process by which the cloud device receives audio data from the cloud communication module includes the following steps:

Step 1001, the hardware abstraction layer receives the second indication message from the cloud communication module.

Step 1002, in response to the second indication message, the hardware abstraction layer reads the audio data from the cloud storage area.

In response to the second indication message, the hardware abstraction layer may call the relevant API to read the audio data from the cloud storage area.

Step 1003, the hardware abstraction layer sends the read audio data to the audio service layer.

Step 1004, the audio service layer forwards the received audio data to the cloud application.

Through the flow shown in fig. 10, the cloud device receives the audio data from the cloud communication module.
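The following sketches the fig. 10 receive path on the cloud device. It uses hypothetical classes for the cloud storage area, the audio service layer, the hardware abstraction layer, and the cloud application. Reading from storage drains it, so a repeated second indication with no new data delivers nothing. That behaviour is a choice made for this sketch.

```python
from typing import Iterable, List


class CloudStorageArea:
    """Hypothetical cloud storage area pre-filled by the cloud communication module."""

    def __init__(self, chunks: Iterable[bytes] = ()) -> None:
        self.chunks: List[bytes] = list(chunks)

    def read(self) -> bytes:
        data = b"".join(self.chunks)
        self.chunks.clear()
        return data


class CloudApplication:
    def __init__(self) -> None:
        self.audio: List[bytes] = []

    def on_audio(self, data: bytes) -> None:
        self.audio.append(data)


class AudioServiceLayer:
    def __init__(self, app: CloudApplication) -> None:
        self.app = app

    def deliver(self, data: bytes) -> None:
        # Step 1004: forward the audio data to the cloud application.
        self.app.on_audio(data)


class HardwareAbstractionLayer:
    def __init__(self, storage: CloudStorageArea, service: AudioServiceLayer) -> None:
        self.storage = storage
        self.service = service

    def on_second_indication(self) -> None:
        # Step 1001: the second indication message has arrived.
        data = self.storage.read()        # step 1002: read from the cloud storage area
        if data:
            self.service.deliver(data)    # step 1003: hand the data to the audio service layer
```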

In the flow shown in fig. 8, when the cloud device detects a trigger event for a voice function, it generates an audio acquisition request and sends it to the user terminal. In response, the user terminal calls the audio module to collect audio data and sends the cloud device a first indication message indicating that the audio module is on. In response to the first indication message, the cloud device sends an audio acquisition request to the user terminal. The user terminal then responds by sending the audio data to the cloud device. Thus, while the cloud application is running, the cloud device can collect audio through the audio module corresponding to the user terminal. Even if the cloud device itself has no audio collection capability, it can obtain audio data by interacting with the user terminal. This realizes cloud audio input and meets the user's voice needs when using the cloud application.
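The ordering constraints of the fig. 8 exchange, as seen from the cloud device, can be expressed as a small state machine. Audio data is accepted only after the first indication message has been received and the second audio acquisition request has been sent. The state names and message strings are hypothetical, and the cloud communication module is treated as a transparent pipe.

```python
from enum import Enum, auto
from typing import List


class CloudState(Enum):
    IDLE = auto()
    WAITING_FOR_MODULE = auto()   # collection request sent (step 802)
    WAITING_FOR_AUDIO = auto()    # data request sent after the first indication (step 804)


class CloudSide:
    def __init__(self) -> None:
        self.state = CloudState.IDLE
        self.outbox: List[str] = []
        self.audio: List[bytes] = []

    def on_voice_trigger(self) -> None:
        # Steps 801-802; repeated triggers while a request is pending are ignored.
        if self.state is CloudState.IDLE:
            self.outbox.append("audio_acquisition_request:collect")
            self.state = CloudState.WAITING_FOR_MODULE

    def on_first_indication(self) -> None:
        # Step 804: the module is on, so ask for the collected audio data.
        if self.state is not CloudState.WAITING_FOR_MODULE:
            raise RuntimeError(f"unexpected first indication in state {self.state.name}")
        self.outbox.append("audio_acquisition_request:read")
        self.state = CloudState.WAITING_FOR_AUDIO

    def on_audio_data(self, chunk: bytes) -> None:
        # Step 806: accept audio only after the handshake above.
        if self.state is not CloudState.WAITING_FOR_AUDIO:
            raise RuntimeError(f"audio data received in state {self.state.name}")
        self.audio.append(chunk)
```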

The cloud audio input device provided by the invention is described in the following embodiments with reference to the accompanying drawings:

First, the cloud audio input device provided by the invention is described from the cloud device side:

Referring to fig. 11, a block diagram of an embodiment of a cloud audio input device according to an exemplary embodiment of the present invention is shown.

As shown in fig. 11, the apparatus includes: a processing unit 1101, a transmitting unit 1102, and a receiving unit 1103.

The processing unit 1101 is configured to generate an audio acquisition request when a trigger event for a voice function is detected, where the audio acquisition request is used to request the user terminal to perform audio acquisition;

the sending unit 1102 is configured to send the audio collection request to the user terminal, so that the user terminal responds to the audio collection request and invokes an audio module to perform audio collection to obtain audio data;

a receiving unit 1103, configured to receive the audio data sent by the user terminal.

In a possible implementation, the cloud device and the user terminal establish a communication connection through a cloud communication module;

the cloud communication module is used to implement communication connections between a plurality of cloud devices and their corresponding user terminals; and/or,

the cloud communication module is used to implement communication connections between a plurality of cloud applications on the cloud device and their corresponding user terminals.
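One way the cloud communication module could serve several cloud devices, and several cloud applications on one cloud device, is a routing table keyed by (cloud device, cloud application). The following is a sketch under that assumption; the identifiers and message format are illustrative.

```python
from typing import Callable, Dict, Tuple

Send = Callable[[dict], None]
RouteKey = Tuple[str, str]  # (cloud device id, cloud application id)


class CloudCommunicationModule:
    """Hypothetical router between cloud devices/applications and user terminals."""

    def __init__(self) -> None:
        self.terminals: Dict[str, Send] = {}
        self.routes: Dict[RouteKey, str] = {}

    def register_terminal(self, terminal_id: str, send: Send) -> None:
        self.terminals[terminal_id] = send

    def bind(self, cloud_device: str, cloud_app: str, terminal_id: str) -> None:
        self.routes[(cloud_device, cloud_app)] = terminal_id

    def forward_to_terminal(self, cloud_device: str, cloud_app: str, msg: dict) -> str:
        """Deliver `msg` from a cloud application to its bound terminal; return that terminal's id."""
        terminal_id = self.routes[(cloud_device, cloud_app)]  # KeyError if unbound
        self.terminals[terminal_id]({**msg, "from": (cloud_device, cloud_app)})
        return terminal_id
```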

In a possible embodiment, the device further comprises (not shown in fig. 11):

a first message receiving unit, configured to receive a first indication message from the user terminal, where the first indication message is used to indicate that the audio module is turned on;

the receiving unit 1103 is further configured to:

send, in response to the first indication message, an audio acquisition request to the user terminal, wherein the audio acquisition request is used to request the collected audio data, so that the user terminal, in response to the audio acquisition request, reads the audio data collected by the audio module and sends the read audio data to the cloud device;

and receive the audio data from the user terminal.

In a possible embodiment, the device further comprises (not shown in fig. 11):

the second message receiving unit is used for receiving a second indication message from the cloud communication module, wherein the second indication message is used for indicating that the audio data are stored in a cloud storage area;

and the reading unit is used for reading the audio data from the cloud storage area.

In a possible implementation manner, when the audio format of the audio data collected by the audio module is inconsistent with the target audio format supported by the cloud device, the cloud storage area stores the audio data subjected to format conversion, where the audio data subjected to format conversion belongs to the target audio format.

Second, the cloud audio input device provided by the invention is described from the user terminal side:

Referring to fig. 12, a block diagram of an embodiment of a cloud audio input device according to an exemplary embodiment of the present invention is shown.

As shown in fig. 12, the apparatus includes: a receiving unit 1201, a calling unit 1202, and a transmitting unit 1203.

The receiving unit 1201 is configured to receive an audio collection request from the cloud device, where the audio collection request is used to request the user terminal to perform audio collection;

a calling unit 1202, configured to respond to the audio collection request, and call an audio module to perform audio collection, so as to obtain audio data;

a sending unit 1203 is configured to send the audio data to the cloud device.

In a possible embodiment, the device further comprises (not shown in fig. 12):

a first message sending unit, configured to send a first indication message to the cloud device, where the first indication message is used to indicate that the audio module is turned on, so that the cloud device sends an audio acquisition request to the user terminal in response to the first indication message, where the audio acquisition request is used to request to acquire audio data;

The sending unit 1203 is further configured to:

respond to the audio acquisition request by reading the audio data collected by the audio module and sending the read audio data to the cloud device.

In a possible implementation manner, the calling unit 1202 is further configured to:

call an audio module built into the user terminal to perform audio collection; or,

call an audio module communicatively connected to the user terminal to perform audio collection.

Fig. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 1300 shown in fig. 13 includes at least one processor 1301, a memory 1302, at least one network interface 1304, and other user interfaces 1303. The components of the electronic device 1300 are coupled together by a bus system 1305, which implements connection and communication among them. In addition to a data bus, the bus system 1305 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, all buses are labeled as the bus system 1305 in fig. 13.

The user interface 1303 may include, among other things, a display, a keyboard, or a pointing device (e.g., a mouse, a trackball, a touch pad, or a touch screen, etc.).

It will be appreciated that the memory 1302 in embodiments of the invention may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory, among others. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 1302 described herein is intended to include, without being limited to, these and any other suitable types of memory.

In some implementations, the memory 1302 stores the following elements, executable units or data structures, or a subset thereof, or an extended set thereof: an operating system 13021 and application programs 13022.

The operating system 13021 contains various system programs, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks. The application programs 13022 include various application programs such as a media player (MediaPlayer), a Browser (Browser), and the like for realizing various application services. The program for implementing the method of the embodiment of the present invention may be contained in the application program 13022.

In an embodiment of the present invention, the processor 1301 executes the method steps provided by the method embodiments by calling a program or instructions stored in the memory 1302, specifically a program or instructions stored in the application programs 13022. These steps include, for example:

receiving the audio data from the user terminal;

or,

sending the audio data to the cloud device.

The method disclosed in the above embodiments of the invention may be applied to, or implemented by, the processor 1301. The processor 1301 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated logic circuits in the hardware of the processor 1301 or by instructions in the form of software. The processor 1301 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It can implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the invention. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software units in a decoding processor. The software units may reside in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 1302, and the processor 1301 reads the information in the memory 1302 and completes the steps of the above method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented by means of units that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

The electronic device provided in this embodiment may be the electronic device shown in fig. 13. It can perform all steps of the cloud audio input method provided in the embodiments of the invention and thereby achieve the technical effects of that method. For details, refer to the related description above, which is omitted here for brevity.

The embodiment of the invention also provides a storage medium (a computer-readable storage medium) that stores one or more programs. The storage medium may include volatile memory, such as random access memory. It may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk, or a combination of the above types of memory.

When the one or more programs in the storage medium are executed by one or more processors, the cloud audio input method performed on the electronic device side is implemented.

The processor executes the cloud audio input program stored in the memory to implement the following steps of the cloud audio input method performed on the electronic device side:

receiving the audio data from the user terminal;

or,

sending the audio data to the cloud device.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of function in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), internal memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The foregoing description of the embodiments is provided to illustrate the general principles of the invention. It describes only specific embodiments and is not intended to limit the scope of protection of the invention. Any modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A cloud audio input method, characterized in that the method is applied to a cloud system comprising a cloud device and a user terminal, and the method comprises:

the user terminal sends the audio data to the cloud device;

the cloud device determines whether a trigger event for a voice function is detected in the following manner:

the cloud device sends an application interface of a cloud application to the user terminal;

the user terminal displays the received application interface;

when the user terminal detects any trigger event, the user terminal sends description information of that trigger event to the cloud device;

and the cloud device determines, based on the description information, whether that trigger event is a trigger event for a voice function of the cloud application.

2. The method of claim 1, wherein the cloud device and the user terminal establish a communication connection through a cloud communication module;

3. The method according to claim 2, wherein the cloud device further comprises an audio service layer and a hardware abstraction layer, and generating an audio acquisition request and sending it to the user terminal when the cloud device detects a trigger event for a voice function comprises:

when any cloud application on the cloud device detects a trigger event for a voice function, that cloud application generates an audio module start request and sends it to the audio service layer, wherein the audio module start request is used to request starting of the audio acquisition device;

the audio service layer forwards the audio module start request to the hardware abstraction layer;

in response to the audio module start request, the hardware abstraction layer sends the audio acquisition request to the cloud communication module;

and the cloud communication module forwards the audio acquisition request to the user terminal.

4. The method according to claim 1, wherein the method further comprises:

the user terminal sends a first indication message to the cloud device, wherein the first indication message indicates that the audio module is on;

the cloud device, in response to the first indication message, sends an audio acquisition request to the user terminal, wherein the audio acquisition request is used to request the collected audio data;

the user terminal sending the audio data to the cloud device comprises:

the user terminal, in response to the audio acquisition request, reads the audio data collected by the audio module and sends the read audio data to the cloud device.

5. A method according to claim 3, characterized in that the method further comprises:

the cloud device receives a second indication message from the cloud communication module, wherein the second indication message is used for indicating that the audio data is stored in a cloud storage area;

and the cloud device reads the audio data from the cloud storage area.

6. The method according to claim 5, wherein, when the audio format of the audio data collected by the audio module is inconsistent with a target audio format supported by the cloud device, the cloud storage area stores format-converted audio data, the format-converted audio data being in the target audio format.

7. The method of claim 1, wherein the audio acquisition request carries target audio parameters;

the method further comprises the steps of:

when the user terminal determines that the audio module supports the target audio parameters, setting the audio acquisition parameters of the audio module based on the target audio parameters; or,

when the user terminal determines that the audio module does not support the target audio parameter and the audio module supports only one audio parameter, setting an audio acquisition parameter of the audio module based on the audio parameter supported by the audio module; or,

when the user terminal determines that the audio module does not support the target audio parameter and the audio module supports multiple audio parameters, selecting one audio parameter from the multiple audio parameters supported by the audio module, and setting the audio acquisition parameters of the audio module based on the selected audio parameter.

8. A cloud audio input method, characterized in that the method is applied to a cloud device in a cloud system, the cloud system further comprising a user terminal, and the method comprises:

receiving the audio data from the user terminal;

the user terminal displays the received application interface;

9. The method of claim 8, wherein the cloud device establishes a communication connection with the user terminal through a cloud communication module;

10. The method according to claim 9, wherein the cloud device further comprises an audio service layer and a hardware abstraction layer, and generating an audio acquisition request when a trigger event for a voice function is detected and sending it to the user terminal comprises:

in response to the audio module start request, the hardware abstraction layer sends the audio acquisition request to the cloud communication module, so that the cloud communication module forwards it to the user terminal.

11. The method of claim 8, wherein the method further comprises:

receiving a first indication message from the user terminal, wherein the first indication message indicates that the audio module has been started;

said receiving said audio data from said user terminal comprises:

receiving the audio data from the user terminal.
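Claims 11 and 13 describe a handshake: the user terminal reports that its audio module has started, and the cloud device responds by sending the audio acquisition request and then receives audio data. The cloud-side sketch below assumes hypothetical message type strings and, as a further assumption not stated in the claims, discards audio frames that arrive before the first indication message.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CloudAudioReceiver:
    # Callback that sends a message to the user terminal over the cloud connection.
    send_to_terminal: Callable[[dict], None]
    module_started: bool = False
    audio: List[bytes] = field(default_factory=list)

    def handle(self, msg: dict) -> None:
        kind = msg.get("type")
        if kind == "first_indication":
            # The terminal reports that its audio module has started; per claim 13,
            # the cloud device responds by sending the audio acquisition request.
            self.module_started = True
            self.send_to_terminal({"type": "audio_acquisition_request"})
        elif kind == "audio_data" and self.module_started:
            # Accept audio only after the indication message.
            # ASSUMPTION: earlier frames are silently dropped.
            self.audio.append(msg["payload"])
```

A short exchange: an early `audio_data` message is ignored, a `first_indication` message triggers one outgoing request, and a later `audio_data` message is stored in `audio`.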

12. A cloud audio input method, applied to a user terminal in a cloud system, wherein the cloud system further comprises cloud equipment; the method comprises the following steps:

transmitting the audio data to the cloud device;

the user terminal displays the received application interface;

13. The method according to claim 12, wherein the method further comprises:

sending a first indication message to the cloud device, wherein the first indication message indicates that the audio module has been started, so that the cloud device, in response to the first indication message, sends an audio acquisition request to the user terminal, the audio acquisition request being used to request acquisition of audio data;

the sending of the audio data to the cloud device comprises:

14. The method of claim 12, wherein invoking the audio module to acquire audio comprises:

calling an audio module built into the user terminal to acquire audio; or,

15. A cloud system, comprising cloud equipment and a user terminal, wherein the cloud equipment is communicatively connected to the user terminal;

the cloud device is configured to execute the cloud audio input method according to any one of claims 8 to 11;

the user terminal is configured to execute the cloud audio input method according to any one of claims 12 to 14.
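The cooperation in claim 15 (the cloud device requests audio, the user terminal captures it with its audio module and sends it back) can be simulated with two `asyncio` queues standing in for the communication connection. This is a sketch under stated assumptions: the message dicts, the hypothetical `audio_end` end-of-stream marker, and the `fake_builtin_microphone` capture function are illustrative only and are not taken from the claims.

```python
import asyncio
from typing import Callable, List


async def cloud_device(down: asyncio.Queue, up: asyncio.Queue) -> List[bytes]:
    # A trigger event for the voice function is assumed to have been detected:
    # send the audio acquisition request to the user terminal.
    await down.put({"type": "audio_acquisition_request"})
    received: List[bytes] = []
    while True:
        msg = await up.get()
        if msg["type"] == "audio_end":  # hypothetical end-of-stream marker
            return received
        received.append(msg["payload"])


async def user_terminal(down: asyncio.Queue, up: asyncio.Queue,
                        capture: Callable[[], List[bytes]]) -> None:
    msg = await down.get()
    if msg["type"] != "audio_acquisition_request":
        return
    # In response to the request, call the audio module and forward each frame.
    for frame in capture():
        await up.put({"type": "audio_data", "payload": frame})
    await up.put({"type": "audio_end"})


async def run_session(capture: Callable[[], List[bytes]]) -> List[bytes]:
    down: asyncio.Queue = asyncio.Queue()  # cloud device -> user terminal
    up: asyncio.Queue = asyncio.Queue()    # user terminal -> cloud device
    received, _ = await asyncio.gather(cloud_device(down, up),
                                       user_terminal(down, up, capture))
    return received


def fake_builtin_microphone() -> List[bytes]:
    # Hypothetical stand-in for a built-in audio module returning PCM frames.
    return [b"\x00\x01", b"\x02\x03"]
```

Running `asyncio.run(run_session(fake_builtin_microphone))` returns the two frames in order, as received by the cloud device. Queues keep the sketch in a single process; a real system would replace them with the cloud communication module's network transport.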

16. A cloud audio input device, the device comprising:

a receiving unit, configured to receive the audio data sent by the user terminal;

the cloud device determines whether a trigger event for a voice function is detected in the following manner:

the user terminal displays the received application interface;

17. A cloud audio input device, the device comprising:

a transmitting unit, configured to transmit the audio data to the cloud device;

the user terminal displays the received application interface;

18. An electronic device, comprising a processor and a memory, wherein the processor is configured to execute a cloud audio input program stored in the memory to implement the cloud audio input method according to any one of claims 8 to 11 or 12 to 14.

19. A storage medium storing one or more programs executable by one or more processors to implement the cloud audio input method of any of claims 8-11 or 12-14.