CN113810253B - Service providing method, system, device, equipment and storage medium - Google Patents

Service providing method, system, device, equipment and storage medium

Info

Publication number
CN113810253B
Application number
CN202010556222.7A
Authority
CN (China)
Prior art keywords
mobile terminal, sound box, function, intelligent sound, user
Legal status
Active (granted)
Other languages
Chinese (zh)
Other versions
CN113810253A (en)
Inventors
詹亚威, 陈建平, 祝俊, 姜迪建, 黄洽南
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN202010556222.7A
Publication of CN113810253A
Application granted
Publication of CN113810253B


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00: Data switching networks
    • H04L 12/28: Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L 12/2803: Home automation networks
    • H04L 12/2816: Controlling appliance services of a home automation network by calling their functionalities
    • H04L 12/282: Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the invention provide a service providing method, system, apparatus, device and storage medium. In the method, an intelligent sound box responds to a user-triggered selection operation on a target service and determines a plurality of functions required for realizing the target service, the plurality of functions including a first function provided by the intelligent sound box and a second function provided by a mobile terminal device. The intelligent sound box then acquires first data corresponding to the first function and second data corresponding to the second function, and realizes the target service according to the two parts of data. In this way, the intelligent sound box can acquire data from different devices and, by combining the data, gains the capability of providing the target service, so that its service capability is enriched.

Description

Service providing method, system, device, equipment and storage medium
Technical Field
The present invention relates to the field of intelligent devices, and in particular, to a service providing method, system, apparatus, device, and storage medium.
Background
Common smart devices include smart speakers, self-moving robots, unmanned aerial vehicles, and the like. Among them, the most widely used in daily life is the intelligent sound box (smart speaker). The smart speaker is an upgraded product of an ordinary speaker: it can provide various services for a user, such as online shopping and content searching, in response to voice commands issued by the user. Meanwhile, the smart speaker can also respond to voice instructions to control other smart household equipment, for example opening curtains or controlling a water heater to heat.
However, in practical applications, the services listed above are essentially all intelligent voice services, so the service capability of the smart speaker is limited. How to enrich the service capability of the intelligent sound box is therefore an urgent problem to be solved.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a service providing method, apparatus, device and storage medium for enriching service capabilities of intelligent devices.
In a first aspect, an embodiment of the present invention provides a service providing method, which is applied to an intelligent sound box, including:
determining a plurality of functions required for realizing the target service in response to the selection operation of a user on the target service, wherein the plurality of functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment;
acquiring first data corresponding to the first function;
acquiring second data corresponding to the second function;
and according to the first data and the second data, realizing the target service.
In a second aspect, an embodiment of the present invention provides a service providing apparatus, including:
the determining module is used for responding to the selection operation of a user on the target service and determining a plurality of functions required by realizing the target service, wherein the plurality of functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment;
The first acquisition module is used for acquiring first data corresponding to the first function;
the second acquisition module is used for acquiring second data corresponding to the second function;
and the response module is used for realizing the target service according to the first data and the second data.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is configured to store one or more computer instructions, and the one or more computer instructions implement the service providing method in the first aspect when executed by the processor. The electronic device may also include a communication interface for communicating with other devices or communication networks.
In a fourth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to at least implement a service providing method as described in the first aspect.
In a fifth aspect, an embodiment of the present invention provides a service providing system, including: an intelligent sound box and mobile terminal equipment;
The intelligent sound box is used for responding to the selection operation of a user on a target service, determining a plurality of functions required by realizing the target service, wherein the plurality of functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment; acquiring first data corresponding to the first function; acquiring second data corresponding to the second function sent by the mobile terminal equipment; and implementing the target service according to the first data and the second data.
In a sixth aspect, an embodiment of the present invention provides a service providing method, applied to an intelligent speaker used by a first user, including:
responding to the selection operation of the first user on the motion state control service, and determining that a plurality of functions required for realizing the motion state control service comprise a display function provided by the intelligent sound box and a sensing data acquisition function provided by first mobile terminal equipment used by a second user;
acquiring video data corresponding to the display function;
acquiring first sensing data corresponding to the sensing data acquisition function, wherein the first sensing data reflects the motion state of the first mobile terminal equipment;
And controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the first sensing data, so that the intelligent sound box realizes motion state control service.
In a seventh aspect, an embodiment of the present invention provides a service providing apparatus, including:
the system comprises a determining module, a first control module and a second control module, wherein the determining module is used for responding to the selection operation of a first user on a motion state control service and determining that a plurality of functions required for realizing the motion state control service comprise a display function provided by an intelligent sound box used by the first user and a sensing data acquisition function provided by first mobile terminal equipment used by a second user;
the first acquisition module is used for acquiring video data corresponding to the display function;
the second acquisition module is used for acquiring sensing data corresponding to the sensing data acquisition function, and the sensing data reflects the motion state of the first mobile terminal equipment;
and the control module is used for controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the sensing data, so that the intelligent sound box realizes motion state control service.
In an eighth aspect, an embodiment of the present invention provides an electronic device, including a processor and a memory, where the memory is configured to store one or more computer instructions, and the one or more computer instructions implement the service providing method in the sixth aspect when executed by the processor. The electronic device may also include a communication interface for communicating with other devices or communication networks.
In a ninth aspect, embodiments of the present invention provide a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to at least implement a service providing method as described in the sixth aspect.
In a tenth aspect, an embodiment of the present invention provides a service providing system, including: an intelligent sound box used by a first user and a first mobile terminal device used by a second user;
the intelligent sound box is used for responding to the selection operation of the first user on the motion state control service, and determining that the multiple functions required by realizing the motion state control service comprise a display function provided by the intelligent sound box and a sensing data acquisition function provided by the first mobile terminal equipment;
Acquiring video data corresponding to the display function and sensing data corresponding to the sensing data acquisition function, wherein the sensing data reflects the motion state of the first mobile terminal equipment;
and controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the sensing data, so that the intelligent sound box realizes motion state control service.
According to the service providing method provided by the embodiments of the invention, the intelligent sound box responds to a user-triggered selection operation on the target service and determines a plurality of functions required for realizing the target service, and the intelligent sound box alone cannot provide all of these functions. In particular, the smart speaker provides a first function of the plurality of functions, and the mobile terminal device in communication with the smart speaker provides the remaining function, namely the second function. The intelligent sound box then acquires first data corresponding to the first function and second data corresponding to the second function respectively, and realizes the target service according to the first data and the second data.
In this method, the intelligent sound box can acquire data from different devices and, by combining the data, gains the capability of providing the target service. That is, the mobile terminal device provides data support for expanding the service capability of the intelligent sound box, and the intelligent sound box expands its service capability by drawing on both its own functions and the functions of other devices. Moreover, this manner of enriching service capability does not require adding any functional module to the intelligent sound box.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a service providing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another service providing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the service providing method according to the embodiment shown in FIG. 2 applied in a game service scenario;
FIG. 4 is a flowchart of a method for providing a service according to another embodiment of the present invention;
FIG. 5 is a schematic diagram of the service providing method provided in the embodiment shown in FIG. 4 applied in an interactive scenario;
FIG. 6 is a flowchart of a method for providing a service according to another embodiment of the present invention;
FIG. 7 is a schematic diagram of the service providing method according to the embodiment shown in FIG. 6 applied in the identification service scenario;
fig. 8 is a schematic structural diagram of a service providing system according to an embodiment of the present invention;
FIG. 9 is a flowchart of yet another service providing method according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the service providing method according to the embodiment shown in FIG. 9 applied in a game service scenario;
FIG. 11 is a schematic diagram of another service providing system according to an embodiment of the present invention;
fig. 12 is a schematic structural diagram of a service providing apparatus according to an embodiment of the present invention;
fig. 13 is a schematic structural diagram of an electronic device corresponding to the service providing apparatus provided in the embodiment shown in fig. 12;
fig. 14 is a schematic structural diagram of another service providing apparatus according to an embodiment of the present invention;
fig. 15 is a schematic structural diagram of an electronic device corresponding to the service providing apparatus provided in the embodiment shown in fig. 14.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise; "plurality" generally means at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
The word "if", as used herein, may be interpreted as "when", "upon", "in response to determining" or "in response to identifying", depending on the context. Similarly, the phrase "if it is determined" or "if (a stated condition or event) is identified" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is identified" or "in response to identifying (the stated condition or event)", depending on the context.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or system. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other like elements in a product or system comprising such an element.
In practical applications, smart devices are various, such as smart speakers, unmanned aerial vehicles, self-moving robots, etc. mentioned in the background, and the services that different smart devices can provide are different.
Taking the smart speakers as an example, the most common services of smart speakers may include content query services, online shopping services, man-machine conversation services, and so on. These services rely on the automatic speech recognition (Automatic Speech Recognition, ASR for short) function of the smart speakers.
With the development of smart speakers, users have more demands on services that they can provide, such as interactive services, game services, recognition services, etc., which users wish to provide, but it is obvious that such services cannot be realized by only relying on the ASR function of the smart speakers. At this time, there is a need to enrich the service capability of the intelligent sound box itself, and the service providing method provided by the invention can be used to expand the service that the intelligent sound box can provide.
The following embodiments take expanding the service capability of the smart speaker as an example to describe the service providing method provided by the embodiments of the present invention. In these embodiments, the mobile terminal device may be a mobile phone, a tablet computer, a sports bracelet, or the like used by the user. The service capability of other intelligent devices can be expanded in the same way; the method is not limited to the intelligent sound box.
Based on the foregoing, some embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In the case where there is no conflict between the embodiments, the following embodiments and features in the embodiments may be combined with each other. In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
Fig. 1 is a flowchart of a service providing method according to an embodiment of the present invention, where the service providing method according to the embodiment of the present invention may be implemented by an intelligent speaker.
As shown in fig. 1, the method comprises the steps of:
s101, determining multiple functions required for realizing the target service in response to the selection operation of the user on the target service, wherein the multiple functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment.
The user can trigger the selection operation of the target service for the intelligent sound box. Alternatively, the user may issue a voice command to the smart speaker to wake up the smart speaker and cause it to determine the target service selected by the user via the ASR function provided by itself. Alternatively, the user may manually select the target service via an operation interface provided by the smart speaker.
A selected target service needs to combine multiple functions to be realized, and these functions are not all deployed on the same device. That is, the plurality of functions may include at least one first function provided by the smart speaker and at least one second function provided by the mobile terminal device.
For example, the first function provided by the smart speaker may include: ASR functions, display functions, audio playback functions, computing functions, and so forth. The second function provided by the mobile terminal device may include: a positioning function, a computing function, a sensing function, an authentication function, an image processing function, and the like.
Based on the target service selected by the user, the intelligent sound box may optionally determine the multiple functions corresponding to the target service, and distinguish the first function from the second function among them, in the following manner:
The intelligent sound box can obtain a service configuration file that records the correspondence between services and functions. In response to the user-triggered selection operation, the intelligent speaker determines the various functions required to achieve the target service by querying this configuration file, and then further distinguishes the first function from the second function according to the functions the intelligent sound box itself provides. The service configuration file may be stored in the intelligent speaker or on a server.
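To make the lookup concrete, the following is a minimal Python sketch of how such a service configuration file might be queried and how the first and second functions might be separated. The file format, the function names, and the helper resolve_functions are illustrative assumptions, not anything defined by this patent.

```python
# Hypothetical sketch of how a smart speaker might consult a service
# configuration file to split the functions needed by a target service.

# Example service configuration: target service -> required functions (assumed names)
SERVICE_CONFIG = {
    "motion_state_control": ["display", "sensing_data_acquisition"],
    "interaction": ["display", "input"],
    "recognition": ["voice_interaction", "image_recognition"],
}

# Functions this smart speaker can provide by itself (assumed set)
LOCAL_FUNCTIONS = {"display", "voice_interaction", "audio_playback"}


def resolve_functions(target_service: str):
    """Return (first_functions, second_functions) for a target service.

    first_functions  -- provided by the smart speaker itself
    second_functions -- must be provided by a connected mobile terminal device
    """
    required = SERVICE_CONFIG.get(target_service, [])
    first = [f for f in required if f in LOCAL_FUNCTIONS]
    second = [f for f in required if f not in LOCAL_FUNCTIONS]
    return first, second


if __name__ == "__main__":
    print(resolve_functions("motion_state_control"))
    # -> (['display'], ['sensing_data_acquisition'])
```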
S102, acquiring first data corresponding to the first function.
S103, second data corresponding to the second function are acquired.
And S104, realizing the target service according to the first data and the second data.
Then, the intelligent sound box can acquire the second data generated by the mobile terminal equipment by means of the communication connection established with the mobile terminal equipment while acquiring the first data, and the target service can be realized by combining the two parts of data.
It is easy to understand that, for different target services, the first data and the second data that the intelligent sound box needs in order to realize them are different, and the specific implementation process of the target service also differs. Optionally, the target service may be the interactive service, the game service, the identification service, or the like mentioned above.
Wherein the interactive service can be understood as: the user can watch the video through the intelligent sound box and simultaneously interact with the video content, such as comment sending, gift giving and the like. The intelligent sound box can respond to the interactive action generated by the user.
The game service can be understood as: the intelligent sound box is internally provided with a game Application (APP for short), and the intelligent sound box can respond to game operations generated by a user, so that the intelligent sound box realizes game services.
The identification service can be understood as: for an object whose name the user does not know but whose detailed information the user wants to learn, the intelligent sound box can identify the object to be identified and obtain its detailed information.
The specific implementation of the several target services provided above may be seen in the following detailed description of the embodiments shown in fig. 2-7.
In this embodiment, the intelligent sound box determines, in response to a user-triggered selection operation on the target service, the multiple functions required for implementing the target service, where the multiple functions include a first function provided by the intelligent sound box and a second function provided by the mobile terminal device. The intelligent sound box then acquires first data corresponding to the first function and second data corresponding to the second function respectively, and realizes the target service according to the two parts of data.
In this method, the intelligent sound box can acquire data from different devices and, by combining the data, gains the capability of providing the target service. That is, the mobile terminal device provides data support for expanding the service capability of the intelligent sound box, and the intelligent sound box expands its service capability by drawing on both its own functions and the functions of other devices. Moreover, this manner of enriching service capability does not require adding any functional module to the intelligent sound box.
Optionally, since different functions of different devices are used in realizing the target service, each function of the intelligent sound box and the mobile terminal device can be modularized and provided with a corresponding interface, which facilitates realizing the target service. After the connection with the mobile terminal device is established, the intelligent sound box can directly call the interface corresponding to the second function to acquire the second data.
Specifically, the correspondence between functions and interfaces can be recorded in a preset configuration file. To distinguish it from the service configuration file in step 101, this preset file may be referred to as a function configuration file. After determining the first function and the second function required for realizing the target service, the intelligent sound box can query the function configuration file for the preset interface corresponding to the second function and acquire the second data corresponding to the second function by calling that preset interface. Optionally, the function configuration file may also be stored in the smart speaker or on a server.
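As an illustration only, the following sketch assumes the function configuration file maps each second function to an HTTP-style preset interface exposed by the mobile terminal device; the paths, the FUNCTION_CONFIG name, and the use of the requests library are assumptions rather than anything specified by the patent.

```python
# Minimal sketch of querying the function configuration file and calling
# the preset interface of a second function on the mobile terminal device.

import requests

# Function configuration file: function name -> preset interface (assumed format)
FUNCTION_CONFIG = {
    "sensing_data_acquisition": "http://<mobile-device>/api/sensing_data",
    "input": "http://<mobile-device>/api/interaction_data",
    "image_recognition": "http://<mobile-device>/api/recognition_result",
}


def fetch_second_data(second_function: str, device_address: str):
    """Call the preset interface of a second function on the mobile device."""
    url = FUNCTION_CONFIG[second_function].replace("<mobile-device>", device_address)
    response = requests.get(url, timeout=5)
    response.raise_for_status()
    return response.json()
```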
With the development of intelligent sound boxes, some of them are now provided with a screen and thus have a display function; game services can then be provided for users by means of this display function.
Specifically, during the course of a game, a user may generate a game operation, and the game operation changes the motion state of a target object (i.e., the object the game operation acts on) in the game video picture, thereby implementing the game service. It follows that the functions required for realizing this service include a display function for the video picture and an acquisition function for the motion state of the target object. The motion state of the target object can be reflected by sensing data collected by a sensor configured in the mobile terminal device.
Therefore, when the intelligent sound box realizes game service, the first function provided by the intelligent sound box is a display function, and the first data corresponding to the display function is video data; the second function provided by the mobile terminal equipment is a sensing data acquisition function, and the second data corresponding to the second function is sensing data.
At this time, the smart speakers may implement game services using the following method as shown in fig. 2. Fig. 2 is a flowchart of another service providing method according to an embodiment of the present invention, as shown in fig. 2, the method may include the following steps:
S201, determining multiple functions required for realizing the target service in response to the selection operation of the user on the target service, wherein the multiple functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment.
The above-mentioned step 201 is performed in a similar manner to the corresponding steps of the previous embodiment, and reference may be made to the related description in the embodiment shown in fig. 1, which is not repeated here.
S202, responding to the starting operation of the user to the triggering of the intelligent sound box, and acquiring and displaying video data corresponding to the first function.
S203, responding to the mobile operation triggered by the user on the mobile terminal equipment, acquiring sensing data acquired by a sensor of the mobile terminal equipment in the motion process, wherein the sensing data corresponds to a second function.
S204, controlling the motion state of the target object in the video picture to be the same as the motion state of the mobile terminal equipment according to the sensing data so as to enable the intelligent sound box to realize motion state control service.
The game APP can be installed in the intelligent sound box, and a user can trigger the game APP to start operation through a voice instruction or clicking operation. The intelligent sound box responds to the starting operation to acquire and display video data of the game, namely first data.
The user can see the target object in the video picture, and operate the mobile terminal device according to the state of the target object in the video picture, such as holding the mobile terminal device to move up and down and left and right, so as to change the motion state of the mobile terminal device. The sensor configured in the mobile terminal device may collect sensing data reflecting the change of the movement state, i.e. the second data. The sensing data can be transmitted to the intelligent sound box through a preset interface corresponding to the sensing data acquisition function.
Based on the obtained video data and the sensing data, the intelligent sound box can control the motion state of the target object in the video picture to be the same as the motion state of the mobile terminal device according to the sensing data while displaying the game video.
For example, when the sensing data reflects that the mobile terminal device moves upwards, the intelligent sound box controls the target object to move upwards in the video picture according to the acquired sensing data. Since the motion state of the target object is kept consistent with that of the mobile terminal device, the target object responds to the game operation generated by the user, and the intelligent sound box thereby realizes the motion state control service.
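A minimal sketch of step S204 follows, assuming the sensing data arrives as a small displacement sample (dx, dy) and that the target object is represented by a simple GameObject class; both are illustrative assumptions and not part of the patent.

```python
# Illustrative sketch: mapping sensing data from the mobile terminal device
# onto the motion state of the target object in the video picture.

class GameObject:
    """Target object (e.g. a projectile launcher) shown in the video picture."""

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def move(self, dx: float, dy: float):
        self.x += dx
        self.y += dy


def apply_sensing_data(target: GameObject, sensing_data: dict, scale: float = 1.0):
    # sensing_data is assumed to carry the displacement of the mobile terminal
    # device since the last sample, e.g. {"dx": -0.2, "dy": 0.0}
    target.move(sensing_data["dx"] * scale, sensing_data["dy"] * scale)


launcher = GameObject(x=160.0, y=0.0)
apply_sensing_data(launcher, {"dx": -0.2, "dy": 0.0}, scale=50.0)
# The launcher moves left in the game picture, mirroring the phone's motion.
```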
In this embodiment, the intelligent sound box can play video data by means of its own display function, and can also control the motion state of a target object in a video picture by means of the sensing data acquisition function of the mobile terminal device, so as to implement a target object motion state control service, i.e. a game service. And the expansion of service capability does not require the addition of functional modules within the intelligent sound box.
For ease of understanding, the specific implementation procedure of the above service providing method is exemplarily described in connection with a scenario of providing a game service. The details can be understood in connection with fig. 3.
Assume that a plane-shooting game APP is installed in the intelligent sound box, and the goal of the game is to shoot down the airplanes on the game screen. First, the user can wake up the intelligent sound box through a voice instruction and make it open the game APP. The intelligent sound box then displays the game picture using its own display function. As shown in fig. 3, several planes appear in the upper part of the game screen, and the bottom of the screen contains a projectile launching device, which is the target object in the embodiment of fig. 2; its motion state is controlled so that the planes can be shot down.
The user can then hold the mobile phone and move it to the left. The mobile phone collects sensing data with its own sensor, and the collected sensing data reflects that the mobile phone moves leftwards. The intelligent sound box acquires the sensing data through the preset interface and then controls the motion state of the projectile launching device according to the sensing data, that is, it controls the device to move leftwards so that it can shoot down the airplane at the upper left of the game picture.
In this game service scenario, the mobile phone does not display the game picture, nor does the game APP need to be installed on the mobile phone; the mobile phone simply plays the role of a game handle (gamepad).
Of course, the above-mentioned motion state control service may also be used in scenes other than games, such as a video playing scene. The user can adjust the video playing progress through sensing data collected by the mobile terminal device: for example, when the sensing data reflects that the mobile terminal device moves leftwards, the video is fast-forwarded; when it moves upwards, the next video is played. In this case, the mobile terminal device functions as a remote controller.
For an intelligent sound box with a display function, interactive services can also be provided for users by means of that display function. Specifically, a video playing APP or a live broadcast APP can be installed in the intelligent sound box, and the user can watch videos through it. The video may be a live video or an ordinary, non-live video. While watching, the user may also generate corresponding interaction data, which may optionally be comments, virtual gifts, and so on; this interaction data can also be displayed on the intelligent sound box. It follows that the functions needed by the interactive service include a display function for the video picture and an input function.
When the intelligent sound box realizes the interactive service, the first function provided by the intelligent sound box is a display function, and the first data corresponding to the display function is video data; the second function provided by the mobile terminal equipment is an input function, and the second data corresponding to the second function is interaction data.
At this time, the smart speakers may implement the interactive service using the method described below as shown in fig. 4. Fig. 4 is a flowchart of yet another service providing method according to an embodiment of the present invention, as shown in fig. 4, the method may include the following steps:
S301, determining multiple functions required for realizing the target service in response to the selection operation of the user on the target service, wherein the multiple functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment.
The above-mentioned step 301 is performed in a similar manner to the corresponding steps in the previous embodiment, and reference may be made to the related description in the embodiment shown in fig. 1, which is not repeated here.
S302, responding to the starting operation of the user to the triggering of the intelligent sound box, and acquiring video data corresponding to the first function.
S303, responding to the input operation triggered by the user to the mobile terminal equipment, and acquiring interaction data input by the user in the mobile terminal equipment, wherein the interaction data corresponds to the second function.
S304, displaying the interaction data and the video data so as to enable the intelligent sound box to realize interaction service.
The user may trigger the start operation through a voice command or a click operation. The intelligent sound box responds to the start operation by acquiring and displaying the video data, which can be displayed through the video playing APP or live broadcast APP installed in the intelligent sound box.
An interaction data input APP can be installed on the mobile terminal device, and this APP is paired with the video playing APP or live broadcast APP, or with the intelligent sound box. While watching the video, the user can input interaction data through the interaction data input APP. The mobile terminal device acquires the interaction data in response to the user-triggered input operation, and the interaction data is transmitted to the intelligent sound box through the preset interface corresponding to the input function. The intelligent sound box displays the interaction data input by the user while displaying the video, thereby realizing the interactive service.
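The following sketch illustrates, under assumed names, how interaction data received through the input function's preset interface might be queued and overlaid on the video frames; InteractionData and InteractiveDisplay are hypothetical helpers, not structures defined by the patent.

```python
# Hedged sketch of the interactive service: the mobile terminal device sends
# user-typed interaction data to the intelligent sound box, which displays it
# on top of the playing video.

from collections import deque
from dataclasses import dataclass


@dataclass
class InteractionData:
    user: str
    text: str
    timestamp: str  # e.g. "9:02"


class InteractiveDisplay:
    def __init__(self):
        self.pending = deque()

    def on_interaction_data(self, data: InteractionData):
        # Called when data arrives via the preset interface of the input function
        self.pending.append(data)

    def render_frame(self, video_frame):
        # Overlay any pending interaction data onto the current video frame
        overlays = list(self.pending)
        self.pending.clear()
        return video_frame, overlays
```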
In this embodiment, the intelligent sound box plays the video data using its own display function and acquires the interaction data input by the user through the input function of the mobile terminal device. The interactive service is realized by displaying the two together, and this expansion of service capability does not require adding a functional module inside the intelligent sound box.
For ease of understanding, a specific implementation procedure of the service providing method provided above is exemplarily described in connection with an interactive service scenario. The details can be understood in connection with fig. 5.
Suppose that a live APP is installed on the intelligent sound box, and an interactive data input APP is installed on the mobile phone. The user can wake up the intelligent sound box through the voice command and enable the intelligent sound box to open the live APP. At this time, the intelligent sound box can display live video by utilizing the display function of the intelligent sound box. In the process of watching live video, a user can open the interactive data input APP in the mobile phone at any time to input interactive data. The mobile phone responds to input operation triggered by a user, and can acquire and send interaction data to the intelligent sound box through a preset interface. The intelligent sound box simultaneously displays video data and interaction data, namely, provides interaction service.
As shown in fig. 5, the smart speaker may start playing the live video at 9:00, and the user inputs interaction data on the mobile phone at 9:02; at 9:02 the interaction data and the live video are then displayed together on the screen of the intelligent sound box.
In addition to the above services, in practical applications, for an object whose name the user does not know but whose detailed information the user wants to learn, the identification service provided by the smart speaker can be used to identify the object. In principle, the object to be identified could be identified directly using the camera and computing functions provided by the intelligent sound box.
However, the shooting and computing functions of the intelligent sound box are relatively weak. Therefore, shooting and recognition of the object to be recognized can instead be performed by the mobile terminal device, whose shooting and computing functions are more powerful, and the intelligent sound box feeds the recognition result back to the user. This in effect raises the recognition service level of the intelligent sound box by means of the mobile terminal device.
Based on the above description, when the intelligent sound box realizes the recognition service, the first function provided by the intelligent sound box is a voice interaction function (i.e. an ASR function), and the first data corresponding to the first function is a voice instruction; the second function provided by the mobile terminal equipment is an image recognition function, and the second data corresponding to the second function is a recognition result.
At this time, the smart speaker may implement the recognition service itself using the following method as shown in fig. 6. Fig. 6 is a flowchart of yet another service providing method according to an embodiment of the present invention, as shown in fig. 6, the method may include the following steps:
s401, determining multiple functions required for realizing the target service in response to the selection operation of the user on the target service, wherein the multiple functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment.
The above-mentioned step 401 is performed in a similar manner to the corresponding steps in the previous embodiment, and reference may be made to the related description in the embodiment shown in fig. 1, which is not repeated here.
S402, a voice instruction input by a user to the intelligent sound box is acquired, wherein the voice instruction corresponds to a first function.
S403, sending a voice command to the mobile terminal equipment so as to control the mobile terminal equipment to shoot an image and identify an object contained in the shot image.
S404, acquiring a recognition result generated by the mobile terminal equipment, wherein the recognition result corresponds to the second function.
S405, outputting the identification result so that the intelligent sound box realizes the identification service.
After the user selects the target service (i.e. the identification service) through voice instruction or clicking operation, the camera of the mobile terminal device can be further controlled to be started. Specifically, after the user sends a voice command to the intelligent sound box, the voice command is sent to the mobile terminal device. And the mobile terminal equipment responds to the voice instruction and controls the opening of the camera of the mobile terminal equipment. At this time, the user can use the mobile terminal device to shoot the object to be identified, and the mobile terminal device identifies the shot image to obtain an identification result.
The mobile terminal device sends the identification result to the intelligent sound box. The intelligent sound box can broadcast the recognition result to the user by means of the voice interaction function of the intelligent sound box, so that the intelligent sound box can realize recognition service.
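A hedged sketch of the recognition flow (steps S402 to S405) is given below; the mobile and speaker objects and their send / wait_for_result / speak methods are hypothetical placeholders for whatever transport and voice-broadcast mechanism is actually used.

```python
# Minimal sketch of the recognition service flow shown in Fig. 6.

def recognition_service(voice_command: str, mobile, speaker):
    # S403: forward the voice command so the mobile terminal device opens
    # its camera, shoots an image, and recognizes the object in it.
    mobile.send(voice_command)

    # S404: receive the recognition result generated by the mobile device.
    result = mobile.wait_for_result(timeout=30)

    # S405: broadcast the result through the speaker's voice interaction function.
    speaker.speak(f"This looks like: {result}")
    return result
```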
In this embodiment, the intelligent sound box receives the user's voice instruction through its own voice interaction function, has the object to be recognized shot and recognized through the shooting and computing functions of the mobile terminal device to obtain the recognition result, and finally broadcasts the recognition result through its own voice interaction function.
For ease of understanding, the specific implementation procedure of the service providing method provided above is exemplarily described in connection with identifying a service scenario. The details can be understood in connection with fig. 7.
Suppose the user sees a plant whose name he does not know while watching television and wants to learn about it in detail. On one hand, the user can pause the television picture; on the other hand, the user can wake up the intelligent sound box through a voice instruction so that it switches to the mode corresponding to the identification service, and then issue a voice instruction so that the intelligent sound box controls the camera of the mobile phone to start. The user can then shoot and identify the plant using the mobile phone: the phone captures a clear image with its powerful shooting function and accurately identifies the type of plant in the image with its powerful image recognition function.
After the intelligent sound box receives the identification result, it broadcasts the result to the user, thereby realizing the identification service. Although intelligent sound boxes with shooting and recognition functions also exist, the shooting and recognition functions of the mobile phone are stronger, so this approach ensures the accuracy of the recognition result and improves the recognition service level of the intelligent sound box.
In the above embodiments, the second data is derived from a mobile terminal device. However, in practical applications, the second data may also be derived from a plurality of mobile terminal devices for different service scenarios, such as the following multiplayer online game scenario shown in fig. 9, which is not limited by the present invention.
In addition, optionally, besides the above services, the smart speaker may also implement switching play services. Namely, the playing of the multimedia data is switched to the intelligent sound box by the mobile terminal equipment. Wherein the multimedia data may be audio, video, etc.
In particular, at a first moment in time, multimedia data may be played on the mobile terminal device. At the second moment, the user can trigger the switching operation to the intelligent sound box through the voice command. The intelligent sound box responds to the switching operation, the current playing progress of the multimedia data can be obtained from the mobile terminal equipment, and the multimedia data can be continuously played on the intelligent sound box according to the playing progress. In this scenario, the first function provided by the smart speaker and the second function provided by the mobile terminal device are both display functions.
In this service scenario, when the above-mentioned multimedia data is audio, the mobile terminal device may also be a vehicle-mounted playing system with a playing function, and so on. When the multimedia data is a live video stream, after the intelligent sound box responds to the switching operation, the live video stream can be obtained and continuously played according to the live video stream identification obtained from the mobile terminal equipment.
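The switching play service might be modeled as a small handoff message carrying the playing progress or, for live content, a live stream identifier, as in the sketch below; the HandoffMessage fields and the player object are illustrative assumptions rather than the patent's actual data format.

```python
# Sketch of the switching play service: resume on the smart speaker what the
# mobile terminal device (or vehicle-mounted system) was playing.

from dataclasses import dataclass
from typing import Optional


@dataclass
class HandoffMessage:
    media_id: str
    position_seconds: Optional[float] = None   # progress for ordinary audio/video
    live_stream_id: Optional[str] = None       # set instead for a live video stream


def continue_playback(player, msg: HandoffMessage):
    """Continue playback on the smart speaker from the reported progress."""
    if msg.live_stream_id is not None:
        player.play_live(msg.live_stream_id)          # join the live stream
    else:
        player.play(msg.media_id, start_at=msg.position_seconds)
```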
It is easy to understand that communication connection needs to be established between the intelligent sound box and the mobile terminal device, so that the intelligent sound box can acquire second data corresponding to the second function based on the connection, and finally target service is realized. For the establishment of the communication connection, the following manner may be adopted. And this connection establishment process may be performed before "determining various functions required to implement the target service in response to a user's selection operation of the target service".
Alternatively, the user may trigger a connection operation on the smart speaker, which may be a voice command or a click operation, or the like. The intelligent sound box sends a connection establishment message to all devices in the same local area network. After receiving the message, the mobile terminal device in the local area network can display corresponding prompt information to inform the user of the existence of the connection establishment message. At this time, the user may operate the mobile terminal device, so that the mobile terminal device responds to the connection establishment message and sends its own identification information to the smart speaker.
The intelligent sound box can compare the identification information of the intelligent sound box with the received identification information of the mobile terminal equipment, and if the intelligent sound box and the received identification information have a matching relationship, the communication connection between the intelligent sound box and the mobile terminal equipment is determined to be successfully established. Wherein the identification information of the device may be regarded as identity information of the device. And the mobile terminal equipment is often equipment which is already bound with the intelligent sound box, and the binding relation between the mobile terminal equipment and the intelligent sound box is stored in the intelligent sound box local or server side in advance.
Optionally, the smart speaker may also send its own identification information together when sending the connection establishment message, so that the mobile terminal device may obtain the identification information of each of the two devices. At this time, the comparison process of the identification information can be executed by the mobile terminal equipment, and the comparison result is finally fed back to the intelligent sound box.
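As an illustration of the identification-comparison flow described above, the following sketch assumes a UDP broadcast on the local area network and a locally stored set of bound device identifiers; the message format and helper names are assumptions, not part of the patent.

```python
# Illustrative sketch: the smart speaker broadcasts a connection-establishment
# message on the LAN and accepts a mobile terminal device only if its
# identification matches a previously stored binding.

import json
import socket

BOUND_DEVICE_IDS = {"phone-abc123"}  # bindings stored locally or on a server (assumed)


def broadcast_connect_request(speaker_id: str, port: int = 40000):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    message = json.dumps({"type": "connect", "speaker_id": speaker_id}).encode()
    sock.sendto(message, ("255.255.255.255", port))
    return sock


def accept_reply(sock) -> bool:
    data, addr = sock.recvfrom(1024)
    reply = json.loads(data.decode())
    # Compare the received identification information against stored bindings
    return reply.get("device_id") in BOUND_DEVICE_IDS
```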
Alternatively, the smart speaker may be provided with a scanning identifier, such as a two-dimensional code or a barcode, and the scanning identifier may include identification information of the smart speaker. The user can trigger the scanning operation on the mobile terminal device, namely, the mobile terminal device is used for scanning the scanning identification on the intelligent sound box so as to acquire the identification information of the intelligent sound box. At this time, the mobile terminal device may consider that there is a device to be connected to itself, and may directly establish a communication connection with the intelligent speaker according to the identification information. In this way, the connection is automatically established as soon as the identification information is received, and no comparison of the identification information is required.
Optionally, after the user triggers the mobile terminal device to scan, the mobile terminal device may send its own identification information to the smart speaker. After receiving the identification information, the intelligent sound box automatically establishes communication connection with the mobile terminal equipment.
Optionally, the smart speaker and the mobile terminal device may each support at least one communication mode, such as Bluetooth communication, WiFi communication, cloud communication, and so on, and the scanning identifier set on the smart speaker may further include a communication mode. After the connection is established, the devices communicate according to the communication mode contained in the scanning identifier.
The essential difference between the above connection manners is whether the identification information is compared. In practical application, either manner can be chosen according to the target service: for services with strict identity requirements, a connection can be established with identification information comparison; for daily entertainment services such as the game service, the identification service and the interactive service, the identity requirement is not strict, and the connection can be established without comparing identification information.
Optionally, either the smart speaker or the mobile terminal device may issue a disconnection instruction to break the communication connection between them. While the connection is maintained, the devices can also send heartbeat packets to each other at regular intervals to report their own state and check whether the communication connection is still normal.
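A tiny sketch of such a timed heartbeat is shown below, assuming the established connection exposes is_open and send_heartbeat methods; both are hypothetical, as the patent does not define the heartbeat mechanism.

```python
# Minimal sketch of a periodic heartbeat on an established connection.

import threading


def start_heartbeat(connection, interval_seconds: float = 5.0):
    def beat():
        if connection.is_open():
            connection.send_heartbeat()  # inform the peer of own state
            threading.Timer(interval_seconds, beat).start()

    beat()
```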
The embodiments described above are specific implementation procedures for describing different services in detail from the perspective of the intelligent sound box. In addition, the intelligent sound box and the mobile terminal equipment can jointly form a service providing system. Fig. 8 is a schematic structural diagram of a service providing system according to an embodiment of the present invention, as shown in fig. 8, where the system includes: an intelligent sound box 1 and a mobile terminal device 2. Alternatively, the number of mobile terminal devices 2 may be at least one.
In this system, the intelligent sound box 1 is used for determining a plurality of functions required for realizing a target service in response to a selection operation of the target service by a user. The multiple functions include a first function provided by the smart speaker 1 and a second function provided by the mobile terminal device 2. Then, the first data corresponding to the first function and the second data corresponding to the second function sent by the mobile terminal device 2 are further acquired. Finally, the target service is realized by using the two parts of data at the same time.
Alternatively, the mobile terminal device 2 may be, for example, a mobile phone, a tablet computer, a sports bracelet, or the like.
In this embodiment, the mobile terminal device 2 provides data support for the service capability extension of the smart speaker, and the smart speaker 1 gains the capability of providing the target service by combining the data acquired from the different devices. Moreover, in this way no functional module needs to be added to the intelligent sound box 1.
The intelligent sound box 1 can provide interactive service, identification service and motion state control service for users by using the system. The specific implementation process of each service may be referred to the related description in the embodiments shown in fig. 2 to fig. 7, and will not be described herein.
In the embodiments shown in fig. 2 to fig. 3 above, it has been mentioned that the smart speaker may provide a motion state control service, such as a game service. The game service may be a multiplayer online game service, and the method shown in fig. 9 may be performed to realize multiplayer online play.
In this scenario, it is assumed that the second user is remote with respect to the first user, i.e. the mobile terminal device used by the second user is in a different location from the smart speaker used by the first user, whereas the smart speaker and the mobile terminal device used by the first user are in the same location. The second user may also have a smart speaker of his or her own, located in the same place as the second user's mobile terminal device. For convenience of description, the mobile terminal device used by the second user is referred to as the first mobile terminal device, and the mobile terminal device used by the first user is referred to as the second mobile terminal device.
Based on the above description, fig. 9 is a flowchart of yet another service providing method according to an embodiment of the present invention, and as shown in fig. 9, an execution subject of the method may be a smart speaker used by a first user, where the method may include the following steps:
S501, in response to a selection operation of the motion state control service by the first user, determining that multiple functions required for realizing the motion state control service include a display function provided by the intelligent sound box and a sensing data acquisition function provided by the first mobile terminal device used by the second user.
S502, obtaining video data corresponding to the display function.
S503, acquiring first sensing data corresponding to the sensing data acquisition function, wherein the first sensing data reflects the motion state of the first mobile terminal equipment.
S504, according to the first sensing data, controlling the motion state of a target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment, so that the intelligent sound box realizes motion state control service.
When the first user and the remote second user play online, the first mobile terminal device used by the second user can first establish a connection with the intelligent sound box used by the first user in the manner described above. Then, the first user can trigger the selection operation of the motion state control service through a voice instruction or a click operation, and the intelligent sound box responds to the selection operation by starting the installed game APP, thereby acquiring and displaying the video data of the game.
Then, the intelligent sound box used by the first user can also send the acquired video data to the intelligent sound box used by the second user, so that the first user and the second user can both watch the same video picture through their respective intelligent sound boxes.
The remote second user can see the target object corresponding to the second user in the video picture and operate the first mobile terminal device according to the state of that object, for example by holding the device and moving it up, down, left or right to change its motion state. The sensor configured in the first mobile terminal device collects first sensing data reflecting this change in motion state, and the first sensing data can be transmitted to the intelligent sound box through the preset interface corresponding to the sensing data acquisition function.
Based on the video data and the first sensing data, the intelligent sound box used by the first user can, while displaying the video data, control the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal device according to the first sensing data, so that the intelligent sound box provides a multiplayer online game service.
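By way of a non-limiting illustration, the control of the target object according to the first sensing data could be sketched in Python as follows; the displacement-delta data format and all names are assumptions and not part of the original disclosure.

```python
# Minimal sketch (the displacement-delta data format and all names are
# assumptions): apply the first sensing data reported by the first mobile
# terminal device to the second user's target object so that the object's
# motion state in the video picture mirrors the motion of the device.
from dataclasses import dataclass


@dataclass
class TargetObject:
    x: float = 0.0
    y: float = 0.0


def apply_sensing_data(obj: TargetObject, sensing: dict) -> None:
    # Assume the terminal reports horizontal/vertical displacement deltas
    # derived from its motion sensors; the on-screen object mirrors them.
    obj.x += sensing.get("dx", 0.0)
    obj.y += sensing.get("dy", 0.0)


launcher_2 = TargetObject()                   # object controlled by the second user
apply_sensing_data(launcher_2, {"dx": -5.0})  # phone moved left -> object moves left
print(launcher_2)
```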
Optionally, after step S503, the first user may also collect second sensing data through the second mobile terminal device used by the first user, and the smart speaker used by the first user may control the motion state of the target object corresponding to the first user in the video picture according to the second sensing data. At this point, the smart speaker used by the first user shows, at the same time, how different users control their corresponding target objects, thereby realizing a multiplayer online game.
In this embodiment, the first mobile terminal device provides data support for extending the service capability of the intelligent sound box, so that the intelligent sound box can display the result of the remote second user's manipulation of the target object and, of course, the result of the first user's manipulation as well. Compared with the single-player game provided by the intelligent sound box in the embodiments shown in fig. 2 to fig. 3, the intelligent sound box here can provide a multiplayer online game with no restriction on where the participants are located, which is very flexible.
Optionally, the intelligent sound box used by the second user can also receive the sensing data collected by the second mobile terminal device and the first mobile terminal device respectively and, according to that data, display how each user controls the motion state of the corresponding target object, so that the intelligent sound box used by the second user can likewise provide the multiplayer online game service.
For ease of understanding, the specific implementation procedure of the above service providing method is exemplarily described in connection with a scenario of providing a game service. The details can be understood in connection with fig. 10.
Taking the plane-shooting game of the embodiment shown in fig. 3 as an example, the first user may first wake up the smart speaker through a voice command and have it open the plane-shooting game APP. The smart speaker used by the first user then displays the game screen using its own display function, and at the same time the intelligent sound box used by the second user can display the same game picture.
As shown in fig. 10, the upper part of the game screen contains a number of airplanes, and the bottom of the game screen contains a shell launcher 1 (the target object corresponding to the first user) and a shell launcher 2 (the target object corresponding to the second user); the motion state of each target object is controlled so that it can hit the airplanes.
The second user may then hold the mobile phone and move it to the left. The phone acquires sensing data through its own sensor, and this data reflects the leftward motion of the phone. The intelligent sound box used by the first user obtains, through the preset interface, the sensing data collected by the second user's phone and then controls the motion state of shell launcher 2 accordingly, i.e. it moves shell launcher 2 to the left so that it can shoot down an airplane at the upper left of the game picture.
Similarly, the first user may hold his or her own mobile phone and move it to the right. The intelligent sound box used by the first user then moves shell launcher 1 to the right according to the sensing data collected by the first user's phone, so that shell launcher 1 can shoot down an airplane at the upper right of the game picture.
At this point, the intelligent sound boxes used by the first user and the second user both show the results of the two parties' control over their respective target objects, i.e. the multiplayer online game is realized.
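By way of a non-limiting illustration, the multi-user case can be sketched as a simple routing of each user's sensing data to that user's own launcher; the identifiers and the one-dimensional position model are assumptions.

```python
# Small self-contained sketch of the multi-user case (identifiers and the
# one-dimensional position model are assumptions): sensing data from each
# user's phone is routed to that user's own shell launcher.
launcher_positions = {"user_1": 0.0, "user_2": 0.0}  # horizontal positions


def on_sensing_data(user_id, dx):
    # dx is the horizontal displacement reported by that user's phone sensors.
    launcher_positions[user_id] += dx


on_sensing_data("user_2", -5.0)  # second user moves the phone to the left
on_sensing_data("user_1", +5.0)  # first user moves the phone to the right
print(launcher_positions)
```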
The embodiments shown in fig. 9 to fig. 10 describe the specific implementation of the service in detail from the perspective of the smart speaker. In addition, the intelligent sound box used by the first user and the first mobile terminal device used by the second user can jointly form a service providing system. Fig. 11 is a schematic structural diagram of another service providing system according to an embodiment of the present invention. As shown in fig. 11, the system includes: a smart speaker 3 used by a first user and a first mobile terminal device 4 used by a second user.
In this system, the intelligent sound box 3 is configured to determine, in response to a selection operation of the motion state control service by the first user, that a plurality of functions required for implementing the motion state control service include a display function provided by the intelligent sound box 3 and a sensing data acquisition function provided by the first mobile terminal device 4;
Acquiring video data corresponding to a display function and first sensing data corresponding to a sensing data acquisition function, wherein the first sensing data reflects the motion state of first mobile terminal equipment 4;
and controlling the motion state of a target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the first sensing data, so that the intelligent sound box 3 realizes motion state control service.
Optionally, the service providing system may further include a second mobile terminal device 5 used by the first user, for acquiring second sensing data corresponding to the sensing data acquisition function, where the second sensing data reflects a motion state of the second mobile terminal device. At this time, the intelligent sound box 3 may control the motion state of the target object corresponding to the first user in the video frame to be the same as the motion state of the second mobile terminal device 5 according to the second sensing data, so that the intelligent sound box realizes the motion state control service.
Details of this embodiment, which are not described in detail, may be referred to the related descriptions in the embodiments shown in fig. 9 to 10, and are not described here again.
In this embodiment, the first mobile terminal device 4 and the second mobile terminal device 5 both provide data support for extending the service capability of the intelligent sound box 3, which gains the capability of providing the motion state control service by combining the data acquired from different devices. In this way, no functional module needs to be added to the intelligent sound box 3.
A service providing apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these service providing apparatuses can be constructed from commercially available hardware components configured through the steps taught in the present solution.
Fig. 12 is a schematic structural diagram of a service providing apparatus according to an embodiment of the present invention, as shown in fig. 12, where the apparatus includes:
A determining module 11, configured to determine, in response to a selection operation of a user on a target service, a plurality of functions required for realizing the target service, where the plurality of functions includes a first function provided by the intelligent sound box and a second function provided by the mobile terminal device.
A first obtaining module 12, configured to obtain first data corresponding to the first function.
A second obtaining module 13, configured to obtain second data corresponding to the second function.
A response module 14, configured to implement the target service according to the first data and the second data.
Optionally, the second obtaining module 13 is specifically configured to: query, in a preset configuration file recording the correspondence between interfaces and functions, the preset interface corresponding to the second function in the mobile terminal device; and acquire the second data corresponding to the second function by calling the preset interface.
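By way of a non-limiting illustration, the lookup in the preset configuration file and the call to the preset interface could be sketched in Python as follows; the file contents, keys and interface paths are assumptions introduced for illustration only.

```python
# Minimal sketch (the file contents, keys and interface paths are assumptions):
# query the preset configuration file for the interface corresponding to the
# second function, then call that interface to acquire the second data.
import json

# Stand-in for a preset configuration file recording interface/function pairs.
CONFIG_JSON = """
{
  "sensing":           "/api/v1/sensing_data",
  "input":             "/api/v1/user_input",
  "image_recognition": "/api/v1/recognition_result"
}
"""


def get_second_data(second_function, call_interface):
    interface = json.loads(CONFIG_JSON)[second_function]  # query the preset interface
    return call_interface(interface)                      # call it to get second data


# Usage with a stand-in transport function:
print(get_second_data("sensing", lambda path: {"path": path, "dx": -5.0}))
```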
Optionally, the first function is a display function, and the first data is video data; the second function is a sensing data acquisition function, and the second data is sensing data describing the motion state of the mobile terminal equipment.
The response module 14 is specifically configured to: control, according to the sensing data, the motion state of a target object in the video picture to be the same as the motion state of the mobile terminal device, so that the intelligent sound box realizes the motion state control service.
Optionally, the first obtaining module 12 is specifically configured to: acquire and display the video data in response to a start operation triggered by the user on the intelligent sound box.
The second obtaining module 13 is specifically configured to: acquire, in response to a movement operation triggered by the user on the mobile terminal device, the sensing data collected by the sensor of the mobile terminal device during the movement.
Optionally, the first function is a display function, and the first data is video data; the second function is an input function, and the second data is interaction data for the video data, which is input by the user.
The response module 14 is specifically configured to: display the interaction data together with the video data, so that the intelligent sound box realizes the interaction service.
Optionally, the first obtaining module 12 is specifically configured to: acquire the video data in response to a start operation triggered by the user on the intelligent sound box.
The second obtaining module 13 is specifically configured to: acquire, in response to an input operation triggered by the user on the mobile terminal device, the interaction data input by the user in the mobile terminal device.
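By way of a non-limiting illustration, the interaction service can be sketched as overlaying the interaction data received from the mobile terminal device on the video data; names are assumptions and the rendering is reduced to string concatenation.

```python
# Minimal sketch (names are assumptions; rendering is reduced to a string):
# the speaker shows the video data together with the interaction data that
# the user typed on the mobile terminal device.
def render_frame(video_frame, interaction_data):
    # A real implementation would composite the text onto the video picture.
    return f"{video_frame} + overlay{interaction_data}"


interaction_data = []                  # second data received from the terminal
interaction_data.append("nice shot!")  # text input by the user on the phone
print(render_frame("<frame 42>", interaction_data))
```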
Optionally, the first function is a voice interaction function, and the first data is a voice instruction input by the user to the intelligent sound box; the second function is an image recognition function, and the second data is a recognition result of the image.
The second obtaining module 13 is specifically configured to: send the voice command to the mobile terminal device so as to control the mobile terminal device to capture an image and identify an object contained in the captured image; and acquire the identification result generated by the mobile terminal device.
The response module 14 is specifically configured to: output the identification result, so that the intelligent sound box realizes the identification service.
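By way of a non-limiting illustration, the identification service can be sketched as forwarding the voice command to the terminal, which captures an image and returns a recognition result for the speaker to output; all names and the stand-in recognizer are assumptions.

```python
# Minimal sketch (all names and the stand-in recognizer are assumptions):
# the speaker forwards the voice command to the terminal, which captures an
# image and returns a recognition result that the speaker then outputs.
def identification_service(voice_command, terminal, speaker_output):
    result = terminal.capture_and_recognize(voice_command)  # second data
    speaker_output(f"I think this is: {result}")             # output the result


class FakeTerminal:
    def capture_and_recognize(self, command):
        # Stand-in for taking a photo and running image recognition on it.
        return "an apple"


identification_service("what is this?", FakeTerminal(), print)
```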
Optionally, the apparatus further comprises:
A sending module 21, configured to send a connection establishment message in response to a connection operation triggered by the user on the intelligent sound box.
A receiving module 22, configured to receive identification information of the mobile terminal device, where the identification information is sent after the mobile terminal device receives the connection establishment message.
An establishing module 23, configured to establish a communication connection between the intelligent sound box and the mobile terminal device if the identification information of the intelligent sound box matches the identification information of the mobile terminal device.
Optionally, the receiving module 22 is further configured to receive identification information of the mobile terminal device in response to a scanning operation triggered by the user on the mobile terminal device.
The establishing module 23 is further configured to establish a communication connection between the smart speaker and the mobile terminal device.
The apparatus shown in fig. 12 may perform the method of the embodiment shown in fig. 1 to 7, and reference is made to the relevant description of the embodiment shown in fig. 1 to 7 for a part of this embodiment that is not described in detail. The implementation process and technical effects of this technical solution are described in the embodiments shown in fig. 1 to 7, and are not described herein.
The internal functions and structures of the service providing apparatus are described above, and in one possible design, the structure of the service providing apparatus may be implemented as an electronic device, as shown in fig. 13, which may include: a processor 31 and a memory 32. Wherein the memory 32 is for storing a program for supporting the electronic device to execute the service providing method provided in the embodiments shown in fig. 1 to 7 described above, and the processor 31 is configured for executing the program stored in the memory 32.
The program comprises one or more computer instructions which, when executed by the processor 31, are capable of carrying out the steps of:
determining a plurality of functions required for realizing the target service in response to the selection operation of a user on the target service, wherein the plurality of functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment;
acquiring first data corresponding to the first function;
acquiring second data corresponding to the second function;
and according to the first data and the second data, realizing the target service.
Optionally, the processor 31 is further configured to perform all or part of the steps in the embodiments shown in fig. 1 to 7.
The electronic device may further include a communication interface 33 in the structure for the electronic device to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium storing computer software instructions for the electronic device, which includes a program for executing the service providing method in the embodiment of the method shown in fig. 1 to 7.
Fig. 14 is a schematic structural diagram of another service providing apparatus according to an embodiment of the present invention, as shown in fig. 14, the apparatus includes:
The determining module 41 is configured to determine, in response to a selection operation of the motion state control service by the first user, that a plurality of functions required for implementing the motion state control service include a display function provided by the smart speaker and a sensing data acquisition function provided by the first mobile terminal device used by the second user.
The first obtaining module 42 is configured to obtain video data corresponding to the display function.
The second obtaining module 43 is configured to obtain first sensing data corresponding to the sensing data collecting function, where the first sensing data reflects a motion state of the first mobile terminal device.
And the control module 44 is configured to control, according to the first sensing data, a motion state of a target object corresponding to the second user in the video frame to be the same as a motion state of the first mobile terminal device, so that the intelligent sound box realizes a motion state control service.
Optionally, the plurality of functions further includes a sensing data acquisition function provided by a second mobile terminal device used by the first user.
The second obtaining module 43 is further configured to obtain second sensing data corresponding to the sensing data collecting function, where the second sensing data reflects a motion state of the second mobile terminal device.
The control module 44 is further configured to control, according to the second sensing data, a motion state of a target object corresponding to the first user in the video frame to be the same as a motion state of the second mobile terminal device, so that the intelligent sound box realizes a motion state control service.
The apparatus shown in fig. 14 may perform the method of the embodiment shown in fig. 9 to 10, and reference is made to the relevant description of the embodiment shown in fig. 9 to 10 for a part of this embodiment that is not described in detail. The implementation process and technical effects of this technical solution are described in the embodiments shown in fig. 9 to 10, and are not described herein.
The internal functions and structures of the service providing apparatus are described above, and in one possible design, the structure of the service providing apparatus may be implemented as an electronic device, as shown in fig. 15, which may include: a processor 51 and a memory 52. Wherein the memory 52 is for storing a program for supporting the electronic device to execute the service providing method provided in the embodiments shown in fig. 9 to 10 described above, and the processor 51 is configured to execute the program stored in the memory 52.
The program comprises one or more computer instructions which, when executed by the processor 51, are capable of carrying out the steps of:
Responding to the selection operation of the first user on the motion state control service, and determining that a plurality of functions required for realizing the motion state control service comprise a display function provided by the intelligent sound box and a sensing data acquisition function provided by first mobile terminal equipment used by a second user;
acquiring video data corresponding to the display function;
acquiring first sensing data corresponding to the sensing data acquisition function, wherein the first sensing data reflects the motion state of the first mobile terminal equipment;
and controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the first sensing data, so that the intelligent sound box realizes motion state control service.
Optionally, the processor 51 is further configured to perform all or part of the steps in the embodiments shown in fig. 9 to 10.
The electronic device may further include a communication interface 53 in the structure of the electronic device, for the electronic device to communicate with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium storing computer software instructions for the electronic device, which includes a program for executing the service providing method according to the embodiment of the method shown in fig. 9 to 10.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (23)

1. The service providing method is characterized by being applied to the intelligent sound box and comprising the following steps:
determining a plurality of functions required for realizing the target service in response to the selection operation of a user on the target service, wherein the plurality of functions comprise a first function provided by the intelligent sound box and a second function provided by mobile terminal equipment;
acquiring first data corresponding to the first function;
acquiring second data corresponding to the second function;
and according to the first data from the intelligent sound box and the second data from the mobile terminal equipment, realizing the target service.
2. The method of claim 1, wherein the obtaining the second data corresponding to the second function comprises:
Inquiring a preset interface corresponding to the second function in the mobile terminal equipment in a preset configuration file recorded with the corresponding relation between the interface and the function;
and acquiring second data corresponding to the second function by calling the preset interface.
3. The method of claim 1, wherein the first function is a display function and the first data is video data; the second function is a sensing data acquisition function, and the second data is sensing data describing the motion state of the mobile terminal equipment;
the implementation of the target service according to the first data and the second data includes:
and controlling the motion state of a target object in a video picture to be the same as the motion state of the mobile terminal equipment according to the sensing data so as to enable the intelligent sound box to realize motion state control service.
4. A method according to claim 3, wherein the obtaining the first data corresponding to the first function includes:
responding to the starting operation of the user to the intelligent sound box trigger, and acquiring and displaying the video data;
the obtaining the second data corresponding to the second function includes:
And responding to the mobile operation triggered by the user on the mobile terminal equipment, and acquiring the sensing data acquired by the sensor of the mobile terminal equipment in the motion process.
5. The method of claim 1, wherein the first function is a display function and the first data is video data; the second function is an input function, and the second data is interaction data which is input by the user and is specific to the video data;
the implementation of the target service according to the first data and the second data includes:
and displaying the interaction data and the video data so as to enable the intelligent sound box to realize interaction service.
6. The method of claim 5, wherein the obtaining the first data corresponding to the first function comprises:
responding to the starting operation of the user on the triggering of the intelligent sound box, and acquiring the video data;
the obtaining the second data corresponding to the second function includes:
and responding to the input operation triggered by the user to the mobile terminal equipment, and acquiring the interaction data input by the user in the mobile terminal equipment.
7. The method of claim 1, wherein the first function is a voice interaction function, and the first data is a voice command input by the user to the intelligent speaker; the second function is an image recognition function, and the second data is a recognition result of the image;
The obtaining the second data corresponding to the second function includes:
sending the voice command to the mobile terminal equipment so as to control the mobile terminal equipment to shoot an image and identify an object contained in the shot image;
acquiring an identification result generated by the mobile terminal equipment;
the implementation of the target service according to the first data and the second data includes:
and outputting the identification result so as to enable the intelligent sound box to realize identification service.
8. The method of claim 1, wherein before determining the plurality of functions required to implement the target service in response to a user selection operation of the target service, the method further comprises:
responding to the connection operation triggered by the user on the intelligent sound box, and sending a connection establishment message;
receiving identification information of the mobile terminal equipment, wherein the identification information is sent after the mobile terminal equipment receives the connection establishment message;
and if the identification information of the intelligent sound box is matched with the identification information of the mobile terminal equipment, establishing communication connection between the intelligent sound box and the mobile terminal equipment.
9. The method of claim 1, wherein before determining the plurality of functions required to implement the target service in response to a user selection operation of the target service, the method further comprises:
receiving identification information of the mobile terminal equipment in response to a scanning operation triggered by the user on the mobile terminal equipment;
and establishing communication connection between the intelligent sound box and the mobile terminal equipment.
10. A service providing method, applied to an intelligent speaker used by a first user, comprising:
responding to the selection operation of the first user on the motion state control service, and determining that a plurality of functions required for realizing the motion state control service comprise a display function provided by the intelligent sound box and a sensing data acquisition function provided by first mobile terminal equipment used by a second user;
acquiring video data corresponding to the display function;
acquiring first sensing data corresponding to the sensing data acquisition function, wherein the first sensing data reflects the motion state of the first mobile terminal equipment;
and controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the first sensing data, so that the intelligent sound box realizes motion state control service.
11. The method of claim 10, wherein the plurality of functions further comprises a sensing data acquisition function provided by a second mobile terminal device used by the first user;
after the video data corresponding to the display function is acquired, the method further includes:
acquiring second sensing data corresponding to the sensing data acquisition function, wherein the second sensing data reflects the motion state of the second mobile terminal equipment;
and controlling the motion state of the target object corresponding to the first user in the video picture to be the same as the motion state of the second mobile terminal equipment according to the second sensing data, so that the intelligent sound box realizes motion state control service.
12. A service providing system, the system comprising: an intelligent sound box and mobile terminal equipment;
the intelligent sound box is used for responding to the selection operation of a user on a target service, determining a plurality of functions required by realizing the target service, wherein the plurality of functions comprise a first function provided by the intelligent sound box and a second function provided by the mobile terminal equipment; acquiring first data corresponding to the first function; acquiring second data corresponding to the second function sent by the mobile terminal equipment; and implementing the target service according to the first data from the intelligent sound box and the second data from the mobile terminal equipment.
13. The system of claim 12, wherein the intelligent speaker is specifically configured to:
responding to the starting operation triggered by the user on the intelligent sound box, acquiring and displaying video data, wherein the video data corresponds to the first function;
responding to the mobile operation triggered by the user on the mobile terminal equipment, and acquiring sensing data acquired by a sensor of the mobile terminal equipment in the motion process, wherein the sensing data corresponds to the second function;
and controlling the motion state of a target object in a video picture to be the same as the motion state of the mobile terminal equipment according to the sensing data so as to enable the intelligent sound box to realize motion state control service.
14. The system of claim 12, wherein the intelligent speaker is specifically configured to:
responding to the starting operation triggered by the user on the intelligent sound box, and acquiring video data, wherein the video data corresponds to the first function;
responding to the input operation triggered by the user to the mobile terminal equipment, and acquiring interaction data input by the user in the mobile terminal equipment, wherein the interaction data corresponds to the second function;
And displaying the interaction data and the video data so as to enable the intelligent sound box to realize interaction service.
15. The system of claim 12, wherein the intelligent speaker is specifically configured to:
responding to a voice command input by the user to the intelligent sound box, and sending the voice command to the mobile terminal equipment, wherein the voice command corresponds to the first function; outputting a recognition result generated by the mobile terminal device to enable the intelligent sound box to realize recognition service, wherein the recognition result corresponds to the second function;
the mobile terminal device is specifically configured to: capturing an image in response to the voice instruction; and identifying the photographed image to obtain the identification result.
16. A service providing system, comprising: an intelligent sound box used by a first user and a first mobile terminal device used by a second user;
the intelligent sound box is used for responding to the selection operation of the first user on the motion state control service, and determining that the multiple functions required by realizing the motion state control service comprise a display function provided by the intelligent sound box and a sensing data acquisition function provided by the first mobile terminal equipment;
Acquiring video data corresponding to the display function and first sensing data corresponding to the sensing data acquisition function, wherein the first sensing data reflects the motion state of the first mobile terminal equipment;
and controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the first sensing data, so that the intelligent sound box realizes motion state control service.
17. The system of claim 16, wherein the system further comprises: the second mobile terminal equipment is used by the first user and provides a sensing data acquisition function;
the intelligent sound box is further used for acquiring second sensing data corresponding to the sensing data acquisition function, and the second sensing data reflects the motion state of the second mobile terminal equipment;
and controlling the motion state of the target object corresponding to the first user in the video picture to be the same as the motion state of the second mobile terminal equipment according to the second sensing data, so that the intelligent sound box realizes motion state control service.
18. A service providing apparatus, comprising:
The system comprises a determining module, a processing module and a control module, wherein the determining module is used for determining a plurality of functions required by realizing target service in response to the selection operation of a user on the target service, and the plurality of functions comprise a first function provided by an intelligent sound box and a second function provided by mobile terminal equipment;
the first acquisition module is used for acquiring first data corresponding to the first function;
the second acquisition module is used for acquiring second data corresponding to the second function;
and the response module is used for realizing the target service according to the first data from the intelligent sound box and the second data from the mobile terminal equipment.
19. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the service providing method of any of claims 1 to 9.
20. A non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the service providing method of any of claims 1 to 9.
21. A service providing apparatus, comprising:
the system comprises a determining module, a first control module and a second control module, wherein the determining module is used for responding to the selection operation of a first user on a motion state control service and determining that a plurality of functions required for realizing the motion state control service comprise a display function provided by an intelligent sound box used by the first user and a sensing data acquisition function provided by first mobile terminal equipment used by a second user;
the first acquisition module is used for acquiring video data corresponding to the display function;
the second acquisition module is used for acquiring sensing data corresponding to the sensing data acquisition function, and the sensing data reflects the motion state of the first mobile terminal equipment;
and the control module is used for controlling the motion state of the target object corresponding to the second user in the video picture to be the same as the motion state of the first mobile terminal equipment according to the sensing data, so that the intelligent sound box realizes motion state control service.
22. An electronic device, comprising: a memory, a processor; wherein the memory has stored thereon executable code which, when executed by the processor, causes the processor to perform the service providing method of claim 10 or 11.
23. A non-transitory machine-readable storage medium having stored thereon executable code which, when executed by a processor of an electronic device, causes the processor to perform the service providing method of claim 10 or 11.
CN202010556222.7A 2020-06-17 2020-06-17 Service providing method, system, device, equipment and storage medium Active CN113810253B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010556222.7A CN113810253B (en) 2020-06-17 2020-06-17 Service providing method, system, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010556222.7A CN113810253B (en) 2020-06-17 2020-06-17 Service providing method, system, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113810253A CN113810253A (en) 2021-12-17
CN113810253B true CN113810253B (en) 2023-06-20

Family

ID=78892761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010556222.7A Active CN113810253B (en) 2020-06-17 2020-06-17 Service providing method, system, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113810253B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114900383B (en) * 2022-03-28 2024-04-19 青岛海尔科技有限公司 Interface processing method, device, electronic equipment and computer readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107395742A (en) * 2017-08-16 2017-11-24 歌尔科技有限公司 Network communication method and intelligent sound box based on intelligent sound box
CN109803003A (en) * 2018-12-29 2019-05-24 华为技术有限公司 A kind of control method, system and relevant device
CN110557699A (en) * 2019-09-11 2019-12-10 百度在线网络技术(北京)有限公司 intelligent sound box interaction method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101442500A (en) * 2007-11-19 2009-05-27 殷小涛 Control method for multipartite voice interaction, share and decentralization of display content
CN105902052A (en) * 2016-05-26 2016-08-31 郭沛民 Sensory integration intelligence development chair for infants
CN107589725B (en) * 2017-08-29 2020-03-24 深圳市盛路物联通讯技术有限公司 Smart home equipment control method and control equipment based on built-in antenna
CN107743289B (en) * 2017-10-25 2020-08-07 宁波向往智能科技有限公司 Intelligent sound box control method in intelligent household scene
CN109068232A (en) * 2018-10-29 2018-12-21 歌尔科技有限公司 A kind of interaction control method of combination sound box, device and combination sound box

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107395742A (en) * 2017-08-16 2017-11-24 歌尔科技有限公司 Network communication method and intelligent sound box based on intelligent sound box
CN109803003A (en) * 2018-12-29 2019-05-24 华为技术有限公司 A kind of control method, system and relevant device
CN110557699A (en) * 2019-09-11 2019-12-10 百度在线网络技术(北京)有限公司 intelligent sound box interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN113810253A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN106165430B (en) Net cast method and device
KR101680714B1 (en) Method for providing real-time video and device thereof as well as server, terminal device, program, and recording medium
CN106488251B (en) Realize the method and device, main broadcaster's client and user client for connecting wheat in live streaming
KR20230039768A (en) Filming method, device, electronic device and computer readable storage medium
CN106534667B (en) Distributed collaborative rendering method and terminal
CN112905074B (en) Interactive interface display method, interactive interface generation method and device and electronic equipment
CN109151565B (en) Method and device for playing voice, electronic equipment and storage medium
WO2022142944A1 (en) Live-streaming interaction method and apparatus
JP6385429B2 (en) Method and apparatus for reproducing stream media data
US11523146B2 (en) Live broadcast method and apparatus, electronic device, and storage medium
CN114430494B (en) Interface display method, device, equipment and storage medium
CN114009003A (en) Image acquisition method, device, equipment and storage medium
CN114168235A (en) Function switching entry determining method and electronic equipment
CN114845129B (en) Interaction method, device, terminal and storage medium in virtual space
CN112261481A (en) Interactive video creating method, device and equipment and readable storage medium
CN111432284A (en) Bullet screen interaction method of multimedia terminal and multimedia terminal
CN113996053A (en) Information synchronization method, device, computer equipment, storage medium and program product
CN113918110A (en) Screen projection interaction method, device, system, storage medium and product
CN113810253B (en) Service providing method, system, device, equipment and storage medium
CN111447365A (en) Shooting method and electronic equipment
WO2019006768A1 (en) Parking space occupation method and device based on unmanned aerial vehicle
CN109788327B (en) Multi-screen interaction method and device and electronic equipment
CN109104633B (en) Video screenshot method and device, storage medium and mobile terminal
CN114268823A (en) Video playing method and device, electronic equipment and storage medium
WO2019076202A1 (en) Multi-screen interaction method and apparatus, and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant