CN112578909B - Method and device for equipment interaction

Method and device for equipment interaction

Info

Publication number
CN112578909B
Authority
CN
China
Prior art keywords
determining
target object
interaction
image frame
volume
Prior art date
Legal status
Active
Application number
CN202011471453.4A
Other languages
Chinese (zh)
Other versions
CN112578909A (en
Inventor
钟鹏飞
张宁
廖加威
任晓华
车炜春
黄晓琳
董粤强
赵慧斌
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011471453.4A priority Critical patent/CN112578909B/en
Publication of CN112578909A publication Critical patent/CN112578909A/en
Application granted granted Critical
Publication of CN112578909B publication Critical patent/CN112578909B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a method and a device for equipment interaction, and relates to artificial intelligence technology. The specific implementation scheme is as follows: the first device monitors for sound signals; it executes a first interaction action according to the monitored sound signal; if no sound signal is monitored, the first device judges whether an object exists within a preset field of view according to the captured image frames; and if so, it executes a second interaction action according to the captured image frames. By combining the monitoring of sound with the capture of image frames, the user is monitored cooperatively from both the auditory and visual aspects, which effectively improves the accuracy of user monitoring; and because the shooting range of the camera is limited, the sound collection device is used to monitor sound first, which further improves the efficiency and success rate of user monitoring.

Description

Method and device for equipment interaction
Technical Field
The present application relates to an artificial intelligence technology in computer technology, and in particular, to a method and apparatus for device interaction.
Background
With the continuous development of computer technology, intelligent devices such as robots play an increasingly important role in daily life.
The intelligent device generally needs to interact with a user to realize its corresponding functions. At present, when an intelligent device implements interaction, a camera monitors whether a user is present within the range of the intelligent device, and if a user is present, the intelligent device actively interacts with the user.
However, since the shooting range of the camera is limited, performing user monitoring with the camera alone results in a low success rate of user monitoring.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for equipment interaction.
According to an aspect of the present application, there is provided a method of device interaction, comprising:
The first device monitors sound signals;
executing a first interaction action according to the monitored sound signal;
If the sound signal is not monitored, the first device judges whether an object exists in a preset visual field range according to the shot image frame;
and if so, executing a second interaction action according to the shot image frame.
According to another aspect of the present application, there is provided an apparatus for device interaction, comprising:
the monitoring module is used for monitoring the sound signal by the first equipment;
The processing module is used for executing a first interaction action according to the monitored sound signal;
The judging module is used for judging whether an object exists in a preset visual field range according to the shot image frame if the sound signal is not monitored;
and the processing module is also used for executing a second interaction action according to the shot image frame if the object exists.
According to another aspect of the present application, there is provided an electronic apparatus including:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to another aspect of the application there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method as described above.
According to another aspect of the present application, there is provided a computer program product comprising: a computer program stored in a readable storage medium, from which it can be read by at least one processor of an electronic device, the at least one processor executing the computer program causing the electronic device to perform the method of the first aspect.
The technology solves the problem of low success rate of user monitoring.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
fig. 1 is a schematic structural diagram of a robot according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of device interaction provided by an embodiment of the present application;
FIG. 3 is a second flowchart of a method for device interaction according to an embodiment of the present application;
fig. 4 is a schematic diagram of implementation of a listening direction provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of an implementation of determining a target azimuth from one first azimuth according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an implementation of determining a target azimuth from a plurality of first azimuths according to an embodiment of the present application;
FIG. 7 is a third flowchart of a method for device interaction provided by an embodiment of the present application;
FIG. 8 is a first schematic diagram of a preset field of view according to an embodiment of the present application;
FIG. 9 is a second schematic diagram of a preset field of view according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an implementation of determining a target object from one object according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an implementation of determining a target object from a plurality of objects according to an embodiment of the present application;
FIG. 12 is a schematic diagram of an implementation of determining a predicted walking path according to an embodiment of the present application;
FIG. 13 is a schematic diagram of an interaction range provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of interaction directions provided by an embodiment of the present application;
FIG. 15 is a flow chart of a method for device interaction according to an embodiment of the present application;
FIG. 16 is a schematic structural view of an apparatus for device interaction according to an embodiment of the present application;
FIG. 17 is a block diagram of an electronic device for implementing a method of device interaction of an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
For a better understanding of the technical solution of the present application, the following further details are provided for the related background of the present application:
With the continuous development of science and technology, intelligent devices play an increasingly important role in daily life. An intelligent device refers to a device that has sensing, reasoning and judgment, and computing capabilities and can execute related functions.
In one possible implementation, the intelligent device may be, for example, a robot, that is, an intelligent device capable of semi-autonomous or fully autonomous operation. It can be understood that robots can be divided into many different categories according to appearance, use, and the like; the specific category of the robot is not limited in the present application and may be selected according to actual needs.
The intelligent device generally needs to interact with a user to realize its corresponding functions. At present, when an intelligent device implements interaction, a camera monitors whether a user is present within the range of the intelligent device, and if a user is present, the intelligent device actively interacts with the user.
However, since the shooting range of the camera is limited, performing user monitoring with the camera alone results in a low success rate of user monitoring.
In view of the problems in the prior art, the present application proposes the following technical concept: the sound collection device and the camera of the intelligent device are used cooperatively to monitor the user, which improves the success rate of user monitoring; and because the shooting range of the camera is limited, the sound collection device is used to monitor sound first, which further improves the efficiency and success rate of user monitoring.
Taking a robot as an example, the structure of the robot will be described with reference to fig. 1, and fig. 1 is a schematic structural diagram of the robot according to an embodiment of the present application.
As shown in fig. 1, the robot includes:
an image acquisition device, a sound collection device, a motion chassis, and an expression display system device.
The image acquisition device may be, for example, a camera that captures image frames; the robot can monitor a user and interact with the user according to the captured image frames. In an actual implementation, one or more image acquisition devices may be installed on the robot, which is not limited in this embodiment, and the specific position of the camera may also be selected according to actual requirements; the arrangement in fig. 1 is only an exemplary introduction.
The sound collection device can collect sound around the robot, and the user can be monitored according to the collected sound. Therefore, in this embodiment, the sound collection device and the image acquisition device can be combined to monitor the user from both the auditory and visual aspects.
A plurality of sound collection devices may be installed on the robot at different positions, so as to collect sound from each direction. In an actual implementation, the specific number and positions of the sound collection devices may be selected according to actual requirements, which is not specially limited in this embodiment.
The motion chassis enables the robot to move, for example to translate or to rotate its orientation, so that interaction with a user can be realized.
The expression display system device of the robot can display different expressions; for example, an expression corresponding to the content of the current interaction can be displayed. Compared with the prior art in which the robot interacts with the user only through limb actions, the robot here can interact with the user by combining limb actions and expression feedback, which improves the diversity and flexibility of human-computer interaction and improves the user experience.
Based on the above structure of the robot, in the implementation scheme of the present application, taking the image acquisition device as a camera as an example, the image captured by the camera within its effective recognition range may be preprocessed to determine the area where the user is located. The distance between the user and the robot is then measured. The user's walking path is predicted from the features of the image frame sequence, and it is judged whether the walking path enters the robot's interaction area. The robot then gives corresponding feedback through limb actions and expressions within the effective recognition range.
The sound collection devices in different directions collect sound signals of the target area in real time within the effective recognition range, and the sound decibel values in the different directions are monitored in real time. The direction from which the user is speaking is judged by the difference between the recognized volume and the environmental volume: the direction whose sound differs most from the environmental volume is the direction of the user, which defines the position of the user relative to the robot. When multi-person interaction is recognized, the position of the user can also be judged through sound source localization, and the robot gives corresponding feedback through limb actions and expressions within the effective range. In this way, monitoring of and interaction with the user can be achieved based on visual and auditory coordination.
The method for device interaction provided by the present application is described below with reference to specific embodiments, and fig. 2 is a flowchart of the method for device interaction provided by the embodiment of the present application.
As shown in fig. 2, the method includes:
S201, the first device monitors a sound signal.
In this embodiment, the first device may be, for example, the robot described above, or any other possible intelligent device, as long as it can monitor sound, capture images, and interact with a user; the specific implementation of the first device is not specially limited in this embodiment.
In one possible implementation, the first device may include sound collection devices arranged in a plurality of different directions. When monitoring sound signals, the first device listens from these different directions; if the sound collection device in any direction monitors a sound signal, the first device determines that a sound signal has been monitored.
Conversely, if none of the sound collection devices in any direction monitors a sound signal, it is determined that the first device has not monitored a sound signal.
S202, executing a first interaction action according to the monitored sound signal.
In one possible implementation, when the first device listens to the sound signal, the first device may determine that there is a user around the first device, that is, the first device is currently monitoring the user from the auditory level, and at this time, the first device may actively interact with the user.
The first device may execute a first interaction action according to the monitored sound signal. In one possible implementation, the first interaction action may include a limb action of the first device and may further include an expression action; for example, the first device may analyze the monitored sound signal and then perform the corresponding limb action and expression action.
In another possible implementation, since the first device can monitor sound signals from different directions, the first device may further determine a target azimuth according to the monitored sound signal of at least one direction. The target azimuth may be, for example, the direction in which the sound signal is loudest, or the direction in which the sound signal is clearest, which is not limited in this embodiment. After determining the target azimuth, the first device may turn its interaction surface toward the target azimuth and perform the first interaction action described above for the user in that direction.
In an actual implementation, the specific implementation of the first interaction action can be selected according to actual requirements; this embodiment does not limit it, as long as it is an interaction performed by the first device with the user.
And S203, if the sound signal is not monitored, the first device judges whether an object exists in a preset visual field range according to the shot image frame.
In another possible implementation, the first device may fail to monitor a sound signal. For example, the first device may listen for sound signals within a preset duration, and if no sound signal is monitored within the preset duration, it determines that no sound signal is monitored; the specific setting of the preset duration may be selected according to actual requirements.
Alternatively, the first device may not set a preset duration but listen for sound signals in real time; if no sound signal is monitored at the current moment, it immediately determines that no sound signal is monitored. The specific way of determining that no sound signal is monitored is not limited in this embodiment and may be selected according to actual requirements.
However, the absence of a sound signal does not necessarily mean that there is no user around the first device; a surrounding user may simply not be making any sound. The first device may therefore continue to capture image frames with the camera and monitor for a user according to the captured image frames.
It can be understood that the camera can capture images within a certain range. The camera of the first device therefore captures image frames within a preset field of view, and the first device analyzes the captured image frames to determine whether an object exists within the preset field of view.
The image frames may be analyzed, for example, to determine whether an object is present in them, where the object may be, for example, a user. If at least one object exists in the image frame, it may be determined that a user exists within the preset field of view; the number of objects included in the image frame is not limited in this embodiment and may be determined according to the actual situation. If no object exists in the image frame, it can be determined that no user exists within the preset field of view.
And S204, if the image frame exists, executing a second interaction action according to the shot image frame.
In one possible implementation, if the first device determines that the object exists in the preset field of view according to the captured image frame, the first device may determine that the user exists around the first device at this time, that is, the first device monitors the user from the visual aspect at this time.
Then, the first device may execute a second interaction action according to the captured image frame. The second interaction action is similar to the first interaction action described above and may be, for example, a limb action or an expression action executed by the first device after analyzing the captured image frame; the specific implementation of the second interaction action is not limited in this embodiment.
It should be noted that the first interaction action described above is performed according to the monitored sound signal, and the second interaction action is performed according to the captured image frame. In another possible implementation, the user may additionally be monitored according to the image frames after a sound signal is monitored, or according to the sound signal after being monitored in the image frames, and the final interaction action may then be determined from the sound signal and the image frames together; the specific implementation of the interaction performed by the first device is not specially limited and may be selected according to actual requirements.
The method for equipment interaction provided by this embodiment of the application includes: the first device monitors for sound signals; it executes a first interaction action according to the monitored sound signal; if no sound signal is monitored, the first device judges whether an object exists within a preset field of view according to the captured image frames; and if so, it executes a second interaction action according to the captured image frames. By combining the monitoring of sound with the capture of image frames, the user is monitored cooperatively from both the auditory and visual aspects, which effectively improves the accuracy of user monitoring; and because the shooting range of the camera is limited, the sound collection device is used to monitor sound first, which further improves the efficiency and success rate of user monitoring.
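For illustration only, the decision flow of S201 to S204 can be summarized as the following non-limiting Python sketch; the device object and all of its methods (listen_for_sound, capture_frame, detect_objects, and the two interaction callbacks) are hypothetical placeholders and are not part of the present application.

```python
# Non-limiting sketch of the S201-S204 flow; every device interface used here
# is a hypothetical placeholder for the first device's hardware and software.

def interaction_cycle(device) -> None:
    """One monitoring cycle: auditory monitoring first, then visual monitoring."""
    # S201: listen for sound signals via the sound collection devices.
    sound = device.listen_for_sound()        # assumed to return None if nothing is heard
    if sound is not None:
        # S202: a user has been monitored from the auditory aspect.
        device.perform_first_interaction(sound)
        return

    # S203: no sound signal monitored, so fall back to the camera.
    frame = device.capture_frame()
    objects = device.detect_objects(frame)   # e.g. pedestrian detection on the frame
    if objects:
        # S204: a user has been monitored from the visual aspect.
        device.perform_second_interaction(frame, objects)
```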
On the basis of the above embodiment, the two parts of the present application, interaction based on auditory monitoring and interaction based on visual monitoring, are described below with reference to specific embodiments.
First, an implementation of interaction based on auditory monitoring is described with reference to fig. 3 to 6. Fig. 3 is a second flowchart of a method for device interaction provided by an embodiment of the present application, fig. 4 is a schematic diagram of an implementation of listening directions provided by an embodiment of the present application, fig. 5 is a schematic diagram of an implementation of determining a target azimuth from one first azimuth provided by an embodiment of the present application, and fig. 6 is a schematic diagram of an implementation of determining a target azimuth from a plurality of first azimuths provided by an embodiment of the present application.
As shown in fig. 3, the method includes:
S301, the first device listens for sound signals from different directions of the first device.
In this embodiment, the first device may include sound collection devices disposed in different directions; the sound collection device in each direction collects sound signals from the corresponding direction, so that sound signals from different directions of the first device can be monitored.
In one possible implementation, the number of sound collection devices on the first device may be, for example, 6, and the 6 sound collection devices may be arranged uniformly on the first device so as to collect sound in all 360 degrees around the robot. It can be understood that the first device can rotate 360 degrees within its active area; since sound signals are monitored over 360 degrees, sound can be monitored comprehensively no matter which direction the first device is facing.
For example, as can be understood with reference to fig. 4, assume that 6 sound collection devices are currently installed on the first device so as to monitor sound signals from six directions, namely direction 301, direction 302, direction 303, direction 304, direction 305, and direction 306.
Alternatively, in an actual implementation, the number of sound collection devices arranged on the first device may be 4 or 8; this embodiment does not limit the number, which may be selected according to actual requirements, as long as sound signals in different directions can be collected.
By monitoring sound signals from different directions of the first device, the user can be detected no matter at which angle the user stands relative to the first device, so the accuracy and success rate of user monitoring can be effectively improved.
S302, the first device determines the respective recognition volumes of the sound signals in different directions.
The first device collects sound signals in different directions, and can monitor the decibel values of the sound signals in different directions in real time, so that the respective recognition volumes of the sound signals in different directions are determined, wherein the recognition volumes are the decibel values of the sound signals.
S303, the first device judges whether the difference value of the identification volume between any two directions is smaller than or equal to a first threshold value according to the sound signals of different directions, if so, S304 is executed, and if not, S305 is executed.
In this embodiment, the first device needs to determine the specific sounding direction of the user based on the monitored sound signals of each direction, so as to interact with the user in a targeted manner.
In one possible implementation, if the first device listens in different directions but monitors a sound signal in only one direction, it may be determined that this direction is the direction of the user.
In another possible implementation, if sound signals are monitored in several different directions, the target direction in which the user is located needs to be determined among those directions so that targeted interaction with the user can be performed.
In one possible implementation of determining the target direction among the different directions, the direction from which the user is speaking, and hence the position of the user relative to the robot, may be determined by comparing the recognition volume of the sound signal in each direction with the current environmental volume: the direction whose recognition volume differs most from the environmental volume is taken as the direction of the user. The current environmental volume therefore needs to be determined first.
The environmental volume may be understood as the average of the currently collected sound signals in all directions. However, to ensure the accuracy of the determined environmental volume, when the sound signal in some direction is too loud or too quiet, the excessively high or low sound signal should be removed before the environmental volume is determined.
Therefore, in this embodiment, it may first be determined whether the difference in recognition volume between any two directions is less than or equal to a first threshold; the specific value of the first threshold may be selected according to actual requirements and is not specially limited in this embodiment.
In one possible implementation of determining the first threshold, an initial environmental volume reference value may be set. For example, the volume of a quiet indoor environment is about 45 decibels, so 45 decibels may be used as the initial reference value, i.e. as the first threshold described above. After the first device is started, it performs a self-check and, by comparison with the first threshold, automatically determines a numerical reference for the environmental volume at its location, so that the environmental volume can be calculated in real time.
S304, determining an average value of the identification volume of each sound signal in each direction as the environment volume.
In one possible implementation, if the difference between the recognition volumes of any two directions is less than or equal to the first threshold, which indicates that the difference between the recognition volumes of the sound signals of the directions is not large, then an average value of the respective recognition volumes of the sound signals of the directions may be determined, and the average value may be determined as the environmental volume.
Here, for example, assume that the recognition volumes of the sound signals in the current 6 directions are 67 decibels, 95 decibels, 101 decibels, 88 decibels, 66 decibels, and 71 decibels, and that the first threshold is set to 45. In this example the difference in recognition volume between any two directions is less than 45, so the average of the recognition volumes of the sound signals of the 6 directions, which is 81.3, is determined as the environmental volume.
In an actual implementation, the recognition volumes of the sound signals in each direction depend on the actual situation; the above values are merely an example and are not specially limited in this embodiment.
S305, removing the maximum recognition volume and the minimum recognition volume from the recognition volumes of the sound signals in all directions, and determining the average value of the rest recognition volumes as the environment volume.
In another possible implementation, if it is not the case that the difference in recognition volume between any two directions is less than or equal to the first threshold, that is, there exist two directions whose recognition volumes differ by more than the first threshold, then the sound signal in some direction differs greatly in volume from the sound signals in the remaining directions.
In order to ensure the accuracy of the finally determined environmental volume, the maximum recognition volume and the minimum recognition volume may first be removed from the recognition volumes of the sound signals in each direction, and the average of the remaining recognition volumes may then be determined as the environmental volume.
Here, for example, assume that the recognition volumes of the sound signals in the current 6 directions are 67 decibels, 25 decibels, 145 decibels, 88 decibels, 66 decibels, and 71 decibels, and that the first threshold is set to 45. In this example it is not true that the difference in recognition volume between any two directions is less than 45; for instance, the difference between 25 decibels and 145 decibels is 120, which exceeds the first threshold of 45. The maximum recognition volume of 145 and the minimum recognition volume of 25 are therefore removed, and the average of the remaining recognition volumes 67, 88, 66, and 71, which is 73, is determined as the environmental volume.
In an actual implementation, the recognition volumes of the sound signals in each direction depend on the actual situation; the above values are merely an example and are not specially limited in this embodiment.
It can be understood that in this embodiment the difference in recognition volume between the sound signals of any two directions is examined first. When every such difference is no greater than the first threshold, the average of the recognition volumes of all directions is determined as the environmental volume, which determines the environmental volume simply and efficiently. When some difference exceeds the first threshold, the maximum and minimum recognition volumes are removed first and the average of the remaining recognition volumes is determined as the environmental volume, which prevents an excessively loud or quiet sound in some direction from distorting the environmental volume and ensures the accuracy of the determined environmental volume.
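As a non-limiting illustration of S303 to S305, the environmental volume can be computed from the per-direction recognition volumes as sketched below; the function name and the default first threshold of 45 follow the example above and are assumptions.

```python
def environmental_volume(recognition_volumes, first_threshold=45.0):
    """Environmental volume per S303-S305 (illustrative sketch, not a limitation).

    If the difference in recognition volume between any two directions is at most
    the first threshold, the average of all volumes is used (S304); otherwise the
    maximum and minimum volumes are removed before averaging (S305).
    """
    # The largest pairwise difference equals max minus min.
    if max(recognition_volumes) - min(recognition_volumes) <= first_threshold:
        samples = recognition_volumes                    # S304
    else:
        samples = sorted(recognition_volumes)[1:-1]      # S305: drop the extremes
    return sum(samples) / len(samples)


# Examples from the description:
# environmental_volume([67, 95, 101, 88, 66, 71])  -> about 81.3
# environmental_volume([67, 25, 145, 88, 66, 71])  -> 73.0
```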
S306, at least one first azimuth whose recognition volume is greater than or equal to a second threshold is determined.
Based on the above description, in one possible implementation, when a sound signal is monitored in only one direction, that direction can be taken as the direction of the user; the number of directions in which sound signals are monitored therefore needs to be determined.
To ensure the accuracy of the monitored sound signals, the recognition volume of each monitored sound signal may be compared with the second threshold, and a direction whose recognition volume is greater than or equal to the second threshold is determined as a first azimuth, that is, an azimuth in which a sound signal is considered to have been monitored.
For example, the second threshold may be set to 0, meaning that a direction is determined to be a first azimuth as long as any sound signal is monitored in that direction.
For another example, the second threshold may be set to 3, meaning that a monitored sound signal is treated as a sound signal to be processed only when its recognition volume is 3 decibels or more.
In an actual implementation, the specific setting of the second threshold may be selected according to actual requirements and is not limited in this embodiment. By comparing the recognition volume with the second threshold to determine the first azimuths in which sound is monitored, the accuracy of the determined azimuths can be effectively ensured.
S307, judging whether the number of the first azimuth is one, if so, executing S308, and if not, executing S309.
After determining at least one first orientation, if the number of first orientations is one, the user orientation can be directly determined at this time, and if the number of first orientations is more than one, further determination is needed, so that whether the number of first orientations is one can be determined.
S308, determining the first azimuth as a target azimuth.
In one possible implementation, if the number of first azimuths is one, that first azimuth may be determined as the target azimuth. For example, as shown in fig. 5, there are currently 6 directions 501, 502, 503, 504, 505, and 506. Assuming the second threshold is 0 and that only the sound signal of direction 504 has a recognition volume greater than the second threshold, that is, a sound signal is monitored only in direction 504, direction 504 may be determined as the target azimuth.
S309, respectively determining differences between the identification volume and the environment volume of each first azimuth, and determining the first azimuth with the largest difference as the target azimuth.
In another possible case, if the number of first azimuths is greater than one, sound signals are currently monitored in several different directions, and the user direction for interaction needs to be determined among them.
In one possible implementation of determining the target azimuth, given the environmental volume determined above, the difference between the recognition volume of each first azimuth and the environmental volume may be determined, and the first azimuth with the largest difference is determined as the target azimuth.
For example, as can be understood with reference to fig. 6, assume there are 6 directions 601, 602, 603, 604, 605, and 606, and take the second threshold as 0. Assume that the recognition volumes of the sound signals in the three directions 602, 604, and 606 are greater than the second threshold, that is, sound signals are monitored in all three of these directions; it may then be determined that there are three first azimuths, namely direction 602, direction 604, and direction 606.
The differences between the recognition volume of each of the three first azimuths and the environmental volume may then be determined, and the first azimuth with the largest difference is determined as the target azimuth. Referring to fig. 6, assume the recognition volume of direction 602 is 130 decibels, that of direction 604 is 75 decibels, and that of direction 606 is 90 decibels, and assume the environmental volume at that time is 73 decibels; the first azimuth with the largest difference among the three is then direction 602, so direction 602 is determined as the target azimuth.
The foregoing describes determining the target azimuth according to the difference between the recognition volume of each first azimuth and the environmental volume. In another possible implementation, when sound signals are monitored in several directions, the direction in which a sound signal was monitored first may, for example, be determined as the target azimuth. The specific way of determining the target azimuth is not specially limited and may be selected according to actual needs, as long as it is selected from the directions in which sound signals are monitored.
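Continuing the same non-limiting sketch, S306 to S309 can be expressed as follows; representing the directions as a dictionary and the default second threshold of 0 are assumptions made only for the example.

```python
def target_azimuth(volumes_by_direction, environmental_vol, second_threshold=0.0):
    """Select the target azimuth per S306-S309 (illustrative sketch).

    volumes_by_direction maps a direction label to its recognition volume in decibels.
    """
    # S306: first azimuths are directions whose recognition volume reaches the threshold.
    first_azimuths = {d: v for d, v in volumes_by_direction.items()
                      if v >= second_threshold}
    if not first_azimuths:
        return None                                  # no sound monitored anywhere
    if len(first_azimuths) == 1:                     # S307/S308: a single first azimuth
        return next(iter(first_azimuths))
    # S309: otherwise take the direction whose recognition volume differs most
    # from the environmental volume.
    return max(first_azimuths,
               key=lambda d: abs(first_azimuths[d] - environmental_vol))


# Example from fig. 6: directions 602/604/606 at 130/75/90 decibels, environment 73.
# target_azimuth({"602": 130, "604": 75, "606": 90}, 73.0)  -> "602"
```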
S310, the first device turns its interaction interface toward the target azimuth and executes the first interaction action.
The target azimuth in this embodiment refers to the currently determined position of the user relative to the first device. In order to interact with the user in a targeted manner, the first device may turn its interaction interface toward the target azimuth and perform the first interaction action.
For example, the expression display system device of the first device may be turned toward the target azimuth to perform the first interaction action. Taking the first device as a robot, the robot may perform the first interaction action within an effective range, where the effective range refers to the range of motion within which the robot performs limb actions such as body rotation and limb movement, for example a circular area centered on the robot with a radius of 1.5 meters. After the target azimuth of the user is determined, while the robot performs body and limb movements, the expression feedback of the robot can also be controlled through the expression display system device.
The implementation of specific limb actions and expression feedback can be selected and set according to actual requirements, and the embodiment is not particularly limited.
The method for equipment interaction provided by the embodiment of the application comprises the following steps: the first device listens for sound signals from different directions of the first device. The first device determines the respective identified volumes of the sound signals in different directions. The first device judges whether the difference value of the recognition volumes between any two directions is smaller than or equal to a first threshold value according to the sound signals in different directions, and if so, the average value of the recognition volumes of the sound signals in all directions is determined to be the environment volume. If not, the maximum recognition volume and the minimum recognition volume are removed from the recognition volumes of the sound signals in all directions, and the average value of the rest recognition volumes is determined as the environment volume. At least one first bearing is determined that identifies a volume greater than or equal to a second threshold. And judging whether the number of the first orientations is one, if so, determining the first orientations as target orientations. If not, respectively determining the difference value of the identification volume and the environment volume of each first azimuth, and determining the first azimuth with the largest difference value as the target azimuth. The first device adjusts the interactive interface to a target-oriented orientation and performs a first interactive action.
By monitoring sound in different directions, when sound is monitored in only one direction that direction is determined as the target azimuth, and when sound is monitored in several directions the target azimuth is determined among them; the first device then interacts with the user in the target azimuth in a targeted manner, which effectively improves the pertinence and effectiveness of the device's interaction. When the target azimuth is determined among a plurality of first azimuths, the azimuth whose recognition volume differs most from the environmental volume is chosen, which effectively ensures the reasonableness of the determined target azimuth; and when the environmental volume is determined, taking the average of the recognition volumes of the directions effectively ensures the accuracy of the determined environmental volume.
The implementation of interaction based on visual monitoring is described below with reference to fig. 7 to 14. Fig. 7 is a third flowchart of a method for device interaction provided by an embodiment of the present application, fig. 8 is a first schematic diagram of a preset field of view provided by an embodiment of the present application, fig. 9 is a second schematic diagram of a preset field of view provided by an embodiment of the present application, fig. 10 is a schematic diagram of an implementation of determining a target object from one object provided by an embodiment of the present application, fig. 11 is a schematic diagram of an implementation of determining a target object from a plurality of objects provided by an embodiment of the present application, fig. 12 is a schematic diagram of an implementation of determining a predicted walking path provided by an embodiment of the present application, fig. 13 is a schematic diagram of an interaction range provided by an embodiment of the present application, and fig. 14 is a schematic diagram of interaction directions provided by an embodiment of the present application.
As shown in fig. 7, the method includes:
S701, the first device judges whether an object exists in a preset visual field range according to the shot image frame, if so, S702 is executed, and if not, S701 is executed.
In this embodiment, the camera of the first device may capture image frames within a preset field of view, and the first device may then determine whether an object exists within the preset field of view according to the captured image frames.
In one possible implementation, the preset field of view may be, for example, as shown in fig. 8 and fig. 9. Referring to fig. 8, the distance range of the preset field of view may be, for example, 0-6 meters from the first device; referring to fig. 9, the up-down viewing angle of the camera is, for example, 70°, and the left-right viewing angle is, for example, 124°.
Combining fig. 8 and fig. 9, in one possible implementation the preset field of view may be the range covered by an up-down viewing angle of 70°, a left-right viewing angle of 124°, and a distance of up to 6 meters from the camera. In an actual implementation, the preset field of view may be selected and set according to actual requirements and is not specially limited in this embodiment.
In one possible implementation, the image frame may first be preprocessed before the determination is made, where the preprocessing may include at least one of anti-aliasing processing and image information equalization processing, so that the image frame meets the required data format.
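One possible, non-limiting way to carry out such preprocessing with OpenCV is sketched below; the concrete operations (grayscale conversion, a Gaussian blur as anti-aliasing, and histogram equalization) are assumptions, since the description only requires anti-aliasing and equalization.

```python
import cv2


def preprocess_frame(frame):
    """Illustrative preprocessing of a captured image frame (non-limiting sketch)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # classifiers typically work on grayscale
    smoothed = cv2.GaussianBlur(gray, (3, 3), 0)     # mild smoothing as anti-aliasing
    return cv2.equalizeHist(smoothed)                # equalize the image information
```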
When determining whether an object exists according to the image frame, image recognition may, for example, be performed on the image frame to determine whether an object is present in it.
If it is determined that no object exists in the current image frame, it may be determined that no object exists within the preset field of view; image frames may then continue to be captured and recognized, that is, step S701 is repeated.
S702, the first device determines, according to the captured image frame, the number of objects included in the image frame.
In another possible implementation manner, if the first device determines that the object exists in the preset visual field according to the shot image frame, the number of the objects can be further determined, so that the user can interact with the first device in a targeted manner.
Wherein the first device may determine the number of objects included in the image frame from the photographed image frame.
In one possible implementation, taking the object as a pedestrian as an example, image information of the user position may be determined according to the specific scene, and a Haar-like feature classifier file suitable for pedestrian recognition in that scene may be trained in advance; Haar-like features are digital image features used for object recognition and were used in the first real-time face detection operator.
When locating pedestrians, the pedestrian location areas in the image frame can be located with the pre-trained Haar-like feature classifier: if one pedestrian location area is found, it can be determined that one pedestrian currently exists, and if several pedestrian areas are found, it can be determined that several pedestrians currently exist.
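For illustration, OpenCV's cascade-classifier interface could apply such a pre-trained classifier roughly as follows; the classifier file name and the detection parameters are assumptions, and the application itself does not prescribe a particular library.

```python
import cv2

# The classifier file name is an assumption; it stands for the pre-trained
# Haar-like feature classifier file described above.
pedestrian_cascade = cv2.CascadeClassifier("pedestrian_haar_cascade.xml")


def locate_pedestrians(preprocessed_gray):
    """Return the pedestrian location areas (x, y, w, h) found in a grayscale frame."""
    areas = pedestrian_cascade.detectMultiScale(
        preprocessed_gray, scaleFactor=1.1, minNeighbors=3)
    return list(areas)   # one entry per detected pedestrian; empty if none is found
```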
S703, if one object is included in the image frame, determining the one object in the image frame as a target object.
In one possible implementation, if it is determined that an object is included in the image frame, it may be determined that the object in the current image frame is the target object, and it is understood that the target object is the object that needs to be interacted with.
Referring to fig. 10, for example, if it is currently determined that one object 1001 is included in an image frame, it may be determined that the object 1001 is a target object.
S704, if a plurality of objects are included in the image frame, determining the object closest to the first device as the target object.
In another possible implementation, if it is determined that the image frame includes a plurality of objects, a target object may be determined among the plurality of objects, and in one possible implementation, an object closest to the first device may be determined as the target object.
For example, referring to fig. 11, if the currently determined image frame includes 3 objects, that is, an object 1101, an object 1102, and an object 1103, the object 1103 closest to the first device among the three objects may be determined as the target object.
The distance between each object and the first device may be determined, for example, with a distance sensor of the first device, or it may be estimated from the image frame; this embodiment does not specially limit how the distance is determined, which may be selected according to actual requirements.
Alternatively, one object may be selected randomly from the plurality of objects as the target object; the way the target object is determined is not limited in this embodiment, as long as the target object is one of the plurality of objects.
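S703 and S704 can then be realized, for example, as sketched below; using the width of the location area as a stand-in for proximity (a wider area generally corresponds to a closer object) is an assumption, since a distance sensor could equally be used.

```python
def choose_target_object(areas):
    """S703/S704: pick the target object among the detected location areas.

    `areas` is a list of (x, y, w, h) boxes. A single detection is the target;
    among several, the widest box is taken as the object closest to the first
    device (an assumption made for this sketch).
    """
    if not areas:
        return None
    if len(areas) == 1:
        return areas[0]
    return max(areas, key=lambda box: box[2])   # box[2] is the width w
```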
S705, the first device determines a location area of the target object in the plurality of image frames according to the plurality of image frames corresponding to the target object.
After the target object is determined in a single image frame, the first device takes the target object as the interaction target and predicts its walking path. To do so, the first device obtains a plurality of image frames corresponding to the target object through the camera.
The first device determines the distance between the target object and itself in each of the image frames according to the position area of the target object in those frames, which is the basis for predicting the walking path; the first device therefore first needs to determine the position area of the target object in the plurality of image frames corresponding to the target object.
As can be understood with reference to fig. 12, taking a single image frame as an example, the position area 1201 of the target object may be determined in that frame; performing this for each of the image frames corresponding to the target object yields the position area of the target object in each of those frames. The position area may be determined, for example, by the feature classifier described above.
S706, the first device determines distances between the target object in the plurality of image frames and the first device according to the location areas, the pixel widths of the image frames, and the focal length of the camera, respectively.
After determining the location areas of the target object in the plurality of image frames, the distance of the target object from the first device in each image frame may be determined from each location area.
The first device may calculate the distance between the target object and itself according to the monocular ranging principle in camera calibration. Taking any one frame as an example, the width W of the position area is determined from the position area of the target object in that frame, the distance between the target object and the first device is denoted by D, the pixel width of the image frame is denoted by P, and the focal length of the camera is denoted by F, where W, D, P, and F satisfy the following Formula 1:
F = (P × D) / W    (Formula 1)
It will be appreciated that the focal length F of the camera, the pixel width P, and the width W of the location area may be determined, and thus the distance D between the target object and the first device in the frame image may be determined.
For example, referring to fig. 12, a distance 1202 between the target object and the first device may be determined currently in a single frame image.
The above description is given taking any frame as an example, and the above operation is performed for each of a plurality of image frames corresponding to the target object, so that the distance between the target object and the first device can be determined among the plurality of image frames.
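Rearranging Formula 1 gives D = (F × W) / P, so the per-frame distance can be computed as in the following non-limiting sketch, using the symbols defined above.

```python
def distance_to_target(focal_length, frame_pixel_width, area_width):
    """Distance D from Formula 1, F = (P x D) / W, solved for D.

    focal_length      -- F, the focal length of the camera
    frame_pixel_width -- P, the pixel width of the image frame
    area_width        -- W, the width of the target object's location area
    """
    return focal_length * area_width / frame_pixel_width


# Applying this to each of the frames t-2 ... t+1 yields the sequence of
# distances used for path prediction in S707.
```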
S707, the first device determines a predicted travel path of the target object according to the distances between the target object and the first device in the plurality of image frames.
After determining the distance of the target object from the first device in the plurality of image frames of the target object, a predicted travel path of the target object may be determined based on the plurality of distances.
In one possible implementation, the walking path of the pedestrian is predicted from the information in a sequence of continuously captured key frames, that is, from the successively determined distances between the target object and the first device, by fitting a statistical model based on a Bayesian probability function.
For example, referring to fig. 12, the current target object corresponds to a plurality of image frames, namely frame t-2, frame t-1, frame t, and frame t+1. The position area of the target object may currently be determined to be 1203 in frame t-2, 1204 in frame t-1, 1205 in frame t, and 1206 in frame t+1. Based on these positions, the distance between the target object and the first device in each frame can be determined, from which the predicted walking path 1207 can be obtained.
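The description names a Bayesian probability-function statistical fitting method without giving its details; purely as an illustrative stand-in, and not as the method of the application, the sketch below fits a straight line to the recent distances and extrapolates one frame ahead.

```python
import numpy as np


def extrapolate_distance(distances):
    """Simplified stand-in for the walking-path prediction of S707.

    `distances` holds the target-object distances for frames t-2, t-1, t, t+1.
    A least-squares line is fitted and evaluated one frame ahead; this linear
    extrapolation is a simplification, not the Bayesian fitting named above.
    """
    frames = np.arange(len(distances))
    slope, intercept = np.polyfit(frames, distances, 1)   # linear fit
    return slope * len(distances) + intercept             # estimated next-frame distance
```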
S708, the first device determines that there is overlap between the predicted travel path of the target object and the interaction range of the target object, if yes, S709 is executed, and if no, S710 is executed.
After the first device obtains the predicted walking path of the target object, it may determine whether the predicted walking path overlaps the interaction range of the first device, that is, whether the target object will walk into the interaction range of the first device along the predicted walking path.
As shown in fig. 13, the interaction range of the first device may be, for example, an angular range of about 60 degrees around the center of the robot together with a radius of about 1.5 meters from the center of the robot. The predicted walking path of the target object overlapping the interaction range specifically means that the predicted walking path enters this range of about 1.5 meters around the center of the robot.
In another possible implementation manner, whether the target object will walk into the interaction range of the robot may also be determined according to the interaction direction between the target object and the first device, that is, the angle between the predicted walking path and the first device. As can be understood with reference to fig. 14, the angles between the predicted walking path and the first device may, for example, be classified into 5 classes, namely 180°, 225°, 135°, 90°, and 270° as shown in fig. 14.
For example, when the angle between the predicted walking path and the first device is 180 degrees, it may be determined that the target object will walk into the interaction range of the robot; when the angle is 90 degrees or 270 degrees, it may be determined that the target object will not walk into the interaction range of the robot; and when the angle is 135 degrees or 225 degrees, it may be determined that the target object may possibly walk into the interaction range of the robot, in which case further image frames may be captured and predicted so that observation continues.
In the actual implementation process, the specific setting of the interaction direction and the specific processing manner of each interaction direction may be selected according to the actual requirement, which is not particularly limited in this embodiment.
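A minimal sketch of the interaction-direction classification described above, assuming the five reference angles of fig. 14 and a nearest-reference-angle bucketing that the embodiment does not itself specify:

```python
def classify_interaction_direction(angle_deg):
    """Map the angle between the predicted walking path and the first device
    onto the five classes of fig. 14. Bucketing a measured angle to the
    nearest reference angle is an assumption made for this sketch."""
    decisions = {
        180: "will enter the interaction range",
        225: "may enter, keep observing",
        135: "may enter, keep observing",
        90:  "will not enter the interaction range",
        270: "will not enter the interaction range",
    }
    nearest = min(decisions, key=lambda ref: abs(angle_deg - ref))
    return decisions[nearest]

for angle in (178, 223, 137, 92, 268):
    print(angle, "->", classify_interaction_direction(angle))
```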
S709, the first device moves following the target object, and performs a second interaction.
In one possible implementation manner, if it is determined that the predicted walking path enters the interaction range of the first device, the robot may move following the target object and, according to the determined interaction direction, provide corresponding feedback through the limb actions and expressions of the first device.
Taking the camera as an example, when the target object enters the camera field of view of the robot, the robot body and face may follow the movement of the target object, with the movement command issued according to the camera's recognition result. The rotation range of the robot may be, for example, -60 degrees to 60 degrees, with the front of the robot at 0 degrees, so that the robot's interaction range is a fan-shaped area with a radius of 1.5 meters spanning -60 degrees to 60 degrees, namely the range introduced above with reference to fig. 13.
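For illustration only, the following sketch clamps the follow rotation to the -60 to 60 degree range and issues a follow command only for targets inside the 1.5-meter interaction radius; the helper name and the returned command format are assumptions rather than part of the embodiment.

```python
def follow_command(target_angle_deg, target_distance_m,
                   max_angle_deg=60.0, max_radius_m=1.5):
    """Clamp the body/face rotation to the robot's -60..60 degree range and
    follow only targets inside the 1.5 m fan-shaped interaction area.
    The returned dictionary is an assumed, illustrative command format."""
    if target_distance_m > max_radius_m:
        return None  # target outside the interaction range: do not follow
    clamped = max(-max_angle_deg, min(max_angle_deg, target_angle_deg))
    return {"rotate_to_deg": clamped, "follow": True}

print(follow_command(75.0, 1.2))   # angle clamped to 60 degrees
print(follow_command(20.0, 2.4))   # too far away: None
```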
S710, discarding the predicted walking path.
In another possible implementation, if it is determined that the predicted travel path does not walk into the interaction range of the first device, the first device may discard the predicted travel path, and then perform the above process to detect the object.
The method for device interaction provided by the embodiment of the application comprises the following steps. The first device judges whether an object exists in a preset field of view according to the captured image frames; if so, the first device determines the number of objects included in the image frame according to the captured image frame. If one object is included in the image frame, that object is determined as the target object; if a plurality of objects are included in the image frame, the object closest to the first device is determined as the target object. The first device determines the location area of the target object in a plurality of image frames according to the plurality of image frames corresponding to the target object. The first device determines the distances between the target object in the plurality of image frames and the first device according to the location areas, the pixel width of the image frames, and the focal length of the camera. The first device determines the predicted walking path of the target object according to the distances between the target object and the first device in the plurality of image frames. The first device then judges whether the predicted walking path of the target object overlaps the interaction range; if so, the first device moves following the target object and performs the second interaction action, and if not, the predicted walking path is discarded.
In the above method, the target object is determined according to the image frames, so that interaction can be carried out with the target object in a targeted manner, improving the accuracy and efficiency of interaction. Meanwhile, the distance between the target object and the first device in each image frame is determined according to the location areas of the target object in the plurality of image frames, the walking path can be predicted based on the determined distances, and interaction with the target object is then carried out according to the predicted walking path. Because active interaction is initiated only when the target object is determined to be approaching the first device, the accuracy of interaction can be effectively guaranteed while avoiding disturbing the user and causing a poor user experience.
On the basis of the above embodiments, the method of device interaction provided by the present application is described below as a whole in conjunction with fig. 15. Fig. 15 is a schematic flow chart of the method of device interaction provided by an embodiment of the present application.
As shown in fig. 15, the method includes:
The first device first judges whether a sound signal is monitored; if so, the first device executes the flow of user monitoring and interaction based on hearing.
Specifically, the first device may determine whether the currently monitored sound signal comes from a single person or from multiple persons. If it comes from a single person, the direction of that person's sound signal is determined as the target direction; if it comes from multiple persons, one target direction is determined from the different directions corresponding to the multiple persons. The implementation of determining one target direction from the different directions may refer to the description of the above embodiments and is not repeated here.
After determining the target orientation, the interactive interface may be adjusted to face the target orientation and corresponding limb actions and expression feedback performed.
Otherwise, if the first device determines that no sound signal is monitored, it executes the flow of user monitoring and interaction based on vision.
Specifically, the first device may determine whether an object exists in the preset field of view according to the captured image frames. If no object exists, the flow returns to the beginning; if an object exists, the first device determines whether there is a single person or multiple persons: a single person is determined as the target object, and if there are multiple persons, the object closest to the first device is determined as the target object.
Then, the predicted walking path of the target object is determined; the implementation of determining the predicted walking path is described in the above embodiments and is not repeated here. Next, whether the target object will walk into the interaction range of the first device is determined according to the predicted walking path. If not, the path is discarded; if so, the first device moves following the target object, performs limb actions, and provides expression feedback.
In summary, compared with the method for performing user monitoring only through a camera in the prior art and then performing interaction actions, the method for performing equipment interaction provided by the embodiment of the application can realize monitoring of the user by utilizing the sound collecting device and the camera in a cooperative manner, and because the shooting range of the camera is limited, the sound collecting device is used for monitoring the sound at first, so that the efficiency and the success rate of user monitoring can be further improved. In the interaction process, interaction can be realized based on limb interaction and expression feedback, so that the mobility and flexibility of interaction can be improved.
The application provides a method and a device for equipment interaction, which are applied to an artificial intelligence technology in a computer technology to achieve the purpose of improving the efficiency and the success rate of user monitoring.
Fig. 16 is a schematic structural diagram of a device interaction apparatus according to an embodiment of the present application. As shown in fig. 16, the apparatus 1600 for device interaction of the present embodiment may include: a monitor module 1601, a processing module 1602, and a judgment module 1603.
A listening module 1601, configured to listen to a sound signal by the first device;
A processing module 1602, configured to perform a first interaction according to the monitored sound signal;
A judging module 1603, configured to judge whether an object exists in a preset field of view according to a captured image frame if the sound signal is not monitored;
The processing module 1602 is further configured to execute a second interaction action according to the captured image frame if an object exists.
In a possible implementation manner, the monitoring module 1601 is specifically configured to:
the first device listens for sound signals from different directions of the first device.
In a possible implementation manner, the monitoring module 1601 specifically includes:
the first determining submodule is used for determining the environmental volume according to the sound signals in different directions by the first equipment;
The first determining submodule is further used for determining the respective identification volumes of the sound signals in different directions by the first equipment;
The first determining submodule is further used for determining a target azimuth of the monitored sound according to the identification volume and/or the environment volume by the first equipment;
And the adjustment sub-module is used for adjusting the interaction interface to face the target direction by the first device and executing the first interaction action.
In a possible implementation manner, the first determining submodule is specifically configured to:
If the difference value of the identification volumes between any two directions is smaller than or equal to a threshold value, determining the average value of the identification volumes of the sound signals of all the directions as the environmental volume; or alternatively
If the difference value of the identification volumes between any two directions is larger than the threshold value, removing the maximum identification volume and the minimum identification volume from the respective identification volumes of the sound signals of all the directions, and determining the average value of the respective identification volumes of the sound signals of the remaining directions as the environmental volume.
In a possible implementation manner, the first determining submodule is specifically configured to:
Determining at least one first orientation in which the identification volume is not 0;
if the number of the first orientations is one, determining the first orientation as the target orientation; or alternatively
If the number of the first orientations is larger than one, determining the difference value between the identification volume of each first orientation and the environmental volume, and determining the first orientation with the largest difference value as the target orientation.
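A minimal sketch of the environmental-volume and target-orientation logic of the first determining submodule, under an assumed threshold and illustrative per-direction volumes:

```python
def environmental_volume(identification_volumes, threshold=5.0):
    """Two cases from the embodiment: average all identification volumes when
    no pair of directions differs by more than the threshold; otherwise drop
    the maximum and minimum volumes and average the rest.
    The threshold value is an assumption."""
    spread = max(identification_volumes) - min(identification_volumes)
    if spread <= threshold:
        return sum(identification_volumes) / len(identification_volumes)
    trimmed = sorted(identification_volumes)[1:-1]  # remove max and min
    return sum(trimmed) / len(trimmed)

def target_orientation(identification_volumes, env_volume):
    """Pick the orientation to face: the single orientation whose identification
    volume is not 0, or the one exceeding the environmental volume the most."""
    active = [i for i, v in enumerate(identification_volumes) if v != 0]
    if not active:
        return None
    if len(active) == 1:
        return active[0]
    return max(active, key=lambda i: identification_volumes[i] - env_volume)

vols = [0.0, 42.0, 55.0, 0.0, 61.0, 0.0]   # illustrative volumes for six directions
env = environmental_volume(vols)
print(env, target_orientation(vols, env))
```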
In a possible implementation manner, the processing module 1602 specifically includes:
a second determining sub-module, configured to determine a target object in a captured image frame according to the image frame by the first device;
The second determining submodule is further used for determining a position area of the target object in a plurality of image frames according to the plurality of image frames corresponding to the target object by the first device;
The second determining submodule is further used for respectively determining the distances between a target object in the plurality of image frames and the first equipment according to the position areas, the pixel width of the image frames and the focal length of the camera by the first equipment;
the second determining submodule is further used for determining a predicted walking path of the target object according to the distance between the target object in the plurality of image frames and the first device by the first device;
And the execution sub-module is used for executing a second interaction action by the first device according to the predicted walking path of the target object.
In a possible implementation manner, the execution submodule is specifically configured to:
the first device judges whether the predicted walking path of the target object and the interaction range of the target object overlap;
if yes, the first device moves along with the target object, and the second interaction action is executed.
In a possible implementation manner, the second determining submodule is specifically configured to:
The first device determines the number of objects included in the image frame according to the photographed image frame;
if the image frame comprises an object, determining the object in the image frame as a target object;
and if the image frame comprises a plurality of objects, determining the object closest to the first device as the target object.
In a possible implementation manner, the processing module 1602 is further configured to:
preprocessing the image frame, wherein the preprocessing comprises at least one of the following steps: and (5) antialiasing processing and image information equalization processing.
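Purely as an illustrative sketch of such preprocessing, and since the embodiment does not name the exact operators, the following assumes a Gaussian blur as the smoothing (anti-aliasing) pass and OpenCV histogram equalization as the image-information equalization pass:

```python
import cv2
import numpy as np

def preprocess(frame):
    """Assumed preprocessing pipeline: a light Gaussian blur to smooth the
    frame, followed by histogram equalization of the grayscale image.
    The embodiment does not specify these operators; they are illustrative."""
    blurred = cv2.GaussianBlur(frame, (3, 3), 0)
    gray = cv2.cvtColor(blurred, cv2.COLOR_BGR2GRAY)
    return cv2.equalizeHist(gray)

# Dummy frame only to make the sketch runnable.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
print(preprocess(frame).shape)
```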
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
According to an embodiment of the present application, there is also provided a computer program product comprising: a computer program stored in a readable storage medium, from which at least one processor of an electronic device can read, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any one of the embodiments described above.
FIG. 17 illustrates a schematic block diagram of an example electronic device 1700 that may be used to implement an embodiment of the application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 17, the electronic device 1700 includes a computing unit 1701 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1702 or a computer program loaded from a storage unit 1708 into a Random Access Memory (RAM) 1703. In the RAM 1703, various programs and data required for the operation of the device 1700 may also be stored. The computing unit 1701, the ROM 1702, and the RAM 1703 are connected to each other via a bus 1704. An input/output (I/O) interface 1705 is also connected to the bus 1704.
Various components in device 1700 are connected to I/O interface 1705, including: an input unit 1706 such as a keyboard, a mouse, etc.; an output unit 1707 such as various types of displays, speakers, and the like; a storage unit 1708 such as a magnetic disk, an optical disk, or the like; and a communication unit 1709 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1709 allows the device 1700 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunications networks.
The computing unit 1701 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 1701 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 1701 performs the various methods and processes described above, such as the method of device interaction. For example, in some embodiments, the method of device interaction may be implemented as a computer software program tangibly embodied on a machine-readable medium, e.g., the storage unit 1708. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1700 via ROM 1702 and/or communication unit 1709. When the computer program is loaded into RAM 1703 and executed by computing unit 1701, one or more steps of the method of device interaction described above may be performed. Alternatively, in other embodiments, the computing unit 1701 may be configured to perform the method of device interaction in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed embodiments are achieved, and are not limited herein.
The above embodiments do not limit the scope of the present application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application should be included in the scope of the present application.

Claims (19)

1. A method of device interaction, comprising:
The first device monitors sound signals;
executing a first interaction action according to the monitored sound signal;
If the sound signal is not monitored, the first device judges whether an object exists in a preset visual field range according to the shot image frame;
if yes, executing a second interaction action according to the shot image frame;
the performing a second interaction according to the captured image frame includes:
the first device determines a target object in a shot image frame according to the image frame;
The first device determines the position area of the target object in a plurality of image frames according to the plurality of image frames corresponding to the target object;
The first device respectively determines the distances between the target object in the plurality of image frames and the first device according to the position areas, the pixel width of the image frames and the focal length of the camera;
The first device determines a predicted walking path of a target object in the plurality of image frames according to the distance between the target object and the first device;
The first device performs a second interaction according to the predicted walking path of the target object.
2. The method of claim 1, wherein the first device listens for sound signals, comprising:
the first device listens for sound signals from different directions of the first device.
3. The method of claim 2, wherein the performing a first interaction from the monitored sound signal comprises:
the first equipment determines the environmental volume according to the sound signals in different directions;
The first device determines the respective recognition volumes of the sound signals in the different directions;
the first device determines the target azimuth of the monitored sound according to the identification volume and/or the environment volume;
The first device adjusts the interactive interface to face the target direction and executes the first interactive action.
4. A method according to claim 3, wherein the first device determining the ambient volume from the sound signals of the different orientations comprises:
If the difference value of the identification volume between any two directions is smaller than or equal to a threshold value, determining the average value of the identification volume of each sound signal of each direction as the environmental volume; or alternatively
If the difference value of the recognition volumes between any two directions is larger than the threshold value, removing the maximum recognition volume and the minimum recognition volume from the respective recognition volumes of the sound signals of all the directions, and determining the average value of the respective recognition volumes of the sound signals of the remaining directions as the environmental volume.
5. The method of claim 4, wherein the first device determining a target location for listening to sound based on each of the identified volumes and/or the ambient volumes comprises:
Determining at least one first position in which the identified volume is not 0;
if the number of the first orientations is one, determining the first orientation as the target orientation; or alternatively
If the number of the first orientations is larger than one, determining the difference value between the identification volume of each first orientation and the environmental volume, and determining the first orientation with the largest difference value as the target orientation.
6. The method of claim 1, wherein the first device performs a second interaction in accordance with the predicted path of travel of the target object, comprising:
the first device judges that the predicted walking path of the target object and the interaction range of the target object are overlapped;
if yes, the first device moves along with the target object, and the second interaction action is executed.
7. The method of claim 1, wherein the first device determines a target object in a captured image frame from the image frame, comprising:
The first device determines the number of objects included in the image frame according to the photographed image frame;
if the image frame comprises an object, determining the object in the image frame as a target object;
and if the image frame comprises a plurality of objects, determining the object closest to the first device as the target object.
8. The method of claim 7, further comprising:
preprocessing the image frame, wherein the preprocessing comprises at least one of the following steps: and (5) antialiasing processing and image information equalization processing.
9. An apparatus of device interaction, comprising:
the monitoring module is used for monitoring the sound signal by the first equipment;
The processing module is used for executing a first interaction action according to the monitored sound signal;
The judging module is used for judging whether an object exists in a preset visual field range according to the shot image frame if the sound signal is not monitored;
the processing module is further configured to execute a second interaction according to the captured image frame if the second interaction exists;
the processing module is specifically configured to:
a second determining sub-module, configured to determine a target object in a captured image frame according to the image frame by the first device;
The second determining submodule is further used for determining a position area of the target object in a plurality of image frames according to the plurality of image frames corresponding to the target object by the first device;
The second determining submodule is further used for respectively determining the distances between the target object in the plurality of image frames and the first device according to the position areas, the pixel width of the image frames and the focal length of the camera by the first device;
the second determining submodule is further used for determining a predicted walking path of the target object according to the distance between the target object in the plurality of image frames and the first device by the first device;
And the execution sub-module is used for executing a second interaction action by the first device according to the predicted walking path of the target object.
10. The apparatus of claim 9, wherein the monitor module is specifically configured to:
the first device listens for sound signals from different directions of the first device.
11. The apparatus of claim 10, wherein the listening module specifically comprises:
the first determining submodule is used for determining the environmental volume according to the sound signals in different directions by the first equipment;
The first determining submodule is further used for determining the respective identification volumes of the sound signals in different directions by the first equipment;
The first determining submodule is further used for determining a target azimuth of the monitored sound according to the identification volume and/or the environment volume by the first equipment;
And the adjustment sub-module is used for adjusting the interaction interface to face the target direction by the first device and executing the first interaction action.
12. The apparatus of claim 11, wherein the first determination submodule is specifically configured to:
If the difference value of the identification volume between any two directions is smaller than or equal to a threshold value, determining the average value of the identification volume of each sound signal of each direction as the environmental volume; or alternatively
If the difference value of the recognition volumes between any two directions is larger than the threshold value, removing the maximum recognition volume and the minimum recognition volume from the respective recognition volumes of the sound signals of all the directions, and determining the average value of the respective recognition volumes of the sound signals of the remaining directions as the environmental volume.
13. The apparatus of claim 12, wherein the first determination submodule is specifically configured to:
Determining at least one first position in which the identified volume is not 0;
if the number of the first orientations is one, determining the first orientation as the target orientation; or alternatively
If the number of the first orientations is larger than one, determining the difference value between the identification volume of each first orientation and the environmental volume, and determining the first orientation with the largest difference value as the target orientation.
14. The apparatus of claim 9, wherein the execution submodule is specifically configured to:
the first device judges that the predicted walking path of the target object and the interaction range of the target object are overlapped;
if yes, the first device moves along with the target object, and the second interaction action is executed.
15. The apparatus of claim 9, wherein the second determination submodule is specifically configured to:
The first device determines the number of objects included in the image frame according to the photographed image frame;
if the image frame comprises an object, determining the object in the image frame as a target object;
and if the image frame comprises a plurality of objects, determining the object closest to the first device as the target object.
16. The apparatus of claim 15, wherein the processing module is further configured to:
preprocessing the image frame, wherein the preprocessing comprises at least one of the following steps: and (5) antialiasing processing and image information equalization processing.
17. An electronic device, comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method of any of claims 1-8.
CN202011471453.4A 2020-12-15 2020-12-15 Method and device for equipment interaction Active CN112578909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011471453.4A CN112578909B (en) 2020-12-15 2020-12-15 Method and device for equipment interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011471453.4A CN112578909B (en) 2020-12-15 2020-12-15 Method and device for equipment interaction

Publications (2)

Publication Number Publication Date
CN112578909A CN112578909A (en) 2021-03-30
CN112578909B true CN112578909B (en) 2024-05-31

Family

ID=75135119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011471453.4A Active CN112578909B (en) 2020-12-15 2020-12-15 Method and device for equipment interaction

Country Status (1)

Country Link
CN (1) CN112578909B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5732632B2 (en) * 2011-02-03 2015-06-10 株式会社国際電気通信基礎技術研究所 Robot system and space formation recognition device used therefor
CN106105184A (en) * 2014-03-10 2016-11-09 微软技术许可有限责任公司 Time delay in camera optical projection system reduces
WO2017133453A1 (en) * 2016-02-02 2017-08-10 北京进化者机器人科技有限公司 Method and system for tracking moving body
CN108406848A (en) * 2018-03-14 2018-08-17 安徽果力智能科技有限公司 A kind of intelligent robot and its motion control method based on scene analysis
CN109982054A (en) * 2017-12-27 2019-07-05 广景视睿科技(深圳)有限公司 A kind of projecting method based on location tracking, device, projector and optical projection system
WO2020014766A1 (en) * 2018-07-18 2020-01-23 Laganiere Robert System and method for tracking customer movements in a customer service environment
CN110916576A (en) * 2018-12-13 2020-03-27 成都家有为力机器人技术有限公司 Cleaning method based on voice and image recognition instruction and cleaning robot
CN111052025A (en) * 2017-09-13 2020-04-21 日本电产株式会社 Mobile robot system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018013564A1 (en) * 2016-07-12 2018-01-18 Bose Corporation Combining gesture and voice user interfaces

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5732632B2 (en) * 2011-02-03 2015-06-10 株式会社国際電気通信基礎技術研究所 Robot system and space formation recognition device used therefor
CN106105184A (en) * 2014-03-10 2016-11-09 微软技术许可有限责任公司 Time delay in camera optical projection system reduces
WO2017133453A1 (en) * 2016-02-02 2017-08-10 北京进化者机器人科技有限公司 Method and system for tracking moving body
CN111052025A (en) * 2017-09-13 2020-04-21 日本电产株式会社 Mobile robot system
CN109982054A (en) * 2017-12-27 2019-07-05 广景视睿科技(深圳)有限公司 A kind of projecting method based on location tracking, device, projector and optical projection system
CN108406848A (en) * 2018-03-14 2018-08-17 安徽果力智能科技有限公司 A kind of intelligent robot and its motion control method based on scene analysis
WO2020014766A1 (en) * 2018-07-18 2020-01-23 Laganiere Robert System and method for tracking customer movements in a customer service environment
CN110916576A (en) * 2018-12-13 2020-03-27 成都家有为力机器人技术有限公司 Cleaning method based on voice and image recognition instruction and cleaning robot

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of a Visualized Robot Remote Monitoring ***; Song Guorong; Liu Xingqi; Wang Yang; Wu Bin; He Cunfu; Computer Measurement & Control; 2013-10-25 (10); full text *

Also Published As

Publication number Publication date
CN112578909A (en) 2021-03-30

Similar Documents

Publication Publication Date Title
US10410046B2 (en) Face location tracking method, apparatus, and electronic device
CN108304758B (en) Face characteristic point tracking method and device
EP3651055A1 (en) Gesture recognition method, apparatus, and device
JP2018508078A (en) System and method for object tracking
JP2020502654A (en) Human-machine hybrid decision-making method and apparatus
CN112947419B (en) Obstacle avoidance method, device and equipment
JP2014204375A (en) Image processing system, image processing apparatus, control method therefor, and program
CN110597387B (en) Artificial intelligence based picture display method and device, computing equipment and storage medium
CN111062263A (en) Method, device, computer device and storage medium for hand pose estimation
CN113780064A (en) Target tracking method and device
CN106713862A (en) Tracking monitoring method and apparatus
CN112528927A (en) Confidence determination method based on trajectory analysis, roadside equipment and cloud control platform
CN116645396A (en) Track determination method, track determination device, computer-readable storage medium and electronic device
US11080562B1 (en) Key point recognition with uncertainty measurement
CN115965653A (en) Light spot tracking method and device, electronic equipment and storage medium
CN115049954A (en) Target identification method, device, electronic equipment and medium
CN114037087A (en) Model training method and device, depth prediction method and device, equipment and medium
CN117372928A (en) Video target detection method and device and related equipment
CN112578909B (en) Method and device for equipment interaction
EP3757878A1 (en) Head pose estimation
KR20230166840A (en) Method for tracking object movement path based on artificial intelligence
CN114355960B (en) Unmanned aerial vehicle defense intelligent decision-making method and system, server and medium
CN115311723A (en) Living body detection method, living body detection device and computer-readable storage medium
CN114200934A (en) Robot target following control method and device, electronic equipment and storage medium
CN112200130A (en) Three-dimensional target detection method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant