CN117806305A - Equipment control method and related device - Google Patents

Equipment control method and related device

Info

Publication number: CN117806305A
Application number: CN202211202127.2A
Authority: CN
Language: Chinese (zh)
Prior art keywords: user, voice, information, location, equipment
Legal status: Pending
Inventors: 薛景涛, 刘昌盛, 孟则辉, 王文侠, 谢宇, 严石
Current assignee: Huawei Technologies Co Ltd
Original assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd


Abstract

A device control method applied to a smart home scene, the method comprising: acquiring first information, the first information comprising a first position of a user relative to a second device; acquiring a second position according to the first information, the second position being the position where the second device is located, the first position and the second position being used to determine the position of the user; and sending the position of the user to a third device so that the third device can move to the position of the user. With this method, the user can summon the third device simply by being near any device equipped with a microphone array; the method does not depend on sound source localization by the robot body's own microphone array, offers a large perception range, and enables cross-room calling.

Description

Equipment control method and related device
Technical Field
The present disclosure relates to the field of robots, and in particular, to a device control method and a related apparatus.
Background
With the development of artificial intelligence technology, more and more intelligent hardware is entering people's lives; in particular, smart speakers have become the hub connecting people with smart home devices and online content. Through a smart speaker, a user can conveniently control ambient lighting and kitchen appliances, and even start a sweeping robot.
Home service robots, typified by sweeping robots, are representative of intelligent interaction and intelligent hardware: they can move autonomously between rooms, avoid obstacles, build an environment layout map, and execute position-based navigation tasks. Most current smart home solutions treat the sweeping robot as ordinary smart hardware, controlling it through a small set of predefined voice commands to start cleaning, return to charge, finish cleaning, schedule cleaning, and so on.
A home service robot serves people, and must first of all respond to a user's call: the user calls the robot's name, and the robot autonomously moves from wherever it is to the user. Microphone (MIC) sound source localization technology, as applied in smart speakers, is mature, and some sweeping robots fitted with a microphone array can already realize a calling function in an open single-room environment. Specifically, after the user calls the robot, the microphone array on the robot body localizes the direction of the user's sound source; the robot navigates toward the localized position while visually detecting whether a person is present there, and if so, navigates to the target person, completing the call.
However, the prior art relies on sound source localization by the robot body's microphone array; its perception range is small, and cross-room calling cannot be realized.
Disclosure of Invention
The application discloses a device control method which can realize cross-room calling of devices.
In a first aspect, the present application provides a device control method, applied to a first device, the method including: acquiring first information, the first information comprising a first position of a user relative to a second device, the first position being determined according to a first voice of the user acquired by the second device, the first voice instructing a third device to move to the area where the user is located; acquiring a second position according to the first information, the second position being the position where the second device is located, the first position and the second position being used to determine the position of the user; and sending the position of the user to the third device.
In some scenarios, the third device is far away from the user or obstructed from the user, so that it cannot collect the user's voice, or the collected voice has too low a signal strength for sound source localization. For example, the third device and the user may be in different rooms of an indoor environment, or far apart within the same room, or separated by an obstacle within the same room.
In this case, the user's voice command may be collected by another smart device closer to the user or with an unobstructed path (for example, the second device in this embodiment of the present application), on which an audio sensor (for example, a microphone for collecting audio data) is disposed. Based on the voice command, the position of the user relative to the second device can be accurately localized. The second device may send this determined relative position to the first device, or send the voice itself to the first device, which then computes the relative position between the user and the second device. The first device may then determine the user's absolute position from the second device's absolute position, known in advance, and the user's position relative to the second device, and send the user's absolute position to the third device so that the third device can move to the area where the user is located. In this way, the user can summon the third device simply by being near any device with a microphone array; the method does not depend on sound source localization by the robot body's microphone array, offers a large perception range, and enables cross-room calling.
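The position composition described above lends itself to a short illustration. The following is a minimal sketch, not taken from the patent, of how a hub (the "first device") might convert a user position relative to a speaker (the "second device") into an absolute map position for the robot (the "third device"); the 2-D pose convention and all names are assumptions.

```python
# Hypothetical sketch: derive the user's absolute map position from the
# second device's known pose and the microphone array's relative fix.
import math

def user_absolute_position(device_pose, relative_polar):
    """device_pose: (x, y, heading_rad) of the second device in the map frame.
    relative_polar: (bearing_rad, distance_m) of the user as localized by the
    second device's microphone array, measured in the device's own frame."""
    x, y, heading = device_pose
    bearing, distance = relative_polar
    # Rotate the device-frame bearing into the map frame, then offset.
    theta = heading + bearing
    return (x + distance * math.cos(theta), y + distance * math.sin(theta))

# Example: speaker at (4.0, 2.5) facing +x, user heard 1.8 m away at 30 deg.
print(user_absolute_position((4.0, 2.5, 0.0), (math.radians(30), 1.8)))
```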
In one possible implementation, acquiring the first information includes: receiving the first information sent by the second device; or receiving the first voice sent by the second device and determining the first information according to the first voice.
In one possible implementation, the second location is determined according to a second voice collected at a target location, the second voice being a voice uttered by the second device; alternatively, the second location is determined from an image of the second device acquired at the target location.
In one possible implementation, the second voice or the image is used to determine first identity information, the first identity information being the identity information of the second device. The method further includes: establishing a mapping relationship between the first identity information and the second location. The first information further includes the identity information of the second device, and obtaining the second location includes: obtaining the second location according to the identity information of the second device and the mapping relationship.
In one possible implementation, the mapping relationship between the first identity information and the second location may be represented as an instance-level microphone-array device distribution map. In this way, while exploring the environment, the third device interacts with the second device by voice, discovers the device, and recognizes identity information such as the device's attributes and category from the prerecorded reply code. The second device is accurately localized by combining the visual and auditory modalities, and an instance-level microphone-array device distribution map is constructed.
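As an illustration only, the distribution map described above can be thought of as a lookup table from device identity to surveyed position; a minimal sketch, with assumed field names and values, follows.

```python
# Hypothetical sketch of the instance-level microphone-array device
# distribution map: first identity information -> second position.
device_map = {}

def register_device(mac, position, attrs):
    """Record an entry while the robot explores and localizes devices."""
    device_map[mac] = {"position": position, **attrs}

def lookup_position(mac):
    """Resolve the second position from the identity carried in the first information."""
    entry = device_map.get(mac)
    return entry["position"] if entry else None

register_device("a4:50:46:00:00:01", (4.0, 2.5), {"type": "smart speaker"})
print(lookup_position("a4:50:46:00:00:01"))  # -> (4.0, 2.5)
```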
In one possible implementation, the first location includes a plurality of first candidate locations determined from the first voice, and the location of the user includes a plurality of second candidate locations of the user determined according to the first location and the second location.
Sound propagation and reflection paths are complex in indoor environments, and sound source localization methods can generally only give several potential positions (and intensities) of the sound source. In this embodiment of the present application, the first device may send the plurality of candidate user positions to the third device, and the third device may move to the candidate positions in turn, ensuring that it eventually reaches the vicinity of the user.
In one possible implementation, the method further comprises: sending a target order to the third device, the target order instructing the third device to move, in that order, to some or all of the plurality of second candidate positions.
In one possible implementation, the target order relates to at least one of the following: the travel path length between the third device's current location and each of the plurality of second candidate locations; and the confidence of each first candidate location among the plurality of first candidate locations, the confidence being carried in the first information.
In this way, an exploration cost can be computed heuristically from the sound source positions: the robot ranks the potential user areas and preferentially navigates to areas with high confidence and short distance, reducing path cost and time cost while still guaranteeing that it correctly reaches the area where the user is located.
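A minimal sketch of this ranking heuristic follows; the linear scoring of confidence against straight-line distance is an assumption standing in for a real navigation-cost query.

```python
# Hypothetical sketch: order candidate user positions so high-confidence,
# nearby areas are visited first.
import math

def order_candidates(robot_pos, candidates, w_conf=1.0, w_dist=0.5):
    """candidates: list of ((x, y), confidence) from sound source localization.
    Returns positions sorted from most to least promising."""
    def score(item):
        (x, y), conf = item
        dist = math.hypot(x - robot_pos[0], y - robot_pos[1])
        return w_conf * conf - w_dist * dist  # higher is better
    return [pos for pos, _ in sorted(candidates, key=score, reverse=True)]

print(order_candidates((0.0, 0.0), [((6.0, 1.0), 0.9), ((1.5, 2.0), 0.6)]))
```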
In a possible implementation, the first information further includes second identity information, where the second identity information is identity information of the user, and the method further includes:
receiving an image of the environment acquired by the third device when the third device moves to one of the plurality of second candidate positions, and, upon successful matching between the second identity information and the identity information of the user contained in the image, sending indication information to the third device, the indication information indicating that the third device has moved to the correct candidate location among the plurality of second candidate locations; or,
receiving a third voice of the user acquired by the third device when the third device moves to one of the plurality of second candidate positions, and, upon successful matching between the second identity information and the user identity information determined from the third voice, sending indication information to the third device, the indication information indicating that the third device has moved to the correct candidate location among the plurality of second candidate locations.
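As an illustration, the confirmation step can be sketched as below; match_face and match_voiceprint are assumed placeholder recognizers, not APIs defined by the patent.

```python
# Hypothetical sketch: decide whether to send the indication information by
# matching what the third device sensed against the second identity information.
def match_face(image, user_id):
    return False  # placeholder: a real face recognizer would go here

def match_voiceprint(audio, user_id):
    return False  # placeholder: a real voiceprint matcher would go here

def should_send_indication(user_id, image=None, audio=None):
    """Return True if the sensed identity matches, i.e. the third device has
    reached the correct candidate location."""
    if image is not None and match_face(image, user_id):
        return True
    return audio is not None and match_voiceprint(audio, user_id)
```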
In a second aspect, the present application provides a device control method, applied to a third device, the third device including a moving component, the method including: acquiring first information, the first information comprising a first position of a user relative to a second device, the first position being determined according to a first voice of the user acquired by the second device, the first voice instructing the third device to move to the area where the user is located; acquiring a second position according to the first information, the second position being the position where the second device is located, the first position and the second position being used to determine the position of the user; and controlling the moving component so as to move to the position where the user is located.
In some scenarios, the third device is far away from the user or obstructed from the user, so that it cannot collect the user's voice, or the collected voice has too low a signal strength for sound source localization. For example, the third device and the user may be in different rooms of an indoor environment, or far apart within the same room, or separated by an obstacle within the same room. In this case, the user's voice command may be collected by another smart device closer to the user or with an unobstructed path (for example, the second device in this embodiment of the present application), on which an audio sensor (for example, a microphone for collecting audio data) is disposed. Based on the voice command, the position of the user relative to the second device can be accurately localized; the second device may send this determined relative position to the third device, and the third device may determine the absolute position of the user from the second device's absolute position, known in advance, and the user's position relative to the second device, and move to the area where the user is located based on the user's absolute position. In this way, the user can summon the third device simply by being near any device with a microphone array; the method does not depend on sound source localization by the robot body's microphone array, offers a large perception range, and enables cross-room calling.
In the prior art, a robot usually has to travel to a pre-recorded coordinate of the user and then search for the user nearby at random, so the efficiency and success rate of finding the person are low. The present application proposes using the second device's perception of the user's potential area as a prior, so that the third device can move to the vicinity of the user faster and more efficiently.
In one possible implementation, the second location is generated in one of the following ways:
collecting a second voice at a target location, the second voice being a voice uttered by the second device, and determining the second location according to the second voice and the target location; or acquiring an image of the second device at a target location, and determining the second location according to the image and the target location.
In one possible implementation, the method further includes: uttering a wake-up voice, the wake-up voice being used to wake the second device; the second voice is a reply voice uttered by the second device in response to the wake-up voice.
In one possible implementation, the method further includes: sending configuration information to the second device, the configuration information instructing the second device to reply with the second voice in response to the wake-up voice.
It should be understood that the configuration information described above may also be sent by the first device to the second device.
In one possible implementation, the mapping relationship between the first identity information and the second location may be represented as an instance-level microphone-array device distribution map. In this way, while exploring the environment, the third device interacts with the second device by voice, discovers the device, and recognizes identity information such as the device's attributes and category from the prerecorded reply code. The second device is accurately localized by combining the visual and auditory modalities, and an instance-level microphone-array device distribution map is constructed.
In one possible implementation, the second voice or the image is used to determine first identity information, the first identity information being the identity information of the second device. The method further includes: establishing a mapping relationship between the first identity information and the second location. The first information further includes the identity information of the second device, and obtaining the second location includes: obtaining the second location according to the identity information of the second device and the mapping relationship.
In one possible implementation, the first location includes a plurality of first candidate locations determined from the first voice, and the first location and the second location are used to determine a plurality of second candidate locations where the user may be. Moving to the position of the user by controlling the moving component includes: moving, by controlling the moving component, to the plurality of second candidate locations in turn according to a target order until the correct candidate location among the plurality of candidate locations is reached.
In one possible implementation, the target order relates to at least one of the following:
the travel path length between the third device's current location and each of the plurality of second candidate locations;
the confidence of each first candidate location among the plurality of first candidate locations, the confidence being carried in the first information.
In a possible implementation, the first information further includes second identity information, where the second identity information is identity information of the user, and the method further includes:
acquiring an image of the environment when moving to one of the plurality of second candidate locations, and determining that the correct candidate location among the plurality of second candidate locations has been reached upon successful matching between the second identity information and the identity information of the user contained in the image; or,
collecting a third voice of the user when moving to one of the plurality of second candidate locations, and determining that the correct candidate location among the plurality of second candidate locations has been reached upon successful matching between the second identity information and the user identity information determined from the third voice.
In a third aspect, the present application provides a device control method applied to a second device, the method including:
acquiring a first voice of a user, wherein the first voice indicates a third device to move to an area where the user is located;
transmitting first information to the third device or the first device according to the first voice, wherein the first information comprises a first position of the user relative to the second device; the first location is determined from the first voice.
In one possible implementation, the method further comprises:
collecting the wake-up voice uttered by the third device;
and uttering a reply voice in response to the wake-up voice.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first voice.
In one possible implementation, the first information further includes: confidence of each first candidate location of the plurality of first candidate locations.
In a fourth aspect, the present application provides a device control apparatus, applied to a first device, the apparatus comprising:
the processing module is configured to acquire first information, the first information comprising a first position of a user relative to a second device, the first position being determined according to a first voice of the user acquired by the second device, the first voice instructing a third device to move to the area where the user is located;
acquiring a second position according to the first information, the second position being the position where the second device is located, the first position and the second position being used to determine the position of the user;
and the sending module is configured to send the position of the user to the third device.
In one possible implementation, the processing module is specifically configured to:
receiving the first information sent by the second device; or,
receiving the first voice sent by the second device and determining the first information according to the first voice.
In one possible implementation, the second position is determined according to a second voice collected at a target location, the second voice being the voice uttered by the second device; or,
the second location is determined from an image of the second device acquired at the target location.
In one possible implementation, the second voice or the image is used to determine first identity information, where the first identity information is identity information of the second device; the processing module is further configured to:
establishing a mapping relation between the first identity information and the second position;
The processing module is specifically configured to:
acquire the second position according to the identity information of the second device and the mapping relationship.
In one possible implementation, the first location includes a plurality of first candidate locations determined from the first voice, and the location of the user includes a plurality of second candidate locations of the user determined according to the first location and the second location.
In one possible implementation, the sending module is further configured to:
send a target order to the third device, the target order instructing the third device to move, in that order, to some or all of the plurality of second candidate positions.
In one possible implementation, the target order relates to at least one of the following:
the travel path length between the third device's current location and each of the plurality of second candidate locations;
the confidence of each first candidate location among the plurality of first candidate locations, the confidence being carried in the first information.
In a possible implementation, the first information further includes second identity information, the second identity information being the identity information of the user. The processing module is further configured to: receive an image of the environment acquired by the third device when the third device moves to one of the plurality of second candidate positions; upon successful matching between the second identity information and the identity information of the user contained in the image, the sending module is further configured to send, to the third device, indication information indicating that the third device has moved to the correct candidate location among the plurality of second candidate locations; or,
the processing module is further configured to: receive a third voice of the user acquired by the third device when the third device moves to one of the plurality of second candidate positions; upon successful matching between the second identity information and the user identity information determined from the third voice, the sending module is further configured to send, to the third device, indication information indicating that the third device has moved to the correct candidate location among the plurality of second candidate locations.
In a fifth aspect, the present application provides a device control apparatus applied to a third device, the third device including a moving component, the apparatus comprising:
the processing module is configured to acquire first information, the first information comprising a first position of a user relative to a second device, the first position being determined according to a first voice of the user acquired by the second device, the first voice instructing the third device to move to the area where the user is located;
acquiring a second position according to the first information, wherein the second position is the position where the second equipment is located; the first position and the second position are used for determining the position of the user;
and the mobile control module is configured to move to the position where the user is located by controlling the moving component.
In one possible implementation, the second location is generated in one of the following ways:
collecting a second voice at a target location, the second voice being the voice uttered by the second device, and determining the second location according to the second voice and the target location; or,
acquiring an image of the second device at a target location, and determining the second location according to the image and the target location.
In one possible implementation, the apparatus further includes:
the sensor module is configured to utter a wake-up voice, the wake-up voice being used to wake the second device;
the second voice is a reply voice uttered by the second device in response to the wake-up voice.
In one possible implementation, the second voice or the image is used to determine first identity information, where the first identity information is identity information of the second device; the processing module is further configured to:
establishing a mapping relation between the first identity information and the second position;
the first information further includes the identity information of the second device, and the processing module is specifically configured to:
acquire the second position according to the identity information of the second device and the mapping relationship.
In one possible implementation, the first location includes a plurality of first candidate locations determined from the first voice, and the first location and the second location are used to determine a plurality of second candidate locations where the user may be;
the mobile control module is specifically configured to:
move, by controlling the moving component, to the plurality of second candidate locations in turn according to a target order until the correct candidate location among the plurality of candidate locations is reached.
In one possible implementation, the target order relates to at least one of the following:
the travel path length between the third device's current location and each of the plurality of second candidate locations;
the confidence of each first candidate location among the plurality of first candidate locations, the confidence being carried in the first information.
In a possible implementation, the first information further includes second identity information, the second identity information being the identity information of the user. The sensor module is further configured to:
acquire an image of the environment when moving to one of the plurality of second candidate locations;
the processing module is further configured to determine that the correct candidate location among the plurality of second candidate locations has been reached upon successful matching between the second identity information and the identity information of the user contained in the image; or,
the sensor module is further configured to: collect a third voice of the user when moving to one of the plurality of second candidate locations;
the processing module is further configured to determine that the correct candidate location among the plurality of second candidate locations has been reached upon successful matching between the second identity information and the user identity information determined from the third voice.
In a sixth aspect, the present application provides a device control apparatus, applied to a second device, the apparatus comprising:
the processing module is configured to acquire a first voice of a user, the first voice instructing a third device to move to the area where the user is located;
a sending module, configured to send first information to the third device or the first device according to the first voice, where the first information includes a first location of the user relative to the second device; the first location is determined from the first voice.
In one possible implementation, the apparatus further includes:
the sensor module is configured to collect the wake-up voice uttered by the third device,
and to utter a reply voice in response to the wake-up voice.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first voice.
In one possible implementation, the first information further includes: confidence of each first candidate location of the plurality of first candidate locations.
The embodiment of the present application further provides a device control method, applied to a first device, the method including: acquiring first information, the first information comprising a first position of a user relative to a second device, the first position being determined according to a first voice of the user acquired by the second device, the first voice instructing control of the operation of a device of a target type, where there are multiple devices of the target type; acquiring a second position according to the first information, the second position being the position where the second device is located, the first position and the second position being used to determine the position of the user; and controlling the operation of a third device among the multiple devices based on the position of the user and the position of the third device belonging to the same area.
In one possible implementation, belonging to the same area includes belonging to the same room.
The device of the target type may be a household appliance such as a lamp, a speaker, a dishwasher, or an air conditioner.
In some scenarios, a home may include multiple devices of the same type, for example multiple lights or multiple air conditioners. When a user wants to wake or control one of these devices, if the voice does not explicitly specify which device is meant (for example, the device in a particular room), the first device cannot determine which device the user intends to control; in the prior art, all devices of the same type are often controlled together, resulting in a poor user experience.
In this embodiment of the present application, since the position of the user when uttering the voice can be obtained, and the positions of the multiple devices of the target type (for example, the room each device is in) can be obtained in advance, the device of the target type whose position is closest to the user's position (or belongs to the same area, for example the same room) can be determined and regarded as the device the user most likely wants to control. The first device can then control the operation of that device (the third device), for example triggering functions such as turning it on or off.
The location of each device of the target type may be obtained as follows:
In one possible implementation, the location of each device may be entered actively by the user; for example, an interactive interface may be provided in which the user enters the room where each device is located.
In one possible implementation, the location of each device may be determined by a sensor, for example by visual localization with an image sensor while a mobile device (such as a sweeping robot) moves around indoors. A sketch of the same-area selection rule follows.
It should be appreciated that the specific description of the first device may also refer to the description related to the first aspect, and the description is not repeated here.
In a seventh aspect, embodiments of the present application provide a computing device, which may include a memory, a processor, and a bus system, the memory being configured to store a program and the processor being configured to execute the program in the memory so as to perform the method of any one of the first aspect, the second aspect, or the third aspect, including any optional implementation thereof.
In an eighth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored therein which, when run on a computer, causes the computer to perform the method of any one of the first aspect, the second aspect, or the third aspect, including any optional implementation thereof.
In a ninth aspect, embodiments of the present application provide a computer program product comprising code which, when executed, implements the method of any one of the first aspect, the second aspect, or the third aspect, including any optional implementation thereof.
In a tenth aspect, the present application provides a chip system including a processor, the processor being configured to support a computing device in implementing the functions involved in the above aspects, for example sending or processing the data and/or information involved in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the device control apparatus. The chip system may consist of chips, or may include chips and other discrete devices.
Drawings
FIG. 1A is an application architecture illustration in this application;
FIG. 1B is an application architecture illustration in this application;
FIG. 2 is an application architecture illustration in this application;
FIG. 3 is an application architecture illustration in this application;
FIG. 4 is a flow chart of a device control method of the present application;
FIG. 5a is a schematic representation of a map construction in the present application;
FIG. 5b is a flow chart of a device control method of the present application;
FIG. 5c is a flow chart of a device control method of the present application;
FIG. 5d is a flow chart of a device control method of the present application;
FIG. 5e is a flow chart of a device control method of the present application;
FIG. 5f is a schematic representation of sound source localization in the present application;
FIG. 5g is a schematic illustration of a robot path of movement in the present application;
FIG. 5h is a schematic illustration of a robot path of movement in the present application;
FIG. 6 is a flow chart of a device control method of the present application;
FIG. 7 is a schematic structural view of an apparatus control device of the present application;
FIG. 8 is a schematic structural view of an apparatus control device of the present application;
fig. 9 is a schematic diagram of an execution device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a training device according to an embodiment of the present application;
fig. 11 is a schematic diagram of a chip according to an embodiment of the present application.
Detailed Description
Specific implementations of the present application are described below by way of example with reference to the accompanying drawings in the embodiments of the present application. Implementations of the present application may also include combinations of these embodiments, such as with other embodiments and with structural changes, without departing from the spirit or scope of the present application. The following detailed description of the embodiments is, therefore, not to be taken in a limiting sense. The terminology used in the examples section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
One or more structural components of functions, modules, features, units, etc. referred to in particular embodiments of the present application may be understood as being implemented in any manner by any physical or tangible component (e.g., by software running on a computer device, hardware (e.g., processor or chip implemented logic functions), etc., or any other combination). In some embodiments, the illustrated separation of various devices into distinct modules or units in the figures may reflect the use of corresponding distinct physical and tangible components in an actual implementation. Alternatively, a single module in the drawings of the embodiments of the present application may be implemented by a plurality of actual physical components. Likewise, any two or more modules depicted in the figures may reflect different functions performed by a single actual physical component.
With respect to the method flow diagrams of the embodiments of the present application, certain operations are described as distinct steps performed in a certain order. Such a flowchart is illustrative and not limiting. Some steps described herein may be grouped together and performed in a single operation, may be partitioned into multiple sub-steps, and may be performed in an order different than that shown herein. The various steps illustrated in the flowcharts may be implemented in any manner by any circuit structure or tangible mechanism (e.g., by software running on a computer device, hardware (e.g., processor or chip implemented logic functions), etc., or any combination thereof).
The following description may identify one or more features as "optional." This type of statement should not be construed as an exhaustive indication of features that may be considered optional; that is, other features not explicitly identified in the text may also be considered optional. Furthermore, any description of a single entity is not intended to exclude the use of multiple such entities; similarly, a description of multiple entities is not intended to preclude the use of a single entity. Finally, the term "exemplary" refers to one implementation among potentially many implementations.
The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely illustrative of the manner in which the embodiments of the application described herein have been described for objects of the same nature. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1A, the embodiment of the present application may be used in a smart home scene, for example a home service robot interaction scene. The smart home scene may include a robot body (the third device in the embodiment of the present application may be the robot body), at least one group of smart home microphone array devices (such as the microphone array of a smart speaker, the microphone array of a television, and so on; the second device in the embodiment of the present application may be a smart home microphone array device), and a smart home system. The robot can communicate with the microphone array devices through the smart home system to configure microphone array parameters, send and receive microphone array data, and so on. The smart home system may be software running on a terminal device (such as mobile phone software) or a cloud service.
Referring to fig. 1B, the embodiment of the present application may be used in a smart home scene, for example a home service robot interaction scene. The smart home scene may include a robot body (the third device in the embodiment of the present application may be the robot body), at least one group of smart home microphone array devices (such as the microphone array of a smart speaker, the microphone array of a television, and so on), and a control terminal in the smart home system (the first device in the embodiment of the present application). The robot and the microphone array devices may be connected to the control terminal, which can control their operation by interacting with them. The control terminal may be called a whole-house intelligent host, an intelligent host, and the like.
Referring to fig. 2, fig. 2 shows a schematic configuration of the robot and a microphone array device.
The robot body may include: a mobility system, a navigation system, a microphone array 100, a loudspeaker 101, a visual perception unit, a computing unit, and a storage medium. The robot's mobility and navigation systems enable it to move autonomously indoors while avoiding obstacles, navigate autonomously from A to B, and automatically explore and map the environment. The visual perception unit can detect a specific target and compute the distance between the target and the robot. The microphone array device may specifically include a microphone array 105 and a loudspeaker 106. It should be understood that array 105 may be the same array as array 100, both disposed on the robot.
The local network shown in fig. 1A may be a local area network, a wide area network switched through a relay device, or a combination of a local area network and a wide area network. When the local network is a local area network, it may for example be a Wi-Fi hotspot network, a Wi-Fi P2P network, a Bluetooth network, a ZigBee network, or a near field communication (NFC) network. When the local network is a wide area network, it may for example be a third-generation mobile communication technology (3G) network, a fourth-generation mobile communication technology (4G) network, a fifth-generation mobile communication technology (5G) network, a future evolved public land mobile network (PLMN), or the Internet.
In one possible implementation, the service robot may be as shown in fig. 3, which depicts a schematic internal structure of the robot. As shown in fig. 3, the robot includes a processor 401, a driving system 402 including a driver (e.g., the moving component in an embodiment of the present application), a sensor system 403, a wireless communication system 404, and a memory 405; these components may be connected by one or more communication buses 406.
Wherein the drive system 402 is used to drive the robot to move.
The wireless communication system 404 is used to establish a network connection with the server 20 or the terminal device 10.
Wherein the sensor system 403 comprises a camera 403a for capturing or scanning environmental information. The camera 403a may be a high definition camera with a wide angle lens. Such as a fisheye lens type high definition wide angle camera.
The sensor system 403 may also include a navigation sensor 403b, a motion sensor 403c, and a cliff sensor 403d. The navigation sensor 403b is used to compute the robot's position in space and to generate the robot's working map. For example, the navigation sensor 403b may be a dead reckoning sensor, an obstacle detection and avoidance (ODOA) sensor, a simultaneous localization and mapping (SLAM) sensor, or the like.
In some embodiments, one or more motion sensors 403c in the sensor system 403 are used to generate signals indicative of the motion of the robot, e.g., the signals of motion may include the distance traveled, the amount of rotation, the speed or acceleration of the robot, etc.
In some examples, one or more cliff sensors 403d in sensor system 403 are used to detect obstacles (e.g., thresholds, stairs, etc.) under the sweeping robot. Processor 401, in response to a signal from cliff sensor 403d, navigates the robot away from the detected obstacle.
It should be noted that, in addition to the respective sensors shown in fig. 3, other types of sensors such as an anti-slip sensor, an infrared anti-collision sensor, and the like may be included in the sensor system 403, which are not illustrated in the embodiments of the present application.
The processor 401 is configured to run a computer program stored in the memory 405 according to the detection results acquired by the sensor system, and to control the pose of the robot.
With the development of artificial intelligence technology, more and more intelligent hardware is entering people's lives; in particular, smart speakers have become the hub connecting people with smart home devices and online content. Through a smart speaker, a user can conveniently control ambient lighting and kitchen appliances, and even start a sweeping robot.
Home service robots, typified by sweeping robots, are representative of intelligent interaction and intelligent hardware: they can move autonomously between rooms, avoid obstacles, build an environment layout map, and execute position-based navigation tasks. Most current smart home solutions treat the sweeping robot as ordinary smart hardware, controlling it through a small set of predefined voice commands to start cleaning, return to charge, finish cleaning, schedule cleaning, and so on.
A home service robot serves people, and must first of all respond to a user's call: the user calls the robot's name, and the robot autonomously moves from wherever it is to the user. Microphone (MIC) sound source localization technology, as applied in smart speakers, is mature, and some sweeping robots fitted with a microphone array can already realize a calling function in an open single-room environment. Specifically, after the user calls the robot, the microphone array on the robot body localizes the direction of the user's sound source; the robot navigates toward the localized position while visually detecting whether a person is present there, and if so, navigates to the target person, completing the call.
However, the prior art relies on sound source localization by the robot body's microphone array; its perception range is small, and cross-room calling cannot be realized.
In order to solve the above-described problems, the present application provides an apparatus control method.
Referring to fig. 4, fig. 4 is a flowchart of an apparatus control method provided in an embodiment of the present application, and as shown in fig. 4, the apparatus control method provided in the embodiment of the present application includes:
401. acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to first voice of the user acquired by the second device, and the first voice indicates a third device to move to an area where the user is located.
The execution body of step 401 may be a movable robot, that is, the third device described in the embodiments of the present application.
In one possible implementation, the user may issue a voice command to call the third device to move to the user's vicinity, i.e., to the area in which the user is located. Specifically, taking the third device as a robot and the second device as a microphone array device as an example, the user calls the robot near the microphone array device; for example, the user is in a bedroom, and the microphone array device there (for example, a smart speaker) detects the robot's wake-up word and wakes up, i.e., recognizes that the user is calling the robot.
In some scenarios, the third device is far away from the user or obstructed from the user, so that it cannot collect the user's voice, or the collected voice has too low a signal strength for sound source localization. For example, the third device and the user may be in different rooms of an indoor environment, or far apart within the same room, or separated by an obstacle within the same room.
In this case, the user's voice command may be collected by a microphone device closer to the user or with an unobstructed path (for example, the second device in this embodiment of the present application), on which an audio sensor (for example, a microphone for collecting audio data) is disposed. Based on the voice command, the position of the user relative to the second device can be localized; the second device may send this determined relative position to the third device, which may determine the user's absolute position from the second device's absolute position, known in advance, and the user's position relative to the second device, and move to the area where the user is located based on the user's absolute position. In this way, the user can summon the third device simply by being near any device with a microphone array; the method does not depend on sound source localization by the robot body's microphone array, offers a large perception range, and enables cross-room calling.
In the prior art, a robot usually has to travel to a pre-recorded coordinate of the user and then search for the user nearby at random, so the efficiency and success rate of finding the person are low. The present application proposes using the second device's perception of the user's potential area, based on the voice data it collects, as a prior, so that the third device can move to the vicinity of the user faster and more efficiently.
The above-described flow is described in detail below.
Regarding how to determine the location of the second device:
in one possible implementation, the third device may move to an area near the second device and localize the second device based on a voice the second device utters. This step may be performed in advance (for example when the second device has just joined the smart home system), or periodically, or when triggered by the user, to ensure the accuracy of the second device's stored location; accuracy issues arise mainly because the second device may be moved, invalidating previously stored location information. Typically, the second device is placed in a fixed location, so the determined position can be reused many times afterwards.
Specifically, in one possible implementation, the third device may acquire identity information of the second device, and the third device may acquire voice in the surrounding environment, and determine that the acquired voice is from the second device based on comparing the identity information carried in the voice with the pre-acquired identity information of the second device. Further, based on the speech, an absolute position of the second device is determined by sound source analysis.
In one possible implementation, after the third device accesses the smart home system or a new device accesses the smart home system, the third device may acquire information of each smart device (including the second device) in the smart home system, and may further acquire information such as identity information of the second device.
Taking the third device as a robot as an example: in a smart home scene, the robot can access the smart home system through the network and establish connections with at least one home device (such as the second device and the first device in the embodiments of the present application). The smart home system includes, but is not limited to, terminal software and cloud services; it connects to and manages the home smart terminals through the network, and can read information about the smart devices in the home, such as type, MAC identity tag, 3D appearance model, and the like. Therefore, after the third device joins the smart home network, it can acquire information about the other home devices.
Next, taking the third device as a robot as an example, a specific flow for the third device to acquire information about other home devices is given:
in one possible implementation, the robot may read in-home smart device management information from the smart home system, including but not limited to the following table 1.
TABLE 1
mac address — the unique identity of the device (one-to-one with the device)
device type — the category of the device
device appearance model — a 3D model or template set of the device, used for visual detection

The mac address is used to represent the identity of the device, and corresponds one-to-one with the device. The device appearance model includes, but is not limited to, a 3D model of the device, on the basis of which visual detection can be performed to identify the type of device; a particular appearance model may also be a set of templates for visual matching. The device type and appearance model may be used to determine whether the device can be detected visually.
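As an illustration, the robot might hold such a record as follows; the field names mirror the description of Table 1, and the values are assumptions.

```python
# Hypothetical sketch of a device management record read from the smart home system.
from dataclasses import dataclass

@dataclass
class DeviceRecord:
    mac: str               # unique identity, one-to-one with the device
    device_type: str       # e.g. "smart speaker", "television"
    appearance_model: str  # 3D model or template set used for visual detection

record = DeviceRecord("a4:50:46:00:00:01", "smart speaker", "speaker_v2.obj")
print(record)
```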
In one possible implementation, the third device may obtain the location of the second device. Wherein the location of the second device may be determined by sound source localization of speech uttered by the second device.
In one possible implementation, the second device needs to be triggered to utter a voice; for example, the third device may utter a wake-up voice, the wake-up voice being used to wake the second device. The second device may then utter a reply voice (i.e., the second voice in the embodiments of the present application) in response to the wake-up voice.
In one possible implementation, the second device may utter a reply voice only in response to a preset wake-up word. The third device may obtain in advance a wake-up word capable of triggering the second device's reply voice, or configure the second device through information interaction so that the second device replies to a wake-up word configured by the third device with a reply voice configured by the third device (the configured reply voice can be understood as carrying a reply word configured by the third device).
In one possible implementation, the third device may send configuration information to the second device, the configuration information instructing the second device to reply with the second voice in response to the wake-up voice.
The third device needs to determine, from the received reply voice, that the device identity carried in it is that of the second device; only then can the reply voice be used to localize the second device. The identity information may be voiceprint information or a reply word in the voice, and it may be obtained in advance by the third device or configured in advance on the second device by the third device.
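A minimal sketch of this identity check, under assumed record shapes, follows.

```python
# Hypothetical sketch: before a reply voice is used for localization, confirm
# it really came from the second device, via its reply word or voiceprint.
def is_from_second_device(reply_word, voiceprint, known):
    """known: identity information for the second device, obtained in advance
    or configured by the third device."""
    if reply_word is not None and reply_word == known.get("reply_word"):
        return True
    return voiceprint is not None and voiceprint == known.get("voiceprint")

known = {"reply_word": "reply-01", "voiceprint": "vp-hash-01"}
print(is_from_second_device("reply-01", None, known))  # -> True
```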
Referring to fig. 5d, the parameter configuration of the second device by the third device is described below:
1. The third device checks, through the smart home system, whether the wake-up words of the smart devices (including the second device) are the same; if they differ, it configures the same wake-up word, such as "Xiaoyin Xiaoyin", for them;
2. The third device encodes a reply word for each microphone-array device, so that it can identify each device by its reply word; the reply word may be a specific tone, or a distinctive sentence appended after the normal reply word.
Optionally, the third device sets a different reply voiceprint code for each microphone-array device, so that even if the reply words of the devices are identical, the third device can identify a device by comparing the voiceprint of its reply;
3. The third device registers the reply words of step 2 as additional wake-up words of its own, so that the third device can be woken by a microphone-array device's reply;
4. The third device writes its voiceprint information into the microphone-array device database; when a microphone-array device is woken, it compares the voiceprint of the sound source to judge whether the interacting party is the third device, and if so responds with the preconfigured reply word.
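A hypothetical sketch of the four configuration steps above; every object and method name (`smart_home`, `set_wake_word`, and so on) is an assumption, since the patent does not define an API or wire format:

```python
def configure_mic_array_devices(robot, smart_home, devices):
    common_wake_word = "Xiaoyin Xiaoyin"
    for i, dev in enumerate(devices):
        # Step 1: give every device the same wake-up word.
        smart_home.set_wake_word(dev.mac, common_wake_word)
        # Step 2: encode a unique reply word per device so the robot
        # can tell responders apart.
        reply_word = f"hello manager {i}"
        smart_home.set_reply_word(dev.mac, reply_word)
        # Step 3: register each reply word as a wake-up word of the robot,
        # so a device's reply wakes the robot in turn.
        robot.add_wake_word(reply_word, source_mac=dev.mac)
        # Step 4: store the robot's voiceprint in the device database so the
        # device answers the robot with the preconfigured reply word.
        smart_home.register_voiceprint(dev.mac, robot.voiceprint)
```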
In one possible implementation, the location of the second device (i.e., the second location in the embodiments of the present application) may be generated as follows: collecting the second voice, uttered by the second device, at a target position, and determining the second location from the second voice and the target position. For example, the third device may interact with the second device from at least two positions and calculate the precise location of the second device from the bearings detected at those known positions. It should be understood that sound-source localization yields the position of the second device relative to the third device; to obtain the absolute position of the second device accurately, the current absolute position of the third device (i.e., the target position), expressed in the third device's coordinate system, is also used.
In one possible implementation, the location of the second device (i.e., the second location in the embodiments of the present application) may instead be generated as follows: acquiring an image including the second device at a target position, and determining the second location by image detection from that image and the target position. For example, the second device can be recognized from a prerecorded template or by a detection model trained with an AI network; the target distance can then be estimated from the template, or, with a TOF (depth) camera, read directly as the depth of the target area inside the detection frame. The target distance is used to determine the precise location of the second device.
That is, the first identity information, i.e., the identity information of the second device, may be determined from the second voice or the image; the third device may then establish a mapping relationship between the first identity information and the second location. In other words, the second location becomes known, and is known to be the location of the second device.
In one possible implementation, the mapping relationship between the first identity information and the second location may be represented as an instance-level microphone-array distribution map. In this way, the third device interacts with the second device by voice while exploring the environment, discovers devices, and recognizes identity information such as device attributes and categories from the prerecorded reply codes. The second device is then located accurately by multi-modal (visual and auditory) perception, and an instance-level microphone-array device distribution map is constructed.
Next, taking the third device as a robot and the second device as a microphone-array device, the construction of the instance-level microphone-array distribution map is introduced:
In one possible implementation, referring to figs. 5b and 5c, during autonomous exploration the robot may call the microphone arrays by their wake-up name at fixed intervals or according to the amount of map refresh. A microphone-array device responds with specific information through its own speaker 106; the robot-body microphone array 100 receives the response, wakes up, and localizes the sound source (speaker 106). This realizes discovery of interactive microphone-array devices (hereinafter, 106 is treated as equivalent to the microphone-array device 105).
The microphone-array device is then located. When the robot-body microphone array 100 detects a preset reply word, the type of the responding microphone array is judged according to the interaction protocol between the robot and the microphone arrays, and a different spatial detection and localization strategy is adopted for the microphone-array device 106 accordingly.
For a microphone array 106 with distinctive appearance features, its bearing is detected visually; when visual detection times out or the visual features of the microphone array 106 are not distinctive, its bearing is estimated by multi-position sound-source localization 120. In the bearing-estimation method 120, specifically, the robot is controlled to obtain, through voice interaction at two or more different positions, the azimuth angle of the microphone array relative to the robot, and the position of the microphone array 106 on the map is calculated from the intersection of the bearing lines. When more than two bearings are acquired, the accuracy of the estimate can be improved by a least-squares method (LSM).
Specifically, taking the third device as a robot and the second device as a microphone-array device: during autonomous movement, the robot can locate the specific positions of the smart hardware carrying microphone arrays in the environment through interaction, associate their attribute information, and construct an instance-level microphone-array distribution map. Here, instance-level means that every microphone array carries identifiable identity information from which the robot can uniquely confirm its ID; distribution map refers to the device coordinate map formed after the devices are marked on the navigation map.
First, the robot can explore the environment autonomously using a conventional exploration method, such as frontier-based exploration or next-best-view (NBV), building a map of the environment as it explores.
Frontier-based exploration is a motion strategy that realizes environment exploration by guiding the robot to search boundary areas. The robot extracts unknown boundaries (fig. 5a(b)) from the current refresh of the navigation map (fig. 5a(a)), computes the area to be explored after filtering (fig. 5a(c)), and selects the preferred exploration direction by distance from the robot. The algorithm is efficient, has good computational complexity in 2D space, and is widely used for autonomous mapping by planar mobile robots.
The next-best-view method is a motion strategy that realizes environment exploration by guiding the robot to select the best observation viewpoint. Using the task's observation information-gain model as the evaluation criterion, it searches the robot's motion space for the target state that maximizes information gain and controls the robot to move to that state, thereby exploring the environment autonomously. The method is suitable for exploration in complex, high-dimensional spaces, such as autonomous construction of 3D environments.
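A minimal frontier-extraction sketch for the frontier-based strategy, assuming an occupancy grid with 0 = free, 1 = occupied, −1 = unknown (NBV is omitted for brevity):

```python
import numpy as np

def extract_frontiers(grid: np.ndarray) -> list[tuple[int, int]]:
    frontiers = []
    rows, cols = grid.shape
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] != 0:          # frontier cells must be known free space
                continue
            # A free cell bordering at least one unknown cell is a frontier cell.
            neighbors = grid[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
            if (neighbors == -1).any():
                frontiers.append((r, c))
    return frontiers
```

Filtering and clustering of the returned cells, as described above, would then yield the candidate exploration areas.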
During autonomous exploration, the robot calls the microphone arrays by the wake-up word of the smart home system, for example uttering "Xiaoyin Xiaoyin", at fixed intervals or according to the amount of map refresh.
Referring to fig. 5e, when the robot's microphone array detects a preset reply word, for example the speaker's preset reply "hello manager", the identity of the responding microphone array is determined according to the interaction protocol between robot and microphone array; the identity protocol may add a specific sentence to the reply word or use a voiceprint code of the replying device. The robot then adopts a localization strategy according to the microphone-array identity. Optionally, microphone arrays can be classified into two types by identity attributes: type one, the array has a distinctive appearance, its size is not below a preset volume, and the smart home system holds its model information for visual detection; type two, any array failing a condition of type one.
For a type-one microphone-array device, a visual target-localization method is used: the microphone-array target in the field of view is detected visually, and once it is detected, the target distance is estimated. Estimation methods include, but are not limited to: interpolating the target distance from the pixels of the prerecorded microphone-array template and a distance mapping table; or, if the robot is equipped with a depth camera (TOF, etc.), directly taking the depth of the target area inside the detection frame as the target distance.
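A sketch of the two distance-estimation options just described; the calibration-table values, bounding-box layout, and function names are assumptions for illustration:

```python
import numpy as np

PIXEL_HEIGHTS = np.array([240.0, 120.0, 60.0, 30.0])   # bbox height in pixels (made-up table)
DISTANCES_M   = np.array([0.5, 1.0, 2.0, 4.0])         # calibrated distance for each height

def distance_from_template(bbox_height_px: float) -> float:
    # Interpolate the prerecorded pixel-height -> distance mapping table.
    # np.interp needs increasing x, so reverse the (decreasing) height axis.
    return float(np.interp(bbox_height_px, PIXEL_HEIGHTS[::-1], DISTANCES_M[::-1]))

def distance_from_depth(depth_image: np.ndarray, bbox) -> float:
    # With a TOF camera: take the median depth inside the detection box.
    x0, y0, x1, y1 = bbox
    return float(np.median(depth_image[y0:y1, x0:x1]))
```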
For type-two microphone arrays, or when visual detection of a type-one array times out, the bearing of the microphone array is estimated by multi-position sound-source localization. Specifically, the robot is controlled to obtain, through voice interaction at two or more different positions, the azimuth angle of the microphone array relative to the robot, and the position of the microphone array on the map is calculated from the intersection of the bearing lines. When more than two bearings are acquired, the accuracy of the estimate can be improved by a least-squares method (LSM).
The robot samples the microphone array from at least two positions, and the sampling points should satisfy a preset bearing-separation threshold. Let γ be the angle between the two sampled bearing directions; by the vector-angle formula, cos(γ) = ⟨cos(θ₀+φ₀), sin(θ₀+φ₀)⟩ · ⟨cos(θ₁+φ₁), sin(θ₁+φ₁)⟩, and it suffices to require cos(γ) to be smaller than the preset threshold.
The position of the microphone array is calculated from the intersection of its bearing lines: at each sampled position a bearing line of the sound source is obtained, whose analytical expression is y_a − y_i = tan(θ_i + φ_i) · (x_a − x_i), and combining several bearing lines yields the specific coordinates of the microphone-array device on the map, where:
φ_i is the sound-source direction angle;
(x_i, y_i, θ_i) are the position coordinates and orientation of the robot on the navigation map;
(x_a, y_a) are the microphone-array coordinates.
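A sketch, under the definitions above, of the separation check, the bearing-line intersection, and the least-squares (LSM) fusion of more than two bearings; the threshold value and function name are assumptions:

```python
import numpy as np

def locate_mic_array(poses, phis, cos_gamma_max=0.94):
    # poses: robot poses (x_i, y_i, theta_i) on the navigation map;
    # phis:  sound-source direction angles phi_i relative to the robot heading.
    alphas = np.array([theta + phi for (_, _, theta), phi in zip(poses, phis)])
    # Sampling check: some pair of bearings must satisfy cos(gamma) < threshold.
    u = np.stack([np.cos(alphas), np.sin(alphas)], axis=1)
    separated = any(np.dot(u[i], u[j]) < cos_gamma_max
                    for i in range(len(u)) for j in range(i + 1, len(u)))
    if not separated:
        raise ValueError("bearings nearly parallel; sample another position")
    # Bearing line i rearranged: sin(a_i)*x_a - cos(a_i)*y_a = sin(a_i)*x_i - cos(a_i)*y_i.
    A = np.stack([np.sin(alphas), -np.cos(alphas)], axis=1)
    b = np.array([np.sin(a) * x - np.cos(a) * y
                  for a, (x, y, _) in zip(alphas, poses)])
    # Two bearings give the line intersection; more are fused by least squares.
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xy  # estimated (x_a, y_a) of the microphone-array device
```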
In one possible implementation, the second device may determine the location of the user relative to the second device from the first voice uttered by the user. In a home, for example, the location can be expressed as which room the user is in (coarse-grained localization) and where within that room (fine-grained localization).
In one possible implementation, the user's location relative to the second device may be detected from the first voice by a sound-source localization method, such as, but not limited to, time difference of arrival (TDOA) or inter-microphone intensity difference (IID).
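As an illustration of the TDOA principle only (not the patent's specific algorithm), a minimal far-field sketch for a single microphone pair: the arrival-time difference constrains the direction of arrival. Real arrays use more microphones and robust correlation:

```python
import numpy as np

def doa_from_tdoa(tdoa_s: float, mic_spacing_m: float, c: float = 343.0) -> float:
    # Far-field geometry: tdoa = d * cos(theta) / c  =>  theta = arccos(c * tdoa / d)
    cos_theta = np.clip(c * tdoa_s / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))

# Example: a 0.1 m pair with a 150 microsecond delay -> about 59 degrees off-axis.
print(doa_from_tdoa(150e-6, 0.1))
```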
Specifically, taking the third device as a robot and the second device as a microphone-array device: the user calls the robot near the microphone array. For example, the user is in a bedroom; the speaker in the bedroom detects the robot's wake-up word and is woken by the user, i.e., it recognizes that the user is calling. Since the identity and position of the microphone array are marked in the robot's navigation map, the woken array indicates which room the user is in, realizing first-level, room-level coarse localization of the user.
The microphone-array device can also detect the potential bearings of the user. Referring to fig. 5f, after the array is woken it determines the user's bearing by sound-source localization. Because of the multipath effect of sound propagation in a room, some of the speech from a single source reaches the microphone array directly while some reaches it after reflection off walls, so the array may detect several potential bearings of the source. Since the acoustic signal is attenuated to different degrees along different propagation paths, the signal intensity I_i of a potential bearing A_i is positively correlated with its confidence. Sound-source bearings whose azimuth difference is below a preset deviation threshold are merged, and a confidence interval R of each bearing is computed from a signal-intensity threshold T, where R is the azimuth range over which the signal intensity exceeds T. For each potential source bearing A_i, the average signal intensity Ī_i over the confidence interval R_i is calculated, and the potential bearings triggered by a call form the set N = {(A_i, Ī_i, R_i)}. The user lies in these potential areas of the room, which realizes second-level, area-level localization of the user.
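A sketch of how the potential-bearing set N could be formed from raw azimuth/intensity detections; the thresholds, tuple layout, and clustering rule are assumptions:

```python
import numpy as np

def build_potential_set(detections, merge_deg=15.0, intensity_T=0.3):
    # detections: iterable of (azimuth_deg, intensity) pairs from sound-source localization
    detections = sorted(detections)               # sort by azimuth
    clusters = []
    for az, inten in detections:
        if clusters and az - clusters[-1][-1][0] < merge_deg:
            clusters[-1].append((az, inten))      # merge with the previous bearing
        else:
            clusters.append([(az, inten)])
    potential = []
    for cluster in clusters:
        azs = [a for a, _ in cluster]
        mean_i = float(np.mean([i for _, i in cluster]))
        if mean_i > intensity_T:                  # keep bearings above the intensity threshold
            potential.append({"azimuth": float(np.mean(azs)),
                              "interval": (min(azs), max(azs)),   # confidence interval R_i
                              "confidence": mean_i})              # average intensity
    return potential                              # the set N
```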
The microphone-array device may send the set N of the user's potential bearings, the identity ID of the microphone array, and the user's wake-up instruction (which may be carried in the first information) to the robot via the smart home system.
The sound propagation and reflection paths in indoor environments are complex, and a sound-source localization method generally yields several potential positions (and intensities) of the source. In this embodiment of the present application, the second device may send the multiple candidate user positions to the third device, and the third device may move to them in turn, ensuring that it eventually reaches the vicinity of the user.
402. Acquiring a second position according to the first information, wherein the second position is the position where the second equipment is located; the first location and the second location are used to determine a location where the user is located.
In a possible implementation, the first information may include the identity information of the second device, and the third device may then obtain the second location according to that identity information and the mapping relationship; for details, refer to the above embodiments, which are not repeated here.
403. And controlling the moving component to move to the position where the user is located.
In one possible implementation, the first information may include multiple candidate user locations; the third device may determine multiple second candidate locations from them and, by controlling the moving component, move to the second candidate locations in a target order until it reaches the correct candidate location among them.
In one possible implementation, the target order relates to at least one of: the length of the traversable path between the third device's current location and each of the second candidate locations; and the confidence of each first candidate location, carried in the first information.
In this way, the sound-source bearings can be used as a heuristic for the exploration cost: the robot ranks the potential user areas and navigates first to those with high confidence and short distance, reducing path cost and time cost while still guaranteeing that it moves correctly to the area where the user is.
Taking the third device as a robot, referring to figs. 5g and 5h: according to the user localization information, the robot samples the passable area of the room containing the microphone-array device on the navigation map; the sampling method may be a probabilistic roadmap (PRM), a rapidly-exploring random tree (RRT), or the like. For example, with the PRM algorithm the robot randomly samples child nodes in the user's room on the navigation map and connects adjacent unobstructed nodes to obtain a reachable topological node tree of the room. Denote all nodes as P = {p₁(x, y), …, p_n(x, y)}; combining them with the potential user area set, for each area M_i the coordinates of the nodes inside it are averaged to obtain the bearing center G_i(x, y) of that potential user area.
The robot computes the shortest path g_i from its current position to G_i, and uses the potential user area as an exploration heuristic h_i, where the heuristic cost is related to the area's bearing intensity and confidence interval. The total sound-source-heuristic exploration cost is C_i = g_i + h_i. The areas in the potential-bearing set N are sorted by this total cost, and the robot navigates to the neighborhoods of the G_i in turn. As illustrated, when C₂ < C₁, the cost of reaching G₂ is lower than that of reaching G₁, so the robot first takes path ψ₁ to G₂; if vision fails to detect the user there, the robot replans a path to G₁ and detects the user again after arriving.
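A sketch of the cost-based ranking C_i = g_i + h_i; the patent does not give the heuristic analytically here, so the 1/confidence form, the `center` field, and the planner-provided `path_length` callback are assumptions:

```python
def rank_regions(regions, path_length, k=1.0):
    # regions: each with a bearing center G_i and a confidence (average intensity)
    costs = []
    for region in regions:
        g = path_length(region["center"])           # shortest path g_i from the planner
        h = k / max(region["confidence"], 1e-6)     # heuristic h_i: higher confidence, lower cost
        costs.append((g + h, region))
    costs.sort(key=lambda t: t[0])                  # visit lowest total cost C_i first
    return [r for _, r in costs]
```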
In a possible implementation, the first information further includes second identity information, i.e., the identity information of the user. The third device may acquire an image of the environment when it moves to one of the second candidate locations, and determine that it has moved to the correct candidate location based on a successful match between the second identity information and the user identity information contained in the image.
That is, after the third device arrives near G_i, it starts visual human detection, which includes identifying the specific person. Specifically, the correspondence between a user's voiceprint ID and face ID is stored in the third device or the smart home system; when the user calls the robot, the microphone array records synchronously and sends the user's second identity information (e.g., the voiceprint ID) to the robot. After the robot detects a person in the potential area, the identity information contained in the image (e.g., the user's face ID) is matched against the stored voiceprint of the caller; if the match succeeds, the robot considers the target found and navigates to a specific position near the user, completing the response to the call.
In one possible implementation, the third device may collect a third voice of the user while moving to one of the second candidate locations, and determine that it has moved to the correct candidate location based on a successful match between the second identity information and the user identity information determined from the third voice.
In this way, the robot computes the exploration cost from the potential user areas and the sound-source localization confidence, which guides the priority of its navigation and search. Favoring nearby areas and clear sound sources, the robot avoids searching invalid areas and finds the target quickly after reaching the user's room.
In this embodiment, the sound-source bearings whose intensity exceeds the threshold, as collected by the robot-body microphone array, serve as the user's potential positions, and the search cost is computed from navigation distance and confidence intensity. In the prior art, only the maximum-intensity bearing is selected and target navigation points are chosen at fixed threshold intervals for exploration; when the environment is complex and reverberation and multipath noise cause large sound-source localization errors, the target user cannot be found in that direction. This embodiment improves the robot's call success rate in complex environments, reduces random search in invalid areas, and concentrates the search on the potential user areas.
Next, an apparatus control method provided in an embodiment of the present application is described from a first apparatus side, and referring to fig. 6, fig. 6 is a flowchart of an apparatus control method provided in an embodiment of the present application, including:
601. Acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to first voice of the user acquired by the second device, and the first voice indicates a third device to move to an area where the user is located.
The execution body of step 601 may be a control terminal, that is, a first device, and the first device may be a terminal device or a cloud server.
In one possible implementation, the user may issue a voice command calling the third device to move to the user's vicinity, i.e., the area where the user is located. Specifically, taking the third device as a robot and the second device as a microphone-array device: the user calls the robot near the microphone-array device. For example, the user is in a bedroom; the speaker in the bedroom detects the robot's wake-up word and is woken by the user, i.e., it recognizes that the user is calling.
In some scenarios, the third device is far from the user or separated from the user by an obstacle, so that it cannot collect the user's voice, or the collected voice is too weak in signal intensity for sound-source localization. For example, in an indoor environment the third device and the user are in different rooms, or at distant positions within the same room, or at positions within the same room separated by an obstacle.
In this case, the user's voice command may be collected by another smart device (e.g., the second device in the embodiments of the present application) that is closer to the user or unobstructed; an audio sensor (e.g., a microphone for collecting audio data) may be arranged on the second device, and based on the voice command the position of the user relative to the second device can be located accurately. The second device may send the determined relative position between the user and itself to the first device, or send the user's voice directly to the first device, which then calculates the relative position between the user and the second device. Further, the first device may determine the user's absolute position from the previously known absolute position of the second device and the user's position relative to it, and send the user's absolute position to the third device, so that the third device can move to the user's area based on it. In this way, the user can call the third device to his or her side from near any device with a microphone array, without relying on sound-source localization by the robot-body microphone array; the perception range is large, and cross-room calling can be realized.
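A sketch of the coordinate step performed by the first device: converting the user's position relative to the second device into an absolute map position, assuming the second device's pose (x, y, yaw) in the map frame is known from the instance-level distribution map:

```python
import numpy as np

def user_absolute_position(device_pose, user_rel):
    x, y, yaw = device_pose          # second device's pose in the map frame
    dx, dy = user_rel                # user's position in the device's local frame
    # Standard 2D rigid transform: rotate by yaw, then translate.
    xa = x + dx * np.cos(yaw) - dy * np.sin(yaw)
    ya = y + dx * np.sin(yaw) + dy * np.cos(yaw)
    return xa, ya                    # user's absolute position, sent to the third device
```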
The above-described flow is described in detail below.
In one possible implementation, in order to learn the position of the second device, the third device may be moved into the area near the second device beforehand and locate the second device from the voice it utters (this step is performed in advance); since the second device is typically placed at a fixed location, the determined position can be reused many times afterwards. Specifically, in one implementation the third device may acquire the identity information of the second device, collect voice from the surrounding environment, and determine that the collected voice comes from the second device by comparing the identity information carried in the voice with the pre-acquired identity information. The absolute position of the second device is then determined from that voice by sound-source analysis. Alternatively, the third device may send the collected voice to the first device, which determines the absolute position of the second device by sound-source analysis.
In one possible implementation, after the first device accesses the smart home system or a new device accesses the smart home system, the first device may acquire information of each smart device (including the second device) in the smart home system, and may further acquire information such as identity information of the second device.
The first device may access the smart home system through the network and establish connections with at least one home device (e.g., the second device and the third device in the embodiments of the present application). The smart home system includes, but is not limited to, terminal software and cloud services; it is connected to the home smart terminals through the network, manages them, and can read information about the smart devices in the home, such as the device type, MAC identity tag, 3D appearance model, and the like. Therefore, after the first device accesses the smart home network, it can acquire information about the other home devices.
In one possible implementation, the first device may read smart device management information in the home from the smart home system, including but not limited to those shown in table 1.
In one possible implementation, the second device may need to be triggered to utter speech; for example, the third device may utter a wake-up voice used to wake the second device, and the second device may then issue a reply voice (i.e., the second voice in the embodiments of the present application) in response to the wake-up voice.
In one possible implementation, the second device may respond only to a preset wake-up word. The third device may obtain in advance a wake-up word capable of triggering the second device's reply voice, or the second device may be configured through information interaction (the executing body of the configuration may be the first device or the third device), so that the second device responds to the configured wake-up word with the configured reply voice (i.e., a reply word carried in that reply voice).
In one possible implementation, the first device may send configuration information to the second device instructing it to respond to the wake-up voice with the second voice.
In one possible implementation, the third device may send the second voice to the first device, from which the first device may determine the location of the second device.
Before locating the second device by its reply voice, the first device needs to determine, from identity information carried in the received reply voice, that the reply indeed comes from the second device. The identity information may be voiceprint information in the voice or a reply word in the voice, and may have been obtained in advance by the first device or configured in advance on the second device by the first device.
In one possible implementation, the first device may determine, from the collected second voice or the collected image, that the voice or image corresponds to the second device. For example, the second device can be recognized from a prerecorded template or by a detection model trained with an AI network; the target distance can then be estimated from the template, or, with a TOF camera, read directly as the depth of the target area inside the detection frame, and used to determine the precise location of the second device.
That is, the first identity information, i.e., the identity information of the second device, may be determined from the second voice or the image; the first device may then establish a mapping relationship between the first identity information and the second location. In other words, the second location becomes known, and is known to be the location of the second device.
In one possible implementation, the mapping relationship between the first identity information and the second location may be represented as an instance-level microphone-array distribution map. In this way, the third device interacts with the second device by voice while exploring the environment, discovers devices, and recognizes identity information such as device attributes and categories from the prerecorded reply codes. The second device is then located accurately by multi-modal (visual and auditory) perception, and an instance-level microphone-array device distribution map is constructed.
602. Acquiring a second position according to the first information, wherein the second position is the position where the second equipment is located; the first position and the second position are used for determining the position of the user;
in one possible implementation, the first device may determine the location of the user relative to the second device based on the first voice, or receive the location of the user relative to the second device determined by the second device based on the first voice.
603. And sending the position of the user to the third device.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first speech; the location of the user includes: and determining a plurality of second candidate positions of the user according to the first position and the second position.
The sound propagation and reflection paths in indoor environments are complex, and a sound-source localization method generally yields several potential positions (and intensities) of the source. In this embodiment of the present application, the first device may send the multiple candidate user positions to the third device, and the third device may move to them in turn, ensuring that it eventually reaches the vicinity of the user.
In one possible implementation, the first device may further send a target order to the third device, instructing it to move in that order to some or all of the second candidate locations.
In one possible implementation, the target order relates to at least one of: the length of the traversable path between the third device's current location and each of the second candidate locations; and the confidence of each first candidate location, carried in the first information.
In this way, the sound-source bearings can be used as a heuristic for the exploration cost: the robot ranks the potential user areas and navigates first to those with high confidence and short distance, reducing path cost and time cost while still guaranteeing that it moves correctly to the area where the user is.
In a possible implementation, the first information further includes second identity information, i.e., the identity information of the user. The first device may receive an image of the environment acquired by the third device when it moves to one of the second candidate locations and, upon a successful match between the second identity information and the user identity information contained in the image, send indication information to the third device indicating that it has moved to the correct candidate location among the second candidate locations. Alternatively, the first device may receive a third voice of the user collected by the third device when it moves to one of the second candidate locations and, upon a successful match between the second identity information and the user identity information determined from the third voice, send indication information to the third device indicating that it has moved to the correct candidate location among the second candidate locations.
The application provides a device control method, which is applied to first devices, and comprises the following steps: acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to first voice of the user acquired by the second device, and the first voice indicates a third device to move to an area where the user is located; acquiring a second position according to the first information, wherein the second position is the position where the second equipment is located; the first position and the second position are used for determining the position of the user; and sending the position of the user to the third device.
In some scenarios, the third device is far from the user or separated from the user by an obstacle, so that it cannot collect the user's voice, or the collected voice is too weak in signal intensity for sound-source localization. For example, in an indoor environment the third device and the user are in different rooms, or at distant positions within the same room, or at positions within the same room separated by an obstacle.
In this case, the user's voice command may be collected by another smart device (e.g., the second device in the embodiments of the present application) that is closer to the user or unobstructed; an audio sensor (e.g., a microphone for collecting audio data) may be arranged on the second device, and based on the voice command the position of the user relative to the second device can be located accurately. The second device may send the determined relative position between the user and itself to the first device, or send the user's voice directly to the first device, which then calculates the relative position between the user and the second device. Further, the first device may determine the user's absolute position from the previously known absolute position of the second device and the user's position relative to it, and send the user's absolute position to the third device, so that the third device can move to the user's area based on it. In this way, the user can call the third device to his or her side from near any device with a microphone array, without relying on sound-source localization by the robot-body microphone array; the perception range is large, and cross-room calling can be realized.
In addition, an embodiment of the present application further provides a device control method applied to the second device, the method comprising: acquiring a first voice of a user, the first voice instructing a third device to move to the area where the user is located; and sending, according to the first voice, first information to the third device or the first device, the first information comprising a first position of the user relative to the second device, the first position being determined from the first voice.
In one possible implementation, the second device may collect wake-up speech sent by the third device; responsive to the wake-up speech, a reply speech is issued.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first voice.
In one possible implementation, the first information further includes: confidence of each first candidate location of the plurality of first candidate locations.
An embodiment of the present application further provides a device control method applied to the first device, the method comprising: acquiring first information, where the first information includes a first position of a user relative to a second device, the first position is determined from a first voice of the user collected by the second device, and the first voice instructs devices of a target type to operate, the target type comprising a plurality of devices; acquiring a second position according to the first information, where the second position is the position of the second device, and the first position and the second position are used to determine the position of the user; and controlling the operation of a third device among the plurality of devices based on the user's position and the third device's position belonging to the same area.
In one possible implementation, the belonging to the same area includes: belonging to the same room.
The devices of the target type can be household appliances such as lamps, speakers, dishwashers, and air conditioners.
In some scenarios, a home may contain multiple devices of the same type, e.g., multiple lights or multiple air conditioners. When a user wants to wake or control one of them but does not specify in the voice which device (in which room) is meant, the first device cannot tell which device the user intends; in the prior art, all devices of the same type are often controlled together, giving the user a poor experience.
In this embodiment of the present application, since the user's position when uttering the voice can be obtained, and the positions of the devices of the target type (e.g., the room each is in) can be obtained in advance, the device whose position is closest to the user's (or that belongs to the same area, e.g., the same room) can be selected from the target-type devices and regarded as the one the user most likely wants to control. The first device can then control the operation of that device (the third device), e.g., trigger functions such as turning it on or off.
The location of each device of the target type may be obtained in the following ways (a selection sketch follows the two options below):
In one possible implementation, the location of each device may be entered actively by the user; for example, the user may be provided with an interactive interface in which to enter the room where each device is located.
In one possible implementation, the location of each device may be determined by a sensor while a mobile device (e.g., a sweeping robot) moves indoors, for example by visual localization with an image sensor.
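A minimal sketch of the same-room device selection described above; the field names and fallback behavior are assumptions:

```python
def pick_target_device(devices, target_type, user_room):
    # Keep only devices of the requested type in the room where the user spoke.
    candidates = [d for d in devices
                  if d["type"] == target_type and d["room"] == user_room]
    if len(candidates) == 1:
        return candidates[0]          # the device the user most likely means
    return None                       # ambiguous or absent: fall back / ask the user
```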
Referring to fig. 7, fig. 7 is a schematic structural diagram of an apparatus control device provided in an embodiment of the present application, which is applied to a first apparatus, the apparatus 700 includes:
a processing module 701, configured to obtain first information, where the first information includes a first position of a user relative to a second device; the first position is determined according to first voice of the user acquired by the second device, and the first voice indicates a third device to move to an area where the user is located;
the specific description of the processing module 701 may refer to the descriptions of step 601 and step 602 in the above embodiments, which are not repeated here.
Acquiring a second position according to the first information, wherein the second position is the position where the second equipment is located; the first position and the second position are used for determining the position of the user;
and the sending module 702 is configured to send, to the third device, a location where the user is located.
The specific description of the transmitting module 702 may refer to the description of step 603 in the foregoing embodiment, which is not repeated herein.
In one possible implementation, the processing module is specifically configured to:
receiving the first information sent by the second equipment; or,
and receiving the first voice sent by the second equipment, and determining the first information according to the first voice.
In one possible implementation of the present invention,
the second position is determined according to second voice acquired at the target position, and the second voice is the voice sent by the second equipment; or,
the second location is determined from an image of the second device acquired at a target location.
In one possible implementation, the second voice or the image is used to determine first identity information, where the first identity information is identity information of the second device; the processing module is further configured to:
Establishing a mapping relation between the first identity information and the second position;
the processing module is specifically configured to:
and acquiring a second position according to the identity information of the second equipment and the mapping relation.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first speech; the location of the user includes: and determining a plurality of second candidate positions of the user according to the first position and the second position.
In one possible implementation, the sending module is further configured to:
and sending a target order to the third device, wherein the target order is used for indicating the third device to sequentially move to part or all of the plurality of second candidate positions according to the target order.
In one possible implementation, the target order relates to at least one of:
a traffic path length between a location where the third device is currently located and each of the plurality of second candidate locations;
confidence of each first candidate position in the plurality of first candidate positions, the confidence being carried in the first information.
In a possible implementation, the first information further includes second identity information, where the second identity information is identity information of the user, and the processing module is further configured to:
receiving an image of the environment acquired by the third device when it moves to one of the plurality of second candidate locations, where, upon a successful match between the second identity information and the user identity information contained in the image, the sending module is further configured to: send indication information to the third device indicating that it has moved to the correct candidate location among the plurality of second candidate locations; or,
the processing module is further configured to: receive a third voice of the user collected by the third device when it moves to one of the plurality of second candidate locations, where, upon a successful match between the second identity information and the user identity information determined from the third voice, the sending module is further configured to: send indication information to the third device indicating that it has moved to the correct candidate location among the plurality of second candidate locations.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a device control apparatus provided in an embodiment of the present application, and the apparatus 800 is applied to a third device, and includes:
a processing module 801, configured to obtain first information, where the first information includes a first position of a user relative to a second device; the first position is determined according to first voice of the user acquired by the second device, and the first voice indicates the third device to move to an area where the user is located;
acquiring a second position according to the first information, wherein the second position is the position where the second equipment is located; the first position and the second position are used for determining the position of the user;
the specific description of the processing module 801 may refer to the descriptions of the step 401 and the step 402 in the foregoing embodiments, which are not repeated herein.
A movement control module 802, configured to move to a location where the user is located by controlling the movement component.
The specific description of the mobile control module 802 may refer to the description of step 403 in the above embodiment, which is not repeated here.
In one possible implementation, the second location is generated by one of:
Collecting second voice at a target position, wherein the second voice is the voice sent by the second equipment; determining the second position according to the second voice and the target position; or,
acquiring an image of the second device at a target location; and determining the second position according to the image and the target position.
In one possible implementation, the apparatus further includes:
the sensor module is used for sending out wake-up voice; the wake-up voice is used for waking up the second equipment;
the second voice is a response voice sent by the second equipment in response to the wake-up voice.
In one possible implementation, the second voice or the image is used to determine first identity information, where the first identity information is identity information of the second device; the processing module is further configured to:
establishing a mapping relation between the first identity information and the second position;
the first information further includes identity information of the second device, and the processing module is specifically configured to:
and acquiring a second position according to the identity information of the second equipment and the mapping relation.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first speech; the first location and the second location are used for determining a plurality of second candidate locations where the user is located;
The mobile control module is specifically configured to:
moving to the plurality of second candidate positions in the target order by controlling the moving component, until reaching the correct candidate position among them.
In one possible implementation, the target order relates to at least one of:
a traffic path length between a location where the third device is currently located and each of the plurality of second candidate locations;
confidence of each first candidate position in the plurality of first candidate positions, the confidence being carried in the first information.
In a possible implementation, the first information further includes second identity information, where the second identity information is identity information of the user, and the sensor module is further configured to:
acquiring an image of the environment when moving to one of the plurality of second candidate locations;
the processing module is further configured to determine, based on the second identity information and based on successful matching between identity information of the user included in the image, a correct candidate position to move to among the plurality of second candidate positions; or,
The sensor module is further configured to: collecting a third voice of the user when moving to one of the plurality of second candidate positions;
the processing module is further configured to determine, based on a successful match between the second identity information and the user identity information determined from the third voice, that the apparatus has moved to the correct candidate position among the plurality of second candidate positions.
In addition, the embodiment of the application also provides a device control device, which is applied to the second device, and the device comprises:
the processing module is used for acquiring first voice of a user, and the first voice indicates that third equipment moves to an area where the user is located;
a sending module, configured to send first information to the third device or the first device according to the first voice, where the first information includes a first location of the user relative to the second device; the first location is determined from the first voice.
In one possible implementation, the apparatus further includes:
the sensor module is used for collecting wake-up voice sent by the third equipment;
responsive to the wake-up speech, a reply speech is issued.
In one possible implementation, the first location includes: a plurality of first candidate locations determined from the first voice.
In one possible implementation, the first information further includes: confidence of each first candidate location of the plurality of first candidate locations.
Next, referring to fig. 9, fig. 9 is a schematic structural diagram of an execution device provided in the embodiment of the present application, and the execution device 900 may specifically be represented by the first device, the second device, or the third device described above, which is not limited herein. Wherein, the execution device 900 implements the functions of the device control method in the corresponding embodiment of fig. 6. Specifically, the execution device 900 includes: a receiver 901, a transmitter 902, a processor 903, and a memory 904 (where the number of processors 903 in the execution device 900 may be one or more), where the processor 903 may include an application processor 9031 and a communication processor 9032. In some embodiments of the present application, the receiver 901, transmitter 902, processor 903, and memory 904 may be connected by a bus or other means.
Memory 904 may include read-only memory and random access memory, and provides instructions and data to the processor 903. A portion of the memory 904 may also include non-volatile random access memory (NVRAM). The memory 904 stores a processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.
The processor 903 controls the operation of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.
The methods disclosed in the embodiments of the present application may be applied to the processor 903 or implemented by the processor 903. The processor 903 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry of hardware in the processor 903 or instructions in the form of software. The processor 903 may be a general-purpose processor, a digital signal processor (digital signal processing, DSP), a microprocessor or microcontroller, a visual processor (vision processing unit, VPU), a tensor processor (tensor processing unit, TPU), or the like, which is suitable for AI operation, and may further include an application specific integrated circuit (application specific integrated circuit, ASIC), a field-programmable gate array (field-programmable gate array, FPGA), or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 903 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in hardware, in a decoded processor, or in a combination of hardware and software modules in a decoded processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 904, and the processor 903 reads the information in the memory 904, and combines the hardware to perform the steps 501 to 503 in the above embodiment.
The receiver 901 may be used to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 902 is operable to output numeric or character information via a first interface; the transmitter 902 is further operable to send instructions to the disk stack via the first interface to modify data in the disk stack; the transmitter 902 may also include a display device such as a display screen.
Referring to fig. 10, fig. 10 is a schematic structural diagram of the device control apparatus provided in the embodiment of the present application, specifically, the device control apparatus 1000 is implemented by one or more servers, where the device control apparatus 1000 may have relatively large differences due to different configurations or performances, and may include one or more central processing units (central processing units, CPU) 1010 (e.g., one or more processors) and a memory 1032, and one or more storage media 1030 (e.g., one or more mass storage devices) storing application programs 1042 or data 1044. Wherein memory 1032 and storage medium 1030 may be transitory or persistent. The program stored in the storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations in the device control apparatus. Still further, the central processor 1010 may be configured to communicate with a storage medium 1030, and execute a series of instruction operations in the storage medium 1030 on the device control apparatus 1000.
The device control apparatus 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058, and/or one or more operating systems 1041 such as Windows Server™, Mac OS X™, Unix™, Linux™, and FreeBSD™.
Specifically, the device control apparatus may be the first device described in the above embodiment, so as to perform the steps related to the device control method in the above embodiment.
An embodiment of the present application also provides a computer program product comprising computer-readable instructions which, when run on a computer, cause the computer to perform the steps performed by the aforementioned execution device, or cause the computer to perform the steps performed by the aforementioned device control apparatus.
An embodiment of the present application also provides a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to perform the steps performed by the aforementioned execution device, or causes the computer to perform the steps performed by the aforementioned device control apparatus.
The execution device, the device control apparatus, or the terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute computer-executable instructions stored in a storage unit, so that the chip in the execution device performs the device control method described in the above embodiments, or so that the chip in the device control apparatus performs the steps related to device control in the above embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip on the device side, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
Specifically, referring to fig. 11, fig. 11 is a schematic structural diagram of a chip provided in an embodiment of the present application. The chip may be implemented as a neural network processor (NPU) 1100. The NPU 1100 is mounted as a coprocessor on a host CPU (Host CPU), which allocates tasks to it. The core part of the NPU is the arithmetic circuit 1103; the controller 1104 controls the arithmetic circuit 1103 to fetch matrix data from memory and perform multiplication. The chip may be the first device, the second device, or the third device described in the above embodiments, so as to perform the steps related to the device control method in the above embodiments.
In some implementations, the arithmetic circuit 1103 internally includes a plurality of processing elements (PE). In some implementations, the arithmetic circuit 1103 is a two-dimensional systolic array. The arithmetic circuit 1103 may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1103 is a general-purpose matrix processor.
For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to matrix B from the weight memory 1102 and buffers it on each PE in the arithmetic circuit. The arithmetic circuit fetches the data of matrix A from the input memory 1101, performs a matrix operation with matrix B, and stores the obtained partial result or final result of the matrix in the accumulator 1108.
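As a rough software analogue of this data flow (an illustration under assumed shapes, not a model of the actual circuit), the following Python sketch accumulates partial results of A x B in an accumulator in the way the description above suggests:

```python
import numpy as np

# Illustrative analogue of the arithmetic circuit's matrix flow; the
# shapes and the rank-1 decomposition are assumptions for clarity.
A = np.random.rand(4, 8)        # input matrix A, as if read from input memory 1101
B = np.random.rand(8, 4)        # weight matrix B, as if buffered from weight memory 1102

acc = np.zeros((4, 4))          # plays the role of the accumulator 1108
for k in range(A.shape[1]):
    # Each pass contributes one partial (rank-1) result, mirroring how
    # partial sums build up as data is pumped through the PE array.
    acc += np.outer(A[:, k], B[k, :])

C = acc                         # output matrix C
assert np.allclose(C, A @ B)    # the accumulated partial results equal A x B
```

The loop makes the role of the accumulator explicit: each iteration produces only a partial result of the matrix operation, and the final output matrix C is their sum.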
The unified memory 1106 is used to store input data and output data. Weight data is transferred directly to the weight memory 1102 through the direct memory access controller (DMAC) 1105, and input data is likewise carried into the unified memory 1106 through the DMAC.
The bus interface unit (BIU) 1110 is used for interaction among the AXI bus, the DMAC, and the instruction fetch buffer (IFB) 1109. Specifically, the bus interface unit 1110 is used by the instruction fetch buffer 1109 to fetch instructions from the external memory, and is further used by the memory unit access controller 1105 to fetch the raw data of the input matrix A or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1106, to transfer weight data to the weight memory 1102, or to transfer input data to the input memory 1101.
The vector calculation unit 1107 includes a plurality of operation processing units and, when needed, further processes the output of the arithmetic circuit, for example with vector multiplication, vector addition, exponential operation, logarithmic operation, or magnitude comparison. It is mainly used for network calculation at non-convolutional/fully connected layers of a neural network, such as batch normalization, pixel-level summation, and up-sampling of feature planes.
In some implementations, the vector calculation unit 1107 can store a vector of processed outputs to the unified memory 1106. For example, the vector calculation unit 1107 may apply a linear function or a nonlinear function to the output of the arithmetic circuit 1103, such as performing linear interpolation on the feature planes extracted by a convolutional layer, or applying a nonlinear function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1107 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the arithmetic circuit 1103, for example for use in a subsequent layer of a neural network.
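A hedged sketch of such post-processing, assuming NumPy, column-wise normalization, and ReLU as the nonlinearity (none of which the embodiment prescribes), might look as follows:

```python
import numpy as np

# Illustrative post-processing in the spirit of the vector calculation
# unit 1107; shapes, parameters, and the choice of ReLU are assumptions.
out = np.random.rand(16, 4)            # accumulated output of the matrix operation

# Batch-normalization-like step: normalize per column, then scale and shift.
mean, var = out.mean(axis=0), out.var(axis=0)
gamma, beta = 1.0, 0.0                 # stand-ins for learned parameters
normed = gamma * (out - mean) / np.sqrt(var + 1e-5) + beta

# A nonlinear function applied to the accumulated values yields activation
# values that can feed a subsequent layer of the network.
activations = np.maximum(normed, 0.0)  # ReLU as one possible nonlinearity
```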
An instruction fetch buffer (instruction fetch buffer) 1109 connected to the controller 1104 is used to store instructions used by the controller 1104. The unified memory 1106, the input memory 1101, the weight memory 1102, and the instruction fetch buffer 1109 are all on-chip memories; the external memory is external to the NPU hardware architecture.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits configured to control the execution of the programs described above.
It should be further noted that the above-described apparatus embodiments are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided in this application, the connection relationship between modules indicates that they have communication connections with each other, which may be specifically implemented as one or more communication buses or signal lines.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by software plus necessary general-purpose hardware, or by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function can vary, for example analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software program implementation is the preferred embodiment in most cases. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, including several instructions for causing a computer device (which may be a personal computer, a device control apparatus, a network device, or the like) to perform the methods described in the embodiments of the present application.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, device control apparatus, or data center to another website, computer, device control apparatus, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device, such as a device control apparatus or a data center, integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)), among others.

Claims (34)

1. A device control method, applied to a first device, the method comprising:
acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to a first voice of the user acquired by the second device, and the first voice instructs a third device to move to an area where the user is located;
acquiring a second position according to the first information, wherein the second position is the position where the second device is located; the first position and the second position are used for determining the position of the user;
and sending the position of the user to the third device.
2. The method of claim 1, wherein the obtaining the first information comprises:
receiving the first information sent by the second device; or
receiving the first voice sent by the second device and determining the first information according to the first voice.
3. The method according to claim 1 or 2, wherein:
the second position is determined according to a second voice acquired at a target position, the second voice being a voice emitted by the second device; or
the second position is determined according to an image of the second device acquired at a target position.
4. The method according to claim 3, wherein the second voice or the image is used to determine first identity information, the first identity information being identity information of the second device; and the method further comprises:
establishing a mapping relationship between the first identity information and the second position;
wherein the first information further comprises the identity information of the second device, and the obtaining the second position comprises:
acquiring the second position according to the identity information of the second device and the mapping relationship.
5. The method of any one of claims 1 to 4, wherein the first position comprises: a plurality of first candidate positions determined according to the first voice; and the position of the user comprises: a plurality of second candidate positions of the user determined according to the first position and the second position.
6. The method of claim 5, wherein the method further comprises:
and sending a target order to the third device, wherein the target order is used to instruct the third device to move sequentially, in the target order, to some or all of the plurality of second candidate positions.
7. The method of claim 6, wherein the target order relates to at least one of:
a travel path length between the position where the third device is currently located and each of the plurality of second candidate positions;
a confidence of each of the plurality of first candidate positions, the confidence being carried in the first information.
8. A device control method, applied to a first device, the method comprising:
acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to a first voice of the user acquired by the second device, and the first voice instructs control of the operation of a device of a target type, the devices of the target type comprising a plurality of devices;
acquiring a second position according to the first information, wherein the second position is the position where the second device is located; the first position and the second position are used for determining the position of the user;
and controlling the operation of a third device of the plurality of devices based on the position of the user and the position of the third device belonging to the same area.
9. The method of claim 8, wherein belonging to the same area comprises: belonging to the same room.
10. A device control method applied to a third device including a moving component, the method comprising:
acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to a first voice of the user acquired by the second device, and the first voice instructs the third device to move to an area where the user is located;
acquiring a second position according to the first information, wherein the second position is the position where the second device is located; the first position and the second position are used for determining the position of the user;
and controlling the moving component to move to the position where the user is located.
11. The method of claim 10, wherein the second location is generated by one of:
collecting a second voice at a target position, the second voice being a voice emitted by the second device, and determining the second position according to the second voice and the target position; or
acquiring an image of the second device at a target position, and determining the second position according to the image and the target position.
12. The method of claim 11, wherein the method further comprises:
emitting a wake-up voice, the wake-up voice being used for waking up the second device;
wherein the second voice is a response voice emitted by the second device in response to the wake-up voice.
13. The method according to claim 11 or 12, wherein the second voice or the image is used to determine first identity information, the first identity information being identity information of the second device; and the method further comprises:
establishing a mapping relationship between the first identity information and the second position;
wherein the first information further comprises the identity information of the second device, and the obtaining the second position comprises:
acquiring the second position according to the identity information of the second device and the mapping relationship.
14. The method of any one of claims 10 to 13, wherein the first position comprises: a plurality of first candidate positions determined according to the first voice; and the first position and the second position are used for determining a plurality of second candidate positions where the user is located;
wherein the controlling the moving component to move to the position where the user is located comprises:
moving sequentially, by controlling the moving component and in a target order, to the plurality of second candidate positions until reaching the correct candidate position, among the plurality of candidate positions, where the user is located.
15. The method of claim 14, wherein the target order relates to at least one of:
a travel path length between the position where the third device is currently located and each of the plurality of second candidate positions;
a confidence of each of the plurality of first candidate positions, the confidence being carried in the first information.
16. The method according to claim 14 or 15, wherein the first information further comprises second identity information, the second identity information being identity information of the user, and the method further comprises:
acquiring an image of the environment when moving to one of the plurality of second candidate positions, and determining that the correct candidate position among the plurality of second candidate positions has been reached based on a successful match between the second identity information and identity information of the user included in the image; or
collecting a third voice of the user when moving to one of the plurality of second candidate positions, and determining that the correct candidate position among the plurality of second candidate positions has been reached based on a successful match between the second identity information and identity information of the user determined according to the third voice.
17. A device control method, applied to a second device, the method comprising:
acquiring a first voice of a user, wherein the first voice instructs a third device to move to an area where the user is located;
and sending first information to the third device or a first device according to the first voice, wherein the first information comprises a first position of the user relative to the second device; the first position is determined according to the first voice.
18. The method of claim 17, wherein the method further comprises:
collecting a wake-up voice emitted by the third device;
and emitting a response voice in response to the wake-up voice.
19. The method of claim 17 or 18, wherein the first position comprises: a plurality of first candidate positions determined according to the first voice.
20. The method of any one of claims 17 to 19, wherein the first information further comprises: a confidence of each of the plurality of first candidate positions.
21. A device control apparatus, applied to a first device, the apparatus comprising:
a processing module, used for acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to a first voice of the user acquired by the second device, and the first voice instructs a third device to move to an area where the user is located;
and for acquiring a second position according to the first information, wherein the second position is the position where the second device is located; the first position and the second position are used for determining the position of the user;
and a sending module, used for sending the position of the user to the third device.
22. The apparatus according to claim 21, wherein the processing module is specifically configured to:
receive the first information sent by the second device; or
receive the first voice sent by the second device and determine the first information according to the first voice.
23. The apparatus of claim 21 or 22, wherein:
the second position is determined according to a second voice acquired at a target position, the second voice being a voice emitted by the second device; or
the second position is determined according to an image of the second device acquired at a target position.
24. The apparatus of claim 23, wherein the second voice or the image is used to determine first identity information, the first identity information being identity information of the second device; the processing module is further configured to:
establish a mapping relationship between the first identity information and the second position;
wherein the processing module is specifically configured to:
acquire the second position according to the identity information of the second device and the mapping relationship.
25. The apparatus of any one of claims 21 to 24, wherein the first position comprises: a plurality of first candidate positions determined according to the first voice; and the position of the user comprises: a plurality of second candidate positions of the user determined according to the first position and the second position.
26. The apparatus of claim 25, wherein the sending module is further configured to:
send a target order to the third device, wherein the target order is used to instruct the third device to move sequentially, in the target order, to some or all of the plurality of second candidate positions.
27. The apparatus of claim 26, wherein the target order relates to at least one of:
a travel path length between the position where the third device is currently located and each of the plurality of second candidate positions;
a confidence of each of the plurality of first candidate positions, the confidence being carried in the first information.
28. A device control apparatus, applied to a first device, the apparatus comprising:
a processing module, used for acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to a first voice of the user acquired by the second device, and the first voice instructs control of the operation of a device of a target type, the devices of the target type comprising a plurality of devices;
and for acquiring a second position according to the first information, wherein the second position is the position where the second device is located; the first position and the second position are used for determining the position of the user;
and a control module, used for controlling the operation of a third device of the plurality of devices based on the position of the user and the position of the third device belonging to the same area.
29. The apparatus of claim 28, wherein belonging to the same area comprises: belonging to the same room.
30. A device control apparatus, applied to a third device including a moving component, the apparatus comprising:
a processing module, used for acquiring first information, wherein the first information comprises a first position of a user relative to a second device; the first position is determined according to a first voice of the user acquired by the second device, and the first voice instructs the third device to move to an area where the user is located;
and for acquiring a second position according to the first information, wherein the second position is the position where the second device is located; the first position and the second position are used for determining the position of the user;
and a movement control module, used for moving to the position where the user is located by controlling the moving component.
31. A device control apparatus, applied to a second device, the apparatus comprising:
a processing module, used for acquiring a first voice of a user, wherein the first voice instructs a third device to move to an area where the user is located;
and a sending module, configured to send first information to the third device or a first device according to the first voice, wherein the first information comprises a first position of the user relative to the second device; the first position is determined according to the first voice.
32. A computing device, the computing device comprising a memory and a processor; the memory stores code, the processor being configured to retrieve the code and perform the method of any of claims 1 to 20.
33. A computer storage medium storing one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 20.
34. A computer program product comprising code for implementing the method of any of claims 1 to 20 when said code is executed.
CN202211202127.2A 2022-09-29 2022-09-29 Equipment control method and related device Pending CN117806305A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211202127.2A CN117806305A (en) 2022-09-29 2022-09-29 Equipment control method and related device

Publications (1)

Publication Number Publication Date
CN117806305A 2024-04-02

Family

ID=90422366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211202127.2A Pending CN117806305A (en) 2022-09-29 2022-09-29 Equipment control method and related device

Country Status (1)

Country Link
CN (1) CN117806305A (en)

Legal Events

Date Code Title Description
PB01 Publication