CN115268285A - Device control method, device, electronic device, and storage medium - Google Patents

Device control method, device, electronic device, and storage medium

Info

Publication number
CN115268285A
CN115268285A (Application CN202210833792.5A)
Authority
CN
China
Prior art keywords
target
preset
posture
frame
postures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210833792.5A
Other languages
Chinese (zh)
Inventor
余向东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lumi United Technology Co Ltd
Original Assignee
Lumi United Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lumi United Technology Co Ltd filed Critical Lumi United Technology Co Ltd
Priority to CN202210833792.5A
Publication of CN115268285A

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00 Systems controlled by a computer
    • G05B15/02 Systems controlled by a computer electric
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/26 Pc applications
    • G05B2219/2642 Domotique, domestic, home control, automation, smart house

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manufacturing & Machinery (AREA)
  • Quality & Reliability (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the application disclose a device control method and apparatus, an electronic device, and a storage medium, and relate to the technical field of data processing. The method comprises the following steps: identifying the body posture of a target object in the video frames of a real-time video stream; and if at least two target body postures are identified within a preset number of consecutive video frames, triggering the acquisition of a control strategy matched with the at least two target body postures, and sending a target control instruction to the corresponding target device based on the control strategy. The method and apparatus can reduce cases where the determined control strategy is inaccurate because a target body posture was triggered by mistake, thereby improving the accuracy of the control strategy and, in turn, the accuracy of device control.

Description

Device control method, device, electronic device, and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a device control method and apparatus, an electronic device, and a storage medium.
Background
With the progress of science and technology, cameras are becoming increasingly intelligent. Functions such as face recognition, human shape detection, cry detection, and pet recognition have become standard features of cameras, and useful data can be screened out of massive video data through these functions.
In the field of device control, a video stream containing a human body can be captured by a camera, the human body posture can be recognized from the video stream, a control instruction corresponding to that posture can be obtained, and the device can then be controlled through the instruction. However, this way of controlling a device is prone to false triggering, so the accuracy of device control is low.
Disclosure of Invention
The application provides a device control method and device, electronic equipment and a storage medium, which are used for improving the accuracy of device control.
In a first aspect, an embodiment of the present application provides a device control method, where the method includes: identifying the body posture of a target object in the video frames of a real-time video stream; and if at least two target body postures are identified within a preset number of consecutive video frames, triggering the acquisition of a control strategy matched with the at least two target body postures, and sending a target control instruction to the corresponding target device based on the control strategy.
In a second aspect, an embodiment of the present application provides an apparatus for controlling a device, where the apparatus includes: the identification module is used for identifying the body posture of a target object in a video frame of the real-time video stream; the control module is used for triggering and acquiring a control strategy matched with at least two target body postures if at least two target body postures are identified from continuous video frames with preset frame numbers, so as to send a target control instruction to corresponding target equipment based on the control strategy.
Optionally, the control module is further configured to trigger obtaining of a control policy matched with the at least two first target body poses if at least two first target body poses are identified from any one of the consecutive video frames with the preset number of frames.
Optionally, the control module is further configured to trigger obtaining of a control policy matched after combination of the at least two first target body poses and the at least one second target body pose if at least two first target body poses are identified from first video frames in the consecutive video frames with the preset number of frames and at least one second target body pose is identified from second video frames in the consecutive video frames with the preset number of frames.
Optionally, the control module is further configured to trigger obtaining of a control policy matched after combination of the at least one third target body posture and the at least one fourth target body posture if at least one third target body posture is recognized from a third video frame of the consecutive video frames with the preset number of frames and at least one fourth target body posture is recognized from a fourth video frame of the consecutive video frames with the preset number of frames.
Optionally, the control module is further configured to trigger obtaining of a control policy matched with the at least two target body poses if a specified body pose is identified in a fifth video frame in the real-time video stream and at least one target body pose is identified in a preset number of frames of continuous video acquired after the fifth video frame.
Optionally, the apparatus further includes a candidate body posture determining module, configured to determine, if multiple body postures are identified in one frame of video frame of the real-time video stream, at least one candidate body posture from the multiple body postures according to a ratio of each body posture in the multiple body postures in the one frame of video frame; and judging whether the at least one candidate body posture has a specified body posture or at least one target body posture.
Optionally, the candidate body pose determining module is further configured to determine, according to a ratio of each body pose in the frame of video frame of the real-time video stream to the plurality of body poses, one body pose with the largest ratio from the plurality of body poses as the candidate body pose if the plurality of body poses are identified in the frame of video frame of the real-time video stream.
Optionally, the candidate body pose determining module is further configured to frame each body pose in the frame of image by using an anchor frame of a preset shape according to a position of each body pose in the plurality of body poses in the frame of video and a preset feature value corresponding to each body pose in the plurality of body poses; and determining the body gesture framed by at least one anchor frame with the largest area in the areas of the anchor frames corresponding to the body gestures as a candidate body gesture.
Optionally, the candidate body posture determining module is further configured to determine, according to all pixel points included in each of the plurality of body postures, an image area occupied by each of the plurality of body postures in the frame of video frame; and determining at least one body posture which occupies the largest image area in the area of the image area occupied by each body posture as a candidate body posture.
Optionally, the apparatus further comprises an acquisition module, configured to acquire at least two preset body postures and at least one preset control strategy of at least one preset object, where the at least two preset body postures comprise at least two preset standard body postures, and to establish and store a mapping relationship between the acquired at least two preset standard body postures of the at least one preset object and the at least one preset control strategy. The control module is further configured to, if at least two body postures recognized from the preset number of consecutive video frames are preset standard body postures, determine that those at least two body postures are target body postures, and to trigger acquisition of the control strategy matched with the at least two target body postures according to the mapping relationship.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory; one or more programs are stored in the memory and configured to be executed by the processor to implement the methods described above.
In a fourth aspect, the present application provides a computer-readable storage medium, in which program codes are stored, wherein the program codes, when executed by a processor, perform the above-mentioned method.
In a fifth aspect, embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read by a processor of the computer device from a computer-readable storage medium, and the processor executes the computer instructions, causing the computer device to perform the method described above.
According to the device control method and apparatus, the electronic device, and the storage medium provided above, at least two target body postures are identified within a preset number of consecutive video frames of a video stream, and the control strategy matched with those at least two target body postures is determined on that basis. Because the control strategy is determined from at least two target body postures, cases where the determined control strategy is inaccurate because a target body posture was triggered by mistake are reduced, the accuracy of the control strategy is improved, and the accuracy of device control is improved in turn.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 illustrates a device control system provided in accordance with an embodiment of the present application;
fig. 2 is a flowchart illustrating a device control method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a target body posture in an embodiment of the present application;
FIG. 4 is a schematic diagram of another target body posture in an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another target body posture in an embodiment of the present application;
FIG. 6 is a schematic diagram of a target body posture in yet another embodiment of the present application;
fig. 7 is a flowchart illustrating a device control method according to still another embodiment of the present application;
fig. 8 is a flowchart illustrating a device control method according to still another embodiment of the present application;
FIG. 9 is a diagram illustrating a gesture recognition process in an embodiment of the present application;
fig. 10 shows a block diagram of a device control apparatus according to an embodiment of the present application;
fig. 11 shows a block diagram of an electronic device for executing the device control method according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art without making any creative effort based on the embodiments in the present application belong to the protection scope of the present application.
In the following description, the terms "first", "second", and the like are only intended to distinguish similar objects and do not denote a particular order. It should be understood that, where permissible, objects so described may be interchanged, so that the embodiments of the present application can be practiced in orders other than those specifically illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
At present, a human body posture is recognized from a video stream, a control instruction corresponding to that posture is obtained, and the device is then controlled through the control instruction. However, the human body posture recognized in the video stream may be a posture triggered by mistake rather than a posture intentionally made by the user, so the control instruction acquired according to that posture may be inaccurate, and the accuracy of device control is low.
Therefore, the inventor proposes a device control method and apparatus, an electronic device, and a storage medium: identify the body posture of a target object in the video frames of a real-time video stream; if at least two target body postures are identified within a preset number of consecutive video frames, trigger the acquisition of a control strategy matched with the at least two target body postures, and send a target control instruction to the corresponding target device based on the control strategy. Because the control strategy is determined based on at least two target body postures, cases where the determined control strategy is inaccurate because a target body posture was triggered by mistake are reduced, the accuracy of the control strategy is improved, and the accuracy of device control is improved in turn.
FIG. 1 illustrates a device control system provided according to an embodiment of the present application. As shown in fig. 1, the device control system includes a camera 101, a server (or cloud) 102, a terminal 103, and a smart device 104, where the camera 101 and each smart device 104 generally transmit data to the server 102 or the terminal 103 through a network device 105 (such as a router or a gateway).
The camera 101 may be a high definition camera, an infrared camera, a color camera, and the like, and is used to capture a real-time video stream of a target object.
The server 102 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
The terminal 103 may be a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart appliance, a vehicle-mounted terminal, an aircraft, a wearable device terminal, a virtual reality device, and other terminal devices capable of playing video, and the terminal may run a video playing application or other applications (e.g., an instant messaging application, a shopping application, a search application, a game application, a forum application, a map traffic application, etc.) capable of invoking the video playing application.
The smart device 104 may be a smart home device, a smart office device, and the like, such as a refrigerator, a printer, a television, an air conditioner, and the like.
As an embodiment, the camera 101 may capture a real-time video stream of a target object, the real-time video stream is sent to the terminal 103 through the server 102, the terminal 103 processes the real-time video stream sent by the camera 101 to obtain a control policy, and obtains a control instruction according to the control policy, and then the terminal 103 sends the control instruction to the smart device 104, so that the smart device 104 executes the control instruction.
As another embodiment, the camera 101 may capture a real-time video stream of a target object and transmit the real-time video stream to the server 102, the server 102 processes the real-time video stream transmitted by the camera 101 to obtain a control policy and obtain a control instruction according to the control policy, and then the server 102 transmits the control instruction to the smart device 104, so that the smart device 104 executes the control instruction.
In some embodiments, the camera 101 itself may process the captured real-time video stream to obtain a control policy and obtain a control instruction according to the control policy, and then the camera 101 sends the control instruction to the smart device 104 to enable the smart device 104 to execute the control instruction.
For convenience of description, in the following embodiments of the present application, the camera 101, the server 102, and the terminal 103, which can execute the aspects of the present application, are all referred to as electronic devices.
Referring to fig. 2, fig. 2 shows a flowchart of a method for controlling a device according to an embodiment of the present application, where the method may be executed by an electronic device, for example, the electronic device may be the server in fig. 1, or the terminal in fig. 1, or a camera device with a camera function, such as the camera in fig. 1. The method specifically comprises the following steps:
And S110, identifying the body posture of a target object in a video frame of a real-time video stream.
In this embodiment, the target object may be photographed by the camera in real time to obtain a real-time video stream, and the electronic device acquires the real-time video stream from the camera and identifies a body posture of the target object in a video frame of the real-time video stream.
The target object may be a whole body of a human body, or may be a part of the human body, such as a head, a hand, and the like.
It is understood that the body posture of the target object may refer to the shape posture of the target object, and may be the shape posture of the whole body of the target object or the shape posture of the local part of the target object, for example, the body posture may include the whole body posture of the human body and/or the part posture of the human body, the part posture of the human body includes the head posture of the human body and/or the limb posture of the human body, and the limb posture of the human body includes the gesture of the human body.
When the body posture is a whole-body posture, one target object corresponds to one body posture; when the body posture is a head posture, one target object also corresponds to one body posture; when the body posture is a gesture, each gesture corresponds to one body posture, so if both hands of one person make gestures, there are two corresponding body postures.
As an embodiment, each video frame of the real-time video stream may be recognized by a deep learning model (e.g., a trained body posture recognition model) to determine the body posture of the target object in each video frame.
The training process of the deep learning model can comprise the following steps: and inputting the sample image including the body posture into a preset algorithm model for training. In the training process, the predicted value output by the preset algorithm model in each round can be compared with the sample label of the body posture to obtain corresponding loss, and then the parameters of the preset algorithm model are adjusted according to the corresponding loss to continuously make the preset algorithm model converge; and stopping training until the conditions are met, thereby obtaining a well-trained deep learning model.
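Purely as an illustrative sketch (the patent does not name a framework), a training loop of the kind described above could look roughly as follows in PyTorch; the model, data loader, loss function, and hyperparameters are all assumptions rather than details from the disclosure.

```python
import torch
import torch.nn as nn

def train_posture_model(model, data_loader, epochs=10, lr=1e-3):
    """Train a body posture recognition model on sample images labeled
    with body postures, adjusting the parameters from each round's loss
    until training stops."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in data_loader:
            preds = model(images)             # predicted values of this round
            loss = criterion(preds, labels)   # compare with the sample labels
            optimizer.zero_grad()
            loss.backward()                   # adjust parameters by the loss
            optimizer.step()
    return model
```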
The real-time video stream may be a 1080P high-definition video. Each frame of the real-time video stream may be buffered in a buffer; the buffered current video frame is obtained from the buffer and converted into a to-be-processed video frame with a resolution of 512 × 512, and the body posture in the to-be-processed video frame is then identified through the deep learning model.
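A minimal sketch of the per-frame preprocessing and recognition described above is given below; `capture`, `pose_model`, and `predict` are assumed placeholders, not names from the disclosure.

```python
import cv2  # assumed dependency for frame handling

MODEL_INPUT_SIZE = (512, 512)  # resolution of the to-be-processed video frame

def recognize_current_frame(capture, pose_model):
    """Fetch the buffered current frame of the 1080P stream, downscale it
    to the model input resolution, and run body posture recognition."""
    ok, frame = capture.read()               # e.g. a 1920x1080 frame
    if not ok:
        return []
    to_process = cv2.resize(frame, MODEL_INPUT_SIZE)
    # pose_model stands in for the trained deep learning model; it is
    # assumed to return a list of recognized body postures for the frame.
    return pose_model.predict(to_process)
```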
S120, if at least two target body postures are identified from the continuous video frames with the preset frame number, triggering and obtaining a control strategy matched with the at least two target body postures, and sending a target control instruction to corresponding target equipment based on the control strategy.
The preset frame number may be set based on the demand and the actual scene, for example, the preset frame number is 100 or 120.
The control policy may refer to a control method or control flow for controlling the target device. The control policy may include a device identifier of the target device and a target control instruction that the target device needs to execute, and the target device may be any controllable device, for example a smart home device. The device identifier may be the name or number of the target device, so that the target control instruction can be sent to the target device through the device identifier. The target control instruction may be a specific instruction that the target device needs to execute; for example, if the target device is an air conditioner, the target control instruction may be to turn on the air conditioner and set its temperature to 26 degrees Celsius.
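One minimal way to represent such a control policy (device identifier plus target control instruction) is sketched below; the field names and example values are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class ControlPolicy:
    device_id: str     # device identifier (name or number) of the target device
    instruction: dict  # target control instruction the target device executes

# Example: turn on the air conditioner and set it to 26 degrees Celsius.
ac_policy = ControlPolicy(
    device_id="air_conditioner_01",
    instruction={"power": "on", "temperature_c": 26},
)
```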
The target body pose may be a body pose in the video frame that matches a preset body pose. The electronic device can store a plurality of preset body posture pairs input by a preset object and a plurality of preset control strategies corresponding to the preset body posture pairs one to one, each preset body posture pair comprises at least two preset body postures, the preset body postures can refer to preset standard body postures prestored, the preset body posture pairs can be stored in a preset combination result mode, the preset combination result can comprise position relations between each preset body posture in the preset body posture pairs and each preset body posture, and the preset combination result can also comprise each preset body posture in the preset body posture pairs and time sequence between each preset body posture.
It can be understood that the target object whose target body posture is recognized and the preset object that entered the preset body posture may be the same object or different objects. The recognized target body posture is used as the basis for acquiring the control strategy; the face information and body information of the object need not be used as a basis for acquiring the control strategy.
For each video frame, all preset body postures included in the plurality of preset body posture pairs are compared with the body posture in the video frame to determine whether any of them matches the body posture in the video frame. If so, the body posture in the video frame is determined to be a target body posture; if not, the body posture in the video frame is not a target body posture, the next video frame is acquired, and the target body posture judgment process continues.
For example, suppose there are 10 preset body posture pairs corresponding to 10 preset control strategies, and each preset body posture pair includes 2 preset body postures. The 20 preset body postures included in the 10 pairs are compared with the body posture in the video frame to determine whether any of the 20 matches it; when a matching preset body posture exists among the 20, the body posture in the video frame is determined to be a target body posture.
A plurality of preset body posture pairs of at least one preset object may be acquired through the camera, together with a plurality of preset control strategies entered by the preset object, where one preset body posture pair comprises at least two preset standard body postures. A mapping relationship between the acquired preset body posture pairs and the preset control strategies is then established and stored, so that the preset body posture pairs and their one-to-one corresponding preset control strategies are obtained.
Matching between a body posture and a preset body posture may mean that the body posture is the same as the preset body posture, or it may mean that the similarity between the body posture and the preset body posture reaches a similarity threshold, where the similarity threshold may be, for example, 80% or 88%.
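A hedged sketch of this matching rule follows; `similarity_fn` is an assumed placeholder for whatever similarity measure is used.

```python
SIMILARITY_THRESHOLD = 0.8  # e.g. 80%, as in the description above

def matches_preset(body_posture, preset_posture, similarity_fn):
    """A recognized body posture matches a preset body posture if the two
    are identical, or if their similarity reaches the threshold.
    similarity_fn is assumed to return a value in [0, 1]."""
    if body_posture == preset_posture:
        return True
    return similarity_fn(body_posture, preset_posture) >= SIMILARITY_THRESHOLD
```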
When the target object comprises one object, the target body posture in a video frame refers to the target body posture of that target object; when the target object includes a plurality of objects, the target body posture in a video frame refers to a target body posture of at least one of those target objects.
For example, suppose there are three target objects. If the first frame of the real-time video stream includes the target body posture of one of them, the target body posture in that video frame is that one target body posture; if the tenth frame includes the target body postures of all three target objects, the target body postures in that video frame are those three target body postures.
As an implementation manner, if each of the consecutive video frames with the preset number of frames includes at most one target body posture, when at least two video frames in the consecutive video frames with the preset number of frames include the target body posture, the control strategy matched with the at least two target body postures is triggered to be acquired. And when at least two video frames do not exist in the continuous video frames with the preset frame number and comprise the target body postures, not triggering to acquire the control strategy matched with the at least two target body postures.
For example, if the preset frame number is 100, the 1 st video frame in the 100 consecutive video frames includes the target body posture, and the 78 th frame includes the target body posture, the acquisition of the control strategy matching with at least two target body postures is triggered. For another example, if the preset frame number is 100, the 1 st video frame in the consecutive video frames includes the target body posture, and the 120 th frame includes the target body posture, it is determined that the consecutive 100 frames does not include at least two target body postures, and the acquisition of the control strategy matched with the at least two target body postures is not triggered.
In another embodiment, if one video frame in the consecutive video frames with the preset number of frames includes at least two target body poses, the control strategy matching with the at least two target body poses is triggered to be acquired. For example, if the preset frame number is 100 and the 56 th video frame in the 100 consecutive video frames includes at least two target poses, the acquisition of the control strategy matching with the at least two target poses is triggered.
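The windowed trigger condition described in the two implementations above could be sketched as follows; the helper names and the window length of 100 frames are assumptions.

```python
PRESET_FRAME_COUNT = 100  # preset number of consecutive video frames

def should_trigger(frame_postures, is_target_posture):
    """frame_postures: per-frame lists of recognized body postures within
    the consecutive window.  Return True once at least two target body
    postures have been recognized, whether in one frame or across frames."""
    count = 0
    for postures in frame_postures[:PRESET_FRAME_COUNT]:
        count += sum(1 for p in postures if is_target_posture(p))
        if count >= 2:
            return True
    return False
```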
The control strategy matched with the at least two target body postures may refer to a preset control strategy corresponding to a preset body posture pair matched with the at least two target body postures. The target equipment refers to equipment pointed by a preset control strategy matched with at least two target body postures, and the target control instruction refers to an instruction required to be executed by the target equipment in the control strategy matched with the at least two target body postures. The control policy may include a device identification (e.g., a device name or a device number, etc.) of the controlled device and the specific content of the control instruction.
After the electronic device obtains the control policy, a target control instruction in the control policy can be sent to the target device according to the device identifier in the control policy, and the target device executes the target control instruction after receiving the target control instruction, so that the target device is controlled.
As an implementation, matching the preset body posture pair with the at least two target body postures may mean matching the preset combination result corresponding to the preset body posture pair with the combination result of the at least two target body postures. The combination result of the at least two target body postures may be obtained by combining them into one image according to their position information in the video frame, or it may be obtained by arranging them according to the time sequence of the video frames to which they belong.
For example, suppose the preset body posture pair includes 3 body postures A1, B1 and C1, and in the preset combination result corresponding to this pair A1 is above and to the left of B1, and B1 is above and to the left of C1. Suppose 3 target body postures A2, B2 and C2 are recognized and combined into one image according to their position information in the video frames where they appear. When it is determined that A1 is the same as A2, B1 is the same as B2, and C1 is the same as C2, and that in the combined image A2 is above and to the left of B2 and B2 is above and to the left of C2, the preset body posture pair is determined to match the 3 target body postures.
For another example, suppose the preset body posture pair includes 3 body postures A3, B3 and C3, and in the preset combination result A3 comes before B3 and B3 comes before C3. Suppose 3 target body postures A4, B4 and C4 are recognized and arranged according to the time sequence of the video frames where they appear. When it is determined that A3 is the same as A4, B3 is the same as B4, and C3 is the same as C4, and that A4 comes before B4 and B4 comes before C4, the preset body posture pair is determined to match the 3 target body postures.
In some embodiments, after at least two target body poses are identified from consecutive video frames with a preset number of frames, according to the position relationship of the at least two target body poses in respective video frames, the at least two target body poses are positioned in a restored image (the restored image is the combined result of the at least two target body poses) with the same resolution as the video frames, then pose regions corresponding to the at least two target body poses are framed from the restored image, the pose regions are compressed into a comparison image with a resolution of 112 × 112, and then the comparison image is compared with a plurality of preset body poses corresponding to preset combined results.
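As a rough sketch of building and matching a combination result, the following orders recognized target body postures by frame time and then by left-to-right position and looks the result up among stored preset pairs; the data layout is an assumption, and the image-level 112 × 112 comparison described above is not reproduced here.

```python
def combine_postures(detections):
    """detections: list of (frame_index, x, y, label) for recognized
    target body postures.  Order them by the time sequence of their video
    frames and then left to right within a frame."""
    ordered = sorted(detections, key=lambda d: (d[0], d[1]))
    return tuple(d[3] for d in ordered)

def find_matching_policy(detections, preset_pairs):
    """preset_pairs: mapping from a tuple of preset body postures (in their
    preset combined order) to the corresponding preset control strategy."""
    return preset_pairs.get(combine_postures(detections))  # None if no match
```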
Optionally, as an implementation, S120 may include: and if at least two first target body postures are identified from any one of the continuous video frames with the preset frame number, triggering and acquiring a control strategy matched with the at least two first target body postures, so as to send a target control instruction to corresponding target equipment based on the control strategy.
The at least two first target body poses may refer to at least two target body poses in the same video frame, and the at least two first target body poses may be the same or different. For example, the at least two first target body gestures may be two gestures respectively made by two hands of the same target object, and the at least two first target body gestures may also be four gestures respectively made by two hands of two target objects.
When a certain video frame is determined to comprise at least two first target body postures, acquiring a target body posture pair matched with the at least two first target body postures from a plurality of preset body posture pairs, and taking a preset control strategy corresponding to the target body posture pair as a control strategy matched with the at least two first target body postures.
The matching of the preset body posture pair and the at least two first target body postures may refer to matching of a preset combination result corresponding to the preset body posture pair and video frames corresponding to the at least two first target body postures, where the matching of the preset combination result and the video frames may refer to that the preset combination result and the video frames include the same body posture and the position relationship between the body postures is the same.
For example, the preset body posture pair includes 2 body postures A3 and B3, A3 in the preset combination result corresponding to the preset body posture pair is on the left side of B3, the first recognized target body postures include 2, which are A4 and B4 respectively, A4 is on the left side of B4 in the video frame corresponding to the first target body posture, and when it is determined that A3 is the same as A4 and B3 is the same as B4, it is determined that the body posture pair matches with 2 first target body postures.
As shown in fig. 3, there are 4 video frames a, b, c and d in fig. 3; the gesture in each video frame is a target body posture, and each video frame comprises at least two target body postures. In a of fig. 3, a five-finger gesture is on the left side of a two-finger gesture; in b of fig. 3, a left-hand ok gesture is on the left side of a right-hand ok gesture; in c of fig. 3, a right-hand two-finger gesture is on the far left, a left-hand two-finger gesture is in the middle, and a right-hand ok gesture is on the far right; in d of fig. 3, right-hand four-finger gestures are on the far left and far right, and a left-hand four-finger gesture is in the middle. The at least two gestures in each video frame correspond to one control strategy; for example, the control strategy corresponding to a in fig. 3 is to play the next song, and the control strategy corresponding to b in fig. 3 is to turn on the air conditioner.
Optionally, S120 may further include: if at least two first target body postures are recognized from the first video frames in the continuous video frames with the preset frame number, and at least one second target body posture is recognized from the second video frames in the continuous video frames with the preset frame number, a control strategy matched with the at least two first target body postures and the at least one second target body posture after combination is triggered to be obtained, so that a target control instruction is sent to corresponding target equipment based on the control strategy.
Wherein a first acquisition time of acquiring the first video frame is earlier than a second acquisition time of acquiring the second video frame.
The first target body posture may refer to a target body posture of a target object in the first video frame, and the second target body posture may refer to a target body posture of a target object in the second video frame. The first video frame may refer to the first video frame, among the preset number of consecutive video frames, that includes at least two target body postures, and the second video frame may refer to the first video frame after the first video frame that includes at least one target body posture.
The at least two first target body postures and the at least one second target body posture can be combined to obtain a first body posture pair to be recognized, a target body posture pair matched with the first body posture pair to be recognized can be determined in a plurality of stored preset body posture pairs, and then a preset control strategy corresponding to the target body posture pair is obtained to serve as a control strategy matched with the at least two first target body postures and the at least one second target body posture after combination.
The matching of the preset body posture pair with the first body posture pair to be recognized may be that the preset body posture pair only includes a first preset body posture matched with at least two first target body postures and a second preset body posture matched with at least one second target body posture, and the first preset body posture is arranged in front of the second preset body posture. The matching of the first preset body posture and the at least two first target body postures may mean that the body postures included in the at least two first target body postures are the same as the body postures included in the first preset body posture, and the positional relationship of each body posture in the at least two first target body postures is the same as the positional relationship of each body posture in the first preset body posture. The second preset body posture and the at least one second target body posture matching may mean that the body posture included in the at least one second target body posture is the same as the body posture included in the second preset body posture, and the positional relationship of each body posture in the at least one second target body posture is the same as the positional relationship of each body posture in the second preset body posture.
As shown in fig. 4, a in fig. 4 is two first target body gestures, wherein the two first target body gestures are a left-hand five-finger gesture and a right-hand two-finger gesture respectively, and the left-hand five-finger gesture is to the left of the right-hand two-finger gesture, b in fig. 4 is a second target body gesture, the second target body gesture is a right-hand ok gesture, a and b in fig. 4 are combined as a combined result, and a and b in fig. 4 are in a first order, and the combined result corresponds to a control policy, for example, turning on a television.
Optionally, S120 may further include: and if at least one third target body posture is recognized from a third video frame in the continuous video frames with the preset frame number, and at least one fourth target body posture is recognized from a fourth video frame in the continuous video frames with the preset frame number, triggering and acquiring a control strategy matched with the at least one third target body posture and the at least one fourth target body posture after combination.
The third target body posture may refer to a target body posture of a target object in the third video frame, and the fourth target body posture may refer to a target body posture of a target object in the fourth video frame. The third video frame may refer to the first video frame, among the preset number of consecutive video frames, that includes at least one target body posture, and the fourth video frame may refer to the first video frame after the third video frame that includes at least one target body posture.
The at least one third target body posture and the at least one fourth target body posture can be combined to obtain a second body posture pair to be recognized, a target body posture pair matched with the second body posture pair to be recognized can be determined in a plurality of stored preset body posture pairs, and then a preset control strategy corresponding to the target body posture pair is obtained to serve as a control strategy matched with the combined at least one third target body posture and the combined at least one fourth target body posture.
The matching of the preset body posture pair with the second to-be-recognized body posture pair may be that the preset body posture pair only includes a third preset body posture matched with at least one third target body posture and a fourth preset body posture matched with at least one fourth target body posture, and the third preset body posture is arranged in front of the fourth preset body posture. The matching of the third preset body posture and the at least one third target body posture may mean that the body posture included in the at least one third target body posture is the same as the body posture included in the third preset body posture, and the positional relationship of each body posture in the at least one third target body posture is the same as the positional relationship of each body posture in the third preset body posture. The matching of the fourth preset body posture and the at least one fourth target body posture may mean that the body posture included in the at least one fourth target body posture is the same as the body posture included in the fourth preset body posture, and the positional relationship of each body posture in the at least one fourth target body posture is the same as the positional relationship of each body posture in the fourth preset body posture.
As shown in fig. 5, a in fig. 5 is a third target body posture, the third target body posture is a right-hand ok gesture, b in fig. 5 is two fourth target body postures, the two fourth target body postures are respectively a left-hand ok gesture and a right-hand ok gesture, and the left-hand ok gesture is on the left side of the right-hand ok gesture; a and b in fig. 5 are combined as one combined result, and the order of a and b in fig. 5 is a first, which corresponds to one control strategy, e.g. turning on a bedroom light.
Optionally, S120 may further include: if the specified body posture is recognized in a fifth video frame in the real-time video stream, and at least one target body posture is recognized in a continuous video with a preset frame number collected after the fifth video frame, triggering and obtaining a control strategy matched with the at least two target body postures.
The fifth video frame may refer to a video frame in which a specified physical pose of the target object exists, and the specified physical pose may be any physical pose. The specified body gesture can be a body gesture set by a user for triggering, the specified body gestures can be one or more, and when any specified body gesture exists in the video frames, the video frame is determined to be a fifth video frame.
And after the appointed body posture is recognized in the fifth video frame, taking the appointed body posture as a target body posture, and if at least one target body posture is recognized in continuous videos with preset frame numbers collected after the fifth video frame, taking the appointed body posture and the at least one target body posture after the appointed body posture as at least two target body postures. And if at least one target body posture is not identified in the continuous videos with the preset frame number collected after the fifth video frame, re-identifying the fifth video frame comprising the specified body posture.
The combination result matching of the preset body posture pair with the at least two target body postures may be that the preset body posture pair includes only a fifth preset body posture matching with the designated body posture and a sixth preset body posture matching with the at least one target body posture, and the fifth preset body posture is arranged in front of the sixth preset body posture. The matching of the fifth preset body posture and the specified body posture may mean that the body postures included in the specified body posture are the same as the body postures included in the fifth preset body posture, and the position relationship of each body posture in the specified body posture is the same as the position relationship of each body posture in the fifth preset body posture. The matching of the sixth preset body posture and the at least one target body posture may mean that the body posture included in the at least one target body posture is the same as the body posture included in the sixth preset body posture, and the position relationship of each body posture in the at least one target body posture is the same as the position relationship of each body posture in the sixth preset body posture.
As shown in fig. 6, a in fig. 6 is a designated body posture, the designated body posture includes 1 gesture, the designated body posture is a right-hand ok gesture, b in fig. 6 is a target body posture corresponding to a in fig. 6, b in fig. 6 includes three target body postures, which are respectively a right-hand four-finger gesture, a left-hand four-finger gesture, and a right-hand four-finger gesture, where the left-hand four-finger gesture is between two right-hand four-finger gestures; a and b in fig. 6 are combined as one combination result, and the order of a and b in fig. 6 is a-first, the combination result corresponding to one control strategy, for example, turning on the humidifier.
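A minimal sketch of this two-stage trigger (a specified body posture first, then at least one target body posture within the following preset number of frames) is given below; the helper predicates and the window length are assumptions.

```python
def wake_then_collect(frames, is_specified, is_target, window=100):
    """frames: per-frame lists of recognized postures.  Once a specified
    body posture is found, scan the following `window` frames for at least
    one target body posture; if none is found, resume looking for the
    specified body posture."""
    for i, postures in enumerate(frames):
        wake = next((p for p in postures if is_specified(p)), None)
        if wake is None:
            continue
        for later in frames[i + 1:i + 1 + window]:
            targets = [p for p in later if is_target(p)]
            if targets:
                return [wake] + targets  # at least two target body postures
    return None
```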
In this embodiment, at least two target body poses are identified from continuous video frames with preset frames in a video stream, and a control strategy matched with the at least two target body poses is determined, where the control strategy is determined based on the at least two target body poses, so that the occurrence of inaccurate control strategy caused by mistakenly touching the target body poses is reduced, the accuracy of the control strategy is improved, and the accuracy of device control is further improved.
Meanwhile, if at least two first target body postures are recognized from a first video frame in the continuous video frames with the preset frame number, and at least one second target body posture is recognized from a second video frame in the continuous video frames with the preset frame number, a control strategy matched with the at least two first target body postures and the at least one second target body posture after combination is triggered and obtained, and the control strategy is determined through a plurality of target body postures in the first video frame and the second video frame, so that the condition that the determined control strategy is inaccurate due to mistaken touch of the target body posture is further reduced, and the accuracy of the control strategy and the accuracy of equipment control are further improved.
And if the specified body posture can be recognized in a fifth video frame in the real-time video stream, and at least one target body posture is recognized in a continuous video with a preset frame number collected after the fifth video frame, the control strategy matched with the at least two target body postures is triggered and obtained, and the situation that the determined control strategy is inaccurate due to mistaken touch of the target body posture is further reduced by taking the specified video frame as a basis for triggering and obtaining the control strategy, so that the accuracy of the control strategy and the accuracy of equipment control are further improved.
Referring to fig. 7, fig. 7 is a flowchart illustrating a method for controlling a device according to another embodiment of the present application, where the method may be executed by an electronic device, for example, the electronic device may be a server in fig. 1, or a terminal in fig. 1, or a camera device with a camera function, such as the camera in fig. 1. The method specifically comprises the following steps:
S210, identifying the body posture of the target object in the video frame of the real-time video stream.
The description of S210 refers to the description of S110 above, and is not repeated here.
S220, if a plurality of body gestures are identified in one frame of video frame of the real-time video stream, determining at least one candidate body gesture from the plurality of body gestures according to the ratio of each body gesture in the plurality of body gestures in the one frame of video frame.
When a plurality of body gestures are recognized in one frame of video frame of the real-time video stream, the proportion of each body gesture in the video frame is determined, and at least one body gesture with the largest proportion is determined from the plurality of body gestures to serve as a candidate body gesture.
As an embodiment, each of the plurality of body poses may be framed in the frame of image by an anchor frame of a preset shape according to a position of each of the plurality of body poses in the frame of video and a preset feature value corresponding to each of the plurality of body poses; and determining the body gesture framed by at least one anchor frame with the largest area in the areas of the anchor frames corresponding to the body gestures as a candidate body gesture.
The user can set a preset characteristic value for each body posture based on the requirement, the preset characteristic value of the more important body posture can be larger, the preset characteristic value of the less important body posture can be smaller, and the size of the preset characteristic value determines the area size of the anchoring frame. And aiming at the body postures with the same size, the larger the preset characteristic value is, the larger the obtained anchoring frame is.
The size of the anchoring frame can be determined according to the preset characteristic value of the body posture, and the position of the anchoring frame is determined according to the position of the body posture. And then calculating the area in each anchor frame, and determining the body posture framed by at least one anchor frame with the largest area as a candidate body posture.
The anchoring frame of each body posture is determined according to the preset characteristic value of each body posture, the candidate body posture is determined according to the area size of the anchoring frame, unimportant body postures with small anchoring frame areas in the video frame are filtered, interference of unimportant body postures with small anchoring frame areas in the video frame is reduced, and the body posture recognition accuracy is improved. Meanwhile, all body postures do not need to be identified, and the identification efficiency of the body postures is improved.
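A sketch of this anchor-frame-based selection is given below, under the assumption that the anchor area grows with both the size of the posture region and the preset feature value; the dictionary fields are illustrative only.

```python
def candidates_by_anchor_area(postures, top_k=1):
    """postures: list of dicts with 'label', 'width', 'height' (size of the
    posture region in the frame) and 'feature_value' (the preset per-posture
    weight).  The body postures framed by the largest anchors become the
    candidate body postures."""
    def anchor_area(p):
        return p["width"] * p["height"] * p["feature_value"]
    return sorted(postures, key=anchor_area, reverse=True)[:top_k]
```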
As another embodiment, an image area occupied by each of the plurality of body poses in the frame of video frame may be determined according to all pixel points included in each of the plurality of body poses; and determining at least one body posture which occupies the largest image area in the area of the image area occupied by each body posture as a candidate body posture.
The region formed by all the pixel points contained in a body posture in the video frame is taken as the image area of that body posture, while the region formed by pixel points not contained in any body posture is taken as a non-image area. The area of each image area is calculated, and the at least one body posture occupying the largest image area is determined as a candidate body posture.
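The pixel-area variant could be sketched as follows, assuming a per-posture pixel mask is available (for example from a segmentation model); the names are placeholders.

```python
import numpy as np  # assumed dependency

def candidates_by_pixel_area(posture_masks, top_k=1):
    """posture_masks: list of (label, mask) pairs, where mask is a boolean
    array marking the pixels belonging to that body posture in the frame.
    The posture(s) occupying the largest image area become candidates."""
    ranked = sorted(posture_masks,
                    key=lambda item: int(np.count_nonzero(item[1])),
                    reverse=True)
    return [label for label, _ in ranked[:top_k]]
```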
And S230, judging whether the at least one candidate body posture has a specified body posture or at least one target body posture.
And for each video frame, judging whether a specified body posture or at least one target body posture exists in at least one candidate body posture determined in the video frame, if so, determining that the specified body posture or at least one target body posture is recognized, and executing S240. If not, determining that the specified body posture and at least one target body posture are not recognized, and continuing to perform the steps of S210-S230 on the next video frame.
S240, if at least two target body postures are identified from continuous video frames with a preset frame number, triggering and acquiring a control strategy matched with the at least two target body postures so as to send a target control instruction to corresponding target equipment based on the control strategy.
The description of S240 refers to the description of S120 above, and is not repeated here.
In this embodiment, when the video frame includes a plurality of body gestures, at least one candidate body gesture is determined, and unimportant body gestures with a smaller proportion in the video frame are filtered out, so that interference of unimportant body gestures with a smaller proportion in the video frame is reduced, and the body gesture recognition accuracy is improved. Meanwhile, all body postures do not need to be identified, and the identification efficiency of the body postures is improved.
Referring to fig. 8, fig. 8 is a flowchart illustrating a method for controlling a device according to still another embodiment of the present application, where the method is executed by an electronic device, for example, the electronic device may be the server in fig. 1, or the terminal in fig. 1, or a camera device with a camera function, such as the camera in fig. 1. The method specifically comprises the following steps:
S310, at least two preset body postures of at least one preset object and at least one preset control strategy are obtained, wherein the at least two preset body postures comprise at least two preset standard body postures.
The preset object may be an object that enrolls the preset body postures; it may be the same as the target object or different from it. A preset body posture may be a preset standard body posture enrolled by the preset object, and at least two preset standard body postures may be used as a preset body posture pair. The description of the preset body posture pair refers to the above description and is not repeated.
S320, establishing and storing a mapping relationship between the acquired at least two preset standard body postures of the at least one preset object and the at least one preset control strategy.
The mapping relationship between the at least two preset standard body postures and the at least one preset control strategy is established so that the corresponding preset control strategy can be indexed through the at least two preset standard body postures and the mapping relationship.
As an implementation manner, at least two preset standard body postures may be combined into a preset body posture pair, and then a mapping relationship between at least one preset body posture pair and the at least one preset control strategy is established and stored. The corresponding preset control strategy can thus be indexed through the mapping relationship and the preset body posture pair.
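A minimal sketch of such a mapping is shown below; it is not part of the original disclosure, and the posture names, device identifiers, and strategy contents are hypothetical. The only assumption carried over from the text is that an unordered pair of preset standard postures indexes one preset control strategy.

```python
# Hypothetical mapping from preset body posture pairs to preset control strategies.
posture_strategy_map = {
    frozenset({"ok_sign", "palm_open"}): {
        "device": "fridge", "command": "set_cold_room_temp", "value": 0},
    frozenset({"fist", "thumbs_up"}): {
        "device": "light", "command": "turn_on"},
}

def lookup_strategy(posture_a: str, posture_b: str):
    """Index the preset control strategy through the stored mapping.
    frozenset keys make the lookup order-insensitive, matching the idea of a
    posture 'pair' rather than a sequence."""
    return posture_strategy_map.get(frozenset({posture_a, posture_b}))

print(lookup_strategy("palm_open", "ok_sign"))  # same strategy in either order
```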
S330, identifying the body posture of the target object in the video frame of the real-time video stream.
The description of S330 refers to the description of S110 above, and is not repeated here.
S340, if at least two body postures recognized from the continuous video frames with the preset frame number are preset standard body postures, determining that the at least two body postures are target body postures, triggering acquisition of a control strategy matched with the at least two target body postures according to the mapping relationship, and sending a target control instruction to the corresponding target device based on the control strategy.
If the at least two body postures recognized from the continuous video frames with the preset frame number are preset standard body postures, the at least two body postures are determined to be target body postures.
For the step of triggering acquisition of the control strategy matched with the at least two target body postures according to the mapping relationship, so as to send the target control instruction to the corresponding target device based on the control strategy, refer to the description of S120 above; it is not repeated here.
In this embodiment, the mapping relationship between the at least two preset standard body postures of the at least one preset object and the at least one preset control strategy is established and stored, so that the corresponding control strategy can be indexed directly through the mapping relationship, which speeds up acquisition of the control instruction and improves the control efficiency of the device.
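Purely as an illustrative sketch of S340 (not the patented implementation), the fragment below assumes that `recognized_per_frame` holds the posture labels recognized in each of the last preset number of consecutive frames, that the set of preset standard postures is known, and that `lookup_strategy` comes from the mapping sketch above; all names are hypothetical.

```python
# Hypothetical sketch of S340: treat recognized preset standard postures as
# target postures and index the matched control strategy via the mapping.
PRESET_STANDARD_POSTURES = {"ok_sign", "palm_open", "fist", "thumbs_up"}

def match_target_postures(recognized_per_frame):
    """recognized_per_frame: list of iterables, one per consecutive frame.
    Returns the matched control strategy, or None if fewer than two target
    postures were recognized within the window."""
    seen = {p for frame in recognized_per_frame for p in frame
            if p in PRESET_STANDARD_POSTURES}
    if len(seen) >= 2:
        a, b = sorted(seen)[:2]           # take one pair of target postures
        return lookup_strategy(a, b)      # index the strategy via the mapping
    return None
```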
For easier understanding of the solution of the present application, the device control method of the present application is explained below with reference to a specific application scenario in which the body posture is a hand gesture, the resolution of the camera is 1080P (1920 × 1080), the camera itself is the execution body of the device control method, and the smart device is a refrigerator.
As shown in fig. 9, the gesture recognition algorithm of the camera is first started (that is, the device control method of the present application is started; in this scenario, the body posture that the device control method needs to recognize is a hand gesture), and the camera captures a video stream of the target object in real time.
The camera extracts one frame in YUV format ("Y" denotes the luminance, i.e. the grayscale value, while "U" and "V" denote the chrominance) from the video stream and stores the 1080P video frame in a buffer.
The camera compresses the video frame in the buffer into a to-be-processed video frame with a resolution of 512 × 512, matches the to-be-processed video frame against the algorithm model (that is, the preset standard body postures pre-stored in the above embodiment, which are preset standard gestures in this scenario), and determines whether there is a preset standard gesture that matches the to-be-processed video frame.
If not, a new current frame is extracted from the video stream and the matching step is executed again.
If the matching succeeds, the gestures in the to-be-processed video frame are taken as target gestures. If at least two target gestures exist, the at least two target gestures are inverse-mapped to the coordinates of the original 1080P image in YUV format to obtain a restored image, the regions of the restored image where the gestures are located are compressed into gesture images with a resolution of 112 × 112, and the gesture images are then compared with a plurality of pre-stored gesture combinations (a gesture combination is the preset body posture pair in the above embodiment).
It is judged whether a gesture combination matching the gesture images exists among the pre-stored gesture combinations. If so, the control strategy corresponding to the matching gesture combination is acquired (the control strategy corresponding to the gesture combination is the preset control strategy corresponding to the preset body posture pair in the above embodiment); if not, a new current frame is extracted from the video stream and the above steps are executed again.
The acquired control strategy is to set the temperature of the refrigerating chamber of the refrigerator to 0 °C. The camera generates a control instruction to set the refrigerating-chamber temperature to 0 °C according to the control strategy and sends the control instruction to the refrigerator, and the refrigerator executes it.
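The following sketch illustrates only the resolution arithmetic of this camera-side pipeline (1080P frame, 512 × 512 detection input, boxes mapped back to the original image, 112 × 112 gesture patches); the detector, the gesture-combination matcher, and the box format are placeholders assumed for the example and are not defined by the original text.

```python
# Hypothetical sketch of the camera-side processing loop in this scenario.
import cv2  # assumed available on the camera's processing unit

SRC_W, SRC_H = 1920, 1080   # 1080P source frame
DET_SIZE = 512              # detection resolution
PATCH = 112                 # gesture patch resolution

def process_frame(frame_bgr, detect_gestures, match_combination):
    """frame_bgr: 1080P frame already converted from YUV.
    detect_gestures: placeholder model returning boxes (x1, y1, x2, y2) in
    512x512 coordinates. match_combination: placeholder comparing 112x112
    gesture patches with the pre-stored gesture combinations."""
    small = cv2.resize(frame_bgr, (DET_SIZE, DET_SIZE))
    boxes = detect_gestures(small)
    if len(boxes) < 2:
        return None  # fewer than two target gestures: wait for the next frame
    sx, sy = SRC_W / DET_SIZE, SRC_H / DET_SIZE
    patches = []
    for x1, y1, x2, y2 in boxes:
        # map the detection box back onto the original 1080P image
        X1, Y1, X2, Y2 = int(x1 * sx), int(y1 * sy), int(x2 * sx), int(y2 * sy)
        crop = frame_bgr[Y1:Y2, X1:X2]
        patches.append(cv2.resize(crop, (PATCH, PATCH)))
    return match_combination(patches)  # a control strategy, or None
```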
Referring to fig. 10, fig. 10 is a block diagram of an apparatus control device according to an embodiment of the present application, where the apparatus 1100 includes:
the recognition module 1110 is configured to recognize a body posture of a target object in a video frame of a real-time video stream;
the control module 1120 is configured to, if at least two target body gestures are identified from consecutive video frames with a preset number of frames, trigger to acquire a control policy matched with the at least two target body gestures, and send a target control instruction to a corresponding target device based on the control policy.
Optionally, the control module 1120 is further configured to trigger obtaining of a control policy matched with the at least two first target body poses if at least two first target body poses are identified from any one of the consecutive video frames of the preset number of frames.
Optionally, the control module 1120 is further configured to trigger obtaining of a control policy matched after combination of the at least two first target body poses and the at least one second target body pose if at least two first target body poses are identified from first video frames in the consecutive video frames with the preset number of frames and at least one second target body pose is identified from second video frames in the consecutive video frames with the preset number of frames.
Optionally, the control module 1120 is further configured to trigger obtaining of a control policy matched after the combination of the at least one third target body posture and the at least one fourth target body posture if at least one third target body posture is recognized from a third video frame in the consecutive video frames with the preset frame number and at least one fourth target body posture is recognized from a fourth video frame in the consecutive video frames with the preset frame number.
Optionally, the control module 1120 is further configured to trigger obtaining of a control policy matched with the at least two target body poses if a specified body pose is identified in a fifth video frame in the real-time video stream and at least one target body pose is identified in a preset number of frames of continuous video acquired after the fifth video frame.
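To illustrate the frame-window behaviour these control-module variants share, the sketch below collects target postures over a preset number of consecutive frames and only reports a result once at least two have been seen. This is an assumption-laden sketch, not the module's actual implementation; the frame count and class name are invented for the example.

```python
# Hypothetical sketch of a sliding window over consecutive video frames.
from collections import deque

PRESET_FRAME_COUNT = 30  # e.g. roughly one second at 30 fps (illustrative)

class PostureWindow:
    def __init__(self, preset_frames: int = PRESET_FRAME_COUNT):
        self.frames = deque(maxlen=preset_frames)  # drops frames older than the window

    def push(self, postures_in_frame):
        """Add the target postures recognized in one frame (a set of labels).
        Returns the set of target postures seen within the window if there are
        at least two, otherwise None (no control strategy is triggered yet)."""
        self.frames.append(set(postures_in_frame))
        seen = set().union(*self.frames)
        return seen if len(seen) >= 2 else None
```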
Optionally, the apparatus further includes a candidate body pose determining module, configured to determine, if multiple body poses are identified in a frame of video frame of the real-time video stream, at least one candidate body pose from the multiple body poses according to a ratio of each body pose in the multiple body poses in the frame of video frame; and judging whether the at least one candidate body posture has a specified body posture or at least one target body posture.
Optionally, the candidate body posture determining module is further configured to determine, according to a ratio of each of the plurality of body postures in the frame of video frame, at least one body posture with a largest ratio from the plurality of body postures as the candidate body posture, if the plurality of body postures are identified in the frame of video frame of the real-time video stream.
Optionally, the candidate body pose determining module is further configured to frame each body pose in the frame of video frame with an anchor frame of a preset shape according to the position of each of the plurality of body poses in the frame of video frame and the preset feature value corresponding to each of the plurality of body poses; and determine the body pose framed by the at least one anchor frame with the largest area among the anchor frames corresponding to the body poses as a candidate body pose.
Optionally, the candidate body posture determining module is further configured to determine, according to all pixel points included in each of the plurality of body postures, an image area occupied by each of the plurality of body postures in the frame of video frame; and determining at least one body posture with the largest occupied image area in the area of the image area occupied by each body posture as a candidate body posture.
Optionally, the apparatus further comprises: the system comprises an acquisition module, a processing module and a control module, wherein the acquisition module is used for acquiring at least two preset body postures and at least one preset control strategy of at least one preset object, and the at least two preset body postures comprise at least two preset standard body postures; establishing and storing a mapping relation between at least two preset standard body postures of the acquired at least one preset object and the at least one preset control strategy; and the control module is also used for determining that the at least two body postures are target body postures if at least two body postures are recognized from the continuous video frames with the preset frame number and are preset standard body postures, and triggering to acquire a control strategy matched with the at least two target body postures according to the mapping relation.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In several embodiments provided in the present application, the coupling of the modules to each other may be electrical, mechanical or other forms of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
Fig. 11 shows a block diagram of an electronic device for executing the device control method according to an embodiment of the present application. It should be noted that the computer system 1200 of the electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 11, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes, such as executing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for system operation are also stored. The CPU1201, ROM1202, and RAM 1203 are connected to each other by a bus 1204. An Input/Output (I/O) interface 1205 is also connected to bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a Display device such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a Network interface card such as a LAN (Local Area Network) card, a modem, and the like. The communication section 1209 performs communication processing via a network such as the internet. A driver 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is mounted into the storage section 1208 as necessary.
In particular, according to embodiments of the present application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program performs various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 1201.
It should be noted that the computer readable media shown in the embodiments of the present application may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware, and the described units may also be disposed in a processor. Wherein the names of the elements do not in some way constitute a limitation on the elements themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries computer-readable instructions that, when executed by a processor, implement the method of any of the embodiments described above.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and executes the computer instructions, so that the computer device executes the method in any of the above embodiments.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not necessarily depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (14)

1. An apparatus control method, characterized in that the method comprises:
identifying the body posture of a target object in a video frame of the real-time video stream;
if at least two target body postures are identified from continuous video frames with a preset frame number, a control strategy matched with the at least two target body postures is triggered to be obtained, so that a target control instruction is sent to corresponding target equipment based on the control strategy.
2. The method of claim 1, wherein if at least two target poses are identified from a preset number of consecutive video frames, triggering obtaining a control strategy matching the at least two target poses, comprises:
and if at least two first target body postures are identified from any one of the continuous video frames with the preset frame number, triggering and acquiring a control strategy matched with the at least two first target body postures.
3. The method of claim 2, wherein the triggering of obtaining the control strategy matching with the at least two target poses if at least two target poses are identified from a preset number of consecutive video frames comprises:
and if at least two first target body postures are identified from the first video frames in the continuous video frames with the preset frame number, and at least one second target body posture is identified from the second video frames in the continuous video frames with the preset frame number, triggering and acquiring a control strategy matched with the combination of the at least two first target body postures and the at least one second target body posture.
4. The method of claim 1, wherein the triggering of obtaining the control strategy matching with the at least two target poses if at least two target poses are identified from a preset number of consecutive video frames comprises:
and if at least one third target body posture is identified from the third video frames in the continuous video frames with the preset frame number, and at least one fourth target body posture is identified from the fourth video frames in the continuous video frames with the preset frame number, triggering and acquiring a control strategy matched with the combined at least one third target body posture and the at least one fourth target body posture.
5. The method of claim 1, wherein if at least two target poses are identified from a preset number of consecutive video frames, triggering obtaining a control strategy matching the at least two target poses, comprises:
if the specified body posture is recognized in a fifth video frame in the real-time video stream, and at least one target body posture is recognized in a continuous video with a preset frame number collected after the fifth video frame, triggering and obtaining a control strategy matched with the at least two target body postures.
6. The method according to any one of claims 1 to 5, wherein after the step of identifying the physical pose of the target object in the video frames of the real-time video stream, the method further comprises:
if a plurality of body gestures are identified in one frame of video frame of the real-time video stream, determining at least one candidate body gesture from the plurality of body gestures according to the ratio of each body gesture in the plurality of body gestures in the one frame of video frame;
and judging whether the at least one candidate body posture has a specified body posture or at least one target body posture.
7. The method of claim 6, wherein if a plurality of body poses are identified in a frame of video of the real-time video stream, determining at least one candidate body pose from the plurality of body poses according to a ratio of each body pose in the frame of video, comprises:
if a plurality of body gestures are identified in one frame of video frame of the real-time video stream, determining at least one body gesture with the largest ratio from the plurality of body gestures as a candidate body gesture according to the ratio of each body gesture in the plurality of body gestures in the one frame of video frame.
8. The method of claim 7, wherein determining, from the plurality of body poses, at least one body pose with the largest ratio as the candidate body pose according to the ratio of each body pose in the plurality of body poses in the frame of video frame comprises:
framing each body posture in the frame of video frame with an anchor frame of a preset shape according to the position of each body posture in the plurality of body postures in the frame of video frame and the preset characteristic value corresponding to each body posture in the plurality of body postures;
and determining the body gesture framed by at least one anchor frame with the largest area in the areas of the anchor frames corresponding to the body gestures as a candidate body gesture.
9. The method according to claim 7, wherein determining, from the plurality of body poses, at least one body pose with the largest ratio as the candidate body pose according to the ratio of each body pose in the plurality of body poses in the frame of video frame comprises:
determining an image area occupied by each body posture in the plurality of body postures in the frame of video frame according to all pixel points contained in each body posture in the plurality of body postures;
and determining at least one body posture which occupies the largest image area in the area of the image area occupied by each body posture as a candidate body posture.
10. The method of claim 6, wherein prior to the step of identifying the physical pose of the target object in the video frames of the real-time video stream, the method further comprises:
acquiring at least two preset body postures and at least one preset control strategy of at least one preset object, wherein the at least two preset body postures comprise at least two preset standard body postures;
establishing and storing a mapping relation between at least two preset standard body postures of the obtained at least one preset object and the at least one preset control strategy;
if at least two target body postures are identified from the continuous video frames with the preset frame number, triggering and acquiring a control strategy matched with the at least two target body postures, wherein the control strategy comprises the following steps:
and if at least two body postures are recognized as preset standard body postures from the continuous video frames with preset frame numbers, determining that the at least two body postures are target body postures, and triggering to acquire a control strategy matched with the at least two target body postures according to the mapping relation.
11. The method according to any one of claims 1 to 5 or 7 to 10, wherein the body posture of the target object comprises a whole body posture of a human body and/or a part posture of the human body, wherein the part posture of the human body comprises a head posture of the human body and/or a limb posture of the human body, and wherein the limb posture of the human body comprises a hand gesture of the human body.
12. An apparatus control device, characterized in that the device comprises:
the identification module is used for identifying the body posture of a target object in a video frame of the real-time video stream;
the control module is used for triggering and acquiring a control strategy matched with at least two target body postures if at least two target body postures are identified from continuous video frames with preset frame numbers, so as to send a target control instruction to corresponding target equipment based on the control strategy.
13. An electronic device, comprising:
a memory for storing one or more computer programs;
one or more processors configured to retrieve from the memory and execute the one or more computer programs to perform the method of any of claims 1 to 11.
14. A computer-readable storage medium, characterized in that a program code is stored in the computer-readable storage medium, which program code can be called by a processor to execute the method according to any one of claims 1 to 11.
CN202210833792.5A 2022-07-14 2022-07-14 Device control method, device, electronic device, and storage medium Pending CN115268285A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210833792.5A CN115268285A (en) 2022-07-14 2022-07-14 Device control method, device, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210833792.5A CN115268285A (en) 2022-07-14 2022-07-14 Device control method, device, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN115268285A true CN115268285A (en) 2022-11-01

Family

ID=83765147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210833792.5A Pending CN115268285A (en) 2022-07-14 2022-07-14 Device control method, device, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN115268285A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116311519A (en) * 2023-03-17 2023-06-23 北京百度网讯科技有限公司 Action recognition method, model training method and device
CN116311519B (en) * 2023-03-17 2024-04-19 北京百度网讯科技有限公司 Action recognition method, model training method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination