CN113033252A - Attitude detection method, attitude detection device and computer-readable storage medium


Info

Publication number
CN113033252A
CN113033252A (application number CN201911344827.3A)
Authority
CN
China
Prior art keywords: target object, gesture, frames, posture, result
Legal status: Granted
Application number
CN201911344827.3A
Other languages: Chinese (zh)
Other versions: CN113033252B
Inventor
赵薇
廖可
宫卫涛
伊红
王炜
Current Assignee
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Application filed by Ricoh Co Ltd
Priority to CN201911344827.3A
Publication of CN113033252A
Application granted
Publication of CN113033252B
Legal status: Active
Anticipated expiration

Classifications

    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the invention provide a posture detection method, a posture detection apparatus and a computer-readable storage medium. The posture detection method according to the embodiment of the invention comprises the following steps: acquiring, from a video image stream, at least two frames of video images arranged in time sequence within a preset time range; performing target object recognition on each frame of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection; predicting, in combination with the temporal order of the at least two frames, the posture of a target object in at least one of the frames to obtain a posture prediction result; recognizing the posture of a target object in at least one of the frames to obtain a posture recognition result; and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.

Description

Attitude detection method, attitude detection device and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and an apparatus for detecting an attitude, and a computer-readable storage medium.
Background
With the development of computer vision and human-computer interaction technology, instructions can be issued to an interactive system by recognizing the posture of a human body or another object. Current gesture detection methods generally obtain the gesture of the target object from an image or video image frame, then recognize the obtained gesture, and output the gesture recognition result.
However, this kind of gesture detection usually consumes a large amount of system computation time, and if the system does not respond until after the gesture has been recognized, a large delay necessarily exists between the time of the response and the time at which the target object actually made the gesture, which can greatly degrade the user experience of the system.
Therefore, a gesture detection method and device capable of detecting gestures effectively in real time are needed, so as to reduce the system response time and improve the user experience while ensuring the accuracy of gesture detection.
Disclosure of Invention
To solve the above technical problem, according to an aspect of the present invention, there is provided a posture detection method including: acquiring, from a video image stream, at least two frames of video images arranged in time sequence within a preset time range; performing target object recognition on each frame of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection; predicting, in combination with the temporal order of the at least two frames of video images, the posture of a target object in at least one frame of the at least two frames to obtain a posture prediction result; recognizing the gesture of a target object in at least one frame of the at least two frames to obtain a gesture recognition result; and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
According to another aspect of the present invention, there is provided a posture detection apparatus including: an acquisition unit configured to acquire, from a video image stream, at least two frames of video images arranged in time sequence within a preset time range; a determining unit configured to perform target object recognition on each frame of the at least two frames of video images and determine therefrom at least one target object requiring gesture detection; a prediction unit configured to predict, in combination with the temporal order of the at least two frames of video images, the pose of a target object in at least one frame of the at least two frames to obtain a pose prediction result; a recognition unit configured to recognize the gesture of a target object in at least one frame of the at least two frames to obtain a gesture recognition result; and a detection unit configured to acquire a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
According to still another aspect of the present invention, there is provided a posture detection apparatus including: a processor; and a memory having computer program instructions stored therein, wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of: acquiring, from a video image stream, at least two frames of video images arranged in time sequence within a preset time range; performing target object recognition on each frame of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection; predicting, in combination with the temporal order of the at least two frames of video images, the posture of a target object in at least one frame of the at least two frames to obtain a posture prediction result; recognizing the gesture of a target object in at least one frame of the at least two frames to obtain a gesture recognition result; and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
According to yet another aspect of the invention, there is provided a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the steps of: acquiring, from a video image stream, at least two frames of video images arranged in time sequence within a preset time range; performing target object recognition on each frame of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection; predicting, in combination with the temporal order of the at least two frames of video images, the posture of a target object in at least one frame of the at least two frames to obtain a posture prediction result; recognizing the gesture of a target object in at least one frame of the at least two frames to obtain a gesture recognition result; and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
According to the above gesture detection method, device and computer-readable storage medium of the present invention, gesture prediction and gesture recognition can each be performed on the target object acquired from the video images, in combination with the temporal order of the video images, and gesture detection can be performed based on the results of both. The gesture detection method, device and computer-readable storage medium can detect gestures effectively in real time, reduce the response time of the system while ensuring the accuracy of gesture detection, and improve the user experience.
Drawings
The above and other objects, features, and advantages of the present invention will become more apparent from the detailed description of the embodiments of the present invention when taken in conjunction with the accompanying drawings.
FIG. 1 shows a flow diagram of a gesture detection method according to one embodiment of the invention;
FIG. 2 illustrates one example of an application scenario of a gesture detection method according to one embodiment of the present invention;
FIG. 3 shows a schematic view of a human skeletal model, according to an embodiment of the present invention;
FIG. 4(a) illustrates the points, within each identified feature point recognition object, used to calculate the motion regularity parameter for a scene of one embodiment of the invention; FIG. 4(b) shows a schematic diagram in which a motion regularity curve is plotted separately for each selected point.
FIG. 5 illustrates a process of deriving gesture trajectories for limb motion and gestures according to one embodiment of the invention;
FIG. 6 shows a block diagram of a gesture detection apparatus according to one embodiment of the invention;
FIG. 7 shows a block diagram of a gesture detection apparatus according to an embodiment of the present invention.
Detailed Description
A posture detection method, apparatus, and computer-readable storage medium according to embodiments of the present invention will be described below with reference to the accompanying drawings. In the drawings, like reference numerals refer to like elements throughout. It should be understood that: the embodiments described herein are merely illustrative and should not be construed as limiting the scope of the invention.
In methods for detecting the pose of an image or a video image frame, a target object is generally obtained first, and the pose of the obtained target object is then detected. However, such a gesture detection method can only detect and match gestures that have already been made, generally suffers from considerable lag, and cannot meet a gesture recognition system's requirement for accurate real-time gesture detection.
A posture detection method according to an embodiment of the present invention will be described below with reference to FIG. 1. The gesture detection method of the embodiment of the invention can be applied to video images acquired from video image streams. Alternatively, the video image used for gesture detection in the embodiment of the present invention may be a two-dimensional planar video image acquired by an ordinary camera, a two-dimensional wide-angle video image acquired by a wide-angle lens, or a panoramic video image acquired by a panoramic camera, without limitation. When the video image is a panoramic video image, optionally, the panoramic video image may first be subjected to projection processing, and the projected video image then processed as described below, to improve the accuracy of gesture detection.
FIG. 1 shows a flow diagram of a gesture detection method 100 of an embodiment of the invention. As shown in fig. 1, in step S101, at least two frames of video images arranged in time sequence within a preset time range are acquired from a video image stream.
According to the embodiment of the invention, two or more frames of video images can be acquired in real time from the video image stream. In one example, the video images obtained from the video image stream may be consecutive frames arranged in time sequence; in another example, they may be non-consecutive frames sampled at certain intervals. When a video image is acquired, its time-related information can be acquired at the same time for the subsequent gesture detection operation. In addition, optionally, a time range for the video images to be acquired may be set, so that gesture detection is performed specifically on the video images within that time range.
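As a minimal sketch of this acquisition step, assuming OpenCV as the capture backend (the patent does not prescribe one); the source, time window and sampling interval below are illustrative:

```python
# Collect time-ordered frames (with timestamps) inside [start_s, end_s].
import cv2

def acquire_frames(source, start_s, end_s, step=5):
    """`step` > 1 yields the non-consecutive, interval-sampled frames the text
    mentions; `step` = 1 yields frame-by-frame sampling."""
    cap = cv2.VideoCapture(source)
    frames = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        t = cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0  # frame timestamp, seconds
        if t > end_s:
            break
        if start_s <= t and index % step == 0:
            frames.append((t, frame))  # keep time information for later steps
        index += 1
    cap.release()
    return frames  # already in temporal order
```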
In step S102, target object recognition is performed on each frame of video images of the at least two frames of video images, and at least one target object requiring gesture detection is determined therefrom.
According to the embodiment of the present invention, optionally, target object recognition may be performed on each frame of the at least two frames of video images directly by using object feature recognition or edge detection. Optionally, feature point identification may be performed on each frame of video image in the at least two frames of video images, so as to obtain one or more feature point identification objects therefrom, and feature points corresponding to the feature point identification objects may be obtained; subsequently, at least one target object that needs to be subjected to posture detection may be determined from the at least one feature point recognition object, and specifically, each of the acquired feature point recognition objects may be evaluated according to the evaluation parameter to determine the at least one target object according to the evaluation result. For example, the evaluation parameter may include at least one of a position parameter, a size parameter, a regularity of motion parameter, and an offset parameter of the feature point identifying object.
According to an example of the present invention, a position parameter in the evaluation parameters may be used to indicate a position where the feature point recognition object is located. For example, the position parameter may be a distance between a position of the feature point recognition object and a preset position in the video image. In one example, the preset position in the video image may be a center position of the video image, and the position parameter may indicate a distance between the feature point recognition object and the center position of the video image. In another example, the preset position in the video image may be a position of a system device for system interaction in the application scene in the video image, and the position parameter may indicate a distance between the feature point identification object and the system device for system interaction. Of course, the above-mentioned method for calculating the position parameter is only an example, and the position parameter is not limited to the parameter representing the distance, and may be another method for representing the position of the feature point recognition object, and is not limited herein.
According to an example of the present invention, a size parameter of the evaluation parameters may be used to indicate a size of the feature point recognition object in the video image. For example, the size parameter may indicate an area occupied by the feature point identification object in the video image, or may indicate a size range of an object identification frame corresponding to the feature point identification object (for example, when the object identification frame of the feature point identification object is a rectangle, the size parameter may be a length and a width of the rectangle, and when the object identification frame of the feature point identification object is a circle or an ellipse, the size parameter may be a related parameter of the circle or the ellipse). In an example of the present invention, for example, if the identified feature point is a bone model of the feature point identification object, the size parameter may be a sum of lengths of joints in the bone model corresponding to the feature point identification object. Of course, the above-mentioned manner for calculating the size parameter is only an example, and other setting of the size parameter and corresponding calculation manners may also be adopted, which is not limited herein.
According to an example of the present invention, the motion regularity parameter in the evaluation parameters may be used to indicate the periodicity and regularity of the motion of the feature point recognition object over a certain time range. For example, the motion state (e.g., motion amplitude, motion period) of a feature point recognition object over a previous period of time may first be calculated, and the degree of fit between its motion in the current period and that previous motion (e.g., whether the motion amplitudes are similar and whether the motion periods coincide) may then be computed; the obtained fitting result is used as the motion regularity parameter in the evaluation parameters. The above manner of calculating the motion regularity parameter is only an example; other definitions of the motion regularity parameter and corresponding calculation manners may also be adopted, which is not limited herein.
According to an example of the present invention, the offset parameter in the evaluation parameters may be used, once one or more target objects requiring gesture detection have already been determined, to measure the offset between a feature point detection object in the video image and a corresponding, already-determined target object. For example, after a first target object has been acquired, the distance between one or more feature point detection objects in a video image and the first target object may be calculated as the offset parameter between each feature point detection object and the first target object. Of course, when more than one target object has already been acquired, for example when the target objects include a first target object and a second target object, the distances between one or more feature point detection objects in the video image and the first and second target objects may be calculated respectively, as offset parameters with respect to the first target object and the second target object. The above calculation of the offset parameter is only an example; in practical applications, other definitions and calculations of the offset parameter may also be adopted, which are not repeated here.
According to an embodiment of the present invention, the target object for posture detection may be determined by obtaining the above evaluation parameters, including, for example, the position parameter, the size parameter, the motion regularity parameter and the offset parameter, evaluating each feature point recognition object, and determining the target object according to the evaluation result. Alternatively, each of the evaluation parameters may be calculated and weighted, and the weighted combination of the parameters used as the evaluation criterion for the feature point recognition object. For example, if in a certain application scenario the distance between the feature point recognition object and the system device used for system interaction is the most important consideration, the weight of the position parameter may be set to be the largest while the weights of the size parameter, the motion regularity parameter, the offset parameter and so on are set smaller, or even only the position parameter may be considered. For another example, if the feature point recognition object closest to the camera is the most important consideration, the weight of the size parameter may be set to be the largest and the weights of the remaining parameters set smaller, or even only the size parameter may be considered. For another example, if the target object for gesture detection in the current scene has not yet been determined, only the position parameter, the size parameter, the motion regularity parameter and their weights may be considered, without the offset parameter. Conversely, if a target object already exists and computing the motion regularity parameter is considered time-consuming and cumbersome, only the position parameter, the size parameter, the offset parameter and their weights may be considered, without the motion regularity parameter.
After the evaluation parameter result of each feature point recognition object is obtained, the feature point recognition object whose evaluation result best matches expectations can be selected as the target object for gesture detection; alternatively, by setting a certain threshold, one or more feature point recognition objects whose evaluation results exceed that threshold may be taken as target objects for gesture detection. The above calculation methods of the evaluation parameters and determination methods of the target object are only examples; in practical applications, any definition and calculation of the evaluation parameters and any determination method of the target object may be adopted, which is not limited herein.
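A minimal sketch of this evaluate-then-select logic, assuming each candidate object has already been reduced to the four parameters described above; the sign convention, weights and threshold are illustrative, not the patent's prescribed values:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    position: float    # C1: distance to a preset position (smaller is better)
    size: float        # C2: area or summed limb length (larger is better)
    regularity: float  # C3: fit between current and previous motion, in [0, 1]
    offset: float      # C4: distance to an existing target (smaller is better)

def score(c, w=(0.4, 0.2, 0.2, 0.2)):
    # Negate the distance-like terms so a larger score always means
    # "more likely to be the target object".
    return (-w[0] * c.position + w[1] * c.size
            + w[2] * c.regularity - w[3] * c.offset)

def select_targets(candidates, threshold=None):
    scored = [(score(c), c) for c in candidates]
    if threshold is None:
        return [max(scored, key=lambda sc: sc[0])[1]]   # single best object
    return [c for s, c in scored if s > threshold]      # all above threshold
```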
According to the embodiment of the present invention, the target objects respectively identified from each of the at least two frames of video images may be the same as or different from each other, and the number of identified target objects is not limited. For example, the same one or more target objects may be identified from each of the at least two frames of video images, and all or some of the identified target objects may be determined as the target objects requiring gesture detection. For another example, one or more target objects differing from each other may be identified from each of the at least two frames of video images, and the target objects requiring subsequent pose detection may be selected from them. Optionally, when some video images contain no finally selected target object requiring posture detection, those video images need not be processed subsequently, so as to eliminate unnecessary noise as far as possible, save system resources and improve posture detection efficiency. The above are only some examples of ways to determine the target object requiring gesture detection; in practical applications, different determination manners may be chosen for different scenes, which is not limited herein.
In step S103, a pose of a target object in at least one frame of video images of the at least two frames of video images is predicted according to a time sequence of the at least two frames of video images, so as to obtain a pose prediction result.
After the target object needing gesture detection is obtained, the gesture track of at least one target object to be detected can be obtained according to the time sequence, and the gesture track is matched with the gesture track model. Optionally, the gesture trajectory of the target object may be a variation curve of the gesture feature with time. For example, the posture feature may include one or more of the parameters of the target object, such as the position of the limb, the angle of the limb, the speed of the limb movement, the speed of the joint point movement, the angle of the joint, and the like. After the pose feature of the video image at a certain time point is acquired, a change track of the pose feature with time can be drawn by taking time as an axis, and the change track is taken as a pose track of a target object.
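As one concrete example of such a pose feature, a joint angle can be computed from three keypoints; this sketch assumes 2-D keypoint coordinates are available per frame, and the point names are illustrative:

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by segments b->a and b->c,
    e.g. shoulder-elbow-wrist for the elbow angle."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    cos_t = max(-1.0, min(1.0, dot / (n1 * n2 + 1e-9)))  # clamp for safety
    return math.degrees(math.acos(cos_t))

# Sampling this angle frame by frame and plotting it against time yields the
# feature-vs-time curve used as the pose trajectory described above.
```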
After the gesture trajectory of the target object is obtained, the gesture trajectory of the target object may be matched with a pre-stored gesture trajectory model to predict a gesture to be made by the target object at a next time point or time period, so as to obtain a gesture prediction result. The pre-stored posture trajectory model may also be a rule that the posture features change with time, for example, for each posture, a model that the corresponding posture features change with time may be stored as the posture trajectory model of the posture. When the gesture track of the target object is matched with the pre-stored gesture track model, the trend of the gesture track of the target object changing along with time can be fitted with the gesture track model of a certain gesture within a period of time to judge whether the gesture is met. After the attitude trajectory of the target object is fitted to the attitude trajectory models of the plurality of attitudes, the attitude closest to the target object may be selected according to the fitting result and used as the attitude prediction result of the target object.
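A minimal sketch of this trajectory matching (step S103), assuming a trajectory is a sampled curve of one pose feature and each stored model is a template curve of the same length; mean squared error as the fitting measure is an illustrative choice:

```python
import numpy as np

def predict_pose(trajectory, models):
    """trajectory: sampled feature values over the recent window.
    models: dict mapping pose name -> template array of the same length.
    Returns (closest pose name, fitting error) as the pose prediction result."""
    traj = np.asarray(trajectory, dtype=float)
    best_name, best_err = None, float("inf")
    for name, template in models.items():
        err = float(np.mean((traj - np.asarray(template, dtype=float)) ** 2))
        if err < best_err:
            best_name, best_err = name, err
    return best_name, best_err
```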
In step S104, a gesture of the target object in at least one of the at least two frames of video images is recognized to obtain a gesture recognition result.
After the target object to be subjected to the posture detection is acquired, the acquired posture of the target object may be matched with a static posture model stored in advance. Optionally, the pose of the target object may also include certain pose features. For example, the posture feature may include one or more of the parameters of the target object such as the limb position, the limb angle, the joint point coordinates, and the like.
After the posture of the target object is obtained, the posture of the target object may be matched with a pre-stored static posture model to identify the current posture of the target object, so as to obtain a posture identification result. Wherein, the pre-stored static attitude model may also include one or more of the aforementioned attitude characteristics. When the posture of the target object is matched with the pre-stored static posture model, the posture of the target object can be compared with the static posture models of multiple postures, and then the closest posture can be selected according to the fitting result as the posture recognition result of the target object.
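A minimal sketch of this static matching (step S104), assuming a pose is summarized as a flat feature vector and each static model stores one reference vector per pose; Euclidean distance is an illustrative choice:

```python
import numpy as np

def recognize_pose(features, static_models):
    """features: feature vector (limb angles, joint coordinates, ...) of the
    target object in the current frame.
    static_models: dict mapping pose name -> reference feature vector.
    Returns (closest pose name, distance) as the pose recognition result."""
    feat = np.asarray(features, dtype=float)
    best_name, best_dist = None, float("inf")
    for name, ref in static_models.items():
        d = float(np.linalg.norm(feat - np.asarray(ref, dtype=float)))
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist
```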
In step S105, a posture detection result of the at least one target object is acquired based on the posture prediction result and the posture recognition result.
According to an embodiment of the present invention, the order of obtaining the gesture prediction result and the gesture recognition result is not limited. Alternatively, the gesture prediction result may be obtained first and then the gesture recognition result; optionally, the gesture recognition result may be obtained first and then the gesture prediction result; of course, the two may also be obtained simultaneously. Further, in one example, only the gesture recognition result may be obtained without the gesture prediction result; in another example, only the gesture prediction result may be obtained without the gesture recognition result. The above result-obtaining manners are all examples and are not limited herein.
In the case where the gesture prediction result and the gesture recognition result of the at least one target object have been obtained, the corresponding confidence levels may be determined according to the obtained gesture prediction result and the obtained gesture recognition result, respectively. Optionally, the confidence of the posture prediction result may be a fitting degree between the posture trajectory of the target object and a corresponding posture trajectory model thereof, such as a difference between limb angles, a difference between limb movement speed changes, and the like; the confidence of the gesture recognition result may be a fitting degree between the gesture of the target object and the corresponding static gesture model, such as a difference between limb angles, a difference between specific positions of limbs, and the like. Of course, the above calculation manner of the confidence is only an example, and is not limited herein.
After determining the confidence levels of the pose prediction result and the pose recognition result of the at least one target object, a comparison may be made between the confidence levels of the pose prediction result and the pose recognition result of the target object. Specifically, when the confidence degree comparison result between the posture prediction result and the posture recognition result meets a preset condition, taking the posture prediction result as the posture detection result of the at least one target object; and when the confidence degree comparison result of the attitude prediction result and the attitude recognition result does not meet the preset condition, taking the attitude recognition result as the attitude detection result of the at least one target object. The preset conditions may be different according to an application scenario of the method according to the embodiment of the present invention. For example, the magnitudes of the confidence levels of the posture prediction result and the posture recognition result may be compared, and when the confidence level of the posture prediction result is greater, the posture prediction result is used as the posture detection result of the at least one target object; and when the confidence of the gesture recognition result is larger, taking the gesture recognition result as the gesture detection result of the at least one target object.
Alternatively, according to an embodiment of the present invention, the gesture of the target object at a certain time point may be an intermediate gesture (in the middle of a gesture that is not yet completed). In this case, the confidence of the gesture recognition result may be relatively low, so when that confidence is below a threshold, the gesture prediction result (if any) may be used as the gesture detection result of the at least one target object. Conversely, when the target object has completed a gesture, the confidence of the gesture recognition result is relatively high and the detection result relatively accurate, so when that confidence is above the threshold, the gesture recognition result may be used as the gesture detection result of the at least one target object. Here, while the posture recognition result is not yet available, the posture prediction result may temporarily be used as the posture detection result of the target object; once the posture recognition result becomes available, it may be used as the posture detection result. Further, in order to further improve the accuracy of the gesture prediction result, the obtained gesture recognition result may be used to revise the criteria by which the gesture prediction result is obtained, for example by optimizing the gesture trajectory model or revising the relevant parameters or threshold ranges used for gesture prediction.
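A minimal sketch of this decision rule (step S105), assuming each result arrives as a (pose, confidence) pair; the threshold value is illustrative:

```python
def combine_results(prediction, recognition, threshold=0.8):
    """prediction / recognition: (pose name, confidence) tuples, or None when
    that result is not (yet) available. Returns the pose detection result."""
    if recognition is None:
        return prediction     # use the prediction until recognition arrives
    if recognition[1] >= threshold:
        return recognition    # completed gesture: recognition is reliable
    if prediction is not None:
        return prediction     # likely an intermediate pose: prefer prediction
    return recognition
```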
Of course, the above manner of obtaining the gesture detection result of the at least one target object according to the gesture prediction result and the gesture recognition result is only an example, and in practical applications, any manner of obtaining the gesture detection result by combining the gesture prediction result and the gesture recognition result may be considered, and is not limited herein.
The following describes how to obtain the pose trajectory model and the static pose model for pose detection in the embodiment of the present invention, and the following obtaining manner is only an example, and in practical applications, any pose trajectory model and static pose model may be used for pose detection for a target object.
The attitude trajectory model and the static attitude model according to the embodiment of the invention can be obtained by a process similar to the attitude detection method. In one example, at least two frames of video images arranged in time sequence within a preset time range can be obtained from a video image stream; target object recognition is performed on each frame of the at least two frames of video images, and at least one target object for which models need to be acquired is determined therefrom; a gesture trajectory model is acquired for a target object in at least one frame of the at least two frames, in combination with the temporal order of the at least two frames; and a static gesture model is acquired for the gesture of a target object in at least one frame of the at least two frames. In addition, after the static attitude model is acquired, the result of the static attitude model can be used to optimize the attitude trajectory model. As mentioned above, the posture trajectory model and the static posture model may be obtained simultaneously or sequentially, which is not limited herein. In practical applications, the pose trajectory model may be obtained from one video image stream and the static pose model from another video image stream.
According to the attitude detection method provided by the embodiment of the invention, attitude prediction and attitude recognition can each be performed on the target object acquired from the video images, in combination with the temporal order of the video images, and attitude detection can be performed based on the results of both. The method can detect gestures effectively in real time, reduce the response time of the system while ensuring the accuracy of gesture detection, and improve the user experience.
Fig. 2 shows an example of an application scenario of the gesture detection method according to the embodiment of the present invention. In fig. 2, the device for performing human-computer interaction is a treadmill, and the treadmill acquires a video image stream changing with time through a camera arranged thereon, and obtains at least two frames of video images arranged in time sequence within a preset time range from the video image stream.
After the at least two frames of video images are acquired, feature point detection can be performed on each frame of video image, so as to construct a human skeleton model as shown in FIG. 3. The human skeleton model shows a plurality of joint points of the human skeleton as feature points, such as the 18 joint points numbered 0-17 in FIG. 3, which represent 18 important joints of the human body; point 1 in FIG. 3, for example, indicates the neck point of the human skeleton. After feature point detection, in the example of the present disclosure, the respective feature points on each corresponding feature point detection object and its skeleton model may be detected, and then the feature points on the skeleton, and the rule by which they change over time, may be used to evaluate each feature point identification object with the evaluation parameters.
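As a concrete reference, such an 18-point skeleton can be held as a lookup table. The mapping below assumes an OpenPose-style COCO 18-keypoint layout (consistent with point 1 being the neck); the patent's figure may assign the indices differently:

```python
# 18 joint points as in FIG. 3, under the assumed COCO-18 layout.
JOINTS = {
    0: "nose", 1: "neck",
    2: "right_shoulder", 3: "right_elbow", 4: "right_wrist",
    5: "left_shoulder", 6: "left_elbow", 7: "left_wrist",
    8: "right_hip", 9: "right_knee", 10: "right_ankle",
    11: "left_hip", 12: "left_knee", 13: "left_ankle",
    14: "right_eye", 15: "left_eye", 16: "right_ear", 17: "left_ear",
}
```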
As described previously, the evaluation parameter may include at least one of the position parameter C1, the size parameter C2, the motion regularity parameter C3 and the offset parameter C4 of each feature point recognition object. In an example of the embodiment of the present invention, the position parameter C1 may be expressed in terms of the position of a certain feature point on the feature point recognition object relative to the center of the video image. Taking the neck point Pe(xe, ye) on a feature point recognition object as an example, the position parameter may represent the distance between Pe(xe, ye) and the video image center Pc(xc, yc). In an example of the embodiment of the present invention, the size parameter C2 may be the sum of the lengths lx of all valid limbs L = {l1, l2, l3, ...} on each feature point recognition object, where a valid limb may be, for example, a limb that remains relatively still among all the limbs of the feature point recognition object.
In an example of the embodiment of the present invention, the motion regularity parameter C3 in the evaluation parameter may be used to indicate the periodicity and regularity of the motion of the feature point identification object over a certain time range. As described above, the motion state (e.g., motion amplitude, motion period) of the feature point recognition object over a previous time range may first be calculated, and the degree of fit with the motion in the current time range may then be computed. FIG. 4(a) shows the points used to calculate the motion regularity parameter in each feature point identification object identified for the scene of the embodiment of the present invention, namely the neck points Pe framed by an ellipse in each feature point identification object, including Pe1, Pe2, Pe3 and Pe4. FIG. 4(b) shows that a motion regularity curve is drawn separately for each selected neck point, and a fitting result of motion regularity is calculated from the drawn curve as the motion regularity parameter of the corresponding feature point recognition object. For example, in the plotted curve, the horizontal axis may be time, and the vertical axis may be the change in the ordinate y of the neck point's movement, the change in the abscissa x, or, as shown in FIG. 4(b), the value of y/x. As can be seen from FIG. 4(b), the neck points Pe2 and Pe4 move irregularly, the neck point Pe3 does not move, and the neck point Pe1 moves relatively regularly, so the value of its motion regularity parameter may be relatively high.
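A minimal sketch of how such a motion regularity parameter might be computed for a neck-point series, assuming the plotted quantity is y/x over time as in FIG. 4(b); using the peak of the normalized autocorrelation as the regularity score is an illustrative choice, not a formula given in the patent:

```python
import numpy as np

def motion_regularity(xs, ys):
    """xs, ys: neck-point coordinates sampled over a time window."""
    series = np.asarray(ys, dtype=float) / (np.asarray(xs, dtype=float) + 1e-9)
    series = series - series.mean()
    if not series.any():
        return 0.0                      # no movement at all (like Pe3 above)
    ac = np.correlate(series, series, mode="full")[len(series) - 1:]
    ac /= ac[0]                         # normalize so ac[0] == 1
    # A strong repeated peak at some lag > 0 indicates periodic, regular motion.
    return float(ac[1:].max()) if len(ac) > 1 else 0.0
```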
In the present example, the offset parameter C4 in the evaluation parameter may be determined when, for example, the neck point of a previously determined target object is Pe1: the distance between the neck point of a feature point detection object in the current video image and Pe1 is calculated, giving the offset of the current detection object relative to the previous target object at that feature point.
Optionally, the respective evaluation parameters in this example, i.e., the position parameter C1, the size parameter C2, the motion regularity parameter C3 and the offset parameter C4, may be given corresponding weights W1, W2, W3 and W4, the value of the evaluation parameter may be calculated as C = W1·C1 + W2·C2 + W3·C3 + W4·C4, and one or more target objects requiring pose detection may be determined according to a preset threshold or by other means. The specific determination method is as described above and is not repeated here.
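For illustration only, a small numeric example of this weighted score with made-up weights and parameter values, assuming each Ci has already been normalized so that a larger value indicates a more likely target object:

```python
W = (0.4, 0.2, 0.3, 0.1)  # weights W1..W4 (position, size, regularity, offset)

def evaluate(C1, C2, C3, C4, w=W):
    return w[0] * C1 + w[1] * C2 + w[2] * C3 + w[3] * C4

candidates = {"Pe1": (0.9, 0.6, 0.8, 0.7), "Pe2": (0.5, 0.7, 0.2, 0.3)}
scores = {name: evaluate(*c) for name, c in candidates.items()}
target = max(scores, key=scores.get)  # "Pe1": 0.79 vs "Pe2": 0.43
```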
After determining the target object which needs to be subjected to gesture detection, optionally, the gesture prediction result and the gesture recognition result of the target object can be calculated separately. Specifically, when calculating the posture prediction result, the function f(t) describing the change of a certain limb angle of the target object over time may be obtained first and, taking the noise influence n0(t) into account, the posture trajectory of that limb of the target object is obtained as θ(t) = f(t) + n0(t). When a plurality of limbs are in motion, a plurality of posture trajectories for the different limbs can be constructed and compared respectively with the motion trajectories of the corresponding limbs in the posture trajectory model. The top series of photographs in FIG. 5 shows the process of deriving pose trajectories for limb movements and poses according to an example of an embodiment of the present invention. For the two-arm movement shown in FIG. 5, the variation of the left arm and the right arm over time may be considered separately. The gesture trajectory of the right arm, shown at the lower left of FIG. 5, may be denoted θ1(t) = f1(t) + n0(t), and the pose trajectory of the left arm, shown at the lower right of FIG. 5, may be denoted θ2(t) = f2(t) + n0(t), where f1(t) and f2(t) are the functions of the right-arm and left-arm angles over time, respectively, and n0(t) is noise. The correspondingly plotted curves of limb angle against time, also shown in FIG. 5, can be compared respectively with the corresponding limb curves of the corresponding posture in the pre-stored posture trajectory model to obtain the comparison result.
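A minimal sketch of this two-arm comparison, assuming the angle curves θ1(t) and θ2(t) have already been sampled at the same instants as the model's reference curves; suppressing the noise term n0(t) with a simple moving average and using mean squared error are illustrative choices:

```python
import numpy as np

def match_two_arm_pose(theta1, theta2, model, k=5):
    """theta1/theta2: sampled right/left arm angle curves.
    model: dict with reference curves under keys "right" and "left".
    Returns the mean fitting error of the two arms (lower fits better)."""
    def smooth(a):
        return np.convolve(np.asarray(a, dtype=float), np.ones(k) / k, "valid")
    e_right = np.mean((smooth(theta1) - smooth(model["right"])) ** 2)
    e_left = np.mean((smooth(theta2) - smooth(model["left"])) ** 2)
    return float((e_right + e_left) / 2.0)
```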
Optionally, when calculating the gesture recognition result, the gesture of the target object in at least one frame of the video image may be compared with the static gesture model; if the limb angles of the target object fall within a certain threshold range of a static gesture, the current gesture of the target object corresponds to that gesture in the static gesture model, giving the gesture recognition result.
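A minimal sketch of this threshold-range check; the limb names and angle ranges are illustrative:

```python
def matches_static_pose(angles, ranges):
    """angles: dict limb -> current angle (degrees);
    ranges: dict limb -> (low, high) allowed interval for this static pose."""
    return all(low <= angles[limb] <= high
               for limb, (low, high) in ranges.items())

# e.g. matches_static_pose({"right_elbow": 92.0, "left_elbow": 88.0},
#                          {"right_elbow": (80, 100), "left_elbow": (80, 100)})
```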
After the pose prediction result and the pose recognition result are obtained, respectively, a pose detection result of the at least one target object may be obtained based on the pose prediction result and the pose recognition result. The specific detection method is as described above, and is not described herein again.
Next, a posture detection apparatus according to an embodiment of the present invention is described with reference to fig. 6. Fig. 6 shows a block diagram of a gesture detection apparatus 600 according to an embodiment of the present invention. As shown in fig. 6, the posture detection apparatus 600 includes an acquisition unit 610, a determination unit 620, a prediction unit 630, a recognition unit 640, and a detection unit 650. The posture detection apparatus 600 may include other components in addition to these units, however, since these components are not related to the contents of the embodiment of the present invention, illustration and description thereof are omitted herein. Further, since the following operations performed by the posture detection apparatus 600 according to the embodiment of the present invention are the same in specific details as those described above with reference to fig. 1 to 5, the repetitive description of the same details is omitted herein to avoid redundancy.
The acquisition unit 610 acquires at least two frames of video images arranged in time sequence within a preset time range from the video image stream.
According to the embodiment of the present invention, the obtaining unit 610 may obtain two or more frames of video images from the video image stream in real time. In one example, the video images obtained from the video image stream may be consecutive frames arranged in time sequence; in another example, they may be non-consecutive frames sampled at certain intervals. When a video image is acquired, its time-related information can be acquired at the same time for the subsequent gesture detection operation. In addition, optionally, a time range for the video images to be acquired may be set, so that gesture detection is performed specifically on the video images within that time range.
The determining unit 620 performs target object recognition on each frame of video images of the at least two frames of video images, and determines at least one target object to be subjected to pose detection.
According to the embodiment of the present invention, optionally, the determining unit 620 may perform target object recognition on each frame of the at least two frames of video images directly by using object feature recognition or edge detection. Optionally, the determining unit 620 may further perform feature point identification on each frame of video images of the at least two frames of video images, to obtain one or more feature point identification objects therefrom, and may obtain feature points corresponding to the feature point identification objects respectively; subsequently, the determination unit 620 may determine at least one target object that needs to be subjected to the posture detection from among the at least one feature point recognition object, and specifically, may perform an evaluation for each acquired feature point recognition object according to the evaluation parameter to determine the at least one target object according to the evaluation result. For example, the evaluation parameter may include at least one of a position parameter, a size parameter, a regularity of motion parameter, and an offset parameter of the feature point identifying object.
According to an example of the present invention, a position parameter in the evaluation parameters may be used to indicate a position where the feature point recognition object is located. For example, the position parameter may be a distance between a position of the feature point recognition object and a preset position in the video image. In one example, the preset position in the video image may be a center position of the video image, and the position parameter may indicate a distance between the feature point recognition object and the center position of the video image. In another example, the preset position in the video image may be a position of a system device for system interaction in the application scene in the video image, and the position parameter may indicate a distance between the feature point identification object and the system device for system interaction. Of course, the above-mentioned method for calculating the position parameter is only an example, and the position parameter is not limited to the parameter representing the distance, and may be another method for representing the position of the feature point recognition object, and is not limited herein.
According to an example of the present invention, a size parameter of the evaluation parameters may be used to indicate a size of the feature point recognition object in the video image. For example, the size parameter may indicate an area occupied by the feature point identification object in the video image, or may indicate a size range of an object identification frame corresponding to the feature point identification object (for example, when the object identification frame of the feature point identification object is a rectangle, the size parameter may be a length and a width of the rectangle, and when the object identification frame of the feature point identification object is a circle or an ellipse, the size parameter may be a related parameter of the circle or the ellipse). In an example of the present invention, for example, if the identified feature point is a bone model of the feature point identification object, the size parameter may be a sum of lengths of joints in the bone model corresponding to the feature point identification object. Of course, the above-mentioned manner for calculating the size parameter is only an example, and other setting of the size parameter and corresponding calculation manners may also be adopted, which is not limited herein.
According to an example of the present invention, the motion regularity parameter in the evaluation parameters may be used to indicate the periodicity and regularity of the motion of the feature point recognition object over a certain time range. For example, the motion state (e.g., motion amplitude, motion period) of a feature point recognition object over a previous period of time may first be calculated, and the degree of fit between its motion in the current period and that previous motion (e.g., whether the motion amplitudes are similar and whether the motion periods coincide) may then be computed; the obtained fitting result is used as the motion regularity parameter in the evaluation parameters. The above manner of calculating the motion regularity parameter is only an example; other definitions of the motion regularity parameter and corresponding calculation manners may also be adopted, which is not limited herein.
According to an example of the present invention, the offset parameter in the evaluation parameters may be used, once one or more target objects requiring gesture detection have already been determined, to measure the offset between a feature point detection object in the video image and a corresponding, already-determined target object. For example, after a first target object has been acquired, the distance between one or more feature point detection objects in a video image and the first target object may be calculated as the offset parameter between each feature point detection object and the first target object. Of course, when more than one target object has already been acquired, for example when the target objects include a first target object and a second target object, the distances between one or more feature point detection objects in the video image and the first and second target objects may be calculated respectively, as offset parameters with respect to the first target object and the second target object. The above calculation of the offset parameter is only an example; in practical applications, other definitions and calculations of the offset parameter may also be adopted, which are not repeated here.
According to an embodiment of the present invention, the target object for posture detection may be determined by obtaining the above evaluation parameters, including, for example, the position parameter, the size parameter, the motion regularity parameter and the offset parameter, evaluating each feature point recognition object, and determining the target object according to the evaluation result. Alternatively, each of the evaluation parameters may be calculated and weighted, and the weighted combination of the parameters used as the evaluation criterion for the feature point recognition object. For example, if in a certain application scenario the distance between the feature point recognition object and the system device used for system interaction is the most important consideration, the weight of the position parameter may be set to be the largest while the weights of the size parameter, the motion regularity parameter, the offset parameter and so on are set smaller, or even only the position parameter may be considered. For another example, if the feature point recognition object closest to the camera is the most important consideration, the weight of the size parameter may be set to be the largest and the weights of the remaining parameters set smaller, or even only the size parameter may be considered. For another example, if the target object for gesture detection in the current scene has not yet been determined, only the position parameter, the size parameter, the motion regularity parameter and their weights may be considered, without the offset parameter. Conversely, if a target object already exists and computing the motion regularity parameter is considered time-consuming and cumbersome, only the position parameter, the size parameter, the offset parameter and their weights may be considered, without the motion regularity parameter.
After the evaluation parameter result of each feature point recognition object is obtained, the feature point recognition object whose evaluation result best matches expectations can be selected as the target object for gesture detection; alternatively, by setting a certain threshold, one or more feature point recognition objects whose evaluation results exceed that threshold may be taken as target objects for gesture detection. The above calculation methods of the evaluation parameters and determination methods of the target object are only examples; in practical applications, any definition and calculation of the evaluation parameters and any determination method of the target object may be adopted, which is not limited herein.
According to the embodiment of the present invention, the target objects identified by the determining unit 620 from each of the at least two frames of video images may be the same as or different from one another, and the number of identified target objects is not limited. For example, the same one or more target objects may be identified in each of the at least two frames, and all or some of them determined as the target objects requiring gesture detection. Alternatively, one or more mutually different target objects may be identified across the frames, and the target object requiring subsequent pose detection selected from among them. Optionally, when a video image contains no finally selected target object requiring posture detection, that image may be excluded from subsequent processing, so as to eliminate unnecessary noise, save system resources, and improve detection efficiency. These are only some examples of how the target object requiring gesture detection may be determined; in practical applications, different determination manners may be chosen for different scenarios, which is not limited here.
The prediction unit 630 predicts the pose of the target object in at least one frame of the at least two frames of video images in combination with the temporal order of the at least two frames of video images to obtain a pose prediction result.
After acquiring the target object to be subjected to gesture detection, the prediction unit 630 may obtain the pose trajectory of at least one target object to be detected according to the time sequence and match it against a pose trajectory model. Optionally, the pose trajectory of the target object may be a curve of pose features varying with time. For example, the pose features may include one or more parameters of the target object such as limb position, limb angle, limb movement speed, joint point movement speed, and joint angle. After the pose features have been acquired for the video image at each time point, their variation over time can be plotted with time as an axis, and this variation track taken as the pose trajectory of the target object.
After acquiring the pose trajectory of the target object, the prediction unit 630 may match it against pre-stored pose trajectory models to predict the pose the target object will assume at the next time point or time period, thereby obtaining a pose prediction result. A pre-stored pose trajectory model likewise describes how pose features change over time; for example, for each pose, a model of how its pose features evolve may be stored as the trajectory model of that pose. During matching, the trend of the target object's pose trajectory over a period of time is fitted against the trajectory model of a candidate pose to judge whether it conforms. After fitting the target object's trajectory against the trajectory models of multiple poses, the pose that fits closest may be selected according to the fitting results and used as the pose prediction result for the target object.
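The trajectory construction and model matching described above might look as follows. The fixed-length feature vectors, the mean-squared-error fitting criterion, and the confidence mapping are illustrative assumptions; the embodiment leaves the fitting method open.

```python
import numpy as np

def pose_trajectory(frames):
    """Stack per-frame pose features into a time-ordered trajectory.
    frames: list of (timestamp, feature_vector) pairs, where each feature
    vector might hold limb angles, joint speeds, etc. Returns a (T, D) array.
    """
    frames = sorted(frames, key=lambda f: f[0])       # enforce temporal order
    return np.array([f[1] for f in frames], dtype=float)

def predict_pose(trajectory, trajectory_models):
    """Fit the observed trajectory against each stored pose trajectory model
    and return the closest pose with a score usable later as confidence."""
    best_pose, best_err = None, float("inf")
    for pose_name, model in trajectory_models.items():  # model: (T_m, D) array
        t = min(len(trajectory), len(model))
        # Compare the most recent t observed frames with the model's first t.
        err = float(np.mean((trajectory[-t:] - model[:t]) ** 2))
        if err < best_err:
            best_pose, best_err = pose_name, err
    confidence = 1.0 / (1.0 + best_err)  # monotone map of fit quality to (0, 1]
    return best_pose, confidence
```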
The recognition unit 640 recognizes the gesture of a target object in at least one frame of the at least two frames of video images to obtain a gesture recognition result.
After the target object requiring posture detection has been acquired through the determination unit 620 as described above, the recognition unit 640 may match the posture of the acquired target object against a static posture model stored in advance. Optionally, the pose of the target object may likewise be represented by certain pose features; for example, these may include one or more parameters of the target object such as limb position, limb angle, and joint point coordinates.
After acquiring the pose of the target object, the recognition unit 640 may match it against pre-stored static pose models to recognize the target object's current pose, thereby obtaining a pose recognition result. A pre-stored static pose model may likewise comprise one or more of the aforementioned pose features. During matching, the pose of the target object is compared with the static pose models of multiple poses, and the closest pose is selected according to the comparison results as the pose recognition result of the target object.
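A corresponding sketch for the static pose matching, under the same illustrative assumptions (fixed-length pose feature vectors, Euclidean distance as the comparison criterion):

```python
import numpy as np

def recognize_pose(pose_features, static_models):
    """Compare the current frame's pose features against each stored static
    pose model and return the closest pose with a similarity score."""
    features = np.asarray(pose_features, dtype=float)
    best_pose, best_dist = None, float("inf")
    for pose_name, model in static_models.items():
        d = float(np.linalg.norm(features - np.asarray(model, dtype=float)))
        if d < best_dist:
            best_pose, best_dist = pose_name, d
    return best_pose, 1.0 / (1.0 + best_dist)  # higher = closer match
```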
The detection unit 650 acquires a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
According to an embodiment of the present invention, the order in which the pose prediction result of the prediction unit 630 and the pose recognition result of the recognition unit 640 are obtained is not limited. The pose prediction result may be obtained first and the pose recognition result second, or vice versa, or both may be obtained simultaneously. Further, in one example only the pose recognition result may be obtained, without a prediction result; in another, only the pose prediction result may be obtained, without a recognition result. These acquisition manners are all examples and are not limited here.
When the pose prediction result and the pose recognition result of the at least one target object have been acquired, the detection unit 650 may determine a confidence for each. Optionally, the confidence of the pose prediction result may be the degree of fit between the target object's pose trajectory and the corresponding trajectory model, measured for example by differences in limb angles or in limb movement speed changes; the confidence of the pose recognition result may be the degree of fit between the target object's pose and the corresponding static pose model, measured for example by differences in limb angles or in specific limb positions. Of course, this way of calculating confidence is only an example and is not limited here.
After determining the confidences of the pose prediction result and the pose recognition result of the at least one target object, the detection unit 650 may compare them. Specifically, when the confidence comparison between the two results meets a preset condition, the pose prediction result is taken as the pose detection result of the at least one target object; when it does not, the pose recognition result is taken instead. The preset condition may differ according to the application scenario of the method of the embodiment. For example, the two confidences may simply be compared in magnitude: when the confidence of the pose prediction result is greater, it is used as the pose detection result of the at least one target object; when the confidence of the pose recognition result is greater, the pose recognition result is used.
Alternatively, according to an embodiment of the present invention, the pose of the target object at a given time point may be an intermediate pose (mid-way through a gesture that is not yet complete). In that case the confidence of the pose recognition result is likely to be low, so when it falls below a threshold, the pose prediction result (if available) may be used as the pose detection result of the at least one target object. Conversely, when the target object has completed a gesture, the recognition confidence is high and the recognition result comparatively accurate, so when the recognition confidence exceeds the threshold, the pose recognition result may be used as the detection result. If no recognition result is available yet, the prediction result may serve temporarily as the detection result, to be replaced by the recognition result once it is obtained. Further, to improve the accuracy of future predictions, the obtained pose recognition result may be used to correct the acquisition criteria of the pose prediction result, for example by optimizing the pose trajectory model or adjusting the relevant prediction parameters or threshold ranges.
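The threshold-based combination just described might be sketched as follows. Each result is assumed to be a (pose, confidence) pair, or None when not yet available; the 0.6 default is a placeholder, not a value taken from the embodiment.

```python
def fuse_results(prediction, recognition, rec_threshold=0.6):
    """Combine the pose prediction result and the pose recognition result
    into a single pose detection result."""
    if recognition is None:          # recognition not obtained yet:
        return prediction            # use the prediction temporarily
    if prediction is None:
        return recognition
    _, pred_conf = prediction
    _, rec_conf = recognition
    # Mid-gesture frames tend to score low against static models, so fall
    # back to the trajectory-based prediction when recognition is weak.
    if rec_conf < rec_threshold and pred_conf >= rec_conf:
        return prediction
    return recognition
```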
Of course, the above manners of obtaining the gesture detection result of the at least one target object from the gesture prediction result and the gesture recognition result are only examples; in practical applications, any manner of combining the gesture prediction result and the gesture recognition result to obtain the gesture detection result may be adopted, without limitation here.
The following describes how the pose trajectory model and the static pose model used for pose detection may be obtained in the embodiment of the present invention. The manner described below is only an example; in practical applications, any pose trajectory model and static pose model may be used for detecting the pose of a target object.
The pose trajectory model and the static pose model according to the embodiment of the present invention may be obtained by a process similar to the pose detection method itself. In one example: at least two frames of video images arranged in time sequence within a preset time range are obtained from a video image stream; target object recognition is performed on each frame, and at least one target object for which models are to be acquired is determined; a pose trajectory model is acquired for the target object in at least one of the frames in combination with their temporal order; and a static pose model is acquired for the pose of the target object in at least one of the frames. In addition, after the static pose model is acquired, its result may be used to optimize the pose trajectory model. As mentioned above, the two models may be obtained simultaneously or sequentially, without limitation; in practice, the trajectory model may even be obtained from one video image stream and the static model from another.
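For illustration, one simple way to derive the two models from recorded examples is to average several feature trajectories, or feature vectors, of the same gesture; the equal-length assumption below stands in for the temporal alignment a real system would need.

```python
import numpy as np

def build_trajectory_model(recordings):
    """Average several recorded (T, D) feature trajectories of one gesture
    into a pose trajectory model (illustrative; assumes the recordings are
    already temporally aligned and of equal length)."""
    return np.mean(np.stack(recordings, axis=0), axis=0)

def build_static_model(pose_samples):
    """Average feature vectors captured at the completed pose into a static
    pose model (same simplifying assumptions)."""
    return np.mean(np.asarray(pose_samples, dtype=float), axis=0)
```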
With the posture detection apparatus provided by the embodiment of the present invention, pose prediction and pose recognition can each be performed on the target object acquired from the video images, in combination with their temporal order, and pose detection carried out based on both results. The apparatus can thus detect poses effectively in real time, reducing system response time while preserving detection accuracy, and improving the user experience.
Next, a posture detection apparatus according to an embodiment of the present invention is described with reference to fig. 7. Fig. 7 shows a block diagram of a gesture detection apparatus 700 according to an embodiment of the present invention. As shown in fig. 7, the apparatus 700 may be a computer or a server.
As shown in fig. 7, the gesture detection apparatus 700 includes one or more processors 710 and a memory 720; of course, it may also include an input device, an output device (not shown), and other components, interconnected by a bus system and/or another form of connection mechanism. It should be noted that the components and structure of the gesture detection apparatus 700 shown in fig. 7 are exemplary only, not limiting; the apparatus 700 may have other components and structures as needed.
The processor 710 may be a Central Processing Unit (CPU) or another form of processing unit having data processing capability and/or instruction execution capability, and may execute computer program instructions stored in the memory 720 to perform desired functions, which may include: acquiring at least two frames of video images which are arranged in a time sequence within a preset time range from a video image stream; respectively performing target object recognition on each frame of video image of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection; predicting the posture of a target object in at least one frame of video image of the at least two frames of video images in combination with the time sequence of the at least two frames of video images to obtain a posture prediction result; recognizing the posture of a target object in at least one frame of video image of the at least two frames of video images to obtain a posture recognition result; and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
Memory 720 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 710 to implement the functions of the gesture detection apparatus of the embodiments of the invention described above and/or other desired functions, and/or to perform a gesture detection method according to embodiments of the invention. Various applications and various data may also be stored in the computer-readable storage medium.
In the following, a computer-readable storage medium according to an embodiment of the present invention is described, on which computer program instructions are stored, wherein the computer program instructions, when executed by a processor, implement the steps of: acquiring at least two frames of video images which are arranged in a time sequence within a preset time range from a video image stream; respectively performing target object recognition on each frame of video image of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection; predicting the posture of a target object in at least one frame of video image of the at least two frames of video images in combination with the time sequence of the at least two frames of video images to obtain a posture prediction result; recognizing the posture of a target object in at least one frame of video image of the at least two frames of video images to obtain a posture recognition result; and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
Of course, the above-described embodiments are merely examples, not limitations. Those skilled in the art may, following the concepts of the present invention, combine steps and devices from the separately described embodiments to achieve the effects of the present invention; such combined embodiments are also included in the present invention and need not be described here one by one.
Note that the advantages, effects, and the like mentioned in the present invention are merely examples, not limitations, and cannot be considered essential to the various embodiments. Furthermore, the foregoing detailed description of the invention is provided only for purposes of illustration and understanding, not of limitation, and does not restrict the invention to being implemented exactly as described.
The block diagrams of devices, apparatuses, and systems in the present invention are given only as illustrative examples and are not intended to require or imply that connections, arrangements, or configurations must be made in the manner shown. As those skilled in the art will appreciate, these devices, apparatuses, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended, meaning "including but not limited to," and may be used interchangeably therewith. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, "such as but not limited to."
The flowcharts of steps and the above method descriptions in the present invention are given only as illustrative examples and are not intended to require or imply that the steps of the various embodiments must be performed in the order presented. As those skilled in the art will appreciate, the steps in the above embodiments may be performed in any order. Words such as "thereafter," "then," and "next" are not intended to limit the order of the steps; they merely guide the reader through the description of the methods. Furthermore, any reference to an element in the singular, for example using the terms "a," "an," or "the," is not to be construed as limiting the element to the singular.
In addition, the steps and devices in the above embodiments are not limited to implementation within a single embodiment; in fact, some steps and devices from different embodiments may be combined according to the concept of the present invention to conceive new embodiments, and these new embodiments are also included within the scope of the present invention.
The individual operations of the methods described above may be performed by any suitable means capable of performing the corresponding functions. The means may include various hardware and/or software components and/or modules including, but not limited to, a circuit, an Application Specific Integrated Circuit (ASIC), or a processor.
The various illustrative logical blocks, modules, and circuits described may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an ASIC, a Field Programmable Gate Array (FPGA) or other Programmable Logic Device (PLD), discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any commercially available processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the invention may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may reside in any form of tangible storage medium. Some examples of storage media that may be used include Random Access Memory (RAM), Read Only Memory (ROM), flash memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, and the like. A storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A software module may be a single instruction, or many instructions, and may be distributed over several different code segments, among different programs, and across multiple storage media.
The inventive methods herein comprise one or more acts for implementing the described methods. The methods and/or acts may be interchanged with one another without departing from the scope of the claims. In other words, unless a specific order of actions is specified, the order and/or use of specific actions may be modified without departing from the scope of the claims.
The functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions on a tangible computer-readable medium. A storage medium may be any available tangible medium that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other tangible medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. As used herein, disk and disc include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc.
Accordingly, a computer program product may perform the operations presented herein. For example, such a computer program product may be a computer-readable tangible medium having instructions stored (and/or encoded) thereon that are executable by one or more processors to perform the operations described herein. The computer program product may include packaged material.
Software or instructions may also be transmitted over a transmission medium. For example, the software may be transmitted from a website, server, or other remote source using a transmission medium such as coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, or microwave.
Further, modules and/or other suitable means for carrying out the methods and techniques described herein may be downloaded and/or otherwise obtained by a user terminal and/or base station as appropriate. For example, such a device may be coupled to a server to facilitate the transfer of means for performing the methods described herein. Alternatively, the various methods described herein may be provided via storage means (e.g., RAM, ROM, a physical storage medium such as a CD or floppy disk) such that the user terminal and/or base station may obtain the various methods when coupled to or providing storage means to the device. Further, any other suitable technique for providing the methods and techniques described herein to a device may be utilized.
Other examples and implementations are within the scope and spirit of the invention and the following claims. For example, due to the nature of software, the functions described above may be implemented using software executed by a processor, hardware, firmware, hard wiring, or any combination of these. Features implementing functions may also be physically located at various positions, including being distributed such that portions of functions are implemented at different physical locations. Also, as used herein, including in the claims, "or" as used in a list of items prefaced by "at least one of" indicates a disjunctive list, such that a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (i.e., A and B and C). Furthermore, the word "exemplary" does not mean that the described example is preferred or better than other examples.
Various changes, substitutions and alterations to the techniques described herein may be made without departing from the techniques of the teachings as defined by the appended claims. Moreover, the scope of the present claims is not intended to be limited to the particular aspects of the process, machine, manufacture, composition of matter, means, methods and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.
The previous description of the inventive aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the invention to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (12)

1. A gesture detection method, comprising:
acquiring at least two frames of video images which are arranged in a time sequence within a preset time range from a video image stream;
respectively performing target object recognition on each frame of video image of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection;
predicting the posture of a target object in at least one frame of video image of the at least two frames of video images by combining the time sequence of the at least two frames of video images to obtain a posture prediction result;
recognizing the gesture of a target object in at least one frame of video image of the at least two frames of video images to obtain a gesture recognition result;
and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
2. The method of claim 1, wherein performing target object recognition on each of the at least two frames of video images, respectively, and determining therefrom at least one target object for which pose detection is required comprises:
and carrying out object feature identification or edge detection on each frame of the at least two frames of video images to identify the at least one target object.
3. The method of claim 1, wherein performing target object recognition on each of the at least two frames of video images, respectively, and determining therefrom at least one target object for which pose detection is required comprises:
performing feature point recognition on each frame of video image of the at least two frames of video images to obtain at least one feature point recognition object and corresponding feature points;
and determining at least one target object requiring gesture detection from the at least one feature point recognition object.
4. The method of claim 3, wherein determining at least one target object requiring pose detection from the at least one feature point recognition object comprises:
and evaluating each feature point identification object according to the evaluation parameters, and determining the at least one target object according to the evaluation result.
5. The method of claim 4, wherein,
the evaluation parameters include at least one of a position parameter, a size parameter, a motion regularity parameter, and an offset parameter of the feature point recognition object.
6. The method of claim 1, wherein predicting the pose of the at least one target object in combination with the temporal order of the at least two frames of video images to obtain a pose prediction result comprises:
and acquiring the attitude track of the at least one target object according to the time sequence, matching the attitude track with the attitude track model, and acquiring an attitude prediction result according to a matching result.
7. The method of claim 1, wherein recognizing the pose of the target object in at least one of the at least two frames of video images to obtain the pose recognition result comprises:
and matching the gesture of the target object in at least one frame of video image of the at least two frames of video images with the static gesture model, and acquiring a gesture recognition result according to the matching result.
8. The method of claim 1, wherein obtaining the pose detection result for the at least one target object based on the pose prediction result and the pose recognition result comprises:
determining confidence degrees of the pose prediction result and the pose recognition result of the at least one target object respectively;
when the confidence degree comparison results of the attitude prediction result and the attitude recognition result meet preset conditions, taking the attitude prediction result as an attitude detection result of the at least one target object; and when the confidence degree comparison result of the attitude prediction result and the attitude recognition result does not meet the preset condition, taking the attitude recognition result as the attitude detection result of the at least one target object.
9. The method of claim 8, wherein the method further comprises:
and when the confidence degree comparison result of the attitude prediction result and the attitude recognition result does not meet the preset condition, correcting the acquisition standard of the attitude prediction result according to the attitude recognition result.
10. An attitude detection apparatus comprising:
an acquisition unit configured to acquire at least two frames of video images which are arranged in a time sequence within a preset time range from a video image stream;
a determining unit configured to respectively perform target object recognition on each frame of video image of the at least two frames of video images, and determine therefrom at least one target object requiring gesture detection;
a prediction unit configured to predict a pose of a target object in at least one frame of video images of the at least two frames of video images in combination with a temporal order of the at least two frames of video images to obtain a pose prediction result;
a recognition unit configured to recognize the gesture of a target object in at least one frame of video image of the at least two frames of video images to obtain a gesture recognition result;
a detection unit configured to acquire a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
11. An attitude detection apparatus comprising:
a processor;
and a memory having computer program instructions stored therein,
wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of:
acquiring at least two frames of video images which are arranged in a time sequence within a preset time range from a video image stream;
respectively performing target object recognition on each frame of video image of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection;
predicting the posture of a target object in at least one frame of video image of the at least two frames of video images by combining the time sequence of the at least two frames of video images to obtain a posture prediction result;
recognizing the gesture of a target object in at least one frame of video image of the at least two frames of video images to obtain a gesture recognition result;
and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
12. A computer readable storage medium having computer program instructions stored thereon, wherein the computer program instructions, when executed by a processor, implement the steps of:
acquiring at least two frames of video images which are arranged in a time sequence within a preset time range from a video image stream;
respectively performing target object recognition on each frame of video image of the at least two frames of video images, and determining therefrom at least one target object requiring gesture detection;
predicting the posture of a target object in at least one frame of video image of the at least two frames of video images by combining the time sequence of the at least two frames of video images to obtain a posture prediction result;
recognizing the gesture of a target object in at least one frame of video image of the at least two frames of video images to obtain a gesture recognition result;
and acquiring a posture detection result of the at least one target object based on the posture prediction result and the posture recognition result.
CN201911344827.3A 2019-12-24 2019-12-24 Gesture detection method, gesture detection device and computer-readable storage medium Active CN113033252B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911344827.3A CN113033252B (en) 2019-12-24 2019-12-24 Gesture detection method, gesture detection device and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911344827.3A CN113033252B (en) 2019-12-24 2019-12-24 Gesture detection method, gesture detection device and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN113033252A true CN113033252A (en) 2021-06-25
CN113033252B CN113033252B (en) 2024-06-28

Family

ID=76451502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911344827.3A Active CN113033252B (en) 2019-12-24 2019-12-24 Gesture detection method, gesture detection device and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113033252B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190050996A1 (en) * 2017-08-04 2019-02-14 Intel Corporation Methods and apparatus to generate temporal representations for action recognition systems
CN109598229A (en) * 2018-11-30 2019-04-09 李刚毅 Monitoring system and its method based on action recognition
CN109614882A (en) * 2018-11-19 2019-04-12 浙江大学 A kind of act of violence detection system and method based on human body attitude estimation
CN109670474A (en) * 2018-12-28 2019-04-23 广东工业大学 A kind of estimation method of human posture based on video, device and equipment

Also Published As

Publication number Publication date
CN113033252B (en) 2024-06-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant