WO2022152001A1 - Gesture recognition method and apparatus, electronic device, readable storage medium and chip - Google Patents

Gesture recognition method and apparatus, electronic device, readable storage medium and chip

Info

Publication number
WO2022152001A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
image
feature points
vector
feature vector
Prior art date
Application number
PCT/CN2021/143855
Other languages
English (en)
French (fr)
Inventor
郭桦
毛芳勤
Original Assignee
维沃移动通信有限公司 (Vivo Mobile Communication Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 维沃移动通信有限公司 (Vivo Mobile Communication Co., Ltd.)
Priority to EP21919188.9A (EP4273745A4)
Publication of WO2022152001A1
Priority to US18/222,476 (US20230360443A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/74 Determining position or orientation of objects or cameras using feature-based methods involving reference images or patches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/113 Recognition of static hand signs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0499 Feedforward networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • the present application relates to the technical field of image recognition, and in particular, to a gesture recognition method, a gesture recognition device, an electronic device, a readable storage medium and a chip.
  • gesture recognition is mainly divided into two types: gesture recognition based on the original red, green and blue color mode (RGB) image and gesture recognition based on hand key points.
  • gesture recognition based on the original RGB image mainly obtains the gesture category by directly performing image classification on the acquired gesture region pictures, while gesture recognition based on hand key points obtains the gesture category by modeling the positional relationship of the 21 key points of the hand; however, the above recognition methods still have the problem of low recognition accuracy.
  • the present application discloses a gesture recognition method, a gesture recognition device, an electronic device, a readable storage medium and a chip to solve the problem of low gesture recognition accuracy in the related art.
  • a first aspect of the present application provides a gesture recognition method, comprising: acquiring a hand region sub-image in a target image, and determining multiple feature point position information corresponding to multiple feature points in the hand region sub-image; determining a first position feature vector according to the multiple feature point position information, wherein the first position feature vector represents the relative positional relationship of any one of the multiple feature points with respect to the remaining feature points; processing the multiple feature point position information through a first multilayer perceptron to obtain a second position feature vector of the multiple feature points in the hand region sub-image; and outputting a recognition result of the hand region sub-image according to the first position feature vector and the second position feature vector.
  • a second aspect of the present application provides a gesture recognition device, comprising: an acquisition unit, configured to acquire a hand region sub-image in a target image and determine multiple feature point position information corresponding to multiple feature points in the hand region sub-image; a feature determination unit, configured to determine, according to the multiple feature point position information, a first position feature vector of the relative positional relationship between the multiple feature points, and to process the multiple feature point position information through a first multilayer perceptron to obtain a second position feature vector of the multiple feature points in the hand region sub-image; and an output unit, configured to output a recognition result of the hand region sub-image according to the first position feature vector and the second position feature vector.
  • a third aspect of the present application provides an electronic device, including a processor, a memory, and a program or instruction stored in the memory and executable on the processor, where the program or instruction, when executed by the processor, implements the steps of the gesture recognition method of the first aspect.
  • a fourth aspect of the present application provides a readable storage medium, in which a program or an instruction is stored, and when the program or instruction is executed by a processor, the steps of the gesture recognition method of the above-mentioned first aspect are implemented.
  • a fifth aspect of the present application provides a chip, including a processor and a communication interface, the communication interface and the processor are coupled, and the processor is configured to run programs or instructions to implement the steps of the gesture recognition method of the first aspect.
  • the gesture recognition method proposed in the present application first acquires the hand region sub-image in the image to be processed and determines the multiple feature point position information corresponding to the multiple feature points in the hand region sub-image, then calculates a first position feature vector and a second position feature vector from the multiple feature point position information, and finally determines the recognition result of the hand region sub-image according to the first position feature vector and the second position feature vector.
  • since the completion of a hand action requires the traction of the bones, the bone joint points can be selected as the feature points of the hand region sub-image, and the action category can be accurately identified through the multiple feature points.
  • the first position feature vector retains the relative positional relationship between the multiple feature points and excludes the influence of different viewing angles on the recognition of the action type; that is, for the same action, no matter from which viewing angle the target image is obtained, once the hand region sub-image is determined and the multiple feature points are determined, the obtained first position feature vectors of the multiple feature points are exactly the same.
  • the second position feature vector retains the absolute positions of multiple feature points in the sub-image of the hand region.
  • This gesture recognition method takes into account both the relative positional relationship between the multiple feature points and the absolute positions of the multiple feature points in the hand region sub-image, which effectively solves the problem of misrecognition under various viewing-angle changes and improves the stability of action recognition.
  • FIG. 1 shows one of the schematic flowcharts of the gesture recognition method according to the embodiment of the present application
  • Fig. 2 shows one of the sub-images of the hand region obtained at different viewing angles
  • Fig. 3 shows the second sub-images of the hand region obtained from different viewing angles
  • Fig. 4 shows the third sub-image of the hand region obtained from different viewing angles
  • FIG. 5 shows the second schematic flowchart of the gesture recognition method according to the embodiment of the present application
  • FIG. 6 shows the third schematic flowchart of the gesture recognition method according to the embodiment of the present application.
  • FIG. 7 shows a schematic diagram of feature points of a gesture recognition method according to an embodiment of the present application.
  • FIG. 8 shows a fourth schematic flowchart of a gesture recognition method according to an embodiment of the present application.
  • FIG. 9 shows a schematic diagram of a recognition result of the gesture recognition method according to an embodiment of the present application.
  • FIG. 10 shows the fifth schematic flowchart of the gesture recognition method according to the embodiment of the present application.
  • FIG. 11 shows a sixth schematic flowchart of a gesture recognition method according to an embodiment of the present application.
  • FIG. 12 shows a schematic diagram of a hand region sub-image of the gesture recognition method according to an embodiment of the present application
  • FIG. 13 shows one of the schematic structural block diagrams of the gesture recognition apparatus according to the embodiment of the present application.
  • FIG. 14 shows the second schematic block diagram of the structure of the gesture recognition apparatus according to the embodiment of the present application.
  • FIG. 15 shows the third schematic block diagram of the structure of the gesture recognition apparatus according to the embodiment of the present application.
  • FIG. 16 shows the fourth schematic block diagram of the structure of the gesture recognition apparatus according to the embodiment of the present application.
  • FIG. 17 shows the fifth schematic block diagram of the structure of the gesture recognition apparatus according to the embodiment of the present application.
  • FIG. 18 shows the sixth schematic block diagram of the structure of the gesture recognition apparatus according to the embodiment of the present application.
  • FIG. 19 shows a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
  • FIG. 1 shows a schematic flowchart of a gesture recognition method according to an embodiment of the present application.
  • the gesture recognition method of the embodiment of the present application includes:
  • Step S102 obtaining the sub-image of the hand region in the target image, and determining the position information of multiple feature points corresponding to the multiple feature points in the sub-image of the hand region;
  • Step S104 determining a first position feature vector according to the position information of a plurality of feature points
  • Step S106 according to the position information of a plurality of feature points, determine the second position feature vector
  • Step S108 output the recognition result of the sub-image of the hand region according to the first position feature vector and the second position feature vector.
  • the hand region sub-images in the target image are first obtained.
  • M target images can be obtained in any way, for example, M target images of any format and size.
  • the M target images can be detected to obtain the target area in the M target images, that is, the hand area, and then N hand area sub-images can be extracted.
  • in the subsequent recognition process, it is only necessary to extract the feature points and locate the feature point position information in the hand region sub-image, which narrows the recognition range, reduces the amount of calculation in the recognition process, and improves the accuracy and efficiency of recognition. Since the completion of an action requires the traction of the bones, in the embodiments of the present application the bone joint points are used as the feature points of the hand region sub-image, the action category is accurately identified through these feature points, and the recognition result is then determined.
  • after the position information of the multiple feature points is determined, the first position feature vector and the second position feature vector are determined respectively; the first position feature vector indicates the relative positional relationship of any one of the multiple feature points with respect to the remaining feature points, the second position feature vector represents the absolute positional relationship of the multiple feature points in the hand region sub-image, and the recognition result is determined according to the first position feature vector and the second position feature vector.
  • for the same hand posture, the apparent relationship between the feature points in images captured from different viewing angles also differs; for example, the line connecting the first feature point 202 of the thumb and the second feature point 204 of the thumb has a different angular relationship with the line connecting the third feature point 206 and the fourth feature point 208 of the index finger. The three-dimensional hand posture projects differently from each viewing angle, so the relative positional relationship between the feature points in the acquired image has changed.
  • the first position feature vector in the embodiment of the present application extracts the relative positional relationship between multiple feature points, excluding the influence of different perspectives on the recognition of the action type.
  • that is, for the same action, no matter from which viewing angle the target image is obtained, the obtained first position feature vectors of the multiple feature points are exactly the same.
  • the second position feature vector retains the absolute positions of multiple feature points in the sub-image of the hand region.
  • by combining the relative positions between the feature points and the absolute positions of the feature points in the picture, the action type of the hand region sub-image is finally obtained.
  • This gesture recognition method takes into account both the relative positional relationship between the multiple feature points and the absolute positions of the multiple feature points in the hand region sub-image, which effectively solves the problem of misrecognition under various viewing-angle changes and improves the stability of action recognition.
  • FIG. 5 shows a schematic flowchart of a gesture recognition method according to an embodiment of the first aspect of the present application. The method includes:
  • Step S202 obtaining the hand region sub-image in the target image, and determining the position information of multiple feature points corresponding to the multiple feature points in the hand region sub-image;
  • Step S204 establishing a first coordinate matrix according to the position information of a plurality of feature points, and obtaining a first eigenvector corresponding to the maximum eigenvalue of the first coordinate matrix;
  • Step S206 the first feature vector is processed by the first multilayer perceptron to obtain the first position feature vector
  • Step S208 processing the position information of the multiple feature points by the second multilayer perceptron to obtain the second position feature vector of the multiple feature points in the sub-image of the hand region;
  • Step S210 output the recognition result of the sub-image of the hand region according to the first position feature vector and the second position feature vector.
  • a first coordinate matrix is established according to the location information of the multiple feature points, and the first eigenvector corresponding to the largest eigenvalue of the first coordinate matrix is obtained by calculation .
  • the introduction of the first feature vector is to facilitate subsequent operations.
  • the obtained first feature vector is processed by the first multilayer perceptron, which excludes the influence of the viewing angle on the judgment of the action type, so that the relative positional relationship between multiple feature points can be accurately determined under different viewing angles.
  • an element in the first coordinate matrix is the Euclidean distance between any one of the multiple feature points and any one of the remaining feature points among the multiple feature points.
  • the Euclidean distance between the feature points is used as an element in the first coordinate matrix, and the Euclidean distance can represent the true distance between two points in a multi-dimensional space. Therefore, in these embodiments, the Euclidean distance is used in the first coordinate matrix rather than the planar distance of the feature points in the plane perpendicular to the photographing direction, which effectively solves the problem of judging, under different viewing angles, the distance between any one of the multiple feature points and any of the remaining feature points, so that the actual relative positional relationship of the multiple feature points in three-dimensional space can be obtained accurately.
  • the first coordinate matrix is an upper triangular matrix.
  • the element in the first coordinate matrix is the relative distance between the feature point corresponding to the row and the feature point corresponding to the column. For example, the element in the second row and the third column of the matrix may represent the relative distance between the second feature point and the third feature point, and the element in the third row and the second column may represent the relative distance between the third feature point and the second feature point; the values of these two elements are repeated, and a large number of repeated elements increases the complexity of the operation. Therefore, the established first coordinate matrix is a multi-dimensional upper triangular matrix: the elements at the lower left of the diagonal are all 0, and each element at the upper right of the diagonal is the relative distance between the feature point corresponding to its row and the feature point corresponding to its column, which simplifies the operation process.
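  • For illustration only (a sketch under stated assumptions, not code from the application), the first feature vector described above can be computed in Python roughly as follows; the function name is hypothetical, and because a strictly upper-triangular matrix has only zero eigenvalues, the sketch mirrors the matrix into its symmetric form before the eigen decomposition, which is an assumption about the intended computation:

      import numpy as np

      def first_feature_vector(keypoints_xy):
          # keypoints_xy: (21, 2) array of feature-point coordinates in the hand region sub-image.
          pts = np.asarray(keypoints_xy, dtype=np.float64)
          n = len(pts)
          dist = np.zeros((n, n))
          for i in range(n):
              for j in range(i + 1, n):               # fill only the upper right of the diagonal
                  dist[i, j] = np.linalg.norm(pts[i] - pts[j])
          # Mirror into a symmetric matrix before the eigen step (assumption; a strictly
          # upper-triangular matrix would yield only zero eigenvalues).
          sym = dist + dist.T
          eigvals, eigvecs = np.linalg.eigh(sym)
          return eigvecs[:, np.argmax(eigvals)]       # 21-element first feature vector

  • The returned 21-element vector would then be fed to the first multilayer perceptron to obtain the first position feature vector, as described above.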
  • FIG. 6 shows a schematic flowchart of a gesture recognition method according to another embodiment of the first aspect of the present application. The method includes:
  • Step S302 obtaining the sub-image of the hand region in the target image, and determining the position information of multiple feature points corresponding to the multiple feature points in the sub-image of the hand region;
  • Step S304 determining a first position feature vector according to the position information of a plurality of feature points
  • Step S306 obtaining a second feature vector according to the position information of multiple feature points
  • Step S308 processing the second feature vector by the second multilayer perceptron to obtain a second position feature vector
  • Step S310 output the recognition result of the sub-image of the hand region according to the first position feature vector and the second position feature vector.
  • in this embodiment, a second feature vector is determined according to the position information of the multiple feature points, wherein the elements in the second feature vector respectively represent the position information of each feature point in the X direction and the Y direction.
  • the obtained second feature vector is processed by the second multilayer perceptron, so that the number of elements of the obtained second position feature vector is the same as the number of elements in the first position feature vector, so as to facilitate subsequent calculation.
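  • A minimal sketch of this step (the names, the interleaved coordinate ordering, and the hidden-layer sizes are assumptions, not values from the application):

      import numpy as np

      def second_feature_vector(keypoints_xy):
          # Flatten the 21 feature points into 42 elements; interleaved [x1, y1, x2, y2, ...]
          # ordering is an assumption, the text only says x- and y-direction coordinates.
          return np.asarray(keypoints_xy, dtype=np.float32).reshape(-1)

      def mlp_forward(x, layers):
          # Generic multilayer perceptron forward pass: `layers` is a list of (W, b) pairs,
          # with ReLU between layers and a linear output layer (placeholder weights).
          for i, (W, b) in enumerate(layers):
              x = x @ W + b
              if i < len(layers) - 1:
                  x = np.maximum(x, 0.0)
          return x

      # e.g. a second multilayer perceptron mapping 42 -> 64 -> 32 -> 21, so that the second
      # position feature vector has the same number of elements as the first position
      # feature vector (the hidden sizes are assumptions).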
  • FIG. 8 shows a schematic flowchart of a gesture recognition method according to another embodiment of the first aspect of the present application. The method includes:
  • Step S402 acquiring the hand region sub-image in the target image, and determining the position information of multiple feature points corresponding to the multiple feature points in the hand region sub-image;
  • Step S404 determining a first position feature vector according to the position information of a plurality of feature points
  • Step S406 processing the position information of the multiple feature points by the first multilayer perceptron to obtain the second position feature vector of the multiple feature points in the sub-image of the hand region;
  • Step S408 adding the vector value of the first position feature vector and the vector value of the second position feature vector bit by bit to obtain a fusion vector value
  • Step S410 the fusion vector value is processed by the third multilayer perceptron to obtain a classification vector
  • Step S412 the action category corresponding to the maximum value in the classification vector is determined as the recognition result of the hand region sub-image.
  • a specific method for determining the action category of the hand region sub-image based on the first position feature vector and the second position feature vector is proposed. After the first position feature vector and the second position feature vector are obtained, their vector values are added bit by bit, that is, for each feature point the first position feature vector value is added to the second position feature vector value. The fusion vector value obtained by the addition is processed through the third multilayer perceptron to obtain a classification vector. Each element in the classification vector represents the probability that the action in the hand region sub-image conforms to the action type corresponding to that element; therefore, the action category corresponding to the element with the largest value in the classification vector is the action type that the action in the hand region sub-image most likely conforms to, and the recognition result of the gesture recognition is thereby determined.
  • the first position feature vector and the second position feature vector are vectors with the same number of elements; they are added bit by bit during fusion, and the obtained fusion vector is processed by the third multilayer perceptron to obtain a classification vector. Each element of the classification vector represents the probability that the gesture in the hand region sub-image belongs to the action category corresponding to that element, so the action category corresponding to the element with the largest value in the classification vector, that is, the category with the largest probability value, is the action category corresponding to the gesture action in the hand region sub-image. Through the above steps, the output of the gesture recognition result is realized.
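  • A hedged sketch of this fusion and classification step; classify_gesture, the classifier_mlp callable, and action_names are placeholders introduced here for illustration, not interfaces defined in the application:

      import numpy as np

      def classify_gesture(feature1, feature2, classifier_mlp, action_names):
          # feature1, feature2: first and second position feature vectors of equal length.
          fused = feature1 + feature2                  # bit-by-bit (element-wise) addition
          scores = classifier_mlp(fused)               # classification vector, one score per action
          probs = np.exp(scores - scores.max())
          probs /= probs.sum()                         # softmax -> per-category probabilities
          return action_names[int(np.argmax(probs))], probs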
  • FIG. 9 shows a schematic diagram in which the recognition result of the user's gesture is recognized as "palm".
  • FIG. 10 shows a schematic flowchart of a gesture recognition method according to another embodiment of the first aspect of the present application. The method includes:
  • Step S502 obtaining the target area in the target image through a preset neural network model
  • Step S504 determining the sub-image of the hand region according to the target region, and identifying multiple feature points of the sub-image of the hand region through a preset neural network model;
  • Step S506 acquiring feature point position information of multiple feature points
  • Step S508 determining a first position feature vector according to the position information of a plurality of feature points
  • Step S510 processing the position information of the multiple feature points by the first multilayer perceptron to obtain the second position feature vector of the multiple feature points in the hand region sub-image;
  • Step S512 output the recognition result of the sub-image of the hand region according to the first position feature vector and the second position feature vector.
  • in this embodiment, the target image is first processed through a preset neural network model to obtain the target area where the hand region sub-image is located, that is, the neural network model is used to find the hand region sub-image in the original image to be processed and determine its area range. The hand region sub-image is then determined according to the target area; through the area range determined in the previous step, the range is further narrowed and the hand region sub-image is determined. There are multiple feature points on the hand region sub-image, and the action category can be accurately identified through these feature points. The multiple feature points of the hand region sub-image are identified through the preset neural network model, the feature point position information of the multiple feature points is obtained, and the recognition result of the hand region sub-image is then further determined according to the obtained feature point position information.
  • the target image may be processed by the palm detection model in the preset neural network model to obtain the target area where the sub-image of the hand area is located.
  • the acquired target image is recognized by the palm detection model.
  • the palm detection model can obtain, through the matrix operation method in deep learning, the vertex position information of the quadrilateral of the area where the hand region sub-image is located, and can then frame the target area where the hand region sub-image is located, that is, the hand area in this embodiment.
  • the target image is then cropped, retaining the framed target area where the hand region sub-image is located.
  • the hand region sub-image can be determined according to the target area through the feature point detection model in the preset neural network model, the multiple feature points of the hand region sub-image can be recognized through the preset neural network model, and the position information of the multiple feature points of the hand region sub-image can be obtained.
  • the feature point detection model detects the cropped target image to obtain a sub-image of the hand region. Through the matrix operation in deep learning, multiple feature points and their position information of the sub-image of the hand region can be obtained.
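  • A hypothetical sketch of this two-stage detection pipeline; the palm_detector and keypoint_detector interfaces are assumptions introduced for illustration, not APIs defined in the application:

      import numpy as np

      def extract_hand_keypoints(target_image, palm_detector, keypoint_detector):
          # target_image is assumed to be a NumPy H x W x C array.
          # palm_detector is assumed to return the four vertex coordinates of the
          # quadrilateral framing the hand area; keypoint_detector is assumed to return
          # the 21 (x, y) feature points of the cropped hand region sub-image.
          quad = np.asarray(palm_detector(target_image))      # (4, 2) vertex position information
          x0, y0 = quad.min(axis=0).astype(int)
          x1, y1 = quad.max(axis=0).astype(int)
          hand_region_sub_image = target_image[y0:y1, x0:x1]  # crop, keeping the framed target area
          keypoints_xy = keypoint_detector(hand_region_sub_image)
          return hand_region_sub_image, keypoints_xy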
  • traditional gesture recognition methods can be used to smooth and de-jitter the feature points, so that the feature points are more responsive and stable, to avoid the determination of the positions of the multiple feature points being affected by shaking during the shooting process or by degraded imaging quality of the target image.
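  • The application does not name a specific smoothing method; one conventional option is an exponential moving average over successive frames, sketched below as an assumption:

      import numpy as np

      class KeypointSmoother:
          # One common de-jitter option (an assumption): an exponential moving average
          # over the 21 (x, y) feature points of successive frames.
          def __init__(self, alpha=0.5):
              self.alpha = alpha        # smaller alpha = stronger smoothing, slower response
              self.state = None

          def update(self, keypoints_xy):
              pts = np.asarray(keypoints_xy, dtype=np.float32)
              if self.state is None:
                  self.state = pts
              else:
                  self.state = self.alpha * pts + (1.0 - self.alpha) * self.state
              return self.state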
  • FIG. 11 shows a schematic flowchart of a gesture recognition method according to another embodiment of the first aspect of the present application. The method includes:
  • Step S602 receiving a first input
  • Step S604 in response to the first input, acquiring a target image including a sub-image of the hand region;
  • Step S606 obtaining the sub-image of the hand region in the target image, and determining the position information of multiple feature points corresponding to the multiple feature points in the sub-image of the hand region;
  • Step S608 determining a first position feature vector according to the position information of a plurality of feature points
  • Step S610 determining a second position feature vector according to the position information of a plurality of feature points
  • Step S612 output the recognition result of the sub-image of the hand region according to the first position feature vector and the second position feature vector.
  • a first input is received, and in response to the first input, a target image including the hand region sub-image is acquired.
  • if the first input is not received, the image to be processed is not acquired and subsequent operations are not performed, so as to avoid the huge amount of calculation caused by frequent unnecessary gesture recognition and reduce the calculation load.
  • the first input may be a screen input or a voice input, and different input modes can cope with a variety of different usage scenarios, bringing a better experience to the user.
  • the received first input may be the input on the screen, and may be input by the user by clicking on the touch screen.
  • A schematic diagram of the target image is shown in FIG. 12: the user opens the camera preview and starts the photo function. The user's click action on the screen is received and used as the first input; then, in response to the received first input, the photo preview function is started. In the photo preview mode, the user makes a gesture and brings the hand into the shooting range of the lens, so that the target image containing the hand region sub-image can be obtained.
  • a method for performing gesture recognition when taking pictures of a hand under different viewing angles is provided, which may specifically include the following steps.
  • the user opens the camera preview, activates the palm photo function, and receives and responds to the first input. Then load three deep learning neural network models and initialize the preset neural network models.
  • the preset neural network model includes a palm detection model, a feature point detection model, and a gesture classification model, wherein the palm detection model can obtain the position of the palm region in the photo, the feature point detection model can obtain the position information of the gesture feature points, and the gesture classification model can determine the type of the gesture action.
  • the palm detection model is used to identify the acquired target image, and the vertex position information of the quadrilateral of the hand area can be obtained through the matrix operation in deep learning, and then the hand area can be framed, that is, the target area where the sub-image of the hand area is obtained.
  • the position information of the 21 feature points of the gesture can be obtained, that is, to obtain multiple feature points of the sub-image of the hand region.
  • the schematic diagram of the 21 feature points of the gesture is shown in FIG. 7 .
  • the shape of the same gesture in the photo is often different due to different perspectives, and the position coordinates of the gesture feature points are also different.
  • Figures 2 to 4 show the changes of the gesture in various perspectives.
  • the heuristic artificial features in the related art are generally rule-based methods.
  • the key feature points are the first feature point 202 of the thumb, the second feature point 204 of the thumb, the third feature point 206 of the index finger, and the fourth feature point 208 of the index finger.
  • a 21*21 upper triangular matrix is calculated, where each element of the matrix represents the Euclidean distance between two feature points; the eigenvector corresponding to the maximum eigenvalue is extracted from the matrix to obtain the first feature vector containing 21 elements; the multilayer perceptron is then used to further extract features from the first feature vector, obtaining the first position feature vector of the relative positional relationship between the feature points, that is, the view-invariant feature vector.
  • the perspective invariant feature feature1 retains the relative positional relationship between feature points.
  • the original feature point information needs to be passed through a supervised multilayer perceptron to automatically learn feature2.
  • specifically, a second feature vector needs to be defined: the second feature vector contains 42 elements, representing the coordinates of each feature point in the x and y directions; it is then passed through three layers of multilayer perceptrons to obtain an automatically learned feature vector containing 21 elements, that is, the second position feature vector of the feature points in the hand region sub-image.
  • the first position feature vector, that is, the view-invariant feature vector, and the second position feature vector, that is, the automatically learned feature vector, are vectors of the same dimension; they are added bit by bit during fusion, and the obtained fusion vector passes through two layers of multilayer perceptrons to obtain the final classification vector, which represents the probability of belonging to each gesture category; the category with the highest probability is the corresponding gesture. Through the above steps, the output of the gesture category is realized.
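  • To make the overall structure concrete, the following PyTorch-style sketch mirrors the two-branch arrangement described above (view-invariant branch, automatically learned branch, element-wise fusion, two-layer classifier); the hidden sizes and the number of gesture categories are assumptions, not values disclosed in the application:

      import torch
      import torch.nn as nn

      class TwoBranchGestureNet(nn.Module):
          # Illustrative sketch only; layer widths and num_classes are assumptions.
          def __init__(self, num_points=21, num_classes=10):
              super().__init__()
              # Branch for the view-invariant feature (feature1): MLP over the 21-element
              # eigenvector of the distance matrix.
              self.invariant_mlp = nn.Sequential(
                  nn.Linear(num_points, 64), nn.ReLU(),
                  nn.Linear(64, num_points),
              )
              # Branch for the automatically learned feature (feature2): three-layer MLP
              # over the 42 raw coordinates.
              self.learned_mlp = nn.Sequential(
                  nn.Linear(2 * num_points, 64), nn.ReLU(),
                  nn.Linear(64, 64), nn.ReLU(),
                  nn.Linear(64, num_points),
              )
              # Two-layer MLP over the fused vector, producing the classification vector.
              self.classifier = nn.Sequential(
                  nn.Linear(num_points, 64), nn.ReLU(),
                  nn.Linear(64, num_classes),
              )

          def forward(self, eigvec, coords):
              feature1 = self.invariant_mlp(eigvec)    # first position feature vector
              feature2 = self.learned_mlp(coords)      # second position feature vector
              fused = feature1 + feature2              # bit-by-bit addition
              return self.classifier(fused).softmax(dim=-1)

  • Given per-frame keypoints, the eigenvector sketch shown earlier would supply eigvec and the flattened coordinates would supply coords; the softmax output corresponds to the classification vector whose largest element indicates the recognized gesture.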
  • an embodiment of the second aspect of the present application proposes a gesture recognition device 100, including: an acquisition unit 110, configured to acquire the hand region sub-image in the target image and determine the multiple feature point position information corresponding to the multiple feature points in the hand region sub-image; a feature determination unit 120, configured to determine, according to the multiple feature point position information, the first position feature vector of the relative positional relationship between the multiple feature points, and to process the multiple feature point position information through the first multilayer perceptron to obtain the second position feature vector of the multiple feature points in the hand region sub-image; and an output unit 130, configured to output the recognition result of the hand region sub-image according to the first position feature vector and the second position feature vector.
  • the acquisition unit 110 acquires the hand region sub-image and determines the position information of the multiple feature points in the hand region sub-image; the feature determination unit 120 determines the first position feature vector and the second position feature vector, that is, it determines the relative positions between the feature points and the absolute positions of the feature points in the image; the output unit 130 performs the corresponding operation processing on the first position feature vector and the second position feature vector determined by the feature determination unit 120, and determines and outputs the recognition result of the hand region sub-image.
  • the gesture recognition device considers the relationship between multiple feature points, effectively solves the problem of misrecognition in the case of various perspective changes, and improves the stability of action recognition.
  • the feature determination unit 120 includes: a first feature acquisition subunit 122, configured to establish a first coordinate matrix according to the position information of the multiple feature points and obtain the first feature vector corresponding to the largest eigenvalue of the first coordinate matrix; and a first feature determination subunit 124, configured to process the first feature vector through the first multilayer perceptron to obtain the first position feature vector.
  • after the position information of the multiple feature points is determined, the first feature acquisition subunit 122 establishes a first coordinate matrix according to the position information of the multiple feature points, and obtains by calculation the first feature vector corresponding to the maximum eigenvalue of the first coordinate matrix.
  • the introduction of the first eigenvector is to select a good set of bases to facilitate subsequent operations.
  • the first feature determination subunit 124 processes the obtained first feature vector through the first multi-layer perceptron, which excludes the influence of the viewing angle on the judgment of the action type, so that the action type can be accurately identified under different viewing angles.
  • an element in the first coordinate matrix is the Euclidean distance between any two feature points in the plurality of feature points.
  • the Euclidean distance between the feature points is used as an element in the first coordinate matrix, and the Euclidean distance can represent the real distance between two points in a multi-dimensional space. Therefore, in these embodiments, the Euclidean distance is used instead of the planar distance of the feature points in the plane perpendicular to the photographing direction, which can effectively solve the problem of judging the distance between any two feature points under different viewing angles.
  • the first coordinate matrix is an upper triangular matrix.
  • the element in the first coordinate matrix is the relative distance between the feature point corresponding to the row and the feature point corresponding to the column.
  • for example, the element in the second row and the third column of the matrix may represent the relative distance between the second feature point and the third feature point, and the element in the third row and the second column may represent the relative distance between the third feature point and the second feature point; the values of these two elements are repeated, and a large number of repeated elements increases the complexity of the operation. Therefore, in these embodiments, the established first coordinate matrix is a multi-dimensional upper triangular matrix, the elements at the lower left of the diagonal are all 0, and each element at the upper right of the diagonal is the relative distance between the feature point corresponding to its row and the feature point corresponding to its column, which simplifies the operation process.
  • the feature determination unit 120 further includes: a second feature acquisition subunit 126, configured to obtain a second feature vector according to the position information of the multiple feature points; and a second feature determination subunit 128, configured to process the second feature vector through the second multilayer perceptron to obtain the second position feature vector.
  • the second feature acquisition subunit 126 determines a second feature vector according to the position information of the multiple feature points, wherein the elements in the second feature vector respectively represent the position of each feature point in the x and y directions.
  • the second feature determination subunit 128 processes the obtained second feature vector through the second multilayer perceptron, so that the number of elements in the obtained second position feature vector is the same as the number of elements in the first position feature vector, which facilitates subsequent calculation.
  • the output unit 130 further includes: a fusion unit 132, configured to add the vector value of the first position feature vector and the vector value of the second position feature vector bit by bit to obtain a fusion vector value; a processing unit 134, configured to process the fusion vector value through the third multilayer perceptron to obtain a classification vector; and a determination unit 136, configured to determine the action category corresponding to the maximum value in the classification vector as the recognition result of the hand region sub-image.
  • the unit structure of the output unit 130 for determining the recognition result of the hand region sub-image according to the first position feature vector and the second position feature vector is proposed.
  • after the fusion unit 132 obtains the first position feature vector and the second position feature vector, the vector values of the two vectors are added bit by bit, that is, for each feature point the first position feature vector value is added to the second position feature vector value.
  • the processing unit 134 processes the obtained fusion vector value through the third multilayer perceptron to obtain a classification vector, and each element in the classification vector represents the probability that the action in the hand region sub-image conforms to the action type corresponding to that element; the determination unit 136 therefore selects the action category corresponding to the maximum value in the classification vector, that is, the action category that the action in the hand region sub-image most likely matches.
  • the acquisition unit 110 further includes: a region acquisition subunit 112, configured to process the target image through a preset neural network model to obtain the target area where the hand region sub-image is located; a feature point acquisition subunit 114, configured to determine the hand region sub-image according to the target area and identify the multiple feature points of the hand region sub-image through the preset neural network model; and a position information acquisition subunit 116, configured to acquire the feature point position information of the multiple feature points.
  • the region acquisition subunit 112 processes the target image through the preset neural network model to obtain the target area where the hand region sub-image is located, that is, it uses the neural network model to find the hand region sub-image in the original image to be processed and determine its area range. The feature point acquisition subunit 114 then determines the hand region sub-image according to the target area; through the area range determined in the previous step, the range is further narrowed and the hand region sub-image is determined. There are multiple feature points on the hand region sub-image, and the action category can be accurately identified through these feature points.
  • the feature point acquisition subunit 114 identifies the multiple feature points of the hand region sub-image through the preset neural network model; the position information acquisition subunit 116 acquires the feature point position information of the multiple feature points, and the recognition result of the image can then be further determined according to the acquired feature point position information.
  • the gesture recognition apparatus 100 further includes: a receiving unit 140, configured to receive a first input; and a response unit 150, configured to acquire, in response to the first input, a target image including the hand region sub-image.
  • before the hand region sub-image in the target image is acquired, the receiving unit 140 first receives the first input, and the response unit 150 acquires, in response to the first input, the target image including the hand region sub-image.
  • the receiving unit 140 does not receive the first input, the gesture recognition apparatus 100 does not acquire the to-be-processed image and does not perform subsequent operations, thereby avoiding the huge computational load caused by frequent unnecessary gesture recognition and reducing the computational load.
  • the first input may be a screen input or a voice input, and different input modes can cope with a variety of different usage scenarios, bringing a better experience to the user.
  • FIG. 19 shows a schematic diagram of a hardware structure of an electronic device according to an embodiment of the application.
  • the electronic device 1900 provided in this embodiment of the present application may be, for example, a mobile phone, a notebook computer, a tablet computer, or the like.
  • the electronic device 1900 includes but is not limited to: a radio frequency unit 1901, a network module 1902, an audio output unit 1903, an input unit 1904, a sensor 1905, a display unit 1906, a user input unit 1907, an interface unit 1908, a memory 1909, a processor 1910, and other components.
  • the electronic device 1900 may also include a power source 1911 (such as a battery) for supplying power to various components, and the power source may be logically connected to the processor 1910 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption through the power management system.
  • the structure of the electronic device shown in FIG. 19 does not constitute a limitation on the electronic device, and the electronic device may include more or fewer components than those shown in the figure, or combine some components, or use a different arrangement of components, which will not be repeated here.
  • the processor 1910 is configured to determine the multiple feature point position information corresponding to the multiple feature points in the hand region sub-image; determine, according to the multiple feature point position information, the first position feature vector of the relative positional relationship between the multiple feature points; process the multiple feature point position information through the first multilayer perceptron to obtain the second position feature vector of the multiple feature points in the hand region sub-image; and output the recognition result of the hand region sub-image according to the first position feature vector and the second position feature vector.
  • the radio frequency unit 1901 may be used to send and receive information or send and receive signals during a call, and specifically, receive downlink data from the base station or send uplink data to the base station.
  • the radio frequency unit 1901 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the network module 1902 provides the user with wireless broadband Internet access, such as helping the user to send and receive emails, browse the web, and access streaming media.
  • the audio output unit 1903 may convert audio data received by the radio frequency unit 1901 or the network module 1902 or stored in the memory 1909 into audio signals and output as sound. Also, the audio output unit 1903 may also provide audio output related to a specific function performed by the electronic device 1900 (eg, call signal reception sound, message reception sound, etc.).
  • the audio output unit 1903 includes a speaker, a buzzer, a receiver, and the like.
  • the input unit 1904 is used to receive audio or video signals.
  • the input unit 1904 may include a graphics processor (Graphics Processing Unit, GPU) 5082 and a microphone 5084; the graphics processor 5082 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the processed image frames may be displayed on the display unit 1906, or stored in the memory 1909 (or other storage medium), or transmitted via the radio frequency unit 1901 or the network module 1902.
  • the microphone 5084 can receive sound, and can process the sound into audio data, and the processed audio data can be output in a format that can be transmitted to a mobile communication base station via the radio frequency unit 1901 in the case of a phone call mode.
  • the electronic device 1900 also includes at least one sensor 1905, such as a fingerprint sensor, pressure sensor, iris sensor, molecular sensor, gyroscope, barometer, hygrometer, thermometer, infrared sensor, light sensor, motion sensor, and other sensors.
  • the display unit 1906 is used to display information input by the user or information provided to the user.
  • the display unit 1906 may include a display panel 5122, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 1907 may be used to receive input numerical or character information, and generate key signal input related to user settings and function control of the electronic device.
  • the user input unit 1907 includes a touch panel 5142 and other input devices 5144 .
  • the touch panel 5142, also referred to as a touch screen, can collect the user's touch operations on or near it.
  • the touch panel 5142 may include two parts, a touch detection device and a touch controller, wherein the touch detection device detects the user's touch orientation and the signal brought by the touch operation and transmits the signal to the touch controller, and the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and then sends them to the processor 1910.
  • Other input devices 5144 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which are not described herein again.
  • the touch panel 5142 can be overlaid on the display panel 5122.
  • when the touch panel 5142 detects a touch operation on or near it, it transmits the operation to the processor 1910 to determine the type of the touch event, and the processor 1910 then provides a corresponding visual output on the display panel 5122 according to the type of the touch event.
  • the touch panel 5142 and the display panel 5122 can be used as two independent components, or can be integrated into one component.
  • the interface unit 1908 is an interface for connecting an external device to the electronic device 1900 .
  • external devices may include wired or wireless headset ports, external power (or battery charger) ports, wired or wireless data ports, memory card ports, ports for connecting devices with identification modules, audio input/output (I/O) ports, video I/O ports, headphone ports, and more.
  • the interface unit 1908 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic device 1900, or may be used to transfer data between the electronic device 1900 and an external device.
  • the memory 1909 may be used to store software programs as well as various data.
  • the memory 1909 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the data storage area may store data created according to the use of the mobile terminal (such as audio data, a phone book, etc.), and the like.
  • the memory 1909 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the processor 1910 executes various functions of the electronic device 1900 and processes data by running or executing the software programs and/or modules stored in the memory 1909, and calling the data stored in the memory 1909, so as to monitor the electronic device 1900 as a whole .
  • the processor 1910 may include one or more processing units; optionally, the processor 1910 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication.
  • the electronic device 1900 may also include a power supply 1911 for supplying power to various components.
  • the power supply 1911 may be logically connected to the processor 1910 through a power management system, so as to implement functions such as managing charging, discharging, and power consumption through the power management system.
  • the embodiments of the present application further provide a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the above gesture recognition method are implemented.
  • the processor is the processor in the electronic device in the above embodiment.
  • a readable storage medium includes a computer-readable storage medium; examples of the computer-readable storage medium include non-transitory computer-readable storage media, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.
  • the embodiment of the present application also proposes a chip, which includes a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or an instruction to implement the steps of the gesture recognition method of the first aspect, so the chip has all the beneficial effects of the gesture recognition method, which will not be repeated here.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

一种手势识别方法、手势识别装置、电子设备、可读存储介质和芯片,方法包括:获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息(S102);根据多个特征点位置信息,确定第一位置特征向量(S104),第一位置特征向量表示多个特征点中的任一个特征点相对于多个特征点中其余特征点的相对位置关系;根据多个特征点位置信息,确定第二位置特征向量(S106),第二位置特征向量表示多个特征点在手部区域子图像中的绝对位置关系;根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果(S108)。

Description

手势识别方法和装置、电子设备、可读存储介质和芯片
相关申请的交叉引用
本申请要求享有于2021年01月15日提交的中国专利申请202110057731.X的优先权,该申请的全部内容通过引用并入本文中。
技术领域
本申请涉及图像识别技术领域,具体而言,涉及手势识别方法、手势识别装置、电子设备、可读存储介质和芯片。
背景技术
目前,在移动端用户的人机交互中,除了常见的触屏交互,基于手势的交互开始受到了越来越多的重视,随着手机增强现实(Augmented Reality,AR)、虚拟现实(Virtual Reality,VR)能力的发展,手势交互慢慢成为一个不可替代的趋势。目前手势识别主要分为两种类型:基于原始红绿蓝色彩模式(RGB color mode,RGB)图像的手势识别以及基于手部关键点的手势识别。其中基于原始RGB图像手势识别主要是通过对获取的手势区域图片直接进行图像分类,来获取手势的类别,基于手部关键点的手势识别通过手部21个关键点的位置关系进行建模来获取手势的类别,但以上识别方法,仍旧存在识别准确度低的问题。
发明内容
本申请公开了一种手势识别方法、手势识别装置、电子设备、可读存储介质和芯片,以解决相关技术中手势识别准确度低的问题。
为了解决上述技术问题,本申请提供了如下的实施例。
本申请的第一方面提出了一种手势识别方法,包括:获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;根据多个特征点位置信息,确定第一位置特征向量,第一位置特征向量表示多个特征点中的任一个特征点相对于多个特征点中其余特征点的相对位置关系,通过第一多层感知机对多个特征点位置信息进行处理,得到多个特征点在手部区域子图像中的第二位置特征向量;根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果。
本申请的第二方面提出了一种手势识别装置,包括:获取单元,用于获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;特征确定单元,用于根据多个特征点位置信息,确定多个特征点之间相对位置关系的第一位置特征向量;通过第一多层感知机对多个特征点位置信息进行处理,以得到多个特征点在手部区域子图像中的第二位置特征向量;输出单元,用于根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果。
本申请的第三方面提出了一种电子设备,包括处理器,存储器及存储在存储器上并可在处理器上运行的程序或指令,程序或指令被处理器执行时实现上述第一方面的手势识别方法的步骤。
本申请的第四方面提出了一种可读存储介质,可读存储介质上存储程序或指令,程序或指令被处理器执行时实现如上述第一方面的手势识别方法的步骤。
本申请的第五方面提出了一种芯片,包括处理器和通信接口,通信接口和处理器耦合,处理器用于运行程序或指令,实现如上述第一方面的手势识别方法的步骤。
本申请提出的手势识别方法,首先获取待处理的图像中的手部区域子图像,并确定手部区域子图像中的多个特征点对应的多个特征点位置信息,再根据多个特征点对应的多个特征点位置信息通过计算分别得到第一位置特征向量和第二位置特征向量,最后根据第一位置特征向量和第二位置特征向量,确定手部区域子图像的识别结果。在这种手势识别方法中,由于 手部动作的完成需要骨骼的牵引,因此可以选择骨关节点作为手部区域子图像的特征点,通过多个特征点来精准识别动作类别。在确定了手部区域子图像中多个特征点的位置信息后,根据确定的特征点位置信息,分别确定多个特征点之间相对位置关系的第一位置特征向量和多个特征点在手部区域子图像中的第二位置特征向量。其中,第一位置特征向量保留了多个特征点之间的相对位置关系,排除了视角不同对识别动作类型的影响,也就是说,对于相同的动作,无论在什么视角获取目标图像,在得到手部区域子图像并确定多个特征点后,得到的多个特征点的第一位置特征向量是完全相同的。而第二位置特征向量保留了多个特征点在手部区域子图像中的绝对位置,通过对第一位置特征向量和第二特征位置的综合判断,也就是结合了多特征点之间的相对位置和特征点在图片中的绝对位置,最终得到手部区域子图像的动作类型。这种手势识别的方法,考虑了多个特征点之间的相对位置关系和多个特征点在手部区域子图像中的绝对位置关系,有效解决了在各种视角变换的情况下的误识别问题,提高了动作识别的稳定性。
本申请的附加方面和优点将在下面的描述部分中变得明显,或通过本申请的实践了解到。
附图说明
本申请的上述和/或附加的方面和优点从结合下面附图对实施例的描述中将变得明显和容易理解,其中:
图1示出了本申请的实施例的手势识别方法的流程示意图之一;
图2示出了在不同视角获取到的手部区域子图像之一;
图3示出了在不同视角获取到的手部区域子图像之二;
图4示出了在不同视角获取到的手部区域子图像之三;
图5示出了本申请的实施例的手势识别方法的流程示意图之二;
图6示出了本申请的实施例的手势识别方法的流程示意图之三;
图7示出了本申请的实施例的手势识别方法的特征点示意图;
图8示出了本申请的实施例的手势识别方法的流程示意图之四;
图9示出了本申请的实施例的手势识别方法的识别结果示意图;
图10示出了本申请实施例的手势识别方法的流程示意图之五;
图11示出了本申请实施例的手势识别方法的流程示意图之六;
图12示出了本申请实施例的手势识别方法的手部区域子图像的示意图;
图13示出了本申请实施例的手势识别装置的结构示意框图之一;
图14示出了本申请实施例的手势识别装置的结构示意框图之二;
图15示出了本申请实施例的手势识别装置的结构示意框图之三;
图16示出了本申请实施例的手势识别装置的结构示意框图之四;
图17示出了本申请实施例的手势识别装置的结构示意框图之五;
图18示出了本申请实施例的手势识别装置的结构示意框图之六;
图19示出了本申请实施例的电子设备的硬件结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的说明书和权利要求书中的术语“第一”、“第二”等是用于区别类似的对象,而不用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便本申请的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,说明书以及权利要求中“和/或”表示所连接对象的至少其中之一,字符“/”,一般表示前后关联对象是一种“或”的关系。
下面结合附图,通过具体的实施例及其应用场景对本申请实施例提供的手势识别方法、图像处理装置、电子设备和可读存储介质进行详细地说明。
本申请第一方面的实施例,提出一种手势识别方法,图1示出了本申请的一个实施例的手势识别方法的流程示意图。其中,本申请实施例的手势识别方法包括:
步骤S102,获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;
步骤S104,根据多个特征点位置信息,确定第一位置特征向量;
步骤S106,根据多个特征点位置信息,确定第二位置特征向量;
步骤S108,根据第一位置特征向量和第二位置特征向量,输出对手部区域子图像的识别结果。
在本申请的实施例中,首先获取目标图像中的手部区域子图像,示例地,可以通过任意方式获取M张目标图像,例如可以通过网络下载、相机拍摄、视频截取等方式获取任意格式及大小的M张目标图像。在进行特征点识别之前,可以对M张目标图像进行检测,以获得M张目标图像中的目标区域也即手部区域,进而提取出N张手部区域子图像,在后续识别过程中,只需要在手部区域子图像中提取特征点以及定位特征点的位置信息即可,可以缩小识别范围,减少识别过程中的计算量,提高识别的准确度和识别效率。由于动作的完成需要骨骼的牵引,因此本申请的实施例中,将骨关节点作为手部区域子图像的特征点,通过这些特征点来精准识别动作类别,进而确定识别结果。
确定了手部区域子图像中多个特征点的位置信息后,根据确定的特征点位置信息,分别确定第一位置特征向量,第一位置特征向量表示多个特征点中的任一个特征点相对于多个特征点中其余特征点的相对位置关系,第二位置特征向量表示多个特征点在手部区域子图像中的绝对位置关系,再根据第一位置特征向量和第二位置特征向量确定手部区域子图像的识别结果。
如图2、图3和图4所示，由于获取手部区域子图像的视角不同，对于同一手部姿态进行拍摄的图像中的特征点之间的相对关系也不相同，如大拇指的第一特征点202和第二特征点204的连线，与食指的第三特征点206和第四特征点208的连线之间具有不同的角度关系，手部姿态在立体空间中的关系在不同视角获取的图像中特征点之间的相对位置关系发生了变化。本申请实施例中的第一位置特征向量体现了多个特征点之间的相对位置关系，排除了视角不同对识别动作类型的影响，也就是说，对于相同的动作，无论在什么视角获取目标图像，在得到手部区域子图像并确定多个特征点后，得到的多个特征点的第一位置特征向量是完全相同的。而第二位置特征向量保留了多个特征点在手部区域子图像中的绝对位置，通过对第一位置特征向量和第二位置特征向量的综合判断，也就是结合了多个特征点之间的相对位置和特征点在图片中的绝对位置，最终得到手部区域子图像的动作类型。这种手势识别的方法，考虑了多个特征点之间的相对位置关系和多个特征点在手部区域子图像中的绝对位置关系，有效解决了在各种视角变换的情况下的误识别问题，提高了动作识别的稳定性。
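The view-invariance argument above can be checked numerically for the in-plane case. The following Python snippet is only an illustrative sketch: the 21 random keypoints, the rotation angle, and the translation are made-up test data rather than values from this application; it merely verifies that pairwise Euclidean distances stay unchanged when the whole hand is rotated and translated in the image plane, while the absolute coordinates change.

```python
import numpy as np

rng = np.random.default_rng(0)
keypoints = rng.uniform(0.0, 1.0, size=(21, 2))    # mock (x, y) positions of 21 hand keypoints

def pairwise_distances(points):
    # Euclidean distance between every pair of keypoints
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

theta = np.deg2rad(37.0)                           # an arbitrary in-plane rotation
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
moved = keypoints @ rot.T + np.array([0.3, -0.1])  # rotate and translate the whole hand

# Relative positions (pairwise distances) are unchanged; absolute coordinates are not.
print(np.allclose(pairwise_distances(keypoints), pairwise_distances(moved)))   # True
print(np.allclose(keypoints, moved))                                           # False
```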
如图5所示,图5示出了本申请第一方面的一个实施例的手势识别方法的流程示意图。其中,该方法包括:
步骤S202,获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;
步骤S204,根据多个特征点位置信息建立第一坐标矩阵,获取第一坐标矩阵的最大特征值所对应的第一特征向量;
步骤S206,通过第一多层感知机对第一特征向量进行处理,得到第一位置特征向量;
步骤S208,通过第二多层感知机对多个特征点位置信息进行处理,得到多个特征点在手部区域子图像中的第二位置特征向量;
步骤S210,根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果。
在这些实施例中,在确定多个特征点的位置信息后,根据多个特征点的位置信息建立第一坐标矩阵,并通过计算得到第一坐标矩阵的最大特征值所对应的第一特征向量。第一特征向量的引入是为了便于进行后续运算。将得到的第一特征向量通过第一多层感知机进行处理,排除了视角对动作类型判断的影响,使得在不同视角下均可以准确确定多个特征点之间的相对位置关系。
可选地,在一些实施例中,第一坐标矩阵中的元素为多个特征点中的任 一个特征点相对于多个特征点中其余特征点中任一个特征点之间的欧式距离。
在这些实施例中,使用特征点之间的欧式距离作为第一坐标矩阵中的元素,欧式距离可以表示多维空间中两点之间的真实距离。因此,在这些实施例中第一坐标矩阵中引用了欧式距离,而不是特征点在垂直拍照设备方向上的平面距离,可以有效解决在不同视角情况下多个特征点中的任一个特征点相对于多个特征点中其余特征点中任一个特征点之间的距离判断的真实性的问题,从而能够准确的获得多个特征点在立体空间中的实际相对位置关系。
可选地,在一些实施例中,第一坐标矩阵为上三角矩阵。
示例地,第一坐标矩阵中元素是所在行对应的特征点与所在列对应的特征点之间的相对距离,例如,矩阵第二行第三列的元素可表示第二个特征点和第三个特征点之间的相对距离,矩阵第三行第二列的元素可表示第三个特征点和第二个特征点之间的相对距离,可以知道,这两个元素值是重复的,而大量重复元素会增加运算的复杂度。因此,在这些实施例中,建立的第一坐标矩阵是多维上三角矩阵,其对角线左下方的元素全部为0,而对角线右上方的元素则是所在行对应的特征点与所在列对应的特征点之间的相对距离,起到了简化运算过程的作用。
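A minimal NumPy sketch of this branch is given below, under explicit assumptions: the two-layer perceptron weights are untrained placeholders, the hidden size of 64 is arbitrary, and the upper-triangular matrix is symmetrized before the eigendecomposition so that "the eigenvector of the largest eigenvalue" is well defined (the text itself does not state this step).

```python
import numpy as np

def first_position_feature(keypoints, w1, b1, w2, b2):
    """Sketch of the view-invariant branch; keypoints is a (21, 2) array of (x, y) positions."""
    diff = keypoints[:, None, :] - keypoints[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # full pairwise Euclidean distances
    upper = np.triu(dist)                          # 21x21 upper-triangular matrix, zeros below the diagonal

    # Assumption: symmetrize before the eigendecomposition so the leading eigenvector is well defined.
    sym = upper + upper.T
    eigvals, eigvecs = np.linalg.eigh(sym)
    v1 = eigvecs[:, np.argmax(eigvals)]            # 21-element first feature vector

    # Stand-in multi-layer perceptron with supplied (untrained) weights.
    h = np.maximum(0.0, v1 @ w1 + b1)              # ReLU hidden layer
    return h @ w2 + b2                             # 21-element first position feature vector

# Example with random weights, for shape checking only.
rng = np.random.default_rng(1)
kp = rng.uniform(0.0, 1.0, size=(21, 2))
w1, b1 = rng.normal(size=(21, 64)), np.zeros(64)
w2, b2 = rng.normal(size=(64, 21)), np.zeros(21)
print(first_position_feature(kp, w1, b1, w2, b2).shape)   # (21,)
```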
如图6所示,图6示出了本申请的第一方面的另一个实施例的手势识别方法的流程示意图。其中,该方法包括:
步骤S302,获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;
步骤S304,根据多个特征点位置信息,确定第一位置特征向量;
步骤S306,根据多个特征点位置信息得到第二特征向量;
步骤S308,通过第二多层感知机对第二特征向量处理,得到第二位置特征向量;
步骤S310,根据第一位置特征向量和第二位置特征向量,输出对手部区域子图像的识别结果。
在这些实施例中,在确定多个特征点的位置信息后,根据多个特征点的位置信息确定第二特征向量,其中第二特征向量中元素分别表示每个特 征点在X方向和Y方向的位置信息。将得到的第二特征向量通过第二多层感知机进行处理,以使得到的第二位置特征向量的元素数量与第一位置特征向量中的元素数量相同,便于进行后续计算。
示例地，为保留特征点在手部区域子图像中的绝对位置关系，需要对特征点位置信息进行处理，如图7所示，获取到21个手势特征点的位置信息，首先定义一个向量，并将每个特征点的特征点位置信息分别依次填充入该向量的元素中，每个特征点的位置信息占据两个元素位，分别表示该特征点在X方向和Y方向上的坐标，将得到的包含42个元素的向量作为第二特征向量，如下：vector2=[position1-x,position1-y,position2-x,position2-y...,position21-x,position21-y]，将得到的第二特征向量经过第二多层感知机的处理，得到包含21个元素的第二位置特征向量，也就是特征点在手部区域子图像中的第二位置特征向量。经过第二多层感知机的处理，第二位置特征向量的元素数量与第一位置特征向量的元素数量相同，便于后续步骤对第一位置特征向量和第二位置特征向量进行综合分析，得到手部区域子图像的动作类别，进而确定识别结果。
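A sketch of this absolute-position branch might look as follows; only the 42-element input and 21-element output follow the text, while the hidden layer sizes and the untrained random weights are assumptions made here for illustration.

```python
import numpy as np

def second_position_feature(keypoints, weights):
    """Sketch of the absolute-position branch; keypoints is (21, 2) in image coordinates."""
    vec = keypoints.reshape(-1)                   # 42 elements: [x1, y1, x2, y2, ..., x21, y21]
    h = vec
    for i, (w, b) in enumerate(weights):
        h = h @ w + b
        if i < len(weights) - 1:
            h = np.maximum(0.0, h)                # ReLU between layers
    return h                                      # 21-element second position feature vector

# Hidden sizes below are arbitrary; only the 42-in / 21-out interface follows the text.
rng = np.random.default_rng(2)
sizes = [42, 64, 32, 21]
weights = [(rng.normal(size=(a, b)), np.zeros(b)) for a, b in zip(sizes[:-1], sizes[1:])]
kp = rng.uniform(0.0, 1.0, size=(21, 2))
print(second_position_feature(kp, weights).shape)     # (21,)
```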
如图8所示,图8示出了本申请的第一方面的另一个实施例的手势识别方法的流程示意图。其中,该方法包括:
步骤S402,获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;
步骤S404,根据多个特征点位置信息,确定第一位置特征向量;
步骤S406,通过第一多层感知机对多个特征点位置信息进行处理,得到多个特征点在手部区域子图像中的第二位置特征向量;
步骤S408,将第一位置特征向量的向量值与第二位置特征向量的向量值按位相加,以得到融合向量值;
步骤S410,通过第三多层感知机对融合向量值进行处理,得到分类向量;
步骤S412,将分类向量中的最大值对应的动作类别确定为手部区域子图像的识别结果。
在这些实施例中，提出了根据第一位置特征向量和第二位置特征向量确定手部区域子图像的动作类别的具体方法。得到第一位置特征向量和第二位置特征向量后，将第一位置特征向量和第二位置特征向量的向量值分别按位相加，也就是每个特征点的第一位置特征向量值和第二位置特征向量值相加。将相加得到的融合向量值通过第三多层感知机进行处理，得到分类向量，分类向量中每个元素代表手部区域子图像中的动作符合该元素对应动作类型的概率，因此，分类向量中数值最大的元素对应的动作类别即为手部区域子图像中的动作最可能符合的动作类型，进而确定手势识别的识别结果。
示例地，第一位置特征向量和第二位置特征向量是元素数量相同的向量，融合时按位相加，得到的融合向量再经过第三多层感知机处理得到分类向量，分类向量中每个元素代表手部区域子图像中的手势属于该元素所对应的动作类别的概率，因此取分类向量中的数值最大的元素对应的动作，也就是概率值最大的元素所对应的动作类别，即是手部区域子图像中的手势动作所对应的动作类别，因此，通过上述步骤，实现了手势的识别结果的输出。例如，图9示出了识别用户手势的识别结果为"手掌"的示意图。
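The fusion and classification step could be sketched as below. The gesture labels, layer sizes, and weights are placeholders invented for the example; only the element-wise addition, the classification MLP, and the arg-max over class probabilities follow the description.

```python
import numpy as np

def classify_gesture(f1, f2, w1, b1, w2, b2, labels):
    """f1 and f2 are the two 21-element position feature vectors."""
    fused = f1 + f2                                # element-wise fusion
    h = np.maximum(0.0, fused @ w1 + b1)           # hidden layer of the classification MLP
    logits = h @ w2 + b2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                           # softmax: probability of each gesture class
    return labels[int(np.argmax(probs))]           # class with the highest probability

# Placeholder labels and untrained weights, for illustration only.
labels = ["palm", "heart", "fist", "ok"]
rng = np.random.default_rng(3)
w1, b1 = rng.normal(size=(21, 32)), np.zeros(32)
w2, b2 = rng.normal(size=(32, len(labels))), np.zeros(len(labels))
f1, f2 = rng.normal(size=21), rng.normal(size=21)
print(classify_gesture(f1, f2, w1, b1, w2, b2, labels))
```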
如图10所示,图10示出了本申请的第一方面的另一个实施例的手势识别方法的流程示意图。其中,该方法包括:
步骤S502,通过预设的神经网络模型获取目标图像中的目标区域;
步骤S504,根据目标区域确定手部区域子图像,通过预设的神经网络模型识别手部区域子图像的多个特征点;
步骤S506,获取多个特征点的特征点位置信息;
步骤S508,根据多个特征点位置信息,确定第一位置特征向量;
步骤S510,通过第一多层感知机对多个特征点位置信息进行处理,得到多个特征点在手部区域子图像中的第二位置特征向量;
步骤S512,根据第一位置特征向量和第二位置特征向量,输出对手部区域子图像的识别结果。
在这些实施例中,首先通过预设的神经网络模型处理目标图像,以获得手部区域子图像所在的目标区域,也就是利用神经网络模型,在待处理的原始图像中找到手部区域子图像,并确定其区域范围。然后根据目标区域确定手部区域子图像。通过上一步骤确定的区域范围,进一步缩小范围,确定手部区域 子图像,手部区域子图像上有多个特征点,通过这些特征点可以精准识别动作类别。通过预设的神经网络模型识别手部区域子图像的多个特征点;获取多个特征点的特征点位置信息,就可以根据获取到的特征点位置信息进一步确定手部区域子图像的识别结果。
示例地,可以通过预设的神经网络模型中的手掌检测模型处理目标图像,以获得手部区域子图像所在的目标区域。通过手掌检测模型识别获取到的目标图像,手掌检测模型可以通过深度学习中的矩阵运算方法获得手部区域子图像所在区域四边形的顶点位置信息,继而可以框选出手部区域子图像所在的目标区域,也就是本实施例中的手部区域。最后对目标图像进行裁剪,裁剪保留框选出的手部区域子图像所在的目标区域。
示例地,可以通过预设的神经网络模型中的特征点检测模型实现根据目标区域确定手部区域子图像,通过预设的神经网络模型识别手部区域子图像的多个特征点,并获取多个特征点位置信息。特征点检测模型对裁剪后的目标图像进行检测得到手部区域子图像,通过深度学习中的矩阵运算可以得到手部区域子图像的多个特征点及其位置信息。
可选地,在检测出多个特征点位置信息后,可以使用传统手势识别方法(如:卡尔曼滤波等处理法)对特征点进行平滑和去抖,使特征点更具有灵敏性及稳定性,避免由于拍摄过程中抖动或者影响目标图像的成像质量,从而影响多个特征点位置的确定。
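As one possible realization of the smoothing mentioned here, the sketch below applies a minimal per-coordinate Kalman-style filter with a constant-position model to the 21 detected keypoints; the noise parameters and the model itself are assumptions, since the text only names Kalman filtering as an example of a smoothing method.

```python
import numpy as np

class KeypointSmoother:
    """Minimal per-coordinate Kalman-style smoother for 21 (x, y) keypoints."""

    def __init__(self, process_var=1e-3, measurement_var=1e-2):
        self.q = process_var      # how much the true keypoint may move per frame (assumed)
        self.r = measurement_var  # assumed noise of the detector output
        self.x = None             # current state estimate, shape (21, 2)
        self.p = None             # current estimate variance, shape (21, 2)

    def update(self, measured):
        measured = np.asarray(measured, dtype=float)
        if self.x is None:
            self.x = measured.copy()
            self.p = np.full_like(measured, self.r)
            return self.x
        self.p = self.p + self.q                      # predict: uncertainty grows
        k = self.p / (self.p + self.r)                # Kalman gain
        self.x = self.x + k * (measured - self.x)     # correct toward the new detection
        self.p = (1.0 - k) * self.p
        return self.x

# Feeding jittery detections frame by frame yields smoother keypoint trajectories.
rng = np.random.default_rng(4)
smoother = KeypointSmoother()
truth = rng.uniform(0.0, 1.0, size=(21, 2))
for _ in range(5):
    noisy = truth + rng.normal(scale=0.02, size=(21, 2))
    smoothed = smoother.update(noisy)
print(smoothed.shape)   # (21, 2)
```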
如图11所示,图11示出了本申请的第一方面的另一个实施例的手势识别方法的流程示意图。其中,该方法包括:
步骤S602,接收第一输入;
步骤S604,响应于第一输入,获取包含手部区域子图像的目标图像;
步骤S606,获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;
步骤S608,根据多个特征点位置信息,确定第一位置特征向量;
步骤S610,根据多个特征点位置信息,确定第二位置特征向量;
步骤S612,根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果。
在这些实施例中,在获取目标图像中的手部区域子图像之前,先接收第一输入,并响应于第一输入,获取包含手部区域子图像的目标图像。在没有接收到第一输入时,不获取待处理的图像,也不进行后续操作,避免频繁进行不必要的手势识别所带来的巨大计算量,减小计算负荷。可选地,第一输入可以是屏幕端输入或者语音输入,不同的输入方式可以应对多种不同的使用场景,给用户带来更好的体验。
举例来说,接收的第一输入可以是屏幕端的输入,可以是用户通过点击触摸屏的方式输入的,图12示出了本实施例的手势识别方法中获取到的对“手掌”进行拍摄以获取目标图像的示意图。如图12所示,用户打开拍照相机预览,启动拍照功能。接收到用户在屏幕端的点击动作,并将这个动作作为第一输入,然后响应于接收到的第一输入,启动拍照预览功能,在拍照预览功能模式下,做出手势并使手部进入镜头拍摄范围内,就可以获取包含手部区域子图像的目标图像。
在本申请的一些实施例中,提供了一种在不同视角下,对手部进行拍照时,进行手势识别的方法,具体可以包括以下步骤。
一、获得特征点位置信息
首先用户打开拍照相机预览，启动手掌拍照功能，接收并响应第一输入。然后加载深度学习模型，先通过手掌检测模型获取手部区域，再通过特征点检测模型得到手势的21个特征点的位置信息，并可以使用卡尔曼滤波等方法对特征点进行平滑和去抖，使特征点更具有灵敏性及稳定性。接下来提取稳定的人工先验特征以应对实际复杂场景中的视角变化。
二、确定视角不变特征
实际场景中经常因视角不同而导致相同手势在照片中的形状不同,手势特征点的位置坐标也不相同,如图2至图4所示,该手势为短视频中常见的“比心”,图2至图4展示了该手势在各种视角的变化,相关技术中的启发式的人工特征一般是基于规则的方法,例如针对该手势,其中的较关键的特征点拇指第一特征点202、拇指第二特征点204、食指第三特征点206、食指第四特征点208,为了定义该手势,既需要考虑到上述特征点之间的相互关系,还需要考虑到这种相互关系需要在各种旋转等视角变化下的稳定性。利用上一步骤中获取到的手势21个特征点的位置信息,计算21*21的上三角矩阵,其中每个矩阵的元素表示点与点之间的欧氏距离,对该矩阵提取最大特征值对应的特征向量,得到包含21个元素的第一特征向量,利用多层感知机对第一特征向量进一步提取特征,得到特征点之间相对位置关系的第一位置特征向量,即视角不变特征向量。
三、确定自动学习特征
视角不变特征feature1保留特征点之间的相对位置关系,为保留特征点在图片中的绝对位置关系,需要将原始的特征点信息通过有监督的多层感知机获取到自动学习特征feature2,具体的需要定义一个第二特征向量,第二特征向量包含42个元素,分别表示每个特征点在x和y方向上的坐标,然后经过三层多层感知机,得到包含21个元素的自动学习特征向量,也就是特征点在手部区域子图像中的第二位置特征向量。
四、确定手势类别
融合第一位置特征向量也即视角不变特征向量以及第二位置特征向量也即自动学习特征向量,进行分类结果的输出。示例地,第一位置特征向量和第二位置特征向量是具有相同维度的矢量,融合时按位相加,得到的融合向量再经过两层多层感知机之后得到最终的分类向量结果,其代表属于各个手势类别 的概率,概率最大的类别即是对应的手势,因此,通过上述步骤,实现了手势类别的输出。之后,识别用户手势的动作类别为“手掌”,捕捉用户意图,假定提取到“手掌”的含义为“拍照”,说明用户做出该手势的目的是需要实现拍照功能,最后触发拍照功能,将用户照片保存。
如图13所示,本申请第二方面的实施例提出了一种手势识别装置100,包括:获取单元110,用于获取目标图像中的手部区域子图像,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;特征确定单元120,用于根据多个特征点位置信息,确定多个特征点之间相对位置关系的第一位置特征向量;通过第一多层感知机对多个特征点位置信息进行处理,以得到多个特征点在手部区域子图像中的第二位置特征向量;输出单元130,用于根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果。
示例地,在本实施例中,获取单元110起到了获取手部区域子图像并确定手部区域子图像中多个特征点位置信息的传入作用,特征确定单元120起到了确定第一位置特征向量和第二位置特征向量,也就是确定特征点之间的相对位置和特征点在图像中绝对位置的作用,而输出单元130根据特征确定单元120确定的第一位置特征向量和第二位置特征向量,进行相应的运算处理,确定并输出手部区域子图像的识别结果。这种手势识别装置,考虑了多个特征点之间的相互关系,有效解决了在各种视角变换的情况下的误识别问题,提高了动作识别的稳定性。
在一些实施例中,如图14所示,特征确定单元120包括:第一特征获取子单元122,用于根据多个特征点位置信息建立第一坐标矩阵,获取第一坐标矩阵的最大特征值所对应的第一特征向量;第一特征确定子单元124,用于通过第一多层感知机对第一特征向量进行处理,得到第一位置特征向量。
在这些实施例中,在确定多个特征点的位置信息后,第一特征获取子单元122根据多个特征点的位置信息建立第一坐标矩阵,并通过计算得到第一坐标矩阵的最大特征值所对应的第一特征向量。第一特征向量的引入是为了选取一组很好的基,便于进行后续运算。第一特征确定子单元124将得 到的第一特征向量通过第一多层感知机进行处理,排除了视角对动作类型判断的影响,使得在不同视角下均可以准确识别动作类型。
可选地,在一些实施例中,第一坐标矩阵中的元素为多个特征点中任意两个特征点之间的欧式距离。
在这些实施例中,使用特征点之间的欧式距离作为第一坐标矩阵中的元素,欧式距离可以表示多维空间中两点之间的真实距离,因此,在这些实施例中引用了欧式距离,而不是特征点在垂直拍照设备方向上的平面距离,可以有效解决在不同视角情况下任意两个特征点之间的距离判断问题。
可选地,在一些实施例中,第一坐标矩阵为上三角矩阵。
示例地,第一坐标矩阵中元素是所在行对应的特征点与所在列对应的特征点之间的相对距离,例如,矩阵第二行第三列的元素可表示第二个特征点和第三个特征点之间的相对距离,矩阵第三行第二列的元素可表示第三个特征点和第二个特征点之间的相对距离,可以知道,这两个元素值是重复的,而大量重复元素会增加运算的复杂度,因此,在这些实施例中,建立的第一坐标矩阵是多维上三角矩阵,其对角线左下方的元素全部为0,而对角线右上方的元素则是所在行对应的特征点与所在列对应的特征点之间的相对距离,起到了简化运算过程的作用。
在一些实施例中,如图15所示,特征确定单元120还包括:第二特征获取子单元126,用于根据多个特征点位置信息得到第二特征向量;第二特征确定子单元128,用于通过第二多层感知机对第二特征向量处理,得到第二位置特征向量。
在这些实施例中,在确定多个特征点的位置信息后,第二特征获取子单元126根据多个特征点的位置信息确定第二特征向量,其中第二特征向量中元素分别表示每个特征点在x和y方向的位置。第二特征确定子单元128将得到的第二特征向量通过第二多层感知机进行处理,以使得到的第二位置特征向量的元素数量与第一位置特征向量中元素数量相同,便于进行后续计算。
在一些实施例中,如图16所示,输出单元130还包括:融合单元132, 用于将第一位置特征向量的向量值与第二位置特征向量的向量值按位相加,以得到融合向量值;处理单元134,用于通过第三多层感知机对融合向量值进行处理,得到分类向量;确定单元136,用于将分类向量中的最大值对应的动作类别确定为手部区域子图像的识别结果。
在这些实施例中，提出了根据第一位置特征向量和第二位置特征向量确定手部区域子图像的识别结果的输出单元130的单元结构。融合单元132得到第一位置特征向量和第二位置特征向量后，将第一位置特征向量和第二位置特征向量的向量值分别按位相加，也就是每个特征点的第一位置特征向量值和第二位置特征向量值相加。处理单元134将相加得到的融合向量值通过第三多层感知机进行处理，得到分类向量，分类向量中每个元素代表手部区域子图像中的动作符合该元素对应动作类型的概率，因此确定单元136选取分类向量中的最大值对应的动作类别，也就是选取手部区域子图像中的动作最可能符合的动作类型。
在一些实施例中,如图17所示,获取单元110还包括:区域获取子单元112,用于通过预设的神经网络模型处理目标图像,获得手部区域子图像所在的目标区域;特征点获取子单元114,用于根据目标区域确定手部区域子图像,通过预设的神经网络模型识别手部区域子图像的多个特征点;位置信息获取子单元116,用于获取多个特征点的特征点位置信息。
在这些实施例中,首先区域获取子单元112通过预设的神经网络模型处理目标图像,以获得手部区域子图像所在的目标区域,也就是利用神经网络模型,在待处理的原始图像中找到手部区域子图像,并确定其区域范围。然后特征点获取子单元114根据目标区域确定手部区域子图像。通过上一步骤确定的区域范围,进一步缩小范围,确定手部区域子图像,手部区域子图像上有多个特征点,通过这些特征点可以精准识别动作类别。特征点获取子单元114通过预设的神经网络模型识别手部区域子图像的多个特征点;位置信息获取子单元116获取多个特征点的特征点位置信息,就可以根据获取到的特征点位置信息进一步确定图像的识别结果。
在一些实施例中,如图18所示,手势识别装置100还包括:接收单元140,用于接收第一输入;响应单元150,用于响应于第一输入,获取包含手部区域子图像的目标图像。
在这些实施例中,在获取目标图像中的手部区域子图像之前,接收单元140先接收第一输入,响应单元150响应于第一输入,获取包含手部区域子图像的目标图像。在接收单元140没有接收到第一输入时,手势识别装置100不获取待处理的图像,也不进行后续操作,避免频繁进行不必要的手势识别所带来的巨大计算量,减小计算负荷。可选地,第一输入可以是屏幕端输入或者语音输入,不同的输入方式可以应对多种不同的使用场景,给用户带来更好的体验。
图19示出了根据申请实施例的电子设备的硬件结构示意图。
可选地,本申请实施例提供的电子设备1900例如可以是手机、笔记本电脑、平板电脑等。
该电子设备1900包括但不限于：射频单元1901、网络模块1902、音频输出单元1903、输入单元1904、传感器1905、显示单元1906、用户输入单元1907、接口单元1908、存储器1909、以及处理器1910等部件。
本领域技术人员可以理解，电子设备1900还可以包括给各个部件供电的电源1911（比如电池），电源可以通过电源管理系统与处理器1910逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。图19中示出的电子设备结构并不构成对电子设备的限定，电子设备可以包括比图示更多或更少的部件，或者组合某些部件，或者不同的部件布置，在此不再赘述。
处理器1910,确定与手部区域子图像中的多个特征点对应的多个特征点位置信息;用于根据多个特征点位置信息,确定多个特征点之间相对位置关系的第一位置特征向量;通过第一多层感知机对多个特征点位置信息进行处理,得到多个特征点在手部区域子图像中的第二位置特征向量;根据第一位置特征向量和第二位置特征向量,输出手部区域子图像的识别结果。
应理解的是,本申请实施例中,射频单元1901可用于收发信息或收发通 话过程中的信号,具体的,接收基站的下行数据或向基站发送上行数据。射频单元1901包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等。
网络模块1902为用户提供了无线的宽带互联网访问,如帮助用户收发电子邮件、浏览网页和访问流式媒体等。
音频输出单元1903可以将射频单元1901或网络模块1902接收的或者在存储器1909中存储的音频数据转换成音频信号并且输出为声音。而且,音频输出单元1903还可以提供与电子设备1900执行的特定功能相关的音频输出(例如,呼叫信号接收声音、消息接收声音等等)。音频输出单元1903包括扬声器、蜂鸣器以及受话器等。
输入单元1904用于接收音频或视频信号。输入单元1904可以包括图形处理器(Graphics Processing Unit,GPU)5082和麦克风5084,图形处理器5082对在视频捕获模式或图像捕获模式中由图像捕获装置(如摄像头)获得的静态图片或视频的图像数据进行处理。处理后的图像帧可以显示在显示单元1906上,或者存储在存储器1909(或其它存储介质)中,或者经由射频单元1901或网络模块1902发送。麦克风5084可以接收声音,并且能够将声音处理为音频数据,处理后的音频数据可以在电话通话模式的情况下转换为可经由射频单元1901发送到移动通信基站的格式输出。
电子设备1900还包括至少一种传感器1905,比如指纹传感器、压力传感器、虹膜传感器、分子传感器、陀螺仪、气压计、湿度计、温度计、红外线传感器、光传感器、运动传感器以及其他传感器。
显示单元1906用于显示由用户输入的信息或提供给用户的信息。显示单元1906可包括显示面板5122,可以采用液晶显示器、有机发光二极管等形式来配置显示面板5122。
用户输入单元1907可用于接收输入的数字或字符信息,以及产生与电子设备的用户设置以及功能控制有关的键信号输入。示例地,用户输入单元1907包括触控面板5142以及其他输入设备5144。触控面板5142也称为触摸屏, 可收集用户在其上或附近的触摸操作。触控面板5142可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再送给处理器1910,接收处理器1910发来的命令并加以执行。其他输入设备5144可以包括但不限于物理键盘、功能键(比如音量控制按键、开关按键等)、轨迹球、鼠标、操作杆,在此不再赘述。
进一步的,触控面板5142可覆盖在显示面板5122上,当触控面板5142检测到在其上或附近的触摸操作后,传送给处理器1910以确定触摸事件的类型,随后处理器1910根据触摸事件的类型在显示面板5122上提供相应的视觉输出。触控面板5142与显示面板5122可作为两个独立的部件,也可以集成为一个部件。
接口单元1908为外部装置与电子设备1900连接的接口。例如,外部装置可以包括有线或无线头戴式耳机端口、外部电源(或电池充电器)端口、有线或无线数据端口、存储卡端口、用于连接具有识别模块的装置的端口、音频输入/输出(I/O)端口、视频I/O端口、耳机端口等等。接口单元1908可以用于接收来自外部装置的输入(例如,数据信息、电力等等)并且将接收到的输入传输到电子设备1900内的一个或多个元件或者可以用于在电子设备1900和外部装置之间传输数据。
存储器1909可用于存储软件程序以及各种数据。存储器1909可主要包括存储程序区和存储数据区，其中，存储程序区可存储操作系统、至少一个功能所需的应用程序（比如声音播放功能、图像播放功能等）等；存储数据区可存储根据移动终端的使用所创建的数据（比如音频数据、电话本等）等。此外，存储器1909可以包括高速随机存取存储器，还可以包括非易失性存储器，例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。
处理器1910通过运行或执行存储在存储器1909内的软件程序和/或模块，以及调用存储在存储器1909内的数据，执行电子设备1900的各种功能和处理数据，从而对电子设备1900进行整体监控。处理器1910可包括一个或多个处理单元；可选地，处理器1910可集成应用处理器和调制解调处理器，其中，应用处理器主要处理操作系统、用户界面和应用程序等，调制解调处理器主要处理无线通信。
电子设备1900还可以包括给各个部件供电的电源1911，可选地，电源1911可以通过电源管理系统与处理器1910逻辑相连，从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。
本申请实施例还提供一种可读存储介质,可读存储介质上存储有程序或指令,该程序或指令被处理器执行时实现上述手势识别方法实施例的各个过程,且能达到相同的技术效果,为避免重复,这里不再赘述。
其中,处理器为上述实施例中的电子设备中的处理器。可读存储介质,包括计算机可读存储介质,计算机可读存储介质的示例包括非暂态计算机可读存储介质,如计算机只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等。
本申请实施例还提出了一种芯片,包括处理器和通信接口,通信接口和处理器耦合,处理器用于运行程序或指令,实现如上述第一方面的手势识别方法的步骤,因而具备该手势识别方法的全部有益效果,在此不再赘述。
应理解，本申请实施例提到的芯片还可以称为系统级芯片、系统芯片、芯片系统或片上系统芯片等。
需要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。此外,需要指出的是,本申请实施方式中的方法和装置的范围不限按示出或讨论的顺序来执行功能,还可包括根据所涉及的功能按基本同时的方式或按相反的顺序来执行功能,例如,可以按不同于所 描述的次序来执行所描述的方法,并且还可以添加、省去、或组合各种步骤。另外,参照某些示例所描述的特征可在其他示例中被组合。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例的方法。
上面结合附图对本申请的实施例进行了描述,但是本申请并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本申请的启示下,在不脱离本申请宗旨和权利要求所保护的范围情况下,还可做出很多形式,均属于本申请的保护之内。在本说明书的描述中,术语“一个实施例”、“一些实施例”、“具体实施例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或特点包含于本申请的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不一定指的是相同的实施例或实例。而且,描述的具体特征、结构、材料或特点可以在任何的一个或多个实施例或示例中以合适的方式结合。
以上仅为本申请的可选实施例而已,并不用于限制本申请,对于本领域的技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种手势识别方法,包括:
    获取目标图像中的手部区域子图像,确定与所述手部区域子图像中的多个特征点对应的多个特征点位置信息;
    根据所述多个特征点位置信息,确定第一位置特征向量,所述第一位置特征向量表示所述多个特征点中的任一个特征点相对于所述多个特征点中其余特征点的相对位置关系;
    根据所述多个特征点位置信息,确定第二位置特征向量,所述第二位置特征向量表示所述多个特征点在所述手部区域子图像中的绝对位置关系;
    根据所述第一位置特征向量和所述第二位置特征向量,输出对所述手部区域子图像的识别结果。
  2. 根据权利要求1所述的手势识别方法,其中,所述根据多个所述特征点位置信息,确定第一位置特征向量的步骤,具体包括:
    根据多个所述特征点位置信息建立第一坐标矩阵,获取所述第一坐标矩阵的最大特征值所对应的第一特征向量;
    通过第一多层感知机对所述第一特征向量进行处理,得到所述第一位置特征向量。
  3. 根据权利要求2所述的手势识别方法,其中,
    所述第一坐标矩阵中的元素为所述多个特征点中的任一个特征点相对于所述多个特征点中其余特征点中任一个特征点之间的欧式距离。
  4. 根据权利要求2所述的手势识别方法,其中,
    所述第一坐标矩阵为上三角矩阵。
  5. 根据权利要求1所述的手势识别方法,其中,所述根据所述多个特征点位置信息,确定第二位置特征向量步骤,具体包括:
    根据所述多个特征点位置信息得到第二特征向量;
    通过第二多层感知机对所述第二特征向量处理,得到所述第二位置特征向 量。
  6. 根据权利要求1所述的手势识别方法,其中,所述根据所述第一位置特征向量和所述第二位置特征向量,输出所述手部区域子图像的识别结果的步骤,具体包括:
    将所述第一位置特征向量的向量值与所述第二位置特征向量的向量值按位相加,以得到融合向量值;
    通过第三多层感知机对所述融合向量值进行处理,得到分类向量;
    将所述分类向量中的最大值对应的动作类别确定为所述手部区域子图像的识别结果。
  7. 根据权利要求1至6中任一项所述的手势识别方法,其中,获取目标图像中的手部区域子图像,确定所述手部区域子图像中的特征点的特征点位置信息的步骤,具体包括:
    通过预设的神经网络模型获取所述目标图像中的目标区域;
    根据所述目标区域确定所述手部区域子图像,通过预设的神经网络模型识别所述手部区域子图像的多个特征点;
    获取所述多个特征点的特征点位置信息。
  8. 根据权利要求1至6中任一项所述的手势识别方法,其中,在所述获取目标图像中的手部区域子图像的步骤之前,还包括:
    接收第一输入;
    响应于所述第一输入,获取包含所述手部区域子图像的目标图像。
  9. 一种手势识别装置,包括:
    获取单元,用于获取目标图像中的手部区域子图像,确定与所述手部区域子图像中的多个特征点对应的多个特征点位置信息;
    特征确定单元,用于根据所述多个特征点位置信息,确定第一位置特征向量和第二位置特征向量,所述第一位置特征向量表示所述多个特征点中的任一个特征点相对于所述多个特征点中其余特征点的相对位置关系;所述第二位置特征向量表示所述多个特征点在所述手部区域子图像中的绝对位置关系;
    输出单元,用于根据所述第一位置特征向量和所述第二位置特征向量,输出所述手部区域子图像的识别结果。
  10. 根据权利要求9所述的手势识别装置,其中,所述特征确定单元包括:
    第一特征获取子单元,用于根据多个所述特征点位置信息建立第一坐标矩阵,获取所述第一坐标矩阵的最大特征值所对应的第一特征向量;
    第一特征确定子单元,用于通过第一多层感知机对所述第一特征向量进行处理,得到所述第一位置特征向量。
  11. 根据权利要求10所述的手势识别装置,其中,
    所述第一坐标矩阵中的元素为所述多个特征点中的任一个特征点相对于所述多个特征点中其余特征点中任一个特征点之间的欧式距离。
  12. 根据权利要求10所述的手势识别装置,其中,
    所述第一坐标矩阵为上三角矩阵。
  13. 根据权利要求9所述的手势识别装置,其中,所述特征确定单元还包括:
    第二特征获取子单元,用于根据所述多个特征点位置信息得到第二特征向量;
    第二特征确定子单元,用于通过第二多层感知机对所述第二特征向量处理,得到所述第二位置特征向量。
  14. 根据权利要求9所述的手势识别装置,其中,所述输出单元包括:
    融合单元,用于将所述第一位置特征向量的向量值与所述第二位置特征向量的向量值按位相加,以得到融合向量值;
    处理单元,用于通过第三多层感知机对所述融合向量值进行处理,得到分类向量;
    确定单元,用于将所述分类向量中的最大值对应的动作类别确定为所述手部区域子图像的识别结果。
  15. 根据权利要求9至14中任一项所述的手势识别装置,其中,所述获取单元包括:
    区域获取子单元,通过预设的神经网络模型获取所述目标图像中的目标区域;
    特征点获取子单元,根据所述目标区域确定所述手部区域子图像,通过预设的神经网络模型识别所述手部区域子图像的多个特征点;
    位置信息子获取单元,用于获取所述多个特征点的特征点位置信息。
  16. 根据权利要求9至14中任一项所述的手势识别装置,还包括:
    接收单元,用于接收第一输入;
    响应单元,用于响应于所述第一输入,获取包含所述手部区域子图像的目标图像。
  17. 一种电子设备,包括处理器,存储器及存储在所述存储器上并可在所述处理器上运行的程序或指令,所述程序或指令被所述处理器执行时实现如权利要求1至8中任一项所述的手势识别方法的步骤。
  18. 一种电子设备,被配置用于执行如权利要求1至8中任一项所述的手势识别方法的步骤。
  19. 一种可读存储介质,所述可读存储介质上存储程序或指令,所述程序或指令被处理器执行时实现如权利要求1至8中任一项所述的手势识别方法的步骤。
  20. 一种芯片,所述芯片包括处理器和通信接口,所述通信接口和所述处理器耦合,所述处理器用于运行程序或指令,实现如权利要求1至8中任一项所述的手势识别方法的步骤。
PCT/CN2021/143855 2021-01-15 2021-12-31 手势识别方法和装置、电子设备、可读存储介质和芯片 WO2022152001A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21919188.9A EP4273745A4 (en) 2021-01-15 2021-12-31 GESTURE RECOGNITION METHOD AND APPARATUS, ELECTRONIC DEVICE, READABLE STORAGE MEDIUM AND CHIP
US18/222,476 US20230360443A1 (en) 2021-01-15 2023-07-16 Gesture recognition method and apparatus, electronic device, readable storage medium, and chip

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110057731.XA CN112699849A (zh) 2021-01-15 2021-01-15 手势识别方法和装置、电子设备、可读存储介质和芯片
CN202110057731.X 2021-01-15

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/222,476 Continuation US20230360443A1 (en) 2021-01-15 2023-07-16 Gesture recognition method and apparatus, electronic device, readable storage medium, and chip

Publications (1)

Publication Number Publication Date
WO2022152001A1 true WO2022152001A1 (zh) 2022-07-21

Family

ID=75515431

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143855 WO2022152001A1 (zh) 2021-01-15 2021-12-31 手势识别方法和装置、电子设备、可读存储介质和芯片

Country Status (4)

Country Link
US (1) US20230360443A1 (zh)
EP (1) EP4273745A4 (zh)
CN (1) CN112699849A (zh)
WO (1) WO2022152001A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937743A (zh) * 2022-12-09 2023-04-07 武汉星巡智能科技有限公司 基于图像融合的婴幼儿看护行为识别方法、装置及***

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112699849A (zh) * 2021-01-15 2021-04-23 维沃移动通信有限公司 手势识别方法和装置、电子设备、可读存储介质和芯片
CN113780083A (zh) * 2021-08-10 2021-12-10 新线科技有限公司 一种手势识别方法、装置、设备及存储介质
CN114063772B (zh) * 2021-10-26 2024-05-31 深圳市鸿合创新信息技术有限责任公司 手势识别方法、装置、设备及介质

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140361981A1 (en) * 2013-06-07 2014-12-11 Canon Kabushiki Kaisha Information processing apparatus and method thereof
CN111126339A (zh) * 2019-12-31 2020-05-08 北京奇艺世纪科技有限公司 手势识别方法、装置、计算机设备和存储介质
CN111158478A (zh) * 2019-12-26 2020-05-15 维沃移动通信有限公司 响应方法及电子设备
CN112699849A (zh) * 2021-01-15 2021-04-23 维沃移动通信有限公司 手势识别方法和装置、电子设备、可读存储介质和芯片

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8600166B2 (en) * 2009-11-06 2013-12-03 Sony Corporation Real time hand tracking, pose classification and interface control
CN110688914A (zh) * 2019-09-09 2020-01-14 苏州臻迪智能科技有限公司 一种手势识别的方法、智能设备、存储介质和电子设备

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140361981A1 (en) * 2013-06-07 2014-12-11 Canon Kabushiki Kaisha Information processing apparatus and method thereof
CN111158478A (zh) * 2019-12-26 2020-05-15 维沃移动通信有限公司 响应方法及电子设备
CN111126339A (zh) * 2019-12-31 2020-05-08 北京奇艺世纪科技有限公司 手势识别方法、装置、计算机设备和存储介质
CN112699849A (zh) * 2021-01-15 2021-04-23 维沃移动通信有限公司 手势识别方法和装置、电子设备、可读存储介质和芯片

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4273745A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115937743A (zh) * 2022-12-09 2023-04-07 武汉星巡智能科技有限公司 基于图像融合的婴幼儿看护行为识别方法、装置及***
CN115937743B (zh) * 2022-12-09 2023-11-14 武汉星巡智能科技有限公司 基于图像融合的婴幼儿看护行为识别方法、装置及***

Also Published As

Publication number Publication date
US20230360443A1 (en) 2023-11-09
EP4273745A4 (en) 2024-05-15
EP4273745A1 (en) 2023-11-08
CN112699849A (zh) 2021-04-23

Similar Documents

Publication Publication Date Title
WO2022152001A1 (zh) 手势识别方法和装置、电子设备、可读存储介质和芯片
EP4266244A1 (en) Surface defect detection method, apparatus, system, storage medium, and program product
US20220076000A1 (en) Image Processing Method And Apparatus
WO2019101021A1 (zh) 图像识别方法、装置及电子设备
WO2021078116A1 (zh) 视频处理方法及电子设备
CN111541845B (zh) 图像处理方法、装置及电子设备
WO2019233229A1 (zh) 一种图像融合方法、装置及存储介质
CN108712603B (zh) 一种图像处理方法及移动终端
WO2019174628A1 (zh) 拍照方法及移动终端
US20220309836A1 (en) Ai-based face recognition method and apparatus, device, and medium
CN107592459A (zh) 一种拍照方法及移动终端
WO2021190428A1 (zh) 图像拍摄方法和电子设备
JP7394879B2 (ja) 撮像方法及び端末
CN107580209A (zh) 一种移动终端的拍照成像方法及装置
WO2022052620A1 (zh) 图像生成方法及电子设备
WO2020182035A1 (zh) 图像处理方法及终端设备
WO2021190387A1 (zh) 检测结果输出的方法、电子设备及介质
CN109272473B (zh) 一种图像处理方法及移动终端
US11386586B2 (en) Method and electronic device for adding virtual item
WO2021147921A1 (zh) 图像处理方法、电子设备及计算机可读存储介质
CN107845057A (zh) 一种拍照预览方法及移动终端
WO2021082772A1 (zh) 截屏方法及电子设备
CN109544445B (zh) 一种图像处理方法、装置及移动终端
CN109068063B (zh) 一种三维图像数据的处理、显示方法、装置及移动终端
WO2020238454A1 (zh) 拍摄方法及终端

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21919188

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021919188

Country of ref document: EP

Effective date: 20230801

NENP Non-entry into the national phase

Ref country code: DE