CN113703568B - Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium - Google Patents

Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium

Info

Publication number
CN113703568B
CN113703568B (application CN202110786284.1A)
Authority
CN
China
Prior art keywords
vibration information
gesture
information
gesture recognition
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110786284.1A
Other languages
Chinese (zh)
Other versions
CN113703568A (en)
Inventor
何柏霖
王灿
段声才
李鹏博
吴新宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110786284.1A priority Critical patent/CN113703568B/en
Publication of CN113703568A publication Critical patent/CN113703568A/en
Application granted granted Critical
Publication of CN113703568B publication Critical patent/CN113703568B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a gesture recognition method, device, system, and storage medium, wherein the method comprises: obtaining vibration information on a tendon; processing the vibration information to obtain characteristics corresponding to the vibration information; and classifying the vibration information based on the characteristics to obtain the gesture category corresponding to the tendon. In this way, the present application can effectively classify and recognize gesture categories.

Description

Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium
Technical Field
The present application relates to the field of man-machine recognition technologies, and in particular, to a gesture recognition method, a gesture recognition device, a gesture recognition system, and a storage medium.
Background
In the field of exoskeleton human-machine interfaces, intent recognition has become an important research topic, covering gait recognition, gesture recognition, and the like. As a result, the study of wrist tendon sounds is finding ever wider application in gesture recognition for rehabilitation exoskeletons.
Typically, sound is collected by sensors placed on the tendons of the wrist; some researchers record sound with microphones coupled to stethoscopes fixed to the skin. When such prior-art schemes are applied to an exoskeleton, collisions with the moving exoskeleton equipment often introduce measurement errors, and ambient noise contaminates the collected sound signals, reducing the accuracy of category recognition.
Disclosure of Invention
A first aspect of an embodiment of the present application provides a gesture recognition method, including: obtaining vibration information on tendons; processing the vibration information to obtain the corresponding characteristics of the vibration information; and classifying the vibration information based on the characteristics to obtain gesture types corresponding to the tendons.
A second aspect of an embodiment of the present application provides a gesture recognition apparatus, which includes a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the computer program to implement a method provided by the first aspect of the embodiment of the present application.
A third aspect of an embodiment of the present application provides a gesture recognition system, including:
A sensor configured to be fixed to a tendon part of a hand for collecting vibration information of the tendon part;
the processing equipment is used for processing the vibration information to obtain the corresponding characteristics of the vibration information;
The extracting device is used for extracting the characteristics corresponding to the vibration information;
the gesture recognition device is connected with the sensor, the processing equipment and the extracting equipment and is used for executing the method provided by the first aspect of the embodiment of the application.
A fourth aspect of the embodiments of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, is capable of implementing the method provided by the first aspect of the embodiments of the present application.
The beneficial effects of the application are as follows: in contrast to the related art, the present gesture classification and recognition method processes the acquired vibration information on the tendon, removes noise from the surrounding environment, and obtains the characteristics corresponding to the hand tendon. Because gesture categories are closely related to the vibration information of the tendon, recognizing and classifying the characteristics of the tendon's vibration information quickly yields the gesture category corresponding to the tendon.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a system framework diagram of a gesture recognition method of the present application;
FIG. 2 is a schematic diagram of the gesture categories of the gesture recognition method of the present application, wherein FIG. 2 (1) shows a five-finger-open gesture; FIG. 2 (2) shows a downward wrist-flexion gesture; FIG. 2 (3) shows a gesture of closing the five fingers into a fist;
FIG. 3 is a flow chart of an embodiment of a gesture recognition method of the present application;
FIG. 4 is a flowchart of step S12 in FIG. 3;
FIG. 5 is a flowchart of step S22 in FIG. 4;
FIG. 6 is a flowchart of step S23 in FIG. 4;
FIG. 7 is a flowchart of step S33 in FIG. 5;
FIG. 8 is a flowchart illustrating the step S13 of FIG. 3 according to an embodiment;
FIG. 9 is a schematic diagram of experimental results of an embodiment of a gesture recognition method of the present application;
FIG. 10 is a schematic block diagram of one embodiment of a gesture recognition apparatus of the present application;
FIG. 11 is a schematic block diagram of one embodiment of a gesture recognition system of the present application;
FIG. 12 is a schematic block diagram of one embodiment of a computer-readable storage medium of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
In order to explain the technical solution of the present application, the gesture recognition method provided by the first aspect is described below through specific embodiments. For a better explanation, please refer to fig. 1, which is a system framework diagram of the gesture recognition method of the present application. The system includes at least the following steps: S1: signal collection; S2: preprocessing; S3: energy activation; S4: feature extraction; S5: classification.
Specifically, a sensor is arranged on the wrist tendon so that it can measure the slight sound vibrations on the tendon. Preset software is then used to preprocess, energy-activate, feature-extract, and classify the sound signal converted from the vibration signal, thereby obtaining the classification of different gestures.
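For illustration only, the following Python sketch mirrors this five-stage framework; the function names, constants, and structure are assumptions made for exposition, not the implementation described in the present application (which uses MATLAB-based preset software):

```python
# Illustrative sketch of the S1-S5 pipeline; names and structure are assumed.

FS = 24_000        # sampling frequency used in the experiments (24 kHz)
FRAME_LEN = 510    # frame length quoted in the experiments
FRAME_SHIFT = 510  # frame shift quoted in the experiments

def recognize_gesture(vibration, preprocess, activate, extract, classifier):
    """Run one pass of the pipeline on a raw tendon vibration recording."""
    audio = preprocess(vibration)        # S2: filter and frame the converted sound signal
    active = activate(audio)             # S3: keep frames above the energy threshold
    features = extract(active)           # S4: e.g. wavelet-coefficient features
    return classifier.predict(features)  # S5: e.g. an SVM returning HS / FR / FM
```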
Still further, the acquisition board may be an STMicroelectronics evaluation board (STEVAL-MKIGIBV) and the sensor may be an LIS25BA, although those skilled in the art may select other types of acquisition boards and sensors as required, which is not limited herein.
For easier understanding of the gesture categories, please refer to fig. 2, a schematic diagram of the gesture categories of the gesture recognition method of the present application. Fig. 2 (1) shows a five-finger-open gesture, denoted HS for short; fig. 2 (2) shows a downward wrist-flexion gesture, denoted FR; fig. 2 (3) shows a gesture of closing the five fingers into a fist, denoted FM. Of course, other gestures are also possible; these three are merely examples and are not limiting.
In order to explain the technical solution of the present application, the gesture recognition method of the first aspect is described below through specific embodiments. Referring to fig. 3, a schematic flow chart of an embodiment of the gesture recognition method of the present application, the method specifically includes the following steps:
s11: obtaining vibration information on tendons;
In general, a sensor is arranged on the tendon to sense its vibration information; specifically, the vibration information may be obtained by measuring the vibration of the tendon with the sensor. For example, a vibration detection sensor may be used; common vibration detection sensors include eddy-current sensors, velocity sensors, acceleration sensors, and the like.
Because the motion related to a change of gesture has a minute and small amplitude, a more accurate acceleration sensor can measure it and acquire the vibration information on the tendon.
It should be noted that the sensor may be disposed on a tendon, on a mechanical arm, or on another part of the human body, as long as it can sense the vibration information of the part to be measured, which is not limited herein.
S12: processing the vibration information to obtain the corresponding characteristics of the vibration information;
Each piece of vibration information corresponds to a unique characteristic that distinguishes the gesture category it represents. Therefore, the characteristic corresponding to the vibration information can be obtained by processing it; for example, preset software may be used to extract the vibration information and obtain its corresponding characteristic.
Specifically, the characteristics corresponding to the vibration information differ across gesture categories. The gesture categories mentioned above include the HS gesture motion, the FR gesture motion, and the FM gesture motion; since the vibration information represented by the three gestures differs, processing the vibration information yields distinct characteristics corresponding to the HS, FR, and FM gesture motions respectively.
Of course, these three gestures are merely examples, and are not limited thereto, as other gestures are possible.
S13: and classifying the vibration information based on the characteristics to obtain gesture types corresponding to the tendons.
With the characteristics corresponding to the vibration information acquired, the vibration information can be classified. Specifically, the characteristics corresponding to each gesture category may be stored in a database in advance, and the characteristics of the acquired vibration information are compared against them.
If the characteristics match, the gesture category corresponding to the matching pre-stored characteristics is taken as the gesture category of the tendon. If the characteristics cannot be matched, they may be named and stored in the database for use in subsequent recognition. Of course, other methods may also be used to classify the vibration information and obtain the gesture category corresponding to the tendon; this is only one of them.
Therefore, in the present gesture classification and recognition method, the acquired vibration information on the tendon is processed, noise from the surrounding environment is removed, and the characteristics corresponding to the hand tendon are obtained. Because gesture categories are closely related to the tendon's vibration information, recognizing and classifying these characteristics quickly yields the gesture category corresponding to the tendon.
Further, referring to fig. 4, which is a flowchart of an embodiment of step S12 in fig. 3, the processing of the vibration information specifically includes the following steps:
S21: converting the vibration information to obtain a sound signal of the tendon;
In general, the sensor collects vibration information on the tendon. The term "vibration", used mainly in physics and engineering, refers to the periodic variation of the motion of an object or particle, or of some physical quantity, with a definite temporal regularity and period; electromagnetic vibration, for example, generates electromagnetic waves.
In the embodiment of the present application, the sensor on the tendon acquires the vibration information generated on the tendon by changes of gesture motion. Because the sensor sits in an open environment, the surroundings usually contribute some noise; the vibration information therefore includes not only the sounds on the tendon but also the sounds generated by the surrounding environment.
Therefore, by converting the vibration information, a sound signal of the tendon can be obtained.
S22: preprocessing the sound signal by adopting preset software to obtain audio information corresponding to the sound signal;
The sound signal converted from the vibration signal is preprocessed with preset software; specifically, preprocessing the converted sound signal with MATLAB software yields the audio information corresponding to the sound signal.
Audio information, i.e., an audio signal, is a carrier of regular sound-wave frequency and amplitude variations carrying speech, music, and sound effects. It is typically an electrical signal that can be received by audio equipment, such as a loudspeaker, and played back as sound.
A sound signal generally refers to the carrier of sound, i.e., a sound wave. Preprocessing the sound signal with the preset software yields the corresponding audio information, converting the sound signal into an electrical signal; this makes the signal convenient to process and improves the feasibility of the gesture recognition scheme.
S23: energy activating is carried out on the audio information;
The audio information includes sound-wave frequency and amplitude information. Because the amplitude of the sound signal generated by the tendon during a gesture is small, the human ear generally cannot hear it under ordinary ambient conditions; that is, the amplitude of the sound signal is small.
To locate the useful information in the audio more conveniently, the audio information needs to be energy-activated, for example by amplifying it according to a preset proportion, so that the information meeting the utilization condition can be found.
Of course, energy activation may be performed by amplifying the audio information according to a preset proportion, or by performing protocol control on a predetermined segment of the audio information; this is not limited herein, and those skilled in the art may energy-activate the audio information in other ways.
S24: and extracting the characteristics of the activated audio information to obtain the characteristics corresponding to the vibration information.
After activation, the characteristics corresponding to the vibration information are more pronounced in the audio information, which facilitates feature extraction from the activated audio and yields the characteristics corresponding to the vibration information.
Specifically, the features here may include the audio frequency and audio amplitude of the audio information, or its mean and variance, which are not specifically limited. In general, the extraction process differs with the feature: for example, the audio frequency may be extracted from how fast the audio oscillates, and the audio amplitude from how large its excursions are, facilitating the subsequent confirmation of the gesture category.
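As a minimal illustration of such statistical features, the sketch below computes the per-frame mean and variance; it assumes the framed audio is held as a NumPy matrix with one frame per row, an assumption of the sketch rather than a detail given in the present application:

```python
# Hypothetical statistical features: per-frame mean and variance.
import numpy as np

def basic_features(frames):
    """Stack per-frame mean and variance into an (n_frames, 2) feature matrix."""
    return np.stack([frames.mean(axis=1), frames.var(axis=1)], axis=1)
```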
Further, referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of step S22 in fig. 4, and the method specifically includes the following steps:
s31: filtering the sound signal by using a low-pass filter to obtain audio information;
Because the surrounding environment easily introduces noise into the vibration information acquired by the sensor, the sound signal may be filtered with a low-pass filter, an electronic filtering device that passes signals below a cut-off frequency and blocks signals above it.
Because the vibration produced by the tendon during a gesture is small while the noise produced by the surrounding environment is large, the difference in signal strength is too great: the audio amplitude corresponding to the tendon's sound signal appears too small, and in post-processing, audio whose amplitude is too small and insufficiently distinct is generally difficult to identify.
Therefore, filtering the sound signal through the low-pass filter yields denoised audio information, which benefits the feasibility of the subsequent scheme.
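A minimal sketch of this filtering step is given below using SciPy's Butterworth design; the cut-off frequency and filter order are assumptions, as the present application does not specify them:

```python
# Hypothetical denoising step: a zero-phase Butterworth low-pass filter.
# The cut-off frequency and order are assumptions; the patent does not fix them.
from scipy.signal import butter, filtfilt

def lowpass(signal, fs=24_000, cutoff_hz=1_000, order=4):
    """Low-pass filter the tendon sound signal without phase distortion."""
    b, a = butter(order, cutoff_hz / (fs / 2.0), btype="low")
    return filtfilt(b, a, signal)
```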
S32: framing the audio information to obtain a plurality of frames corresponding to the audio information;
The denoised audio information comprises a number of frames, and because different gestures produce different frames, the audio information is divided into frames, yielding the frames corresponding to the audio information, which are then processed.
Specifically, in one experiment, the sensor used a 24 kHz sampling frequency, with a frame shift of 510 and a frame length of 510. Each subject performed each gesture 40 times, and the audio information of the 40 actions was divided into about 500 frames each.
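Under those figures (frame length 510, frame shift 510, hence non-overlapping frames), the framing step might be sketched as follows; the function name and array layout are illustrative assumptions:

```python
# Framing sketch: frame length 510 and frame shift 510 give non-overlapping frames.
import numpy as np

def frame_signal(signal, frame_len=510, frame_shift=510):
    """Split a 1-D signal into consecutive frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // frame_shift
    return np.stack([signal[i * frame_shift : i * frame_shift + frame_len]
                     for i in range(n_frames)])
```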
S33: a plurality of frames are processed using a Hamming window, and frames greater than an energy threshold are selected.
A sound signal is a non-stationary, time-varying signal whose production is closely tied to the movement of the sound-producing organ. Since the state of the sound-producing organ changes much more slowly than the sound vibrates, the sound signal can be regarded as stationary over a short time.
Because the actual sound signal is long, it is generally not possible nor necessary to process very long data at a time. In general, the solution is to take one piece of data at a time, analyze it, then take the next piece of data, and analyze it again.
To obtain the valuable frames more quickly, an energy threshold is set for screening the frames. A short piece of audio information usually shows no obvious periodicity; processing the frames with a Hamming window makes the data appear periodic, which facilitates selecting the frames above the energy threshold.
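A minimal sketch of the windowing-and-selection step follows; the energy threshold is left as a tunable parameter, since the present application does not fix its value:

```python
# Windowing-and-selection sketch; the energy threshold is a tunable assumption.
import numpy as np

def select_active_frames(frames, energy_threshold):
    """Apply a Hamming window per frame and keep frames above the threshold."""
    windowed = frames * np.hamming(frames.shape[1])
    energy = np.sum(windowed ** 2, axis=1)  # short-time energy per frame
    return windowed[energy > energy_threshold]
```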
Further, referring to fig. 6, fig. 6 is a flowchart illustrating an embodiment of step S23 in fig. 4, which specifically includes the following steps:
S41: energy activating a plurality of frames;
As above, the sensor uses a 24 kHz sampling frequency, a frame shift of 510, and a frame length of 510. Each subject performed each gesture 40 times, and the audio information of the 40 actions is divided into about 500 frames each.
Specifically, the energy activation is performed on the frames: in this experiment, on the one hand, energy activation is performed on the 500 frames, up to the 2000 frames corresponding to the 40 actions; on the other hand, since there are different subjects and different gestures, the frames here may also represent the audio information frames corresponding to different gestures, or to different subjects.
S42: performing wavelet transformation processing on the frames after the energy activation to obtain a plurality of wavelet coefficient characteristics;
A wavelet is a wave whose energy is highly concentrated in the time domain: its energy is finite, concentrated near a point, and its integral is zero. Whereas the Fourier transform decomposes a sound signal into sine waves of various frequencies, the wavelet transform decomposes a signal into a set of wavelets obtained by shifting and scaling a mother wavelet.
Specifically, the wavelet transform proceeds as follows. Step 1: compare the wavelet w(t) with the beginning of the original function f(t) and compute the coefficient C, which represents how similar that part of the function is to the wavelet. Step 2: shift the wavelet right by k units to obtain w(t - k) and repeat step 1; repeat until the function f is exhausted. Step 3: dilate the wavelet w(t) to obtain w(t/2) and repeat steps 1 and 2. Step 4: keep dilating the wavelet and repeating steps 1, 2, and 3.
Therefore, performing the wavelet transform on the energy-activated frames yields a number of wavelet-coefficient features, for example 31111-dimensional wavelet-coefficient features, i.e., the wavelet coefficients of the frequencies corresponding to each time point; these coefficients serve as the input, with a support vector machine selected as the classification algorithm.
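The sketch below shows one way to produce such wavelet-coefficient features with the PyWavelets library; the wavelet family ('db4') and decomposition level are assumptions, as the present application states only the resulting 31111-dimensional feature size:

```python
# Wavelet-feature sketch using PyWavelets; wavelet family and level are assumed.
import numpy as np
import pywt

def wavelet_features(frames, wavelet="db4", level=4):
    """Concatenate multi-level DWT coefficients into one feature vector per frame."""
    return np.stack([np.concatenate(pywt.wavedec(frame, wavelet, level=level))
                     for frame in frames])
```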
S43: and inputting the characteristics of the wavelet coefficients into a training set according to a preset proportion to obtain activated audio information.
The gesture recognition method sets a preset proportion for training on the obtained wavelet-coefficient features, which on the one hand eliminates unnecessary wavelet-coefficient features and on the other screens out the valuable ones.
Thus the wavelet-coefficient features are fed into the training set at the preset proportion; for example, with a preset proportion of 70%, 70% of the obtained 31111-dimensional wavelet-coefficient features are input to the SVM as the training set, yielding the activated audio information.
Of course, the preset proportion may also be 65%, 75%, 80%, or another value, selected according to actual requirements and not limited herein.
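A sketch of the 70/30 split and SVM training using scikit-learn is given below; the RBF kernel and other hyperparameters are assumptions, since the present application specifies only that a support vector machine is used:

```python
# 70/30 split and SVM training sketch with scikit-learn; the kernel is assumed.
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def train_gesture_svm(features, labels, train_ratio=0.70):
    """Train an SVM on a train_ratio split and report test-set accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, train_size=train_ratio, stratify=labels)
    clf = SVC(kernel="rbf").fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)  # e.g. ~92% average in the experiments
```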
Still further, the frames are processed with a Hamming window and the frames above the energy threshold are selected. Referring to fig. 7, a flowchart of an embodiment of step S33 in fig. 5, this specifically includes the following steps:
S51: judging whether the energy value of the frame is larger than an energy threshold value or not by utilizing a Hamming window;
The Hamming window is in fact a function, used mainly against the picket-fence effect caused by truncation. The picket-fence effect is the phenomenon whereby, when the spectrum computed from the transformed audio information is restricted to integer multiples of the fundamental frequency, the output is visible only at the corresponding discrete points.
A plain Hamming window represents only the data in the middle and loses the information at the two edges; but by moving the window along, e.g., in steps of 1/3 or 1/2 of a window, the data lost from the previous frame or two through the picket-fence effect is represented again. The effective energy of the frames can then be judged, e.g., using the Hamming window to decide whether a frame's energy value exceeds the energy threshold.
If it does, the flow proceeds to step S52, i.e., the frame above the energy threshold is selected. If not, the flow proceeds to step S53: the frame is discarded, the next frame is selected, and the flow returns to step S51 for the next judgment.
Further, based on the characteristics, the vibration information is classified to obtain gesture types corresponding to tendons, referring to fig. 8, fig. 8 is a flow chart of an embodiment of step S13 in fig. 3, which specifically includes the following steps:
S61: matching the characteristics corresponding to the vibration information with the characteristics corresponding to the gestures in a preset gesture library;
A gesture library is preset, in which the characteristics corresponding to each gesture category are stored in advance. To distinguish the gesture category corresponding to the vibration information more quickly, the characteristics corresponding to the vibration information are matched against the characteristics of the gestures in the preset library: the two are compared, and from where they agree and where they differ, the gesture category corresponding to the vibration information is determined.
S62: judging whether the characteristics corresponding to the vibration information are successfully matched with the characteristics corresponding to the gestures in a preset gesture library;
Through judgment, whether the characteristics corresponding to the vibration information are successfully matched with the characteristics corresponding to the gestures in the preset gesture library or not can be known.
If the match succeeds, the flow proceeds to step S63: the gesture corresponding to the vibration information is determined to be the preset gesture category, and specifically a voice prompt or a pop-up prompt may be given. If the match fails, the flow proceeds to step S64: a pop-up window gives a warning prompt. Of course, this is only one feedback mechanism; other prompting modes are possible and not limited herein.
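For illustration, a minimal sketch of the matching step against a preset gesture library follows; the distance metric, tolerance, and nearest-match rule are assumptions, since the present application describes only comparing extracted features against pre-stored ones:

```python
# Matching sketch against a preset gesture library; metric and tolerance assumed.
import numpy as np

def match_gesture(feature, gesture_library, tolerance=1.0):
    """gesture_library maps names such as 'HS', 'FR', 'FM' to stored feature vectors."""
    best_name = min(gesture_library,
                    key=lambda g: np.linalg.norm(feature - gesture_library[g]))
    if np.linalg.norm(feature - gesture_library[best_name]) <= tolerance:
        return best_name  # matched: a preset gesture category
    return None           # no match: trigger the warning prompt
```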
Further, to measure the accuracy of the gesture recognition method of the embodiment of the present application, a series of experiments were performed. Please refer to fig. 9, a schematic diagram of the experimental results of an embodiment of the gesture recognition method of the present application, wherein the ordinate indicates the accuracy and the abscissa indicates the subject number.
The experiment collected data from 5 subjects aged between 23 and 27; each subject performed each gesture 40 times, with each subject constituting one group of experiments: the first subject is the first group, the second subject the second group, and so on through the fifth.
The audio information of the 40 actions is divided into about 500 frames each. The frames then undergo the wavelet transform, with 70% used as the training set and the remaining 30% as the test set. Signals were collected at a 24 kHz sampling frequency with a frame shift of 510, a frame length of 510, and a Hamming window. The 31111-dimensional features extracted after the wavelet transform serve as the SVM input. The test results are shown in fig. 9: the third subject has the highest accuracy, 95.16%, and the average accuracy over the 5 subjects is 92.332%.
Thus, over multiple measurements, the average accuracy of gesture recognition is 92.332%: 93.96% in the first group of experiments, 90.87% in the second, 95.16% in the third, 90.72% in the fourth, and 90.95% in the fifth.
Therefore, the sound-signal-based gesture classification method of the present application has the following advantages: 1) compared with schemes that attach many sensors to the human body, collecting the sound of hand motion at the tendon requires only one sensor, effectively avoiding the wearing discomfort caused by too many sensors; 2) compared with a vision-based scheme, it is easier to develop, and since the bio-sensing signal responds faster than vision during collection, gestures are recognized more quickly; 3) using a bone-conduction sensor to collect the sound signal effectively avoids the influence of environmental noise on the collected signal.
Further, referring to fig. 10, fig. 10 is a schematic block diagram of an embodiment of a gesture recognition apparatus of the present application. The second aspect of the embodiment of the present application provides a gesture recognition apparatus 4, which includes a processor 41 and a memory 42, where the memory 42 stores a computer program 421, and the processor 41 is configured to execute the computer program 421 to perform the recognition method of the first aspect of the embodiment of the present application, which is not described herein.
Further, referring to fig. 11, fig. 11 is a schematic block diagram illustrating an embodiment of a gesture recognition system of the present application. A third aspect of the embodiments of the present application further provides a gesture recognition system 5, the gesture recognition system 5 comprising: a sensor 51 configured to be fixed to a tendon site of a hand for acquiring vibration information of the tendon site; a processing device 52, configured to process the vibration information to obtain characteristics corresponding to the vibration information; an extracting device 53 for extracting a feature corresponding to the vibration information; the gesture recognition apparatus 54 is connected to the sensor 51, the processing device 52 and the extracting device 53, and is used for executing the recognition method according to the first aspect of the embodiment of the present application, which is not described herein.
Referring to FIG. 12, FIG. 12 is a schematic block diagram of one embodiment of a computer-readable storage medium of the present application. The fourth aspect of the embodiments of the present application also provides a computer-readable storage medium: if implemented in the form of a software functional unit and sold or used as a stand-alone product, the solution can be stored in the computer-readable storage medium 60. Based on this understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, in whole or in part, may be embodied in the form of a software product stored in a storage device, including several instructions (the computer program 61) for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage device includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc, as well as electronic devices equipped with such media, such as computers, mobile phones, notebook computers, tablet computers, and cameras.
For the execution process of the computer program in the computer-readable storage medium, reference may be made to the above embodiments of the method of the gesture recognition apparatus 4 of the present application, which is not repeated here.
The foregoing description is only a partial embodiment of the present application, and is not intended to limit the scope of the present application, and all equivalent devices or equivalent processes using the descriptions and the drawings of the present application or directly or indirectly applied to other related technical fields are included in the scope of the present application.

Claims (9)

1. A method of gesture recognition, the method comprising:
obtaining vibration information on a tendon;
processing the vibration information to obtain characteristics corresponding to the vibration information; and
classifying the vibration information based on the characteristics to obtain a gesture category corresponding to the tendon;
wherein the processing the vibration information comprises:
converting the vibration information to obtain a sound signal of the tendon;
preprocessing the sound signal by adopting preset software to obtain audio information corresponding to the sound signal;
energy activating the audio information; and
extracting characteristics of the activated audio information to obtain the characteristics corresponding to the vibration information;
wherein the energy activating the audio information comprises:
performing energy activation on the audio information by performing protocol control on a predetermined segment of the audio information.
2. The method according to claim 1, wherein the preprocessing the sound signal by adopting preset software comprises:
Filtering the sound signal by using a low-pass filter to obtain the audio information;
framing the audio information to obtain a plurality of frames corresponding to the audio information;
a plurality of the frames are processed using a Hamming window, and frames greater than an energy threshold are selected.
3. The method according to claim 2, wherein the energy activating the audio information comprises:
energy activating a plurality of the frames;
Performing wavelet transformation processing on a plurality of frames after energy activation to obtain a plurality of wavelet coefficient characteristics;
And inputting a plurality of wavelet coefficient characteristics into a training set according to a preset proportion to obtain the activated audio information.
4. The method according to claim 2, wherein the processing a plurality of the frames using a Hamming window and selecting frames greater than an energy threshold comprises:
judging whether the energy value of the frame is larger than an energy threshold value or not by utilizing a Hamming window;
if so, it is determined to select frames greater than the energy threshold.
5. The method according to claim 1, wherein the classifying the vibration information based on the characteristics to obtain the gesture category corresponding to the tendon comprises:
matching the characteristics corresponding to the vibration information with the characteristics corresponding to the gestures in a preset gesture library;
judging whether the characteristics corresponding to the vibration information are successfully matched with the characteristics corresponding to the gestures in a preset gesture library or not;
If the matching is successful, determining that the gesture corresponding to the vibration information is a preset gesture type.
6. The method according to claim 1, wherein the obtaining vibration information on a tendon comprises:
and measuring the vibration of the tendon by a sensor to obtain the vibration information.
7. A gesture recognition apparatus, comprising a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute the computer program to implement the method of any one of claims 1-6.
8. A gesture recognition system, the gesture recognition system comprising:
A sensor configured to be fixed to a tendon site of a hand for acquiring vibration information of the tendon site;
The processing equipment is used for processing the vibration information to obtain the corresponding characteristics of the vibration information;
The extraction equipment is used for extracting the characteristics corresponding to the vibration information;
Gesture recognition means connected to said sensor, to said processing device and to said extraction device for performing the method according to any of claims 1-6.
9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method according to any of claims 1-6.
CN202110786284.1A 2021-07-12 2021-07-12 Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium Active CN113703568B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110786284.1A CN113703568B (en) 2021-07-12 2021-07-12 Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110786284.1A CN113703568B (en) 2021-07-12 2021-07-12 Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium

Publications (2)

Publication Number Publication Date
CN113703568A CN113703568A (en) 2021-11-26
CN113703568B true CN113703568B (en) 2024-06-21

Family

ID=78648480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110786284.1A Active CN113703568B (en) 2021-07-12 2021-07-12 Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium

Country Status (1)

Country Link
CN (1) CN113703568B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3193317A1 (en) * 2016-01-15 2017-07-19 Thomson Licensing Activity classification from audio

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2698686B1 (en) * 2012-07-27 2018-10-10 LG Electronics Inc. Wrist-wearable terminal and control method thereof
US10216274B2 (en) * 2014-06-23 2019-02-26 North Inc. Systems, articles, and methods for wearable human-electronics interface devices
US20170215768A1 (en) * 2016-02-03 2017-08-03 Flicktek Ltd. Wearable controller for wrist
CN107145236B (en) * 2017-05-12 2020-02-07 中国科学技术大学 Gesture recognition method and system based on wrist tendon pressure related characteristics
WO2020186477A1 (en) * 2019-03-20 2020-09-24 深圳大学 Intelligent input method and system based on bone conduction
CN111103976B (en) * 2019-12-05 2023-05-02 深圳职业技术学院 Gesture recognition method and device and electronic equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3193317A1 (en) * 2016-01-15 2017-07-19 Thomson Licensing Activity classification from audio

Also Published As

Publication number Publication date
CN113703568A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
EP3191924B1 (en) Method and apparatus for differentiating touch screen users based on touch event analysis
Ajmera et al. Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram
EP2064698B1 (en) A method and a system for providing sound generation instructions
CN110069199B (en) Skin type finger gesture recognition method based on smart watch
Strese et al. Surface classification using acceleration signals recorded during human freehand movement
Bansal et al. Environmental Sound Classification: A descriptive review of the literature
Majidnezhad et al. An ANN-based method for detecting vocal fold pathology
Rai et al. An automatic classification of bird species using audio feature extraction and support vector machines
Chaki Pattern analysis based acoustic signal processing: a survey of the state-of-art
Pan et al. Cognitive acoustic analytics service for Internet of Things
US6751580B1 (en) Tornado recognition system and associated methods
Ryu et al. Embedded identification of surface based on multirate sensor fusion with deep neural network
Rahman et al. Dynamic time warping assisted svm classifier for bangla speech recognition
Ribeiro et al. Binary neural networks for classification of voice commands from throat microphone
CN113703568B (en) Gesture recognition method, gesture recognition device, gesture recognition system, and storage medium
Mielke et al. Smartphone application for automatic classification of environmental sound
Sephus et al. Modulation spectral features: In pursuit of invariant representations of music with application to unsupervised source identification
Rao Audio signal processing
Boualoulou et al. Speech analysis for the detection of Parkinson’s disease by combined use of empirical mode decomposition, Mel frequency cepstral coefficients, and the K-nearest neighbor classifier
Abinaya Acoustic based scene event identification using deep learning cnn
Okubo et al. Recognition of transient environmental sounds based on temporal and frequency features
Hesham Wavelet-scalogram based study of non-periodicity in speech signals as a complementary measure of chaotic content
Kumar et al. Raaga identification using clustering algorithm
Cenedese et al. A parsimonious approach for activity recognition with wearable devices: An application to cross-country skiing
Naronglerdrit et al. Monitoring of indoors human activities using mobile phone audio recordings

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant