CN111913585A - Gesture recognition method, device, equipment and storage medium


Info

Publication number
CN111913585A
CN111913585A (application CN202010997902.2A)
Authority
CN
China
Prior art keywords
video data
hand
control instruction
target
text information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010997902.2A
Other languages
Chinese (zh)
Inventor
李文栋 (Li Wendong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Intelligent Connectivity Beijing Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010997902.2A, published as CN111913585A
Legal status: Withdrawn

Classifications

    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22: Matching criteria, e.g. proximity measures
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06V 40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a gesture recognition method, apparatus, device and storage medium, and relates to the fields of computer vision and artificial intelligence. The specific implementation scheme is as follows: acquiring video data containing hand motions; obtaining a target movement trajectory based on the hand motion presented by the video data; obtaining text information corresponding to the hand motion presented by the video data based on the target movement trajectory, and determining a control instruction corresponding to the text information, wherein the control instruction can instruct a target device to perform a corresponding operation. In this way, hand motions can be effectively recognized, in particular mid-air handwriting motions, and corresponding control instructions generated, which enriches the modes and usage scenarios of gesture recognition and improves the user experience.

Description

Gesture recognition method, device, equipment and storage medium
Technical Field
The application relates to the field of computers, in particular to the fields of computer vision and artificial intelligence. The present application is also applicable to the field of autonomous driving.
Background
Existing human-computer interaction scenarios include multiple interaction modes, such as function keys and knobs, touch screens, voice recognition, and gesture recognition, which enrich application scenarios and improve the user experience. However, most existing gesture-recognition interaction modes must be combined with a touch screen, which limits their usage scenarios; for example, in an in-vehicle interaction scenario, implementing the interaction function through a touch screen inevitably increases safety hazards during driving.
Disclosure of Invention
The application provides a gesture recognition method, a gesture recognition device, gesture recognition equipment and a storage medium.
According to an aspect of the present application, there is provided a gesture recognition method including:
acquiring video data containing hand movements;
obtaining a target movement trajectory based on the hand motion presented by the video data;
obtaining text information corresponding to the hand motion presented by the video data based on the target movement trajectory, and determining a control instruction corresponding to the text information, wherein the control instruction can instruct a target device to perform a corresponding operation.
According to another aspect of the present application, there is provided a gesture recognition apparatus including:
a data acquisition unit, configured to acquire video data containing hand motions;
a trajectory determining unit, configured to obtain a target movement trajectory based on the hand motion presented by the video data;
a text information determining unit, configured to obtain text information corresponding to the hand motion presented by the video data based on the target movement trajectory;
and an instruction determining unit, configured to determine a control instruction corresponding to the text information, wherein the control instruction can instruct the target device to perform a corresponding operation.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method described above.
According to this scheme, hand motions, for example mid-air handwriting motions, are effectively recognized and corresponding control instructions are generated, which enriches the modes and usage scenarios of gesture recognition and lays a foundation for simplifying user operation and improving the user experience.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a schematic flowchart of a first implementation of a gesture recognition method according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a second implementation of a gesture recognition method according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a gesture recognition apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of an electronic device for implementing a gesture recognition method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details to aid understanding, which are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present application provides a gesture recognition method, applied to a gesture recognition apparatus. Specifically, fig. 1 is a schematic flowchart of an implementation of the gesture recognition method according to an embodiment of the present application; as shown in fig. 1, the method includes:
step S101: video data including hand movements is acquired.
Step S102: and obtaining a target movement track based on the hand motion presented by the video data.
Step S103: obtaining text information corresponding to the hand motion presented by the video data based on the target moving track, and determining a control instruction corresponding to the text information, wherein the control instruction can instruct a target device to perform corresponding operation.
Therefore, the hand action can be effectively recognized, for example, the spaced handwriting action is recognized, the corresponding text information is obtained, and then the control instruction corresponding to the text information is obtained, so that the gesture recognition mode is enriched, the gesture recognition use scene is enriched, and a foundation is laid for simplifying the user operation and improving the user experience.
Here, in practical applications, a mapping relationship between text information and control instructions may be preset; after the text information is determined, the corresponding control instruction can be looked up in the preset mapping table, thereby completing the control operation.
In practical applications, the text information mainly consists of characters, for example a single character or a character string, where a character may be a symbol, a letter, or a word; the control instruction corresponding to the text information is then determined based on the semantics that the text information conveys.
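To make the flow concrete, below is a minimal Python sketch of steps S101 to S103 together with the mapping lookup described above. The two model stubs and the INSTRUCTION_MAP entries are hypothetical placeholders; the patent does not disclose concrete models or table contents.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical stand-ins for the two trained networks; the patent does not
# disclose their architectures or interfaces.
def extract_trajectory(frames: List[object]) -> List[Point]:
    """Stand-in for the trajectory network (step S102)."""
    return [(0.0, 0.0), (1.0, 1.0)]  # dummy two-point track

def trajectory_to_text(trajectory: List[Point]) -> str:
    """Stand-in for the handwriting recognition model (step S103)."""
    return "play songs"  # fixed dummy result

# Assumed text-to-instruction mapping table; real entries are product-specific.
INSTRUCTION_MAP = {
    "play songs": "CMD_MUSIC_PLAY_RANDOM",
    "temperature up": "CMD_AC_TEMP_UP",
}

def recognize_gesture(frames: List[object]) -> Optional[str]:
    trajectory = extract_trajectory(frames)  # S102: target movement trajectory
    text = trajectory_to_text(trajectory)    # S103: text information
    return INSTRUCTION_MAP.get(text)         # mapping table lookup; None if no match

print(recognize_gesture([]))  # -> CMD_MUSIC_PLAY_RANDOM
```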
Here, the scheme of the application can be applied to in-vehicle interaction scenarios. For example, while a vehicle is being driven, the driver does not need to operate a touch screen or physical keys; the in-vehicle system, such as an on-board device, can be controlled through mid-air hand motions. Compared with existing touch or key operation, this enriches human-computer interaction modes, improves the user experience, and lays a foundation for meeting the needs of different users. Moreover, because the scheme can recognize mid-air handwriting and thus provide a mid-air handwriting input function, it can effectively avoid safety hazards during driving in the in-vehicle interaction scenario, laying a foundation for improving driving safety.
In a specific example of the scheme of the application, considering users' habits in real scenarios, such as writing in the air with a finger, and in order to improve recognition accuracy and avoid invalid recognition, only the movement trajectory of the finger may be determined and taken as the target movement trajectory. Specifically, step S102 may include: obtaining the target movement trajectory based on finger movement features in the hand motion presented by the video data. This yields an effective target movement trajectory and lays a foundation for subsequently improving the accuracy of text recognition.
In a specific example of the scheme of the application, a pre-trained model can be used for recognition to further improve accuracy; this improves recognition efficiency on the one hand and the accuracy of the recognition result on the other. Specifically, step S102 may include: inputting a sequence of video frames from the video data, containing at least the hand motion, into a preset neural network model to obtain the target movement trajectory. The preset neural network model is trained on sample videos annotated with movement trajectories; each sample video contains hand motions, and the annotated trajectory matches the hand motion of the sample video.
In practical applications, considering model processing efficiency, the acquired video data can be preprocessed to remove video frames that do not contain hand motion, yielding a sequence of frames that do; only this sequence is then input into the preset neural network model for trajectory recognition, which improves the model's processing efficiency.
Of course, in an actual scenario, considering the image processing and computing capability of the device, all of the video data may instead be input into the preset neural network model to identify the movement trajectory. The choice can be made based on the device's actual processing capability, which the present application does not limit.
Likewise, considering users' habits in real scenarios, such as writing in the air with a finger, only the video frame sequence containing finger movement features may be recognized to obtain the target movement trajectory, which improves recognition efficiency and lays a foundation for subsequently improving the accuracy of the recognition result.
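The preprocessing and inference just described might look as follows. Here detect_hand is a hypothetical detector callable (a real system could plug in a detection network), and the model is assumed to expose a Keras-style predict method returning fingertip coordinates; neither interface is specified by the patent.

```python
from typing import Callable, List
import numpy as np

def filter_hand_frames(frames: List[np.ndarray],
                       detect_hand: Callable[[np.ndarray], bool]) -> List[np.ndarray]:
    """Drop frames without hand motion so the model only processes relevant input."""
    return [f for f in frames if detect_hand(f)]

def predict_trajectory(model, frames: List[np.ndarray]) -> np.ndarray:
    """Run the preset neural network on a frame sequence of shape (T, H, W, C)
    and return a (T, 2) array of fingertip coordinates forming the trajectory."""
    batch = np.stack(frames)[np.newaxis, ...]  # add a batch dimension: (1, T, H, W, C)
    return model.predict(batch)[0]             # assumed output shape: (1, T, 2)
```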
In a specific example of the present application, when performing text recognition, a text recognition model may be used to recognize the target movement trajectory. Specifically, in step S103, obtaining text information corresponding to the hand motion presented by the video data based on the target movement trajectory includes: inputting the target movement trajectory into a preset recognition model to obtain probability features (such as probability values) indicating that the trajectory represents preset characters; and determining the character information represented by the trajectory based on those probability features, so as to obtain the text information corresponding to the hand motion presented by the video data. Here, the preset recognition model is trained on mapping relationships between movement trajectories and characters.
That is, in this example, a preset recognition model (for example, a text recognition model) yields the probability that the target movement trajectory corresponds to a certain character or character string, from which the text information of the trajectory is determined; for example, the text content of the character or character string whose probability exceeds a preset threshold is taken as the text information of the target movement trajectory. This improves recognition efficiency as well as the accuracy of the recognition result.
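A sketch of this threshold rule is below; the candidate inventory and the threshold value are assumptions for illustration, not values from the patent.

```python
from typing import Optional
import numpy as np

CANDIDATES = ["play songs", "pause", "next song", "volume up"]  # assumed inventory
THRESHOLD = 0.6                                                 # assumed preset threshold

def decode_text(probabilities: np.ndarray) -> Optional[str]:
    """Return the candidate with the highest probability if it clears the
    preset threshold; otherwise no text information is produced."""
    best = int(np.argmax(probabilities))
    return CANDIDATES[best] if probabilities[best] > THRESHOLD else None

print(decode_text(np.array([0.05, 0.10, 0.80, 0.05])))  # -> next song
```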
In a specific example of the present application, the gesture recognition apparatus implementing the gesture recognition method can also control the state of the image acquisition device; for example, as shown in fig. 2, the method further includes:
Step S001: detecting a start instruction. In practical applications, the start instruction may be an instruction generated from a voice input, or a trigger generated by another user operation; the present application does not limit this.
Step S002: in response to the start instruction, triggering the image acquisition device indicated by the start instruction to capture images of hand motions in its acquisition area, obtaining video data.
This further improves the intelligence of gesture recognition and lays a foundation for effectively recognizing hand motions. Moreover, because the image acquisition device is started by a start instruction, the scheme can be configured flexibly according to the requirements of the actual scenario, laying a foundation for meeting the needs of users in different scenarios and enriching the user experience.
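As a sketch of steps S001 and S002, the snippet below waits for a start signal and then records a short clip from the indicated camera using OpenCV. The queue-based signal channel is an assumed mechanism; a real system might receive the instruction from a voice recognizer or a button handler instead.

```python
import queue
import cv2

start_signals: "queue.Queue[int]" = queue.Queue()  # assumed channel carrying a camera index

def capture_on_start(max_frames: int = 120):
    """Block until a start instruction arrives (S001), then capture frames
    of the acquisition area from the indicated camera (S002)."""
    camera_index = start_signals.get()    # S001: a start instruction is detected
    cap = cv2.VideoCapture(camera_index)  # S002: open the indicated device
    frames = []
    try:
        while len(frames) < max_frames:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(frame)
    finally:
        cap.release()
    return frames
```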
In a specific example of the scheme of the application, the control instruction may be an instruction in an in-vehicle environment. In that case, based on the control instruction, the vehicle indicated by the instruction may be operated accordingly, for example controlling the air conditioning to adjust the temperature; or an on-board device in the indicated vehicle may be operated accordingly, for example controlling a smart speaker to play music. This simplifies user operation, meets users' human-computer interaction needs, and further improves the user experience.
In this way, hand motions can be effectively recognized, for example mid-air handwriting motions, to obtain the corresponding text information and, from it, the corresponding control instruction. This enriches the modes and usage scenarios of gesture recognition and lays a foundation for simplifying user operation and improving the user experience.
The following provides further detail with reference to a specific example. Specifically, the present application combines image-based gesture recognition with handwriting-recognition input: a camera collects gesture images of the user, characters or character strings written by the user in mid-air are recognized, and the recognized text is converted into a corresponding software function. This implements a mid-air handwriting recognition input function analogous to voice recognition, greatly expands in-vehicle image-recognition interaction, avoids safety hazards during driving, and lays a foundation for improving driving safety.
The present example specifically contains several key conditions and steps:
In an in-vehicle environment, camera hardware suitable for image acquisition and a matching software environment are configured; for example, the apparatus implementing the scheme of the present application is integrated into the vehicle's central control unit, so that the central control unit can provide the mid-air handwriting recognition input function. On this basis, after the camera is started (for example, it starts image acquisition by itself once the vehicle is started, is triggered by a user operation, or is started by the user's voice), the user writes characters in mid-air through gestures, for example 'play songs'. The camera captures a video of the handwriting gesture, and a first convolutional neural network, implementing image gesture recognition, recognizes the trajectory drawn by the user's finger. The recognized trajectory is then converted into the corresponding text by a second convolutional neural network, which implements handwriting input recognition. Finally, the recognized text is mapped to the corresponding software function operation, for example a control instruction that starts the music player, which then plays songs at random, realizing human-computer interaction based on mid-air gestures.
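The final hand-off from recognized text to a software function could be as simple as the dispatch table sketched below; the MusicPlayer class and the handler entries are illustrative assumptions, not part of the patent.

```python
import random

class MusicPlayer:
    """Illustrative stand-in for the in-vehicle music player."""
    def play_random(self) -> None:
        print(f"playing track #{random.randint(1, 100)}")

player = MusicPlayer()

# Recognized text -> software function operation (assumed entries).
HANDLERS = {
    "play songs": player.play_random,
}

def dispatch(text: str) -> None:
    handler = HANDLERS.get(text)
    if handler is not None:
        handler()  # e.g. start the music player and play a random song

dispatch("play songs")  # triggered by the mid-air handwriting "play songs"
```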
By combining image gesture recognition, handwriting input recognition, and semantic analysis, the scheme greatly expands the uses of gesture recognition. It is particularly suitable for scenarios in which a driver needs to interact with the in-vehicle system, and it improves interaction safety at the same time.
In addition, during the recognition process of this scheme, the user does not need to memorize special gestures, which improves convenience and simplifies user operation.
The present application further provides a gesture recognition apparatus, as shown in fig. 3, including:
a data acquisition unit 301 configured to acquire video data including a hand motion;
a trajectory determining unit 302, configured to obtain a target movement trajectory based on the hand motion presented by the video data;
a text information determining unit 303, configured to obtain text information corresponding to the hand motion presented by the video data based on the target movement trajectory;
an instruction determining unit 304, configured to determine a control instruction corresponding to the text information, where the control instruction is capable of instructing a target device to perform a corresponding operation.
In a specific example of the scheme of the application, the trajectory determining unit is further configured to obtain the target movement trajectory based on finger movement features in the hand motion presented by the video data.
In a specific example of the application, the trajectory determining unit is further configured to input a sequence of video frames from the video data, containing at least the hand motion, into a preset neural network model to obtain the target movement trajectory; the preset neural network model is trained on sample videos annotated with movement trajectories, where each sample video contains hand motions and the annotated trajectory matches the hand motion of the sample video.
In a specific example of the scheme of the present application, the text information determining unit includes:
a model subunit, configured to input the target movement trajectory into a preset recognition model to obtain probability features indicating that the trajectory represents preset characters;
and a character processing subunit, configured to determine the character information represented by the target movement trajectory based on those probability features, so as to obtain the text information corresponding to the hand motion presented by the video data.
In a specific example of the scheme of the present application, the apparatus further includes:
a starting unit, configured to detect a start instruction;
and an image acquisition unit, configured to respond to the start instruction by triggering the image acquisition device indicated by the start instruction to capture images of hand motions in its acquisition area, obtaining video data.
In a specific example of the scheme of the present application, the apparatus further includes:
a control unit, configured to perform, based on the control instruction, a corresponding operation on the vehicle indicated by the control instruction; or to perform, based on the control instruction, a corresponding operation on an on-board device in the vehicle indicated by the control instruction.
In this way, hand motions can be effectively recognized, for example mid-air handwriting motions, to obtain the corresponding text information and, from it, the corresponding control instruction. This enriches the modes and usage scenarios of gesture recognition and lays a foundation for simplifying user operation and improving the user experience.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 4, the electronic apparatus includes: one or more processors 401, a memory 402, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other ways as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 4, one processor 401 is taken as an example.
Memory 402 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the gesture recognition methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the gesture recognition method provided herein.
The memory 402, which is a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the gesture recognition method in the embodiment of the present application (for example, the data acquisition unit 301, the trajectory determination unit 302, the text information determination unit 303, and the instruction determination unit 304 shown in fig. 3). The processor 401 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 402, that is, implements the gesture recognition method in the above method embodiment.
The memory 402 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device by the gesture recognition method, and the like. Further, the memory 402 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 402 may optionally include memory located remotely from the processor 401, which may be connected to the gesture recognition method electronics over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the gesture recognition method may further include: an input device 403 and an output device 404. The processor 401, the memory 402, the input device 403 and the output device 404 may be connected by a bus or other means, and fig. 4 illustrates an example of a connection by a bus.
The input device 403 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the gesture recognition method, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, or other input devices. The output devices 404 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system, which overcomes the drawbacks of difficult management and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services.
According to the technical scheme of the embodiments of the application, hand motions can be effectively recognized, for example mid-air handwriting motions, to obtain the corresponding text information and the control instruction corresponding to that text information. This enriches the modes and usage scenarios of gesture recognition and lays a foundation for simplifying user operation and improving the user experience.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; the present application is not limited in this regard as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A gesture recognition method, comprising:
acquiring video data containing hand movements;
obtaining a target movement trajectory based on the hand motion presented by the video data;
obtaining text information corresponding to the hand motion presented by the video data based on the target movement trajectory, and determining a control instruction corresponding to the text information, wherein the control instruction can instruct a target device to perform a corresponding operation.
2. The method of claim 1, wherein the obtaining a target movement trajectory based on the hand motion presented by the video data comprises:
obtaining the target movement trajectory based on finger movement features in the hand motion presented by the video data.
3. The method of claim 1 or 2, wherein the obtaining a target movement trajectory based on the hand motion presented by the video data comprises:
inputting a sequence of video frames from the video data, containing at least the hand motion, into a preset neural network model to obtain the target movement trajectory; wherein the preset neural network model is trained on sample videos annotated with movement trajectories, each sample video contains hand motions, and the annotated trajectory matches the hand motion of the sample video.
4. The method of claim 1, wherein the obtaining text information corresponding to the hand motion presented by the video data based on the target movement trajectory comprises:
inputting the target movement trajectory into a preset recognition model to obtain probability features indicating that the trajectory represents preset characters;
and determining the character information represented by the target movement trajectory based on those probability features, so as to obtain the text information corresponding to the hand motion presented by the video data.
5. The method of claim 1, further comprising:
detecting a start instruction;
and in response to the start instruction, triggering the image acquisition device indicated by the start instruction to capture images of hand motions in its acquisition area, obtaining video data.
6. The method of claim 1 or 5, further comprising:
based on the control instruction, performing a corresponding operation on the vehicle indicated by the control instruction; or,
performing a corresponding operation on an on-board device in the vehicle indicated by the control instruction based on the control instruction.
7. A gesture recognition apparatus comprising:
a data acquisition unit, configured to acquire video data containing hand motions;
a trajectory determining unit, configured to obtain a target movement trajectory based on the hand motion presented by the video data;
a text information determining unit, configured to obtain text information corresponding to the hand motion presented by the video data based on the target movement trajectory;
and an instruction determining unit, configured to determine a control instruction corresponding to the text information, wherein the control instruction can instruct the target device to perform a corresponding operation.
8. The apparatus according to claim 7, wherein the trajectory determining unit is further configured to obtain the target movement trajectory based on finger movement features in the hand motion presented by the video data.
9. The apparatus according to claim 7 or 8, wherein the trajectory determining unit is further configured to input a sequence of video frames from the video data, containing at least the hand motion, into a preset neural network model to obtain the target movement trajectory; wherein the preset neural network model is trained on sample videos annotated with movement trajectories, each sample video contains hand motions, and the annotated trajectory matches the hand motion of the sample video.
10. The apparatus of claim 7, wherein the text information determining unit comprises:
a model subunit, configured to input the target movement trajectory into a preset recognition model to obtain probability features indicating that the trajectory represents preset characters;
and a character processing subunit, configured to determine the character information represented by the target movement trajectory based on those probability features, so as to obtain the text information corresponding to the hand motion presented by the video data.
11. The apparatus of claim 7, further comprising:
a starting unit, configured to detect a start instruction;
and an image acquisition unit, configured to respond to the start instruction by triggering the image acquisition device indicated by the start instruction to capture images of hand motions in its acquisition area, obtaining video data.
12. The apparatus of claim 7 or 11, further comprising:
a control unit, configured to perform, based on the control instruction, a corresponding operation on the vehicle indicated by the control instruction; or to perform, based on the control instruction, a corresponding operation on an on-board device in the vehicle indicated by the control instruction.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202010997902.2A 2020-09-21 2020-09-21 Gesture recognition method, device, equipment and storage medium Withdrawn CN111913585A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010997902.2A CN111913585A (en) 2020-09-21 2020-09-21 Gesture recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010997902.2A CN111913585A (en) 2020-09-21 2020-09-21 Gesture recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111913585A true CN111913585A (en) 2020-11-10

Family

ID=73265348

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010997902.2A Withdrawn CN111913585A (en) 2020-09-21 2020-09-21 Gesture recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111913585A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112274920A (en) * 2020-11-24 2021-01-29 智博云信息科技(广州)有限公司 Virtual reality gesture control method, platform, server and readable storage medium
CN112527110A (en) * 2020-12-04 2021-03-19 北京百度网讯科技有限公司 Non-contact interaction method and device, electronic equipment and medium
CN113038216A (en) * 2021-03-10 2021-06-25 深圳创维-Rgb电子有限公司 Instruction obtaining method, television, server and storage medium
CN113204283A (en) * 2021-04-30 2021-08-03 Oppo广东移动通信有限公司 Text input method, text input device, storage medium and electronic equipment
CN113325950A (en) * 2021-05-27 2021-08-31 百度在线网络技术(北京)有限公司 Function control method, device, equipment and storage medium
CN113810536A (en) * 2021-08-02 2021-12-17 惠州Tcl移动通信有限公司 Method, device and terminal for displaying information based on motion trajectory of human body in video

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092343A (en) * 2013-01-06 2013-05-08 深圳创维数字技术股份有限公司 Control method based on camera and mobile terminal
CN104216514A (en) * 2014-07-08 2014-12-17 深圳市华宝电子科技有限公司 Method and device for controlling vehicle-mounted device, and vehicle
CN105579319A (en) * 2013-03-12 2016-05-11 罗伯特·博世有限公司 System and method for identifying handwriting gestures in an in-vehicle information system
CN106295599A (en) * 2016-08-18 2017-01-04 乐视控股(北京)有限公司 The control method of vehicle and device
CN108170266A (en) * 2017-12-25 2018-06-15 珠海市君天电子科技有限公司 Smart machine control method, device and equipment
CN109033954A (en) * 2018-06-15 2018-12-18 西安科技大学 A kind of aerial hand-written discrimination system and method based on machine vision
CN109032356A (en) * 2018-07-27 2018-12-18 深圳绿米联创科技有限公司 Sign language control method, apparatus and system
CN111367415A (en) * 2020-03-17 2020-07-03 北京明略软件***有限公司 Equipment control method and device, computer equipment and medium

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092343A (en) * 2013-01-06 2013-05-08 深圳创维数字技术股份有限公司 Control method based on camera and mobile terminal
CN105579319A (en) * 2013-03-12 2016-05-11 罗伯特·博世有限公司 System and method for identifying handwriting gestures in an in-vehicle information system
CN104216514A (en) * 2014-07-08 2014-12-17 深圳市华宝电子科技有限公司 Method and device for controlling vehicle-mounted device, and vehicle
CN106295599A (en) * 2016-08-18 2017-01-04 乐视控股(北京)有限公司 The control method of vehicle and device
CN108170266A (en) * 2017-12-25 2018-06-15 珠海市君天电子科技有限公司 Smart machine control method, device and equipment
CN109033954A (en) * 2018-06-15 2018-12-18 西安科技大学 A kind of aerial hand-written discrimination system and method based on machine vision
CN109032356A (en) * 2018-07-27 2018-12-18 深圳绿米联创科技有限公司 Sign language control method, apparatus and system
CN111367415A (en) * 2020-03-17 2020-07-03 北京明略软件***有限公司 Equipment control method and device, computer equipment and medium

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112274920A (en) * 2020-11-24 2021-01-29 智博云信息科技(广州)有限公司 Virtual reality gesture control method, platform, server and readable storage medium
CN112274920B (en) * 2020-11-24 2022-05-31 亓乐(北京)文化科技有限公司 Virtual reality gesture control method, platform, server and readable storage medium
CN112527110A (en) * 2020-12-04 2021-03-19 北京百度网讯科技有限公司 Non-contact interaction method and device, electronic equipment and medium
CN113038216A (en) * 2021-03-10 2021-06-25 深圳创维-Rgb电子有限公司 Instruction obtaining method, television, server and storage medium
CN113204283A (en) * 2021-04-30 2021-08-03 Oppo广东移动通信有限公司 Text input method, text input device, storage medium and electronic equipment
CN113325950A (en) * 2021-05-27 2021-08-31 百度在线网络技术(北京)有限公司 Function control method, device, equipment and storage medium
CN113325950B (en) * 2021-05-27 2023-08-25 百度在线网络技术(北京)有限公司 Function control method, device, equipment and storage medium
CN113810536A (en) * 2021-08-02 2021-12-17 惠州Tcl移动通信有限公司 Method, device and terminal for displaying information based on motion trajectory of human body in video
CN113810536B (en) * 2021-08-02 2023-12-12 惠州Tcl移动通信有限公司 Information display method, device and terminal based on human limb action track in video

Similar Documents

Publication Publication Date Title
CN111913585A (en) Gesture recognition method, device, equipment and storage medium
JP7078808B2 (en) Real-time handwriting recognition management
CN112131988B (en) Method, apparatus, device and computer storage medium for determining virtual character lip shape
CN112507735B (en) Training method and device of machine translation model and electronic equipment
CN111931591A (en) Method and device for constructing key point learning model, electronic equipment and readable storage medium
JP7281521B2 (en) Voice control method and voice control device, electronic device and storage medium
CN111968631B (en) Interaction method, device, equipment and storage medium of intelligent equipment
CN112099645A (en) Input image generation method and device, electronic equipment and storage medium
CN111225236A (en) Method and device for generating video cover, electronic equipment and computer-readable storage medium
CN112825013A (en) Control method and device of terminal equipment
CN111966212A (en) Multi-mode-based interaction method and device, storage medium and smart screen device
CN112383805A (en) Method for realizing man-machine interaction at television end based on human hand key points
CN112269867A (en) Method, device, equipment and storage medium for pushing information
CN111708477B (en) Key identification method, device, equipment and storage medium
JP2022020574A (en) Information processing method and apparatus in user dialogue, electronic device, and storage media
CN112036315A (en) Character recognition method, character recognition device, electronic equipment and storage medium
JP2022028667A (en) Method for updating user image recognition model, device, electronic apparatus, computer-readable recording medium, and computer program
EP3654205A1 (en) Systems and methods for generating haptic effects based on visual characteristics
CN111443853B (en) Digital human control method and device
CN111027195B (en) Simulation scene generation method, device and equipment
CN111638787A (en) Method and device for displaying information
CN111736799A (en) Voice interaction method, device, equipment and medium based on man-machine interaction
US20220328076A1 (en) Method and apparatus of playing video, electronic device, and storage medium
CN116167426A (en) Training method of face key point positioning model and face key point positioning method
CN112788390B (en) Control method, device, equipment and storage medium based on man-machine interaction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211015

Address after: 100176 Room 101, 1st floor, building 1, yard 7, Ruihe West 2nd Road, economic and Technological Development Zone, Daxing District, Beijing

Applicant after: Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd.

Address before: 2/F, Baidu Building, 10 Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

WW01 Invention patent application withdrawn after publication

Application publication date: 20201110