WO2021258862A1 - Typing method and apparatus, and device and storage medium


Info

Publication number: WO2021258862A1
Authority: WO (WIPO, PCT)
Prior art keywords: model, dimensional, gesture, pose, character
Application number: PCT/CN2021/091737
Other languages: French (fr), Chinese (zh)
Inventor: 张学勇 (Zhang Xueyong)
Original Assignee: Oppo广东移动通信有限公司 (Guangdong OPPO Mobile Telecommunications Corp., Ltd.)
Application filed by Oppo广东移动通信有限公司
Publication of WO2021258862A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 Movements or behaviour, e.g. gesture recognition
    • G06V 40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Definitions

  • This application relates to human-computer interaction technology, and in particular, but not exclusively, to a typing method and apparatus, a device, and a storage medium.
  • AR: Augmented Reality
  • AR technology has made keyboardless typing possible: a user can write characters in front of the AR glasses being worn, and the AR glasses recognize the handwritten characters from the trajectory of collected fingertip feature points. However, this approach suffers from low recognition accuracy, which makes the human-computer interaction experience of keyboardless typing unfriendly.
  • The typing method, apparatus, device, and storage medium provided by the embodiments of the present application can improve the efficiency of keyboardless typing; they are implemented as follows:
  • The typing method provided by an embodiment of the present application includes: acquiring a two-dimensional image and a depth image of a user's posture; constructing a three-dimensional posture model of the user from the two-dimensional image and the depth image; and analyzing the three-dimensional posture model and outputting the target character obtained from the analysis, so as to realize a keyboardless typing function.
  • The typing apparatus includes: an image acquisition module for acquiring a two-dimensional image and a depth image of a user's posture; a model construction module for constructing a three-dimensional posture model of the user from the two-dimensional image and the depth image; and a character output module for analyzing the three-dimensional posture model and outputting the target character obtained from the analysis, so as to realize a keyboardless typing function.
  • The electronic device provided by an embodiment of the present application includes a memory and a processor; the memory stores a computer program that can run on the processor, and when the processor executes the program, the steps of any typing method described in the embodiments of the present application are implemented.
  • The computer-readable storage medium provided by an embodiment of the present application has a computer program stored thereon, and when the computer program is executed by a processor, the steps of any typing method of the embodiments of the present application are implemented.
  • In an embodiment of the present application, a typing method is provided. In this method, an electronic device obtains a two-dimensional image and a depth image of a user's posture, constructs a three-dimensional posture model of the user from the two-dimensional image and the depth image, analyzes the three-dimensional posture model, and outputs the target character obtained from the analysis. Compared with determining the user's posture from a two-dimensional image alone to realize the keyboardless typing function, determining the posture from both the two-dimensional image and the depth image is more accurate, which in turn improves the efficiency of keyboardless typing.
  • FIG. 1 is a schematic diagram of an application scenario of a typing method according to an embodiment of this application;
  • FIG. 2 is a schematic diagram of an implementation process of the typing method according to an embodiment of this application;
  • FIG. 3 is a schematic diagram of a user's gestures extended in different planes;
  • FIG. 4 is a schematic diagram of another implementation process of the typing method according to an embodiment of this application;
  • FIG. 5A is a schematic diagram of yet another implementation process of the typing method according to an embodiment of this application;
  • FIG. 5B is a schematic diagram of a method for obtaining a target pose model according to an embodiment of this application;
  • FIG. 6 is a schematic diagram of the correspondence between predefined character gesture models and letters in an embodiment of this application;
  • FIG. 7 is a schematic diagram of a gesture recognition hardware system for AR glasses;
  • FIG. 8 is a schematic diagram of the depth imaging principle of a Time of Flight (TOF) depth camera module;
  • FIG. 9 is a schematic diagram of another gesture recognition hardware system for AR glasses;
  • FIG. 10 is a schematic diagram of the depth imaging principle of a structured light module;
  • FIG. 11A is a schematic diagram of the structure of a typing device according to an embodiment of this application;
  • FIG. 11B is a schematic diagram of another structure of the typing device according to an embodiment of this application;
  • FIG. 12 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of this application.
  • The term "first\second\third" involved in the embodiments of the present application merely distinguishes similar or different objects and does not represent a specific ordering of objects. Understandably, "first\second\third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
  • As shown in FIG. 1, after starting the gesture input method of the AR glasses 11, the user 10 performs a gesture operation in front of the AR glasses 11. For example, if a fist is held out, the AR glasses 11 recognize the fist as the corresponding letter A, and output and display the letter on the AR chat interface 13.
  • keyboardless typing is not limited to the foregoing chat application scenario.
  • The application scenarios of keyboardless typing can be diverse; for example, they also include office scenarios such as document editing, entering search keywords into a search engine, and any other application scenario that requires text input.
  • the electronic devices that implement the typing method may be diverse, and are not limited to headsets such as AR glasses.
  • For example, the electronic device may also be a notebook computer, a desktop computer, a server, a television, or another device that has information processing capability, or both image collection capability and information processing capability.
  • The functions realized by the typing method can be implemented by a processor in the electronic device calling program code.
  • The program code can be stored in a computer storage medium; it follows that the electronic device includes at least a processor and a storage medium.
  • FIG. 2 is a schematic diagram of the implementation process of the typing method according to the embodiment of this application. As shown in FIG. 2, the method may at least include the following steps 201 to 203:
  • Step 201 Obtain a two-dimensional image and a depth image of the user's posture.
  • The user posture can be a body posture or a hand posture (that is, a gesture); this is not limited here.
  • the types of the two-dimensional image can be various, for example, the two-dimensional image is an infrared gray image, a speckle image, or a red-green-blue (RGB) image.
  • the electronic device may be a device capable of image collection, for example, the device is AR glasses.
  • the device can collect a two-dimensional image and a depth image of the user's posture through its TOF camera.
  • the device may also collect the speckle image of the user's posture through its structured light module, and then calculate the depth image of the user's posture based on the speckle image.
  • Step 202 Construct a three-dimensional posture model of the user according to the two-dimensional image and the depth image.
  • In implementation, the electronic device may first extract key feature points from the two-dimensional image, for example by identifying the user's joint points and using these joint points as key feature points.
  • Taking the user posture being a gesture as an example, the joint points of the hand and the center of the palm can be used as key feature points. Then, the pixel coordinates of these feature points are converted into x and y coordinates in a specific coordinate system (such as a Cartesian coordinate system), and the depth information of the corresponding points of these feature points in the depth image is converted into z coordinates in the specific coordinate system, so that the three-dimensional coordinates of these feature points in the specific coordinate system are obtained. Finally, based on the three-dimensional coordinates of these feature points, a three-dimensional posture model of the user's posture can be constructed.
  • There are many ways to build the three-dimensional model; for example, according to the positional relationship of these feature points, the feature points are connected to obtain the three-dimensional posture model of the user's posture.
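  • For illustration only (this sketch is not part of the original disclosure), the coordinate conversion described above can be expressed with a pinhole camera model; the intrinsic parameters FX, FY, CX, CY and the keypoint detector producing the pixel coordinates are hypothetical placeholders:

```python
import numpy as np

# Hypothetical camera intrinsics: focal lengths (pixels) and principal point.
FX, FY, CX, CY = 500.0, 500.0, 256.0, 256.0

def back_project(keypoints_px, depth_image):
    """Convert 2D keypoint pixel coordinates plus depth into 3D camera-frame coordinates.

    keypoints_px: (N, 2) array of (u, v) pixel coordinates of key feature points,
                  e.g. hand joint points and the palm center detected in the 2D image.
    depth_image:  (H, W) array of depth values (meters) aligned with the 2D image.
    Returns an (N, 3) array of (x, y, z) coordinates in the camera coordinate system.
    """
    points_3d = []
    for u, v in keypoints_px:
        z = depth_image[int(v), int(u)]   # depth of the corresponding pixel gives z
        x = (u - CX) * z / FX             # pixel column converted to x
        y = (v - CY) * z / FY             # pixel row converted to y
        points_3d.append((x, y, z))
    return np.asarray(points_3d)
```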
  • Step 203 Analyze the three-dimensional posture model, and output the target characters obtained by the analysis, so as to realize the keyboardless typing function.
  • There are various output methods. For example, assuming that the character is a letter, the electronic device can directly display the letter on the display interface, or display multiple candidate Chinese characters corresponding to the letter; in the latter case, the electronic device outputs the letter to a Chinese character input method so that the input method displays these candidate Chinese characters on the display interface.
  • In some embodiments, the electronic device can input the three-dimensional pose model into a pre-trained classifier to obtain its similarity to each predefined character pose model, that is, the probability that it belongs to each predefined character pose model; the predefined character pose model corresponding to the maximum probability is then determined as the target pose model matching the three-dimensional pose model, and the character corresponding to the target pose model is determined as the target character. The classifier can be obtained by training a deep learning model with sample images of each predefined character pose model.
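  • As an illustrative sketch only (the classifier architecture is not specified in the disclosure), this step amounts to scoring the three-dimensional pose model against every predefined character pose model and taking the class with the maximum probability; the classifier object and the letter label set below are assumptions:

```python
import numpy as np

# Hypothetical label set: one predefined character pose model per letter A..Z.
LETTERS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]

def recognize_character(pose_features, classifier):
    """Return the target character for a 3D pose model.

    pose_features: 1D feature vector derived from the 3D pose model
                   (e.g. flattened keypoint coordinates).
    classifier:    any trained model exposing predict_proba(), e.g. a scikit-learn
                   classifier trained on samples of each predefined character pose model.
    """
    probs = classifier.predict_proba([pose_features])[0]  # similarity to each pose model
    return LETTERS[int(np.argmax(probs))]                 # class with the maximum probability
```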
  • In the embodiment of the present application, a typing method is provided. In this method, an electronic device obtains a two-dimensional image and a depth image of a user's posture, constructs a three-dimensional posture model of the user from the two-dimensional image and the depth image, analyzes the three-dimensional posture model, and outputs the target character obtained from the analysis. Compared with determining the user's posture from a two-dimensional image alone to realize the keyboardless typing function, determining the posture from both the two-dimensional image and the depth image is more accurate, which in turn improves the efficiency of keyboardless typing.
  • Understandably, the recognition accuracy of user gestures directly affects the efficiency of keyboardless typing: when recognition accuracy is low, the user has to repeat incorrectly recognized gestures many times, which greatly reduces typing efficiency and degrades the user experience; the higher the recognition accuracy, the higher the efficiency of keyboardless typing.
  • FIG. 4 is a schematic diagram of another implementation process of the typing method of the embodiment of the application. As shown in FIG. 4, the method may at least include the following steps 401 to 405:
  • Step 401 Receive a start instruction, where the start instruction is used to instruct to start the gesture input method.
  • In implementation, the user can start the gesture input method by making a specific gesture, or by touching or pressing a button used to start it.
  • the gesture input method refers to that the user can input corresponding characters by making different gestures.
  • For example, the input method may be a hand-gesture input method, in which the user makes different gestures so that the electronic device recognizes and outputs the corresponding letters.
  • The input method can also be a body-posture input method.
  • Step 402 In response to the start instruction, start the gesture input method.
  • the electronic device may output prompt information after starting the gesture input method to prompt the user to type by gesture.
  • There are various ways to output the prompt information, for example, outputting a prompt signal, a voice prompt, or a text prompt.
  • The purpose of starting the gesture input method here is to enable the same gesture to express multiple meanings.
  • After the input method is started, a certain gesture represents a corresponding character; before the input method is started, the same gesture represents a control instruction, such as a click operation.
  • Step 403 Obtain a two-dimensional image and a depth image of the user's posture
  • the typing method may be applied to a head-mounted device, and the head-mounted device may include a depth camera for capturing the user's posture.
  • the headset can control the depth camera to collect two-dimensional images and depth images of the user's posture.
  • the depth camera for example, can be a TOF camera or a structured light module.
  • In some embodiments, the length and width of the image sensor of the TOF camera are equal, so that the field of view of the TOF camera lens is inscribed in the image sensor; as a result, the effective pixel area of the two-dimensional image and the depth image collected by the image sensor is a circular area centered on the center of the image sensor. In this way, the redundant information contained in the two-dimensional image and the depth image is reduced, which reduces interference with gesture recognition, improves the accuracy of gesture recognition and keyboardless typing, shortens the delay of keyboardless typing, and improves the efficiency of keyboardless typing.
  • the field of view range of the depth camera lens is [100°, 120°].
  • the field of view angle can be any angle within [100°, 120°], which is not limited in the embodiment of the present application.
  • the lens of the depth camera can also be a lens with a field of view less than 100°.
  • Step 404 Construct a three-dimensional posture model of the user according to the two-dimensional image and the depth image;
  • Step 405 Analyze the three-dimensional posture model, and output the target characters obtained by the analysis, so as to realize the keyboardless typing function.
  • In the embodiment of the present application, a typing method is provided. An electronic device receives a start instruction input by a user and, in response to the instruction, starts the gesture input method, thereby realizing the keyboardless typing function.
  • After the input method is started, a certain gesture made by the user represents a character; otherwise, the same gesture represents another meaning. In this way, one gesture can carry multiple meanings, which improves the reusability of user gestures.
  • FIG. 5A is a schematic diagram of another implementation process of the typing method according to the embodiment of the present application. As shown in FIG. 5A, the method may at least include the following steps 501 to 508:
  • Step 501 Obtain a two-dimensional image and a depth image of the user's posture.
  • Taking the electronic device being a server as an example, the server can receive the above two images sent by a user terminal. These two images can be collected by the user terminal through a TOF camera; alternatively, the user terminal can first collect a speckle image of the user's posture (an example of a two-dimensional image) through a structured light module and then calculate the corresponding depth image based on the speckle image.
  • Taking the electronic device being a user terminal as an example, the user terminal can obtain the above images through its own depth camera. For example, the user terminal acquires the two images through a TOF camera; or the user terminal acquires a speckle image of the user's posture through a structured light module and then calculates the corresponding depth image based on the speckle image.
  • Step 502 Identify multiple key feature points of the user gesture included in the two-dimensional image.
  • The key feature points may be the joint points of the user's body, or the joint points of the hand and the center of the palm.
  • Step 503 Convert the pixel coordinates of each of the key feature points into x-coordinates and y-coordinates in a specific coordinate system.
  • the function of the specific coordinate system is to unify the pixel coordinates and depth information of the key feature points into one coordinate system.
  • the specific coordinate system is a Cartesian coordinate system.
  • Step 504 Extract the depth information of each key feature point from the depth image.
  • Step 505 Convert the depth information of each of the key feature points into z coordinates in the specific coordinate system.
  • Step 506 Construct the three-dimensional pose model based on the x, y, and z coordinates of each of the key feature points.
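  • A minimal sketch of assembling the model in steps 503 to 506 is given below; the joint names and the edges connecting them are hypothetical illustrations, since the disclosure only requires that feature points be connected according to their positional relationship:

```python
# Illustrative topology only: connect the palm center to each fingertip keypoint.
EDGES = [("palm", "thumb_tip"), ("palm", "index_tip"), ("palm", "middle_tip"),
         ("palm", "ring_tip"), ("palm", "little_tip")]

def build_pose_model(coords_3d):
    """Assemble a simple 3D posture model from per-keypoint (x, y, z) coordinates.

    coords_3d: dict mapping a keypoint name to its (x, y, z) tuple in the chosen
               coordinate system (e.g. the Cartesian coordinates from steps 503-505).
    """
    return {
        "points": dict(coords_3d),                        # keypoint name -> (x, y, z)
        "edges": [(a, b) for a, b in EDGES
                  if a in coords_3d and b in coords_3d],  # connect keypoints that exist
    }
```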
  • Step 507 Match the three-dimensional pose model with a plurality of predefined character pose models to obtain a target pose model; wherein, each predefined character pose model is used to uniquely represent a corresponding character.
  • characters refer to glyph-like units or symbols, including letters, numbers, arithmetic symbols, punctuation marks and other symbols, as well as some functional symbols.
  • For example, the user posture may be the user's hand posture (that is, a gesture), and the predefined character posture models may be letter gesture models, where each letter gesture model uniquely represents a corresponding letter.
  • the multiple predefined character gesture models are American Sign Language (ASL) gestures, and each ASL gesture is used to uniquely represent a letter defined by ASL.
  • Defining a set of standardized, universal gesture operation instructions not only facilitates use by ordinary people, but also greatly facilitates use by some special groups (such as deaf-mute people), who do not need to relearn new gestures in order to type efficiently without a keyboard.
  • Because the multiple predefined character gesture models adopt ASL gestures, the versatility of the typing method is enhanced, and the electronic device implementing the typing method is easier for users, especially deaf-mute users, to accept.
  • In implementation, the electronic device may determine the similarity between the three-dimensional pose model and each predefined character pose model, then determine the predefined character pose model whose similarity meets a specific condition as the target pose model matching the three-dimensional pose model, and output the character corresponding to the target pose model as the target character.
  • Step 508 Output the character corresponding to the target pose model as the target character.
  • Matching the three-dimensional pose model with the plurality of predefined character pose models to obtain the target pose model can be implemented in various ways.
  • For example, the electronic device may perform the matching of step 507 through the following steps 5071 and 5072 to obtain the target pose model:
  • Step 5071 Determine the similarity between the three-dimensional pose model and each of the predefined character pose models
  • In implementation, the electronic device may use a pre-trained classifier to process the three-dimensional pose model to obtain its similarity to each of the predefined character pose models, that is, the probability that the three-dimensional pose model belongs to each predefined character pose model; the classifier is obtained by training a deep learning model with sample images of each predefined character pose model.
  • Each predefined character pose model may correspond to one or more frames of sample images; the multiple frames of sample images can be images of the character pose model collected by the camera from multiple angles. In this way, the calculation accuracy of the classifier is improved, which in turn improves the accuracy of gesture recognition and the efficiency of keyboardless typing.
  • the electronic device may determine the Euclidean distance between the three-dimensional pose model and each predefined character pose model to achieve the determination of similarity.
  • Step 5072 Determine a character pose model whose similarity meets a predetermined condition as the target pose model.
  • For example, if the similarity is determined by the classifier, that is, the classifier outputs the probability that the three-dimensional pose model belongs to each predefined character pose model, the corresponding specific condition is the maximum probability: the character pose model corresponding to the largest of these probabilities is determined as the target pose model.
  • If the similarity is determined by calculating the Euclidean distance between the two models, the corresponding specific condition is the minimum Euclidean distance: the character pose model corresponding to the smallest of these Euclidean distances is determined as the target pose model.
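  • A minimal sketch of the Euclidean-distance variant follows; it assumes both models are represented by the same ordered set of keypoints, which is an illustrative assumption rather than a requirement stated in the disclosure:

```python
import numpy as np

def match_by_distance(pose_model, predefined_models):
    """Return the character whose predefined pose model is closest to pose_model.

    pose_model:        (N, 3) array of keypoint coordinates of the observed 3D pose model.
    predefined_models: dict mapping a character to its (N, 3) reference keypoint array.
    The specific condition here is the minimum Euclidean distance.
    """
    best_char, best_dist = None, float("inf")
    for char, reference in predefined_models.items():
        dist = np.linalg.norm(pose_model - reference)  # Euclidean distance between models
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char
```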
  • In other embodiments, the electronic device may also obtain the target pose model directly through the classifier, thereby implementing step 507. That is, the electronic device inputs the three-dimensional pose model into a pre-trained classifier to obtain the target pose model that matches the three-dimensional pose model, where the classifier is obtained by training a deep learning model with sample images of each of the predefined character pose models.
  • In some related technologies, gesture recognition is used to recognize operations such as double-click, slide, and click, which can be realized directly by gesture recognition.
  • However, the typing function of these technologies is not yet well developed.
  • Some gesture recognition schemes recognize handwritten characters by tracking the trajectory of fingertip feature points, but the recognition accuracy of this approach is not high.
  • the gesture recognition typing method provided by the embodiment of the present application is implemented based on a type of standard gesture samples, and the recognition accuracy is improved;
  • Current gesture recognition is typically based on a set of custom gestures that realize control operations such as double-clicking, tapping, and sliding. For example, waving a fist twice indicates a double-click operation, waving a fist once indicates a single-click operation, and moving the fist left and right indicates a sliding operation.
  • Such defined gesture operations are not internationally standardized and are mainly used for control rather than typing. As a result, the user group of this kind of gesture recognition is generally the general population, and it cannot address the typing experience and typing efficiency of special groups (such as deaf-mute people).
  • In addition, current gesture recognition operations are affected by recognition accuracy and latency and fail to provide a comfortable interactive experience. That is, because the accuracy of current gesture recognition is low, when a gesture is recognized incorrectly the user needs to repeat the previous gesture, and may even need to make the same gesture multiple times before its meaning is recognized accurately; this reduces the efficiency of gesture recognition operations and increases their latency.
  • the embodiment of the application provides a design scheme of a TOF depth camera gesture recognition typing system applied to AR glasses.
  • In this solution, a set of standard gesture letters is used to replace the physical keys of a fixed keyboard.
  • Based on the TOF depth camera on the AR glasses, infrared images of gestures (an example of two-dimensional images) and depth images can be captured in real time; then, through a specific gesture recognition algorithm, the letters corresponding to the gestures are output in real time, realizing the keyboardless typing function.
  • The gestures corresponding to the 26 English letters (an example of predefined character gesture models) are shown in FIG. 6, namely the American Sign Language (ASL) gesture letters.
  • the so-called gesture letters are letters represented by gestures.
  • the gesture recognition defined by the above-mentioned standard is not only suitable for ordinary users, but also suitable for special groups such as deaf-mute people.
  • First, a TOF depth camera is used to collect a large number of gesture depth images and infrared grayscale images (i.e., IR images), which are used as gesture recognition samples for the algorithm's classifier.
  • Gesture feature points are then extracted from these gesture recognition samples, and the pose information of the gesture feature points of each sample is determined.
  • The pose information can be, for example, three-dimensional coordinate information in Cartesian coordinates. Based on the pose information of the gesture feature points of each sample, the three-dimensional gesture model contained in the sample is constructed; these three-dimensional gesture models are used as training samples, and the letters corresponding to each three-dimensional gesture model are used as labels to train a deep learning model, thereby obtaining the above-mentioned classifier.
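  • A training sketch consistent with the above description is shown below; the choice of a small multilayer perceptron and the scikit-learn API are assumptions made for illustration, since the disclosure only states that a deep learning model is trained on labeled gesture samples:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_gesture_classifier(gesture_models, letter_labels):
    """Train a classifier on 3D gesture models labeled with their letters.

    gesture_models: (num_samples, N, 3) array of keypoint coordinates per sample.
    letter_labels:  list of length num_samples giving the letter of each sample.
    """
    features = np.asarray(gesture_models).reshape(len(letter_labels), -1)  # flatten keypoints
    clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500)
    clf.fit(features, letter_labels)
    return clf  # later used with predict_proba() to score new 3D pose models
```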
  • The typing system integrates a large-field-of-view TOF depth camera module on the AR glasses.
  • The entire hardware module for gesture recognition typing includes a large-field-of-view TOF depth camera module, a processing chip and circuit, and an AR glasses lens and display system.
  • The user performs gesture operations in front of the AR glasses, and the gestures are captured by the TOF depth camera on the AR glasses.
  • The TOF depth camera transmits the collected images to the processing chip in real time, where they are processed by gesture recognition algorithms to realize the gesture recognition typing function.
  • FIG. 8 is a schematic diagram of the depth imaging principle of the TOF depth camera module.
  • The field of view of the TOF depth camera lens is designed to be within [100°, 120°].
  • The image sensor of the TOF camera has a length-to-width ratio of 1:1 and a pixel count of 512 × 512.
  • The length-to-width ratio of the image sensor is designed to be 1:1 so that the field of view (FOV) of the large-field-of-view lens is matched as an inscribed circle, and the output depth image therefore only includes information within the field of view.
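  • The inscribed-circle design can be illustrated with a short calculation: for a square sensor, the inscribed circle covers pi/4, i.e. about 78.5% of the pixels, which matches the roughly 78% effective pixel area mentioned below. The sketch is illustrative only:

```python
import numpy as np

def circular_mask_fraction(size=512):
    """Fraction of pixels of a size x size sensor inside the inscribed circle."""
    ys, xs = np.mgrid[0:size, 0:size]
    center = (size - 1) / 2.0
    radius = size / 2.0
    inside = (xs - center) ** 2 + (ys - center) ** 2 <= radius ** 2
    return inside.mean()

print(circular_mask_fraction())  # approximately 0.785, i.e. about 78% effective pixels
```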
  • TOF depth imaging is an active 3D imaging technology.
  • The transmitter module emits a modulated light pulse signal; after the receiver module receives the reflected light signal, it calculates the time difference or phase difference between the emission time and the reflection time to obtain the depth information of the gesture.
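  • For illustration, the time-difference and phase-difference relationships can be written out as follows; the modulation frequency value is a hypothetical example and not taken from the disclosure:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def depth_from_time(delta_t):
    """Depth from the round-trip time difference (seconds) between emission and reception."""
    return C * delta_t / 2.0

def depth_from_phase(delta_phi, f_mod=20e6):
    """Depth from the measured phase difference (radians) at modulation frequency f_mod (Hz)."""
    return C * delta_phi / (4.0 * math.pi * f_mod)
```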
  • the software system of the TOF gesture recognition typing system mainly includes: a gesture image acquisition module, a gesture detection module, a gesture recognition module, and an application module.
  • the gesture image acquisition module uses the large-angle TOF depth camera designed in the embodiment of the application to collect relevant gesture images and video stream data in real time.
  • The collected gesture images include infrared grayscale images and depth images in the 940 nanometer (nm) band or the 850 nm band.
  • The resolution is 512 × 512, and the effective pixel area is about 78% of the sensor, a circular area centered on the center of the image sensor.
  • the frame rate of the collected video stream data can reach 30 frames per second (fps), which meets the real-time requirements of gesture recognition.
  • the gesture detection module uses methods based on infrared gray image and depth information to perform gesture detection.
  • the main processes include camera calibration and reprojection, infrared gray image and depth image segmentation, etc.
  • the gesture recognition module and the application module mainly use the above-mentioned infrared grayscale and depth images to recognize gestures.
  • the main processes include: gesture sample data preparation, classifier training, and gesture feature points (including joint feature points) extraction.
  • The typing method provided is freed from the limitations of physical keys and touch-screen typing and can realize air typing; the typing solution that uses a TOF depth camera for gesture recognition improves the human-computer interaction mode, giving AR glasses and similar products a higher degree of freedom and a more comfortable experience; and the gesture recognition typing method can improve the typing experience and typing efficiency of special groups (such as deaf-mute people).
  • the embodiment of the present application provides a typing method that is applied to AR glasses instead of a physical keyboard.
  • the solution captures and recognizes gestures in real time through the large field of view TOF depth camera integrated on the AR glasses, defines and collects a gesture sample library, and combines gesture detection and gesture recognition algorithms to realize the functions of air typing and gesture typing.
  • This solution completely gets rid of the limitations of physical keyboard and touch screen typing, and has the advantages of high degree of freedom of operation.
  • the depth sensor used for gesture recognition may also be a structured light module.
  • the depth camera integrated on the AR glasses is changed to a structured light module.
  • As shown in FIG. 9, the entire hardware module for gesture recognition typing includes a large-field-of-view structured light module 90, a processing chip and circuit 92, and an AR glasses lens and display system 93.
  • The user performs gesture operations in front of the AR glasses, and the gestures are captured by the structured light depth camera on the AR glasses.
  • The structured light depth camera includes a transmitter and a receiver; it transmits the collected depth images to the processing chip in real time, where gesture recognition algorithms process them to realize the gesture recognition typing function.
  • The basic principle of depth imaging with the 3D structured light module used for gesture recognition is as follows: a near-infrared laser projects light with certain structural characteristics onto the object to be photographed, and a dedicated infrared camera then collects the reflected light.
  • Light with such structure produces different image phase information depending on the depth of the different regions of the subject; an arithmetic unit then converts this structural change into depth information to obtain the three-dimensional structure.
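  • As a simplified illustration of how the structural change maps to depth (the module's actual algorithm is not detailed in the disclosure), the classical triangulation relation between pattern disparity and depth can be sketched as follows; the baseline and focal length values are hypothetical:

```python
def depth_from_disparity(disparity_px, baseline_m=0.05, focal_px=500.0):
    """Classical structured-light / stereo triangulation: depth = focal * baseline / disparity.

    disparity_px: shift (pixels) of the projected pattern relative to its reference position.
    baseline_m:   hypothetical projector-camera baseline in meters.
    focal_px:     hypothetical focal length in pixels.
    """
    return focal_px * baseline_m / disparity_px
```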
  • The algorithm and software architecture of the structured-light-based gesture recognition typing system are basically the same as those of the TOF-camera-based system, and the advantage of using structured light is higher depth imaging accuracy.
  • The typing device may include the modules it comprises and the units included in each module, and may be implemented by a processor in an electronic device; of course, it may also be implemented by a specific logic circuit. In implementation, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
  • FIG. 11A is a schematic structural diagram of a typing device according to an embodiment of the application.
  • the device 110 includes an image acquisition module 111, a model construction module 112, and a character output module 113.
  • The image acquisition module 111 is used to acquire a two-dimensional image and a depth image of a user's posture;
  • the model construction module 112 is configured to construct a three-dimensional posture model of the user according to the two-dimensional image and the depth image;
  • the character output module 113 is used to analyze the three-dimensional posture model and output the target characters obtained by the analysis, so as to realize the keyboardless typing function.
  • In some embodiments, the device 110 further includes an instruction receiving module 114 and an activation module 115, where the instruction receiving module 114 is configured to receive an activation instruction used to instruct activation of the gesture input method;
  • the activation module 115 is used to activate the gesture input method in response to the activation instruction.
  • the device 110 may be a head-mounted device
  • the image acquisition module 111 is a depth camera, which is used to collect a two-dimensional image and a depth image of the user's posture, or to collect a two-dimensional image of the user's posture.
  • the headset is AR glasses
  • the depth camera is a TOF camera or a structured light module.
  • In some embodiments, the length and width of the image sensor of the TOF camera are equal, so that the field of view of the TOF camera lens is inscribed in the image sensor; thus, the effective pixel area of the two-dimensional image and the depth image captured by the image sensor is a circular area centered on the center of the image sensor.
  • the field angle range of the lens of the depth camera is [100°, 120°].
  • In some embodiments, the model construction module 112 is configured to: identify a plurality of key feature points of the user posture contained in the two-dimensional image; convert the pixel coordinates of each key feature point into x and y coordinates in a specific coordinate system; extract the depth information of each key feature point from the depth image; convert the depth information of each key feature point into a z coordinate in the specific coordinate system; and construct the three-dimensional posture model based on the x, y, and z coordinates of each of the key feature points.
  • In some embodiments, the character output module 113 is configured to: match the three-dimensional pose model with a plurality of predefined character pose models to obtain a target pose model, where each predefined character pose model uniquely represents a corresponding character; and output the character corresponding to the target pose model as the target character.
  • In some embodiments, the character output module 113 is configured to: determine the similarity between the three-dimensional pose model and each of the predefined character pose models, and determine the predefined character pose model whose similarity meets a specific condition as the target pose model.
  • In some embodiments, the character output module 113 is configured to: use a pre-trained classifier to process the three-dimensional pose model to obtain its similarity to each of the predefined character pose models, where the classifier is obtained by training a deep learning model with sample images of each of the predefined character pose models.
  • the user gesture is a hand gesture of the user
  • the predefined character gesture model is a letter gesture model
  • the letter gesture model is used to uniquely represent a corresponding letter.
  • the plurality of predefined character gesture models are American Sign Language ASL gestures, and each ASL gesture is used to uniquely represent a letter defined by the ASL.
  • It should be noted that, if the above typing method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related technologies, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes a number of instructions for enabling an electronic device to execute all or part of the methods described in the embodiments of the present application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
  • the electronic device 120 may include: a memory 121 and a processor 122.
  • the memory 121 stores a computer program that can run on the processor 122.
  • When the processor 122 executes the program, the steps of the typing method provided in the foregoing embodiments are implemented.
  • The memory 121 is configured to store instructions and applications executable by the processor 122, and can also cache data to be processed or already processed (for example, image data, audio data, voice communication data, and video communication data); it can be implemented by flash memory (FLASH) or random access memory (RAM).
  • the computer-readable storage medium provided by the embodiment of the present application has a computer program stored thereon, and when the computer program is executed by a processor, the steps in the typing method provided in the foregoing embodiment are implemented.
  • the disclosed device and method can be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • The division of the units is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • The coupling, direct coupling, or communication connection between the components shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
  • In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, each unit may serve as a separate unit, or two or more units may be integrated into one unit; the above integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • The foregoing program can be stored in a computer-readable storage medium; when the program is executed, the steps of the foregoing method embodiments are performed.
  • The foregoing storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disk.
  • Alternatively, if the aforementioned integrated unit of the present application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part that contributes to the related technologies, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes a number of instructions for enabling an electronic device to execute all or part of the methods described in the embodiments of the present application.
  • The aforementioned storage media include media that can store program code, such as removable storage devices, ROMs, magnetic disks, or optical disks.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A typing method and apparatus, a device, and a storage medium. The method comprises: acquiring a two-dimensional image and a depth image of a user posture (201); constructing a three-dimensional posture model of the user according to the two-dimensional image and the depth image (202); and analyzing the three-dimensional posture model and outputting a target character obtained by the analysis, so as to realize a keyboardless typing function (203).

Description

Typing method and apparatus, device, and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, the Chinese patent application with application number 202010591802.X filed on June 24, 2020, the entire content of which is hereby incorporated into this application by reference.
Technical field
This application relates to human-computer interaction technology, and in particular, but not exclusively, to a typing method and apparatus, a device, and a storage medium.
Background
With the application and popularization of Augmented Reality (AR) technology, products represented by AR glasses are gradually changing the way of human-computer interaction, for example typing. Traditional typing systems are generally limited by physical hardware devices, such as keyboards, mice, and touch screens, which have always restricted the comfort of typing and the freedom of application scenarios.
The development of AR technology has made keyboardless typing possible: a user can write characters in front of the AR glasses being worn, and the AR glasses recognize the handwritten characters from the trajectory of collected fingertip feature points. However, this approach suffers from low recognition accuracy, which makes the human-computer interaction experience of keyboardless typing unfriendly.
Summary of the invention
In view of this, the typing method, apparatus, device, and storage medium provided by the embodiments of the present application can improve the efficiency of keyboardless typing; they are implemented as follows:
The typing method provided by an embodiment of the present application includes: acquiring a two-dimensional image and a depth image of a user's posture; constructing a three-dimensional posture model of the user from the two-dimensional image and the depth image; and analyzing the three-dimensional posture model and outputting the target character obtained from the analysis, so as to realize a keyboardless typing function.
The typing apparatus provided by an embodiment of the present application includes: an image acquisition module for acquiring a two-dimensional image and a depth image of a user's posture; a model construction module for constructing a three-dimensional posture model of the user from the two-dimensional image and the depth image; and a character output module for analyzing the three-dimensional posture model and outputting the target character obtained from the analysis, so as to realize a keyboardless typing function.
The electronic device provided by an embodiment of the present application includes a memory and a processor; the memory stores a computer program that can run on the processor, and when the processor executes the program, the steps of any typing method described in the embodiments of the present application are implemented.
The computer-readable storage medium provided by an embodiment of the present application has a computer program stored thereon, and when the computer program is executed by a processor, the steps of any typing method of the embodiments of the present application are implemented.
In an embodiment of the present application, a typing method is provided. In this method, an electronic device obtains a two-dimensional image and a depth image of a user's posture, constructs a three-dimensional posture model of the user from the two-dimensional image and the depth image, analyzes the three-dimensional posture model, and outputs the target character obtained from the analysis. Compared with determining the user's posture from a two-dimensional image alone to realize the keyboardless typing function, determining the posture from both the two-dimensional image and the depth image is more accurate, which in turn improves the efficiency of keyboardless typing.
Description of the drawings
FIG. 1 is a schematic diagram of an application scenario of a typing method according to an embodiment of this application;
FIG. 2 is a schematic diagram of an implementation process of the typing method according to an embodiment of this application;
FIG. 3 is a schematic diagram of a user's gestures extended in different planes;
FIG. 4 is a schematic diagram of another implementation process of the typing method according to an embodiment of this application;
FIG. 5A is a schematic diagram of yet another implementation process of the typing method according to an embodiment of this application;
FIG. 5B is a schematic diagram of a method for obtaining a target pose model according to an embodiment of this application;
FIG. 6 is a schematic diagram of the correspondence between predefined character gesture models and letters in an embodiment of this application;
FIG. 7 is a schematic diagram of a gesture recognition hardware system for AR glasses;
FIG. 8 is a schematic diagram of the depth imaging principle of a Time of Flight (TOF) depth camera module;
FIG. 9 is a schematic diagram of another gesture recognition hardware system for AR glasses;
FIG. 10 is a schematic diagram of the depth imaging principle of a structured light module;
FIG. 11A is a schematic diagram of the structure of a typing device according to an embodiment of this application;
FIG. 11B is a schematic diagram of another structure of the typing device according to an embodiment of this application;
FIG. 12 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of this application.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the specific technical solutions of the present application are described in further detail below in conjunction with the drawings in the embodiments of the present application. The following embodiments are used to illustrate the application, but are not intended to limit its scope.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terminology used herein is only for the purpose of describing the embodiments of the application and is not intended to limit the application.
In the following description, "some embodiments" are referred to, which describe a subset of all possible embodiments; it should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and can be combined with each other when there is no conflict.
It should be pointed out that the term "first\second\third" involved in the embodiments of the present application merely distinguishes similar or different objects and does not represent a specific ordering of objects. Understandably, "first\second\third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the present application described herein can be implemented in an order other than that illustrated or described herein.
The application scenarios of the typing method described below in the embodiments of the present application are intended to illustrate the technical solutions of the embodiments more clearly and do not constitute a limitation on the technical solutions provided by the embodiments. A person of ordinary skill in the art knows that, with the emergence of new service scenarios, the technical solutions provided in the embodiments of the present application are equally applicable to similar technical problems.
Taking a user chatting with a friend through AR glasses as an example, as shown in FIG. 1, after starting the gesture input method of the AR glasses 11, the user 10 performs a gesture operation in front of the AR glasses 11. For example, if a fist is held out, the AR glasses 11 recognize the fist as the corresponding letter A, and output and display the letter on the AR chat interface 13.
Of course, the application scenario of keyboardless typing is not limited to the foregoing chat scenario. The application scenarios of keyboardless typing can be diverse; for example, they also include office scenarios such as document editing, entering search keywords into a search engine, and any other application scenario that requires text input.
In addition, in the embodiments of the present application, the electronic devices that implement the typing method may be diverse and are not limited to head-mounted devices such as AR glasses. For example, the electronic device may also be a notebook computer, a desktop computer, a server, a television, or another device that has information processing capability, or both image collection capability and information processing capability. The functions realized by the typing method can be implemented by a processor in the electronic device calling program code; of course, the program code can be stored in a computer storage medium, so the electronic device includes at least a processor and a storage medium.
FIG. 2 is a schematic flowchart of an implementation of the typing method according to an embodiment of this application. As shown in FIG. 2, the method may include at least the following steps 201 to 203:
Step 201: obtain a two-dimensional image and a depth image of a user pose.
The user pose may be a body pose or a hand pose (i.e., a gesture); this is not limited here. The two-dimensional image may be of various types, for example, an infrared grayscale image, a speckle image, or a red-green-blue (RGB) image. The electronic device may be a device with image acquisition capability, for example, AR glasses. In some embodiments, the device may collect the two-dimensional image and the depth image of the user pose through its own time-of-flight (TOF) camera. In other embodiments, the device may collect a speckle image of the user pose through its own structured light module and then compute the depth image of the user pose based on the speckle image.
Step 202: construct a three-dimensional pose model of the user according to the two-dimensional image and the depth image.
In implementation, the electronic device may extract key feature points based on the two-dimensional image, for example, identify the user's joint points and use these joint points as the key feature points. Taking a hand gesture as the user pose, the joint points of the hand and the center of the palm may be used as the key feature points. Then, the pixel coordinates of these feature points are converted into x and y coordinates in a specific coordinate system (for example, a Cartesian coordinate system), and the depth information of the corresponding points of these feature points in the depth image is converted into z coordinates in the same coordinate system, which yields the three-dimensional coordinates of these feature points in that coordinate system. Finally, based on the three-dimensional coordinates of these feature points, the three-dimensional pose model of the user pose can be constructed. The three-dimensional model may be built in many ways; for example, the feature points may be connected according to their positional relationship to obtain the three-dimensional pose model of the user pose.
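As an illustration of this back-projection step, the following is a minimal Python sketch assuming a pinhole camera model with known intrinsics (fx, fy, cx, cy from calibration); the function name and the 21-keypoint hand layout mentioned in the comment are illustrative assumptions, not part of the original disclosure.

```python
import numpy as np

def backproject_keypoints(keypoints_px, depth_map, fx, fy, cx, cy):
    """Convert 2D keypoint pixel coordinates plus depth into 3D points.

    keypoints_px: (N, 2) array of (u, v) pixel coordinates of key feature points
    depth_map:    (H, W) array of depth values in meters
    fx, fy, cx, cy: pinhole camera intrinsics (assumed known from calibration)
    Returns an (N, 3) array of (x, y, z) coordinates in the camera frame.
    """
    points_3d = []
    for u, v in keypoints_px:
        z = depth_map[int(v), int(u)]   # depth of this feature point
        x = (u - cx) * z / fx           # pixel -> metric x
        y = (v - cy) * z / fy           # pixel -> metric y
        points_3d.append((x, y, z))
    return np.asarray(points_3d)

# For example, 21 hand keypoints (joints plus palm center) detected in the IR image
# could be back-projected this way and then connected by a fixed skeleton topology
# to form the three-dimensional gesture model.
```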
Understandably, compared with gesture recognition based only on a two-dimensional image, gesture recognition combined with a depth image has higher recognition accuracy. This reduces the probability of outputting a wrong character and thus improves the efficiency of keyboardless typing. For example, as shown in FIG. 3, if the user makes gesture 301 in a plane parallel to the lens, the gesture can be recognized as an extended middle finger and index finger whether recognition is based only on the two-dimensional image or also on the depth image. However, if the user makes the gesture in a plane perpendicular to the lens (as shown in picture 302), recognition based only on the two-dimensional image may conclude that the user extended just one finger, whereas the three-dimensional pose model obtained by combining the depth image contains not only the information of the two-dimensional image but also the information of another dimension (namely depth), so its recognition result is more accurate.
Step 203: analyze the three-dimensional pose model, and output the target character obtained by the analysis, so as to realize a keyboardless typing function.
The output may take various forms. For example, assuming the character is a letter, the electronic device may directly display the letter on the display interface, or display multiple candidate Chinese characters corresponding to the letter. The electronic device may also output the letter to a Chinese character input method, so that the input method displays these candidate Chinese characters on the display interface.
In implementation, the electronic device may input the three-dimensional pose model into a pre-trained classifier to obtain the similarity to each predefined character pose model, that is, the probability that the model belongs to each predefined character pose model. The predefined character pose model corresponding to the maximum probability is then determined as the target pose model matching the three-dimensional pose model, and the character corresponding to the target pose model is determined as the target character. The classifier may be obtained by training a deep learning model with sample images of each predefined character pose model.
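The following is a minimal sketch of this matching step, assuming the classifier exposes scikit-learn-style per-class probabilities and that the label set is the 26 letters; the names are placeholders rather than the actual classifier of this embodiment.

```python
import numpy as np

# Hypothetical label set: one predefined character pose model per letter.
LETTERS = [chr(ord('A') + i) for i in range(26)]

def match_target_character(pose_model, classifier):
    """Pick the character whose predefined pose model best matches the input.

    pose_model: flattened (N*3,) vector of keypoint coordinates
    classifier: any model exposing predict_proba-style per-class probabilities
    """
    probs = classifier.predict_proba(pose_model.reshape(1, -1))[0]  # similarity to each class
    best = int(np.argmax(probs))    # maximum-probability class is the target pose model
    return LETTERS[best], probs[best]
```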
An embodiment of the present application provides a typing method in which an electronic device obtains a two-dimensional image and a depth image of a user pose, constructs a three-dimensional pose model of the user according to the two-dimensional image and the depth image, analyzes the three-dimensional pose model, and outputs the target character obtained by the analysis. Compared with determining the user pose based only on a two-dimensional image to realize keyboardless typing, the user pose determined in this embodiment based on both the two-dimensional image and the depth image is more accurate, which in turn improves the efficiency of keyboardless typing. Understandably, the recognition accuracy of the user pose directly affects the efficiency of keyboardless typing: when the accuracy is low, the user has to repeat incorrectly recognized poses many times, which greatly reduces typing efficiency and degrades the user experience; the higher the recognition accuracy, the higher the efficiency of keyboardless typing.
An embodiment of the present application further provides a typing method. FIG. 4 is a schematic flowchart of another implementation of the typing method according to an embodiment of this application. As shown in FIG. 4, the method may include at least the following steps 401 to 405:
Step 401: receive a start instruction, where the start instruction is used to instruct starting a pose input method.
In practical applications, the user may start the pose input method by making a specific pose, or by touching or pressing a button used for starting the pose input method.
Understandably, the pose input method means that the user can input the corresponding characters by making different poses. For example, if the pose input method is a gesture input method, the user can make different gestures so that the electronic device recognizes and outputs the corresponding letters. As another example, the pose input method may also be a body-pose input method.
Step 402: in response to the start instruction, start the pose input method.
In some embodiments, after starting the pose input method, the electronic device may output prompt information to remind the user that typing by pose is now available. The prompt may be output in various ways, for example, a prompt signal, a voice prompt, or a text prompt.
Understandably, the purpose of starting the pose input method here is to allow the same pose to express multiple meanings. For example, when the pose input method is on, a certain pose represents the corresponding character; before the input method is turned on, the same pose represents a control instruction, such as a click operation.
Step 403: obtain a two-dimensional image and a depth image of the user pose.
In some embodiments, the typing method may be applied to a head-mounted device, and the head-mounted device may include a depth camera for capturing the user pose. The head-mounted device can control the depth camera to collect the two-dimensional image and the depth image of the user pose. The depth camera may be, for example, a TOF camera or a structured light module.
In some embodiments, the length and width of the image sensor of the TOF camera are the same, so that the lens of the TOF camera and the image sensor are in an inscribed-circle relationship. As a result, the effective pixel area of the two-dimensional image and the depth image collected by the image sensor is a circular region centered at the center of the image sensor. This reduces the redundant information included in the two-dimensional image and the depth image, thereby reducing interference with pose recognition, improving recognition accuracy, shortening the latency of keyboardless typing, and improving its efficiency.
To cover a larger space for pose operations (for example, gesture operations), in some embodiments the field of view of the depth camera lens is in the range [100°, 120°]. In other words, the field of view may be any angle within [100°, 120°], which is not limited in the embodiments of the present application. Of course, the depth camera lens may also be a lens with a field of view of less than 100°.
Step 404: construct a three-dimensional pose model of the user according to the two-dimensional image and the depth image.
Step 405: analyze the three-dimensional pose model, and output the target character obtained by the analysis, so as to realize a keyboardless typing function.
An embodiment of the present application provides a typing method in which the electronic device receives a start instruction entered by the user and, in response to the instruction, starts the pose input method, thereby realizing the keyboardless typing function. In this way, only when the pose input method is active does a given pose made by the user represent a character; otherwise, the pose represents another meaning. Thus one pose can carry multiple meanings, which improves the reusability of user poses.
An embodiment of the present application further provides a typing method. FIG. 5A is a schematic flowchart of yet another implementation of the typing method according to an embodiment of this application. As shown in FIG. 5A, the method may include at least the following steps 501 to 508:
Step 501: obtain a two-dimensional image and a depth image of the user pose.
Understandably, when the electronic device implementing step 501 is a server, the server may receive the above two images sent by a user terminal. The two images may be collected by the user terminal through a TOF camera, or the user terminal may first collect a speckle image of the user pose (an example of a two-dimensional image) through a structured light module and then compute the corresponding depth image based on the speckle image.
When the electronic device implementing step 501 is not a server but a user terminal, the user terminal may obtain the above images through its own depth camera. For example, the user terminal collects the two images through a TOF camera; or the user terminal collects a speckle image of the user pose through a structured light module and then computes the corresponding depth image based on the speckle image.
Step 502: identify multiple key feature points of the user pose contained in the two-dimensional image.
In some embodiments, the key feature points may be the joint points of the user's body, or the joint points of the hand and the center of the palm.
Step 503: convert the pixel coordinates of each key feature point into x and y coordinates in a specific coordinate system.
Understandably, the role of the specific coordinate system is to unify the pixel coordinates and the depth information of the key feature points into one coordinate system. In some embodiments, the specific coordinate system is a Cartesian coordinate system.
Step 504: extract the depth information of each key feature point from the depth image.
Step 505: convert the depth information of each key feature point into a z coordinate in the specific coordinate system.
Step 506: construct the three-dimensional pose model based on the x, y, and z coordinates of each key feature point.
Step 507: match the three-dimensional pose model with multiple predefined character pose models to obtain a target pose model, where each predefined character pose model is used to uniquely represent a corresponding character.
Understandably, a character refers to a glyph-like unit or symbol, including letters, digits, operators, punctuation marks, other symbols, and some functional symbols.
In some embodiments, the user pose may be the user's hand pose (that is, a gesture), and the predefined character pose model may be a letter gesture model used to uniquely represent the corresponding letter. For example, as shown in FIG. 6, the multiple predefined character pose models are American Sign Language (ASL) gestures, each of which uniquely represents a letter defined by ASL.
Understandably, in the keyboardless typing scenario, defining a standardized, universal set of gesture operation instructions not only facilitates use by the general population, but also greatly facilitates use by some special groups (such as deaf-mute users), who can type efficiently without a keyboard and without having to learn a new set of gestures.
Moreover, using ASL gestures as the multiple predefined character pose models enhances the universality of the typing method, making an electronic device that implements the typing method easier for users to accept, especially deaf-mute users.
In some embodiments, the electronic device may determine the similarity between the three-dimensional pose model and each predefined character pose model, determine the predefined character pose model whose similarity satisfies a specific condition as the target pose model matching the three-dimensional pose model, and output the character corresponding to the target pose model as the target character.
Step 508: output the character corresponding to the target pose model as the target character.
Step 507, matching the three-dimensional pose model with multiple predefined character pose models to obtain the target pose model, may be implemented in various ways. For example, in some embodiments, as shown in FIG. 5B, the electronic device may implement step 507 through the following steps 5071 and 5072:
Step 5071: determine the similarity between the three-dimensional pose model and each predefined character pose model.
In some embodiments, the electronic device may process the three-dimensional pose model with a pre-trained classifier to obtain the similarity to each predefined character pose model, that is, the probability that the three-dimensional pose model belongs to each predefined character pose model; the classifier is obtained by training a deep learning model with sample images of each predefined character pose model.
It should be noted that a predefined character pose model corresponds to one or more frames of sample images, and the multiple frames of sample images may be images of that character pose model collected by the camera at multiple angles. This improves the calculation accuracy of the classifier, thereby improving the accuracy of pose recognition and, in turn, the efficiency of keyboardless typing.
In other embodiments, the electronic device may determine the Euclidean distance between the three-dimensional pose model and each predefined character pose model to determine the similarity.
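A minimal sketch of this Euclidean-distance matching, assuming each pose model is represented as an ordered array of keypoint coordinates of the same length; the function and variable names are illustrative.

```python
import numpy as np

def nearest_template(pose_model, template_models):
    """Rank predefined character pose templates by Euclidean distance.

    pose_model:      (N, 3) array of keypoint coordinates of the current gesture
    template_models: dict mapping each character to its (N, 3) template keypoints
    Returns the character with the smallest distance and that distance.
    """
    best_char, best_dist = None, float("inf")
    for char, template in template_models.items():
        dist = np.linalg.norm(pose_model - template)  # Euclidean distance over all keypoints
        if dist < best_dist:
            best_char, best_dist = char, dist
    return best_char, best_dist
```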
Step 5072: determine the predefined character pose model whose similarity satisfies a specific condition as the target pose model.
Understandably, the specific condition depends on the type of parameter used to characterize the similarity. For example, if the similarity is determined by the classifier, i.e., the classifier outputs the probability that the three-dimensional pose model belongs to each predefined character pose model, then the corresponding specific condition is the maximum probability; that is, the character pose model corresponding to the largest of the probabilities is determined as the target pose model. As another example, if the similarity is determined by calculating the Euclidean distance between the two models, then the corresponding specific condition is the minimum Euclidean distance; that is, the character pose model corresponding to the smallest of the Euclidean distances is determined as the target pose model.
As another example, in some embodiments, the electronic device may also obtain the target pose model directly through the classifier, thereby implementing step 507. That is, the electronic device inputs the three-dimensional pose model into a pre-trained classifier to obtain the target pose model matching the three-dimensional pose model, where the classifier is obtained by training a deep learning model with sample images of each predefined character pose model.
In the course of research, the inventor found that the related art of gesture recognition has several main shortcomings:
(1) No gesture recognition typing solution applied to AR glasses has yet been proposed. Although AR glasses and similar products integrate gesture recognition, it is used to recognize operations such as double-click, slide, and click; the function of typing directly through gesture recognition is not yet well developed.
(2) Many gesture recognition solutions recognize handwritten characters based on trajectory tracking of fingertip feature points, but the recognition accuracy of this approach is not high. The gesture recognition typing method provided by the embodiments of the present application is implemented based on a class of standard-form gesture samples, so the recognition accuracy is improved.
(3) Current gesture recognition implements control operations such as double-click, click, and slide based on a custom set of gestures. For example, waving a fist twice indicates a double-click, waving a fist once indicates a single click, and sliding the fist left and right indicates a slide operation. However, such defined gestures are not internationally standard operations and are mainly used for control rather than typing, so the user group of this kind of gesture recognition is generally the general population, and it does little to address the typing experience and typing efficiency of special groups (deaf-mute users).
(4) Current gesture recognition functions are affected by recognition accuracy and latency and fail to provide a comfortable interactive experience. That is, because current gesture recognition accuracy is low, when a gesture is recognized incorrectly the user has to repeat the previous gesture, and may even have to repeat the same gesture several times before its intended meaning is recognized correctly; this reduces the efficiency of gesture-recognition operations and increases latency.
On this basis, an exemplary application of an embodiment of the present application in an actual application scenario is described below.
An embodiment of the present application provides a design for a TOF depth camera gesture recognition typing system applied to AR glasses. A set of standard gesture letters is defined to replace the physical keys of a fixed keyboard. The TOF depth camera on the AR glasses captures infrared images (an example of two-dimensional images) and depth images of gestures in real time; these are then processed by a dedicated gesture recognition algorithm pipeline, and the letter corresponding to each gesture is output in real time, realizing keyboardless typing.
The gestures corresponding to the 26 English letters, an example of predefined character pose models, are shown in FIG. 6: American Sign Language (ASL) gesture letters, i.e., letters represented by gestures. Gesture recognition defined by this standard is suitable both for ordinary users and for special groups such as deaf-mute users. After the 26 English letter gestures are defined, a TOF depth camera is used to collect a large number of gesture depth images and infrared grayscale images (IR images) as gesture recognition samples for the algorithm's classifier. Gesture feature points are extracted from these samples, and the pose information of the feature points of each sample is determined; the pose information may be, for example, three-dimensional coordinates in a Cartesian coordinate system. Based on the feature-point pose information of each sample, the three-dimensional gesture model contained in that sample is constructed. These three-dimensional gesture models are used as training samples, the letter corresponding to each model is used as its label, and a deep learning model is trained to obtain the classifier described above.
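As a rough illustration of how such a classifier could be trained, the following sketch uses a small PyTorch multilayer perceptron over flattened keypoint vectors; the network shape, input format, and hyperparameters are assumptions for illustration and are not the specific deep learning model of this embodiment.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Assumed sample format: X is (num_samples, 21*3) flattened hand keypoints,
# y is (num_samples,) integer letter labels 0..25.
def train_gesture_classifier(X, y, epochs=20, lr=1e-3):
    model = nn.Sequential(              # small MLP stand-in for "deep learning model"
        nn.Linear(X.shape[1], 128), nn.ReLU(),
        nn.Linear(128, 64), nn.ReLU(),
        nn.Linear(64, 26),              # one logit per ASL letter
    )
    loader = DataLoader(
        TensorDataset(torch.tensor(X, dtype=torch.float32),
                      torch.tensor(y, dtype=torch.long)),
        batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for xb, yb in loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
    return model
```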
For the hardware design of the typing system, FIG. 7 shows a wide field-of-view TOF depth camera module integrated on AR glasses. The entire hardware module for gesture recognition typing includes a wide field-of-view TOF depth camera 70, a processing chip and circuit 72, and AR glasses lenses and display system 73; the user can make ASL English letter gestures 71 within the field of view of the TOF depth camera. The user performs gesture operations in front of the AR glasses, where they are captured by the TOF depth camera on the glasses; the TOF depth camera transmits the collected images to the processing chip in real time, and after processing by the gesture recognition algorithms, the gesture recognition typing function is realized.
FIG. 8 is a schematic diagram of depth imaging by the TOF depth camera module. As shown in FIG. 8, in order to cover a larger gesture operation space, the field of view of the TOF depth camera lens is designed to be [100°, 120°]; the image sensor of the TOF camera has a 1:1 length-to-width ratio and 512×512 pixels. The sensor's aspect ratio is designed to be 1:1 so that it fits the field of view (FOV) of the wide-angle lens as an inscribed circle, and the output depth image therefore contains only information within the field of view. TOF depth imaging is an actively emitting 3D imaging technique: the emitting module emits a modulated light pulse signal, and after the receiving module receives the reflected light signal, the depth information of the gesture is computed from the time difference or phase difference between emission and reflection.
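A minimal sketch of the time/phase-to-depth relation described above, assuming continuous-wave modulation at frequency f_mod; it illustrates the principle only and is not the module's actual firmware.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def depth_from_time(delta_t):
    """Depth from the round-trip time difference between emitted and reflected light."""
    return C * delta_t / 2.0

def depth_from_phase(delta_phi, f_mod):
    """Depth from the phase difference of a continuous-wave modulated signal.

    delta_phi: measured phase shift in radians (0 .. 2*pi)
    f_mod:     modulation frequency in Hz; the unambiguous range is C / (2 * f_mod)
    """
    return (C / (2.0 * f_mod)) * (delta_phi / (2.0 * math.pi))
```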
For the software system design, the software of the TOF gesture recognition typing system provided in this embodiment mainly includes a gesture image acquisition module, a gesture detection module, a gesture recognition module, and an application module.
The gesture image acquisition module uses the wide-angle TOF depth camera designed in this embodiment to collect gesture images and video stream data in real time. The collected gesture images include infrared grayscale images and depth images in the 940 nanometer (nm) or 850 nm band, with a resolution of 512×512 and an effective pixel area of 78%, i.e., a circular region centered at the center of the image sensor. The frame rate of the collected video stream can reach 30 frames per second (fps), meeting the real-time requirements of gesture recognition.
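The ~78% figure matches the area ratio of a circle inscribed in a square sensor (π/4 ≈ 78.5%). The following sketch builds such a circular mask for a 512×512 sensor; it is illustrative only.

```python
import numpy as np

def circular_mask(size=512):
    """Boolean mask of the circle inscribed in a size x size square sensor.

    The inscribed circle covers pi/4 of the square's area (about 78.5%),
    which matches the ~78% effective pixel area quoted above.
    """
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2.0
    radius = size / 2.0
    return (x - center) ** 2 + (y - center) ** 2 <= radius ** 2

mask = circular_mask(512)
print(mask.mean())  # ~0.785; pixels outside the circle carry no field-of-view information
```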
The gesture detection module performs gesture detection using a method based on infrared grayscale images and depth information; its main steps include camera calibration and reprojection, and segmentation of the infrared grayscale image and the depth image.
The gesture recognition module and the application module mainly use the above infrared grayscale and depth images to recognize gestures; the main steps include preparation of gesture sample data, classifier training, and extraction of gesture feature points (including joint feature points).
The typing method provided in this embodiment removes the limitations of physical keys and touchscreen typing and enables typing in mid-air. The typing solution using a TOF depth camera for gesture recognition improves human-computer interaction, giving products such as AR glasses an experience and applications with a higher degree of freedom and comfort; the gesture-recognition typing approach can also address the typing experience and typing efficiency of special groups (such as deaf-mute users).
An embodiment of the present application provides a typing approach that replaces the physical keyboard on AR glasses. The solution captures and recognizes gestures in real time through a wide field-of-view TOF depth camera integrated on the AR glasses and, by defining and collecting a gesture sample library and combining gesture detection and gesture recognition algorithms, realizes mid-air typing and gesture typing. The solution completely removes the limitations of physical keyboards and touchscreen typing and has the advantage of a high degree of operational freedom.
In some embodiments, the depth sensor used for gesture recognition may also be a structured light module. The depth camera integrated on the AR glasses is replaced by a structured light module, as shown in FIG. 9: the entire hardware module for gesture recognition typing includes a wide field-of-view structured light module 90, a processing chip and circuit 92, and AR glasses lenses and display system 93. The user performs gesture operations in front of the AR glasses, where they are captured by the structured light depth camera on the glasses; the structured light depth camera includes an emitting end and a receiving end, transmits the collected depth images to the processing chip in real time, and after processing by the gesture recognition algorithms, realizes the gesture recognition typing function.
In some embodiments, as shown in FIG. 10, the basic principle of depth imaging with a 3D structured light module for gesture recognition is as follows: a near-infrared laser projects light with certain structural features onto the object being photographed, and a dedicated infrared camera collects the result. Because the structured light falls on regions of the object at different depths, the collected image carries different phase information, and a computing unit converts this change in structure into depth information to obtain the three-dimensional structure. The algorithm and software architecture of the structured-light gesture recognition typing system are basically the same as those of the TOF-camera-based system, and the advantage of structured light is higher depth imaging accuracy.
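A minimal sketch of the triangulation relation commonly used to recover depth from the shift of a projected pattern, assuming a known projector-camera baseline and focal length; this is a simplified illustration, not the structured light module's actual pipeline.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Triangulation relation often used in structured-light depth recovery.

    disparity_px: horizontal shift (in pixels) of a projected pattern feature
                  relative to its reference position
    baseline_m:   distance between projector and infrared camera
    focal_px:     camera focal length expressed in pixels
    Returns depth in meters using the simple pinhole form z = f * b / d.
    """
    if disparity_px == 0:
        return float("inf")  # no shift corresponds to an (ideally) infinitely distant point
    return focal_px * baseline_m / disparity_px
```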
Based on the foregoing embodiments, the typing apparatus provided in the embodiments of the present application, including the modules it comprises and the units comprised by each module, may be implemented by a processor in an electronic device or, of course, by specific logic circuits. In implementation, the processor may be a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like.
FIG. 11A is a schematic structural diagram of a typing apparatus according to an embodiment of this application. As shown in FIG. 11A, the apparatus 110 includes an image acquisition module 111, a model construction module 112, and a character output module 113, where the image acquisition module 111 is configured to obtain a two-dimensional image and a depth image of a user pose;
the model construction module 112 is configured to construct a three-dimensional pose model of the user according to the two-dimensional image and the depth image; and
the character output module 113 is configured to analyze the three-dimensional pose model and output the target character obtained by the analysis, so as to realize a keyboardless typing function.
In some embodiments, as shown in FIG. 11B, the apparatus 110 further includes an instruction receiving module 114 and a starting module 115, where the instruction receiving module 114 is configured to receive a start instruction, the start instruction being used to instruct starting a pose input method, and the starting module 115 is configured to start the pose input method in response to the start instruction.
In some embodiments, the apparatus 110 may be a head-mounted device, and the image acquisition module 111 is a depth camera configured to collect the two-dimensional image and the depth image of the user pose, or to collect the two-dimensional image of the user pose.
In some embodiments, the head-mounted device is AR glasses, and the depth camera is a TOF camera or a structured light module.
In some embodiments, the length and width of the image sensor of the TOF camera are the same, so that the lens of the TOF camera and the image sensor are in an inscribed-circle relationship, such that the effective pixel area of the two-dimensional image and the depth image collected by the image sensor is a circular region centered at the center of the image sensor.
In some embodiments, the field of view of the depth camera lens is in the range [100°, 120°].
In some embodiments, the model construction module 112 is configured to: identify multiple key feature points of the user pose contained in the two-dimensional image; convert the pixel coordinates of each key feature point into x and y coordinates in a specific coordinate system; extract the depth information of each key feature point from the depth image; convert the depth information of each key feature point into a z coordinate in the specific coordinate system; and construct the three-dimensional pose model based on the x, y, and z coordinates of each key feature point.
In some embodiments, the character output module 113 is configured to: match the three-dimensional pose model with multiple predefined character pose models to obtain a target pose model, where each predefined character pose model is used to uniquely represent a corresponding character; and output the character corresponding to the target pose model as the target character.
In some embodiments, the character output module 113 is configured to: determine the similarity between the three-dimensional pose model and each predefined character pose model; and determine the predefined character pose model whose similarity satisfies a specific condition as the target pose model.
In some embodiments, the character output module 113 is configured to: process the three-dimensional pose model with a pre-trained classifier to obtain the similarity to each predefined character pose model, where the classifier is obtained by training a deep learning model with sample images of each predefined character pose model.
In some embodiments, the user pose is the user's hand pose, the predefined character pose model is a letter gesture model, and the letter gesture model is used to uniquely represent the corresponding letter.
In some embodiments, the multiple predefined character pose models are American Sign Language (ASL) gestures, and each ASL gesture is used to uniquely represent a letter defined by ASL.
The above description of the apparatus embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the apparatus embodiments of this application, please refer to the description of the method embodiments of this application.
It should be noted that, in the embodiments of the present application, if the above typing method is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable an electronic device to execute all or part of the method described in each embodiment of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
Correspondingly, as shown in FIG. 12, the electronic device 120 provided in an embodiment of the present application may include a memory 121 and a processor 122. The memory 121 stores a computer program executable on the processor 122, and when the processor 122 executes the program, the steps of the typing method provided in the foregoing embodiments are implemented.
The memory 121 is configured to store instructions and applications executable by the processor 122, and may also cache data to be processed or already processed by the processor 122 and the modules in the electronic device 120 (for example, image data, audio data, voice communication data, and video communication data); it may be implemented by flash memory (FLASH) or random access memory (RAM).
Correspondingly, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the typing method provided in the foregoing embodiments are implemented.
It should be pointed out here that the above description of the storage medium and device embodiments is similar to that of the method embodiments and has similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of this application, please refer to the description of the method embodiments of this application.
It should be understood that "one embodiment" or "some embodiments" mentioned throughout the specification means that a specific feature, structure, or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, the appearances of "in one embodiment" or "in some embodiments" in various places throughout the specification do not necessarily refer to the same embodiment. In addition, these specific features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application. The sequence numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or apparatus that includes the element.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, such as multiple units or components being combined or integrated into another system, or some features being ignored or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium, and when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a removable storage device, a read-only memory (ROM), a magnetic disk, or an optical disc.
Alternatively, if the above integrated unit of this application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable an electronic device to execute all or part of the method described in each embodiment of the present application. The aforementioned storage media include various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disc.
The methods disclosed in the several method embodiments provided in this application may be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several product embodiments provided in this application may be combined arbitrarily without conflict to obtain new product embodiments.
The features disclosed in the several method or device embodiments provided in this application may be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.
The above are only implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions within the technical scope disclosed in this application, and these should be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (20)

  1. A typing method, the method comprising:
    obtaining a two-dimensional image and a depth image of a user pose;
    constructing a three-dimensional pose model of the user according to the two-dimensional image and the depth image; and
    analyzing the three-dimensional pose model and outputting a target character obtained by the analysis, so as to realize a keyboardless typing function.
  2. The method according to claim 1, wherein before the obtaining a two-dimensional image and a depth image of a user pose, the method further comprises:
    receiving a start instruction, the start instruction being used to instruct starting a pose input method; and
    in response to the start instruction, starting the pose input method.
  3. The method according to claim 1, wherein the analyzing the three-dimensional pose model and outputting a target character obtained by the analysis comprises:
    matching the three-dimensional pose model with a plurality of predefined character pose models to obtain a target pose model, wherein each of the predefined character pose models is used to uniquely represent a corresponding character; and
    outputting the character corresponding to the target pose model as the target character.
  4. The method according to claim 3, wherein the matching the three-dimensional pose model with a plurality of predefined character pose models to obtain a target pose model comprises:
    determining a similarity between the three-dimensional pose model and each of the predefined character pose models; and
    determining a predefined character pose model whose similarity satisfies a specific condition as the target pose model.
  5. The method according to claim 3, wherein the matching the three-dimensional pose model with a plurality of predefined character pose models to obtain a target pose model comprises:
    inputting the three-dimensional pose model into a pre-trained classifier to obtain the target pose model matching the three-dimensional pose model, wherein the classifier is obtained by training a deep learning model with sample images of each of the predefined character pose models.
  6. The method according to any one of claims 3 to 5, wherein the user pose is a hand pose of the user, the predefined character pose model is a letter gesture model, and the letter gesture model is used to uniquely represent a corresponding letter.
  7. The method according to claim 6, wherein the plurality of predefined character pose models are American Sign Language (ASL) gestures, and each ASL gesture is used to uniquely represent a letter defined by ASL.
  8. The method according to claim 1, wherein the method is applied to a head-mounted device, and the head-mounted device comprises a depth camera for capturing the user pose;
    correspondingly, the obtaining a two-dimensional image and a depth image of a user pose comprises: controlling, by the head-mounted device, the depth camera to collect the two-dimensional image and the depth image of the user pose.
  9. The method according to claim 8, wherein the depth camera is a time-of-flight (TOF) camera, and the length and width of an image sensor of the TOF camera are the same, so that a lens of the TOF camera and the image sensor are in an inscribed-circle relationship, such that an effective pixel area of the two-dimensional image and the depth image collected by the image sensor is a circular region centered at the center of the image sensor.
  10. The method according to claim 8 or 9, wherein a field of view of a lens of the depth camera is in the range [100°, 120°].
  11. The method according to claim 1, wherein the constructing a three-dimensional pose model of the user according to the two-dimensional image and the depth image comprises:
    identifying a plurality of key feature points of the user pose contained in the two-dimensional image;
    converting the pixel coordinates of each of the key feature points into x and y coordinates in a specific coordinate system;
    extracting depth information of each of the key feature points from the depth image;
    converting the depth information of each of the key feature points into a z coordinate in the specific coordinate system; and
    constructing the three-dimensional pose model based on the x, y, and z coordinates of each of the key feature points.
  12. 一种打字装置,包括:A typing device, including:
    图像获取模块,用于获取用户姿态的二维图像和深度图像;The image acquisition module is used to acquire two-dimensional images and depth images of the user's posture;
    模型构建模块,用于根据所述二维图像和所述深度图像,构建所述用户的三维姿态模型;A model construction module, configured to construct a three-dimensional posture model of the user according to the two-dimensional image and the depth image;
    字符输出模块,用于对所述三维姿态模型进行分析,输出分析得到的目标字符,以实现无键盘打字功能。The character output module is used to analyze the three-dimensional posture model and output the target characters obtained by the analysis, so as to realize the keyboardless typing function.
  13. The apparatus according to claim 12, further comprising:
    an instruction receiving module, configured to receive a start instruction, the start instruction instructing to start a gesture input method; and
    a start module, configured to start the gesture input method in response to the start instruction.
  14. The apparatus according to claim 12, wherein the character output module is configured to:
    match the three-dimensional pose model against a plurality of predefined character pose models to obtain a target pose model, wherein each predefined character pose model uniquely represents a corresponding character; and
    output the character corresponding to the target pose model as the target character.
  15. The apparatus according to claim 14, wherein the character output module is configured to:
    determine a similarity between the three-dimensional pose model and each of the predefined character pose models; and
    determine, as the target pose model, the predefined character pose model whose similarity satisfies a specific condition.
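Claim 15 leaves both the similarity measure and the "specific condition" open. The sketch below uses one plausible choice, negative mean Euclidean distance over centered, scale-normalized keypoints with a fixed acceptance threshold; both choices are assumptions made for illustration, not the patent's method.

import numpy as np

def normalize(points: np.ndarray) -> np.ndarray:
    """Center the keypoints and scale them to unit norm so poses are comparable."""
    centered = points - points.mean(axis=0)
    scale = np.linalg.norm(centered) or 1.0
    return centered / scale

def match_pose(pose_3d: np.ndarray, templates: dict, threshold: float = -0.05):
    """templates maps a character to an (N, 3) array of its predefined pose keypoints."""
    query = normalize(pose_3d)
    best_char, best_sim = None, -np.inf
    for char, template in templates.items():
        # similarity = negative mean point-to-point distance (higher means more similar)
        sim = -np.mean(np.linalg.norm(query - normalize(template), axis=1))
        if sim > best_sim:
            best_char, best_sim = char, sim
    # the "specific condition": the best similarity must exceed a fixed threshold
    return best_char if best_sim > threshold else None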
  16. The apparatus according to claim 14, wherein the character output module is configured to:
    input the three-dimensional pose model into a pre-trained classifier to obtain a target pose model matching the three-dimensional pose model;
    wherein the classifier is obtained by training a deep learning model with sample images of each of the predefined character pose models.
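Claim 16 only requires that a deep learning model be trained on sample data for the predefined character pose models. The following sketch assumes PyTorch, a 21-keypoint hand model, and a 26-letter output, none of which are specified by the document; it shows the shape of such a classifier and a forward pass, not a trained system.

import torch
import torch.nn as nn

class PoseClassifier(nn.Module):
    """Small fully connected classifier over flattened 3D keypoints (illustrative only)."""
    def __init__(self, num_keypoints: int = 21, num_classes: int = 26):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_keypoints * 3, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, pose_3d: torch.Tensor) -> torch.Tensor:
        # pose_3d: (batch, num_keypoints, 3) -> logits over the predefined character pose models
        return self.net(pose_3d.flatten(start_dim=1))

# Forward pass with a dummy pose model; in practice the network would first be
# trained on samples derived from each predefined character pose model.
model = PoseClassifier()
logits = model(torch.zeros(1, 21, 3))
target_index = int(logits.argmax(dim=1))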
  17. The apparatus according to any one of claims 14 to 16, wherein the user pose is a hand pose of the user, the predefined character pose model is a letter gesture model, and the letter gesture model uniquely represents the corresponding letter.
  18. The apparatus according to claim 17, wherein the plurality of predefined character pose models are American Sign Language (ASL) gestures, and each ASL gesture uniquely represents a letter defined by ASL.
  19. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor, when executing the program, implements the steps of the typing method according to any one of claims 1 to 11.
  20. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the typing method according to any one of claims 1 to 11.
PCT/CN2021/091737 2020-06-24 2021-04-30 Typing method and apparatus, and device and storage medium WO2021258862A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010591802.XA CN111782041A (en) 2020-06-24 2020-06-24 Typing method and device, equipment and storage medium
CN202010591802.X 2020-06-24

Publications (1)

Publication Number Publication Date
WO2021258862A1 true WO2021258862A1 (en) 2021-12-30

Family

ID=72760440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091737 WO2021258862A1 (en) 2020-06-24 2021-04-30 Typing method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN111782041A (en)
WO (1) WO2021258862A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782041A (en) * 2020-06-24 2020-10-16 Oppo广东移动通信有限公司 Typing method and device, equipment and storage medium
TWI809538B (en) * 2021-10-22 2023-07-21 國立臺北科技大學 Clearing trajectory positioning system and method combined with augmented reality

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104317391B (en) * 2014-09-24 2017-10-03 华中科技大学 A kind of three-dimensional palm gesture recognition exchange method and system based on stereoscopic vision
CN106980362A (en) * 2016-10-09 2017-07-25 阿里巴巴集团控股有限公司 Input method and device based on virtual reality scenario
CN109461203B (en) * 2018-09-17 2020-09-29 百度在线网络技术(北京)有限公司 Gesture three-dimensional image generation method and device, computer equipment and storage medium
CN110598556A (en) * 2019-08-12 2019-12-20 深圳码隆科技有限公司 Human body shape and posture matching method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180067197A1 (en) * 2015-04-01 2018-03-08 Iee International Electronics & Engineering S.A. Method and system for real-time motion artifact handling and noise removal for tof sensor images
CN106325509A (en) * 2016-08-19 2017-01-11 北京暴风魔镜科技有限公司 Three-dimensional gesture recognition method and system
CN106503620A (en) * 2016-09-26 2017-03-15 深圳奥比中光科技有限公司 Numerical ciphers input method and its system based on gesture
CN110942479A (en) * 2018-09-25 2020-03-31 Oppo广东移动通信有限公司 Virtual object control method, storage medium, and electronic device
CN111782041A (en) * 2020-06-24 2020-10-16 Oppo广东移动通信有限公司 Typing method and device, equipment and storage medium

Also Published As

Publication number Publication date
CN111782041A (en) 2020-10-16

Similar Documents

Publication Publication Date Title
Kumar et al. A multimodal framework for sensor based sign language recognition
Chakraborty et al. Review of constraints on vision‐based gesture recognition for human–computer interaction
US10438080B2 (en) Handwriting recognition method and apparatus
US10209881B2 (en) Extending the free fingers typing technology and introducing the finger taps language technology
Taylor et al. Type-hover-swipe in 96 bytes: A motion sensing mechanical keyboard
CN106774850B (en) Mobile terminal and interaction control method thereof
TWI471815B (en) Gesture recognition device and method
Agrawal et al. A survey on manual and non-manual sign language recognition for isolated and continuous sign
TWI382352B (en) Video based handwritten character input device and method thereof
WO2021258862A1 (en) Typing method and apparatus, and device and storage medium
Qi et al. Computer vision-based hand gesture recognition for human-robot interaction: a review
US12008159B2 (en) Systems and methods for gaze-tracking
US20230195301A1 (en) Text input method and apparatus based on virtual keyboard
WO2023197648A1 (en) Screenshot processing method and apparatus, electronic device, and computer readable medium
Kumarage et al. Real-time sign language gesture recognition using still-image comparison & motion recognition
JP2016099643A (en) Image processing device, image processing method, and image processing program
Zahra et al. Camera-based interactive wall display using hand gesture recognition
US10593077B2 (en) Associating digital ink markups with annotated content
Robert et al. A review on computational methods based automated sign language recognition system for hearing and speech impaired community
Yang et al. Audio–visual perception‐based multimodal HCI
US11762617B2 (en) Display apparatus, display method, and display system
Verma et al. 7 Machine vision for human–machine interaction using hand gesture recognition
Sabni et al. Laser projection virtual keyboard: a laser and image processing based human-computer interaction device
Chansri et al. Low cost hand gesture control in complex environment using raspberry pi
Atreya et al. Enhancing 3D-Air Signature by Pen Tip Tail Trajectory Awareness: Dataset and Featuring by Novel Spatio-temporal CNN

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21827848
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 21827848
    Country of ref document: EP
    Kind code of ref document: A1