CN112749646A - Interactive point-reading system based on gesture recognition - Google Patents

Interactive point-reading system based on gesture recognition

Info

Publication number
CN112749646A
Authority
CN
China
Prior art keywords
image
module
gesture
gesture recognition
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011620981.1A
Other languages
Chinese (zh)
Inventor
黄坚
李慧敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University
Priority to CN202011620981.1A
Publication of CN112749646A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an interactive point-reading system based on gesture recognition. The system takes images captured in real time by a camera as input and preprocesses them. A gesture recognition network then performs gesture recognition on target images containing a hand: it first classifies each pixel in the image to obtain a hand segmentation map, then classifies the gesture from the segmented hand region. The preprocessed image is also passed to an image recognition module, which detects bounding boxes of the objects to be recognized; these boxes are matched against the gesture recognition result, and translation and speech read-aloud are performed on the matched result, realizing the point-reading function. By implementing gesture recognition with a neural network, the system realizes an interactive point-reading function free of the traditional algorithms' dependence on dedicated reading tools and of the constraints existing interactive systems place on the recognized object.

Description

Interactive point-reading system based on gesture recognition
Technical Field
The invention relates to gesture recognition, fingertip detection, object detection, text detection and recognition, and hand segmentation; it belongs to the field of artificial intelligence and particularly relates to an interactive point-reading system based on gesture recognition.
Background
With the continuous development of science and technology, intelligent education products such as point-reading pens and point-reading machines have become widespread. Most current interactive point-reading systems rely on dedicated devices (e.g., a point-reading machine or pen) that directly extract the required text and process it to realize the reading function. Other systems require a special sensing material worn on the fingertip so the fingertip can be recognized. Some methods use artificial intelligence to iteratively detect the fingertip position and then recognize the text: for example, the patent with application number 201910837914.6 builds a finger-feature recognition neural network to locate the fingertip, but recognizes only the fingertip position and the characters in the region in front of it. Chinese patent CN109325464A discloses a character recognition method based on artificial intelligence that realizes finger point-reading with a pure deep-learning pipeline, completing text recognition and word lookup in under 300 ms and greatly improving point-reading efficiency. Still other methods detect fingertips with the aid of hand keypoints. These methods have the following limitations to different degrees:
first, dependence on dedicated reading devices;
secondly, constraints on the recognized object: some interactive point-reading systems can recognize only text and cannot recognize non-text objects;
thirdly, a degree of ambiguity in character recognition: given the fingertip position, how should the text being pointed at be extracted, and how should associations among the texts be determined? Does the fingertip point at a single character, a word, or a passage? How should characters of different sizes be handled?
fourthly, the interaction semantics are not rich enough: existing systems realize only word point-reading, cannot achieve intelligent human-computer interaction, and cannot meet people's demand for intelligence.
Disclosure of Invention
The invention solves the following problems: it overcomes or mitigates, to different degrees, the limitations of the prior art described in the background above, and provides an interactive point-reading system based on gesture recognition that realizes an interactive point-reading function to meet the demand for intelligence.
The system takes images captured in real time by a camera as input, preprocesses them, and performs gesture recognition with a gesture recognition network; the bounding boxes of the objects detected by the image recognition module are matched against the gesture recognition result, and translation and speech read-aloud are performed on the matched result, realizing the point-reading function.
The technical problem to be solved is as follows: to overcome the prior art's dependence on dedicated devices for recognizing gestures and fingertips, realize intelligent interaction, and meet users' demand for intelligence.
The technical scheme is as follows: an interactive point-reading system based on gesture recognition, characterized in that it comprises the following modules: a camera connected to an image preprocessing module; the image preprocessing module connected to both a gesture recognition module and an image recognition module; the gesture recognition module and the image recognition module each connected to an integration module; the integration module connected to a translation module; and the translation module connected to a voice module.
The camera is used for acquiring images in real time;
the image preprocessing module is used for preprocessing the image;
the gesture recognition module recognizes the preprocessed image: it receives the output image of the image preprocessing module, segments it with a gesture recognition network that classifies each pixel and determines its category, thereby obtaining a hand segmentation map; it then classifies the gesture from the geometric shape of the hand region in that map, realizing gesture recognition, and triggers different subsequent processing for different gesture recognition results;
the image recognition module detects and localizes the objects to be recognized: it receives the image output by the preprocessing module, detects the objects with an image recognition algorithm, and passes the bounding box and label information of each object to the integration module;
the integration module receives the outputs of the gesture recognition module and the image recognition module and matches the gesture recognition result against the image recognition bounding boxes (taking the single-fingertip point-reading gesture as an example, the fingertip coordinates are approximated from the geometric shape of the gesture); it completes the two-way matching between the gesture-located fingertip and the image-recognition bounding boxes, outputs the label information of the matched object, and passes it to the translation module for subsequent processing, realizing the point-reading function;
the translation module: translates the label information returned by the integration module into different languages to meet different needs;
the voice module: reads aloud the result of the translation module.
The image preprocessing module preprocesses the image captured by the camera, including image denoising and image scaling; experiments show that denoising improves the classification accuracy of gesture recognition, while scaling the image to different sizes changes the training and recognition time of the gesture recognition network;
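For concreteness, a minimal sketch of this preprocessing stage in Python with OpenCV; the choice of Gaussian blur as the denoising operation and the helper name preprocess are assumptions, since the text specifies only denoising, boundary smoothing, and scaling (800x800 is the resolution used in the experiments described below):

```python
import cv2

def preprocess(frame, size=(800, 800)):
    """Denoise a camera frame and scale it to the network input size."""
    # Gaussian blur suppresses sensor noise and smooths object boundaries;
    # the exact OpenCV denoising call is an assumption, as the text only
    # says the image is denoised and smoothed with the OpenCV library.
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)
    # The experiments described below scale images to 800x800 resolution.
    return cv2.resize(denoised, size, interpolation=cv2.INTER_AREA)

cap = cv2.VideoCapture(0)          # camera acquiring images in real time
ok, frame = cap.read()
if ok:
    net_input = preprocess(frame)  # fed to both recognition modules
```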
the gesture recognition module recognizes gestures to drive the subsequent processing: it receives the output image of the image preprocessing module, segments it with the gesture recognition network by classifying each pixel and determining its category, thereby obtaining a hand segmentation map; it then classifies the gesture from that map, realizing gesture recognition, and performs different subsequent processing for different gestures (several gestures and their processing flows are enumerated in the steps below). The module works as follows:
(1) the image output by the preprocessing module is fed into the gesture recognition network, which classifies each pixel and determines its category, yielding the hand segmentation map of the image;
(2) the hand region segmented in step (1) is classified to recognize the different gestures;
(3) subsequent processing is performed according to the gesture recognized in step (2); several different gestures are described here. If the gesture recognition result is:
1) a single-fingertip point-reading gesture, the centroid of the gesture contour is approximated from the geometric shape of the gesture, and the fingertip coordinates are computed from the centroid coordinates and the geometric characteristics of the single-fingertip point-reading gesture;
2) a camera pause/capture control gesture, the pausing/capturing of the camera is controlled according to the recognition result;
3) a multi-fingertip matching gesture, the geometric shape of the gesture is matched against the result of the image recognition module described above to drive the subsequent processing;
4) another gesture, different processing is performed according to the gesture result;
the image recognition module processes the output of the image preprocessing module and passes the detected bounding boxes and label information to the downstream integration module; connecting the image preprocessing module directly to the image recognition module avoids the possible interference of gestures with the image recognition process;
the integration module receives the gesture recognition result output by the gesture recognition module and the object label information and bounding boxes returned by the image recognition module, and matches the gesture recognition result against the bounding boxes (the single-fingertip point-reading gesture is used here as an example of the matching process); if the fingertip coordinates match a detected object's bounding box, the object's label information is output to the translation module; if no bounding box matches, explanatory information is prompted instead.
The translation module and the voice module call open-source libraries;
the invention provides an interactive click-to-read system based on gesture recognition, which takes an image acquired by a camera in real time as input, preprocesses the input image, and performs gesture recognition on a target image containing a hand by using a gesture recognition network; and transmitting the preprocessed image into an image recognition module, detecting a boundary frame of the object to be recognized, matching the boundary frame with the preprocessed image according to a gesture recognition result, and performing translation and voice reading processing according to a matching result, thereby realizing a point reading function. The system realizes gesture recognition through a gesture recognition network, further realizes an interactive point reading function, and gets rid of dependence of a traditional point reading algorithm on a point reading tool and constraint limitation of an existing interactive point reading system on a recognition object.
Compared with the prior art, the invention has the following advantages:
firstly, it removes the dependence on dedicated point-reading devices;
secondly, it removes the constraints on the recognized object, realizing point-reading of both text and non-text objects;
thirdly, it expands the semantics of interactive point-reading, realizing intelligent human-computer interaction through different gestures and meeting, to a certain extent, people's demand for intelligence;
fourthly, it is highly extensible: the system can also be applied to related fields such as general text recognition, children's picture recognition, and picture-book recognition, as well as to gesture-based fingertip detection and similar tasks.
Drawings
FIG. 1 is a schematic diagram of the overall flow of the system (note: after image preprocessing with the OpenCV library, the preprocessed image is passed to both the gesture recognition network and the image recognition module; the arrows indicate these multiple uses of the preprocessed image);
FIG. 2 is a schematic flow diagram of a gesture recognition network;
FIG. 3 is a network architecture diagram of a gesture recognition network;
FIG. 4 is a schematic flow diagram of an image recognition network;
FIG. 5 is a schematic diagram of the system of the present invention;
fig. 6 shows the original image (a) after OpenCV processing and the corresponding gesture segmentation map (b).
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the intelligent interactive point-reading system based on gesture recognition operates in the following steps:
(1) Images are captured with the camera.
(2) The captured image is compressed and denoised using the OpenCV algorithm library, specifically:
Step 1: the image is denoised and object boundaries are smoothed;
Step 2: the image is scaled to a fixed size; images of different sizes fed into the gesture recognition network and the image recognition network differ in computation time and recognition accuracy (in the experiments the images are scaled to 800x800 resolution).
(3) Each pixel in the image is classified by a pre-trained gesture recognition network, which determines the category of each pixel and thereby yields a hand segmentation map; the gesture is classified from this map, realizing gesture recognition, and subsequent processing follows the recognition result. Several different gestures are listed here for explanation. If the gesture recognition result is a camera pause/capture control gesture, the pausing/capturing of the camera is controlled accordingly; if the result is a non-control gesture:
1) for a single-fingertip point-reading gesture, the centroid of the gesture contour is approximated from the geometric shape of the gesture, and the fingertip coordinates are derived from the centroid coordinates and the geometric characteristics of the gesture. Specifically, the contour point farthest from the contour centroid is found and taken as the fingertip, using the Euclidean distance
d_i = √((x_i − x_c)² + (y_i − y_c)²),
where (x_c, y_c) is the contour centroid and (x_i, y_i) is a point on the contour; the fingertip is the contour point that maximizes d_i (see the sketch after this list). The computed fingertip coordinates are passed to the integration module;
2) for a camera pause/capture control gesture, the pausing/capturing of the camera is controlled according to the recognition result;
3) for a multi-fingertip matching gesture, the geometric shape of the gesture is matched against the result of the image recognition module in the steps below to drive the subsequent processing;
4) other gestures are processed differently according to the gesture recognition result.
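A minimal sketch of the fingertip computation in item 1), assuming OpenCV and NumPy and a binary hand mask produced by the segmentation network; the helper name fingertip_from_mask is illustrative:

```python
import cv2
import numpy as np

def fingertip_from_mask(hand_mask):
    """Approximate the fingertip as the contour point farthest from the
    contour centroid, per the Euclidean-distance formula above.
    `hand_mask` is assumed to be an 8-bit binary hand segmentation map."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    if not contours:
        return None
    contour = max(contours, key=cv2.contourArea)       # largest blob = hand
    m = cv2.moments(contour)
    if m["m00"] == 0:
        return None
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]  # contour centroid
    pts = contour.reshape(-1, 2)
    d = np.hypot(pts[:, 0] - cx, pts[:, 1] - cy)       # d_i for each point
    return tuple(pts[np.argmax(d)])                    # farthest point
```

The farthest-point heuristic mirrors the formula above; it assumes the pointing finger is extended away from the palm.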
(4) The image is processed by a pre-trained image detection network, which detects and localizes the objects, returns the bounding box and label information of each detected object, and passes them to the integration module.
(5) The gesture recognition result is matched against the image recognition result: if the fingertip coordinates match a detected object's bounding box, the object's label information is output to the translation module; if no bounding box matches, explanatory information is prompted instead.
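A minimal sketch of this matching step; the (label, (x1, y1, x2, y2)) detection format and the smallest-containing-box tie-break are assumptions, since the text fixes neither:

```python
def match_fingertip(fingertip, detections):
    """Match the fingertip against detected boxes (step (5)).
    `detections` is assumed to be a list of (label, (x1, y1, x2, y2))
    pairs returned by the image recognition module."""
    if fingertip is None:
        return None
    fx, fy = fingertip
    hits = [(label, box) for label, box in detections
            if box[0] <= fx <= box[2] and box[1] <= fy <= box[3]]
    if not hits:
        return None  # no match: the caller prompts explanatory information
    # If several boxes contain the fingertip, prefer the smallest (most
    # specific) one; this tie-break is a design assumption, not stated
    # in the text.
    return min(hits, key=lambda h: (h[1][2] - h[1][0]) * (h[1][3] - h[1][1]))
```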
(6) The label text matched in step (5) is translated and read aloud.
Referring to fig. 2, the gesture recognition neural network pipeline comprises the following steps:
Step 1: training data is acquired by photographing; after image preprocessing, each original image is scaled to 800x800 resolution, and a gesture segmentation map is obtained by manual annotation; the scaled image and its segmentation map together form one input pair;
Step 2: data enhancement is applied to the input images of Step 1 to expand the network's training samples; part of the augmentation used is introduced here:
1) Flipping
The input image is flipped horizontally and vertically using the OpenCV library;
2) Translation
The input image is translated using the OpenCV library.
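A minimal sketch of this augmentation, assuming OpenCV; applying identical transforms to the image and its segmentation map follows from the paired inputs of Step 1, while the 10%-of-width translation offset is an arbitrary placeholder:

```python
import cv2
import numpy as np

def augment(image, mask):
    """Expand one (image, segmentation map) training pair by flipping
    and translating both in the same way."""
    samples = [(image, mask)]
    for flip_code in (0, 1):                 # 0: vertical, 1: horizontal
        samples.append((cv2.flip(image, flip_code),
                        cv2.flip(mask, flip_code)))
    h, w = image.shape[:2]
    shift = np.float32([[1, 0, 0.1 * w],     # shift right by 10% of width
                        [0, 1, 0]])
    samples.append((cv2.warpAffine(image, shift, (w, h)),
                    cv2.warpAffine(mask, shift, (w, h))))
    return samples
```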
Step 3: the gesture recognition network is constructed and the augmented images from Step 2 are fed in for training; a schematic of the network structure is shown in FIG. 3.
Step 4: the network is trained to recognize gestures: a loss function between the predicted segmentation and the labeled segmentation map is computed, its gradient is back-propagated to update the network parameters, and the weights of the deep convolutional neural network are trained on the samples from Step 2 until the network stabilizes, at which point the trained parameters are obtained.
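A minimal PyTorch sketch of this training step. The small convolutional stack below is only a stand-in for the network of FIG. 3, whose architecture the text does not reproduce; it illustrates per-pixel two-class (hand/background) classification trained with a cross-entropy loss and back-propagation:

```python
import torch
import torch.nn as nn

# Stand-in for the network of FIG. 3: maps an RGB image to per-pixel
# scores for 2 classes (hand vs. background).
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 1),
)
criterion = nn.CrossEntropyLoss()       # per-pixel classification loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, masks):
    """images: (N, 3, 800, 800) float tensor; masks: (N, 800, 800) labels."""
    logits = model(images)              # (N, 2, 800, 800) class scores
    loss = criterion(logits, masks)     # loss vs. the labeled segmentation
    optimizer.zero_grad()
    loss.backward()                     # back-propagate the gradient
    optimizer.step()                    # update the network parameters
    return loss.item()
```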
Further, the image recognition network pipeline is similar and comprises the following steps:
1) Training of the non-text object network, see fig. 4:
Step 1: training images are collected (including part of the COCO dataset) and the samples are manually annotated. Briefly, the COCO dataset contains categories including person, bicycle, car, motorbike, aeroplane, bus, cat, dog, kite, etc., as detailed on the official website (http://cocodataset.org);
Step 2: a custom dataset is made and trained following the gesture recognition network procedure. Experiments show that when the intelligent point-reading device recognizes children's picture books, the sample types differ in style from those in the COCO dataset, so a custom dataset must be made and trained. The difference is that the input images need no segmentation maps; instead, the bounding box and label-class information of the objects to be recognized are added to the annotation data (a minimal annotation sketch follows Step 3);
Step 3: the image recognition network is constructed and trained; after training, it is connected to the image recognition module.
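A minimal sketch of the annotation format implied by Step 2, i.e. COCO-style bounding boxes plus category labels added to the label data; every concrete value here is a hypothetical placeholder:

```python
# One COCO-style annotation entry: a bounding box and a category id per
# object to be recognized. All file names, ids, and coordinates below
# are hypothetical placeholders.
annotation = {
    "images": [{"id": 1, "file_name": "picture_book_001.jpg",
                "width": 800, "height": 800}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 17,
                     # COCO boxes are [x, y, width, height]
                     "bbox": [120.0, 200.0, 310.0, 260.0]}],
    "categories": [{"id": 17, "name": "cat"}],
}
```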
2) Training of the text object network:
After repeated experiments, training a self-made dataset for text detection and recognition on existing neural networks such as CRNN and HigherHRNet could not achieve good results under the existing constraints (e.g., the available GPU hardware). Therefore, after a text bounding box is detected, the open-source OCR library tesseract-ocr is used for character recognition, specifically as follows:
Step 1: a dataset is self-made: the training images are converted to tif format and then to box files as required;
Step 2: the jTessBoxEditor tool in tesseract-ocr is opened, the training images are loaded, and the box positions are corrected;
Step 3: training is performed according to the tesseract-ocr requirements, and the trained model is connected to the image recognition module;
Step 4: the detected text bounding box information is passed to tesseract for recognition (a minimal sketch follows).
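A minimal sketch of Step 4, assuming the pytesseract Python wrapper around tesseract-ocr (the text names only the library itself) and an (x1, y1, x2, y2) box format:

```python
import cv2
import pytesseract

def recognize_text(image, box):
    """Crop a detected text bounding box and hand it to Tesseract.
    `box` is assumed to be (x1, y1, x2, y2) pixel coordinates."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]
    gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)  # OCR prefers grayscale
    return pytesseract.image_to_string(gray).strip()
```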
referring to fig. 5, an intelligent interactive touch-and-talk system based on gesture recognition is characterized by comprising the following modules: the system comprises the following modules: the camera is connected with the image preprocessing module, the image preprocessing module is connected with the gesture recognition module and the image recognition module, the gesture recognition module is connected with the integration module, the image recognition module is connected with the integration module, the integration module is connected with the translation module, and the translation module is connected with the voice module.
The camera is used for acquiring images in real time;
the image preprocessing module is used for preprocessing the image;
the gesture recognition module recognizes the preprocessed image: it receives the image from the preprocessing module, segments it with the gesture recognition network by classifying each pixel and determining its category, obtains the shape and contour of the hand region, classifies the gesture from that shape, realizing gesture recognition, and performs different subsequent processing for different gestures;
the image recognition module detects and localizes the objects to be recognized: it receives the image output by the preprocessing module, detects the objects with an image recognition algorithm, and returns the bounding box and label information of each object to the integration module;
the integration module receives the inputs of the gesture recognition module and the image recognition module; it controls camera capture and pause according to control gestures, and matches the gesture recognition result against the image recognition bounding boxes (if the gesture is a single-fingertip point-reading gesture, the fingertip coordinates are approximated from the geometric shape of the gesture); it completes the two-way matching between the gesture-located fingertip and the image-recognition bounding boxes, outputs the matched object's label information, and passes it to the translation module for subsequent processing, realizing the point-reading function;
the translation module: translates the label information returned by the integration module into different languages to meet different needs;
the voice module: reads aloud the result of the translation module.
As shown in fig. 6, (a) is a captured frame after OpenCV processing (the original image) and (b) is the corresponding gesture segmentation map output by the neural network;
the method for reading by point based on gesture recognition has the following advantages:
firstly, it realizes fingertip recognition without the background-art approach of training a neural network on fingertip samples;
secondly, for single-fingertip recognition, as detailed for the single-fingertip point-reading gesture, the fingertip coordinates can be conveniently computed from the geometric characteristics of the gesture shown in fig. 6(b);
thirdly, the gesture-based point-reading method expands the semantics of interactive point-reading: point-reading, control, and other actions can be triggered by different gestures, better realizing intelligent human-computer interaction and meeting people's demand for intelligence.
The above examples are provided for the purpose of describing the present invention only, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (6)

1. An interactive point-reading system based on gesture recognition, characterized in that the system comprises the following modules: a camera connected to an image preprocessing module; the image preprocessing module connected to a gesture recognition module and an image recognition module; the gesture recognition module and the image recognition module each connected to an integration module; the integration module connected to a translation module; and the translation module connected to a voice module.
The camera is used for acquiring images in real time;
the image preprocessing module is used for preprocessing the image;
the gesture recognition module is used for recognizing the preprocessed image: it receives the input image from the image preprocessing module, segments the image with a gesture recognition network by classifying each pixel and determining its category so as to obtain a hand segmentation map, classifies the gesture from the obtained map to realize gesture recognition, and performs different subsequent processing for different gestures;
the image recognition module is used for detecting and localizing the objects to be recognized: it receives the image output by the image preprocessing module, detects the objects with an image recognition algorithm, returns the bounding box and label information (including object category information) of each object, and passes them to the integration module;
the integration module receives the inputs of the gesture recognition module and the image recognition module, matches the gesture recognition result against the image recognition result, completes the two-way matching between the gesture recognition result and the image-recognition bounding boxes, outputs the label information of the matched object, and passes it to the translation module for subsequent processing to realize the point-reading function;
the translation module: translates the label information returned by the integration module into different languages to meet different needs;
the voice module: reads aloud the result of the translation module.
2. The gesture recognition based interactive point-reading system of claim 1, wherein: the image preprocessing module preprocesses the image captured by the camera, including image denoising and image scaling; experiments show that denoising improves the classification accuracy of gesture recognition, while scaling the image to different sizes changes the training and recognition time of the gesture recognition network.
3. The gesture recognition based interactive point-reading system of claim 1, wherein: the gesture recognition module realizes gesture recognition and the subsequent processing: it receives the output image of the image preprocessing module, segments the image with a gesture recognition network by classifying each pixel and determining its category, obtains a hand segmentation map, classifies the gesture from the obtained map to realize gesture recognition, and performs different subsequent processing for different gesture recognition results, specifically comprising the following steps:
(1) the image from the image preprocessing module is fed into the gesture recognition network, which classifies each pixel and determines its category so as to obtain the hand segmentation map;
(2) the hand region segmented in step (1) is classified, and the different gestures are recognized from the geometric shape of the hand-region contour;
(3) subsequent processing follows the gesture recognized in step (2); several different gestures are described here. If the gesture recognition result is:
1) a single-fingertip point-reading gesture, the centroid of the gesture contour is approximated from the geometric shape of the gesture, and the fingertip coordinates are computed from the centroid coordinates and the geometric characteristics of the single-fingertip point-reading gesture;
2) a camera pause/capture control gesture, the pausing/capturing of the camera is controlled according to the recognition result;
3) a multi-fingertip matching gesture, the geometric shape of the gesture is matched against the result of the image recognition module in claim 1 to drive the subsequent processing;
4) another gesture, different processing is performed according to the gesture result.
4. The gesture recognition based interactive point-reading system of claim 1, wherein: the image recognition module processes the output of the image preprocessing module, detects the objects in the image, and passes the detected bounding boxes and label information to the downstream integration module; connecting the image preprocessing module directly to the image recognition module avoids the possible interference of gestures with the image recognition process.
5. The gesture recognition based interactive point-reading system of claim 1, wherein: the integration module receives the gesture recognition result output by the gesture recognition module and the object label information and bounding boxes returned by the image recognition module, and matches the gesture recognition result against the bounding boxes; if the fingertip coordinates match a detected object's bounding box, the object's label information is output to the translation module; if no bounding box matches, explanatory information is prompted instead.
6. The gesture recognition based interactive point-reading system of claim 1, wherein: the translation module and the voice module call open-source libraries.
CN202011620981.1A 2020-12-30 2020-12-30 Interactive point-reading system based on gesture recognition Pending CN112749646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011620981.1A CN112749646A (en) 2020-12-30 2020-12-30 Interactive point-reading system based on gesture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011620981.1A CN112749646A (en) 2020-12-30 2020-12-30 Interactive point-reading system based on gesture recognition

Publications (1)

Publication Number Publication Date
CN112749646A true CN112749646A (en) 2021-05-04

Family

ID=75650271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011620981.1A Pending CN112749646A (en) 2020-12-30 2020-12-30 Interactive point-reading system based on gesture recognition

Country Status (1)

Country Link
CN (1) CN112749646A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936233A (en) * 2021-12-16 2022-01-14 北京亮亮视野科技有限公司 Method and device for identifying finger-designated target
CN114648756A (en) * 2022-05-24 2022-06-21 之江实验室 Book character recognition and reading method and system based on pointing vector
WO2023272604A1 (en) * 2021-06-30 2023-01-05 东莞市小精灵教育软件有限公司 Positioning method and apparatus based on biometric recognition
WO2023283934A1 (en) * 2021-07-16 2023-01-19 Huawei Technologies Co.,Ltd. Devices and methods for gesture-based selection

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120249422A1 (en) * 2011-03-31 2012-10-04 Smart Technologies Ulc Interactive input system and method
CN104157171A (en) * 2014-08-13 2014-11-19 三星电子(中国)研发中心 Point-reading system and method thereof
CN109325464A (en) * 2018-10-16 2019-02-12 上海翎腾智能科技有限公司 A kind of finger point reading character recognition method and interpretation method based on artificial intelligence
CN110443231A (en) * 2019-09-05 2019-11-12 湖南神通智能股份有限公司 A kind of fingers of single hand point reading character recognition method and system based on artificial intelligence
CN111353501A (en) * 2020-02-25 2020-06-30 暗物智能科技(广州)有限公司 Book point-reading method and system based on deep learning
CN111597969A (en) * 2020-05-14 2020-08-28 新疆爱华盈通信息技术有限公司 Elevator control method and system based on gesture recognition
CN112052724A (en) * 2020-07-23 2020-12-08 深圳市玩瞳科技有限公司 Finger tip positioning method and device based on deep convolutional neural network

Similar Documents

Publication Publication Date Title
WO2020078017A1 (en) Method and apparatus for recognizing handwriting in air, and device and computer-readable storage medium
US20210271862A1 (en) Expression recognition method and related apparatus
CN112749646A (en) Interactive point-reading system based on gesture recognition
Kumar et al. Sign language recognition
Kausar et al. A survey on sign language recognition
Pan et al. Real-time sign language recognition in complex background scene based on a hierarchical clustering classification method
Kadhim et al. A Real-Time American Sign Language Recognition System using Convolutional Neural Network for Real Datasets.
US20080008387A1 (en) Method and apparatus for recognition of handwritten symbols
Wang et al. Sparse observation (so) alignment for sign language recognition
CN103092329A (en) Lip reading technology based lip language input method
Hemayed et al. Edge-based recognizer for Arabic sign language alphabet (ArS2V-Arabic sign to voice)
CN102930270A (en) Method and system for identifying hands based on complexion detection and background elimination
Makhmudov et al. Improvement of the end-to-end scene text recognition method for “text-to-speech” conversion
JP2017084349A (en) Memory with set operation function and method for set operation processing using the memory
Shabir et al. Real-time pashto handwritten character recognition using salient geometric and spectral features
Geetha et al. Dynamic gesture recognition of Indian sign language considering local motion of hand using spatial location of Key Maximum Curvature Points
Patil et al. Literature survey: sign language recognition using gesture recognition and natural language processing
CN113220125A (en) Finger interaction method and device, electronic equipment and computer storage medium
Sharma et al. Highly Accurate Trimesh and PointNet based algorithm for Gesture and Hindi air writing recognition
Nahar et al. A robust model for translating arabic sign language into spoken arabic using deep learning
Robert et al. A review on computational methods based automated sign language recognition system for hearing and speech impaired community
Li et al. A novel art gesture recognition model based on two channel region-based convolution neural network for explainable human-computer interaction understanding
Axyonov et al. Method of multi-modal video analysis of hand movements for automatic recognition of isolated signs of Russian sign language
Geetha et al. A 3D stroke based representation of sign language signs using key maximum curvature points and 3D chain codes
Nguyen et al. Vietnamese sign language reader using Intel Creative Senz3D

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20210504)