CN111753715B - Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium - Google Patents

Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium

Info

Publication number
CN111753715B
CN111753715B (granted publication of application CN202010581452.9A)
Authority
CN
China
Prior art keywords
image
fingertip
preview image
carrier
preview
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010581452.9A
Other languages
Chinese (zh)
Other versions
CN111753715A (en)
Inventor
赵华 (Zhao Hua)
史云奇 (Shi Yunqi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd filed Critical Guangdong Genius Technology Co Ltd
Priority to CN202010581452.9A priority Critical patent/CN111753715B/en
Publication of CN111753715A publication Critical patent/CN111753715A/en
Application granted granted Critical
Publication of CN111753715B publication Critical patent/CN111753715B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/63 Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632 Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention discloses a method and a device for shooting test questions in a click-to-read scene, electronic equipment and a storage medium. The method comprises the following steps: when the electronic equipment is in a click-to-read scene, starting a first image acquisition device and a second image acquisition device to aim at a carrier, and respectively obtaining a first preview image and a second preview image, wherein the carrier image in the first preview image contains no gesture, and a hovering gesture is superimposed on the carrier image in the second preview image; recognizing the gesture in the second preview image to obtain first fingertip coordinates; converting the first fingertip coordinates into the first preview image by means of affine transformation to obtain second fingertip coordinates, and displaying the second fingertip coordinates in the first preview image in real time; adjusting the first fingertip coordinates so that the second fingertip coordinates fall within a target area; and receiving a user intention instruction and determining a user intention image. By implementing the embodiment of the invention, photographing is achieved without any finger-removal processing.

Description

Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for shooting test questions in a click-to-read scene, electronic equipment and a storage medium.
Background
At present, much electronic teaching-aid equipment offers a click-to-read scene: when a user points a finger at a carrier such as a book, an exercise book or a test paper, the teaching-aid equipment photographs the carrier through an image acquisition device and identifies the finger position, so that the user's intention is determined according to the finger position and an image corresponding to that intention is obtained for display, question searching, question recording and the like, wherein question searching may be searching for answers, pronunciations, semantics and so on. Because existing teaching-aid equipment requires the finger to touch the page directly, a finger-removal operation is generally needed to guarantee the shooting effect and subsequent operations, that is, the finger in the captured picture is removed; however, the finger often blocks part of the characters, so the finger-removal operation is far from ideal in effect.
Disclosure of Invention
Aiming at the above defects, the embodiment of the invention discloses a method, a device, electronic equipment and a storage medium for shooting test questions in a click-to-read scene, which can capture the user's intended image without any finger-removal operation.
The first aspect of the embodiment of the invention discloses a method for shooting test questions in a click-to-read scene, which comprises the following steps:
when the electronic equipment is in a click-to-read scene, starting a first image acquisition device and a second image acquisition device to aim at a carrier, and respectively obtaining a first preview image and a second preview image, wherein the carrier image in the first preview image contains no gesture, and a hovering gesture is superimposed on the carrier image in the second preview image;
recognizing the gesture in the second preview image to obtain first fingertip coordinates;
converting the first fingertip coordinates into the first preview image by means of affine transformation to obtain second fingertip coordinates, and displaying the second fingertip coordinates in the first preview image in real time;
adjusting the first fingertip coordinates so that the second fingertip coordinates fall within a target area;
and receiving a user intention instruction and determining a user intention image.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the recognizing the gesture in the second preview image to obtain the first fingertip coordinates includes:
recognizing the fingertip in the second preview image by using a skin color segmentation method or a machine-learning-based fingertip recognition model to obtain the first fingertip coordinates.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the starting the first image acquisition device and the second image acquisition device to aim at the carrier and respectively obtaining the first preview image and the second preview image, the method further includes:
performing gesture recognition on the carrier image in the first preview image, and when a fingertip is recognized in the carrier image in the first preview image, issuing an interaction instruction to remind the user to adjust the position or/and the height of the hovering gesture.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the starting the first image acquisition device and the second image acquisition device to aim at the carrier and respectively obtaining the first preview image and the second preview image, the method further includes:
displaying the first preview image and the second preview image on a display screen of the electronic device by adopting a split-screen technology;
the displaying the second fingertip coordinates in the first preview image in real time includes:
assigning a preset RGB value to the second fingertip coordinates to obtain a second fingertip coordinate image;
and magnifying the second fingertip coordinate image by a preset multiple, and displaying the magnified second fingertip coordinate image in the first preview image in real time.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the receiving a user intention instruction and determining a user intention image includes:
receiving the user intention instruction, photographing the carrier through the first image acquisition device and the second image acquisition device, and respectively obtaining a first image and a second image;
recognizing the gesture in the second image to obtain third fingertip coordinates;
converting the third fingertip coordinates into the first image by means of affine transformation to obtain fourth fingertip coordinates;
inputting the first image into a corresponding pre-trained text detection model according to the user intention to obtain text information corresponding to the first image;
and determining the intention image according to the fourth fingertip coordinates, the text information and a preset rule.
The second aspect of the embodiment of the invention discloses a device for shooting test questions in a click-to-read scene, which comprises:
a preview unit, configured to, when the electronic equipment is in a click-to-read scene, start a first image acquisition device and a second image acquisition device to aim at a carrier, and respectively obtain a first preview image and a second preview image, wherein the carrier image in the first preview image contains no gesture, and a hovering gesture is superimposed on the carrier image in the second preview image;
a recognition unit, configured to recognize the gesture in the second preview image to obtain first fingertip coordinates;
a transformation unit, configured to convert the first fingertip coordinates into the first preview image by means of affine transformation to obtain second fingertip coordinates, and display the second fingertip coordinates in the first preview image in real time;
an adjusting unit, configured to adjust the first fingertip coordinates so that the second fingertip coordinates fall within a target area;
and a photographing unit, configured to receive a user intention instruction and determine a user intention image.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the recognition unit is configured to:
recognize the fingertip in the second preview image by using a skin color segmentation method or a machine-learning-based fingertip recognition model to obtain the first fingertip coordinates.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the apparatus further includes:
a judging unit, configured to perform gesture recognition on the carrier image in the first preview image, and when a fingertip is recognized in the carrier image in the first preview image, issue an interaction instruction to remind the user to adjust the position or/and the height of the hovering gesture.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the apparatus further includes:
a display unit, configured to display the first preview image and the second preview image on a display screen of the electronic device by adopting a split-screen technology;
the transformation unit includes:
an assignment subunit, configured to assign a preset RGB value to the second fingertip coordinates to obtain a second fingertip coordinate image;
and a magnifying subunit, configured to magnify the second fingertip coordinate image by a preset multiple and display the magnified second fingertip coordinate image in the first preview image in real time.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the photographing unit includes:
a receiving subunit, configured to receive the user intention instruction, photograph the carrier through the first image acquisition device and the second image acquisition device, and respectively obtain a first image and a second image;
a recognition subunit, configured to recognize the gesture in the second image to obtain third fingertip coordinates;
a transformation subunit, configured to convert the third fingertip coordinates into the first image by means of affine transformation to obtain fourth fingertip coordinates;
a detection subunit, configured to input the first image into a corresponding pre-trained text detection model according to the user intention to obtain text information corresponding to the first image;
and a determining subunit, configured to determine the intention image according to the fourth fingertip coordinates, the text information and a preset rule.
A third aspect of an embodiment of the present invention discloses an electronic device, including: a memory storing executable program code; a processor coupled to the memory; the processor invokes the executable program code stored in the memory to execute part or all of the steps of the method for shooting the test questions in the click-to-read scene disclosed in the first aspect of the embodiment of the invention.
A fourth aspect of the embodiment of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program causes a computer to execute some or all of the steps of the method for shooting test questions in a click-to-read scene disclosed in the first aspect of the embodiment of the present invention.
A fifth aspect of the embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to execute some or all of the steps of the method for shooting test questions in a click-to-read scene disclosed in the first aspect of the embodiments of the present invention.
A sixth aspect of the embodiment of the present invention discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of a method for shooting a test question in a click-to-read scenario disclosed in the first aspect of the embodiment of the present invention.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, when the electronic equipment is in a click-to-read scene, a first image acquisition device and a second image acquisition device are started to aim at a carrier, and a first preview image and a second preview image are respectively obtained, wherein the carrier image in the first preview image contains no gesture, and a hovering gesture is superimposed on the carrier image in the second preview image; the gesture in the second preview image is recognized to obtain first fingertip coordinates; the first fingertip coordinates are converted into the first preview image by means of affine transformation to obtain second fingertip coordinates, and the second fingertip coordinates are displayed in the first preview image in real time; the first fingertip coordinates are adjusted so that the second fingertip coordinates fall within a target area; and a user intention instruction is received and a user intention image is determined. Therefore, by implementing the embodiment of the invention, a finger-free picture and the corresponding intention coordinates can be obtained in the click-to-read scene by combining the two image acquisition devices with a hovering gesture, so that an intention image is obtained from the finger-free picture and the intention coordinates for subsequent operations.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a method for shooting test questions in a click-to-read scene according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a device for shooting test questions in a click-to-read scene according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present invention are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, apparatus, article, or device that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or device.
The embodiment of the invention discloses a method, a device, electronic equipment and a storage medium for shooting test questions in a click-to-read scene, which can obtain a finger-free picture and the corresponding intention coordinates in the click-to-read scene by combining two image acquisition devices with a hovering gesture, so that an intention image is obtained from the finger-free picture and the intention coordinates for subsequent operations. Detailed description is given below with reference to the drawings.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a method for shooting test questions in a click-to-read scene according to an embodiment of the present invention. As shown in fig. 1, the method for shooting the test questions in the click-to-read scene comprises the following steps:
110. When the electronic equipment is in a click-to-read scene, start a first image acquisition device and a second image acquisition device to aim at a carrier, and respectively obtain a first preview image and a second preview image, wherein the carrier image in the first preview image contains no gesture, and a hovering gesture is superimposed on the carrier image in the second preview image.
The electronic equipment can be intelligent equipment such as a home tutoring machine, a learning machine, a mobile phone with a learning function or a tablet computer. In the click-to-read scene, a corresponding click-to-read APP (for example, a question-searching APP or a question-recording APP) is started. In the click-to-read scene, the image acquisition devices can be started automatically to obtain the preview images, or the user can trigger the image acquisition devices to start and obtain the preview images.
The first image capturing device and the second image capturing device are photographing devices integrated on the electronic device, such as a front camera, for photographing a carrier placed on the front side of the electronic device, or may be discrete components for photographing a carrier placed near the electronic device.
The carrier is placed where the first image acquisition device and the second image acquisition device are aimed. In general, for example when front cameras are used for physical shooting, the positions of the front cameras are set during manufacturing so that the first image acquisition device and the second image acquisition device are aimed at a carrier placed in front of the electronic device.
The first image acquisition device is mounted lower than the second image acquisition device. In this case, when the user photographs with a hovering gesture, the finger is placed in the area between the first image acquisition device and the second image acquisition device, so that the carrier image obtained by the first image acquisition device contains no gesture, while the carrier image obtained by the second image acquisition device contains the gesture with high probability. Obviously, if the user places a finger directly on the carrier to take a picture, both the first preview image and the second preview image must contain the finger, so only a hovering gesture can meet the requirement.
The carrier is a paper learning document such as a book or an exercise book. The user designates the target position through the gesture, so that the user's intention is fulfilled by photographing, for example searching for or recording questions through the intention image.
Whether the carrier image in the first preview image contains no gesture, and whether a hovering gesture is superimposed on the carrier image in the second preview image, are determined by gesture recognition. For example, gestures (i.e., fingers) in the first preview image and the second preview image may be identified by skin color segmentation. First, the first preview image and the second preview image are converted from the RGB color space to the YCbCr color space or the HSV color space, because skin color in the RGB color space is easily affected by illumination and hard to separate; then whether a finger is included is identified by a skin color detection model, which may be a thresholding method, a single Gaussian model, or the like.
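As an illustrative sketch (not part of the patent text), the YCbCr thresholding described above can be written in plain NumPy. The BT.601 conversion coefficients and the Cb/Cr ranges below are common heuristic values for skin detection, not values specified by the embodiment:

```python
import numpy as np

def rgb_to_ycbcr(img):
    """Convert an HxWx3 RGB image to YCbCr (ITU-R BT.601, full range)."""
    img = img.astype(np.float64)
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    y  =        0.299    * r + 0.587    * g + 0.114    * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5      * b
    cr = 128 + 0.5      * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(img, cb_range=(77, 127), cr_range=(133, 173)):
    """Boolean mask of skin-colored pixels via box thresholds on Cb/Cr.

    The illumination-sensitive Y channel is deliberately ignored, which is
    the point of leaving the RGB color space mentioned in the text."""
    ycbcr = rgb_to_ycbcr(img)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= cb_range[0]) & (cb <= cb_range[1]) &
            (cr >= cr_range[0]) & (cr <= cr_range[1]))
```

A single Gaussian model would replace the box threshold with a Mahalanobis-distance test on the (Cb, Cr) pair against a fitted mean and covariance.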
Finger recognition can also be realized by machine learning: a machine learning model is trained on a large number of pictures with fingers and pictures without fingers to obtain a finger recognition model whose output is the probability that a finger is present; the first preview image and the second preview image are then input into the finger recognition model to determine whether they contain a gesture. The machine learning model includes, but is not limited to, a fully connected neural network model, a convolutional neural network model, a recurrent neural network model, a capsule network model, and the like.
In the embodiment of the present invention, gestures falling outside the carrier image are not considered; only whether the carrier images within the first preview image and the second preview image include a gesture matters. Therefore, before gesture recognition, the border of the carrier image is first identified. The border can be obtained by edge detection, such as Canny edge detection or Hough-transform line detection, and a more accurate border, and hence a more accurate carrier image, can be obtained by machine learning. After the border of the carrier image is determined, the image within the border, namely the carrier image, is obtained by an edge-based segmentation method.
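As a hedged illustration of locating the carrier border, the sketch below finds the bounding box of the bright page region by intensity thresholding and row/column projections. The embodiment itself uses Canny edge detection or Hough line detection; this simplified stand-in (and its brightness threshold of 180) only approximates that step for a dark background and a bright page:

```python
import numpy as np

def locate_carrier(gray, thresh=180):
    """Bounding box (top, bottom, left, right) of the bright page region.

    gray: HxW grayscale image; returns None if no pixel clears the threshold.
    A stand-in for the Canny/Hough border detection described in the text."""
    mask = gray >= thresh
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing page pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing page pixels
    if rows.size == 0:
        return None
    return rows[0], rows[-1], cols[0], cols[-1]
```

Cropping the image to this box yields the carrier image on which gesture recognition is then restricted.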
If a finger is identified in the carrier image of the first preview image, a corresponding interaction instruction is sent to the user through the electronic equipment, for example reminding the user by voice to adjust the position or/and the height of the hovering gesture; similarly, if no finger is recognized in the carrier image of the second preview image, a corresponding interaction instruction is also issued.
To let the user determine more intuitively whether the first preview image contains no gesture and the second preview image contains the gesture, the first preview image and the second preview image can be displayed in real time on the electronic device. Specifically, the preview images acquired by the first image acquisition device and the second image acquisition device are displayed on the display screen of the electronic device through a split-screen technology, so the user can also adjust the gesture according to the observed images.
120. Recognize the gesture in the second preview image to obtain first fingertip coordinates.
Once it is determined that the carrier image of the second preview image contains the gesture, the fingertip position can be determined. The projection of the hovering gesture is superimposed on the carrier image and displayed in it, and the gesture recognition method can be extended to obtain the first fingertip coordinates.
Illustratively, the finger in the second preview image may be identified by skin color segmentation. First, the second preview image is converted from the RGB color space to the YCbCr color space or the HSV color space, because skin color in the RGB color space is easily affected by illumination and hard to separate; then the finger outline is determined through a skin color detection model, and the fingertip position is derived from it, wherein the skin color detection model may be a thresholding method or a single Gaussian model.
Fingertip recognition can also be realized by machine learning: fingertip positions are manually annotated in a large number of pictures containing fingers, and a machine learning model is trained on them to obtain a fingertip recognition model; the second preview image is input into the fingertip recognition model to determine the fingertip position in the second preview image. The machine learning model includes, but is not limited to, a fully connected neural network model, a convolutional neural network model, a recurrent neural network model, a capsule network model, and the like.
130. Convert the first fingertip coordinates into the first preview image by means of affine transformation to obtain second fingertip coordinates, and display the second fingertip coordinates in the first preview image in real time.
When the positions of the first image acquisition device and the second image acquisition device are fixed, coordinates in the images they capture can be converted through a fixed transformation matrix. The transformation matrix can be determined from samples: several fixed points, lying in the fields of view of both the first image acquisition device and the second image acquisition device, are marked in a sample; the coordinates of these points in the first preview image captured by the first image acquisition device and in the second preview image captured by the second image acquisition device are determined; and the transformation matrix is obtained by fitting with the least squares method or an SVM.
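The least-squares fitting of the transformation matrix can be sketched as follows; the 2x3 affine parameterization and the sample calibration points are illustrative assumptions, not values from the embodiment:

```python
import numpy as np

def fit_affine(src, dst):
    """Fit a 2x3 affine matrix M mapping src points (second camera's image)
    to dst points (first camera's image) by least squares.

    src, dst: Nx2 arrays of corresponding point coordinates, N >= 3
    and not collinear."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # N x 3 homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # solves A @ M ~= dst
    return M.T                                     # 2 x 3

def apply_affine(M, pt):
    """Map one (x, y) point through the fitted 2x3 affine matrix."""
    x, y = pt
    return M @ np.array([x, y, 1.0])
```

With the matrix fitted once at calibration time, every first fingertip coordinate is mapped into the first preview image by a single matrix-vector product.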
After the transformation matrix is obtained, the first fingertip coordinates can be converted into the first preview image, the second fingertip coordinates corresponding to the first fingertip coordinates in the first preview image are obtained, and meanwhile, the second fingertip coordinates can be displayed in the first preview image displayed in a split screen mode.
For example, the display method may assign a preset RGB value (for example, red) to the second fingertip coordinates to obtain a second fingertip coordinate image, magnify that image by a preset multiple, and display the magnified image in the first preview image in real time. The second fingertip coordinate image can also be displayed at different preset magnifications alternated at a fixed interval, so that the marker appears to blink.
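The colored, magnified marker can be sketched as a per-frame renderer; the alternating radius stands in for the "different preset magnifications at a fixed interval" blinking effect. The dot shape, default color, and scale values are illustrative assumptions.

```python
import numpy as np

def draw_marker(frame, center, base_radius=3, frame_index=0,
                color=(255, 0, 0), scales=(1, 2)):
    """Draw a filled colored dot at `center` on a copy of an RGB frame.

    The radius alternates between base_radius * scales[0] and
    base_radius * scales[1] on successive frames, producing the
    blinking/pulsing effect described above. All defaults are
    illustrative, not values fixed by the method.
    """
    out = frame.copy()
    cx, cy = center
    radius = base_radius * scales[frame_index % len(scales)]
    ys, xs = np.ogrid[:frame.shape[0], :frame.shape[1]]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2
    out[mask] = color  # stamp the preset RGB value over the disk
    return out
```

In a preview loop, `frame_index` would advance once per fixed interval so the marker visibly pulses over the first preview image.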
140. Adjust the first fingertip coordinates so that the second fingertip coordinates fall within the target area.
The first fingertip coordinates are adjusted by changing the position or/and height of the hover gesture. After new first fingertip coordinates are obtained, the corresponding second fingertip coordinates are computed by affine transformation and displayed. To avoid reacting to every small movement of the user's finger during adjustment, in the embodiment of the present invention the first fingertip coordinates are confirmed, and the affine transformation performed to determine the second fingertip coordinates, only when the dwell time of the first fingertip coordinates is greater than or equal to a preset time.
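The dwell-time condition can be sketched as a small stateful detector fed with each frame's fingertip. The `dwell_s` and `tol` values are illustrative assumptions, since the preset time and the stillness tolerance are not specified in the text.

```python
import time

class DwellDetector:
    """Confirm a fingertip position only after it has stayed roughly
    still for `dwell_s` seconds.

    `tol` is the per-axis pixel range within which the fingertip is
    considered stationary; both values are illustrative.
    """
    def __init__(self, dwell_s=0.8, tol=10, clock=time.monotonic):
        self.dwell_s = dwell_s
        self.tol = tol
        self.clock = clock
        self._anchor = None
        self._since = None

    def update(self, pt):
        """Feed the latest fingertip (x, y); return the confirmed
        coordinate once the dwell time is met, else None."""
        now = self.clock()
        if (self._anchor is None
                or abs(pt[0] - self._anchor[0]) > self.tol
                or abs(pt[1] - self._anchor[1]) > self.tol):
            self._anchor, self._since = pt, now  # movement: restart timer
            return None
        if now - self._since >= self.dwell_s:
            return self._anchor
        return None
```

Only a confirmed coordinate would be passed to the affine transformation, so the displayed marker does not jitter with the hovering finger.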
The target area is the content the user intends to select. For example, if the user is searching for a pronunciation, the target area is a character or word, and the second fingertip coordinates may lie below the target area or on it, according to a preset rule; the user can be guided by voice interaction, for example, "please place the blinking dot below the word or on the word". If the user is searching for an answer or recording a question, the target area is a particular question, and likewise the second fingertip coordinates may lie below the target area or on it.
150. Receive a user intention instruction and determine a user intention image.
After the user positions the second fingertip coordinates in the target area, a user intention instruction can be issued; it may be a voice interaction instruction or another kind of instruction. The final intention image is obtained from this instruction, and the corresponding operation is then performed based on the intention image.
The process is similar to steps 110-140, except that the preview images are replaced with photographed images. After the user's intention instruction is received, the first and second image acquisition devices photograph the carrier to obtain a first image and a second image; the carrier image in the first image contains no gesture, while that in the second image does. The fingertip position in the second image is then recognized and recorded as third fingertip coordinates, and the corresponding fourth fingertip coordinates in the first image are determined by affine transformation.
A text detection model for the first image is selected according to the user intention instruction. For answer searching or question recording, the recognition model can be a Mask-CNN model based on question segmentation, which segments out the text outline of each question; the text image the user intends to capture (i.e. the intention image) is then determined from the text outlines and the fourth fingertip coordinates. When the intention is question recording, the text image is stored; when it is answer searching, the text information of the text image is obtained through OCR recognition and the corresponding answer is searched in a resource library or on the Internet.
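Selecting the intended outline from the fourth fingertip coordinates can be sketched as follows, assuming the detector returns axis-aligned bounding boxes. The "on or below the target area" preset rule is implemented here as containment first, then the nearest box directly above the fingertip; this concrete rule is an assumption for illustration.

```python
def select_intent_box(boxes, tip):
    """Pick the detected text box the user intends.

    boxes: list of (x0, y0, x1, y1) axis-aligned outlines, with y
    growing downward as in image coordinates. tip: the fourth
    fingertip coordinate (x, y). A box containing the tip wins;
    otherwise the nearest box whose bottom edge lies above the tip
    (finger pointing from below) is chosen.
    """
    x, y = tip
    for box in boxes:
        x0, y0, x1, y1 = box
        if x0 <= x <= x1 and y0 <= y <= y1:
            return box
    above = [b for b in boxes if b[3] <= y and b[0] <= x <= b[2]]
    if not above:
        return None
    return min(above, key=lambda b: y - b[3])  # closest bottom edge
```

The cropped region of the first image corresponding to the returned box would then be stored (question recording) or passed to OCR (answer searching).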
For searching a pronunciation, synonym, antonym, or meaning, the recognition model may be a PSENet model based on text line detection, which yields the outline of each text line. The target text line is determined from the fourth fingertip coordinates, the corresponding word outline is obtained according to the fourth fingertip coordinates and a preset rule, and the image of that word outline (the intention image) is determined; its text information is obtained through OCR recognition, and the corresponding result, which may be audio, text, and so on, is searched in a resource library or on the Internet.
By implementing the embodiment of the invention, a finger-free picture and the corresponding intention coordinates can be obtained in a click-to-read scene by combining the two image acquisition devices with a hover gesture, so that an intention image is obtained from the finger-free picture and the intention coordinates for subsequent operations.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a device for shooting test questions in a click-to-read scene according to an embodiment of the present invention. As shown in fig. 2, the device for shooting test questions in the click-to-read scene may include:
The preview unit 210 is configured to, when the electronic device is in the click-to-read scene, start the first image acquisition device and the second image acquisition device aimed at the carrier and obtain a first preview image and a second preview image respectively, where the carrier image obtained in the first preview image contains no gesture and a hover gesture is superimposed on the carrier image obtained in the second preview image;
The recognition unit 240 is configured to recognize a gesture in the second preview image to obtain a first fingertip coordinate;
A transforming unit 250, configured to transform the first fingertip coordinates into the first preview image by using an affine transformation manner, to obtain second fingertip coordinates, and display the second fingertip coordinates in the first preview image in real time;
an adjusting unit 260 for adjusting the first fingertip coordinates so that the second fingertip coordinates are in the target region;
And a photographing unit 270 for receiving the user intention instruction and determining the user intention image.
As an alternative embodiment, the identifying unit 240 may include:
a module configured to identify the fingertip in the second preview image by a skin color segmentation method or a machine-learning-based fingertip recognition model to obtain the first fingertip coordinates.
As an alternative embodiment, the apparatus may further include: a judging unit 220, configured to perform gesture recognition on the carrier image obtained in the first preview image, and when a fingertip is recognized in that carrier image, issue an interaction instruction to remind the user to adjust the position or/and height of the hover gesture.
As an alternative embodiment, the apparatus may further include: a display unit 230 for displaying the first preview image and the second preview image in a display screen of the electronic device using a split screen technique;
the transforming unit 250 may include:
an assignment subunit 251, configured to assign a preset RGB value to the second fingertip coordinates to obtain a second fingertip coordinate image;
and a magnification subunit 252, configured to magnify the second fingertip coordinate image by a preset multiple and display the magnified second fingertip coordinate image in the first preview image in real time.
As an alternative embodiment, the photographing unit 270 may include:
a receiving subunit 271, configured to receive a user intention instruction, take a picture of the carrier through the first image capturing device and the second image capturing device, and acquire a first image and a second image respectively;
a recognition subunit 272, configured to recognize a gesture in the second image to obtain a third fingertip coordinate;
a transforming subunit 273, configured to transform the third fingertip coordinate into the first image by using an affine transformation manner, to obtain a fourth fingertip coordinate;
A detection subunit 274, configured to input a first image into a corresponding pre-trained text detection model according to the user intention, to obtain text information corresponding to the first image;
and the determining subunit 275 is configured to determine an intent image according to the fourth fingertip coordinates, the text information and a preset rule.
The device for shooting test questions in the click-to-read scene shown in fig. 2 can obtain a finger-free picture and the corresponding intention coordinates by combining the two image acquisition devices with a hover gesture in the click-to-read scene, so that an intention image is obtained from the finger-free picture and the intention coordinates for subsequent operations.
Example III
Referring to fig. 3, fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the invention. As shown in fig. 3, the electronic device may include:
a memory 310 in which executable program code is stored;
A processor 320 coupled to the memory 310;
the processor 320 invokes the executable program code stored in the memory 310 to execute some or all of the steps of the method for shooting test questions in the click-to-read scene in the above embodiment.
The embodiment of the invention discloses a computer-readable storage medium storing a computer program, where the computer program causes a computer to execute some or all of the steps of the method for shooting test questions in the click-to-read scene in the above embodiment.
The embodiment of the invention also discloses a computer program product which, when run on a computer, causes the computer to execute some or all of the steps of the method for shooting test questions in the click-to-read scene in the above embodiment.
The embodiment of the invention also discloses an application release platform for releasing the computer program product, where the computer program product, when run on a computer, causes the computer to execute some or all of the steps of the method for shooting test questions in the click-to-read scene in the above embodiment.
In the various embodiments of the present invention, it should be understood that the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer-accessible memory. Based on this understanding, the technical solution of the present invention, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory and comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, and in particular may be a processor in a computer device) to execute some or all of the steps of the method according to the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to a" means that B is associated with a, from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may also determine B from a and/or other information.
Those of ordinary skill in the art will appreciate that some or all of the steps of the methods of the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, tape storage, or any other medium capable of carrying or storing data.
The method, device, electronic equipment, and storage medium for shooting test questions in a click-to-read scene disclosed by the embodiments of the invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, since those skilled in the art may make changes to the specific embodiments and application scope in accordance with the ideas of the invention, the contents of this description should not be construed as limiting the invention.

Claims (11)

1. A method for shooting test questions in a click-to-read scene, characterized by comprising the following steps:
starting, when the electronic device is in a click-to-read scene, a first image acquisition device and a second image acquisition device aimed at a carrier, and obtaining a first preview image and a second preview image respectively, wherein the carrier image obtained in the first preview image contains no gesture, and a hover gesture is superimposed on the carrier image obtained in the second preview image;
performing gesture recognition on the carrier image obtained in the first preview image, and when a fingertip is recognized in the carrier image obtained in the first preview image, issuing an interaction instruction to remind the user to adjust the position or/and height of the hover gesture;
Recognizing the gesture in the second preview image to obtain a first fingertip coordinate;
converting the first fingertip coordinates into a first preview image in an affine transformation mode to obtain second fingertip coordinates, and displaying the second fingertip coordinates in the first preview image in real time;
Adjusting the first fingertip coordinates so that the second fingertip coordinates are in a target area;
and receiving a user intention instruction and determining a user intention image.
2. The method of claim 1, wherein identifying the gesture in the second preview image to obtain the first fingertip coordinates comprises:
and identifying the fingertip in the second preview image by using a skin color segmentation method or a fingertip identification model based on machine learning to obtain a first fingertip coordinate.
3. The method of any one of claims 1-2, wherein after starting the first image acquisition device and the second image acquisition device aimed at the carrier and obtaining the first preview image and the second preview image respectively, the method further comprises:
Displaying the first preview image and the second preview image in a display screen of the electronic device by adopting a split screen technology;
the displaying the second fingertip coordinates in the first preview image in real time includes:
assigning a preset RGB value to the second fingertip coordinates to obtain a second fingertip coordinate image;
and magnifying the second fingertip coordinate image by a preset multiple, and displaying the magnified second fingertip coordinate image in the first preview image in real time.
4. The method of any of claims 1-2, wherein the receiving a user intent instruction, determining a user intent image, comprises:
receiving a user intention instruction, photographing a carrier through a first image acquisition device and a second image acquisition device, and respectively acquiring a first image and a second image;
Recognizing the gesture in the second image to obtain a third fingertip coordinate;
Converting the third fingertip coordinate into the first image by utilizing an affine transformation mode to obtain a fourth fingertip coordinate;
Inputting a first image into a corresponding pre-trained text detection model according to the user intention to obtain text information corresponding to the first image;
and determining an intention image according to the fourth fingertip coordinates, the text information and the preset rule.
5. A device for shooting test questions in a click-to-read scene, characterized in that the device comprises:
a preview unit, configured to, when the electronic device is in a click-to-read scene, start a first image acquisition device and a second image acquisition device aimed at a carrier and obtain a first preview image and a second preview image respectively, wherein the carrier image obtained in the first preview image contains no gesture and a hover gesture is superimposed on the carrier image obtained in the second preview image;
a judging unit, configured to perform gesture recognition on the carrier image obtained in the first preview image, and when a fingertip is recognized in the carrier image obtained in the first preview image, issue an interaction instruction to remind the user to adjust the position or/and height of the hover gesture;
the recognition unit is used for recognizing the gesture in the second preview image to obtain a first fingertip coordinate;
The transformation unit is used for transforming the first fingertip coordinates into the first preview image in an affine transformation mode to obtain second fingertip coordinates, and displaying the second fingertip coordinates in the first preview image in real time;
the adjusting unit is used for adjusting the first fingertip coordinates so that the second fingertip coordinates are in the target area;
and the photographing unit is used for receiving the user intention instruction and determining a user intention image.
6. The apparatus according to claim 5, wherein the identification unit comprises:
and identifying the fingertip in the second preview image by using a skin color segmentation method or a fingertip identification model based on machine learning to obtain a first fingertip coordinate.
7. The apparatus of claim 5, wherein the apparatus further comprises:
a judging unit, configured to perform gesture recognition on the carrier image obtained in the first preview image, and when a fingertip is recognized in the carrier image obtained in the first preview image, issue an interaction instruction to remind the user to adjust the position or/and height of the hover gesture.
8. The apparatus according to any one of claims 5-7, further comprising:
a display unit, configured to display the first preview image and the second preview image on a display screen of the electronic device by using a split screen technology;
The transformation unit includes:
an assignment subunit, configured to assign a preset RGB value to the second fingertip coordinates to obtain a second fingertip coordinate image;
and a magnification subunit, configured to magnify the second fingertip coordinate image by a preset multiple and display the magnified second fingertip coordinate image in the first preview image in real time.
9. The apparatus according to any one of claims 5-7, wherein the photographing unit comprises:
The receiving subunit is used for receiving the user intention instruction, photographing the carrier through the first image acquisition device and the second image acquisition device, and respectively acquiring a first image and a second image;
the recognition subunit is used for recognizing the gesture in the second image to obtain a third fingertip coordinate;
A transformation subunit, configured to transform the third fingertip coordinate into the first image by using an affine transformation manner, to obtain a fourth fingertip coordinate;
the detection subunit is used for inputting a first image into a corresponding pre-trained text detection model according to the user intention to obtain text information corresponding to the first image;
And the determining subunit is used for determining the intention image according to the fourth fingertip coordinates, the text information and the preset rule.
10. An electronic device, comprising: a memory storing executable program code; a processor coupled to the memory; the processor invoking the executable program code stored in the memory to perform the method for shooting test questions in a click-to-read scene as claimed in any one of claims 1 to 4.
11. A computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the method for shooting test questions in a click-to-read scene as claimed in any one of claims 1 to 4.
CN202010581452.9A 2020-06-23 2020-06-23 Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium Active CN111753715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010581452.9A CN111753715B (en) 2020-06-23 2020-06-23 Method and device for shooting test questions in click-to-read scene, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN111753715A CN111753715A (en) 2020-10-09
CN111753715B true CN111753715B (en) 2024-06-21

Family

ID=72676987


Country Status (1)

Country Link
CN (1) CN111753715B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023272604A1 (en) * 2021-06-30 2023-01-05 东莞市小精灵教育软件有限公司 Positioning method and apparatus based on biometric recognition

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107592459A (en) * 2017-09-22 2018-01-16 维沃移动通信有限公司 A kind of photographic method and mobile terminal
CN108205641A (en) * 2016-12-16 2018-06-26 比亚迪股份有限公司 Images of gestures processing method and processing device
CN110969159A (en) * 2019-11-08 2020-04-07 北京字节跳动网络技术有限公司 Image recognition method and device and electronic equipment

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN101587534B (en) * 2009-05-05 2011-11-09 深圳市迪索音乐科技有限公司 Stick-type direct-point-read voice system and method of publications
JP2017004438A (en) * 2015-06-15 2017-01-05 富士通株式会社 Input device, finger-tip position detection method, and computer program for finger-tip position detection
CN110209273B (en) * 2019-05-23 2022-03-01 Oppo广东移动通信有限公司 Gesture recognition method, interaction control method, device, medium and electronic equipment
CN110909729B (en) * 2019-12-09 2022-10-18 广东小天才科技有限公司 Click-to-read content identification method and device and terminal equipment


Also Published As

Publication number Publication date
CN111753715A (en) 2020-10-09


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant