CN114429632B - Method, device, electronic equipment and computer storage medium for identifying click-to-read content - Google Patents


Info

Publication number
CN114429632B
CN114429632B (application CN202011104395.1A)
Authority
CN
China
Prior art keywords
character
indicator
line
position information
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011104395.1A
Other languages
Chinese (zh)
Other versions
CN114429632A
Inventor
董胜
徐浩
项小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011104395.1A
Publication of CN114429632A
Application granted
Publication of CN114429632B
Legal status: Active

Landscapes

  • Character Input (AREA)

Abstract

The embodiment of the application provides a method, a device, electronic equipment and a computer storage medium for identifying click-to-read content, and relates to the technical field of cloud education. According to the method, an image of the reading material is acquired, the characters in the image and the position information of the indicator are identified through an image recognition method, the character pointed to by the indicator is determined according to the characters and the position information of the indicator, and finally the click-to-read content is determined according to that character. The click-to-read range is therefore no longer limited by a click-to-read pen and specific click-to-read teaching materials; ordinary teaching materials, documents and other reading materials can be supported, which is highly convenient. Answers to unfamiliar content encountered during learning can be obtained quickly, effectively improving the user's learning efficiency.

Description

Method, device, electronic equipment and computer storage medium for identifying click-to-read content
Technical Field
The application relates to the technical field of point reading, in particular to a method, a device, electronic equipment and a computer storage medium for identifying point reading content.
Background
Cloud education (Cloud Computing Education, CCEDU) refers to educational platform services based on cloud-computing business-model applications. On the cloud platform, all education institutions, training institutions, recruitment service institutions, publicity institutions, industry associations, management institutions, industry media, legal institutions and the like are integrated into a resource pool in the cloud. All resources display and interact with one another and are obtained on demand, thereby reducing education costs and improving efficiency.
The home education machine with a click-to-read function is becoming an important carrier in the field of cloud education. The current method for recognizing click-to-read content is generally to prepare the recognition result for each click-to-read position in advance, and then directly return the prepared content after acquiring the click-to-read position. For example, the click-to-read content is written into an invisible code layer of the click-to-read textbook in advance; the click-to-read pen emits light through an infrared light-emitting diode, the camera acquires the code-layer data, and the click-to-read content is identified by decoding the code-layer data. As another example, the page of the click-to-read textbook carries mark information and the click-to-read pen carries a special mark; when the pen triggers a click, the camera acquires the mark information and the click-to-read content is thereby identified.
The existing schemes have the following problems:
1. the identification range is limited, and only customized books can be identified;
2. the cost is high: customized click-to-read books and click-to-read pens are required, consuming a great deal of manpower and material resources;
3. the click-to-read content of a customized click-to-read book cannot be updated.
Disclosure of Invention
Embodiments of the present invention provide a method, apparatus, electronic device, and computer storage medium for identifying click-to-read content that overcome, or at least partially solve, the above-mentioned problems.
In a first aspect, a method of identifying click-to-read content is provided, the method comprising:
acquiring an image of the click-to-read material;
identifying the characters in the image and the position information of the indicator, and determining the characters pointed by the indicator according to the characters and the position information of the indicator;
and determining the click-to-read content according to the character pointed to by the indicator.
In one possible implementation, identifying the character and the position information of the indicator in the image, determining the character pointed by the indicator according to the character and the position information of the indicator includes:
the method comprises the steps of identifying an image through an OCR (optical character recognition) engine to obtain position information of characters in the image; identifying position information of an indicator in the image by a fingertip detection method;
and calculating the character closest to the indicator according to the character and the position information of the indicator, and taking the character closest to the indicator as the character pointed by the indicator.
In one possible implementation, identifying the character and the position information of the indicator in the image, determining the character pointed by the indicator according to the character and the position information of the indicator includes:
recognizing the image through an OCR recognition engine to obtain character lines and the positions of the character lines in the image; identifying position information of an indicator in the image by a fingertip detection method;
obtaining a word line nearest to the indicator according to the word line and the position information of the indicator, and taking the word line as a target word line;
And determining the pixel width occupied by a single character in the target text line, and acquiring the character pointed by the indicator by combining the position information of the target text line and the indicator.
In one possible implementation manner, obtaining the text line closest to the indicator as the target text line according to the text line and the position information of the indicator includes:
for any text line, calculating the vertical distance from the indicator to the bottom edge of the text line according to the position information of the text line and the indicator;
obtaining a positional relation coefficient of the text line according to the relative position relation between the indicator and the bottom edge of the text line;
performing a weighted summation of the vertical distance from the indicator to the bottom edge of the text line and the positional relation coefficient of the text line to obtain the weighted distance between the text line and the indicator;
and taking the text line with the smallest weighted distance as the target text line.
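The weighted-distance selection above can be sketched as follows. This is a minimal Python sketch: the patent does not fix the positional-relation coefficient scheme or the weights, so the below-the-bottom-edge rule and the `alpha`/`beta` values are assumptions for illustration, using image coordinates where y grows downward.

```python
from dataclasses import dataclass

@dataclass
class TextLine:
    text: str
    x_min: float    # left edge of the line's bounding box (pixels)
    x_max: float    # right edge
    y_bottom: float # y coordinate of the line's bottom edge

def weighted_distance(line, tip_x, tip_y, alpha=1.0, beta=50.0):
    """Weighted distance between the fingertip and a text line's bottom edge.

    Assumed coefficient rule: a fingertip at or below the bottom edge (the
    natural pointing-from-below gesture) gets coefficient 0; a fingertip
    above it gets 1, penalizing lines that sit under the finger.
    """
    vertical = abs(tip_y - line.y_bottom)
    coeff = 0.0 if tip_y >= line.y_bottom else 1.0
    return alpha * vertical + beta * coeff

def pick_target_line(lines, tip_x, tip_y):
    # The line with the smallest weighted distance is the target text line.
    return min(lines, key=lambda ln: weighted_distance(ln, tip_x, tip_y))
```

With these assumed weights, a fingertip sitting between two lines resolves to the line above it, which matches the gesture of pointing just below a word.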
In one possible implementation, determining a pixel width occupied by a single character in a target text line includes:
obtaining the pixel width occupied by the target text line according to the position information of the target text line;
and obtaining the pixel width occupied by a single character in the target text line according to the quotient of the pixel width occupied by the target text line and the number of the characters in the target text line.
In one possible implementation, obtaining the character pointed by the indicator in combination with the position information of the target text line and the indicator includes:
determining the distance between the indicator and the left end of the target text line according to the position information of the target text line and the indicator;
obtaining the sequence of the characters pointed by the indicator in the target text line according to the quotient of the distance between the indicator and the left end of the target text line and the pixel width occupied by the single characters in the target text line;
and determining the character pointed by the indicator from the target text line according to the sorting.
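The per-character pixel width and ordering steps above can be sketched as follows. This illustrative Python sketch assumes roughly uniform character widths within the line, which holds well for CJK text and only approximately for proportional Latin fonts.

```python
def char_at_pointer(line_text, line_x_min, line_x_max, tip_x):
    """Estimate which character of the target text line the indicator points at.

    The width of a single character is the quotient of the line's pixel width
    and its character count; the character's ordering is the quotient of the
    fingertip's distance from the line's left end and that single-character width.
    """
    n = len(line_text)
    char_w = (line_x_max - line_x_min) / n      # pixel width of one character
    idx = int((tip_x - line_x_min) // char_w)   # 0-based ordering in the line
    idx = max(0, min(n - 1, idx))               # clamp to the valid range
    return idx, line_text[idx]
```

For example, pointing at pixel 250 in a 400-pixel-wide line of four characters selects the third character (index 2).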
In one possible implementation, determining the click-through content from the character pointed to by the pointer includes:
if the character pointed by the indicator is a Chinese character, determining the click-to-read content according to the Chinese character;
if the character pointed by the indicator is an English character, determining the vocabulary in which the English character is located, and determining the click reading content according to the vocabulary.
In a second aspect, there is provided an apparatus for recognizing click-to-read content, the apparatus comprising:
the image acquisition module is used for acquiring an image of the click-to-read material;
the pointing character determining module is used for identifying the characters in the image and the position information of the indicator, and determining the characters pointed by the indicator according to the characters and the position information of the indicator;
And the click-to-read content determining module is used for determining the click-to-read content according to the character pointed by the indicator.
In one possible implementation, the pointing character determining module includes:
the character position determining sub-module is used for identifying the image through the OCR engine to obtain the position information of the characters in the image; identifying position information of an indicator in the image by a fingertip detection method;
and the nearest distance calculating sub-module is used for calculating the character closest to the indicator according to the character and the position information of the indicator, and the character is used as the character pointed by the indicator.
In one possible implementation, the pointing character determining module includes:
the character line position determining sub-module is used for identifying the image through the OCR engine to obtain character lines in the image and the positions of the character lines; identifying position information of an indicator in the image by a fingertip detection method;
the target text line determining sub-module is used for obtaining the text line closest to the indicator according to the text line and the position information of the indicator, and taking the text line closest to the indicator as a target text line;
a width combining sub-module for determining the pixel width occupied by a single character in the target text line, and obtaining the character pointed to by the indicator by combining the position information of the target text line and the indicator.
In one possible implementation, the target text line determination submodule includes:
the distance determining unit is used for calculating the distance from the indicator to the bottom edge of the character line according to the position information of the character line and the indicator for any character line;
the relation coefficient determining unit is used for obtaining the position relation coefficient of the character line according to the relative position relation between the indicator and the bottom edge of the character line;
the weighted summation unit is used for carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and the target character line determining unit is used for taking the character line with the smallest weighted distance as the target character line.
In one possible implementation, the width combining sub-module further includes a character width determining unit for determining a pixel width occupied by a single character in the target text line, where the character width determining unit includes:
the text line width calculation subunit is used for obtaining the pixel width occupied by the target text line according to the position information of the target text line;
and the character width calculating subunit is used for obtaining the pixel width occupied by a single character in the target character row according to the quotient of the pixel width occupied by the target character row and the number of characters in the target character row.
In one possible implementation, the width combining sub-module further includes a pointing character determining unit for obtaining a character pointed by the pointer by combining the position information of the target text line and the pointer, and the pointing character determining unit includes:
the distance determining subunit is used for determining the distance between the indicator and the left end of the target text line according to the position information of the target text line and the indicator;
the sequencing determination subunit is used for obtaining the sequencing of the characters pointed by the indicator in the target text line according to the quotient of the distance between the indicator and the left end of the target text line and the pixel width occupied by the single character in the target text line;
and the sequencing character subunit is used for determining the character pointed by the indicator from the target text line according to sequencing.
In a third aspect, an embodiment of the invention provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as provided in the first aspect when the program is executed.
In a fourth aspect, an embodiment of the invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as provided by the first aspect.
According to the method, device, electronic equipment and computer storage medium for identifying click-to-read content provided by the embodiments of the application, an image of the click-to-read material is acquired, the characters in the image and the position information of the indicator are identified through an image recognition method, the character pointed to by the indicator is determined using the characters and the position information of the indicator, and finally the click-to-read content is determined according to that character. The click-to-read range is therefore no longer limited by a click-to-read pen and specific click-to-read teaching materials; ordinary teaching materials, documents and other reading materials can be supported, which is highly convenient. Answers to unfamiliar content encountered during learning can be obtained quickly, effectively improving the user's learning efficiency.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings that are required to be used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic diagram of a system architecture of a point-to-read system according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a desk lamp according to an embodiment of the present application;
fig. 3 is a schematic flow chart of identifying click-to-read content by using the desk lamp shown in fig. 2 according to an embodiment of the present application;
fig. 4 is a schematic flow chart of identifying click-to-read content by using the desk lamp shown in fig. 2 according to another embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for identifying click-to-read content according to an embodiment of the present application;
FIG. 6 is an interaction schematic diagram of a system for identifying click-to-read content according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a device for identifying click-to-read content according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative only and are not to be construed as limiting the application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes all or any element and all combination of one or more of the associated listed items.
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.
The application provides a method, a device, electronic equipment and a computer readable storage medium for identifying click-to-read content, and aims to solve the technical problems in the prior art.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system architecture of a point-and-read system to which the technical solution of the embodiment of the present application can be applied.
As shown in fig. 1, the system architecture may include terminal devices (such as one or more of the smart phone 101, tablet 102, and point-and-read machine 103 shown in fig. 1, or a desktop computer, etc.), a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices and the server 105, and may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud computing service.
The server-side method in the embodiment of the application can be completed in the form of cloud computing, a computing mode that distributes computing tasks over a resource pool formed by a large number of computers, so that various application systems can acquire computing power, storage space and information services on demand. The network that provides the resources is referred to as the "cloud". From the user's perspective, resources in the cloud appear infinitely expandable and can be acquired at any time, used as needed, expanded at any time, and paid for per use.
As a basic capability provider of cloud computing, a cloud computing resource pool (cloud platform for short, generally an IaaS (Infrastructure as a Service) platform) is established, in which multiple types of virtual resources are deployed for external clients to select and use.
According to logical function division, a PaaS (Platform as a Service) layer can be deployed on the IaaS layer, and a SaaS (Software as a Service) layer can be deployed above the PaaS layer, or SaaS can be deployed directly on IaaS. PaaS is a platform on which software runs, such as a database or web container. SaaS is the wide variety of business software, such as web portals and SMS mass senders. Generally, SaaS and PaaS are upper layers relative to IaaS.
A user may interact with the server 105 via the network 104 using a terminal device, to receive or send messages and the like. The server 105 may be a server providing various services. For example, the user uploads an image of the material being read, captured by the terminal device 103 (or terminal device 101 or 102), to the server 105; the server 105 recognizes the characters and the position information of the indicator in the image, determines the character pointed to by the indicator according to the characters and the position information of the indicator, and determines the click-to-read content from that character. Because the server 105 determines the positions of the characters and the indicator through image recognition, and from these the character pointed to, the click-to-read range is no longer limited: click-to-read of teaching materials, ordinary books and electronic books can be supported, and, more importantly, no click-to-read textbook or click-to-read pen is needed, making the system simple and flexible.
The click-to-read system of the embodiment of the application can be deployed as an independent integrated software-and-hardware system, for example: a mobile intelligent device integrating a camera and an audio playing device; an independent camera combined with a smart speaker; or an independent camera and an independent audio playing device combined with a mobile intelligent device. Alternatively, the system can be deployed as a combination of a terminal and a server.
It will be appreciated that in other embodiments the terminal may not send the image to the server 105; the series of processing performed on the image by the server 105 may instead be performed by the terminal itself. Only one application scenario is given here as an example: the execution subject of the method for recognizing click-to-read content of the present application is not limited to the server 105, and the terminal itself may execute the method.
Further, the terminal of the embodiment of the present application may integrate a camera device and an audio/video playing device. Referring to fig. 2, fig. 2 shows a schematic structural diagram of a desk lamp to which the embodiment of the present application may be applied. The terminal of the embodiment of the present application uses the desk lamp as a carrier, and includes a base 201, a lamp post 202 and a lamp body 203;
the base 201 is fixedly placed on the desktop, and the base 201 includes a processor 2011, a microphone 2012, a speaker 2013, a communication module 2014, a memory 2015, a display 2016, a power module 2017, and other components;
The microphone 2012, the speaker 2013, the communication module 2014, the memory 2015, and the display 2016 are connected to the processor 2011; a power module 2017 for supplying power to the above components;
the lamp body 203 is connected, through the lamp post 202, with the base 201 fixedly placed on the desktop; the lamp body 203 can be arranged transversely or hinged to the lamp post 202 so that the irradiation angle can be changed. The lamp body 203 includes a light source 2031, which may be an LED light source, an OLED light source or the like, and helps reduce shadows while a student writes or reads;
the light body 203 is further provided with a camera 2032, which may be a single camera or a dual camera, or even more cameras, and the camera 2032 is connected with the processor 2011 (the connection relationship is not shown in the figure), so that the camera is controlled by the processor, and may be installed on the light body 203 and at an end far from the light pole 202, and the shooting direction of the camera faces the desktop, so as to obtain a better visual field range.
The memory 2015 stores executable program code, and the processor 2011 calls the executable program code stored in the memory 2015 to execute the technical scheme of the embodiment of the present application, obtain the click-to-read content, and output it through the speaker 2013 and the display 2016 respectively.
Therefore, the desk lamp described in fig. 2 allows a pupil to click-to-read directly on a real book with a non-stylus indicator such as a finger or a pencil, which is highly convenient: unfamiliar content encountered during learning can be queried in time and the corresponding answer obtained, so the pupil's learning efficiency can be effectively improved.
Fig. 3 is a schematic flow chart of identifying click-to-read content by using the desk lamp shown in fig. 2 according to an embodiment of the present application; as shown in fig. 3:
the user places the book below the desk lamp for reading;
when the user finds an unrecognized word on the page, a finger is placed below the word, triggering the desk lamp to photograph the finger and the page and obtain an image of the material being read; the image clearly records the finger and the characters on the page;
by identifying the characters and the position information of the finger in the image, the character pointed to by the finger is determined to be the Chinese character meaning "bottom", and the click-to-read content for that character, such as pronunciation, definition, strokes, word formation and other information, is displayed on the display.
Fig. 4 is a schematic flow chart of identifying click-to-read content by using the desk lamp shown in fig. 2 according to another embodiment of the present application; as shown in fig. 4:
The user places the book below the desk lamp for reading;
when the user finds an unrecognized word on the page, a pencil is placed below the word, triggering the desk lamp to photograph the pencil and the page and obtain an image of the reading material; the pencil tip and the characters on the page are clearly recorded in the image;
by identifying the characters and the position information of the pen tip in the image, the character pointed to by the pen tip is determined, according to that position information, to be the Chinese character meaning "bottom", and the click-to-read content for that character, such as pronunciation, definition, strokes, word formation and other information, is displayed on the display.
The embodiment of the application provides a method for identifying click-to-read content, as shown in fig. 5, which comprises the following steps:
s101, acquiring an image of the read-on-read material.
The image in the embodiment of the present application may be captured in real time by a device with an image-capturing function, retrieved from local storage, or received as a picture sent by another device or a storage device; this is not specifically limited here.
The image of the embodiment of the application not only contains the characters to be recognized, but also records the indicator used when the user clicks; that is, the embodiment of the application does not require a click-to-read pen as in the prior art, greatly reducing the hardware requirements for click-to-read. It will be appreciated that, besides a finger, the indicator of the embodiments of the present application may be a common writing instrument, such as a pencil, a brush or a pen, which is not specifically limited here.
S102, identifying the characters and the position information of the indicator in the image, and determining the characters pointed by the indicator according to the characters and the position information of the indicator.
OCR (Optical Character Recognition) refers to the process by which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using a character recognition method. That is, for printed characters, the characters in a paper document are optically converted into an image file of a black-and-white lattice, and the characters in the image are converted into text format by recognition software. The embodiment of the application can recognize the image through a preset OCR recognition engine to obtain the position information of each character in the image. The position information of a character can be characterized by the maximum abscissa, minimum abscissa, maximum ordinate and minimum ordinate of the character's pixels in the image.
Specifically, the embodiment of the application can recognize characters through the Tencent OCR character recognition engine, which supports full-image text recognition under arbitrary layouts, including recognition of more than ten languages such as Chinese, English, letters, numerals, Japanese and Korean, and is more accurate and faster when applied to the click-to-read scenarios of young children and primary and middle school students.
Taking finger identification as an example, the embodiment of the application can identify the finger in the image and its position information through fingertip detection: the fingertip is detected, and the coordinates of the fingertip are used as the position information of the finger. Fingertip detection is a special case of hand key-point recognition and refers to the technology of locating the tip of an extended index finger in a pointing gesture. Fingertip detection can adopt the gesture recognition (Gesture Recognition, GR) function of Tencent Cloud, a human-computer interaction technology developed by the Tencent audio-video laboratory that includes static gesture recognition, key-point recognition, fingertip recognition, gesture action recognition and other functions. Given any picture, if the gesture is of the extended-index-finger type, the position of the fingertip is returned, represented by its abscissa and ordinate.
After determining the position information of the characters and the indicator, the character closest to the indicator may be searched for as the character pointed to by the indicator. For example, the coordinates of the center point of each character may be determined from the coordinates of the character, the distance between the center point of the character and the fingertip may then be calculated, and the character with the smallest distance is taken as the character pointed to by the indicator.
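As a minimal sketch (in Python, with an illustrative data layout not taken from the patent), the nearest-character search described above can be written as:

```python
import math

def nearest_character(characters, fingertip):
    """Return the character whose center point is closest to the fingertip.

    `characters` is a list of (char, x_min, x_max, y_min, y_max) tuples,
    mirroring the per-character position information described above;
    `fingertip` is an (x, y) coordinate.  The tuple layout is an
    illustrative assumption, not the engine's actual output format.
    """
    def distance(box):
        _, x_min, x_max, y_min, y_max = box
        # center point of the character's bounding box
        cx, cy = (x_min + x_max) / 2, (y_min + y_max) / 2
        return math.hypot(cx - fingertip[0], cy - fingertip[1])

    return min(characters, key=distance)[0]
```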
S103, determining the click-to-read content according to the character pointed by the indicator.
It should be understood that a character refers to a word-like unit or symbol, and that the characters valid during click-to-read are typically Chinese characters or English characters, although Korean, Japanese, Arabic and other characters are also possible. For non-alphabetic characters such as Chinese characters, the character pointed to by the indicator is itself a complete word, and the click-to-read content can be determined directly from that word. Taking a Chinese character as an example, if the character pointed to by the indicator is "king", content related to the "king" character, such as strokes, definitions and pronunciation, can be used as the click-to-read content.
For characters in alphabetic form, a single character does not form a complete word, so the word head is searched forward from the character and the word tail is searched backward, the vocabulary word containing the character is determined, and the click-to-read content is then determined according to that word. Taking English characters as an example, if the character pointed to by the indicator is the letter "i", searching forward from "i" finds the letter "n" preceded by a space character, so "n" is determined to be the word head; searching backward from "i" finds the letters "c" and "e" in succession, with a space character after "e", so the word containing the pointed character is determined to be "nice", and content related to "nice", such as pronunciation, definitions and example sentences, is used as the click-to-read content.
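The forward/backward boundary search for alphabetic characters can be sketched as follows (a simplified Python illustration that treats only spaces as word boundaries; the function and variable names are assumptions):

```python
def word_at(line, index):
    """Expand from the character at `index` to the enclosing word.

    If the pointed character is a space, fall back to an adjacent
    non-space character first, as described for the space-character case.
    """
    if line[index] == " ":
        if index + 1 < len(line) and line[index + 1] != " ":
            index += 1
        else:
            index -= 1
    start = index
    while start > 0 and line[start - 1] != " ":          # search toward the word head
        start -= 1
    end = index
    while end + 1 < len(line) and line[end + 1] != " ":  # search toward the word tail
        end += 1
    return line[start:end + 1]
```

For example, pointing at the "i" in "a nice day" yields the whole word "nice".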
Further, if the character pointed by the indicator is a space character, the method for determining the content to be read according to the non-space character adjacent to the space character is similar to the above example, and will not be described here again.
According to the method for identifying the click-to-read content, disclosed by the embodiment of the application, the image of the click-to-read material is obtained, the characters in the image and the position information of the indicator are identified through the image identification method, the characters pointed by the indicator are determined by utilizing the characters and the position information of the indicator, and finally the click-to-read content is determined according to the characters pointed by the indicator, so that the click-to-read range is not limited by a click-to-read pen and a specific click-to-read teaching material, the ordinary click-to-read of the teaching material, data and other reading materials can be supported, great convenience is provided, the answer can be quickly obtained for the rare content encountered by a user in learning, and the learning efficiency of the user is effectively improved.
It should be noted that the recognition accuracy of an OCR recognition engine is often positively correlated with its cost: the higher the recognition accuracy, the higher the cost. Therefore, on the basis of the above embodiment, which gives an example of accurately recognizing the position information of each character by the OCR recognition engine, as an alternative embodiment the application further provides a method for determining the character pointed to by the indicator using an OCR recognition engine with lower recognition accuracy. Specifically, recognizing the position information of the characters and the indicator in the image and determining the character pointed to by the indicator according to that position information includes:
S201, recognizing the image through an OCR recognition engine to obtain a character line and a position of the character line in the image; the position information of the pointer in the image is identified by a fingertip detection method.
In the embodiment of the application, the recognition accuracy of the OCR engine is low, and although characters in an image can be recognized, the position of each character cannot be precisely known, and only the position of each character line can be obtained. The position of each text line may be characterized by a maximum abscissa, a minimum abscissa, a maximum ordinate, and a minimum ordinate of the text line.
Considering that when clicking a user generally points the indicator below the text rather than directly at the text, in order to determine the target text line more accurately, the embodiment of the present application further determines the target text line by computing a weighted distance on the basis of the distance between the text line and the indicator, specifically: obtaining a positional relationship coefficient of the text line according to the relative position of the indicator with respect to the bottom edge of the text line;
if the indicator is located above the bottom edge of the text line, the positional relationship coefficient is a positive number, and if the indicator is located below the bottom edge of the text line, the positional relationship coefficient is 0 or a negative number.
Carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator; and taking the character row with the smallest weighted distance as a target character row.
The calculation formula for defining the weighted distance between the text line and the indicator is as follows:
Fi=a*Li+b*Mi
wherein Fi represents the weighted distance between the i-th text line and the indicator, a and b are the first weight and the second weight respectively, Li represents the vertical distance from the indicator to the bottom edge of the i-th text line, and Mi represents the positional relationship coefficient of the i-th text line.
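Under the usual image coordinate convention that the ordinate grows downward, the weighted-distance selection Fi=a*Li+b*Mi can be sketched as follows (the weight values, and the concrete coefficient values of 1 above the bottom edge and 0 below it, are illustrative assumptions):

```python
def target_text_line(lines, fingertip, a=1.0, b=5.0):
    """Pick the text line minimizing Fi = a*Li + b*Mi.

    Each line is (text, x_min, x_max, y_min, y_max); the bottom edge of
    a line is its maximum ordinate y_max.  Mi is positive (here 1) when
    the fingertip lies above the bottom edge and 0 when it lies below,
    so a fingertip just below a line is preferred over one overlapping
    the line above it.
    """
    fy = fingertip[1]

    def weighted(box):
        _, _, _, _, y_max = box
        L = abs(fy - y_max)           # vertical distance to the bottom edge
        M = 1 if fy < y_max else 0    # positional relationship coefficient
        return a * L + b * M

    return min(lines, key=weighted)[0]
```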
The process of identifying the position information of the indicator in the image by the fingertip detection method is described in the above embodiment, and will not be described again.
S202, according to the text line and the position information of the indicator, obtaining the text line closest to the indicator as a target text line.
Specifically, the bottom edge of a text line can be obtained by connecting the pixel points at the maximum and minimum abscissas along the lower boundary of the text line; the vertical distance between the ordinate of the indicator and the bottom edge of each text line is then calculated, and the text line with the minimum vertical distance is taken as the target text line.
S203, determining the pixel width occupied by a single character in the target text line, and combining the position information of the target text line and the indicator to obtain the character pointed by the indicator.
Specifically, after the target text line is determined, the pixel width occupied by the target text line is obtained according to the position information of the target text line, and the pixel width occupied by a single character in the target text line can be determined by combining the number of characters in the target text line identified by the OCR engine.
For example, if the maximum abscissa of the target character line is 100 and the minimum abscissa is 10, the width of the target character line is 90, and if the number of characters in the target character line is 30, the pixel width occupied by a single character is 3.
Further, the distance between the indicator and the left end of the target text line can be determined according to the position information of the target text line and the indicator. It will be appreciated that the value of the abscissa increases gradually from left to right, and that the distance of the indicator from the left end of the target text line can be determined by subtracting the smallest abscissa of the target text line from the abscissa of the indicator. For example, the abscissa of the indicator is 60, and the abscissa of the target text line is 10, then the distance between the indicator and the left end of the target text line is 50.
The ordinal of the character pointed to by the indicator in the target text line can be obtained by dividing the distance between the indicator and the left end of the target text line by the pixel width occupied by a single character. For example, if the distance between the indicator and the left end of the target text line is 50 and the pixel width occupied by a single character is 3, then 50 divided by 3 gives a quotient of 16 with a remainder of 2, which means the character pointed to by the indicator is the 17th in the target text line. After the ordinal of the pointed character is obtained, the character can be determined from the target text line according to that ordinal, i.e. the 17th character in the target text line is taken as the character pointed to by the indicator.
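The pixel-width arithmetic above (width 90 / 30 characters = 3 pixels per character; offset 50 → quotient 16, remainder 2 → the 17th character) can be sketched in Python; the function name and argument layout are illustrative:

```python
def pointed_character(line_text, x_min, x_max, fingertip_x):
    """Locate the pointed character in the target text line.

    Assumes a roughly uniform character width, i.e. the pixel width of
    the line divided by its character count.
    """
    char_width = (x_max - x_min) / len(line_text)
    index = int((fingertip_x - x_min) // char_width)   # zero-based position
    index = min(max(index, 0), len(line_text) - 1)     # clamp inside the line
    return index + 1, line_text[index]                 # (1-based ordinal, character)
```

With the numbers from the example (x_min = 10, x_max = 100, 30 characters, indicator abscissa 60) this returns ordinal 17.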
Because the embodiment of the application recognizes the click-to-read content by means of image recognition, in order to verify the influence of ambient light luminosity on recognition accuracy, the embodiment of the application carried out statistics of recognition accuracy at different times under different luminosities; the statistical results are shown in Table 1:
table 1 identification accuracy statistics table
As can be seen from Table 1, the accuracy of the method for identifying click-to-read content provided by the embodiment of the application has no obvious correlation with luminosity, and ideal accuracy can be ensured even under relatively low luminosity (320±30 Lux); therefore, the method has no strict luminosity requirements and can meet the click-to-read needs of general users.
Further, the embodiment of the application was also compared with multiple existing click-to-read products in terms of accuracy under different luminosities; the comparison results are shown in Table 2.
Table 2 Statistical table of recognition results of the present application and competing products
As can be seen from Table 2, the difference in recognition accuracy of the present application between natural light and desk-lamp light is very small, and the recognition accuracy under lower-luminosity desk-lamp light is even slightly better than that under natural light, whereas the 3 competing products either show large differences in accuracy under different luminosities or show obviously lower accuracy under a certain luminosity. Therefore, compared with the prior art, the embodiment of the application achieves a better combination of accuracy and stability under different luminosities.
Further, the embodiment of the application is also compared with the existing multiple point-reading products in terms of recognition speed, and the comparison result is shown in Table 3.
Table 3 Recognition speed comparison table

    Product                     Time consumed (ms)
    The present application     947.5
    Step-by-step High           1415
    Alx egg                     1312.5
    L*ka                        2777.5
As can be seen from Table 3, the recognition speed of the embodiment of the present application is significantly higher than that of the prior art.
Fig. 6 is an interaction schematic diagram of a system for identifying click-to-read content. As shown in Fig. 6, the system for identifying click-to-read content of this embodiment includes a terminal, an access layer and an algorithm layer. Specifically,
the terminal, in response to the user's click-to-read operation on the reading material, acquires an image of the reading material and then sends a click-to-read identification request to the access layer through the HTTP protocol (HyperText Transfer Protocol), wherein the click-to-read identification request includes the image of the reading material and the account information of the user.
The access layer refers to the part of the network directly facing user connection or access; it connects to users using transmission media such as optical fiber, twisted pair, coaxial cable and wireless access technologies, and optionally consists of wireless network cards, access points and switches.
After receiving the click-to-read identification request, the access layer first verifies the account information carried in the request, in order to intercept possible malicious attacks and to prevent the workload of the subsequent algorithm layer from exceeding its computing capacity, helping the embodiment of the application remain stable over the long term. The data is then checked, specifically the format, clarity, size and the like of the image, and only after the data passes the check is the click-to-read identification request sent to the algorithm layer in the Taf framework for processing. Taf is a high-performance RPC framework for the background logic layer which currently supports the three languages C++, Java and Node; it integrates scalable protocol encoding/decoding, a high-performance RPC communication framework, name routing and discovery, release monitoring, log statistics, configuration management and other functions, allowing stable and reliable distributed applications to be built quickly in a micro-service style with complete and effective service governance.
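The access-layer checks described above can be sketched as follows (a minimal Python illustration; the field names, accepted formats and size limit are all assumptions, not values from the patent):

```python
def validate_request(request, max_bytes=5 * 1024 * 1024):
    """Account check first (to intercept malicious traffic), then data
    checks on the image's format and size before the request is
    forwarded to the algorithm layer."""
    if not request.get("account"):
        return False, "invalid account"
    image = request.get("image")
    if not image:
        return False, "missing image"
    if image.get("format") not in {"jpeg", "png"}:
        return False, "unsupported format"
    if image.get("size_bytes", 0) > max_bytes:
        return False, "image too large"
    return True, "ok"
```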
The algorithm layer firstly recognizes the image through an OCR recognition engine to obtain a character line and a position of the character line in the image; identifying position information of an indicator in the image by a fingertip detection method;
then determines the target text line, which includes: for any text line, calculating the distance from the indicator to the bottom edge of the text line according to the position information of the text line and the indicator; obtaining the positional relationship coefficient of the text line according to the relative position of the indicator with respect to the bottom edge of the text line; carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the text line and the positional relationship coefficient of the text line to obtain the weighted distance between the text line and the indicator; and taking the text line with the smallest weighted distance as the target text line.
After determining the target text line, analyzing the text line, and determining the pixel width of a single character in the text line, wherein the method specifically comprises the steps of obtaining the pixel width occupied by the target text line according to the position information of the target text line; and obtaining the pixel width occupied by a single character in the target text line according to the quotient of the pixel width occupied by the target text line and the number of the characters in the target text line.
After the pixel width occupied by a single character is acquired, the character pointed to by the indicator is obtained by combining the position information of the target text line and the indicator. Specifically: determining the distance between the indicator and the left end of the target text line according to the position information of the target text line and the indicator; obtaining the ordinal of the pointed character in the target text line according to the quotient of that distance and the pixel width occupied by a single character in the target text line; and determining the character pointed to by the indicator from the target text line according to the ordinal. If the pointed character is a Chinese character, the click-to-read content is determined according to the Chinese character; if the pointed character is an English character, the vocabulary word containing the English character is determined, and the click-to-read content is determined according to that word.
And the algorithm layer returns the click-to-read content to the terminal for display through the access layer.
The embodiment of the application provides a device for identifying content to be read, as shown in fig. 7, the device for identifying the content to be read may include: an image acquisition module 301, a pointing character determination module 302, and a click-to-read content determination module 303, wherein,
the image acquisition module 301 is configured to acquire an image of a read-through reading material.
The image in the embodiment of the present application may be captured in real time by a device with an image-capture function, retrieved from local storage, or received as a picture sent by another device or a storage device, which is not specifically limited herein.
The image in the embodiment of the application not only contains the characters to be recognized but also records the indicator used when the user clicks; that is, unlike the prior art, the embodiment of the application does not need a click-to-read pen, and click-to-read can be realized with only an indicator, greatly reducing the hardware requirements of click-to-read. It will be appreciated that in addition to a finger as the indicator, the user may also click with a conventional pen, such as a pencil, brush or pen, without limitation here.
The pointing character determining module 302 is configured to identify the position information of the character and the indicator in the image, and determine the character pointed by the indicator according to the position information of the character and the indicator.
OCR (Optical Character Recognition) refers to the process by which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text using a character recognition method. That is, for printed characters, the characters in a paper document are optically converted into an image file of a black-and-white lattice, and recognition software converts the characters in the image into a text format. The embodiment of the application can recognize the image through a preset OCR recognition engine to obtain the position information of each character in the image. The position information of a character can be characterized by the maximum abscissa, the minimum abscissa, the maximum ordinate and the minimum ordinate of the pixel points of the character in the image.
Specifically, the embodiment of the application can recognize characters through the Tencent OCR character recognition engine, which supports recognition of all characters in an image under any layout, covering more than ten languages and symbol sets such as Chinese, English, letters, numerals, Japanese and Korean, and is more accurate and faster when applied to the click-to-read scenarios of young children and primary and secondary school students.
The embodiment of the application can adopt fingertip detection to identify the indicator and its position information in the image: the fingertip of the indicator is detected, and the coordinates of the fingertip are taken as the position information of the indicator. Fingertip recognition is a special kind of hand key point recognition and refers to a technology for locating the fingertip of the index finger in gestures such as an extended index finger. Fingertip detection can adopt the gesture recognition (Gesture Recognition, GR) functions of Tencent cloud recognition, a human-computer interaction technology developed by the Tencent audio-video laboratory that includes static gesture recognition, key point recognition, fingertip recognition, gesture action recognition and other functions. Tencent cloud recognition supports detecting hands in any picture, and if the gesture is of the extended-index-finger type, the position of the fingertip is returned, the position being represented by an abscissa and an ordinate.
After determining the position information of the characters and the indicator, the character closest to the indicator may be searched for as the character pointed to by the indicator. For example, the coordinates of the center point of each character may be determined from the coordinates of the character, the distance between the center point of the character and the fingertip may then be calculated, and the character with the smallest distance is taken as the character pointed to by the indicator.
And the click-to-read content determining module 303 is configured to determine click-to-read content according to the character pointed by the pointer.
It should be understood that a character refers to a word-like unit or symbol, and that the characters valid during click-to-read are typically Chinese characters or English characters, although Korean, Japanese, Arabic and other characters are also possible. For non-alphabetic characters such as Chinese characters, the character pointed to by the indicator is itself a complete word, and the click-to-read content can be determined directly from that word. Taking a Chinese character as an example, if the character pointed to by the indicator is "king", content related to the "king" character, such as strokes, definitions and pronunciation, can be used as the click-to-read content.
For characters in alphabetic form, a single character does not form a complete word, so the word head is searched forward from the character and the word tail is searched backward, the vocabulary word containing the character is determined, and the click-to-read content is then determined according to that word. Taking English characters as an example, if the character pointed to by the indicator is the letter "i", searching forward from "i" finds the letter "n" preceded by a space character, so "n" is determined to be the word head; searching backward from "i" finds the letters "c" and "e" in succession, with a space character after "e", so the word containing the pointed character is determined to be "nice", and content related to "nice", such as pronunciation, definitions and example sentences, is used as the click-to-read content.
Further, if the character pointed by the indicator is a space character, the method for determining the content to be read according to the non-space character adjacent to the space character is similar to the above example, and will not be described here again.
The device for identifying click-to-read content provided in the embodiment of the present invention specifically executes the flow of the above method embodiment; for details, please refer to the above embodiment of the method for identifying click-to-read content, which is not described again here. According to the device for identifying click-to-read content provided by the embodiment of the present invention, the image of the click-to-read material is acquired, the characters in the image and the position information of the indicator are identified through an image recognition method, the character pointed to by the indicator is determined using the position information of the characters and the indicator, and finally the click-to-read content is determined according to the character pointed to by the indicator, so that the click-to-read range is not limited by a click-to-read pen and specific click-to-read teaching materials; click-to-read of ordinary teaching materials, reference data and other reading matter can be supported, which is very convenient, answers can be quickly obtained for unfamiliar content a user encounters in learning, and the learning efficiency of the user is effectively improved.
In one possible implementation, the pointing character determining module includes:
The character position determining sub-module is used for identifying the image through the OCR engine to obtain the position information of the characters in the image; identifying position information of an indicator in the image by a fingertip detection method;
and the nearest distance calculating sub-module is used for calculating the character closest to the indicator according to the character and the position information of the indicator, and the character is used as the character pointed by the indicator.
It should be noted that the recognition accuracy of an OCR recognition engine is often positively correlated with its cost: the higher the recognition accuracy, the higher the cost. Thus, on the basis of the above embodiment, which gives an example of accurately recognizing the position information of each character by the OCR recognition engine, as an alternative embodiment the pointing character determining module includes:
the character line position determining sub-module is used for identifying the image through the OCR engine to obtain character lines in the image and the positions of the character lines; the position information of the pointer in the image is identified by a fingertip detection method.
In the embodiment of the application, the recognition accuracy of the OCR engine is low, and although characters in an image can be recognized, the position of each character cannot be precisely known, and only the position of each character line can be obtained. The position of each text line may be characterized by a maximum abscissa, a minimum abscissa, a maximum ordinate, and a minimum ordinate of the text line.
Considering that when clicking a user generally points the indicator below the text rather than directly at the text, in order to determine the target text line more accurately, the embodiment of the present application further determines the target text line by computing a weighted distance on the basis of the distance between the text line and the indicator, specifically: obtaining a positional relationship coefficient of the text line according to the relative position of the indicator with respect to the bottom edge of the text line;
if the indicator is located above the bottom edge of the text line, the positional relationship coefficient is a positive number, and if the indicator is located below the bottom edge of the text line, the positional relationship coefficient is 0 or a negative number.
Carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator; and taking the character row with the smallest weighted distance as a target character row.
The calculation formula for defining the weighted distance between the text line and the indicator is as follows:
Fi=a*Li+b*Mi
wherein Fi represents the weighted distance between the i-th text line and the indicator, a and b are the first weight and the second weight respectively, Li represents the vertical distance from the indicator to the bottom edge of the i-th text line, and Mi represents the positional relationship coefficient of the i-th text line.
And the target text line determining sub-module is used for obtaining the text line closest to the indicator as the target text line according to the text line and the position information of the indicator.
Specifically, the bottom edge of a text line can be obtained by connecting the pixel points at the maximum and minimum abscissas along the lower boundary of the text line; the vertical distance between the ordinate of the indicator and the bottom edge of each text line is then calculated, and the text line with the minimum vertical distance is taken as the target text line.
And the width combining sub-module is used for determining the pixel width occupied by a single character in the target text line and combining the position information of the target text line and the indicator to obtain the character pointed by the indicator.
Specifically, after the target text line is determined, the pixel width occupied by the target text line is obtained according to the position information of the target text line, and the pixel width occupied by a single character in the target text line can be determined by combining the number of characters in the target text line identified by the OCR engine.
In one possible implementation, the target text line determination submodule includes:
the distance determining unit is used for calculating the distance from the indicator to the bottom edge of the character line according to the position information of the character line and the indicator for any character line;
The relation coefficient determining unit is used for obtaining the position relation coefficient of the character line according to the relative position relation between the indicator and the bottom edge of the character line;
the weighted summation unit is used for carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and the target character line determining unit is used for taking the character line with the smallest weighted distance as the target character line.
In one possible implementation, the width combining sub-module further includes a character width determining unit for determining a pixel width occupied by a single character in the target text line, where the character width determining unit includes:
and the text line width calculation subunit is used for obtaining the pixel width occupied by the target text line according to the position information of the target text line.
And the character width calculating subunit is used for obtaining the pixel width occupied by a single character in the target character row according to the quotient of the pixel width occupied by the target character row and the number of characters in the target character row.
For example, if the maximum abscissa of the target character line is 100 and the minimum abscissa is 10, the width of the target character line is 90, and if the number of characters in the target character line is 30, the pixel width occupied by a single character is 3.
In one possible implementation, the width combining sub-module further includes a pointing character determining unit for obtaining the character pointed to by the indicator by combining the position information of the target text line and the indicator, and the pointing character determining unit includes:
a distance determining subunit, used for determining the distance between the indicator and the left end of the target text line according to the position information of the target text line and the indicator.
Specifically, since the abscissa increases gradually from left to right, the distance from the indicator to the left end of the target text line can be obtained by subtracting the minimum abscissa of the target text line from the abscissa of the indicator. For example, if the abscissa of the indicator is 60 and the minimum abscissa of the target text line is 10, the distance between the indicator and the left end of the target text line is 50.
The ordinal determining subunit is used for obtaining the ordinal position, within the target text line, of the character pointed to by the indicator according to the quotient of the distance between the indicator and the left end of the target text line and the pixel width occupied by a single character in the target text line;
and the ordinal character subunit is used for determining the character pointed to by the indicator from the target text line according to that ordinal position.
The ordinal position of the pointed character is obtained by dividing the distance from the indicator to the left end of the target text line by the pixel width of a single character. For example, if the distance is 50 and a single character occupies 3 pixels, then 50 divided by 3 gives 16 with a remainder of 2, so the pointed character is the 17th character in the target text line. Once this ordinal position is known, the pointed character can be read directly from the line, i.e. the 17th word of the target text line is taken as the character pointed to by the indicator.
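Putting the distance and ordinal steps together, a minimal sketch (names are illustrative assumptions, not from the patent):

```python
def pointed_character(line_text, line_min_x, char_width, indicator_x):
    """Return (0-based index, character) that the indicator points to.

    line_min_x  : minimum abscissa (left end) of the target text line
    char_width  : pixel width occupied by a single character
    indicator_x : abscissa of the indicator (e.g. the fingertip)
    """
    distance = indicator_x - line_min_x     # distance to the line's left end
    index = int(distance // char_width)     # integer quotient -> 0-based slot
    return index, line_text[index]

# Worked example from the text: indicator at x=60, line starts at x=10,
# single-character width 3: 50 // 3 = 16, i.e. the 17th character (index 16).
```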
An embodiment of the application provides an electronic device comprising a memory and a processor, with at least one program stored in the memory for execution by the processor. When executed by the processor, the program performs the following: acquiring an image of the reading material; identifying the characters in the image and the position information of the indicator by an image recognition method; determining the character pointed to by the indicator from the characters and the indicator's position information; and finally determining the click-to-read content from that character. As a result, the click-to-read range is not limited by a reading pen or specific click-to-read teaching materials; ordinary teaching materials, reference books and other reading materials are all supported, which is highly convenient, allows users to quickly obtain explanations of unfamiliar content encountered while studying, and effectively improves their learning efficiency.
In an alternative embodiment, an electronic device is provided. As shown in fig. 8, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may also include a transceiver 4004. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.
The processor 4001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functionality, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 4002 may include a path for transferring information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like, and can be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
The memory 4003 may be, but is not limited to, a ROM (Read-Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read-Only Memory), a CD-ROM (Compact Disc Read-Only Memory) or other optical disc storage (including compact discs, laser discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store the desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing the application program code that executes the solution of the present application, and its execution is controlled by the processor 4001. The processor 4001 is configured to execute the application program code stored in the memory 4003, thereby implementing what is shown in the foregoing method embodiments.
Embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon which, when run on a computer, causes the computer to perform the corresponding method embodiments described above. Compared with the prior art, an image of the reading material is acquired; the characters in the image and the position information of the indicator are identified by an image recognition method; the character pointed to by the indicator is determined from the characters and the indicator's position information; and finally the click-to-read content is determined from that character. The click-to-read range is therefore not limited by a reading pen or specific click-to-read teaching materials; ordinary teaching materials, reference books and other reading materials are all supported, which is highly convenient, allows users to quickly obtain explanations of unfamiliar content encountered while studying, and effectively improves their learning efficiency.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or in alternation with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art may make various modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are also intended to fall within the scope of the present invention.

Claims (8)

1. A method of identifying click-to-read content, comprising:
acquiring an image of a read-on reading material;
identifying the position information of the characters and the indicators in the image, and determining the characters pointed by the indicators according to the position information of the characters and the indicators;
determining the click-to-read content according to the character pointed by the indicator;
the method for identifying the position information of the character and the indicator in the image, determining the character pointed by the indicator according to the position information of the character and the indicator, comprises the following steps:
recognizing the image through an OCR recognition engine to obtain a character line in the image and the position of the character line; identifying position information of an indicator in the image by a fingertip detection method;
obtaining the character line nearest to the indicator according to the position information of the character lines and the indicator, and taking it as a target character line;
determining the pixel width occupied by a single character in a target character line, and combining the position information of the target character line and the indicator to obtain the character pointed by the indicator;
The step of obtaining the character pointed by the indicator by combining the target text line and the position information of the indicator comprises the following steps:
determining the distance between the indicator and the left end of the target text line according to the position information of the target text line and the indicator;
obtaining the ordinal position of the character pointed to by the indicator in the target text line according to the quotient of the distance between the indicator and the left end of the target text line and the pixel width occupied by a single character in the target text line;
and determining the character pointed to by the indicator from the target text line according to the ordinal position.
2. The method of claim 1, wherein identifying the location information of the character and the pointer in the image, determining the character pointed by the pointer based on the location information of the character and the pointer, comprises:
the image is recognized by an OCR recognition engine, and the position information of the characters in the image is obtained; identifying position information of an indicator in the image by a fingertip detection method;
and calculating the character closest to the indicator according to the character and the position information of the indicator, and taking the character closest to the indicator as the character pointed by the indicator.
3. The method for recognizing click-to-read contents according to claim 1, wherein the step of obtaining a character line closest to the pointer as a target character line based on the character line and the positional information of the pointer comprises:
for any character line, calculating the distance from the indicator to the bottom edge of the character line according to the position information of the character line and the indicator;
obtaining a position relation coefficient of the character line according to the relative position relation between the indicator and the bottom edge of the character line;
carrying out weighted summation on the vertical distance from the indicator to the bottom edge of the character line and the position relation coefficient of the character line to obtain the weighted distance between the character line and the indicator;
and taking the text line with the smallest weighted distance as the target text line.
4. The method for recognizing click-through content according to claim 1, wherein determining a pixel width occupied by a single character in the target text line comprises:
acquiring the pixel width occupied by the target text line according to the position information of the target text line;
and obtaining the pixel width occupied by a single character in the target text line according to the quotient of the pixel width occupied by the target text line and the number of the characters in the target text line.
5. The method of any one of claims 1-4, wherein determining the click-to-read content from the character pointed to by the pointer comprises:
if the character pointed by the indicator is a Chinese character, determining the point reading content according to the Chinese character;
if the character pointed by the indicator is an English character, determining the vocabulary where the English character is located, and determining the click reading content according to the vocabulary.
6. An apparatus for identifying click-to-read content, comprising:
the image acquisition module is used for acquiring an image of the read-on reading material;
the pointing character determining module is used for identifying the position information of the characters and the indicators in the image and determining the characters pointed by the indicators according to the position information of the characters and the indicators;
the click-to-read content determining module is used for determining click-to-read content according to the character pointed by the indicator;
the pointing character determining module is specifically configured to:
recognizing the image through an OCR recognition engine to obtain a character line in the image and the position of the character line; identifying position information of an indicator in the image by a fingertip detection method;
obtaining the character line nearest to the indicator according to the position information of the character lines and the indicator, and taking it as a target character line;
determining the pixel width occupied by a single character in a target character line, and combining the position information of the target text line and the indicator to obtain the character pointed to by the indicator;
the pointing character determining module obtains the character pointed by the indicator by combining the target text line and the position information of the indicator, and the pointing character determining module comprises:
determining the distance between the indicator and the left end of the target text line according to the position information of the target text line and the indicator;
obtaining the ordinal position of the character pointed to by the indicator in the target text line according to the quotient of the distance between the indicator and the left end of the target text line and the pixel width occupied by a single character in the target text line;
and determining the character pointed to by the indicator from the target text line according to the ordinal position.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor performs the steps of the method of identifying content for click-through as claimed in any one of claims 1 to 5 when the program is executed.
8. A computer-readable storage medium storing computer instructions that cause the computer to perform the steps of the method of identifying content for click-to-read as claimed in any one of claims 1 to 5.
CN202011104395.1A 2020-10-15 2020-10-15 Method, device, electronic equipment and computer storage medium for identifying click-to-read content Active CN114429632B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011104395.1A CN114429632B (en) 2020-10-15 2020-10-15 Method, device, electronic equipment and computer storage medium for identifying click-to-read content


Publications (2)

Publication Number Publication Date
CN114429632A CN114429632A (en) 2022-05-03
CN114429632B true CN114429632B (en) 2023-12-12

Family

ID=81310110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011104395.1A Active CN114429632B (en) 2020-10-15 2020-10-15 Method, device, electronic equipment and computer storage medium for identifying click-to-read content

Country Status (1)

Country Link
CN (1) CN114429632B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104157171A (en) * 2014-08-13 2014-11-19 三星电子(中国)研发中心 Point-reading system and method thereof
US9367736B1 (en) * 2015-09-01 2016-06-14 Amazon Technologies, Inc. Text detection using features associated with neighboring glyph pairs
US9378435B1 (en) * 2014-06-10 2016-06-28 David Prulhiere Image segmentation in optical character recognition using neural networks
CN107393356A (en) * 2017-04-07 2017-11-24 深圳市友悦机器人科技有限公司 Control method, control device and early learning machine
CN108596168A (en) * 2018-04-20 2018-09-28 北京京东金融科技控股有限公司 For identification in image character method, apparatus and medium
CN109325464A (en) * 2018-10-16 2019-02-12 上海翎腾智能科技有限公司 A kind of finger point reading character recognition method and interpretation method based on artificial intelligence
WO2020010547A1 (en) * 2018-07-11 2020-01-16 深圳前海达闼云端智能科技有限公司 Character identification method and apparatus, and storage medium and electronic device
CN111353501A (en) * 2020-02-25 2020-06-30 暗物智能科技(广州)有限公司 Book point-reading method and system based on deep learning
CN111414903A (en) * 2019-01-04 2020-07-14 阿里巴巴集团控股有限公司 Method, device and equipment for identifying content of indicator
CN111459443A (en) * 2019-01-21 2020-07-28 北京字节跳动网络技术有限公司 Character point-reading method, device, equipment and readable medium
CN111711757A (en) * 2020-06-29 2020-09-25 广东小天才科技有限公司 Test question shooting method and device capable of preventing finger from being blocked, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9990564B2 (en) * 2016-03-29 2018-06-05 Wipro Limited System and method for optical character recognition



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant