CN110647785B - Method and device for identifying accuracy of input text and electronic equipment - Google Patents


Info

Publication number
CN110647785B
Authority
CN
China
Prior art keywords
character
characters
recognized
input
error
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810675867.5A
Other languages
Chinese (zh)
Other versions
CN110647785A (en)
Inventor
蓝天才
张瑞强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Guangzhou Kingsoft Mobile Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd, Guangzhou Kingsoft Mobile Technology Co Ltd filed Critical Beijing Kingsoft Office Software Inc
Priority to CN201810675867.5A priority Critical patent/CN110647785B/en
Publication of CN110647785A publication Critical patent/CN110647785A/en
Application granted granted Critical
Publication of CN110647785B publication Critical patent/CN110647785B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/32Digital ink
    • G06V30/36Matching; Classification

Abstract

The embodiment of the invention provides a method and a device for identifying the accuracy of an input text and electronic equipment, wherein the method comprises the following steps: acquiring an ink image containing input text input by handwriting as an ink image to be recognized; identifying the ink image to be identified to obtain characters to be identified in the input text; matching the character to be recognized with a preset dictionary to obtain a matching result; based on the obtained matching result, it is determined whether there is an erroneous character within the character to be recognized. Therefore, in the embodiment of the invention, after the characters to be recognized in the input text are recognized from the ink image containing the input text of the handwriting input, the characters to be recognized are matched with the preset dictionary, and whether the characters to be recognized have error characters or not can be determined based on the matching result, so that the text input in the handwriting input mode is subjected to error recognition.

Description

Method and device for identifying accuracy of input text and electronic equipment
Technical Field
The invention relates to the technical field of information processing, in particular to a method and a device for identifying the accuracy of an input text and electronic equipment.
Background
With the development of touch-enabled electronic devices, handwriting has become a popular way to input text. When a user inputs text by handwriting, the user may write the text incorrectly, for example by writing wrong Chinese characters, English words, Japanese characters, or other words.
However, there is currently no method for performing error recognition on text input by handwriting, and how to perform such error recognition has become an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for identifying the accuracy of an input text and electronic equipment, so as to realize error identification of the text input by a handwriting input mode. The specific technical scheme is as follows:
in one aspect, an embodiment of the present invention provides a method for identifying accuracy of an input text, where the method includes:
acquiring an ink image containing input text input by handwriting as an ink image to be recognized;
identifying the ink image to be identified to obtain characters to be identified in the input text;
matching the character to be recognized with a preset dictionary to obtain a matching result;
and determining whether an error character exists in the character to be recognized or not based on the obtained matching result.
Optionally, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
identifying the ink image to be identified to obtain characters to be identified in the input text and position information of each character to be identified;
the method further comprises the following steps:
and when the error characters exist in the characters to be recognized, displaying prompt information aiming at the error characters in a display screen based on the position information of each error character.
Optionally, the step of outputting a prompt message for each error character in the display screen based on the position information of the error character includes:
and displaying a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
Optionally, the step of outputting a prompt message for each error character in the display screen based on the position information of the error character includes:
and highlighting the position of each error character in the display screen based on the position information of the error character.
Optionally, after the step of outputting a prompt message for each error character in the display screen based on the position information of the error character, the method further includes:
obtaining a corrected character corresponding to each error character as a replacement character;
and displaying the replacement characters at a second position corresponding to the position of each error character in the display screen based on the position information of each error character.
Optionally, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
and identifying the ink image to be identified through an Optical Character Recognition (OCR) algorithm to obtain the character to be identified in the input text.
Optionally, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
identifying the ink image to be identified to obtain all characters in the input text;
performing word segmentation processing on all the recognized characters to obtain at least one group of word groups;
screening out a group of phrases containing the last handwritten input character from the at least one group of phrases to be used as a phrase to be identified;
and taking all characters in the phrase to be recognized as characters to be recognized.
Optionally, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
identifying the ink image to be identified to obtain all characters in the input text;
determining the character input by handwriting finally from all the recognized characters;
and taking the character input by handwriting at the last time and a preset number of characters before the character input by handwriting at the last time in all the recognized characters as characters to be recognized.
In another aspect, an embodiment of the present invention provides an apparatus for identifying accuracy of an input text, where the apparatus includes:
the first obtaining module is used for obtaining an ink image containing input text of handwriting input as an ink image to be recognized;
the recognition module is used for recognizing the ink image to be recognized to obtain characters to be recognized in the input text;
the matching module is used for matching the character to be recognized with a preset dictionary to obtain a matching result;
and the determining module is used for determining whether an error character exists in the character to be recognized or not based on the obtained matching result.
Optionally, the recognition module is specifically configured for:
identifying the ink image to be identified to obtain characters to be identified in the input text and position information of each character to be identified;
the device further comprises:
and the first display module is used for outputting prompt information aiming at the error character in a display screen based on the position information of each error character when the error character exists in the character to be recognized.
Optionally, the first display module is specifically configured for:
displaying a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
Optionally, the first display module is specifically configured for:
highlighting the position of each error character in the display screen based on the position information of the error character.
Optionally, the apparatus further comprises:
a second obtaining module, configured to obtain a corrected character corresponding to each error character as a replacement character after outputting prompt information for the error character in a display screen based on the position information of each error character;
and the second display module is used for displaying the replacement character at a second position corresponding to the position of the error character in the display screen based on the position information of each error character.
Optionally, the recognition module is specifically configured for:
identifying the ink image to be identified through an Optical Character Recognition (OCR) algorithm to obtain the characters to be identified in the input text.
Optionally, the recognition module is specifically configured for:
identifying the ink image to be identified to obtain all characters in the input text;
performing word segmentation processing on all the recognized characters to obtain at least one group of word groups;
screening out a group of phrases containing the last handwritten input character from the at least one group of phrases to be used as a phrase to be identified;
and taking all characters in the phrase to be recognized as characters to be recognized.
Optionally, the recognition module is specifically configured for:
identifying the ink image to be identified to obtain all characters in the input text;
determining the character input by handwriting finally from all the recognized characters;
and taking the character input by handwriting at the last time and a preset number of characters before the character input by handwriting at the last time in all the recognized characters as characters to be recognized.
In another aspect, an embodiment of the present invention provides an electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
the processor is used for realizing any steps of the method for identifying the accuracy of the input text provided by the embodiment of the invention when the computer program stored on the memory is executed.
In another aspect, an embodiment of the present invention provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for identifying the accuracy of an input text provided by an embodiment of the present invention are implemented.
In the embodiment of the invention, an ink image containing input text of handwriting input is obtained and used as an ink image to be identified; identifying the ink image to be identified to obtain characters to be identified in the input text; matching the character to be recognized with a preset dictionary to obtain a matching result; based on the obtained matching result, it is determined whether there is an erroneous character within the character to be recognized. Therefore, in the embodiment of the invention, after the characters to be recognized in the input text are recognized and obtained from the ink image of the input text containing the handwriting input, the characters to be recognized are matched with the preset dictionary, whether the characters to be recognized have error characters can be determined based on the matching result, so that the error recognition of the text input in a handwriting input mode is realized, namely, the input text of the handwriting input is checked, and whether the error characters appear in the input text of the handwriting input is recognized. Of course, it is not necessary for any product or method to achieve all of the above-described advantages at the same time for practicing the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for identifying accuracy of an input text according to an embodiment of the present invention;
Fig. 2 is another schematic flowchart of a method for identifying the accuracy of an input text according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for recognizing accuracy of an input text according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for recognizing accuracy of an input text according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a method and a device for identifying the accuracy of an input text and electronic equipment, which are used for realizing the error identification of the text input by a handwriting input mode.
As shown in fig. 1, an embodiment of the present invention provides a method for identifying accuracy of an input text, which may include the following steps:
S101: acquiring an ink image containing input text input by handwriting as an ink image to be recognized;
It can be understood that the method for identifying the accuracy of the input text provided by the embodiment of the present invention can be applied to any electronic device with a touch screen, such as a touch screen computer, a touch screen mobile phone, a touch screen game machine, a touch screen electronic reader, and the like. In one case, the functional software implementing the method may exist as dedicated client software, or as a plug-in of any functional software that provides a handwriting input function; such functional software may be, for example, office software.
In one case, to make the accuracy recognition timely, the electronic device triggers the accuracy recognition process provided by the embodiment of the present invention when it detects that the user has finished handwriting a character. At that point, the electronic device obtains an ink image containing the handwritten input text as the ink image to be recognized, where the input text contained in the ink image to be recognized includes at least the last character the user finished handwriting.
In another case, the electronic device triggers the accuracy recognition process provided by the embodiment of the present invention when it detects that a preset error recognition function key is triggered. At that point, the electronic device obtains an ink image containing handwritten input text as the ink image to be recognized, where the input text contained in the ink image to be recognized may be text selected by the user.
In the embodiment of the present invention, the electronic device may provide an ink writing function for the user. After the ink writing function is turned on, the user may input text by handwriting on the touch screen of the electronic device; the electronic device may then obtain an image containing the text handwritten by the user as the ink image to be recognized, and execute the subsequent accuracy recognition process provided by the embodiment of the present invention. In the embodiment of the present invention, the text input by handwriting is referred to as the input text, and the image containing the handwritten input text is referred to as an ink image.
The input text may contain a plurality of characters, where a character may be the smallest unit of any language, for example a character or word in Chinese, English, Russian, Korean, etc.
In one implementation, the ink image to be recognized may contain all of the input text entered by the user in the current handwriting input process, part of the input text entered by the user in the current handwriting input process, or all of the handwritten input text displayed on the touch screen of the electronic device.
The electronic device may determine which input text the user entered in the current handwriting input process based on the time intervals between the handwritten characters. Specifically, when a user inputs text by handwriting, character by character, a time interval exists between consecutive characters; this interval may be the difference between the time at which the user finishes inputting one character and the time at which the user starts inputting the next character. The electronic device may determine each character input by the user based on the time interval between characters. For each handwritten character, the electronic device may record an input start time (the time at which the user starts inputting the character) and an input end time (the time at which the user finishes inputting the character), and may then determine the time interval between consecutive characters from these times. When the time interval between two characters is greater than a preset duration, the two characters are considered to have been input in two different handwriting input processes.
For example, the text A handwritten by the user includes characters a, b, and c, where a and b are input adjacently, b and c are input adjacently, and c is the character the user handwrote last. Let the first difference be the interval between the input end time of a and the input start time of b, and let the second difference be the interval between the input end time of b and the input start time of c. When the first difference is less than the preset duration and the second difference is not less than the preset duration, c is considered to have been input in a different handwriting input process from a and b. That is, c is the input text entered in the current handwriting input process, and a and b are input text entered in the previous handwriting input process.
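The grouping described above can be sketched as follows. This is only an illustrative sketch: the `TimedChar` type, the timestamps in seconds, and the `max_gap` parameter (standing in for the preset duration) are hypothetical names that do not appear in the patent, and the comparison follows the "not less than" wording of the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedChar:
    """A handwritten character with hypothetical start/end timestamps (seconds)."""
    text: str
    start: float
    end: float


def split_into_processes(chars: List[TimedChar], max_gap: float) -> List[List[TimedChar]]:
    """Group characters into handwriting input processes.

    A new process starts when the interval between the end of one character
    and the start of the next is not less than max_gap (the preset duration).
    """
    processes: List[List[TimedChar]] = []
    for ch in chars:
        if processes and ch.start - processes[-1][-1].end < max_gap:
            processes[-1].append(ch)   # same handwriting input process
        else:
            processes.append([ch])     # first character, or gap too long
    return processes


chars = [
    TimedChar("a", 0.0, 0.8),
    TimedChar("b", 1.0, 1.9),  # short gap after "a": same process
    TimedChar("c", 5.0, 5.7),  # long gap after "b": new process
]
processes = split_into_processes(chars, max_gap=2.0)
current_text = "".join(c.text for c in processes[-1])
```

With these assumed timestamps, a and b fall into the previous process and c alone forms the current one, matching the example.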
S102: identifying the ink image to be identified to obtain characters to be identified in the input text;
The characters to be recognized obtained from the input text may be one or more characters, and include at least the last character handwritten by the user. In the embodiment of the present invention, after the electronic device obtains the ink image to be recognized, it may perform character recognition on the ink image using any recognition algorithm capable of recognizing characters from an image, for example a structural pattern recognition algorithm, a statistical pattern recognition algorithm, and the like. The embodiment of the present invention does not limit the type of the recognition algorithm.
In an implementation manner, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text may include:
identifying the ink image to be identified through an Optical Character Recognition (OCR) algorithm to obtain the character to be identified in the input text.
It is to be understood that the OCR algorithm is an algorithm for recognizing optical characters by image processing and pattern recognition technology, which can recognize not only each character from an image but also the position of each character in the image. The electronic equipment can better recognize the characters in the input text contained in the ink image to be recognized by utilizing the OCR algorithm, and further determine the characters to be recognized from the characters.
S103: matching the character to be recognized with a preset dictionary to obtain a matching result;
S104: based on the obtained matching result, it is determined whether there is an erroneous character within the character to be recognized.
The preset dictionary may include each correct single character, short sentences and phrases conforming to grammar rules, and the like, and may further include the usage frequency of each character, of each short sentence, and of each phrase. A short sentence or phrase conforming to grammar rules may be a string of characters. The characters may be characters in Chinese, words in English, and the like.
In one implementation, the characters to be recognized may be matched one by one against each character, short sentence, and phrase conforming to grammar rules in the preset dictionary to obtain a matching result. When the preset dictionary contains a character, or a short sentence or phrase conforming to grammar rules, that is identical to the characters to be recognized, the matching result indicates that no erroneous character exists among the characters to be recognized. When the preset dictionary contains no such identical character, short sentence, or phrase, the matching result indicates that an erroneous character exists among the characters to be recognized.
Further, the electronic device may determine whether there is an erroneous character within the character to be recognized based on the obtained matching result. When the matching result is a result representing that no error character exists in the character to be recognized, namely no error character exists, the electronic equipment determines that no error character exists in the character to be recognized; and when the matching result is a result representing that an error character exists in the character to be recognized, namely the error character exists, the electronic equipment determines that the error character exists in the character to be recognized.
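A minimal sketch of the matching and determination steps, assuming the preset dictionary can be modelled as a set of strings and that matching means exact equality. The dictionary contents and the function names `match` and `has_error` are illustrative assumptions, not taken from the patent.

```python
from typing import Set

# Hypothetical preset dictionary: correct single words plus valid phrases.
PRESET_DICTIONARY: Set[str] = {"display", "input", "text", "display screen"}


def match(chars_to_recognize: str, dictionary: Set[str]) -> bool:
    """Matching result: True when an identical entry exists in the dictionary."""
    return chars_to_recognize in dictionary


def has_error(chars_to_recognize: str, dictionary: Set[str]) -> bool:
    """Determination step: an erroneous character exists when matching fails."""
    return not match(chars_to_recognize, dictionary)


result_ok = has_error("display", PRESET_DICTIONARY)
result_bad = has_error("dsiplay", PRESET_DICTIONARY)
```

A set lookup keeps each match at roughly constant cost; a real dictionary with usage frequencies would need a richer structure than this sketch uses.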
In one implementation, the short sentences and phrases conforming to grammar rules that are stored in the preset dictionary may contain different numbers of characters, and the preset dictionary may store its entries in categories based on the number of characters each entry contains. For example, single characters are stored in a storage area A of the preset dictionary to form a first sub-dictionary; short sentences and phrases that contain 2 characters and conform to grammar rules are stored in another storage area B to form a second sub-dictionary; and short sentences and phrases that contain 3 characters and conform to grammar rules are stored in another storage area C to form a third sub-dictionary, where the storage areas A, B and C are different storage areas of the preset dictionary.
Furthermore, after the electronic device obtains the character to be recognized, the number of characters in the character to be recognized may be determined first, and then the sub-dictionary to be matched is determined based on the number of characters in the character to be recognized, where the number of characters corresponding to the sub-dictionary to be matched may be the same as the number of characters in the character to be recognized, and may also be greater than or equal to the number of characters in the character to be recognized. Through the matching mode, the matching operation amount in the identification process can be reduced to a certain extent.
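The sub-dictionary lookup can be sketched as below. The entries and the function names are hypothetical, and the sketch covers only the case where the sub-dictionary's character count equals that of the characters to be recognized, not the "greater than or equal to" variant.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_sub_dictionaries(entries: Iterable[str]) -> Dict[int, Set[str]]:
    """Bucket dictionary entries by the number of characters they contain."""
    subs: Dict[int, Set[str]] = defaultdict(set)
    for entry in entries:
        subs[len(entry)].add(entry)
    return dict(subs)


def match_in_sub_dictionary(chars: str, subs: Dict[int, Set[str]]) -> bool:
    """Search only the sub-dictionary whose entry length equals len(chars)."""
    return chars in subs.get(len(chars), set())


# Hypothetical entries with 1, 2 and 3 characters.
subs = build_sub_dictionaries(["人", "你好", "计算机"])
found = match_in_sub_dictionary("你好", subs)
missing = match_in_sub_dictionary("你号", subs)
```

Selecting the bucket by length first means entries of other lengths are never compared, which is the reduction in matching work the paragraph above describes.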
In the embodiment of the invention, an ink image containing input text input by handwriting is obtained and used as an ink image to be identified; identifying the ink image to be identified to obtain characters to be identified in the input text; matching the character to be recognized with a preset dictionary to obtain a matching result; based on the obtained matching result, it is determined whether there is an erroneous character within the character to be recognized. Therefore, in the embodiment of the invention, after the characters to be recognized in the input text are recognized from the ink image containing the input text of the handwriting input, the characters to be recognized are matched with the preset dictionary, whether the characters to be recognized have error characters can be determined based on the matching result, so that the error recognition of the text input in the handwriting input mode is realized, that is, the input text of the handwriting input is checked, and whether the error characters appear in the input text of the handwriting input is recognized.
In one case, the embodiment of the invention can identify whether the single character has writing errors or has spelling errors. Specifically, the character to be recognized in the obtained input text may be a character, at this time, the character to be recognized may be matched with content (each character, a short sentence and a phrase that meet a grammar rule) in a preset dictionary, and when the character to be recognized does not exist in the preset dictionary, it may be determined that the character to be recognized is an error character, that is, a writing error or a spelling error occurs in the character to be recognized that is input by handwriting. For example, when the character is an english word, and the word handwritten by the user is "dsiplay", the word handwritten by the user does not exist in the preset dictionary, and at this time, the word is an error character.
In another case, the input character itself may be a valid character, yet be the wrong character in its context. For example, where the characters are Chinese characters and the correct phrase is "new sanction", a handwritten phrase that replaces one character of that phrase with a different valid character contains an erroneous character. To identify erroneous characters of this kind, it is necessary to analyze whether a character is erroneous according to the context in which it appears. In this case, there may be a plurality of characters to be recognized.
In an implementation manner, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text may include:
identifying the ink image to be identified to obtain all characters in the input text;
performing word segmentation processing on all the recognized characters to obtain at least one group of phrases;
screening out a group of phrases containing the last handwritten input character from at least one group of phrases to be used as the phrase to be identified;
and taking all characters in the phrase to be recognized as characters to be recognized.
In the embodiment of the invention, the electronic device may recognize the ink image to be recognized using a character recognition algorithm to obtain all characters of the input text in the ink image, then perform word segmentation on all the obtained characters using a preset word segmentation algorithm to obtain at least one group of phrases, and then take all characters in the group of phrases that contains the last handwritten character as the characters to be recognized. In this way, whether an erroneous character exists among the characters to be recognized can be determined based on the meaning of each character within its phrase. The preset word segmentation algorithm may be an algorithm based on natural language processing technology.
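The segmentation-then-selection steps can be sketched as follows. The patent does not name a segmentation algorithm; forward maximum matching is swapped in here purely for illustration, and the vocabulary, the `max_len` parameter, and the function names are assumptions.

```python
from typing import List, Set, Tuple


def segment(text: str, vocab: Set[str], max_len: int) -> List[Tuple[int, str]]:
    """Forward maximum matching; returns (start_index, phrase) pairs.

    At each position the longest vocabulary entry is taken; a single
    character is used as a fallback when nothing longer matches.
    """
    groups: List[Tuple[int, str]] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                groups.append((i, piece))
                i += size
                break
    return groups


def phrase_containing(groups: List[Tuple[int, str]], index: int) -> str:
    """Return the phrase that covers the character at the given index."""
    for start, phrase in groups:
        if start <= index < start + len(phrase):
            return phrase
    raise IndexError("index outside the segmented text")


all_chars = "我喜欢计算机"
groups = segment(all_chars, vocab={"喜欢", "计算机"}, max_len=3)
# Assumption: the last handwritten character is the final recognized character.
to_recognize = phrase_containing(groups, len(all_chars) - 1)
```

The index of the last handwritten character is passed explicitly because, in practice, it need not be the final character of the recognized text.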
In another implementation manner, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text may include:
identifying the ink image to be identified to obtain all characters in the input text;
determining the character which is input by handwriting finally from all the recognized characters;
and taking the character input by handwriting at the last time and a preset number of characters before the character input by handwriting at the last time in all the recognized characters as characters to be recognized.
The last handwritten input character is the character most recently written by the user.
The preset number may be set based on the number of characters in a short sentence or phrase that conforms to a grammar rule and is included in the preset dictionary. When a short sentence or phrase conforming to the grammar rule contains at most N characters, where N is a positive integer, the preset number may be any positive integer less than or equal to N.
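The window selection described above amounts to a bounded slice. The sketch below is illustrative only; the function name `chars_to_recognize` and its parameters are hypothetical, and the recognized characters are assumed to be available as a string in writing order.

```python
def chars_to_recognize(all_chars, last_index, preset_number):
    """Return the last handwritten character plus up to `preset_number`
    characters that precede it among all recognized characters."""
    if not 0 <= last_index < len(all_chars):
        raise IndexError("last_index must point at a recognized character")
    # Clamp at the start of the text when fewer characters are available.
    start = max(0, last_index - preset_number)
    return all_chars[start:last_index + 1]


print(chars_to_recognize("abcdef", last_index=5, preset_number=3))
```

When fewer than `preset_number` characters precede the last handwritten character, the sketch simply returns all of them.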
In one implementation, after an error character in the input text is identified, the user can be alerted that the written text contains an error character so that the error is easier to find. As shown in fig. 2, the method for identifying the accuracy of the input text provided by the embodiment of the present invention may include the following steps:
S201: acquiring an ink image containing input text input by handwriting as an ink image to be recognized;
S201 is the same as S101 shown in fig. 1, and is not described again.
S202: identifying the ink image to be identified to obtain characters to be identified in the input text and position information of each character to be identified;
In the embodiment of the invention, the electronic device recognizes the ink image to be recognized through a character recognition algorithm, such as an OCR algorithm, and obtains the characters to be recognized in the input text and the position information of each character to be recognized, so that any error character identified later can be located. In one implementation, the position information of a character to be recognized may be expressed as (x, y, h, k), where (x, y) is the position coordinate of the upper left corner point of the minimum rectangular region containing the character to be recognized, h is the height of that minimum rectangular region, and k is its width. In another implementation, the position information of a character to be recognized may be expressed as (x1, y1, x2, y2), where (x1, y1) is the position coordinate of the upper left corner point of the minimum rectangular region containing the character to be recognized, and (x2, y2) is the position coordinate of its lower right corner point.
S203: matching the character to be recognized with a preset dictionary to obtain a matching result;
S204: determining whether an error character exists in the character to be recognized based on the obtained matching result;
S203 is the same as S102 shown in fig. 1, and S204 is the same as S103 shown in fig. 1, and thus the description thereof is omitted.
S205: when it is determined that there are erroneous characters within the characters to be recognized, prompt information is output in the display screen for each erroneous character based on the position information of the erroneous character.
The display screen is the touch screen of the electronic device.
In the embodiment of the invention, in order to enable a user to better find that an error occurs in a text which is input by handwriting, when the electronic equipment determines that an error character exists in a character to be recognized, the position information of the error character can be determined, and then prompt information is output for the error character in a display screen based on the position information of the error character, so that the user can more clearly determine which character has the error. The method and the device can check and prompt the error of the text input by the user through handwriting, and improve the user experience.
In one implementation, the step of outputting a prompt message for each error character in the display screen based on the position information of the error character may include:
and displaying a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
In the embodiment of the present invention, one or more error characters may exist. When there are multiple error characters, the electronic device may display, for each error character, a preset prompt line at a first position corresponding to the position of that error character in the display screen based on its position information. The first position may be directly below, directly to the left of, or directly to the right of the position of the error character, and the preset prompt line may be a wavy line, a straight line, an arc line, or the like, in a single layer or in multiple layers. In one case, for better prompting, the color of the preset prompt line may differ from the color of the character to be recognized; for example, if the character to be recognized is black, the preset prompt line may be red.
The following takes the position directly below the error character as an example. When the position information of the character to be recognized is expressed as (x, y, h, k), the segment from position coordinate (x, y-h) to position coordinate (x + k, y-h) can be used as the position directly below the error character, namely the first position corresponding to the position of the error character. When the position information of the character to be recognized is expressed as (x1, y1, x2, y2), the segment from position coordinate (x1, y2) to position coordinate (x2, y2) can be used as the position directly below the error character, namely the first position corresponding to the position of the error character.
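The two calculations above can be sketched as follows. The function names are hypothetical. The sketch follows the formula in the text, in which the bottom edge of an (x, y, h, k) box lies at y-h; this implies a coordinate system whose y value decreases downward. On a screen whose y value grows downward the bottom edge would instead lie at y+h, so the sign is an assumption to check against the actual display.

```python
def underline_from_xyhk(x, y, h, k):
    """Endpoints of the prompt line directly below a box given as
    (x, y, h, k): upper-left corner (x, y), height h, width k.
    Follows the text's convention that the bottom edge is at y - h."""
    return (x, y - h), (x + k, y - h)


def underline_from_corners(x1, y1, x2, y2):
    """Endpoints of the prompt line directly below a box given by its
    upper-left corner (x1, y1) and lower-right corner (x2, y2)."""
    return (x1, y2), (x2, y2)


def xyhk_to_corners(x, y, h, k):
    """Convert (x, y, h, k) to (x1, y1, x2, y2) under the same convention."""
    return x, y, x + k, y - h


box = (10, 50, 20, 8)  # hypothetical character box
print(underline_from_xyhk(*box))
print(underline_from_corners(*xyhk_to_corners(*box)))
```

Both representations yield the same segment, so a display routine can accept either one.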
In another implementation manner, the step of outputting a prompt message for each error character in the display screen based on the position information of the error character may include:
and highlighting the position of each error character in the display screen based on the position information of the error character.
In the embodiment of the invention, the aim of reminding the user can be fulfilled by highlighting the position of the error character. The highlighting may be: a more striking color, such as red, yellow, blue, etc., is displayed in the display screen where the wrong character is located.
In one implementation, after the step of outputting a prompt message for each error character in the display screen based on the position information of the error character, the method may further include:
obtaining a corrected character corresponding to each error character as a replacement character;
and displaying the replacement characters at a second position corresponding to the position of each error character in the display screen based on the position information of the error character.
In the embodiment of the invention, in order to provide a more comprehensive service and improve the user experience, after the electronic device determines that the characters to be recognized contain error characters, it can determine the corrected characters corresponding to the error characters based on the preset dictionary and display them to the user as replacement characters, so that the user can pick out the correct characters that the user intended to write.
In one implementation, when the characters to be recognized consist of a single character and that character is an English word, the electronic device may calculate the edit distance between the character to be recognized and each single character in the preset dictionary, and select from the preset dictionary the M characters with the smallest edit distances as candidate characters. In one case, the candidate characters may be used directly as replacement characters; in another case, the preset dictionary may further include a usage probability for each character, and after the candidate characters are determined, the candidate with the highest usage probability may be selected as the replacement character. M is an integer greater than or equal to 1.
The process of screening out the M characters with the minimum editing distance may be: the characters in the preset dictionary are sorted in an ascending order based on the corresponding editing distance, and the characters at the first M positions in the sorting order are determined as the M characters with the minimum corresponding editing distance.
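The candidate selection described above can be sketched as follows. The text does not name a particular edit distance, so the sketch assumes the Levenshtein distance; the function names and the sample dictionary with its usage probabilities are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed with a rolling one-row table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def candidates(word, dictionary, m):
    """Sort dictionary entries by ascending edit distance; keep the first M."""
    ranked = sorted(dictionary, key=lambda entry: edit_distance(word, entry))
    return ranked[:m]


def replacement(word, usage_probability, m):
    """Among the M nearest entries, pick the one with the highest usage probability."""
    top = candidates(word, usage_probability, m)
    return max(top, key=usage_probability.get)


# Hypothetical preset dictionary mapping each word to a usage probability.
usage_probability = {"hello": 0.6, "help": 0.3, "held": 0.05, "world": 0.05}
print(candidates("helo", usage_probability, 2))
print(replacement("helo", usage_probability, 2))
```

Because Python's sort is stable, entries with equal edit distance keep their dictionary order; a real system would need an explicit tie-breaking rule.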
In another implementation, when the characters to be recognized consist of a single character and that character is a Chinese character, the electronic device may use any suitable character recognition algorithm to determine, from the preset dictionary, a character similar to the character to be recognized as the replacement character.
In another implementation, when the characters to be recognized include a plurality of characters, the preset dictionary may be searched for the short sentence or phrase that follows the same grammar rule as the characters to be recognized, and the characters in it that correspond to the error characters are taken as the replacement characters. For example, when the characters handwritten by the user are "new sanction" and the corresponding phrase in the preset dictionary has "heart" in place of "new", the "new" in the handwritten characters is determined to be an error character, and the electronic device can take "heart" as the replacement character for the error character "new".
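A position-by-position comparison is one way to obtain such replacement characters. The sketch below is an assumption-laden illustration: it treats "the same grammar rule" as simply "a dictionary phrase of the same length with the fewest differing positions", which the text does not state, and the function name and sample phrases are hypothetical rather than taken from the example above.

```python
def find_replacements(chars, phrases):
    """Return (index, written, correct) triples against the closest
    same-length dictionary phrase, or an empty list if none differs."""
    best = None
    for phrase in phrases:
        if len(phrase) != len(chars):
            continue
        diffs = [(i, written, correct)
                 for i, (written, correct) in enumerate(zip(chars, phrase))
                 if written != correct]
        # Keep the phrase with the fewest mismatching positions.
        if best is None or len(diffs) < len(best):
            best = diffs
    return best or []


# Hypothetical dictionary phrases and handwritten input.
phrases = ["一心一意", "三心二意"]
print(find_replacements("一新一意", phrases))
```

Each triple gives the index of an error character, the character as written, and the dictionary character that could be shown as its replacement.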
Furthermore, after the electronic device determines the replacement character corresponding to each error character, it may determine a second position corresponding to the position of the error character in the display screen based on the position information of the error character, and display the replacement character at the second position. The second position may be directly below, directly to the left of, or directly to the right of the position of the error character, and may differ from the first position. In the embodiment of the invention, the replacement characters may be displayed as a floating overlay.
Corresponding to the above method embodiment, an embodiment of the present invention provides an apparatus for identifying accuracy of an input text, as shown in fig. 3, where the apparatus includes:
a first obtaining module 310, configured to obtain an ink image containing input text of a handwriting input as an ink image to be recognized;
the recognition module 320 is configured to recognize the ink image to be recognized, and obtain a character to be recognized in the input text;
the matching module 330 is configured to match the character to be recognized with a preset dictionary to obtain a matching result;
a determining module 340, configured to determine whether an error character exists in the character to be recognized based on the obtained matching result.
In the embodiment of the invention, an ink image containing input text input by handwriting is obtained and used as an ink image to be identified; identifying the ink image to be identified to obtain characters to be identified in the input text; matching the character to be recognized with a preset dictionary to obtain a matching result; based on the obtained matching result, it is determined whether there is an erroneous character within the character to be recognized. Therefore, in the embodiment of the invention, after the characters to be recognized in the input text are recognized from the ink image containing the input text of the handwriting input, the characters to be recognized are matched with the preset dictionary, whether the characters to be recognized have error characters can be determined based on the matching result, so that the error recognition of the text input in the handwriting input mode is realized, that is, the input text of the handwriting input is checked, and whether the error characters appear in the input text of the handwriting input is recognized.
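The matching and determining modules can be sketched as follows. The claims describe the preset dictionary as a set of sub-dictionaries, each holding entries of a different character count, from which those whose character count is not less than that of the characters to be recognized are selected. The sketch makes two assumptions that the text does not state: the preset dictionary is a mapping from character count to a set of entries, and an entry counts as a match when it equals the input or begins with it. All names and sample entries are hypothetical.

```python
def select_sub_dictionaries(preset_dictionary, char_count):
    """Keep the sub-dictionaries whose entry length is not less than char_count."""
    return {length: entries
            for length, entries in preset_dictionary.items()
            if length >= char_count}


def has_error(chars, preset_dictionary):
    """Return True when no selected entry matches the characters to be recognized."""
    selected = select_sub_dictionaries(preset_dictionary, len(chars))
    for entries in selected.values():
        # Assumption: prefix matching, so partially written phrases still match.
        if any(entry.startswith(chars) for entry in entries):
            return False
    return True


# Hypothetical preset dictionary keyed by character count.
preset_dictionary = {1: {"心", "意"}, 4: {"一心一意"}}
print(has_error("一心", preset_dictionary))
print(has_error("一新", preset_dictionary))
```

Grouping entries by length lets the matcher skip every entry that is too short to contain the input.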
In one implementation, the recognition module 320 is specifically configured to:
recognize the ink image to be recognized to obtain the characters to be recognized in the input text and position information of each character to be recognized;
as shown in fig. 4, the apparatus may further include:
and a first display module 410, configured to, when it is determined that there is an error character in the character to be recognized, output prompt information for each error character in a display screen based on the position information of the error character.
In one implementation, the first display module 410 is specifically configured to:
display a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
In one implementation, the first display module 410 is specifically configured to:
highlight the position of each error character in the display screen based on the position information of the error character.
In one implementation, the apparatus may further include:
a second obtaining module, configured to obtain a corrected character corresponding to each error character as a replacement character after outputting prompt information for the error character in a display screen based on the position information of each error character;
and the second display module is used for displaying the replacement character at a second position corresponding to the position of the error character in the display screen based on the position information of each error character.
In one implementation, the recognition module 320 is specifically configured to:
recognize the ink image to be recognized through an Optical Character Recognition (OCR) algorithm to obtain the characters to be recognized in the input text.
In one implementation, the recognition module 320 is specifically configured to:
recognize the ink image to be recognized to obtain all characters in the input text;
perform word segmentation processing on all the recognized characters to obtain at least one group of phrases;
screen out, from the at least one group of phrases, the group of phrases containing the last handwritten input character as the phrase to be recognized;
and take all characters in the phrase to be recognized as the characters to be recognized.
In one implementation, the recognition module 320 is specifically configured to:
recognize the ink image to be recognized to obtain all characters in the input text;
determine the last handwritten input character from all the recognized characters;
and take the last handwritten input character and a preset number of characters preceding it among all the recognized characters as the characters to be recognized.
Corresponding to the above method embodiments, the embodiment of the present invention further provides an electronic device, as shown in fig. 5, including a processor 510, a communication interface 520, a memory 530 and a communication bus 540, where the processor 510, the communication interface 520, and the memory 530 communicate with one another through the communication bus 540;
a memory 530 for storing a computer program;
the processor 510, when executing the computer program stored in the memory 530, is configured to implement any of the steps of the method for identifying the accuracy of the input text provided by the embodiment of the present invention:
acquiring an ink image containing input text input by handwriting as an ink image to be recognized;
identifying the ink image to be identified to obtain characters to be identified in the input text;
matching the character to be recognized with a preset dictionary to obtain a matching result;
and determining whether an error character exists in the character to be recognized or not based on the obtained matching result.
In the embodiment of the invention, an ink image containing input text input by handwriting is obtained and used as an ink image to be identified; identifying the ink image to be identified to obtain characters to be identified in the input text; matching the character to be recognized with a preset dictionary to obtain a matching result; based on the obtained matching result, it is determined whether there is an erroneous character within the character to be recognized. Therefore, in the embodiment of the invention, after the characters to be recognized in the input text are recognized from the ink image containing the input text of the handwriting input, the characters to be recognized are matched with the preset dictionary, whether the characters to be recognized have error characters can be determined based on the matching result, so that the error recognition of the text input in the handwriting input mode is realized, that is, the input text of the handwriting input is checked, and whether the error characters appear in the input text of the handwriting input is recognized.
In one implementation, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
identifying the ink image to be identified to obtain characters to be identified in the input text and position information of each character to be identified;
the method further comprises the following steps:
and when the error characters exist in the characters to be recognized, displaying prompt information aiming at the error characters in a display screen based on the position information of each error character.
In one implementation, the step of outputting a prompt message for each error character in the display screen based on the position information of the error character includes:
and displaying a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
In one implementation, the step of outputting a prompt message for each error character in the display screen based on the position information of the error character includes:
and based on the position information of each error character, highlighting the position of the error character in the display screen.
In one implementation, after the step of outputting a prompt message for each error character in the display screen based on the position information of the error character, the method further includes:
obtaining a corrected character corresponding to each error character as a replacement character;
and displaying the replacement characters at a second position corresponding to the position of each error character in the display screen based on the position information of each error character.
In one implementation, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
and identifying the ink image to be identified through an Optical Character Recognition (OCR) algorithm to obtain the character to be identified in the input text.
In one implementation manner, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
identifying the ink image to be identified to obtain all characters in the input text;
performing word segmentation processing on all the recognized characters to obtain at least one group of word groups;
screening out a group of phrases containing the last handwritten input character from the at least one group of phrases to be used as a phrase to be identified;
and taking all characters in the phrase to be recognized as characters to be recognized.
In one implementation, the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text includes:
identifying the ink image to be identified to obtain all characters in the input text;
determining the character input by handwriting finally from all the recognized characters;
and taking the character input by handwriting at the last time and a preset number of characters before the character input by handwriting at the last time in all the recognized characters as characters to be recognized.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Alternatively, the memory may be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Corresponding to the above method embodiments, the embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method for identifying the accuracy of the input text provided by the embodiment of the present invention is implemented.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The embodiments in the present specification are described in a related manner: the same and similar parts among the embodiments may be referred to each other, and each embodiment emphasizes its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (18)

1. A method for identifying the accuracy of input text, the method comprising:
acquiring an ink image containing input text input by handwriting as an ink image to be recognized;
identifying the ink image to be identified to obtain characters to be identified in the input text;
determining the number of characters in the characters to be recognized, and determining a sub-dictionary needing to be matched from a preset dictionary based on the number of characters in the characters to be recognized; the preset dictionary comprises a plurality of sub-dictionaries, the number of single characters, short sentences conforming to grammar rules and characters contained in phrases stored in each sub-dictionary is different, and the number of characters corresponding to the sub-dictionaries to be matched is not less than that of characters in the characters to be recognized; matching the character to be recognized with each accurate single character, short sentences and phrases which accord with grammar rules and are contained in the sub-dictionary needing to be matched one by one to obtain a matching result;
and determining whether an error character exists in the character to be recognized or not based on the obtained matching result.
2. The method according to claim 1, wherein the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text comprises:
identifying the ink image to be identified to obtain characters to be identified in the input text and position information of each character to be identified;
the method further comprises the following steps:
and when the error characters exist in the characters to be recognized, displaying prompt information aiming at the error characters in a display screen based on the position information of each error character.
3. The method of claim 2, wherein the step of outputting a prompt message for each error character in the display screen based on the position information of the error character comprises:
and displaying a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
4. The method of claim 2, wherein the step of outputting a prompt message for each error character in the display screen based on the position information of the error character comprises:
and based on the position information of each error character, highlighting the position of the error character in the display screen.
5. The method of claim 2, wherein after the step of outputting a prompt message for each error character in the display screen based on the position information of the error character, the method further comprises:
obtaining a corrected character corresponding to each error character as a replacement character;
and displaying the replacement characters at a second position corresponding to the position of each error character in the display screen based on the position information of each error character.
6. The method of claim 1, wherein the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text comprises:
and identifying the ink image to be identified through an Optical Character Recognition (OCR) algorithm to obtain the characters to be identified in the input text.
7. The method of claim 1, wherein the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text comprises:
identifying the ink image to be identified to obtain all characters in the input text;
performing word segmentation processing on all the recognized characters to obtain at least one group of phrases;
screening out a group of phrases containing the last handwritten input character from the at least one group of phrases to be used as a phrase to be identified;
and taking all characters in the phrase to be recognized as characters to be recognized.
8. The method of claim 1, wherein the step of recognizing the ink image to be recognized to obtain the character to be recognized in the input text comprises:
identifying the ink image to be identified to obtain all characters in the input text;
determining the character input by handwriting finally from all the recognized characters;
and taking the character input by handwriting at the last time and a preset number of characters before the character input by handwriting at the last time in all the recognized characters as characters to be recognized.
9. An apparatus for identifying the accuracy of input text, the apparatus comprising:
the first obtaining module is used for obtaining an ink image containing input text of handwriting input as an ink image to be recognized;
the recognition module is used for recognizing the ink image to be recognized to obtain characters to be recognized in the input text;
the matching module is used for determining the number of characters in the characters to be recognized and determining a sub-dictionary needing to be matched from a preset dictionary based on the number of characters in the characters to be recognized; the preset dictionary comprises a plurality of sub-dictionaries, the number of characters contained in a single character, a short sentence according with grammar rules and a phrase stored in each sub-dictionary is different, and the number of characters corresponding to the sub-dictionary required to be matched is not less than the number of characters in the characters to be recognized; matching the character to be recognized with each accurate single character, short sentences and phrases which accord with grammar rules and are contained in the sub-dictionary needing to be matched one by one to obtain a matching result;
and the determining module is used for determining whether an error character exists in the character to be recognized or not based on the obtained matching result.
10. The apparatus according to claim 9, wherein the recognition module is specifically configured to recognize the ink image to be recognized, obtain the characters to be recognized in the input text, and position information of each character to be recognized;
the device further comprises:
and the first display module is used for outputting prompt information aiming at the error character in a display screen based on the position information of each error character when the error character exists in the character to be recognized.
11. The apparatus according to claim 10, wherein the first display module is specifically configured to:
display a preset prompt line at a first position corresponding to the position of each error character in the display screen based on the position information of each error character.
12. The apparatus according to claim 10, wherein the first display module is specifically configured to:
highlight the position of each error character in the display screen based on the position information of the error character.
13. The apparatus of claim 10, further comprising:
a second obtaining module, configured to obtain a corrected character corresponding to each error character as a replacement character after outputting prompt information for the error character in a display screen based on the position information of each error character;
and the second display module is used for displaying the replacement character at a second position corresponding to the position of each error character in the display screen based on the position information of each error character.
14. The apparatus according to claim 9, wherein the recognition module is specifically configured to:
recognize the ink image to be recognized through an Optical Character Recognition (OCR) algorithm to obtain the characters to be recognized in the input text.
15. The device according to claim 9, wherein the identification module is specifically configured to:
identify the ink image to be identified to obtain all characters in the input text;
perform word segmentation on all the recognized characters to obtain at least one phrase;
select, from the at least one phrase, the phrase containing the most recently handwritten character as the phrase to be recognized;
and take all characters in the phrase to be recognized as the characters to be recognized.
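The selection described in claim 15 can be sketched in Python as follows. The claim does not name a word segmentation algorithm; forward maximum matching against a small vocabulary is substituted here purely for illustration, and `segment`, `phrase_to_recognize`, `vocabulary`, `max_len` and `last_index` are hypothetical names that do not come from the patent.

```python
def segment(chars, vocabulary, max_len=4):
    """Forward maximum matching (an assumed segmenter): return a list of
    (start, end) spans that cover the whole string."""
    spans = []
    i = 0
    while i < len(chars):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(chars) - i), 0, -1):
            if size == 1 or chars[i:i + size] in vocabulary:
                spans.append((i, i + size))
                i += size
                break
    return spans


def phrase_to_recognize(chars, last_index, vocabulary):
    """Return the phrase that contains the most recently handwritten
    character, identified by its index in the recognized text."""
    for start, end in segment(chars, vocabulary):
        if start <= last_index < end:
            return chars[start:end]
    raise ValueError("last_index is outside the recognized text")


text = "我爱北京天安门"
words = {"北京", "天安门"}
print(phrase_to_recognize(text, 6, words))  # last character written is "门"
```

Restricting the check to one phrase keeps the error detection local to what the user has just written rather than re-examining the whole input text.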
16. The device according to claim 9, wherein the identification module is specifically configured to:
identify the ink image to be identified to obtain all characters in the input text;
determine the most recently handwritten character from all the recognized characters;
and take, from all the recognized characters, the most recently handwritten character and a preset number of characters preceding it as the characters to be recognized.
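The window selection of claim 16 amounts to a bounded slice, as in the minimal Python sketch below. It assumes the most recently handwritten character is identified by its index in the recognized text; the names `window_to_recognize`, `last_index` and `preset_count` are hypothetical, and the claim does not say what happens when fewer than the preset number of characters precede that character (this sketch simply takes what is available).

```python
def window_to_recognize(chars, last_index, preset_count):
    """Return the most recently handwritten character together with up to
    preset_count characters that precede it in the recognized text."""
    if not 0 <= last_index < len(chars):
        raise ValueError("last_index is outside the recognized text")
    if preset_count < 0:
        raise ValueError("preset_count must not be negative")
    # Clamp at 0 so that a short prefix does not wrap around the string.
    start = max(0, last_index - preset_count)
    return chars[start:last_index + 1]


text = "我爱北京天安门"
print(window_to_recognize(text, 6, 2))  # "门" plus the two characters before it
```

Because `last_index` need not be the final index, the same function also covers a character inserted in the middle of existing text.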
17. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of the method for identifying the accuracy of an input text according to any one of claims 1 to 8 when executing the computer program stored in the memory.
18. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for identifying the accuracy of an input text according to any one of claims 1 to 8.
CN201810675867.5A 2018-06-27 2018-06-27 Method and device for identifying accuracy of input text and electronic equipment Active CN110647785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810675867.5A CN110647785B (en) 2018-06-27 2018-06-27 Method and device for identifying accuracy of input text and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810675867.5A CN110647785B (en) 2018-06-27 2018-06-27 Method and device for identifying accuracy of input text and electronic equipment

Publications (2)

Publication Number Publication Date
CN110647785A (en) 2020-01-03
CN110647785B (en) 2022-09-23

Family

ID=69008923

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810675867.5A Active CN110647785B (en) 2018-06-27 2018-06-27 Method and device for identifying accuracy of input text and electronic equipment

Country Status (1)

Country Link
CN (1) CN110647785B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112287921A (en) * 2020-10-15 2021-01-29 泰州锐比特智能科技有限公司 Composition evaluation system and method based on wrong word identification

Citations (5)

Publication number Priority date Publication date Assignee Title
US6154579A (en) * 1997-08-11 2000-11-28 At&T Corp. Confusion matrix based method and system for correcting misrecognized words appearing in documents generated by an optical character recognition technique
CN103021412A (en) * 2012-12-28 2013-04-03 安徽科大讯飞信息科技股份有限公司 Voice recognition method and system
CN103400512A (en) * 2013-07-16 2013-11-20 步步高教育电子有限公司 Learning assisting device and operating method thereof
CN107203510A (en) * 2017-05-23 2017-09-26 深圳天珑无线科技有限公司 character detecting method and device
CN107608967A (en) * 2017-09-20 2018-01-19 维沃移动通信有限公司 A kind of error character recognition methods and terminal

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
RU2634194C1 (en) * 2016-09-16 2017-10-24 Общество с ограниченной ответственностью "Аби Девелопмент" Verification of optical character recognition results

Also Published As

Publication number Publication date
CN110647785A (en) 2020-01-03

Similar Documents

Publication Publication Date Title
RU2613734C1 (en) Video capture in data input scenario
US11361570B2 (en) Receipt identification method, apparatus, device and storage medium
JP6527410B2 (en) Character recognition device, character recognition method, and program
CN107357824B (en) Information processing method, service platform and computer storage medium
CA3048356A1 (en) Unstructured data parsing for structured information
RU2571396C2 (en) Method and system for verification during reading
US20140380169A1 (en) Language input method editor to disambiguate ambiguous phrases via diacriticization
CN112149680B (en) Method and device for detecting and identifying wrong words, electronic equipment and storage medium
CN112560452B (en) Method and system for automatically generating error correction corpus
US5940532A (en) Apparatus for and method of recognizing hand-written characters
CN111984589A (en) Document processing method, document processing device and electronic equipment
CN111046627B (en) Chinese character display method and system
JP2016201013A (en) Character recognition device, character recognition processing system, and program
CN110647785B (en) Method and device for identifying accuracy of input text and electronic equipment
KR102282025B1 (en) Method for automatically sorting documents and extracting characters by using computer
CN114730241B (en) Gesture and stroke recognition in touch user interface input
CN114092949A (en) Method and device for training class prediction model and identifying interface element class
US20120281919A1 (en) Method and system for text segmentation
CN112632956A (en) Text matching method, device, terminal and storage medium
CN112559725A (en) Text matching method, device, terminal and storage medium
CN116225956A (en) Automated testing method, apparatus, computer device and storage medium
JP2019175317A (en) Character recognition device, character recognition method, and program
CN115661836A (en) Automatic correction method, device and system and readable storage medium
CN114065762A (en) Text information processing method, device, medium and equipment
CN115223188A (en) Bill information processing method, device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant