CN110929749A - Text recognition method, text recognition device, text recognition medium and electronic equipment - Google Patents

Info

Publication number
CN110929749A
Authority
CN
China
Prior art keywords
text
recognized
character
vector
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910979595.2A
Other languages
Chinese (zh)
Other versions
CN110929749B (en)
Inventor
回艳菲
王健宗
程宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910979595.2A
Publication of CN110929749A
Application granted
Publication of CN110929749B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G06V30/153: Segmentation of character regions using recognition of characters or words
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)

Abstract

The application provides a text recognition method, a text recognition device, a text recognition medium and electronic equipment. The method comprises the steps of obtaining a text to be recognized, and calculating a hash value of the text to be recognized; acquiring a first text vector group corresponding to a hash value of a text to be recognized, and for each character to be recognized in the text to be recognized, obtaining a second vector according to the radical combination of the character to be recognized, wherein the second vector is arranged into a second text vector group according to the sequence of the character to be recognized in the text to be recognized; obtaining a third vector according to the pinyin of the character to be recognized, and arranging the third vector into a third text vector group according to the sequence of the character to be recognized in the text to be recognized; calculating a weighted average value of a first vector distance, a second vector distance and a third vector distance between the text to be recognized and each standard text in the standard text library as a weighted average vector distance between the text to be recognized and each standard text; and taking the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.

Description

Text recognition method, text recognition device, text recognition medium and electronic equipment
Technical Field
The present application relates to the field of communications technologies, and in particular, to a text recognition method, apparatus, medium, and electronic device.
Background
With the broad development of informatization in China, character recognition technology is widely applied. Commonly used character recognition methods include template matching and geometric feature extraction. Template matching correlates the input characters with given standard characters of various classes, calculates the degree of similarity between the input characters and each template, and takes the class with the greatest similarity as the recognition result. Geometric feature extraction extracts geometric features of the characters, such as end points, branch points, concave and convex parts, line segments in the horizontal, vertical and oblique directions, closed loops and the like, and makes a combined logical judgment according to the positions and mutual relations of these features to obtain a recognition result.
However, when the handwriting to be recognized is sloppy, matching the outline of individual characters cannot accurately recognize sentences or words, and recognition is often inaccurate.
Disclosure of Invention
The present application is directed to a text recognition method, apparatus, medium, and electronic device, which can accurately recognize a text to be recognized.
According to an aspect of an embodiment of the present application, there is provided a text recognition method including: acquiring a text to be recognized; computing a hash value of the text to be recognized according to a hash algorithm; acquiring a first text vector group corresponding to the hash value of the text to be recognized, wherein the first text vector group is formed by concatenating the character vectors corresponding to each character of the hash value, in the character order of the hash value; acquiring the characters to be recognized in the text to be recognized; for each character to be recognized, acquiring the radicals in the character as the radical combination corresponding to the character, and inputting the radical combination into a first machine learning model to obtain a second vector corresponding to the character, the second vectors being arranged into a second text vector group according to the order of the characters to be recognized in the text to be recognized; for each character to be recognized, obtaining the pinyin of the character and inputting the pinyin into a second machine learning model to obtain a third vector corresponding to the character, the third vectors being arranged into a third text vector group according to the order of the characters to be recognized in the text to be recognized; computing the vector distances between the first, second and third text vector groups of the text to be recognized and, respectively, the first, second and third standard text vector groups of each standard text in a standard text library, as the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text; calculating a weighted average of the first, second and third vector distances between the text to be recognized and each standard text, as the weighted average vector distance between the text to be recognized and that standard text; and taking the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
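The final two steps of the method (weighted averaging and picking the minimum) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the Euclidean metric, the default equal weights, the flattened-list representation of a vector group, and the names `euclidean`, `weighted_average` and `recognize` are all assumptions, since the text above does not fix them. It also assumes that the compared vector groups have equal length.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two flattened vector groups (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vector groups must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_average(distances, weights):
    """Weighted average of the per-feature vector distances."""
    return sum(d * w for d, w in zip(distances, weights)) / sum(weights)


def recognize(query_groups, library, weights=(1.0, 1.0, 1.0)):
    """Return (standard_text, distance) for the smallest weighted average distance.

    query_groups: the three vector groups (hash, radical, pinyin) of the query.
    library: dict mapping each standard text to its three standard vector groups.
    """
    best_text, best_distance = None, math.inf
    for text, standard_groups in library.items():
        distances = [euclidean(q, s) for q, s in zip(query_groups, standard_groups)]
        score = weighted_average(distances, weights)
        if score < best_distance:
            best_text, best_distance = text, score
    return best_text, best_distance
```

With a two-entry library, a query whose three groups coincide with those of one standard text is mapped to that text at distance zero.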
According to an aspect of an embodiment of the present application, there is provided a text recognition apparatus including: a text acquisition module, configured to acquire a text to be recognized; a first calculation module, configured to compute a hash value of the text to be recognized according to a hash algorithm; a first vector group acquisition module, configured to acquire a first text vector group corresponding to the hash value of the text to be recognized, wherein the first text vector group is formed by concatenating the character vectors corresponding to each character of the hash value, in the character order of the hash value; a character acquisition module, configured to acquire the characters to be recognized in the text to be recognized; a second vector group acquisition module, configured to, for each character to be recognized, acquire the radicals in the character as the radical combination corresponding to the character, and input the radical combination into a first machine learning model to obtain a second vector corresponding to the character, the second vectors being arranged into a second text vector group according to the order of the characters to be recognized in the text to be recognized; a third vector group acquisition module, configured to, for each character to be recognized, obtain the pinyin of the character and input the pinyin into a second machine learning model to obtain a third vector corresponding to the character, the third vectors being arranged into a third text vector group according to the order of the characters to be recognized in the text to be recognized; a second calculation module, configured to compute the vector distances between the first, second and third text vector groups of the text to be recognized and, respectively, the first, second and third standard text vector groups of each standard text in a standard text library, as the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text; a third calculation module, configured to calculate a weighted average of the first, second and third vector distances between the text to be recognized and each standard text, as the weighted average vector distance between the text to be recognized and that standard text; and a determining module, configured to take the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes a standard text acquisition module, configured to: acquire a plurality of standard texts and, for each standard text, compute a hash value of the standard text according to a hash algorithm; acquire a first standard text vector group corresponding to the hash value of the standard text, wherein the first standard text vector group is formed by concatenating the character vectors corresponding to each character of the hash value of the standard text, in the character order of that hash value; acquire the standard characters in the standard text; for each standard character, acquire the radicals in the standard character as the radical combination corresponding to the standard character, and input the radical combination into the first machine learning model to obtain a second vector corresponding to the standard character, the second vectors being arranged into a second standard text vector group of the standard text according to the order of the standard characters in the standard text; and, for each standard character, acquire the pinyin of the standard character and input the pinyin into the second machine learning model to obtain a third vector corresponding to the standard character, the third vectors being arranged into a third standard text vector group of the standard text according to the order of the standard characters in the standard text.
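As a rough sketch of how a first standard text vector group might be precomputed for every entry of a standard text library: the choice of SHA-256, the one-hot encoding of each hexadecimal hash character, and the helper names `char_vector`, `first_vector_group` and `build_first_standard_groups` are assumptions made for illustration; the text above only requires that per-character vectors be concatenated in hash-character order.

```python
import hashlib

HEX_DIGITS = "0123456789abcdef"


def char_vector(ch):
    """One-hot vector for one hexadecimal hash character (assumed encoding)."""
    vec = [0.0] * len(HEX_DIGITS)
    vec[HEX_DIGITS.index(ch)] = 1.0
    return vec


def first_vector_group(text):
    """Hash the text, then concatenate the character vectors in hash order."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()  # SHA-256 is an assumption
    group = []
    for ch in digest:
        group.extend(char_vector(ch))
    return group


def build_first_standard_groups(standard_texts):
    """Precompute the first standard text vector group for each standard text."""
    return {text: first_vector_group(text) for text in standard_texts}
```

Because the hash has a fixed length, every first vector group has the same length (64 hash characters times 16 dimensions here), which keeps the distance between any two groups well defined.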
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes a counting module, configured to count the number of characters in the text to be recognized. The text acquisition module is further configured to: if the number of characters in the text to be recognized reaches a set value, acquire the keywords to be recognized in the text to be recognized. The first calculation module is further configured to: compute a hash value of each keyword to be recognized according to a hash algorithm. The first vector group acquisition module is further configured to: acquire a first word vector group corresponding to the hash value of the keyword to be recognized, wherein the first word vector group is formed by concatenating the character vectors corresponding to each character of the hash value of the keyword, in the character order of that hash value. The character acquisition module is further configured to: acquire the characters to be recognized in the keyword to be recognized. The second vector group acquisition module is further configured to: for each character to be recognized, acquire the radicals in the character as the radical combination corresponding to the character, and input the radical combination into the first machine learning model to obtain a second vector corresponding to the character, the second vectors being arranged into a second word vector group according to the order of the characters to be recognized in the keyword to be recognized. The third vector group acquisition module is further configured to: for each character to be recognized, obtain the pinyin of the character and input the pinyin into the second machine learning model to obtain a third vector corresponding to the character, the third vectors being arranged into a third word vector group according to the order of the characters to be recognized in the keyword to be recognized. The second calculation module is further configured to: compute the vector distances between the first, second and third word vector groups of the keyword to be recognized and, respectively, the first, second and third standard word vector groups of each standard keyword in a standard keyword library, as a fourth vector distance, a fifth vector distance and a sixth vector distance between the keyword to be recognized and each standard keyword. The third calculation module is further configured to: calculate a weighted average of the fourth, fifth and sixth vector distances between the keyword to be recognized and each standard keyword, as the weighted average vector distance between the keyword to be recognized and that standard keyword. The determining module is further configured to: combine the standard keywords corresponding to the minimum weighted average vector distances according to the order of the keywords to be recognized in the text to be recognized, and take the combination as the recognition result of the text to be recognized.
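The keyword-level variant can be sketched as below. Keyword extraction itself is not specified in this passage, so the sketch assumes the keywords are already available as a list; `toy_distance` is a deliberately simple stand-in for the weighted average vector distance, and all function names are hypothetical.

```python
def should_use_keywords(text, set_value):
    """True when the character count of the text reaches the set value."""
    return len(text) >= set_value


def toy_distance(a, b):
    """Stand-in distance: differing positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def recognize_by_keywords(keywords, standard_keywords, distance=toy_distance):
    """Map each keyword to its nearest standard keyword, keeping the original order."""
    matched = [
        min(standard_keywords, key=lambda s: distance(kw, s))
        for kw in keywords
    ]
    return "".join(matched)
```

For example, with the standard keywords 保险, 理赔 and 申请, the misrecognized keywords 保俭, 理陪 and 申请 are each replaced by their nearest standard keyword and joined in order.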
In some embodiments of the present application, based on the foregoing solution, the determining module is further configured to: obtain the standard texts whose weighted average vector distance reaches a threshold; if there are a plurality of such standard texts, send them to a user for selection; and take the standard text selected by the user as the recognition result of the text to be recognized.
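One possible reading of this threshold step is sketched below. The passage does not say in which direction the threshold is "reached"; the sketch assumes it means a weighted average vector distance at or below the threshold. The fallback to the overall minimum when no candidate qualifies, the `ask_user` callback and the function names are likewise assumptions.

```python
def candidates_within_threshold(distances, threshold):
    """Standard texts whose distance is at or below the threshold, closest first."""
    hits = [(d, text) for text, d in distances.items() if d <= threshold]
    return [text for d, text in sorted(hits)]


def resolve(distances, threshold, ask_user):
    """Pick a recognition result, deferring to the user when several texts qualify.

    distances: dict mapping each standard text to its weighted average distance.
    ask_user: callback that receives the candidate list and returns one of them.
    """
    candidates = candidates_within_threshold(distances, threshold)
    if len(candidates) > 1:
        return ask_user(candidates)
    if candidates:
        return candidates[0]
    # Assumed fallback: no candidate qualifies, so use the minimum distance.
    return min(distances, key=distances.get)
```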
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes a fourth vector group calculation module, configured to: look up a preset radical semantic comparison table for each radical in the radical combination to obtain the semantics corresponding to each radical; combine the semantics corresponding to the radicals according to the order of the radicals in the radical combination; and input the semantic combination into a third machine learning model to obtain a fourth vector corresponding to the character to be recognized, the fourth vectors being arranged into a fourth text vector group according to the order of the characters to be recognized in the text to be recognized. The second calculation module is further configured to compute the vector distances between the first, second, third and fourth text vector groups of the text to be recognized and, respectively, the first, second, third and fourth standard text vector groups of each standard text in the standard text library, as the first vector distance, the second vector distance, the third vector distance and the seventh vector distance between the text to be recognized and each standard text. The third calculation module is further configured to calculate a weighted average of the first, second, third and seventh vector distances between the text to be recognized and each standard text, as the weighted average vector distance between the text to be recognized and that standard text. The determining module is further configured to take the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
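The table lookup that precedes the third machine learning model might look like the following. The four-entry `RADICAL_SEMANTICS` table, the `<unk>` placeholder for radicals missing from the table, and the function name are illustrative assumptions; the model that turns the semantic combination into a fourth vector is not shown.

```python
# Hypothetical excerpt of a preset radical semantic comparison table.
RADICAL_SEMANTICS = {
    "氵": "water",
    "木": "wood",
    "口": "mouth",
    "日": "sun",
}


def semantic_combination(radicals, table=RADICAL_SEMANTICS, unknown="<unk>"):
    """Look up each radical and keep the semantics in radical order."""
    return [table.get(radical, unknown) for radical in radicals]
```

For the character 沐, whose radical combination is 氵 followed by 木, the semantic combination would be `["water", "wood"]`.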
According to an aspect of embodiments of the present application, there is provided a computer-readable program medium storing computer program instructions which, when executed by a computer, cause the computer to perform the method of any one of the above.
According to an aspect of an embodiment of the present application, there is provided an electronic apparatus including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement the method of any of the above.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the technical solution provided by some embodiments of the application, the text to be recognized is obtained and compared with the standard texts to obtain a recognition result. First, a hash value of the text to be recognized is computed according to a hash algorithm, and a first text vector group corresponding to the hash value is obtained, the first text vector group being formed by concatenating the character vectors corresponding to the characters of the hash value in the character order of the hash value; the vector distance between the first text vector group of the text to be recognized and the first standard text vector group of each standard text in a standard text library is then computed to obtain the first vector distance. The magnitude of the first vector distance represents how much the text to be recognized differs from each standard text. Next, the characters to be recognized in the text to be recognized are obtained; for each character to be recognized, the radicals in the character are obtained as the radical combination corresponding to the character, and the radical combination is input into a first machine learning model to obtain a second vector corresponding to the character; the second vectors are arranged into a second text vector group according to the order of the characters to be recognized in the text to be recognized, and the vector distance between the second text vector group of the text to be recognized and the second standard text vector group of each standard text in the standard text library is computed to obtain the second vector distance. The magnitude of the second vector distance represents how much the radicals contained in the text to be recognized differ from those contained in each standard text. Then, for each character to be recognized, the pinyin of the character is obtained and input into a second machine learning model to obtain a third vector corresponding to the character; the third vectors are arranged into a third text vector group according to the order of the characters to be recognized in the text to be recognized, and the vector distance between the third text vector group of the text to be recognized and the third standard text vector group of each standard text in the standard text library is computed to obtain the third vector distance. The magnitude of the third vector distance represents how much the pinyin of the text to be recognized differs from the pinyin of each standard text. Finally, the weighted average of the first, second and third vector distances between the text to be recognized and each standard text is calculated and taken as the weighted average vector distance between the text to be recognized and that standard text, and the standard text corresponding to the minimum weighted average vector distance is taken as the recognition result of the text to be recognized. Because the recognition result takes into account the difference between the text to be recognized and the standard text as a whole, the difference between their radicals, the difference between their pinyin, and the weights of these three differences, the recognition result obtained is more accurate.
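The decision rule described above can be written compactly. The symbols are introduced here for illustration and do not appear in the source: $T$ is the text to be recognized, $S_i$ is the $i$-th standard text, $d_1, d_2, d_3$ are the first, second and third vector distances, and $w_1, w_2, w_3$ are their weights, which are assumed here to be non-negative and not all zero.

```latex
D(T, S_i) = \frac{w_1\, d_1(T, S_i) + w_2\, d_2(T, S_i) + w_3\, d_3(T, S_i)}{w_1 + w_2 + w_3},
\qquad
S^{*} = \operatorname*{arg\,min}_{S_i} D(T, S_i)
```

Here $D(T, S_i)$ is the weighted average vector distance and $S^{*}$ is the standard text returned as the recognition result.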
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 schematically shows an exemplary system architecture to which the technical solution of the embodiments of the present application may be applied;
FIG. 2 schematically shows a flow diagram of a text recognition method according to an embodiment of the present application;
FIG. 3 schematically shows a flow diagram of a text recognition method according to an embodiment of the present application;
FIG. 4 schematically shows a flow diagram of a text recognition method according to an embodiment of the present application;
FIG. 5 schematically shows a block diagram of a text recognition apparatus according to an embodiment of the present application;
FIG. 6 is a hardware diagram illustrating a text recognition apparatus according to an example embodiment;
FIG. 7 illustrates a computer-readable storage medium for implementing the text recognition method described above, according to an example embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Fig. 1 shows a schematic diagram of an exemplary system architecture 100 to which the technical solutions of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture 100 may include a terminal device 101 (which may be one or more of a smartphone, a tablet, a laptop computer, or a desktop computer), a network 102, and a server 103. The network 102 is the medium used to provide communication links between the terminal device 101 and the server 103. The network 102 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices 101, networks 102, and servers 103 in fig. 1 is merely illustrative. There may be any number of terminal devices 101, networks 102, and servers 103, as desired for implementation. For example, the server 103 may be a server cluster composed of a plurality of servers.
In one embodiment of the present application, the server 103 may obtain a text to be recognized that a user inputs from the terminal device 101; the text to be recognized may be a speech text or a written text. The user may enter the text to be recognized through a client or a web page on the terminal device 101. The server 103 obtains the text to be recognized and compares it with the standard texts to obtain a recognition result. The server 103 first computes a hash value of the text to be recognized according to a hash algorithm and obtains a first text vector group corresponding to the hash value, the first text vector group being formed by concatenating the character vectors corresponding to each character of the hash value in the character order of the hash value; it then computes the vector distance between the first text vector group of the text to be recognized and the first standard text vector group of each standard text in a standard text library to obtain the first vector distance. The magnitude of the first vector distance represents how much the text to be recognized differs from each standard text.
The server 103 then obtains the characters to be recognized in the text to be recognized. For each character to be recognized, it obtains the radicals in the character as the radical combination corresponding to the character and inputs the radical combination into the first machine learning model to obtain a second vector corresponding to the character; it arranges the second vectors into a second text vector group according to the order of the characters to be recognized in the text to be recognized, and computes the vector distance between the second text vector group of the text to be recognized and the second standard text vector group of each standard text in the standard text library to obtain the second vector distance. The magnitude of the second vector distance represents how much the radicals contained in the text to be recognized differ from those contained in each standard text. For each character to be recognized, the server 103 also obtains the pinyin of the character and inputs it into a second machine learning model to obtain a third vector corresponding to the character; it arranges the third vectors into a third text vector group according to the order of the characters to be recognized in the text to be recognized, and computes the vector distance between the third text vector group of the text to be recognized and the third standard text vector group of each standard text in the standard text library to obtain the third vector distance. The magnitude of the third vector distance represents how much the pinyin of the text to be recognized differs from the pinyin of each standard text.
Finally, the server 103 calculates the weighted average of the first, second and third vector distances between the text to be recognized and each standard text, takes it as the weighted average vector distance between the text to be recognized and that standard text, and takes the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized. Because the server 103 takes into account the difference between the text to be recognized and the standard text as a whole, the difference between their radicals, the difference between their pinyin, and the weights of these three differences, the recognition result obtained is more accurate.
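The step of arranging per-character vectors into a text vector group, in character order, can be sketched as follows. The pinyin lookup table and `toy_pinyin_model` are stand-ins invented for this example: the real second machine learning model is not described here, and a production system would need a complete pinyin source rather than a two-entry dictionary.

```python
# Hypothetical pinyin lookup with tone numbers; a real system needs full coverage.
PINYIN_TABLE = {"你": "ni3", "好": "hao3"}


def toy_pinyin_model(pinyin):
    """Stand-in for the second machine learning model: two hand-made features."""
    return [float(len(pinyin)), float(ord(pinyin[0]) - ord("a"))]


def pinyin_vector(ch):
    """Third vector of a single character, derived from its pinyin."""
    return toy_pinyin_model(PINYIN_TABLE[ch])


def text_vector_group(text, char_to_vector):
    """Concatenate the per-character vectors in the order the characters appear."""
    group = []
    for ch in text:
        group.extend(char_to_vector(ch))
    return group
```

The same `text_vector_group` helper would serve for the second (radical) vector group by passing a different `char_to_vector` function; since the order of characters is preserved, 你好 and 好你 produce different groups.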
It should be noted that the text recognition method provided in the embodiment of the present application is generally executed by the server 103, and accordingly, the text recognition apparatus is generally disposed in the server 103. However, in other embodiments of the present application, the terminal device 101 may also have a similar function to the server 103, so as to execute the text recognition method provided by the embodiments of the present application.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Fig. 2 schematically shows a flow chart of a text recognition method according to an embodiment of the present application, the execution subject of which may be a server, such as the server 103 shown in fig. 1.
Referring to fig. 2, the text recognition method at least includes steps S210 to S290, which are described in detail as follows:
In step S210, a text to be recognized is acquired.
In one embodiment of the present application, the text to be recognized may be a speech text or a written text; for example, it may be the speech text obtained after noise is removed from speech input by a user.
In step S220, a hash value corresponding to the text to be recognized is computed from the text to be recognized according to a hash algorithm.
In an embodiment of the application, the complete text to be recognized can be calculated according to a hash algorithm, so as to obtain a hash value corresponding to the complete text to be recognized.
In an embodiment of the present application, a complete text to be recognized may be divided into a plurality of text paragraphs to be recognized as needed, and each text paragraph to be recognized is calculated according to a hash algorithm, so as to obtain a hash value corresponding to each text paragraph to be recognized.
In an embodiment of the present application, a complete text to be recognized may be divided according to characters, and a hash value corresponding to each character is calculated according to a hash algorithm.
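The three granularities described for step S220 (the complete text, text paragraphs, and single characters) can be sketched as follows. The application does not name a hash algorithm, so SHA-256 from Python's hashlib, UTF-8 encoding, newline-based paragraph splitting, the sample sentence, and all function names are assumptions made only for illustration.

```python
import hashlib
from typing import List


def hash_text(text: str) -> str:
    """Return a hex digest for a piece of text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def hash_whole(text: str) -> str:
    # One hash value for the complete text to be recognized.
    return hash_text(text)


def hash_paragraphs(text: str) -> List[str]:
    # One hash value per non-empty paragraph; splitting on newlines is an assumption.
    return [hash_text(p) for p in text.split("\n") if p.strip()]


def hash_characters(text: str) -> List[str]:
    # One hash value per character of the text.
    return [hash_text(ch) for ch in text]


sample = "我养了一只狗\n它很可爱"  # hypothetical input
print(hash_whole(sample))
print(len(hash_paragraphs(sample)), "paragraph hash values")
```

Whichever granularity is used, the same choice would have to be applied to the standard texts so that the resulting vector groups are comparable.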
In step S230, a first text vector group corresponding to the hash value of the text to be recognized is obtained, where the first text vector group is formed by concatenating character vectors corresponding to each character of the hash value of the text to be recognized according to the character sequence of the hash value of the text to be recognized.
In an embodiment of the application, each character of the hash value has a preset corresponding character vector, and a comparison table of hash value characters and character vectors may be stored in advance. After the hash value corresponding to the text to be recognized is obtained, the pre-stored comparison table is searched according to each character in the hash value to obtain the character vector corresponding to each character, and the character vectors are arranged in the order of the corresponding characters in the hash value as the first text vector group.
In an embodiment of the application, several characters in the hash value form a character group, and each character group corresponds to one character vector. After the hash value corresponding to the text to be recognized is obtained, a pre-stored comparison table of hash value character groups and character vectors is searched according to each character group in the hash value to obtain the character vector corresponding to each character group, and the character vectors are arranged in the order of the corresponding character groups in the hash value as the first text vector group.
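A minimal sketch of the table lookup in step S230 is given below. The contents of the comparison table, the vector size, the use of a hexadecimal SHA-256 digest, and the function names are all hypothetical; the application only states that the table is preset. For the character-group variant, the sketch derives the group vector as the mean of its characters' vectors, which is also an assumption standing in for a separately stored table.

```python
import hashlib
from typing import Dict, List

HEX_CHARS = "0123456789abcdef"
DIM = 4  # assumed character-vector size

# Hypothetical comparison table: each hash character maps to a fixed vector.
CHAR_VECTOR_TABLE: Dict[str, List[float]] = {
    ch: [((i * 7 + k * 3) % 16) / 15.0 for k in range(DIM)]
    for i, ch in enumerate(HEX_CHARS)
}


def first_text_vector_group(text: str) -> List[List[float]]:
    """Look up one vector per hash character, keeping the hash's character order."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return [CHAR_VECTOR_TABLE[ch] for ch in digest]


def first_text_vector_group_grouped(text: str, group_size: int = 2) -> List[List[float]]:
    """Variant in which several hash characters form one character group."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    groups = [digest[i:i + group_size] for i in range(0, len(digest), group_size)]
    # Assumed rule: a group's vector is the element-wise mean of its characters' vectors.
    return [
        [sum(CHAR_VECTOR_TABLE[ch][k] for ch in g) / len(g) for k in range(DIM)]
        for g in groups
    ]


group = first_text_vector_group("待识别文本")  # hypothetical input
print(len(group), "vectors of size", len(group[0]))
```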
In step S240, a character to be recognized in the text to be recognized is acquired.
In one embodiment of the present application, the character to be recognized may be each character in the text to be recognized.
In one embodiment of the present application, the character to be recognized may be a character entered by a user at a specific location.
In step S250, for each character to be recognized, a radical in the character to be recognized is obtained as a radical combination corresponding to the character to be recognized, the radical combination corresponding to the character to be recognized is input into the first machine learning model to obtain a second vector corresponding to the character to be recognized, and the second vectors are arranged into a second text vector group according to the sequence of the character to be recognized in the text to be recognized.
In one embodiment of the application, each character to be recognized is split into radicals, and the radicals are then arranged from first to last in the order top, bottom, left, right to form the radical combination.
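The ordering rule above can be illustrated with a small lookup. The decomposition table, the position labels, and the fallback for characters missing from the table are hypothetical; the application does not say how characters are split into radicals.

```python
from typing import Dict, List, Tuple

# Sort key implementing the stated order: top, bottom, left, right.
POSITION_ORDER = {"top": 0, "bottom": 1, "left": 2, "right": 3}

# Hypothetical decomposition table: character -> (position, radical) pairs,
# deliberately stored in arbitrary order.
DECOMPOSITION: Dict[str, List[Tuple[str, str]]] = {
    "好": [("right", "子"), ("left", "女")],
    "字": [("bottom", "子"), ("top", "宀")],
    "明": [("right", "月"), ("left", "日")],
}


def radical_combination(char: str) -> List[str]:
    """Return the radicals of a character arranged top, bottom, left, right."""
    # Assumed fallback: an unknown character is treated as a single radical.
    parts = DECOMPOSITION.get(char, [("top", char)])
    ordered = sorted(parts, key=lambda part: POSITION_ORDER[part[0]])
    return [radical for _, radical in ordered]


for ch in "好字明":
    print(ch, radical_combination(ch))
```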
In one embodiment of the present application, the first machine learning model is pre-trained as follows: a character sample set is obtained, in which the second vector corresponding to each character sample is known; for each character sample in the set, the radical combination corresponding to the character sample is obtained and input into the first machine learning model, which outputs a second vector corresponding to the character sample.
The second vector output by the first machine learning model is then compared with the known second vector corresponding to the character sample; if the two are inconsistent, the first machine learning model is adjusted until the second vector it outputs is consistent with the known second vector.
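The compare-and-adjust procedure can be sketched as a small training loop. The application does not disclose the model architecture, so the model below (the output is the sum of one learnable vector per radical), the squared-error update, the learning rate, and the sample data are all assumptions chosen to keep the example short.

```python
from typing import Dict, List, Sequence, Tuple

DIM = 3              # assumed size of the second vector
LEARNING_RATE = 0.1  # assumed step size

# Hypothetical character samples: radical combination -> known second vector.
SAMPLES: List[Tuple[List[str], List[float]]] = [
    (["女", "子"], [1.0, 0.0, 0.5]),
    (["宀", "子"], [0.0, 1.0, 0.5]),
    (["日", "月"], [0.5, 0.5, 1.0]),
]


def predict(emb: Dict[str, List[float]], combo: Sequence[str]) -> List[float]:
    """Stand-in model: sum the learnable vectors of the radicals in the combination."""
    return [sum(emb[r][k] for r in combo) for k in range(DIM)]


def train(samples, epochs: int = 200) -> Dict[str, List[float]]:
    radicals = sorted({r for combo, _ in samples for r in combo})
    emb = {r: [0.0] * DIM for r in radicals}
    for _ in range(epochs):
        for combo, target in samples:
            output = predict(emb, combo)
            # Compare the model output with the known second vector.
            error = [o - t for o, t in zip(output, target)]
            # Adjust the model so that the output moves toward the known vector.
            for r in combo:
                for k in range(DIM):
                    emb[r][k] -= LEARNING_RATE * error[k]
    return emb


model = train(SAMPLES)
for combo, target in SAMPLES:
    print(combo, [round(x, 3) for x in predict(model, combo)], target)
```

In practice "consistent" would usually mean agreement within a tolerance rather than exact equality; the tolerance is not specified in the source.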
In step S260, for each character to be recognized, obtaining the pinyin of the character to be recognized, inputting the pinyin of the character to be recognized into a second machine learning model to obtain a third vector corresponding to the character to be recognized, and arranging the third vectors into a third text vector group according to the sequence of the character to be recognized in the text to be recognized.
In one embodiment of the application, when the character to be recognized is a written character, its pinyin can be obtained from the pinyin that the user typed into the input method when entering the character.
In one embodiment of the present application, when the character to be recognized is a spoken character, its pinyin may be obtained from the user's pronunciation.
In one embodiment of the present application, the second machine learning model is pre-trained as follows: a character sample set is obtained, in which the third vector corresponding to each character sample is known; for each character sample in the set, the pinyin corresponding to the character sample is obtained and input into the second machine learning model, which outputs a third vector corresponding to the character sample.
The third vector output by the second machine learning model is then compared with the known third vector corresponding to the character sample; if the two are inconsistent, the second machine learning model is adjusted until the third vector it outputs is consistent with the known third vector.
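Step S260 can be sketched as a pinyin lookup followed by a model call. The pinyin table, the tone-digit notation, and the stand-in encoder used in place of the trained second machine learning model are hypothetical; the encoder is only a placeholder so that the example runs.

```python
from typing import Dict, List

# Hypothetical pinyin table (tone written as a trailing digit).
PINYIN: Dict[str, str] = {"他": "ta1", "她": "ta1", "狗": "gou3"}


def stand_in_second_model(pinyin: str) -> List[float]:
    """Placeholder for the trained second model: letter counts plus the tone number."""
    counts = [0.0] * 26
    for c in pinyin:
        if "a" <= c <= "z":
            counts[ord(c) - ord("a")] += 1.0
    tone = next((int(c) for c in pinyin if c.isdigit()), 0)
    return counts + [float(tone)]


def third_text_vector_group(text: str) -> List[List[float]]:
    """Arrange the third vectors in the order of the characters in the text."""
    return [stand_in_second_model(PINYIN[ch]) for ch in text]


# Homophones receive identical third vectors, so pinyin alone cannot separate them.
print(third_text_vector_group("他") == third_text_vector_group("她"))
```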
In step S270, vector distances between the first text vector group, the second text vector group, and the third text vector group of the text to be recognized and the first standard text vector group, the second standard text vector group, and the third standard text vector group of each standard text in the standard text library are respectively obtained as the first vector distance, the second vector distance, and the third vector distance between the text to be recognized and each standard text.
In one embodiment of the present application, the standard text may be obtained by the following process: acquiring a plurality of standard texts, and calculating the standard texts according to a Hash algorithm to obtain Hash values corresponding to the standard texts for each standard text; acquiring a first standard text vector group corresponding to the hash value of the standard text, wherein the first standard text vector group is formed by connecting character vectors corresponding to each character of the hash value in series according to the character sequence of the hash value; acquiring a standard character in the standard text; for each standard character, acquiring a radical in the standard character as a radical combination corresponding to the standard character, inputting the radical combination corresponding to the standard character into a first machine learning model to obtain a second vector corresponding to the standard character, and arranging the second vectors into a second standard text vector group of the standard text according to the sequence of the standard character in the standard text; and for each standard character, obtaining the pinyin of the standard character, inputting the pinyin of the standard character into a second machine learning model to obtain a third vector corresponding to the standard character, and arranging the third vector into a third standard text vector group of the standard text according to the sequence of the standard character in the standard text.
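One way to compute the vector distance between a text vector group and each standard text vector group in step S270 is sketched below. The application does not specify the distance measure, so the Euclidean distance over flattened groups, the zero padding used when two groups differ in length, and the function names are assumptions.

```python
import math
from typing import Dict, List

VectorGroup = List[List[float]]


def flatten(group: VectorGroup, length: int) -> List[float]:
    """Concatenate the vectors of a group and zero-pad to a common length."""
    flat = [x for vec in group for x in vec]
    return flat + [0.0] * (length - len(flat))


def vector_group_distance(a: VectorGroup, b: VectorGroup) -> float:
    """Euclidean distance between two vector groups (assumed metric)."""
    length = max(sum(len(v) for v in a), sum(len(v) for v in b))
    fa, fb = flatten(a, length), flatten(b, length)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))


def distances_to_library(query: VectorGroup, library: Dict[str, VectorGroup]) -> Dict[str, float]:
    """Distance from the query group to the matching group of each standard text."""
    return {name: vector_group_distance(query, group) for name, group in library.items()}


library = {"standard A": [[0.0, 0.0]], "standard B": [[3.0, 4.0]]}  # hypothetical
print(distances_to_library([[0.0, 0.0]], library))
```

The same function would be called three times per standard text, once for each of the first, second and third vector groups.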
In step S280, a weighted average of the first vector distance, the second vector distance, and the third vector distance between the text to be recognized and each standard text is obtained as a weighted average vector distance between the text to be recognized and each standard text.
In an embodiment of the present application, because homophones are common, the weight given to the first vector distance and the weight given to the second vector distance may each be greater than the weight given to the third vector distance, which is obtained from the pronunciation of the characters in the text to be recognized.
In step S290, the standard text corresponding to the minimum weighted average vector distance is used as the recognition result of the text to be recognized.
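Steps S280 and S290 amount to a weighted average followed by a minimum search, as in the sketch below. The weight values are assumptions (chosen so that the pinyin-based third distance counts least, in line with the homophone remark above), and the candidate distances are made-up numbers.

```python
from typing import Dict, Sequence

# Assumed weights for the first, second and third vector distances.
WEIGHTS = (0.4, 0.4, 0.2)


def weighted_average_distance(distances: Sequence[float],
                              weights: Sequence[float] = WEIGHTS) -> float:
    """Weighted average of the vector distances to one standard text."""
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)


def recognize(per_text: Dict[str, Sequence[float]]) -> str:
    """Return the standard text with the minimum weighted average vector distance."""
    return min(per_text, key=lambda name: weighted_average_distance(per_text[name]))


# Hypothetical (first, second, third) distances for two standard texts.
candidates = {
    "standard text A": (1.0, 1.0, 0.0),
    "standard text B": (0.0, 0.5, 1.0),
}
print(recognize(candidates))
```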
In an embodiment of the application, the standard texts whose weighted average reaches a threshold may be obtained; if there are several such standard texts, they are sent to the user for selection, and the standard text selected by the user is used as the recognition result of the text to be recognized. Presenting every standard text whose weighted average reaches the threshold gives the user more choices.
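The threshold-based variant can be sketched as follows. Because a smaller distance means a closer match, the sketch reads "reaches the threshold" as "is at or below the threshold"; that reading, the fallback used when no standard text qualifies, and the `ask_user` callback are assumptions.

```python
from typing import Callable, Dict, List


def shortlist(weighted: Dict[str, float], threshold: float) -> List[str]:
    """Standard texts whose weighted average distance is at or below the threshold."""
    passing = [name for name, dist in weighted.items() if dist <= threshold]
    return sorted(passing, key=weighted.get)


def choose(weighted: Dict[str, float], threshold: float,
           ask_user: Callable[[List[str]], str]) -> str:
    candidates = shortlist(weighted, threshold)
    if len(candidates) > 1:
        # Several standard texts qualify: let the user select one.
        return ask_user(candidates)
    if candidates:
        return candidates[0]
    # Assumed fallback: no text qualifies, so use the closest one.
    return min(weighted, key=weighted.get)


weighted = {"A": 0.2, "B": 0.3, "C": 0.9}  # hypothetical weighted average distances
print(choose(weighted, 0.5, ask_user=lambda options: options[0]))
```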
In the technical solution provided by some embodiments of the application, the text to be recognized is obtained and compared with the standard texts to obtain the recognition result. First, a hash value corresponding to the text to be recognized is computed according to a hash algorithm, and a first text vector group corresponding to the hash value is obtained, the first text vector group being formed by concatenating the character vectors corresponding to the characters of the hash value in the character order of the hash value; the vector distance between the first text vector group of the text to be recognized and the first standard text vector group of each standard text in the standard text library is then computed to obtain a first vector distance. The magnitude of the first vector distance represents how much the text to be recognized differs from each standard text. Next, the characters to be recognized in the text to be recognized are obtained; for each character to be recognized, the radicals in the character are obtained as the radical combination corresponding to the character, the radical combination is input into a first machine learning model to obtain a second vector corresponding to the character, the second vectors are arranged into a second text vector group according to the order of the characters in the text to be recognized, and the vector distance between the second text vector group of the text to be recognized and the second standard text vector group of each standard text in the standard text library is computed to obtain a second vector distance.
The magnitude of the second vector distance represents how much the radicals contained in the text to be recognized differ from those contained in each standard text. For each character to be recognized, the pinyin of the character is also obtained and input into a second machine learning model to obtain a third vector corresponding to the character, the third vectors are arranged into a third text vector group according to the order of the characters in the text to be recognized, and the vector distance between the third text vector group of the text to be recognized and the third standard text vector group of each standard text in the standard text library is computed to obtain a third vector distance. The magnitude of the third vector distance represents how much the pinyin of the text to be recognized differs from the pinyin of each standard text. Finally, the weighted average of the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text is calculated as the weighted average vector distance between the text to be recognized and that standard text, and the standard text corresponding to the minimum weighted average vector distance is used as the recognition result of the text to be recognized. The recognition result therefore takes into account the difference between the text to be recognized and the standard text, the difference between their radicals, the difference between their pinyin, and the weights of the three, so that the obtained recognition result is more accurate.
In one embodiment of the application, suppose the text to be recognized is a sentence input by the user in the client: "I do not intend to raise a dog, because I have already raised a dog". Because the glyphs of "department" and "raise" are very similar, recognition based on glyphs alone easily takes the text to be recognized as "I have raised a dog", so that the meaning of the recognized text differs from the meaning the user wants to express. Because the scheme in the present application also considers features such as the radicals and the pronunciation of the characters in the text to be recognized, it can identify the user's meaning more accurately and yield a more accurate recognition result.
Fig. 3 schematically shows a flow chart of a text recognition method according to an embodiment of the present application, the execution subject of which may be a server, such as the server 103 shown in fig. 1.
Referring to fig. 3, the text recognition method may include steps S310 to S380, which are described in detail as follows:
In step S310, a text to be recognized is obtained, a hash value corresponding to the text to be recognized is obtained by calculating the text to be recognized according to a hash algorithm, and a first text vector group corresponding to the hash value is obtained, where the first text vector group is formed by concatenating character vectors corresponding to each character of the hash value according to a character sequence of the hash value;
In step S320, characters to be recognized in the text to be recognized are obtained;
In step S330, for each character to be recognized, the radicals in the character to be recognized are acquired as a radical combination corresponding to the character to be recognized;
In step S340, the radical combination corresponding to the character to be recognized is input into the first machine learning model to obtain a second vector corresponding to the character to be recognized, and the second vectors are arranged into a second text vector group according to the sequence of the characters to be recognized in the text to be recognized;
In step S350, a preset radical semantic comparison table is searched according to each radical in the radical combination to obtain the semantic corresponding to each radical, the semantics corresponding to the radicals are combined according to the sequence of the radicals in the radical combination, the semantic combination is input into a third machine learning model to obtain a fourth vector corresponding to the character to be recognized, and the fourth vectors are arranged into a fourth text vector group according to the sequence of the characters to be recognized in the text to be recognized;
In step S360, for each character to be recognized, the pinyin of the character to be recognized is obtained and input into a second machine learning model to obtain a third vector corresponding to the character to be recognized, and the third vectors are arranged into a third text vector group according to the sequence of the characters to be recognized in the text to be recognized;
In step S370, vector distances between the first text vector group, the second text vector group, the third text vector group, and the fourth text vector group of the text to be recognized and the first standard text vector group, the second standard text vector group, the third standard text vector group, and the fourth standard text vector group of each standard text in the standard text library are respectively obtained as a first vector distance, a second vector distance, a third vector distance, and a seventh vector distance between the text to be recognized and each standard text;
In step S380, a weighted average of the first vector distance, the second vector distance, the third vector distance, and the seventh vector distance between the text to be recognized and each standard text is obtained as a weighted average vector distance between the text to be recognized and each standard text, and the standard text corresponding to the smallest weighted average vector distance is used as the recognition result of the text to be recognized.
In some embodiments of the present application, a first vector distance is obtained according to a hash value of a text to be recognized and a standard text, obtaining a second vector distance according to the combination of the radicals of the text to be recognized and the standard text, obtaining a third vector distance according to the pinyin of the text to be recognized and the standard text, obtaining a seventh vector distance according to the radical semantics of the text to be recognized and the standard text, then obtaining a weighted average vector distance between the text to be recognized and the standard text according to the first vector distance, the second vector distance, the third vector distance and the seventh vector distance, the obtained weighted average vector distance simultaneously considers the Hash value similarity, the radical combination similarity, the pinyin similarity and the radical semantic similarity of the text to be recognized and the standard text, and the standard text corresponding to the selected minimum weighted average vector distance is more accurate.
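The radical-semantics lookup of step S350 and the four-distance average of step S380 can be sketched as follows. The entries of the radical semantic comparison table, the placeholder for unknown radicals, and the weight values are assumptions; the third machine learning model itself is not shown because the application does not describe it.

```python
from typing import Dict, List, Sequence

# Hypothetical radical semantic comparison table.
RADICAL_SEMANTICS: Dict[str, str] = {
    "女": "woman", "子": "child", "宀": "roof", "日": "sun", "月": "moon",
}


def semantic_combination(radicals: Sequence[str]) -> List[str]:
    """Combine the radicals' semantics in the order of the radical combination."""
    # Assumed placeholder for radicals missing from the table.
    return [RADICAL_SEMANTICS.get(r, "unknown") for r in radicals]


def weighted_average_of_four(d1: float, d2: float, d3: float, d7: float,
                             weights: Sequence[float] = (0.3, 0.3, 0.1, 0.3)) -> float:
    """Weighted average of the first, second, third and seventh vector distances."""
    distances = (d1, d2, d3, d7)
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)


print(semantic_combination(["宀", "子"]))
print(weighted_average_of_four(0.2, 0.4, 1.0, 0.3))
```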
It should be noted that fig. 3 only schematically shows steps of the text recognition method according to an embodiment of the present application, and step S370 may be performed before step S360.
In one embodiment of the present application, the text recognition method may include steps S410 to S480 as shown in fig. 4:
In step S410, a text to be recognized is obtained, and the number of characters in the text to be recognized is counted;
In step S420, it is determined whether the number of characters in the text to be recognized reaches a set value;
In step S430, if the number of characters in the text to be recognized reaches the set value, keywords to be recognized in the text to be recognized are obtained; the keywords to be recognized are calculated according to a hash algorithm to obtain hash values corresponding to the keywords to be recognized; a first word vector group corresponding to the hash value of a keyword to be recognized is acquired, wherein the first word vector group is formed by connecting character vectors corresponding to each character of the hash value in series according to the character sequence of the hash value;
In step S440, the characters to be recognized in the keyword to be recognized are obtained; for each character to be recognized, the radicals in the character to be recognized are acquired as a radical combination corresponding to the character to be recognized, the radical combination is input into a first machine learning model to obtain a second vector corresponding to the character to be recognized, and the second vectors are arranged into a second word vector group according to the sequence of the characters to be recognized in the keyword to be recognized;
In step S450, for each character to be recognized, the pinyin of the character to be recognized is obtained and input into a second machine learning model to obtain a third vector corresponding to the character to be recognized, and the third vectors are arranged into a third word vector group according to the sequence of the characters to be recognized in the keyword to be recognized;
In step S460, the vector distances between the first word vector group, the second word vector group, and the third word vector group of the keyword to be recognized and the first standard word vector group, the second standard word vector group, and the third standard word vector group of each standard keyword in the standard keyword library are respectively obtained as the fourth vector distance, the fifth vector distance, and the sixth vector distance between the keyword to be recognized and each standard keyword;
In step S470, a weighted average of the fourth vector distance, the fifth vector distance, and the sixth vector distance between the keyword to be recognized and each standard keyword is obtained as a weighted average vector distance between the keyword to be recognized and each standard keyword, and the standard keywords corresponding to the smallest weighted average vector distance are combined in the order of the keywords to be recognized in the text to be recognized as the recognition result of the text to be recognized;
In step S480, if the number of characters in the text to be recognized does not reach the set value, step S220 to step S290 in fig. 2 are performed.
In the technical scheme of the embodiment, only the keywords to be recognized in the text to be recognized are recognized, and the obtained standard keyword combination is used as the recognition result of the text to be recognized, so that the content of the text to be recognized can be accurately recognized, and meanwhile, the calculation steps are reduced.
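The branch between keyword-level and full-text recognition in steps S410 to S480 can be sketched as a simple dispatch. The set value, the callback names, and joining the recognized keywords without a separator are assumptions; keyword extraction and the two recognizers are passed in as functions because the application does not specify them here.

```python
from typing import Callable, List

SET_VALUE = 20  # assumed character-count threshold


def recognize_text(text: str,
                   extract_keywords: Callable[[str], List[str]],
                   recognize_keyword: Callable[[str], str],
                   recognize_full_text: Callable[[str], str],
                   set_value: int = SET_VALUE) -> str:
    """Use keyword-level recognition for long texts and full-text recognition otherwise."""
    if len(text) >= set_value:
        # Long text: recognize each keyword and combine the results in text order.
        keywords = extract_keywords(text)
        return "".join(recognize_keyword(k) for k in keywords)
    # Short text: fall back to the full pipeline of steps S220 to S290.
    return recognize_full_text(text)


# Hypothetical stand-ins for the real components.
result = recognize_text(
    "short text",
    extract_keywords=lambda t: t.split(),
    recognize_keyword=lambda k: k.upper(),
    recognize_full_text=lambda t: "full:" + t,
)
print(result)
```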
Embodiments of the apparatus of the present application are described below, which may be used to perform the text recognition methods in the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the text recognition method described above in the present application.
Fig. 5 schematically shows a block diagram of a text recognition apparatus according to an embodiment of the present application.
Referring to fig. 5, according to an aspect of the embodiment of the present application, there is provided a text recognition apparatus 500, which includes a text obtaining module 501, a first calculating module 502, a first vector group obtaining module 503, a character obtaining module 504, a second vector group obtaining module 505, a third vector group obtaining module 506, a second calculating module 507, a third calculating module 508, and a determining module 509.
In some embodiments of the present application, based on the foregoing scheme, the text obtaining module 501 is configured to obtain a text to be recognized; the first calculating module 502 is configured to calculate a hash value corresponding to the text to be recognized according to a hash algorithm; the first vector group obtaining module 503 is configured to obtain a first text vector group corresponding to the hash value of the text to be recognized, where the first text vector group is formed by concatenating character vectors corresponding to each character of the hash value of the text to be recognized according to a character sequence of the hash value of the text to be recognized; the character obtaining module 504 is configured to obtain characters to be recognized in the text to be recognized; the second vector group obtaining module 505 is configured to, for each character to be recognized, obtain the radicals in the character to be recognized as a radical combination corresponding to the character to be recognized, input the radical combination into the first machine learning model to obtain a second vector corresponding to the character to be recognized, and arrange the second vectors into a second text vector group according to the sequence of the characters to be recognized in the text to be recognized; the third vector group obtaining module 506 is configured to obtain, for each character to be recognized, the pinyin of the character to be recognized, input the pinyin into the second machine learning model to obtain a third vector corresponding to the character to be recognized, and arrange the third vectors into a third text vector group according to the order of the characters to be recognized in the text to be recognized; the second calculation module 507 is configured to separately obtain vector distances between the first text vector group, the second text vector group, and the third text vector group of the text to be recognized and the first standard text vector group, the second standard text vector group, and the third standard text vector group of each standard text in the standard text library, and use the vector distances as a first vector distance, a second vector distance, and a third vector distance between the text to be recognized and each standard text; the third calculating module 508 is configured to find a weighted average of the first vector distance, the second vector distance, and the third vector distance between the text to be recognized and each standard text, as a weighted average vector distance between the text to be recognized and each standard text; the determining module 509 is configured to use the standard text corresponding to the minimum weighted average vector distance as a recognition result of the text to be recognized.
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes: the standard text acquisition module is used for acquiring a plurality of standard texts, and for each standard text, calculating the standard text according to a Hash algorithm to obtain a Hash value corresponding to the standard text; acquiring a first standard text vector group corresponding to the hash value of the standard text, wherein the first standard text vector group is formed by connecting character vectors corresponding to each character of the hash value of the standard text in series according to the character sequence of the hash value of the standard text; acquiring a standard character in the standard text; for each standard character, acquiring a radical in the standard character as a radical combination corresponding to the standard character, inputting the radical combination corresponding to the standard character into a first machine learning model to obtain a second vector corresponding to the standard character, and arranging the second vectors into a second standard text vector group of the standard text according to the sequence of the standard character in the standard text; and for each standard character, obtaining the pinyin of the standard character, inputting the pinyin of the standard character into a second machine learning model to obtain a third vector corresponding to the standard character, and arranging the third vector into a third standard text vector group of the standard text according to the sequence of the standard character in the standard text.
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes: the statistical module is used for counting the number of characters in the text to be recognized; the text acquisition module 501 is further configured to: if the number of the characters in the text to be recognized reaches a set value, acquiring keywords to be recognized in the text to be recognized; the first calculation module 502 is further configured to: calculating the keywords to be identified according to a Hash algorithm to obtain Hash values corresponding to the keywords to be identified; the first vector group acquisition module 503 is further configured to: acquiring a first word vector group corresponding to the hash value of the keyword to be recognized, wherein the first word vector group is formed by connecting character vectors corresponding to each character of the hash value of the keyword to be recognized in series according to the character sequence of the hash value of the keyword to be recognized; the character acquisition module 504 is further configured to: acquiring characters to be recognized in keywords to be recognized; the second vector group acquisition module 505 is further configured to: for each character to be recognized, acquiring a radical in the character to be recognized as a radical combination corresponding to the character to be recognized, inputting the radical combination corresponding to the character to be recognized into a first machine learning model to obtain a second vector corresponding to the character to be recognized, and arranging the second vectors into a second word vector group according to the sequence of the character to be recognized in a keyword to be recognized; the third vector group acquisition module 506 is further configured to: for each character to be recognized, obtaining the pinyin of the character to be recognized, inputting the pinyin of the character to be 
recognized into a second machine learning model to obtain a third vector corresponding to the character to be recognized, and arranging the third vector into a third word vector group according to the sequence of the character to be recognized in the keyword to be recognized; the second calculation module 507 is further configured to: respectively solving the vector distances between the first word vector group, the second word vector group and the third word vector group of the keyword to be recognized and the first standard word vector group, the second standard word vector group and the third standard word vector group of each standard keyword in the standard keyword library as a fourth vector distance, a fifth vector distance and a sixth vector distance between the keyword to be recognized and each standard keyword; the third calculation module 508 is further configured to: calculating a weighted average value of a fourth vector distance, a fifth vector distance and a sixth vector distance between the keyword to be recognized and each standard keyword, wherein the weighted average value is used as a weighted average vector distance between the keyword to be recognized and each standard keyword; the determination module 509 is further configured to: and combining the standard keywords corresponding to the minimum weighted average vector distance according to the sequence of the keywords to be recognized in the text to be recognized, and taking the combined keywords as the recognition result of the text to be recognized.
In some embodiments of the present application, based on the foregoing scheme, the determining module 509 is further configured to: obtain the standard texts whose weighted average value reaches a threshold value; if there are a plurality of such standard texts, send them to a user for selection; and take the standard text selected by the user as the recognition result of the text to be recognized.
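The threshold-and-selection step can be sketched as below. Several points are assumptions made for illustration: "reaching the threshold" is read as a weighted average distance at or below the threshold, the behaviour for zero or exactly one candidate is not stated in the source, and `ask_user` is a hypothetical callback standing in for whatever interface presents the candidates.

```python
def candidates_reaching_threshold(scores, threshold):
    """Return standard texts whose weighted average distance is at or below
    the threshold (assumed reading), closest first.

    scores: dict mapping standard text -> weighted average vector distance.
    """
    hits = [text for text, score in scores.items() if score <= threshold]
    return sorted(hits, key=scores.get)


def resolve(scores, threshold, ask_user):
    """Pick the recognition result, deferring to the user on ambiguity."""
    candidates = candidates_reaching_threshold(scores, threshold)
    if not candidates:
        return None  # assumed: no result when nothing reaches the threshold
    if len(candidates) == 1:
        return candidates[0]  # assumed: a single candidate needs no prompt
    return ask_user(candidates)


scores = {"A": 0.10, "B": 0.15, "C": 0.90}
# Here the hypothetical user picks the second candidate offered.
print(resolve(scores, 0.2, lambda options: options[1]))  # B
```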
In some embodiments of the present application, based on the foregoing solution, the text recognition apparatus further includes a fourth vector group calculation module configured to: look up a preset radical semantic comparison table for each radical in the radical combination to obtain the semantics corresponding to each radical; combine the semantics corresponding to the radicals in the order of the radicals in the radical combination; and input the semantic combination into a third machine learning model to obtain a fourth vector corresponding to the character to be recognized, the fourth vectors being arranged into a fourth text vector group in the order of the characters to be recognized in the text to be recognized. The second calculation module 507 is further configured to compute the vector distances between the first, second, third and fourth text vector groups of the text to be recognized and, respectively, the first, second, third and fourth standard text vector groups of each standard text in the standard text library, as the first vector distance, the second vector distance, the third vector distance and the seventh vector distance between the text to be recognized and each standard text. The third calculation module 508 is further configured to compute a weighted average of the first vector distance, the second vector distance, the third vector distance and the seventh vector distance between the text to be recognized and each standard text, as the weighted average vector distance between the text to be recognized and that standard text. The determining module 509 is further configured to take the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
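As a rough sketch of the two pieces this embodiment adds, the snippet below shows a radical-to-semantics lookup and a weighted average over four distances. The table contents, the `<unk>` placeholder for radicals missing from the table, and the function names are all hypothetical; the third machine learning model that would turn the semantic combination into a fourth vector is not modelled here.

```python
# Hypothetical excerpt of a "radical semantic comparison table".
RADICAL_SEMANTICS = {"氵": "water", "木": "wood", "亻": "person", "口": "mouth"}


def semantic_combination(radicals, table=RADICAL_SEMANTICS, unknown="<unk>"):
    """Look up each radical and keep the semantics in radical order."""
    return [table.get(radical, unknown) for radical in radicals]


def weighted_average_distance(distances, weights):
    """Weighted average of per-feature vector distances."""
    if len(distances) != len(weights):
        raise ValueError("distances and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * d for w, d in zip(weights, distances)) / total


# The character 沐 decomposes into the radicals 氵 and 木.
print(semantic_combination(["氵", "木"]))  # ['water', 'wood']

# First, second, third and seventh vector distances with equal weights.
print(weighted_average_distance([1.0, 2.0, 3.0, 4.0], [1, 1, 1, 1]))  # 2.5
```

The weights are left as a parameter because the source does not say how they are chosen.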
As will be appreciated by one skilled in the art, aspects of the present application may be embodied as a system, method or program product. Accordingly, various aspects of the present application may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, all of which may generally be referred to herein as a "circuit," "module" or "system."
An electronic device 60 according to this embodiment of the present application is described below with reference to fig. 6. The electronic device 60 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device 60 takes the form of a general purpose computing device. The components of the electronic device 60 may include, but are not limited to: at least one processing unit 61, at least one storage unit 62, a bus 63 connecting different system components (including the storage unit 62 and the processing unit 61), and a display unit 64.
The storage unit 62 stores program code executable by the processing unit 61 to cause the processing unit 61 to perform the steps according to various exemplary embodiments of the present application described in the "exemplary methods" section above in this specification.
The storage unit 62 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 621 and/or a cache memory unit 622, and may further include a read only memory unit (ROM) 623.
The storage unit 62 may also include a program/utility 624 having a set (at least one) of program modules 625, such program modules 625 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 63 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 60 may also communicate with one or more external devices (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 60, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 60 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 65. Also, the electronic device 60 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 66. As shown, network adapter 66 communicates with the other modules of electronic device 60 via bus 63. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 60, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to make a computing device (which can be a personal computer, a server, a terminal device, or a network device, etc.) execute the method according to the embodiments of the present application.
There is also provided, in accordance with an embodiment of the present application, a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, various aspects of the present application may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present application described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 7, a program product 70 for implementing the above method according to an embodiment of the present application is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present application is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present application, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (8)

1. A text recognition method, comprising:
acquiring a text to be recognized;
applying a hash algorithm to the text to be recognized to obtain a hash value corresponding to the text to be recognized;
acquiring a first text vector group corresponding to the hash value of the text to be recognized, wherein the first text vector group is formed by connecting character vectors corresponding to each character of the hash value of the text to be recognized in series according to the character sequence of the hash value of the text to be recognized;
acquiring characters to be recognized in the text to be recognized;
for each character to be recognized, acquiring a radical in the character to be recognized as a radical combination corresponding to the character to be recognized, inputting the radical combination corresponding to the character to be recognized into a first machine learning model to obtain a second vector corresponding to the character to be recognized, and arranging the second vectors into a second text vector group according to the sequence of the character to be recognized in the text to be recognized;
for each character to be recognized, obtaining the pinyin of the character to be recognized, inputting the pinyin of the character to be recognized into a second machine learning model to obtain a third vector corresponding to the character to be recognized, and arranging the third vector into a third text vector group according to the sequence of the character to be recognized in the text to be recognized;
respectively solving the vector distances between the first text vector group, the second text vector group and the third text vector group of the text to be recognized and the first standard text vector group, the second standard text vector group and the third standard text vector group of each standard text in the standard text library as the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text;
calculating a weighted average value of a first vector distance, a second vector distance and a third vector distance between the text to be recognized and each standard text, and taking the weighted average value as a weighted average vector distance between the text to be recognized and each standard text;
and taking the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
2. The text recognition method of claim 1,
before respectively solving the vector distances between the first text vector group, the second text vector group and the third text vector group of the text to be recognized and the first standard text vector group, the second standard text vector group and the third standard text vector group of each standard text in the standard text library as the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text, the method further comprises:
acquiring a plurality of standard texts, and, for each standard text, applying a hash algorithm to the standard text to obtain a hash value corresponding to the standard text;
acquiring a first standard text vector group corresponding to the hash value of the standard text, wherein the first standard text vector group is formed by connecting character vectors corresponding to each character of the hash value of the standard text in series according to the character sequence of the hash value of the standard text;
acquiring a standard character in the standard text;
for each standard character, acquiring a radical in the standard character as a radical combination corresponding to the standard character, inputting the radical combination corresponding to the standard character into a first machine learning model to obtain a second vector corresponding to the standard character, and arranging the second vectors into a second standard text vector group of the standard text according to the sequence of the standard character in the standard text;
and acquiring the pinyin of the standard character for each standard character, inputting the pinyin of the standard character into a second machine learning model to obtain a third vector corresponding to the standard character, and arranging the third vector into a third standard text vector group of the standard text according to the sequence of the standard character in the standard text.
3. The text recognition method of claim 1,
after the obtaining of the text to be recognized, the method further includes: counting the number of characters in the text to be recognized;
if the number of the characters in the text to be recognized reaches a set value, acquiring keywords to be recognized in the text to be recognized;
applying a hash algorithm to the keywords to be recognized to obtain hash values corresponding to the keywords to be recognized;
acquiring a first word vector group corresponding to the hash value of the keyword to be recognized, wherein the first word vector group is formed by connecting character vectors corresponding to each character of the hash value of the keyword to be recognized in series according to the character sequence of the hash value of the keyword to be recognized;
acquiring characters to be recognized in the keywords to be recognized;
for each character to be recognized, acquiring a radical in the character to be recognized as a radical combination corresponding to the character to be recognized, inputting the radical combination corresponding to the character to be recognized into a first machine learning model to obtain a second vector corresponding to the character to be recognized, and arranging the second vectors into a second word vector group according to the sequence of the character to be recognized in the keyword to be recognized;
for each character to be recognized, obtaining the pinyin of the character to be recognized, inputting the pinyin of the character to be recognized into a second machine learning model to obtain a third vector corresponding to the character to be recognized, and arranging the third vector into a third word vector group according to the sequence of the character to be recognized in the keyword to be recognized;
respectively solving the vector distances between the first word vector group, the second word vector group and the third word vector group of the keyword to be recognized and the first standard word vector group, the second standard word vector group and the third standard word vector group of each standard keyword in the standard keyword library as a fourth vector distance, a fifth vector distance and a sixth vector distance between the keyword to be recognized and each standard keyword;
calculating a weighted average value of a fourth vector distance, a fifth vector distance and a sixth vector distance between the keyword to be recognized and each standard keyword, wherein the weighted average value is used as a weighted average vector distance between the keyword to be recognized and each standard keyword;
and combining the standard keywords corresponding to the minimum weighted average vector distance according to the sequence of the keywords to be recognized in the text to be recognized, and taking the combined keywords as the recognition result of the text to be recognized.
4. The text recognition method of claim 1,
after the weighted average of the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text is obtained, the method further comprises the following steps:
acquiring a standard text of which the weighted average value reaches a threshold value,
if there are a plurality of standard texts of which the weighted average value reaches the threshold value, sending the plurality of standard texts to a user for selection;
and acquiring the standard text selected by the user as a recognition result of the text to be recognized.
5. The text recognition method of claim 1,
after acquiring, for each character to be recognized, the radical in the character to be recognized as the radical combination corresponding to the character to be recognized, the method further comprises:
searching a preset radical semantic comparison table according to each radical in the radical combination to obtain the corresponding semantic of each radical;
combining the semantics corresponding to the radicals according to the sequence of the radicals in the radical combination;
inputting the semantic combination into a third machine learning model to obtain a fourth vector corresponding to the character to be recognized, wherein the fourth vector is arranged into a fourth text vector group according to the sequence of the character to be recognized in the text to be recognized;
the method further comprises the following steps:
respectively solving vector distances between a first text vector group, a second text vector group, a third text vector group and a fourth text vector group of the text to be recognized and a first standard text vector group, a second standard text vector group, a third standard text vector group and a fourth standard text vector group of each standard text in a standard text library, and taking the vector distances as a first vector distance, a second vector distance, a third vector distance and a seventh vector distance between the text to be recognized and each standard text;
calculating a weighted average value of a first vector distance, a second vector distance, a third vector distance and a seventh vector distance between the text to be recognized and each standard text, and taking the weighted average value as a weighted average vector distance between the text to be recognized and each standard text;
and taking the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
6. A text recognition apparatus, comprising:
a text acquisition module, configured to acquire a text to be recognized;
a first calculation module, configured to apply a hash algorithm to the text to be recognized to obtain a hash value corresponding to the text to be recognized;
a first vector group obtaining module, configured to obtain a first text vector group corresponding to the hash value of the text to be recognized, where the first text vector group is formed by concatenating character vectors corresponding to each character of the hash value of the text to be recognized in a character order of the hash value of the text to be recognized;
the character acquisition module is used for acquiring characters to be recognized in the text to be recognized;
a second vector group obtaining module, configured to, for each character to be recognized, obtain a radical in the character to be recognized as a radical combination corresponding to the character to be recognized, and input the radical combination corresponding to the character to be recognized into a first machine learning model to obtain a second vector corresponding to the character to be recognized, where the second vectors are arranged into a second text vector group according to the sequence of the characters to be recognized in the text to be recognized;
a third vector group obtaining module, configured to obtain, for each character to be recognized, a pinyin of the character to be recognized, and input the pinyin of the character to be recognized into a second machine learning model to obtain a third vector corresponding to the character to be recognized, where the third vectors are arranged into a third text vector group according to the sequence of the characters to be recognized in the text to be recognized;
a second calculation module, configured to respectively calculate the vector distances between the first text vector group, the second text vector group and the third text vector group of the text to be recognized and the first standard text vector group, the second standard text vector group and the third standard text vector group of each standard text in the standard text library as the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text;
the third calculation module is used for solving the weighted average value of the first vector distance, the second vector distance and the third vector distance between the text to be recognized and each standard text, and the weighted average value is used as the weighted average vector distance between the text to be recognized and each standard text;
and the determining module is used for taking the standard text corresponding to the minimum weighted average vector distance as the recognition result of the text to be recognized.
7. A computer-readable program medium, characterized in that it stores computer program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1 to 5.
8. An electronic device, comprising:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 5.
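For orientation, the method of claim 1 can be sketched end to end as follows. This is a toy illustration under several loudly stated assumptions, not the claimed implementation: SHA-256 stands in for the unspecified hash algorithm, `HEX_VECTORS` is a hypothetical character-vector table, `RADICALS` and `PINYIN` are tiny hand-written lookup tables, `toy_model` replaces the trained first and second machine learning models with a deterministic stub, Euclidean distance with zero padding is an assumed vector distance, and equal weights are assumed.

```python
import hashlib
import math
from itertools import zip_longest

# Hypothetical character vectors for hash characters: 4-bit encoding per hex digit.
HEX_VECTORS = {c: [float(b) for b in format(i, "04b")]
               for i, c in enumerate("0123456789abcdef")}

# Toy lookup tables (assumptions); a real system would use full dictionaries.
RADICALS = {"河": ["氵", "可"], "何": ["亻", "可"]}
PINYIN = {"河": "he2", "何": "he2"}


def toy_model(symbols, dim=4):
    """Stub for a trained model: deterministic vector for a symbol sequence."""
    digest = hashlib.sha256("|".join(symbols).encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def first_text_vector_group(text):
    """Hash the text, then concatenate character vectors in hash order."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return [x for ch in digest for x in HEX_VECTORS[ch]]


def second_text_vector_group(text):
    """Per-character vectors from radical combinations, in text order."""
    return [x for ch in text for x in toy_model(RADICALS.get(ch, [ch]))]


def third_text_vector_group(text):
    """Per-character vectors from pinyin, in text order."""
    return [x for ch in text for x in toy_model([PINYIN.get(ch, ch)])]


def vector_distance(a, b):
    """Euclidean distance; the shorter group is zero-padded (assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip_longest(a, b, fillvalue=0.0)))


def recognize(text, standard_texts, weights=(1.0, 1.0, 1.0)):
    """Return the standard text with the minimum weighted average distance."""
    builders = (first_text_vector_group, second_text_vector_group,
                third_text_vector_group)
    query = [build(text) for build in builders]
    best, best_score = None, math.inf
    for standard in standard_texts:
        distances = [vector_distance(q, build(standard))
                     for q, build in zip(query, builders)]
        score = sum(w * d for w, d in zip(weights, distances)) / sum(weights)
        if score < best_score:
            best, best_score = standard, score
    return best, best_score


print(recognize("河", ["何", "河"]))  # ('河', 0.0)
```

In this toy setup 河 and 何 share the same pinyin entry, so their third vector groups coincide and only the hash-based and radical-based distances separate them.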
CN201910979595.2A 2019-10-15 2019-10-15 Text recognition method, text recognition device, text recognition medium and electronic equipment Active CN110929749B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910979595.2A CN110929749B (en) 2019-10-15 2019-10-15 Text recognition method, text recognition device, text recognition medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910979595.2A CN110929749B (en) 2019-10-15 2019-10-15 Text recognition method, text recognition device, text recognition medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN110929749A true CN110929749A (en) 2020-03-27
CN110929749B CN110929749B (en) 2022-04-29

Family

ID=69848950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910979595.2A Active CN110929749B (en) 2019-10-15 2019-10-15 Text recognition method, text recognition device, text recognition medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN110929749B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023236246A1 (en) * 2022-06-06 2023-12-14 青岛海尔科技有限公司 Text information recognition method and apparatus, and storage medium and electronic apparatus

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1725228A (en) * 2004-07-22 2006-01-25 摩托罗拉公司 Hand writing identification method and system using background picture element
CN108875537A (en) * 2018-02-28 2018-11-23 北京旷视科技有限公司 Method for checking object, device and system and storage medium
CN109165384A (en) * 2018-08-23 2019-01-08 成都四方伟业软件股份有限公司 A kind of name entity recognition method and device
CN109388807A (en) * 2018-10-30 2019-02-26 中山大学 The method, apparatus and storage medium of electronic health record name Entity recognition
CN110209892A (en) * 2019-04-17 2019-09-06 深圳壹账通智能科技有限公司 Sensitive information recognition methods, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110929749B (en) 2022-04-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant