CN108846367A - Rarely used word processing method, computing device, and computer storage medium - Google Patents

Publication number
CN108846367A
Authority
CN
China
Prior art keywords
rarely used
identified
used word
line
path
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810659246.8A
Other languages
Chinese (zh)
Other versions
CN108846367B (en
Inventor
张恒
李铭瀚
于刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ireader Technology Co Ltd
Zhangyue Technology Co Ltd
Original Assignee
Zhangyue Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhangyue Technology Co Ltd
Priority to CN201810659246.8A
Publication of CN108846367A
Application granted
Publication of CN108846367B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a rarely used word processing method, a computing device, and a computer storage medium. The method includes: identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the to-be-identified rarely used word information in that region; judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information; and, if so, using the recognition result corresponding to that identified rarely used word information as the recognition result of the rarely used word region to be identified. With this scheme, only a single recognition result needs to be saved for identical rarely used words, and when a to-be-identified rarely used word is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.

Description

Rarely used word processing method, computing device, and computer storage medium
Technical field
The present invention relates to the field of text recognition technology, and in particular to a rarely used word processing method, a computing device, and a computer storage medium.
Background
With the spread of mobile terminals such as mobile phones and the development of e-book readers, e-books are increasingly favored by readers. In an electronic reader, for document content to be displayed in the way best suited to reading on a given reading device, a fixed-layout document must be converted into a reflowable (streaming) document; for example, a PDF document is converted into an electronic publication document (Electronic Publication, abbreviated ePUB).
However, because the character encoding of fixed-layout documents such as PDF is limited, a large number of rarely used words can only be presented as path lines. For these words, a rarely used word recognition method is needed to obtain the recognition result of the rarely used word at a specific position, for example a picture of the rarely used word at that position. When the reflowable document is filled in, the saved recognition result for that position is used to fill in the rarely used word at the corresponding position, so that the reflowable document presented to the user is complete and in order. But if the same rarely used word appears at different positions in the fixed-layout document, a recognition result must be saved separately for each of those positions. For example, if rarely used word a appears 100 times in the fixed-layout document, 100 identical recognition results for a are saved. The saved recognition results therefore contain a large amount of duplication, which significantly increases the data volume of the book file.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a rarely used word processing method, a computing device, and a computer storage medium that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a rarely used word processing method is provided, including: identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the to-be-identified rarely used word information in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
According to another aspect of the present invention, a computing device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another over the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the to-be-identified rarely used word information in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
According to yet another aspect of the present invention, a computer storage medium is provided. At least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the to-be-identified rarely used word information in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
According to the rarely used word processing method, computing device, and computer storage medium provided by the present invention, it is judged according to a similarity matching rule whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information. If such information exists, the to-be-identified rarely used word is not identified and its recognition result is not saved, as would be done in the prior art; instead, the recognition result corresponding to the matched identified rarely used word information is used directly as the recognition result of the rarely used word region to be identified. With this scheme, only a single recognition result needs to be saved for identical rarely used words, and when a to-be-identified rarely used word is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.
The above is only an overview of the technical scheme of the present invention. So that the technical means of the present invention can be understood more clearly and implemented according to the contents of the specification, and so that the above and other objects, features, and advantages of the present invention are easier to understand, specific embodiments of the present invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, the same reference numbers refer to the same parts. In the drawings:
Fig. 1 shows a flow chart of a rarely used word processing method according to one embodiment of the present invention;
Fig. 2 shows a flow chart of a rarely used word processing method according to another embodiment of the present invention;
Fig. 3 shows a flow chart of a rarely used word processing method according to yet another embodiment of the present invention;
Fig. 4 shows a structural schematic diagram of a computing device according to an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 shows a flow chart of a rarely used word processing method according to one embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S101: identify the text objects of the document, determine the rarely used word region to be identified, and obtain the to-be-identified rarely used word information in that region.
Here, the document is a fixed-layout document. Its text objects include character text encoded by a character encoding and rarely used words drawn with path lines.
The to-be-identified rarely used word information is information that can represent the structural composition of the to-be-identified rarely used word. For example, it may be information indicating the number of path lines that form the to-be-identified rarely used word.
Specifically, when the text objects of a fixed-layout document are identified, character text is readily recognized by existing text recognition technology. A to-be-identified rarely used word, however, is drawn with path lines rather than obtained by encoding, so its recognition fails. That is, in the recognition result, no character text is recognized at the position of the to-be-identified rarely used word. On this basis, this step determines the rarely used word region to be identified from the recognition result at each position. The present invention places no restriction on the specific way of determining the region; in a specific implementation, those skilled in the art can choose a suitable way according to the actual situation. Optionally, the region is determined according to whether path lines exist at the corresponding position.
Step S102: judge, according to the similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information. If so, execute step S103; if not, the method ends.
The similarity matching rule is set according to what identical rarely used words have in common in their structural composition. For example, the positional relationships of the corresponding path lines of identical rarely used words are consistent, so the similarity matching rule can be set accordingly as: perform similarity matching according to the consistency between the positional relationships of the path lines of the to-be-identified rarely used word and those of the identified rarely used word.
Specifically, according to the similarity matching rule, the to-be-identified rarely used word information is compared for similarity with the identified rarely used word information in the recognition results, and the comparison result is used to judge whether identified rarely used word information matching the to-be-identified rarely used word information exists. If it exists, the identified rarely used word corresponding to the matched information is determined to be the same rarely used word as the to-be-identified one. The to-be-identified rarely used word is then not identified again and its identical recognition result is not saved a second time; instead, step S103 is executed, and the to-be-identified rarely used word is handled using the recognition result corresponding to the identified rarely used word information. If it does not exist, none of the identified rarely used words is the same as the to-be-identified rarely used word, that is, the to-be-identified rarely used word is a new rarely used word. It is then identified further; once identification is complete, its recognition result is saved as the recognition result of an identified rarely used word, and its rarely used word information is stored in the recognition results as identified rarely used word information, for use in the next matching judgment. The present invention does not specifically limit the method of identifying the to-be-identified rarely used word; in a specific implementation, those skilled in the art can choose the recognition method flexibly. Optionally, the to-be-identified rarely used word is identified from a screenshot of the rarely used word region to be identified.
For example, the recognition results of document 1 hold identified rarely used word information A, B, and C, corresponding respectively to identified rarely used words a, b, and c. The current to-be-identified rarely used word information D is compared with A, B, and C. If D matches B, the to-be-identified rarely used word d corresponding to D is determined to be the same as the identified rarely used word b corresponding to B, and step S103 is executed: d is handled using recognition result b, which corresponds to B. If D matches none of A, B, and C, the to-be-identified rarely used word d is determined to be a new rarely used word, and rarely used word recognition is carried out for d.
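The lookup in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name find_match, the dictionary layout, and the toy tuples standing in for rarely used word information are hypothetical, and plain equality stands in for the similarity matching rule.

```python
from typing import Any, Callable, Dict, Optional


def find_match(candidate: Any,
               identified: Dict[str, Any],
               matches: Callable[[Any, Any], bool]) -> Optional[str]:
    """Return the key of the first identified entry that matches, else None."""
    for key, info in identified.items():
        if matches(candidate, info):
            return key
    return None


# Hypothetical stand-ins for identified rarely used word information A, B, C.
identified = {"a": (3, 1), "b": (5, 2), "c": (7, 4)}


def same(x, y):
    # Placeholder for the similarity matching rule (assumption: exact equality).
    return x == y


hit = find_match((5, 2), identified, same)    # matches entry "b": reuse its result
miss = find_match((9, 9), identified, same)   # no match: treat as a new word
print(hit, miss)
```

A None return corresponds to the "new rarely used word" branch, where the word would be identified and its information added to the dictionary.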
Step S103: use the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
Specifically, after it is judged that the recognition results of the document contain identified rarely used word information matching the to-be-identified rarely used word information, an association is established between the to-be-identified rarely used word and the matched identified rarely used word. Optionally, the association is established between the to-be-identified rarely used word and the matched identified rarely used word information, or between the to-be-identified rarely used word and the recognition result corresponding to the matched identified rarely used word information. When the to-be-identified rarely used word needs to be displayed, the recognition result of the identical identified rarely used word is determined from the association, and that recognition result is then filled into the region where the to-be-identified rarely used word is displayed.
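One way to picture the association is a pair of lookup tables: one holding each recognition result once, the other linking every region to a result. The sketch below is an assumption about a possible data layout, not the layout used by the invention; all names and the file name "glyph-a.png" are hypothetical.

```python
# result_id -> recognition result, stored once per distinct rarely used word
recognition_results = {}
# region_id -> result_id, one small link per occurrence in the document
region_links = {}


def save_result(result_id, result):
    recognition_results[result_id] = result


def link_region(region_id, result_id):
    region_links[region_id] = result_id


def result_for_region(region_id):
    # Follow the association to the shared recognition result.
    return recognition_results[region_links[region_id]]


# Rarely used word "a" occurs 100 times but its result is saved only once.
save_result("a", "glyph-a.png")
for n in range(100):
    link_region(("page", n), "a")

print(len(recognition_results), len(region_links))
```

The saving comes from the links being much smaller than the recognition results they point to, for example when the result is a picture.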
According to the rarely used word processing method provided in this embodiment, it is judged according to the similarity matching rule whether the recognition results of the document contain identified rarely used word information that matches the to-be-identified rarely used word information. If such information exists, the to-be-identified rarely used word is not identified and its recognition result is not saved, as would be done in the prior art; instead, the recognition result corresponding to the matched identified rarely used word information is used directly as the recognition result of the rarely used word region to be identified. With the scheme of this embodiment, only a single recognition result needs to be saved for identical rarely used words, and when a to-be-identified rarely used word is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.
Fig. 2 shows a flow chart of a rarely used word processing method according to another embodiment of the present invention. In this embodiment, the rarely used word information is specifically path line information. As shown in Fig. 2, the method includes:
Step S201: identify the text objects of the document, determine the rarely used word region to be identified, and obtain the to-be-identified rarely used word path line information in that region.
The to-be-identified rarely used word path line information includes the number of path lines of the to-be-identified rarely used word and the position information of each of its path lines.
In this embodiment, a specified region is first determined from the recognition result of the text objects. The specified region is a region, located between two identified text objects, in which no text object has been recognized. Because there are many possible reasons why no text object is recognized, whether the specified region is a rarely used word region to be identified is determined by judging whether it meets preset rules. Judging whether the specified region meets the preset rules includes: judging whether the width of the specified region lies within a preset character width range and, if so, determining that the specified region meets the preset rules. Here, the width of the specified region is the distance between the two adjacent edges of the two identified text objects; optionally, the preset character width range is 1 to 2 character widths. And/or: judging whether the specified region contains path lines and, if so, determining that the specified region meets the preset rules. And/or: judging whether the specified region is covered with text content and, if not, determining that the specified region meets the preset rules. When the specified region is judged to meet the preset rules, it is determined to be a rarely used word region to be identified. However, the present invention is not limited to the ways of determining the rarely used word region to be identified described above.
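The preset rules above can be sketched as a simple predicate. The text allows the three checks to be combined by "and/or"; the sketch below assumes, purely for illustration, that all three must hold. The Gap class, its field names, and the default bounds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """A region between two identified text objects (hypothetical structure)."""
    width: float            # distance between the adjacent edges of the two objects
    has_path_lines: bool    # whether the region contains path lines
    covered_by_text: bool   # whether the region is covered with text content


def is_candidate_region(gap: Gap, char_width: float,
                        min_chars: float = 1.0, max_chars: float = 2.0) -> bool:
    # Rule 1: width within the preset character width range (1 to 2 characters).
    width_ok = min_chars * char_width <= gap.width <= max_chars * char_width
    # Rules 2 and 3: contains path lines and is not covered with text content.
    # Assumption: all three rules are required together.
    return width_ok and gap.has_path_lines and not gap.covered_by_text


narrow = Gap(width=18.0, has_path_lines=True, covered_by_text=False)
wide = Gap(width=40.0, has_path_lines=True, covered_by_text=False)
print(is_candidate_region(narrow, char_width=12.0),
      is_candidate_region(wide, char_width=12.0))
```

An implementation that uses only one or two of the rules would simply drop the corresponding terms from the return expression.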
Step S202: judge, according to the path line comparison rule, whether the recognition results of the document contain identified rarely used word path line information that matches the to-be-identified rarely used word path line information. If so, execute step S203; if not, execute step S204.
The path line comparison rule includes comparison rules at two levels: the number of path lines and the path line position information.
Specifically, the number of path lines of the to-be-identified rarely used word is compared with the number of path lines of the identified rarely used word to see whether they are equal. If they are equal, it is then judged whether the position information of each path line of the to-be-identified rarely used word matches the position information of each path line of the identified rarely used word. Optionally, the position information of each path line is the endpoint coordinates of that path line.
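The two-level comparison can be sketched as a cheap count check followed by a position check. In this illustration each path line is represented by its two endpoint coordinates, and the position check is passed in as a function; the names and the exact-equality position check are assumptions made for the example only.

```python
def path_line_info_matches(candidate, known, positions_match):
    # Level 1: the numbers of path lines must be equal.
    if len(candidate) != len(known):
        return False
    # Level 2: the position information of the path lines must match.
    return positions_match(candidate, known)


def exact_endpoints(candidate, known):
    # Placeholder position rule (assumption): endpoints are exactly equal.
    return all(c == k for c, k in zip(candidate, known))


# Each path line is ((x1, y1), (x2, y2)); the coordinates are made up.
three_strokes = [((0, 0), (10, 0)), ((5, 0), (5, 10)), ((0, 10), (10, 10))]
two_strokes = three_strokes[:2]

print(path_line_info_matches(three_strokes, list(three_strokes), exact_endpoints))
print(path_line_info_matches(two_strokes, three_strokes, exact_endpoints))
```

Checking the count first means the more expensive position comparison runs only for candidates with the same number of path lines.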
In a fixed-layout document, the multiple path lines of each rarely used word are ordered; for example, in a PDF document, the path lines of each rarely used word are numbered according to a unified numbering rule. In some embodiments of the present invention, whether the path line position information matches can be judged as follows:
Step 1: for path lines with the same path line number, calculate the coordinate difference between the endpoint coordinates of the to-be-identified rarely used word path line and the endpoint coordinates of the identified rarely used word path line. Specifically, a first coordinate origin and a second coordinate origin are first determined for the rarely used word region to be identified and for the identified rarely used word region respectively, and a first coordinate system and a second coordinate system are established respectively, where the first coordinate system of the region to be identified and the second coordinate system of the identified region are kept consistent relative to their regions. Taking the case where both regions are rectangular as an example: the top-left vertex of the region to be identified is the first coordinate origin, its top edge is the X axis of the first coordinate system, and its left edge is the Y axis of the first coordinate system; the top-left vertex of the identified region is the second coordinate origin, its top edge is the X axis of the second coordinate system, and its left edge is the Y axis of the second coordinate system. The endpoint coordinates of a to-be-identified rarely used word path line are the coordinates of its two endpoints in the first coordinate system, and the endpoint coordinates of an identified rarely used word path line are the coordinates of its two endpoints in the second coordinate system. Specifically, for path lines with the same path line number, the coordinate difference between the endpoint coordinates of the to-be-identified rarely used word path line and those of the identified rarely used word path line is calculated by the following formula (the formula itself is not reproduced in this text):
In the formula, i is the path line number, j is the path line endpoint index (j = 1, 2), r_ij is the coordinate difference of the j-th endpoint of path line i, x_ij is the X coordinate of the j-th endpoint of to-be-identified rarely used word path line i, x'_ij is the X coordinate of the j-th endpoint of identified rarely used word path line i, y_ij is the Y coordinate of the j-th endpoint of to-be-identified rarely used word path line i, and y'_ij is the Y coordinate of the j-th endpoint of identified rarely used word path line i.
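Because the formula is not available in this text, the sketch below assumes one plausible form of the coordinate difference, the Euclidean distance between corresponding endpoints, r_ij = sqrt((x_ij - x'_ij)^2 + (y_ij - y'_ij)^2). This choice is an assumption for illustration and may differ from the formula in the original patent; the function and type names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
PathLine = Tuple[Point, Point]   # the two endpoints (j = 1, 2) of one path line


def coordinate_differences(candidate: List[PathLine],
                           known: List[PathLine]) -> List[float]:
    """Return r_ij for every path line i and endpoint j, in numbering order.

    Assumption: r_ij is the Euclidean distance between the j-th endpoints of
    the path lines numbered i, each measured in its own region's coordinates.
    """
    if len(candidate) != len(known):
        raise ValueError("path line counts differ")
    diffs = []
    for line_c, line_k in zip(candidate, known):
        for (x, y), (xp, yp) in zip(line_c, line_k):
            diffs.append(math.hypot(x - xp, y - yp))
    return diffs


candidate = [((0.0, 0.0), (3.0, 4.0))]
known = [((0.0, 0.0), (0.0, 0.0))]
print(coordinate_differences(candidate, known))
```

Pairing path lines by position in the list relies on both lists following the same numbering rule, as the text describes for path lines within one document.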
Step 2: calculate the variance of the coordinate differences over all path lines of the to-be-identified rarely used word and the identified rarely used word. Specifically, the variance is calculated according to the following formula (the formula itself is not reproduced in this text):
In the formula, n is the number of path lines, s^2 is the variance of the coordinate differences of all path lines, and the remaining term is the average value of the coordinate differences of all path lines.
Step 3: judge whether the variance is less than a preset expected value; if so, determine that the position information of each path line of the to-be-identified rarely used word matches the position information of each path line of the identified rarely used word. Ideally, since identical rarely used words have the same number of strokes (that is, the same number of path lines) and the path lines within the same document are numbered by the same numbering rule, the endpoint positions of the correspondingly numbered path lines of identical strokes are also the same. If the to-be-identified rarely used word is identical to the identified rarely used word, the calculated variance is 0. In practice, however, there are sources of error, for example the error caused when the determined rarely used word region to be identified differs in size from the identified rarely used word region. A preset expected value is therefore used as the threshold for judging whether the position information of each path line of the to-be-identified rarely used word matches that of the identified rarely used word.
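Steps 1 to 3 can be put together in a short sketch. Two assumptions are made because the formulas are not available in this text: the coordinate difference is taken as the Euclidean distance between corresponding endpoints, and the variance is the population variance over all endpoint differences. The threshold value 0.5 and all names are hypothetical.

```python
import math
from statistics import pvariance


def positions_match(candidate, known, threshold=0.5):
    """Judge whether two sets of numbered path lines match by position.

    candidate, known: lists of ((x1, y1), (x2, y2)) endpoint pairs.
    threshold: the preset expected value (hypothetical default).
    """
    if len(candidate) != len(known):
        return False
    # Step 1: coordinate difference per endpoint (assumed Euclidean distance).
    diffs = [math.hypot(x - xp, y - yp)
             for line_c, line_k in zip(candidate, known)
             for (x, y), (xp, yp) in zip(line_c, line_k)]
    if not diffs:
        return False
    # Step 2: variance of the differences (assumed population variance).
    # Step 3: compare the variance with the preset expected value.
    return pvariance(diffs) < threshold


known = [((0, 0), (10, 0)), ((5, 0), (5, 10))]
shifted = [((1, 0), (11, 0)), ((6, 0), (6, 10))]       # same shape, offset by 1
different = [((0, 0), (10, 0)), ((0, 10), (10, 10))]   # second stroke differs

print(positions_match(shifted, known), positions_match(different, known))
```

Note that a uniform offset gives equal differences and therefore zero variance, so under these assumptions the check tolerates a consistent shift between the two regions while rejecting shapes whose strokes differ.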
It should be noted, however, that judging whether the path line position information matches from the variance of the coordinate differences of all path lines is only one preferred embodiment of the present invention, and the present invention is not limited to it. Those skilled in the art should understand that every embodiment that can be used to determine the consistency between the position information of each path line of the to-be-identified rarely used word and the position information of each path line of the identified rarely used word falls within the scope of the present invention. Optionally, in other specific implementations of the present invention, whether the position information of each path line of the to-be-identified rarely used word matches that of the identified rarely used word can also be judged from the length of each path line, or from the average value or standard deviation of the coordinate differences of all path lines.
If the position information of each path line of the to-be-identified rarely used word is judged to match the position information of each path line of an identified rarely used word, then the recognition results of the document contain identified rarely used word path line information that matches the to-be-identified rarely used word path line information. That is, the to-be-identified rarely used word is determined to be the same rarely used word as the identified rarely used word corresponding to the matched path line information. The to-be-identified rarely used word is then not identified again, and its recognition result, which is identical to the recognition result corresponding to the identified rarely used word path line information, is not saved a second time. Instead, step S203 is executed, and the to-be-identified rarely used word is handled using the recognition result corresponding to the identified rarely used word path line information. If the position information is judged not to match, then the recognition results of the document contain no identified rarely used word path line information that matches the to-be-identified rarely used word path line information, and step S204 is executed: the to-be-identified rarely used word is identified further and its recognition result is saved, for use in the next matching judgment.
Step S203: use the recognition result corresponding to the identified rarely used word path line information as the recognition result of the rarely used word region to be identified.
Step S204: identify the to-be-identified rarely used word, and save its recognition result into the recognition results.
If there is no the identification rarely used words with rarely used word path-line information matches to be identified in the recognition result of document Path-line information, i.e., rarely used word to be identified are new rarely used word, then identify, identified for rarely used word further progress to be identified Cheng Hou, using the recognition result of rarely used word information to be identified as the recognition result for having identified rarely used word and will be to be identified uncommon Word information is as having identified that rarely used word information is stored in recognition result, to use when matching judgment next time.
In some embodiments of the invention, the rarely used word to be identified is recognized as follows: a screenshot is taken of the rarely used word region to be identified to obtain a rarely used word picture to be identified, and the picture is recognized using a picture character recognition technique; optionally, the picture character recognition technique is OCR. If recognition outputs a single character, that character is used as the recognition result of the rarely used word to be identified. If recognition outputs multiple characters or fails to produce a result, the rarely used word picture to be identified is itself used as the recognition result. The present invention is not limited to this, however; in a specific implementation, the rarely used word to be identified may also be recognized in any other feasible manner.
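The fallback rule above can be sketched as follows. The `ocr` callable is a hypothetical placeholder for whatever OCR engine is used (the text names none), and stripping surrounding whitespace from its output is an added assumption.

```python
from typing import Callable, Optional, Union


def recognition_result(
    picture: bytes,
    ocr: Callable[[bytes], Optional[str]],
) -> Union[str, bytes]:
    """Pick the recognition result for one rarely used word picture.

    `ocr` is assumed to return the recognized text, or None on failure.
    """
    text = ocr(picture)
    if text is not None:
        text = text.strip()  # assumption: ignore whitespace around the output
        if len(text) == 1:
            return text      # exactly one character: use it as the result
    # Several characters, an empty string, or a failure:
    # keep the picture itself as the result.
    return picture
```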
According to the uncommon word processing method provided in this embodiment, the number of path-lines of the rarely used word to be identified is compared with the number of path-lines of an identified rarely used word, and the position information of each path-line of the rarely used word to be identified is compared with the position information of each path-line of the identified rarely used word. Based on the comparison, it is judged whether the recognition result of the document contains identified rarely used word path-line information that matches the path-line information of the rarely used word to be identified. If it does, the recognition result corresponding to the matching identified rarely used word path-line information is used as the recognition result of the rarely used word region to be identified. It can be seen that, with the scheme of this embodiment, only a single copy of the recognition result needs to be saved for identical rarely used words, and when the rarely used word to be identified is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the rarely used word region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.
Fig. 3 shows a flowchart of an uncommon word processing method according to another embodiment of the present invention. In this embodiment, the rarely used word information is a rarely used word picture obtained by taking a screenshot of the rarely used word region. As shown in Fig. 3, the method includes:
Step S301: recognize the text objects of the document, determine the rarely used word region to be identified, and obtain the rarely used word picture to be identified by taking a screenshot of that region.
Path-lines exist in the rarely used word region to be identified. In this step, a screenshot is taken of the rarely used word region to be identified to obtain the rarely used word picture to be identified, so that the picture retains the original information of the corresponding rarely used word region in the format document.
Specifically, the edges of the screenshot are determined as follows: the left edge of the screenshot is determined from the right edge of the text object to the left of the rarely used word region to be identified; the right edge is determined from the left edge of the text object to the right of the region; the top edge is determined from the higher of the top edges of the text objects on the two sides of the region; and the bottom edge is determined from the lower of the bottom edges of the text objects on the two sides of the region. The rarely used word picture to be identified is then obtained. Optionally, these four reference edges are used directly as the left, right, top and bottom edges of the screenshot. Alternatively, starting from the edges determined in this way, each edge is widened outward (to the left, to the right, upward and downward, respectively) by a preset width, and the widened edges are used as the edges of the screenshot; this allows the picture obtained by the screenshot to fully include the original information of the rarely used word to be identified in the format document.
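The edge rule can be illustrated with a small bounding-box calculation. The sketch assumes screen-style coordinates in which y grows downward, so the "higher" top edge is the smaller `top` value and the "lower" bottom edge is the larger `bottom` value; the `Box` type, the function name and the `margin` parameter (standing in for the preset width) are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def screenshot_box(left_neighbor: Box, right_neighbor: Box,
                   margin: float = 0.0) -> Box:
    """Derive the screenshot box from the two neighbouring text objects.

    Assumes y grows downward. `margin` widens every edge outward by
    the same preset width; 0 uses the reference edges directly.
    """
    return Box(
        left=left_neighbor.right - margin,
        top=min(left_neighbor.top, right_neighbor.top) - margin,
        right=right_neighbor.left + margin,
        bottom=max(left_neighbor.bottom, right_neighbor.bottom) + margin,
    )
```

In a coordinate system where y grows upward, as in some page description formats, the `min` and `max` calls and the sign of the vertical margin would be swapped.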
Step S302: according to a picture similarity calculation rule, judge whether the recognition result of the document contains an identified rarely used word picture that matches the rarely used word picture to be identified. If so, execute step S303; if not, execute step S304.
Specifically, it is judged whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold. For example, the rarely used word picture to be identified and the identified rarely used word picture are each binarized, and the similarity between them is then determined from the binarization results; the present invention is not, however, limited to this example. If the similarity is greater than or equal to the preset similarity threshold, it is determined that the recognition result of the document contains an identified rarely used word picture that matches the rarely used word picture to be identified, that is, the rarely used word to be identified is the same character as the identified rarely used word corresponding to the matching picture; step S303 is then executed, so that the rarely used word to be identified is processed according to the recognition result corresponding to the identified rarely used word information. If the similarity is less than the preset similarity threshold, step S304 is executed: the rarely used word to be identified is recognized and its recognition result is saved for use in later matching.
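One possible reading of the binarization example is sketched below. The text does not say how similarity is derived from the binarized pictures, so the measure used here (the fraction of pixels that agree) is an assumption, as are the grayscale list-of-rows representation, the binarization threshold of 128 and the default similarity threshold of 0.95.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # rows of 0..255 grayscale pixels (assumption)


def binarize(gray: Gray, threshold: int = 128) -> List[List[int]]:
    """Map dark pixels to 1 (ink) and light pixels to 0 (background)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def similarity(a: Gray, b: Gray, threshold: int = 128) -> float:
    """Fraction of pixels that agree after binarization, in [0, 1].

    Pictures of different sizes are treated as not similar at all;
    a real implementation would probably rescale them first.
    """
    ba, bb = binarize(a, threshold), binarize(b, threshold)
    if len(ba) != len(bb) or any(len(ra) != len(rb) for ra, rb in zip(ba, bb)):
        return 0.0
    total = sum(len(row) for row in ba)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(ba, bb) for pa, pb in zip(ra, rb))
    return same / total


def pictures_match(a: Gray, b: Gray, min_similarity: float = 0.95) -> bool:
    """Match when the similarity reaches the preset threshold."""
    return similarity(a, b) >= min_similarity
```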
Step S303: use the recognition result corresponding to the identified rarely used word picture as the recognition result of the rarely used word region to be identified.
Step S304: recognize the rarely used word to be identified, and save its recognition result into the recognition results.
According to the uncommon word processing method provided in this embodiment, a screenshot is taken of the rarely used word region to be identified to obtain the rarely used word picture to be identified; a picture similarity algorithm is used to calculate the similarity between the rarely used word picture to be identified and the identified rarely used word pictures in the recognition results; and it is judged whether the recognition results contain an identified rarely used word picture that matches the rarely used word picture to be identified. If they do, the recognition result corresponding to the matching identified rarely used word picture is used as the recognition result of the rarely used word region to be identified. It can be seen that, with the scheme of this embodiment, only a single copy of the recognition result needs to be saved for identical rarely used words, and when the rarely used word to be identified is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the rarely used word region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.
An embodiment of the present application provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the executable instruction can execute the uncommon word processing method in any of the above method embodiments.
The executable instruction may specifically be used to cause a processor to perform the following operations:
recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified from the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
In an optional embodiment, the rarely used word information is specifically rarely used word path-line information, and the executable instruction further causes the processor to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations:
calculating the coordinate difference between the endpoint coordinates of the path-line of the rarely used word to be identified and the endpoint coordinates of the path-line of the identified rarely used word that has the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
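The count check and the variance test can be sketched together. The intuition is that the same glyph drawn at another position shifts every path-line by the same offset, so the per-line coordinate differences are nearly constant and their variance is close to zero. The sketch makes several assumptions that the text leaves open: each path-line is represented by a single endpoint, the lists are ordered by path-line number, the variance is taken separately for x and y, and `max_variance` stands in for the preset expected value.

```python
from statistics import pvariance
from typing import List, Tuple

Point = Tuple[float, float]  # one endpoint per path-line (assumption)


def path_lines_match(candidate: List[Point], saved: List[Point],
                     max_variance: float = 1e-3) -> bool:
    """Compare two rarely used words by their path-line endpoints.

    Index i in both lists is assumed to be the path-line with number i.
    """
    # First check: the path-line counts must be equal.
    if len(candidate) != len(saved) or not candidate:
        return False
    # Coordinate difference for each pair of same-numbered path-lines.
    dx = [c[0] - s[0] for c, s in zip(candidate, saved)]
    dy = [c[1] - s[1] for c, s in zip(candidate, saved)]
    # A pure translation gives constant differences, hence variance 0.
    return pvariance(dx) < max_variance and pvariance(dy) < max_variance
```

For example, a saved glyph with endpoints `[(0, 0), (1, 2), (3, 1)]` matches the same glyph shifted by `(10, 5)`, while a glyph with the same number of path-lines but a different shape yields a larger variance and is rejected.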
In an optional embodiment, the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the executable instruction further causes the processor to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
Fig. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include: a processor 402, a communications interface 404, a memory 406, and a communication bus 408.
Wherein:
The processor 402, the communication interface 404 and the memory 406 communicate with one another through the communication bus 408.
The communication interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is used to execute a program 410, and may specifically perform the relevant steps in the above embodiments of the uncommon word processing method.
Specifically, the program 410 may include program code, and the program code includes computer operation instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The program 410 may specifically be used to cause the processor 402 to perform the following operations:
recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified from the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
In an optional embodiment, the rarely used word information is specifically rarely used word path-line information;
the program 410 may further be used to cause the processor 402 to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
In an optional embodiment, the program 410 may further be used to cause the processor 402 to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
In an optional embodiment, the program 410 may further be used to cause the processor 402 to perform the following operations:
calculating the coordinate difference between the endpoint coordinates of the path-line of the rarely used word to be identified and the endpoint coordinates of the path-line of the identified rarely used word that has the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
In an optional embodiment, the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the program 410 may further be used to cause the processor 402 to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail, so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in less than all features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components in the embodiments may be combined into one module, unit or component, and may furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, equivalent or similar purpose.
In addition, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The use of the words first, second and third does not indicate any order; these words may be interpreted as names.
The invention discloses: A1. An uncommon word processing method, including:
recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified from the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
A2. The method according to A1, wherein the rarely used word information is specifically rarely used word path-line information;
the judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified is specifically:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
A3. The method according to A2, wherein the judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified further includes:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
A4. The method according to A3, wherein the judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word further includes:
calculating the coordinate difference between the endpoint coordinates of the path-line of the rarely used word to be identified and the endpoint coordinates of the path-line of the identified rarely used word that has the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
A5. The method according to A1, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified is specifically:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
The invention also discloses: B6. A computing device, including: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified from the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
B7. The computing device according to B6, wherein the rarely used word information is specifically rarely used word path-line information, and the executable instruction further causes the processor to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
B8. The computing device according to B7, wherein the executable instruction further causes the processor to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
B9. The computing device according to B8, wherein the executable instruction further causes the processor to perform the following operations:
calculating the coordinate difference between the endpoint coordinates of the path-line of the rarely used word to be identified and the endpoint coordinates of the path-line of the identified rarely used word that has the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
B10. The computing device according to B6, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the executable instruction further causes the processor to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
The invention also discloses: C11. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified from the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
C12. The computer storage medium according to C11, wherein the rarely used word information is specifically rarely used word path-line information, and the executable instruction further causes the processor to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
C13. The computer storage medium according to C12, wherein the executable instruction further causes the processor to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
C14. The computer storage medium according to C13, wherein the executable instruction further causes the processor to perform the following operations:
calculating the coordinate difference between the endpoint coordinates of the path-line of the rarely used word to be identified and the endpoint coordinates of the path-line of the identified rarely used word that has the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
C15. The computer storage medium according to C11, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the executable instruction further causes the processor to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.

Claims (10)

1. An uncommon word processing method, comprising:
Recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in that region;
Determining, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified;
If so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
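The reuse step of claim 1 could be sketched roughly as below. This is a minimal illustration, not the patented implementation: the function name `resolve_rare_word`, the list of `(information, result)` pairs and the `matches` callback standing in for the similarity matching rule are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

# Rarely used word information (path-lines, a picture, ...); type is left open.
Info = TypeVar("Info")


def resolve_rare_word(
    pending: Info,
    recognized: List[Tuple[Info, str]],
    matches: Callable[[Info, Info], bool],
) -> Optional[str]:
    """Return the stored result of the first identified entry matching `pending`.

    `recognized` holds (information, recognition result) pairs already obtained
    for the same document. `matches` is a hypothetical stand-in for the
    similarity matching rule. None means no identified entry matched, so the
    caller has to recognize the region some other way.
    """
    for info, result in recognized:
        if matches(pending, info):
            return result
    return None


# Toy usage: information is a plain string and the rule is simple equality.
identified = [("glyph-a", "result-a"), ("glyph-b", "result-b")]
print(resolve_rare_word("glyph-b", identified, lambda a, b: a == b))
print(resolve_rare_word("glyph-z", identified, lambda a, b: a == b))
```

Passing the matching rule as a callback keeps the lookup independent of whether the information is path-line data or a picture.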
2. The method according to claim 1, wherein the rarely used word information is specifically rarely used word path-line information;
The determining, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified is specifically:
Determining, according to a path-line comparison rule, whether the recognition results of the document contain identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
3. The method according to claim 2, wherein the determining, according to a path-line comparison rule, whether the recognition results of the document contain identified rarely used word path-line information that matches the rarely used word path-line information to be identified further comprises:
Comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
If so, determining whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
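The two-stage check of claim 3 might look like the following sketch. It assumes, purely for illustration, that each path-line is represented by its two endpoints; `path_lines_match` and the `positions_match` callback are hypothetical names, and the position comparison itself is left to the caller.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]
# Assumption: a path-line is described by its start and end point.
PathLine = Tuple[Point, Point]


def path_lines_match(
    pending: Sequence[PathLine],
    recognized: Sequence[PathLine],
    positions_match: Callable[[Sequence[PathLine], Sequence[PathLine]], bool],
) -> bool:
    """Compare path-line counts first, then position information."""
    # Stage 1: words with different numbers of path-lines cannot match.
    if len(pending) != len(recognized):
        return False
    # Stage 2: only now run the (possibly more expensive) position comparison.
    return positions_match(pending, recognized)


# Toy usage with exact equality as the position rule.
one_line = [((0.0, 0.0), (1.0, 0.0))]
two_lines = [((0.0, 0.0), (1.0, 0.0)), ((0.0, 1.0), (1.0, 1.0))]
exact = lambda a, b: list(a) == list(b)
print(path_lines_match(one_line, two_lines, exact))
print(path_lines_match(two_lines, two_lines, exact))
```

Checking the count first is a cheap way to discard most candidates before any coordinates are compared.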
4. The method according to claim 3, wherein the determining whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word further comprises:
Calculating, for path-lines having the same path-line number, the coordinate difference between the endpoint coordinates of the rarely used word path-line to be identified and the endpoint coordinates of the identified rarely used word path-line;
Calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
Determining whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
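A possible reading of the variance test in claim 4 is sketched below. The claim does not say how the coordinate difference is measured, so this example assumes the Euclidean distance between corresponding endpoints and the population variance; the function names, the endpoint-pair representation of a path-line and the threshold value are likewise assumptions. It requires Python 3.8 or later for `math.dist`.

```python
import math
from statistics import pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
PathLine = Tuple[Point, Point]  # assumed: (start point, end point)


def endpoint_differences(
    pending: Sequence[PathLine], recognized: Sequence[PathLine]
) -> List[float]:
    """Distance between corresponding endpoints, paired by path-line number.

    Assumes both sequences have the same length and the same ordering.
    """
    diffs: List[float] = []
    for (p_start, p_end), (r_start, r_end) in zip(pending, recognized):
        diffs.append(math.dist(p_start, r_start))
        diffs.append(math.dist(p_end, r_end))
    return diffs


def positions_match(
    pending: Sequence[PathLine],
    recognized: Sequence[PathLine],
    expected_value: float = 1.0,  # hypothetical preset expected value
) -> bool:
    """True when the variance of the endpoint differences is below the preset value."""
    diffs = endpoint_differences(pending, recognized)
    if not diffs:
        return False  # nothing to compare
    return pvariance(diffs) < expected_value


reference = [((0.0, 0.0), (10.0, 0.0)), ((5.0, 0.0), (5.0, 10.0))]
shifted = [((3.0, 4.0), (13.0, 4.0)), ((8.0, 4.0), (8.0, 14.0))]
distorted = [((0.0, 0.0), (10.0, 0.0)), ((5.0, 0.0), (25.0, 30.0))]
print(positions_match(shifted, reference))
print(positions_match(distorted, reference))
```

With these assumptions, the uniformly shifted word gives identical differences and passes, while the distorted word gives uneven differences and fails.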
5. The method according to claim 1, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
The determining, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified is specifically:
Determining, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified.
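The threshold test of claim 5 can be illustrated with a deliberately simple similarity measure. The claim does not specify the picture similarity calculation rule; this sketch assumes binarized pictures of equal size and uses the fraction of identical pixels, which is only one of many possible rules. The names `pixel_similarity` and `has_matching_picture` and the threshold value are hypothetical.

```python
from typing import Iterable, Sequence

# Assumption: a picture is a binarized bitmap given as rows of 0/1 pixels.
Bitmap = Sequence[Sequence[int]]


def pixel_similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that are identical in two equally sized bitmaps."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("bitmaps must have the same dimensions")
    total = sum(len(row) for row in a)
    if total == 0:
        raise ValueError("bitmaps must not be empty")
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def has_matching_picture(
    pending: Bitmap,
    recognized: Iterable[Bitmap],
    threshold: float = 0.8,  # hypothetical preset similarity threshold
) -> bool:
    """True if any identified picture reaches the similarity threshold."""
    return any(pixel_similarity(pending, pic) >= threshold for pic in recognized)


pending = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 0]]
close = [[1, 0, 1, 0, 1], [0, 1, 0, 1, 1]]     # one pixel differs
inverse = [[0, 1, 0, 1, 0], [1, 0, 1, 0, 1]]   # every pixel differs
print(has_matching_picture(pending, [inverse, close]))
print(has_matching_picture(pending, [inverse]))
```

A real system would more likely normalize the screenshots and use a more robust measure, but the comparison against the preset threshold would have the same shape.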
6. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another via the communication bus;
The memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
Recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in that region;
Determining, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified;
If so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
7. The computing device according to claim 6, wherein the rarely used word information is specifically rarely used word path-line information; the executable instruction further causes the processor to perform the following operation:
Determining, according to a path-line comparison rule, whether the recognition results of the document contain identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
8. The computing device according to claim 7, wherein the executable instruction further causes the processor to perform the following operations:
Comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
If so, determining whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
9. The computing device according to claim 8, wherein the executable instruction further causes the processor to perform the following operations:
Calculating, for path-lines having the same path-line number, the coordinate difference between the endpoint coordinates of the rarely used word path-line to be identified and the endpoint coordinates of the identified rarely used word path-line;
Calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
Determining whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
10. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
Recognizing the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in that region;
Determining, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified;
If so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810659246.8A CN108846367B (en) 2018-06-25 2018-06-25 Uncommon word processing method, computing device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810659246.8A CN108846367B (en) 2018-06-25 2018-06-25 Uncommon word processing method, computing device and computer storage medium

Publications (2)

Publication Number Publication Date
CN108846367A true CN108846367A (en) 2018-11-20
CN108846367B CN108846367B (en) 2019-08-30

Family

ID=64202037

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810659246.8A Active CN108846367B (en) 2018-06-25 2018-06-25 Uncommon word processing method, computing device and computer storage medium

Country Status (1)

Country Link
CN (1) CN108846367B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069767A (en) * 2019-04-23 2019-07-30 掌阅科技股份有限公司 Composition method, electronic equipment and computer storage medium based on e-book
CN111539383A (en) * 2020-05-22 2020-08-14 浙江蓝鸽科技有限公司 Formula knowledge point identification method and device
CN117151041A (en) * 2023-10-27 2023-12-01 成方金融科技有限公司 PDF (Portable document Format) generation method, device, equipment and storage medium compatible with rarely used words

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1622122A (en) * 2003-11-28 2005-06-01 佳能株式会社 Method, device and storage medium for character recognition
US20080193008A1 (en) * 2007-02-09 2008-08-14 Jpmorgan Chase Bank, N.A. System and Method for Generating Magnetic Ink Character Recognition (MICR) Testing Documents
CN101901348A (en) * 2010-06-29 2010-12-01 北京捷通华声语音技术有限公司 Normalization based handwriting identifying method and identifying device
CN102542264A (en) * 2011-12-22 2012-07-04 北京语言大学 Method and device for automatically evaluating right and wrong of Chinese character writing on basis of digital handwriting equipment
CN103154974A (en) * 2011-03-07 2013-06-12 株式会社Ntt都科摩 Character recognition device, character recognition method, character recognition system, and character recognition program
CN103186581A (en) * 2011-12-30 2013-07-03 牟颖 Method for quickly acquiring pronunciation of uncommon word in book through mobile phone
CN103425257A (en) * 2012-05-24 2013-12-04 北京搜狗科技发展有限公司 Method and device for prompting information of uncommon characters
CN103457973A (en) * 2012-06-01 2013-12-18 深圳市腾讯计算机***有限公司 Image uploading method and system, image uploading client terminal and network server
CN108153731A (en) * 2017-12-25 2018-06-12 掌阅科技股份有限公司 Uncommon word processing method, computing device and computer storage media

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王慧: ""基于模板匹配的手写体字符识别算法研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069767A (en) * 2019-04-23 2019-07-30 掌阅科技股份有限公司 Composition method, electronic equipment and computer storage medium based on e-book
CN110069767B (en) * 2019-04-23 2020-02-28 掌阅科技股份有限公司 Typesetting method based on electronic book, electronic equipment and computer storage medium
CN111539383A (en) * 2020-05-22 2020-08-14 浙江蓝鸽科技有限公司 Formula knowledge point identification method and device
CN111539383B (en) * 2020-05-22 2023-05-05 浙江蓝鸽科技有限公司 Formula knowledge point identification method and device
CN117151041A (en) * 2023-10-27 2023-12-01 成方金融科技有限公司 PDF (Portable document Format) generation method, device, equipment and storage medium compatible with rarely used words
CN117151041B (en) * 2023-10-27 2024-02-27 成方金融科技有限公司 PDF (Portable document Format) generation method, device, equipment and storage medium compatible with rarely used words

Also Published As

Publication number Publication date
CN108846367B (en) 2019-08-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant