CN108846367A - Rarely used character processing method, computing device, and computer storage medium - Google Patents
- Publication number
- CN108846367A (application CN201810659246.8A)
- Authority
- CN
- China
- Prior art keywords
- rarely used
- identified
- used word
- line
- path
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Character Discrimination (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a rarely used character processing method, a computing device, and a computer storage medium. The method includes: recognizing the text objects of a document, determining a rarely used character region to be recognized, and obtaining the to-be-recognized rarely used character information in that region; judging, according to a similarity matching rule, whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information; and, if so, using the recognition result corresponding to the recognized rarely used character information as the recognition result of the rarely used character region to be recognized. With this scheme, only a single recognition result needs to be saved for identical rarely used characters, and when a to-be-recognized rarely used character is displayed, the recognition result of the identical recognized character is used directly as the recognition result of the region. This avoids storing a large number of duplicate recognition results and thus reduces the size of the book file.
Description
Technical field
The present invention relates to the field of text recognition technology, and in particular to a rarely used character processing method, a computing device, and a computer storage medium.
Background art
At present, with the spread of mobile terminals such as mobile phones and the development of e-book readers, e-books are increasingly favored by readers. At the same time, so that an e-reader can display document content in the way best suited to reading on the particular device, a fixed-layout (format) document needs to be converted into a reflowable (streaming) document; for example, a PDF document is converted into an electronic publication document (Electronic Publication, ePUB for short).
However, because the character encodings available in fixed-layout documents such as PDF are limited, many rarely used characters can only be rendered as path lines. For these characters, a rarely used character recognition method is needed to obtain a recognition result for the character at a specific position, for example a picture of the character at that position. When the reflowable document is populated, the saved recognition result for that position is used to fill in the rarely used character there, so that the reflowable document presented to the user is complete and in order. But if the same rarely used character appears at different positions in the fixed-layout document, a recognition result must be saved for each of those positions. For example, if rarely used character a occurs 100 times in the fixed-layout document, 100 identical recognition results for a are saved. The saved recognition results are therefore highly redundant, which significantly increases the data size of the book file.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a rarely used character processing method, a computing device, and a computer storage medium that overcome the above problems or at least partially solve them.
According to one aspect of the invention, a rarely used character processing method is provided, comprising: recognizing the text objects of a document, determining a rarely used character region to be recognized, and obtaining the to-be-recognized rarely used character information in that region;
judging, according to a similarity matching rule, whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information;
if so, using the recognition result corresponding to the recognized rarely used character information as the recognition result of the rarely used character region to be recognized.
According to another aspect of the invention, a computing device is provided, comprising a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
recognizing the text objects of a document, determining a rarely used character region to be recognized, and obtaining the to-be-recognized rarely used character information in that region;
judging, according to a similarity matching rule, whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information;
if so, using the recognition result corresponding to the recognized rarely used character information as the recognition result of the rarely used character region to be recognized.
According to yet another aspect of the invention, a computer storage medium is provided. At least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
recognizing the text objects of a document, determining a rarely used character region to be recognized, and obtaining the to-be-recognized rarely used character information in that region;
judging, according to a similarity matching rule, whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information;
if so, using the recognition result corresponding to the recognized rarely used character information as the recognition result of the rarely used character region to be recognized.
With the rarely used character processing method, computing device, and computer storage medium provided by the invention, it is judged according to a similarity matching rule whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information. If such information exists, the to-be-recognized rarely used character is not recognized and its result saved again, as in the prior art; instead, the recognition result corresponding to the matching recognized rarely used character information is used directly as the recognition result of the rarely used character region to be recognized. Thus, for identical rarely used characters, only a single recognition result needs to be saved, and when a to-be-recognized rarely used character is displayed, the recognition result of the identical recognized character is used directly as the recognition result of the region. This avoids storing a large number of duplicate recognition results and thus reduces the size of the book file.
The above is only an overview of the technical solution of the invention. To make the technical means of the invention clearer so that it can be implemented in accordance with the specification, and to make the above and other objects, features, and advantages of the invention easier to understand, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a flowchart of a rarely used character processing method according to one embodiment of the invention;
Fig. 2 shows a flowchart of a rarely used character processing method according to another embodiment of the invention;
Fig. 3 shows a flowchart of a rarely used character processing method according to yet another embodiment of the invention;
Fig. 4 shows a schematic structural diagram of a computing device according to an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
Fig. 1 shows a flowchart of a rarely used character processing method according to one embodiment of the invention. As shown in Fig. 1, the method includes:
Step S101: recognize the text objects of the document, determine the rarely used character region to be recognized, and obtain the to-be-recognized rarely used character information in that region.
Here the document is a fixed-layout (format) document. Its text objects include character text produced by a character encoding and rarely used characters drawn with path lines.
The to-be-recognized rarely used character information is information that can represent the structural composition of the rarely used character to be recognized. For example, it may be the number of path lines that make up the character.
Specifically, when the text objects of the fixed-layout document are recognized, character text is readily recognized with existing text recognition technology. A rarely used character to be recognized, however, is drawn with path lines rather than obtained from an encoding, so recognition fails for it. That is, no character text is recognized at the position of the rarely used character in the recognition result. On this basis, in this step the rarely used character region to be recognized is determined from the recognition result at each position. The invention does not limit the specific way of determining the region; in a specific implementation, those skilled in the art may choose a suitable way according to the actual situation. Optionally, the region is determined according to whether path lines are present at the corresponding position.
Step S102: judge, according to a similarity matching rule, whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information. If so, execute step S103; if not, the method ends.
The similarity matching rule is set according to what is similar in the structural composition of identical rarely used characters. For example, the path lines of identical rarely used characters have consistent positional relationships, so the rule can be set accordingly as: perform similarity matching according to the consistency between the positional relationships of the path lines of the to-be-recognized rarely used character and those of the recognized rarely used character.
Specifically, according to the similarity matching rule, the to-be-recognized rarely used character information is compared for similarity with the recognized rarely used character information in the recognition results, and the comparison result is used to judge whether any recognized information matches the to-be-recognized information. If a match exists, the recognized rarely used character corresponding to the matching information is determined to be the same character as the one to be recognized. The to-be-recognized character is then not recognized again, and an identical recognition result is not saved a second time; instead, step S103 is executed so that the to-be-recognized character is handled using the recognition result corresponding to the recognized rarely used character information. If no match exists, none of the recognized rarely used characters is the same as the one to be recognized; that is, the to-be-recognized character is a new rarely used character. It is then recognized, and once recognition is complete its recognition result is stored as the recognition result of a recognized rarely used character, and its information is stored in the recognition results as recognized rarely used character information, for use in the next matching judgment. The invention does not limit the method used to recognize the to-be-recognized character; in a specific implementation, those skilled in the art may choose the recognition method flexibly. Optionally, the character is recognized from a screenshot of the rarely used character region to be recognized.
For example, the recognition results of document 1 store recognized rarely used character information A, B, and C, corresponding to recognized rarely used characters a, b, and c respectively. The current to-be-recognized rarely used character information D is compared with A, B, and C. If D matches B, the to-be-recognized character d corresponding to D is determined to be the same as the recognized character b corresponding to B, and step S103 is executed: d is handled using recognition result b corresponding to B. If D matches none of A, B, and C, d is determined to be a new rarely used character, and rarely used character recognition is performed on d.
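The lookup-then-recognize flow of step S102 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `process_region`, the tuple-based `Info` descriptor, and the `matches` and `recognize` callables are all hypothetical stand-ins for the similarity matching rule and the recognition method, which the text leaves open.

```python
from typing import Callable, List, Tuple

Info = Tuple[int, ...]                 # hypothetical structural descriptor
Store = List[Tuple[Info, str]]         # (recognized info, recognition result)


def process_region(
    info: Info,
    store: Store,
    matches: Callable[[Info, Info], bool],
    recognize: Callable[[Info], str],
) -> Tuple[str, bool]:
    """Return (recognition result, whether an existing result was reused)."""
    for known_info, known_result in store:
        if matches(info, known_info):
            return known_result, True      # step S103: reuse the stored result
    result = recognize(info)               # new character: recognize it once
    store.append((info, result))           # keep it for later matching
    return result, False


# Document 1 already holds information A, B, C for characters a, b, c.
store: Store = [((3,), "a"), ((5,), "b"), ((7,), "c")]
same = lambda p, q: p == q                 # stand-in for the similarity rule
print(process_region((5,), store, same, lambda _: "d"))   # matches B
print(process_region((9,), store, same, lambda _: "d"))   # no match, stored
```

Only the second call grows `store`; the first reuses the entry saved for b.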
Step S103: use the recognition result corresponding to the recognized rarely used character information as the recognition result of the rarely used character region to be recognized.
Specifically, after it is judged that the recognition results of the document contain recognized rarely used character information matching the to-be-recognized information, an association is established between the to-be-recognized rarely used character and the matching recognized rarely used character. Optionally, the association is established between the to-be-recognized character and the matching recognized rarely used character information, or between the to-be-recognized character and the recognition result corresponding to the matching recognized rarely used character information. When the to-be-recognized character needs to be displayed, the recognition result of the identical recognized character is determined from the association and then filled into the region in which the to-be-recognized character is displayed.
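One way to picture the association is as a mapping from each occurrence to a shared result identifier. The sketch below assumes hypothetical names throughout (`results`, `region_to_result`, `associate`, `fill`, and a `(page, index)` region key); the text does not specify a data structure.

```python
from typing import Dict, Tuple

Region = Tuple[int, int]          # hypothetical key: (page number, index on page)

results: Dict[int, str] = {}      # result id -> recognition result, stored once
region_to_result: Dict[Region, int] = {}   # the association relation


def associate(region: Region, result_id: int) -> None:
    """Record that this region shows the character with the given result."""
    region_to_result[region] = result_id


def fill(region: Region) -> str:
    """Look up the shared recognition result when the region is displayed."""
    return results[region_to_result[region]]


# Character a occurs 100 times but its result is stored a single time.
results[0] = "glyph-a.png"
for index in range(100):
    associate((1, index), 0)
```

Each of the 100 regions costs one small identifier rather than a full copy of the recognition result, which is where the reduction in book file size comes from.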
With the rarely used character processing method provided in this embodiment, it is judged according to a similarity matching rule whether the recognition results of the document contain recognized rarely used character information that matches the to-be-recognized rarely used character information. If such information exists, the to-be-recognized character is not recognized and its result saved again, as in the prior art; instead, the recognition result corresponding to the matching recognized rarely used character information is used directly as the recognition result of the rarely used character region to be recognized. Thus, with the scheme of this embodiment, only a single recognition result needs to be saved for identical rarely used characters, and when a to-be-recognized character is displayed, the recognition result of the identical recognized character is used directly as the recognition result of the region. This avoids storing a large number of duplicate recognition results and thus reduces the size of the book file.
Fig. 2 shows a flowchart of a rarely used character processing method according to another embodiment of the invention. In this embodiment, the rarely used character information is specifically path line information. As shown in Fig. 2, the method includes:
Step S201: recognize the text objects of the document, determine the rarely used character region to be recognized, and obtain the to-be-recognized rarely used character path line information in that region.
The to-be-recognized rarely used character path line information includes the number of path lines of the to-be-recognized character and the position information of each of its path lines.
In this embodiment, a specified region is first determined from the recognition result of the text objects. The specified region is a region lying between two recognized text objects that covers a text object that was not recognized. Because a text object can go unrecognized for several reasons, whether the specified region is a rarely used character region to be recognized is determined by judging whether it satisfies preset rules. Judging whether the specified region satisfies the preset rules includes: judging whether the width of the specified region falls within a preset character width range and, if so, determining that the region satisfies the preset rules. Here the width of the specified region is the distance between the two adjacent edges of the two recognized text objects; optionally, the preset character width range is 1 to 2 character widths. And/or: judging whether the specified region contains path lines and, if so, determining that it satisfies the preset rules. And/or: judging whether the specified region is covered with text content and, if not, determining that it satisfies the preset rules. When the specified region is judged to satisfy the preset rules, it is determined to be a rarely used character region to be recognized. However, the invention is not limited to the ways of determining the region described above.
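The three preset rules above can be sketched as a single predicate. The source joins them with "and/or", so requiring all three at once is only one possible combination; the `Gap` record, its field names, and `is_candidate_region` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """Hypothetical description of the space between two recognized text objects."""
    width: float             # distance between the two adjacent edges
    char_width: float        # width of one ordinary character in this line
    path_line_count: int     # number of path lines found inside the gap
    covered_by_text: bool    # True if encoded text content covers the gap


def is_candidate_region(gap: Gap, min_chars: float = 1.0, max_chars: float = 2.0) -> bool:
    """Apply all three rules together (one choice among the and/or options)."""
    width_ok = min_chars * gap.char_width <= gap.width <= max_chars * gap.char_width
    has_paths = gap.path_line_count > 0
    return width_ok and has_paths and not gap.covered_by_text
```

A stricter or looser combination (for example, the width rule alone) would also be consistent with the text; the default range of 1 to 2 character widths is the optional value the text gives.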
Step S202: judge, according to a path line comparison rule, whether the recognition results of the document contain recognized rarely used character path line information that matches the to-be-recognized rarely used character path line information. If so, execute step S203; if not, execute step S204.
The path line comparison rule compares at two levels: the number of path lines and the position information of the path lines.
Specifically, the number of path lines of the to-be-recognized rarely used character is compared with the number of path lines of the recognized rarely used character. If the two are equal, it is then judged whether the position information of each path line of the to-be-recognized character matches the position information of each path line of the recognized character. Optionally, the position information of a path line is its endpoint coordinates.
In a fixed-layout document, the path lines of each rarely used character are ordered; for example, in a PDF document, the path lines of each rarely used character are numbered according to a unified numbering rule. In some embodiments of the invention, whether the path line position information matches can then be judged as follows:
Step 1: calculate the coordinate difference between the endpoint coordinates of the to-be-recognized rarely used character path line and those of the recognized rarely used character path line with the same path line number. Specifically, a first coordinate origin and a second coordinate origin are first determined for the rarely used character region to be recognized and the recognized rarely used character region respectively, and a first coordinate system and a second coordinate system are established accordingly, such that the first coordinate system of the region to be recognized and the second coordinate system of the recognized region are defined consistently relative to their regions. Taking both regions to be rectangular as an example: the top-left vertex of the region to be recognized is the first coordinate origin, its top edge is the X axis of the first coordinate system, and its left edge is the Y axis of the first coordinate system; the top-left vertex of the recognized region is the second coordinate origin, its top edge is the X axis of the second coordinate system, and its left edge is the Y axis of the second coordinate system. The endpoint coordinates of a to-be-recognized path line are the coordinates of its two endpoints in the first coordinate system, and the endpoint coordinates of a recognized path line are the coordinates of its two endpoints in the second coordinate system. Specifically, the coordinate difference between the endpoint coordinates of the to-be-recognized path line and those of the recognized path line with the same path line number is calculated by the following formula:
In the formula, i is the path line number, j is the index of the path line endpoint (j = 1, 2), r_ij is the coordinate difference of the j-th endpoint of path line i, x_ij is the X coordinate of the j-th endpoint of to-be-recognized path line i, x'_ij is the X coordinate of the j-th endpoint of recognized path line i, y_ij is the Y coordinate of the j-th endpoint of to-be-recognized path line i, and y'_ij is the Y coordinate of the j-th endpoint of recognized path line i.
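The formula itself does not survive in this text. One form consistent with the symbol definitions above, offered purely as an assumption and not as the patent's own expression, is the Euclidean distance between corresponding endpoints, each measured in its own region's coordinate system:

```latex
r_{ij} = \sqrt{\left(x_{ij} - x'_{ij}\right)^{2} + \left(y_{ij} - y'_{ij}\right)^{2}},
\qquad i = 1, \dots, n, \quad j = 1, 2
```

Under this reading, r_ij is 0 whenever the j-th endpoint of path line i sits at the same place relative to both region origins.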
Step 2: calculate the variance of the coordinate differences over all path lines of the to-be-recognized rarely used character and the recognized rarely used character. Specifically, the variance is calculated according to the following formula:
In the formula, n is the number of path lines, s² is the variance of the coordinate differences of all path lines, and r̄ is the mean of the coordinate differences of all path lines.
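The variance formula is likewise missing from this text. A form consistent with the symbols n, s², and the mean, assuming a population variance taken over all 2n endpoint differences r_ij (the original may normalize differently), would be:

```latex
\bar{r} = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{2} r_{ij},
\qquad
s^{2} = \frac{1}{2n} \sum_{i=1}^{n} \sum_{j=1}^{2} \left(r_{ij} - \bar{r}\right)^{2}
```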
Step 3: judge whether the variance is less than a preset expected value. If so, determine that the position information of each path line of the to-be-recognized rarely used character matches the position information of each path line of the recognized rarely used character. Ideally, because identical rarely used characters have the same number of strokes (that is, the same number of path lines) and a given document numbers path lines by the same numbering rule, the endpoints of the path lines numbered for the same stroke are also at the same positions. If the to-be-recognized character is identical to the recognized character, the calculated variance is 0. In practice, however, some error must be allowed for, for example the error that arises when the determined region to be recognized differs in size from the recognized region. The preset expected value is therefore used to judge whether the position information of each path line of the to-be-recognized character matches that of the recognized character.
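Steps 1 to 3, together with the path line count check that precedes them, can be sketched as below. This is a hedged illustration: the Euclidean endpoint distance, the population variance, the default `threshold`, and the name `path_lines_match` are all assumptions, since the text gives neither the formulas nor a value for the preset expected value. Endpoints are assumed to be already expressed relative to each region's top-left origin, with lines ordered by path line number.

```python
import math
from statistics import pvariance
from typing import List, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]        # two endpoints, in region-local coordinates


def path_lines_match(candidate: List[Line], known: List[Line],
                     threshold: float = 0.5) -> bool:
    """Compare path line counts first, then the variance of endpoint differences."""
    if len(candidate) != len(known) or not candidate:
        return False                              # counts differ, or nothing to compare
    diffs = [
        math.hypot(p[0] - q[0], p[1] - q[1])      # assumed form of r_ij
        for cand_line, known_line in zip(candidate, known)
        for p, q in zip(cand_line, known_line)
    ]
    return pvariance(diffs) < threshold           # preset expected value (assumed)


known = [((0, 0), (10, 0)), ((5, 0), (5, 10))]
shifted = [((1, 0), (11, 0)), ((6, 0), (6, 10))]    # same shape, offset by 1
other = [((0, 0), (10, 10)), ((5, 0), (5, 10))]     # one endpoint moved
```

Because the test is on the variance rather than the mean, a uniform offset such as the one in `shifted` still matches, which fits the text's remark about error from differently sized regions.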
It should be noted, however, that judging whether the path line position information matches from the variance of the coordinate differences of all path lines is only one preferred embodiment, and the invention is not limited to it. Those skilled in the art will understand that any embodiment capable of determining the consistency between the position information of each path line of the to-be-recognized rarely used character and that of each path line of the recognized rarely used character falls within the scope of the invention. Optionally, in other specific implementations of the invention, whether the two sets of path line position information match may also be judged from the length of each path line, or from the mean or standard deviation of the coordinate differences of all path lines.
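As an illustration of the length-based alternative, the following sketch compares the lengths of path lines with the same number. The tolerance `tol` and the name `lengths_match` are hypothetical; the text does not say how lengths would be compared.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]


def line_length(line: Line) -> float:
    (x1, y1), (x2, y2) = line
    return math.hypot(x2 - x1, y2 - y1)


def lengths_match(candidate: List[Line], known: List[Line], tol: float = 0.5) -> bool:
    """Match when same-numbered path lines have nearly equal lengths."""
    if len(candidate) != len(known):
        return False
    return all(
        abs(line_length(a) - line_length(b)) <= tol
        for a, b in zip(candidate, known)
    )
```

Lengths ignore where a line sits and how it is oriented, so this check is weaker than comparing endpoints and would likely be combined with another criterion.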
If the position information of each path line of the to-be-recognized rarely used character is judged to match that of the recognized rarely used character, the recognition results of the document contain recognized rarely used character path line information that matches the to-be-recognized path line information. That is, the to-be-recognized character is determined to be the same character as the recognized character corresponding to the matching path line information. The to-be-recognized character is then not recognized again, and a recognition result identical to the one corresponding to the recognized path line information is not saved a second time; instead, step S203 is executed so that the to-be-recognized character is handled using the recognition result corresponding to the recognized path line information. If the position information of the path lines is judged not to match, the recognition results of the document contain no recognized path line information matching the to-be-recognized path line information, and step S204 is executed: the to-be-recognized character is recognized and the recognition result is saved for use in the next matching.
Step S203: use the recognition result corresponding to the recognized rarely used character path line information as the recognition result of the rarely used character region to be recognized.
Step S204: recognize the to-be-recognized rarely used character and save its recognition result into the recognition results.
If the recognition results of the document contain no recognized rarely used character path line information matching the to-be-recognized path line information, the to-be-recognized character is a new rarely used character. It is then recognized, and once recognition is complete its recognition result is stored as the recognition result of a recognized rarely used character, and its information is stored in the recognition results as recognized rarely used character information, for use in the next matching judgment.
In some embodiments of the invention, the rarely used word to be identified is recognized as follows: a screenshot of the rarely used word region to be identified is taken to obtain a rarely used word picture to be identified, and the picture is recognized using a picture character recognition technique; optionally, the picture character recognition technique is OCR. If recognition outputs exactly one character, that character is used as the recognition result of the rarely used word to be identified; if recognition outputs multiple characters or fails to produce a result, the rarely used word picture to be identified itself is used as the recognition result. The invention is not limited to this, however; in a specific implementation, the rarely used word to be identified may also be recognized in any other feasible way.
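The fallback rule described above can be sketched as follows. This is only a minimal illustration: the function name `recognize_rare_word` and the `ocr` callable are hypothetical placeholders for whatever picture character recognition engine is actually used, and treating an empty or missing OCR output as "failed to obtain a result" is an assumption.

```python
from typing import Callable, Optional, Union


def recognize_rare_word(
    picture: bytes,
    ocr: Callable[[bytes], Optional[str]],
) -> Union[str, bytes]:
    """Return one recognized character, or the picture itself as a fallback.

    `ocr` is a hypothetical stand-in for any picture character recognition
    engine; it returns the recognized text, or None on failure.
    """
    # Assumption: surrounding whitespace in the OCR output is not significant.
    text = (ocr(picture) or "").strip()
    if len(text) == 1:
        # Exactly one character was recognized: use it as the result.
        return text
    # Multiple characters or no result: keep the picture as the result.
    return picture
```

With a stub engine such as `lambda p: "ab"`, the function returns the original picture bytes, while a stub that returns a single character yields that character.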
According to the uncommon word processing method provided in this embodiment, the number of path-lines of the rarely used word to be identified is compared with the number of path-lines of an identified rarely used word, and the position information of each path-line of the rarely used word to be identified is compared with the position information of each path-line of the identified rarely used word. Based on the comparison result, it is judged whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified. If it does, the recognition result corresponding to the matching identified rarely used word path-line information is taken as the recognition result of the rarely used word region to be identified. With the scheme of this embodiment, only one recognition result needs to be saved for identical rarely used words, and when a rarely used word to be identified is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the rarely used word region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.
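The reuse of a single stored recognition result (steps S202 to S204) can be sketched as a small lookup structure. This is an illustrative sketch rather than the patented implementation: the class name `RecognitionStore`, the linear scan, and the pluggable `matches` and `recognize` callables are all assumptions introduced for the example.

```python
from typing import Any, Callable, List, Tuple


class RecognitionStore:
    """Keeps one recognition result per distinct rarely used word."""

    def __init__(self, matches: Callable[[Any, Any], bool]) -> None:
        # `matches` is a hypothetical similarity rule (path-line or picture).
        self._matches = matches
        self._entries: List[Tuple[Any, Any]] = []  # (word info, result)

    def resolve(self, info: Any, recognize: Callable[[Any], Any]) -> Any:
        # S202: look for identified information that matches the new one.
        for known_info, result in self._entries:
            if self._matches(info, known_info):
                return result  # S203: reuse the stored recognition result
        # S204: a new rarely used word; recognize it and save the result.
        result = recognize(info)
        self._entries.append((info, result))
        return result

    def __len__(self) -> int:
        return len(self._entries)
```

However many times the same rarely used word appears in a document, `len(store)` grows by one only for its first occurrence, which is the source of the storage saving described above.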
Fig. 3 shows a flow chart of the uncommon word processing method according to another embodiment of the present invention. In this embodiment, the rarely used word information is a rarely used word picture obtained by taking a screenshot of the rarely used word region. As shown in Fig. 3, the method includes:
Step S301: the text objects of the document are identified, the rarely used word region to be identified is determined, and a rarely used word picture to be identified is obtained by taking a screenshot of the rarely used word region to be identified.
Here, path-lines are present in the rarely used word region to be identified. In this step, a screenshot of the rarely used word region to be identified is taken to obtain the rarely used word picture to be identified, so that the picture retains the original information of the corresponding rarely used word region in the format document.
Specifically, the edges of the screenshot are determined as follows: the left edge of the screenshot is determined from the right edge of the text object to the left of the rarely used word region to be identified; the right edge of the screenshot is determined from the left edge of the text object to the right of the region; the top edge of the screenshot is determined from the higher of the top edges of the text objects on the two sides of the region; and the bottom edge of the screenshot is determined from the lower of the bottom edges of the text objects on the two sides of the region. The rarely used word picture to be identified is then obtained. Optionally, these four neighbouring edges are used directly as the left, right, top, and bottom edges of the screenshot. Alternatively, starting from the edges determined in this way, each edge is moved outward (to the left, to the right, upward, and downward respectively) by a preset width, and the widened edges are used as the left, right, top, and bottom edges of the screenshot. This ensures that the rarely used word picture obtained by the screenshot fully contains the original information of the rarely used word to be identified in the format document.
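The edge rule above can be written as a short sketch. The `Box` type, the function name `screenshot_box`, and the `pad` parameter (standing in for the preset width) are hypothetical; the sketch also assumes image-style coordinates in which y grows downward, so the "higher" top edge is the smaller `top` value.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box; y grows downward (an assumption)."""
    left: float
    top: float
    right: float
    bottom: float


def screenshot_box(left_neighbor: Box, right_neighbor: Box, pad: float = 0.0) -> Box:
    """Derive the screenshot box from the text objects on both sides."""
    left = left_neighbor.right            # right edge of the left text object
    right = right_neighbor.left           # left edge of the right text object
    top = min(left_neighbor.top, right_neighbor.top)            # higher top edge
    bottom = max(left_neighbor.bottom, right_neighbor.bottom)   # lower bottom edge
    # Optionally widen every edge outward by the preset width `pad`.
    return Box(left - pad, top - pad, right + pad, bottom + pad)
```

Calling `screenshot_box(a, b)` uses the neighbouring edges directly, while `screenshot_box(a, b, pad=2)` corresponds to the widened variant.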
Step S302: according to a picture similarity calculation rule, it is judged whether the recognition result of the document contains an identified rarely used word picture that matches the rarely used word picture to be identified. If so, step S303 is executed; if not, step S304 is executed.
Specifically, it is judged whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold. For example, the rarely used word picture to be identified and the identified rarely used word picture are each binarized, and the similarity between them is then determined from the binarization results; the invention is not limited to this example, however. If the similarity is greater than or equal to the preset similarity threshold, it is determined that the recognition result of the document contains an identified rarely used word picture that matches the rarely used word picture to be identified, that is, the rarely used word to be identified is the same rarely used word as the identified rarely used word corresponding to the matching picture. Step S303 is then executed, so that the rarely used word to be identified is handled according to the recognition result corresponding to the identified rarely used word information. If the similarity is less than the preset similarity threshold, step S304 is executed: the rarely used word to be identified is recognized further and the recognition result is saved for use in the next match.
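The binarize-then-compare example can be illustrated with a small pure-Python sketch. The text does not specify how similarity is computed from the binarization results, so the measure used here (the fraction of pixels that agree after binarization), the grayscale threshold of 128, the default similarity threshold of 0.95, and the requirement that both pictures have the same size are all assumptions made for illustration.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[int]]  # rows of 0-255 grayscale pixel values


def binarize(img: Gray, threshold: int = 128) -> List[List[int]]:
    """Map dark pixels (ink) to 1 and light pixels (background) to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def similarity(a: Gray, b: Gray, threshold: int = 128) -> float:
    """Fraction of pixels that agree after binarization (assumed measure)."""
    ba, bb = binarize(a, threshold), binarize(b, threshold)
    # Assumption: pictures of different sizes are treated as dissimilar.
    if len(ba) != len(bb) or any(len(ra) != len(rb) for ra, rb in zip(ba, bb)):
        return 0.0
    total = sum(len(row) for row in ba)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(ba, bb) for pa, pb in zip(ra, rb))
    return same / total


def pictures_match(a: Gray, b: Gray, min_similarity: float = 0.95) -> bool:
    """True if the similarity reaches the preset similarity threshold."""
    return similarity(a, b) >= min_similarity
```

In practice the screenshots would likely be normalized to a common size before comparison; that step is omitted here to keep the sketch short.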
Step S303: the recognition result corresponding to the identified rarely used word picture is taken as the recognition result of the rarely used word region to be identified.
Step S304: the rarely used word to be identified is recognized, and its recognition result is saved to the existing recognition results.
According to the uncommon word processing method provided in this embodiment, a screenshot of the rarely used word region to be identified is taken to obtain a rarely used word picture to be identified; a picture similarity algorithm is used to calculate the similarity between the rarely used word picture to be identified and the identified rarely used word pictures in the recognition result; and it is judged whether the recognition result contains an identified rarely used word picture that matches the rarely used word picture to be identified. If it does, the recognition result corresponding to the matching identified rarely used word picture is taken as the recognition result of the rarely used word region to be identified. With the scheme of this embodiment, only one recognition result needs to be saved for identical rarely used words, and when a rarely used word to be identified is displayed, the recognition result of the identical identified rarely used word is used directly as the recognition result of the rarely used word region to be identified. This avoids storing a large number of duplicate recognition results and thereby reduces the size of the book file.
An embodiment of the present application provides a non-volatile computer storage medium. The computer storage medium stores at least one executable instruction, and the computer executable instruction can execute the uncommon word processing method of any of the above method embodiments.
The executable instruction may specifically be used to cause a processor to perform the following operations:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, taking the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
In an optional embodiment, the rarely used word information is specifically rarely used word path-line information, and the executable instruction further causes the processor to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
In an optional embodiment, the executable instruction further causes the processor to perform the following operations:
calculating the coordinate differences between the endpoint coordinates of the path-lines of the rarely used word to be identified and the endpoint coordinates of the path-lines of the identified rarely used word that have the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
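The count check and the variance test can be combined in one sketch. Several details are assumptions, because the text does not fix them: each path-line is represented by a start and an end point keyed by its path-line number, the "variance" is taken as the sum of the population variances of the x and y differences, and `max_variance` stands in for the preset expected value. One property of this reading is that a glyph shifted by a constant offset yields identical differences for every endpoint, so the variance is zero and the glyph still matches.

```python
from statistics import pvariance
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical representation: path-line number -> (start point, end point)
PathLines = Dict[int, Tuple[Point, Point]]


def path_lines_match(
    candidate: PathLines, known: PathLines, max_variance: float = 1.0
) -> bool:
    """Compare path-line counts, then the variance of endpoint differences."""
    # The number of path-lines (and their numbering) must agree first.
    if len(candidate) != len(known) or candidate.keys() != known.keys():
        return False
    if not candidate:
        return False  # assumption: an empty glyph never matches
    dx: List[float] = []
    dy: List[float] = []
    for number, endpoints in candidate.items():
        # Differences between endpoints of path-lines with the same number.
        for (cx, cy), (kx, ky) in zip(endpoints, known[number]):
            dx.append(cx - kx)
            dy.append(cy - ky)
    # Assumed definition: total variance of the 2-D difference vectors.
    return pvariance(dx) + pvariance(dy) < max_variance
```

A function of this shape could serve as the `matches` rule when looking up previously identified rarely used words.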
In an optional embodiment, the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the executable instruction further causes the processor to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
Fig. 4 shows a schematic structural diagram of a computing device according to an embodiment of the present invention. The specific embodiments of the invention do not limit the specific implementation of the computing device.
As shown in Fig. 4, the computing device may include: a processor 402, a communications interface 404, a memory 406, and a communication bus 408.
Wherein:
The processor 402, the communication interface 404, and the memory 406 communicate with one another via the communication bus 408.
The communication interface 404 is used to communicate with network elements of other devices, such as clients or other servers.
The processor 402 is used to execute a program 410, and may specifically perform the relevant steps of the above uncommon word processing method embodiments.
Specifically, the program 410 may include program code, and the program code includes computer operation instructions.
The processor 402 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computing device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 406 is used to store the program 410. The memory 406 may include high-speed RAM, and may also include non-volatile memory, for example at least one disk memory.
The program 410 may specifically be used to cause the processor 402 to perform the following operations:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, taking the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
In an optional embodiment, the rarely used word information is specifically rarely used word path-line information;
the program 410 may further be used to cause the processor 402 to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
In an optional embodiment, the program 410 may further be used to cause the processor 402 to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
In an optional embodiment, the program 410 may further be used to cause the processor 402 to perform the following operations:
calculating the coordinate differences between the endpoint coordinates of the path-lines of the rarely used word to be identified and the endpoint coordinates of the path-lines of the identified rarely used word that have the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
In an optional embodiment, the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the program 410 may further be used to cause the processor 402 to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided here, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the claims, any one of the claimed embodiments may be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
The invention discloses: A1. An uncommon word processing method, comprising:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, taking the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
A2. The method according to A1, wherein the rarely used word information is specifically rarely used word path-line information;
the judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified is specifically:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
A3. The method according to A2, wherein the judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified further comprises:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
A4. The method according to A3, wherein the judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word further comprises:
calculating the coordinate differences between the endpoint coordinates of the path-lines of the rarely used word to be identified and the endpoint coordinates of the path-lines of the identified rarely used word that have the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
A5. The method according to A1, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified is specifically:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
The invention also discloses: B6. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another via the communication bus;
the memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, taking the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
B7. The computing device according to B6, wherein the rarely used word information is specifically rarely used word path-line information, and the executable instruction further causes the processor to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
B8. The computing device according to B7, wherein the executable instruction further causes the processor to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
B9. The computing device according to B8, wherein the executable instruction further causes the processor to perform the following operations:
calculating the coordinate differences between the endpoint coordinates of the path-lines of the rarely used word to be identified and the endpoint coordinates of the path-lines of the identified rarely used word that have the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
B10. The computing device according to B6, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the executable instruction further causes the processor to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
The invention also discloses: C11. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
judging, according to a similarity matching rule, whether the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified;
if so, taking the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
C12. The computer storage medium according to C11, wherein the rarely used word information is specifically rarely used word path-line information, and the executable instruction further causes the processor to perform the following operation:
judging, according to a path-line comparison rule, whether the recognition result of the document contains identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
C13. The computer storage medium according to C12, wherein the executable instruction further causes the processor to perform the following operations:
comparing whether the number of path-lines of the rarely used word to be identified is equal to the number of path-lines of the identified rarely used word;
if so, judging whether the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
C14. The computer storage medium according to C13, wherein the executable instruction further causes the processor to perform the following operations:
calculating the coordinate differences between the endpoint coordinates of the path-lines of the rarely used word to be identified and the endpoint coordinates of the path-lines of the identified rarely used word that have the same path-line number;
calculating the variance of the coordinate differences over all path-lines of the rarely used word to be identified and the identified rarely used word;
judging whether the variance is less than a preset expected value; if so, determining that the position information of each path-line of the rarely used word to be identified matches the position information of each path-line of the identified rarely used word.
C15. The computer storage medium according to C11, wherein the rarely used word information is specifically a rarely used word picture obtained by taking a screenshot of the rarely used word region;
the executable instruction further causes the processor to perform the following operation:
judging, according to a picture similarity calculation rule, whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition result of the document contains identified rarely used word information that matches the rarely used word information to be identified.
Claims (10)
1. a kind of uncommon word processing method, including:
The text object of document is identified, determines rarely used word region to be identified, obtain in rarely used word region to be identified to
Identify rarely used word information;
According to similarity mode rule, judge to believe in the recognition result of the document with the presence or absence of with the rarely used word to be identified
It ceases and matched has identified rarely used word information;
If so, having identified the corresponding recognition result of rarely used word information as the identification in the rarely used word region to be identified for described
As a result.
2. The method according to claim 1, wherein the rarely used word information is specifically: rarely used word path-line information;
The judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified is specifically:
Judging, according to a path-line comparison rule, whether the recognition results of the document contain identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
3. The method according to claim 2, wherein the judging, according to a path-line comparison rule, whether the recognition results of the document contain identified rarely used word path-line information that matches the rarely used word path-line information to be identified further comprises:
Comparing whether the number of path lines of the rarely used word to be identified is equal to the number of path lines of the identified rarely used word;
If so, judging whether the position information of each path line of the rarely used word to be identified matches the position information of each path line of the identified rarely used word.
4. The method according to claim 3, wherein the judging whether the position information of each path line of the rarely used word to be identified matches the position information of each path line of the identified rarely used word further comprises:
Calculating, for path lines having the same path-line number, the coordinate difference between the endpoint coordinates of the path line of the rarely used word to be identified and the endpoint coordinates of the path line of the identified rarely used word;
Calculating the variance of the coordinate differences over all path lines of the rarely used word to be identified and the identified rarely used word;
Judging whether the variance is less than a preset expected value; if so, determining that the position information of each path line of the rarely used word to be identified matches the position information of each path line of the identified rarely used word.
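The path-line comparison of claims 3 and 4 can be sketched as below. The representation is an assumption made for illustration: each path line is a pair of endpoint coordinates, lines are paired by their position in the list (standing in for the path-line number), and the variances of the x and y differences are summed into one value. The function name `path_lines_match` and the default `max_variance` are hypothetical; the claims only require a count check followed by a variance test against a preset expected value.

```python
from statistics import pvariance
from typing import List, Tuple

Point = Tuple[float, float]
PathLine = Tuple[Point, Point]  # (start endpoint, end endpoint); assumed layout


def path_lines_match(
    candidate: List[PathLine],
    known: List[PathLine],
    max_variance: float = 1.0,  # stands in for the preset expected value
) -> bool:
    # Claim 3: the two words must have the same number of path lines.
    if len(candidate) != len(known):
        return False
    if not candidate:
        return False  # nothing to compare; treated as no match (assumption)

    # Claim 4: coordinate differences between endpoints of same-numbered lines.
    dxs: List[float] = []
    dys: List[float] = []
    for cand_line, known_line in zip(candidate, known):
        for (cx, cy), (kx, ky) in zip(cand_line, known_line):
            dxs.append(cx - kx)
            dys.append(cy - ky)

    # The same glyph drawn at another position gives a constant offset,
    # so the variance of the differences stays close to zero.
    return pvariance(dxs) + pvariance(dys) < max_variance
```

Using the variance of the differences rather than the differences themselves makes the test insensitive to where on the page the word sits, since a pure translation shifts every endpoint by the same amount.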
5. The method according to claim 1, wherein the rarely used word information is specifically: a rarely used word picture obtained by taking a screenshot of the rarely used word region;
The judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified is specifically:
According to a picture similarity calculation rule, judging whether the similarity between the rarely used word picture to be identified and any identified rarely used word picture is greater than or equal to a preset similarity threshold; if so, determining that the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified.
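The picture-based variant of claim 5 can be illustrated with a deliberately simple similarity rule. The claim does not say how picture similarity is calculated, so the pixel-agreement ratio below is an assumption, as are the names `pixel_similarity` and `find_matching_picture`, the nested-list bitmap representation, and the default threshold of 0.9.

```python
from typing import Optional, Sequence

Bitmap = Sequence[Sequence[int]]  # rows of pixel values; assumed representation


def pixel_similarity(a: Bitmap, b: Bitmap) -> float:
    """Fraction of pixels that are equal; 0.0 if the sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def find_matching_picture(
    candidate: Bitmap,
    known_pictures: Sequence[Bitmap],
    threshold: float = 0.9,  # stands in for the preset similarity threshold
) -> Optional[int]:
    """Index of the first identified picture at or above the threshold."""
    for idx, known in enumerate(known_pictures):
        if pixel_similarity(candidate, known) >= threshold:
            return idx
    return None
```

A production system would more likely use a perceptual hash or a feature-based comparison; the threshold test against each identified picture is the only part taken directly from the claim.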
6. A computing device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
The memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the following operations:
Identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
Judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified;
If so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
7. The computing device according to claim 6, wherein the rarely used word information is specifically: rarely used word path-line information; the executable instruction further causes the processor to perform the following operation:
Judging, according to a path-line comparison rule, whether the recognition results of the document contain identified rarely used word path-line information that matches the rarely used word path-line information to be identified.
8. The computing device according to claim 7, wherein the executable instruction further causes the processor to perform the following operations:
Comparing whether the number of path lines of the rarely used word to be identified is equal to the number of path lines of the identified rarely used word;
If so, judging whether the position information of each path line of the rarely used word to be identified matches the position information of each path line of the identified rarely used word.
9. The computing device according to claim 8, wherein the executable instruction further causes the processor to perform the following operations:
Calculating, for path lines having the same path-line number, the coordinate difference between the endpoint coordinates of the path line of the rarely used word to be identified and the endpoint coordinates of the path line of the identified rarely used word;
Calculating the variance of the coordinate differences over all path lines of the rarely used word to be identified and the identified rarely used word;
Judging whether the variance is less than a preset expected value; if so, determining that the position information of each path line of the rarely used word to be identified matches the position information of each path line of the identified rarely used word.
10. A computer storage medium, wherein at least one executable instruction is stored in the storage medium, and the executable instruction causes a processor to perform the following operations:
Identifying the text objects of a document, determining a rarely used word region to be identified, and obtaining the rarely used word information to be identified in the rarely used word region to be identified;
Judging, according to a similarity matching rule, whether the recognition results of the document contain identified rarely used word information that matches the rarely used word information to be identified;
If so, using the recognition result corresponding to the identified rarely used word information as the recognition result of the rarely used word region to be identified.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810659246.8A CN108846367B (en) | 2018-06-25 | 2018-06-25 | Uncommon word processing method, computing device and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108846367A (en) | 2018-11-20 |
CN108846367B (en) | 2019-08-30 |
Family
ID=64202037
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810659246.8A (Active, CN108846367B) | Uncommon word processing method, computing device and computer storage medium | 2018-06-25 | 2018-06-25 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108846367B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1622122A (en) * | 2003-11-28 | 2005-06-01 | 佳能株式会社 | Method, device and storage medium for character recognition |
US20080193008A1 (en) * | 2007-02-09 | 2008-08-14 | Jpmorgan Chase Bank, N.A. | System and Method for Generating Magnetic Ink Character Recognition (MICR) Testing Documents |
CN101901348A (en) * | 2010-06-29 | 2010-12-01 | 北京捷通华声语音技术有限公司 | Normalization based handwriting identifying method and identifying device |
CN103154974A (en) * | 2011-03-07 | 2013-06-12 | 株式会社Ntt都科摩 | Character recognition device, character recognition method, character recognition system, and character recognition program |
CN102542264A (en) * | 2011-12-22 | 2012-07-04 | 北京语言大学 | Method and device for automatically evaluating right and wrong of Chinese character writing on basis of digital handwriting equipment |
CN103186581A (en) * | 2011-12-30 | 2013-07-03 | 牟颖 | Method for quickly acquiring pronunciation of uncommon word in book through mobile phone |
CN103425257A (en) * | 2012-05-24 | 2013-12-04 | 北京搜狗科技发展有限公司 | Method and device for prompting information of uncommon characters |
CN103457973A (en) * | 2012-06-01 | 2013-12-18 | 深圳市腾讯计算机***有限公司 | Image uploading method and system, image uploading client terminal and network server |
CN108153731A (en) * | 2017-12-25 | 2018-06-12 | 掌阅科技股份有限公司 | Uncommon word processing method, computing device and computer storage media |
Non-Patent Citations (1)
Title |
---|
王慧: ""基于模板匹配的手写体字符识别算法研究"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110069767A (en) * | 2019-04-23 | 2019-07-30 | 掌阅科技股份有限公司 | Composition method, electronic equipment and computer storage medium based on e-book |
CN110069767B (en) * | 2019-04-23 | 2020-02-28 | 掌阅科技股份有限公司 | Typesetting method based on electronic book, electronic equipment and computer storage medium |
CN111539383A (en) * | 2020-05-22 | 2020-08-14 | 浙江蓝鸽科技有限公司 | Formula knowledge point identification method and device |
CN111539383B (en) * | 2020-05-22 | 2023-05-05 | 浙江蓝鸽科技有限公司 | Formula knowledge point identification method and device |
CN117151041A (en) * | 2023-10-27 | 2023-12-01 | 成方金融科技有限公司 | PDF (Portable document Format) generation method, device, equipment and storage medium compatible with rarely used words |
CN117151041B (en) * | 2023-10-27 | 2024-02-27 | 成方金融科技有限公司 | PDF (Portable document Format) generation method, device, equipment and storage medium compatible with rarely used words |
Also Published As
Publication number | Publication date |
---|---|
CN108846367B (en) | 2019-08-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108846367B (en) | Uncommon word processing method, computing device and computer storage medium | |
JP6594988B2 (en) | Method and apparatus for processing address text | |
CN104318259B (en) | A kind of equipment, method and computing device for recognizing Target Photo | |
US11270105B2 (en) | Extracting and analyzing information from engineering drawings | |
CN108153731B (en) | Uncommon word processing method, computing device and computer storage medium | |
US20080068383A1 (en) | Rendering and encoding glyphs | |
CN107944324A (en) | A kind of Quick Response Code distortion correction method and device | |
CN113822091B (en) | Method and device for correcting errors of two-dimensional code pattern, electronic equipment and storage medium | |
CN106663311B (en) | System and method for increasing the locating depth of image | |
CN109886127A (en) | Fingerprint identification method and terminal device | |
CN108875855A (en) | Print method, apparatus, equipment and the storage medium of polar plot | |
Harley et al. | Learning dense convolutional embeddings for semantic segmentation | |
CN108399025A (en) | A kind of method, apparatus and terminal device for correcting identification deviation | |
CN110705225A (en) | Contract marking method and device | |
CN115311469A (en) | Image labeling method, training method, image processing method and electronic equipment | |
CN108920955B (en) | Webpage backdoor detection method, device, equipment and storage medium | |
KR102239588B1 (en) | Image processing method and apparatus | |
CN108376146A (en) | Influence scoring based on domain | |
CN104424619B (en) | Information processing equipment and information processing method | |
CN112912837B (en) | Neural network compiling method, device, equipment, storage medium and program product | |
CN105892995A (en) | Minus searching method and device as well as processor | |
CN104182396B (en) | Terminal, format document content description optimization apparatus and method | |
CN109101973A (en) | Character recognition method, electronic equipment, storage medium | |
CN109597980A (en) | PDF document dividing method, device and electronic equipment | |
CN112862842B (en) | Image data processing method and device, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||