CN104077593A - Image processing method and image processing device - Google Patents

Info

Publication number
CN104077593A
Authority
CN
China
Prior art keywords
url
identification
predetermined symbol
stroke
optical character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310101523.0A
Other languages
Chinese (zh)
Inventor
汪留安
孙俊
何源
范伟
胜山裕
堀田悦伸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to CN201310101523.0A priority Critical patent/CN104077593A/en
Priority to JP2014033893A priority patent/JP2014191825A/en
Publication of CN104077593A publication Critical patent/CN104077593A/en
Pending legal-status Critical Current

Landscapes

  • Character Discrimination (AREA)
  • Character Input (AREA)

Abstract

The invention provides an image processing method and an image processing device. The method comprises: recognizing a predetermined symbol in a text region of an image; removing the part corresponding to the recognized predetermined symbol from the text region; performing optical character recognition on the text region from which that part has been removed; and adding the recognized predetermined symbol to the corresponding position in the optical character recognition result.

Description

Image processing method and device
Technical field
The present application relates generally to image processing and, more specifically, to a method and an apparatus for performing optical character recognition (OCR) on an image.
Background
OCR is widely used to recognize characters in images. An OCR engine is usually designed to recognize the characters of a single character set or language, so when the object to be recognized mixes text with certain symbols, a general OCR method may struggle to reach satisfactory recognition accuracy. Such mixtures of text and symbols include, for example, uniform resource locators (URLs), e-mail addresses, mathematical formulas and program code. Accordingly, there are methods that exploit the characteristics of a particular recognition object to correct errors in the recognition result and so improve recognition accuracy. For example, some methods correct the recognition result according to syntax rules specific to the recognition object, or based on historical information.
Summary of the invention
A brief overview of the invention is given below to provide a basic understanding of certain aspects of the invention. This overview is not exhaustive. It is not intended to identify key or critical parts of the invention, nor to limit the scope of the invention. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
According to one aspect of the application, an image processing method comprises: recognizing a predetermined symbol in a text region of an image; removing the part corresponding to the recognized predetermined symbol from the text region; performing optical character recognition on the text region from which that part has been removed; and adding the recognized predetermined symbol to the corresponding position in the result of the optical character recognition.
According to another aspect of the application, an image processing apparatus comprises: a symbol recognition part configured to recognize a predetermined symbol in a text region of an image; a symbol removal part configured to remove the part corresponding to the recognized predetermined symbol from the text region; an optical character recognition part configured to perform optical character recognition on the text region from which that part has been removed; and a symbol adding part configured to add the recognized predetermined symbol to the corresponding position in the result of the optical character recognition.
Brief description of the drawings
The invention may be better understood from the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals denote the same or similar components throughout. The drawings, together with the detailed description below, are included in and form part of this specification, and serve to further illustrate preferred embodiments of the invention and to explain its principles and advantages. In the drawings:
Fig. 1 is a flowchart illustrating a process example of an image processing method according to an embodiment of the application;
Fig. 2 is a schematic diagram illustrating how a forward slash symbol is recognized by the image processing method according to an embodiment of the application;
Fig. 3 is a flowchart illustrating a process example of an image processing method according to another embodiment of the application;
Fig. 4 is a flowchart illustrating an example of a sub-process for identifying a URL;
Fig. 5 is a flowchart illustrating an example of a sub-process for identifying a URL;
Fig. 6 is a flowchart illustrating an example of a sub-process for identifying a URL;
Fig. 7 is a flowchart illustrating an example of a sub-process for identifying a URL;
Fig. 8 is a block diagram illustrating a configuration example of an image processing apparatus according to an embodiment of the application;
Fig. 9 is a block diagram illustrating a configuration example of the symbol recognition part;
Fig. 10 is a block diagram illustrating a configuration example of an image processing apparatus according to another embodiment of the application;
Fig. 11 is a block diagram illustrating a configuration example of the URL recognition part;
Fig. 12 is a block diagram illustrating a configuration example of the URL recognition part;
Fig. 13 is a block diagram illustrating a configuration example of the URL recognition part;
Fig. 14 is a block diagram illustrating a configuration example of the URL recognition part; and
Fig. 15 is a block diagram illustrating an exemplary structure of a computer that implements the method and apparatus of the application.
Detailed description of the embodiments
Embodiments of the invention are described below with reference to the accompanying drawings. Elements and features described in one drawing or one embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. Note that, for clarity, the drawings and the description omit components and processing that are unrelated to the invention or known to those of ordinary skill in the art.
A process example of an image processing method according to an embodiment of the application is first described with reference to Fig. 1.
The image processing method according to the embodiment may operate on a still image (such as a picture from the web or a photograph taken by a user) or on a video frame (such as a frame of television or online video), but is not limited to these. The image may be a color image, a grayscale image, a binary image and so on, but is not limited to these. Where necessary, color, resolution and similar properties may be adjusted or converted by various methods known in the art to meet the requirements of the OCR method used. The text region of the image may likewise be identified in various existing ways, which are not described here.
As shown in Fig. 1, in step S110 a predetermined symbol is recognized in the text region of the image.
The predetermined symbols to be recognized can be set according to the application. For example, an image processing method according to a specific embodiment of the application may be used to perform OCR on a URL contained in an image. Detecting and recognizing URLs in images or video frames is an important OCR technology: it can, for example, give a television or advertisement viewer the URL of a website shown on screen, so that the viewer can visit the site conveniently without typing the URL by hand. As a form of human-computer interaction, an important requirement is that the OCR system recognize the URL in the image quickly and correctly. Because a URL mixes text and symbols, one difficulty of URL recognition is how to correct the errors in the URL output by the OCR system.
Accordingly, in one embodiment of the application, the predetermined symbols to be recognized include the separators used in URLs, for example: - _ . ~ ! * ' ( ) ; : & = + $ , / % # [ ]. The invention is of course not limited to this; when OCR is performed on other objects that mix text and symbols, the predetermined symbols to be recognized in step S110 can be set accordingly. For example, when OCR is performed on a mathematical formula or program code in an image, predetermined mathematical symbols or program-code symbols can serve as the predetermined symbols. In the following, without loss of generality, embodiments of the application are described taking OCR of URLs as an example.
Because URLs that appear in, for example, television advertisements or billboards are usually kept simple so that viewers can remember and visit them, the separators that occur most often in them are typically the dot "." and the forward slash "/".
Therefore, according to an embodiment, the step of recognizing a predetermined symbol in the text region of the image may comprise: recognizing a stroke in the text region as a dot "." according to predetermined criteria, and/or recognizing a stroke in the text region as a forward slash "/" according to predetermined criteria.
Here, a "stroke" may refer to a connected component of foreground pixels of the image whose size lies within a predetermined range. For a binary image, for example, a stroke may be a connected component of black pixels within the predetermined size range; for grayscale and color images, pixels whose gray level or color lies within a preset range may be treated as foreground pixels, and a connected component of foreground pixels within the predetermined size range is taken as a stroke. The size of a connected component may be an absolute size (for example, the number of pixels it contains) or a relative size (for example, relative to the image size or to the sizes of other connected components). Various known methods exist for identifying strokes in an image and are not described here.
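As an illustration of the stroke definition above, the following is a minimal sketch of extracting connected components from a binary image. It is not taken from the patent: the function name `find_strokes`, the 8-connectivity choice and the `min_size`/`max_size` bounds (standing in for the "predetermined size range") are assumptions made for the example.

```python
from collections import deque


def find_strokes(image, min_size=1, max_size=10_000):
    """Return connected components of foreground pixels as lists of (row, col).

    image: list of rows, where 1 is a foreground pixel and 0 is background.
    min_size / max_size: hypothetical bounds for the predetermined size range.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    strokes = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search over 8-connected foreground neighbours.
            queue, component = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Keep only components whose pixel count is inside the size range.
            if min_size <= len(component) <= max_size:
                strokes.append(component)
    return strokes
```

For grayscale or color input, the image would first have to be thresholded into foreground and background, which this sketch leaves out.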
The predetermined criteria for recognizing a dot may include, for example: (i) the size of the stroke relative to the other strokes in the text region is below a predetermined standard; (ii) the stroke lies below the center line of the corresponding text line in the text region; and (iii) the ratio of the number of foreground pixels to the number of background pixels within the bounding rectangle of the stroke is higher than a predetermined threshold.
Regarding condition (i), the predetermined standard for the relative size may be, for example, that the ratio of the size of the stroke (for example, the number of pixels it contains) to the average size of the other strokes in the text region is below a predetermined threshold, but it is not limited to this.
Regarding condition (ii), existing methods (such as pixel projection or stroke projection) may be used to determine the text lines in the text region, and thereby whether the stroke lies below the center line of its text line. The up and down directions may be judged, for example, with reference to the default orientation of the image, or with reference to an image orientation determined by an existing method. Various methods exist for text line identification and image orientation determination and are not described here.
Regarding condition (iii), given the shape of a dot, the bounding rectangle of the stroke should be almost entirely occupied by foreground pixels, so the stroke is recognized as a dot only when the ratio of foreground to background pixel counts is higher than the predetermined threshold.
In addition to the above conditions, other factors may be taken into account when recognizing a dot; for example, the aspect ratio of the bounding rectangle of the stroke may be required to be sufficiently close to 1.
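The three conditions for the "." separator can be sketched as follows. This is only an illustration under stated assumptions: a stroke is a list of (row, col) pixels with rows growing downward, and the threshold values `max_rel_size` and `min_fg_bg_ratio` are invented for the example, since the patent gives no concrete numbers.

```python
def is_dot(stroke, other_sizes, line_center_y,
           max_rel_size=0.3, min_fg_bg_ratio=3.0):
    """Check conditions (i) to (iii) for one stroke.

    stroke: list of (row, col) foreground pixels.
    other_sizes: pixel counts of the other strokes (assumed non-empty).
    line_center_y: row index of the text line's center line.
    """
    ys = [y for y, _ in stroke]
    xs = [x for _, x in stroke]
    top, bottom, left, right = min(ys), max(ys), min(xs), max(xs)

    # (i) The stroke is small relative to the mean size of the other strokes.
    mean_other = sum(other_sizes) / len(other_sizes)
    small = len(stroke) / mean_other < max_rel_size

    # (ii) The stroke lies entirely below the center line of its text line.
    below = top > line_center_y

    # (iii) Foreground dominates background inside the bounding rectangle.
    area = (bottom - top + 1) * (right - left + 1)
    background = area - len(stroke)
    dense = background == 0 or len(stroke) / background > min_fg_bg_ratio

    return small and below and dense
```

An aspect-ratio check on the bounding rectangle could be added as a further condition in the same style.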
The predetermined criteria for recognizing a forward slash may include, for example: (i) when the bounding rectangle of the stroke is divided into several blocks, the blocks on the lower-left to upper-right diagonal of the rectangle contain foreground pixels, while the blocks at the upper-left and lower-right corners contain no foreground pixels; and (ii) the tilt angle of the stroke lies within a preset range.
Regarding condition (i), as shown in Fig. 2, when the bounding rectangle of the stroke is divided into, for example, 9 blocks, the blocks on the lower-left to upper-right diagonal, namely blocks 7, 9 and 3, contain foreground (black) pixels, while the blocks at the upper-left and lower-right corners, namely blocks 1 and 5, contain no foreground pixels, so the condition for a forward slash is met. The way the rectangle is divided into blocks is of course not limited to the one shown in the figure.
Regarding condition (ii), the tilt angle of the stroke may be determined in various existing ways, for example by principal component analysis (PCA). In a specific embodiment, the preset range is a tilt angle (α in Fig. 2) between 30° and 90° relative to the horizontal axis direction of the corresponding text line (for example, the X-axis direction shown in Fig. 2). The horizontal axis direction of a text line is the direction along which the characters of the line are arranged in sequence. As mentioned above, various existing methods can identify text lines, and the horizontal axis direction of a text line can be determined accordingly.
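The two conditions for the "/" separator can be sketched as below. The sketch assumes a 3x3 block grid indexed by (row, column) rather than the numbering of Fig. 2, a horizontal text line, and a closed-form 2D PCA (the principal axis of the pixel covariance matrix); the function name `is_slash` is hypothetical.

```python
import math


def is_slash(stroke, min_angle=30.0, max_angle=90.0):
    """Check the block-occupancy and tilt-angle conditions for one stroke.

    stroke: list of (row, col) foreground pixels, rows growing downward.
    """
    ys = [y for y, _ in stroke]
    xs = [x for _, x in stroke]
    top, left = min(ys), min(xs)
    height = max(ys) - top + 1
    width = max(xs) - left + 1

    # (i) Map every pixel to one of 3x3 blocks of the bounding rectangle.
    occupied = {((y - top) * 3 // height, (x - left) * 3 // width)
                for y, x in stroke}
    diagonal = {(2, 0), (1, 1), (0, 2)}  # lower-left, center, upper-right
    corners = {(0, 0), (2, 2)}           # upper-left, lower-right
    if not diagonal <= occupied or corners & occupied:
        return False

    # (ii) Tilt angle of the principal axis; y is negated so "up" is positive.
    n = len(stroke)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / n
    syy = sum((y - my) ** 2 for y in ys) / n
    sxy = sum((x - mx) * -(y - my) for y, x in stroke) / n
    angle = math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180
    return min_angle <= angle <= max_angle
```

For a text line that is not horizontal, the angle would have to be measured relative to the line direction instead of the image X axis.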
Step S110 of recognizing predetermined symbols in the image processing method according to the embodiment has been described above using the common symbols in URLs as an example. For applications to other recognition objects (such as e-mail addresses, mathematical formulas or program code), different predetermined symbols and ways of recognizing them can be defined.
Continuing with Fig. 1, when a predetermined symbol has been recognized, the part corresponding to the recognized symbol is removed from the text region (S120), for example by changing the foreground pixels of the symbol into background pixels.
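Step S120 can be sketched in a few lines. The representation (a binary image as a list of rows, a stroke as a list of (row, col) pixels) and the function name `remove_strokes` are assumptions for the example.

```python
def remove_strokes(image, strokes, background=0):
    """Return a copy of the image with the given strokes turned to background."""
    cleaned = [row[:] for row in image]  # leave the input image untouched
    for stroke in strokes:
        for y, x in stroke:
            cleaned[y][x] = background
    return cleaned
```

Returning a copy keeps the original text region available, which is useful if later steps verify symbols against the image again.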
Next, optical character recognition is performed on the text region from which the part corresponding to the predetermined symbol has been removed (S130); various existing OCR methods may be used for this.
Then, the predetermined symbol recognized in step S110 is added to the corresponding position in the result of the optical character recognition of step S130 (S140); that is, the recognized symbol is added back into the OCR result at the position it occupied in the original text line. Those skilled in the art will appreciate that the position of the predetermined symbol can be determined in several ways so that the symbol is added back into the OCR result at its original position. For example, the OCR result may include the position in the image of each recognized character, and the recognized predetermined symbol can be inserted at the correct place in the character string with reference to this position information.
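One way to add the symbols back by position is sketched below. It assumes, hypothetically, that both the OCR engine and the symbol recognizer report each item as a (character, x_left) pair for a single horizontal text line; real OCR engines expose position information in their own formats.

```python
def reinsert_symbols(ocr_chars, symbols):
    """Merge OCR characters and separately recognized symbols by x position.

    ocr_chars, symbols: lists of (character, x_left) pairs from one text line.
    """
    merged = sorted(ocr_chars + symbols, key=lambda item: item[1])
    return "".join(ch for ch, _ in merged)


ocr_chars = [("w", 0), ("w", 10), ("w", 20), ("a", 35),
             ("c", 50), ("o", 60), ("m", 70)]
symbols = [(".", 30), (".", 45)]
print(reinsert_symbols(ocr_chars, symbols))  # www.a.com
```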
With the image processing method according to the embodiment described above, the predetermined symbols are recognized separately and OCR is performed on the text region from which they have been removed. This reduces the misrecognition that the mixture of symbols and text causes in OCR and thus improves the accuracy of the final recognition result. In addition, for particular recognition objects, recognizing the predetermined symbols separately is computationally more efficient than recognizing them by OCR together with the other characters.
Next, an image processing method according to an embodiment of the application is described with reference to Fig. 3. Steps S310 to S340 are similar to steps S110 to S140 described above with reference to Fig. 1; in particular, the predetermined symbols recognized in step S310 include the separators used in URLs. In step S350, a URL is identified from the character string obtained in step S340 by adding the predetermined symbols to the OCR result. Various existing URL recognition methods may be used, for example a method based on syntax rules.
Because the image processing method according to the embodiment identifies the URL from a highly accurate character recognition result, it can effectively avoid errors in, for example, the host name that are caused by symbols overlapping or touching characters, and can therefore identify URLs more accurately than existing URL recognition methods.
In addition, according to an embodiment of the application, the step of identifying a URL from the recognized character string may include the sub-processes described below with reference to Figs. 4 to 7, or a suitable combination of them.
As shown in Fig. 4, according to an embodiment of the application, the step of identifying a URL may include representing the character string in a unified encoding (for example, ASCII) (S410) to simplify subsequent processing. If the output of the OCR already uses a unified encoding, this step can of course be omitted.
Optionally, the letters in the character string may be converted uniformly to lower case (S420) to further unify the format and simplify subsequent searching or matching. This conversion may also be carried out during the subsequent searching or matching itself.
As noted above, the symbols that may appear in a URL include: - _ . ~ ! * ' ( ) ; : & = + $ , / % # [ ]. Accordingly, a character string containing a symbol outside this set (that is, a symbol forbidden in URLs) can be judged not to be a URL string and removed (S430), narrowing the scope of subsequent recognition processing.
A URL often begins with a specific string, for example "www" or "http://". Here, "www" and the part after it, or the part after "http://", can be provided to the user as the main part of the URL for effective linking and access, whereas the part before "www", or "http://" and the part before it, can be regarded as not belonging to the URL. According to an embodiment of the application, when the character string contains "www", the part before "www" is removed (S440-S450); and/or, when the character string contains "//", the "//" and the part before it are removed (S460-S470).
Note that steps S410 to S470 are shown in the same flowchart for convenience of explanation and illustration. As mentioned above, however, some of these steps may be omitted, and the steps may be carried out individually or in combination.
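Steps S420 to S470 can be sketched as a single string-cleaning function. The symbol set is copied from the list in the text; the function name `clean_url_string`, the choice to return `None` for rejected strings, and treating the space as allowed at this stage are assumptions. Python strings are already Unicode, so the encoding step S410 is taken as given here.

```python
import string

# Letters, digits, the URL symbols listed in the text, and the space.
ALLOWED = set(string.ascii_letters + string.digits
              + "-_.~!*'();:&=+$,/%#[]" + " ")


def clean_url_string(text):
    """Return the cleaned string, or None if it cannot be a URL string."""
    text = text.lower()                         # S420: unify letter case
    if any(ch not in ALLOWED for ch in text):   # S430: forbidden symbol found
        return None
    if "www" in text:                           # S440-S450: drop text before "www"
        text = text[text.index("www"):]
    if "//" in text:                            # S460-S470: drop "//" and text before it
        text = text[text.index("//") + 2:]
    return text


print(clean_url_string("Visit http://WWW.Fujitsu.com/jp"))  # www.fujitsu.com/jp
```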
As shown in Fig. 5, according to an embodiment of the application, the step of identifying a URL may include: splitting the character string according to the spaces it contains and/or the distances between adjacent characters (S510).
In step S510, the character string may be split into several fragments according to the spaces in the OCR result, according to the stroke spacing obtained from the image, or according to both.
By the syntax rules of URLs, a URL contains no spaces. After the character string has been split, a potential URL may therefore be found within one of the string fragments. For example, when the recognized string is "visit us.fujitsu.com/computers", it can be split at the space into the two fragments "visit" and "us.fujitsu.com/computers", and subsequent steps can select the fragment that is a candidate URL or exclude the fragments that are not URLs.
Spaces can be determined directly from the spaces in the OCR result, or identified separately from the stroke spacing in the image. A combination of the two approaches may also be used. For example, for each space in the OCR result, the stroke spacing at the corresponding position in the image is checked, and a space is confirmed when the stroke spacing meets a predetermined standard. Such verification can improve the accuracy of space identification without excessively increasing the computational cost.
Note that the forbidden characters used in step S430, described above with reference to Fig. 4, to remove character strings containing characters forbidden in URLs do not include the space. Furthermore, although in the description above step S430 removes objects at the level of whole character strings to narrow the scope of subsequent recognition processing, step S430 may also be performed after the string has been split by spaces and/or distances (S510). In that case, string fragments containing forbidden URL characters are removed, so that objects are excluded at the level of string fragments, which further improves the accuracy of the processing.
As shown in Fig. 5, the step of identifying a URL may further include: based on the split character string, selecting the string fragments that contain keywords common in URLs as candidate URLs (S520).
For example, given a library of keywords common in URLs, a string fragment is determined to be a candidate URL when it contains a keyword from the library. The keywords include, for example, common URL suffixes such as ".com", ".net", ".gov", ".edu", ".info", ".cn", ".us", ".jp" and ".uk", but are not limited to these.
Note that steps S510 and S520 are shown in the same flowchart for convenience of explanation and illustration. As mentioned above, however, these steps may be carried out individually or combined with other processing.
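A minimal sketch of steps S510 and S520, using only the spaces in the OCR output for splitting (the stroke-spacing check against the image is omitted). The keyword tuple repeats the suffixes named in the text; the function name `candidate_urls` is hypothetical.

```python
URL_KEYWORDS = (".com", ".net", ".gov", ".edu", ".info",
                ".cn", ".us", ".jp", ".uk")


def candidate_urls(text, keywords=URL_KEYWORDS):
    """Split at whitespace (S510) and keep fragments with a URL keyword (S520)."""
    fragments = text.split()
    return [f for f in fragments if any(k in f for k in keywords)]


print(candidate_urls("visit us.fujitsu.com/computers"))
# ['us.fujitsu.com/computers']
```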
In addition, according to an embodiment of the application, the step of identifying a URL may further include one or both of the sub-steps shown in Fig. 6.
In step S610, dots that were missed in the OCR result are estimated and added according to the common combination rules of URLs.
During OCR, a dot in the image may touch an adjacent character and be misrecognized as part of that character, so that the dot is missing from the resulting character string. Whether a dot has been missed can, however, be estimated from the structural features and syntax rules of URLs. For example, when a candidate string contains the letters of one of the common URL prefixes or suffixes listed above but without the ".", a missed dot can be presumed. The image can additionally be analyzed at the position corresponding to the presumed missed dot in order to verify the estimate.
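A deliberately simple sketch of the estimate in step S610 is shown below. The rules (a dot after a leading "www", and a dot before a known suffix at the end of the host part) and the name `restore_dots` are assumptions for illustration; the patent does not specify the rules in this detail.

```python
SUFFIXES = ("com", "net", "gov", "edu", "info", "cn", "us", "jp", "uk")


def restore_dots(candidate):
    """Insert dots that appear to be missing around 'www' and a known suffix."""
    # Prefix rule: "wwwfoo" is presumed to be "www.foo".
    if candidate.startswith("www") and not candidate.startswith("www."):
        candidate = "www." + candidate[3:]
    # Suffix rule: only the host part (before the first "/") is inspected.
    host, sep, path = candidate.partition("/")
    for suffix in SUFFIXES:
        if (len(host) > len(suffix) and host.endswith(suffix)
                and not host.endswith("." + suffix)):
            host = host[:-len(suffix)] + "." + suffix
            break
    return host + sep + path


print(restore_dots("www.fujitsucom/jp"))  # www.fujitsu.com/jp
```

Purely textual rules like these produce false positives (a host that merely ends in the letters "us", for example), which is why the text suggests verifying each presumed dot against the image.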
In addition, the step of identifying a URL may further include S620: verifying the dots, hyphens and underscores in the OCR result according to position and shape features. For example, when the OCR result contains a dot ".", a hyphen "-" or an underscore "_", the OCR result can be verified against the shape and/or position of the corresponding stroke in the image, further improving the accuracy of character recognition. For example, "." and "-" can be distinguished by the position and shape of the corresponding stroke, "." and "_" by the shape of the corresponding stroke, and "-" and "_" by the position of the corresponding stroke.
Note that steps S610 and S620 are shown in the same flowchart for convenience of explanation and illustration. These steps may, however, be carried out individually or combined with other processing.
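The shape and position checks of step S620 might look like the following sketch, which works on the bounding rectangle of a stroke and the vertical extent of its text line. The function name `classify_mark` and both numeric thresholds are invented for the example.

```python
def classify_mark(top, bottom, left, right, line_top, line_bottom):
    """Classify a small stroke as ".", "-" or "_" (rows grow downward)."""
    width = right - left + 1
    height = bottom - top + 1
    # Shape: a dot is roughly as wide as it is tall; the bars are clearly wider.
    if width <= 1.5 * height:
        return "."
    # Position: an underscore sits near the bottom of the line, a hyphen near
    # the middle. The 0.25 offset is an assumed tolerance.
    center_y = (top + bottom) / 2
    line_center = (line_top + line_bottom) / 2
    low = center_y > line_center + 0.25 * (line_bottom - line_top)
    return "_" if low else "-"
```

The label returned here would then be compared with the character the OCR engine produced for the same stroke, and the OCR output corrected when they disagree.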
As shown in Fig. 7, according to an embodiment of the application, the step of identifying a URL may include: determining the URL contained in the OCR result by matching against a URL dictionary. When a candidate URL has been obtained from the OCR result, it can be provided to the user directly as the recognition result. Alternatively, the candidate URL can be matched against the actual URLs in an existing URL dictionary (S710). If the matching confidence is higher than a predetermined threshold ("Y" in S720), the matched URL from the URL dictionary is taken as the URL recognition result. If the matching confidence is not higher than the predetermined threshold ("N" in S720), the candidate URL obtained by OCR is taken as the URL recognition result. The matching confidence can be determined, for example, from the edit distance between the URLs being compared. This processing can further improve the accuracy of URL recognition.
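The dictionary matching of steps S710 and S720 can be sketched with a standard Levenshtein edit distance. The patent only says that confidence may be derived from the edit distance; the particular normalization used here (one minus distance divided by the longer length) and the threshold of 0.8 are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_url(candidate, dictionary, threshold=0.8):
    """Return the best dictionary URL if confident enough, else the candidate."""
    best, best_conf = None, 0.0
    for url in dictionary:
        longest = max(len(candidate), len(url)) or 1
        conf = 1.0 - edit_distance(candidate, url) / longest
        if conf > best_conf:
            best, best_conf = url, conf
    return best if best_conf > threshold else candidate


known = ["www.fujitsu.com", "www.example.org"]
print(match_url("www.fujitsu.corn", known))  # www.fujitsu.com
```

In the usage line, "corn" stands for a typical OCR confusion of "m" with "rn"; the candidate is two edits away from the dictionary entry, so the dictionary URL is returned.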
Several sub-processes of the URL identification step have been described above with reference to Figs. 4 to 7; these sub-processes can be combined as appropriate.
The embodiments of the invention for URL recognition can improve URL recognition accuracy compared with traditional URL recognition methods. Traditional methods generally constrain the form of a URL with simple syntax rules; because these rules are simple, they can correct only a limited set of errors and cannot correct errors that occur in the host name, for example when a symbol overlaps or touches a character. The invention can effectively avoid such errors. In addition, as mentioned above, the image processing method of the application can operate on video frames. Traditional multi-frame information fusion methods exploit the redundancy of video and select the output with the highest confidence, but for the same video segment the output of the OCR system is essentially the same, and such methods cannot handle URLs in still images. By contrast, the image processing method according to the embodiment is reasonably robust for both digital images and video frames.
Next, a configuration example of an image processing apparatus according to an embodiment of the application is described.
As shown in Fig. 8, the image processing apparatus 800 includes a symbol recognition part 810, a symbol removal part 820, an optical character recognition part 830 and a symbol adding part 840.
The symbol recognition part 810 is configured to recognize a predetermined symbol in the text region of an image. Different predetermined symbols can be set according to the object to be recognized. For example, when URLs are to be recognized, the predetermined symbols may include the common URL separators; when mathematical formulas are to be recognized, the predetermined symbols may be set to specific mathematical symbols. The invention is not limited to the recognition objects and predetermined symbols listed here; many specific settings are possible for different objects that mix symbols and text.
The symbol removal part 820 is configured to remove from the text region the part corresponding to the predetermined symbol recognized by the symbol recognition part 810, that is, to remove the corresponding stroke or connected component from the image.
The optical character recognition part 830 is configured to perform optical character recognition on the text region from which the symbol removal part 820 has removed the corresponding part.
The symbol adding part 840 is configured to add the predetermined symbol recognized by the symbol recognition part 810 to the corresponding position in the optical character recognition result of the optical character recognition part 830.
The image processing apparatus according to the embodiment recognizes the predetermined symbols separately and performs OCR on the text region from which they have been removed. This reduces the misrecognition that the mixture of symbols and text causes in OCR, thereby improving the accuracy of the final recognition result, and can also improve processing efficiency.
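How the four parts of Fig. 8 fit together can be sketched as a small class that chains four injected callables. The class name `ImageProcessor`, the field names and the call signatures are assumptions for illustration; the patent does not prescribe a software structure.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ImageProcessor:
    """Chains four parts modelled on parts 810, 820, 830 and 840."""
    recognize_symbols: Callable[[Any], Any]      # symbol recognition part
    remove_symbols: Callable[[Any, Any], Any]    # symbol removal part
    run_ocr: Callable[[Any], Any]                # optical character recognition part
    add_symbols: Callable[[Any, Any], str]       # symbol adding part

    def process(self, text_region):
        symbols = self.recognize_symbols(text_region)
        cleaned = self.remove_symbols(text_region, symbols)
        ocr_result = self.run_ocr(cleaned)
        return self.add_symbols(ocr_result, symbols)
```

Injecting the parts as callables keeps the sketch independent of any particular OCR engine: each part can be replaced by a stub in tests or by a real implementation in production code.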
As mentioned above, according to the image processing apparatus of the embodiment of the present application, can carry out optical character identification for the image that comprises URL.Correspondingly, as shown in Figure 9, Symbol recognition part 910 can comprise radix point recognition unit 912 and back slash recognition unit 914, for these the two kinds of conventional separators in identification URL.
Wherein, can be configured to according to following standard be radix point by the stroke recognition in text filed to radix point recognition unit 912: (i) this stroke is less than preassigned with respect to the relative size of other strokes in character zone; (ii) this stroke is positioned at below the center line of text filed corresponding line of text; And (iii) the quantity of the foreground pixel in the circumscribed rectangular region of this stroke and background pixel than higher than predetermined threshold.
The backslash recognition unit 914 may be configured to recognize a stroke in the text region as a backslash according to the following criteria: (i) when the bounding rectangle of the stroke is divided into a plurality of blocks, the blocks on the diagonal from the lower left to the upper right of the rectangle contain foreground pixels, while the blocks at the upper-left and lower-right corners contain no foreground pixels; and (ii) the inclination angle of the stroke is within a predetermined range. The backslash recognition unit 914 may be configured to determine the inclination angle of the stroke by principal component analysis (PCA). In addition, according to a specific embodiment, the predetermined range is an inclination angle of between 30° and 90° with respect to the horizontal direction of the corresponding text line.
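Both criteria can be checked on the foreground pixel coordinates of a stroke. The following sketch makes several assumptions that are not in the text: a 3 x 3 block grid, the hypothetical function names `principal_angle_deg` and `looks_like_slash`, and a closed-form 2D PCA (the orientation of the major axis of the pixel covariance) in place of a general PCA routine. The lower-left-to-upper-right diagonal of criterion (i) matches the shape of the "/" separator used in URLs.

```python
import math
from typing import List, Tuple

Pixel = Tuple[int, int]  # (x, y) in image coordinates, y growing downward


def principal_angle_deg(pixels: List[Pixel]) -> float:
    """Angle of the first principal axis against the horizontal, in (-90, 90]."""
    n = len(pixels)
    pts = [(x, -y) for x, y in pixels]           # flip y so that "up" is positive
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Orientation of the major axis of a 2x2 covariance matrix.
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))


def looks_like_slash(pixels: List[Pixel], grid: int = 3,
                     min_deg: float = 30.0, max_deg: float = 90.0) -> bool:
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    left, top = min(xs), min(ys)
    width = max(xs) - left + 1
    height = max(ys) - top + 1
    # Blocks (column, row) of the bounding rectangle that contain foreground.
    occupied = {((x - left) * grid // width, (y - top) * grid // height)
                for x, y in pixels}
    diagonal = {(i, grid - 1 - i) for i in range(grid)}   # lower left to upper right
    corners = {(0, 0), (grid - 1, grid - 1)}              # upper left, lower right
    if not diagonal <= occupied or occupied & corners:
        return False
    return min_deg <= principal_angle_deg(pixels) <= max_deg


slash = [((17 - y) // 2, y) for y in range(18)]      # a "/"-shaped stroke
mirrored = [(8 - x, y) for x, y in slash]            # the same stroke, mirrored
bar = [(0, y) for y in range(10)]                    # a vertical bar
print(looks_like_slash(slash), looks_like_slash(mirrored), looks_like_slash(bar))
```

The block test alone already rejects the mirrored stroke and the vertical bar; the angle test additionally rejects shallow strokes whose bounding rectangle happens to satisfy the block pattern.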
As shown in Figure 10, according to an embodiment of the present application, an image processing apparatus 1000 configured to recognize a URL in an image includes a symbol recognition part 1010, a symbol removal part 1020, an optical character recognition part 1030, a symbol adding part 1040, and a URL recognition part 1050. The configurations of the symbol recognition part 1010, the symbol removal part 1020, the optical character recognition part 1030, and the symbol adding part 1040 are similar to those of the corresponding parts described above and are not repeated here.
The URL recognition part 1050 is configured to recognize, according to predetermined grammar rules, a URL in the character string obtained when the symbol adding part 1040 adds the predetermined symbol to the recognition result of the optical character recognition part 1030. The URL recognition part 1050 may be configured to recognize URLs in various ways, for example in an existing manner based on grammar rules.
In addition, Figures 11 to 14 show configuration examples of the URL recognition part.
As shown in Figure 11, the URL recognition part may include an encoding unit 1152, a format conversion unit 1154, a character string screening unit 1156, and a character string trimming unit 1158.
The encoding unit 1152 is configured to represent the character string in a unified encoding format, for example ASCII.
The format conversion unit 1154 is configured to convert the letters in the character string to lowercase.
The encoding unit 1152 and the format conversion unit 1154 serve to unify the format of the optical character recognition result so as to facilitate subsequent processing. However, these units may also be included in the optical character recognition part or in a corresponding subsequent processing part, and are not necessarily included in the URL recognition part.
The character string screening unit 1156 is configured to remove character strings that contain symbols forbidden in URLs, thereby narrowing the range of objects for the subsequent recognition processing.
The character string trimming unit 1158 is configured to remove the part preceding "www" when the character string contains "www", or to remove "//" and the part preceding it when the character string contains "//". The length of the character string to be processed can thereby be reduced.
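The four units of Figure 11 amount to simple string preprocessing. The sketch below is hypothetical in several respects: the function names, the use of Unicode NFKC folding before the ASCII conversion (so that full-width letters map to their ASCII forms), and the particular set of forbidden symbols are assumptions, since the text does not enumerate them.

```python
import unicodedata

# Assumed set of symbols that may not appear unencoded in a URL.
# The space is deliberately not included.
FORBIDDEN = set('<>"{}|\\^`')


def normalize(text: str) -> str:
    """Encoding unit and format conversion unit: ASCII, lowercase."""
    folded = unicodedata.normalize("NFKC", text)   # full-width forms to ASCII forms
    return folded.encode("ascii", "ignore").decode("ascii").lower()


def has_forbidden_symbol(text: str) -> bool:
    """Screening unit: strings for which this returns True are discarded."""
    return any(ch in FORBIDDEN for ch in text)


def trim_prefix(text: str) -> str:
    """Trimming unit: drop what precedes "www", or "//" and what precedes it."""
    www = text.find("www")
    if www >= 0:
        return text[www:]
    if "//" in text:
        return text.split("//", 1)[1]
    return text


print(trim_prefix(normalize("Visit HTTP://WWW.Example.COM")))
```

Checking for "www" before "//" follows the order in which the two cases are stated above; an implementation could equally test them in the other order.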
As shown in Figure 12, the URL recognition part may include a string segmentation unit 1252 and a candidate selection unit 1254.
The string segmentation unit 1252 is configured to segment the character string according to spaces and/or the distance between adjacent characters. The string segmentation unit 1252 may be configured to segment directly at the spaces in the optical character recognition result, or to recognize spaces separately from the stroke spacing in the image and segment at the spaces so recognized. Alternatively, the string segmentation unit 1252 may be configured to combine the two approaches: for example, for each space in the optical character recognition result, the stroke spacing at the corresponding position in the image is checked; when the stroke spacing meets a predetermined standard, the space is determined to exist, and the verified spaces are used to segment the character string.
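Segmentation by the distance between adjacent characters can be sketched as follows. The function name `split_by_gaps`, the `(character, left, right)` input format, and the rule that a gap wider than a fixed fraction of the mean character width counts as a space are all assumptions made for illustration.

```python
from typing import List, Tuple

CharBox = Tuple[str, int, int]  # (character, left edge, right edge)


def split_by_gaps(chars: List[CharBox], gap_factor: float = 0.6) -> List[str]:
    """Split a text line wherever the gap between neighbours is unusually wide."""
    if not chars:
        return []
    mean_width = sum(right - left for _, left, right in chars) / len(chars)
    segments = [chars[0][0]]
    for (_, _, prev_right), (ch, left, _) in zip(chars, chars[1:]):
        if left - prev_right > gap_factor * mean_width:
            segments.append(ch)       # wide gap: start a new segment
        else:
            segments[-1] += ch        # normal spacing: extend the current segment
    return segments


boxes = [("g", 0, 10), ("o", 12, 22), ("a", 40, 50), ("b", 52, 62)]
print(split_by_gaps(boxes))
```

The same gap test could be applied only at positions where the OCR result already reports a space, which would correspond to the combined approach described above.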
The candidate selection unit 1254 is configured to select, from the character strings segmented by the string segmentation unit 1252, the character string parts that contain keywords common in URLs (for example, those listed above) as candidate URLs.
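Candidate selection reduces to a keyword filter over the segments. In this sketch the keyword list and the function name `select_candidates` are assumptions; the keywords actually intended are those enumerated earlier in the description.

```python
from typing import List

# Assumed examples of keywords that commonly occur in URLs.
URL_KEYWORDS = ("www", "http", ".com", ".net", ".org", ".cn")


def select_candidates(segments: List[str]) -> List[str]:
    """Keep only the segments that contain at least one common URL keyword."""
    return [s for s in segments if any(k in s.lower() for k in URL_KEYWORDS)]


print(select_candidates(["visit", "www.example.com", "today"]))
```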
It should be noted that the forbidden symbols used by the character string screening unit 1156 described above with reference to Figure 11 do not include the space. In addition, although in the description above the character string screening unit 1156 excludes objects at the level of whole character strings, it may also cooperate with the string segmentation unit 1252 so as to screen character string fragments after the character string has been segmented according to spaces and/or distances.
As shown in Figure 13, the URL recognition part may include a missed-recognition determination unit 1352 and a symbol verification unit 1354.
The missed-recognition determination unit 1352 is configured to estimate and add radix points that were missed in the optical character recognition result, according to the combination rules commonly used in URLs. For example, the missed-recognition determination unit 1352 may be configured to estimate that a missed radix point exists when a candidate character string contains the letters of one of the above-mentioned common URL prefixes or suffixes but no ".". In addition, the missed-recognition determination unit 1352 may be further configured to analyze the image at the position corresponding to the estimated missed radix point in order to verify the estimate.
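The estimation step can be sketched on strings alone. The function name `restore_missing_dots`, the suffix list, and the restriction to the "www" prefix are assumptions for illustration; the image-based verification described above is not modelled here.

```python
# Assumed examples of common URL suffixes.
SUFFIXES = ("com", "net", "org", "cn")


def restore_missing_dots(candidate: str) -> str:
    """Insert a "." after a leading "www" and before a trailing suffix if absent."""
    if candidate.startswith("www") and not candidate.startswith("www."):
        candidate = "www." + candidate[3:]
    host, sep, path = candidate.partition("/")
    for suffix in SUFFIXES:
        if (host.endswith(suffix) and not host.endswith("." + suffix)
                and len(host) > len(suffix)):
            host = host[:-len(suffix)] + "." + suffix
            break
    return host + sep + path


print(restore_missing_dots("wwwexamplecom/a"))
```

A purely textual rule like this can produce false positives, for example for a host name that merely ends in the letters "com", which is why checking the estimated position against the image, as described above, is useful.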
The symbol verification unit 1354 is configured to verify radix points and horizontal bars in the optical character recognition result according to position and shape features. For example, when the optical character recognition result contains a radix point ".", a hyphen "-", or an underscore "_", the symbol verification unit 1354 may be configured to verify that result according to the shape and/or position of the corresponding stroke in the image.
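One possible form of this verification compares the OCR character with a class derived from the bounding box of the stroke. The function names and every numeric threshold below are assumptions chosen for illustration and are not taken from the text.

```python
from typing import Optional


def classify_small_stroke(top: int, width: int, height: int,
                          line_top: int, line_bottom: int) -> Optional[str]:
    """Guess ".", "-" or "_" from the shape and vertical position of a stroke."""
    # Relative vertical position of the stroke center: 0 = line top, 1 = line bottom.
    rel_y = (top + height / 2 - line_top) / (line_bottom - line_top)
    if width <= 1.5 * height:                 # roughly square: radix point
        return "." if rel_y > 0.5 else None
    if rel_y > 0.8:                           # wide and at the bottom: underscore
        return "_"
    if 0.3 <= rel_y <= 0.7:                   # wide and near the middle: hyphen
        return "-"
    return None


def verify_symbol(ocr_char: str, top: int, width: int, height: int,
                  line_top: int, line_bottom: int) -> bool:
    """Accept the OCR character only if the stroke geometry agrees with it."""
    return classify_small_stroke(top, width, height, line_top, line_bottom) == ocr_char


# Text line spanning rows 0..30 (y grows downward).
print(verify_symbol(".", 25, 4, 4, 0, 30),
      verify_symbol("-", 14, 10, 2, 0, 30),
      verify_symbol("_", 28, 10, 2, 0, 30))
```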
As shown in Figure 14, the URL recognition part may include a matching unit 1452 configured to determine the uniform resource locator contained in the optical character recognition result by matching against a URL dictionary. Specifically, for a candidate URL obtained from the optical character recognition result, the matching unit 1452 may be configured to match the candidate URL against the actual URLs in an existing URL dictionary. When the matching confidence is higher than a predetermined threshold, the matching URL in the URL dictionary is determined as the URL recognition result; when the matching confidence is not higher than the predetermined threshold, the candidate URL obtained by optical character recognition is determined as the URL recognition result. The matching confidence may be determined according to the edit distance between the URLs being compared.
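Dictionary matching with an edit-distance-based confidence can be sketched as follows. The Levenshtein distance is a standard choice of edit distance; the normalization of the confidence to `1 - distance / max_length`, the threshold of 0.8, and the function names are assumptions, since the text only states that the confidence may be derived from the edit distance.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def match_url(candidate: str, dictionary: Iterable[str],
              threshold: float = 0.8) -> str:
    """Return the best dictionary entry if it is confident enough, else the candidate."""
    best, best_conf = candidate, 0.0
    for url in dictionary:
        longest = max(len(candidate), len(url), 1)
        conf = 1 - edit_distance(candidate, url) / longest
        if conf > best_conf:
            best, best_conf = url, conf
    return best if best_conf > threshold else candidate


known = ["www.example.com", "www.example.org"]
print(match_url("www.examp1e.com", known))
print(match_url("www.unknown-site.net", known))
```

In the first call a single misrecognized character is corrected from the dictionary; in the second call no entry is close enough, so the OCR candidate is returned unchanged.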
For convenience of explanation and illustration, the configuration examples of the URL recognition part have been described separately above with reference to Figures 11 to 14. However, the units in these configuration examples may be provided individually or combined as appropriate.
As an example, each step of the above method and each module and/or unit of the above apparatus may be implemented as software, firmware, hardware, or a combination thereof. When implemented by software or firmware, a program constituting the software for implementing the above method can be installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1500 shown in Figure 15). When the various programs are installed, this computer can perform various functions.
In Figure 15, a central processing unit (CPU) 1501 performs various processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage section 1508 into a random access memory (RAM) 1503. The RAM 1503 also stores, as needed, the data required when the CPU 1501 performs the various processing. The CPU 1501, the ROM 1502, and the RAM 1503 are connected to one another via a bus 1504. An input/output interface 1505 is also connected to the bus 1504.
The following components are connected to the input/output interface 1505: an input section 1506 (including a keyboard, a mouse, etc.), an output section 1507 (including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, etc.), a storage section 1508 (including a hard disk, etc.), and a communication section 1509 (including a network interface card such as a LAN card, a modem, etc.). The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 may also be connected to the input/output interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it is installed into the storage section 1508 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 1511.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1511 shown in Figure 15, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 1511 include a magnetic disk (including a floppy disk (registered trademark)), an optical disc (including a compact disc read-only memory (CD-ROM) and a digital versatile disc (DVD)), a magneto-optical disk (including a MiniDisc (MD) (registered trademark)), and a semiconductor memory. Alternatively, the storage medium may be the ROM 1502, a hard disk included in the storage section 1508, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the method according to the embodiments of the present invention described above can be performed.
Accordingly, a storage medium carrying the above program product storing machine-readable instruction code is also included in the disclosure of the present invention. The storage medium includes, but is not limited to, a floppy disk, an optical disc, a magneto-optical disk, a memory card, a memory stick, and the like.
In the above description of specific embodiments of the present invention, features described and/or illustrated for one embodiment may be used in the same or a similar manner in one or more other embodiments, may be combined with features of other embodiments, or may replace features of other embodiments.
It should be emphasized that the term "comprise/include", when used herein, refers to the presence of a feature, element, step, or component, but does not exclude the presence or addition of one or more other features, elements, steps, or components.
In the above embodiments and examples, reference numerals composed of digits are used to denote the steps and/or units. Those of ordinary skill in the art should understand that these reference numerals are used only for ease of description and drawing, and do not indicate an order or any other limitation.
In addition, the method of the present invention is not limited to being performed in the chronological order described in the specification, and may also be performed in another chronological order, in parallel, or independently. Therefore, the order of execution of the method described in this specification does not limit the technical scope of the present invention.
Although the present invention has been disclosed above through the description of specific embodiments, it should be understood that all of the above embodiments and examples are illustrative and not restrictive. Those skilled in the art may devise various modifications, improvements, or equivalents of the present invention within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents should also be considered to fall within the scope of protection of the present invention.
From the above description, it can be seen that the present application discloses at least the following technical solutions:
Supplementary note 1. An image processing method, comprising:
Recognizing a predetermined symbol in a text region of an image;
Removing, from the text region, the part corresponding to the recognized predetermined symbol;
Performing optical character recognition on the text region from which the corresponding part has been removed; and
Adding the recognized predetermined symbol to the corresponding position in the result of the optical character recognition.
Supplementary note 2. The method according to supplementary note 1, wherein the predetermined symbol comprises a separator used in uniform resource locators.
Supplementary note 3. The method according to supplementary note 1, wherein the predetermined symbol comprises a radix point, and the step of recognizing the predetermined symbol comprises recognizing a stroke in the text region as a radix point according to the following criteria:
The relative size of the stroke with respect to the other strokes in the text region is smaller than a predetermined standard;
The stroke is located below the center line of the corresponding text line of the text region; and
The ratio of the number of foreground pixels to the number of background pixels within the bounding rectangle of the stroke is higher than a predetermined threshold.
Supplementary note 4. The method according to supplementary note 1, wherein the predetermined symbol comprises a backslash, and the step of recognizing the predetermined symbol comprises recognizing a stroke in the text region as a backslash according to the following criteria:
When the bounding rectangle of the stroke is divided into a plurality of blocks, the blocks on the diagonal from the lower left to the upper right of the rectangle contain foreground pixels, and the blocks at the upper-left and lower-right corners of the rectangle contain no foreground pixels; and
The inclination angle of the stroke is within a predetermined range.
Supplementary note 5. The method according to supplementary note 4, wherein the inclination angle of the stroke is determined by principal component analysis (PCA).
Supplementary note 6. The method according to supplementary note 4, wherein the predetermined range of the inclination angle is an inclination angle of between 30° and 90° with respect to the horizontal direction of the corresponding text line.
Supplementary note 7. The method according to supplementary note 2, further comprising:
Recognizing, according to predetermined grammar rules, a uniform resource locator in the character string obtained by adding the predetermined symbol to the optical character recognition result.
Supplementary note 8. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Representing the character string in a unified encoding format.
Supplementary note 9. The method according to supplementary note 8, wherein the unified encoding format comprises ASCII.
Supplementary note 10. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Converting the letters in the character string to lowercase.
Supplementary note 11. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Removing character strings that contain symbols forbidden in uniform resource locators.
Supplementary note 12. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
When the character string contains "www", removing the part preceding "www"; or
When the character string contains "//", removing "//" and the part preceding it.
Supplementary note 13. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Segmenting the character string according to spaces and/or the distance between adjacent characters.
Supplementary note 14. The method according to supplementary note 13, wherein the step of recognizing the uniform resource locator further comprises:
Selecting, from the segmented character strings, the character string parts that contain keywords common in uniform resource locators as candidate uniform resource locators.
Supplementary note 15. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Estimating and adding radix points missed in the optical character recognition result, according to the combination rules commonly used in uniform resource locators.
Supplementary note 16. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Verifying radix points and horizontal bars in the optical character recognition result according to position and shape features.
Supplementary note 17. The method according to supplementary note 7, wherein the step of recognizing the uniform resource locator comprises:
Determining the uniform resource locator contained in the result of the optical character recognition by matching against a uniform resource locator dictionary.
Supplementary note 18. The method according to supplementary note 17, wherein the confidence of the matching is determined according to edit distance.
Supplementary note 19. The method according to any one of the preceding supplementary notes, wherein the image comprises a video frame.
Supplementary note 20. An image processing apparatus, comprising:
A symbol recognition part configured to recognize a predetermined symbol in a text region of an image;
A symbol removal part configured to remove, from the text region, the part corresponding to the recognized predetermined symbol;
An optical character recognition part configured to perform optical character recognition on the text region from which the corresponding part has been removed; and
A symbol adding part configured to add the recognized predetermined symbol to the corresponding position in the result of the optical character recognition.
Supplementary note 21. The apparatus according to supplementary note 20, wherein the predetermined symbol comprises a separator used in uniform resource locators.
Supplementary note 22. The apparatus according to supplementary note 20, wherein the symbol recognition part comprises a radix point recognition unit configured to recognize a stroke in the text region as a radix point according to the following criteria:
The relative size of the stroke with respect to the other strokes in the text region is smaller than a predetermined standard;
The stroke is located below the center line of the corresponding text line of the text region; and
The ratio of the number of foreground pixels to the number of background pixels within the bounding rectangle of the stroke is higher than a predetermined threshold.
Supplementary note 23. The apparatus according to supplementary note 20, wherein the symbol recognition part comprises a backslash recognition unit configured to recognize a stroke in the text region as a backslash according to the following criteria:
When the bounding rectangle of the stroke is divided into a plurality of blocks, the blocks on the diagonal from the lower left to the upper right of the rectangle contain foreground pixels, and the blocks at the upper-left and lower-right corners of the rectangle contain no foreground pixels; and
The inclination angle of the stroke is within a predetermined range.
Supplementary note 24. The apparatus according to supplementary note 23, wherein the backslash recognition unit is configured to determine the inclination angle of the stroke by principal component analysis (PCA).
Supplementary note 25. The apparatus according to supplementary note 23, wherein the predetermined range of the inclination angle is an inclination angle of between 30° and 90° with respect to the horizontal direction of the corresponding text line.
Supplementary note 26. The apparatus according to supplementary note 21, further comprising:
A uniform resource locator recognition part configured to recognize, according to predetermined grammar rules, a uniform resource locator in the character string obtained by adding the predetermined symbol to the optical character recognition result.
Supplementary note 27. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
An encoding unit configured to represent the character string in a unified encoding format.
Supplementary note 28. The apparatus according to supplementary note 27, wherein the unified encoding format comprises ASCII.
Supplementary note 29. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
A format conversion unit configured to convert the letters in the character string to lowercase.
Supplementary note 30. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
A character string screening unit configured to remove character strings that contain symbols forbidden in uniform resource locators.
Supplementary note 31. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises a character string trimming unit configured to:
When the character string contains "www", remove the part preceding "www"; or
When the character string contains "//", remove "//" and the part preceding it.
Supplementary note 32. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
A string segmentation unit configured to segment the character string according to spaces and/or the distance between adjacent characters.
Supplementary note 33. The apparatus according to supplementary note 32, wherein the uniform resource locator recognition part further comprises:
A candidate selection unit configured to select, from the segmented character strings, the character string parts that contain keywords common in uniform resource locators as candidate uniform resource locators.
Supplementary note 34. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
A missed-recognition determination unit configured to estimate and add radix points missed in the optical character recognition result, according to the combination rules commonly used in uniform resource locators.
Supplementary note 35. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
A symbol verification unit configured to verify radix points and horizontal bars in the optical character recognition result according to position and shape features.
Supplementary note 36. The apparatus according to supplementary note 26, wherein the uniform resource locator recognition part comprises:
A matching unit configured to determine the uniform resource locator contained in the result of the optical character recognition by matching against a uniform resource locator dictionary.
Supplementary note 37. The apparatus according to supplementary note 36, wherein the matching unit is configured to determine the confidence of the matching according to edit distance.
Supplementary note 38. The apparatus according to any one of the preceding supplementary notes, wherein the image comprises a video frame.

Claims (10)

1. An image processing method, comprising:
Recognizing a predetermined symbol in a text region of an image;
Removing, from the text region, the part corresponding to the recognized predetermined symbol;
Performing optical character recognition on the text region from which the corresponding part has been removed; and
Adding the recognized predetermined symbol to the corresponding position in the result of the optical character recognition.
2. The method according to claim 1, wherein the predetermined symbol comprises a separator used in uniform resource locators.
3. The method according to claim 1, wherein the predetermined symbol comprises a radix point, and the step of recognizing the predetermined symbol comprises recognizing a stroke in the text region as a radix point according to the following criteria:
The relative size of the stroke with respect to the other strokes in the text region is smaller than a predetermined standard;
The stroke is located below the center line of the corresponding text line of the text region; and
The ratio of the number of foreground pixels to the number of background pixels within the bounding rectangle of the stroke is higher than a predetermined threshold.
4. The method according to claim 1, wherein the predetermined symbol comprises a backslash, and the step of recognizing the predetermined symbol comprises recognizing a stroke in the text region as a backslash according to the following criteria:
When the bounding rectangle of the stroke is divided into a plurality of blocks, the blocks on the diagonal from the lower left to the upper right of the rectangle contain foreground pixels, and the blocks at the upper-left and lower-right corners of the rectangle contain no foreground pixels; and
The inclination angle of the stroke is within a predetermined range.
5. The method according to claim 2, further comprising:
Recognizing, according to predetermined grammar rules, a uniform resource locator in the character string obtained by adding the predetermined symbol to the optical character recognition result.
6. The method according to claim 5, wherein the step of recognizing the uniform resource locator comprises:
When the character string contains "www", removing the part preceding "www"; or
When the character string contains "//", removing "//" and the part preceding it.
7. The method according to claim 5, wherein the step of recognizing the uniform resource locator comprises:
Segmenting the character string according to spaces and/or the distance between adjacent characters.
8. The method according to claim 5, wherein the step of recognizing the uniform resource locator comprises:
Determining the uniform resource locator contained in the result of the optical character recognition by matching against a uniform resource locator dictionary.
9. The method according to any one of the preceding claims, wherein the image comprises a video frame.
10. An image processing apparatus, comprising:
A symbol recognition part configured to recognize a predetermined symbol in a text region of an image;
A symbol removal part configured to remove, from the text region, the part corresponding to the recognized predetermined symbol;
An optical character recognition part configured to perform optical character recognition on the text region from which the corresponding part has been removed; and
A symbol adding part configured to add the recognized predetermined symbol to the corresponding position in the result of the optical character recognition.
CN201310101523.0A 2013-03-27 2013-03-27 Image processing method and image processing device Pending CN104077593A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310101523.0A CN104077593A (en) 2013-03-27 2013-03-27 Image processing method and image processing device
JP2014033893A JP2014191825A (en) 2013-03-27 2014-02-25 Image processing method and image processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310101523.0A CN104077593A (en) 2013-03-27 2013-03-27 Image processing method and image processing device

Publications (1)

Publication Number Publication Date
CN104077593A true CN104077593A (en) 2014-10-01

Family

ID=51598839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310101523.0A Pending CN104077593A (en) 2013-03-27 2013-03-27 Image processing method and image processing device

Country Status (2)

Country Link
JP (1) JP2014191825A (en)
CN (1) CN104077593A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7039882B2 (en) * 2017-08-16 2022-03-23 富士フイルムビジネスイノベーション株式会社 Image analysis device and image analysis program

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101441713A (en) * 2007-11-19 2009-05-27 汉王科技股份有限公司 Optical character recognition method and apparatus of PDF document
CN101520851A (en) * 2008-02-29 2009-09-02 富士通株式会社 Character information identification device and method
CN101593276A (en) * 2008-05-29 2009-12-02 汉王科技股份有限公司 A kind of video OCR image-text separation method and system
CN102236800A (en) * 2010-05-03 2011-11-09 微软公司 Word recognition of text undergoing an OCR process
CN102654874A (en) * 2011-03-02 2012-09-05 顾菊林 Bill data management method and system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62298885A (en) * 1986-06-18 1987-12-25 Hitachi Ltd Optical character read system
JPH09274646A (en) * 1996-04-05 1997-10-21 Zakuson R & D:Kk Automatic recognition method for url
JP4575630B2 (en) * 2001-08-29 2010-11-04 パナソニック株式会社 URL information acquisition device
JP4431335B2 (en) * 2003-08-07 2010-03-10 日立オムロンターミナルソリューションズ株式会社 String reader
JP2006244243A (en) * 2005-03-04 2006-09-14 Canon Inc On-demand catalog creating system
CN102024139A (en) * 2009-09-18 2011-04-20 富士通株式会社 Device and method for recognizing character strings

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109961063A (en) * 2017-12-26 2019-07-02 杭州海康机器人技术有限公司 Text detection method and device, computer equipment and storage medium
CN108416555A (en) * 2018-03-26 2018-08-17 海航货运有限公司 Aviation cargo and mail waybill data processing method and device
CN109815946A (en) * 2018-12-03 2019-05-28 东南大学 Multithreaded business license localization and recognition method based on densely connected network
CN109766885A (en) * 2018-12-29 2019-05-17 北京旷视科技有限公司 Character detection method, device, electronic equipment and storage medium
CN109766885B (en) * 2018-12-29 2022-01-18 北京旷视科技有限公司 Character detection method and device, electronic equipment and storage medium
CN110059214A (en) * 2019-04-01 2019-07-26 北京奇艺世纪科技有限公司 Image resource processing method and processing device

Also Published As

Publication number Publication date
JP2014191825A (en) 2014-10-06

Similar Documents

Publication Publication Date Title
CN104077593A (en) Image processing method and image processing device
CN111563509B (en) Tesseract-based substation terminal row identification method and system
CN103400099B (en) Terminal and two-dimensional code identification method
CN105631393A (en) Information recognition method and device
CN104866542A (en) POI data verification method and device
CN109685870B (en) Information labeling method and device, labeling equipment and storage medium
CN104852889A (en) Picture verification code generation method and system, verification method, client, and server
KR100964792B1 (en) System and method of content adaptation for mobile web conditions
US20160259991A1 (en) Method and image processing apparatus for performing optical character recognition (ocr) of an article
CN102024138A (en) Character identification method and character identification device
CN106446898A (en) Extraction method and extraction device of character information in image
RU2605078C2 (en) Image segmentation for data verification
CN101625752B (en) Image processing apparatus and image processing method
CN102243707B (en) Character recognition result verification apparatus and character recognition result verification method
CN113361462B (en) Method and device for video processing and caption detection model
CN103677502A (en) Method and device for displaying pictures
CN104765630A (en) Software installation method and software installation device
CN112396048A (en) Picture information extraction method and device, computer equipment and storage medium
CN107622046A (en) Algorithm for extracting text summaries based on keywords
CN108881665B (en) Information processing apparatus, information processing method, and computer program
CN105677718A (en) Character retrieval method and apparatus
WO2022105120A1 (en) Text detection method and apparatus from image, computer device and storage medium
CN113033333B (en) Entity word recognition method, entity word recognition device, electronic equipment and storage medium
CN115331247A (en) Document structure identification method and device, electronic equipment and readable storage medium
CN113055708B (en) Program copyright protection method and device based on station caption identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandonment: 2019-02-15