WO2020173008A1 - Text recognition method and apparatus - Google Patents

Text recognition method and apparatus

Info

Publication number
WO2020173008A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
area
picture
target
target picture
Prior art date
Application number
PCT/CN2019/088978
Inventor
许洋
刘鹏
王健宗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020173008A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching

Definitions

  • This application relates to the field of computer technology, in particular to a text recognition method and device.
  • Optical character recognition (OCR) services are divided into two types: general OCR and specific-scene OCR.
  • In theory, general OCR can recognize all the characters contained in a picture, but its recognition accuracy is low; owing to the characteristics of general OCR itself, the specific meaning of the characters in the picture cannot be obtained.
  • Specific-scene OCR recognizes specific pictures better than general OCR, and the meaning of the characters in the picture can be obtained.
  • Specific-scene OCR only supports pictures containing specified text carriers (such as pictures of ID cards, driving licenses, value-added tax invoices, and the like).
  • Using specific-scene OCR requires customizing a template for the target type of picture and training a model on a large number of pictures of that type, which results in a long template generation cycle.
  • Because there are many types of text carriers, the cost of producing multiple recognition templates to cover them is too high, and the market demand for some recognition templates is low, which makes cost recovery difficult.
  • The embodiments of the present application provide a text recognition method and device that can quickly and effectively recognize the text meaning of a text carrier in a picture to be recognized.
  • an embodiment of the present application provides a text recognition method, which may include:
  • the anchor area and the text area have a positional correspondence relationship, and the text area includes text information defining the meaning of the text;
  • the target text area to be recognized in the target picture is determined from the positional correspondence between the anchor area and the text area in the picture recognition template.
  • The embodiments of this application not only solve the problem that general OCR cannot obtain the meaning of picture text, but also eliminate the need for deep customization for a specific text carrier and for a large amount of training data for that carrier; with a shorter development cycle, styles that have no ready-made template can be supported, and the meaning of text with a regular pattern and form can be obtained quickly.
  • an embodiment of the present application provides a text recognition device, which may include:
  • the generating unit is configured to generate a picture recognition template according to the preset anchor area and its corresponding text area; the anchor area and the text area have a positional correspondence, and the text area includes text information whose meaning is defined;
  • the positioning unit is configured to, when the target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template;
  • the reading unit is configured to obtain the text meaning of the text information in the target text area according to the defined text meaning.
  • An embodiment of the present application provides a network device, including a processor and a memory that are connected to each other, where the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method described in the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program; the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method described in the first aspect.
  • FIG. 1 is a schematic diagram of a text recognition system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a text recognition method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of perspective transformation provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of a text recognition device provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • the embodiments of the present application provide a text recognition method and device, which can quickly and effectively recognize the text meaning of a text carrier in a picture to be recognized.
  • OCR: Optical Character Recognition.
  • Perspective transformation uses the condition that the perspective center, the image point, and the target point are collinear to rotate the image-bearing surface (perspective surface) around the trace line (perspective axis) by a certain angle according to the law of perspective rotation; this destroys the original bundle of projection rays while keeping the projected geometric figure on the image-bearing surface unchanged.
  • FIG. 1 is a schematic diagram of a text recognition system architecture provided by an embodiment of the present application.
  • The system architecture includes a terminal, a target picture, and text meaning.
  • The terminal can be a mobile phone, tablet computer, notebook computer, palmtop computer, mobile Internet device, or other mobile terminal.
  • A terminal can be a peripheral device in a computer network and can be used to input information and output processing results. It can also be referred to as a system, user unit, user station, mobile station, remote station, remote terminal, mobile device, user terminal, mobile terminal, wireless communication device, user agent, user device, service device with a plug-in installed, or user equipment (UE).
  • The terminal may be a cellular phone, mobile phone, cordless phone, smart watch, wearable device, tablet device, session initiation protocol (SIP) phone, wireless local loop (WLL) station, personal digital assistant (PDA), handheld device with wireless communication function, computing device, in-vehicle communication module, smart meter, or other processing device connected to a wireless modem.
  • The terminal may perform the following operations: generate a picture recognition template according to a preset anchor area and its corresponding text area, where there is a positional correspondence between the anchor area and the text area, and the text area includes text information whose meaning is defined; when the target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture from the positional correspondence between the anchor area and the text area in the picture recognition template; and obtain the text meaning of the text information in the target text area according to the defined text meaning.
  • The target picture is the picture to be recognized, whose text meaning needs to be obtained; the picture includes the text carrier.
  • The text carrier is the physical carrier of the text information.
  • Obtaining the meaning of the text information in the target picture is the purpose of the embodiments of this application.
  • Picture types can include ID card photos, invoice photos, driver's license photos, etc.; in the figure, an ID card photo is taken as an example.
  • The ID card in the ID card photo is the text carrier, and the name and ID number are the text information.
  • The text meaning can be obtained by recognizing the picture through the embodiments of this application; that is, by recognizing the target picture, the name and ID number of the ID card owner can be obtained directly.
  • The text meaning is the actual meaning of the text information in the target picture that recognition through the embodiments of this application is expected to yield.
  • The recognized text meaning is not merely a string of digits or characters; it also reflects what the digits or characters express.
  • For example, the unified social credit code of a company can be identified from a photo of the company's invoice.
  • FIG. 1 is only an exemplary implementation in the embodiments of the present application.
  • the system architecture in the embodiment of the present application may include but is not limited to the above system architecture.
  • FIG. 2 is a schematic flowchart of a text recognition method provided by an embodiment of the present application.
  • The text recognition method can be applied to a text recognition system (including the architecture above).
  • The method may include the following steps S201 to S204, of which step S202 is optional.
  • Step S201: Generate a picture recognition template according to the preset anchor area and its corresponding text area.
  • The anchor area and the corresponding text area of the template are preset to generate the corresponding recognition template.
  • A template can have one or more anchor areas.
  • The text area includes text information whose meaning is defined.
  • M picture recognition templates are generated according to the preset anchor areas and the preset text areas, where M is an integer greater than 0.
  • Each picture recognition template has a corresponding anchor area and text area; this application does not limit the number of anchor areas and text areas of a picture recognition template.
  • The M picture recognition templates are generated from M standard pictures. Taking an ID card as an example, the standard picture of the ID card recognition template is clear and has no serious deformation, so it meets the requirements for serving as the reference against which other target pictures are recognized.
  • The anchor area can include several words or several symbols in the text area, such as "VAT invoice" on a value-added tax invoice, "Name" on an ID card, or "The People's Republic of China Motor Vehicle Driving Permit" on a driving license. There may be one or more preset anchor areas; presetting multiple scattered anchor areas helps locate the target text area accurately.
  • The text area can include the area where the text is located; the shape of the text area can be selected and set by the user; optionally, there may be one or more preset text areas. In the picture recognition template, the meaning of the text in each preset text area is defined. For example, in a picture recognition template whose text carrier is a bank card, the 16 to 19 digits in the target text area represent the card number of the bank card, and the Chinese characters "XXXX Bank" in the target text area represent the name of the bank.
  • For example, on a standard ID card picture, the user selects the anchor areas, such as the area where "Name" is located and the area where "Citizen ID number" is located; the standard ID card picture is a template picture whose definition and degree of deformation meet the recognition requirements. Each anchor area needs to include at least the key text content; for example, the "Name" area needs to contain the characters "Name". The preset anchor area should be reasonable in size and shape.
  • The anchor area may be a rectangle of suitable size; that is, the rectangle at least includes the target text but is not so large that it includes other unnecessary characters or text content.
  • The user then selects the text area and can set the meaning of the text in it. After the user selects the target text area, the position of the target text area relative to the anchor area is confirmed.
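  • As an illustration of step S201, a picture recognition template can be sketched as a small data structure that stores each anchor area, each text area with its user-defined meaning, and the offset between them. This is a minimal sketch, not the applicant's implementation: the class names, the `(x, y, width, height)` box convention, and the choice to measure the offset between top-left corners are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box convention: (x, y, width, height) in pixels of the standard picture.
Box = Tuple[float, float, float, float]


@dataclass
class AnchorArea:
    text: str  # key text the anchor must contain, e.g. "Name"
    box: Box   # where that key text sits on the standard picture


@dataclass
class TextArea:
    meaning: str           # user-defined meaning of the text in this area
    box: Box               # where the text to be read sits on the standard picture
    anchor_index: int = 0  # which anchor this area is positioned against


@dataclass
class PictureTemplate:
    name: str
    anchors: List[AnchorArea]
    text_areas: List[TextArea]

    def relative_offset(self, area: TextArea) -> Tuple[float, float]:
        """Offset of a text area's top-left corner from its anchor's top-left corner."""
        ax, ay, _, _ = self.anchors[area.anchor_index].box
        tx, ty, _, _ = area.box
        return (tx - ax, ty - ay)


# Example: an ID card template with one anchor and one text area.
id_card = PictureTemplate(
    name="id_card",
    anchors=[AnchorArea(text="Name", box=(40, 50, 60, 30))],
    text_areas=[TextArea(meaning="name of the ID holder", box=(110, 50, 200, 30))],
)
```

  • Storing the offset relative to the anchor, rather than absolute coordinates only, is what lets the same template be applied to a target picture in which the card appears at a different position.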
  • Step S202: Recognize the target picture using general optical character recognition (OCR) to obtain the text information of the target picture, and judge whether the text information of the target picture matches the anchor area of the picture recognition template.
  • General OCR is used to recognize the text information in the target picture; based on the obtained text information, it is determined whether it is consistent with the text information in the anchor area of the picture recognition template.
  • Taking ID card recognition as an example, all the text or character information in the target picture and its relative positions are obtained through general OCR; the "Name" and "ID number" recognized at specific locations are then matched against the anchor areas of the picture recognition template to determine whether the same information exists in the anchor areas of the template.
  • Step S203: When the target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template.
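  • The positional correspondence used in step S203 can be illustrated as follows. This is a sketch under stated assumptions: boxes are `(x, y, width, height)` tuples, the target picture has already been corrected so that it differs from the template only by translation and scale, and the scale is estimated from the size of the matched anchor. The function name `locate_text_area` is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed convention


def locate_text_area(template_anchor: Box, template_text: Box, target_anchor: Box) -> Box:
    """Map a template text area into the target picture using the matched anchor.

    The offset between anchor and text area in the template is rescaled by the
    ratio between the anchor's size in the target picture and in the template.
    """
    ax, ay, aw, ah = template_anchor
    tx, ty, tw, th = template_text
    bx, by, bw, bh = target_anchor
    scale_x, scale_y = bw / aw, bh / ah
    return (
        bx + (tx - ax) * scale_x,
        by + (ty - ay) * scale_y,
        tw * scale_x,
        th * scale_y,
    )


# The anchor is found twice as large and shifted in the target picture.
target_text_area = locate_text_area(
    template_anchor=(40, 50, 60, 30),
    template_text=(110, 50, 200, 30),
    target_anchor=(80, 100, 120, 60),
)
```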
  • The target anchor area of the target picture is matched against the anchor area of each of the M picture recognition templates; if the target anchor area matches the anchor area of one of the M picture recognition templates, it is confirmed that this picture recognition template can be used to recognize the target picture; otherwise, it is determined that no suitable picture recognition template is available.
  • The template device obtains the required recognition template. If there is only one picture recognition template, the target picture is matched against that single template to determine whether it can be recognized. The matching process may include detecting whether the information in the anchor areas is consistent, detecting whether the sizes of the anchor areas are consistent, and so on.
  • Judging whether the text information of the target picture matches the anchor area of the picture recognition template includes: judging whether the area corresponding to the text information of the target picture matches the anchor area of the picture recognition template; if so, detecting whether the similarity between the text information of the target picture and the text information contained in the anchor area of the picture recognition template exceeds a preset threshold; if it does, the text information of the target picture matches the anchor area of the picture recognition template.
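  • The two-stage judgment described above (region check first, then text similarity against a preset threshold) can be sketched as follows. The source does not specify how the region match or the similarity is computed; this sketch assumes a simple box-intersection test for the region and uses the similarity ratio of Python's standard `difflib.SequenceMatcher` for the text. The function names and the threshold of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed convention


def boxes_intersect(a: Box, b: Box) -> bool:
    """Return True when the two rectangles share some area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def matches_anchor(ocr_text: str, ocr_box: Box,
                   anchor_text: str, anchor_box: Box,
                   threshold: float = 0.8) -> bool:
    """Stage 1: the OCR region must fall on the anchor area.
    Stage 2: the OCR text must be similar enough to the anchor's key text."""
    if not boxes_intersect(ocr_box, anchor_box):
        return False
    similarity = SequenceMatcher(None, ocr_text, anchor_text).ratio()
    return similarity > threshold
```

  • A similarity threshold below 1.0 tolerates small OCR errors in the anchor text; how much tolerance is appropriate depends on the OCR engine and is not stated in the source.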
  • The method further includes: when the target picture recognition template cannot be found among the plurality of picture recognition templates, obtaining the target picture recognition template from a first device according to the text information of the target picture, where the first device is a device that stores the target picture recognition template.
  • The following describes an application scenario of the text recognition method of the present application, taking the recognition template of an ID card as an example.
  • Anchor area 1 of the picture recognition template used to recognize ID cards is the area containing "Name", and anchor area 2 is the area containing "Citizen ID number".
  • When the target picture is a picture whose text carrier is an ID card, and the text information in the target picture includes "Name" and "Citizen ID number", this picture recognition template is selected for recognition and the subsequent operations are performed; otherwise, the recognition operation can end. After it is determined that this picture recognition template can recognize the target picture (that is, the template matches the target picture), image correction can be performed on the target picture.
  • Correcting the picture may include correcting the deformation of the target picture, correcting the sharpness of the target picture, etc.
  • The picture may also be preprocessed before it is recognized, to facilitate subsequent recognition and text meaning extraction. Before image correction, it can be judged whether the target picture to be recognized needs image correction. If the overlap ratio of the target text area and the text area of the target template reaches a preset value, the meaning of the text information in the target text area is confirmed; the overlap ratio reflects the degree of overlap between the target text area and the text area of the target template.
  • The text information of the target picture is obtained; against the anchor area preset in each of the M picture recognition templates, it is detected whether the text information matches the text information in the anchor area that contains the key text field.
  • The text information may include words, symbols, the positions of the words or symbols in the target picture, and so on.
  • When the overlap rate of the anchor area of the picture recognition template and the target anchor area of the target picture exceeds a preset value (for example, 90%), it is determined that the picture recognition template can recognize the target picture.
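  • The source does not define how the overlap rate is calculated. One common choice, used here purely as an assumption, is intersection over union (IoU) of the two rectangles. The sketch below applies it to the 90% example; the function names and the `(x, y, width, height)` box convention are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed convention


def overlap_rate(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangles (assumed definition)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def template_applies(template_anchor: Box, target_anchor: Box, preset: float = 0.9) -> bool:
    """The template can recognize the target picture when the anchors overlap enough."""
    return overlap_rate(template_anchor, target_anchor) > preset
```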
  • The method further includes: preprocessing the original picture corresponding to the target picture, where the preprocessing includes adjusting the definition of the original picture and/or removing image background in the original picture that is irrelevant to the anchor area and the text area.
  • The method further includes: determining whether the target picture needs image correction; if the target picture needs image correction, performing image correction on the target picture.
  • The ways of correcting image deformation may include perspective transformation, affine transformation, etc. The specific steps of perspective transformation may include: performing the perspective transformation through selected points; or determining the boundary of the text carrier through the Hough transform and then performing the perspective transformation. Because photos of a text carrier are taken at some angle, the carrier appears deformed; restoring these perspective deformations (that is, projecting the original image onto a new front-facing viewing plane) makes it easier to determine the target text area.
  • Four points need to be selected on the original picture, and the four corresponding points need to be selected in the target space (that is, the picture recognition template used to recognize the target picture), so that the transformation matrix can be calculated.
  • The four coordinate points may be selected randomly, or selected from the text of the original picture.
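  • To make the four-point step concrete, the sketch below computes a 3×3 perspective matrix from four point pairs and applies it to a point, using only the Python standard library. It is an illustration under assumptions, not the applicant's implementation: points are treated as column vectors, the matrix is normalized so that its bottom-right entry is 1, and the function names are hypothetical. In practice a library routine for this calculation would normally be used.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point selection")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def perspective_matrix(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """3x3 matrix mapping four source points onto four destination points.

    Each pair (u, v) -> (x, y) contributes two linear equations in the eight
    unknown entries; the ninth entry is fixed to 1 (an assumed normalization).
    """
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -y * u, -y * v])
        rhs.append(y)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_perspective(matrix: List[List[float]], point: Point) -> Point:
    """Project (u, v) to (x', y', z') and divide by z' to get the corrected point."""
    u, v = point
    xp, yp, zp = (row[0] * u + row[1] * v + row[2] for row in matrix)
    return (xp / zp, yp / zp)


# Corners of a skewed card in the photo, and where they sit in the template.
photo_corners = [(0, 0), (100, 10), (110, 80), (-5, 70)]
template_corners = [(0, 0), (100, 0), (100, 60), (0, 60)]
matrix = perspective_matrix(photo_corners, template_corners)
```

  • Four point pairs give exactly eight equations, which is why four points are the minimum needed; if three of the selected points are collinear the system becomes singular, so random selection needs a check like the one raised in `solve_linear`.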
  • (u, v) are the coordinates of a point in the original picture.
  • (u, v) is expressed as [u, v, 1];
  • x is the abscissa of the point after perspective transformation, and y is the ordinate of the point after perspective transformation;
  • [x′, y′, z′] is the point expressed in three-dimensional space; the transformation matrix is a 3×3 matrix;
  • The point (u, v) is projected into the three-dimensional space (x′, y′, z′) by formula (1), and then mapped from the three-dimensional space to the new two-dimensional space to complete the perspective transformation of the point coordinates.
  • The first row and the second row of the 3×3 transformation matrix are used for affine transformation (linear transformation and translation), and the third row is used for perspective transformation.
  • Formula (3) can be obtained as follows:
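  • The formulas themselves are not reproduced in this text. A standard form of the perspective transformation that is consistent with the symbol definitions and the row roles described above is sketched below. The column-vector convention, the entry names a_ij, and the assignment of formula numbers are assumptions rather than content taken from the source.

```latex
% (1) Projection of the original point into three-dimensional space
\[
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
\tag{1}
\]

% (2) Mapping back to the new two-dimensional space
\[
x = \frac{x'}{z'}, \qquad y = \frac{y'}{z'}
\tag{2}
\]

% (3) Substituting (1) into (2)
\[
x = \frac{a_{11}u + a_{12}v + a_{13}}{a_{31}u + a_{32}v + a_{33}},
\qquad
y = \frac{a_{21}u + a_{22}v + a_{23}}{a_{31}u + a_{32}v + a_{33}}
\tag{3}
\]
```

  • In this form the first two rows carry the linear part and the translation (a_13, a_23), and the third row supplies the denominator that produces the perspective effect; when a_31 = a_32 = 0 and a_33 = 1 the mapping reduces to an affine transformation.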
  • Step S204: Obtain the text meaning of the text information in the target text area according to the defined text meaning.
  • After the text information in the target text area is obtained, it is combined with the preset text meaning to obtain the meaning of the target text information.
  • For example, after the name (such as Zhang San) and the ID number (such as 3241569855632145867) are obtained, and because the name is preset to represent the name of the ID holder and the ID number is preset to represent the holder's identification number, the actual meanings represented by "Zhang San" and "3241569855632145867" are obtained.
  • The target text area can be determined as follows: for a target picture that meets the recognition requirements, the relative position of the anchor area and the text area in the picture recognition template corresponding to the target picture is applied to the anchor area of the target picture to determine the text area of the target picture. The recognition requirements can include image clarity, degree of deformation, degree of offset, etc.
  • Determining the target text area to be recognized in the target picture from the positional correspondence between the anchor area and the text area in the picture recognition template includes: recognizing the target picture using general OCR to obtain the text information of the target picture; finding, among the plurality of picture recognition templates, the target picture recognition template whose anchor area matches the target picture; and determining the target text area to be recognized in the target picture from the positional correspondence between the anchor area and the text area in the target picture recognition template.
  • Obtaining the text meaning of the text information in the target text area according to the defined text meaning includes: when the overlap rate between the target text area and the text area of the target picture recognition template reaches a preset value, confirming the meaning of the text information in the target text area. That is, text meaning recognition can be performed once the overlap rate of the target text area of the picture to be recognized and the text area of the recognition template reaches the preset value, for example when the two text areas overlap by more than 90%. The text recognized in the text area of the target picture is then spliced together, and the recognition result of the target text area is output according to the interpretation of the meaning of the text in the picture recognition template. Taking ID card recognition as an example: after it is detected that the overlap rate of the ID number areas of the picture recognition template and the target picture reaches the preset value, the digits in the text area are spliced together and, combined with the preset text meaning in the template, the digit string is output as the ID number.
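  • The splice-and-interpret step can be sketched as follows. The assumptions are labeled plainly: OCR results are `(text, box)` pairs in the same coordinate frame as the template, boxes are `(x, y, width, height)`, the overlap rate is computed as intersection over union, an OCR fragment belongs to the text area when its center falls inside it, and fragments are spliced left to right. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), an assumed convention
OcrItem = Tuple[str, Box]                # recognized text fragment and its position


def overlap_rate(a: Box, b: Box) -> float:
    """Intersection over union of two rectangles (assumed definition of overlap rate)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def center_inside(box: Box, area: Box) -> bool:
    """True when the center of `box` lies within `area`."""
    cx, cy = box[0] + box[2] / 2, box[1] + box[3] / 2
    return area[0] <= cx <= area[0] + area[2] and area[1] <= cy <= area[1] + area[3]


def read_field(ocr_items: List[OcrItem], target_area: Box, template_area: Box,
               meaning: str, preset: float = 0.9) -> Optional[Dict[str, str]]:
    """Splice the text found in the target text area and attach its preset meaning."""
    if overlap_rate(target_area, template_area) < preset:
        return None  # areas do not coincide well enough to trust the meaning
    inside = [item for item in ocr_items if center_inside(item[1], target_area)]
    inside.sort(key=lambda item: item[1][0])  # left-to-right reading order
    return {"meaning": meaning, "text": "".join(text for text, _ in inside)}


ocr_items = [
    ("5632", (150, 200, 40, 20)),
    ("3241", (110, 200, 40, 20)),
    ("Name", (40, 50, 60, 30)),
]
result = read_field(
    ocr_items,
    target_area=(101, 200, 200, 24),
    template_area=(100, 200, 200, 24),
    meaning="ID number of the holder",
)
```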
  • The predefined anchor area is used to match a picture recognition template suitable for the picture to be recognized; the target text area is then located according to the anchor area of the selected picture recognition template, and the text information in the target text area is obtained and combined with the preset text meaning to obtain the meaning of the text information in the target text area of the target picture.
  • The embodiments of this application mainly use the anchor area in the picture recognition template to locate the target text area accurately, which improves the accuracy and precision of positioning the text to be recognized and avoids obtaining irrelevant information; after the target text is obtained, the preset interpretation of the meaning of the text in the target text area is used to return the recognized text meaning and text content to the user. No long-cycle template customization is needed, which saves cost and time, and recognition can be performed for a specific text carrier.
  • The embodiments of the present application not only solve the problem that general OCR cannot obtain the meaning of picture text, but also eliminate the need for in-depth customization for a specific text carrier and for a large amount of training data for that carrier; with a short development cycle, styles that lack a ready-made template can be supported, and the meaning of text with a certain regularity and form can be obtained quickly.
  • the text recognition device 40 may include a generating unit 401, a positioning unit 402, a reading unit 403, a matching unit 404 and a correction unit 405.
  • the optional units include a matching unit 404 and a correction unit 405.
  • The generating unit 401 is configured to generate a picture recognition template according to a preset anchor area and its corresponding text area; the anchor area and the text area have a positional correspondence, and the text area includes text information whose meaning is defined.
  • The positioning unit 402 is configured to, when the target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template.
  • The reading unit 403 is configured to obtain the text meaning of the text information in the target text area according to the defined text meaning.
  • The device 40 further includes a matching unit 404, configured to, before the positioning unit determines the target text area to be recognized in the target picture from the positional correspondence between the anchor area and the text area in the picture recognition template, recognize the target picture using general OCR to obtain the text information of the target picture, and judge whether the text information of the target picture matches the anchor area of the picture recognition template.
  • the matching unit 404 further includes a judging unit 406 for judging whether the text information of the target picture matches the anchor area of the picture recognition template;
  • the judgment unit 406 is specifically configured to:
  • the positioning unit 402 is specifically configured to: recognize the target image according to the universal optical character recognition OCR, and confirm the text of the target image Information; find the anchor area of the target picture recognition template from the plurality of picture recognition templates to match the target picture; determine the position correspondence between the anchor area and the text area in the target picture recognition template The target text area for recognition in the target picture.
  • the anchor point area in the picture recognition template is a plurality of anchor point areas;
  • the positioning unit 402 further includes a text positioning unit 407 for recognizing all the points in the template through the target picture The positional correspondence between the anchor point area and the text area, and determine the target text area for recognition in the target picture;
  • the text positioning unit 407 is specifically configured to: when the degree of matching between the multiple anchor point regions in the target picture and the multiple anchor point regions of the target picture recognition template is greater than a preset threshold, according to the target picture recognition template The position correspondence between each anchor point area in the plurality of anchor point areas and the corresponding text area determines the target text area for recognition in the target picture.
  • the device 40 further includes a searching unit 408, configured to:
  • the target picture recognition template When the target picture recognition template cannot be found from the plurality of picture recognition templates, the target picture recognition template is acquired from the first device according to the text information of the target picture, and the first device stores the target Picture recognition template equipment.
  • the device further includes: a correction unit 405, configured to determine whether the target picture needs image correction; if the target picture needs image correction, perform image correction on the target picture Correct.
  • a correction unit 405 configured to determine whether the target picture needs image correction; if the target picture needs image correction, perform image correction on the target picture Correct.
  • the reading unit 403 is specifically configured to confirm the meaning of the text information in the target text area when the overlap ratio between the target text area and the text area of the target picture recognition template reaches a preset value.
  • the device 40 further includes a preprocessing unit 409, configured to:
  • the preprocessing includes adjusting the definition of the original picture, and/or removing the image background irrelevant to the anchor area and the text area in the original picture.
  • An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium may store a program that, when executed, performs part or all of the steps of any one of the above method embodiments.
  • the embodiments of the present application also provide a computer program, the computer program including instructions, when the computer program is executed by a computer, the computer can execute part or all of the steps in the text recognition method in the above method embodiment.
  • FIG. 5 is a schematic structural diagram of a network device provided by an embodiment of the present application.
  • the text recognition apparatus can be implemented in the structure of FIG.
  • the network device 50 may include at least one storage component 501, at least one processing component 502, and at least one communication component 503.
  • the device may also include general components such as an antenna and a power supply, which are not described in detail here.
  • the storage component 501 may include one or more storage units, and each unit may include one or more memories.
  • the storage component can be used to store programs and various data, and can access programs or data at high speed and automatically during operation of the device.
  • a physical device with two stable states can be used to store information, and the two stable states are represented as "0" and "1" respectively.
  • the aforementioned storage component 501 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions; a random access memory (RAM) or another type of dynamic storage device that can store information and instructions; an electrically erasable programmable read-only memory (EEPROM); a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.
  • the memory can exist independently and is connected to the processor through a bus.
  • the memory can also be integrated with the processor.
  • the processing component 502 may also be referred to as a processor, a processing unit, a processing board, a processing module, a processing device, and the like.
  • the processing component can be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP; it can also be a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control the execution of the above program.
  • the communication component 503, which may also be called a transceiver, may be used to communicate with other devices or a communication network, and may include units for wireless, wired, or other communication methods.
  • the processing component 502 is configured to call the data of the storage component 501 to execute the relevant description of the method described in FIG. 2, and details are not repeated here.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above-mentioned units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate parts may or may not be physically separated, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units.
  • Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional components in the various embodiments of the present application may be integrated into one component, or each component may exist alone physically, or two or more components may be integrated into one component.
  • the above-mentioned integrated components can be implemented in the form of hardware or software functional units.
  • if the integrated component is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the numbering does not limit the implementation of the embodiments of the present application in any way.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)

Abstract

Disclosed are a text recognition method and apparatus. The method comprises: generating a picture recognition template according to a preset anchor area and a text area corresponding thereto, there being a positional correspondence between the anchor area and the text area, and the text area comprises text information that defines a text meaning; if a target picture matches an anchor area of the picture recognition template, determining, according to the positional correspondence between the anchor area and the text area in the picture recognition template, a target text area to be recognized in the target picture; and obtaining the text meaning of the text information in the target text area according to the defined text meaning. By use of embodiments of the present application, the text meaning of the text information in a picture to be recognized can be quickly and efficiently obtained.

Description

Text recognition method and apparatus
This application claims priority to Chinese patent application No. 201910146626.6, entitled "Text recognition method and apparatus", filed with the Chinese Patent Office on February 27, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of computer technology, and in particular to a text recognition method and apparatus.
Background
At present, common optical character recognition services fall into two types: general optical character recognition (OCR) and scene-specific OCR. In theory, general OCR can recognize all the characters contained in a picture, but its recognition accuracy is low, and because of the nature of general OCR it cannot obtain the specific meaning of the characters in the picture. Scene-specific OCR recognizes specific pictures better than general OCR and can obtain the meaning of the characters in the picture, but it only supports pictures containing a specified text carrier (for example, pictures containing an ID card, a driver's license, or a value-added tax invoice). Using scene-specific OCR also requires customizing a template for the target type of picture and training a model on a large number of pictures of that type, so generating a template takes a long time. Moreover, text carriers come in many varieties; producing recognition templates to cover many kinds of text carriers is too costly, and market demand for some recognition templates is low, which makes it difficult to recover the cost.
Therefore, how to quickly and effectively recognize the text meaning of a text carrier in a picture to be recognized is a problem that this application urgently needs to solve.
Summary of the invention
The embodiments of the present application provide a text recognition method and apparatus that can quickly and effectively recognize the text meaning of a text carrier in a picture to be recognized.
In a first aspect, an embodiment of the present application provides a text recognition method, which may include:
generating a picture recognition template according to a preset anchor area and its corresponding text area, where there is a positional correspondence between the anchor area and the text area, and the text area includes text information for which a text meaning is defined;
when a target picture matches the anchor area of the picture recognition template, determining the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template; and
obtaining the text meaning of the text information in the target text area according to the defined text meaning.
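The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: boxes are `(x, y, w, h)` tuples, each text area is stored as an offset from a single anchor, and the target picture is assumed to differ from the template picture only by translation and uniform scale. The function name `locate_text_areas`, the dictionary layout, and all coordinates are hypothetical.

```python
def locate_text_areas(template, found_anchors):
    """Map the template's text areas into the target picture.

    template: {"anchors": {label: (x, y, w, h)},
               "fields": [{"meaning", "anchor", "offset": (dx, dy), "size": (w, h)}]}
    found_anchors: {label: (x, y, w, h)} anchor boxes located in the target picture.
    Returns {meaning: (x, y, w, h)} in target-picture coordinates.
    """
    located = {}
    for f in template["fields"]:
        label = f["anchor"]
        if label not in found_anchors:
            continue  # anchor missing in the target: this field cannot be placed
        tx, ty, tw, th = template["anchors"][label]
        fx, fy, fw, fh = found_anchors[label]
        scale = fw / tw  # assumption: uniform scale between template and target
        dx, dy = f["offset"]
        w, h = f["size"]
        # The key of the result carries the meaning defined in the template.
        located[f["meaning"]] = (fx + dx * scale, fy + dy * scale, w * scale, h * scale)
    return located


template = {
    "anchors": {"Name": (40, 50, 60, 30)},
    "fields": [{"meaning": "holder_name", "anchor": "Name",
                "offset": (70, 0), "size": (200, 30)}],
}
# The same anchor, found at twice the size in the target picture.
areas = locate_text_areas(template, {"Name": (100, 120, 120, 60)})
print(areas)
```

Because each located box is keyed by the meaning defined in the template, whatever text is later read from that box is already labeled, which is the point of the third step.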
The embodiments of this application solve the problem that general OCR cannot obtain the meaning of the text in a picture, and also remove the need for deep customization for a specific text carrier and for a large amount of training data for that carrier. With a shorter development cycle they support layouts for which no ready-made template exists, and they can quickly obtain the meaning of text that follows a certain pattern and form.
In a second aspect, an embodiment of the present application provides a text recognition apparatus, which may include:
a generating unit, configured to generate a picture recognition template according to a preset anchor area and its corresponding text area, where there is a positional correspondence between the anchor area and the text area, and the text area includes text information for which a text meaning is defined;
a positioning unit, configured to, when a target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template; and
a reading unit, configured to obtain the text meaning of the text information in the target text area according to the defined text meaning.
In a third aspect, an embodiment of the present application provides a network device including a processor and a memory that are connected to each other, wherein the memory is used to store a computer program, the computer program includes program instructions, and the processor is configured to call the program instructions to execute the method described in the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program includes program instructions that, when executed by a processor, cause the processor to execute the method described in the first aspect.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of a text recognition system architecture provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a text recognition method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a perspective transformation provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a network device provided by an embodiment of the present application.
Detailed description
The embodiments of the present application provide a text recognition method and apparatus that can quickly and effectively recognize the text meaning of a text carrier in a picture to be recognized.
The terms "including" and "having", and any variations of them, in the specification, claims, and drawings of this application are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", "third", and so on are used to distinguish different objects, not to describe a specific order. The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
First, some terms used in this application are explained to help those skilled in the art understand them.
(1) Optical character recognition (OCR) is the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then uses a character recognition method to translate the shapes into computer text. In other words, for printed characters, it is a technology that optically converts the text in a paper document into a black-and-white dot-matrix image file and then uses recognition software to convert the text in the image into a text format that word-processing software can edit further. How to correct errors or use auxiliary information to improve recognition accuracy is the most important problem in OCR, which is how the term ICR (Intelligent Character Recognition) arose. The main indicators of the performance of an OCR system are the rejection rate, the misrecognition rate, recognition speed, user-interface friendliness, product stability, ease of use, and feasibility.
(2) Perspective transformation is a transformation that uses the condition that the perspective center, the image point, and the target point are collinear to rotate the image-bearing plane (perspective plane) around the trace line (perspective axis) by a certain angle according to the law of perspective rotation; it destroys the original bundle of projection rays while keeping the projected geometric figure on the image-bearing plane unchanged.
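In image-processing practice, a perspective transformation between two planes is commonly written as a planar projective mapping (a homography). The formula below is the standard textbook form, given here as background rather than as notation taken from this application; the symbols $(x, y)$, $(x', y')$, and $h_{ij}$ are introduced only for illustration.

```latex
\begin{pmatrix} u \\ v \\ w \end{pmatrix}
=
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix},
\qquad
x' = \frac{u}{w}, \quad y' = \frac{v}{w}
```

The matrix is defined only up to a nonzero scale factor, so it has eight degrees of freedom, and four point correspondences with no three points collinear (for example, the four corners of a card) are enough to determine it.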
One of the system architectures on which the embodiments of this application are based is described first; the text recognition method proposed in this application can be applied to this architecture. Refer to FIG. 1, which is a schematic diagram of a text recognition system architecture provided by an embodiment of the present application. As shown in FIG. 1, the system architecture includes a terminal, a target picture, and a text meaning. The terminal mentioned in the embodiments of this application may be a mobile phone, tablet computer, notebook computer, palmtop computer, mobile Internet device, or other mobile terminal. Specifically:
Terminal: may be a device at the outermost periphery of a computer network, and may be used for inputting information, outputting processing results, and so on. It may also be called a system, subscriber unit, subscriber station, mobile station, remote station, remote terminal, mobile device, user terminal, mobile terminal, wireless communication device, user agent, user apparatus, service device on which plug-ins can be installed, or user equipment (UE). For example, the terminal may be a cellular phone, mobile phone, cordless phone, smart watch, wearable device, tablet device, Session Initiation Protocol (SIP) phone, wireless local loop (WLL) station, personal digital assistant (PDA), handheld device with wireless communication capability, computing device, in-vehicle communication module, smart meter, or other processing device connected to a wireless modem. In the embodiments of the present application, the terminal may perform the following operations: generate a picture recognition template according to a preset anchor area and its corresponding text area, where there is a positional correspondence between the anchor area and the text area, and the text area includes text information for which a text meaning is defined; when a target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template; and obtain the text meaning of the text information in the target text area according to the defined text meaning.
Target picture: the picture to be recognized, whose text meaning needs to be identified. The picture contains a text carrier, which is the physical carrier of the text information. Obtaining the meaning of the text information in the target picture is the purpose of the embodiments of this application. Picture types may include ID card photos, invoice photos, driver's license photos, and so on. The figure uses an ID card photo as the target picture: the ID card in the photo is the text carrier, and the name and ID number are the text information. Recognizing the picture according to the embodiments of this application yields the text meaning; that is, by recognizing the target picture, the name of the ID card holder and the holder's ID number are obtained directly.
Text meaning: the actual meaning of the text information in the target picture that recognition according to the embodiments of this application is expected to obtain. A recognized text meaning is not merely a string of digits or characters; it also reflects what those digits or characters express. For example, an enterprise's unified social credit code is identified from a photo of the enterprise's invoice.
It can be understood that the content shown in FIG. 1 is only one exemplary implementation of the embodiments of the present application. The system architecture in the embodiments of the present application may include, but is not limited to, the above system architecture.
The technical problems raised in this application are analyzed and solved below with reference to the above system architecture and the embodiments of the text recognition method provided in this application.
Refer to FIG. 2, which is a schematic flowchart of a text recognition method provided by an embodiment of the present application. The text recognition method can be applied to a text recognition system (including the above architecture). The method is described below from a single side with reference to FIG. 2, taking the terminal as the executing entity. The method may include the following steps S201 to S204, of which step S202 is optional.
Step S201: Generate a picture recognition template according to a preset anchor area and its corresponding text area.
Specifically, when one picture recognition template is generated, the anchor area of the template and the corresponding text area are preset, and the corresponding recognition template is generated. A template may have one or more anchor areas. There is a positional correspondence between the anchor area and the text area, and the text area includes text information for which a text meaning is defined. When multiple picture recognition templates are generated, M picture recognition templates are generated according to the preset anchor areas and the preset text areas, where M is an integer greater than 0. Each picture recognition template has its own anchor areas and text areas; this application does not limit the number of anchor areas or text areas in a picture recognition template. For example, "Name" and "ID number" on the face of an ID card are used as the anchor areas of a picture recognition template for recognizing ID card information, the to-be-recognized text area corresponding to "Name" and the to-be-recognized text area corresponding to "ID number" are set, and the picture recognition template for recognizing ID card information is generated.
In a possible implementation, the M picture recognition templates are generated from M standard pictures. Taking an ID card picture as an example, the standard picture of the ID card picture recognition template is clear and has no serious deformation, so it meets the requirements for serving as the reference against which other target pictures are recognized.
In a possible implementation, the anchor area may include text content such as several characters or symbols within the text area, for example "Special VAT Invoice" on a value-added tax invoice, "Name" on an ID card, or "Motor Vehicle Driving License of the People's Republic of China" on a vehicle license. It can be understood that there may be one or more preset anchor areas; presetting multiple dispersed anchor areas helps locate the target text area accurately.
In a possible implementation, the text area may include the area where the text is located, and its shape can be selected and set by the user. Optionally, there may be one or more preset text areas. In the picture recognition template, the meaning of the text within each text area is set in advance. For example, in a picture recognition template whose text carrier is a bank card, the 16 to 19 digits in the target text area represent the card number of the bank card, and the Chinese characters "XXXX银行" ("XXXX Bank") in the target text area represent the name of the bank.
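One way to record such predefined meanings is to pair each text area with a label and a format check. The sketch below is illustrative only: `FIELD_RULES`, `interpret`, and the field names are hypothetical, and the only rule taken from the text is that a bank card number has 16 to 19 digits.

```python
import re

# Hypothetical meaning definitions for a bank-card template: each text area
# carries a label and a pattern that the recognized text is expected to satisfy.
FIELD_RULES = {
    "card_number": re.compile(r"\d{16,19}"),
    "bank_name": re.compile(r".+银行"),
}


def interpret(field, raw_text):
    """Return (field, cleaned_text) if the text fits the field's pattern, else None."""
    cleaned = raw_text.replace(" ", "")  # card numbers are often printed in groups
    if FIELD_RULES[field].fullmatch(cleaned):
        return field, cleaned
    return None


print(interpret("card_number", "6222 0000 1111 2222 333"))
print(interpret("card_number", "6222-0000"))
```

A format check of this kind is an addition for illustration; the application itself only states that the meaning of each text area is set in advance.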
To make the embodiments of this application easier to understand, a scenario in which the text recognition method of this application is applied is given below as an example: generating a picture recognition template for recognizing ID cards.
First, on a standard ID card picture, the user selects the anchor areas, such as the area containing "Name" and the area containing "Citizen ID number". The standard ID card picture is a template picture whose clarity and degree of deformation meet the recognition requirements. It can be understood that each anchor area must contain at least the key text content; for example, the "Name" area must contain the characters "Name". The size and shape of a preset anchor area should be reasonable. For example, the anchor area may be a rectangle of suitable size: it contains at least the target text, but is not so large that it takes in other unnecessary characters or text content.
Second, the user selects the text areas and can set the meaning of the text within each text area. It can be understood that after the user selects the target text area, the position of the target text area relative to the anchor area is determined.
Finally, after the anchor areas and the target text areas are set, the corresponding picture recognition template is generated and saved.
It can be understood that the above application scenario is only an exemplary implementation of the embodiments of the present application; the application scenarios of the embodiments of the present application include, but are not limited to, the above.
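The template-building steps in this ID card example can be sketched as a small data structure. This is a hedged illustration, not the format used by the applicant: the class names `Box`, `TextArea`, and `PictureTemplate`, the function `build_template`, the JSON serialization, and all pixel coordinates are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates of the standard picture."""
    x: float
    y: float
    w: float
    h: float


@dataclass
class TextArea:
    meaning: str  # user-defined meaning of the text, e.g. "holder_name"
    anchor: str   # label of the anchor area this text area is positioned against
    dx: float     # offset of the text area's top-left corner from the anchor's
    dy: float
    w: float
    h: float


@dataclass
class PictureTemplate:
    name: str
    anchors: dict = field(default_factory=dict)     # label -> Box
    text_areas: list = field(default_factory=list)  # list of TextArea


def build_template(name, anchors, text_areas):
    """anchors: {label: Box}; text_areas: [(meaning, anchor_label, Box)]."""
    template = PictureTemplate(name=name, anchors=dict(anchors))
    for meaning, anchor_label, box in text_areas:
        a = anchors[anchor_label]
        # Store the position of the text area relative to its anchor.
        template.text_areas.append(
            TextArea(meaning, anchor_label, box.x - a.x, box.y - a.y, box.w, box.h))
    return template


id_card = build_template(
    "id_card",
    anchors={"Name": Box(40, 50, 60, 30),
             "Citizen ID number": Box(40, 300, 150, 30)},
    text_areas=[("holder_name", "Name", Box(110, 50, 200, 30)),
                ("id_number", "Citizen ID number", Box(200, 300, 360, 30))],
)
saved = json.dumps(asdict(id_card), ensure_ascii=False)  # "save" the template
print(saved)
```

Storing offsets relative to the anchor, rather than absolute coordinates, is what lets the same template be applied to a target picture in which the card appears at a different position.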
Step S202: Recognize the target picture using general optical character recognition (OCR) and determine the text information of the target picture; then determine whether the text information of the target picture matches the anchor area of the picture recognition template.
Specifically, general optical character recognition is used to recognize the text information in the target picture, and the text information obtained from the target picture is checked for consistency with the text information in the anchor area of the picture recognition template. Taking ID card recognition as an example, general OCR yields all the text or character information in the target picture together with its relative position; the "Name" and "ID number" recognized at specific positions are then matched against the anchor areas of the picture recognition template to determine whether the template's anchor areas contain the same information.
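The check in step S202 can be sketched as follows. No real OCR library is called: the general OCR output is assumed to be already available as a list of `(text, box)` pairs, and `find_anchors`, `matches_template`, and the sample values are hypothetical names and data used only for illustration.

```python
def find_anchors(ocr_results, anchor_texts):
    """Return {anchor_text: box} for each template anchor whose text appears
    in the general OCR output; anchors that are not found are omitted."""
    found = {}
    for text, box in ocr_results:
        for anchor in anchor_texts:
            # Substring test; the box recorded is that of the whole OCR line.
            if anchor in text and anchor not in found:
                found[anchor] = box
    return found


def matches_template(ocr_results, anchor_texts):
    """True if every anchor text of the template was found in the picture."""
    return len(find_anchors(ocr_results, anchor_texts)) == len(anchor_texts)


# Assumed shape of general OCR output: recognized text plus an (x, y, w, h) box.
ocr_results = [
    ("Name", (42, 51, 58, 29)),
    ("Zhang San", (112, 50, 120, 30)),
    ("Citizen ID number", (41, 302, 148, 30)),
]
print(find_anchors(ocr_results, ["Name", "Citizen ID number"]))
print(matches_template(ocr_results, ["Name", "Address"]))
```

This version compares text only; comparing positions and tolerating OCR errors are refinements that a practical system would likely need.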
步骤S203:在目标图片匹配有所述图片识别模板的锚点区域的情况下,通过所述图片识别模板中所述锚点区域与所述文本区域的位置对应关系,确定所述目标图片中进行识别的目标文本区域。Step S203: In the case that the target picture matches the anchor point area of the picture recognition template, determine that the process is performed in the target picture according to the position correspondence between the anchor point area and the text area in the picture recognition template. The recognized target text area.
Specifically, the anchor area of the target picture is matched against the anchor area of each of the M picture recognition templates. If the target anchor area matches the anchor area of one of the M picture recognition templates, that template is confirmed as usable for recognizing the target picture; otherwise, it is determined that no suitable picture recognition template is available. Optionally, when no preset recognition template exists, the required recognition template is obtained, based on key features initially recognized in the picture (such as the name of the bill it contains), over a network or another data exchange channel from the Internet or from another device that stores picture recognition templates. If there is only one picture recognition template, the target picture is matched against that single template to determine whether it can be recognized. It can be understood that the matching process may include checking whether the information in the anchor areas is consistent, whether the sizes of the anchor areas are consistent, and so on.
In a possible implementation, determining whether the text information of the target picture matches the anchor area of the picture recognition template includes: determining whether the area corresponding to the text information of the target picture coincides with the anchor area of the picture recognition template; if so, detecting whether the similarity between the text information of the target picture and the text information contained in the anchor area of the picture recognition template exceeds a preset threshold; and if so, confirming that the text information of the target picture matches the anchor area of the picture recognition template.
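The two-stage check described above (region first, then text similarity) can be sketched as follows. This is a minimal sketch under stated assumptions: the region test is taken to mean that the center of the OCR box lies inside the anchor box, similarity is measured with Python's `difflib.SequenceMatcher`, and the 0.8 threshold is an arbitrary illustrative value. None of these choices is prescribed by the application.

```python
from difflib import SequenceMatcher

def center_inside(inner, outer):
    """True if the center of box `inner` lies inside box `outer`.

    Boxes are (x, y, width, height); this region test is an assumption.
    """
    x, y, w, h = inner
    ox, oy, ow, oh = outer
    cx, cy = x + w / 2, y + h / 2
    return ox <= cx <= ox + ow and oy <= cy <= oy + oh

def matches_anchor(ocr_text, ocr_box, anchor_text, anchor_box, threshold=0.8):
    """Stage 1: region check. Stage 2: text similarity above a preset threshold."""
    if not center_inside(ocr_box, anchor_box):
        return False
    similarity = SequenceMatcher(None, ocr_text, anchor_text).ratio()
    return similarity > threshold
```

Running the cheap geometric test first avoids comparing text for OCR results that are nowhere near the anchor area.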
In a possible implementation, the picture recognition template contains a plurality of anchor areas, and determining the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template includes: when the degree of matching between the plurality of anchor areas in the target picture and the plurality of anchor areas of the target picture recognition template is greater than a preset threshold, determining the target text area to be recognized in the target picture according to the positional correspondence between each of the plurality of anchor areas of the target picture recognition template and its corresponding text area.
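One way to read this step is sketched below: the matching degree is taken as the fraction of template anchors found in the target picture, and the text area is located by shifting the template's text box by the average displacement of the matched anchors. Both interpretations, the function name, and the 0.5 default threshold are assumptions for illustration; the sketch also ignores scale and rotation.

```python
def locate_text_region(template_anchors, target_anchors, template_text_box, min_match=0.5):
    """Map a template text box into the target picture.

    template_anchors / target_anchors: dict mapping an anchor label to its
    (x, y, width, height) box. Returns the shifted text box, or None when the
    matching degree does not exceed `min_match`.
    """
    matched = [label for label in template_anchors if label in target_anchors]
    if not template_anchors or len(matched) / len(template_anchors) <= min_match:
        return None
    # Average displacement of the matched anchors between template and target.
    dx = sum(target_anchors[k][0] - template_anchors[k][0] for k in matched) / len(matched)
    dy = sum(target_anchors[k][1] - template_anchors[k][1] for k in matched) / len(matched)
    x, y, w, h = template_text_box
    return (x + dx, y + dy, w, h)
```

Averaging over several anchors makes the estimated position less sensitive to a single badly located anchor.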
In a possible implementation, the method further includes: when the target picture recognition template cannot be found among the plurality of picture recognition templates, obtaining the target picture recognition template from a first device according to the text information of the target picture, the first device being a device that stores the target picture recognition template.
To facilitate understanding of the embodiments of the present application, a scenario to which the text recognition method of the present application applies is described below, taking a recognition template for ID cards as an example.
Among the M saved picture recognition templates, the template used to recognize ID cards has anchor area 1, which contains "Name", and anchor area 2, which contains "Citizen ID number". When the target picture is a picture whose text carrier is an ID card, and the text information in the target picture includes "Name" and "Citizen ID number", this picture recognition template is selected for image recognition and the subsequent operations are performed; otherwise, the recognition operation may end. After it has been determined that the picture recognition template can recognize the target picture (that is, the template matches the target picture), image correction may be performed on the target picture. Optionally, correcting the picture may include correcting its deformation, correcting its sharpness, and so on; the picture may also be preprocessed before recognition to facilitate subsequent recognition and extraction of text meaning. It can be understood that, before image correction, it may be determined whether the target picture to be recognized needs correction. If the overlap ratio between the target text area and the text area of the target template reaches a preset value, the meaning of the text information in the target text area is confirmed; the overlap ratio reflects the degree of coincidence between the target text area and the text area of the target template.
It can be understood that the above application scenario is only an exemplary implementation of the embodiments of the present application; the application scenarios in the embodiments of the present application include, but are not limited to, the above.
In a possible implementation, after the target picture has been recognized by general OCR, the text information of the target picture is obtained; for each of the M picture recognition templates, the preset anchor area of that template is used to detect whether the key text in the text information matches the text information of the anchor area. The text information may include words, symbols, and the positions of the words or symbols in the target picture.
In a possible implementation, if the overlap ratio between the anchor area of the picture recognition template and the target anchor area of the target picture exceeds a preset value (for example, 90%), it is determined that the picture recognition template can recognize the target picture.
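The application does not define how the overlap ratio is computed. The sketch below assumes intersection-over-union (IoU) of axis-aligned boxes and uses it to pick the first template whose anchor box overlaps the target anchor box by more than 90%; the function names and the dictionary layout are hypothetical.

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two (x, y, width, height) boxes (assumed metric)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def select_template(target_anchor_box, template_anchor_boxes, threshold=0.9):
    """Return the name of the first template whose anchor overlaps enough, else None.

    template_anchor_boxes: dict mapping a template name to its anchor box.
    """
    for name, anchor_box in template_anchor_boxes.items():
        if overlap_ratio(target_anchor_box, anchor_box) > threshold:
            return name
    return None
```

Other definitions, such as intersection divided by the template area, would fit the text equally well; only the threshold comparison is taken from the description.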
In a possible implementation, the method further includes: preprocessing the original picture corresponding to the target picture, the preprocessing including adjusting the sharpness of the original picture and/or removing the image background in the original picture that is unrelated to the anchor area and the text area.
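One simple reading of background removal is to crop the picture to the smallest rectangle that contains every anchor area and text area. The sketch below takes that reading as an assumption, represents the picture as a list of pixel rows, and leaves sharpness adjustment out; the function names are hypothetical.

```python
def union_box(boxes):
    """Smallest (x, y, width, height) box containing all the given boxes."""
    x0 = min(x for x, y, w, h in boxes)
    y0 = min(y for x, y, w, h in boxes)
    x1 = max(x + w for x, y, w, h in boxes)
    y1 = max(y + h for x, y, w, h in boxes)
    return (x0, y0, x1 - x0, y1 - y0)

def crop(image, box):
    """Crop a picture given as a list of pixel rows to the (x, y, width, height) box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

Cropping before recognition means that text outside the regions of interest never reaches the matching step.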
In a possible implementation, after the target picture matches the anchor area of the picture recognition template, the method further includes: determining whether the target picture needs image correction; and if the target picture needs image correction, performing image correction on the target picture.
Optionally, among image correction methods, those that correct picture deformation may include perspective transformation, affine transformation, and so on. The specific steps of perspective transformation may include: performing the perspective transformation using selected points; or determining the boundary of the text carrier by the Hough transform and then performing the perspective transformation. Because photographs of a text carrier are taken at some angle, perspective deforms the carrier in the picture; undoing this perspective deformation (that is, projecting the original picture onto a new, front-facing view plane) allows the target text area to be determined more reliably.
Perspective transformation using selected points is described below as an example:
First, a coordinate system is established on the original picture (that is, the target picture to be recognized); this application does not limit how the coordinate system is established (for example, how the origin is chosen or how the x-axis and y-axis are determined).
On the original picture, four points are selected, and four corresponding points are selected in the target space (that is, the picture recognition template used to recognize the target picture), for the subsequent calculation of the transformation matrix. The more widely spread the four selected points are, the smaller the resulting error. It can be understood that computing the transformation matrix requires the four points of the original picture and their four corresponding points in the target space; once the transformation matrix has been obtained, the original picture can be mapped to a corrected, front-facing picture.
Optionally, the four coordinate points in the original picture may be selected at random, or selected from the text of the original picture.
The coordinates of one point in the original picture are used as an example below:
Figure PCTCN2019088978-appb-000001
Figure PCTCN2019088978-appb-000002
where (u, v) are the coordinates of a point in the original picture, written as [u, v, 1] so as not to affect the coordinates of the point in three-dimensional space; x and y are the abscissa and ordinate of the point after the perspective transformation; [x′, y′, z′] is the point in three-dimensional space; and
Figure PCTCN2019088978-appb-000003
is the 3×3 transformation matrix.
By formula (1), the point (u, v) is projected into the three-dimensional space (x′, y′, z′) and then mapped from the three-dimensional space into the new two-dimensional space, which completes the perspective transformation of the point's coordinates. In the 3×3 transformation matrix, the first and second rows are used for the affine transformation (linear transformation and translation), and the third row is used for the perspective transformation.
Combining formula (1) and formula (2) gives formula (3):
Figure PCTCN2019088978-appb-000004
Formula (3) gives the coordinates (x, y) of the point in the new two-dimensional space.
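The formula images are not reproduced in this text. As a hedged reconstruction only, assuming the matrix acts on the column vector [u, v, 1] (which agrees with the statement that the first two rows carry the affine part and the third row the perspective part), formulas (1) to (3) could be written as follows. The entry names a_ij are assumptions, and the original figures may use a different but equivalent convention.

```latex
\begin{align}
\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
&=
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{bmatrix}
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \tag{1} \\
x &= \frac{x'}{z'}, \qquad y = \frac{y'}{z'} \tag{2} \\
x &= \frac{a_{11}u + a_{12}v + a_{13}}{a_{31}u + a_{32}v + a_{33}}, \qquad
y = \frac{a_{21}u + a_{22}v + a_{23}}{a_{31}u + a_{32}v + a_{33}} \tag{3}
\end{align}
```

Substituting the three rows of (1) into (2) gives (3) directly; when the third row is [0, 0, 1], the denominator is 1 and the mapping reduces to an affine transformation.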
The above example calculates the coordinates of one point; the coordinates of the four selected points can be calculated in the same way. Referring to FIG. 3, FIG. 3 is a schematic diagram of a perspective transformation provided by an embodiment of the present application, showing deformation being corrected by perspective transformation.
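To make the four-point calculation concrete, the sketch below solves for the 3×3 matrix from four point pairs and then maps a point. It assumes the matrix acts on the column vector [u, v, 1] and fixes the bottom-right entry to 1, which leaves eight unknowns and eight linear equations. The function names are hypothetical, and a plain Gaussian elimination is used so that the example has no dependencies.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in row] + [float(val)] for row, val in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("degenerate points (for example, three are collinear)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def perspective_matrix(src, dst):
    """3x3 matrix mapping four src points (u, v) to four dst points (x, y)."""
    rows, rhs = [], []
    for (u, v), (x, y) in zip(src, dst):
        # From x = (a11 u + a12 v + a13) / (a31 u + a32 v + 1), and likewise for y.
        rows.append([u, v, 1, 0, 0, 0, -u * x, -v * x])
        rhs.append(x)
        rows.append([0, 0, 0, u, v, 1, -u * y, -v * y])
        rhs.append(y)
    a = solve_linear(rows, rhs) + [1.0]
    return [a[0:3], a[3:6], a[6:9]]

def warp_point(A, u, v):
    """Apply the matrix to (u, v) and divide by the third homogeneous coordinate."""
    xp = A[0][0] * u + A[0][1] * v + A[0][2]
    yp = A[1][0] * u + A[1][1] * v + A[1][2]
    zp = A[2][0] * u + A[2][1] * v + A[2][2]
    return xp / zp, yp / zp
```

In this sketch the four source points would be corners located in the photographed picture and the four destination points the corresponding corners in the template; every other pixel is then mapped with `warp_point`.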
Step S204: Obtain the text meaning of the text information in the target text area according to the defined text meaning.
Specifically, after the target text area has been determined, the text meaning of the target text information is obtained by combining it with the preset text meaning. For example, the embodiment of the present application obtains, from the target text areas of an ID card, the name (such as Zhang San) and the ID card number (such as 3241569855632145867); given the preset definitions that the name denotes the ID card holder and that the ID card number denotes the holder's identification number, the actual meanings of "Zhang San" and "3241569855632145867" are obtained. The target text area may be determined as follows: for a target picture that meets the recognition requirements, the anchor area of the target picture is combined with the relative position of the anchor area and the target text area in the picture recognition template corresponding to the target picture to locate the text area of the target picture. The recognition requirements may include image sharpness, degree of deformation, degree of offset, and so on.
In a possible implementation, when a plurality of picture recognition templates have been generated, determining the target text area to be recognized in the target picture, when the target picture matches the anchor area of the picture recognition template, according to the positional correspondence between the anchor area and the text area in the picture recognition template includes: recognizing the target picture by general OCR to obtain the text information of the target picture; finding, among the plurality of picture recognition templates, a target picture recognition template whose anchor area matches the target picture; and determining the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template.
In a possible implementation, obtaining the text meaning of the text information in the target text area according to the defined text meaning includes: confirming the meaning of the text information in the target text area when the overlap ratio between the target text area and the text area of the target picture recognition template reaches a preset value. It can be understood that text meaning recognition can be performed once the overlap ratio between the target text area of the target picture to be recognized and the target text area of the recognition template reaches the preset value. For example, if the two text areas overlap by more than 90%, the text recognized within the text area of the target picture is concatenated, and the recognition result for the target text area is output according to the meaning assigned to that text in the picture recognition template. Taking ID card recognition as an example, once the overlap ratio between the ID card number areas of the picture recognition template and of the target picture is found to reach the preset value, the digit texts within the text area are concatenated and, in combination with the text meaning preset in the template, the digit string is output as the ID card number.
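The concatenation step can be sketched as follows, assuming the OCR result is a list of text fragments with boxes, that a fragment belongs to the text area when its center lies inside it, and that fragments are read from left to right. These conventions and the function name are assumptions for illustration.

```python
def read_field(ocr_items, region, meaning):
    """Concatenate OCR fragments inside `region` and attach the preset meaning.

    ocr_items: list of (text, (x, y, width, height)) pairs.
    region:    (x, y, width, height) of the target text area.
    """
    rx, ry, rw, rh = region
    inside = []
    for text, (x, y, w, h) in ocr_items:
        cx, cy = x + w / 2, y + h / 2
        if rx <= cx <= rx + rw and ry <= cy <= ry + rh:
            inside.append((x, text))
    inside.sort()  # left-to-right reading order within a single line
    return {"meaning": meaning, "text": "".join(t for _, t in inside)}
```

For an ID card, the region would be the located ID card number area and `meaning` the label preset in the template, so that the digit string is returned together with its interpretation.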
In the embodiments of the present application, predefined anchor areas are used to find a picture recognition template suited to the picture to be recognized; the target text area is then located according to the anchor area of the selected template, and the text information in the target text area is obtained; finally, the meaning of that text information is obtained by combining it with the preset text meaning. The embodiments of the present application rely mainly on the anchor areas of the picture recognition template to locate the target text area accurately, which improves the accuracy and precision of locating the text to be recognized and avoids acquiring irrelevant information. After the target text has been obtained, the recognized text content and its meaning are returned to the user according to the preset interpretation of the target text area; this requires no long template customization cycle, saves cost and time, and allows recognition to be tailored to a specific text carrier. The embodiments of the present application therefore solve the problem that general OCR cannot obtain the meaning of the text in a picture, and also remove the need for deep customization for a specific text carrier and for large amounts of training data for that carrier: styles for which no ready-made template exists can be supported with a short development cycle, and the meaning of text that follows a regular pattern and form can be obtained quickly.
The method of the embodiments of the present application has been described in detail above; the related apparatus of the embodiments of the present application is described below.
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a text recognition apparatus provided by an embodiment of the present application. The text recognition apparatus 40 may include a generating unit 401, a positioning unit 402, a reading unit 403, a matching unit 404, and a correction unit 405, of which the matching unit 404 and the correction unit 405 are optional.
The generating unit 401 is configured to generate a picture recognition template according to a preset anchor area and its corresponding text area, where a positional correspondence exists between the anchor area and the text area, and the text area includes text information whose text meaning has been defined;
The positioning unit 402 is configured to, when the target picture matches the anchor area of the picture recognition template, determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template;
The reading unit 403 is configured to obtain the text meaning of the text information in the target text area according to the defined text meaning.
In a possible implementation, the apparatus 40 further includes a matching unit 404, configured to, before the positioning unit determines the target text area to be recognized in the target picture, recognize the target picture by general OCR to obtain the text information of the target picture, and determine whether the text information of the target picture matches the anchor area of the picture recognition template.
In a possible implementation, the matching unit 404 further includes a judging unit 406, configured to determine whether the text information of the target picture matches the anchor area of the picture recognition template;
The judging unit 406 is specifically configured to:
determine whether the area corresponding to the text information of the target picture coincides with the anchor area of the picture recognition template;
if so, detect whether the similarity between the text information of the target picture and the text information contained in the anchor area of the picture recognition template exceeds a preset threshold; and if so, confirm that the text information of the target picture matches the anchor area of the picture recognition template.
In a possible implementation, when a plurality of picture recognition templates have been generated, the positioning unit 402 is specifically configured to: recognize the target picture by general OCR to obtain the text information of the target picture; find, among the plurality of picture recognition templates, a target picture recognition template whose anchor area matches the target picture; and determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template.
In a possible implementation, the picture recognition template contains a plurality of anchor areas; the positioning unit 402 further includes a text positioning unit 407, configured to determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template;
The text positioning unit 407 is specifically configured to: when the degree of matching between the plurality of anchor areas in the target picture and the plurality of anchor areas of the target picture recognition template is greater than a preset threshold, determine the target text area to be recognized in the target picture according to the positional correspondence between each of the plurality of anchor areas of the target picture recognition template and its corresponding text area.
In a possible implementation, the apparatus 40 further includes a search unit 408, configured to:
when the target picture recognition template cannot be found among the plurality of picture recognition templates, obtain the target picture recognition template from a first device according to the text information of the target picture, the first device being a device that stores the target picture recognition template.
In a possible implementation, the apparatus further includes a correction unit 405, configured to determine whether the target picture needs image correction and, if the target picture needs image correction, to perform image correction on the target picture.
In a possible implementation, the reading unit 403 is specifically configured to confirm the meaning of the text information in the target text area when the overlap ratio between the target text area and the text area of the target picture recognition template reaches a preset value.
In a possible implementation, the apparatus 40 further includes a preprocessing unit 409, configured to:
preprocess the original picture corresponding to the target picture, the preprocessing including adjusting the sharpness of the original picture and/or removing the image background in the original picture that is unrelated to the anchor area and the text area.
It should be noted that, for the functions of the functional units of the text recognition apparatus 40 described in the apparatus embodiment of the present application, reference may be made to the description of the text recognition method in the method embodiment of FIG. 2 above; details are not repeated here.
An embodiment of the present application further provides a computer storage medium. The computer storage medium may store a program which, when executed, performs some or all of the steps of any one of the methods described in the above method embodiments.
An embodiment of the present application further provides a computer program. The computer program includes instructions which, when the computer program is executed by a computer, enable the computer to perform some or all of the steps of the text recognition method in the above method embodiments.
An embodiment of the present application provides a network device 50. Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a network device provided by an embodiment of the present application. As shown in FIG. 5, the text recognition apparatus may be implemented with the structure of FIG. 5. The network device 50 may include at least one storage component 501, at least one processing component 502, and at least one communication component 503. The device may also include general-purpose components such as an antenna and a power supply, which are not described in detail here.
The storage component 501 may include one or more storage units, and each unit may include one or more memories. The storage component can be used to store programs and various data, and can access programs or data automatically and at high speed while the device is running. Information may be stored using physical devices that have two stable states, denoted "0" and "1" respectively. The storage component 501 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM) or another type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, and the like), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these. The memory may exist independently and be connected to the processor through a bus, or it may be integrated with the processor.
The processing component 502 may also be called a processor, a processing unit, a processing board, a processing module, a processing apparatus, and so on. The processing component may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP; it may also be a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits used to control execution of the programs of the above solutions.
The communication component 503, which may also be called a transceiver, is used to communicate with other devices or communication networks, and may include units for wireless, wired, or other forms of communication.
When the network device 50 is the terminal shown in FIG. 1, the processing component 502 is configured to call the data in the storage component 501 to perform the method described in FIG. 2 above; details are not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical or take other forms.
In this application, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of this application.
In addition, the functional components in the embodiments of this application may be integrated into one component, each component may exist physically on its own, or two or more components may be integrated into one component. The integrated component may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated component is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and such modifications or replacements shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
It should be understood that, in the various embodiments of this application, the sequence numbers of the above processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application. Although this application has been described herein in conjunction with various embodiments, those skilled in the art can, in the course of practicing the claimed application, understand and implement other variations of the disclosed embodiments.

Claims (20)

  1. A text recognition method, characterized by comprising:
    generating a picture recognition template according to a preset anchor area and its corresponding text area, wherein the anchor area and the text area have a positional correspondence, the text area includes text information for which a text meaning is defined, and the picture recognition template contains one anchor area or a plurality of anchor areas;
    when a target picture matches the anchor area of the picture recognition template, determining a target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template; and
    obtaining the text meaning of the text information in the target text area according to the defined text meaning.
  2. The method according to claim 1, characterized in that, before the determining, when the target picture matches the anchor area of the picture recognition template, of the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template, the method further comprises:
    recognizing the target picture by general optical character recognition (OCR) to determine the text information of the target picture; and
    determining whether the text information of the target picture matches the anchor area of the picture recognition template.
  3. The method according to claim 2, characterized in that the determining whether the text information of the target picture matches the anchor area of the picture recognition template comprises:
    determining whether the area corresponding to the text information of the target picture coincides with the anchor area of the picture recognition template; and
    if so, detecting whether the similarity between the text information of the target picture and the text information contained in the anchor area of the picture recognition template exceeds a preset threshold, and if so, confirming that the text information of the target picture matches the anchor area of the picture recognition template.
  4. The method according to claim 1, characterized in that, when a plurality of picture recognition templates have been generated, the determining, when the target picture matches the anchor area of the picture recognition template, of the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template comprises:
    recognizing the target picture by general optical character recognition (OCR) to determine the text information of the target picture;
    finding, among the plurality of picture recognition templates, a target picture recognition template whose anchor area matches the target picture; and
    determining the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template.
  5. The method according to claim 4, characterized in that the picture recognition template contains a plurality of anchor areas, and the determining of the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template comprises:
    when the degree of matching between a plurality of anchor areas in the target picture and the plurality of anchor areas of the target picture recognition template is greater than a preset threshold, determining the target text area to be recognized in the target picture according to the positional correspondence between each of the plurality of anchor areas of the target picture recognition template and its corresponding text area.
  6. The method according to claim 4, characterized in that the method further comprises:
    when the target picture recognition template cannot be found among the plurality of picture recognition templates, obtaining the target picture recognition template from a first device according to the text information of the target picture, wherein the first device is a device that stores the target picture recognition template.
  7. The method according to claim 1, characterized in that, after the target picture matches the anchor area of the picture recognition template, the method further comprises:
    determining whether the target picture requires image correction; and
    if the target picture requires image correction, performing image correction on the target picture.
  8. The method according to claim 1, characterized in that the obtaining of the text meaning of the text information in the target text area according to the defined text meaning comprises:
    when the overlap ratio between the target text area and the text area of the target picture recognition template reaches a preset value, confirming the meaning of the text information in the target text area.
  9. The method according to any one of claims 1-8, characterized in that the method further comprises:
    preprocessing an original picture corresponding to the target picture, wherein the preprocessing includes adjusting the sharpness of the original picture and/or removing image background in the original picture that is unrelated to the anchor area and the text area.
  10. A text recognition apparatus, characterized by comprising:
    a generating unit, configured to generate a picture recognition template according to a preset anchor area and its corresponding text area, wherein the anchor area and the text area have a positional correspondence, and the text area includes text information for which a text meaning is defined;
    a positioning unit, configured to, when a target picture matches the anchor area of the picture recognition template, determine a target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template; and
    a reading unit, configured to obtain the text meaning of the text information in the target text area according to the defined text meaning.
  11. The apparatus according to claim 10, characterized in that the apparatus further comprises:
    a matching unit, configured to, before the positioning unit determines, when the target picture matches the anchor area of the picture recognition template, the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the picture recognition template: recognize the target picture by general optical character recognition (OCR) to determine the text information of the target picture; and determine whether the text information of the target picture matches the anchor area of the picture recognition template.
  12. The apparatus according to claim 11, characterized in that the matching unit further comprises a judging unit configured to determine whether the text information of the target picture matches the anchor area of the picture recognition template;
    the judging unit is specifically configured to:
    determine whether the area corresponding to the text information of the target picture coincides with the anchor area of the picture recognition template; and
    if so, detect whether the similarity between the text information of the target picture and the text information contained in the anchor area of the picture recognition template exceeds a preset threshold, and if so, confirm that the text information of the target picture matches the anchor area of the picture recognition template.
  13. The apparatus according to claim 10, characterized in that, when a plurality of picture recognition templates have been generated, the positioning unit is specifically configured to:
    recognize the target picture by general optical character recognition (OCR) to determine the text information of the target picture; find, among the plurality of picture recognition templates, a target picture recognition template whose anchor area matches the target picture; and determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template.
  14. The apparatus according to claim 13, characterized in that the picture recognition template contains a plurality of anchor areas, and the positioning unit further comprises a text positioning unit configured to determine the target text area to be recognized in the target picture according to the positional correspondence between the anchor area and the text area in the target picture recognition template;
    the text positioning unit is specifically configured to: when the degree of matching between a plurality of anchor areas in the target picture and the plurality of anchor areas of the target picture recognition template is greater than a preset threshold, determine the target text area to be recognized in the target picture according to the positional correspondence between each of the plurality of anchor areas of the target picture recognition template and its corresponding text area.
  15. The apparatus according to claim 13, characterized in that the apparatus further comprises a search unit configured to:
    when the target picture recognition template cannot be found among the plurality of picture recognition templates, obtain the target picture recognition template from a first device according to the text information of the target picture, wherein the first device is a device that stores the target picture recognition template.
  16. The apparatus according to claim 10, characterized in that the apparatus further comprises a correction unit configured to:
    after the target picture matches the anchor area of the picture recognition template, determine whether the target picture requires image correction; and if the target picture requires image correction, perform image correction on the target picture.
  17. The apparatus according to claim 10, characterized in that the reading unit is specifically configured to:
    when the overlap ratio between the target text area and the text area of the target picture recognition template reaches a preset value, confirm the meaning of the text information in the target text area.
  18. The apparatus according to any one of claims 10-17, characterized in that the apparatus further comprises a preprocessing unit configured to:
    preprocess an original picture corresponding to the target picture, wherein the preprocessing includes adjusting the sharpness of the original picture and/or removing image background in the original picture that is unrelated to the anchor area and the text area.
  19. A network device, characterized by comprising a storage component, a communication component, and a processing component that are connected to one another, wherein the storage component is configured to store data processing code, the communication component is configured to exchange information with external devices, and the processing component is configured to call the program code to execute the method according to any one of claims 1-5.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to execute the method according to any one of claims 1-5.
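
For illustration only, the flow recited in claims 1 to 3 and 8 (match an anchor area by position and text similarity, derive the target text area from the template's anchor-to-text offset, then read the field under its defined meaning) might be sketched as follows. This is a minimal sketch under stated assumptions and not the applicant's implementation: the names `Box`, `Template`, `OcrItem`, `find_anchor`, `locate_target_region` and `read_field`, the use of intersection-over-union as the "coincidence" and "overlap ratio" measures, the use of `difflib.SequenceMatcher` as the similarity measure, and all threshold values are hypothetical. The OCR step itself is assumed to have already produced text items with bounding boxes.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: top-left corner (x, y), width w, height h."""
    x: float
    y: float
    w: float
    h: float

    def iou(self, other: "Box") -> float:
        """Intersection-over-union, used here as the (assumed) overlap measure."""
        ix = max(0.0, min(self.x + self.w, other.x + other.w) - max(self.x, other.x))
        iy = max(0.0, min(self.y + self.h, other.y + other.h) - max(self.y, other.y))
        inter = ix * iy
        union = self.w * self.h + other.w * other.h - inter
        return inter / union if union > 0 else 0.0


@dataclass(frozen=True)
class Template:
    """Hypothetical single-anchor picture recognition template."""
    anchor_box: Box    # where the anchor text is expected
    anchor_text: str   # fixed text contained in the anchor area
    text_box: Box      # text area positioned relative to the anchor
    meaning: str       # defined meaning of the text in text_box


@dataclass(frozen=True)
class OcrItem:
    """One text item returned by a general OCR pass (assumed input)."""
    box: Box
    text: str


def find_anchor(items: List[OcrItem], tpl: Template,
                min_iou: float = 0.5, min_sim: float = 0.8) -> Optional[OcrItem]:
    """Claim 3 style check: position must coincide, then text must be similar."""
    for item in items:
        if item.box.iou(tpl.anchor_box) < min_iou:
            continue
        if SequenceMatcher(None, item.text, tpl.anchor_text).ratio() >= min_sim:
            return item
    return None


def locate_target_region(anchor: OcrItem, tpl: Template) -> Box:
    """Apply the template's anchor-to-text offset to the anchor found in the picture."""
    dx = tpl.text_box.x - tpl.anchor_box.x
    dy = tpl.text_box.y - tpl.anchor_box.y
    return Box(anchor.box.x + dx, anchor.box.y + dy, tpl.text_box.w, tpl.text_box.h)


def read_field(items: List[OcrItem], tpl: Template,
               min_overlap: float = 0.5) -> Optional[Dict[str, str]]:
    """Return {meaning: text} for the OCR item that best overlaps the target area."""
    anchor = find_anchor(items, tpl)
    if anchor is None:
        return None
    region = locate_target_region(anchor, tpl)
    candidates = [i for i in items if i is not anchor]
    best = max(candidates, key=lambda i: i.box.iou(region), default=None)
    if best is None or best.box.iou(region) < min_overlap:
        return None
    return {tpl.meaning: best.text}
```

In this sketch a template for an identity card might pair the anchor text "Name" with a text area immediately to its right; a slightly shifted scan still resolves the field because the target area is computed from the anchor actually found in the picture rather than from absolute coordinates.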
PCT/CN2019/088978 2019-02-27 2019-05-29 Text recognition method and apparatus WO2020173008A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910146626.6 2019-02-27
CN201910146626.6A CN109977935B (en) 2019-02-27 2019-02-27 Text recognition method and device

Publications (1)

Publication Number Publication Date
WO2020173008A1 true WO2020173008A1 (en) 2020-09-03

Family

ID=67077503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088978 WO2020173008A1 (en) 2019-02-27 2019-05-29 Text recognition method and apparatus

Country Status (2)

Country Link
CN (1) CN109977935B (en)
WO (1) WO2020173008A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396057A (en) * 2019-08-13 2021-02-23 上海高德威智能交通系统有限公司 Character recognition method and device and electronic equipment
CN110516664A (en) * 2019-08-16 2019-11-29 咪咕数字传媒有限公司 Bank slip recognition method, apparatus, electronic equipment and storage medium
CN110674811B (en) * 2019-09-04 2022-04-29 广东浪潮大数据研究有限公司 Image recognition method and device
CN111126125B (en) * 2019-10-15 2023-08-01 平安科技(深圳)有限公司 Method, device, equipment and readable storage medium for extracting target text in certificate
CN111046736B (en) * 2019-11-14 2021-04-16 北京房江湖科技有限公司 Method, device and storage medium for extracting text information
CN111191557B (en) * 2019-12-25 2023-12-05 深圳市优必选科技股份有限公司 Mark identification positioning method, mark identification positioning device and intelligent equipment
CN111079709B (en) * 2019-12-31 2021-04-20 广州市昊链信息科技股份有限公司 Electronic document generation method and device, computer equipment and storage medium
CN111178365A (en) * 2019-12-31 2020-05-19 五八有限公司 Picture character recognition method and device, electronic equipment and storage medium
CN111079708B (en) * 2019-12-31 2020-12-29 广州市昊链信息科技股份有限公司 Information identification method and device, computer equipment and storage medium
CN111368840A (en) * 2020-02-20 2020-07-03 中国建设银行股份有限公司 Certificate picture processing method and device
CN111401137A (en) * 2020-02-24 2020-07-10 中国建设银行股份有限公司 Method and device for identifying certificate column
CN111476227B (en) * 2020-03-17 2024-04-05 平安科技(深圳)有限公司 Target field identification method and device based on OCR and storage medium
CN113762244A (en) * 2020-06-05 2021-12-07 北京市天元网络技术股份有限公司 Document information extraction method and device
CN112381087A (en) * 2020-08-26 2021-02-19 北京来也网络科技有限公司 Image recognition method, apparatus, computer device and medium combining RPA and AI
CN112580499A (en) * 2020-12-17 2021-03-30 上海眼控科技股份有限公司 Text recognition method, device, equipment and storage medium
CN112633118A (en) * 2020-12-18 2021-04-09 上海眼控科技股份有限公司 Text information extraction method, equipment and storage medium
CN113326785B (en) * 2021-06-01 2023-08-04 上海期货信息技术有限公司 File identification method and device
CN113672322B (en) * 2021-07-29 2024-05-24 浙江太美医疗科技股份有限公司 Method and device for providing interpretation information
CN113963339A (en) * 2021-09-02 2022-01-21 泰康保险集团股份有限公司 Information extraction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105913093A (en) * 2016-05-03 2016-08-31 电子科技大学 Template matching method for character recognizing and processing
US20160255235A1 (en) * 2014-03-04 2016-09-01 Xerox Corporation Global registration of filled-out content in an application form
CN107368800A (en) * 2017-07-13 2017-11-21 上海携程商务有限公司 Order confirmation method, system, equipment and storage medium based on fax identification
CN109002768A (en) * 2018-06-22 2018-12-14 深源恒际科技有限公司 Medical bill class text extraction method based on the identification of neural network text detection
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7788253B2 (en) * 2006-12-28 2010-08-31 International Business Machines Corporation Global anchor text processing
US11074495B2 (en) * 2013-02-28 2021-07-27 Z Advanced Computing, Inc. (Zac) System and method for extremely efficient image and pattern recognition and artificial intelligence platform

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112348024A (en) * 2020-10-29 2021-02-09 北京信工博特智能科技有限公司 Image-text identification method and system based on deep learning optimization network
CN112363918A (en) * 2020-11-02 2021-02-12 北京云聚智慧科技有限公司 Automatic test method, device, equipment and storage medium for user interface AI
CN112363918B (en) * 2020-11-02 2024-03-08 北京云聚智慧科技有限公司 User interface AI automatic test method, device, equipment and storage medium
CN112508550A (en) * 2020-12-04 2021-03-16 建信金融科技有限责任公司 Transfer processing method, device, equipment and storage medium
CN112699740A (en) * 2020-12-10 2021-04-23 广州广电运通金融电子股份有限公司 Bank card information structured extraction method, system and equipment
CN112711668A (en) * 2020-12-29 2021-04-27 广东电网有限责任公司 Intelligent text receiving and sending system and method for OA office
CN113177541A (en) * 2021-05-17 2021-07-27 上海云扩信息科技有限公司 Method for extracting character contents in PDF document and picture by computer program
CN113177541B (en) * 2021-05-17 2023-12-19 上海云扩信息科技有限公司 Method for extracting text content in PDF document and picture by computer program
CN113191348A (en) * 2021-05-31 2021-07-30 山东新一代信息产业技术研究院有限公司 Template-based text structured extraction method and tool

Also Published As

Publication number Publication date
CN109977935B (en) 2024-04-12
CN109977935A (en) 2019-07-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19917054

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 19.10.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19917054

Country of ref document: EP

Kind code of ref document: A1