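WO2021115490A1 - Method, device, and medium for detecting and recognizing *** text in complex environments - Google Patents

Method, device, and medium for detecting and recognizing *** text in complex environments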

Info

Publication number
WO2021115490A1
WO2021115490A1 PCT/CN2020/136402 CN2020136402W WO2021115490A1 WO 2021115490 A1 WO2021115490 A1 WO 2021115490A1 CN 2020136402 W CN2020136402 W CN 2020136402W WO 2021115490 A1 WO2021115490 A1 WO 2021115490A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
seal
picture
area
detection
Prior art date
Application number
PCT/CN2020/136402
Other languages
English (en)
French (fr)
Inventor
汤鑫
梁晓云
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021115490A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464 Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a method, device, electronic device, and computer-readable storage medium for detecting and recognizing seal characters in a complex environment.
  • Seals carry authority and are widely used by state agencies, organizations, enterprises, and institutions in China. Sealed documents are legally binding, and seal inspection accounts for a large proportion of document inspection.
  • The inventor found that, at present, confirming whether the seal in a document is correct usually requires manual inspection, and large-scale manual verification is labor-intensive and inefficient.
  • Faced with the growing workload of inspection and identification, the diversity of stamping conditions, and the varying quality of extracted samples, automatic seal detection and identification has great research value and economic benefit.
  • However, existing automatic seal recognition can only handle cases without background text interference. In real scenes, factors such as background text interference and differences in seal quality make the seal very difficult to recognize.
  • The present application provides a method, device, electronic device, and computer-readable storage medium for detecting and recognizing seal characters in a complex environment, the main purpose of which is to detect and recognize seal characters in the presence of background text interference.
  • The first aspect of the present application provides a method for detecting and recognizing seal characters in a complex environment.
  • The method includes: acquiring a document picture to be processed; performing seal detection and positioning on the document picture, and extracting a seal picture according to the detection and positioning result, where the seal picture is the smallest rectangular picture including the seal; performing text detection on the seal picture, and segmenting it to obtain a curved text area in the seal; converting the curved text area from a curve to a straight line to obtain a linear text picture; and inputting the linear text picture into a text recognition model to obtain the text information in the seal.
  • The text recognition model uses a SAR (Show, Attend and Read) network for text recognition.
  • The SAR network includes: a residual network (ResNet) module, which is used to extract text features and obtain feature vectors; a framework based on a Long Short-Term Memory (LSTM) encoder-decoder, which includes an LSTM encoder and a decoder; and an attention module, which is used to apply an attention mechanism to the decoder.
  • The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain the hidden state vector; the hidden state vector is then input into the decoder, to which the attention mechanism is applied, to obtain the text information in the seal.
  • The second aspect of the present application provides a seal character detection and recognition device oriented to a complex environment, including:
  • a picture acquisition module, used to acquire the document picture to be processed;
  • a seal extraction module, used to perform seal detection and positioning on the document picture, and to extract the seal picture according to the detection and positioning result, where the seal picture is the smallest rectangular picture including the seal;
  • a detection and segmentation module, used to perform text detection on the seal picture, and to segment it to obtain a curved text area in the seal;
  • a text conversion module, used to convert the curved text area from a curve to a straight line to obtain a linear text picture;
  • a text recognition module, used to input the linear text picture into the text recognition model to obtain the text information in the seal.
  • The text recognition model uses a SAR network for text recognition.
  • The SAR network includes: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, which includes an LSTM encoder and a decoder; and an attention module, used to apply an attention mechanism to the decoder.
  • The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain the hidden state vector; the hidden state vector is then input into the decoder, to which the attention mechanism is applied, to obtain the text information in the seal.
  • The third aspect of the present application provides an electronic device, the electronic device comprising: at least one processor; and a memory communicatively connected with the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above-mentioned method for detecting and recognizing seal characters in a complex environment.
  • The fourth aspect of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the above-mentioned method for detecting and recognizing seal characters in a complex environment.
  • This application uses a text recognition model to perform end-to-end detection and recognition of the seal. It is highly robust against background text interference, and it can convert the curved text in the seal into straight-line text, which solves the problem that seal text is difficult to recognize.
  • This application uses artificial intelligence and image detection technology to automatically extract the text content of the seal from the document, and does not depend on the color of the seal. It can handle seals of various colors, such as black and white, red, and blue. This avoids the need for large amounts of manual labor to compare seal content, which saves manpower and improves economic efficiency.
  • FIG. 1 is a schematic flowchart of a method for detecting and recognizing seal characters according to an embodiment of this application;
  • FIG. 2a is a schematic diagram of a document picture provided by an embodiment of this application;
  • FIG. 2b is a schematic diagram of a seal picture provided by an embodiment of this application;
  • FIG. 2c is a schematic diagram of a curved text area provided by an embodiment of this application;
  • FIG. 2d is a schematic diagram of a linear text picture provided by an embodiment of this application;
  • FIG. 3 is a schematic diagram of the modules of a seal character detection and recognition device provided by an embodiment of this application;
  • FIG. 4 is a schematic diagram of the internal structure of an electronic device that implements a method for detecting and recognizing seal characters according to an embodiment of this application.
  • This application provides a method for detecting and recognizing seal characters in a complex environment.
  • FIG. 1 is a schematic flowchart of a method for detecting and recognizing seal characters according to an embodiment of this application.
  • The method can be executed by a device, and the device can be implemented by software and/or hardware.
  • The method for detecting and recognizing seal characters in a complex environment includes:
  • Step S1: Obtain a document picture to be processed. FIG. 2a is a schematic diagram of a document picture provided in an embodiment of this application. The document picture contains a seal to be recognized, and there is no restriction on the color of the seal, which can be black and white, red, blue, etc.
  • Step S2: Perform seal detection and positioning on the document picture, and extract the seal picture according to the detection and positioning result, where the seal picture is the smallest rectangular picture including the seal. FIG. 2b is a schematic diagram of the seal picture provided by an embodiment of this application; taking a circular seal as an example, the seal picture is the smallest rectangle including the outer circle of the seal. For an input document, first check whether there is a seal; if there is, locate the position of the seal and extract the seal picture according to that position.
  • Step S3: Perform text detection on the seal picture, and segment it to obtain a curved text area in the seal. A curved text area is one in which the entire area containing the text to be recognized is curved, for example along an ellipse or a circle. FIG. 2c is a schematic diagram of a curved text area provided by an embodiment of this application, obtained by performing text detection and segmentation on the seal picture shown in FIG. 2b. The curved text area in FIG. 2c contains only the curved text "XX City XXXXXX Meeting".
  • Step S4: Convert the curved text area from a curve to a straight line to obtain a linear text picture. FIG. 2d is a schematic diagram of a linear text picture provided by an embodiment of this application; it is obtained by converting the curved text area shown in FIG. 2c, so that the arc-shaped area becomes a rectangular area.
  • Step S5: Input the linear text picture into the text recognition model to obtain the text information in the seal; for example, the obtained seal information is "XX City XXXXXX Meeting".
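  • Steps S1 to S5 above can be read as a simple pipeline. The following is a minimal sketch only: the class `SealPipeline`, its field names, and the toy `Image` type are hypothetical and do not come from the application, and each stage is passed in as a function so that no particular detector or recognizer is implied.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical toy image type: rows of pixel values.
Image = List[List[int]]


@dataclass
class SealPipeline:
    """Chains the stages of steps S2 to S5; every stage is an injected function."""
    detect_seal: Callable[[Image], Image]             # S2: crop the seal picture
    segment_text: Callable[[Image], Sequence[Image]]  # S3: text areas in the seal
    straighten: Callable[[Image], Image]              # S4: curve -> straight line
    recognize: Callable[[Image], str]                 # S5: text recognition model

    def run(self, document: Image) -> List[str]:
        # S1 is the caller handing over the document picture.
        seal = self.detect_seal(document)
        regions = self.segment_text(seal)
        return [self.recognize(self.straighten(region)) for region in regions]
```

  • In use, real models would be plugged in for the four stage functions; stubs such as `lambda region: region` are enough to exercise the control flow.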
  • The text recognition model uses a SAR network for text recognition.
  • The SAR network includes: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, which includes an LSTM encoder and a decoder; and an attention module, used to apply an attention mechanism to the decoder. The feature vectors are obtained through the ResNet module and input into the LSTM encoder to obtain the hidden state vector; the hidden state vector is then input into the decoder, to which the attention mechanism is applied, to obtain the text information in the seal.
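  • To make the role of the attention module concrete, the sketch below shows a single attention step in plain Python: a decoder hidden state is scored against each feature vector, the scores are normalized with a softmax, and a weighted context vector is produced. This is an illustration under assumptions, not the SAR network itself: dot-product scoring is swapped in for simplicity, and the function name `attend` is hypothetical.

```python
import math
from typing import List, Tuple


def attend(features: List[List[float]], hidden: List[float]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoding step."""
    # Score each feature vector against the decoder hidden state (dot product).
    scores = [sum(f * h for f, h in zip(feat, hidden)) for feat in features]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted sum of the feature vectors.
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, context
```

  • In a full decoder, the context vector would be combined with the hidden state to predict the next character, and the step would repeat until an end-of-sequence symbol is produced.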
  • The seal text detection and recognition of the present application can recognize seal text against a complex background and can recognize deformed seal text, which improves the accuracy of seal text recognition and gives better robustness against background text interference.
  • The text recognition model of the present application is trained using real complex-background images. The training data combine company-name and seal-type character strings, which serve as the text content of the seal, with a complex background. Here, the complex background refers to the background color or other interfering text included in the background of the seal picture. For example, "April 2, 2019" in FIG. 2b is background text of the seal picture: it is not part of the seal itself but belongs to the content of the document.
  • The seal-type character string is placed on a complex background, and the seal text is deformed and position-transformed.
  • The position transformation includes rotation and up, down, left, and right shifts, so that the training samples are diversified.
  • The step of performing seal detection and positioning on the document picture includes: training a YOLOv3 detection model; and using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
  • The YOLOv3 detection model is a target detection algorithm that improves on the YOLOv2 model. It includes multiple convolutional layers, through which feature maps at different scales are obtained. Each feature map contains the coordinates of the center point of the predicted target area (the seal picture) and the size and classification of the target area, and the position coordinates of the seal in the document picture are obtained from the feature maps at the different scales. This multi-scale feature prediction makes the prediction results more accurate.
  • The training samples for the YOLOv3 detection model are text pictures with seals, in which the position coordinates of the seal are annotated. Training the model with this data enables precise positioning of the seal.
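  • Since each prediction consists of a center point, a size, and a classification, the seal picture can be cropped once the prediction is converted into corner coordinates. The sketch below assumes a hypothetical `Detection` record and a hypothetical confidence cutoff `min_score`; it does not reproduce YOLOv3 itself, only the post-processing of its output.

```python
import math
from typing import List, NamedTuple, Optional, Tuple


class Detection(NamedTuple):
    cx: float     # x coordinate of the center of the predicted target area
    cy: float     # y coordinate of the center
    w: float      # width of the target area
    h: float      # height of the target area
    score: float  # confidence that the area is a seal (assumed field)


def seal_box(dets: List[Detection], img_w: int, img_h: int,
             min_score: float = 0.5) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) of the best seal, clamped to the image."""
    candidates = [d for d in dets if d.score >= min_score]
    if not candidates:
        return None  # no seal found in the document picture
    best = max(candidates, key=lambda d: d.score)
    left = max(0, math.floor(best.cx - best.w / 2))
    top = max(0, math.floor(best.cy - best.h / 2))
    right = min(img_w, math.ceil(best.cx + best.w / 2))
    bottom = min(img_h, math.ceil(best.cy + best.h / 2))
    return left, top, right, bottom
```

  • Returning `None` when nothing passes the cutoff mirrors the "first check whether there is a seal" part of step S2.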
  • Existing text recognition can only handle horizontal text, whereas a seal may contain curved text and rectangular (straight-line) text at the same time. It is therefore necessary to detect all the text areas in the seal picture and extract them separately as required. Because the spacing between the texts in a seal is relatively small, after the seal picture is obtained through seal positioning detection, the seal picture is cropped (refer to FIG. 2b) and then enlarged, which widens the spacing between the texts and makes it easier for the detection algorithm to detect the different text areas.
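  • The crop-then-enlarge preprocessing can be sketched on a toy image stored as nested lists. The interpolation method is not stated in the text, so nearest-neighbour upscaling by an integer factor is assumed here; the function names `crop` and `enlarge` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # toy image: rows of pixel values


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut out the (left, top, right, bottom) box; right and bottom are exclusive."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def enlarge(image: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling: every pixel becomes a factor x factor block."""
    out: Image = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

  • In practice an image library's resize routine would normally replace `enlarge`; the point is only that enlargement increases the pixel distance between neighbouring text areas.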
  • A PSENet (Progressive Scale Expansion Network) text detection network is used to perform text detection on the seal picture, and each text area in the seal picture is detected, including a curved text area and/or a rectangular text area.
  • A rectangular text area can be input directly into the text recognition model for text recognition, whereas a curved text area must first be converted into a rectangular text area and then input into the text recognition model for processing.
  • The steps of performing text detection on the seal picture include:
  • The dimension of the input image is [B, 3, H, W], where B represents the batch size, H represents the image height, and W represents the image width.
  • A breadth-first search (BFS) algorithm is used to search the output picture. Starting from S1, more pixels are added to expand the area according to S2, and so on until the search of Sn ends; the connected text domains are thus obtained, which give the text areas in the seal picture.
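  • The BFS expansion from S1 to Sn can be sketched as follows. The sketch assumes that S1, ..., Sn are binary masks of the same size ordered from the smallest text kernel to the largest, which is how PSENet is usually described but is not spelled out in the text above; the function names are hypothetical.

```python
from collections import deque
from typing import List

Mask = List[List[int]]
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def label_components(mask: Mask) -> List[List[int]]:
    """Give each 4-connected region of the smallest kernel its own label (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy, dx in NEIGHBOURS:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def expand(kernels: List[Mask]) -> List[List[int]]:
    """Grow the labels found in kernels[0] through each larger kernel in turn."""
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for mask in kernels[1:]:
        # Multi-source BFS: all labelled pixels expand at the same pace, and a
        # contested pixel goes to whichever label reaches it first.
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = cy + dy, cx + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels
```

  • Starting from the smallest kernel is what keeps closely spaced text areas apart: they are separate components in S1 and keep their own labels while growing.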
  • In step S3, a curved text area can be detected and segmented, and the point coordinates of its edge line are recorded as $\{p_i\}_{i=1}^{N}$, where $p_i$ represents the coordinates $(x_i, y_i)$ of the $i$-th point, and the area enclosed by these $N$ points is denoted as $S$.
  • The step of converting the curved text area from a curve to a straight line to obtain a linear text picture includes:
  • Step S41: Assuming that the curved text area $S$ is part of a circular area, obtain the center coordinates and radius of the circular area. Specifically, the curved text of a seal is mainly arranged along a circle or an ellipse, so it can be assumed that $S$ is part of a circular area. Solving for the center and radius of the circular area can then be transformed into solving the following optimization problem:
  • where $r$ is the radius of the circle; $c$ is the coordinate of the center of the circle; $p_i$ is the coordinate of the $i$-th point on the edge line; $N$ is the number of points on the edge line of the curved text area $S$; $(x_i, y_i)$ are the coordinates of a point on the edge line; and $(c_0, c_1)$ are the coordinates of the center of the circle.
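  • The optimization problem of step S41 is not reproduced in this text, so the sketch below should not be read as the application's formula. It swaps in one standard choice, the algebraic least-squares circle fit, which finds $D, E, F$ minimizing $\sum_i (x_i^2 + y_i^2 + D x_i + E y_i + F)^2$ and then recovers $c = (-D/2, -E/2)$ and $r = \sqrt{c_0^2 + c_1^2 - F}$. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points: List[Tuple[float, float]]) -> Tuple[Tuple[float, float], float]:
    """Algebraic least-squares circle fit; returns ((c0, c1), r)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    # Normal equations of the least-squares problem in the unknowns D, E, F.
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    b = [-sxz, -syz, -sz]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("points are collinear or too few to define a circle")
    solution = []
    for col in range(3):  # Cramer's rule: replace one column at a time by b
        m = [row[:] for row in a]
        for i in range(3):
            m[i][col] = b[i]
        solution.append(det3(m) / d)
    big_d, big_e, big_f = solution
    c0, c1 = -big_d / 2, -big_e / 2
    return (c0, c1), math.sqrt(c0 * c0 + c1 * c1 - big_f)
```

  • The algebraic form is chosen here because it reduces to a 3x3 linear system; a geometric fit that minimizes $\sum_i (\lVert p_i - c \rVert - r)^2$ would need an iterative solver.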
  • Step S42: The curved text area is bounded by two concentric circles. The minimum radius $r$ and the maximum radius $R$ are estimated by the following formulas:
  • where $p_i$ is the coordinate of the $i$-th point on the edge line; $c$ is the coordinate of the center of the circle; $r$ is the minimum radius; and $R$ is the maximum radius.
  • To make sure the whole text is covered, $R$ can be appropriately increased and $r$ can be appropriately reduced. The specific adjustment amount is determined according to the actual situation.
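  • The radius formulas are not reproduced in this text. A reading consistent with the symbol definitions is that $r$ and $R$ are the smallest and largest distances from the center $c$ to the edge points $p_i$; the sketch below assumes exactly that, with a hypothetical `margin` parameter standing in for the "appropriate" adjustment.

```python
import math
from typing import List, Tuple


def radii(points: List[Tuple[float, float]], center: Tuple[float, float],
          margin: float = 0.0) -> Tuple[float, float]:
    """Return (r, R): min and max distance from center to the edge points.

    margin shrinks r and grows R so that the band safely covers the text.
    """
    dists = [math.hypot(x - center[0], y - center[1]) for x, y in points]
    r = max(0.0, min(dists) - margin)  # never let the inner radius go negative
    big_r = max(dists) + margin
    return r, big_r
```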
  • Step S43: According to the coordinates of the center of the circle, the minimum radius, the maximum radius, and the arc corresponding to the start and end points of the arc area, map the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area to obtain the linear text picture.
  • The curved text area is unrolled into a rectangular area with height $R - r$ and length $2\pi R$.
  • The coordinates in the linear text picture are mapped to the coordinates in the seal picture by the following formula,
  • where $(c_0, c_1)$ represents the coordinates of the center of the circle; $r$ represents the minimum radius of the arc area; $R$ represents the maximum radius of the arc area; $(x, y)$ represents the coordinates in the linear text picture; and $(x', y')$ represents the coordinates in the seal picture that correspond to $(x, y)$. The formula also takes the arc corresponding to the start and end points of the arc area as a parameter.
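  • The mapping formula itself is not reproduced in this text. The sketch below shows one plausible polar unwrapping under stated assumptions: the horizontal position $x$ is converted linearly into an angle between a start angle and an end angle, row 0 of the linear text picture lies on the outer circle, and pixels are sampled with nearest-neighbour rounding. The parameter names `theta_start`, `theta_end`, and `width` are hypothetical.

```python
import math
from typing import List, Tuple

Image = List[List[int]]


def rect_to_seal(x: float, y: float, center: Tuple[float, float], big_r: float,
                 theta_start: float, theta_end: float, width: int) -> Tuple[float, float]:
    """Map (x, y) in the linear text picture to (x', y') in the seal picture."""
    angle = theta_start + (theta_end - theta_start) * x / width
    radius = big_r - y  # row 0 is the outer circle, deeper rows move inward
    return (center[0] + radius * math.cos(angle),
            center[1] + radius * math.sin(angle))


def unwrap(seal: Image, center: Tuple[float, float], r: float, big_r: float,
           theta_start: float, theta_end: float, width: int) -> Image:
    """Build the linear text picture by sampling the seal picture pixel by pixel."""
    height = int(round(big_r - r))
    out: Image = []
    for y in range(height):
        row = []
        for x in range(width):
            sx, sy = rect_to_seal(x, y, center, big_r, theta_start, theta_end, width)
            ix, iy = int(round(sx)), int(round(sy))
            inside = 0 <= iy < len(seal) and 0 <= ix < len(seal[0])
            row.append(seal[iy][ix] if inside else 0)  # pad outside the picture with 0
        out.append(row)
    return out
```

  • Iterating over the output pixels and looking up the source position, rather than the other way round, guarantees that every pixel of the linear text picture receives a value.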
  • The method further includes judging whether the text area is curved (arranged along a circle or an ellipse): if the ratio of the area enclosed by the point coordinates of the edge line of the text area to the area of the smallest circumscribed rectangle of that enclosed area is less than a preset threshold, the text area is curved.
  • Specifically, the area $A_1$ of the enclosed area and the area $A_2$ of its smallest circumscribed rectangle are calculated; if the ratio $A_1 / A_2$ is less than the preset threshold, the text area is judged to be curved.
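  • The area-ratio test can be sketched with the shoelace formula for $A_1$. Two simplifications are assumed and should be read as such: the smallest circumscribed rectangle is approximated by the axis-aligned bounding box (the true minimum rectangle may be rotated), and the threshold value 0.6 is a hypothetical placeholder, since the text only speaks of a preset threshold.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(points: List[Point]) -> float:
    """Shoelace formula for a closed polygon given as ordered vertices (A_1)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def is_curved(points: List[Point], threshold: float = 0.6) -> bool:
    """Judge a text area as curved when A_1 / A_2 falls below the threshold."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # A_2: axis-aligned bounding box, used here as a stand-in for the
    # smallest circumscribed rectangle.
    rect_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if rect_area == 0:
        return False  # degenerate outline, nothing to judge
    return polygon_area(points) / rect_area < threshold
```

  • A straight text line fills almost all of its bounding rectangle, so its ratio is close to 1, while an arc-shaped band leaves most of the rectangle empty and yields a small ratio.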
  • This application uses the text recognition model to perform end-to-end detection and recognition of the seal. It is highly robust against background text interference and can convert the curved text in the seal into straight-line text. It can recognize both the curved text and the rectangular (straight-line) text in the seal, which solves the problem that seal text is difficult to recognize.
  • FIG. 3 is a functional block diagram of the seal character detection and recognition device of the present application.
  • The seal character detection and recognition device 100 described in this application can be installed in an electronic device.
  • The seal character detection and recognition device may include a picture acquisition module 101, a seal extraction module 102, a detection and segmentation module 103, a text conversion module 104, and a text recognition module 105.
  • A module described in the present application can also be called a unit. It refers to a series of computer program segments that can be executed by the processor of an electronic device, that complete a fixed function, and that are stored in the memory of the electronic device.
  • The function of each module/unit is as follows:
  • The picture acquisition module 101 is used to acquire a document picture to be processed, where the document picture contains a seal to be recognized, and there is no restriction on the color of the seal, which can be black and white, red, blue, etc.
  • The seal extraction module 102 is used to perform seal detection and positioning on the document picture, and to extract the seal picture according to the detection and positioning result, where the seal picture is the smallest rectangular picture including the seal. Referring to FIG. 2b and taking a circular seal as an example, the seal picture is the smallest rectangle including the outer circle of the seal. For an input document, first check whether there is a seal; if there is, locate the position of the seal and extract the seal picture according to that position.
  • The detection and segmentation module 103 is used to perform text detection on the seal picture, and to segment it to obtain a curved text area in the seal. As shown in FIG. 2c, the curved text area contains only the curved text "XX City XXXXX Meeting".
  • The text conversion module 104 is used to convert a curved text area from a curve to a straight line to obtain a linear text picture. Referring to FIG. 2d, the picture is obtained by converting the curved text area shown in FIG. 2c, so that the arc-shaped area becomes a rectangular area.
  • The text recognition module 105 is used to input the linear text picture into a text recognition model to obtain the text information in the seal.
  • The text recognition model uses a SAR network for text recognition.
  • The SAR network includes: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, which includes an LSTM encoder and a decoder; and an attention module, used to apply an attention mechanism to the decoder.
  • The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain the hidden state vector; the hidden state vector is then input into the decoder, to which the attention mechanism is applied, to obtain the text information in the seal.
  • The seal text detection and recognition of the present application can recognize seal text against a complex background and can recognize deformed seal text, which improves the accuracy of seal text recognition and gives better robustness against background text interference.
  • The text recognition model of the present application is trained using real complex-background images. The training data combine company-name and seal-type character strings, which serve as the text content of the seal, with a complex background. Here, the complex background refers to the background color or other interfering text included in the background of the seal picture. For example, "20XX year XX month XX day" in FIG. 2b is background text of the seal picture: it is not part of the seal itself but belongs to the content of the document.
  • The seal-type character string is placed on a complex background, and the seal text is deformed and position-transformed.
  • The position transformation includes rotation and up, down, left, and right shifts, so that the training samples are diversified.
  • The seal extraction module 102 uses the YOLOv3 detection model to perform seal detection and positioning on the document picture. Specifically, this is achieved by training a YOLOv3 detection model and using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
  • The YOLOv3 detection model is a target detection algorithm that improves on the YOLOv2 model. It includes multiple convolutional layers, through which feature maps at different scales are obtained. Each feature map contains the coordinates of the center point of the predicted target area (the seal picture) and the size and classification of the target area, and the position coordinates of the seal in the document picture are obtained from the feature maps at the different scales. This multi-scale feature prediction makes the prediction results more accurate.
  • The training samples for the YOLOv3 detection model are text pictures with seals, in which the position coordinates of the seal are annotated. Training the model with this data enables precise positioning of the seal.
  • Existing text recognition can only handle horizontal text, whereas a seal may contain curved text and rectangular (straight-line) text at the same time. It is therefore necessary to detect all the text areas in the seal picture and extract them separately as required. Because the spacing between the texts in a seal is relatively small, after the seal picture is obtained through seal positioning detection, the seal picture is cropped (refer to FIG. 2b) and then enlarged, which widens the spacing between the texts and makes it easier for the detection algorithm to detect the different text areas.
  • The detection and segmentation module 103 uses a PSENet (Shape Robust Text Detection with Progressive Scale Expansion Network) text detection network to perform text detection on the seal picture, and detects the text areas in the seal picture, including curved text areas and/or rectangular text areas.
  • A rectangular text area can be input directly into the text recognition model for text recognition, whereas a curved text area must first be converted into a rectangular text area and then input into the text recognition model for processing.
  • The steps by which the detection and segmentation module 103 performs text detection on the seal picture include:
  • The dimension of the input image is [B, 3, H, W], where B represents the batch size, H represents the height of the image, and W represents the width of the image.
  • A breadth-first search (BFS) algorithm is used to search the output picture. Starting from S1, more pixels are added to expand the area according to S2, and so on until the search of Sn ends; the connected text domains are thus obtained, which give the text areas in the seal picture.
  • For a detected curved text area, $p_i$ represents the coordinates $(x_i, y_i)$ of the $i$-th point on its edge line, and the area enclosed by these $N$ points is denoted as $S$.
  • The text conversion module 104 converts the curved text area from a curve to a straight line to obtain a linear text picture in the following manner:
  • Step S41: Assuming that the curved text area $S$ is part of a circular area, obtain the center coordinates and radius of the circular area. Specifically, the curved text of a seal is mainly arranged along a circle or an ellipse, so it can be assumed that $S$ is part of a circular area. Solving for the center and radius of the circular area can then be transformed into solving the following optimization problem:
  • where $r$ is the radius of the circle; $c$ is the coordinate of the center of the circle; $p_i$ is the coordinate of the $i$-th point on the edge line; $N$ is the number of points on the edge line of the curved text area $S$; $(x_i, y_i)$ are the coordinates of a point on the edge line; and $(c_0, c_1)$ are the coordinates of the center of the circle.
  • Step S42: The curved text area is bounded by two concentric circles. The minimum radius $r$ and the maximum radius $R$ are estimated by the following formulas:
  • where $p_i$ is the coordinate of the $i$-th point on the edge line; $c$ is the coordinate of the center of the circle; $r$ is the minimum radius; and $R$ is the maximum radius.
  • To make sure the whole text is covered, $R$ can be appropriately increased and $r$ can be appropriately reduced. The specific adjustment amount is determined according to the actual situation.
  • Step S43: According to the coordinates of the center of the circle, the minimum radius, the maximum radius, and the arc corresponding to the start and end points of the arc area, map the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area to obtain the linear text picture.
  • The curved text area is unrolled into a rectangular area with height $R - r$ and length $2\pi R$.
  • The coordinates in the linear text picture are mapped to the coordinates in the seal picture by the following formula,
  • where $(c_0, c_1)$ represents the coordinates of the center of the circle; $r$ represents the minimum radius of the arc area; $R$ represents the maximum radius of the arc area; $(x, y)$ represents the coordinates in the linear text picture; and $(x', y')$ represents the coordinates in the seal picture that correspond to $(x, y)$. The formula also takes the arc corresponding to the start and end points of the arc area as a parameter.
  • The result of segmentation in the seal picture may be either a curved text area or a rectangular text area.
  • The device therefore further judges whether the text area is curved (arranged along a circle or an ellipse): if the ratio of the area enclosed by the point coordinates of the edge line of the text area to the area of the smallest circumscribed rectangle of that enclosed area is less than a preset threshold, the text area is curved.
  • Specifically, the area $A_1$ of the enclosed area and the area $A_2$ of its smallest circumscribed rectangle are calculated; if the ratio $A_1 / A_2$ is less than the preset threshold, the text area is judged to be curved.
  • FIG. 4 is a schematic diagram of the structure of an electronic device that implements the method for detecting and recognizing seal characters according to the present application.
  • The electronic device 1 may include a processor 10, a memory 11, and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a seal character detection and recognition program 12.
  • The memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (such as SD or DX memory), magnetic memory, magnetic disk, optical disc, etc.
  • In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example, a mobile hard disk of the electronic device 1.
  • In other embodiments, the memory 11 may be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • The memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • The memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the code of the seal character detection and recognition program, but also to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits in some embodiments, for example, may be composed of a single packaged integrated circuit, or may be composed of multiple integrated circuits with the same function or different functions, including one or more Combinations of central processing unit (CPU), microprocessor, digital processing chip, graphics processor, and various control chips, etc.
  • the processor 10 is the control unit of the electronic device, which uses various interfaces and lines to connect various components of the entire electronic device, and runs or executes programs or modules (such as seals) stored in the memory 11 Character detection and recognition programs, etc.), and call data stored in the memory 11 to execute various functions of the electronic device 1 and process data.
  • the bus may be a peripheral component interconnect standard (PCI) bus or an extended industry standard architecture (EISA) bus, etc.
  • PCI peripheral component interconnect standard
  • EISA extended industry standard architecture
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection and communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 4 only shows an electronic device with components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, and may include fewer or more components than shown in the figure. Components, or a combination of certain components, or different component arrangements.
  • The electronic device 1 may also include a power source (such as a battery) for supplying power to the components.
  • The power source may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device.
  • The power source may also include any of one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators.
  • The electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, and so on, which are not described further here.
  • The electronic device 1 may also include a network interface.
  • The network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • The electronic device 1 may also include a user interface.
  • The user interface may be a display and an input unit (such as a keyboard).
  • The user interface may also be a standard wired interface or a wireless interface.
  • The display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like.
  • The display may also be called a display screen or a display unit; it is used to display the information processed in the electronic device 1 and to display a visual user interface.
  • The seal character detection and recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, implements the seal character detection and recognition method, in which:
  • The text recognition model uses a SAR network for text recognition.
  • The SAR network includes a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, which includes an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder.
  • The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
  • If the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).
  • The embodiments of the present application also propose a computer-readable storage medium.
  • The computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium includes a computer program which, when executed by a processor, implements the operations of the method, in which:
  • The text recognition model uses a SAR network for text recognition.
  • The SAR network includes a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, which includes an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder.
  • The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
  • The specific implementation of the computer-readable storage medium of the present application is substantially the same as that of the above-mentioned method, apparatus, and electronic device for detecting and recognizing seal characters in a complex environment, and is not repeated here.
  • Modules described as separate components may or may not be physically separate, and components displayed as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the embodiments.
  • The functional modules in the various embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The above-mentioned integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)
  • Character Discrimination (AREA)

Abstract

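A seal text detection and recognition method, apparatus, and medium for complex environments. The method comprises: acquiring a document picture to be processed (S1); performing seal detection and localization on the document picture and extracting a seal picture (S2); performing text detection on the seal picture and segmenting out a curved text area (S3); converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture (S4); and inputting the linear text picture into a text recognition model to obtain the text information in the seal (S5). The text recognition model uses a SAR network for text recognition; the SAR network comprises a ResNet module for extracting text features and obtaining feature vectors, a framework based on an LSTM encoder-decoder that comprises an LSTM encoder and a decoder, and an attention module for applying an attention mechanism to the decoder. The seal is detected and recognized end to end, with high robustness to interference from background text.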

Description

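Seal text detection and recognition method, apparatus, and medium for complex environments
This application claims priority to Chinese patent application No. 202010573766.4, entitled "Seal text detection and recognition method, apparatus, and medium for complex environments", filed with the China Patent Office on June 22, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a seal text detection and recognition method, apparatus, electronic device, and computer-readable storage medium for complex environments.
Background
Seals carry authority and are widely used by state organs, organizations, enterprises, and public institutions in China. A document bearing a seal has legal effect, and the examination of seal impressions accounts for a large share of document examination. The inventors found that manual inspection is currently the usual way to confirm whether the seal on a document is correct, and that large-scale manual verification is labor-intensive and inefficient. Given the growing workload of examination and the variety of stamping conditions and sample quality, automated seal detection and recognition has considerable research value and economic benefit. However, existing automatic seal recognition can only handle cases without background-text interference; in real scenes, factors such as background-text interference and varying seal quality make seal recognition very difficult.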
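Summary
This application provides a seal text detection and recognition method, apparatus, electronic device, and computer-readable storage medium for complex environments, whose main purpose is to detect and recognize seal text in the presence of background-text interference.
To achieve the above purpose, a first aspect of this application provides a seal text detection and recognition method for complex environments, the method comprising:
acquiring a document picture to be processed;
performing seal detection and localization on the document picture, and extracting a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
performing text detection on the seal picture, and segmenting out the curved text area in the seal;
converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
inputting the linear text picture into a text recognition model to obtain the text information in the seal;
wherein the text recognition model uses a SAR (Show, Attend and Read) network for text recognition, and the SAR network comprises: a residual network (ResNet) module for extracting text features and obtaining feature vectors; a framework based on a long short-term memory (LSTM) encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
To achieve the above purpose, a second aspect of this application provides a seal text detection and recognition apparatus for complex environments, comprising:
a picture acquisition module, configured to acquire a document picture to be processed;
a seal extraction module, configured to perform seal detection and localization on the document picture and extract a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
a detection and segmentation module, configured to perform text detection on the seal picture and segment out the curved text area in the seal;
a text conversion module, configured to convert the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
a text recognition module, configured to input the linear text picture into a text recognition model to obtain the text information in the seal;
wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
To achieve the above purpose, a third aspect of this application provides an electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the seal text detection and recognition method for complex environments described above.
To achieve the above purpose, a fourth aspect of this application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the seal text detection and recognition method for complex environments described above.
This application detects and recognizes seals end to end through a text recognition model and is highly robust to interference from background text around the seal. It can also convert the curved text in a seal into linear text, which solves the problem that seal text is difficult to recognize.
Based on artificial intelligence and picture detection technology, this application automatically extracts the text content of seals from documents. It does not depend on the color of the seal and can handle black-and-white, red, blue, and other seals, which avoids hiring large numbers of people to compare seal content, saves labor, and improves economic efficiency.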
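Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a seal text detection and recognition method provided by an embodiment of this application;
FIG. 2a is a schematic diagram of a document picture provided by an embodiment of this application;
FIG. 2b is a schematic diagram of a seal picture provided by an embodiment of this application;
FIG. 2c is a schematic diagram of a curved text area provided by an embodiment of this application;
FIG. 2d is a schematic diagram of a linear text picture provided by an embodiment of this application;
FIG. 3 is a schematic diagram of the modules of a seal text detection and recognition apparatus provided by an embodiment of this application;
FIG. 4 is a schematic diagram of the internal structure of an electronic device that implements the seal text detection and recognition method, provided by an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described below with reference to the embodiments and the accompanying drawings.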
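Detailed Description
It should be understood that the specific embodiments described here are intended only to explain this application and not to limit it.
This application provides a seal text detection and recognition method for complex environments. FIG. 1 is a schematic flowchart of the seal text detection and recognition method provided by an embodiment of this application. The method may be performed by an apparatus, and the apparatus may be implemented in software and/or hardware.
In this embodiment, the seal text detection and recognition method for complex environments comprises:
Step S1: acquire a document picture to be processed. FIG. 2a is a schematic diagram of a document picture provided by an embodiment of this application. The document picture contains the seal to be recognized, and the color of the seal is not restricted: it may be black-and-white, red, blue, and so on.
Step S2: perform seal detection and localization on the document picture, and extract a seal picture according to the detection and localization result, where the seal picture is the smallest rectangular picture that contains the seal. FIG. 2b is a schematic diagram of a seal picture provided by an embodiment of this application. Taking a circular seal as an example, the seal picture is the smallest rectangle that contains the outer circle of the seal. For an input document, it is first detected whether a seal is present; if so, the seal is located and the seal picture is extracted according to its position.
Step S3: perform text detection on the seal picture, and segment out the curved text area in the seal. A curved text area is one in which the overall region containing the text to be recognized is curved; it may be elliptical or circular. FIG. 2c is a schematic diagram of a curved text area provided by an embodiment of this application, obtained by text detection and segmentation of the seal picture shown in FIG. 2b. As shown in FIG. 2c, the curved text area contains only the curved text "XX市XXXXXX会".
Step S4: convert the curved text area from a curved shape into a straight-line shape to obtain a linear text picture. FIG. 2d is a schematic diagram of a linear text picture provided by an embodiment of this application, obtained by converting the curved text area shown in FIG. 2c, that is, by converting the arc-shaped area into a rectangular area.
Step S5: input the linear text picture into a text recognition model to obtain the text information in the seal; for example, the seal information obtained is "XX市XXXXXX会".
The text recognition model uses a SAR network for text recognition. The SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder. The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.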
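Steps S1 to S5 can be read as a simple chain of stages. The sketch below is only an illustration of that control flow: the `SealPipeline` class, its four callables, and the nested-list `Image` type are hypothetical names introduced here, not part of the application, and each stage is left as a pluggable function rather than a real detector or recognizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Image = List[List[int]]          # hypothetical stand-in for a pixel grid
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


@dataclass
class SealPipeline:
    """Chains the stages of steps S2 to S5; every stage is supplied by the caller."""
    detect_seal: Callable[[Image], Optional[Box]]  # S2: locate the seal, or None
    detect_text: Callable[[Image], List[Image]]    # S3: text areas in the seal picture
    straighten: Callable[[Image], Image]           # S4: curved area -> linear picture
    recognize: Callable[[Image], str]              # S5: linear picture -> text

    def run(self, document: Image) -> List[str]:
        box = self.detect_seal(document)
        if box is None:  # no seal in the document, nothing to recognize
            return []
        left, top, right, bottom = box
        # Crop the smallest rectangle that contains the seal.
        seal = [row[left:right] for row in document[top:bottom]]
        return [self.recognize(self.straighten(area))
                for area in self.detect_text(seal)]
```

Keeping the stages as injected callables means a detector, a segmentation network, and a recognizer can be swapped in without changing the control flow.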
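The seal text detection and recognition of this application can recognize seal text against complex backgrounds as well as deformed seal text, which improves the accuracy of seal text recognition and gives good robustness to interference from background text around the seal.
The text recognition model of this application is trained on realistic complex-background pictures. These pictures contain company names and seal-type strings as the seal text content, together with a complex background. A complex background means the background color or other interfering text in the background of the seal picture; for example, "2019年4月2日" (a date) in FIG. 2b is background text of the seal picture: it is not part of the seal itself but part of the document. When training the text recognition model, the seal-type strings are placed on complex backgrounds, and the seal text is deformed and repositioned (rotated, shifted up, down, left, or right, and so on), so that the training samples are diverse.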
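In one embodiment, the step of performing seal detection and localization on the document picture comprises: training a YOLOv3 detection model; and using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
The YOLOv3 detection model is an object detection algorithm improved from the YOLOv2 model. It comprises multiple convolutional layers, through which feature maps at different scales are obtained. Each feature map contains the center coordinates of the predicted target area (the seal picture), the size of the target area, and its class; the position coordinates of the seal in the document picture are obtained from the feature maps at the different scales. This multi-scale feature prediction makes the prediction more accurate.
The training samples for the YOLOv3 detection model are text pictures bearing seals, annotated with the position coordinates of the seal in each picture. Training the model on this data enables precise localization of the seal.
For a seal that has been detected, existing text recognition can only handle horizontal text, whereas a seal may contain both curved and rectangular text. All text areas in the seal picture therefore need to be detected and extracted separately as required. Because the spacing between the characters in a seal is small, after the seal picture has been obtained by seal localization, it is cropped out (see FIG. 2b) and then enlarged. This widens the gaps between characters and makes it easier for the detection algorithm to find the different text areas.
In one embodiment, the PSENet (Progressive Scale Expansion Network) text detection network is used to perform text detection on the seal picture and detect each text area in it, including curved text areas and/or rectangular text areas. A rectangular text area can be input directly into the text recognition model; a curved text area must first be converted into a rectangular text area and then input into the text recognition model.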
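Further, the step of performing text detection on the seal picture comprises:
inputting the seal picture into the PSENet text detection network and obtaining the low-dimensional feature maps corresponding to the input seal picture, where the input picture has dimensions [B, 3, H, W], B being the batch size, H the picture height, and W the picture width;
downsampling the input seal picture to obtain high-dimensional feature maps;
upsampling the high-dimensional feature maps and fusing them with the low-dimensional feature maps to obtain an output picture of the same size as the input seal picture, with dimensions [B, C, H, W], where C is the configured number of kernels; ordered from smallest to largest the kernels are denoted S1 ... Sn, that is, the segmentation regions;
searching the output picture with breadth-first search (BFS): starting from S1, the regions are expanded by adding more pixels according to S2, and so on until the search of Sn is finished; this yields the connected text components and hence the text areas in the seal picture.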
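The expansion from S1 to Sn can be sketched in a few lines. This is a minimal pure-Python illustration, assuming each kernel is already available as a binary mask of equal size (nested lists of 0/1) ordered from smallest to largest; the function names are hypothetical, and conflicts between neighbouring regions are resolved first come, first served.

```python
from collections import deque
from typing import Iterator, List, Tuple

Mask = List[List[int]]


def _neighbors(y: int, x: int, h: int, w: int) -> Iterator[Tuple[int, int]]:
    """Yield the 4-connected neighbours of (y, x) that lie inside the grid."""
    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
        if 0 <= ny < h and 0 <= nx < w:
            yield ny, nx


def label_components(mask: Mask) -> Mask:
    """Label the connected components of the smallest kernel (S1) by BFS."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in _neighbors(cy, cx, h, w):
                        if mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def progressive_expand(kernels: List[Mask]) -> Mask:
    """Grow the S1 labels into S2, ..., Sn; each pixel keeps the first label to reach it."""
    labels = label_components(kernels[0])
    h, w = len(labels), len(labels[0])
    for kernel in kernels[1:]:
        queue = deque((y, x) for y in range(h) for x in range(w) if labels[y][x])
        while queue:
            cy, cx = queue.popleft()
            for ny, nx in _neighbors(cy, cx, h, w):
                if kernel[ny][nx] and labels[ny][nx] == 0:
                    labels[ny][nx] = labels[cy][cx]
                    queue.append((ny, nx))
    return labels
```

Starting from the smallest kernels keeps closely spaced text instances apart, because each instance is seeded separately before the regions are allowed to grow towards each other.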
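A seal may contain both curved and rectangular text areas. For curved text areas, the key to seal text recognition is converting the curved text into straight horizontal text. Step S3 yields a curved text area by detection and segmentation; the point coordinates of its edge line are denoted
Figure PCTCN2020136402-appb-000001
where p_i denotes the coordinates (x_i, y_i) of the i-th point, and the region enclosed by these N points is denoted S.
In one embodiment, the step of converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture comprises:
Step S41: assume that the curved text area S is part of a circular region, and obtain the center coordinates and radius of that circular region. Specifically, given the characteristics of seals, curved seal text is mainly circular or elliptical, so the curved text area S can be assumed to be part of a circular region. Solving for the center and radius of the circular region can be turned into solving the following optimization problem:
Figure PCTCN2020136402-appb-000002
where r is the radius of the circle, c is the center coordinates, p_i is the coordinates of the i-th point on the edge line, and N is the number of points on the edge line of the curved text area S;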
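The optimization problem itself is given only as a figure, so its exact form is not reproduced here. As a hedged illustration of step S41, the sketch below swaps in a standard algebraic least-squares circle fit (the Kåsa method), which fits x² + y² = A·x + B·y + C to the edge points and reads off the center and radius. The function name `fit_circle` is hypothetical, and this is not necessarily the objective used in the application.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_circle(points: Sequence[Point]) -> Tuple[Point, float]:
    """Algebraic least-squares circle fit; returns ((c0, c1), radius)."""
    n = float(len(points))
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    syy = sum(y * y for _, y in points)
    sxy = sum(x * y for x, y in points)
    # z = x^2 + y^2 is the quantity modelled as A*x + B*y + C.
    sz = sum(x * x + y * y for x, y in points)
    sxz = sum(x * (x * x + y * y) for x, y in points)
    syz = sum(y * (x * x + y * y) for x, y in points)
    matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    det = _det3(matrix)
    if abs(det) < 1e-12:
        raise ValueError("points are collinear or too few to define a circle")
    solution = []
    for col in range(3):  # Cramer's rule, one unknown per column
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(_det3(replaced) / det)
    a, b, c = solution
    c0, c1 = a / 2.0, b / 2.0
    return (c0, c1), math.sqrt(c + c0 * c0 + c1 * c1)
```

Because the edge line runs along both the inner and the outer boundary of the text band, a fit of this kind returns a circle lying between the two; the minimum and maximum radii are estimated separately in step S42.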
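Step S42: estimate the arc area corresponding to the curved text area from the center coordinates and radius, obtain the angle (in radians) corresponding to the start and end points of the arc area, and obtain the minimum and maximum radii of the arc area. Specifically, a heuristic approach is used: for each point on the edge line
Figure PCTCN2020136402-appb-000003
the line connecting it to the circle center c = (c_0, c_1) gives the corresponding angle
Figure PCTCN2020136402-appb-000004
namely:
Figure PCTCN2020136402-appb-000005
where (x_i, y_i) are the coordinates of a point on the edge line and (c_0, c_1) are the center coordinates.
Next, the angle α corresponding to the start and end points of the text area must be found. This amounts to finding a line through the circle center whose angle is not in the set of angles of the lines connecting the points
Figure PCTCN2020136402-appb-000006
to the circle center; that is, an angle α must be found such that
Figure PCTCN2020136402-appb-000007
Specifically, by sampling: α is sampled from the interval [0, 2π] in steps of 0.01, and each sample is checked against the formula
Figure PCTCN2020136402-appb-000008
If the formula is satisfied, the value α is taken as the angle corresponding to the start and end points of the curved text area.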
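The sampling search for α can be sketched as follows. The acceptance formula appears only as a figure, so the test used here is an assumption: α is accepted when its circular distance to every edge-point angle is at least `min_gap`, a hypothetical tolerance that must be larger than the angular spacing of the edge points. The function names are likewise illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
TWO_PI = 2.0 * math.pi


def point_angles(points: Sequence[Point], center: Point) -> List[float]:
    """Angle in [0, 2*pi) of the line from the center to each edge point."""
    c0, c1 = center
    return [math.atan2(y - c1, x - c0) % TWO_PI for x, y in points]


def find_cut_angle(angles: Sequence[float], step: float = 0.01,
                   min_gap: float = 0.05) -> Optional[float]:
    """Return the first sampled angle that stays min_gap away from every edge angle."""
    k = 0
    while k * step < TWO_PI:
        alpha = k * step
        # Circular distance, so that angles near 0 and near 2*pi count as close.
        if all(min(abs(alpha - t), TWO_PI - abs(alpha - t)) >= min_gap
               for t in angles):
            return alpha
        k += 1
    return None  # the text covers the whole circle; no free direction was found
```

A `None` result corresponds to text that wraps all the way around the seal, in which case any sampled angle that falls between two characters would have to be chosen by some other criterion.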
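As shown in FIG. 2c, the curved text area is bounded by two concentric circles, whose minimum radius r and maximum radius R are estimated by the following formula:
Figure PCTCN2020136402-appb-000009
where p_i is the coordinates of the i-th point on the edge line, c is the center coordinates, r is the minimum radius, and R is the maximum radius.
To ensure that this pair of concentric circles contains all of the text, R can be increased slightly and r decreased slightly; the exact adjustment is determined by the actual situation.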
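A natural reading of the radius estimate, offered here as an assumption because the formula itself is a figure, is that r and R are the smallest and largest distances from the center to the edge points, optionally padded by a margin. The `margin` parameter and the function name are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def radial_bounds(points: Sequence[Point], center: Point,
                  margin: float = 0.0) -> Tuple[float, float]:
    """Return (r, R): the min and max distance from the center to the edge points.

    margin shrinks r and enlarges R so that the ring safely contains the text.
    """
    c0, c1 = center
    distances = [math.hypot(x - c0, y - c1) for x, y in points]
    return max(min(distances) - margin, 0.0), max(distances) + margin
```

For example, `radial_bounds(edge_points, center, margin=2.0)` would widen the ring by two pixels on each side, where `edge_points` and `center` come from the earlier steps.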
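Step S43: according to the center coordinates, the minimum radius, the maximum radius, and the angle corresponding to the start and end points of the arc area, map the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area and obtaining the linear text picture. Specifically, the curved text area is assumed to unroll into a rectangular area of height R - r and length 2πR. The coordinates in the linear text picture are mapped to the coordinates in the seal picture by the following formula:
Figure PCTCN2020136402-appb-000010
where (c_0, c_1) is the center coordinates, r is the minimum radius of the arc area, R is the maximum radius of the arc area, α is the angle corresponding to the start and end points of the arc area, (x, y) is a coordinate in the linear text picture, and (x', y') is the coordinate in the seal picture to which (x, y) corresponds.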
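The mapping formula is available only as a figure, so the sketch below shows one plausible polar unrolling that is consistent with the stated rectangle size (height R - r, length 2πR); it is an assumption, not the application's formula. Column x is taken as arc length along the outer circle starting at angle α, and row y as depth inward from the outer circle. The function names and the nearest-neighbour sampling are illustrative choices.

```python
import math
from typing import List, Tuple

Image = List[List[int]]


def linear_to_seal(x: float, y: float, center: Tuple[float, float],
                   R: float, alpha: float) -> Tuple[float, float]:
    """Map (x, y) in the unrolled rectangle to (x', y') in the seal picture."""
    c0, c1 = center
    theta = alpha + x / R  # arc length x on the outer circle -> angle
    rho = R - y            # row 0 lies on the outer circle, rows go inward
    return c0 + rho * math.cos(theta), c1 + rho * math.sin(theta)


def unwrap(seal: Image, center: Tuple[float, float], r: float, R: float,
           alpha: float, fill: int = 0) -> Image:
    """Build the linear text picture by sampling the seal picture pixel by pixel."""
    height = max(int(round(R - r)), 1)
    width = max(int(round(2.0 * math.pi * R)), 1)
    src_h, src_w = len(seal), len(seal[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            sx, sy = linear_to_seal(x, y, center, R, alpha)
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= iy < src_h and 0 <= ix < src_w:  # nearest-neighbour lookup
                out[y][x] = seal[iy][ix]
    return out
```

Iterating over the output rectangle and looking up the source pixel, rather than pushing source pixels forward, guarantees that every pixel of the linear text picture receives a value.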
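The segmentation of the seal picture may yield either curved text areas or rectangular text areas. Preferably, before the step of converting the curved text area from a curved shape into a straight-line shape, the method further comprises: judging whether the text area is curved (including circular or elliptical); if the ratio of the area of the region enclosed by the point coordinates of the edge line of the text area to the area of the minimum circumscribed rectangle of that region is less than a preset threshold, the text area is curved. Specifically, using OpenCV, one can compute, for the points
Figure PCTCN2020136402-appb-000011
the area A_1 of the region they enclose, and at the same time, for
Figure PCTCN2020136402-appb-000012
the area A_2 of the minimum circumscribed rectangle of the enclosed region; if
Figure PCTCN2020136402-appb-000013
then the region enclosed by
Figure PCTCN2020136402-appb-000014
is judged to be a curved region, where σ is the set threshold, with a value in the interval [0, 1], preferably σ = 0.7. The closer the enclosed area A_1 is to the minimum circumscribed rectangle area A_2, the closer the ratio is to 1, in which case the text area is rectangular.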
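The area-ratio test can be illustrated without any image library. The text says only that OpenCV is used and does not name specific functions, so the sketch below is a pure-Python stand-in: the shoelace formula gives A_1, and the minimum circumscribed rectangle for A_2 is found by trying the direction of every convex-hull edge. All function names are hypothetical; σ = 0.7 follows the preferred value above.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_area(points: Sequence[Point]) -> float:
    """Shoelace formula: area enclosed by the edge points taken in order (A_1)."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns the hull in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq: List[Point]) -> List[Point]:
        chain: List[Point] = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(pts[::-1])


def min_rect_area(points: Sequence[Point]) -> float:
    """Area of the minimum circumscribed (possibly rotated) rectangle (A_2)."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    best = float("inf")
    for i in range(len(hull)):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % len(hull)]
        length = math.hypot(x2 - x1, y2 - y1)
        ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit vector along the edge
        along = [x * ux + y * uy for x, y in hull]
        across = [y * ux - x * uy for x, y in hull]
        best = min(best, (max(along) - min(along)) * (max(across) - min(across)))
    return best


def is_curved(points: Sequence[Point], sigma: float = 0.7) -> bool:
    """Curved when A_1 / A_2 falls below the threshold sigma."""
    a2 = min_rect_area(points)
    return a2 > 0 and polygon_area(points) / a2 < sigma
```

A thin arc-shaped band fills only a small part of its bounding rectangle, so its ratio is low, whereas a straight text box fills its rectangle almost completely.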
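This application detects and recognizes seals end to end through a text recognition model and is highly robust to interference from background text around the seal. It can convert the curved text in a seal into linear text, so it can recognize both the curved text and the rectangular (linear) text in a seal, which solves the problem that seal text is difficult to recognize.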
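FIG. 3 is a functional module diagram of the seal text detection and recognition apparatus of this application.
The seal text detection and recognition apparatus 100 of this application may be installed in an electronic device. According to the functions implemented, the apparatus may comprise a picture acquisition module 101, a seal extraction module 102, a detection and segmentation module 103, a text conversion module 104, and a text recognition module 105. A module in this application, which may also be called a unit, is a series of computer program segments that can be executed by the processor of the electronic device and perform a fixed function, and that are stored in the memory of the electronic device.
In this embodiment, the functions of the modules/units are as follows:
The picture acquisition module 101 is configured to acquire a document picture to be processed, where the document picture contains the seal to be recognized and the color of the seal is not restricted: it may be black-and-white, red, blue, and so on.
The seal extraction module 102 is configured to perform seal detection and localization on the document picture and extract a seal picture according to the detection and localization result, where the seal picture is the smallest rectangular picture that contains the seal. As shown in FIG. 2b, taking a circular seal as an example, the seal picture is the smallest rectangle that contains the outer circle of the seal. For an input document, it is first detected whether a seal is present; if so, the seal is located and the seal picture is extracted according to its position.
The detection and segmentation module 103 is configured to perform text detection on the seal picture and segment out the curved text area in the seal. As shown in FIG. 2c, the curved text area contains only the curved text "XX市XXXXXX会".
The text conversion module 104 is configured to convert the curved text area from a curved shape into a straight-line shape to obtain a linear text picture. As shown in FIG. 2d, this is obtained by converting the curved text area shown in FIG. 2c, that is, by converting the arc-shaped area into a rectangular area.
The text recognition module 105 is configured to input the linear text picture into a text recognition model to obtain the text information in the seal.
The text recognition model uses a SAR network for text recognition. The SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder.
The feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
The seal text detection and recognition of this application can recognize seal text against complex backgrounds as well as deformed seal text, which improves the accuracy of seal text recognition and gives good robustness to interference from background text around the seal.
The text recognition model of this application is trained on realistic complex-background pictures. These pictures contain company names and seal-type strings as the seal text content, together with a complex background. A complex background means the background color or other interfering text in the background of the seal picture; for example, "20XX年XX月XX日" (a date) in FIG. 2b is background text of the seal picture: it is not part of the seal itself but part of the document. When training the text recognition model, the seal-type strings are placed on complex backgrounds, and the seal text is deformed and repositioned (rotated, shifted up, down, left, or right, and so on), so that the training samples are diverse.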
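In one embodiment, the seal extraction module 102 uses a YOLOv3 detection model to perform seal detection and localization on the document picture. Specifically, this is done as follows: training a YOLOv3 detection model; and using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
The YOLOv3 detection model is an object detection model improved from the YOLOv2 model. It comprises multiple convolutional layers, through which feature maps at different scales are obtained. Each feature map contains the center coordinates of the predicted target area (the seal picture), the size of the target area, and its class; the position coordinates of the seal in the document picture are obtained from the feature maps at the different scales. This multi-scale feature prediction makes the prediction more accurate.
The training samples for the YOLOv3 detection model are text pictures bearing seals, annotated with the position coordinates of the seal in each picture. Training the model on this data enables precise localization of the seal.
For a seal that has been detected, existing text recognition can only handle horizontal text, whereas a seal may contain both curved and rectangular text. All text areas in the seal picture therefore need to be detected and extracted separately as required. Because the spacing between the characters in a seal is small, after the seal picture has been obtained by seal localization, it is cropped out (see FIG. 2b) and then enlarged. This widens the gaps between characters and makes it easier for the detection algorithm to find the different text areas.
In one embodiment, the detection and segmentation module 103 uses the PSENet (Shape Robust Text Detection with Progressive Scale Expansion Network) text detection network to perform text detection on the seal picture and detect each text area in it, including curved text areas and/or rectangular text areas. A rectangular text area can be input directly into the text recognition model; a curved text area must first be converted into a rectangular text area and then input into the text recognition model.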
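Further, the steps by which the detection and segmentation module 103 performs text detection on the seal picture comprise:
inputting the seal picture into the PSENet text detection network and obtaining the low-dimensional feature maps corresponding to the input seal picture, where the input picture has dimensions [B, 3, H, W], H being the picture height and W the picture width;
downsampling the input seal picture to obtain high-dimensional feature maps;
upsampling the high-dimensional feature maps and fusing them with the low-dimensional feature maps to obtain an output picture of the same size as the input seal picture, with dimensions [B, C, H, W], where C is the configured number of kernels; ordered from smallest to largest the kernels are denoted S1 ... Sn;
searching the output picture with breadth-first search (BFS): starting from S1, the regions are expanded by adding more pixels according to S2, and so on until the search of Sn is finished; this yields the connected text components and hence the text areas in the seal picture.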
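A seal may contain both curved and rectangular text areas. For curved text areas, the key to seal text recognition is converting the curved text into straight horizontal text. Detection and segmentation yield a curved text area; the point coordinates of its edge line are denoted
Figure PCTCN2020136402-appb-000015
where p_i denotes the coordinates (x_i, y_i) of the i-th point, and the region enclosed by these N points is denoted S.
In one embodiment, the text conversion module 104 converts the curved text area from a curved shape into a straight-line shape to obtain a linear text picture as follows:
Step S41: assume that the curved text area S is part of a circular region, and obtain the center coordinates and radius of that circular region. Specifically, given the characteristics of seals, curved seal text is mainly circular or elliptical, so the curved text area S can be assumed to be part of a circular region. Solving for the center and radius of the circular region can be turned into solving the following optimization problem:
Figure PCTCN2020136402-appb-000016
where r is the radius of the circle, c is the center coordinates, p_i is the coordinates of the i-th point on the edge line, and N is the number of points on the edge line of the curved text area S;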
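Step S42: estimate the arc area corresponding to the curved text area from the center coordinates and radius, obtain the angle (in radians) corresponding to the start and end points of the arc area, and obtain the minimum and maximum radii of the arc area. Specifically, a heuristic approach is used: for each point on the edge line
Figure PCTCN2020136402-appb-000017
the line connecting it to the circle center c = (c_0, c_1) gives the corresponding angle
Figure PCTCN2020136402-appb-000018
Figure PCTCN2020136402-appb-000019
where (x_i, y_i) are the coordinates of a point on the edge line and (c_0, c_1) are the center coordinates.
Next, the angle α corresponding to the start and end points of the text area must be found. This amounts to finding a line through the circle center whose angle is not in the set of angles of the lines connecting the points
Figure PCTCN2020136402-appb-000020
to the circle center; that is, an angle α must be found such that
Figure PCTCN2020136402-appb-000021
Specifically, by sampling: α is sampled from the interval [0, 2π] in steps of 0.01, and each sample is checked against the formula
Figure PCTCN2020136402-appb-000022
If the formula is satisfied, the value α is taken as the angle corresponding to the start and end points of the curved text area.
As shown in FIG. 2c, the curved text area is bounded by two concentric circles, whose minimum radius r and maximum radius R are estimated by the following formula:
Figure PCTCN2020136402-appb-000023
where p_i is the coordinates of the i-th point on the edge line, c is the center coordinates, r is the minimum radius, and R is the maximum radius.
To ensure that this pair of concentric circles contains all of the text, R can be increased slightly and r decreased slightly; the exact adjustment is determined by the actual situation.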
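Step S43: according to the center coordinates, the minimum radius, the maximum radius, and the angle corresponding to the start and end points of the arc area, map the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area and obtaining the linear text picture. Specifically, the curved text area is assumed to unroll into a rectangular area of height R - r and length 2πR. The coordinates in the linear text picture are mapped to the coordinates in the seal picture by the following formula:
Figure PCTCN2020136402-appb-000024
where (c_0, c_1) is the center coordinates, r is the minimum radius of the arc area, R is the maximum radius of the arc area, α is the angle corresponding to the start and end points of the arc area, (x, y) is a coordinate in the linear text picture, and (x', y') is the coordinate in the seal picture to which (x, y) corresponds.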
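The segmentation of the seal picture may yield either curved text areas or rectangular text areas. Preferably, before the curved text area is converted from a curved shape into a straight-line shape, the following is also performed: judging whether the text area is curved (including circular or elliptical); if the ratio of the area of the region enclosed by the point coordinates of the edge line of the text area to the area of the minimum circumscribed rectangle of that region is less than a preset threshold, the text area is curved. Specifically, using OpenCV, one can compute, for the points
Figure PCTCN2020136402-appb-000025
the area A_1 of the region they enclose, and at the same time, for
Figure PCTCN2020136402-appb-000026
the area A_2 of the minimum circumscribed rectangle of the enclosed region; if
Figure PCTCN2020136402-appb-000027
then the region enclosed by
Figure PCTCN2020136402-appb-000028
is judged to be a curved region, where σ is the set threshold, with a value in the interval [0, 1], preferably σ = 0.7. The closer the enclosed area A_1 is to the minimum circumscribed rectangle area A_2, the closer the ratio is to 1, in which case the text area is rectangular.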
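FIG. 4 is a schematic structural diagram of an electronic device that implements the seal text detection and recognition method of this application.
The electronic device 1 may comprise a processor 10, a memory 11, and a bus, and may further comprise a computer program stored in the memory 11 and runnable on the processor 10, such as a seal text detection and recognition program 12.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, card-type memory (for example SD or DX memory), magnetic memory, a magnetic disk, an optical disc, and so on. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, for example a mobile hard disk of the electronic device 1. In other embodiments, the memory 11 may be an external storage device of the electronic device 1, for example a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit of the electronic device 1 and an external storage device. The memory 11 can be used not only to store application software installed in the electronic device 1 and various data, such as the code of the seal text detection and recognition program, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 10 may be composed of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including combinations of one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and various control chips. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, and executes the functions of the electronic device 1 and processes data by running or executing the programs or modules stored in the memory 11 (for example the seal text detection and recognition program) and calling the data stored in the memory 11.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus, and so on. The bus is configured to implement connection and communication between the memory 11, the at least one processor 10, and other components.
FIG. 4 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 4 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
For example, although not shown, the electronic device 1 may also include a power source (such as a battery) that supplies power to the components. Preferably, the power source is logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power source may also include any of one or more DC or AC power supplies, recharging devices, power failure detection circuits, power converters or inverters, and power status indicators. The electronic device 1 may also include various sensors, Bluetooth modules, Wi-Fi modules, and so on, which are not described further here.
Further, the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may also include a user interface. The user interface may be a display and an input unit (such as a keyboard); optionally, it may also be a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display may also be called a display screen or a display unit; it is used to display the information processed in the electronic device 1 and to display a visual user interface.
It should be understood that the embodiments are for illustration only and that the scope of the patent application is not limited by this structure.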
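The seal text detection and recognition program 12 stored in the memory 11 of the electronic device 1 is a combination of multiple instructions which, when run on the processor 10, can implement:
acquiring a document picture to be processed;
performing seal detection and localization on the document picture, and extracting a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
performing text detection on the seal picture, and segmenting out the curved text area in the seal;
converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
inputting the linear text picture into a text recognition model to obtain the text information in the seal;
wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
Specifically, for how the processor 10 implements the above instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to FIG. 1, which is not repeated here.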
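Further, if the integrated module/unit of the electronic device 1 is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a mobile hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).
In addition, an embodiment of this application also provides a computer-readable storage medium, which may be non-volatile or volatile. The computer-readable storage medium includes a computer program which, when executed by a processor, implements the following operations:
acquiring a document picture to be processed;
performing seal detection and localization on the document picture, and extracting a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
performing text detection on the seal picture, and segmenting out the curved text area in the seal;
converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
inputting the linear text picture into a text recognition model to obtain the text information in the seal;
wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.
The specific implementation of the computer-readable storage medium of this application is substantially the same as that of the above seal text detection and recognition method, apparatus, and electronic device for complex environments, and is not repeated here.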
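In the several embodiments provided in this application, it should be understood that the disclosed devices, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that this application can be implemented in other specific forms without departing from its spirit or essential characteristics.
The embodiments should therefore be regarded in every respect as exemplary and non-restrictive. The scope of this application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalents of the claims are intended to be embraced by this application. No associated reference sign in a claim should be regarded as limiting the claim concerned.
Furthermore, the word "comprising" clearly does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or apparatuses recited in the *** claims may also be implemented by a single unit or apparatus through software or hardware. Words such as "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of this application and not to limit them. Although this application has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of this application may be modified or equivalently replaced without departing from their spirit and scope.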

Claims (20)

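  1. A seal text detection and recognition method for complex environments, wherein the method comprises:
    acquiring a document picture to be processed;
    performing seal detection and localization on the document picture, and extracting a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
    performing text detection on the seal picture, and segmenting out the curved text area in the seal;
    converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
    inputting the linear text picture into a text recognition model to obtain the text information in the seal;
    wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
    the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.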
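  2. The seal text detection and recognition method for complex environments according to claim 1, wherein the step of performing seal detection and localization on the document picture comprises:
    training a YOLOv3 detection model;
    using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
  3. The seal text detection and recognition method for complex environments according to claim 1, wherein a PSENet text detection network is used to perform text detection on the seal picture and detect each text area in the seal picture.
  4. The seal text detection and recognition method for complex environments according to claim 3, wherein the step of performing text detection on the seal picture comprises:
    inputting the seal picture into the PSENet text detection network and obtaining the low-dimensional feature maps corresponding to the input seal picture;
    downsampling the input seal picture to obtain high-dimensional feature maps;
    upsampling the high-dimensional feature maps and fusing them with the low-dimensional feature maps to obtain an output picture of the same size as the input seal picture;
    searching the output picture with a breadth-first search algorithm to obtain the connected text components and hence the text areas in the seal picture.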
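  5. The seal text detection and recognition method for complex environments according to claim 1, wherein the step of converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture comprises:
    assuming that the curved text area is part of a circular region, and obtaining the center coordinates and radius of the circular region;
    estimating the arc area corresponding to the curved text area from the center coordinates and radius, obtaining the angle corresponding to the start and end points of the arc area, and obtaining the minimum and maximum radii of the arc area;
    according to the center coordinates, the minimum radius, the maximum radius, and the angle corresponding to the start and end points of the arc area, mapping the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area and obtaining the linear text picture.
  6. The seal text detection and recognition method for complex environments according to claim 5, wherein the coordinates in the linear text picture are mapped to the coordinates in the seal picture by the following formula:
    Figure PCTCN2020136402-appb-100001
    where (c_0, c_1) is the center coordinates, r is the minimum radius of the arc area, R is the maximum radius of the arc area, α is the angle corresponding to the start and end points of the arc area, (x, y) is a coordinate in the linear text picture, and (x', y') is the coordinate in the seal picture to which (x, y) corresponds.
  7. The seal text detection and recognition method for complex environments according to claim 5, wherein, before the step of converting the curved text area from a curved shape into a straight-line shape, the method further comprises: judging whether the text area is curved; if the ratio of the area of the region enclosed by the point coordinates of the edge line of the text area to the area of the minimum circumscribed rectangle of that region is less than a preset threshold, the text area is curved.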
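  8. A seal text detection and recognition apparatus for complex environments, comprising:
    a picture acquisition module, configured to acquire a document picture to be processed;
    a seal extraction module, configured to perform seal detection and localization on the document picture and extract a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
    a detection and segmentation module, configured to perform text detection on the seal picture and segment out the curved text area in the seal;
    a text conversion module, configured to convert the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
    a text recognition module, configured to input the linear text picture into a text recognition model to obtain the text information in the seal;
    wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
    the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.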
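  9. An electronic device, wherein the electronic device comprises:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the following steps:
    acquiring a document picture to be processed;
    performing seal detection and localization on the document picture, and extracting a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
    performing text detection on the seal picture, and segmenting out the curved text area in the seal;
    converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
    inputting the linear text picture into a text recognition model to obtain the text information in the seal;
    wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
    the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.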
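  10. The electronic device according to claim 9, wherein the step of performing seal detection and localization on the document picture comprises:
    training a YOLOv3 detection model;
    using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
  11. The electronic device according to claim 9, wherein a PSENet text detection network is used to perform text detection on the seal picture and detect each text area in the seal picture.
  12. The electronic device according to claim 11, wherein the step of performing text detection on the seal picture comprises:
    inputting the seal picture into the PSENet text detection network and obtaining the low-dimensional feature maps corresponding to the input seal picture;
    downsampling the input seal picture to obtain high-dimensional feature maps;
    upsampling the high-dimensional feature maps and fusing them with the low-dimensional feature maps to obtain an output picture of the same size as the input seal picture;
    searching the output picture with a breadth-first search algorithm to obtain the connected text components and hence the text areas in the seal picture.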
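  13. The electronic device according to claim 9, wherein the step of converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture comprises:
    assuming that the curved text area is part of a circular region, and obtaining the center coordinates and radius of the circular region;
    estimating the arc area corresponding to the curved text area from the center coordinates and radius, obtaining the angle corresponding to the start and end points of the arc area, and obtaining the minimum and maximum radii of the arc area;
    according to the center coordinates, the minimum radius, the maximum radius, and the angle corresponding to the start and end points of the arc area, mapping the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area and obtaining the linear text picture.
  14. The electronic device according to claim 13, wherein the coordinates in the linear text picture are mapped to the coordinates in the seal picture by the following formula:
    Figure PCTCN2020136402-appb-100002
    where (c_0, c_1) is the center coordinates, r is the minimum radius of the arc area, R is the maximum radius of the arc area, α is the angle corresponding to the start and end points of the arc area, (x, y) is a coordinate in the linear text picture, and (x', y') is the coordinate in the seal picture to which (x, y) corresponds.
  15. The electronic device according to claim 13, wherein, before the step of converting the curved text area from a curved shape into a straight-line shape, the steps further comprise: judging whether the text area is curved; if the ratio of the area of the region enclosed by the point coordinates of the edge line of the text area to the area of the minimum circumscribed rectangle of that region is less than a preset threshold, the text area is curved.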
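  16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:
    acquiring a document picture to be processed;
    performing seal detection and localization on the document picture, and extracting a seal picture according to the detection and localization result, wherein the seal picture is the smallest rectangular picture that contains the seal;
    performing text detection on the seal picture, and segmenting out the curved text area in the seal;
    converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture;
    inputting the linear text picture into a text recognition model to obtain the text information in the seal;
    wherein the text recognition model uses a SAR network for text recognition, and the SAR network comprises: a ResNet module for extracting text features and obtaining feature vectors; a framework based on an LSTM encoder-decoder, the framework comprising an LSTM encoder and a decoder; and an attention module for applying an attention mechanism to the decoder;
    the feature vector is obtained through the ResNet module and input into the LSTM encoder to obtain a hidden state vector; the hidden state vector is input into the decoder to which the attention mechanism is applied, to obtain the text information in the seal.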
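  17. The computer-readable storage medium according to claim 16, wherein the step of performing seal detection and localization on the document picture comprises:
    training a YOLOv3 detection model;
    using the trained YOLOv3 detection model to obtain the position coordinates of the seal in the document picture.
  18. The computer-readable storage medium according to claim 17, wherein a PSENet text detection network is used to perform text detection on the seal picture and detect each text area in the seal picture.
  19. The computer-readable storage medium according to claim 16, wherein the step of performing text detection on the seal picture comprises:
    inputting the seal picture into the PSENet text detection network and obtaining the low-dimensional feature maps corresponding to the input seal picture;
    downsampling the input seal picture to obtain high-dimensional feature maps;
    upsampling the high-dimensional feature maps and fusing them with the low-dimensional feature maps to obtain an output picture of the same size as the input seal picture;
    searching the output picture with a breadth-first search algorithm to obtain the connected text components and hence the text areas in the seal picture.
  20. The computer-readable storage medium according to claim 16, wherein the step of converting the curved text area from a curved shape into a straight-line shape to obtain a linear text picture comprises:
    assuming that the curved text area is part of a circular region, and obtaining the center coordinates and radius of the circular region;
    estimating the arc area corresponding to the curved text area from the center coordinates and radius, obtaining the angle corresponding to the start and end points of the arc area, and obtaining the minimum and maximum radii of the arc area;
    according to the center coordinates, the minimum radius, the maximum radius, and the angle corresponding to the start and end points of the arc area, mapping the coordinates in the linear text picture to the coordinates in the seal picture, thereby mapping the curved text area to a rectangular area and obtaining the linear text picture.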
PCT/CN2020/136402 2020-06-22 2020-12-15 面向复杂环境的***文字检测识别方法、装置及介质 WO2021115490A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010573766.4A CN111767911B (zh) 2020-06-22 2020-06-22 面向复杂环境的***文字检测识别方法、装置及介质
CN202010573766.4 2020-06-22

Publications (1)

Publication Number Publication Date
WO2021115490A1 true WO2021115490A1 (zh) 2021-06-17

Family

ID=72721850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136402 WO2021115490A1 (zh) 2020-06-22 2020-12-15 面向复杂环境的***文字检测识别方法、装置及介质

Country Status (2)

Country Link
CN (1) CN111767911B (zh)
WO (1) WO2021115490A1 (zh)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554031A (zh) * 2021-08-02 2021-10-26 杭州拼便宜网络科技有限公司 基于图像识别的物流交割方法、装置、设备和存储介质
CN113610090A (zh) * 2021-07-29 2021-11-05 广州广电运通金融电子股份有限公司 ***图像识别分类方法、装置、计算机设备和存储介质
CN113627423A (zh) * 2021-07-08 2021-11-09 广州广电运通金融电子股份有限公司 圆形***字符识别方法、装置、计算机设备和存储介质
CN113743400A (zh) * 2021-07-16 2021-12-03 华中科技大学 一种基于深度学习的电子公文智能审查方法及***
CN113743360A (zh) * 2021-09-16 2021-12-03 京东科技信息技术有限公司 智能化***解析的方法和装置
CN113807340A (zh) * 2021-09-07 2021-12-17 南京信息工程大学 一种基于注意力机制的不规则自然场景文本识别方法
CN113971745A (zh) * 2021-09-27 2022-01-25 哈尔滨工业大学 一种基于深度神经网络的出入境验讫章识别方法及装置
CN114359553A (zh) * 2022-03-17 2022-04-15 北京惠朗时代科技有限公司 一种基于物联网的签章定位方法、***及存储介质
CN115359543A (zh) * 2022-10-19 2022-11-18 北京惠朗时代科技有限公司 一种基于区块链的远程用印方法与***
CN115830584A (zh) * 2022-11-29 2023-03-21 南京云阶电力科技有限公司 基于深度学习的端子排文本检测方法及***
CN116416626A (zh) * 2023-06-12 2023-07-11 平安银行股份有限公司 圆形***数据的获取方法、装置、设备及存储介质
CN116702719A (zh) * 2023-06-16 2023-09-05 易签链(深圳)科技有限公司 一种基于人工排版深度学习的智能印文生成方法
CN117310591A (zh) * 2023-11-28 2023-12-29 广州思林杰科技股份有限公司 一种小型的用于测试设备校准精度检测的设备
CN117975492A (zh) * 2024-03-29 2024-05-03 南昌航空大学 一种矩形***文字识别方法
CN118072341A (zh) * 2024-04-19 2024-05-24 深圳豸印科技有限责任公司 一种用印安全监测方法、装置及***

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767911B (zh) * 2020-06-22 2024-06-28 平安科技(深圳)有限公司 面向复杂环境的***文字检测识别方法、装置及介质
CN112488095A (zh) * 2020-12-18 2021-03-12 北京字节跳动网络技术有限公司 ***图像识别方法、装置和电子设备
CN113033325A (zh) * 2021-03-04 2021-06-25 杭州睿胜软件有限公司 图像处理方法及装置、智能***识别设备和存储介质
CN112926511A (zh) * 2021-03-25 2021-06-08 深圳市商汤科技有限公司 ***文本识别方法、装置、设备及计算机可读存储介质
CN113033543B (zh) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 曲形文本识别方法、装置、设备及介质
CN113327254A (zh) * 2021-05-27 2021-08-31 北京深睿博联科技有限责任公司 一种基于u型网络的图像分割方法和***
CN113269102A (zh) * 2021-05-28 2021-08-17 中邮信息科技(北京)有限公司 一种***信息识别方法、装置、计算机设备和存储介质
CN113627432A (zh) * 2021-08-18 2021-11-09 南京中孚信息技术有限公司 图像中***识别方法、装置、计算机设备及可读存储介质
CN113869017A (zh) * 2021-09-30 2021-12-31 平安科技(深圳)有限公司 基于人工智能的表格图像重构方法、装置、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944452A (zh) * 2017-12-12 2018-04-20 深圳市创业***实业有限公司 一种圆形***文字识别方法
CN110287960A (zh) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 自然场景图像中曲线文字的检测识别方法
CN110659647A (zh) * 2019-09-11 2020-01-07 杭州睿琪软件有限公司 ***图像识别方法及装置、智能***识别设备和存储介质
CN110728277A (zh) * 2019-09-27 2020-01-24 达而观信息科技(上海)有限公司 一种***智能检测与识别的方法
CN111178355A (zh) * 2019-12-27 2020-05-19 中化资本有限公司 ***识别方法、装置和存储介质
CN111767911A (zh) * 2020-06-22 2020-10-13 平安科技(深圳)有限公司 面向复杂环境的***文字检测识别方法、装置及介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102566812B (zh) * 2011-09-30 2015-02-18 北京壹人壹本信息科技有限公司 一种手写记事本的实现方法及装置
CN105631447B (zh) * 2015-12-18 2019-02-15 杭州仁盈科技股份有限公司 一种识别圆形公章中文字的方法
CN107609557B (zh) * 2017-08-24 2020-09-08 华中科技大学 一种指针式仪表读数识别方法
CN110443250B (zh) * 2019-07-31 2022-06-10 天津车之家数据信息技术有限公司 一种合同***的类别识别方法、装置和计算设备
CN110866529A (zh) * 2019-10-29 2020-03-06 腾讯科技(深圳)有限公司 字符识别方法、装置、电子设备及存储介质

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
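CN113627423A (zh) * 2021-07-08 2021-11-09 广州广电运通金融电子股份有限公司 Circular *** character recognition method and apparatus, computer device, and storage medium
CN113743400A (zh) * 2021-07-16 2021-12-03 华中科技大学 Deep-learning-based intelligent review method and *** for electronic official documents
CN113743400B (zh) * 2021-07-16 2024-02-20 华中科技大学 Deep-learning-based intelligent review method and *** for electronic official documents
CN113610090B (zh) * 2021-07-29 2023-12-26 深圳广电银通金融电子科技有限公司 *** image recognition and classification method and apparatus, computer device, and storage medium
CN113610090A (zh) * 2021-07-29 2021-11-05 广州广电运通金融电子股份有限公司 *** image recognition and classification method and apparatus, computer device, and storage medium
CN113554031A (zh) * 2021-08-02 2021-10-26 杭州拼便宜网络科技有限公司 Image-recognition-based logistics delivery method, apparatus, device, and storage medium
CN113807340A (zh) * 2021-09-07 2021-12-17 南京信息工程大学 Attention-mechanism-based method for recognizing irregular natural scene text
CN113807340B (zh) * 2021-09-07 2024-03-15 南京信息工程大学 Attention-mechanism-based method for recognizing irregular natural scene text
CN113743360A (zh) * 2021-09-16 2021-12-03 京东科技信息技术有限公司 Method and apparatus for intelligent *** parsing
CN113743360B (zh) * 2021-09-16 2024-03-05 京东科技信息技术有限公司 Method and apparatus for intelligent *** parsing
CN113971745A (zh) * 2021-09-27 2022-01-25 哈尔滨工业大学 Deep-neural-network-based method and apparatus for recognizing entry/exit inspection stamps
CN113971745B (zh) * 2021-09-27 2024-04-16 哈尔滨工业大学 Deep-neural-network-based method and apparatus for recognizing entry/exit inspection stamps
CN114359553B (zh) * 2022-03-17 2022-06-03 北京惠朗时代科技有限公司 Internet-of-Things-based signature-seal positioning method, ***, and storage medium
CN114359553A (zh) * 2022-03-17 2022-04-15 北京惠朗时代科技有限公司 Internet-of-Things-based signature-seal positioning method, ***, and storage medium
CN115359543B (zh) * 2022-10-19 2023-01-10 北京惠朗时代科技有限公司 Blockchain-based remote seal-use method and ***
CN115359543A (zh) * 2022-10-19 2022-11-18 北京惠朗时代科技有限公司 Blockchain-based remote seal-use method and ***
CN115830584A (zh) * 2022-11-29 2023-03-21 南京云阶电力科技有限公司 Deep-learning-based terminal block text detection method and ***
CN115830584B (zh) * 2022-11-29 2024-05-24 南京云阶电力科技有限公司 Deep-learning-based terminal block text detection method and ***
CN116416626A (zh) * 2023-06-12 2023-07-11 平安银行股份有限公司 Method, apparatus, device, and storage medium for acquiring circular *** data
CN116416626B (zh) * 2023-06-12 2023-08-29 平安银行股份有限公司 Method, apparatus, device, and storage medium for acquiring circular *** data
CN116702719A (zh) * 2023-06-16 2023-09-05 易签链(深圳)科技有限公司 Intelligent seal-impression generation method based on deep learning of manual typesetting
CN117310591B (zh) * 2023-11-28 2024-03-19 广州思林杰科技股份有限公司 Compact device for checking the calibration accuracy of test equipment
CN117310591A (zh) * 2023-11-28 2023-12-29 广州思林杰科技股份有限公司 Compact device for checking the calibration accuracy of test equipment
CN117975492A (zh) * 2024-03-29 2024-05-03 南昌航空大学 Rectangular *** text recognition method
CN117975492B (zh) * 2024-03-29 2024-06-07 南昌航空大学 Rectangular *** text recognition method
CN118072341A (zh) * 2024-04-19 2024-05-24 深圳豸印科技有限责任公司 Seal-use security monitoring method, apparatus, and ***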

Also Published As

Publication number Publication date
CN111767911B (zh) 2024-06-28
CN111767911A (zh) 2020-10-13

Similar Documents

Publication Publication Date Title
WO2021115490A1 (zh) *** text detection and recognition method, apparatus, and medium for complex environments
US11645826B2 (en) Generating searchable text for documents portrayed in a repository of digital images utilizing orientation and text prediction neural networks
US10572754B2 (en) Area of interest boundary extracting method and apparatus, device and computer storage medium
US10303968B2 (en) Method and apparatus for image recognition
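CN109816118B (zh) Method and terminal for creating structured documents based on a deep learning model
CN112418216B (zh) Method for detecting text in complex natural scene images
CN109800698B (zh) Deep-learning-based icon detection method, icon detection ***, and storage medium
WO2018233055A1 (zh) Method and apparatus for entering insurance policy information, computer device, and storage medium
CN106056114A (zh) Method and apparatus for recognizing business card content
CN108764352B (zh) Method and apparatus for detecting duplicate page content
WO2020133442A1 (zh) Method for recognizing text, and terminal device
CN112699775A (zh) Deep-learning-based identity document recognition method, apparatus, device, and storage medium
CN113033660B (zh) General-purpose minor-language detection method, apparatus, and device
CN114694165A (zh) Method for intelligent recognition and redrawing of PID drawings
CN110659637A (zh) Method for automatic recognition of electric energy meter readings and labels combining a deep neural network with SIFT features
CN111027456A (zh) Image-recognition-based method for recognizing mechanical water meter readings
CN111950523A (zh) Aerial-photography-based ship detection optimization method and apparatus, electronic device, and medium
WO2021168703A1 (zh) Character processing and character recognition methods, storage medium, and terminal device
CN111598099A (zh) Method and apparatus for testing image text recognition performance, test device, and medium
CN111241974B (zh) Bill information acquisition method and apparatus, computer device, and storage medium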
WO2023038722A1 (en) Entry detection and recognition for custom forms
CN112101356A (zh) Method and apparatus for locating specific text in a picture, and storage medium
US20140239072A1 (en) Automatically Converting a Sign and Method for Automatically Reading a Sign
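CN116758578B (zh) Mechanical drawing information extraction method, apparatus, ***, and storage medium
CN112464892B (zh) Bill region recognition method and apparatus, electronic device, and readable storage medium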

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20900092

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20900092

Country of ref document: EP

Kind code of ref document: A1