CN112215235B - Scene text detection method aiming at large character spacing and local shielding - Google Patents

Scene text detection method aiming at large character spacing and local shielding

Info

Publication number
CN112215235B
Authority
CN
China
Prior art keywords
text
feature
instance
features
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011110021.0A
Other languages
Chinese (zh)
Other versions
CN112215235A (en)
Inventor
高攀
刘磊
黄军文
汤红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huafu Technology Co ltd
Original Assignee
Shenzhen Huafu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huafu Technology Co ltd
Priority to CN202011110021.0A
Publication of CN112215235A
Application granted
Publication of CN112215235B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of optical character recognition and in particular relates to a scene text detection method for text with large character spacing and partial occlusion, comprising the following steps: S1, extracting features from an input picture through a fully convolutional neural network and fusing the features of different levels; S2, feeding the fused features to a text semantic segmentation network, which outputs a text segmentation map, and to a text instance feature embedding module, which outputs a text instance embedding feature map; S3, passing the text segmentation map and the text instance embedding feature map to a text instance reorganization algorithm to obtain the text detection result. The text instance feature embedding module embeds each pixel into a feature space, and the average pixel feature within a text region is taken as the feature of that region; the text instance reorganization algorithm then regroups text candidate regions with similar features, so that a text instance that was split into several regions by large character spacing or partial occlusion can be detected again as one complete object.

Description

Scene text detection method aiming at large character spacing and local shielding
Technical Field
The invention belongs to the technical field of optical character recognition and in particular relates to a scene text detection method for text with large character spacing and partial occlusion.
Background
Because text naturally carries rich and precise semantic information, enabling a computer to read and understand the text in pictures has both academic and practical value. Scene text detection is the task of detecting text in pictures of natural scenes. The difficulty of the task comes mainly from three sources. First, text is diverse: text in natural scenes appears in many fonts, colors, sizes, and artistic styles. Second, the background of a natural scene picture is complex, and some real-world objects have structures that resemble text, such as windows, bricks and tiles, fences, and grass. Third, imaging conditions affect the picture: some pictures suffer from uneven illumination, blur, and similar degradations.
The first class of existing methods is based on text box regression. These methods use a general object detection framework such as SSD or Faster R-CNN, but the anchor mechanism prevents them from handling text of arbitrary shape (such as curved text). In addition, because the receptive field is limited, box regression for long text is inaccurate.
The second class of existing methods is based on semantic segmentation, which classifies the pixels of a picture into foreground (text regions) and background. This approach can handle text of arbitrary shape regardless of the shape and size of the text object, but because text boundaries are hard to define, adjacent text lines are not easily separated. Moreover, most such methods use connected-component analysis to determine text instances, so when the characters of a text are widely spaced or partially occluded, a single text object ends up with multiple detection boxes.
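The failure mode described above can be illustrated with a small sketch. It is not taken from the patent: the function `connected_components` and the toy mask are hypothetical, and 4-connectivity is an assumption. It labels the foreground regions of a binary segmentation mask and shows that one word with wide gaps between its characters is reported as several separate components.

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected foreground regions of a binary mask (nested lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] != 1 or labels[sy][sx] != 0:
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and mask[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Toy mask (assumption): one word whose three characters are separated by wide gaps.
mask = [
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1],
]
labels, count = connected_components(mask)
print(count)  # 3 components for what is a single text instance
```

Each component would normally receive its own detection box, which is the problem the method sets out to address.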
Disclosure of Invention
To solve the problem described in the background, namely that a semantic-segmentation-based method assigns multiple detection boxes to a single text object when its characters are widely spaced or partially occluded, the invention provides the following technical solution:
A scene text detection method for text with large character spacing and partial occlusion, comprising the following steps:
S1, extracting features from an input picture through a fully convolutional neural network and fusing the features of different levels;
S2, feeding the fused features to a text semantic segmentation network, which outputs a text segmentation map, and to a text instance feature embedding module, which outputs a text instance embedding feature map;
S3, passing the text segmentation map and the text instance embedding feature map to a text instance reorganization algorithm to obtain the text detection result.
Further, in S1, a fully convolutional network with a feature pyramid structure is adopted: the feature pyramid network extracts features of different levels from the input picture, and these features are then fused through element-wise addition and channel concatenation.
Further, in S2, the text instance feature embedding module embeds each pixel into a feature space, and the average pixel feature within a text region is taken as the feature of that region.
Further, in the network structure of the text instance feature embedding module, the fused features first pass through two Conv-BN-ReLU layers; a 1×1 convolution then reduces the number of channels to lower the computational cost, and the result passes through a ReLU activation layer and is upsampled to the original input size.
Further, the text instance feature embedding module is trained by decreasing the feature distance between different pixels of the same text instance and increasing the feature distance between different text instances.
Further, the text instance reorganization algorithm is a metric-based clustering algorithm.
The beneficial effects of the scene text detection method for large character spacing and partial occlusion are as follows. To reduce false detections of text with large character spacing or partial occlusion, the method provides a text instance feature embedding module and a text instance reorganization algorithm. The text instance feature embedding module embeds each pixel into a feature space and takes the average pixel feature within a text region as the feature of that region. The text instance reorganization algorithm then regroups text candidate regions with similar features. In this way, a text instance that was split into several regions by large character spacing or partial occlusion can be detected again as one complete object. The two proposed modules do not depend on specific model details, are easy to port, and can be combined with any mainstream segmentation-based text detection algorithm to improve its accuracy.
Drawings
FIG. 1 is a schematic diagram of the steps of the text detection method in an embodiment of the present invention;
FIG. 2 is a structural diagram of the fully convolutional network with a feature pyramid structure in an embodiment of the present invention;
FIG. 3 is a network structure diagram of the text instance feature embedding module in an embodiment of the present invention.
Detailed Description
The invention is further illustrated below with reference to embodiments. These are only some of the embodiments of the invention; they are intended to illustrate the invention and do not limit its scope in any way.
As shown in FIG. 1, the scene text detection method for text with large character spacing and partial occlusion comprises the following steps:
S1, extracting features from an input picture through a fully convolutional neural network and fusing the features of different levels
Any mainstream segmentation-based text detection method may be selected; this embodiment takes a classical fully convolutional network with a feature pyramid structure (FPN+FCN) as an example. The overall network structure is shown in FIG. 2: the feature pyramid network first extracts features of different levels from the input picture, and these features are then fused through channel concatenation.
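The fusion in S1 can be sketched on toy data. The following is only an illustration under stated assumptions, not the patented network: feature maps are plain nested lists in channel × height × width order, nearest-neighbour upsampling is assumed for bringing a coarse level to the fine resolution, and the helper names (`upsample_nearest`, `add_maps`, `concat_channels`) are hypothetical. It shows element-wise addition followed by channel concatenation.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a C x H x W feature map stored as nested lists."""
    return [
        [[row[x // factor] for x in range(len(row) * factor)]
         for row in channel for _ in range(factor)]
        for channel in fmap
    ]

def add_maps(a, b):
    """Element-wise addition of two feature maps with identical shapes."""
    return [[[va + vb for va, vb in zip(ra, rb)]
             for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]

def concat_channels(*fmaps):
    """Channel concatenation: stack the channels of same-sized feature maps."""
    fused = []
    for fmap in fmaps:
        fused.extend(fmap)
    return fused

def shape(fmap):
    return (len(fmap), len(fmap[0]), len(fmap[0][0]))

# Toy pyramid (assumption): a fine level of size 1x4x4 and a coarse level of size 1x2x2.
fine = [[[1.0] * 4 for _ in range(4)]]
coarse = [[[1.0, 2.0], [3.0, 4.0]]]

coarse_up = upsample_nearest(coarse, 2)       # bring the coarse level to 4x4
top_down = add_maps(fine, coarse_up)          # element-wise addition
fused = concat_channels(top_down, coarse_up)  # channel concatenation

print(shape(fused))  # (2, 4, 4)
```

In a real network the same two operations would be applied to learned multi-channel tensors; the sketch only makes the shapes and the order of operations concrete.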
S2, feeding the fused features to a text semantic segmentation network, which outputs a text segmentation map, and to a text instance feature embedding module, which outputs a text instance embedding feature map
This section mainly introduces the text instance feature embedding module, which outputs the text instance embedding feature map. The network structure of the module is shown in FIG. 3: the fused features first pass through two Conv-BN-ReLU layers; a 1×1 convolution then reduces the number of channels to lower the computational cost, and the result passes through a ReLU activation layer and is upsampled to the original input size. Specifically, the module outputs a feature vector F_p = {x1, x2, x3, x4} for each pixel p (four dimensions are used as an example). The feature of a text region is represented by the average feature vector of the pixels in that region. Writing R for the set of pixels in the region and |R| for their number, this can be expressed as
$$F_R = \frac{1}{|R|} \sum_{p \in R} F_p$$
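The region feature described above, the average of the per-pixel feature vectors inside a text region, can be sketched as follows. The function `region_features`, the label convention (0 for background), and the toy values are assumptions made for illustration only.

```python
def region_features(embeddings, labels):
    """Average the per-pixel embedding vectors over each labelled region.

    embeddings[y][x] is the feature vector of pixel (y, x);
    labels[y][x] is its region id, with 0 meaning background.
    """
    sums, counts = {}, {}
    for emb_row, lab_row in zip(embeddings, labels):
        for vec, lab in zip(emb_row, lab_row):
            if lab == 0:
                continue  # background pixels do not contribute
            if lab not in sums:
                sums[lab] = [0.0] * len(vec)
                counts[lab] = 0
            sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
            counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

# Toy input (assumption): one row of four pixels with 4-dimensional embeddings.
labels = [[1, 1, 0, 2]]
embeddings = [[
    [1.0, 0.0, 0.0, 0.0],
    [3.0, 0.0, 0.0, 0.0],
    [9.0, 9.0, 9.0, 9.0],  # background pixel, ignored
    [0.0, 2.0, 0.0, 0.0],
]]

feats = region_features(embeddings, labels)
print(feats[1])  # [2.0, 0.0, 0.0, 0.0]
print(feats[2])  # [0.0, 2.0, 0.0, 0.0]
```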
Because the labels do not contain a feature vector for each pixel, and because the purpose of the module is to learn the similarity between text instances, a clustering idea is used to supervise the training of the text instance feature embedding module: during training, the feature distance between different pixels of the same text instance is decreased, and the feature distance between different text instances is increased. Specifically: (1) Decreasing the intra-instance distance: the feature distance between pixels in the same text region should be as small as possible. The distance between a pixel and its text instance is used as a penalty term, so that pixel features within the same text region become more similar. (2) Increasing the inter-instance distance: in contrast to the intra-instance distance, the distance between the feature vectors of different text regions should be as large as possible.
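The text does not give the loss formula, so the following is only a sketch of one possible realization of the two training objectives. It assumes a hinge-style form of the kind used in the instance-embedding literature: a pull term that penalizes pixels lying farther than `pull_margin` from their instance mean, and a push term that penalizes instance means lying closer than `push_margin` to each other. The function name, both margins, and the squared hinge are assumptions, not details from the patent.

```python
import math

def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def embedding_loss(pixels_by_instance, pull_margin=0.5, push_margin=3.0):
    """Return (pull, push) terms for a dict: instance id -> list of pixel vectors.

    Both margins are hypothetical hyperparameters chosen for illustration.
    """
    # Mean embedding of each text instance.
    means = {}
    for k, vecs in pixels_by_instance.items():
        dim = len(vecs[0])
        means[k] = [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]

    # (1) Intra-instance term: pull every pixel towards its instance mean.
    pull = 0.0
    for k, vecs in pixels_by_instance.items():
        pull += sum(max(0.0, l2(v, means[k]) - pull_margin) ** 2 for v in vecs) / len(vecs)
    pull /= len(pixels_by_instance)

    # (2) Inter-instance term: push the means of different instances apart.
    ids = sorted(means)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    push = 0.0
    if pairs:
        push = sum(max(0.0, push_margin - l2(means[a], means[b])) ** 2
                   for a, b in pairs) / len(pairs)
    return pull, push

# Well-separated instances: only the pull term is non-zero.
print(embedding_loss({1: [[0.0, 0.0], [2.0, 0.0]], 2: [[1.0, 4.0], [1.0, 4.0]]}))
# Instances whose means are too close: only the push term is non-zero.
print(embedding_loss({1: [[0.0, 0.0]], 2: [[1.0, 0.0]]}))
```

Minimizing the sum of the two terms lowers intra-instance distances and raises inter-instance distances, which matches the two objectives stated above without claiming to reproduce the patented loss.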
S3, passing the text segmentation map and the text instance embedding feature map to a text instance reorganization algorithm to obtain the text detection result
The text instance reorganization algorithm is a metric-based (distance-based) clustering algorithm. Its main idea is to check whether the distance between the feature vectors of two text candidate regions is smaller than a threshold; if the feature distance is small enough, the two candidates are considered likely to belong to the same text instance and are merged into one. In addition to the feature distance, certain logical conditions must also be satisfied, such as constraints on the relative position of the two candidates.
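A minimal sketch of such a threshold-based regrouping is given below. It is an illustration under assumptions rather than the patented algorithm: each candidate is a pair of (region feature vector, bounding box), the positional condition is taken to be a vertical overlap of the two boxes, merging is done with a union-find structure, and the names `regroup`, `vertical_overlap`, `merged_box` and the threshold value are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def vertical_overlap(box_a, box_b):
    """Boxes are (x_min, y_min, x_max, y_max); true if their y-ranges intersect."""
    return min(box_a[3], box_b[3]) > max(box_a[1], box_b[1])

def regroup(candidates, dist_threshold=1.0):
    """Group candidates whose features are close and whose boxes satisfy the position check."""
    parent = list(range(len(candidates)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            feat_i, box_i = candidates[i]
            feat_j, box_j = candidates[j]
            if l2(feat_i, feat_j) < dist_threshold and vertical_overlap(box_i, box_j):
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(len(candidates)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def merged_box(candidates, group):
    """Smallest axis-aligned box enclosing every candidate in the group."""
    boxes = [candidates[i][1] for i in group]
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

# Toy candidates (assumption): two fragments of one text line and one unrelated line.
candidates = [
    ([0.1, 0.0], (0, 0, 10, 10)),
    ([0.2, 0.1], (30, 1, 40, 11)),
    ([5.0, 5.0], (0, 50, 40, 60)),
]
groups = regroup(candidates)
print(groups)                                       # [[0, 1], [2]]
print([merged_box(candidates, g) for g in groups])  # [(0, 0, 40, 11), (0, 50, 40, 60)]
```

The first two candidates have similar features and lie on the same line, so they are merged into one detection; the third stays separate.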
For text with large character spacing or partial occlusion, the invention provides a text instance feature embedding module and a text instance reorganization algorithm. They detect such text effectively and improve the overall accuracy of the model. They are plug-and-play: they do not depend on a specific method, are easy to port, and can be combined with mainstream segmentation-based text detection methods.
The present invention is not limited to the embodiments described above; any modifications, equivalent changes, and variations made to these embodiments by those skilled in the art without departing from the scope of the present invention remain within that scope.

Claims (2)

1. A scene text detection method for text with large character spacing and partial occlusion, characterized by comprising the following steps:
S1, extracting features from an input picture through a fully convolutional neural network and fusing the features of different levels;
S2, feeding the fused features to a text semantic segmentation network, which outputs a text segmentation map, and to a text instance feature embedding module, which outputs a text instance embedding feature map;
S3, passing the text segmentation map and the text instance embedding feature map to a text instance reorganization algorithm to obtain the text detection result;
wherein in S2, the text instance feature embedding module embeds each pixel into a feature space, and the average pixel feature within a text region is taken as the feature of that region;
in the network structure of the text instance feature embedding module, the fused features first pass through two Conv-BN-ReLU layers; a 1×1 convolution then reduces the number of channels to lower the computational cost, and the result passes through a ReLU activation layer and is upsampled to the original input size;
the text instance feature embedding module is trained by decreasing the feature distance between different pixels of the same text instance and increasing the feature distance between different text instances;
the text instance reorganization algorithm is a metric-based clustering algorithm.
2. The scene text detection method for text with large character spacing and partial occlusion according to claim 1, wherein in S1 a fully convolutional network with a feature pyramid structure is adopted: the feature pyramid network first extracts features of different levels from the input picture, and these features are then fused through element-wise addition and channel concatenation.
CN202011110021.0A 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding Active CN112215235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011110021.0A CN112215235B (en) 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011110021.0A CN112215235B (en) 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding

Publications (2)

Publication Number Publication Date
CN112215235A CN112215235A (en) 2021-01-12
CN112215235B CN112215235B (en) 2024-04-26

Family

ID=74055526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110021.0A Active CN112215235B (en) 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding

Country Status (1)

Country Link
CN (1) CN112215235B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN111209865A (en) * 2020-01-06 2020-05-29 中科鼎富(北京)科技发展有限公司 File content extraction method and device, electronic equipment and storage medium
CN111401376A (en) * 2020-03-12 2020-07-10 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040114800A1 (en) * 2002-09-12 2004-06-17 Baylor College Of Medicine System and method for image segmentation
US10706079B2 (en) * 2018-01-23 2020-07-07 Vmware, Inc. Group clustering using inter-group dissimilarities
CN108304835B (en) * 2018-01-30 2019-12-06 百度在线网络技术(北京)有限公司 character detection method and device

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Split and Merge: Component Based Segmentation Network for Text Detection; Pan Gao et al.; Pattern Recognition and Artificial Intelligence. International Conference, ICPRAI 2020; Vol. 12068; pp. 14- *
Text Detection with Deep Neural Network System Based on Overlapped Labels and a Hierarchical Segmentation of Feature Maps; Kim, HH et al.; INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS; Vol. 17 (No. 6); pp. 1599-1610 *
Research on natural scene text detection based on deep learning; 高攀; China Master's Theses Full-text Database, Information Science and Technology (No. 10); pp. I138-78 *
A survey of segmentation methods based on point cloud data; 顾军华; 李炜; 董永峰; Journal of Yanshan University (No. 02); pp. 35-47 *

Also Published As

Publication number Publication date
CN112215235A (en) 2021-01-12

Similar Documents

Publication Publication Date Title
CN112232349B (en) Model training method, image segmentation method and device
CN112884064B (en) Target detection and identification method based on neural network
CN108121991B (en) Deep learning ship target detection method based on edge candidate region extraction
CN102332092B (en) Flame detection method based on video analysis
CN106446015A (en) Video content access prediction and recommendation method based on user behavior preference
CN110766020A (en) System and method for detecting and identifying multi-language natural scene text
CN111666842B (en) Shadow detection method based on double-current-cavity convolution neural network
WO2023082784A1 (en) Person re-identification method and apparatus based on local feature attention
CN110969129A (en) End-to-end tax bill text detection and identification method
CN109815948B (en) Test paper segmentation algorithm under complex scene
CN102332097B (en) Method for segmenting complex background text images based on image segmentation
CN110517270B (en) Indoor scene semantic segmentation method based on super-pixel depth network
CN112070174A (en) Text detection method in natural scene based on deep learning
Wei et al. Pedestrian detection in underground mines via parallel feature transfer network
CN111652240A (en) Image local feature detection and description method based on CNN
CN111414938B (en) Target detection method for bubbles in plate heat exchanger
CN111507353A (en) Chinese field detection method and system based on character recognition
CN111507416A (en) Smoking behavior real-time detection method based on deep learning
CN111242829A (en) Watermark extraction method, device, equipment and storage medium
CN111242114B (en) Character recognition method and device
CN112215235B (en) Scene text detection method aiming at large character spacing and local shielding
CN103136536A (en) System and method for detecting target and method for exacting image features
CN112633179A (en) Farmer market aisle object occupying channel detection method based on video analysis
CN116824630A (en) Light infrared image pedestrian target detection method
CN114694133B (en) Text recognition method based on combination of image processing and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Huafu Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant
GR01 Patent grant