CN112215235A - Scene text detection method aiming at large character spacing and local shielding - Google Patents

Scene text detection method aiming at large character spacing and local shielding

Info

Publication number
CN112215235A
Authority
CN
China
Prior art keywords
text
feature
features
instance
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011110021.0A
Other languages
Chinese (zh)
Other versions
CN112215235B (en)
Inventor
高攀
刘磊
黄军文
汤红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huafu Information Technology Co ltd
Original Assignee
Shenzhen Huafu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huafu Information Technology Co ltd
Priority to CN202011110021.0A
Publication of CN112215235A
Application granted
Publication of CN112215235B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention belongs to the technical field of optical character recognition, and in particular relates to a scene text detection method for text with large character spacing and partial occlusion, comprising the following steps: S1, extracting features from the input picture with a fully convolutional neural network and fusing the features of different levels; S2, passing the fused features through a text semantic segmentation network to output a text segmentation map, and through a text instance feature embedding module to output a text instance embedding feature map; and S3, applying a text instance recombination algorithm to the text segmentation map and the text instance embedding feature map to obtain the text detection result. The text instance feature embedding module embeds each pixel into a feature space, and the average pixel feature within a text region is taken as the feature of that region. The text instance recombination algorithm then recombines text candidate regions with similar features, so that a text instance that was split into several regions because of large character spacing or partial occlusion can be detected again as one complete object.

Description

Scene text detection method aiming at large character spacing and local shielding
Technical Field
The invention belongs to the technical field of optical character recognition, and in particular relates to a scene text detection method for text with large character spacing and partial occlusion.
Background
Because text naturally carries rich and precise semantic information, enabling a computer to read and understand the text in pictures has both academic and practical value. Scene text detection is the task of locating text in pictures of natural scenes. The difficulty of the task comes mainly from three sources. First, text is diverse: text in natural scenes appears in many fonts, colors, sizes and artistic styles. Second, the background of natural scene pictures is complex, and real-world objects such as windows, tiles, fences and grass have structures similar to text. Third, imaging conditions vary, and some pictures suffer from uneven illumination, blur and similar degradations.
The first class of existing methods is based on text box regression and builds on general object detection frameworks such as SSD and Faster R-CNN. Because of the limitations of the anchor box mechanism, such methods cannot handle arbitrarily shaped text (such as curved text). In addition, because the receptive field is limited, box regression for long text is inaccurate.
The second class of existing methods is based on semantic segmentation, which classifies each pixel of the picture as foreground (text region) or background. This approach handles arbitrarily shaped text without regard to the shape and size of the text object, but because text boundaries are hard to define, adjacent text lines are not easily separated. Moreover, most such methods use connected component analysis to determine text instances, so when the characters of a text are widely spaced or the text is partially occluded, a single text object yields multiple detection boxes.
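The failure mode described for connected component analysis can be reproduced on a toy mask. The following is a minimal sketch, assuming a binary segmentation mask stored as nested Python lists and 4-connectivity; the function name `connected_components` and the mask itself are illustrative and do not come from the patent.

```python
from collections import deque

def connected_components(mask):
    """Label 4-connected foreground regions of a binary mask (list of rows)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                # Breadth-first flood fill of the current component.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count

# Hypothetical mask of ONE text instance whose characters are widely spaced.
mask = [
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
]
labels, count = connected_components(mask)
print(count)  # 3
```

The single text instance is reported as three separate components, and therefore three detection boxes, which is the behaviour the invention sets out to correct.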
Disclosure of Invention
When text in an image is detected with the semantic segmentation methods described in the background, a single text object yields multiple detection boxes if the character spacing is large or the text is partially occluded. To overcome this problem, the invention provides the following technical scheme:
A method for detecting scene text with large character spacing and partial occlusion comprises the following steps:
S1, extracting features from the input picture with a fully convolutional neural network, and fusing the features of different levels;
S2, passing the fused features through a text semantic segmentation network to output a text segmentation map, and through a text instance feature embedding module to output a text instance embedding feature map;
and S3, applying a text instance recombination algorithm to the text segmentation map and the text instance embedding feature map to obtain the text detection result.
Further, in S1, a fully convolutional network with a feature pyramid structure is adopted: the input picture first passes through the feature pyramid network to extract features of different levels, and these features are then fused by an element-wise addition operation and a channel concatenation operation.
Further, in S2, the text instance feature embedding module embeds each pixel into the feature space, and the average pixel feature within a text region is taken as the feature of that text region.
Further, in the network structure of the text instance feature embedding module, the fused features pass through two Conv-BN-ReLU layers, then through a 1×1 convolution that reduces the number of channels and the amount of computation, then through a ReLU activation layer, and are finally up-sampled to the original input size.
Further, the text instance feature embedding module is trained by reducing the feature distance between different pixels of the same text instance and increasing the feature distance between different text instances.
Further, the text instance recombination algorithm is a metric-based (distance-based) clustering algorithm.
The method for detecting scene text with large character spacing and partial occlusion has the following advantages. It is optimized for the false detection of text with large character spacing and of partially occluded text, and provides a text instance feature embedding module and a text instance recombination algorithm. The text instance feature embedding module embeds each pixel into a feature space, and the average pixel feature within a text region is taken as the feature of that region. The text instance recombination algorithm then recombines text candidate regions with similar features. In this way, a text instance that was split into several regions because of large character spacing or partial occlusion can be detected again as one complete object. The two components do not depend on specific model details, can be combined easily with any mainstream segmentation-based text detection algorithm, and improve the accuracy of that algorithm.
Drawings
FIG. 1 is a schematic diagram illustrating steps of a text detection method according to an embodiment of the present invention;
FIG. 2 is a diagram of a full convolution network with a feature pyramid structure according to an embodiment of the present invention;
FIG. 3 is a diagram of a network structure of a text instance feature embedding module in an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following embodiments. They are only some of the possible embodiments, are intended solely to explain the invention, and do not limit its scope.
As shown in FIG. 1, a method for detecting scene text with large character spacing and partial occlusion includes the following steps:
S1, extracting features from the input picture with a fully convolutional neural network, and fusing the features of different levels
Any mainstream segmentation-based text detection method may be selected; this embodiment takes a fully convolutional network with a classical feature pyramid structure (FPN + FCN) as an example. The overall network structure is shown in FIG. 2: the input picture first passes through a feature pyramid network to extract features of different levels, and the features of different levels are then fused through a channel concatenation operation.
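The fusion of pyramid levels can be sketched on toy data. The following is a minimal illustration, assuming each level is a list of channels, each channel a nested list, and that coarse levels are brought to the fine resolution by nearest-neighbour upsampling. The helper names `upsample_nearest`, `add_maps` and `concat_channels` and the upsampling choice are assumptions; the patent only names the element-wise addition and channel concatenation operations.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of one channel (list of rows)."""
    return [[v for v in row for _ in range(factor)]
            for row in fmap for _ in range(factor)]

def add_maps(a, b):
    """Element-wise addition of two channels of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def concat_channels(*levels):
    """Channel concatenation: stack the channels of all levels."""
    return [ch for level in levels for ch in level]

# Two hypothetical pyramid levels with one channel each.
coarse = [[[5]]]               # 1 channel, 1x1
fine = [[[1, 2], [3, 4]]]      # 1 channel, 2x2

up = [upsample_nearest(ch, 2) for ch in coarse]       # coarse level at 2x2
merged = [add_maps(f, u) for f, u in zip(fine, up)]   # top-down addition
fused = concat_channels(merged, up)                   # 2 channels at 2x2
print(fused)
```

After this step every level shares the same spatial size, so later heads can treat `fused` as a single multi-channel feature map.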
S2, passing the fused features through a text semantic segmentation network to output a text segmentation map, and through a text instance feature embedding module to output a text instance embedding feature map
This section mainly describes how the text instance feature embedding module produces the text instance embedding feature map. The network structure of the module is shown in FIG. 3: the fused features first pass through two Conv-BN-ReLU layers, then through a 1×1 convolution that reduces the number of channels and the amount of computation, then through a ReLU activation layer, and are finally up-sampled to the original input size. Specifically, the module outputs a feature vector $F_p = (x_1, x_2, x_3, x_4)$ for each pixel $p$ (four dimensions are taken as an example). The feature of a text region is represented by the average feature vector of the pixels in the region, which can be written as
$$F_R = \frac{1}{|R|} \sum_{p \in R} F_p$$
where $R$ denotes the set of pixels in the text region and $|R|$ the number of those pixels.
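The region feature is simply the mean of the per-pixel embedding vectors over the region. A minimal sketch, assuming the embedding map is indexed as `embedding[y][x]` and a region is given as a list of `(y, x)` pixel coordinates (both representations, and the name `region_feature`, are illustrative choices):

```python
def region_feature(embedding, region):
    """Average the per-pixel embedding vectors over the pixels of one region."""
    dim = len(embedding[0][0])
    total = [0.0] * dim
    for y, x in region:
        for k in range(dim):
            total[k] += embedding[y][x][k]
    return [t / len(region) for t in total]

# Hypothetical 1x3 embedding map with 4-dimensional pixel features.
embedding = [[[1, 0, 0, 0], [3, 0, 2, 0], [9, 9, 9, 9]]]
region = [(0, 0), (0, 1)]   # the first two pixels form one text region
feature = region_feature(embedding, region)
print(feature)  # [2.0, 0.0, 1.0, 0.0]
```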
Since the labels do not contain a feature vector for each pixel, and the purpose of the module is to learn the similarity between text instances, the invention adopts a clustering idea: during training, the text instance feature embedding module is supervised by reducing the feature distance between different pixels of the same text instance and increasing the feature distance between different text instances. Specifically: (1) Decrease the intra-instance distance: the feature distance between pixels in the same text region should be as small as possible. The distance between each pixel and its text instance is used as a loss, so that pixel features within the same text region become more similar. (2) Increase the inter-instance distance: in contrast to the intra-instance distance, the distance between the feature vectors of different text regions should be as large as possible.
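The two training objectives can be illustrated with a hinge-style pull term and push term. This is a sketch under stated assumptions: the patent does not give the loss formula, so the squared hinge form, the margin values and the names `pull_loss` and `push_loss` are hypothetical choices made only for illustration.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def pull_loss(instances, margin=0.5):
    """Intra-instance term: penalise pixels farther than `margin` (assumed)
    from the mean feature of their own text instance."""
    total, n = 0.0, 0
    for pixels in instances:
        centre = mean_vector(pixels)
        for p in pixels:
            total += max(0.0, math.dist(p, centre) - margin) ** 2
            n += 1
    return total / n

def push_loss(instances, margin=3.0):
    """Inter-instance term: penalise pairs of instance means that are
    closer than `margin` (assumed)."""
    centres = [mean_vector(p) for p in instances]
    total, n = 0.0, 0
    for i in range(len(centres)):
        for j in range(i + 1, len(centres)):
            total += max(0.0, margin - math.dist(centres[i], centres[j])) ** 2
            n += 1
    return total / n if n else 0.0

# Two hypothetical text instances, each a list of 2-D pixel embeddings.
instances = [[[0, 0], [0, 2]], [[4, 1], [4, 1]]]
print(pull_loss(instances))   # 0.125
print(push_loss(instances))   # 0.0
```

In this toy case the two instance means are 4 apart, beyond the assumed push margin of 3, so only the pull term contributes.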
S3, applying a text instance recombination algorithm to the text segmentation map and the text instance embedding feature map to obtain the text detection result
The text instance recombination algorithm is a metric-based (distance-based) clustering algorithm. Its main idea is to test whether the distance between the feature vectors of two text candidates is smaller than a threshold. If the distance is small enough, the two candidates are considered likely to belong to the same text instance and are merged into one text instance. Besides the feature distance, some logical conditions must also be satisfied, such as a constraint on the relative positions of the two candidate texts.
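The merging rule can be sketched as pairwise tests followed by grouping. The following is a minimal illustration, assuming each candidate is a `(feature_vector, box)` pair with `box = (x0, y0, x1, y1)`, that the positional condition is vertical overlap of the boxes, and that merges are applied transitively with a union-find structure. All three choices, the threshold value and the function names are assumptions; the patent only states that a distance threshold and a relative-position condition are used.

```python
import math

def vertical_overlap(a, b):
    """Assumed positional rule: the boxes must share some vertical extent."""
    return min(a[3], b[3]) > max(a[1], b[1])

def regroup(candidates, threshold):
    """Group candidate indices whose features are close and whose boxes align."""
    parent = list(range(len(candidates)))

    def find(i):
        # Find the group representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            (fi, bi), (fj, bj) = candidates[i], candidates[j]
            if math.dist(fi, fj) < threshold and vertical_overlap(bi, bj):
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(candidates)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical candidates: 0 and 1 are two halves of one widely spaced word.
candidates = [
    ([0.10, 0.90], (0, 0, 10, 10)),
    ([0.15, 0.85], (30, 1, 40, 11)),
    ([0.90, 0.10], (0, 20, 10, 30)),    # different feature
    ([0.12, 0.88], (60, 40, 70, 50)),   # similar feature, different line
]
groups = regroup(candidates, threshold=0.2)
print(groups)  # [[0, 1], [2], [3]]
```

Candidate 3 has a feature close to candidates 0 and 1 but fails the positional test, which shows why the feature distance alone is not sufficient.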
For text with large character spacing and partially occluded text, the invention provides a text instance feature embedding module and a text instance recombination algorithm. Together they detect such text effectively and improve the overall accuracy of the model. They are plug-and-play, do not depend on a specific method, and can be combined easily with mainstream segmentation-based text detection methods.
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A method for detecting scene text with large character spacing and partial occlusion, characterized by comprising the following steps:
S1, extracting features from the input picture with a fully convolutional neural network, and fusing the features of different levels;
S2, passing the fused features through a text semantic segmentation network to output a text segmentation map, and through a text instance feature embedding module to output a text instance embedding feature map;
and S3, applying a text instance recombination algorithm to the text segmentation map and the text instance embedding feature map to obtain the text detection result.
2. The method for detecting scene text with large character spacing and partial occlusion according to claim 1, wherein in S1 a fully convolutional network with a feature pyramid structure is adopted, the input picture first passes through the feature pyramid network to extract features of different levels, and the features of different levels are then fused by an element-wise addition operation and a channel concatenation operation.
3. The method for detecting scene text with large character spacing and partial occlusion according to claim 1 or 2, wherein in S2 the text instance feature embedding module embeds each pixel into the feature space, and the average pixel feature within a text region is taken as the feature of that text region.
4. The method for detecting scene text with large character spacing and partial occlusion according to claim 3, wherein in the network structure of the text instance feature embedding module the fused features pass through two Conv-BN-ReLU layers, then through a 1×1 convolution that reduces the number of channels and the amount of computation, then through a ReLU activation layer, and are then up-sampled to the original input size.
5. The method for detecting scene text with large character spacing and partial occlusion according to claim 4, wherein the text instance feature embedding module is trained by reducing the feature distance between different pixels of the same text instance and increasing the feature distance between different text instances.
6. The method for detecting scene text with large character spacing and partial occlusion according to claim 5, wherein the text instance recombination algorithm is a metric-based clustering algorithm.
CN202011110021.0A 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding Active CN112215235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011110021.0A CN112215235B (en) 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011110021.0A CN112215235B (en) 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding

Publications (2)

Publication Number Publication Date
CN112215235A (en) 2021-01-12
CN112215235B (en) 2024-04-26

Family

ID=74055526

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110021.0A Active CN112215235B (en) 2020-10-16 2020-10-16 Scene text detection method aiming at large character spacing and local shielding

Country Status (1)

Country Link
CN (1) CN112215235B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040114800A1 (en) * 2002-09-12 2004-06-17 Baylor College Of Medicine System and method for image segmentation
US20190228097A1 (en) * 2018-01-23 2019-07-25 Vmware, Inc. Group clustering using inter-group dissimilarities
US20190272438A1 (en) * 2018-01-30 2019-09-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting text
CN110210485A (en) * 2019-05-13 2019-09-06 常熟理工学院 The image, semantic dividing method of Fusion Features is instructed based on attention mechanism
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN111209865A (en) * 2020-01-06 2020-05-29 中科鼎富(北京)科技发展有限公司 File content extraction method and device, electronic equipment and storage medium
CN111401376A (en) * 2020-03-12 2020-07-10 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
KIM, HH 等: "Text Detection with Deep Neural Network System Based on Overlapped Labels and a Hierarchical Segmentation of Feature Maps", INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, vol. 17, no. 6, pages 1599 - 1610, XP036795304, DOI: 10.1007/s12555-018-0578-8 *
PAN GAO 等: "Split and Merge: Component Based Segmentation Network for Text Detection", PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE. INTERNATIONAL CONFERENCE, ICPRAI 2020, vol. 12068, pages 14 *
顾军华;李炜;董永峰;: "基于点云数据的分割方法综述", 燕山大学学报, no. 02, pages 35 - 47 *
高攀: "基于深度学习的自然场景文本检测研究", 中国优秀硕士学位论文全文数据库信息科技辑, no. 10, pages 138 - 78 *

Also Published As

Publication number Publication date
CN112215235B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
Liu et al. Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting
TWI744283B (en) Method and device for word segmentation
US20240078646A1 (en) Image processing method, image processing apparatus, and non-transitory storage medium
CN111066063B (en) System and method for depth estimation using affinity for convolutional spatial propagation network learning
US20180114071A1 (en) Method for analysing media content
JP4928310B2 (en) License plate recognition device, control method thereof, computer program
Shi et al. Multiscale multitask deep NetVLAD for crowd counting
Xu et al. Fast vehicle and pedestrian detection using improved Mask R‐CNN
CN110796643A (en) Rail fastener defect detection method and system
US9014479B2 (en) Method and system for text-image orientation
CN103593464A (en) Video fingerprint detecting and video sequence matching method and system based on visual features
Shivakumara et al. Fractals based multi-oriented text detection system for recognition in mobile video images
US11836958B2 (en) Automatically detecting and isolating objects in images
CN112364873A (en) Character recognition method and device for curved text image and computer equipment
CN110210480B (en) Character recognition method and device, electronic equipment and computer readable storage medium
CN113344826A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113436222A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN111753714A (en) Multidirectional natural scene text detection method based on character segmentation
CN111242114A (en) Character recognition method and device
CN112215235A (en) Scene text detection method aiming at large character spacing and local shielding
US11875489B2 (en) Detecting hybdrid-distance adversarial patches
Mohanty et al. Robust scene text detection with deep feature pyramid network and CNN based NMS model
Zhang et al. Blind image quality assessment based on local quantized pattern
CN114550197A (en) Terminal strip image detection information matching method
CN116363656A (en) Image recognition method and device containing multiple lines of text and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant after: Shenzhen Huafu Technology Co.,Ltd.

Address before: 518000 Room 201, building A, No. 1, Qian Wan Road, Qianhai Shenzhen Hong Kong cooperation zone, Shenzhen, Guangdong (Shenzhen Qianhai business secretary Co., Ltd.)

Applicant before: SHENZHEN HUAFU INFORMATION TECHNOLOGY Co.,Ltd.

Country or region before: China

GR01 Patent grant