CN111062386A - Natural scene text detection method based on depth pyramid attention and feature fusion - Google Patents


Info

Publication number
CN111062386A
Authority
CN
China
Prior art keywords
feature
network
depth
conv5
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911192949.5A
Other languages
Chinese (zh)
Other versions
CN111062386B (en)
Inventor
贾世杰 (Jia Shijie)
冯宇静 (Feng Yujing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wonderroad Magnesium Technology Co Ltd
Original Assignee
Beijing Wonderroad Magnesium Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wonderroad Magnesium Technology Co Ltd filed Critical Beijing Wonderroad Magnesium Technology Co Ltd
Priority to CN201911192949.5A priority Critical patent/CN111062386B/en
Publication of CN111062386A publication Critical patent/CN111062386A/en
Application granted granted Critical
Publication of CN111062386B publication Critical patent/CN111062386B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/60: Type of objects
    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/63: Scene text, e.g. street names
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/25: Fusion techniques
    • G06F 18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a natural scene text detection method based on depth pyramid attention and feature fusion, namely a natural scene text detection algorithm that combines a depth pyramid attention network with feature fusion. It addresses two problems: first, that otherwise well-designed models cannot be fully exploited, which limits overall performance; and second, that long-range dependencies vanish as convolution deepens, because the convolution operation is based on local receptive fields. By combining feature fusion with the depth pyramid attention model, the method makes better use of the model, overcoming the shortcoming that many existing text detection models are structurally well designed yet under-utilized, and mitigating the loss of long-range dependencies caused by local receptive fields.

Description

Natural scene text detection method based on depth pyramid attention and feature fusion
Technical Field
The invention relates to a natural scene text detection method, and in particular to a natural scene text detection algorithm combining a depth pyramid attention network with a feature fusion technique.
Background
With the development of science and technology, demand for Internet products keeps growing, and more and more applications need the textual information contained in images. To recognize the text content of an image completely, text detection is the first and an extremely important step, and it directly affects the performance of text recognition.
Text detection in natural scenes must overcome background interference, highly variable aspect ratios, arbitrary text orientations, and the difficulty posed by small text, and is currently one of the most challenging topics in computer vision. By feature extraction approach, natural scene text detection divides into traditional methods and deep-learning-based methods. Scene pictures differ from document pictures in that they contain complex backgrounds and varying text angles, so traditional natural scene text detection methods alone can hardly separate text from background. Deep-learning text detection in natural scenes currently falls into two main categories: region-proposal-based methods and image-segmentation-based methods. Analysis of both shows that most models lack feature-level balancing, so the originally well-designed models cannot be fully exploited and overall performance is limited.
To make fuller use of the model, the invention proposes a new network that overcomes this under-utilization of otherwise well-designed models, and also addresses the loss of long-range dependencies that arises, as convolution deepens, from the local receptive field of the convolution operation.
Disclosure of Invention
The invention provides a natural scene text detection algorithm combining a depth pyramid attention network with feature fusion, solving the problem that an otherwise well-designed model cannot be fully exploited, which limits overall performance.
The technical scheme of the invention is as follows:
a natural scene text detection method based on depth pyramid attention and feature fusion comprises the following steps:
Step one, taking a public natural scene text data set as training samples;
Step two, feeding the training samples into a primary feature extraction network (the feature extraction network of PixelLink) in batches of 8 pictures; the backbone is a VGG16 network with a U-Net structure. The top-down path uses the VGG16 network, a deep network consisting of a series of 3 × 3 convolutions and max pooling. Stacking several small convolutions has two advantages over a single larger convolution kernel: fewer parameters and more non-linear transformations.
The bottom-up path is the upsampling stage; the upsampling uses bilinear interpolation.
To prevent the context information from being lost when the feature maps output by VGG16 are upsampled directly, lateral connections are employed: feature maps of the same spatial size from the top-down and bottom-up paths are fused, complementing the lost information and strengthening the feature representation after upsampling (a sketch of one such fusion step follows).
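Purely as an illustration, one such lateral fusion step might be sketched in TensorFlow 1.x (the framework used in the experiments below); the function name and the choice of element-wise addition as the fusion operation are assumptions of this sketch, not the literal implementation:

```python
import tensorflow as tf

def lateral_fuse(decoder_feat, encoder_feat):
    """One bottom-up step of the U-Net structure: bilinearly upsample
    the deeper decoder feature map to the spatial size of the encoder
    feature map from the top-down path, then fuse the two same-size
    maps to complement the context lost by direct upsampling.
    Assumes matching channel counts."""
    target_size = tf.shape(encoder_feat)[1:3]            # (H, W) to match
    upsampled = tf.image.resize_bilinear(decoder_feat, target_size)
    return upsampled + encoder_feat                      # lateral fusion
```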
Step three, taking the 4 feature map layers produced by the PixelLink feature extraction network, h4, h3, h2 and h1, upsampling all four to the size of h4 and averaging their pixel values, with the number of channels unchanged; this is called feature fusion. The upsampling uses bilinear interpolation. The feature fusion formula is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)

where Up×2(·) and Up×4(·) denote 2-fold and 4-fold enlargement, respectively;
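A minimal sketch of equation (1), assuming TensorFlow 1.x and that the four feature maps share the same channel count (as the text requires):

```python
import tensorflow as tf

def fuse_features(h4, h3, h2, h1):
    """Equation (1): bilinearly upsample h3 (x2), h2 (x4) and h1 (x4)
    to the spatial size of h4 and average the pixel values; the
    channel dimension is unchanged."""
    size = tf.shape(h4)[1:3]                       # e.g. 64 x 64
    h3_up = tf.image.resize_bilinear(h3, size)     # Up_x2(h3)
    h2_up = tf.image.resize_bilinear(h2, size)     # Up_x4(h2)
    h1_up = tf.image.resize_bilinear(h1, size)     # Up_x4(h1)
    return (h4 + h3_up + h2_up + h1_up) / 4.0      # F
```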
Step four, taking the output of the feature fusion as the input of the depth pyramid attention model, which further refines the features so that the model is used more fully;
The depth pyramid attention model consists of three branches: a depth feature pyramid network branch, a non-linear transformation branch, and a global average pooling branch. The invention does not simply add up the extracted information; the depth feature pyramid network branch performs a refinement. The branch uses 2 convolutions of 7 × 7, 2 convolutions of 5 × 5, and 2 convolutions of 3 × 3 to extract information at different pyramid scales; convolutions with the same kernel size are connected in series, while convolutions with different kernel sizes are connected in parallel. The left-half Conv7 × 7, BN, ReLU is labeled Conv7_1 and the right-half Conv7 × 7, BN is labeled Conv7_2; similarly, the left-half Conv5 × 5, BN, ReLU is labeled Conv5_1, the right-half Conv5 × 5, BN is labeled Conv5_2, the left-half Conv3 × 3, BN, ReLU is labeled Conv3_1, and the right-half Conv3 × 3, BN is labeled Conv3_2. The refinement proceeds as follows: the feature map after feature fusion is processed by Conv7_1, Conv5_1, Conv3_1 and Conv3_2 respectively. The feature map of Conv3_2 is then upsampled, added pixel-wise to the feature map of Conv5_1, and the sum is input to Conv5_2. Finally, the feature map of Conv5_2 is upsampled, added pixel-wise to the feature map of Conv7_1, and the sum is input to Conv7_2. The upsampling uses deconvolution with a 4 × 4 kernel and stride 2, followed by BN and a ReLU activation (a hedged sketch of this branch follows);
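As an illustration, a hedged TensorFlow 1.x sketch of the depth feature pyramid network branch is given below. The patent fixes the kernel sizes, the labeled left/right-half blocks, the pixel-wise additions, and the 4 × 4 stride-2 deconvolution, but not the channel counts or the scale arrangement; this sketch assumes 128 filters and stride-2 pyramid convolutions (so that the specified deconvolutions have coarser scales to restore), in the spirit of feature pyramid attention. Function names are illustrative; BN training flags are omitted for brevity.

```python
import tensorflow as tf

def conv_bn(x, filters, kernel, strides=1, relu=True):
    """Conv + BN, optionally followed by ReLU (the left-half blocks
    Conv7_1/Conv5_1/Conv3_1 use ReLU; the right-half blocks use BN only)."""
    x = tf.layers.conv2d(x, filters, kernel, strides=strides, padding='same')
    x = tf.layers.batch_normalization(x)
    return tf.nn.relu(x) if relu else x

def up_x2(x, filters):
    """Upsampling as specified: 4x4 deconvolution, stride 2, BN, ReLU."""
    x = tf.layers.conv2d_transpose(x, filters, 4, strides=2, padding='same')
    return tf.nn.relu(tf.layers.batch_normalization(x))

def depth_feature_pyramid(f, filters=128):
    # Coarsening path (stride 2 per level is an assumption of this sketch).
    c7_1 = conv_bn(f, filters, 7, strides=2)            # Conv7_1
    c5_1 = conv_bn(c7_1, filters, 5, strides=2)         # Conv5_1
    c3_1 = conv_bn(c5_1, filters, 3, strides=2)         # Conv3_1
    c3_2 = conv_bn(c3_1, filters, 3, relu=False)        # Conv3_2
    # Refining path: upsample, add pixel values, feed the right-half conv.
    c5_2 = conv_bn(c5_1 + up_x2(c3_2, filters), filters, 5, relu=False)  # Conv5_2
    c7_2 = conv_bn(c7_1 + up_x2(c5_2, filters), filters, 7, relu=False)  # Conv7_2
    return up_x2(c7_2, filters)  # restore the resolution of F (assumption)
```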
Step five, inputting the refined feature map into the PixelLink output network;
The PixelLink output network consists mainly of two parts: the first part predicts whether each pixel is text; the second part predicts whether the pixel and its 8 surrounding pixels belong to the same text instance. Positive pixels are joined through positive links into connected components, each component being one text instance (a sketch of the two prediction heads follows);
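A sketch of the two prediction heads is given below; the 1 × 1 convolutions and the channel layout (2 channels for text/non-text, 8 × 2 channels for the links) follow the original PixelLink paper and are assumed here rather than spelled out in the patent text:

```python
import tensorflow as tf

def pixellink_output(feat):
    """Two heads on the refined feature map:
    1) pixel head: is each pixel text or non-text (2-way softmax);
    2) link head: for each of the 8 neighbours, does the neighbour
       belong to the same text instance (2-way softmax per direction)."""
    pixel_logits = tf.layers.conv2d(feat, 2, 1)          # text / non-text
    link_logits = tf.layers.conv2d(feat, 16, 1)          # 8 directions x 2
    pixel_scores = tf.nn.softmax(pixel_logits)
    s = tf.shape(link_logits)
    link_scores = tf.nn.softmax(
        tf.reshape(link_logits, [s[0], s[1], s[2], 8, 2]))
    return pixel_scores, link_scores
```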
Step six, finally, obtaining the final connected domains from the segmented text instances via minAreaRect in OpenCV's connected-domain method; connected regions whose shortest side is under 10 pixels or whose area is under 300 pixels are treated as false detections and automatically filtered out, and the bounding boxes are output at the end (a post-processing sketch follows).
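A post-processing sketch with OpenCV follows; the instance-mask input format and the helper name are assumptions, while the minAreaRect call and the thresholds (shortest side under 10 pixels, area under 300 pixels) come from the text. Note that cv2.boxPoints is the OpenCV 3 name; OpenCV 2 exposes it as cv2.cv.BoxPoints.

```python
import cv2
import numpy as np

def extract_boxes(instance_mask, min_side=10, min_area=300):
    """For each segmented text instance, take the minimum-area rotated
    rectangle and drop it as a false detection when its shortest side
    is under 10 pixels or its area under 300 pixels."""
    boxes = []
    for label in np.unique(instance_mask):
        if label == 0:                     # 0 is assumed to be background
            continue
        ys, xs = np.where(instance_mask == label)
        points = np.stack([xs, ys], axis=1).astype(np.float32)
        (cx, cy), (w, h), angle = cv2.minAreaRect(points)
        if min(w, h) < min_side or w * h < min_area:
            continue                       # filtered as false detection
        boxes.append(cv2.boxPoints(((cx, cy), (w, h), angle)))
    return boxes                           # each entry: 4 corner points
```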
The invention has the beneficial effects that:
(1) Feature fusion and the depth pyramid attention model together improve the utilization of the model, overcoming the shortcoming that many existing text detection models are structurally well designed yet cannot be fully exploited, which limits overall performance.
(2) The problem that long-range dependencies disappear as convolution deepens, because the convolution operation is based on local receptive fields, is alleviated.
(3) The method is effective for multi-scale text.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of the overall network architecture of the present invention.
FIG. 3 is a partial schematic diagram of a depth pyramid attention network structure.
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
As shown in fig. 1, the following steps are specifically described:
Step one, taking the training split of a public natural scene text data set as training samples;
Step two, using the feature extraction network of PixelLink as the primary feature extraction network; the backbone is a VGG16 network with a U-Net structure.
The U-Net consists of a top-down path, a bottom-up path, and lateral connections.
(1) The top-down path uses the VGG16 network, a deep network consisting of a series of 3 × 3 convolutions and max pooling. Stacking several small convolutions requires fewer parameters and provides more non-linear transformations than a single larger convolution kernel.
(2) The bottom-up path is the upsampling stage; the upsampling uses bilinear interpolation.
(3) To prevent the context information from being lost when the feature maps output by VGG16 are upsampled directly, lateral connections are employed: feature maps of the same spatial size from the top-down and bottom-up paths are fused, complementing the lost information and strengthening the feature representation after upsampling.
Step three, taking the 4 feature map layers produced by the PixelLink feature extraction network, h4, h3, h2 and h1, upsampling them to the size of h4 and averaging their pixel values, with the number of channels unchanged; this process is called feature fusion. The upsampling uses bilinear interpolation. The feature fusion formula is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)

where Up×2(·) and Up×4(·) denote 2-fold and 4-fold enlargement, respectively;
(1) Owing to hardware constraints, the training pictures are 256 × 256; accordingly h4 is 64 × 64, h3 is 32 × 32, h2 is 16 × 16, and h1 is 16 × 16.
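These sizes are consistent with the downsampling strides at which PixelLink taps its feature maps; the quick check below assumes strides of 4, 8, 16 and 16 for h4, h3, h2 and h1 respectively (h1 and h2 share stride 16 because PixelLink's converted VGG16 does not downsample after conv5; this reading is an assumption):

```python
# Spatial size of each feature map layer for a 256 x 256 input,
# given the assumed VGG16/PixelLink downsampling strides.
input_size = 256
strides = {'h4': 4, 'h3': 8, 'h2': 16, 'h1': 16}
for name, stride in strides.items():
    print(name, input_size // stride)   # h4 64, h3 32, h2 16, h1 16
```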
Step four, taking the output of the feature fusion as the input of the depth pyramid attention network, further refining the features and using the model more fully.
(1) The depth pyramid attention network consists of a depth feature pyramid network branch, a non-linear transformation branch, and a global average pooling branch. The design effort concentrates on the depth feature pyramid network branch: the branch features are not merely fused; each part within the branch refines the features further.
Step five, inputting the refined feature map into the PixelLink output network.
(1) The output network consists mainly of two parts. The first part predicts whether each pixel is text or non-text; the second part predicts whether the pixel and its 8 surrounding pixels belong to the same text instance. Positive pixels are joined through positive links into connected components, each component being one text instance.
Step six, finally, obtaining the final connected domains from the segmented text instances via minAreaRect in OpenCV's connected-domain method. Because this method is sensitive to noise and may predict noise as real text, thresholds are set to reduce false positives: connected regions whose shortest side is under 10 pixels or whose area is under 300 pixels are treated as false detections and automatically filtered out, and the bounding boxes are output at the end.
The invention is characterized in that, by combining feature fusion with the depth pyramid attention model, the utilization of the model is improved, overcoming the shortcomings that many existing text detection models are structurally well designed yet cannot be fully exploited, and that the convolution operation, being based on local receptive fields, loses long-range dependencies as convolution deepens.
The embodiments of the present invention are described in detail below with reference to the accompanying drawings. They are implemented on the premise of the technical solution of the invention and give detailed implementations and concrete operating procedures, but the scope of the invention is not limited to the following examples.
The experiments use the ICDAR2015 and ICDAR2013 data sets. The ICDAR2015 data set contains 1500 natural scene pictures at a resolution of 1280 × 720, of which 1000 are training pictures and 500 are test pictures. They differ from the images of earlier ICDAR competitions in that they were captured mainly with Google Glass, without deliberate framing, so the text may be tilted or blurred; the aim is to increase the difficulty of detection.
ICDAR2013 contains 229 training pictures and 233 test pictures. This data set is a subset of ICDAR2011 in which duplicate pictures were removed and labeling errors were corrected. It is widely used for text detection but contains only horizontal text.
The experiments were run on a computer with an Intel(R) Core i7-6700 CPU @ 3.40 GHz under Linux Ubuntu 14.04, using PyCharm and Python 2.7. The deep learning framework is tensorflow-gpu == 1.3.0, and the main libraries required are OpenCV 2, setprogram and matplotlib.
ICDAR2015 experiment: the training pictures from the ICDAR2015 data set are input at 256 × 256, and the test pictures are evaluated at 1280 × 704. The evaluation uses the R (recall), P (precision) and F values published for the ICDAR2015 challenge; the sketch below shows how F is derived from P and R.
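For reference, the F value is the harmonic mean of precision P and recall R; a one-line check reproduces the tabulated numbers:

```python
def f_value(p, r):
    """ICDAR-style F value: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Model of the invention on ICDAR2015 (Table 1): P = 0.7595, R = 0.7708
print(round(f_value(0.7595, 0.7708), 4))   # -> 0.7651
```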
Table 1 lists the R, P, F values of the model of the invention and of PixelLink on the ICDAR2015 data set:

TABLE 1  ICDAR2015 multi-oriented text detection results

Model                    Recall    Precision    F value
Model of the invention   0.7708    0.7595       0.7651
PixelLink                0.7299    0.7607       0.7450
ICDAR2013 experiment: the training pictures from the ICDAR2013 data set are input at 256 × 256, and the test pictures are evaluated at 384 × 384. The evaluation uses the R, P, F values of the protocol published for the ICDAR2013 challenge.
Table 2 lists the R, P, F values of the model of the invention and of PixelLink on the ICDAR2013 data set:

TABLE 2  ICDAR2013 horizontal text detection results

Model                    Recall    Precision    F value
Model of the invention   0.8168    0.7041       0.7563
PixelLink                0.6919    0.7508       0.7201

Claims (1)

1. A natural scene text detection method based on depth pyramid attention and feature fusion is characterized by comprising the following steps:
step one, taking a public natural scene text data set as training samples;
step two, inputting the training samples into a primary feature extraction network in batches of 8 pictures, wherein the backbone is a VGG16 network with a U-Net structure; the primary feature extraction network is the feature extraction network of PixelLink;
step three, taking the 4 feature map layers produced by the PixelLink feature extraction network, h4, h3, h2 and h1, upsampling them to the size of h4 and averaging their pixel values, with the number of channels unchanged, which is called feature fusion; the upsampling uses bilinear interpolation; the feature fusion formula is:
F = (h4 + Up×2(h3) + Up×4(h2) + Up×4(h1)) / 4    (1)

where Up×2(·) and Up×4(·) denote 2-fold and 4-fold enlargement, respectively;
step four, taking the output of the feature fusion as the input of the depth pyramid attention model, which further refines the features so that the model is used more fully;
the depth pyramid attention model consists of three branches: a depth feature pyramid network branch, a non-linear transformation branch and a global average pooling branch; the depth feature pyramid network branch uses 2 convolutions of 7 × 7, 2 convolutions of 5 × 5 and 2 convolutions of 3 × 3 so as to extract information at different pyramid scales; convolutions with the same kernel size are connected in series, and convolutions with different kernel sizes are connected in parallel; the left-half Conv7 × 7, BN, ReLU is marked as Conv7_1 and the right-half Conv7 × 7, BN is marked as Conv7_2; similarly, the left-half Conv5 × 5, BN, ReLU is marked as Conv5_1, the right-half Conv5 × 5, BN is marked as Conv5_2, the left-half Conv3 × 3, BN, ReLU is marked as Conv3_1, and the right-half Conv3 × 3, BN is marked as Conv3_2; the refinement proceeds as follows: the feature map after feature fusion is processed by Conv7_1, Conv5_1, Conv3_1 and Conv3_2 respectively; the feature map of Conv3_2 is then upsampled, added pixel-wise to the feature map of Conv5_1, and the sum is input to Conv5_2; finally, the feature map of Conv5_2 is upsampled, added pixel-wise to the feature map of Conv7_1, and the sum is input to Conv7_2; the upsampling uses deconvolution with a 4 × 4 kernel and stride 2, followed by BN and ReLU activation;
step five, inputting the refined feature map into the PixelLink output network;
the PixelLink output network consists of two parts: the first part predicts whether each pixel is text; the second part predicts whether the pixel and its 8 surrounding pixels belong to the same text instance; positive pixels are joined through positive links into connected components, each component being one text instance;
step six, finally, obtaining the final connected domains from the segmented text instances via minAreaRect in OpenCV's connected-domain method; connected regions whose shortest side is under 10 pixels or whose area is under 300 pixels are treated as false detections and automatically filtered out, and the bounding boxes are output at the end.
CN201911192949.5A 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion Active CN111062386B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911192949.5A CN111062386B (en) 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911192949.5A CN111062386B (en) 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion

Publications (2)

Publication Number Publication Date
CN111062386A 2020-04-24
CN111062386B (en) 2023-12-29

Family

ID=70299270

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911192949.5A Active CN111062386B (en) 2019-11-28 2019-11-28 Natural scene text detection method based on depth pyramid attention and feature fusion

Country Status (1)

Country Link
CN (1) CN111062386B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130204A1 (en) * 2017-10-31 2019-05-02 The University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image
WO2019192397A1 (en) * 2018-04-04 2019-10-10 Huazhong University of Science and Technology End-to-end recognition method for scene text in any shape
CN109325534A (en) * 2018-09-22 2019-02-12 Tianjin University A kind of semantic segmentation method based on two-way multi-Scale Pyramid
CN110097049A (en) * 2019-04-03 2019-08-06 Institute of Computing Technology, Chinese Academy of Sciences A kind of natural scene Method for text detection and system
CN110287960A (en) * 2019-07-02 2019-09-27 Institute of Information Engineering, Chinese Academy of Sciences The detection recognition method of curve text in natural scene image

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
乔文凡; 慎利; 戴延帅; 曹云刚: "Automatic building identification in high-resolution images combining dilated-convolution residual networks and pyramid pooling" (联合膨胀卷积残差网络和金字塔池化表达的高分影像建筑物自动识别), Geography and Geo-Information Science (地理与地理信息科学), no. 05 *
常宇飞; 陈欣鹏; 王远航; 钱冰: "Scene text detection based on feature pyramids" (基于特征金字塔的场景文本检测), Journal of Information Engineering University (信息工程大学学报), no. 05 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111753714B (en) * 2020-06-23 2023-09-01 Central South University Multidirectional natural scene text detection method based on character segmentation
CN111753714A (en) * 2020-06-23 2020-10-09 Central South University Multidirectional natural scene text detection method based on character segmentation
CN111898570A (en) * 2020-08-05 2020-11-06 Yancheng Institute of Technology Method for recognizing text in image based on bidirectional feature pyramid network
CN112257708A (en) * 2020-10-22 2021-01-22 Runlian Software *** (Shenzhen) Co., Ltd. Character-level text detection method and device, computer equipment and storage medium
CN112613561A (en) * 2020-12-24 2021-04-06 Harbin University of Science and Technology EAST algorithm optimization method
CN113744279A (en) * 2021-06-09 2021-12-03 Northeastern University Image segmentation method based on FAF-Net network
CN113744279B (en) * 2021-06-09 2023-11-14 Northeastern University Image segmentation method based on FAF-Net network
CN113609892A (en) * 2021-06-16 2021-11-05 Beijing University of Technology Handwritten poetry recognition method integrating deep learning with scenic spot knowledge map
CN113743291A (en) * 2021-09-02 2021-12-03 Nanjing University of Posts and Telecommunications Method and device for detecting text in multiple scales by fusing attention mechanism
CN113743291B (en) * 2021-09-02 2023-11-07 Nanjing University of Posts and Telecommunications Method and device for detecting texts in multiple scales by fusing attention mechanisms
CN115471831A (en) * 2021-10-15 2022-12-13 China University of Mining and Technology Image significance detection method based on text reinforcement learning
CN115471831B (en) * 2021-10-15 2024-01-23 China University of Mining and Technology Image saliency detection method based on text reinforcement learning
CN113822232A (en) * 2021-11-19 2021-12-21 Huazhong University of Science and Technology Pyramid attention-based scene recognition method, training method and device

Also Published As

Publication number Publication date
CN111062386B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN111062386B (en) Natural scene text detection method based on depth pyramid attention and feature fusion
CN110428432B (en) Deep neural network algorithm for automatically segmenting colon gland image
WO2017148265A1 (en) Word segmentation method and apparatus
CN112232391B (en) Dam crack detection method based on U-net network and SC-SAM attention mechanism
CN110399840B (en) Rapid lawn semantic segmentation and boundary detection method
CN112767418B (en) Mirror image segmentation method based on depth perception
CN111275034B (en) Method, device, equipment and storage medium for extracting text region from image
Hou et al. BSNet: Dynamic hybrid gradient convolution based boundary-sensitive network for remote sensing image segmentation
CN112465759A (en) Convolutional neural network-based aeroengine blade defect detection method
CN114742799B (en) Industrial scene unknown type defect segmentation method based on self-supervision heterogeneous network
EP3895122A1 (en) Systems and methods for automated cell segmentation and labeling in immunofluorescence microscopy
CN110751154A (en) Complex environment multi-shape text detection method based on pixel-level segmentation
CN111986164A (en) Road crack detection method based on multi-source Unet + Attention network migration
Liu et al. Multi-component fusion network for small object detection in remote sensing images
Liu et al. Asflow: Unsupervised optical flow learning with adaptive pyramid sampling
Chen et al. A refined single-stage detector with feature enhancement and alignment for oriented objects
El Abbadi Scene Text detection and Recognition by Using Multi-Level Features Extractions Based on You Only Once Version Five (YOLOv5) and Maximally Stable Extremal Regions (MSERs) with Optical Character Recognition (OCR)
Gui et al. A fast caption detection method for low quality video images
CN110472490A (en) Based on the action identification method and device, storage medium and terminal for improving VGGNet
CN115565034A (en) Infrared small target detection method based on double-current enhanced network
Cloppet et al. Adaptive fuzzy model for blur estimation on document images
CN114332493A (en) Cross-dimension interactive significance detection model and detection method thereof
Callier et al. Automatic road area extraction from printed maps based on linear feature detection
CN112861860A (en) Natural scene lower word detection method based on upper and lower boundary extraction
Lad et al. LDWS-net: a learnable deep wavelet scattering network for RGB salient object detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant