CN113255669A - Method and system for detecting text of natural scene with any shape - Google Patents
Method and system for detecting text of natural scene with any shape
- Publication number: CN113255669A
- Application number: CN202110715820.9A
- Authority
- CN
- China
- Legal status
- Granted
Classifications
- G06V20/63 — Scene text, e.g. street names
- G06F18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Combinations of networks
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a method and a system for detecting natural scene text of arbitrary shape, comprising the following steps: acquiring a text image to be detected; inputting the image to be detected into a trained detection model to obtain final detection boxes; and post-processing the obtained final detection boxes to form text regions. The detection model screens candidate detection boxes by both the classification score and the mask score to obtain the final detection boxes. The invention further designs a mask attention module that connects the mask generation process with the mask quality scoring process; the mask attention module has a positive effect on the prediction of the mask score.
Description
Technical Field
The invention relates to the technical field of natural scene text detection, and in particular to a method and a system for detecting natural scene text of arbitrary shape.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
As the most direct medium of information dissemination, text appears in every corner of our daily lives. Owing to its potential application value in automatic driving, navigation for the blind, information retrieval and other areas, the task of understanding text in natural scenes has attracted increasing attention. In general, natural scene text understanding involves two steps: text detection and text recognition. The first step locates text regions; the second step recognizes the content in those regions. As the precursor of the text understanding task, text detection occupies a crucial position.
Traditional text detection methods are mainly based on connected-region analysis or sliding windows. Both rely on hand-crafted features: they work in some simple scenes but fail in complex ones. In recent years, improvements in computer performance have given deep learning unprecedented opportunities for development, and deep-learning-based text detection has advanced rapidly as well. Deep-learning-based text detection falls mainly into two categories: regression-based methods and segmentation-based methods. Regression-based methods can detect horizontal or multi-oriented text, while segmentation-based methods can detect text of arbitrary shape, so segmentation-based methods currently dominate natural scene text detection.
One of the mainstream segmentation-based approaches is instance segmentation. Such methods typically first use a horizontal candidate box (proposal) to locate a region; a classification score is then generated to determine whether the region enclosed by the candidate box belongs to text, and a segmentation mask is generated to delineate the text region. These methods typically use the classification score as the only criterion for evaluating the quality of a predicted candidate box, which can lead to serious false positive problems. The false positive problems can be divided into three categories:
(1) False positives caused by classification. As shown in fig. 2(a), some regions in natural scenes have features similar to text, such as graffiti on a wall, lines in a book, or cracks in a road surface; these regions may be mistakenly classified as text, producing false positive samples.
(2) False positives caused by regression. As shown in fig. 2(b), for long text or text with large character spacing (such as Chinese), a candidate box may contain only a partial text segment, and an incomplete text segment can cause ambiguity in the subsequent recognition module.
(3) False positives caused by segmentation. As shown in fig. 2(c), for irregular text, a horizontal candidate box may contain a large amount of background noise, so the final segmented mask may not represent the text region well.
For systems with high precision requirements, false positive samples can cause immeasurable losses: many systems would rather miss a detection than report a wrong one, and false positive samples in the detection result can have a fatal influence on the recognition result. For example, in automatic driving, if only the second half of a 'no parking' sign is detected, a vehicle may park illegally; in information retrieval, if only the first half of a query term such as 'football' is detected, the retrieved results may differ greatly from the ideal ones.
Disclosure of Invention
To overcome the defects of the prior art, the invention provides a method and a system for detecting natural scene text of arbitrary shape.
In a first aspect, the invention provides a method for detecting natural scene text of arbitrary shape.
The method for detecting natural scene text of arbitrary shape comprises:
acquiring a text image to be detected;
inputting the image to be detected into a trained detection model to obtain final detection boxes; post-processing the obtained final detection boxes to form text regions;
wherein the detection model screens candidate detection boxes by the classification score and the mask score to obtain the final detection boxes.
In a second aspect, the invention provides a system for detecting natural scene text of arbitrary shape.
The system for detecting natural scene text of arbitrary shape comprises:
an acquisition module configured to: acquire a text image to be detected;
a detection module configured to: input the image to be detected into a trained detection model to obtain final detection boxes, and post-process the obtained final detection boxes to form text regions;
wherein the detection model screens candidate detection boxes by the classification score and the mask score to obtain the final detection boxes.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention analyzes and summarizes the false positive problems of conventional instance-segmentation-based text detection methods, and proposes a mask quality scoring mechanism to suppress false positive examples.
2. Based on the proposed mask quality scoring mechanism, the invention designs a new method for detecting natural scene text of arbitrary shape, which can suppress all types of false positive samples in a simple and unified manner.
3. The invention designs a mask attention module that connects the mask generation process with the mask quality scoring process; the mask attention module has a positive effect on the prediction of the mask score.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention without limiting it.
FIG. 1 is a flow chart of a method of the first embodiment;
FIG. 2(a) is a graph of false positives resulting from classification according to the first embodiment;
FIG. 2(b) is a false positive resulting from the regression of the first embodiment;
FIG. 2(c) is a graph of false positives resulting from the segmentation of the first embodiment;
FIG. 2(d) is a sample of the true positive of the first embodiment;
FIG. 3 is a detailed structure of the MAM of the first embodiment;
fig. 4 is a detailed structure of the Mask head of the first embodiment.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
All data are obtained according to the embodiment and are legally applied on the data on the basis of compliance with laws and regulations and user consent.
To solve the above false positive problems in a unified way, the invention designs an arbitrary-shape text detection method based on a mask quality scoring mechanism. Unlike previous methods, which evaluate the quality of a candidate box using only its classification score, in the model designed by the invention whether a candidate box is kept is determined jointly by its classification score and its mask score. With this mechanism, the model can evaluate the quality of candidate boxes more reasonably, and false positive samples are more likely to be found and filtered out. The overall framework of the model is shown in fig. 1. The model consists of four parts: a backbone network (Backbone), a region proposal network (RPN), a bounding box module (Box head) and a mask module (Mask head). The bounding box module (Box head) comprises two sequentially connected fully connected layers.
Example one
The embodiment provides a method for detecting a natural scene text in any shape;
as shown in fig. 1, the method for detecting a text in a natural scene with an arbitrary shape includes:
S1: acquiring a text image to be detected;
S2: inputting the image to be detected into a trained detection model to obtain final detection boxes; post-processing the obtained final detection boxes to form text regions;
and the detection model screens the candidate detection frames through the classification score and the mask score to obtain the final detection frame.
Further, the S2: inputting the image to be detected into the trained detection model to obtain a final detection frame; the method specifically comprises the following steps:
s21: carrying out feature extraction on the image to be detected;
s22: constructing an initial candidate frame based on the extracted image features;
s23: generating an initial candidate frame feature based on the initial candidate frame; predicting a classification score of the candidate box based on the initial candidate box feature; meanwhile, performing frame regression on the initial candidate frame, and adjusting the size and the position of the initial candidate frame to obtain an adjusted candidate frame;
s24: generating characteristics of the adjusted candidate frame for the adjusted candidate frame;
expanding the adjusted candidate frame to obtain an expanded candidate frame;
for the expansion candidate frame, generating an expansion candidate frame characteristic;
s25: generating a mask for the adjusted candidate frame based on the features of the adjusted candidate frame and the expanded candidate frame features; evaluating the mask quality to obtain a mask score;
s26: and screening the adjusted candidate frames through the classification score and the mask score to form a final detection frame.
Further, the S21: carrying out feature extraction on the image to be detected; the method specifically comprises the following steps:
A deep residual network (ResNet) is adopted as the backbone to extract features of the image to be detected, and a Feature Pyramid Network (FPN) is used to enhance the feature representation.
Further, the S22: constructing an initial candidate frame based on the extracted image features; the method specifically comprises the following steps:
inputting the extracted features into a candidate Region generation Network (RPN) to obtain a constructed initial candidate frame.
Illustratively, the RPN outputs several horizontal candidate boxes. A candidate box can be expressed as b = (x, y, w, h), where (x, y) is the coordinate of the upper-left corner of b, and w and h are the width and height of b.
further, the S23: generating an initial candidate frame feature based on the initial candidate frame; the method specifically comprises the following steps:
and generating initial candidate frame features by adopting a candidate region alignment operation RoIAlign based on the initial candidate frame and the extracted image features.
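The patent relies on RoIAlign to pool candidate-box features into a fixed-size tensor. As a rough illustration of that interface only, the toy function below crops a box from an (H, W, C) feature map and resizes it with nearest-neighbor sampling; this is an intentional simplification, not the real RoIAlign, which averages bilinear samples per bin:

```python
import numpy as np

def toy_roi_pool(feature, box, out_size=7):
    """Crude stand-in for RoIAlign: crop `box` from an (H, W, C) feature map
    and resize it to (out_size, out_size, C) with nearest-neighbor sampling.
    box = (x, y, w, h) with (x, y) the upper-left corner."""
    H, W, _ = feature.shape
    x, y, w, h = box
    # Sample one point at the center of each output bin, clipped to the map.
    ys = np.clip((y + (np.arange(out_size) + 0.5) * h / out_size).astype(int), 0, H - 1)
    xs = np.clip((x + (np.arange(out_size) + 0.5) * w / out_size).astype(int), 0, W - 1)
    return feature[np.ix_(ys, xs)]
```

The real operation (e.g. `torchvision.ops.roi_align`) differs in its sampling, but shares the key property shown here: every candidate box, whatever its size, maps to a fixed-size feature.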
Further, the S23: predicting a classification score of the candidate box based on the initial candidate box feature; meanwhile, performing frame regression on the initial candidate frame, and adjusting the size and the position of the initial candidate frame to obtain an adjusted candidate frame; the method specifically comprises the following steps:
reducing the dimension of the initial candidate box features through two fully connected layers, and sending the reduced features to a classification branch and a regression branch respectively;
the classification branch is a fully connected layer with a two-dimensional output vector, from which the classification score is computed;
the regression branch is a fully connected layer with a four-dimensional output vector, from which bounding box regression is applied to the initial candidate box.
Illustratively, S23 specifically comprises:
S233: the reduced feature passes through a fully connected layer and outputs a two-dimensional vector (s_t, s_b), where s_t is the output for the text category and s_b is the output for the background category.
The classification score of a candidate box is used to determine whether the region enclosed by the box belongs to text; it is obtained by normalizing the two outputs, e.g. S_cls = exp(s_t) / (exp(s_t) + exp(s_b)).
S234: the reduced feature simultaneously passes through another fully connected layer and outputs a four-dimensional vector (d_x, d_y, d_w, d_h) for bounding box regression; through the regression branch, the initial candidate box b is adjusted in position and size to form the adjusted candidate box b'.
Further, the S24: generating characteristics of the adjusted candidate frame for the adjusted candidate frame; the method specifically comprises the following steps:
and generating the feature of the adjusted candidate frame by adopting a candidate region alignment operation RoIAlign based on the adjusted candidate frame and the image feature extracted in the S22.
Further, the S24: expanding the adjusted candidate frame to obtain an expanded candidate frame; the method specifically comprises the following steps:
and adopting the extension operation extension to form an extended candidate frame for the adjusted candidate frame.
The extension operation keeps the center of the candidate box unchanged and expands its width and height by an expansion factor γ; in practice, γ is typically set to 2.
Meanwhile, the extension operation ensures that the expanded candidate box does not exceed the image boundary.
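The extension operation can be sketched in a few lines of Python; clipping the expanded box to the image rectangle is one plausible reading of "does not exceed the image boundary":

```python
def expand_box(box, image_w, image_h, gamma=2.0):
    """Expand a candidate box about its center by factor gamma, clipped to the image.
    box = (x, y, w, h) with (x, y) the upper-left corner."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2            # center stays fixed
    new_w, new_h = w * gamma, h * gamma      # width and height scaled by gamma
    x0 = max(0.0, cx - new_w / 2)            # clip to the image boundary
    y0 = max(0.0, cy - new_h / 2)
    x1 = min(float(image_w), cx + new_w / 2)
    y1 = min(float(image_h), cy + new_h / 2)
    return (x0, y0, x1 - x0, y1 - y0)
```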
Further, the S24: generating an expansion candidate frame characteristic for the expansion candidate frame; the method specifically comprises the following steps:
and for the expansion candidate frame, generating an expansion candidate frame characteristic by adopting a candidate region alignment operation RoIAlign.
Further, S25 (generating a mask for the adjusted candidate box based on the adjusted candidate box features and the expanded candidate box features; evaluating the mask quality to obtain a mask score) specifically comprises:
The adjusted candidate box features and the expanded candidate box features are input into the mask module (Mask head), which comprises two workflows: a mask generation stream and a mask score stream. The two workflows are connected through Mask Attention Modules (MAM).
The upper workflow in the Mask head of fig. 4 is the mask generation stream, which takes the adjusted candidate box feature as input and outputs a mask. It comprises convolutional layer C1, convolutional layer C2 and a deconvolution layer; the features of the deconvolution layer are reduced in dimension through convolutional layer C3, which has a 1x1 convolution kernel, and the segmentation mask is output.
The lower workflow in the Mask head of fig. 4 is the mask score stream, which takes the adjusted candidate box features and the expanded candidate box features as input and outputs a mask score. The mask score stream first fuses the input features with two convolutional layers and two Mask Attention Modules (MAM); it comprises convolutional layer D1, convolutional layer D2, convolutional layer D3, fully connected layer FC1, fully connected layer FC2 and fully connected layer FC3.
In fig. 4, the feature output by the Mask Attention Module (MAM) and the mask output by the mask generation stream are stacked, fed into convolutional layer D3 and then fully connected layers FC1, FC2 and FC3, and the predicted mask score is output.
Further, S26 (screening the adjusted candidate boxes by the classification score and the mask score to obtain the final detection boxes) specifically comprises:
S261: all adjusted candidate boxes are first de-duplicated by non-maximum suppression (NMS);
S262: the quality of each candidate box is then scored by the product of its classification score and its predicted mask score, S = S_cls × S_mask; candidate boxes with S smaller than 0.5 are filtered out, and the retained candidate boxes form the final detection boxes;
S263: the largest connected region in the mask of each retained detection box is selected as the final detection result.
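The screening steps can be sketched as below. The product form S = S_cls × S_mask and the 0.5 threshold follow the text; the IoU threshold and the greedy, score-ordered NMS are assumptions of this sketch:

```python
def iou(a, b):
    # Intersection-over-union for boxes given as (x, y, w, h).
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    return inter / (aw * ah + bw * bh - inter)

def screen_boxes(boxes, cls_scores, mask_scores, iou_thr=0.5, score_thr=0.5):
    """Return indices of kept boxes: greedy NMS plus the product-score filter."""
    final = [c * m for c, m in zip(cls_scores, mask_scores)]  # S = S_cls * S_mask
    order = sorted(range(len(boxes)), key=lambda i: final[i], reverse=True)
    keep = []
    for i in order:
        if final[i] < score_thr:
            continue  # filter out likely false positives (S < 0.5)
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in keep):
            keep.append(i)  # suppress near-duplicates of an already-kept box
    return keep
```

Note how a box with a high classification score but a low mask score (a likely false positive of the kinds described in the Background) is rejected by the product, which is the point of the mask quality scoring mechanism.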
Further, the structure of the detection model comprises:
a backbone network (Backbone), whose input is the text image to be detected;
the output of the backbone network is connected to the input of the region proposal network (RPN);
the output of the region proposal network (RPN) is connected to the input of the RoIAlign layer; the output of the RoIAlign layer is connected to the input of the bounding box module (Box head), which comprises two sequentially connected fully connected layers;
the output of the RoIAlign layer is also connected to the input of the Mask head module.
Further, the Mask head module comprises: two parallel working branches: a first branch and a second branch;
wherein, first branch road includes: a convolutional layer C1 and a convolutional layer C2 connected in sequence; an input terminal of convolutional layer C1 for inputting the characteristics of the adjusted candidate frame;
wherein, the second branch road includes: a convolutional layer D1 and a convolutional layer D2 connected in this order; the input end of the convolution layer D1 is used for inputting the splicing value of the adjusted candidate frame characteristic and the expanded candidate frame characteristic;
the output terminal of convolutional layer C2 is connected to the first input terminal of the first Mask Attention Module (MAM);
the output end of the convolutional layer D2 is connected with the second input end of the first mask attention module;
the first output end of the first mask attention module is connected with the first input end of the second mask attention module;
the second output end of the first mask attention module is connected with the second input end of the second mask attention module;
a first output end of the second mask attention module is connected with an input end of the deconvolution layer, an output end of the deconvolution layer is connected with an input end of the convolution layer C3, and an output end of the convolution layer C3 generates a predicted mask;
the second output end of the second mask attention module is connected with the input end of the convolutional layer D3, the output end of the convolutional layer C3 is connected with the input end of the convolutional layer D3, the characteristics of the output end of the convolutional layer D3 are connected with three full-connection layers after being subjected to size adjustment, and the last full-connection layer outputs a mask score.
Illustratively, the Mask head module is divided into two workflows: the mask generation stream and the mask score stream. The mask generation stream uses the adjusted candidate box features to generate a corresponding mask; the mask score stream evaluates the mask quality using both the adjusted candidate box features and the expanded candidate box features, since the expanded candidate box contains more peripheral information that helps predict the mask quality.
Illustratively, the Mask head module specifically works as follows:
Step (1): the adjusted candidate box feature F_a passes through two convolutional layers to form the mask generation stream feature F_g;
Step (2): the adjusted candidate box feature F_a and the expanded candidate box feature F_e are concatenated and passed through two convolutional layers to form the mask score stream feature F_s;
Step (3): F_g and F_s are fed into the first Mask Attention Module (MAM); the first MAM makes the mask score stream focus on the regions contained in the mask, so as to predict the mask quality more accurately. The detailed structure of the MAM is shown in fig. 3;
Step (4): the features of the two workflows pass through a second mask attention module;
Step (5): the mask generation stream feature passes through a deconvolution layer and a convolutional layer to generate the predicted mask M;
Step (6): the mask score stream feature and M are stacked and passed through a convolutional layer and three fully connected layers to output the predicted mask score S_mask.
The specific process of the step (3) is as follows:
Step (3.1): the mask generation stream feature F_g passes through a convolutional layer to generate a staged mask that serves as an attention map: A = Conv_1x1(F_g), where Conv_1x1 denotes a convolutional layer with a 1x1 convolution kernel and A is a single-channel attention map. Regions of A whose response values are above a set threshold indicate regions attended to during segmentation; regions whose response values are below the threshold indicate regions not attended to;
Step (3.2): the representation of the attended regions is enhanced on the mask score stream feature F_s: F_s' = F_s ⊙ E(A), where E is a function that expands the number of channels of a feature map; in practice, A is copied so that its channel count is extended from 1 to that of F_s;
Step (3.3): the response values of unattended regions in A are usually very low, so in F_s' the responses of these regions are greatly suppressed. To avoid losing information about the whole region, the original feature is added back: F_s'' = F_s' + F_s;
Step (3.4): F_g and F_s'' each pass through a convolutional layer to fuse features and obtain the respective output values.
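Steps (3.1)–(3.3) can be sketched numerically as follows. The sigmoid on the attention map and the omission of the fusion convolutions of step (3.4) are simplifications of this sketch, not details fixed by the text; numpy broadcasting plays the role of the channel-expansion function E(·):

```python
import numpy as np

def conv1x1(feat, weight, bias):
    # A 1x1 convolution is a per-pixel linear map over channels: (H, W, Cin) -> (H, W, Cout).
    return feat @ weight + bias

def mask_attention_module(f_gen, f_score, w_att, b_att):
    """One MAM step on (H, W, C) features.
    f_gen: mask generation stream feature; f_score: mask score stream feature."""
    # Step (3.1): a staged mask from the generation stream acts as an attention map.
    att = 1.0 / (1.0 + np.exp(-conv1x1(f_gen, w_att, b_att)))  # (H, W, 1), sigmoid assumed
    # Step (3.2): broadcast the single-channel map across channels and reweight.
    enhanced = f_score * att
    # Step (3.3): add the original feature back so unattended regions are not lost entirely.
    return enhanced + f_score

rng = np.random.default_rng(0)
H, W, C = 4, 4, 8
f_gen = rng.standard_normal((H, W, C))
f_score = rng.standard_normal((H, W, C))
out = mask_attention_module(f_gen, f_score, rng.standard_normal((C, 1)), np.zeros(1))
```

Because the output is f_score · (1 + att) with att in (0, 1), attended regions are amplified up to twofold while unattended regions pass through almost unchanged, matching the intent of steps (3.2)–(3.3).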
Wherein the internal structure of the first mask attention module and the second MAM module are the same.
Further, the first masked attention module includes:
convolutional layer E1; an input of the convolutional layer E1 is for connection with a first mask attention module first input; the output end of the convolutional layer E1 is used for being connected with a first output end of a first mask attention module;
a convolutional layer F1; an input of the convolutional layer F1 is for connection with a first mask attention module first input; the output end of the convolutional layer F1 is used for being connected with the input end of the multiplier;
the input end of the multiplier is also connected with the second input end of the first mask attention module; the output end of the multiplier is connected with the input end of the adder, and the input end of the adder is also connected with the second input end of the first mask attention module; the output of the adder is further adapted to be coupled to a second output of the first mask attention module via convolutional layer G1.
Further, the training of the trained detection model comprises:
Sa1: constructing a training set, wherein the training set consists of images with known candidate frame labels;
Sa2: inputting the training set into the detection model and training the detection model;
Sa3: carrying out feature extraction on the images with known candidate frame labels;
Sa4: constructing an initial candidate frame based on the extracted features;
Sa5: generating initial candidate frame features based on the initial candidate frame; predicting a classification score of the candidate frame based on the initial candidate frame features; meanwhile, generating a four-dimensional regression bias vector for the initial candidate frame based on the initial candidate frame features;
Sa6: generating features of the initial candidate frame; expanding the initial candidate frame to obtain an expanded candidate frame; generating expanded candidate frame features for the expanded candidate frame;
Sa7: generating a mask based on the initial candidate frame features and the expanded candidate frame features; evaluating the mask quality to obtain a mask score;
Sa8: calculating a loss function according to the generated classification score, the regression bias vector, the mask score and the attention map generated in step Sa7, and optimizing network parameters through back propagation to obtain the trained detection model.
Exemplarily, Sa6 (generating features of the initial candidate frame; expanding the initial candidate frame to obtain an expanded candidate frame; and generating expanded candidate frame features for the expanded candidate frame) specifically comprises:
Sa62: the initial candidate frame forms an expanded candidate frame via an expansion operation, which is specifically as follows:
wherein the first two quantities in the formula are the coordinates of the upper-left corner of the candidate frame, the next two are respectively its width and height, and the last is the expansion factor, i.e., the multiple by which the candidate frame expands;
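The expansion operation of Sa62 can be illustrated as below. The centered form (keeping the box center fixed while scaling the width and height by the expansion factor r) is an assumption; the patent's exact formula is given only as an image and is not reproduced here.

```python
def expand_box(x, y, w, h, r):
    """Expand a candidate box by factor r (sketch; centered form is assumed).

    (x, y) : coordinates of the upper-left corner of the candidate box
    (w, h) : width and height of the candidate box
    r      : expansion factor, i.e. the multiple by which the box expands
    """
    new_w, new_h = r * w, r * h
    # Shift the corner so the box center stays fixed while the size grows.
    new_x = x - (new_w - w) / 2.0
    new_y = y - (new_h - h) / 2.0
    return new_x, new_y, new_w, new_h
```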
Further, in Sa8, the loss function of the model is calculated, and the parameters of the entire model are optimized by back propagation; the specific form of the loss function is as follows:
wherein L is the loss function of the model and consists of six parts, with weighting coefficients used to balance the importance of the individual loss terms;
the losses of the RPN network and the Box head take the same form as in prior instance-segmentation-based methods: the RPN loss comprises two parts, a two-class log loss and a smooth-L1 bounding-box regression loss; the classification loss adopts the cross-entropy form and the regression loss adopts the smooth-L1 form, which will not be described herein.
wherein, for each pixel, the value of the mask is compared with the mask label of that pixel (obtained from the real labels of the training data), and the comparison is averaged over the total number of pixels in the mask.
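The pixel-wise mask loss described above can be sketched as an average binary cross-entropy over the mask pixels. This assumes the standard per-pixel binary cross-entropy form; the patent's formula is given only as an image.

```python
import numpy as np

def mask_cross_entropy(pred, label, eps=1e-7):
    """Average binary cross-entropy over all pixels of the mask (sketch).

    pred  : predicted mask values in (0, 1), shape (H, W)
    label : 0/1 mask labels from the training annotations, shape (H, W)
    """
    pred = np.clip(pred, eps, 1 - eps)  # guard against log(0)
    per_pixel = -(label * np.log(pred) + (1 - label) * np.log(1 - pred))
    return float(np.mean(per_pixel))
```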
The following loss function targets the mask attention maps in the MAMs, and its specific form is as follows:
wherein the two terms respectively represent the losses of the two MAMs, and the two losses have the same form; each compares the value of every pixel of the mask attention map with a mask label for that pixel, which can be obtained from the mask label by interpolation:
wherein the mask score predicted by the model is compared with the true mask score, which is defined as the intersection-over-union between the generated mask and the true mask. The true mask score can be obtained through the following process:
if a candidate frame has an intersection-over-union greater than 0.2 with a true horizontal text box, then the true mask score of the candidate frame is calculated by the following formula:
if a candidate frame does not intersect any real text box, or its intersection-over-union with every real text box is less than 0.2, then the true mask score of the candidate frame is directly set to 0.
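The assignment of the true mask score can be sketched as follows. The function names `box_iou`, `mask_iou`, and `true_mask_score` are illustrative; the use of mask IoU for candidates matched at box IoU > 0.2 follows the definition given above.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two horizontal boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter <= 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def mask_iou(m1, m2):
    """IoU (intersection ratio) of two binary masks."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def true_mask_score(cand_box, gen_mask, gt_boxes, gt_masks):
    """If the candidate box overlaps some true horizontal text box with box
    IoU > 0.2, the true mask score is the IoU between the generated mask and
    that true mask; otherwise it is set directly to 0."""
    best = 0.0
    for gt_box, gt_mask in zip(gt_boxes, gt_masks):
        if box_iou(cand_box, gt_box) > 0.2:
            best = max(best, mask_iou(gen_mask, gt_mask))
    return best
```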
Fig. 2(a)-2(d) show three types of false positive samples and one true positive sample, wherein the rectangular box represents the final detection box predicted by the model, and the shaded portion in each box represents the segmentation mask of that detection box. cls-score is the classification score predicted by the model; ms-score is the mask score predicted by the model. The traditional method screens candidate frames only through the classification score, and since the three types of false positive samples generally have high classification scores, they are retained; the method provided by the invention screens candidate frames through both the classification score and the mask score, so the three types of false positive samples, whose mask scores are very low, can be filtered out, while the true positive sample, whose two scores are both high, is retained in the final detection result.
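The screening rule contrasted above (product of classification score and mask score against a fixed threshold) can be sketched as:

```python
def filter_detections(dets, thresh=0.5):
    """Keep detections whose combined score exceeds the threshold (sketch).

    dets : list of dicts with "cls_score" and "mask_score" keys
           (illustrative field names, not from the patent).
    The combined score is the product of the two scores, so a detection with
    a high classification score but a very low mask score is filtered out.
    """
    return [d for d in dets if d["cls_score"] * d["mask_score"] >= thresh]
```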
Training process:
step (1): acquiring original pictures of a training set and original labels of text regions in each picture (generally, horizontal or multidirectional texts are labeled by quadrangles, and irregular texts are labeled by polygons), and generating mask labels and candidate frame labels of the text regions by using the original labels. Marking all pixels inside the quadrangle or the polygon as a text category (namely, marking the pixel value as 1), and marking all pixels outside the quadrangle or the polygon as a background category (namely, marking the pixel value as 0), and forming a text region mask; taking the minimum horizontal frame capable of surrounding the quadrangle or the polygon as a candidate frame label;
step (2): sending the pictures into the Backbone to extract features and constructing initial candidate frames through the RPN;
step (3): the initial candidate box features and the expanded candidate box features are sent to the Box head and the Mask head to generate the classification score, the frame offset, the segmentation mask, and the mask score; the MAM modules in the Mask head output attention maps;
step (4): calculating the loss function of the model, and optimizing the whole model through back propagation;
step (5): after the whole training set has been trained for K epochs, the model is fixed and the network parameters are stored, K being a positive integer in the range of 30-40.
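Step (1) of the training process (deriving the mask label and the candidate frame label from a quadrangle or polygon annotation) can be sketched as follows. The even-odd ray-casting rasterization is an assumption, since the patent does not specify how the polygon interior is filled.

```python
import numpy as np

def polygon_labels(poly, h, w):
    """Build the mask label (1 inside, 0 outside) and the candidate-frame
    label (minimal enclosing horizontal box) from one polygon annotation.

    poly : list of (x, y) vertices of the quadrangle or polygon
    h, w : size of the output mask
    """
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    box = (min(xs), min(ys), max(xs), max(ys))  # candidate-frame label

    mask = np.zeros((h, w), dtype=np.uint8)
    n = len(poly)
    for y in range(h):
        for x in range(w):
            # Even-odd ray casting: count polygon edges crossed by a
            # horizontal ray going right from (x, y).
            inside = False
            for i in range(n):
                x1, y1 = poly[i]
                x2, y2 = poly[(i + 1) % n]
                if (y1 > y) != (y2 > y):
                    t = (y - y1) / (y2 - y1)
                    if x < x1 + t * (x2 - x1):
                        inside = not inside
            mask[y, x] = 1 if inside else 0
    return mask, box
```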
The testing process comprises the following steps:
step (1): acquiring a picture to be tested;
step (2): sending the pictures into the Backbone to extract features and constructing initial candidate frames through the RPN;
step (3): the initial candidate box features are fed into the Box head to generate a classification score and a frame offset, and the frame offset is used to adjust the original candidate frame;
step (4): sending the adjusted candidate frame features and the expanded adjusted candidate frame features into the Mask head to generate a segmentation mask and a mask score;
step (5): non-maximum suppression is used to filter out duplicate candidate frames; the classification score and the mask score are then used to calculate the candidate frame score, and candidate frames whose score is smaller than 0.5 are filtered out;
step (6): selecting the largest connected region in the mask of each retained candidate frame as the final detection result.
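Step (6) of the testing process (keeping the largest connected region of a retained mask) can be sketched with a breadth-first search over the mask pixels. Using 4-connectivity is an assumption; the patent does not specify the connectivity.

```python
from collections import deque

def largest_connected_region(mask):
    """Return a mask keeping only the largest 4-connected region of 1s.

    mask : list of lists (rows) of 0/1 values.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 1 and not seen[sy][sx]:
                # Breadth-first search to collect one connected component.
                comp, q = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                if len(comp) > len(best):
                    best = comp
    out = [[0] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = 1
    return out
```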
Example two
This embodiment provides an arbitrarily-shaped natural scene text detection system.
an arbitrarily shaped natural scene text detection system, comprising:
an acquisition module configured to: acquiring a to-be-detected text image;
a detection module configured to: inputting the image to be detected into the trained detection model to obtain a final detection frame; carrying out post-processing on the obtained final detection frame to form a text area;
and the detection model screens the candidate detection frames through the classification score and the mask score to obtain the final detection frame.
It should be noted that the above acquiring module and detecting module correspond to steps S1 to S2 in the first embodiment; the examples and application scenarios realized by these modules are the same as those of the corresponding steps, but are not limited to the disclosure of the first embodiment. It should also be noted that the modules described above, as parts of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. The method for detecting the text of the natural scene in any shape is characterized by comprising the following steps:
acquiring a to-be-detected text image;
inputting the image to be detected into the trained detection model to obtain a final detection frame; carrying out post-processing on the obtained final detection frame to form a text area;
and the detection model screens the candidate detection frames through the product of the classification score and the mask score to obtain the final detection frame.
2. The method for detecting the texts in the natural scenes with the arbitrary shapes as claimed in claim 1, wherein the images to be detected are input into the trained detection model to obtain a final detection frame; the method specifically comprises the following steps:
carrying out feature extraction on the image to be detected;
constructing an initial candidate frame based on the extracted image features;
generating an initial candidate frame feature based on the initial candidate frame; predicting a classification score of the candidate box based on the initial candidate box feature; meanwhile, performing frame regression on the initial candidate frame, and adjusting the size and the position of the initial candidate frame to obtain an adjusted candidate frame;
generating characteristics of the adjusted candidate frame for the adjusted candidate frame; expanding the adjusted candidate frame to obtain an expanded candidate frame; for the expansion candidate frame, generating an expansion candidate frame characteristic;
generating a mask for the adjusted candidate frame based on the features of the adjusted candidate frame and the expanded candidate frame features; evaluating the mask quality to obtain a mask score;
and screening the adjusted candidate frame by the product of the classification score and the mask score to form a final detection frame.
3. The method for detecting the text in the natural scene with the arbitrary shape as claimed in claim 2, wherein the classification score of the candidate box is predicted based on the initial candidate box feature; meanwhile, performing frame regression on the initial candidate frame, and adjusting the size and the position of the initial candidate frame to obtain an adjusted candidate frame; the method specifically comprises the following steps:
reducing the dimension of the initial candidate frame features through two full-connection layers, and simultaneously and respectively sending the dimension-reduced features to a classification branch and a regression branch;
the classification branch is a full-connection layer with two-dimensional vector output, and a classification score is obtained by calculation according to the output of the classification branch;
the regression branch is a full connection layer with four-dimensional vector output, and the initial candidate frame is subjected to frame regression according to the output of the regression branch.
4. The method for detecting text in a natural scene with an arbitrary shape as set forth in claim 2,
generating a mask for the adjusted candidate frame based on the features of the adjusted candidate frame and the expanded candidate frame features; evaluating the mask quality to obtain a mask score; the method specifically comprises the following steps:
the adjusted features of the candidate box and the expanded features of the candidate box are input into a Mask module Mask head, and the Mask module Mask head comprises two workflows: mask generating stream and mask score stream;
a mask generation stream, which takes the adjusted candidate box characteristics as input and outputs a mask;
and the mask score stream takes the adjusted candidate box characteristics and the expanded candidate box characteristics as input and outputs a mask score.
5. The method for detecting the text in the natural scene with the arbitrary shape as claimed in claim 1, wherein the model structure of the detection model comprises:
the system comprises a skeleton network Backbone, a text detection module and a text analysis module, wherein the skeleton network Backbone is used for inputting a text detection image;
the output end of the Backbone network Backbone is connected with the input end of the candidate region generation network RPN;
the output end of the candidate region generation network RPN is connected with the input end of a RoIAlign layer; the output end of the RoIAlign layer is connected with the input end of the frame module Box head; the frame module Box head comprises two fully-connected layers which are connected in sequence;
the output end of the RoIAlign layer is also connected with the input end of the Mask head module.
6. The method for detecting texts in natural scenes with arbitrary shapes according to claim 5, wherein the Mask head module comprises two parallel working branches: a first branch and a second branch;
wherein, first branch road includes: a convolutional layer C1 and a convolutional layer C2 connected in sequence; an input terminal of convolutional layer C1 for inputting the characteristics of the adjusted candidate frame;
wherein, the second branch road includes: a convolutional layer D1 and a convolutional layer D2 connected in this order; the input end of the convolution layer D1 is used for inputting the splicing value of the adjusted candidate frame characteristic and the expanded candidate frame characteristic;
the output end of the convolutional layer C2 is connected to the first input end of the first mask attention module MAM;
the output end of the convolutional layer D2 is connected with the second input end of the first mask attention module;
the first output end of the first mask attention module is connected with the first input end of the second mask attention module;
the second output end of the first mask attention module is connected with the second input end of the second mask attention module;
a first output end of the second mask attention module is connected with an input end of the deconvolution layer, an output end of the deconvolution layer is connected with an input end of the convolution layer C3, and an output end of the convolution layer C3 generates a predicted mask;
the second output end of the second mask attention module is connected with the input end of the convolutional layer D3, and the output end of the convolutional layer C3 is also connected with the input end of the convolutional layer D3; the features at the output end of the convolutional layer D3 are, after size adjustment, fed into three fully-connected layers, and the last fully-connected layer outputs the mask score.
7. The method for detecting the text of the natural scene with the arbitrary shape as claimed in claim 5, wherein the Mask head module specifically works as follows:
the adjusted candidate frame features pass through two convolutional layers to form the mask generation stream features;
the adjusted candidate frame features and the expanded candidate frame features are cascaded and pass through two convolutional layers to form the mask score stream features;
the mask generation stream features and the mask score stream features are fed into the first mask attention module, which causes the mask score stream to focus on the regions contained in the mask;
the features of the two workflows then pass through the second mask attention module;
the mask generation stream features pass through a deconvolution layer and a convolution layer to generate the predicted mask.
8. The method for detecting text in an arbitrarily-shaped natural scene as recited in claim 6, wherein the first mask attention module comprises:
convolutional layer E1; an input of the convolutional layer E1 is for connection with a first mask attention module first input; the output end of the convolutional layer E1 is used for being connected with a first output end of a first mask attention module;
a convolutional layer F1; an input of the convolutional layer F1 is for connection with a first mask attention module first input; the output end of the convolutional layer F1 is used for being connected with the input end of the multiplier;
the input end of the multiplier is also connected with the second input end of the first mask attention module; the output end of the multiplier is connected with the input end of the adder, and the input end of the adder is also connected with the second input end of the first mask attention module; the output of the adder is further adapted to be coupled to a second output of the first mask attention module via convolutional layer G1.
9. The method for detecting the text in the natural scene with the arbitrary shape as claimed in claim 1, wherein the training step of the trained detection model comprises:
constructing a training set, wherein the training set is an image of a known candidate frame label;
inputting the training set into the detection model and training the detection model;
carrying out feature extraction on the image of the known candidate frame tag;
constructing an initial candidate frame based on the extracted features;
generating an initial candidate frame feature based on the initial candidate frame; predicting a classification score of the candidate box based on the initial candidate box feature; meanwhile, generating a four-dimensional regression bias vector for the initial candidate frame based on the characteristics of the initial candidate frame;
generating characteristics of the initial candidate frame for the initial candidate frame; expanding the initial candidate frame to obtain an expanded candidate frame; generating an expansion candidate frame characteristic for the expansion candidate frame;
generating a mask based on the initial candidate box feature and the expanded candidate box feature; evaluating the mask quality to obtain a mask score;
and calculating a loss function according to the generated classification score, the regression bias vector, the mask score and the generated attention map, and obtaining a trained candidate frame screening model by back-propagating to optimize the network parameters.
10. The system for detecting the texts in the natural scenes in any shapes is characterized by comprising:
an acquisition module configured to: acquiring a to-be-detected text image;
a detection module configured to: inputting the image to be detected into the trained detection model to obtain a final detection frame; carrying out post-processing on the obtained final detection frame to form a text area;
and the detection model screens the candidate detection frames through the product of the classification score and the mask score to obtain the final detection frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110715820.9A CN113255669B (en) | 2021-06-28 | 2021-06-28 | Method and system for detecting text of natural scene with any shape |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113255669A true CN113255669A (en) | 2021-08-13 |
CN113255669B CN113255669B (en) | 2021-10-01 |
Family
ID=77189947
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110715820.9A Active CN113255669B (en) | 2021-06-28 | 2021-06-28 | Method and system for detecting text of natural scene with any shape |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113255669B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114067237A (en) * | 2021-10-28 | 2022-02-18 | 清华大学 | Video data processing method, device and equipment |
CN114863431A (en) * | 2022-04-14 | 2022-08-05 | 中国银行股份有限公司 | Text detection method, device and equipment |
CN116958981A (en) * | 2023-05-31 | 2023-10-27 | 广东南方网络信息科技有限公司 | Character recognition method and device |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150170002A1 (en) * | 2013-05-31 | 2015-06-18 | Google Inc. | Object detection using deep neural networks |
CN108549893A (en) * | 2018-04-04 | 2018-09-18 | 华中科技大学 | A kind of end-to-end recognition methods of the scene text of arbitrary shape |
CN109299274A (en) * | 2018-11-07 | 2019-02-01 | 南京大学 | A kind of natural scene Method for text detection based on full convolutional neural networks |
CN110287960A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院信息工程研究所 | The detection recognition method of curve text in natural scene image |
CN110807422A (en) * | 2019-10-31 | 2020-02-18 | 华南理工大学 | Natural scene text detection method based on deep learning |
CN110895695A (en) * | 2019-07-31 | 2020-03-20 | 上海海事大学 | Deep learning network for character segmentation of text picture and segmentation method |
CN111754531A (en) * | 2020-07-08 | 2020-10-09 | 深延科技(北京)有限公司 | Image instance segmentation method and device |
JP2020181255A (en) * | 2019-04-23 | 2020-11-05 | 国立大学法人 東京大学 | Image analysis device, image analysis method, and image analysis program |
CN111950545A (en) * | 2020-07-23 | 2020-11-17 | 南京大学 | Scene text detection method based on MSNDET and space division |
CN112163634A (en) * | 2020-10-14 | 2021-01-01 | 平安科技(深圳)有限公司 | Example segmentation model sample screening method and device, computer equipment and medium |
CN112183545A (en) * | 2020-09-29 | 2021-01-05 | 佛山市南海区广工大数控装备协同创新研究院 | Method for recognizing natural scene text in any shape |
CN112183322A (en) * | 2020-09-27 | 2021-01-05 | 成都数之联科技有限公司 | Text detection and correction method for any shape |
AU2020103585A4 (en) * | 2020-11-20 | 2021-02-04 | Sonia Ahsan | CDN- Object Detection System: Object Detection System with Image Classification and Deep Neural Networks |
CN112446356A (en) * | 2020-12-15 | 2021-03-05 | 西北工业大学 | Method for detecting text with any shape in natural scene based on multiple polar coordinates |
CN112749704A (en) * | 2019-10-31 | 2021-05-04 | 北京金山云网络技术有限公司 | Text region detection method and device and server |
CN112861855A (en) * | 2021-02-02 | 2021-05-28 | 华南农业大学 | Group-raising pig instance segmentation method based on confrontation network model |
CN112989927A (en) * | 2021-02-03 | 2021-06-18 | 杭州电子科技大学 | Scene graph generation method based on self-supervision pre-training |
Non-Patent Citations (4)
Title |
---|
MINGHUI LIAO 等: "Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
ZHAOJIN HUANG 等: "Mask Scoring R-CNN", 《2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)》 * |
肖雅娟: "基于深度学习的图像文本检测技术研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
许信顺 等: "文本分类中一种新的特征选择方法", 《山东大学学报(工学版)》 * |
Also Published As
Publication number | Publication date |
---|---|
CN113255669B (en) | 2021-10-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113255669B (en) | Method and system for detecting text of natural scene with any shape | |
CN101971190B (en) | Real-time body segmentation system | |
CN111598030A (en) | Method and system for detecting and segmenting vehicle in aerial image | |
CN113486726A (en) | Rail transit obstacle detection method based on improved convolutional neural network | |
CN105574524B (en) | Based on dialogue and divide the mirror cartoon image template recognition method and system that joint identifies | |
CN109492596B (en) | Pedestrian detection method and system based on K-means clustering and regional recommendation network | |
CN111767927A (en) | Lightweight license plate recognition method and system based on full convolution network | |
Aggarwal et al. | A robust method to authenticate car license plates using segmentation and ROI based approach | |
Ji et al. | Filtered selective search and evenly distributed convolutional neural networks for casting defects recognition | |
CN115131797B (en) | Scene text detection method based on feature enhancement pyramid network | |
CN114648665A (en) | Weak supervision target detection method and system | |
CN112990282B (en) | Classification method and device for fine-granularity small sample images | |
CN112287941A (en) | License plate recognition method based on automatic character region perception | |
CN112507876A (en) | Wired table picture analysis method and device based on semantic segmentation | |
He et al. | Aggregating local context for accurate scene text detection | |
CN113496480A (en) | Method for detecting weld image defects | |
Qin et al. | Traffic sign segmentation and recognition in scene images | |
Wang | A survey on IQA | |
CN116152226A (en) | Method for detecting defects of image on inner side of commutator based on fusible feature pyramid | |
CN114330234A (en) | Layout structure analysis method and device, electronic equipment and storage medium | |
CN117372876A (en) | Road damage evaluation method and system for multitasking remote sensing image | |
Li et al. | An improved PCB defect detector based on feature pyramid networks | |
CN117522735A (en) | Multi-scale-based dense-flow sensing rain-removing image enhancement method | |
CN110363198B (en) | Neural network weight matrix splitting and combining method | |
CN111178275A (en) | Fire detection method based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||