CN113191358B - Metal part surface text detection method and system - Google Patents
Metal part surface text detection method and system
- Publication number
- CN113191358B (application CN202110603294.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- text box
- corrected
- saliency map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/40—Image enhancement or restoration using histogram techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/73—Deblurring; Sharpening
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/136—Segmentation; Edge detection involving thresholding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30176—Document
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a method and a system for detecting text on the surface of a metal part, comprising the following steps. A preprocessing step: identifying a metal surface character image and performing image enhancement on it to obtain a preprocessed image. A foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map. A multi-scale correction step: filtering out background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and predicting the selected text boxes through a correction feature network to obtain corrected text boxes. A post-processing step: calculating the instance score of each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions. The method addresses text detection against complex backgrounds caused by the properties of metal and the industrial environment, achieves automatic segmentation of metal part character images, outputs high-precision text localization boxes, and improves detection precision.
Description
Technical Field
The invention relates to the technical field of text detection, in particular to a method and a system for detecting a text on the surface of a metal part.
Background
Text information is a key link in the information era. It appears in online electronic information, printed text, traffic signs, product trademarks and the like, and plays an increasingly important role, so research on optical character recognition (OCR) is important in fields such as intelligent automation, information processing and AI. OCR applications incubated in the business scenario of enterprise resource planning have received a great deal of attention, such as gesture recognition, package print recognition and metal surface character recognition. Among them, tracking of metal parts is the most challenging in many industrial scenarios.
Direct part marking is the main means of marking metal part products: specified part information is printed directly on the product when the part is manufactured. It mainly takes three forms: laser engraving, pinhole marking and ink-jet marking. Applying OCR to the character marks on metal part surfaces makes it possible to quickly identify information such as the part model, production information and manufacturer on the processing lines of various machines, prevents errors caused by operator fatigue during manual identification, and improves production efficiency.
Existing text detection methods mainly study the influence of the complexity of natural scenes. However, character data sets for metal part surfaces are difficult to collect, and text detection on metal parts faces problems such as strong reflection from the metal surface, large differences in metal texture, inconsistent character arrangement, poor contrast between foreground and background, and complex metal texture backgrounds. As a result, text detection boxes are not localized accurately enough, which makes character recognition for metal part tracking difficult.
Patent document CN110222680A (application number: CN201910416098.1) discloses a text detection method for the outer packaging of municipal solid waste articles: collecting an image data set of the outer packaging of municipal waste articles and labeling the text region of each image in the data set; generating, from the text region labels, a text score feature map and a multi-channel position feature map for each labeled image to form the training label of each image; dividing the images in the data set into a training set and a test set at a ratio of 9:1; constructing and training a fully convolutional neural network model; obtaining the predicted text regions of an image to be detected using the trained model; a threshold screening stage; and a non-maximum suppression stage, which yields the final text region detection result.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a method and a system for detecting text on the surface of a metal part.
The invention provides a metal part surface text detection method, which comprises the following steps:
a preprocessing step: identifying a metal surface character image, and performing image enhancement on the metal surface character image to obtain a preprocessed image;
a foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;
a multi-scale correction step: filtering out background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and predicting the selected text boxes through a correction feature network to obtain corrected text boxes;
a post-processing step: calculating the instance score of each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions.
Preferably, the preprocessing step comprises:
an image enhancement step: enhancing the local contrast of the metal surface character image by adaptive histogram equalization of the RGB image, and sharpening the image with a Laplace operator to retain high-frequency information and highlight text character details, thereby obtaining the preprocessed image.
Preferably, the foreground feature focusing step comprises:
a semantic segmentation step: feeding the metal surface character image into a multi-level convolutional network, setting a parallel convolution structure and a channel attention mechanism to fuse high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level and low-level features to obtain a saliency map;
a foreground focusing step: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are below a preset threshold, so as to obtain more foreground text features.
Preferably, the multi-scale correction step comprises:
a polygon selection step: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by convolutional networks at different levels and exclude text boxes in background regions, and screening to obtain multi-scale suggested text boxes;
a position correction step: encoding the suggested text boxes into a fixed shape by applying an ROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain corrected text boxes.
Preferably, the post-processing step comprises:
a re-scoring step: calculating the instance score of each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression step: filtering duplicate text boxes by applying the NMS method to obtain the final text box positions.
The invention provides a metal part surface text detection system, which comprises:
a preprocessing module: identifying a metal surface character image, and performing image enhancement on the metal surface character image to obtain a preprocessed image;
a foreground feature focusing module: based on the preprocessed image, highlighting the image features of the text region through a deep convolutional network to obtain a saliency map;
a multi-scale correction module: filtering out background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and evaluating and predicting the selected text boxes through a correction feature network to obtain corrected text boxes;
a post-processing module: calculating the instance score of each corrected text box and, in combination with the prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions.
Preferably, the preprocessing module comprises:
an image enhancement module: enhancing the local contrast of the metal surface character image by adaptive histogram equalization of the RGB image, and sharpening the image with a Laplace operator to retain high-frequency information and highlight text character details, thereby obtaining the preprocessed image.
Preferably, the foreground feature focusing module comprises:
a semantic segmentation module: feeding the metal surface character image into a multi-level convolutional network, setting a parallel convolution structure and a channel attention mechanism to fuse high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level and low-level features to obtain a saliency map;
a foreground focusing module: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are below a preset threshold, so as to obtain more foreground text features.
Preferably, the multi-scale correction module comprises:
a polygon selection module: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by convolutional networks at different levels and exclude text boxes in background regions, and screening to obtain multi-scale suggested text boxes;
a position correction module: encoding the suggested text boxes into a fixed shape by applying an ROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain corrected text boxes.
Preferably, the post-processing module comprises:
a re-scoring module: calculating the instance score of each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression module: filtering duplicate text boxes by applying the NMS method to obtain the final text box positions.
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention is a metal part surface text detection method based on refined positions and classification features. To address the low contrast, strong reflection, uneven characters and complex textures found on metal surfaces, it enhances the character contrast of the metal surface through adaptive histogram equalization and image sharpening, and designs a foreground-focused semantic segmentation method that highlights the text features of character regions;
(2) To address inaccurate text localization on metal parts, the invention provides a fast and effective polygon selection algorithm that filters out background boxes and supplies more accurate foreground boxes to the correction network for regression, improving localization;
(3) The invention provides a re-scoring mechanism that incorporates the instance scores of the predicted boxes, which yields accurately localized detection boxes and also improves the comprehensive index of text detection.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of text detection on a surface of a metal part.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The embodiment is as follows:
The method preprocesses the RGB image using adaptive histogram equalization and image sharpening. To handle the low contrast, strong reflection, character corrosion and similar characteristics of metal part surfaces, attention mechanisms are applied to the high-level and the low-level features respectively to focus on text features. A foreground-based segmentation loss function is then used to detect more text regions that have low contrast and are hard to distinguish, which further improves segmentation. Because the prediction boxes of the algorithm tend to be inaccurately localized at text edges, better prediction boxes are selected according to the saliency map obtained by segmentation, and the suggestion boxes are then corrected a second time. The instance score of each corrected box is calculated from its position and score, and all corrected boxes are screened by a re-scoring mechanism. Finally, the final prediction boxes are obtained by a non-maximum suppression algorithm, and a text detection evaluation system is established from indexes computed from the IoU of the prediction boxes, such as precision, recall and the comprehensive index, thereby providing a quantitative evaluation of the text detection algorithm.
The invention provides a metal part surface text detection method based on refined positions and classification features, which comprises the following steps:
a preprocessing step: identifying the metal surface character image, and enhancing it through preprocessing to obtain a high-quality preprocessed image;
a foreground feature focusing step: inputting the preprocessed metal surface character image, and highlighting the image features of the text region through a deep convolutional network to obtain a segmented saliency map;
a multi-scale correction step: filtering out background text boxes at different levels of the pyramid network using the pixel information of the saliency map, and sending the selected text boxes into the correction feature network for re-evaluation;
a post-processing step: extracting, from the image features of the saliency map, the instance score of each prediction box of the correction network; re-scoring the suggestion boxes by combining the instance score with the classification score predicted by the correction feature network; and applying non-maximum suppression post-processing to obtain the final corrected text boxes.
Specifically, the preprocessing step comprises:
an image enhancement step: enhancing the local contrast of the metal part image by adaptive histogram equalization of the RGB image, and sharpening the image with a Laplacian operator to retain high-frequency information and highlight text character details, thereby obtaining the preprocessed image.
Specifically, the foreground feature focusing step comprises:
a semantic segmentation step: feeding the metal part character image into a multi-level convolutional network, setting a parallel convolution structure and a channel attention mechanism to fuse high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level and low-level features to obtain a saliency map;
a foreground focusing step: comparing the saliency map obtained in the semantic segmentation step with a mask map carrying label information, and setting segmentation thresholds for low-contrast, hard-to-identify regions in exchange for more foreground text features.
Specifically, the multi-scale correction step comprises:
a polygon selection step: using the binarization result of the semantically segmented saliency map as a mask to filter the text boxes generated by convolutional networks at different levels, excluding text boxes in background regions, and screening to obtain multi-scale suggested text boxes;
a position correction step: encoding the suggested text boxes generated in the polygon selection step into a fixed shape by applying an RROI pooling model, extracting the ROI region features, and feeding them into classification and regression networks to obtain corrected text boxes.
Specifically, the post-processing step comprises:
a re-scoring step: calculating the instance score of each corrected text box from the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression step: filtering duplicate text boxes by applying the NMS method to obtain the final text box positions.
As shown in FIG. 1, the segmentation method based on foreground feature focusing, the multi-scale position correction method and the text box re-scoring mechanism of the invention process the enhanced metal part image to detect and localize text boxes, and comprise the following steps:
Step 100: image preprocessing, comprising step 110 and step 120, in which adaptive histogram equalization and image sharpening are used to enhance the image, improve image quality and prepare for subsequent processing.
Step 110: Enhance the contrast of the metal part image. Based on adaptive histogram equalization, different enhancement schemes are applied to different parts of the image, so that contrast is enhanced while image details are preserved.
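The idea in step 110 of equalizing each part of the image separately can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: it works on a single 8-bit channel stored as a list of rows, the function names `equalize_tile` and `adaptive_equalize` and the `tile_size` parameter are hypothetical, and it omits the contrast clipping and inter-tile interpolation that production adaptive equalization (for example CLAHE) normally adds.

```python
def equalize_tile(tile, levels=256):
    """Histogram-equalize one grayscale tile (rows of ints in [0, levels))."""
    hist = [0] * levels
    for row in tile:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    # Cumulative distribution of the tile's intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Flat tile: every pixel has the same value, nothing to stretch.
        return [row[:] for row in tile]
    scale = (levels - 1) / (total - cdf_min)
    return [[round((cdf[v] - cdf_min) * scale) for v in row] for row in tile]


def adaptive_equalize(image, tile_size):
    """Equalize each tile_size x tile_size block of the image independently."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y0 in range(0, h, tile_size):
        for x0 in range(0, w, tile_size):
            tile = [row[x0:x0 + tile_size] for row in image[y0:y0 + tile_size]]
            for dy, eq_row in enumerate(equalize_tile(tile)):
                out[y0 + dy][x0:x0 + len(eq_row)] = eq_row
    return out
```

Because every tile gets its own intensity mapping, a dark, low-contrast region is stretched independently of a bright, reflective region elsewhere on the part.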
Step 120: Sharpen the image based on the Laplace operator, retaining the high-frequency information of the image and highlighting the detail features of the text region, wherein the Laplace operator is as follows:
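The operator formula itself is not reproduced in this text. For orientation, the standard continuous Laplacian is $\nabla^2 f = \partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$. The sketch below assumes its common 4-neighbour discretization and the usual sharpening rule of subtracting the Laplacian from the image. The kernel choice, the `strength` parameter and the border replication are assumptions for illustration and may differ from what the patent uses.

```python
def laplacian_sharpen(image, strength=1.0, max_value=255):
    """Sharpen a grayscale image (list of rows) with a 4-neighbour Laplacian."""
    h, w = len(image), len(image[0])

    def px(y, x):
        # Replicate border pixels so the kernel is defined at the edges.
        return image[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Discrete Laplacian: sum of the four neighbours minus 4x the centre.
            lap = (px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1)
                   - 4 * px(y, x))
            # Subtracting the Laplacian boosts high-frequency detail.
            value = image[y][x] - strength * lap
            row.append(int(min(max(round(value), 0), max_value)))
        out.append(row)
    return out
```

Flat regions are left unchanged because their Laplacian is zero, while strokes and edges are amplified.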
Step 200: segmentation based on foreground feature focusing, comprising steps 210, 220 and 230. An attention mechanism is set up on the pyramid levels of a ResNet-FPN framework, multi-level features are fused to strengthen the correlation between levels, high-level and low-level features are combined to obtain a saliency map, and a foreground-focused loss function is then designed to improve segmentation of low-contrast text images.
Step 210: The low-level network carries more detail but has poor semantic perception, while the high-level network carries more semantic information but fewer detail features. A parallel structure is designed for the high-level network: the multi-resolution subnets of the high-level feature maps are fused by exchanging information with each other, so that every level of the multi-resolution subnets contains higher-resolution feature information and the spatial features of the subnets are enriched. Each channel of the low-level network is weighted to highlight text features and acquire more semantic information.
Specifically, let the pyramid network be divided into four layers $P_2, P_3, P_4, P_5$; let Conv, DConv and UConv denote different convolution types, $c$ the number of channels, $j$ a pixel on the feature map, and CA the channel attention mechanism. $P_2$ is selected as the low-level network and $P_3, P_4, P_5$ as the high-level network. The low-level features are processed as follows:
$L_{map} = \mathrm{Conv}(P_2)$
The high-level network has multi-scale subnets, and more detailed features are obtained by the following processing:
The high-level feature map and the low-level feature map are then fused to form a supervised saliency map $S_{map}$.
Step 220: Given the characteristics of low-contrast characters on a metal surface, eliminating background noise can also segment away character features at text edges, which easily leads to incorrect regression positions; the foreground therefore needs more attention. First, the Dice loss is introduced as a loss function $L_{dice}$ to address the imbalance between positive and negative samples. Saliency map segmentation is then improved according to two design principles for the segmentation loss: a) include as many text features as possible; b) minimize the number of false detections. Let the label be $S_{gt}$ and let the difference between the label and $S_{map}$ be $S_{Diff}$. The following loss functions are proposed:
$L_a = (S_{Diff} \geq 1/2) \cdot (1 - F)$
$L_b = (-S_{Diff} \geq 1/2) \cdot F$
$\gamma$ denotes the balance parameter, $F$ denotes the result of binarizing $S_{map}$, and $\Delta$ denotes the upper limit of the allowable false detection rate, accepted in exchange for detecting more text features in low-contrast, hard-to-distinguish regions.
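The two indicator terms above can be evaluated pixel by pixel, as in the following minimal sketch. It assumes $S_{Diff} = S_{gt} - S_{map}$, a binarization threshold of 0.5 for $F$, and flattened pixel lists; the Dice loss is written in its common form with a small smoothing constant. How $\gamma$ and $\Delta$ combine the terms into the final loss is not given in this text, so the sketch does not attempt it. All function names are hypothetical.

```python
def dice_loss(pred, target, eps=1e-6):
    """Common Dice loss: 1 - 2|A.B| / (|A| + |B|), with eps for stability."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def foreground_terms(s_map, s_gt, threshold=0.5):
    """Per-pixel L_a (missed text) and L_b (false detection) indicators."""
    l_a, l_b = [], []
    for s, g in zip(s_map, s_gt):
        f = 1.0 if s >= threshold else 0.0   # F: binarized S_map (assumed threshold)
        diff = g - s                         # S_Diff, assumed to be label minus prediction
        # L_a fires where the label is text but the binarized map says background.
        l_a.append((1.0 if diff >= 0.5 else 0.0) * (1.0 - f))
        # L_b fires where the binarized map says text but the label is background.
        l_b.append((1.0 if -diff >= 0.5 else 0.0) * f)
    return l_a, l_b
```

Under these assumptions, $L_a$ marks text pixels the segmentation missed (principle a) and $L_b$ marks background pixels it wrongly kept (principle b).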
Step 230: The complex background of metal parts easily interferes with the subsequent classification and regression networks. The saliency map obtained by semantic segmentation is therefore applied to each level's subnet to highlight text features in the classification and regression networks and to suppress background noise, specifically as follows:
The classification network and the regression network are two sub-networks with the same structure but different parameters, each composed of four 3 × 3 convolutional layers and one 3 × 5 convolutional layer. After being weighted by the saliency map, each layer of the pyramid network is used for the classification and regression tasks respectively. This effectively suppresses background noise, retains as many text features as possible, and generates prediction boxes with high matching precision. Its drawback is that the prediction box position may not cover the text effectively, which makes subsequent metal part tracking difficult; this problem is addressed by the subsequent secondary regression based on the saliency map.
Step 300: multi-scale correction based on the saliency map, comprising steps 310 and 320. Using the foreground-focused segmentation result, a polygon selection algorithm is employed and a refinement model is applied to recalibrate the suggestion boxes.
Step 310: Each layer of the pyramid network generates prediction boxes through the classification and regression subnetworks. However, the positions of these boxes are not accurate enough, especially for oblique text. Therefore, k = 500 prediction boxes that fit the foreground are selected based on the saliency map obtained by segmentation and fed into the refinement feature network. The specific algorithm is as follows:
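The algorithm listing is not reproduced in this text. One plausible reading of step 310 is sketched below under explicit assumptions: boxes are axis-aligned `(x0, y0, x1, y1)` tuples in pixel coordinates (the patent also handles oblique text), a box "fits the foreground" when the fraction of mask pixels inside it reaches a hypothetical `min_ratio`, and the surviving boxes are ranked by score before keeping the top `k`. None of these details are confirmed by the source.

```python
def foreground_ratio(mask, box):
    """Fraction of pixels inside the box that the binary mask marks as text."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    fg = sum(mask[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return fg / area


def select_boxes(mask, boxes, scores, k=500, min_ratio=0.5):
    """Drop background boxes, then keep the k highest-scoring survivors."""
    kept = [(s, b) for b, s in zip(boxes, scores)
            if foreground_ratio(mask, b) >= min_ratio]
    kept.sort(key=lambda sb: sb[0], reverse=True)
    return [b for _, b in kept[:k]]
```

The mask lookup is cheap compared with running the refinement network, which is consistent with using the binarized saliency map as a fast filter before the second regression.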
Step 320: The suggested text boxes generated in the polygon selection step are encoded into a fixed shape by applying an RROI pooling model; the ROI region features are extracted from the ResNet-FPN framework and sent to the classification and regression networks to obtain the corrected text box positions and scores.
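Encoding a variable-size box into a fixed shape can be illustrated with plain ROI max pooling. Note the substitution: the sketch below pools an axis-aligned box on a single-channel feature map, whereas the step above names rotated ROI (RROI) pooling, which additionally handles box orientation. The function name and the bin-boundary rule are assumptions, and the box is assumed to be non-empty and to lie inside the feature map.

```python
def roi_max_pool(feature, box, out_h, out_w):
    """Max-pool the box (x0, y0, x1, y1) of a 2-D feature map to out_h x out_w."""
    x0, y0, x1, y1 = box
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        # Bin boundaries: floor for the start, ceiling for the end,
        # and at least one row per bin.
        ys = y0 + (i * roi_h) // out_h
        ye = max(y0 + ((i + 1) * roi_h + out_h - 1) // out_h, ys + 1)
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = max(x0 + ((j + 1) * roi_w + out_w - 1) // out_w, xs + 1)
            row.append(max(feature[y][x]
                           for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```

Whatever the size of the input box, the output always has `out_h` rows and `out_w` columns, which is what lets boxes of different scales share one classification and regression head.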
Step 400: a re-scoring mechanism, comprising steps 410 and 420, in which an instance score of each corrected text box is calculated based on the binarized saliency map, and the score of each corrected text box is re-evaluated in combination with its prediction score.
Step 410: Based on the positions and scores of the text boxes generated by the correction network, an instance score is calculated for each corrected text box from the extracted saliency region, and the score of each text box is re-evaluated. Let the classification score of a suggested text box be $S_c$ and its instance score be $S_I$. The specific process is:
$P_V = \{\rho_1, \ldots, \rho_n\}$
where $\mu$ is set to 1/4 and $P_V$ is the set of pixels extracted from $S_{map}$ for the suggestion box.
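The formulas that define $S_I$ and combine it with $S_c$ are not reproduced in this text. The sketch below is therefore only an illustration of the data flow, under two loudly stated assumptions: the instance score is taken as the mean of the pixel set $P_V$, and the final score is a convex combination weighted by $\mu = 1/4$. Neither choice is confirmed by the source, and the function names are hypothetical.

```python
def instance_score(s_map, box):
    """ASSUMPTION: instance score S_I is the mean of the pixels P_V in the box."""
    x0, y0, x1, y1 = box
    p_v = [s_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(p_v) / len(p_v) if p_v else 0.0


def rescore(s_c, s_i, mu=0.25):
    """ASSUMPTION: convex combination of classification and instance scores."""
    return (1.0 - mu) * s_c + mu * s_i
```

Under this reading, a box with a high classification score that sits mostly on background has its score pulled down, so it is less likely to survive the following suppression step.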
Step 420: The NMS (non-maximum suppression) algorithm is applied to remove duplicate text boxes, and a text detection evaluation system is established using indexes such as precision, recall and the comprehensive score, computed from the IoU between prediction boxes and labels.
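Greedy non-maximum suppression can be sketched as follows. The example assumes axis-aligned boxes `(x0, y0, x1, y1)` and a hypothetical IoU threshold of 0.5; the patent does not state its threshold, and rotated boxes would need a polygon IoU instead.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The same `iou` function is the basis for matching prediction boxes to labels in the evaluation.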
The precision is calculated as:
$$P = \frac{tp}{tp + fp}$$
The recall is calculated as:
$$R = \frac{tp}{tp + fn}$$
The comprehensive score (F-measure) is calculated as:
$$F = \frac{2 \cdot P \cdot R}{P + R}$$
where tp, fp, and fn denote the numbers of correctly detected text boxes, falsely detected text boxes, and missed text boxes, respectively.
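Step 420 can be sketched in a few lines, assuming axis-aligned boxes `(x1, y1, x2, y2)`, greedy NMS, and an IoU threshold of 0.5 for both suppression and matching. These are all assumptions for illustration: the patent handles oblique boxes and does not state its thresholds or its matching rule. Precision, recall, and F-measure are computed in the standard way from tp, fp, and fn.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

def evaluate(pred, gt, iou_thresh=0.5):
    """Precision, recall and F-measure with one-to-one greedy matching."""
    matched = set()
    tp = 0
    for p in pred:
        for g_idx, g in enumerate(gt):
            if g_idx not in matched and iou(p, g) >= iou_thresh:
                matched.add(g_idx)            # each label matches at most once
                tp += 1
                break
    fp, fn = len(pred) - tp, len(gt) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

For rotated text boxes the `iou` helper would need to be replaced by a polygon intersection routine; the NMS and metric logic stays the same.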
The method solves the problem of locating text on the surface of metal parts in an industrial environment, which helps track and record metal parts on an industrial production line. By studying and analyzing the character markings on metal part surfaces, information such as part model, size, and manufacturer can be identified quickly on the processing lines of all kinds of machines. This prevents the errors that arise from fatigue in manual identification and improves production efficiency.
Those skilled in the art will appreciate that, in addition to implementing the system, the apparatus, and their modules provided by the present invention purely as computer-readable program code, the same functions can be achieved by logically programming the method steps, so that the system, the apparatus, and their modules take the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system, the apparatus, and their modules provided by the present invention can be regarded as a hardware component, and the modules included therein for implementing various functions can also be regarded as structures within the hardware component; modules for performing various functions can likewise be regarded both as software programs implementing the method and as structures within the hardware component.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.
Claims (2)
1. A method for detecting text on the surface of metal parts, characterized by comprising the following steps:
a preprocessing step: identifying a metal surface character image, and performing image enhancement on the metal surface character image to obtain a preprocessed image;
a foreground feature focusing step: based on the preprocessed image, highlighting the image features of the text regions through a deep convolutional network to obtain a saliency map;
a multi-scale rectification step: filtering background text boxes at different levels of the pyramid network by using pixel information of the saliency map, and evaluating and predicting the selected text boxes through a refinement feature network to obtain corrected text boxes;
a post-processing step: calculating an instance score of each corrected text box and, in combination with its prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions;
wherein the preprocessing step comprises:
an image enhancement step: applying adaptive histogram equalization to the RGB image to enhance the local contrast of the metal surface character image, while sharpening the metal surface character image with a Laplacian operator so as to retain high-frequency information and highlight text character details, thereby obtaining the preprocessed image;
the foreground feature focusing step comprises:
a semantic segmentation step: feeding the metal surface character image into a multi-level convolutional network, setting a parallel convolution structure and a channel attention mechanism to fuse high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level features and the low-level features to obtain the saliency map;
a foreground focusing step: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are lower than a preset threshold, so as to obtain more foreground text features;
the multi-scale rectification step comprises:
a polygon selection step: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by convolutional networks at different levels and exclude text boxes in background regions, and screening to obtain multi-scale proposed text boxes;
a position correction step: encoding the proposed text boxes into a fixed shape by applying an ROI pooling model, extracting ROI region features, and feeding the ROI region features into a classification and regression network to obtain the corrected text boxes;
the post-processing step comprises:
a re-scoring step: calculating an instance score of each corrected text box according to the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression step: filtering duplicate text boxes by applying an NMS method to obtain the final text box positions.
2. A metal part surface text detection system, comprising:
a preprocessing module: identifying a metal surface character image, and performing image enhancement on the metal surface character image to obtain a preprocessed image;
a foreground feature focusing module: based on the preprocessed image, highlighting the image features of the text regions through a deep convolutional network to obtain a saliency map;
a multi-scale rectification module: filtering background text boxes at different levels of the pyramid network by using pixel information of the saliency map, and evaluating and predicting the selected text boxes through a refinement feature network to obtain corrected text boxes;
a post-processing module: calculating an instance score of each corrected text box and, in combination with its prediction score, applying a non-maximum suppression algorithm to obtain the final text box positions;
wherein the preprocessing module comprises:
an image enhancement module: applying adaptive histogram equalization to the RGB image to enhance the local contrast of the metal surface character image, while sharpening the metal surface character image with a Laplacian operator so as to retain high-frequency information and highlight text character details, thereby obtaining the preprocessed image;
the foreground feature focusing module comprises:
a semantic segmentation module: feeding the metal surface character image into a multi-level convolutional network, setting a parallel convolution structure and a channel attention mechanism to fuse high-level features, adding an adaptive operator to highlight the difference between foreground and background in the low-level features, and combining the high-level features and the low-level features to obtain the saliency map;
a foreground focusing module: comparing the saliency map with a mask map carrying label information, and setting a segmentation threshold for regions whose contrast and recognizability are lower than a preset threshold, so as to obtain more foreground text features;
the multi-scale rectification module comprises:
a polygon selection module: binarizing the saliency map, using the binarization result as a mask to filter the text boxes generated by convolutional networks at different levels and exclude text boxes in background regions, and screening to obtain multi-scale proposed text boxes;
a position correction module: encoding the proposed text boxes into a fixed shape by applying an ROI pooling model, extracting ROI region features, and feeding the ROI region features into a classification and regression network to obtain the corrected text boxes;
the post-processing module comprises:
a re-scoring module: calculating an instance score of each corrected text box according to the binarized saliency map, and re-evaluating the score of each corrected text box in combination with its prediction score;
a non-maximum suppression module: filtering duplicate text boxes by applying an NMS method to obtain the final text box positions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110603294.7A CN113191358B (en) | 2021-05-31 | 2021-05-31 | Metal part surface text detection method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113191358A (en) | 2021-07-30 |
CN113191358B (en) | 2023-01-24 |
Family
ID=76985941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110603294.7A Active CN113191358B (en) | 2021-05-31 | 2021-05-31 | Metal part surface text detection method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113191358B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115471831B (en) * | 2021-10-15 | 2024-01-23 | 中国矿业大学 | Image saliency detection method based on text reinforcement learning |
CN114120305B (en) * | 2021-11-26 | 2023-07-07 | 北京百度网讯科技有限公司 | Training method of text classification model, and text content recognition method and device |
CN116912845B (en) * | 2023-06-16 | 2024-03-19 | 广东电网有限责任公司佛山供电局 | Intelligent content identification and analysis method and device based on NLP and AI |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110020676A (en) * | 2019-03-18 | 2019-07-16 | 华南理工大学 | Method for text detection, system, equipment and medium based on more receptive field depth characteristics |
CN111898411A (en) * | 2020-06-16 | 2020-11-06 | 华南理工大学 | Text image labeling system, method, computer device and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108549893B (en) * | 2018-04-04 | 2020-03-31 | 华中科技大学 | End-to-end identification method for scene text with any shape |
CN110895695B (en) * | 2019-07-31 | 2023-02-24 | 上海海事大学 | Deep learning network for character segmentation of text picture and segmentation method |
CN111598861B (en) * | 2020-05-13 | 2022-05-03 | 河北工业大学 | Improved Faster R-CNN model-based non-uniform texture small defect detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113191358B (en) | Metal part surface text detection method and system | |
CN108596066B (en) | Character recognition method based on convolutional neural network | |
CN112686812B (en) | Bank card inclination correction detection method and device, readable storage medium and terminal | |
CN110598609A (en) | Weak supervision target detection method based on significance guidance | |
CN108334881B (en) | License plate recognition method based on deep learning | |
Nandi et al. | Traffic sign detection based on color segmentation of obscure image candidates: a comprehensive study | |
TW202239281A (en) | Electronic substrate defect detection | |
CN109460735A (en) | Document binary processing method, system, device based on figure semi-supervised learning | |
CN107016394B (en) | Cross fiber feature point matching method | |
CN112819840B (en) | High-precision image instance segmentation method integrating deep learning and traditional processing | |
CN110956167A (en) | Classification discrimination and strengthened separation method based on positioning characters | |
Verma et al. | Automatic container code recognition via spatial transformer networks and connected component region proposals | |
CN113836850A (en) | Model obtaining method, system and device, medium and product defect detection method | |
CN110796210A (en) | Method and device for identifying label information | |
Matoš et al. | The speed limit road signs recognition using hough transformation and multi-class SVM | |
Kumar et al. | D-PNR: deep license plate number recognition | |
Pham et al. | CNN-based character recognition for license plate recognition system | |
CN112508000B (en) | Method and equipment for generating OCR image recognition model training data | |
CN114419006A (en) | Method and system for removing watermark of gray level video characters changing along with background | |
CN111950556A (en) | License plate printing quality detection method based on deep learning | |
CN112200789A (en) | Image identification method and device, electronic equipment and storage medium | |
Deb et al. | A vehicle license plate detection method for intelligent transportation system applications | |
CN116912872A (en) | Drawing identification method, device, equipment and readable storage medium | |
CN111402185A (en) | Image detection method and device | |
CN112330659B (en) | Geometric tolerance symbol segmentation method combining LSD (least squares) linear detection and connected domain marking method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |