CN116778497A - Method and device for identifying hand well number, computer equipment and storage medium - Google Patents

Method and device for identifying hand well number, computer equipment and storage medium

Info

Publication number
CN116778497A
CN116778497A (Application CN202211710215.3A)
Authority
CN
China
Prior art keywords
target
model
hand well
network
dbnet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211710215.3A
Other languages
Chinese (zh)
Inventor
Feng Chao (冯超)
Liu Zhongjiang (刘忠江)
Wang Guangshan (王广善)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dongtu Tuoming Technology Co ltd
Original Assignee
Beijing Dongtu Tuoming Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dongtu Tuoming Technology Co ltd filed Critical Beijing Dongtu Tuoming Technology Co ltd
Priority to CN202211710215.3A priority Critical patent/CN116778497A/en
Publication of CN116778497A publication Critical patent/CN116778497A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The application relates to a method and device for identifying a hand well number, a computer device, and a storage medium. The method comprises the following steps: training an improved deep learning model based on a plurality of target training samples containing hand well numbers to obtain a target recognition model for hand well number recognition, wherein the improved deep learning model comprises a target DBnet model, a target Mobilenetv3 model, and a target CRNN model connected in sequence, and the target DBnet model comprises a residual network with an attention mechanism; and inputting a hand well image to be detected into the target recognition model to obtain a recognition result for the hand well image to be detected. By adopting this method, the accuracy of hand well number recognition can be improved, thereby reducing the manpower required.

Description

Method and device for identifying hand well number, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular to a method and apparatus for identifying a hand well number, a computer device, and a storage medium.
Background
With the continuous development of OCR technology, two main approaches have emerged: two-stage algorithms and end-to-end algorithms. A two-stage OCR algorithm is divided into two parts, text detection and text recognition: the detection algorithm derives bounding boxes of text lines from an image, and the recognition algorithm then recognizes the content inside each box. An end-to-end OCR algorithm uses one model to perform text detection and text recognition simultaneously; the basic idea is to share the same backbone network, design separate detection and recognition modules, and train both tasks jointly. Since character recognition is completed by a single model, an end-to-end model is smaller and faster, but its accuracy is generally lower, so two-stage text recognition remains dominant.
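The two-stage flow described above can be sketched in a few lines of Python. Here `detect_text_boxes`, `crop`, and `recognize_text` are hypothetical stubs standing in for the detection model (e.g. DBnet), the cropping step, and the recognition model (e.g. CRNN); the returned box and string are placeholder values, not output of a real system.

```python
def detect_text_boxes(image):
    """Stage 1: return bounding boxes of text lines as (x, y, w, h) tuples."""
    # A real detector would run a segmentation/detection network here.
    return [(10, 20, 80, 16)]

def crop(image, box):
    """Crop a box out of the image (the image is a dict placeholder here)."""
    return {"parent": image, "box": box}

def recognize_text(region):
    """Stage 2: return the text content of one cropped text-line region."""
    # A real recognizer would run CRNN (CNN + sequence model + CTC) here.
    return "JH-0001"

def two_stage_ocr(image):
    """Run detection, then recognition on every detected box."""
    results = []
    for box in detect_text_boxes(image):
        results.append((box, recognize_text(crop(image, box))))
    return results
```

An end-to-end model would collapse the two stubbed stages into a single network sharing one backbone.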
At present, hand well number recognition mainly adopts a two-stage algorithm combining target detection with CRNN. This approach suffers from high recognition error, low recognition speed, and poor generalization when detecting hand well character regions in complex natural scenes, and CRNN accuracy degrades when it encounters complex text in natural scenes.
Therefore, a technical solution is needed to solve the above technical problems.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a hand well number recognition method, apparatus, computer device, and storage medium that can improve hand well number recognition.
A method of identifying a hand well number, the method comprising:
Step A: training an improved deep learning model based on a plurality of target training samples containing hand well numbers to obtain a target recognition model for hand well number recognition; wherein the improved deep learning model comprises a target DBnet model, a target Mobilenetv3 model, and a target CRNN model connected in sequence; the target DBnet model comprises a residual network with an attention mechanism; the target DBnet model is used for detecting text boxes in an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map;
Step B: inputting the hand well image to be detected into the target recognition model to obtain a recognition result of the hand well image to be detected.
In one embodiment, the image of the hand well to be detected is input to the target recognition model, and a recognition result of the image of the hand well to be detected is obtained.
In one embodiment, the step of training the improved deep learning model based on a plurality of target training samples containing hand well numbers to obtain a target recognition model for hand well number recognition includes:
inputting each target training sample into the target DBnet model for text box detection to obtain a first feature map corresponding to each target training sample;
inputting each first feature map into the target Mobilenetv3 model for angle calibration processing to obtain a second feature map corresponding to each first feature map;
inputting each second feature map into the self-supervised pre-trained target CRNN model for text recognition to obtain a target loss value corresponding to each target training sample; and
optimizing the parameters of the improved deep learning model according to all target loss values to obtain an optimized deep learning model, taking the optimized deep learning model as the improved deep learning model, and returning to the step of inputting each target training sample into the target DBnet model for text box detection, until the maximum number of iterations is reached and the optimized deep learning model is determined to be the target recognition model.
In one embodiment, the step of inputting each target training sample into the target DBnet model for text box detection to obtain a first feature map corresponding to each target training sample includes:
inputting each target training sample into the target DBnet model for text box detection based on a distillation mode in which a plurality of student models learn from each other, to obtain a first feature map corresponding to each target training sample.
In one embodiment, the target DBnet model further includes an improved FPN network, constructed by enlarging the convolution kernel connected to each up-sampled and fused feature map in the original FPN network, so as to increase the receptive field of the original FPN network.
In one embodiment, the method further comprises:
and compressing the original Mobilenetv3 model based on a filter pruning method to obtain the target Mobilenetv3 model.
In one embodiment, the method further comprises:
replacing the RNN network in an original CRNN model with a Swin-Transformer network to obtain the target CRNN model; wherein the original CRNN model comprises a CNN network, an RNN network, and a CTC loss network connected in sequence.
In one embodiment, the method further comprises:
acquiring a plurality of original training samples containing hand well numbers, and performing data enhancement processing on each original training sample to obtain a plurality of first training samples; and
performing resolution enhancement processing on each first training sample to obtain a plurality of target training samples.
A hand well number identification device, the device comprising:
a training module, used for training an improved deep learning model based on a plurality of target training samples containing hand well numbers to obtain a target recognition model for hand well number recognition; wherein the improved deep learning model comprises a target DBnet model, a target Mobilenetv3 model, and a target CRNN model connected in sequence; the target DBnet model comprises a residual network with an attention mechanism; the target DBnet model is used for detecting text boxes in an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map; and
an identification module, used for inputting the hand well image to be detected into the target recognition model to obtain a recognition result of the hand well image to be detected.
A computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the hand well number identification method of the above embodiments.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the hand well number identification method of the above embodiments.
According to the above method, device, computer equipment, and storage medium for identifying hand well numbers, the original deep learning model is improved, so that the accuracy of hand well number recognition is improved and the manpower required is reduced.
Drawings
FIG. 1 is a flow chart of a method for identifying a hand well number in one embodiment;
FIG. 2 is a schematic diagram of the image types of hand well numbers in a hand well number identification method in one embodiment;
FIG. 3 is a flow chart of step 110 in a hand well number identification method in one embodiment;
FIG. 4 is a schematic diagram of the model distillation learning method in a hand well number identification method in one embodiment;
FIG. 5 is a schematic diagram of the filter pruning method in a hand well number identification method in one embodiment;
FIG. 6 is a flow chart of self-supervised pre-training in a hand well number identification method in one embodiment;
FIG. 7 is a schematic diagram of the structure of the original DBnet model in a hand well number identification method in one embodiment;
FIG. 8 is a schematic diagram of the structure of the target DBnet model in a hand well number identification method in one embodiment;
FIG. 9 is a schematic diagram of the residual network in a hand well number identification method in one embodiment;
FIG. 10 is a schematic diagram of the structure of the original CRNN model in a hand well number identification method in one embodiment;
FIG. 11 is a schematic diagram of the structure of the target CRNN model in a hand well number identification method in one embodiment;
FIG. 12 is a schematic diagram of the structure of the Swin-Transformer network in a hand well number identification method in one embodiment;
FIG. 13 is a schematic diagram of a hand well number identification device in one embodiment;
FIG. 14 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Fig. 1 shows a flowchart of an embodiment of a method for identifying a hand well number provided by the invention. As shown in fig. 1, the method comprises the following steps:
step 110: training the improved deep learning model based on a plurality of target training samples containing the hand well numbers to obtain a target recognition model for recognizing the hand well numbers.
Here: (1) The improved deep learning model comprises a target DBnet model, a target Mobilenetv3 model, and a target CRNN model connected in sequence. (2) The target DBnet model is the model obtained by improving the original DBnet model, which comprises a backbone network (Resnet18 network), a neck network (FPN network), and a head network connected in sequence. The improvement of the target DBnet model is that a residual network with an attention mechanism is set between the neck network and the head network; its function is to fully fuse the feature information and capture locally effective information. (3) The target DBnet model is used for detecting text boxes in an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map. (4) A target training sample is an image containing a hand well number of any image type. The image types of hand well numbers include, but are not limited to: 1) hand wells with nameplates (fig. 2 (a) and fig. 2 (b)); 2) paint-sprayed numbers on in-well walls (fig. 2 (c)); 3) paint-sprayed numbers on in-well rings (fig. 2 (d)); 4) paint-sprayed numbers on well covers (fig. 2 (e) and fig. 2 (f)). (5) A target training sample can be an image sample after data enhancement or an original image sample without any processing, which is not limited herein; it may be a sample in which the hand well number is rotated by any angle, bent, or inclined during photographing, and so on, which is likewise not limited herein.
(6) The target recognition model is the deep learning model for hand well number recognition obtained after training with the plurality of target training samples.
Specifically, a plurality of target training samples containing hand well numbers are input into the improved deep learning model for iterative training until the maximum number of iterations is reached, yielding the target recognition model for hand well number recognition.
Step 120: and inputting the hand well image to be detected into the target recognition model to obtain a recognition result of the hand well image to be detected.
Here: (1) The hand well image to be detected is an image that contains a hand well number and needs to be identified. (2) The recognition result contains the recognized text of the hand well number in the hand well image to be detected.
Specifically, the hand well image to be detected is input into the target recognition model for hand well number recognition, and the model outputs the recognized text of the hand well number contained in the image.
According to this hand well number recognition method, the original deep learning model is improved, so that the accuracy of hand well number recognition is improved and the manpower required is reduced.
Preferably, as shown in fig. 3, step 110 includes:
step 111: and respectively inputting each target training sample into the target DBnet model to perform text box detection, and obtaining a first feature map corresponding to each target training sample.
Specifically, based on a distillation mode in which a plurality of student models learn from each other, each target training sample is input into the target DBnet model for text box detection, and a first feature map corresponding to each target training sample is obtained.
Here: (1) The distillation mode in which a plurality of student models learn from each other is as follows: a Resnet50 network is selected as the teacher model and a Resnet18 network (the Resnet18 network is part of the target DBnet model) is adopted as the student model. Because the Resnet50 network has more layers and a more complex structure, the accuracy of the Resnet50-DBnet network (R50 for short) is higher than that of the Resnet18-DBnet network (R18 for short), so R50 is used as the teacher model and R18 as the student model. The method shown in fig. 4 is adopted: the two student models are grouped together and the deep mutual learning (DML) strategy is applied; introducing DML improves the accuracy of sample recognition. (2) The target DBnet model is used for obtaining a predicted text box for each target training sample. (3) The first feature map is the predicted text box of a target training sample output by the target DBnet model.
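A minimal sketch of the mutual-learning term in DML, assuming the usual formulation in which each student adds a KL-divergence loss toward the other student's softmax predictions (the helper names here are illustrative, not from the patent):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_div(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dml_loss(logits_a, logits_b):
    """Symmetric mutual-learning term: each student mimics the other.

    In training, each student's total loss would be its supervised loss
    plus its half of this term.
    """
    pa, pb = softmax(logits_a), softmax(logits_b)
    return kl_div(pa, pb) + kl_div(pb, pa)
```

When both students agree, the term vanishes; the more their predictions differ, the stronger the pull toward each other.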
Step 112: and respectively inputting each first characteristic diagram to the target Mobilenetv3 model for angle calibration processing to obtain a second characteristic diagram corresponding to each first characteristic diagram.
Here: (1) The Mobilenetv3 model has the characteristics of small size, low computation cost, and high accuracy. (2) The second feature map is the feature map (a normal text box) obtained after the first feature map undergoes angle calibration processing by the target Mobilenetv3 model. (3) The categories of the first feature map (predicted text box) include: normal text boxes, 90-degree text boxes, 180-degree text boxes, and 270-degree text boxes.
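The angle calibration step can be pictured as undoing a rotation predicted by the classifier. The sketch below assumes the four categories above map to rotation angles 0/90/180/270 degrees and uses a plain 2-D list in place of an image crop:

```python
def rotate90(grid):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def calibrate(grid, predicted_angle):
    """Undo a predicted rotation so downstream recognition sees upright text.

    predicted_angle stands in for the output of the Mobilenetv3 angle
    classifier: one of 0, 90, 180, 270 (degrees the crop is rotated by).
    """
    turns = (4 - predicted_angle // 90) % 4  # clockwise turns needed to undo
    for _ in range(turns):
        grid = rotate90(grid)
    return grid
```

In the real pipeline the classifier runs on the cropped text box and the calibrated crop is passed on to the CRNN stage.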
Step 113: and respectively inputting each second feature map into the target CRNN model after self-supervision pre-training to perform text recognition, so as to obtain a target loss value corresponding to each target training sample.
Here: (1) After self-supervised pre-training of the model, the backbone network can identify the angle information of text, which accelerates model convergence and improves the generalization capability of the model; the process of self-supervised pre-training is shown in figure 5. (2) The target loss value is the loss value of a target training sample computed by combining the loss function of the improved deep learning model, the text prediction (the text recognition output of the target CRNN model) of the target training sample, and the text ground truth (the text annotation corresponding to the target training sample).
Specifically, the target CRNN model is first self-supervised pre-trained. Any second feature map is then input into the self-supervised pre-trained target CRNN model for text recognition to obtain the text prediction of the corresponding target training sample, and the target loss value of that sample is computed from the loss function of the improved deep learning model, the text prediction, and the text ground truth of the sample. This is repeated until the target loss value of every target training sample is obtained.
Step 114: and optimizing parameters of the improved deep learning model according to all target loss values to obtain an optimized deep learning model, taking the optimized deep learning model as the improved deep learning model, and returning to the execution step 111 until the maximum iteration number is reached, and determining the optimized deep learning model as the target recognition model.
Here, the optimized deep learning model is the model obtained after optimizing the parameters of the improved deep learning model according to the target loss values of the target training samples; this is the usual way of optimizing deep learning model parameters, and the details are not repeated here.
Specifically, the parameters of the improved deep learning model are optimized according to the target loss values of all target training samples to obtain the optimized deep learning model. Whether the optimized deep learning model satisfies the preset iterative training condition (for example, whether the maximum number of iterations has been reached) is then judged; if not, the optimized deep learning model is taken as the improved deep learning model and step 111 is executed again, until the judgment is yes and the optimized deep learning model obtained after iterative training is determined to be the target recognition model.
In the above preferred technical scheme, the deep learning model is trained through the target DBnet model, the target Mobilenetv3 model, and the target CRNN model, combined with distillation learning, self-supervised pre-training, and other techniques, thereby improving both the efficiency and the accuracy of hand well number recognition.
Preferably, the target DBnet model further comprises an improved FPN network, constructed by enlarging the convolution kernel connected to each up-sampled and fused feature map in the original FPN network, so as to increase the receptive field of the original FPN network.
Here: (1) As shown in fig. 6, the original DBnet model is divided into three parts connected in sequence: a backbone network (Resnet18 network), a neck network (the original FPN network), and a head network. The backbone network mainly extracts features from the original image. The neck network is an FPN structure: the FPN fuses the outputs of the four stages of the Resnet18 network and finally concatenates them together; it mainly fuses the individual feature maps and generates a higher-level feature map. The head network mainly generates a probability map and a threshold map and, after up-sampling, outputs the final text segmentation result. (2) The target DBnet model improves on the neck network (FPN network). As shown in fig. 7, the improved FPN network is constructed as follows: the convolution kernel connected to each target feature map in the original FPN network is enlarged, and a residual network module is set between the feature-map concatenation module of the original FPN network and the DBhead network. Each target feature map is a feature map after up-sampling and fusion, and the residual network module is a residual network with an attention mechanism.
It should be noted that: (1) The improved FPN network enlarges the receptive field and better supports hand well numbers whose text boxes occupy a large area; after the improvement, the recognition accuracy (F1-score) over the whole sample set increases by about 5%, and by about 10% for large text boxes in particular. (2) Because the high-level features extracted through the FPN network are not fully fused and locally effective information is not highlighted, a residual network module is set between the feature-map concatenation module of the original FPN network and the DBhead network in order to fully fuse the feature information and capture locally effective information. The residual network module is a residual network with an attention mechanism (se-block), whose structure is shown in figure 8. Adding the attention mechanism improves F1-score by about 2%, and improves hand well number detection in complex scenes by 5%.
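A toy sketch of the squeeze-and-excitation idea behind the se-block, with a single scalar weight per channel standing in for the learned two-layer excitation MLP (an assumption made for brevity):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def se_block(feature_maps, weights):
    """Toy squeeze-and-excitation: reweight channels by global context.

    feature_maps: list of channels, each a 2-D list (H x W).
    weights: one scalar per channel standing in for the learned
    excitation layers (a real se-block uses two FC layers with a
    bottleneck between them).
    """
    # Squeeze: global average pool each channel down to one number.
    squeezed = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
                for ch in feature_maps]
    # Excitation: gate each channel with a sigmoid.
    gates = [sigmoid(s * w) for s, w in zip(squeezed, weights)]
    # Scale: multiply every value of a channel by its gate.
    return [[[v * g for v in row] for row in ch]
            for ch, g in zip(feature_maps, gates)]
```

The effect is that channels carrying useful global context are amplified and the rest suppressed, which is how the module highlights locally effective information.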
Preferably, the method further comprises:
and compressing the original Mobilenetv3 model based on a filter pruning method to obtain the target Mobilenetv3 model.
As shown in fig. 9, the filter pruning method is as follows: the first-layer loop runs over epochs and is essentially ordinary iterative training; pruning starts after each epoch completes. In the second-layer loop, the geometric median of the convolution kernels is calculated, and the N_{i+1} x P_i convolution kernels nearest to the geometric median are selected and pruned; pruning is still implemented by setting the corresponding parameters to 0.
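The geometric-median selection can be sketched as follows, approximating the geometric median by the filter with the smallest total distance to all others (as in FPGM-style pruning); `fpgm_prune` and its inputs are illustrative names:

```python
def fpgm_prune(filters, prune_num):
    """Zero out the filters nearest the geometric median (FPGM-style sketch).

    filters: list of flattened convolution kernels (lists of floats).
    prune_num: how many filters to set to zero (N_{i+1} x P_i in the text).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    # Approximate the geometric median by the filter minimizing the sum
    # of Euclidean distances to all other filters.
    totals = [sum(dist(f, g) for g in filters) for f in filters]
    median = filters[totals.index(min(totals))]
    # Prune the filters closest to the median by zeroing their weights,
    # matching the "parameter setting 0" pruning mode described above.
    order = sorted(range(len(filters)), key=lambda i: dist(filters[i], median))
    pruned = [list(f) for f in filters]
    for i in order[:prune_num]:
        pruned[i] = [0.0] * len(pruned[i])
    return pruned
```

Filters near the geometric median are the most redundant (the others can represent them), so removing them costs the least accuracy.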
Preferably, the method further comprises:
and replacing the RNN network in the original CRNN model with a Swin-transducer network to obtain the target CRNN model.
Here: (1) The original CRNN model comprises a CNN network, an RNN network, and a CTC loss network connected in sequence. (2) The target CRNN model comprises a CNN network, a Swin-Transformer network, and a CTC loss network connected in sequence. (3) The disadvantages of the original CRNN model are: (a) its recognition accuracy is relatively low, much lower than RARE-type text models on large Chinese character sets, and lower still on English character sets; (b) for short texts with large deformation such as artistic fonts, or texts that vary greatly in natural scenes, CRNN recognition accuracy is low. The on-site hand well number is text in a natural scene, with large variation and heavy noise; in a recognition model trained with the original CRNN model, some pictures are misrecognized, and various optimization attempts bring little improvement. (4) As shown in fig. 11, the target CRNN model improves the RNN part of the original CRNN model by replacing the RNN network with a Swin-Transformer network. (5) The Transformer layers in the Swin-Transformer network have a self-attention mechanism that can effectively capture global text information, and multiple heads map this information into multiple subspaces, enhancing the expressive capacity of the model. The Transformer also has good modal fusion capability: for an image, the initial embeddings obtained by convolution or directly from pixels can be fed into the Transformer without maintaining the H x W x C feature-map structure. Similar to position embeddings, any information can be used easily as long as it can be encoded. The structure of the Swin-Transformer network is shown in FIG. 12; every two consecutive Blocks in fig. 12 involve four steps.
In the first Block: the feature map first passes through a LayerNorm layer and then through W-MSA, followed by a skip connection; the feature map after this first connection passes through a LayerNorm layer again and then through the fully connected MLP layer, followed by another skip connection, giving the feature map after the second connection. In the second Block: the feature map after the second connection in the first Block passes through a LayerNorm layer and then through SW-MSA, followed by a skip connection, giving the feature map after the third connection; this passes through a LayerNorm layer again and then through the MLP layer, followed by a final skip connection, giving the feature map after the fourth connection, which is output to the next part.
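The residual wiring of the two Blocks can be expressed schematically, with `norm`, the two attention variants, and `mlp` as stand-in callables. This shows only the skip-connection structure described above, not windowed attention itself:

```python
def block(x, norm, attention, mlp):
    """One Swin Block: two residual sub-steps on a feature vector x.

    norm, attention, and mlp are placeholder callables for LayerNorm,
    (S)W-MSA, and the MLP respectively.
    """
    # First skip connection: x + Attention(Norm(x))
    h = [a + b for a, b in zip(x, attention(norm(x)))]
    # Second skip connection: h + MLP(Norm(h))
    return [a + b for a, b in zip(h, mlp(norm(h)))]

def swin_stage(x, norm, w_msa, sw_msa, mlp):
    """Two consecutive Blocks: windowed then shifted-window attention."""
    x = block(x, norm, w_msa, mlp)    # first Block uses W-MSA
    x = block(x, norm, sw_msa, mlp)   # second Block uses SW-MSA
    return x
```

The four residual additions here correspond to the four "connections" described in the text.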
Preferably, the method further comprises:
and obtaining a plurality of original training samples containing the numbers of the human hand wells, and respectively carrying out data enhancement processing on each original training sample to obtain a plurality of first training samples.
Here: (1) The original training samples are training samples containing hand well numbers that have not been processed in any way. (2) The first training samples are training samples containing hand well numbers after data enhancement processing. (3) The data enhancement processing includes, but is not limited to: changing brightness, changing color difference, changing saturation, image cropping, mosaicing, embossing, transparency sharpening, and so on. (4) The original training samples are obtained by means including, but not limited to, field collection.
Specifically, a plurality of original training samples containing hand well numbers are acquired through field collection; data enhancement processing is performed on any original training sample to obtain at least one first training sample corresponding to it, and this is repeated until at least one first training sample corresponding to each original training sample (that is, a plurality of first training samples) is obtained.
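One of the listed transforms, brightness change, sketched on a grayscale pixel grid; the `augment` wrapper and its delta range are illustrative assumptions, and each of the other listed transforms would be an analogous function:

```python
import random

def change_brightness(pixels, delta):
    """Shift every grayscale pixel by delta, clamped to [0, 255]."""
    return [[min(255, max(0, p + delta)) for p in row] for row in pixels]

def augment(sample, rng):
    """Produce one augmented copy of an original training sample.

    Only brightness is sketched here; color difference, saturation,
    cropping, mosaicing, embossing, and sharpening would plug in the
    same way.
    """
    return change_brightness(sample, rng.randint(-30, 30))
```

Running `augment` several times per original sample yields the "at least one first training sample per original sample" described above.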
Resolution enhancement processing is then performed on each first training sample to obtain a plurality of target training samples.
Here, a target training sample is a first training sample after resolution enhancement processing.
Specifically, resolution enhancement processing is performed on any first training sample to obtain the corresponding target training sample, and this is repeated until the target training sample corresponding to each first training sample is obtained.
It should be noted that, besides data enhancement and resolution enhancement, a plurality of target training samples may also be obtained through sample expansion. Among the images containing hand well numbers collected so far, standard fonts account for about 70 percent and paint-sprayed fonts for 30 percent; standard fonts are easier to recognize, and paint-sprayed fonts are the most difficult. Since open-source models are based on standard fonts and handwriting, there are currently no corresponding samples for paint-sprayed character recognition, and since character recognition requires more than a hundred thousand training samples, samples can only be added through manual synthesis. The manual synthesis steps are: (1) Font selection: a paint-spray font is selected. (2) Character generation: after the corresponding ttf font file is obtained, labeled training samples are generated automatically. The generation strategies for hand well numbers include, but are not limited to: 1) purely numeric hand well numbers of 1-15 digits; 2) hand well numbers of digits and letters; 3) pure Chinese characters; 4) Chinese characters and digits; 5) Chinese characters, digits, and letters; 6) vertical hand well numbers; 7) horizontal hand well numbers. (3) Character splicing: generating the paint-spray font solves the foreground characters; background pictures must also be generated automatically, so several regions are randomly cropped from original pictures and spliced together to produce the original training sample data.
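A few of the generation strategies can be sketched as random label synthesis. The character pools and function name are illustrative assumptions, and real samples would additionally be rendered in the selected paint-spray font onto spliced backgrounds:

```python
import random
import string

def generate_well_number(rng):
    """Generate one synthetic hand well number label.

    Sketches strategies 1 (pure digits, 1-15 characters) and 2
    (digits and letters); the remaining strategies would add Chinese
    character pools and layout direction in the same way.
    """
    length = rng.randint(1, 15)
    if rng.random() < 0.5:
        # strategy 1: purely numeric number
        return "".join(rng.choice(string.digits) for _ in range(length))
    # strategy 2: mixed digits and letters
    pool = string.digits + string.ascii_uppercase
    return "".join(rng.choice(pool) for _ in range(length))
```

Each generated string is its own label, which is what makes the synthesized samples automatically annotated.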
In the above preferred technical solution, data enhancement, resolution enhancement and sample expansion are further performed on the training samples, which guarantees a sufficient number of training samples, alleviates the problem of data imbalance during training, and improves the accuracy of the model.
Furthermore, the method may further include:

verifying the recognition result of the manhole image to be detected based on the manhole information database; if the verification succeeds, outputting that the manhole image to be detected is recognized successfully, and if the verification fails, outputting prompt information indicating that verification of the manhole image to be detected has failed.
Here, (1) the manhole information database stores in advance the position information, number information and the like of a plurality of manholes; (2) the prompt information includes, but is not limited to, "error in current number", "maintenance required", and the like.
Specifically, the number information of the manhole corresponding to the image to be detected is obtained from the manhole information database, and the manhole number contained in the recognition result is compared with that number information. If the verification succeeds, it is output that the manhole image to be detected is recognized successfully and no further investigation is needed; if the verification fails, it is output that recognition of the manhole image to be detected has failed, and prompt information is sent to the corresponding terminal to notify the relevant personnel to handle it.
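A minimal sketch of this verification step, with the database represented as an in-memory mapping from well ID to its registered record; the field name `"number"` and the message strings are assumptions for illustration:

```python
def verify_recognition(recognized_number, well_id, well_db):
    """Compare the OCR result against the pre-stored number for this
    manhole; return (ok, message) so the caller can push a prompt."""
    record = well_db.get(well_id)
    if record is None:
        return False, "maintenance required"   # well not registered
    if recognized_number == record["number"]:
        return True, "recognition successful"  # no investigation needed
    return False, "error in current number"    # notify relevant personnel
```

In practice the lookup would hit the manhole information database and the failure branch would trigger the push notification to the terminal.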
By comparing the recognition result with the manhole information database for verification, manhole information can be pushed according to the verification result, which improves the operation and maintenance efficiency of manholes.
It should be understood that, although the steps in the flowcharts of figs. 1 and 3 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in figs. 1 and 3 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different moments, and which are not necessarily executed sequentially but may be executed in turn or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
Fig. 13 is a block diagram showing an embodiment of a human hand well number recognition device provided by the present invention. As shown in fig. 13, the apparatus 200 includes: a training module 210 and an identification module 220.
The training module 210 is configured to train the improved deep learning model based on a plurality of target training samples containing hand well numbers, to obtain a target recognition model for hand well number recognition; wherein the improved deep learning model comprises a target DBnet model, a target Mobilenetv3 model and a target CRNN model which are sequentially connected; the target DBnet model comprises a residual network with an attention mechanism; the target DBnet model is used for detecting text boxes in an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map.

The identification module 220 is configured to input the hand well image to be detected into the target recognition model, to obtain a recognition result of the hand well image to be detected.

According to the above hand well number recognition device, the original deep learning model is improved, thereby improving the accuracy of hand well number recognition and reducing the investment of personnel.
Preferably, the training module 210 includes: a first training module 211, a second training module 212, a third training module 213, and an iterative training module 214;
the first training module 211 is configured to respectively input each target training sample into the target DBnet model for text box detection, to obtain a first feature map corresponding to each target training sample;

the second training module 212 is configured to respectively input each first feature map into the target Mobilenetv3 model for angle calibration processing, to obtain a second feature map corresponding to each first feature map;
the third training module 213 is configured to input each second feature map to the target CRNN model after self-supervision pre-training to perform text recognition, so as to obtain a target loss value corresponding to each target training sample;
the iterative training module 214 is configured to optimize parameters of the improved deep learning model according to all target loss values, obtain an optimized deep learning model, take the optimized deep learning model as the improved deep learning model, and return to invoke the first training module until the maximum number of iterations is reached, and determine the optimized deep learning model as the target recognition model.
In the above preferred technical solution, the step of inputting each target training sample to the target DBnet model to perform text box detection to obtain a first feature map corresponding to each target training sample includes:
based on a distillation mode of mutual learning of a plurality of student models, each target training sample is respectively input into the target DBnet model for text box detection, and a first feature map corresponding to each target training sample is obtained.
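The distillation by mutual learning of multiple student models can be sketched for two students as follows: each student minimizes its own task loss plus a KL-divergence term that pulls its prediction toward the other student's (detached) prediction. This is a generic deep-mutual-learning step; the patent does not specify the exact loss form or weighting:

```python
import torch
import torch.nn.functional as F

def mutual_learning_step(logits_a, logits_b, target):
    """One mutual-learning step for two student models: returns the
    loss for each student (task loss + KL toward the other student)."""
    task_a = F.cross_entropy(logits_a, target)
    task_b = F.cross_entropy(logits_b, target)
    # Each KL term treats the other student's prediction as a fixed
    # soft target (detached so gradients flow to one student at a time).
    kl_ab = F.kl_div(F.log_softmax(logits_a, dim=1),
                     F.softmax(logits_b, dim=1).detach(),
                     reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b, dim=1),
                     F.softmax(logits_a, dim=1).detach(),
                     reduction="batchmean")
    return task_a + kl_ab, task_b + kl_ba
```

Each student's optimizer then steps on its own returned loss.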
In the above preferred technical solution, the target DBnet model further includes an improved FPN network. The construction process of the improved FPN network includes: increasing the size of the convolution kernel connected after each up-sampled and fused feature map in the original FPN network, so as to enlarge the receptive field of the original FPN network.
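One way to realize the enlarged kernel is to replace the usual 3x3 smoothing convolution that follows each upsample-and-add fusion in an FPN with a larger one. The 5x5 kernel and nearest-neighbor upsampling below are illustrative assumptions, not details fixed by the text:

```python
import torch
import torch.nn as nn

class FuseBlock(nn.Module):
    """One FPN fusion step: upsample the coarser (top-down) map, add the
    lateral map, then smooth with an enlarged 5x5 kernel (padding 2)
    instead of the usual 3x3, increasing the receptive field."""

    def __init__(self, channels=256, kernel_size=5):
        super().__init__()
        self.smooth = nn.Conv2d(channels, channels, kernel_size,
                                padding=kernel_size // 2)

    def forward(self, top_down, lateral):
        up = nn.functional.interpolate(top_down, size=lateral.shape[-2:],
                                       mode="nearest")
        return self.smooth(up + lateral)
```

The `padding = kernel_size // 2` choice keeps the spatial size unchanged, so the block is a drop-in replacement for the original 3x3 smoothing convolution.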
In the above preferred technical solution, the method further includes:
and compressing the original Mobilenetv3 model based on a filter pruning method to obtain the target Mobilenetv3 model.
In the above preferred technical solution, the method further includes:
replacing the RNN network in the original CRNN model with a Swin-Transformer network, to obtain the target CRNN model; wherein the original CRNN model comprises a CNN network, an RNN network and a CTC loss network which are sequentially connected.
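The modified recognizer can be sketched as follows: a small CNN backbone produces a width-wise feature sequence, a Transformer encoder replaces the BiLSTM, and a linear head emits per-timestep class scores for the CTC loss. A standard `nn.TransformerEncoder` stands in here for the Swin-Transformer named in the text, and all shapes and layer sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ConvTransformerCTC(nn.Module):
    """CRNN-style recognizer with the recurrent stage swapped for a
    Transformer encoder (a stand-in for the Swin-Transformer)."""

    def __init__(self, num_classes, dim=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, dim, 3, stride=(2, 1), padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)))     # collapse height to 1
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_classes)  # classes incl. CTC blank

    def forward(self, x):                        # x: (B, 1, H, W)
        f = self.cnn(x).squeeze(2).permute(0, 2, 1)   # (B, W, dim)
        return self.head(self.encoder(f)).log_softmax(dim=-1)
```

The log-probabilities per width position feed directly into `nn.CTCLoss` during training.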
In the above preferred technical solution, the method further includes:
acquiring a plurality of original training samples containing a human hand well number, and respectively carrying out data enhancement processing on each original training sample to obtain a plurality of first training samples;
and respectively carrying out resolution enhancement processing on each first training sample to obtain a plurality of target training samples.
In the above preferred technical solution, data enhancement, resolution enhancement and sample expansion are further performed on the training samples, which guarantees a sufficient number of training samples, alleviates the problem of data imbalance during training, and improves the accuracy of the model.
For specific limitations of the manhole number recognition device, reference may be made to the above limitations of the manhole number recognition method, which are not repeated here. All or part of the modules in the manhole number recognition device may be implemented by software, by hardware, or by a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can invoke and execute the operations corresponding to the above modules.
As shown in fig. 14, in one embodiment, a computer device is provided comprising a memory storing a computer program and a processor that when executing the computer program performs the steps of:
Training the improved deep learning model based on a plurality of target training samples containing the hand well numbers to obtain a target recognition model for recognizing the hand well numbers; wherein the improved deep learning model comprises: sequentially connecting the set target DBnet model, target Mobilenetv3 model and target CRNN model; wherein the target DBnet model comprises: a residual network with an attention mechanism; the target DBnet model is used for detecting a text box of an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map;
and inputting the hand well image to be detected into the target recognition model to obtain a recognition result of the hand well image to be detected.
In one embodiment, the step of training the improved deep learning model based on a plurality of target training samples including the manhole number to obtain a target recognition model for manhole number recognition includes:
and respectively inputting each target training sample into the target DBnet model to perform text box detection, and obtaining a first feature map corresponding to each target training sample.
Respectively inputting each first characteristic diagram to the target Mobilenetv3 model for angle calibration treatment to obtain a second characteristic diagram corresponding to each first characteristic diagram;
respectively inputting each second feature map into the target CRNN model after self-supervision pre-training to perform text recognition, and obtaining a target loss value corresponding to each target training sample;
and optimizing parameters of the improved deep learning model according to all target loss values to obtain an optimized deep learning model, taking the optimized deep learning model as the improved deep learning model, and returning to execute the step of respectively inputting each target training sample into the target DBnet model for text box detection until the maximum iteration number is reached, and determining the optimized deep learning model as the target recognition model.
In one embodiment, the step of inputting each target training sample to the target DBnet model to perform text box detection to obtain a first feature map corresponding to each target training sample includes:
based on a distillation mode of mutual learning of a plurality of student models, each target training sample is respectively input into the target DBnet model for text box detection, and a first feature map corresponding to each target training sample is obtained.
In one embodiment, the target DBnet model further comprises an improved FPN network. The construction process of the improved FPN network includes: increasing the size of the convolution kernel connected after each up-sampled and fused feature map in the original FPN network, so as to enlarge the receptive field of the original FPN network.
In one embodiment, further comprising:
and compressing the original Mobilenetv3 model based on a filter pruning method to obtain the target Mobilenetv3 model.
In one embodiment, further comprising:
replacing the RNN network in the original CRNN model with a Swin-Transformer network, to obtain the target CRNN model; wherein the original CRNN model comprises a CNN network, an RNN network and a CTC loss network which are sequentially connected.
In one embodiment, further comprising:
acquiring a plurality of original training samples containing a human hand well number, and respectively carrying out data enhancement processing on each original training sample to obtain a plurality of first training samples;
and respectively carrying out resolution enhancement processing on each first training sample to obtain a plurality of target training samples.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
Training the improved deep learning model based on a plurality of target training samples containing the hand well numbers to obtain a target recognition model for recognizing the hand well numbers; wherein the improved deep learning model comprises: sequentially connecting the set target DBnet model, target Mobilenetv3 model and target CRNN model; wherein the target DBnet model comprises: a residual network with an attention mechanism; the target DBnet model is used for detecting a text box of an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map;
and inputting the hand well image to be detected into the target recognition model to obtain a recognition result of the hand well image to be detected.
In one embodiment, the step of training the improved deep learning model based on a plurality of target training samples including the manhole number to obtain a target recognition model for manhole number recognition includes:
and respectively inputting each target training sample into the target DBnet model to perform text box detection, and obtaining a first feature map corresponding to each target training sample.
Respectively inputting each first characteristic diagram to the target Mobilenetv3 model for angle calibration treatment to obtain a second characteristic diagram corresponding to each first characteristic diagram;
respectively inputting each second feature map into the target CRNN model after self-supervision pre-training to perform text recognition, and obtaining a target loss value corresponding to each target training sample;
and optimizing parameters of the improved deep learning model according to all target loss values to obtain an optimized deep learning model, taking the optimized deep learning model as the improved deep learning model, and returning to execute the step of respectively inputting each target training sample into the target DBnet model for text box detection until the maximum iteration number is reached, and determining the optimized deep learning model as the target recognition model.
In one embodiment, the step of inputting each target training sample to the target DBnet model to perform text box detection to obtain a first feature map corresponding to each target training sample includes:
based on a distillation mode of mutual learning of a plurality of student models, each target training sample is respectively input into the target DBnet model for text box detection, and a first feature map corresponding to each target training sample is obtained.
In one embodiment, the target DBnet model further comprises an improved FPN network. The construction process of the improved FPN network includes: increasing the size of the convolution kernel connected after each up-sampled and fused feature map in the original FPN network, so as to enlarge the receptive field of the original FPN network.
In one embodiment, further comprising:
and compressing the original Mobilenetv3 model based on a filter pruning method to obtain the target Mobilenetv3 model.
In one embodiment, further comprising:
replacing the RNN network in the original CRNN model with a Swin-Transformer network, to obtain the target CRNN model; wherein the original CRNN model comprises a CNN network, an RNN network and a CTC loss network which are sequentially connected.
In one embodiment, further comprising:
acquiring a plurality of original training samples containing a human hand well number, and respectively carrying out data enhancement processing on each original training sample to obtain a plurality of first training samples;
and respectively carrying out resolution enhancement processing on each first training sample to obtain a plurality of target training samples.
Those skilled in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware, the computer program being stored on a non-transitory computer readable storage medium which, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features have been described; nevertheless, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
The above examples illustrate only a few embodiments of the application, which are described in detail and are not to be construed as limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (10)

1. A method for identifying a hand well number, the method comprising:
training the improved deep learning model based on a plurality of target training samples containing the hand well numbers to obtain a target recognition model for recognizing the hand well numbers; wherein the improved deep learning model comprises: sequentially connecting the set target DBnet model, target Mobilenetv3 model and target CRNN model; wherein the target DBnet model comprises: a residual network with an attention mechanism; the target DBnet model is used for detecting a text box of an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map;
And inputting the hand well image to be detected into the target recognition model to obtain a recognition result of the hand well image to be detected.
2. The method of claim 1, wherein the training the improved deep learning model based on a plurality of target training samples including a human hand well number to obtain a target recognition model for human hand well number recognition comprises:
respectively inputting each target training sample into the target DBnet model for text box detection, to obtain a first feature map corresponding to each target training sample;

respectively inputting each first feature map into the target Mobilenetv3 model for angle calibration processing, to obtain a second feature map corresponding to each first feature map;
respectively inputting each second feature map into the target CRNN model after self-supervision pre-training to perform text recognition, and obtaining a target loss value corresponding to each target training sample;
and optimizing parameters of the improved deep learning model according to all target loss values to obtain an optimized deep learning model, taking the optimized deep learning model as the improved deep learning model, and returning to execute the step of respectively inputting each target training sample into the target DBnet model for text box detection until the maximum iteration number is reached, and determining the optimized deep learning model as the target recognition model.
3. The method of claim 2, wherein the step of inputting each target training sample to the target DBnet model for text box detection to obtain a first feature map corresponding to each target training sample includes:
based on a distillation mode of mutual learning of a plurality of student models, each target training sample is respectively input into the target DBnet model for text box detection, and a first feature map corresponding to each target training sample is obtained.
4. A method according to any one of claims 1 to 3, wherein the target DBnet model further comprises an improved FPN network; the construction process of the improved FPN network includes: increasing the size of the convolution kernel connected after each up-sampled and fused feature map in the original FPN network, so as to enlarge the receptive field of the original FPN network.
5. The method as recited in claim 1, further comprising:
and compressing the original Mobilenetv3 model based on a filter pruning method to obtain the target Mobilenetv3 model.
6. The method as recited in claim 1, further comprising:
replacing the RNN network in the original CRNN model with a Swin-Transformer network, to obtain the target CRNN model; wherein the original CRNN model comprises a CNN network, an RNN network and a CTC loss network which are sequentially connected.
7. The method as recited in claim 1, further comprising:
acquiring a plurality of original training samples containing a human hand well number, and respectively carrying out data enhancement processing on each original training sample to obtain a plurality of first training samples;
and respectively carrying out resolution enhancement processing on each first training sample to obtain a plurality of target training samples.
8. A human hand well number identification device, the device comprising:
the training module is used for training the improved deep learning model based on a plurality of target training samples containing the hand well numbers to obtain a target recognition model for recognizing the hand well numbers; wherein the improved deep learning model comprises: sequentially connecting the set target DBnet model, target Mobilenetv3 model and target CRNN model; wherein the target DBnet model comprises: a residual network with an attention mechanism; the target DBnet model is used for detecting a text box of an image, the target Mobilenetv3 model is used for performing angle calibration on the feature map, and the target CRNN model is used for performing text recognition on the feature map;
and the identification module is used for inputting the hand well image to be detected into the target identification model to obtain the identification result of the hand well image to be detected.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202211710215.3A 2022-12-29 2022-12-29 Method and device for identifying hand well number, computer equipment and storage medium Pending CN116778497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211710215.3A CN116778497A (en) 2022-12-29 2022-12-29 Method and device for identifying hand well number, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211710215.3A CN116778497A (en) 2022-12-29 2022-12-29 Method and device for identifying hand well number, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116778497A true CN116778497A (en) 2023-09-19

Family

ID=87988386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211710215.3A Pending CN116778497A (en) 2022-12-29 2022-12-29 Method and device for identifying hand well number, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116778497A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117746267A (en) * 2023-12-14 2024-03-22 广西环保产业投资集团有限公司 Crown extraction method, device and medium based on semi-supervised active learning


Similar Documents

Publication Publication Date Title
US11922318B2 (en) System and method of character recognition using fully convolutional neural networks with attention
CN109902622B (en) Character detection and identification method for boarding check information verification
CN111723585B (en) Style-controllable image text real-time translation and conversion method
CN111046784B (en) Document layout analysis and identification method and device, electronic equipment and storage medium
CN112232149B (en) Document multimode information and relation extraction method and system
CA3124358C (en) Method and system for identifying citations within regulatory content
US10896357B1 (en) Automatic key/value pair extraction from document images using deep learning
CN110516541B (en) Text positioning method and device, computer readable storage medium and computer equipment
CN112801010A (en) Visual rich document information extraction method for actual OCR scene
RU2641225C2 (en) Method of detecting necessity of standard learning for verification of recognized text
CN110689012A (en) End-to-end natural scene text recognition method and system
CN114005123A (en) System and method for digitally reconstructing layout of print form text
CN114155527A (en) Scene text recognition method and device
CN116311310A (en) Universal form identification method and device combining semantic segmentation and sequence prediction
CN111027553A (en) Character recognition method for circular seal
CN116778497A (en) Method and device for identifying hand well number, computer equipment and storage medium
CN115512378A (en) Chinese environment mathematical formula extraction and identification method based on Transformer
CN112183542A (en) Text image-based recognition method, device, equipment and medium
CN111126059B (en) Short text generation method, short text generation device and readable storage medium
CN110674721A (en) Method for automatically detecting test paper layout formula
CN114202765A (en) Image text recognition method and storage medium
Hossain et al. Neural net based complete character recognition scheme for Bangla printed text books
CN116311275B (en) Text recognition method and system based on seq2seq language model
CN113255613B (en) Question judging method and device and computer storage medium
CN115797853B (en) Attention and multi-scale pooling-based rock residue image processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination