
Deep learning-based face sheet information positioning method and system

Info

Publication number
CN110942008B
CN110942008B (application CN201911149243.0A)
Authority
CN
China
Prior art keywords
frames
deep learning
frame
neural network
face sheet
Prior art date
Legal status
Active
Application number
CN201911149243.0A
Other languages
Chinese (zh)
Other versions
CN110942008A (en)
Inventor
张春月
孙跃峰
Current Assignee
Yto Express Co ltd
Original Assignee
Yto Express Co ltd
Priority date
Filing date
Publication date
Application filed by Yto Express Co ltd filed Critical Yto Express Co ltd
Priority to CN201911149243.0A
Publication of CN110942008A
Application granted
Publication of CN110942008B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based method and system for positioning face sheet information, which can provide more accurate positioning of the effective information on electronic face sheets (such as recipient and sender information), for example as applied in the express-delivery field. The technical scheme is as follows: image features and convolutional-layer feature maps are extracted from an input face sheet image through a convolutional neural network model; bounding-box regression and classification are performed on the feature maps of the different convolutional layers; based on the regressed and classified feature maps, the convolutional neural network model is trained through a loss function to generate candidate boxes; the region with the highest confidence that contains the target is selected from the candidate boxes to obtain bounding-box position information; and the image is cropped according to the obtained bounding-box position information to obtain a picture containing the effective information.

Description

Deep learning-based face sheet information positioning method and system
Technical Field
The invention relates to deep learning image detection technology, and in particular to a deep learning-based method and system for positioning (express) face sheet information.
Background
In recent years, with the rapid development of e-commerce platforms such as Taobao, the business volume of the express logistics industry has grown explosively; in particular, enormous numbers of express waybills are generated during major e-commerce shopping events such as the mid-year sale and Double Eleven. This creates great difficulty for express companies when processing waybills, and how to process the effective information on an express waybill (namely the recipient and sender information) quickly and effectively is a problem worth studying.
Deep learning methods have advanced greatly with the rapid development of deep learning theory in recent years; in particular, detection accuracy has improved substantially over earlier algorithms. A large body of literature shows that features self-learned by deep networks describe detection targets better than hand-crafted ones, avoiding complex feature-extraction and data-modeling pipelines. The mainstream deep learning method is the convolutional neural network (CNN), first applied to the MNIST handwritten-digit dataset. The current mainstream object detectors are the two-stage R-CNN family: the earliest R-CNN algorithm uses Selective Search to generate roughly 2000-3000 candidate regions per image, then extracts features from the candidate regions with a convolutional neural network and classifies them; the later Fast R-CNN and Faster R-CNN algorithms are improvements on R-CNN. However, although these two-stage detectors achieve high accuracy, they are difficult to deploy in practice, mainly because the large network structure entails a huge amount of computation and struggles to meet real-time requirements. Therefore, we propose an improved single-stage neural network model to locate information on electronic express waybills.
Disclosure of Invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The invention aims to solve the above problems by providing a deep learning-based method and system for positioning face sheet information, which can more accurately locate the effective information (such as recipient and sender information) on electronic face sheets used in the express-delivery field.
The technical scheme of the invention is as follows: the invention discloses a face sheet information positioning method based on deep learning, which comprises the following steps:
step 1: extracting image features and convolutional-layer feature maps from an input face sheet image through a convolutional neural network model;
step 2: performing bounding-box regression and classification on the feature maps of the different convolutional layers output in step 1;
step 3: training the convolutional neural network model through a loss function, based on the feature maps regressed and classified in step 2, to generate candidate boxes;
step 4: selecting from the candidate boxes the region with the highest confidence that contains the target, to obtain bounding-box position information;
step 5: cropping according to the obtained bounding-box position information to obtain a picture containing the effective information.
According to an embodiment of the deep learning-based face sheet information positioning method, step 1 extracts image features by adopting a VGG16 convolutional neural network model, and outputs the feature maps of different convolutional layers while extracting the image features.
According to an embodiment of the deep learning-based face sheet information positioning method, feature extraction uses the convolutional layers up to and including layer conv6-2 of the VGG16 convolutional neural network model, and the convolutional layers after conv6-2 are deleted.
According to an embodiment of the deep learning-based face sheet information positioning method, step 2 predicts the category and coordinates of the object by adopting a series of small convolution modules, performing regression and classification on feature maps from different layers with different receptive fields.
According to an embodiment of the deep learning-based face sheet information positioning method, the loss function in step 3 includes a classification error and a localization error, and the prediction errors of classification and localization are reduced by minimizing the loss.
According to an embodiment of the deep learning-based face sheet information positioning method, in step 4 the feature map of any face sheet image contains a plurality of candidate boxes for object detection, these candidate boxes have overlapping portions, and non-maximum suppression is used to retain the optimal boxes. Assuming there are N boxes, and the score computed by the classifier for each box is S_i, where 1 <= i <= N, the processing steps of non-maximum suppression include:
the first step: constructing a set H for storing the candidate boxes to be processed, initialized to contain all N boxes; constructing a set M for storing the optimal boxes, initialized to the empty set;
the second step: sorting the boxes in set H by score, selecting the box m with the highest score, and moving it from set H to set M;
the third step: traversing the boxes in set H, computing for each its intersection-over-union (IoU) with box m; if the IoU is higher than a threshold, the box is considered to overlap box m and is removed from set H;
the fourth step: returning to the second step and iterating until set H is empty; the boxes in set M are the boxes required in the face sheet image.
According to an embodiment of the deep learning-based face sheet information positioning method, step 5 performs cropping with the OpenCV tool.
The invention also discloses a face sheet information positioning system based on deep learning, which comprises:
the feature extraction module is used for extracting image features and convolutional-layer feature maps from the input face sheet image through a convolutional neural network model;
the regression classification module is used for performing bounding-box regression and classification on the feature maps of the different convolutional layers output by the feature extraction module;
the model training module is used for training the convolutional neural network model through a loss function, based on the feature maps output by the regression classification module, to generate candidate boxes;
the screening module is used for screening from the candidate boxes the region with the highest confidence that contains the target, to obtain bounding-box position information;
and the cropping module is used for cropping according to the obtained bounding-box position information to obtain a picture containing the effective information.
According to one embodiment of the deep learning-based face sheet information positioning system, the feature extraction module adopts a VGG16 convolutional neural network model to extract image features, and outputs the feature maps of different convolutional layers while extracting the image features.
According to one embodiment of the deep learning-based face sheet information positioning system, feature extraction uses the convolutional layers up to and including layer conv6-2 of the VGG16 convolutional neural network model, and the convolutional layers after conv6-2 are deleted.
According to one embodiment of the deep learning-based face sheet information positioning system, the regression classification module predicts the category and coordinates of the object by adopting a series of small convolution modules, performing regression and classification on feature maps from different layers with different receptive fields.
According to one embodiment of the deep learning-based face sheet information positioning system, the loss function in the model training module includes a classification error and a localization error, and the prediction errors of classification and localization are reduced by minimizing the loss.
According to an embodiment of the deep learning-based face sheet information positioning system, in the screening module the feature map of any face sheet image contains a plurality of candidate boxes for object detection, these candidate boxes have overlapping portions, and non-maximum suppression is used to retain the optimal boxes. Assuming there are N boxes, and the score computed by the classifier for each box is S_i, where 1 <= i <= N, the processing steps of non-maximum suppression in the screening module include:
the first step: constructing a set H for storing the candidate boxes to be processed, initialized to contain all N boxes; constructing a set M for storing the optimal boxes, initialized to the empty set;
the second step: sorting the boxes in set H by score, selecting the box m with the highest score, and moving it from set H to set M;
the third step: traversing the boxes in set H, computing for each its intersection-over-union (IoU) with box m; if the IoU is higher than a threshold, the box is considered to overlap box m and is removed from set H;
the fourth step: returning to the second step and iterating until set H is empty; the boxes in set M are the boxes required in the face sheet image.
According to one embodiment of the deep learning-based face sheet information positioning system, the cropping module performs cropping with the OpenCV tool.
The invention also discloses a face sheet information positioning system based on deep learning, which comprises:
a processor; and
a memory configured to store a series of computer-executable instructions and computer-accessible data associated with the series of computer-executable instructions,
wherein the series of computer executable instructions, when executed by the processor, cause the processor to perform the method as described above.
Also disclosed is a non-transitory computer-readable storage medium having stored thereon a series of computer-executable instructions that, when executed by a computing device, cause the computing device to perform a method as previously described.
Compared with the prior art, the invention has the following beneficial effects: the invention discloses a face sheet information positioning method and system based on a deep learning convolutional neural network. Unlike traditional face sheet information positioning techniques based on image processing, the method of the invention has good robustness and higher accuracy. Specifically, the invention extracts image features with the convolutional layers of the VGG16 convolutional neural network model up to conv6-2: after the input face sheet image is preprocessed, the convolutional features of the face sheet image are extracted with this truncated VGG16 backbone, an image-pyramid structure is built from feature maps of different sizes for prediction, and softmax classification and position regression are performed on multiple feature maps simultaneously. This avoids the insensitivity of low-dimensional feature extraction during detection and improves detection accuracy.
Drawings
The above features and advantages of the present invention will be better understood after reading the detailed description of embodiments of the present disclosure in conjunction with the following drawings. In the drawings, the components are not necessarily to scale and components having similar related features or characteristics may have the same or similar reference numerals.
FIG. 1 is a flow chart of an embodiment of a deep learning based face sheet information positioning method of the present invention.
Fig. 2 shows a schematic structural diagram of the convolutional neural network of the present invention.
FIG. 3 illustrates a schematic diagram of one embodiment of a deep learning based face sheet information positioning system of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the specific embodiments. It is noted that the aspects described below in connection with the drawings and the specific embodiments are merely exemplary and should not be construed as limiting the scope of the invention in any way.
FIG. 1 shows a flow of one embodiment of the deep learning based face sheet information positioning method of the present invention. Referring to fig. 1, the following is a detailed description of the implementation steps of the positioning method of the present embodiment.
Step S1: and extracting image features and a convolutional layer feature map from the input face sheet image through a convolutional neural network model.
In this embodiment, the VGG16 convolutional neural network model is used to extract image features, and different convolutional layer feature maps are output while the image features are extracted.
Because the detection target is relatively small (such as the name of the addressee on the express delivery piece), and the characteristic diagram receptive field of the SSD at the lower layer is relatively small and the receptive field of the SSD at the upper layer is relatively large, the embodiment adopts the front convolution layer 6-2 layer in the VGG16 convolution neural network model shown in fig. 2 to perform characteristic extraction, and deletes the convolution layers after the convolution layer 6-2 layer, so that the accuracy is not affected, and the recognition size requirement can be met.
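The patent gives no source code. For orientation only, a minimal PyTorch-style sketch of a VGG16-like backbone truncated in this spirit, returning feature maps from several layers, could look as follows (the layer names, channel widths and the stand-in "conv6-2" block are assumptions, not the patented implementation):

# Hypothetical sketch: truncated VGG16-style backbone that returns
# multi-scale feature maps, cut off after an assumed "conv6-2" block
# (the later convolutional layers are simply not built).
import torch
import torch.nn as nn

def conv_stage(in_ch, out_ch, n_convs):
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class TruncatedVGGBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = conv_stage(3, 64, 2)     # VGG16 conv1_x
        self.stage2 = conv_stage(64, 128, 2)   # conv2_x
        self.stage3 = conv_stage(128, 256, 3)  # conv3_x
        self.stage4 = conv_stage(256, 512, 3)  # conv4_x: higher-resolution map for small targets
        self.stage5 = conv_stage(512, 512, 3)  # conv5_x
        # Assumed extra block standing in for "conv6-2"; everything after it is deleted.
        self.conv6 = nn.Sequential(
            nn.Conv2d(512, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.stage3(self.stage2(self.stage1(x)))
        f1 = self.stage4(x)
        f2 = self.stage5(f1)
        f3 = self.conv6(f2)
        return [f1, f2, f3]    # multi-scale feature maps for the detection heads

feats = TruncatedVGGBackbone()(torch.randn(1, 3, 512, 512))
print([tuple(f.shape) for f in feats])  # (1,512,32,32), (1,512,16,16), (1,512,8,8)

Keeping several resolutions matters here because, as noted above, the lower-layer maps with small receptive fields are what allow small targets such as a recipient name to be detected.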
Step S2: and (3) carrying out regression and classification of the boundary boxes on the feature graphs of different convolution layers output in the step S1.
The method belongs to a detection link, and particularly adopts a series of small convolution modules to predict the category and the coordinates of an object, and carries out regression and classification on the feature images with different layers and different receptive fields.
For example, for each training image, the input image is preprocessed to 512 x 512, and the images are randomly selected as follows: using the original image; randomly sampling a plurality of patches (CropImage, cropped image blocks) and the minimum intersection ratio with the object is: 0.1,0.3,0.5,0.7 and 0.9. The sample patch is the original image size scale [0.3,1.0], the spatial scale ratio is 0.5 or 2. This cropped image block is preserved when the center (center) of the real bounding box is in the sampled patch and the real bounding box area in the sampled patch is greater than 0. After these sampling steps, each sampled patch is resized to a fixed size and randomly flipped (horizontally flipped) horizontally with a probability of 0.5 such that a sample is sampled by the batch samplers to generate a plurality of candidate samples, and then randomly selecting a sample from among them for network training.
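As a loose illustration of this sampling scheme (a sketch under assumptions: the helper names, the retry limit and the exact acceptance test are not from the patent), the random-crop step could be written as:

# Hypothetical sketch of the SSD-style random-crop sampling described above.
import random

def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def sample_patch(img_w, img_h, gt_boxes, max_tries=50):
    """Return a crop rectangle (x1, y1, x2, y2), or None to keep the original image."""
    min_iou = random.choice([None, 0.1, 0.3, 0.5, 0.7, 0.9])  # None = use original
    if min_iou is None:
        return None
    for _ in range(max_tries):
        scale = random.uniform(0.3, 1.0)       # fraction of the original size
        ratio = random.choice([0.5, 2.0])      # aspect-ratio options
        w, h = int(img_w * scale), int(img_h * scale * ratio)
        if not (0 < w <= img_w and 0 < h <= img_h):
            continue
        x = random.randint(0, img_w - w)
        y = random.randint(0, img_h - h)
        patch = (x, y, x + w, y + h)
        # keep the patch only if some ground-truth center falls inside it
        centers_in = [b for b in gt_boxes
                      if x < (b[0] + b[2]) / 2 < x + w
                      and y < (b[1] + b[3]) / 2 < y + h]
        if centers_in and all(iou(patch, b) >= min_iou for b in centers_in):
            return patch
    return None

The resize to a fixed input size and the probability-0.5 horizontal flip would then be applied to the returned patch before it enters the training batch.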
Step S3: based on the feature map regressed and classified in the step S2, training the convolutional neural network model through a loss function to generate a candidate frame.
This step belongs to the training process, and the loss function in this embodiment includes a classification error and a positioning error, and shortens the prediction error of classification and positioning by minimizing the loss.
For example, the convolutional neural network is trained through a loss function, where the loss function L is:

    L(x, c, l, g) = (1/N) ( L_conf(x, c) + α · L_loc(x, l, g) )

wherein:

    L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_ij^k · smooth_L1( l_i^m − ĝ_j^m )

    L_conf(x, c) = − Σ_{i∈Pos} x_ij^p · log( ĉ_i^p ) − Σ_{i∈Neg} log( ĉ_i^0 ),  with  ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)

The loss in this embodiment is divided into L_conf(x, c) (the confidence loss) and L_loc(x, l, g) (the localization loss), where N is the number of predicted boxes matched to ground-truth annotations; the α parameter adjusts the ratio between the confidence loss and the localization loss, defaulting to α = 1. x_ij^p is an indicator parameter: x_ij^p = 1 means that the i-th prior box matches the j-th ground-truth box, and the category of that ground-truth box is p. c is the category confidence prediction. l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground-truth box. The confidence loss uses a cross-entropy loss function, and the localization loss uses an L1 smoothing (Smooth-L1) loss function.
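As an illustration only, here is a minimal PyTorch sketch of such a combined loss, under the assumptions that prior boxes have already been matched to ground truth (so each prior carries a class target and encoded offset targets) and that the hard negative mining used in SSD-style training is omitted:

# Hypothetical sketch of the confidence + localization loss described above.
import torch
import torch.nn.functional as F

def detection_loss(cls_logits, loc_preds, cls_targets, loc_targets, alpha=1.0):
    """
    cls_logits:  (B, num_priors, num_classes) raw class scores c
    loc_preds:   (B, num_priors, 4) predicted box offsets l
    cls_targets: (B, num_priors) class index per prior (0 = background)
    loc_targets: (B, num_priors, 4) encoded ground-truth offsets g
    """
    pos = cls_targets > 0                     # priors matched to a ground-truth box
    n_matched = pos.sum().clamp(min=1)        # N in the formula

    # Confidence loss: cross-entropy over class confidences (all priors).
    conf_loss = F.cross_entropy(
        cls_logits.reshape(-1, cls_logits.size(-1)),
        cls_targets.reshape(-1),
        reduction="sum",
    )

    # Localization loss: Smooth-L1 on the matched (positive) priors only.
    loc_loss = F.smooth_l1_loss(loc_preds[pos], loc_targets[pos], reduction="sum")

    return (conf_loss + alpha * loc_loss) / n_matched   # alpha defaults to 1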
Step S4: and screening the region with the highest confidence coefficient and the target from the candidate frames to obtain the position information of the boundary frame.
This step belongs to the verification process, and is selected from the candidate boxes by NMS (non-maximal suppression). The detailed processing steps for non-maximum suppression are as follows:
a feature map of a single image is given that contains many candidate boxes for object detection (i.e. each box may represent a certain type), but these candidate boxes are likely to have overlapping parts, and all that is required to do is to keep only the optimal box. Assuming N frames, the score calculated by the classifier for each frame is S i Wherein 1 is<=i<=N。
Step S41: constructing a set H for storing the candidate boxes to be processed, initialized to contain all N boxes; constructing a set M for storing the optimal boxes, initialized to the empty set.
Step S42: sorting the boxes in set H by score, selecting the box m with the highest score, and moving it from set H to set M.
Step S43: traversing the boxes in set H, computing for each its intersection-over-union (IoU) with box m; if the IoU is higher than a certain threshold (typically 0 to 0.5), the box is considered to overlap box m and is removed from set H.
Step S44: returning to step S42 and iterating until set H is empty. The boxes in set M are the boxes required in the face sheet image.
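A compact sketch of steps S41 to S44 follows (the (x1, y1, x2, y2) box format and the 0.5 default threshold are assumptions):

# Hypothetical sketch of the non-maximum suppression procedure of steps S41-S44.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / float(area(a) + area(b) - inter + 1e-9)

def nms(boxes, scores, iou_threshold=0.5):
    """boxes: list of (x1, y1, x2, y2); scores: the classifier scores S_i."""
    H = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)  # step S41
    M = []                                    # optimal boxes, initially empty
    while H:                                  # step S44: iterate until H is empty
        m = H.pop(0)                          # step S42: highest-scoring box
        M.append(m)
        H = [i for i in H                     # step S43: drop boxes overlapping m
             if iou(boxes[i], boxes[m]) <= iou_threshold]
    return M                                  # indices of the retained boxes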
Step S5: cropping according to the obtained bounding-box position information to obtain a picture containing the effective information.
This embodiment performs cropping with OpenCV (the Open Source Computer Vision Library), a cross-platform computer vision library. In this step, the effective information refers to the recipient and sender information on the express parcel.
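A minimal OpenCV cropping sketch for this step (the function name, file paths and box format are hypothetical placeholders):

# Hypothetical sketch: crop the located effective-information region with OpenCV.
import cv2

def crop_region(image_path, box, out_path):
    """box: (x1, y1, x2, y2) bounding-box position information from step S4."""
    img = cv2.imread(image_path)          # load the face sheet image (BGR NumPy array)
    x1, y1, x2, y2 = (int(v) for v in box)
    crop = img[y1:y2, x1:x2]              # NumPy slicing performs the crop
    cv2.imwrite(out_path, crop)           # save the picture with the effective info
    return crop

# Example with placeholder values:
# crop_region("waybill.jpg", (120, 80, 460, 210), "recipient_info.jpg")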
FIG. 3 illustrates the principle of one embodiment of the deep learning-based face sheet information positioning system of the present invention. Referring to fig. 3, the system of this embodiment comprises: a feature extraction module, a regression classification module, a model training module, a screening module and a cropping module.
The feature extraction module is used for extracting image features and convolutional-layer feature maps from the input face sheet image through a convolutional neural network model. In this embodiment, the VGG16 convolutional neural network model is used to extract image features, and the feature maps of different convolutional layers are output while the image features are extracted. Because the detection targets are relatively small (such as the name of the addressee on an express parcel), and the receptive fields of SSD feature maps are small in the lower layers and large in the upper layers, this embodiment uses the convolutional layers up to conv6-2 of the VGG16 convolutional neural network model shown in fig. 2 for feature extraction and deletes the convolutional layers after conv6-2; this does not affect accuracy while still meeting the size requirements of the recognition task.
The regression classification module is used for performing bounding-box regression and classification on the feature maps of the different convolutional layers. This module belongs to the detection stage: a series of small convolution modules predicts the category and coordinates of the object, performing regression and classification on feature maps from different layers with different receptive fields. For each training image the input is preprocessed to 512 x 512 and augmented by the same random sampling scheme described under step S2 above: either the original image is used, or a patch (CropImage, a cropped image block) is randomly sampled with a minimum intersection-over-union with the objects of 0.1, 0.3, 0.5, 0.7 or 0.9, covering a fraction [0.3, 1.0] of the original image size with an aspect ratio of 0.5 or 2; the cropped image block is kept when the center of a ground-truth bounding box lies inside the sampled patch and the area of the ground-truth box within the patch is greater than 0. After these sampling steps, each sampled patch is resized to a fixed size and flipped horizontally with probability 0.5, so that the batch sampler produces several candidate samples per image, from which one sample is randomly selected for network training.
The model training module is used for training the convolutional neural network model through a loss function to generate candidate boxes. This module belongs to the training stage; the loss function in this embodiment includes a classification error and a localization error, and minimizing the loss reduces the prediction errors of classification and localization. For example, the convolutional neural network is trained through a loss function L of the same form as in step S3:

    L(x, c, l, g) = (1/N) ( L_conf(x, c) + α · L_loc(x, l, g) )

The loss is divided into L_conf(x, c) (the confidence loss) and L_loc(x, l, g) (the localization loss), where N is the number of predicted boxes matched to ground-truth annotations; the α parameter adjusts the ratio between the confidence loss and the localization loss, defaulting to α = 1. x_ij^p is an indicator parameter: x_ij^p = 1 means that the i-th prior box matches the j-th ground-truth box, and the category of that ground-truth box is p. c is the category confidence prediction, l is the predicted position of the bounding box corresponding to the prior box, and g is the position parameter of the ground-truth box. The confidence loss uses a cross-entropy loss function, and the localization loss uses an L1 smoothing (Smooth-L1) loss function.
The screening module is used for screening from the candidate boxes the region with the highest confidence that contains the target, to obtain the bounding-box position information. In this embodiment screening from the candidate boxes is performed by NMS (non-maximum suppression). The screening module belongs to the verification stage, and the detailed steps of non-maximum suppression are as follows:
The feature map of a single image contains many candidate boxes for object detection (i.e. each box may represent some category), but these candidate boxes are likely to overlap, and what is needed is to keep only the optimal boxes. Assume there are N boxes, and the score computed by the classifier for each box is S_i, where 1 <= i <= N.
The first step: constructing a set H for storing the candidate boxes to be processed, initialized to contain all N boxes; constructing a set M for storing the optimal boxes, initialized to the empty set.
The second step: sorting the boxes in set H by score, selecting the box m with the highest score, and moving it from set H to set M.
The third step: traversing the boxes in set H, computing for each its intersection-over-union (IoU) with box m; if the IoU is higher than a certain threshold (typically 0 to 0.5), the box is considered to overlap box m and is removed from set H.
The fourth step: returning to the second step and iterating until set H is empty. The boxes in set M are the boxes required in the face sheet image.
The cropping module is used for cropping according to the obtained bounding-box position information to obtain a picture containing the effective information. This embodiment performs cropping with OpenCV (the Open Source Computer Vision Library), a cross-platform computer vision library. Here the effective information refers to the recipient and sender information on the express parcel.
The invention also discloses a face sheet information positioning system based on deep learning, which comprises: a processor and a memory. The memory is configured to store a series of computer-executable instructions and computer-accessible data associated with the series of computer-executable instructions, wherein the series of computer-executable instructions, when executed by the processor, cause the processor to perform the method as previously described.
The invention also discloses a non-transitory computer readable storage medium having stored thereon a series of computer executable instructions which, when executed by a computing device, cause the computing device to perform a method as described above.
While, for purposes of simplicity of explanation, the methodologies are shown and described as a series of acts, it is to be understood and appreciated that the methodologies are not limited by the order of acts, as some acts may, in accordance with one or more embodiments, occur in different orders and/or concurrently with other acts from that shown and described herein or not shown and described herein, as would be understood and appreciated by those skilled in the art.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary embodiments, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software as a computer program product, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a web site, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk (disk) and disc (disk) as used herein include Compact Disc (CD), laser disc, optical disc, digital Versatile Disc (DVD), floppy disk and blu-ray disc where disks (disk) usually reproduce data magnetically, while discs (disk) reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. A method for positioning face sheet information based on deep learning, characterized by comprising the following steps:
step 1: extracting image features and convolutional-layer feature maps from an input face sheet image through a convolutional neural network model;
step 2: performing bounding-box regression and classification on the feature maps of the different convolutional layers output in step 1;
step 3: training the convolutional neural network model through a loss function, based on the feature maps regressed and classified in step 2, to generate candidate boxes;
step 4: selecting from the candidate boxes the region with the highest confidence that contains the target, to obtain bounding-box position information;
step 5: cropping according to the obtained bounding-box position information to obtain a picture containing the effective information;
wherein in step 4 the feature map of any face sheet image contains a plurality of candidate boxes for object detection, these candidate boxes have overlapping portions, and non-maximum suppression is used to retain the optimal boxes; assuming there are N boxes, and the score computed by the classifier for each box is S_i, where 1 <= i <= N, the processing steps of non-maximum suppression include:
the first step: constructing a set H for storing the candidate boxes to be processed, initialized to contain all N boxes; constructing a set M for storing the optimal boxes, initialized to the empty set;
the second step: sorting the boxes in set H by score, selecting the box m with the highest score, and moving it from set H to set M;
the third step: traversing the boxes in set H, computing for each its intersection-over-union (IoU) with box m; if the IoU is higher than a threshold, the box is considered to overlap box m and is removed from set H;
the fourth step: returning to the second step and iterating until set H is empty; the boxes in set M are the boxes required in the face sheet image.
2. The deep learning-based face sheet information positioning method of claim 1, wherein step 1 extracts image features by adopting a VGG16 convolutional neural network model, and outputs the feature maps of different convolutional layers while extracting the image features.
3. The deep learning-based face sheet information positioning method of claim 2, wherein feature extraction uses the convolutional layers up to and including layer conv6-2 of the VGG16 convolutional neural network model, and the convolutional layers after conv6-2 are deleted.
4. The deep learning-based face sheet information positioning method of claim 1, wherein step 2 predicts the category and coordinates of the object by adopting a series of small convolution modules, performing regression and classification on feature maps from different layers with different receptive fields.
5. The deep learning-based face sheet information positioning method of claim 1, wherein the loss function in step 3 includes a classification error and a localization error, and the prediction errors of classification and localization are reduced by minimizing the loss.
6. The deep learning-based face sheet information positioning method of claim 1, wherein step 5 performs cropping with the OpenCV tool.
7. A deep learning-based face sheet information positioning system, comprising:
the feature extraction module is used for extracting image features and convolutional-layer feature maps from the input face sheet image through a convolutional neural network model;
the regression classification module is used for performing bounding-box regression and classification on the feature maps of the different convolutional layers output by the feature extraction module;
the model training module is used for training the convolutional neural network model through a loss function, based on the feature maps output by the regression classification module, to generate candidate boxes;
the screening module is used for screening from the candidate boxes the region with the highest confidence that contains the target, to obtain bounding-box position information;
the cropping module is used for cropping according to the obtained bounding-box position information to obtain a picture containing the effective information;
wherein, in the screening module, the feature map of any face sheet image contains a plurality of candidate boxes for object detection, these candidate boxes have overlapping portions, and non-maximum suppression is used to retain the optimal boxes; assuming there are N boxes, and the score computed by the classifier for each box is S_i, where 1 <= i <= N, the processing steps of non-maximum suppression in the screening module include:
the first step: constructing a set H for storing the candidate boxes to be processed, initialized to contain all N boxes; constructing a set M for storing the optimal boxes, initialized to the empty set;
the second step: sorting the boxes in set H by score, selecting the box m with the highest score, and moving it from set H to set M;
the third step: traversing the boxes in set H, computing for each its intersection-over-union (IoU) with box m; if the IoU is higher than a threshold, the box is considered to overlap box m and is removed from set H;
the fourth step: returning to the second step and iterating until set H is empty; the boxes in set M are the boxes required in the face sheet image.
8. The deep learning-based face sheet information positioning system of claim 7, wherein the feature extraction module extracts image features by using a VGG16 convolutional neural network model, and outputs the feature maps of different convolutional layers while extracting the image features.
9. The deep learning-based face sheet information positioning system of claim 8, wherein feature extraction uses the convolutional layers up to and including layer conv6-2 of the VGG16 convolutional neural network model, and the convolutional layers after conv6-2 are deleted.
10. The deep learning-based face sheet information positioning system of claim 7, wherein the regression classification module predicts the category and coordinates of the object by using a series of small convolution modules, performing regression and classification on feature maps from different layers with different receptive fields.
11. The deep learning-based face sheet information positioning system of claim 7, wherein the loss function in the model training module includes a classification error and a localization error, and the prediction errors of classification and localization are reduced by minimizing the loss.
12. The deep learning-based face sheet information positioning system of claim 7, wherein the cropping module performs cropping with the OpenCV tool.
13. A deep learning-based face sheet information positioning system, comprising:
a processor; and
a memory configured to store a series of computer-executable instructions and computer-accessible data associated with the series of computer-executable instructions,
wherein the series of computer executable instructions, when executed by the processor, cause the processor to perform the method of any one of claims 1 to 6.
14. A non-transitory computer-readable storage medium having stored thereon a series of computer-executable instructions that, when executed by a computing device, cause the computing device to perform the method of any of claims 1-6.
CN201911149243.0A 2019-11-21 2019-11-21 Deep learning-based face sheet information positioning method and system Active CN110942008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911149243.0A CN110942008B (en) 2019-11-21 2019-11-21 Deep learning-based face sheet information positioning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911149243.0A CN110942008B (en) 2019-11-21 2019-11-21 Deep learning-based face sheet information positioning method and system

Publications (2)

Publication Number Publication Date
CN110942008A CN110942008A (en) 2020-03-31
CN110942008B (en) 2023-05-12

Family

ID=69907352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911149243.0A Active CN110942008B (en) 2019-11-21 2019-11-21 Deep learning-based face sheet information positioning method and system

Country Status (1)

Country Link
CN (1) CN110942008B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989312B (en) * 2020-11-30 2024-04-30 北京金堤科技有限公司 Verification code identification method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145898A (en) * 2018-07-26 2019-01-04 清华大学深圳研究生院 A kind of object detecting method based on convolutional neural networks and iterator mechanism
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7418128B2 (en) * 2003-07-31 2008-08-26 Microsoft Corporation Elastic distortions for automatic generation of labeled data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN109145898A (en) * 2018-07-26 2019-01-04 清华大学深圳研究生院 A kind of object detecting method based on convolutional neural networks and iterator mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
武林秀; 李厚杰; 贺建军; 陈璇. Research on a traffic sign detection method based on deep learning. Journal of Dalian Minzu University, 2018, (05), full text. *

Also Published As

Publication number Publication date
CN110942008A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
CN110533084B (en) Multi-scale target detection method based on self-attention mechanism
US10679085B2 (en) Apparatus and method for detecting scene text in an image
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN110796186A (en) Dry and wet garbage identification and classification method based on improved YOLOv3 network
CN111738262A (en) Target detection model training method, target detection model training device, target detection model detection device, target detection equipment and storage medium
CN110991311A (en) Target detection method based on dense connection deep network
CN111898432B (en) Pedestrian detection system and method based on improved YOLOv3 algorithm
CN110298227B (en) Vehicle detection method in unmanned aerial vehicle aerial image based on deep learning
CN111353491B (en) Text direction determining method, device, equipment and storage medium
CN110688902B (en) Method and device for detecting vehicle area in parking space
CN111178451A (en) License plate detection method based on YOLOv3 network
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN111523439B (en) Method, system, device and medium for target detection based on deep learning
CN111353544A (en) Improved Mixed Pooling-Yolov 3-based target detection method
CN110633594A (en) Target detection method and device
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113239753A (en) Improved traffic sign detection and identification method based on YOLOv4
CN116958688A (en) Target detection method and system based on YOLOv8 network
CN114283431B (en) Text detection method based on differentiable binarization
CN110942008B (en) Deep learning-based face sheet information positioning method and system
CN116704490B (en) License plate recognition method, license plate recognition device and computer equipment
Li et al. An efficient method for DPM code localization based on depthwise separable convolution
KR102026280B1 (en) Method and system for scene text detection using deep learning
Tian et al. BAN, a barcode accurate detection network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant