CN113724261A - Fast image composition method based on convolutional neural network - Google Patents


Info

Publication number
CN113724261A
Authority
CN
China
Prior art keywords
model
neural network
image
training
anchor frame
Prior art date
Legal status
Pending
Application number
CN202110920914.XA
Other languages
Chinese (zh)
Inventor
倪志彬
何震宇
梁淇奥
蒋新科
向芝莹
周啸宇
石爻
李顺
左健甫
杨若辰
吴世涵
张恩华
吉雪莲
常世晴
罗佳源
陈攀宇
王瑞锦
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110920914.XA priority Critical patent/CN113724261A/en
Publication of CN113724261A publication Critical patent/CN113724261A/en
Pending legal-status Critical Current


Classifications

    • G06T 7/10: Image analysis; Segmentation; Edge detection
    • G06N 3/045: Neural networks; Architectures, e.g. interconnection topology; Combinations of networks
    • G06N 3/084: Neural networks; Learning methods; Backpropagation, e.g. using gradient descent
    • G06T 2207/20081: Indexing scheme for image analysis or image enhancement; Training; Learning
    • G06T 2207/20084: Indexing scheme for image analysis or image enhancement; Artificial neural networks [ANN]
    • G06T 2207/20132: Indexing scheme for image analysis or image enhancement; Image segmentation details; Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a fast image composition method based on a convolutional neural network, which comprises the following steps. Step 1: training a view evaluation model architecture based on a twin (Siamese) neural network. Step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and training a view suggestion model as a student model with the teacher model's scores, the student model outputting a score ranking for the same anchor frames. Step 3: extracting multi-scale features through a target detection network. Step 4: passing the extracted features through a fully connected layer of the neural network to output the anchor frames. Step 5: cropping the original input picture according to the anchor frame obtained in step 4 to obtain a new composition. The invention aims to train a model that finds views with good composition; it has good robustness, can generate the processed view in a very short time, and can be widely applied to image cropping, image thumbnailing, image retargeting and real-time viewfinding suggestions.

Description

Fast image composition method based on convolutional neural network
Technical Field
The invention relates to the field of image processing, in particular to a fast image composition method based on a convolutional neural network.
Background
Early cropping methods explicitly designed various hand-crafted features based on photographic knowledge (e.g., the rule of thirds and centering). With the development of deep learning, a great number of researchers have devoted themselves to developing cropping methods in a data-driven manner, and the release of several benchmark data sets for comparison has greatly facilitated the progress of related research.
However, obtaining the best candidate crop is still extremely difficult, mainly because of the following three aspects. 1) The potential of image saliency information cannot be fully exploited. Previous saliency-based cropping methods focus on preserving the most important content in the best crop, but overlook the case in which the salient region and the best crop overlap only partially, for example when the bounding rectangle of the salient region lies near the boundary of the source image. Moreover, the saliency information is only used for generating candidate crops and is not used further in the subsequent cropping modules. 2) The potential region pairs (region of interest (ROI) and region of discard (ROD)) and their underlying patterns are not well represented. In general, a pairwise cropping method explicitly forms source image pairs and feeds them into an automatic cropping model, but the performance of such methods is often poor because the selection of source image pairs is overly detail-dependent and uncertain. 3) Traditional indicators for evaluating cropping methods are unreliable and inaccurate. In some cases, the intersection over union (IoU) and the boundary displacement error (BDE) are not sufficient to evaluate the performance of a cropping method.
In the field of image processing technology, deep learning has brought revolutionary changes to machine learning and has led to significant improvements on a wide variety of complex tasks. In recent years, with the dramatic increase in image processing data volumes, many researchers have worked on training deep neural networks (DNNs) in a distributed manner. Under distributed training, a data-parallel stochastic gradient descent (SGD) method is generally adopted: the training examples are scattered across the workers, each worker computes gradients on its own data, all gradients are aggregated to update the model parameters via all-reduce or a parameter server, and the updated parameters are sent back to all workers for the next iteration. Many applications benefit from training a model to find a view with a good composition, such as image cropping, image thumbnailing, viewing recommendation, and automatic photography. Image cropping, which aims to find the crop of an image with the best aesthetic quality, is widely used as an important technique in image post-processing, visual recommendation, and image selection. Especially when a large number of images need to be cropped, image cropping becomes a laborious task. Thus, in recent years, automatic image cropping has attracted increasing attention in the research community and industry.
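As an illustrative aside (this sketch is not part of the claimed method), the data-parallel SGD scheme described above can be expressed in PyTorch roughly as follows; the DistributedDataParallel setup, the placeholder dataset and the hyper-parameters are assumptions for illustration, and the script would be launched with one process per worker (e.g. via torchrun):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def train_worker(model: torch.nn.Module, dataset: TensorDataset, epochs: int = 1):
    dist.init_process_group(backend="nccl")            # one process per GPU/worker
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    model = DDP(model.cuda(rank), device_ids=[rank])   # gradients all-reduced automatically
    sampler = DistributedSampler(dataset)              # each worker sees its own data shard
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = torch.nn.MSELoss()
    for epoch in range(epochs):
        sampler.set_epoch(epoch)
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x.cuda(rank)), y.cuda(rank))
            loss.backward()                             # local gradients + all-reduce
            optimizer.step()                            # updated parameters identical on all workers
    dist.destroy_process_group()
```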
Patent application CN202110400578.6 discloses a saliency-aware image cropping method, apparatus, computing device and storage medium, which addresses the problems of insufficient utilization of image saliency information and possible over-fitting of the model in the prior art.
However, assigning anchor boxes to ground-truth views based on an overlap measure makes it difficult to train a composition model, because slightly adjusting a view typically produces large differences in composition quality. Moreover, the annotations are not exhaustive, and most anchor boxes are not annotated.
Meanwhile, unlike the target detection scenario, the unannotated anchor boxes cannot simply be assumed to be negative samples.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a fast image composition method based on a convolutional neural network.
The purpose of the invention is realized by the following technical scheme:
a fast image composition method based on a convolutional neural network comprises the following steps:
step 1: training a view evaluation model architecture based on a twin neural network:
step 2: deploying the trained view evaluation model into a teacher model, and scoring the candidate image anchor frame; taking a score training view suggestion model of the teacher as a student model, and outputting a score ranking of the same anchor frame;
and step 3: extracting multi-scale features through a target detection network;
and 4, step 4: outputting the extracted features to an anchor frame through a neural network full-connection layer;
and 5: and (4) cutting the original input picture according to the anchor frame obtained in the step (4) to obtain a new composition.
Further, the step 1 comprises the following substeps:
step 101: two sub-networks are adopted, each sub-network receives an input, maps the input to a high-dimensional feature space and outputs a corresponding representation;
step 102: the degree of similarity of the two inputs is compared by calculating the distance between the two representations.
Further, the distance between the two representations is the Euclidean distance.
Further, the loss function used to train the student model is:
Figure BDA0003207410160000021
wherein y represents the score output by the teacher model, q represents the score output by the student model, and n represents the number of output scores.
Further, the loss function migrates the knowledge owned by the teacher model to the student model during the training phase, and the parameters of the student model are continuously optimized through back propagation.
Further, the convolution kernel size of the view evaluation model is 3 × 3; the structure adopts an alternating arrangement of convolution layers and pooling layers, and increases the number of nonlinear transformation layers.
The invention has the beneficial effects that: it trains a model that finds views with good composition, has good robustness, can generate the processed view in a very short time, and can be widely applied to image cropping, image thumbnailing, image retargeting and real-time viewfinding suggestions.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in fig. 1, a fast image composition method based on a convolutional neural network includes the following steps:
Step 1: training a view evaluation model architecture based on a twin neural network;
Step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and training a view suggestion model as a student model with the teacher model's scores, the student model outputting a score ranking for the same anchor frames;
Step 3: extracting multi-scale features through a target detection network;
Step 4: passing the extracted features through a fully connected layer of the neural network to output the anchor frames;
Step 5: cropping the original input picture according to the obtained anchor frame to obtain a new composition.
Further, the step 1 comprises the following substeps:
step 101: two sub-networks are adopted, each sub-network receives an input, maps the input to a high-dimensional feature space and outputs a corresponding representation;
step 102: the degree of similarity of the two inputs is compared by calculating the distance between the two representations.
Further, the distance between the two representations is the Euclidean distance.
Further, the loss function used to train the student model is:
Figure BDA0003207410160000041
wherein y represents the score output by the teacher model, q represents the score output by the student model, and n represents the number of output scores.
Further, the loss function migrates the knowledge owned by the teacher model to the student model during the training phase, and the parameters of the student model are continuously optimized through back propagation.
Further, the convolution kernel size of the view evaluation model is 3 × 3; the structure adopts an alternating arrangement of convolution layers and pooling layers, and increases the number of nonlinear transformation layers.
The invention adopts a novel knowledge transfer framework to train a real-time, anchor-frame-based view suggestion model. That is, the knowledge learned by the teacher model is transferred to the student model, so that the student model keeps a small number of parameters and a high composition speed while matching the teacher's effect as closely as possible.
Unlike a target proposal network, label assignment for the view proposal model of this solution is very challenging. First, assigning anchor boxes to ground-truth views based on an overlap measure makes it difficult to train a composition model, because slightly adjusting a view typically produces large differences in composition quality. Moreover, the annotations are not exhaustive, and most anchor boxes are not annotated. Meanwhile, unlike the target detection scenario, these unannotated anchor boxes cannot be assumed to be negative samples.
A view evaluation model architecture based on a twin neural network is trained as the teacher model: two sub-networks are used, each sub-network receives an input, maps it to a high-dimensional feature space, and outputs a corresponding representation. The degree of similarity of the two inputs is compared by calculating the distance between the two representations, e.g. the Euclidean distance. The invention then deploys this model as a teacher model to score the candidate image anchor boxes, and uses the teacher's scores to train a view suggestion network as a student model that outputs the same anchor-box score ranking. To train the student, the present invention proposes a mean pairwise squared error (MPSE) loss.
Figure BDA0003207410160000042
The loss function migrates the knowledge owned by the teacher model to the student model during the training phase, and the parameters of the student model are continuously optimized through back propagation.
In this embodiment, a very deep convolutional neural network for large-scale image composition is employed; this is a type of feed-forward neural network that contains convolution calculations and has a deep structure. Convolutional neural networks have a representation learning ability and can perform translation-invariant classification of input information according to their hierarchical structure, which is why they are also called "translation-invariant artificial neural networks". A convolutional neural network is a standard hierarchical structure containing five main kinds of layers: a data input layer, convolution layers, ReLU activation layers, pooling layers and fully connected layers, with data exchanged between the layers.
Because the standard convolutional neural network suffers from problems such as vanishing and exploding gradients, the depth of the model is limited during training and the image features cannot be extracted well.
Therefore, the invention adopts a very deep convolutional neural network for large-scale image composition based on the VGG framework: larger convolution kernels are replaced with 3x3 kernels, convolution layers and pooling layers are arranged alternately, and the number of nonlinear transformation layers is increased, so that the training parameters required by the model are greatly reduced, the model training and inference speed is improved, and the generalization is enhanced.
The model is applied to Scissorhands, and tests show that the composition quality is enhanced and the cropping speed is improved. The VGG framework is based on a standard convolutional neural network; 3x3 convolution kernels are used to replace a 7x7 convolution kernel, and two stacked 3x3 convolution kernels are used to replace a 5x5 convolution kernel.
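A minimal sketch of this kernel-replacement idea follows; channel widths, depth and block layout are illustrative assumptions, not the exact patented configuration:

```python
import torch.nn as nn

# VGG-style feature extractor: 3x3 convolutions alternate with pooling, and two
# stacked 3x3 convolutions cover the same receptive field as one 5x5 convolution
# with fewer parameters and an extra non-linearity.
def vgg_block(in_ch: int, out_ch: int, num_convs: int) -> nn.Sequential:
    layers = []
    for i in range(num_convs):
        layers.append(nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU(inplace=True))
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

backbone = nn.Sequential(
    vgg_block(3, 64, 2),      # two 3x3 convs: the receptive field of one 5x5
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),   # three 3x3 convs: the receptive field of one 7x7
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)
```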
The model architecture and parameters are shown in table 1.
TABLE 1 model architecture and parameter table
Figure BDA0003207410160000061
Table 2: Number of parameters (in millions)

Network                A, A-LRN   B     C     D     E
Number of parameters   133        133   134   138   144
The SSD is a convolutional detection network that employs a fixed set of default bounding boxes and scores the presence of object classes in each box. The box predictions are followed by a non-maximum suppression step to generate the final detections. The early layers are based on a standard architecture for high-quality image classification (truncated before any classification layer), which is referred to as the base network; auxiliary structures are then added to the network to generate detections with the key features. The SSD network can also improve the composition quality of Scissorhands, mainly because of an important idea of the network: a feature pyramid that detects targets at multiple scales to improve detection accuracy. Namely: (1) the higher the feature layer, the richer the semantic information; different feature layers represent features at different levels, and exploiting all of them gives better results than detecting only on the last layer; (2) from low to high feature layers the receptive field grows from small to large, so different feature layers help detect targets of different sizes. The SSD network adds several feature layers on top of the VGG16 base layers: FC7 in VGG16 is changed into a convolutional layer Conv7, and Conv8, Conv9, Conv10 and Conv11 feature layers are added.
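A minimal sketch of this multi-scale feature extraction is given below; the base network is assumed to end in a 1024-channel Conv7 map as in SSD's modified VGG16, and the channel sizes of the extra blocks are illustrative assumptions:

```python
import torch
import torch.nn as nn

# SSD-style extra feature layers on top of a backbone: each block halves the
# spatial resolution, so features are collected at several scales (a feature
# pyramid) and passed on to the anchor-frame prediction head.
def extra_block(in_ch: int, mid_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=1), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
    )

class MultiScaleFeatures(nn.Module):
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone                 # e.g. VGG16 conv layers up to Conv7
        self.extras = nn.ModuleList([
            extra_block(1024, 256, 512),         # Conv8
            extra_block(512, 128, 256),          # Conv9
            extra_block(256, 128, 256),          # Conv10
            extra_block(256, 128, 256),          # Conv11
        ])

    def forward(self, x: torch.Tensor):
        feats = [self.backbone(x)]               # last backbone feature map
        for block in self.extras:
            feats.append(block(feats[-1]))       # progressively coarser maps
        return feats                             # multi-scale features
```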
In this embodiment, taking an image as an example, the specific process of performing image cropping is as follows:
(1) Input image data normalization: the three channel vectors of the input picture are normalized with per-channel means of 0.486, 0.456 and 0.406 and standard deviations of 0.229, 0.224 and 0.225, respectively.
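In PyTorch/torchvision this normalization is a short pipeline (a sketch using the values listed above):

```python
from torchvision import transforms

# PIL image -> float tensor in [0, 1] -> per-channel normalization with the
# means and standard deviations given in the description.
normalize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.486, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```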
(2) Frame vertex calculation: the sub-images are obtained through the preset anchor frames, the vertex coordinates of each frame are calculated, and the obtained sub-images and frame coordinates are saved.
(3) Acquiring image and model parameters: the path of the picture to be used is acquired, and the parameters of the training model are acquired.
(4) Data enhancement: the picture data are augmented by converting the PIL picture into a numpy array, resizing the picture, shuffling the picture order, randomly changing the gray values of the picture, adding Gaussian noise to the picture, or distorting the picture, and so on; data enhancement improves the robustness of the trained model. The various data enhancement methods are implemented by encapsulating them in classes (objects) and calling them; the data enhancement functions are combined and encapsulated into classes to be called, thereby enlarging the training sample set.
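A minimal sketch of such a class-based augmentation pipeline (class names and the noise level are illustrative, not the patented implementation):

```python
import numpy as np
from PIL import Image

class ToNumpy:
    """Convert a PIL picture into a numpy array."""
    def __call__(self, img: Image.Image) -> np.ndarray:
        return np.asarray(img)

class RandomGaussianNoise:
    """Add Gaussian noise to the picture."""
    def __init__(self, sigma: float = 5.0):
        self.sigma = sigma
    def __call__(self, img: np.ndarray) -> np.ndarray:
        noise = np.random.normal(0.0, self.sigma, img.shape)
        return np.clip(img.astype(np.float64) + noise, 0, 255).astype(np.uint8)

class Compose:
    """Combine several augmentation objects into one callable."""
    def __init__(self, transforms):
        self.transforms = transforms
    def __call__(self, img):
        for t in self.transforms:
            img = t(img)
        return img

augment = Compose([ToNumpy(), RandomGaussianNoise(sigma=5.0)])
```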
(5) Visualizing the cropped images: the predefined anchor frames are acquired, and the cropped pictures image_crops and their positions bboxes relative to the original picture are obtained.
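A sketch of cropping with predefined anchor frames (anchor coordinates are given here as fractions of the source size; the values are illustrative):

```python
from PIL import Image

def crop_by_anchors(image: Image.Image, anchors):
    """Return the cropped pictures (image_crops) and their positions relative
    to the original picture (bboxes) for a list of relative anchor frames."""
    w, h = image.size
    image_crops, bboxes = [], []
    for x0, y0, x1, y1 in anchors:
        box = (int(x0 * w), int(y0 * h), int(x1 * w), int(y1 * h))
        image_crops.append(image.crop(box))   # sub-image cut out by the anchor frame
        bboxes.append(box)                    # vertex values of the frame
    return image_crops, bboxes

# Example call with two illustrative anchor frames.
crops, boxes = crop_by_anchors(Image.new("RGB", (640, 480)),
                               [(0.1, 0.1, 0.9, 0.9), (0.0, 0.0, 0.66, 0.66)])
```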
The network uses the classic VGG network and loads the VGG pre-trained parameters; the rest of the architecture is the same as the VGG network, and during training only the parameters of the last fully connected layer are activated and updated. Twin network: a classical twin (Siamese) network architecture is applied here.
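A minimal sketch of the twin-network idea (the small embedding network is a placeholder, not the VGG-based patented model):

```python
import torch
import torch.nn as nn

class SiameseEvaluator(nn.Module):
    """Two identical sub-networks with shared weights map two views into a
    feature space; the Euclidean distance between the two representations is
    used as the similarity measure."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, embed_dim),
        )

    def forward(self, view_a: torch.Tensor, view_b: torch.Tensor) -> torch.Tensor:
        za, zb = self.embed(view_a), self.embed(view_b)   # same weights for both inputs
        return torch.norm(za - zb, p=2, dim=1)            # Euclidean distance per pair
```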
Training of the model: the model is trained on a GPU, and several GPUs can be called for parallel training.
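For the simple multi-GPU case this can be as short as the following sketch (the placeholder model stands in for the view suggestion network):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Flatten())
if torch.cuda.is_available():
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)   # split each batch across the available GPUs
    model = model.cuda()
```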
(6) Checking whether the data are valid: check whether the input data contain NaN, infinity or all zeros, check whether batch normalization is effective, and obtain the batch normalization results of all input data (batch normalization is used to increase the trainable depth of the model and improve its robustness).
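A sketch of these validity checks on an input batch (the printed warnings stand in for whatever logging the training script uses):

```python
import torch

def check_batch(x: torch.Tensor) -> bool:
    """Return False if the batch contains NaN, infinity, or is all zeros."""
    ok = True
    if torch.isnan(x).any():
        print("warning: input contains NaN"); ok = False
    if torch.isinf(x).any():
        print("warning: input contains infinity"); ok = False
    if not torch.any(x != 0):
        print("warning: input is all zeros"); ok = False
    return ok
```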
(7) Saving the training model: the model parameters of the current state, all parameters of the model, and the parameters of the best N models are saved respectively.
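A sketch of the three kinds of checkpoints mentioned (file names and the best-N bookkeeping are illustrative assumptions):

```python
import torch

def save_checkpoints(model, optimizer, epoch, score, best_scores, n_best=3):
    # current training state
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, "checkpoint_last.pth")
    # all parameters of the model
    torch.save(model.state_dict(), "model_full.pth")
    # parameters of the best N models, ranked by validation score
    best_scores.append((score, epoch))
    best_scores.sort(reverse=True)
    del best_scores[n_best:]
    if (score, epoch) in best_scores:
        torch.save(model.state_dict(), f"model_best_epoch{epoch}.pth")
    return best_scores
```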
(8) Generating the image cropping annotations (i.e., generating the anchor frames): the picture vector is converted into torch.FloatTensor format so that the PyTorch library can be called to accelerate the deep learning operations; a batch of training data is returned for parallel training; the picture shape is returned, the learning rate is dynamically updated and set, and the average accuracy is calculated. A sketch of this step is given after the list below.
A) Creating an output file;
B) generating and saving the cropping labels (.txt format);
C) saving the cropping data (.json format).
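A sketch of this annotation-generation step (paths and the annotation layout are illustrative assumptions):

```python
import json
import numpy as np
import torch

def save_crop_annotations(picture: np.ndarray, bboxes, out_prefix: str = "crops"):
    """Convert the picture to torch.FloatTensor and save the generated
    anchor-frame annotations in .txt and .json form."""
    tensor = torch.from_numpy(picture).float()       # torch.FloatTensor for PyTorch ops
    with open(out_prefix + ".txt", "w") as f:        # B) cropping labels, .txt format
        for x0, y0, x1, y1 in bboxes:
            f.write(f"{x0} {y0} {x1} {y1}\n")
    with open(out_prefix + ".json", "w") as f:       # C) cropping data, .json format
        json.dump({"shape": list(tensor.shape),
                   "bboxes": [list(b) for b in bboxes]}, f)
    return tensor
```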
The invention aims to train a model that finds views with good composition; it has good robustness, can generate the processed view in a very short time, and can be widely applied to image cropping, image thumbnailing, image retargeting and real-time viewfinding suggestions.
It should be noted that, for simplicity of description, the above-mentioned embodiments of the method are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and elements referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The above disclosure describes only preferred embodiments of the present invention, and is therefore not intended to limit the scope of the claims of the present invention.

Claims (6)

1. A fast image composition method based on a convolutional neural network, characterized by comprising the following steps:
step 1: training a view evaluation model architecture based on a twin neural network;
step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and training a view suggestion model as a student model with the teacher model's scores, the student model outputting a score ranking for the same anchor frames;
step 3: extracting multi-scale features through a target detection network;
step 4: passing the extracted features through a fully connected layer of the neural network to output the anchor frames;
step 5: cropping the original input picture according to the anchor frame obtained in step 4 to obtain a new composition.
2. The convolutional neural network-based fast image composition method as claimed in claim 1, wherein said step 1 comprises the following sub-steps:
step 101: two sub-networks are adopted, each sub-network receives an input, maps the input to a high-dimensional feature space and outputs a corresponding representation;
step 102: the degree of similarity of the two inputs is compared by calculating the distance between the two representations.
3. The convolutional neural network-based fast image composition method as claimed in claim 2, wherein the distance between the two representations is the Euclidean distance.
4. The convolutional neural network-based fast image composition method as claimed in claim 1, wherein the loss function used for training the student model is:
Figure FDA0003207410150000011
wherein y represents the score output by the teacher model, q represents the score output by the student model, and n represents the number of output scores.
5. The convolutional neural network-based fast image composition method as claimed in claim 4, wherein the loss function migrates the knowledge owned by the teacher model to the student model in the training phase, and the parameters of the student model are continuously optimized by back propagation.
6. The convolutional neural network-based fast image composition method as claimed in claim 1, wherein the convolution kernel size of the view evaluation model is 3x3; the structure adopts an alternating arrangement of convolution layers and pooling layers, and increases the number of nonlinear transformation layers.
CN202110920914.XA 2021-08-11 2021-08-11 Fast image composition method based on convolutional neural network Pending CN113724261A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110920914.XA CN113724261A (en) 2021-08-11 2021-08-11 Fast image composition method based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110920914.XA CN113724261A (en) 2021-08-11 2021-08-11 Fast image composition method based on convolutional neural network

Publications (1)

Publication Number Publication Date
CN113724261A true CN113724261A (en) 2021-11-30

Family

ID=78675614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110920914.XA Pending CN113724261A (en) 2021-08-11 2021-08-11 Fast image composition method based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN113724261A (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510574A (en) * 2018-04-17 2018-09-07 福州大学 A kind of example-based learning and the 3D rendering method of cutting out for enhancing visual quality
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
CN110097177A (en) * 2019-05-15 2019-08-06 电科瑞达(成都)科技有限公司 A kind of network pruning method based on pseudo- twin network
CN110533097A (en) * 2019-08-27 2019-12-03 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device, electronic equipment and storage medium
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN111523463A (en) * 2020-04-22 2020-08-11 南京工程学院 Target tracking method and training method based on matching-regression network
WO2020204460A1 (en) * 2019-04-01 2020-10-08 Samsung Electronics Co., Ltd. A method for recognizing human emotions in images
WO2021023667A1 (en) * 2019-08-06 2021-02-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. System and method for assisting selective hearing
CN112381083A (en) * 2020-06-12 2021-02-19 杭州喔影网络科技有限公司 Saliency perception image clipping method based on potential region pair

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510574A (en) * 2018-04-17 2018-09-07 福州大学 A kind of example-based learning and the 3D rendering method of cutting out for enhancing visual quality
CN108830813A (en) * 2018-06-12 2018-11-16 福建帝视信息科技有限公司 A kind of image super-resolution Enhancement Method of knowledge based distillation
WO2020204460A1 (en) * 2019-04-01 2020-10-08 Samsung Electronics Co., Ltd. A method for recognizing human emotions in images
CN110097177A (en) * 2019-05-15 2019-08-06 电科瑞达(成都)科技有限公司 A kind of network pruning method based on pseudo- twin network
WO2021023667A1 (en) * 2019-08-06 2021-02-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. System and method for assisting selective hearing
CN110533097A (en) * 2019-08-27 2019-12-03 腾讯科技(深圳)有限公司 A kind of image definition recognition methods, device, electronic equipment and storage medium
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
CN111523463A (en) * 2020-04-22 2020-08-11 南京工程学院 Target tracking method and training method based on matching-regression network
CN112381083A (en) * 2020-06-12 2021-02-19 杭州喔影网络科技有限公司 Saliency perception image clipping method based on potential region pair
CN113159028A (en) * 2020-06-12 2021-07-23 杭州喔影网络科技有限公司 Saliency-aware image cropping method and apparatus, computing device, and storage medium

Non-Patent Citations (10)

* Cited by examiner, † Cited by third party
Title
KAREN SIMONYAN et al.: "Very Deep Convolutional Networks for Large-Scale Image Recognition", pages 1-14 *
KAREN SIMONYAN et al.: "VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION", http://arxiv.org/abs/1409.1556v6, pages 1-14 *
KELLY L. WIGGERS et al.: "Image Retrieval and Pattern Spotting using Siamese Neural Network", pages 1-8 *
MICHAEL LOSTER et al.: "Knowledge Transfer for Entity Resolution with Siamese Neural Networks", vol. 13, no. 1, pages 1-44 *
S. CHOPRA et al.: "Learning a similarity metric discriminatively, with application to face verification", 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), pages 539-546 *
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", arXiv:1512.02325v5 [cs.CV], pages 1-17 *
YUANPEI LIU et al.: "Teacher-Students Knowledge Distillation for Siamese Trackers", pages 1-11 *
Z. WEI et al.: "Good View Hunting: Learning Photo Composition from Dense View Pairs", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5437-5446 *
尚欣茹 et al.: "Real-time target tracking with a Siamese guided-anchor RPN network" [孪生导向锚框RPN网络实时目标跟踪], vol. 26, no. 2, pages 415-424 *
张志扬 et al.: "A survey of information cascade prediction methods based on deep learning" [基于深度学习的信息级联预测方法综述], vol. 47, no. 7, pages 141-153 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023147693A1 (en) * 2022-02-04 2023-08-10 Qualcomm Incorporated Non-linear thumbnail generation supervised by a saliency map

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN112070768B (en) Anchor-Free based real-time instance segmentation method
CN114549913B (en) Semantic segmentation method and device, computer equipment and storage medium
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN110674777A (en) Optical character recognition method in patent text scene
CN112308825B (en) SqueezeNet-based crop leaf disease identification method
KR102370910B1 (en) Method and apparatus for few-shot image classification based on deep learning
CN111767962A (en) One-stage target detection method, system and device based on generation countermeasure network
CN112651418B (en) Data classification method, classifier training method and system
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
CN112364974A (en) Improved YOLOv3 algorithm based on activation function
CN116977844A (en) Lightweight underwater target real-time detection method
Ouf Leguminous seeds detection based on convolutional neural networks: Comparison of faster R-CNN and YOLOv4 on a small custom dataset
Jeevanantham et al. Deep learning based plant diseases monitoring and detection system
CN113724261A (en) Fast image composition method based on convolutional neural network
CN113408418A (en) Calligraphy font and character content synchronous identification method and system
CN117315752A (en) Training method, device, equipment and medium for face emotion recognition network model
CN116597275A (en) High-speed moving target recognition method based on data enhancement
CN110659724A (en) Target detection convolutional neural network construction method based on target scale range
CN111242114A (en) Character recognition method and device
CN116030341A (en) Plant leaf disease detection method based on deep learning, computer equipment and storage medium
CN115640401A (en) Text content extraction method and device
CN114241470A (en) Natural scene character detection method based on attention mechanism
CN113989671A (en) Remote sensing scene classification method and system based on semantic perception and dynamic graph convolution

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20211130