CN113724261A - Fast image composition method based on convolutional neural network - Google Patents
Fast image composition method based on convolutional neural network
- Publication number
- CN113724261A (application number CN202110920914.XA)
- Authority
- CN
- China
- Prior art keywords
- model
- neural network
- image
- training
- anchor frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a fast image composition method based on a convolutional neural network, which comprises the following steps: step 1: training a view evaluation model built on a twin (Siamese) neural network architecture; step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and using the teacher's scores to train a view suggestion model, as the student model, to output the same anchor frame score ranking; step 3: extracting multi-scale features through a target detection network; step 4: passing the extracted features through a fully connected layer of the neural network to output an anchor frame; step 5: cropping the original input picture according to the anchor frame obtained in step 4 to obtain a new composition. The invention trains a model to find well-composed views, has good robustness, can generate the processed view in a very short time, and can be widely applied to image cropping, image thumbnailing, image retargeting and real-time view suggestion.
Description
Technical Field
The invention relates to the field of image processing, in particular to a fast image composition method based on a convolutional neural network.
Background
Early cropping methods explicitly designed various hand-crafted features based on photographic knowledge (e.g., the rule of thirds and center composition). With the development of deep learning, many researchers have turned to developing cropping methods in a data-driven manner, and the release of several benchmark data sets for comparison has greatly facilitated related research.
However, obtaining the best candidate crop is still extremely difficult, mainly for the following three reasons. 1) The potential of image saliency information is not fully exploited. Previous saliency-based cropping methods focus on preserving the most important content in the best crop, but ignore the fact that the salient region and the best crop overlap if the rectangle of the salient region is located near the boundary of the source image. Moreover, the saliency information is used only to generate candidate crops and is not used again in subsequent cropping modules. 2) The potential region pairs (region of interest (ROI) and region of discard (ROD)) and their internal regularities are not well represented. In general, a pairwise cropping method explicitly forms a pair of source images and feeds it into an automatic cropping model, but such methods often perform poorly because the selection of the source image pair depends heavily on details and is uncertain. 3) Traditional metrics for evaluating cropping methods are unreliable and inaccurate. In some cases, intersection over union (IoU) and boundary displacement error (BDE) are not sufficient to evaluate the subjective quality of a cropping method's results.
In the field of image processing technology, deep learning has brought revolutionary changes to machine learning and has delivered significant improvements on a wide variety of complex tasks. In recent years, with the dramatic increase in image processing data volumes, many researchers have been working on training deep neural networks (DNNs) in a distributed manner. Distributed training generally uses data-parallel stochastic gradient descent (SGD): the training examples are spread across the workers, each worker computes gradients on its own data, the gradients are aggregated through all-reduce or a parameter server to update the model parameters, and the updated parameters are sent back to all workers for the next iteration. Many applications benefit from training a model to find views with good composition, such as image cropping, image thumbnailing, view recommendation, and autonomous photo taking. Image cropping, which aims to find the crop with the best aesthetic quality, is an important technique widely used in image post-processing, visual recommendation, and image selection. When a large number of images need to be cropped, manual cropping becomes a laborious task. Thus, in recent years, automatic image cropping has attracted increasing attention in the research community and in industry.
Patent application CN202110400578.6 discloses a saliency-aware image cropping method, device, computing device and storage medium, which address the insufficient use of image saliency information and the possible overfitting of the model in the prior art.
However, assigning anchor boxes to ground truth views based on an overlap measure makes it difficult to train a composition model, and slightly adjusting the views typically produces large differences in composition quality. Moreover, the annotations are not exhaustive and most anchor boxes will not be annotated.
Meanwhile, unlike in target detection scenarios, the unannotated anchor boxes cannot be assumed to be negative samples.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a fast image composition method based on a convolutional neural network.
The purpose of the invention is realized by the following technical scheme:
A fast image composition method based on a convolutional neural network comprises the following steps:
step 1: training a view evaluation model built on a twin (Siamese) neural network architecture;
step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and using the teacher's scores to train a view suggestion model, as the student model, to output the same anchor frame score ranking;
step 3: extracting multi-scale features through a target detection network;
step 4: passing the extracted features through a fully connected layer of the neural network to output an anchor frame;
step 5: cropping the original input picture according to the anchor frame obtained in step 4 to obtain a new composition.
Further, the step 1 comprises the following substeps:
step 101: two sub-networks are adopted; each sub-network receives an input, maps the input to a high-dimensional feature space, and outputs a corresponding representation;
step 102: the degree of similarity of the two inputs is compared by calculating the distance between the two representations.
Further, the distance between the two representations is the Euclidean distance.
Further, the loss function used to train the student model is:
wherein y represents the score output by the teacher model, q represents the score output by the student model, and n represents the number of output scores.
Further, the loss function migrates the knowledge owned by the teacher model to the student model during the training phase, and the parameters of the student model are continuously optimized through back propagation.
Further, the convolution kernel size of the view evaluation model is 3 × 3; the structure alternates convolution layers and pooling layers, which increases the number of nonlinear transformation layers.
The invention has the following beneficial effects: it trains a model to find well-composed views, has good robustness, can generate the processed view in a very short time, and can be widely applied to image cropping, image thumbnailing, image retargeting and real-time view suggestion.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
In this embodiment, as shown in FIG. 1, a fast image composition method based on a convolutional neural network includes the following steps:
step 1: training a view evaluation model built on a twin (Siamese) neural network architecture;
step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and using the teacher's scores to train a view suggestion model, as the student model, to output the same anchor frame score ranking;
step 3: extracting multi-scale features through a target detection network;
step 4: passing the extracted features through a fully connected layer of the neural network to output an anchor frame;
step 5: cropping the original input picture according to the obtained anchor frame to obtain a new composition.
Further, the step 1 comprises the following substeps:
step 101: two sub-networks are adopted; each sub-network receives an input, maps the input to a high-dimensional feature space, and outputs a corresponding representation;
step 102: the degree of similarity of the two inputs is compared by calculating the distance between the two representations.
Further, the distance between the two representations is the Euclidean distance.
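Sub-steps 101 and 102 can be illustrated with a minimal sketch. The `embed` function below is a hypothetical stand-in for one sub-network (a single linear layer with ReLU), and the weight matrix `W` is invented for illustration; the text specifies only that both inputs are mapped to representations and compared by Euclidean distance.

```python
import math

def embed(x, weights):
    """Hypothetical sub-network: one linear layer followed by ReLU.

    Both branches of the twin network call this with the SAME weights.
    """
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]

def euclidean_distance(a, b):
    """Euclidean distance between two representations of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def siamese_distance(x1, x2, weights):
    """Map both inputs through the shared sub-network, then compare them."""
    return euclidean_distance(embed(x1, weights), embed(x2, weights))

# Illustrative weights (assumption): maps 2-D inputs to 3-D representations.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
d = siamese_distance([3.0, 0.0], [0.0, 4.0], W)
d_same = siamese_distance([3.0, 0.0], [3.0, 0.0], W)
```

A smaller distance means the two inputs are judged more similar; identical inputs give a distance of zero because the two branches share weights.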
Further, the loss function used to train the student model is:
wherein y represents the score output by the teacher model, q represents the score output by the student model, and n represents the number of output scores.
Further, the loss function migrates the knowledge owned by the teacher model to the student model during the training phase, and the parameters of the student model are continuously optimized through back propagation.
Further, the convolution kernel size of the view evaluation model is 3 × 3; the structure alternates convolution layers and pooling layers, which increases the number of nonlinear transformation layers.
The invention adopts a novel knowledge transfer framework to train a real-time, anchor-frame-based view suggestion model. That is, the knowledge learned by the teacher model is transferred to the student model, so that the student model has few parameters and composes quickly while matching the teacher's results as closely as possible.
Unlike in target proposal networks, label assignment for view proposal in the present solution is very challenging. First, assigning anchor boxes to ground truth views based on an overlap measure makes it difficult to train a composition model, because slightly adjusting a view typically produces large differences in composition quality. Moreover, the annotations are not exhaustive and most anchor boxes will not be annotated. Meanwhile, unlike in target detection scenarios, the unannotated anchor boxes cannot be assumed to be negative samples.
A view evaluation model based on a twin neural network is trained as the teacher model: two sub-networks are used, each sub-network receiving an input, mapping it to a high-dimensional feature space, and outputting a corresponding representation. The degree of similarity of the two inputs is compared by calculating the distance between the two representations, e.g. the Euclidean distance. The invention then deploys this model as a teacher model to score candidate image anchor boxes, and uses the teacher's scores to train the view suggestion network, as the student model, to output the same anchor box score ranking. To train the student, the invention uses a mean pairwise squared error (MPSE) loss.
The loss function migrates the knowledge owned by the teacher model to the student model during the training phase, and the parameters of the student model are continuously optimized through back propagation.
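The formula for the loss is not reproduced in this text, so the following is only a sketch of one common definition of a mean pairwise squared error, assumed here rather than taken from the patent: the squared mismatch between teacher and student score differences, averaged over all n(n-1)/2 anchor pairs. The symbols y, q and n follow the definitions given above; the function name `mpse_loss` is hypothetical.

```python
def mpse_loss(y, q):
    """Mean pairwise squared error between teacher scores y and student scores q.

    Assumed form: average over all unordered pairs (i, j), i < j, of
    ((y_i - y_j) - (q_i - q_j)) ** 2.
    """
    n = len(y)
    if n != len(q) or n < 2:
        raise ValueError("y and q must have the same length, at least 2")
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            diff = (y[i] - y[j]) - (q[i] - q[j])
            total += diff * diff
    return total / (n * (n - 1) / 2)

# A constant offset between teacher and student scores costs nothing,
# because only score differences (and hence the ranking) are compared.
loss_shifted = mpse_loss([1.0, 2.0, 3.0], [11.0, 12.0, 13.0])
# A student that scores every anchor identically is penalized.
loss_flat = mpse_loss([0.0, 1.0, 2.0], [0.0, 0.0, 0.0])
```

Under this assumed form the student is pushed to reproduce the teacher's relative ordering of anchor frames rather than its absolute scores, which matches the stated goal of outputting the same score ranking.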
In this embodiment, a very deep convolutional neural network of the kind used for large-scale image recognition is employed; it is a feed-forward neural network that performs convolution calculations and has a deep structure. Convolutional neural networks have representation learning ability and can classify input information in a translation-invariant way according to their hierarchical structure, so they are also called "translation-invariant artificial neural networks". A convolutional neural network is a standard hierarchical structure containing five main kinds of layer: a data input layer, a convolution calculation layer, a ReLU activation layer, a pooling layer and a fully connected layer, with data passed from layer to layer.
Because the standard convolutional neural network suffers from problems such as vanishing and exploding gradients, the depth of the model is limited during training, and the image features cannot be extracted well.
Therefore, the invention adopts a very deep convolutional neural network based on the VGG framework: larger convolution kernels are replaced by kernels of size 3 × 3, convolution layers and pooling layers are arranged alternately, and the number of nonlinear transformation layers is increased, so that the training parameters required by the model are greatly reduced, model training and inference are faster, and generalization is enhanced.
The model is applied to Scissorhands, and tests show that it improves composition quality and cropping speed. The VGG framework is based on a standard convolutional neural network in which three 3 × 3 convolution kernels replace one 7 × 7 convolution kernel, and two 3 × 3 convolution kernels replace one 5 × 5 convolution kernel.
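The parameter saving from this kernel replacement can be checked with a short calculation. The sketch below assumes stride-1 convolutions with the same number of input and output channels and ignores bias terms; the channel count of 64 is chosen only for illustration.

```python
def conv_weights(kernel, channels, layers=1):
    """Weight count of `layers` stacked kernel x kernel convolutions
    with `channels` input and output channels (biases ignored)."""
    return layers * kernel * kernel * channels * channels

def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)

C = 64  # illustrative channel count (assumption)

# Three 3x3 layers see the same 7x7 window as one 7x7 layer, with fewer weights.
rf_three = stacked_receptive_field(3, 3)   # 7
w_three = conv_weights(3, C, layers=3)     # 27 * C^2
w_seven = conv_weights(7, C)               # 49 * C^2

# Two 3x3 layers see the same 5x5 window as one 5x5 layer.
rf_two = stacked_receptive_field(3, 2)     # 5
w_two = conv_weights(3, C, layers=2)       # 18 * C^2
w_five = conv_weights(5, C)                # 25 * C^2
```

The stacked version also applies a nonlinearity after every 3 × 3 layer, which is the increase in nonlinear transformation layers described above.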
The model architecture and parameters are shown in Table 1.
TABLE 1 Model architecture and parameter table: number of parameters (in millions)
Network | A, A-LRN | B | C | D | E |
---|---|---|---|---|---|
Number of parameters | 133 | 133 | 134 | 138 | 144 |
The convolutional-network-based SSD produces a fixed-size set of bounding boxes together with scores for the presence of object class instances in those boxes, followed by a non-maximum suppression step that generates the final detections. The early network layers are based on a standard architecture used for high-quality image classification (truncated before any classification layer), which the invention calls the base network; auxiliary structures are then added to the network to generate detections with key features. The SSD network can also improve the composition quality of Scissorhands, mainly because of one important idea in the network: a feature pyramid detects targets at multiple scales to improve detection accuracy. Specifically, (1) the higher the feature layer, the richer its semantic information; different feature layers make use of features at different levels, so detecting on several layers gives better results than detecting only on the last layer; (2) from low to high feature layers the receptive field grows from small to large, so different feature layers help detect targets of different sizes. The SSD network adds several feature layers on top of the VGG16 base layers: FC7 in VGG16 is changed into the convolutional layer Conv7, and the feature layers Conv8, Conv9, Conv10 and Conv11 are added.
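The non-maximum suppression step mentioned above can be sketched as a simple greedy procedure. This is a generic illustration, not the patent's implementation: the box format `(x1, y1, x2, y2)`, the function names and the threshold of 0.5 are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is kept.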
In this embodiment, taking one image as an example, the specific process of image cropping is as follows:
(1) Input image data normalization: the three channels of the input picture are normalized using per-channel means of 0.486, 0.456 and 0.406 and standard deviations of 0.229, 0.224 and 0.225, respectively.
(2) Calculating the vertex values of the frame: acquiring the sub-images through the preset anchor frames, calculating the vertex values of each frame, and storing the acquired sub-images and frame values.
(3) Acquiring image and model parameters: acquiring the path of the picture to be used and the parameters of the trained model.
(4) Data enhancement: the picture data are augmented by converting a PIL picture into a numpy array, resizing the picture, shuffling the picture order, randomly changing the gray values of the picture, adding Gaussian noise to the picture, distorting the picture, and so on; data enhancement improves the robustness of the trained model. The various data enhancement methods can be encapsulated in classes (objects) and called through them; the combined data enhancement functions are wrapped in a class and called to enlarge the training sample set.
(5) Visualizing the cropped image: acquiring the predefined anchor frames, and acquiring the cropped pictures image_crops and the positions bboxes of the cropped pictures relative to the original picture.
Network: the classic VGG network is used with its pre-trained parameters; the rest of the architecture is the same as that of the VGG network, and during training the parameters are activated and updated at the last fully connected layer. Twin network: a classical twin network architecture is applied here.
Training of the model: the model is trained on a GPU, and multiple GPUs can be called for parallel training.
(6) Checking whether the data are valid: checking whether the input data contain NaN or infinite values or are all zeros, checking whether batch normalization is effective, and obtaining the batch normalization results for all input data (batch normalization is used to allow deeper models to be trained and to improve the robustness of the model).
(7) Saving the training model: the model parameters of the current state, all parameters of the model, and the parameters of the best N models are saved separately.
(8) Generating image cropping annotations (i.e., generating anchor frames): converting the picture vector into torch.FloatTensor format so that the PyTorch library can be called to accelerate deep learning operations; returning a batch of training data for parallel training; returning the picture shape, dynamically updating the learning rate, setting the learning rate, and calculating the average accuracy.
A) Creating an output file;
B) generating the cropping labels and saving them (.txt format);
C) saving the cropping data (.json format).
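Several of the steps above, namely normalization (1), frame vertex calculation (2), cropping (5) and the validity check (6), can be sketched in a few lines. This is a minimal illustration under stated assumptions: anchors are taken to be normalized `(x1, y1, x2, y2)` fractions of the image size, images are nested lists, and all function names are hypothetical. The mean values are those listed in step (1); the commonly used ImageNet value for the first channel is 0.485.

```python
import math

MEAN = (0.486, 0.456, 0.406)  # per-channel means as listed in step (1)
STD = (0.229, 0.224, 0.225)   # per-channel standard deviations

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Normalize one RGB pixel whose channel values lie in [0, 1]."""
    return tuple((c - m) / s for c, m, s in zip(rgb, mean, std))

def anchor_to_box(anchor, width, height):
    """Convert a normalized anchor (assumed format) to pixel vertex values."""
    x1, y1, x2, y2 = anchor
    return (round(x1 * width), round(y1 * height),
            round(x2 * width), round(y2 * height))

def crop(image, box):
    """Crop an image stored as a list of rows; right and bottom are exclusive."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def is_valid(values):
    """Reject data containing NaN or infinity, or consisting only of zeros."""
    values = list(values)
    if any(math.isnan(v) or math.isinf(v) for v in values):
        return False
    return any(v != 0 for v in values)

# A 4x4 single-channel test image whose pixel value encodes its position.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
bbox = anchor_to_box((0.25, 0.25, 0.75, 0.75), width=4, height=4)
image_crop = crop(image, bbox)
```

Here `bbox` and `image_crop` play the roles of one entry of `bboxes` and `image_crops` in step (5); a real pipeline would operate on tensors rather than nested lists.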
The invention trains a model to find well-composed views, has good robustness, can generate the processed view in a very short time, and can be widely applied to image cropping, image thumbnailing, image retargeting and real-time view suggestion.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the described order of actions, since some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments, and that the actions and elements involved are not necessarily required by this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a ROM, a RAM, etc.
The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of the claims of the invention.
Claims (6)
1. A fast image composition method based on a convolutional neural network, characterized by comprising the following steps:
step 1: training a view evaluation model built on a twin (Siamese) neural network architecture;
step 2: deploying the trained view evaluation model as a teacher model that scores the candidate image anchor frames, and using the teacher's scores to train a view suggestion model, as the student model, to output the same anchor frame score ranking;
step 3: extracting multi-scale features through a target detection network;
step 4: passing the extracted features through a fully connected layer of the neural network to output an anchor frame;
step 5: cropping the original input picture according to the anchor frame obtained in step 4 to obtain a new composition.
2. The convolutional neural network-based fast image composition method as claimed in claim 1, wherein said step 1 comprises the following sub-steps:
step 101: two sub-networks are adopted; each sub-network receives an input, maps the input to a high-dimensional feature space, and outputs a corresponding representation;
step 102: the degree of similarity of the two inputs is compared by calculating the distance between the two representations.
3. The convolutional neural network-based fast image composition method as claimed in claim 2, wherein the distance between the two representations is the Euclidean distance.
4. The convolutional neural network-based fast image composition method as claimed in claim 1, wherein the loss function used for training the student model is:
wherein y represents the score output by the teacher model, q represents the score output by the student model, and n represents the number of output scores.
5. The convolutional neural network-based fast image composition method as claimed in claim 4, wherein the loss function migrates the knowledge owned by the teacher model to the student model in the training phase, and the parameters of the student model are continuously optimized by back propagation.
6. The convolutional neural network-based fast image composition method as claimed in claim 1, wherein the convolution kernel size of the view evaluation model is 3 × 3; the structure alternates convolution layers and pooling layers, which increases the number of nonlinear transformation layers.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110920914.XA CN113724261A (en) | 2021-08-11 | 2021-08-11 | Fast image composition method based on convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110920914.XA CN113724261A (en) | 2021-08-11 | 2021-08-11 | Fast image composition method based on convolutional neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113724261A true CN113724261A (en) | 2021-11-30 |
Family
ID=78675614
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110920914.XA Pending CN113724261A (en) | 2021-08-11 | 2021-08-11 | Fast image composition method based on convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113724261A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023147693A1 (en) * | 2022-02-04 | 2023-08-10 | Qualcomm Incorporated | Non-linear thumbnail generation supervised by a saliency map |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510574A (en) * | 2018-04-17 | 2018-09-07 | 福州大学 | A kind of example-based learning and the 3D rendering method of cutting out for enhancing visual quality |
CN108830813A (en) * | 2018-06-12 | 2018-11-16 | 福建帝视信息科技有限公司 | A kind of image super-resolution Enhancement Method of knowledge based distillation |
WO2020204460A1 (en) * | 2019-04-01 | 2020-10-08 | Samsung Electronics Co., Ltd. | A method for recognizing human emotions in images |
CN110097177A (en) * | 2019-05-15 | 2019-08-06 | 电科瑞达(成都)科技有限公司 | A kind of network pruning method based on pseudo- twin network |
WO2021023667A1 (en) * | 2019-08-06 | 2021-02-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | System and method for assisting selective hearing |
CN110533097A (en) * | 2019-08-27 | 2019-12-03 | 腾讯科技(深圳)有限公司 | Image definition recognition method, device, electronic equipment and storage medium |
CN111354017A (en) * | 2020-03-04 | 2020-06-30 | 江南大学 | Target tracking method based on twin neural network and parallel attention module |
CN111523463A (en) * | 2020-04-22 | 2020-08-11 | 南京工程学院 | Target tracking method and training method based on matching-regression network |
CN112381083A (en) * | 2020-06-12 | 2021-02-19 | 杭州喔影网络科技有限公司 | Saliency perception image clipping method based on potential region pair |
CN113159028A (en) * | 2020-06-12 | 2021-07-23 | 杭州喔影网络科技有限公司 | Saliency-aware image cropping method and apparatus, computing device, and storage medium |
Non-Patent Citations (10)
Title |
---|
KAREN SIMONYAN et al.: "Very Deep Convolutional Networks for Large-Scale Image Recognition", pages 1 - 14 * |
KAREN SIMONYAN et al.: "VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION", HTTP://ARXIV.ORG/ABS/1409.1556V6, pages 1 - 14 * |
KELLY L. WIGGERS et al.: "Image Retrieval and Pattern Spotting using Siamese Neural Network", pages 1 - 8 * |
MICHAEL LOSTER et al.: "Knowledge Transfer for Entity Resolution with Siamese Neural Networks", vol. 13, no. 1, pages 1 - 44 * |
S. CHOPRA et al.: "Learning a similarity metric discriminatively, with application to face verification", 2005 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR'05), pages 539 - 546 * |
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", ARXIV:1512.02325V5 [CS.CV], pages 1 - 17 * |
YUANPEI LIU et al.: "Teacher-Students Knowledge Distillation for Siamese Trackers", pages 1 - 11 * |
Z. WEI et al.: "Good View Hunting: Learning Photo Composition from Dense View Pairs", 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, pages 5437 - 5446 * |
尚欣茹 et al.: "Siamese guided-anchor RPN network for real-time object tracking", vol. 26, no. 2, pages 415 - 424 * |
张志扬 et al.: "Survey of information cascade prediction methods based on deep learning", vol. 47, no. 7, pages 141 - 153 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023147693A1 (en) * | 2022-02-04 | 2023-08-10 | Qualcomm Incorporated | Non-linear thumbnail generation supervised by a saliency map |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110738207B (en) | Character detection method for fusing character area edge information in character image | |
CN108564129B (en) | Trajectory data classification method based on generative adversarial network | |
CN112070768B (en) | Anchor-Free based real-time instance segmentation method | |
CN114549913B (en) | Semantic segmentation method and device, computer equipment and storage medium | |
CN107784288A (en) | Iterative-positioning face detection method based on deep neural network | |
CN110674777A (en) | Optical character recognition method in patent text scene | |
CN112308825B (en) | SqueezeNet-based crop leaf disease identification method | |
KR102370910B1 (en) | Method and apparatus for few-shot image classification based on deep learning | |
CN111767962A (en) | One-stage target detection method, system and device based on generative adversarial network | |
CN112651418B (en) | Data classification method, classifier training method and system | |
CN114332473A (en) | Object detection method, object detection device, computer equipment, storage medium and program product | |
CN112364974A (en) | Improved YOLOv3 algorithm based on activation function | |
CN116977844A (en) | Lightweight underwater target real-time detection method | |
Ouf | Leguminous seeds detection based on convolutional neural networks: Comparison of faster R-CNN and YOLOv4 on a small custom dataset | |
Jeevanantham et al. | Deep learning based plant diseases monitoring and detection system | |
CN113724261A (en) | Fast image composition method based on convolutional neural network | |
CN113408418A (en) | Calligraphy font and character content synchronous identification method and system | |
CN117315752A (en) | Training method, device, equipment and medium for face emotion recognition network model | |
CN116597275A (en) | High-speed moving target recognition method based on data enhancement | |
CN110659724A (en) | Target detection convolutional neural network construction method based on target scale range | |
CN111242114A (en) | Character recognition method and device | |
CN116030341A (en) | Plant leaf disease detection method based on deep learning, computer equipment and storage medium | |
CN115640401A (en) | Text content extraction method and device | |
CN114241470A (en) | Natural scene character detection method based on attention mechanism | |
CN113989671A (en) | Remote sensing scene classification method and system based on semantic perception and dynamic graph convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20211130 |