CN111488886B - Panoramic image significance prediction method, system and terminal for arranging attention features - Google Patents
- Publication number
- CN111488886B · CN202010171615A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention provides a panoramic image saliency prediction method based on arrangement attention features, comprising the following steps: extracting a template feature map and a channel-by-channel feature map, and multiplying the two to generate channel-by-channel features; performing attention feature arrangement on the generated channel-by-channel features; and selecting, according to the ranking result, the channel-by-channel features that are useful for fine-grained saliency prediction and inputting the selected features into a convolutional neural network for head gaze point prediction. The invention also provides a system and a terminal corresponding to the method. The invention not only better simulates the human visual attention mechanism but also achieves higher prediction accuracy.
Description
Technical Field
The invention relates to the technical field of image saliency prediction, in particular to a panoramic image saliency prediction method based on arrangement attention features, and especially to one based on partial attention features (foreground and background), channel-by-channel features, and arrangement attention features.
Background
In recent years, with the rapid development of mobile internet and advanced display technology, virtual reality (VR) has gradually entered people's lives and become widely used. Among its applications, the presentation of panoramic images and panoramic video through a head-mounted display (HMD) is particularly important. Unlike traditional images and videos, panoramic images and panoramic videos provide users with an immersive and interactive visual experience. Specifically, users can freely move their heads in the head-mounted display to view content within a 360°×180° field of view; in other words, people can freely turn their heads toward the regions of the panoramic image that most attract their visual attention. The head gaze point is therefore critical to exploring and modeling visual attention in panoramic images, and predicting the head gaze point in panoramic images is necessary.
Models for saliency prediction of head gaze points in panoramic images can be divided into two categories: the first comprises saliency prediction methods based on low-level feature extraction; the second comprises saliency prediction methods based on high-level semantic features extracted with deep learning. Representative of the first category is "GBVS360, BMS360, ProSal: Extending existing saliency prediction models from 2D to omnidirectional images," published by Lebreton et al. in 2018 in Signal Processing: Image Communication, which extends the two conventional saliency prediction methods BMS and GBVS into BMS360 and GBVS360 so that they apply to panoramic images.
In addition, there is "The prediction of head and eye movement for 360 degree images," published by Zhu et al. in 2018 in Signal Processing: Image Communication, which simulates a viewing window by projecting the panoramic image into multiple view blocks, extracts bottom-up and top-down features on those blocks, and finally fuses the extracted features to obtain a saliency map of the head gaze point. However, these methods are heuristic and their prediction accuracy is limited. The second category comprises saliency prediction methods based on deep learning; a well-performing current method is "SalGAN: Visual saliency prediction with adversarial networks," published by Pan et al. in 2018 in the CVPR Scene Understanding Workshop, which realizes saliency prediction by introducing adversarial examples and performing adversarial training. However, when CNN models of this kind perform saliency prediction on a panoramic image, not all features extracted by the CNN are useful for the final fine-grained saliency prediction; that is, feature redundancy exists, which may adversely affect the saliency prediction.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a panoramic image saliency prediction method, system and terminal based on arrangement attention features, performing panoramic image saliency prediction based on partial attention features (foreground and background attention), channel-by-channel features and an arrangement attention feature model.
According to a first aspect of the present invention, there is provided a panoramic image saliency prediction method based on arrangement attention characteristics, comprising:
extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
performing attention feature arrangement on the generated channel-by-channel features;
and selecting, according to the ranking result, the channel-by-channel features that are useful for fine-grained saliency prediction, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction.
Optionally, the extracting the template feature map includes:
extracting a foreground attention map and a background attention map using a two-phase branched network based on a ResNet50 predictive network;
and carrying out weighted fusion on the obtained foreground attention force diagram and the obtained background attention force diagram to obtain a template feature diagram.
Optionally, the extracting the foreground attention map and the background attention map using the two-stage branch network of the ResNet 50-based predictive network includes:
the prediction in the first stage is performed as follows:

F1 = φ1(M1), B1 = ψ1(M1)

where F1 and B1 denote the predicted foreground attention map and background attention map, respectively, M1 is the feature map obtained through the ResNet50 prediction network, and φ1 and ψ1 denote two independent ResNet50 prediction networks;
in the second stage, the foreground attention map and background attention map generated in the first stage are enhanced, as follows:

F_att = φ2(M2), B_att = ψ2(M2)

where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively, M2 is the feature map predicted in the second stage, and φ2 and ψ2 denote two independent second-stage ResNet50 prediction networks.
Optionally, the obtained foreground attention map and the obtained background attention map are subjected to weighted fusion to obtain a template feature map, which means that:
fusing the obtained foreground attention map and background attention map using a linear weighting method to obtain the template feature map.
Optionally, the extracting the channel-by-channel feature map includes:
the channel-by-channel feature map is extracted using the ResNet50-based prediction network and is the feature map output by the last layer of that network.
Optionally, performing attention feature arrangement on the generated channel-by-channel features includes:

arranging the channel-by-channel feature maps in descending order of their corresponding scores, wherein the larger the score of a channel-by-channel feature map, the more important that channel feature is for the final fine-grained saliency prediction.
Optionally, the attention feature arrangement of the generated channel-by-channel features is implemented as follows:

the importance of each channel-by-channel feature map is revealed by a ranking network that automatically learns ranking scores, where the ranking score is defined as:

r' = f_n(S') + f_max(S')

where f_n is a CNN-based network, f_max is a network comprising a channel-wise global max-pooling layer, S' denotes the channel-by-channel feature maps, and r' denotes the ranking scores;

and the channel-by-channel feature maps are arranged from the largest to the smallest ranking score.
Optionally, selecting the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement comprises:

selecting features important for fine-grained saliency prediction according to the ranking scores of the channel-by-channel feature maps and experimental results, and discarding features with smaller ranking scores, namely redundant features;

and feeding the selected channel-by-channel features into a convolutional neural network to output the predicted saliency map.
According to a second aspect of the present invention, there is provided a panoramic image saliency prediction system based on arrangement attention characteristics, comprising:
the feature extraction module is used for extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
an attention feature arrangement module for performing attention feature arrangement on the channel-by-channel features generated by the feature extraction module;

and a feature enhancement module for selecting, according to the ranking result of the attention feature arrangement module, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction.
According to a third aspect of the present invention, there is provided a panoramic image saliency prediction terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, said processor being operable to perform the above-described panoramic image saliency prediction method based on a permutation attention characteristic when said program is executed.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the system and the terminal, after the template feature map and the channel-by-channel feature map are extracted, the attention is arranged, the features which are useful for fine granularity saliency prediction are arranged and selected based on the score index, and the method can be used for obtaining high saliency prediction accuracy.
According to the method, the system and the terminal, partial attention (foreground and background attention) feature extraction and channel-by-channel feature extraction are organically integrated together, training is carried out in an end-to-end mode, a human visual attention mechanism can be well simulated, and meanwhile, high prediction accuracy can be obtained.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a conventional saliency prediction method;
FIG. 2 is a flowchart of a panoramic image saliency prediction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a panoramic image saliency prediction method according to a preferred embodiment of the present invention;
FIG. 4 is a diagram of an arrangement mechanism in arrangement attention according to a preferred embodiment of the present invention;
fig. 5 is a block diagram of a panoramic image saliency prediction system according to a preferred embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are implemented on the premise of the technical scheme of the invention, and detailed implementations and specific operation processes are given. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
Fig. 1 is a flowchart of a conventional panoramic image saliency prediction method, and it can be seen from the figure that the conventional panoramic image saliency prediction method generally only includes panoramic image input, feature extraction, and prediction result. The problem with this conventional approach is that all features extracted by the CNN model are used for fine-grained saliency prediction, but not all features are effective for fine-grained saliency prediction, i.e. there is a feature redundancy, thus resulting in lower prediction accuracy.
Fig. 2 is a flowchart of a panoramic image saliency prediction method according to an embodiment of the present invention.
Referring to fig. 2, the panoramic image saliency prediction method based on arrangement attention features in this embodiment first extracts a template feature map and a channel-by-channel feature map and multiplies them to generate channel-by-channel features; the generated channel-by-channel features are then sent to the attention arrangement module for feature ranking; finally, the features useful for fine-grained saliency prediction are selected in the feature enhancement module and input into a convolutional neural network to predict the head gaze point. The method of this embodiment not only better simulates the human visual attention mechanism but also achieves higher prediction accuracy.
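As a concrete illustration of this flow, the sketch below mimics the three stages with NumPy, using random arrays in place of the ResNet50 outputs; all shapes, the scoring rule, and the `top_k` value are illustrative assumptions rather than details taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 8, 16, 32                  # channels / height / width (illustrative)

# Stand-ins for the two network outputs described in the text.
mask = rng.random((H, W))            # template feature map (fused fg/bg attention)
feats = rng.random((C, H, W))        # channel-by-channel feature map (last layer)

# Stage 1: multiply the template map into every channel.
channel_feats = feats * mask[None, :, :]

# Stage 2: score each channel (channel-wise global max pooling as a simple score).
scores = channel_feats.reshape(C, -1).max(axis=1)

# Stage 3: keep the highest-scoring channels, discard the redundant rest.
top_k = 4
keep = np.argsort(scores)[::-1][:top_k]
selected = channel_feats[keep]       # fed to the prediction CNN in the real model

print(selected.shape)                # (4, 16, 32)
```

In the actual model the scoring and the final prediction are learned networks; the sketch only shows how the data flows between the stages.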
Fig. 3 is a frame diagram of a panoramic image saliency prediction method according to a preferred embodiment of the present invention.
Referring to fig. 3, in the preferred embodiment, the panoramic image saliency prediction method based on the arrangement attention characteristics may be performed as follows:
S1: extracting the attention maps of the foreground and background parts using a two-stage branch network based on the ResNet50 prediction network, and performing weighted fusion of the obtained foreground and background attention maps to obtain a template feature map (mask);

S2: extracting a channel-by-channel feature map using the ResNet50 prediction network, the channel-by-channel feature map being the output of the last layer of the ResNet50 network;

S3: multiplying the obtained template feature map by the channel-by-channel feature map to generate channel-by-channel features; a proposed ranking mechanism then automatically learns a ranking score to reveal the importance of each feature map; finally, the features (expressed as tensors) produced by the channel-wise global max-pooling layer and the CNN-extracted spatial attention features are added element-wise to generate the ranking score corresponding to each feature map;

S4: features important for fine-grained saliency prediction are selected for feature enhancement, and redundant features are discarded; finally, the selected useful features are fed into a convolutional neural network to output the predicted saliency map.
In some preferred embodiments, S1 may be performed as follows:
S1.1: predicting the attention maps of the foreground and background portions using the two-stage branch network based on the ResNet50 prediction network, where the first-stage prediction is performed as follows:

F1 = φ1(M1), B1 = ψ1(M1)

where F1 and B1 denote the predicted foreground attention map and background attention map, respectively, M1 is the feature map obtained through the ResNet50 prediction network, and φ1 and ψ1 denote two independent ResNet50 prediction networks.
In the second stage, the foreground attention map and background attention map generated in the first stage are enhanced, as follows:

F_att = φ2(M2), B_att = ψ2(M2)

where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively, M2 is the feature map predicted in the second stage, and φ2 and ψ2 denote two independent second-stage ResNet50 prediction networks.
S1.2: fusing the two obtained attention maps using a linear weighting method to obtain the template feature map (mask).
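A minimal sketch of this linear weighted fusion, assuming a single scalar weight `alpha` and an inverted background map so that high values mark salient regions in both terms (the patent specifies neither the weights nor any inversion):

```python
import numpy as np

def fuse_attention(fg, bg, alpha=0.5):
    """Linear weighted fusion of foreground/background attention maps.

    `alpha` and the inversion of the background map are illustrative
    assumptions; the real model's fusion weights are learned/chosen elsewhere.
    """
    mask = alpha * fg + (1.0 - alpha) * (1.0 - bg)
    # Normalise to [0, 1] so the mask can scale feature maps directly.
    mask -= mask.min()
    span = mask.max()
    return mask / span if span > 0 else mask

rng = np.random.default_rng(1)
fg = rng.random((16, 32))            # foreground attention map F_att (stand-in)
bg = rng.random((16, 32))            # background attention map B_att (stand-in)
mask = fuse_attention(fg, bg)
```

The normalisation step is a convenience so that the resulting mask multiplies feature maps without rescaling their dynamic range.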
In some preferred embodiments, S2 may be performed as follows:
S2.1: the feature map output by the last layer of the ResNet50 prediction network is adjusted using an upsampling operation and a dimensionality reduction operation, and the adjusted feature map is then sent to the attention arrangement module for feature arrangement.
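The adjustment in S2.1 can be sketched with nearest-neighbour upsampling and a 1×1-convolution-style channel reduction; the scale factor, output channel count, and random projection weights below are illustrative assumptions:

```python
import numpy as np

def upsample_nearest(x, scale):
    """Nearest-neighbour upsampling of a (C, H, W) tensor."""
    return x.repeat(scale, axis=1).repeat(scale, axis=2)

def reduce_channels(x, w):
    """A 1x1 convolution is a channel-mixing matmul: (C_out, C_in) @ (C_in, H*W)."""
    c, h, wd = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, wd)

rng = np.random.default_rng(2)
feat = rng.random((2048, 7, 7))      # ResNet50 last-layer shape for a 224x224 input
w = rng.standard_normal((256, 2048)) / np.sqrt(2048)   # stand-in for learned 1x1 weights
adjusted = reduce_channels(upsample_nearest(feat, 4), w)
print(adjusted.shape)                # (256, 28, 28)
```

In the real network both operations would be learned layers (e.g. bilinear upsampling plus a trained 1×1 convolution); the sketch only fixes the shapes.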
Referring to fig. 4, in a part of the preferred embodiment, S3 may be performed as follows:
S3.1: multiplying the obtained template feature map, as a mask, by the channel-by-channel feature map to obtain the channel-by-channel features;
S3.2: the importance of each channel-by-channel feature map is revealed by a ranking network that automatically learns ranking scores, where the ranking score is defined as:

r' = f_n(S') + f_max(S')

where f_n is a CNN-based network, f_max is a network comprising a channel-wise global max-pooling layer, S' denotes the channel-by-channel feature maps, and r' denotes the ranking scores.

S3.3: the channel-by-channel feature maps are arranged from the largest to the smallest ranking score.
In some preferred embodiments, S4 may be performed as follows:
S4.1: selecting features important for fine-grained saliency prediction according to the ranking scores of the channel-by-channel feature maps and experimental results, and discarding features with smaller ranking scores (redundant features).

S4.2: feeding the selected important features into a convolutional neural network and outputting the predicted saliency map.
Fig. 5 is a block diagram of a panoramic image saliency prediction system based on a permutation attention feature, which can be used to implement the above-described panoramic image saliency prediction method based on permutation attention feature, according to an embodiment of the present invention.
Referring to fig. 5, the panoramic image saliency prediction system based on arrangement attention features in this embodiment includes a feature extraction module, an attention feature arrangement module and a feature enhancement module, wherein: the feature extraction module extracts a template feature map and a channel-by-channel feature map and multiplies them to generate channel-by-channel features; the attention feature arrangement module performs attention feature arrangement on the channel-by-channel features generated by the feature extraction module; and the feature enhancement module selects, according to the ranking result of the attention feature arrangement module, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputs the selected channel-by-channel features into the convolutional neural network for head gaze point prediction.
In the above embodiment, the feature extraction module includes an attention feature extraction submodule and a channel-by-channel feature extraction submodule. The attention feature extraction submodule captures fine partial attention features (foreground and background regions) in the panoramic image, and performs weighted fusion of the generated foreground and background attention maps to obtain a template feature map (mask). The channel-by-channel feature extraction submodule extracts a channel-by-channel feature map using the ResNet50-based network, the channel-by-channel feature map being the output of the last layer of the ResNet50 network.
The specific implementation techniques of the arrangement attention feature module and the feature enhancement module are the same as those of the corresponding steps in the panoramic image saliency prediction method based on the arrangement attention feature, and are easy to be implemented by those skilled in the art, and are not repeated here.
Based on the above embodiments, in another embodiment of the present invention, there is provided a panoramic image saliency prediction terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor being operable to perform any one of the above methods for predicting panoramic image saliency based on arrangement attention characteristics when executing the program. The method not only can better simulate the human visual attention mechanism, but also can obtain higher prediction accuracy.
Optionally, a memory is provided for storing a program. The memory may include volatile memory, such as random-access memory (RAM), e.g. static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., application programs and functional modules implementing the methods described above), computer instructions, etc., which may be stored in one or more memories in a partitioned manner, and the stored computer programs, computer instructions, data, etc. may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment. Reference may be made in particular to the description of the embodiments of the method described above.
The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.
It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding units in the apparatus, etc., and those skilled in the art may refer to a technical solution of the apparatus to implement the step flow of the method, that is, the embodiment in the apparatus may be understood as a preferred example for implementing the method, which is not described herein.
It will be appreciated by those skilled in the art that the apparatus provided by the present invention and its various units may be implemented as logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc. by simply programming the logic of the method steps, except for implementing the apparatus provided by the present invention as pure computer readable program code. Therefore, the apparatus provided by the present invention may be regarded as a hardware component, and the units included therein for realizing various functions may also be regarded as structures within the hardware component; the means for achieving the various functions may also be considered as being either a software module for implementing the method or a structure within a hardware component.
The foregoing has been a description of specific embodiments of the invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention. The above preferred features may be used alone in any of the embodiments, or in any combination without interfering with each other.
Claims (7)
1. A panoramic image saliency prediction method based on arrangement attention features, comprising:
extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
performing attention feature arrangement on the generated channel-by-channel features;
selecting, according to the ranking result, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction;
the extracting the template feature map comprises the following steps:
extracting a foreground attention map and a background attention map using a two-phase branched network based on a ResNet50 predictive network;
carrying out weighted fusion on the obtained foreground attention force diagram and the background attention force diagram to obtain a template feature diagram;
the two-stage branch network extraction foreground attention and background attention profiles using a ResNet50 based predictive network, comprising:
the prediction in the first stage is performed as follows:

F1 = φ1(M1), B1 = ψ1(M1)

where F1 and B1 denote the predicted foreground attention map and background attention map, respectively, M1 is the feature map obtained through the ResNet50 prediction network, and φ1 and ψ1 denote two independent ResNet50 prediction networks;

in the second stage, the foreground attention map and background attention map generated in the first stage are enhanced, as follows:

F_att = φ2(M2), B_att = ψ2(M2)

where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively, M2 is the feature map predicted by the ResNet50 network in the second stage, and φ2 and ψ2 denote two independent second-stage ResNet50 prediction networks;
the extracting the channel-by-channel feature map includes:
the channel-by-channel feature map is extracted using the ResNet50-based prediction network and is the feature map output by the last layer of that network.
2. The panoramic image saliency prediction method based on arrangement attention features according to claim 1, wherein the weighted fusion of the foreground attention map and the background attention map to obtain a template feature map means:

fusing the obtained foreground attention map and background attention map using a linear weighting method to obtain the template feature map.
3. The panoramic image saliency prediction method based on ranked attention features according to claim 1, wherein the performing attention feature ranking on the channel-by-channel features comprises:
ranking the channel-by-channel feature maps in descending order of their corresponding scores, wherein a larger score indicates that the channel feature is more important for the final fine-grained saliency prediction.
4. The panoramic image saliency prediction method based on ranked attention features according to claim 3, wherein the attention feature ranking of the generated channel-by-channel features is implemented as follows:
the importance of a channel-by-channel feature map is indicated by a ranking score learned automatically by the ranking network, and the formula for calculating the ranking score is defined as:
r' = f_n(S') + f_max(S')
wherein f_n is a CNN-based network, f_max is a network comprising a channel-by-channel global max pooling layer, S' represents a channel-by-channel feature map, and r' represents the ranking score;
and the channel-by-channel feature maps are arranged in descending order of the obtained ranking scores.
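The scoring rule r' = f_n(S') + f_max(S') and the descending arrangement can be sketched as follows. Note that f_n is a learned CNN in the patent; a fixed per-channel mean is substituted here only so the scoring-and-sorting pipeline can be demonstrated end to end.

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 6, 8, 8
S = rng.random((C, H, W))  # channel-by-channel feature maps S'

# f_max: channel-wise global max pooling, as stated in the claim.
f_max = S.reshape(C, -1).max(axis=1)

# f_n is a learned CNN-based network in the patent; a per-channel mean
# is used here as a stand-in, purely for illustration.
f_n = S.reshape(C, -1).mean(axis=1)

scores = f_n + f_max          # r' = f_n(S') + f_max(S')
order = np.argsort(-scores)   # indices for descending (large-to-small) order
ranked = S[order]             # channel maps arranged by ranking score
```

Negating the scores before `argsort` is the standard numpy idiom for a descending sort.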
5. The panoramic image saliency prediction method based on ranked attention features according to any one of claims 1 to 4, wherein the selecting of channel-by-channel features useful for fine-grained saliency prediction for feature enhancement comprises:
selecting, according to the ranking scores of the channel-by-channel feature maps and the experimental results, the features important for fine-grained saliency prediction, and discarding the features with smaller ranking scores, namely the redundant features;
and feeding the selected channel-by-channel features into a convolutional neural network to output the predicted saliency map.
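The selection-and-discard step above is a top-k cut over the ranking scores. In the sketch below the cut-off `k` is a hypothetical choice (the claim says it is set from scores and experimental effect), and the data are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(3)
C, H, W = 6, 8, 8
features = rng.random((C, H, W))  # channel-by-channel features
scores = rng.random(C)            # ranking scores r' per channel

# Keep the k highest-scoring channels and discard the rest as redundant;
# k = 3 is illustrative -- the patent chooses it experimentally.
k = 3
keep = np.argsort(-scores)[:k]
selected = features[keep]

assert selected.shape == (k, H, W)
```

The `selected` tensor is what would then be fed to the prediction CNN to output the saliency map.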
6. A panoramic image saliency prediction system based on ranked attention features, comprising:
a feature extraction module for extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
an attention feature ranking module for performing attention feature ranking on the channel-by-channel features generated by the feature extraction module;
a feature enhancement module for selecting, according to the ranking result of the attention feature ranking module, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputting the selected channel-by-channel features into a convolutional neural network for head fixation point prediction;
the extracting of the template feature map by the feature extraction module comprises:
extracting a foreground attention map and a background attention map using a two-stage branch network based on the ResNet50 prediction network;
carrying out weighted fusion on the obtained foreground attention map and background attention map to obtain the template feature map;
the extracting of the foreground attention map and the background attention map using the two-stage branch network based on the ResNet50 prediction network comprises:
the formula for the prediction in the first stage is as follows:
wherein F1 and B1 respectively represent the predicted foreground attention map and background attention map, M1 is the feature map obtained through the ResNet50 prediction network, and φ1 and its companion operator denote two independent ResNet50 prediction networks;
in the second stage, the foreground attention map and the background attention map generated in the first stage are enhanced according to the following formula:
wherein Fatt and Batt respectively represent the final predicted foreground attention map and background attention map, M2 is the feature map predicted by the ResNet50 network in the second stage, and φ2 and its companion operator denote two independent ResNet50 prediction networks in the second stage;
the extracting of the channel-by-channel feature map by the feature extraction module comprises:
extracting the channel-by-channel feature map using the ResNet50-based prediction network, the channel-by-channel feature map being the feature map output at the last layer of the ResNet50 prediction network.
7. A panoramic image saliency prediction terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, performs the method of any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010171615.6A CN111488886B (en) | 2020-03-12 | 2020-03-12 | Panoramic image significance prediction method, system and terminal for arranging attention features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111488886A CN111488886A (en) | 2020-08-04 |
CN111488886B true CN111488886B (en) | 2023-04-28 |
Family
ID=71811714
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010171615.6A Active CN111488886B (en) | 2020-03-12 | 2020-03-12 | Panoramic image significance prediction method, system and terminal for arranging attention features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111488886B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112488122B (en) * | 2020-11-25 | 2024-04-16 | 南京航空航天大学 | Panoramic image visual saliency prediction method based on convolutional neural network |
CN114742170B (en) * | 2022-04-22 | 2023-07-25 | 马上消费金融股份有限公司 | Countermeasure sample generation method, model training method, image recognition method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084249A (en) * | 2019-04-24 | 2019-08-02 | 哈尔滨工业大学 | The image significance detection method paid attention to based on pyramid feature |
CN110110642A (en) * | 2019-04-29 | 2019-08-09 | 华南理工大学 | A kind of pedestrian's recognition methods again based on multichannel attention feature |
CN110414377A (en) * | 2019-07-09 | 2019-11-05 | 武汉科技大学 | A kind of remote sensing images scene classification method based on scale attention network |
CN110648334A (en) * | 2019-09-18 | 2020-01-03 | 中国人民解放***箭军工程大学 | Multi-feature cyclic convolution saliency target detection method based on attention mechanism |
CN110827193A (en) * | 2019-10-21 | 2020-02-21 | 国家广播电视总局广播电视规划院 | Panoramic video saliency detection method based on multi-channel features |
Non-Patent Citations (2)
Title |
---|
Shengkai Xiang et al. Feature Decomposition and Attention-guided Boundary Refinement for Saliency Detection. 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference. 2019, 982-989. * |
Cui Liqun et al. Salient object detection method in composite domains. Journal of Image and Graphics. 2018, 72-82. * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||