CN111488886B - Panoramic image saliency prediction method, system and terminal based on ranked attention features - Google Patents

Panoramic image saliency prediction method, system and terminal based on ranked attention features

Info

Publication number
CN111488886B
CN111488886B
Authority
CN
China
Prior art keywords
channel
attention
feature
map
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010171615.6A
Other languages
Chinese (zh)
Other versions
CN111488886A (en)
Inventor
杨小康
朱丹丹
闵雄阔
朱煜程
朱文瀚
翟广涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010171615.6A
Publication of CN111488886A
Application granted
Publication of CN111488886B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a panoramic image saliency prediction method based on ranked attention features, comprising the following steps: extracting a template feature map and a channel-by-channel feature map, and multiplying the two to generate channel-by-channel features; performing attention feature ranking on the generated channel-by-channel features; and selecting, according to the ranking result, the channel-by-channel features useful for fine-grained saliency prediction and inputting them into a convolutional neural network for head gaze point prediction. The invention also provides a system and a terminal corresponding to the method. The invention not only better simulates the human visual attention mechanism, but also achieves higher prediction accuracy.

Description

Panoramic image saliency prediction method, system and terminal based on ranked attention features
Technical Field
The invention relates to the technical field of image saliency prediction, in particular to a panoramic image saliency prediction method based on ranked attention features, and more particularly to a panoramic image saliency prediction method built on partial attention features (foreground and background), channel-by-channel features, and a ranked-attention-feature model.
Background
In recent years, with the rapid development of mobile internet technology and advanced display technology, Virtual Reality (VR) has gradually entered people's lives and become widely used. Among its applications, presenting panoramic images and panoramic videos through a Head Mounted Display (HMD) is particularly important. Unlike traditional images and videos, panoramic images and videos provide users with an immersive and interactive visual experience: users can freely move their heads within the head-mounted display to view content spanning a 360°×180° field of view. In other words, people can freely rotate their heads to view the regions of the panoramic image that most attract their visual attention. The head gaze point is therefore critical to exploring and modeling visual attention in panoramic images, and predicting it is necessary.
Models for saliency prediction of head gaze points in panoramic images fall into two categories: methods based on low-level feature extraction, and methods based on high-level semantic features extracted with deep learning. Representative of the first category is "GBVS360, BMS360, ProSal: Extending existing saliency prediction models from 2D to omnidirectional images", published by Lebreton et al. in 2018 in Signal Processing: Image Communication, which extends the two conventional saliency prediction methods BMS and GBVS into BMS360 and GBVS360 so that they apply to panoramic images.
Also in this category is "The prediction of head and eye movement for 360 degree images", published by Zhu et al. in 2018 in Signal Processing: Image Communication, which simulates the viewing window by projecting the panoramic image onto multiple view blocks, extracts bottom-up and top-down features on these blocks, and finally fuses the extracted features to obtain a saliency map of the head gaze point. However, these methods are heuristic and their prediction accuracy is limited. The second category comprises deep-learning-based saliency prediction methods; a well-performing example is "SalGAN: visual saliency prediction with adversarial networks", published by Pan et al. in 2018 in the CVPR Scene Understanding Workshop, which achieves saliency prediction by introducing adversarial samples and adversarial training. However, when CNN models of this kind perform saliency prediction on a panoramic image, not all features extracted by the CNN are useful for the final fine-grained saliency prediction; that is, feature redundancy exists, which may adversely affect the prediction.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a panoramic image saliency prediction method, system, and terminal based on ranked attention features, performing panoramic image saliency prediction with a model built on partial attention features (foreground and background attention maps), channel-by-channel features, and ranked attention features.
According to a first aspect of the present invention, there is provided a panoramic image saliency prediction method based on ranked attention features, comprising:
extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
performing attention feature ranking on the generated channel-by-channel features;
and selecting, according to the ranking result, the channel-by-channel features useful for fine-grained saliency prediction, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction.
Optionally, extracting the template feature map includes:
extracting a foreground attention map and a background attention map using a two-stage branch network based on a ResNet50 prediction network;
and performing weighted fusion of the obtained foreground attention map and background attention map to obtain the template feature map.
Optionally, extracting the foreground attention map and the background attention map using the two-stage branch network based on the ResNet50 prediction network includes:
performing the prediction in the first stage as:
F_1 = φ_1(M_1), B_1 = φ̂_1(M_1)
where F_1 and B_1 denote the predicted foreground attention map and background attention map, respectively, M_1 is the feature map obtained from the ResNet50 prediction network, and φ_1 and φ̂_1 denote two independent ResNet50 prediction networks;
and, in the second stage, enhancing the foreground attention map and the background attention map generated in the first stage as:
F_att = φ_2(M_2 | F_1, B_1), B_att = φ̂_2(M_2 | F_1, B_1)
where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively.
Optionally, performing weighted fusion of the obtained foreground attention map and background attention map to obtain the template feature map means:
fusing the obtained foreground attention map and background attention map with a linear weighting method to obtain the template feature map.
Optionally, extracting the channel-by-channel feature map includes:
extracting the channel-by-channel feature map using the ResNet50-based prediction network, the channel-by-channel feature map being the feature map output by the last layer of the ResNet50 prediction network.
Optionally, performing attention feature ranking on the generated channel-by-channel features includes:
ranking the channel-by-channel feature maps in descending order of their corresponding scores, where the larger the score of a channel feature map, the more important that channel is for the final fine-grained saliency prediction.
Optionally, the attention feature ranking of the generated channel-by-channel features is implemented as follows:
the importance of each channel-by-channel feature map is revealed by a ranking network that automatically learns ranking scores, the ranking score being defined as:
r' = f_n(S') + f_max(S')
where f_n is a CNN-based network, f_max is a network comprising a channel-wise global max pooling layer, S' denotes the channel-by-channel feature maps, and r' denotes the ranking scores;
and the channel-by-channel feature maps are then arranged in descending order of the obtained ranking scores:
S̃ = sort(S', r')
where sort(·, ·) orders the feature maps by descending score and S̃ denotes the sorted channel-by-channel feature maps.
Optionally, selecting the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement includes:
selecting the features important for fine-grained saliency prediction according to the ranking scores of the channel-by-channel feature maps and the experimental effect, and discarding the features with smaller ranking scores, namely the redundant features;
and feeding the selected channel-by-channel features into a convolutional neural network, which outputs the predicted saliency map.
According to a second aspect of the present invention, there is provided a panoramic image saliency prediction system based on ranked attention features, comprising:
a feature extraction module for extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
an attention feature ranking module for performing attention feature ranking on the channel-by-channel features generated by the feature extraction module;
and a feature enhancement module for selecting, according to the ranking result of the attention feature ranking module, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction.
According to a third aspect of the present invention, there is provided a panoramic image saliency prediction terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor, when executing the program, being operable to perform the above panoramic image saliency prediction method based on ranked attention features.
Compared with the prior art, the invention has the following beneficial effects:
according to the method, the system and the terminal, after the template feature map and the channel-by-channel feature map are extracted, the attention is arranged, the features which are useful for fine granularity saliency prediction are arranged and selected based on the score index, and the method can be used for obtaining high saliency prediction accuracy.
The method, system, and terminal of the present invention organically integrate partial attention (foreground and background attention map) feature extraction with channel-by-channel feature extraction and are trained end to end, so that the human visual attention mechanism is well simulated while high prediction accuracy is obtained.
Drawings
Other features, objects, and advantages of the present invention will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a conventional saliency prediction method;
FIG. 2 is a flowchart of a panoramic image saliency prediction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a panoramic image saliency prediction method according to a preferred embodiment of the present invention;
FIG. 4 is a diagram of the ranking mechanism in the ranking attention module according to a preferred embodiment of the present invention;
FIG. 5 is a block diagram of a panoramic image saliency prediction system according to a preferred embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are implemented on the premise of the technical scheme of the present invention, and detailed implementation modes and specific operation processes are given. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
Fig. 1 is a flowchart of a conventional panoramic image saliency prediction method. As the figure shows, a conventional method generally consists only of panoramic image input, feature extraction, and prediction output. The problem with this approach is that all features extracted by the CNN model are used for fine-grained saliency prediction even though not all of them are effective for it; that is, feature redundancy exists, which results in lower prediction accuracy.
Fig. 2 is a flowchart of a panoramic image saliency prediction method according to an embodiment of the present invention.
Referring to fig. 2, the panoramic image saliency prediction method based on ranked attention features of this embodiment first extracts a template feature map and a channel-by-channel feature map and multiplies them to generate channel-by-channel features; the generated channel-by-channel features are then sent to the ranking attention module to be ranked; finally, the feature enhancement module selects the features useful for fine-grained saliency prediction and inputs them into a convolutional neural network to predict the head gaze point. The method of this embodiment not only better simulates the human visual attention mechanism, but also achieves higher prediction accuracy.
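The overall flow can be summarized in the following minimal PyTorch-style sketch. It is an illustration under assumed interfaces only: the module names (mask_net, feat_net, rank_net, pred_net), the tensor shapes, and the number k of retained channels are not fixed by this embodiment and are chosen here for readability.

```python
import torch

def predict_saliency(image, mask_net, feat_net, rank_net, pred_net, k=128):
    # Template feature map (mask) from the two-stage foreground/background branch
    mask = mask_net(image)                    # (B, 1, H, W)
    # Channel-by-channel feature map from the last layer of a ResNet50 backbone
    feats = feat_net(image)                   # (B, C, H, W)
    # Multiply the mask with the feature map to obtain channel-by-channel features
    chan_feats = mask * feats                 # (B, C, H, W)
    # Ranking attention: one learned score per channel, sorted in descending order
    scores = rank_net(chan_feats)             # (B, C)
    order = scores.argsort(dim=1, descending=True)
    idx = order[:, :k, None, None].expand(-1, -1, *chan_feats.shape[2:])
    # Feature enhancement: keep the top-k channels, discard the redundant rest
    selected = chan_feats.gather(1, idx)      # (B, k, H, W)
    # Convolutional prediction head outputs the head-gaze-point saliency map
    return pred_net(selected)                 # (B, 1, H, W)
```

Concrete sketches for mask_net (S1), the feature adjustment (S2), rank_net (S3), and pred_net (S4) follow in the corresponding steps below.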
Fig. 3 is a framework diagram of a panoramic image saliency prediction method according to a preferred embodiment of the present invention.
Referring to fig. 3, in the preferred embodiment, the panoramic image saliency prediction method based on ranked attention features may be performed as follows:
S1: extract the foreground and background attention maps using the two-stage branch network based on the ResNet50 prediction network, and perform weighted fusion of the obtained foreground and background attention maps to obtain the template feature map (mask);
S2: extract the channel-by-channel feature map using the ResNet50 prediction network, the channel-by-channel feature map being the output of the last layer of the ResNet50 network;
S3: multiply the obtained template feature map with the channel-by-channel feature map to generate channel-by-channel features; then automatically learn a ranking score through the proposed ranking mechanism to reveal the importance of each feature map; finally, add, element by element, the features (expressed as tensors) generated by the channel-wise global max pooling layer and the spatial attention features extracted by the CNN to produce the ranking score of each feature map;
S4: select the features important for fine-grained saliency prediction for feature enhancement and discard the redundant features; finally, feed the selected useful features into a convolutional neural network to output the predicted saliency map.
In some preferred embodiments, S1 may be performed as follows:
S1.1: predict the foreground and background attention maps using the two-stage branch network based on the ResNet50 prediction network; the prediction in the first stage is performed as:
F_1 = φ_1(M_1), B_1 = φ̂_1(M_1)
where F_1 and B_1 denote the predicted foreground attention map and background attention map, respectively, M_1 is the feature map obtained from the ResNet50 prediction network, and φ_1 and φ̂_1 denote two independent ResNet50 prediction networks.
In the second stage, the foreground and background attention maps generated in the first stage are enhanced as:
F_att = φ_2(M_2 | F_1, B_1), B_att = φ̂_2(M_2 | F_1, B_1)
where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively.
S1.2: fuse the two obtained attention maps with a linear weighting method to obtain the template feature map (mask).
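As one concrete reading of S1.1 and S1.2, the sketch below assumes that each branch is a truncated ResNet50 trunk with a shared 1x1 projection head, that the second stage is conditioned on the first-stage maps by channel concatenation, and that the fusion weight alpha is a fixed hyperparameter; none of these details are fixed by the text above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

def trunk(in_ch=3):
    # ResNet50 truncated before global pooling; first conv adapted to in_ch inputs
    net = resnet50()
    if in_ch != 3:
        net.conv1 = nn.Conv2d(in_ch, 64, 7, stride=2, padding=3, bias=False)
    return nn.Sequential(*list(net.children())[:-2])   # outputs (B, 2048, H/32, W/32)

class TwoStageBranch(nn.Module):
    def __init__(self, alpha=0.5):
        super().__init__()
        self.phi1, self.phi1_hat = trunk(), trunk()     # first-stage networks
        self.phi2, self.phi2_hat = trunk(5), trunk(5)   # second stage sees image + F1 + B1
        self.proj = nn.Conv2d(2048, 1, 1)               # shared 1x1 head (a simplification)
        self.alpha = alpha                              # assumed fusion weight

    def forward(self, x):
        size = x.shape[2:]
        # First stage: F1 = phi1(M1), B1 = phi1_hat(M1)
        f1 = torch.sigmoid(F.interpolate(self.proj(self.phi1(x)), size, mode='bilinear'))
        b1 = torch.sigmoid(F.interpolate(self.proj(self.phi1_hat(x)), size, mode='bilinear'))
        # Second stage: enhance, conditioned on the first-stage maps
        cond = torch.cat([x, f1, b1], dim=1)
        f_att = torch.sigmoid(F.interpolate(self.proj(self.phi2(cond)), size, mode='bilinear'))
        b_att = torch.sigmoid(F.interpolate(self.proj(self.phi2_hat(cond)), size, mode='bilinear'))
        # S1.2: linear weighted fusion gives the template feature map (mask)
        return self.alpha * f_att + (1 - self.alpha) * b_att
```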
In some preferred embodiments, S2 may be performed as follows:
S2.1: the feature map output by the last layer of the ResNet50 prediction network is adjusted by an upsampling operation and a dimension-reduction operation, and the adjusted feature map is then sent to the ranking attention module for feature ranking.
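A minimal sketch of this adjustment, assuming bilinear upsampling to a fixed spatial size and a 1x1 convolution for the dimension reduction; the target size and the reduced channel count are illustrative assumptions:

```python
import torch.nn as nn

class FeatureAdjust(nn.Module):
    def __init__(self, in_ch=2048, out_ch=256, size=(64, 128)):
        super().__init__()
        self.up = nn.Upsample(size=size, mode='bilinear', align_corners=False)
        self.reduce = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # dimension reduction

    def forward(self, x):
        # x: last-layer ResNet50 feature map, (B, 2048, h, w)
        return self.reduce(self.up(x))  # adjusted map, (B, 256, 64, 128)
```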
Referring to fig. 4, in some preferred embodiments, S3 may be performed as follows:
S3.1: multiply the obtained template feature map, used as a mask, with the channel-by-channel feature map to obtain the channel-by-channel features;
S3.2: the importance of each channel-by-channel feature map is revealed by a ranking network that automatically learns ranking scores, the ranking score being defined as:
r' = f_n(S') + f_max(S')
where f_n is a CNN-based network, f_max is a network comprising a channel-wise global max pooling layer, S' denotes the channel-by-channel feature maps, and r' denotes the ranking scores.
S3.3: arrange the channel-by-channel feature maps in descending order of the obtained ranking scores:
S̃ = sort(S', r')
where sort(·, ·) orders the feature maps by descending score and S̃ denotes the sorted channel-by-channel feature maps.
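S3.2 and S3.3 might be realized as in the sketch below. The text specifies f_max as channel-wise global max pooling, but not the architecture of the CNN branch f_n, so the depthwise convolutional branch here is an assumption:

```python
import torch.nn as nn

class RankingAttention(nn.Module):
    def __init__(self, channels=256):
        super().__init__()
        # f_n: small CNN mapping each channel map to a scalar score (assumed form)
        self.f_n = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, s):                      # s: channel-by-channel features (B, C, H, W)
        r_n = self.f_n(s).flatten(1)           # CNN-based score, (B, C)
        r_max = s.amax(dim=(2, 3))             # f_max: channel-wise global max pooling
        r = r_n + r_max                        # r' = f_n(S') + f_max(S')
        order = r.argsort(dim=1, descending=True)
        idx = order[:, :, None, None].expand_as(s)
        return s.gather(1, idx), r             # feature maps sorted by descending score
```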
In some preferred embodiments, S4 may be performed as follows:
S4.1: select the features important for fine-grained saliency prediction according to the ranking scores of the channel-by-channel feature maps and the experimental effect, and discard the features with smaller ranking scores (the redundant features).
S4.2: feed the selected important features into a convolutional neural network, which finally outputs the predicted saliency map.
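Combining S4.1 and S4.2, the following sketch keeps the first k channels of the sorted features and decodes them with a small convolutional head; the value of k and the decoder layout are illustrative choices, not values given in the text.

```python
import torch.nn as nn

class SaliencyHead(nn.Module):
    def __init__(self, k=128):
        super().__init__()
        self.k = k                            # number of channels kept; the rest discarded
        self.decoder = nn.Sequential(         # small convolutional prediction head
            nn.Conv2d(k, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
            nn.Sigmoid(),                     # predicted saliency map in [0, 1]
        )

    def forward(self, sorted_feats):          # (B, C, H, W), channels sorted by score
        selected = sorted_feats[:, :self.k]   # keep important channels, drop redundant ones
        return self.decoder(selected)
```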
Fig. 5 is a block diagram of a panoramic image saliency prediction system based on ranked attention features according to an embodiment of the present invention, which can be used to implement the panoramic image saliency prediction method described above.
Referring to fig. 5, the panoramic image saliency prediction system based on ranked attention features of this embodiment includes a feature extraction module, an attention feature ranking module, and a feature enhancement module. The feature extraction module extracts a template feature map and a channel-by-channel feature map and multiplies them to generate channel-by-channel features; the attention feature ranking module ranks the channel-by-channel features generated by the feature extraction module; and the feature enhancement module selects, according to the ranking result, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement and inputs the selected features into the convolutional neural network for head gaze point prediction.
In the above embodiment, the feature extraction module includes an attention feature extraction submodule and a channel-by-channel feature extraction submodule. The attention feature extraction submodule captures the fine partial attention features (foreground and background regions) in the panoramic image and performs weighted fusion of the generated foreground and background attention maps to obtain the template feature map (mask). The channel-by-channel feature extraction submodule extracts the channel-by-channel feature map using the ResNet50-based network, the channel-by-channel feature map being the output of the last layer of the ResNet50 network.
The specific implementation techniques of the attention feature ranking module and the feature enhancement module are the same as those of the corresponding steps in the panoramic image saliency prediction method described above, can readily be implemented by those skilled in the art, and are not repeated here.
Based on the above embodiments, another embodiment of the present invention provides a panoramic image saliency prediction terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor, when executing the program, being operable to perform any of the above methods for panoramic image saliency prediction based on ranked attention features. The method not only better simulates the human visual attention mechanism, but also achieves higher prediction accuracy.
Optionally, the memory is used for storing a program. The memory may include volatile memory such as random-access memory (RAM), for example static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory such as flash memory. The memory stores computer programs (e.g., application programs and functional modules implementing the methods described above), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner and invoked by the processor.
The processor is used for executing the computer program stored in the memory to implement the steps of the method according to the above embodiments; reference may be made in particular to the description of those method embodiments.
The processor and the memory may be separate structures or may be integrated together. When they are separate structures, the memory and the processor may be connected by a bus.
It should be noted that the steps in the method provided by the present invention may be implemented by the corresponding units in the apparatus; those skilled in the art may refer to the technical scheme of the apparatus to implement the step flow of the method, that is, the embodiments of the apparatus may be understood as preferred examples for implementing the method, which are not repeated here.
It will be appreciated by those skilled in the art that, besides implementing the apparatus provided by the present invention as pure computer-readable program code, the apparatus and its units may be implemented as logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like by simply programming the method steps into logic. Therefore, the apparatus provided by the present invention may be regarded as a hardware component; the units included therein for realizing various functions may be regarded as structures within the hardware component, and the means for achieving the various functions may be regarded either as software modules implementing the method or as structures within the hardware component.
The foregoing describes specific embodiments of the invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that those skilled in the art may make various changes and modifications within the scope of the claims without affecting the spirit of the invention. The above preferred features may be used alone in any embodiment, or in any combination, without interfering with each other.

Claims (7)

1. A panoramic image saliency prediction method based on ranked attention features, comprising:
extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
performing attention feature ranking on the generated channel-by-channel features;
selecting, according to the ranking result, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction;
wherein extracting the template feature map comprises:
extracting a foreground attention map and a background attention map using a two-stage branch network based on a ResNet50 prediction network;
and performing weighted fusion of the obtained foreground attention map and background attention map to obtain the template feature map;
wherein extracting the foreground attention map and the background attention map using the two-stage branch network based on the ResNet50 prediction network comprises:
performing the prediction in the first stage as:
F_1 = φ_1(M_1), B_1 = φ̂_1(M_1)
where F_1 and B_1 denote the predicted foreground attention map and background attention map, respectively, M_1 is the feature map obtained from the ResNet50 prediction network, and φ_1 and φ̂_1 denote two independent ResNet50 prediction networks;
and, in the second stage, enhancing the foreground attention map and the background attention map generated in the first stage as:
F_att = φ_2(M_2 | F_1, B_1), B_att = φ̂_2(M_2 | F_1, B_1)
where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively, M_2 is the feature map predicted by the ResNet50 network in the second stage, and φ_2 and φ̂_2 denote two independent second-stage ResNet50 prediction networks;
and wherein extracting the channel-by-channel feature map comprises:
extracting the channel-by-channel feature map using the ResNet50-based prediction network, the channel-by-channel feature map being the feature map output by the last layer of the ResNet50 prediction network.
2. The panoramic image saliency prediction method based on ranked attention features of claim 1, wherein performing weighted fusion of the foreground attention map and the background attention map to obtain the template feature map means:
fusing the obtained foreground attention map and background attention map with a linear weighting method to obtain the template feature map.
3. The panoramic image saliency prediction method based on ranked attention features of claim 1, wherein performing attention feature ranking on the channel-by-channel features comprises:
ranking the channel-by-channel feature maps in descending order of their corresponding scores, wherein the larger the score of a channel feature map, the more important that channel is for the final fine-grained saliency prediction.
4. The panoramic image saliency prediction method based on ranked attention features of claim 3, wherein the attention feature ranking of the generated channel-by-channel features is implemented as follows:
the importance of each channel-by-channel feature map is revealed by a ranking network that automatically learns ranking scores, the ranking score being defined as:
r' = f_n(S') + f_max(S')
where f_n is a CNN-based network, f_max is a network comprising a channel-wise global max pooling layer, S' denotes the channel-by-channel feature maps, and r' denotes the ranking scores;
and the channel-by-channel feature maps are arranged in descending order of the obtained ranking scores:
S̃ = sort(S', r')
where sort(·, ·) orders the feature maps by descending score and S̃ denotes the sorted channel-by-channel feature maps.
5. The panoramic image saliency prediction method based on ranked attention features of any one of claims 1 to 4, wherein selecting the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement comprises:
selecting the features important for fine-grained saliency prediction according to the ranking scores of the channel-by-channel feature maps and the experimental effect, and discarding the features with smaller ranking scores, namely the redundant features;
and feeding the selected channel-by-channel features into a convolutional neural network, which outputs the predicted saliency map.
6. A panoramic image saliency prediction system based on ranked attention features, comprising:
a feature extraction module for extracting a template feature map and a channel-by-channel feature map, and multiplying the template feature map and the channel-by-channel feature map to generate channel-by-channel features;
an attention feature ranking module for performing attention feature ranking on the channel-by-channel features generated by the feature extraction module;
and a feature enhancement module for selecting, according to the ranking result of the attention feature ranking module, the channel-by-channel features useful for fine-grained saliency prediction for feature enhancement, and inputting the selected channel-by-channel features into a convolutional neural network for head gaze point prediction;
wherein the feature extraction module extracts the template feature map by:
extracting a foreground attention map and a background attention map using a two-stage branch network based on a ResNet50 prediction network;
and performing weighted fusion of the obtained foreground attention map and background attention map to obtain the template feature map;
wherein extracting the foreground attention map and the background attention map using the two-stage branch network based on the ResNet50 prediction network comprises:
performing the prediction in the first stage as:
F_1 = φ_1(M_1), B_1 = φ̂_1(M_1)
where F_1 and B_1 denote the predicted foreground attention map and background attention map, respectively, M_1 is the feature map obtained from the ResNet50 prediction network, and φ_1 and φ̂_1 denote two independent ResNet50 prediction networks;
and, in the second stage, enhancing the foreground attention map and the background attention map generated in the first stage as:
F_att = φ_2(M_2 | F_1, B_1), B_att = φ̂_2(M_2 | F_1, B_1)
where F_att and B_att denote the final predicted foreground attention map and background attention map, respectively, M_2 is the feature map predicted by the ResNet50 network in the second stage, and φ_2 and φ̂_2 denote two independent second-stage ResNet50 prediction networks;
and wherein the feature extraction module extracts the channel-by-channel feature map by:
extracting the channel-by-channel feature map using the ResNet50-based prediction network, the channel-by-channel feature map being the feature map output by the last layer of the ResNet50 prediction network.
7. A panoramic image saliency prediction terminal comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, is operable to perform the method of any one of claims 1 to 5.
CN202010171615.6A 2020-03-12 2020-03-12 Panoramic image saliency prediction method, system and terminal based on ranked attention features Active CN111488886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010171615.6A CN111488886B (en) 2020-03-12 2020-03-12 Panoramic image saliency prediction method, system and terminal based on ranked attention features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010171615.6A CN111488886B (en) 2020-03-12 2020-03-12 Panoramic image saliency prediction method, system and terminal based on ranked attention features

Publications (2)

Publication Number Publication Date
CN111488886A CN111488886A (en) 2020-08-04
CN111488886B (en) 2023-04-28

Family

ID=71811714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010171615.6A Active CN111488886B (en) 2020-03-12 2020-03-12 Panoramic image saliency prediction method, system and terminal based on ranked attention features

Country Status (1)

Country Link
CN (1) CN111488886B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488122B (en) * 2020-11-25 2024-04-16 南京航空航天大学 Panoramic image visual saliency prediction method based on convolutional neural network
CN114742170B (en) * 2022-04-22 2023-07-25 马上消费金融股份有限公司 Countermeasure sample generation method, model training method, image recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110084249A (en) * 2019-04-24 2019-08-02 哈尔滨工业大学 The image significance detection method paid attention to based on pyramid feature
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110414377A (en) * 2019-07-09 2019-11-05 武汉科技大学 A kind of remote sensing images scene classification method based on scale attention network
CN110648334A (en) * 2019-09-18 2020-01-03 中国人民解放***箭军工程大学 Multi-feature cyclic convolution saliency target detection method based on attention mechanism
CN110827193A (en) * 2019-10-21 2020-02-21 国家广播电视总局广播电视规划院 Panoramic video saliency detection method based on multi-channel features

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Shengkai Xiang et al., "Feature Decomposition and Attention-guided Boundary Refinement for Saliency Detection", 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference, 2019, pp. 982-989. *
崔丽群 et al., "复合域的显著性目标检测方法" [Salient object detection method in composite domains], 中国图象图形学报 [Journal of Image and Graphics], 2018, pp. 72-82. *

Also Published As

Publication number Publication date
CN111488886A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN110599492B (en) Training method and device for image segmentation model, electronic equipment and storage medium
JP7286013B2 (en) Video content recognition method, apparatus, program and computer device
CN111563502A (en) Image text recognition method and device, electronic equipment and computer storage medium
CN110837811A (en) Method, device and equipment for generating semantic segmentation network structure and storage medium
CN112040311B (en) Video image frame supplementing method, device and equipment and storage medium
CN111488886B (en) Panoramic image saliency prediction method, system and terminal based on ranked attention features
WO2023035531A1 (en) Super-resolution reconstruction method for text image and related device thereof
CN113014988B (en) Video processing method, device, equipment and storage medium
JP7267453B2 (en) image augmentation neural network
CN112488923A (en) Image super-resolution reconstruction method and device, storage medium and electronic equipment
US20230143452A1 (en) Method and apparatus for generating image, electronic device and storage medium
US11908103B2 (en) Multi-scale-factor image super resolution with micro-structured masks
CN112070040A (en) Text line detection method for video subtitles
CN114529574A (en) Image matting method and device based on image segmentation, computer equipment and medium
US20230067934A1 (en) Action Recognition Method, Apparatus and Device, Storage Medium and Computer Program Product
CN115147935B (en) Behavior identification method based on joint point, electronic device and storage medium
CN113066089A (en) Real-time image semantic segmentation network based on attention guide mechanism
JP2023001926A (en) Method and apparatus of fusing image, method and apparatus of training image fusion model, electronic device, storage medium and computer program
CN114119373A (en) Image cropping method and device and electronic equipment
CN113177432A (en) Head pose estimation method, system, device and medium based on multi-scale lightweight network
JP2023543964A (en) Image processing method, image processing device, electronic device, storage medium and computer program
CN116757923B (en) Image generation method and device, electronic equipment and storage medium
CN116975347A (en) Image generation model training method and related device
US20230409899A1 (en) Computer vision neural networks with learned tokenization
CN117255998A (en) Unsupervised learning of object representations from video sequences using spatial and temporal attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant