CN114860979A - Image retrieval method and system based on region of interest extraction - Google Patents
Image retrieval method and system based on region of interest extraction
- Publication number: CN114860979A (application number CN202210575033.3A)
- Authority
- CN
- China
- Prior art keywords
- region
- original
- visual
- retrieval
- interest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G06F16/583 — Retrieval characterised by using metadata automatically derived from the content
- G06F16/5838 — Retrieval characterised by using metadata automatically derived from the content using colour
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/462 — Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/56 — Extraction of image or video features relating to colour
- G06V10/60 — Extraction of image or video features relating to illumination properties, e.g. using a reflectance or lighting model
Abstract
The invention discloses an image retrieval method and system based on region-of-interest extraction, belonging to the technical field of image processing. The method comprises the following steps: constructing a visual fixation model, and extracting a region of interest from an original picture/original video based on the visual fixation model; extracting feature values from the region of interest, and storing the feature values in association with the corresponding original pictures/original videos according to a preset relation to obtain a retrieval database; establishing a retrieval interface and inputting a retrieval instruction; and retrieving and storing the original pictures/original videos that meet the requirements from the retrieval database based on the retrieval instruction to obtain a picture/video library. The region-of-interest detection based on the visual attention model incorporates a human visual attention mechanism and thus better conforms to the human visual perception process.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to an image retrieval method and system based on region of interest extraction.
Background
In recent years, with the rapid development of network and multimedia technologies, video- and image-based retrieval technologies have attracted increasing attention. The conventional approach retrieves images by text, annotation and the like; its specific flow is shown in fig. 4. Image and video data are labelled and annotated manually, the labels are stored in association with the data, and videos and images are retrieved by searching the label keywords.
However, this approach has several drawbacks: first, as the number of image and video files increases sharply, manual labelling requires an enormous workload; second, each person understands an image or video differently, so the labels are easily inaccurate and retrieval errors result; third, the approach cannot satisfy personalized user requirements, such as retrieval based on low-level visual feature content.
Disclosure of Invention
Aiming at the problems of low efficiency, inaccuracy and the inability to meet personalized retrieval requirements in the image retrieval technologies described in the background, the invention provides an image retrieval technique that extracts a region of interest based on a visual attention model.
The invention adopts the following technical scheme: an image retrieval method based on region of interest extraction at least comprises the following steps:
constructing a visual fixation model, and extracting an interested area in an original picture/original video based on the visual fixation model;
extracting characteristic values in the region of interest, and performing relevance storage on the characteristic values and corresponding original pictures/original videos according to a preset relation to obtain a retrieval database;
establishing a retrieval interface and inputting a retrieval instruction; searching out original pictures/original videos meeting the requirements in a retrieval database based on the retrieval instruction and storing the original pictures/videos to obtain a picture/video library; the extraction process of extracting the region of interest is as follows:
firstly, processing data of an original picture/original video, and extracting a saliency map by using a visual fixation model;
step two, acquiring at least one visual attention focus in the saliency map by utilizing a competition mechanism;
and step three, taking the visual attention focus as a seed point for region growth segmentation, and obtaining the region of interest by a region growth method.
In a further embodiment, the step one specifically includes the following steps:
step 101, filtering an original image/original video by using a multi-scale multi-channel filter, extracting visual features, and obtaining feature maps of the visual features; the feature maps at least comprise: a color feature map, a brightness feature map and a direction feature map;
102, selecting a scale according to requirements, defining the center c and the periphery s of each feature map, and respectively calculating the central periphery difference of the scale of the center c and the scale of the periphery s in the corresponding feature map; the result of the central peripheral difference is an attention map corresponding to the visual features;
103, normalizing the attention maps to respectively obtain a normalized color attention map C̄, a normalized brightness attention map Ī and a normalized direction attention map Ō;
Step 104, obtaining a saliency map SM by adopting the following formula: SM = (C̄ + Ī + Ō) / 3.
in a further embodiment, the second step specifically includes the following steps:
step 201, obtaining a plurality of saliency values in the saliency map, sorting them from strong to weak, and selecting the points corresponding to the top 10 saliency values as candidate points;
step 202, comparing the saliency corresponding to each candidate point with a first threshold t in sequence, wherein the candidate points whose saliency is greater than the first threshold t are the visual attention focuses.
In a further embodiment, the third step specifically includes the following steps:
calculating the Euclidean distance between the visual attention focuses; if the Euclidean distance is smaller than a second threshold value d, merging the corresponding two visual attention focuses by adopting the following formula,
resulting in a merged visual attention focus (X, Y):
X = (v_{s,i}·x_i + v_{s,j}·x_j) / (v_{s,i} + v_{s,j}), Y = (v_{s,i}·y_i + v_{s,j}·y_j) / (v_{s,i} + v_{s,j})
in the formula, (x_i, y_i) and (x_j, y_j) respectively represent the coordinates of the ith and jth visual attention focuses, with i ≠ j; v_{s,i} and v_{s,j} respectively represent the gray values of the ith and jth visual attention focuses in the saliency map.
In a further embodiment, the step 101 is further represented by:
firstly, the image is filtered by a Gaussian weight matrix and down-sampled to obtain an n-layer Gaussian pyramid; color, brightness and direction features are extracted at each scale σ of the pyramid to form the corresponding RG(σ), BY(σ), I(σ) and O(σ) feature pyramids, where σ ∈ [0, n-1];
wherein the color features in the color feature map are: red R = r - (g + b)/2, green G = g - (r + b)/2, blue B = b - (r + g)/2 and yellow Y = (r + g)/2 - |r - g|/2 - b;
the brightness feature in the brightness feature map is expressed as: I = (r + g + b)/3;
the direction features are four directional features formed by Gabor wavelet transformation of the brightness feature in the four directions θ = {0°, 45°, 90°, 135°}, where r, g and b are the red, green and blue components of the original image.
In a further embodiment, the central peripheral difference between the center c scale and the periphery s scale in the corresponding feature map is calculated as follows:
RG(c, s) = |(R(c) - G(c)) ⊖ (G(s) - R(s))|
BY(c, s) = |(B(c) - Y(c)) ⊖ (Y(s) - B(s))|
I(c, s) = |I(c) ⊖ I(s)|
O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|
wherein ⊖ denotes the across-scale difference; RG(c, s) represents the central peripheral difference of the red-green color feature map, BY(c, s) that of the blue-yellow color feature map, I(c, s) that of the brightness feature map, and O(c, s, θ) that of the direction feature map; R(c) and R(s), G(c) and G(s), B(c) and B(s), and Y(c) and Y(s) represent the center and periphery of the red, green, blue and yellow feature maps respectively; I(c) and I(s) represent the center and periphery of the brightness feature map; and O(c, θ) and O(s, θ) represent the center and periphery of the direction feature map.
In a further embodiment, the saliency is obtained as follows:
v_s(x, y) = Σ_{(u,v)} SM(u, v) · [ (1/(2πσ_c²)) exp(-((x-u)² + (y-v)²)/(2σ_c²)) - (1/(2πσ_s²)) exp(-((x-u)² + (y-v)²)/(2σ_s²)) ]
in the formula, σ_c and σ_s represent the scale factors of the center c and the periphery s respectively, and (x, y) are the coordinates of a pixel point in the saliency map.
In a further embodiment, the specific process of searching out the original image/original video meeting the requirement in the retrieval database is as follows:
comparing the similarity between the feature values in the retrieval database and the input retrieval instruction, sorting the similarities from high to low, screening out a preset number of top-ranked feature values, and matching the corresponding original images/original videos based on those feature values.
The image retrieval system based on region-of-interest extraction for implementing the image retrieval method as described above includes:
a first module configured to construct a visual gaze model, and extract an area of interest in an original picture/original video based on the visual gaze model;
a second module configured to extract feature values from the region of interest and store them in association with the corresponding original pictures/original videos according to a preset relation to obtain a retrieval database;
a third module, configured to establish a search interface and input a search instruction; and searching out and storing original pictures/original videos meeting the requirements in a retrieval database based on the retrieval instruction to obtain a picture/video library.
In a further embodiment, the first module further comprises a fourth module connected thereto, the fourth module being configured to: processing data of an original picture/original video, and extracting a saliency map by using a visual fixation model; obtaining at least one visual focus of attention in the saliency map using a competition mechanism; and taking the visual attention focus as a seed point for region growth segmentation, and obtaining the region of interest by a region growth method.
The invention has the following beneficial effects: the region-of-interest detection based on the visual attention model incorporates a human visual attention mechanism and therefore better conforms to the human visual perception process. The extraction method based on the visual attention model comprises: obtaining a saliency map from the physiological characteristics of the human visual system; obtaining attention focuses through a winner-take-all competition mechanism; and using the attention focuses as seed points for region-growing segmentation to obtain the region of interest by the region-growing method. This solves the problem that the region of interest extracted by traditional methods is divorced from the subjective understanding of the user.
Drawings
Fig. 1 is a flowchart of an image retrieval method based on region of interest extraction according to the present invention.
Fig. 2 is a flow chart of region of interest extraction based on visual fixation model in the present invention.
Fig. 3 is a flowchart of acquiring a saliency map in the present invention.
Fig. 4 is a flow chart of a prior art image retrieval technique based on text annotation.
FIG. 5 is a flow diagram of a prior art content-based image retrieval technique.
Fig. 6 is a flowchart of a prior art region-of-interest based image retrieval technique.
Detailed Description
Content-Based Image Retrieval (CBIR) is a current research focus. CBIR technology retrieves images by extracting low-level visual features such as color and shape; it is strongly objective and overcomes the defects of conventional image retrieval. Its specific flow is shown in fig. 5. However, it is difficult to represent the high-level semantic features of an image with its low-level visual features, the so-called "semantic gap" problem: the information a user obtains from visual data is inconsistent with the user's own understanding of that data. Acquiring the high-level semantics of the image is therefore the key to solving the semantic gap problem.
Detecting the region of interest (ROI) of an image is an effective way to obtain its high-level semantics. In recent years, with the development of interest-detection technology, many detection methods have been proposed; a specific flow is shown in fig. 6. Region-of-interest detection based on human-computer interaction requires the user to participate, so the user's intention can be obtained accurately, but the interaction process is relatively complex. Fully automatic detection, on the other hand, breaks away from the user's subjective understanding of the image and easily produces the opposite result on data with a prominent image background.
Therefore, to solve the above technical problems, this embodiment provides an image retrieval method based on region-of-interest extraction. To address the problem that region-of-interest extraction in conventional methods is divorced from the user's subjective understanding, the region-of-interest detection based on the visual attention model incorporates a human visual attention mechanism and better conforms to the human visual perception process. As shown in fig. 1, the method comprises the following steps:
constructing a visual fixation model, and extracting an interested area in an original picture/original video based on the visual fixation model; in the present embodiment, the present invention is applicable to both the analysis search of images and the analysis search of videos.
Extracting characteristic values in the region of interest, and performing relevance storage on the characteristic values and corresponding original pictures/original videos according to a preset relation to obtain a retrieval database;
establishing a retrieval interface and inputting a retrieval instruction; and searching out original pictures/videos meeting the requirements from a retrieval database based on the retrieval instruction, and storing to obtain a picture/video library. Further shown are: and comparing the similarity of the characteristic values in the retrieval database with the input retrieval instruction, sequencing the similarity from high to low, screening out the characteristic values of a preset number which are sequenced in the front, and matching out the corresponding original image/original video based on the characteristic values.
In a further embodiment, the process of extracting the region of interest is shown in fig. 2, and includes:
firstly, processing data of an original picture/original video, and extracting a saliency map by using a visual fixation model;
step two, acquiring at least one visual attention focus in the saliency map by utilizing a competition mechanism;
and step three, taking the visual attention focus as a seed point for region growth segmentation, and obtaining the region of interest by a region growth method.
The method effectively overcomes the defect that the traditional region-growing segmentation method requires manual selection of seed points, and at the same time addresses the inaccurate segmentation and overly small regions of interest obtained when a visual attention mechanism is used alone. The visual fixation model samples the image non-uniformly using the human visual attention mechanism, computes the central peripheral differences to obtain the feature maps of the image, and fuses them into the saliency map of the image.
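A minimal sketch of the region-growing step that the seed points feed into: starting from one seed, 4-connected neighbours are accepted while their gray value stays within a tolerance of the seed value. The tolerance `tol` and the homogeneity criterion are assumptions for illustration; the patent does not specify the growth criterion.

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol=10.0):
    """Grow a region from `seed` (row, col): accept 4-neighbours whose
    gray value differs from the seed value by less than `tol`."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
                    and abs(float(image[ny, nx]) - seed_val) < tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask
```

Seeding from the visual attention focuses replaces the manual seed selection of the traditional method.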
Specifically, the step one specifically comprises the following steps:
101, filtering the original image/original video with a multi-scale multi-channel filter, extracting visual features, and obtaining feature maps of the visual features, the feature maps at least comprising a color feature map, a brightness feature map and a direction feature map. Firstly, the image is filtered by a Gaussian weight matrix and down-sampled to obtain an n-layer Gaussian pyramid; color, brightness and direction features are extracted at each scale σ of the pyramid to form the corresponding RG(σ), BY(σ), I(σ) and O(σ) feature pyramids, where σ ∈ [0, n-1];
wherein the color features in the color feature map are: red R = r - (g + b)/2, green G = g - (r + b)/2, blue B = b - (r + g)/2 and yellow Y = (r + g)/2 - |r - g|/2 - b;
the brightness feature in the brightness feature map is expressed as: I = (r + g + b)/3;
the direction features are four directional features formed by Gabor wavelet transformation of the brightness feature in the four directions θ = {0°, 45°, 90°, 135°}, where r, g and b are the red, green and blue components of the original image.
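The color and brightness features of step 101 can be sketched directly. Clipping negative opponency responses to zero is an assumption carried over from the usual visual-attention formulation rather than something the patent states; the Gabor direction channel is omitted for brevity.

```python
import numpy as np

def channel_features(img):
    """Compute brightness and opponent-color features from an HxWx3
    float RGB image in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    I = (r + g + b) / 3.0                         # brightness: I = (r+g+b)/3
    R = np.clip(r - (g + b) / 2.0, 0, None)       # broadly tuned red
    G = np.clip(g - (r + b) / 2.0, 0, None)       # broadly tuned green
    B = np.clip(b - (r + g) / 2.0, 0, None)       # broadly tuned blue
    Y = np.clip((r + g) / 2.0 - np.abs(r - g) / 2.0 - b, 0, None)  # yellow
    return I, R, G, B, Y
```

Each returned map would be computed at every pyramid scale σ to form the feature pyramids.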
102, selecting scales as required, defining the center c and the periphery s of each feature map, and calculating the central peripheral difference between the center c scale and the periphery s scale in the corresponding feature map; the result of the central peripheral difference is an attention map of the corresponding visual feature. The central peripheral difference is calculated as follows:
RG(c, s) = |(R(c) - G(c)) ⊖ (G(s) - R(s))|
BY(c, s) = |(B(c) - Y(c)) ⊖ (Y(s) - B(s))|
I(c, s) = |I(c) ⊖ I(s)|
O(c, s, θ) = |O(c, θ) ⊖ O(s, θ)|
wherein ⊖ denotes the across-scale difference; RG(c, s) represents the central peripheral difference of the red-green color feature map, BY(c, s) that of the blue-yellow color feature map, I(c, s) that of the brightness feature map, and O(c, s, θ) that of the direction feature map; R(c) and R(s), G(c) and G(s), B(c) and B(s), and Y(c) and Y(s) represent the center and periphery of the red, green, blue and yellow feature maps respectively; I(c) and I(s) represent the center and periphery of the brightness feature map; and O(c, θ) and O(s, θ) represent the center and periphery of the direction feature map.
It should be noted that, since the feature maps have different sizes at different scales, the feature map at the larger scale s must be interpolated and enlarged to the size of the feature map at the smaller scale c before the difference is taken.
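The interpolate-then-subtract operation can be sketched as follows. Block averaging stands in for Gaussian-pyramid down-sampling and nearest-neighbour repetition stands in for the interpolation; both are simplifying assumptions for illustration.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 block averaging (pyramid stand-in)."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def center_surround(fine, coarse):
    """|c ⊖ s|: enlarge the coarse (periphery) map to the fine (center)
    size, then take the pointwise absolute difference."""
    fy = fine.shape[0] // coarse.shape[0]
    fx = fine.shape[1] // coarse.shape[1]
    up = np.repeat(np.repeat(coarse, fy, axis=0), fx, axis=1)
    return np.abs(fine - up[:fine.shape[0], :fine.shape[1]])
```

A homogeneous region produces a near-zero difference map, so only contrasts between center and periphery survive.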
103, normalizing the attention maps to respectively obtain a normalized color attention map C̄, a normalized brightness attention map Ī and a normalized direction attention map Ō;
Step 104, obtaining a saliency map SM by adopting the following formula: SM = (C̄ + Ī + Ō) / 3.
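A sketch of the step 104 fusion, assuming min-max scaling as a simple stand-in for the normalization operator applied to the three attention maps (the patent leaves the operator unspecified here):

```python
import numpy as np

def normalize_map(m):
    """Rescale a map to [0, 1]; constant maps become all zeros."""
    lo, hi = float(m.min()), float(m.max())
    return (m - lo) / (hi - lo) if hi > lo else np.zeros_like(m)

def fuse_saliency(color_map, brightness_map, direction_map):
    """SM = (C + I + O) / 3 over the normalized attention maps."""
    return (normalize_map(color_map) + normalize_map(brightness_map)
            + normalize_map(direction_map)) / 3.0
```

Normalizing before averaging keeps one strongly responding channel from drowning out the others.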
due to the selective and transitive nature of the attention focus, the selection and shifting of attention focus is achieved through the network contention mechanism of WTA. This ensures that all but the most active one, with the focus of attention directed by the most active part in terms of identifiable orientation points, is suppressed. Those local inhibit points are also temporarily activated while looking for the current focus of attention in the saliency map, and the next-to-saliency-area is considered the most active winner as the WTA network moves to the next focus of attention. The fixation area of the human eye thus shifts from a strong focus of attention to a weaker focus of attention, a process that is known as the shift of the point of attention. For further screening of attention focus, a method of weighting euclidean distances is proposed in the prior art, but the method is only suitable for a single object, for a plurality of object images.
Therefore, in the present embodiment, the saliency of each attention focus is compared with the threshold t, which is specifically expressed as: step 201, obtaining a plurality of saliency values in the saliency map, sorting them from strong to weak (and numbering them in that order), and selecting the points corresponding to the top 10 saliency values as candidate points. In this embodiment, the saliency is obtained as follows:
v_s(x, y) = Σ_{(u,v)} SM(u, v) · [ (1/(2πσ_c²)) exp(-((x-u)² + (y-v)²)/(2σ_c²)) - (1/(2πσ_s²)) exp(-((x-u)² + (y-v)²)/(2σ_s²)) ]
in the formula, σ_c and σ_s represent the scale factors of the center c and the periphery s respectively, and (x, y) are the coordinates of the pixel point in the saliency map.
Step 202, comparing the saliency corresponding to each candidate point with a first threshold t in sequence, wherein the candidate points whose saliency is greater than the first threshold t are the visual attention focuses.
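Steps 201 and 202 together can be sketched as picking the ten strongest points of the saliency map and keeping those above the first threshold t. The function below is an illustrative sketch, not the patent's exact procedure (for instance, it does not suppress neighbouring pixels of a single peak):

```python
import numpy as np

def candidate_foci(sal, t, n_candidates=10):
    """Select the n_candidates strongest saliency points (step 201),
    then keep those whose saliency exceeds the threshold t (step 202)."""
    flat = np.argsort(-sal.ravel())[:n_candidates]  # strongest first
    ys, xs = np.unravel_index(flat, sal.shape)
    return [(int(y), int(x)) for y, x in zip(ys, xs) if sal[y, x] > t]
```

The surviving points are the visual attention focuses used as region-growing seeds.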
In order to further improve the accuracy of using the attention focuses as seed points for region-growing segmentation, similar attention focuses are first merged, which is specifically expressed as: calculating the Euclidean distance between the visual attention focuses; if the Euclidean distance is smaller than a second threshold value d, merging the corresponding two visual attention focuses by adopting the following formula,
resulting in a merged visual attention focus (X, Y):
X = (v_{s,i}·x_i + v_{s,j}·x_j) / (v_{s,i} + v_{s,j}), Y = (v_{s,i}·y_i + v_{s,j}·y_j) / (v_{s,i} + v_{s,j})
in the formula, (x_i, y_i) and (x_j, y_j) respectively represent the coordinates of the ith and jth visual attention focuses, with i ≠ j; v_{s,i} and v_{s,j} respectively represent the gray values of the ith and jth visual attention focuses in the saliency map.
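The merging rule can be sketched as repeated pairwise merging: any two focuses closer than the distance threshold d are replaced by their gray-value-weighted mean. Rounding the merged coordinates back to integer pixels is an added assumption for illustration.

```python
import numpy as np

def merge_foci(foci, sal, d):
    """Merge attention focuses closer than d using the weighted mean
    X = (v_i*x_i + v_j*x_j)/(v_i + v_j), and likewise for Y."""
    foci = [tuple(f) for f in foci]
    merged = True
    while merged:
        merged = False
        for i in range(len(foci)):
            for j in range(i + 1, len(foci)):
                (yi, xi), (yj, xj) = foci[i], foci[j]
                if np.hypot(xi - xj, yi - yj) < d:
                    vi, vj = float(sal[yi, xi]), float(sal[yj, xj])
                    w = vi + vj + 1e-12  # guard against zero saliency
                    foci[i] = (int(round((vi * yi + vj * yj) / w)),
                               int(round((vi * xi + vj * xj) / w)))
                    del foci[j]
                    merged = True
                    break
            if merged:
                break
    return foci
```

Two equally salient focuses therefore merge to their midpoint, while a stronger focus pulls the merged seed towards itself.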
In another embodiment, an image retrieval system based on region of interest extraction for implementing the above method is disclosed, comprising:
a first module configured to construct a visual gaze model, and extract an area of interest in an original picture/original video based on the visual gaze model;
a second module configured to extract feature values from the region of interest and store them in association with the corresponding original pictures/original videos according to a preset relation to obtain a retrieval database;
a third module, configured to establish a search interface and input a search instruction; and searching out and storing original pictures/original videos meeting the requirements in a retrieval database based on the retrieval instruction to obtain a picture/video library.
Wherein the first module further comprises a fourth module connected thereto, the fourth module being arranged to: processing data of an original picture/original video, and extracting a saliency map by using a visual fixation model; obtaining at least one visual focus of attention in the saliency map using a competition mechanism; and taking the visual attention focus as a seed point for region growth segmentation, and obtaining the region of interest by a region growth method.
Claims (10)
1. An image retrieval method based on region of interest extraction is characterized by at least comprising the following steps:
constructing a visual fixation model, and extracting an interested area in an original picture/original video based on the visual fixation model;
extracting characteristic values in the region of interest, and performing relevance storage on the characteristic values and corresponding original pictures/original videos according to a preset relation to obtain a retrieval database;
establishing a retrieval interface and inputting a retrieval instruction; searching out original pictures/original videos meeting the requirements in a retrieval database based on the retrieval instruction and storing the original pictures/videos to obtain a picture/video library; the extraction process of extracting the region of interest is as follows:
firstly, processing data of an original picture/original video, and extracting a saliency map by using a visual fixation model;
step two, acquiring at least one visual attention focus in the saliency map by utilizing a competition mechanism;
and step three, taking the visual attention focus as a seed point for region growth segmentation, and obtaining the region of interest by a region growth method.
2. The image retrieval method based on region of interest extraction according to claim 1, wherein the first step specifically comprises the following steps:
step 101, filtering an original image/original video by using a multi-scale multi-channel filter, extracting visual features, and obtaining feature maps of the visual features; the feature maps at least comprise: a color feature map, a brightness feature map and a direction feature map;
102, selecting a scale according to requirements, defining the center c and the periphery s of each feature map, and respectively calculating the central periphery difference of the scale of the center c and the scale of the periphery s in the corresponding feature map; the result of the central peripheral difference is an attention map corresponding to the visual features;
step 103, normalizing the attention maps to obtain a normalized color attention map C̄, a normalized luminance attention map Ī and a normalized direction attention map Ō, respectively;
step 104, obtaining the saliency map SM by averaging the three normalized attention maps:

SM = (C̄ + Ī + Ō) / 3
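Steps 102 through 104 can be sketched as below. The box-filter local mean here stands in for proper multi-scale Gaussian filtering, and the function names and window sizes are illustrative, not taken from the patent:

```python
import numpy as np

def normalize(att):
    """Normalize an attention map to [0, 1] (step 103)."""
    rng = att.max() - att.min()
    return (att - att.min()) / rng if rng > 0 else np.zeros_like(att)

def center_surround(feature, c_size=3, s_size=9):
    """Center-periphery difference (step 102): local mean at the fine
    'center' window minus local mean at the coarse 'periphery' window."""
    def box_mean(img, k):
        pad = k // 2
        padded = np.pad(img, pad, mode='edge')
        out = np.zeros_like(img, dtype=float)
        h, w = img.shape
        for y in range(h):
            for x in range(w):
                out[y, x] = padded[y:y + k, x:x + k].mean()
        return out
    return np.abs(box_mean(feature, c_size) - box_mean(feature, s_size))

def saliency_map(color_map, lum_map, dir_map):
    """Step 104: average the three normalized attention maps into SM."""
    return (normalize(center_surround(color_map))
            + normalize(center_surround(lum_map))
            + normalize(center_surround(dir_map))) / 3.0
```

Each attention map is normalized before averaging so that no single feature channel dominates the combined saliency map.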
3. The image retrieval method based on region of interest extraction according to claim 1, wherein the second step specifically comprises the following steps:
step 201, obtaining a plurality of saliency values in the saliency map, sorting them from strong to weak, and selecting the points corresponding to the top 10 saliency values as candidate points;
step 202, comparing the saliency of each candidate point with a first threshold t in turn; the candidate points whose saliency is greater than the first threshold t are the visual attention focuses.
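Steps 201 and 202 amount to picking the strongest saliency peaks and thresholding them. A compact sketch, where `top_k` and the threshold `t` are free parameters chosen by the caller:

```python
import numpy as np

def attention_foci(sm, top_k=10, t=0.5):
    """Steps 201-202: take the top_k most salient points as candidates,
    then keep those whose saliency exceeds the first threshold t."""
    flat = sm.ravel()
    order = np.argsort(flat)[::-1][:top_k]          # strongest first
    cands = [np.unravel_index(i, sm.shape) for i in order]
    return [(y, x) for (y, x) in cands if sm[y, x] > t]
```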
4. The image retrieval method based on region of interest extraction according to claim 1, wherein the third step specifically comprises the following steps:
calculating the Euclidean distance between each pair of visual attention focuses, and if the Euclidean distance is smaller than a second threshold d, merging the two corresponding visual attention focuses with the following formula to obtain a merged visual attention focus (X, Y):

X = (v_s,i · x_i + v_s,j · x_j) / (v_s,i + v_s,j),  Y = (v_s,i · y_i + v_s,j · y_j) / (v_s,i + v_s,j)

where (x_i, y_i) and (x_j, y_j) respectively denote the coordinates of the i-th and j-th visual attention focuses, i ≠ j; v_s,i and v_s,j respectively denote the gray values of visual attention focus i and visual attention focus j in the saliency map.
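Assuming the merged focus is the gray-value-weighted centroid of the two points, which is the natural reading of claim 4 (the original formula did not survive extraction), the merge can be sketched as:

```python
import math

def merge_foci(fi, fj, v_i, v_j):
    """Gray-value-weighted merge of two attention foci: (X, Y) is the
    saliency-weighted centroid of the two points."""
    (xi, yi), (xj, yj) = fi, fj
    X = (v_i * xi + v_j * xj) / (v_i + v_j)
    Y = (v_i * yi + v_j * yj) / (v_i + v_j)
    return X, Y

def maybe_merge(fi, fj, v_i, v_j, d=20.0):
    """Merge only when the Euclidean distance is below the threshold d;
    otherwise keep the two foci separate (returns None)."""
    return merge_foci(fi, fj, v_i, v_j) if math.dist(fi, fj) < d else None
```

The weighting means a merged focus is pulled toward whichever of the two points is more salient in the saliency map.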
5. The image retrieval method based on region of interest extraction according to claim 2, wherein step 101 is further expressed as follows:
firstly, filtering the image with a Gaussian weight matrix and down-sampling to obtain an n-level Gaussian pyramid, then extracting the color, brightness and direction features at each scale σ of the pyramid to form the corresponding RG(σ), BY(σ), I(σ) and O(σ) feature pyramids, where σ ∈ [0, n−1];
wherein the color features in the color feature map are: red R = r − (g + b)/2, green G = g − (r + b)/2, blue B = b − (r + g)/2 and yellow Y = (r + g)/2 − |r − g|/2 − b;
the luminance feature in the luminance feature map is expressed as: I = (r + g + b)/3;
the direction features are four directional features formed by Gabor wavelet transformation in the four directions θ = {0°, 45°, 90°, 135°} on the basis of the luminance feature, where r, g and b are the red, green and blue components of the original image.
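A sketch of the pyramid construction and the color/luminance features of claim 5. The 2×2 box average stands in for the Gaussian weight matrix, and the opponent pairing RG = R − G, BY = B − Y is the standard reading assumed here:

```python
import numpy as np

def gaussian_pyramid(img, n=5):
    """Build an n-level pyramid by repeated blur-and-halve (a simple
    2x2 box average stands in for the Gaussian weight matrix here)."""
    levels = [img.astype(float)]
    for _ in range(n - 1):
        prev = levels[-1]
        h, w = prev.shape[0] // 2 * 2, prev.shape[1] // 2 * 2
        half = prev[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(half)
    return levels

def color_opponents(r, g, b):
    """Broadly tuned color channels and the RG/BY opponent features,
    plus the luminance feature I = (r + g + b) / 3."""
    R = r - (g + b) / 2
    G = g - (r + b) / 2
    B = b - (r + g) / 2
    Y = (r + g) / 2 - np.abs(r - g) / 2 - b
    I = (r + g + b) / 3
    return R - G, B - Y, I        # RG(σ), BY(σ), I(σ)
```

Applying `color_opponents` at every level of the pyramid yields the RG(σ), BY(σ) and I(σ) feature pyramids; the O(σ) direction pyramid would additionally require Gabor filtering of I(σ), omitted here for brevity.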
7. The image retrieval method based on region of interest extraction according to claim 3, wherein the saliency is obtained as follows:
DoG(x, y) = (1 / (2πσ_c²)) · exp(−(x² + y²) / (2σ_c²)) − (1 / (2πσ_s²)) · exp(−(x² + y²) / (2σ_s²))

where σ_c and σ_s denote the scale factors of the center c and the periphery s, respectively, and (x, y) are the coordinates of a pixel point in the saliency map.
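Claim 7's center-periphery saliency is conventionally computed with a difference-of-Gaussians over the scale factors σ_c and σ_s. A small illustrative helper, with default scale values chosen arbitrarily:

```python
import math

def dog(x, y, sigma_c=2.0, sigma_s=8.0):
    """Difference-of-Gaussians value at pixel offset (x, y), with center
    scale sigma_c and periphery (surround) scale sigma_s."""
    r2 = x * x + y * y
    center = math.exp(-r2 / (2 * sigma_c ** 2)) / (2 * math.pi * sigma_c ** 2)
    surround = math.exp(-r2 / (2 * sigma_s ** 2)) / (2 * math.pi * sigma_s ** 2)
    return center - surround
```

Near the origin the narrow center Gaussian dominates and the response is positive; far from the origin the broad periphery Gaussian dominates and the response turns negative, which is what makes the kernel respond to locally conspicuous points.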
8. The image retrieval method based on region of interest extraction according to claim 1, wherein the original pictures/original videos that meet the requirements are searched out of the retrieval database as follows:
comparing the feature values in the retrieval database with the input retrieval instruction for similarity, sorting the similarities from high to low, screening out a preset number of top-ranked feature values, and matching the corresponding original pictures/original videos based on those feature values.
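A sketch of claim 8's similarity ranking, assuming cosine similarity between feature vectors (the patent does not fix the similarity measure) and a small in-memory database of (feature vector, image id) pairs:

```python
import numpy as np

def retrieve(query, database, top_n=5):
    """Rank stored feature vectors by cosine similarity to the query,
    keep the top_n, and return the image ids linked to them.
    `database` is a list of (feature_vector, image_id) pairs."""
    def cosine(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b / (na * nb)) if na and nb else 0.0
    scored = sorted(database,
                    key=lambda item: cosine(query, item[0]),
                    reverse=True)                    # high to low
    return [image_id for _, image_id in scored[:top_n]]
```

Returning image ids rather than the images themselves matches the associative storage of claim 1: the top-ranked feature values are used to look the original pictures/videos back up.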
9. An image retrieval system based on region of interest extraction for implementing the image retrieval method according to any one of claims 1 to 8, comprising:
a first module, configured to construct a visual fixation model and extract a region of interest in an original picture/original video based on the visual fixation model;
a second module, configured to extract feature values in the region of interest and store the feature values in association with the corresponding original pictures/original videos according to a preset relation to obtain a retrieval database;
a third module, configured to establish a retrieval interface and receive a retrieval instruction, search the retrieval database for the original pictures/original videos that meet the requirements based on the retrieval instruction, and store them to obtain a picture/video library.
10. The image retrieval system based on region of interest extraction according to claim 9, wherein the first module is further connected to a fourth module, the fourth module configured to: process the data of the original picture/original video and extract a saliency map using the visual fixation model; obtain at least one visual attention focus in the saliency map by means of a competition mechanism; and take each visual attention focus as a seed point for region-growing segmentation, obtaining the region of interest by the region growing method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210575033.3A CN114860979A (en) | 2022-05-24 | 2022-05-24 | Image retrieval method and system based on region of interest extraction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114860979A true CN114860979A (en) | 2022-08-05 |
Family
ID=82638754
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20220805 |