CN112418109A - Image processing method and device - Google Patents

Image processing method and device

Info

Publication number
CN112418109A
Authority
CN
China
Prior art keywords
frame
video
change
area
image
Prior art date
Legal status
Granted
Application number
CN202011344784.1A
Other languages
Chinese (zh)
Other versions
CN112418109B (en)
Inventor
周平红
李全林
诸炎
Current Assignee
Zhongshan Hospital Fudan University
Original Assignee
Zhongshan Hospital Fudan University
Priority date
Filing date
Publication date
Application filed by Zhongshan Hospital Fudan University filed Critical Zhongshan Hospital Fudan University
Priority to CN202011344784.1A priority Critical patent/CN112418109B/en
Publication of CN112418109A publication Critical patent/CN112418109A/en
Application granted granted Critical
Publication of CN112418109B publication Critical patent/CN112418109B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Studio Circuits (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an image processing method. Another aspect of the invention provides an image processing apparatus, including an identification unit, a first determination unit, and an intercepting unit. According to the method, the collected images are processed using deep learning and computer vision technology to obtain the change areas in the images; the change areas are further analyzed, and the target area is obtained by calculation. The target area can be regarded as the area of the image containing effective content, so the collected images can be intercepted on the basis of the target area. The content of the resulting images to be processed is thus effective content, processing of ineffective image content is avoided, and image processing efficiency is improved.

Description

Image processing method and device
Technical Field
The present invention relates to an image processing method and an image processing apparatus.
Background
Existing image processing methods mainly process the whole image, but the target object generally appears only in a partial area of the image. If the whole image is identified, a large number of non-target areas are processed, occupying a large amount of memory and computing resources and causing waste. In addition, because multiple light sources may be present while images are shot or collected, different light sources may strongly influence the collected image data. If image data collected under different light sources are identified and processed in a uniform manner, the identification of the target object may be adversely affected, and image data collected under certain types of light sources may not be identified accurately.
Disclosure of Invention
The purpose of the invention is to improve on the current situation in which redundant image areas are processed, saving memory and computing resources and improving image processing efficiency.
In order to achieve the above object, an aspect of the present invention provides an image processing method, including:
identifying each frame of video image in a video to be processed to obtain change areas respectively contained in each frame of video image;
determining boundary information of a target area according to variation areas respectively contained in multiple frames of video images in a video to be processed;
and respectively intercepting each frame of video image in the video to be processed based on the boundary information to obtain the image to be processed contained in each frame of video image.
Preferably, before each frame of video image in the video to be processed is identified and the change regions respectively contained in each frame of video image are obtained, the method further includes the following steps:
performing convolution operation on each frame of video image in the original video based on a preset convolution kernel to obtain a result image corresponding to each frame of video image, wherein the preset convolution kernel is determined based on a second-order differential convolution kernel;
and combining the result image to generate the video to be processed.
Preferably, identifying each frame of video image in the video to be processed to obtain the change areas respectively included in each frame of video image, includes:
calculating each frame of video image in a video to be processed through a proximity algorithm to obtain a binarization mask video frame corresponding to each frame of video image;
performing Gaussian blur on the binary mask video frame to obtain a Gaussian mask corresponding to the binary mask video frame;
calculating the Gaussian mask to obtain a content mask matrix corresponding to the Gaussian mask;
and determining the change areas respectively contained in each frame of video image from the content mask matrix.
Preferably, the operating the gaussian mask to obtain the content mask matrix corresponding to the gaussian mask includes:
carrying out corrosion operation on the Gaussian mask to obtain a corrosion mask corresponding to the Gaussian mask;
and performing expansion operation on the corrosion mask to obtain the content mask matrix corresponding to the corrosion mask.
Preferably, determining the variation regions respectively contained in each frame of the video image from the content mask matrix includes:
calculating the content mask matrix to obtain a row coordinate of a starting point of an enclosing frame, a column coordinate of the starting point and the width and height of the enclosing frame;
and respectively determining the change areas corresponding to the enclosing frames in each frame of video image according to the line coordinates, the column coordinates and the width and the height of the enclosing frames.
Preferably, after the changed regions corresponding to the bounding boxes are respectively determined in each frame of the video image, the method further includes the following steps:
determining bounding box information corresponding to the change area of each frame of video image, wherein the bounding box information at least comprises a row coordinate of a starting point of the bounding box, a column coordinate of the starting point and the width and height of the bounding box;
storing the bounding box information corresponding to the change area to a fixed-length cache;
determining the boundary information of the target area according to the change areas respectively contained in the multiple frames of video images in the video to be processed, including:
reading the bounding box information of the change area respectively contained in a plurality of frames of the video images in the video to be processed from a fixed-length cache;
and determining the boundary information of the target area according to the bounding box information of the change area respectively contained in the multiple frames of video images.
Preferably, determining boundary information of a target area according to the bounding box information of the change area respectively included in the plurality of frames of video images includes:
respectively calculating the area of each change region according to the width and the height of the surrounding frame in the surrounding frame information of the change region respectively contained in the multi-frame video image;
deleting the change region with the area of zero from the plurality of change regions to obtain a candidate change region;
determining a first change region with the largest area and a second change region with the smallest area from the candidate change regions;
deleting the first change area and the second change area from the candidate change area to obtain a change area to be calculated;
calculating the area of the change area to be calculated to obtain a fitting curve;
determining a maximum area value from the fitted curve;
determining the boundary information of the target region according to the maximum area value.
Another aspect of the present invention provides an image processing apparatus, including:
the identification unit is used for identifying each frame of video image in the video to be processed to obtain the change areas respectively contained in each frame of video image;
the first determining unit is used for determining the boundary information of the target area according to the change areas respectively contained in the multi-frame video images in the video to be processed;
and the intercepting unit is used for respectively intercepting each frame of video image in the video to be processed based on the boundary information to obtain the image to be processed contained in each frame of video image.
Preferably, the apparatus further comprises:
the convolution unit is used for carrying out convolution operation on each frame of video image in the video based on a preset convolution kernel before the identification unit identifies any frame of video image in the video to be processed to obtain a change area contained in any frame of video image, so as to obtain a result image corresponding to each frame of video image, wherein the preset convolution kernel is determined by taking a second-order differential convolution kernel as a basis;
and the generating unit is used for generating the video to be processed by combining the result image.
Preferably, the identification unit includes:
the calculating subunit is used for calculating each frame of video image in the video to be processed through a proximity algorithm to obtain a binary mask video frame corresponding to each frame of video image;
the Gaussian blur subunit is used for carrying out Gaussian blur on the binary mask video frame to obtain a Gaussian mask corresponding to the binary mask video frame;
the operation subunit is used for operating the Gaussian mask to obtain a content mask matrix corresponding to the Gaussian mask;
and the first determining subunit is used for determining the change areas respectively contained in each frame of video image from the content mask matrix.
Preferably, the operation subunit includes:
the corrosion operation module is used for carrying out corrosion operation on the Gaussian mask to obtain a corrosion mask corresponding to the Gaussian mask;
and the expansion operation module is used for performing expansion operation on the corrosion mask to obtain the content mask matrix corresponding to the corrosion mask.
Preferably, the first determining subunit includes:
the matrix operation module is used for operating the content mask matrix to obtain a row coordinate of a starting point of the surrounding frame, a column coordinate of the starting point and the width and height of the surrounding frame;
and the first determining module is used for respectively determining the change areas corresponding to the surrounding frames in each frame of video image according to the row coordinates, the column coordinates and the width and the height of the surrounding frames.
Preferably, the apparatus further comprises:
a second determining unit, configured to determine bounding box information corresponding to the change area of each frame of the video image after the first determining module determines the change area corresponding to the bounding box in each frame of the video image, where the bounding box information at least includes a row coordinate of a start point of the bounding box, a column coordinate of the start point, and a width and a height of the bounding box;
the storage unit is used for storing the bounding box information corresponding to the change area to a fixed-length cache;
the first determination unit includes:
the reading subunit is configured to read bounding box information of the change areas respectively and correspondingly included in the multiple frames of video images in the video to be processed from the fixed-length cache;
and the second determining subunit is used for determining the boundary information of the target area according to the bounding box information of the change areas respectively contained in the multiple frames of video images.
Preferably, the second determining subunit includes:
the calculation module is used for respectively calculating the area of each change area according to the width and the height of an enclosure frame in enclosure frame information of the change area respectively contained in the multi-frame video image;
a deleting module, configured to delete a change region with an area of zero from the plurality of change regions, so as to obtain a candidate change region;
a second determining module, configured to determine a first variation region with a largest area and a second variation region with a smallest area from the candidate variation regions;
the deleting module is further used for deleting the first change area and the second change area from the candidate change areas to obtain a change area to be calculated;
the area operation module is used for operating the area of the change area to be calculated to obtain a fitting curve;
the second determination module is further configured to determine a maximum area value from the fitted curve;
the second determining module is further configured to determine boundary information of the target region according to the maximum area value.
According to the method, the collected images are processed using deep learning and computer vision technology to obtain the change areas in the images; the change areas are further analyzed, and the target area is obtained by calculation. The target area can be regarded as the area of the image containing effective content, so the collected images can be intercepted on the basis of the target area. The content of the images to be processed is thus effective content, processing of ineffective image content is avoided, and image processing efficiency is improved.
Drawings
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of an image processing method according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of generating a fitting curve according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention may be made by those skilled in the art after reading the teaching of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
The flow 100 of the image processing method according to the exemplary embodiment of the present invention includes the steps of:
step S110, identifying each frame of image in the video to be processed to obtain the change areas respectively contained in each frame of image;
step S120, determining boundary information of a target area according to the change areas respectively contained in the multi-frame images in the video to be processed;
and step S130, respectively intercepting each frame of image in the video to be processed based on the boundary information to obtain the image to be processed contained in each frame of image.
The image processing method proposed in the present invention is directed to processing, based on image recognition technology, images captured by devices such as endoscopes or probes; its usage scenarios include, but are not limited to, such devices.
According to the technical scheme, the boundary information of the target area to be cut can be determined from the change areas contained in each frame of the obtained image, and each frame of image in the video to be processed is intercepted based on this boundary information to obtain the final image to be processed. The determined image to be processed therefore contains as few areas as possible that do not need processing, saving memory and computing resources and improving image processing efficiency.
How to save memory and computational resources and improve the efficiency of image processing is described below with reference to the accompanying drawings:
the video to be processed may be acquired by an endoscope and a probe image acquisition device, which is not limited in this embodiment. The change area may be an area where a color of the image changes, and each frame of the image may or may not include the change area. When the image contains the change region, the area of the change region is not zero; when the image does not include the change region, the area of the change region may be considered to be zero.
In addition, the multiple frames of images in the video to be processed may be every frame of the video to be processed, or a subset of frames selected from the video according to a preset selection manner; the preset selection manner may select consecutive frames from the video, or frames sampled at intervals from the video, and is not limited in this embodiment.
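For illustration, assuming the decoded frames are held in a Python list named frames (a hypothetical name), the two preset selection manners mentioned above could be sketched as follows, with i and k as hypothetical parameters:

    # consecutive multi-frame selection: k frames starting at index i
    consecutive = frames[i:i + k]

    # interval sampling: every k-th frame of the video to be processed
    sampled = frames[::k]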
Referring to fig. 2, fig. 2 is a schematic flowchart of an image processing method according to another embodiment of the present invention; the flow 200 shown in fig. 2 includes the following steps:
step S210, performing convolution operation on each frame of video image in the video based on a preset convolution kernel to obtain a result image corresponding to each frame of video image, wherein the preset convolution kernel is determined based on a second-order differential convolution kernel;
step S220, combining the result image to generate a video to be processed;
by implementing steps S210 to S220, the collected video can be denoised, and the denoising effect of the above method is significant.
The preset convolution kernel may be set with reference to the Laplace second-order differential convolution kernel [[1,1,1],[1,-8,1],[1,1,1]]; for example, the preset convolution kernel may be [[1,1,1],[1,-9,1],[1,1,1]] or [[1,1,1],[1,-10,1],[1,1,1]]. An operator may test different parameters, select the parameter with the best convolution effect, and determine the preset convolution kernel accordingly.
In addition, before each frame of image in the video is processed, each frame can be preprocessed. Specifically, dimension reduction can be performed on each frame of image in the video: the three-channel pixel space R, G, B (or B, G, R), with R, G, B ∈ (0, 255), can be converted, based on the preset convolution kernel, into the single-channel space G, G ∈ (0, 255), so that pixel differences are reduced and feature value calculation is facilitated.
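As a minimal sketch only, the preprocessing and convolution of steps S210 to S220 could look as follows, assuming OpenCV and NumPy; the function name preprocess_frame is illustrative, and the centre weight -9 is merely one of the example kernels given above:

    import cv2
    import numpy as np

    # Example preset convolution kernel: a variant of the Laplace
    # second-order differential kernel [[1,1,1],[1,-8,1],[1,1,1]]
    # with an adjusted centre weight, as described in the text.
    PRESET_KERNEL = np.array([[1, 1, 1],
                              [1, -9, 1],
                              [1, 1, 1]], dtype=np.float32)

    def preprocess_frame(frame_bgr):
        # Dimension reduction: B, G, R pixel space -> single-channel G
        gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
        # Convolve the frame with the preset kernel to obtain the result image
        return cv2.filter2D(gray, -1, PRESET_KERNEL)

Applying preprocess_frame to every frame and recombining the result images would yield the video to be processed.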
Step S230, calculating each frame of image in the video to be processed through a proximity algorithm to obtain a binarization mask video frame corresponding to each frame of image;
step S240, carrying out Gaussian blur on the binary mask video frame to obtain a Gaussian mask corresponding to the binary mask video frame;
step S250, the Gaussian mask is operated to obtain a content mask matrix corresponding to the Gaussian mask;
step S260, determining the change areas respectively contained in each frame of image from the content mask matrix;
by implementing the steps S230 to S260, each frame of image in the video may be subjected to dimensionality reduction to simplify the subsequent data processing process, and then each frame of image may be subjected to noise reduction processing to improve the accuracy of image processing.
The change region of each frame of image in the video to be processed can be calculated through a k-nearest neighbor (kNN) algorithm to obtain the binarization mask video frame corresponding to the image, and the binarization mask video frame can then be subjected to Gaussian blurring to reduce image noise and the level of detail in the image.
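A minimal sketch of steps S230 to S240, assuming the proximity algorithm is realized with OpenCV's kNN background subtractor and that a 5x5 Gaussian kernel is acceptable; both choices are assumptions for illustration, not requirements of the method:

    import cv2

    # A single subtractor is kept across frames so that the kNN
    # background model accumulates inter-frame changes.
    knn = cv2.createBackgroundSubtractorKNN(detectShadows=False)

    def gaussian_mask(frame):
        binary_mask = knn.apply(frame)                   # binarization mask video frame
        return cv2.GaussianBlur(binary_mask, (5, 5), 0)  # Gaussian mask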
As an optional implementation manner, in step S250, the operation performed on the Gaussian mask to obtain the content mask matrix corresponding to the Gaussian mask may specifically include the following steps:
carrying out corrosion operation on the Gaussian mask to obtain a corrosion mask corresponding to the Gaussian mask;
and performing expansion operation on the corrosion mask to obtain a content mask matrix corresponding to the corrosion mask.
Therefore, by implementing the above embodiment, the noise reduction effect on the Gaussian mask can be enhanced by performing the erosion operation first and then the expansion operation, so that the data in the final content mask matrix is more effective.
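Under the same OpenCV assumption, the corrosion-then-expansion sequence corresponds to cv2.erode followed by cv2.dilate (together, a morphological opening); the 3x3 structuring element below is an assumed example:

    import cv2
    import numpy as np

    KERNEL = np.ones((3, 3), np.uint8)  # assumed structuring element

    def content_mask_matrix(g_mask):
        eroded = cv2.erode(g_mask, KERNEL, iterations=1)  # corrosion mask
        return cv2.dilate(eroded, KERNEL, iterations=1)   # content mask matrix

The same pair of operations could equally be expressed as cv2.morphologyEx(g_mask, cv2.MORPH_OPEN, KERNEL).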
As an optional implementation manner, in step S260, the manner of determining the changed regions respectively included in each frame of image from the content mask matrix may specifically include the following steps:
calculating the content mask matrix to obtain a row coordinate of a starting point of the bounding box, a column coordinate of the starting point and the width and height of the bounding box;
and respectively determining the change areas corresponding to the surrounding frames in each frame of image according to the line coordinates, the column coordinates and the width and the height of the surrounding frames.
As can be seen, by implementing the above embodiment and performing an operation on the content mask matrix, a region with a higher data-processing value can be obtained in each frame of image, and the bounding box data corresponding to that region can be determined, so that the image obtained within the bounding box carries a larger amount of information.
The content mask matrix may be represented as M, and processing M may obtain the boundary of the change area:

    M = [ a_11  a_12  ...  a_1m ]
        [ a_21  a_22  ...  a_2m ]
        [  ...   ...  ...   ... ]
        [ a_n1  a_n2  ...  a_nm ]

where a_ij is a pixel channel value of 0 or 1, i = 1, 2, ..., n, j = 1, 2, ..., m; the n rows and m columns of values a_ij combine into the content mask matrix M. The rows and columns of M can then be traversed from four directions, as follows:

The matrix is split by rows to form the row vectors a_i of the content mask matrix M:

    a_i = (a_i1, a_i2, ..., a_im), i = 1, 2, ..., n

The row vectors are traversed sequentially from 1 to n to obtain the maximum value of each row:

    y_i = max(a_i)

y_i denotes the maximum value of the i-th row. If y_i > 0, the starting coordinate y of the vertical axis of the bounding box is returned and the traversal terminates, where y = i.

The row vectors are then traversed sequentially from n to 1 to obtain the maximum value of each row:

    h_i = max(a_i)

h_i denotes the maximum value of the i-th row. If h_i > 0, the height h of the bounding box is returned and the traversal terminates, where h = i.

The matrix is split by columns to form the column vectors a_j of the content mask matrix M:

    a_j = (a_1j, a_2j, ..., a_nj)

The column vectors are traversed sequentially from 1 to m to obtain the maximum value of each column:

    x_j = max(a_j)

x_j denotes the maximum value of the j-th column. If x_j > 0, the starting coordinate x of the horizontal axis of the bounding box is returned and the traversal terminates, where x = j.

The column vectors are then traversed with j from m to 1 to obtain the maximum value of each column:

    w_j = max(a_j)

w_j denotes the maximum value of the j-th column. If w_j > 0, the width w of the bounding box is returned and the traversal terminates, where w = j.

Finally, the row coordinate x of the starting point of the bounding box, the column coordinate y of the starting point, the width w of the bounding box, and the height h of the bounding box are obtained, forming the bounding box tuple BBox = (x, y, w, h); BBox is the change area corresponding to the bounding box in each frame of image.
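A sketch of this four-direction traversal, assuming M is a 0/1 NumPy array; the code is 0-indexed where the text is 1-indexed, and, read literally, w and h are the last content column and row indices (the pixel width and height would then be w - x + 1 and h - y + 1):

    import numpy as np

    def bounding_box(M):
        n, m = M.shape
        # first and last rows containing content (vertical start y, height h)
        y = next((i for i in range(n) if M[i, :].max() > 0), None)
        h = next((i for i in range(n - 1, -1, -1) if M[i, :].max() > 0), None)
        # first and last columns containing content (horizontal start x, width w)
        x = next((j for j in range(m) if M[:, j].max() > 0), None)
        w = next((j for j in range(m - 1, -1, -1) if M[:, j].max() > 0), None)
        if y is None:
            return None  # all-zero mask: no change area in this frame
        return (x, y, w, h)  # BBox tuple as in the text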
Optionally, after determining the change areas corresponding to the bounding boxes in each frame of image, the following steps may be further performed:
determining bounding box information corresponding to the change area of each frame of image, wherein the bounding box information at least comprises row coordinates of a starting point of the bounding box, column coordinates of the starting point and the width and height of the bounding box;
and storing the bounding box information corresponding to the change area to a fixed-length cache.
By implementing this embodiment, the bounding box data can be identified and thereby quantized, and the quantized bounding box information stored so that it can be read whenever it is needed.
At every preset time interval (the preset time interval may be a time interval predetermined by an operator or by the system, or a preset video playback duration, etc.), the BBox can be stored in the fixed-length cache, and the bounding box area, formed by the product of the width and height of each BBox, can be calculated. For example, the cache length may be 30, with the horizontal axis representing the insertion timing and the vertical axis representing the area of each BBox.
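A fixed-length cache of this kind could be sketched with collections.deque; the length of 30 is the example from the text, and the names are illustrative:

    from collections import deque

    CACHE_LEN = 30                        # example cache length from the text
    bbox_cache = deque(maxlen=CACHE_LEN)  # the oldest BBox falls out automatically

    def store_bbox(bbox):
        bbox_cache.append(bbox)           # insertion order = horizontal axis

    def bbox_areas():
        # area of each BBox = width * height (vertical axis)
        return [w * h for (x, y, w, h) in bbox_cache]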
Step S270, determining boundary information of a target area according to the change areas respectively contained in the multi-frame images in the video to be processed;
optionally, in step S270, determining the boundary information of the target area according to the change areas respectively included in the multiple frames of images in the video to be processed may specifically include the following steps:
reading bounding box information of a change area respectively contained in a plurality of frames of images in a video to be processed from a fixed-length cache;
and determining the boundary information of the target area according to the bounding box information of the change area respectively contained in the multi-frame images.
By implementing this embodiment, the information of the plurality of bounding boxes can be processed and analyzed to obtain valid bounding box data for the multiple frames of images, and the target area usable across all images is then obtained by calculation from the valid bounding box data, ensuring that the image content contained in the target area is more effective.
Further, the method for determining the boundary information of the target area according to the bounding box information of the variation areas respectively contained in the multi-frame images may include the following steps:
respectively calculating the area of each change area according to the width and the height of an enclosing frame in the enclosing frame information of the change area respectively contained in the multi-frame image;
deleting the change area with the area of zero from the plurality of change areas to obtain a candidate change area;
determining a first change region with the largest area and a second change region with the smallest area from the candidate change regions;
deleting the first change area and the second change area from the candidate change area to obtain a change area to be calculated;
calculating the area of the change area to be calculated to obtain a fitting curve;
determining a maximum area value from the fitted curve;
and determining the boundary information of the target area according to the maximum area value.
By implementing this embodiment, invalid data can be deleted from the plurality of bounding box data, the remaining valid bounding box data can then be fitted, and the most suitable data can be determined from the fitting result as the boundary information of the target area; the target area is thus determined with improved accuracy.
The BBoxes corresponding to change areas with an area of 0 can be deleted from the plurality of change areas to reduce deviation in the data processing result, and the BBoxes corresponding to the candidate change areas with the largest and the smallest areas among the remaining candidates can then be deleted to obtain the final BBoxes of the change areas to be calculated, so that the result computed from them is more accurate.
In addition, referring to fig. 3, fig. 3 is a schematic diagram of generating a fitting curve according to an embodiment of the present invention, where X represents the insertion timing, Y represents the area of each BBox, and a is the final BBox sequence of the change regions to be calculated. After second-order fitting is performed on a, a fitting curve b can be obtained; the fitting curve b can then be traversed to obtain the maximum area value c, and the BBox' corresponding to the maximum area value c is determined, so that the boundary information of the target region can be determined according to BBox'.
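A sketch of the filtering and fitting steps, assuming numpy.polyfit/polyval as the second-order fitting routine and that enough samples remain after filtering; how ties on the largest and smallest areas are broken is also an assumption:

    import numpy as np

    def target_bbox(bboxes):
        # 1. delete change regions with zero area -> candidate change regions
        cand = [b for b in bboxes if b[2] * b[3] > 0]
        areas = [b[2] * b[3] for b in cand]
        # 2. delete the largest- and smallest-area candidate regions
        if len(cand) > 2:
            i_max, i_min = areas.index(max(areas)), areas.index(min(areas))
            cand = [b for k, b in enumerate(cand) if k not in (i_max, i_min)]
            areas = [b[2] * b[3] for b in cand]
        # 3. second-order fit of area against insertion timing -> fitting curve b
        t = np.arange(len(areas))
        fitted = np.polyval(np.polyfit(t, areas, 2), t)
        # 4. maximum area value c on the fitted curve -> BBox'
        return cand[int(fitted.argmax())]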
Step S280, each frame of image in the video to be processed is respectively intercepted based on the boundary information, and the image to be processed contained in each frame of image is obtained.
According to the method and the device, the boundary information of the target area to be cut can be determined from the change areas contained in each frame of the obtained image, and each frame of image in the video to be processed is intercepted based on this boundary information to obtain the final image to be processed. The determined image to be processed therefore contains as few areas as possible that do not need processing, saving memory and computing resources and improving image processing efficiency.
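Finally, the interception step itself is a simple crop; the sketch below assumes the (x, y, w, h) semantics of the traversal sketch above, with w and h as the right and bottom coordinates:

    def intercept_frames(frames, boundary):
        x, y, w, h = boundary
        # crop every frame of the video to the target area
        return [frame[y:h + 1, x:w + 1] for frame in frames]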

Claims (14)

1. An image processing method, characterized by comprising the steps of:
identifying each frame of video image in a video to be processed to obtain change areas respectively contained in each frame of video image;
determining boundary information of a target area according to variation areas respectively contained in multiple frames of video images in a video to be processed;
and respectively intercepting each frame of video image in the video to be processed based on the boundary information to obtain the image to be processed contained in each frame of video image.
2. The image processing method according to claim 1, wherein before identifying each frame of video image in the video to be processed and obtaining the changed regions respectively contained in each frame of video image, the method further comprises the following steps:
performing convolution operation on each frame of video image in the original video based on a preset convolution kernel to obtain a result image corresponding to each frame of video image, wherein the preset convolution kernel is determined based on a second-order differential convolution kernel;
and combining the result image to generate the video to be processed.
3. The image processing method according to claim 1 or 2, wherein identifying each frame of video image in the video to be processed to obtain the change areas respectively contained in each frame of video image comprises:
calculating each frame of video image in a video to be processed through a proximity algorithm to obtain a binarization mask video frame corresponding to each frame of video image;
performing Gaussian blur on the binary mask video frame to obtain a Gaussian mask corresponding to the binary mask video frame;
calculating the Gaussian mask to obtain a content mask matrix corresponding to the Gaussian mask;
and determining the change areas respectively contained in each frame of video image from the content mask matrix.
4. The image processing method according to claim 3, wherein the performing an operation on the gaussian mask to obtain the content mask matrix corresponding to the gaussian mask comprises:
carrying out corrosion operation on the Gaussian mask to obtain a corrosion mask corresponding to the Gaussian mask;
and performing expansion operation on the corrosion mask to obtain the content mask matrix corresponding to the corrosion mask.
5. The method according to claim 4, wherein determining the changed regions respectively contained in the video images of each frame from the content mask matrix comprises:
calculating the content mask matrix to obtain a row coordinate of a starting point of an enclosing frame, a column coordinate of the starting point and the width and height of the enclosing frame;
and respectively determining the change areas corresponding to the enclosing frames in each frame of video image according to the line coordinates, the column coordinates and the width and the height of the enclosing frames.
6. The image processing method according to claim 5, further comprising the following steps after determining the changed regions corresponding to the bounding boxes in the video images of each frame respectively:
determining bounding box information corresponding to the change area of each frame of video image, wherein the bounding box information at least comprises a row coordinate of a starting point of the bounding box, a column coordinate of the starting point and the width and height of the bounding box;
storing the bounding box information corresponding to the change area to a fixed-length cache;
determining the boundary information of the target area according to the change areas respectively contained in the multiple frames of video images in the video to be processed, including:
reading the bounding box information of the change area respectively contained in a plurality of frames of the video images in the video to be processed from a fixed-length cache;
and determining the boundary information of the target area according to the bounding box information of the change area respectively contained in the multiple frames of video images.
7. The image processing method according to claim 6, wherein determining boundary information of a target area based on the bounding box information of the changed area respectively included in the plurality of frames of video images comprises:
respectively calculating the area of each change region according to the width and the height of the surrounding frame in the surrounding frame information of the change region respectively contained in the multi-frame video image;
deleting the change region with the area of zero from the plurality of change regions to obtain a candidate change region;
determining a first change region with the largest area and a second change region with the smallest area from the candidate change regions;
deleting the first change area and the second change area from the candidate change area to obtain a change area to be calculated;
calculating the area of the change area to be calculated to obtain a fitting curve;
determining a maximum area value from the fitted curve;
determining the boundary information of the target region according to the maximum area value.
8. An image processing apparatus characterized by comprising:
the identification unit is used for identifying each frame of video image in the video to be processed to obtain the change areas respectively contained in each frame of video image;
the first determining unit is used for determining the boundary information of the target area according to the change areas respectively contained in the multi-frame video images in the video to be processed;
and the intercepting unit is used for respectively intercepting each frame of video image in the video to be processed based on the boundary information to obtain the image to be processed contained in each frame of video image.
9. The image processing apparatus according to claim 8, characterized in that the apparatus further comprises:
the convolution unit is used for carrying out convolution operation on each frame of video image in the video based on a preset convolution kernel before the identification unit identifies any frame of video image in the video to be processed to obtain a change area contained in any frame of video image, so as to obtain a result image corresponding to each frame of video image, wherein the preset convolution kernel is determined by taking a second-order differential convolution kernel as a basis;
and the generating unit is used for generating the video to be processed by combining the result image.
10. The image processing apparatus according to claim 8 or 9, wherein the identifying unit includes:
the calculating subunit is used for calculating each frame of video image in the video to be processed through a proximity algorithm to obtain a binary mask video frame corresponding to each frame of video image;
the Gaussian blur subunit is used for carrying out Gaussian blur on the binary mask video frame to obtain a Gaussian mask corresponding to the binary mask video frame;
the operation subunit is used for operating the Gaussian mask to obtain a content mask matrix corresponding to the Gaussian mask;
and the first determining subunit is used for determining the change areas respectively contained in each frame of video image from the content mask matrix.
11. The image processing apparatus according to claim 10, wherein the operation subunit includes:
the corrosion operation module is used for carrying out corrosion operation on the Gaussian mask to obtain a corrosion mask corresponding to the Gaussian mask;
and the expansion operation module is used for performing expansion operation on the corrosion mask to obtain the content mask matrix corresponding to the corrosion mask.
12. The image processing apparatus according to claim 11, wherein the first determining subunit includes:
the matrix operation module is used for operating the content mask matrix to obtain a row coordinate of a starting point of the surrounding frame, a column coordinate of the starting point and the width and height of the surrounding frame;
and the first determining module is used for respectively determining the change areas corresponding to the surrounding frames in each frame of video image according to the row coordinates, the column coordinates and the width and the height of the surrounding frames.
13. The image processing apparatus according to claim 12, characterized in that the apparatus further comprises:
a second determining unit, configured to determine bounding box information corresponding to the change area of each frame of the video image after the first determining module determines the change area corresponding to the bounding box in each frame of the video image, where the bounding box information at least includes a row coordinate of a start point of the bounding box, a column coordinate of the start point, and a width and a height of the bounding box;
the storage unit is used for storing the bounding box information corresponding to the change area to a fixed-length cache;
the first determination unit includes:
the reading subunit is configured to read bounding box information of the change areas respectively and correspondingly included in the multiple frames of video images in the video to be processed from the fixed-length cache;
and the second determining subunit is used for determining the boundary information of the target area according to the bounding box information of the change areas respectively contained in the multiple frames of video images.
14. The image processing apparatus according to claim 13, wherein the second determining subunit includes:
the calculation module is used for respectively calculating the area of each change area according to the width and the height of an enclosure frame in enclosure frame information of the change area respectively contained in the multi-frame video image;
a deleting module, configured to delete a change region with an area of zero from the plurality of change regions, so as to obtain a candidate change region;
a second determining module, configured to determine a first variation region with a largest area and a second variation region with a smallest area from the candidate variation regions;
the deleting module is further used for deleting the first change area and the second change area from the candidate change areas to obtain a change area to be calculated;
the area operation module is used for operating the area of the change area to be calculated to obtain a fitting curve;
the second determination module is further configured to determine a maximum area value from the fitted curve;
the second determining module is further configured to determine boundary information of the target region according to the maximum area value.
CN202011344784.1A 2020-11-26 2020-11-26 Image processing method and device Active CN112418109B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011344784.1A CN112418109B (en) 2020-11-26 2020-11-26 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011344784.1A CN112418109B (en) 2020-11-26 2020-11-26 Image processing method and device

Publications (2)

Publication Number Publication Date
CN112418109A true CN112418109A (en) 2021-02-26
CN112418109B CN112418109B (en) 2024-05-14

Family

ID=74842474

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011344784.1A Active CN112418109B (en) 2020-11-26 2020-11-26 Image processing method and device

Country Status (1)

Country Link
CN (1) CN112418109B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153472A (en) * 2023-02-24 2023-05-23 萱闱(北京)生物科技有限公司 Image multidimensional visualization method, device, medium and computing equipment

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101371274A (en) * 2005-12-30 2009-02-18 意大利电信股份公司 Edge comparison in video sequence partition
CN101465973A (en) * 2008-11-04 2009-06-24 新奥特(北京)视频技术有限公司 Method for rendering subtitling based on curved profile closed loop domain and pixel mask matrix
CN101764936A (en) * 2008-11-04 2010-06-30 新奥特(北京)视频技术有限公司 Method for confirming shortest distance of pixel space mask code matrix from pixel to boundary
CN103986910A (en) * 2014-05-20 2014-08-13 中国科学院自动化研究所 Method and system for passenger flow statistics based on cameras with intelligent analysis function
CN106682669A (en) * 2016-12-15 2017-05-17 深圳市华尊科技股份有限公司 Image processing method and mobile terminal
CN106709436A (en) * 2016-12-08 2017-05-24 华中师范大学 Cross-camera suspicious pedestrian target tracking system for rail transit panoramic monitoring
CN106874864A (en) * 2017-02-09 2017-06-20 广州中国科学院软件应用技术研究所 A kind of outdoor pedestrian's real-time detection method
CN109977824A (en) * 2019-03-15 2019-07-05 百度在线网络技术(北京)有限公司 Article picks and places recognition methods, device and equipment
CN110610198A (en) * 2019-08-22 2019-12-24 浙江工业大学 Mask RCNN-based automatic oral CBCT image mandibular neural tube identification method
CN111160125A (en) * 2019-12-11 2020-05-15 北京交通大学 Railway foreign matter intrusion detection method based on railway monitoring
US20200202534A1 (en) * 2018-12-20 2020-06-25 Reflexion Health Inc. Machine learning feature vector generator using depth image foreground attributes
CN111695554A (en) * 2020-06-09 2020-09-22 广东小天才科技有限公司 Text correction method and device, electronic equipment and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101371274A (en) * 2005-12-30 2009-02-18 意大利电信股份公司 Edge comparison in video sequence partition
CN101465973A (en) * 2008-11-04 2009-06-24 新奥特(北京)视频技术有限公司 Method for rendering subtitling based on curved profile closed loop domain and pixel mask matrix
CN101764936A (en) * 2008-11-04 2010-06-30 新奥特(北京)视频技术有限公司 Method for confirming shortest distance of pixel space mask code matrix from pixel to boundary
CN103986910A (en) * 2014-05-20 2014-08-13 中国科学院自动化研究所 Method and system for passenger flow statistics based on cameras with intelligent analysis function
CN106709436A (en) * 2016-12-08 2017-05-24 华中师范大学 Cross-camera suspicious pedestrian target tracking system for rail transit panoramic monitoring
CN106682669A (en) * 2016-12-15 2017-05-17 深圳市华尊科技股份有限公司 Image processing method and mobile terminal
CN106874864A (en) * 2017-02-09 2017-06-20 广州中国科学院软件应用技术研究所 A kind of outdoor pedestrian's real-time detection method
US20200202534A1 (en) * 2018-12-20 2020-06-25 Reflexion Health Inc. Machine learning feature vector generator using depth image foreground attributes
CN109977824A (en) * 2019-03-15 2019-07-05 百度在线网络技术(北京)有限公司 Article picks and places recognition methods, device and equipment
CN110610198A (en) * 2019-08-22 2019-12-24 浙江工业大学 Mask RCNN-based automatic oral CBCT image mandibular neural tube identification method
CN111160125A (en) * 2019-12-11 2020-05-15 北京交通大学 Railway foreign matter intrusion detection method based on railway monitoring
CN111695554A (en) * 2020-06-09 2020-09-22 广东小天才科技有限公司 Text correction method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵克华 (Zhao Kehua): "Research on a dynamic monitoring model of river channel water areas based on UAV orthophoto imagery: a case study of Yinzhou District, Ningbo, Zhejiang Province", Water Resources and Hydropower Engineering (水利水电技术), vol. 50, no. 10, 20 October 2019 (2019-10-20), pages 77-83 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116153472A (en) * 2023-02-24 2023-05-23 萱闱(北京)生物科技有限公司 Image multidimensional visualization method, device, medium and computing equipment
CN116153472B (en) * 2023-02-24 2023-10-24 萱闱(北京)生物科技有限公司 Image multidimensional visualization method, device, medium and computing equipment

Also Published As

Publication number Publication date
CN112418109B (en) 2024-05-14

Similar Documents

Publication Publication Date Title
CN110738207B (en) Character detection method for fusing character area edge information in character image
US11836931B2 (en) Target detection method, apparatus and device for continuous images, and storage medium
CN112837303A (en) Defect detection method, device, equipment and medium for mold monitoring
CN111524137A (en) Cell identification counting method and device based on image identification and computer equipment
CN109977952B (en) Candidate target detection method based on local maximum
JP7026165B2 (en) Text recognition method and text recognition device, electronic equipment, storage medium
CN112101386B (en) Text detection method, device, computer equipment and storage medium
CN111292272B (en) Image processing method, image processing apparatus, image processing medium, and electronic device
CN111723634A (en) Image detection method and device, electronic equipment and storage medium
CN111507337A (en) License plate recognition method based on hybrid neural network
CN114359665A (en) Training method and device of full-task face recognition model and face recognition method
CN112418109B (en) Image processing method and device
CN111507340A (en) Target point cloud data extraction method based on three-dimensional point cloud data
CN110008949B (en) Image target detection method, system, device and storage medium
CN117541546A (en) Method and device for determining image cropping effect, storage medium and electronic equipment
CN111695550B (en) Text extraction method, image processing device and computer readable storage medium
CN110264488B (en) Binary image edge extraction device
CN115049851A (en) Target detection method, device and equipment terminal based on YOLOv5 network
CN112419407B (en) Cloud cluster displacement vector calculation method and device based on cloud cluster edge recognition
CN113343866A (en) Identification method and device of form information and electronic equipment
CN117148378B (en) Optical imaging system based on laser
CN117710235B (en) Image target enhancement method, device, computer equipment and storage medium
CN117934855B (en) Medical image segmentation method and device, storage medium and electronic equipment
CN109271986B (en) Digital identification method based on Second-Confirm
CN107886522B (en) Scale-adaptive target model updating method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210226

Assignee: Henan Xuan Yongtang Medical Information Technology Co.,Ltd.

Assignor: Zhongshan Hospital, Fudan University

Contract record no.: X2021980005733

Denomination of invention: An image processing method and device

License type: Exclusive License

Record date: 20210705

EE01 Entry into force of recordation of patent licensing contract
GR01 Patent grant
GR01 Patent grant