CN112241982A - Image processing method and device and machine-readable storage medium - Google Patents


Info

Publication number
CN112241982A
CN112241982A (application number CN201910651713.7A)
Authority
CN
China
Prior art keywords
image
roi
frame
video frame
monitoring video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910651713.7A
Other languages
Chinese (zh)
Inventor
蔡晓望 (Cai Xiaowang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910651713.7A priority Critical patent/CN112241982A/en
Publication of CN112241982A publication Critical patent/CN112241982A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06T5/90: Dynamic range modification of images or parts thereof
    • G06T5/94: Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N17/00: Diagnosis, testing or measuring for television systems or their details
    • H04N17/002: Diagnosis, testing or measuring for television systems or their details, for television cameras
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/18: Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20092: Interactive image processing based on input by user
    • G06T2207/20104: Interactive definition of region of interest [ROI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides an image processing method, an image processing apparatus, and a machine-readable storage medium. The method includes: performing target detection and tracking on monitored video frames to determine the position information of the same target in each frame; determining the region of interest (ROI) corresponding to that target in each frame from the position information; enhancing the image sequence of the ROI based on the quality score of the ROI in each frame; and encoding and transmitting the enhanced image sequence. The method improves the effect of image quality optimization.

Description

Image processing method and device and machine-readable storage medium
Technical Field
The present application relates to the field of video monitoring, and in particular, to an image processing method and apparatus, and a machine-readable storage medium.
Background
In recent years, with the continuous development of science and technology, more and more intelligent cameras have been applied in monitoring scenarios to complete specific tasks. For example, a face capture camera can detect faces in the monitored picture in real time and transmit the detected face images to a server for face recognition; a traffic capture camera can help traffic management departments quickly and effectively monitor vehicles in violation and capture evidence of the violations.
However, due to factors such as illumination, camera mounting height, and target motion, the quality of the region of interest (ROI) in the finally obtained target images generally varies widely: some images are of good quality, while others suffer from defects such as blur, insufficient brightness, or insufficient contrast.
In view of this, the current solution is to optimize the decoded image data at the back end: improving the brightness and contour details of the ROI in the monitored image and adjusting the color of the ROI, thereby improving the image definition of the ROI.
Practice shows that this solution processes the ROI on decoded image data that has already been transmitted over the network; by then, encoding compression has caused serious information loss, so the improvement achievable by subsequent processing is limited. In addition, the current solution only processes traffic vehicle information, so its applicable scenarios are limited.
Disclosure of Invention
In view of the above, the present application provides an image processing method and an apparatus thereof.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of the embodiments of the present application, there is provided an image processing method applied to a video monitoring front-end device, the method including:
carrying out target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame;
determining the corresponding ROI (region of interest) of the target in each monitoring video frame according to the position information of the same target in each monitoring video frame;
intercepting the ROI area from the cached monitoring video frame in the original format according to the quality score of the ROI area in each monitoring video frame, and performing enhancement processing on the image sequence of the intercepted ROI area in the original format;
and carrying out coding transmission on the image sequence after the enhancement processing.
According to a second aspect of the embodiments of the present application, there is provided an image processing apparatus applied to a video surveillance front-end device, the apparatus including:
the target detection unit is used for carrying out target detection and tracking on the monitoring video frames so as to determine the position information of the same target in each monitoring video frame;
the determining unit is used for determining a region of interest (ROI) corresponding to the target in each monitoring video frame according to the position information of the same target in each monitoring video frame;
the intercepting unit is used for intercepting the ROI area from the cached monitoring video frame in the original format according to the quality score of the ROI area in each monitoring video frame;
the enhancement processing unit is used for carrying out enhancement processing on the intercepted image sequence of the ROI in the original format;
and the transmission unit is used for coding and transmitting the image sequence after the enhancement processing.
According to a third aspect of embodiments of the present application, there is provided an image processing apparatus comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to perform the above-mentioned image processing method.
According to a fourth aspect of embodiments of the present application, there is provided a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to perform the above-described image processing method.
According to the image processing method, the video monitoring front-end device performs target detection and tracking on the monitored video frames to determine the position information of the same target in each frame, and determines the region of interest (ROI) corresponding to that target in each frame from the position information. It then intercepts the ROI from the cached monitored video frames in the original format according to the quality score of the ROI in each frame, enhances the intercepted image sequence of the ROI in the original format, and encodes and transmits the enhanced image sequence. This avoids the impact of information loss during compression and transmission on image quality optimization, and improves the optimization effect. In addition, image quality optimization is no longer limited to vehicles, which expands the applicable scenarios of the scheme.
Drawings
FIG. 1 is a flow chart illustrating a method of image processing according to an exemplary embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a specific application scenario according to an exemplary embodiment of the present application;
FIG. 3A is a block diagram illustrating a first processing module according to an exemplary embodiment of the present application;
FIG. 3B is a schematic diagram of a second sub-processing unit according to an exemplary embodiment of the present application;
FIG. 3C is a block diagram of a second processing module according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an image processing apparatus according to an exemplary embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an image processing apparatus according to an exemplary embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
In order to make those skilled in the art better understand the technical solutions provided by the embodiments of the present application, the following is a brief description of the original format described in the present application.
The raw data format is the unprocessed data format produced when the image sensor converts the captured light signal into a digital signal; the raw data contains sensing data from one or more spectral bands.
Illustratively, the raw data may include sensed data sampled for optical signals in the spectral band having wavelengths in the range of 380nm to 780nm, and/or 780nm to 2500 nm.
For example, a RAW (unprocessed) image signal obtained by sensing with an RGB sensor.
Illustratively, the imaging device collects a light signal, converts it into an analog signal, converts the analog signal into a digital signal, and inputs the digital signal into a processing chip for processing (which may include bit-width clipping, image processing, encoding and decoding, and the like) to obtain data in a second data format (the original format may be referred to as the first data format). The data in the second data format is then transmitted to a display device for display, or to other devices for processing.
Therefore, an image in the original format is the image produced when the sensor converts the acquired light information into a digital signal: it has not been processed by the processing chip, has a high bit width, and contains richer image information than an image in the second data format, which has undergone bit-width clipping, image processing, and encoding and decoding.
In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a schematic flowchart of an image processing method provided in an embodiment of the present application is shown, where the image processing method may be applied to a video monitoring front-end device, such as an IPC (Internet Protocol Camera), and as shown in fig. 1, the image processing method may include the following steps:
for convenience of description and understanding, the following description will be made taking the execution subject of steps S100 to S130 as an example of IPC.
Step S100, carrying out target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame.
In the embodiment of the application, when the IPC acquires the surveillance video, the IPC may perform target detection on the surveillance video frames, and track the target after the target is detected, so as to determine the position information of the same target in each surveillance video frame.
Illustratively, the targets may include, but are not limited to, pedestrians, vehicles, animals, license plates, and the like.
In an example, the performing target detection and tracking on the surveillance video frames to determine the position information of the same target in each surveillance video frame may include:
and based on the neural network, carrying out target detection and tracking on the monitoring video frames so as to determine the position information of the same target in each monitoring video frame.
For example, in order to implement target detection and tracking of the surveillance video frames, a neural network for target detection and tracking may be trained through a training sample subjected to target labeling, and target detection and tracking are performed on the surveillance video frames by using the trained neural network, so as to determine position information of the same target in each surveillance video frame.
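As an illustration of step S100, the following is a minimal sketch of assigning a persistent ID to the same target across frames, using IoU (intersection-over-union) overlap as the association rule. The function names and the IoU threshold are assumptions for illustration; the patent does not specify the tracking algorithm.

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) bounding boxes
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def track(frames_detections, iou_threshold=0.3):
    """Assign a persistent track id to detections across frames,
    so the 'same target' can be followed from frame to frame."""
    tracks = {}    # track_id -> last known box
    next_id = 0
    history = []   # per frame: list of (track_id, box)
    for dets in frames_detections:
        assigned = []
        for box in dets:
            best_id, best_iou = None, iou_threshold
            for tid, last in tracks.items():
                o = iou(box, last)
                if o > best_iou:
                    best_id, best_iou = tid, o
            if best_id is None:        # no sufficiently overlapping track: new target
                best_id = next_id
                next_id += 1
            tracks[best_id] = box      # update the target's position information
            assigned.append((best_id, box))
        history.append(assigned)
    return history
```

In practice the per-frame detections would come from the trained neural network described above; the IoU association is only one simple way to link them into tracks.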
Step S110: according to the position information of the same target in each monitored video frame, determine the region of interest (ROI) corresponding to the target in each monitored video frame.
In the embodiment of the application, for any object, the IPC may determine the corresponding ROI area of the object in each monitored video frame according to the position information of the object in each monitored video frame.
For example, for any surveillance video frame, the target framing area in the surveillance video frame may be determined as the corresponding ROI area of the target in the surveillance video frame.
And step S120, intercepting the ROI area from the cached monitoring video frame in the original format according to the quality score of the ROI area in each monitoring video frame, and performing enhancement processing on the image sequence of the ROI area in the original format.
In the embodiment of the application, when the video monitoring front-end equipment performs target detection and tracking on the monitoring video frame, the monitoring video frame in the original format can be cached.
For any object, when the IPC determines the ROI area of the object in each monitored video frame, the IPC may intercept the ROI area from the cached monitored video frame in the original format according to the quality score of the ROI area in each monitored video frame, and perform enhancement processing on the image sequence of the intercepted ROI area in the original format.
In a possible implementation manner, the intercepting the ROI from the buffered original format surveillance video frame according to the quality score of the ROI in each surveillance video frame may include:
determining a quality score of an ROI area in each monitoring video frame;
and intercepting the ROI area with the quality score higher than a preset score threshold value from the cached monitoring video frame in the original format based on the quality score of the ROI area in each monitoring video frame.
For example, for any object, when the IPC determines the ROI region of the object in each monitored video frame, the quality score can be performed on the ROI region in each monitored video frame.
For example, the quality of the ROI region may be scored according to target imaging quality (e.g., face sharpness, face angle, etc.) in the ROI region, or the quality of the ROI region may be scored according to quality of an image corresponding to the ROI region (e.g., image sharpness, brightness, contrast, etc.), which is not described herein in this embodiment.
In one example, a quality score for a ROI region in each surveillance video frame may be determined based on a neural network.
When the IPC has determined the quality score of the ROI in each monitored video frame, it removes, from the cached monitored video frames in the original format, the frames whose ROI quality score is not higher than a preset score threshold (which may be set according to the actual application scenario). For the monitored video frames whose ROI quality score is higher than the preset score threshold, the ROI is intercepted to obtain an image sequence of the ROI in the original format, and the intercepted image sequence is enhanced.
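A sketch of the score-then-intercept logic described above, assuming frames are NumPy arrays and ROIs are (x1, y1, x2, y2) boxes. The names and the threshold value are illustrative, not from the patent.

```python
import numpy as np

def crop_roi_sequence(raw_frames, rois, scores, threshold=60.0):
    """Drop frames whose ROI quality score is not above the threshold,
    then crop the ROI from each remaining cached raw-format frame."""
    sequence = []
    for frame, (x1, y1, x2, y2), score in zip(raw_frames, rois, scores):
        if score > threshold:
            sequence.append(frame[y1:y2, x1:x2])   # intercept the ROI region
    return sequence
```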
In an example, the performing enhancement processing on the image sequence of the intercepted ROI region in the original format may include:
selecting a reference frame from a plurality of frame images in an input image sequence of an ROI (region of interest) in an original format;
aligning the plurality of frame image pairs based on the reference frame;
and performing enhancement processing on the multi-frame images based on complementary information between the aligned multi-frame images.
Illustratively, when the image sequence of the ROI in the original format needs to be enhanced, a plurality of frames of images (the specific number of frames may be set according to the actual scene) may be selected from the image sequence of the ROI in the original format each time as the input of the enhancement process, and one frame of image may be selected from the input plurality of frames of images as the reference frame.
For example, the middle frame or the last frame in the multi-frame image may be selected as the reference frame.
For example, assuming that the number of the multi-frame images is 3, the 2 nd frame or the 3 rd frame may be selected as the reference frame.
After the reference frame is selected, the multiple frames of images may be aligned based on the reference frame, and the multiple frames of images may be enhanced based on complementary information between the aligned multiple frames of images.
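The reference-frame choice can be sketched as follows (zero-based indexing; the helper name is an assumption):

```python
def reference_index(num_frames, strategy="middle"):
    """Pick the reference frame index within a group of input frames:
    either the middle frame or the last frame."""
    if strategy == "middle":
        return num_frames // 2   # for 3 frames this is the 2nd frame (index 1)
    return num_frames - 1        # "last"
```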
And step S130, coding and transmitting the image sequence after the enhancement processing.
In the embodiment of the application, after the IPC performs enhancement processing on the image sequence of the corresponding ROI region according to the manner described in the above steps, the image sequence after enhancement processing may be encoded and compressed, and the encoded and compressed code stream is transmitted to the back-end device through the network.
For example, after the IPC performs the enhancement processing on the image sequence of the corresponding ROI region in the manner described in the above steps, a part or all of the image sequence may be selected from the image sequence after the enhancement processing, compressed and encoded, and transmitted to the backend device through the network.
In an example, the encoding and transmitting the image sequence after the enhancement processing may include:
and for the same target, selecting a frame of image from the image sequence after enhancement processing for coding transmission.
For any object, for example, after the IPC performs enhancement processing on the image sequence of the ROI corresponding to the object in the manner described in the above steps, a frame of image may be selected from the image sequence after enhancement processing for encoding and transmission.
For example, the IPC may determine the quality score of each enhanced image frame, and select the image frame with the highest quality score for encoding transmission.
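A sketch of this frame selection for step S130: for one target, keep only the enhanced image with the highest quality score. The scoring callback is an assumption; the patent leaves the scoring method open.

```python
def select_best_frame(enhanced_images, score_fn):
    """For one target, pick the enhanced image with the highest quality
    score for encoding and transmission."""
    return max(enhanced_images, key=score_fn)
```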
It can be seen that, in the method flow shown in fig. 1, the video monitoring front-end device performs image quality optimization on the monitored video frame for the target in the monitored video, thereby avoiding the influence of information loss in the compression transmission process on the image quality optimization and improving the effect of the image quality optimization; in addition, the image quality optimization is no longer limited to vehicles, and the applicable scenes of the scheme are expanded.
In order to enable those skilled in the art to better understand the technical solutions provided in the embodiments of the present application, the following describes the technical solutions provided in the embodiments of the present application with reference to specific application scenarios.
Referring to fig. 2, a schematic structural diagram of a specific application scenario provided in the embodiment of the present application is shown in fig. 2, in the application scenario, a video monitoring front-end device may include a first processing module, an ROI region extraction module, a second processing module, and a third processing module.
The image processing flow of the present application will be described below with reference to the functions of the respective modules.
1. First processing module
The first processing module is used for detecting the target and tracking the target after the target is detected.
For example, the first processing module may detect a target such as a pedestrian, vehicle, animal, or license plate, locate it accurately after detection, and output the target position information once positioning is complete; it then tracks the target based on the located position information and determines the position of the target in each frame of the image.
For example, the first processing module may buffer the image sequence during target detection, such as buffering the image sequence in a first data format (e.g., RGB (red, green, blue) format or YUV (luminance, chrominance) format).
In one example, referring to fig. 3A, the first processing module may include: the system comprises a first sub-processing unit, a second sub-processing unit and a third sub-processing unit.
The first sub-processing unit is used for carrying out target detection so as to determine target position information in each frame of image;
the second sub-processing unit is used for tracking the target so as to determine the position information of the same target in each frame of image;
the third sub-processing unit is used for evaluating the image quality of the target area and outputting an evaluation score (namely the quality score).
For example, the first sub-processing unit may be implemented by a neural network, and directly outputs the target coordinates. As shown in fig. 3B, the neural network for implementing the first sub-processing unit may include a convolutional layer (Conv), a pooling layer (Pool), a full connection layer (FC layer), and a Bounding Box Regression (BBR).
For example, the operation of a convolutional layer may be represented by the following formula:
$$Y_{C_i}(I) = g\left(W_i * Y_{C_{i-1}}(I) + B_i\right)$$
where $Y_{C_i}(I)$ is the output of the current convolutional layer, $Y_{C_{i-1}}(I)$ is the input of the current convolutional layer, $*$ denotes the convolution operation, $W_i$ and $B_i$ are respectively the weight coefficients and offset coefficients of the convolution filter of the current layer, and $g(\cdot)$ is the activation function; when the activation function is ReLU, $g(x) = \max(0, x)$.
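A minimal NumPy sketch of the convolutional-layer operation above, using a single filter, "valid" padding, and ReLU activation. The naive loop implementation is for illustration only.

```python
import numpy as np

def conv_layer(x, w, b):
    """Y = g(W * X + B) with g = ReLU; naive 'valid' 2-D convolution
    (implemented as cross-correlation, as in most deep-learning frameworks)."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return np.maximum(y, 0.0)   # ReLU: g(x) = max(0, x)
```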
The pooling layer is a special down-sampling layer that reduces the feature map obtained by convolution. The reduction window has size N×N; with max pooling, the maximum value within each N×N window is taken as the value of the corresponding point in the new image. The formula is:
$$Y_{P_j}(I) = \mathrm{maxpool}\left(Y_{P_{j-1}}(I)\right)$$
where $Y_{P_{j-1}}(I)$ is the input of the j-th pooling layer and $Y_{P_j}(I)$ is the output of the j-th pooling layer.
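The N×N max-pooling step can be sketched as follows (assuming stride N, with input sides trimmed to multiples of N):

```python
import numpy as np

def max_pool(x, n):
    """N x N max pooling with stride N: take the maximum of each
    N x N window as the value of the corresponding output point."""
    h, w = x.shape[0] // n * n, x.shape[1] // n * n
    x = x[:h, :w]                                   # trim to multiples of N
    return x.reshape(h // n, n, w // n, n).max(axis=(1, 3))
```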
The fully-connected (FC) layer can be regarded as a convolutional layer with a 1 × 1 filter window and is implemented similarly to convolution filtering. Its expression is:
$$Y_{F_k}(I) = g\left(\sum_{i=1}^{R}\sum_{j=1}^{C}\left(W_{ij}\, F_{kI}(I)_{ij} + B_{ij}\right)\right)$$
where $F_{kI}(I)$ is the input of the k-th fully-connected layer, $Y_{F_k}(I)$ is the output of the k-th fully-connected layer, $R$ and $C$ are the width and height of $F_{kI}(I)$, $W_{ij}$ and $B_{ij}$ are respectively the connection weights and offsets of the fully-connected layer, and $g(\cdot)$ is the activation function.
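A sketch of a single output unit of the fully-connected layer, treating W and B as R×C arrays of per-connection weights and offsets (this layout is an assumption based on the definitions in the text):

```python
import numpy as np

def fc_layer(x, w, b):
    """Single-output fully-connected layer over an R x C input map:
    g(sum_ij (W_ij * X_ij + B_ij)) with g = ReLU."""
    return float(np.maximum((x * w + b).sum(), 0.0))
```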
Bounding-box regression (BBR) finds a mapping such that the window P output by the fully-connected layer is mapped to a window G' that is closer to the ground-truth window G. The regression is typically performed by a translation or scaling transformation of the window P.
Let the window P output by the fully-connected layer have coordinates $(x_1, x_2, y_1, y_2)$, and the transformed window have coordinates $(x_3, x_4, y_3, y_4)$. For a translation transformation with translation $(\Delta x, \Delta y)$, the coordinates before and after translation satisfy:
$$x_3 = x_1 + \Delta x, \qquad x_4 = x_2 + \Delta x$$
$$y_3 = y_1 + \Delta y, \qquad y_4 = y_2 + \Delta y$$
For a scaling transformation with scale factors $d_x$ and $d_y$ along the $X$ and $Y$ directions, the coordinates before and after the transformation satisfy:
$$x_4 - x_3 = (x_2 - x_1)\, d_x$$
$$y_4 - y_3 = (y_2 - y_1)\, d_y$$
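The two window transformations can be sketched as follows, keeping the text's (x1, x2, y1, y2) coordinate ordering. Note the scaling formulas constrain only the window size, so fixing the window center during scaling is an added assumption.

```python
def translate_box(box, dx, dy):
    """Translation: each corner shifts by (dx, dy)."""
    x1, x2, y1, y2 = box                  # (x1, x2, y1, y2) ordering, as in the text
    return (x1 + dx, x2 + dx, y1 + dy, y2 + dy)

def scale_box(box, sx, sy):
    """Scaling: new width/height are sx/sy times the old ones; the
    center is kept fixed (an assumption, not stated in the text)."""
    x1, x2, y1, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    hw, hh = (x2 - x1) * sx / 2.0, (y2 - y1) * sy / 2.0
    return (cx - hw, cx + hw, cy - hh, cy + hh)
```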
the second sub-processing unit is used for positioning a target in the current frame according to the target position information of the current image frame detected by the first sub-processing unit, comparing the similarity of the target of the current image frame with the target of the previous image frame, if the similarity is higher than a set threshold, the target is considered to be the same target, and updating the target position information, so that target tracking is realized.
The third sub-processing unit is used for evaluating the image quality of the target area of the current image frame to obtain a quality score.
Taking image quality evaluation of the target area by average brightness as an example, the average brightness of the target area can be calculated as:
$$L_m = \frac{1}{n} \sum_{(i,j) \in R} I(i, j)$$
where $L_m$ is the average luminance of the target area, $n$ is the total number of pixels in the target area, $R$ is the target area, and $I(i, j)$ is the luminance of pixel $(i, j)$.
The calculation formula of the quality score of the target area is as follows:
$$S = 100 - \left| 80 - L_m \right|$$
The higher the quality score, the better the image quality.
2. ROI area intercepting module
The ROI area intercepting module is used for intercepting an image of an ROI area (namely the target area) in a multi-frame image (an original image) based on the position information of the target.
Illustratively, the ROI region truncation module may truncate an ROI region image from the cached image sequence in the first data format based on the position information of the target.
When intercepting ROI area images from the original images, the original images whose target-area quality score is not higher than the preset score threshold are removed; the original images whose target-area quality score is higher than the threshold are retained, and ROI area images are intercepted from them.
3. Second processing module
The second processing module is used for aligning the multi-frame ROI area images and enhancing the multi-frame ROI area images according to complementary information among the multi-frame ROI area images.
In one example, as shown in fig. 3C, the second processing module includes: a fourth sub-processing unit and a fifth sub-processing unit.
The fourth sub-processing unit is used for selecting one frame from the input multi-frame ROI area images as a reference frame, and carrying out position transformation on other frames based on the reference frame to enable the other frames to be aligned with the reference frame;
and the fifth sub-processing unit is used for performing enhancement processing by using complementary information between the multiple frames of ROI area images according to the aligned multiple frames of ROI area images.
In one example, after performing the enhancement processing on the input multiple frames of ROI area images in the above manner, one frame of enhanced ROI area image (which may be a reference frame enhanced by using complementary information between the multiple frames of ROI area images) may be output.
For example, the fourth sub-processing unit may be implemented either with or without a convolutional neural network.
For example, if the fourth sub-processing unit is implemented without a neural network, feature points of a target image (such as any frame of ROI area image other than the reference frame among the aforementioned multiple frames) and the reference image (i.e., the aforementioned reference frame) may be extracted using a feature point detection method such as scale-invariant feature transform (SIFT) or Harris corner detection. Image transformation parameters are then calculated from the position correspondences between the feature points. Finally, the multiple frames of images are transformed to obtain aligned multi-frame image data.
Methods of image transformation may include, but are not limited to, projective transformation, affine transformation, rotational-translational transformation, and the like.
Taking the rotation-translation transform as an example of the image transformation, the transformation formula is as follows:

x = w·cosα − z·sinα + d_hor
y = w·sinα + z·cosα + d_ver

wherein x and y represent the transformed coordinates of each pixel point of the image, w and z represent the original coordinates of each pixel point of the image, and α, d_hor and d_ver represent the image transformation parameters obtained by the transformation parameter calculation module.
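A minimal numerical sketch of this rotation-translation formula (the function name is illustrative):

```python
import numpy as np

def rotate_translate(coords, alpha, d_hor, d_ver):
    """Apply x = w*cos(alpha) - z*sin(alpha) + d_hor,
             y = w*sin(alpha) + z*cos(alpha) + d_ver
    to an (N, 2) array of original (w, z) pixel coordinates."""
    w, z = coords[:, 0], coords[:, 1]
    x = w * np.cos(alpha) - z * np.sin(alpha) + d_hor
    y = w * np.sin(alpha) + z * np.cos(alpha) + d_ver
    return np.stack([x, y], axis=1)
```

With alpha = 0 the transform reduces to a pure translation, which makes a quick sanity check.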
Illustratively, the fifth sub-processing unit is configured to enhance the multiple frames of ROI area images; the enhancement may be implemented by a convolutional neural network or by non-neural-network methods, which are not described in detail again in the embodiments of the present application.
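As one simple non-neural-network instance of such enhancement (an assumption for illustration, not the patent's prescribed method), the aligned frames can be fused by weighted temporal averaging, which suppresses noise using the complementary information across frames:

```python
import numpy as np

def fuse_aligned_frames(aligned_frames, weights=None):
    """Weighted temporal average of aligned ROI images (same shape, uint8)."""
    stack = np.stack(aligned_frames).astype(np.float32)
    if weights is None:
        weights = np.ones(len(aligned_frames), dtype=np.float32)
    weights = np.asarray(weights, dtype=np.float32) / np.sum(weights)
    fused = np.tensordot(weights, stack, axes=1)  # weighted mean over the frame axis
    return np.clip(fused, 0, 255).astype(np.uint8)
```

Per-frame quality scores could serve as the weights, so sharper frames contribute more to the fused result.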
Fourth, third processing module
The third processing module is used for performing compression coding on the enhanced ROI area image and transmitting it to the back-end device through the network.
In the embodiments of the present application, the video surveillance front-end device performs target detection and tracking on the surveillance video frames to determine the position information of the same target in each surveillance video frame, and determines the ROI (region of interest) corresponding to that target in each frame from this position information. It then intercepts the ROI from the cached surveillance video frames in the original format according to the quality score of the ROI in each frame, performs enhancement processing on the intercepted image sequence of ROIs in the original format, and encodes and transmits the enhanced image sequence. In this way, the influence of information loss during compression and transmission on image quality optimization is avoided, and the image quality optimization effect is improved. In addition, image quality optimization is no longer limited to vehicles, which broadens the applicable scenarios of the scheme.
The methods provided by the present application are described above. The apparatus provided by the present application is described below.
Referring to fig. 4, which shows a schematic structural diagram of an image processing apparatus according to an embodiment of the present application, the image processing apparatus may include:
a target detection unit 410, configured to perform target detection and tracking on the monitored video frames to determine position information of the same target in each monitored video frame;
a determining unit 420, configured to determine, according to position information of the same target in each monitored video frame, a region of interest ROI corresponding to the target in each monitored video frame;
an intercepting unit 430, configured to intercept an ROI from the cached surveillance video frame in the original format according to the quality score of the ROI in each surveillance video frame;
an enhancement processing unit 440, configured to perform enhancement processing on the captured image sequence of the ROI in the original format;
a transmitting unit 450, configured to perform encoding transmission on the image sequence after the enhancement processing.
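Putting units 410-450 together, the end-to-end flow might be sketched as follows (the detector, scorer, enhancer, and encoder callables are hypothetical stand-ins for the components described above, not APIs from the patent):

```python
def process_surveillance_frames(frames, detector, scorer, enhancer, encoder,
                                score_threshold=0.5):
    """Detect/track targets, intercept high-quality ROIs from the cached
    original-format frames, enhance each target's ROI sequence, then encode."""
    tracks = detector(frames)  # {target_id: [(x, y, w, h) box per frame]}
    bitstreams = []
    for target_id, boxes in tracks.items():
        rois = []
        for frame, (x, y, w, h) in zip(frames, boxes):
            roi = frame[y:y + h, x:x + w]
            if scorer(roi) > score_threshold:  # drop low-quality ROIs before enhancement
                rois.append(roi)
        if rois:
            bitstreams.append(encoder(enhancer(rois)))
    return bitstreams
```

Each target yields at most one encoded bitstream, matching the scheme of selecting part or all of the enhanced frames for transmission.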
In an optional implementation manner, the target detection unit 410 is specifically configured to perform target detection and tracking on the surveillance video frames based on a neural network, so as to determine position information of the same target in each surveillance video frame.
In an optional embodiment, the determining unit 420 is further configured to determine a quality score of a ROI region in each of the monitored video frames;
the intercepting unit 430 is specifically configured to intercept, from the cached monitoring video frames in the original format, the ROI area with the quality score higher than the preset score threshold based on the quality score of the ROI area in each monitoring video frame.
In an optional implementation manner, the enhancement processing unit 440 is specifically configured to select a reference frame from a plurality of frame images in the input image sequence of the ROI region in the original format; align the plurality of frame images based on the reference frame; and perform enhancement processing on the multi-frame images based on the complementary information between the aligned multi-frame images.
In an alternative embodiment, the transmission unit 450 is specifically configured to select, for the same target, a part or all of image frames from the enhanced image sequence for encoding transmission.
In an optional implementation manner, the transmission unit 450 is specifically configured to select a frame of image with the highest quality score from the image sequence after the enhancement processing for encoding and transmitting.
Fig. 5 is a schematic diagram of a hardware structure of an image processing apparatus according to an embodiment of the present disclosure. The image processing apparatus may include a processor 501, a machine-readable storage medium 502 storing machine-executable instructions. The processor 501 and the machine-readable storage medium 502 may communicate via a system bus 503. Also, the processor 501 may perform the image processing method described above by reading and executing machine-executable instructions in the machine-readable storage medium 502 corresponding to the image processing logic.
The machine-readable storage medium 502 referred to herein may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
Embodiments of the present application also provide a machine-readable storage medium, such as machine-readable storage medium 502 in fig. 5, comprising machine-executable instructions that are executable by processor 501 in an image processing apparatus to implement the image processing method described above.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (14)

1. An image processing method is applied to video monitoring front-end equipment, and the method comprises the following steps:
carrying out target detection and tracking on the monitoring video frames to determine the position information of the same target in each monitoring video frame;
determining a region of interest (ROI) corresponding to the target in each monitoring video frame according to the position information of the same target in each monitoring video frame;
intercepting the ROI area from the cached monitoring video frame in the original format according to the quality score of the ROI area in each monitoring video frame, and performing enhancement processing on the image sequence of the intercepted ROI area in the original format;
and carrying out coding transmission on the image sequence after the enhancement processing.
2. The method of claim 1, wherein the performing object detection and tracking on the surveillance video frames to determine the position information of the same object in each surveillance video frame comprises:
and based on a neural network, carrying out target detection and tracking on the monitoring video frames so as to determine the position information of the same target in each monitoring video frame.
3. The method of claim 1, wherein the step of intercepting the ROI from the buffered surveillance video frames in the original format according to the quality score of the ROI in each surveillance video frame comprises:
determining a quality score of the ROI area in each monitoring video frame;
and intercepting the ROI area with the quality score higher than a preset score threshold value from the cached monitoring video frame in the original format based on the quality score of the ROI area in each monitoring video frame.
4. The method according to claim 3, wherein the enhancing the image sequence of the intercepted ROI area in original format comprises:
selecting a reference frame from a plurality of frame images in the input image sequence of the ROI in the original format;
aligning the plurality of frame images based on the reference frame;
and performing enhancement processing on the multi-frame images based on the complementary information between the aligned multi-frame images.
5. The method according to claim 1, wherein said encoding transmission of the image sequence after enhancement processing comprises:
and for the same target, selecting part or all image frames from the image sequence after the enhancement processing for coding transmission.
6. The method of claim 5, wherein the selecting a portion of the image frame from the enhanced image sequence for encoded transmission comprises:
and selecting a frame of image with the highest quality score from the image sequence after the enhancement processing for coding and transmission.
7. An image processing apparatus, applied to a video surveillance front-end device, the apparatus comprising:
the target detection unit is used for carrying out target detection and tracking on the monitoring video frames so as to determine the position information of the same target in each monitoring video frame;
the determining unit is used for determining a region of interest (ROI) corresponding to the target in each monitoring video frame according to the position information of the same target in each monitoring video frame;
the intercepting unit is used for intercepting the ROI area from the cached monitoring video frame in the original format according to the quality score of the ROI area in each monitoring video frame;
the enhancement processing unit is used for carrying out enhancement processing on the intercepted image sequence of the ROI in the original format;
and the transmission unit is used for coding and transmitting the image sequence after the enhancement processing.
8. The apparatus of claim 7,
the target detection unit is specifically configured to perform target detection and tracking on the surveillance video frames based on a neural network, so as to determine position information of the same target in each surveillance video frame.
9. The apparatus of claim 7,
the determining unit is further configured to determine a quality score of the ROI region in each of the monitored video frames;
the intercepting unit is specifically configured to intercept, from the cached monitoring video frames in the original format, the ROI region whose quality score is higher than a preset score threshold based on the quality score of the ROI region in each monitoring video frame.
10. The apparatus of claim 9,
the enhancement processing unit is specifically used for selecting a reference frame from a plurality of frame images in an input image sequence of an ROI (region of interest) in an original format; aligning the plurality of frame images based on the reference frame; and performing enhancement processing on the multi-frame images based on the complementary information between the aligned multi-frame images.
11. The apparatus of claim 7,
the transmission unit is specifically configured to select, for the same target, a part or all of the image frames from the image sequence after the enhancement processing for encoding transmission.
12. The apparatus of claim 11,
the transmission unit is specifically configured to select a frame of image with the highest quality score from the image sequence after the enhancement processing for encoding and transmission.
13. An image processing apparatus comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to perform the method of any one of claims 1 to 6.
14. A machine-readable storage medium having stored thereon machine-executable instructions that, when invoked and executed by a processor, cause the processor to perform the method of any of claims 1-6.
CN201910651713.7A 2019-07-18 2019-07-18 Image processing method and device and machine-readable storage medium Pending CN112241982A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910651713.7A CN112241982A (en) 2019-07-18 2019-07-18 Image processing method and device and machine-readable storage medium

Publications (1)

Publication Number Publication Date
CN112241982A true CN112241982A (en) 2021-01-19

Family

ID=74167920

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910651713.7A Pending CN112241982A (en) 2019-07-18 2019-07-18 Image processing method and device and machine-readable storage medium

Country Status (1)

Country Link
CN (1) CN112241982A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110135151A1 (en) * 2009-12-07 2011-06-09 Samsung Electronics Co., Ltd. Method and apparatus for selectively supporting raw format in digital image processor
CN104125470A (en) * 2014-08-07 2014-10-29 成都瑞博慧窗信息技术有限公司 Video data transmission method
CN106203497A (en) * 2016-07-01 2016-12-07 浙江工业大学 A kind of finger vena area-of-interest method for screening images based on image quality evaluation
CN106557767A (en) * 2016-11-15 2017-04-05 北京唯迈医疗设备有限公司 A kind of method of ROI region in determination interventional imaging
CN107124612A (en) * 2017-04-26 2017-09-01 东北大学 The method for compressing high spectrum image perceived based on distributed compression
US20180181833A1 (en) * 2014-08-25 2018-06-28 Agency For Science, Technology And Research Methods and systems for assessing retinal images, and obtaining information from retinal images
CN109218695A (en) * 2017-06-30 2019-01-15 中国电信股份有限公司 Video image enhancing method, device, analysis system and storage medium
CN109242811A (en) * 2018-08-16 2019-01-18 广州视源电子科技股份有限公司 A kind of image alignment method and device thereof, computer readable storage medium and computer equipment
US20190050999A1 (en) * 2017-08-14 2019-02-14 Siemens Healthcare Gmbh Dilated Fully Convolutional Network for Multi-Agent 2D/3D Medical Image Registration

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827567A (en) * 2022-03-23 2022-07-29 阿里巴巴(中国)有限公司 Video quality analysis method, apparatus and readable medium
CN114827567B (en) * 2022-03-23 2024-05-28 阿里巴巴(中国)有限公司 Video quality analysis method, apparatus and readable medium
WO2023179692A1 (en) * 2022-03-25 2023-09-28 影石创新科技股份有限公司 Motion video generation method and apparatus, terminal device, and storage medium
CN116112645A (en) * 2023-04-11 2023-05-12 重庆华悦生态环境工程研究院有限公司深圳分公司 Multi-image transmission method and device for reservoir environment
CN116112645B (en) * 2023-04-11 2023-11-21 重庆华悦生态环境工程研究院有限公司深圳分公司 Multi-image transmission method and device for reservoir environment

Similar Documents

Publication Publication Date Title
US11790504B2 (en) Monitoring method and apparatus
US20160358338A1 (en) Image recognition apparatus, image recognition method, and recording medium
JP2023526207A (en) Maintaining a constant size of the target object in the frame
TW202101371A (en) Method and apparatus for processing video stream
JP7077395B2 (en) Multiplexed high dynamic range image
US8582915B2 (en) Image enhancement for challenging lighting conditions
JP5478047B2 (en) Video data compression pre-processing method, video data compression method and video data compression system using the same
US8922674B2 (en) Method and system for facilitating color balance synchronization between a plurality of video cameras and for obtaining object tracking between two or more video cameras
US20110164789A1 (en) Detection of vehicles in images of a night time scene
CN112241982A (en) Image processing method and device and machine-readable storage medium
JP2007058634A (en) Image processing method and image processor, digital camera equipment, and recording medium with image processing program stored thereon
US8798369B2 (en) Apparatus and method for estimating the number of objects included in an image
US20230127009A1 (en) Joint objects image signal processing in temporal domain
CN111179302B (en) Moving target detection method and device, storage medium and terminal equipment
US20110085026A1 (en) Detection method and detection system of moving object
CN115661720A (en) Target tracking and identifying method and system for shielded vehicle
WO2019228450A1 (en) Image processing method, device, and equipment, and readable medium
CN112581481B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN112241735A (en) Image processing method, device and system
KR101726692B1 (en) Apparatus and method for extracting object
TWI676965B (en) Object image recognition system and object image recognition method
CN111062272A (en) Image processing and pedestrian identification method and device based on color recovery and readable storage medium
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN116403200A (en) License plate real-time identification system based on hardware acceleration
US10140503B2 (en) Subject tracking apparatus, control method, image processing apparatus, and image pickup apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination