CN115037915A - Video processing method and processing device - Google Patents

Video processing method and processing device Download PDF

Info

Publication number
CN115037915A
Authority
CN
China
Prior art keywords
processed
image
video frame
mth
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110245949.8A
Other languages
Chinese (zh)
Other versions
CN115037915B (en)
Inventor
李�瑞
张俪耀
陆洋
刘蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110245949.8A priority Critical patent/CN115037915B/en
Publication of CN115037915A publication Critical patent/CN115037915A/en
Application granted granted Critical
Publication of CN115037915B publication Critical patent/CN115037915B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N9/00Details of colour television systems
    • H04N9/64Circuits for processing colour signals
    • H04N9/646Circuits for processing colour signals for image enhancement, e.g. vertical detail restoration, cross-colour elimination, contour correction, chrominance trapping filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)

Abstract

The application provides a video processing method and a video processing device. According to the technical scheme, position information of the shooting device at the imaging moment of one or more to-be-processed images in each to-be-processed video frame is obtained; an ideal motion curve of the shooting device is estimated from the position information corresponding to the video frames of the to-be-processed video; the position of the shooting device on the ideal motion curve at the imaging moment of the one or more to-be-processed images in a to-be-processed video frame is taken as the ideal position information of the shooting device for that video frame; a transformation matrix capable of correcting the position information at the imaging moment of every to-be-processed image in the to-be-processed video is then calculated; finally, a target HDR image of each to-be-processed video frame is fused from all to-be-processed images in that frame and the corresponding transformation matrices, and a target video is generated. The method avoids amplification of ghosts, improves HDR fusion precision and reduces the probability of ghosting.

Description

Video processing method and processing device
Technical Field
The present application relates to the field of digital image processing, and in particular, to a video processing method and apparatus.
Background
With the rapid development of video image technology, user expectations for the video viewing experience keep rising, and high dynamic range (HDR) video has gradually been adopted in the field of film and television special effects. Compared with a low dynamic range (LDR) image, HDR video can present a wider range of luminance and more colors, and delivers a better visual effect.
In one method for obtaining HDR video, LDR images with different exposure times are first fused to obtain an HDR image for each frame, the HDR video is then generated from the HDR images of all frames, and finally video image stabilization is applied to the HDR video so that the resulting HDR video is stable.
HDR video obtained with this method often exhibits an unacceptable "floating object" phenomenon, which degrades the user's visual experience.
Disclosure of Invention
The application provides a video processing method and a video processing device, which avoid amplifying ghosts in the resulting HDR video, improve HDR fusion precision and reduce the probability of ghosting.
In a first aspect, the present application provides a video processing method. The method includes: acquiring a to-be-processed video, where the to-be-processed video includes J to-be-processed video frames, each of the J to-be-processed video frames includes K to-be-processed images, the K to-be-processed images correspond one to one to K exposure durations, and J and K are integers greater than 1; acquiring position information of a shooting device of the to-be-processed video at the imaging moment of a first to-be-processed image among the K to-be-processed images contained in the mth of the J to-be-processed video frames, where m is an integer taken from 1 to J; estimating a motion curve of the shooting device while shooting the to-be-processed video according to the position information of the shooting device at the imaging moments of the first to-be-processed images corresponding to all of the J to-be-processed video frames; determining the position information of the shooting device on the motion curve at the imaging moment of the first to-be-processed image in the mth to-be-processed video frame as the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, where n is an integer taken from 1 to K; determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment, a first transformation matrix from the position information at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information; and performing high dynamic range (HDR) fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrices to obtain a target video frame corresponding to the mth to-be-processed video frame, where the J target video frames corresponding to the J to-be-processed video frames constitute a target video.
According to the video processing method provided in this application, before HDR fusion is performed, the first transformation matrices between the position information and the ideal position information at the imaging moments of all to-be-processed images in every to-be-processed video frame have already been calculated; that is, video image stabilization is placed before HDR fusion, which avoids the amplification of ghosts by the image stabilization step. Through the first transformation matrices, all to-be-processed images in each to-be-processed video frame can be corrected to the ideal position; in other words, before HDR fusion is performed, the position information of all to-be-processed images in each to-be-processed video frame is registered to the image the shooting device would capture at the same ideal position, so all to-be-processed images undergo one registration, and the subsequent HDR fusion module can register and fuse them further, forming a coarse-to-fine registration between the to-be-processed images. Therefore, the method also improves HDR fusion precision and reduces the probability of ghosting.
With reference to the first aspect, in a possible implementation manner, the to-be-processed image includes h rows of pixels, and the method further includes: determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, a second transformation matrix from the position information at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information, where h is an integer greater than 1 and i is an integer taken from 1 to h. Correspondingly, performing high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrices to obtain the target video frame corresponding to the mth to-be-processed video frame includes: performing HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrices and the second transformation matrices to obtain the target video frame corresponding to the mth to-be-processed video frame, where the J target video frames corresponding to the J to-be-processed video frames constitute the target video.
The video processing method provided in this application calculates not only the first transformation matrix that corrects each to-be-processed image to the ideal position, but also the second transformation matrices that correct every line of each to-be-processed image to the ideal position. Thus, while the to-be-processed image as a whole is corrected to the ideal position, every line of the image is also guaranteed to be corrected to the ideal position, completing the registration between the lines within each to-be-processed image. This improves the accuracy of each captured to-be-processed image, further raises the degree of registration between the to-be-processed images within each video frame, and reduces the ghosting that occurs in the fusion process.
With reference to the first aspect, in a possible implementation manner, the acquiring, by a shooting device of a to-be-processed video, position information of an imaging time of a first to-be-processed image in K to-be-processed images included in an mth to-be-processed video frame of J to-be-processed video frames includes: according to the information recorded by the motion sensor, acquiring sensor information of the imaging moment of a first to-be-processed image in K to-be-processed images contained in the mth to-be-processed video frame of the J to-be-processed video frames by the shooting device through an interpolation function; and integrating the sensor information of the first image to be processed at the imaging moment to obtain the position information of the first image to be processed at the imaging moment.
With reference to the first aspect, in one possible implementation manner, the motion sensor includes a gyroscope or an inertial measurement unit.
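As an illustration of this step, the sketch below interpolates gyroscope samples to the imaging instant and integrates them into an orientation. The sampling grid, the small-angle first-order integration, and all names are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def camera_position_at(t_image, gyro_t, gyro_w, t_start=0.0):
    """Approximate the camera orientation (rotation vector, rad) at an imaging
    instant by interpolating gyroscope samples and integrating them.

    gyro_t : (N,)   gyroscope sample timestamps, seconds
    gyro_w : (N, 3) angular-velocity samples, rad/s
    """
    # Interpolate the angular velocity onto a dense grid up to the imaging instant.
    dense_t = np.arange(t_start, t_image, 1e-3)
    dense_w = np.stack([np.interp(dense_t, gyro_t, gyro_w[:, k]) for k in range(3)], axis=1)
    # First-order integration of angular velocity (small-angle approximation)
    # gives the camera's rotation relative to its pose at t_start.
    return np.trapz(dense_w, dense_t, axis=0)
```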
With reference to the first aspect, in a possible implementation manner, determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment, the first transformation matrix between the position information at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information includes: calculating a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment; and transforming the matrix R according to the formula TRT⁻¹ to obtain the first transformation matrix, where T is the parameter matrix of the shooting device.
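A minimal sketch of the TRT⁻¹ transform under a pure-rotation camera model, where T is the camera's intrinsic (parameter) matrix. The Rodrigues conversion and the intrinsic values below are illustrative assumptions, not values from this application.

```python
import numpy as np
import cv2

def first_transform_matrix(rotvec_actual, rotvec_ideal, T):
    """Homography mapping pixels captured at the actual camera pose to the
    pixels that would have been captured at the ideal pose."""
    R_actual, _ = cv2.Rodrigues(np.asarray(rotvec_actual, dtype=np.float64))
    R_ideal, _ = cv2.Rodrigues(np.asarray(rotvec_ideal, dtype=np.float64))
    R = R_ideal @ R_actual.T            # rotation from the actual pose to the ideal pose
    return T @ R @ np.linalg.inv(T)     # the T R T^-1 form from the text

# Illustrative intrinsic matrix (focal length and principal point are assumed values).
T = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
H = first_transform_matrix([0.01, 0.0, 0.0], [0.0, 0.0, 0.0], T)
```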
With reference to the first aspect, in a possible implementation manner, performing high dynamic range HDR fusion processing on K to-be-processed images in an mth to-be-processed video frame according to a first transformation matrix between actual position information corresponding to an imaging time of an nth to-be-processed image in the mth to-be-processed video frame and corresponding ideal position information to obtain a target video frame corresponding to the mth to-be-processed video frame, includes: carrying out affine transformation on the nth to-be-processed image in the mth to-be-processed video frame through the corresponding first transformation matrix to obtain an image subjected to affine transformation on the nth to-be-processed image in the mth to-be-processed video frame; inputting the image after affine transformation into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or, the nth to-be-processed image in the mth to-be-processed video frame and the corresponding first transformation matrix are simultaneously input to the HDR fusion module, and a target video frame corresponding to the mth to-be-processed video frame is generated.
With reference to the first aspect, in one possible implementation manner, each to-be-processed image is a native RAW image.
With reference to the first aspect, in a possible implementation manner, the K to-be-processed images include a first exposure image, a second exposure image, and a third exposure image, where the first exposure image, the second exposure image, and the third exposure image correspond to a first exposure duration, a second exposure duration, and a third exposure duration one to one, the first exposure duration is greater than the second exposure duration, the second exposure duration is greater than the third exposure duration, and the second exposure image is the first to-be-processed image.
In a second aspect, the present application provides a video processing apparatus comprising: the device comprises a to-be-processed video acquisition module, a to-be-processed video acquisition module and a processing module, wherein the to-be-processed video acquisition module is used for acquiring a to-be-processed video, the to-be-processed video comprises J to-be-processed video frames, each to-be-processed video frame in the J to-be-processed video frames comprises K to-be-processed images, the K to-be-processed images correspond to K exposure durations one to one, and J and K are integers greater than 1; the position information acquisition module of the first image to be processed is used for acquiring the position information of the imaging moment of the first image to be processed in K images contained in the mth video frame of the J video frames to be processed of the shooting device of the video to be processed, wherein m is an integer and is taken from 1 to J; the estimation module is used for estimating a motion curve when the shooting device shoots the video to be processed according to the position information of the shooting device at the imaging time of the first image to be processed corresponding to all the video frames to be processed in the J video frames to be processed respectively; the ideal position information determining module is used for determining the position information of the imaging moment of the first to-be-processed image in the mth to-be-processed video frame of the shooting device on the motion curve as the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame of the shooting device, wherein n is an integer and is taken from 1 to K; the first transformation matrix determining module is used for determining a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame of the shooting device and the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame; and the fusion module is used for performing High Dynamic Range (HDR) fusion processing on K to-be-processed images in the mth to-be-processed video frame according to a first transformation matrix from the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, and J target video frames corresponding to the J to-be-processed video frames form a target video.
With reference to the second aspect, in a possible implementation manner, the to-be-processed image includes h rows of pixels, and the apparatus further includes: a second transformation matrix determining module, configured to determine, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, a second transformation matrix between the position information at the imaging moment of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information, where h is an integer greater than 1 and i is an integer taken from 1 to h. Correspondingly, the fusion module is further configured to: perform high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrices and the second transformation matrices to obtain the target video frame corresponding to the mth to-be-processed video frame, where the J target video frames corresponding to the J to-be-processed video frames constitute the target video.
With reference to the second aspect, in a possible implementation manner, the position information obtaining module of the first image to be processed is specifically configured to: according to the information recorded by the motion sensor, obtaining the sensor information of the imaging moment of the first image to be processed in K images to be processed in the mth video frame to be processed in the J video frames to be processed by the shooting device through an interpolation function; and integrating the sensor information of the first image to be processed at the imaging moment to obtain the position information of the first image to be processed at the imaging moment.
With reference to the second aspect, in one possible implementation manner, the motion sensor includes a gyroscope or an inertial measurement unit.
With reference to the second aspect, in a possible implementation manner, the first transformation matrix determining module is specifically configured to: calculate a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at that imaging moment; and transform the matrix R according to the formula TRT⁻¹ to obtain the first transformation matrix, where T is the parameter matrix of the shooting device.
With reference to the second aspect, in a possible implementation manner, the fusion module is specifically configured to: carrying out affine transformation on the nth to-be-processed image in the mth to-be-processed video frame through the corresponding first transformation matrix to obtain an image subjected to affine transformation on the nth to-be-processed image in the mth to-be-processed video frame; inputting the affine-transformed image into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or, the nth to-be-processed image in the mth to-be-processed video frame and the corresponding first transformation matrix are simultaneously input to the HDR fusion module, and a target video frame corresponding to the mth to-be-processed video frame is generated.
With reference to the second aspect, in one possible implementation manner, each to-be-processed image is a native RAW image.
With reference to the second aspect, in a possible implementation manner, the K to-be-processed images include a first exposure image, a second exposure image, and a third exposure image, where the first exposure image, the second exposure image, and the third exposure image correspond to a first exposure duration, a second exposure duration, and a third exposure duration one to one, the first exposure duration is greater than the second exposure duration, the second exposure duration is greater than the third exposure duration, and the second exposure image is the first to-be-processed image.
In a third aspect, the present application provides a video processing apparatus comprising: a memory and a processor; the memory is to store program instructions; the processor is configured to invoke program instructions in the memory to perform the video processing method according to the first aspect or any one of the possible implementations.
In a fourth aspect, the present application provides a chip, which includes at least one processor and a communication interface, where the communication interface and the at least one processor are interconnected by a line, and the at least one processor is configured to execute a computer program or instructions to perform the video processing method according to the first aspect or any one of the possible implementations.
In a fifth aspect, the present application provides a computer readable medium storing program code for execution by a device, the program code comprising instructions for performing a video processing method according to the first aspect or any one of the possible implementations thereof.
In a sixth aspect, the present application provides a computer program product containing instructions, where the computer program product includes computer program code, and when the computer program code runs on a computer, the computer is caused to execute the video processing method according to the first aspect or any one of the possible implementations.
Drawings
FIG. 1 is a schematic view of an imaging system provided in one embodiment of the present application;
fig. 2 is a schematic structural diagram of a video processing method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of a video processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of obtaining images to be processed with different exposure durations according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an estimated motion profile according to an embodiment of the present application;
FIG. 6 is a schematic flow chart diagram of a video processing method according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a video processing apparatus according to another embodiment of the present application.
Detailed Description
For understanding, the relevant terminology referred to in this application will be first described.
1. Dynamic range
In a digital imaging system, dynamic range generally denotes the ratio between the maximum and minimum measurable values in an image. In most application scenarios, the larger the dynamic range of the digital imaging system, the wider the range of illumination intensity the system can detect and the richer the scene detail in the captured image.
Generally, the amount of charge that the photosensitive element in an image sensor can store determines the dynamic range of the sensor. When the image sensor reaches saturation exposure it has reached its maximum capacity: no matter how much the exposure is further increased, no more electrons are accepted, and the photosensitive element is saturated at its full charge capacity. The minimum exposure is the noise-equivalent exposure, i.e. the exposure corresponding to the sensor's own dark current in a completely dark environment (the current that flows through the photosensitive element without illumination is the dark current). The images of a digital imaging system are output, and ultimately stored, in digital form, and an important component of the system is the analog-to-digital converter (ADC). A key indicator of an ADC is its bit depth: for an 8-bit ADC, the minimum signal it can record is 1 and the maximum is 255, so its output dynamic range is 255.
The dynamic range of real scenes in nature spans a very wide range (about 10⁻⁶ to 10⁶), whereas the human eye can only cover part of it, and a conventional digital image can store even more limited luminance and dynamic-range information owing to the limitations of its acquisition method and its image-information storage format.
2. High dynamic range imaging techniques
When a picture is taken with a camera, a mobile phone, or other common photographing equipment, the dynamic range of the resulting picture is far lower than that of the real scene. The former is a low dynamic range (LDR) image with on the order of 256 levels, whereas the latter is a high dynamic range (HDR) scene with on the order of 10⁶ levels. Therefore, when an image is taken, the brighter or darker areas of the real scene appear saturated, i.e. completely black or completely white, in the captured image, causing a loss of image information.
The high dynamic range imaging technology can solve the problem of the difference between the dynamic range of a real scene and the dynamic range of a shot image, better captures the details in the real scene, and can be mainly divided into hardware and software. The hardware method is generally implemented by using an imaging device with a special sensor or simultaneously using multiple imaging devices, and although the dynamic range of the camera is significantly improved compared with that of a common camera, the dynamic range of the camera cannot be compared with that of a natural scene. These methods are limited by hardware, exposure speed, resolution, etc., are relatively demanding on hardware, and are too expensive for most people.
The multi-exposure image fusion technique is the most important software-based method for obtaining HDR images. Without changing the hardware, the same scene is exposed several times with different settings by adjusting the camera's aperture and exposure time, and these exposures are then suitably fused to obtain an HDR image that reproduces the dynamic range of the target scene. According to the domain in which the fusion operates, exposure fusion algorithms fall into two categories: radiance-domain fusion and image-domain fusion. The more classical one is radiance-domain fusion, which first computes the camera response function from information such as aperture and exposure time to obtain the true radiance values of all pixels of the imaged scene, i.e. to generate the HDR image corresponding to the LDR images captured by the camera, and then applies tone mapping to map the HDR image nonlinearly so that it can be displayed on an ordinary LDR device. Image-domain exposure fusion operates directly on the pixel values of the images; it obtains an ordinary image that can be displayed directly on an LDR device without recovering the camera response function (CRF) or restoring radiance values. Radiance-domain fusion can faithfully restore the dynamic range of the scene and has been widely used in image-processing software; however, the computation of the camera response function is sensitive to image noise and to registration errors in the image sequence, and is not easy to carry out. Image-domain fusion bypasses the CRF estimation and fuses pixel values directly, so its performance is more stable, the fusion process is simpler, and the computational cost is relatively lower.
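The two branches can be illustrated with OpenCV's standard HDR module. The file names and exposure times below are placeholders, and this is a generic sketch rather than the fusion algorithm used in this application.

```python
import cv2
import numpy as np

# Three differently exposed LDR captures of the same scene (placeholder file names).
ldr = [cv2.imread(name) for name in ("short.jpg", "mid.jpg", "long.jpg")]
times = np.array([1 / 500.0, 1 / 60.0, 1 / 8.0], dtype=np.float32)  # assumed exposure times

# Image-domain fusion (Mertens): fuses pixel values directly, no CRF recovery,
# and the result can be shown on an LDR display as-is.
fused = cv2.createMergeMertens().process(ldr)

# Radiance-domain fusion (Debevec): recover the camera response function (CRF),
# merge into a radiance map, then tone-map it for an LDR display.
crf = cv2.createCalibrateDebevec().process(ldr, times)
hdr = cv2.createMergeDebevec().process(ldr, times, crf)
display = cv2.createTonemap(gamma=2.2).process(hdr)
```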
3. Ghost image
When the HDR image is obtained by the multi-exposure image fusion technique, a target scene to be photographed or a camera remains still. Once the scene changes during the shooting process, for example, a moving object intrudes into the scene or the camera shakes, a blurred or translucent image, called a ghost, appears in the region where the scene changes in the finally obtained fused image. Since the outdoor shot scene is mostly a dynamic scene, and moving objects are difficult to avoid, ghost images are very easy to appear.
4. Video image stabilization technique
With the rapid development of electronic technology, users can take video shots through mobile terminals, such as mobile phones, tablet computers, digital cameras, handheld video cameras, and the like. However, in the shooting process, due to the influence of factors such as shooting skills and shooting environment, the shooting equipment may shake, so that the shot video has an unstable picture, which affects the normal viewing of the user, and therefore, the video needs to be subjected to image stabilization processing. Video stabilization is a technique for modifying and arranging a sequence of dynamic images acquired by a randomly jittered or randomly moving camera so that the sequence of dynamic images can be displayed on a display more smoothly. The method eliminates or weakens the irregular translation, rotation, scaling and other distortion conditions among image sequences, and improves the quality of the picture, so that the picture is more suitable for target detection, tracking, identification and other processing operations in intelligent video analysis.
Currently, video stabilization methods include mechanical, optical, and digital stabilization methods.
Mechanical video stabilization uses motion detected by special sensors (such as gyroscopes and accelerometers) to move the image sensor to compensate for the camera motion.
Optical video stabilization achieves stabilization by moving portions of the lens. This approach does not move the entire camera, but rather uses a movable lens assembly that variably adjusts the path length of the light as it passes through the lens system of the camera.
Digital video stabilization does not require special sensors to estimate the motion of the camera, and mainly comprises three steps: motion estimation, motion smoothing, and shake correction. The motion estimation is to estimate motion information of a video image. The motion smoothing is to smooth the estimated motion information of the video image to obtain a new smooth motion track of the video image. And the jitter correction is to obtain compensation information of the current video frame according to the estimated motion track of the video image and the smoothed motion track and correct the current video frame.
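A compact sketch of these three steps under a simple translation-plus-rotation motion model; the feature-tracking parameters and the smoothing radius are illustrative choices, not values from this application.

```python
import cv2
import numpy as np

def stabilize(frames, radius=15):
    """Motion estimation -> motion smoothing -> shake correction."""
    # 1) Motion estimation: per-frame affine motion from tracked corners.
    transforms = []
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200, qualityLevel=0.01, minDistance=30)
        nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
        m, _ = cv2.estimateAffinePartial2D(pts[status == 1], nxt[status == 1])
        transforms.append([m[0, 2], m[1, 2], np.arctan2(m[1, 0], m[0, 0])])  # dx, dy, dangle
        prev_gray = gray

    # 2) Motion smoothing: moving average of the accumulated camera trajectory.
    trajectory = np.cumsum(transforms, axis=0)
    kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
    smoothed = np.stack([np.convolve(trajectory[:, k], kernel, mode="same") for k in range(3)], axis=1)
    corrected = np.asarray(transforms) + (smoothed - trajectory)

    # 3) Shake correction: re-warp each frame with the corrected motion.
    h, w = frames[0].shape[:2]
    out = [frames[0]]
    for frame, (dx, dy, da) in zip(frames[1:], corrected):
        m = np.array([[np.cos(da), -np.sin(da), dx],
                      [np.sin(da),  np.cos(da), dy]])
        out.append(cv2.warpAffine(frame, m, (w, h)))
    return out
```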
5. Original image
A raw (RAW) image is the image data generated by a digital camera before any processing: light passes through the lens onto the photosensitive element and is converted into an electronic signal carrying the image data, which is stored directly as an image file. A RAW image therefore records the most original data obtained by the photosensitive element, rather than an image file generated after encoding and compression.
Because the data in such an image has not yet been processed, printed or used for editing, it typically retains a wide internal color gamut and can be precisely adjusted, or given simple modifications, before conversion.
From conventional film cameras to digital cameras today, images are also rapidly moving into the digital age as one of the most important carriers for people to obtain information on the external environment. Images that are generally converted to digital form for storage and processing are collectively referred to as digital images.
Fig. 1 is a schematic diagram of a digital imaging system according to an embodiment of the present application. As shown in fig. 1, the digital imaging system of the present application may include a lens 102, a photosensor 103, an analog-to-digital converter (ADC) 104, an image signal processor 105, and a memory 106.
For the digital imaging system shown in fig. 1, the imaging process generally consists of several stages: first, the lens 102 focuses the light 101 and transmits the light 101 to the photosensor 103, then the photosensor 103 converts the information of the light 101 into an analog electrical signal, and then the analog electrical signal is converted into a digital signal by the ADC converter 104, the image after ADC conversion is called a RAW image, and the pixel value of the RAW image at this time is in a substantially linear relationship with the intensity of the ambient light, which is quite close to the HDR image, except that it does not have a high enough dynamic range. After the RAW image is taken, a series of operations of Image Signal Processing (ISP) 105 are also required, and finally, an LDR image which can be stored in a memory 106 and displayed on a display screen is obtained.
The ISP pipeline can include the following steps: white balance, for which the two most basic correction methods are based on the gray-world assumption and the white-world assumption, and whose purpose is to make white as perceived by the human eye actually appear white in the image; demosaicing, since the information recorded by a typical photosensitive element is single-channel with the three RGB channels arranged in a Bayer pattern, so the specific values of the corresponding RGB channels have to be recovered by interpolation; noise suppression, in which, after the three RGB channels have been filled in, the various kinds of noise introduced during camera imaging are further suppressed; color-space conversion, which converts the image information from the sensor's RGB color space into standard RGB (sRGB), from which it can then be converted into the various display-related RGB systems; tone mapping, in which, once an image in the sRGB system has been obtained, the pixel values that were originally linear in the ambient illumination are compressed or stretched nonlinearly through a tone-mapping curve so that the image looks more natural on a display device; and image compression and quantization, in which the image file is compressed into JPEG format and output, yielding the final displayable LDR image.
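A minimal ISP sketch covering demosaicing, gray-world white balance, a gamma-style tone curve and JPEG output; noise suppression and full color-space conversion are omitted, and the BGGR Bayer pattern, the gamma value and the output path are assumptions.

```python
import cv2
import numpy as np

def simple_isp(raw_bayer, out_path="output.jpg"):
    """raw_bayer: 2-D uint16 Bayer mosaic (BGGR pattern assumed for illustration)."""
    # Demosaic the single-channel Bayer mosaic into three colour channels.
    rgb = cv2.cvtColor(raw_bayer, cv2.COLOR_BayerBG2BGR).astype(np.float32)
    # Gray-world white balance: scale each channel towards a common mean.
    channel_means = rgb.reshape(-1, 3).mean(axis=0)
    rgb *= channel_means.mean() / channel_means
    # Simple tone mapping: normalise and apply a gamma curve.
    rgb = np.clip(rgb / rgb.max(), 0.0, 1.0) ** (1.0 / 2.2)
    # Quantise and compress to a displayable LDR image.
    ldr = (rgb * 255.0).astype(np.uint8)
    cv2.imwrite(out_path, ldr)
    return ldr
```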
The digital imaging system shown in fig. 1 may also be used to capture video comprising a plurality of video frames. For example, the digital imaging system may capture LDR images with different exposure durations for each video frame, then fuse the LDR images with different exposure durations by using a multi-exposure fusion algorithm to obtain an initial HDR image corresponding to each video frame, then obtain an enhanced HDR image by using ISP methods such as denoising, white balance, and color correction, and finally perform a video image stabilization operation on an HDR video composed of multiple HDR images corresponding to the multiple video frames to obtain an HDR video with a certain stability.
However, in the method for obtaining the HDR video, in order to obtain most of information in a scene well from each frame of HDR image in the HDR video, it is necessary to assume that both the camera and the captured scene are still. However, in actual shooting, the motion of the camera and the object in the scene is inevitable, which may cause the HDR image obtained after fusion to contain ghost. At this time, if a ghost image is contained in the HDR image of a certain frame, the ghost image may be further enlarged in the subsequent video image stabilization operation, so that an unacceptable floating object phenomenon appears in the final HDR video, which affects the visual experience of the user.
As an example, assume that when generating an HDR video, the LDR images with different exposure durations consist of one long-exposure image and one short-exposure image, which are fused into an HDR image with either the long-exposure or the short-exposure image as the reference. When the whole HDR video sequence is subsequently corrected, every frame used for correction is a fused HDR image. If a certain pixel point in the image corresponds to a pixel value of the short-exposure image while the fused pixel point corresponds to a value of the long-exposure image, the conversion value of that point increases; and if that pixel point happens to be a ghost point, the ghost at that point in the image is amplified.
In view of this, the present application proposes a new video processing scheme. In the technical scheme provided by the application, the video image stabilization operation is placed before HDR fusion. As shown in fig. 2, the image data of the video includes J frames, each frame includes multiple exposure images, and then the video image stabilization operation is performed, which includes estimation of camera motion vectors, smoothing of camera motion curves, and calculation of image transformation matrices. After the video image stabilization operation, performing HDR fusion on a plurality of exposed images in each frame to obtain an HDR image. Optionally, after obtaining the HDR image, ISP processing may be performed on each HDR image, and a specific implementation process is not described herein again.
In the technical scheme of the application, one implementation manner of the video image stabilization operation is as follows: acquiring actual camera position information corresponding to a reference image in each video frame in a video; then, estimating an ideal motion curve of the camera based on actual camera position information corresponding to reference images of all video frames in the video, and acquiring ideal camera position information corresponding to each video frame from the ideal motion curve; and calculating a transformation matrix capable of correcting the position information of all the images to be processed in the video to be processed at the imaging moment according to the actual camera position information of the images with different exposure durations in each video frame and the ideal camera position information of the video frame, and then performing HDR fusion on the images with different exposure durations based on the transformation matrix.
Further, in the technical solution of the present application, intra-frame alignment of video frames may also be performed. One implementation of intra-frame alignment is as follows: for each exposure image in each video frame, a transformation matrix capable of correcting the position information of each line in each exposure image at the imaging time is calculated by the position information of the camera at the imaging time of each line in the exposure image and the ideal position information of the camera in the video frame, so that each line in each exposure image can be aligned to the image of the video frame, which is shot by the camera at the ideal position, through the transformation matrix.
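A sketch of the per-row (intra-frame) alignment idea described above: each pixel row receives its own transformation built from the camera pose at that row's imaging instant. The pure-rotation model, the use of the destination row's pose as an approximation of the source row's pose, and all names are assumptions for illustration.

```python
import cv2
import numpy as np

def align_rows(image, row_rotvecs, ideal_rotvec, T):
    """Warp each pixel row of a rolling-shutter exposure towards the image the
    camera would have produced at the ideal pose.

    row_rotvecs : (h, 3) camera rotation vector at each row's imaging instant
    ideal_rotvec: (3,)   ideal camera rotation vector for this video frame
    T           : (3, 3) camera intrinsic (parameter) matrix
    """
    h, w = image.shape[:2]
    R_ideal, _ = cv2.Rodrigues(np.asarray(ideal_rotvec, dtype=np.float64))
    map_x = np.zeros((h, w), np.float32)
    map_y = np.zeros((h, w), np.float32)
    cols = np.arange(w, dtype=np.float64)
    for i in range(h):
        R_row, _ = cv2.Rodrigues(np.asarray(row_rotvecs[i], dtype=np.float64))
        # Per-row transform in T R T^-1 form, inverted so it maps ideal-pose
        # pixels back to the captured image (remap needs the inverse warp);
        # the destination row's pose approximates the source row's pose.
        H = T @ (R_row @ R_ideal.T) @ np.linalg.inv(T)
        src = H @ np.stack([cols, np.full(w, i, dtype=np.float64), np.ones(w)])
        map_x[i] = (src[0] / src[2]).astype(np.float32)
        map_y[i] = (src[1] / src[2]).astype(np.float32)
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)
```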
Fig. 3 is a schematic flow chart of a video processing method according to an embodiment of the present application. As shown in fig. 3, the method of the present embodiment may include S301, S302, S303, S304, S305, and S306. The video processing method may be performed by the digital imaging system shown in fig. 1.
S301, a to-be-processed video is obtained, wherein the to-be-processed video comprises J to-be-processed video frames, each to-be-processed video frame in the J to-be-processed video frames comprises K to-be-processed images, the K to-be-processed images correspond to K exposure durations one by one, and J and K are integers larger than 1.
It should be understood that the K to-be-processed images correspond one to one to the K exposure durations, that is, each to-be-processed image has a different exposure duration.
In this embodiment, the to-be-processed video represents a video before each frame in the video is fused into an HDR image, where the to-be-processed video may include J to-be-processed video frames, and each to-be-processed video frame includes K to-be-processed images with different exposure durations. For example, the video to be processed includes 30 video frames to be processed, and each video frame to be processed includes 3 images to be processed with different exposure time lengths.
In one implementation, images of K different exposure times may be obtained by an HDR capable sensor. It is noted that the HDR-capable sensor is capable of obtaining an LDR image for HDR fusion, and a specific implementation process thereof may be described with reference to the related art, and is not described herein again.
As an example, fig. 4 illustrates images with 2 different exposure durations obtained by an HDR-capable sensor provided in an embodiment of the present application, with long and short exposures performed alternately. As shown in fig. 4, each frame contains h image lines; when acquiring the images with different exposure durations, each line is first exposed long and then exposed short, and once the long and short exposures of the last line are finished, the long-exposure image and the short-exposure image of the frame are obtained.
Alternatively, the format of the long-exposure image and the short-exposure image may be a Bayer (Bayer) format, which is not limited in this embodiment.
It is noted that, reference may be made to the related art for implementing the process of obtaining images with different exposure durations by using the HDR-capable sensor, and details thereof are not repeated herein.
S302, acquiring position information of the shooting device of the to-be-processed video at the imaging moment of the first to-be-processed image among the K to-be-processed images contained in the mth of the J to-be-processed video frames, where m is an integer taken from 1 to J.
For example, the shooting device of the video to be processed may be a camera, or another shooting device that can acquire the video to be processed, which is not limited in this embodiment of the present application.
In the present embodiment, m is taken from 1 to J, that is, the position information of the shooting device at the imaging moment of the first to-be-processed image of the K to-be-processed images is acquired for every frame of the to-be-processed video.
It should be understood that each to-be-processed video frame in the present embodiment includes K to-be-processed images, and each to-be-processed image also has its corresponding imaging time, and the first to-be-processed image is one of the K to-be-processed images. For example, each video frame to be processed includes an image of a first exposure duration, an image of a second exposure duration, and an image of a third exposure duration, and the image of the second exposure duration is taken as the first image to be processed.
It should also be understood that when capturing K to-be-processed images in each to-be-processed video frame, each to-be-processed image has positional information of the corresponding capturing device at the imaging time, and therefore, positional information of the capturing device at the imaging time of the first to-be-processed image can also be obtained.
And S303, estimating a motion curve when the shooting device shoots the video to be processed according to the position information of the shooting device at the imaging time of the first image to be processed corresponding to all the video frames to be processed in the J video frames to be processed.
In this embodiment, every to-be-processed video frame among the J to-be-processed video frames contains a corresponding first to-be-processed image; once the position information of the shooting device at the imaging moments of the first to-be-processed images of all the to-be-processed video frames has been obtained, the motion curve of the shooting device for the to-be-processed video can be estimated from this position information.
It should be understood that the motion curve is estimated by the position information of the camera at the imaging instant of the first image to be processed in all video frames to be processed, and therefore, the motion curve can be considered as an ideal motion curve of the camera at different imaging instants.
In an implementation manner, the position information corresponding to the imaging time of the first to-be-processed image of all to-be-processed video frames in the J to-be-processed video frames of the shooting device may be fitted to obtain the motion curve when the to-be-processed video is shot by the shooting device.
As an example, as shown in fig. 5, the abscissa represents the imaging moments T_i of the first to-be-processed image in each to-be-processed video frame, and the ordinate represents the position information of the shooting device. The black dots represent the position information of the shooting device corresponding to the imaging moment of the first to-be-processed image in each to-be-processed video frame: at imaging moment T1 of the first to-be-processed image in the first to-be-processed video frame the corresponding position information is P1, at imaging moment T2 in the second to-be-processed video frame it is P2, at imaging moment T3 in the third to-be-processed video frame it is P3, and so on, up to imaging moment TJ of the first to-be-processed image in the Jth to-be-processed video frame, where it is PJ. The motion curve of the shooting device over all to-be-processed video frames is then estimated by fitting the position information corresponding to the imaging moments of the first to-be-processed images of all to-be-processed video frames, i.e. the curve in the figure is estimated by fitting P1, P2, P3, ..., PJ.
It should be noted that, in the fitting process, denoising processing may be performed on the position information of the shooting device corresponding to the first to-be-processed image in each to-be-processed video frame, then modeling is performed on the position information, and then a modeling parameter is estimated to obtain a motion curve, which is not limited in the embodiment of the present application.
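One way to realise the fitting described above is a smoothing fit over the per-frame camera positions. The Savitzky-Golay window and polynomial order are illustrative, and the position representation (a 3-vector per frame) is an assumption.

```python
import numpy as np
from scipy.signal import savgol_filter

def estimate_motion_curve(frame_times, frame_positions, window=31, order=3):
    """Fit a smooth motion curve through the shooting device's positions at the
    imaging instants of the first to-be-processed image of each video frame.

    frame_times     : (J,)   imaging instants T1 ... TJ
    frame_positions : (J, 3) camera positions/orientations P1 ... PJ
    Returns a callable giving the ideal position Q(t) at any instant t
    (window must be odd and not larger than J).
    """
    smoothed = savgol_filter(frame_positions, window_length=window, polyorder=order, axis=0)
    return lambda t: np.stack(
        [np.interp(t, frame_times, smoothed[:, k]) for k in range(3)], axis=-1)
```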
S304, determining the position information of the imaging moment of the first to-be-processed image in the mth to-be-processed video frame of the shooting device on the motion curve as the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame of the shooting device, wherein n is an integer and is taken from 1 to K.
In the present embodiment, n is an integer and is taken from 1 to K, that is, the position information of the imaging timing of the first to-be-processed image of the photographing device in the mth to-be-processed video frame on the motion curve is determined as the ideal position information of the imaging timing of the photographing device in the K to-be-processed images of the mth to-be-processed video frame.
When the motion curve of the shooting device has been obtained, the curve also provides corresponding position information of the shooting device at the imaging moment of the first to-be-processed image in each to-be-processed video frame. Taking fig. 5 as an example, the white dots represent the position information of the shooting device on the motion curve at the imaging moment of the first to-be-processed image in each to-be-processed video frame: at imaging moment T1 of the first to-be-processed image in the first to-be-processed video frame the position on the motion curve is Q1, at imaging moment T2 in the second to-be-processed video frame it is Q2, at imaging moment T3 in the third to-be-processed video frame it is Q3, and so on, up to imaging moment TJ of the first to-be-processed image in the Jth to-be-processed video frame, where it is QJ.
In the present embodiment, the position information of the shooting device on the motion curve at the imaging moment of the first to-be-processed image in the mth to-be-processed video frame is determined as the ideal position information of the shooting device at the imaging moments of the K to-be-processed images in the mth to-be-processed video frame; in other words, it is the ideal position information of the shooting device at the imaging moments of the K to-be-processed images in each to-be-processed video frame. For example, the position Q1 of the shooting device on the motion curve at the imaging moment T1 of the first to-be-processed image in the 1st to-be-processed video frame is determined as the ideal position information of the shooting device corresponding to the imaging moments of all the to-be-processed images in the first to-be-processed video frame.
S305, determining, according to the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, a first transformation matrix from the position information at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to the corresponding ideal position information.
As can be seen from S304, the ideal position information of the photographing device at the imaging time of all the images to be processed in the mth video frame to be processed is the same, and the ideal position information is the position information of the photographing device at the imaging time of the first image to be processed in the mth video frame to be processed on the motion curve.
It should be understood that the imaging moment of each to-be-processed image in each to-be-processed video frame has corresponding position information of the shooting device, and that this position information is inevitably affected by shake of the shooting device, so that the shooting device deviates from the ideal position at the moment of capture. Therefore, in order to correct the position information corresponding to the imaging moment of each to-be-processed image to the position the shooting device would occupy under the ideal position information, it is necessary to calculate, for each to-be-processed image in the mth to-be-processed video frame, a first transformation matrix from the position information at its imaging moment to the ideal position information of that video frame. The first transformation matrix can align an image captured under shake to the image the shooting device would capture at the ideal position, thereby completing the frame-to-frame alignment.
It should be noted that, for example, the first transformation matrix between the position information corresponding to each to-be-processed image and the ideal position information of the to-be-processed image may be calculated by referring to the description of the related art, and details are not repeated here.
S306, according to a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame, performing High Dynamic Range (HDR) fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
In this embodiment, for each to-be-processed video frame, after obtaining a first transformation matrix between position information corresponding to imaging time of K to-be-processed images and corresponding ideal position information, HDR fusion processing may be performed on the K to-be-processed images through the K to-be-processed images and the first transformation matrices corresponding to the K to-be-processed images, so as to fuse the K to-be-processed images into one HDR image, and further obtain a target video frame corresponding to each to-be-processed video frame, and further, the J target video frames constitute a target video.
In an implementation manner, for each video frame to be processed, affine transformation may be performed on a corresponding image to be processed by using K first transformation matrices to obtain K images after affine transformation of the image to be processed; and then inputting the affine-transformed image into an HDR fusion deep learning network, generating a fused HDR image, and taking the HDR image as a target video frame, wherein the specific implementation of the HDR image can refer to the description of the related technology, and is not described herein again.
In this implementation, for each to-be-processed image in each to-be-processed video frame, affine transformation is performed on the to-be-processed image by using the corresponding first transformation matrix, so that when the to-be-processed image is captured, an image that deviates from ideal position information due to shake of a capturing device can be aligned to the ideal position information, and thus, one-time calibration of the to-be-processed image can be completed.
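As a non-limiting illustration of the implementation manner described above, the following Python sketch shows how the K to-be-processed images of one to-be-processed video frame might be warped to the ideal position and then handed to a fusion network. The use of the OpenCV library, the 3x3 homography form of the first transformation matrices, and the hdr_net object with a fuse method are assumptions made for illustration only, not part of this application.

import cv2


def stabilize_and_fuse(images, first_matrices, hdr_net):
    # images: list of K to-be-processed images of one to-be-processed video frame
    # first_matrices: K 3x3 matrices mapping each image to the ideal camera position
    h, w = images[0].shape[:2]
    warped = []
    for img, H in zip(images, first_matrices):
        # align the image shot under shake to the image shot at the ideal position
        warped.append(cv2.warpPerspective(img, H, (w, h)))
    # the HDR fusion network performs the remaining fine registration and fusion
    return hdr_net.fuse(warped)

In this sketch the warp performs the coarse registration described above, and the fusion network is left to perform the fine registration.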
In another implementation manner, for each to-be-processed video frame, K to-be-processed images and corresponding first transformation matrices may be simultaneously input into an HDR fusion network, a fused HDR image is generated, and the HDR image is taken as a target video frame. The specific implementation thereof can be described with reference to the related art, and is not described herein again.
In the video processing method provided by this embodiment of the application, first, an ideal motion curve of the shooting device is estimated from the position information of the shooting device at the imaging time of the first to-be-processed image in all to-be-processed video frames, and the position information of the shooting device on the ideal motion curve corresponding to the imaging time of the first to-be-processed image is used as the ideal position information of the shooting device for each to-be-processed video frame. Then, a first transformation matrix capable of correcting the imaging-time position information of all to-be-processed images in each to-be-processed video frame to the ideal position information is calculated. Finally, a target HDR image of each to-be-processed video frame is fused from all to-be-processed images in that video frame and the corresponding first transformation matrices, and a target video is further generated.
In the technical scheme of the application, before the HDR fusion is performed, the first transformation matrices between the imaging-time position information of all the to-be-processed images in each to-be-processed video frame and the ideal position information have already been calculated; that is, video image stabilization is placed before HDR fusion, which avoids the amplification of ghosts during image stabilization. Through the first transformation matrices, all to-be-processed images in each to-be-processed video frame can be corrected to the ideal position; in other words, before HDR fusion is performed, all to-be-processed images in each to-be-processed video frame are registered to the image shot by the shooting device at the same ideal position, so that a first registration of the to-be-processed images is completed, and the subsequent HDR fusion module can perform a further registration and fusion, forming a coarse-to-fine registration between the to-be-processed images. Therefore, the method also improves the HDR fusion precision and reduces the occurrence probability of ghosts.
As an alternative embodiment, when the image to be processed includes h rows of pixels, the step S305 may further include: determining a second transformation matrix from the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the shooting device at the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, wherein h is an integer greater than 1, i is an integer and is taken from 1 to h. Accordingly, S306 includes: according to a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and a second transformation matrix from the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, performing high dynamic range HDR fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
The ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame may refer to the related description of the embodiment shown in fig. 3, and is not repeated here.
It should be understood that, for each to-be-processed image in each to-be-processed video frame, the shooting device also has corresponding position information at the imaging time of each row of the to-be-processed image, and this position information is inevitably affected by shake of the shooting device, so that each row of the to-be-processed image obtained by the shooting device deviates from the ideal position information. Therefore, in order to correct the position information corresponding to the imaging time of each row of each to-be-processed image to the image shot by the shooting device at the same ideal position, a second transformation matrix between the position information corresponding to the imaging time of the ith row of each to-be-processed image in the mth to-be-processed video frame and the ideal position information needs to be calculated.
After the second transformation matrix is obtained, according to a first transformation matrix between the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information, and according to a second transformation matrix between the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information of the imaging time of the ith row, performing high dynamic range HDR fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame.
In this embodiment, when each to-be-processed image includes h rows, each row corresponds to one second transformation matrix, so each to-be-processed image corresponds to h second transformation matrices, and therefore each row of the to-be-processed image can be corrected to the ideal position.
The specific implementation process of performing the high dynamic range HDR fusion processing on the K to-be-processed images in the mth to-be-processed video frame according to the first transformation matrix between the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information, and the second transformation matrix between the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame and the corresponding ideal position information, may refer to the related technical description and is not repeated here.
In the technical scheme provided by this embodiment of the application, in addition to the first transformation matrix for correcting each image to be processed to the ideal position, the second transformation matrix for correcting each row of each image to be processed to the ideal position is also calculated. Therefore, while the image to be processed as a whole is corrected to the ideal position, each row of the image to be processed can also be corrected to the ideal position, which completes the registration between the rows within each image to be processed and improves the accuracy of each shot image to be processed. This in turn improves the degree of registration between the images to be processed in each video frame and reduces the ghost problem occurring in the fusion process.
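Purely as an illustrative sketch of such per-row correction, the following Python function back-maps every output row of an image through the inverse of its second transformation matrix and resamples it with OpenCV. The 3x3 homography form of the per-row matrices and the function name correct_rows are assumptions made for illustration, not a description of the actual implementation.

import numpy as np
import cv2


def correct_rows(image, row_matrices):
    # image: h x w to-be-processed image; row_matrices: h matrices, one per row,
    # each mapping that row from its actual imaging position to the ideal position
    h, w = image.shape[:2]
    map_x = np.zeros((h, w), np.float32)
    map_y = np.zeros((h, w), np.float32)
    xs = np.arange(w, dtype=np.float32)
    for i in range(h):
        # back-map the pixels of output row i through the inverse of its matrix
        Hi_inv = np.linalg.inv(row_matrices[i])
        pts = np.stack([xs, np.full(w, i, np.float32), np.ones(w, np.float32)])
        src = Hi_inv @ pts
        map_x[i] = src[0] / src[2]
        map_y[i] = src[1] / src[2]
    return cv2.remap(image, map_x, map_y, cv2.INTER_LINEAR)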
As an alternative embodiment, an implementation manner of S302 includes: according to the information recorded by the motion sensor, acquiring sensor information of the imaging moment of a first to-be-processed image in K to-be-processed images contained in the mth to-be-processed video frame of the J to-be-processed video frames by the shooting device through an interpolation function; and integrating the sensor information of the first image to be processed at the imaging moment to obtain the position information of the first image to be processed at the imaging moment.
Wherein, the information recorded by the motion sensor can represent the motion information when the video to be processed is shot.
As an example, let y(n) denote the information recorded by the motion sensor at the nth sampling time, and let t_j^k denote the imaging time corresponding to the imaging center of the to-be-processed image whose exposure duration is t_k in the jth to-be-processed video frame. The sensor information at this imaging time, denoted y(t_j^k), is obtained by interpolation as

y(t_j^k) = f(y(1), y(2), ..., y(N); t_j^k),

where N represents the number of points sampled by the motion sensor and f(·) represents the interpolation function. Taking a 100 Hz gyroscope as an example, the gyroscope records 100 pieces of camera rotation information every second, and the sensor information at a given time can be interpolated from the information recorded by the gyroscope at its sampling times. Finally, the sensor information is integrated to obtain the camera pose P_j^k corresponding to the imaging moment.
The specific implementation process of integrating the sensor information may refer to the description of related technologies, and is not described herein again.
It should be noted that the information recorded by the motion sensor in the embodiment of the present application may include information such as an angular velocity and an angle when the photographing device rotates, and the embodiment of the present application is not limited thereto.
Optionally, the motion sensor comprises a gyroscope or an inertial measurement unit.
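For illustration only, the following Python sketch shows one possible way to interpolate gyroscope information at an imaging moment and integrate it into a camera orientation. The sampling layout, the use of SciPy's Rotation class, and the function name pose_at are assumptions rather than a description of the actual implementation.

import numpy as np
from scipy.spatial.transform import Rotation


def pose_at(imaging_time, sample_times, gyro_samples):
    # gyro_samples: N x 3 angular velocities recorded at sample_times (e.g. 100 Hz)
    # interpolate the sensor information at the imaging moment (the function f above)
    omega_t = np.array([np.interp(imaging_time, sample_times, gyro_samples[:, k])
                        for k in range(3)])
    # integrate the samples up to the imaging moment to obtain the camera orientation
    R = Rotation.identity()
    prev_t = sample_times[0]
    for t, omega in zip(sample_times, gyro_samples):
        if t > imaging_time:
            break
        R = R * Rotation.from_rotvec(omega * (t - prev_t))
        prev_t = t
    return omega_t, R.as_matrix()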
As an alternative embodiment, the S305 includes the following steps: calculating a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth image to be processed in the mth video frame to be processed and the position information of the shooting device at the imaging moment of the nth image to be processed in the mth video frame to be processed; and transforming the matrix R according to the formula TRT^(-1) to obtain the first transformation matrix, wherein T is a parameter matrix of the shooting device.
Taking fig. 5 as an example, for the first to-be-processed video frame, the position information of the camera corresponding to the imaging time of the first to-be-processed image is P1, and the ideal position information of the shooting device corresponding to that imaging time is Q1. The transformation matrix between the position information of the shooting device corresponding to the imaging time of the first to-be-processed image and the ideal position information is then R1, and the first transformation matrix is obtained from R1 through the transformation TRT^(-1), which maps the image shot at the actual camera position to the image that would be shot with the camera at the ideal position.
It should be noted that the manner of calculating the transformation matrix R and of performing the transformation according to the formula TRT^(-1) may refer to the description of the related art, and is not repeated here.
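As a minimal sketch of the formula TRT^(-1), assuming the actual and ideal positions are expressed as 3x3 rotation matrices and T is the 3x3 intrinsic parameter matrix of the shooting device (the variable names are illustrative):

import numpy as np


def first_transformation_matrix(P_actual, Q_ideal, T):
    # P_actual: rotation of the camera at the actual imaging moment
    # Q_ideal:  rotation of the camera at the ideal position on the motion curve
    # T:        parameter (intrinsic) matrix of the shooting device
    R = Q_ideal @ P_actual.T          # transformation between the two camera poses
    return T @ R @ np.linalg.inv(T)   # image-plane warp according to TRT^(-1)

Under these assumptions the returned 3x3 matrix can be applied to image coordinates as the first transformation matrix discussed above.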
Alternatively, each of the images to be processed in the above embodiments may be a RAW image.
In some implementation manners of this embodiment, the K to-be-processed images include three to-be-processed images, which are respectively referred to as a first exposure image, a second exposure image, and a third exposure image, and the first exposure image, the second exposure image, and the third exposure image correspond to a first exposure duration, a second exposure duration, and a third exposure duration one to one, where the first exposure duration is greater than the second exposure duration, the second exposure duration is greater than the third exposure duration, and the second exposure image is used as the first to-be-processed image. The first exposure image may also be referred to as a long exposure image, the second exposure image may also be referred to as a medium exposure image, and the third exposure image may also be referred to as a short exposure image.
Taking the K to-be-processed images including three to-be-processed images, each to-be-processed image being a RAW image as an example, the video processing method according to the embodiment of the present application is described in detail with reference to fig. 6.
As shown in fig. 6, the method of this embodiment may include S601, S602, S603, S604, S605, S606, and S607. The video processing method may be performed by the digital imaging system shown in fig. 1.
S601, a to-be-processed video is obtained, wherein the to-be-processed video comprises J to-be-processed video frames, each to-be-processed video frame in the J to-be-processed video frames comprises 3 to-be-processed images, the 3 to-be-processed images correspond to 3 exposure durations one by one, and J is an integer larger than 1.
In this embodiment, each to-be-processed video frame includes 3 to-be-processed images, which are a long-exposure image, a medium-exposure image, and a short-exposure image, and a detailed implementation process of this step may refer to S301, which is not described herein again.
And S602, acquiring gyroscope data.
In this embodiment, the gyroscope data represents information recorded by the motion sensor.
And S603, obtaining the camera poses of the 3 to-be-processed images in each to-be-processed video frame at the corresponding imaging times according to the gyroscope data.
In this embodiment, the camera pose may be regarded as position information of the shooting device, and may be specifically described with reference to the above embodiments, which are not described herein again.
That is, the position information of the camera at the imaging time corresponding to the long-exposure image, the medium-exposure image, and the short-exposure image in each video frame to be processed is acquired.
In one implementation, interpolation may be performed on the gyroscope data, and the camera poses of the 3 to-be-processed images in each to-be-processed video frame at the corresponding imaging time are obtained through integration.
In this step, a specific implementation process of how to obtain the camera pose may be described with reference to the related embodiments, and details are not repeated here.
And S604, taking the middle exposure images in all the video frames to be processed as reference frames, and estimating a motion curve when the camera shoots the video to be processed according to the camera pose at the imaging moment of the middle exposure images in all the frames.
In this embodiment, the motion curve of the video to be processed is estimated with reference to the position information of the camera at the imaging time of the middle exposure image in each video frame to be processed.
In this step, the motion curve of the video to be processed shot by the camera is estimated, which may refer to the description in the embodiment shown in fig. 5, and is not described herein again.
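One possible way to estimate such a smooth motion curve, given here only as an illustrative sketch and not as the method actually used in this application, is to low-pass filter the sequence of reference-frame camera rotations. The Gaussian filtering below, the small-rotation assumption behind smoothing rotation vectors directly, and the function name estimate_motion_curve are all assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.spatial.transform import Rotation


def estimate_motion_curve(ref_rotations, sigma=5.0):
    # ref_rotations: J rotation matrices, one per reference (middle-exposure) frame
    rotvecs = Rotation.from_matrix(np.stack(ref_rotations)).as_rotvec()
    smooth = gaussian_filter1d(rotvecs, sigma=sigma, axis=0)  # smoothed motion curve
    return Rotation.from_rotvec(smooth).as_matrix()           # ideal pose per frame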
And S605, taking the position information of the imaging time of the middle exposure image in each video frame to be processed on the motion curve as the ideal position information of the imaging time of the camera in 3 images to be processed in each video frame.
In this embodiment, for each video frame to be processed, the ideal position information of the camera is referenced to the position information, on the motion curve, of the imaging time of the middle exposure image in that video frame. That is, in each video frame to be processed, when the long exposure image, the middle exposure image and the short exposure image are shot, the ideal position information of the camera is the position information of the imaging time of the middle exposure image on the motion curve.
The relevant implementation process of this step may refer to the relevant description of S304, and is not described herein again.
S606, calculating first transformation matrixes respectively corresponding to the long exposure image, the medium exposure image and the short exposure image in each video frame to be processed, and calculating second transformation matrixes corresponding to each row in each image to be processed in each video frame to be processed.
In this embodiment, in each video frame to be processed, the ideal position information of the camera is the position information corresponding to the middle exposure image in the video frame to be processed at the imaging time on the motion curve.
It should be understood that, when the camera takes the long exposure image, the medium exposure image and the short exposure image, the camera inevitably shakes, so the position information of the camera may differ between the three images; for each video frame to be processed, however, there is corresponding ideal position information of the camera. Therefore, in the embodiment of the present application, a first transformation matrix corresponding to each of the long exposure image, the medium exposure image and the short exposure image in each video frame to be processed is calculated from the actual position information and the ideal position information of the camera at the imaging times of the 3 images to be processed in that video frame. The first transformation matrix can align an image taken under shake to the image taken at the ideal position of the camera, thereby completing the alignment between frames.
It should also be understood that the camera may have different position information when capturing each row of the same image, since the camera inevitably shakes while each row of the long exposure image, the medium exposure image and the short exposure image is captured. Therefore, in this embodiment, a second transformation matrix corresponding to each row of each image to be processed in each video frame to be processed is also calculated from the actual position information and the ideal position information of the camera at the imaging time of that row; this second transformation matrix can align each row of an image captured under shake to the corresponding row captured at the ideal position of the shooting device, thereby completing the alignment within the frame.
The specific implementation process of calculating the first transformation matrix and the second transformation matrix may refer to the description of the related embodiments and is not described herein again.
S607, according to the first transformation matrix corresponding to the 3 images to be processed in each video frame to be processed and the second transformation matrix corresponding to each row of each image to be processed in each video frame to be processed, performing High Dynamic Range (HDR) fusion processing on the 3 images to be processed in each video frame to be processed to obtain a target video frame corresponding to each video frame to be processed, and further forming a target video by J target video frames.
In this embodiment, each to-be-processed video frame includes 3 to-be-processed images, each of the 3 to-be-processed images corresponds to a first transformation matrix, each row of each to-be-processed image in the 3 to-be-processed images also corresponds to a second transformation matrix, and if the to-be-processed image includes h rows, each to-be-processed image corresponds to h second transformation matrices, that is, each to-be-processed image corresponds to one first transformation matrix and h second transformation matrices.
After the first transformation matrix and the corresponding second transformation matrix of each image to be processed are obtained, HDR fusion processing can be performed on 3 images to be processed in the video frame to be processed through the first transformation matrix and the corresponding second transformation matrix corresponding to each image to be processed.
The specific implementation process of the HDR fusion processing may be described in S306 or related technologies, and is not described herein again.
In the video processing method provided by this embodiment of the application, first, an ideal motion curve of the shooting device is estimated from the position information of the shooting device at the imaging times of the middle exposure images in all video frames to be processed, and the position information of the shooting device on the ideal motion curve corresponding to the imaging time of the middle exposure image is used as the ideal position information of each video frame to be processed at the time of shooting. Then, a first transformation matrix for correcting the imaging-time position information of all images to be processed in each video frame to be processed to the ideal position information, and a second transformation matrix for correcting each row of each image to be processed to the ideal position, are calculated. Finally, all images to be processed in each video frame to be processed are fused, according to the first transformation matrices and the second transformation matrices, into a target HDR image of that video frame, and a target video is further generated.
Alternatively, the gyroscope in the present embodiment may be replaced with an inertial measurement unit.
Fig. 7 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application. The video processing apparatus shown in fig. 7 may be configured to perform the video processing method according to any one of the foregoing embodiments.
As shown in fig. 7, the video processing apparatus 700 of the present embodiment includes: a video to be processed acquiring module 701, a first image to be processed position information acquiring module 702, an estimating module 703, an ideal position information determining module 704, a first transformation matrix determining module 705 and a fusing module 706. The to-be-processed video acquiring module 701 is configured to acquire a to-be-processed video.
The position information acquiring module 702 of the first to-be-processed image is used to acquire the position information of the shooting device at the imaging time of the first to-be-processed image in the K to-be-processed images contained in the mth to-be-processed video frame of the J to-be-processed video frames, where m is an integer and is taken from 1 to J.
The estimation module 703 is configured to estimate a motion curve when the to-be-processed video is captured by the capturing device according to the position information of the imaging time of the first to-be-processed image corresponding to each of all to-be-processed video frames in the J to-be-processed video frames of the capturing device.
The ideal position information determining module 704 is configured to determine the position information of the imaging time of the first to-be-processed image in the mth to-be-processed video frame of the shooting device on the motion curve as the ideal position information of the imaging time of the shooting device in the nth to-be-processed image in the mth to-be-processed video frame, where n is an integer and is taken from 1 to K.
The first transformation matrix determining module 705 is configured to determine a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging time of the imaging device in the mth to-be-processed video frame and the position information of the imaging time of the imaging device in the nth to-be-processed image in the mth to-be-processed video frame.
The fusion module 706 is configured to perform high dynamic range HDR fusion processing on K to-be-processed images in the mth to-be-processed video frame according to a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame, so as to obtain a target video frame corresponding to the mth to-be-processed video frame, where J target video frames corresponding to the J to-be-processed video frames constitute a target video.
As an example, the to-be-processed video obtaining module 701 may be configured to perform the step of obtaining the to-be-processed video in the video processing method described in any one of fig. 3 or fig. 6. For example, the to-be-processed video acquisition module 701 is configured to execute S301.
As another example, the estimation module 703 may be used to perform the step of estimating the motion curve when the to-be-processed video is captured by the capturing device in the video processing method described in any one of fig. 3 or fig. 6. For example, the estimation module 703 is configured to perform S303 or S602.
As still another example, the fusion module 706 may be configured to execute the HDR fusion processing step performed on the to-be-processed image in each to-be-processed video frame in the video processing method described in any one of fig. 3 or fig. 6. For example, the fusion module 706 is configured to perform S306 or S607.
In one possible implementation manner, the image to be processed includes h rows of pixels, and the apparatus further includes: a second transformation matrix determining module 707, configured to determine a second transformation matrix from the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the photographing device at the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the photographing device at the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, where h is an integer greater than 1, i is an integer and is taken from 1 to h; accordingly, the fusion module 706 is further configured to: according to a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame, and a second transformation matrix from the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, perform high dynamic range HDR fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
In a possible implementation manner, the first to-be-processed image location information obtaining module 702 is specifically configured to: according to the information recorded by the motion sensor, obtain the sensor information of the shooting device at the imaging moment of the first to-be-processed image in the K to-be-processed images contained in the mth to-be-processed video frame of the J to-be-processed video frames; and integrate the sensor information of the imaging moment of the first to-be-processed image to obtain the position information of the imaging moment of the first to-be-processed image.
In one possible implementation, the motion sensor comprises a gyroscope or an inertial measurement unit.
In a possible implementation manner, the first transformation matrix determining module 705 is specifically configured to: calculate a transformation matrix R between the ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame; and transform the matrix R according to the formula TRT^(-1) to obtain a first transformation matrix, wherein T is a parameter matrix of the shooting device.
In a possible implementation manner, the fusion module 706 is specifically configured to: carrying out affine transformation on the nth to-be-processed image in the mth to-be-processed video frame through the corresponding first transformation matrix to obtain an image subjected to affine transformation on the nth to-be-processed image in the mth to-be-processed video frame; inputting the affine-transformed image into an HDR fusion module to generate a target video frame corresponding to the mth video frame to be processed; or, the nth to-be-processed image in the mth to-be-processed video frame and the corresponding first transformation matrix are simultaneously input to the HDR fusion module, and a target video frame corresponding to the mth to-be-processed video frame is generated.
In one possible implementation, each of the to-be-processed images is a native RAW image.
In a possible implementation manner, the K images to be processed include a first exposure image, a second exposure image, and a third exposure image, where the first exposure image, the second exposure image, and the third exposure image correspond to a first exposure duration, a second exposure duration, and a third exposure duration one to one, the first exposure duration is greater than the second exposure duration, the second exposure duration is greater than the third exposure duration, and the second exposure image is the first image to be processed.
Fig. 8 is a schematic structural diagram of a video processing apparatus according to another embodiment of the present application. The apparatus shown in fig. 8 may be used to perform the video processing method described in any of the foregoing embodiments.
As shown in fig. 8, the apparatus 800 of the present embodiment includes: memory 801, processor 802, communication interface 803, and bus 804. The memory 801, the processor 802, and the communication interface 803 are communicatively connected to each other via a bus 804.
The memory 801 may be a Read Only Memory (ROM), a static memory device, a dynamic memory device, or a Random Access Memory (RAM). The memory 801 may store a program, and the processor 802 is configured to perform the steps of the method shown in fig. 3 when the program stored in the memory 801 is executed by the processor 802.
The processor 802 may be a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to execute related programs to implement the methods of the embodiments of the present application.
The processor 802 may also be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the method of the embodiments of the present application may be implemented by integrated logic circuits of hardware in the processor 802 or instructions in the form of software.
The processor 802 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 801, and the processor 802 reads the information in the memory 801 and, in combination with its hardware, completes the functions required to be performed by the units included in the video processing apparatus of the present application; for example, the steps/functions of the embodiments shown in fig. 3 or fig. 6 may be performed.
The communication interface 803 may enable communication between the apparatus 800 and other devices or communication networks using, but not limited to, transceiver means such as transceivers.
The bus 804 may include a pathway to transfer information between various components of the apparatus 800 (e.g., memory 801, processor 802, communication interface 803).
It should be understood that the apparatus 800 shown in the embodiment of the present application may be an electronic device, or may also be a chip configured in the electronic device.
It should be understood that the processor in the embodiments of the present application may be a Central Processing Unit (CPU), and the processor may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory in the embodiments of the subject application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of Random Access Memory (RAM) are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or in a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone, wherein A and B can be singular or plural. In addition, the "/" in this document generally indicates that the former and latter associated objects are in an "or" relationship, but may also indicate an "and/or" relationship, which may be understood with particular reference to the former and latter text.
In the present application, "at least one" means one or more, "a plurality" means two or more. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of the singular or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or multiple.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not imply any order of execution, and the order of execution of the processes should be determined by their functions and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: u disk, removable hard disk, read only memory, random access memory, magnetic or optical disk, etc. for storing program codes.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A video processing method, comprising:
acquiring a video to be processed, wherein the video to be processed comprises J video frames to be processed, each video frame to be processed in the J video frames to be processed comprises K images to be processed, the K images to be processed correspond to K exposure durations in a one-to-one manner, and J and K are integers greater than 1;
acquiring the position information of the imaging moment of the mth to-be-processed video frame in the J to-be-processed video frames, which contains the first to-be-processed image in the K to-be-processed images, of the shooting device of the to-be-processed video, wherein m is an integer and is taken from 1 to J;
estimating a motion curve when the shooting device shoots the video to be processed according to the position information of the shooting device at the imaging time of the first image to be processed corresponding to all the video frames to be processed in the J video frames to be processed respectively;
determining the position information of the shooting device at the imaging moment of the first image to be processed in the mth video frame to be processed on the motion curve as the ideal position information of the shooting device at the imaging moment of the nth image to be processed in the mth video frame to be processed, wherein n is an integer and is taken from 1 to K;
determining a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the imaging time of the imaging device in the nth to-be-processed image in the mth to-be-processed video frame and the position information of the imaging time of the imaging device in the nth to-be-processed image in the mth to-be-processed video frame;
according to a first transformation matrix from the position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame, performing High Dynamic Range (HDR) fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
2. The method of claim 1, wherein the image to be processed comprises h rows of pixels, the method further comprising:
determining a second transformation matrix from the position information of the imaging time of the ith row of the nth image to be processed in the mth video frame to be processed to the ideal position information of the imaging time of the ith row of the nth image to be processed in the mth video frame to be processed according to the ideal position information of the shooting device at the imaging time of the nth image to be processed in the mth video frame to be processed and the position information of the shooting device at the imaging time of the ith row of the nth image to be processed in the mth video frame to be processed, wherein h is an integer larger than 1, i is an integer and is taken from 1 to h;
correspondingly, according to a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame, performing high dynamic range HDR fusion processing on K to-be-processed images in the mth to-be-processed video frame to obtain a target video frame corresponding to the mth to-be-processed video frame, where J target video frames corresponding to the J to-be-processed video frames constitute a target video, including:
performing High Dynamic Range (HDR) fusion processing on K to-be-processed images in the mth to-be-processed video frame according to a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and a second transformation matrix from the position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the ith row of the nth to-be-processed image in the mth to-be-processed video frame, to obtain a target video frame corresponding to the mth to-be-processed video frame, wherein J target video frames corresponding to the J to-be-processed video frames form a target video.
3. The method according to claim 1 or 2, wherein the acquiring of the position information of the photographing device of the to-be-processed video at the imaging time when the mth to-be-processed video frame of the J to-be-processed video frames contains the first to-be-processed image of the K to-be-processed images comprises:
according to the information recorded by the motion sensor, obtaining the sensor information of the imaging moment of the first to-be-processed image in K to-be-processed images contained in the mth to-be-processed video frame in the J to-be-processed video frames by the shooting device through an interpolation function;
and integrating the sensor information of the first image to be processed at the imaging moment to obtain the position information of the first image to be processed at the imaging moment.
4. The method of claim 3, wherein the motion sensor comprises a gyroscope or an inertial measurement unit.
5. The method according to any one of claims 1 to 4, wherein determining a first transformation matrix from the position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame to the ideal position information of the imaging time of the nth to-be-processed image in the mth to-be-processed video frame according to the ideal position information of the shooting device at the imaging time of the nth to-be-processed image in the mth to-be-processed video frame and the position information of the shooting device at the imaging time of the nth to-be-processed image in the mth to-be-processed video frame comprises:
calculating a transformation matrix R between ideal position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame and position information of the shooting device at the imaging moment of the nth to-be-processed image in the mth to-be-processed video frame;
transforming the matrix R according to the formula TRT^(-1) to obtain the first transformation matrix, wherein T is a parameter matrix of the shooting device.
6. The method according to any one of claims 1 to 5, wherein performing High Dynamic Range (HDR) fusion processing on K images to be processed in the mth video frame according to a first transformation matrix between actual position information corresponding to an imaging time of an nth image to be processed in the mth video frame to be processed and corresponding ideal position information to obtain a target video frame corresponding to the mth video frame, includes:
carrying out affine transformation on the nth to-be-processed image in the mth to-be-processed video frame through a corresponding first transformation matrix to obtain an affine-transformed image of the nth to-be-processed image in the mth to-be-processed video frame;
inputting the affine-transformed image into an HDR fusion module, and generating a target video frame corresponding to the mth video frame to be processed; alternatively, the first and second electrodes may be,
and simultaneously inputting the nth to-be-processed image in the mth to-be-processed video frame and the corresponding first transformation matrix into an HDR fusion module to generate a target video frame corresponding to the mth to-be-processed video frame.
7. The method according to any of claims 1 to 6, wherein each image to be processed is a native RAW image.
8. The method according to any one of claims 1 to 7, wherein the K images to be processed comprise a first exposure image, a second exposure image and a third exposure image, the first exposure image, the second exposure image and the third exposure image are in one-to-one correspondence with a first exposure duration, a second exposure duration and a third exposure duration, the first exposure duration is greater than the second exposure duration, the second exposure duration is greater than the third exposure duration, and the second exposure image is the first image to be processed.
9. A video processing apparatus, characterized in that the apparatus comprises functional modules for performing the method according to any of claims 1 to 8.
10. A video processing apparatus, comprising: a memory and a processor;
the memory is to store program instructions;
the processor is configured to invoke program instructions in the memory to perform the video processing method of any of claims 1 to 8.
11. A chip comprising at least one processor and a communication interface, the communication interface and the at least one processor being interconnected by a line, the at least one processor being configured to execute a computer program or instructions to perform the method of any one of claims 1 to 8.
12. A computer-readable medium, characterized in that the computer-readable medium stores program code for computer execution, the program code comprising instructions for performing the method of any of claims 1 to 8.
13. A computer program product comprising computer program code which, when run on a computer, causes the computer to carry out the method according to any one of claims 1 to 8.
CN202110245949.8A 2021-03-05 2021-03-05 Video processing method and processing device Active CN115037915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245949.8A CN115037915B (en) 2021-03-05 2021-03-05 Video processing method and processing device

Publications (2)

Publication Number Publication Date
CN115037915A true CN115037915A (en) 2022-09-09
CN115037915B CN115037915B (en) 2023-11-14

Family

ID=83117755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245949.8A Active CN115037915B (en) 2021-03-05 2021-03-05 Video processing method and processing device

Country Status (1)

Country Link
CN (1) CN115037915B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833471A (en) * 2011-06-15 2012-12-19 奥林巴斯映像株式会社 Imaging device and imaging method
CN110121882A (en) * 2017-10-13 2019-08-13 华为技术有限公司 A kind of image processing method and device
US20200211166A1 (en) * 2018-12-28 2020-07-02 Qualcomm Incorporated Methods and apparatus for motion compensation in high dynamic range processing
CN111479072A (en) * 2020-04-14 2020-07-31 深圳市道通智能航空技术有限公司 High dynamic range image synthesis method and device, image processing chip and aerial camera

Also Published As

Publication number Publication date
CN115037915B (en) 2023-11-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant