CN113841112A - Image processing method, camera and mobile terminal - Google Patents


Info

Publication number
CN113841112A
Authority
CN
China
Prior art keywords
video frame
video
camera
frame
target
Prior art date
Legal status
Pending
Application number
CN202080035108.8A
Other languages
Chinese (zh)
Inventor
李广
朱传杰
李志强
李静
Current Assignee
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN113841112A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiment of the present application discloses an image processing method, comprising: acquiring a clone-effect instruction; and processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video includes the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay. The method disclosed in the embodiment of the present application realizes a video clone effect, makes video creation more engaging for the user, and enables the user to produce creative videos.

Description

Image processing method, camera and mobile terminal
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image processing method, a camera, a mobile terminal, and a computer-readable storage medium.
Background
With the development of video technology, more and more electronic devices are able to capture video. By shooting video, people can easily record what they see, and after shooting a video they can add various effects to it to make the content more creative.
Disclosure of Invention
The embodiments of the present application provide an image processing method, a camera, a mobile terminal, and a computer-readable storage medium that can realize a clone video effect.
A first aspect of the embodiments of the present application provides an image processing method, including:
acquiring a clone-effect instruction;
processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video includes the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
A second aspect of the embodiments of the present application provides a camera, including: a processor and a memory storing a computer program;
when executing the computer program, the processor implements the following steps:
acquiring a clone-effect instruction;
processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video includes the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
A third aspect of the embodiments of the present application provides a mobile terminal, including: a processor and a memory storing a computer program;
when executing the computer program, the processor implements the following steps:
acquiring a clone-effect instruction;
processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video includes the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements any one of the image processing methods of the first aspect.
With the image processing method provided by the embodiments of the present application, after a clone-effect instruction is acquired, an original video in which a moving subject is captured can be processed so that the moving subject in the video has at least one dynamic clone, and the dynamic clone repeats the motion of the moving subject with a specified time delay. The embodiments of the present application thus provide a video clone effect, make video creation more engaging for the user, and enable the user to produce creative videos.
Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1A is an nth frame in an original video provided by an embodiment of the present application.
Fig. 1B is an effect diagram of the nth frame shown in fig. 1A after processing.
Fig. 2 is a flowchart of an image processing method according to an embodiment of the present application.
Fig. 3 is a structural diagram of a camera according to an embodiment of the present disclosure.
Fig. 4 is a block diagram of a mobile terminal according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
With the development of video technology, more and more electronic devices are able to capture video. By shooting video, people can easily record what they see, and after shooting a video they can add various effects to it to make the content more creative.
The embodiments of the present application provide an image processing method that can add a clone effect to a moving subject in a video: the moving subject gains at least one dynamic clone corresponding to it, and each dynamic clone repeats the motion of the moving subject with a specified time delay.
Reference may be made to fig. 1A and fig. 1B, where fig. 1A is the nth frame of an original video provided by an embodiment of the present application, and fig. 1B is an effect diagram of the nth frame after processing. If the moving subject in the nth frame of the original video is X, then in the nth frame of the target video (i.e., the video obtained by processing the original video) the moving subject X may have at least one clone, such as the two clones X′ and X″ in fig. 1B. The motion performed by a clone in the nth frame is motion that the moving subject X has already performed; for example, the motion performed by X′ may be the motion performed by the moving subject X five frames earlier, and the motion performed by X″ may be the motion performed by X ten frames earlier.
It should be noted that fig. 1A and 1B show only a single video frame before and after processing. When a sequence of video frames is played continuously, each clone is not static; it dynamically repeats the motion of the moving subject with a certain time delay, that is, each clone is a dynamic clone.
It should also be noted that fig. 1A and 1B are only examples provided for ease of understanding. In practical applications, the parameters of the clone effect, such as the number of clones, the delay of each clone, and the transparency of the clones, may be set by the user or may use the system's default values; implementations of these options are described later.
The image processing method provided by the embodiments of the present application can thus realize the clone effect, make video creation more engaging for the user, and enable the user to produce creative videos.
Referring to fig. 2, fig. 2 is a flowchart of an image processing method according to an embodiment of the present application. The method can be applied to cameras, mobile terminals, image processing devices and other electronic devices, and includes the following steps:
S210: acquire a clone-effect instruction.
S220: process, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video with the clone effect.
The clone-effect instruction may be triggered by the user. In one example, the instruction may be issued through a button in an interactive interface: when the user taps the button, clone-effect processing of the original video is triggered. In another example, the instruction may be issued through a physical button. Of course, the clone-effect instruction may also be triggered in other ways, such as by voice or a touch gesture.
The clone-effect instruction may carry one or more of the following parameters: the number of clones, the clone frame interval, and the clone transparency. The clone frame interval is the number of frames by which the motion of adjacent clones differs. As mentioned above, these parameters may be set by the user or may use the system's default values.
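For illustration only, the following minimal sketch shows how such parameters might be grouped in code; the names and default values are assumptions of this sketch, not terms used by the present application.

```python
from dataclasses import dataclass

@dataclass
class CloneEffectParams:
    """Hypothetical container for the parameters a clone-effect instruction may carry."""
    num_clones: int = 2        # number of clones to render
    frame_interval: int = 3    # frames of motion lag between adjacent clones
    transparency: float = 0.5  # base transparency applied to each clone

# User-selected values overriding the assumed system defaults.
params = CloneEffectParams(num_clones=3, frame_interval=5, transparency=0.4)
```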
After the clone-effect instruction is acquired, the original video in which the moving subject is captured may be processed. In one embodiment, this processing may include the following steps:
S221: acquire a first video frame and a second video frame from the original video in which the moving subject is captured.
S222: map the first video frame to the space corresponding to the second video frame.
S223: synthesize a first target video frame from the mapped first video frame and the second video frame.
For example, the second video frame may be the ith frame, and the first video frame may be an earlier frame whose index is smaller than i, such as frame i-3 or frame i-5.
The clone effect can be realized by fusing the moving subject of the first video frame into the second video frame, so that the moving subject in the second video frame gains a clone, the clone being the moving subject as it appeared in the first video frame.
A photographer usually changes the shooting angle while following a moving subject; in other words, the shooting angle corresponding to the first video frame may differ from that of the second video frame. Therefore, when the moving subject of the first video frame is fused into the second video frame, the first video frame may first be mapped to the space corresponding to the second video frame and the two frames synthesized afterwards, so that the clone effect looks more natural and real.
For example, suppose the moving subject in the original video is running. In the first video frame the subject is in the air and the shooting angle corresponds to the photographer's front-left, while in the second video frame the subject has just landed and the shooting angle corresponds to the photographer's front-right. The first video frame can then be mapped, through a spatial transformation or the like, to the front-right shooting angle, yielding an image of the moving subject of the first video frame as if it had been shot from the front-right (i.e., the mapped first video frame). Because the shooting angles of the mapped first video frame and the second video frame now match, the clone in the synthesized first target video frame looks more natural and real.
In one embodiment, the original video may be shot by rotating the camera in place. Here, "in place" means that the camera's coordinates in the world coordinate system remain substantially unchanged; for example, the camera may be considered to be in place if its displacement in the world coordinate system is less than or equal to a preset threshold. During shooting the camera can rotate freely in place, for example from left to right or from top to bottom, which is not limited in this application.
Because the original video is shot by rotating the camera in place, i.e., the camera's coordinates in the world coordinate system are approximately unchanged, mapping the first video frame to the space of the second video frame requires only a two-dimensional transformation: only the amount of rotation needs to be computed, and no three-dimensional modeling of the whole scene is required. This greatly reduces the computing resources needed for the clone effect and speeds up its processing, making real-time processing possible and greatly facilitating video sharing for users.
In one embodiment, the original video may be shot in real time after the clone-effect instruction is acquired. For example, the camera may provide a shooting mode with the clone effect; the user can trigger this mode, and thereby issue the clone-effect instruction, through an operation such as tapping, and the camera enters the shooting mode after acquiring the instruction. Before shooting, the camera may prompt the user, by text, voice, or the like, to shoot in place.
The clone-effect processing may be performed while the original video is being shot, i.e., the camera applies the clone effect to video frames as they are captured, or it may be performed on the original video after the user has finished shooting it.
While the user shoots the original video, the camera may also locate its position in the world coordinate system in real time; if it detects that its displacement exceeds the preset threshold, it may pause shooting and prompt the user that the displacement is too large.
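A minimal sketch of this in-place check follows; the threshold value and the function name are illustrative assumptions, not values recited by the application.

```python
import numpy as np

def camera_in_place(current_position, start_position, threshold=0.05):
    """Return True while the camera's world-frame displacement stays within the
    preset threshold; the caller can pause shooting and warn the user otherwise."""
    displacement = np.linalg.norm(np.asarray(current_position, dtype=float)
                                  - np.asarray(start_position, dtype=float))
    return displacement <= threshold
```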
In one embodiment, the original video may also be a segment selected by the user from video material. For example, a video captured by the user may contain a section of scenery and a section of a person moving; the user can cut out the section of the person moving and add the clone effect to it.
In one embodiment, the camera may be mounted on a gimbal (pan-tilt head) and configured with an automatic target-following algorithm, so that when shooting the moving subject, the camera can automatically follow the moving subject and rotate in place under the control of the gimbal.
When mapping the first video frame to the space corresponding to the second video frame, the first video frame may specifically be processed with a spatial transformation matrix.
The spatial transformation matrix may be determined in several ways. In one embodiment, it may be a rotation matrix. The rotation matrix can be calculated from the camera's pose information, which may be collected by the camera's inertial measurement unit (IMU). For example, the camera pose corresponding to shooting the first video frame and the camera pose corresponding to shooting the second video frame may be obtained, and the rotation matrix calculated from the difference between the two poses.
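As an illustrative sketch only (the intrinsic matrix K and the rotation-induced homography H = K * R * K^-1 are standard results assumed here, not details recited in the present application), a purely rotating camera lets the first frame be warped into the second frame's view as follows:

```python
import numpy as np
import cv2

def rotation_warp(frame1, R_rel, K):
    """Warp frame1 into the second frame's view for a camera that only rotates.

    R_rel : 3x3 rotation taking camera-1 coordinates to camera-2 coordinates,
            e.g. derived from the difference of the IMU poses of the two frames.
    K     : 3x3 camera intrinsic matrix (assumed known from calibration).
    """
    H = K @ R_rel @ np.linalg.inv(K)   # homography induced by a pure rotation
    h, w = frame1.shape[:2]
    return cv2.warpPerspective(frame1, H, (w, h))
```

Because only a rotation is estimated, no three-dimensional reconstruction of the scene is needed, which matches the reduced computation described above.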
In another embodiment, the spatial transformation matrix may include a homography matrix. The homography matrix can be calculated from the result of feature matching between the first video frame and the second video frame. Specifically, the feature matching may be performed on a specified region of each video frame, which in one example may be the background region (scene region) other than the moving subject: feature points are extracted from the background region of the first video frame and from the background region of the second video frame, the extracted feature points are matched to obtain a set of matched feature pairs, and the homography matrix is calculated from these feature pairs.
Furthermore, considering that the matched feature pairs are not necessarily all accurate, i.e., some of them may be unreliable, the feature pairs may be screened to keep only the credible, correctly matched pairs, and the homography matrix may then be calculated from the screened credible feature pairs.
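One possible way to carry out the matching and screening is sketched below. ORB features and RANSAC-based screening are common choices assumed here purely for illustration; the application itself only requires feature matching on a specified region and screening of the matched pairs.

```python
import cv2
import numpy as np

def homography_from_background(frame1, frame2, bg_mask1, bg_mask2):
    """Estimate the frame1 -> frame2 homography from background features.

    bg_mask1 / bg_mask2 : uint8 masks that are non-zero on the background
    (scene) region, i.e. everywhere except the moving subject.
    """
    orb = cv2.ORB_create(nfeatures=2000)
    k1, d1 = orb.detectAndCompute(cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY), bg_mask1)
    k2, d2 = orb.detectAndCompute(cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY), bg_mask2)

    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    matches = sorted(matches, key=lambda m: m.distance)[:500]   # keep the best pairs

    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # RANSAC discards unreliable pairs; only the credible inliers shape the homography.
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    return H
```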
In one embodiment, the mapped first video frame may be synthesized with the second video frame to obtain the first target video frame, which may be one frame of the target video. However, the spatial transformation of the first video frame is not absolutely accurate; the computed spatial relationship between the first video frame and the second video frame carries some error. If the whole mapped first video frame were synthesized directly with the second video frame, the synthesized first target video frame would appear blurred and the subject of the current frame would also become partially transparent. Therefore, in another embodiment, the moving subject may be extracted from the mapped first video frame to obtain a clone image, and the clone image is then synthesized with the second video frame.
There are many possible ways to extract the moving subject from the mapped first video frame. In one embodiment, subject segmentation may be performed on the first video frame to obtain an original mask corresponding to the moving subject; the original mask is mapped, through the spatial transformation matrix, to the space corresponding to the second video frame to obtain a target mask; and the target mask is used to process the mapped first video frame, for example by multiplying the target mask with the mapped first video frame, so that the moving subject in the mapped first video frame is extracted and the clone image is obtained.
In the above embodiment, after the target mask is obtained, the portion of the target mask that overlaps the moving subject in the second video frame may further be removed. In a specific implementation, for example, subject segmentation may be performed on the second video frame to obtain the mask of its moving subject, and the portion of the target mask overlapping that mask is removed. After this overlap removal, the mapped first video frame can be processed with the processed target mask, so that in the final synthesized first target video frame the moving subject does not overlap its clone too much.
After the target mask is obtained, it may also be blurred. Specifically, the non-zero values of the target mask (i.e., the region corresponding to the moving subject) may be Gaussian-blurred; for example, the non-zero values may be multiplied by 255 and then clipped at 255. Blurring the target mask makes the fusion of the extracted clone image with the second video frame more natural, so that the clone in the target video frame shows no obvious processing traces such as hard boundaries and the clone effect looks more real.
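A minimal sketch of the mask handling and fusion described above is given below; the variable names are assumptions, `H` stands for the spatial transformation matrix from the first video frame to the second, and the masks are taken as floats in [0, 1].

```python
import cv2
import numpy as np

def composite_clone(frame1, frame2, subject_mask1, subject_mask2, H):
    """Fuse the moving subject of frame1 into frame2 as a clone.

    subject_mask1 / subject_mask2 : float masks in [0, 1], 1 on the moving subject.
    H : homography (or rotation-induced warp) mapping frame1 into frame2's space.
    """
    h, w = frame2.shape[:2]
    # Map the first frame and its subject mask into the second frame's space.
    warped      = cv2.warpPerspective(frame1, H, (w, h))
    target_mask = cv2.warpPerspective(subject_mask1, H, (w, h))

    # Remove the part of the clone that would overlap the current subject.
    target_mask = np.clip(target_mask - subject_mask2, 0.0, 1.0)

    # Blur the mask so the clone blends in without visible boundaries.
    target_mask = cv2.GaussianBlur(target_mask, (15, 15), 0)[..., None]

    # Alpha-blend the extracted clone into the second frame.
    return (warped * target_mask + frame2 * (1.0 - target_mask)).astype(np.uint8)
```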
For a multi-clone effect, in one embodiment an FIR-style synthesis may be used. With FIR-style synthesis there may be multiple first video frames; that is, "first video frame" refers to a class of video frames whose corresponding times are earlier than that of the second video frame, and the first target video frame may be any frame of the target video from which clones start to appear. FIR-style synthesis composes every first video frame used for producing a clone into the second video frame, so that the moving subject in the second video frame has multiple clones. For example, if the moving subject in the synthesized first target video frame is to have 3 clones, the second video frame may be, say, the 10th frame, and the first video frames may be the 1st, 4th, and 7th frames; the 1st, 4th, and 7th frames are synthesized into the 10th frame so that the moving subject in the 10th frame has 3 clones, corresponding respectively to the moving subject in the 1st, 4th, and 7th frames.
Note that since each clone corresponds to one video frame of the original video, if K clones are required, the frame index of the second video frame should be greater than K, so that at least K first video frames are available for producing the clones.
In the above example of synthesizing the 1st, 4th, and 7th frames into the 10th frame, the clone frame interval is 3 frames. The clone frame interval represents the number of frames by which the motion of adjacent clones differs: in the synthesized first target video frame, the clone corresponding to the 7th frame lags the moving subject by 3 frames, the clone corresponding to the 4th frame lags the clone corresponding to the 7th frame by 3 frames, and the clone corresponding to the 1st frame lags the clone corresponding to the 4th frame by 3 frames. The synthesized first target video frame takes the frame index of the second video frame, i.e., it is the 10th frame of the target video. For the 11th frame of the target video, again with 3 clones, the 2nd, 5th, and 8th frames of the original video can be synthesized into the 11th frame of the original video; for the 13th frame of the target video, the 4th, 7th, and 10th frames of the original video can be synthesized into the 13th frame of the original video. Subsequent frames of the target video are synthesized in the same way, which is not repeated here.
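The FIR-style indexing can be sketched as follows (purely illustrative; `warp_to` and `blend` stand for the mapping and fusion steps described above and are assumed helpers, not functions defined by the application):

```python
def fir_synthesize(frames, masks, num_clones, interval, warp_to, blend):
    """FIR-style synthesis: each output frame is composed from K earlier frames.

    frames[i], masks[i] : i-th original frame and its moving-subject mask
    warp_to(a, b)       : maps frame a (and its mask) into frame b's space,
                          returning (warped_frame, warped_mask)
    blend(f, m, base, base_mask) : fuses the masked subject of f into base
    """
    out = list(frames)                   # early frames keep no clones
    start = num_clones * interval        # first index with the full clone count
    for i in range(start, len(frames)):
        target = frames[i]
        # Oldest clone first, so less delayed (more opaque) clones are drawn on top.
        for k in range(num_clones, 0, -1):
            warped_frame, warped_mask = warp_to(i - k * interval, i)
            target = blend(warped_frame, warped_mask, target, masks[i])
        out[i] = target
    return out
```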
With FIR-style synthesis, when K clones are required, K first video frames must be synthesized into the second video frame, which results in a large amount of computation. The embodiments of the present application therefore provide another implementation, an IIR-style synthesis, in which a target video frame that has already been synthesized is used to synthesize subsequent target video frames; this greatly reduces the amount of computation.
For IIR-style synthesis, the aforementioned first video frame may be a single frame of the original video, and synthesizing the mapped first video frame into the second video frame yields a first target video frame with 1 clone. After the first target video frame is obtained, a third video frame can be acquired from the original video; its corresponding time is later than that of the second video frame, and the frame intervals among the first, second, and third video frames are the same. For example, if the first video frame is the 1st frame of the original video and the second video frame is the 4th frame, the acquired third video frame may be the 7th frame.
After the third video frame is acquired, the synthesized first target video frame may be mapped to the space corresponding to the third video frame, and a second target video frame is then synthesized from the mapped first target video frame and the third video frame. Since the first target video frame already contains the moving subject and 1 clone of it, the synthesized second target video frame contains the moving subject and 2 clones of it.
It can be seen that in IIR-style synthesis the synthesized first target video frame has 1 clone. For example, if the clone frame interval is set to 3, the 1st, 2nd, and 3rd frames of the target video have no clone, and clones appear from the 4th frame of the target video: the 4th frame has 1 clone and is synthesized from the 1st and 4th frames of the original video; the 5th frame of the target video has 1 clone and is synthesized from the 2nd and 5th frames of the original video; the 6th frame of the target video has 1 clone and is synthesized from the 3rd and 6th frames of the original video. The first target video frame may be any of the 4th, 5th, or 6th frames.
For the 7th frame of the target video, in IIR-style synthesis it may have 2 clones and may be synthesized from the already-synthesized 4th frame of the target video and the 7th frame of the original video; the 8th frame of the target video may have 2 clones and be synthesized from the synthesized 5th frame of the target video and the 8th frame of the original video; the 9th frame may have 2 clones and be synthesized from the synthesized 6th frame of the target video and the 9th frame of the original video; the 10th frame of the target video may have 3 clones and be synthesized from the synthesized 7th frame of the target video and the 10th frame of the original video, and so on.
It can be seen that in IIR-style synthesis, when K clones are required, a synthesized target video frame that already has K-1 clones is combined with the corresponding frame of the original video. In other words, no matter how many clones are produced, each target video frame is synthesized from only two video frames, which greatly reduces the amount of computation compared with FIR-style synthesis.
Regarding mapping the first target video frame to the space corresponding to the third video frame: since the first target video frame spatially corresponds to the second video frame of the original video, the mapping of the first target video frame can use the spatial transformation matrix that maps the second video frame to the third video frame. For example, a rotation matrix may be calculated from the difference between the camera pose information of the second and third video frames, or a homography matrix may be calculated by feature matching between the second and third video frames.
In one embodiment, the moving subject of the mapped first target video frame may also be extracted; reference may be made to the example provided below.
Suppose the first video frame is frame i-fs, the second video frame is frame i, and the third video frame is frame i+fs, where fs is the clone frame interval. Subject segmentation may be performed on the first, second, and third video frames to obtain their respective masks M(i-fs), M(i), and M(i+fs) (a mask separates the moving subject from a video frame). The spatial transformation matrix H(i) that maps the first video frame F(i-fs) to the second video frame F(i), and the spatial transformation matrix H(i+fs) that maps the second video frame F(i) to the third video frame F(i+fs), may be calculated as described above.
Using H(i), the mask M(i-fs) may be mapped to the space corresponding to the second video frame to obtain a target mask; removing from the target mask the portion that overlaps M(i) yields the mask Mch(i-fs), and applying a Gaussian blur to Mch(i-fs) yields the mask Mchb(i-fs).
Using H(i), the first video frame F(i-fs) may be mapped to the space corresponding to the second video frame F(i) to obtain the mapped first video frame Fch(i-fs). The moving subject of Fch(i-fs) is extracted with the mask Mchb(i-fs), and the extracted clone image is synthesized with the second video frame F(i), yielding the first target video frame Fc(i).
Further, the mask Mc(i) corresponding to the first target video frame may be calculated as Mc(i) = M(i) + Mch(i-fs)/r. Since Mch(i-fs) corresponds to the moving subject of the first video frame and M(i) corresponds to the moving subject of the second video frame, dividing Mch(i-fs) by r attenuates the contribution of the first video frame's moving subject, where r is an attenuation coefficient that can be set as required. For example, with r set to 2, the final effect is that the more frames a clone lags behind the moving subject, the greater its transparency; as shown in fig. 1B, the transparency of the moving subject X is 0%, that of the clone X′ may be 50%, and that of the clone X″ may be 75%. Of course, if every clone is to be opaque, r may be set to 1, i.e., no attenuation is applied.
The attenuated mask Mch(i-fs)/r is combined with the mask M(i) of the moving subject of the second video frame to obtain the mask Mc(i) corresponding to the first target video frame; Mc(i) can extract both the moving subject and the clone from the first target video frame.
The pixel values of the mask Mc(i) may also be limited; for example, values of Mc(i) below a preset threshold may be set to 0, which, together with the attenuation coefficient, limits the number of clones. Of course, there are other ways to limit the number of clones, and this application does not restrict them.
The mask Mc(i) may be mapped with H(i+fs), and the portion of the mapped mask that overlaps M(i+fs) is removed to obtain Mch(i); similarly, applying a Gaussian blur to Mch(i) yields Mchb(i). The first target video frame Fc(i) may be mapped, through H(i+fs), to the space corresponding to the third video frame F(i+fs) to obtain the mapped first target video frame Fch(i). The moving subject and clone of Fch(i) are extracted with the mask Mchb(i), and the extracted clone image is synthesized with the third video frame F(i+fs), yielding the second target video frame Fc(i+fs).
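As a purely illustrative sketch, one IIR iteration in the notation above might look as follows; the warping and blending helpers, the thresholding constant, and the omission of the Gaussian blur are assumptions of this sketch rather than details fixed by the application.

```python
import numpy as np

def iir_step(Fc_i, Mc_i, F_next, M_next, H_next, warp, blend, r=2.0, eps=0.05):
    """One IIR synthesis step: fuse the already-synthesized frame Fc(i)
    into the original frame F(i+fs).

    Fc_i, Mc_i     : synthesized frame i and its subject-plus-clone mask Mc(i)
    F_next, M_next : original frame i+fs and its moving-subject mask M(i+fs)
    H_next         : spatial transform mapping frame i into frame i+fs
    warp(img, H)   : assumed warping helper (e.g. a perspective warp)
    blend(f, m, base) : assumed helper fusing the masked region of f into base
    """
    # Map the synthesized frame and its mask into the space of frame i+fs.
    Fch_i = warp(Fc_i, H_next)
    Mch_i = warp(Mc_i, H_next)

    # Remove overlap with the current subject; edge softening of the mask
    # (e.g. a Gaussian blur) is omitted here for brevity.
    Mch_i = np.clip(Mch_i - M_next, 0.0, 1.0)
    Fc_next = blend(Fch_i, Mch_i, F_next)

    # Mask recursion Mc(i+fs) = M(i+fs) + Mch(i)/r: attenuate older clones and
    # drop values below a threshold to bound the number of visible clones.
    Mc_next = M_next + Mch_i / r
    Mc_next[Mc_next < eps] = 0.0
    return Fc_next, np.clip(Mc_next, 0.0, 1.0)
```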
For the synthesis of the subsequent video frames of the target video, reference may be made to the synthesis manner of the second target video frame, which is not described herein again.
In one embodiment, the clone frame interval may vary, i.e., a non-uniformly spaced clone effect can be achieved. For example, in the ith frame of the target video the moving subject may have three clones: the first clone may correspond to frame i-2 of the original video (2 frames behind the moving subject), the second clone to frame i-5 (3 frames behind the first clone), and the third clone to frame i-9 (4 frames behind the second clone).
The above is a detailed description of the image processing method provided in the embodiments of the present application.
The image processing method provided by the embodiments of the present application can process a video so that the moving subject in the video has clones, improving the creativity and fun of video production. In addition, by constraining the user to shoot the original video in place, the amount of computation needed to add the clone effect is greatly reduced, so the effect can be realized without post-production special-effects software such as After Effects (AE); the user can apply the clone effect to a video directly on an electronic device such as a camera or a mobile terminal, which greatly facilitates producing and sharing videos.
Reference may be made to fig. 3, which is a structural diagram of a camera provided in an embodiment of the present application. The camera may be a camera built into an electronic device such as a mobile phone, a camera mounted on an unmanned aerial vehicle, or an action camera. The camera may include a lens, an image sensor, a processor 310, and a memory 320 storing a computer program.
The lens and the image sensor can be used for video shooting.
The processor may be used to process the captured video; when executing the computer program, the processor implements the following steps:
acquiring a clone-effect instruction;
processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video includes the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
Optionally, when processing the original video in which the moving subject is captured, the processor is configured to acquire a first video frame and a second video frame from the original video, where the time corresponding to the first video frame is earlier than that of the second video frame; map the first video frame to a space corresponding to the second video frame; and synthesize a first target video frame from the mapped first video frame and the second video frame.
Optionally, when the processor maps the first video frame to the space corresponding to the second video frame, the processor is configured to perform spatial transformation on the first video frame through a spatial transformation matrix, so as to map the first video frame to the space corresponding to the second video frame.
Optionally, the method further includes: an inertial measurement unit IMU;
the spatial transformation matrix comprises a rotation matrix, the rotation matrix is obtained by calculation based on the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame, and the camera pose information is acquired by the IMU.
Optionally, the spatial transformation matrix includes a homography matrix, and the processor is further configured to perform feature matching on the first video frame and the second video frame, and calculate the homography matrix according to a matching result.
Optionally, the matching result includes a plurality of feature pairs between the first video frame and the second video frame;
and the processor is used for screening the plurality of characteristic pairs when calculating the homography matrix according to the matching result, and calculating the homography matrix according to the screened credible characteristic pairs.
Optionally, when performing feature matching on the first video frame and the second video frame, the processor is configured to extract feature points for the designated areas of the first video frame and the second video frame, and perform feature matching on the extracted feature points.
Optionally, the designated area includes a background area other than the moving subject.
Optionally, when synthesizing a first target video frame from the mapped first video frame and the second video frame, the processor is configured to extract the moving subject in the mapped first video frame to obtain a clone image, and synthesize the target video frame corresponding to the second video frame from the clone image and the second video frame.
Optionally, when extracting the moving subject in the mapped first video frame, the processor is configured to process the mapped first video frame through a target mask corresponding to the moving subject.
Optionally, the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and map the original mask to a space corresponding to the second video frame to obtain the target mask.
Optionally, the processor is further configured to remove a portion of the target mask that overlaps the moving subject in the second video frame before processing the mapped first video frame through the target mask.
Optionally, the processor is further configured to perform a blurring process on the target mask before processing the mapped first video frame through the target mask.
Optionally, the processor is further configured to obtain a third video frame from the original video, where a time corresponding to the third video frame is later than the second video frame, and frame intervals between the first video frame, the second video frame, and the third video frame are the same; mapping the first target video frame to a space corresponding to the third video frame; and synthesizing a second target video frame according to the mapped first target video frame and the mapped third video frame.
Optionally, the original video is captured by the camera rotating in place.
Optionally, the original video is captured by the camera rotating in place to follow the moving subject.
Optionally, different dynamic clones have different transparencies.
Optionally, the number of frames by which a dynamic clone lags the moving subject is positively correlated with the transparency of that dynamic clone.
Optionally, the clone-effect instruction includes one or more of the following parameters: the number of clones, the clone frame interval, and the clone transparency.
Optionally, the clone-effect instruction is triggered by a user.
Optionally, the original video is shot in real time after the clone-effect instruction is acquired.
Optionally, the processor is further configured to determine whether a displacement of the camera in the world coordinate system is less than or equal to a preset threshold when the original video is shot.
Optionally, the original video is a section selected by a user from the captured video.
For the cameras of the various embodiments provided above, reference may be made to the relevant description in the foregoing, and details are not described here again.
The camera provided by the embodiments of the present application can process a video so that the moving subject in the video has clones, improving the creativity of the video and the fun of video production. In addition, by constraining the user to shoot the original video in place, the amount of computation needed to add the clone effect is greatly reduced, so the effect can be realized without post-production special-effects software such as After Effects (AE), which greatly facilitates producing and sharing videos. In one embodiment an IIR-style synthesis is also provided, which further reduces the computation needed to realize multiple clones and greatly lowers the hardware requirements for the clone effect.
An embodiment of the present application further provides a mobile terminal; reference may be made to fig. 4, which is a structural diagram of the mobile terminal provided in an embodiment of the present application.
In one embodiment, the mobile terminal can be connected to the camera by wire or wirelessly, acquire from the camera the original video shot by the camera, and perform the clone-effect processing on that original video. In another embodiment, the mobile terminal may itself be provided with a camera, and the original video may be a video captured by that camera.
The mobile terminal may include a processor 410 and a memory 420 storing a computer program;
when executing the computer program, the processor implements the following steps:
acquiring a clone-effect instruction;
processing, according to the clone-effect instruction, an original video in which a moving subject is captured to obtain a target video, wherein the target video includes the moving subject and at least one dynamic clone corresponding to the moving subject, and the dynamic clone repeats the motion of the moving subject with a specified time delay.
Optionally, when processing the original video in which the moving subject is captured, the processor is configured to acquire a first video frame and a second video frame from the original video, where the time corresponding to the first video frame is earlier than that of the second video frame; map the first video frame to a space corresponding to the second video frame; and synthesize a first target video frame from the mapped first video frame and the second video frame.
Optionally, when the processor maps the first video frame to the space corresponding to the second video frame, the processor is configured to perform spatial transformation on the first video frame through a spatial transformation matrix, so as to map the first video frame to the space corresponding to the second video frame.
Optionally, the spatial transformation matrix includes a rotation matrix, and the rotation matrix is obtained by calculation based on the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame.
Optionally, the spatial transformation matrix includes a homography matrix, and the processor is further configured to perform feature matching on the first video frame and the second video frame, and calculate the homography matrix according to a matching result.
Optionally, the matching result includes a plurality of feature pairs between the first video frame and the second video frame;
and the processor is used for screening the plurality of characteristic pairs when calculating the homography matrix according to the matching result, and calculating the homography matrix according to the screened credible characteristic pairs.
Optionally, when performing feature matching on the first video frame and the second video frame, the processor is configured to extract feature points for the designated areas of the first video frame and the second video frame, and perform feature matching on the extracted feature points.
Optionally, the designated area includes a background area other than the moving subject.
Optionally, when synthesizing a first target video frame from the mapped first video frame and the second video frame, the processor is configured to extract the moving subject in the mapped first video frame to obtain a clone image, and synthesize the target video frame corresponding to the second video frame from the clone image and the second video frame.
Optionally, when extracting the moving subject in the mapped first video frame, the processor is configured to process the mapped first video frame through a target mask corresponding to the moving subject.
Optionally, the processor is further configured to perform moving-subject segmentation on the first video frame to obtain an original mask corresponding to the moving subject, and map the original mask to a space corresponding to the second video frame to obtain the target mask.
Optionally, the processor is further configured to remove a portion of the target mask that overlaps the moving subject in the second video frame before processing the mapped first video frame through the target mask.
Optionally, the processor is further configured to perform a blurring process on the target mask before processing the mapped first video frame through the target mask.
Optionally, the processor is further configured to obtain a third video frame from the original video, where a time corresponding to the third video frame is later than the second video frame, and frame intervals between the first video frame, the second video frame, and the third video frame are the same; mapping the first target video frame to a space corresponding to the third video frame; and synthesizing a second target video frame according to the mapped first target video frame and the mapped third video frame.
Optionally, the original video is captured by the camera rotating in place.
Optionally, the original video is captured by the camera rotating in place to follow the moving subject.
Optionally, different dynamic clones have different transparencies.
Optionally, the number of frames by which a dynamic clone lags the moving subject is positively correlated with the transparency of that dynamic clone.
Optionally, the clone-effect instruction includes one or more of the following parameters: the number of clones, the clone frame interval, and the clone transparency.
Optionally, the clone-effect instruction is triggered by a user.
Optionally, the mobile terminal is provided with a camera, and the original video is shot in real time by that camera after the clone-effect instruction is acquired.
Optionally, the processor is further configured to determine whether a displacement of the camera in the world coordinate system is less than or equal to a preset threshold when the original video is shot.
Optionally, the original video is a section selected by a user from the captured video.
For the mobile terminal of the above embodiments, specific implementations thereof may refer to the related descriptions in the foregoing, and are not described herein again.
The mobile terminal provided by the embodiments of the present application can process a video so that the moving subject in the video has clones, improving the creativity of the video and the fun of video production. In addition, by constraining the user to shoot the original video in place, the amount of computation needed to add the clone effect is greatly reduced, so the effect can be realized without post-production special-effects software such as After Effects (AE), which greatly facilitates producing and sharing videos. In one embodiment an IIR-style synthesis is also provided, which further reduces the computation needed to realize multiple clones and greatly lowers the hardware requirements for the clone effect.
The embodiments of the present application also provide a computer-readable storage medium storing a computer program that, when executed by a processor, implements any one of the image processing methods provided by the embodiments of the present application.
In the above embodiments, multiple implementations are provided for each step. As for which implementation is adopted for a particular step, provided there is no conflict or contradiction, those skilled in the art can freely select or combine the implementations according to the actual situation, thereby forming various embodiments; for brevity these are not described one by one, but all such embodiments fall within the scope of the disclosure of the embodiments of the present application.
Embodiments of the present application may take the form of a computer program product embodied on one or more storage media including, but not limited to, disk storage, CD-ROM, optical storage, and the like, in which program code is embodied. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of the storage medium of the computer include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The method, the electronic device, and the like provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the embodiments is only intended to help understand the method and its core idea. Meanwhile, a person skilled in the art may, based on the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (70)

1. An image processing method, comprising:
acquiring a body-separating effect instruction;
processing, according to the body-separating effect instruction, an original video in which a moving subject is captured, to obtain a target video, wherein the target video comprises the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
2. The method according to claim 1, wherein processing the original video in which the moving subject is captured comprises:
acquiring a first video frame and a second video frame from the original video, wherein the first video frame corresponds to a time earlier than the second video frame;
mapping the first video frame to a space corresponding to the second video frame;
and synthesizing a first target video frame according to the mapped first video frame and the mapped second video frame.
3. The method of claim 2, wherein mapping the first video frame to a space corresponding to the second video frame comprises:
and performing spatial transformation on the first video frame through a spatial transformation matrix so as to map the first video frame to a space corresponding to the second video frame.
4. The method of claim 3, wherein the spatial transformation matrix comprises a rotation matrix calculated based on camera pose information corresponding to the first video frame and camera pose information corresponding to the second video frame.
5. The method of claim 3, wherein the spatial transformation matrix comprises a homography matrix, the homography matrix determined based on:
and performing feature matching on the first video frame and the second video frame, and calculating the homography matrix according to a matching result.
6. The method of claim 5, wherein the matching result comprises a plurality of feature pairs between the first video frame and the second video frame;
the calculating the homography matrix according to the matching result comprises:
and screening the plurality of feature pairs, and calculating the homography matrix according to the screened credible feature pairs.
7. The method of claim 5, wherein the feature matching the first video frame with the second video frame comprises:
and respectively extracting feature points aiming at the specified areas of the first video frame and the second video frame, and performing feature matching on the extracted feature points.
8. The method of claim 7, wherein the designated area comprises a background area other than the moving subject.
9. The method of claim 2, wherein the synthesizing a first target video frame from the mapped first video frame and the second video frame comprises:
extracting a motion subject in the mapped first video frame to obtain a body-separating image;
and synthesizing a target video frame corresponding to the second video frame according to the body-separating image and the second video frame.
10. The method of claim 9, wherein the extracting the motion subject in the mapped first video frame comprises:
and processing the mapped first video frame through a target mask corresponding to the moving body.
11. The method of claim 10, wherein the target mask is obtained based on:
carrying out motion body segmentation on the first video frame to obtain an original mask corresponding to the motion body;
and mapping the original mask to a space corresponding to the second video frame to obtain the target mask.
12. The method of claim 10, further comprising, prior to processing the mapped first video frame through the target mask:
removing a portion of the target mask that overlaps with the moving body in the second video frame.
13. The method of claim 10, further comprising, prior to processing the mapped first video frame through the target mask:
and carrying out fuzzy processing on the target mask.
14. The method of claim 2, further comprising:
acquiring a third video frame from the original video, wherein the third video frame corresponds to a time later than the second video frame, and the frame interval between the first video frame and the second video frame is the same as the frame interval between the second video frame and the third video frame;
mapping the first target video frame to a space corresponding to the third video frame;
and synthesizing a second target video frame according to the mapped first target video frame and the mapped third video frame.
15. The method of claim 1, wherein the original video is captured by a camera rotating in place.
16. The method of claim 15, wherein the original video is captured by the camera rotating in place to follow the moving subject.
17. The method of claim 1, wherein different ones of the dynamic avatars have different transparencies.
18. The method of claim 17, wherein the number of frames by which the dynamic avatar lags behind the moving subject is positively correlated with the transparency of the dynamic avatar.
19. The method of claim 1, wherein the body-separating effect instruction comprises one or more of the following information: the number of avatars, the frame interval between avatars, and the transparency of the avatars.
20. The method of claim 1, wherein the body-separating effect instruction is triggered by a user.
21. The method of claim 1, wherein the original video is captured in real-time after the body-separating effect instruction is obtained.
22. The method of claim 21, further comprising:
when the original video is being shot, determining whether the displacement of the camera in the world coordinate system is less than or equal to a preset threshold.
23. The method of claim 1, wherein the original video is a user selected segment of the captured video.
24. A camera, comprising: a processor and a memory storing a computer program;
the processor, when executing the computer program, implements the steps of:
acquiring a body-separating effect instruction;
processing, according to the body-separating effect instruction, an original video in which a moving subject is captured, to obtain a target video, wherein the target video comprises the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
25. The camera according to claim 24, wherein, when processing the original video in which the moving subject is captured to obtain the target video, the processor is configured to: acquire a first video frame and a second video frame from the original video, wherein the first video frame corresponds to a time earlier than the second video frame; map the first video frame to a space corresponding to the second video frame; and synthesize a first target video frame according to the mapped first video frame and the mapped second video frame.
26. The camera of claim 25, wherein the processor, when mapping the first video frame to the space corresponding to the second video frame, is configured to perform a spatial transformation on the first video frame by a spatial transformation matrix to map the first video frame to the space corresponding to the second video frame.
27. The camera of claim 26, further comprising: an inertial measurement unit IMU;
the spatial transformation matrix comprises a rotation matrix, the rotation matrix is obtained by calculation based on the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame, and the camera pose information is acquired by the IMU.
28. The camera of claim 26, wherein the spatial transformation matrix comprises a homography matrix, and wherein the processor is further configured to perform feature matching on the first video frame and the second video frame and to calculate the homography matrix according to a result of the matching.
29. The camera of claim 28, wherein the matching result comprises a plurality of feature pairs between the first video frame and the second video frame;
and the processor is used for screening the plurality of characteristic pairs when calculating the homography matrix according to the matching result, and calculating the homography matrix according to the screened credible characteristic pairs.
30. The camera according to claim 28, wherein the processor is configured to, when performing feature matching on the first video frame and the second video frame, extract feature points for specified regions of the first video frame and the second video frame, respectively, and perform feature matching on the extracted feature points.
31. The camera according to claim 30, wherein the designated area includes a background area other than the moving subject.
32. The camera according to claim 25, wherein the processor is configured to extract a moving subject in the mapped first video frame to obtain a body-segmented image when synthesizing a first target video frame from the mapped first video frame and the mapped second video frame; and synthesizing a target video frame corresponding to the second video frame according to the body-separating image and the second video frame.
33. The camera of claim 32, wherein the processor, when extracting the motion subject from the mapped first video frame, is configured to process the mapped first video frame through a target mask corresponding to the motion subject.
34. The camera of claim 33, wherein the processor is further configured to perform motion subject segmentation on the first video frame to obtain an original mask corresponding to the motion subject; and mapping the original mask to a space corresponding to the second video frame to obtain the target mask.
35. The camera of claim 33, wherein the processor is further configured to remove a portion of the target mask that overlaps with a moving object in the second video frame before processing the mapped first video frame through the target mask.
36. The camera of claim 33, wherein the processor is further configured to blur the target mask prior to processing the mapped first video frame through the target mask.
37. The camera according to claim 25, wherein the processor is further configured to obtain a third video frame from the original video, the third video frame corresponding to a time later than the second video frame, and the frame intervals between the first video frame, the second video frame and the third video frame are the same; mapping the first target video frame to a space corresponding to the third video frame; and synthesizing a second target video frame according to the mapped first target video frame and the mapped third video frame.
38. The camera of claim 24, wherein the original video is captured by the camera rotating in place.
39. The camera of claim 38, wherein the original video is captured by the camera rotating in place to follow the moving subject.
40. The camera of claim 24, wherein different ones of the dynamic avatars have different transparencies.
41. The camera of claim 40, wherein the number of frames by which the dynamic avatar lags behind the moving subject is positively correlated with the transparency of the dynamic avatar.
42. The camera of claim 24, wherein the body-separating effect instruction comprises one or more of the following information: the number of avatars, the frame interval between avatars, and the transparency of the avatars.
43. The camera of claim 24, wherein the body-separating effect instruction is triggered by a user.
44. The camera of claim 24, wherein the original video is captured in real-time after the body-separating effect instruction is obtained.
45. The camera of claim 44, wherein the processor is further configured to determine whether a displacement of the camera in a world coordinate system is less than or equal to a preset threshold when capturing the original video.
46. The camera of claim 24, wherein the original video is a user selected segment of the captured video.
47. A mobile terminal, comprising: a processor and a memory storing a computer program;
the processor, when executing the computer program, implements the steps of:
acquiring a body-separating effect instruction;
processing, according to the body-separating effect instruction, an original video in which a moving subject is captured, to obtain a target video, wherein the target video comprises the moving subject and at least one dynamic avatar corresponding to the moving subject, and the dynamic avatar repeats the movement of the moving subject with a specified time delay.
48. The mobile terminal of claim 47, wherein, when processing the original video in which the moving subject is captured to obtain the target video, the processor is configured to: acquire a first video frame and a second video frame from the original video, wherein the first video frame corresponds to a time earlier than the second video frame; map the first video frame to a space corresponding to the second video frame; and synthesize a first target video frame according to the mapped first video frame and the mapped second video frame.
49. The mobile terminal of claim 48, wherein the processor, when mapping the first video frame to the space corresponding to the second video frame, is configured to perform a spatial transformation on the first video frame by using a spatial transformation matrix to map the first video frame to the space corresponding to the second video frame.
50. The mobile terminal of claim 49, wherein the spatial transformation matrix comprises a rotation matrix calculated based on the camera pose information corresponding to the first video frame and the camera pose information corresponding to the second video frame.
51. The mobile terminal of claim 49, wherein the spatial transform matrix comprises a homography matrix, and wherein the processor is further configured to perform feature matching on the first video frame and the second video frame and to calculate the homography matrix according to a matching result.
52. The mobile terminal of claim 51, wherein the matching result comprises a plurality of feature pairs between the first video frame and the second video frame;
and the processor is used for screening the plurality of characteristic pairs when calculating the homography matrix according to the matching result, and calculating the homography matrix according to the screened credible characteristic pairs.
53. The mobile terminal of claim 51, wherein the processor, when performing feature matching on the first video frame and the second video frame, is configured to extract feature points for specified areas of the first video frame and the second video frame, respectively, and perform feature matching on the extracted feature points.
54. The mobile terminal of claim 53, wherein the designated area comprises a background area other than the moving body.
55. The mobile terminal of claim 48, wherein the processor is configured to extract a motion subject in the mapped first video frame to obtain a body-segmented image when synthesizing a first target video frame according to the mapped first video frame and the mapped second video frame; and synthesizing a target video frame corresponding to the second video frame according to the body-separating image and the second video frame.
56. The mobile terminal of claim 55, wherein the processor, when extracting a motion subject from the mapped first video frame, is configured to process the mapped first video frame through a target mask corresponding to the motion subject.
57. The mobile terminal of claim 56, wherein the processor is further configured to perform motion body segmentation on the first video frame to obtain an original mask corresponding to the motion body; and mapping the original mask to a space corresponding to the second video frame to obtain the target mask.
58. The mobile terminal of claim 56, wherein the processor is further configured to remove a portion of the target mask that overlaps with a moving object in the second video frame before processing the mapped first video frame through the target mask.
59. The mobile terminal of claim 56, wherein the processor is further configured to blur the target mask prior to processing the mapped first video frame through the target mask.
60. The mobile terminal of claim 48, wherein the processor is further configured to obtain a third video frame from the original video, wherein the third video frame corresponds to a time later than the second video frame, and the frame intervals between the first video frame, the second video frame, and the third video frame are the same; mapping the first target video frame to a space corresponding to the third video frame; and synthesizing a second target video frame according to the mapped first target video frame and the mapped third video frame.
61. The mobile terminal of claim 47, wherein the original video is captured by a camera rotating in place.
62. The mobile terminal of claim 61, wherein the original video is captured by the camera rotating in place to follow the moving subject.
63. The mobile terminal of claim 47, wherein different ones of the dynamic avatars have different transparencies.
64. The mobile terminal of claim 63, wherein the number of frames by which the dynamic avatar lags behind the moving subject is positively correlated with the transparency of the dynamic avatar.
65. The mobile terminal of claim 47, wherein the body-separating effect instruction comprises one or more of the following information: the number of avatars, the frame interval between avatars, and the transparency of the avatars.
66. The mobile terminal of claim 47, wherein the body-separating effect instruction is triggered by a user.
67. The mobile terminal according to claim 47, wherein the mobile terminal is provided with a camera, and the original video is captured in real time by the camera after the body-separating effect instruction is acquired.
68. The mobile terminal of claim 67, wherein the processor is further configured to determine whether a displacement of the camera in the world coordinate system is less than or equal to a preset threshold when capturing the original video.
69. The mobile terminal of claim 47, wherein the original video is a user selected segment of a captured video.
70. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the image processing method according to any one of claims 1 to 23.
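The following sketches are editorial illustrations of the processing steps recited in the claims above; they are hypothetical Python/OpenCV examples written under stated assumptions, not the applicant's implementation. The first sketch covers the rotation-based spatial mapping of claims 3-4, 27 and 50: when the camera only rotates in place, a homography that maps the first video frame into the space corresponding to the second video frame can be assembled from the two camera poses (for example, as reported by an IMU) and the camera intrinsic matrix. The names K, R1_world_to_cam and R2_world_to_cam are assumptions made for illustration.

```python
import numpy as np

def rotation_homography(K, R1_world_to_cam, R2_world_to_cam):
    """Homography mapping pixels of frame 1 into frame 2 for a purely rotating camera.

    K               : 3x3 camera intrinsic matrix
    R*_world_to_cam : 3x3 rotation matrices of the two camera poses (world -> camera)
    With no camera translation, u2 ~ K * R2 * R1^T * K^(-1) * u1.
    """
    R_rel = R2_world_to_cam @ R1_world_to_cam.T   # relative rotation, pose 1 -> pose 2
    return K @ R_rel @ np.linalg.inv(K)

# The resulting 3x3 matrix can be passed to cv2.warpPerspective to map the first
# video frame into the space corresponding to the second video frame.
```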
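The next sketch illustrates the feature-matching alternative of claims 5-8 (and 28-31, 51-54): feature points are extracted only from a specified background region that excludes the moving subject, matched between the two frames, filtered down to credible pairs, and fed to a RANSAC homography fit. All function and variable names are assumptions for illustration.

```python
import cv2
import numpy as np

def estimate_homography(first_frame, second_frame, background_mask=None):
    """Estimate a 3x3 homography mapping first_frame into second_frame's space.

    background_mask : optional uint8 mask (0/255) selecting the background region
                      outside the moving subject, used for both frames here.
    """
    orb = cv2.ORB_create(2000)
    # Detect and describe features only inside the specified (background) region.
    kp1, des1 = orb.detectAndCompute(first_frame, background_mask)
    kp2, des2 = orb.detectAndCompute(second_frame, background_mask)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
    good = matches[:500]  # keep the most credible feature pairs

    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC rejects the remaining outlier (untrusted) pairs.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H

# Usage (illustrative):
#   H = estimate_homography(frame1, frame2, bg_mask)
#   warped = cv2.warpPerspective(frame1, H, (frame2.shape[1], frame2.shape[0]))
```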
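Finally, a sketch of the compositing described in claims 9-13 and 17-19: the mapped first video frame is cut out with a target mask, the part of the mask that overlaps the moving subject in the second video frame is removed, the mask is blurred for a soft edge, and the avatar is blended into the second video frame with a transparency that grows with its delay. The parameter names and the linear transparency rule are illustrative assumptions rather than the claimed implementation.

```python
import cv2
import numpy as np

def composite_avatar(warped_first, second_frame, target_mask, subject_mask_second,
                     delay_frames, fade_per_frame=0.05):
    """Blend one dynamic avatar into the current (second) frame.

    warped_first        : first video frame already mapped to the second frame's space
    second_frame        : current frame, uint8 HxWx3
    target_mask         : HxW float mask in [0,1] of the subject in warped_first
    subject_mask_second : HxW float mask in [0,1] of the subject in second_frame
    delay_frames        : number of frames this avatar lags behind the live subject
    fade_per_frame      : extra transparency added per frame of delay (assumed rule)
    """
    # Remove the part of the target mask that overlaps the live subject (cf. claim 12),
    # so the avatar never covers the real moving subject.
    mask = target_mask * (1.0 - subject_mask_second)
    # Blur (feather) the mask (cf. claim 13) so the avatar blends smoothly.
    mask = cv2.GaussianBlur(mask.astype(np.float32), (21, 21), 0)
    # Transparency grows with the delay (cf. claims 17-19): later avatars are fainter.
    opacity = max(0.0, 1.0 - fade_per_frame * delay_frames)
    alpha = (mask * opacity)[..., None]
    out = warped_first.astype(np.float32) * alpha \
        + second_frame.astype(np.float32) * (1.0 - alpha)
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```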
CN202080035108.8A 2020-08-06 2020-08-06 Image processing method, camera and mobile terminal Pending CN113841112A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/107433 WO2022027447A1 (en) 2020-08-06 2020-08-06 Image processing method, and camera and mobile terminal

Publications (1)

Publication Number Publication Date
CN113841112A 2021-12-24

Family

ID=78963297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080035108.8A Pending CN113841112A (en) 2020-08-06 2020-08-06 Image processing method, camera and mobile terminal

Country Status (2)

Country Link
CN (1) CN113841112A (en)
WO (1) WO2022027447A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229337B (en) * 2023-05-10 2023-09-26 瀚博半导体(上海)有限公司 Method, apparatus, system, device and medium for video processing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160345052A1 (en) * 2015-05-19 2016-11-24 Lemobile Information Technology (Beijing) Co., Ltd. Method and device for previewing video files
CN108259781A (en) * 2017-12-27 2018-07-06 努比亚技术有限公司 image synthesizing method, terminal and computer readable storage medium
CN111327840A (en) * 2020-02-27 2020-06-23 努比亚技术有限公司 Multi-frame special-effect video acquisition method, terminal and computer readable storage medium
CN111601033A (en) * 2020-04-27 2020-08-28 北京小米松果电子有限公司 Video processing method, device and storage medium
CN113490050A (en) * 2021-09-07 2021-10-08 北京市商汤科技开发有限公司 Video processing method and device, computer readable storage medium and computer equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8811801B2 (en) * 2010-03-25 2014-08-19 Disney Enterprises, Inc. Continuous freeze-frame video effect system and method
CN104125407B (en) * 2014-08-13 2018-09-04 努比亚技术有限公司 The image pickup method and mobile terminal of movement locus of object
CN106303291B (en) * 2016-09-30 2019-06-07 努比亚技术有限公司 A kind of image processing method and terminal
CN110536087A (en) * 2019-05-06 2019-12-03 珠海全志科技股份有限公司 Electronic equipment and its motion profile picture synthesis method, device and embedded equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114302071A (en) * 2021-12-28 2022-04-08 影石创新科技股份有限公司 Video processing method and device, storage medium and electronic equipment
CN114302071B (en) * 2021-12-28 2024-02-20 影石创新科技股份有限公司 Video processing method and device, storage medium and electronic equipment
CN114554280A (en) * 2022-01-14 2022-05-27 影石创新科技股份有限公司 Method and device for generating shadow-body-splitting video, electronic equipment and storage medium
CN114554280B (en) * 2022-01-14 2024-03-19 影石创新科技股份有限公司 Method and device for generating video of video division, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022027447A1 (en) 2022-02-10

Similar Documents

Publication Publication Date Title
CN109272530B (en) Target tracking method and device for space-based monitoring scene
EP3457683B1 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
US10949978B2 (en) Automatic background replacement for single-image and multi-view captures
US20200228730A1 (en) Automatic composition of composite images or videos from frames captured with moving camera
CN113841112A (en) Image processing method, camera and mobile terminal
CN106713768B (en) People's scape image composition method, system and computer equipment
US20180048820A1 (en) Pixel readout of a charge coupled device having a variable aperture
CN108121931B (en) Two-dimensional code data processing method and device and mobile terminal
US9787899B1 (en) Multiple captures with a variable aperture
US20150103193A1 (en) Method and apparatus for long term image exposure with image stabilization on a mobile device
CN105282421B (en) A kind of mist elimination image acquisition methods, device and terminal
CN105791705A (en) Video anti-shake method and system suitable for movable time-lapse photography and shooting terminal
CN107084740B (en) Navigation method and device
CN109711241B (en) Object detection method and device and electronic equipment
CN112200035B (en) Image acquisition method, device and vision processing method for simulating crowded scene
WO2019037038A1 (en) Image processing method and device, and server
CN113610865B (en) Image processing method, device, electronic equipment and computer readable storage medium
CN105467741A (en) Panoramic shooting method and terminal
CN112116068A (en) Annular image splicing method, equipment and medium
CN110930437B (en) Target tracking method and device
CN104104911A (en) Timestamp eliminating and resetting method in panoramic image generation process and system thereof
CN108269278B (en) Scene modeling method and device
WO2022040988A1 (en) Image processing method and apparatus, and movable platform
CN111656763B (en) Image acquisition control method, image acquisition control device and movable platform
CN112818743A (en) Image recognition method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination