CN114494004B - Sky image processing method and device - Google Patents


Info

Publication number
CN114494004B
CN114494004B
Authority
CN
China
Prior art keywords
sky
image
frequency
low
features
Prior art date
Legal status
Active
Application number
CN202210395165.8A
Other languages
Chinese (zh)
Other versions
CN114494004A
Inventor
李博贤
彭丽江
郑鹏程
陶颖
Current Assignee
Beijing Meishe Network Technology Co ltd
Original Assignee
Beijing Meishe Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Meishe Network Technology Co ltd filed Critical Beijing Meishe Network Technology Co ltd
Priority to CN202210395165.8A priority Critical patent/CN114494004B/en
Publication of CN114494004A publication Critical patent/CN114494004A/en
Application granted granted Critical
Publication of CN114494004B publication Critical patent/CN114494004B/en

Classifications

    • G06T3/04

Abstract

The embodiment of the invention provides a sky image processing method and a sky image processing device. The method comprises the following steps: when the mobile terminal shoots, obtaining shot video stream data, wherein the video stream data comprises a video frame sky image; extracting high-frequency features and low-frequency features of the video frame sky image; fusing the high-frequency features and the low-frequency features to generate global features; performing matrix decoding on the global features to generate mask data; and combining the mask data and the sky replacement image data on the video frame sky image to generate a sky-changing image. By rapidly segmenting and replacing the sky portion of each frame image, frame by frame, the embodiment of the invention achieves a real-time sky-changing effect.

Description

Sky image processing method and device
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a sky image processing method, a sky image processing apparatus, an electronic device, and a storage medium.
Background
With the rise of intelligent automobiles, vehicle-mounted equipment has become increasingly sophisticated, and the intelligence requirements for the various components of a vehicle have grown correspondingly. At present, the vehicle-mounted camera is standard equipment on an intelligent automobile, and automatic post-processing of the images it captures meets users' image-processing needs for vehicle-captured footage. However, because the hardware performance of the vehicle-mounted terminal is limited, processing methods used on terminals such as PCs (personal computers) are not suitable for the vehicle-mounted terminal.
Disclosure of Invention
In view of the above, embodiments of the present invention are proposed to provide a sky image processing method, a sky image processing apparatus, an electronic device and a storage medium that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a sky image processing method, which is applied to a mobile terminal, where sky replacement image data is stored on the mobile terminal, and the method includes:
when the mobile terminal shoots, shooting video stream data is obtained, wherein the video stream data comprises a video frame sky image;
extracting high-frequency characteristics and low-frequency characteristics of the video frame sky image;
fusing the high-frequency features and the low-frequency features to generate global features;
performing matrix decoding on the global features to generate mask data;
and combining the mask data and the sky replacement image data on the video frame sky image to generate a sky-changing image.
Optionally, the method further comprises:
and performing image post-processing on the sky-changing image to generate an effect image.
Optionally, the step of extracting the high-frequency feature and the low-frequency feature of the video frame sky image includes:
performing fast down-sampling processing on the video frame sky image;
copying the processed video frame sky image to generate a first sky image and a second sky image;
performing high-frequency feature extraction on the first sky image;
and extracting low-frequency features of the second sky image.
Optionally, the step of extracting low-frequency features of the second sky image includes:
extracting low-frequency semantic information in the second sky image in a low-frequency coding mode;
and determining the low-frequency semantic information as the low-frequency features.
Optionally, the step of performing high-frequency feature extraction on the first sky image includes:
extracting high-frequency semantic information in the first sky image in a high-frequency coding mode;
and determining the high-frequency semantic information as the high-frequency features.
Optionally, the fusing the high-frequency features and the low-frequency features to generate global features includes:
and combining the high-frequency features and the low-frequency features in the channel direction through a preset matrix to generate global features.
Optionally, after the step of performing matrix decoding on the global features to generate mask data, the method further includes:
and performing guiding filtering processing on the mask data.
The embodiment of the invention also discloses a sky image processing device, which is applied to a mobile terminal, wherein sky replacement image data is stored on the mobile terminal, and the device comprises:
an acquisition module, used for acquiring the shot video stream data when the mobile terminal shoots, wherein the video stream data comprises a video frame sky image;
the extraction module is used for extracting high-frequency characteristics and low-frequency characteristics of the video frame sky image;
the fusion module is used for fusing the high-frequency characteristic and the low-frequency characteristic to generate a global characteristic;
the decoding module is used for carrying out matrix decoding on the global features to generate mask data;
and the combining module is used for combining the mask data and the sky replacement image data on the video frame sky image to generate a sky-changing image.
An embodiment of the present invention further discloses an electronic device, which includes a processor, a memory, and a computer program stored on the memory and capable of running on the processor, and when being executed by the processor, the computer program implements the steps of the sky image processing method described above.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when being executed by a processor, the computer program realizes the steps of the sky image processing method.
The embodiment of the invention has the following advantages:
the embodiment of the invention obtains the shot video stream data when the mobile terminal shoots, wherein the video stream data comprises a video frame sky image; extracting high-frequency characteristics and low-frequency characteristics of the video frame sky image; fusing the high-frequency features and the low-frequency features to generate global features; performing matrix decoding on the global features to generate mask data; and generating an image of the sky of the video frame by combining the mask data and the sky replacement image data. According to the embodiment of the invention, the shot video frame sky image is obtained, the sky part in the captured image is segmented to obtain the global characteristics, the mask data is generated, and the mask data and the sky replacement image data are combined on the video frame sky image to generate the sky replacement image, so that the effect of replacing the sky part of the image with other scenes in real time is realized, the accuracy is high, the parameter quantity is small, and the power consumption is low; the method is suitable for the mobile terminal.
Drawings
FIG. 1 is a flowchart illustrating steps of a sky image processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of another embodiment of a sky image processing method according to the present invention;
fig. 3 is a block diagram of an embodiment of a sky image processing apparatus according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a sky image processing method according to the present invention is applied to a mobile terminal having sky replacement image data stored thereon.
It should be noted that the mobile terminal may include a handheld mobile terminal, such as a mobile phone or a tablet computer, and may also include a terminal carried on moving machinery, such as a vehicle-mounted terminal. The embodiment of the present invention is not particularly limited in this respect.
The sky replacement image data may be material data related to a sky image or material data of a different type from sky; it is the image used to replace the sky portion of the video frame sky image. It may be stored in the internal storage space of the mobile terminal or in an external storage space of the mobile terminal.
The method may specifically comprise the steps of:
step 101, when the mobile terminal shoots, obtaining shot video stream data, wherein the video stream data comprises a video frame sky image;
in practical application, the mobile terminal may be configured with at least one camera, and when one of the cameras is started to shoot, video stream data shot by the camera for an actual scene may be acquired, where the video stream data includes a plurality of frames of video frame sky images.
Step 102, extracting high-frequency characteristics and low-frequency characteristics of the video frame sky image;
after the video frame sky image is obtained, high-frequency feature extraction and low-frequency feature extraction can be respectively performed on the video frame sky image so as to extract high-frequency features and low-frequency features of the video frame sky image.
Step 103, fusing the high-frequency features and the low-frequency features to generate global features;
The high-frequency features and the low-frequency features obtained above are fused at the same positions to generate matrix data, and this matrix data constitutes the global features.
Step 104, performing matrix decoding on the global features to generate mask data;
Because matrix data is easy for a computer to read but not for people to understand, the global features are matrix-decoded to restore natural semantic information that people can understand, generating the mask data.
Step 105, generating a sky-changing image on the video frame sky image by combining the mask data and the sky replacement image data.
The mask data and the sky replacement image data are combined layer by layer on the video frame sky image to generate a new sky image, replacing the sky portion of the video frame sky image.
In the embodiment of the invention, shot video stream data is obtained when the mobile terminal shoots, wherein the video stream data comprises a video frame sky image; high-frequency features and low-frequency features of the video frame sky image are extracted; the high-frequency features and the low-frequency features are fused to generate global features; the global features are matrix-decoded to generate mask data; and the mask data and the sky replacement image data are combined on the video frame sky image to generate a sky-changing image. By obtaining the shot video frame sky image, segmenting the sky portion of the captured image to obtain global features, generating mask data, and combining the mask data and the sky replacement image data on the video frame sky image to generate the sky-changing image, the embodiment of the invention replaces the sky portion of an image with another scene in real time, with high accuracy, a small parameter count, and low power consumption; the method is suitable for mobile terminals.
Referring to fig. 2, a flowchart illustrating steps of another embodiment of a sky image processing method according to the present invention is applied to a mobile terminal having sky replacement image data stored thereon. The mobile terminal may specifically be a vehicle-mounted terminal.
Step 201, when the mobile terminal shoots, obtaining shot video stream data, wherein the video stream data comprises a video frame sky image;
In practical application, while the vehicle is running, the vehicle-mounted camera captures road-condition images in real time, and the vehicle-mounted terminal pulls the stream to obtain video stream data. A single frame is extracted from the video stream data as a video frame sky image; depending on the device, the video stream yields 24 to 30 frames per second. In a preferred example of the present invention, since 24 frames/second is the frame rate at which the human eye perceives fluent motion, the video stream data may be sampled at 24 frames/second.
In addition, each single-frame video frame sky image can be preprocessed. The preprocessing may specifically include two steps:
1. converting the video frame sky image into matrix data, specifically, converting according to the following formula:
x = (image / 255 - mean) / std
where x is the transformed image matrix, mean is the per-channel mean of the training data set, and std is the per-channel standard deviation of the training data set; in the embodiments of the present invention, mean for the three RGB channels is [0.485, 0.456, 0.406], and std for the three RGB channels is [0.229, 0.224, 0.225].
2. Scaling the obtained matrix to a fixed size using bilinear interpolation. The fixed size may be set by a person skilled in the art as required, and the embodiment of the present invention does not specifically limit it. The product of width and height is proportional to the model's computation cost, while too small a size reduces the model's accuracy; in a preferred example of the present invention, the fixed size is 512 wide, 256 high, with 3 channels.
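The two preprocessing steps can be sketched in NumPy; the /255 scaling, the function name, and the output layout are assumptions consistent with the stated mean/std values and fixed size:

```python
import numpy as np

MEAN = np.array([0.485, 0.456, 0.406])  # per-channel RGB mean of the training set
STD = np.array([0.229, 0.224, 0.225])   # per-channel RGB standard deviation

def preprocess(frame, out_w=512, out_h=256):
    """Normalize an HxWx3 uint8 RGB frame, then bilinearly resize it."""
    img = frame.astype(np.float64) / 255.0   # scale pixels to [0, 1] (assumed)
    img = (img - MEAN) / STD                 # x = (image/255 - mean) / std
    # Bilinear interpolation to the fixed size
    h, w = img.shape[:2]
    ys = np.linspace(0.0, h - 1.0, out_h)    # target row coordinates
    xs = np.linspace(0.0, w - 1.0, out_w)    # target column coordinates
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None, None]            # vertical blend weights
    wx = (xs - x0)[None, :, None]            # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy         # shape (out_h, out_w, 3)
```

A real deployment would use an optimized resize (e.g. the framework's own), but the arithmetic is the same.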
Step 202, extracting high-frequency characteristics and low-frequency characteristics of the video frame sky image;
and extracting characteristics of the sky image of the video frame, and extracting high-frequency characteristics and low-frequency characteristics of the sky image of the video frame.
In an optional embodiment of the present invention, the step of extracting the high frequency feature and the low frequency feature of the sky image of the video frame includes:
substep S2021, carrying out fast down-sampling processing on the video frame sky image;
a fast down-sampling module in a neural network model can be adopted to carry out fast down-sampling processing on the sky image of the video frame. The fast down-sampling module consists of a convolution module, a batch normalization module and a nonlinear module.
The convolution module serves mainly to quickly extract and compress information from the video frame sky image; in one example of the invention, a 5 × 5 convolution kernel is used, so that the output of the video frame sky image has 1/16 the area of the input.
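The stated 1/16 area reduction implies halving each side twice, i.e. a stride of 4; a sketch of the standard convolution output-size formula, where stride 4 and padding 2 are assumptions (the text only states the 5 × 5 kernel and the area ratio):

```python
def conv_out(size, kernel=5, stride=4, pad=2):
    """Standard convolution output-size formula: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * pad - kernel) // stride + 1
```

With the 512 × 256 fixed size from the preprocessing step, a stride-4 5 × 5 convolution yields a 128 × 64 map, exactly 1/16 of the input area.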
The non-linear module may specifically be a ReLU6 (activation function) module, whose maximum output is limited to 6: any value greater than 6 is output as 6. This preserves good numerical resolution when mobile terminal devices run at low precision (float16/int8). The ReLU6 is defined as:
ReLU6(x) = min(max(x, 0), 6)
where x is the input data, typically in the form of a matrix.
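The definition above is a simple clamp; a one-line NumPy sketch:

```python
import numpy as np

def relu6(x):
    """ReLU6(x) = min(max(x, 0), 6): outputs are clamped to [0, 6]."""
    return np.minimum(np.maximum(x, 0.0), 6.0)
```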
Substep S2022, copying the processed video frame sky image to generate a first sky image and a second sky image;
After the fast down-sampling, the processed video frame sky image can be duplicated into two identical copies, one determined to be the first sky image and the other the second sky image. It should be noted that the first and second sky images are identical; the two copies exist only to distinguish which is sent to the high-frequency module and which to the low-frequency module.
Substep S2023, performing high-frequency feature extraction on the first sky image;
In practical application, the first sky image may be sent to a high-frequency encoding module, which performs high-frequency feature extraction on it.
Specifically, the step of extracting the high-frequency feature of the first sky image includes:
a substep S20231, extracting high-frequency semantic information in the first sky image by adopting a high-frequency coding mode;
substep S20232, determining the high frequency semantic information as the high frequency feature.
It should be noted that the high-frequency encoding module is an independent convolution unit whose input is identical to that of the low-frequency encoding module, and its output corresponds to the position, shape, and size of the sky. The high-frequency encoding module is similar in structure to the fast down-sampling module, but its convolution module is replaced by a depthwise separable convolution: a 3x3 depthwise convolution followed by a 1x1 pointwise convolution replaces a single 3x3 convolution, approximating its effect while reducing the amount of computation. In addition, the non-linear module in the high-frequency encoding module uses a PReLU activation function. The PReLU activation function is defined as:
PReLU(x) = x if x > 0, otherwise a * x
where a is a learnable parameter and the initial value is 0.25.
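The PReLU definition above, sketched in NumPy with the stated initial value a = 0.25 (a per-channel learnable a, as in a trained network, is omitted for brevity):

```python
import numpy as np

def prelu(x, a=0.25):
    """PReLU: identity for positive inputs, slope a for negative inputs."""
    return np.where(x > 0, x, a * x)
```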
Therefore, the high-frequency encoding mode of the high-frequency encoding module is used to extract the high-frequency semantic information in the first sky image. High-frequency semantic features are the features that appear in the high-frequency band after the image is transformed to the frequency domain; they represent the image's details and sharp textures. The high-frequency semantic information is then determined as the high-frequency features.
Substep S2024, extracting low-frequency features of the second sky image.
In practical application, the second sky image may be sent to a low-frequency encoding module, which performs low-frequency feature extraction on it; the low-frequency encoding module is used to extract the low-frequency semantic information in the second sky image.
Specifically, the step of extracting low-frequency features of the second sky image includes:
a substep S20241, extracting low-frequency semantic information in the second sky image by using a low-frequency coding mode;
sub-step S20242 determines the low frequency semantic information to be the low frequency feature.
In the embodiment of the invention, the position, the shape and the size of the sky can be corresponded, and the related information in the input data can be further refined by training the learned low-frequency coding module.
The low-frequency encoding module uses the backbone structure of ShuffleNet V2, replacing ordinary convolutions with depthwise separable convolutions, which greatly reduces the model's computation with little loss of accuracy. It also optimizes the computationally expensive 1x1 convolutions within the depthwise separable convolutions, using grouped 1x1 convolutions combined with random channel shuffling, at much lower cost, to approximate the effect of a conventional pointwise convolution.
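The computational saving of a depthwise separable convolution can be checked with a small multiplication count; the spatial size and channel counts below are illustrative:

```python
def standard_conv_mults(h, w, c_in, c_out, k=3):
    """Multiplications for a dense k x k convolution over an h x w feature map."""
    return h * w * c_in * c_out * k * k

def depthwise_separable_mults(h, w, c_in, c_out, k=3):
    """k x k depthwise pass plus a 1 x 1 pointwise pass."""
    return h * w * c_in * k * k + h * w * c_in * c_out
```

The ratio of the two costs is roughly 1/c_out + 1/k², so for a 3 × 3 kernel with 32 output channels the separable form needs about a seventh of the multiplications.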
Therefore, the low-frequency encoding mode of the low-frequency encoding module is used to extract the low-frequency semantic information in the second sky image; low-frequency semantic features characterize the general outline and approximate edges of the sky portion. The low-frequency semantic information is then determined as the low-frequency features.
Step 203, fusing the high-frequency features and the low-frequency features to generate global features;
it should be noted that the fusion of the characteristic high-frequency characteristic and the low-frequency characteristic is to fuse the characteristic data output by the receptive field enhancement module in the low-frequency extraction module with the characteristic data output by the high-frequency coding module. Because the existence forms of the feature data in the neural network are all matrix forms, the global features can be generated by fusing the matrix data respectively corresponding to the high-frequency features and the low-frequency features.
Specifically, the global features may be generated by combining the high-frequency features and the low-frequency features in the channel direction through a preset matrix.
In practical application, combining the matrices in the channel direction carries every element of the two matrices into the result; the generated matrix is the global feature. The global features thus obtained preserve both the high-frequency and the low-frequency information as much as possible.
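Channel-direction merging is plain concatenation along the channel axis; a NumPy sketch with illustrative shapes:

```python
import numpy as np

# Hypothetical feature maps: height x width x channels
high_freq = np.ones((64, 128, 32))   # output of the high-frequency encoder
low_freq = np.zeros((64, 128, 32))   # output of the low-frequency branch

# Merging "in the channel direction" = concatenation along the channel axis,
# so both frequency bands are preserved side by side in the global features.
global_feat = np.concatenate([high_freq, low_freq], axis=-1)
```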
Step 204, performing matrix decoding on the global features to generate mask data;
the global features can be input into a decoding module for matrix decoding, and the coded information is restored into natural semantic information which can be understood by people, so that mask data is generated. The decoding module and the high-frequency coding module have the same structure.
Specifically, the global features are input into the decoding module, which outputs a matrix whose values all lie between 0 and 1; this matrix, whose size corresponds to the video frame sky image, is determined to be the mask data.
Step 205, performing guided filtering processing on the mask data;
Guided filtering is applied to the mask data to refine it, and the video frame sky image and the mask data are scaled to the same size. The smaller this size, the shorter the overall time consumed by the guided filtering, but the greater the loss of precision; the specific size can be determined by those skilled in the art according to actual task requirements and effects.
Note that guided filtering requires a guide image, which may be a separate image or the input image itself. In the embodiment of the invention, the video frame sky image is used as the guide image, and refined mask data is obtained by computing the guided filtering function. The guided filtering function is:
mean_x = filter(x) / N
mean_y = filter(y) / N
cov_xy = filter(x * y) / N - mean_x * mean_y
var_y = filter(y * y) / N - mean_y * mean_y
a = cov_xy / (var_y + eps)
b = mean_x - a * mean_y
fine_mask = (filter(a) / N) * y + filter(b) / N
where filter is the guiding filter, N is an all-ones constant matrix of the same size as the input image (N remains constant after filtering), and eps is a small constant whose main function is to keep the denominator from being zero and to avoid numerical anomalies in extreme cases, such as a picture with no sky; eps is 0.002. x and y are, respectively, the mask data matrix and the video frame sky image used as the guide, both scaled to the same fixed size; the series of operations above yields fine_mask, the refined mask data fine-tuned by guided filtering.
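The guided filtering step can be sketched with NumPy following the standard single-channel guided filter; the box-filter window radius r is an assumption, while eps = 0.002 follows the text:

```python
import numpy as np

def box_filter(img, r):
    """Sum over a (2r+1) x (2r+1) window via 2-D cumulative sums."""
    h, w = img.shape
    cum = np.cumsum(np.cumsum(img, axis=0), axis=1)
    cum = np.pad(cum, ((1, 0), (1, 0)))      # cum[i, j] = sum of img[:i, :j]
    y0 = np.clip(np.arange(h) - r, 0, h); y1 = np.clip(np.arange(h) + r + 1, 0, h)
    x0 = np.clip(np.arange(w) - r, 0, w); x1 = np.clip(np.arange(w) + r + 1, 0, w)
    return cum[y1][:, x1] - cum[y0][:, x1] - cum[y1][:, x0] + cum[y0][:, x0]

def guided_filter(guide, src, r=4, eps=0.002):
    """Refine src (the mask) using guide (the video frame sky image)."""
    N = box_filter(np.ones_like(guide), r)   # per-pixel window counts
    mean_g = box_filter(guide, r) / N
    mean_s = box_filter(src, r) / N
    cov_gs = box_filter(guide * src, r) / N - mean_g * mean_s
    var_g = box_filter(guide * guide, r) / N - mean_g ** 2
    a = cov_gs / (var_g + eps)               # eps keeps the denominator nonzero
    b = mean_s - a * mean_g
    mean_a = box_filter(a, r) / N
    mean_b = box_filter(b, r) / N
    return mean_a * guide + mean_b           # fine_mask
```

A constant mask passes through unchanged, which is a handy sanity check: the covariance term vanishes, so a = 0 and b equals the mask value.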
Step 206, generating a sky-changing image by combining the mask data and the sky replacement image data;
In practical application, the refined mask data can be scaled back to the size of the original image, and the sky replacement image data and the mask data are then fused onto the video frame sky image in proportion, based on a fusion formula, visually producing the sky-changing effect. Specifically, the fusion formula is:
output = mask * sky + (1 - mask) * origin
where sky is the replacement sky image, origin is the original video frame sky image, and mask is the mask data: the closer a mask value is to 1, the more likely the model considers that pixel to be sky; the closer to 0, the less likely. After the above formula is computed, the replacement sky and the original image are fused in proportion. It should be noted that all of the above formulas are matrix operations.
And step 207, performing image post-processing on the sky-changing image to generate an effect image.
On the basis of the sky-changing image, the color temperature, hue, and brightness are adjusted to make the target image more realistic. The adjustment values depend on the sky-changing image: for example, a night sky calls for a lower color temperature and lower brightness, and vice versa. Those skilled in the art can determine the values according to actual needs; the embodiment of the present invention is not limited in this respect.
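A hypothetical post-processing sketch; the brightness gain and the red/blue channel shifts used to mimic a color-temperature change are illustrative, not the embodiment's actual adjustment curves:

```python
import numpy as np

def postprocess(img, brightness=1.0, warmth=0.0):
    """Scale brightness and shift the red/blue channels to mimic a color
    temperature change: warmth > 0 warms the image, warmth < 0 cools it."""
    out = img.astype(np.float64) * brightness
    out[..., 0] += warmth * 10.0     # red channel up for warmer tones
    out[..., 2] -= warmth * 10.0     # blue channel down
    return np.clip(out, 0.0, 255.0).astype(np.uint8)
```

For a night-sky replacement, negative warmth and brightness below 1.0 would match the adjustment direction described above.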
In the embodiment of the invention, video stream data captured by a vehicle-mounted camera is acquired and cut into video frame sky images, and each single-frame image is preprocessed; the resulting matrix is fast down-sampled, and the resulting feature encoding matrix is sent to the low-frequency and high-frequency encoding modules respectively; the data processed by the low-frequency encoding module is sent to a receptive-field enhancement module; the data processed by the high-frequency encoding module and the data processed by the receptive-field enhancement module are feature-fused to generate global features; the global features are processed by the decoding module to obtain sky-segmentation mask data; the mask data and the video frame sky image are processed together by the guided filtering module to obtain refined sky-segmentation mask data; the sky replacement image data and the mask data act together on the video frame sky image to obtain a sky-changing image; and post-processing such as color temperature, hue, and brightness adjustment is applied to obtain the final result image. By rapidly segmenting and replacing the sky of each frame image, frame by frame, a real-time sky-changing effect is achieved, with high precision, a small parameter count, and low power consumption.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 3, a block diagram of a sky image processing apparatus according to an embodiment of the present invention is shown, where the apparatus is applied to a mobile terminal, and the mobile terminal stores sky replacement image data thereon, and the apparatus may specifically include the following modules:
an obtaining module 301, configured to obtain video stream data that is shot when the mobile terminal shoots, where the video stream data includes a video frame sky image;
an extracting module 302, configured to extract a high-frequency feature and a low-frequency feature of the video frame sky image;
a fusion module 303, configured to fuse the high-frequency feature and the low-frequency feature to generate a global feature;
a decoding module 304, configured to perform matrix decoding on the global features to generate mask data;
a combining module 305, configured to combine the mask data and the sky replacement image data on the video frame sky image to generate a sky-changing image.
In an optional embodiment of the invention, the apparatus further comprises:
and the post-processing module is used for performing image post-processing on the sky-changing image to generate an effect image.
In an optional embodiment of the present invention, the extracting module 302 includes:
the fast down-sampling sub-module is used for carrying out fast down-sampling processing on the video frame sky image;
the copying submodule is used for copying the processed video frame sky image to generate a first sky image and a second sky image;
the first extraction submodule is used for extracting high-frequency features of the first sky image;
and the second extraction submodule is used for extracting the low-frequency characteristics of the second sky image.
In an optional embodiment of the invention, the second extraction sub-module comprises:
the first extraction unit is used for extracting low-frequency semantic information in the second sky image in a low-frequency coding mode;
and the first determining unit is used for determining the low-frequency semantic information as the low-frequency feature.
In an optional embodiment of the invention, the first extraction sub-module comprises:
the second extraction unit is used for extracting high-frequency semantic information in the first sky image in a high-frequency coding mode;
and the second determining unit is used for determining the high-frequency semantic information as the high-frequency feature.
In an optional embodiment of the present invention, the fusion module 303 includes:
and the merging submodule is used for merging the high-frequency features and the low-frequency features in the channel direction through a preset matrix to generate global features.
In an optional embodiment of the invention, the apparatus further comprises:
a guided filtering module, configured to perform guided filtering processing on the mask data.
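The guided ("guiding") filtering step smooths the mask while keeping it crisp along edges of a guide image (typically the gray-scale frame itself). A minimal gray-scale guided filter in the style of He et al., built from box means, can be sketched as follows; radius and epsilon values, and the use of the frame's luma as the guide, are assumptions rather than details from the patent.

```python
import numpy as np

def _box(img, r):
    """Mean filter over a (2r+1) x (2r+1) window, edge-padded."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode='edge')
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def guided_filter(guide, mask, r=2, eps=1e-3):
    """Edge-preserving smoothing of `mask` steered by `guide`.

    guide, mask: H x W float arrays in [0, 1].
    Returns a refined mask whose transitions follow edges in the guide.
    """
    mean_i = _box(guide, r)
    mean_p = _box(mask, r)
    cov_ip = _box(guide * mask, r) - mean_i * mean_p
    var_i = _box(guide * guide, r) - mean_i * mean_i
    a = cov_ip / (var_i + eps)        # local linear coefficients
    b = mean_p - a * mean_i
    return _box(a, r) * guide + _box(b, r)

refined = guided_filter(np.full((8, 8), 0.25), np.full((8, 8), 0.7))
```

On a constant guide and constant mask the filter is an identity on the mask value, which is a quick correctness check; on real frames it sharpens the sky boundary of the decoded mask before compositing.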
Since the device embodiments are substantially similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding parts of the method embodiment description.
An embodiment of the present invention further provides an electronic device, including:
a processor and a storage medium, the storage medium storing a computer program executable by the processor, wherein when the electronic device runs, the processor executes the computer program to perform the method of any one of the embodiments of the invention. The specific implementation and technical effects are similar to those of the method embodiments and are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the method of any one of the embodiments of the present invention. The specific implementation and technical effects are similar to those of the method embodiments and are not repeated here.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they become aware of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal that comprises the element.
The sky image processing method and apparatus provided by the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of these examples is intended only to help in understanding the method and its core concept. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application; in summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A sky image processing method applied to a mobile terminal having sky replacement image data stored thereon, the method comprising:
when the mobile terminal shoots, shooting video stream data is obtained, wherein the video stream data comprises a video frame sky image;
extracting high-frequency characteristics and low-frequency characteristics of the video frame sky image;
fusing the high-frequency features and the low-frequency features to generate global features;
performing matrix decoding on the global features to generate mask data;
and combining the mask data and the sky replacement image data on the video frame sky image to generate a sky-changing image.
2. The method of claim 1, further comprising:
and performing image post-processing on the sky-changing image to generate an effect image.
3. The method of claim 1, wherein the step of extracting high frequency features and low frequency features of the video frame sky image comprises:
performing fast down-sampling processing on the video frame sky image;
copying the processed video frame sky image to generate a first sky image and a second sky image;
performing high-frequency feature extraction on the first sky image;
and extracting low-frequency features of the second sky image.
4. The method of claim 3, wherein the step of low frequency feature extraction of the second sky image comprises:
extracting low-frequency semantic information in the second sky image in a low-frequency coding mode;
and determining the low-frequency semantic information as the low-frequency features.
5. The method of claim 3, wherein the step of high frequency feature extraction of the first sky image comprises:
extracting high-frequency semantic information in the first sky image in a high-frequency coding mode;
and determining the high-frequency semantic information as the high-frequency features.
6. The method of claim 1, wherein the fusing the high frequency features and the low frequency features to generate global features comprises:
and combining the high-frequency features and the low-frequency features in the channel direction through a preset matrix to generate global features.
7. The method of claim 1, wherein after the step of matrix decoding the global features to generate mask data, the method further comprises:
and performing guiding filtering processing on the mask data.
8. A sky image processing apparatus applied to a mobile terminal having sky replacement image data stored thereon, the apparatus comprising:
an acquisition module, configured to acquire captured video stream data when the mobile terminal shoots, wherein the video stream data comprises a video frame sky image;
an extraction module, configured to extract high-frequency features and low-frequency features of the video frame sky image;
a fusion module, configured to fuse the high-frequency features and the low-frequency features to generate global features;
a decoding module, configured to perform matrix decoding on the global features to generate mask data;
and a combining module, configured to combine the mask data and the sky replacement image data on the video frame sky image to generate a sky-changing image.
9. An electronic device comprising a processor, a memory and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the sky image processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the sky image processing method according to any one of claims 1 to 7.
CN202210395165.8A 2022-04-15 2022-04-15 Sky image processing method and device Active CN114494004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210395165.8A CN114494004B (en) 2022-04-15 2022-04-15 Sky image processing method and device


Publications (2)

Publication Number Publication Date
CN114494004A CN114494004A (en) 2022-05-13
CN114494004B true CN114494004B (en) 2022-08-05

Family

ID=81489353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210395165.8A Active CN114494004B (en) 2022-04-15 2022-04-15 Sky image processing method and device

Country Status (1)

Country Link
CN (1) CN114494004B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108391060A (en) * 2018-03-26 2018-08-10 华为技术有限公司 Image processing method, image processing apparatus and terminal
CN111179282A (en) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, storage medium, and electronic device
US10740609B1 (en) * 2019-08-30 2020-08-11 Numerica Corporation System and method for space object detection in daytime sky images
CN113674165A (en) * 2021-07-27 2021-11-19 浙江大华技术股份有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN114037641A (en) * 2021-10-19 2022-02-11 山东信通电子股份有限公司 Low-illumination image enhancement method, device, equipment and medium


Also Published As

Publication number Publication date
CN114494004A (en) 2022-05-13

Similar Documents

Publication Publication Date Title
Rao et al. A Survey of Video Enhancement Techniques.
CN110008817B (en) Model training method, image processing method, device, electronic equipment and computer readable storage medium
CN111028177B (en) Edge-based deep learning image motion blur removing method
CN107403415B (en) Compressed depth map quality enhancement method and device based on full convolution neural network
CN109102483B (en) Image enhancement model training method and device, electronic equipment and readable storage medium
CN111275626A (en) Video deblurring method, device and equipment based on ambiguity
CN110610463A (en) Image enhancement method and device
DE112013004507T5 (en) Image processing apparatus, image capturing apparatus, image processing method, program and recording medium
CN102369722A (en) Imaging device and method, and image processing method for imaging device
Din et al. Effective removal of user-selected foreground object from facial images using a novel GAN-based network
CN113450290A (en) Low-illumination image enhancement method and system based on image inpainting technology
Eilertsen et al. How to cheat with metrics in single-image HDR reconstruction
CN113096021A (en) Image processing method, device, equipment and storage medium
Wu et al. LiTMNet: A deep CNN for efficient HDR image reconstruction from a single LDR image
Byun et al. BitNet: Learning-based bit-depth expansion
Panetta et al. Deep perceptual image enhancement network for exposure restoration
Yin et al. Two exposure fusion using prior-aware generative adversarial network
CN112200732A (en) Video deblurring method with clear feature fusion
Yadav et al. Frequency-domain loss function for deep exposure correction of dark images
CN114494004B (en) Sky image processing method and device
CN113496472A (en) Image defogging model construction method, road image defogging device and vehicle
CN111932594A (en) Billion pixel video alignment method and device based on optical flow and medium
US20220398704A1 (en) Intelligent Portrait Photography Enhancement System
CN114119428B (en) Image deblurring method and device
CN111754412A (en) Method and device for constructing data pairs and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant