CN113269676B

CN113269676B - Panoramic image processing method and device

Info

Publication number: CN113269676B
Application number: CN202110547908.4A
Authority: CN
Inventors: 邓欣; 王昊; 徐迈; 关振宇; 李大伟
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2023-01-10
Anticipated expiration: 2041-05-19
Also published as: CN113269676A

Abstract

The utility model relates to a panoramic image processing method and a device, which inputs the panoramic image to be processed into a convolution neural network model trained in advance by acquiring the panoramic image to be processed, wherein the convolution neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the panoramic image to be processed is processed by the first processing layer to obtain first characteristic information, a first dimension zone and a first residual zone, the first characteristic information and the first residual zone are processed by the second processing layer to obtain a first characteristic image, the first dimension zone and the first characteristic image are processed by the synthesis layer to obtain a first panoramic image, and the convolution neural network model is constructed and generated to rapidly and efficiently improve the resolution of the panoramic image, ensure the quality of the resolution of the panoramic image and save the calculation resources.

Description

Panoramic image processing method and device

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to a panoramic image processing method and apparatus.

Background

With the rapid development of virtual reality technology, panoramic multimedia plays an increasingly important role in human life, and when a panoramic image is viewed by using the panoramic multimedia, the panoramic image is required to have extremely high resolution, but due to the limitations of acquisition, storage and transmission, the resolution of the panoramic image is generally low at present.

At present, a super-resolution method is mostly adopted to restore a single or a series of low-resolution images into high-resolution images, but the existing super-resolution method aims at two-dimensional plane images and is not suitable for panoramic images; or the spherical panoramic image is projected onto a two-dimensional plane by adopting equal-moment cylindrical projection (ERP), but the adoption of the projection method can cause uneven pixel density at different latitudes, and particularly, geometric distortion and obvious stretching deformation are easy to occur in high-latitude areas in the panoramic image; or the panoramic image sequence is combined into the panoramic image with high resolution by adopting other methods, however, the panoramic image obtained by the method does not consider the change of the pixel density on the dimension, and the matching precision of the obtained super-resolution panoramic image is not high.

Disclosure of Invention

In order to solve the above technical problems or at least partially solve the above technical problems, the present disclosure provides a panoramic image processing method and apparatus, which solve the problem of improving the resolution of a panoramic image.

In a first aspect, an embodiment of the present disclosure provides a panoramic image processing method, including:

acquiring a panoramic image to be processed;

inputting a panoramic image to be processed into a convolutional neural network model which is trained in advance, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, processing the panoramic image to be processed through the first processing layer to obtain first characteristic information, a first dimension band and a first residual band, processing the first characteristic information and the first residual band through the second processing layer to obtain a first characteristic image, and processing the first dimension band and the first characteristic image through the synthesis layer to obtain the first panoramic image.

Optionally, the first processing layer includes a plurality of sub-processing layers, where the sub-processing layers include a feature extraction layer, a spatial segmentation layer, and a discrimination layer;

the second processing layer comprises a feature extraction layer;

the resolution of the first panoramic image is greater than or equal to a preset resolution.

Optionally, the processing the panoramic image to be processed by the first processing layer to obtain first feature information, a first dimensional band and a first residual band includes:

performing feature extraction on a panoramic image to be processed through a feature extraction layer to obtain first feature information;

segmenting the first characteristic information through a space segmentation layer to obtain stripe data;

and judging the stripe data through the judging layer to obtain a first dimension band and a first residual band.

Optionally, the convolutional neural network model further includes a sub-pixel convolutional layer and a deconvolution layer, and the first feature information is segmented by the spatial segmentation layer to obtain stripe data, including:

sampling the first characteristic information by adopting the sub-pixel convolution layer to obtain first sampling data;

if the current sub-processing layer is the first sub-processing layer in the first processing layer, sampling the panoramic image to be processed by adopting a deconvolution layer to obtain second sampling data;

if the current sub-processing layer is not the first sub-processing layer in the first processing layer, sampling a first residual band obtained by the last sub-processing layer of the current sub-processing layer by using the deconvolution layer to obtain second sampling data;

obtaining a second characteristic image according to the first sampling data and the second sampling data;

and segmenting the second characteristic image according to the dimensionality of the panoramic image to be processed through the space segmentation layer to obtain stripe data.

Optionally, the processing the first feature information and the first residual band by the second processing layer to obtain a first feature image includes:

performing feature extraction on the first feature information through a feature extraction layer in the second processing layer to obtain second feature information;

sampling the second characteristic information by adopting the sub-pixel convolution layer to obtain updated first sampling data;

sampling the first residual band by using the deconvolution layer to obtain updated second sampling data;

and obtaining a first characteristic image according to the updated first sampling data and the updated second sampling data.

Optionally, the processing the first dimensional band and the first feature image through the synthesis layer to obtain a first panoramic image includes:

and sampling the first dimension band through the synthesis layer, and combining the sampled first dimension band and the first characteristic image according to the segmentation rule of the space segmentation layer to obtain a first panoramic image.

Optionally, before the panoramic image to be processed is input to the convolutional neural network model trained in advance, the method further includes:

acquiring a first panoramic image sample and a first tag panoramic image corresponding to the first panoramic image sample;

training a feature extraction layer, a second processing layer and a synthesis layer in a first processing layer according to the first panoramic image sample and the first label panoramic image to obtain an initial neural network model;

acquiring a second panoramic image sample and a second label panoramic image corresponding to the second panoramic image sample;

and training the initial neural network model, the space segmentation layer in the first processing layer and the discrimination layer in the first processing layer according to the second panoramic image sample and the second label panoramic image to obtain a convolutional neural network model.

Optionally, training a first feature extraction layer, a second processing layer, and a synthesis layer in the first processing layer according to the first panoramic image sample and the first tag panoramic image to obtain an initial neural network model, including:

inputting a first panoramic image sample into a network frame constructed by a feature extraction layer, a second processing layer and a synthesis layer in a first processing layer to obtain first prediction data;

obtaining a first loss function according to the first prediction data and the first label panoramic image;

and updating the parameters of the feature extraction layer, the parameters of the second processing layer and the parameters of the synthesis layer in the first processing layer according to the first loss function to obtain an initial neural network model.

Optionally, training the initial neural network model, the spatial segmentation layer in the first processing layer, and the discrimination layer in the first processing layer according to the second panoramic image sample and the second tag panoramic image to obtain a convolutional neural network model, including:

constructing a frame of a convolutional neural network model consisting of a network layer in the initial neural network model, a space division layer in the first processing layer and a discrimination layer in the first processing layer;

inputting the second panoramic image sample into a frame of a convolutional neural network model to obtain second prediction data;

obtaining a second loss function and an evaluation index corresponding to the discrimination layer according to the second prediction data and the second label panoramic image;

and updating parameters of each layer in the frame of the convolutional neural network model according to the second loss function and the evaluation index to obtain the convolutional neural network model.

In a second aspect, an embodiment of the present disclosure provides a panoramic image processing apparatus, including:

the acquisition module is used for acquiring a panoramic image to be processed;

the processing module is used for inputting the panoramic image to be processed into a convolutional neural network model which is trained in advance, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the panoramic image to be processed is processed through the first processing layer to obtain first characteristic information, a first dimension band and a first residual band, the first characteristic information and the first residual band are processed through the second processing layer to obtain a first characteristic image, and the first dimension band and the first characteristic image are processed through the synthesis layer to obtain the first panoramic image.

The panoramic image processing method and the panoramic image processing device provided by the embodiment of the disclosure can rapidly and efficiently improve the resolution of the panoramic image by acquiring the panoramic image to be processed and inputting the panoramic image to be processed into the convolutional neural network model trained in advance, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the first processing layer is used for processing the panoramic image to be processed to obtain first characteristic information, a first dimension band and a first residual band, the second processing layer is used for processing the first characteristic information and the first residual band to obtain a first characteristic image, and the synthesis layer is used for processing the first dimension band and the first characteristic image to obtain the first panoramic image.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.

In order to more clearly illustrate the embodiments or technical solutions in the prior art of the present disclosure, the drawings used in the description of the embodiments or prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.

Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present disclosure;

fig. 2 is a schematic flowchart of a training method of a convolutional neural network model according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of a convolutional neural network model provided in an embodiment of the present disclosure;

fig. 4 is a schematic flowchart of a training method of a convolutional neural network model according to an embodiment of the present disclosure;

fig. 5 is a schematic flowchart of a training method of a convolutional neural network model according to an embodiment of the present disclosure;

fig. 6 is a schematic flowchart of a training method for a discrimination layer according to an embodiment of the present disclosure;

fig. 7 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure;

fig. 8 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure;

fig. 9 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure;

fig. 10 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure;

fig. 11 is a schematic structural diagram of a panoramic image processing apparatus according to an embodiment of the present disclosure.

Detailed Description

In order that the above objects, features and advantages of the present disclosure may be more clearly understood, aspects of the present disclosure will be further described below. It should be noted that the embodiments and features of the embodiments of the present disclosure may be combined with each other without conflict.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure, but the present disclosure may be practiced otherwise than as described herein; it is to be understood that the embodiments disclosed in the specification are only a few embodiments of the present disclosure, and not all embodiments.

Specifically, with the rapid development of virtual reality technology, panoramic multimedia plays an increasingly important role in human life, and when viewing panoramic images (ODI), people can obtain immersive interactive experience by changing the viewing angle within a range of 360 × 180 °. Typically, a person views a panoramic image through a Head Mounted Display (HMD), in which only a 110 × 110 ° range of viewing windows is visible. In order to have a High Resolution (HR) for this view window, the entire panoramic image needs to be of extremely high resolution. However, the resolution of panoramic images is currently generally low due to acquisition, storage and transmission limitations.

Specifically, the panoramic image processing method may be performed by a terminal or a server. Specifically, the terminal or the server may process the panoramic image to be processed through the convolutional neural network model, so as to obtain the panoramic image with the processed resolution higher than the preset resolution. The execution subject of the training method of the convolutional neural network model and the execution subject of the panoramic image processing method may be the same or different.

For example, in one application scenario, as shown in FIG. 1, the server 12 trains a convolutional neural network model. The terminal 11 obtains the trained convolutional neural network model from the server 12, and the terminal 11 processes the panoramic image through the trained convolutional neural network model, so as to improve the resolution of the panoramic image to be processed. The panoramic image may be captured by the terminal 11. Alternatively, the panoramic image is acquired by the terminal 11 from another device. Still alternatively, the panoramic image is an image obtained by the terminal 11 performing image processing on a preset image, where the preset image may be obtained by shooting with the terminal 11, or the preset image may be obtained by the terminal 11 from another device. Here, the other devices are not particularly limited.

In another application scenario, the server 12 trains a convolutional neural network model. Further, the server 12 processes the panoramic image through the trained convolutional neural network model, so as to improve the resolution of the panoramic image to be processed. The manner in which the server 12 acquires the panoramic image may be similar to the manner in which the terminal 11 acquires the panoramic image as above, and will not be described herein.

In yet another application scenario, the terminal 11 trains a convolutional neural network model. Further, the terminal 11 processes the panoramic image through the trained convolutional neural network model, so as to improve the resolution of the panoramic image to be processed.

It can be understood that the training method of the convolutional neural network model and the panoramic image processing method provided by the embodiments of the present disclosure are not limited to the above several possible scenarios. Since the trained convolutional neural network model can be applied to the panoramic image processing method, before introducing the panoramic image processing method, a training method of the convolutional neural network model may be introduced below.

Taking the server 12 training the convolutional neural network model as an example, a training method of the convolutional neural network model, that is, a training process of the convolutional neural network model, is described below. It is understood that the convolutional neural network model training method is also applicable to the scenario in which the terminal 11 trains the convolutional neural network model.

Fig. 2 is a schematic flow chart of a training method of a convolutional neural network model according to an embodiment of the present disclosure, including S210 to S240 shown in fig. 2:

s210, a first panoramic image sample and a first tag panoramic image corresponding to the first panoramic image sample are obtained.

Understandably, a panoramic image refers to a 360-degree panorama photographed by a panoramic camera containing image information from various directions of the camera. The method includes the steps of obtaining a first panoramic image sample, wherein the first panoramic image sample can be a panoramic image with a resolution smaller than a preset resolution, and the first tag panoramic image can be a panoramic image with a resolution greater than or equal to the preset resolution corresponding to the first panoramic image sample, that is, in a model training process, the first tag panoramic image is an accurate reference (tag) panoramic image obtained after the first panoramic image sample is subjected to panoramic processing.

Exemplarily, fig. 3 is a schematic diagram of a network structure of a convolutional neural network model according to an embodiment of the present disclosure, where a convolutional neural network framework 300 constructed in this embodiment includes: a sub-processing layer 310 in the first processing layer, a sub-processing layer 320 in the first processing layer, a second processing layer 330 and a composite layer 340, wherein the first processing layer in the frame 300 may include one or more sub-processing layers, fig. 3 includes two sub-processing layers 310 and 320, each sub-processing layer 310 or 320 includes a feature extraction layer, a spatial division layer and a discrimination layer, the second processing layer 330 includes a feature processing layer having the same structure as the feature extraction layer in the sub-processing layer, the feature extraction layer in the second processing layer 330 has the same structure as the feature extraction layer in the sub-processing layer, parameters may be different, and different magnification factors may be set by setting the number of sub-processing layers in the first processing layer in the frame 300 to achieve an improvement in resolution of the panoramic image, for example, setting the magnification factor to 2 ^j Taking the network structure of fig. 3 as an example, the amplification factor in the sub-processing layer 310 is 2 times, the amplification factor in the sub-processing layer 320 is 4 times, the amplification factor in the second processing layer 330 is 8 times, and so on.

S220, training a feature extraction layer, a second processing layer and a synthesis layer in the first processing layer according to the first panoramic image sample and the first label panoramic image to obtain an initial neural network model.

Understandably, according to the first panoramic image sample data, in the first training of the convolutional neural network model, only the feature extraction layer in the first processing layer, the feature extraction layer in the second processing layer and the synthesis layer are trained, and the space segmentation layer and the discrimination layer in the first processing layer do not participate in the first training to obtain the initial neural network model.

And S230, acquiring a second panoramic image sample and a second tag panoramic image corresponding to the second panoramic image sample.

Understandably, a second panoramic image sample is obtained, where the second panoramic image sample may be a panoramic image with a resolution smaller than a preset resolution, and the second tagged panoramic image may be a panoramic image with a resolution greater than or equal to the preset resolution corresponding to the second panoramic image sample, that is, in the model training process, the second tagged panoramic image is an accurate reference (tagged) panoramic image obtained after the second panoramic image sample is subjected to panoramic processing.

S240, training the initial neural network model, the space division layer in the first processing layer and the discrimination layer in the first processing layer according to the second panoramic image sample and the second label panoramic image to obtain a convolutional neural network model.

Understandably, on the basis of the trained initial neural network model, the network structure in the initial neural network model, the spatial segmentation layer and the discrimination layer in the first processing layer are jointly trained according to the second panoramic image sample and the second label panoramic image, namely the network structure of the whole convolutional neural network included in the training graph 3, and on the basis of the completion of the training of each feature extraction layer and the synthesis layer, the spatial segmentation layer and the discrimination layer are jointly trained, so that the time of model training can be effectively reduced, and the accuracy of the model is further improved.

According to the training method of the convolutional neural network model, the initial neural network model is obtained by obtaining the first panoramic image sample and the first label panoramic image corresponding to the first panoramic image sample and training the feature extraction layer, the second processing layer and the synthesis layer in the first processing layer according to the first panoramic image sample and the first label panoramic image, and the network structure of the whole convolutional neural network model is trained twice, so that the training time can be effectively reduced, and the accuracy of the model is improved to a certain extent.

Fig. 4 is a schematic flowchart of a training method of a convolutional neural network model according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, according to the first panoramic image sample and the first tag panoramic image, training a first feature extraction layer, a second processing layer, and a synthesis layer in the first processing layer is performed to obtain an initial neural network model, the training of the whole convolutional neural network model is divided into two parts, and the training of the first part is performed to obtain the initial neural network model, which includes steps S410 to S430 shown in fig. 4:

s410, inputting the first panoramic image sample into a network framework constructed by a feature extraction layer, a second processing layer and a synthesis layer in the first processing layer to obtain first prediction data.

Understandably, the first panoramic image sample is input into a network framework constructed by a feature extraction layer in a first processing layer, a feature extraction layer in a second processing layer and a synthesis layer to obtain first prediction data, wherein the first prediction data is obtained by processing the first panoramic image sample through the network framework, and the first prediction data is data for improving or reducing the resolution of the first panoramic image sample to a certain extent.

And S420, obtaining a first loss function according to the first prediction data and the first label panoramic image.

Understandably, on the basis of the above S410, a first loss function is calculated according to the first prediction data obtained by inputting the first panoramic image sample into the network frame and the tag data of the first panoramic image sample, namely the first tagged panoramic image, and the network parameters of each layer in the network frame are updated by using the first loss function, where the calculation formula of the first loss function is shown in formula (1):

wherein L is _j Representing a first loss function, N representing the number of training samples, i.e. the number of first panoramic image samples, W _j Is a weight matrix defining the importance of each pixel according to its latitude, j represents the number of layers in the network framework, mainly including the number of levels of the first and second processing layers,

which represents the first prediction data, is,

representing a first tagged panoramic image.

And S430, updating the parameters of the feature extraction layer, the parameters of the second processing layer and the parameters of the synthesis layer in the first processing layer according to the first loss function to obtain an initial neural network model.

Understandably, on the basis of the above S420, according to the first loss function obtained by calculation, the network parameters of each feature extraction layer and each synthesis layer in the network framework are updated, and the initial neural network model is stored and generated.

The convolutional neural network model training method includes inputting a first panoramic image sample into a network framework constructed by a feature extraction layer, a second processing layer and a synthesis layer in a first processing layer to obtain first prediction data, obtaining a first loss function according to the first prediction data and a first label panoramic image, updating parameters of the feature extraction layer, the second processing layer and the synthesis layer in the first processing layer according to the first loss function to obtain an initial neural network model, calculating the first loss function of the network framework in the initial neural network model in training of a first part, updating parameters of the network framework in the network framework according to the first loss function, and ensuring accuracy of the model.

Fig. 5 is a schematic flowchart of a convolutional neural network model training method according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, according to the second panoramic image sample and the second tag panoramic image, training the initial neural network model, the spatial division layer in the first processing layer, and the discrimination layer in the first processing layer to obtain a convolutional neural network model, where the training of the entire convolutional neural network model is divided into two parts, and after the initial neural network model is obtained through the training of the first part, training of the second part is performed to obtain the convolutional neural network model, where the training of the second part is performed to obtain the convolutional neural network model, and the method includes, as shown in fig. 5, S510 to S540:

s510, constructing a frame of a convolutional neural network model consisting of a network layer in the initial neural network model, a space division layer in the first processing layer and a discrimination layer in the first processing layer.

Understandably, a frame of the convolutional neural network model consisting of the network layer (network structure layer with network parameters), the spatial segmentation layer in the first processing layer and the discrimination layer, which are included in the initial neural network model after the training is completed, is constructed.

S520, inputting the second panoramic image sample into a frame of the convolutional neural network model to obtain second prediction data.

Understandably, on the basis of the above S510, the second panoramic image sample is input into the frame of the constructed convolutional neural network model for calculation, and the second prediction data corresponding to the second panoramic image sample is output.

And S530, obtaining a second loss function and an evaluation index corresponding to the discrimination layer according to the second prediction data and the second label panoramic image.

It can be understood that, on the basis of the foregoing S520 and S510, a second loss function and an evaluation index (reward) corresponding to a discrimination layer are obtained through calculation according to second prediction data output in a frame of the convolutional neural network model and a second tag panoramic image corresponding to standard data corresponding to a second panoramic image sample, where the second loss function is used to update parameters of each layer in the entire frame, and the discrimination layer may adopt a reinforcement learning theory to classify data output by the spatial segmentation layer. The second loss function is calculated as shown in equation (2):

wherein L is _total Represents the second loss function and J represents the number of first loss functions, i.e., the number of levels included in the overall convolutional neural network model.

Understandably, the discrimination layer is trained to maximize the accumulated reward in the convolutional neural network model, and because the process of determining the first latitude zone and the first remaining zone in each sub-processing layer is not trivial, the discrimination layer can be described as a Markov Decision Process (MDP), therefore, the embodiment trains the discrimination layer by adopting reinforcement learning, wherein the reinforcement learning comprises a state, an action and a reward, and the discrimination layer generates a discrete distribution reduction or non-reduction; the state in reinforcement learning is input of a discrimination layer, which is each dimensional band (stripe) obtained by spatially dividing a layer in a panoramic image; the reward mechanism in reinforcement learning not only considers the performance of the overall resolution in the panoramic image, but also considers the complexity of different dimensions in the panoramic image.

Understandably, the calculation formula of the discrimination layer is shown in formula (3):

f _Ek (X _k )＝π(a|X _k ),a∈{0,1} (3)

wherein, f _Ek Indicating a discriminating layer, X _k Indicating the state, a indicates the action.

Understandably, in the training phase, the actions in reinforcement learning are sampled from the column distribution of the actions a, and the actions a in the training phase _k Shows that the action a _k ～π(a|X _k ) And (4) showing.

Understandably, in the testing stage, the action in the reinforcement learning is valued according to the maximum probability, and the calculation formula of the action in the testing stage is shown as formula (4):

understandably, in the training phase, the calculation formula of the evaluation index (reward) corresponding to the current discrimination layer is shown as formula (5):

wherein, the first and the second end of the pipe are connected with each other,

expressing the evaluation index (reward) corresponding to the k-th discrimination layer in the current j-th level, wherein k expresses the number of discrimination layers in the current level, the selection of k corresponds to the number of stripes obtained after the space segmentation layer is segmented, b is reward weight and belongs to the balance between network performance and calculation, 1 _{1} An indicator function is represented.

Understandably, the calculation formula of the cumulative reward in the network framework of the whole convolution neural model is shown as formula (6):

wherein the content of the first and second substances,

indicating a cumulative prize, theta _k Expressed as a dimension in the k-th dimension band, using θ _k To take into account a non-uniform pixel distribution, gamma, over the dimensions of the panoramic image ^i-1 An attenuation coefficient representing future consideration, J is the number of tiers in the overall network,

model output representing the k-th dimension band, I ^gt (k) Standard data representing the k-th dimensional band.

And S540, updating parameters of each layer in the frame of the convolutional neural network model according to the second loss function and the evaluation index to obtain the convolutional neural network model.

It can be understood that, on the basis of the above S530, according to the calculated second loss function and the evaluation index corresponding to the discrimination layer, the parameters of each layer included in the framework of the convolutional neural network model are updated, including the network parameters of the first processing layer, the second processing layer, and the synthesis layer, and for the parameters of each layer included in the initial neural network model trained in advance, the parameters of each layer slightly change along with the joint training of the spatial division layer and the discrimination layer.

Understandably, the updating formula of the parameters of the discrimination layer in each hierarchy in the first processing layer is shown as formula (7):

wherein, w _k The parameter represents the discrimination layer, and β represents the learning rate.

Fig. 6 is a schematic flowchart of a process of training a discrimination layer by reinforcement learning, which specifically includes the following processes: inputting stripe data in the current environment into a pre-constructed discrimination layer to obtain an action signal, wherein the discrimination layer preferably comprises 4 convolution layers, 1 global pooling layer and 1 full-connection layer; determining whether a current action signal is reserved according to a preset stripe reservation rule, wherein the action signal can be regarded as the output of a judgment layer, the output of the judgment layer is a first dimension band and a first residual band, the first dimension band is required to be reserved, and the first residual band is not required to be reserved; if the motion signal is retained, obtaining a model output according to the motion signal, wherein the model output, namely the second prediction data, is synthesized according to the retained motion signal (the first dimension band) corresponding to the stripe data; subsequently, a loss function is obtained from the model output and the true value, and a reward is obtained from the loss function, that is, the loss function is calculated from the model output (second prediction data) and the true value (second tag panorama image), wherein the loss function may specifically refer to the above formula (6)

Calculating a reward (evaluation index) according to the loss function, the reward being calculated by the above formula (5) and formula (6); and updating the discrimination layer according to the reward, namely feeding the reward back to the discrimination layer for correcting the parameters of the discrimination layer, wherein the updating mode of each parameter in the discrimination layer can be shown in the formula (7), so as to obtain the updated discrimination layer, and continuously training the discrimination layer according to the stripe data. If the action signal is not reserved, the action signal is input into the next sub-processing layer, wherein the network type of the next sub-processing layer can be a convolutional neural network, that is, the action signal (the first remaining band) which is not reserved is input into the next sub-processing layer, and the parameters of the discrimination layer are continuously updated by the flow in the next sub-processing layer.

The convolutional neural network model training method provided by the embodiment of the disclosure includes the steps of constructing a frame of a convolutional neural network model composed of a network layer in an initial neural network model, a space division layer in a first processing layer and a discrimination layer in the first processing layer, inputting a second panoramic image sample into the frame of the convolutional neural network model to obtain second prediction data, obtaining a second loss function and an evaluation index corresponding to the discrimination layer according to the second prediction data and a second label panoramic image, updating parameters of each layer in the frame of the convolutional neural network model according to the second loss function and the evaluation index to obtain the convolutional neural network model, calculating the second loss function of the network frame in the whole convolutional neural network model and the evaluation index of the discrimination layer through training of a second part, updating parameters of each layer in the network frame in the convolutional neural network model according to the second loss function and the evaluation index, extracting network parameters of the layer and a synthesis layer according to characteristics which training is completed in advance, and ensuring the accuracy of data input to the space layer in the process of joint training, further improving the accuracy of the training of the whole network model, and further reducing the accuracy of the training model.

Fig. 7 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure. On the basis of the above embodiment, the acquired panoramic image is processed according to the convolutional neural network model trained in advance, which includes steps S710 to S720 shown in fig. 7:

and S710, acquiring a panoramic image to be processed.

Understandably, the panoramic image to be processed is acquired, and the manner of acquiring the panoramic image is not limited, wherein the resolution of the panoramic image may be smaller than the preset resolution.

S720, inputting the panoramic image to be processed into a convolutional neural network model which is trained in advance, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the panoramic image to be processed is processed through the first processing layer to obtain first characteristic information, a first dimension zone and a first residual zone, the first characteristic information and the first residual zone are processed through the second processing layer to obtain a first characteristic image, and the first dimension zone and the first characteristic image are processed through the synthesis layer to obtain a first panoramic image.

Understandably, on the basis of the above S710, the obtained to-be-processed panoramic image is input into the convolutional neural network model trained in advance to obtain a first panoramic image, where the first panoramic image may be understood as an image obtained by increasing the resolution of the to-be-processed panoramic image to be equal to or greater than the preset resolution.

Understandably, the processing method between each layer of the interior of the convolution neural network model which is trained in advance comprises the following steps: inputting a panoramic image to be processed into a first processing layer, wherein the first processing layer may include one or more sub-processing layers, each sub-processing layer performs processing according to a processing sequence from front to back to obtain first feature information, a first dimension band and a first residual band, the last sub-processing layer transmits the obtained first feature information and the first residual band to a second processing layer, wherein each sub-processing layer has corresponding first feature information, the first dimension band and the first residual band, names, meanings and modes of data output by each sub-processing layer are the same, and specific numerical values are different; and performing feature extraction on the first feature information through a feature extraction layer in the second processing layer, and obtaining a first feature image according to the extracted first feature data and the first residual band.

Optionally, the processing the first dimensional band and the first feature image through the synthesis layer to obtain a first panoramic image includes: and sampling the first dimension band through the synthesis layer, and combining the sampled first dimension band and the first characteristic image according to the segmentation rule of the space segmentation layer to obtain a first panoramic image.

Understandably, in the synthesis layer, first, the first dimension band in each sub-processing layer is up-sampled, the up-sampled first dimension band and the first feature image have the same width resolution, the sampled first dimension band and the first feature image are combined according to the division rule of the space division layer to generate a first panoramic image, and in the combination process, in order to avoid artifacts, the overlapping area of the boundary can be reserved, and a weighted average is used to generate a smooth boundary.

The panoramic image processing method provided by the embodiment of the disclosure inputs a to-be-processed panoramic image into a pre-trained convolutional neural network model by acquiring the to-be-processed panoramic image, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the to-be-processed panoramic image is processed by the first processing layer to obtain first characteristic information, a first dimension band and a first residual band, the first characteristic information and the first residual band are processed by the second processing layer to obtain a first characteristic image, the first dimension band and the first characteristic image are processed by the synthesis layer to obtain a first panoramic image, and the convolutional neural network model is constructed and generated to rapidly and efficiently improve the resolution of the panoramic image, so that the calculation resources can be saved while the quality of the resolution-improved panoramic image is ensured.

Fig. 8 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, processing the panoramic image to be processed by the first processing layer to obtain the first feature information, the first dimensional band, and the first remaining band includes steps S810 to S830 shown in fig. 8:

optionally, the first processing layer includes a plurality of sub-processing layers, where the sub-processing layers include a feature extraction layer, a spatial segmentation layer, and a discrimination layer; the second processing layer comprises a feature extraction layer; the resolution of the first panoramic image is greater than or equal to a preset resolution.

And S810, performing feature extraction on the panoramic image to be processed through the feature extraction layer to obtain first feature information.

Understandably, inputting a panoramic image to be processed into a feature extraction layer in a first sub-processing layer in a first processing layer to obtain first feature information, wherein the first feature information is obtained by performing convolution on the panoramic image to be processed, the feature extraction layer can be constructed by selecting a dense sub-network, the dense sub-network can be composed of a plurality of channel attention dense sub-blocks, each channel attention dense sub-block has global connection for extracting high-order features, and preferably, each channel attention dense sub-block comprises 8 basic convolution layers and has local connection.

Optionally, in a first sub-processing layer in the first processing layer, after performing a layer of convolution on the input image to be processed, the input image to be processed is input into a feature extraction layer in the first sub-processing layer to perform feature extraction.

Understandably, the formula for extracting the features of the panoramic image to be processed by the feature extraction layer in the first sub-processing layer in the first processing layer can be expressed as formula (8):

F ₁ ＝f _CAD (Conv(I _LR )) (8)

wherein, F ₁ Represents the first characteristic information in the first sub-processing layer (first hierarchy) in the first processing layer, and the first characteristic information in the j-th hierarchy may be represented as F _j ，f _CAD Representing operation of dense subnetworks in the feature extraction layer, conv representing a single convolutional layer, I _LR Representing a panoramic image to be processed.

And S820, segmenting the first characteristic information through the space segmentation layer to obtain stripe data.

Understandably, on the basis of the above S810, the first feature information is input to the spatial segmentation layer for segmentation, and stripe data corresponding to each dimension in the to-be-processed panoramic image is obtained, where the stripe data includes a plurality of stripes.

S830, the stripe data is judged through the judging layer, and a first dimension band and a first residual band are obtained.

Understandably, on the basis of the above S820, inputting the stripe data into the discrimination layer to obtain a first dimension zone and a first residual zone, where the number of stripes corresponds to the number of discrimination layers in the sub-processing layer, the first dimension zone is a stripe with higher resolution screened by the discrimination layer, the first residual zone is a stripe with lower resolution than the first dimension zone, and the sum of the number of stripes corresponding to the first dimension zone and the number of stripes corresponding to the first residual zone is equal to the total number of stripes.

The panoramic image processing method provided by the embodiment of the disclosure extracts features of a panoramic image to be processed through a feature extraction layer to obtain first feature information, segments the first feature information through a spatial segmentation layer to obtain stripe data, discriminates the stripe data through a discrimination layer to obtain a first dimension band and a first residual band, extracts features corresponding to a high resolution from an input panoramic image with a low resolution by constructing a first processing layer network through the feature extraction layer in the first processing layer, segments the extracted first feature information according to the dimension band through the spatial segmentation layer, and can effectively distinguish the first residual band with the low resolution and the first dimension band with the high resolution in the stripe data on a current processing level through the discrimination layer.

Fig. 9 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, the convolutional neural network model further includes a sub-pixel convolutional layer and a deconvolution layer, and the first feature information is segmented by the spatial segmentation layer to obtain stripe data, where the first processing layer and the second processing layer both include the sub-pixel convolutional layer and the deconvolution layer, and the method includes steps S910 to S950 shown in fig. 9:

s910, sampling the first characteristic information by adopting the sub-pixel convolution layer to obtain first sampling data.

Understandably, after the first characteristic information of the panoramic image to be processed is extracted, the first characteristic information is up-sampled by adopting the sub-pixel convolution layer to obtain first sampling data.

And S920, if the current sub-processing layer is the first sub-processing layer in the first processing layer, sampling the panoramic image to be processed by adopting the deconvolution layer to obtain second sampling data.

Understandably, on the basis of the above S910, it is first determined whether the current sub-processing layer is the first sub-processing layer in the first processing layer, and if so, the deconvolution layer is adopted to perform upsampling on the panoramic image to be processed, so as to obtain second sampling data.

S930, if the current sub-processing layer is not the first sub-processing layer in the first processing layer, sampling the first residual band obtained by the last sub-processing layer of the current sub-processing layer by using the deconvolution layer to obtain second sampling data.

It can be understood that, on the basis of the foregoing S910, if it is determined that the current sub-processing layer is not the first sub-processing layer in the first processing layer, the deconvolution layer is used to perform upsampling on the first residual band obtained by the last sub-processing layer of the current sub-processing layer to obtain second sample data, and if the current sub-processing layer is the first sub-processing layer, the input of the first sub-processing layer is only the panoramic image to be processed, and does not include the first residual band obtained by the last sub-processing layer, therefore, for the first sub-processing layer, the deconvolution layer is used to perform upsampling on the panoramic image to be processed to obtain second sample data, whereas for the non-first sub-processing layer, the deconvolution layer can be directly used to perform upsampling on the first residual band obtained by the last sub-processing layer to obtain second sample data, and the operation steps of S920 and S930 are in an or relationship.

For example, taking fig. 3 as an example, the sub-processing layer 310 needs to perform upsampling on the panoramic image to be processed by using a deconvolution layer, and the sub-processing layer 320 only needs to perform upsampling on the first residual band obtained by the sub-processing layer 310.

And S940, obtaining a second characteristic image according to the first sampling data and the second sampling data.

It can be understood that, on the basis of the foregoing S920 and S910, the deconvolution layer is adopted to perform upsampling on the panoramic image to be processed, a second feature image is obtained according to the first sample data corresponding to the first feature information and the second sample data corresponding to the panoramic image to be processed, and the manner of calculating the second feature image on the basis of the foregoing S930 and S910 is the same as that on the basis of the foregoing S920 and S910, and a specific calculation formula of the second feature image is as shown in formula (9):

G ₁ ＝f _REC (F ₁ )+f _UP (I _LR ) (9)

wherein G is ₁ Representing the second feature image in the first sub-processing layer (first hierarchy) in the first processing layer, the second feature image in the j-th hierarchy may be represented as G _j ，f _REC Representing the operation of the sub-pixel convolution layer, f _UR Showing the operation of the deconvolution layer, it can be understood that the first feature image and the second feature image in the above embodiments are obtained in the same manner.

And S950, segmenting the second characteristic image according to the dimension of the panoramic image to be processed through the space segmentation layer to obtain stripe data.

Understandably, on the basis of the above S940, each level (sub-processing layer) in the first processing layer includes a spatial division layer, and the spatial division layer divides the second feature image according to the dimension of the panoramic image to be processed, so as to obtain stripe data including a plurality of stripes.

Illustratively, the spatial segmentation layer in the first hierarchy is applied to the second feature image G ₁ By dividing, G can be ₁ Dividing the obtained object into M stripes with the same size along the dimension to obtain M divided stripes { X ₁ ,X ₂ ,.......,X _M H, the height of each stripe is h _d = h/M, where h is the height of the panoramic image to be processed.

Understandably, each stripe in the stripe data corresponds to a discrimination layer, that is, the number of stripes is the same as the number of discrimination layers, and the expression for the discrimination layer to filter the stripe data is shown in formula (10):

wherein the content of the first and second substances,

representing a first remnant band in a sub-treatment layer in the first treatment layer,

representing a first residual band in the sub-treatment layer in the first treatment layer, f _SSM Indicating the operation of the discrimination layer.

The panoramic image processing method provided by the embodiment of the disclosure obtains first sampling data by sampling first feature information by using a sub-pixel convolution layer, obtains second sampling data by sampling a panoramic image to be processed by using a deconvolution layer if a current sub-processing layer is a first sub-processing layer in the first processing layer, obtains a second sampling data by sampling a first residual band obtained by a last sub-processing layer of the current sub-processing layer by using the deconvolution layer if the current sub-processing layer is not the first sub-processing layer in the first processing layer, obtains a second feature image according to the first sampling data and the second sampling data, and segments the second feature image according to the dimension of the panoramic image to be processed by using a space segmentation layer to obtain stripe data.

Fig. 10 is a schematic flowchart of a panoramic image processing method according to an embodiment of the present disclosure. On the basis of the foregoing embodiment, optionally, the second processing layer processes the first feature information and the first remaining band to obtain a first feature image, which includes steps S1100 to S1400 shown in fig. 10:

and S1100, performing feature extraction on the first feature information through a feature extraction layer in the second processing layer to obtain second feature information.

Understandably, the first feature information is input into a feature extraction layer in the second processing layer for feature extraction, so as to obtain second feature information, wherein a calculation formula of the second feature information is shown as formula (11):

D ₁ ＝f _CAD (f _FA (F ₁ )) (11)

wherein D is ₁ Representing second feature information obtained by the feature extraction layer in the second processing layer, f _FA Indicating a feature size alignment operation, pair F ₁ Horizontal cropping is performed to maintain the same latitude range as the first remnant tape.

S1200, sampling the second characteristic information by adopting the sub-pixel convolution layer to obtain updated first sampling data.

Understandably, on the basis of S1100, the second feature information after feature extraction is up-sampled by using a sub-pixel convolution layer to obtain updated first sampling data, where the network structure of the sub-pixel convolution layer is the same as that of the sub-pixel convolution layer in the first processing layer.

S1300, sampling the first residual band by adopting the deconvolution layer to obtain updated second sampling data.

Understandably, on the basis of S1100, the first residual band obtained from the last sub-processing layer in the first processing layer is upsampled by using the deconvolution layer, so as to obtain updated second sampled data, where the network structure of the deconvolution layer in the deconvolution layer is the same as that of the deconvolution layer in the first processing layer.

And S1400, obtaining a first characteristic image according to the updated first sampling data and the updated second sampling data.

Understandably, on the basis of the above S1100 and S1200, a first feature image is obtained according to the updated first sampling data and the updated second sampling data, wherein the first feature image and the second feature image have the same calculation mode, and a calculation formula of the first feature image is as shown in formula (12):

wherein H ₁ Representing a first characteristic image obtained by the second processing layer,

a first remnant tape output for a last sub-process layer in the first process layer.

The panoramic image processing method provided by the embodiment of the disclosure performs feature extraction on first feature information through a feature extraction layer in a second processing layer to obtain second feature information, samples the second feature information by using a sub-pixel convolution layer to obtain updated first sample data, samples a first residual band by using an deconvolution layer to obtain updated second sample data, obtains a first feature image according to the updated first sample data and the updated second sample data, processes output in the first processing layer in the second processing layer to obtain a first feature image, is convenient for a subsequent synthesis layer to generate the first panoramic image according to the size of the first feature image, and does not change other image information of the panoramic image while improving the resolution of the panoramic image.

Fig. 11 is a schematic structural diagram of a panoramic image processing apparatus according to an embodiment of the present disclosure. The panoramic image processing apparatus provided by the embodiment of the present disclosure may execute the processing flow provided by the panoramic image processing method embodiment, and as shown in fig. 11, the panoramic image processing apparatus 1100 includes:

an obtaining module 1101, configured to obtain a panoramic image to be processed;

the processing module 1102 is configured to input a panoramic image to be processed into a convolutional neural network model trained in advance, where the convolutional neural network model includes a first processing layer, a second processing layer, and a synthesis layer, the panoramic image to be processed is processed through the first processing layer to obtain first feature information, a first dimensional band, and a first residual band, the first feature information and the first residual band are processed through the second processing layer to obtain a first feature image, and the first dimensional band and the first feature image are processed through the synthesis layer to obtain a first panoramic image.

Optionally, the first processing layer in the processing module 1102 includes a plurality of sub-processing layers, where the sub-processing layers include a feature extraction layer, a spatial segmentation layer, and a discrimination layer; the second processing layer comprises a feature extraction layer; the resolution of the first panoramic image is greater than or equal to a preset resolution.

Optionally, in the processing module 1102, the panoramic image to be processed is processed through the first processing layer to obtain first feature information, a first dimension band and a first residual band, and the processing module is specifically configured to:

Optionally, the convolutional neural network model in the processing module 1102 further includes a sub-pixel convolutional layer and an inverse convolutional layer, and the first feature information is segmented by the spatial segmentation layer to obtain stripe data, which is specifically used for:

Optionally, in the processing module 1102, the first feature information and the first residual band are processed by the second processing layer to obtain a first feature image, which is specifically used for:

Optionally, the processing module 1102 processes the first dimension band and the first feature image through the synthesis layer to obtain a first panoramic image, which is specifically configured to:

Optionally, the apparatus 1100 further includes a training module, and the training module is specifically configured to:

acquiring a second panoramic image sample and a second tag panoramic image corresponding to the second panoramic image sample;

and training the initial neural network model, the spatial segmentation layer in the first processing layer and the discrimination layer in the first processing layer according to the second panoramic image sample and the second tag panoramic image to obtain a convolutional neural network model.

Optionally, the training module trains the first feature extraction layer, the second processing layer, and the synthesis layer in the first processing layer according to the first panoramic image sample and the first tag panoramic image to obtain an initial neural network model, which is specifically used for:

Optionally, the training module trains the initial neural network model, the spatial division layer in the first processing layer, and the discrimination layer in the first processing layer according to the second panoramic image sample and the second tag panoramic image, to obtain a convolutional neural network model, which is specifically used for:

The panoramic image processing apparatus in the embodiment shown in fig. 11 can be used to implement the technical solutions of the above method embodiments, and the implementation principle and technical effects are similar, which are not described herein again.

In addition, the disclosed embodiments also provide a computer-readable storage medium on which a computer program is stored, the computer program being executed by a processor to implement the panoramic image processing method of the above embodiments.

Furthermore, the embodiments of the present disclosure also provide a computer program product including a computer program or instructions, which when executed by a processor, implement the panoramic image processing method as the above-described embodiments.

It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

The foregoing are merely exemplary embodiments of the present disclosure, which enable those skilled in the art to understand or practice the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the disclosure. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A panoramic image processing method, characterized by comprising:

acquiring a panoramic image to be processed;

inputting the panoramic image to be processed into a convolutional neural network model which is trained in advance, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the panoramic image to be processed is processed through the first processing layer to obtain first characteristic information, a first dimension band and a first residual band, the first characteristic information and the first residual band are processed through the second processing layer to obtain a first characteristic image, and the first dimension band and the first characteristic image are processed through the synthesis layer to obtain a first panoramic image;

the first processing layer comprises a plurality of sub-processing layers, wherein each sub-processing layer comprises a feature extraction layer, a space segmentation layer and a discrimination layer;

the second processing layer comprises the feature extraction layer;

the resolution of the first panoramic image is greater than or equal to a preset resolution;

and wherein the processing the to-be-processed panoramic image through the first processing layer to obtain first feature information, a first dimension band and a first remainder band comprises:

performing feature extraction on the panoramic image to be processed through the feature extraction layer to obtain first feature information;

segmenting the first characteristic information through the space segmentation layer to obtain stripe data;

2. The method of claim 1, wherein the convolutional neural network model further comprises a sub-pixel convolutional layer and a deconvolution layer, and wherein segmenting the first feature information by the spatial segmentation layer to obtain stripe data comprises:

if the current sub-processing layer is the first sub-processing layer in the first processing layer, sampling the panoramic image to be processed by using the deconvolution layer to obtain second sampling data;

and segmenting the second characteristic image according to the dimension of the panoramic image to be processed through the space segmentation layer to obtain stripe data.

3. The method of claim 2, wherein the processing the first feature information and the first remaining band by the second processing layer to obtain a first feature image comprises:

performing feature extraction on the first feature information through the feature extraction layer in the second processing layer to obtain second feature information;

4. The method of claim 1, wherein said processing the first dimension band and the first feature image through the synthesis layer to obtain a first panoramic image comprises:

5. The method of claim 1, wherein before inputting the panoramic image to be processed into the pre-trained convolutional neural network model, the method further comprises:

training the feature extraction layer, the second processing layer and the synthesis layer in the first processing layer according to the first panoramic image sample and the first label panoramic image to obtain an initial neural network model;

and training the initial neural network model, the space segmentation layer in the first processing layer and the discrimination layer in the first processing layer according to the second panoramic image sample and the second tag panoramic image to obtain a convolutional neural network model.

6. The method of claim 5, wherein the training the feature extraction layer, the second processing layer, and the synthesis layer in the first processing layer according to the first panoramic image sample and the first tagged panoramic image to obtain an initial neural network model comprises:

inputting the first panoramic image sample into a network framework constructed by the feature extraction layer, the second processing layer and the synthesis layer in the first processing layer to obtain first prediction data;

7. The method of claim 6, wherein the training the initial neural network model, the spatial segmentation layer in the first processing layer, and the discrimination layer in the first processing layer according to the second panoramic image sample and the second tagged panoramic image to obtain a convolutional neural network model comprises:

constructing a frame of a convolutional neural network model consisting of a network layer in the initial neural network model, the space segmentation layer in the first processing layer and the discrimination layer in the first processing layer;

inputting the second panoramic image sample into a frame of the convolutional neural network model to obtain second prediction data;

8. A panoramic image processing apparatus characterized by comprising:

the acquisition module is used for acquiring a panoramic image to be processed;

the processing module is used for inputting the panoramic image to be processed into a convolutional neural network model which is trained in advance, wherein the convolutional neural network model comprises a first processing layer, a second processing layer and a synthesis layer, the first processing layer is used for processing the panoramic image to be processed to obtain first characteristic information, a first dimension band and a first residual band, the second processing layer is used for processing the first characteristic information and the first residual band to obtain a first characteristic image, and the synthesis layer is used for processing the first dimension band and the first characteristic image to obtain a first panoramic image;

the second processing layer comprises the feature extraction layer;

and wherein the processing module is to: