CN109246407B - Image coding - Google Patents

Image coding

Info

Publication number
CN109246407B
CN109246407B (application CN201710294882.0A)
Authority
CN
China
Prior art keywords
image
destination
pixel
determining
encoding
Prior art date
Legal status
Active
Application number
CN201710294882.0A
Other languages
Chinese (zh)
Other versions
CN109246407A (en)
Inventor
许继征 (Jizheng Xu)
李一鸣 (Yiming Li)
Current Assignee
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC
Priority to CN201710294882.0A
Priority to PCT/US2018/028217 (published as WO2018200293A1)
Publication of CN109246407A
Application granted
Publication of CN109246407B
Status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals, including the following subgroups:
    • H04N 19/103: selection of coding mode or of prediction mode
    • H04N 19/115: selection of the code volume for a coding unit prior to coding
    • H04N 19/119: adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N 19/12: selection from among a plurality of transforms or standards, e.g. selection between discrete cosine transform [DCT] and sub-band transform or selection between H.263 and H.264
    • H04N 19/124: quantisation
    • H04N 19/147: data rate or code amount at the encoder output according to rate distortion criteria
    • H04N 19/154: measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N 19/157: assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N 19/167: position within a video image, e.g. region of interest [ROI]
    • H04N 19/176: the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/182: the coding unit being a pixel
    • H04N 19/19: adaptive coding using optimisation based on Lagrange multipliers
    • H04N 19/597: predictive coding specially adapted for multi-view video sequence encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

In this disclosure, methods, devices, and computer-readable media for image and/or video encoding are presented. According to the method, in response to a source image being mapped to a destination image, an effect of at least one destination pixel on the mapping is determined based on at least one source pixel in the source image and the at least one destination pixel in the destination image, the at least one source pixel being mapped to the at least one destination pixel. Distortion caused by the encoding is then determined based at least on the effect of the at least one destination pixel on the mapping. Encoding parameters for encoding the destination image are then determined based on the distortion.

Description

Image coding
Background
With the development of camera arrays, head-mounted displays, and sensors for tracking head position, applications of omnidirectional video have emerged to provide users with a visually immersive experience. Omnidirectional video is typically composed of a sequence of spherical images, which is difficult to compress directly because conventional image and/or video coding systems typically take rectangular images as input. Therefore, in order to compress omnidirectional video, it is generally necessary to map each spherical image to a rectangular image. However, such a mapping process may introduce large distortions into the image. Furthermore, different parts of a moving object may undergo different geometric distortions. Such distortion may significantly degrade encoding performance.
Disclosure of Invention
According to implementations of the present disclosure, an encoding method, an encoding apparatus, and a computer program product are presented. Once the source image is mapped to a destination image and the source pixels in the source image are mapped to destination pixels in the destination image, the method determines the effect of the destination pixels on the mapping based on the source and destination pixels. The method also determines the distortion caused by the encoding based at least on the determined effect. Furthermore, the method determines encoding parameters for encoding the destination image based on the determined distortion.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
FIG. 1 illustrates a block diagram of a system 100 in which implementations of the present disclosure can be implemented;
FIG. 2 illustrates a flow diagram of an encoding method 200 in accordance with implementations of the present disclosure;
FIG. 3 shows a schematic diagram of a spherical image mapped to a rectangular image using equirectangular projection; and
fig. 4 illustrates a block diagram of an example computing system/server 400 in which one or more implementations of the present disclosure may be implemented.
In the drawings, the same or similar reference characters are used to designate the same or similar elements.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF THE INVENTION
The present disclosure will now be discussed with reference to several example implementations. It should be understood that these implementations are discussed only to enable those of ordinary skill in the art to better understand and thus implement the present disclosure, and are not intended to imply any limitation on the scope of the present subject matter.
As used herein, the term "include" and its variants are to be read as open-ended terms meaning "including, but not limited to". The term "based on" is to be read as "based, at least in part, on". The terms "one implementation" and "an implementation" are to be read as "at least one implementation". The term "another implementation" is to be read as "at least one other implementation". The terms "first," "second," and the like may refer to different or the same objects. Other explicit and implicit definitions may be included below. The definitions of the terms are consistent throughout the specification unless explicitly stated otherwise.
Fig. 1 illustrates a block diagram of a system 100 in which implementations of the present disclosure can be implemented. The system 100 may be used to process images and/or video. In particular, the system 100 may be used to process omni-directional video. As shown in fig. 1, system 100 may be generally divided into an encoding subsystem 110 and a decoding subsystem 120. It is to be understood that the description of the structure and function of the system 100 is for exemplary purposes only and is not meant to imply any limitation as to the scope of the disclosure. The present disclosure may be embodied in different structures and/or functions. Additionally, some or all of the modules included in system 100 may be implemented by software, hardware, firmware, and/or combinations thereof.
As shown in fig. 1, the encoding subsystem 110 may include a capture module 111, a stitching module 112, a mapping module 113, and an encoding module 114. In some implementations, the modules may be distributed across different devices. Alternatively, in other implementations, these modules may be implemented on the same device. In addition, some of these modules may be combined with other modules. For example, in some implementations, the mapping module 113 may be incorporated into the encoding module 114.
The capture module 111 may be implemented, for example, by a camera array that covers light arriving at the array from all directions. The capture module 111 may be used to capture video signals from different directions.
The stitching module 112 may receive image signals from different directions captured by the capture module 111 and apply a stitching process to obtain a source image. For example, the stitching module 112 may generate source images from image signals from different directions captured by the capture module 111. The source image may be a non-planar image. In particular, the source image may for example comprise a spherical image recording the visual signal in all directions as seen from the center of the sphere. The stitching module 112 may also combine the plurality of spherical images into a sequence of spherical images. The combined spherical image sequence is an omnidirectional video signal.
An omnidirectional video signal represented by a sequence of spherical images can treat the video signal from each direction equally, which is beneficial for subsequent video rendering. However, such omni-directional video signals are difficult to compress because conventional image and/or video coding systems typically employ rectangular images as input. For compressing and transmitting the omnidirectional video signal, the source image in the omnidirectional video may be mapped, for example, by means of the mapping module 113. For example, the mapping module 113 may map a source image to a destination image. In particular, the mapping module 113 may map spherical images in a sequence of spherical images to rectangular images, for example.
The mapping module 113 may perform the above mapping in a number of ways. For example, in some implementations, to map a spherical image to a rectangular image, the mapping module 113 may use equirectangular projection, which is widely used with head-mounted displays. "Equirectangular projection" as described herein refers to a mapping process that maps equally spaced meridians of the sphere to equally spaced vertical lines, and equally spaced circles of latitude of the sphere to equally spaced horizontal lines. For convenience of description, implementations of the present disclosure are described below using equirectangular projection as an example. However, it should be understood that the present disclosure may also be applied to other mapping approaches. The scope of the present disclosure is not limited in this respect.
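For illustration only (this sketch is not part of the patent text, and the function name and conventions are assumptions), the following Python code shows how an equirectangular projection assigns a point on the sphere to a destination-pixel position:

```python
import math

def sphere_to_equirect(lat, lon, width, height):
    """Map a point on the unit sphere (latitude in [-pi/2, pi/2],
    longitude in [-pi, pi)) to (row, column) pixel coordinates in a
    width x height equirectangular image: equally spaced meridians
    become equally spaced columns, and equally spaced circles of
    latitude become equally spaced rows."""
    col = (lon + math.pi) / (2.0 * math.pi) * width
    row = (math.pi / 2.0 - lat) / math.pi * height  # north pole maps to the top row
    # Clamp to valid integer pixel indices.
    return min(int(row), height - 1), min(int(col), width - 1)
```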
After mapping a source image (e.g., a spherical image in a sequence of spherical images) to a destination image (e.g., a rectangular image), the encoding module 114 may be utilized to compress the image and/or video signal into an image and/or video bitstream. The compressed image and/or video bit stream may be used, for example, for transmission to different devices.
The decoding subsystem 120 may perform the inverse of the above-described process performed by the encoding subsystem 110 to reconstruct an image and/or video signal from a received image and/or video bitstream. In particular, when the received bitstream is a compressed omnidirectional video signal, the decoding subsystem 120 may reconstruct an omnidirectional video signal from the received video bitstream. As shown in fig. 1, the decoding subsystem 120 may include a decoding module 121, a de-mapping module 122, and a rendering module 123. In some implementations, these modules may be distributed across different devices. Alternatively, in other implementations, these modules may be implemented on the same device. In addition, some of these modules may be combined with other modules. For example, in some implementations, the de-mapping module 122 may be incorporated into the decoding module 121.
The decoding module 121 may decode the received image and/or video bitstream into an image and/or video. The de-mapping module 122 may then perform the inverse of the above described operations performed by the mapping module 113 to map a destination image (e.g., a rectangular image) in the image and/or video back to a source image (e.g., a spherical image) to reconstruct the image and/or video signal. Rendering module 123 may present the reconstructed image and/or video signals onto a respective display device (e.g., a head mounted display, etc.).
As described above, during the processing of images and/or video, the mapping process that maps a source image (e.g., a spherical image), which may be a non-planar image, to a destination image (e.g., a rectangular image) may introduce large distortions into the image. Taking equirectangular projection as an example, the north pole of the sphere is mapped to the entire top row of the rectangular image, while the relative distances between points on the equator of the sphere remain unchanged before and after the mapping. Such geometric distortion becomes larger the farther a point is from the equator. In addition, the distortion also affects motion. For example, an object with rigid motion may exhibit more complex motion after the mapping, because different parts of the object undergo different geometric distortions. As used herein, "rigid motion" refers to movement of an object in which the distance between any two points remains constant before and after the movement. However, conventional video coding systems are typically designed to accommodate linear motion (i.e., motion along a linear trajectory) and/or rigid motion. Therefore, such geometric distortion may significantly reduce video coding performance.
To address one or more of the above issues and other potential issues, in accordance with example implementations of the present disclosure, a coding scheme is presented. The scheme can determine the different effects that different points have on the mapping from a source image to a destination image, and it can accommodate any known or to-be-developed mapping approach (including but not limited to equirectangular projection). The scheme can adjust a metric of the distortion caused by encoding based on the determined effects, and determine at least a portion of the encoding parameters for encoding the destination image based on the adjusted distortion metric, so as to improve encoding performance. It should be understood that encoding schemes according to implementations of the present disclosure may be used to encode images and/or video. In particular, the coding scheme may be used to encode omnidirectional video to improve coding performance for omnidirectional video.
Fig. 2 shows a flow diagram of an encoding method 200 according to an implementation of the present disclosure. For example, in some implementations, the method 200 may be performed by the encoding subsystem 110 as shown in fig. 1. It is to be understood that method 200 may also include additional acts not shown and/or may omit acts shown. The scope of the present disclosure is not limited in this respect.
At block 210, the encoding subsystem 110, in response to a source image being mapped to a destination image, determines an effect of at least one destination pixel on the mapping based on at least one source pixel in the source image and at least one destination pixel in the destination image. The at least one source pixel in the source image is mapped to the at least one destination pixel in the destination image.
The "source image" as described herein may include any type of planar or non-planar image, for example. Furthermore, the source image may also be an image in a video. For example, the source image may be at least one of a plurality of spherical images constituting an omnidirectional video. The "destination image" as described herein may be, for example, any type of image to which the source image is mapped. For example, when the source image is a spherical image, the destination image may be a rectangular image. As used herein, a "source pixel" may refer to a pixel in a source image, while a "destination pixel" refers to a pixel in a destination image to which the source image is mapped that corresponds to the source pixel. For convenience of description, a spherical image is taken as an example of a source image, and a rectangular image is taken as an example of a destination image. It should be understood, however, that this is merely exemplary and is not intended to limit the scope of the present disclosure in any way.
For example, FIG. 3 shows a schematic diagram of mapping a spherical image to a rectangular image using equirectangular projection. As shown in fig. 3, the spherical image 310 is mapped to a rectangular image 320. In particular, a source pixel 311 in the spherical image 310 is mapped to a destination pixel 321 in the rectangular image 320. Further, it can be seen that under equirectangular projection, the north pole of the spherical image 310 is mapped to the top row of the rectangular image 320, while the relative distances between points on the equator of the spherical image 310 remain unchanged before and after the mapping. The actions of method 200 are described in further detail below in conjunction with fig. 3. For purposes of illustration only, the spherical image 310 is used as an example of a source image and the rectangular image 320 as an example of a destination image in the following description. It should be understood, however, that this is merely exemplary and is not intended to limit the scope of the present disclosure in any way.
As described above, although mapping the spherical image 310 to the rectangular image 320 makes encoding more convenient, such mapping may introduce geometric distortion that significantly impacts encoding performance, and different parts of the image contribute differently to that geometric distortion.
Assuming that each source pixel in the spherical image 310 is equally important, the destination pixels in the rectangular image 320 may have different importance. That is, different destination pixels in the rectangular image 320 may have different effects on the mapping.
In some implementations, the encoding subsystem 110 may determine the effect of the destination pixel 321 on the mapping based on the number of source pixels 311 in the spherical image 310 (referred to as a "first number") and the number of destination pixels 321 in the rectangular image 320 (referred to as a "second number"). For example, the effect may be represented by a weight associated with the destination pixel 321. In some implementations, the weight may be quantized, for example, as the ratio of the first number to the second number. Of course, this is merely exemplary and is not intended to limit the scope of the present disclosure in any way. In alternative implementations, any suitable calculation may be performed on the first number and the second number, with the result of the calculation taken as a quantitative representation of the above-described effect.
For example, assuming that the rectangular image 320 mapped from the spherical image 310 has a width of W pixels and a height of H pixels, the pixel at the j-th position in the horizontal direction and the i-th position in the vertical direction in the rectangular image 320 (e.g., the destination pixel 321) may be denoted I[i, j], where i ∈ [0, H−1] and j ∈ [0, W−1]. The weight associated with the destination pixel I[i, j] may be denoted A[i, j]. For example, a smaller A[i, j] may indicate that the destination pixel I[i, j] corresponds to a smaller area in the spherical image (i.e., fewer source pixels), and thus the destination pixel I[i, j] has less impact on the distortion. Accordingly, a larger A[i, j] may indicate that the destination pixel I[i, j] corresponds to a larger area in the spherical image (i.e., more source pixels), and thus the destination pixel I[i, j] has a greater impact on the distortion.
In some implementations, for example in the case of equirectangular projection, A[i, j] may be determined from the value of i as follows:

A[i, j] = cos((i + 0.5 − H/2) · π / H)   (1)
Further, each source pixel in the spherical image 310 may also have a different importance; for example, each source pixel in the spherical image 310 may have a degree of importance associated with encoding. In some implementations, the degree of importance is expressed, for example, as a weight reflecting the importance of the source pixel itself. For example, a higher weight value indicates that the source pixel has a greater impact on encoding, while a lower weight value indicates that it has a lesser impact. In this case, the encoding subsystem 110 may also determine the effect of the destination pixel on the mapping based on the source pixel, the destination pixel corresponding to the source pixel, and the degree of importance associated with the source pixel (e.g., a weight reflecting the importance of the source pixel itself).
For example, a destination pixel I[i, j] (e.g., destination pixel 321) in the rectangular image 320 may correspond to at least one source pixel (e.g., source pixel 311) in the spherical image 310, and each of the at least one source pixel may have a weight reflecting its importance. In this case, in some implementations, the encoding subsystem 110 may determine a weight associated with the at least one source pixel as a function of the respective weights (e.g., by averaging or any other suitable means). Assume that the weight associated with the at least one source pixel is denoted w[i, j], and that the weight determined based on the number of pixels in the at least one source pixel and the number of pixels in the destination pixel I[i, j] (e.g., according to equation (1)) is a[i, j]. In some implementations, the weight A[i, j] associated with the destination pixel I[i, j] may then be determined, for example, as the product of w[i, j] and a[i, j].
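As a minimal sketch of the weight computation described above (assuming the cosine weight of equation (1); the function names and the optional importance map are illustrative, not part of the patent text):

```python
import math

def area_weight(i, height):
    """Area-based weight a[i, j] of equation (1) for an equirectangular
    destination image; it depends only on the row index i, and rows far
    from the equator receive smaller weights because they correspond to
    less spherical area."""
    return math.cos((i + 0.5 - height / 2.0) * math.pi / height)

def destination_weight(i, j, height, importance=None):
    """Weight A[i, j] of a destination pixel: the area weight a[i, j],
    optionally multiplied by a source-pixel importance w[i, j]."""
    w = importance[i][j] if importance is not None else 1.0
    return w * area_weight(i, height)
```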
In some implementations, to encode the destination image, the destination image may be divided into a plurality of image blocks. When an image block is small, it can be approximated that different destination pixels in the image block have the same effect on the mapping. That is, different destination pixels in the image block may have the same weight. In this case, for example, weights associated with one destination pixel may be assigned to other pixels in the image block to simplify certain subsequent processing.
In this way, implementations of the present disclosure are able to determine, in a quantitative manner, different effects of different points on the mapping from a source image to a destination image during the mapping. Further, it can be seen that implementations of the present disclosure are not limited to a particular mapping approach, but can accommodate any known or yet to be developed mapping approach in determining this effect.
With continued reference to fig. 2, at block 220, the encoding subsystem 110 determines a distortion caused by the encoding based on an effect of at least one destination pixel (e.g., destination pixel 321) on the mapping. In some implementations, the encoding subsystem 110 may determine the distortion by generating an error that measures the distortion, such that the generated error is utilized in subsequent actions to estimate corresponding image and/or video encoding parameters.
In some implementations, the error measuring the distortion caused by encoding may include, for example, the sum of squared errors (SSE) between the destination image and its reconstructed image. As used herein, a "reconstructed image" refers to an image obtained by applying, at least in part, the inverse of the encoding to the encoded destination image. For example, in image and/or video coding, the encoding module 114 generally has the function of reconstructing an encoded image in order to provide reference data (e.g., reference images for predictive coding in video coding). The encoding subsystem 110 may obtain a reconstructed image of the destination image, for example, from the encoding module 114, to calculate the SSE between the destination image and its reconstructed image.
In some implementations, the error measuring the distortion caused by encoding may also include the sum of absolute differences (SAD) or the sum of absolute transformed differences (SATD), etc., between the destination image and its reconstructed image. For example, the encoding subsystem 110 may obtain a reconstructed image of the destination image (e.g., from the encoding module 114) to calculate the SAD or SATD.
Different errors that measure the distortion caused by encoding may be applied to different modules (e.g., motion estimation modules, filtering modules, etc.) of an image and/or video encoder, respectively, for example, to estimate encoding parameters applied to the modules.
Assuming that the reconstructed pixel corresponding to a destination pixel I[i, j] in the reconstructed image of the destination image is denoted I′[i, j], the SSE, SAD, and SATD of the conventional scheme may be determined as follows (with the sums running over all pixel positions i, j):

SSE = Σ_{i,j} (I[i, j] − I′[i, j])^2   (2)

SAD = Σ_{i,j} |I[i, j] − I′[i, j]|   (3)

SATD = Σ_{i,j} |HT(I − I′)[i, j]|   (4)

where HT(·) denotes the Hadamard transform.
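As a concrete illustration (not part of the patent text; the function names are assumptions), the conventional SSE and SAD of equations (2) and (3) can be computed over two equally sized images, represented here as nested lists, as follows:

```python
def sse(orig, recon):
    """Sum of squared errors between two images, per equation (2)."""
    return sum((o - r) ** 2
               for row_o, row_r in zip(orig, recon)
               for o, r in zip(row_o, row_r))

def sad(orig, recon):
    """Sum of absolute differences between two images, per equation (3)."""
    return sum(abs(o - r)
               for row_o, row_r in zip(orig, recon)
               for o, r in zip(row_o, row_r))
```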
In implementations of the present disclosure, taking into account that different pixels in the destination image have different effects on the mapping, the encoding subsystem 110 may determine the distortion based on the destination image, its reconstructed image, and those effects. For example, the encoding subsystem 110 may adjust a distortion metric of the conventional scheme based on the differing effects.
In particular, in some implementations, the encoding subsystem 110 may generate a weighted sum of squared errors (hereinafter "SSE′"), a weighted sum of absolute differences (hereinafter "SAD′"), and/or a weighted sum of absolute transformed differences (hereinafter "SATD′") between the destination image and its reconstructed image, based on the different effects of different pixels in the destination image on the mapping, for use in image and/or video encoding parameter estimation in subsequent actions. For example, SSE′, SAD′, and SATD′ may be determined as follows:

SSE′ = Σ_{i,j} A[i, j] · (I[i, j] − I′[i, j])^2   (5)

SAD′ = Σ_{i,j} A[i, j] · |I[i, j] − I′[i, j]|   (6)

SATD′ = Σ_{i,j} A[i, j] · |HT(I − I′)[i, j]|   (7)

where HT(·) again denotes the Hadamard transform.
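The weighted metrics differ from the conventional ones only in the per-pixel factor A[i, j]; a minimal sketch under the same assumptions as the previous snippet:

```python
def weighted_sse(orig, recon, weights):
    """Weighted sum of squared errors, per equation (5): each squared
    pixel error is scaled by the mapping weight A[i, j], so errors in
    over-represented destination regions (e.g., polar rows of an
    equirectangular image) contribute less."""
    height, width = len(orig), len(orig[0])
    return sum(weights[i][j] * (orig[i][j] - recon[i][j]) ** 2
               for i in range(height) for j in range(width))

def weighted_sad(orig, recon, weights):
    """Weighted sum of absolute differences, per equation (6)."""
    height, width = len(orig), len(orig[0])
    return sum(weights[i][j] * abs(orig[i][j] - recon[i][j])
               for i in range(height) for j in range(width))
```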
More generally, assuming that the distortion between a pixel I[i, j] and its reconstructed pixel I′[i, j] under the conventional scheme is denoted D(I[i, j], I′[i, j]), the distortion D′(I[i, j], I′[i, j]) according to implementations of the present disclosure may be expressed as:

D′(I[i, j], I′[i, j]) = A[i, j] · D(I[i, j], I′[i, j])   (8)
in this way, implementations of the present disclosure are able to adjust a metric of distortion for image and/or video coding based on different effects of different pixels on the mapping.
At block 230, the encoding subsystem 110 determines at least a portion of the encoding parameters for encoding the destination image based on the distortion determined at block 220.
In image and/or video coding, for a given image block in the destination image to be encoded, there may be many encoding parameters used to encode the image block. Examples of encoding parameters may include one or more of: a quantization parameter (QP), which corresponds to a quantization step size and determines the degree of quantization of the image and/or video encoding; the encoding mode of the image block (e.g., intra prediction or inter prediction in video encoding, and the prediction direction of intra and/or inter prediction, etc.); the pattern by which the image block is further divided into sub-image blocks; the type of transform applied to the image block, such as a discrete cosine transform or a direct linear transform; and so on.
In some implementations, the encoding subsystem 110 may apply a rate-distortion optimization process to compare different encoding parameters to select at least a portion of the encoding parameters having the best encoding performance. In some implementations, the encoding subsystem 110 may compare different values of all encoding parameters to determine all encoding parameters with the best encoding performance. In other implementations, the encoding subsystem 110 may only compare different values of the partial encoding parameters to determine the partial encoding parameters with the best encoding performance. For example, in some cases, one or more encoding parameters may have been determined, and thus the encoding subsystem may only apply the optimization process to encoding parameters other than the one or more encoding parameters.
The "rate-distortion optimization process" described herein refers to a process of applying rate-distortion theory to determine coding parameters with optimal coding performance. The main goals of the rate-distortion optimization process are: 1) under the limit of a certain coding code rate, the distortion of the image and/or the video is reduced; and 2) reducing the coding rate as much as possible while allowing a certain distortion.
In some implementations, the rate-distortion optimization process can be converted into a Lagrangian optimization process. In particular, the Lagrangian optimization process may be described as follows:

minimize Σ_{k=1..N} (D_k + λ · R_k)   (9)

where N denotes the number of image blocks in the destination image to be encoded, D_k denotes the coding distortion of the k-th image block, λ denotes the Lagrange multiplier, and R_k denotes the total number of bits used to encode the k-th image block. Letting D denote Σ_{k=1..N} D_k and R denote Σ_{k=1..N} R_k, equation (9) can be simplified to:

minimize (D + λ · R)   (10)
according to equation (10), the lagrangian optimization process may be described as determining the encoding parameter that minimizes the value of (D + λ R) (i.e., the encoding parameter with the best encoding performance).
In some implementations, the encoding subsystem 110 may use any one of SSE′, SAD′, and SATD′ determined according to equations (5)-(7) as D in equation (10) to determine the encoding parameters with the best encoding performance. More generally, in some implementations, the encoding subsystem 110 may also use another distortion measure determined according to equation (8) as D in equation (10).
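To make the selection concrete, the following sketch (illustrative only; the Candidate structure and its field names are assumptions, not the patent's API) picks, among the candidate parameter sets for a block, the one minimizing D + λR with the weighted distortion as D:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    params: dict                 # e.g., prediction mode, block partition, transform type
    weighted_distortion: float   # D: e.g., SSE' of equation (5) for the block
    bits: int                    # R: bits needed to encode the block with these params

def best_candidate(candidates, lam):
    """Lagrangian rate-distortion selection per equation (10):
    minimize D + lam * R over the candidate encoding parameter sets."""
    return min(candidates, key=lambda c: c.weighted_distortion + lam * c.bits)
```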
In some implementations, as described above, when image blocks are small, different pixels in the same image block may be approximated as having the same weight. In this case, the Lagrangian optimization process can be further simplified when SSE′ is employed as the error measuring coding distortion.
For example, assume that the weight of the k-th image block is w_k, that the distortion of the k-th image block determined by the conventional scheme (e.g., equation (2)) is SSE_k, and that the distortion of the k-th image block determined according to implementations of the present disclosure (e.g., equation (5)) is SSE′_k. From equations (2) and (5), it can be determined that SSE_k and SSE′_k have the following relationship: SSE′_k = w_k · SSE_k. Equation (9) can thus be transformed into:

minimize Σ_{k=1..N} (w_k · SSE_k + λ · R_k) = minimize Σ_{k=1..N} w_k · (SSE_k + (λ / w_k) · R_k)   (11)

As can be seen from equation (11), in implementations of the present disclosure, the way the SSE is calculated need not change during the Lagrangian optimization (i.e., the SSE calculated by equation (2) can still be used); only the Lagrange multiplier needs to change, from λ to λ / w_k for the k-th image block, to determine the encoding parameters with the best encoding performance. In this manner, implementations of the present disclosure can determine optimal encoding parameters for encoding images and/or video (e.g., omnidirectional video) based on the adjusted distortion metric, improving encoding performance with as little change as possible to existing image and/or video encoders.
In some implementations, the encoding parameters to be determined may include a QP, which corresponds to a quantization step size and determines the degree of quantization for image and/or video encoding. In image and/or video coding, the Lagrange multiplier λ and the QP are typically related as follows:

λ = 0.57 × 2^((QP − 12)/3)   (12)

namely:

QP = 3 · log2(λ) + 14.43   (13)

According to equation (11), for the k-th image block, the quantization parameter QP′ may be determined as follows:

QP′ = 3 · log2(λ / w_k) + 14.43 = QP − 3 · log2(w_k)   (14)

As can be seen from equation (14), in implementations of the present disclosure, for the k-th image block, an offset of −3 · log2(w_k) may be applied to the QP to obtain the quantization parameter QP′ with the best encoding performance. That is, the encoding subsystem 110 may determine the optimal quantization parameter, as at least a portion of the encoding parameters for encoding the destination image, based only on the weight associated with the image block and the Lagrange multiplier, thereby further improving the encoding performance for omnidirectional video.
Further, in some implementations, method 200 may also include acts not shown in fig. 2. For example, prior to block 210, the encoding subsystem 110 (e.g., the mapping module 113) may map the source image to a rectangular image; for instance, the encoding subsystem 110 may map the spherical image 310 to the rectangular image 320 by equirectangular projection. Following block 230, the encoding subsystem 110 (e.g., the encoding module 114) may encode the destination image using the encoding parameters determined at block 230. It should be understood that these acts are omitted from fig. 2 merely to simplify the description, and this is not intended to limit the scope of the present disclosure.
From the above description, it can be seen that implementations of the present disclosure can determine the different effects that different points have on the mapping from a source image to a destination image, and can accommodate any known or to-be-developed mapping approach (including but not limited to equirectangular projection). Implementations of the present disclosure can adjust a metric of the distortion caused by encoding based on the determined effects, and determine at least a portion of the encoding parameters for encoding the destination image based on the adjusted distortion metric, so as to improve encoding performance.
Fig. 4 illustrates a block diagram of an example computing system/server 400 in which one or more implementations of the present disclosure may be implemented. For example, in some implementations, the encoding subsystem 110 as shown in fig. 1 may be implemented by the computing system/server 400. The computing system/server 400 shown in fig. 4 is only an example, and is not intended to suggest any limitation as to the scope of use or functionality of the implementations described herein.
As shown in fig. 4, computing system/server 400 is in the form of a general-purpose computing device. Components of computing system/server 400 may include, but are not limited to, one or more processors or processing units 410, memory 420, one or more input devices 430, one or more output devices 440, storage 450, and one or more communication units 460. The processing unit 410 may be a real or virtual processor and is capable of performing various processes according to programs stored in the memory 420. In multi-processing systems, multiple processing units execute computer-executable instructions to increase processing power.
Computing system/server 400 typically includes a variety of computer media. Such media may be any available media that are accessible by computing system/server 400, including, but not limited to, volatile and non-volatile media, and removable and non-removable media. Memory 420 may be volatile memory (e.g., registers, cache, random access memory (RAM)), non-volatile memory (e.g., read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory), or some combination thereof. Storage 450 may be removable or non-removable, and may include machine-readable media, such as a flash drive, a magnetic disk, or any other media that can be used to store information and that can be accessed within computing system/server 400.
The computing system/server 400 may further include additional removable/non-removable, volatile/nonvolatile computer system storage media. Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk may be provided. In these cases, each drive may be connected to the bus by one or more data media interfaces.
Memory 420 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various implementations described herein. For example, when one or more modules of system 100 are implemented as software modules, they may be stored in memory 420. When accessed and executed by the processing unit 410, these program modules may perform the functions and/or methods described herein, such as method 200.
The input unit 430 may be one or more of various input devices. For example, the input unit 430 may include a user device such as a mouse, a keyboard, a trackball, or the like. The communication unit 460 enables communication over a communication medium to another computing entity. Additionally, the functionality of the components of computing system/server 400 may be implemented in a single computing cluster or in multiple computing machines capable of communicating over a communication connection. Thus, computing system/server 400 may operate in a networked environment using logical connections to one or more other servers, network personal computers (PCs), or another general network node. By way of example, and not limitation, communication media include wired or wireless networking technologies.
Computing system/server 400 may also communicate, as desired, with one or more external devices (not shown) such as storage devices and display devices, with one or more devices that enable a user to interact with computing system/server 400, or with any device (e.g., a network card, a modem, etc.) that enables computing system/server 400 to communicate with one or more other computing devices. Such communication may be performed via input/output (I/O) interfaces (not shown).
From the above description, it will be appreciated that implementations of the present disclosure can determine the different effects that different points have on the mapping from a source image to a destination image, and can accommodate any known or to-be-developed mapping approach (including but not limited to equirectangular projection). Implementations of the present disclosure can adjust a metric of the distortion caused by encoding based on the determined effects, and determine at least a portion of the encoding parameters for encoding the destination image based on the adjusted distortion metric, so as to improve encoding performance.
The functions described herein may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
In a first aspect, an encoding device is provided. The device includes a processing unit and a memory. The memory is coupled to the processing unit and stores instructions for execution by the processing unit. The instructions, when executed by the processing unit, cause the device to perform acts comprising: in response to a source image being mapped to a destination image, determining an effect of at least one destination pixel on the mapping based on at least one source pixel in the source image and the at least one destination pixel in the destination image, the at least one source pixel being mapped to the at least one destination pixel; determining distortion caused by the encoding based at least on the effect of the at least one destination pixel on the mapping; and determining at least a portion of encoding parameters for encoding the destination image based on the distortion.
In some implementations, determining the effect includes: determining the effect based on a first number of pixels in the at least one source pixel and a second number of pixels in the at least one destination pixel.
In some implementations, the at least one source pixel has at least one degree of importance associated with the encoding, and determining the effect includes: determining the effect based on the at least one source pixel, the at least one destination pixel, and the at least one degree of importance.
In some implementations, the rectangular image is associated with a plurality of image blocks, and the acts further comprise: assigning the determined effect associated with the at least one destination pixel to pixels that are in the same image block, of the plurality of image blocks, as the at least one destination pixel.
In some implementations, determining the distortion includes: obtaining a reconstructed image of the destination image, the reconstructed image being obtained by applying, at least in part, the inverse of the encoding to the encoded destination image; and determining the distortion based on the destination image, the reconstructed image, and the effect.
In some implementations, determining the distortion includes: generating an error that measures the distortion.
In some implementations, generating the error includes: generating, based on the effect, at least one of a sum of squared errors (SSE), a sum of absolute differences (SAD), or a sum of absolute transformed differences (SATD) between the destination image and the reconstructed image.
In some implementations, determining at least a portion of the encoding parameters includes: determining at least a portion of the encoding parameters through a Lagrangian optimization process based on the distortion.
In some implementations, determining at least a portion of the encoding parameters further includes: determining a quantization parameter, as at least a portion of the encoding parameters, based on the effect and the Lagrange multiplier employed in the Lagrangian optimization process.
In some implementations, the source image is included in a video, the device is used to encode the video, the source image comprises a spherical image, the destination image comprises a rectangular image, and the acts further comprise: mapping the spherical image to the rectangular image by equirectangular projection before determining the effect associated with the at least one destination pixel.
In a second aspect, an encoding method is provided. The method comprises: in response to a source image being mapped to a destination image, determining an effect of at least one destination pixel on the mapping based on at least one source pixel in the source image and the at least one destination pixel in the destination image, the at least one source pixel being mapped to the at least one destination pixel; determining distortion caused by the encoding based at least on the effect of the at least one destination pixel on the mapping; and determining at least a portion of encoding parameters for encoding the destination image based on the distortion.
In some implementations, determining the effect includes: determining the effect based on a first number of pixels in the at least one source pixel and a second number of pixels in the at least one destination pixel.
In some implementations, the at least one source pixel has at least one degree of importance associated with the encoding, and determining the effect includes: determining the effect based on the at least one source pixel, the at least one destination pixel, and the at least one degree of importance.
In some implementations, the destination image is associated with a plurality of image blocks, the method further comprising: assigning the determined impact associated with the at least one destination pixel to pixels located in the same image block of the plurality of image blocks as the at least one destination pixel.
In some implementations, determining the distortion includes: acquiring a reconstructed image of the destination image, wherein the reconstructed image is obtained by applying an inverse of the encoding to the at least partially encoded destination image; and determining the distortion based on the destination image, the reconstructed image, and the impact.
In some implementations, determining the distortion includes: an error is generated that measures the distortion.
In some implementations, generating the error includes: generating, based on the impact, at least one of a Sum of Squared Errors (SSE), a Sum of Absolute Differences (SAD), or a Sum of Absolute Transformed Differences (SATD) between the destination image and the reconstructed image.
In some implementations, determining at least a portion of the encoding parameters includes: at least a portion of the encoding parameters is determined by a Lagrangian optimization process based on the distortion.
In some implementations, determining at least a portion of the encoding parameters further includes: the quantization parameter is determined, as at least a portion of the encoding parameters, based on the impact and a Lagrange multiplier employed in the Lagrangian optimization process.
In some implementations, a source image is included in a video, the method is used to encode the video, the source image comprises a spherical image, the destination image comprises a rectangular image, and the method further includes: mapping the spherical image to the rectangular image by equirectangular projection before determining the impact associated with the at least one destination pixel.
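Drawing the illustrative sketches above together, and assuming they are in scope, a hypothetical end-to-end use might look as follows:

    import numpy as np

    # Hypothetical pipeline: weight the distortion of an ERP frame and derive
    # a per-block QP offset before encoding (stand-in data, not real video).
    H, W, BLK = 1024, 2048, 64
    dst = np.random.randint(0, 256, (H, W)).astype(np.float64)  # stand-in ERP frame
    rec = dst + np.random.normal(0.0, 2.0, (H, W))              # stand-in reconstruction

    w_map = erp_pixel_weights(H, W)
    print("weighted SSE:", weighted_sse(dst, rec, w_map))

    for bw in block_weights(w_map, BLK).flat[:4]:
        print("block weight %.4f -> QP offset %+d" % (bw, qp_offset_from_impact(bw)))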
In a third aspect, there is provided a computer program product tangibly stored in a non-transitory computer storage medium and comprising machine-executable instructions that, when executed by a device, cause the device to perform the method according to the second aspect.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (20)

1. An encoding device comprising:
a processing unit;
a memory coupled to the processing unit and storing instructions for execution by the processing unit, the instructions when executed by the processing unit causing the apparatus to perform acts comprising:
responsive to a source image being mapped to a destination image, determining an impact of at least one destination pixel on the mapping based at least on at least one source pixel in the source image and the at least one destination pixel in the destination image, the at least one source pixel being mapped to the at least one destination pixel;
determining distortion caused by encoding the destination image based at least on the impact of the at least one destination pixel on the mapping; and
determining at least a portion of encoding parameters for encoding the destination image based on the distortion.
2. The apparatus of claim 1, wherein determining the impact comprises:
determining the impact based on a first number of pixels in the at least one source pixel and a second number of pixels in the at least one destination pixel.
3. The apparatus of claim 1, wherein the at least one source pixel has at least one degree of importance associated with the encoding, and determining the impact comprises:
determining the impact based on the at least one source pixel, the at least one destination pixel, and the at least one degree of importance.
4. The apparatus of claim 1, wherein the destination image is associated with a plurality of image blocks, and the actions further comprise:
assigning the determined impact associated with the at least one destination pixel to pixels located in a same image block of the plurality of image blocks as the at least one destination pixel.
5. The apparatus of claim 1, wherein determining the distortion comprises:
acquiring a reconstructed image of the destination image, wherein the reconstructed image is an image obtained by applying an inverse of the encoding to the at least partially encoded destination image; and
determining the distortion based on the destination image, the reconstructed image, and the impact.
6. The apparatus of claim 5, wherein determining the distortion comprises:
generating an error that measures the distortion.
7. The apparatus of claim 6, wherein generating the error comprises:
generating, based on the impact, at least one of a Sum of Squared Errors (SSE), a Sum of Absolute Differences (SAD), or a Sum of Absolute Transformed Differences (SATD) between the destination image and the reconstructed image.
8. The apparatus of claim 1, wherein determining the at least a portion of the encoding parameters comprises:
determining the at least a portion of the encoding parameters by a Lagrangian optimization process based on the distortion.
9. The apparatus of claim 8, wherein determining the at least a portion of the encoding parameters further comprises:
determining a quantization parameter as the at least a portion of the encoding parameters based on the impact and a Lagrange multiplier employed in the Lagrangian optimization process.
10. The apparatus of claim 1, wherein the source image is included in a video, the apparatus is configured to encode the video, the source image comprises a spherical image, the destination image comprises a rectangular image, and the acts further comprise:
mapping the spherical image to the rectangular image by equirectangular projection prior to determining the impact associated with the at least one destination pixel.
11. An encoding method, comprising:
responsive to a source image being mapped to a destination image, determining an impact of at least one destination pixel on the mapping based at least on at least one source pixel in the source image and the at least one destination pixel in the destination image, the at least one source pixel being mapped to the at least one destination pixel;
determining distortion caused by encoding the destination image based at least on the impact of the at least one destination pixel on the mapping; and
determining at least a portion of encoding parameters for encoding the destination image based on the distortion.
12. The method of claim 11, wherein determining the impact comprises:
determining the impact based on a first number of pixels in the at least one source pixel and a second number of pixels in the at least one destination pixel.
13. The method of claim 11, wherein the at least one source pixel has at least one degree of importance associated with the encoding, and determining the impact comprises:
determining the impact based on the at least one source pixel, the at least one destination pixel, and the at least one degree of importance.
14. The method of claim 11, wherein the destination image is associated with a plurality of image blocks, the method further comprising:
assigning the determined impact associated with the at least one destination pixel to pixels located in a same image block of the plurality of image blocks as the at least one destination pixel.
15. The method of claim 11, wherein determining the distortion comprises:
acquiring a reconstructed image of the destination image, wherein the reconstructed image is an image obtained by applying an inverse of the encoding to the at least partially encoded destination image; and
determining the distortion based on the destination image, the reconstructed image, and the impact.
16. The method of claim 15, wherein determining the distortion comprises:
generating an error that measures the distortion.
17. The method of claim 16, wherein generating the error comprises:
generating, based on the impact, at least one of a Sum of Squared Errors (SSE), a Sum of Absolute Differences (SAD), or a Sum of Absolute Transformed Differences (SATD) between the destination image and the reconstructed image.
18. The method of claim 11, wherein determining the at least a portion of the encoding parameters comprises:
determining the at least a portion of the encoding parameters by a Lagrangian optimization process based on the distortion.
19. The method of claim 18, wherein determining the at least a portion of the encoding parameters further comprises:
determining a quantization parameter as the at least a portion of the encoding parameters based on the impact and a Lagrange multiplier employed in the Lagrangian optimization process.
20. A computer-readable storage medium comprising machine-executable instructions that, when executed by a device, cause the device to perform acts comprising:
responsive to a source image being mapped to a destination image, determining an impact of at least one destination pixel on the mapping based at least on at least one source pixel in the source image and the at least one destination pixel in the destination image, the at least one source pixel being mapped to the at least one destination pixel;
determining distortion caused by encoding the destination image based at least on the impact of the at least one destination pixel on the mapping; and
determining at least a portion of encoding parameters for encoding the destination image based on the distortion.
CN201710294882.0A 2017-04-28 2017-04-28 Image coding Active CN109246407B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710294882.0A CN109246407B (en) 2017-04-28 2017-04-28 Image coding
PCT/US2018/028217 WO2018200293A1 (en) 2017-04-28 2018-04-19 Image encoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710294882.0A CN109246407B (en) 2017-04-28 2017-04-28 Image coding

Publications (2)

Publication Number Publication Date
CN109246407A (en) 2019-01-18
CN109246407B (en) 2020-09-25

Family

ID=62116996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710294882.0A Active CN109246407B (en) 2017-04-28 2017-04-28 Image coding

Country Status (2)

Country Link
CN (1) CN109246407B (en)
WO (1) WO2018200293A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020107288A1 (en) * 2018-11-28 2020-06-04 Oppo广东移动通信有限公司 Video encoding optimization method and apparatus, and computer storage medium
CN110602495A (en) * 2019-08-20 2019-12-20 深圳市盛世生物医疗科技有限公司 Medical image coding method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018017599A1 (en) * 2016-07-19 2018-01-25 Vid Scale, Inc. Quality evaluation system and method for 360-degree video

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102165772A (en) * 2008-09-16 2011-08-24 杜比实验室特许公司 Adaptive video encoder control
CN103139554A (en) * 2011-11-22 2013-06-05 浙江大学 Method and device for optimizing three-dimensional video frequency distortion
CN103813149A (en) * 2012-11-15 2014-05-21 中国科学院深圳先进技术研究院 Image and video reconstruction method of encoding and decoding system
CN105635735A (en) * 2014-11-25 2016-06-01 黑莓有限公司 Perceptual image and video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yule Sun et al., "AHG8: Stretching ratio based adaptive quantization for 360 video", 6th JVET Meeting, document JVET-F0072, http://phenix.int-evry.fr/jvet/, 2017-03-30, pp. 1-4 *

Also Published As

Publication number Publication date
CN109246407A (en) 2019-01-18
WO2018200293A1 (en) 2018-11-01

Similar Documents

Publication Publication Date Title
KR102424140B1 (en) Method and apparatus for omni-directional video coding and decoding by adaptive intra prediction
US11508096B2 (en) Information processing apparatus and method
CN110115037B (en) Spherical projection motion estimation/compensation and mode decision
TWI678915B (en) Method and apparatus for video coding
US9648346B2 (en) Multi-view video compression and streaming based on viewpoints of remote viewer
US20190238848A1 (en) Method and apparatus for calculating quantization parameters to encode and decode an immersive video
KR102254986B1 (en) Processing of equirectangular object data to compensate for distortion by spherical projections
WO2020117657A1 (en) Enhancing performance capture with real-time neural rendering
US20190238853A1 (en) Method and apparatus for encoding and decoding an omnidirectional video
KR20230051638A (en) Methods, devices and stream to encode global rotation motion compensated images
CN110692241B (en) Diversified motions using multiple global motion models
US20230362387A1 (en) Block-Based Low Latency Rate Control
CN109246407B (en) Image coding
CN117730530A (en) Image processing method and device, equipment and storage medium
CN107896331B (en) Method for selecting coding option, data processing device and computer readable storage medium
CN114651270A (en) Depth loop filtering by time-deformable convolution
Sun et al. Rate-distortion optimized 3D reconstruction from noise-corrupted multiview depth videos
CN106534850B (en) Image processing apparatus, image interpolation method, and image encoding method
CN111052746A (en) Method and apparatus for encoding and decoding omni-directional video
Jin et al. Embedded Graph Representation for Inter-Frame Coding of Dynamic Meshes
EP3349462A1 (en) Method and apparatus for coding/decoding a picture of an omnidirectional video
CN118077203A (en) Warped motion compensation with well-defined extended rotation
CN117768647A (en) Image processing method, device, equipment and readable storage medium
KR20200023059A (en) Method and apparatus for encoding 3d image and method and apparatus for decoding 3d image
Gao Designs of application-specific multiview/3D video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant