CN113170133B - Block-based spatial activity metrics for pictures - Google Patents


Info

Publication number: CN113170133B
Application number: CN201980077959.6A
Authority: CN (China)
Prior art keywords: block, region, inverse, blocks, picture
Other languages: Chinese (zh)
Other versions: CN113170133A
Inventors: V. Adzic, H. Kalva, B. Furht
Assignee / Original assignee: OP Solutions LLC
Application filed by OP Solutions LLC
Publication of application: CN113170133A
Application granted; publication of grant: CN113170133B
Legal status: Active


Classifications

All classifications fall under H (ELECTRICITY), H04 (ELECTRIC COMMUNICATION TECHNIQUE), H04N (PICTORIAL COMMUNICATION, e.g., TELEVISION), H04N19/00 (methods or arrangements for coding, decoding, compressing or decompressing digital video signals):

    • H04N19/119 Adaptive subdivision aspects, e.g., subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/124 Quantisation
    • H04N19/14 Coding unit complexity, e.g., amount of activity or edge presence estimation
    • H04N19/176 Adaptive coding in which the coding unit is an image region that is a block, e.g., a macroblock
    • H04N19/192 Adaptation method, tool or type being iterative or recursive
    • H04N19/194 Iterative or recursive adaptation involving only two passes
    • H04N19/625 Transform coding using the discrete cosine transform (DCT)
    • H04N19/82 Filtering operations within a prediction loop
    • H04N19/48 Compressed-domain processing techniques other than decoding, e.g., modification of transform coefficients, variable length coding (VLC) data or run-length data


Abstract

An encoder includes circuitry configured to receive a video frame, split the video frame into a plurality of blocks, determine a respective spatial activity metric for each of the plurality of blocks, and encode the video frame using the spatial activity metrics and a transform matrix. Related apparatus, systems, techniques, and articles are also described.

Description

Block-based spatial activity metrics for pictures
Cross Reference to Related Applications
The present application claims priority to U.S. Provisional Patent Application Serial No. 62/771,909, filed on November 27, 2018, and entitled "BLOCK-BASED SPATIAL ACTIVITY MEASURE FOR PICTURES," the entirety of which is incorporated herein by reference.
Technical Field
The present invention relates generally to the field of video compression. More particularly, the present invention relates to a block-based spatial activity metric for pictures.
Background
A video codec may include electronic circuitry or software that compresses or decompresses digital video. It may convert uncompressed video to a compressed format and vice versa. In the context of video compression, a device that compresses video (and/or performs some function thereof) is commonly called an encoder, and a device that decompresses video (and/or performs some function thereof) is called a decoder.
The format of the compressed data may conform to a standard video compression specification. The compression may be lossy, in that some information present in the original video is lost from the compressed video. As a consequence, decompressed video may have lower quality than the original, uncompressed video because there is insufficient information to reconstruct the original video exactly.
There is a complex relationship between video quality and the amount of data used to characterize the video (e.g., determined by bit rate), complexity of encoding and decoding algorithms, susceptibility to data loss and errors, editing simplicity, random access, end-to-end delay (e.g., time delay), and so forth.
During encoding, a picture (e.g., a video frame) may be partitioned (e.g., split) into relatively large blocks, such as 128 x 128 blocks, with this structure fixed. But by cutting the picture into large blocks for compression without regard to the underlying video information (e.g., the video content), the large blocks may partition the picture in a manner that is inefficient for encoding, resulting in poor bit-rate performance.
Disclosure of Invention
In one aspect, an encoder includes circuitry configured to: receive a video frame; split the video frame into a plurality of blocks; determine a respective spatial activity metric for each of the plurality of blocks; and encode the video frame using the spatial activity metrics and a transform matrix.
In another aspect, a method includes: receiving, by an encoder, a video frame; splitting the video frame into a plurality of blocks; determining a respective spatial activity metric for each of the plurality of blocks; and encoding the video frame using the spatial activity metrics and a transform matrix.
The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description, the drawings, and the claims.
Drawings
For the purpose of illustrating the invention, there is shown in the drawings aspects of one or more embodiments of the invention. It should be understood, however, that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, wherein:
FIG. 1 is a process flow diagram of an exemplary process for encoding video that may utilize a transformation matrix to determine spatial activity metrics to improve the operation of an encoder;
FIG. 2 is a process flow diagram of an exemplary process for performing frequency-based block fusion;
FIG. 3 is a system block diagram of an exemplary video encoder capable of encoding using spatial activity metrics that may include computing frequency components using a transform matrix; and
FIG. 4 is a block diagram of a computing system that may be used to perform any one or more of the methods of the present disclosure and any one or more portions thereof.
The drawings are not necessarily drawn to scale and may be illustrated by phantom lines, diagrammatic representations, and fragmentary views. In certain instances, details that are not necessary for an understanding of the embodiments or that render other details difficult to perceive may have been omitted. Like reference symbols in the various drawings indicate like elements.
Detailed Description
Some embodiments of the present subject matter relate to encoding video using a spatial activity metric that includes calculating frequency components with a transform matrix. Using a transform matrix can produce metrics for more frequencies than an encoder using a high-pass filter, and this additional information can improve coding efficiency. For example, determining frequency components with a transform matrix may yield more accurate information about the frequency components of a video frame. With more accurate frequency information, block fusion can be improved, which in turn can improve prediction, thereby reducing residuals and the bit rate of the bit stream.
In some implementations, a picture is partitioned using a block of samples as a base unit. The sample blocks are uniform in size, for example square blocks of pixels. As a non-limiting example, embodiments of the present disclosure may use 4 x 4 sample blocks as a base unit and may calculate the frequency components of the 4 x 4 blocks for use as a spatial activity metric. By taking a 4 x 4 block as the basic partition size, some embodiments of the present subject matter give the encoder finer granularity, and the size is consistent with established transform block sizes, so that coding efficiency can be improved using standard, defined transform matrices. This approach contrasts with some existing encoding methods that use fixed block structures of relatively large size. Those skilled in the art will appreciate, after reading the entirety of this disclosure, that generally any size or shape of sample block may be used for picture splitting and/or partitioning; for brevity, many of the examples that follow describe only 4 x 4 sample blocks.
FIG. 1 is a process flow diagram of an exemplary process 100 for encoding video in which a transform matrix may be used to determine spatial activity metrics, improving the operation of an encoder. In step 105, a video frame is received by the encoder. The video frame may be received in any manner suitable for receiving video, in the form of video streams and/or files, from any device and/or input port. Receiving the video frame may include retrieving it from a memory of the encoder and/or a memory of a computing device in communication with, integrated with, and/or incorporated into the encoder. Receiving may include receiving from a remote device over a network. Receiving a video frame may include receiving a plurality of video frames making up one or more videos.
In step 110, and still referring to FIG. 1, the encoder may divide and/or split the video frame into blocks. The blocks may have any suitable shape or size as described above, including a size of 4 pixels by 4 pixels (4 x 4). The 4 x 4 size is compatible with many standard video resolutions, which can be divided into an integer number of 4 x 4 blocks.
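As a rough illustration of this splitting step (the helper below is hypothetical, not from the patent), a frame given as a 2-D list of pixel values can be cut into 4 x 4 blocks, assuming the frame dimensions are integer multiples of the block size:

```python
def split_into_blocks(frame, size=4):
    """Split a 2-D list of pixel values into size x size blocks.

    Assumes frame height and width are integer multiples of `size`,
    as the text notes holds for many standard resolutions with 4 x 4.
    """
    height, width = len(frame), len(frame[0])
    blocks = []
    for by in range(0, height, size):
        for bx in range(0, width, size):
            # Slice out one size x size block, row by row.
            blocks.append([row[bx:bx + size] for row in frame[by:by + size]])
    return blocks
```

An 8 x 8 frame, for instance, yields four 4 x 4 blocks.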
In step 115, and with continued reference to FIG. 1, the encoder may determine a respective spatial activity metric for each of the plurality of blocks. As used in this disclosure, a "spatial activity metric" characterizes the frequency and magnitude of changes in texture within a block. That is, the spatial activity metric is lower in flat areas, such as sky, and higher in complex areas, such as grass. Determining the respective spatial activity metric may include determining it using a transform matrix, including, but not limited to, a discrete cosine transform matrix. Determining the respective spatial activity metric for each block may include using a generalized discrete cosine transform matrix. For example, where the blocks are 4 x 4 pixel blocks, the generalized discrete cosine transform matrix may include a generalized discrete cosine transform matrix II of the form:

    T = | a  a  a  a |
        | b  c -c -b |
        | a -a -a  a |
        | c -b  b -c |

where a = 1/2, b = sqrt(1/2) * cos(pi/8), and c = sqrt(1/2) * cos(3*pi/8).
In some implementations, an integer approximation of the transform matrix may be used, which can be implemented efficiently in hardware and software. For example, where the blocks are 4 x 4 pixel blocks, the integer approximation may take the form:

    T = | 1  1  1  1 |
        | 2  1 -1 -2 |
        | 1 -1 -1  1 |
        | 1 -2  2 -1 |
For each block B_i, the block spectrum F_Bi may be calculated using the following formula:

    F_Bi = T x B_i x T'

where T' is the transpose of the cosine transform matrix T; B_i is the block represented as a matrix of values corresponding to the pixels in the block (e.g., the 4 x 4 block of pixels above represented as a 4 x 4 matrix); and the operator x denotes matrix multiplication.
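A minimal sketch of this spectrum computation, using the floating-point DCT-II matrix defined above (helper names are illustrative, not from the patent):

```python
import math

def dct_matrix(n=4):
    # Generalized DCT-II matrix T: T[i][j] = c_i * cos(pi * (2j + 1) * i / (2n)),
    # with c_0 = sqrt(1/n) and c_i = sqrt(2/n) otherwise.  For n = 4 this gives
    # exactly the a, b, c constants in the text.
    t = []
    for i in range(n):
        c = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        t.append([c * math.cos(math.pi * (2 * j + 1) * i / (2 * n))
                  for j in range(n)])
    return t

def matmul(a, b):
    # Plain matrix multiplication on lists of lists.
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def block_spectrum(block):
    # F_B = T x B x T', where T' is the transpose of T.
    t = dct_matrix(len(block))
    t_transpose = [list(row) for row in zip(*t)]
    return matmul(matmul(t, block), t_transpose)
```

For a constant 4 x 4 block, all energy lands in the DC coefficient F_B[0][0], matching the intuition that flat regions have low spatial activity.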
In some implementations, block fusion may be performed using the spatial activity metrics. In block fusion, each block may be assigned to a region. For example, a first region within a video frame may be determined, the first region including a first grouping of a first subset of blocks. The first grouping may be based on the respective spatial activity metrics. Blocks with similar spectra may be grouped into regions to represent areas of similar spatial activity.
For example, determining the first region may include iterating over each of the plurality of blocks and, for each current block, comparing the spatial activity metric of the current block with the spatial activity metric of the previous block; if the difference is below a predetermined threshold, the current block is assigned to the first region. The current block may be assigned to a second region in response to, or based on, a determination that the difference is greater than the predetermined threshold. Those skilled in the art will appreciate, upon reading the entirety of this disclosure, that the threshold comparison may be performed in a variety of ways, including, alternatively or additionally, determining whether a degree of similarity exceeds a threshold and/or determining whether a degree of difference exceeds a threshold. The threshold may be stored in memory and/or generated using previously or currently received and/or calculated values, including any of the values and/or metrics described in this disclosure.
Fig. 2 is a process flow diagram of an exemplary process 200 for performing frequency-based block fusion. In step 205, the process iterates over each block B_i (e.g., i may be incremented by 1). In step 210, the spectrum F_Bi of block B_i is calculated. In step 215, it is determined whether the spatial activity metric difference between the current block's F_Bi and the previous block's F_Bi-1 is below a predetermined threshold T_F. If so, in step 220, the current block B_i is added to the current region. If not, in step 225, the current block B_i is added to a new or different region.
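The fusion loop of process 200 can be sketched as follows. The difference measure used here (sum of absolute coefficient differences between consecutive spectra) is one plausible choice; the text does not fix a specific comparison:

```python
def fuse_blocks(spectra, threshold):
    """Frequency-based block fusion: assign each block a region id.

    `spectra` is a list of per-block spectra (matrices, e.g. from a
    block_spectrum computation).  The difference measure is a simple
    sum of absolute coefficient differences -- an illustrative choice,
    since the text leaves the exact comparison open.
    """
    regions = [0]  # first block starts region 0
    for i in range(1, len(spectra)):
        diff = sum(abs(a - b)
                   for row_a, row_b in zip(spectra[i], spectra[i - 1])
                   for a, b in zip(row_a, row_b))
        if diff < threshold:
            regions.append(regions[-1])      # below T_F: same region as previous block
        else:
            regions.append(regions[-1] + 1)  # otherwise: start a new region
    return regions
```

Blocks whose spectra differ by less than the threshold end up sharing a region id, mirroring steps 215-225.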
In some embodiments, an average metric of information may be used for block fusion. As a non-limiting example, it may be determined from a sum of the information metrics of each block in the first region, weighted by and/or multiplied by a saliency coefficient, e.g., a summation of the form:

    A_N = (S_N / n) * sum_{k=1}^{n} B_k

where N is the index of the first region; S_N is the saliency coefficient; k is an index over the blocks making up the first region; n is the number of blocks making up the first region; B_k is the information metric of the k-th block; and A_N is the first average metric of information. For example, B_k may include a spatial activity metric calculated using a block discrete cosine transform matrix.
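Given the saliency-weighted sum described above, the per-region average reduces to a one-line computation (function name is illustrative):

```python
def region_average_metric(block_metrics, saliency):
    """A_N = (S_N / n) * sum(B_k): saliency-weighted mean of the
    per-block information metrics B_k over the n blocks of a region."""
    n = len(block_metrics)
    return saliency * sum(block_metrics) / n
```

For example, block metrics [2, 4, 6] with saliency 0.5 give an average metric of 2.0.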
The saliency coefficient S_N may be provided by an external expert and/or calculated based on a feature value of the first region (e.g., of the fused block). As used herein, a region "feature value" is a measurable attribute of a region determined based on the region's contents; a feature value may be represented numerically using the output of one or more calculations performed on the first region. The one or more calculations may include any analysis of any signal representing the first region. As a non-limiting example, in a quality-modeling application, regions with a smooth background may be assigned a higher S_N, while regions with a less smooth background may be assigned a lower S_N. As a non-limiting example, smoothness may be determined by measuring the number of edges using a Canny edge-detection algorithm, where a lower edge count indicates higher smoothness. Another example of automatic smoothness detection may include performing a Fast Fourier Transform (FFT) over a spatially varying signal of the region, where the signal may be analyzed over any two-dimensional coordinate system and over channels representing red, green, and blue values, or the like. In the FFT computation, a higher relative predominance of lower-frequency components indicates higher smoothness, whereas a higher relative predominance of higher-frequency components indicates more frequent and rapid transitions of color and/or shade values in the background region, yielding a lower smoothness score.
Another example may include assigning a higher S_N to a region containing important semantic objects, such as a face. Important semantic objects may be identified by user input; semantic importance may alternatively or additionally be detected based on edge configurations and/or texture patterns. A background may be identified by, but not limited to, receiving and/or detecting a portion of a region representing an important or "foreground" object (e.g., a face or other item, including but not limited to an important semantic object).
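The FFT-based smoothness idea can be sketched for a 1-D slice of a region channel as follows. The low/high split point and the ratio used as the score are illustrative assumptions, not the patent's exact procedure:

```python
import cmath

def smoothness_score(signal):
    """Rough smoothness estimate for a 1-D slice of a region channel.

    Computes a DFT and compares low- vs high-frequency magnitude; a
    higher ratio suggests a smoother background (hence a higher S_N).
    The split at n/4 and the ratio itself are illustrative choices.
    """
    n = len(signal)
    # Magnitude spectrum via a direct DFT (O(n^2), fine for a sketch).
    spectrum = [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                        for t, x in enumerate(signal)))
                for k in range(n)]
    low = sum(spectrum[1:n // 4 + 1])             # low frequencies (skip DC)
    high = sum(spectrum[n // 4 + 1:n // 2 + 1])   # high frequencies up to Nyquist
    return low / (high + 1e-9)                    # epsilon guards division by zero
```

A slow sinusoid scores far higher than a pixel-to-pixel alternating pattern, matching the text's low- vs high-frequency dominance argument.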
In some embodiments, frequency-based block fusion may be omitted, with other information-processing algorithms using the calculated F_Bi values directly. Such embodiments may include adaptive quantization, wherein a quantization parameter is determined based on the F_Bi value of each 4 x 4 block, on F_Bi values of 4 x 4 blocks combined into adjacent groupings of 8 x 8, 16 x 16, 32 x 32, or 64 x 64 blocks, or on any other suitable combination of F_Bi values.
In step 120, the video frame is encoded. Encoding may include controlling a quantization parameter based on a spatial activity metric, e.g., the average spatial activity metric of a first region generated by the block-fusion process over 4 x 4 blocks. The quantization parameter may include, equal, and/or be proportional or linearly related to a quantization size. As used in this disclosure, a "quantization level" and/or "quantization size" is a number representing the amount of information lost in a compressed video frame. The quantization level may include, but is not limited to, a number, such as an integer, by which one or more coefficients, including but not limited to transform coefficients, are divided and/or subtracted, reducing the information content of the encoded and subsequently decoded frame. Controlling may include determining a first quantization size based on the first metric of information; the quantization level may represent, directly or indirectly, a measure of the memory required to capture information describing luminance and/or chrominance in a block of pixels, where the larger the variance of the information determined by the first metric of information, the larger the number of bits required for storage. The quantization size may be based on the first metric of information described above, such that the larger the first metric of information, the larger the quantization size, and the smaller the first metric of information, the smaller the quantization size. The quantization size may be proportional to and/or linearly related to the first metric of information. In general, the greater the information content, the larger the quantization size. By controlling the quantization size, information about the block-fusion regions can be used to optimize the rate-distortion of the encoding. The controlling may be further based on a second average metric of information of a second region.
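As an illustrative sketch only, a linear mapping from a region's average metric to a quantization size could look like the following; `base` and `scale` are hypothetical constants, since the text specifies only that the quantization size grows (proportionally and/or linearly) with the metric:

```python
def quantization_size(avg_metric, base=8.0, scale=0.5):
    """Map a region's average information metric to a quantization size.

    Larger metric -> larger quantization size, as the text requires.
    The linear form and the `base`/`scale` constants are illustrative
    assumptions, not values from the patent.
    """
    return base + scale * avg_metric
```

Regions of high spatial activity thus quantize more coarsely, trading detail in busy areas for bit-rate savings.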
Fig. 3 is a system block diagram of an exemplary video encoder 300 that can encode using spatial activity metrics, which may include computing frequency components using a transform matrix. The exemplary video encoder 300 receives an input video 304, which may initially be partitioned or split into 4 x 4 blocks for further processing.
The exemplary video encoder 300 includes an intra prediction processor 308, a motion estimation and compensation processor 312 (also referred to as an inter prediction processor), a transform/quantization processor 316, an inverse quantization and inverse transform processor 320, a loop filter 324, a decoded picture buffer 328, and an entropy encoding processor 332. The bitstream parameters may be input to the entropy encoding processor 332 for inclusion in the output bitstream 336.
The transform/quantization processor 316 can perform block fusion and calculate spatial activity metrics, including calculating frequency components using the transform matrix, for each block.
In operation, for each block of a frame of the input video 304, it is determined whether to process the block via intra-picture prediction or via motion estimation/compensation. The block may be provided to the intra prediction processor 308 or to the motion estimation and compensation processor 312. If the block is to be processed via intra prediction, the intra prediction processor 308 performs the processing and outputs a predictor; if via motion estimation/compensation, the motion estimation and compensation processor 312 performs the processing.
A residual may be formed by subtracting the predictor from the input video. The residual may be received by the transform/quantization processor 316, which may perform a transform process, such as a discrete cosine transform (DCT), to generate quantized coefficients. The quantized coefficients and any associated signaling information may be provided to the entropy encoding processor 332 for entropy encoding and inclusion in the output bitstream 336. In addition, the quantized coefficients may be provided to the inverse quantization and inverse transform processor 320, which reproduces pixels that may be combined with the predictor and processed by the loop filter 324; the output of the loop filter 324 is stored in the decoded picture buffer 328 for use by the motion estimation and compensation processor 312.
It should be noted that any one or more of the aspects and embodiments described herein may be conveniently implemented using digital electronic circuitry, integrated circuitry, specially designed application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These may be realized and/or implemented in one or more machines (e.g., one or more computing devices acting as a user computing device for an electronic document, one or more server devices such as a document server, etc.) programmed according to the teachings of the present specification, as will be apparent to those of ordinary skill in the computer art. These various aspects or features can include implementation in one or more computer programs and/or software executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. Appropriate software code can readily be prepared by skilled programmers based on the teachings of this disclosure, as will be apparent to those of ordinary skill in the software art. The software and/or software modules employed by the above aspects and implementations may also include appropriate hardware for assisting in the implementation of the machine-executable instructions of the software and/or software modules.
Such software may be a computer program product employing a machine-readable storage medium. A machine-readable storage medium may be any medium that is capable of storing and/or encoding a sequence of instructions for execution by a machine (e.g., a computing device) and that causes the machine to perform any one of the methodologies and/or embodiments described herein. Examples of a machine-readable storage medium include, but are not limited to, a magnetic disk, an optical disc (e.g., CD-R, DVD, DVD-R, etc.), a magneto-optical disk, a read-only memory (ROM) device, a random-access memory (RAM) device, a magnetic card, an optical card, a solid-state memory device, an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a programmable logic device (PLD), and/or any combination thereof. A machine-readable storage medium, as used herein, is intended to include a single medium as well as a collection of physically separate media, such as a collection of optical discs or one or more hard disk drives in combination with computer memory. A machine-readable storage medium, as used herein, does not include transitory forms of signal transmission.
Such software may also include information (e.g., data) carried as a data signal on a data carrier, such as a carrier wave. For example, machine-executable information may be included as a data-carrying signal embodied in a data carrier, in which the signal encodes a sequence of instructions, or a portion thereof, for execution by a machine (e.g., a computing device), and any related information (e.g., data structures and data), that causes the machine to perform any one of the methodologies and/or embodiments described herein.
Examples of computing devices include, but are not limited to, electronic book reading devices, computer workstations, terminal computers, server computers, handheld devices (e.g., tablet computers, smartphones, etc.), network computers, network routers, network switches, bridges, any machine capable of executing a sequence of instructions for instructing a machine to take an action, and any combination thereof. In one example, the computing device may include and/or be included in a self-service terminal.
FIG. 4 illustrates a schematic diagram of one embodiment of a computing device in the example form of a computer system 400 within which a set of instructions for causing a control system to perform any one or more of the aspects and/or methods of the present disclosure may be executed. It is also contemplated that a specially configured set of instructions may be executed by multiple computing devices to cause one or more of those devices to perform any one or more of the aspects and/or methods of the present disclosure. Computer system 400 includes a processor 404 and a memory 408, which communicate with each other and with other components via a bus 412. Bus 412 may include any of several types of bus structures including, but not limited to, a memory bus, a memory controller, a peripheral bus, a local bus, and any combination thereof, using any of a variety of bus architectures.
Memory 408 may include various components (e.g., machine readable media) including, but not limited to, random access memory components, read-only components, and any combination thereof. In one example, a basic input/output system 416 (BIOS), containing the basic routines that help to transfer information between elements within computer system 400, such as during start-up, may be stored in memory 408. The memory 408 may also include instructions (e.g., software) 420 (e.g., stored on one or more machine-readable media) that implement any one or more of the aspects and/or methods of the present disclosure. In another example, memory 408 may further include any number of program modules including, but not limited to, an operating system, one or more application programs, other program modules, program data, and any combination thereof.
Computer system 400 may also include a storage device 424. Examples of a storage device (e.g., storage device 424) include, but are not limited to, a hard disk drive, a magnetic disk drive, an optical disc drive in combination with an optical medium, a solid-state storage device, and any combination thereof. Storage device 424 may be connected to bus 412 by a corresponding interface (not shown). Exemplary interfaces include, but are not limited to, Small Computer System Interface (SCSI), Advanced Technology Attachment (ATA), Serial ATA (SATA), Universal Serial Bus (USB), IEEE 1394 (FireWire), and any combination thereof. In one example, storage device 424 (or one or more components thereof) may be removably connected with computer system 400, for example, via an external port connector (not shown). In particular, storage device 424 and an associated machine-readable medium 428 may provide non-volatile and/or volatile storage for machine-readable instructions, data structures, program modules, and/or other data for computer system 400. In one example, software 420 may reside, in whole or in part, within machine-readable medium 428. In another example, software 420 may reside, in whole or in part, within processor 404.
Computer system 400 may also include an input device 432. In one example, a user of computer system 400 may enter commands and/or other information into computer system 400 via input device 432. Examples of input device 432 include, but are not limited to, an alphanumeric input device (e.g., a keyboard), a pointing device, a joystick, a gamepad, an audio input device (e.g., a microphone, a voice response system, etc.), a cursor control device (e.g., a mouse), a touchpad, an optical scanner, a video capture device (e.g., a still camera or a video camera), a touch screen, and any combination thereof. Input device 432 may be connected to bus 412 via any of a variety of interfaces (not shown), including, but not limited to, a serial interface, a parallel interface, a game port, a USB interface, a FireWire interface, a direct interface to bus 412, and any combination thereof. Input device 432 may include a touch screen interface that may be part of or separate from display 436, as discussed further below. Input device 432 may be used as a user selection device to select one or more graphical representations in a graphical interface as described above.
A user may also enter instructions and/or other information into computer system 400 via storage device 424 (e.g., a removable disk drive, a flash drive, etc.) and/or network interface device 440. A network interface device, such as network interface device 440, may be used to connect computer system 400 to one or more of a variety of networks, such as network 444, and to one or more remote devices 448 connected thereto. Examples of a network interface device include, but are not limited to, a network interface card (e.g., a mobile network interface card, a local area network (LAN) interface card), a modem, and any combination thereof. Examples of a network include, but are not limited to, a wide area network (e.g., the Internet, an enterprise network), a local area network (e.g., a network associated with an office, a building, a campus, or other relatively small geographic space), a telephone network, a data network associated with a telephone/voice provider (e.g., a mobile communications provider data and/or voice network), a direct connection between two computing devices, and any combination thereof. A network, such as network 444, may employ a wired and/or a wireless mode of communication. In general, any network topology may be used. Information (e.g., data, software 420, etc.) may be transferred to and/or from computer system 400 via network interface device 440.
Computer system 400 may further include a video display adapter 452 for communicating a displayable image to a display device, such as display device 436. Examples of a display device include, but are not limited to, a liquid crystal display (LCD), a cathode ray tube (CRT), a plasma display, a light-emitting diode (LED) display, and any combination thereof. Display adapter 452 and display device 436 may be used in combination with processor 404 to provide graphical representations of aspects of the present disclosure. In addition to a display device, computer system 400 may include one or more other peripheral output devices including, but not limited to, an audio speaker, a printer, and any combination thereof. Such peripheral output devices may be connected to bus 412 via a peripheral interface 456. Examples of a peripheral interface include, but are not limited to, a serial port, a USB connection, a FireWire connection, a parallel connection, and any combination thereof.
The foregoing has been a detailed description of illustrative embodiments of the invention. Various modifications and additions can be made without departing from the spirit and scope of this invention. Features of each of the various embodiments described above may be combined with features of other described embodiments as appropriate in order to provide a multiplicity of feature combinations in associated new embodiments. Furthermore, while the foregoing describes a number of separate embodiments, what has been described herein is merely illustrative of the application of the principles of the present disclosure. Additionally, although particular methods herein may be illustrated and/or described as being performed in a specific order, the ordering may be varied within ordinary skill to achieve embodiments of the disclosure. Accordingly, this description is meant to be taken only by way of example, and not to otherwise limit the scope of this invention.
In the descriptions above and in the claims, phrases such as "at least one of" or "one or more of" may occur followed by a conjunctive list of elements or features. The term "and/or" may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually, or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases "at least one of A and B", "one or more of A and B", and "A and/or B" are each intended to mean "A alone, B alone, or A and B together". A similar interpretation is also intended for lists including three or more items. For example, the phrases "at least one of A, B, and C", "one or more of A, B, and C", and "A, B, and/or C" are each intended to mean "A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together". In addition, use of the term "based on" above and in the claims is intended to mean "based at least in part on", such that an unrecited feature or element is also permissible.
The subject matter described herein may be embodied in systems, devices, methods, and/or articles of manufacture depending on the desired configuration. The embodiments set forth in the foregoing description do not represent all embodiments consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects of that subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations may be provided in addition to those set forth herein. For example, the embodiments described above may be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

Claims (10)

1. An inverse quantization and inverse transform processor configured to:
receiving a video signal generated from a picture, the picture having been processed at an encoder by:
partitioning the picture into a plurality of blocks;
obtaining a residual for each of the plurality of blocks using a predictor; applying a transform to each residual to obtain a transformed block; determining a spatial activity metric for each transformed block; and
determining whether each block of the plurality of blocks is allocated to a first region of the picture or to a second region of the picture by determining whether a difference between the spatial activity metric of the block and the spatial activity metric of an adjacent block is greater than or less than a threshold, a block in the first region having been allocated a first quantization parameter by the encoder and a block in the second region having been allocated a second quantization parameter by the encoder;
inverse quantizing the block of the first region using the first quantization parameter; and inverse quantizing the block of the second region using the second quantization parameter.
2. The processor of claim 1, further configured to:
inverse transforming the inverse-quantized block of the first region; and
inverse transforming the inverse-quantized block of the second region.
3. The processor of claim 1, wherein the transform is a discrete cosine transform.
4. The processor of claim 2, further comprising:
a loop filter; and
a decoded picture buffer.
5. The processor of claim 1, wherein each block of the plurality of blocks is 128 x 128.
6. A method, the method comprising:
Receiving, using an inverse quantization and inverse transform processor, a video signal generated from a picture, the picture having been processed at an encoder by:
partitioning the picture into a plurality of blocks;
obtaining a residual for each of the plurality of blocks using a predictor; applying a transform to each residual to obtain a transformed block; determining a spatial activity metric for each transformed block; and
determining whether each block of the plurality of blocks is allocated to a first region of the picture or to a second region of the picture by determining whether a difference between the spatial activity metric of the block and the spatial activity metric of an adjacent block is greater than or less than a threshold, a block in the first region having been allocated a first quantization parameter by the encoder and a block in the second region having been allocated a second quantization parameter by the encoder;
inverse quantizing, with the inverse quantization and inverse transform processor, the block of the first region using the first quantization parameter; and
inverse quantizing, with the inverse quantization and inverse transform processor, the block of the second region using the second quantization parameter.
7. The method of claim 6, further comprising:
inverse transforming the inverse-quantized block of the first region using the inverse quantization and inverse transform processor; and
inverse transforming the inverse-quantized block of the second region using the inverse quantization and inverse transform processor.
8. The method of claim 6, wherein the transform is a discrete cosine transform.
9. The method of claim 7, wherein the inverse quantization and inverse transform processor comprises a loop filter and a decoded picture buffer.
10. The method of claim 6, wherein each block of the plurality of blocks is 128 x 128.
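The flow recited in claims 1 and 6 (transform each residual block, compute a per-block spatial activity metric, allocate blocks to two regions by comparing the activity difference between adjacent blocks against a threshold, then inverse quantize each region with its own quantization parameter) can be sketched as follows. This is a minimal illustrative sketch, not the patented implementation: the use of AC-coefficient energy as the spatial activity metric, the comparison of each block against the previous block in scan order, and the scalar quantization parameters are all assumptions made here for concreteness, since the claims do not fix a particular activity formula, neighbor choice, or quantizer design.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis matrix, so that C @ C.T == I."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def transform(block: np.ndarray) -> np.ndarray:
    """2-D DCT of a square residual block (the 'transformed block')."""
    c = dct_matrix(block.shape[0])
    return c @ block @ c.T

def spatial_activity(tblock: np.ndarray) -> float:
    """Assumed activity metric: energy of the AC coefficients."""
    ac = tblock.copy()
    ac[0, 0] = 0.0  # discard the DC term
    return float(np.sum(ac * ac))

def assign_regions(tblocks, threshold: float):
    """Region per block: switch regions whenever the activity of a block
    differs from that of the previous block by more than the threshold."""
    acts = [spatial_activity(t) for t in tblocks]
    regions = [0]
    for i in range(1, len(tblocks)):
        if abs(acts[i] - acts[i - 1]) > threshold:
            regions.append(1 - regions[-1])
        else:
            regions.append(regions[-1])
    return regions

def quantize(tblock: np.ndarray, qp: float) -> np.ndarray:
    """Scalar quantization of transform coefficients (illustrative)."""
    return np.round(tblock / qp)

def inverse_quantize_and_transform(levels: np.ndarray, qp: float) -> np.ndarray:
    """Decode-side step of claims 1 and 6: dequantize with the region's
    QP, then apply the inverse DCT."""
    c = dct_matrix(levels.shape[0])
    return c.T @ (levels * qp) @ c

def decode_blocks(levels_list, regions, qp_first: float, qp_second: float):
    """Inverse quantize each block with its region's QP, then inverse
    transform, as in the two inverse-quantization steps of claim 1."""
    return [inverse_quantize_and_transform(lv, qp_second if r else qp_first)
            for lv, r in zip(levels_list, regions)]
```

With these assumptions, a flat block and a textured block produce very different activity values and so land in different regions, and each region's blocks reconstruct to within its quantization step size.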
CN201980077959.6A 2018-11-27 2019-11-27 Block-based spatial activity metrics for pictures Active CN113170133B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201862771909P 2018-11-27 2018-11-27
US62/771,909 2018-11-27
PCT/US2019/063704 WO2020113073A1 (en) 2018-11-27 2019-11-27 Block-based spatial activity measures for pictures cross-reference to related applications

Publications (2)

Publication Number Publication Date
CN113170133A CN113170133A (en) 2021-07-23
CN113170133B true CN113170133B (en) 2024-06-14

Family

ID=70853102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201980077959.6A Active CN113170133B (en) 2018-11-27 2019-11-27 Block-based spatial activity metrics for pictures

Country Status (10)

Country Link
US (1) US11546597B2 (en)
EP (1) EP3888365A4 (en)
JP (1) JP7253053B2 (en)
KR (1) KR20210093336A (en)
CN (1) CN113170133B (en)
BR (1) BR112021010167A2 (en)
MX (1) MX2021006200A (en)
PH (1) PH12021551222A1 (en)
SG (1) SG11202105604UA (en)
WO (1) WO2020113073A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003056839A1 (en) * 2001-12-31 2003-07-10 Stmicroelectronics Asia Pacific Pte Ltd Video encoding
WO2010141899A2 (en) * 2009-06-05 2010-12-09 Qualcomm Incorporated 4x4 transform for media coding

Family Cites Families (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3258840B2 (en) * 1994-12-27 2002-02-18 シャープ株式会社 Video encoding device and region extraction device
JPH10341436A (en) * 1997-06-06 1998-12-22 Matsushita Electric Ind Co Ltd Highly efficient encoding device
JP3060376B2 (en) * 1997-04-25 2000-07-10 日本ビクター株式会社 Image encoding / decoding device and image decoding device
JP2894335B2 (en) * 1997-10-09 1999-05-24 日本電気株式会社 Image encoding device, method, and recording medium recording program
US20020191695A1 (en) * 2001-06-07 2002-12-19 Irvine Ann Chris Interframe encoding method and apparatus
KR100961760B1 (en) * 2002-08-13 2010-06-07 퀄컴 인코포레이티드 Motion Estimation Method and Apparatus Which Refer to Discret Cosine Transform Coefficients
US8503536B2 (en) * 2006-04-07 2013-08-06 Microsoft Corporation Quantization adjustments for DC shift artifacts
CN101771868B (en) * 2008-12-31 2016-03-02 华为技术有限公司 To quantizing method and the device of image
KR101480412B1 (en) * 2009-01-22 2015-01-09 삼성전자주식회사 Method and apparatus for transforming image, and method and apparatus for inverse-transforming image
EP2373049A1 (en) * 2010-03-31 2011-10-05 British Telecommunications Public Limited Company Video quality measurement
US20110268180A1 (en) * 2010-04-29 2011-11-03 Naveen Srinivasamurthy Method and System for Low Complexity Adaptive Quantization
US8787443B2 (en) * 2010-10-05 2014-07-22 Microsoft Corporation Content adaptive deblocking during video encoding and decoding
US20120218432A1 (en) * 2011-02-28 2012-08-30 Sony Corporation Recursive adaptive intra smoothing for video coding
US9008180B2 (en) * 2011-04-21 2015-04-14 Intellectual Discovery Co., Ltd. Method and apparatus for encoding/decoding images using a prediction method adopting in-loop filtering
EP2705667B1 (en) * 2011-06-30 2016-09-21 Huawei Technologies Co., Ltd. Lossless coding and associated signaling methods for compound video
GB201312382D0 (en) * 2013-07-10 2013-08-21 Microsoft Corp Region-of-interest aware video coding
US9432696B2 (en) * 2014-03-17 2016-08-30 Qualcomm Incorporated Systems and methods for low complexity forward transforms using zeroed-out coefficients
KR102273670B1 (en) * 2014-11-28 2021-07-05 삼성전자주식회사 Data processing system modifying a motion compensation information, and method for decoding video data including the same
JP6459761B2 (en) * 2015-05-01 2019-01-30 富士通株式会社 Moving picture coding apparatus, moving picture coding method, and moving picture coding computer program
CN108293116A (en) * 2015-11-24 2018-07-17 三星电子株式会社 Video encoding/decoding method and equipment and method for video coding and equipment
US10218976B2 (en) * 2016-03-02 2019-02-26 MatrixView, Inc. Quantization matrices for compression of video
US11095877B2 (en) * 2016-11-30 2021-08-17 Microsoft Technology Licensing, Llc Local hash-based motion estimation for screen remoting scenarios
JP6822121B2 (en) * 2016-12-19 2021-01-27 ソニー株式会社 Image processing equipment, image processing methods and programs
CN116886900A (en) * 2017-09-26 2023-10-13 松下电器(美国)知识产权公司 Decoding device, encoding device, decoding method, and encoding method


Also Published As

Publication number Publication date
JP7253053B2 (en) 2023-04-05
PH12021551222A1 (en) 2021-12-06
WO2020113073A1 (en) 2020-06-04
EP3888365A1 (en) 2021-10-06
JP2022508246A (en) 2022-01-19
US11546597B2 (en) 2023-01-03
KR20210093336A (en) 2021-07-27
CN113170133A (en) 2021-07-23
BR112021010167A2 (en) 2021-08-17
EP3888365A4 (en) 2022-05-04
US20210289206A1 (en) 2021-09-16
SG11202105604UA (en) 2021-06-29
MX2021006200A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
JP7134200B2 (en) digital image recompression
CN102611823B (en) Method and equipment capable of selecting compression algorithm based on picture content
CN104754361B (en) Image Coding, coding/decoding method and device
WO2015120818A1 (en) Picture coding and decoding methods and devices
US20220377339A1 (en) Video signal processor for block-based picture processing
JP2022524916A (en) Shape-adaptive discrete cosine transform for geometric division with adaptive number regions
CN113647105A (en) Inter prediction for exponential partitions
US9787985B2 (en) Reduction of spatial predictors in video compression
US20240129473A1 (en) Probability estimation in multi-symbol entropy coding
CN113170133B (en) Block-based spatial activity metrics for pictures
WO2022166370A1 (en) Video encoding and decoding method and apparatus, computer program product, computer-readable storage medium, and electronic device
CN115442617A (en) Video processing method and device based on video coding
US10045022B2 (en) Adaptive content dependent intra prediction mode coding
EP4117289A1 (en) Image processing method and image processing device
CN117356092A (en) System, method and bitstream structure for a hybrid feature video bitstream and decoder
JP2023522845A (en) Video coding method and system using reference region
RU2796934C2 (en) Measurements of spatial actions of images based on blocks
RU2782583C1 (en) Block-based image merging for context segmentation and processing
US11889055B2 (en) Methods and systems for combined lossless and lossy coding
US20240114185A1 (en) Video coding for machines (vcm) encoder and decoder for combined lossless and lossy encoding
US20240242320A1 (en) System and method for analyzing compressed video
WO2022047144A1 (en) Methods and systems for combined lossless and lossy coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant