CN104718752A - Signaling of down-sampling phase information in scalable video coding

Info

Publication number
CN104718752A
Authority
CN
China
Prior art keywords
coefficient
phase
phase shifts
video
layer
Prior art date
Legal status
Granted
Application number
CN201380053388.5A
Other languages
Chinese (zh)
Other versions
CN104718752B (en)
Inventor
陈建乐 (Jianle Chen)
郭立威 (Liwei Guo)
李想 (Xiang Li)
马尔塔·卡切维奇 (Marta Karczewicz)
濮伟 (Wei Pu)
Current Assignee
Qualcomm Inc
Original Assignee
Qualcomm Inc
Priority date
Filing date
Publication date
Application filed by Qualcomm Inc
Publication of CN104718752A
Application granted
Publication of CN104718752B
Status: Active


Classifications

    All classifications fall under H04N 19/00, methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H: Electricity; H04: Electric communication technique; H04N: Pictorial communication, e.g. television):

    • H04N 19/117 Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding: filters, e.g. for pre-processing or post-processing
    • H04N 19/30 Hierarchical techniques, e.g. scalability
    • H04N 19/187 Adaptive coding where the coding unit is a scalable video layer
    • H04N 19/33 Hierarchical techniques: scalability in the spatial domain

Abstract

Methods and systems for video image coding are provided. Sets of filters may be selected and applied to video information based at least in part on phase displacement information between first and second layers of video information. For example, the phase displacement information may correspond to a difference between a position of a pixel in the first layer and a corresponding position of the pixel in the second layer. The selected filter set can be an up-sampling filter or a down-sampling filter. The phase displacement information may be encoded as a syntax element embedded in the video bitstream.

Description

Signaling of down-sampling phase information in scalable video coding
Technical field
The present disclosure relates to video coding, including the encoding and decoding of video content, and in particular to intra and inter prediction.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices may implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards, to transmit, receive and store digital video information more efficiently.
Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. High Efficiency Video Coding (HEVC) represents information using three block concepts: coding units (CUs), prediction units (PUs) and transform units (TUs). Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
Spatial or temporal prediction results in a predictive block for a block to be coded. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.
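By way of illustration only, the following C sketch shows the scan step just described, converting a two-dimensional block of quantized coefficients into a one-dimensional vector; the 4×4 block size and the up-right diagonal traversal are assumptions chosen for brevity, not a scan order mandated by this description.

```c
#include <stdint.h>

#define N 4  /* illustrative block size */

/* Scan a 2D block of quantized transform coefficients into a 1D vector
 * using an up-right diagonal order (one of several possible scan orders). */
void scan_coefficients(const int16_t block[N][N], int16_t out[N * N])
{
    int k = 0;
    for (int d = 0; d <= 2 * (N - 1); d++)   /* one anti-diagonal at a time */
        for (int y = N - 1; y >= 0; y--) {   /* walk each diagonal up-right */
            int x = d - y;
            if (x >= 0 && x < N)
                out[k++] = block[y][x];
        }
}
```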
Summary of the invention
To summarize certain aspects of this disclosure, certain aspects, advantages and novel features are described herein. It is to be understood that not necessarily all such advantages may be achieved in accordance with any particular embodiment disclosed herein. Thus, the features disclosed herein may be embodied or carried out in a manner that achieves or optimizes one advantage or group of advantages as taught herein without necessarily achieving other advantages as may be taught or suggested herein.
According to some embodiments, an apparatus for coding video information includes a processor and a memory. The memory can be configured to store video data, and the video data can include a first layer of video information. The processor can be configured to: determine phase displacement information of a second layer of the video information relative to the first layer; select a set of image filters based at least in part on the phase displacement information; and generate an adjusted version of the first layer using the first layer and the identified set of image filters.
The apparatus for coding video information of the previous paragraph can include one or more of the following features: the first layer can include a base layer; the second layer can include an enhancement layer; the selected set of image filters can include up-sampling image filters; and the processor can be further configured to receive a syntax element extracted from an encoded video bitstream that signals the phase displacement information. Alternatively, the first layer can include an enhancement layer; the second layer can include a base layer; the selected set of image filters can include down-sampling image filters; and the processor can be further configured to generate a syntax element for an encoded video bitstream to signal the phase displacement information.
The phase displacement information can include a difference between a position of a pixel in the first layer and a corresponding position of the pixel in the second layer. The phase displacement information can include a binary value indicating one of a zero-phase relationship or a symmetric-phase relationship between the first layer and the second layer. The phase displacement information can include a first syntax element indicating horizontal phase displacement information and a second syntax element indicating vertical phase displacement information. Further, at least one of the first syntax element and the second syntax element can include a non-binary value. The processor can be further configured to: select a default set of image filters if the phase displacement information is not signaled in the bitstream; and select a set of image filters based on the phase displacement information if the phase displacement information is signaled in the bitstream. The default set of image filters can be based at least in part on a symmetric-phase relationship between the first layer and the second layer. Alternatively, the default set of image filters can be based at least in part on a zero-phase relationship between the first layer and the second layer. The phase displacement information can include alignment information. For example, the alignment information can be modeled as a function mapping x and y pixel coordinates to a phase offset. The phase displacement information can include a video parameter set (VPS) syntax element indicating chroma phase displacement information. The selected set of image filters can include coefficients signaled as part of the bitstream. The apparatus can further include at least one of: a desktop computer, a notebook computer, a tablet computer, a set-top box, a telephone handset, a television, a camera, a display device, a digital media player, a video game console, and a video streaming device comprising the memory and the processor.
The selected set of image filters can include the following 8-tap coefficients, one set per 1/16-sample phase shift:
phase 0: {0, 0, 0, 64, 0, 0, 0, 0}; phase 1: {0, 1, -3, 63, 4, -2, 1, 0}; phase 2: {0, 2, -6, 61, 9, -3, 1, 0}; phase 3: {-1, 3, -8, 60, 13, -4, 1, 0}; phase 4: {-1, 4, -10, 58, 17, -5, 1, 0}; phase 5: {-1, 4, -11, 53, 25, -8, 3, -1}; phase 6: {-1, 4, -11, 50, 29, -9, 3, -1}; phase 7: {-1, 4, -11, 45, 34, -10, 4, -1}; phase 8: {-1, 4, -11, 40, 40, -11, 4, -1}; phase 9: {-1, 4, -10, 34, 45, -11, 4, -1}; phase 10: {-1, 3, -9, 29, 50, -11, 4, -1}; phase 11: {-1, 3, -8, 25, 53, -11, 4, -1}; phase 12: {0, 1, -5, 17, 58, -10, 4, -1}; phase 13: {0, 1, -4, 13, 60, -8, 3, -1}; phase 14: {0, 1, -3, 8, 62, -6, 2, 0}; phase 15: {0, 1, -2, 4, 63, -3, 1, 0}.
Alternatively, the selected set of image filters can include the following 4-tap coefficients, one set per 1/16-sample phase shift:
phase 0: {0, 64, 0, 0}; phase 1: {-2, 62, 4, 0}; phase 2: {-2, 58, 10, -2}; phase 3: {-4, 56, 14, -2}; phase 4: {-4, 54, 16, -2}; phase 5: {-6, 52, 20, -2}; phase 6: {-6, 48, 26, -4}; phase 7: {-4, 42, 30, -4}; phase 8: {-4, 36, 36, -4}; phase 9: {-4, 30, 42, -4}; phase 10: {-4, 26, 48, -6}; phase 11: {-2, 20, 52, -6}; phase 12: {-2, 16, 54, -4}; phase 13: {-2, 14, 56, -4}; phase 14: {-2, 10, 58, -2}; phase 15: {0, 4, 62, -2}.
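The following C sketch illustrates how a phase-indexed table of the 8-tap coefficients listed above could be applied to produce one filtered sample; the function shape, tap alignment, rounding and clipping are illustrative assumptions rather than part of the claimed apparatus. The 4-tap coefficient set would be applied analogously with four taps per phase.

```c
#include <stdint.h>

#define NUM_PHASES 16
#define NUM_TAPS    8

/* The 8-tap coefficients listed above, one row per 1/16-sample phase
 * shift; each row sums to 64, so a 6-bit shift normalizes the result. */
static const int8_t kFilters[NUM_PHASES][NUM_TAPS] = {
    { 0, 0,   0, 64,  0,   0, 0,  0},  /* phase 0  */
    { 0, 1,  -3, 63,  4,  -2, 1,  0},  /* phase 1  */
    { 0, 2,  -6, 61,  9,  -3, 1,  0},  /* phase 2  */
    {-1, 3,  -8, 60, 13,  -4, 1,  0},  /* phase 3  */
    {-1, 4, -10, 58, 17,  -5, 1,  0},  /* phase 4  */
    {-1, 4, -11, 53, 25,  -8, 3, -1},  /* phase 5  */
    {-1, 4, -11, 50, 29,  -9, 3, -1},  /* phase 6  */
    {-1, 4, -11, 45, 34, -10, 4, -1},  /* phase 7  */
    {-1, 4, -11, 40, 40, -11, 4, -1},  /* phase 8  */
    {-1, 4, -10, 34, 45, -11, 4, -1},  /* phase 9  */
    {-1, 3,  -9, 29, 50, -11, 4, -1},  /* phase 10 */
    {-1, 3,  -8, 25, 53, -11, 4, -1},  /* phase 11 */
    { 0, 1,  -5, 17, 58, -10, 4, -1},  /* phase 12 */
    { 0, 1,  -4, 13, 60,  -8, 3, -1},  /* phase 13 */
    { 0, 1,  -3,  8, 62,  -6, 2,  0},  /* phase 14 */
    { 0, 1,  -2,  4, 63,  -3, 1,  0},  /* phase 15 */
};

/* Filter one output sample at the given phase. `src` points at the sample
 * aligned with tap 3; alignment, rounding and clipping are assumptions. */
static uint8_t filter_sample(const uint8_t *src, int phase)
{
    int32_t acc = 0;
    for (int t = 0; t < NUM_TAPS; t++)
        acc += kFilters[phase][t] * src[t - 3];
    acc = (acc + 32) >> 6;  /* round and divide by 64 */
    return (uint8_t)(acc < 0 ? 0 : (acc > 255 ? 255 : acc));
}
```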
According to some embodiments, a method of decoding video information can include: obtaining a base layer of video information; receiving a syntax element extracted from an encoded video bitstream, the syntax element comprising phase displacement information of the base layer relative to an enhancement layer of the video information; selecting a set of image filters based at least in part on the phase displacement information; and generating an up-sampled version of the enhancement layer using the base layer and the identified set of image filters.
According to some embodiments, a method of encoding video information can include: obtaining an enhancement layer of video information; selecting a set of down-sampling image filters; generating a base layer using the enhancement layer and the selected set of image filters; and generating a syntax element comprising phase displacement information of the base layer relative to the enhancement layer.
According to some embodiments, an apparatus for coding a video bitstream can include: means for obtaining an enhancement layer of video information; means for generating a syntax element comprising phase displacement information of a base layer of the video information relative to the enhancement layer; means for selecting a set of image filters based at least in part on the phase displacement information; means for generating a down-sampled version of the enhancement layer using the enhancement layer and the identified set of image filters; and means for storing the down-sampled version of the enhancement layer.
According to some embodiments, a non-transitory computer-readable medium can have instructions stored thereon that, when executed by a processor, cause the processor to: obtain a base layer of video information; receive a syntax element extracted from an encoded video bitstream, the syntax element comprising phase displacement information of the base layer relative to an enhancement layer of the video information; select a set of image filters based at least in part on the phase displacement information; and generate an up-sampled version of the enhancement layer using the base layer and the identified set of image filters.
Brief description of the drawings
Fig. 1 is a block diagram illustrating an example video coding system that may utilize the techniques of this disclosure.
Fig. 2 is a block diagram illustrating an example video encoder that may be configured to implement the techniques of this disclosure.
Fig. 3 is a block diagram illustrating an example video decoder that may be configured to implement the techniques of this disclosure.
Fig. 4 is a graph illustrating scalabilities in three different dimensions.
Fig. 5 is a schematic diagram illustrating an example structure of an SVC bitstream.
Fig. 6A is a schematic diagram illustrating an example of SVC access units in a bitstream.
Fig. 6B is a schematic diagram illustrating an example of intra-BL mode prediction.
Fig. 6C illustrates relative luma sampling grids of an original video and a 2x down-sampled video.
Fig. 6D illustrates relative luma sampling grids of an original video and a 1.5x down-sampled video.
Fig. 7 is a flowchart illustrating an embodiment of a process 700 for coding video information.
Fig. 8A is a schematic diagram illustrating an example of misaligned pixel information.
Fig. 8B is another schematic diagram illustrating an example of misaligned pixel information.
Fig. 9 is a schematic diagram illustrating an example of chroma sample positions.
Detailed description
Scalable video coding (SVC) refers to video coding in which a base layer (sometimes referred to as a reference layer) and one or more scalable enhancement layers are used. For SVC, the base layer can carry video data with a base level of quality. The one or more enhancement layers can carry additional video data to support higher spatial, temporal and/or signal-to-noise ratio (SNR) levels. Enhancement layers may be defined relative to a previously encoded layer.
The base layer and the enhancement layers can have different resolutions. For example, up-sampling filtering (sometimes referred to as resampling filtering) can be applied to the base layer in order to match the spatial aspect ratio of an enhancement layer. This process may be called spatial scalability. A set of up-sampling filters can be applied to the base layer, and a filter can be selected from the set based on a phase (sometimes referred to as a fractional pixel shift). The phase may be calculated based on the spatial aspect ratio between the base-layer and enhancement-layer pictures.
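As one concrete illustration of such a phase calculation, the fixed-point C sketch below maps an enhancement-layer sample position to a base-layer reference sample plus a 1/16-sample phase; the 16-phase granularity and the rounding constants are assumptions patterned on common scalable codec practice, not a normative formula.

```c
#include <stdint.h>

/* Map an enhancement-layer luma column x_el to a base-layer reference
 * column plus a 1/16-sample phase, using 16.16 fixed point. */
typedef struct { int ref_col; int phase; } RefPos;

static RefPos map_column(int x_el, int el_width, int bl_width)
{
    int64_t scale = (((int64_t)bl_width << 16) + (el_width >> 1)) / el_width;
    int64_t pos16 = ((int64_t)x_el * scale + (1 << 11)) >> 12;  /* 1/16 units */
    RefPos r = { (int)(pos16 >> 4), (int)(pos16 & 15) };
    return r;
}
```

For a 2x ratio, for example, this maps enhancement-layer column 1 to base-layer column 0 with phase 8, i.e., a half-sample offset.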
In some systems, a single fixed set of up-sampling filters is applied to the base layer to generate scaled content for inter-layer prediction. A fixed up-sampling may not be efficient for all inter-layer prediction types. For example, in some systems the phase and reference pixels used in the up-sampling filters are determined solely by the spatial scaling ratio, which assumes that the base layer generated in the down-sampling process always has the same phase. Such systems unfortunately suffer from a lack of flexibility when up-sampling the base layer. In addition, in some systems the phase of the down-sampling filters (e.g., the down-sampling positions) is not signaled in the bitstream. In such systems, it is assumed that the down-sampling is performed with the correct phase (e.g., a phase matching the up-sampling phase). If there is a phase mismatch between up-sampling and down-sampling, a coding efficiency loss of 20% or more may occur.
In some embodiments of the present disclosure, the techniques described herein add flexibility and performance to the processes of up-sampling and down-sampling video data. Up-sampling and down-sampling can advantageously be performed in an adaptive manner, for example by controlling or changing the filters used to code the video data based at least in part on phase displacement information of a second layer of video information (e.g., a down-sampled base layer) relative to a first layer (e.g., an enhancement layer). The phase displacement information can be embedded in the video bitstream as a syntax element. Accordingly, the embodiments described herein can efficiently convey the phase information of the down-sampling filters, thereby eliminating any coding loss that could occur when a down-sampling filter with an incorrect phase is selected.
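A hypothetical decoder-side sketch of this behavior appears below; the flag and field names are invented for illustration and are not actual syntax element names from this disclosure or any standard.

```c
#include <stdbool.h>

/* Hypothetical container for signaled phase displacement information. */
typedef struct {
    bool phase_info_present_flag;  /* false: fall back to a default filter set */
    int  horiz_phase_offset;       /* horizontal displacement, 1/16-sample units */
    int  vert_phase_offset;        /* vertical displacement, 1/16-sample units */
} PhaseInfo;

/* Sketch of the selection rule described above: use the derived phase
 * as-is (a default, e.g. zero-phase, filter set) when nothing is signaled,
 * otherwise adjust by the signaled offset. */
static int select_phase(const PhaseInfo *pi, int derived_phase, bool horizontal)
{
    if (!pi->phase_info_present_flag)
        return derived_phase & 15;             /* default filter set */
    int offset = horizontal ? pi->horiz_phase_offset : pi->vert_phase_offset;
    return (derived_phase + offset) & 15;      /* phase-adjusted filter set */
}
```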
Certain embodiments described herein relate to inter-layer prediction for scalable video coding, such as scalable extensions of HEVC (High Efficiency Video Coding), in the context of advanced video codecs. More particularly, this disclosure relates to systems and methods for improved performance of inter-layer prediction in the scalable video coding (SVC) extension of HEVC. In the description below, H.264/AVC techniques related to certain embodiments are described; the HEVC standard and related techniques are also discussed. Although certain embodiments are described herein in the context of the HEVC and/or H.264 standards, one skilled in the art will appreciate that the systems and methods disclosed herein are applicable to any suitable video coding standard. For example, the embodiments disclosed herein may be applicable to one or more of the following standards: ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual and ITU-T H.264 (also known as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions.
An SVC extension can include multiple layers of video information. For example, the bottom layer can serve as a base layer (BL), and the top layer can serve as an enhanced layer (EL). The term "enhanced layer" is used herein in accordance with its broad and ordinary meaning, and may be used interchangeably with the term "enhancement layer". Layers in the middle can serve as ELs or reference layers (RLs), or both. For example, a layer in the middle can be an EL for the layers below it (e.g., the base layer or any intervening enhancement layers), while at the same time serving as an RL for one or more enhancement layers above it.
For purposes of illustration only, certain embodiments disclosed herein are described with examples including only two layers (e.g., a lower-level layer such as a base layer and a higher-level layer such as an enhanced layer). It should be understood that such examples are applicable to configurations including multiple base layers and/or enhancement layers. In addition, for ease of explanation, the following disclosure includes the terms "frames" or "blocks" with reference to certain embodiments. However, these terms are not meant to be limiting. For example, the techniques described below can be used with any suitable video units, such as blocks (e.g., CUs, PUs, TUs, macroblocks, etc.), slices, frames, and the like.
In many respects, HEVC generally follows the framework of previous video coding standards. The unit of prediction in HEVC is different from the prediction units (e.g., macroblocks) of certain previous video coding standards. In fact, the concept of a macroblock, as understood in certain previous video coding standards, does not exist in HEVC. The macroblock is replaced by a hierarchical structure based on a quadtree scheme, which can provide high flexibility, among other possible benefits. For example, within the HEVC scheme, three types of blocks are defined: coding units (CUs), prediction units (PUs) and transform units (TUs). A CU can refer to the basic unit of region splitting. A CU can be considered analogous to the concept of a macroblock, but it does not restrict the maximum size, and it can allow recursive splitting into four equally sized CUs to improve content adaptivity. A PU can be considered the basic unit of inter/intra prediction, and it can contain multiple arbitrarily shaped partitions in a single PU to effectively code irregular image patterns. A TU can be considered the basic unit of transform. It can be defined independently of the PU; however, its size may be limited to the CU to which the TU belongs. This separation of the block structure into three different concepts can allow each to be optimized according to its role, which can result in improved coding efficiency.
Fig. 1 is a block diagram illustrating an example video coding system 10 that may utilize the techniques of this disclosure. As used and described herein, the term "video coder" refers generically to both video encoders and video decoders. In this disclosure, the terms "video coding" or "coding" may refer generically to video encoding and video decoding.
As shown in Fig. 1, video coding system 10 includes a source device 12 and a destination device 14. Source device 12 generates encoded video data. Accordingly, source device 12 may be referred to as a video encoding device. Destination device 14 can decode the encoded video data generated by source device 12. Accordingly, destination device 14 may be referred to as a video decoding device. Source device 12 and destination device 14 are examples of video coding devices.
Source device 12 and destination device 14 can comprise a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-car computers, or the like. In some examples, source device 12 and destination device 14 may be equipped for wireless communication.
Destination device 14 can receive encoded video data from source device 12 via a channel 16. Channel 16 can comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, channel 16 can comprise a communication medium that enables source device 12 to transmit encoded video data directly to destination device 14 in real time. In this example, source device 12 can modulate the encoded video data according to a communication standard, such as a wireless communication protocol, and can transmit the modulated video data to destination device 14. The communication medium can comprise a wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium can form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. The communication medium can include routers, switches, base stations, or other equipment that facilitates communication from source device 12 to destination device 14.
In another example, channel 16 can correspond to a storage medium that stores the encoded video data generated by source device 12. In this example, destination device 14 can access the storage medium via disk access or card access. The storage medium can include a variety of locally accessed data storage media, such as Blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data. In a further example, channel 16 can include a file server or another intermediate storage device that stores the encoded video generated by source device 12. In this example, destination device 14 can access encoded video data stored at the file server or other intermediate storage device via streaming or download. The file server can be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), file transfer protocol (FTP) servers, network attached storage (NAS) devices, and local disk drives. Destination device 14 can access the encoded video data through a standard data connection, including an Internet connection. Example types of data connections include wireless channels (e.g., Wi-Fi connections), wired connections (e.g., DSL, cable modem, etc.), or combinations of both that are suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from the file server can be a streaming transmission, a download transmission, or a combination of both.
The techniques of this disclosure are not limited to wireless applications or settings. The techniques can be applied to video coding in support of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the Internet), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, video coding system 10 can be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of Fig. 1, source device 12 includes a video source 18, a video encoder 20 and an output interface 22. In some cases, output interface 22 can include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 can include a source such as a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video feed interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources.
Video encoder 20 can encode the captured, pre-captured, or computer-generated video data. The encoded video data can be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data can also be stored onto a storage medium or a file server for later access by destination device 14 for decoding and/or playback.
In the example of Fig. 1, destination device 14 includes an input interface 28, a video decoder 30 and a display device 32. In some cases, input interface 28 can include a receiver and/or a modem. Input interface 28 of destination device 14 receives encoded video data over channel 16. The encoded video data can include a variety of syntax elements generated by video encoder 20 that represent the video data. Such syntax elements can be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored on a file server.
Display device 32 can be integrated with, or external to, destination device 14. In some examples, destination device 14 can include an integrated display device and can also be configured to interface with an external display device. In other examples, destination device 14 can be a display device. In general, display device 32 displays the decoded video data to a user. Display device 32 can comprise any of a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
Video encoder 20 and video decoder 30 can operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and can conform to a HEVC Test Model (HM).
Alternatively, video encoder 20 and video decoder 30 can operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard or technique. Other examples of video compression standards and techniques include MPEG-2, ITU-T H.263 and proprietary or open source compression formats such as VP8 and related formats.
Although not shown in the example of Fig. 1, video encoder 20 and video decoder 30 can each be integrated with an audio encoder and decoder, and can include appropriate multiplexer-demultiplexer units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. In some examples, if applicable, the multiplexer-demultiplexer units can conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).
Again, Fig. 1 is merely an example, and the techniques of this disclosure can apply to video coding settings (e.g., video encoding or video decoding) that do not necessarily include any data communication between the encoding and decoding devices. In other examples, data can be retrieved from a local memory, streamed over a network, or the like. An encoding device can encode and store data to memory, and/or a decoding device can retrieve and decode data from memory. In many examples, the encoding and decoding is performed by devices that do not communicate with one another, but simply encode data to memory and/or retrieve and decode data from memory.
Video encoder 20 and video decoder 30 can each be implemented as any of a variety of suitable circuitry, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented partially in software, a device can store instructions for the software in a suitable non-transitory computer-readable storage medium and can execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 can be included in one or more encoders or decoders, either of which can be integrated as part of a combined encoder/decoder (CODEC) in a respective device.
As briefly mentioned above, video encoder 20 encodes video data. The video data can comprise one or more pictures. Each of the pictures is a still image forming part of a video. In some instances, a picture may be referred to as a video "frame". When video encoder 20 encodes the video data, video encoder 20 can generate a bitstream. The bitstream can include a sequence of bits that form a coded representation of the video data. The bitstream can include coded pictures and associated data. A coded picture is a coded representation of a picture.
To generate the bitstream, video encoder 20 can perform encoding operations on each picture in the video data. When video encoder 20 performs encoding operations on the pictures, video encoder 20 can generate a series of coded pictures and associated data. The associated data can include sequence parameter sets, picture parameter sets, adaptation parameter sets, and other syntax structures. A sequence parameter set (SPS) can contain parameters applicable to zero or more sequences of pictures. A picture parameter set (PPS) can contain parameters applicable to zero or more pictures. An adaptation parameter set (APS) can contain parameters applicable to zero or more pictures. Parameters in an APS can be parameters that are more likely to change than parameters in a PPS.
To generate a coded picture, video encoder 20 can partition a picture into equally sized video blocks. A video block can be a two-dimensional array of samples. Each of the video blocks is associated with a treeblock. In some instances, a treeblock may be referred to as a largest coding unit (LCU). The treeblocks of HEVC can be broadly analogous to the macroblocks of previous standards, such as H.264/AVC. However, a treeblock is not necessarily limited to a particular size, and can include one or more coding units (CUs). Video encoder 20 can use quadtree partitioning to partition the video blocks of treeblocks into video blocks associated with CUs (hence the name "treeblocks").
In some examples, video encoder 20 can partition a picture into a plurality of slices. Each of the slices can include an integer number of CUs. In some instances, a slice comprises an integer number of treeblocks. In other instances, a boundary of a slice can be within a treeblock.
As part of performing an encoding operation on a picture, video encoder 20 can perform encoding operations on each slice of the picture. When video encoder 20 performs an encoding operation on a slice, video encoder 20 can generate encoded data associated with the slice. The encoded data associated with the slice may be referred to as a "coded slice".
To generate a coded slice, video encoder 20 can perform encoding operations on each treeblock in the slice. When video encoder 20 performs an encoding operation on a treeblock, video encoder 20 can generate a coded treeblock. The coded treeblock can comprise data representing an encoded version of the treeblock.
When video encoder 20 generates a coded slice, video encoder 20 can perform encoding operations on (e.g., encode) the treeblocks in the slice according to a raster scan order. In other words, video encoder 20 can encode the treeblocks of the slice in an order that proceeds from left to right across a topmost row of treeblocks in the slice, then from left to right across a next lower row of treeblocks, and so on, until video encoder 20 has encoded each of the treeblocks in the slice.
As a result of encoding the treeblocks according to the raster scan order, the treeblocks above and to the left of a given treeblock may have been encoded, but the treeblocks below and to the right of the given treeblock have not yet been encoded. Consequently, when encoding the given treeblock, video encoder 20 may be able to access information generated by encoding treeblocks above and to the left of the given treeblock. However, when encoding the given treeblock, video encoder 20 may be unable to access information generated by encoding treeblocks below and to the right of the given treeblock.
To generate a coded treeblock, video encoder 20 can recursively perform quadtree partitioning on the video block of the treeblock to divide the video block into progressively smaller video blocks. Each of the smaller video blocks can be associated with a different CU. For example, video encoder 20 can partition the video block of a treeblock into four equally sized sub-blocks, partition one or more of the sub-blocks into four equally sized sub-sub-blocks, and so on. A partitioned CU can be a CU whose video block is partitioned into video blocks associated with other CUs. A non-partitioned CU can be a CU whose video block is not partitioned into video blocks associated with other CUs.
One or more syntax elements in the bitstream can indicate a maximum number of times video encoder 20 can partition the video block of a treeblock. A video block of a CU can be square in shape. The size of the video block of a CU (e.g., the size of the CU) can range from 8×8 pixels up to the size of the video block of a treeblock (e.g., the size of the treeblock), with a maximum of 64×64 pixels or greater.
Video encoder 20 can perform encoding operations on (e.g., encode) each CU of a treeblock according to a z-scan order. In other words, video encoder 20 can encode a top-left CU, a top-right CU, a bottom-left CU, and then a bottom-right CU, in that order. When video encoder 20 performs an encoding operation on a partitioned CU, video encoder 20 can encode the CUs associated with sub-blocks of the video block of the partitioned CU according to the z-scan order. In other words, video encoder 20 can encode a CU associated with a top-left sub-block, a CU associated with a top-right sub-block, a CU associated with a bottom-left sub-block, and then a CU associated with a bottom-right sub-block, in that order.
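A minimal recursive C sketch of this z-scan traversal of the CU quadtree follows; `should_split` stands in for the encoder's split decision and `encode_cu` for coding one non-partitioned CU, and both are hypothetical placeholders.

```c
extern int  should_split(int x, int y, int size);  /* placeholder */
extern void encode_cu(int x, int y, int size);     /* placeholder */

/* Recursive z-order traversal: top-left, top-right, bottom-left,
 * bottom-right, applied at every level of the quadtree. */
void encode_quadtree(int x, int y, int size, int min_cu_size)
{
    if (size > min_cu_size && should_split(x, y, size)) {
        int h = size / 2;
        encode_quadtree(x,     y,     h, min_cu_size);  /* top-left     */
        encode_quadtree(x + h, y,     h, min_cu_size);  /* top-right    */
        encode_quadtree(x,     y + h, h, min_cu_size);  /* bottom-left  */
        encode_quadtree(x + h, y + h, h, min_cu_size);  /* bottom-right */
    } else {
        encode_cu(x, y, size);  /* leaf: a non-partitioned CU */
    }
}
```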
As a result of encoding the CUs of a treeblock according to the z-scan order, the CUs above, above-left, above-right, left, and below-left of a given CU may have been encoded. The CUs below and to the right of the given CU have not yet been encoded. Consequently, when encoding the given CU, video encoder 20 may be able to access information generated by encoding some CUs that neighbor the given CU. However, when encoding the given CU, video encoder 20 may be unable to access information generated by encoding other CUs that neighbor the given CU.
When video encoder 20 encodes a non-partitioned CU, video encoder 20 can generate one or more prediction units (PUs) for the CU. Each of the PUs of the CU can be associated with a different video block within the video block of the CU. Video encoder 20 can generate a predicted video block for each PU of the CU. The predicted video block of a PU can be a block of samples. Video encoder 20 can use intra prediction or inter prediction to generate the predicted video block for a PU.
When video encoder 20 uses intra prediction to generate the predicted video block of a PU, video encoder 20 can generate the predicted video block of the PU based on decoded samples of the picture associated with the PU. If video encoder 20 uses intra prediction to generate the predicted video blocks of the PUs of a CU, the CU is an intra-predicted CU. When video encoder 20 uses inter prediction to generate the predicted video block of a PU, video encoder 20 can generate the predicted video block of the PU based on decoded samples of one or more pictures other than the picture associated with the PU. If video encoder 20 uses inter prediction to generate the predicted video blocks of the PUs of a CU, the CU is an inter-predicted CU.
Furthermore, when video encoder 20 uses inter prediction to generate a predicted video block for a PU, video encoder 20 can generate motion information for the PU. The motion information for a PU can indicate one or more reference blocks of the PU. Each reference block of the PU can be a video block within a reference picture. The reference picture can be a picture other than the picture associated with the PU. In some instances, a reference block of a PU may also be referred to as the "reference sample" of the PU. Video encoder 20 can generate the predicted video block for the PU based on the reference blocks of the PU.
After video encoder 20 generates predicted video blocks for one or more PUs of a CU, video encoder 20 can generate residual data for the CU based on the predicted video blocks for the PUs of the CU. The residual data for the CU can indicate differences between samples in the predicted video blocks for the PUs of the CU and samples in the original video block of the CU.
Furthermore, as part of performing an encoding operation on a non-partitioned CU, video encoder 20 can perform recursive quadtree partitioning on the residual data of the CU to partition the residual data of the CU into one or more blocks of residual data (e.g., residual video blocks) associated with transform units (TUs) of the CU. Each TU of a CU can be associated with a different residual video block.
Video encoder 20 can apply one or more transforms to the residual video blocks associated with the TUs to generate transform coefficient blocks (e.g., blocks of transform coefficients) associated with the TUs. Conceptually, a transform coefficient block can be a two-dimensional (2D) matrix of transform coefficients.
After generating a transform coefficient block, video encoder 20 can perform a quantization process on the transform coefficient block. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the transform coefficients, providing further compression. The quantization process can reduce the bit depth associated with some or all of the transform coefficients. For example, an n-bit transform coefficient can be rounded down to an m-bit transform coefficient during quantization, where n is greater than m.
Video encoder 20 can associate each CU with a quantization parameter (QP) value. The QP value associated with a CU can determine how video encoder 20 quantizes the transform coefficient blocks associated with the CU. Video encoder 20 can adjust the degree of quantization applied to the transform coefficient blocks associated with a CU by adjusting the QP value associated with the CU.
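As a rough illustration of QP-controlled quantization, the C sketch below follows the common convention that the quantizer step size roughly doubles for every increase of 6 in QP; the base step values and the rounding are illustrative, not a standard's normative quantization tables.

```c
#include <stdint.h>

/* Uniform quantization sketch: step size roughly doubles every 6 QP,
 * a convention shared by H.264/HEVC-style codecs. */
static int16_t quantize_coeff(int32_t coeff, int qp)
{
    static const int32_t step_x64[6] = {40, 45, 51, 57, 64, 72};
    int64_t step = (int64_t)step_x64[qp % 6] << (qp / 6);  /* step * 64 */
    int32_t sign = coeff < 0 ? -1 : 1;
    int64_t level = (64 * (int64_t)(sign * coeff)) / step; /* |coeff| / step */
    return (int16_t)(sign * level);
}
```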
After video encoder 20 quantizes a transform coefficient block, video encoder 20 can generate sets of syntax elements that represent the transform coefficients in the quantized transform coefficient block. Video encoder 20 can apply entropy encoding operations, such as context adaptive binary arithmetic coding (CABAC) operations, to some of these syntax elements. Other entropy coding techniques, such as content adaptive variable length coding (CAVLC), probability interval partitioning entropy (PIPE) coding, or other binary arithmetic coding, could also be used.
The bitstream generated by video encoder 20 can include a series of network abstraction layer (NAL) units. Each of the NAL units can be a syntax structure containing an indication of the type of data in the NAL unit and bytes containing the data. For example, a NAL unit can contain data representing a sequence parameter set, a picture parameter set, a coded slice, supplemental enhancement information (SEI), an access unit delimiter, filler data, or another type of data. The data in a NAL unit can include various syntax structures.
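For illustration, a NAL unit can be modeled as a type indication plus payload bytes, as in the C sketch below; the enumerator values are placeholders rather than the normative NAL unit type codes of any standard.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative NAL unit types (placeholder values). */
typedef enum {
    NAL_SPS,          /* sequence parameter set */
    NAL_PPS,          /* picture parameter set  */
    NAL_CODED_SLICE,  /* coded slice data       */
    NAL_SEI,          /* supplemental enhancement information */
    NAL_AUD,          /* access unit delimiter  */
    NAL_FILLER        /* filler data            */
} NalUnitType;

typedef struct {
    NalUnitType    type;     /* indication of the type of data */
    const uint8_t *payload;  /* bytes containing the data      */
    size_t         size;
} NalUnit;
```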
Video decoder 30 can receive the bitstream generated by video encoder 20. The bitstream can include a coded representation of the video data encoded by video encoder 20. When video decoder 30 receives the bitstream, video decoder 30 can perform a parsing operation on the bitstream. When video decoder 30 performs the parsing operation, video decoder 30 can extract syntax elements from the bitstream. Video decoder 30 can reconstruct the pictures of the video data based on the syntax elements extracted from the bitstream. The process of reconstructing the video data based on the syntax elements can be generally reciprocal to the process performed by video encoder 20 to generate the syntax elements.
After video decoder 30 extracts the syntax elements associated with a CU, video decoder 30 can generate predicted video blocks for the PUs of the CU based on the syntax elements. In addition, video decoder 30 can inverse quantize the transform coefficient blocks associated with the TUs of the CU. Video decoder 30 can perform inverse transforms on the transform coefficient blocks to reconstruct the residual video blocks associated with the TUs of the CU. After generating the predicted video blocks and reconstructing the residual video blocks, video decoder 30 can reconstruct the video block of the CU based on the predicted video blocks and the residual video blocks. In this way, video decoder 30 can reconstruct the video blocks of CUs based on the syntax elements in the bitstream.
According to embodiments of this disclosure, video encoder 20 can include an up-sampling module 130 that can be configured to code (e.g., encode) video data defined in a scalable video coding scheme with at least one base layer and at least one enhancement layer. Up-sampling module 130 can up-sample at least some video data as part of the encoding process, where the up-sampling is performed in an adaptive manner, for example by using a set of image filters selected based at least in part on phase displacement information of a second layer of video information relative to a first layer, for example as described below with respect to Figs. 4 to 7.
Fig. 2 is a block diagram illustrating an example video encoder 20 that can be configured to implement the techniques of this disclosure. Fig. 2 is provided for purposes of explanation and should not be considered limiting of the techniques as broadly exemplified and described in this disclosure. For purposes of explanation, this disclosure describes video encoder 20 in the context of HEVC coding. However, the techniques of this disclosure can be applicable to other coding standards or methods.
In the example of Fig. 2, video encoder 20 includes a plurality of functional components. The functional components of video encoder 20 include a prediction module 100, a residual generation module 102, a transform module 104, a quantization module 106, an inverse quantization module 108, an inverse transform module 110, a reconstruction module 112, a filter module 113, a decoded picture buffer 114, and an entropy encoding module 116. Prediction module 100 includes an inter prediction module 121, a motion estimation module 122, a motion compensation module 124, and an intra prediction module 126. In other examples, video encoder 20 can include more, fewer, or different functional components. Furthermore, motion estimation module 122 and motion compensation module 124 can be highly integrated, but are represented separately in the example of Fig. 2 for purposes of explanation.
Video encoder 20 can receive video data. Video encoder 20 can receive the video data from various sources. For example, video encoder 20 can receive the video data from video source 18 (Fig. 1) or another source. The video data can represent a series of pictures. To encode the video data, video encoder 20 can perform an encoding operation on each of the pictures. As part of performing the encoding operation on a picture, video encoder 20 can perform encoding operations on each slice of the picture. As part of performing an encoding operation on a slice, video encoder 20 can perform encoding operations on the treeblocks in the slice.
As part of performing an encoding operation on a treeblock, prediction module 100 can perform quadtree partitioning on the video block of the treeblock to divide the video block into progressively smaller video blocks. Each of the smaller video blocks can be associated with a different CU. For example, prediction module 100 can partition the video block of a treeblock into four equally sized sub-blocks, partition one or more of the sub-blocks into four equally sized sub-sub-blocks, and so on.
The sizes of the video blocks associated with CUs can range from 8×8 samples up to the size of the treeblock, with a maximum of 64×64 samples or greater. In this disclosure, "N×N" and "N by N" are used interchangeably to refer to the sample dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 samples or 16 by 16 samples. In general, a 16×16 video block has 16 samples in a vertical direction (y = 16) and 16 samples in a horizontal direction (x = 16). Likewise, an N×N block generally has N samples in a vertical direction and N samples in a horizontal direction, where N represents a nonnegative integer value.
Furthermore, as part of performing an encoding operation on a treeblock, prediction module 100 can generate a hierarchical quadtree data structure for the treeblock. For example, a treeblock can correspond to a root node of the quadtree data structure. If prediction module 100 partitions the video block of the treeblock into four sub-blocks, the root node has four child nodes in the quadtree data structure. Each of the child nodes corresponds to a CU associated with one of the sub-blocks. If prediction module 100 partitions one of the sub-blocks into four sub-sub-blocks, the node corresponding to the CU associated with that sub-block can have four child nodes, each of which corresponds to a CU associated with one of the sub-sub-blocks.
Each node of the quadtree data structure can contain syntax data (e.g., syntax elements) for the corresponding treeblock or CU. For example, a node in the quadtree can include a split flag that indicates whether the video block of the CU corresponding to the node is partitioned (e.g., split) into four sub-blocks. Syntax elements for a CU can be defined recursively, and can depend on whether the video block of the CU is split into sub-blocks. A CU whose video block is not partitioned can correspond to a leaf node in the quadtree data structure. A coded treeblock can include data based on the quadtree data structure for the corresponding treeblock.
Video encoder 20 can perform encoding operations on each non-partitioned CU of a treeblock. When video encoder 20 performs an encoding operation on a non-partitioned CU, video encoder 20 generates data representing an encoded representation of the non-partitioned CU.
As part of performing an encoding operation on a CU, prediction module 100 can partition the video block of the CU among one or more PUs of the CU. Video encoder 20 and video decoder 30 can support various PU sizes. Assuming that the size of a particular CU is 2N×2N, video encoder 20 and video decoder 30 can support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N or similar for inter prediction. Video encoder 20 and video decoder 30 can also support asymmetric partitioning for PU sizes of 2N×nU, 2N×nD, nL×2N and nR×2N. In some examples, prediction module 100 can perform geometric partitioning to partition the video block of a CU among the PUs of the CU along a boundary that does not meet the sides of the video block of the CU at right angles.
Inter prediction module 121 can perform inter prediction on each PU of a CU. Inter prediction can provide temporal compression. To perform inter prediction on a PU, motion estimation module 122 can generate motion information for the PU. Motion compensation module 124 can generate a predicted video block for the PU based on the motion information and decoded samples of pictures other than the picture associated with the CU (e.g., reference pictures). In this disclosure, a predicted video block generated by motion compensation module 124 may be referred to as an inter-predicted video block.
Slices can be I slices, P slices, or B slices. Motion estimation module 122 and motion compensation module 124 can perform different operations for a PU of a CU depending on whether the PU is in an I slice, a P slice, or a B slice. In an I slice, all PUs are intra predicted. Hence, if a PU is in an I slice, motion estimation module 122 and motion compensation module 124 do not perform inter prediction on the PU.
If a PU is in a P slice, the picture containing the PU is associated with a list of reference pictures referred to as "list 0". Each of the reference pictures in list 0 contains samples that can be used for inter prediction of other pictures. When motion estimation module 122 performs a motion estimation operation with regard to a PU in a P slice, motion estimation module 122 can search the reference pictures in list 0 for a reference block for the PU. The reference block of the PU can be a set of samples, e.g., a block of samples, that most closely corresponds to the samples in the video block of the PU. Motion estimation module 122 can use a variety of metrics to determine how closely a set of samples in a reference picture corresponds to the samples in the video block of a PU. For example, motion estimation module 122 can determine how closely a set of samples in a reference picture corresponds to the samples in the video block of a PU by sum of absolute differences (SAD), sum of squared differences (SSD), or other difference metrics.
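A straightforward C sketch of the SAD metric mentioned above; the shared stride for both buffers is a simplifying assumption.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between the current PU's samples and a
 * candidate reference block; `stride` is the row pitch of both buffers. */
static uint32_t block_sad(const uint8_t *cur, const uint8_t *ref,
                          int stride, int width, int height)
{
    uint32_t sad = 0;
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            sad += (uint32_t)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}
```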
After identifying a reference block of a PU in a P slice, motion estimation module 122 can generate a reference index that indicates the reference picture in list 0 containing the reference block, and a motion vector that indicates a spatial displacement between the PU and the reference block. In various examples, motion estimation module 122 can generate motion vectors to varying degrees of precision. For example, motion estimation module 122 can generate motion vectors at one-quarter sample precision, one-eighth sample precision, or other fractional sample precision. In the case of fractional sample precision, reference block values can be interpolated from integer-position sample values in the reference picture. Motion estimation module 122 can output the reference index and the motion vector as the motion information of the PU. Motion compensation module 124 can generate a predicted video block of the PU based on the reference block identified by the motion information of the PU.
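For example, with quarter-sample precision, a motion vector component splits into an integer sample offset and a fractional phase that selects the interpolation filter, as in this sketch:

```c
/* Split a quarter-sample motion vector component into integer and
 * fractional parts. A sketch assuming quarter-sample precision. */
static void split_mv_component(int mv, int *int_part, int *frac_part)
{
    *int_part  = mv >> 2;  /* integer-sample offset (floor division) */
    *frac_part = mv & 3;   /* 0: integer; 1: 1/4; 2: 1/2; 3: 3/4 sample */
}
```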
If the PU is in a B slice, the picture containing the PU may be associated with two lists of reference pictures, referred to as "list 0" and "list 1." In some examples, a picture containing a B slice may be associated with a list combination that is a combination of list 0 and list 1.
Furthermore, if the PU is in a B slice, motion estimation module 122 may perform uni-directional prediction or bi-directional prediction for the PU. When motion estimation module 122 performs uni-directional prediction for the PU, motion estimation module 122 may search the reference pictures of list 0 or list 1 for a reference block for the PU. Motion estimation module 122 may then generate a reference index that indicates the reference picture in list 0 or list 1 that contains the reference block and a motion vector that indicates a spatial displacement between the PU and the reference block. Motion estimation module 122 may output the reference index, a prediction direction indicator, and the motion vector as the motion information of the PU. The prediction direction indicator may indicate whether the reference index indicates a reference picture in list 0 or list 1. Motion compensation module 124 may generate the predicted video block of the PU based on the reference block indicated by the motion information of the PU.
When motion estimation module 122 performs bi-directional prediction for the PU, motion estimation module 122 may search the reference pictures in list 0 for a reference block for the PU and may also search the reference pictures in list 1 for another reference block for the PU. Motion estimation module 122 may then generate reference indexes that indicate the reference pictures in list 0 and list 1 containing the reference blocks and motion vectors that indicate spatial displacements between the reference blocks and the PU. Motion estimation module 122 may output the reference indexes and the motion vectors as the motion information of the PU. Motion compensation module 124 may generate the predicted video block of the PU based on the reference blocks indicated by the motion information of the PU.
In some instances, motion estimation module 122 does not output a full set of motion information for a PU to entropy encoding module 116. Rather, motion estimation module 122 may signal the motion information of the PU with reference to the motion information of another PU. For example, motion estimation module 122 may determine that the motion information of the PU is sufficiently similar to the motion information of a neighboring PU. In this example, motion estimation module 122 may indicate, in a syntax structure associated with the PU, a value that indicates to video decoder 30 that the PU has the same motion information as the neighboring PU. In another example, motion estimation module 122 may identify, in a syntax structure associated with the PU, a neighboring PU and a motion vector difference (MVD). The motion vector difference indicates a difference between the motion vector of the PU and the motion vector of the indicated neighboring PU. Video decoder 30 may use the motion vector of the indicated neighboring PU and the motion vector difference to determine the motion vector of the PU. By referring to the motion information of a first PU when signaling the motion information of a second PU, video encoder 20 may be able to signal the motion information of the second PU using fewer bits.
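A minimal sketch of the two signaling options just described, assuming motion vectors are simple (x, y) tuples; the function name and flag are illustrative, not codec syntax:

```python
# Two ways a decoder can recover a PU's motion vector from signaled data:
# reuse the neighbor's MV outright, or add a signaled motion vector
# difference (MVD) to it.
def decode_motion_vector(neighbor_mv, same_as_neighbor, mvd=None):
    if same_as_neighbor:          # syntax indicated identical motion info
        return neighbor_mv
    # otherwise only the difference to the neighbor's MV was signaled
    return (neighbor_mv[0] + mvd[0], neighbor_mv[1] + mvd[1])

print(decode_motion_vector((12, -3), True))           # (12, -3)
print(decode_motion_vector((12, -3), False, (1, 0)))  # (13, -3)
```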
As part of performing an encoding operation on a CU, intra prediction module 126 may perform intra prediction on PUs of the CU. Intra prediction may provide spatial compression. When intra prediction module 126 performs intra prediction on a PU, intra prediction module 126 may generate prediction data for the PU based on decoded samples of other PUs in the same picture. The prediction data for the PU may include a predicted video block and various syntax elements. Intra prediction module 126 may perform intra prediction on PUs in I slices, P slices, and B slices.
To perform intra prediction on a PU, intra prediction module 126 may use multiple intra prediction modes to generate multiple sets of prediction data for the PU. When intra prediction module 126 uses an intra prediction mode to generate a set of prediction data for the PU, intra prediction module 126 may extend samples from video blocks of neighboring PUs across the video block of the PU in a direction and/or gradient associated with the intra prediction mode. Assuming a left-to-right, top-to-bottom encoding order for PUs, CUs, and treeblocks, the neighboring PUs may be above, above and to the right, above and to the left, or to the left of the PU. Intra prediction module 126 may use various numbers of intra prediction modes, e.g., 33 directional intra prediction modes, depending on the size of the PU.
Prediction module 100 may select the prediction data for a PU from among the prediction data generated by motion compensation module 124 for the PU or the prediction data generated by intra prediction module 126 for the PU. In some examples, prediction module 100 selects the prediction data for the PU based on rate/distortion metrics of the sets of prediction data.
If prediction module 100 selects prediction data generated by intra prediction module 126, prediction module 100 may signal the intra prediction mode that was used to generate the prediction data for the PU, i.e., the selected intra prediction mode. Prediction module 100 may signal the selected intra prediction mode in various ways. For example, it may be probable that the selected intra prediction mode is the same as the intra prediction mode of a neighboring PU. In other words, the intra prediction mode of the neighboring PU may be the most probable mode for the current PU. Thus, prediction module 100 may generate a syntax element to indicate that the selected intra prediction mode is the same as the intra prediction mode of the neighboring PU.
After prediction module 100 selects the prediction data for PUs of a CU, residual generation module 102 may generate residual data for the CU by subtracting the predicted video blocks of the PUs of the CU from the video block of the CU. The residual data of the CU may include 2D residual video blocks that correspond to different sample components of the samples in the video block of the CU. For example, the residual data may include a residual video block that corresponds to differences between luminance components of samples in the predicted video blocks of the PUs of the CU and luminance components of samples in the original video block of the CU. In addition, the residual data of the CU may include residual video blocks that correspond to differences between chrominance components of samples in the predicted video blocks of the PUs of the CU and chrominance components of samples in the original video block of the CU.
Prediction module 100 may perform quadtree partitioning to partition the residual video blocks of the CU into sub-blocks. Each undivided residual video block may be associated with a different TU of the CU. The sizes and positions of the residual video blocks associated with the TUs of the CU may or may not be based on the sizes and positions of the video blocks associated with the PUs of the CU. A quadtree structure known as a "residual quadtree" (RQT) may include nodes associated with each of the residual video blocks. The TUs of the CU may correspond to leaf nodes of the RQT.
Transform module 104 may generate one or more transform coefficient blocks for each TU of the CU by applying one or more transforms to the residual video block associated with the TU. Each of the transform coefficient blocks may be a 2D matrix of transform coefficients. Transform module 104 may apply various transforms to the residual video block associated with a TU. For example, transform module 104 may apply a discrete cosine transform (DCT), a directional transform, or a conceptually similar transform to the residual video block associated with the TU.
After transform module 104 generates a transform coefficient block associated with a TU, quantization module 106 may quantize the transform coefficients in the transform coefficient block. Quantization module 106 may quantize a transform coefficient block associated with a TU of a CU based on a QP value associated with the CU.
Video encoder 20 may associate a QP value with a CU in various ways. For example, video encoder 20 may perform a rate-distortion analysis on a treeblock associated with the CU. In the rate-distortion analysis, video encoder 20 may generate multiple coded representations of the treeblock by performing the encoding operation multiple times on the treeblock. Video encoder 20 may associate different QP values with the CU when generating the different encoded representations of the treeblock. Video encoder 20 may signal that a given QP value is associated with the CU when the given QP value is associated with the CU in a coded representation of the treeblock that has the lowest bitrate and distortion metric.
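A minimal sketch of this QP selection, assuming a hypothetical encode_treeblock callable that returns the bit count and distortion of one encoding pass at a given QP:

```python
# Try several QP values on a treeblock and keep the one with the lowest
# Lagrangian rate-distortion cost.
def select_qp(encode_treeblock, candidate_qps, lam):
    best_qp, best_cost = None, float("inf")
    for qp in candidate_qps:
        bits, distortion = encode_treeblock(qp)
        cost = distortion + lam * bits   # rate-distortion trade-off
        if cost < best_cost:
            best_qp, best_cost = qp, cost
    return best_qp
```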
Inverse quantization module 108 and inverse transform module 110 may apply inverse quantization and inverse transforms, respectively, to a transform coefficient block to reconstruct a residual video block from the transform coefficient block. Reconstruction module 112 may add the reconstructed residual video block to corresponding samples from one or more predicted video blocks generated by prediction module 100 to produce a reconstructed video block associated with a TU. By reconstructing video blocks for each TU of a CU in this way, video encoder 20 may reconstruct the video block of the CU.
After reconstruction module 112 reconstructs the video block of a CU, filter module 113 may perform a deblocking operation to reduce blocking artifacts in the video block associated with the CU. After performing the one or more deblocking operations, filter module 113 may store the reconstructed video block of the CU in decoded picture buffer 114. Motion estimation module 122 and motion compensation module 124 may use a reference picture that contains the reconstructed video block to perform inter prediction on PUs of subsequent pictures. In addition, intra prediction module 126 may use reconstructed video blocks in decoded picture buffer 114 to perform intra prediction on other PUs in the same picture as the CU.
Entropy encoding module 116 may receive data from other functional components of video encoder 20. For example, entropy encoding module 116 may receive transform coefficient blocks from quantization module 106 and may receive syntax elements from prediction module 100. When entropy encoding module 116 receives the data, entropy encoding module 116 may perform one or more entropy encoding operations to generate entropy-encoded data. For example, video encoder 20 may perform a context-adaptive variable-length coding (CAVLC) operation, a CABAC operation, a variable-to-variable (V2V) length coding operation, a syntax-based context-adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, or another type of entropy encoding operation on the data. Entropy encoding module 116 may output a bitstream that includes the entropy-encoded data.
As part of performing an entropy encoding operation on data, entropy encoding module 116 may select a context model. If entropy encoding module 116 is performing a CABAC operation, the context model may indicate estimates of the probabilities of particular bins having particular values. In the context of CABAC, the term "bin" is used to refer to a bit of a binarized version of a syntax element.
Fig. 3 is a block diagram illustrating an example video decoder 30 that may be configured to implement the techniques of the present invention. Fig. 3 is provided for purposes of explanation and does not limit the techniques as broadly exemplified and described in the present invention. For purposes of explanation, the present invention describes video decoder 30 in the context of HEVC coding. However, the techniques of the present invention may be applicable to other coding standards or methods.
According to embodiments of the present invention, video decoder 30 may include a down-sampling module 170, which may be configured to code (e.g., decode) video data defined in a scalable video coding scheme with at least one base layer and at least one enhancement layer. Down-sampling module 170 may down-sample at least some of the video data as part of the decoding process, where the down-sampling is performed in an adaptive manner, e.g., by using a set of image filters selected based at least in part on phase shift information associated with the video data, for example as described below with respect to Figs. 4 through 7.
In the example of Fig. 3, video decoder 30 includes a plurality of functional components. The functional components of video decoder 30 include an entropy decoding module 150, a prediction module 152, an inverse quantization module 154, an inverse transform module 156, a reconstruction module 158, a filter module 159, and a decoded picture buffer 160. Prediction module 152 includes a motion compensation module 162 and an intra prediction module 164. In some examples, video decoder 30 may perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 of Fig. 2. In other examples, video decoder 30 may include more, fewer, or different functional components.
Video decoder 30 may receive a bitstream that comprises encoded video data. The bitstream may include a plurality of syntax elements. When video decoder 30 receives the bitstream, entropy decoding module 150 may perform a parsing operation on the bitstream. As a result of performing the parsing operation on the bitstream, entropy decoding module 150 may extract syntax elements from the bitstream. As part of performing the parsing operation, entropy decoding module 150 may entropy decode entropy-encoded syntax elements in the bitstream. Prediction module 152, inverse quantization module 154, inverse transform module 156, reconstruction module 158, and filter module 159 may perform a reconstruction operation that generates decoded video data based on the syntax elements extracted from the bitstream.
As discussed above, the bitstream may comprise a series of NAL units. The NAL units of the bitstream may include sequence parameter set NAL units, picture parameter set NAL units, SEI NAL units, and so on. As part of performing the parsing operation on the bitstream, entropy decoding module 150 may perform parsing operations that extract and entropy decode sequence parameter sets from sequence parameter set NAL units, picture parameter sets from picture parameter set NAL units, SEI data from SEI NAL units, and so on.
In addition, the NAL units of the bitstream may include coded slice NAL units. As part of performing the parsing operation on the bitstream, entropy decoding module 150 may perform parsing operations that extract and entropy decode coded slices from the coded slice NAL units. Each of the coded slices may include a slice header and slice data. The slice header may contain syntax elements pertaining to a slice. The syntax elements in the slice header may include a syntax element that identifies a picture parameter set associated with the picture that contains the slice. Entropy decoding module 150 may perform entropy decoding operations, such as CABAC decoding operations, on syntax elements in the coded slice header to recover the slice header.
As part of extracting the slice data from coded slice NAL units, entropy decoding module 150 may perform parsing operations that extract syntax elements from coded CUs in the slice data. The extracted syntax elements may include syntax elements associated with transform coefficient blocks. Entropy decoding module 150 may then perform CABAC decoding operations on some of the syntax elements.
After entropy decoding module 150 performs a parsing operation on a non-partitioned CU, video decoder 30 may perform a reconstruction operation on the non-partitioned CU. To perform the reconstruction operation on a non-partitioned CU, video decoder 30 may perform a reconstruction operation on each TU of the CU. By performing the reconstruction operation for each TU of the CU, video decoder 30 may reconstruct a residual video block associated with the CU.
As part of performing a reconstruction operation on a TU, inverse quantization module 154 may inverse quantize, i.e., de-quantize, a transform coefficient block associated with the TU. Inverse quantization module 154 may inverse quantize the transform coefficient block in a manner similar to the inverse quantization processes proposed for HEVC or defined by the H.264 decoding standard. Inverse quantization module 154 may use a quantization parameter QP calculated by video encoder 20 for the CU of the transform coefficient block to determine a degree of quantization and, likewise, a degree of inverse quantization for inverse quantization module 154 to apply.
After inverse quantization module 154 inverse quantizes a transform coefficient block, inverse transform module 156 may generate a residual video block for the TU associated with the transform coefficient block. Inverse transform module 156 may apply an inverse transform to the transform coefficient block in order to generate the residual video block for the TU. For example, inverse transform module 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loève transform (KLT), an inverse rotational transform, an inverse directional transform, or another inverse transform to the transform coefficient block. In some examples, inverse transform module 156 may determine an inverse transform to apply to the transform coefficient block based on signaling from video encoder 20. In such examples, inverse transform module 156 may determine the inverse transform based on a transform signaled at the root node of a quadtree for a treeblock associated with the transform coefficient block. In other examples, inverse transform module 156 may infer the inverse transform from one or more coding characteristics, such as block size, coding mode, or the like. In some examples, inverse transform module 156 may apply a cascaded inverse transform.
In some examples, motion compensation module 162 may refine the predicted video block of a PU by performing interpolation based on interpolation filters. Identifiers for the interpolation filters to be used for motion compensation with sub-sample precision may be included in the syntax elements. Motion compensation module 162 may use the same interpolation filters used by video encoder 20 during generation of the predicted video block of the PU to calculate interpolated values for sub-integer samples of a reference block. Motion compensation module 162 may determine the interpolation filters used by video encoder 20 according to received syntax information and use the interpolation filters to produce the predicted video block.
If a PU is encoded using intra prediction, intra prediction module 164 may perform intra prediction to generate a predicted video block for the PU. For example, intra prediction module 164 may determine an intra prediction mode for the PU based on syntax elements in the bitstream. The bitstream may include syntax elements that intra prediction module 164 may use to determine the intra prediction mode of the PU.
In some instances, the syntax elements may indicate that intra prediction module 164 is to use the intra prediction mode of another PU to determine the intra prediction mode of the current PU. For example, it may be probable that the intra prediction mode of the current PU is the same as the intra prediction mode of a neighboring PU. In other words, the intra prediction mode of the neighboring PU may be the most probable mode for the current PU. Hence, in this example, the bitstream may include a small syntax element that indicates that the intra prediction mode of the PU is the same as the intra prediction mode of the neighboring PU. Intra prediction module 164 may then use the intra prediction mode to generate prediction data (e.g., predicted samples) for the PU based on the video blocks of spatially neighboring PUs.
Reconstruction module 158 may use the residual video blocks associated with the TUs of a CU and the predicted video blocks of the PUs of the CU, i.e., either intra prediction data or inter prediction data, as applicable, to reconstruct the video block of the CU. Thus, video decoder 30 may generate a predicted video block and a residual video block based on syntax elements in the bitstream and may generate a video block based on the predicted video block and the residual video block.
After reconstruction module 158 reconstructs the video block of the CU, filter module 159 may perform a deblocking operation to reduce blocking artifacts associated with the CU. After filter module 159 performs the deblocking operation, video decoder 30 may store the video block of the CU in decoded picture buffer 160. Decoded picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, and presentation on a display device, such as display device 32 of Fig. 1. For instance, video decoder 30 may perform, based on the video blocks in decoded picture buffer 160, intra prediction or inter prediction operations on PUs of other CUs.
The motion compensation loop of HEVC is similar to the motion compensation loop in H.264/AVC. For example, the reconstruction Î of the current frame may equal the de-quantized coefficients r plus the temporal prediction P:
Î = r + P    (1)
where P indicates uni-directional prediction for P frames or bi-directional prediction for B frames.
There are roughly 35 intra prediction modes available in HEVC. In certain embodiments, the reconstruction of the current frame may also be expressed by equation (1), where P indicates intra prediction. Fig. 4 provides a diagram illustrating an embodiment of scalabilities in different dimensions. As shown in the figure, scalabilities may be enabled in three dimensions. For example, in the time dimension, frame rates of 7.5 Hz, 15 Hz, or 30 Hz may be supported by temporal scalability (T). With regard to spatial scalability (S), different resolutions such as QCIF, CIF, and 4CIF may be enabled. For each specific spatial resolution and frame rate, SNR (Q) layers may be added to improve the picture quality. Once video content has been coded in such a scalable way, an extractor tool may be used to adapt the actually delivered content according to application requirements, which may depend on, e.g., the transmission channel and/or other parameters. In the embodiment shown in Fig. 4, each cube may contain pictures with substantially the same frame rate (temporal level), spatial resolution, and SNR layer. In certain embodiments, a better representation may be achieved by adding cubes (pictures) in one or more dimensions. Moreover, combined scalability may be supported when two, three, or even more scalabilities are enabled.
According to the SVC specification, the pictures with the lowest spatial and quality layer may be compatible with H.264/AVC, and the pictures at the lowest temporal level may form the temporal base layer, which may be enhanced with pictures at higher temporal levels. In addition to the H.264/AVC-compatible layer, several spatial and/or SNR enhancement layers may be added to provide spatial and/or quality scalabilities. As used herein, SNR scalability may also be referred to as quality scalability. Each spatial or SNR enhancement layer itself may be temporally scalable, with the same temporal scalability structure as the H.264/AVC-compatible layer. For one spatial or SNR enhancement layer, the lower layer it depends on may also be referred to as the base layer of that particular spatial or SNR enhancement layer.
Fig. 5 illustrates an embodiment of an SVC coding structure. The pictures with the lowest spatial and quality layer (pictures in layer 0 and layer 1, with QCIF resolution) may be compatible with H.264/AVC. Among them, the pictures of the lowest temporal level form the temporal base layer, as shown in layer 0 of Fig. 5. This temporal base layer (layer 0) may be enhanced with pictures of higher temporal levels (e.g., layer 1). In addition to the H.264/AVC-compatible layer, several spatial and/or SNR enhancement layers may be added to provide spatial and/or quality scalabilities. For instance, an enhancement layer may be a CIF representation with substantially the same resolution as layer 2. In the depicted embodiment, layer 3 is an SNR enhancement layer. As shown, each spatial or SNR enhancement layer itself may be temporally scalable, with substantially the same temporal scalability structure as the H.264/AVC-compatible layer. Also, an enhancement layer may enhance both spatial resolution and frame rate. For example, layer 4 may provide a 4CIF enhancement layer, which further increases the frame rate from 15 Hz to 30 Hz.
As shown in Fig. 6A, the coded slices in the same time instance are successive in bitstream order and form one access unit in the case of SVC. Those SVC access units then follow the decoding order, which may be different from the display order and is determined, e.g., by the temporal prediction relationship.
Some functionalities of SVC are inherited from H.264/AVC. Compared with previous scalable standards, certain advantages, namely inter-layer prediction and single-loop decoding, are discussed in greater detail below.
Single-loop decoding
In order to keep a low-complexity decoder, single-loop decoding is mandatory in SVC. With single-loop decoding, each supported layer can be decoded with a single motion compensation loop. To achieve this, the use of inter-layer intra prediction is only allowed for enhancement layer macroblocks for which the co-located reference layer signal is intra-coded. It is further required that all layers used for inter-layer prediction of higher layers are coded using constrained intra prediction.
Inter-layer prediction
SVC provides inter-layer prediction based on texture, residual, and motion for spatial and/or SNR scalability. Spatial scalability in SVC may involve any resolution ratio between two layers. In certain embodiments, SNR scalability may be realized by coarse granularity scalability (CGS) or medium granularity scalability (MGS). In SVC, different spatial or CGS layers belong to different dependency layers (e.g., indicated by dependency_id in the NAL unit header), while different MGS layers can be in the same dependency layer. One dependency layer may include quality layers with quality_id from 0 to higher values, corresponding to quality enhancement layers. In SVC, inter-layer prediction methods may be utilized to reduce inter-layer redundancy. The various inter-layer prediction methods are described in greater detail below.
Inter-layer intra prediction
In SVC, the coding mode that uses inter-layer intra prediction may be called "IntraBL" mode. To enable single-loop decoding, only MBs whose co-located MB in the base layer is coded in constrained intra mode can use the inter-layer intra prediction mode. A constrained intra mode MB is intra-coded without referring to any samples from neighboring inter-coded MBs. In embodiments where multi-loop decoding is allowed, this restriction on the coding mode of the co-located base layer block may not apply. In certain embodiments, the co-located MB may be up-sampled according to the spatial resolution ratio.
Fig. 6B illustrates a schematic diagram of an example 400 of intra-BL mode prediction. In particular, enhancement layer 420 and base layer 410 are co-located. A block 412 in base layer 410 may correspond to a block 422 in the enhancement layer. In intra-BL mode, the texture of the corresponding base layer block 412 may be used to predict the texture of block 422. If the enhancement picture has a larger size than the base layer picture, base layer block 412 may need to be up-sampled. The prediction error, referred to as the residual, may be transformed, quantized, and entropy coded.
Inter-layer residual prediction
In embodiments where an MB is indicated to use residual prediction, various constraints may apply to the co-located base layer block used for inter-layer prediction. For example, that MB may be required to be an inter MB, and it may be necessary or desirable to up-sample its residual according to the spatial resolution ratio. The residual difference between the enhancement layer and the base layer may be coded and used for prediction purposes. For example, the reconstruction Î_e of the current frame of the enhancement layer may equal the sum of the de-quantized enhancement layer coefficients r_e, the temporal prediction P_e from the enhancement layer, and the quantization-normalized residual coefficients r_b of the base layer, as given in the following equation (2).
Î_e = r_e + P_e + r_b    (2)
Up-sampling process for base layer pictures
With regard to spatial scalability, the base layer and the enhancement layer may have different spatial resolutions. Therefore, it may be necessary or desirable to apply up-sampling filtering to the base layer in order to match the spatial aspect ratio of the enhancement layer. For example, a set of up-sampling filters may be used for the base layer, where a filter is selected from the set according to a fractional sample offset (e.g., phase). In certain embodiments, the phase may be calculated based on the spatial aspect ratio and the relative pixel grid positions between the base layer and enhancement layer pictures.
Fig. 6C(b) illustrates the relative luma sampling grids of the base layer and the enhancement layer in an embodiment of the H.264/SVC up-sampling procedure for dyadic spatial scalability. In certain embodiments, as shown, the fractional sample offsets between the enhancement layer and base layer pictures are 0.25 and 0.75. In the H.264/SVC standard, the phase may be quantized with 1/16-sample accuracy, which results in 16 filters in the filter set.
In certain embodiments, a single up-sampling filter may be applied to the base layer picture to generate the scaled content used for inter-layer prediction. Although a single up-sampling filter may be sufficient in some cases, it may not be sufficient or desirable when multiple inter-layer prediction types are involved. In certain embodiments, making full use of multiple up-sampling filters or filter sets further improves the coding performance of certain inter-layer prediction methods, including, for example, intra-BL, difference-domain intra and inter prediction, and/or residual prediction. These concepts are disclosed in greater detail below.
In certain embodiments, a video coding system uses multiple up-sampling filters for spatial scalability purposes and multiple pre-processing filters for SNR scalability purposes. For example, a particular filter for processing the co-located base layer samples may be selected based at least in part on the type of inter-layer prediction being used. In certain embodiments, the filter sets may be designed offline and hard-coded in the system. Alternatively, the filter sets may be derived according to the coded content and sent in the bitstream. In addition, the phase shift used in the down-sampling process may be signaled in the bitstream.
Although certain embodiments disclosed herein are presented in the context of two-layer scalable video coding, those skilled in the art will appreciate that the disclosed embodiments may be extended to multi-layer scenarios, e.g., scenarios in which a single layer has multiple base layers and/or enhancement layers.
Sample position mapping in the up-sampling process
Figs. 6C and 6D show two down-sampling schemes applied to video, with different sample position mappings between the down-sampled video and the original video. For example, the squares, including squares 510, 512, and 514, may correspond to positions of enhancement layer pixels. The circles, including circles 520 and 522, may correspond to positions of base layer pixels. For luma down-sampling, examples of the two sample positions are shown in Figs. 6C(a) and 6C(b). In Fig. 6C(a), referred to as "zero-phase down-sampling," the spatial distance between enhancement layer pixel 510 and base layer pixel 520 is zero ("phase" may generally refer to the spatial distance between the top-left sample in the enhancement layer and the corresponding top-left sample in the base layer). In Fig. 6C(b), referred to as "symmetric down-sampling," a 4×4 array of luma samples in the enhancement layer is down-sampled to a 2×2 array in the base layer, and the two arrays share the same center.
Because of the scaling, the base layer picture and the enhancement layer picture may have different sizes. For example, in 2x spatial scalability as illustrated in Fig. 6C, the width of the base layer picture is half the width of the enhancement layer picture, and the height of the base layer picture is half the height of the enhancement layer picture. In one example, the base layer sequence is generated by down-sampling the enhancement layer. To perform inter-layer texture prediction, up-sampling may be applied to the reconstructed base layer picture.
The down-sampling scheme shown in Fig. 6C(a) and Fig. 6D(a) may be used to generate the base layer content of the test sequences used in HEVC-SVC, where the top-left pixel grids of the down-sampled picture and the original picture are aligned. In this example, the phase shift between the top-left grids of the down-sampled picture and the original picture is zero. A zero-phase relationship exists between the enhancement layer and the base layer.
The down-sampling scheme shown in Fig. 6C(b) and Fig. 6D(b) is the default down-sampling process in H.264-SVC, where the phase shift between the down-sampled picture and the original picture is distributed equally among all pixel grids. The overall phase shift between the down-sampled picture and the original picture is zero. In Fig. 6C(b), a symmetric phase relationship exists between the enhancement layer and the base layer.
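The two mappings can be summarized in a short sketch; assuming a 2x ratio, the positions below reproduce the zero-phase and symmetric relationships just described (function and parameter names are illustrative):

```python
# Enhancement-layer pixel x maps to this base-layer position under each
# down-sampling scheme (2x ratio by default).
def base_position(x_enh, ratio=2.0, symmetric=False):
    if symmetric:
        return (x_enh + 0.5) / ratio - 0.5   # phase spread equally
    return x_enh / ratio                     # top-left samples aligned

print([base_position(x) for x in range(4)])                  # [0.0, 0.5, 1.0, 1.5]
print([base_position(x, symmetric=True) for x in range(4)])  # [-0.25, 0.25, 0.75, 1.25]
```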
In certain embodiments, the scheme illustrated by Figs. 6C(a) and 6D(a) is used in the down-sampling filtering. The horizontal coordinate may be calculated by the following equation:
x_base = (16 * x_enhance * w_base) / w_enhance    (1)
where x_enhance is the horizontal coordinate of the sample in the enhancement picture being processed, x_base is the horizontal coordinate of its corresponding sample in the base layer, and w_base and w_enhance denote the picture widths of the base layer and the enhancement layer. As mentioned above, x_base may be quantized with an accuracy of 1/16: the value representing the integer position equals the quantized value divided by 16, and the phase representing the fractional position equals the quantized value modulo 16.
In certain embodiments, the scheme illustrated by Figs. 6C(b) and 6D(b) is used in the down-sampling filtering. The horizontal coordinate may be calculated by the following equation:
x_base = (16 * (2 * x_enhance + 1) * w_base) / (2 * w_enhance) - 8    (2)
that is, x_base/16 = (x_enhance + 1/2) * w_base / w_enhance - 1/2, so that the phase shift is distributed equally among all pixel grids.
In the design of a practical video codec, equations (1) and (2) may be computed in integer arithmetic (e.g., rounded to integer values) to reduce the computational complexity.
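A minimal integer-arithmetic sketch of the two mappings in 1/16-sample units, along the lines suggested above; ref_w stands for the base layer width and scaled_w for the enhancement layer width, and the names are illustrative:

```python
# Integer version of the two mappings, in 1/16-sample units. Python's //
# floors toward negative infinity, which keeps negative positions correct.
def x_base_1_16(x_enhance, ref_w, scaled_w, symmetric):
    if symmetric:
        x16 = ((2 * x_enhance + 1) * ref_w * 8) // scaled_w - 8   # eq. (2)
    else:
        x16 = (x_enhance * ref_w * 16) // scaled_w                # eq. (1)
    return x16 >> 4, x16 & 15   # integer position and 1/16 phase

print(x_base_1_16(1, 8, 16, False))  # (0, 8): half-sample phase for 2x
print(x_base_1_16(0, 8, 16, True))   # (-1, 12): -0.25 sample, as in Fig. 6C(b)
```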
There may be a constant difference between the x_base calculated by equation (1) and the x_base calculated by equation (2). In one embodiment, information about the grid position difference used in the up-sampling process (e.g., phase information) may be signaled in the bitstream. This information may be signaled in an SPS, PPS, APS, or slice header syntax unit.
In one embodiment, the procedure related to equation (2) may be specified as the default method. A value M may be signaled in the SPS, PPS, or slice header to represent an extra shift. The following equation (4) may then be used to calculate the new x'_base:
x'_base = x_base + M    (4)
In another embodiment, the procedure related to equation (3) may be specified as the default method. A value M may be signaled in the SPS, PPS, or slice header to represent the extra phase modification.
In another embodiment, the information signaled in the SPS, PPS, or slice header is a flag. The flag may be used to indicate which of the two down-sampling schemes illustrated in Figs. 6C and 6D was applied to generate the down-sampled base layer picture. The corresponding position mapping method may then be used in the up-sampling process.
Similarly, the methods above may be applied to vertical-direction sample position mapping. In one embodiment, the position mapping information signaled in the SPS, PPS, or slice header described above may be signaled independently for the horizontal and vertical directions.
In another embodiment, the position mapping information signaled in the SPS, PPS, or slice header described above may be signaled only once and used for both the horizontal and vertical directions.
The down-sampling sample positions (e.g., phase information) may be needed in the up-sampling process. For example, as shown in Fig. 6C(a), given zero-phase down-sampling, the enhancement layer pixels (e.g., 510 and 512) need to be up-sampled from the base layer pixels (e.g., 520 and 522). Since 510 and 520 are at the same position (in this example), the phase of the up-sampling filter used to generate 510 is 0. And since 512 is the midpoint of 520 and 522, the phase of the up-sampling filter used to generate 512 is 1/2. In summary, for 2x spatial scalability with zero-phase down-sampling, the phases of the up-sampling filters should be 0 and 1/2. By a similar analysis, for 2x spatial scalability with symmetric down-sampling as shown in Fig. 6C(b), the phases of the up-sampling filters should be 1/4 and 3/4.
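These phase pairs can also be expressed in the 1/16-sample units used elsewhere in this description (1/2 -> 8/16, 1/4 -> 4/16, 3/4 -> 12/16); a minimal sketch with an illustrative function name:

```python
# Up-sampling filter phases, in 1/16-sample units, for 2x spatial
# scalability under each down-sampling scheme.
def upsample_phases_2x(zero_phase_downsampling):
    # phases 0 and 1/2 -> (0, 8); phases 1/4 and 3/4 -> (4, 12)
    return (0, 8) if zero_phase_downsampling else (4, 12)

print(upsample_phases_2x(True))   # (0, 8)
print(upsample_phases_2x(False))  # (4, 12)
```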
In certain embodiments, the down-sampling position (e.g., phase) information is signaled so that the codec can select the correct up-sampling phases/positions. This signaling may be realized by a flag in high-level syntax such as the video parameter set (VPS), picture parameter set (PPS), sequence parameter set (SPS), or slice header. The flag may be integrated into the calculation process used for the up-sampling phase.
Fig. 7 illustrates an embodiment of a process 700 for coding video information. The process may include obtaining a first layer of video information, as illustrated at block 710. The process may further include determining phase shift information of a second layer of the video information relative to the first layer, as illustrated at block 720. The process 700 may further include selecting an image filter based at least in part on the phase shift information. In addition, at block 740, the process 700 may further include generating a modified version of the first layer using the selected filter. For example, in certain embodiments, the first layer is a base layer, and the modified version of the first layer may be an up-sampled version of the first layer, where the selected image filter is an up-sampling filter. Alternatively, in certain embodiments, the first layer is an enhancement layer, and the modified version of the first layer may be a down-sampled version of the first layer, where the selected image filter is a down-sampling filter.
In one embodiment, a flag indicating the phase information is signaled in the SPS. In other embodiments, the flag may be signaled in other high-level syntax, such as the PPS or the VPS. Table 1 illustrates an example set of flags according to an embodiment.
Table 1
luma_phase_flag may specify whether the positions of the luma samples in the current layer picture are symmetric with the positions of the luma samples of the layer picture that may be used for inter-layer prediction. For example, luma_phase_flag set equal to 1 may specify that the positions of the luma samples in the current layer picture are symmetric with the positions of the luma samples of the layer picture that may be used for inter-layer prediction (e.g., as shown in Fig. 6C(b)). In addition, luma_phase_flag equal to 0 may specify that the position of the top-left luma sample in the current layer picture has zero phase shift, in both the vertical and horizontal directions, relative to the position of the top-left luma sample of the layer picture that may be used for inter-layer prediction (e.g., as shown in Fig. 6C(a)). When luma_phase_flag is not present, it may be inferred to be equal to 0.
chroma_phase_x_flag may specify the horizontal phase shift of the chroma components, in units of half luma samples of the frame or layer frame. When chroma_phase_x_flag is not present, it may be inferred to be equal to 0. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
chroma_phase_y may specify the vertical phase shift of the chroma components, in units of half luma samples of the frame or layer frame. When chroma_phase_y is not present, it may be inferred to be equal to 0. The value of chroma_phase_y may be in the range of 0 to 2, inclusive. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
ref_layer_chroma_phase_x_flag may specify the horizontal phase shift of the chroma components, in units of half luma samples of the layer frame of the layer picture that may be used for inter-layer prediction. When ref_layer_chroma_phase_x_flag is not present, it may be inferred to be equal to chroma_phase_x_flag. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
ref_layer_chroma_phase_y may specify the vertical phase shift of the chroma components, in units of half luma samples of the layer frame of the layer picture that may be used for inter-layer prediction. When ref_layer_chroma_phase_y is not present, it may be inferred to be equal to chroma_phase_y. The value of ref_layer_chroma_phase_y may be in the range of 0 to 2, inclusive. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
In certain embodiments, the variables refWL, refHL, scaledWL, and scaledHL may be defined as follows:
refWL: the width of the reference layer picture in units of luma samples
refHL: the height of the reference layer picture in units of luma samples
scaledWL: the width of the current layer picture in units of luma samples
scaledHL: the height of the current layer picture in units of luma samples
In certain embodiments, the variables refWC, refHC, scaledWC, and scaledHC may be defined as follows:
refWC: the width of the reference layer picture in units of chroma samples
refHC: the height of the reference layer picture in units of chroma samples
scaledWC: the width of the current layer picture in units of chroma samples
scaledHC: the height of the current layer picture in units of chroma samples
In certain embodiments, the variables phaseXL, phaseYL, refPhaseXL, refPhaseYL, phaseXC, phaseYC, refPhaseXC, and refPhaseYC may be derived as follows:
PhaseXL=2*luma_phase_flag
PhaseYL=2*luma_phase_flag
refPhaseXL=2*luma_phase_flag
refPhaseYL=2*luma_phase_flag
phaseXC=chroma_phase_x_flag+luma_phase_flag
phaseYC=chroma_phase_y+luma_phase_flag
refPhaseXC=ref_layer_chroma_phase_x_flag+luma_phase_flag
refPhaseYC=ref_layer_chroma_phase_y+luma_phase_flag
In certain embodiments, the variables shiftX and shiftY may be derived as follows:
shiftX=16
shiftY=16
In certain embodiments, the variables refW, refH, phaseX, phaseY, scaledW, scaledH, refPhaseX, and refPhaseY may be set equal to refWL, refHL, phaseXL, phaseYL, scaledWL, scaledHL, refPhaseXL, and refPhaseYL, respectively, for luma samples, and to refWC, refHC, phaseXC, phaseYC, scaledWC, scaledHC, refPhaseXC, and refPhaseYC, respectively, for chroma samples.
For example, the variables scaleX and scaleY may be derived as follows:
scaleX=((refW<<shiftX)+(scaledW>>1))/scaledW
scaleY=((refH<<shiftY)+(scaledH>>1))/scaledH
In addition, the variables addX and deltaX may be derived as follows:
addX=(((refW*phaseX)<<(shiftX-2))+(scaledW>>1))/scaledW+(1<<(shiftX-5))
deltaX=4*refPhaseX
In addition, the variables addY and deltaY may be derived as follows:
addY=(((refH*phaseY)<<(shiftY-2))+(scaledH>>1))/scaledH+(1<<(shiftY-5))
deltaY=4*refPhaseY
Then, for a position (x, y) in the current layer, the corresponding reference layer sample position (in units of 1/16 sample) (xRef16, yRef16) may be derived as follows:
xRef16=((x*scaleX+addX)>>(shiftX-4))-deltaX
yRef16=((y*scaleY+addY)>>(shiftY-4))-deltaY
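A minimal sketch collecting the derivation above into one function; it follows the formulas as given, with Python's // and >> standing in for the integer division and shifts (variable names are the text's, in snake_case):

```python
# Reference-layer sample position for a current-layer position (x, y),
# in 1/16-sample units, following the derivation above.
def ref_sample_pos_16(x, y, ref_w, ref_h, scaled_w, scaled_h,
                      phase_x, phase_y, ref_phase_x, ref_phase_y):
    shift_x = shift_y = 16
    scale_x = ((ref_w << shift_x) + (scaled_w >> 1)) // scaled_w
    scale_y = ((ref_h << shift_y) + (scaled_h >> 1)) // scaled_h
    add_x = ((((ref_w * phase_x) << (shift_x - 2)) + (scaled_w >> 1))
             // scaled_w + (1 << (shift_x - 5)))
    add_y = ((((ref_h * phase_y) << (shift_y - 2)) + (scaled_h >> 1))
             // scaled_h + (1 << (shift_y - 5)))
    delta_x, delta_y = 4 * ref_phase_x, 4 * ref_phase_y
    x_ref16 = ((x * scale_x + add_x) >> (shift_x - 4)) - delta_x
    y_ref16 = ((y * scale_y + add_y) >> (shift_y - 4)) - delta_y
    return x_ref16, y_ref16

# 2x luma with zero phase: current-layer sample (1, 1) maps to (8, 8),
# i.e. a half-sample position in the reference layer.
print(ref_sample_pos_16(1, 1, 960, 540, 1920, 1080, 0, 0, 0, 0))
```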
In another embodiment, in the down-sampling of luma samples, one dimension (e.g., the horizontal dimension) may use a non-zero phase, while the other dimension (e.g., the vertical dimension) may use zero phase. In this case, the phase information signaled for the horizontal and vertical dimensions may be separated. Table 2 illustrates an example set of flags according to this embodiment.
Table 2
luma_phase_x_flag may specify whether the horizontal positions of the luma samples in the current layer picture are symmetric with the horizontal positions of the luma samples of the layer picture that may be used for inter-layer prediction. For example, luma_phase_x_flag set equal to 1 may specify that the horizontal positions of the luma samples in the current layer picture are symmetric with the horizontal positions of the luma samples of the layer picture that may be used for inter-layer prediction. Further, luma_phase_x_flag set equal to 0 may specify that the position of the top-left luma sample in the current layer picture has zero phase shift in the horizontal direction relative to the position of the top-left luma sample of the layer picture that may be used for inter-layer prediction. When luma_phase_x_flag is not present, it may be inferred to be equal to 0.
luma_phase_y_flag may specify whether the vertical positions of the luma samples in the current layer picture are symmetric with the vertical positions of the luma samples of the layer picture that may be used for inter-layer prediction. For example, luma_phase_y_flag set equal to 1 may specify that the vertical positions of the luma samples in the current layer picture are symmetric with the vertical positions of the luma samples of the layer picture that may be used for inter-layer prediction. Further, luma_phase_y_flag set equal to 0 may specify that the position of the top-left luma sample in the current layer picture has zero phase shift in the vertical direction relative to the position of the top-left luma sample of the layer picture that may be used for inter-layer prediction. When luma_phase_y_flag is not present, it may be inferred to be equal to 0.
In this embodiment, the variables phaseXL, phaseYL, refPhaseXL, refPhaseYL, phaseXC, phaseYC, refPhaseXC, and refPhaseYC may be derived as follows:
PhaseXL=2*luma_phase_x_flag
PhaseYL=2*luma_phase_y_flag
refPhaseXL=2*luma_phase_x_flag
refPhaseYL=2*luma_phase_y_flag
phaseXC=chroma_phase_x_flag+luma_phase_x_flag
phaseYC=chroma_phase_y+luma_phase_y_flag
refPhaseXC=ref_layer_chroma_phase_x_flag+luma_phase_x_flag
refPhaseYC=ref_layer_chroma_phase_y+luma_phase_y_flag
In certain embodiments, where the variables chroma_phase_x_flag, ref_chroma_phase_x_flag, chroma_phase_y, and ref_chroma_phase_y are signaled for the chroma sampling positions, binary values may be used for the x-dimension syntax, while the syntax values for the y dimension may take more values. In other embodiments, other combinations are possible. For example, each of the chroma sampling position variables chroma_phase_x_flag, ref_chroma_phase_x_flag, chroma_phase_y, and ref_chroma_phase_y may be a binary value. In another example, each of these variables may take non-binary, multi-level values. In general, each of the chroma sampling position variables may take binary or non-binary multi-level values.
In some systems, the position of each optical sensor (e.g., pixel) may have a small displacement, and the optical path for center pixels may be slightly different from that for boundary pixels. For example, Fig. 8A illustrates a schematic diagram of an example of pixel misalignment in a 1-D pixel array. In particular, pixel 2 and pixel 3 are misaligned due to implementation imperfections. When dyadic down-sampling is performed, pixels 1, 3, and 5 are obtained, which are relatively well aligned with each other. However, when the base layer is up-sampled with dyadic phase alignment for the inter-layer reference picture, misalignment results, as shown in Fig. 8B. In this example, the up-sampled pixels 2' and 4' are misaligned with the original positions of pixels 2 and 4. Therefore, predicting the original pixels from misaligned pixels will negatively affect the enhancement layer coding efficiency.
In certain embodiments, the phase misalignment of each row and column may be signaled, e.g., in the SPS. The enhancement layer coder can use the signaled information to adjust the phase difference and obtain better prediction. Furthermore, the phase information may be compressed to reduce the overhead.
In certain embodiments, the phase misalignment may be similar in every row and every column. However, if the imaging device happens to have inhomogeneous phase alignment, the phase alignment information may be modeled as a function mapping the x and y pixel coordinates to a phase offset. The form of this function can be very flexible, e.g., a polynomial. The coefficients of the function may be estimated offline and signaled in the SPS.
At the decoder side, the decoder can calculate the phase shift of each pixel and adjust or change the up-sampling procedure accordingly to obtain a better prediction signal.
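A minimal sketch of this idea, assuming a small 2-D polynomial as the coordinate-to-phase-offset mapping; the polynomial form and coefficient values are illustrative assumptions, not signaled syntax:

```python
# Per-pixel phase offset modeled as a small 2-D polynomial of the pixel
# coordinates; the coefficients would be estimated offline and signaled
# (e.g., in the SPS). Coefficient values here are made up.
def phase_offset(x, y, coeffs):
    c0, cx, cy, cxy = coeffs
    return c0 + cx * x + cy * y + cxy * x * y

# decoder side: adjust the nominal 1/16-unit phase of pixel (3, 2)
nominal_phase = 8
adjusted = nominal_phase + round(phase_offset(3, 2, (0.1, 0.01, -0.02, 0.0)))
print(adjusted)
```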
As described previously, in certain embodiments there are syntax elements for the chroma sampling position information: chroma_phase_x_flag, chroma_phase_y, ref_chroma_phase_x_flag, and ref_chroma_phase_y. The chroma sampling position information for the current layer and its reference layer may be signaled for each layer.
In other embodiments, a video signal may have multiple components, for example Y, U, and V components. In addition, the sampling densities of the components may be different. For example, in the 4:2:0 format, the sampling rate of U or V is 1/2 that of Y in both the horizontal and vertical directions. In other words, 2×2 Y samples correspond to 1 U sample and 1 V sample. The positions of the U or V samples relative to the top-left sample of the 2×2 Y component may vary.
Fig. 9 illustrates some examples of chroma sample positions. For example, 910 is an example of chroma sample type 2, 920 is an example of chroma sample type 3, 930 is an example of top-field luma samples, 940 is an example of chroma sample type 0, 950 is an example of chroma sample type 1, 960 is an example of chroma sample type 4, 970 is an example of chroma sample type 5, and 980 is an example of bottom-field luma samples. As shown, a gray fill may indicate a bottom-field sample type, and no fill may indicate a top-field sample type.
In certain embodiments, the chroma sampling position information is signaled at the VPS level for all supported layers. Table 3 illustrates an example set of flags according to this embodiment.
Table 3
chroma_phase_x_flag[i] may specify the horizontal phase shift of the chroma components, in units of half luma samples, of the pictures or layer pictures with layer index i in the CVS. When chroma_phase_x_flag is not present, it may be inferred to be equal to 0. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
chroma_phase_y[i] may specify the vertical phase shift of the chroma components, in units of half luma samples, of the pictures or layer pictures with layer index i in the CVS. When chroma_phase_y is not present, it may be inferred to be equal to 0. The value of chroma_phase_y may be in the range of 0 to 2, inclusive. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
In another embodiment, single values of chroma_phase_x_flag and chroma_phase_y may be signaled and applied to all layers (e.g., all layers may have the same chroma sampling positions). Table 4 illustrates an example set of flags according to this embodiment.
Table 4
chroma_phase_x_flag may specify the horizontal phase shift of the chroma components, in units of half luma samples, of all pictures in the CVS. When chroma_phase_x_flag is not present, it may be inferred to be equal to 0. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
chroma_phase_y may specify the vertical phase shift of the chroma components, in units of half luma samples, of all pictures in the CVS. When chroma_phase_y is not present, it may be inferred to be equal to 0. The value of chroma_phase_y may be in the range of 0 to 2, inclusive. The phase shift may refer to the spatial displacement between the top-left chroma sample and the top-left luma sample.
Default up-sampling filters
The filter sets listed in Tables 5 and 6 provide examples of filter sets in up-sampling embodiments. The one-dimensional up-sampling filters listed may be used in the horizontal direction, the vertical direction, or both. The filter sets listed in Tables 5 and 6 may be used as default filter sets, selected, e.g., when no phase shift information is signaled in the bitstream.
Table 5 - Example of luma up-sampling filter coefficients
Phase shift    Coefficients
0 {0,0,0,64,0,0,0,0,}
1 {0,1,-3,63,4,-2,1,0,}
2 {0,2,-6,61,9,-3,1,0,}
3 {-1,3,-8,60,13,-4,1,0,}
4 {-1,4,-10,58,17,-5,1,0,}
5 {-1,4,-11,53,25,-8,3,-1,}
6 {-1,4,-11,50,29,-9,3,-1,}
7 {-1,4,-11,45,34,-10,4,-1,}
8 {-1,4,-11,40,40,-11,4,-1,}
9 {-1,4,-10,34,45,-11,4,-1,}
10 {-1,3,-9,29,50,-11,4,-1,}
11 {-1,3,-8,25,53,-11,4,-1,}
12 {0,1,-5,17,58,-10,4,-1,}
13 {0,1,-4,13,60,-8,3,-1,}
14 {0,1,-3,8,62,-6,2,0,}
15 {0,1,-2,4,63,-3,1,0,}
Table 6 - Example of luma up-sampling filter coefficients (4-tap)
Phase shift    Coefficients
0 {0,64,0,0},
1 {-2,62,4,0},
2 {-2,58,10,-2},
3 {-4,56,14,-2},
4 {-4,54,16,-2},
5 {-6,52,20,-2},
6 {-6,48,26,-4},
7 {-4,42,30,-4},
8 {-4,36,36,-4},
9 {-4,30,42,-4},
10 {-4,26,48,-6},
11 {-2,20,52,-6},
12 {-2,16,54,-4},
13 {-2,14,56,-4},
14 {-2,10,58,-2},
15 {0,4,62,-2}
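A minimal sketch of applying a default filter: pick the coefficient row for the 1/16 phase and filter a 1-D row of base layer samples (the 8-tap layout and rounding follow Table 5, where each coefficient row sums to 64; function and variable names are illustrative):

```python
# Filter a 1-D row of base-layer samples at integer position x with the
# Table 5 filter for the given 1/16 phase; luma_filters holds the 16
# coefficient rows. Each row sums to 64, hence the (+32) >> 6 rounding.
def upsample_luma(row, x, phase, luma_filters):
    taps = luma_filters[phase]
    acc = sum(c * row[x + i - 3] for i, c in enumerate(taps))
    return (acc + 32) >> 6

phase8 = [-1, 4, -11, 40, 40, -11, 4, -1]   # half-sample row of Table 5
row = [100, 100, 100, 100, 200, 200, 200, 200]
print(upsample_luma(row, 3, 8, {8: phase8}))  # 150, midway between 100 and 200
```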
Adaptive up-sampling filter (AUF)
In one embodiment, the filter coefficient used in increase sampling process can be transmitted in bit stream.Such as, can in SPS, PPS or section head Stratificational Grammar transmitting filter.When SPS level transmitting filter, the filter transmitted can be used to substitute default filter in whole sequence.When during transmitting filter, the filter transmitted being used to substitute corresponding picture or the default filter in cutting into slices in PPS or section head level.
In increase sampling filtering, only the several filters in filter set are used for the application of concrete spatial scalable.In one embodiment, the coefficient of these filters is only transmitted.Sampling mapping method described above can in order to derive the phase place participating in these filters increasing sampling process.
In an embodiment, the version simplified can be implemented.Such as, if the spatial scalability with ratio 2.0 and 1.5 is only supported in supposition, index can so be transmitted to indicate the space ratio between enhancement layer and basal layer in SPS.Whether grammer can be flag to indicate current EL is spatial scalability layer or SNR scalability layers, and another flag can distinguish space ratio 2.0 and 1.5.
A flag may be transmitted in the PPS or slice header to indicate whether the adaptive up-sampling filter is enabled in the corresponding picture or slice. When the flag is true, the filter coefficients of the phases involved are transmitted; otherwise, the default filters may be used in the up-sampling process. Alternatively, two flags may be transmitted separately for the horizontal and vertical directions to indicate whether the adaptive up-sampling filter is enabled in a particular direction.
When the adaptive up-sampling filter is enabled, filter length information N may be transmitted to indicate the filter length used in the up-sampling process.
For a filter of length N, the filter coefficients may be represented by coeff[i] (i = 0, ..., N-1). In certain embodiments, only N-1 coefficients may be transmitted for each filter; the remaining coefficient is not transmitted and may be derived at the decoder side. Its value equals (1 << filter_norm) minus the sum of the N-1 transmitted coefficients, where (1 << filter_norm) is the sum of all filter coefficients; typical values are 32, 64 and 128. As an example, the coefficient selected not to be transmitted may be coeff[(N+1)/2 - 1 + (phase+7)/16], on the assumption that it is the largest coefficient of the filter.
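The derivation of the untransmitted coefficient can be expressed as in the following sketch, which assumes the index formula and the (1 << filter_norm) normalization given above.

#include <cassert>
#include <vector>

// Restore the coefficient that was not transmitted: it equals
// (1 << filter_norm) minus the sum of the N-1 transmitted coefficients,
// and sits at index (N+1)/2 - 1 + (phase+7)/16 (assumed the largest tap).
std::vector<int> restoreFilter(const std::vector<int>& transmitted,
                               int N, int phase, int filter_norm) {
  assert(static_cast<int>(transmitted.size()) == N - 1);
  const int skip = (N + 1) / 2 - 1 + (phase + 7) / 16;
  int sum = 0;
  for (int c : transmitted) sum += c;
  std::vector<int> coeff;
  coeff.reserve(N);
  for (int i = 0, t = 0; i < N; ++i)
    coeff.push_back(i == skip ? (1 << filter_norm) - sum : transmitted[t++]);
  return coeff;
}
// E.g., N = 8, phase = 0, filter_norm = 6: skip = 3 and the restored value
// is 64 - 0 = 64, recovering Table 5's phase-0 filter {0,0,0,64,0,0,0,0}.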
The filter coefficients may be coded with some form of VLC. One example is to code the absolute value of each coefficient with an Exp-Golomb code; if a coefficient is non-zero, its sign is also coded. In addition, the filter coefficients may be predicted from the coefficients of the default filter, in which case only the difference between the coded filter coefficient and the default coefficient is VLC coded. The filter coefficients may also be predicted from previously coded filter coefficients. For example, when the filter coefficients of the horizontal direction are transmitted first, they may be used to predict the filter coefficients of the vertical direction.
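The following sketch shows one way the described VLC scheme could look: the difference from the default-filter coefficient is coded as an unsigned Exp-Golomb magnitude followed by a sign bit when the difference is non-zero. The BitWriter class is illustrative; only the Exp-Golomb structure follows the description above.

#include <cstdint>
#include <vector>

// Illustrative MSB-first bit collector; not an API of any real codec.
class BitWriter {
 public:
  void putBit(int b) { bits_.push_back(b & 1); }
  const std::vector<int>& bits() const { return bits_; }
 private:
  std::vector<int> bits_;
};

// Unsigned Exp-Golomb: codeNum -> len zeros, then the (len+1)-bit value
// of codeNum + 1, where len = floor(log2(codeNum + 1)).
void writeUE(BitWriter& bw, uint32_t codeNum) {
  uint32_t v = codeNum + 1;
  int len = 0;
  for (uint32_t t = v; t > 1; t >>= 1) ++len;
  for (int i = 0; i < len; ++i) bw.putBit(0);
  for (int i = len; i >= 0; --i) bw.putBit((v >> i) & 1);
}

// Code the difference from the default-filter coefficient as an
// Exp-Golomb magnitude plus a sign bit when the difference is non-zero.
void writeCoefficient(BitWriter& bw, int coeff, int defaultCoeff) {
  int diff = coeff - defaultCoeff;
  writeUE(bw, static_cast<uint32_t>(diff < 0 ? -diff : diff));
  if (diff != 0) bw.putBit(diff < 0 ? 1 : 0);
}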
The filter coefficients may be transmitted separately for the horizontal and vertical directions. Alternatively, one filter set may be transmitted and applied in both the horizontal and vertical directions. In addition, a flag may be transmitted to indicate whether the filters are shared between the horizontal and vertical directions.
When the phase place of filter is 0 or 8, suppose that filter is symmetrical.Symmetrical feature can in order to the only half of transmitting filter coefficient, and it means for the filter with length N and only transmits (N+1)/2-1 coefficient and derive all the other coefficients.Further, if filter is symmetrical, the filter so for phase place p1 can have same factor with the filter with phase place (16-phase1); And by overturning another filter to obtain a filter.Symmetrical feature can in order to the only half of transmitting filter coefficient, and it means, when the filter with phase place p1 all participates in increasing sampling process with the filter with phase place (16-phase1), only transmits one wherein.
As discussed above, for some applications the sample position mapping of the up-sampling process may not be optimal. In that case, an adaptive up-sampling filter may be used to accommodate the phase shift, and the symmetry property therefore no longer holds. The present invention proposes transmitting a flag to indicate whether the symmetry property applies, with the filter coefficients transmitted accordingly.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as a data storage medium, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.
By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field-programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (i.e., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.
Various examples have been described. These and other examples are within the scope of the following claims.

Claims (47)

1. A device configured to code video information, the device comprising:
a memory configured to store video data, the video data comprising a first layer of video information; and
a processor in communication with the memory, the processor configured to:
determine phase shift information of a second layer of video information relative to the first layer;
select an image filter set based at least in part on the phase shift information; and
produce a modified version of the first layer using the first layer and the identified image filter set.
2. The device according to claim 1, wherein:
the first layer comprises a base layer;
the second layer comprises an enhancement layer;
the selected image filter set comprises up-sampling image filters; and
the processor is further configured to receive a syntax element, extracted from an encoded video bitstream, signaling the phase shift information.
3. The device according to claim 1, wherein:
the first layer comprises an enhancement layer;
the second layer comprises a base layer;
the selected image filter set comprises down-sampling image filters; and
the processor is further configured to generate a syntax element for an encoded video bitstream to signal the phase shift information.
4. The device according to claim 1, wherein the phase shift information comprises a difference between a position of a pixel in the first layer and a corresponding position of the pixel in the second layer.
5. The device according to claim 1, wherein the phase shift information comprises a binary value indicating one of a zero phase relationship or a symmetric phase relationship between the first layer and the second layer.
6. The device according to claim 1, wherein the phase shift information comprises a first syntax element indicating horizontal phase shift information and a second syntax element indicating vertical phase shift information.
7. The device according to claim 6, wherein at least one of the first syntax element and the second syntax element comprises a non-binary value.
8. The device according to claim 1, wherein the processor is further configured to:
select a default image filter set if the phase shift information is not signaled in the bitstream; and
select an image filter set based on the phase shift information if the phase shift information is signaled in the bitstream.
9. The device according to claim 8, wherein the default image filter set is based at least in part on a symmetric phase relationship between the first layer and the second layer.
10. The device according to claim 8, wherein the default image filter set is based at least in part on a zero phase relationship between the first layer and the second layer.
11. The device according to claim 1, wherein the phase shift information comprises alignment information.
12. The device according to claim 11, wherein the alignment information is modeled as a function mapping an x pixel coordinate and a y pixel coordinate to a phase offset.
13. The device according to claim 1, wherein the selected image filter set comprises coefficients signaled as part of a bitstream.
14. The device according to claim 1, wherein the selected image filter set comprises coefficients {0,0,0,64,0,0,0,0} for a zero phase shift, coefficients {0,1,-3,63,4,-2,1,0} for a phase shift of one, coefficients {0,2,-6,61,9,-3,1,0} for a phase shift of two, coefficients {-1,3,-8,60,13,-4,1,0} for a phase shift of three, coefficients {-1,4,-10,58,17,-5,1,0} for a phase shift of four, coefficients {-1,4,-11,53,25,-8,3,-1} for a phase shift of five, coefficients {-1,4,-11,50,29,-9,3,-1} for a phase shift of six, coefficients {-1,4,-11,45,34,-10,4,-1} for a phase shift of seven, coefficients {-1,4,-11,40,40,-11,4,-1} for a phase shift of eight, coefficients {-1,4,-10,34,45,-11,4,-1} for a phase shift of nine, coefficients {-1,3,-9,29,50,-11,4,-1} for a phase shift of ten, coefficients {-1,3,-8,25,53,-11,4,-1} for a phase shift of eleven, coefficients {0,1,-5,17,58,-10,4,-1} for a phase shift of twelve, coefficients {0,1,-4,13,60,-8,3,-1} for a phase shift of thirteen, coefficients {0,1,-3,8,62,-6,2,0} for a phase shift of fourteen, and coefficients {0,1,-2,4,63,-3,1,0} for a phase shift of fifteen.
15. The device according to claim 1, wherein the selected image filter set comprises coefficients {0,64,0,0} for a zero phase shift, coefficients {-2,62,4,0} for a phase shift of one, coefficients {-2,58,10,-2} for a phase shift of two, coefficients {-4,56,14,-2} for a phase shift of three, coefficients {-4,54,16,-2} for a phase shift of four, coefficients {-6,52,20,-2} for a phase shift of five, coefficients {-6,48,26,-4} for a phase shift of six, coefficients {-4,42,30,-4} for a phase shift of seven, coefficients {-4,36,36,-4} for a phase shift of eight, coefficients {-4,30,42,-4} for a phase shift of nine, coefficients {-4,26,48,-6} for a phase shift of ten, coefficients {-2,20,52,-6} for a phase shift of eleven, coefficients {-2,16,54,-4} for a phase shift of twelve, coefficients {-2,14,56,-4} for a phase shift of thirteen, coefficients {-2,10,58,-2} for a phase shift of fourteen, and coefficients {0,4,62,-2} for a phase shift of fifteen.
16. The device according to claim 1, the device further comprising at least one of: a desktop computer, a notebook computer, a tablet computer, a set-top box, a telephone handset, a television, a camera, a display device, a digital media player, a video game console, and a video streaming device comprising the memory and the processor.
17. A method of decoding video information, the method comprising:
obtaining a base layer of video information;
receiving a syntax element extracted from an encoded video bitstream, the syntax element comprising phase shift information of the base layer relative to an enhancement layer of the video information;
selecting an image filter set based at least in part on the phase shift information; and
producing an up-sampled version of the enhancement layer using the base layer and the identified image filter set.
18. The method according to claim 17, wherein the phase shift information comprises a difference between a position of a pixel in the enhancement layer and a corresponding position of the pixel in the base layer.
19. The method according to claim 17, wherein the phase shift information comprises a binary value indicating one of a zero phase relationship or a symmetric phase relationship between the enhancement layer and the base layer.
20. The method according to claim 17, wherein the received syntax element comprises a first syntax element indicating horizontal phase shift information and a second syntax element indicating vertical phase shift information.
21. The method according to claim 20, wherein at least one of the first syntax element and the second syntax element comprises a non-binary value.
22. The method according to claim 17, further comprising:
selecting a default image filter set if the phase shift information is not signaled in the bitstream; and
selecting an image filter set based on the phase shift information if the phase shift information is signaled in the bitstream.
23. The method according to claim 22, wherein the default image filter set is based at least in part on a symmetric phase relationship between the enhancement layer and the base layer.
24. The method according to claim 22, wherein the default image filter set is based at least in part on a zero phase relationship between the enhancement layer and the base layer.
25. The method according to claim 17, wherein the phase shift information comprises alignment information.
26. The method according to claim 17, wherein the alignment information is modeled as a function mapping an x pixel coordinate and a y pixel coordinate to a phase offset.
27. The method according to claim 17, wherein the selected image filter set comprises coefficients signaled as part of a bitstream.
28. The method according to claim 17, wherein the selected image filter set comprises coefficients {0,0,0,64,0,0,0,0} for a zero phase shift, coefficients {0,1,-3,63,4,-2,1,0} for a phase shift of one, coefficients {0,2,-6,61,9,-3,1,0} for a phase shift of two, coefficients {-1,3,-8,60,13,-4,1,0} for a phase shift of three, coefficients {-1,4,-10,58,17,-5,1,0} for a phase shift of four, coefficients {-1,4,-11,53,25,-8,3,-1} for a phase shift of five, coefficients {-1,4,-11,50,29,-9,3,-1} for a phase shift of six, coefficients {-1,4,-11,45,34,-10,4,-1} for a phase shift of seven, coefficients {-1,4,-11,40,40,-11,4,-1} for a phase shift of eight, coefficients {-1,4,-10,34,45,-11,4,-1} for a phase shift of nine, coefficients {-1,3,-9,29,50,-11,4,-1} for a phase shift of ten, coefficients {-1,3,-8,25,53,-11,4,-1} for a phase shift of eleven, coefficients {0,1,-5,17,58,-10,4,-1} for a phase shift of twelve, coefficients {0,1,-4,13,60,-8,3,-1} for a phase shift of thirteen, coefficients {0,1,-3,8,62,-6,2,0} for a phase shift of fourteen, and coefficients {0,1,-2,4,63,-3,1,0} for a phase shift of fifteen.
29. The method according to claim 17, wherein the selected image filter set comprises coefficients {0,64,0,0} for a zero phase shift, coefficients {-2,62,4,0} for a phase shift of one, coefficients {-2,58,10,-2} for a phase shift of two, coefficients {-4,56,14,-2} for a phase shift of three, coefficients {-4,54,16,-2} for a phase shift of four, coefficients {-6,52,20,-2} for a phase shift of five, coefficients {-6,48,26,-4} for a phase shift of six, coefficients {-4,42,30,-4} for a phase shift of seven, coefficients {-4,36,36,-4} for a phase shift of eight, coefficients {-4,30,42,-4} for a phase shift of nine, coefficients {-4,26,48,-6} for a phase shift of ten, coefficients {-2,20,52,-6} for a phase shift of eleven, coefficients {-2,16,54,-4} for a phase shift of twelve, coefficients {-2,14,56,-4} for a phase shift of thirteen, coefficients {-2,10,58,-2} for a phase shift of fourteen, and coefficients {0,4,62,-2} for a phase shift of fifteen.
30. A method of encoding video information, the method comprising:
obtaining an enhancement layer of video information;
selecting a down-sampling image filter set;
producing a base layer using the enhancement layer and the selected image filter set; and
generating a syntax element comprising phase shift information of the base layer relative to the enhancement layer.
31. The method according to claim 30, wherein the phase shift information comprises a difference between a position of a pixel in the enhancement layer and a corresponding position of the pixel in the base layer.
32. The method according to claim 30, wherein the phase shift information comprises a binary value indicating one of a zero phase relationship or a symmetric phase relationship between the enhancement layer and the base layer.
33. The method according to claim 30, wherein the generated syntax element comprises a first syntax element indicating horizontal phase shift information and a second syntax element indicating vertical phase shift information.
34. The method according to claim 33, wherein at least one of the first syntax element and the second syntax element comprises a non-binary value.
35. The method according to claim 30, wherein the selected image filter set is a default image filter set based at least in part on a symmetric phase relationship between the enhancement layer and the base layer.
36. The method according to claim 30, wherein the selected image filter set is a default image filter set based at least in part on a zero phase relationship between the enhancement layer and the base layer.
37. The method according to claim 30, wherein the phase shift information comprises alignment information.
38. The method according to claim 30, wherein the alignment information is modeled as a function mapping an x pixel coordinate and a y pixel coordinate to a phase offset.
39. The method according to claim 30, wherein the selected image filter set comprises coefficients signaled as part of a bitstream.
40. The method according to claim 30, wherein the selected image filter set comprises coefficients {0,0,0,64,0,0,0,0} for a zero phase shift, coefficients {0,1,-3,63,4,-2,1,0} for a phase shift of one, coefficients {0,2,-6,61,9,-3,1,0} for a phase shift of two, coefficients {-1,3,-8,60,13,-4,1,0} for a phase shift of three, coefficients {-1,4,-10,58,17,-5,1,0} for a phase shift of four, coefficients {-1,4,-11,53,25,-8,3,-1} for a phase shift of five, coefficients {-1,4,-11,50,29,-9,3,-1} for a phase shift of six, coefficients {-1,4,-11,45,34,-10,4,-1} for a phase shift of seven, coefficients {-1,4,-11,40,40,-11,4,-1} for a phase shift of eight, coefficients {-1,4,-10,34,45,-11,4,-1} for a phase shift of nine, coefficients {-1,3,-9,29,50,-11,4,-1} for a phase shift of ten, coefficients {-1,3,-8,25,53,-11,4,-1} for a phase shift of eleven, coefficients {0,1,-5,17,58,-10,4,-1} for a phase shift of twelve, coefficients {0,1,-4,13,60,-8,3,-1} for a phase shift of thirteen, coefficients {0,1,-3,8,62,-6,2,0} for a phase shift of fourteen, and coefficients {0,1,-2,4,63,-3,1,0} for a phase shift of fifteen.
41. The method according to claim 30, wherein the selected image filter set comprises coefficients {0,64,0,0} for a zero phase shift, coefficients {-2,62,4,0} for a phase shift of one, coefficients {-2,58,10,-2} for a phase shift of two, coefficients {-4,56,14,-2} for a phase shift of three, coefficients {-4,54,16,-2} for a phase shift of four, coefficients {-6,52,20,-2} for a phase shift of five, coefficients {-6,48,26,-4} for a phase shift of six, coefficients {-4,42,30,-4} for a phase shift of seven, coefficients {-4,36,36,-4} for a phase shift of eight, coefficients {-4,30,42,-4} for a phase shift of nine, coefficients {-4,26,48,-6} for a phase shift of ten, coefficients {-2,20,52,-6} for a phase shift of eleven, coefficients {-2,16,54,-4} for a phase shift of twelve, coefficients {-2,14,56,-4} for a phase shift of thirteen, coefficients {-2,10,58,-2} for a phase shift of fourteen, and coefficients {0,4,62,-2} for a phase shift of fifteen.
42. An apparatus for coding a video bitstream, the apparatus comprising:
means for obtaining an enhancement layer of video information;
means for generating a syntax element comprising phase shift information of a base layer of video information relative to the enhancement layer;
means for selecting an image filter set based at least in part on the phase shift information;
means for producing a down-sampled version of the enhancement layer using the enhancement layer and the identified image filter set; and
means for storing the down-sampled version of the enhancement layer.
43. The apparatus according to claim 42, wherein the phase shift information comprises a difference between a position of a pixel in the enhancement layer and a corresponding position of the pixel in the base layer.
44. The apparatus according to claim 42, wherein the phase shift information comprises a binary value indicating one of a zero phase relationship or a symmetric phase relationship between the enhancement layer and the base layer.
45. A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to:
obtain a base layer of video information;
receive a syntax element extracted from an encoded video bitstream, the syntax element comprising phase shift information of the base layer relative to an enhancement layer of the video information;
select an image filter set based at least in part on the phase shift information; and
produce an up-sampled version of the enhancement layer using the enhancement layer and the identified image filter set.
46. The non-transitory computer-readable medium according to claim 45, wherein the phase shift information comprises a difference between a position of a pixel in the enhancement layer and a corresponding position of the pixel in the base layer.
47. The non-transitory computer-readable medium according to claim 45, wherein the phase shift information comprises a binary value indicating one of a zero phase relationship or a symmetric phase relationship between the enhancement layer and the base layer.
CN201380053388.5A 2012-09-04 2013-09-04 Signaling of down-sampling phase information in scalable video coding Active CN104718752B (en)

Applications Claiming Priority (9)

Application Number Priority Date Filing Date Title
US201261696722P 2012-09-04 2012-09-04
US61/696,722 2012-09-04
US201361808467P 2013-04-04 2013-04-04
US61/808,467 2013-04-04
US201361814243P 2013-04-20 2013-04-20
US61/814,243 2013-04-20
US14/017,169 US10448032B2 (en) 2012-09-04 2013-09-03 Signaling of down-sampling location information in scalable video coding
US14/017,169 2013-09-03
PCT/US2013/058050 WO2014039547A1 (en) 2012-09-04 2013-09-04 Signaling of down-sampling phase information in scalable video coding

Publications (2)

Publication Number Publication Date
CN104718752A true CN104718752A (en) 2015-06-17
CN104718752B CN104718752B (en) 2018-08-28

Family

ID=50187598

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201380053388.5A Active CN104718752B (en) Signaling of down-sampling phase information in scalable video coding

Country Status (5)

Country Link
US (1) US10448032B2 (en)
JP (2) JP6342402B2 (en)
KR (1) KR20150052248A (en)
CN (1) CN104718752B (en)
WO (1) WO2014039547A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108353173A (en) * 2015-11-02 2018-07-31 杜比实验室特许公司 Piecewise linearity inter-layer prediction device for high dynamic range video coding
CN110121871A (en) * 2016-09-30 2019-08-13 亚马逊技术有限公司 For the coding based on request of flowing content part
TWI681672B (en) * 2017-07-13 2020-01-01 大陸商華為技術有限公司 Method, apparatus and system for processing picture

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9686543B2 (en) 2011-06-15 2017-06-20 Electronics And Telecommunications Research Institute Method for coding and decoding scalable video and apparatus using same
US9794555B2 (en) * 2013-03-15 2017-10-17 Arris Enterprises Llc Adaptive sampling filter process for scalable video coding
US9503733B2 (en) 2013-04-10 2016-11-22 ARRIS Enterprises, LLC Re-sampling with phase offset adjustment for luma and chroma to signal adaptive filters in scalable video coding
US9510001B2 (en) 2013-07-09 2016-11-29 Electronics And Telecommunications Research Institute Video decoding method and apparatus using the same
US10341685B2 (en) 2014-01-03 2019-07-02 Arris Enterprises Llc Conditionally parsed extension syntax for HEVC extension processing
CN106464890A (en) * 2014-03-14 2017-02-22 三星电子株式会社 Scalable video encoding/decoding method and apparatus
US20150271495A1 (en) * 2014-03-18 2015-09-24 Arris Enterprises, Inc. Scalable Video Coding using Phase Offset Flag Signaling
EP3700218A1 (en) * 2014-03-18 2020-08-26 ARRIS Enterprises LLC Scalable video coding using reference and scaled reference layer offsets
CA2943121C (en) * 2014-03-18 2020-09-08 Arris Enterprises Llc Scalable video coding using reference and scaled reference layer offsets
WO2015168581A1 (en) 2014-05-01 2015-11-05 Arris Enterprises, Inc. Reference layer and scaled reference layer offsets for scalable video coding
MX368227B (en) 2014-05-30 2019-09-25 Arris Entpr Llc Reference layer offset parameters for inter-layer prediction in scalable video coding.
GB2552353B (en) * 2016-07-20 2022-04-20 V Nova Int Ltd Apparatuses, methods, computer programs and computer-readable media
JP2021005741A (en) * 2017-09-14 2021-01-14 シャープ株式会社 Image coding device and image decoding device
WO2019070770A1 (en) * 2017-10-02 2019-04-11 Arris Enterprises Llc System and method for reducing blocking artifacts and providing improved coding efficiency
EP3579553B1 (en) * 2018-06-05 2020-05-20 Axis AB A method, controller, and system for encoding a sequence of video frames
US10972744B2 (en) * 2018-11-12 2021-04-06 Analog Devices International Unlimited Company Image scaling
JP7321364B2 (en) 2019-09-14 2023-08-04 バイトダンス インコーポレイテッド Chroma quantization parameter in video coding
US11356707B2 (en) * 2019-09-23 2022-06-07 Qualcomm Incorporated Signaling filters for video processing
CN114651442A (en) 2019-10-09 2022-06-21 字节跳动有限公司 Cross-component adaptive loop filtering in video coding and decoding
JP7443509B2 (en) 2019-10-14 2024-03-05 バイトダンス インコーポレイテッド Using chroma quantization parameters in video coding
WO2021118977A1 (en) 2019-12-09 2021-06-17 Bytedance Inc. Using quantization groups in video coding
CN114902657A (en) 2019-12-31 2022-08-12 字节跳动有限公司 Adaptive color transform in video coding and decoding
US20230108639A1 (en) * 2020-03-04 2023-04-06 Intellectual Discovery Co., Ltd. Video coding method and device, and recording medium storing bitstream

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1722838A (en) * 2004-07-15 2006-01-18 三星电子株式会社 Use the scalable video coding method and apparatus of basal layer
WO2007080477A2 (en) * 2006-01-10 2007-07-19 Nokia Corporation Switched filter up-sampling mechanism for scalable video coding
CN101379511A (en) * 2005-03-18 2009-03-04 夏普株式会社 Methods and systems for extended spatial scalability with picture-level adaptation
CN101895748A (en) * 2010-06-21 2010-11-24 华为终端有限公司 Coding and decoding methods and coding and decoding devices

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20060105407A (en) * 2005-04-01 2006-10-11 엘지전자 주식회사 Method for scalably encoding and decoding video signal
US8175168B2 (en) 2005-03-18 2012-05-08 Sharp Laboratories Of America, Inc. Methods and systems for picture up-sampling
EP1886502A2 (en) * 2005-04-13 2008-02-13 Universität Hannover Method and apparatus for enhanced video coding
US8320460B2 (en) 2006-09-18 2012-11-27 Freescale, Semiconductor, Inc. Dyadic spatial re-sampling filters for inter-layer texture predictions in scalable image processing
WO2008056959A1 (en) 2006-11-09 2008-05-15 Lg Electronics Inc. Method and apparatus for decoding/encoding a video signal
US8054886B2 (en) 2007-02-21 2011-11-08 Microsoft Corporation Signaling and use of chroma sample positioning information
EP2048886A1 (en) * 2007-10-11 2009-04-15 Panasonic Corporation Coding of adaptive interpolation filter coefficients
JP2008103774A (en) 2008-01-18 2008-05-01 Opnext Japan Inc High frequency optical transmission module, and optical transmitter
KR101066117B1 (en) 2009-11-12 2011-09-20 전자부품연구원 Method and apparatus for scalable video coding
US9554149B2 (en) 2012-02-29 2017-01-24 Lg Electronics, Inc. Inter-layer prediction method and apparatus using same
CN104704831B (en) * 2012-08-06 2019-01-04 Vid拓展公司 The sampling grids information of space layer is used in multi-layer video coding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1722838A (en) * 2004-07-15 2006-01-18 三星电子株式会社 Use the scalable video coding method and apparatus of basal layer
CN101379511A (en) * 2005-03-18 2009-03-04 夏普株式会社 Methods and systems for extended spatial scalability with picture-level adaptation
CN102387366A (en) * 2005-03-18 2012-03-21 夏普株式会社 Methods and systems for extended spatial scalability with picture-level adaptation
WO2007080477A2 (en) * 2006-01-10 2007-07-19 Nokia Corporation Switched filter up-sampling mechanism for scalable video coding
CN101895748A (en) * 2010-06-21 2010-11-24 华为终端有限公司 Coding and decoding methods and coding and decoding devices

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIWEI GUO; JIANLE CHEN; ET AL.: "Signaling of Phase Offset in Up-sampling Process and Chroma Sampling Location", 《JOINT COLLABORATIVE TEAM ON VIDEO CODING (JCT-VC)》 *
SEUNG-WOOK PARK ET AL.: "Intra BL prediction considering phase shift", 《JOINT VIDEO TEAM (JVT) OF ISO/IEC MPEG & ITU-T VCEG (ISO/IEC JTC1/SC29/WG11 AND ITU-T SG16 Q.6) 15TH MEETING JVT-O023》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108353173A (en) * 2015-11-02 2018-07-31 杜比实验室特许公司 Piecewise linearity inter-layer prediction device for high dynamic range video coding
CN108353173B (en) * 2015-11-02 2020-06-05 杜比实验室特许公司 Piecewise-linear inter-layer predictor for high dynamic range video coding
CN110121871A (en) * 2016-09-30 2019-08-13 亚马逊技术有限公司 For the coding based on request of flowing content part
TWI681672B (en) * 2017-07-13 2020-01-01 大陸商華為技術有限公司 Method, apparatus and system for processing picture

Also Published As

Publication number Publication date
US10448032B2 (en) 2019-10-15
JP6342402B2 (en) 2018-06-13
CN104718752B (en) 2018-08-28
JP2018110412A (en) 2018-07-12
US20140064386A1 (en) 2014-03-06
KR20150052248A (en) 2015-05-13
WO2014039547A1 (en) 2014-03-13
JP2015527028A (en) 2015-09-10

Similar Documents

Publication Publication Date Title
CN104718752A (en) Signaling of down-sampling phase information in scalable video coding
CN105284113B (en) It is filtered between component
CN104396243B (en) Adaptively upper sampling filter for video compress
CN103959785B (en) Change of scale for chromatic component in video coding is split
CN104488267B (en) Method and apparatus for coded video
CN103190147B (en) For combined decoding method and the equipment of the syntactic element of video coding
CN104604224B (en) Transformation substrate adjustment in scalable video coding
CN104737537A (en) Weighted prediction mode for scalable video coding
CN104685875A (en) Intra-coding for 4:2:2 sample format in video coding
CN103563378A (en) Memory efficient context modeling
CN104471942A (en) Reusing Parameter Sets For Video Coding
CN104823449A (en) Signaling of regions of interest and gradual decoding refresh in video coding
CN105191311A (en) Parallel processing for video coding
CN105075258A (en) Inter-layer reference picture construction for spatial scalability with different aspect ratios
CN104412591A (en) Intra mode extensions for difference domain intra prediction
CN104429076B (en) For scalable video coding and the vague generalization residual prediction of 3D video codings
CN105144719A (en) Device and method for scalable and multiview/3D coding of video information using generalized residual prediction
CN105359532A (en) Intra motion compensation extensions
CN104221381A (en) Wavefront parallel processing for video coding
CN104247420A (en) Transform coefficient coding
CN104584550A (en) Intra prediction improvements for scalable video coding
CN104620576A (en) Alternative transform in scalable video coding
CN103748882A (en) Mvc Based 3dvc Codec Supporting Inside View Motion Prediction (Ivmp) Mode
CN106464917A (en) Signaling hrd parameters for bitstream partitions
CN104685887A (en) Signaling layer identifiers for operation points in video coding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant