CN103283227A - Systems and methods for adaptive video coding - Google Patents

Systems and methods for adaptive video coding

Info

Publication number
CN103283227A
CN103283227A · CN2011800628602A · CN201180062860A
Authority
CN
China
Prior art keywords
sampling
video
video data
value
ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011800628602A
Other languages
Chinese (zh)
Inventor
S. Doken
Zhifeng Chen
Jie Dong
Yan Ye
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vid Scale Inc
Original Assignee
Vid Scale Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vid Scale Inc
Publication of CN103283227A


Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 — Using adaptive coding
    • H04N 19/102 — Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/117 — Filters, e.g. for pre-processing or post-processing
    • H04N 19/132 — Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N 19/134 — Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/146 — Data rate or code amount at the encoder output
    • H04N 19/147 — Data rate or code amount at the encoder output according to rate-distortion criteria
    • H04N 19/172 — Adaptive coding where the coding unit is an image region, the region being a picture, frame or field
    • H04N 19/189 — Adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used
    • H04N 19/196 — Adaptation specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters
    • H04N 19/587 — Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H04N 19/59 — Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N 19/60 — Transform coding
    • H04N 19/85 — Pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Systems and methods are described for determining an optimized sampling ratio for coding video data that reduces the overall distortion introduced by the coding process. The approach seeks to balance the information loss introduced during down-sampling against the information loss introduced during coding. The sampling ratio is generally determined by reducing, or in some cases minimizing, the overall error introduced by the down-sampling process and the coding process, and may adapt to the content of the video data being processed and a target bit rate. Computation power can also be saved by coding a down-sampled video. The process derives a plurality of down-sampling ratios and selects a down-sampling ratio that reduces the total amount of distortion introduced during the down-sampling and coding stages. The down-sampling ratio may be selected given the available data transmission capacity, input video signal statistics, and/or other operational parameters, and may optimally reduce the overall distortion.

Description

Systems and methods for adaptive video coding
Cross-reference to related applications
This application claims the benefit of U.S. Provisional Application No. 61/407,329, filed October 27, 2010, the entire content of which is incorporated herein by reference.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, video gaming devices, video game consoles, cellular or satellite radio telephones, and the like. Many digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263 or ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), and extensions of such standards, to transmit and receive digital video information more efficiently. Although wireless communication technologies have significantly increased wireless bandwidth and improved quality of service for mobile device users, the rapidly growing demand for video content, such as high-definition (HD) video content, delivered over the mobile Internet presents new challenges for mobile video content providers, distributors, and carrier service providers.
Summary of the invention
According to one embodiment, a video encoding method includes receiving video data and determining a sampling error value at each of a plurality of down-sampling ratios. The method may also include determining, for a bit rate, a coding error value at each of the plurality of down-sampling ratios, and summing the sampling error value and the coding error value at each of the plurality of down-sampling ratios. The method may further include selecting a sampling ratio from the plurality of down-sampling ratios based on the sampling error value and the coding error value at the selected down-sampling ratio, down-sampling the video data at the selected sampling ratio, and encoding the down-sampled video data.
According to another embodiment, a video decoding method includes receiving compressed video data and receiving an indication of a selected sampling ratio, where the selected sampling ratio is based on a sum of sampling error values and coding error values at a plurality of sampling ratios. The method may also include decoding the compressed video data to form reconstructed video data, up-sampling the reconstructed video data at the selected sampling ratio to increase the resolution of the reconstructed video data, and outputting the filtered video data.
According to another embodiment, a video decoding system includes a video decoder. The video decoder may be configured to receive compressed video data and to receive an indication of a selected sampling ratio, where the selected sampling ratio is based on a sum of sampling error values and coding error values at a plurality of sampling ratios. The video decoder may also be configured to decode the compressed video data to form reconstructed video data, up-sample the reconstructed video data to increase the resolution of the reconstructed video data, and output the up-sampled video data.
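To make the summarized flow concrete, the following minimal Python sketch (not part of the original disclosure) illustrates the encoder-side selection loop; sampling_error, coding_error, downsample, and encode are hypothetical callables standing in for the estimators and codec stages developed later in the description.

    def adaptive_encode(video, ratios, bitrate,
                        sampling_error, coding_error, downsample, encode):
        """Score each candidate down-sampling ratio by its summed sampling
        and coding error, pick the best, then down-sample and encode."""
        totals = {M: sampling_error(video, M) + coding_error(video, M, bitrate)
                  for M in ratios}
        M_sel = min(totals, key=totals.get)  # ratio with the lowest total error
        bitstream = encode(downsample(video, M_sel), bitrate)
        return bitstream, M_sel              # M_sel is signaled to the decoder

The decoder mirrors this flow: it decodes the bitstream and up-samples the reconstructed video by the signaled ratio.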
Description of drawings
The invention may be understood in more detail from the following description, given by way of example, which may be understood in conjunction with the accompanying drawings, in which:
Fig. 1 is a block diagram depicting an example video encoding and decoding system that may utilize the adaptive coding techniques described herein;
Fig. 2 is a block diagram depicting an example video encoder that may implement techniques for adaptive encoding of a video signal;
Fig. 3 is a block diagram depicting an example video decoder that may implement techniques for adaptive decoding of a video signal;
Fig. 4 shows a coding scheme in which a codec is applied directly to the input video;
Fig. 5 shows an example embodiment of coding with down-sampling and up-sampling stages;
Figs. 6A and 6B show the processing described in Fig. 5 decomposed into a sampling portion and a coding portion, respectively;
Fig. 7 is a look-up table for α according to one non-limiting embodiment;
Fig. 8 is a look-up table for β according to one non-limiting embodiment;
Figs. 9A, 9B and 9C depict search strategies for finding the sampling ratio M according to various non-limiting embodiments;
Figs. 10A and 10B are flow charts according to one non-limiting embodiment;
Fig. 11 is a block diagram of a horizontal down-sampling process with a given down-sampling ratio, according to one non-limiting embodiment;
Fig. 12 depicts an example down-sampling process;
Fig. 13 depicts an example up-sampling process;
Fig. 14 depicts an example Gaussian window function;
Fig. 15 depicts pixels during an example up-sampling process;
Fig. 16 depicts an example encoder architecture according to one non-limiting embodiment;
Fig. 17 depicts an example decoder architecture according to one non-limiting embodiment;
Fig. 18 depicts an example embodiment of video data pre-processing in connection with a transcoder;
Fig. 19A is a system diagram of an example communications system in which one or more disclosed embodiments may be implemented;
Fig. 19B is a system diagram of an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in Fig. 19A; and
Figs. 19C, 19D and 19E are system diagrams of example wireless transmit/receive units (WTRUs) that may be used within the communications system illustrated in Fig. 19A.
Detailed description
Multimedia technology and mobile communications have experienced dramatic growth and commercial success in recent years. Wireless communication technology has dramatically increased wireless bandwidth and improved quality of service for mobile users. For example, the Third Generation Partnership Project (3GPP) Long Term Evolution (LTE) standard has improved quality of service compared with second-generation (2G) and/or third-generation (3G) systems. Despite these significant improvements in wireless communication technology, the rapidly growing demand for video content, such as high-definition (HD) video content, delivered over the mobile Internet presents new challenges for mobile video content providers, distributors, and carrier service providers.
The video and multimedia content available on the wired web has driven users to expect on-demand access to equivalent content from their mobile devices. A growing percentage of the world's mobile data traffic is becoming video content. Mobile video currently has the highest growth rate of any application category measured within the Cisco VNI mobile data forecast.
As demand for video content grows, the amount of data needed to satisfy that demand grows as well. Under current compression standards such as H.264 (AVC), the block size used to process video content is 16x16. Current compression standards therefore work well for low-resolution video content, but are less efficient for higher-quality and/or higher-resolution video content, such as HD video content. Driven by the demand for high-quality and/or high-resolution video content and by the availability of more advanced compression techniques, new video coding standards may be created that further reduce the data rate required for high-quality video coding compared with current standards such as AVC. For example, groups such as the Joint Collaborative Team on Video Coding (JCT-VC), formed by the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), have been created to develop improved video coding standards.
However, based on experience with previous video standards development, a new video standard can be expected to require a long period of study, development, and deployment, and thus cannot quickly satisfy the large anticipated demand for transmitting high-quality and/or high-resolution video content over the mobile Internet. Accordingly, systems and methods are needed to meet the increasing demand for high-quality and/or high-resolution video content delivery over the mobile Internet. For example, such systems and methods may provide high-quality and/or high-resolution video content that interoperates with currently deployed systems, for example HD video content compatible with the AVC video compression standard.
Fig. 1 is a block diagram depicting an example video encoding and decoding system 10 that may utilize the adaptive coding techniques described herein. As shown in Fig. 1, system 10 includes a source device 12 that transmits encoded video over a communication channel 16 to a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices. In some cases, source device 12 and destination device 14 may comprise wireless transmit/receive units (WTRUs), such as wireless handsets or any wireless devices that can communicate video information over a communication channel 16, in which case communication channel 16 is wireless. The systems and methods described herein, however, are not necessarily limited to wireless applications or settings. For example, these techniques may apply to over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet video transmissions, digital video encoded onto a storage medium, or other scenarios. Accordingly, communication channel 16 may comprise any combination of wireless or wired media suitable for transmission of encoded video data.
In the example of Fig. 1, source device 12 includes a video source 18, a video encoder 20, a modulator (or modem) 22, and a transmitter 24. Destination device 14 includes a receiver 26, a demodulator (or modem) 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the adaptive coding techniques described in more detail below. In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device rather than including an integrated display device. In other embodiments, the data stream generated by the video encoder may be conveyed to other devices without modulating the data onto a carrier signal, such as by direct digital transfer, where the other devices may or may not modulate the data for transmission.
System 10 of Fig. 1 is merely one example. The techniques described herein may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoder, the techniques may also be performed by a video encoder/decoder, typically referred to as a codec (CODEC). Moreover, the techniques of this disclosure may also be performed by a video preprocessor. Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between devices 12 and 14, e.g., for video streaming, video playback, video broadcasting, or video telephony. In some embodiments, the source device may be a video streaming server for generating encoded video data for one or more destination devices, where the destination devices may communicate with the source device over wired and/or wireless communication systems.
Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider. As a further alternative, video source 18 may generate computer-graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be modulated by modem 22 according to a communication standard and transmitted to destination device 14 via transmitter 24. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas.
Receiver 26 of destination device 14 receives information over channel 16, and modem 28 demodulates the information. Again, the video decoding process may implement one or more of the techniques described herein. The information communicated over channel 16 may include syntax information defined by video encoder 20 and used by video decoder 30, including syntax elements that describe characteristics and/or processing of macroblocks and other coded units (e.g., GOPs). Display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.
In the example of Fig. 1, communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media. Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the ITU-T H.264 standard (alternatively referred to as MPEG-4, Part 10), Advanced Video Coding (AVC). The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263. Although not shown in Fig. 1, in some aspects video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, the MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or to other protocols such as the user datagram protocol (UDP).
The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264, Advanced Video Coding for generic audiovisual services, by the ITU-T Study Group, dated March 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC.
Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, media-aware network element, or the like.
A video sequence typically includes a series of video frames. A group of pictures (GOP) generally comprises a series of one or more video frames. A GOP may include syntax data in a header of the GOP, in a header of one or more frames of the GOP, or elsewhere, that describes a number of frames included in the GOP. Each frame may include frame syntax data that describes an encoding mode for the respective frame. Video encoder 20 typically operates on video blocks within individual video frames in order to encode the video data. A video block may correspond to a macroblock, a partition of a macroblock, or a collection of blocks or macroblocks. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard. Each video frame may include a plurality of slices. Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks.
A number of popular video coding standards, such as H.263, MPEG-2, MPEG-4, H.264/AVC (Advanced Video Coding), and HEVC (High Efficiency Video Coding), utilize motion-compensated prediction techniques. An image or video frame may be divided into macroblocks, and each macroblock may be further partitioned. Macroblocks in an I-frame are coded using prediction from their spatial neighbors (i.e., other blocks of the I-frame). Macroblocks in a P- or B-frame may be coded using prediction from regions of their spatial neighbors (spatial prediction, or intra-mode coding) or from regions of other frames (temporal prediction, or inter-mode coding). Video coding standards define syntax elements to represent coding information. For example, for each macroblock, H.264 defines an mb_type value, which indicates how the macroblock is partitioned and the prediction method used (spatial or temporal).
Video encoder 20 may provide an independent motion vector for each partition of a macroblock. For example, if video encoder 20 elects to use a whole macroblock as a single partition, video encoder 20 may provide one motion vector for the macroblock. As another example, if video encoder 20 elects to partition a 16x16-pixel macroblock into four 8x8 partitions, video encoder 20 may provide four motion vectors, one for each partition. For each partition (or sub-macroblock unit), video encoder 20 may provide an mvd (motion vector difference) value and a ref_idx value to represent the motion vector information. The mvd value may represent the encoded motion vector for the partition relative to a motion predictor. The ref_idx (reference index) value may represent an index into a list of potential reference pictures for the reference frame. As an example, H.264 provides two lists of reference pictures: list 0 and list 1. The ref_idx value may identify a picture in one of the two lists. Video encoder 20 may also provide information indicating the list with which the ref_idx value is associated.
As an example, the ITU-T H.264 standard supports intra prediction in various partition sizes, such as 16x16, 8x8, or 4x4 for luma components and 8x8 for chroma components, as well as inter prediction in various block sizes, such as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 for luma components and corresponding scaled sizes for chroma components. In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of a block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block will have 16 pixels in a vertical direction (y=16) and 16 pixels in a horizontal direction (x=16). Likewise, an NxN block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise NxM pixels, where M is not necessarily equal to N.
Block sizes that are less than 16x16 may be referred to as partitions of a 16x16 macroblock. Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., following application of a transform such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform to residual video block data representing pixel differences between a coded video block and a predictive video block. In some cases, a video block may comprise blocks of quantized transform coefficients in the transform domain.
Smaller video blocks can provide better prediction and less residual error, and may be used for locations of a video frame that include high levels of detail. In general, macroblocks and the various partitions (sometimes referred to as sub-blocks) may be considered video blocks. In addition, a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of a video frame. Alternatively, frames themselves may be decodable units, or other portions of a frame may be defined as decodable units. The term "coded unit" or "coding unit" may refer to any independently decodable unit of a video frame, such as an entire frame, a slice of a frame, a group of pictures (GOP, also referred to as a sequence), or another independently decodable unit defined according to applicable coding techniques.
The H.264 standard supports motion vectors with one-quarter-pixel precision. That is, encoders, decoders, and encoder/decoders (CODECs) supporting H.264 may use motion vectors that point to full-pixel positions or fractional-pixel positions. The values for fractional-pixel positions may be determined using adaptive interpolation filters or fixed interpolation filters. In some examples, H.264-compliant devices may use filters to calculate values for half-pixel positions, and then use bilinear filters to determine values for the remaining quarter-pixel positions. Adaptive interpolation filters may be used to adaptively define interpolation filter coefficients during the encoding process, and thus the filter coefficients may change over time when adaptive interpolation filtering is performed.
Following intra-predictive or inter-predictive coding to produce predictive data and residual data, and following any transforms (such as the 4x4 or 8x8 integer transform used in H.264/AVC, or a discrete cosine transform (DCT)) to produce transform coefficients, quantization of the transform coefficients may be performed. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
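As a small illustration of the "round down" just described (a hypothetical sketch, not drawn from the patent text), reducing an n-bit value to m bits can be realized by discarding the n - m least significant bits:

    def reduce_bit_depth(value: int, n: int, m: int) -> int:
        """Round an n-bit nonnegative value down to m bits (n > m) by
        discarding the (n - m) least significant bits."""
        assert 0 <= value < (1 << n) and n > m
        return value >> (n - m)

    # e.g., a 10-bit coefficient 1023 rounded down to 8 bits gives 255
    print(reduce_bit_depth(1023, n=10, m=8))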
Following quantization, entropy coding of the quantized data may be performed, e.g., according to content adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding methodology. A processing unit configured for entropy coding, or another processing unit, may perform other processing functions, such as zero run-length coding of the quantized coefficients and/or generation of syntax information, such as coded block pattern (CBP) values, macroblock type, coding mode, maximum macroblock size for a coded unit (e.g., a frame, slice, macroblock, or sequence), or the like.
Video encoder 20 may further send syntax data, such as block-based syntax data, frame-based syntax data, slice-based syntax data, and/or GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data may describe a number of frames in the respective GOP, and the frame syntax data may indicate the encoding/prediction mode used to encode the corresponding frame.
Video decoder 30 may receive a bitstream including motion vectors encoded according to any of the techniques of this disclosure. Accordingly, video decoder 30 may be configured to parse the encoded motion vectors. For example, video decoder 30 may first parse a sequence parameter set or slice parameter set to determine whether the encoded motion vectors are encoded using a method in which all motion vectors are kept at one motion resolution, or using a method in which motion predictors are quantized to the motion vector resolution. Video decoder 30 may then decode the motion vectors relative to the motion predictors by determining a motion predictor and adding the encoded motion vector value to the motion predictor.
Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, as applicable, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC). An apparatus including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, and/or a wireless communication device, such as a cellular telephone.
Fig. 2 is a block diagram depicting an example video encoder 200 that implements techniques for adaptive encoding of a video signal. Video encoder 200 may perform intra- and inter-coding of blocks within video frames, including macroblocks, or partitions or sub-partitions of macroblocks. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence. Intra-mode (I-mode) may refer to any of several spatial-based compression modes, and inter-modes, such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode), may refer to any of several temporal-based compression modes. Although components for inter-mode encoding are depicted in Fig. 2, it should be understood that video encoder 200 may further include components for intra-mode encoding. However, such components are not illustrated for the sake of brevity and clarity.
The input video signal 202 is processed block by block. The video block unit may be 16 pixels by 16 pixels (i.e., a macroblock (MB)). Currently, the Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T/SG16/Q.6/VCEG and ISO/IEC/MPEG is developing the next-generation video coding standard called High Efficiency Video Coding (HEVC). In HEVC, extended block sizes (called "coding units" or CUs) are used to compress high-resolution (1080p and beyond) video signals more efficiently. In HEVC, a CU can be up to 64x64 pixels and as small as 4x4 pixels. A CU can be further partitioned into prediction units (PUs), for which separate prediction methods may be applied. Each input video block (MB, CU, PU, etc.) may be processed using spatial prediction unit 260 and/or temporal prediction unit 262.
Spatial prediction (i.e., intra prediction) uses pixels from already-coded neighboring blocks in the same video picture/slice to predict the current video block. Spatial prediction reduces spatial redundancy inherent in the video signal. Temporal prediction (i.e., inter prediction or motion-compensated prediction) uses pixels from already-coded video pictures to predict the current video block. Temporal prediction reduces temporal redundancy inherent in the video signal. A temporal prediction for a given video block is usually signaled by one or more motion vectors, which indicate the amount and the direction of motion between the current block and one or more of its reference blocks.
If multiple reference pictures are supported (as is the case for recent video coding standards such as H.264/AVC or HEVC), then for each video block its reference picture index is also sent. The reference index is used to identify from which reference picture in the reference picture store 264 the temporal prediction signal comes. After spatial and/or temporal prediction, the mode decision and encoder controller 280 in the encoder chooses the prediction mode, for example according to a rate-distortion optimization method. The prediction block is then subtracted from the current video block at adder 216, and the prediction residual is transformed by transform unit 204 and quantized by quantization unit 206. The quantized residual coefficients are inverse quantized at inverse quantization unit 210 and inverse transformed at inverse transform unit 212 to form the reconstructed residual. The reconstructed residual is then added to the prediction block at adder 226 to form the reconstructed video block. Further in-loop filtering, such as a deblocking filter and adaptive loop filter 266, may be applied to the reconstructed video block before it is placed in the reference picture store 264 and used to code future video blocks. To form the output video bitstream 220, the coding mode (inter or intra), prediction mode information, motion information, and quantized residual coefficients are sent to the entropy coding unit 208 to be further compressed and packed to form the bitstream 220. As described in greater detail below, the systems and methods described herein may be implemented, at least in part, within the spatial prediction unit 260.
Fig. 3 is a block diagram of a block-based video decoder according to one non-limiting embodiment. The video bitstream 302 is first unpacked and entropy decoded at entropy decoding unit 308. The coding mode and prediction information are sent to either the spatial prediction unit 360 (if intra coded) or the temporal prediction unit 362 (if inter coded) to form the prediction block. The residual transform coefficients are sent to inverse quantization unit 310 and inverse transform unit 312 to reconstruct the residual block. The prediction block and the residual block are then added together at 326. The reconstructed block may further go through in-loop filtering unit 366 before it is stored in reference picture store 364. The reconstructed video 320 is then sent out to drive a display device, and is also used to predict future video blocks.
According to one embodiment, a pre-processing and/or post-processing system architecture may compress original video data and/or transcode video data that has already been encoded, such as a bitstream that is further compressed by jointly controlling transform-domain quantization and spatial-domain down-sampling, without changing the standard format of the video stream. The pre-processing and/or post-processing system architecture may encode and/or decode video data in any format, such as H.263, MPEG-2, Flash, MPEG-4, H.264/AVC, HEVC, or any similar multimedia format. As described above, these and similar formats may use video compression methods such as the discrete cosine transform (DCT), fractal compression, matching pursuit, or the discrete wavelet transform (DWT).
A limitation of various existing compression standards, such as H.264/AVC, is the specified macroblock (MB) size, such as 16x16. Within an MB, pixels may be partitioned into several block sizes according to the prediction mode. The maximum size of any block may be 16x16, and any two MBs are transformed and quantized independently. This technique can provide very high efficiency for CIF/QCIF and other similar-resolution content. However, it is less efficient for higher-resolution video content, such as 720p, 1080i/1080p, and/or similar or even higher resolutions. This may be because pixels in a local region exhibit higher correlation. The specified 16x16 MB size can thus limit further compression that would exploit correlated information in neighboring MBs.
High-resolution content coded with a small MB size may incur unnecessary overhead. For example, in an H.264 bitstream, the codec elements may include four types of information: 1) motion information, such as motion vectors and reference frame indices; 2) residual data; 3) MB header information, such as MB type, coded block pattern, and/or quantization parameter (QP); and 4) syntax elements of the sequence layer, picture layer, and/or slice layer. While motion information and residual data may be highly content dependent, the MB header information and/or syntax elements may be relatively constant. The MB header information and/or syntax elements may thus represent overhead in the bitstream. Given the content and/or coding profile, a higher compression ratio may be achieved by the encoder by reducing the bit rate of the residual data. For example, a higher compression ratio for an H.264 encoder may be achieved by reducing the bit rate of the residual data. The higher the compression ratio, the higher the percentage of overhead. Thus, in high-resolution and/or low-bit-rate applications, overhead may consume the majority of the bitstream available for transmission and storage. A bitstream largely consumed by overhead can make an encoder, for example an H.264 encoder, inefficient.
According to the systems and methods described herein, pre-processing and/or post-processing may result in less overhead, alignment of pixel motion-compensation precision and reconstruction precision, enhanced residual precision, and/or lower complexity and/or lower memory requirements. Because the number of MBs is reduced according to the down-sampling ratio, the down-sampling performed in pre-processing can produce less overhead. Accordingly, the nearly constant MB header and/or slice-layer syntax elements may be simplified.
The precision of motion compensation and the reconstruction precision may also be aligned by pre-processing and/or post-processing of the video data. In a down-sampled frame, the number of motion vector differences (MVDs) can be reduced. According to one embodiment, the reduction of MVDs can save bits used for coding motion information. In one embodiment, the saved bits can be used to code the prediction error in low-bit-rate scenarios. Thus, reconstruction precision may be improved by aligning the precision of motion compensation with the precision of the quantized prediction error.
Pre-processing and/or post-processing of video data may also enhance residual precision. For example, in a down-sampled frame, the same transform block size may correspond to a larger transform block size in the original frame. According to one example, an 8x8 transform block size may correspond to a 16x16 transform block size at a 1/4 down-sampling ratio. Because the quantization step is the same for the transform coefficients in an encoder such as an H.264 encoder, the encoder may lose information in both the high- and low-frequency components. Thus, pre-processing and/or post-processing of video data as described herein may preserve the low-frequency components with higher precision than conventional codecs in high-resolution, low-rate coding scenarios, which may produce better subjective quality. An up-sampling process at the decoder can then be used to interpolate pixels to recover the original frame.
Pre-processing and/or post-processing of video data may also result in lower complexity and/or memory requirements. Because the number of pixels used for coding after down-sampling is reduced according to the down-sampling ratio, the complexity and/or memory requirements of encoding (or transcoding) can be reduced to a corresponding level. Correspondingly, the complexity and/or memory requirements of decoding can also be reduced to a corresponding level. Such encoding and/or decoding processes may enable low-resolution encoder and/or decoder applications, such as encoding in mobile phones and other resource-limited devices. According to example embodiments, such encoding and/or decoding processes may enable the combination and/or application of H.264 encoders and/or decoders in mobile phones.
To address the limitations of conventional codecs in high-resolution and/or low-bit-rate applications, the systems and methods described herein can achieve further compression by controlling transform-domain quantization and spatial-domain down-sampling, independently and/or jointly. The quantization and down-sampling may be performed with acceptable subjective quality. Fig. 4 shows a coding scheme in which a codec (e.g., an H.264/AVC codec) is applied directly to the input video. Fig. 5 shows an example embodiment of coding with down-sampling and up-sampling stages. Compared with the approach shown in Fig. 4, the approach described in Fig. 5 can allocate more bits to coding the intra- and inter-prediction errors at the coding stage; it can therefore obtain a better reconstruction with higher visual quality. Although down-sampling introduces information loss (particularly in the high-frequency components), when the operating bit rate is low due to network limitations, the better reconstruction at the coding stage can outweigh the loss of detail in the down-sampling process, and therefore provide better overall visual quality. Additionally, by coding a smaller (i.e., down-sampled) video, computation power can be saved. However, because the information loss due to down-sampling occurs before the coding process, if the original video is excessively down-sampled, the information loss introduced up front can exceed the benefit of higher fidelity at the coding stage. Thus, the systems and methods described herein generally seek a balance between the information loss introduced during down-sampling and the information loss introduced during coding. Specifically, the processes described herein can derive a plurality of down-sampling ratios and select a down-sampling ratio that reduces the total amount of distortion introduced during the down-sampling and coding stages. The down-sampling ratio may be selected given the available data transmission capacity, input video signal statistics, and/or other operational parameters. In some embodiments, the selected down-sampling ratio may be the optimal down-sampling ratio that minimizes the overall distortion.
The flexibility provided by the filters described herein is more useful than that of other filters, such as anti-aliasing filters that provide only 2x2 down-sampling and up-sampling. At high bit rates, such as 512 kbit/s for CIF, even a 2x2 down-sampling ratio is too high, such that the high-frequency components are severely lost and cannot be adequately compensated even with lossless coding. Thus, at high bit rates, the sampling ratio may be adjusted to provide a balance between resolution reduction and detail preservation.
Referring now to Fig. 5, the down-sampling ratio, denoted M, is a variable that may be determined as a function of various parameters, such as the available data transmission capacity, the quality of service class identifier (QCI) of the bearer associated with the video, and input video signal characteristics. For example, if the data transmission capacity is relatively abundant for the input video signal, then the H.264/AVC encoder has enough bits to encode the prediction error; in this case, the value of M can be set close to 1.0. Otherwise, if the data transmission capacity is identified as insufficient for the input signal, then a larger value of M may be selected (producing more down-sampling), because the information loss due to the down-sampling process can be well compensated by the smaller coding error at the coding stage. Because the data transmission capacity is usually expressed as a bit rate with fine granularity, in various embodiments the value of M can be very flexible. As described in more detail below, the described systems and methods may determine the selected sampling ratio M based at least in part on the available data transmission capacity and the input video signal. Given the selected sampling ratio M, special filters may be considered to down-sample the video for coding and to up-sample the decoded video for display. Various techniques for designing anti-aliasing filters for any reasonable value of the sampling ratio are also described in more detail below with reference to Figs. 11-15.
Referring again to Fig. 4 and Fig. 5, the video input is denoted $f$, the output of the conventional codec is denoted $f_1$, and the output of an example codec according to the described systems and methods is denoted $f_2$. The reconstruction error of the codec in Fig. 4 can be defined as equation (1):

$\sigma_1^2 = E[(f - f_1)^2]$    (1)

The reconstruction error of the codec in Fig. 5 can be defined as equation (2):

$\sigma_2^2 = E[(f - f_2)^2]$    (2)

Thus, if $\sigma_2^2$ is less than $\sigma_1^2$, the codec in Fig. 5 performs better than the codec in Fig. 4. According to the systems and methods described herein, the difference between $\sigma_1^2$ and $\sigma_2^2$ can be increased (and in some cases maximized) by finding $M$, as shown in equation (3):

$M = \arg\max_M (\sigma_1^2 - \sigma_2^2)$    (3)

Because $\sigma_1^2$ is constant for a given target bit rate, in some embodiments equation (3) can be simplified and expressed as equation (4):

$M = \arg\min_M \sigma_2^2$    (4)

Therefore, according to the systems and methods described herein, for a given bit rate, a sampling ratio $M$ can be determined such that the reconstruction error ($\sigma_2^2$) of the codec shown in Fig. 5 is reduced. In some embodiments, the sampling ratio $M$ can be determined so as to yield a reconstruction error at or near the minimum (or at least substantially near the minimum). In some embodiments, the sampling ratio $M$ can be selected from a predetermined set of sampling ratios, where the selected ratio $M$ provides the smallest reconstruction error among the predetermined set.
In some embodiments, $M$ is a scalar, such that the horizontal and vertical directions have the same ratio. Given a video of resolution $W \times H$, the down-sampled video resolution is $(W/M) \times (H/M)$.
For some embodiments using decoders that support non-square sampling (i.e., a sample aspect ratio (SAR) not equal to 1:1) and that can interpolate the down-sampled video to full resolution with the appropriate picture aspect ratio (PAR), the horizontal and vertical ratios may differ. In this case, $M = [M_h, M_v]$ is a vector, where $M_h$ and $M_v$ denote the sampling ratios in the horizontal and vertical directions, respectively. Thus, although some example embodiments are described in a scalar setting, the invention is not so limited. Rather, some embodiments may utilize a coding process in which unequal ratios are applied in each direction.
To simplify the description, the processing shown in Fig. 5 can be decomposed into a sampling portion (Fig. 6A) and a coding portion (Fig. 6B). Referring to the sampling portion shown in Fig. 6A, for an input original video sequence f, down-sampling by the factor M (602) followed immediately by up-sampling by the factor M (608) generates f_3. Thus, the error between f and f_3 is produced solely by sampling; it can be referred to as the "down-sampling error" and denoted σ_d^2, where σ_d^2 can be defined by equation (5):
σ_d^2 = E[(f − f_3)^2]    (5)
Referring to the coding portion shown in Fig. 6B, the input is the down-sampled video d_1, and d_1 is encoded by the encoder 612 and decoded by the decoder 614 to obtain the reconstructed signal d_2, where d_2 is a degraded version of d_1. The error between d_1 and d_2 is produced solely by coding; it can be referred to as the "coding error" and denoted σ_c^2, where σ_c^2 can be defined by equation (6):
σ_c^2 = E[(d_1 − d_2)^2]    (6)
The relationship between σ_2^2 (equation (2)) and σ_d^2 and σ_c^2 can thus be defined by equation (7):
σ_2^2 = μσ_d^2 + σ_c^2    (7)
Therefore, the optimization problem in (4) can be restated as equation (8):
M = argmin_M (μσ_d^2 + σ_c^2)    (8)
In equations (7) and (8), μ is a weighting factor in the range [0, 1]. For simplicity, and without loss of generality, the weighting factor μ is set to 1 for the illustrative embodiments described herein.
Estimating the Sampling Error
During sampling, before f is down-sampled, f can be filtered by an anti-aliasing filter, which is a type of low-pass filter. Additional details of example filters are described below with reference to Figures 11-15. The output of the sampling stage (Fig. 6A), denoted f_3, is a blurred version of f, because f_3 no longer has energy in components at frequencies above the cutoff frequency of the anti-aliasing filter applied to f. Therefore, in some embodiments, the sampling error can be measured in the frequency domain by measuring the energy of the high-frequency components that are present in f but lost in f_3. According to various embodiments, as described in more detail below, the energy distribution of f can be modeled based on the true power spectral density (PSD) or on an estimated PSD. Alternatively, other techniques can be used to assess the effect of the sampling ratio on the frequency content of the video signal.
Data-Based PSD Estimation of f
Given a wide-sense stationary (WSS) random field with autocorrelation R(τ_h, τ_v), the PSD S_xx(ω_1, ω_2) can be computed by the 2-D discrete-time Fourier transform (DTFT) in equation (9):

S_xx(ω_1, ω_2) = Σ_{τ_h=−∞}^{∞} Σ_{τ_v=−∞}^{∞} R(τ_h, τ_v) e^{−jω_1 τ_h − jω_2 τ_v}    (9)
In practice, R(τ_h, τ_v) is estimated from a collection of video signals. Applying the 2-D DTFT to the estimated R(τ_h, τ_v) produces an estimated PSD, which is no longer a consistent estimate. According to various embodiments, the PSD is estimated by the periodogram of the random field, given by equation (10):
Ŝ_xx(ω_1, ω_2) = (1/(WH)) |X(ω_1, ω_2)|^2 = (1/(WH)) |Σ_{w=0}^{W−1} Σ_{h=0}^{H−1} x[w,h] e^{−jω_1 w − jω_2 h}|^2    (10)
where W and H denote the width and height of the video sequence. The factor 1/(WH) can be used to ensure that the total energy in the frequency domain equals the total energy in the spatial domain, as shown in equation (11):

∫_{−π}^{π} ∫_{−π}^{π} Ŝ_xx(ω_1, ω_2) dω_1 dω_2 = Σ_{w=0}^{W−1} Σ_{h=0}^{H−1} |x[w,h]|^2    (11)
According to the systems and methods described herein, when a given video sequence f is available, meaning the input is a deterministic 2-D signal rather than a WSS random field, Ŝ_xx(ω_1, ω_2) in equation (10) can also be regarded as an energy spectral density (ESD).
In equation (10), x[w,h] is a frame of the video sequence f, and X(ω_1, ω_2) is its representation in the frequency domain. In one embodiment, the video sequence f has uniform content, such as a single shot. In this case, a Ŝ_xx(ω_1, ω_2) computed from one representative frame x[w,h] of f, such as the first frame, can represent the energy distribution of the whole sequence. In another embodiment, f contains scene changes; in this case, Ŝ_xx(ω_1, ω_2) can be the average of multiple PSDs Ŝ_xx^(1), Ŝ_xx^(2), etc., computed from multiple frames x_1[w,h], x_2[w,h], etc., respectively, where frame x_i[w,h] (i = 1, 2, etc.) is selected from scene #i.
In some embodiments, the technique used to estimate the PSD of the whole sequence can vary. For example, in one embodiment, multiple frames x_1[w,h], x_2[w,h], etc. are selected from f at fixed intervals, such as one second, and the multiple corresponding PSDs Ŝ_xx^(1), Ŝ_xx^(2), etc. are computed and averaged to generate Ŝ_xx(ω_1, ω_2).
In one embodiment, the video sequence f is subdivided into I segments, where each segment consists of a group of consecutive frames (for example, segmented based on content, motion, texture, edge structure, etc.) and is assigned a weight w_i. The overall PSD Ŝ_xx(ω_1, ω_2) is then set to the weighted average of the PSDs of frames x_i[w,h] (i = 0, 1, ..., I−1), where each frame is selected from segment #i, as shown in equation (12):

Ŝ_xx(ω_1, ω_2) = (1/(WH)) Σ_{i=0}^{I−1} w_i |X_i(ω_1, ω_2)|^2 = (1/(WH)) Σ_{i=0}^{I−1} w_i |Σ_{w=0}^{W−1} Σ_{h=0}^{H−1} x_i[w,h] e^{−jω_1 w − jω_2 h}|^2    (12)
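Purely as an illustration of equations (10) and (12), the following minimal sketch (in Python with numpy, neither of which is part of the original disclosure) computes the periodogram of one frame and a weighted average over segment-representative frames; the frame selection and segment weights are assumed inputs.

```python
import numpy as np

def periodogram_psd(frame):
    # Periodogram of one H x W luma frame per equation (10); the 2-D DFT is
    # used as a sampled version of the DTFT over [-pi, pi]^2.
    H, W = frame.shape
    X = np.fft.fft2(frame.astype(float))
    return np.abs(X) ** 2 / (W * H)   # the 1/(WH) factor of eq. (10)

def weighted_psd(frames, weights):
    # Weighted average of per-segment periodograms per equation (12).
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                   # normalize the segment weights w_i
    return sum(wi * periodogram_psd(f) for wi, f in zip(w, frames))
```

For a single-shot sequence, `periodogram_psd` applied to the first frame suffices; for a sequence with scene changes, one frame per scene or per fixed interval would be passed to `weighted_psd`.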
Model-Based PSD Estimation of f
In some embodiments, such as those associated with live video streaming, no frame representing the representative content of the sequence is available for preprocessing (i.e., as x[w,h] in equation (10)) to evaluate the PSD. Therefore, in some embodiments, the PSD Ŝ_xx(ω_1, ω_2) can be modeled using formulas of the form shown in equations (13), (14), and (15):

Ŝ_xx(ω_1, ω_2) = F(ω_1, ω_2, b⃗)    (13)
where b⃗ is a vector containing the arguments of the function F(·). In one embodiment, the function F(·) used to model Ŝ_xx has a single parameter, as shown in equation (14):

Ŝ_xx = K · e^{−√(ω_1^2 + ω_2^2)/b_0}    (14)
where K is a factor that ensures conservation of energy. Because the exact total energy in the spatial domain is unknown (since x[w,h] is not available), in some embodiments it can be estimated as shown in equation (15):

∫_{−π}^{π} ∫_{−π}^{π} Ŝ_xx(ω_1, ω_2) dω_1 dω_2 = Σ_{w=0}^{W−1} Σ_{h=0}^{H−1} |x[w,h]|^2 = W × H × 128^2    (15)
In equation (14), b_0 is a parameter determined by the resolution and content of the video sequence. In one embodiment, content is classified into three categories: simple, medium, and complex. Table 1 gives empirical values of b_0 for different resolutions and content categories according to one non-limiting embodiment.
Table 1
Format       Simple    Medium    Complex
CIF          0.1061    0.137     0.1410
WVGA         0.1020    0.124     0.1351
1280x720     0.0983    0.105     0.1261
1920x1080    0.0803    0.092     0.1198
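As a hedged illustration only, the sketch below evaluates the model-based PSD under the exponential-decay form reconstructed above for equation (14) and scales it per equation (15); the b_0 values are taken from Table 1, while the grid size and the exact functional form are assumptions of this sketch.

```python
import numpy as np

B0_TABLE = {  # empirical b0 values from Table 1: format -> (simple, medium, complex)
    "CIF": (0.1061, 0.137, 0.1410),
    "WVGA": (0.1020, 0.124, 0.1351),
    "1280x720": (0.0983, 0.105, 0.1261),
    "1920x1080": (0.0803, 0.092, 0.1198),
}

def model_psd(W, H, b0, n=257):
    # S_xx(w1, w2) = K * exp(-sqrt(w1^2 + w2^2) / b0) on an n x n grid over
    # [-pi, pi]^2; K is chosen so the total energy matches W*H*128^2 (eq. 15).
    grid = np.linspace(-np.pi, np.pi, n)      # n odd, so the grid contains 0
    w1, w2 = np.meshgrid(grid, grid)
    shape = np.exp(-np.sqrt(w1**2 + w2**2) / b0)
    cell = (2 * np.pi / (n - 1)) ** 2         # area of one grid cell
    K = (W * H * 128.0**2) / (shape.sum() * cell)
    return K * shape, grid

psd, grid = model_psd(1280, 720, B0_TABLE["1280x720"][1])  # "medium" content
```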
PSD Estimation of f_3
Because the ratio M is rational, it can be expressed as M = A/B with A ≥ B. The down-sampled video thus has resolution (B/A)W × (B/A)H; in other words, the fraction of resolution removed equals 1 − B/A. In the frequency domain, the fraction of frequency components lost likewise equals 1 − B/A, and if the anti-aliasing filter applied to f has a sharp cutoff at ±(B/A)π, all of the lost components lie in the high-frequency band. In the ideal case (down-sampling followed immediately by up-sampling), all high-frequency components of f_3 in Fig. 6A within the bands [−π, −(B/A)π] and [(B/A)π, π] are lost. The PSD of f_3, denoted Ŝ_yy(ω_1, ω_2), can therefore be estimated from Ŝ_xx(ω_1, ω_2) by setting the values of Ŝ_xx(ω_1, ω_2) for (ω_1, ω_2) ∈ [−π, −(B/A)π] ∪ [(B/A)π, π] to zero, as shown in equation (16):

Ŝ_yy(ω_1, ω_2) = Ŝ_xx(ω_1, ω_2) if |ω_1| ≤ (B/A)π and |ω_2| ≤ (B/A)π; 0 otherwise    (16)
It should be noted that the estimate in (16) is not exactly true, because the anti-aliasing filter does not have an ideally sharp cutoff frequency; nevertheless, it is very close to the true PSD of f_3.
Furthermore, when the horizontal and vertical directions have different sampling ratios M_h = A_h/B_h and M_v = A_v/B_v, respectively, the estimate of Ŝ_yy(ω_1, ω_2) can be restated as equation (17):

Ŝ_yy(ω_1, ω_2) = Ŝ_xx(ω_1, ω_2) if |ω_1| ≤ (B_h/A_h)π and |ω_2| ≤ (B_v/A_v)π; 0 otherwise    (17)
Computing the Sampling Error
After the PSDs of f and f_3 (namely Ŝ_xx(ω_1, ω_2) and Ŝ_yy(ω_1, ω_2)) have been estimated, the down-sampling error σ_d^2 can be computed by equation (18):

σ_d^2 = (1/(WH)) ∫_{−π}^{π} ∫_{−π}^{π} [Ŝ_xx(ω_1, ω_2) − Ŝ_yy(ω_1, ω_2)] dω_1 dω_2    (18)
In general, the down-sampling error given by equation (18) provides an indication of the difference in high-frequency energy content between the input video signal and the video signal sampled at the down-sampling ratio. Other techniques can be used to generate the down-sampling error σ_d^2. For example, in some embodiments, the down-sampling error σ_d^2 can be obtained by computing the mean squared error (MSE) between the down-sampled-then-up-sampled video signal f_3 and the input video signal f. As another example, in some embodiments, the down-sampling error σ_d^2 can be obtained by applying the anti-aliasing filter to the input video signal f and computing the MSE between the filtered f and the original input video f. As yet another example, in some embodiments, the down-sampling error σ_d^2 can be obtained by applying a high-pass filter, having the same cutoff frequency as the anti-aliasing filter, to the input video signal f and computing the average energy per pixel of the high-pass-filtered f.
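The following sketch, under the same assumed grid representation as the sketches above, implements equations (16) and (18): the estimated PSD of f_3 is obtained by zeroing the bands beyond (B/A)π, and σ_d^2 is the integral of the removed energy.

```python
import numpy as np

def downsampling_error(psd, grid, B, A, W, H):
    # Equation (16): S_yy keeps S_xx only where |w1|, |w2| <= (B/A)*pi.
    cutoff = (B / A) * np.pi
    w1, w2 = np.meshgrid(grid, grid)
    kept = (np.abs(w1) <= cutoff) & (np.abs(w2) <= cutoff)
    psd_f3 = np.where(kept, psd, 0.0)
    # Equation (18): integrate the lost energy and normalize by WH.
    cell = (grid[1] - grid[0]) ** 2
    return (psd - psd_f3).sum() * cell / (W * H)

# e.g., M = 3/2 (A = 3, B = 2) on the model PSD from the previous sketch:
# sigma_d2 = downsampling_error(psd, grid, B=2, A=3, W=1280, H=720)
```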
Estimating the Coding Error
Given a target bit rate R, the coding error σ_c^2 can be estimated by a model. In some embodiments, the following rate-distortion (R-D) model, illustrated by equation (19), is used:
σ_c^2 = β / γ^α    (19)
where γ is the average number of bits allocated to each pixel, i.e., bits per pixel (bpp). In some embodiments, γ can be computed by equation (20):

γ = (R × M_h × M_v) / (fps × W × H)    (20)
In equation (20), fps is the frame rate, meaning the number of frames captured per second; M_h and M_v are the sampling ratios in the horizontal and vertical directions, respectively; W is the horizontal resolution; H is the vertical resolution; and R is the bit rate.
The bit rate R can be obtained, or derived, by various techniques. For example, the bit rate R can be provided by a user of the coding system. In some embodiments, a network node associated with the coding system, such as a video server or a media-aware network element, can monitor the bit rates associated with various video streams. The bit rate for a particular video stream can then be indicated to the video encoder by a requesting network node. In some embodiments, the bit rate can change over time, such as during a handover or during IP flow mobility (IFOM) associated with the user equipment receiving the video; the encoder can receive a message containing the updated target bit rate. In some embodiments, the bit rate R can be derived from the QoS class identifier (QCI) assigned to the video stream for the decoder. For example, QCIs one through four currently provide a guaranteed bit rate (GBR), and the GBR can be used by the video encoder to determine the coding error σ_c^2. In some embodiments, the bit rate R can be determined or provided by the user equipment associated with the decoder. For example, the user equipment can provide an estimate of its total aggregate data throughput to the encoder through appropriate signaling. Where the user equipment has multi-radio-access-technology (multi-RAT) communication capability, the bit rate R can be an indication of the throughput across two or more radio access technologies, such as cellular and non-cellular RATs. In some embodiments, the RTP/RTCP protocol can be used to determine the bit rate information; for example, RTP/RTCP can run in the WTRU and the base station to collect the application-layer bit rate. This bit rate R can then be used in equation (20).
The R-D model in equation (19) has two parameters, α and β, whose values vary according to factors including, but not limited to, sequence content, sequence resolution, and encoder implementation and configuration. Various embodiments for finding desired values of α and β are described in more detail below. Once values for α and β have been determined using a suitable technique, the coding error σ_c^2 for a particular sampling ratio can be computed. For sampling ratios M_h and M_v, the average bits per pixel γ is first determined using equation (20); the determined γ is then used to compute the coding error σ_c^2 as described by equation (19). The coding error can subsequently be computed for a different sampling ratio: first, the new average bits per pixel γ is computed by substituting the new sampling-ratio values into equation (20); the new value of γ is then used to evaluate equation (19).
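A minimal sketch of equations (19) and (20) follows; the α and β values shown in the example call are illustrative placeholders, to be obtained by one of the offline, online, or simplified modes described below.

```python
def coding_error(R, fps, W, H, Mh, Mv, alpha, beta):
    # Equation (20): bits per pixel of the down-sampled video.
    gamma = (R * Mh * Mv) / (fps * W * H)
    # Equation (19): R-D model for the coding error sigma_c^2.
    return beta / gamma ** alpha

# e.g., 512 kbps, 30 fps, 1280x720 source, sampling ratio 1.5 in each direction:
sigma_c2 = coding_error(512e3, 30.0, 1280, 720, 1.5, 1.5, alpha=1.2, beta=80.0)
```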
Values of α and β: Offline Mode
In some embodiments, when the sampling ratio is selected without a time constraint, offline training can be used to find values of α and β that predict or model the distortion from the coding process very accurately. Thus, in one embodiment, the video is preprocessed to determine the relationship between bit rate and coding distortion. The determined relationship can then be used when determining the sampling ratio, since the available or target bit rate may change over time during video transmission. The relationship can be affected by factors including, but not limited to, video data content, video data resolution, and encoder implementation and configuration.
With the aforementioned factors selected, the configured encoder encodes the given sequence at full resolution under known settings. This simulation can be carried out over a range of bit rates {R_0, R_1, ..., R_{N−1}}, producing a set of distortions {D_0, D_1, ..., D_{N−1}} corresponding to the bit rates. The bit rates can be normalized to bpp {r_0, r_1, ..., r_{N−1}} using equation (21):

r_i = R_i / (fps × W × H)    (21)
The corresponding distortions are likewise normalized to mean squared error (MSE), denoted {d_0, d_1, ..., d_{N−1}}. The normalized bit-rate and distortion pairs [r_i, d_i] (0 ≤ i < N) can be plotted as an R-D curve. A numerical optimization algorithm can then be used to fit the R-D curve by solving equation (22) for the desired values α_opt and β_opt:

[α_opt, β_opt] = argmin_{α,β} Σ_{i=0}^{N−1} (d_i − β/r_i^α)^2    (22)
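As one hedged way to realize the offline fit, the sketch below solves a linear least-squares problem in the log domain (log d = log β − α log r), which approximates the curve fit of equation (22) rather than solving it exactly; the bit rates and MSEs in the example call are illustrative.

```python
import numpy as np

def fit_alpha_beta(bitrates, mses, fps, W, H):
    r = np.asarray(bitrates, dtype=float) / (fps * W * H)  # eq. (21): bpp
    d = np.asarray(mses, dtype=float)
    slope, intercept = np.polyfit(np.log(r), np.log(d), 1)
    return -slope, np.exp(intercept)                       # alpha_opt, beta_opt

alpha_opt, beta_opt = fit_alpha_beta(
    [64e3, 128e3, 256e3, 512e3],     # trial bit rates R_i
    [420.0, 260.0, 150.0, 80.0],     # measured distortions d_i (illustrative)
    fps=30.0, W=352, H=288)
```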
Values of α and β: Online Mode
For some embodiments, a video sequence or a portion of one is available for preprocessing, but offline training cannot be used because of its prohibitively high computational complexity. In these embodiments, signal analysis can be performed on the available portion of the video sequence to extract useful features, where the useful features reflect characteristics of the sequence such as motion, texture, edges, and so on. The extracted features are highly correlated with the values of the parameters α and β, and can therefore be used to estimate α and β so as to reduce the distortion caused by coding.
In one embodiment, the video sequence is analyzed based on its PSD (detailed above), and two features are extracted from Ŝ_xx(ω_1, ω_2). One feature is the energy percentage of the DC component, F_DC; the other is the cutoff frequency ±ω_c, outside of which the energy of the components falls below a threshold T (for example, T = 0.5% of the total energy). In general, the cutoff frequency ±ω_c indicates how fast the PSD decays toward the high-frequency band, where the absolute value of ±ω_c lies in the range [0, π]; the smaller the value of ±ω_c, the faster the PSD decays toward the high band. F_DC and ω_c can be computed by equations (23) and (24), respectively:

F_DC = Ŝ_xx(0, 0) / ∫_{−π}^{π} ∫_{−π}^{π} Ŝ_xx(ω_1, ω_2) dω_1 dω_2    (23)

ω_c = min{ ω : (∫_{−ω}^{ω} ∫_{−ω}^{ω} Ŝ_xx(ω_1, ω_2) dω_1 dω_2) / (∫_{−π}^{π} ∫_{−π}^{π} Ŝ_xx(ω_1, ω_2) dω_1 dω_2) ≥ 1 − T }    (24)
In one embodiment, F_DC is clipped to the range [0.85, 0.99] and quantized by an H-level uniform quantizer, and ω_c is clipped to the range [0, 0.9π] and quantized by an L-level uniform quantizer. The two extracted and quantized features, denoted F̂_DC and ω̂_c, can then be used as a pair of indices to look up the values of α and β in respective 2-D tables. In one embodiment, F_DC is quantized by a 15-level uniform quantizer with reconstruction points at {0.85, 0.86, ..., 0.98, 0.99}, and ω_c is quantized by a 10-level uniform quantizer with reconstruction points at {0.0π, 0.1π, ..., 0.8π, 0.9π}. Fig. 7 and Fig. 8 show the look-up tables for α and β, respectively, indexed by F̂_DC and ω̂_c according to one embodiment. It should be noted that in some entries −1.0 does not represent a value of α or β; rather, an entry of −1.0 indicates a combination of F̂_DC and ω̂_c that does not occur in practice.
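A sketch of the online-mode feature extraction of equations (23) and (24) follows, again assuming the PSD is sampled on a square grid over [−π, π]^2; the discrete sums stand in for the integrals, and the quantizer reconstruction points are those listed above.

```python
import numpy as np

def extract_features(psd, grid, T=0.005):
    total = psd.sum()                  # the grid-cell area cancels in the ratios
    i0 = int(np.argmin(np.abs(grid)))  # index of w = 0
    f_dc = psd[i0, i0] / total         # equation (23)
    w1, w2 = np.meshgrid(grid, grid)
    for w in grid[grid >= 0]:          # smallest w capturing >= (1 - T) energy
        inside = (np.abs(w1) <= w) & (np.abs(w2) <= w)
        if psd[inside].sum() >= (1.0 - T) * total:
            return f_dc, w             # equation (24)
    return f_dc, np.pi

def quantize(value, points):
    # Map a feature to the nearest reconstruction point (uniform quantizer).
    points = np.asarray(points)
    return float(points[np.argmin(np.abs(points - value))])

FDC_POINTS = np.arange(0.85, 0.9951, 0.01)      # 15 levels: 0.85 ... 0.99
WC_POINTS = np.arange(0.0, 0.951, 0.1) * np.pi  # 10 levels: 0.0pi ... 0.9pi

# f_dc, w_c = extract_features(psd, grid)       # psd, grid from earlier sketches
# idx = (quantize(f_dc, FDC_POINTS), quantize(w_c, WC_POINTS))
```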
Values of α and β: Simplified Mode
In some embodiments, such as live video streaming, no frame representing the representative content of the sequence is available for preprocessing (e.g., as x[w,h] in equation (10)) to evaluate the PSD, or correspondingly to extract features from the PSD for analyzing the video sequence. In these cases, a mode (referred to herein as the "simplified mode") can be used to estimate α and β.
Given the resolution and content type of the input video f, the values of α and β can be determined by a 2-D table lookup. Predefined resolution formats can correspond to commonly used formats, such as CIF, WVGA, VGA, 720p, 1080p, etc. Where the true resolution of the input f is not one of the predefined formats, the most similar predefined resolution can be used as an approximation. The content of a video sequence can include motion, texture, edge structure, and so on. At a given bit rate, a video with simple content degrades less after coding than a complex video. In some embodiments, the content of a video sequence can be classified into several categories from "simple" to "complex" according to the level of granularity the application requires. The content type can, for example, be indicated by the user according to prior knowledge of the video; where no prior knowledge exists, the content type can automatically be set to a default value. In one embodiment, Table 2 is used as the 2-D look-up table for the values of α and β. Table 2 indicates values of α and β for different resolutions and content types according to various embodiments.
Table 2
[Table 2: values of α and β indexed by resolution (CIF, WVGA, 720p, 1080p) and content type (simple, medium, complex)]
Although the predefined resolutions include CIF, WVGA, 720p, and 1080p, and three content categories (simple, medium, complex) are used, the invention is not so limited. In some embodiments, additional granularity levels can be included in the table. Furthermore, in some embodiments, the default content type can be set to "medium".
According to various embodiments, the complexity of the video can be determined by various techniques. For example, in one embodiment, user input indicating the relative level of complexity is received; this user input is then used to determine suitable values of α and β for use in equation (19). In some embodiments, video feature information (e.g., complexity) can be received from a network node that has obtained this information; based on this video information, suitable α and β values can be determined (e.g., via a look-up table) and subsequently used in equation (19). In some embodiments, the complexity value of the video can be computed or estimated from content statistics by pre-storing several frames before down-sampling the first frame. In this regard, various computations can be used, such as pixel-value gradients, histograms, variance, etc.
Searching for the Ratio M
Determining the minimum of the overall error σ_2^2 is equivalent to finding the minimum of the sum of the sampling error σ_d^2 and the coding error σ_c^2, as defined in equation (8). The estimation of σ_d^2 and σ_c^2 according to various non-limiting embodiments was discussed above. Various algorithms for searching over M, which in some cases minimize the overall error, are described in more detail below.
Uniform Sampling Ratio M for the Horizontal and Vertical Directions
If the pixel aspect ratio (PAR) of the down-sampled video is required to be identical to that of the full-resolution video and each pixel is required to be square, i.e., the sample aspect ratio (SAR) equals 1, then the horizontal and vertical sampling ratios M_h and M_v must be identical. Therefore, in some embodiments, this requirement serves as a first restriction. As a second restriction, for many applications and video formats, the down-sampled resolution (W/M) × (H/M) must be integer-valued; in some applications, however, some cropping and/or padding can be used to obtain an integer number of pixels in each direction. In either case, applying these two restrictions limits the possible values of M. Denoting the greatest common divisor (GCD) of W and H as G, the possible ratios can be expressed by equation (25):

M = G / (G − n),  0 ≤ n ≤ G − 1    (25)
Sometimes the output resolution is required not only to be an integer but also to be a multiple of some K. For example, some H.264 encoders handle only the case where K equals 16, because they do not support padding frames to obtain an integer number of macroblocks (MB). Under this additional restriction, the possible values of M are further reduced, and (25) can be re-expressed as equation (26):

M = G / (G − nK),  0 ≤ n ≤ G/K − 1    (26)
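As an illustration of equations (25) and (26), the following sketch enumerates the admissible uniform ratios for a given resolution; math.gcd is standard Python, and K models the macroblock restriction.

```python
from math import gcd

def candidate_ratios(W, H, K=1):
    # Equation (25) when K == 1; equation (26) when the output must be a
    # multiple of K (e.g., K = 16 for an integer number of macroblocks).
    G = gcd(W, H)
    return [G / (G - n * K) for n in range(G // K)]

print(candidate_ratios(1920, 1080, K=16))
# G = 120 -> seven candidates: 1.0, 1.154..., 1.364..., 1.667..., 2.143..., 3.0, 5.0
```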
In any case, in some embodiments an "exhaustive" search can be used: the overall error σ_2^2 is evaluated for every possible M, where the possible values are expressed as a vector [M_0, M_1, ...], and the sampling ratio M_i that yields the minimum overall error is selected. In other embodiments, a search method is used that finds a suitable value of M without evaluating the overall error at every possible value of M.
Figs. 9A, 9B and 9C depict search strategies for finding the sampling ratio M_i according to various non-limiting embodiments. Fig. 9A illustrates an exhaustive search, Fig. 9B illustrates a search with a large step size, and Fig. 9C illustrates a fine search.
Referring first to Fig. 9A, after the overall error σ_2^2 has been computed for all values of M, M_13 is selected as the sampling ratio in the depicted embodiment. To save time without missing the M_i that minimizes coding distortion, the search can instead be carried out with a large step size, as shown in Fig. 9B, to locate the range in which the desired M_i lies. A search with a finer step size is subsequently performed within that range, as in Fig. 9C. In the example depicted in Fig. 9, M has 24 possible values; the exhaustive search of Fig. 9A computes the overall error σ_2^2 24 times to find the selected M_i. By comparison, the combined coarse and fine search of Figs. 9B and 9C cuts the amount of computation in half.
In some embodiments, the selected sampling ratio can be chosen as any suitable ratio that produces an overall error σ_2^2 below an overall-error threshold. In other words, rather than determining the single sampling ratio that yields the "absolute" minimum overall error, there may be multiple sampling ratios that bring the overall error below a desired threshold. Thus, according to various embodiments, any one of the sampling ratios that bring the overall error below the threshold can be selected as the sampling ratio for coding. In some embodiments, once a sampling ratio is determined to produce an overall error below the specified threshold, coding can proceed with that ratio as the selected sampling ratio.
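The coarse-then-fine strategy of Figs. 9B and 9C can be sketched as follows; overall_error is assumed to be a callable that evaluates μσ_d^2 + σ_c^2 of equation (8) for one candidate ratio, e.g., by combining the sketches above.

```python
def coarse_fine_search(candidates, overall_error, step=4):
    # Coarse pass (Fig. 9B): evaluate every step-th candidate.
    coarse_best = min(candidates[::step], key=overall_error)
    i = candidates.index(coarse_best)
    # Fine pass (Fig. 9C): evaluate all candidates around the coarse winner.
    lo, hi = max(0, i - step + 1), min(len(candidates), i + step)
    return min(candidates[lo:hi], key=overall_error)
```

With 24 candidates and step = 4, this evaluates the overall error roughly a dozen times instead of 24, mirroring the halved computation noted above.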
Non-Uniform Sampling Ratios M_h and M_v for the Horizontal and Vertical Directions
In embodiments that do not impose the restriction that the ratios in the two directions be equal, the horizontal and vertical ratios M_h and M_v can be selected more freely. The possible values of M_h and M_v are shown in equations (27) and (28), respectively:

M_h = W / (W − m),  0 ≤ m ≤ W − 1    (27)

M_v = H / (H − n),  0 ≤ n ≤ H − 1    (28)
The joint choice of (M_h, M_v) thus has W × H possibilities. An exhaustive search over all of these possibilities is feasible, but too wasteful for most applications. In one fast search strategy, the W × H possibilities are traversed with large step sizes, as shown in equations (29) and (30), where Δ_h and Δ_v are integer step sizes for the horizontal and vertical directions, respectively:

M_h = W / (W − mΔ_h),  0 ≤ m ≤ W/Δ_h − 1    (29)

M_v = H / (H − nΔ_v),  0 ≤ n ≤ H/Δ_v − 1    (30)
The number of possibilities is thus reduced to (W/Δ_h) × (H/Δ_v). The approximate region (M_h*, M_v*) in which σ_2^2 is minimized can then be found, and a further fine search can subsequently be performed near (M_h*, M_v*).
However, in some embodiments, when σ_2^2 has local minima over the W × H possibilities of (M_h, M_v), the sampling ratio found by this strategy may be one of the local minima rather than the global optimum. In one embodiment, several ratio pairs (M_h^(1), M_v^(1)), (M_h^(2), M_v^(2)), etc. are determined, each of which yields a relatively small value of the error σ_2^2. A fine search is then performed near each candidate to find, in its neighborhood, the refined ratio pairs (M̃_h^(1), M̃_v^(1)), (M̃_h^(2), M̃_v^(2)), etc. that produce the local minimum errors. The final ratio is subsequently selected from among (M̃_h^(1), M̃_v^(1)), (M̃_h^(2), M̃_v^(2)), etc. as the one yielding the minimum σ_2^2.
In another embodiment, a large-step search is first performed under the restriction that the ratio be uniform in both directions, similar to Fig. 9B. The ratio found in this first step is taken as M_i; note that because the uniform-ratio restriction was imposed, M_i applies to both the horizontal and vertical directions. A range [M_a, M_b] containing the desired ratio M_i is then defined, i.e., M_a ≤ M_i ≤ M_b. The restriction that the same ratio be applied in the horizontal and vertical directions is then removed, and a subsequent search is performed to obtain a separately selected sampling ratio for each direction. The search ranges for the horizontal and vertical ratios M_h and M_v are shown in equations (31) and (32), respectively (see the sketch after this passage):

M_h = W / (W − m),  with m chosen such that M_a ≤ M_h ≤ M_b    (31)

M_v = H / (H − n),  with n chosen such that M_a ≤ M_v ≤ M_b    (32)

As can be seen, the search range for (M_h, M_v) is reduced from W × H possibilities to the much smaller set confined to [M_a, M_b]. The combination of coarse search followed by fine search described above is then applied within this range to find the finally selected sub-sampling ratios for the horizontal and vertical directions.
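A sketch of this two-stage strategy follows; overall_error(Mh, Mv) is assumed to evaluate equation (8) for a ratio pair, and the window width is an illustrative choice.

```python
def two_stage_search(candidates, overall_error, margin=1):
    # Stage 1: coarse search under the uniform-ratio restriction (Mh == Mv).
    uniform = min(candidates, key=lambda M: overall_error(M, M))
    i = candidates.index(uniform)
    window = candidates[max(0, i - margin): i + margin + 1]  # the range [Ma, Mb]
    # Stage 2: drop the restriction and search the pairs within [Ma, Mb]^2.
    pairs = [(mh, mv) for mh in window for mv in window]
    return min(pairs, key=lambda p: overall_error(*p))
```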
Fig. 10A depicts a process flow 1000 for encoding video data according to one non-limiting embodiment. At 1002, the video data to be encoded is received. At 1004, a sampling-error value is determined at each of a plurality of sampling ratios. In some embodiments, the sampling-error values are determined using the power spectral density (PSD) of the received video data and an estimate of the PSD of the down-sampled video data. As discussed above, in various embodiments a data-based technique can be used to estimate the PSD of the video data; in other embodiments a model-based technique can be used. At 1006, a coding-error value is determined at each of the plurality of sampling ratios. The coding error can be based on a given bit rate. In some embodiments, the bit rate is received from a network node, such as a video server or an end-user device. For the given bit rate, a coding-error model can provide the coding-error value at each of the plurality of sampling ratios. The coding-error model can include a first parameter and a second parameter, each of which varies independently based on characteristics of the received video data. The values of the first and second parameters can be determined using any suitable technique. For example, in one embodiment, the first and second parameters are determined by a curve-fitting process; in another embodiment, they are determined by querying respective look-up tables, as described in more detail above. In some embodiments, the coding-error values at 1006 can be determined before the sampling-error values at 1004. At 1008, the sampling-error value and the coding-error value at each sampling ratio are summed to determine the sampling ratio that reduces the overall error value. At 1010, a sampling ratio is selected. In some embodiments, multiple sampling ratios are selected over the duration of the video-encoding process; for example, a first sampling ratio can be selected at the start of the received video data, and one or more additional sampling ratios can be selected during the coding event. In some embodiments, an exhaustive search is performed to determine the selected sampling ratio. In other embodiments, a non-exhaustive search is performed: for example, only the errors associated with a subset of the plurality of sampling ratios are summed, and the sampling ratio is selected from that subset of summed sampling and coding errors. In some embodiments, an additional search can be used to further refine the selected sampling ratio. In any case, at 1014 the video data can be down-sampled at the selected sampling ratio, and at 1016 the down-sampled video data can be encoded. In some embodiments, if the bit rate changes, the encoding process can be re-evaluated to determine an updated sampling ratio. Furthermore, in some embodiments, the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, which can be the same or different.
Fig. 10B depicts a process flow 1050 for decoding video data according to one non-limiting embodiment. At 1052, compressed video data is received. The video data can be received from any suitable provider, such as a live video stream or previously stored video. At 1054, an indication of the selected sampling ratio is received. The sampling ratio can be based, for example, on the sum of the sampling-error and coding-error values over a plurality of sampling ratios. At 1056, blocks of coefficients are decoded to form reconstructed video data. At 1058, the reconstructed video data is up-sampled according to the selected sampling ratio. At 1060, the up-sampled video data is output.
According to various embodiments, for an input video with resolution W × H, the down-sampling process (i.e., down-sampling unit 1606 in Figure 16) can down-sample it by factors a and b in the horizontal and vertical directions, respectively, where a and b are positive rational numbers. The output video then has resolution aW × bH. Since a and b can be any positive rational numbers, they can be written as a = M_h/N_h and b = M_v/N_v, where M_h, N_h, M_v and N_v are all positive integers. The output of the down-sampling process is also video data, with an integer number of rows and an integer number of columns of pixels. Thus, in various embodiments, W·M_h/N_h and H·M_v/N_v are both integers, where N_h and N_v are factors of W and H chosen to satisfy the output-resolution requirement.
In some embodiments, the up-sampling process (i.e., up-sampling unit 1712 in Figure 17) can use an up-sampling ratio equal to the down-sampling ratio of the down-sampling process, which causes the processed video to have the same resolution as the original input video. In other embodiments, the up-sampling ratio is decoupled from the down-sampling ratio, which allows more flexible up-sampling ratios. For example, assuming the sampled video has resolution W_1 × H_1, the up-sampling ratios in the horizontal and vertical directions can be configured as c and d, respectively, such that the resolution of the output video equals cW_1 × dH_1, where c and d are positive rational numbers. The values of c and d can be configured based on various criteria before up-sampling. For example, for the output video to have a resolution greater than or equal to the input resolution, the factors c and d should be greater than or equal to 1.0. Furthermore, since c and d can be any positive rational numbers, they can be expressed as c = K_h/L_h and d = K_v/L_v, where K_h, L_h, K_v and L_v are positive integers; in various embodiments, L_h and L_v are factors of W_1 and H_1, respectively. As an additional criterion for selecting c and d, the picture aspect ratio (PAR) can be maintained by choosing c = d.
Figure 11 is a block diagram 1100 of the horizontal down-sampling process with down-sampling ratio M_h/N_h. Block diagram 1100 includes up-sampling by M_h at block 1102, applying the filter f_{d,h} at block 1104, and down-sampling by N_h at block 1106. After processing by block diagram 1100, the width of the output video is W·M_h/N_h.
Figure 12 depicts an example down-sampling process with M_h = 3 and N_h = 4. The original row X (Figure 12(a)), with spectrum F (Figure 12(b)), is first up-sampled by M_h by inserting zero-valued samples. The resulting row, shown in Figure 12(c), is denoted X_u. As a result of the up-sampling, the spectrum F is compressed by M_h, as shown in Figure 12(d), and is denoted F_u. In F_u, the spectral images centered at integer multiples of 2π/M_h that are introduced by the zero insertion must be removed by the filter f_{d,h} (block 1104 in Figure 11). Because X_u is subsequently down-sampled by the factor N_h at block 1106, the cutoff frequency of f_{d,h} should be π/N_h (i.e., min(π/M_h, π/N_h)) rather than π/M_h, as shown in Figure 12(f). The gain of f_{d,h} is M_h: because the row X has been up-sampled by M_h, its length, and hence the required energy, also increases by M_h. Therefore, f_{d,h} can be computed by applying the inverse Fourier transform to the ideal frequency response H_d depicted in Figure 12(f), as shown in equation (33):

f_{d,h}(n) = (1/2π) ∫_{−π/N_h}^{π/N_h} H_d e^{jnω} dω = (1/2π) ∫_{−π/N_h}^{π/N_h} M_h e^{jnω} dω = (M_h/N_h) Sinc(πn/N_h)    (33)
where

Sinc(x) = sin(x)/x for x ≠ 0; Sinc(x) = 1 for x = 0    (34)
By multiplying F_u (Figure 12(d)) by H_d (Figure 12(f)), the remaining spectrum Z_f is determined, as shown in Figure 12(g). In the spatial domain, Z_f corresponds to the filtered row, denoted X_f (see the upper row of Figure 12(e)). X_f is then down-sampled by the factor N_h simply by selecting every N_h-th pixel from X_f (block 1106 in Figure 11). Finally, the down-sampled row X_d (Figure 12(e)) and its spectrum Z_d (Figure 12(h)) are obtained.
Similarly, the vertical down-sampling filter f_{d,v} can be computed using equation (35):

f_{d,v}(n) = (1/2π) ∫_{−π/N_v}^{π/N_v} M_v e^{jnω} dω = (M_v/N_v) Sinc(πn/N_v)    (35)
To generate an intermediate frame with resolution M_hW × M_vH, a two-step strategy can be used: the horizontal and vertical filters are applied to the original video in turn (in either order). In some embodiments, a two-dimensional non-separable filter f_{d,2D} can be computed as the two-dimensional convolution of f_{d,h} and f_{d,v}, and f_{d,2D} can be applied directly to the original video.
Designing the up-sampling filters is similar to designing the down-sampling filters. For example, the horizontal direction can be considered first and the design then extended to the vertical direction. After up-sampling, the resolution of an input video of width W_1 changes to W_1·K_h/L_h. As shown in Figure 13, the up-sampling process 1300 can include up-sampling the original row by K_h through zero insertion at block 1302, applying the filter f_{u,h} at block 1304, and down-sampling by L_h at block 1306 by selecting one pixel out of every L_h pixels, where the filter f_{u,h} can be computed by equation (36):

f_{u,h}(n) = (1/2π) ∫_{−π/K_h}^{π/K_h} K_h e^{jnω} dω = Sinc(πn/K_h)    (36)
Similarly, the vertical up-sampling filter f_{u,v} can be computed by equation (37):

f_{u,v}(n) = (1/2π) ∫_{−π/K_v}^{π/K_v} K_v e^{jnω} dω = Sinc(πn/K_v)    (37)
In some embodiments, a window function can be used to limit the size of the above filters. Suitable window functions include, but are not limited to, the Hanning, Hamming, triangular, Gaussian, and Blackman windows.
In one embodiment, the Gaussian window function expressed in equation (38) is used, where N denotes the length of the filter and σ is the standard deviation of the Gaussian function. Figure 14 depicts an example window function with (N = 71, σ = 1.5).

w(n) = e^{−(1/2)·((n − (N−1)/2) / (σ(N−1)/2))^2}    (38)
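The filter designs of equations (33)-(38) can be sketched as below; note that np.sinc(x) = sin(πx)/(πx), so Sinc(πn/N_h) in equation (33) equals np.sinc(n/N_h). Renormalizing the windowed taps to restore the DC gain is a design choice of this sketch, not mandated by the text.

```python
import numpy as np

def gaussian_window(N, sigma=1.5):
    n = np.arange(N)
    c = (N - 1) / 2.0
    return np.exp(-0.5 * ((n - c) / (sigma * c)) ** 2)  # equation (38)

def downsampling_filter(Mh, Nh, N=71, sigma=1.5):
    # Equation (33): f_dh(n) = (Mh/Nh) * Sinc(pi*n/Nh), truncated by a window.
    n = np.arange(N) - (N - 1) / 2.0                    # center the taps
    taps = (Mh / Nh) * np.sinc(n / Nh) * gaussian_window(N, sigma)
    return taps * (Mh / taps.sum())                     # restore DC gain Mh

def upsampling_filter(Kh, N=71, sigma=1.5):
    # Equation (36): f_uh(n) = Sinc(pi*n/Kh), truncated by the same window.
    n = np.arange(N) - (N - 1) / 2.0
    taps = np.sinc(n / Kh) * gaussian_window(N, sigma)
    return taps * (Kh / taps.sum())                     # restore DC gain Kh

f_dh = downsampling_filter(Mh=3, Nh=4)  # cutoff pi/4, gain 3 (Figure 12 example)
```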
To generate an intermediate frame with resolution W_1K_h × H_1K_v, a two-step strategy can be used: the horizontal and vertical filters are applied to the original video in turn (in either order). In some embodiments, a two-dimensional non-separable filter f_{u,2D} can be computed as the two-dimensional convolution of f_{u,h} and f_{u,v}, and f_{u,2D} can be applied directly to the original video.
When frames of WM_h × HM_v and W_1K_h × H_1K_v are interpolated as intermediates for down-sampling and up-sampling, many of the interpolated pixels are never used. For example, in some embodiments, only (W·M_h/N_h) × (H·M_v/N_v) pixels are selected to form the final output video for down-sampling (or (W_1·K_h/L_h) × (H_1·K_v/L_v) pixels for up-sampling). Thus, most of the computation is wasted. Accordingly, in some embodiments, only the pixels finally selected to form the output video are interpolated.
Figure 15 depicts an embodiment in which sampling with M_h = 3 and N_h = 4 is performed. In row 1502, the pixels 1504a, 1504b, 1504c, etc. represent integer pixels, and the white positions 1506 represent inserted zeros. Instead of interpolating all unknown positions, the pixels that form the final down-sampled row are selected first, as shown in row 1508 of Figure 15. These selected positions are then classified into M_h classes based on their phase. In one embodiment, the phase of a pixel is determined by its distance from the neighboring integer pixel. In row 1512 of Figure 15 there are three distinct phases, depicted as the zero phase 1514, the first phase 1516, and the second phase 1518.
In some embodiments, each of the down-sampling and up-sampling filters (i.e., f_{d,h}, f_{d,v}, f_{u,h} and f_{u,v}) is decomposed into a set of phase filters, each of which is used to interpolate the pixels associated with that phase. In Table 3, the lengths of f_{d,h}, f_{d,v}, f_{u,h} and f_{u,v} are denoted N_{D,H}, N_{D,V}, N_{U,H} and N_{U,V}, respectively. The decomposition is given in Table 3, where i is a non-negative integer and k is the index of the filter.
Table 3
[Table 3: decomposition of each sampling filter into phase sub-filters, indexed by the phase index k]
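As a small illustration of the phase classification in Figure 15 (the contents of Table 3 itself are not reproduced here), output pixel k of a ratio-M_h/N_h resampler sits at position k·N_h on the zero-inserted grid, so its phase class is (k·N_h) mod M_h:

```python
def phase_of_output_pixel(k, Mh=3, Nh=4):
    # Output sample k lies at index k*Nh of the Mh-times zero-inserted grid;
    # its distance pattern to the integer pixels repeats with period Mh.
    return (k * Nh) % Mh

print([phase_of_output_pixel(k) for k in range(6)])  # [0, 1, 2, 0, 1, 2]
```

Each of the M_h phase classes then gets its own short sub-filter taken from f_{d,h}, so only the pixels that survive into the output are ever computed.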
Figure 16 and Figure 17 show illustrative embodiments of architectures that include pre-processing and/or post-processing steps that can be used before, after, or concurrently with encoding, decoding, and/or transcoding video data according to the systems and methods described herein. The pre-processing and/or post-processing can be adaptive processes that include, for example, quantization, down-sampling, up-sampling, anti-aliasing, low-pass interpolation filtering, and/or de-blurring filtering of the video data. According to some embodiments, pre-processing and/or post-processing the video data enables the use of standard encoders and/or decoders, such as H.264 encoders and/or decoders.
Example Encoder Architecture
Figure 16 shows an example encoder architecture 1600 that includes processing performed before or concurrently with encoding video data, including pre-processing to obtain the selected sampling ratio. As described above with reference to Fig. 2, transform 1608, quantization 1610, entropy coding 1612, inverse quantization 1614, inverse transform 1616, motion compensation 1620, memory 1618 and/or motion estimation 1624 can be parts of the encoder processing of the video data. Anti-aliasing filter 1604, down-sampling unit 1606, and encoder controller 1622 can be parts of the pre-processing steps applied to the video data before encoding. These pre-processing elements can be incorporated into the encoder, operate independently of the encoder, or be configured outside the encoder. In any case, after the video data from input 1602 has been encoded, the encoded video data can be transmitted and/or stored via channel 1626.
In some embodiments, an output buffer can be provided to store the encoded video data for output. The buffer fullness can be monitored, or the input and output rates of the buffer compared, to determine its relative fullness level, which can be indicated to the controller. The output buffer can, for example, indicate the relative fullness level using a buffer-fullness signal provided from the output buffer to the encoder controller 1622. The encoder controller 1622 can monitor various parameters and/or constraints associated with channel 1626, the computational capabilities of the video encoder system, user demand, etc., and can establish target parameters to provide a quality of experience (QoE) suited to the specified constraints and/or channel conditions. The target bit rate can be adjusted from time to time according to the specified constraints and/or channel conditions. Typical target bit rates include, for example, 64 kbps, 128 kbps, 256 kbps, 384 kbps, 512 kbps, and so on.
As shown in Figure 16, video data is received from input 1602 (such as a video source). The received video data can include raw or decoded video signals, video sequences, bitstreams, or any other data representing image or video content. According to the systems and methods described herein, the received video data can be pre-processed by anti-aliasing filter 1604, down-sampling unit 1606, and/or encoder controller 1622, which can communicate with one another and/or with other elements of the encoder to encode the received video data for transmission. In some embodiments, the anti-aliasing filter 1604 can be designed using the techniques described above with reference to Figures 11-15. Pre-processing of the received video data can be performed before, or concurrently with, the processing performed by the transform, quantization, entropy coding, inverse quantization, inverse transform, motion compensation, and/or motion estimation elements of the encoder.
As shown in Figure 16, raw and/or decoded video data can be sent to the anti-aliasing filter 1604 for pre-processing. The anti-aliasing filter can be used to limit the frequency content of the video data so as to satisfy the requirements of the down-sampling unit 1606. According to one embodiment, the anti-aliasing filter 1604 for 2:1 down-sampling can be an 11-tap FIR filter, i.e., [1, 0, −5, 0, 20, 32, 20, 0, −5, 0, 1]/64. According to one embodiment, the anti-aliasing filter can be adapted to the content being received and/or co-designed with the quantization parameter (QP). The encoder controller 1622 can determine the selected sampling ratio and communicate with the down-sampling unit 1606 during pre-processing of the video data to provide the selected sampling ratio to the down-sampling unit 1606. For example, the encoder controller 1622 can adaptively select the filter type (separable or non-separable), the filter coefficients, and/or the filter length in either dimension based on statistics of the video data and/or the data transmission capabilities of the channel.
As shown in Figure 16, pre-processing of the video data can include down-sampling the video data using the down-sampling unit 1606. The down-sampling unit 1606 can down-sample at the sampling ratio M, as described in detail above. Video data can be sent to the down-sampling unit 1606 from the anti-aliasing filter 1604; alternatively, raw and/or decoded video data can be sent directly to the down-sampling unit 1606. In either case, the down-sampling unit 1606 can down-sample the video data to reduce its sampling rate. Down-sampling the video data may produce images and/or video of lower resolution than the original images and/or video represented by the video data. As noted above, the sampling ratio M used by the down-sampling unit 1606 can be adapted to the content being received and/or co-designed with the QP. For example, the encoder controller 1622 can adaptively select the down-sampling ratio, such as 1/3 or another rational fraction, based on the instantaneous video content and/or the data transmission capabilities of the channel.
The pre-processing performed by the anti-aliasing filter 1604 and/or the down-sampling unit 1606 can be controlled and/or assisted through communication with the encoder controller 1622. The encoder controller 1622 can additionally, or alternatively, control the quantization performed in processing the video data. The encoder controller 1622 can be configured to select coding parameters. For example, depending on the content, the encoder controller can use the motion information, residual data, and other statistics available in the video data to determine coding parameters and/or pre-processing parameters, such as the sampling ratio M.
Example Decoder Architecture
Figure 17 illustrates an example decoder architecture 1700 for the processing and post-processing performed to decode video data. Entropy decoding 1704, inverse quantization 1706, inverse transform 1708, and/or motion compensation 1720 can be parts of the decoder processing of the video data. Up-sampling unit 1712, low-pass filter 1714, de-blurring filter 1716, and/or decoder controller 1710 can be parts of the post-processing steps used in decoding the video data. These post-processing elements can be incorporated into the decoder 1700, operate independently of the decoder, or be configured outside the decoder. In any case, after the video data from channel 1702 has been decoded and the post-processing has been performed, the decoded video data can be sent via output 1718 to, for example, a storage medium or an output device.
As shown in Figure 17, video data is received over channel 1702, such as from an encoder or a storage medium. The received video data can include encoded video signals, video sequences, bitstreams, or any other data representing image or video content. The received video data can be processed using entropy decoding, inverse quantization, inverse transform, and/or motion compensation processes, as shown in Fig. 3. Processing of the encoded video data can be performed before or concurrently with post-processing. The encoded video data can be post-processed by up-sampling unit 1712, low-pass filter 1714, de-blurring filter 1716, and/or decoder controller 1710. The decoder controller 1710 can receive an indication of the selected sampling ratio and send the selected sampling ratio to the up-sampling unit 1712. The up-sampling unit 1712, low-pass filter 1714, de-blurring filter 1716, and/or decoder controller 1710 can communicate with one another and/or with other elements of the decoder 1700 to decode the received video data so that it can be stored and/or output to a display. In some embodiments, the low-pass filter 1714 can be designed using the techniques described above with reference to Figures 11-15.
As shown in Figure 17, post-processing of the video data can include up-sampling the video data. The up-sampling ratio can be the selected sampling ratio M_i, as described above. The video data is sent to the up-sampling unit 1712 after being processed by the decoder 1700 (as shown). The up-sampling unit 1712 can increase the resolution and/or quality of the reconstructed video. For example, the up-sampling of the video data can correspond to the down-sampling performed on the video data during pre-processing at the encoder. Similar to the down-sampling unit 1606 (Figure 16), the up-sampling unit 1712 has a dynamic sampling ratio for up-sampling the video data.
According to one embodiment, post-processing of the video data can include a low-pass interpolation filter 1714. The low-pass interpolation filter can provide anti-aliasing and improve the quality and sharpness of the video content represented by the video data. According to one embodiment, the low-pass interpolation filter used for 1:2 up-sampling comprises a 4-tap FIR filter, i.e., [0.25, 0.75, 0.75, 0.25]. The low-pass interpolation filter 1714 can be adapted to the content and/or co-designed with the QP. According to one embodiment, the decoder controller adaptively selects the filter type, filter coefficients, and/or filter length in either dimension. The selection made by the decoder controller can be based on statistics and/or syntax in the coded video data, such as the statistics of previous frames and the QP of the current frame, as described in detail above.
As shown in Figure 17, in some embodiments, post-processing of the video data includes a de-blurring (or sharpening) filter 1716. The de-blurring filter 1716 can be used to compensate for blur caused by down-sampling and/or low-pass filtering. According to one embodiment, the de-blurring filter can comprise a two-dimensional Laplacian filter, i.e., [0, 0, 0, 0, 1, 0, 0, 0, 0] + [−1, −1, −1, −1, 8, −1, −1, −1, −1]/5. The de-blurring filter can be adapted to the content and/or co-designed with the QP. According to one embodiment, the decoder controller 1710 adaptively selects the filter type, filter coefficients, and/or filter length in either dimension. The selection can be based on statistics and/or syntax in the coded video bitstream, for example the statistics of previous frames and the QP of the current frame, as described in more detail above.
According to one embodiment, the encoder performing the pre-processing and the decoder performing the post-processing can be aware of each other. For example, the encoder can have a communication link (such as communication channel 16 in Fig. 1) over which information corresponding to the pre-processing of the video data is transmitted to the decoder. Similarly, the decoder can transmit information corresponding to the post-processing of the video data to the encoder over the communication link. Such a communication link enables the decoder to adjust its post-processing based on the pre-processing that took place at the encoder; similarly, it enables the encoder to adjust its pre-processing based on the post-processing that takes place at the decoder. If the pre-processing and post-processing are not performed at the encoder and decoder, respectively, similar communication links can also be established with the other entities that perform the pre-processing and/or post-processing of the video data.
Figure 18 illustrates an example embodiment of pre-processing video data in connection with a transcoder. As shown in Figure 18, video data 1804 can be received, such as a bitstream, a video signal, a video sequence, or any other data representing image or video content. The video data can be pre-processed by anti-aliasing filter 1808, down-sampler 1810, and/or encoder controller 1802, which can communicate with one another and/or with other elements of the encoder and/or decoder. Pre-processing of the received video data can be performed before, or concurrently with, the processing performed by the encoder and/or decoder. The video data can be pre-processed as described above in the discussion of pre-processing video data in Figure 16.
As described above with respect to Fig. 1, video encoded according to the systems and methods described herein can be transmitted, for example, over a communication network via communication channel 16, which can include wired connections and/or wireless connections. The communication network can be any suitable type of communication system, as described in more detail below with respect to Figs. 19A, 19B, 19C and 19D.
Figure 19A is a diagram of an example communication system 1900 in which one or more disclosed embodiments can be implemented. The communication system 1900 can be a multiple-access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users. The communication system 1900 can enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth. For example, the communication system 1900 can employ one or more channel-access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), and the like.
As shown in Figure 19A, the communication system 1900 can include wireless transmit/receive units (WTRUs) 1902a, 1902b, 1902c, 1902d, a radio access network (RAN) 1904, a core network 1906, a public switched telephone network (PSTN) 1908, the Internet 1910, and other networks 1912, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements. Each of the WTRUs 1902a, 1902b, 1902c, 1902d can be any type of device configured to operate and/or communicate in a wireless environment. By way of example, the WTRUs 1902a, 1902b, 1902c, 1902d can be configured to transmit and/or receive wireless signals, and can include user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, consumer electronics, or any other terminal capable of receiving and processing compressed video communications.
The communication system 1900 can also include a base station 1914a and a base station 1914b. Each of the base stations 1914a, 1914b can be any type of device configured to wirelessly interface with at least one of the WTRUs 1902a, 1902b, 1902c, 1902d to facilitate access to one or more communication networks (e.g., the core network 1906, the Internet 1910, and/or the networks 1912). For example, the base stations 1914a, 1914b can be a base transceiver station (BTS), a Node B, an eNode B, a Home Node B, a Home eNode B, a site controller, an access point (AP), a wireless router, and the like. While the base stations 1914a, 1914b are each depicted as a single element, it will be appreciated that the base stations 1914a, 1914b can include any number of interconnected base stations and/or network elements.
The base station 1914a may be part of the RAN 1904, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), or relay nodes. The base station 1914a and/or the base station 1914b may be configured to transmit and/or receive wireless signals within a particular geographic region, which may be referred to as a cell (not shown). The cell may further be divided into cell sectors. For example, the cell associated with the base station 1914a may be divided into three sectors. Thus, in one embodiment, the base station 1914a may include three transceivers, i.e., one for each sector of the cell. In another embodiment, the base station 1914a may employ multiple-input multiple-output (MIMO) technology and, therefore, may utilize multiple transceivers for each sector of the cell.
The base stations 1914a, 1914b may communicate with one or more of the WTRUs 1902a, 1902b, 1902c, 1902d over an air interface 1916, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, infrared (IR), ultraviolet (UV), visible light, etc.). The air interface 1916 may be established using any suitable radio access technology (RAT).
More specifically, as noted above, the communications system 1900 may be a multiple-access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like. For example, the base station 1914a in the RAN 1904 and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 1916 using wideband CDMA (WCDMA). WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+). HSPA may include High-Speed Downlink Packet Access (HSDPA) and/or High-Speed Uplink Packet Access (HSUPA).
In another embodiment, the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 1916 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A).
In other embodiments, the base station 1914a and the WTRUs 1902a, 1902b, 1902c may implement radio technologies such as IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
The base station 1914b in Figure 19A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT to facilitate wireless connectivity in a localized area, such as a place of business, a home, a vehicle, or a campus. In one embodiment, the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN). In another embodiment, the base station 1914b and the WTRUs 1902c, 1902d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN). In yet another embodiment, the base station 1914b and the WTRUs 1902c, 1902d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, etc.) to establish a picocell or femtocell. As shown in Figure 19A, the base station 1914b may have a direct connection to the Internet 1910. Thus, the base station 1914b may not be required to access the Internet 1910 via the core network 1906.
The RAN 1904 may be in communication with the core network 1906, which may be any type of network configured to provide voice, data, applications, and/or voice over Internet protocol (VoIP) services to one or more of the WTRUs 1902a, 1902b, 1902c, 1902d. For example, the core network 1906 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication. Although not shown in Figure 19A, it will be appreciated that the RAN 1904 and/or the core network 1906 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 1904 or a different RAT. For example, in addition to being connected to the RAN 1904, which may be utilizing an E-UTRA radio technology, the core network 1906 may also be in communication with another RAN (not shown) employing a GSM radio technology.
The core network 1906 may also serve as a gateway for the WTRUs 1902a, 1902b, 1902c, 1902d to access the PSTN 1908, the Internet 1910, and/or the other networks 1912. The PSTN 1908 may include circuit-switched telephone networks that provide plain old telephone service (POTS). The Internet 1910 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP), and the internet protocol (IP) in the TCP/IP internet protocol suite. The networks 1912 may include wired or wireless communications networks owned and/or operated by other service providers. For example, the networks 1912 may include another core network connected to one or more RANs, which may employ the same RAT as the RAN 1904 or a different RAT.
Some or all of the WTRUs 1902a, 1902b, 1902c, 1902d in the communications system 1900 may include multi-mode capabilities, i.e., the WTRUs 1902a, 1902b, 1902c, 1902d may include multiple transceivers for communicating with different wireless networks over different wireless links. For example, the WTRU 1902c shown in Figure 19A may be configured to communicate with the base station 1914a, which may employ a cellular-based radio technology, and with the base station 1914b, which may employ an IEEE 802 radio technology.
Figure 19B is a system diagram of an example WTRU 1902. As shown in Figure 19B, the WTRU 1902 may include a processor 1918, a transceiver 1920, a transmit/receive element 1922, a speaker/microphone 1924, a keypad 1926, a display/touchpad 1928, non-removable memory 1930, removable memory 1932, a power source 1934, a global positioning system (GPS) chipset 1936, and other peripherals 1938. It will be appreciated that the WTRU 1902 may include any sub-combination of the foregoing elements while remaining consistent with the embodiments above.
The processor 1918 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a graphics processing unit (GPU), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) circuit, any other type of integrated circuit (IC), a state machine, and the like. The processor 1918 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 1902 to operate in a wireless environment. The processor 1918 may be coupled to the transceiver 1920, which may be coupled to the transmit/receive element 1922. While Figure 19B depicts the processor 1918 and the transceiver 1920 as separate components, it will be appreciated that the processor 1918 and the transceiver 1920 may be integrated together in an electronic package or chip.
The transmit/receive element 1922 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 1914a) over the air interface 1916. For example, in one embodiment, the transmit/receive element 1922 may be an antenna configured to transmit and/or receive RF signals. In another embodiment, the transmit/receive element 1922 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example. In yet another embodiment, the transmit/receive element 1922 may be configured to transmit and receive both RF and light signals. It will be appreciated that the transmit/receive element 1922 may be configured to transmit and/or receive any combination of wireless signals.
In addition, although the transmit/receive element 1922 is depicted in Figure 19B as a single element, the WTRU 1902 may include any number of transmit/receive elements 1922. More specifically, the WTRU 1902 may employ MIMO technology. Thus, in one embodiment, the WTRU 1902 may include two or more transmit/receive elements 1922 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 1916.
The transceiver 1920 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 1922 and to demodulate the signals that are received by the transmit/receive element 1922. As noted above, the WTRU 1902 may have multi-mode capabilities. Thus, the transceiver 1920 may include multiple transceivers for enabling the WTRU 1902 to communicate via multiple RATs, such as UTRA and IEEE 802.11, for example.
The processor 1918 of the WTRU 1902 may be coupled to, and may receive user input data from, the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928 (e.g., a liquid crystal display (LCD) display unit or an organic light-emitting diode (OLED) display unit). The processor 1918 may also output user data to the speaker/microphone 1924, the keypad 1926, and/or the display/touchpad 1928. In addition, the processor 1918 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 1930 and/or the removable memory 1932. The non-removable memory 1930 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device. The removable memory 1932 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like. In other embodiments, the processor 1918 may access information from, and store data in, memory that is not physically located on the WTRU 1902, such as on a server or a home computer (not shown).
The processor 1918 may receive power from the power source 1934 and may be configured to distribute and/or control the power to the other components in the WTRU 1902. The power source 1934 may be any suitable device for powering the WTRU 1902. For example, the power source 1934 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
The processor 1918 may also be coupled to the GPS chipset 1936, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 1902. In addition to, or in lieu of, the information from the GPS chipset 1936, the WTRU 1902 may receive location information over the air interface 1916 from a base station (e.g., base stations 1914a, 1914b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 1902 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
The processor 1918 may further be coupled to other peripherals 1938, which may include one or more software and/or hardware modules that provide additional features, functionality, and/or wired or wireless connectivity. For example, the peripherals 1938 may include an accelerometer, a digital compass (e-compass), a satellite transceiver, a digital camera (for photographs or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands-free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, and the like.
Figure 19C is a system diagram of the RAN 1904 and the core network 1906 according to an embodiment. As noted above, the RAN 1904 may employ a UTRA radio technology to communicate with the WTRUs 1902a, 1902b, and 1902c over the air interface 1916. The RAN 1904 may also be in communication with the core network 1906. As shown in Figure 19C, the RAN 1904 may include Node Bs 1940a, 1940b, 1940c, each of which may include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. Each of the Node Bs 1940a, 1940b, 1940c may be associated with a particular cell (not shown) within the RAN 1904. The RAN 1904 may also include RNCs 1942a, 1942b. It will be appreciated that the RAN 1904 may include any number of Node Bs and RNCs while remaining consistent with an embodiment.
As shown in Figure 19C, the Node Bs 1940a, 1940b may be in communication with the RNC 1942a. Additionally, the Node B 1940c may be in communication with the RNC 1942b. The Node Bs 1940a, 1940b, 1940c may communicate with the respective RNCs 1942a, 1942b via an Iub interface. The RNCs 1942a, 1942b may be in communication with one another via an Iur interface. Each of the RNCs 1942a, 1942b may be configured to control the respective Node Bs 1940a, 1940b, 1940c to which it is connected. In addition, each of the RNCs 1942a, 1942b may be configured to carry out or support other functionality, such as outer loop power control, load control, admission control, packet scheduling, handover control, macrodiversity, security functions, data encryption, and the like.
The core network 1906 shown in Figure 19C may include a media gateway (MGW) 1944, a mobile switching center (MSC) 1946, a serving GPRS support node (SGSN) 1948, and/or a gateway GPRS support node (GGSN) 1950. While each of the foregoing elements is depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
The RNC 1942a in the RAN 1904 may be connected to the MSC 1946 in the core network 1906 via an IuCS interface. The MSC 1946 may be connected to the MGW 1944. The MSC 1946 and the MGW 1944 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices.
The RNC 1942a in the RAN 1904 may also be connected to the SGSN 1948 in the core network 1906 via an IuPS interface. The SGSN 1948 may be connected to the GGSN 1950. The SGSN 1948 and the GGSN 1950 may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.
As noted above, the core network 1906 may also be connected to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
Figure 19D is a system diagram of the RAN 1904 and the core network 1906 according to an embodiment. As noted above, the RAN 1904 may employ an E-UTRA radio technology to communicate with the WTRUs 1902a, 1902b, and 1902c over the air interface 1916. The RAN 1904 may also be in communication with the core network 1906.
The RAN 1904 may include eNode Bs 1960a, 1960b, 1960c, though it will be appreciated that the RAN 1904 may include any number of eNode Bs while remaining consistent with an embodiment. The eNode Bs 1960a, 1960b, 1960c may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. In one embodiment, the eNode Bs 1960a, 1960b, 1960c may implement MIMO technology. Thus, the eNode B 1960a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a.
Each of the eNode Bs 1960a, 1960b, 1960c may be associated with a particular cell (not shown) and may be configured to handle radio resource management decisions, handover decisions, scheduling of users in the uplink and/or downlink, and the like. As shown in Figure 19D, the eNode Bs 1960a, 1960b, 1960c may communicate with one another over an X2 interface.
The core network 1906 shown in Figure 19D may include a mobility management entity (MME) 1962, a serving gateway 1964, and a packet data network (PDN) gateway 1966. While each of the foregoing elements is depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
The MME 1962 may be connected to each of the eNode Bs 1960a, 1960b, 1960c in the RAN 1904 via an S1 interface and may serve as a control node. For example, the MME 1962 may be responsible for authenticating users of the WTRUs 1902a, 1902b, 1902c, bearer activation/deactivation, selecting a particular serving gateway during an initial attach of the WTRUs 1902a, 1902b, 1902c, and the like. The MME 1962 may also provide a control plane function for switching between the RAN 1904 and other RANs (not shown) that employ other radio technologies, such as GSM or WCDMA.
The serving gateway 1964 may be connected to each of the eNode Bs 1960a, 1960b, 1960c in the RAN 1904 via the S1 interface. The serving gateway 1964 may generally route and forward user data packets to and from the WTRUs 1902a, 1902b, 1902c. The serving gateway 1964 may also perform other functions, such as anchoring user planes during inter-eNode B handovers, triggering paging when downlink data is available for the WTRUs 1902a, 1902b, 1902c, and managing and storing contexts of the WTRUs 1902a, 1902b, 1902c, and the like.
The serving gateway 1964 may also be connected to the PDN gateway 1966, which may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices.
The core network 1906 may facilitate communications with other networks. For example, the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices. For example, the core network 1906 may include, or may communicate with, an IP gateway (e.g., an IP multimedia subsystem (IMS) server) that serves as an interface between the core network 1906 and the PSTN 1908. In addition, the core network 1906 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
Figure 19E is a system diagram of the RAN 1904 and the core network 1906 according to an embodiment. The RAN 1904 may be an access service network (ASN) that employs IEEE 802.16 radio technology to communicate with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. As will be further discussed below, the communication links between the different functional entities of the WTRUs 1902a, 1902b, 1902c, the RAN 1904, and the core network 1906 may be defined as reference points.
As shown in Figure 19E, the RAN 1904 may include base stations 1970a, 1970b, 1970c and an ASN gateway 1972, though it will be appreciated that the RAN 1904 may include any number of base stations and ASN gateways while remaining consistent with an embodiment. The base stations 1970a, 1970b, 1970c may each be associated with a particular cell (not shown) in the RAN 1904 and may each include one or more transceivers for communicating with the WTRUs 1902a, 1902b, 1902c over the air interface 1916. In one embodiment, the base stations 1970a, 1970b, 1970c may implement MIMO technology. Thus, the base station 1970a, for example, may use multiple antennas to transmit wireless signals to, and receive wireless signals from, the WTRU 1902a. The base stations 1970a, 1970b, 1970c may also provide mobility management functions, such as handoff triggering, tunnel establishment, radio resource management, traffic classification, quality of service (QoS) policy enforcement, and the like. The ASN gateway 1972 may serve as a traffic aggregation point and may be responsible for paging, caching of subscriber profiles, routing to the core network 1906, and the like.
The air interface 1916 between the WTRUs 1902a, 1902b, 1902c and the RAN 1904 may be defined as an R1 reference point that implements the IEEE 802.16 specification. In addition, each of the WTRUs 1902a, 1902b, 1902c may establish a logical interface (not shown) with the core network 1906. The logical interface between the WTRUs 1902a, 1902b, 1902c and the core network 1906 may be defined as an R2 reference point, which may be used for authentication, authorization, IP host configuration management, and/or mobility management.
The communication link between each of the base stations 1970a, 1970b, 1970c may be defined as an R8 reference point that includes protocols for facilitating WTRU handovers and the transfer of data between base stations. The communication link between the base stations 1970a, 1970b, 1970c and the ASN gateway 1972 may be defined as an R6 reference point. The R6 reference point may include protocols for facilitating mobility management based on mobility events associated with each of the WTRUs 1902a, 1902b, 1902c.
As shown in Figure 19E, the RAN 1904 may be connected to the core network 1906. The communication link between the RAN 1904 and the core network 1906 may be defined as an R3 reference point that includes protocols for facilitating data transfer and mobility management capabilities, for example. The core network 1906 may include a mobile IP home agent (MIP-HA) 1974, an authentication, authorization, accounting (AAA) server 1976, and a gateway 1978. While each of the foregoing elements is depicted as part of the core network 1906, it will be appreciated that any one of these elements may be owned and/or operated by an entity other than the core network operator.
The MIP-HA 1974 may be responsible for IP address management and may enable the WTRUs 1902a, 1902b, 1902c to roam between different ASNs and/or different core networks. The MIP-HA 1974 may provide the WTRUs 1902a, 1902b, 1902c with access to packet-switched networks, such as the Internet 1910, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and IP-enabled devices. The AAA server 1976 may be responsible for user authentication and for supporting user services. The gateway 1978 may facilitate interworking with other networks. For example, the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to circuit-switched networks, such as the PSTN 1908, to facilitate communications between the WTRUs 1902a, 1902b, 1902c and traditional land-line communications devices. In addition, the gateway 1978 may provide the WTRUs 1902a, 1902b, 1902c with access to the networks 1912, which may include other wired or wireless networks that are owned and/or operated by other service providers.
Although not shown in Figure 19E, it will be appreciated that the RAN 1904 may be connected to other ASNs and that the core network 1906 may be connected to other core networks. The communication link between the RAN 1904 and the other ASNs may be defined as an R4 reference point, which may include protocols for coordinating the mobility of the WTRUs 1902a, 1902b, 1902c between the RAN 1904 and the other ASNs. The communication link between the core network 1906 and the other core networks may be defined as an R5 reference point, which may include protocols for facilitating interworking between home core networks and visited core networks.
Embodiments
A method for video coding, the method comprising: receiving video data; determining a sampling error value at each sampling ratio of a plurality of sampling ratios; determining, for a bit rate, an encoding error value at each sampling ratio of the plurality of sampling ratios; adding the sampling error value and the encoding error value at each sampling ratio of the plurality of sampling ratios; selecting a sampling ratio of the plurality of sampling ratios based on the sum of the sampling error value and the encoding error value at the selected sampling ratio; downsampling the video data with the selected sampling ratio; and encoding the downsampled video data.
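By way of illustration only, the selection rule of this embodiment may be sketched as follows. The two error models here are deliberate placeholders; the PSD-based and model-based estimators described in the later embodiments could be substituted for them.

```python
# A minimal sketch of the selection rule: for each candidate sampling ratio,
# add a sampling error estimate to an encoding error estimate for the target
# bit rate, then keep the ratio with the smallest sum.
from typing import Callable, Sequence

def select_sampling_ratio(
    ratios: Sequence[float],
    bitrate: float,
    sampling_error: Callable[[float], float],
    encoding_error: Callable[[float, float], float],
) -> float:
    totals = {r: sampling_error(r) + encoding_error(r, bitrate) for r in ratios}
    return min(totals, key=totals.get)

# Placeholder error shapes, for illustration only:
sampling_err = lambda r: (1.0 - r) ** 2                    # heavier downsampling loses more detail
encoding_err = lambda r, rate: r / (1.0 + rate / 1000.0)   # fewer pixels code more cheaply at rate

best = select_sampling_ratio([1.0, 0.75, 0.5, 0.25], bitrate=500.0,
                             sampling_error=sampling_err,
                             encoding_error=encoding_err)
print("selected sampling ratio:", best)
```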
The method of the preceding embodiment, wherein selecting a sampling ratio of the plurality of sampling ratios comprises selecting the sampling ratio of the plurality of sampling ratios that results in the smallest sum of the sampling error value and the encoding error value.
The method of any preceding embodiment, wherein selecting a sampling ratio of the plurality of sampling ratios comprises selecting a sampling ratio of the plurality of sampling ratios that results in a sum of the sampling error value and the encoding error value having an overall error value below an overall error threshold.
The method of any preceding embodiment, wherein the sampling error value is based on a power spectral density (PSD) of the video data and an estimate of the PSD of the downsampled video data.
The method of any preceding embodiment, wherein the estimate of the PSD of the downsampled video data is a function, and wherein at least one parameter of the function is determined by at least one characteristic of the video data.
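By way of illustration only, the following sketch computes a PSD-based sampling error by measuring the spectral power that falls outside the band retained after downsampling. Note that it measures the discarded power directly from the source PSD rather than through the parametric estimate described in this embodiment; the direct measurement is an assumption made to keep the example short.

```python
# Sketch: estimate a frame's PSD with the FFT, then sum the power beyond the
# Nyquist limit of the downsampled grid as the energy lost to downsampling.
import numpy as np

def psd_2d(frame: np.ndarray) -> np.ndarray:
    spectrum = np.fft.fftshift(np.fft.fft2(frame))
    return np.abs(spectrum) ** 2 / frame.size

def sampling_error_from_psd(frame: np.ndarray, ratio: float) -> float:
    """Power outside the band retained after downsampling by `ratio`."""
    psd = psd_2d(frame)
    h, w = psd.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))   # normalized frequencies in [-0.5, 0.5)
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    kept = (np.abs(FY) <= 0.5 * ratio) & (np.abs(FX) <= 0.5 * ratio)
    return float(psd[~kept].sum())

frame = np.random.rand(256, 256)
print(sampling_error_from_psd(frame, 0.5))
```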
The method of any preceding embodiment, wherein the sampling error value is based on a difference between the received video data and anti-alias filtered video data.
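By way of illustration only, the filtering-based sampling error of this embodiment may be sketched as follows. The 3x3 box filter and the mean-squared-error distance are assumptions for the example; the disclosure fixes neither.

```python
# Sketch: the sampling error is taken as the difference between the received
# frame and its anti-alias filtered version.
import numpy as np
from scipy import ndimage

def filtering_based_sampling_error(frame: np.ndarray) -> float:
    filtered = ndimage.uniform_filter(frame, size=3)  # stand-in anti-alias filter
    return float(np.mean((frame - filtered) ** 2))    # energy removed by filtering

print(filtering_based_sampling_error(np.random.rand(256, 256)))
```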
The method of any preceding embodiment, wherein the encoding error value is based on an encoding error model, and wherein the encoding error model is a function of the bit rate and the sampling ratio.
The method of any preceding embodiment, wherein the encoding error model comprises a first parameter and a second parameter, and wherein each of the first parameter and the second parameter is determined by at least one characteristic of the video data.
The method of any preceding embodiment, further comprising: determining, for each bit rate of a plurality of bit rates, a bits-per-pixel value; determining, for each bit rate of the plurality of bit rates, a distortion value; determining, for each bit rate of the plurality of bit rates, a plurality of estimated distortion values based on a plurality of values of the first parameter and a plurality of values of the second parameter of the encoding error model; and determining a selected value of the first parameter and a selected value of the second parameter of the encoding error model such that a difference between the plurality of distortion values and the plurality of estimated distortion values is minimized.
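By way of illustration only, the parameter-fitting step of this embodiment may be sketched as a grid search: distortion is measured at several bit rates, each bit rate is converted to bits per pixel, the model is evaluated over candidate values of the first and second parameters, and the pair whose estimates are closest to the measurements is kept. The exponential model form is an assumption for the example; the disclosure only requires a two-parameter model of bit rate and sampling ratio.

```python
# Sketch of fitting the two encoding error model parameters by grid search.
# The assumed model form is D(bpp) = p1 * exp(-p2 * bpp).
import numpy as np

def fit_encoding_error_model(bitrates, distortions, width, height, fps,
                             p1_grid, p2_grid):
    bpp = np.asarray(bitrates, dtype=float) / (width * height * fps)
    d = np.asarray(distortions, dtype=float)
    best, best_err = None, np.inf
    for p1 in p1_grid:                       # candidate first-parameter values
        for p2 in p2_grid:                   # candidate second-parameter values
            est = p1 * np.exp(-p2 * bpp)     # estimated distortion per bit rate
            err = np.sum((d - est) ** 2)     # difference to the measurements
            if err < best_err:
                best, best_err = (p1, p2), err
    return best

params = fit_encoding_error_model(
    bitrates=[250e3, 500e3, 1e6, 2e6],
    distortions=[40.0, 25.0, 14.0, 8.0],
    width=640, height=360, fps=30,
    p1_grid=np.linspace(30, 80, 26),
    p2_grid=np.linspace(1, 40, 40),
)
print("fitted (first, second) parameters:", params)
```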
The method of any preceding embodiment, further comprising: selecting the value of the first parameter from a first look-up table; and selecting the value of the second parameter from a second look-up table.
The method of any preceding embodiment, further comprising: determining a power spectral density of the video data, wherein the values of the first and second parameters are based on a DC component of the power spectral density.
The method of any preceding embodiment, further comprising: determining a power spectral density of the video data, wherein the values of the first and second parameters are based on a rate of decay toward the high-frequency band of the power spectral density.
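By way of illustration only, the two PSD characteristics named in this and the preceding embodiment may be extracted as in the following sketch. Fitting the slope of log power against log frequency to obtain the decay rate is an assumed method; the disclosure does not prescribe how the decay rate is measured.

```python
# Sketch: extract the DC component of a frame's PSD and the rate at which
# the spectrum falls off toward high frequencies.
import numpy as np

def psd_features(frame: np.ndarray):
    psd = np.abs(np.fft.fft2(frame)) ** 2 / frame.size
    dc = float(psd[0, 0])                        # DC component of the PSD
    h, w = psd.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2).ravel()  # radial frequency per bin
    power = psd.ravel()
    mask = radius > 0                            # exclude the DC bin
    # Decay rate = slope of log(power) vs log(frequency); steeper = faster.
    slope, _ = np.polyfit(np.log(radius[mask]), np.log(power[mask] + 1e-12), 1)
    return dc, float(-slope)

frame = np.random.rand(128, 128)
dc, decay = psd_features(frame)
print(f"DC component: {dc:.3f}, high-frequency decay rate: {decay:.3f}")
```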
The method of any preceding embodiment, wherein the at least one characteristic is a complexity value of the received video data, and wherein the complexity value is received from one of a user input and a network node.
The method of any preceding embodiment, further comprising: receiving an indication of the bit rate from a network node.
The method of any preceding embodiment, further comprising: after selecting the sampling ratio of the plurality of sampling ratios, receiving an indication of a second bit rate; determining, for the second bit rate, an updated encoding error value at a sampling ratio of the plurality of sampling ratios; selecting an updated sampling ratio based on the sum of the sampling error value and the updated encoding error value; downsampling the input video with the updated sampling ratio; and encoding the downsampled video sequence.
The method of any preceding embodiment, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is different from the vertical sampling ratio.
The method of any preceding embodiment, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is identical to the vertical sampling ratio.
The method of any preceding embodiment, wherein a first selection of the sampling ratio is performed at a beginning of the received video data, and at least a second selection of the sampling ratio is performed during the received video data.
A method for video decoding, the method comprising: receiving compressed video data; receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a sum of a sampling error value and an encoding error value over a plurality of sampling ratios; decoding the compressed video data to form reconstructed video data; upsampling the reconstructed video data with the selected sampling ratio to increase the resolution of the reconstructed video data; and outputting the upsampled video data.
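By way of illustration only, the decoder-side behavior of this embodiment may be sketched as follows. The decode stub stands in for a real video decoder, and bilinear interpolation is one possible upsampling filter rather than one mandated by the disclosure.

```python
# Sketch: decode compressed video data, then upsample the reconstruction
# with the signaled sampling ratio to restore the original resolution.
import numpy as np
from scipy import ndimage

def decode(compressed: bytes, height: int, width: int) -> np.ndarray:
    """Stub: a real system would run a full video decoder here."""
    rng = np.random.default_rng(len(compressed))
    return rng.random((height, width))

def upsample(frame: np.ndarray, ratio: float) -> np.ndarray:
    """Invert a downsampling by `ratio` (<= 1) with bilinear interpolation."""
    return ndimage.zoom(frame, 1.0 / ratio, order=1)

bitstream = b"\x00" * 1024                  # stand-in for compressed video data
signaled_ratio = 0.5                        # indication of the selected ratio
reconstructed = decode(bitstream, 540, 960)
output = upsample(reconstructed, signaled_ratio)
print(output.shape)                         # (1080, 1920)
```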
A video decoding system, the system comprising a video decoder configured to: receive compressed video data; receive an indication of a selected sampling ratio, wherein the sampling ratio is based on a sum of a sampling error value and an encoding error value over a plurality of sampling ratios; decode the compressed video data to form reconstructed video data; upsample the reconstructed video data to increase the resolution of the reconstructed video data; and output the filtered video data.
The video decoding system of the preceding embodiment, further comprising: a wireless transmit/receive unit in communication with a communication system, wherein the wireless transmit/receive unit is configured to receive the video data from the communication system.
Although features and elements of the present invention are described above in particular combinations, one of ordinary skill in the art will appreciate that each feature or element can be used alone without the other features and elements, or in various combinations with any of the other features and elements of the present invention. In addition, the embodiments provided herein may be implemented in a computer program, software, or firmware incorporated in a computer-readable storage medium for execution by a computer or processor. Examples of computer-readable media include electronic signals (transmitted over wired or wireless connections) and computer-readable storage media. Examples of computer-readable storage media include, but are not limited to, a read-only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media (e.g., internal hard disks and removable disks), magneto-optical media, and optical media such as CD-ROM disks and digital versatile disks (DVDs). A processor in association with software may be used to implement a radio frequency transceiver for use in a WTRU, UE, terminal, base station, RNC, or any host computer.
Variations of the methods, apparatus, and systems described above are possible without departing from the scope of the invention. In view of the wide variety of embodiments that may be applied, it should be understood that the described embodiments are exemplary only and should not be taken as limiting the scope of the following claims.
Moreover, the embodiments described above make reference to processing platforms, computing systems, controllers, and other devices containing processors. These devices may contain at least one central processing unit ("CPU") and memory. In accordance with the practices of persons skilled in the art of computer programming, reference to acts and symbolic representations of operations or instructions may be performed by the various CPUs and memories. Such acts and operations or instructions may be referred to as being "executed," "computer executed," or "CPU executed."
One of ordinary skill in the art will appreciate that the acts and symbolically represented operations or instructions include the manipulation of electrical signals by the CPU. An electrical system represents data bits that can cause a resulting transformation or reduction of the electrical signals and the maintenance of data bits at memory locations in a memory system to thereby reconfigure or otherwise alter the CPU's operation, as well as other processing of signals. The memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, optical, or organic properties corresponding to or representative of the data bits. It should be understood that the exemplary embodiments are not limited to the above-mentioned platforms or CPUs, and that other platforms and CPUs may support the described methods.
The data bits may also be maintained on a computer readable medium, including magnetic disks, optical disks, and any other volatile (e.g., random access memory ("RAM")) or non-volatile (e.g., read-only memory ("ROM")) mass storage system readable by the CPU. The computer readable medium may include cooperating or interconnected computer readable media, which may exist exclusively on the processing system or be distributed among multiple interconnected processing systems that are local or remote to the processing system. It should be understood that the exemplary embodiments are not limited to the above-mentioned memories, and that other platforms and memories may support the described methods.
No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. In addition, as used herein, the article "a" is intended to include one or more items. Where only one item is intended, the term "one" or similar language is used. Further, the terms "any of" followed by a listing of a plurality of items and/or a plurality of categories of items, as used herein, are intended to include "any of," "any combination of," "any multiple of," and/or "any combination of multiples of" the items and/or the categories of items, individually or in conjunction with other items and/or other categories of items. Further, as used herein, the term "set" is intended to include any number of items, including zero. Further, as used herein, the term "number" is intended to include any number, including zero.
Moreover, the claims should not be read as limited to the described order or elements unless stated to that effect. In addition, use of the term "means" in any claim is intended to invoke 35 U.S.C. §112, paragraph 6, and any claim without the word "means" is not so intended.

Claims (21)

1. A method for video coding, the method comprising:
receiving video data;
determining a sampling error value at each sampling ratio of a plurality of sampling ratios;
determining, for a bit rate, an encoding error value at each sampling ratio of the plurality of sampling ratios;
adding the sampling error value and the encoding error value at each sampling ratio of the plurality of sampling ratios;
selecting a sampling ratio of the plurality of sampling ratios based on the sum of the sampling error value and the encoding error value at the selected sampling ratio;
downsampling the video data with the selected sampling ratio; and
encoding the downsampled video data.
2. The method of claim 1, wherein selecting a sampling ratio of the plurality of sampling ratios comprises selecting the sampling ratio of the plurality of sampling ratios that results in the smallest sum of the sampling error value and the encoding error value.
3. The method of claim 1, wherein selecting a sampling ratio of the plurality of sampling ratios comprises selecting a sampling ratio of the plurality of sampling ratios that results in a sum of the sampling error value and the encoding error value having an overall error value below an overall error threshold.
4. The method of claim 1, wherein the sampling error value is based on a power spectral density (PSD) of the video data and an estimate of the PSD of the downsampled video data.
5. The method of claim 4, wherein the estimate of the PSD of the downsampled video data is a function, and wherein at least one parameter of the function is determined by at least one characteristic of the video data.
6. The method of claim 1, wherein the sampling error value is based on a difference between the received video data and anti-alias filtered video data.
7. The method of claim 1, wherein the encoding error value is based on an encoding error model, and wherein the encoding error model is a function of the bit rate and the sampling ratio.
8. The method of claim 7, wherein the encoding error model comprises a first parameter and a second parameter, and wherein each of the first parameter and the second parameter is determined by at least one characteristic of the video data.
9. The method of claim 8, the method further comprising:
determining, for each bit rate of a plurality of bit rates, a bits-per-pixel value;
determining, for each bit rate of the plurality of bit rates, a distortion value;
determining, for each bit rate of the plurality of bit rates, a plurality of estimated distortion values based on a plurality of values of the first parameter and a plurality of values of the second parameter of the encoding error model; and
determining a selected value of the first parameter and a selected value of the second parameter of the encoding error model such that a difference between the plurality of distortion values and the plurality of estimated distortion values is minimized.
10. The method of claim 8, the method further comprising:
selecting the value of the first parameter from a first look-up table; and
selecting the value of the second parameter from a second look-up table.
11. The method of claim 8, the method further comprising:
determining a power spectral density of the video data, wherein the value of the first parameter and the value of the second parameter are based on a DC component of the power spectral density.
12. The method of claim 8, the method further comprising:
determining a power spectral density of the video data, wherein the value of the first parameter and the value of the second parameter are based on a rate of decay toward the high-frequency band of the power spectral density.
13. The method of claim 8,
wherein the at least one characteristic is a complexity value of the received video data; and
wherein the complexity value is received from one of a user input and a network node.
14. The method of claim 1, the method further comprising:
receiving an indication of the bit rate from a network node.
15. The method of claim 14, the method further comprising:
after selecting the sampling ratio of the plurality of sampling ratios, receiving an indication of a second bit rate;
determining, for the second bit rate, an updated encoding error value at each sampling ratio of the plurality of sampling ratios;
selecting an updated sampling ratio based on the sum of the sampling error value and the updated encoding error value;
downsampling the input video with the updated sampling ratio; and
encoding the downsampled video sequence.
16. The method of claim 1, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is different from the vertical sampling ratio.
17. The method of claim 1, wherein the sampling ratio comprises a horizontal sampling ratio and a vertical sampling ratio, and the horizontal sampling ratio is identical to the vertical sampling ratio.
18. The method of claim 1, wherein a first selection of the sampling ratio is performed at a beginning of the received video data, and at least a second selection of the sampling ratio is performed during the received video data.
19. A method for video decoding, the method comprising:
receiving compressed video data;
receiving an indication of a selected sampling ratio, wherein the sampling ratio is based on a sum of a sampling error value and an encoding error value over a plurality of sampling ratios;
decoding the compressed video data to form reconstructed video data;
upsampling the reconstructed video data with the selected sampling ratio to increase a resolution of the reconstructed video data; and
outputting the filtered video data.
20. A video decoding system, the system comprising:
a video decoder configured to:
receive compressed video data;
receive an indication of a selected sampling ratio, wherein the sampling ratio is based on a sum of a sampling error value and an encoding error value over a plurality of sampling ratios;
decode the compressed video data to form reconstructed video data;
upsample the reconstructed video data to increase a resolution of the reconstructed video; and
output the upsampled video data.
21. The video decoding system of claim 20, the system further comprising:
a wireless transmit/receive unit in communication with a communication system, wherein the wireless transmit/receive unit is configured to receive the video data from the communication system.
CN2011800628602A 2010-10-27 2011-10-27 Systems and methods for adaptive video coding Pending CN103283227A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US40732910P 2010-10-27 2010-10-27
US61/407,329 2010-10-27
PCT/US2011/058027 WO2012058394A1 (en) 2010-10-27 2011-10-27 Systems and methods for adaptive video coding

Publications (1)

Publication Number Publication Date
CN103283227A true CN103283227A (en) 2013-09-04

Family

ID=44906484

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011800628602A Pending CN103283227A (en) 2010-10-27 2011-10-27 Systems and methods for adaptive video coding

Country Status (5)

Country Link
EP (1) EP2633685A1 (en)
KR (1) KR20130105870A (en)
CN (1) CN103283227A (en)
AU (1) AU2011319844A1 (en)
WO (1) WO2012058394A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105874793A * 2013-10-07 2016-08-17 Vid Scale Inc Combined scalability processing for multi-layer video coding
CN112367147A * 2020-09-27 2021-02-12 Suzhou Xuanhuai Intelligent Technology Co., Ltd. Data display method and device, electronic equipment and computer readable medium

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014143008A1 (en) 2013-03-15 2014-09-18 Icelero Inc Method and system for improved video codec rate-distortion performance by pre and post-processing
US11381816B2 (en) 2013-03-15 2022-07-05 Crunch Mediaworks, Llc Method and system for real-time content-adaptive transcoding of video content on mobile devices to save network bandwidth during video sharing
CN103475880B * 2013-09-11 2016-08-24 Zhejiang University Low-complexity video transcoding method from H.264 to HEVC based on statistical analysis
US9600494B2 (en) * 2014-01-24 2017-03-21 Cisco Technology, Inc. Line rate visual analytics on edge devices
CN103945222B * 2014-04-21 2017-01-25 Fuzhou University Code rate control model updating method based on HEVC standards
CN105430395B * 2015-12-03 2018-04-27 Beihang University HEVC CTU-level bit-rate control method based on optimal bit allocation
WO2018018445A1 * 2016-07-27 2018-02-01 Wang Xiaoguang Method and system for sending video advertisement on the basis of video capacity
KR102119300B1 2017-09-15 2020-06-04 Seoul National University of Science and Technology Industry-Academic Cooperation Foundation Apparatus and method for encoding 360-degree video, recording medium for performing the method
AU2019286133B2 (en) 2018-06-15 2023-02-16 Huawei Technologies Co., Ltd. Method and apparatus for intra prediction
CN110876060B * 2018-08-31 2022-07-15 Wangsu Science & Technology Co., Ltd. Code rate adjusting method and device in coding process
WO2020080765A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Apparatuses and methods for performing artificial intelligence encoding and artificial intelligence decoding on image
WO2020080665A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on image
WO2020080873A1 (en) 2018-10-19 2020-04-23 Samsung Electronics Co., Ltd. Method and apparatus for streaming data
KR102436512B1 2019-10-29 2022-08-25 Samsung Electronics Co., Ltd. Method and apparatus for video encoding and method and apparatus for video decoding
KR20220003812A * 2020-07-02 2022-01-11 Samsung Electronics Co., Ltd. Electronic device for transmitting pre-processed content using a filter based on status of call channel and method for the same
US11184638B1 (en) * 2020-07-16 2021-11-23 Facebook, Inc. Systems and methods for selecting resolutions for content optimized encoding of video data

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1366779A * 2000-04-18 2002-08-28 Koninklijke Philips Electronics N.V. Bit rate allocation in joint bit rate transcoding
US20060165166A1 (en) * 2004-12-10 2006-07-27 Microsoft Corporation System and process for controlling the coding bit rate of streaming media data employing a limited number of supported coding bit rates
CN101389021A * 2007-09-14 2009-03-18 Huawei Technologies Co., Ltd. Video encoding/decoding method and apparatus

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3466032B2 * 1996-10-24 2003-11-10 Fujitsu Ltd Video encoding device and decoding device
CN101842812B * 2007-11-02 2012-05-30 École de Technologie Supérieure System and method for quality-aware selection of parameters in transcoding of digital images

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1366779A * 2000-04-18 2002-08-28 Koninklijke Philips Electronics N.V. Bit rate allocation in joint bit rate transcoding
US20060165166A1 (en) * 2004-12-10 2006-07-27 Microsoft Corporation System and process for controlling the coding bit rate of streaming media data employing a limited number of supported coding bit rates
CN101389021A * 2007-09-14 2009-03-18 Huawei Technologies Co., Ltd. Video encoding/decoding method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALFRED M. BRUCKSTEIN, MICHAEL ELAD, AND RON KIMMEL: "Down-Scaling for Better Transform Compression", IEEE Transactions on Image Processing, vol. 12, no. 9, 30 September 2003 (2003-09-30), XP011099900, DOI: 10.1109/TIP.2003.816023 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105874793A * 2013-10-07 2016-08-17 Vid Scale Inc Combined scalability processing for multi-layer video coding
CN105874793B * 2013-10-07 2019-10-11 Vid Scale Inc Method and apparatus for combined scalability processing for multi-layer video coding
US10986370B2 (en) 2013-10-07 2021-04-20 Vid Scale, Inc. Combined scalability processing for multi-layer video coding
CN112367147A * 2020-09-27 2021-02-12 Suzhou Xuanhuai Intelligent Technology Co., Ltd. Data display method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
AU2011319844A1 (en) 2013-06-13
KR20130105870A (en) 2013-09-26
EP2633685A1 (en) 2013-09-04
WO2012058394A1 (en) 2012-05-03

Similar Documents

Publication Publication Date Title
CN103283227A (en) Systems and methods for adaptive video coding
CN104604241B (en) For the decoded method of power-aware video and mobile equipment
US10237555B2 (en) System and method of video coding quantization and dynamic range control
CN104429071B (en) Codec framework for multi-layer video coding
KR101774675B1 (en) Adaptive upsampling for multi-layer video coding
CN110855994B (en) Apparatus for inter-layer reference picture enhancement for multi-layer video coding
TWI660621B (en) A video decoding method and a video encoding method
CN110087091B (en) Sampling grid information for spatial layers in multi-layer video coding
CN104396240B (en) Reference picture collection (RPS) signaling for scalable efficient video coding (HEVC)
CN104067621B (en) Use the video on even number odd integer conversion backstage and the apparatus and method of data processing
CN105765979B (en) Inter-layer prediction for scalable video coding
CN106233726A (en) The system and method strengthened for rgb video coding
CN104641651A (en) A distributed architecture for encoding and delivering video content
CN103797792A (en) Systems and methods for spatial prediction
WO2017020021A1 (en) Scalable high efficiency video coding to high efficiency video coding transcoding

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130904