Differences between the AVS video standard and H.264 core technology

One of the most important developments in video coding technology over the past few years has been the H.264/MPEG-4 AVC standard, developed by the Joint Video Team (JVT) of the ITU and ISO/IEC. During its development, the industry used many different names for the new standard. The ITU began work on H.26L (long-term) in 1997, introducing important new coding tools. The results were encouraging, so ISO decided to join forces with the ITU, forming the JVT to adopt a common standard. As a result, the standard is sometimes referred to as JVT, even though that is not its formal name. The ITU approved the new H.264 standard in May 2003. ISO approved the standard in October 2003 under the name MPEG-4 Part 10, Advanced Video Coding (AVC).

Improvements in H.264 implementation create new market opportunities

H.264/AVC represents a major breakthrough in compression efficiency, generally achieving approximately twice the compression efficiency of MPEG-2 and MPEG-4 Simple Profile. In the formal tests conducted by the JVT, H.264 achieved a coding efficiency improvement of more than 1.5 times in 85 cases, more than 2 times in 77% of cases, and up to 4 times in some cases. These improvements have created new market opportunities: for example, VHS-quality video at 600 kbps can be delivered as video on demand over ADSL lines, and high-definition movies can fit on regular DVDs without the need for a new laser head.

The H.264 standard defines three profiles: Baseline, Main, and Extended. Later, an amendment called the Fidelity Range Extensions (FRExt) introduced four additional profiles, known as the High profiles. In the early days, interest centered on the Baseline and Main profiles. The Baseline Profile reduces computational and system memory requirements and is optimized for low latency. Because of the inherent delay of B frames and the computational complexity of CABAC, it includes neither. The Baseline Profile is ideal for video telephony and other applications that require low-cost real-time encoding.

The Main Profile provides the highest compression efficiency, but it also requires much more processing power than the Baseline Profile, making it difficult to use for low-cost real-time encoding and low-latency applications. Broadcast and content storage applications are the most interested in the Main Profile, because it delivers the highest video quality at the lowest possible bit rate.

Although H.264 uses the same main coding functions as older standards, it also has many new features that together achieve an increase in coding efficiency. The main differences are summarized as follows:

Intra Prediction and Coding: H.264 employs spatial-domain intra prediction to predict the pixels of an intra macroblock (MB) from the pixels of neighboring blocks. It encodes the prediction residual signal and the prediction mode instead of the actual pixels in the block being coded. This can significantly improve the efficiency of intra coding.
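The idea of coding a prediction mode plus a residual can be sketched in a few lines. The example below is a minimal illustration, not the standard's full mode set: it implements only three 4×4 modes (vertical, horizontal, DC), and the function names and the SAD-based mode choice are assumptions made for this sketch.

```python
def predict_4x4(mode, top, left):
    """Build a 4x4 prediction from the 4 pixels above and the 4 pixels to the left."""
    if mode == "vertical":      # copy the row above down every row
        return [list(top) for _ in range(4)]
    if mode == "horizontal":    # copy the left column across every column
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":            # rounded mean of the 8 neighboring pixels
        dc = (sum(top) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)


def residual(block, pred):
    """Per-pixel difference between the actual block and its prediction."""
    return [[block[r][c] - pred[r][c] for c in range(4)] for r in range(4)]


def best_mode(block, top, left):
    """Pick the mode with the smallest sum of absolute differences (SAD)."""
    best = None
    for mode in ("vertical", "horizontal", "dc"):
        res = residual(block, predict_4x4(mode, top, left))
        sad = sum(abs(v) for row in res for v in row)
        if best is None or sad < best[0]:
            best = (sad, mode, res)
    return best[1], best[2]


# A block of vertical stripes is predicted perfectly from the row above it.
top = [10, 20, 30, 40]
left = [10, 10, 10, 10]
block = [list(top) for _ in range(4)]
mode, res = best_mode(block, top, left)
```

Here the encoder would transmit the mode name and an all-zero residual rather than sixteen pixel values.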

Interframe Prediction and Coding: Interframe coding in H.264 keeps the main features of older standards while adding flexibility and functionality, including several block size options for motion compensation, quarter-pixel motion compensation, multiple reference frames, generalized bidirectional prediction, and adaptive in-loop deblocking.

Variable motion vector block size: Motion compensation can be performed with different block sizes. A separate motion vector can be transmitted for blocks as small as 4×4, so up to 32 motion vectors can be transmitted for a single MB in the case of bidirectional prediction. Block sizes of 16×8, 8×16, 8×8, 8×4, and 4×8 are also supported. Smaller blocks handle fine motion detail better, which improves subjective quality, including the elimination of large blocking artifacts.
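The figure of 32 motion vectors follows directly from the block geometry. A minimal sketch of the arithmetic, assuming a 16×16 macroblock split uniformly into one partition size (the function name is hypothetical):

```python
MB_SIZE = 16  # a macroblock covers 16x16 luma samples


def max_motion_vectors(width, height, bidirectional=False):
    """Motion vectors needed when a macroblock is split into width x height blocks."""
    blocks = (MB_SIZE // width) * (MB_SIZE // height)
    # Bidirectional prediction carries one forward and one backward vector per block.
    return blocks * (2 if bidirectional else 1)


for w, h in [(16, 16), (16, 8), (8, 16), (8, 8), (8, 4), (4, 8), (4, 4)]:
    print(f"{w}x{h}: {max_motion_vectors(w, h)} vectors, "
          f"{max_motion_vectors(w, h, bidirectional=True)} if bidirectional")
```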

Quarter-pixel motion estimation: Motion compensation can be improved by allowing half-pixel and quarter-pixel motion vector resolution.
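As a rough illustration of how sub-pixel sample values are obtained, the sketch below interpolates along a single row of luma samples. It assumes the 6-tap filter (1, -5, 20, 20, -5, 1) that H.264 uses for half-sample luma positions and a rounded average for quarter-sample positions; the filter is not described in this article, and the 1-D restriction, the edge replication, and the function names are simplifications made for the example.

```python
def clip8(value):
    """Clamp to the 8-bit sample range."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half-sample value between row[i] and row[i + 1]."""
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    # Replicate edge samples when the filter window reaches past the row.
    window = [row[min(max(i - 2 + k, 0), last)] for k in range(6)]
    acc = sum(t * s for t, s in zip(taps, window))
    return clip8((acc + 16) >> 5)  # taps sum to 32, so divide by 32 with rounding


def quarter_pel(row, i):
    """Quarter-sample value between row[i] and the half-sample to its right."""
    return (row[i] + half_pel(row, i) + 1) >> 1


edge = [0, 0, 0, 0, 255, 255, 255, 255]
print(half_pel(edge, 3), quarter_pel(edge, 3))
```

A motion vector with quarter-pixel resolution then simply selects one of these interpolated positions instead of an integer sample.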

Multi-reference frame prediction: Up to 16 different reference frames can be used for inter-frame coding, which can improve subjective video quality and coding efficiency. Providing multiple reference frames also helps improve the error resilience of the H.264 bitstream. Note that this feature increases the memory requirements of both the encoder and the decoder, because multiple reference frames must be kept in memory.

Adaptive Loop Deblocking Filter: H.264 uses an adaptive deblocking filter that processes horizontal and vertical block edges within the prediction loop to eliminate distortion caused by block prediction errors. The filter usually operates on 4×4 block boundaries, where 3 pixels on each side of the boundary can be updated by a 4-tap filter.

Integer Transform: Earlier standards that used the DCT had to define tolerance ranges for rounding errors in fixed-point implementations of the inverse transform. Drift caused by IDCT precision mismatch between the encoder and the decoder was a source of quality loss. H.264 solves this problem with an integer 4×4 spatial transform that approximates the DCT. The 4×4 block size also helps reduce blocking and ringing artifacts.
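A short sketch shows why an integer transform avoids mismatch: every operation is exact integer arithmetic. The matrix below is the commonly cited H.264 4×4 forward core transform; it is not given in this article, and the scaling that the standard folds into the quantization stage is deliberately omitted here.

```python
# Commonly cited H.264 4x4 forward core transform matrix (scaling omitted).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_transform(block):
    """Y = CF * X * CF^T, computed entirely in integers."""
    return matmul(matmul(CF, block), transpose(CF))


flat = [[10] * 4 for _ in range(4)]   # a flat residual block
coeffs = forward_transform(flat)       # all energy lands in the DC coefficient
```

Because the result contains only integers, an encoder and a decoder that implement the same arithmetic produce bit-identical values.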

Quantization and transform coefficient scanning: The transform coefficients are quantized by scalar quantization without a widened dead zone. As in previous standards, each MB can choose a different quantization step size, but the step size increases at a compounding rate of approximately 12.5% rather than by a fixed increment. Finer quantization steps can also be used for the chroma components, especially when the luma coefficients are coarsely quantized.
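The compounding growth of the step size can be made concrete with a small table lookup. The sketch below assumes the commonly cited H.264 relationship in which the step size doubles for every increase of 6 in the quantization parameter (QP), with QP ranging from 0 to 51; the six base values are taken from that relationship, not from this article, and individual steps grow by roughly 10% to 18%, averaging about 12%.

```python
# Step sizes for QP 0..5; every further 6 QP values double the step size.
BASE_QSTEP = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp):
    """Quantization step size for a given QP (assumed range 0..51)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in 0..51")
    return BASE_QSTEP[qp % 6] * (1 << (qp // 6))


def quantize(coeff, qp):
    """Plain scalar quantization of one coefficient (rounds to nearest level)."""
    return int(round(coeff / qstep(qp)))


for qp in (0, 6, 12, 28, 51):
    print(qp, qstep(qp))
```

A compounding step size lets a single parameter span both near-lossless and very coarse quantization with a similar perceptual change per step.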

Entropy coding: Unlike previous standards, which provide multiple static VLC tables depending on the type of data involved, H.264 employs context-adaptive VLC (CAVLC) for transform coefficients and a single Universal VLC (UVLC) for all other symbols. The Main Profile also supports the new Context-Adaptive Binary Arithmetic Coding (CABAC). CAVLC outperforms previous VLC implementations, but at a higher cost than plain VLC.

CABAC uses probability models maintained at both the encoder and the decoder to process all syntax elements, including transform coefficients and motion vectors. To improve the efficiency of arithmetic coding, the underlying probability models adapt to the constantly changing statistics within a video frame through a method called context modeling. Context modeling provides a conditional probability estimate for each coded symbol. With appropriate context models, the coder can switch between different probability models according to the already coded symbols surrounding the symbol to be coded, thereby exploiting the redundancy between symbols. Each syntax element can maintain its own models (for example, motion vectors and transform coefficients have different models). Compared with the VLC entropy coding methods (UVLC/CAVLC), CABAC can save 10% of the bit rate.
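The benefit of context modeling can be illustrated without a full arithmetic coder. The sketch below is not CABAC itself: it uses simple frequency counts instead of CABAC's state tables, and it only estimates the ideal code length, -log2(p), that an arithmetic coder would approach. The class and function names are hypothetical.

```python
import math


class AdaptiveBitModel:
    """Adaptive probability estimate for one binary symbol in one context."""

    def __init__(self):
        self.counts = [1, 1]  # start from a uniform estimate

    def cost(self, bit):
        """Ideal code length in bits for this symbol under the current estimate."""
        return -math.log2(self.counts[bit] / sum(self.counts))

    def update(self, bit):
        self.counts[bit] += 1


def estimated_bits(bits, contexts):
    """Total ideal code length when each context keeps its own adaptive model."""
    models = {}
    total = 0.0
    for bit, ctx in zip(bits, contexts):
        model = models.setdefault(ctx, AdaptiveBitModel())
        total += model.cost(bit)
        model.update(bit)
    return total


bits = [0, 1] * 50                       # alternating symbols
one_context = estimated_bits(bits, [0] * 100)
two_contexts = estimated_bits(bits, [0, 1] * 50)  # context = position parity
print(round(one_context, 1), round(two_contexts, 1))
```

With a single model the alternating sequence looks random and costs about one bit per symbol; once the context separates the two cases, each model quickly becomes confident and the estimated cost collapses.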

Weighted prediction: A weighted sum of the forward and backward predictions is used to form the prediction for a bidirectionally interpolated macroblock, which can improve coding efficiency when the scene changes, especially during fades.
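A simplified sketch of the weighted sum, assuming integer weights with a power-of-two denominator. The parameter names are hypothetical, and the exact formula in the standard has additional detail (separate offsets per reference, for instance) that is left out here.

```python
def weighted_bipred(p0, p1, w0=1, w1=1, log_denom=1, offset=0):
    """Weighted sum of a forward sample p0 and a backward sample p1.

    The weights are scaled by 2**log_denom (log_denom >= 1), so the defaults
    reduce to a rounded average of the two predictions.
    """
    rounding = 1 << (log_denom - 1)
    value = ((p0 * w0 + p1 * w1 + rounding) >> log_denom) + offset
    return max(0, min(255, value))


# During a fade, the current frame may be much closer to one reference.
forward, backward = 100, 40
print(weighted_bipred(forward, backward))                            # plain average
print(weighted_bipred(forward, backward, w0=3, w1=1, log_denom=2))   # 3:1 weighting
```

If the true sample value is near 85, the 3:1 weighting leaves a far smaller residual to code than the plain average does.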

Fidelity Range Extensions: In July 2004, a new amendment called the Fidelity Range Extensions (FRExt) [11] was added to the H.264 standard. This extension adds a whole set of tools to H.264 and allows for additional color spaces, video formats, and bit depths. Support for lossless interframe coding and stereoscopic video has also been added. The FRExt amendment introduces four new profiles in H.264, namely:

• High Profile (HP): For standard 4:2:0 chroma sampling, 8 bits per component. This profile introduces new tools, which are detailed below.

• High 10 Profile (Hi10P): For standard 4:2:0 chroma sampling with 10-bit color, for higher-definition video display.

• High 4:2:2 Profile (H422P): 10-bit color, for source editing.

• High 4:4:4 Profile (H444P): 12-bit color, for the highest-quality source editing and color fidelity, with support for lossless coding of video regions and a new integer color space transform (from RGB to YUV and back).

Among new applications, H.264 HP is especially beneficial for broadcast and DVD. Some tests have shown H.264 HP performing three times better than MPEG-2. The main additional tools introduced in H.264 HP are described below.

Adaptive residual block size and integer 8×8 transform: The residual block used for transform coding can be switched between 8×8 and 4×4. A new 16-bit integer transform for 8×8 blocks is introduced. Small blocks can still use the previous 4×4 transform.

8×8 luma intra prediction: Eight modes have been added so that luma intra macroblocks can perform intra prediction on 8×8 blocks, in addition to the previous 16×16 and 4×4 blocks.

Quantization weighting: A new quantization weighting matrix is used to quantize the 8×8 transform coefficients.

Monochrome: Supports black/white video encoding.

AVS video standard

In 2002, the Audio Video Coding Standard (AVS) Working Group, established by China's Ministry of Information Industry, announced that it was preparing a national standard for mobile multimedia, broadcasting, DVD, and other applications. This video standard is called AVS [14] and consists of two related parts: AVS-M for mobile video applications and AVS 1.0 for broadcast and DVD. The AVS standard is similar to H.264.

AVS 1.0 supports both interlaced and progressive scan modes. P frames in AVS can use up to 2 forward reference frames, while B frames use one frame before and one after. In interlaced mode, 4 fields can be used as references. Frame/field coding in interlaced mode can be selected only at the frame level, unlike H.264, which allows MB-level adaptation of this option. AVS has a loop filter similar to that of H.264, and it can be turned off at the frame level. In addition, B frames do not require the loop filter. Intra prediction is performed in units of 8×8 blocks. Motion compensation (MC) allows quarter-pixel compensation for luma blocks. The block size for motion estimation (ME) can be 16×16, 16×8, 8×16, or 8×8. The transform is a 16-bit 8×8 integer transform (similar to WMV9). VLC is based on context-adaptive 2D run/level coding. Four different Exp-Golomb codes are used, and the code used for each quantized coefficient adapts to the preceding symbols in the same 8×8 block. Because Exp-Golomb tables are parameterized, the tables are smaller. The video quality of AVS 1.0 for progressive video sequences is slightly inferior to that of H.264 Main Profile at the same bit rate.

AVS-M is primarily targeted at mobile video applications and overlaps with the H.264 Baseline Profile. It supports only progressive video with I and P frames; B frames are not supported. The main AVS-M coding tools include intra prediction on 4×4 blocks, quarter-pixel motion compensation, integer transform and quantization, context-adaptive VLC, and a highly simplified loop filter. As in the H.264 Baseline Profile, the motion vector block size in AVS-M goes down to 4×4, so an MB can have up to 16 motion vectors. Multi-frame prediction is used, but only 2 reference frames are supported. In addition, a subset of the H.264 HRD/SEI messages is defined in AVS-M. The coding efficiency of AVS-M is approximately 0.3 dB lower than that of the H.264 Baseline Profile at the same settings, while decoder complexity is reduced by approximately 20%.

H.264 and AVS background

H.264/MPEG-4 AVC is a next-generation video coding standard jointly developed by ITU-T's VCEG (Video Coding Experts Group) and ISO/IEC's MPEG (Moving Picture Experts Group). Applications include video telephony, video conferencing, and more. The main feature of H.264 is a greatly increased compression ratio, more than double the compression efficiency of MPEG-2 and MPEG-4. The core of H.264 is the same as in previous standards: it still uses a hybrid coding framework based on prediction and transform coding. The implementation details differ considerably, however, and it is these detailed improvements that yield the large gain in compression efficiency. H.264 also offers good network adaptability and error resilience.

The birth of AVS can be described as a historic opportunity. Faced with high patent fees for standards such as H.264 and MPEG-2, China's digital video industry faces serious challenges. In addition, China is committed to improving the core competitiveness of its domestic digital audio and video industry. In June 2002, the Science and Technology Department of the Ministry of Information Industry approved the establishment of the "Digital Audio Video Codec Technical Standard Working Group". Bringing together research institutes and enterprises engaged in digital audio and video codec technology, the group responded to the needs of China's audio and video industry by proposing a source coding standard with independent Chinese intellectual property rights: the "Information Technology Advanced Audio Video Coding" series of standards, referred to as AVS (audio video coding standard). The independent AVS standard is at an internationally advanced level in terms of technology and performance. If this opportunity is seized, China may gain the initiative across the entire chain of technology, patents, standards, chips, systems, and industry.

Analysis and comparison of H.264 and AVS core technologies

H.264, like previous standards, is a hybrid coding framework. The AVS video standard uses a technical framework similar to that of H.264, including transform, quantization, entropy coding, intra prediction, inter prediction, loop filtering, and other modules. The differences in their core technologies include the following:

First, transform and quantization

H.264 applies block-based transform coding to the residual data to remove spatial redundancy from the original image, so that the image energy is concentrated in a small number of coefficients, with the DC coefficient generally the largest. This improves the compression ratio and enhances robustness to interference. Previous standards generally adopted the DCT. Its disadvantages are mismatch, where the data differs from the original after a forward and an inverse transform, and a high computational cost, because it uses real-number arithmetic. H.264 uses an integer transform based on 4×4 blocks.

AVS uses an 8×8 integer transform, which can be implemented without mismatch on 16-bit processors. For high-resolution video, it decorrelates more effectively than the 4×4 transform. AVS uses 64-level quantization, which can adapt to the bit rate and quality requirements of different applications and services.

Second, intra prediction

Both H.264 and AVS use intra prediction to predict the current block from adjacent pixels, using multiple prediction modes that represent spatial textures. H.264 luma prediction operates on 4×4 blocks and 16×16 blocks. For 4×4 blocks, directional predictions ranging from -135 degrees to +22.5 degrees plus a DC prediction give a total of 9 prediction modes. For 16×16 blocks, there are 4 prediction modes. Chroma prediction operates on 8×8 blocks with four prediction modes, similar to the four modes of intra 16×16 prediction, where DC is mode 0, horizontal is mode 1, vertical is mode 2, and plane is mode 3.

Third, inter prediction

H.264 inter prediction is a prediction mode that uses previously coded video frames and block-based motion compensation. It differs from inter prediction in previous standards in its wider range of block sizes, its use of sub-pixel motion vectors, and its use of multiple reference frames.

H.264 has 7 macroblock and sub-macroblock partition sizes: 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4. AVS has only 4 macroblock partition modes: 16×16, 16×8, 8×16, and 8×8.

H.264 supports predicting inter macroblocks and slices from multiple different reference frames. In AVS, P frames can use up to 2 forward reference frames, and B frames use one frame before and one after.

Fourth, entropy coding

H.264 provides two entropy coding methods. One is Universal Variable Length Coding (UVLC), applied to all symbols to be coded; the other is Context-Adaptive Binary Arithmetic Coding (CABAC), which greatly reduces the redundancy caused by correlation between coded blocks and improves coding efficiency. UVLC has low computational complexity and is mainly intended for applications with strict coding time constraints; its disadvantages are lower efficiency and a higher bit rate. CABAC is an efficient entropy coding method with a coding efficiency 50% higher than UVLC coding.

AVS entropy coding uses adaptive variable length coding techniques. In the AVS entropy encoding process, all syntax elements and residual data are mapped into a binary bit stream in the form of an exponential Golomb code.

The advantages of using Exp-Golomb codes are as follows. On the one hand, the hardware complexity is relatively low: codes can be parsed with a closed-form formula, with no need for table lookups. On the other hand, the order k of the Exp-Golomb code can be chosen flexibly according to the probability distribution of the elements being coded; if k is chosen properly, the coding efficiency can approach the information entropy.
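The closed-form, table-free parsing can be seen in a short sketch of k-th order Exp-Golomb coding for non-negative integers. Bits are represented as a string of "0" and "1" characters for readability; the function names are hypothetical, and the mapping from AVS syntax elements to code numbers is not covered here.

```python
def exp_golomb_encode(n, k=0):
    """Encode a non-negative integer n with a k-th order Exp-Golomb code."""
    if n < 0 or k < 0:
        raise ValueError("n and k must be non-negative")
    x = n + (1 << k)
    prefix_zeros = x.bit_length() - 1 - k   # unary prefix announces the length
    return "0" * prefix_zeros + format(x, "b")


def exp_golomb_decode(bits, k=0):
    """Decode one codeword from the front of bits; return (value, bits consumed)."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    length = zeros + k + 1                  # payload length follows from the prefix
    x = int(bits[zeros:zeros + length], 2)
    return x - (1 << k), zeros + length


for k in (0, 1, 2):
    print(k, [exp_golomb_encode(n, k) for n in range(6)])
```

A larger k spends more bits on small values but fewer on large ones, which is why matching k to the symbol distribution matters.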

The block transform coefficients of the prediction residual are scanned to form (level, run) pairs. Level and run are not independent events but are strongly correlated. In AVS, level and run are coded jointly in two dimensions, and the order of the Exp-Golomb code is changed adaptively according to the current probability distribution trends of level and run.
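Forming the (level, run) pairs from an already scanned coefficient list is straightforward. A minimal sketch, assuming that run counts the zeros preceding each non-zero level and that trailing zeros are signaled separately (for example by an end-of-block symbol); the adaptive choice of Exp-Golomb order in AVS is not modeled.

```python
def run_level_pairs(scanned):
    """Convert scanned coefficients into (level, run) pairs.

    run is the number of zero coefficients that precede each non-zero level.
    Trailing zeros produce no pair.
    """
    pairs = []
    run = 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((coeff, run))
            run = 0
    return pairs


scanned = [7, 0, 0, -2, 1, 0, 0, 0]
pairs = run_level_pairs(scanned)
print(pairs)
```

Large levels tend to occur with short runs near the start of the scan and small levels with long runs later on, which is the correlation that joint coding exploits.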

In addition, there are no SI or SP frames in AVS. It can be said that AVS was developed on the basis of H.264 and absorbed its essence, but to avoid patent problems it had to give up some of H.264's core algorithms. The price of this change is reduced coding efficiency, while complexity is greatly reduced.

AVS is a standard with independent Chinese intellectual property rights. It is not yet used on a large scale and is still in its infancy. Most enterprises are taking a wait-and-see approach and investing little, and adoption faces many difficulties. However, its broad prospects cannot be ignored, and with strong national support it is certain to develop and mature further.
