-
Input — the original signal to be encoded.
-
Predictor stem[p] — function that estimates the next piece of data based on predecessors (e.g., weighted average).
-
Example:
-
-
Error
-
Can be used as a preprocess stage before Huffman or some other symbol encoder.
-
Best when we know what "trends" in the data we can expect.
Compression
- Compression
-
-
Encoding data to take up less storage space / bandwidth at the cost of additional decoding calculations.
-
Algorithmical perspective: The goal is to remove redundancy.
-
- Psychovisual and psychoacoustic redundancy
-
Most things we humans see and hear contain low frequencies. If we remove the high-frequency data, the difference is near imperceivable.
- LZW coding (Lempel-Ziv-Welch)
-
-
GIF, TIFF, PDF, ZIP
-
Doesn’t need to transmit a dictionary, only the base set.
-
není potřeba posílat ani pravděpodobnostní tabulku
-
když naplníme slovník, zahodíme ho a začneme nový
-
bylo navrženo na text
-
- Arithmetic coding
-
TODO
- Huffman coding
-
TODO
- Entropy coding
-
TODO
- Context-Adaptive Binary Arithmetic Coding (CABAC)
-
TODO
- Lossless predictive encoding
-
The decoder must know what the predictor is!
- Lossy predictive coding
-
-
Has much higher compression ratio than lossless.
-
Decoded data is not the same as the original (before encoding) data.
-
Quantizer — reduces the amount of bits required to express the value but loses information (i.e., division).
-
The first value of input is passed as is (i.e., its predicted value is 0). It is then quantized, and then fed to the predictor to get the predicted value for the next value of f.
-
The predictor is used on its own output (i.e., in a feedback loop) so that it can be used the same way in the encoder as well as the decoder. If it were used only on the original input values, decoder would quickly accumulate undesired error.
-
Example: Delta modulation (DM) — error is always either a positive or negative
-
, where is a prediction coefficient ()
-
, where is a positive constant.
-
-
- Transform coding
-
-
DFT nebo DCT - zahazujeme typicky vysoké frekvence
-
- Run length encoding (RLE)
-
-
Lossless.
-
Stores repeated runs of data as a single occurrence along with the number of occurrences.
-
green green green green green⇒ (green, 5)
-
Image Compression
- CCITT Group 3
-
-
Consultative Committee for International Telephony and Telegraphy
-
Used for transmission fax machines for black-and-white bitmaps.
-
A combination of Huffman and RLE.
-
Also called Modified Huffman coding. TODO
-
- CCITT Group 4
-
-
Used for storage on disk drives. TODO
-
- Chroma subsampling
-
-
Using less bandwidth for chroma than for luma.
-
Humans are more likely to notice difference in black-white, than in color.
-
- YUV
-
-
Y = luma, U, V = chroma
-
- YCbCr
-
-
Y = luma, Cb = blue-difference chroma (B-Y), Cr = red-difference chroma (R-Y)
-
YUV expressed differently
-
- Subsampling scheme
-
-
Describes the number of samples for each channel in a 2-pixel-high conceptual region.
-
where
-
= width of the conceptual region (thus the number of luma samples)
-
= number of chroma samples (Cb, Cr) in the first row of pixels
-
= number of chroma changes (Cb, Cr) between the two rows of pixels (typically 0 or same as )
-
-
- JPEG
-
-
Joint Photographic Experts Group
-
The image format is actually JPEG File Interchange Format (JFIF)
-
Relies on Discrete Cosine Transform (DCT).
-
Convert RGB to YCbCr
-
Reduce resolution of Cb and Cr by 2 (i.e., subsampling scheme 4:2:0)
-
Process each of Y, Cb, Cr independently.
-
Split into 8x8 blocks.
-
Apply 8x8 forward DCT on each block.
-
Quantize each block by dividing it with a constant quantization matrix.
-
One matrix for Y, different for Cb and Cr. Both empirically derived and emphasizing lower frequencies, resulting in many zeroes in lower right corner.
-
The quantization matrix is multiplied by the -scale (1-100 %) quality factor.
-
-
Use lossless predictive coding on the quantized DC coefficient of each block.
-
Store quantized AC coefficients in a zigzag pattern and use RLE.
-
Use Huffman/arithmetic codingon zero run-lengths and magnitude of AC values.
-
- JPEG 2000 (J2K)
-
-
Hierarchical
-
Relies on Discrete Wavelet Transform (DWT). TODO
-
Video Compression
- Video Data Redundancy
-
-
Spatial — large areas of homogenous color
-
Temporal — many pixels unchanged between consecutive frames
-
- Hybrid Block Based Video Coding
-
-
Each video frame is split into (square) blocks.
-
Called superblocks, macroblocks, coding units depending on the codec.
-
Size depends on the codec.
-
Can be processed line-by-line, or sliced/tiled and processed in parallel.
-
Hybrid = combines inter & intra prediction with transform coding (e.g., DCT + quantization + entropy coding).
-
Blocks can be split in various ways, much like in a quadtree.
-
- Encoding loop
-
Each block in the block tree is processed:
-
Subtract prediction (intra or inter) from the input block, resulting in an error signal.
-
Transform the signal (e.g., using DCT or DST).
-
Quantize the signal ("round" the coefficients to fewer levels).
-
This is where a lot of the information in an image is lost.
-
Quantized coefficients are put into the bitstream.
-
-
Entropy coding (e.g., arithmetic coding)
-
Includes also motion information, prediction modes, block sizes, etc.
The processed block is reconstructed and used for encoding the next block:
-
-
Inverse quantization
-
Inverse transformation
-
Prediction and all of the motion information and other data is used in loop filtering
-
Deblocking filter — smoothing filter on block borders to make their boundaries less visible.
-
Decoder would do the same except the part in the upper left corner of the image. (It wouldn’t subtract the input because there is no input to be encoded.) The entropy coding is reversed. The picture buffer contains the decoded video.
-
- Intra prediction
-
-
Predicting the contents of a block using surrounding pixels.
-
Has a direction associated.
-
Many different prediction modes (mathematical formulas).
-
Different sets for different codecs.
-
-
- Inter prediction / motion compensation
-
-
Predicting the contents of a block based on the previous reference frame.
-
Typically associated with a direction and an offset.
-
- Frame coding order
-
-
The order in which frames are encoded is not the same as the order in which they are diplayed (chronological).
-
Allows for bidirectional motion compensation — predicting the frame based on the temporally past and temporally future frames.
-
Weighted combination of motion vectors from the two frames.
-
-
- Drift
-
Bad artifacts resulting from encoder and decoder not using the same prediction methods and other implementation details.
- Wavefront decoding
-
Decoding as many block as parallel, provided block to the top and left have been processed. The effect looks like a diagonal "wave".
- MPEG-1, MPEG-2
-
-
Moving Pictures Experts Group
-
DCT and predictive coding
-
- Advanced Video Coding (AVC, h.264, MPEG-4 Part 10)
-
-
Published in 2004.
-
DCT
-
Macroblocks (maximum 16x16)
-
Intra-/independent frames (I-frames)
-
Relatively few (e.g., 1/30 depending on framerate).
-
Frequency varies. One is typically at the start of a scene.
-
Uses a JPEG-like compression.
-
-
Predicted frames (P-frames)
-
Difference to a previous P- or I-frame.
-
-
Bidirectional frames (B-frames)
-
Most common
-
Difference between the current frame and a prediction of it based on the previous I- or P-frame and the next P-frame.
-
-
Seemingly can also use DWT for specific purposes, such as encoding of textures and watermarking.
-
- High Efficiency Video Coding (HEVC, h.265, MPEG-H Part 2)
-
-
Published in 2013.
-
Successor to AVC
-
More prediction modes than AVC (9 → 35)
-
Macroblocks renamed to Coding Tree Units
-
Costly licensing.
-
Relies on both DCT and DST (discrete sine transform).
-
- VP9
-
-
Published in 2013.
-
Royalty-free.
-
- AOMedia Video 1 (AV1)
-
-
Published in 2018.
-
AOMedia = Alliance for Open Media
-
Industry consortium with Amazon, Apple, Cisco, Google, Intel, Meta, Microsoft, etc.
-
-
Open standard, royalty-free (h.265 is not).
-
Macroblocks of up to 128x128.
-
Questions
- Explain the difference among coding redundancy, spatial/temporal redundancy, and psychoacoustic redundancy.
-
TODO
- What does quantizer do?
-
TODO
- Show the construction of Huffman coding tree.
-
TODO
- How does the arithmetic decoder know when it should stop decoding one codeword?
-
TODO
- How does the encoder deliver the dictionary to the receiver’s decoder in LZW compression scheme?
-
TODO
- Explain the meaning of predictor and error in lossless predictive coding scheme.
-
TODO
- Explain the main idea of employing DCT into a transform coding compression scheme.
-
TODO
- Why invent newer and newer image/video compression methods?
-
-
Resolution and computation capabilities goe up, but network bandwidth is limiting.
-
Storage is also costly.
-
Better quality for the same bandwidth.
-