FFmpeg — The Swiss Army Knife That Has Powered Media for 25 Years

Architecture, codec abstraction, filter graphs, hardware acceleration — dissecting FFmpeg internals

What Is FFmpeg?

FFmpeg is both a CLI tool and a collection of libraries. "FF" stands for "fast forward," and "mpeg" comes from the Moving Picture Experts Group, the video standards body. Created by Fabrice Bellard in 2000, it has since been developed by hundreds of contributors, including long-time maintainer Michael Niedermayer.

One-line summary: it can take virtually any media format as input and produce virtually any media format as output.

ffmpeg -i input.mov -c:v libx264 -c:a aac output.mp4

This single line converts a MOV file to MP4 with H.264 video + AAC audio.

Core Components

The FFmpeg project consists of multiple binaries and libraries.

Binaries:

  • ffmpeg — main CLI for media conversion and encoding

  • ffprobe — media file metadata and stream info query

  • ffplay — SDL-based simple media player
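A quick sketch of the first two tools working together (assumes ffmpeg and ffprobe are on PATH and the build includes libx264; file names are illustrative):

```shell
# Generate a 1-second synthetic clip from ffmpeg's built-in lavfi test source
ffmpeg -y -v error -f lavfi -i testsrc=duration=1:size=320x240:rate=30 \
  -c:v libx264 -pix_fmt yuv420p sample.mp4

# Ask ffprobe for the video stream's codec and resolution
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_name,width,height \
  -of default=noprint_wrappers=1 sample.mp4
# codec_name=h264
# width=320
# height=240
```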

Libraries (libav*):

  • libavcodec — codec implementations (H.264, HEVC, VP9, AV1, AAC, Opus, etc.)

  • libavformat — container format handling (MP4, MKV, WebM, FLV, HLS, etc.)

  • libavfilter — filter graphs (scaling, cropping, overlay, color correction, etc.)

  • libavutil — common utilities (math, pixel formats, error codes, etc.)

  • libswscale — pixel format conversion + scaling

  • libswresample — audio resampling + format conversion

  • libavdevice — capture device I/O (camera, mic, screen capture)

Processing Pipeline

FFmpeg's data flow:

Input → Demux → Decode → Filter → Encode → Mux → Output

  1. Demuxing (libavformat) — separates individual streams (video, audio, subtitles) from containers. Extracts compressed packets
  2. Decoding (libavcodec) — decodes compressed packets to raw frames. Video becomes YUV/RGB pixel data, audio becomes PCM samples
  3. Filtering (libavfilter) — applies filter chains to raw frames. Scaling, cropping, color correction, text overlay, denoising, etc.
  4. Encoding (libavcodec) — re-compresses processed frames. Encodes according to target codec, bitrate, and quality settings
  5. Muxing (libavformat) — wraps encoded streams into a container and writes the output file

In stream copy mode (-c copy), decoding and encoding are skipped entirely and packets are remuxed as-is. This is how you change containers without any quality loss.
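A minimal, self-contained sketch of stream copy (the synthetic input and file names are illustrative; assumes an FFmpeg build with libx264):

```shell
# Create a short test input from the built-in lavfi test source
ffmpeg -y -v error -f lavfi -i testsrc=duration=1:size=320x240:rate=30 \
  -c:v libx264 input.mp4

# Stream copy: switch the container from MP4 to Matroska.
# Decoding/encoding are skipped; compressed packets are remuxed as-is.
ffmpeg -y -v error -i input.mp4 -c copy output.mkv
```

Because nothing is re-encoded, the operation is nearly I/O-bound and the H.264 bitstream in output.mkv is bit-identical to the one in input.mp4.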

Codec Abstraction

The core of libavcodec is the AVCodec struct. Every codec (H.264, HEVC, VP9, AV1, ...) implements the same interface. (Simplified below; recent FFmpeg releases moved the function pointers into the internal FFCodec struct, but the idea is unchanged.)

typedef struct AVCodec {
    const char *name;           // "libx264", "h264_nvenc", etc.
    enum AVMediaType type;      // AVMEDIA_TYPE_VIDEO, AUDIO, etc.
    enum AVCodecID id;          // AV_CODEC_ID_H264, etc.
    int (*init)(AVCodecContext *);
    int (*encode2)(AVCodecContext *, AVPacket *, const AVFrame *, int *);
    int (*decode)(AVCodecContext *, AVFrame *, int *, AVPacket *);
    int (*close)(AVCodecContext *);
    // ...
} AVCodec;

Multiple implementations can exist for the same codec ID (e.g., H.264):

  • libx264 — software encoder (CPU)

  • h264_nvenc — NVIDIA GPU hardware encoder

  • h264_qsv — Intel Quick Sync

  • h264_vaapi — VA-API (Linux)

  • h264_videotoolbox — macOS hardware encoder

Users select an implementation just by changing the encoder name, e.g. -c:v libx264 versus -c:v h264_nvenc. The rest of the pipeline works identically.
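Which of these implementations are available depends on how the binary was compiled; a quick way to check (output varies per build):

```shell
# List every registered encoder and keep the H.264 entries.
# The set shown depends on the configure flags used to build FFmpeg.
ffmpeg -hide_banner -encoders | grep h264 || echo "no H.264 encoder in this build"
```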

Filter Graphs

libavfilter connects filters as a directed graph.

ffmpeg -i input.mp4 -vf "scale=1280:720,fps=30,eq=brightness=0.1" output.mp4

Three filters chained:
1. scale — resize to 1280x720
2. fps — convert to 30fps
3. eq — increase brightness by 0.1

Complex filter graphs can have multiple inputs and outputs:

ffmpeg -i main.mp4 -i logo.png \
  -filter_complex "[0:v][1:v]overlay=10:10[out]" \
  -map "[out]" output.mp4

Overlays a logo on the main video. [0:v] and [1:v] are each input's video stream, [out] is the filter graph's output label.

Hardware Acceleration

FFmpeg abstracts hardware acceleration through the hwaccel API.

  • NVIDIA — NVDEC (decode) + NVENC (encode) + CUDA filters

  • Intel — QSV (Quick Sync Video)

  • AMD — AMF (Advanced Media Framework)

  • VA-API — Linux generic hardware acceleration

  • VideoToolbox — macOS/iOS

  • Vulkan — cross-platform GPU compute

GPU decode → GPU filter → GPU encode pipeline keeps frames in GPU memory, eliminating CPU-GPU data transfer overhead.

ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
  -i input.mp4 \
  -vf scale_cuda=1280:720 \
  -c:v h264_nvenc -preset p4 output.mp4

Who Uses FFmpeg?

Virtually every service and tool that handles media.

  • YouTube — upload transcoding

  • Netflix — content encoding pipeline

  • Meta — tens of billions of executions daily (VOD + livestreaming)

  • VLC — core playback engine

  • OBS Studio — streaming/recording

  • HandBrake — desktop transcoding

  • Plex/Jellyfin — media server transcoding

Modern media infrastructure doesn't work without FFmpeg. Roughly a million lines of C code, developed over 25+ years, power all of it.

How It Works

  1. Demuxing — separate video, audio, and subtitle streams from the container
  2. Decoding — decompress packets to raw frames (YUV/PCM)
  3. Filtering — apply scaling, cropping, and color correction via the filter graph
  4. Encoding — re-compress processed frames with the target codec
  5. Muxing — wrap encoded streams into a container (MP4/MKV/WebM) and write the output

Use Cases

  • Media transcoding — format, codec, resolution, and bitrate conversion

  • Live streaming — RTMP/HLS/DASH I/O + real-time encoding

  • Media analysis — query codec, resolution, framerate, and bitrate with ffprobe
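For the media-analysis case, ffprobe can emit machine-readable output. A sketch using the JSON writer (synthetic sample and file names are illustrative; assumes a build with libx264):

```shell
# Build a short sample clip, then dump selected stream fields as JSON
ffmpeg -y -v error -f lavfi -i testsrc=duration=1:size=640x360:rate=25 \
  -c:v libx264 sample.mp4
ffprobe -v error \
  -show_entries stream=codec_name,width,height,r_frame_rate \
  -of json sample.mp4
```

The -of json flag makes the result trivially parseable by downstream tooling; other writers such as -of csv and -of flat are also available.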