FFmpeg — The Swiss Army Knife That Has Powered Media for 25 Years
Architecture, codec abstraction, filter graphs, hardware acceleration — dissecting FFmpeg internals
What Is FFmpeg?
FFmpeg is both a CLI tool and a collection of libraries. "FF" stands for Fast Forward; "mpeg" comes from the Moving Picture Experts Group, the standards body behind the MPEG video formats. Created by Fabrice Bellard in 2000, it has since been developed by hundreds of contributors, including longtime maintainer Michael Niedermayer.
One-line summary: it can take virtually any media format as input and produce virtually any media format as output.
ffmpeg -i input.mov -c:v libx264 -c:a aac output.mp4
This single line converts a MOV file to MP4 with H.264 video + AAC audio.
Core Components
The FFmpeg project consists of multiple binaries and libraries.
Binaries:
- ffmpeg — main CLI for media conversion and encoding
- ffprobe — queries media file metadata and stream info
- ffplay — simple SDL-based media player
Libraries (libav*):
- libavcodec — codec implementations (H.264, HEVC, VP9, AV1, AAC, Opus, etc.)
- libavformat — container format handling (MP4, MKV, WebM, FLV, HLS, etc.)
- libavfilter — filter graphs (scaling, cropping, overlay, color correction, etc.)
- libavutil — common utilities (math, pixel formats, error codes, etc.)
- libswscale — pixel format conversion + scaling
- libswresample — audio resampling + format conversion
- libavdevice — capture device I/O (camera, mic, screen capture)
Processing Pipeline
FFmpeg's data flow:
Input → Demux → Decode → Filter → Encode → Mux → Output
- Demuxing (libavformat) — separates individual streams (video, audio, subtitles) from containers. Extracts compressed packets
- Decoding (libavcodec) — decodes compressed packets to raw frames. Video becomes YUV/RGB pixel data, audio becomes PCM samples
- Filtering (libavfilter) — applies filter chains to raw frames. Scaling, cropping, color correction, text overlay, denoising, etc.
- Encoding (libavcodec) — re-compresses processed frames. Encodes according to target codec, bitrate, and quality settings
- Muxing (libavformat) — wraps encoded streams into a container and writes the output file
In stream copy mode (-c copy), decoding and encoding are skipped entirely and the compressed packets are remuxed as-is. This is how you change containers without any quality loss or re-encoding cost.
Codec Abstraction
The core of libavcodec is the AVCodec struct. Every codec (H.264, HEVC, VP9, AV1, ...) implements the same interface (shown simplified; in recent FFmpeg releases the callback function pointers live in the internal FFCodec struct, but the pattern is the same).
typedef struct AVCodec {
    const char *name;       // "libx264", "h264_nvenc", etc.
    enum AVMediaType type;  // AVMEDIA_TYPE_VIDEO, AUDIO, etc.
    enum AVCodecID id;      // AV_CODEC_ID_H264, etc.
    int (*init)(AVCodecContext *);
    int (*encode2)(AVCodecContext *, AVPacket *, const AVFrame *, int *);
    int (*decode)(AVCodecContext *, AVFrame *, int *, AVPacket *);
    int (*close)(AVCodecContext *);
    // ...
} AVCodec;
Multiple implementations can exist for the same codec ID (e.g., H.264):
- libx264 — software encoder (CPU)
- h264_nvenc — NVIDIA GPU hardware encoder
- h264_qsv — Intel Quick Sync
- h264_vaapi — VA-API (Linux)
- h264_videotoolbox — macOS hardware encoder
Switching implementations is just a matter of changing the name passed to -c:v (e.g., -c:v h264_nvenc instead of -c:v libx264). The rest of the pipeline works identically.
Filter Graphs
libavfilter connects filters as a directed graph.
ffmpeg -i input.mp4 -vf "scale=1280:720,fps=30,eq=brightness=0.1" output.mp4
Three filters chained:
1. scale — resize to 1280x720
2. fps — convert to 30fps
3. eq — increase brightness by 0.1
Complex filter graphs can have multiple inputs and outputs:
ffmpeg -i main.mp4 -i logo.png \
-filter_complex "[0:v][1:v]overlay=10:10[out]" \
-map "[out]" output.mp4
This overlays the logo at position (10,10) on the main video. [0:v] and [1:v] refer to the video streams of the first and second inputs; [out] is the label given to the filter graph's output.
Hardware Acceleration
FFmpeg abstracts hardware acceleration through the hwaccel API.
NVIDIA — NVDEC (decode) + NVENC (encode) + CUDA filters
Intel — QSV (Quick Sync Video)
AMD — AMF (Advanced Media Framework)
VA-API — Linux generic hardware acceleration
VideoToolbox — macOS/iOS
Vulkan — cross-platform GPU compute
GPU decode → GPU filter → GPU encode pipeline keeps frames in GPU memory, eliminating CPU-GPU data transfer overhead.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda \
-i input.mp4 \
-vf scale_cuda=1280:720 \
-c:v h264_nvenc -preset p4 output.mp4
Who Uses FFmpeg?
Virtually every service and tool that handles media.
YouTube — upload transcoding
Netflix — content encoding pipeline
Meta — tens of billions of executions daily (VOD + livestreaming)
VLC — core playback engine
OBS Studio — streaming/recording
HandBrake — desktop transcoding
Plex/Jellyfin — media server transcoding
Modern media infrastructure doesn't work without FFmpeg. Well over a million lines of C code, developed for 25+ years, power all of it.
How It Works
Demuxing — separate video, audio, subtitle streams from container
Decoding — decompress packets to raw frames (YUV/PCM)
Filtering — apply scaling, cropping, color correction via filter graph
Encoding — re-compress processed frames with target codec
Muxing — wrap encoded streams into container (MP4/MKV/WebM) and write output