AWS S3 Mountpoint: Treating S3 Like a Local Disk (And When It Actually Makes Sense) — aniketkarneai.com

Every ML engineer has written some version of this script:

# Download training data from S3 before every training run
aws s3 cp s3://my-bucket/data/ /local/data/ --recursive
python train.py

It’s clunky. It works. But managing that sync logic — figuring out what’s changed, handling partial downloads, making sure you don’t read stale data — is friction that has nothing to do with your actual problem.

AWS S3 Mountpoint takes a different approach. Instead of copying data down, you mount the bucket directly and read it like a local disk.

What Is S3 Mountpoint?

Mountpoint for Amazon S3 is an open-source FUSE-based file client that mounts an S3 bucket as a local filesystem. Your applications use normal file operations (open, read, ls, cat) and Mountpoint translates those into S3 object API calls behind the scenes.

Install it on Linux with a command or two:

# Amazon Linux 2023
sudo dnf install mount-s3

# Ubuntu/Debian
wget https://s3.amazonaws.com/mountpoint-s3-release/latest/x86_64/mount-s3.deb
sudo apt-get install ./mount-s3.deb

Then mount a bucket:

mkdir ~/mnt
mount-s3 my-bucket ~/mnt

That’s it. Your bucket is now accessible at ~/mnt, with object keys mapped to file paths. The object Data/2023-01-01.csv becomes the file ~/mnt/Data/2023-01-01.csv.
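The key-to-path mapping can be sketched in a few lines of Python. The helper name `key_to_path` is hypothetical and not part of Mountpoint; it only illustrates how an object key resolves to a path under the mount directory.

```python
from pathlib import Path, PurePosixPath


def key_to_path(mount_dir: str, key: str) -> Path:
    """Return the local path where an S3 object key appears under a mount."""
    # S3 keys use "/" as the delimiter; each segment becomes a path component.
    parts = PurePosixPath(key).parts
    return Path(mount_dir).joinpath(*parts)


# "/home/user/mnt" is an illustrative mount directory.
print(key_to_path("/home/user/mnt", "Data/2023-01-01.csv"))
```

Note that the mapping is case-sensitive, just like S3 keys: `Data/` and `data/` are different directories.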

What It Can and Can’t Do

Mountpoint is explicitly not a full POSIX filesystem. It’s optimized for large-scale read-heavy workloads. Key constraints:

Reads and writes work, but it cannot modify existing files in place or delete directories
No symbolic links, no file locking
Files up to 50 TB are supported
Objects in archival S3 Glacier storage classes (Flexible Retrieval, Deep Archive) can't be read unless they have been restored
Available only on Linux

For workloads that need full POSIX semantics (shared concurrent writes, strict consistency guarantees), AWS recommends Amazon FSx for Lustre instead. But for bulk sequential reads and writes at scale, Mountpoint is significantly simpler.
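In application code, the write constraint mostly means "create new files and write them front to back". Here is a minimal sketch of that pattern; `write_new_file` is a hypothetical helper, and the demo uses a temporary directory as a stand-in for a mounted bucket so it runs anywhere.

```python
import tempfile
from pathlib import Path


def write_new_file(path: Path, chunks) -> int:
    """Write chunks to a new file front to back; refuse to touch existing files."""
    if path.exists():
        # Existing objects cannot be modified in place, so fail early.
        raise FileExistsError(f"{path} already exists; write to a new key instead")
    written = 0
    with open(path, "wb") as f:  # sequential writes only, no seek()
        for chunk in chunks:
            written += f.write(chunk)
    return written


# Demo against a temporary directory standing in for a mounted bucket.
with tempfile.TemporaryDirectory() as mount_dir:
    target = Path(mount_dir) / "results.bin"
    print(write_new_file(target, [b"abc", b"defg"]))  # number of bytes written
    try:
        write_new_file(target, [b"again"])
    except FileExistsError:
        print("refused to overwrite", target.name)
```

If a job needs to "update" a result, the simplest approach under these constraints is to write it under a new key (for example with a timestamp or run ID in the name) rather than rewriting the old one.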

Three Real-World Use Cases

1. ML Training Pipelines — Eliminating the Download Step

The classic ML workflow downloads data from S3 to local storage before each training run. With Mountpoint, your training script can read directly from S3 with no download step:

mount-s3 ml-bucket ~/training-data --prefix training-data/
python train.py  # Reads directly from S3
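What the data-loading side of `train.py` might look like is sketched below. The function names, the `.bin` suffix, and the chunk size are all assumptions for illustration; the point is that plain file APIs are enough, and that reading each file front to back in large chunks fits the sequential-read pattern Mountpoint is built for. The demo runs against a temporary directory standing in for the mount.

```python
import tempfile
from pathlib import Path

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB; an arbitrary starting point, tune for your data


def iter_files(root: Path, suffix: str):
    """Yield data files under root in sorted order."""
    yield from sorted(p for p in root.rglob(f"*{suffix}") if p.is_file())


def read_chunks(path: Path, chunk_size: int = CHUNK_SIZE):
    """Read one file front to back in large chunks."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


# Demo: a temporary directory plays the role of ~/training-data.
with tempfile.TemporaryDirectory() as mount_dir:
    root = Path(mount_dir)
    (root / "shard-0.bin").write_bytes(b"a" * 10)
    (root / "shard-1.bin").write_bytes(b"b" * 5)
    for path in iter_files(root, ".bin"):
        size = sum(len(chunk) for chunk in read_chunks(path, chunk_size=4))
        print(path.name, size)
```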

Add local caching for repeated reads across epochs:

mount-s3 ml-bucket ~/training-data --cache /local/cache

If your training job reads the same files on every epoch, the local cache stores them after the first read, so later epochs are served from local disk instead of S3. For distributed training across multiple EC2 instances, you can layer a shared cache in an S3 Express One Zone directory bucket:

mount-s3 ml-bucket ~/training-data \
  --cache /local/cache \
  --cache-xz ml-bucket--usw2-az1--x-s3

This way, repeated reads across instances hit the shared cache instead of going back to the source bucket each time.
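A quick way to check whether caching helps your workload is to time the same read twice. The sketch below uses a hypothetical `timed_read` helper and a temporary file as a stand-in; on a real mount you would point it at a file under `~/training-data`. Keep in mind that the operating system's page cache can also speed up a second read, so treat the numbers as a rough signal rather than a benchmark.

```python
import tempfile
import time
from pathlib import Path


def timed_read(path: Path, chunk_size: int = 1024 * 1024):
    """Read a file to the end; return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total, time.perf_counter() - start


# Demo: a temporary file plays the role of an object on the mount.
with tempfile.TemporaryDirectory() as mount_dir:
    sample = Path(mount_dir) / "sample.bin"
    sample.write_bytes(b"\0" * (4 * 1024 * 1024))
    for label in ("first read", "second read"):
        size, seconds = timed_read(sample)
        print(f"{label}: {size} bytes in {seconds:.4f}s")
```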

2. Rendering Farms and Media Processing — Shared Asset Access Without Sync

A rendering farm processing 10,000 image assets doesn’t want to copy assets to every render node. With Mountpoint, all nodes access the same S3 bucket:

# On every render node
mount-s3 assets-bucket ~/assets
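# Paths are illustrative; the scene is read from the mount, renders go to local disk
blender --background ~/assets/character.blend --render-output /local/renders/frame_#### --render-anim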

Standard tools (cat, ffmpeg, Python’s PIL) work directly on the mounted bucket. No asset sync service, no rsync cron jobs, no figuring out which node has which version. The S3 bucket is the single source of truth.

For read-heavy rendering workloads with repeated texture lookups, local caching serves cache hits from local disk while the bucket remains the central copy.

3. ETL Without a Data Pipeline

Setting up Apache Airflow or a similar orchestration tool just to move CSV files from S3 into a processing script is often overkill. With Mountpoint:

mount-s3 analytics-bucket ~/analytics
cat ~/analytics/events/2026-03-*.csv | python process.py

Your existing shell scripts, Python pipelines, and awk/grep chains work directly on S3 data. For one-off analysis tasks, ad-hoc reporting, and prototyping data pipelines, this is dramatically simpler than configuring a full ETL stack.
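One caveat with the `cat` pipeline: concatenating several CSV files repeats each file's header row in the stream, so `process.py` has to skip them. An alternative is to have the script read the files from the mount itself. The sketch below is one hedged way to do that; `count_by_column` and the `event_type` column are hypothetical, and the demo builds a small temporary directory in place of `~/analytics`.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def count_by_column(root: Path, pattern: str, column: str) -> Counter:
    """Count rows per value of `column` across every CSV matching `pattern`."""
    counts = Counter()
    for path in sorted(root.glob(pattern)):
        # newline="" is what the csv module expects for file objects
        with open(path, newline="") as f:
            for row in csv.DictReader(f):  # consumes each file's own header
                counts[row[column]] += 1
    return counts


# Demo: a temporary directory plays the role of ~/analytics.
with tempfile.TemporaryDirectory() as mount_dir:
    events = Path(mount_dir) / "events"
    events.mkdir()
    (events / "2026-03-01.csv").write_text("event_type,user\nclick,a\nview,b\n")
    (events / "2026-03-02.csv").write_text("event_type,user\nclick,c\n")
    print(count_by_column(Path(mount_dir), "events/2026-03-*.csv", "event_type"))
```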

For production ETL with concurrent writes and complex dependencies, use proper tools. But for exploratory analysis and prototyping, Mountpoint removes an entire infrastructure layer.

The Catch

The most important thing to understand: Mountpoint does not support in-place modification. You can write new objects, but you can't edit or append to existing ones through the mount. If your workload needs concurrent reads and writes to the same files, look at Amazon FSx for Lustre or Amazon EFS.

For bulk sequential reads and writes — data lakes, model training, media processing, ETL prototyping — Mountpoint is exactly the right tool. It removes infrastructure complexity without sacrificing scale.

Docs: Amazon S3 Mountpoint

Aniket Karne

DevOps & AI Engineer · Amsterdam
