
Cloudflare R2 and MosaicML: Train LLMs on Any Compute with Zero Switching Costs



Together, Cloudflare and MosaicML give users the freedom to train LLMs on any compute, anywhere in the world, for faster, cheaper training runs without vendor lock-in.

Read the complete blog post to learn more!

Building generative AI models requires massive compute and data storage infrastructure. Training on huge datasets means that terabytes of data must be read in parallel by thousands of processes. In addition, model checkpoints must be saved frequently throughout a training run, and these checkpoints alone can be hundreds of gigabytes in size.
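To put "hundreds of gigabytes" in perspective, here is a rough back-of-the-envelope estimate. It assumes, as an illustration not taken from the post, that a checkpoint stores fp32 weights plus Adam's two fp32 moment estimates, or about 12 bytes per parameter. Real checkpoint sizes vary with precision, optimizer, and sharding.

```python
# Rough checkpoint-size estimate for a model trained with Adam.
# Assumption (not from the post): the checkpoint holds fp32 weights plus
# Adam's two fp32 moment buffers, i.e. 3 x 4 = 12 bytes per parameter.
BYTES_PER_FP32 = 4
TENSORS_PER_PARAM = 3  # weights, first moment, second moment


def checkpoint_size_gb(num_params: int) -> float:
    """Approximate checkpoint size in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * BYTES_PER_FP32 * TENSORS_PER_PARAM
    return total_bytes / 1e9


if __name__ == "__main__":
    for billions in (1, 7, 30):
        size = checkpoint_size_gb(billions * 10**9)
        print(f"{billions}B params -> ~{size:.0f} GB per checkpoint")
```

Under these assumptions a 30B-parameter model already produces checkpoints of roughly 360 GB, and a run saves many of them.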

In a recent blog post, Cloudflare and MosaicML engineers discuss how their tools work together to address these challenges. MosaicML’s open source StreamingDataset and Composer libraries let users stream training data from Cloudflare R2 and read and write model checkpoints back to it. Thanks to R2’s zero-egress pricing and MosaicML’s cloud-agnostic platform, users can start, stop, move, and resize jobs in response to GPU availability and prices across compute providers without paying any data transfer fees. By eliminating egress fees, R2 is an exceptionally cost-effective storage complement to MosaicML training, giving users maximum autonomy and control over where their jobs run.
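The effect of zero egress can be sketched with simple arithmetic. The Python below is a hypothetical model, not Cloudflare's or MosaicML's pricing logic: it assumes each restart on a new provider begins with an empty local cache and re-downloads the full dataset plus the latest checkpoint, and the per-GB rate passed in is a placeholder to be replaced with a provider's published price.

```python
# Hypothetical model of data-transfer fees when a training job moves
# between compute providers. Rates are placeholders, not real quotes.


def egress_cost_usd(data_gb: float, rate_per_gb: float) -> float:
    """Fee for reading `data_gb` gigabytes out of a bucket at `rate_per_gb`."""
    if data_gb < 0 or rate_per_gb < 0:
        raise ValueError("data size and rate must be non-negative")
    return data_gb * rate_per_gb


def total_move_cost(dataset_gb: float, checkpoint_gb: float,
                    num_moves: int, rate_per_gb: float) -> float:
    """Egress fees if every move re-downloads the dataset and latest checkpoint.

    Assumption: each move lands on fresh nodes with an empty local cache.
    """
    per_move = egress_cost_usd(dataset_gb + checkpoint_gb, rate_per_gb)
    return per_move * num_moves


if __name__ == "__main__":
    placeholder_rate = 0.08  # hypothetical $/GB, for illustration only
    for label, rate in (("egress-charging store", placeholder_rate),
                        ("zero-egress store (R2)", 0.0)):
        cost = total_move_cost(2000, 360, num_moves=3, rate_per_gb=rate)
        print(f"{label}: ${cost:,.2f}")
```

With a zero-egress store, `rate_per_gb` is 0, so the fee stays at zero no matter how often a job moves; with any nonzero rate it grows linearly with data size and the number of moves.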