Replit: Customer Spotlight
At their Developer Day on April 25, Replit announced their own open-source code generation LLM, trained in 10 days on the MosaicML platform with 256 x A100-40GB GPUs, leveraging MosaicML's latest LLM examples repo. The replit-code-v1-3b model is a 2.7B-parameter LLM trained on 20 languages (Markdown, Java, JavaScript, Python, TypeScript, PHP, SQL, JSX, reStructuredText, Rust, C, CSS, Go, C++, HTML, Vue, Ruby, Jupyter Notebook, R, Shell) from the Stack Dedup v1.2 dataset.
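The model is available on the Hugging Face Hub under the Replit org. As a quick illustration, here is a minimal sketch of loading it with the transformers library and generating a completion; the model uses a custom architecture, so trust_remote_code=True is required, and the sampling parameters shown are illustrative rather than Replit's recommended settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True is needed because the model ships its own architecture code.
tokenizer = AutoTokenizer.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained('replit/replit-code-v1-3b', trust_remote_code=True)

# Generate a code completion from a short prompt; sampling settings are illustrative.
x = tokenizer.encode('def fibonacci(n): ', return_tensors='pt')
y = model.generate(x, max_length=100, do_sample=True, top_p=0.95, top_k=4,
                   temperature=0.2, eos_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(y[0], skip_special_tokens=True))
```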
Replit shared more details on how they used MosaicML to train their LLMs.
Why is it important for customers to train their own LLMs?
There are plenty of reasons why a company might decide to train its own LLMs, ranging from data privacy and security to increased control over updates and improvements. At Replit, we care primarily about customization, reduced dependency, and cost efficiency. Training a custom model allows us to tailor it to our specific needs and requirements. While we'll always use the right model based on the task at hand, we believe there are benefits to being less dependent on only a handful of AI providers. LLMs are still prohibitively expensive for use amongst the global developer community. At Replit, our mission is to bring the next billion software creators online. To make this possible, we train custom models that are smaller, more efficient, and can be hosted with drastically reduced cost.
How are you working with MosaicML to build out your model training stack?
LLMs require an immense amount of data to train. Training them requires building robust data pipelines that are highly optimized and yet flexible enough to easily include new sources of both public and proprietary data. We train our models using MosaicML. Having previously deployed our own training clusters, we found that the MosaicML platform gives us a few key benefits.
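As a rough sketch of what one stage of such a pipeline might look like, the example below uses MosaicML's streaming library to read pre-processed shards from object storage. The bucket path is hypothetical, and it assumes the corpus has already been converted to the library's MDS shard format.

```python
from streaming import StreamingDataset
from torch.utils.data import DataLoader

# Stream pre-tokenized training shards straight from object storage,
# caching them locally as they are read. The remote path is hypothetical.
dataset = StreamingDataset(
    remote='s3://my-bucket/stack-dedup-mds',  # hypothetical shard location
    local='/tmp/streaming-cache',             # local shard cache
    shuffle=True,                             # reshuffle samples each epoch
)
loader = DataLoader(dataset, batch_size=8)
```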
MosaicML gives us the ability to leverage GPUs from different cloud providers without the overhead of setting up an account and all of the required integrations. The Composer library has a number of well-tuned configurations for training a variety of models and for different types of training objectives. Their managed infrastructure provides us with orchestration, efficiency optimizations, and fault tolerance (i.e., recovery from node failures).
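To make that concrete, here is a minimal, hypothetical sketch of training a causal language model with Composer's Trainer. It stands in a small Hugging Face model for Replit's actual architecture and assumes a dataloader (like the one sketched above) whose batches include input_ids and labels.

```python
import torch
from composer import Trainer
from composer.models import HuggingFaceModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model for illustration; Replit's actual architecture and scale differ.
hf_model = AutoModelForCausalLM.from_pretrained('gpt2')
tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = HuggingFaceModel(hf_model, tokenizer=tokenizer)

trainer = Trainer(
    model=model,
    train_dataloader=loader,  # batches must include input_ids and labels
    max_duration='1ep',       # duration can be given in epochs, batches, or tokens
    optimizers=torch.optim.AdamW(model.parameters(), lr=1e-4),
    device='gpu' if torch.cuda.is_available() else 'cpu',
)
trainer.fit()
```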
What did MosaicML do well as a trusted infrastructure provider vs. a commodity GPU provider?
One benefit is that you can get GPUs from different providers without needing to be signed up with each cloud provider. It detaches your GPU capacity from the rest of your cloud, because most of our cloud runs in GCP.
[The MosaicML platform] allowed us to leverage GPUs from other providers as well. Another benefit is infrastructure as a service. GPUs burn out, nodes fail, and all kinds of hardware issues come up. The ability to not have to deal with that, and to let the MosaicML team provide that fault tolerance, was huge for us.
They have a lot of experience training these models. They have the right pre-configured setups for various models, which make sure you have the right learning rates and training parameters and that you're making the best use of the GPU and the underlying hardware. Your GPU utilization stays at optimal levels, you have fewer loss spikes, and when they do occur you can recover from them. You're really getting the most value out of the compute. We found that incredibly helpful.
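As a rough illustration of what such a pre-configured recipe can look like in Composer, the sketch below wires together a decoupled AdamW optimizer, a warmup-plus-cosine learning rate schedule, and gradient clipping (a common guard against loss spikes). The hyperparameters are hypothetical, not Replit's actual settings.

```python
from composer.algorithms import GradientClipping
from composer.optim import DecoupledAdamW, CosineAnnealingWithWarmupScheduler

# Hypothetical hyperparameters, for illustration only.
optimizer = DecoupledAdamW(model.parameters(), lr=2e-4, betas=(0.9, 0.95), weight_decay=1e-5)
scheduler = CosineAnnealingWithWarmupScheduler(t_warmup='100ba')  # warm up over 100 batches
clipping = GradientClipping(clipping_type='norm', clipping_threshold=1.0)  # tames loss spikes
# These plug into the Trainer via optimizers=, schedulers=, and algorithms=.
```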
Of the time we spent running things on MosaicML, very little went to figuring out why a GPU wasn't being utilized, why a job kept crashing, or why we were hitting CUDA out-of-memory errors. All of the things that make training a nightmare are handled really well by MosaicML, Composer, and their ecosystem.
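For flavor, this is roughly how that fault tolerance surfaces in Composer: checkpoints are saved on an interval, and autoresume picks up from the latest one after a node failure. The checkpoint folder and run name below are hypothetical.

```python
from composer import Trainer

trainer = Trainer(
    model=model,                               # as wrapped in the earlier sketch
    train_dataloader=loader,
    max_duration='1ep',
    save_folder='s3://my-bucket/checkpoints',  # hypothetical checkpoint bucket
    save_interval='500ba',                     # checkpoint every 500 batches
    run_name='replit-llm-example',             # stable name, required for autoresume
    autoresume=True,                           # on restart, resume from latest checkpoint
)
trainer.fit()
```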
What do you think a year from now people will be the most surprised by in AI?
I'm really interested in seeing how a lot of this technology will be applied to domains outside of chat, and I think we're just at the beginning of that world. [Generative AI chatbots] took a lot of people by surprise because it was the first time people could actually interact with the technology and see what its capabilities were.
And I think that once you start to apply it to actual products and business use cases, it's going to become incredibly powerful. For example, if you're traveling and want to ask specific questions about where you're going and plan out your trip, maybe you want to know whether there are noise complaints about the Airbnb you're thinking of booking. A chatbot could create a query that looks at noise complaints or construction permits filed within the date range of your stay.
I think that type of transfer learning, applied to specific industries and specific products, is going to be incredibly powerful. I don't think anyone really knows yet what's going to be possible there, or how much a lot of our favorite products might change and become more powerful with this technology.
For more information:
- Intro to Replit Ghostwriter
- replit-code-v1-3b code announcement on Twitter
- Replit org page on HuggingFace
- Replit blog post: how to train your own LLMs
Sources: Replit blog post published on April 19, 2023; Latent Space podcast published on May 3, 2023.