Setting up your Gretel environment (2/4) - How to set up the Gretel SDK
In part 2 of this video series on setting up your Gretel environment, Alex walks through using a local box and configuring the Gretel SDK to train a synthetic model.
(00:02): In our last video, we set up a box in GCP configured for deep learning and for training models with Gretel's synthetics. In this video, we're going to walk through using that local box (it could be something running in GCP, Azure, AWS, or even your local desktop, as long as it's configured with a GPU), setting up the Gretel SDK, and configuring it to train a synthetic model. Picking up right where we left off: we'll come back to generating an API key later; the first step is to set up the CLI. We started with a base instance of Ubuntu, and one surprise is that it ships with a fairly old version of Python. So let's check what we're running here.
(00:48): Running `python3 --version`, we see Python 3.6, which is too old to work with the Gretel client, so we're going to upgrade Python first. We have a handy script for that, so let's open it up, make a few quick changes, and run it. We'll copy the script into our environment and save it as our Python setup script. Looking at what it does: it adds an apt repository for newer Python builds and installs Python 3.8 along with its dev packages. Since we don't have a plain `python` binary here, we'll update the script to use `python3`, just like the command we ran a moment ago, and point the `python3` command at Python 3.8. Save and exit using vi or your favorite editor, and run the script. Here, you can see it connecting to the repositories, downloading the Python packages, and upgrading us to 3.8.
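The script itself isn't shown in full, but based on the steps Alex describes (adding an apt repository, installing Python 3.8 with its dev packages, and repointing `python3`), a minimal version might look like this. The deadsnakes PPA and the exact package names are assumptions, not confirmed by the video:

```shell
#!/usr/bin/env bash
# Sketch of a Python upgrade script for a base Ubuntu image.
# Assumes the deadsnakes PPA carries the Python 3.8 packages.
set -euo pipefail

# Add a repository with newer Python builds and refresh package lists
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt-get update

# Install Python 3.8 plus its development headers
sudo apt-get install -y python3.8 python3.8-dev python3.8-distutils

# Point the python3 command at the new interpreter
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1

# Verify the upgrade took effect
python3 --version
```

After this runs, `python3 --version` should report 3.8.x instead of 3.6.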
(02:18): Jump back in, and we're ready to start installing our Python packages. Let's use pip to install the Gretel client. Once that's installed, the next step, following the instructions here, is to run `gretel configure`.
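In shell form, the two commands from this step are roughly:

```shell
# Install the Gretel client SDK and CLI into the current Python environment
python3 -m pip install -U gretel-client

# Launch the interactive configuration wizard
gretel configure
```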
(02:45): We're going to connect to the cloud endpoint; this just sends usage information back to the Gretel Cloud, so `api.gretel.cloud` is correct. We're going to use a local runner, since we'll be using our local box to do the compute instead of asking the cloud service to do it for us, so type "local". Next, it asks for an API key. If you haven't done this already, log into the Gretel console and generate one. The simplest way to get there is to go to gretel.ai, click Sign In, and connect with your favorite identity provider. Go to the API Key section, copy your key to the clipboard, then come back to the command line and paste it in. Since we don't have a project yet, enter "none" for the default project. Now we're mostly set up. Next, we'll create a default project. A project is a space where your models are stored; essentially, it lets you collaborate with other people and share datasets. We'll go ahead and create a default project here.
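Putting the answers above together, the configure session looks roughly like the following. The exact prompt wording and order may differ between client versions; this is an approximation of the session described above, and the key shown is a placeholder:

```shell
$ gretel configure
Endpoint [https://api.gretel.cloud]:           # accept the default cloud endpoint
Default Runner (cloud, local) [cloud]: local   # compute runs on this box's GPU
Gretel API Key [None]: <paste your API key>    # generated in the Gretel console
Default Project [none]: none                   # we'll create a project next
```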
(04:02): There's a command here that tells us what to do: create a new project, name it, and make it our default. Great, we're all set up to start training a model with Gretel.
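A sketch of the project-creation command described here; the flag names (`--display-name`, `--set-default`) and the project name are assumptions based on the CLI's conventions, so check `gretel projects create --help` on your version:

```shell
# Create a named project and make it the default for future commands
gretel projects create --display-name my-synthetics-project --set-default
```

With a default project set, subsequent model training commands run against it without needing a project flag each time.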