
Finetuning

1. Sign in

Before you start using the CLI, you need to log in with your username and password.

stochasticx login

If you don't have an account, you can create one on the Stochastic platform website: https://app.stochastic.ai/signup

2. Download models and datasets

For this tutorial you will need a Hugging Face model that can solve a text classification task, such as distilbert-base-uncased. To download it, run the following code:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
tokenizer.save_pretrained("models/distilbert-base-uncased")

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
model.save_pretrained("models/distilbert-base-uncased")

You will also need a dataset. For this tutorial we will download the GLUE - MRPC dataset from the Hugging Face Hub:

from datasets import load_dataset
dataset = load_dataset("glue", "mrpc")
dataset['train'] = dataset['train'].select(range(100))
dataset.save_to_disk("datasets/mrpc")
info

In this finetuning job we will use only 100 samples. Comment out the select(range(100)) line to use the full dataset.
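The select(range(100)) call keeps the first 100 rows of the train split. If you would rather finetune on a random subset, you can pass shuffled indices instead. A minimal sketch of building the index list (the actual call would then be dataset['train'].select(indices); 3668 is the MRPC train split size):

```python
import random

# Reproducible random subset of row indices
random.seed(42)
n_rows = 3668  # MRPC train split size
indices = random.sample(range(n_rows), 100)

# Every index is a valid row number, and there are exactly 100 of them
print(len(indices))
```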

3. Initialize the local mode in the CLI

By default, the stochasticx CLI communicates with the cloud infrastructure. To work in local mode, execute:

stochasticx local init

This will download and start the Docker containers needed to work locally. To check which mode you are in, execute the following command:

stochasticx config inspect
CLI output
[+] You are in the **local** mode...
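If a script needs to know the current mode, the output line above can be parsed. A small sketch (the output format is copied from above; the parsing logic itself is an assumption, not part of the CLI):

```python
import re

def parse_mode(output: str) -> str:
    """Extract the mode name from a `stochasticx config inspect` line."""
    match = re.search(r"\*\*(\w+)\*\* mode", output)
    return match.group(1) if match else "unknown"

print(parse_mode("[+] You are in the **local** mode..."))  # → local
```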

To switch to the cloud mode, just execute:

stochasticx config cloud
CLI output
[+] Setting up **cloud** mode...

4. Upload your model and dataset to the local registry

To upload your model to the local registry, execute the following command:

stochasticx models add --name distilbert-base-uncased --dir_path models/distilbert-base-uncased --type hf
CLI output
[+] Adding the model to the registry...

[+] Model added

To upload your dataset to the local registry, execute the following command:

stochasticx datasets add --name mrpc --dir_path datasets/mrpc --type hf
CLI output
[+] Adding the dataset to the registry...

[+] Dataset added

You can also list your local models:

stochasticx models ls
CLI output
[+] Collecting all local models

┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┓
┃ Id ┃ Name ┃ Directory path ┃ Type ┃ Uploaded ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━┩
│ 1 │ distilbert-base-uncased │ /vol/registry/models/d24141bb-5e93-4442-91ac-d245d20890d4 │ hf │ True │
└────┴─────────────────────────┴───────────────────────────────────────────────────────────┴──────┴──────────┘

You can list your datasets:

stochasticx datasets ls
CLI output
[+] Collecting all local datasets

┏━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┓
┃ Id ┃ Name ┃ Directory path ┃ Type ┃ Uploaded ┃
┡━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━┩
│ 1 │ mrpc │ /vol/registry/datasets/980eb081-20f1-427d-ba2e-a261bf37150a │ hf │ True │
└────┴──────┴─────────────────────────────────────────────────────────────┴──────┴──────────┘
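If you need the registry IDs or paths in a script, the box-drawing tables printed by the ls commands can be split on their │ separators. A sketch using the dataset row copied from the output above:

```python
# One data row copied from the `stochasticx datasets ls` table above
row = "│ 1  │ mrpc │ /vol/registry/datasets/980eb081-20f1-427d-ba2e-a261bf37150a │ hf   │ True     │"

# Split on the box-drawing separator and drop empty fragments
cells = [c.strip() for c in row.split("│") if c.strip()]
print(cells[:2])  # → ['1', 'mrpc']
```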

5. Launch the finetuning job

Now you are ready to launch a finetuning job:

stochasticx finetuning launch sequence_classification \
--name finetuning_job_name \
--model_name_or_id distilbert-base-uncased \
--dataset_name_or_id mrpc \
--sentence1_column "sentence1" \
--sentence2_column "sentence2" \
--label_column "label"
CLI output
[+] Launching finetuning job
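The --sentence1_column, --sentence2_column, and --label_column flags name fields in the MRPC rows. For reference, one training row has this shape (field names are from the GLUE MRPC dataset; the sentence values here are illustrative placeholders):

```python
# Shape of a single MRPC row; the flags above map onto these keys
row = {
    "sentence1": "First sentence of the pair...",
    "sentence2": "Second sentence of the pair...",
    "label": 1,  # 1 = paraphrase, 0 = not a paraphrase
    "idx": 0,
}
print(sorted(row))  # → ['idx', 'label', 'sentence1', 'sentence2']
```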

List your finetuning jobs:

stochasticx finetuning ls
CLI output
[+] Collecting all jobs

┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ status ┃ Created at ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ finetuning_job_name │ running │ 2022-11-16 11:32:47 │
└─────────────────────┴────────────┴─────────────────────┘

Wait until it finishes. You will know it has finished when the job status changes to successful.

CLI output
[+] Collecting all jobs

┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃ Name ┃ status ┃ Created at ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│ finetuning_job_name │ successful │ 2022-11-16 11:32:47 │
└─────────────────────┴────────────┴─────────────────────┘
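The waiting step can also be scripted. A minimal Python sketch that polls a status source until it reports successful; get_status here is a stand-in for code that would run stochasticx finetuning ls and read the status column (the polling helper is an assumption, not part of the CLI):

```python
import time

def poll_until_successful(get_status, interval=0.0, max_polls=100):
    """Poll a status callable until it reports 'successful'."""
    for _ in range(max_polls):
        if get_status() == "successful":
            return True
        time.sleep(interval)
    return False

# Stand-in status sequence mimicking the job lifecycle shown above
statuses = iter(["running", "running", "successful"])
print(poll_until_successful(lambda: next(statuses)))  # → True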

You can also collect the logs of the finetuning job:

stochasticx finetuning logs --name finetuning_job_name
CLI output
[+] Collecting logs

{'loss': 0.3763, 'learning_rate': 1.3768115942028985e-05, 'epoch': 2.17}
{'train_runtime': 115.5523, 'train_samples_per_second': 95.23, 'train_steps_per_second': 5.971, 'train_loss': 0.31296598738518316, 'epoch': 3.0}
***** train metrics *****
epoch = 3.0
train_loss = 0.313
train_runtime = 0:01:55.55
train_samples = 3668
train_samples_per_second = 95.23
train_steps_per_second = 5.971
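The dict-style log lines above can be parsed programmatically, for example to track loss across jobs. A sketch using the standard library's ast.literal_eval on lines copied from the output above:

```python
import ast

# Log lines copied from the `stochasticx finetuning logs` output above
log_lines = [
    "{'loss': 0.3763, 'learning_rate': 1.3768115942028985e-05, 'epoch': 2.17}",
    "{'train_runtime': 115.5523, 'train_samples_per_second': 95.23, "
    "'train_steps_per_second': 5.971, 'train_loss': 0.31296598738518316, 'epoch': 3.0}",
]

# literal_eval safely parses the Python-dict-style lines
records = [ast.literal_eval(line) for line in log_lines]
print(records[-1]["train_loss"])  # → 0.31296598738518316
```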

The finetuned model should appear in the registry with the name finetuning_job_name_optimized:

stochasticx models ls
CLI output
[+] Collecting all local models

┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┓
┃ Id ┃ Name ┃ Directory path ┃ Type ┃ Uploaded ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━┩
│ 1 │ distilbert-base-uncased │ /vol/registry/models/b1d89b1a-82bc-45ca-bacf-c023116ef0fa │ hf │ True │
│ 2 │ finetuning_job_name_optimized │ /vol/registry/models/d102c2fc-3a3b-4e76-badb-c8c00638f878 │ hf │ True │
└────┴───────────────────────────────┴───────────────────────────────────────────────────────────┴──────┴──────────┘

Congratulations, you have run your first local finetuning job!