
Optimization jobs

tip

Complete the recommended tutorials before starting to work with optimization jobs.

1. Sign in

Before you start using the CLI, you need to log in with your username and password.

stochasticx login --username "my_email@email.com" --password "my_password"

2. Create a job

A job lets you finetune and accelerate a model. In this tutorial you will launch a job that finetunes and accelerates a BERT model on the SQuAD dataset downloaded from the Hugging Face Hub. The resulting model will be able to extract the answer to a question from a given context. Perform the following steps to start the job:

  1. Download the SQuAD dataset from the Hugging Face Hub:
from datasets import load_dataset
dataset = load_dataset("squad")
dataset.save_to_disk("./squad")
  2. Upload the dataset to the Stochastic platform:
stochasticx datasets upload \
--name "squad_dataset" \
--dir_path "./squad" \
--type "hf"
  3. Download the BERT model from the Hugging Face Hub (an optional local sanity check of the saved dataset and model is sketched after this list):
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
tokenizer.save_pretrained("./bert")

# The question-answering head of bert-base-uncased is randomly initialized here;
# it is trained later by the optimization job.
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")
model.save_pretrained("./bert")
  4. Upload the model to the Stochastic platform:
stochasticx models upload \
--name "BERT" \
--dir_path "./bert" \
--type "hf"
  5. Get the ID of the uploaded dataset:
stochasticx datasets ls
CLI output
[+] Collecting all datasets

┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┓
┃ Id                       ┃ Name          ┃ Directory path ┃ Type ┃ Uploaded ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━┩
│ 62e912a598855200266c0478 │ squad_dataset │                │ json │ True     │
└──────────────────────────┴───────────────┴────────────────┴──────┴──────────┘
  6. Get the ID of the uploaded model:
stochasticx models ls
CLI output
[+] Collecting uploaded models

┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┓
┃ Id                       ┃ Name ┃ Directory path ┃ Type ┃ Uploaded ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━┩
│ 62e9195598855200266c0497 │ BERT │                │ hf   │ True     │
└──────────────────────────┴──────┴────────────────┴──────┴──────────┘
  7. Get the dataset columns:
stochasticx datasets columns --id 62e912a598855200266c0478
CLI output
[+] Collecting columns from the dataset

['id', 'title', 'context', 'question', 'answers']
  8. Start the optimization job:
stochasticx jobs launch question_answering \
--job_name "job_tutorial" \
--model_id "62e9195598855200266c0497" \
--dataset_id "62e912a598855200266c0478" \
--optimization_criteria "latency" \
--question_column "question" \
--answer_column "answers" \
--context_column "context"
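
As mentioned in step 3, the following minimal sketch verifies locally that the dataset and model saved in steps 1 and 3 load correctly. It relies only on the datasets and transformers libraries; bert-base-uncased still has a randomly initialized question-answering head at this point, so the extracted answer is only a smoke test until the optimization job has finetuned the model.

# Optional sanity check of the locally saved dataset and model (steps 1 and 3).
from datasets import load_from_disk
from transformers import pipeline

# Reload the dataset exactly as it was saved and confirm the columns the
# optimization job needs (question, context, answers).
dataset = load_from_disk("./squad")
print(dataset["train"].column_names)  # ['id', 'title', 'context', 'question', 'answers']

# Load the saved model and tokenizer as a question-answering pipeline.
# The QA head is still randomly initialized, so the extracted answer is
# a smoke test, not a meaningful prediction.
qa = pipeline("question-answering", model="./bert", tokenizer="./bert")
example = dataset["train"][0]
print(qa(question=example["question"], context=example["context"]))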

3. List your jobs

Use the following command to list your jobs. The job you just started should have the status New, which means it is still running. When it finishes, its status changes to Successful. If you would rather check for completion from a script, see the sketch after the example output below.

stochasticx jobs ls
CLI output
[+] Collecting all jobs

┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Id                       ┃ Name         ┃ Status     ┃ Created at          ┃ Optimization type ┃ Optimization criteria ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│ 6357eace1da66b553c95f89d │ job_tutorial │ New        │ 2022-08-02 13:55:26 │ auto              │ latency               │
│ 62d93018b3828f002719dbc9 │ Test         │ successful │ 2022-07-21 10:53:12 │ auto              │ latency               │
└──────────────────────────┴──────────────┴────────────┴─────────────────────┴───────────────────┴───────────────────────┘
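
If you prefer to wait for the job from a script instead of re-running the command by hand, the minimal sketch below relies only on the stochasticx jobs ls command shown above. It assumes the job name job_tutorial used in this tutorial and a five-minute polling interval; adjust both to your setup.

# Minimal polling sketch: re-run the documented `stochasticx jobs ls` command
# until the row for this tutorial's job no longer reports the New status.
import subprocess
import time

JOB_NAME = "job_tutorial"  # the --job_name used when launching the job

while True:
    output = subprocess.run(
        ["stochasticx", "jobs", "ls"], capture_output=True, text=True
    ).stdout
    row = next((line for line in output.splitlines() if JOB_NAME in line), "")
    if row and "New" not in row:
        print("Job finished:")
        print(row)
        break
    print("Job still running, checking again in 5 minutes...")
    time.sleep(300)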