

Scale large deep learning models, deliver blazing-fast inference, and optimize infrastructure costs with the Stochastic tools.


It is recommended that you first familiarize yourself with the Stochastic platform before beginning this tutorial.

1. Set up Stochastic

a) Sign up for a free account at

b) Install the Stochastic library on your machine in a Python 3 environment using pip or conda. The CLI is installed automatically along with the library.

pip install stochasticx

or:

conda install stochasticx

c) Verify the installation.

stochasticx --help
CLI output
Usage: stochasticx [OPTIONS] COMMAND [ARGS]...

--help Show this message and exit.


2. Sign in

Before you start using the CLI, you need to log in with your username and password.

stochasticx login --username "" --password "my_password"
CLI output
[+] Login successfully

3. Available commands

Usage: stochasticx [OPTIONS] COMMAND [ARGS]...

--help Show this message and exit.


The CLI is organized into commands. There are commands to work with datasets, models, deployments, inferences, and more, and these commands can in turn have subcommands. For instance, you can list all the models you have uploaded to the Stochastic platform by running stochasticx models ls
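
If you want to script the CLI rather than type commands by hand, you can drive it from Python with the standard library's subprocess module. This is a minimal sketch, assuming stochasticx is on your PATH; the helper name run_cli is ours, not part of the Stochastic library.

```python
import shutil
import subprocess

def run_cli(executable, *args):
    """Run a CLI executable with the given subcommands/flags and return its stdout."""
    result = subprocess.run(
        [executable, *args], capture_output=True, text=True, check=True
    )
    return result.stdout

# Only attempt the call when the CLI is actually installed.
if shutil.which("stochasticx"):
    print(run_cli("stochasticx", "models", "ls"))
```

The list-of-arguments form avoids shell quoting issues, and check=True raises an exception if the command exits with a non-zero status.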

4. Models and datasets

Before starting an optimization job, we will upload a Hugging Face dataset and a Hugging Face model to the Stochastic Platform using the Python library.

First, download the Hugging Face dataset and save it to disk. We will use the SQuAD dataset, which contains questions, answers and contexts.

from datasets import load_dataset

# Download SQuAD and save it to ./squad so it can be uploaded afterwards
dataset = load_dataset("squad")
dataset.save_to_disk("./squad")
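
To make the column mapping later in this tutorial concrete, it helps to know what a SQuAD record looks like. The record below is a synthetic example in the standard SQuAD schema (it is not taken from the real dataset):

```python
# A synthetic record in the standard SQuAD schema (illustrative, not real data).
record = {
    "id": "example-0",
    "title": "Example",
    "context": "The Stochastic platform accelerates deep learning models.",
    "question": "What does the Stochastic platform accelerate?",
    "answers": {"text": ["deep learning models"], "answer_start": [36]},
}

# Each answer_start is a character offset into the context string.
start = record["answers"]["answer_start"][0]
text = record["answers"]["text"][0]
assert record["context"][start:start + len(text)] == text
```

Note that "answers" holds parallel lists, since a question can have several annotated answer spans.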

Once you have saved it to disk, you can upload it to the Stochastic Platform.

stochasticx datasets upload \
--name "squad_dataset" \
--dir_path "./squad" \
--type "hf"
  • name: will allow you to identify the dataset later.
  • dir_path: directory path where your dataset is located.
  • type: dataset type. Right now we support Hugging Face, CSV, JSON and JSONL datasets.
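
The non-Hugging-Face formats are plain files. For example, a small QA set can be written as JSONL (one JSON object per line) with the standard library alone; the file name and rows below are illustrative:

```python
import json

# Two toy QA rows (illustrative data only).
rows = [
    {"question": "What is 2 + 2?", "context": "2 + 2 equals 4.", "answers": "4"},
    {"question": "Capital of France?", "context": "Paris is the capital of France.", "answers": "Paris"},
]

# JSONL means exactly one JSON object per line.
with open("qa_dataset.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read it back to verify the round trip.
with open("qa_dataset.jsonl", encoding="utf-8") as f:
    loaded = [json.loads(line) for line in f]
assert loaded == rows
```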

Then we will do the same for the model. In this tutorial we will use bert-base-uncased. Feel free to select another BERT model.

from transformers import AutoTokenizer, AutoModelForQuestionAnswering

# Download the model and tokenizer, then save them to ./bert-squad for upload
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

tokenizer.save_pretrained("./bert-squad")
model.save_pretrained("./bert-squad")

Now you can upload this model to the Stochastic platform.

stochasticx models upload \
--name "BERT SQuAD" \
--dir_path "./bert-squad" \
--type "hf"
  • name: will allow you to identify the model later.
  • dir_path: directory path where your model is located.
  • type: model type. In this case, a Hugging Face model.

You can list out your models:

stochasticx models ls
CLI output
[+] Collecting uploaded models

┃ Id ┃ Name ┃ Directory path ┃ Type ┃ Uploaded ┃
│ 62e15b3db5beb3002644046e │ bert │ │ hf │ True │
│ 62d545be860f360027140b74 │ distill-bert │ │ hf │ True │

You can also list your datasets:

stochasticx datasets ls
CLI output
[+] Collecting all datasets

┃ Id ┃ Name ┃ Directory path ┃ Type ┃ Uploaded ┃
│ 62e903a198855200266c03fe │ squad │ │ hf │ True │
│ 62d92ff2b3828f002719dba7 │ GLUE │ │ │ True │

5. Finetune or accelerate your favourite model for question answering

To start a finetuning or acceleration job, you will need to specify the following information:

  • Job name: will allow you to identify the job later.
  • Model ID: the model that you want to finetune and optimize.
  • Dataset ID: dataset that will be used to finetune and evaluate your model.
  • Optimization criteria: latency or lossless.
    • latency optimization will optimize the model's latency and throughput without taking the loss of accuracy into account. Accuracy is usually reduced by around 3%, but the speedup can be 7x or more.
    • lossless optimization will optimize your model without losing accuracy.
  • Task type: right now we only support the following tasks: sequence classification, question answering, summarization, translation and token classification.
  • Dataset columns: depending on the task you will have to map dataset columns to job metadata. For example, for question answering you will need to specify which column of the dataset contains the question, which one contains the context and which one contains the answer.
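
For question answering, the column mapping boils down to a small dictionary from job metadata to dataset column names. This sketch uses the SQuAD column names shown by the columns command below, and checks that every mapped column actually exists in the dataset:

```python
# Columns of the SQuAD dataset, as reported by `stochasticx datasets columns`.
dataset_columns = ["id", "title", "context", "question", "answers"]

# Map job metadata to the dataset columns that hold each field.
column_mapping = {
    "question_column": "question",
    "context_column": "context",
    "answer_column": "answers",
}

# Every mapped column must exist in the dataset, or the job cannot read it.
missing = [c for c in column_mapping.values() if c not in dataset_columns]
assert not missing
```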

To get the columns of your dataset, execute the following command; you can also use our web platform:

stochasticx datasets columns --id 62e903a198855200266c03fe
CLI output
[+] Collecting columns from the dataset

['id', 'title', 'context', 'question', 'answers']

To run this example job, we will use the BERT model and SQuAD dataset listed in the previous section.

stochasticx jobs launch question_answering \
--job_name "quickstart_tutorial" \
--model_id "62e15b3db5beb3002644046e" \
--dataset_id "62e903a198855200266c03fe" \
--optimization_criteria "latency" \
--question_column "question" \
--answer_column "answers" \
--context_column "context"
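
If you launch jobs from scripts, the same flags can be assembled programmatically. This is a sketch using only the Python standard library; the IDs are the example values from above:

```python
import shlex

# Launch parameters: flag name -> value (example IDs from the listings above).
params = {
    "job_name": "quickstart_tutorial",
    "model_id": "62e15b3db5beb3002644046e",
    "dataset_id": "62e903a198855200266c03fe",
    "optimization_criteria": "latency",
    "question_column": "question",
    "answer_column": "answers",
    "context_column": "context",
}

argv = ["stochasticx", "jobs", "launch", "question_answering"]
for flag, value in params.items():
    argv += [f"--{flag}", value]

# shlex.join produces a copy-pasteable shell command, quoting where needed.
command = shlex.join(argv)
print(command)
```

The argv list can be passed directly to subprocess.run, which avoids shell quoting problems entirely.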

Now, list your jobs.

stochasticx jobs ls
CLI Output
[+] Collecting all jobs

┃ Id ┃ Name ┃ Status ┃ Created at ┃ Optimization type ┃ Optimization criteria ┃
│ 62e9071c98855200266c0434 │ quickstart_tutorial │ New │ 2022-08-02 11:14:36 │ auto │ latency │

If you see something similar, congratulations: you have started your first job using the CLI. Now wait until your job is finished.
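
To wait programmatically instead of checking by hand, a generic poll loop works. The job_is_finished function below is a hypothetical stand-in; in practice you would replace it with a check that parses the Status column of stochasticx jobs ls for your job:

```python
import time

def wait_until(check, timeout_s=3600, interval_s=5):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

# Hypothetical stand-in: replace with a real check that parses
# `stochasticx jobs ls` output and looks for a finished status.
_calls = {"n": 0}
def job_is_finished():
    _calls["n"] += 1
    return _calls["n"] >= 3  # pretend the job finishes on the third poll

assert wait_until(job_is_finished, timeout_s=10, interval_s=0.01)
```

Using time.monotonic for the deadline keeps the loop correct even if the system clock is adjusted while you wait.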