

In the Stochastic Platform, open the Model API side tab and select the FLAN-T5 model card.

Model card

You can use the model either through the Playground or the API. Choose a method by selecting the corresponding tab.

Playground usage

On this page you will find a text area where you can enter any prompt you would like the model to generate text from. In the panel on the right, you can adjust the supported parameter values to your preference. Finally, press the Submit button to send the request. The output appears in the central panel.

  • Max new tokens: the maximum number of tokens the model will generate. A token is roughly 4 characters, including alphanumeric and special characters.
  • TopK: top-k sampling sorts tokens by probability and zeroes out the probabilities of everything below the k-th token. A lower value can improve quality by removing the low-probability tail, making the model less likely to go off topic.
  • Penalty alpha: balances model confidence against the degeneration penalty in contrastive search decoding. When generating output, contrastive search jointly considers (i) the probability predicted by the language model, to keep the generated text semantically coherent with the prefix, and (ii) similarity to the previous context, to avoid model degeneration.
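To make the TopK parameter concrete, here is a minimal plain-Python sketch of top-k filtering (an illustration of the general technique, not the platform's implementation): it zeroes out every probability below the k-th most likely token and renormalizes the rest.

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens and renormalize."""
    # The k-th largest probability is the cutoff; everything below it is zeroed.
    cutoff = sorted(probs, reverse=True)[k - 1]
    filtered = [p if p >= cutoff else 0.0 for p in probs]
    total = sum(filtered)
    return [p / total for p in filtered]

# Five-token toy distribution; with k=4 the smallest token is removed
# and the remaining mass is renormalized to sum to 1.
probs = [0.5, 0.2, 0.15, 0.1, 0.05]
print(top_k_filter(probs, 4))
```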

Stochastic-x model API usage

To use the model API in your application, there are two main steps.


To use the model API, you need a Stochastic account. Sign up for a free account.

In this step, submit an inference request to the ApiUrl to get the responseUrl and the queuePosition. The request and response specifications are given below.

  • Request

    • Method : POST

    • Header: add a property called apiKey to the request header. (Get the apiKey from the Stochastic Platform.)


    • Body : The request body can contain the following properties:

      • prompt: required; the prompt for text generation, either a single string or an array of strings
      • params: required; parameters for generation

      Here is an example of the request body:

```json
{
  "prompt": "The Inflation Reduction Act lowers prescription drug costs, health care costs, and energy costs. It's the most aggressive action on tackling the climate crisis in American history, which will lift up American workers and create good-paying, union jobs across the country. It'll lower the deficit and ask the ultra-wealthy and corporations to pay their fair share. And no one making under $400,000 per year will pay a penny more in taxes",
  "params": {
    "max_new_tokens": 64,
    "top_k": 4,
    "penalty_alpha": 0.6
  }
}
```
  • Response

    If the request is successful, you will receive the responseUrl and the queuePosition.

    "success": true,
    "data": {
    "id": "6389ce23460c900d80fa2290",
    "responseUrl": "",
    "queuePosition": "0"

Python example

Below is an example Python request to get the completion. Don't forget to add your API key to the example.

```python
import requests
import time

# Replace with the ApiUrl shown on the model card and your own API key
api_url = "<ApiUrl>"
api_key = "your API key"

# Step 1: submit the inference request
response_step1 = requests.post(
    api_url,
    headers={"apiKey": api_key},
    json={
        "prompt": "A step by step recipe to make bolognese pasta:",
        "params": {
            "max_new_tokens": 64,
            "top_k": 4,
            "penalty_alpha": 0.6,
        },
    },
)

data_step1 = response_step1.json()["data"]

# Step 2: poll the responseUrl until the completion is ready
completed = False
while not completed:
    response_step2 = requests.get(
        data_step1["responseUrl"],
        headers={"apiKey": api_key},
    )
    data_step2 = response_step2.json()["data"]
    completion = data_step2.get("completion")
    completed = completion is not None
    if not completed:
        time.sleep(1)

print(completion)
```

Example output:

['In a large saucepan, combine the ground beef, tomato paste, tomato sauce, oregano, basil, salt, pepper, and thyme. Bring to a boil, then reduce the heat to low and simmer for 30 minutes. Meanwhile, cook the pasta in salted boiling water according to']