Single and Batch Inference with OpenAI

This blog provides code for both single and batch inference using the OpenAI Chat Completions API. It expands on the official documentation and explains the key implementation steps to streamline your workflow.

🔗 Resources

🧭 Official Documentation

🛠️ Pricing Calculator

There are many similar tools available online to help estimate usage costs. If you’d prefer to calculate pricing manually, follow these steps:

  1. Identify the input and output token pricing for the model you’re using (refer to OpenAI pricing).
  2. Use tiktoken to count:
    • Input tokens
    • Estimated output tokens (expected response)
  3. Apply the cost formula below:
def calculate_openai_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    cost = (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k
    return round(cost, 6)
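For example, with hypothetical prices of $0.00015 per 1K input tokens and $0.0006 per 1K output tokens (check the current pricing page for real numbers), the formula works out like this:

```python
def calculate_openai_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    cost = (input_tokens / 1000) * input_price_per_1k + (output_tokens / 1000) * output_price_per_1k
    return round(cost, 6)

# 1,000 input tokens and 500 estimated output tokens at the example prices above
print(calculate_openai_cost(1000, 500, 0.00015, 0.0006))  # → 0.00045
```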

Set Up

You’ll need an API key to access the OpenAI endpoints.

If you don’t have an account:

  1. Sign up at OpenAI Create Account.
  2. Generate your first API key.
  3. Check OpenAI Pricing to purchase credits if needed.

If you already have an account:

  1. Log in at OpenAI Log In.
  2. Go to Settings ⚙️ → Project → API Keys → + Create new secret key.

💡 Important: Save your API key securely — it won’t be shown again! I recommend saving your API key to a text file named key.txt.

Install the OpenAI library:

pip install openai

I’ll use English-to-Japanese translation as the example in the following sections.


Single Inference

from openai import OpenAI

with open('key.txt') as f:
    key = f.read().strip()  # strip the trailing newline, or authentication will fail
client = OpenAI(api_key=key)

MODEL = "gpt-4o-mini" # Specify your model here
SYSTEM_PROMPT = "Translate from English to Japanese."

# Few-shot examples
EXAMPLES = [
    {
        "user": "English: Muscles never let you down. Japanese:",
        "assistant": "筋肉は裏切らない。"
    },
    {
        "user": "English: An apple a day keeps the doctor away. Japanese:",
        "assistant": "1日1個のリンゴは医者を遠ざける。"
    },
    {
        "user": (
            "English: The agriculture ministry says the average price for a 5-kilogram "
            "bag of rice was 4,172 yen, or roughly 28 dollars, for the week through March 16. Japanese:"
        ),
        "assistant": "農林水産省によると、3月16日までの1週間で、5キログラム入りの米の平均価格は4,172円、つまりおよそ28ドルだったということです。"
    }
]

def translator(src: str, model: str = MODEL, fewshot: bool = True) -> str:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]

    if fewshot:
        for i, example in enumerate(EXAMPLES, 1):
            messages.append({
                "role": "user", "content": example["user"], "name": f"example_user_{i}"
            })
            messages.append({
                "role": "assistant", "content": example["assistant"], "name": f"example_assistant_{i}"
            })

    messages.append({
        "role": "user", "content": f"English: {src.strip()} Japanese:"
    })

    response = client.chat.completions.create(
        model=model,
        messages=messages
    )

    return response.choices[0].message.content.strip()
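Before spending tokens, it can help to preview exactly what gets sent to the API. The sketch below mirrors the message-building logic of `translator` with a single few-shot pair, without making any request:

```python
# Minimal sketch: preview the message list the translator builds (no API call)
SYSTEM_PROMPT = "Translate from English to Japanese."
EXAMPLES = [
    {"user": "English: Muscles never let you down. Japanese:",
     "assistant": "筋肉は裏切らない。"},
]

def build_messages(src: str) -> list:
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for i, example in enumerate(EXAMPLES, 1):
        messages.append({"role": "user", "content": example["user"], "name": f"example_user_{i}"})
        messages.append({"role": "assistant", "content": example["assistant"], "name": f"example_assistant_{i}"})
    messages.append({"role": "user", "content": f"English: {src.strip()} Japanese:"})
    return messages

msgs = build_messages("Good morning.")
print(len(msgs))  # system + 1 few-shot pair + final user message → 4
```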

Batch Inference

When dealing with large volumes of data, OpenAI’s Batch API offers a faster, cheaper way to process prompts in bulk.

STEP 1: Convert Text File to JSONL Format

Suppose you have a .txt file with one English sentence per line. We’ll convert each line into a JSON structure like this:

{
  "custom_id": "request-1",
  "method": "POST",
  "url": "/v1/chat/completions",
  "body": {
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "system", "content": "Translate from English to Japanese." },
      { "role": "user", "content": "English: The sun sets earlier in winter than in summer. Japanese:" }
    ],
    "max_tokens": 1000
  }
}

Code

Save the following code as convert.py:

import os, json, argparse
from tqdm import tqdm

MODEL = "gpt-4o-mini" # Specify your model here

def convertor(input_path: str, output_path: str, model: str = MODEL, max_tokens: int = 1000):
    with open(input_path, 'r', encoding='utf-8') as f:
        total_lines = sum(1 for _ in f)

    # Resume support: blank input lines are skipped, so counting output lines
    # would undercount. Instead, read the custom_id of the last written record.
    start_index = 0
    if os.path.exists(output_path):
        last_line = None
        with open(output_path, 'r', encoding='utf-8') as f:
            for last_line in f:
                pass
        if last_line:
            # custom_id is "request-<input line number>", so resume right after it
            start_index = int(json.loads(last_line)["custom_id"].split("-")[1])
            print(f"[INFO] Resuming from line {start_index}...")

    with open(input_path, 'r', encoding='utf-8') as infile, \
         open(output_path, 'a', encoding='utf-8') as outfile:

        for i, line in enumerate(tqdm(infile, total=total_lines, desc="Converting")):
            if i < start_index:
                continue

            line = line.strip()
            if line:
                json_line = {
                    "custom_id": f"request-{i+1}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        "model": model,
                        "messages": [
                            {"role": "system", "content": "Translate from English to Japanese."},
                            {"role": "user", "content": f"English: {line} Japanese:"}
                        ],
                        "max_tokens": max_tokens
                    }
                }
                outfile.write(json.dumps(json_line, ensure_ascii=False) + "\n")

    print("[DONE] Conversion complete.")

def main():
    parser = argparse.ArgumentParser(description="Convert a TXT file to JSONL format for OpenAI Batch API.")
    parser.add_argument("--input", required=True, help="Path to input TXT file")
    parser.add_argument("--output", required=True, help="Path to output JSONL file")
    args = parser.parse_args()

    convertor(args.input, args.output)

if __name__ == "__main__":
    main()
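Each request line the script emits can be checked independently. Here is a small self-contained sketch that builds one record the same way convert.py does and parses it back, useful for verifying the JSONL shape before uploading:

```python
import json

def make_record(i: int, line: str, model: str = "gpt-4o-mini", max_tokens: int = 1000) -> str:
    """Build one Batch API request line, mirroring convert.py."""
    record = {
        "custom_id": f"request-{i+1}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [
                {"role": "system", "content": "Translate from English to Japanese."},
                {"role": "user", "content": f"English: {line} Japanese:"},
            ],
            "max_tokens": max_tokens,
        },
    }
    return json.dumps(record, ensure_ascii=False)

parsed = json.loads(make_record(0, "The sun sets earlier in winter than in summer."))
print(parsed["custom_id"])  # → request-1
```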

Run Conversion (Single File)

python convert.py --input eng.txt --output eng.jsonl

Optional: Run on Multiple Files with run.sh

Save the following code as run.sh:

#!/bin/bash

TXT_DIR="input_folder_path"  # Specify your INPUT TXT folder path here
JSONL_DIR="output_folder_path"  # Specify your OUTPUT JSONL folder path here
SCRIPT="convert.py"

# Create output directory if it doesn't exist
mkdir -p "$JSONL_DIR"

# Loop over all .txt files in input folder
for txt_file in "$TXT_DIR"/*.txt; do
    filename=$(basename "$txt_file" .txt)
    jsonl_file="$JSONL_DIR/$filename.jsonl"

    echo "Converting: $txt_file -> $jsonl_file"
    python "$SCRIPT" --input "$txt_file" --output "$jsonl_file"
done

echo "✅ Batch conversion complete."

Run it:

bash run.sh

STEP 2: Upload JSONL File to OpenAI

from openai import OpenAI

with open('key.txt') as f:
    key = f.read().strip()
client = OpenAI(api_key=key)

batch_input_file = client.files.create(
    file=open("input.jsonl", "rb"),
    purpose="batch"
)
print(batch_input_file)

STEP 3: Create and Monitor the Batch

Once your input JSONL file is uploaded, you can use the OpenAI API to create a batch job and monitor its progress.

Create the Batch Job

batch_input_file_id = batch_input_file.id
job = client.batches.create(
    input_file_id=batch_input_file_id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "translation job"
    }
)
print("✅ Batch job created!")
print(f"Job ID: {job.id}")

Check Batch Status

Use the job ID to retrieve the current status of your batch job:

batch = client.batches.retrieve(job.id)
print(batch)
print(batch.status)

You may see a status such as validating, in_progress, finalizing, completed, or failed. You can poll this status periodically to know when the job is done.
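Polling can be wrapped in a small helper. The sketch below takes a status-fetching callable so it can be exercised without an API key; with the client from the previous steps you would pass `lambda: client.batches.retrieve(job.id).status`:

```python
import time

# Terminal batch states after which no further polling is needed
TERMINAL = {"completed", "failed", "expired", "cancelled"}

def wait_for_batch(get_status, interval: float = 60.0, max_checks: int = 1000) -> str:
    """Poll get_status() until it returns a terminal batch state."""
    for _ in range(max_checks):
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(interval)
    raise TimeoutError("Batch did not finish within the allotted checks")

# Demonstration with a fake status sequence (no API call)
statuses = iter(["validating", "in_progress", "completed"])
print(wait_for_batch(lambda: next(statuses), interval=0))  # → completed
```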

STEP 4: Retrieve and Save Results

Once your batch job is marked as completed, you can download and extract the assistant responses.

file_response = client.files.content(batch.output_file_id)
# Optional: print raw text output
print(file_response.text)

STEP 5: Save Assistant Outputs to a .txt File

After retrieving your OpenAI Batch results, you can extract the assistant responses and save them to a .txt file — one line per entry.

import json

with open("batch_output.txt", "w", encoding="utf-8") as f:
    for line in file_response.text.strip().splitlines():
        output = json.loads(line)["response"]["body"]["choices"][0]["message"]["content"]
        f.write(output + "\n")
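One caveat: the Batch API does not guarantee that output lines come back in input order, so it is safer to sort by custom_id before writing. A self-contained sketch using two sample JSONL lines of the shape shown above:

```python
import json

# Sample Batch API output: two lines arriving out of order
raw = (
    '{"custom_id": "request-2", "response": {"body": {"choices": [{"message": {"content": "B"}}]}}}\n'
    '{"custom_id": "request-1", "response": {"body": {"choices": [{"message": {"content": "A"}}]}}}\n'
)

records = [json.loads(line) for line in raw.strip().splitlines()]
# Restore input order using the numeric suffix of custom_id ("request-<n>")
records.sort(key=lambda r: int(r["custom_id"].split("-")[1]))
outputs = [r["response"]["body"]["choices"][0]["message"]["content"] for r in records]
print(outputs)  # → ['A', 'B']
```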

💡 json.loads() automatically decodes escaped Unicode. For example, "\u51ac\u306f\u590f\u3088\u308a\u3082\u65e9\u304f\u65e5\u304c\u6c88\u307f\u307e\u3059\u3002" becomes the human-readable Japanese: "冬は夏よりも早く日が沈みます。" 🎌



