Tokenizer
Learn about language model tokenization.
Large language models process text using tokens, which are common sequences of characters found in a body of text. The models learn the statistical relationships between these tokens, and excel at predicting the next token in a sequence.
You can use the tool below to understand how a piece of text might be tokenized by a language model, and see the total count of tokens and estimated costs.
[Interactive tokenizer: choose a model, then enter text to see the token and character counts.]
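If you want the same counts programmatically, OpenAI's open-source tiktoken library exposes tokenizers like the one behind this tool. The sketch below is a minimal example, not the tool's exact implementation; the "cl100k_base" encoding is an assumption, and the right encoding depends on which model you choose.

```python
# Minimal sketch: tokenize text with tiktoken (pip install tiktoken).
# Assumption: the "cl100k_base" encoding; match the encoding to your model.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Large language models process text using tokens."
token_ids = enc.encode(text)                    # text -> list of integer token IDs
pieces = [enc.decode([t]) for t in token_ids]   # decode each ID back to its text piece

print(f"Tokens: {len(token_ids)}")
print(f"Characters: {len(text)}")
print(pieces)
```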
Helpful rule of thumb: one token generally corresponds to ~4 characters of common English text. This translates to roughly ¾ of a word (so 100 tokens ≈ 75 words).
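As a quick sanity check on that heuristic, the sketch below (again assuming tiktoken and the "cl100k_base" encoding) compares the characters-divided-by-four estimate against an actual token count. Expect the two to be close but not identical, since real token lengths vary with the text.

```python
# Compare the ~4-characters-per-token heuristic to an actual token count.
# Assumption: tiktoken with the "cl100k_base" encoding, as in the sketch above.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = (
    "Tokenization splits common English text into pieces that average "
    "about four characters each."
)

actual = len(enc.encode(text))
estimated = len(text) / 4  # heuristic: one token per ~4 characters

print(f"Characters: {len(text)}")
print(f"Estimated tokens: {estimated:.0f}, actual tokens: {actual}")
```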