The Microsoft.ML.Tokenizers.Data.Gpt2 package includes the Tiktoken tokenizer data file gpt2.tiktoken, which is used by models such as GPT-2.
$ dotnet add package Microsoft.ML.Tokenizers.Data.Gpt2
Reference this package in your project so that the Tiktoken tokenizer can load the GPT-2 data when you create a tokenizer for that model.
using Microsoft.ML.Tokenizers;

// Create a tokenizer for the GPT-2 model; the gpt2.tiktoken data comes from this package
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("Gpt-2");
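A slightly fuller sketch of the same flow is below. It assumes Microsoft.ML.Tokenizers 1.0 or later, where Tokenizer exposes EncodeToIds, CountTokens and Decode, and a .NET project that uses top-level statements. The sample sentence is arbitrary, and the model name is the one used in the snippet above.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.Tokenizers;

// Assumes Microsoft.ML.Tokenizers and Microsoft.ML.Tokenizers.Data.Gpt2 are both referenced,
// so CreateForModel can find the gpt2.tiktoken data shipped in this package.
Tokenizer tokenizer = TiktokenTokenizer.CreateForModel("Gpt-2");

// Arbitrary sample input used only for illustration.
string text = "Hello, World!";

// Encode the text into GPT-2 token IDs.
IReadOnlyList<int> ids = tokenizer.EncodeToIds(text);
Console.WriteLine($"Token IDs: {string.Join(", ", ids)}");

// Count the tokens directly, using the same default options as EncodeToIds.
int count = tokenizer.CountTokens(text);
Console.WriteLine($"Token count: {count}");

// Decode the IDs back into a string.
var decoded = tokenizer.Decode(ids);
Console.WriteLine($"Decoded: {decoded}");
```

Because GPT-2 uses a byte-level BPE encoding, decoding the IDs of ordinary text like this should reproduce the input string, and the token count should match the number of IDs returned by EncodeToIds.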
Users shouldn't use any types exposed by this package directly. This package is intended to provide tokenizer data files.
Related package: Microsoft.ML.Tokenizers (provides TiktokenTokenizer and the other tokenizer types)
Microsoft.ML.Tokenizers.Data.Gpt2 is released as open source under the MIT license. Bug reports and contributions are welcome at the GitHub repository.