Yes, you need to care about truncation/padding/max numbers for your fine-tuning dataset
The question here is just about the max number, but since you also need to care about how you steer the preprocessing, see this linked question for code and more insight: How can you get a Huggingface fine-tuning model with the Trainer class from your own text where you can set the arguments for truncation and padding?
Unless you load a pretrained dataset with Auto classes, you should care about such parameters. Check whether your text is fully tokenized by decoding the input_ids that must be present in the dataset. Also keep in mind that the model has its own setup after tokenizing, which seems to outweigh the preprocessing setup: the preprocessing only hands the right input over to the training, and the model training reads and understands this input in its own way.
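A minimal sketch of that decode check, assuming the "dbmdz/german-gpt2" checkpoint used later in this answer and an illustrative sample text:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dbmdz/german-gpt2")

sample_text = "Ein kurzer Beispielsatz für die Prüfung der Tokenisierung."
encoded = tokenizer(sample_text, truncation=True, max_length=512)

# Decode the input_ids back to text and compare with the original;
# if the end of your text is missing, it was truncated away.
print(tokenizer.decode(encoded["input_ids"]))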
You need to care about truncation and the like during both preprocessing and training. Check samples and try to understand what is going on. I still do not know exactly how the eos_token and pad_token come into play during fine-tuning and which arguments change what (one common pattern is sketched below); the next chapter is only about the max numbers. My guess is that changing the max number of tokens alone is not enough to steer the ship.
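Since the eos_token/pad_token question stays open here, this is only a hedged sketch of one pattern often used for GPT-2-style models, which ship without a pad token: reusing the eos token for padding. Treat it as an assumption/convention, not the authoritative setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dbmdz/german-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token            # reuse eos as pad (assumption/convention)
    model.config.pad_token_id = model.config.eos_token_id

batch = tokenizer(["kurzer Satz", "ein etwas längerer Satz"],
                  padding=True, truncation=True, max_length=512,
                  return_tensors="pt")
print(batch["input_ids"].shape)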
Auto classes (of GPT2) and DistilBert: max numbers
This picks up the other answer and checks the max numbers of the tokenizer and model objects for two example models ("german-gpt2" and "distilbert-base-cased-distilled-squad").
It shows that the AutoTokenizer is empty when it is loaded, which is why tokenizer.model_max_length is the highest number the data type can hold. That value does not play any role, and that is also why loading the model works at all: if you try loading the tokenizer with GPT2Tokenizer, it fails, but loading it with the empty AutoTokenizer instead, as the model card asks you to, works. From this I can see, without any further reading, that the model outweighs the settings of the tokenizer. Large text input will be split into blocks that the model allows, and how the tokenizer itself is set up does not seem to play a role. One could say that the AutoTokenizer is the tokenizer of the model's settings.
The checks also show that you sometimes have to search for the right model max variable yourself: the max number is called n_positions in the "german-gpt2" config but max_position_embeddings in the "distilbert-base-cased-distilled-squad" config, and yet you can ask both models for model.config.max_position_embeddings.
model_name = "dbmdz/german-gpt2"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
print(tokenizer.max_len_sentences_pair)
print(tokenizer.max_len_single_sentence)
print(tokenizer.max_model_input_sizes)
print(tokenizer.model_max_length)
1000000000000000019884624838656
1000000000000000019884624838656
{'gpt2': 1024, 'gpt2-medium': 1024, 'gpt2-large': 1024, 'gpt2-xl': 1024, 'distilgpt2': 1024}
1000000000000000019884624838656
But once you set the tokenizer with your own arguments, the output changes:
tokenizer = AutoTokenizer.from_pretrained(
model_name, eos_token="<|endoftext|>", pad_token="[PAD]", model_max_length=512)
print(tokenizer.max_len_sentences_pair)
print(tokenizer.max_len_single_sentence)
print(tokenizer.max_model_input_sizes)
print(tokenizer.model_max_length)
512
512
{'gpt2': 1024, 'gpt2-medium': 1024, 'gpt2-large': 1024, 'gpt2-xl': 1024, 'distilgpt2': 1024}
512
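A short sketch, under the same arguments as above, showing that truncation=True without an explicit max_length now caps the encoding at the 512 tokens set via model_max_length (the long input text is just an illustration):
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "dbmdz/german-gpt2", eos_token="<|endoftext|>",
    pad_token="[PAD]", model_max_length=512)

long_text = "ein Wort " * 2000
ids = tokenizer(long_text, truncation=True)["input_ids"]
print(len(ids))  # expected: 512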
Yet you do not seem to need this tokenizer setting at all, since the model itself tells you in which block size it reads the input:
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(model_name)
model.config.max_position_embeddings
1024
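If you want a single number to work with, a hedged sketch is to combine both sources and ignore the "unset" sentinel that an unconfigured tokenizer reports (the threshold used here is an arbitrary assumption):
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dbmdz/german-gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

candidates = [model.config.max_position_embeddings]
if tokenizer.model_max_length < int(1e12):  # skip the huge placeholder value
    candidates.append(tokenizer.model_max_length)

effective_max = min(candidates)
print(effective_max)  # 1024 for this checkpoint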
Here is an insight into the max variables of DistilBert:
model_name = "distilbert-base-cased-distilled-squad"
from transformers import DistilBertTokenizer
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
print(tokenizer.max_len_sentences_pair)
print(tokenizer.max_len_single_sentence)
print(tokenizer.max_model_input_sizes)
print(tokenizer.model_max_length)
from transformers import DistilBertModel
model = DistilBertModel.from_pretrained(model_name)
model.config.max_position_embeddings
509
510
{'distilbert-base-uncased': 512, 'distilbert-base-uncased-distilled-squad': 512, 'distilbert-base-cased': 512, 'distilbert-base-cased-distilled-squad': 512, 'distilbert-base-german-cased': 512, 'distilbert-base-multilingual-cased': 512}
512
512
And if you check model.config of the GPT-2 model, you find these max variables; n_positions should be the same as max_position_embeddings:
"n_ctx": 1024,
"n_embd": 768,
...
"n_positions": 1024,
Deep dive
Needed for the checks:
import inspect
Example 1: German GPT2 (with Auto classes)
You have to load this model with Auto classes, see the model card section "Using the model". It does not ask you to pass any arguments to the AutoTokenizer object, and the class is empty but still works since the model config outweighs the tokenizer settings.
AutoTokenizer
model_name = "dbmdz/german-gpt2"
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
inspect.signature(AutoTokenizer)
<Signature ()>
dir(AutoTokenizer)
['__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'from_pretrained',
'register']
from transformers import AutoModel
model = AutoModel.from_pretrained(model_name)
model.config.max_position_embeddings
1024
dir(AutoModel)
['__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_model_mapping',
'from_config',
'from_pretrained',
'register']
GPT2Tokenizer
Even though this throws an error for the tokenizer of the german-gpt2 model, we can still check what its class and parameters look like.
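For reference, a minimal sketch of that check, which falls back to the AutoTokenizer recommended by the model card if the slow GPT2Tokenizer load indeed raises for this checkpoint:
from transformers import AutoTokenizer, GPT2Tokenizer

model_name = "dbmdz/german-gpt2"
try:
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
except Exception as err:  # whether this raises depends on the checkpoint's tokenizer files
    print(f"GPT2Tokenizer failed: {err}")
    tokenizer = AutoTokenizer.from_pretrained(model_name)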
from transformers import GPT2Tokenizer
inspect.signature(GPT2Tokenizer)
<Signature (vocab_file, merges_file, errors='replace', unk_token='<|endoftext|>', bos_token='<|endoftext|>', eos_token='<|endoftext|>', pad_token=None, add_prefix_space=False, add_bos_token=False, **kwargs)>
dir(GPT2Tokenizer)
['SPECIAL_TOKENS_ATTRIBUTES',
'__annotations__',
'__call__',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__len__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_add_tokens',
'_auto_class',
'_batch_encode_plus',
'_batch_prepare_for_model',
'_build_conversation_input_ids',
'_convert_id_to_token',
'_convert_token_to_id',
'_convert_token_to_id_with_added_voc',
'_create_or_get_repo',
'_create_trie',
'_decode',
'_encode_plus',
'_eventual_warn_about_too_long_sequence',
'_eventually_correct_t5_max_length',
'_from_pretrained',
'_get_padding_truncation_strategies',
'_get_repo_url_from_name',
'_pad',
'_push_to_hub',
'_save_pretrained',
'_set_processor_class',
'_tokenize',
'add_special_tokens',
'add_tokens',
'additional_special_tokens',
'additional_special_tokens_ids',
'all_special_ids',
'all_special_tokens',
'all_special_tokens_extended',
'as_target_tokenizer',
'batch_decode',
'batch_encode_plus',
'bos_token',
'bos_token_id',
'bpe',
'build_inputs_with_special_tokens',
'clean_up_tokenization',
'cls_token',
'cls_token_id',
'convert_ids_to_tokens',
'convert_tokens_to_ids',
'convert_tokens_to_string',
'create_token_type_ids_from_sequences',
'decode',
'encode',
'encode_plus',
'eos_token',
'eos_token_id',
'from_pretrained',
'get_added_vocab',
'get_special_tokens_mask',
'get_vocab',
'is_fast',
'mask_token',
'mask_token_id',
'max_len_sentences_pair',
'max_len_single_sentence',
'max_model_input_sizes',
'model_input_names',
'num_special_tokens_to_add',
'pad',
'pad_token',
'pad_token_id',
'pad_token_type_id',
'padding_side',
'prepare_for_model',
'prepare_for_tokenization',
'prepare_seq2seq_batch',
'pretrained_init_configuration',
'pretrained_vocab_files_map',
'push_to_hub',
'register_for_auto_class',
'sanitize_special_tokens',
'save_pretrained',
'save_vocabulary',
'sep_token',
'sep_token_id',
'slow_tokenizer_class',
'special_tokens_map',
'special_tokens_map_extended',
'tokenize',
'truncate_sequences',
'truncation_side',
'unk_token',
'unk_token_id',
'vocab_files_names',
'vocab_size']
(AutoTokenizer) tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
inspect.signature(tokenizer)
<Signature (text: Union[str, List[str], List[List[str]]], text_pair: Union[str, List[str], List[List[str]], NoneType] = None, add_special_tokens: bool = True, padding: Union[bool, str, transformers.utils.generic.PaddingStrategy] = False, truncation: Union[bool, str, transformers.tokenization_utils_base.TruncationStrategy] = False, max_length: Optional[int] = None, stride: int = 0, is_split_into_words: bool = False, pad_to_multiple_of: Optional[int] = None, return_tensors: Union[str, transformers.utils.generic.TensorType, NoneType] = None, return_token_type_ids: Optional[bool] = None, return_attention_mask: Optional[bool] = None, return_overflowing_tokens: bool = False, return_special_tokens_mask: bool = False, return_offsets_mapping: bool = False, return_length: bool = False, verbose: bool = True, **kwargs) -> transformers.tokenization_utils_base.BatchEncoding>
dir(tokenizer)
['SPECIAL_TOKENS_ATTRIBUTES',
'__annotations__',
'__call__',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__len__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_add_tokens',
'_additional_special_tokens',
'_auto_class',
'_batch_encode_plus',
'_bos_token',
'_build_conversation_input_ids',
'_cls_token',
'_convert_encoding',
'_convert_id_to_token',
'_convert_token_to_id_with_added_voc',
'_create_or_get_repo',
'_decode',
'_decode_use_source_tokenizer',
'_encode_plus',
'_eos_token',
'_eventual_warn_about_too_long_sequence',
'_eventually_correct_t5_max_length',
'_from_pretrained',
'_get_padding_truncation_strategies',
'_get_repo_url_from_name',
'_mask_token',
'_pad',
'_pad_token',
'_pad_token_type_id',
'_processor_class',
'_push_to_hub',
'_save_pretrained',
'_sep_token',
'_set_processor_class',
'_tokenizer',
'_unk_token',
'add_prefix_space',
'add_special_tokens',
'add_tokens',
'additional_special_tokens',
'additional_special_tokens_ids',
'all_special_ids',
'all_special_tokens',
'all_special_tokens_extended',
'as_target_tokenizer',
'backend_tokenizer',
'batch_decode',
'batch_encode_plus',
'bos_token',
'bos_token_id',
'build_inputs_with_special_tokens',
'can_save_slow_tokenizer',
'clean_up_tokenization',
'cls_token',
'cls_token_id',
'convert_ids_to_tokens',
'convert_tokens_to_ids',
'convert_tokens_to_string',
'create_token_type_ids_from_sequences',
'decode',
'decoder',
'deprecation_warnings',
'encode',
'encode_plus',
'eos_token',
'eos_token_id',
'from_pretrained',
'get_added_vocab',
'get_special_tokens_mask',
'get_vocab',
'init_inputs',
'init_kwargs',
'is_fast',
'mask_token',
'mask_token_id',
'max_len_sentences_pair',
'max_len_single_sentence',
'max_model_input_sizes',
'model_input_names',
'model_max_length',
'name_or_path',
'num_special_tokens_to_add',
'pad',
'pad_token',
'pad_token_id',
'pad_token_type_id',
'padding_side',
'prepare_for_model',
'prepare_seq2seq_batch',
'pretrained_init_configuration',
'pretrained_vocab_files_map',
'push_to_hub',
'register_for_auto_class',
'sanitize_special_tokens',
'save_pretrained',
'save_vocabulary',
'sep_token',
'sep_token_id',
'set_truncation_and_padding',
'slow_tokenizer_class',
'special_tokens_map',
'special_tokens_map_extended',
'tokenize',
'train_new_from_iterator',
'truncate_sequences',
'truncation_side',
'unk_token',
'unk_token_id',
'verbose',
'vocab',
'vocab_files_names',
'vocab_size']
(AutoTokenizer) tokenizer.encode
inspect.signature(tokenizer.encode)
<Signature (text: Union[str, List[str], List[int]], text_pair: Union[str, List[str], List[int], NoneType] = None, add_special_tokens: bool = True, padding: Union[bool, str, transformers.utils.generic.PaddingStrategy] = False, truncation: Union[bool, str, transformers.tokenization_utils_base.TruncationStrategy] = False, max_length: Optional[int] = None, stride: int = 0, return_tensors: Union[str, transformers.utils.generic.TensorType, NoneType] = None, **kwargs) -> List[int]>
dir(tokenizer.encode)
['__call__',
'__class__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__func__',
'__ge__',
'__get__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__self__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__']
(AutoTokenizer) tokenizer.encode_plus
inspect.signature(tokenizer.encode_plus)
<Signature (text: Union[str, List[str], List[int]], text_pair: Union[str, List[str], List[int], NoneType] = None, add_special_tokens: bool = True, padding: Union[bool, str, transformers.utils.generic.PaddingStrategy] = False, truncation: Union[bool, str, transformers.tokenization_utils_base.TruncationStrategy] = False, max_length: Optional[int] = None, stride: int = 0, is_split_into_words: bool = False, pad_to_multiple_of: Optional[int] = None, return_tensors: Union[str, transformers.utils.generic.TensorType, NoneType] = None, return_token_type_ids: Optional[bool] = None, return_attention_mask: Optional[bool] = None, return_overflowing_tokens: bool = False, return_special_tokens_mask: bool = False, return_offsets_mapping: bool = False, return_length: bool = False, verbose: bool = True, **kwargs) -> transformers.tokenization_utils_base.BatchEncoding>
dir(tokenizer.encode_plus)
['__call__',
'__class__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__func__',
'__ge__',
'__get__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__self__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__']
AutoConfig
model.config
GPT2Config {
"_name_or_path": "dbmdz/german-gpt2",
"activation_function": "gelu_new",
"architectures": [
"GPT2LMHeadModel"
],
"attn_pdrop": 0.0,
"bos_token_id": 50256,
"embd_pdrop": 0.0,
"eos_token_id": 50256,
"gradient_checkpointing": false,
"initializer_range": 0.02,
"layer_norm_epsilon": 1e-05,
"model_type": "gpt2",
"n_ctx": 1024,
"n_embd": 768,
"n_head": 12,
"n_inner": null,
"n_layer": 12,
"n_positions": 1024,
"reorder_and_upcast_attn": false,
"resid_pdrop": 0.0,
"scale_attn_by_inverse_layer_idx": false,
"scale_attn_weights": true,
"summary_activation": null,
"summary_first_dropout": 0.1,
"summary_proj_to_labels": true,
"summary_type": "cls_index",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 50
}
},
"torch_dtype": "float32",
"transformers_version": "4.19.2",
"use_cache": true,
"vocab_size": 50265
}
Example 2: DistilBert
DistilBertTokenizer
from transformers import DistilBertTokenizer
inspect.signature(DistilBertTokenizer)
<Signature (vocab_file, do_lower_case=True, do_basic_tokenize=True, never_split=None, unk_token='[UNK]', sep_token='[SEP]', pad_token='[PAD]', cls_token='[CLS]', mask_token='[MASK]', tokenize_chinese_chars=True, strip_accents=None, **kwargs)>
dir(DistilBertTokenizer)
['SPECIAL_TOKENS_ATTRIBUTES',
'__annotations__',
'__call__',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__len__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_add_tokens',
'_auto_class',
'_batch_encode_plus',
'_batch_prepare_for_model',
'_convert_id_to_token',
'_convert_token_to_id',
'_convert_token_to_id_with_added_voc',
'_create_or_get_repo',
'_create_trie',
'_decode',
'_encode_plus',
'_eventual_warn_about_too_long_sequence',
'_eventually_correct_t5_max_length',
'_from_pretrained',
'_get_padding_truncation_strategies',
'_get_repo_url_from_name',
'_pad',
'_push_to_hub',
'_save_pretrained',
'_set_processor_class',
'_tokenize',
'add_special_tokens',
'add_tokens',
'additional_special_tokens',
'additional_special_tokens_ids',
'all_special_ids',
'all_special_tokens',
'all_special_tokens_extended',
'as_target_tokenizer',
'batch_decode',
'batch_encode_plus',
'bos_token',
'bos_token_id',
'build_inputs_with_special_tokens',
'clean_up_tokenization',
'cls_token',
'cls_token_id',
'convert_ids_to_tokens',
'convert_tokens_to_ids',
'convert_tokens_to_string',
'create_token_type_ids_from_sequences',
'decode',
'do_lower_case',
'encode',
'encode_plus',
'eos_token',
'eos_token_id',
'from_pretrained',
'get_added_vocab',
'get_special_tokens_mask',
'get_vocab',
'is_fast',
'mask_token',
'mask_token_id',
'max_len_sentences_pair',
'max_len_single_sentence',
'max_model_input_sizes',
'model_input_names',
'num_special_tokens_to_add',
'pad',
'pad_token',
'pad_token_id',
'pad_token_type_id',
'padding_side',
'prepare_for_model',
'prepare_for_tokenization',
'prepare_seq2seq_batch',
'pretrained_init_configuration',
'pretrained_vocab_files_map',
'push_to_hub',
'register_for_auto_class',
'sanitize_special_tokens',
'save_pretrained',
'save_vocabulary',
'sep_token',
'sep_token_id',
'slow_tokenizer_class',
'special_tokens_map',
'special_tokens_map_extended',
'tokenize',
'truncate_sequences',
'truncation_side',
'unk_token',
'unk_token_id',
'vocab_files_names',
'vocab_size']
DistilBertModel
dir(DistilBertModel)
['T_destination',
'__annotations__',
'__call__',
'__class__',
'__delattr__',
'__dict__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattr__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__setstate__',
'__sizeof__',
'__str__',
'__subclasshook__',
'__weakref__',
'_apply',
'_auto_class',
'_backward_compatibility_gradient_checkpointing',
'_call_impl',
'_can_retrieve_inputs_from_name',
'_convert_head_mask_to_5d',
'_create_or_get_repo',
'_expand_inputs_for_generation',
'_from_config',
'_get_backward_hooks',
'_get_backward_pre_hooks',
'_get_decoder_start_token_id',
'_get_logits_processor',
'_get_logits_warper',
'_get_name',
'_get_repo_url_from_name',
'_get_resized_embeddings',
'_get_resized_lm_head',
'_get_stopping_criteria',
'_hook_rss_memory_post_forward',
'_hook_rss_memory_pre_forward',
'_init_weights',
'_keys_to_ignore_on_load_missing',
'_keys_to_ignore_on_load_unexpected',
'_keys_to_ignore_on_save',
'_load_from_state_dict',
'_load_pretrained_model',
'_load_pretrained_model_low_mem',
'_maybe_warn_non_full_backward_hook',
'_merge_criteria_processor_list',
'_named_members',
'_prepare_attention_mask_for_generation',
'_prepare_decoder_input_ids_for_generation',
'_prepare_encoder_decoder_kwargs_for_generation',
'_prepare_input_ids_for_generation',
'_prepare_model_inputs',
'_prune_heads',
'_push_to_hub',
'_register_load_state_dict_pre_hook',
'_register_state_dict_hook',
'_reorder_cache',
'_replicate_for_data_parallel',
'_resize_token_embeddings',
'_save_to_state_dict',
'_set_default_torch_dtype',
'_slow_forward',
'_tie_encoder_decoder_weights',
'_tie_or_clone_weights',
'_update_model_kwargs_for_generation',
'_version',
'add_memory_hooks',
'add_module',
'adjust_logits_during_generation',
'apply',
'base_model',
'base_model_prefix',
'beam_sample',
'beam_search',
'bfloat16',
'buffers',
'call_super_init',
'children',
'compute_transition_beam_scores',
'config_class',
'constrained_beam_search',
'cpu',
'create_extended_attention_mask_for_decoder',
'cuda',
'device',
'double',
'dtype',
'dummy_inputs',
'dump_patches',
'estimate_tokens',
'eval',
'extra_repr',
'float',
'floating_point_ops',
'forward',
'framework',
'from_pretrained',
'generate',
'get_buffer',
'get_extended_attention_mask',
'get_extra_state',
'get_head_mask',
'get_input_embeddings',
'get_output_embeddings',
'get_parameter',
'get_position_embeddings',
'get_submodule',
'gradient_checkpointing_disable',
'gradient_checkpointing_enable',
'greedy_search',
'group_beam_search',
'half',
'init_weights',
'invert_attention_mask',
'ipu',
'is_gradient_checkpointing',
'is_parallelizable',
'load_state_dict',
'load_tf_weights',
'main_input_name',
'modules',
'named_buffers',
'named_children',
'named_modules',
'named_parameters',
'num_parameters',
'parameters',
'post_init',
'prepare_inputs_for_generation',
'prune_heads',
'push_to_hub',
'register_backward_hook',
'register_buffer',
'register_for_auto_class',
'register_forward_hook',
'register_forward_pre_hook',
'register_full_backward_hook',
'register_full_backward_pre_hook',
'register_load_state_dict_post_hook',
'register_module',
'register_parameter',
'register_state_dict_pre_hook',
'requires_grad_',
'reset_memory_hooks_state',
'resize_position_embeddings',
'resize_token_embeddings',
'retrieve_modules_from_names',
'sample',
'save_pretrained',
'set_extra_state',
'set_input_embeddings',
'share_memory',
'state_dict',
'supports_gradient_checkpointing',
'tie_weights',
'to',
'to_empty',
'train',
'type',
'xpu',
'zero_grad']
model.config
DistilBertConfig {
"_name_or_path": "distilbert-base-cased-distilled-squad",
"activation": "gelu",
"architectures": [
"DistilBertForQuestionAnswering"
],
"attention_dropout": 0.1,
"dim": 768,
"dropout": 0.1,
"hidden_dim": 3072,
"initializer_range": 0.02,
"max_position_embeddings": 512,
"model_type": "distilbert",
"n_heads": 12,
"n_layers": 6,
"output_past": true,
"pad_token_id": 0,
"qa_dropout": 0.1,
"seq_classif_dropout": 0.2,
"sinusoidal_pos_embds": true,
"tie_weights_": true,
"transformers_version": "4.19.2",
"vocab_size": 28996
}
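As a closing sketch, tokenizing with truncation for the DistilBert checkpoint keeps the sequence within max_position_embeddings (the long input text is only an illustration):
from transformers import DistilBertModel, DistilBertTokenizer

model_name = "distilbert-base-cased-distilled-squad"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertModel.from_pretrained(model_name)

long_text = "one word " * 2000
ids = tokenizer(long_text, truncation=True)["input_ids"]

print(len(ids))                                           # 512
print(len(ids) <= model.config.max_position_embeddings)   # True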