System Info
- transformers version: 4.30.2
- Platform: Linux-5.4.0-137-generic-x86_64-with-glibc2.31
- Python version: 3.10.0
- Huggingface_hub version: 0.15.1
- Safetensors version: 0.3.1
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
import transformers, torch
weights_dir = "weights/recovered"
question = 'Hello, there!'
model = transformers.AutoModelForCausalLM.from_pretrained(weights_dir)
model = model.cuda()
print(model.config)
# LlamaConfig {
# "_name_or_path": "weights/recovered",
# "architectures": [
# "LlamaForCausalLM"
# ],
# "bos_token_id": 1,
# "eos_token_id": 2,
# "hidden_act": "silu",
# "hidden_size": 4096,
# "initializer_range": 0.02,
# "intermediate_size": 11008,
# "max_position_embeddings": 2048,
# "model_type": "llama",
# "num_attention_heads": 32,
# "num_hidden_layers": 32,
# "pad_token_id": 0,
# "rms_norm_eps": 1e-06,
# "tie_word_embeddings": false,
# "torch_dtype": "float32",
# "transformers_version": "4.30.2",
# "use_cache": true,
# "vocab_size": 32001
# }
tokenizer = transformers.AutoTokenizer.from_pretrained(weights_dir)
question_ids = tokenizer.encode(question + tokenizer.eos_token, return_tensors='pt')
question_ids = question_ids.cuda()
print(tokenizer.eos_token_id, tokenizer.bos_token_id, tokenizer.pad_token_id)
# 2 1 32000
print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,   829, 29879, 29958]], device='cuda:0')
print(tokenizer.decode(question_ids[0]))
# <s> Hello, there!</s>
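
Note that in the ids printed above, the trailing </s> appears to be encoded as three ids (829, 29879, 29958) rather than the single eos id 2. A minimal check of how the eos string tokenizes on its own (same tokenizer as above; eos_ids is just an illustrative name):

eos_ids = tokenizer.encode(tokenizer.eos_token, add_special_tokens=False)
print(eos_ids)
# If this prints [829, 29879, 29958] instead of [2], the "</s>" string is
# being split into literal pieces rather than mapped to the eos token.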
outputs = model.generate(
    question_ids,
    eos_token_id=2,
    max_new_tokens=200,
    num_beams=4,
    num_return_sequences=2,
    early_stopping=True
)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
# Hello, there!</s>
# Hello, there!</s>
# <s>Hello, there!</s>
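
Incidentally, since num_return_sequences=2, outputs holds two sequences, and only outputs[0] is decoded above. A small sketch that decodes every returned beam:

# Decode all returned sequences, not just outputs[0].
for text in tokenizer.batch_decode(outputs, skip_special_tokens=True):
    print(text)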
No matter how I change the parameters of model.generate, it always ignores </s> as the ending token (id: 2). In addition, skip_special_tokens in tokenizer.decode is not working either. Where am I going wrong? Please help, many thanks!
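
For reference, a variant of the call above that appends the eos id as a token id instead of concatenating the "</s>" string before encoding (a sketch, assuming the model and tokenizer loaded above; question_ids2 and outputs2 are illustrative names):

# Encode the question alone, then append eos_token_id (2) directly,
# instead of tokenizing the literal "</s>" string.
plain_ids = tokenizer.encode(question, return_tensors='pt')
eos = torch.tensor([[tokenizer.eos_token_id]], dtype=plain_ids.dtype)
question_ids2 = torch.cat([plain_ids, eos], dim=-1).cuda()

outputs2 = model.generate(
    question_ids2,
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=200,
)
print(tokenizer.decode(outputs2[0], skip_special_tokens=True))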
Expected behavior
model.generate should stop the first time </s> (id 2) is generated.