'eos_token_id' for llama model.generate is not working #24644

### System Info

- `transformers` version: 4.30.2
- Platform: Linux-5.4.0-137-generic-x86_64-with-glibc2.31
- Python version: 3.10.0
- Huggingface_hub version: 0.15.1
- Safetensors version: 0.3.1
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>


### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction


```python
import transformers, torch

weights_dir = "weights/recovered"
question = 'Hello, there!'

model = transformers.AutoModelForCausalLM.from_pretrained(weights_dir)
model = model.cuda()
print(model.config)
# LlamaConfig {
#   "_name_or_path": "weights/recovered",
#   "architectures": [
#     "LlamaForCausalLM"
#   ],
#   "bos_token_id": 1,
#   "eos_token_id": 2,
#   "hidden_act": "silu",
#   "hidden_size": 4096,
#   "initializer_range": 0.02,
#   "intermediate_size": 11008,
#   "max_position_embeddings": 2048,
#   "model_type": "llama",
#   "num_attention_heads": 32,
#   "num_hidden_layers": 32,
#   "pad_token_id": 0,
#   "rms_norm_eps": 1e-06,
#   "tie_word_embeddings": false,
#   "torch_dtype": "float32",
#   "transformers_version": "4.30.2",
#   "use_cache": true,
#   "vocab_size": 32001
# }

tokenizer = transformers.AutoTokenizer.from_pretrained(weights_dir)
question_ids = tokenizer.encode(question + tokenizer.eos_token, return_tensors='pt')
question_ids = question_ids.cuda()

print(tokenizer.eos_token_id, tokenizer.bos_token_id, tokenizer.pad_token_id)
# 2, 1, 32000

print(question_ids)
# tensor([[    1, 15043, 29892,   727, 29991,   829, 29879, 29958]],
#        device='cuda:0')

print(tokenizer.decode(question_ids[0]))
# <s> Hello, there!</s>

outputs = model.generate(
    question_ids,
    eos_token_id=2,
    max_new_tokens=200,
    num_beams=4,
    num_return_sequences=2,
    early_stopping=True,
)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(answer)
# Hello, there!</s>
# Hello, there!</s>
# <s>Hello, there!</s>
```

No matter how I change the parameters of `model.generate`, it always ignores `</s>` (id 2) as the end-of-sequence token.
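A quick check is whether the `</s>` appended to the prompt was actually encoded as the EOS id. The minimal sketch below uses only the ids printed in the reproduction; `contains_token` is a hypothetical helper, not a `transformers` function. Id 2 never appears in the printed tensor, which suggests (an inference from the output, not a confirmed diagnosis) that the string `</s>` was split into the three ordinary ids at the end rather than mapped to the special token.

```python
# Token ids copied from the `print(question_ids)` output in the reproduction.
question_ids = [1, 15043, 29892, 727, 29991, 829, 29879, 29958]
EOS_TOKEN_ID = 2  # tokenizer.eos_token_id, as printed in the reproduction


def contains_token(ids, token_id):
    """Hypothetical helper: True if token_id occurs anywhere in ids."""
    return token_id in ids


print(contains_token(question_ids, EOS_TOKEN_ID))  # False: id 2 is absent
print(contains_token(question_ids, 1))             # True: BOS (id 1) is present
# These trailing ids appear to be what the text "</s>" was encoded to.
print(question_ids[-3:])                           # [829, 29879, 29958]
```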

In addition, the tokenizer's `skip_special_tokens` option does not seem to work either: `</s>` still appears in the decoded output.
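As far as I understand, `skip_special_tokens` filters by token id rather than by text: it drops ids registered as special tokens, so a literal `</s>` assembled from ordinary pieces would survive decoding. The toy sketch below only illustrates that distinction; the vocabulary, ids and `toy_decode` are hypothetical and are not the real LLaMA tokenizer.

```python
# Hypothetical toy vocabulary: ids 1 and 2 play the roles of <s> and </s>;
# ids 10-13 are ordinary pieces. None of these are real LLaMA ids.
VOCAB = {1: "<s>", 2: "</s>", 10: "Hello", 11: "</", 12: "s", 13: ">"}
SPECIAL_IDS = {1, 2}


def toy_decode(ids, skip_special_tokens=False):
    """Join pieces, optionally dropping ids registered as special."""
    pieces = [
        VOCAB[i]
        for i in ids
        if not (skip_special_tokens and i in SPECIAL_IDS)
    ]
    return "".join(pieces)


real_eos = [1, 10, 2]              # EOS encoded as the single special id
literal_eos = [1, 10, 11, 12, 13]  # "</s>" spelled out as three ordinary pieces

print(toy_decode(real_eos, skip_special_tokens=True))     # Hello
print(toy_decode(literal_eos, skip_special_tokens=True))  # Hello</s>
```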

What am I doing wrong? Any help is much appreciated, thanks!

### Expected behavior

`model.generate` should stop at the first occurrence of `</s>`.
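Stopping on `eos_token_id` in `generate` is a comparison on token ids, so it can only fire if the model actually emits id 2. The plain-Python sketch below uses hypothetical token streams and the hypothetical helpers `ends_with` and `generate_until_stop`. It shows that a model which spells `</s>` out as ordinary pieces never triggers an id-2 check, while treating that piece sequence as an extra stop sequence does. It does not call the `transformers` API; in the library, a custom rule like this would be passed through the `stopping_criteria` argument of `generate`. The `LITERAL_EOS` ids are an assumption taken from the prompt encoding in the reproduction; the model may tokenize generated text differently.

```python
from typing import Iterable, List, Sequence


def ends_with(ids: Sequence[int], stop: Sequence[int]) -> bool:
    """True if `ids` ends with the non-empty id sequence `stop`."""
    return len(stop) > 0 and list(ids[-len(stop):]) == list(stop)


def generate_until_stop(token_stream: Iterable[int],
                        stop_sequences: List[List[int]],
                        max_new_tokens: int) -> List[int]:
    """Consume ids from a (hypothetical) model until a stop sequence or the limit."""
    out: List[int] = []
    for token in token_stream:
        if len(out) >= max_new_tokens:
            break
        out.append(token)
        # The check compares ids, never decoded text.
        if any(ends_with(out, stop) for stop in stop_sequences):
            break
    return out


EOS = [2]                          # eos_token_id from the config above
LITERAL_EOS = [829, 29879, 29958]  # assumed ids of the text "</s>"

# Hypothetical model outputs: one emits the real EOS id, one spells "</s>" out.
emits_eos_id = [15043, 29991, 2, 15043, 29991]
emits_literal = [15043, 29991, 829, 29879, 29958, 15043, 29991]

print(generate_until_stop(emits_eos_id, [EOS], 200))   # [15043, 29991, 2]
print(generate_until_stop(emits_literal, [EOS], 200))  # whole stream, never stops
print(generate_until_stop(emits_literal, [EOS, LITERAL_EOS], 200))
# [15043, 29991, 829, 29879, 29958]
```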
