torchvision model zoo's image normalization is:

```
mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
```

CLIP's is:

```
mean=[0.48145466, 0.4578275, 0.40821073], std=[0.26862954, 0.26130258, 0.27577711]
```

What's the story behind the difference? Were CLIP's normalization parameters recalculated on WebImageText?
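
For context, a minimal sketch of how each set of constants is typically plugged into torchvision's `Normalize` transform (standard usage, not taken from either codebase):

```
import torch
from torchvision import transforms

# ImageNet statistics used across the torchvision model zoo.
imagenet_norm = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225],
)

# Constants shipped with CLIP's preprocessing pipeline.
clip_norm = transforms.Normalize(
    mean=[0.48145466, 0.4578275, 0.40821073],
    std=[0.26862954, 0.26130258, 0.27577711],
)

x = torch.rand(3, 224, 224)  # dummy RGB image in [0, 1]
# The same input lands on slightly different per-channel statistics
# under each scheme, so the constants are not interchangeable.
print(imagenet_norm(x).mean(dim=(1, 2)))
print(clip_norm(x).mean(dim=(1, 2)))
```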