We provide multiple examples of using LightSeq to accelerate Hugging Face model training.
Before training, switch to the BERT directory:

```shell
cd examples/training/huggingface/bert
```

First, install the requirements:

```shell
pip install torch ninja transformers seqeval datasets
```

Then you can easily fine-tune BERT on different tasks by running the bash scripts task_ner/run_ner.sh, task_glue/run_glue.sh, task_qa/run_qa.sh, etc.
You can also fine-tune the models using int8 mixed-precision by running task_ner/run_quant_ner.sh.
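To give a feel for what int8 mixed-precision means, here is a minimal sketch of symmetric int8 fake quantization. This illustrates the general idea (quantize to int8 within a clip range, compute, dequantize), not LightSeq's actual fused kernels; the function names and the clip-max value are illustrative assumptions.

```python
import numpy as np

def quantize_int8(x, clip_max):
    # Symmetric quantization: map [-clip_max, clip_max] onto [-127, 127].
    # (Illustrative sketch, not LightSeq's implementation.)
    scale = 127.0 / clip_max
    q = np.clip(np.round(x * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float values.
    return q.astype(np.float32) / scale

x = np.array([-1.5, -0.3, 0.0, 0.7, 2.0], dtype=np.float32)
q, scale = quantize_int8(x, clip_max=2.0)
x_hat = dequantize(q, scale)
# Round-off error stays small relative to the clip range.
assert np.max(np.abs(x - x_hat)) <= 2.0 / 127
```

Values outside the clip range saturate at ±127, which is why choosing a good clip-max matters for quantized training.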
Before training, switch to the GPT2 directory:

```shell
cd examples/training/huggingface/gpt
```

First, install the requirements:

```shell
pip install -r requirements.txt
```

Then you can easily fine-tune GPT2 by running the bash script run_clm.sh.
You can also fine-tune the models using int8 mixed-precision by running run_quant_clm.sh.
Before training, switch to the BART summarization directory:

```shell
cd examples/training/huggingface/bart/summarization
```

First, install the requirements:

```shell
pip install -r requirements.txt
```

Then you can easily fine-tune BART by running the bash script run_summarization.sh.
Before training, switch to the ViT directory:

```shell
cd examples/training/huggingface/vit
```

First, install the requirements:

```shell
pip install torch ninja transformers seqeval datasets
```

Then you can easily fine-tune ViT by running the bash script run_vit.sh.
You can also fine-tune the models using int8 mixed-precision by running run_quant_vit.sh.
LightSeq supports Hugging Face training using GCQ. Taking BERT as an example, first switch to the BERT directory, then you can easily fine-tune BERT with GCQ on different tasks by running the bash scripts task_ner/run_gcq_ner.sh, task_glue/run_gcq_glue.sh, task_qa/run_gcq_qa.sh, etc.
You can use --enable_GCQ to enable GCQ in your multi-machine distributed training.
You can set --GCQ_quantile to a float value between 0.0 and 1.0, which will use the quantile of gradient bucket as clip-max value when quantizing gradients. E.g., when setting --GCQ_quantile 0.99, the clip-max value is equal to the 0.99-th quantile of gradient bucket.
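The clip-max selection described above can be sketched in a few lines. This is an illustrative reading of the flag, assuming the quantile is taken over the absolute gradient values in a bucket (the function name `gcq_clip_max` is hypothetical, not a LightSeq API):

```python
import numpy as np

def gcq_clip_max(grad_bucket, quantile=0.99):
    # Hypothetical sketch of --GCQ_quantile: use the q-th quantile of the
    # gradient bucket (here, of absolute values) as the clip-max value.
    return np.quantile(np.abs(grad_bucket), quantile)

rng = np.random.default_rng(0)
grads = rng.normal(size=10_000).astype(np.float32)

clip_max = gcq_clip_max(grads, quantile=0.99)
clipped = np.clip(grads, -clip_max, clip_max)
# After clipping, every gradient lies within [-clip_max, clip_max],
# so quantization wastes no range on rare outliers.
assert np.all(np.abs(clipped) <= clip_max + 1e-6)
```

A quantile below 1.0 deliberately saturates the largest ~1% of gradients in exchange for finer quantization resolution on the rest.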
You can use multiple NICs in NCCL communication. E.g., if every machine has 4 NICs (eth0, eth1, eth2, eth3), you can use the following command:

```shell
export NCCL_SOCKET_IFNAME=eth0,eth1,eth2,eth3
```
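The same NCCL setting can also be applied from a Python launcher, as long as the environment variable is set before the process group is initialized (the NIC names are the example values from above):

```python
import os

# Must be set before torch.distributed initializes its NCCL backend.
os.environ["NCCL_SOCKET_IFNAME"] = "eth0,eth1,eth2,eth3"

# NCCL parses the value as a comma-separated interface list.
assert os.environ["NCCL_SOCKET_IFNAME"].split(",") == [
    "eth0", "eth1", "eth2", "eth3"
]
```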