Apr 25, 2024

Guide on How to Run OpenELM – New AI Models Presented by Apple

Apple has recently introduced eight open-source language models, known as OpenELM (Open-source Efficient Language Models). What makes them special is that they run directly on the device rather than on cloud servers. In this short guide, we will show you how to run and use them.


Apple has introduced eight open-source language models known as OpenELM (Open-source Efficient Language Models). Unique for their ability to operate directly on devices rather than relying on cloud servers, these models mark a significant advancement in AI technology. This guide will show you how to set up and use these innovative Apple AI models.

Apple's Efficient Language Models

Developers now have access to these language models, which can be easily downloaded and used through the Hugging Face Hub. Notably, four of the OpenELM models are pre-trained base models built with the CoreNet library, a toolkit for training deep neural networks that Apple released alongside them.

The other four models (the Instruct variants) are instruction-tuned, capable of interpreting and responding to direct instructions. The full suite was trained on publicly available datasets and ships with a complete training and evaluation framework, including detailed training protocols, multiple checkpoints, and a range of pre-training configurations.

The OpenELM family spans four sizes (270M, 450M, 1.1B, and 3B parameters), each available as both a pre-trained and an Instruct model; the model cards on the Hugging Face Hub provide further details.

Running OpenELM via Hugging Face

Usage

To help you get started, we've provided a sample function in generate_openelm.py for generating output from OpenELM models via the Hugging Face Hub. To test the model, simply run the following command:

python generate_openelm.py --model [MODEL_NAME] --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2

You can create a Hugging Face access token from your account settings at https://huggingface.co/settings/tokens.
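
If you prefer to load a model programmatically instead of using the sample script, the snippet below is a minimal sketch of the equivalent transformers calls. It assumes the apple/OpenELM-270M checkpoint and access to the gated meta-llama/Llama-2-7b-hf tokenizer, which the OpenELM model cards reuse:

from transformers import AutoModelForCausalLM, AutoTokenizer

# OpenELM checkpoints ship custom modeling code, hence trust_remote_code=True
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
# OpenELM has no tokenizer of its own; the model cards reuse the Llama-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, repetition_penalty=1.2)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))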

Additionally, you can customize the generate function with various arguments. For instance, to speed up inference, pass the prompt_lookup_num_tokens argument to enable prompt-lookup speculative decoding, which drafts candidate tokens by matching n-grams already present in the prompt:

python generate_openelm.py --model [MODEL_NAME] --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 prompt_lookup_num_tokens=10

Alternatively, you can use assisted generation, passing a smaller draft model via the assistant_model argument as shown below:

python generate_openelm.py --model [MODEL_NAME] --hf_access_token [HF_ACCESS_TOKEN] --prompt 'Once upon a time there was' --generate_kwargs repetition_penalty=1.2 --assistant_model [SMALLER_MODEL_NAME]
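
Both options map onto standard transformers generation arguments (prompt_lookup_num_tokens and assistant_model), so the same effect is available programmatically. A hedged sketch continuing the snippet above; the 1.1B checkpoint name is assumed from the OpenELM model cards:

# a larger main checkpoint, so the 270M model from above can act as its draft model
large = AutoModelForCausalLM.from_pretrained("apple/OpenELM-1_1B", trust_remote_code=True)

# prompt-lookup decoding: draft tokens by matching n-grams already in the prompt
outputs = large.generate(**inputs, max_new_tokens=50, prompt_lookup_num_tokens=10)

# assisted generation: the small model drafts tokens, the large model verifies them;
# this works because all OpenELM sizes share the Llama-2 tokenizer
outputs = large.generate(**inputs, max_new_tokens=50, assistant_model=model)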

Setting up

Make sure to install the dependencies needed for evaluation:

# install public lm-eval-harness

harness_repo="public-lm-eval-harness"
git clone https://github.com/EleutherAI/lm-evaluation-harness ${harness_repo}
cd ${harness_repo}
# use main branch as of 2024-03-15, SHA dc90fec
git checkout dc90fec
pip install -e .
cd ..
# 66d6242 is the main branch as of 2024-04-01
pip install datasets@git+https://github.com/huggingface/datasets.git@66d6242
pip install 'tokenizers>=0.15.2' 'transformers>=4.38.2' 'sentencepiece>=0.2.0'
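
Once the installs finish, a one-line import check (not part of the original instructions, just a quick sanity test) confirms the environment resolved cleanly:

python -c "import lm_eval, datasets, transformers; print('transformers', transformers.__version__)"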


Evaluation of OpenELM

The commands below evaluate OpenELM-270M on the zero-shot and few-shot task suites, logging each run under ./lm_eval_output/:

# OpenELM-270M
hf_model=apple/OpenELM-270M

# this flag is needed because lm-eval-harness sets add_bos_token to False by default, but OpenELM uses the LLaMA tokenizer, which requires add_bos_token to be True
tokenizer=meta-llama/Llama-2-7b-hf
add_bos_token=True
batch_size=1
mkdir -p lm_eval_output
shot=0
task=arc_challenge,arc_easy,boolq,hellaswag,piqa,race,winogrande,sciq,truthfulqa_mc2
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
shot=5
task=mmlu,winogrande
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
shot=25
task=arc_challenge,crows_pairs_english
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
shot=10
task=hellaswag
lm_eval --model hf \
        --model_args pretrained=${hf_model},trust_remote_code=True,add_bos_token=${add_bos_token},tokenizer=${tokenizer} \
        --tasks ${task} \
        --device cuda:0 \
        --num_fewshot ${shot} \
        --output_path ./lm_eval_output/${hf_model//\//_}_${task//,/_}-${shot}shot \
        --batch_size ${batch_size} 2>&1 | tee ./lm_eval_output/eval-${hf_model//\//_}_${task//,/_}-${shot}shot.log
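
Each run writes a log and a results file under ./lm_eval_output/. The exact output layout differs between lm-eval versions, so the sketch below simply globs for JSON files and prints the per-task accuracy fields used by the 0.4.x harness; the file layout and metric keys are assumptions you may need to adjust:

import json
from pathlib import Path

# print per-task accuracy from whichever results JSONs the harness produced
for path in sorted(Path("lm_eval_output").rglob("*.json")):
    results = json.loads(path.read_text()).get("results", {})
    for task, metrics in results.items():
        acc = metrics.get("acc,none", metrics.get("acc_norm,none"))
        print(f"{path.name}  {task}  acc={acc}")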


Conclusion: Considerations for Using OpenELM Models

The introduction of the OpenELM models by Apple marks a significant advancement, offering the research community cutting-edge language models. These models are trained on publicly available datasets and are provided without safety guarantees, so outputs may be inaccurate, harmful, or biased. It is therefore crucial for both users and developers to conduct extensive safety testing and to implement robust filtering mechanisms suited to their specific requirements, ensuring responsible usage.
