Chapter 1 - Introduction to Language Models

Exploring the exciting field of Language AI



This notebook is for Chapter 1 of the Hands-On Large Language Models book by Jay Alammar and Maarten Grootendorst.



[OPTIONAL] - Installing Packages on Google Colab

If you are viewing this notebook on Google Colab (or any other cloud provider), you need to run the following code block to install the dependencies for this chapter:


💡 NOTE: We will want to use a GPU to run the examples in this notebook. In Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4.
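Before installing anything, it can help to confirm that a GPU is actually available. A minimal check, assuming PyTorch is already installed (it is by default on Colab):

# Check that a CUDA-capable GPU is visible to PyTorch
import torch
print(torch.cuda.is_available())          # True if a GPU is available
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the assigned GPU, e.g. Tesla T4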


In [1]:
%%capture
!pip install "transformers>=4.40.1" "accelerate>=0.27.2"
In [2]:
import os
# Optional: route Hugging Face Hub downloads through a mirror if huggingface.co is slow or unreachable
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

Phi-3

The first step is to load our model onto the GPU for faster inference. Note that we load the model and tokenizer separately (although that isn't always necessary).

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",          # place the model on the GPU
    torch_dtype="auto",         # use the dtype the weights were saved in
    trust_remote_code=False,    # only run code that ships with transformers itself
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
Loading checkpoint shards: 100%|██████████| 2/2 [00:30<00:00, 15.40s/it]
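With the model and tokenizer loaded, we could already use them directly. A rough sketch of what that looks like (assuming the tokenizer ships a chat template, as Phi-3's does):

# Apply the chat template, generate, and decode by hand
messages = [{"role": "user", "content": "Create a funny joke about chickens."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
generated = model.generate(input_ids, max_new_tokens=500, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(generated[0][input_ids.shape[-1]:], skip_special_tokens=True))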

Although we can now use the model and tokenizer directly, it's much easier to wrap them in a pipeline object:

In [4]:
from transformers import pipeline

# Create a pipeline
generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,   # return only the newly generated text, not the prompt
    max_new_tokens=500,       # maximum number of tokens to generate
    do_sample=False           # greedy decoding: always pick the most likely next token
)
Device set to use cuda
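As an aside, this is also why loading the model and tokenizer separately isn't strictly necessary: the pipeline can load both from the model name in one step. A minimal sketch of that shortcut, using the same model id and settings as above:

# One-step alternative: let the pipeline load the model and tokenizer itself
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)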

Finally, we create our prompt as a user message and give it to the model:

In [5]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about chickens."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])
 Why did the chicken join the band? Because it had the drumsticks!
In [6]:
# The prompt (user input / query)
messages = [
    {"role": "user", "content": "Create a funny joke about little fat pigs."}
]

# Generate output
output = generator(messages)
print(output[0]["generated_text"])
 Why did the little fat pig go to the doctor? Because he needed to get his "bacon" checked!
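Because do_sample=False uses greedy decoding, rerunning the same prompt returns the same joke every time. To get varied outputs you can enable sampling at call time; the temperature and top_p values below are illustrative rather than tuned:

# Enable sampling for more varied output (values are illustrative)
output = generator(
    messages,
    do_sample=True,     # sample from the token distribution instead of taking the argmax
    temperature=0.8,    # higher values flatten the distribution (more randomness)
    top_p=0.95,         # nucleus sampling: keep the smallest token set covering 95% probability
)
print(output[0]["generated_text"])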