How to Run LLMs in Google Colab for Free
If you've been wanting to experiment with large language models but are put off by the hefty price tag that often comes with them, there's good news. Grace Smith from Cheatsheet.md is here to share some money-saving tips with tech enthusiasts eager to dive into the world of large language models.
Introducing Google Colab
The first cost-saving method Grace suggests is Google Colab, a free cloud-based Jupyter notebook environment. With just a Google account, you can write and run Python code in the cloud, with no worries about your own computer's processing power. What's more, Google Colab supports GPU acceleration, making it far faster to run and fine-tune large models.
Exploring Language Model Providers
While many people are familiar with OpenAI as a provider of large language models, Grace highlights another option: Azure OpenAI. Designed specifically for enterprise users, it's a solid alternative to consider. Beyond these providers, platforms such as Hugging Face and Fireworks AI offer a wide selection of high-quality open-source models to choose from.
Running Large Models on Google Colab
Curious to know how you can leverage Google Colab to run large language models? Grace provides a step-by-step guide:
- Create a Google account if you don't already have one.
- Visit https://colab.research.google.com/ to access the Google Colab interface.
- In the "File" menu, select "New notebook" to create a new Jupyter notebook.
- Before proceeding, you need to mount Google Drive to ensure your files are saved. Without this step, any files downloaded within the notebook will be temporary and won't persist between sessions. Run the following code snippet to mount your Google Drive:
```python
from google.colab import drive

# Mount Google Drive so downloaded files persist between sessions
drive.mount('/content/drive')
```
- To take advantage of GPU acceleration, click "Runtime" in the menu and select "Change runtime type." Under "Hardware accelerator," choose "GPU." Google Colab offers free access to a T4 GPU with roughly 15 GB of memory.
- To verify you're in the GPU environment, run the following code snippet:
```python
import tensorflow as tf

# Returns the GPU device name (e.g. '/device:GPU:0'); an empty string means no GPU
tf.test.gpu_device_name()

# Shows the GPU model, driver version, and memory usage
!nvidia-smi
```
- Now you're ready to use open-source models. Hugging Face is a popular choice, offering a wide range of transformer-based models. Simply passing a model name to the `HuggingFaceEmbedding` class will handle the model download, loading, and text-embedding calculation for you.
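For instance, here is a minimal sketch assuming the llama-index integration that provides `HuggingFaceEmbedding`; the model name is an illustrative choice, not one mandated by the article:

```python
# A minimal sketch, assuming the llama-index HuggingFace integration
# (pip install llama-index-embeddings-huggingface).
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# The model name below is an illustrative assumption
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Compute a dense vector for a piece of text
vector = embed_model.get_text_embedding("Google Colab offers free GPUs.")
print(len(vector))  # dimensionality of the embedding
```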
Grace provides an example code snippet using Hugging Face and Llama 2 open-source models for intelligent search and large-scale knowledge base applications. The code showcases the process of loading pre-trained models, encoding text, retrieving similar information from a knowledge base, and generating responses.
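Grace's full listing isn't reproduced here, but the retrieval step might look like the following sketch, which uses the sentence-transformers library; the model name and toy knowledge base are illustrative assumptions:

```python
# A minimal sketch of semantic retrieval with sentence-transformers;
# the model name and knowledge base are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# A toy knowledge base; in practice, this would be your document collection
knowledge_base = [
    "Google Colab provides free GPU runtimes.",
    "Flan-T5 is an instruction-tuned text-to-text model.",
    "Stable Diffusion generates images from text prompts.",
]
corpus_embeddings = model.encode(knowledge_base, convert_to_tensor=True)

# Encode the query and retrieve the most similar entries
query_embedding = model.encode("How do I get a free GPU?", convert_to_tensor=True)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(knowledge_base[hit["corpus_id"]], round(hit["score"], 3))
```

A generator such as Llama 2 can then condition its answer on the retrieved passages, which is the pattern Grace's example follows.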
Step-by-Step Sample Code to Run an LLM with Google Colab
Creating an application with models such as Stable Diffusion (for image generation) and Google Flan-T5 XL (a large language model for text generation) calls for a step-by-step approach, especially when leveraging the power of Google Colab's free GPUs. This guide will walk you through setting up a Flask application that integrates these models, allowing you to generate images and text based on user input. The process involves coding in Python, utilizing Flask for the web framework, and deploying the app with ngrok for public accessibility.
Step 1: Setting Up Your Environment
Before diving into the code, ensure you have a Google account to access Google Colab. Google Colab is a powerful platform that allows you to write, run, and share Python code through your browser.
Step 2: Creating a Flask Application
Flask is a lightweight WSGI web application framework in Python. It's designed to make getting started quick and easy, with the ability to scale up to complex applications. Begin by setting up your Flask application structure:
```python
from flask import Flask, render_template, request
from flask_ngrok import run_with_ngrok

app = Flask(__name__)
run_with_ngrok(app)  # Start ngrok when app is run
```
Step 3: Integrating LLMs
For this project, you'll use two models: Stable Diffusion for generating images and Google Flan-T5 XL for text generation. These models require specific libraries: `transformers`, `diffusers`, and `torch`.
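In a Colab notebook, you can install them before importing anything; the `accelerate` package is an optional extra added here on my own assumption, as it often speeds up model loading:

```python
# Install the required libraries in the Colab runtime;
# accelerate is an optional, assumed extra that can speed up model loading.
!pip install transformers diffusers torch accelerate
```

With the libraries in place, load both models onto the GPU: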
```python
import torch
from diffusers import StableDiffusionPipeline
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the Flan-T5-XL model and move it to the GPU
tokenizer = T5Tokenizer.from_pretrained("google/flan-t5-xl")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-xl").to("cuda")

# Load the Stable Diffusion model in half precision to save GPU memory
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    revision="fp16",
    torch_dtype=torch.float16
).to("cuda")
```
Step 4: Designing Your Application
Create a basic HTML template for your application. This template includes a form where users can enter prompts for the models. Save it as `index.html` in a `templates` folder within your Flask project directory.
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>LLM Flask App</title>
</head>
<body>
  <form action="/submit-prompt" method="post">
    <input type="text" name="prompt-input" placeholder="Enter your prompt">
    <button type="submit">Generate</button>
  </form>
  <!-- Render results when the route passes them back into this template -->
  {% if generated_image %}
    <img src="data:image/png;base64,{{ generated_image }}" alt="Generated image">
  {% endif %}
  {% if generated_text %}
    <p>{{ generated_text }}</p>
  {% endif %}
</body>
</html>
```
Step 5: Handling Requests
Back in your Flask app, set up routes to handle requests. When a user submits a prompt, your application will use the LLMs to generate an image and text based on that prompt.
```python
from io import BytesIO
import base64

@app.route('/')
def index():
    return render_template('index.html')

@app.route('/submit-prompt', methods=['POST'])
def generate():
    prompt = request.form['prompt-input']

    # Generate an image with Stable Diffusion and base64-encode it for the template
    image = pipe(prompt=prompt).images[0]
    buffered = BytesIO()
    image.save(buffered, format="PNG")
    img_str = base64.b64encode(buffered.getvalue()).decode()

    # Generate text with Flan-T5-XL
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
    generated_output = model.generate(input_ids, max_length=512)
    generated_text = tokenizer.decode(generated_output[0], skip_special_tokens=True)

    return render_template('index.html', generated_image=img_str, generated_text=generated_text)

if __name__ == '__main__':
    app.run()  # run_with_ngrok opens the public tunnel when the app starts
```
Step 6: Deploying with Ngrok
Ngrok exposes local servers behind NATs and firewalls to the public internet over secure tunnels. After installing ngrok, authenticate it using your token:
```bash
!ngrok authtoken <YOUR_AUTHTOKEN_HERE>
```
Run your Flask app, and ngrok will provide a public URL to access it:
```bash
!python app.py
```
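Note that flask-ngrok has not been updated in some time; if `run_with_ngrok` fails against newer ngrok versions, the `pyngrok` package is a commonly used substitute. A sketch of that alternative, not the article's original setup:

```python
# Alternative tunnel setup using pyngrok instead of flask_ngrok
# (pip install pyngrok). A common substitute, not the article's original setup.
from pyngrok import ngrok

ngrok.set_auth_token("YOUR_NGROK_AUTHTOKEN")  # same token as above
public_url = ngrok.connect(5000)              # open a tunnel to Flask's port
print("Public URL:", public_url)

app.run(port=5000)
```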
Step 7: Running in Google Colab
To run this entire setup in Google Colab, you'll clone your GitHub repository containing the Flask app and execute it. Google Colab allows you to install necessary libraries, authenticate ngrok, and run your Flask application, all from a notebook.
- Clone your repository:

```bash
!git clone https://github.com/yourusername/yourrepository.git
```

- Change directory to your app:

```python
import os
os.chdir("yourrepository")
```

- Install the required Python packages:

```bash
!pip install flask flask_ngrok torch diffusers transformers
```

- Authenticate ngrok:

```bash
!ngrok authtoken YOUR_NGROK_AUTHTOKEN
```

- Execute your Flask application. This command runs the Flask app and makes it accessible via a public URL provided by ngrok:

```bash
!python app.py
```
Upon running the Flask application in Google Colab, you'll see an output with a link to your ngrok public URL. This URL leads to your Flask application, now accessible from anywhere.
Step 8: Interacting with Your Application
Navigate to the provided ngrok URL to view your Flask application. Enter a prompt into the form and submit it. The backend Flask application processes this input using the Stable Diffusion and Google Flan T5 XL models to generate an image and text. These results are then displayed on the same page, showcasing the capabilities of these large language models.
Step 9: Exploring Further
This project scratches the surface of what's possible with LLMs and cloud-based computing. Consider enhancing your application with additional features, such as:
- Customization Options: Allow users to specify parameters for the image and text generation, such as the model's creativity level or the type of image to generate.
- Handling Larger Loads: Implement queuing systems to handle multiple requests efficiently, ensuring that your application scales with user demand (a minimal sketch follows this list).
- Advanced Model Usage: Explore other models and their unique capabilities. For instance, you might integrate models specialized in specific domains like medical advice or legal analysis.
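To illustrate the queuing idea from the list above, here is a minimal sketch that serializes GPU work with Python's standard queue and threading modules; it is a simplification to convey the pattern, not a production-ready design:

```python
# A minimal sketch of serializing GPU requests through a worker queue.
# An illustration of the pattern, not a production-ready design.
import queue
import threading

task_queue = queue.Queue()

def gpu_worker():
    """Process prompts one at a time so the GPU is never oversubscribed."""
    while True:
        prompt, result_holder, done = task_queue.get()
        try:
            result_holder["image"] = pipe(prompt=prompt).images[0]
        finally:
            done.set()
            task_queue.task_done()

threading.Thread(target=gpu_worker, daemon=True).start()

def generate_image(prompt, timeout=120):
    """Called from a Flask route: enqueue the prompt and wait for the result."""
    result_holder, done = {}, threading.Event()
    task_queue.put((prompt, result_holder, done))
    if not done.wait(timeout):
        raise TimeoutError("GPU worker did not finish in time")
    return result_holder["image"]
```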
Conclusion
Building a Flask application to run large language models like Stable Diffusion and Google Flan T5 XL, and deploying it with Google Colab and ngrok, demonstrates the accessibility and power of modern AI technologies. With just a few steps, developers can create interactive applications that leverage state-of-the-art models, all within a browser. This guide not only highlights the practical application of these models but also encourages further exploration into AI's potential to transform industries and creative endeavors.
As you delve deeper into AI development, remember the importance of ethical considerations, particularly in generating content that respects copyright, privacy, and fairness. The journey into AI and machine learning is full of opportunities to create impactful, innovative applications that respect these principles.
The Power of Google Colab with Language Models
By combining Google Colab's free cloud environment with libraries like sentence-transformers and the Llama 2 open-source model, tech enthusiasts can easily delve into semantic retrieval and question-answering tasks. Not only does this approach save valuable hardware resources, but it also enables the creation of smarter and more personalized question-answering systems.
With Google Colab, developers can explore the possibilities of large language models without breaking the bank. Grace encourages readers to try it themselves and discover even more exciting applications. Feel free to leave comments and share your achievements!
Grace Smith is an independent open-source software developer and the author of SolidUI, passionate about new technologies, particularly AI and data. If you found her article informative and engaging, don't forget to show your support by liking and bookmarking it.
Remember: always stay curious and keep exploring the fascinating world of technology!