LangSmith: The Best Way to Test LLMs and AI Applications
If you're in the world of Large Language Models (LLMs), you've probably heard of LangSmith. But do you know how it can transform your LLM applications from good to great? This article is your one-stop guide to understanding LangSmith, a platform that offers a plethora of features for debugging, testing, evaluating, and monitoring LLM applications.
Whether you're a seasoned developer or a beginner in the field of LLMs, LangSmith has something for everyone. From its seamless integration with LangChain to its robust Cookbook filled with real-world examples, LangSmith is a game-changer. Let's dive in!
What is LangSmith?
LangSmith is a cutting-edge platform designed to elevate your LLM applications to production-grade quality. But what does that mean? In simple terms, LangSmith is your toolkit for building, testing, and deploying intelligent agents and chains based on any LLM framework. It's developed by LangChain, the same company behind the open-source LangChain framework, and integrates seamlessly with it.
Key Features of LangSmith
- Debugging and Testing: LangSmith isn't just about building; it's about building right. The platform offers interactive tutorials and a quick start guide to get you up and running. Whether you're coding in Python or TypeScript, or calling the REST API from another language, LangSmith has you covered.
- API and Environment Setup: Before you start building, you'll need to set up your environment. LangSmith makes this easy with API key access and straightforward configuration steps. For instance, you can install the latest version of LangChain for your target environment with a single command such as pip install -U langchain (a minimal install example follows this list).
- Tracing Capabilities: One of the standout features of LangSmith is its ability to trace your code, which is crucial for debugging and improving your applications. You can customize run names, trace nested calls, and much more.
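To make the install step above concrete, here is a minimal setup, assuming you work in Python and also want the standalone LangSmith SDK (which provides the Client and the @traceable decorator used later in this article):
pip install -U langchain langsmith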
Why Choose LangSmith?
- Ease of Use: LangSmith is designed with user-friendliness in mind. The platform offers a range of tutorials and documentation to help you get started.
- Versatility: Whether you're working on a small project or a large-scale application, LangSmith is versatile enough to meet your needs.
- Community Support: LangSmith has a strong community of developers and experts who are always ready to help. You can join the community forums or even contribute to the Cookbook with your own examples.
By now, you should have a good understanding of what LangSmith is and why it's a valuable asset for anyone working with LLMs. In the next section, we'll delve deeper into how to set up LangSmith and make the most of its features.
Setting Up LangSmith
Setting up LangSmith is a breeze, thanks to its user-friendly interface and well-documented steps. But before you dive in, you'll need an API key for access. Don't worry; getting one is as easy as pie.
Steps to Get Your API Key
- Create a LangSmith Account: Head over to the LangSmith website and sign up for an account. You can use various supported login methods.
- Navigate to Settings: Once your account is set up, go to the settings page. Here, you'll find the option to create an API key.
- Generate API Key: Click the 'Generate API Key' button, and voila! You have your API key.
Configuring Your Environment
After obtaining your API key, the next step is to configure your runtime environment. LangSmith allows you to do this using simple shell commands. Here's how:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
export LANGCHAIN_API_KEY=<your-api-key>
Replace <your-api-key> with the API key you generated earlier. These commands set the environment variables that tell LangChain to send traces to LangSmith. Once they're set, a few lines of Python are enough to confirm the connection, as sketched below.
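If you prefer to configure everything from Python rather than the shell, you can set the same variables with os.environ and use the LangSmith SDK's Client as a quick connectivity check. This is a minimal sketch; list_projects is used here only to verify that the API key works, and the exact calls you make afterwards will depend on your workflow.
import os
from langsmith import Client

# Same configuration as the shell exports above
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"  # replace with your key

# The client reads LANGCHAIN_API_KEY from the environment
client = Client()

# Sanity check: list the projects visible to this API key
for project in client.list_projects():
    print(project.name)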
LangSmith Cookbook: Real-World LangSmith Examples
The LangSmith Cookbook is not just a compilation of code snippets; it's a goldmine of hands-on examples designed to inspire and assist you in your projects. Whether you're a beginner or an expert in the field of Large Language Models (LLMs), the Cookbook offers a wealth of practical insights into common patterns and real-world use-cases. So, let's dig deeper into what the LangSmith Cookbook has to offer.
What is the LangSmith Cookbook?
The LangSmith Cookbook is a repository that serves as your practical guide to mastering LangSmith. It goes beyond the basics covered in standard documentation, diving into common patterns and real-world scenarios. These recipes empower you to debug, evaluate, test, and continuously improve your LLM applications.
Your Input Matters
The Cookbook is a community-driven resource. If you have insights to share or feel that a specific use-case has been missed, you're encouraged to raise a GitHub issue or contact the LangChain development team. Your expertise shapes this community, making the Cookbook a dynamic and ever-evolving resource.
Key Examples from the Cookbook
Tracing Your Code
- Tracing without LangChain: Learn how to trace applications independently of LangChain using the Python SDK's @traceable decorator (a minimal sketch follows this list).
- REST API: Get acquainted with the REST API features for logging LLM and chat model runs, and understand nested runs.
- Customizing Run Names: Improve UI clarity by assigning specific names to LangSmith chain runs, with examples for chains, lambda functions, and agents.
- Tracing Nested Calls within Tools: Learn how to include all nested tool subcalls in a single trace.
- Display Trace Links: Speed up your development by adding trace links to your application, so you can quickly see its execution flow, add feedback to a run, or add the run to a dataset.
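To make these tracing recipes concrete, here is a minimal sketch of the @traceable decorator from the LangSmith Python SDK. The function names and run names are hypothetical; nesting one decorated function inside another is what produces nested calls within a single trace, and the name argument is one way to customize how a run appears in the UI. It assumes the environment variables from the setup section are already configured.
from langsmith import traceable

@traceable(run_type="tool", name="Fetch Documents")  # custom run name shown in the UI
def fetch_documents(query: str) -> list:
    # Hypothetical retrieval step; replace with your own logic
    return ["Document about " + query]

@traceable(run_type="chain", name="Answer Question")
def answer_question(query: str) -> str:
    docs = fetch_documents(query)  # nested call, captured within the same trace
    return f"Based on {len(docs)} document(s), here is an answer to: {query}"

print(answer_question("What does LangSmith do?"))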
LangChain Hub
- RetrievalQA Chain: Use prompts from the Hub in an example RAG pipeline.
- Prompt Versioning: Ensure deployment stability by selecting specific prompt versions (see the pinning sketch after this list).
- Runnable PromptTemplate: Save prompts to the Hub from the playground and integrate them into runnable chains.
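As a small illustration of pulling and pinning Hub prompts, the sketch below uses the hub module from LangChain. The prompt handle and the commit hash are placeholders, and it assumes the langchainhub package is installed alongside langchain.
from langchain import hub

# Pull the latest version of a prompt (the handle is a placeholder)
prompt = hub.pull("my-handle/my-rag-prompt")

# Pin a specific version for deployment stability (the hash is a placeholder)
pinned_prompt = hub.pull("my-handle/my-rag-prompt:0a1b2c3d")

# The result is a prompt template that can be composed into a runnable chain
print(type(pinned_prompt))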
Testing & Evaluation
- Q&A System Correctness: Evaluate your retrieval-augmented Q&A pipeline end-to-end on a dataset (a minimal run_on_dataset sketch follows this list).
- Evaluating Q&A Systems with Dynamic Data: Use evaluators that dereference labels to handle data that changes over time.
- RAG Evaluation using Fixed Sources: Evaluate the response component of a RAG pipeline by providing retrieved documents in the dataset.
- Comparison Evals: Use labeled preference scoring to contrast system versions and determine the best outputs.
- LangSmith in Pytest: Benchmark your chain in pytest and assert that aggregate metrics meet the quality bar.
- Unit Testing with Pytest: Write individual unit tests and log assertions as feedback.
- Evaluating Existing Runs: Add AI-assisted feedback and evaluation metrics to existing run traces.
- Naming Test Projects: Manually name your test projects with run_on_dataset(..., project_name='my-project-name').
- How to Download Feedback and Examples: Export predictions, evaluation results, and other information to add to your reports programmatically.
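The sketch below shows what a minimal dataset evaluation with run_on_dataset can look like, using the built-in "qa" evaluator from RunEvalConfig. The dataset name, project name, and chain factory are placeholders; it assumes a dataset already exists in your LangSmith account and that an OpenAI key (or another model) is configured.
from langchain.chat_models import ChatOpenAI
from langchain.smith import RunEvalConfig, run_on_dataset
from langsmith import Client

client = Client()

# Placeholder factory: returns a fresh model/chain for each example
def create_chain():
    return ChatOpenAI(temperature=0)

eval_config = RunEvalConfig(evaluators=["qa"])  # built-in correctness evaluator

run_on_dataset(
    client=client,
    dataset_name="my-qa-dataset",        # placeholder dataset name
    llm_or_chain_factory=create_chain,
    evaluation=eval_config,
    project_name="my-project-name",      # names the test project, as in the recipe above
)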
TypeScript / JavaScript Testing Examples
- Evaluating JS Chains in Python: Evaluate JS chains using custom Python evaluators.
- Logging Assertions as Feedback: Convert CI test assertions into LangSmith feedback.
Using Feedback
- Streamlit Chat App: A minimal chat app that captures user feedback and shares traces of the chat application (a minimal feedback-logging sketch follows this list).
- Next.js Chat App: The same chat app, built with Next.js.
- Real-time Automated Feedback: Generate feedback metrics for every run using an async callback.
- Real-time RAG Chat Bot Evaluation: Automatically check your RAG chat bot's responses for hallucinations against the retrieved documents.
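To illustrate how user feedback such as a thumbs-up or thumbs-down typically reaches LangSmith from apps like these, here is a minimal sketch using the SDK's create_feedback method. The run ID and feedback key are placeholders; in a real app you would capture the run ID from the traced chat turn.
from langsmith import Client

client = Client()

# Placeholder: in practice this comes from the traced run of the chat turn
run_id = "00000000-0000-0000-0000-000000000000"

# Attach a user score and an optional comment to that run
client.create_feedback(
    run_id,
    key="user_score",   # placeholder feedback key
    score=1.0,          # e.g. 1.0 for thumbs-up, 0.0 for thumbs-down
    comment="Helpful answer",
)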
Exporting Data for Fine-tuning
- OpenAI Fine-Tuning: List LLM runs and convert them to OpenAI's fine-tuning format (a rough export sketch follows this list).
- Lilac Dataset Curation: Further curate your LangSmith datasets using Lilac to detect near-duplicates and check for PII.
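As a rough sketch of that export flow, the snippet below lists LLM runs from a project and writes them to a JSONL file in a chat-style fine-tuning format. The project name is a placeholder, and the exact shape of run.inputs and run.outputs depends on how your runs were logged, so treat the field access as an assumption to adapt.
import json
from langsmith import Client

client = Client()

# Placeholder project; only LLM runs are relevant for fine-tuning
runs = client.list_runs(project_name="my-chat-project", run_type="llm")

with open("fine_tune.jsonl", "w") as f:
    for run in runs:
        if not run.outputs:
            continue
        # Assumed structure: adapt these fields to how your runs store prompts and completions
        record = {
            "messages": [
                {"role": "user", "content": str(run.inputs)},
                {"role": "assistant", "content": str(run.outputs)},
            ]
        }
        f.write(json.dumps(record) + "\n")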
Exploratory Data Analysis
- Exporting LLM Runs and Feedback: Extract and interpret LangSmith LLM run data for various analytical platforms (a short DataFrame export sketch follows this list).
- Lilac: Enrich datasets using Lilac, an open-source analytics tool, to better label and organize your data.
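For quick exploratory analysis, runs can also be pulled into a pandas DataFrame. This is a minimal sketch with a placeholder project name; the run fields used here (id, name, run_type, start_time, end_time, error) are standard on LangSmith run objects, but you would typically add whichever inputs, outputs, or feedback columns your analysis needs.
import pandas as pd
from langsmith import Client

client = Client()

rows = []
for run in client.list_runs(project_name="my-chat-project"):
    latency = None
    if run.start_time and run.end_time:
        latency = (run.end_time - run.start_time).total_seconds()
    rows.append(
        {
            "id": str(run.id),
            "name": run.name,
            "run_type": run.run_type,
            "latency_s": latency,
            "error": run.error,
        }
    )

df = pd.DataFrame(rows)
print(df.describe(include="all"))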
By exploring these examples, you'll gain a comprehensive understanding of LangSmith's capabilities, empowering you to take your LLM applications to the next level. So why wait? Dive into the LangSmith Cookbook and start cooking up some code magic!
Conclusion
LangSmith is not just another tool; it's a comprehensive platform that can take your LLM applications to the next level. From its robust tracing capabilities to its seamless integration with the LangChain Hub, LangSmith offers a range of features designed to make your life easier. And let's not forget the LangSmith Cookbook, a treasure trove of real-world examples and hands-on code snippets. Whether you're just starting out or looking to optimize your existing applications, LangSmith has got you covered.
FAQs
What does LangSmith do?
LangSmith is a platform designed to help you build, test, evaluate, and monitor LLM applications. It offers a range of features including tracing, API access, and a Cookbook filled with real-world examples.
What is the difference between LangSmith and LangChain?
While LangSmith is focused on debugging, testing, evaluating, and monitoring LLM applications, LangChain is an open-source framework for building applications powered by language models. LangSmith integrates seamlessly with LangChain, offering a unified platform for all your LLM needs.
How do I get access to LangSmith?
To get access to LangSmith, you'll need to sign up for an account on their website. Once registered, you can generate an API key that will allow you to interact with the platform.