Keywords AI
Introducing Lab & Testsets: Efficient LLM evaluation tools
We've launched Lab & Testsets! 🔬
📡 Lab: A spreadsheet-style editor for running prompts and models across multiple test cases. Import testsets to easily test, evaluate and optimize your LLM outputs.
📊 Testsets: Easily manage and organize test cases. Import a CSV file and edit it like a Google Sheet.
Customize Retries on Keywords AI
Retries feature: When an LLM call fails, our system detects the error and retries the request to prevent failover.
Customize your Retries settings!
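To make the idea concrete, here is a minimal sketch of retry-with-backoff logic in plain Python. It is not the Keywords AI implementation or API: the function name `call_with_retries` and its parameters (`max_retries`, `initial_delay`, `backoff`) are hypothetical and only illustrate the kind of settings a retry policy usually exposes.

```python
import time


def call_with_retries(call, max_retries=3, initial_delay=0.5, backoff=2.0, sleep=time.sleep):
    """Run `call()` and retry on any exception, waiting longer after each failure.

    All names here are illustrative assumptions, not Keywords AI settings.
    """
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            # Out of retries: surface the last error to the caller.
            if attempt == max_retries:
                raise
            sleep(delay)
            delay *= backoff  # exponential backoff between attempts
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting.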
Manage your usage and rate limits
We've added a new Limits page where you can view your current usage and rate limits.
You can also customize your usage limits for every deployment.
OpenAI o1 family now available on Keywords AI
The o1 family of models is designed to spend more time thinking before responding. These models can reason through complex tasks and solve harder problems than previous models in science, coding, and math.
BAML integration
We've partnered with Boundary (YC W23)! Now you can build your LLM app using BAML and monitor it with Keywords AI (YC W24). Simply create a Keywords AI client to get started.
BAML is a templating language for writing typed LLM functions, treating prompts as functions. Learn how to integrate: BAML documentation.
User intent classification
We participated in Mintlify's hackathon this weekend and launched an open-source project for user intent classification. This tool can be used to identify user intents in chatbots or other conversational AI projects.
For more details, please visit our GitHub repository.
Caches UI
We've added Caches to our frontend, allowing you to see cache hit counts and the time and cost savings from using the cache.
For more information on Caches, please refer to our documentation.
LLM monitoring -> Multimodal LLM monitoring
We're excited to announce that we've expanded to multimodal monitoring! Our unified Model API now supports over 200 LLMs, embedding models, and audio models — all monitored on a single platform.
Learn how to monitor your multimodal models by checking out our documentation.
PostHog integration
You can bring LLM metrics from Keywords AI (YC W24) to your PostHog dashboard easily! Check out the tutorial here!
New LLM usage page
We're introducing the new LLM Usage page! This page shows an overview of your LLM usage, including request numbers, LLM costs, and evaluation costs. It also breaks down your usage by month, helping you track patterns and improve AI performance.
LLM request caching
Supercharge your LLM calls with caching. Our new Caches feature allows you to store and reuse LLM responses, eliminating redundant API calls.
This smart caching system optimizes your AI performance by delivering instant responses, reducing costs, and ensuring consistent, high-quality outputs.
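As a rough illustration of how response caching avoids redundant calls, the sketch below keys an in-memory cache on the model name and messages. It is an assumption-laden example, not the Keywords AI Caches feature: the `ResponseCache` class, its `get_or_call` method, and the `send` callable are all hypothetical.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the request (illustrative only)."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def _key(self, model, messages):
        # Serialize deterministically so identical requests map to the same key.
        payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, messages, send):
        key = self._key(model, messages)
        if key in self._store:
            self.hits += 1  # served from cache, no LLM call made
            return self._store[key]
        response = send(model, messages)  # `send` stands in for a real LLM call
        self._store[key] = response
        return response
```

A production cache would also need expiry and size limits, which are left out here for brevity.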
Mistral Large 2 availability
We've integrated Mistral Large 2 into Keywords AI. Try it out in our model playground.
Llama 3.1 family availability
We've integrated the Llama 3.1 family into Keywords AI. Try it out in our model playground.
Introducing Threads
We pushed our latest update to improve the observability of chatlogs. Now, you can group logs by thread ID to have a better understanding of the conversation flow.
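Grouping logs by thread ID can be pictured with a short sketch like the one below. The log shape is an assumption made for illustration: the `thread_id` and `timestamp` field names are hypothetical and may not match the fields Keywords AI actually stores.

```python
from collections import defaultdict


def group_by_thread(logs):
    """Group log entries by thread ID and order each thread chronologically.

    Assumes each log is a dict with hypothetical `thread_id` and `timestamp` keys.
    """
    threads = defaultdict(list)
    for log in logs:
        threads[log["thread_id"]].append(log)
    for entries in threads.values():
        entries.sort(key=lambda entry: entry["timestamp"])
    return dict(threads)
```

Reading each group in timestamp order reconstructs the flow of a single conversation.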
GPT-4o mini availability
We've integrated GPT-4o mini into Keywords AI. Try it out in our model playground.
Minute-level dashboard graphs
We've put a lot of effort into improving the performance of our dashboard. Now, you can see your LLM usage and performance at minute and hour levels. The dashboard also loads 2x faster than before.
Introducing fallbacks
You can now specify fallback models for your LLM deployments on our platform. If the primary model fails to respond, your fallback models will be used instead. This feature is especially useful for critical deployments where you can't afford any downtime.
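The fallback behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the platform's implementation: `complete_with_fallbacks` and the `send` callable are hypothetical names, and the model identifiers are whatever strings the caller passes in.

```python
def complete_with_fallbacks(prompt, primary, fallbacks, send):
    """Try the primary model first, then each fallback in order.

    `send(model, prompt)` stands in for a real LLM call and is an assumption.
    Returns a (model, response) pair for the first model that succeeds.
    """
    errors = {}
    for model in [primary, *fallbacks]:
        try:
            return model, send(model, prompt)
        except Exception as exc:
            errors[model] = exc  # remember why this model was skipped
    raise RuntimeError(f"all models failed: {list(errors)}")
```

Returning the model name alongside the response makes it easy to log which model actually served the request.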
Model load balancing
We're thrilled to introduce our latest feature: model load balancing.
There are 2 ways to load balance your LLM requests. First, you can specify weights for the models you want to load balance in the code, ensuring that requests are distributed based on your desired percentages.
The second way is to easily add your credentials and set the weight for each one to distribute requests efficiently between deployments.
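Weight-based distribution, as described above, amounts to weighted random selection. The sketch below shows the general technique using Python's standard library. The `pick_model` function and the shape of the `weights` mapping are illustrative assumptions and do not reflect the actual Keywords AI request parameters.

```python
import random


def pick_model(weights, rng=random):
    """Pick one model name at random, in proportion to its weight.

    `weights` maps hypothetical model names to non-negative numbers,
    at least one of which must be greater than zero.
    """
    models = list(weights)
    return rng.choices(models, weights=[weights[m] for m in models], k=1)[0]
```

With weights such as `{"model-a": 70, "model-b": 30}`, roughly 70% of requests would go to `model-a` over a large number of calls. Any single call may pick either model.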
Introducing Datasets
Today, we are introducing a new feature called Datasets. With Datasets, you can effortlessly save and export log data for various purposes, such as fine-tuning, synthetic data generation, and evaluation. Simply click the 'Create Dataset' button on the Logs page and adjust the filters as needed.