Saturday, March 2, 2024

StabilityAI Releases Stable Cascade

StabilityAI has introduced Stable Cascade, a cutting-edge text-to-image model aimed at enthusiasts and developers alike. It is released under a non-commercial license, which leaves the door open for countless non-commercial applications and learning opportunities.

Image made by X user @cocktailpeanut with Stable Cascade

The model uses a three-stage approach that makes it relatively easy to train and fine-tune, even on standard consumer hardware. Its creators' hierarchical compression technique lets it generate high-quality images from a highly compressed latent space. This is a powerful and efficient way to generate images that could potentially transform the industry.

On top of that, Stable Cascade integrates with the diffusers library, so users can run inference with it easily. In a move toward transparency and collaboration, StabilityAI has also made the model's training and inference code publicly available on its GitHub page.

Features of Stable Cascade

What sets Stable Cascade apart is its architecture, which consists of three distinct stages (A, B, and C) that work together to produce the final image. This departure from the Stable Diffusion models showcases StabilityAI's commitment to innovation within the AI space.

Adding to its impressive capabilities, the model offers additional features such as image variations and image-to-image generation. These features not only enhance the creative possibilities but also demonstrate the flexibility of the model to cater to a wide range of artistic and practical applications.

The comprehensive release of Stable Cascade does not stop at the model itself. It includes all the necessary code for training and fine-tuning, accompanied by tools like ControlNet and LoRA, which aim to lower the barriers to further experimentation and refinement of this already remarkable architecture.

As StabilityAI unveils Stable Cascade to the world, the potential for creativity and innovation in the realm of text-to-image models takes a monumental leap forward, promising to unlock new possibilities for creators and developers alike.

Stable Cascade’s Unique Architecture

Stable Cascade is a new text-to-image model released by Stability AI. It is built on a three-stage architecture (Stages A, B, and C) that compresses images hierarchically, achieving remarkable outputs while working in a highly compressed latent space. The model is exceptionally easy to train and fine-tune on consumer hardware, and it is released under a license that permits non-commercial use only.

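The three stages of the Stable Cascade architecture are listed below. Generation runs from Stage C down to Stage A:

  • Stage C: A text-conditioned diffusion model that generates a highly compressed latent (24x24 for a 1024x1024 image) from the prompt.
  • Stage B: A diffusion model that decompresses the Stage C latent into a larger, more detailed latent.
  • Stage A: A VQGAN decoder that turns that latent into the final, full-resolution image.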
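
To put "highly compressed" into numbers, here is a quick back-of-the-envelope sketch. It assumes a 1024x1024 output, the 8x spatial downsampling of Stable Diffusion's VAE, and a 24x24 Stage C latent. It compares spatial positions only and ignores channel counts, so treat the ratio as a rough illustration rather than a measure of compute.

```python
# Back-of-the-envelope spatial compression comparison (assumed figures:
# 1024x1024 output, 8x VAE downsampling for Stable Diffusion, 24x24 Stage C latent).
IMAGE_SIDE = 1024
SD_VAE_DOWNSAMPLE = 8
CASCADE_STAGE_C_SIDE = 24

sd_side = IMAGE_SIDE // SD_VAE_DOWNSAMPLE            # 128x128 latent
cascade_factor = IMAGE_SIDE / CASCADE_STAGE_C_SIDE   # roughly 42.7x per side

sd_positions = sd_side ** 2                          # 16,384 spatial positions
cascade_positions = CASCADE_STAGE_C_SIDE ** 2        # 576 spatial positions
spatial_ratio = sd_positions / cascade_positions     # about 28x fewer positions

print(f"SD latent: {sd_side}x{sd_side} ({sd_positions} positions)")
print(f"Stage C latent: {CASCADE_STAGE_C_SIDE}x{CASCADE_STAGE_C_SIDE} ({cascade_positions} positions)")
print(f"Per-side compression: {cascade_factor:.1f}x, spatial ratio vs SD: {spatial_ratio:.1f}x")
```

A working space this small is presumably a large part of why fine-tuning only Stage C is so much cheaper.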

Stable Cascade's three-stage approach aims to set new benchmarks for quality, flexibility, fine-tuning, and efficiency, with a focus on further lowering hardware barriers. The model is available for inference in the diffusers library. The architecture allows additional training or fine-tuning, including ControlNets and LoRAs, to be done on Stage C alone, which Stability AI says brings a 16x cost reduction compared to training a similar-sized Stable Diffusion model. The modular approach keeps the expected VRAM requirement for inference to approximately 20 GB, and this can be lowered further by using the smaller variants. According to Stability AI's evaluations, Stable Cascade performs best in both prompt alignment and aesthetic quality in almost all model comparisons.

In addition to standard text-to-image generation, Stable Cascade can generate image variations and image-to-image generations. The release includes all the code for training, finetuning, ControlNet, and LoRA to lower the requirements to experiment with this architecture further.

Final Thoughts

The model overall looks promising. It seems to handle text in images pretty well, something AI image models have long struggled with. However, most AI image models are getting better at it: Ideogram was one of the first to produce decent text in images, then came DALL-E 3 and eventually Midjourney.

My concern with these models has always been whether they can be freely downloaded and fucked around with. As long as the community is able to get their hands on them and fine-tune them, train new base models and LoRAs, and just generally break them in new and unexpected ways, then the existence of a commercial license seems completely fine to me. From what I've seen, it works better. It's not 100% perfect, but hands and text finally seem a lot better.

While I'm excited about the new base model and architecture from Stability AI, which, like SD 1.5 and SDXL, is a foundational model that needs fine-tuning by the open-source community, one concern weighs on my mind: the $20/month licensing fee. If I have to pay it even when a project generates no net earnings, it could make devs pause before diving in. Ideally, I'd prefer a structure where I only pay once my earnings can cover the cost. It's worth noting that Stability AI is reportedly losing $8 million per month and relies heavily on support from its community to survive. Nonetheless, Stability AI's survival remains crucial, as it helps ensure continued progress in this field.

Nvidia Releases Chat with RTX

Nvidia just released Chat with RTX, a local AI chatbot for PCs powered by its own GPUs. The technology demo lets users run open-source large language models to interact with their local files and documents.

An AI chatbot that runs locally on your PC

In this tech demo, Nvidia shows how users can personalize a chatbot with their own content, accelerated by a local NVIDIA GeForce RTX 30 Series GPU or higher with at least 8GB of video random access memory (VRAM). The tool uses retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM software, and NVIDIA RTX acceleration to bring generative AI capabilities to local, GeForce-powered Windows PCs.

Users can connect local files on a PC as a dataset to an open-source large language model, enabling queries for quick, contextually relevant answers. The tool supports various file formats and allows users to include information from YouTube videos and playlists. Chat with RTX runs locally on Windows RTX PCs and workstations, providing fast results, and ensuring that the user’s data stays on the device. It requires a GeForce RTX 30 Series GPU or higher with a minimum 8GB of VRAM, Windows 10 or 11, and the latest NVIDIA GPU drivers. The app is built from the TensorRT-LLM RAG developer reference project, available on GitHub, and developers can use the reference project to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM.
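
The retrieve-then-generate loop behind RAG can be sketched in a few lines. This is a minimal, self-contained illustration, not Nvidia's implementation: it swaps learned embeddings and a vector index for simple bag-of-words cosine similarity, and the function names (retrieve, build_prompt) and sample documents are hypothetical. The resulting prompt is what would be handed to the local LLM.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query (the 'retrieval' step)."""
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 1) -> str:
    """Stuff the retrieved context into the prompt (the 'augmentation' step)."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


docs = [
    "The quarterly report shows revenue grew 12 percent.",
    "The team offsite is scheduled for June in Denver.",
    "Expense reports must be filed within 30 days.",
]
print(build_prompt("When is the team offsite?", docs))
```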

Open Source Continues

The release of Chat with RTX is a testament to the ongoing commitment Nvidia has to the open-source community. The decision to allow local processing of AI applications opens up a new frontier for developers and enthusiasts alike. By running these models locally, users have greater control over their privacy and data security while still tapping into the power of cutting-edge AI.

With support for open-source models such as Mistral and Llama, users can now leverage the power of Nvidia GPUs to run sophisticated large language models directly on their PCs. This local approach is not only a boon for privacy but also for performance, as it avoids the network latency typically associated with cloud-based services. As users interact with these AI models, their feedback and modifications can contribute to the larger pool of knowledge, fostering a collaborative environment for improvement and growth.

Closing Thoughts

Nvidia’s latest move with Chat with RTX is nothing short of a bold stride into a future where local AI processing becomes as commonplace as the graphics processing we’ve become accustomed to. The thought of models meticulously optimized for maximum performance on specific hardware is an attractive one. There’s something deeply satisfying about the economy of resources—no excess, no waste—just pure, streamlined efficiency. Nvidia’s understanding of this is clear; by refining their GPUs to tailor-fit the demands of large language models (LLMs), they’re maximizing the value that users get out of their hardware.

This optimization goes beyond sheer performance. It’s the realization that they don’t need to license their GPU architectures to third parties to make an impact in the AI space. Instead, they can be the direct LLM provider, leveraging their hardware expertise to craft a user-friendly AI ecosystem. The integration of desktop retrieval-augmented generation (RAG) is particularly exciting. Historically, local UIs have either overlooked this feature or tacked it on as an afterthought. Nvidia’s holistic approach indicates a keen understanding of what users want and need.

From a personal standpoint, I am thrilled by how user-friendly their new technology is. Pointing a chatbot at your own documents is often considered an advanced task, yet Nvidia appears to be making it beginner-friendly. For beginners looking to dip their toes into the world of customized AI, this approachability is a significant draw.

Nonetheless, it's important to recognize that running a local model, even one powered by RAG, is not a new idea for those with the right hardware. Nvidia distinguishes itself not through the novelty of the idea but through the execution, delivering a seamless, accessible experience for a broad audience.

Reka Releases Reka Flash, a Highly Capable Multimodal Model

In the ever-evolving landscape of AI, Reka is setting a new standard with the unveiling of Reka Flash, an exceptional multimodal and multilingual model designed for efficiency and speed. Emerging as a “turbo-class” contender, the 21-billion parameter powerhouse, Reka Flash, has been meticulously trained from the ground up to push the boundaries of AI capabilities. It stands out in the marketplace with its ability to rival the performance of much larger contemporaries, striking a formidable balance between agility and quality. This makes it an ideal solution for demanding applications that necessitate rapid processing without compromising on output excellence.

As Reka solidifies its position in the high-performance AI arena, Reka Edge offers a compact alternative. With a 7-billion parameter construct, it’s tailored for environments where efficiency is paramount. Whether deployed on devices or utilized locally, Reka Edge promises to deliver robust AI capabilities without the heft of its larger counterparts.

Available for exploration in the Reka Playground through a public beta, Reka Flash and Reka Edge are poised to redefine what’s possible in the intersection of language comprehension and visual perception. And for those looking to push the envelope even further, Reka teases the arrival of its most ambitious project yet, Reka Core, set to launch in the coming weeks.

Overview of Reka’s new AI model

According to Reka's benchmarks, the models compared include Reka Flash, Gemini Pro, GPT-3.5, Grok-1, Mixtral 45B, Llama-2, GPT-4, and Gemini Ultra. The benchmarks include MMLU, GSM8K, HumanEval, and GPQA.

Here are some of the key things you can tell from the benchmark:

  • Reka Flash performs well on all four benchmarks, but it is not the best model on any of them.
  • Reka Flash is a relatively small model (21B parameters), but it is able to achieve competitive performance with much larger models.
  • The best model on a particular benchmark depends on the specific task that the model is being used for.

Overall, their results show that the model is quite powerful for its size.

Reka Multimodal Capabilities

Reka Flash performs well across the board on the listed benchmarks. It’s also worth noting that this table only shows a small sample of benchmarks. There are many other factors to consider when evaluating a language model, such as its training data, its architecture, and its computational efficiency.

Testing The Model

First let’s start off by giving it a simple coding question.

Ok, not bad. Now let's ask it a hard question.

This question was pulled from LeetCode 2751, Robot Collisions. Notice that I didn't mention LeetCode or the question title in the prompt? I did this to reduce the chance that the model had already seen the question in its training data. I also picked a relatively new question, so the chances of it being in the training data were even lower. Nonetheless, here is the result we got. It seems to have gotten the parameters right; it just uses a different function name and no type hints, which makes sense, considering we only gave it the raw question.

I will post the rest of the answer here in case you want to copy it:

def survivingRobots(positions, healths, directions):
    i = 0
    while i < len(positions) - 1:
        j = i + 1
        while j < len(positions):
            if directions[i] == 'L' and positions[i] == positions[j]:
                if healths[i] < healths[j]:
                    healths[i] -= 1
                elif healths[i] > healths[j]:
                    healths[j] = 0
                    healths[i] = 0
                    healths[j] = 0
                j += 1
            elif directions[i] == 'R' and positions[i] == positions[j]:
                if healths[i] < healths[j]:
                    healths[j] -= 1
                elif healths[i] > healths[j]:
                    healths[i] = 0
                    healths[i] = 0
                    healths[j] = 0
                j += 1
                i += 1
                j += 1
    return [h for h in healths if h > 0]

Unfortunately, when it was run, it only passed 128 / 2433 test cases. Admittedly, it's a hard question, and most language models probably wouldn't solve it on the first try.
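
For comparison, here is one way to solve the problem: a stack-based sketch, not Reka Flash's output. It assumes the rules as commonly stated (when two robots collide the weaker one is removed and the stronger loses 1 health, equal health removes both, and survivors are returned in input order). Robots are processed in position order while a stack holds the right-moving survivors; each left-moving robot fights the nearest right-mover on the stack until one side runs out. The function name is my own.

```python
from typing import List


def surviving_robots_healths(positions: List[int], healths: List[int], directions: str) -> List[int]:
    """Stack-based sketch for the Robot Collisions problem."""
    healths = list(healths)  # work on a copy so the caller's list is untouched
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    stack: List[int] = []  # indices of right-moving robots that are still alive

    for i in order:
        if directions[i] == "R":
            stack.append(i)
            continue
        # Robot i moves left: it meets the nearest surviving right-mover first.
        while stack and healths[i] > 0:
            j = stack[-1]
            if healths[j] < healths[i]:      # right-mover is destroyed
                healths[j] = 0
                stack.pop()
                healths[i] -= 1
            elif healths[j] > healths[i]:    # left-mover is destroyed
                healths[i] = 0
                healths[j] -= 1
            else:                            # equal health: both are destroyed
                healths[i] = healths[j] = 0
                stack.pop()

    # Survivors keep their original input order.
    return [h for h in healths if h > 0]


print(surviving_robots_healths([3, 5, 2, 6], [10, 10, 15, 12], "RLRL"))  # [14]
```

The sort makes this O(n log n); the collision loop itself is linear overall, because each robot is pushed onto and popped off the stack at most once.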

Testing Reka Flash v1.0

The result:


This was very impressive. It seems to have very good OCR under the hood. Go ahead and test the code yourself and compare it to the table.

Closing Thoughts

The arrival of Reka Flash is indeed a noteworthy leap in the realm of artificial intelligence, presenting itself as a fairly impressive model with considerable potential. As a testament to its capabilities, my initial interaction with the model suggests there’s much to be explored and harnessed within its sophisticated architecture. However, to fully grasp the extent of its prowess, further experimentation and exploration are essential.

While Reka Flash positions itself as a high-caliber model, it’s important to note that this isn’t the pinnacle of Reka’s innovation. The impending release of Reka Core looms on the horizon, teasing the promise of an even more powerful tool in the AI toolkit. Given what we’ve seen from Reka Flash and Reka Edge, expectations are high for what Reka Core will bring to the table.

The anticipation of Reka Core brings about contemplation of Reka’s trajectory among the constellation of companies in the LLM (large language model) space. It’s an arena filled with heavyweights and emerging challengers, each vying to push the boundaries of what’s possible. In such a competitive market, Reka’s strategy and offerings will be crucial factors.

An unfortunate caveat to the excitement around Reka’s models is the lack of availability of their weights. The AI community thrives on shared knowledge and the ability to build upon others’ work; the inaccessible weights mean that some practitioners and researchers will miss out on the chance to delve deeper into the inner workings and potential applications of these models.

As we look towards what’s next, it’s clear that Reka is carving out its own path in the AI landscape. With the balance between efficiency and power in Reka Flash and Reka Edge, coupled with the anticipated launch of Reka Core, there’s a palpable buzz around where this AI company is headed. One thing is certain: the AI community is watching, waiting, and eager to see how Reka’s contributions will shape the future of technology.

Using SQL Window Functions


In the realm of data analysis and database management, mastering SQL window functions is pivotal for anyone aiming to gain deeper insights from complex datasets. These powerful tools extend the capabilities of SQL beyond simple queries, enabling analysts to perform sophisticated calculations across a set of rows that are related to the current row. Whether it's calculating running totals, performing rankings, or computing moving averages, SQL window functions provide the efficiency and flexibility required to handle advanced data manipulation tasks with ease.

Introduction to SQL Window Functions

This diagram shows that SQL Window Functions consist of three main components: the Frame Clause, the Order By Clause, and the Window Function Types. The Frame Clause specifies the rows that are included in the window, while the Order By Clause determines the order of the rows. The Window Function Types include Ranking Functions, Aggregate Functions, and Analytic Functions. Ranking Functions include RANK, DENSE_RANK, ROW_NUMBER, and NTILE. Aggregate Functions include SUM, AVG, MIN, MAX, and COUNT. Analytic Functions include LAG, LEAD, FIRST_VALUE, and LAST_VALUE.
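
To make the three families concrete, here is a small sketch that runs ranking, aggregate, and analytic window functions against an in-memory SQLite database from Python. The sales table and its rows are made-up sample data, and the sketch assumes the bundled SQLite is version 3.25 or newer (the first release with window functions).

```python
import sqlite3


def ranking_demo():
    """Run ranking, aggregate, and analytic window functions on a toy table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, rep TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", "alice", 300), ("east", "bob", 200), ("east", "carol", 200),
         ("east", "dave", 100), ("west", "erin", 500), ("west", "frank", 400)],
    )
    rows = conn.execute("""
        SELECT region, rep, amount,
               -- ranking functions: ROW_NUMBER breaks ties, RANK and DENSE_RANK do not
               ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC, rep) AS row_num,
               RANK()       OVER (PARTITION BY region ORDER BY amount DESC)      AS rnk,
               DENSE_RANK() OVER (PARTITION BY region ORDER BY amount DESC)      AS dense_rnk,
               -- aggregate function used as a window function
               MAX(amount)  OVER (PARTITION BY region)                           AS region_max,
               -- analytic function: the value from the previous row in the partition
               LAG(amount)  OVER (PARTITION BY region ORDER BY amount DESC, rep) AS prev_amount
        FROM sales
        ORDER BY region, row_num
    """).fetchall()
    conn.close()
    return rows


for row in ranking_demo():
    print(row)
```

Bob and Carol tie at 200: ROW_NUMBER still numbers them 2 and 3 (the rep name breaks the tie), RANK gives both 2 and skips to 4 for Dave, and DENSE_RANK gives both 2 and continues with 3.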

Importance of SQL Window Functions in Data Analysis

One might spend years navigating the depths of SQL without touching upon the powerful suite of SQL window functions, unaware of its capabilities. It’s not until you’re faced with a complex analytical problem that you realize the true value they hold. Picture yourself sifting through voluminous tables where single records—like the most recent entry out of a repeating group—play a crucial role in your analysis. This is where window functions shine, simplifying what would otherwise involve convoluted operations.
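
The "most recent entry out of a repeating group" case is a classic ROW_NUMBER pattern: number the rows in each group from newest to oldest, then keep row 1. A minimal sketch with a hypothetical readings table, again assuming SQLite 3.25+ through Python's sqlite3:

```python
import sqlite3


def latest_per_group():
    """Return the newest reading for each sensor."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, ts INTEGER, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", [
        ("a", 1, 10.0), ("a", 3, 12.5), ("a", 2, 11.0),
        ("b", 5, 7.0), ("b", 4, 6.5),
    ])
    rows = conn.execute("""
        SELECT sensor, ts, value FROM (
            SELECT sensor, ts, value,
                   -- 1 = newest row within each sensor's group
                   ROW_NUMBER() OVER (PARTITION BY sensor ORDER BY ts DESC) AS rn
            FROM readings
        )
        WHERE rn = 1
        ORDER BY sensor
    """).fetchall()
    conn.close()
    return rows


print(latest_per_group())
```

The filter has to sit in an outer query because window functions are evaluated after WHERE, so they cannot be referenced in the WHERE clause of the same SELECT.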

Imagine the need to analyze time series data or track status changes across rows that share a relationship, but are not necessarily adjacent. SQL window functions adeptly cater to these scenarios, granting the ability to compute on surrounding rows, such as generating running totals, without breaking a sweat. For data analysts, they become indispensable when working with chronological data, especially when the context of time is paramount.

Consider, for instance, the task of ascertaining the elapsed time between events. Using SQL window functions, specifically LAG with an offset of one, you can easily peer into the previous row of data. Partitioned by asset ID and ordered by a timestamp, this function allows for pinpoint accuracy in identifying the timing and nature of past events. This capability is invaluable for error-checking sequences—such as erroneous consecutive start events—and for maintaining the integrity of your analysis.
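
Here is a sketch of that LAG pattern, using a hypothetical events table with integer timestamps (in seconds) so the elapsed-time arithmetic stays simple. A second step in Python flags the erroneous consecutive start events. It assumes SQLite 3.25+ through Python's sqlite3.

```python
import sqlite3


def elapsed_between_events():
    """Attach the elapsed time and previous event type to every event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (asset_id INTEGER, event_type TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
        (1, "start", 100), (1, "stop", 160), (1, "start", 200), (1, "start", 230),
        (2, "start", 50), (2, "stop", 80),
    ])
    rows = conn.execute("""
        SELECT asset_id, event_type, ts,
               -- LAG(x, 1) reads x from the previous row of the same asset
               ts - LAG(ts, 1) OVER (PARTITION BY asset_id ORDER BY ts) AS elapsed,
               LAG(event_type, 1) OVER (PARTITION BY asset_id ORDER BY ts) AS prev_event
        FROM events
        ORDER BY asset_id, ts
    """).fetchall()
    conn.close()
    return rows


rows = elapsed_between_events()
# Two "start" events in a row for the same asset suggest a missing "stop".
suspicious = [r for r in rows if r[1] == "start" and r[4] == "start"]
for r in rows:
    print(r)
print("suspicious:", suspicious)
```

The first event of each asset has no previous row, so both LAG columns are NULL (None in Python) there.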

Furthermore, window functions excel in relative analysis, like establishing that “this record is x% of the total for this person.” They offer a level of detail and precision in aggregative comparisons that would be cumbersome to achieve otherwise. The alternative approach, which often involves correlated subqueries, can quickly become inefficient and unwieldy as the size of the result set increases.
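
The "x% of the total for this person" calculation needs no correlated subquery: divide each row by SUM() OVER that person's partition. A sketch with a hypothetical expenses table (multiplying by 100.0 forces floating-point division in SQLite; SQLite 3.25+ assumed):

```python
import sqlite3


def share_of_total():
    """Each expense as a percentage of that person's total."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE expenses (person TEXT, item TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO expenses VALUES (?, ?, ?)", [
        ("ann", "rent", 600), ("ann", "food", 300), ("ann", "fun", 100),
        ("ben", "rent", 500), ("ben", "food", 500),
    ])
    rows = conn.execute("""
        SELECT person, item, amount,
               -- every row is kept; the SUM spans the whole partition for that person
               ROUND(100.0 * amount / SUM(amount) OVER (PARTITION BY person), 1) AS pct
        FROM expenses
        ORDER BY person, amount DESC, item
    """).fetchall()
    conn.close()
    return rows


for row in share_of_total():
    print(row)
```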

Let’s take the case of accumulating sums over time. With a list detailing monthly expenses, and the goal to present a cumulative sum up to any given point in the fiscal year, a window function not only accomplishes this with ease but also with remarkable performance efficiency.
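
A cumulative sum over monthly expenses, sketched with a hypothetical monthly_expenses table where month is stored as an integer (SQLite 3.25+ through Python's sqlite3 assumed):

```python
import sqlite3


def running_totals():
    """Cumulative spend up to and including each month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE monthly_expenses (month INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO monthly_expenses VALUES (?, ?)",
                     [(1, 100), (2, 250), (3, 50), (4, 300)])
    rows = conn.execute("""
        SELECT month, amount,
               -- frame: from the first month up to the current row
               SUM(amount) OVER (ORDER BY month
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
        FROM monthly_expenses
        ORDER BY month
    """).fetchall()
    conn.close()
    return rows


for row in running_totals():
    print(row)
```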

This efficiency stems from the core advantage of window functions: they avoid the need for repeatedly scanning the same table or joining a table to itself, which can be costly in terms of resources. Their ability to peer across rows that share a certain logic, coupled with their impressive performance even on large datasets, makes them not just a tool but a powerhouse at the disposal of any data analyst.

Source: Toptal

The diagram contrasts aggregate functions with window functions. Used with GROUP BY, an aggregate function such as SUM or AVG collapses each group of rows into a single result row. Used with an OVER clause (optionally with PARTITION BY and ORDER BY), the same function becomes a window function: it still computes over a group of rows, but it returns a value for every row.

In short, SQL window functions are powerful, extremely so.

Understanding the Basics of Window Functions

Let's consider a hypothetical scenario where we have a table named orders that contains information about orders placed by customers, including the order_id, customer_id, order_date, and order_status. To illustrate the use of SQL window functions, we'll calculate the number of days since each customer's previous order, as well as the total number of orders placed by each customer up to the current order. Here's an example query using SQL window functions to achieve this:

WITH order_lag AS (
  SELECT *, LAG(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) AS previous_order_date,
    COUNT(*) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS orders_to_date
  FROM orders
)
SELECT order_id, customer_id, order_date, orders_to_date,
  COALESCE(order_date - previous_order_date, 0) AS days_since_previous_order
FROM order_lag WHERE order_status = 'shipped';

In this query, we first create a Common Table Expression (CTE) named order_lag. Inside it, the LAG() function returns the order_date of the customer's previous order. LAG() is a window function that accesses a row at a specified physical offset (one row by default) before the current row. The COUNT() window function counts the customer's orders; its ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW clause specifies that the window includes all rows from the start of the partition up to the current row.

Next, the outer query subtracts previous_order_date from order_date to get the number of days since the previous order (in PostgreSQL, subtracting two DATE values yields an integer number of days), and COALESCE() sets the value to 0 when there is no previous order. Because both window functions are computed inside the CTE, before the outer WHERE clause keeps only shipped orders, orders_to_date still counts every order the customer has placed, not just the shipped ones.

By using SQL window functions, we can efficiently analyze time series data and track status changes across rows without the need for complex subqueries or self-joins.

Best Practices for Using SQL Window Functions

  1. Understand the use cases: SQL window functions are powerful tools for analyzing data, but they can be complex and resource-intensive. Make sure you understand the use cases and the specific problems you’re trying to solve before using them.
  2. Choose the right window function: SQL provides several window functions, including SUM(), AVG(), MIN(), MAX(), COUNT(), ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LAG(), LEAD(), and FIRST_VALUE(). Choose the right function for your specific use case.
  3. Use window functions with caution: Window functions can be resource-intensive, especially when working with large datasets. Use them judiciously and test their performance before deploying them in production.
  4. Use window functions with appropriate window clauses: Every window function needs an OVER clause, and its optional frame clause (ROWS, RANGE, or GROUPS) defines which rows around the current row are included. Make sure you understand the differences between these frame types and use them appropriately.
  5. Use window functions with appropriate partitioning: Window functions can be partitioned to apply the function to subsets of the data. Make sure you understand how partitioning works and use it appropriately to improve performance and accuracy.
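
The difference between frame types is easiest to see with ties. This sketch (hypothetical table t, SQLite 3.25+ through Python's sqlite3 assumed) computes the same running total with a ROWS frame and a RANGE frame. RANGE treats rows with equal ORDER BY values as peers and includes all of them, so both day-2 rows get the same total. When a window has ORDER BY but no explicit frame, the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is why running totals can "jump" on ties.

```python
import sqlite3


def frame_demo():
    """Compare ROWS and RANGE frames when ORDER BY values tie."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (day INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)",
                     [(1, 10), (2, 20), (2, 30), (3, 40)])
    rows = conn.execute("""
        SELECT day, amount,
               -- ROWS counts physical rows; the amount tiebreaker makes the order deterministic
               SUM(amount) OVER (ORDER BY day, amount
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_total,
               -- RANGE includes every peer row that shares the current day value
               SUM(amount) OVER (ORDER BY day
                                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_total
        FROM t
        ORDER BY day, amount
    """).fetchall()
    conn.close()
    return rows


for row in frame_demo():
    print(row)
```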

What is Pydantic and Why It’s Useful for AI?


Pydantic is a popular open-source Python library for data validation and modeling. It offers tools to define the structure and rules of your data, ensuring its consistency and reliability. Pydantic shows a lot of potential in AI, particularly for data preprocessing and cleaning.

Its ability to validate and serialize data makes it an ideal choice for handling the large and complex datasets often used in AI applications. Additionally, Pydantic's use of type annotations for validation can help catch errors early in the development process, making it easier to build and maintain reliable AI systems. Pydantic models can also be used alongside libraries such as TensorFlow and PyTorch, for example to validate configuration and input records before they reach a training pipeline.

Why Use Pydantic

Data Validation

Pydantic enforces data types and constraints you define, catching invalid entries before they cause issues. This is crucial in AI, where incorrect data can lead to biased or inaccurate models.

Data validation is a process that ensures the data entered into a system is correct and useful. It checks the accuracy and quality of data before it’s processed. Here are a few examples of data validation using the Pydantic library in Python:

  1. Type Hints Validation: Pydantic uses Python type hints to validate data. For instance, in the following code, the Fruit class has attributes name, color, weight, and bazam with specific type hints. Pydantic validates the data against these type hints. If the data doesn't match the type hints, a validation error is raised.
from typing import Annotated, Dict, List, Literal, Tuple

from annotated_types import Gt  # Gt(0) constrains the value to be greater than 0
from pydantic import BaseModel

class Fruit(BaseModel):
    name: str
    color: Literal['red', 'green']
    weight: Annotated[float, Gt(0)]
    bazam: Dict[str, List[Tuple[int, bool, float]]]

print(
    Fruit(
        name='Apple',
        color='red',
        weight=4.2,
        bazam={'foobar': [(1, True, 0.1)]},
    )
)
  2. Strict Mode Validation: Pydantic also has a strict mode where types are not coerced and a validation error is raised unless the input data exactly matches the schema or type hint. Here's an example:
from datetime import datetime
from pydantic import BaseModel, ValidationError

class Meeting(BaseModel):
    when: datetime
    where: bytes

try:
    # strict=True disables coercion, so the str inputs are rejected
    m = Meeting.model_validate(
        {'when': '2020-01-01T12:00', 'where': 'home'}, strict=True
    )
except ValidationError as e:
    print(e)  # reports errors for both 'when' and 'where'
  3. Custom Validators: Pydantic allows for customizing validation via functional validators. For instance, in the following code, a wrap validator checks whether the when attribute is ‘now’ and, if so, returns the current datetime; otherwise it runs the standard validation and treats timezone-naive datetimes as UTC.
from datetime import datetime, timezone
from pydantic import BaseModel, field_validator

class Meeting(BaseModel):
    when: datetime

    @field_validator('when', mode='wrap')
    @classmethod
    def when_now(cls, input_value, handler):
        if input_value == 'now':
            return datetime.now()
        # Otherwise run Pydantic's standard datetime validation
        when = handler(input_value)
        # Assume timezone-naive datetimes are in UTC
        if when.tzinfo is None:
            when = when.replace(tzinfo=timezone.utc)
        return when

print(Meeting(when='now'))
print(Meeting(when='2020-01-01T12:00'))

These examples demonstrate how Pydantic can be used for data validation in Python, ensuring that the data being processed matches the expected types and constraints.

Data Modeling

Define the structure of your data, including nested objects and relationships. This makes it easier to work with complex data sets and helps keep your code organized.
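
As a minimal sketch of nested modeling (the Customer and Address models are hypothetical examples), a model can declare another model as a field type, and Pydantic will validate nested dictionaries and convert them into instances of that model:

```python
from typing import List

from pydantic import BaseModel


class Address(BaseModel):
    city: str
    country: str


class Customer(BaseModel):
    name: str
    addresses: List[Address]  # a list of nested models


# Plain dicts (e.g. parsed JSON) are validated and converted into Address objects.
customer = Customer(
    name="Ada",
    addresses=[{"city": "London", "country": "UK"}],
)
print(customer.addresses[0].city)  # London
```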


Convert data between different formats like JSON, Python dictionaries, and others. This allows seamless integration with external APIs and data sources.
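
A short round-trip sketch using Pydantic v2's model_dump(), model_dump_json(), and model_validate_json(); the Point model is just an example:

```python
import json

from pydantic import BaseModel


class Point(BaseModel):
    x: int
    y: int


p = Point(x=1, y=2)
as_dict = p.model_dump()                        # Python dict
as_json = p.model_dump_json()                   # JSON string
restored = Point.model_validate_json(as_json)   # JSON string back to a validated model

print(as_dict, as_json)
print(json.loads(as_json) == as_dict, restored == p)
```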

How is Pydantic Useful in AI?

One of the burgeoning challenges in the realm of artificial intelligence (AI), particularly when working with Large Language Models (LLMs), is structuring responses. These sophisticated models can generate vast quantities of unstructured data, which then necessitates meticulous organization. This is where Pydantic, a data validation and settings management library in Python, steps in with an elegant solution. It simplifies the formidable task by enabling developers to define a clear model for their data, ensuring that the responses from LLMs are well-structured and conform to expected formats.

Leveraging Models to Structure Large Language Model Responses

When interfacing with LLMs, it’s crucial to not just receive data but to parse and utilize it effectively. Pydantic facilitates this by allowing the creation of models that serve as blueprints for the expected data. This means that developers can predefine the structure, types, and requirements of the data they are to handle, making it easier to manage and ensuring that the information is in the correct form for further processing or analysis.
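
A minimal sketch of that idea: define the expected shape once, then validate whatever text the LLM returns. The MovieReview model and the two replies are hypothetical stand-ins for real LLM output, and the sketch assumes the model was asked to answer in JSON.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class MovieReview(BaseModel):
    """The structure we asked the LLM to reply with (hypothetical)."""
    title: str
    rating: int = Field(ge=1, le=5)
    tags: List[str] = []


def parse_llm_reply(raw: str) -> Optional[MovieReview]:
    """Validate a raw JSON reply; return None if it doesn't match the model."""
    try:
        return MovieReview.model_validate_json(raw)
    except ValidationError as err:
        # A real pipeline might send err.errors() back to the LLM and retry.
        print(f"Invalid reply ({err.error_count()} error(s))")
        return None


good = parse_llm_reply('{"title": "Alien", "rating": 5, "tags": ["sci-fi"]}')
bad = parse_llm_reply('{"title": "Alien", "rating": 9}')  # rating is out of range
print(good)
print(bad)
```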

Pydantic 2.7: Optional Support for Incomplete JSON Parsing

The upcoming Pydantic version 2.7 introduces optional support for parsing and validating incomplete JSON, which is particularly beneficial for AI applications. This feature aligns perfectly with the needs of developers processing streamed responses from an LLM. Instead of waiting for the entire payload, developers can start processing the data as it arrives, enabling real-time data handling and reducing latency in the AI system’s response.
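
In Pydantic 2.7 and later, this is exposed through pydantic_core's from_json() with allow_partial=True. The sketch below assumes that version is installed; the chunk string is a made-up stand-in for a streamed response that has been cut off mid-value.

```python
from pydantic_core import from_json

# Made-up stand-in for a streamed LLM response that was cut off mid-string.
chunk = '["aa", "bb", "c'

try:
    from_json(chunk)  # by default, incomplete JSON is an error
except ValueError as err:
    print("Strict parse failed:", err)

partial = from_json(chunk, allow_partial=True)
print(partial)  # the incomplete trailing string is dropped
```

Because incomplete trailing values are dropped, each new chunk can be re-parsed as it arrives, and the resulting Python object can then be passed to a model's model_validate().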

Integration with DSPy and JSON Schemas

Furthermore, there is ongoing experimentation with combining DSPy, Pydantic types, and JSON Schemas to further enhance data validation and transformation capabilities. Such integrations broaden the potential applications of Pydantic in the AI space by leveraging the advantages of each tool, leading to more robust and versatile data handling solutions.
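
Whatever shape such integrations take, a common currency is the JSON Schema that Pydantic can generate from a model with model_json_schema(). A sketch with a hypothetical Answer model; a schema like this can be handed to tools that expect JSON Schema, such as LLM function-calling APIs:

```python
import json

from pydantic import BaseModel, Field


class Answer(BaseModel):
    """Hypothetical output shape for a question-answering step."""
    answer: str
    confidence: float = Field(ge=0, le=1, description="Self-reported confidence")


schema = Answer.model_json_schema()
print(json.dumps(schema, indent=2))
```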

OpenAI Function Calls and Query Plans

An often-underappreciated aspect of OpenAI's models is function calling, which can be used to generate entire query plans. These plans can be represented as nested Pydantic objects, adding a structured, executable layer over retrieval and retrieval-augmented generation (RAG) pipelines. With this method, developers get plan-and-execute capabilities for handling intricate queries over assorted data sources. LlamaIndex is one example in practice: it uses this layered approach both to access data and to generate structured output.
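
Here is a sketch of what such a nested query plan could look like. The Query and QueryPlan models, the execution_order() helper, and the JSON arguments are hypothetical stand-ins for what a function call might return; they are not LlamaIndex's or OpenAI's actual classes. Pydantic validates the nested structure, and a simple topological ordering decides which sub-queries can run first.

```python
from typing import List

from pydantic import BaseModel, Field


class Query(BaseModel):
    """One sub-question in the plan (hypothetical schema)."""
    id: int
    question: str
    dependencies: List[int] = Field(default_factory=list)


class QueryPlan(BaseModel):
    """A plan is a list of sub-queries that may depend on each other."""
    queries: List[Query]

    def execution_order(self) -> List[int]:
        """Return query ids so that every query runs after its dependencies."""
        done: List[int] = []
        pending = {q.id: set(q.dependencies) for q in self.queries}
        while pending:
            finished = set(done)
            ready = sorted(qid for qid, deps in pending.items() if deps <= finished)
            if not ready:
                raise ValueError("plan has a cycle or an unknown dependency")
            for qid in ready:
                done.append(qid)
                del pending[qid]
        return done


# Hypothetical function-call arguments, shaped like what an LLM might return.
arguments = """{"queries": [
  {"id": 1, "question": "What is the population of Paris?"},
  {"id": 2, "question": "What is the population of Berlin?"},
  {"id": 3, "question": "Which city is larger?", "dependencies": [1, 2]}
]}"""

plan = QueryPlan.model_validate_json(arguments)
print(plan.execution_order())  # [1, 2, 3]
```

Queries 1 and 2 have no dependencies and could run in parallel; query 3 waits for both.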

Getting Started with DSPy for Beginners

If you’re new to the world of language models and prompt engineering, getting started with DSPy might seem daunting at first. However, DSPy offers a beginner-friendly tutorial that can help you get up to speed quickly. While DSPy may not be the most efficient tool for simple language model tasks, it really shines when it comes to more complex tasks such as knowledge database lookups, chain of thought reasoning, and multi-hop lookups.

One of the biggest advantages of DSPy is its clean, class-based representation of the workflow, which makes it easier to find the prompt structure that best solves a problem. DSPy also promises to eliminate tedious prompt engineering by training prompts on a set of examples. By simulating the code on the inputs and making one or more simple zero-shot calls that respect the declarative signature, DSPy runs a highly constrained search process that can automate and optimize prompt generation.

So, while DSPy may not be suitable for all tasks, it can offer significant advantages for more complex tasks by automating and optimizing the prompt generation process. Whether you’re a seasoned language model expert or just getting started, DSPy is definitely worth checking out.


Getting started with DSPy is relatively straightforward, thanks to the comprehensive documentation and beginner-friendly Colab notebook provided by the DSPy team. The notebook introduces the DSPy framework for programming with foundation models, including language models (LMs) and retrieval models (RMs).

One of the key features of DSPy is its emphasis on programming over prompting. Instead of relying solely on prompt engineering, DSPy provides a minimalistic set of Pythonic operations that compose and learn, allowing you to express complex tasks in a familiar syntax.

DSPy provides composable and declarative modules for instructing LMs, making it easy to define the steps of your program in a clear and concise way. On top of that, DSPy includes an automatic compiler that teaches LMs how to conduct the declarative steps in your program. The compiler will internally trace your program and then craft high-quality prompts for large LMs or train automatic finetunes for small LMs to teach them the steps of your task.

To get started with DSPy, follow the installation instructions in the documentation. Once DSPy is installed, you can open the beginner-friendly Colab notebook and start exploring the framework’s features and capabilities. The notebook includes a series of examples and exercises that will help you get up to speed quickly and start building your own programs with DSPy.

This code prepares your environment to use DSPy. It checks if you have the necessary libraries installed and sets up a cache for faster data access. Finally, it makes the DSPy library available for you to use.

%load_ext autoreload
%autoreload 2

import sys
import os

try: # When on Google Colab, clone the DSPy repo so we download the cache.
    import google.colab
    repo_path = 'dspy'
    !git -C $repo_path pull origin || git clone $repo_path
except ImportError: # Not on Colab: work from the current directory.
    repo_path = '.'

if repo_path not in sys.path:
    sys.path.append(repo_path)

# Set up the cache for this notebook
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(repo_path, 'cache')

import pkg_resources # Install the package if it's not installed
if not "dspy-ai" in {pkg.key for pkg in pkg_resources.working_set}:
    !pip install -U pip
    !pip install dspy-ai
    !pip install openai~=0.28.1
    # !pip install -e $repo_path

import dspy

Getting Started

This code configures DSPy with two components. The first is a language model that generates text (GPT-3.5-turbo). The second is a retrieval model (ColBERTv2) that searches an index of Wikipedia abstracts. Together, they let DSPy generate text while incorporating knowledge retrieved from a large corpus.

turbo = dspy.OpenAI(model='gpt-3.5-turbo')
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='')

dspy.settings.configure(lm=turbo, rm=colbertv2_wiki17_abstracts)

Building A Q&A

The code loads a tiny sample from a dataset called HotPotQA, which contains questions and answers.

from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

DSPy requires minimal labeling: you only need labels for the initial question and final answer, and it figures out the rest.

train_example = trainset[0]
print(f"Question: {train_example.question}")
print(f"Answer: {train_example.answer}")

While this example uses an existing dataset, you can also define your own data format using dspy.Example.
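The input/label split can be made concrete with a toy record class that mimics the idea behind `with_inputs`. Fields you name become inputs, and every other field is treated as a label. `ToyExample` is a hypothetical stand-in written in plain Python for illustration. It is not DSPy's implementation of `dspy.Example`.

```python
class ToyExample:
    """Hypothetical stand-in for dspy.Example: a record whose fields are split
    into inputs (given to the program) and labels (used only for evaluation)."""

    def __init__(self, **fields):
        self._fields = dict(fields)
        self._input_keys = frozenset()

    def with_inputs(self, *keys):
        """Return a copy that marks `keys` as inputs; all other fields become labels."""
        unknown = set(keys) - set(self._fields)
        if unknown:
            raise KeyError(f"unknown fields: {sorted(unknown)}")
        copy = ToyExample(**self._fields)
        copy._input_keys = frozenset(keys)
        return copy

    def inputs(self):
        return {k: v for k, v in self._fields.items() if k in self._input_keys}

    def labels(self):
        return {k: v for k, v in self._fields.items() if k not in self._input_keys}

    def __getattr__(self, name):
        # Only called when normal lookup fails, so example.question works.
        fields = self.__dict__.get("_fields", {})
        if name in fields:
            return fields[name]
        raise AttributeError(name)


example = ToyExample(
    question="Who wrote Hamlet?",
    answer="William Shakespeare",
).with_inputs("question")

print(example.question)  # Who wrote Hamlet?
print(example.inputs())  # {'question': 'Who wrote Hamlet?'}
print(example.labels())  # {'answer': 'William Shakespeare'}
```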

How DSPy Works with LLMs Behind the Scenes

Key points:

  • Clean Separation: You focus on designing the information flow of your program (like steps needed to answer a question), while DSPy handles how to use the LLM effectively for each step.
  • Automatic Optimization: DSPy figures out the best way to “talk” to the LLM (e.g., what prompts to use) to achieve your desired outcome.
  • Comparison to PyTorch: If you know PyTorch (a framework for machine learning), think of DSPy as a similar tool but specifically for working with LLMs.
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")


  • Think of a signature as a recipe for giving instructions to the LLM.
  • It tells the LLM:
    • What kind of work it needs to do (e.g., answer a question).
    • What information it will receive (e.g., the question itself).
    • What kind of answer it should give (e.g., the answer to the question).
  • Each piece of information (question, answer) is called a “field.”
  • You can customize it for different tasks, like giving the LLM a long text and asking it to summarize it.
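One way to picture what a signature gives the LLM is to render it as a plain-text prompt:

- The docstring becomes the instructions.
- The field descriptions define the format.
- The output field is left open for the model to complete.

The sketch below is written in plain Python under that assumption. `ToySignature`, `ToyField` and `render_prompt` are hypothetical names, and DSPy's real prompt layout differs in its details.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyField:
    name: str
    desc: str = ""


@dataclass
class ToySignature:
    """Hypothetical stand-in for a DSPy signature: instructions plus fields."""
    instructions: str
    inputs: List[ToyField]
    outputs: List[ToyField]


def render_prompt(sig: ToySignature, values: Dict[str, str]) -> str:
    """Render a signature and its input values into a plain-text prompt."""
    lines = [sig.instructions, ""]
    # Describe each field so the LLM knows what it receives and what to produce.
    for f in sig.inputs + sig.outputs:
        lines.append(f"{f.name.title()}: {f.desc or 'the ' + f.name}")
    lines.append("")
    # Fill in the inputs, then leave the first output field open for the LLM.
    for f in sig.inputs:
        lines.append(f"{f.name.title()}: {values[f.name]}")
    lines.append(f"{sig.outputs[0].name.title()}:")
    return "\n".join(lines)


basic_qa = ToySignature(
    instructions="Answer questions with short factoid answers.",
    inputs=[ToyField("question")],
    outputs=[ToyField("answer", "often between 1 and 5 words")],
)
prompt = render_prompt(basic_qa, {"question": "What castle did David Gregory inherit?"})
print(prompt)
```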


  • Once you have a signature, you create a “predictor.”
  • Think of it as a skilled chef who follows the recipe (signature) and uses the LLM (ingredients) to cook the dish (answer).
  • Importantly, this chef can learn and adapt! When you compile the predictor with training examples using a DSPy optimizer, it gets better at using the LLM to achieve the desired outcome.
# Pick an example from the dev set (any of the 50 works).
dev_example = devset[18]

# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

Building the RAG

This example shows how to create a program in DSPy that answers questions using relevant information from Wikipedia. The program retrieves the top 3 relevant passages from Wikipedia based on the question. Then it uses those passages as context to generate an answer using an LLM.

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

Putting it Together

  • First, we define a “signature” called GenerateAnswer which specifies the task:
    • Input: context (relevant facts) and question.
    • Output: answer (short factoid).
  • Next, we create a program called RAG that inherits from dspy.Module.
    • It has two sub-modules:
      • dspy.Retrieve: finds relevant passages.
      • dspy.ChainOfThought: generates an answer using the retrieved context and the question.
    • The forward method defines the main steps:
      1. Retrieve relevant passages using self.retrieve.
      2. Generate an answer using self.generate_answer with the retrieved context and the question.
      3. Return the answer along with the retrieved context.
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()  # initialize the base dspy.Module

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)


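Lastly, we need to compile the RAG program. Compiling optimizes the program using training examples and a metric. The BootstrapFewShot teleprompter used here does this by bootstrapping few-shot demonstrations rather than by updating model weights. Teleprompters are like AI chefs that improve the program’s instructions to the LLM. This is similar to training a neural network, except that the prompts are tuned instead of the parameters.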
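The core idea behind a teleprompter like BootstrapFewShot can be sketched without DSPy:

1. Run the program on the training examples.
2. Keep the traces that pass the metric.
3. Reuse those traces as few-shot demonstrations in the compiled program.

This is a simplified sketch under stated assumptions. `bootstrap_demos`, `compile_with_demos` and the lookup-table `stub_program` are hypothetical. DSPy's real optimizer handles more, for example teacher programs and multi-step traces.

```python
from typing import Callable, Dict, List

Demo = Dict[str, str]
Program = Callable[[str, List[Demo]], str]  # (question, demos) -> answer


def bootstrap_demos(program: Program, trainset: List[Demo],
                    metric: Callable[[Demo, str], bool], max_demos: int = 4) -> List[Demo]:
    """Collect successful (question, answer) traces to reuse as few-shot demos."""
    demos: List[Demo] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        # Run the program zero-shot on the training input.
        prediction = program(example["question"], [])
        # Keep the trace only if the metric accepts it.
        if metric(example, prediction):
            demos.append({"question": example["question"], "answer": prediction})
    return demos


def compile_with_demos(program: Program, demos: List[Demo]) -> Callable[[str], str]:
    """Freeze the bootstrapped demos into the program, like a compiled module."""
    return lambda question: program(question, demos)


# Hypothetical stub "LM": answers from a lookup table and gets one question wrong.
KNOWN = {"Capital of France?": "Paris", "2 + 2?": "4", "Largest planet?": "Mars"}


def stub_program(question: str, demos: List[Demo]) -> str:
    return KNOWN.get(question, "unknown")


def exact_match(example: Demo, prediction: str) -> bool:
    return example["answer"].strip().lower() == prediction.strip().lower()


train = [
    {"question": "Capital of France?", "answer": "Paris"},
    {"question": "2 + 2?", "answer": "4"},
    {"question": "Largest planet?", "answer": "Jupiter"},
]
demos = bootstrap_demos(stub_program, train, exact_match)
compiled = compile_with_demos(stub_program, demos)
print([d["question"] for d in demos])  # ['Capital of France?', '2 + 2?']
```

A real program would insert the demos into its prompt. The stub ignores them, so the sketch only shows which traces survive the metric.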

from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(RAG(), trainset=trainset)
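The two checks in `validate_context_and_answer` can be approximated in plain Python with SQuAD-style normalization: lowercase the text, strip punctuation and articles, and collapse whitespace before comparing. This is an assumption about how such metrics typically work, not a copy of DSPy's `answer_exact_match` or `answer_passage_match`. The passage check here is a simple substring test, and all function names are hypothetical.

```python
import re
import string
from typing import Iterable

_PUNCTUATION = set(string.punctuation)


def normalize_text(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, squash spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    return normalize_text(gold) == normalize_text(predicted)


def passage_match(gold: str, passages: Iterable[str]) -> bool:
    """True if any retrieved passage contains the normalized gold answer."""
    target = normalize_text(gold)
    return any(target in normalize_text(p) for p in passages)


def validate(gold: str, predicted: str, passages: Iterable[str]) -> bool:
    # Mirrors the metric above: the answer must match AND appear in the context.
    return exact_match(gold, predicted) and passage_match(gold, passages)


print(exact_match("Kinnairdy Castle", "kinnairdy castle."))  # True
print(validate("Kinnairdy Castle", "Kinnairdy Castle", ["He inherited Kinnairdy Castle."]))  # True
```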

Now we can try out the compiled RAG program:

# Ask any question you like to this simple RAG program.
my_question = "What castle did David Gregory inherit?"

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_rag(my_question)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")
Question: What castle did David Gregory inherit?
Predicted Answer: Kinnairdy Castle
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']


The key point is that DSPy makes it easier to build programs that use LLMs by automating some of the complex steps involved. For beginners, however, DSPy can be challenging in my opinion. It assumes some understanding of large language models, machine learning concepts, and Python programming, and its documentation and examples use technical terms that require familiarity with these areas. It is definitely not as plug-and-play as other tools, such as those for building agents, and the learning curve is fairly steep. DSPy simplifies some aspects of working with LLMs, but understanding its core concepts and building programs may take significant effort for someone new to these fields. DSPy is not inherently “simple”. It aims to offer a more manageable way to work with LLMs for those who already have the necessary background.

What is DSPy? Will it Challenge LLM Frameworks

DSPy now stands for Declarative Self-improving Language Programs (in Python), according to Omar Khattab, the author of DSPy. DSPy is a framework developed by StanfordNLP for algorithmically optimizing language model (LM) prompts and weights, particularly when LMs are used multiple times within a pipeline. It separates the flow of a program from its parameters, such as prompt instructions, few-shot examples, and LM weights.

This is helpful since this separation simplifies the process of using language models to build a complex system by eliminating the need to manually tweak prompts and finetune LMs, which can be hard and messy. DSPy abstracts LM pipelines as text transformation graphs, allowing for the automatic optimization of prompt structures to solve specific problems. It also provides a clean class-based representation of workflows and a way to solve for the best prompt structure, promising to eliminate tedious prompt engineering. Essentially, DSPy aims to streamline the use of LMs in complex systems by automating the optimization of prompt structures and finetuning steps, thereby reducing the manual effort and complexity involved in using LMs within a pipeline.

DSPy Key Features

DSPy is a framework for optimizing large language model (LM) prompts and weights, especially in complex pipelines. Its key features include:

  1. Separation of program flow and parameters: DSPy separates the flow of the program (modules) from the parameters (LM prompts and weights) of each step, making it easier to optimize and modify the system.
  2. LM-driven optimizers: DSPy introduces new optimizers that can tune the prompts and/or the weights of LM calls to maximize a given metric. These optimizers are LM-driven algorithms that generate effective prompts and weight updates for each LM in the pipeline.
  3. Improved reliability and performance: DSPy can teach powerful models like GPT-3.5 or GPT-4 to be more reliable and avoid specific failure patterns. It can also improve the performance of local models like T5-base or Llama2-13b.
  4. Systematic approach: DSPy provides a more systematic approach to solving hard tasks with LMs, reducing the need for manual prompting and one-off synthetic data generators.
  5. General-purpose modules: DSPy provides general-purpose modules like ChainOfThought and ReAct, which replace string-based prompting tricks and make it easier to build complex systems with LMs.
  6. Compilation process: DSPy compiles the same program into different instructions, few-shot prompts, and/or weight updates for each LM, allowing for more effective and efficient use of LMs in the pipeline.

How does DSPy work?

One way DSPy makes LM pipelines more reliable is through LM Assertions, a programming construct for expressing computational constraints that language models (LMs) should satisfy. These constraints keep the pipeline’s behavior aligned with specified invariants or guidelines, which improves the reliability, predictability, and correctness of its output.

LM Assertions come in two well-defined constructs, Assert and Suggest, which enforce constraints and guide the pipeline’s execution flow. Assert offers a sophisticated retry mechanism and supports several other new optimizations. When an Assert fails, the pipeline transitions to a special retry state. In that state it reattempts the failing LM call while aware of its previous attempts and the error message raised. If the assertion still fails after a maximum number of self-refinement attempts, the pipeline transitions to an error state and raises an AssertionError, terminating the pipeline.

Essentially, LM Assertions make language models more reliable and predictable by letting you specify rules or guidelines that the LM should follow when generating output. There are two types of assertions: Assert and Suggest. Assert enforces a strict rule that the LM must follow, while Suggest provides a guideline that the LM should try to follow.

If an Assert fails, the LM tries to fix the error and retries the failed call, up to a maximum number of times. If it still fails after the maximum number of attempts, an error is raised and the pipeline is terminated. This retry mechanism and other optimizations make it easier to build complex LM pipelines that produce reliable and correct output. By using LM Assertions, you can make your LM pipeline behave as expected and avoid common failure patterns.
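The retry behavior can be sketched in plain Python. The helper generates an output and checks the constraint. On failure it retries, passing along feedback that includes the earlier attempts and the error message. Once the retry budget is spent, the Assert-like mode raises `AssertionError`, while the Suggest-like mode in this sketch keeps the last output and carries on. This is a simplified illustration with a hypothetical `run_with_constraint` helper and a stub generator. It is not DSPy's implementation of `dspy.Assert` or `dspy.Suggest`.

```python
from typing import Callable, List


def run_with_constraint(
    generate: Callable[[List[str]], str],
    check: Callable[[str], bool],
    message: str,
    max_retries: int = 2,
    hard: bool = True,
) -> str:
    """Call `generate`, retrying with feedback whenever `check` fails.

    hard=True behaves like an Assert (raise after the retries run out);
    hard=False behaves like a Suggest (return the last output anyway).
    """
    feedback: List[str] = []
    output = generate(feedback)
    for _ in range(max_retries):
        if check(output):
            return output
        # Tell the next attempt what went wrong and what was tried before.
        feedback.append(f"Previous attempt: {output!r}. Error: {message}")
        output = generate(feedback)
    if check(output):
        return output
    if hard:
        raise AssertionError(message)
    return output


# Hypothetical stub LM: too verbose at first, concise once it receives feedback.
def stub_generate(feedback: List[str]) -> str:
    if feedback:
        return "Kinnairdy Castle"
    return "The castle he inherited was called Kinnairdy Castle"


def at_most_five_words(answer: str) -> bool:
    return len(answer.split()) <= 5


answer = run_with_constraint(stub_generate, at_most_five_words, "Answer must be at most 5 words")
print(answer)  # Kinnairdy Castle
```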

Advantages of using DSPy

  1. Improved reliability and predictability: By specifying constraints and guidelines for the LM pipeline, you can ensure that the output is reliable and predictable, even in complex scenarios.
  2. Enhanced correctness: LM Assertions help ensure that the LM pipeline’s output is correct and aligns with the specified invariants or guidelines.

Also note that DSPy is not a direct competitor to LangChain. In fact, the two can be used together.

Examples and Use Cases

DSPy isn’t just another LLM framework; it’s a potential game-changer for agent development. Unlike the predefined workflows in tools like LangChain, DSPy lets you programmatically guide LLMs with declarative modules. Instead of hand-crafting prompts, you build agents that reason, retrieve information, and learn through composed modules like ChainOfThought and ReAct.

This opens doors to agents that answer your questions with clear steps, summarize complex topics with external knowledge, and even generate creative content in defined styles. Both DSPy and LangChain aim to empower LLMs, but DSPy’s focus on programmability and learning gives you unmatched control and interpretability. It’s akin to building modular robots instead of pre-programmed machines, opening a new chapter in the evolution of intelligent agents. Note that much of this is still in its early days and changes frequently.

Getting Started with DSPy

Here are some resources to get you started with DSPy. In another blog post, we’ll discuss and walk through setting up DSPy for a beginner.

Official Documentation and Tutorials:


  • Follow the installation instructions based on your environment (Python, Google Colab) on the official website.

Additional Resources:


  • Start with the tutorials to get a basic understanding of DSPy’s concepts and workflow.
  • Explore the community projects for inspiration and learn from others’ implementations.
  • Don’t hesitate to experiment and try different modules and functionalities.
  • Join the DSPy community/discord forum or discussions to ask questions and connect with other users.

Remember, DSPy is an actively developed framework, so stay updated with the latest documentation and releases. Most importantly, have fun and explore the possibilities of programming LLMs with DSPy.

What is LangGraph?

LangGraph is a library for building stateful, multi-actor applications with large language models (LLMs). It is built on top of LangChain and is intended to be used together with it.

LangGraph expands the capabilities of the LangChain Expression Language by coordinating multiple chains or actors across multiple computational steps in a cyclical manner. Its design is influenced by Pregel and Apache Beam, and the current interface is modeled after NetworkX.

The primary function of LangGraph is to introduce cycles into your LLM application. It is essential to note that this is NOT a directed acyclic graph (DAG) framework. If you want to create a DAG, use the LangChain Expression Language directly. Cyclical structures are vital for agent-like behaviors, because they let you call an LLM repeatedly in a loop and ask it for its next action.

How it works

Concept of stateful, multi-actor applications and how LangGraph enables their creation using LLMs

LangGraph is a library that enables the creation of stateful, multi-actor applications with large language models (LLMs) using LangChain. It extends the LangChain Expression Language, allowing the coordination of multiple chains or actors across multiple steps of computation in a cyclic manner. This is particularly useful for building agent-like behaviors, where an LLM is called in a loop to determine the next action.

The concept of stateful, multi-actor applications is central to LangGraph. It allows the creation of applications where multiple actors (representing different components or entities) maintain their state and interact with each other in a coordinated manner. This is achieved by defining a StateGraph, the main type of graph in LangGraph. A StateGraph is parameterized by a state object that is passed to each node. Each node then returns operations to update that state. These operations can either set specific attributes on the state (overwriting existing values) or add to existing attributes.

Essentially, LangGraph enables the creation of stateful, multi-actor applications by providing a framework for coordinating multiple actors and managing their state using LLMs and LangChain.
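The state-passing model can be illustrated with a toy graph runner. Each node receives the current state and returns updates. Some keys are overwritten (set) and others are appended to (add). A conditional edge lets the graph loop back until a router says to stop. Everything here (`ToyStateGraph`, `END` and the node functions) is a hypothetical plain-Python sketch, not LangGraph's actual API or internals.

```python
from typing import Callable, Dict, Iterable

State = Dict[str, object]
END = "__end__"  # hypothetical sentinel meaning "stop here"


class ToyStateGraph:
    """Minimal stateful graph: nodes return updates, edges pick the next node."""

    def __init__(self, append_keys: Iterable[str] = ()):
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.routers: Dict[str, Callable[[State], str]] = {}
        self.append_keys = set(append_keys)  # keys that are ADDED to rather than SET

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        self.routers[src] = lambda state: dst

    def add_conditional_edge(self, src: str, router: Callable[[State], str]) -> None:
        self.routers[src] = router

    def run(self, start: str, state: State, max_steps: int = 25) -> State:
        state, current, steps = dict(state), start, 0
        while current != END:
            if steps >= max_steps:
                raise RuntimeError("step limit reached before END")
            for key, value in self.nodes[current](state).items():
                if key in self.append_keys:
                    state[key] = list(state.get(key, [])) + list(value)
                else:
                    state[key] = value
            current = self.routers[current](state)
            steps += 1
        return state


# Hypothetical agent loop: the agent keeps calling the tool until it has 3 results.
def agent(state: State) -> State:
    return {"messages": [f"agent sees {len(state.get('results', []))} results"]}


def tool(state: State) -> State:
    return {"results": [len(state.get("results", [])) + 1]}


def route(state: State) -> str:
    return END if len(state.get("results", [])) >= 3 else "tool"


graph = ToyStateGraph(append_keys={"messages", "results"})
graph.add_node("agent", agent)
graph.add_node("tool", tool)
graph.add_conditional_edge("agent", route)
graph.add_edge("tool", "agent")  # the cycle: tool output flows back to the agent
final = graph.run("agent", {})
print(final["results"])  # [1, 2, 3]
```

The step limit is a safety valve. Once a graph may contain cycles, nothing structural guarantees termination.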

LangGraph vs directed acyclic graph (DAG).

The main difference between LangGraph and a directed acyclic graph (DAG) is that LangGraph allows cycles, while a DAG does not. A DAG is a directed graph with no directed cycles: it is impossible to start at a vertex and follow the edges so that you eventually loop back to the same vertex. LangGraph, on the other hand, is specifically designed to create cyclic behavior, allowing repeated actions and interactions between actors in the graph.
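The difference is easy to check mechanically. The sketch below uses Kahn's algorithm, which repeatedly removes nodes that have no incoming edges. This is a standard graph technique, not anything LangGraph-specific: a graph is a DAG exactly when every node can be removed this way. The node names are hypothetical and contrast a linear chain with an agent loop.

```python
from collections import deque
from typing import Dict, List


def is_dag(edges: Dict[str, List[str]]) -> bool:
    """Kahn's algorithm: True iff the directed graph has no cycles."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for t in edges.get(n, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    # Nodes left over are stuck on a cycle.
    return removed == len(nodes)


# A linear chain (the LangChain Expression Language style) vs. an agent loop.
chain = {"prompt": ["llm"], "llm": ["parser"], "parser": []}
agent_loop = {"agent": ["tool"], "tool": ["agent"]}
print(is_dag(chain), is_dag(agent_loop))  # True False
```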

LangGraph Use Cases

Some examples of applications that can benefit from LangGraph include:

  1. Agent-like Behaviors: LangGraph is useful for applications that require agent-like behaviors, where an LLM is called in a loop and asked what action to take next. This applies to chatbots, conversational agents, or any system where an agent needs to make sequential decisions based on the state of the conversation or environment. CrewAI is building something similar to this using LangChain.
  2. Coordinating Multiple Chains or Actors: LangGraph extends the LangChain Expression Language with the ability to coordinate multiple chains or actors across multiple steps of computation in a cyclic manner. This feature is beneficial for applications that involve coordinating and managing multiple interconnected processes or actors.
  3. Web-Enabled Agents: WebVoyager, built with LangGraph, is a new kind of web-browsing agent that uses multimodal AI.
  4. Stateful Applications: Applications that need to maintain and update a state as they progress, such as task-oriented dialogue systems, can benefit from the stateful nature of LangGraph.
  5. Custom Tool Integration: LangGraph allows the integration of custom tools, making it suitable for applications that require the use of diverse external tools and services in their decision-making processes.

LangGraph is beneficial for applications that require agent-like behaviors, coordination of multiple actors, cyclic behavior, stateful processing, and integration of custom tools. It is particularly well-suited for building complex, interactive, and stateful language-based applications.

Compare and Contrast LangGraph

It would be interesting to see how LangGraph compares to LlamaIndex and PyTorch. LLM frameworks in general seem to get a lot of flak for overcomplicating things. DSPy has also been gaining popularity. DSPy is a framework for algorithmically optimizing LM prompts and weights, especially when LMs are used multiple times within a pipeline. It separates the flow of the program from the parameters of each step, which allows more systematic and powerful optimization of LM prompts and weights. DSPy also introduces new optimizers: LM-driven algorithms that can tune the prompts and/or weights of LM calls to maximize a given metric.

Final Thoughts

LangGraph shows promise as a valuable addition to the growing ecosystem of LangChain. With its ability to enable the creation of stateful, multi-actor applications using LLMs and LangChain, LangGraph opens up new possibilities for building complex and interactive language-based systems.

The future of agents looks promising, as they are expected to have massive use cases. While agents in the past may have been ineffective and token-consuming, advancements in technologies like LangGraph can help address these challenges. By allowing for more advanced agent runtimes from academia, such as LLM Compiler and plan-and-solve, LangChain aims to enhance the effectiveness and efficiency of agents.

Stateful tools are also on the horizon for LangChain, which would enable tools to modify the state of applications. This capability would further enhance the flexibility and adaptability of stateful, multi-actor applications, enabling them to better respond to dynamic environments.

Moreover, LangChain is actively exploring the integration of more controlled human-in-the-loop workflows. This would provide opportunities for human involvement and guidance in decision-making processes, augmenting the capabilities of automated systems.

In the future, LangGraph and LangChain are expected to continue evolving and growing, offering more advanced features and expanding their capabilities. This opens up exciting research directions and potential applications that could benefit from advancements in LangGraph and similar technologies.

Overall, LangGraph’s potential to improve agent performance, enable stateful applications, and support multi-agent workflows positions it as a promising tool in the realm of language-based systems. As the LangChain ecosystem continues to thrive and innovate, we can anticipate even more exciting developments in the near future.

Alibaba Releases Qwen 1.5

Alibaba, the Chinese e-commerce giant, has released Qwen 1.5, a groundbreaking language model that has been making waves in the AI community. Developed in-house by Alibaba’s AI lab, Qwen 1.5 is the latest in a line of innovative models. Back in November, Alibaba released the first version of Qwen 72B.

This release includes several models. Among them is Alibaba’s largest open-source model, the 72B chat model, which surpasses other state-of-the-art models such as Claude 2.1 and GPT-3.5 on both MT-Bench and Alpaca-Eval v2. All six Qwen 1.5 models support a 32K context length, making the family a versatile and powerful tool for a wide range of applications.

Benchmarks & Performance

When it comes to benchmarks, Qwen 1.5 truly shines. In particular, the Qwen 1.5-7B model has shown impressive results in tool use, outperforming Mistral-7B. This highlights the robust capabilities of Qwen 1.5 on tasks that require using external tools.

The largest model in the Qwen 1.5 lineup, the 72B chat, delivers performance that is comparable to that of GPT-4, a highly advanced language model. This demonstrates the immense power and potential of Qwen 1.5 in leveraging artificial intelligence for complex language processing tasks.

With overall strong metrics across its different models, Qwen 1.5 offers users a reliable and efficient solution for a wide range of applications. Its impressive performance in various benchmarks showcases Alibaba’s commitment to pushing the boundaries of AI technology and delivering cutting-edge solutions to the e-commerce industry and beyond.

Closing Thoughts

In closing, Qwen 1.5 has demonstrated remarkable capabilities and performance, particularly with its 72B model, which performs comparably to, and even surpasses, Mistral-medium. This comparison should encourage Mistral to release the proper mistral-medium model instead of leaving the community to rely on the leaked Miqu weights. Doing so would open up opportunities for further fine-tuning and improvement.

It’s worth noting that Qwen 1.5 has already served as the base for Quyen, a new flagship LLM series. This highlights the potential impact of Qwen 1.5 in driving innovation and progress in the field of AI and language processing.

As we embrace the advancements brought forth by Qwen 1.5, we can anticipate further breakthroughs and discoveries that will shape the future of AI and its applications in various industries. Alibaba’s commitment to pushing the boundaries of AI technology is evident in the development and release of Qwen 1.5, ultimately driving progress and innovation in the e-commerce industry and beyond.

Nomic AI Releases Embeddings, A truly Open Source Embedding Model

nomic-embed-text-v1 is the newest state-of-the-art (SOTA) long-context embedding model. Tired of drowning in unstructured data – text documents, images, audio, you name it – that your current tools just can’t handle? Welcome to the open seas of understanding, where Nomic AI’s Embeddings act as your life raft, transforming this chaos into a treasure trove of insights.

Forget rigid spreadsheets and clunky interfaces. Nomic Atlas, the platform redefining how we interact with information, empowers you to explore, analyze, and structure massive datasets with unprecedented ease. But what truly sets Nomic apart is its commitment to openness and accessibility. That’s where Embeddings, their latest offering, comes in.

Embeddings are the secret sauce, the vector representations that unlock the meaning within your data. Imagine each data point as a ship on a vast, trackless ocean. Embeddings act as lighthouses, guiding you towards similar data, revealing hidden connections, and making sense of the seemingly incoherent.

And the best part? Nomic’s Embeddings are truly open source, meaning they’re free to use, modify, and share. This transparency fosters collaboration and innovation, putting the power of AI-powered analysis directly in your hands.

The Struggle with Unstructured Data

AI loves structured data. Imagine feeding spaghetti to a baby – that’s like throwing unstructured data at AI. Text documents, images, videos – a tangled mess AI struggles to digest. It craves the neat rows and columns of structured data, the spreadsheets and databases where information sits organized and labeled. Nomic AI’s open-source Embeddings turn that spaghetti into bite-sized insights, ready for AI, and unlock the hidden potential within your data.

Understanding Embeddings

Where Embedding Can Help

Embedding models have the potential to assist companies and developers in several key ways:

  • Handling Long-Form Content: Many organizations have vast troves of long-form content in research papers, reports, articles, and other documents. Embedding models can help make this content more findable and usable. By embedding these documents, the models can enable more semantic search and retrieval, allowing users to find relevant content even if the exact search keywords don’t appear in a document.
  • Auditing Model Behavior: As AI and machine learning models permeate more sensitive and critical applications, explainability and auditability become crucial. Embedding models can assist by providing a meaningful vector space that developers can analyze to better understand model behavior. By examining how certain inputs map to vector spaces, developers can gain insight into how models handle different data points.
  • Enhancing NLP Capabilities: Embedding models serve as a foundational layer that enhances many other natural language processing capabilities. By structuring language in vector spaces, embedding enables better performance downstream on tasks like sentiment analysis, topic modeling, text generation, and more. Embedding essentially extracts more understanding from text.
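Semantic search over embeddings usually comes down to comparing vectors, most often with cosine similarity, and returning the closest documents. The sketch below shows that step in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings. A model such as nomic-embed-text-v1 would produce the real ones, with many more dimensions. The sketch neither uses nor implies any Nomic API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Rank documents by similarity to the query and keep the best k."""
    scored = [(name, cosine(query, vec)) for name, vec in docs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Made-up vectors for illustration only; real embeddings come from a model.
docs = {
    "quarterly revenue report": [0.9, 0.1, 0.0],
    "earnings call transcript": [0.8, 0.2, 0.1],
    "office party photos": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # imagine the embedding of "how much money did we make?"
print(top_k(query, docs))
```

The two finance documents come back even though neither shares a keyword with the imagined query. This is the behavior that makes embedding-based retrieval useful for long-form content.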

Embedding models enable more semantic search and retrieval, more auditable model behavior, and stronger NLP capabilities. Companies need embedding models to help structure and exploit long-form content, and developers need embeddings to bring transparency and interpretability to sensitive applications. The vector spaces that embeddings provide for language underpin many modern NLP breakthroughs.

Nomic AI’s Training Details

Nomic AI’s Embeddings boast impressive performance, and understanding their training process sheds light on this achievement. Instead of relying on a single training stage, Nomic employs a multi-stage pipeline, meticulously crafted to extract the most meaning from various sources.

Imagine baking a delicious cake. Each ingredient plays a specific role, and their careful combination creates the final masterpiece. Similarly, Nomic’s pipeline uses different “ingredients” in each stage:

Stage 1: Unsupervised Contrastive Learning:

  • Think of this as building the cake’s foundation. Nomic starts with a large, pre-trained BERT model. Think of BERT as a skilled baker with a repertoire of techniques.
  • Next, they feed BERT a unique dataset of weakly related text pairs. This might include question-answer pairs from forums like StackExchange, reviews with titles and bodies, or news articles with summaries. These pairings help BERT grasp semantic relationships between different types of text.
  • Think of this stage as BERT learning the basic grammar and flavor profiles of different ingredients.
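Contrastive learning on text pairs is typically trained with an InfoNCE-style loss over in-batch negatives: each query should score its own paired text higher than every other text in the batch. For a batch with similarities s_ij and temperature τ, the loss for query i is -log(exp(s_ii/τ) / Σ_j exp(s_ij/τ)). The sketch below computes this from a similarity matrix in plain Python. The temperature of 0.05 and the toy matrices are illustrative assumptions, and the code shows the general technique, not Nomic's training code.

```python
import math
from typing import List


def info_nce(similarity: List[List[float]], temperature: float = 0.05) -> float:
    """Mean InfoNCE loss; similarity[i][j] scores query i against text j,
    and the true pairs sit on the diagonal (i == j)."""
    total = 0.0
    for i, row in enumerate(similarity):
        logits = [s / temperature for s in row]
        # log-sum-exp with the max subtracted for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(similarity)


aligned = [[0.9, 0.1], [0.2, 0.8]]   # each query is closest to its own pair
confused = [[0.1, 0.9], [0.8, 0.2]]  # each query prefers the wrong text
print(info_nce(aligned) < info_nce(confused))  # True
```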

Stage 2: Finetuning with High-Quality Labeled Data:

  • Now, the cake gets its delicious details! Here, Nomic introduces high-quality labeled datasets, like search queries and corresponding answers. These act like precise instructions for the baker, ensuring the cake isn’t just structurally sound but also flavorful.
  • A crucial step in this stage is data curation and hard-example mining. This involves selecting the most informative data points and identifying challenging examples that push BERT’s learning further. Think of this as the baker carefully choosing the freshest ingredients and mastering complex techniques.
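Hard-example mining can be illustrated in a few lines. For each query, the most informative negatives are the non-matching texts that the current model already scores as most similar. The sketch below picks them from a similarity matrix. The scores are made up and the function name is hypothetical, so treat it as a generic illustration rather than Nomic's curation pipeline.

```python
from typing import List


def mine_hard_negatives(similarity: List[List[float]], k: int = 1) -> List[List[int]]:
    """For each query i (whose positive is text i), return the indices of the
    k highest-scoring *other* texts."""
    hard = []
    for i, row in enumerate(similarity):
        candidates = [j for j in range(len(row)) if j != i]
        candidates.sort(key=lambda j: row[j], reverse=True)
        hard.append(candidates[:k])
    return hard


# Made-up scores: query 0 is easily confused with text 2, query 1 with text 0,
# and query 2 with text 1.
scores = [
    [0.90, 0.10, 0.70],
    [0.60, 0.85, 0.20],
    [0.30, 0.40, 0.95],
]
print(mine_hard_negatives(scores))  # [[2], [0], [1]]
```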

This two-stage approach allows Nomic’s Embeddings to benefit from both the broad knowledge base of the pre-trained BERT model and the targeted guidance of high-quality labeled data. The result? Embeddings that capture rich semantic meaning and excel at various tasks, empowering you to unlock the true potential of your unstructured data.


Nomic AI’s Embeddings offer a compelling proposition: powerful performance, unparalleled transparency, and seamless integration. By reportedly surpassing OpenAI’s text-embedding-3-small model and sharing their entire training recipe openly, Nomic empowers anyone to build and understand state-of-the-art embeddings. This democratization of knowledge fosters collaboration and innovation, pushing the boundaries of what’s possible with unstructured data.

Seamless integration with popular LLM frameworks like LangChain and LlamaIndex also makes Nomic Embeddings instantly accessible to developers working on advanced search and summarization tasks. This translates to more efficient data exploration, uncovering hidden connections, and ultimately deriving deeper insights from your information ocean.

So, whether you’re a seasoned data scientist or just starting your AI journey, Nomic Embeddings are an invitation to dive deeper. With their open-source nature, powerful performance, and seamless integration, they unlock a world of possibilities, empowering you to transform your unstructured data into a gold mine of insights.