In a move that turned heads and sparked instant debate, open-source model startup Mistral AI released their latest LLM not with a splashy launch event or polished press release, but with a simple tweet containing a single link: a magnet URL for a massive torrent file.
This audacious approach stands in stark contrast to the carefully orchestrated media blitz that accompanied Google’s recent Gemini launch, or the “over-rehearsed professional release video talking about a revolution in AI” that OpenAI’s Andrej Karpathy mocked on social media. While other companies were busy crafting narratives and highlighting their technological prowess, Mistral simply dropped the mic with a torrent link.
The LLM in question, MoE 8x7B, has generated immediate buzz within the AI community. Described by some as a “scaled-down GPT-4,” it’s believed to be a Mixture of Experts model with 8 individual experts, each possessing 7 billion parameters. This architecture mirrors what has been rumored about GPT-4, whose design OpenAI has not confirmed, albeit with significantly fewer parameters.
This bare-bones release, devoid of any formal documentation or promotional materials, is characteristic of Mistral AI. As AI consultant and community leader Uri Eliabayev noted, “Mistral is well-known for this kind of release, without any paper, blog, code or press release.” While some may find this approach unconventional, it has undoubtedly generated a significant amount of attention and speculation. As open source AI advocate Jay Scambler aptly put it, “It’s definitely unusual, but it has generated quite a bit of buzz, which I think is the point.”
Whether this unorthodox approach marks a new era of open-source AI development remains to be seen. However, one thing is certain: Mistral AI has succeeded in capturing the imagination of the AI community, and their enigmatic release has sparked important conversations about transparency, accessibility, and the future of large language models.
Details of the Release
In a tweet, Mistral dropped a torrent link containing an 8x7B MoE model, with no further details.
Mistral AI provided minimal details about the release of MoE 8x7B, opting for a cryptic tweet containing only a torrent link. However, some insights can be gleaned from the limited information available.
- Model Parameters: The params.json file reveals several key parameters:
- Hidden dimension: 14336 (3.5x expansion)
- Dimension: 4096
- Number of heads: 32 (4x multiquery)
- Number of KV heads: 8
- Mixture of Experts (MoE): 8 experts, top 2 used for inference
- Related Code: While no official code for MoE 8x7B is available, the GitHub repository megablocks-public likely contains code relevant to the model’s architecture.
- Noticeably Absent: Unlike many other LLM releases, MoE 8x7B was not accompanied by a polished launch video or press release.
These details point to a sparse Mixture of Experts design. Routing each token through only the top 2 of 8 experts keeps the compute per token well below that of a dense model with the same total parameter count. The 3.5x expansion of the hidden dimension and the 4:1 ratio of query heads to KV heads are sizing and memory choices; on their own they say little about output quality.
The timing of the release, just before the NeurIPS conference, suggests that Mistral AI may be aiming to generate interest and discussion within the AI community. The absence of a traditional launch event is likely intentional, as it aligns with Mistral’s more open-source and community-driven approach.
While the lack of detailed information may leave some wanting more, it also fosters a sense of mystery and excitement. This unorthodox approach has undoubtedly captured the attention of the AI community, and we can expect to see further analysis and experimentation with MoE 8x7B in the coming weeks and months.
The parameters in the JSON file provide some insights into the architecture of MoE 8x7B. The hidden dimension is 14336, which is 3.5 times the model dimension of 4096. In a transformer, this figure normally describes the width of the feed-forward layers, so it indicates how large each expert block is rather than how well the model performs.
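The 3.5x figure can be checked with a line of arithmetic. The short Python sketch below uses only the two values quoted from params.json; the variable names are illustrative, and the second number is simply the size of one matrix mapping the model dimension to the hidden dimension, not an official parameter count.

```python
# Values quoted from params.json in the article.
dim = 4096          # model (embedding) dimension
hidden_dim = 14336  # hidden dimension

# Ratio of hidden dimension to model dimension.
expansion_ratio = hidden_dim / dim

# Number of weights in a single dim x hidden_dim projection matrix.
# How many such matrices each expert holds is not stated in the article.
weights_per_projection = dim * hidden_dim

print(f"expansion ratio: {expansion_ratio}")
print(f"weights in one projection matrix: {weights_per_projection:,}")
```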
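The number of heads is 32 and the number of KV heads is 8. This indicates a grouped form of multi-query attention, in which every 4 query heads share a single key/value head. Sharing key/value heads does not let the model process more input sequences at once; its benefit is a smaller key/value cache during inference, here roughly a quarter of what 32 separate KV heads would need.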
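To make the head counts concrete, here is a minimal Python sketch of how 32 query heads could map onto 8 KV heads, and what that does to the key/value cache. Two things are assumptions rather than facts from the release: that the per-head dimension equals dim divided by the number of heads, and that query heads are grouped contiguously. The function names are hypothetical.

```python
# Values quoted from params.json in the article.
n_heads = 32     # query heads
n_kv_heads = 8   # key/value heads
dim = 4096

# Assumption: per-head dimension is dim / n_heads (not stated in the article).
head_dim = dim // n_heads

# How many query heads share each key/value head.
queries_per_kv_head = n_heads // n_kv_heads


def kv_head_for(query_head: int) -> int:
    """Map a query head to its shared KV head (assumes contiguous grouping)."""
    return query_head // queries_per_kv_head


def kv_values_per_token(kv_heads: int) -> int:
    """Cached values per token per layer: one key and one value vector per KV head."""
    return 2 * kv_heads * head_dim


grouped = kv_values_per_token(n_kv_heads)  # with 8 shared KV heads
full = kv_values_per_token(n_heads)        # if every query head had its own KV head

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"cache values per token per layer: {grouped} vs {full}")
```

Under these assumptions the cache shrinks by the same factor as the head ratio, which is where the "4x" in the parameter list comes from.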
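In a Mixture of Experts model, parts of the network are replaced by several parallel sub-networks called experts, and a small router decides which of them handle each token. The experts are not assigned tasks by hand; any specialization emerges during training. In the case of MoE 8x7B, there are 8 experts, and only the top 2 are used for each token at inference time. Every token therefore touches only a fraction of the expert weights, which is where the efficiency comes from.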
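Top-2 routing is easier to follow in code. The sketch below is a toy illustration in plain Python, not Mistral's implementation: the experts are stand-in functions that just scale their input, the router scores are made up, and normalizing the scores with a softmax over the two selected experts is an assumption about how the outputs are weighted.

```python
import math


def top_k_route(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights come from a softmax over the selected logits only (an assumption).
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, router_logits, k=2):
    """Run only the selected experts and sum their outputs, weighted by the router."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy experts: expert i multiplies every element of the input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(8)]

# Hypothetical router scores for a single token, one score per expert.
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top_k_route(logits)
output = moe_forward([1.0, 2.0], experts, logits)
print(routes)
print(output)
```

Only two of the eight expert functions are ever called for this token; the other six contribute no compute, although in a real model their weights would still need to be available for other tokens.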
In short, the parameters in the JSON file describe a sparse Mixture of Experts LLM with grouped key/value heads. It is still too early to say how well MoE 8x7B will perform on real-world tasks, but it is certainly a promising development in the field of AI.
What could this mean for the future of AI?
The release of MoE 8x7B demonstrates a few important trends in the field of AI:
- The increasing importance of open source software. MoE 8x7B is an open-source LLM, which means that anyone can download and use it for free. This is a significant development, as it democratizes access to powerful AI technology.
- The rise of new LLM architectures. MoE 8x7B is a Mixture of Experts model, which is a relatively new type of LLM architecture. This suggests that the field of LLM research is still evolving and that there is still significant room for innovation.
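- The increasing focus on efficiency and performance. By using only 2 of its 8 experts per token, MoE 8x7B is designed to keep inference compute low relative to its total size. All 8 experts still have to be held in memory, so it remains to be seen how well the model fits on resource-constrained devices. Efficiency of this kind is important for enabling the use of LLMs in real-world applications.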
This release of MoE 8x7B is a positive development for the future of AI. It demonstrates the power of open source software, the rise of new LLM architectures, and the increasing focus on efficiency and performance. It is likely that we will see more and more innovative and powerful LLMs being released in the coming years, and MoE 8x7B is a clear example of this trend.