At its I/O event, Google unveiled Gemini Flash, a new model with an appealing trifecta of benefits: it's lightweight, fast, and cost-efficient. What's more, Gemini Flash offers multimodal reasoning and a long context window that can process up to one million tokens.
Benchmarks
Capability | GEMINI 1.0 PRO | GEMINI 1.0 ULTRA | GEMINI 1.5 PRO (Feb 2024) | GEMINI 1.5 FLASH |
---|---|---|---|---|
General – MMLU | 71.8% | 83.7% | 81.9% | 78.9% |
Code – Natural2Code | 69.6% | 74.9% | 77.7% | 77.2% |
Math – MATH | 32.6% | 53.2% | 58.5% | 54.9% |
Reasoning – GPQA (main) | 27.9% | 35.7% | 41.5% | 39.5% |
Reasoning – Big-Bench Hard | 75.0% | 83.6% | 84.0% | 85.5% |
Multilingual – WMT23 | 71.7 | 74.4 | 75.2 | 74.1 |
Image – MMMU | 47.9% | 59.4% | 58.5% | 56.1% |
Image – MathVista | 45.2% | 53.0% | 52.1% | 54.3% |
Audio – FLEURS (55 languages, word error rate, lower is better) | 6.4 | 6.0 | 6.6 | 9.8 |
Video – EgoSchema | 55.7% | 61.5% | 63.2% | 63.5% |
Based on this table, we can conclude the following about Gemini Flash:
- Performance: Gemini Flash (1.5) trails Gemini 1.5 Pro on seven of the ten benchmarks, usually by a small margin, and edges ahead on Big-Bench Hard, MathVista, and EgoSchema. The largest gap is Audio – FLEURS (9.8 vs 6.6). FLEURS is reported as a word error rate, so lower is better and this is Flash's weakest result, not an improvement.
- Comparison to previous versions: Gemini Flash beats Gemini 1.0 Ultra on six of the ten benchmarks (Natural2Code, MATH, GPQA, Big-Bench Hard, MathVista, and EgoSchema) and trails it on MMLU, WMT23, MMMU, and FLEURS. It outperforms Gemini 1.0 Pro on every benchmark except FLEURS.
- Strengths and weaknesses: Gemini Flash posts the best score in the table on Big-Bench Hard, MathVista, and EgoSchema. Its clearest weakness is audio transcription (FLEURS), and it trails 1.5 Pro by 3.6 points on MATH and 3.0 points on MMLU.
- Multimodal capabilities: Gemini Flash is designed to handle multimodal input, as evidenced by its performance in tasks like Image – MMMU, Audio – FLEURS, and Video – EgoSchema.
Gemini Flash appears to be a capable AI model with strengths in certain areas, but it may not be the top-performing model in every task.
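To make the comparison easy to re-check, here is a minimal sketch that counts how many benchmarks Flash wins against each of the other models. The scores are copied from the table above. The `Row` type, the `flashWins` helper, and the `lowerIsBetter` flag are illustrative names, and the flag assumes FLEURS is a word error rate where lower scores win.

```typescript
// Scores copied from the table; column order: 1.0 Pro, 1.0 Ultra, 1.5 Pro, 1.5 Flash.
type Row = {
  name: string;
  scores: [number, number, number, number];
  lowerIsBetter?: boolean; // assumption: only FLEURS (word error rate) is lower-is-better
};

const rows: Row[] = [
  { name: "MMLU", scores: [71.8, 83.7, 81.9, 78.9] },
  { name: "Natural2Code", scores: [69.6, 74.9, 77.7, 77.2] },
  { name: "MATH", scores: [32.6, 53.2, 58.5, 54.9] },
  { name: "GPQA (main)", scores: [27.9, 35.7, 41.5, 39.5] },
  { name: "Big-Bench Hard", scores: [75.0, 83.6, 84.0, 85.5] },
  { name: "WMT23", scores: [71.7, 74.4, 75.2, 74.1] },
  { name: "MMMU", scores: [47.9, 59.4, 58.5, 56.1] },
  { name: "MathVista", scores: [45.2, 53.0, 52.1, 54.3] },
  { name: "FLEURS", scores: [6.4, 6.0, 6.6, 9.8], lowerIsBetter: true },
  { name: "EgoSchema", scores: [55.7, 61.5, 63.2, 63.5] },
];

const otherModels = ["1.0 Pro", "1.0 Ultra", "1.5 Pro"];
const FLASH_COLUMN = 3;

// Count the benchmarks where Flash beats the model in the given column.
function flashWins(column: number): number {
  return rows.filter((row) => {
    const flash = row.scores[FLASH_COLUMN];
    const other = row.scores[column];
    return row.lowerIsBetter ? flash < other : flash > other;
  }).length;
}

otherModels.forEach((model, column) => {
  console.log(`Flash beats Gemini ${model} on ${flashWins(column)} of ${rows.length} benchmarks`);
});
// Flash beats Gemini 1.0 Pro on 9 of 10 benchmarks
// Flash beats Gemini 1.0 Ultra on 6 of 10 benchmarks
// Flash beats Gemini 1.5 Pro on 3 of 10 benchmarks
```

Counting wins ignores the size of each gap, so treat it as a quick sanity check on the bullet points, not a ranking.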
Testing The Model
You can test the model now in Google AI Studio. Let's start by asking it a simple coding question: "Reverse a linked list using TypeScript".
    class Node {
      data: any;
      next: Node | null;

      constructor(data: any) {
        this.data = data;
        this.next = null;
      }
    }

    class LinkedList {
      head: Node | null;

      constructor() {
        this.head = null;
      }

      // Function to reverse the linked list
      reverse(): void {
        let prev: Node | null = null;
        let current: Node | null = this.head;
        let next: Node | null = null;

        while (current !== null) {
          next = current.next; // Store the next node
          current.next = prev; // Reverse the link
          prev = current; // Move prev to the current node
          current = next; // Move current to the next node
        }

        this.head = prev; // Update the head to the last node
      }

      // Function to add a node to the end of the list
      append(data: any): void {
        const newNode = new Node(data);
        if (this.head === null) {
          this.head = newNode;
        } else {
          let current = this.head;
          while (current.next !== null) {
            current = current.next;
          }
          current.next = newNode;
        }
      }

      // Function to print the linked list
      print(): void {
        let current = this.head;
        let str = "";
        while (current !== null) {
          str += current.data + " ";
          current = current.next;
        }
        console.log(str);
      }
    }

    // Example usage
    const list = new LinkedList();
    list.append(1);
    list.append(2);
    list.append(3);
    list.append(4);
    list.append(5);

    console.log("Original list:");
    list.print(); // Output: 1 2 3 4 5

    list.reverse();

    console.log("Reversed list:");
    list.print(); // Output: 5 4 3 2 1
Looks great: it even gave me a function to print the list, and everything compiles.
Next, we will ask it a random hard LeetCode question. I will give it only the problem description and the starter code. I don't want to include the title or name of the problem in case it may be in the training data. I picked LeetCode 1537, Get the Maximum Score, since it's a relatively new problem.
This is the one-shot answer in JavaScript:
    /**
     * @param {number[]} nums1
     * @param {number[]} nums2
     * @return {number}
     */
    var maxSum = function(nums1, nums2) {
      const MOD = 1000000007;
      let n = nums1.length;
      let m = nums2.length;
      let i = 0, j = 0;
      let sum1 = 0, sum2 = 0;
      let maxSum = 0;

      while (i < n || j < m) {
        if (i < n && (j >= m || nums1[i] < nums2[j])) {
          sum1 += nums1[i];
          i++;
        } else if (j < m && (i >= n || nums2[j] < nums1[i])) {
          sum2 += nums2[j];
          j++;
        } else {
          // nums1[i] == nums2[j]
          maxSum = Math.max(maxSum, sum1, sum2);
          sum1 += nums1[i];
          sum2 += nums2[j];
          i++;
          j++;
        }
      }

      maxSum = Math.max(maxSum, sum1, sum2);
      return maxSum % MOD;
    };
Unfortunately, this did not pass all the test cases.
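One likely reason, judging from the code itself: when the two arrays share a value, the answer above keeps adding to `sum1` and `sum2` separately, so a path can never carry the better running total across the switch point. For `nums1 = [2, 4, 5, 8, 10]` and `nums2 = [4, 6, 8, 9]` it returns 29, while the best path scores 30. Below is a minimal sketch of the usual two-pointer fix, not the model's output and not checked against the LeetCode judge. The name `maxScore` is hypothetical, and the sketch assumes the running sums stay below `Number.MAX_SAFE_INTEGER`, so plain numbers are enough.

```typescript
// Hypothetical corrected version; `maxScore` is an illustrative name.
function maxScore(nums1: number[], nums2: number[]): number {
  const MOD = 1e9 + 7;
  let i = 0;
  let j = 0;
  let sum1 = 0; // best score of a path currently walking nums1
  let sum2 = 0; // best score of a path currently walking nums2

  while (i < nums1.length || j < nums2.length) {
    if (j >= nums2.length || (i < nums1.length && nums1[i] < nums2[j])) {
      sum1 += nums1[i];
      i++;
    } else if (i >= nums1.length || nums2[j] < nums1[i]) {
      sum2 += nums2[j];
      j++;
    } else {
      // Shared value: either path may switch arrays here, so both
      // continue from the better running total plus the shared value.
      const best = Math.max(sum1, sum2) + nums1[i];
      sum1 = best;
      sum2 = best;
      i++;
      j++;
    }
  }

  // Assumption: sums fit in a safe integer, so one modulo at the end suffices.
  return Math.max(sum1, sum2) % MOD;
}

console.log(maxScore([2, 4, 5, 8, 10], [4, 6, 8, 9])); // 30
console.log(maxScore([1, 3, 5, 7, 9], [3, 5, 100])); // 109
```

The only behavioral change from the answer above is the shared-value branch, which merges the two totals instead of tracking them independently.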
CSS Testing
Now let's try some CSS. We will ask Gemini Flash to create a header component using Tailwind CSS.
    <header class="bg-gray-900 text-white py-6 px-8 md:py-8 md:px-16">
      <div class="container mx-auto flex items-center justify-between">
        <a href="/" class="text-2xl font-bold">
          My Website
        </a>
        <nav>
          <ul class="flex space-x-6">
            <li>
              <a href="#" class="hover:text-gray-200">Home</a>
            </li>
            <li>
              <a href="#" class="hover:text-gray-200">About</a>
            </li>
            <li>
              <a href="#" class="hover:text-gray-200">Blog</a>
            </li>
            <li>
              <a href="#" class="hover:text-gray-200">Contact</a>
            </li>
          </ul>
        </nav>
      </div>
    </header>
A simple but solid header. Nothing crazy, but output like this is handy for generating general boilerplate.
Conclusion
My experience with Gemini Flash has been relatively positive from a coding perspective. While it's not a revolutionary model that can perform tasks other models can't, it's a solid option for simple coding tasks and boilerplate generation. One of its most attractive features is its affordable API pricing: 35 cents per 1 million input tokens, compared with $0.50 per 1 million input tokens for GPT-3.5. Just remember that Gemini Flash is the lightweight member of the Gemini family, and it failed the one hard LeetCode problem I tried. If you need to tackle more complex coding tasks, you may want to consider more advanced models. Even within the Gemini family, 1.5 Pro scores slightly higher on the coding benchmark (77.7% vs 77.2% on Natural2Code). Overall, Gemini Flash is a good choice for simple coding tasks and for those on a budget, but for more demanding projects it's worth exploring other options.
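To put the price gap in concrete terms, here is a small sketch that applies the per-million-token prices quoted above to a workload. The `costUsd` helper and the 200-million-token monthly volume are made up for illustration, and real bills also depend on output-token pricing, which this ignores.

```typescript
// Prices in USD per one million tokens, as quoted in the paragraph above.
const GEMINI_FLASH_PRICE = 0.35;
const GPT_35_PRICE = 0.5;

// Cost of processing `tokens` tokens at a given per-million-token price.
function costUsd(tokens: number, pricePerMillion: number): number {
  return (tokens / 1e6) * pricePerMillion;
}

const monthlyTokens = 200e6; // hypothetical workload: 200 million tokens per month

const flashCost = costUsd(monthlyTokens, GEMINI_FLASH_PRICE);
const gptCost = costUsd(monthlyTokens, GPT_35_PRICE);
const savingsPercent = (1 - flashCost / gptCost) * 100;

console.log(`Gemini Flash: $${flashCost.toFixed(2)}`); // Gemini Flash: $70.00
console.log(`GPT-3.5: $${gptCost.toFixed(2)}`); // GPT-3.5: $100.00
console.log(`Savings: ${savingsPercent.toFixed(0)}%`); // Savings: 30%
```

At these list prices the saving is 30% at any volume, so the absolute difference only becomes meaningful for high-traffic applications.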