Small Labs Lead Voice AI Innovation

4m • ai-ml • news • beginner

Key Points

Apple has reportedly decided not to ship an LLM-powered Siri until at least 2027, so its voice assistant will keep falling behind newer voice agents.
Amazon’s new Alexa+ is powered by Anthropic’s Claude, an example of a major platform partnering with a smaller, independent LLM maker.
Sesame (sesame.com), a new research and development shop, focuses on the vocal inflections that humans perceive as human; its AI companion sounds real enough that listeners instinctively react to it as a person.
The speaker notes that small labs (e.g., Sesame, ElevenLabs) excel at building high-quality voice experiences while big tech firms have the distribution to scale them, which points to more mergers, acquisitions, and partnerships in the voice space.
A parallel pattern appears in image and video generation, where small players (e.g., Midjourney, Stable Diffusion) lead on technology while large companies rely on distribution. LLM makers differ somewhat: the best may still be independents like OpenAI and Anthropic, but their capital needs make them grow large much faster.

Sections

00:00:00 Voice AI Landscape Shifts - The speaker highlights Apple’s Siri delay, Amazon’s Alexa+ running on Claude from the smaller, independent lab Anthropic, and Sesame’s hyper-human vocal AI that blurs the line between machine and person.

Full Transcript

**Source:** [https://www.youtube.com/watch?v=Vn9tVViLyRA](https://www.youtube.com/watch?v=Vn9tVViLyRA) **Duration:** 00:04:12

0:00 This one is all about voice. We're going to start with Apple. Apple has apparently decided that they are not going to be producing a new version of Siri that is based on a smart large language model until at least 2027, so more than two years away at this point, most likely. Which means Siri is going to keep waking up when you don't want it to, and Siri is going to keep falling farther and farther behind advanced voice agents.

0:27 There is something about voice that seems to work with smaller builders that just doesn't work with the large companies. Apple clearly is having trouble shipping a new version of Siri. Amazon finally shipped Alexa+, but they're powering it with Claude, so it's a voice mode, but it's powered by a large language model built by a smaller, independent model maker.

0:49 And now we have Sesame. If you haven't heard of sesame.com, it's a new research and development shop. It's focused on building a personable AI companion. What they've really focused on is the vocal inflections that humans perceive as human, which means that talking to it feels like talking to a person. It is so eerie that people are having really strong reactions to it, positive and negative. When I've been talking to it, my brain can't help but think it's human. It fools my brain at a subconscious level. I can tell myself this is a large language model, that's fine, but my brain doesn't buy it. My brain is interacting with it like it's human because it sounds so human. It pauses. It will have the little dysregulation in speech that we humans have, like "uh," or taking a breath, or thinking about it. Yes, this can take a breath. Isn't that weird?

1:52 So if you haven't had a chance, try it. They are scaling fast and getting a lot of attention. When I checked them 10 minutes ago, they were scaling so fast they couldn't pick up my call; otherwise I would have done a live demo here with Maya. But yeah, have a look at them. I'm sure they'll be back up later today.

2:09 And I think that the thing that I am taking away from all of this is that voice is hard. Voice seems to work better built by small labs like Sesame or like ElevenLabs. But voice is also highly valuable, and so we should look for mergers and acquisitions in this space. We should look for more partnerships, like Amazon snapping up Claude as a partnership that they can work with for Alexa+. At the end of the day, the big companies like Amazon, like Google, like Facebook, they have distribution. They have massive distribution, and they can actually scale these voice services, whereas the little labs seem to be good at actually making voice services that work well.

2:51 It's funny, I see a similar pattern with image generation as well, where you have small companies like Midjourney doing really well, with Stable Diffusion, and other small companies working on video. Large language model companies seem to be a bit different. The best ones still may be independent companies for now, like OpenAI or Anthropic, but at the end of the day they require a lot more capital versus the voice models or the image models, and so they become larger faster. Anthropic just raised three and a half billion on a Series E, I think yesterday.

3:30 So part of what I've been pondering is that the culture of innovation that we're seeing in this cycle for tech is following some of the similar patterns we've seen in previous cycles, where smaller, more nimble companies that are startups are disrupting through technology, and larger companies are effectively having to partner with or purchase that innovation. We see that with voice, we see that with large language models, etc.

3:55 So check out Sesame, it's fun. And Sesame apparently remembers what you say for two weeks, so have a conversation, come back, have another conversation, and see what happens. Cheers.