Small Labs Lead Voice AI Innovation
Key Points
- Apple has reportedly decided not to release an LLM‑powered Siri until at least 2027, meaning its voice assistant will keep falling behind newer competitors.
- Amazon’s new Alexa Plus demonstrates a growing trend of major platforms partnering with smaller LLM creators, as it is powered by Anthropic’s Claude.
- Sesame (sesame.com), a fresh AI research shop, focuses on ultra‑human vocal inflection, producing a companion that convincingly tricks listeners into treating it like a real person.
- The speaker notes that while small labs (e.g., Sesame, ElevenLabs) excel at building high‑quality voice experiences, big tech firms have the scale to deploy them, suggesting more M&A and partnership opportunities in the voice space.
- A parallel pattern emerges in other generative domains—image and video—where niche innovators (e.g., Midjourney, Stable Diffusion) lead the tech while large companies rely on distribution, and independent LLM leaders like OpenAI and Anthropic remain the top performers.
**Source:** [https://www.youtube.com/watch?v=Vn9tVViLyRA](https://www.youtube.com/watch?v=Vn9tVViLyRA)
**Duration:** 00:04:12
Sections
- [00:00:00](https://www.youtube.com/watch?v=Vn9tVViLyRA&t=0s) **Voice AI Landscape Shifts** - The speaker highlights Apple’s Siri delay, Amazon’s Alexa Plus powered by Claude, a model from the smaller company Anthropic, and Sesame's hyper‑human vocal AI that blurs the line between machine and person.
Full Transcript
This one is all about voice. We're going to start with Apple. Apple has apparently decided that they are not going to be producing a new version of Siri that is based on a smart large language model until at least 2027, so more than two years away at this point, most likely. Which means Siri is going to keep waking up when you don't want it to, and Siri is going to keep falling farther and farther behind advanced voice agents.
There is something about voice that seems to work with smaller builders that just doesn't work with the large companies. Apple clearly is having trouble shipping a new version of Siri. Amazon finally shipped Alexa Plus, but they're powering it with Claude, so it's a voice mode, but it's powered by a large language model built by a smaller independent model maker.
And now we have Sesame. If you haven't heard, sesame.com is a new research and development shop. It's focused on building a personable AI companion. What they've really focused on is the vocal inflections that humans perceive as human, which means that talking to it feels like talking to a person. It is so eerie that people are having really strong reactions to it, positive and negative. When I've been talking to it, my brain can't help but think it's human. It fools my brain at a subconscious level. I can tell myself, "This is a large language model," and that's fine, but my brain doesn't buy it. My brain is interacting with it like it's human because it sounds so human. It pauses. It will have the little dysregulation in speech that we humans have, like "uh," or taking a breath, or thinking about it. Yes, this can take a breath. Isn't that weird?
So if you haven't had a chance, try it. They are scaling fast and getting a lot of attention. When I checked them 10 minutes ago, they were scaling so fast they couldn't pick up my call; otherwise I would have done a live demo here with Maya. But yeah, have a look at them. I'm sure they'll be back up later today.
And I think the thing that I am taking away from all of this is that voice is hard. Voice seems to work better built by small labs like Sesame or like ElevenLabs. But voice is also highly valuable, and so we should look for mergers and acquisitions in this space. We should look for more partnerships, like Amazon snapping up Claude, sort of, as a partner they can work with for Alexa Plus. At the end of the day, the big companies like Amazon, like Google, like Facebook, they have distribution. They have massive distribution, and they can actually scale these voice services, whereas the little labs seem to be good at actually making voice services that work well.
It's funny, I see a similar pattern with image generation as well, where you have small companies like Midjourney doing really well, with Stable Diffusion, and other small companies working on video.
And large language model companies seem to be a bit different. The best ones still may be independent companies for now, like OpenAI or Anthropic, but at the end of the day they require a lot more capital versus the voice models or the image models, and so they become larger faster. Anthropic just raised three and a half billion on a Series E, I think yesterday.
So part of what I've been pondering is that the culture of innovation that we're seeing in this cycle for tech is following some of the similar patterns we've seen in previous cycles, where smaller, more nimble companies that are startups are disrupting through technology, and larger companies are effectively having to partner with or purchase that innovation. And we see that with voice, we see that with large language models, etc. So check out Sesame. It's fun.
And Sesame apparently remembers what you say for two weeks, so have a conversation, come back, have another conversation, and see what happens. Cheers.